tidaldb/docs/planning/milestone-5/phase-4/task-02-creator-vector-index.md
jordan 192c473f55 feat: complete Milestone 5 — full-text search, RRF fusion, and creator search
- M5p1: BM25 text indexing via Tantivy with background syncer (0.26ms @ 10K docs)
- M5p2: RRF fusion layer combining BM25 + ANN scores (46µs @ 1K candidates)
- M5p3: unified Search query API (8-stage pipeline, BM25 + vector + ranking)
- M5p4: creator text + vector indexing and creator search executor (< 20ms @ 200 creators)
- Refactor db/mod.rs into focused sub-modules (creators, items, sessions, signals, etc.)
- Decompose monolithic files into directory modules (query/executor, ranking/diversity, etc.)
- Split brute.rs → brute/mod.rs + brute/tests.rs; extract search executor helpers
- Add benches: fusion, search, session, text_index
- Add M5 UAT test suites (m5_uat, m5_search, m5p4_creator_search, text_index)
- Update blog posts, roadmap, content strategy, and M5 planning docs
- Add tmp/ and .claude/worktrees/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 23:53:16 -07:00


Task 02: Creator Vector Index

Goal

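Add write_creator_embedding() and read_creator_embedding() to TidalDb. write_creator_embedding() registers the (EntityKind::Creator, "content") slot in the existing EmbeddingSlotRegistry on first use and stores the vector in it; read_creator_embedding() reads a stored vector back from that slot.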

Files to Modify

  • tidal/src/db/mod.rs — add write_creator_embedding() and read_creator_embedding()

Implementation

pub fn write_creator_embedding(&self, id: EntityId, embedding: &[f32]) -> crate::Result<()> {
    // Hold the registry write lock across check, register, and insert so no
    // other writer can register the slot in between.
    let mut registry = self.embedding_registry.write()...;
    if registry.get(EntityKind::Creator, "content").is_none() {
        // First creator embedding: auto-register the slot, sized to this vector.
        let state = EmbeddingSlotState::new(embedding.len(), QuantizationLevel::F32, EmbeddingSource::External);
        registry.register(EntityKind::Creator, "content".to_string(), state)?;
    }
    let slot = registry.get_mut(EntityKind::Creator, "content")...;
    // Later writes are expected to match the dimension fixed at registration.
    slot.index.add(id.as_u64(), embedding)?;
    Ok(())
}

pub fn read_creator_embedding(&self, id: EntityId) -> crate::Result<Option<Vec<f32>>> {
    let registry = self.embedding_registry.read()...;
    // No slot yet means no creator embedding has been written: report absence, not an error.
    let slot = match registry.get(EntityKind::Creator, "content") {
        Some(s) => s,
        None => return Ok(None),
    };
    Ok(slot.index.get(id.as_u64()))
}
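To make the locking and auto-registration behavior concrete, here is a self-contained, in-memory sketch. It assumes hypothetical stand-ins (Db, Slot, Error, a one-variant EntityKind, and plain u64 ids instead of EntityId) in place of TidalDb's real types, and it stores vectors in a HashMap rather than the slot's ANN index. It also rejects a dimension mismatch explicitly, which the code above leaves to slot.index.add().

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical stand-ins; TidalDb's real EntityKind, registry, and index differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum EntityKind {
    Creator,
}

#[derive(Debug, PartialEq)]
enum Error {
    LockPoisoned,
    DimensionMismatch { expected: usize, got: usize },
}

// One embedding slot: a fixed dimension plus id -> vector storage.
struct Slot {
    dim: usize,
    vectors: HashMap<u64, Vec<f32>>,
}

struct Db {
    // Registry keyed by (entity kind, slot name), as in the task description.
    embedding_registry: RwLock<HashMap<(EntityKind, String), Slot>>,
}

impl Db {
    fn new() -> Self {
        Db { embedding_registry: RwLock::new(HashMap::new()) }
    }

    fn write_creator_embedding(&self, id: u64, embedding: &[f32]) -> Result<(), Error> {
        let mut registry = self.embedding_registry.write().map_err(|_| Error::LockPoisoned)?;
        // Auto-register on first write, sizing the slot to this vector.
        let slot = registry
            .entry((EntityKind::Creator, "content".to_string()))
            .or_insert_with(|| Slot { dim: embedding.len(), vectors: HashMap::new() });
        if embedding.len() != slot.dim {
            return Err(Error::DimensionMismatch { expected: slot.dim, got: embedding.len() });
        }
        slot.vectors.insert(id, embedding.to_vec());
        Ok(())
    }

    fn read_creator_embedding(&self, id: u64) -> Result<Option<Vec<f32>>, Error> {
        let registry = self.embedding_registry.read().map_err(|_| Error::LockPoisoned)?;
        // A missing slot means nothing has been written yet: Ok(None), not an error.
        let Some(slot) = registry.get(&(EntityKind::Creator, "content".to_string())) else {
            return Ok(None);
        };
        Ok(slot.vectors.get(&id).cloned())
    }
}
```

Using HashMap::entry collapses the is_none() / register() / get_mut() sequence into one lookup under the same write lock; the real EmbeddingSlotRegistry may not expose an entry-style API, in which case the three-step version above is the equivalent.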

Acceptance Criteria

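  • write_creator_embedding(id, &vec) succeeds and, on the first call, auto-registers the slot with dimension vec.len()
  • read_creator_embedding(id) returns the stored vector, and Ok(None) when no creator embedding has been written
  • ANN search on (EntityKind::Creator, "content") returns the creators whose embeddings were written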
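The last criterion can be sanity-checked by querying with one creator's own vector and expecting that creator first. The sketch below uses exact brute-force cosine search as a stand-in for the slot's ANN index; cosine and top_k are hypothetical helper names, not TidalDb's search API, which this task does not show.

```rust
use std::collections::HashMap;

// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Exact top-k over a creator slot's vectors, highest similarity first.
fn top_k(vectors: &HashMap<u64, Vec<f32>>, query: &[f32], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = vectors
        .iter()
        .map(|(&id, v)| (id, cosine(v, query)))
        .collect();
    // Descending by score; ties broken by id so output is deterministic.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    scored.truncate(k);
    scored
}
```

Because brute force is exact, it also serves as a reference result when checking an approximate index's recall on the same creator vectors.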