- M5p1: BM25 text indexing via Tantivy with background syncer (0.26ms @ 10K docs)
- M5p2: RRF fusion layer combining BM25 + ANN scores (46µs @ 1K candidates)
- M5p3: unified Search query API (8-stage pipeline, BM25 + vector + ranking)
- M5p4: creator text + vector indexing and creator search executor (< 20ms @ 200 creators)
- Refactor db/mod.rs into focused sub-modules (creators, items, sessions, signals, etc.)
- Decompose monolithic files into directory modules (query/executor, ranking/diversity, etc.)
- Split brute.rs → brute/mod.rs + brute/tests.rs; extract search executor helpers
- Add benches: fusion, search, session, text_index
- Add M5 UAT test suites (m5_uat, m5_search, m5p4_creator_search, text_index)
- Update blog posts, roadmap, content strategy, and M5 planning docs
- Add tmp/ and .claude/worktrees/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Task 02: Creator Vector Index
Goal
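Add write_creator_embedding() and read_creator_embedding() to TidalDb. The write path registers (on first use) and populates the (EntityKind::Creator, "content") slot in the existing EmbeddingSlotRegistry; the read path returns the stored vector, or None when the slot or the entry does not exist.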
Files to Modify
tidal/src/db/mod.rs: add write_creator_embedding() and read_creator_embedding()
Implementation
pub fn write_creator_embedding(&self, id: EntityId, embedding: &[f32]) -> crate::Result<()> {
    let mut registry = self.embedding_registry.write()...;
    // Auto-register the slot on first write; its dimension is taken from this embedding.
    if registry.get(EntityKind::Creator, "content").is_none() {
        let state = EmbeddingSlotState::new(
            embedding.len(),
            QuantizationLevel::F32,
            EmbeddingSource::External,
        );
        registry.register(EntityKind::Creator, "content".to_string(), state)?;
    }
    let slot = registry.get_mut(EntityKind::Creator, "content")...;
    slot.index.add(id.as_u64(), embedding)?;
    Ok(())
}

pub fn read_creator_embedding(&self, id: EntityId) -> crate::Result<Option<Vec<f32>>> {
    let registry = self.embedding_registry.read()...;
    // A missing slot means no creator embedding has been written yet.
    let slot = match registry.get(EntityKind::Creator, "content") {
        None => return Ok(None),
        Some(s) => s,
    };
    Ok(slot.index.get(id.as_u64()))
}
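The register-on-first-write pattern above can be tried in isolation. The following is a minimal, self-contained sketch under stated assumptions, not TidalDb code: Registry, Slot, and CREATOR_CONTENT are hypothetical stand-ins for EmbeddingSlotRegistry, EmbeddingSlotState, and the (EntityKind::Creator, "content") key, and a plain HashMap replaces the vector index. The dimension check on later writes is also an assumption about how a mismatched vector would be handled.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for one embedding slot: a fixed dimension plus stored vectors.
struct Slot {
    dim: usize,
    vectors: HashMap<u64, Vec<f32>>,
}

/// Hypothetical stand-in for the slot registry, keyed by (entity kind, slot name).
#[derive(Default)]
struct Registry {
    slots: RwLock<HashMap<(&'static str, &'static str), Slot>>,
}

/// Assumed key for the creator content slot.
const CREATOR_CONTENT: (&str, &str) = ("creator", "content");

impl Registry {
    /// Registers the slot on first write (dimension taken from the first
    /// embedding), then stores the vector under the given id.
    fn write_creator_embedding(&self, id: u64, embedding: &[f32]) -> Result<(), String> {
        let mut slots = self.slots.write().map_err(|e| e.to_string())?;
        let slot = slots.entry(CREATOR_CONTENT).or_insert_with(|| Slot {
            dim: embedding.len(),
            vectors: HashMap::new(),
        });
        // Assumption: later writes must match the dimension fixed at registration.
        if embedding.len() != slot.dim {
            return Err(format!(
                "expected dimension {}, got {}",
                slot.dim,
                embedding.len()
            ));
        }
        slot.vectors.insert(id, embedding.to_vec());
        Ok(())
    }

    /// Returns Ok(None) when the slot was never registered or the id is unknown.
    fn read_creator_embedding(&self, id: u64) -> Result<Option<Vec<f32>>, String> {
        let slots = self.slots.read().map_err(|e| e.to_string())?;
        Ok(slots
            .get(&CREATOR_CONTENT)
            .and_then(|slot| slot.vectors.get(&id).cloned()))
    }
}
```

Holding the write lock across both the registration check and the insert keeps the two steps atomic, so two concurrent first writes cannot both try to register the slot.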
Acceptance Criteria
- write_creator_embedding(id, &vec) succeeds and auto-registers the slot
- read_creator_embedding(id) returns the stored vector
- ANN search on (EntityKind::Creator, "content") returns results
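For checking the last criterion by hand, it can help to have an exact reference to compare search results against. The sketch below is an assumption-laden illustration rather than the project's ANN index: it runs a brute-force cosine-similarity scan over a plain HashMap of creator vectors, and the names cosine and top_k are hypothetical.

```rust
use std::collections::HashMap;

/// Cosine similarity; returns 0.0 when either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact top-k search over stored vectors, highest similarity first.
/// This scans every entry, so it is a reference check, not an ANN index.
fn top_k(store: &HashMap<u64, Vec<f32>>, query: &[f32], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = store
        .iter()
        .map(|(id, v)| (*id, cosine(query, v)))
        .collect();
    // total_cmp gives a total order on f32, so NaN scores cannot panic the sort.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

With a few hundred creators a full scan like this is cheap, which makes it a reasonable oracle for spot-checking that approximate results are plausible.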