tidaldb/docs/planning/milestone-5/phase-4/OVERVIEW.md
jordan 192c473f55 feat: complete Milestone 5 — full-text search, RRF fusion, and creator search
- M5p1: BM25 text indexing via Tantivy with background syncer (0.26ms @ 10K docs)
- M5p2: RRF fusion layer combining BM25 + ANN scores (46µs @ 1K candidates)
- M5p3: unified Search query API (8-stage pipeline, BM25 + vector + ranking)
- M5p4: creator text + vector indexing and creator search executor (< 20ms @ 200 creators)
- Refactor db/mod.rs into focused sub-modules (creators, items, sessions, signals, etc.)
- Decompose monolithic files into directory modules (query/executor, ranking/diversity, etc.)
- Split brute.rs → brute/mod.rs + brute/tests.rs; extract search executor helpers
- Add benches: fusion, search, session, text_index
- Add M5 UAT test suites (m5_uat, m5_search, m5p4_creator_search, text_index)
- Update blog posts, roadmap, content strategy, and M5 planning docs
- Add tmp/ and .claude/worktrees/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 23:53:16 -07:00


Milestone 5, Phase 4: Creator and People Search

Goal

Prove that the same SEARCH pipeline that indexes items can also index creator entities. After this phase, a developer can call db.search(&Search { entity_kind: Creator, query: "jazz" }) and receive BM25-ranked creators with optional vector fusion, filters, and sort modes.
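The "optional vector fusion" mentioned above refers to combining a BM25 ranking with an ANN ranking via Reciprocal Rank Fusion (RRF). The sketch below is a minimal, standalone illustration of that technique, not tidalDB's actual fusion layer: the function name `rrf_fuse`, the `u64` creator ids, and the smoothing constant `k` are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical helper: fuse two ranked lists of creator ids with
/// Reciprocal Rank Fusion. Each list contributes 1 / (k + rank) per id,
/// where rank is 1-based. `k = 60.0` is a commonly used default.
fn rrf_fuse(bm25_ranked: &[u64], ann_ranked: &[u64], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();

    for list in [bm25_ranked, ann_ranked] {
        for (idx, id) in list.iter().enumerate() {
            // `idx` is 0-based, so add 1 to get the 1-based rank.
            let rank = idx as f64 + 1.0;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }

    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ascending id for determinism.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

With a BM25 ranking of `[1, 2, 3]` and an ANN ranking of `[3, 1, 4]` at `k = 60.0`, creator 1 comes first (ranks 1 and 2), followed by creator 3 (ranks 3 and 1), then 2 and 4, which each appear in only one list.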

Motivation

Phases m5p1 through m5p3 built a complete SEARCH pipeline for items. Creators are a first-class entity in tidalDB with their own storage engine, metadata, and embeddings. Extending the pipeline to creators validates the multi-entity-kind design and unlocks the people-search use case.

Tasks

Task      Title                     Status
task-01   Creator Text Indexing     pending
task-02   Creator Vector Index      pending
task-03   Creator Search Executor   pending

Execution Order

task-01 (Creator Text Indexing)
    |
    v
task-02 (Creator Vector Index)
    |
    v
task-03 (Creator Search Executor)

All tasks are sequential.

Verification

cargo check --manifest-path tidal/Cargo.toml
cargo clippy --manifest-path tidal/Cargo.toml -- -D warnings
cargo test --manifest-path tidal/Cargo.toml --lib
cargo test --manifest-path tidal/Cargo.toml --test m5p4_creator_search
cargo bench --manifest-path tidal/Cargo.toml --bench search -- bench_search_creator_text_200

Key assertions:

  • db.search(&Search { entity_kind: Creator, query: "jazz" }) returns BM25-ranked creators
  • filter(verified = true) excludes non-verified creators
  • similar_to lookup triggers ANN on (EntityKind::Creator, "content") slot
  • bench_search_creator_text_200 < 20ms
  • All existing m5p3 item search tests still pass
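The first two assertions can be pictured with a toy in-memory stand-in. This is only a sketch of the expected behaviour, not the tidalDB API: the `Creator` struct, the `verified_only` field (standing in for filter(verified = true)), and `search_creators` are hypothetical. A plain term count replaces real BM25 scoring to keep the example short.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EntityKind {
    Item,
    Creator,
}

/// Hypothetical creator record; the real schema is not shown in this document.
#[derive(Debug, Clone)]
struct Creator {
    id: u64,
    bio: String,
    verified: bool,
}

/// Hypothetical query shape. Only `entity_kind` and `query` appear in the
/// planning doc; `verified_only` is an assumed stand-in for the filter.
struct Search<'a> {
    entity_kind: EntityKind,
    query: &'a str,
    verified_only: bool,
}

/// Toy executor: ranks creators by how often the query term occurs in the bio.
/// This is a term-frequency placeholder, NOT BM25.
fn search_creators(creators: &[Creator], search: &Search) -> Vec<u64> {
    if search.entity_kind != EntityKind::Creator {
        return Vec::new();
    }
    let needle = search.query.to_lowercase();
    let mut hits: Vec<(u64, usize)> = creators
        .iter()
        // Apply the verified filter before scoring.
        .filter(|c| !search.verified_only || c.verified)
        .filter_map(|c| {
            let bio = c.bio.to_lowercase();
            let tf = bio.split_whitespace().filter(|w| *w == needle.as_str()).count();
            (tf > 0).then_some((c.id, tf))
        })
        .collect();
    // Highest term count first; ties broken by ascending id.
    hits.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    hits.into_iter().map(|(id, _)| id).collect()
}

/// Small fixture used to exercise the sketch.
fn sample_creators() -> Vec<Creator> {
    vec![
        Creator { id: 1, bio: "jazz pianist and jazz educator".into(), verified: true },
        Creator { id: 2, bio: "jazz drummer".into(), verified: false },
        Creator { id: 3, bio: "rock guitarist".into(), verified: true },
    ]
}
```

Against this fixture, a "jazz" query returns creators 1 and 2 in that order, and the same query with `verified_only` set returns only creator 1. The real m5p4_creator_search suite is expected to check the same properties against the actual executor.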