tidaldb/docs/planning/milestone-5/phase-4/OVERVIEW.md
jordan 192c473f55 feat: complete Milestone 5 — full-text search, RRF fusion, and creator search
- M5p1: BM25 text indexing via Tantivy with background syncer (0.26ms @ 10K docs)
- M5p2: RRF fusion layer combining BM25 + ANN scores (46µs @ 1K candidates)
- M5p3: unified Search query API (8-stage pipeline, BM25 + vector + ranking)
- M5p4: creator text + vector indexing and creator search executor (< 20ms @ 200 creators)
- Refactor db/mod.rs into focused sub-modules (creators, items, sessions, signals, etc.)
- Decompose monolithic files into directory modules (query/executor, ranking/diversity, etc.)
- Split brute.rs → brute/mod.rs + brute/tests.rs; extract search executor helpers
- Add benches: fusion, search, session, text_index
- Add M5 UAT test suites (m5_uat, m5_search, m5p4_creator_search, text_index)
- Update blog posts, roadmap, content strategy, and M5 planning docs
- Add tmp/ and .claude/worktrees/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-21 23:53:16 -07:00

# Milestone 5, Phase 4: Creator and People Search
## Goal
Prove that the same SEARCH pipeline that indexes items can also index creator entities. After this phase, a developer can call `db.search(&Search { entity_kind: Creator, query: "jazz" })` and receive BM25-ranked creators with optional vector fusion, filters, and sort modes.
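The call shape above can be illustrated with a small in-memory sketch. Everything in it is a hypothetical stand-in: `ToyDb`, the two-field `Search` struct, and the whitespace tokenizer are assumptions made for illustration and are not tidalDB's real types or its Tantivy-backed index. It only shows how one search path, keyed by entity kind, could rank creators with BM25 (using the common non-negative IDF variant).

```rust
use std::collections::HashMap;

// Hypothetical stand-ins that mirror the call in the Goal section.
// These are NOT tidalDB's real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum EntityKind {
    Item,
    Creator,
}

pub struct Search<'a> {
    pub entity_kind: EntityKind,
    pub query: &'a str,
}

/// Toy in-memory store: one list of (id, lowercased text) per entity kind.
#[derive(Default)]
pub struct ToyDb {
    docs: HashMap<EntityKind, Vec<(u64, String)>>,
}

impl ToyDb {
    pub fn insert(&mut self, kind: EntityKind, id: u64, text: &str) {
        self.docs
            .entry(kind)
            .or_default()
            .push((id, text.to_lowercase()));
    }

    /// Returns (id, score) pairs for the requested entity kind,
    /// sorted by descending BM25 score. Tokenization is a plain
    /// whitespace split, which is an assumption of this sketch.
    pub fn search(&self, s: &Search<'_>) -> Vec<(u64, f64)> {
        const K1: f64 = 1.2;
        const B: f64 = 0.75;

        let docs = match self.docs.get(&s.entity_kind) {
            Some(d) => d,
            None => return Vec::new(),
        };
        let tokenized: Vec<(u64, Vec<&str>)> = docs
            .iter()
            .map(|(id, text)| (*id, text.split_whitespace().collect()))
            .collect();
        let n = tokenized.len() as f64;
        let avg_len = tokenized.iter().map(|(_, t)| t.len() as f64).sum::<f64>() / n;

        let query = s.query.to_lowercase();
        let mut hits: Vec<(u64, f64)> = Vec::new();
        for (id, tokens) in &tokenized {
            let mut score = 0.0;
            for term in query.split_whitespace() {
                // Term frequency in this document.
                let tf = tokens.iter().filter(|t| **t == term).count() as f64;
                if tf == 0.0 {
                    continue;
                }
                // Number of documents of this kind that contain the term.
                let df = tokenized.iter().filter(|(_, t)| t.contains(&term)).count() as f64;
                let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                let norm = 1.0 - B + B * tokens.len() as f64 / avg_len;
                score += idf * tf * (K1 + 1.0) / (tf + K1 * norm);
            }
            if score > 0.0 {
                hits.push((*id, score));
            }
        }
        hits.sort_by(|a, b| b.1.total_cmp(&a.1));
        hits
    }
}
```

Because documents are partitioned by `EntityKind`, a query with `entity_kind: EntityKind::Creator` never sees item documents, even when an item's text contains the same term.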
## Motivation
m5p1 through m5p3 built a complete SEARCH pipeline for items. Creators are a first-class entity in tidalDB with their own storage engine, metadata, and embeddings. Extending the pipeline to creators validates the multi-entity-kind design and unlocks the people-search use case.
## Tasks
| Task | Title | Status |
|------|-------|--------|
| task-01 | Creator Text Indexing | pending |
| task-02 | Creator Vector Index | pending |
| task-03 | Creator Search Executor | pending |
## Execution Order
```
task-01 (Creator Text Indexing)
|
v
task-02 (Creator Vector Index)
|
v
task-03 (Creator Search Executor)
```
All tasks are sequential.
## Verification
```bash
cargo check --manifest-path tidal/Cargo.toml
cargo clippy --manifest-path tidal/Cargo.toml -- -D warnings
cargo test --manifest-path tidal/Cargo.toml --lib
cargo test --manifest-path tidal/Cargo.toml --test m5p4_creator_search
cargo bench --manifest-path tidal/Cargo.toml --bench search -- bench_search_creator_text_200
```
**Key assertions:**
- `db.search(&Search { entity_kind: Creator, query: "jazz" })` returns BM25-ranked creators
- `filter(verified = true)` excludes non-verified creators
- A `similar_to` lookup triggers ANN search on the `(EntityKind::Creator, "content")` slot
- `bench_search_creator_text_200` runs in under 20 ms (200 creators)
- All existing m5p3 item search tests still pass
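The fusion and filter assertions above can be pictured with a short sketch of reciprocal rank fusion (RRF) followed by a verified-only filter. The function names `rrf_fuse` and `keep_verified`, the use of bare `u64` creator IDs, and applying the filter after fusion are all assumptions made for illustration; the real executor may filter earlier in its pipeline and exposes its own API.

```rust
use std::collections::{HashMap, HashSet};

/// Reciprocal rank fusion over ranked ID lists (best hit first).
/// Each list contributes 1 / (k + rank) per ID, with rank starting at 1.
/// Hypothetical helper, not tidalDB's fusion layer.
pub fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Descending score; ties broken by ascending ID for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}

/// Keeps only hits whose ID is in the verified set, preserving order.
/// Modeling `filter(verified = true)` as a post-filter is an assumption.
pub fn keep_verified(hits: Vec<(u64, f64)>, verified: &HashSet<u64>) -> Vec<(u64, f64)> {
    hits.into_iter()
        .filter(|(id, _)| verified.contains(id))
        .collect()
}
```

With a BM25 ranking of `[1, 2, 3]`, an ANN ranking of `[2, 3, 4]`, and `k = 60.0`, the fused order is 2, 3, 1, 4: creators that appear in both lists outrank those found by only one. Keeping only verified creators `{1, 3}` then yields 3 followed by 1.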