# Milestone 5, Phase 4: Creator and People Search

## Goal

Prove that the SEARCH pipeline that indexes items can also index creator entities. After this phase, a developer can call `db.search(&Search { entity_kind: Creator, query: "jazz" })` and receive BM25-ranked creators, with optional vector fusion, filters, and sort modes.
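
The call shape in the Goal can be sketched with stand-in types. Everything below is an assumption made for illustration: `ToyDb`, `Hit`, the two-field `Search` struct, and the raw term-count scoring (which stands in for BM25) are hypothetical and are not tidalDB's actual API or ranking. The point is only that one `search` entry point serves every entity kind, and that `entity_kind` selects which documents are considered.

```rust
// Hypothetical stand-ins for illustration; tidalDB's real types may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EntityKind {
    Item,
    Creator,
}

/// Query shape mirroring the call in the Goal; only two fields are sketched.
pub struct Search<'a> {
    pub entity_kind: EntityKind,
    pub query: &'a str,
}

#[derive(Debug, PartialEq)]
pub struct Hit {
    pub id: u64,
    pub score: f32,
}

struct Doc {
    id: u64,
    kind: EntityKind,
    text: String,
}

/// Toy in-memory database. Scoring is a raw term count, NOT BM25.
#[derive(Default)]
pub struct ToyDb {
    docs: Vec<Doc>,
}

impl ToyDb {
    pub fn insert(&mut self, id: u64, kind: EntityKind, text: &str) {
        self.docs.push(Doc { id, kind, text: text.to_lowercase() });
    }

    /// Single entry point for all entity kinds: the kind only narrows the scan.
    pub fn search(&self, s: &Search) -> Vec<Hit> {
        let term = s.query.to_lowercase();
        let mut hits: Vec<Hit> = self
            .docs
            .iter()
            .filter(|d| d.kind == s.entity_kind)
            .filter_map(|d| {
                let n = d.text.split_whitespace().filter(|w| *w == term.as_str()).count();
                (n > 0).then(|| Hit { id: d.id, score: n as f32 })
            })
            .collect();
        // Highest score first; ties broken by id for a stable order.
        hits.sort_by(|a, b| b.score.total_cmp(&a.score).then(a.id.cmp(&b.id)));
        hits
    }
}

/// Builds a small mixed corpus and runs the creator query from the Goal.
pub fn demo_creator_search() -> Vec<Hit> {
    let mut db = ToyDb::default();
    db.insert(1, EntityKind::Creator, "Jazz pianist and jazz educator");
    db.insert(2, EntityKind::Creator, "Modern jazz trio");
    db.insert(3, EntityKind::Creator, "Techno producer");
    db.insert(4, EntityKind::Item, "Jazz standards playlist");
    db.search(&Search { entity_kind: EntityKind::Creator, query: "jazz" })
}
```

In this sketch, `demo_creator_search()` returns creators 1 and 2 in that order. The item with id 4 also mentions "jazz" but is skipped because its kind does not match the query's `entity_kind`.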

## Motivation

Phases m5p1 through m5p3 built a complete SEARCH pipeline for items. Creators are a first-class entity in tidalDB, with their own storage engine, metadata, and embeddings. Extending the pipeline to creators validates the multi-entity-kind design and unlocks the people-search use case.

## Tasks

| Task | Title | Status |
|------|-------|--------|
| task-01 | Creator Text Indexing | pending |
| task-02 | Creator Vector Index | pending |
| task-03 | Creator Search Executor | pending |

## Execution Order

```
task-01 (Creator Text Indexing)
    |
    v
task-02 (Creator Vector Index)
    |
    v
task-03 (Creator Search Executor)
```

All tasks are sequential.

## Verification

```bash
cargo check --manifest-path tidal/Cargo.toml
cargo clippy --manifest-path tidal/Cargo.toml -- -D warnings
cargo test --manifest-path tidal/Cargo.toml --lib
cargo test --manifest-path tidal/Cargo.toml --test m5p4_creator_search
cargo bench --manifest-path tidal/Cargo.toml --bench search -- bench_search_creator_text_200
```

**Key assertions:**

- `db.search(&Search { entity_kind: Creator, query: "jazz" })` returns BM25-ranked creators
- `filter(verified = true)` excludes non-verified creators
- A `similar_to` lookup triggers an ANN search on the `(EntityKind::Creator, "content")` slot
- `bench_search_creator_text_200` completes in under 20 ms
- All existing m5p3 item search tests still pass
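
The filter and slot assertions above can be illustrated with a small sketch. All names here are hypothetical (`Slots`, `verified_only`, `nearest_in_slot`), and a brute-force dot-product scan stands in for the ANN index. The sketch assumes only what the assertions state: vector indexes are addressed by an `(EntityKind, slot name)` pair, and a `verified = true` filter drops non-verified creators.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; names are chosen for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum EntityKind {
    Item,
    Creator,
}

pub struct Creator {
    pub id: u64,
    pub verified: bool,
    pub embedding: Vec<f32>,
}

/// Vector storage addressed by an (entity kind, slot name) pair.
pub type Slots = HashMap<(EntityKind, String), Vec<(u64, Vec<f32>)>>;

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Sketch of `filter(verified = true)`: keep only verified creator ids.
pub fn verified_only(creators: &[Creator]) -> Vec<u64> {
    creators.iter().filter(|c| c.verified).map(|c| c.id).collect()
}

/// Brute-force top-k by dot product within one slot; a stand-in for ANN.
/// An unknown (kind, slot) pair yields an empty result.
pub fn nearest_in_slot(
    slots: &Slots,
    kind: EntityKind,
    slot: &str,
    query: &[f32],
    k: usize,
) -> Vec<u64> {
    let mut scored: Vec<(u64, f32)> = slots
        .get(&(kind, slot.to_string()))
        .map(|vectors| vectors.iter().map(|(id, v)| (*id, dot(v, query))).collect())
        .unwrap_or_default();
    // Most similar first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

/// Returns (verified creator ids, top-2 creators nearest to [1.0, 0.0]).
pub fn demo_assertions() -> (Vec<u64>, Vec<u64>) {
    let creators = vec![
        Creator { id: 1, verified: true, embedding: vec![1.0, 0.0] },
        Creator { id: 2, verified: false, embedding: vec![0.9, 0.1] },
        Creator { id: 3, verified: true, embedding: vec![0.0, 1.0] },
    ];
    let mut slots: Slots = HashMap::new();
    slots.insert(
        (EntityKind::Creator, "content".to_string()),
        creators.iter().map(|c| (c.id, c.embedding.clone())).collect(),
    );
    // An item vector in its own slot; a creator lookup must never see it.
    slots.insert((EntityKind::Item, "content".to_string()), vec![(10, vec![1.0, 0.0])]);

    let verified = verified_only(&creators);
    let similar = nearest_in_slot(&slots, EntityKind::Creator, "content", &[1.0, 0.0], 2);
    (verified, similar)
}
```

Keying the map on the pair rather than on the slot name alone keeps the item and creator `"content"` vectors apart, so a creator lookup cannot return item ids even when the slot names coincide.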