Milestone 2, Phase 4: Diversity Enforcement
Phase Deliverable
A post-scoring diversity pass that selects results from a scored candidate list to satisfy diversity constraints (`max_per_creator`, `format_mix`) without reducing result count. It is implemented as a single O(n) greedy selection pass over the sorted candidate list. When constraints cannot be fully satisfied, the selector relaxes them in a defined order and returns results with a warning flag rather than an error.
This is the phase that turns ranked results from "the top N by score" into "the top N by score that a user would actually want to scroll through." Without diversity, a trending creator dominates the feed. With diversity, the database enforces variety -- no application logic required.
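The greedy pass described above can be sketched as follows. This is a minimal illustration, not the Task 01 implementation: the `Candidate` and `Selection` types and the `select_diverse` function are hypothetical names, only the `max_per_creator` cap is modeled, and the single fallback step stands in for the full three-stage relaxation.

```rust
use std::collections::HashMap;

/// Hypothetical candidate shape; only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    entity_id: u64,
    creator_id: u64,
    score: f64,
}

/// Selected items plus a flag that is set when the cap had to be relaxed.
#[derive(Debug)]
struct Selection {
    items: Vec<Candidate>,
    relaxed: bool,
}

/// One pass over `sorted`, which must already be ordered by score descending.
fn select_diverse(sorted: &[Candidate], limit: usize, max_per_creator: usize) -> Selection {
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut items = Vec::with_capacity(limit);
    let mut skipped = Vec::new();

    for c in sorted {
        if items.len() == limit {
            break;
        }
        let count = per_creator.entry(c.creator_id).or_insert(0);
        if *count < max_per_creator {
            // Candidate fits under the creator cap: take it.
            *count += 1;
            items.push(c.clone());
        } else {
            // Over the cap: remember it in case the page cannot be filled.
            skipped.push(c.clone());
        }
    }

    // If the page is still short, fill it from the skipped candidates
    // (still in score order) and report that the cap was relaxed.
    let relaxed = items.len() < limit && !skipped.is_empty();
    for c in skipped {
        if items.len() == limit {
            break;
        }
        items.push(c);
    }
    Selection { items, relaxed }
}
```

With candidates from creator 1 scored 0.9, 0.8, 0.7 and one from creator 2 scored 0.6, a limit of 3 and a cap of 2 skips the third creator-1 item in favor of the creator-2 item. Raising the limit to 4 forces the skipped item back in and sets `relaxed`, so the result count is preserved.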
Acceptance Criteria
- `max_per_creator:N` enforced: no more than N items from any single creator in the result set
- `format_mix:true` enforced: no more than 60% of results from any single format
- Diversity pass does not reduce result count -- it selects the next-best candidate that satisfies constraints
- Diversity pass adds < 1ms for 200 candidates (benchmarked)
- When diversity constraints cannot be fully satisfied (too few creators), results are returned with a warning flag, not an error
- Property test: diversity constraints hold for 10,000 random candidate sets
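The first two criteria reduce to a check that a property test could assert on every generated result set. The sketch below is one way to write that check using only the standard library. `ResultItem`, `max_count`, and `constraints_hold` are hypothetical names, and the 60% cap is expressed in integer arithmetic to avoid float rounding.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical result row carrying the two metadata fields the constraints use.
struct ResultItem {
    creator_id: u64,
    format: &'static str,
}

/// Largest number of occurrences of any single key.
fn max_count<K, I>(keys: I) -> usize
where
    K: Hash + Eq,
    I: Iterator<Item = K>,
{
    let mut counts: HashMap<K, usize> = HashMap::new();
    for k in keys {
        *counts.entry(k).or_insert(0) += 1;
    }
    counts.values().copied().max().unwrap_or(0)
}

/// True when no creator exceeds `max_per_creator` and no format
/// accounts for more than 60% of the results.
fn constraints_hold(results: &[ResultItem], max_per_creator: usize) -> bool {
    let creator_ok = max_count(results.iter().map(|r| r.creator_id)) <= max_per_creator;
    // count / len <= 0.6 is equivalent to count * 5 <= len * 3.
    let format_ok = max_count(results.iter().map(|r| r.format)) * 5 <= results.len() * 3;
    creator_ok && format_ok
}
```

For a 10-item page the format cap works out to 6 items. Note that very small pages cannot meet a strict 60% cap at all (a single result is 100% one format), which is one case where the relaxation path would be expected to apply.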
Dependencies
- Requires: m2p3 (profile executor produces `Vec<ScoredCandidate>` sorted by score descending, with `entity_id`, `score`, and `signal_snapshot`; `DiversitySpec` type defined on `RankingProfile` with `max_per_creator`, `format_mix`, `topic_diversity`, `category_min`)
- Blocks: m2p5 (RETRIEVE executor calls diversity enforcement as the penultimate step before result return)
Research References
- thoughts.md -- Part V.14 (MMR post-scoring diversity enforcement)
- docs/research/ann_for_tidaldb.md -- Filtered search and post-retrieval reranking patterns
Spec References
- docs/specs/09-ranking-scoring.md -- Section 9 (Diversity Enforcement):
  - Section 9.1 (DiversitySpec structure: max_per_creator, format_mix, topic_diversity, category_min)
  - Section 9.2 (Greedy MMR reranking algorithm pseudocode)
  - Section 9.3 (Constraint details: per-page enforcement, format bonus, category minimum)
  - Section 9.4 (Diversity and pagination: per-page, not global)
  - Section 9.5 (Diversity as reordering, not filtering; relaxation under pressure)
- docs/specs/09-ranking-scoring.md -- Section 4 (Scoring pipeline: diversity is Stage 8)
- docs/specs/09-ranking-scoring.md -- Section 16 (Invariants INV-RANK-5: diversity never reduces result count, INV-RANK-6: diversity preserves relative score order within same-constraint group)
Task Index
| # | Task | Delivers | Depends On | Complexity |
|---|---|---|---|---|
| 01 | Diversity Types + Greedy Selector | DiversityConstraints, DiversityResult, ConstraintViolation, DiversitySelector, greedy selection algorithm with three-stage relaxation | None | M |
| 02 | Property Tests + Benchmarks | proptest property tests (10,000 random candidate sets), Criterion benchmarks (200-candidate < 1ms) | Task 01 | S |
Task Dependency DAG
Task 01: Diversity Types + Greedy Selector
|
v
Task 02: Property Tests + Benchmarks
Task 01 delivers all types and the selection algorithm. Task 02 validates correctness via property tests and performance via benchmarks. Strictly sequential -- Task 02 tests the implementation from Task 01.
File Layout
tidal/src/
  ranking/
    diversity.rs   -- DiversityConstraints, DiversityResult, DiversitySelector,
                      ConstraintViolation (Task 01)
    mod.rs         -- add `pub mod diversity;` and re-exports (Task 01)
tidal/benches/
  ranking.rs       -- add diversity benchmarks (Task 02) to the existing ranking bench file
Open Questions
- Creator ID and format in `ScoredCandidate`: The diversity selector needs each candidate's `creator_id` and `format` to apply constraints. These are entity metadata fields. `ScoredCandidate` from m2p3 has `entity_id`, `score`, and `signal_snapshot` but not a general metadata map. Options:
  - (A) Add `creator_id: Option<EntityId>` and `format: Option<String>` fields to `ScoredCandidate` -- cleanest, no extra lookup
  - (B) `DiversitySelector` takes `&EntityStore` and loads metadata per candidate -- more flexible, extra lookup cost (~50ns per candidate)
  - Decision for M2: Option A. The executor adds `creator_id` and `format` to `ScoredCandidate` at scoring time (they are already loaded from entity metadata during scoring). This keeps diversity O(n) without extra I/O. The `ScoredCandidate` struct gains two optional fields. This change is made as part of Task 01 in this phase.
- `min_exploration` constraint: The exploration budget (10% of results from unfollowed creators) is an M3 feature (Spec 09 Section 10). `DiversityConstraints` includes a `min_exploration: Option<f64>` field for forward compatibility, but the M2 selector ignores it if set. A `TODO` comment (not a `todo!()` macro call, which would panic at runtime) is added in the selector with "M3: implement exploration budget after relationship graph is available."
- Relaxation order: The three-stage relaxation (double `max_per_creator`, ignore `format_mix`, accept anything) is the default for M2. The caller (m2p5 RETRIEVE executor) can configure a stricter relaxation policy in future milestones. For M2, hardcode the three-stage order.
- `DiversitySpec` vs `DiversityConstraints`: `DiversitySpec` is already defined on `RankingProfile` (m2p3 Task 01) with fields `max_per_creator`, `format_mix`, `topic_diversity`, `category_min`. The `DiversityConstraints` struct in this phase is the runtime representation used by the selector, derived from `DiversitySpec` plus query-level overrides (the `DIVERSITY` clause). For M2, `DiversityConstraints` is constructed from `DiversitySpec` with a `From` impl. Query-level overrides are wired in m2p5.
- `topic_diversity` and `category_min`: These are fields on `DiversitySpec` from the spec (Section 9.1). For M2, only `max_per_creator` and `format_mix` are implemented. `topic_diversity` requires embedding distance computation (O(n*k) where k = selected count) which changes the algorithm from greedy to MMR. `category_min` requires category metadata on each candidate. Both are deferred to M6. The `DiversityConstraints` struct includes these fields as `Option` types but the selector skips them with a `tracing::debug!` message when set.
- Diversity and pagination (Spec 09 Section 9.4): Diversity constraints apply per page, not globally across all pages. The selector operates on a single page's worth of candidates. The RETRIEVE executor (m2p5) handles pagination by passing the correct candidate slice to the selector. No pagination logic is needed in the diversity module itself.
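
The `From` conversion and the hardcoded relaxation order discussed in the questions above might look roughly like this. Everything here is an assumption made for illustration: the field types on `DiversitySpec` are guesses (the source names the fields but not their types), `RelaxationStage` and `relaxed` are hypothetical names, and the stages are assumed to be cumulative, each applied to the output of the previous one.

```rust
/// Assumed shape of the profile-level spec; field types are guesses.
#[derive(Debug, Clone, Default)]
struct DiversitySpec {
    max_per_creator: Option<usize>,
    format_mix: bool,
    topic_diversity: Option<f64>,
    category_min: Option<usize>,
}

/// Runtime constraints consumed by the selector.
#[derive(Debug, Clone, PartialEq)]
struct DiversityConstraints {
    max_per_creator: Option<usize>,
    format_mix: bool,
    topic_diversity: Option<f64>, // carried through, skipped by the M2 selector
    category_min: Option<usize>,  // carried through, skipped by the M2 selector
    min_exploration: Option<f64>, // forward compatibility, ignored in M2
}

impl From<DiversitySpec> for DiversityConstraints {
    fn from(spec: DiversitySpec) -> Self {
        Self {
            max_per_creator: spec.max_per_creator,
            format_mix: spec.format_mix,
            topic_diversity: spec.topic_diversity,
            category_min: spec.category_min,
            // The spec has no exploration field; query-level overrides come later.
            min_exploration: None,
        }
    }
}

/// The three relaxation stages, in the order they are tried.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RelaxationStage {
    DoubleCreatorCap,
    IgnoreFormatMix,
    AcceptAnything,
}

impl DiversityConstraints {
    /// Returns a copy of the constraints loosened by one stage.
    fn relaxed(&self, stage: RelaxationStage) -> Self {
        let mut next = self.clone();
        match stage {
            RelaxationStage::DoubleCreatorCap => {
                next.max_per_creator = self.max_per_creator.map(|n| n.saturating_mul(2));
            }
            RelaxationStage::IgnoreFormatMix => next.format_mix = false,
            RelaxationStage::AcceptAnything => {
                next.max_per_creator = None;
                next.format_mix = false;
            }
        }
        next
    }
}
```

Starting from `max_per_creator: Some(2)` with `format_mix` enabled, the stages yield a cap of 4, then a cap of 4 with the format check off, then no constraints at all. Keeping `relaxed` a pure function that returns a new value leaves the original constraints available for reporting which stage produced the final page.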