# Milestone 2, Phase 4: Diversity Enforcement

## Phase Deliverable

A post-scoring diversity pass that selects results from a scored candidate list to satisfy diversity constraints (`max_per_creator`, `format_mix`), without reducing result count. Implemented as a single greedy O(n) selection pass over the sorted candidate list. When constraints cannot be fully satisfied, the selector relaxes constraints in a defined order and returns results with a warning flag rather than an error.

This is the phase that turns ranked results from "the top N by score" into "the top N by score that a user would actually want to scroll through." Without diversity, a trending creator dominates the feed. With diversity, the database enforces variety -- no application logic required.

## Acceptance Criteria

- [ ] `max_per_creator:N` enforced: no more than N items from any single creator in the result set
- [ ] `format_mix:true` enforced: no more than 60% of results from any single format
- [ ] Diversity pass does not reduce result count -- it selects the next-best candidate that satisfies constraints
- [ ] Diversity pass adds < 1ms for 200 candidates (benchmarked)
- [ ] When diversity constraints cannot be fully satisfied (too few creators), results are returned with a warning flag, not an error
- [ ] Property test: diversity constraints hold for 10,000 random candidate sets

## Dependencies

- **Requires:** m2p3 (profile executor produces `Vec<ScoredCandidate>` sorted by score descending, with `entity_id`, `score`, and `signal_snapshot`; `DiversitySpec` type defined on `RankingProfile` with `max_per_creator`, `format_mix`, `topic_diversity`, `category_min`)
- **Blocks:** m2p5 (RETRIEVE executor calls diversity enforcement as the penultimate step before result return)

## Research References

- [thoughts.md](../../../../thoughts.md) -- Part V.14 (MMR post-scoring diversity enforcement)
- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- Filtered search and post-retrieval reranking patterns

## Spec References

- [docs/specs/09-ranking-scoring.md](../../../specs/09-ranking-scoring.md) -- Section 9 (Diversity Enforcement):
  - Section 9.1 (DiversitySpec structure: max_per_creator, format_mix, topic_diversity, category_min)
  - Section 9.2 (Greedy MMR reranking algorithm pseudocode)
  - Section 9.3 (Constraint details: per-page enforcement, format bonus, category minimum)
  - Section 9.4 (Diversity and pagination: per-page, not global)
  - Section 9.5 (Diversity as reordering, not filtering; relaxation under pressure)
- [docs/specs/09-ranking-scoring.md](../../../specs/09-ranking-scoring.md) -- Section 4 (Scoring pipeline: diversity is Stage 8)
- [docs/specs/09-ranking-scoring.md](../../../specs/09-ranking-scoring.md) -- Section 16 (Invariants INV-RANK-5: diversity never reduces result count, INV-RANK-6: diversity preserves relative score order within same-constraint group)

## Task Index

| # | Task | Delivers | Depends On | Complexity |
|---|------|----------|------------|------------|
| 01 | Diversity Types + Greedy Selector | `DiversityConstraints`, `DiversityResult`, `ConstraintViolation`, `DiversitySelector`, greedy selection algorithm with three-stage relaxation | None | M |
| 02 | Property Tests + Benchmarks | proptest property tests (10,000 random candidate sets), Criterion benchmarks (200-candidate < 1ms) | Task 01 | S |

## Task Dependency DAG

```
Task 01: Diversity Types + Greedy Selector
    |
    v
Task 02: Property Tests + Benchmarks
```

Task 01 delivers all types and the selection algorithm. Task 02 validates correctness via property tests and performance via benchmarks. Strictly sequential -- Task 02 tests the implementation from Task 01.

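
To make the Task 01 deliverable concrete, here is a minimal sketch of a greedy selector with the three-stage relaxation (double `max_per_creator`, ignore `format_mix`, accept anything). It is an illustration under assumptions, not the spec's pseudocode: the field types (`u64` IDs, `String` formats), the free function `select`, the `relaxed` flag name, the choice to compute the 60% cap against the page `limit`, and the final re-sort into score order are all hypothetical. `signal_snapshot` and `ConstraintViolation` are omitted for brevity.

```rust
use std::collections::HashMap;

// Hypothetical shapes: field types are assumptions for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredCandidate {
    pub entity_id: u64,
    pub score: f64,
    pub creator_id: Option<u64>,
    pub format: Option<String>,
}

#[derive(Debug, Clone, Copy)]
pub struct DiversityConstraints {
    pub max_per_creator: Option<usize>,
    pub format_mix: bool,
}

#[derive(Debug)]
pub struct DiversityResult {
    pub selected: Vec<ScoredCandidate>,
    /// Warning flag: true when any result was admitted by a relaxed stage.
    pub relaxed: bool,
}

/// `candidates` must already be sorted by score descending.
pub fn select(
    candidates: &[ScoredCandidate],
    limit: usize,
    constraints: DiversityConstraints,
) -> DiversityResult {
    // Assumption: "60% of results" is measured against the page limit.
    let format_cap = (limit * 3 / 5).max(1);
    let doubled = constraints.max_per_creator.map(|n| n * 2);
    // Stage 0 is strict; stages 1..=3 are the three relaxations.
    let stages = [
        (constraints.max_per_creator, constraints.format_mix),
        (doubled, constraints.format_mix),
        (doubled, false),
        (None, false),
    ];

    let mut taken = vec![false; candidates.len()];
    let mut picked: Vec<usize> = Vec::new();
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut per_format: HashMap<&str, usize> = HashMap::new();
    let mut relaxed = false;

    for (stage, &(max_creator, mix)) in stages.iter().enumerate() {
        for (i, cand) in candidates.iter().enumerate() {
            if picked.len() >= limit {
                break;
            }
            if taken[i] {
                continue;
            }
            // Candidates with missing metadata are never blocked by that constraint.
            let creator_ok = match (max_creator, cand.creator_id) {
                (Some(max), Some(id)) => per_creator.get(&id).copied().unwrap_or(0) < max,
                _ => true,
            };
            let format_ok = match (mix, cand.format.as_deref()) {
                (true, Some(f)) => per_format.get(f).copied().unwrap_or(0) < format_cap,
                _ => true,
            };
            if !(creator_ok && format_ok) {
                continue;
            }
            taken[i] = true;
            picked.push(i);
            relaxed |= stage > 0;
            if let Some(id) = cand.creator_id {
                *per_creator.entry(id).or_insert(0) += 1;
            }
            if let Some(f) = cand.format.as_deref() {
                *per_format.entry(f).or_insert(0) += 1;
            }
        }
    }

    // Assumption: return the page in score order, i.e. original index order.
    picked.sort_unstable();
    let selected = picked.into_iter().map(|i| candidates[i].clone()).collect();
    DiversityResult { selected, relaxed }
}
```

In this sketch each stage is one linear scan, so the worst case is four scans over the candidate list, which is still O(n). The result is shorter than `limit` only when the candidate list itself is, because the last stage accepts anything.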
## File Layout

```
tidal/src/
  ranking/
    diversity.rs    -- DiversityConstraints, DiversityResult, DiversitySelector, ConstraintViolation (Task 01)
    mod.rs          -- add `pub mod diversity;` and re-exports (Task 01)
tidal/benches/
  ranking.rs        -- add diversity benchmarks (Task 02) to the existing ranking bench file
```

## Open Questions

1. **Creator ID and format in ScoredCandidate**: The diversity selector needs each candidate's `creator_id` and `format` to apply constraints. These are entity metadata fields. `ScoredCandidate` from m2p3 has `entity_id`, `score`, and `signal_snapshot` but not a general metadata map. Options:
   - (A) Add `creator_id: Option` and `format: Option` fields to `ScoredCandidate` -- cleanest, no extra lookup
   - (B) `DiversitySelector` takes `&EntityStore` and loads metadata per candidate -- more flexible, extra lookup cost (~50ns per candidate)
   - **Decision for M2:** Option A. The executor adds `creator_id` and `format` to `ScoredCandidate` at scoring time (they are already loaded from entity metadata during scoring). This keeps diversity O(n) without extra I/O. The `ScoredCandidate` struct gains two optional fields. This change is made as part of Task 01 in this phase.

2. **`min_exploration` constraint**: The exploration budget (10% of results from unfollowed creators) is an M3 feature (Spec 09 Section 10). `DiversityConstraints` includes a `min_exploration: Option` field for forward compatibility, but the M2 selector ignores it if set. A `// TODO` comment is added in the selector: "M3: implement exploration budget after relationship graph is available."

3. **Relaxation order**: The three-stage relaxation (double max_per_creator, ignore format_mix, accept anything) is the default for M2. The caller (m2p5 RETRIEVE executor) can configure a stricter relaxation policy in future milestones. For M2, hardcode the three-stage order.

4. **`DiversitySpec` vs `DiversityConstraints`**: `DiversitySpec` is already defined on `RankingProfile` (m2p3 Task 01) with fields `max_per_creator`, `format_mix`, `topic_diversity`, `category_min`. The `DiversityConstraints` struct in this phase is the runtime representation used by the selector, derived from `DiversitySpec` plus query-level overrides (the `DIVERSITY` clause). For M2, `DiversityConstraints` is constructed from `DiversitySpec` with a `From` impl. Query-level overrides are wired in m2p5.

5. **`topic_diversity` and `category_min`**: These are fields on `DiversitySpec` from the spec (Section 9.1). For M2, only `max_per_creator` and `format_mix` are implemented. `topic_diversity` requires embedding distance computation (O(n*k) where k = selected count), which changes the algorithm from greedy to MMR. `category_min` requires category metadata on each candidate. Both are deferred to M6. The `DiversityConstraints` struct includes these fields as `Option` types, but the selector skips them with a `tracing::debug!` message when set.

6. **Diversity and pagination (Spec 09 Section 9.4)**: Diversity constraints apply per page, not globally across all pages. The selector operates on a single page's worth of candidates. The RETRIEVE executor (m2p5) handles pagination by passing the correct candidate slice to the selector. No pagination logic is needed in the diversity module itself.
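
Items 2, 4, and 5 above can be sketched as a small conversion layer. This is a hedged illustration, not the m2p3 definitions: the inner types of every `Option` field (`usize`, `f32`), the choice of `From<&DiversitySpec>` rather than an owned conversion, and the `deferred` helper are assumptions. The helper returns the names of set-but-unenforced fields so a selector could log them (for example via `tracing::debug!`) without this sketch depending on the `tracing` crate.

```rust
// Hypothetical field types; the real DiversitySpec lives on RankingProfile (m2p3).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct DiversitySpec {
    pub max_per_creator: Option<usize>,
    pub format_mix: bool,
    pub topic_diversity: Option<f32>,
    pub category_min: Option<usize>,
}

/// Runtime representation used by the selector.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct DiversityConstraints {
    pub max_per_creator: Option<usize>,
    pub format_mix: bool,
    pub topic_diversity: Option<f32>, // carried but not enforced in M2
    pub category_min: Option<usize>,  // carried but not enforced in M2
    pub min_exploration: Option<f32>, // forward compatibility, ignored in M2
}

impl From<&DiversitySpec> for DiversityConstraints {
    fn from(spec: &DiversitySpec) -> Self {
        DiversityConstraints {
            max_per_creator: spec.max_per_creator,
            format_mix: spec.format_mix,
            topic_diversity: spec.topic_diversity,
            category_min: spec.category_min,
            // Not part of DiversitySpec; query-level overrides are wired in m2p5.
            min_exploration: None,
        }
    }
}

impl DiversityConstraints {
    /// Names of constraints that are set but that the M2 selector skips.
    pub fn deferred(&self) -> Vec<&'static str> {
        let mut names = Vec::new();
        if self.topic_diversity.is_some() {
            names.push("topic_diversity");
        }
        if self.category_min.is_some() {
            names.push("category_min");
        }
        if self.min_exploration.is_some() {
            names.push("min_exploration");
        }
        names
    }
}
```

Keeping the deferred fields on the runtime struct means later milestones can start enforcing them without changing the conversion or the caller in m2p5.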