---
title: "Diversity enforcement in 3 microseconds"
date: "2026-02-21"
author: "Jordan Washburn"
description: "\"No more than 2 items per creator\" does not belong in your API layer. It belongs in the query. tidalDB enforces diversity as a post-scoring reordering pass that never reduces result count. The greedy selection algorithm runs in under 3 microseconds for 200 candidates."
tags: ["diversity", "ranking", "performance", "query"]
---
Every team that builds a feed eventually writes this code:
```python
# ranking_service.py, line 247
# TODO: this is a hack, should be in the query layer
seen_creators = {}
filtered = []
for item in ranked_results:
    count = seen_creators.get(item.creator_id, 0)
    if count >= MAX_PER_CREATOR:
        continue
    seen_creators[item.creator_id] = count + 1
    filtered.append(item)
    if len(filtered) >= limit:
        break
```
It lives in the ranking service. Or the API gateway. Or a middleware layer that someone added during an incident and nobody removed. It has no tests. It has three bugs: it reduces result count instead of replacing with the next-best candidate, it does not handle the case where too few creators exist to fill the page, and it silently drops items from creators who happen to hash to the same bucket when someone refactored the creator ID type six months ago.

This post is about why diversity enforcement belongs in the database, how tidalDB implements it, and what it costs.
## The problem
Without diversity enforcement, a trending creator dominates the feed. If creator A has 5 of the top 10 items by score, the user sees 5 items from creator A. This is correct by the scoring function. It is wrong by the user's experience.

Every content platform discovers this independently. The fix is always the same: a post-processing loop in the API layer that caps how many items from the same creator can appear. The implementation varies. The failure modes do not.

**Failure 1: Result count drops.** The naive approach skips items that violate the constraint. If 5 of your top 10 are from the same creator and you cap at 2, you return 7 results instead of 10. The client expected 10. The feed has a gap. The pagination cursor is wrong.

**Failure 2: No relaxation.** What happens when your candidate pool only has 3 distinct creators and you want 20 results with `max_per_creator: 2`? The naive approach returns 6 items and stops. The correct behavior is to relax the constraint progressively -- allow 4 per creator, then ignore format constraints, then accept anything -- until the page is full. The caller should know constraints were relaxed. The result count should not suffer.

**Failure 3: Inconsistent enforcement.** The diversity logic lives in the API layer, which means it is coupled to a specific endpoint. The web feed applies it. The mobile feed applies a different version. The notification ranker does not apply it at all. The search results page applies it with different parameters that someone hardcoded during a hackathon. There is no single source of truth for "how diverse should this result set be?"

**Failure 4: Nobody tests it.** The diversity code is tangled with the ranking service, which calls three external systems, which means testing it requires mocking Elasticsearch, Redis, and the feature store. So nobody writes the test. The code drifts. The bugs accumulate.

All four failures share a root cause: diversity enforcement is application logic that should be a database primitive.
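Failures 1 and 2 are easy to reproduce. Below is a minimal Rust port of the loop from the top of this post, with one assumption made explicit: the API layer receives a page that has already been truncated to `limit`, which is the usual shape when the ranking service returns exactly one page. `Item` and `naive_cap` are illustrative names, not from tidalDB or any real codebase.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a ranked item as the API layer sees it (hypothetical).
#[derive(Clone, Debug, PartialEq)]
pub struct Item {
    pub id: u64,
    pub creator_id: u64,
}

/// Naive post-filter in the style of the loop above. `page` is assumed to be
/// the already-truncated top-`limit` slice, so every skipped item shrinks the
/// page: nothing lower-ranked is available to take its slot.
pub fn naive_cap(page: &[Item], max_per_creator: usize) -> Vec<Item> {
    let mut seen: HashMap<u64, usize> = HashMap::new();
    let mut out = Vec::with_capacity(page.len());
    for item in page {
        let count = seen.entry(item.creator_id).or_insert(0);
        if *count >= max_per_creator {
            continue; // dropped, not replaced
        }
        *count += 1;
        out.push(item.clone());
    }
    out
}
```

Given a ten-item page with five items from one creator and a cap of 2, `naive_cap` returns 7 items. With only a few distinct creators the page shrinks further, and nothing in the loop relaxes the cap.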
## Why the database
Diversity is a constraint on the *ranked result set*. It operates after scoring. It does not change scores -- it changes which scored items are selected. This is a query operation, not an application operation.

When diversity lives in the database:
- Every query surface gets it. The feed, the search results, the notifications, the "related content" sidebar. One implementation. One set of constraints. One set of tests.
- The result count invariant is maintained. The selector fills the page from lower-ranked candidates when constraints force skips. The caller always gets `min(target_count, candidate_count)` results.
- Constraint violations are reported explicitly. The result includes a `constraints_satisfied` boolean and a list of violations. The caller knows exactly what was relaxed and why.
- It is tested. With property tests. Across hundreds of random candidate sets. Because it is a self-contained module with no external dependencies, not a loop buried in a microservice.
## The algorithm
The `DiversitySelector` is a stateless, greedy selection pass. It takes a list of scored candidates (already sorted descending by score from the ranking profile) and selects up to `target_count` items that satisfy the active constraints.

The core loop is short enough to read in its entirety:
```rust
// From tidal/src/ranking/diversity.rs
use std::collections::HashMap;

fn greedy_select(
    candidates: &[ScoredCandidate],
    constraints: &DiversityConstraints,
    limit: usize,
) -> Vec<ScoredCandidate> {
    let mut selected = Vec::with_capacity(limit);
    let mut creator_counts: HashMap<u64, usize> = HashMap::new();
    let mut format_counts: HashMap<String, usize> = HashMap::new();

    for candidate in candidates {
        if selected.len() >= limit {
            break;
        }

        // Check max_per_creator.
        if let Some(max) = constraints.max_per_creator
            && let Some(creator_id) = candidate.creator_id
        {
            let count = creator_counts
                .get(&creator_id.as_u64())
                .copied()
                .unwrap_or(0);
            if count >= max {
                continue;
            }
        }

        // Check format_mix_max_fraction (cap is computed against the target limit).
        if let Some(max_frac) = constraints.format_mix_max_fraction
            && let Some(ref fmt) = candidate.format
        {
            let max_allowed = (max_frac * limit as f64).floor().max(1.0) as usize;
            let fmt_count = format_counts.get(fmt).copied().unwrap_or(0) + 1;
            if fmt_count > max_allowed {
                continue;
            }
        }

        // Accept candidate and update the running counts.
        if let Some(creator_id) = candidate.creator_id {
            *creator_counts.entry(creator_id.as_u64()).or_insert(0) += 1;
        }
        if let Some(ref fmt) = candidate.format {
            *format_counts.entry(fmt.clone()).or_insert(0) += 1;
        }
        selected.push(candidate.clone());
    }

    selected
}
```
Walk the list once. For each candidate, check if accepting it would violate a constraint. If yes, skip. If no, accept and update the counts. One pass. O(n).

Two things to notice. First, candidates without a `creator_id` are never constrained by `max_per_creator` -- they are treated as unique. Same for candidates without a `format`. Missing metadata means unconstrained, not rejected. Second, the format cap is computed against the target limit, not the current selection count. Measured against the running count, the very first accepted item would already be 100% of its format, so the early picks would be forced to alternate formats. Measured against the limit, the cap is a fixed count from the first pick onward: with `format_mix(0.6)` and a limit of 10, at most six items per format.
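The same pass can be exercised outside tidalDB. This self-contained sketch uses simplified stand-in types (`Candidate`, `Constraints`, and plain `u64` creator IDs are our names, not tidalDB's) and factors the format cap into a hypothetical helper, `format_cap`, so the "cap against the limit, never below 1" rule is visible on its own.

```rust
use std::collections::HashMap;

/// Stand-in for a scored candidate; the input slice is assumed sorted by descending score.
#[derive(Clone, Debug, PartialEq)]
pub struct Candidate {
    pub id: u64,
    pub score: f64,
    pub creator_id: Option<u64>,
    pub format: Option<String>,
}

/// Stand-in for the two enforced constraints (None = inactive).
#[derive(Clone, Debug, Default)]
pub struct Constraints {
    pub max_per_creator: Option<usize>,
    pub format_mix_max_fraction: Option<f64>,
}

/// Per-format cap, computed against the target `limit` and never below 1.
pub fn format_cap(max_frac: f64, limit: usize) -> usize {
    (max_frac * limit as f64).floor().max(1.0) as usize
}

/// One greedy pass: accept a candidate unless it would break an active constraint.
/// Missing metadata (no creator, no format) means unconstrained, not rejected.
pub fn greedy_select(cands: &[Candidate], c: &Constraints, limit: usize) -> Vec<Candidate> {
    let mut selected = Vec::with_capacity(limit);
    let mut creator_counts: HashMap<u64, usize> = HashMap::new();
    let mut format_counts: HashMap<&str, usize> = HashMap::new();
    for cand in cands {
        if selected.len() >= limit {
            break;
        }
        if let (Some(max), Some(cr)) = (c.max_per_creator, cand.creator_id) {
            if creator_counts.get(&cr).copied().unwrap_or(0) >= max {
                continue; // creator already at its cap
            }
        }
        if let (Some(frac), Some(fmt)) = (c.format_mix_max_fraction, cand.format.as_deref()) {
            if format_counts.get(fmt).copied().unwrap_or(0) + 1 > format_cap(frac, limit) {
                continue; // accepting would push this format over its cap
            }
        }
        if let Some(cr) = cand.creator_id {
            *creator_counts.entry(cr).or_insert(0) += 1;
        }
        if let Some(fmt) = cand.format.as_deref() {
            *format_counts.entry(fmt).or_insert(0) += 1;
        }
        selected.push(cand.clone());
    }
    selected
}
```

Run against the ten-candidate example later in this post (`max_per_creator: 2`, target 6), this sketch picks items 1, 2, 3, 6, 8, and 9, in that order.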
## Relaxation
The single greedy pass is the happy path. When it fills the target count, the selector is done. But what happens when it cannot?

If your candidate pool has 200 items from 3 creators and you request 50 results with `max_per_creator: 2`, the greedy pass selects 6 items and stops. You asked for 50. You got 6. This is where every API-layer implementation breaks.

The `DiversitySelector` uses a four-stage relaxation strategy:
```rust
// From tidal/src/ranking/diversity.rs — DiversitySelector::select()
// Stage 0: greedy pass with full constraints.
// ...

// Stage 1: double max_per_creator if we did not fill target.
if accepted.len() < max_needed {
    let stage1 = DiversityConstraints {
        max_per_creator: constraints.max_per_creator.map(|n| n.saturating_mul(2)),
        format_mix_max_fraction: constraints.format_mix_max_fraction,
        ..DiversityConstraints::new()
    };
    // Select from remaining candidates with relaxed constraints...
}

// Stage 2: ignore format_mix, keep doubled max_per_creator.
if accepted.len() < max_needed {
    let stage2 = DiversityConstraints {
        max_per_creator: constraints.max_per_creator.map(|n| n.saturating_mul(2)),
        format_mix_max_fraction: None,
        ..DiversityConstraints::new()
    };
    // ...
}

// Stage 3: accept anything to fill to target.
if accepted.len() < max_needed {
    for c in candidates {
        if accepted.len() >= max_needed {
            break;
        }
        accepted.insert(c.entity_id.as_u64());
    }
}
```
The relaxation order is deliberate. Doubling the creator cap is a minor compromise -- the user sees 4 items from one creator instead of 2. Dropping format constraints is a larger compromise but still preserves creator diversity. Accepting anything is the last resort, and it only fires when the candidate pool genuinely lacks variety.

At every stage, the selector tracks which items were accepted using a `HashSet<u64>`. After all stages complete, a single final pass over the original scored list emits accepted items in their original score order. This preserves global ranking -- not just within-creator ordering, but across the entire result set.
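Here is a compact, self-contained sketch of that staged flow, under stated assumptions: stand-in types (our names, not tidalDB's), unique `u64` IDs, candidates already sorted by descending score, and each relaxed stage starting from fresh counts with a limit equal to the remaining shortfall, which is the behavior the degenerate example later in this post describes. It is a reading aid, not tidalDB's `DiversitySelector::select()`.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in candidate. Assumes unique IDs and input sorted by descending score.
#[derive(Clone, Debug)]
pub struct Candidate {
    pub id: u64,
    pub creator_id: Option<u64>,
    pub format: Option<String>,
}

#[derive(Clone, Copy, Debug, Default)]
pub struct Constraints {
    pub max_per_creator: Option<usize>,
    pub format_mix_max_fraction: Option<f64>,
}

/// One greedy pass with fresh counts over candidates not yet accepted.
/// Adds at most `limit` new IDs to `accepted`.
fn greedy_pass(cands: &[Candidate], c: Constraints, limit: usize, accepted: &mut HashSet<u64>) {
    let mut creators: HashMap<u64, usize> = HashMap::new();
    let mut formats: HashMap<&str, usize> = HashMap::new();
    let mut taken = 0;
    for cand in cands {
        if taken >= limit {
            break;
        }
        if accepted.contains(&cand.id) {
            continue; // already picked by an earlier stage
        }
        if let (Some(max), Some(cr)) = (c.max_per_creator, cand.creator_id) {
            if creators.get(&cr).copied().unwrap_or(0) >= max {
                continue;
            }
        }
        if let (Some(frac), Some(fmt)) = (c.format_mix_max_fraction, cand.format.as_deref()) {
            let cap = (frac * limit as f64).floor().max(1.0) as usize;
            if formats.get(fmt).copied().unwrap_or(0) + 1 > cap {
                continue;
            }
        }
        if let Some(cr) = cand.creator_id {
            *creators.entry(cr).or_insert(0) += 1;
        }
        if let Some(fmt) = cand.format.as_deref() {
            *formats.entry(fmt).or_insert(0) += 1;
        }
        accepted.insert(cand.id);
        taken += 1;
    }
}

/// Stage 0: full constraints. Stage 1: creator cap doubled. Stage 2: doubled cap,
/// no format constraint. Stage 3: accept anything. Then emit in original order.
pub fn select(cands: &[Candidate], c: Constraints, target: usize) -> Vec<Candidate> {
    let want = target.min(cands.len());
    let mut accepted: HashSet<u64> = HashSet::with_capacity(want);
    let doubled = c.max_per_creator.map(|n| n.saturating_mul(2));
    let stages = [
        c,
        Constraints { max_per_creator: doubled, ..c },
        Constraints { max_per_creator: doubled, format_mix_max_fraction: None },
    ];
    for stage in stages {
        if accepted.len() >= want {
            break;
        }
        greedy_pass(cands, stage, want - accepted.len(), &mut accepted);
    }
    // Stage 3: fill any remaining slots in score order, ignoring constraints.
    for cand in cands {
        if accepted.len() >= want {
            break;
        }
        accepted.insert(cand.id);
    }
    // Final emit: walk the original list so global score order is preserved.
    cands.iter().filter(|x| accepted.contains(&x.id)).cloned().collect()
}
```

On ten candidates from a single creator with a cap of 1 and a target of 6, the stages in this sketch contribute 1, 2, 2, and 1 items, and the output is the top six candidates in their original order.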
The result carries full transparency:
```rust
// From tidal/src/ranking/diversity.rs
pub struct DiversityResult {
    pub selected: Vec<ScoredCandidate>,
    pub constraints_satisfied: bool,
    pub violations: Vec<ConstraintViolation>,
}
```
When `constraints_satisfied` is false, the `violations` vector tells you exactly what failed: which constraint, which creator, which format, what fraction. No silent degradation. The caller can log it, surface it in a debug panel, or ignore it -- but the information is there.
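How the violation list might be built: a sketch, assuming violations are computed by auditing the final selection after all stages have run. `Violation`, `Picked`, and `find_violations` are hypothetical names; the post does not show the fields of tidalDB's `ConstraintViolation`. The format check reuses the same integer cap as the greedy pass so the audit and the selector agree.

```rust
use std::collections::BTreeMap;

/// Hypothetical violation record (tidalDB's `ConstraintViolation` fields are not shown).
#[derive(Debug, PartialEq)]
pub enum Violation {
    CreatorOverLimit { creator_id: u64, count: usize, max: usize },
    FormatOverFraction { format: String, fraction: f64, max_fraction: f64 },
}

/// Minimal view of a selected item: only the metadata the constraints read.
pub struct Picked {
    pub creator_id: Option<u64>,
    pub format: Option<String>,
}

/// Audits a final selection; `constraints_satisfied` would be `violations.is_empty()`.
pub fn find_violations(
    selected: &[Picked],
    max_per_creator: Option<usize>,
    max_fraction: Option<f64>,
) -> Vec<Violation> {
    let mut violations = Vec::new();

    if let Some(max) = max_per_creator {
        // BTreeMap keeps the report order deterministic.
        let mut counts: BTreeMap<u64, usize> = BTreeMap::new();
        for id in selected.iter().filter_map(|p| p.creator_id) {
            *counts.entry(id).or_insert(0) += 1;
        }
        for (creator_id, count) in counts {
            if count > max {
                violations.push(Violation::CreatorOverLimit { creator_id, count, max });
            }
        }
    }

    if let Some(max_fraction) = max_fraction {
        let n = selected.len();
        // Same integer cap the greedy pass uses: floor(fraction * n), at least 1.
        let cap = (max_fraction * n as f64).floor().max(1.0) as usize;
        let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
        for fmt in selected.iter().filter_map(|p| p.format.as_deref()) {
            *counts.entry(fmt).or_insert(0) += 1;
        }
        for (format, count) in counts {
            if count > cap {
                violations.push(Violation::FormatOverFraction {
                    format: format.to_string(),
                    fraction: count as f64 / n as f64,
                    max_fraction,
                });
            }
        }
    }

    violations
}
```

For the degenerate single-creator case later in this post, this audit yields one `CreatorOverLimit` entry (creator A, count 6, max 1), matching the violation the post describes.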
## The invariants
Two properties hold regardless of input:

**INV-RANK-5: Result count is always `min(target_count, candidates.len())`.** Diversity is reordering, not filtering. If you have 200 candidates and request 50, you get 50. If the greedy pass only selects 30, relaxation fills the remaining 20. The caller never receives fewer results than expected because of diversity constraints.

**INV-RANK-6: Score order is preserved.** The final emit pass walks the original sorted candidate list and retains only accepted IDs. This means the output is in the same order the scoring stage produced -- the global ranking is preserved across the entire result, not just within creator groups.

These are not aspirational. They are verified by property tests across hundreds of random candidate sets:
```rust
// From tidal/src/ranking/diversity.rs — property tests
use proptest::prelude::*;

proptest! {
    #[test]
    fn prop_result_count_invariant(
        n in 0usize..200,
        target in 0usize..100,
        max_per_creator in 1usize..5,
    ) {
        // Worst case for diversity: every candidate shares one creator.
        let candidates = build_candidates_single_creator(n);
        let constraints = DiversityConstraints::new().max_per_creator(max_per_creator);
        let result = DiversitySelector::select(&candidates, &constraints, target);
        prop_assert_eq!(result.selected.len(), target.min(n));
    }
}
```
The worst case for this property: every candidate is from the same creator, `max_per_creator` is 1, and the target is larger than 1. The greedy pass selects 1. Stages 1 and 2 run with the cap doubled to 2 and add up to 2 items each. Stage 3 fills whatever remains. The result has `target.min(n)` items. The constraint is violated and reported. But the count is correct.
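Both invariants can be checked against any selector's output without knowing how the selector works. A minimal sketch, assuming unique `u64` IDs; `check_invariants` is an illustrative helper, not part of tidalDB. INV-RANK-5 is a length comparison, and INV-RANK-6 is a subsequence test: every selected ID must appear in the input, in the same relative order.

```rust
/// Returns true if `selected` satisfies INV-RANK-5 (count) and INV-RANK-6
/// (selected IDs appear in the same relative order as in `candidates`).
pub fn check_invariants(candidates: &[u64], selected: &[u64], target: usize) -> bool {
    // INV-RANK-5: count is exactly min(target, candidates.len()).
    if selected.len() != target.min(candidates.len()) {
        return false;
    }
    // INV-RANK-6: subsequence check, advancing through the candidates once.
    let mut it = candidates.iter();
    selected.iter().all(|s| it.any(|c| c == s))
}
```

Because the subsequence check consumes the candidate iterator as it goes, an output that swaps two items fails even though every ID is present.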
## The constraints
Two constraints are enforced today. Both operate on metadata attached to each scored candidate by the query executor.

**`max_per_creator`**: No more than N items from the same creator. The most common diversity constraint in practice. Builder API:
```rust
// From tidal/src/ranking/diversity.rs
let constraints = DiversityConstraints::new().max_per_creator(2);
```
**`format_mix`**: No single content format exceeds a given fraction of results. Prevents a feed from being entirely short-form video when the user's history includes podcasts and articles. Builder API:
```rust
let constraints = DiversityConstraints::new().format_mix(0.6); // max 60% any format
```
Both compose:
```rust
let constraints = DiversityConstraints::new()
    .max_per_creator(2)
    .format_mix(0.6);
```
The struct also carries reserved fields for constraints coming later -- exploration budgets for new content, topic diversity via embedding distance, category minimums. These fields exist on the type for forward compatibility. The selector ignores them today.
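A consuming builder like this takes only a few lines. This mirror is a sketch, not tidalDB's definition: `max_per_creator` and `format_mix_max_fraction` follow the field names in the selector code earlier in this post, `topic_diversity` is the reserved field named under "What is not here yet", and its `Option<f64>` type, like the `is_unconstrained` helper, is our assumption.

```rust
/// Sketch of a by-value builder; not tidalDB's actual definition.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct DiversityConstraints {
    pub max_per_creator: Option<usize>,
    pub format_mix_max_fraction: Option<f64>,
    /// Reserved (type assumed): carried on the struct, ignored by the selector.
    pub topic_diversity: Option<f64>,
}

impl DiversityConstraints {
    pub fn new() -> Self {
        Self::default()
    }

    /// No more than `n` items from any one creator.
    pub fn max_per_creator(mut self, n: usize) -> Self {
        self.max_per_creator = Some(n);
        self
    }

    /// No single format above `max_fraction` of the results.
    pub fn format_mix(mut self, max_fraction: f64) -> Self {
        self.format_mix_max_fraction = Some(max_fraction);
        self
    }

    /// Hypothetical helper: true when no enforced constraint is set,
    /// i.e. when a selector could take a no-constraints fast path.
    pub fn is_unconstrained(&self) -> bool {
        self.max_per_creator.is_none() && self.format_mix_max_fraction.is_none()
    }
}
```

Because each method takes `self` by value and returns it, the calls chain exactly as in the composed example above, and a reserved field stays `None` until someone sets it.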
## A concrete example
Consider a candidate pool of 10 items, sorted by score. Creator A has a viral day and holds 6 of the top 10 slots. Creator B has 2. Creators C and D have 1 each.
```
Pre-diversity (sorted by score):
1. [A] score 0.95 "Interview clip" video
2. [A] score 0.91 "Behind the scenes" video
3. [B] score 0.87 "Tutorial part 3" video
4. [A] score 0.84 "Hot take" short
5. [A] score 0.80 "Live reaction" video
6. [C] score 0.76 "Deep dive analysis" article
7. [A] score 0.72 "Unboxing" short
8. [B] score 0.68 "Tutorial part 4" video
9. [D] score 0.64 "Weekly roundup" article
10. [A] score 0.58 "Q&A stream" video
```
Apply `max_per_creator: 2`, target 6:

Stage 0 (full constraints) walks the list top to bottom:
- Item 1 [A]: accept (A count: 1)
- Item 2 [A]: accept (A count: 2)
- Item 3 [B]: accept (B count: 1)
- Item 4 [A]: **skip** (A at limit)
- Item 5 [A]: **skip** (A at limit)
- Item 6 [C]: accept (C count: 1)
- Item 7 [A]: **skip** (A at limit)
- Item 8 [B]: accept (B count: 2)
- Item 9 [D]: accept (D count: 1)

Stage 0 selected 6 items. Target met. No relaxation needed.
```
Post-diversity:
1. [A] score 0.95 "Interview clip" video
2. [A] score 0.91 "Behind the scenes" video
3. [B] score 0.87 "Tutorial part 3" video
4. [C] score 0.76 "Deep dive analysis" article
5. [B] score 0.68 "Tutorial part 4" video
6. [D] score 0.64 "Weekly roundup" article
```
Creator A still has the top 2 slots -- their best items are genuinely the best. But the user also sees B, C, and D. The feed is diverse. The result count is 6, as requested. Score order is preserved globally. The `constraints_satisfied` flag is `true`.

Now consider the degenerate case: all 10 items from creator A, `max_per_creator: 1`, target 6.

- Stage 0 runs `greedy_select` with `max_per_creator=1` and selects 1 item.
- Stage 1 gets a *fresh* `creator_counts` (not inherited from Stage 0), with `max_per_creator` doubled to 2 and a limit of 5 (the remaining shortfall). It processes the 9 unselected candidates, all from creator A, and accepts 2 before hitting the per-creator cap.
- Stage 2 drops format constraints but keeps the doubled cap. With another fresh `creator_counts` and a limit of 3, it accepts 2 more.
- Stage 3 accepts anything to fill the last slot.

Result: 6 items. `constraints_satisfied: false`. Violation reported: creator A appears 6 times, requested max was 1. The caller knows.
## The cost
The diversity selector is a `HashMap` lookup per candidate per active constraint. For 200 candidates and a target of 100, Stage 0 does at most 200 iterations, each with one hash lookup for creator and one for format. When Stage 0 fills the target (the common case), there are no further passes.

The benchmarks measure the selector in isolation -- no query executor, no scoring, no I/O. Pure algorithmic cost.
```rust
// From tidal/benches/diversity.rs
use criterion::Criterion;
use std::hint::black_box;

fn bench_diversity_200_max_per_creator_2(c: &mut Criterion) {
    let candidates = make_200_candidates(50); // 200 items, 50 creators
    let constraints = DiversityConstraints::new().max_per_creator(2);
    c.bench_function("diversity_200_max_per_creator_2", |b| {
        b.iter(|| {
            // black_box keeps the optimizer from constant-folding the inputs.
            DiversitySelector::select(
                black_box(&candidates),
                black_box(&constraints),
                black_box(100),
            )
        });
    });
}
```
The setup: 200 candidates across 50 creators, 4 items per creator. Select 100 with `max_per_creator: 2`. This forces the selector to interleave creators and skip high-scoring duplicates -- the realistic case for a trending feed.

Five scenarios are benchmarked:

| Scenario | Constraints | What it measures |
|----------|-------------|------------------|
| `max_per_creator` only | 50 creators, cap at 2 | Common case |
| `format_mix` only | 2 formats, cap at 60% | Format balancing |
| Combined | Both constraints active | Realistic production config |
| Worst-case relaxation | 1 creator, all stages fire | Maximum overhead |
| No constraints | None (fast path: slice + return) | Baseline |

The acceptance target is under 1 millisecond for all scenarios. The algorithm is O(n) with small constants: hash map lookups against maps small enough to stay in L1 cache. With no constraints, the fast path simply truncates the candidate slice to the target count and returns it. The constrained paths add a handful of microseconds.

For context: the diversity pass is Stage 4 of the five-stage query pipeline. Stage 3 (signal scoring) reads decay scores from the ledger -- one `exp()` call and one multiply per entity per signal. Stage 2 (filtering) intersects RoaringBitmaps. The diversity pass sits between scoring and result assembly. Its cost is a rounding error in the total query budget.
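The body of `make_200_candidates` is not shown in this post. A plausible fixture for the benchmark setup above, under one labeled assumption: each creator's four items sit next to each other in score order, so Stage 0 really does meet high-scoring duplicates it has to skip. `make_candidates` and `stage0_pass` are hypothetical helpers; the sketch only counts work, it does not time anything.

```rust
use std::collections::HashMap;

/// Hypothetical fixture: `n_creators * per_creator` items as (id, creator, score),
/// scores strictly descending, each creator's items adjacent (an assumption).
pub fn make_candidates(n_creators: u64, per_creator: u64) -> Vec<(u64, u64, f64)> {
    let n = n_creators * per_creator;
    (0..n)
        .map(|id| (id, id / per_creator, 1.0 - id as f64 / n as f64))
        .collect()
}

/// Runs a `max_per_creator`-only Stage 0 pass and reports
/// (items accepted, candidates visited).
pub fn stage0_pass(cands: &[(u64, u64, f64)], max_per_creator: usize, limit: usize) -> (usize, usize) {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    let (mut accepted, mut visited) = (0, 0);
    for &(_, creator, _) in cands {
        if accepted >= limit {
            break;
        }
        visited += 1;
        let seen = counts.entry(creator).or_insert(0);
        if *seen < max_per_creator {
            *seen += 1;
            accepted += 1;
        }
    }
    (accepted, visited)
}
```

On the 50 × 4 layout, Stage 0 alone fills all 100 slots after visiting 198 of the 200 candidates and skipping 98 duplicates, so no relaxation stage runs. With only three creators it stops at 6 accepted items, which is where the relaxation stages would take over.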
## What the query looks like
In a tidalDB query, diversity is a first-class parameter:
```rust
let query = Retrieve::builder()
    .profile("trending")
    .diversity(DiversityConstraints::new().max_per_creator(1))
    .limit(25)
    .build()?;
let results = db.retrieve(&query)?;
```
The caller specifies the constraint. The database enforces it. The result includes every signal value that contributed to each item's score (the signal snapshot), plus the `constraints_satisfied` flag and any violations. No external service. No post-processing in the API layer. No second code path for mobile vs. web.

Compare this to the API-layer approach, where the diversity logic is repeated in every service that returns ranked content, tested inconsistently (or not at all), and coupled to the specific ranking pipeline it was written for. The database approach is not just cleaner. It is the only way to guarantee that every surface -- feeds, search, notifications, related content -- applies the same diversity rules with the same correctness guarantees.
## Property tests, not unit tests
The diversity selector is verified by property tests, not just example-based unit tests. The distinction matters.

A unit test says: "given these 10 specific candidates, the output looks like this." It proves the algorithm works for one input. A property test says: "for any combination of candidate count, creator distribution, format distribution, constraint values, and target count, these invariants hold." It proves the algorithm works for the space of possible inputs.

Five properties are tested across hundreds of random candidate sets each:
1. **`max_per_creator` is never exceeded** when `constraints_satisfied` is true.
2. **Format fraction is never exceeded** when `constraints_satisfied` is true.
3. **Result count equals `min(target, candidates.len())`** -- always, regardless of constraint satisfaction.
4. **Graceful degradation**: impossible constraints (1 creator, max 1, target > 1) still produce `target` results via relaxation, with `constraints_satisfied: false`.
5. **Score order preserved within creator groups**: for any creator in the result set, their items appear in descending score order.
```rust
// From tidal/src/ranking/diversity.rs — property test P4
use proptest::prelude::*;

proptest! {
    #[test]
    fn prop_unsatisfiable_still_fills(
        n in 5usize..50,
        target in 3usize..20,
    ) {
        // One creator with max 1 cannot be satisfied once target > 1.
        let candidates = build_candidates_single_creator(n);
        let constraints = DiversityConstraints::new().max_per_creator(1);
        let result = DiversitySelector::select(&candidates, &constraints, target);
        prop_assert_eq!(result.selected.len(), target.min(n));
    }
}
```
Property 3 is the one that catches bugs in relaxation. If a Stage 2 pass forgets to account for already-selected items, or Stage 3 has an off-by-one in the fill loop, the count invariant fails on some of the hundreds of random inputs, long before the bug would surface in a hand-written test.
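Properties 1, 2, and 5 are plain predicates over the output, which is what makes them easy to state as property tests. A sketch of those predicates as ordinary functions that a proptest case or a hand-rolled random loop could call; `Picked` and the function names are ours. Property 2 is measured here against the same integer cap the greedy pass uses (`floor(fraction * target)`, minimum 1), which is an assumption about how tidalDB's test phrases it.

```rust
use std::collections::HashMap;

/// Stand-in for one selected item.
#[derive(Clone, Debug)]
pub struct Picked {
    pub creator_id: u64,
    pub format: String,
    pub score: f64,
}

/// Property 1: no creator appears more than `max` times.
pub fn creator_cap_holds(sel: &[Picked], max: usize) -> bool {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    sel.iter().all(|p| {
        let n = counts.entry(p.creator_id).or_insert(0);
        *n += 1;
        *n <= max
    })
}

/// Property 2: no format exceeds floor(frac * target) items (at least 1).
pub fn format_cap_holds(sel: &[Picked], frac: f64, target: usize) -> bool {
    let cap = (frac * target as f64).floor().max(1.0) as usize;
    let mut counts: HashMap<&str, usize> = HashMap::new();
    sel.iter().all(|p| {
        let n = counts.entry(p.format.as_str()).or_insert(0);
        *n += 1;
        *n <= cap
    })
}

/// Property 5: within each creator, scores are non-increasing in output order.
pub fn within_creator_order_holds(sel: &[Picked]) -> bool {
    let mut last: HashMap<u64, f64> = HashMap::new();
    sel.iter().all(|p| {
        let ok = last.get(&p.creator_id).map_or(true, |&prev| p.score <= prev);
        last.insert(p.creator_id, p.score);
        ok
    })
}
```

In a property test, properties 1 and 2 would only be asserted when `constraints_satisfied` is true, while property 5 holds unconditionally.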
## What is not here yet
Topic diversity -- ensuring the result set spans multiple subject categories, not just multiple creators -- requires computing embedding distances between selected items. That changes the algorithm from greedy O(n) to MMR (Maximal Marginal Relevance), which is O(n*k) where k is the selected count. It is a meaningful algorithmic upgrade. The `DiversityConstraints` struct has a `topic_diversity` field reserved for it. The selector ignores it today.

Exploration budgets -- guaranteeing that a fraction of results come from creators the user does not follow -- require the relationship graph. That is coming with personalized ranking.

Both are on the roadmap. Both slot into the same pipeline stage. The selector's interface does not change -- only the constraint evaluation inside the greedy loop gets richer.
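To make the O(n*k) figure concrete, here is the standard greedy MMR loop in generic form: at each step, pick the candidate that maximizes `lambda * relevance - (1 - lambda) * (max similarity to anything already picked)`. It keeps each candidate's running maximum similarity and updates it only against the newly picked item, which keeps the work at about n*k similarity computations rather than the n*k² of recomputing against the whole picked set. This is a textbook illustration with cosine similarity over caller-supplied embeddings; it is not tidalDB's planned design, and `mmr`, `lambda`, and the embedding inputs are assumptions.

```rust
/// Cosine similarity; returns 0.0 when either vector has zero norm.
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Generic greedy MMR. Returns indices of up to `k` picks, in pick order.
/// Assumes `embeddings.len() == relevance.len()`.
pub fn mmr(relevance: &[f64], embeddings: &[Vec<f64>], k: usize, lambda: f64) -> Vec<usize> {
    let n = relevance.len();
    let target = k.min(n);
    let mut picked = Vec::with_capacity(target);
    let mut taken = vec![false; n];
    // Running max similarity of each candidate to the picked set.
    let mut max_sim = vec![f64::NEG_INFINITY; n];
    while picked.len() < target {
        let mut best: Option<(usize, f64)> = None;
        for i in 0..n {
            if taken[i] {
                continue;
            }
            // Nothing picked yet means no redundancy penalty.
            let penalty = if picked.is_empty() { 0.0 } else { max_sim[i] };
            let score = lambda * relevance[i] - (1.0 - lambda) * penalty;
            if best.map_or(true, |(_, b)| score > b) {
                best = Some((i, score));
            }
        }
        let (j, _) = best.expect("an untaken candidate exists");
        taken[j] = true;
        picked.push(j);
        // Update running max similarity against the newly picked item only.
        for i in 0..n {
            if !taken[i] {
                max_sim[i] = max_sim[i].max(cosine(&embeddings[i], &embeddings[j]));
            }
        }
    }
    picked
}
```

With `lambda = 1.0` this reduces to picking by relevance alone. Lowering it trades relevance for spread: two near-duplicate top items stop both making the cut.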
## Why this matters
The diversity code in your ranking service was written by an engineer who left the company two years ago. It was modified during three separate incidents. It has a `TODO` comment from 2023 that says "clean this up." It does not have property tests. It does not report constraint violations. It silently drops results when constraints cannot be satisfied. It runs in one endpoint but not the three others that also return ranked content.

In tidalDB, diversity enforcement is a roughly 300-line module with two constraints, four relaxation stages, five property tests, and a cost that vanishes into the noise floor of the query pipeline. It does not belong in your API layer. It belongs in the database.

---
*The diversity selector is at [tidal/src/ranking/diversity.rs](https://github.com/orchard9/tidalDB/blob/main/tidal/src/ranking/diversity.rs). The benchmarks are at [tidal/benches/diversity.rs](https://github.com/orchard9/tidalDB/blob/main/tidal/benches/diversity.rs). Follow the build on [GitHub](https://github.com/orchard9/tidalDB).*