Task 02: Diversity Property Tests + Benchmarks
Context
Milestone: 2 -- Ranked Retrieval
Phase: m2p4 -- Diversity Enforcement
Depends On: Task 01 (DiversityConstraints, DiversityResult, DiversitySelector, ConstraintViolation, ScoredCandidate with diversity metadata)
Blocks: m2p5 (RETRIEVE executor relies on diversity enforcement being correct and performant)
Complexity: S
Objective
Deliver property-based tests (proptest) that check the diversity constraints across 10,000 random candidate sets per property, and Criterion benchmarks that check the diversity pass completes in under 1ms for 200 candidates. The property tests are the core acceptance gate for this phase: for any scored candidate list the generators produce, the diversity selector must either satisfy all constraints or correctly report which constraints it could not satisfy.
The property tests must cover:
- max_per_creator never exceeded in the selected set (when satisfiable)
- format_mix fraction never exceeded (when satisfiable)
- Result count never drops below min(target_count, candidates.len())
- Graceful degradation when constraints are unsatisfiable
- Score order preserved within same-creator groups
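For orientation, the behavior these properties pin down can be sketched as a greedy pass followed by a relaxation pass. This is a minimal, std-only sketch under stated assumptions, not the Task 01 DiversitySelector: Candidate, Selection and select_with_cap are hypothetical names, only max_per_creator is modeled, and emitting relaxed items back in input order is one possible design choice.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for ScoredCandidate, reduced to what the sketch needs.
#[derive(Clone, Debug, PartialEq)]
pub struct Candidate {
    pub entity: u64,
    pub creator: u64,
    pub score: f64,
}

/// Hypothetical stand-in for DiversityResult.
#[derive(Debug)]
pub struct Selection {
    pub selected: Vec<Candidate>,
    pub constraints_satisfied: bool,
}

/// Greedy per-creator cap with relaxation. `candidates` must already be
/// sorted by score descending.
pub fn select_with_cap(candidates: &[Candidate], max_per_creator: usize, target: usize) -> Selection {
    // Clamp to the input size: the result count is min(target, candidates.len()) (P3).
    let target = target.min(candidates.len());
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut picked = vec![false; candidates.len()];
    let mut n = 0;

    // Pass 1: take items in score order, skipping creators already at the cap.
    for (i, c) in candidates.iter().enumerate() {
        if n == target {
            break;
        }
        let count = per_creator.entry(c.creator).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            picked[i] = true;
            n += 1;
        }
    }
    // The cap held for the whole result only if pass 1 alone reached the target.
    let constraints_satisfied = n == target;

    // Pass 2 (relaxation): ignore the cap and fill the remaining slots in score order.
    for p in picked.iter_mut() {
        if n == target {
            break;
        }
        if !*p {
            *p = true;
            n += 1;
        }
    }

    // Emit in input order, so items from one creator stay score-descending (P5).
    let selected = candidates
        .iter()
        .zip(picked.iter())
        .filter(|(_, p)| **p)
        .map(|(c, _)| c.clone())
        .collect();
    Selection { selected, constraints_satisfied }
}
```

Because relaxation only back-fills from skipped items, the count guarantee (P3) and the reported flag (P4) come from the same two passes; a real selector also has to report which constraint was violated.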
The benchmarks must demonstrate:
- 200 candidates with max_per_creator=2 and 50 creators: select 50 in under 1ms
- 200 candidates with format_mix=0.6 and 5 formats: select 50 in under 1ms
Requirements
- 5 proptest property tests covering the diversity invariants
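- 5 Criterion benchmarks: max_per_creator, format_mix, combined and worst-case relaxation meet the < 1ms performance target; no-constraints is a fast-path baseline
- All property tests use ProptestConfig with cases = 10_000
- Benchmarks added to the existing tidal/benches/ranking.rs file
- No unsafe code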
Technical Design
Module Structure
tidal/src/ranking/
diversity.rs -- property tests added to existing #[cfg(test)] mod at bottom (Task 02)
tidal/benches/
ranking.rs -- Criterion benchmarks for diversity (Task 02)
Property Tests
// === ranking/diversity.rs (added to #[cfg(test)] module) ===
use proptest::prelude::*;
/// Strategy to generate a random scored candidate with diversity metadata.
fn arb_scored_candidate(
max_entity_id: u64,
max_creator_id: u64,
formats: &'static [&'static str],
) -> impl Strategy<Value = ScoredCandidate> {
(
0..max_entity_id,
prop::num::f64::POSITIVE, // score > 0
0..max_creator_id,
prop::sample::select(formats),
).prop_map(|(entity, score, creator, format)| {
ScoredCandidate::with_diversity_metadata(
EntityId::new(entity),
score,
Some(EntityId::new(creator + 1000)), // offset to distinguish from entity IDs
Some(format.to_string()),
)
})
}
/// Strategy to generate a Vec of scored candidates, sorted by score descending.
fn arb_candidate_set(
min_size: usize,
max_size: usize,
max_creators: u64,
formats: &'static [&'static str],
) -> impl Strategy<Value = Vec<ScoredCandidate>> {
prop::collection::vec(
arb_scored_candidate(10_000, max_creators, formats),
min_size..=max_size,
).prop_map(|mut candidates| {
// Deduplicate by entity_id (keep first occurrence)
let mut seen = std::collections::HashSet::new();
candidates.retain(|c| seen.insert(c.entity_id.clone()));
// Sort by score descending (selector expects sorted input)
candidates.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
candidates
})
}
const TEST_FORMATS: &[&str] = &["video", "article", "short", "podcast", "live"];
// P1: max_per_creator is never exceeded in the selected set.
//
// When the candidate set has enough unique creators to satisfy the constraint
// at the requested target_count, every creator in the result set has at most
// max_per_creator items. When it cannot be satisfied, constraints_satisfied
// is false.
proptest! {
#![proptest_config(ProptestConfig::with_cases(10_000))]
#[test]
fn prop_max_per_creator_holds(
candidates in arb_candidate_set(10, 200, 50, TEST_FORMATS),
max_per_creator in 1usize..=5,
target in 5usize..=50,
) {
let constraints = DiversityConstraints {
max_per_creator: Some(max_per_creator),
..DiversityConstraints::none()
};
let target_count = target.min(candidates.len());
let result = DiversitySelector::select(candidates, &constraints, target_count);
if result.constraints_satisfied {
// When satisfied, no creator exceeds the limit
let mut creator_counts: std::collections::HashMap<EntityId, usize> =
std::collections::HashMap::new();
for c in &result.selected {
if let Some(ref id) = c.creator_id {
*creator_counts.entry(id.clone()).or_insert(0) += 1;
}
}
for (creator_id, count) in &creator_counts {
prop_assert!(*count <= max_per_creator,
"creator {:?} has {} items, max is {}",
creator_id, count, max_per_creator);
}
} else {
// When not satisfied, violations are reported
prop_assert!(!result.violations.is_empty(),
"constraints_satisfied=false but no violations reported");
}
}
}
// P2: format_mix fraction is never exceeded in the selected set.
//
// When format_mix is enforced, no single format exceeds 60% of the result set
// (for result sets of size >= 5). When it cannot be satisfied, a violation is
// reported.
proptest! {
#![proptest_config(ProptestConfig::with_cases(10_000))]
#[test]
fn prop_format_mix_holds(
candidates in arb_candidate_set(10, 200, 50, TEST_FORMATS),
target in 5usize..=50,
) {
let constraints = DiversityConstraints {
format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
..DiversityConstraints::none()
};
let target_count = target.min(candidates.len());
let result = DiversitySelector::select(candidates, &constraints, target_count);
if result.constraints_satisfied && result.selected.len() >= 5 {
let mut format_counts: std::collections::HashMap<&str, usize> =
std::collections::HashMap::new();
for c in &result.selected {
if let Some(ref fmt) = c.format {
*format_counts.entry(fmt.as_str()).or_insert(0) += 1;
}
}
let total = result.selected.len() as f64;
for (fmt, count) in &format_counts {
let fraction = *count as f64 / total;
prop_assert!(fraction <= DEFAULT_FORMAT_MIX_MAX_FRACTION + 0.01,
"format '{}' has fraction {:.3}, exceeds {:.2}",
fmt, fraction, DEFAULT_FORMAT_MIX_MAX_FRACTION);
}
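} else if !result.constraints_satisfied {
// When not satisfied, violations are reported
prop_assert!(!result.violations.is_empty(),
"constraints_satisfied=false but no violations reported");
}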
}
}
// P3: result count never drops below min(target_count, candidates.len()).
//
// The diversity selector MUST return at least as many items as the minimum
// of the target count and the candidate count. Diversity is reordering,
// not filtering (INV-RANK-5).
proptest! {
#![proptest_config(ProptestConfig::with_cases(10_000))]
#[test]
fn prop_no_result_count_reduction_when_possible(
candidates in arb_candidate_set(1, 200, 50, TEST_FORMATS),
max_per_creator in 1usize..=5,
target in 1usize..=50,
) {
let constraints = DiversityConstraints {
max_per_creator: Some(max_per_creator),
format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
..DiversityConstraints::none()
};
// Capture the input size before `candidates` is moved into the selector.
let num_candidates = candidates.len();
let expected_count = target.min(num_candidates);
let result = DiversitySelector::select(candidates, &constraints, target);
prop_assert_eq!(result.selected.len(), expected_count,
"expected {} results, got {} (target={}, candidates={})",
expected_count, result.selected.len(), target, num_candidates);
}
}
// P4: graceful degradation under impossible constraints.
//
// When constraints are unsatisfiable (e.g., all items from 1 creator with
// max_per_creator=1 and target > 1), the selector still returns results
// (via relaxation) and reports constraints_satisfied=false with violations.
proptest! {
#![proptest_config(ProptestConfig::with_cases(10_000))]
#[test]
fn prop_graceful_degradation(
num_items in 5usize..=100,
target in 2usize..=20,
) {
// All items from a single creator -- impossible to satisfy max_per_creator=1
let creator = EntityId::new(999);
let candidates: Vec<ScoredCandidate> = (0..num_items).map(|i| {
ScoredCandidate::with_diversity_metadata(
EntityId::new(i as u64),
(num_items - i) as f64 * 0.01,
Some(creator.clone()),
Some("video".to_string()),
)
}).collect();
let constraints = DiversityConstraints {
max_per_creator: Some(1),
..DiversityConstraints::none()
};
let target_count = target.min(num_items);
let result = DiversitySelector::select(candidates, &constraints, target_count);
// Should still return target_count items via relaxation
prop_assert_eq!(result.selected.len(), target_count,
"expected {} items after relaxation, got {}",
target_count, result.selected.len());
// Should report unsatisfied constraints when target > 1
// (with 1 creator, max_per_creator=1, and target > 1)
if target_count > 1 {
prop_assert!(!result.constraints_satisfied,
"constraints should not be satisfied: 1 creator, max_per_creator=1, target={}",
target_count);
prop_assert!(!result.violations.is_empty(),
"violations should be reported");
}
}
}
// P5: score order preserved within same-creator items.
//
// Among selected items from the same creator, they appear in the order
// they were encountered in the input (which is score-descending). This
// means the greedy selector always picks the highest-scored item from
// a creator before picking a lower-scored one.
proptest! {
#![proptest_config(ProptestConfig::with_cases(10_000))]
#[test]
fn prop_score_order_preserved(
candidates in arb_candidate_set(10, 200, 20, TEST_FORMATS),
max_per_creator in 1usize..=5,
target in 5usize..=50,
) {
let constraints = DiversityConstraints {
max_per_creator: Some(max_per_creator),
..DiversityConstraints::none()
};
let target_count = target.min(candidates.len());
let result = DiversitySelector::select(candidates, &constraints, target_count);
// Group selected items by creator
let mut by_creator: std::collections::HashMap<EntityId, Vec<f64>> =
std::collections::HashMap::new();
for c in &result.selected {
if let Some(ref id) = c.creator_id {
by_creator.entry(id.clone()).or_default().push(c.score);
}
}
// Within each creator, scores must be non-increasing
for (creator_id, scores) in &by_creator {
for pair in scores.windows(2) {
prop_assert!(pair[0] >= pair[1],
"creator {:?} has scores out of order: {} < {}",
creator_id, pair[0], pair[1]);
}
}
}
}
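As written, P1 would also pass if the selector always reported constraints_satisfied=false. One way to close that gap is to compute, from the input alone, whether a greedy pass can meet the cap, and assert satisfaction in that case. This is a minimal sketch that assumes the Task 01 selector is greedy in score order (as P5 implies); greedy_capacity is a hypothetical helper name.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Items a greedy, cap-respecting pass can select from the whole input:
/// the sum over creators of min(items_from_creator, max_per_creator).
/// A greedy pass in score order reaches `target` without relaxation
/// exactly when this capacity is >= target.
pub fn greedy_capacity<K, I>(creators: I, max_per_creator: usize) -> usize
where
    K: Hash + Eq,
    I: IntoIterator<Item = K>,
{
    let mut per_creator: HashMap<K, usize> = HashMap::new();
    for creator in creators {
        *per_creator.entry(creator).or_insert(0) += 1;
    }
    per_creator.values().map(|&n| n.min(max_per_creator)).sum()
}
```

In P1 the capacity would be computed before candidates is moved into select, for example from candidates.iter().filter_map(|c| c.creator_id.clone()), followed by prop_assert!(result.constraints_satisfied) whenever the capacity is at least target_count. That relies on every generated candidate carrying a creator, which arb_scored_candidate guarantees.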
Criterion Benchmarks
// === tidal/benches/ranking.rs (additions to existing benchmark file) ===
use tidaldb::ranking::diversity::*;
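/// Setup: create 200 scored candidates with 50 creators and 5 formats.
///
/// Creators come in runs of 4 consecutive items (i / 4) and 7 of every 10
/// items are "video", so both max_per_creator=2 and format_mix=0.6 bind
/// within the first 50 picks. A round-robin layout (i % 50, i % 5) would
/// make the top 50 candidates diverse already and never exercise skipping.
fn setup_diversity_candidates_200() -> Vec<ScoredCandidate> {
let formats = ["video", "article", "short", "podcast", "live"];
(0..200usize).map(|i| {
let creator = EntityId::new((i / 4) as u64 + 1000); // 50 creators, 4 items each
// 7 of every 10 items are "video"; the rest cycle through the other four formats
let format_idx = if i % 10 < 7 { 0 } else { 1 + i % 4 };
let format = formats[format_idx].to_string();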
ScoredCandidate::with_diversity_metadata(
EntityId::new(i as u64),
(200 - i) as f64 * 0.005, // scores from 1.0 to 0.005
Some(creator),
Some(format),
)
}).collect()
}
/// KEY BENCHMARK: 200 candidates, max_per_creator=2, 50 creators -> select 50.
/// Target: < 1ms.
///
/// This is the primary diversity performance gate from the ROADMAP
/// acceptance criteria.
fn bench_diversity_200_candidates_max_per_creator(c: &mut Criterion) {
let candidates = setup_diversity_candidates_200();
let constraints = DiversityConstraints {
max_per_creator: Some(2),
..DiversityConstraints::none()
};
c.bench_function("diversity_200_max_per_creator_2", |b| {
b.iter(|| {
DiversitySelector::select(
candidates.clone(),
&constraints,
50,
)
})
});
}
/// BENCHMARK: 200 candidates, format_mix=0.6, 5 formats -> select 50.
/// Target: < 1ms.
fn bench_diversity_200_candidates_format_mix(c: &mut Criterion) {
let candidates = setup_diversity_candidates_200();
let constraints = DiversityConstraints {
format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
..DiversityConstraints::none()
};
c.bench_function("diversity_200_format_mix", |b| {
b.iter(|| {
DiversitySelector::select(
candidates.clone(),
&constraints,
50,
)
})
});
}
/// BENCHMARK: 200 candidates, both constraints -> select 50.
/// Target: < 1ms.
fn bench_diversity_200_candidates_combined(c: &mut Criterion) {
let candidates = setup_diversity_candidates_200();
let constraints = DiversityConstraints {
max_per_creator: Some(2),
format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
..DiversityConstraints::none()
};
c.bench_function("diversity_200_combined", |b| {
b.iter(|| {
DiversitySelector::select(
candidates.clone(),
&constraints,
50,
)
})
});
}
/// BENCHMARK: worst case -- all candidates from 1 creator, requires relaxation.
/// Target: < 1ms even with relaxation passes.
fn bench_diversity_200_candidates_worst_case_relaxation(c: &mut Criterion) {
let creator = EntityId::new(999);
let candidates: Vec<ScoredCandidate> = (0..200).map(|i| {
ScoredCandidate::with_diversity_metadata(
EntityId::new(i as u64),
(200 - i) as f64 * 0.005,
Some(creator.clone()),
Some("video".to_string()),
)
}).collect();
let constraints = DiversityConstraints {
max_per_creator: Some(2),
format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
..DiversityConstraints::none()
};
c.bench_function("diversity_200_worst_case_relaxation", |b| {
b.iter(|| {
DiversitySelector::select(
candidates.clone(),
&constraints,
50,
)
})
});
}
/// BENCHMARK: no constraints (fast path).
fn bench_diversity_200_candidates_no_constraints(c: &mut Criterion) {
let candidates = setup_diversity_candidates_200();
let constraints = DiversityConstraints::none();
c.bench_function("diversity_200_no_constraints", |b| {
b.iter(|| {
DiversitySelector::select(
candidates.clone(),
&constraints,
50,
)
})
});
}
// Add to the existing criterion_group!:
//
// criterion_group!(
// benches,
// // ... existing m2p3 benchmarks ...
// bench_diversity_200_candidates_max_per_creator,
// bench_diversity_200_candidates_format_mix,
// bench_diversity_200_candidates_combined,
// bench_diversity_200_candidates_worst_case_relaxation,
// bench_diversity_200_candidates_no_constraints,
// );
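The benchmarks above only measure constraint cost if the setup makes the constraint bind. A quick check is to count how many candidates a greedy cap-respecting pass must skip before it has picked 50. The sketch below is std-only and hypothetical (skips_before_target, round_robin and runs_of are not part of tidaldb); it compares a round-robin creator assignment (i % 50) with runs of four items per creator (i / 4).

```rust
use std::collections::HashMap;

/// Walks `creators` in input (score-descending) order, picking an item unless
/// its creator is already at `cap`, and returns how many items were skipped
/// before `target` items had been picked.
pub fn skips_before_target(creators: &[u64], cap: usize, target: usize) -> usize {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    let (mut picked, mut skipped) = (0, 0);
    for &creator in creators {
        if picked == target {
            break;
        }
        let count = counts.entry(creator).or_insert(0);
        if *count < cap {
            *count += 1;
            picked += 1;
        } else {
            skipped += 1;
        }
    }
    skipped
}

/// Round-robin layout: item i belongs to creator i % creators.
pub fn round_robin(n: u64, creators: u64) -> Vec<u64> {
    (0..n).map(|i| i % creators).collect()
}

/// Run layout: item i belongs to creator i / run, so each creator owns `run` consecutive items.
pub fn runs_of(n: u64, run: u64) -> Vec<u64> {
    (0..n).map(|i| i / run).collect()
}
```

With 200 items, a cap of 2 and a target of 50, the round-robin layout skips nothing (the first 50 items already have 50 distinct creators), while runs of four force 48 skips, so only the latter exercises the constraint path.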
Acceptance Criteria
- Property test P1 (prop_max_per_creator_holds): for 10,000 random candidate sets, max_per_creator is never exceeded when constraints are satisfied
- Property test P2 (prop_format_mix_holds): for 10,000 random candidate sets, no format exceeds 60% when constraints are satisfied
- Property test P3 (prop_no_result_count_reduction_when_possible): for 10,000 random candidate sets, result count equals min(target_count, candidates.len()) -- diversity never reduces count
- Property test P4 (prop_graceful_degradation): for 10,000 random single-creator candidate sets, the selector returns the target count via relaxation and reports constraints_satisfied=false
- Property test P5 (prop_score_order_preserved): for 10,000 random candidate sets, items from the same creator appear in score-descending order
- Criterion benchmark diversity_200_max_per_creator_2: 200 candidates, 50 creators, max_per_creator=2, select 50 in < 1ms
- Criterion benchmark diversity_200_format_mix: 200 candidates, 5 formats, format_mix=0.6, select 50 in < 1ms
- Criterion benchmark diversity_200_combined: 200 candidates, both constraints, select 50 in < 1ms
- Criterion benchmark diversity_200_worst_case_relaxation: 200 candidates from 1 creator, both constraints, select 50 in < 1ms (measures relaxation overhead)
- Criterion benchmark diversity_200_no_constraints: 200 candidates, no constraints (fast path), select 50 (baseline measurement)
- Benchmarks added to the existing tidal/benches/ranking.rs and registered in criterion_group!
- proptest already in [dev-dependencies] from m1p4; no new dependency needed
- cargo clippy -- -D warnings passes
- All property tests and benchmarks pass
Spec References
- docs/specs/09-ranking-scoring.md -- Section 9.5 (INV-RANK-5: diversity never reduces result count), Section 15 (Performance targets: diversity pass < 1ms for 200 candidates)
Implementation Notes
- proptest is already a [dev-dependencies] entry from the m1p4 property tests, so no new dependency is needed. criterion is already configured for tidal/benches/ranking.rs from m2p3; the diversity benchmarks are added to the same file and registered in the existing criterion_group!.
- The arb_scored_candidate strategy draws scores from prop::num::f64::POSITIVE. Used alone, that strategy yields positive normal values (finite and strictly greater than zero), which still span very small and very large magnitudes. This is intentional: the selector should handle any score distribution.
- The candidate deduplication in arb_candidate_set (retain unique entity IDs) is necessary because the selector may use entity IDs as keys in internal data structures. Duplicate entity IDs in the input would be a caller bug, not something the selector needs to handle. Deduplication can also shrink a generated set below min_size, which is why each test derives its expected count from candidates.len() rather than from the strategy bounds.
- The + 0.01 tolerance in P2 is a guard against floating-point rounding in the fraction computation. Exact boundary cases do not need it: IEEE-754 division is correctly rounded, so 6 items out of 10 gives 6.0 / 10.0, the same double as the literal 0.6. The tolerance also has a cost: at some result sizes it admits a one-item overrun (29 of 48 items is about 0.604, which exceeds 60% but passes the check). If Task 01 defines the cap as "at most floor(0.6 * n) items of one format", an exact integer comparison is the stricter assertion.
- The candidates.clone() in benchmarks measures the clone cost as part of the benchmark. This is intentional: in production, the diversity selector owns the candidate vec (it is consumed, not borrowed), and the clone simulates the real allocation pattern. If the clone dominates benchmark time, switch to iter_batched with a setup closure.
- P3 is the critical property test: it checks INV-RANK-5 (diversity never reduces result count). If this test fails, the relaxation logic has a bug. This is the acceptance gate for the phase.
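The arithmetic behind the P2 tolerance is easy to check directly: 6.0 / 10.0 equals the literal 0.6, while 29 / 48 (about 0.604) exceeds 0.6 yet passes a 0.6 + 0.01 check. This is a minimal sketch of an exact alternative. It assumes the cap means at most floor(fraction * n) items of one format, and that the fraction can be written as a ratio num / den (0.6 = 3 / 5). within_format_cap is a hypothetical helper, not part of tidaldb.

```rust
/// Exact cap check for "count out of total does not exceed num / den".
/// Cross-multiplying keeps everything in integers, so no epsilon is needed:
/// count / total <= num / den  <=>  count * den <= num * total  (total, den > 0).
pub fn within_format_cap(count: u64, total: u64, num: u64, den: u64) -> bool {
    count * den <= num * total
}
```

In P2 the assertion would become prop_assert!(within_format_cap(*count as u64, result.selected.len() as u64, 3, 5)). Whether that strictness is right depends on how Task 01 rounds the cap.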