jordan 6fdaa1584b feat: complete M1 signal engine — m0p3 samples/docs, m1p5 TidalDb API, examples, and periodic checkpoint

- m0p3: CONTRIBUTING.md with run-samples checklist, all 4 examples
  (quickstart, cli_embedding, axum_embedding, actix_embedding), doc-test
  coverage for every public API surface
- m1p5: TidalDb public API — write_item, signal, read_decay_score,
  read_windowed_count, read_velocity; StorageBox enum routing memory vs
  fjall; WalSender/WalHandleWriter bridge; WAL replay on open
- Periodic checkpoint: 30s background thread for persistent+schema mode;
  FjallBackend::Clone (O(1), fjall::Keyspace is ref-counted); graceful
  shutdown via Arc<AtomicBool> + join before final checkpoint
- ROADMAP.md: M0 and M1 fully marked COMPLETE (341 tests passing)
- Milestone 2 planning scaffolding added under docs/planning/milestone-2/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-20 22:45:10 -07:00

19 KiB

Task 02: Diversity Property Tests + Benchmarks

Context

Milestone: 2 -- Ranked Retrieval
Phase: m2p4 -- Diversity Enforcement
Depends On: Task 01 (DiversityConstraints, DiversityResult, DiversitySelector, ConstraintViolation, ScoredCandidate with diversity metadata)
Blocks: m2p5 (RETRIEVE executor relies on diversity enforcement being correct and performant)
Complexity: S

Objective

Deliver property-based tests (proptest) that verify diversity constraints hold across 10,000 random candidate sets, and Criterion benchmarks that verify the diversity pass completes in under 1ms for 200 candidates. The property tests are the core acceptance gate for this phase -- they prove that no matter what the scored candidate list looks like, the diversity selector either satisfies all constraints or correctly reports which constraints it could not satisfy.

The property tests must cover:

max_per_creator never exceeded in the selected set (when satisfiable)
format_mix fraction never exceeded (when satisfiable)
Result count never drops below min(target_count, candidates.len())
Graceful degradation when constraints are unsatisfiable
Score order preserved within same-creator groups

The benchmarks must demonstrate:

200 candidates with max_per_creator=2 and 50 creators: select 50 in under 1ms
200 candidates with format_mix=0.6 and 5 formats: select 50 in under 1ms

Requirements

5 proptest property tests covering the diversity invariants
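5 Criterion benchmarks: the two gates above plus combined-constraint, worst-case-relaxation, and no-constraint baseline cases; every constrained case must meet the < 1ms target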
All property tests use ProptestConfig with cases = 10_000
Benchmarks added to the existing tidal/benches/ranking.rs file
No unsafe code

Technical Design

Module Structure

tidal/src/ranking/
  diversity.rs       -- property tests added to existing #[cfg(test)] mod at bottom (Task 02)
tidal/benches/
  ranking.rs         -- Criterion benchmarks for diversity (Task 02)

Property Tests

// === ranking/diversity.rs (added to #[cfg(test)] module) ===

use proptest::prelude::*;

/// Strategy to generate a random scored candidate with diversity metadata.
fn arb_scored_candidate(
    max_entity_id: u64,
    max_creator_id: u64,
    formats: &'static [&'static str],
) -> impl Strategy<Value = ScoredCandidate> {
    (
        0..max_entity_id,
        prop::num::f64::POSITIVE,  // finite, non-negative: positive normals, subnormals, and +0.0
        0..max_creator_id,
        prop::sample::select(formats),
    ).prop_map(|(entity, score, creator, format)| {
        ScoredCandidate::with_diversity_metadata(
            EntityId::new(entity),
            score,
            Some(EntityId::new(creator + 1000)), // offset to distinguish from entity IDs
            Some(format.to_string()),
        )
    })
}

/// Strategy to generate a Vec of scored candidates, sorted by score descending.
fn arb_candidate_set(
    min_size: usize,
    max_size: usize,
    max_creators: u64,
    formats: &'static [&'static str],
) -> impl Strategy<Value = Vec<ScoredCandidate>> {
    prop::collection::vec(
        arb_scored_candidate(10_000, max_creators, formats),
        min_size..=max_size,
    ).prop_map(|mut candidates| {
        // Deduplicate by entity_id (keep first occurrence)
        let mut seen = std::collections::HashSet::new();
        candidates.retain(|c| seen.insert(c.entity_id.clone()));
        // Sort by score descending (selector expects sorted input)
        candidates.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal));
        candidates
    })
}

const TEST_FORMATS: &[&str] = &["video", "article", "short", "podcast", "live"];

// P1: max_per_creator is never exceeded in the selected set.
//
// When the candidate set has enough unique creators to satisfy the constraint
// at the requested target_count, every creator in the result set has at most
// max_per_creator items. When it cannot be satisfied, constraints_satisfied
// is false.
proptest! {
    #![proptest_config(ProptestConfig::with_cases(10_000))]
    #[test]
    fn prop_max_per_creator_holds(
        candidates in arb_candidate_set(10, 200, 50, TEST_FORMATS),
        max_per_creator in 1usize..=5,
        target in 5usize..=50,
    ) {
        let constraints = DiversityConstraints {
            max_per_creator: Some(max_per_creator),
            ..DiversityConstraints::none()
        };
        let target_count = target.min(candidates.len());
        let result = DiversitySelector::select(candidates, &constraints, target_count);

        if result.constraints_satisfied {
            // When satisfied, no creator exceeds the limit
            let mut creator_counts: std::collections::HashMap<EntityId, usize> =
                std::collections::HashMap::new();
            for c in &result.selected {
                if let Some(ref id) = c.creator_id {
                    *creator_counts.entry(id.clone()).or_insert(0) += 1;
                }
            }
            for (creator_id, count) in &creator_counts {
                prop_assert!(*count <= max_per_creator,
                    "creator {:?} has {} items, max is {}",
                    creator_id, count, max_per_creator);
            }
        } else {
            // When not satisfied, violations are reported
            prop_assert!(!result.violations.is_empty(),
                "constraints_satisfied=false but no violations reported");
        }
    }
}

// P2: format_mix fraction is never exceeded in the selected set.
//
// When format_mix is enforced, no single format exceeds 60% of the result set
// (for result sets of size >= 5). When it cannot be satisfied, a violation is
// reported.
proptest! {
    #![proptest_config(ProptestConfig::with_cases(10_000))]
    #[test]
    fn prop_format_mix_holds(
        candidates in arb_candidate_set(10, 200, 50, TEST_FORMATS),
        target in 5usize..=50,
    ) {
        let constraints = DiversityConstraints {
            format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
            ..DiversityConstraints::none()
        };
        let target_count = target.min(candidates.len());
        let result = DiversitySelector::select(candidates, &constraints, target_count);

        if result.constraints_satisfied && result.selected.len() >= 5 {
            let mut format_counts: std::collections::HashMap<&str, usize> =
                std::collections::HashMap::new();
            for c in &result.selected {
                if let Some(ref fmt) = c.format {
                    *format_counts.entry(fmt.as_str()).or_insert(0) += 1;
                }
            }
            let total = result.selected.len() as f64;
            for (fmt, count) in &format_counts {
                let fraction = *count as f64 / total;
                prop_assert!(fraction <= DEFAULT_FORMAT_MIX_MAX_FRACTION + 0.01,
                    "format '{}' has fraction {:.3}, exceeds {:.2}",
                    fmt, fraction, DEFAULT_FORMAT_MIX_MAX_FRACTION);
            }
        } else if !result.constraints_satisfied {
            // When not satisfied, violations are reported (same check as P1)
            prop_assert!(!result.violations.is_empty(),
                "constraints_satisfied=false but no violations reported");
        }
    }
}

// P3: result count never drops below min(target_count, candidates.len()).
//
// The diversity selector MUST return exactly min(target_count, candidates.len())
// items. Diversity reorders candidates; it never filters them out
// (INV-RANK-5).
proptest! {
    #![proptest_config(ProptestConfig::with_cases(10_000))]
    #[test]
    fn prop_no_result_count_reduction_when_possible(
        candidates in arb_candidate_set(1, 200, 50, TEST_FORMATS),
        max_per_creator in 1usize..=5,
        target in 1usize..=50,
    ) {
        let constraints = DiversityConstraints {
            max_per_creator: Some(max_per_creator),
            format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
            ..DiversityConstraints::none()
        };
        let candidate_count = candidates.len();
        let expected_count = target.min(candidate_count);
        let result = DiversitySelector::select(candidates, &constraints, target);

        prop_assert_eq!(result.selected.len(), expected_count,
            "expected {} results, got {} (target={}, candidates={})",
            expected_count, result.selected.len(), target, candidate_count);
    }
}

// P4: graceful degradation under impossible constraints.
//
// When constraints are unsatisfiable (e.g., all items from 1 creator with
// max_per_creator=1 and target > 1), the selector still returns results
// (via relaxation) and reports constraints_satisfied=false with violations.
proptest! {
    #![proptest_config(ProptestConfig::with_cases(10_000))]
    #[test]
    fn prop_graceful_degradation(
        num_items in 5usize..=100,
        target in 2usize..=20,
    ) {
        // All items from a single creator -- impossible to satisfy max_per_creator=1
        let creator = EntityId::new(999);
        let candidates: Vec<ScoredCandidate> = (0..num_items).map(|i| {
            ScoredCandidate::with_diversity_metadata(
                EntityId::new(i as u64),
                (num_items - i) as f64 * 0.01,
                Some(creator.clone()),
                Some("video".to_string()),
            )
        }).collect();

        let constraints = DiversityConstraints {
            max_per_creator: Some(1),
            ..DiversityConstraints::none()
        };
        let target_count = target.min(num_items);
        let result = DiversitySelector::select(candidates, &constraints, target_count);

        // Should still return target_count items via relaxation
        prop_assert_eq!(result.selected.len(), target_count,
            "expected {} items after relaxation, got {}",
            target_count, result.selected.len());

        // Should report unsatisfied constraints when target > 1
        // (with 1 creator, max_per_creator=1, and target > 1)
        if target_count > 1 {
            prop_assert!(!result.constraints_satisfied,
                "constraints should not be satisfied: 1 creator, max_per_creator=1, target={}",
                target_count);
            prop_assert!(!result.violations.is_empty(),
                "violations should be reported");
        }
    }
}

// P5: score order preserved within same-creator items.
//
// Selected items from the same creator appear in the order they were
// encountered in the input (which is score-descending). In other words, the
// greedy selector always picks a creator's highest-scored item before a
// lower-scored one.
proptest! {
    #![proptest_config(ProptestConfig::with_cases(10_000))]
    #[test]
    fn prop_score_order_preserved(
        candidates in arb_candidate_set(10, 200, 20, TEST_FORMATS),
        max_per_creator in 1usize..=5,
        target in 5usize..=50,
    ) {
        let constraints = DiversityConstraints {
            max_per_creator: Some(max_per_creator),
            ..DiversityConstraints::none()
        };
        let target_count = target.min(candidates.len());
        let result = DiversitySelector::select(candidates, &constraints, target_count);

        // Group selected items by creator
        let mut by_creator: std::collections::HashMap<EntityId, Vec<f64>> =
            std::collections::HashMap::new();
        for c in &result.selected {
            if let Some(ref id) = c.creator_id {
                by_creator.entry(id.clone()).or_default().push(c.score);
            }
        }

        // Within each creator, scores must be non-increasing
        for (creator_id, scores) in &by_creator {
            for pair in scores.windows(2) {
                prop_assert!(pair[0] >= pair[1],
                    "creator {:?} has scores out of order: {} < {}",
                    creator_id, pair[0], pair[1]);
            }
        }
    }
}
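The five properties presuppose the greedy-with-relaxation shape of the Task 01 selector (greedy pass in score order, then a relaxation pass when constraints leave the result short). For reviewers without Task 01 at hand, here is a minimal std-only sketch of that shape for max_per_creator alone. Candidate, Selection, and select_max_per_creator are hypothetical stand-ins, not the DiversitySelector API; the real selector also handles format_mix, optional metadata, and violation reporting.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for ScoredCandidate (creator is required here;
/// the Task 01 type uses Option<EntityId> and also carries a format).
#[derive(Clone, Debug, PartialEq)]
struct Candidate {
    entity_id: u64,
    score: f64,
    creator_id: u64,
}

/// Hypothetical stand-in for DiversityResult (no violation list).
#[derive(Debug)]
struct Selection {
    selected: Vec<Candidate>,
    constraints_satisfied: bool,
}

/// Greedy pass, then relaxation pass, over score-descending `candidates`.
fn select_max_per_creator(
    candidates: Vec<Candidate>,
    max_per_creator: usize,
    target_count: usize,
) -> Selection {
    let target = target_count.min(candidates.len());
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut selected = Vec::with_capacity(target);
    let mut deferred = Vec::new();

    // Pass 1: walk in score order, accepting an item only while its
    // creator is under the cap; over-cap items are set aside in order.
    for c in candidates {
        if selected.len() == target {
            break;
        }
        let count = per_creator.entry(c.creator_id).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            selected.push(c);
        } else {
            deferred.push(c);
        }
    }

    // The cap held only if pass 1 alone reached the target.
    let constraints_satisfied = selected.len() == target;

    // Pass 2 (relaxation): back-fill from the set-aside items, best first,
    // so the result size is always min(target_count, candidates.len()).
    for c in deferred {
        if selected.len() == target {
            break;
        }
        selected.push(c);
    }

    Selection { selected, constraints_satisfied }
}

fn main() {
    // P4 scenario: one creator, cap of 1, target of 3.
    let candidates: Vec<Candidate> = (0..5u64)
        .map(|i| Candidate { entity_id: i, score: (5 - i) as f64 * 0.01, creator_id: 999 })
        .collect();
    let result = select_max_per_creator(candidates, 1, 3);
    let picked: Vec<(u64, f64)> =
        result.selected.iter().map(|c| (c.entity_id, c.score)).collect();
    println!("{picked:?} satisfied={}", result.constraints_satisfied);
}
```

In this sketch relaxed items are appended after the greedy picks, so the output is not globally score-ordered. P5 only requires per-creator order, which holds because each creator's deferred items come after its accepted ones in input order.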

Criterion Benchmarks

// === tidal/benches/ranking.rs (additions to existing benchmark file) ===

use tidaldb::ranking::diversity::*;

/// Setup: create 200 scored candidates with 50 creators and mixed formats.
fn setup_diversity_candidates_200() -> Vec<ScoredCandidate> {
    let formats = ["video", "article", "short", "podcast", "live"];
    (0..200).map(|i| {
        let creator = EntityId::new((i % 50) as u64 + 1000);
        let format = formats[i % formats.len()].to_string();
        ScoredCandidate::with_diversity_metadata(
            EntityId::new(i as u64),
            (200 - i) as f64 * 0.005, // scores from 1.0 to 0.005
            Some(creator),
            Some(format),
        )
    }).collect()
}

/// KEY BENCHMARK: 200 candidates, max_per_creator=2, 50 creators -> select 50.
/// Target: < 1ms.
///
/// This is the primary diversity performance gate from the ROADMAP
/// acceptance criteria.
fn bench_diversity_200_candidates_max_per_creator(c: &mut Criterion) {
    let candidates = setup_diversity_candidates_200();
    let constraints = DiversityConstraints {
        max_per_creator: Some(2),
        ..DiversityConstraints::none()
    };

    c.bench_function("diversity_200_max_per_creator_2", |b| {
        b.iter(|| {
            DiversitySelector::select(
                candidates.clone(),
                &constraints,
                50,
            )
        })
    });
}

/// BENCHMARK: 200 candidates, format_mix=0.6, 5 formats -> select 50.
/// Target: < 1ms.
fn bench_diversity_200_candidates_format_mix(c: &mut Criterion) {
    let candidates = setup_diversity_candidates_200();
    let constraints = DiversityConstraints {
        format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
        ..DiversityConstraints::none()
    };

    c.bench_function("diversity_200_format_mix", |b| {
        b.iter(|| {
            DiversitySelector::select(
                candidates.clone(),
                &constraints,
                50,
            )
        })
    });
}

/// BENCHMARK: 200 candidates, both constraints -> select 50.
/// Target: < 1ms.
fn bench_diversity_200_candidates_combined(c: &mut Criterion) {
    let candidates = setup_diversity_candidates_200();
    let constraints = DiversityConstraints {
        max_per_creator: Some(2),
        format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
        ..DiversityConstraints::none()
    };

    c.bench_function("diversity_200_combined", |b| {
        b.iter(|| {
            DiversitySelector::select(
                candidates.clone(),
                &constraints,
                50,
            )
        })
    });
}

/// BENCHMARK: worst case -- all candidates from 1 creator, requires relaxation.
/// Target: < 1ms even with relaxation passes.
fn bench_diversity_200_candidates_worst_case_relaxation(c: &mut Criterion) {
    let creator = EntityId::new(999);
    let candidates: Vec<ScoredCandidate> = (0..200).map(|i| {
        ScoredCandidate::with_diversity_metadata(
            EntityId::new(i as u64),
            (200 - i) as f64 * 0.005,
            Some(creator.clone()),
            Some("video".to_string()),
        )
    }).collect();

    let constraints = DiversityConstraints {
        max_per_creator: Some(2),
        format_mix_max_fraction: Some(DEFAULT_FORMAT_MIX_MAX_FRACTION),
        ..DiversityConstraints::none()
    };

    c.bench_function("diversity_200_worst_case_relaxation", |b| {
        b.iter(|| {
            DiversitySelector::select(
                candidates.clone(),
                &constraints,
                50,
            )
        })
    });
}

/// BENCHMARK: no constraints (fast path).
fn bench_diversity_200_candidates_no_constraints(c: &mut Criterion) {
    let candidates = setup_diversity_candidates_200();
    let constraints = DiversityConstraints::none();

    c.bench_function("diversity_200_no_constraints", |b| {
        b.iter(|| {
            DiversitySelector::select(
                candidates.clone(),
                &constraints,
                50,
            )
        })
    });
}

// Add to the existing criterion_group!:
//
// criterion_group!(
//     benches,
//     // ... existing m2p3 benchmarks ...
//     bench_diversity_200_candidates_max_per_creator,
//     bench_diversity_200_candidates_format_mix,
//     bench_diversity_200_candidates_combined,
//     bench_diversity_200_candidates_worst_case_relaxation,
//     bench_diversity_200_candidates_no_constraints,
// );
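As written, setup_diversity_candidates_200 gives candidate i creator i % 50 and format i % 5, so the 50 highest-scored candidates (i = 0..49) come from 50 distinct creators with 10 items per format. Neither max_per_creator=2 nor format_mix=0.6 ever forces the selector to skip a candidate, so the first three benchmarks time the no-conflict path; only the worst-case benchmark exercises relaxation. The std-only sketch below checks that arithmetic. top_k_pressure is a hypothetical helper, and the clustered layout (creator i / 4, one format on 4 of every 5 items) is an illustrative alternative, not part of this task.

```rust
use std::collections::HashMap;

/// Over the first `k` score-ranked candidates, returns the largest number of
/// items sharing one creator and the largest number sharing one format.
/// `layout(i)` returns (creator, format) for the i-th candidate.
fn top_k_pressure(k: usize, layout: impl Fn(usize) -> (usize, usize)) -> (usize, usize) {
    let mut per_creator: HashMap<usize, usize> = HashMap::new();
    let mut per_format: HashMap<usize, usize> = HashMap::new();
    for i in 0..k {
        let (creator, format) = layout(i);
        *per_creator.entry(creator).or_insert(0) += 1;
        *per_format.entry(format).or_insert(0) += 1;
    }
    let max_creator = per_creator.values().copied().max().unwrap_or(0);
    let max_format = per_format.values().copied().max().unwrap_or(0);
    (max_creator, max_format)
}

fn main() {
    // Layout used by setup_diversity_candidates_200: creator i % 50, format i % 5.
    let current = top_k_pressure(50, |i| (i % 50, i % 5));
    // Hypothetical clustered layout: 4 consecutive items per creator,
    // and format 0 on 4 of every 5 items.
    let clustered = top_k_pressure(50, |i| (i / 4, usize::from(i % 5 == 0)));
    // current stays under both caps (2 per creator, 30 of 50 per format);
    // clustered exceeds both, so the selector would have to skip items.
    println!("current {current:?}, clustered {clustered:?}");
}
```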

Acceptance Criteria

Property test P1 (prop_max_per_creator_holds): for 10,000 random candidate sets, max_per_creator is never exceeded when constraints are satisfied
Property test P2 (prop_format_mix_holds): for 10,000 random candidate sets, no format exceeds 60% when constraints are satisfied
Property test P3 (prop_no_result_count_reduction_when_possible): for 10,000 random candidate sets, result count equals min(target_count, candidates.len()) -- diversity never reduces count
Property test P4 (prop_graceful_degradation): for 10,000 random single-creator candidate sets, selector returns target count via relaxation and reports constraints_satisfied=false
Property test P5 (prop_score_order_preserved): for 10,000 random candidate sets, items from the same creator appear in score-descending order
Criterion benchmark diversity_200_max_per_creator_2: 200 candidates, 50 creators, max_per_creator=2, select 50 in < 1ms
Criterion benchmark diversity_200_format_mix: 200 candidates, 5 formats, format_mix=0.6, select 50 in < 1ms
Criterion benchmark diversity_200_combined: 200 candidates, both constraints, select 50 in < 1ms
Criterion benchmark diversity_200_worst_case_relaxation: 200 candidates from 1 creator, both constraints, select 50 in < 1ms (measures relaxation overhead)
Criterion benchmark diversity_200_no_constraints: 200 candidates, no constraints (fast path), select 50 (baseline measurement)
Benchmarks added to existing tidal/benches/ranking.rs and registered in criterion_group!
proptest already in [dev-dependencies] from m1p4; no new dependency needed
cargo clippy -- -D warnings passes
All property tests and benchmarks pass

Spec References

docs/specs/09-ranking-scoring.md -- Section 9.5 (INV-RANK-5: diversity never reduces result count), Section 15 (Performance targets: diversity pass < 1ms for 200 candidates)

Implementation Notes

proptest is already a [dev-dependencies] entry from m1p4 property tests. No new dependency needed.
criterion is already configured for tidal/benches/ranking.rs from m2p3. The diversity benchmarks are added to the same file and registered in the existing criterion_group!.
The arb_scored_candidate strategy draws scores from prop::num::f64::POSITIVE, which on its own covers finite non-negative values: very large normals, very small normals, subnormals, and +0.0 (no infinities or NaN). This is intentional -- the selector should handle any finite score distribution.
The candidate deduplication in arb_candidate_set (retain unique entity IDs) is necessary because the selector may use entity IDs as keys in internal data structures. Duplicate entity IDs in the input would be a caller bug, not something the selector needs to handle.
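The format fraction tolerance in P2 (+ 0.01) is not needed for floating-point exactness at the boundary: IEEE-754 division is correctly rounded, so 6 of 10 items gives 6.0 / 10.0 == 0.6 as doubles. Its real effect is to admit one item over the 60% line for some result sizes (25 of 41 is about 0.6098), which tolerates a selector that rounds the per-format cap up. If Task 01 computes the cap as floor(fraction * n), drop the tolerance or compare in integers.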
The candidates.clone() in benchmarks measures the clone cost as part of the benchmark. This is intentional: in production, the diversity selector owns the candidate vec (it is consumed, not borrowed). The clone simulates the real allocation pattern. If the clone dominates benchmark time, switch to iter_batched with a setup closure.
P3 is the critical property test -- it proves INV-RANK-5 (diversity never reduces result count). If this test fails, the relaxation logic has a bug. This is the acceptance gate for the phase.
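The arithmetic behind the P2 tolerance can be checked directly with f64. This std-only sketch assumes DEFAULT_FORMAT_MIX_MAX_FRACTION is 0.6 (the value P2's comment and the benchmarks use); within_cap is a hypothetical helper showing an exact integer alternative, not part of Task 01.

```rust
/// Fraction of the result set held by one format, computed as P2 does.
fn fraction(count: usize, total: usize) -> f64 {
    count as f64 / total as f64
}

/// Exact integer form of `count / total <= num / den` (no floating point).
fn within_cap(count: usize, total: usize, num: usize, den: usize) -> bool {
    count * den <= total * num
}

fn main() {
    // Assumed value of DEFAULT_FORMAT_MIX_MAX_FRACTION.
    const MAX_FRACTION: f64 = 0.6;

    // IEEE-754 division is correctly rounded, so 6.0 / 10.0 is the same
    // double as the literal 0.6 and the boundary case needs no epsilon.
    assert_eq!(fraction(6, 10), MAX_FRACTION);

    // The +0.01 slack does let one extra item through at some sizes:
    // 25 of 41 is about 0.6098, above 0.6 but below 0.6 + 0.01.
    let f = fraction(25, 41);
    assert!(f > MAX_FRACTION && f <= MAX_FRACTION + 0.01);

    // The integer form (3/5 cap) accepts 6 of 10 and rejects 25 of 41.
    assert!(within_cap(6, 10, 3, 5));
    assert!(!within_cap(25, 41, 3, 5));
    println!("25/41 = {f:.4}");
}
```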
