jordan 213b8efcca feat: complete M6-M7 + Enterprise Readiness milestones; split oversized source files per CODING_GUIDELINES §9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-23 22:41:16 -07:00

Task 01: Scale Benchmark Suite

Delivers

A Criterion benchmark suite operating at 1M items / 100K users / 10K creators that establishes performance baselines for RETRIEVE (for_you, trending, new with a category filter), SEARCH (text-only, text with a category filter), and signal write throughput. All subsequent m7p3 tasks use these baselines to measure the impact of their optimizations.

Complexity

Dependencies

m7p1 complete (TidalDb API, entity writes, signal writes, RETRIEVE, SEARCH all operational)
Existing bench files: tidal/benches/query.rs, tidal/benches/search.rs, tidal/benches/signals.rs

Technical Design

1. New bench target: `tidal/benches/scale.rs`

[[bench]]
name = "scale"
harness = false

2. Shared setup harness

The 1M-item universe takes minutes to construct. Build it once per bench run in a static LazyLock (or std::sync::OnceLock) so that all bench functions in every group share the same populated TidalDb instance.

#![allow(clippy::unwrap_used, clippy::cast_precision_loss)]

use std::collections::HashMap;
use std::sync::LazyLock;
use std::time::Duration;

// `BenchmarkId` is not imported: no benchmark in this file is parameterized.
use criterion::{
    Criterion, SamplingMode, black_box, criterion_group, criterion_main,
};
use tidaldb::TidalDb;
use tidaldb::query::retrieve::Retrieve;
use tidaldb::query::search::Search;
use tidaldb::ranking::diversity::DiversityConstraints;
use tidaldb::schema::{
    DecaySpec, EntityId, EntityKind, SchemaBuilder, TextFieldDef,
    TextFieldType, Timestamp, Window,
};
use tidaldb::storage::indexes::filter::FilterExpr;

const ITEM_COUNT: u64 = 1_000_000;
// Not yet used: `build_scale_db` below does not create users.
#[allow(dead_code)]
const USER_COUNT: u64 = 100_000;
const CREATOR_COUNT: u64 = 10_000;

fn scale_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();
    for sig in &["view", "like", "share", "skip", "completion"] {
        let _ = builder
            .signal(
                sig,
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[Window::OneHour, Window::TwentyFourHours, Window::SevenDays])
            .velocity(true)
            .add();
    }
    builder.text_field("title", TextFieldType::Text);
    builder.text_field("description", TextFieldType::Text);
    builder.text_field("category", TextFieldType::Keyword);
    builder.build().unwrap()
}

/// Build a TidalDb with 1M items, signal data, and text fields.
///
/// Item distribution:
/// - 1M items, each assigned to one of 10K creators (100 items per creator)
/// - category: cycling through 20 categories
/// - title/description: varied vocabulary for realistic BM25 IDF
/// - 10% of items have view signals, 5% have like signals
/// - Embeddings: planned as 128D random unit vectors for ANN (not 1536D;
///   1M x 1536 x 4 bytes is ~5.7 GiB for vectors alone, while 128D is
///   ~0.5 GiB and sufficient for benchmark fidelity). This function does
///   not write embeddings yet.
fn build_scale_db() -> TidalDb {
    let db = TidalDb::builder()
        .ephemeral()
        .with_schema(scale_schema())
        .open()
        .unwrap();

    let categories = [
        "music", "programming", "cooking", "sports", "science",
        "art", "travel", "history", "math", "philosophy",
        "gaming", "fitness", "photography", "writing", "design",
        "finance", "health", "education", "nature", "technology",
    ];

    let ts = Timestamp::now();

    for i in 0..ITEM_COUNT {
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Item {i} tutorial guide"));
        meta.insert(
            "description".to_string(),
            format!("A comprehensive guide about topic {} with examples", i % 500),
        );
        let cat = categories[(i % 20) as usize];
        meta.insert("category".to_string(), cat.to_string());
        meta.insert("creator_id".to_string(), (i % CREATOR_COUNT).to_string());

        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();

        // 10% of items get view signals (spread across the corpus)
        if i % 10 == 0 {
            db.signal("view", EntityId::new(i), 1.0, ts).unwrap();
        }
        // 5% get like signals
        if i % 20 == 0 {
            db.signal("like", EntityId::new(i), 1.0, ts).unwrap();
        }
    }

    // Wait for text syncer to commit, then reload
    std::thread::sleep(Duration::from_secs(3));
    db.reload_text_index().unwrap();

    db
}

static SCALE_DB: LazyLock<TidalDb> = LazyLock::new(build_scale_db);
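
The distribution described in the doc comment above can be checked without building a TidalDb. The sketch below replays the same modulo rules over plain integers; `CorpusStats` and `corpus_stats` are hypothetical helper names, not part of tidaldb.

```rust
/// Same category list, in the same order, as `build_scale_db`.
const CATEGORIES: [&str; 20] = [
    "music", "programming", "cooking", "sports", "science",
    "art", "travel", "history", "math", "philosophy",
    "gaming", "fitness", "photography", "writing", "design",
    "finance", "health", "education", "nature", "technology",
];

/// Counts derived from the modulo rules used during setup (hypothetical helper type).
#[derive(Debug, Default, PartialEq)]
struct CorpusStats {
    programming_items: u64,
    viewed_items: u64,
    liked_items: u64,
    viewed_programming_items: u64,
    items_for_creator_zero: u64,
}

/// Replays the assignment rules of `build_scale_db` without touching a database.
fn corpus_stats(item_count: u64, creator_count: u64) -> CorpusStats {
    let mut stats = CorpusStats::default();
    for i in 0..item_count {
        let category = CATEGORIES[(i % CATEGORIES.len() as u64) as usize];
        let viewed = i % 10 == 0; // 10% of items get a view signal
        let liked = i % 20 == 0; // 5% of items get a like signal
        if category == "programming" {
            stats.programming_items += 1;
        }
        if viewed {
            stats.viewed_items += 1;
        }
        if liked {
            stats.liked_items += 1;
        }
        if viewed && category == "programming" {
            stats.viewed_programming_items += 1;
        }
        if i % creator_count == 0 {
            stats.items_for_creator_zero += 1;
        }
    }
    stats
}
```

At 1M items and 10K creators this gives 50,000 "programming" items (5% selectivity) and 100 items per creator. It also shows a side effect of using `i % 10` and `i % 20` for signals alongside `i % 20` for categories: every viewed item lands in "music" or "gaming", and no "programming" item carries a signal. If signal-ranked, category-filtered queries are ever benchmarked on this corpus, the signal stride may need to be coprime with the category count.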

3. RETRIEVE benchmarks

fn bench_retrieve_for_you_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("retrieve_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));
    group.sampling_mode(SamplingMode::Flat);

    // for_you: signal-ranked candidates + diversity enforcement
    let for_you = Retrieve::builder()
        .profile("for_you")
        .limit(20)
        .diversity(DiversityConstraints::new().max_per_creator(2))
        .build()
        .unwrap();

    group.bench_function("for_you", |b| {
        b.iter(|| db.retrieve(black_box(&for_you)).unwrap());
    });

    // trending: windowed count ranking, no diversity
    let trending = Retrieve::builder()
        .profile("trending")
        .limit(20)
        .build()
        .unwrap();

    group.bench_function("trending", |b| {
        b.iter(|| db.retrieve(black_box(&trending)).unwrap());
    });

    // new: creation-time sort, category filter (~5% selectivity)
    let new_filtered = Retrieve::builder()
        .profile("new")
        .limit(20)
        .filter(FilterExpr::CategoryEq("programming".into()))
        .build()
        .unwrap();

    group.bench_function("new_filtered", |b| {
        b.iter(|| db.retrieve(black_box(&new_filtered)).unwrap());
    });

    group.finish();
}

4. SEARCH benchmarks

fn bench_search_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("search_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));
    group.sampling_mode(SamplingMode::Flat);

    // Text-only search (BM25)
    let text_only = Search::builder()
        .query("tutorial guide")
        .limit(20)
        .build()
        .unwrap();

    group.bench_function("text_only", |b| {
        b.iter(|| db.search(black_box(&text_only)).unwrap());
    });

    // Text search with category filter
    let text_filtered = Search::builder()
        .query("tutorial guide")
        .limit(20)
        .filter(FilterExpr::CategoryEq("programming".into()))
        .build()
        .unwrap();

    group.bench_function("text_filtered", |b| {
        b.iter(|| db.search(black_box(&text_filtered)).unwrap());
    });

    group.finish();
}

5. Signal write throughput benchmark

fn bench_signal_write_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("signal_write_1m");

    // Measure amortized write cost against a pre-populated 1M-item ledger.
    // Use a rotating entity ID to avoid DashMap contention on a single shard.
    let ts = Timestamp::now();
    let mut entity_counter = 0u64;

    group.bench_function("view_write", |b| {
        b.iter(|| {
            let entity_id = EntityId::new(entity_counter % ITEM_COUNT);
            entity_counter += 1;
            db.signal(
                black_box("view"),
                black_box(entity_id),
                black_box(1.0),
                black_box(ts),
            )
            .unwrap();
        });
    });

    group.finish();
}

criterion_group!(
    scale_benches,
    bench_retrieve_for_you_1m,
    bench_search_1m,
    bench_signal_write_1m,
);
criterion_main!(scale_benches);

6. Measurement methodology

Metric	Target	How measured
RETRIEVE for_you p99	< 50ms	`criterion` flat sampling, 10 samples, 30s measurement
RETRIEVE trending p99	< 50ms	Same
SEARCH text-only p99	< 100ms	Same
SEARCH text+filter p99	< 100ms	Same
Signal write amortized	< 100 µs	`criterion` default sampling, 1000+ iterations

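Criterion does not report percentiles. The [low est, high est] range it prints is a confidence interval around the estimated per-iteration time, so it serves here only as a proxy for p99; with 10 samples a true p99 cannot be estimated. Criterion also does not enforce thresholds on its own: a benchmark counts as failing when the high est exceeds its target, and that comparison must be made when the results are recorded or reviewed.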
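
If a true p99 is wanted later, it has to be computed from raw per-query latencies collected outside Criterion. A minimal sketch of a nearest-rank percentile follows; `percentile_nearest_rank` and `meets_target` are hypothetical helpers, not part of Criterion or tidaldb.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least
/// `percent`% of all samples are less than or equal to it.
/// Returns `None` for an empty slice or a percent outside 1..=100.
fn percentile_nearest_rank(samples: &[Duration], percent: u64) -> Option<Duration> {
    if samples.is_empty() || percent == 0 || percent > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    let n = sorted.len() as u64;
    // 1-based rank = ceil(percent * n / 100), computed in integers
    // to avoid floating-point rounding at exact boundaries.
    let rank = (percent * n + 99) / 100;
    Some(sorted[(rank - 1) as usize])
}

/// True when the requested percentile is strictly below the latency target.
fn meets_target(samples: &[Duration], percent: u64, target: Duration) -> bool {
    percentile_nearest_rank(samples, percent).map_or(false, |p| p < target)
}
```

With only 10 samples the nearest-rank p99 is simply the maximum, which is why sample_size(10) cannot support a percentile claim; on the order of 100 or more individual latencies are needed before p99 differs from the worst observation.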

7. Setup time management

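Building a 1M-item TidalDb is expensive. The LazyLock pattern ensures construction happens once per bench run. #[ignore] applies only to #[test] functions and has no effect on a harness = false bench target, so for CI these benchmarks should be gated behind a feature flag (for example, required-features on the [[bench]] target) so that they do not run during cargo test --benches or cargo test --all-targets.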
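
One way to gate the suite is Cargo's required-features key on the bench target. A minimal sketch, assuming a feature named scale-bench (a hypothetical name, not taken from the source):

```toml
# tidal/Cargo.toml (sketch; the feature name is an assumption)
[features]
scale-bench = []

[[bench]]
name = "scale"
harness = false
required-features = ["scale-bench"]
```

Targets whose required features are not enabled are skipped by commands such as cargo test --all-targets. Under this sketch the suite would be run with cargo bench --manifest-path tidal/Cargo.toml --bench scale --features scale-bench.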

Acceptance Criteria

tidal/benches/scale.rs registered in Cargo.toml as [[bench]] target
cargo bench --manifest-path tidal/Cargo.toml --bench scale runs successfully
RETRIEVE benchmarks at 1M items: for_you, trending, new_filtered all produce valid results
SEARCH benchmarks at 1M items: text_only, text_filtered both return results (non-empty)
Signal write benchmark at 1M items: amortized cost measured and recorded
Baseline numbers documented in docs/profiling/scale-baselines.md
RETRIEVE and SEARCH benchmarks use sample_size(10) and measurement_time(30s); the signal write benchmark uses Criterion's default sampling
LazyLock or equivalent ensures 1M-item DB is built only once per bench run
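
The build-once criterion can be checked in isolation. The sketch below uses std::sync::OnceLock with a stand-in type; `FakeDb`, `build_fake_db`, `shared_db`, and `BUILD_COUNT` are hypothetical names that take the place of TidalDb and `build_scale_db`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the expensive constructor actually ran.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the populated database (hypothetical type).
struct FakeDb {
    items: Vec<u64>,
}

/// Stand-in for the expensive 1M-item build.
fn build_fake_db() -> FakeDb {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    FakeDb {
        items: (0..1_000).collect(),
    }
}

static FAKE_DB: OnceLock<FakeDb> = OnceLock::new();

/// Every caller gets the same instance; the first caller pays for the build,
/// and concurrent callers block until it is finished.
fn shared_db() -> &'static FakeDb {
    FAKE_DB.get_or_init(build_fake_db)
}
```

Calling `shared_db()` from several bench functions, or from several threads, leaves `BUILD_COUNT` at 1. LazyLock gives the same guarantee with the initializer fixed at the declaration site.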

Test Strategy

This task is itself a test artifact -- the benchmarks are the deliverable. Validation:

Smoke test: Run cargo bench --manifest-path tidal/Cargo.toml --bench scale -- --test to verify benchmarks compile and can execute a single iteration without error.
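Result validation: Each benchmark query must return a non-empty result set (RETRIEVE: items.len() > 0, SEARCH: items.len() > 0). Check this with assert! once per benchmark, before entering b.iter(). cargo bench builds with debug assertions disabled by default, so a debug_assert! would be compiled out, and asserting outside the closure keeps the check out of the timed loop.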
Baseline recording: After the first successful run, record results in docs/profiling/scale-baselines.md with hardware specs, date, and exact Criterion output.
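
The validate-once-then-time pattern can be sketched without Criterion. Note that the bench profile disables debug assertions by default, so the sketch uses a plain assert! placed before the timed loop. `time_validated` is a hypothetical helper, and a `Vec<T>` stands in for the query result type.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `query` once to validate its result, then times `iterations` further
/// runs and returns the mean time per run. Panics if the validation run
/// returns an empty result set.
fn time_validated<T>(mut query: impl FnMut() -> Vec<T>, iterations: u32) -> Duration {
    assert!(iterations > 0, "iterations must be positive");

    // Validation happens outside the timed region and is active in release builds.
    let first = query();
    assert!(
        !first.is_empty(),
        "benchmark query returned an empty result set"
    );

    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(query());
    }
    start.elapsed() / iterations
}
```

In the Criterion benches the same shape applies: run the query once, assert on the result, then call b.iter() with the query alone.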
