# Task 01: Scale Benchmark Suite

## Delivers

A Criterion benchmark suite operating at 1M items / 100K users / 10K creators that establishes performance baselines for RETRIEVE (for_you, trending, following), SEARCH (hybrid, text-only), and signal write throughput. All subsequent m7p3 tasks use these baselines to measure the impact of their optimizations.

## Complexity

L

## Dependencies

- m7p1 complete (TidalDb API, entity writes, signal writes, RETRIEVE, SEARCH all operational)
- Existing bench files: `tidal/benches/query.rs`, `tidal/benches/search.rs`, `tidal/benches/signals.rs`

## Technical Design

### 1. New bench target: `tidal/benches/scale.rs`

Register in `Cargo.toml`:

```toml
[[bench]]
name = "scale"
harness = false
```

### 2. Shared setup harness

The 1M-item universe takes minutes to construct. Build it once per bench run using a `LazyLock` (or `std::sync::OnceLock`) so all bench functions share the same populated `TidalDb` instance.

```rust
#![allow(clippy::unwrap_used, clippy::cast_precision_loss)]

use std::collections::HashMap;
use std::sync::LazyLock;
use std::time::Duration;

use criterion::{
    Criterion, black_box, criterion_group, criterion_main, BenchmarkId, SamplingMode,
};
use tidaldb::TidalDb;
use tidaldb::query::retrieve::Retrieve;
use tidaldb::query::search::Search;
use tidaldb::ranking::diversity::DiversityConstraints;
use tidaldb::schema::{
    DecaySpec, EntityId, EntityKind, SchemaBuilder, TextFieldDef, TextFieldType,
    Timestamp, Window,
};
use tidaldb::storage::indexes::filter::FilterExpr;

const ITEM_COUNT: u64 = 1_000_000;
const USER_COUNT: u64 = 100_000;
const CREATOR_COUNT: u64 = 10_000;

fn scale_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();
    // Register every signal with the same 7-day half-life and window set.
    for sig in &["view", "like", "share", "skip", "completion"] {
        let _ = builder
            .signal(
                sig,
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[Window::OneHour, Window::TwentyFourHours, Window::SevenDays])
            .velocity(true)
            .add();
    }
    builder.text_field("title", TextFieldType::Text);
    builder.text_field("description", TextFieldType::Text);
    builder.text_field("category", TextFieldType::Keyword);
    builder.build().unwrap()
}

/// Build a TidalDb with 1M items, signal data, text fields, and embeddings.
///
/// Item distribution:
/// - 1M items, each assigned to one of 10K creators (100 items per creator)
/// - category: cycling through 20 categories
/// - title/description: varied vocabulary for realistic BM25 IDF
/// - 10% of items have view signals, 5% have like signals
/// - Embeddings: 128D random unit vectors for ANN (not 1536D -- that would
///   require ~5.7 GB of RAM for vectors alone; 128D is sufficient for
///   benchmark fidelity and uses ~0.5 GB)
fn build_scale_db() -> TidalDb {
    let db = TidalDb::builder()
        .ephemeral()
        .with_schema(scale_schema())
        .open()
        .unwrap();

    let categories = [
        "music", "programming", "cooking", "sports", "science",
        "art", "travel", "history", "math", "philosophy",
        "gaming", "fitness", "photography", "writing", "design",
        "finance", "health", "education", "nature", "technology",
    ];

    let ts = Timestamp::now();

    for i in 0..ITEM_COUNT {
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Item {i} tutorial guide"));
        meta.insert(
            "description".to_string(),
            format!("A comprehensive guide about topic {} with examples", i % 500),
        );
        let cat = categories[(i % 20) as usize];
        meta.insert("category".to_string(), cat.to_string());
        meta.insert("creator_id".to_string(), (i % CREATOR_COUNT).to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();

        // 10% of items get view signals (spread across the corpus)
        if i % 10 == 0 {
            db.signal("view", EntityId::new(i), 1.0, ts).unwrap();
        }
        // 5% get like signals
        if i % 20 == 0 {
            db.signal("like", EntityId::new(i), 1.0, ts).unwrap();
        }
    }

    // Wait for text syncer to commit, then reload
    std::thread::sleep(Duration::from_secs(3));
    db.reload_text_index().unwrap();

    db
}

// Built on first access and shared by every bench function in this process.
static SCALE_DB: LazyLock<TidalDb> = LazyLock::new(build_scale_db);
```

### 3. RETRIEVE benchmarks

```rust
fn bench_retrieve_for_you_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;

    let mut group = c.benchmark_group("retrieve_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));
    group.sampling_mode(SamplingMode::Flat);

    // for_you: signal-ranked candidates + diversity enforcement
    let for_you = Retrieve::builder()
        .profile("for_you")
        .limit(20)
        .diversity(DiversityConstraints::new().max_per_creator(2))
        .build()
        .unwrap();
    group.bench_function("for_you", |b| {
        b.iter(|| db.retrieve(black_box(&for_you)).unwrap());
    });

    // trending: windowed count ranking, no diversity
    let trending = Retrieve::builder()
        .profile("trending")
        .limit(20)
        .build()
        .unwrap();
    group.bench_function("trending", |b| {
        b.iter(|| db.retrieve(black_box(&trending)).unwrap());
    });

    // new: creation-time sort, category filter (~5% selectivity)
    let new_filtered = Retrieve::builder()
        .profile("new")
        .limit(20)
        .filter(FilterExpr::CategoryEq("programming".into()))
        .build()
        .unwrap();
    group.bench_function("new_filtered", |b| {
        b.iter(|| db.retrieve(black_box(&new_filtered)).unwrap());
    });

    group.finish();
}
```

### 4. SEARCH benchmarks

```rust
fn bench_search_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;

    let mut group = c.benchmark_group("search_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));
    group.sampling_mode(SamplingMode::Flat);

    // Text-only search (BM25)
    let text_only = Search::builder()
        .query("tutorial guide")
        .limit(20)
        .build()
        .unwrap();
    group.bench_function("text_only", |b| {
        b.iter(|| db.search(black_box(&text_only)).unwrap());
    });

    // Text search with category filter
    let text_filtered = Search::builder()
        .query("tutorial guide")
        .limit(20)
        .filter(FilterExpr::CategoryEq("programming".into()))
        .build()
        .unwrap();
    group.bench_function("text_filtered", |b| {
        b.iter(|| db.search(black_box(&text_filtered)).unwrap());
    });

    group.finish();
}
```

### 5. Signal write throughput benchmark

```rust
fn bench_signal_write_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;

    let mut group = c.benchmark_group("signal_write_1m");

    // Measure amortized write cost against a pre-populated 1M-item ledger.
    // Use a rotating entity ID to avoid DashMap contention on a single shard.
    let ts = Timestamp::now();
    let mut entity_counter = 0u64;

    group.bench_function("view_write", |b| {
        b.iter(|| {
            let entity_id = EntityId::new(entity_counter % ITEM_COUNT);
            entity_counter += 1;
            db.signal(
                black_box("view"),
                black_box(entity_id),
                black_box(1.0),
                black_box(ts),
            )
            .unwrap();
        });
    });

    group.finish();
}

criterion_group!(
    scale_benches,
    bench_retrieve_for_you_1m,
    bench_search_1m,
    bench_signal_write_1m,
);
criterion_main!(scale_benches);
```

### 6. Measurement methodology

| Metric | Target | How measured |
|--------|--------|--------------|
| RETRIEVE for_you p99 | < 50 ms | `criterion` flat sampling, 10 samples, 30 s measurement |
| RETRIEVE trending p99 | < 50 ms | Same |
| SEARCH text-only p99 | < 100 ms | Same |
| SEARCH text+filter p99 | < 100 ms | Same |
| Signal write amortized | < 100 µs | `criterion` default sampling, 1000+ iterations |

Criterion does not report percentiles. Its `time: [lower bound, estimate, upper bound]` line is a confidence interval around the estimated per-iteration time, so the p99 values here are approximated by the upper bound. If the upper bound exceeds the target, the benchmark counts as failing. Criterion does not enforce thresholds on its own, so this check has to be applied to its output.

### 7. Setup time management

Building a 1M-item TidalDb is expensive. The `LazyLock` pattern ensures construction happens once per bench run. Because the target is declared with `harness = false`, `#[ignore]` has no effect on it. For CI, gate the bench behind a Cargo feature (for example with `required-features` on the `[[bench]]` entry) so that it does not run in routine test invocations such as `cargo test --all-targets`.
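The build-once behavior that the setup harness relies on can be checked in isolation. The sketch below is a minimal, standalone illustration using only the standard library (Rust 1.80+ for `LazyLock`). `BUILD_COUNT`, `build_fixture`, `FIXTURE`, and `fixture_len` are hypothetical names standing in for the real `build_scale_db` and `SCALE_DB`; nothing here touches the TidalDb API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;
use std::thread;

// Counts how many times the (stand-in) expensive builder actually runs.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for an expensive fixture such as the 1M-item database.
fn build_fixture() -> Vec<u64> {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    (0..10_000).collect()
}

// Initialized on first dereference; later accesses reuse the same value.
static FIXTURE: LazyLock<Vec<u64>> = LazyLock::new(build_fixture);

fn fixture_len() -> usize {
    FIXTURE.len()
}

fn main() {
    // Several threads race to touch the fixture; only one of them builds it.
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(fixture_len)).collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), 10_000);
    }
    assert_eq!(BUILD_COUNT.load(Ordering::SeqCst), 1);
    println!("fixture built {} time(s)", BUILD_COUNT.load(Ordering::SeqCst));
}
```

The same property is what lets the RETRIEVE, SEARCH, and signal write groups share one populated instance within a single bench process. A separate `cargo bench` invocation starts a new process and rebuilds the fixture.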
## Acceptance Criteria

- [ ] `tidal/benches/scale.rs` registered in `Cargo.toml` as `[[bench]]` target
- [ ] `cargo bench --manifest-path tidal/Cargo.toml --bench scale` runs successfully
- [ ] RETRIEVE benchmarks at 1M items: for_you, trending, new_filtered all produce valid results
- [ ] SEARCH benchmarks at 1M items: text_only, text_filtered both return results (non-empty)
- [ ] Signal write benchmark at 1M items: amortized cost measured and recorded
- [ ] Baseline numbers documented in `docs/profiling/scale-baselines.md`
- [ ] All RETRIEVE and SEARCH benchmarks use `sample_size(10)` and `measurement_time(Duration::from_secs(30))`
- [ ] `LazyLock` or equivalent ensures the 1M-item DB is built only once per bench run

## Test Strategy

This task is itself a test artifact: the benchmarks are the deliverable. Validation:

1. **Smoke test:** Run `cargo bench --manifest-path tidal/Cargo.toml --bench scale -- --test` to verify that the benchmarks compile and can execute a single iteration without error.
2. **Result validation:** Each benchmark must return a non-empty result set (RETRIEVE: `items.len() > 0`, SEARCH: `items.len() > 0`). Check this with `assert!` once before the `b.iter()` loop. A `debug_assert!` inside the closure is compiled out under the default bench profile, which disables debug assertions.
3. **Baseline recording:** After the first successful run, record results in `docs/profiling/scale-baselines.md` with hardware specs, date, and exact Criterion output.
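Since Criterion reports a confidence interval rather than percentiles, the baseline document could also carry a directly measured p99. The following is a minimal sketch of one way to compute it from raw per-call latencies using the nearest-rank method. `percentile_nearest_rank` and `collect_latencies` are hypothetical helpers, not part of Criterion or TidalDb, and the workload in `main` is a stand-in for a real `db.retrieve(...)` call.

```rust
use std::time::{Duration, Instant};

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns `None` for an empty slice or a percentile above 100.
fn percentile_nearest_rank(samples: &mut [Duration], pct: u32) -> Option<Duration> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    samples.sort_unstable();
    let n = samples.len();
    // rank = ceil(pct * n / 100), computed in integers and clamped to [1, n].
    let rank = (pct as usize * n + 99) / 100;
    let idx = rank.clamp(1, n) - 1;
    Some(samples[idx])
}

/// Time `iters` individual calls of `op` and return the raw latencies.
fn collect_latencies<F: FnMut()>(iters: usize, mut op: F) -> Vec<Duration> {
    let mut out = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        op();
        out.push(start.elapsed());
    }
    out
}

fn main() {
    // Stand-in workload; a real measurement would call the query under test.
    let mut latencies = collect_latencies(1_000, || {
        std::hint::black_box((0..1_000u64).sum::<u64>());
    });
    let p99 = percentile_nearest_rank(&mut latencies, 99).unwrap();
    println!("p99 over {} calls: {:?}", latencies.len(), p99);
}
```

With only 10 Criterion samples per benchmark, a percentile taken from those samples would be coarse. Timing individual calls, as above, gives enough data points for a p99 to be meaningful, at the cost of per-call timer overhead that matters for sub-microsecond operations.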