Task 01: Scale Benchmark Suite
Delivers
A Criterion benchmark suite operating at 1M items / 100K users / 10K creators that establishes performance baselines for RETRIEVE (for_you, trending, and new with a category filter), SEARCH (text-only and text with a category filter), and signal write throughput. All subsequent m7p3 tasks use these baselines to measure the impact of their optimizations.
Complexity
L
Dependencies
- m7p1 complete (TidalDb API, entity writes, signal writes, RETRIEVE, SEARCH all operational)
- Existing bench files: `tidal/benches/query.rs`, `tidal/benches/search.rs`, `tidal/benches/signals.rs`
Technical Design
1. New bench target: tidal/benches/scale.rs
Register in Cargo.toml:
```toml
[[bench]]
name = "scale"
harness = false
```
2. Shared setup harness
The 1M-item universe takes minutes to construct. Build it once per bench process with a `static` `LazyLock` (or `std::sync::OnceLock`) so every bench function, in every group, shares the same populated TidalDb instance.
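The build-once behaviour this harness relies on can be checked in isolation. The sketch below substitutes a small `Vec` for the TidalDb instance (`BUILD_CALLS`, `build_universe`, `UNIVERSE`, and `read_from_threads` are illustrative names, not part of the benchmark) and counts how often the initializer runs while several threads read the static. It assumes Rust 1.80+ for `std::sync::LazyLock`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;
use std::thread;

// Counts how many times the initializer actually runs.
static BUILD_CALLS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the expensive 1M-item build (illustrative only).
fn build_universe() -> Vec<u64> {
    BUILD_CALLS.fetch_add(1, Ordering::SeqCst);
    (0..10_000).collect()
}

// Same shape as SCALE_DB: the initializer runs on first dereference only.
static UNIVERSE: LazyLock<Vec<u64>> = LazyLock::new(build_universe);

/// Read the shared value from `readers` threads and return the lengths each saw.
fn read_from_threads(readers: usize) -> Vec<usize> {
    let handles: Vec<_> = (0..readers)
        .map(|_| thread::spawn(|| UNIVERSE.len()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

In the benchmark, `SCALE_DB` plays the role of `UNIVERSE`: whichever bench function dereferences it first pays the build cost, and every later access reuses the same instance.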
```rust
#![allow(clippy::unwrap_used, clippy::cast_precision_loss)]

use std::collections::HashMap;
use std::hint::black_box;
use std::sync::LazyLock;
use std::time::Duration;

use criterion::{criterion_group, criterion_main, Criterion, SamplingMode};
use tidaldb::TidalDb;
use tidaldb::query::retrieve::Retrieve;
use tidaldb::query::search::Search;
use tidaldb::ranking::diversity::DiversityConstraints;
use tidaldb::schema::{
    DecaySpec, EntityId, EntityKind, SchemaBuilder, TextFieldType, Timestamp, Window,
};
use tidaldb::storage::indexes::filter::FilterExpr;

const ITEM_COUNT: u64 = 1_000_000;
// Not used yet: the setup below creates no user entities.
#[allow(dead_code)]
const USER_COUNT: u64 = 100_000;
const CREATOR_COUNT: u64 = 10_000;
```
```rust
fn scale_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();

    // Five item-level signals: 7-day exponential half-life, tracked over
    // 1h / 24h / 7d windows, with velocity enabled.
    for sig in ["view", "like", "share", "skip", "completion"] {
        let _ = builder
            .signal(
                sig,
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[Window::OneHour, Window::TwentyFourHours, Window::SevenDays])
            .velocity(true)
            .add();
    }

    // Text fields used by SEARCH and by the category filter.
    builder.text_field("title", TextFieldType::Text);
    builder.text_field("description", TextFieldType::Text);
    builder.text_field("category", TextFieldType::Keyword);

    builder.build().unwrap()
}
```
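Every signal in the schema uses `DecaySpec::Exponential` with a 7-day half-life. Assuming TidalDb applies the standard half-life form, weight = 0.5^(age / half_life), a signal's contribution halves every 7 days. The sketch below makes that concrete; `decay_weight` and `WEEK` are hypothetical helpers, not TidalDb APIs.

```rust
use std::time::Duration;

/// Weight of a signal of the given age under exponential decay, assuming the
/// standard half-life form 0.5^(age / half_life).
fn decay_weight(age: Duration, half_life: Duration) -> f64 {
    0.5_f64.powf(age.as_secs_f64() / half_life.as_secs_f64())
}

/// The half-life configured in scale_schema().
const WEEK: Duration = Duration::from_secs(7 * 24 * 3600);
```

If decay is evaluated relative to query time, the seeded signals (all written with one `Timestamp::now()` during setup) are only minutes old when the benchmarks run, so their weights stay close to 1.0.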
```rust
/// Build a TidalDb with 1M items, signal data, and text fields.
///
/// Item distribution:
/// - 1M items, each assigned to one of 10K creators (100 items per creator)
/// - category: cycles through 20 categories (5% of items each)
/// - title: every title contains "tutorial guide" plus a per-item number, so
///   the SEARCH query "tutorial guide" matches all 1M items and its terms
///   carry almost no discriminative weight
/// - description: one of 500 topic numbers (~2,000 items per topic)
/// - 10% of items have view signals, 5% have like signals; because both
///   moduli divide the 20-category cycle, signaled items fall only in
///   "music" and "gaming" (likes: "music" only)
/// - Embeddings: planned as 128D random unit vectors for ANN (not 1536D,
///   which needs ~5.7 GiB for 1M vectors alone; 128D needs ~0.5 GiB).
///   No embeddings are written by this setup yet.
fn build_scale_db() -> TidalDb {
    let db = TidalDb::builder()
        .ephemeral()
        .with_schema(scale_schema())
        .open()
        .unwrap();

    let categories = [
        "music", "programming", "cooking", "sports", "science",
        "art", "travel", "history", "math", "philosophy",
        "gaming", "fitness", "photography", "writing", "design",
        "finance", "health", "education", "nature", "technology",
    ];
    let ts = Timestamp::now();

    for i in 0..ITEM_COUNT {
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Item {i} tutorial guide"));
        meta.insert(
            "description".to_string(),
            format!("A comprehensive guide about topic {} with examples", i % 500),
        );
        let cat = categories[(i % 20) as usize];
        meta.insert("category".to_string(), cat.to_string());
        meta.insert("creator_id".to_string(), (i % CREATOR_COUNT).to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();

        // 10% of items get a view signal (every 10th ID)
        if i % 10 == 0 {
            db.signal("view", EntityId::new(i), 1.0, ts).unwrap();
        }
        // 5% get a like signal (every 20th ID)
        if i % 20 == 0 {
            db.signal("like", EntityId::new(i), 1.0, ts).unwrap();
        }
    }

    // Give the background text syncer time to commit, then reload the reader.
    // The fixed sleep is a heuristic: if 3 s is not enough for 1M documents,
    // SEARCH may see an incomplete index, so prefer an explicit commit/flush
    // if the API offers one.
    std::thread::sleep(Duration::from_secs(3));
    db.reload_text_index().unwrap();
    db
}

static SCALE_DB: LazyLock<TidalDb> = LazyLock::new(build_scale_db);
```
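Because both the category and the signal assignments in `build_scale_db` are modular arithmetic on the item ID, their overlap can be computed directly. The standalone sketch below mirrors the category list and the `i % 10` / `i % 20` rules (`CATEGORIES` and `category_counts` are hypothetical helpers, not part of the benchmark) and tallies which categories receive signals.

```rust
const CATEGORIES: [&str; 20] = [
    "music", "programming", "cooking", "sports", "science",
    "art", "travel", "history", "math", "philosophy",
    "gaming", "fitness", "photography", "writing", "design",
    "finance", "health", "education", "nature", "technology",
];

/// For items 0..n, count the items in each category that satisfy `has_signal`,
/// using the same `i % 20` category assignment as the benchmark setup.
/// Categories with a zero count are omitted; order follows CATEGORIES.
fn category_counts(n: u64, has_signal: impl Fn(u64) -> bool) -> Vec<(&'static str, u64)> {
    let mut counts = [0u64; 20];
    for i in 0..n {
        if has_signal(i) {
            counts[(i % 20) as usize] += 1;
        }
    }
    CATEGORIES
        .iter()
        .zip(counts)
        .filter(|(_, c)| *c > 0)
        .map(|(name, c)| (*name, c))
        .collect()
}
```

Views land only in "music" and "gaming", and likes only in "music", so the "programming" filter used by new_filtered and text_filtered selects items with no seeded signals. Hashing the ID before taking the modulus would spread signals across categories if that matters for a baseline.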
3. RETRIEVE benchmarks
```rust
// Benchmarks all three RETRIEVE profiles at 1M items: for_you, trending, new_filtered.
fn bench_retrieve_for_you_1m(c: &mut Criterion) {
    // Dereference SCALE_DB here, not inside b.iter, so the one-time build is never timed.
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("retrieve_1m");
    // Large-scale settings: 10 samples (Criterion's minimum), a 30 s budget,
    // and flat sampling, which suits long-running iterations.
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));
    group.sampling_mode(SamplingMode::Flat);

    // for_you: signal-ranked candidates + diversity enforcement (max 2 per creator)
    let for_you = Retrieve::builder()
        .profile("for_you")
        .limit(20)
        .diversity(DiversityConstraints::new().max_per_creator(2))
        .build()
        .unwrap();
    group.bench_function("for_you", |b| {
        b.iter(|| db.retrieve(black_box(&for_you)).unwrap());
    });

    // trending: windowed count ranking, no diversity
    let trending = Retrieve::builder()
        .profile("trending")
        .limit(20)
        .build()
        .unwrap();
    group.bench_function("trending", |b| {
        b.iter(|| db.retrieve(black_box(&trending)).unwrap());
    });

    // new: creation-time sort with a category filter (1 of 20 categories, ~5% selectivity)
    let new_filtered = Retrieve::builder()
        .profile("new")
        .limit(20)
        .filter(FilterExpr::CategoryEq("programming".into()))
        .build()
        .unwrap();
    group.bench_function("new_filtered", |b| {
        b.iter(|| db.retrieve(black_box(&new_filtered)).unwrap());
    });

    group.finish();
}
```
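The for_you query applies `DiversityConstraints::new().max_per_creator(2)`. TidalDb's actual enforcement is not shown in this task; a common approach, sketched here under that assumption, walks candidates in rank order and skips any item whose creator already holds two slots. `Candidate` and `diversify` are illustrative names.

```rust
use std::collections::HashMap;

/// A ranked candidate: item ID and creator ID (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    item: u64,
    creator: u64,
}

/// Greedy creator cap: keep candidates in rank order, allowing at most
/// `max_per_creator` items per creator, until `limit` items are selected.
fn diversify(ranked: &[Candidate], max_per_creator: usize, limit: usize) -> Vec<Candidate> {
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut out = Vec::with_capacity(limit);
    for &cand in ranked {
        if out.len() == limit {
            break;
        }
        let used = per_creator.entry(cand.creator).or_insert(0);
        if *used < max_per_creator {
            *used += 1;
            out.push(cand);
        }
    }
    out
}
```

With a 2-per-creator cap, filling 20 slots needs candidates from at least 10 distinct creators, so RETRIEVE may have to scan deeper than the top 20 scored items.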
4. SEARCH benchmarks
```rust
fn bench_search_1m(c: &mut Criterion) {
    // Dereference SCALE_DB outside b.iter so the one-time build is never timed.
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("search_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));
    group.sampling_mode(SamplingMode::Flat);

    // Text-only search (BM25). Every title contains "tutorial guide",
    // so this query matches all 1M items.
    let text_only = Search::builder()
        .query("tutorial guide")
        .limit(20)
        .build()
        .unwrap();
    group.bench_function("text_only", |b| {
        b.iter(|| db.search(black_box(&text_only)).unwrap());
    });

    // Same query restricted to one category (~5% of items)
    let text_filtered = Search::builder()
        .query("tutorial guide")
        .limit(20)
        .filter(FilterExpr::CategoryEq("programming".into()))
        .build()
        .unwrap();
    group.bench_function("text_filtered", |b| {
        b.iter(|| db.search(black_box(&text_filtered)).unwrap());
    });

    group.finish();
}
```
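Since both query terms appear in every title, the IDF component of BM25 gives them almost no weight. Assuming a Lucene/Tantivy-style IDF, idf = ln(1 + (N - n + 0.5) / (n + 0.5)) for a term found in n of N documents, the sketch below compares a term present in all 1M titles with a description topic number shared by ~2,000 items. `bm25_idf` is a hypothetical helper, not a TidalDb API.

```rust
/// BM25 inverse document frequency in the Lucene/Tantivy form:
/// ln(1 + (N - n + 0.5) / (n + 0.5)), where n = documents containing the term.
fn bm25_idf(total_docs: u64, doc_freq: u64) -> f64 {
    let n = total_docs as f64;
    let df = doc_freq as f64;
    (1.0 + (n - df + 0.5) / (df + 0.5)).ln()
}
```

With near-zero IDF on both terms, scores barely separate the 1M matches, so text_only behaves like a worst case over full-length posting lists rather than a typical selective query. A rarer query term would give a second, more representative data point.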
5. Signal write throughput benchmark
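```rust
fn bench_signal_write_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("signal_write_1m");
    // One signal write per iteration, so Criterion also reports writes per second.
    group.throughput(criterion::Throughput::Elements(1));

    // Measure amortized write cost against the pre-populated 1M-item ledger.
    // Rotate the entity ID so successive writes touch different entries instead
    // of one cache-hot entry. Criterion drives this loop from a single thread,
    // so there is no writer-vs-writer contention to avoid.
    let ts = Timestamp::now();
    let mut entity_counter = 0u64;
    group.bench_function("view_write", |b| {
        b.iter(|| {
            let entity_id = EntityId::new(entity_counter % ITEM_COUNT);
            entity_counter += 1;
            db.signal(
                black_box("view"),
                black_box(entity_id),
                black_box(1.0),
                black_box(ts),
            )
            .unwrap();
        });
    });
    group.finish();
}

// criterion_group! runs targets in the order listed. The signal-write group
// mutates SCALE_DB, so it goes last to keep the RETRIEVE and SEARCH
// baselines free of those extra writes.
criterion_group!(
    scale_benches,
    bench_retrieve_for_you_1m,
    bench_search_1m,
    bench_signal_write_1m,
);
criterion_main!(scale_benches);
```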
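The < 100us amortized-write target in the methodology table below converts directly into a per-thread throughput floor. The helper here (`writes_per_sec` is illustrative, not part of Criterion or TidalDb) converts a mean time per write, such as Criterion's point estimate, into writes per second. Criterion can also report this rate itself when a group sets `Throughput::Elements`.

```rust
use std::time::Duration;

/// Sustained single-threaded writes per second implied by a mean time per write.
fn writes_per_sec(per_write: Duration) -> f64 {
    1.0 / per_write.as_secs_f64()
}
```

At the 100us target, one thread sustains about 10,000 writes per second.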
6. Measurement methodology
| Metric | Target | How measured |
|---|---|---|
| RETRIEVE for_you p99 | < 50ms | criterion flat sampling, 10 samples, 30s measurement |
| RETRIEVE trending p99 | < 50ms | Same |
| SEARCH text-only p99 | < 100ms | Same |
| SEARCH text+filter p99 | < 100ms | Same |
| Signal write amortized | < 100us | criterion default sampling, 1000+ iterations |
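Criterion does not report percentiles, so the p99 targets are approximated. In Flat sampling mode each of the 10 samples is the mean time per iteration over a fixed batch of iterations, and the `time: [low est high]` line Criterion prints is a bootstrap confidence interval (95% by default) for the mean per-iteration time. The high est (upper bound) stands in for p99. Averaging hides individual slow queries, so this proxy can understate tail latency. Criterion also has no pass/fail threshold: a benchmark counts as failing when its upper bound exceeds the target, which must be checked when the baseline is recorded, either by hand or by a script that reads `target/criterion/<group>/<bench>/new/estimates.json`.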
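A true p99 requires individual per-query timings, for example from a plain loop that times each query outside Criterion. The sketch below computes a nearest-rank percentile over such timings and checks it against a target. `percentile` and `meets_target` are illustrative helpers, and the durations in the checks are synthetic, not measured baselines.

```rust
use std::time::Duration;

/// Nearest-rank percentile (`p` in (0, 100]) of recorded per-query durations.
/// Returns None for an empty sample set.
fn percentile(samples: &[Duration], p: f64) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank = ceil(p * n / 100), clamped to [1, n], then made 0-based.
    let rank = ((p * sorted.len() as f64) / 100.0).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}

/// True when the observed p99 is strictly below `target`.
fn meets_target(samples: &[Duration], target: Duration) -> bool {
    percentile(samples, 99.0).is_some_and(|p99| p99 < target)
}
```

With only 10 samples, a nearest-rank p99 is simply the slowest sample. A meaningful tail estimate therefore needs hundreds or more timed queries.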
7. Setup time management
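Building a 1M-item TidalDb is expensive. The `LazyLock` pattern ensures construction happens only once per bench process. Bench targets are not run by `cargo test --lib` or by a plain `cargo test`. `#[ignore]` applies only to `#[test]` functions and has no effect on Criterion benchmarks. The real CI risk is a blanket `cargo bench`, which would rebuild the 1M-item universe on every run. To avoid this, either name the fast targets explicitly (`cargo bench --bench query --bench search --bench signals`) or gate `scale` behind a Cargo feature using `required-features` on its `[[bench]]` entry. Cargo skips any target whose required features are not enabled.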
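Part of the setup cost is memory, and the setup doc comment bases its 128D choice on raw vector size. The arithmetic is count × dimensions × 4 bytes, assuming f32 components and excluding any ANN index overhead. The helpers below (`raw_vector_bytes` and `to_gib` are illustrative) reproduce the 1536D vs 128D comparison for 1M vectors. The figures in that comment are binary gigabytes (GiB).

```rust
/// Raw storage for `count` dense f32 vectors of `dims` dimensions, in bytes.
/// Excludes any ANN index structure (graph links, quantization tables, etc.).
fn raw_vector_bytes(count: u64, dims: u64) -> u64 {
    count * dims * std::mem::size_of::<f32>() as u64
}

/// Bytes expressed in GiB (2^30 bytes).
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

These numbers cover the vectors alone. Item metadata, the text index, and the signal ledger add to the resident size, so the machine running the suite needs headroom beyond them.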
Acceptance Criteria
- `tidal/benches/scale.rs` registered in `Cargo.toml` as a `[[bench]]` target
- `cargo bench --manifest-path tidal/Cargo.toml --bench scale` runs successfully
- RETRIEVE benchmarks at 1M items: for_you, trending, and new_filtered all produce valid results
- SEARCH benchmarks at 1M items: text_only and text_filtered both return non-empty results
- Signal write benchmark at 1M items: amortized cost measured and recorded
- Baseline numbers documented in `docs/profiling/scale-baselines.md`
- All RETRIEVE and SEARCH benchmarks use `sample_size(10)` and `measurement_time(Duration::from_secs(30))`; the signal write benchmark uses Criterion's default sampling
- `LazyLock` or equivalent ensures the 1M-item DB is built only once per bench run
Test Strategy
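This task is itself a test artifact: the benchmarks are the deliverable. Validation:
- Smoke test: run `cargo bench --manifest-path tidal/Cargo.toml --bench scale -- --test` to verify that the benchmarks compile and that each one executes a single iteration without error. The 1M-item setup still runs in this mode, so expect the full construction time.
- Result validation: each benchmark query must return a non-empty result set (RETRIEVE: `items.len() > 0`, SEARCH: `items.len() > 0`). By default, `cargo bench` builds with the bench profile, which inherits the release settings and disables debug assertions, so a `debug_assert!` inside the `b.iter()` closure would be compiled out. Instead, run each query once and check it with `assert!` before calling `bench_function`, which also keeps the check outside the timed loop.
- Baseline recording: after the first successful run, record results in `docs/profiling/scale-baselines.md` with hardware specs, date, and exact Criterion output.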