# Task 01: Scale Benchmark Suite

## Delivers

A Criterion benchmark suite operating at 1M items / 100K users / 10K creators that establishes performance baselines for RETRIEVE (for_you, trending, following), SEARCH (hybrid, text-only), and signal write throughput. All subsequent m7p3 tasks use these baselines to measure the impact of their optimizations.

## Complexity

L

## Dependencies

- m7p1 complete (TidalDb API, entity writes, signal writes, RETRIEVE, SEARCH all operational)
- Existing bench files: `tidal/benches/query.rs`, `tidal/benches/search.rs`, `tidal/benches/signals.rs`

## Technical Design

### 1. New bench target: `tidal/benches/scale.rs`

Register in `Cargo.toml`:

```toml
[[bench]]
name = "scale"
harness = false
```

### 2. Shared setup harness

The 1M-item universe takes minutes to construct. Build it once per bench run using a `LazyLock` (or `std::sync::OnceLock`) so that all bench functions, across all benchmark groups, share the same populated `TidalDb` instance.

```rust
#![allow(clippy::unwrap_used, clippy::cast_precision_loss)]

use std::collections::HashMap;
use std::sync::LazyLock;
use std::time::Duration;

use criterion::{
    Criterion, black_box, criterion_group, criterion_main,
    BenchmarkId, SamplingMode,
};
use tidaldb::TidalDb;
use tidaldb::query::retrieve::Retrieve;
use tidaldb::query::search::Search;
use tidaldb::ranking::diversity::DiversityConstraints;
use tidaldb::schema::{
    DecaySpec, EntityId, EntityKind, SchemaBuilder, TextFieldDef,
    TextFieldType, Timestamp, Window,
};
use tidaldb::storage::indexes::filter::FilterExpr;

const ITEM_COUNT: u64 = 1_000_000;
const USER_COUNT: u64 = 100_000;
const CREATOR_COUNT: u64 = 10_000;

fn scale_schema() -> tidaldb::schema::Schema {
    let mut builder = SchemaBuilder::new();
    for sig in &["view", "like", "share", "skip", "completion"] {
        let _ = builder
            .signal(
                sig,
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(7 * 24 * 3600),
                },
            )
            .windows(&[Window::OneHour, Window::TwentyFourHours, Window::SevenDays])
            .velocity(true)
            .add();
    }
    builder.text_field("title", TextFieldType::Text);
    builder.text_field("description", TextFieldType::Text);
    builder.text_field("category", TextFieldType::Keyword);
    builder.build().unwrap()
}

/// Build a TidalDb with 1M items, signal data, text fields, and embeddings.
///
/// Item distribution:
/// - 1M items, each assigned to one of 10K creators (100 items per creator)
/// - category: cycling through 20 categories
/// - title/description: varied vocabulary for realistic BM25 IDF
/// - 10% of items have view signals, 5% have like signals
/// - Embeddings: 128D random unit vectors for ANN (not 1536D -- that would
///   require ~5.7 GB of RAM for vectors alone; 128D is sufficient for
///   benchmark fidelity and uses ~0.5 GB)
fn build_scale_db() -> TidalDb {
    let db = TidalDb::builder()
        .ephemeral()
        .with_schema(scale_schema())
        .open()
        .unwrap();

    let categories = [
        "music", "programming", "cooking", "sports", "science",
        "art", "travel", "history", "math", "philosophy",
        "gaming", "fitness", "photography", "writing", "design",
        "finance", "health", "education", "nature", "technology",
    ];

    let ts = Timestamp::now();

    for i in 0..ITEM_COUNT {
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Item {i} tutorial guide"));
        meta.insert(
            "description".to_string(),
            format!("A comprehensive guide about topic {} with examples", i % 500),
        );
        let cat = categories[(i % 20) as usize];
        meta.insert("category".to_string(), cat.to_string());
        meta.insert("creator_id".to_string(), (i % CREATOR_COUNT).to_string());

        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();

        // 10% of items get view signals (spread across the corpus)
        if i % 10 == 0 {
            db.signal("view", EntityId::new(i), 1.0, ts).unwrap();
        }
        // 5% get like signals
        if i % 20 == 0 {
            db.signal("like", EntityId::new(i), 1.0, ts).unwrap();
        }
    }

    // Wait for text syncer to commit, then reload
    std::thread::sleep(Duration::from_secs(3));
    db.reload_text_index().unwrap();

    db
}

static SCALE_DB: LazyLock<TidalDb> = LazyLock::new(build_scale_db);
```

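The memory figures quoted in the doc comment on `build_scale_db` can be checked with a few lines of arithmetic. The sketch below assumes each embedding component is stored as a 4-byte `f32` and counts raw vector storage only (no ANN index overhead); `embedding_bytes` and `to_gib` are illustrative helpers, not part of the TidalDb API.

```rust
/// Bytes needed to store `items` dense vectors of `dims` components each,
/// assuming every component is a 4-byte `f32` (an assumption of this sketch).
fn embedding_bytes(items: u64, dims: u64) -> u64 {
    items * dims * std::mem::size_of::<f32>() as u64
}

/// Convert a byte count to GiB (2^30 bytes).
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}
```

Under these assumptions, 1M vectors at 1536 dimensions come to 6,144,000,000 bytes (about 5.7 GiB), while 128 dimensions come to 512,000,000 bytes (about 0.5 GiB), which matches the estimates in the comment.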
### 3. RETRIEVE benchmarks

```rust
fn bench_retrieve_for_you_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("retrieve_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));
    group.sampling_mode(SamplingMode::Flat);

    // for_you: signal-ranked candidates + diversity enforcement
    let for_you = Retrieve::builder()
        .profile("for_you")
        .limit(20)
        .diversity(DiversityConstraints::new().max_per_creator(2))
        .build()
        .unwrap();

    group.bench_function("for_you", |b| {
        b.iter(|| db.retrieve(black_box(&for_you)).unwrap());
    });

    // trending: windowed count ranking, no diversity
    let trending = Retrieve::builder()
        .profile("trending")
        .limit(20)
        .build()
        .unwrap();

    group.bench_function("trending", |b| {
        b.iter(|| db.retrieve(black_box(&trending)).unwrap());
    });

    // new: creation-time sort, category filter (~5% selectivity)
    let new_filtered = Retrieve::builder()
        .profile("new")
        .limit(20)
        .filter(FilterExpr::CategoryEq("programming".into()))
        .build()
        .unwrap();

    group.bench_function("new_filtered", |b| {
        b.iter(|| db.retrieve(black_box(&new_filtered)).unwrap());
    });

    group.finish();
}
```

### 4. SEARCH benchmarks

```rust
fn bench_search_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("search_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(30));
    group.sampling_mode(SamplingMode::Flat);

    // Text-only search (BM25)
    let text_only = Search::builder()
        .query("tutorial guide")
        .limit(20)
        .build()
        .unwrap();

    group.bench_function("text_only", |b| {
        b.iter(|| db.search(black_box(&text_only)).unwrap());
    });

    // Text search with category filter
    let text_filtered = Search::builder()
        .query("tutorial guide")
        .limit(20)
        .filter(FilterExpr::CategoryEq("programming".into()))
        .build()
        .unwrap();

    group.bench_function("text_filtered", |b| {
        b.iter(|| db.search(black_box(&text_filtered)).unwrap());
    });

    group.finish();
}
```

### 5. Signal write throughput benchmark

```rust
fn bench_signal_write_1m(c: &mut Criterion) {
    let db = &*SCALE_DB;
    let mut group = c.benchmark_group("signal_write_1m");

    // Measure amortized write cost against a pre-populated 1M-item ledger.
    // Use a rotating entity ID to avoid DashMap contention on a single shard.
    let ts = Timestamp::now();
    let mut entity_counter = 0u64;

    group.bench_function("view_write", |b| {
        b.iter(|| {
            let entity_id = EntityId::new(entity_counter % ITEM_COUNT);
            entity_counter += 1;
            db.signal(
                black_box("view"),
                black_box(entity_id),
                black_box(1.0),
                black_box(ts),
            )
            .unwrap();
        });
    });

    group.finish();
}

criterion_group!(
    scale_benches,
    bench_retrieve_for_you_1m,
    bench_search_1m,
    bench_signal_write_1m,
);
criterion_main!(scale_benches);
```

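Because the signal write benchmark reports a per-write time rather than a rate, it can help to convert between the two when recording baselines. The following is a minimal sketch using only the standard library; `writes_per_second` and `full_rotation` are hypothetical helpers introduced here for illustration, and both describe a single writer thread.

```rust
use std::time::Duration;

/// Single-threaded writes per second implied by an amortized per-write cost.
fn writes_per_second(per_write: Duration) -> f64 {
    1.0 / per_write.as_secs_f64()
}

/// Time for the rotating entity counter to visit every item once,
/// given the amortized per-write cost.
fn full_rotation(per_write: Duration, items: u32) -> Duration {
    per_write * items
}
```

For example, an amortized cost of 100 microseconds per write corresponds to roughly 10,000 writes per second on one thread, and a full pass over 1M entity IDs would take 100 seconds at that rate.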
### 6. Measurement methodology

| Metric | Target | How measured |
|--------|--------|--------------|
| RETRIEVE for_you p99 | < 50 ms | `criterion` flat sampling, 10 samples, 30 s measurement |
| RETRIEVE trending p99 | < 50 ms | Same |
| SEARCH text-only p99 | < 100 ms | Same |
| SEARCH text+filter p99 | < 100 ms | Same |
| Signal write amortized | < 100 µs | `criterion` default sampling, 1000+ iterations |

Criterion does not report percentiles. Its `time: [lower estimate upper]` output is a confidence interval around the estimated per-iteration time, so the p99 values here are approximated by the upper bound (`high est`) of that interval. If the upper bound exceeds the target, the benchmark is considered failing. Criterion does not enforce thresholds itself, so this check has to be applied to its output.

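If a true tail-latency figure is wanted alongside Criterion's estimate, it has to be computed from raw per-call timings collected separately (for example with `std::time::Instant` around each query). The sketch below shows a nearest-rank percentile over such samples; `percentile` and `meets_p99_target` are hypothetical helpers, not part of Criterion or TidalDb.

```rust
use std::time::Duration;

/// Nearest-rank percentile over raw latency samples.
/// `pct` must be in (0, 100]; returns `None` for empty input or an
/// out-of-range percentile.
fn percentile(samples: &[Duration], pct: f64) -> Option<Duration> {
    if samples.is_empty() || !(pct > 0.0 && pct <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest rank: the smallest sample with at least pct% of all samples
    // at or below it (1-based rank, converted to a 0-based index).
    let rank = (pct * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// True when the observed p99 is strictly below the target.
fn meets_p99_target(samples: &[Duration], target: Duration) -> bool {
    percentile(samples, 99.0).map_or(false, |p99| p99 < target)
}
```

Note that with only 10 samples the nearest-rank p99 is simply the slowest sample, so a meaningful p99 needs on the order of hundreds of timed calls or more.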
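### 7. Setup time management

Building a 1M-item TidalDb is expensive. The `LazyLock` pattern ensures construction happens once per bench process. Note that `#[ignore]` applies only to `#[test]` functions and has no effect on a `harness = false` bench target, and that `cargo test --lib` does not run bench targets in the first place. For CI, gate the `scale` target behind a feature flag (for example via `required-features` on its `[[bench]]` entry) so that it is skipped by routine `cargo bench` and `cargo test --all-targets` runs unless the feature is enabled.
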
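One complementary option for local smoke runs, offered here only as a sketch, is to let the corpus size be overridden so that the harness can be exercised without the full multi-minute build. The environment variable name `SCALE_BENCH_ITEMS` and both helper functions are assumptions made for this example; nothing in the existing code reads such a variable.

```rust
/// Default corpus size, matching `ITEM_COUNT` in the harness.
const DEFAULT_ITEM_COUNT: u64 = 1_000_000;

/// Resolve the corpus size from an optional override string.
/// Falls back to the default when the override is missing, unparsable, or zero.
fn resolve_item_count(raw: Option<&str>) -> u64 {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_ITEM_COUNT)
}

/// Read the override from the hypothetical `SCALE_BENCH_ITEMS` variable.
fn item_count_from_env() -> u64 {
    resolve_item_count(std::env::var("SCALE_BENCH_ITEMS").ok().as_deref())
}
```

Baselines should still be recorded only from runs at the full 1M-item size; a reduced corpus is useful for checking that the harness works, not for measuring performance.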
## Acceptance Criteria

- [ ] `tidal/benches/scale.rs` registered in `Cargo.toml` as `[[bench]]` target
- [ ] `cargo bench --manifest-path tidal/Cargo.toml --bench scale` runs successfully
- [ ] RETRIEVE benchmarks at 1M items: for_you, trending, new_filtered all produce valid results
- [ ] SEARCH benchmarks at 1M items: text_only, text_filtered both return results (non-empty)
- [ ] Signal write benchmark at 1M items: amortized cost measured and recorded
- [ ] Baseline numbers documented in `docs/profiling/scale-baselines.md`
- [ ] All RETRIEVE and SEARCH benchmarks use `sample_size(10)` and `measurement_time(Duration::from_secs(30))`; the signal write benchmark keeps Criterion's default sampling
- [ ] `LazyLock` or equivalent ensures the 1M-item DB is built only once per bench run

## Test Strategy

This task is itself a test artifact: the benchmarks are the deliverable. Validation:

1. **Smoke test:** Run `cargo bench --manifest-path tidal/Cargo.toml --bench scale -- --test` to verify that the benchmarks compile and can each execute a single iteration without error.
2. **Result validation:** Each benchmark query must return a non-empty result set (RETRIEVE: `items.len() > 0`, SEARCH: `items.len() > 0`). Assert this with `assert!` rather than `debug_assert!`: `cargo bench` builds with optimizations and with debug assertions disabled by default, so a `debug_assert!` would be compiled out. Running the check once before the `b.iter()` closure keeps it out of the timed loop.
3. **Baseline recording:** After the first successful run, record results in `docs/profiling/scale-baselines.md` with hardware specs, date, and exact Criterion output.
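The result validation step can be expressed as a small pass-through check. This is a minimal sketch: `require_non_empty` is a hypothetical helper, and it is generic over the item type because the exact result types returned by `retrieve` and `search` are not shown in this document.

```rust
/// Fail loudly if a query returned nothing; otherwise hand the results back.
/// Uses `assert!` (not `debug_assert!`) so the check stays active in the
/// optimized builds that `cargo bench` produces.
fn require_non_empty<T>(label: &str, items: Vec<T>) -> Vec<T> {
    assert!(!items.is_empty(), "{label}: benchmark query returned no results");
    items
}
```

Calling such a helper once per benchmark, on the result of a single untimed query before `b.iter()`, validates the setup without adding work to the measured closure.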