tidaldb/docs/profiling/scale-baselines.md
2026-02-23 22:41:16 -07:00


Scale Benchmarks: 1M-Item Baselines

Hardware

macOS Darwin 23.6.0 (Apple Silicon or x86-64; see run date below).

Run command:

cargo bench --manifest-path tidal/Cargo.toml --bench scale

Date: 2026-02-23

Dataset

| Parameter | Value |
|---|---|
| Items | 1,000,000 |
| Creators | 10,000 (100 items/creator) |
| Categories | 20 |
| Embedding dim | 128 (not 1536; reduced for bench RAM) |
| Signal coverage | 10% view, 5% like |
| Bench tool | Criterion (sample_size=10, 30s measurement, Flat mode) |
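The RAM saving from the reduced embedding dimension can be estimated with simple arithmetic. The sketch below assumes each embedding component is stored as a 4-byte f32 and ignores any index or allocator overhead; neither assumption is confirmed by this document, and the function names are illustrative only.

```rust
/// Bytes needed to hold `items` dense embeddings of `dim` components each.
/// ASSUMPTION: components are 4-byte f32 values; index overhead is ignored.
fn embedding_bytes(items: u64, dim: u64) -> u64 {
    items * dim * std::mem::size_of::<f32>() as u64
}

/// Convert a byte count to decimal gigabytes for display.
fn to_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}
```

Under those assumptions, `to_gb(embedding_bytes(1_000_000, 128))` is about 0.5 GB, while the same call with a dimension of 1536 is about 6.1 GB, a 12× difference in raw vector storage.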

Acceptance Criteria

| Benchmark | Target | Measured | Status |
|---|---|---|---|
| RETRIEVE | p99 < 50ms | 152 µs (for_you) | PASS |
| SEARCH | p99 < 100ms | 28.9 ms (text_only) | PASS |
| Signal write | p99 < 100µs | 82 ns | PASS |

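All three targets pass by a wide margin. The Measured column reports Criterion's point estimate (the middle value of each time: [lower estimate upper] triple in the results below), not a directly measured p99, so the PASS verdicts rest on the size of the margin rather than on a tail-latency measurement.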
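Because the targets are stated as p99 values, it may help to see how a p99 would be derived from raw per-operation latencies. The following is a minimal sketch of the nearest-rank percentile method; the `percentile` function is hypothetical and not part of the bench harness. Each Criterion sample aggregates many iterations, so per-operation samples like these would have to be collected separately.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns None for empty input or when `pct` is above 100.
fn percentile(samples: &[Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // rank = ceil(pct * n / 100), computed in integers, clamped to [1, n].
    let rank = (pct * sorted.len() + 99) / 100;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}
```

With only 10 samples, `percentile(&samples, 99)` returns the maximum, which is one reason a tail-latency check needs far more observations than a mean estimate does.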

Benchmark Results

RETRIEVE (1M items)

retrieve_1m/for_you          time:   [151.88 µs 152.13 µs 152.40 µs]
retrieve_1m/trending         time:   [127.96 µs 128.25 µs 128.52 µs]
retrieve_1m/new_filtered     time:   [  7.5636 µs   7.5855 µs   7.6058 µs]

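All RETRIEVE queries complete in under 200µs. The 50ms target is beaten by a factor of about 330 for the slowest query (for_you) and about 6,600 for the fastest (new_filtered), i.e. two to three orders of magnitude.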

  • for_you: signal-scored ranking over full 1M-item universe — 152µs
  • trending: windowed view count ranking — 128µs
  • new_filtered: category filter at ~5% selectivity — 7.6µs (bitmap pre-filter eliminates 95% of candidates)
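The effect of a bitmap pre-filter on the new_filtered path can be illustrated with a small std-only sketch. This is not tidalDB's implementation: `Bitmap` and `build_category_bitmaps` are hypothetical names, and a real engine would likely use a compressed bitmap. The point is only that iterating a per-category bitmap visits matching item IDs and never touches the rest.

```rust
/// Fixed-size bitmap over item IDs 0..len, one bit per item.
struct Bitmap {
    words: Vec<u64>,
}

impl Bitmap {
    fn new(len: usize) -> Self {
        Bitmap { words: vec![0; (len + 63) / 64] }
    }

    fn set(&mut self, id: usize) {
        self.words[id / 64] |= 1u64 << (id % 64);
    }

    /// Iterate over set bits in ascending order; empty words yield nothing.
    fn ones(&self) -> impl Iterator<Item = usize> + '_ {
        self.words.iter().enumerate().flat_map(|(w, &word)| {
            let mut bits = word;
            std::iter::from_fn(move || {
                if bits == 0 {
                    return None;
                }
                let tz = bits.trailing_zeros() as usize;
                bits &= bits - 1; // clear the lowest set bit
                Some(w * 64 + tz)
            })
        })
    }
}

/// Build one bitmap per category from a category-per-item slice.
/// Panics if any category value is >= `num_categories`.
fn build_category_bitmaps(categories: &[u8], num_categories: usize) -> Vec<Bitmap> {
    let mut maps: Vec<Bitmap> = (0..num_categories)
        .map(|_| Bitmap::new(categories.len()))
        .collect();
    for (id, &cat) in categories.iter().enumerate() {
        maps[cat as usize].set(id);
    }
    maps
}
```

With 20 evenly populated categories, `maps[c].ones()` yields roughly 5% of the item IDs, so any scoring step that follows only sees that fraction of the universe.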

SEARCH (1M items)

search_1m/text_only          time:   [28.844 ms 28.934 ms 29.021 ms]
search_1m/text_filtered      time:   [ 1.8972 ms  1.9104 ms  1.9220 ms]

Both SEARCH queries complete in under 30ms. The 100ms target is beaten by about 3.5× (text_only) and about 52× (text_filtered).

  • text_only: BM25 over 1M documents — 28.9ms (most expensive path; dominated by Tantivy posting list traversal)
  • text_filtered: BM25 with category filter reduces candidate set — 1.9ms
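For readers unfamiliar with BM25, the per-term score that gets accumulated while walking a posting list can be sketched as follows. This uses the standard BM25 formula with the conventional defaults k1 = 1.2 and b = 0.75; whether tidalDB changes any of these is not stated here, and the function names are illustrative.

```rust
/// BM25 inverse document frequency in its non-negative form:
/// ln(1 + (N - n + 0.5) / (n + 0.5)).
fn bm25_idf(total_docs: f64, docs_with_term: f64) -> f64 {
    (1.0 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5)).ln()
}

/// BM25 contribution of one query term to one document.
/// `tf` is the term frequency in the document; `doc_len` and `avg_doc_len`
/// are measured in tokens.
fn bm25_term_score(
    tf: f64,
    total_docs: f64,
    docs_with_term: f64,
    doc_len: f64,
    avg_doc_len: f64,
) -> f64 {
    const K1: f64 = 1.2; // term-frequency saturation (conventional default)
    const B: f64 = 0.75; // length normalisation strength (conventional default)
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    bm25_idf(total_docs, docs_with_term) * tf * (K1 + 1.0) / (tf + norm)
}
```

A document's score is the sum of this value over the query terms it contains, so a common term in an unfiltered query means one evaluation per posting across the full 1M-document corpus.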

Signal Write (1M-item DB, rotating 1K entities)

signal_write_1m/write_rotating_1k_entities    time:   [82.033 ns 82.286 ns 82.535 ns]

82 ns per write. The 100µs target is beaten by roughly 1,200×. The DashMap hot-path write amortises to under 100ns across 1K rotating entity IDs.
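The shape of the rotating-entity write workload can be sketched with the standard library alone. The example below substitutes a hand-rolled sharded `Mutex<HashMap>` for DashMap so that it needs no external crates; `ShardedCounters` and `write_rotating` are hypothetical names, and the real signal store's value type is certainly richer than a counter.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Sharded counter map: a std-only stand-in for a concurrent map such as DashMap.
struct ShardedCounters {
    shards: Vec<Mutex<HashMap<u64, u64>>>,
}

impl ShardedCounters {
    fn new(num_shards: usize) -> Self {
        assert!(num_shards > 0, "at least one shard is required");
        ShardedCounters {
            shards: (0..num_shards).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    fn shard_index(&self, entity_id: u64) -> usize {
        (entity_id % self.shards.len() as u64) as usize
    }

    /// Record one signal for `entity_id`; only that entity's shard is locked.
    fn record(&self, entity_id: u64) {
        let mut shard = self.shards[self.shard_index(entity_id)].lock().unwrap();
        *shard.entry(entity_id).or_insert(0) += 1;
    }

    fn get(&self, entity_id: u64) -> u64 {
        let shard = self.shards[self.shard_index(entity_id)].lock().unwrap();
        shard.get(&entity_id).copied().unwrap_or(0)
    }

    fn entity_count(&self) -> usize {
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}

/// Issue `writes` signal writes, cycling through `rotation` entity IDs.
fn write_rotating(map: &ShardedCounters, writes: u64, rotation: u64) {
    for i in 0..writes {
        map.record(i % rotation);
    }
}
```

Rotating over a small ID set keeps every entry hot after the first pass, so a loop like this mostly measures the update path rather than first-time insertion.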

Setup Notes

The LazyLock<TidalDb> pattern ensures the 1M-item database is built exactly once per bench run. Build time is about 30s on the reference hardware above. The text syncer waits 3s after ingestion.
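The build-once behaviour of a LazyLock static can be demonstrated in isolation. In this sketch `FakeDb`, `build_db`, and `item_count` are hypothetical stand-ins (the real TidalDb constructor is not shown in this document); only `std::sync::LazyLock`, available in the standard library since Rust 1.80, is real.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the expensive build has run.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the database under test.
struct FakeDb {
    items: Vec<u64>,
}

/// Hypothetical expensive setup; in the real bench this is the ~30s ingest.
fn build_db() -> FakeDb {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    FakeDb { items: (0..1_000).collect() }
}

/// Initialised on first access, then shared by every later access.
static DB: LazyLock<FakeDb> = LazyLock::new(build_db);

/// Any number of benchmark functions can call this without rebuilding.
fn item_count() -> usize {
    DB.items.len()
}
```

Calling `item_count()` repeatedly leaves `BUILD_COUNT` at 1, and because the first access pays the build cost, touching the static before timing starts keeps that cost out of the first measurement.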

Database Build Time

Approximately 30 seconds on reference hardware (observed from [scale bench] Database ready log line).

Analysis

tidalDB operates well within all three acceptance-criteria targets at 1M items. The dominant cost is SEARCH text_only at ~29ms — driven by Tantivy posting list traversal across 1M documents. The LogMergePolicy tuning (< 20 segments at steady state) keeps this below the 100ms target with headroom.

Signal writes at 82ns confirm the DashMap hot-path is not a bottleneck at this scale. The 5M-entry LRU trimming threshold (DEFAULT_MAX_SIGNAL_ENTRIES) provides ample headroom for the 100K-item signal coverage in this benchmark (~200K entries = ~218MB).
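The headroom figures in this document can be recomputed from the quoted point estimates. The sketch below hard-codes the targets and measurements reported above, all converted to nanoseconds; the `Baseline` type and helper names are illustrative, and the 200K live-entry figure is taken as given from the paragraph above.

```rust
/// One acceptance criterion: target and measured latency, both in nanoseconds.
struct Baseline {
    name: &'static str,
    target_ns: f64,
    measured_ns: f64,
}

/// Values copied from the acceptance-criteria table and bench output.
static BASELINES: [Baseline; 3] = [
    Baseline { name: "RETRIEVE for_you", target_ns: 50_000_000.0, measured_ns: 152_130.0 },
    Baseline { name: "SEARCH text_only", target_ns: 100_000_000.0, measured_ns: 28_934_000.0 },
    Baseline { name: "signal write", target_ns: 100_000.0, measured_ns: 82.286 },
];

/// How many times smaller the measurement is than its target.
fn headroom(b: &Baseline) -> f64 {
    b.target_ns / b.measured_ns
}

/// Ratio of the LRU trimming threshold to the number of live signal entries.
fn lru_headroom(max_entries: u64, live_entries: u64) -> f64 {
    max_entries as f64 / live_entries as f64
}

/// Human-readable summary, one line per criterion.
fn report() -> Vec<String> {
    BASELINES
        .iter()
        .map(|b| format!("{}: {:.1}x under target", b.name, headroom(b)))
        .collect()
}
```

This gives roughly 329× for RETRIEVE for_you, 3.5× for SEARCH text_only, and 1,215× for signal writes, while `lru_headroom(5_000_000, 200_000)` evaluates to 25.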