tidaldb/docs/profiling/tantivy-merge-tuning.md
2026-02-23 22:41:16 -07:00

Tantivy Merge Policy Tuning

Configuration Applied

TextIndex construction in tidal/src/text/index.rs configures LogMergePolicy with:

| Parameter | Value | Rationale |
| --- | --- | --- |
| min_num_segments | 4 | More aggressive than the default (8): triggers merges earlier to bound segment count at steady state |
| max_docs_before_merge | 5,000,000 | Smaller max segment than the default (10M): reduces worst-case merge duration |
| del_docs_ratio_before_merge | 0.3 | Triggers a merge when 30% of docs are deleted: tidalDB uses delete-then-add for updates, so deleted docs accumulate |

Applied to both Item and Creator TextIndex instances.
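
As a rough illustration of the table above, the following sketch shows how these three values could be applied through Tantivy's LogMergePolicy setters. It assumes the tantivy crate is a dependency; the helper name writer_with_tuned_merges and its memory_budget_bytes parameter are hypothetical and are not taken from tidal/src/text/index.rs.

```rust
use tantivy::merge_policy::LogMergePolicy;
use tantivy::{Index, IndexWriter};

/// Hypothetical helper: opens a writer on `index` and installs a
/// LogMergePolicy with the values from the table above.
fn writer_with_tuned_merges(
    index: &Index,
    memory_budget_bytes: usize,
) -> tantivy::Result<IndexWriter> {
    let writer: IndexWriter = index.writer(memory_budget_bytes)?;

    let mut policy = LogMergePolicy::default();
    // Merge once 4 segments share a size level (Tantivy default: 8).
    policy.set_min_num_segments(4);
    // Segments above 5M docs are no longer merge candidates (default: 10M).
    policy.set_max_docs_before_merge(5_000_000);
    // Also merge segments whose deleted-doc ratio passes 30%.
    policy.set_del_docs_ratio_before_merge(0.3);

    writer.set_merge_policy(Box::new(policy));
    Ok(writer)
}
```

Under this assumption, the same helper would be called once for the Item index and once for the Creator index, so both share one set of values.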

API

// Returns the number of active Tantivy search segments.
// Useful for monitoring merge policy effectiveness.
db.text_index.segment_count()

Note: TidalDb::text_segment_count() is exposed in tidal/src/db/items.rs.
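
One way the segment count might feed a monitoring check is sketched below. The SegmentCounter trait, SegmentHealth enum, and FixedCount stand-in are hypothetical names introduced only for this example; in tidalDB the value would come from db.text_index.segment_count(). The threshold of 20 is the steady-state target stated in this document.

```rust
/// Target from this document: fewer than 20 segments at steady state.
const SEGMENT_TARGET: usize = 20;

/// Hypothetical abstraction over anything that reports a segment count.
trait SegmentCounter {
    fn segment_count(&self) -> usize;
}

/// Result of comparing the observed count against the target.
#[derive(Debug, PartialEq, Eq)]
enum SegmentHealth {
    WithinTarget { count: usize },
    OverTarget { count: usize },
}

/// Classifies the current segment count relative to SEGMENT_TARGET.
fn check_segment_health(source: &impl SegmentCounter) -> SegmentHealth {
    let count = source.segment_count();
    if count < SEGMENT_TARGET {
        SegmentHealth::WithinTarget { count }
    } else {
        SegmentHealth::OverTarget { count }
    }
}

/// Stand-in source with a fixed count, used only for illustration.
struct FixedCount(usize);

impl SegmentCounter for FixedCount {
    fn segment_count(&self) -> usize {
        self.0
    }
}
```

A caller could poll check_segment_health on a timer and log or alert whenever it returns OverTarget.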

Segment Count Target

< 20 segments at steady state after initial 1M-item ingest.

At 1M items with 1,000-item commits (tidalDB's default syncer batch size), the initial ingest produces ~1,000 commits (1,000,000 / 1,000), and each commit flushes at least one new segment. Without merge policy tuning, segment count can reach 50+. With min_num_segments=4, merges fire aggressively and keep the steady-state count below 20.

Verification

The integration tests in tidal/tests/tantivy_merge.rs (marked #[ignore]) verify:

  • tantivy_segment_evolution: segment count stays < 20 during 10 rounds of steady-state writes
  • tantivy_concurrent_read_write_latency: read p99 < 100ms during concurrent writes

To run:

cargo test --manifest-path tidal/Cargo.toml -- tantivy_segment_evolution --ignored --nocapture
cargo test --manifest-path tidal/Cargo.toml -- tantivy_concurrent_read_write_latency --ignored --nocapture
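
For readers unfamiliar with the p99 bound checked by tantivy_concurrent_read_write_latency, the sketch below shows one common way to compute it from recorded read latencies, using the nearest-rank method. This is an assumption about the general approach, not the code in tidal/tests/tantivy_merge.rs; the function names percentile and p99_within are hypothetical.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least
/// `pct` percent of all samples are less than or equal to it.
/// Returns None for an empty slice or a percentile above 100.
fn percentile(samples: &[Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // rank = ceil(pct * n / 100), computed in integers to avoid float rounding.
    let rank = (pct * sorted.len() + 99) / 100;
    // Clamp to [1, n] so pct = 0 maps to the minimum sample.
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}

/// True when the 99th-percentile latency is strictly below `budget`.
fn p99_within(samples: &[Duration], budget: Duration) -> bool {
    matches!(percentile(samples, 99), Some(p99) if p99 < budget)
}
```

With 100 samples, nearest-rank p99 is the 99th smallest value, so a single slow outlier does not fail the check but two do.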

Regression Guard

No regression in tidal/benches/text_index.rs and tidal/benches/search.rs benchmarks after applying LogMergePolicy changes.

References