Tantivy Merge Policy Tuning
Configuration Applied
TextIndex construction in tidal/src/text/index.rs configures LogMergePolicy with:
| Parameter | Value | Rationale |
|---|---|---|
| min_num_segments | 4 | More aggressive than the default (8); triggers merges earlier to bound segment count at steady state |
| max_docs_before_merge | 5,000,000 | Smaller max segment than the default (10M); reduces worst-case merge duration |
| del_docs_ratio_before_merge | 0.3 | Triggers a merge when 30% of docs are deleted; tidalDB uses delete-then-add for updates, so deleted docs accumulate |
Applied to both Item and Creator TextIndex instances.
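As a rough illustration of the table above, the three parameters map onto setters on Tantivy's LogMergePolicy. The helper name apply_merge_policy is hypothetical and is not taken from tidal/src/text/index.rs; the sketch assumes the tantivy crate is a dependency and that an IndexWriter has already been created.

```rust
// Sketch only: `apply_merge_policy` is a hypothetical helper, not tidalDB's actual code.
// Assumes the `tantivy` crate is available as a dependency.
use tantivy::merge_policy::LogMergePolicy;
use tantivy::IndexWriter;

fn apply_merge_policy(writer: &IndexWriter) {
    let mut policy = LogMergePolicy::default();
    // Start merging once 4 segments of similar size exist (Tantivy default: 8).
    policy.set_min_num_segments(4);
    // Segments above 5M docs are no longer merge candidates (Tantivy default: 10M).
    policy.set_max_docs_before_merge(5_000_000);
    // Segments with more than 30% deleted docs become merge candidates.
    policy.set_del_docs_ratio_before_merge(0.3);
    // The writer takes ownership of the boxed policy.
    writer.set_merge_policy(Box::new(policy));
}
```

Calling such a helper once per writer, right after the writer is created, would be one way to cover both the Item and Creator indexes.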
API
// Returns the number of active Tantivy search segments.
// Useful for monitoring merge policy effectiveness.
db.text_index.segment_count()
Note: TidalDb::text_segment_count() is exposed in tidal/src/db/items.rs.
Segment Count Target
< 20 segments at steady state after initial 1M-item ingest.
At 1M items with 1000-item commits (tidalDB's default syncer batch size), the initial
ingest produces ~1000 commits. Without merge policy tuning, segment count can reach 50+.
With min_num_segments=4, merges fire aggressively and keep steady-state count below 20.
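The commit arithmetic behind this estimate can be sketched as follows. The function and constant names are illustrative assumptions, not part of tidalDB; the sketch only restates the numbers quoted above (1M items, 1000-item batches, target below 20 segments).

```rust
/// Number of commits needed to ingest `items` documents in batches of
/// `batch_size`, rounding up for a final partial batch.
fn commit_count(items: u64, batch_size: u64) -> u64 {
    assert!(batch_size > 0, "batch size must be non-zero");
    (items + batch_size - 1) / batch_size
}

/// Steady-state target from this document: strictly fewer than 20 segments.
const SEGMENT_TARGET: usize = 20;

/// Returns true when an observed segment count meets the target.
fn within_target(segment_count: usize) -> bool {
    segment_count < SEGMENT_TARGET
}

fn main() {
    // 1M items at 1000 items per commit.
    let commits = commit_count(1_000_000, 1_000);
    println!("commits during initial ingest: {commits}");
    println!("50 segments within target: {}", within_target(50));
    println!("19 segments within target: {}", within_target(19));
}
```

Each commit adds at least one new segment, so the commit count is a lower bound on the number of segments created before any merging.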
Verification
The integration tests in tidal/tests/tantivy_merge.rs (marked #[ignore]) verify:
- tantivy_segment_evolution: segment count stays < 20 during 10 rounds of steady-state writes
- tantivy_concurrent_read_write_latency: read p99 < 100ms during concurrent writes
To run:
cargo test --manifest-path tidal/Cargo.toml -- tantivy_segment_evolution --ignored --nocapture
cargo test --manifest-path tidal/Cargo.toml -- tantivy_concurrent_read_write_latency --ignored --nocapture
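For readers unfamiliar with the p99 threshold used by the latency test, a percentile check of this kind could be computed roughly as below. This is a minimal sketch using the nearest-rank method; it is an assumption about how such a check might look, not the code in tidal/tests/tantivy_merge.rs, and the percentile function and sample data are hypothetical.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns None for an empty slice or a percentile above 100.
fn percentile(samples: &[Duration], pct: u32) -> Option<Duration> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // rank = ceil(pct / 100 * n), computed with integer arithmetic.
    let rank = (sorted.len() * pct as usize + 99) / 100;
    // Ranks are 1-based; clamp so that pct = 0 maps to the minimum.
    let idx = rank.max(1) - 1;
    Some(sorted[idx])
}

fn main() {
    // Hypothetical read latencies: 1 ms, 2 ms, ..., 100 ms.
    let samples: Vec<Duration> = (1..=100).map(Duration::from_millis).collect();
    let budget = Duration::from_millis(100);
    if let Some(p99) = percentile(&samples, 99) {
        println!("p99 = {:?}, within budget: {}", p99, p99 < budget);
    }
}
```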
Regression Guard
No regression in tidal/benches/text_index.rs and tidal/benches/search.rs benchmarks after
applying LogMergePolicy changes.
References
- docs/research/tantivy.md: LogMergePolicy background and parameter guidance
- Tantivy docs: https://docs.rs/tantivy/latest/tantivy/merge_policy/struct.LogMergePolicy.html