tidaldb/docs/profiling/tantivy-merge-tuning.md
2026-02-23 22:41:16 -07:00

# Tantivy Merge Policy Tuning
## Configuration Applied
`TextIndex` construction in `tidal/src/text/index.rs` configures `LogMergePolicy` with:
| Parameter | Value | Rationale |
|-----------|-------|-----------|
| `min_num_segments` | 4 | Lower than the default (8), so merges trigger earlier and bound the segment count at steady state |
| `max_docs_before_merge` | 5,000,000 | Half the default (10M); segments above this size are no longer merge candidates, which reduces worst-case merge duration |
| `del_docs_ratio_before_merge` | 0.3 | Makes a segment eligible for merging once more than 30% of its docs are deleted (default 1.0); tidalDB uses delete-then-add for updates, so deleted docs accumulate |

These settings are applied to both the Item and Creator `TextIndex` instances.
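
As a rough illustration, the values in the table could be applied through Tantivy's `LogMergePolicy` setters as sketched below. The in-RAM index, the `body` field, and the 50 MB writer budget are placeholders chosen for the example; this is not a copy of the code in `tidal/src/text/index.rs`.

```rust
use tantivy::merge_policy::LogMergePolicy;
use tantivy::schema::{Schema, TEXT};
use tantivy::{Index, IndexWriter};

fn main() -> tantivy::Result<()> {
    // Placeholder schema and in-RAM index (assumptions for this sketch only).
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("body", TEXT);
    let index = Index::create_in_ram(schema_builder.build());

    // Start from Tantivy's defaults and override the three tuned parameters.
    let mut policy = LogMergePolicy::default();
    policy.set_min_num_segments(4);
    policy.set_max_docs_before_merge(5_000_000);
    policy.set_del_docs_ratio_before_merge(0.3);

    // The policy is attached to the writer, not to the index itself.
    let writer: IndexWriter = index.writer(50_000_000)?;
    writer.set_merge_policy(Box::new(policy));
    Ok(())
}
```

Because the policy lives on the writer, each `TextIndex` instance would need to set it on its own writer.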
## API
```rust
// Returns the number of active Tantivy search segments.
// Useful for monitoring merge policy effectiveness.
db.text_index.segment_count()
```
Note: `TidalDb::text_segment_count()` is exposed in `tidal/src/db/items.rs`.
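
For orientation, one way such a count could be derived from a Tantivy `Index` is sketched below. The free function `segment_count` here is a hypothetical stand-in; whether tidalDB's method uses `searchable_segment_ids` or a `Searcher` internally is an assumption.

```rust
use tantivy::schema::{Schema, TEXT};
use tantivy::Index;

/// Hypothetical helper: number of segments currently visible to searches.
fn segment_count(index: &Index) -> tantivy::Result<usize> {
    Ok(index.searchable_segment_ids()?.len())
}

fn main() -> tantivy::Result<()> {
    // Placeholder schema and in-RAM index (assumptions for this sketch only).
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("body", TEXT);
    let index = Index::create_in_ram(schema_builder.build());

    println!("segments: {}", segment_count(&index)?);
    Ok(())
}
```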
## Segment Count Target
**Fewer than 20 segments at steady state** after the initial 1M-item ingest.
At 1M items with 1000-item commits (tidalDB's default syncer batch size), the initial
ingest produces ~1000 commits, and each commit adds at least one new segment. Without
merge policy tuning, segment count can reach 50+. With `min_num_segments=4`, merges
fire earlier and keep the steady-state count below 20.
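
The arithmetic behind the target can be sanity-checked with a toy model. The sketch below is not Tantivy's algorithm: it assumes one segment per commit, synchronous merges, and that exactly `min_num_segments` equally sized segments merge into one. Real merges are asynchronous and grouped by logarithmic size levels, so actual counts will differ.

```rust
/// Toy model (an assumption, not Tantivy's real policy): every commit adds one
/// segment of `batch` docs; whenever `min_num_segments` segments of identical
/// size exist, they are merged into a single larger segment.
/// Returns (commits, final segment count, peak segment count after merges).
fn simulate(total_items: u64, batch: u64, min_num_segments: usize) -> (u64, usize, usize) {
    let commits = total_items / batch;
    let mut segments: Vec<u64> = Vec::new(); // doc count of each live segment
    let mut peak = 0usize;

    for _ in 0..commits {
        segments.push(batch);

        // Cascade merges until no size class has enough segments left.
        loop {
            let mut sizes = segments.clone();
            sizes.sort_unstable();
            sizes.dedup();

            let mut merged = false;
            for size in sizes {
                let count = segments.iter().filter(|&&s| s == size).count();
                if count >= min_num_segments {
                    // Drop `min_num_segments` segments of this size...
                    let mut removed = 0;
                    segments.retain(|&s| {
                        if s == size && removed < min_num_segments {
                            removed += 1;
                            false
                        } else {
                            true
                        }
                    });
                    // ...and replace them with one merged segment.
                    segments.push(size * min_num_segments as u64);
                    merged = true;
                    break;
                }
            }
            if !merged {
                break;
            }
        }
        peak = peak.max(segments.len());
    }
    (commits, segments.len(), peak)
}

fn main() {
    let (commits, final_count, peak) = simulate(1_000_000, 1_000, 4);
    println!("commits={commits} final_segments={final_count} peak_segments={peak}");
}
```

Under these simplifying assumptions the model performs 1000 commits, ends with 10 segments, and never holds more than 14 after merging, which is consistent with a target below 20.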
## Verification
The integration tests in `tidal/tests/tantivy_merge.rs` (marked `#[ignore]`) verify:
- `tantivy_segment_evolution`: segment count stays < 20 during 10 rounds of steady-state writes
- `tantivy_concurrent_read_write_latency`: read p99 < 100ms during concurrent writes
To run:
```bash
cargo test --manifest-path tidal/Cargo.toml -- tantivy_segment_evolution --ignored --nocapture
cargo test --manifest-path tidal/Cargo.toml -- tantivy_concurrent_read_write_latency --ignored --nocapture
```
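
The p99 threshold in the latency test can be read as a nearest-rank percentile over the recorded read latencies. The sketch below shows that calculation; the `percentile` helper and the synthetic samples are hypothetical and not taken from `tidal/tests/tantivy_merge.rs`.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least `pct`
/// percent of all samples are less than or equal to it.
/// Returns None for an empty slice or a percentile above 100.
fn percentile(samples: &[Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct > 100 {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Integer ceiling of (len * pct / 100), clamped so pct = 0 maps to the minimum.
    let rank = (sorted.len() * pct + 99) / 100;
    Some(sorted[rank.max(1) - 1])
}

fn main() {
    // Synthetic latencies of 1 ms through 100 ms, for illustration only.
    let samples: Vec<Duration> = (1..=100).map(Duration::from_millis).collect();
    let p99 = percentile(&samples, 99).expect("samples are non-empty");
    let budget = Duration::from_millis(100);
    println!("p99 = {:?}, within budget: {}", p99, p99 < budget);
}
```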
## Regression Guard
No regression in `tidal/benches/text_index.rs` and `tidal/benches/search.rs` benchmarks after
applying `LogMergePolicy` changes.
## References
- `docs/research/tantivy.md`: `LogMergePolicy` background and parameter guidance
- Tantivy docs: https://docs.rs/tantivy/latest/tantivy/merge_policy/struct.LogMergePolicy.html