# Tantivy Merge Policy Tuning
## Configuration Applied

`TextIndex` construction in `tidal/src/text/index.rs` configures `LogMergePolicy` with the following parameters:

| Parameter | Value | Rationale |
|-----------|-------|-----------|
| `min_num_segments` | 4 | More aggressive than the default (8): merges trigger once 4 similarly sized segments accumulate, which bounds segment count at steady state. |
| `max_docs_before_merge` | 5,000,000 | Half the default (10M): segments holding more documents than this are excluded from further merges, which reduces worst-case merge duration. |
| `del_docs_ratio_before_merge` | 0.3 | Triggers a merge once 30% of a segment's documents are deleted. tidalDB implements updates as delete-then-add, so deleted documents accumulate. |

These settings apply to both the Item and Creator `TextIndex` instances.
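To make the interaction of the three parameters concrete, here is a simplified, std-only model of how a log-structured merge policy might pick merge candidates. It is written for this note and is not tantivy's source: the `Segment` and `MergeKnobs` types and the `merge_candidates` function are hypothetical, the level grouping only approximates `LogMergePolicy`, and the `MIN_LAYER_SIZE` (10,000) and `LEVEL_LOG_SIZE` (0.75) constants are assumed to match tantivy's defaults for the knobs tidalDB does not set.

```rust
// Simplified, hypothetical model of how LogMergePolicy's three knobs gate
// merges. Not tantivy's source; level grouping is approximate.

#[derive(Clone, Debug)]
struct Segment {
    id: u32,
    num_docs: u32,    // live (non-deleted) documents
    num_deleted: u32, // deleted documents still stored in the segment
}

struct MergeKnobs {
    min_num_segments: usize,
    max_docs_before_merge: u32,
    del_docs_ratio_before_merge: f32,
}

// ASSUMPTION: tantivy defaults for the knobs tidalDB leaves untouched.
const MIN_LAYER_SIZE: u32 = 10_000;
const LEVEL_LOG_SIZE: f64 = 0.75;

fn deleted_ratio(s: &Segment) -> f32 {
    let max_doc = s.num_docs + s.num_deleted;
    if max_doc == 0 {
        0.0
    } else {
        s.num_deleted as f32 / max_doc as f32
    }
}

/// Returns the ids of each group of segments the model would merge.
fn merge_candidates(segments: &[Segment], knobs: &MergeKnobs) -> Vec<Vec<u32>> {
    // Segments above max_docs_before_merge are never merge candidates.
    let mut eligible: Vec<&Segment> = segments
        .iter()
        .filter(|s| s.num_docs <= knobs.max_docs_before_merge)
        .collect();
    // Largest first, so each level is anchored by its biggest segment.
    eligible.sort_by(|a, b| b.num_docs.cmp(&a.num_docs));

    // Group segments whose log2 size is within LEVEL_LOG_SIZE of the level's
    // largest; sizes below MIN_LAYER_SIZE are treated as MIN_LAYER_SIZE.
    let mut levels: Vec<Vec<&Segment>> = Vec::new();
    let mut level_top = f64::MAX;
    for s in eligible {
        let log_size = f64::from(s.num_docs.max(MIN_LAYER_SIZE)).log2();
        if levels.is_empty() || log_size < level_top - LEVEL_LOG_SIZE {
            level_top = log_size;
            levels.push(Vec::new());
        }
        levels.last_mut().unwrap().push(s);
    }

    // A level is merged when it has enough segments, or when any of its
    // segments carries too many deletes.
    levels
        .into_iter()
        .filter(|level| {
            level.len() >= knobs.min_num_segments
                || level
                    .iter()
                    .any(|s| deleted_ratio(s) > knobs.del_docs_ratio_before_merge)
        })
        .map(|level| level.iter().map(|s| s.id).collect())
        .collect()
}
```

With the table's values, this model groups four 1,000-document segments into one merge but leaves three alone, merges a single segment with 40% deleted documents on its own to purge the deletes, and never selects a segment above 5,000,000 documents.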
## API

```rust
// Returns the number of active Tantivy search segments.
// Useful for monitoring merge policy effectiveness.
let segment_count = db.text_index.segment_count();
```

Note: the public accessor `TidalDb::text_segment_count()` is exposed in `tidal/src/db/items.rs`.
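One way to use the segment count for monitoring is to sample it after each commit batch and track the peak against the steady-state target of 20. The sketch below is hypothetical: `SegmentCountMonitor` is not part of tidalDB, and it assumes the value returned by `TidalDb::text_segment_count()` can be passed in as a `usize`.

```rust
/// Hypothetical monitor (not part of tidalDB): feed it the segment count
/// sampled after each commit batch and it tracks the peak and breaches.
struct SegmentCountMonitor {
    target: usize,   // steady-state ceiling, e.g. 20
    peak: usize,     // highest count observed so far
    breaches: usize, // samples at or above the target
}

impl SegmentCountMonitor {
    fn new(target: usize) -> Self {
        Self { target, peak: 0, breaches: 0 }
    }

    /// Records one sample; returns true if it is strictly under the target.
    fn record(&mut self, count: usize) -> bool {
        self.peak = self.peak.max(count);
        let ok = count < self.target;
        if !ok {
            self.breaches += 1;
        }
        ok
    }
}
```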
## Segment Count Target

**< 20 segments at steady state** after the initial 1M-item ingest.

At 1M items with 1000-item commits (tidalDB's default syncer batch size), the initial ingest produces ~1000 commits, each of which creates at least one new segment. Without merge policy tuning, segment count can reach 50+. With `min_num_segments=4`, merges fire as soon as four similarly sized segments accumulate, keeping the steady-state count below 20.
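The direction of the `min_num_segments` effect can be illustrated with a toy model. This sketch assumes every commit adds exactly one segment and that merges complete instantly whenever `merge_factor` equally sized segments exist; it ignores `LogMergePolicy`'s level grouping, minimum layer size, deletes, and background merge scheduling. `simulate_tiered_merges` is a hypothetical helper, and its numbers are not tidalDB measurements.

```rust
/// Toy model, not tantivy's LogMergePolicy: each commit adds one segment
/// to tier 0, and whenever `merge_factor` segments share a tier they merge
/// instantly into one segment on the next tier up.
/// Returns (peak segment count, final segment count).
fn simulate_tiered_merges(commits: usize, merge_factor: usize) -> (usize, usize) {
    assert!(merge_factor >= 2, "merge_factor must be at least 2");
    let mut tiers: Vec<usize> = Vec::new(); // tiers[k]: segments on tier k
    let mut peak: usize = 0;
    for _ in 0..commits {
        // Add the new segment, then cascade merges upward.
        let mut tier = 0;
        loop {
            if tier == tiers.len() {
                tiers.push(0);
            }
            tiers[tier] += 1;
            if tiers[tier] < merge_factor {
                break;
            }
            tiers[tier] = 0; // merge_factor segments collapse into one...
            tier += 1; // ...that lands on the next tier
        }
        let count: usize = tiers.iter().sum();
        peak = peak.max(count);
    }
    let final_count: usize = tiers.iter().sum();
    (peak, final_count)
}
```

In this idealized model the segment count equals the digit sum of the commit count written in base `merge_factor`, so 1,000 commits peak at 14 segments with a factor of 4 and at 21 with a factor of 8 (ending at 10 and 13). Real counts can be higher, since tantivy runs merges in the background while new segments keep arriving.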
## Verification

The integration tests in `tidal/tests/tantivy_merge.rs` are marked `#[ignore]` and verify:

- `tantivy_segment_evolution`: segment count stays below 20 during 10 rounds of steady-state writes.
- `tantivy_concurrent_read_write_latency`: read p99 latency stays below 100 ms during concurrent writes.

To run them:

```bash
cargo test --manifest-path tidal/Cargo.toml -- tantivy_segment_evolution --ignored --nocapture
cargo test --manifest-path tidal/Cargo.toml -- tantivy_concurrent_read_write_latency --ignored --nocapture
```
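The latency criterion can be expressed as a percentile over recorded read durations. The sketch below is a hypothetical helper using the nearest-rank method with integer arithmetic; the actual test in `tidal/tests/tantivy_merge.rs` may compute p99 differently.

```rust
use std::time::Duration;

/// Nearest-rank percentile (`pct` in 1..=100) of recorded latencies.
/// Sorts `samples` in place; returns None for empty input or a bad pct.
fn percentile(samples: &mut [Duration], pct: usize) -> Option<Duration> {
    if samples.is_empty() || pct == 0 || pct > 100 {
        return None;
    }
    samples.sort_unstable();
    // rank = ceil(pct * n / 100), 1-based; integer math avoids float rounding.
    let rank = (pct * samples.len() + 99) / 100;
    Some(samples[rank - 1])
}

/// Hypothetical pass/fail check mirroring "read p99 < 100ms".
fn p99_within_budget(samples: &mut [Duration], budget: Duration) -> bool {
    matches!(percentile(samples, 99), Some(p99) if p99 < budget)
}
```

For 100 samples of 1 to 100 ms, the nearest-rank p99 is 99 ms, which passes a 100 ms budget.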
## Regression Guard

The `tidal/benches/text_index.rs` and `tidal/benches/search.rs` benchmarks show no regression after applying the `LogMergePolicy` changes.
## References

- `docs/research/tantivy.md`: `LogMergePolicy` background and parameter guidance
- Tantivy docs: https://docs.rs/tantivy/latest/tantivy/merge_policy/struct.LogMergePolicy.html