# Flamegraph Hotspot Analysis

## Profiling Setup

### macOS (primary)

```bash
cargo install samply
cargo build --manifest-path tidal/Cargo.toml --release --benches
samply record target/release/deps/scale-*
```

### Linux

```bash
cargo install flamegraph
cargo flamegraph --manifest-path tidal/Cargo.toml --bench scale
```

Both tools generate profiles that can be viewed in the browser (samply) or as SVG (flamegraph).

_SVG artifacts: `docs/profiling/retrieve_1m.svg` and `docs/profiling/search_1m.svg`_

_These are generated by running the benchmarks and are not committed to the repository._

## Expected Hot Paths

Based on code analysis of the RETRIEVE and SEARCH pipelines:

### RETRIEVE (`for_you` profile, 1M items)

| Stage | Hot Path | Estimated % |
|-------|---------|-------------|
| Stage 3: Signal scoring | DashMap lookup + `exp()` per candidate | ~45% |
| Stage 1: Candidate generation | RoaringBitmap universe scan | ~20% |
| Stage 2: Filter evaluation | Bitmap intersections | ~15% |
| Stage 4: Sort (top-K) | `Vec::sort_unstable` on scored candidates | ~10% |
| Stage 5: Result assembly | Struct allocation, metadata lookup | ~10% |

### SEARCH (`text_only` query, 1M items)

| Stage | Hot Path | Estimated % |
|-------|---------|-------------|
| BM25 engine | Tantivy posting list traversal, IDF scoring | ~50% |
| ANN (USearch HNSW) | Graph traversal at ef_search=200 | ~30% |
| RRF fusion | Score normalization, merge sort | ~10% |
| Post-filter | Bitmap filter evaluation | ~10% |

## Optimizations Applied

### 1. Sort optimization for top-K (RETRIEVE Stage 4)

**Applied:** `sort_by` -> `sort_unstable_by` at both sort sites in `tidal/src/ranking/executor/mod.rs`.

**Deferred:** The full `select_nth_unstable_by` optimization (O(N) partition + O(K log K) sort) requires knowing `limit` at the sort site. In the current executor design, truncation to `limit` is applied by the caller after normalization, so the sort site does not have access to `K`.
Applying `select_nth_unstable_by` would require passing `limit` through the scoring pipeline, which is a more invasive refactor deferred to a future optimization pass.

**Current benefit:** `sort_unstable_by` has the same O(N log N) complexity as `sort_by` but is expected to be ~5-10% faster in practice, because it does not preserve the order of equal elements and sorts in place without an auxiliary buffer.

### 2. LogMergePolicy for Tantivy (SEARCH BM25)

Reduces segment count from potentially hundreds to < 20 at steady state. Fewer segments = fewer posting list merges during BM25 scoring.

**Location:** `tidal/src/text/index.rs`, where `LogMergePolicy` is configured with:

- `min_num_segments = 4`
- `max_docs_before_merge = 5_000_000`
- `del_docs_ratio_before_merge = 0.3`

### 3. ef_construction=400 (ANN build quality)

Higher construction quality reduces the number of graph re-traversals needed during search, improving recall without increasing search latency.

## Deferred Hotspot Work

The following optimizations were identified but deferred as premature until flamegraph profiling confirms they are in the top-3 hotspots:

- **DashMap lookup sharding:** Batch signal reads by pre-sorting entity IDs by DashMap shard to improve cache locality. Expected gain: ~10-15% in Stage 3.
- **`exp()` approximation:** Replace `(-lambda * dt).exp()` with a fast approximation (e.g., `exp_fast` via bit manipulation). Expected gain: ~5-8% in Stage 3 on high-throughput paths.
- **Tantivy heap budget increase:** 50MB→100MB to reduce flushing frequency under 1M ingest.

These should be re-evaluated after actual flamegraph data is collected.

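
The deferred top-K selection from optimization 1 (O(N) partition followed by an O(K log K) sort) could look roughly like the sketch below, assuming `limit` were made available at the sort site. The `Scored` struct, the `by_score_desc` comparator, and the `top_k` function are hypothetical names for illustration only and do not reflect the actual types in `tidal/src/ranking/executor/mod.rs`. The tie-break on `id` is likewise an assumption.

```rust
use std::cmp::Ordering;

/// Hypothetical scored candidate; the real executor's type may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub id: u64,
    pub score: f32,
}

/// Orders by descending score, with ascending id as a deterministic tie-break
/// (the tie-break is an assumption, not taken from the executor).
fn by_score_desc(a: &Scored, b: &Scored) -> Ordering {
    b.score.total_cmp(&a.score).then_with(|| a.id.cmp(&b.id))
}

/// Returns the `limit` highest-scoring candidates in descending score order.
pub fn top_k(mut candidates: Vec<Scored>, limit: usize) -> Vec<Scored> {
    if limit == 0 {
        return Vec::new();
    }
    if limit < candidates.len() {
        // O(N) on average: afterwards the first `limit` elements are the
        // top `limit` candidates, in arbitrary order.
        candidates.select_nth_unstable_by(limit - 1, by_score_desc);
        candidates.truncate(limit);
    }
    // O(K log K): only the surviving prefix is fully sorted.
    candidates.sort_unstable_by(by_score_desc);
    candidates
}
```

When `limit` is at least the number of candidates, the function falls back to a plain `sort_unstable_by`, which matches the behavior applied today.
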
## Before/After Evidence

| Optimization | Change | Expected Benefit |
|-------------|--------|-----------------|
| `sort_by` -> `sort_unstable_by` (Stage 4) | Eliminates stability bookkeeping | ~5-10% reduction in sort time |
| LogMergePolicy (SEARCH BM25) | Reduces segment count from potentially 50+ to < 20 | Fewer posting list merges per query |
| ef_construction=400 (ANN build) | Higher graph quality -> fewer graph re-traversals during search | Improved recall without latency regression |

_Note: Actual before/after latency deltas require running `samply`/`flamegraph` on the release bench binary. Flamegraph SVG artifacts are machine-generated and are not committed to the repository because they are too large for git. To generate:_

```bash
cargo install samply
cargo build --manifest-path tidal/Cargo.toml --release --benches
samply record target/release/deps/scale-*
```

## Next Steps

1. Run `samply record target/release/deps/scale-*` on macOS target hardware
2. Identify top-3 hotspots from the flamegraph (by cumulative self-time)
3. Apply the corresponding optimization for any hotspot that accounts for > 10% of total time
4. Re-run the benchmark before and after each change to confirm the improvement
5. Update this document with actual SVG artifacts and measured deltas