tidaldb/docs/profiling/signal-rollup-eval.md
2026-02-23 22:41:16 -07:00

# Signal Rollup Evaluation
## Decision: **Defer 30-day rollups — not needed**
### Analysis
The 7-day windowed count (`Window::SevenDays`) reads **168 `AtomicU32` buckets** per entity
from `BucketedCounter::hour_buckets`. Each read is a single `Ordering::Relaxed` load.
**Per-entity cost:**
```
168 relaxed AtomicU32 loads × ~1ns each = ~168ns per entity
```
**At 1M items, iterating all entities for ranking:**
```
1M entities × 168ns = ~168ms total for a full scan
```
This is the worst case — a full scan of all signal state. In practice:
- Ranking queries work on candidate sets (typically 200–2000 items)
- `BucketedCounter::sum_window(SevenDays)` is called per candidate, not per all items
- Per-candidate cost: ~168ns → for 2000 candidates = ~336µs → well within 50ms RETRIEVE budget
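
The read path described above can be sketched as follows. This is a minimal illustration, not the actual tidaldb implementation: the struct layout, `new`, `record`, and `sum_seven_days` are assumptions modeled on the names `BucketedCounter::hour_buckets` and `sum_window(SevenDays)`, and bucket expiry (clearing a slot before the ring reuses it) is deliberately left out.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Number of hourly buckets covering seven days (7 * 24).
const HOUR_BUCKETS: usize = 168;

/// Hypothetical stand-in for `BucketedCounter`; only the hourly ring is modeled.
pub struct BucketedCounter {
    hour_buckets: [AtomicU32; HOUR_BUCKETS],
}

impl BucketedCounter {
    /// Create a counter with every bucket at zero.
    pub fn new() -> Self {
        Self {
            hour_buckets: std::array::from_fn(|_| AtomicU32::new(0)),
        }
    }

    /// Add `n` events to the bucket for `hour` (hours since an arbitrary epoch).
    /// The ring wraps every 168 hours; clearing stale buckets is omitted here.
    pub fn record(&self, hour: usize, n: u32) {
        self.hour_buckets[hour % HOUR_BUCKETS].fetch_add(n, Ordering::Relaxed);
    }

    /// Seven-day count: one relaxed load per bucket, 168 loads in total.
    /// Sums into a u64 so the total cannot overflow.
    pub fn sum_seven_days(&self) -> u64 {
        self.hour_buckets
            .iter()
            .map(|bucket| u64::from(bucket.load(Ordering::Relaxed)))
            .sum()
    }
}
```

A ranking pass would call `sum_seven_days()` once per candidate, which is where the 168-loads-per-candidate figure comes from.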
### Benchmark Results
| Window | Atomic loads | Per-entity latency (est.) | Total at 2K candidates (est.) |
|--------|-------------|---------------------------|-------------------------------|
| OneHour | 60 | ~60ns | ~120µs |
| TwentyFourHours | 24 | ~24ns | ~48µs |
| SevenDays | 168 | ~168ns | ~336µs |
| AllTime | 1 | ~1ns | ~2µs |
> **Note:** Values in this table are analytical estimates based on atomic load latency
> (~1ns per `Ordering::Relaxed` load on x86-64) multiplied by bucket count.
> To measure actual values: `cargo bench --manifest-path tidal/Cargo.toml --bench signals`
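
The estimates in the table follow from a simple cost model, sketched below. The `Window` enum here is a local stand-in that mirrors the table rows, and `NS_PER_LOAD` and `estimated_ns` are hypothetical names; the 1ns figure is the assumption stated in the note, not a measurement.

```rust
/// Assumed cost of one cache-resident relaxed load, in nanoseconds.
const NS_PER_LOAD: u64 = 1;

/// Local stand-in for the windows listed in the table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Window {
    OneHour,
    TwentyFourHours,
    SevenDays,
    AllTime,
}

impl Window {
    /// Atomic loads needed per entity, copied from the table.
    pub fn atomic_loads(self) -> u64 {
        match self {
            Window::OneHour => 60,
            Window::TwentyFourHours => 24,
            Window::SevenDays => 168,
            Window::AllTime => 1,
        }
    }
}

/// Estimated nanoseconds to sum `window` for each of `candidates` entities.
pub fn estimated_ns(window: Window, candidates: u64) -> u64 {
    window.atomic_loads() * NS_PER_LOAD * candidates
}
```

For example, `estimated_ns(Window::SevenDays, 2_000)` gives 336,000ns (336µs), and the same call with 1,000,000 entities gives 168,000,000ns (168ms), matching the full-scan figure in the analysis.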
### Threshold Decision Matrix
| p99 latency | Decision |
|-------------|----------|
| < 10ms | Defer rollups; `BucketedCounter` is sufficient |
| 10–50ms | Implement hourly rollups for days 8–30 |
| > 50ms | Investigate root cause first |
**Result: << 10ms → rollups deferred**
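
The matrix can be expressed as a small function, which makes the thresholds easy to re-check once real benchmark numbers exist. `RollupDecision` and `decide` are hypothetical names, and treating exactly 10ms and exactly 50ms as part of the middle band is an assumption about how the boundaries are meant to be read.

```rust
use std::time::Duration;

/// Outcomes from the threshold decision matrix.
#[derive(Debug, PartialEq, Eq)]
pub enum RollupDecision {
    /// Below 10ms: the in-memory buckets are sufficient.
    Defer,
    /// 10ms to 50ms inclusive: add hourly rollups for days 8 to 30.
    ImplementHourlyRollups,
    /// Above 50ms: something else is wrong, look there first.
    InvestigateRootCause,
}

/// Map a measured p99 latency onto the decision matrix.
pub fn decide(p99: Duration) -> RollupDecision {
    if p99 < Duration::from_millis(10) {
        RollupDecision::Defer
    } else if p99 <= Duration::from_millis(50) {
        RollupDecision::ImplementHourlyRollups
    } else {
        RollupDecision::InvestigateRootCause
    }
}
```

With the estimated ~336µs for seven-day sums over 2000 candidates, `decide(Duration::from_micros(336))` returns `RollupDecision::Defer`.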
### Why Rollups Are Not Needed
1. **Candidates, not full scans:** Signal scoring operates on retrieved candidates (200–2K items),
not the entire 1M-item universe. The 168-load cost is per candidate.
2. **Cache-friendly access:** `BucketedCounter::hour_buckets` is 168 consecutive `AtomicU32`
slots = 672 bytes. This spans about 11 cache lines of 64 bytes, read sequentially, so summing one entity's buckets is friendly to the hardware prefetcher.
3. **Relaxed ordering:** All bucket loads use `Ordering::Relaxed`, which compiles to a plain
MOV on x86-64, with no fence and no LOCK-prefixed instruction.
4. **No persistent reads:** BucketedCounter is in-memory. There are no disk reads.
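
The footprint arithmetic in point 2 can be checked directly. This sketch assumes 64-byte cache lines and a bucket array that starts on a cache-line boundary; an unaligned array could touch one extra line. The constant and function names are illustrative only.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicU32;

/// Hourly buckets covering seven days.
const HOUR_BUCKETS: usize = 168;
/// Assumed cache-line size; 64 bytes is typical on x86-64.
const CACHE_LINE_BYTES: usize = 64;

/// Bytes occupied by one entity's hourly buckets.
pub fn hour_bucket_bytes() -> usize {
    HOUR_BUCKETS * size_of::<AtomicU32>()
}

/// Cache lines spanned by the buckets, rounding up and assuming
/// the array begins on a cache-line boundary.
pub fn cache_lines_spanned() -> usize {
    (hour_bucket_bytes() + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES
}
```

`hour_bucket_bytes()` evaluates to 672 and `cache_lines_spanned()` to 11, since 672 / 64 = 10.5 rounds up.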
### If 30-Day Windows Are Needed
The current schema supports `Window::SevenDays` as the maximum windowed count.
`Window::ThirtyDays` is not implemented (returns `0` via AllTime fallback).
If 30-day trending is required in the future, the rollup approach would be:
- Key: `[entity_id:8B][Tag::HourlyRollup][signal_type_id:2B][hour_bucket:4B]`
- `materialize_hourly_rollups()`: background materialization once per hour
- `read_30d_windowed_count()`: sum(168 hot hour buckets) + sum(rollups days 8–30)
- Retention: 30-day TTL via `gc_old_rollups()`
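
A sketch of how the proposed key layout and 30-day read might look is below. Several details are assumptions, not part of the plan above: the tag is taken to be one byte with a placeholder value, fields are encoded big-endian so that byte-wise key order matches numeric hour order, and `read_30d_windowed_count` takes already-fetched slices instead of touching storage.

```rust
/// Placeholder tag byte; the real `Tag::HourlyRollup` value is not specified.
const TAG_HOURLY_ROLLUP: u8 = 0x01;

/// entity_id (8B) + tag (assumed 1B) + signal_type_id (2B) + hour_bucket (4B).
pub const ROLLUP_KEY_LEN: usize = 8 + 1 + 2 + 4;

/// Build `[entity_id][Tag::HourlyRollup][signal_type_id][hour_bucket]`.
/// Big-endian encoding keeps one entity's hours contiguous and ordered
/// under lexicographic key comparison.
pub fn hourly_rollup_key(
    entity_id: u64,
    signal_type_id: u16,
    hour_bucket: u32,
) -> [u8; ROLLUP_KEY_LEN] {
    let mut key = [0u8; ROLLUP_KEY_LEN];
    key[0..8].copy_from_slice(&entity_id.to_be_bytes());
    key[8] = TAG_HOURLY_ROLLUP;
    key[9..11].copy_from_slice(&signal_type_id.to_be_bytes());
    key[11..15].copy_from_slice(&hour_bucket.to_be_bytes());
    key
}

/// 30-day count: in-memory hot buckets (days 1 to 7) plus
/// persisted hourly rollups (days 8 to 30).
pub fn read_30d_windowed_count(hot_hour_buckets: &[u32], rollups_days_8_to_30: &[u32]) -> u64 {
    let hot: u64 = hot_hour_buckets.iter().map(|&c| u64::from(c)).sum();
    let cold: u64 = rollups_days_8_to_30.iter().map(|&c| u64::from(c)).sum();
    hot + cold
}
```

Under these assumptions a 30-day read would sum 168 hot buckets plus up to 552 rollup values (23 days × 24 hours) per entity and signal type.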
**This is deferred to M8+ when production data confirms the need.**