2026-02-23 22:41:16 -07:00

# Signal State Memory Analysis
## Struct Sizes (verified at runtime in `trimmer.rs` tests)
| Component | Size | Notes |
|-----------|------|-------|
| `HotSignalState` | **64 bytes** | `#[repr(C, align(64))]` — exactly one cache line. Compile-time asserted. |
| `BucketedCounter` | **~944 bytes** | `[AtomicU32; 60]` (240B) + `[AtomicU32; 168]` (672B) + 2×`AtomicU8` (2B) + 3×`AtomicU64` (24B) + alignment padding ≈ 944B |
| `EntitySignalEntry` (hot + warm) | **~1,008 bytes** | `HotSignalState` + `BucketedCounter` |
| DashMap per-entry overhead | **~80 bytes** | Hash metadata, shard lock overhead, key storage |
| **Total per `(entity, signal_type)` entry** | **~1,088 bytes** | |
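
The sizes in the table can be checked with `std::mem::size_of`. The sketch below mirrors only the field types listed above. The struct names, field names, field order, and the `HotSignalState` stand-in fields are hypothetical, since the real definitions are not reproduced in this document. `#[repr(C)]` pins the declared order so the padding is predictable.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicU8};

/// Hypothetical stand-in for `HotSignalState`: only its
/// `#[repr(C, align(64))]` layout comes from the document.
#[allow(dead_code)]
#[repr(C, align(64))]
struct HotSketch {
    last_update_ns: AtomicU64,
    score_bits: AtomicU64,
}

/// Mirrors the field types listed for `BucketedCounter` (names and order assumed).
/// Widest fields first, so padding only appears at the end.
#[allow(dead_code)]
#[repr(C)]
struct BucketedSketch {
    totals: [AtomicU64; 3],      // 24 B
    buckets_60: [AtomicU32; 60], // 240 B
    buckets_168: [AtomicU32; 168], // 672 B
    cursors: [AtomicU8; 2],      // 2 B, total 938 B, padded to 944 B (align 8)
}

/// Hypothetical single struct embedding both parts inline.
#[allow(dead_code)]
#[repr(C)]
struct EntrySketch {
    hot: HotSketch,
    warm: BucketedSketch,
}

/// (size, alignment) in bytes for each sketch type.
fn layout_report() -> [(usize, usize); 3] {
    [
        (size_of::<HotSketch>(), align_of::<HotSketch>()),
        (size_of::<BucketedSketch>(), align_of::<BucketedSketch>()),
        (size_of::<EntrySketch>(), align_of::<EntrySketch>()),
    ]
}
```

One thing worth confirming against the real `EntitySignalEntry`: if the hot and warm parts are stored inline in a single struct, as in this sketch, the 64-byte alignment forces the struct size up to a multiple of 64, so the combined entry is 1,024 bytes rather than 1,008.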
## Memory Projection
| Scale | Entries | Projected Memory |
|-------|---------|-----------------|
| 100K items × 2 signals | 200K | ~218 MB |
| 100K items × 10 signals | 1M | ~1.1 GB |
| 1M items × 2 signals | 2M | ~2.2 GB |
| **1M items × 10 signals** | **10M** | **~10.9 GB** |
| 5M items × 10 signals | 50M | ~54 GB |
## Budget Assessment
**Budget:** 10 GB signal state target.

**Result at 1M × 10 signals:** ~10.9 GB, **about 9% over budget**.

Because the projection exceeds the budget, signal state needs trimming. The `trim_cold_entries` function
in `tidal/src/signals/trimmer.rs` provides an oldest-first eviction policy
with a default threshold of **5M entries (~5.4 GB)**, leaving close to a 2× safety
margin below the 10 GB budget.
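Every figure in the projection table and the budget check follows from one per-entry constant. A minimal sketch of that arithmetic, using decimal units as the table does; the function names are illustrative and do not come from the codebase:

```rust
/// Per-entry cost from the struct-size table: 64 B hot state + ~944 B
/// `BucketedCounter` + ~80 B estimated DashMap overhead.
const BYTES_PER_ENTRY: u64 = 64 + 944 + 80; // 1,088 B

/// Projected signal-state memory in bytes (decimal: 1 GB = 10^9 B).
fn projected_bytes(items: u64, signal_types: u64) -> u64 {
    items * signal_types * BYTES_PER_ENTRY
}

/// Largest entry count whose projected size fits within `budget_bytes`.
fn max_entries_within(budget_bytes: u64) -> u64 {
    budget_bytes / BYTES_PER_ENTRY
}

/// How many times the budget covers a ledger capped at `max_entries`.
fn safety_margin(budget_bytes: u64, max_entries: u64) -> f64 {
    budget_bytes as f64 / (max_entries * BYTES_PER_ENTRY) as f64
}
```

Under these inputs a 10 GB budget holds about 9.19M entries, so the 5M default leaves roughly a 1.84× margin.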
## Memory Bounding via LRU Trimmer
### Threshold
```rust
pub const DEFAULT_MAX_SIGNAL_ENTRIES: usize = 5_000_000; // ~5.44 GB
```
### Policy
- **Oldest first:** entries sorted by `last_update_ns` ascending
- **Batch eviction:** removes enough entries to reach `max_entries` in a single pass
- **O(N log N)** per trim pass (dominated by the sort); at most one pass runs per checkpoint interval (every 30s)
- **Thread-safe:** uses DashMap's snapshot iteration + per-shard removes
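
A minimal sketch of the oldest-first, single-pass policy, assuming a plain `std::collections::HashMap` from a `(entity, signal_type)` key to `last_update_ns` in place of the real DashMap-backed ledger. `trim_oldest` is a hypothetical name, not the signature of `trim_cold_entries`:

```rust
use std::collections::HashMap;

/// Hypothetical key: (entity id, signal type), matching the measurement section.
type Key = (u64, u16);

/// Oldest-first batch eviction over a map of key -> `last_update_ns`.
/// Returns the number of entries removed.
fn trim_oldest(entries: &mut HashMap<Key, u64>, max_entries: usize) -> usize {
    if entries.len() <= max_entries {
        return 0;
    }
    let excess = entries.len() - max_entries;
    // Snapshot (timestamp, key) pairs so the map is not borrowed while removing.
    let mut by_age: Vec<(u64, Key)> = entries.iter().map(|(k, &ts)| (ts, *k)).collect();
    // O(N log N): oldest timestamps first.
    by_age.sort_unstable();
    // Single pass: drop exactly enough entries to reach `max_entries`.
    for (_, key) in by_age.into_iter().take(excess) {
        entries.remove(&key);
    }
    excess
}
```

Two design notes. `select_nth_unstable` could find the eviction cutoff in O(N) on average instead of sorting; the full sort here matches the O(N log N) bound listed above. On a `DashMap`, the snapshot would come from its iterator and each removal from `DashMap::remove`, which write-locks only the shard that owns the key.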
### Trigger
Trimming runs in the checkpoint background thread (`run_checkpoint_thread` in `db/state_rebuild.rs`),
checked every 30 seconds:
```rust
if ledger.entries().len() > DEFAULT_MAX_SIGNAL_ENTRIES {
    // Evict the least recently updated entries down to the threshold.
    trim_cold_entries(ledger.entries(), DEFAULT_MAX_SIGNAL_ENTRIES);
}
```
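The 30-second cadence can be driven by a loop like the one below. This is a generic pattern under stated assumptions, not the body of `run_checkpoint_thread`: `spawn_periodic` is a hypothetical helper, and `Receiver::recv_timeout` serves as both the sleep and the shutdown signal, so stopping the thread does not wait out a full interval.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical helper: runs `task` on a background thread once per `interval`
/// until a stop message arrives or the returned `Sender` is dropped.
fn spawn_periodic<F>(interval: Duration, mut task: F) -> (Sender<()>, JoinHandle<()>)
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || loop {
        match stop_rx.recv_timeout(interval) {
            // Nothing received within one interval: run the periodic check.
            Err(RecvTimeoutError::Timeout) => task(),
            // Explicit stop request, or every sender dropped: exit the thread.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
        }
    });
    (stop_tx, handle)
}
```

Here the task would be a closure that owns a handle to the ledger and runs the length check and `trim_cold_entries` call shown above, passed with `Duration::from_secs(30)`.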
### Correctness
Re-signalling an evicted entity creates a fresh entry. The new entry lacks the window counts
recorded before eviction (`BucketedCounter` is zeroed on creation), but decay scores are accurate
from the new events forward. This is acceptable for a ranking system: the trimmer evicts the least
recently updated entries first, and stale signal state is less accurate than no signal state.
## Actual DashMap Overhead Measurement
To verify the ~80 bytes/entry DashMap overhead assumption, measure empirically:
```rust
let map: DashMap<(u64, u16), [u8; 1008]> = DashMap::new();
// Insert N entries
// Read /proc/self/status VmRSS before and after
// overhead_per_entry = (after - before - N * 1008) / N
```
Expected result: ~60-100 bytes/entry depending on load factor and shard count.
The 80-byte figure sits mid-range rather than at the top of that range: at 100 bytes/entry the per-entry cost would be ~1,108 bytes, about 2% above the projections.
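The `VmRSS` steps in the block above can be filled in with a small Linux-only helper. A sketch assuming the standard `/proc/self/status` format, where `VmRSS` is reported in kB (1024-byte units); the function names are hypothetical:

```rust
use std::fs;

/// Parses the `VmRSS:` line of a Linux `/proc/<pid>/status` file into bytes.
fn parse_vm_rss(status: &str) -> Option<u64> {
    let line = status.lines().find(|l| l.starts_with("VmRSS:"))?;
    // Line format: "VmRSS:   <value> kB"; the value is in 1024-byte units.
    let kb: u64 = line.split_whitespace().nth(1)?.parse().ok()?;
    Some(kb * 1024)
}

/// Current resident set size of this process, or `None` if the file is unavailable.
fn vm_rss_bytes() -> Option<u64> {
    parse_vm_rss(&fs::read_to_string("/proc/self/status").ok()?)
}

/// Per-entry overhead: RSS growth minus payload bytes, divided by N.
/// Signed, because RSS noise can push small runs below zero.
fn overhead_per_entry(rss_before: u64, rss_after: u64, n: u64, payload_bytes: u64) -> i64 {
    let growth = rss_after as i64 - rss_before as i64;
    (growth - (n * payload_bytes) as i64) / n as i64
}
```

Read `vm_rss_bytes()` before and after inserting N entries into the `DashMap`, then pass both readings to `overhead_per_entry` with `payload_bytes = 1008`. The RSS delta also includes allocator slack and whatever spare table capacity has been touched, so the result depends on how full the map is at measurement time.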
## Signal Coverage in Practice
Not all `(entity, signal_type)` pairs are populated:
- Only signalled entities have entries
- At 10% view coverage (100K of 1M items): 100K entities × 10 signals = 1M entries ≈ 1.1 GB
- The trimmer only fires when the ledger exceeds 5M entries. Typical production workloads
  should stay well below this unless sustained high write throughput spreads signals across
  a large share of items.