jordan 213b8efcca feat: complete M6-M7 + Enterprise Readiness milestones; split oversized source files per CODING_GUIDELINES §9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-23 22:41:16 -07:00

3.3 KiB


Signal State Memory Analysis

Struct Sizes (verified at runtime in `trimmer.rs` tests)

Component	Size	Notes
`HotSignalState`	64 bytes	`#[repr(C, align(64))]` — exactly one cache line. Compile-time asserted.
`BucketedCounter`	~944 bytes	`[AtomicU32; 60]` (240B) + `[AtomicU32; 168]` (672B) + 2×`AtomicU8` (2B) + 3×`AtomicU64` (24B) = 938B, padded to 944B for 8-byte alignment
`EntitySignalEntry` (hot + warm)	~1,008 bytes	`HotSignalState` + `BucketedCounter`
DashMap per-entry overhead	~80 bytes	Hash metadata, shard lock overhead, key storage
Total per `(entity, signal_type)` entry	~1,088 bytes	~1,008B entry + ~80B DashMap overhead
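The sizes in the table can be checked with compile-time assertions. The following is a minimal sketch, not the code in `trimmer.rs`: the field names, the field order, the `#[repr(C)]` on `BucketedCounter`, the placeholder contents of `HotSignalState`, and the `InlineEntry` wrapper are all assumptions made for illustration. Only the field types and counts come from the table.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::{AtomicU32, AtomicU64, AtomicU8};

/// Stand-in for the real hot state: only size and alignment matter here.
#[allow(dead_code)]
#[repr(C, align(64))]
pub struct HotSignalState {
    payload: [u8; 64], // hypothetical contents
}

/// Field names and order are hypothetical; types and counts follow the table.
#[allow(dead_code)]
#[repr(C)]
pub struct BucketedCounter {
    totals: [AtomicU64; 3],          // 24 B
    fine_buckets: [AtomicU32; 60],   // 240 B
    coarse_buckets: [AtomicU32; 168], // 672 B
    cursors: [AtomicU8; 2],          // 2 B, followed by 6 B of tail padding
}

/// Hypothetical layout that stores both parts inline in one struct.
#[allow(dead_code)]
#[repr(C)]
pub struct InlineEntry {
    hot: HotSignalState,
    warm: BucketedCounter,
}

// These fail the build, rather than a test run, if the layout drifts.
const _: () = assert!(size_of::<HotSignalState>() == 64);
const _: () = assert!(align_of::<HotSignalState>() == 64);
const _: () = assert!(size_of::<BucketedCounter>() == 944);
// 64 + 944 = 1,008, rounded up to the next multiple of the 64-byte alignment.
const _: () = assert!(size_of::<InlineEntry>() == 1024);
```

One caveat this sketch surfaces: if the two parts are stored inline in a single struct, Rust rounds the struct size up to its 64-byte alignment, so `size_of` reports 1,024 bytes instead of the 1,008-byte sum. Whether that applies depends on how `EntitySignalEntry` is actually declared.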

Memory Projection

Scale	Entries	Projected Memory
100K items × 2 signals	200K	~218 MB
100K items × 10 signals	1M	~1.1 GB
1M items × 2 signals	2M	~2.2 GB
1M items × 10 signals	10M	~10.9 GB
5M items × 10 signals	50M	~54 GB
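Each row is the entry count multiplied by the ~1,088-byte per-entry cost, expressed in decimal units (1 GB = 10^9 bytes). A small sketch of that arithmetic follows; the function and constant names are illustrative and do not come from the codebase.

```rust
/// Per-entry cost from the struct-size table: ~1,008 B entry + ~80 B map overhead.
pub const BYTES_PER_ENTRY: u64 = 1_088;

/// Number of `(entity, signal_type)` entries if every item carries every signal.
pub fn entry_count(items: u64, signals_per_item: u64) -> u64 {
    items * signals_per_item
}

/// Projected signal-state size in bytes.
pub fn projected_bytes(items: u64, signals_per_item: u64) -> u64 {
    entry_count(items, signals_per_item) * BYTES_PER_ENTRY
}

/// Converts bytes to decimal gigabytes (1 GB = 10^9 B).
pub fn as_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}
```

For example, `projected_bytes(1_000_000, 10)` is 10,880,000,000 bytes, and `as_gb` of that is 10.88.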

Budget Assessment

Budget: 10 GB signal state target.

Result at 1M × 10 signals: ~10.9 GB (10M entries × 1,088 bytes), about 9% over budget.

This is over budget, so the entry count has to be bounded by trimming. The `trim_cold_entries` function in `tidal/src/signals/trimmer.rs` provides a time-ordered eviction policy with a default threshold of 5M entries (~5.4 GB), which keeps signal state at roughly 54% of the 10 GB budget, a safety margin of about 1.8×.
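The relationship between the budget, the per-entry cost, and the threshold can be made explicit. This is a sketch under the document's own figures; `BUDGET_BYTES`, `BYTES_PER_ENTRY`, and both function names are hypothetical, and only `DEFAULT_MAX_SIGNAL_ENTRIES` is named in the source.

```rust
pub const BUDGET_BYTES: u64 = 10_000_000_000; // 10 GB target, decimal units
pub const BYTES_PER_ENTRY: u64 = 1_088; // ~1,008 B entry + ~80 B map overhead
pub const DEFAULT_MAX_SIGNAL_ENTRIES: usize = 5_000_000;

/// Largest entry count that still fits in `budget_bytes`.
pub fn max_entries_within(budget_bytes: u64) -> u64 {
    budget_bytes / BYTES_PER_ENTRY
}

/// Fraction of the budget consumed at a given entry count (1.0 = exactly full).
pub fn budget_fraction(entries: u64) -> f64 {
    (entries * BYTES_PER_ENTRY) as f64 / BUDGET_BYTES as f64
}
```

With these numbers, `max_entries_within(BUDGET_BYTES)` is 9,191,176 entries, `budget_fraction(10_000_000)` is 1.088, and `budget_fraction(DEFAULT_MAX_SIGNAL_ENTRIES as u64)` is 0.544.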

Memory Bounding via LRU Trimmer

Threshold

pub const DEFAULT_MAX_SIGNAL_ENTRIES: usize = 5_000_000; // ~5.44 GB

Policy

Oldest first: entries sorted by last_update_ns ascending
Batch eviction: removes enough entries to reach max_entries in a single pass
O(N log N) complexity per eviction, amortised across checkpoint intervals (every 30s)
Thread-safe: uses DashMap's snapshot iteration + per-shard removes
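The policy above can be sketched as follows. To keep the example free of external crates it uses a plain `std::collections::HashMap` in place of DashMap, so it shows only the ordering and batch-removal logic, not the concurrent behaviour. The `Key` and `Entry` types and the name `trim_oldest` are hypothetical; this is not the body of `trim_cold_entries`.

```rust
use std::collections::HashMap;

/// Hypothetical key: (entity id, signal type).
pub type Key = (u64, u16);

/// Minimal stand-in for an entry; only the timestamp matters for eviction.
pub struct Entry {
    pub last_update_ns: u64,
}

/// Evicts the oldest entries until at most `max_entries` remain.
/// Returns the number of entries removed.
pub fn trim_oldest(entries: &mut HashMap<Key, Entry>, max_entries: usize) -> usize {
    let excess = entries.len().saturating_sub(max_entries);
    if excess == 0 {
        return 0;
    }
    // Copy out (timestamp, key) pairs so the map is not borrowed while removing.
    let mut by_age: Vec<(u64, Key)> = entries
        .iter()
        .map(|(key, entry)| (entry.last_update_ns, *key))
        .collect();
    // Oldest first; this sort is the O(N log N) step.
    by_age.sort_unstable();
    // Single batch pass over the `excess` oldest keys.
    for (_, key) in by_age.into_iter().take(excess) {
        entries.remove(&key);
    }
    excess
}
```

Since only the `excess` oldest keys are needed, a partial selection such as `select_nth_unstable` would also work and avoids the full sort; the sketch sorts to match the stated complexity.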

Trigger

Trimming runs in the checkpoint background thread (run_checkpoint_thread in db/state_rebuild.rs), checked every 30 seconds:

if ledger.entries().len() > DEFAULT_MAX_SIGNAL_ENTRIES {
    trim_cold_entries(ledger.entries(), DEFAULT_MAX_SIGNAL_ENTRIES);
}
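A periodic check of this kind can be driven by a background thread that wakes once per interval and exits when asked to stop. The sketch below shows one way to do that with the standard library only. `spawn_periodic` is a hypothetical helper, not `run_checkpoint_thread`, and the stop-channel design is an assumption about how shutdown might be handled.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawns a background thread that calls `check` once per `interval`
/// until a message is sent on the returned sender (or the sender is dropped).
pub fn spawn_periodic<F>(interval: Duration, mut check: F) -> (Sender<()>, JoinHandle<()>)
where
    F: FnMut() + Send + 'static,
{
    let (stop_tx, stop_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || loop {
        match stop_rx.recv_timeout(interval) {
            // No stop signal arrived within the interval: run the next check.
            Err(RecvTimeoutError::Timeout) => check(),
            // Stop requested, or the sender was dropped: leave the loop.
            Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
        }
    });
    (stop_tx, handle)
}
```

In the setting described here, the interval would be `Duration::from_secs(30)` and the closure would hold the length check and the call to `trim_cold_entries`. Waiting on the channel instead of sleeping lets shutdown take effect immediately, without waiting out the rest of the interval.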

Correctness

Re-signalling an evicted entity creates a fresh entry. The entry will miss some historical window counts (BucketedCounter is zeroed on creation), but decay scores are accurate from the new events forward. This is acceptable for a ranking system: stale signal state is less accurate than no signal state.
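The behaviour described above, where an evicted key comes back with zeroed counters, is what a get-or-insert-default write path produces. A minimal sketch, using a `HashMap` and a single `window_count` field as hypothetical stand-ins for the DashMap ledger and `BucketedCounter`:

```rust
use std::collections::HashMap;

/// Simplified entry: `window_count` stands in for the windowed counters.
#[derive(Default)]
pub struct Entry {
    pub window_count: u32,
    pub last_update_ns: u64,
}

/// Records one signal for `key` and returns the window count after the update.
/// A missing key (never seen, or evicted) starts again from a zeroed entry.
pub fn record_signal(
    entries: &mut HashMap<(u64, u16), Entry>,
    key: (u64, u16),
    now_ns: u64,
) -> u32 {
    let entry = entries.entry(key).or_default();
    entry.window_count += 1;
    entry.last_update_ns = now_ns;
    entry.window_count
}
```

Three calls for the same key return 1, 2, 3. If the key is then removed, the next call returns 1 again: the history before eviction is gone, and counting restarts from the new event.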

Actual DashMap Overhead Measurement

To verify the ~80 bytes/entry DashMap overhead assumption, measure empirically:

let map: DashMap<(u64, u16), [u8; 1008]> = DashMap::new();
// Insert N entries
// Read VmRSS from /proc/self/status before and after (reported in kB; convert to bytes)
// overhead_per_entry = (after - before - N * 1008) / N

Expected result: ~60-100 bytes/entry depending on load factor and shard count. The 80-byte assumption sits in the middle of that range; a measured value at the upper end would raise the per-entry total from ~1,088 to ~1,108 bytes, about 2%.
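The RSS-reading and overhead arithmetic in that measurement can be sketched with the standard library alone. The helper names below are hypothetical; the DashMap fill step is left out so the example has no external dependencies. The sketch assumes a Linux `/proc` filesystem.

```rust
/// Extracts `VmRSS` in bytes from the text of `/proc/self/status`.
/// The kernel reports the value in kB, meaning 1,024-byte units.
pub fn parse_vm_rss_bytes(status: &str) -> Option<u64> {
    let line = status.lines().find(|l| l.starts_with("VmRSS:"))?;
    let kib: u64 = line.split_whitespace().nth(1)?.parse().ok()?;
    Some(kib * 1024)
}

/// Reads the current resident set size.
/// Returns `None` if the file is missing (non-Linux) or cannot be parsed.
pub fn current_rss_bytes() -> Option<u64> {
    let status = std::fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_bytes(&status)
}

/// overhead_per_entry = (after - before - n * payload_bytes) / n
pub fn overhead_per_entry(before: u64, after: u64, n: u64, payload_bytes: u64) -> f64 {
    let growth = after as f64 - before as f64;
    (growth - (n * payload_bytes) as f64) / n as f64
}
```

A measurement would call `current_rss_bytes()` before and after inserting `n` entries, then pass both readings to `overhead_per_entry` with a payload of 1,008 bytes. RSS is page-granular and includes allocator slack, so `n` should be large (millions of entries) for the per-entry figure to be meaningful.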

Signal Coverage in Practice

Not all (entity, signal_type) pairs are populated:

Only signalled entities have entries
At 10% view coverage (100K/1M items): 100K items × 10 signals = 1M entries = ~1.1 GB
The trimmer only fires when the entry count exceeds 5M; typical production workloads will stay well below this unless operating at very high write throughput for extended periods.
