# Task 01: Hot-Tier Signal State

## Context

**Milestone:** 1 -- Signal Engine
**Phase:** m1p4 -- Signal Ledger
**Depends On:** None (uses types from m1p1 but no m1p4 tasks)
**Blocks:** Task 03 (Signal Ledger and Velocity)
**Complexity:** L

## Objective

Deliver `HotSignalState`, the cache-line-aligned, lock-free struct that holds running exponential decay scores for a single signal type on a single entity. This is the structure touched on every ranking query -- it must be exactly 64 bytes, use atomic operations for concurrent read/write, and implement the running decay formula with mathematical exactness. The struct handles both in-order and out-of-order signal events, and provides lazy decay at read time so ranking queries pay only one `exp()` call per entity per decay rate.

This is the single most performance-critical data structure in tidalDB. Every design choice is driven by the hot-path constraint: a ranking query scoring 200 candidates must complete in under 5 microseconds. That means ~25 nanoseconds per entity for decay score reads, which allows exactly one L1 cache miss and one `exp()` call.
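To make the objective concrete, the following is a minimal single-threaded sketch of the running decay score and its lazy read, checked against a brute-force sum over all events. It is an illustration only: `RunningScore`, `analytical_sum`, and `demo` are hypothetical names that do not appear in tidalDB, times are in seconds rather than nanoseconds, and only in-order events are handled.

```rust
/// Hypothetical single-threaded model of one running decay score.
/// `lambda` is in 1/seconds; times are in seconds for readability.
#[derive(Debug, Clone, Copy)]
struct RunningScore {
    score: f64,
    last_update_s: f64,
}

impl RunningScore {
    fn new() -> Self {
        Self { score: 0.0, last_update_s: 0.0 }
    }

    /// Write path: S(t) = S(t_prev) * exp(-lambda * dt) + weight.
    /// Assumes `t_s >= last_update_s` (in-order events only).
    fn on_signal(&mut self, weight: f64, t_s: f64, lambda: f64) {
        let dt = t_s - self.last_update_s;
        self.score = self.score * (-lambda * dt).exp() + weight;
        self.last_update_s = t_s;
    }

    /// Read path: a single exp() call decays the stored score to query time.
    fn current_score(&self, query_s: f64, lambda: f64) -> f64 {
        self.score * (-lambda * (query_s - self.last_update_s)).exp()
    }
}

/// Brute-force reference: decay every event's weight individually and sum.
fn analytical_sum(events: &[(f64, f64)], query_s: f64, lambda: f64) -> f64 {
    events
        .iter()
        .map(|&(w, t)| w * (-lambda * (query_s - t)).exp())
        .sum()
}

/// Returns (running score, analytical score) for three sample events.
fn demo() -> (f64, f64) {
    let lambda = std::f64::consts::LN_2 / 3600.0; // 1-hour half-life
    let events = [(1.0, 10.0), (2.0, 600.0), (0.5, 1800.0)]; // (weight, time_s)
    let mut state = RunningScore::new();
    for &(w, t) in &events {
        state.on_signal(w, t, lambda);
    }
    let query_s = 3600.0;
    (
        state.current_score(query_s, lambda),
        analytical_sum(&events, query_s, lambda),
    )
}
```

The two values returned by `demo()` should agree to within floating-point rounding, because each write folds the decay of all earlier events into a single multiplication.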

## Requirements

- `HotSignalState` must be `#[repr(C, align(64))]` -- exactly one L1 cache line
- `static_assert!(size_of::<HotSignalState>() == 64)`
- Running decay formula: `S(t) = S(t_prev) * exp(-lambda * dt) + weight`
- `on_signal()` updates decay scores via CAS loop with correct memory ordering
- `current_score()` applies lazy decay at read time: `stored_score * exp(-lambda * dt)`
- Out-of-order events: when `t_event < last_update_ns`, pre-decay the weight instead of advancing time
- Decay scores are non-negative (debug assertion)
- All atomic operations use Acquire/Release/AcqRel -- no Relaxed without explicit justification
- `Send + Sync` (ensured by atomic-only fields)
- No `unsafe` code

## Technical Design

### Module Structure

```
tidal/src/signals/
  hot.rs    -- HotSignalState, all methods
```

### Public API

```rust
// === signals/hot.rs ===

use std::sync::atomic::{AtomicU64, Ordering};

/// Hot-path signal state for a single signal type on a single entity.
///
/// One cache line (64 bytes). Touched on every ranking query involving this
/// signal. Contains running decay scores for up to 3 decay rates and the
/// timestamp of the last update for lazy decay at read time.
///
/// # Memory Layout
///
/// ```text
/// Offset  Size  Field
/// 0..8    8     entity_id (u64)
/// 8..16   8     last_update_ns (AtomicU64)
/// 16..18  2     signal_type_id (u16)
/// 18..20  2     flags (u16)
/// 20..24  4     _pad0
/// 24..32  8     decay_scores[0] (AtomicU64, f64 via to_bits/from_bits)
/// 32..40  8     decay_scores[1] (AtomicU64)
/// 40..48  8     decay_scores[2] (AtomicU64)
/// 48..64  16    _pad1
/// ```
///
/// # Concurrency
///
/// - Writers: CAS loop on each `decay_scores[i]`, then conditional store on
///   `last_update_ns`. Multiple concurrent writers are serialized by CAS retry.
/// - Readers: Acquire load on `last_update_ns`, then Acquire load on
///   `decay_scores[i]`. Lazy decay applied from stored time to query time.
/// - A reader may see a stale score with a fresh timestamp (under-decaying by
///   a few nanoseconds) or a fresh score with a stale timestamp (over-decaying).
///   Both produce ranking-correct results within floating-point epsilon.
#[repr(C, align(64))]
pub struct HotSignalState {
    entity_id: u64,
    last_update_ns: AtomicU64,
    signal_type_id: u16,
    flags: u16,
    _pad0: [u8; 4],
    decay_scores: [AtomicU64; 3],
    _pad1: [u8; 16],
}

// Compile-time size and alignment assertions
const _: () = assert!(std::mem::size_of::<HotSignalState>() == 64);
const _: () = assert!(std::mem::align_of::<HotSignalState>() == 64);

/// Maximum number of decay rate slots per signal type.
pub const MAX_DECAY_RATES: usize = 3;

impl HotSignalState {
    /// Construct a new, zeroed state for the given entity and signal type.
    pub fn new(entity_id: u64, signal_type_id: u16) -> Self;

    /// Construct with the velocity_enabled flag set.
    pub fn with_flags(entity_id: u64, signal_type_id: u16, velocity_enabled: bool) -> Self;

    /// The entity this state belongs to.
    pub fn entity_id(&self) -> u64;

    /// The signal type index.
    pub fn signal_type_id(&self) -> u16;

    /// Whether velocity computation is enabled for this signal.
    pub fn velocity_enabled(&self) -> bool;

    /// Update running decay scores on a new signal event.
    ///
    /// For each configured lambda, applies the decay formula:
    ///   new_score = old_score * exp(-lambda * dt) + effective_weight
    ///
    /// For in-order events (event_time_ns >= last_update_ns):
    ///   dt = (event_time_ns - last_update_ns) as seconds
    ///   effective_weight = weight
    ///   last_update_ns is advanced to event_time_ns
    ///
    /// For out-of-order events (event_time_ns < last_update_ns):
    ///   The existing score is not decayed (dt=0 for the score shift).
    ///   Instead, the weight is pre-decayed:
    ///     effective_weight = weight * exp(-lambda * (last_update_ns - event_time_ns) / 1e9)
    ///   last_update_ns is NOT changed.
    ///
    /// Cost: K * exp() calls where K = number of configured decay rates.
    /// At K=1 (M1 default): ~12ns. At K=3: ~36ns.
    pub fn on_signal(
        &self,
        weight: f64,
        event_time_ns: u64,
        lambdas: &[f64],
    );

    /// Read the current decay score at query time.
    ///
    /// Applies lazy decay from last_update to query_time_ns:
    ///   score = stored_score * exp(-lambda * dt)
    ///
    /// Cost: 1 load + 1 exp() + 1 multiply = ~15ns.
    pub fn current_score(
        &self,
        decay_rate_idx: usize,
        query_time_ns: u64,
        lambda: f64,
    ) -> f64;

    /// Read the raw stored score without lazy decay.
    /// Used only for checkpoint serialization.
    pub fn stored_score(&self, decay_rate_idx: usize) -> f64;

    /// Read the last update timestamp in nanoseconds.
    pub fn last_update_ns(&self) -> u64;

    /// Restore state from a checkpoint (set all fields).
    /// Called during crash recovery before WAL replay.
    pub fn restore(
        &self,
        last_update_ns: u64,
        scores: &[f64],
    );
}
```

### Internal Design

**Atomic memory ordering rationale:**

The critical invariant is that a reader who loads `last_update_ns` via Acquire must see decay scores that are consistent with (or more recent than) that timestamp. Without this, a reader could pair a new timestamp with an old score, applying too little decay to a score that also lacks the newest event's weight.

- `last_update_ns` loads: `Ordering::Acquire` -- establishes a happens-before edge with the Release store from the writer.
- `last_update_ns` stores: `Ordering::Release` -- makes all prior decay score CAS operations visible to readers who Acquire this timestamp.
- `decay_scores[i]` loads: `Ordering::Acquire` -- ensures we read the most recent value stored by any CAS.
- `decay_scores[i]` CAS: `Ordering::AcqRel` (success), `Ordering::Acquire` (failure) -- AcqRel on success makes the new score visible and acquires the latest value; Acquire on failure loads the freshest competing write.

The write order is critical: CAS all decay scores FIRST, then conditionally store `last_update_ns`.
If the process crashes between CAS and timestamp store, the worst case is that a reader applies lazy decay from an older timestamp, producing a slightly over-decayed (too small) score. This is safe for ranking because it is bounded and self-correcting on the next write.

**Out-of-order event handling:**

When `event_time_ns < last_update_ns`, the event arrived late. We cannot "rewind" the running score. Instead, we pre-decay the weight to account for the event's age relative to the current state:

```
adjusted_weight = weight * exp(-lambda * (last_update_ns - event_time_ns) / 1e9)
```

This is mathematically equivalent to having processed the event at its original time: the contribution of the late event to the score at `last_update_ns` is exactly `weight * exp(-lambda * age)`.

For the CAS loop on out-of-order events, `dt` is 0 (the score is not decayed), and the adjusted weight is added:

```
new_score = old_score + adjusted_weight
```

**f64 via AtomicU64:**

Decay scores are f64 values stored as u64 bit patterns using `f64::to_bits()` and `f64::from_bits()`. Both functions are safe, const, and produce well-defined results for all finite f64 values including 0.0, negative zero, and subnormals. NaN bit patterns are never stored because the decay formula cannot produce NaN from non-negative inputs.

### Error Handling

No fallible operations. `on_signal()` and `current_score()` are infallible. `decay_rate_idx` out of bounds is a caller error -- debug-asserted but saturated to 0 in release (never panics on the hot path).

## Test Strategy

### Property Tests

```rust
use proptest::prelude::*;

// P1: Decay scores decrease monotonically without new events.
proptest! {
    #[test]
    fn decay_monotonic_decrease(
        initial_score in 0.0f64..1e12,
        lambda in 1e-7f64..1e-3,
        dt_secs in 1.0f64..1e7,
    ) {
        let decayed = initial_score * (-lambda * dt_secs).exp();
        prop_assert!(decayed <= initial_score);
        prop_assert!(decayed >= 0.0);
    }
}

// P2: Running score matches analytical sum to 6 decimal places.
proptest! {
    #[test]
    fn running_score_matches_analytical(
        events in prop::collection::vec(
            (0.1f64..10.0, 1_000_000u64..1_000_000_000),
            1..100,
        ),
        lambda in 1e-7f64..1e-3,
    ) {
        // Sort events by time for in-order processing
        let mut sorted_events = events.clone();
        sorted_events.sort_by_key(|e| e.1);

        let query_time_ns = sorted_events.last().unwrap().1 + 1_000_000_000; // +1 second

        // Build HotSignalState and process events
        let state = HotSignalState::new(42, 0);
        for &(weight, time_ns) in &sorted_events {
            state.on_signal(weight, time_ns, &[lambda]);
        }
        let running = state.current_score(0, query_time_ns, lambda);

        // Compute analytical sum
        let analytical: f64 = sorted_events.iter()
            .map(|&(w, t)| w * (-lambda * (query_time_ns - t) as f64 / 1e9).exp())
            .sum();

        let relative_error = if analytical.abs() < 1e-15 {
            running.abs()
        } else {
            (running - analytical).abs() / analytical
        };
        prop_assert!(
            relative_error < 1e-6,
            "running={running}, analytical={analytical}, relative_error={relative_error}"
        );
    }
}

// P4: Out-of-order events produce same final score as in-order.
proptest! {
    #[test]
    fn out_of_order_events_commutative(
        events in prop::collection::vec(
            (0.1f64..10.0, 1_000_000u64..1_000_000_000),
            2..50,
        ),
        lambda in 1e-7f64..1e-3,
    ) {
        let query_time_ns = events.iter().map(|e| e.1).max().unwrap() + 1_000_000_000;

        // Process in-order
        let mut sorted = events.clone();
        sorted.sort_by_key(|e| e.1);
        let state_ordered = HotSignalState::new(42, 0);
        for &(w, t) in &sorted {
            state_ordered.on_signal(w, t, &[lambda]);
        }
        let score_ordered = state_ordered.current_score(0, query_time_ns, lambda);

        // Process in reverse order (all out-of-order except first)
        sorted.reverse();
        let state_reversed = HotSignalState::new(42, 0);
        for &(w, t) in &sorted {
            state_reversed.on_signal(w, t, &[lambda]);
        }
        let score_reversed = state_reversed.current_score(0, query_time_ns, lambda);

        // Also compare to analytical sum
        let analytical: f64 = events.iter()
            .map(|&(w, t)| w * (-lambda * (query_time_ns - t) as f64 / 1e9).exp())
            .sum();

        let error_ordered = if analytical.abs() < 1e-15 {
            score_ordered.abs()
        } else {
            (score_ordered - analytical).abs() / analytical
        };
        let error_reversed = if analytical.abs() < 1e-15 {
            score_reversed.abs()
        } else {
            (score_reversed - analytical).abs() / analytical
        };

        prop_assert!(error_ordered < 1e-6,
            "ordered: running={score_ordered}, analytical={analytical}, error={error_ordered}");
        prop_assert!(error_reversed < 1e-6,
            "reversed: running={score_reversed}, analytical={analytical}, error={error_reversed}");
    }
}

// Decay scores are always non-negative (INV-SIG-3).
proptest! {
    #[test]
    fn decay_scores_non_negative(
        events in prop::collection::vec(
            (0.0f64..100.0, 0u64..2_000_000_000),
            1..200,
        ),
        lambda in 1e-7f64..1e-3,
        query_offset in 0u64..2_000_000_000,
    ) {
        let state = HotSignalState::new(1, 0);
        for &(w, t) in &events {
            state.on_signal(w, t, &[lambda]);
        }
        let query_time = events.iter().map(|e| e.1).max().unwrap_or(0) + query_offset;
        let score = state.current_score(0, query_time, lambda);
        prop_assert!(score >= 0.0, "score was {score}");
    }
}
```

### Unit Tests

```rust
#[test]
fn hot_signal_state_size_and_alignment() {
    assert_eq!(std::mem::size_of::<HotSignalState>(), 64);
    assert_eq!(std::mem::align_of::<HotSignalState>(), 64);
}

#[test]
fn new_state_is_zeroed() {
    let state = HotSignalState::new(42, 5);
    assert_eq!(state.entity_id(), 42);
    assert_eq!(state.signal_type_id(), 5);
    assert_eq!(state.last_update_ns(), 0);
    assert_eq!(state.stored_score(0), 0.0);
    assert_eq!(state.stored_score(1), 0.0);
    assert_eq!(state.stored_score(2), 0.0);
}

#[test]
fn single_event_sets_score_to_weight() {
    let state = HotSignalState::new(1, 0);
    let lambda = std::f64::consts::LN_2 / (7.0 * 24.0 * 3600.0); // 7-day half-life
    let t = 1_000_000_000u64; // 1 second in nanos

    state.on_signal(1.0, t, &[lambda]);

    // Immediately after, with no time elapsed, score should be ~1.0
    let score = state.current_score(0, t, lambda);
    assert!((score - 1.0).abs() < 1e-10);
}

#[test]
fn score_halves_after_half_life() {
    let half_life_secs = 3600.0; // 1 hour
    let lambda = std::f64::consts::LN_2 / half_life_secs;
    let state = HotSignalState::new(1, 0);

    let t0 = 0u64;
    state.on_signal(1.0, t0, &[lambda]);

    // Read after exactly one half-life
    let t1 = (half_life_secs * 1e9) as u64;
    let score = state.current_score(0, t1, lambda);
    assert!((score - 0.5).abs() < 1e-10, "score was {score}, expected ~0.5");
}

#[test]
fn two_events_accumulate() {
    let lambda = std::f64::consts::LN_2 / 3600.0; // 1h half-life
    let state = HotSignalState::new(1, 0);

    let t0 = 0u64;
    let t1 = 1_000_000_000u64; // 1 second later

    state.on_signal(1.0, t0, &[lambda]);
    state.on_signal(1.0, t1, &[lambda]);

    let score = state.current_score(0, t1, lambda);
    // score = 1.0 * exp(-lambda * 1.0) + 1.0
    let expected = 1.0_f64 * (-lambda * 1.0).exp() + 1.0;
    assert!((score - expected).abs() < 1e-10, "score={score}, expected={expected}");
}

#[test]
fn out_of_order_event_predecays_weight() {
    let lambda = std::f64::consts::LN_2 / 3600.0;
    let state = HotSignalState::new(1, 0);

    // Process event at t=10s first
    let t_late = 10_000_000_000u64;
    state.on_signal(1.0, t_late, &[lambda]);

    // Then process event at t=5s (out of order)
    let t_early = 5_000_000_000u64;
    state.on_signal(1.0, t_early, &[lambda]);

    // Query at t=10s -- should match analytical result
    let analytical = 1.0 * (-lambda * 0.0).exp()  // event at t=10, age=0
        + 1.0 * (-lambda * 5.0).exp();            // event at t=5, age=5s
    let actual = state.current_score(0, t_late, lambda);
    assert!((actual - analytical).abs() < 1e-10,
        "actual={actual}, analytical={analytical}");
}

#[test]
fn last_update_ns_not_regressed_by_out_of_order() {
    let lambda = std::f64::consts::LN_2 / 3600.0;
    let state = HotSignalState::new(1, 0);

    state.on_signal(1.0, 10_000_000_000, &[lambda]);
    let ts_before = state.last_update_ns();

    state.on_signal(1.0, 5_000_000_000, &[lambda]); // older event
    let ts_after = state.last_update_ns();

    assert_eq!(ts_before, ts_after, "timestamp should not regress");
    assert_eq!(ts_after, 10_000_000_000);
}

#[test]
fn score_decays_to_near_zero_after_many_half_lives() {
    let lambda = std::f64::consts::LN_2 / 3600.0; // 1h half-life
    let state = HotSignalState::new(1, 0);

    state.on_signal(1.0, 0, &[lambda]);

    // After 100 half-lives (~100 hours), score should be essentially zero
    let t = (100.0 * 3600.0 * 1e9) as u64;
    let score = state.current_score(0, t, lambda);
    assert!(score < 1e-20, "score was {score}");
}

#[test]
fn velocity_flag() {
    let state = HotSignalState::with_flags(1, 0, true);
    assert!(state.velocity_enabled());

    let state2 = HotSignalState::with_flags(1, 0, false);
    assert!(!state2.velocity_enabled());
}

#[test]
fn restore_sets_all_fields() {
    let state = HotSignalState::new(1, 0);
    state.restore(42_000_000_000, &[1.5, 2.5, 3.5]);

    assert_eq!(state.last_update_ns(), 42_000_000_000);
    assert!((state.stored_score(0) - 1.5).abs() < 1e-15);
    assert!((state.stored_score(1) - 2.5).abs() < 1e-15);
    assert!((state.stored_score(2) - 3.5).abs() < 1e-15);
}

#[test]
fn multiple_lambdas() {
    let lambda_fast = std::f64::consts::LN_2 / 3600.0;   // 1h half-life
    let lambda_slow = std::f64::consts::LN_2 / 604800.0; // 7d half-life
    let lambdas = [lambda_fast, lambda_slow];

    let state = HotSignalState::new(1, 0);
    state.on_signal(1.0, 0, &lambdas);

    // After 1 hour, fast score ~0.5, slow score ~0.9959
    let t = (3600.0 * 1e9) as u64;
    let score_fast = state.current_score(0, t, lambda_fast);
    let score_slow = state.current_score(1, t, lambda_slow);

    assert!((score_fast - 0.5).abs() < 1e-6);
    assert!((score_slow - (-lambda_slow * 3600.0).exp()).abs() < 1e-6);
    assert!(score_slow > score_fast, "slow decay should retain more");
}
```

## Acceptance Criteria

- [ ] `HotSignalState` is `#[repr(C, align(64))]` with compile-time size assertion `== 64`
- [ ] `on_signal()` implements the running decay formula with CAS loops using `AcqRel`/`Acquire` ordering
- [ ] `current_score()` applies lazy decay with `Acquire` loads
- [ ] Out-of-order events pre-decay the weight and do not regress `last_update_ns`
- [ ] Running score matches analytical brute-force sum to 6 decimal places (property test P2)
- [ ] Decay scores monotonically decrease without new events (property test P1)
- [ ] Decay scores are always non-negative across all property test inputs (INV-SIG-3)
- [ ] Out-of-order processing produces same score as in-order to 6 decimal places (property test P4)
- [ ] `restore()` correctly sets all fields for checkpoint recovery
- [ ] No `unsafe` code
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All property tests and unit tests pass

## Research References

- [docs/research/tidaldb_signal_ledger.md](../../../research/tidaldb_signal_ledger.md) -- Section 3 (running-score formula proof), Section 4 (EntityState struct layout), Section 5 (f64 precision analysis: "adequate through year 18,000"), performance estimates (12ns per exp(), 36ns for 3 rates)
- Cormode, G. et al., "Forward Decay: A Practical Time Decay Model for Streaming Systems," ICDE 2009 -- mathematical foundation for running score exactness

## Spec References

- [docs/specs/03-signal-system.md](../../../specs/03-signal-system.md) -- Section 3 (HotSignalState layout), Section 4 (decay computation: write-path `on_signal`, read-path `current_score`, out-of-order handling, numerical stability), invariants INV-SIG-2 (monotonic decrease), INV-SIG-3 (non-negative), INV-SIG-5 (running score exactness), INV-CON-1 (lock-free reads), INV-CON-2 (CAS correctness), performance targets (Section 12: hot-tier update < 50ns, decay score read ~15ns)
- [docs/specs/00-architecture-overview.md](../../../specs/00-architecture-overview.md) -- Section 8 code module map showing `signal/hot.rs`

## Implementation Notes

- `f64::from_bits(0u64)` returns `0.0` and `(0.0f64).to_bits()` returns `0u64`. This means a zeroed `AtomicU64` reads as `0.0` through `from_bits`, which is the correct initial decay score. No special initialization needed.
- `compare_exchange_weak` is used instead of `compare_exchange` because we are in a retry loop. The weak variant may fail spuriously but is faster on architectures with LL/SC (ARM). On x86, both compile to `CMPXCHG`.
- The `_pad0` and `_pad1` fields ensure the struct is exactly 64 bytes. Without them, the compiler might add different padding that changes the size. `#[repr(C)]` makes the layout deterministic.
- Do NOT implement the Jacobs forward-decay trick in this task. It eliminates read-time computation but requires log-space arithmetic and overflow prevention. Deferred to M2+ as an optimization.
- Do NOT add benchmark harness in this task. Benchmarks are added in Task 03 after the full signal ledger is assembled. Property tests are the correctness gate for this task.
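
The notes on `compare_exchange_weak` and `f64` bit storage can be illustrated with a reduced sketch of the write and read paths. This is one possible shape under stated assumptions, not the required implementation: `ScoreSlot` and `demo_out_of_order` are hypothetical names, the struct holds a single decay slot and ignores the 64-byte layout, and `fetch_max` is an assumed way to realize the "conditional store" on `last_update_ns` (the spec does not prescribe it). The decay factor is computed from one timestamp snapshot, so the sketch is exact for a single writer; reasoning about concurrent writers with differing timestamps is left to the full task.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical one-slot stand-in for `HotSignalState` (not the real layout).
struct ScoreSlot {
    last_update_ns: AtomicU64,
    score_bits: AtomicU64, // f64 stored via to_bits / from_bits
}

impl ScoreSlot {
    fn new() -> Self {
        // A zeroed AtomicU64 reads back as 0.0 through f64::from_bits.
        Self {
            last_update_ns: AtomicU64::new(0),
            score_bits: AtomicU64::new(0),
        }
    }

    fn on_signal(&self, weight: f64, event_time_ns: u64, lambda: f64) {
        let last = self.last_update_ns.load(Ordering::Acquire);
        // In-order: decay the stored score forward to the event time.
        // Out-of-order: leave the score alone and pre-decay the weight.
        let (score_factor, effective_weight) = if event_time_ns >= last {
            let dt_s = (event_time_ns - last) as f64 / 1e9;
            ((-lambda * dt_s).exp(), weight)
        } else {
            let age_s = (last - event_time_ns) as f64 / 1e9;
            (1.0, weight * (-lambda * age_s).exp())
        };

        // CAS retry loop. The weak variant may fail spuriously, so every
        // failure reloads the observed bits and recomputes the new score.
        let mut old_bits = self.score_bits.load(Ordering::Acquire);
        loop {
            let new_score = f64::from_bits(old_bits) * score_factor + effective_weight;
            debug_assert!(new_score >= 0.0, "decay scores must be non-negative");
            match self.score_bits.compare_exchange_weak(
                old_bits,
                new_score.to_bits(),
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => break,
                Err(actual) => old_bits = actual,
            }
        }

        // Scores first, timestamp second. fetch_max (an assumption here)
        // advances the timestamp for in-order events and never regresses it.
        self.last_update_ns.fetch_max(event_time_ns, Ordering::AcqRel);
    }

    fn current_score(&self, query_time_ns: u64, lambda: f64) -> f64 {
        let last = self.last_update_ns.load(Ordering::Acquire);
        let stored = f64::from_bits(self.score_bits.load(Ordering::Acquire));
        // saturating_sub guards against a query time older than the last update.
        let dt_s = query_time_ns.saturating_sub(last) as f64 / 1e9;
        stored * (-lambda * dt_s).exp()
    }
}

/// Returns (actual, expected) for an event at t=10s followed by a late event at t=5s.
fn demo_out_of_order() -> (f64, f64) {
    let lambda = std::f64::consts::LN_2 / 3600.0; // 1-hour half-life
    let slot = ScoreSlot::new();
    slot.on_signal(1.0, 10_000_000_000, lambda);
    slot.on_signal(1.0, 5_000_000_000, lambda); // arrives late
    let expected = 1.0 + (-lambda * 5.0).exp();
    (slot.current_score(10_000_000_000, lambda), expected)
}
```

`demo_out_of_order()` mirrors the `out_of_order_event_predecays_weight` unit test: the late event contributes its weight decayed by its 5-second age, and the timestamp stays at 10 seconds.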