tidaldb/docs/planning/milestone-8/phase-3/task-04-crdt-signal-state.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
2026-02-24 13:17:19 -07:00


Task 04: CrdtSignalState

Delivers

CrdtSignalState wraps HotSignalState and BucketedCounter with per-node CRDT semantics: per-node decay accumulators whose contributions sum across nodes, and per-node bucket arrays merged by per-node max. A merge yields the same decay scores regardless of the order in which replicas are merged.

Complexity: L

Dependencies

  • Task 02 (PNCounter)
  • Phase 8.1 (ShardId as node identifier)

Technical Design

The key insight: exponential decay scores are sums of weighted exponentials. S_total(t) = sum_i(w_i * exp(-lambda * (t - t_i))). Each node maintains its own running partial sum. On merge, partial sums add (each covers disjoint events since each node processes distinct WAL segments). This is mathematically exact.
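The decomposition above can be checked numerically. The following is a minimal sketch, not the planned implementation: `Event`, `direct_score`, `running_score`, `decay_to`, and `example` are hypothetical names introduced here, and timestamps are plain seconds rather than nanoseconds. It compares the direct sum over all events with the sum of two per-node running accumulators decayed to the same query time.

```rust
/// One signal event: a weight and a timestamp in seconds (illustrative type).
#[derive(Clone, Copy)]
struct Event {
    weight: f64,
    t: f64,
}

/// Direct evaluation of S(t) = sum_i w_i * exp(-lambda * (t - t_i)).
fn direct_score(events: &[Event], lambda: f64, t: f64) -> f64 {
    events
        .iter()
        .map(|e| e.weight * (-lambda * (t - e.t)).exp())
        .sum()
}

/// Running partial sum for one node: decay the accumulator to each event's
/// time, then add the event's weight. Events must be in non-decreasing time
/// order. Returns (score, time of last event).
fn running_score(events: &[Event], lambda: f64) -> (f64, f64) {
    let mut score = 0.0;
    let mut last = events.first().map_or(0.0, |e| e.t);
    for e in events {
        score = score * (-lambda * (e.t - last)).exp() + e.weight;
        last = e.t;
    }
    (score, last)
}

/// Decay a (score, last_time) pair forward to query time `t`.
fn decay_to((score, last): (f64, f64), lambda: f64, t: f64) -> f64 {
    score * (-lambda * (t - last)).exp()
}

/// Returns (sum of per-node partial sums, direct sum over all events).
fn example() -> (f64, f64) {
    let lambda = 0.1;
    let node_a = [Event { weight: 1.0, t: 0.0 }, Event { weight: 2.0, t: 5.0 }];
    let node_b = [Event { weight: 0.5, t: 2.0 }, Event { weight: 1.5, t: 7.0 }];
    let t = 10.0;

    let per_node = decay_to(running_score(&node_a, lambda), lambda, t)
        + decay_to(running_score(&node_b, lambda), lambda, t);
    let all: Vec<Event> = node_a.iter().chain(node_b.iter()).copied().collect();
    (per_node, direct_score(&all, lambda, t))
}
```

Both values returned by `example()` agree up to floating-point rounding, because the two nodes' event sets are disjoint and each accumulator is decayed to the same `t` before adding.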

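// tidal/src/replication/crdt/signal_state.rs

use std::collections::HashMap;
// ShardId (Phase 8.1) and PNCounter (Task 02) are imported from their own modules.
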
/// CRDT-aware signal state for a single entity+signal_type pair.
///
/// Extends the existing HotSignalState and BucketedCounter with per-node
/// accounting that enables correct merge after partitioned writes.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct CrdtSignalState {
    /// Per-node running decay score.
    ///
    /// Each node contributes its own partial decay sum.
    /// Global score = sum of all node contributions at query time.
    node_decay_scores: HashMap<ShardId, f64>,

    /// Timestamp of last event per node (for decay math on merge).
    node_last_update_ns: HashMap<ShardId, u64>,

    /// Per-node windowed counters.
    ///
    /// Each node tracks its own bucket increments.
    /// On merge, per-node buckets are merged by taking per-node max
    /// (idempotent since same-node events are identical across replicas).
    node_buckets: HashMap<ShardId, PNCounter>,

    /// Lambda (decay rate) -- identical across all nodes for this signal.
    lambda: f64,
}

impl CrdtSignalState {
    pub fn new(lambda: f64) -> Self {
        Self {
            node_decay_scores: HashMap::new(),
            node_last_update_ns: HashMap::new(),
            node_buckets: HashMap::new(),
            lambda,
        }
    }

    /// Record a new signal event from `node`.
    pub fn on_signal(&mut self, node: ShardId, weight: f64, now_ns: u64) {
        let entry = self.node_decay_scores.entry(node).or_default();
        let last = self.node_last_update_ns.entry(node).or_insert(now_ns);

        // Decay existing score, then add new event weight.
        let dt = (now_ns.saturating_sub(*last)) as f64 / 1e9;
        *entry = *entry * (-self.lambda * dt).exp() + weight;
        *last = now_ns;
    }

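    /// Global decay score: sum of all per-node contributions at `now_ns`.
    pub fn decay_score(&self, now_ns: u64) -> f64 {
        // Look up each node's timestamp by key. Two separate HashMaps do not
        // iterate in matching order, so zipping them would mispair entries.
        self.node_decay_scores
            .iter()
            .map(|(node, &score)| {
                let last = self.node_last_update_ns.get(node).copied().unwrap_or(now_ns);
                let dt = (now_ns.saturating_sub(last)) as f64 / 1e9;
                score * (-self.lambda * dt).exp()
            })
            .sum()
    }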

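    /// Merge another CrdtSignalState into this one.
    ///
    /// Each node's score is advanced only by that node, so when both sides
    /// hold an entry for the same node, the entry with the later timestamp
    /// supersedes the other. Adding the two would double-count that node's
    /// events and break idempotence. Contributions from different nodes are
    /// still summed at query time in `decay_score`.
    /// Per-node buckets are merged via PNCounter merge (per-node max).
    pub fn merge(&mut self, other: &CrdtSignalState) {
        for (&node, &other_score) in &other.node_decay_scores {
            let other_ts = other.node_last_update_ns.get(&node).copied().unwrap_or(0);
            let newer = match self.node_last_update_ns.get(&node) {
                Some(&ts) => other_ts > ts,
                None => true,
            };
            if newer {
                self.node_decay_scores.insert(node, other_score);
                self.node_last_update_ns.insert(node, other_ts);
            }
        }
        for (node, other_bucket) in &other.node_buckets {
            self.node_buckets
                .entry(*node)
                .or_default()
                .merge(other_bucket);
        }
    }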
}

Acceptance Criteria

  • CrdtSignalState::decay_score(now_ns) returns sum of all per-node contributions decayed to now_ns
  • Two nodes process 500 events each (non-overlapping); after merge, decay_score equals the sum of both individual scores evaluated at the same now_ns, within floating-point tolerance (property test: 1000 random event sequences)
  • merge is commutative and associative (property tests)
  • merge does not double-count: same-node events produce the same score regardless of how many times the node's state is merged (idempotent per node)
  • BucketedCounter equivalent: per-node bucket increments merged by PNCounter; total windowed count = sum of distinct events across all nodes; no double-counting
  • cargo clippy -D warnings and cargo fmt pass
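
The commutativity, idempotence, and no-double-counting criteria can be illustrated with a small stand-in type. This is a sketch under stated assumptions, not the CrdtSignalState API: `SketchState`, `NodeEntry`, and `example` are hypothetical, nodes are identified by a plain `u32` instead of ShardId, bucket counters are omitted, and the merge assumes each node's entry is only ever advanced by that node, so the newer entry per node wins. A real property test would replace the fixed inputs with randomly generated event sequences.

```rust
use std::collections::HashMap;

/// One node's running decay score and the time of its last event.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeEntry {
    score: f64,
    last_ns: u64,
}

/// Simplified stand-in for the per-node decay part of the state.
#[derive(Debug, Clone, PartialEq, Default)]
struct SketchState {
    nodes: HashMap<u32, NodeEntry>,
}

impl SketchState {
    /// Decay the node's score to `now_ns`, then add the new weight.
    fn on_signal(&mut self, node: u32, weight: f64, now_ns: u64, lambda: f64) {
        let e = self
            .nodes
            .entry(node)
            .or_insert(NodeEntry { score: 0.0, last_ns: now_ns });
        let dt = now_ns.saturating_sub(e.last_ns) as f64 / 1e9;
        e.score = e.score * (-lambda * dt).exp() + weight;
        e.last_ns = now_ns;
    }

    /// Keep the newer entry per node; ties fall back to the larger score
    /// so the result does not depend on merge direction.
    fn merge(&mut self, other: &SketchState) {
        for (&node, &theirs) in &other.nodes {
            self.nodes
                .entry(node)
                .and_modify(|ours| {
                    let newer = theirs.last_ns > ours.last_ns
                        || (theirs.last_ns == ours.last_ns && theirs.score > ours.score);
                    if newer {
                        *ours = theirs;
                    }
                })
                .or_insert(theirs);
        }
    }

    /// Sum of all per-node scores decayed to `now_ns`.
    fn decay_score(&self, now_ns: u64, lambda: f64) -> f64 {
        self.nodes
            .values()
            .map(|e| {
                let dt = now_ns.saturating_sub(e.last_ns) as f64 / 1e9;
                e.score * (-lambda * dt).exp()
            })
            .sum()
    }
}

/// Returns (commutative, idempotent, merged score, sum of individual scores).
fn example() -> (bool, bool, f64, f64) {
    let lambda = 0.5;
    let mut a = SketchState::default();
    a.on_signal(1, 1.0, 0, lambda);
    a.on_signal(1, 2.0, 1_000_000_000, lambda);
    let mut b = SketchState::default();
    b.on_signal(2, 3.0, 500_000_000, lambda);

    let mut ab = a.clone();
    ab.merge(&b);
    let mut ba = b.clone();
    ba.merge(&a);

    // Re-merging states that were already merged must change nothing.
    let mut again = ab.clone();
    again.merge(&a);
    again.merge(&b);

    let now = 2_000_000_000;
    let merged = ab.decay_score(now, lambda);
    let separate = a.decay_score(now, lambda) + b.decay_score(now, lambda);
    (ab == ba, again == ab, merged, separate)
}
```

With these inputs, merging in either direction produces the same state, repeated merges leave it unchanged, and the merged score matches the sum of the two individual scores at the same query time.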