tidaldb/docs/planning/milestone-8/phase-3/task-04-crdt-signal-state.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
2026-02-24 13:17:19 -07:00


# Task 04: CrdtSignalState
## Delivers
`CrdtSignalState` wraps `HotSignalState` and `BucketedCounter` with per-node CRDT semantics. Each node owns a decay accumulator, and the global score is the sum of these per-node contributions. Each node also owns a bucket array, merged by taking the per-node max. Merging produces the same decay scores regardless of merge order.
## Complexity: L
## Dependencies
- Task 02 (PNCounter)
- Phase 8.1 (ShardId as node identifier)
## Technical Design
The key insight: an exponential decay score is a sum of weighted exponentials,
`S_total(t) = sum_i(w_i * exp(-lambda * (t - t_i)))`. Each node maintains its
own running partial sum over the events it processed. Because each node
processes distinct WAL segments, the per-node event sets are disjoint, so the
global score is the sum of the per-node partial sums. The decomposition is
exact up to floating-point rounding.
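The decomposition can be written out explicitly. The notation below is introduced here for illustration and is not part of the original design: `E_n` is the set of events processed by node `n`, and `T_n` is the time of that node's last update.

```latex
\begin{aligned}
S_n(t) &= \sum_{i \in E_n} w_i \, e^{-\lambda (t - t_i)}
        = e^{-\lambda (t - T_n)} \, S_n(T_n), \qquad t \ge T_n \\
S_{\text{total}}(t) &= \sum_n S_n(t)
  \quad \text{(the sets } E_n \text{ are disjoint and cover all events)} \\
S_n(t') &= e^{-\lambda (t' - T_n)} \, S_n(T_n) + w
  \quad \text{(update for a new event of weight } w \text{ at } t' \ge T_n\text{)}
\end{aligned}
```

The first line is why storing only `(S_n(T_n), T_n)` per node is enough: the stored score can be decayed forward to any later query time without replaying events.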
```rust
// tidal/src/replication/crdt/signal_state.rs
use std::collections::HashMap;
// `ShardId` (Phase 8.1) and `PNCounter` (Task 02) are assumed to be in scope;
// their module paths are not fixed by this document.

/// CRDT-aware signal state for a single entity+signal_type pair.
///
/// Extends the existing HotSignalState and BucketedCounter with per-node
/// accounting that enables correct merge after partitioned writes.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct CrdtSignalState {
    /// Per-node running decay score.
    ///
    /// Each node contributes its own partial decay sum.
    /// Global score = sum of all node contributions at query time.
    node_decay_scores: HashMap<ShardId, f64>,
    /// Timestamp of last event per node (for decay math on merge).
    node_last_update_ns: HashMap<ShardId, u64>,
    /// Per-node windowed counters.
    ///
    /// Each node tracks its own bucket increments.
    /// On merge, per-node buckets are merged by taking per-node max
    /// (idempotent since same-node events are identical across replicas).
    node_buckets: HashMap<ShardId, PNCounter>,
    /// Lambda (decay rate) -- identical across all nodes for this signal.
    lambda: f64,
}
impl CrdtSignalState {
    pub fn new(lambda: f64) -> Self {
        Self {
            node_decay_scores: HashMap::new(),
            node_last_update_ns: HashMap::new(),
            node_buckets: HashMap::new(),
            lambda,
        }
    }
    /// Record a new signal event from `node` (`ShardId` must be `Copy`).
    pub fn on_signal(&mut self, node: ShardId, weight: f64, now_ns: u64) {
        let entry = self.node_decay_scores.entry(node).or_default();
        let last = self.node_last_update_ns.entry(node).or_insert(now_ns);
        // Decay the existing score to `now_ns`, then add the new event weight.
        let dt = now_ns.saturating_sub(*last) as f64 / 1e9;
        *entry = *entry * (-self.lambda * dt).exp() + weight;
        // Never move the timestamp backwards: a late event (dt clamped to 0)
        // is folded in at the current `last` instead of rewinding it.
        *last = (*last).max(now_ns);
    }
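    /// Global decay score: sum of all per-node contributions at `now_ns`.
    pub fn decay_score(&self, now_ns: u64) -> f64 {
        self.node_decay_scores
            .iter()
            .map(|(node, &score)| {
                // Look up the timestamp by key: two separate `HashMap`s do not
                // guarantee the same iteration order, so `zip` could pair a
                // score with another node's timestamp.
                let last = self.node_last_update_ns.get(node).copied().unwrap_or(now_ns);
                let dt = now_ns.saturating_sub(last) as f64 / 1e9;
                score * (-self.lambda * dt).exp()
            })
            .sum()
    }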
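    /// Merge another CrdtSignalState into this one.
    ///
    /// Each node is the only writer of its own entry. Assuming replicas apply
    /// a node's events in WAL order, the copy with the later timestamp already
    /// includes every event in the older copy, so the newer entry is kept
    /// rather than added. Adding would double-count on repeated merges;
    /// summation across nodes happens only in `decay_score`.
    /// Per-node buckets are merged via PNCounter merge (per-node max).
    pub fn merge(&mut self, other: &CrdtSignalState) {
        for (&node, &other_score) in &other.node_decay_scores {
            let other_ts = other.node_last_update_ns.get(&node).copied().unwrap_or(0);
            // Adopt the other entry if this node is unknown here or the other
            // replica is newer (ties broken by score so merge stays commutative).
            let newer = match (
                self.node_last_update_ns.get(&node),
                self.node_decay_scores.get(&node),
            ) {
                (Some(&ts), Some(&score)) => (other_ts, other_score) > (ts, score),
                _ => true,
            };
            if newer {
                self.node_decay_scores.insert(node, other_score);
                self.node_last_update_ns.insert(node, other_ts);
            }
        }
        for (node, other_bucket) in &other.node_buckets {
            self.node_buckets.entry(*node).or_default().merge(other_bucket);
        }
    }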
}
```
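The claim that per-node running sums add up to the global score can be checked numerically. The following is a minimal standalone sketch, not the tidal implementation: `closed_form` and `running` are hypothetical helper names, times are plain seconds as `f64`, and the event lists are made-up sample data.

```rust
/// Closed-form decay score: sum of w_i * exp(-lambda * (t - t_i)).
/// `events` holds (timestamp_secs, weight) pairs, all at or before `t`.
fn closed_form(events: &[(f64, f64)], lambda: f64, t: f64) -> f64 {
    events
        .iter()
        .map(|&(ti, w)| w * (-lambda * (t - ti)).exp())
        .sum()
}

/// Running form: decay the accumulator to each event time, add the weight,
/// and finally decay to the query time `t`. Events must be sorted by time.
fn running(events: &[(f64, f64)], lambda: f64, t: f64) -> f64 {
    let mut score = 0.0;
    let mut last = 0.0;
    for &(ti, w) in events {
        score = score * (-lambda * (ti - last)).exp() + w;
        last = ti;
    }
    score * (-lambda * (t - last)).exp()
}

fn main() {
    let lambda = 0.1;
    // Disjoint event sets, as if processed by two different nodes.
    let node_a = [(0.0, 1.0), (2.0, 0.5), (7.0, 2.0)];
    let node_b = [(1.0, 1.5), (4.0, 1.0)];

    // The same events viewed as one global, time-ordered stream.
    let mut all: Vec<(f64, f64)> = node_a.iter().chain(node_b.iter()).copied().collect();
    all.sort_by(|x, y| x.0.partial_cmp(&y.0).unwrap());

    let t = 10.0;
    let per_node = running(&node_a, lambda, t) + running(&node_b, lambda, t);
    let global = closed_form(&all, lambda, t);

    // Equal up to floating-point rounding, not bit-for-bit.
    assert!((per_node - global).abs() < 1e-12);
    println!("per-node sum = {per_node:.6}, global = {global:.6}");
}
```

The comparison uses a tolerance because the running form and the closed form round differently, which is also why the acceptance tests should not assert exact `f64` equality.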
## Acceptance Criteria
- [ ] `CrdtSignalState::decay_score(now_ns)` returns sum of all per-node contributions decayed to `now_ns`
- [ ] Two nodes process 500 events each (non-overlapping); after merge, `decay_score(now_ns)` equals the sum of both individual scores evaluated at the same `now_ns`, within floating-point tolerance (property test: 1000 random event sequences)
- [ ] `merge` is commutative and associative (property tests)
- [ ] `merge` does not double-count: same-node events produce the same score regardless of how many times the node's state is merged (idempotent per node)
- [ ] `BucketedCounter` equivalent: per-node bucket increments merged by PNCounter; total windowed count = sum of distinct events across all nodes; no double-counting
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass
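
The commutativity, associativity, and idempotence criteria can be exercised without the full type. The sketch below is illustrative only and rests on assumptions: `NodeEntry` and `State` are hypothetical stand-ins for `CrdtSignalState`, `u32` plays the role of `ShardId`, the per-node rule shown is "keep the entry with the later timestamp", and a small hand-rolled generator stands in for a property-testing crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one node's entry: the score as of `last_ns`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct NodeEntry {
    score: f64,
    last_ns: u64,
}

/// Hypothetical stand-in for CrdtSignalState; `u32` plays the role of ShardId.
type State = BTreeMap<u32, NodeEntry>;

/// Per-node merge: keep the entry with the later timestamp (ties: larger score).
fn merge(a: &State, b: &State) -> State {
    let mut out = a.clone();
    for (&node, &theirs) in b {
        out.entry(node)
            .and_modify(|mine| {
                if (theirs.last_ns, theirs.score) > (mine.last_ns, mine.score) {
                    *mine = theirs;
                }
            })
            .or_insert(theirs);
    }
    out
}

/// Tiny deterministic LCG so the sketch needs no external crates.
fn next(seed: &mut u64) -> u64 {
    *seed = seed
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *seed >> 33
}

/// Build a small random state with up to three entries over three nodes.
fn random_state(seed: &mut u64) -> State {
    let mut s = State::new();
    for _ in 0..(next(seed) % 4) {
        let node = (next(seed) % 3) as u32;
        let entry = NodeEntry {
            score: (next(seed) % 1000) as f64 / 10.0,
            last_ns: next(seed) % 5,
        };
        s.insert(node, entry);
    }
    s
}

fn main() {
    let mut seed = 42;
    for _ in 0..1000 {
        let a = random_state(&mut seed);
        let b = random_state(&mut seed);
        let c = random_state(&mut seed);
        // Commutative: merge order of two replicas does not matter.
        assert_eq!(merge(&a, &b), merge(&b, &a));
        // Associative: grouping of three replicas does not matter.
        assert_eq!(merge(&merge(&a, &b), &c), merge(&a, &merge(&b, &c)));
        // Idempotent: re-merging an already merged replica changes nothing.
        assert_eq!(merge(&merge(&a, &b), &b), merge(&a, &b));
    }
    println!("merge laws hold on 1000 random triples");
}
```

Exact equality is safe here because the merge only selects existing entries and never does arithmetic on scores. A merge rule that adds scores would fail the idempotence check on the first triple where `a` and `b` share a node.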