# Task 04: CrdtSignalState

## Delivers

`CrdtSignalState` wraps `HotSignalState` and `BucketedCounter` with per-node CRDT semantics: per-node decay accumulators whose contributions sum to the global score, and per-node bucket arrays that take the max on merge. Merge produces correct decay scores regardless of merge order.

## Complexity: L

## Dependencies

- Task 02 (PNCounter)
- Phase 8.1 (ShardId as node identifier)

## Technical Design

The key insight: exponential decay scores are sums of weighted exponentials.
`S_total(t) = sum_i(w_i * exp(-lambda * (t - t_i)))`. Each node maintains its
own running partial sum. On merge, the partial sums of different nodes add,
because each covers a disjoint set of events (each node processes distinct WAL
segments). The decomposition is mathematically exact; only floating-point
rounding can differ between summation orders.
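The additivity claim can be checked numerically. The sketch below is illustrative only: `Event`, `direct_score`, `running_score`, and `two_node_score` are hypothetical names, timestamps are plain seconds rather than nanoseconds, and none of this is part of the task's API. It compares the closed-form sum with the running recurrence a node would maintain, then splits the events across two nodes and adds their partial sums.

```rust
/// One signal event as (timestamp in seconds, weight). Hypothetical type for illustration.
type Event = (f64, f64);

/// Closed form: S(t) = sum_i w_i * exp(-lambda * (t - t_i)).
fn direct_score(events: &[Event], lambda: f64, t: f64) -> f64 {
    events
        .iter()
        .map(|&(t_i, w_i)| w_i * (-lambda * (t - t_i)).exp())
        .sum()
}

/// Running form: decay the accumulator forward to each event time, add the
/// weight, then decay once more to the query time `t`.
/// Events are assumed to be sorted by timestamp.
fn running_score(events: &[Event], lambda: f64, t: f64) -> f64 {
    let mut score = 0.0;
    let mut last = match events.first() {
        Some(&(t0, _)) => t0,
        None => return 0.0,
    };
    for &(t_i, w_i) in events {
        score = score * (-lambda * (t_i - last)).exp() + w_i;
        last = t_i;
    }
    score * (-lambda * (t - last)).exp()
}

/// Splits the events between two nodes (even and odd indices) and adds the
/// two partial sums, mimicking a merge of disjoint per-node contributions.
fn two_node_score(events: &[Event], lambda: f64, t: f64) -> f64 {
    let node_a: Vec<Event> = events.iter().copied().step_by(2).collect();
    let node_b: Vec<Event> = events.iter().copied().skip(1).step_by(2).collect();
    running_score(&node_a, lambda, t) + running_score(&node_b, lambda, t)
}
```

For any time-ordered event list, all three functions should agree up to floating-point rounding, which is why comparisons against them are best done with a small tolerance rather than `==`.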

```rust
// tidal/src/replication/crdt/signal_state.rs

use std::collections::HashMap;

// `ShardId` (Phase 8.1) and `PNCounter` (Task 02) are assumed to be in scope;
// their import paths are not specified in this task.

/// CRDT-aware signal state for a single entity+signal_type pair.
///
/// Extends the existing HotSignalState and BucketedCounter with per-node
/// accounting that enables correct merge after partitioned writes.
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct CrdtSignalState {
    /// Per-node running decay score, valid as of the node's entry in
    /// `node_last_update_ns`.
    ///
    /// Each node contributes its own partial decay sum.
    /// Global score = sum of all node contributions at query time.
    node_decay_scores: HashMap<ShardId, f64>,

    /// Timestamp of last event per node (for decay math on merge).
    node_last_update_ns: HashMap<ShardId, u64>,

    /// Per-node windowed counters.
    ///
    /// Each node tracks its own bucket increments.
    /// On merge, per-node buckets are merged by taking per-node max
    /// (idempotent since same-node events are identical across replicas).
    node_buckets: HashMap<ShardId, PNCounter>,

    /// Lambda (decay rate) -- identical across all nodes for this signal.
    lambda: f64,
}

impl CrdtSignalState {
    pub fn new(lambda: f64) -> Self {
        Self {
            node_decay_scores: HashMap::new(),
            node_last_update_ns: HashMap::new(),
            node_buckets: HashMap::new(),
            lambda,
        }
    }

    /// Record a new signal event from `node`.
    pub fn on_signal(&mut self, node: ShardId, weight: f64, now_ns: u64) {
        let entry = self.node_decay_scores.entry(node).or_default();
        let last = self.node_last_update_ns.entry(node).or_insert(now_ns);

        if now_ns >= *last {
            // Decay existing score forward to `now_ns`, then add new event weight.
            let dt = (now_ns - *last) as f64 / 1e9;
            *entry = *entry * (-self.lambda * dt).exp() + weight;
            *last = now_ns;
        } else {
            // Late event: decay its weight forward to `last` instead of
            // moving the node's timestamp backwards.
            let dt = (*last - now_ns) as f64 / 1e9;
            *entry += weight * (-self.lambda * dt).exp();
        }
    }

    /// Global decay score: sum of all per-node contributions at `now_ns`.
    pub fn decay_score(&self, now_ns: u64) -> f64 {
        self.node_decay_scores
            .iter()
            .map(|(node, &score)| {
                // Look the timestamp up by key: two HashMaps do not iterate
                // in a matching order, so zipping them would mispair entries.
                let last = self.node_last_update_ns.get(node).copied().unwrap_or(now_ns);
                let dt = (now_ns.saturating_sub(last)) as f64 / 1e9;
                score * (-self.lambda * dt).exp()
            })
            .sum()
    }

    /// Merge another CrdtSignalState into this one.
    ///
    /// Each node's entry is written only by that node, so of two replicas of
    /// the same entry the one with the later timestamp already covers the
    /// other's events. Keep that one (ties: larger score, assuming
    /// non-negative weights). Adding the two would double-count.
    /// Contributions of different nodes are summed in `decay_score`.
    /// Per-node buckets are merged via PNCounter merge (per-node max).
    pub fn merge(&mut self, other: &CrdtSignalState) {
        for (&node, &other_score) in &other.node_decay_scores {
            let other_ts = other.node_last_update_ns.get(&node).copied().unwrap_or(0);
            let take_other = match (
                self.node_last_update_ns.get(&node),
                self.node_decay_scores.get(&node),
            ) {
                (Some(&ts), Some(&score)) => (other_ts, other_score) > (ts, score),
                _ => true,
            };
            if take_other {
                self.node_decay_scores.insert(node, other_score);
                self.node_last_update_ns.insert(node, other_ts);
            }
        }
        for (node, other_bucket) in &other.node_buckets {
            self.node_buckets
                .entry(*node)
                .or_default()
                .merge(other_bucket);
        }
    }
}
```
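The "per-node max" rule for windowed counters can be illustrated without the real `PNCounter`, whose API is defined in Task 02 and not shown here. The following is a minimal sketch under stated assumptions: `NodeId` stands in for `ShardId`, `NodeBuckets` is a hypothetical grow-only bucket array, and window expiry is not modelled.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ShardId` (the real type comes from Phase 8.1).
type NodeId = u16;

/// Hypothetical per-node bucket array: one monotonically growing count per bucket.
#[derive(Debug, Clone, Default, PartialEq)]
struct NodeBuckets {
    counts: Vec<u64>,
}

impl NodeBuckets {
    /// Record one event in `bucket`, growing the array if needed.
    fn increment(&mut self, bucket: usize) {
        if self.counts.len() <= bucket {
            self.counts.resize(bucket + 1, 0);
        }
        self.counts[bucket] += 1;
    }

    /// Element-wise max. This is safe because only the owning node increments
    /// its array, so two replicas differ only in how far they have caught up.
    fn merge(&mut self, other: &NodeBuckets) {
        if self.counts.len() < other.counts.len() {
            self.counts.resize(other.counts.len(), 0);
        }
        for (mine, &theirs) in self.counts.iter_mut().zip(&other.counts) {
            *mine = (*mine).max(theirs);
        }
    }

    fn total(&self) -> u64 {
        self.counts.iter().sum()
    }
}

/// Merge two replicas' per-node maps and return the total windowed count,
/// i.e. the sum over nodes of each node's merged bucket totals.
fn merged_total(
    a: &HashMap<NodeId, NodeBuckets>,
    b: &HashMap<NodeId, NodeBuckets>,
) -> u64 {
    let mut out = a.clone();
    for (node, buckets) in b {
        out.entry(*node).or_default().merge(buckets);
    }
    out.values().map(NodeBuckets::total).sum()
}
```

Because the merge is a max within each node and a sum only across nodes, merging the same replica repeatedly or in either direction yields the same total.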

## Acceptance Criteria

- [ ] `CrdtSignalState::decay_score(now_ns)` returns the sum of all per-node contributions decayed to `now_ns`
- [ ] Two nodes process 500 events each (non-overlapping); after merge, `decay_score` equals the sum of both individual scores within floating-point tolerance (property test: 1000 random event sequences)
- [ ] `merge` is commutative and associative (property tests)
- [ ] `merge` does not double-count: same-node events produce the same score regardless of how many times the node's state is merged (idempotent per node)
- [ ] `BucketedCounter` equivalent: per-node bucket increments merged by PNCounter; total windowed count = sum of distinct events across all nodes; no double-counting
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass
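The merge-law criteria above could be exercised along the following lines. This is a sketch, not the project's test suite: `State`, `merge`, `next`, `random_state`, and `merge_laws_hold` are hypothetical names, the state is reduced to one `(last_update_ns, score)` pair per node, and the merge rule is an assumption (per node, keep the entry with the later timestamp, breaking ties by larger score). Real property tests would more likely use a property-testing crate; a hand-rolled xorshift generator keeps the sketch dependency-free.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified state: node id -> (last_update_ns, score at that time).
type State = BTreeMap<u16, (u64, f64)>;

/// Assumed merge rule: per node, keep the lexicographically larger
/// (timestamp, score) pair. Entries are single-writer, so the later entry
/// already covers the earlier one's events.
fn merge(a: &State, b: &State) -> State {
    let mut out = a.clone();
    for (&node, &theirs) in b {
        let mine = out.entry(node).or_insert(theirs);
        if theirs > *mine {
            *mine = theirs;
        }
    }
    out
}

/// Tiny xorshift generator so the sketch needs no external crates.
fn next(seed: &mut u64) -> u64 {
    *seed ^= *seed << 13;
    *seed ^= *seed >> 7;
    *seed ^= *seed << 17;
    *seed
}

/// Builds a small random state with up to three entries over three node ids.
fn random_state(seed: &mut u64) -> State {
    let mut s = State::new();
    for _ in 0..(next(seed) % 4) {
        let node = (next(seed) % 3) as u16;
        let ts = next(seed) % 100;
        let score = (next(seed) % 1000) as f64 / 10.0;
        s.insert(node, (ts, score));
    }
    s
}

/// Checks commutativity, associativity, and idempotence on `rounds` random triples.
fn merge_laws_hold(rounds: usize) -> bool {
    let mut seed = 0x9E37_79B9_7F4A_7C15u64;
    (0..rounds).all(|_| {
        let a = random_state(&mut seed);
        let b = random_state(&mut seed);
        let c = random_state(&mut seed);
        merge(&a, &b) == merge(&b, &a)
            && merge(&merge(&a, &b), &c) == merge(&a, &merge(&b, &c))
            && merge(&a, &a) == a
            && merge(&merge(&a, &b), &b) == merge(&a, &b)
    })
}
```

Exact equality is usable here because the merge only copies values and never recomputes them; checks that compare computed decay scores would need a tolerance instead.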