# m8p3: CRDT Counters and Deterministic Reconciliation

## Delivers

Conflict-free replicated data types (CRDTs) for signal counters and hard
negatives that enable deterministic reconciliation after network partitions.
After this phase, two shards that process overlapping signal streams during a
partition can merge their state without double-counting, without losing hard
negatives, and without application intervention.

This is the critical correctness layer that makes "heal the partition; verify
deterministic reconciliation" possible in the UAT.

Deliverables:
- `PNCounter`: a positive-negative counter CRDT with per-node increments; merge takes the per-node max, applied separately to the P and N vectors
- `LWWRegister<T>`: last-writer-wins register with HLC timestamps for hard negatives (hide/mute/block)
- `CrdtSignalState`: wraps `HotSignalState` and `BucketedCounter` with CRDT merge semantics
- `ReconciliationEngine`: given two `ReplicationState` snapshots, produces a merge plan and applies it idempotently
- `HLC` (Hybrid Logical Clock): wall-clock + logical counter for causal ordering of hard-negative writes

## Dependencies

- **Requires:** Phase 8.1 (ShardId, RegionId), Phase 8.2 (WAL shipping, segment replay)
- **Files modified:**
  - `tidal/src/signals/hot.rs` -- `HotSignalState` gains `node_id` field; decay scores become per-node accumulators
  - `tidal/src/signals/warm.rs` -- `BucketedCounter` gains per-node bucket arrays for CRDT merge
  - `tidal/src/entities/hard_neg.rs` -- `HardNegEntry` gains HLC timestamp for LWW semantics
- **Files created:**
  - `tidal/src/replication/crdt/mod.rs` -- module root
  - `tidal/src/replication/crdt/pn_counter.rs` -- `PNCounter`
  - `tidal/src/replication/crdt/lww_register.rs` -- `LWWRegister<T>`
  - `tidal/src/replication/crdt/hlc.rs` -- Hybrid Logical Clock
  - `tidal/src/replication/reconcile.rs` -- `ReconciliationEngine`

## Research References

- `thoughts.md` -- Part V (StemeDB CRDT replication: G-Set for events, G-Counter for counts, LWW for state)

## Acceptance Criteria (Phase Level)

- [ ] `PNCounter` supports `increment(node_id, amount)` and `decrement(node_id, amount)`; `merge(other)` takes per-node max for both P and N vectors; `value()` returns `P_total - N_total`
- [ ] `PNCounter` merge is commutative, associative, and idempotent (property tests with 100K random operations across 5 nodes)
- [ ] `LWWRegister<T>` resolves concurrent writes by HLC timestamp; ties broken by `node_id` (higher wins); `merge(other)` takes the register with the higher timestamp
- [ ] `HLC::now()` returns `(wall_clock_ns, logical_counter)`; `HLC::update(received_hlc)` advances the clock; timestamps are monotonically increasing within a node
- [ ] `CrdtSignalState` wraps decay scores as per-node accumulators: `merge` of two states produces the same result regardless of merge order (commutative property test)
- [ ] `BucketedCounter` CRDT merge: per-node bucket arrays merged by max; total count = sum across all nodes; no double-counting after merge (verification: sum of all increments across all nodes == merged counter value)
- [ ] Hard negatives use `LWWRegister<HardNegAction>`: a `hide` at HLC T1 followed by an `unhide` at HLC T2 > T1 resolves to unhide; a concurrent `hide` and `unhide` with equal HLC timestamps resolve deterministically by `node_id`
- [ ] `ReconciliationEngine::reconcile(local_state, remote_state) -> MergePlan`: produces a list of signal counter merges and hard-negative LWW resolutions; applying the plan is idempotent
- [ ] After reconciliation, no signal count exceeds the true event count (no double-counting); verified by replaying all WAL events from both sides and comparing against merged state

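The LWW criteria above (order by HLC timestamp, break ties by higher `node_id`) can be sketched roughly as follows. This is a minimal illustration, not the planned implementation: `HlcTimestamp`, the `u16` node id, the two-variant `HardNegAction`, and the `new`/`value` methods are assumptions made for the example.

```rust
/// Hypothetical HLC timestamp, mirroring the `(wall_clock_ns, logical_counter)`
/// pair named in the criteria. Derived ordering compares `wall_ns` first,
/// then `logical`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct HlcTimestamp {
    pub wall_ns: u64,
    pub logical: u32,
}

/// Hypothetical action set, reduced to the two actions used in the criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HardNegAction {
    Hide,
    Unhide,
}

/// Last-writer-wins register: the write with the greatest
/// `(timestamp, node_id)` pair wins.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LWWRegister<T> {
    value: T,
    ts: HlcTimestamp,
    node_id: u16, // assumed node id width
}

impl<T: Clone> LWWRegister<T> {
    pub fn new(value: T, ts: HlcTimestamp, node_id: u16) -> Self {
        Self { value, ts, node_id }
    }

    pub fn value(&self) -> &T {
        &self.value
    }

    /// Total order used for conflict resolution: timestamp, then node id.
    fn key(&self) -> (HlcTimestamp, u16) {
        (self.ts, self.node_id)
    }

    /// Keep whichever write has the larger key. Because the key order is
    /// total, the result does not depend on merge order or repetition.
    pub fn merge(&mut self, other: &Self) {
        if other.key() > self.key() {
            *self = other.clone();
        }
    }
}
```

Merging a `Hide` at T1 with an `Unhide` at T2 > T1 yields `Unhide` from either side; with equal timestamps, the write from the higher `node_id` is kept. The sketch assumes a single node never issues two different writes with the same timestamp, which the HLC monotonicity criterion is meant to guarantee.
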
## Task Execution Order

```
Task 01: HLC ──────────────────────┐
                                   ├──> Task 04: CrdtSignalState
Task 02: PNCounter ────────────────┤
                                   ├──> Task 05: ReconciliationEngine
Task 03: LWWRegister ──────────────┘              │
                                                  v
                                   Task 06: Reconciliation Property Tests
```

Tasks 01, 02, 03 are fully parallelizable. Tasks 04 and 05 depend on all three. Task 06 depends on 05.

## Module Location

| File | Status | Contains |
|------|--------|----------|
| `tidal/src/replication/crdt/mod.rs` | NEW | Module root |
| `tidal/src/replication/crdt/pn_counter.rs` | NEW | `PNCounter` |
| `tidal/src/replication/crdt/lww_register.rs` | NEW | `LWWRegister<T>` |
| `tidal/src/replication/crdt/hlc.rs` | NEW | `HLC` (Hybrid Logical Clock) |
| `tidal/src/replication/reconcile.rs` | NEW | `ReconciliationEngine`, `MergePlan` |
| `tidal/src/signals/hot.rs` | MODIFIED | Per-node accumulator support |
| `tidal/src/signals/warm.rs` | MODIFIED | Per-node bucket arrays |
| `tidal/src/entities/hard_neg.rs` | MODIFIED | HLC timestamp on entries |

## Notes

### Per-node accumulators, not per-event dedup

The naive approach of deduplicating every event by BLAKE3 hash across all nodes is O(events) in memory. Instead, we use PN-counters: each node tracks its own increment total, and merge takes per-node max. This is O(nodes) in memory, which is bounded and small.

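A minimal sketch of the per-node max merge described above, assuming a `u16` node id and `BTreeMap` storage (both are illustrative choices, not taken from the plan). State size grows with the number of nodes, not the number of events.

```rust
use std::collections::BTreeMap;

/// Hypothetical node identifier type for this sketch.
pub type NodeId = u16;

/// PN-counter: one grow-only total per node for increments (P) and one for
/// decrements (N).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct PNCounter {
    p: BTreeMap<NodeId, u64>,
    n: BTreeMap<NodeId, u64>,
}

impl PNCounter {
    /// A node only ever adds to its own P slot.
    pub fn increment(&mut self, node: NodeId, amount: u64) {
        *self.p.entry(node).or_insert(0) += amount;
    }

    /// A node only ever adds to its own N slot.
    pub fn decrement(&mut self, node: NodeId, amount: u64) {
        *self.n.entry(node).or_insert(0) += amount;
    }

    /// Per-node max on both vectors. Since each slot only grows, the max is
    /// the most recent value seen for that node, so replaying a merge never
    /// double-counts.
    pub fn merge(&mut self, other: &Self) {
        merge_max(&mut self.p, &other.p);
        merge_max(&mut self.n, &other.n);
    }

    /// P_total - N_total.
    pub fn value(&self) -> i64 {
        let p: u64 = self.p.values().sum();
        let n: u64 = self.n.values().sum();
        p as i64 - n as i64
    }
}

fn merge_max(dst: &mut BTreeMap<NodeId, u64>, src: &BTreeMap<NodeId, u64>) {
    for (&node, &count) in src {
        let slot = dst.entry(node).or_insert(0);
        *slot = (*slot).max(count);
    }
}
```

If node 1 records `+5` and `-2` while node 2 records `+7`, merging in either order gives a value of 10, and merging the same snapshot again leaves the state unchanged.
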
### Decay score CRDT

Exponential decay scores are not naturally CRDT-compatible because `S(t) = S(t_prev) * exp(-lambda * dt) + w` is order-dependent. The solution: each node maintains its own running decay score. On merge, per-node scores are decayed to a common reference time and summed (each represents that node's contribution). This is mathematically equivalent to summing all events from all nodes, because the running score unrolls to a sum of weighted exponentials, `S(t) = sum_i w_i * exp(-lambda * (t - t_i))`, and that sum can be split by originating node. Property tests verify this.

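The equivalence can be checked numerically with a small sketch. `DecayScore`, `observe`, `value_at`, and `merged_score` are hypothetical names for this example, and timestamps are plain `f64` seconds; the only formula used is the running-score recurrence quoted above.

```rust
/// One node's running decay score:
/// S(t) = S(t_prev) * exp(-lambda * dt) + w.
#[derive(Debug, Clone, Copy, Default)]
pub struct DecayScore {
    score: f64,
    last_ts: f64, // time of the most recent event, in seconds
}

impl DecayScore {
    /// Apply one event of weight `w` at time `ts` (assumes `ts >= last_ts`).
    pub fn observe(&mut self, ts: f64, w: f64, lambda: f64) {
        let dt = ts - self.last_ts;
        self.score = self.score * (-lambda * dt).exp() + w;
        self.last_ts = ts;
    }

    /// Score decayed forward to reference time `t` (assumes `t >= last_ts`).
    pub fn value_at(&self, t: f64, lambda: f64) -> f64 {
        self.score * (-lambda * (t - self.last_ts)).exp()
    }
}

/// Merged score: decay every node's contribution to the same reference time,
/// then sum.
pub fn merged_score(per_node: &[DecayScore], t: f64, lambda: f64) -> f64 {
    per_node.iter().map(|s| s.value_at(t, lambda)).sum()
}
```

Feeding events `(t=1, w=1)` and `(t=4, w=2)` to one node and `(t=2, w=0.5)` and `(t=3, w=1.5)` to another, then calling `merged_score` at `t=5`, should match a single `DecayScore` that observed all four events in time order, up to floating-point rounding.
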
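### HLC, not NTP

Wall-clock skew between nodes can cause LWW to resolve incorrectly. The HLC (Kulkarni et al., 2014) pairs each wall-clock reading with a logical counter. The counter advances on `send` whenever the wall clock has not moved forward, and on `receive` the node adopts the larger of the local and remote timestamps and bumps the counter (`max(local, remote) + 1`). A write that causally follows another therefore always carries a larger timestamp, even with clock skew up to the HLC's tolerance (typically seconds).
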
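A sketch of the send and receive rules, following the update rules published in the HLC paper. To keep the example deterministic, the wall-clock reading is passed in as `physical_ns`; that parameter and the `Timestamp` struct are assumptions of this sketch and differ from the parameterless `HLC::now()` named in the acceptance criteria.

```rust
/// `(wall_clock_ns, logical_counter)`; derived ordering compares `wall_ns`
/// first, then `logical`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp {
    pub wall_ns: u64,
    pub logical: u32,
}

/// Hybrid Logical Clock state for one node.
#[derive(Debug, Default)]
pub struct HLC {
    wall_ns: u64,
    logical: u32,
}

impl HLC {
    /// Local or send event. `physical_ns` is this node's wall-clock reading
    /// (injected here so the sketch does not read the system clock).
    pub fn now(&mut self, physical_ns: u64) -> Timestamp {
        if physical_ns > self.wall_ns {
            // Wall clock moved forward: adopt it and reset the counter.
            self.wall_ns = physical_ns;
            self.logical = 0;
        } else {
            // Wall clock stalled or went backwards: only the counter advances.
            self.logical += 1;
        }
        Timestamp { wall_ns: self.wall_ns, logical: self.logical }
    }

    /// Receive event: take the max of local, remote, and physical time, then
    /// pick the counter so the result exceeds both inputs.
    pub fn update(&mut self, received: Timestamp, physical_ns: u64) -> Timestamp {
        let prev = self.wall_ns;
        let wall = prev.max(received.wall_ns).max(physical_ns);
        self.logical = if wall == prev && wall == received.wall_ns {
            self.logical.max(received.logical) + 1
        } else if wall == prev {
            self.logical + 1
        } else if wall == received.wall_ns {
            received.logical + 1
        } else {
            0
        };
        self.wall_ns = wall;
        Timestamp { wall_ns: self.wall_ns, logical: self.logical }
    }
}
```

With node A's clock reading 1000 ns and node B's reading 10 ns, a write stamped on A and then received on B produces a B-side timestamp of `(1000, 1)`, which sorts after A's `(1000, 0)` even though B's wall clock is behind.
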
## Done When

Two `TidalDb` instances process overlapping signal streams and hard-negative writes during a simulated partition. After merge via `ReconciliationEngine`, the merged signal counts exactly equal the deduplicated union of all events, and hard negatives reflect the latest write by HLC timestamp. Property tests verify commutativity, associativity, and idempotency of all CRDT merge operations across 100K random operation sequences.