# m8p2: WAL Shipping and Follower Replay

## Delivers

One-way WAL replication from leader to followers. The leader ships sealed WAL segments over an abstract transport trait. Followers receive segments, validate checksums, and replay them idempotently through the existing signal ledger `apply_wal_event()` path. A replication lag metric is emitted. A follower can serve read queries (RETRIEVE, SEARCH) with bounded staleness.

This is the "read replicas" capability -- the foundation for multi-region deployment.

Deliverables:

- `Transport` trait: `async fn send_segment(peer: ShardId, segment: &WalSegmentPayload)` and `async fn recv_segment() -> WalSegmentPayload`
- `InProcessTransport`: for testing; uses `tokio::sync::mpsc` channels between co-located instances
- `WalShipper`: background task on the leader that watches for sealed segments and ships them to registered followers
- `SegmentReceiver`: background task on the follower that receives segments, validates their BLAKE3 checksums, and replays events
- `ReplicationLagGauge`: tracks the delta between the leader's latest seqno and each follower's applied seqno
- `FollowerDb`: a `TidalDb` variant that accepts no writes and only replays segments; it serves read queries from its local state

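The `Transport` / `InProcessTransport` pair can be sketched as follows. This is a minimal, synchronous illustration, not the planned async API: it swaps `tokio::sync::mpsc` for `std::sync::mpsc::sync_channel` so it compiles with the standard library alone, and it assumes a `u32`-backed `ShardId`, a `WalSegmentPayload` with hypothetical `first_seq`/`last_seq`/`bytes`/`checksum` fields, and a hypothetical `TransportError`. The shape it keeps is one bounded inbox per endpoint and `send_segment` addressed by peer.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical stand-in for the Phase 8.1 `ShardId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ShardId(pub u32);

/// Hypothetical payload layout: a sealed segment's seqno range, bytes, checksum.
#[derive(Clone, Debug, PartialEq)]
pub struct WalSegmentPayload {
    pub first_seq: u64,
    pub last_seq: u64,
    pub bytes: Vec<u8>,
    pub checksum: u64,
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum TransportError {
    UnknownPeer(ShardId),
    Disconnected,
}

/// Synchronous analogue of the async `Transport` trait.
pub trait Transport {
    fn send_segment(&self, peer: ShardId, segment: &WalSegmentPayload)
        -> Result<(), TransportError>;
    fn recv_segment(&self) -> Result<WalSegmentPayload, TransportError>;
}

/// Each endpoint owns a bounded inbox and knows how to reach its peers.
pub struct InProcessTransport {
    peers: HashMap<ShardId, SyncSender<WalSegmentPayload>>,
    inbox: Receiver<WalSegmentPayload>,
}

impl InProcessTransport {
    /// Creates an endpoint whose inbox holds at most `capacity` segments,
    /// plus the sender other endpoints register to reach it.
    pub fn new(capacity: usize) -> (Self, SyncSender<WalSegmentPayload>) {
        let (tx, rx) = sync_channel(capacity);
        (InProcessTransport { peers: HashMap::new(), inbox: rx }, tx)
    }

    /// Registers a follower's inbox sender under its shard id.
    pub fn register_peer(&mut self, peer: ShardId, tx: SyncSender<WalSegmentPayload>) {
        self.peers.insert(peer, tx);
    }
}

impl Transport for InProcessTransport {
    fn send_segment(&self, peer: ShardId, segment: &WalSegmentPayload)
        -> Result<(), TransportError> {
        let tx = self.peers.get(&peer).ok_or(TransportError::UnknownPeer(peer))?;
        // Blocks while the follower's bounded inbox is full (backpressure).
        tx.send(segment.clone()).map_err(|_| TransportError::Disconnected)
    }

    fn recv_segment(&self) -> Result<WalSegmentPayload, TransportError> {
        // Blocks until a segment arrives or every sender has been dropped.
        self.inbox.recv().map_err(|_| TransportError::Disconnected)
    }
}
```

A bounded channel is the in-process analogue of network backpressure: a slow follower eventually stalls the shipper instead of letting an unbounded queue grow on the leader.
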
## Dependencies

- **Requires:** Phase 8.1 (ShardId, RegionId, WalSegmentId, BatchHeader v2, ReplicationState)
- **Files modified:**
  - `tidal/src/wal/segment.rs` -- `sealed_segments_since(seqno)` helper
  - `tidal/src/db/open.rs` -- support `NodeRole::Follower` startup
  - `tidal/src/db/mod.rs` -- `TidalDb::is_follower()` guard on write paths
  - `tidal/src/signals/ledger/mod.rs` -- ensure `apply_wal_event()` is idempotent when replaying duplicate segments
- **Files created:**
  - `tidal/src/replication/transport.rs` -- `Transport` trait, `WalSegmentPayload`
  - `tidal/src/replication/in_process.rs` -- `InProcessTransport`
  - `tidal/src/replication/shipper.rs` -- `WalShipper`
  - `tidal/src/replication/receiver.rs` -- `SegmentReceiver`
  - `tidal/src/replication/lag.rs` -- `ReplicationLagGauge`

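The `sealed_segments_since(seqno)` helper could be as small as a search over ordered segment metadata. The sketch below assumes each sealed segment is described by a hypothetical `SealedSegment { first_seq, last_seq }` record (inclusive range), kept sorted and non-overlapping. It returns every segment that still holds at least one event after `seqno`, so a segment straddling the cursor is included rather than dropped.

```rust
/// Hypothetical metadata for one sealed WAL segment (inclusive seqno range).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SealedSegment {
    pub first_seq: u64,
    pub last_seq: u64,
}

/// Returns the sealed segments containing any event with seqno > `seqno`.
/// Assumes `sealed` is ordered by `first_seq` with non-overlapping ranges,
/// so `last_seq` is increasing and a binary search applies.
pub fn sealed_segments_since(sealed: &[SealedSegment], seqno: u64) -> &[SealedSegment] {
    // Every segment before `start` ends at or below the cursor: fully shipped.
    let start = sealed.partition_point(|s| s.last_seq <= seqno);
    &sealed[start..]
}
```
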
## Research References

- `docs/research/tidaldb_wal.md` -- Segment sealing, batch checksum validation
- `thoughts.md` -- Part V.5 (quarantine-first ingestion; WAL is source of truth)

## Acceptance Criteria (Phase Level)

- [ ] `Transport` trait has `send_segment` and `recv_segment` async methods; `InProcessTransport` implements them via bounded mpsc channels
- [ ] `WalShipper` runs as a background `tokio::task`; polls for newly sealed segments every 2 seconds (configurable); ships segments to all registered followers in parallel
- [ ] `SegmentReceiver` validates the BLAKE3 checksum of each received segment before replay; rejects corrupted segments with `WalError::Corruption`
- [ ] Follower replay is idempotent: re-applying any event whose seqno is <= the follower's high-water-mark is a no-op (no duplicate signal counting)
- [ ] `ReplicationLagGauge` reports `leader_seqno - follower_applied_seqno` per follower; accessible via `MetricsState`
- [ ] Leader writes 1,000 signals -> follower replays all 1,000 -> `read_decay_score` on follower matches leader to 6 decimal places (analytical equivalence)
- [ ] Follower rejects write operations (`db.signal()`, `db.write_item()`) with `TidalError::ReadOnly`
- [ ] Replication lag converges to 0 within 5 seconds after the leader quiesces (in-process transport)
- [ ] Leader crash and restart: follower continues serving reads from its last replayed state; leader resumes shipping from the last sealed segment
- [ ] `FollowerDb` serves `db.retrieve()` and `db.search()` queries against its local replayed state

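The lag criterion reduces to `leader_seqno - follower_applied_seqno` per follower. A minimal sketch, assuming followers are keyed by a `u32` shard id, that seqnos only move forward (so stale reports are ignored via `max`), and that publishing into `MetricsState` happens elsewhere; all method names are hypothetical. `saturating_sub` keeps the gauge from underflowing if a follower's report races ahead of the leader snapshot.

```rust
use std::collections::HashMap;

/// Per-follower replication lag, measured in sequence numbers.
#[derive(Default, Debug)]
pub struct ReplicationLagGauge {
    leader_seqno: u64,
    applied: HashMap<u32, u64>, // follower shard id -> applied seqno
}

impl ReplicationLagGauge {
    /// Records the leader's latest seqno (assumed monotonic).
    pub fn set_leader_seqno(&mut self, seqno: u64) {
        self.leader_seqno = self.leader_seqno.max(seqno);
    }

    /// Records a follower's high-water-mark; out-of-order reports never lower it.
    pub fn set_follower_applied(&mut self, follower: u32, seqno: u64) {
        let hwm = self.applied.entry(follower).or_insert(0);
        *hwm = (*hwm).max(seqno);
    }

    /// `leader_seqno - follower_applied_seqno`, or `None` for an unknown follower.
    pub fn lag(&self, follower: u32) -> Option<u64> {
        self.applied
            .get(&follower)
            .map(|applied| self.leader_seqno.saturating_sub(*applied))
    }

    /// Lag for every follower, sorted by shard id, ready to export as metrics.
    pub fn snapshot(&self) -> Vec<(u32, u64)> {
        let mut out: Vec<(u32, u64)> = self
            .applied
            .iter()
            .map(|(&f, &a)| (f, self.leader_seqno.saturating_sub(a)))
            .collect();
        out.sort_unstable();
        out
    }
}
```
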
## Task Execution Order

```
Task 01: Transport Trait ──────┐
                               ├──> Task 03: WalShipper
Task 02: InProcessTransport ───┘        │
                                        v
                                    Task 04: SegmentReceiver
                                        │
                                        v
                                    Task 05: FollowerDb
                                        │
                                        v
                                    Task 06: ReplicationLagGauge
                                        │
                                        v
                                    Task 07: Integration Tests
```

Tasks 01 and 02 are parallelizable. Task 03 requires Task 01. Tasks 04-07 are sequential.

## Module Location

| File | Status | Contains |
|------|--------|----------|
| `tidal/src/replication/transport.rs` | NEW | `Transport` trait, `WalSegmentPayload` |
| `tidal/src/replication/in_process.rs` | NEW | `InProcessTransport` (channel-based) |
| `tidal/src/replication/shipper.rs` | NEW | `WalShipper` background task |
| `tidal/src/replication/receiver.rs` | NEW | `SegmentReceiver` with checksum validation and replay |
| `tidal/src/replication/lag.rs` | NEW | `ReplicationLagGauge` |
| `tidal/src/wal/segment.rs` | MODIFIED | `sealed_segments_since(seqno)` |
| `tidal/src/db/open.rs` | MODIFIED | Follower startup path |
| `tidal/src/db/mod.rs` | MODIFIED | Write-rejection guard for followers |
| `tidal/src/signals/ledger/mod.rs` | MODIFIED | Idempotency guard on `apply_wal_event` |

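The write-rejection guard in `tidal/src/db/mod.rs` can be one shared check at the top of every write path. A minimal sketch, assuming a two-variant `NodeRole` and a pared-down `TidalDb` holding only a role and a signal counter; only `is_follower()`, `signal()`, `write_item()`, and `TidalError::ReadOnly` come from this plan, and the argument lists and the `open`/`signal_count` helpers are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NodeRole {
    Leader,
    Follower,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TidalError {
    ReadOnly,
}

/// Pared-down stand-in for `TidalDb`: a role and a signal counter.
pub struct TidalDb {
    role: NodeRole,
    signals: u64,
}

impl TidalDb {
    /// Hypothetical constructor; the real startup path lives in `db/open.rs`.
    pub fn open(role: NodeRole) -> Self {
        TidalDb { role, signals: 0 }
    }

    pub fn is_follower(&self) -> bool {
        self.role == NodeRole::Follower
    }

    /// Shared guard called first by every public write path.
    fn ensure_writable(&self) -> Result<(), TidalError> {
        if self.is_follower() {
            Err(TidalError::ReadOnly)
        } else {
            Ok(())
        }
    }

    pub fn signal(&mut self) -> Result<(), TidalError> {
        self.ensure_writable()?;
        self.signals += 1;
        Ok(())
    }

    pub fn write_item(&mut self, _item: &str) -> Result<(), TidalError> {
        self.ensure_writable()?;
        Ok(())
    }

    /// Reads are served on both roles.
    pub fn signal_count(&self) -> u64 {
        self.signals
    }
}
```

Note that on a real follower the replay path (`apply_wal_event()`) must bypass this guard, since it is the only way the follower's state advances.
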
## Notes

### In-process transport only in this phase

A TCP/gRPC transport is deferred to Phase 8.5. The `Transport` trait is async to support both in-process channels and future network transports.

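### Idempotency via seqno

Followers track their high-water-mark, `applied_seqno`. Segments with `last_seq <= applied_seqno` are skipped entirely. In a segment that straddles the high-water-mark (for example, one re-shipped after the follower crashed mid-replay), only the events with seqno `<= applied_seqno` are skipped, so no event is lost or counted twice. This reuses the existing checkpoint format from M1.
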
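The receiver's validate-then-replay step can be sketched with the high-water-mark check applied per event. Assumptions: events are plain `(seqno, delta)` pairs in seqno order, adding `delta` to a running total stands in for `apply_wal_event()`, and the checksum uses the standard library's `DefaultHasher` purely as a placeholder for BLAKE3 (it is neither cryptographic nor stable across Rust releases). `Segment`, `Follower`, and the `Corruption` payload are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
pub enum WalError {
    Corruption { first_seq: u64 },
}

/// Hypothetical decoded segment: ordered (seqno, delta) events plus a checksum.
pub struct Segment {
    pub events: Vec<(u64, i64)>,
    pub checksum: u64,
}

/// Placeholder checksum over the events (BLAKE3 in the planned design).
pub fn checksum(events: &[(u64, i64)]) -> u64 {
    let mut h = DefaultHasher::new();
    events.hash(&mut h);
    h.finish()
}

pub struct Follower {
    pub applied_seqno: u64, // high-water-mark
    pub total: i64,         // stand-in for ledger state
}

impl Follower {
    /// Rejects corrupted segments, then applies only events above the
    /// high-water-mark. Returns the number of events actually applied.
    pub fn replay(&mut self, seg: &Segment) -> Result<usize, WalError> {
        if checksum(&seg.events) != seg.checksum {
            let first_seq = seg.events.first().map_or(0, |e| e.0);
            return Err(WalError::Corruption { first_seq });
        }
        let mut applied = 0;
        for &(seq, delta) in &seg.events {
            if seq <= self.applied_seqno {
                continue; // already applied: duplicate delivery is a no-op
            }
            self.total += delta; // stand-in for apply_wal_event()
            self.applied_seqno = seq;
            applied += 1;
        }
        Ok(applied)
    }
}
```

Validating before touching state means a corrupted segment leaves the follower exactly where it was, so the leader can simply re-ship it.
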
### Timer-based segment sealing

The existing `WalHandle` seals segments when they reach `max_size`. For replication, we add a timer-based seal: every `wal_ship_interval` (default 2s), the active segment is sealed even if it is not full. This bounds replication lag.

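The seal decision can be written as a pure function of segment size and age, which keeps it easy to test without real timers. A sketch under stated assumptions: `max_size` and `wal_ship_interval` come from this plan, while the function name `should_seal` and the rule that an empty segment is never sealed (so an idle leader does not ship empty segments) are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Returns true when the active segment should be sealed: it has reached
/// `max_size`, or it is non-empty and has been open for `wal_ship_interval`.
pub fn should_seal(
    active_bytes: u64,
    max_size: u64,
    opened_at: Instant,
    now: Instant,
    wal_ship_interval: Duration,
) -> bool {
    let full = active_bytes >= max_size;
    // saturating_duration_since yields zero if `now` precedes `opened_at`.
    let aged = now.saturating_duration_since(opened_at) >= wal_ship_interval;
    full || (active_bytes > 0 && aged)
}
```

With the 2s defaults for both sealing and shipper polling, an event written just after a seal waits up to about 2s to be sealed and up to about 2s more for the next poll, which sits inside the 5-second convergence target before transfer and replay time are added.
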
### No Raft, no consensus

This is primary-backup replication. One leader, N followers. Promotion is manual or triggered by the control plane (Phase 8.5).

## Done When

A developer can start a leader and a follower using `InProcessTransport`, write 10,000 signals to the leader, observe the follower replay all events with lag < 5 seconds, and execute `db.retrieve()` on the follower with results matching the leader's state (modulo staleness of up to 1 batch).

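The counting half of this scenario can be rehearsed without any TidalDb machinery. The sketch below is a stand-in built only on `std`: a leader thread "writes" signals, seals them into segments of a hypothetical fixed length, and ships each segment over a bounded `std::sync::mpsc::sync_channel` (in place of `InProcessTransport`), while a follower thread applies them with the high-water-mark rule. It checks that every seqno arrives exactly once; it does not exercise decay scores, `db.retrieve()`, or timing.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Ships `total` signals in segments of `segment_len` events from a leader
/// thread to a follower thread. Returns
/// (leader_seqno, follower_applied_seqno, follower_signal_count).
pub fn run_leader_follower(total: u64, segment_len: u64) -> (u64, u64, u64) {
    // Bounded channel: the leader blocks if the follower is 4 segments behind.
    let (tx, rx) = sync_channel::<Vec<u64>>(4);

    let leader = thread::spawn(move || {
        let mut seq = 0;
        let mut active = Vec::new();
        while seq < total {
            seq += 1;
            active.push(seq); // "write" one signal with seqno `seq`
            if active.len() as u64 == segment_len {
                // Size-based seal: ship the full segment, start a new one.
                tx.send(std::mem::take(&mut active)).expect("follower hung up");
            }
        }
        if !active.is_empty() {
            tx.send(active).expect("follower hung up"); // final partial seal
        }
        seq // `tx` is dropped when this closure returns, closing the channel
    });

    let follower = thread::spawn(move || {
        let (mut applied_seqno, mut count) = (0u64, 0u64);
        for segment in rx {
            for seq in segment {
                if seq > applied_seqno {
                    applied_seqno = seq;
                    count += 1;
                }
            }
        }
        (applied_seqno, count)
    });

    let leader_seqno = leader.join().expect("leader panicked");
    let (applied, count) = follower.join().expect("follower panicked");
    (leader_seqno, applied, count)
}
```

Once the leader quiesces and the channel drains, `leader_seqno - follower_applied_seqno` is 0, which is the state the lag criterion expects.
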