# Milestone 1, Phase 2: Write-Ahead Log

## Status: COMPLETE

## Phase Deliverable

A durable, append-only signal event log. Every signal write (view, like, skip, completion) is appended to the WAL before any aggregation occurs. Signal aggregates, decay scores, and windowed counts are derived state; the WAL is the source of truth. Group commit amortizes fsync cost across concurrent writers. Each event is content-addressed by its own BLAKE3 hash, which is what deduplication keys on. Crash recovery scans forward from the last checkpoint and truncates corrupted tails.
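
The batching half of group commit can be sketched as below. This is a minimal illustration, not the project's writer: it substitutes `std::sync::mpsc` and `recv_timeout` for the crossbeam channel so that it has no dependencies, and `collect_batch` is a hypothetical name. A real writer would append the returned batch to the current segment and issue a single fsync for it.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collects one batch (hypothetical helper, std channel used as a stand-in).
/// Blocks for the first event, then keeps receiving until `max_events` are
/// buffered or `max_wait` has elapsed since the first event arrived.
/// Returns `None` once every sender is gone and nothing is left to read.
pub fn collect_batch<T>(
    rx: &Receiver<T>,
    max_events: usize,
    max_wait: Duration,
) -> Option<Vec<T>> {
    let first = rx.recv().ok()?;
    let deadline = Instant::now() + max_wait;
    let mut batch = Vec::with_capacity(max_events.max(1));
    batch.push(first);
    while batch.len() < max_events {
        // Time left before this batch must be closed.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(event) => batch.push(event),
            // Deadline hit or all senders dropped: close the batch as it is.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}
```

Calling this in a loop with `max_events = 100` and `max_wait = 10 ms` gives the "whichever comes first" behavior: busy periods produce full batches, quiet periods produce small batches after a bounded wait.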

## Acceptance Criteria

- [x] Batch-oriented wire format: 64-byte cache-aligned header (magic `0x54494C44`, version, event count, first sequence number, batch timestamp, payload length, BLAKE3 checksum) followed by tightly packed 21-byte event records (entity_id u64 LE, signal_type u8, weight f32 LE, timestamp u64 LE)
- [x] BLAKE3 hash covers `header[0..32] || all_event_bytes`, so corrupted batches are detected at recovery
- [x] Group commit: dedicated writer thread fed by `crossbeam::channel::bounded(10_000)` and polled with `recv_deadline`; a batch closes at 100 events or after a 10 ms timeout, whichever comes first; one fsync per batch
- [x] Segment files: 16 MB rotation, named `wal-{first_seq:020}.seg`; `list_segments()` returns an ordered list
- [x] Two-phase crash recovery: phase 1 verifies magic and payload bounds; phase 2 verifies BLAKE3; the log is truncated at the first invalid batch boundary
- [x] `WalHandle::open()` returns `(handle, replayed_events)`: the caller gets the events since the last checkpoint for signal materializer replay
- [x] Sequence numbers are monotonically increasing u64 values, starting at 1; they persist across close/reopen
- [x] Deduplication via double-buffered `HashSet<u128>` (first 128 bits of the per-event BLAKE3 hash); 30-second rotation window; a duplicate returns `Ok(0)`
- [x] `WalHandle::checkpoint(seq)` writes `checkpoint.meta` atomically with the last-materialized sequence number and timestamp
- [x] `WalHandle::truncate_before(seq)` dispatches to the writer thread (no race with segment writes); deletes segments whose last sequence number is < `seq`
- [x] `WalHandle::shutdown()` flushes remaining events, fsyncs, and joins the writer thread
- [x] `WalHandle` implements `Drop` for best-effort shutdown
- [x] `#![forbid(unsafe_code)]`: entirely safe Rust; any `unsafe` in `crossbeam` lives in the dependency, not in the WAL code
- [x] `cargo fmt` clean, `cargo clippy -- -D warnings` clean
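
To make the 21-byte record layout and the segment naming rule from the criteria above concrete, here is a small sketch. The field order and widths follow the first criterion (8 + 1 + 4 + 8 bytes, little-endian); the struct shape, method names, and `segment_file_name` are assumptions made for illustration and may differ from the real `format.rs` and `segment.rs`.

```rust
/// Size of one packed event record: 8 + 1 + 4 + 8 bytes.
pub const EVENT_RECORD_LEN: usize = 21;

/// Illustrative record type; the field layout is taken from the criteria,
/// the Rust shape is an assumption.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct EventRecord {
    pub entity_id: u64,
    pub signal_type: u8,
    pub weight: f32,
    pub timestamp: u64,
}

impl EventRecord {
    /// Packs the record with no padding, all multi-byte fields little-endian.
    pub fn encode(&self) -> [u8; EVENT_RECORD_LEN] {
        let mut buf = [0u8; EVENT_RECORD_LEN];
        buf[0..8].copy_from_slice(&self.entity_id.to_le_bytes());
        buf[8] = self.signal_type;
        buf[9..13].copy_from_slice(&self.weight.to_le_bytes());
        buf[13..21].copy_from_slice(&self.timestamp.to_le_bytes());
        buf
    }

    /// Inverse of `encode`.
    pub fn decode(buf: &[u8; EVENT_RECORD_LEN]) -> Self {
        let mut entity_id = [0u8; 8];
        let mut weight = [0u8; 4];
        let mut timestamp = [0u8; 8];
        entity_id.copy_from_slice(&buf[0..8]);
        weight.copy_from_slice(&buf[9..13]);
        timestamp.copy_from_slice(&buf[13..21]);
        EventRecord {
            entity_id: u64::from_le_bytes(entity_id),
            signal_type: buf[8],
            weight: f32::from_le_bytes(weight),
            timestamp: u64::from_le_bytes(timestamp),
        }
    }
}

/// Segment file name for a given first sequence number (hypothetical helper).
/// Zero-padding to 20 digits covers the full u64 range, so sorting the names
/// as strings also sorts them by sequence number.
pub fn segment_file_name(first_seq: u64) -> String {
    format!("wal-{:020}.seg", first_seq)
}
```

The fixed-width name is what lets a plain directory listing, sorted lexicographically, serve as the ordered segment list.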

## Dependencies

- **Requires:** m1p1 (types: `EntityId`, `Timestamp` encoding patterns); the WAL uses u64 entity IDs and nanosecond timestamps directly
- **Blocks:** m1p4 (Signal Ledger: WAL replay feeds the materializer; `WalHandle` is `SignalLedger`'s durability backend)

## Research References

- [docs/research/tidaldb_wal.md](../../../research/tidaldb_wal.md): batch-oriented format (Section 1, Approach 3), group commit with crossbeam (Section 3, Pattern 4), BLAKE3 + length-prefix crash detection (Section 4, Approach 3), segment rotation (Section 5), bounded sliding window dedup (Section 6, Approach 3), full implementation blueprint
- [thoughts.md](../../../../thoughts.md): Part II.1 (WAL convergence), Part V.5 (quarantine-first), Part V.6 (group commit)

## Spec References

- [CODING_GUIDELINES.md](../../../../CODING_GUIDELINES.md): Section 7 (error handling), Section 10 (dependency policy for crossbeam)

## Task Index

| # | Task | Delivers | Depends On | Complexity | Status |
|---|------|----------|------------|------------|--------|
| 01 | WAL Wire Format and Segment Files | `BatchHeader`, `EventRecord`, `SegmentWriter`, `WalError` | None | M | COMPLETE |
| 02 | Group Commit Writer | `WriterConfig`, `WalCommand`, `run_writer` loop | Task 01 | M | COMPLETE |
| 03 | Crash Recovery and Replay | `WalReader`, `recover()`, partial-write truncation | Task 01 | M | COMPLETE |
| 04 | Deduplication, Checkpoint, and Public API | `DedupWindow`, `CheckpointManager`, `WalHandle`, `SignalEvent` | Task 02, Task 03 | M | COMPLETE |
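
As a rough picture of the double-buffered `DedupWindow` that Task 04 delivers, the sketch below keeps two `HashSet<u128>` generations and rotates them lazily. It is an assumption-laden illustration: the real type derives its keys from BLAKE3 and reports duplicates as `Ok(0)`, whereas this version takes a precomputed `u128` key and an explicit `now`, and returns a `bool`. All method names here are hypothetical.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Two-generation dedup set (illustrative; not the project's implementation).
pub struct DedupWindow {
    current: HashSet<u128>,
    previous: HashSet<u128>,
    rotation: Duration,
    last_rotation: Instant,
}

impl DedupWindow {
    pub fn new(rotation: Duration, now: Instant) -> Self {
        DedupWindow {
            current: HashSet::new(),
            previous: HashSet::new(),
            rotation,
            last_rotation: now,
        }
    }

    /// Returns `true` if `key` is new, `false` if it was seen in either
    /// generation. New keys are recorded in the current generation.
    pub fn insert(&mut self, key: u128, now: Instant) -> bool {
        self.rotate_if_due(now);
        if self.previous.contains(&key) {
            return false;
        }
        self.current.insert(key)
    }

    /// Lazily ages the generations based on the time since the last rotation.
    fn rotate_if_due(&mut self, now: Instant) {
        let elapsed = now.saturating_duration_since(self.last_rotation);
        if elapsed < self.rotation {
            return;
        }
        if elapsed >= self.rotation * 2 {
            // Idle for two full periods: every remembered key has expired.
            self.previous.clear();
            self.current.clear();
        } else {
            // Current becomes previous; the old previous generation is dropped.
            self.previous = std::mem::take(&mut self.current);
        }
        self.last_rotation = now;
    }
}
```

With this scheme a key is remembered for at least one rotation period, and memory stays bounded because a whole generation is discarded at once instead of tracking a timestamp per key.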

## Task Dependency DAG

```
Task 01: Wire Format + Segment Files
         |
         +-------------------------------+
         |                               |
         v                               v
Task 02: Group Commit Writer    Task 03: Crash Recovery + Replay
         |                               |
         +---------------+---------------+
                         |
                         v
Task 04: Dedup + Checkpoint + WalHandle (Public API)
```

Tasks 02 and 03 are parallelizable: both depend only on Task 01's types.

## File Layout

```
tidal/src/
  wal/
    mod.rs          -- Task 04: WalHandle, WalConfig, SignalEvent (public API)
    format.rs       -- Task 01: BatchHeader, EventRecord encode/decode
    segment.rs      -- Task 01: SegmentWriter, list_segments
    error.rs        -- Task 01: WalError enum
    writer.rs       -- Task 02: WalCommand, WriterConfig, run_writer
    reader.rs       -- Task 03: WalReader, RecoveryResult, recover()
    dedup.rs        -- Task 04: DedupWindow
    checkpoint.rs   -- Task 04: CheckpointManager
  lib.rs            -- pub mod wal (added)
```

## Open Questions (Resolved)

1. **oneshot channels.** Resolved: used `crossbeam::channel::bounded(1)` per append as the reply channel. Zero additional dependencies.

2. **Segment pre-allocation.** Resolved: not implemented in m1p2. Deferred until disk write performance becomes a measured bottleneck.

3. **WAL compression.** Resolved: deferred. At 10K events/sec and 21 bytes per event record, the write rate (~210 KB/sec before batch headers) is nowhere near a disk bandwidth constraint.

4. **Multi-batch fsync.** Resolved: single fsync per batch (as designed). The 10 ms timeout at low write rates makes multi-batch accumulation unnecessary.

5. **Interaction with fjall WAL.** Resolved: the two WALs are independent. tidalDB's signal WAL sits in `{dir}/wal/`; fjall's internal journal sits in the fjall keyspace directory. Recovery order: signal WAL replay → signal state reconstruction → fjall entity store (no cross-dependency in crash recovery).
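
The reply-channel pattern from the first resolved question can be sketched as follows. This is a stand-in, not the project's code: it uses `std::sync::mpsc::sync_channel(1)` in place of `crossbeam::channel::bounded(1)`, and `Append`, `writer_loop`, and `append` are hypothetical names. Each caller creates a one-slot channel, ships the sending half with its command, and blocks on the receiving half until the writer answers with the assigned sequence number.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical command: the payload plus a one-slot reply channel.
pub struct Append {
    pub payload: Vec<u8>,
    pub reply: SyncSender<u64>,
}

/// Writer side: assigns sequence numbers starting at 1 and answers each caller.
/// Runs until every command sender has been dropped.
pub fn writer_loop(rx: Receiver<Append>) {
    let mut next_seq = 1u64;
    for cmd in rx {
        // A real writer would append `cmd.payload` to the segment and fsync
        // before replying; this sketch only hands out the sequence number.
        let _ = cmd.reply.send(next_seq); // caller may have stopped waiting
        next_seq += 1;
    }
}

/// Caller side: sends one command and waits for its sequence number.
/// Returns `None` if the writer has shut down.
pub fn append(tx: &SyncSender<Append>, payload: Vec<u8>) -> Option<u64> {
    let (reply_tx, reply_rx) = sync_channel(1);
    tx.send(Append { payload, reply: reply_tx }).ok()?;
    reply_rx.recv().ok()
}
```

Because the reply channel has capacity 1 and receives exactly one message, the writer's `send` never blocks, even if the caller is slow to read the answer.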