jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer
Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00


Milestone 1, Phase 2: Write-Ahead Log

Status: COMPLETE

Phase Deliverable

A durable, append-only signal event log. Every signal write (view, like, skip, completion) is appended to the WAL before any aggregation occurs. Signal aggregates, decay scores, and windowed counts are derived state; the WAL is the source of truth. Group commit amortizes the cost of each fsync across concurrent writers. Every event is content-addressed by a per-event BLAKE3 hash, which drives deduplication. Crash recovery scans forward from the last checkpoint and truncates any corrupted tail.
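The group-commit idea above can be sketched with only the standard library. This is a minimal illustration under stated assumptions, not the tidalDB writer. `std::sync::mpsc::sync_channel` and `recv_timeout` stand in for the crossbeam `bounded` channel and `recv_deadline` named in the acceptance criteria. Events are reduced to a `u64` placeholder. `Append` and the `flush` callback, which stands in for "write batch, then fsync", are hypothetical. `run_writer` borrows the name from Task 02, but its signature here is an assumption. The 100-event and 10 ms limits come from the criteria, and each caller receives its sequence number only after its batch has been flushed.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError, SyncSender};
use std::time::{Duration, Instant};

/// Hypothetical append request: a placeholder event plus a one-shot reply
/// channel (the plan uses crossbeam bounded(1); std sync_channel(1) here).
pub struct Append {
    pub event: u64,              // stand-in for an encoded 21-byte event record
    pub reply: SyncSender<u64>,  // receives the assigned sequence number
}

pub const MAX_BATCH: usize = 100;
pub const MAX_WAIT: Duration = Duration::from_millis(10);

/// Hypothetical writer loop: gathers appends into batches of at most MAX_BATCH,
/// waiting at most MAX_WAIT after the first event of each batch.
/// `flush` stands in for "write the batch to the segment, then fsync once".
pub fn run_writer<F: FnMut(&[u64])>(rx: Receiver<Append>, mut flush: F) {
    let mut next_seq: u64 = 1; // sequence numbers start at 1
    // Block for the first event of a batch; exit once every sender is dropped.
    while let Ok(first) = rx.recv() {
        let deadline = Instant::now() + MAX_WAIT;
        let mut batch = vec![first];
        while batch.len() < MAX_BATCH {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match rx.recv_timeout(remaining) {
                Ok(req) => batch.push(req),
                // Deadline hit, or channel closed and drained: flush what we have.
                Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
            }
        }
        let events: Vec<u64> = batch.iter().map(|a| a.event).collect();
        flush(&events); // one fsync shared by every append in the batch
        // Acknowledge only after the batch is durable.
        for req in batch {
            let _ = req.reply.send(next_seq); // caller may have gone away
            next_seq += 1;
        }
    }
}
```

Because every caller in a batch waits on the same flush, up to 100 appends share one fsync. At low write rates the deadline bounds how long a lone event waits before it is made durable.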

Acceptance Criteria

  • Batch-oriented wire format: 64-byte cache-aligned header (magic 0x54494C44, version, event count, first sequence number, batch timestamp, payload length, BLAKE3 checksum) followed by tightly-packed 21-byte event records (entity_id u64 LE, signal_type u8, weight f32 LE, timestamp u64 LE)
  • BLAKE3 hash covers header[0..32] || all_event_bytes — corrupted batches detected at recovery
  • Group commit: dedicated writer thread fed by crossbeam::channel::bounded(10_000) and drained with recv_deadline; a batch is flushed when it reaches 100 events or after a 10ms timeout, whichever comes first; one fsync per batch
  • Segment files: 16 MB rotation, named wal-{first_seq:020}.seg; list_segments() returns ordered list
  • Two-phase crash recovery: Phase 1 — verify magic and payload bounds; Phase 2 — verify BLAKE3; truncate at first invalid batch boundary
  • WalHandle::open() returns (handle, replayed_events) — caller gets events since last checkpoint for signal materializer replay
  • Sequence numbers are monotonically increasing u64, starting at 1; persist across close/reopen
  • Deduplication via double-buffered HashSet<u128> (first 128 bits of per-event BLAKE3); 30-second rotation window; appending a duplicate returns Ok(0)
  • WalHandle::checkpoint(seq) writes checkpoint.meta atomically with last-materialized sequence number and timestamp
  • WalHandle::truncate_before(seq) is dispatched to the writer thread, so it cannot race with segment writes; deletes segments whose last sequence number is < seq
  • WalHandle::shutdown() flushes remaining events, fsyncs, and joins writer thread
  • WalHandle implements Drop for best-effort shutdown
  • #![forbid(unsafe_code)]: entirely safe Rust; crossbeam's internal unsafe code stays in the dependency, not in the WAL code
  • cargo fmt clean, cargo clippy -D warnings clean
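The 21-byte event record is simple enough to show in full. The sketch below encodes and decodes it with the standard library's `to_le_bytes`/`from_le_bytes`, following the field order and widths listed above (8 + 1 + 4 + 8 bytes, no padding). The type name `EventRecord` matches Task 01, but the method names `encode_into`, `decode`, and `decode_payload` are illustrative assumptions. The 64-byte header and the BLAKE3 check are left out.

```rust
use std::convert::TryInto;

/// One signal event as laid out on disk: 8 + 1 + 4 + 8 = 21 bytes, no padding.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct EventRecord {
    pub entity_id: u64,
    pub signal_type: u8,
    pub weight: f32,
    pub timestamp: u64, // nanoseconds
}

impl EventRecord {
    pub const SIZE: usize = 21;

    /// Append this record's little-endian encoding to `buf`.
    pub fn encode_into(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.entity_id.to_le_bytes()); // bytes 0..8
        buf.push(self.signal_type);                           // byte 8
        buf.extend_from_slice(&self.weight.to_le_bytes());    // bytes 9..13
        buf.extend_from_slice(&self.timestamp.to_le_bytes()); // bytes 13..21
    }

    /// Decode one record from the first 21 bytes of `bytes`; None if too short.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        let b = bytes.get(..Self::SIZE)?;
        Some(EventRecord {
            entity_id: u64::from_le_bytes(b[0..8].try_into().ok()?),
            signal_type: b[8],
            weight: f32::from_le_bytes(b[9..13].try_into().ok()?),
            timestamp: u64::from_le_bytes(b[13..21].try_into().ok()?),
        })
    }
}

/// Decode a tightly packed payload whose `event_count` comes from the batch header.
/// A length mismatch is treated as corruption and yields None.
pub fn decode_payload(payload: &[u8], event_count: usize) -> Option<Vec<EventRecord>> {
    if event_count.checked_mul(EventRecord::SIZE) != Some(payload.len()) {
        return None;
    }
    payload
        .chunks_exact(EventRecord::SIZE)
        .map(EventRecord::decode)
        .collect()
}
```

Checking that the payload length equals `event_count * 21` before decoding is one plausible form of the payload-bounds check that recovery performs before it verifies BLAKE3.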
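The double-buffered dedup window can be sketched as two `HashSet<u128>` generations. Lookups check both generations, inserts go into the current one, and each rotation discards the older one. In this sketch the caller passes the 128-bit key and an explicit `now`, which keeps the example deterministic. In tidalDB the key would be the first 128 bits of the event's BLAKE3 hash, which is not computed here. The method name `check_and_insert` is an assumption, and clearing both generations after a long idle gap is an added detail the criteria do not specify.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Two-generation dedup set; `period` would be 30 s for the WAL.
pub struct DedupWindow {
    current: HashSet<u128>,
    previous: HashSet<u128>,
    rotated_at: Instant,
    period: Duration,
}

impl DedupWindow {
    pub fn new(period: Duration, now: Instant) -> Self {
        DedupWindow { current: HashSet::new(), previous: HashSet::new(), rotated_at: now, period }
    }

    /// Returns true if `key` is new (and records it), false if it is a duplicate.
    pub fn check_and_insert(&mut self, key: u128, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.rotated_at);
        if elapsed >= self.period * 2 {
            // Idle for two or more periods: every stored key is at least one period old.
            self.previous.clear();
            self.current.clear();
            self.rotated_at = now;
        } else if elapsed >= self.period {
            // Normal rotation: the current generation becomes the previous one.
            self.previous = std::mem::take(&mut self.current);
            self.rotated_at = now;
        }
        if self.previous.contains(&key) {
            return false;
        }
        // HashSet::insert returns false when the key was already present.
        self.current.insert(key)
    }
}
```

With this scheme a key is remembered for at least one rotation period. A `false` result is the case in which an append would return `Ok(0)`.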
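`checkpoint.meta` must never be observed half-written. A common way to guarantee that with only `std::fs` is to write a temporary file, call `sync_all`, then `rename` it over the old file, followed on Unix by an fsync of the directory so the rename itself survives a crash. The sketch assumes a 16-byte file holding the sequence number and then the timestamp, both u64 little-endian. The real file format and the `CheckpointManager` API are not given in this plan, so `write_checkpoint`, `read_checkpoint`, and the layout are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical layout: last materialized seq (u64 LE) + timestamp in ns (u64 LE).
pub fn write_checkpoint(dir: &Path, seq: u64, timestamp_ns: u64) -> io::Result<()> {
    let tmp = dir.join("checkpoint.meta.tmp");
    let dst = dir.join("checkpoint.meta");
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&seq.to_le_bytes());
    buf[8..].copy_from_slice(&timestamp_ns.to_le_bytes());
    {
        let mut f = File::create(&tmp)?;
        f.write_all(&buf)?;
        f.sync_all()?; // contents are durable before the rename exposes them
    }
    fs::rename(&tmp, &dst)?; // atomic replace on POSIX filesystems
    if cfg!(unix) {
        File::open(dir)?.sync_all()?; // persist the directory entry for the rename
    }
    Ok(())
}

/// Ok(None) means no checkpoint has been written yet.
pub fn read_checkpoint(dir: &Path) -> io::Result<Option<(u64, u64)>> {
    match fs::read(dir.join("checkpoint.meta")) {
        Ok(bytes) if bytes.len() == 16 => {
            let mut seq = [0u8; 8];
            let mut ts = [0u8; 8];
            seq.copy_from_slice(&bytes[..8]);
            ts.copy_from_slice(&bytes[8..]);
            Ok(Some((u64::from_le_bytes(seq), u64::from_le_bytes(ts))))
        }
        Ok(_) => Err(io::Error::new(io::ErrorKind::InvalidData, "bad checkpoint size")),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

If `read_checkpoint` returns `Ok(None)`, replay would presumably begin at the earliest segment.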

Dependencies

  • Requires: m1p1 (types: EntityId, Timestamp encoding patterns) — WAL uses u64 entity IDs and nanosecond timestamps directly
  • Blocks: m1p4 (Signal Ledger — WAL replay feeds the materializer; WalHandle is SignalLedger's durability backend)

Research References

  • docs/research/tidaldb_wal.md — batch-oriented format (Section 1, Approach 3), group commit with crossbeam (Section 3, Pattern 4), BLAKE3 + length-prefix crash detection (Section 4, Approach 3), segment rotation (Section 5), bounded sliding window dedup (Section 6, Approach 3), full implementation blueprint
  • thoughts.md — Part II.1 (WAL convergence), Part V.5 (quarantine-first), Part V.6 (group commit)

Spec References

Task Index

| # | Task | Delivers | Depends On | Complexity | Status |
|---|------|----------|------------|------------|--------|
| 01 | WAL Wire Format and Segment Files | BatchHeader, EventRecord, SegmentWriter, WalError | None | M | COMPLETE |
| 02 | Group Commit Writer | WriterConfig, WalCommand, run_writer loop | Task 01 | M | COMPLETE |
| 03 | Crash Recovery and Replay | WalReader, recover(), partial-write truncation | Task 01 | M | COMPLETE |
| 04 | Deduplication, Checkpoint, and Public API | DedupWindow, CheckpointManager, WalHandle, SignalEvent | Task 02, Task 03 | M | COMPLETE |

Task Dependency DAG

Task 01: Wire Format + Segment Files
    |
    +-------------------------------+
    |                               |
    v                               v
Task 02: Group Commit Writer    Task 03: Crash Recovery + Replay
    |                               |
    +---------------+---------------+
                    |
                    v
    Task 04: Dedup + Checkpoint + WalHandle (Public API)

Tasks 02 and 03 are parallelizable — both depend only on Task 01's types.

File Layout

tidal/src/
  wal/
    mod.rs          -- Task 04: WalHandle, WalConfig, SignalEvent (public API)
    format.rs       -- Task 01: BatchHeader, EventRecord encode/decode
    segment.rs      -- Task 01: SegmentWriter, list_segments
    error.rs        -- Task 01: WalError enum
    writer.rs       -- Task 02: WalCommand, WriterConfig, run_writer
    reader.rs       -- Task 03: WalReader, RecoveryResult, recover()
    dedup.rs        -- Task 04: DedupWindow
    checkpoint.rs   -- Task 04: CheckpointManager
  lib.rs            -- pub mod wal (added)
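`segment.rs` is also where `list_segments` lives. Segment names zero-pad `first_seq` to 20 digits (`wal-{first_seq:020}.seg`), which is enough for any `u64`, so plain lexicographic order already matches numeric order. The sketch below still parses the number and sorts on it, which also skips unrelated files that might share the directory. Only the name pattern comes from this plan. The function signatures are assumptions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File name for a segment whose first event has sequence number `first_seq`.
pub fn segment_name(first_seq: u64) -> String {
    format!("wal-{:020}.seg", first_seq)
}

/// Parse `wal-<20 digits>.seg` back into its first sequence number.
pub fn parse_segment_name(name: &str) -> Option<u64> {
    let digits = name.strip_prefix("wal-")?.strip_suffix(".seg")?;
    if digits.len() != 20 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    digits.parse().ok() // None if the 20 digits exceed u64::MAX
}

/// List segment files in `dir`, ordered by first sequence number.
pub fn list_segments(dir: &Path) -> io::Result<Vec<(u64, PathBuf)>> {
    let mut segs = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if let Some(seq) = entry.file_name().to_str().and_then(parse_segment_name) {
            segs.push((seq, entry.path()));
        }
    }
    segs.sort_by_key(|(seq, _)| *seq);
    Ok(segs)
}
```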

Open Questions (Resolved)

  1. oneshot channels — Resolved: used crossbeam::channel::bounded(1) per-append as the reply channel. Zero additional dependencies.

  2. Segment pre-allocation — Resolved: not implemented in m1p2. Deferred until disk write performance becomes a measured bottleneck.

  3. WAL compression — Resolved: deferred. At 10K events/sec the write rate (10,000 × 21-byte records ≈ 210 KB/sec) is far below any disk bandwidth limit.

  4. Multi-batch fsync — Resolved: single fsync per batch (as designed). At low write rates the 10ms timeout already keeps fsyncs to roughly 100/sec or fewer, so accumulating several batches per fsync is unnecessary.

  5. Interaction with fjall WAL — Resolved: the two WALs are independent. tidalDB's signal WAL sits in {dir}/wal/; fjall's internal journal sits in the fjall keyspace directory. Recovery order: signal WAL replay → signal state reconstruction → fjall entity store (no cross-dependency in crash recovery).
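The bandwidth estimate in item 3 and the fsync rate behind item 4 can be worked out from the wire format. The calculation assumes a steady 10K events/sec and full 100-event batches. At that rate 100 events arrive in exactly 10 ms, so both flush triggers fire together:

```latex
\begin{aligned}
\text{event bytes/sec}  &= 10{,}000 \times 21\ \text{B} = 210{,}000\ \text{B/sec} \approx 210\ \text{KB/sec} \\
\text{batches/sec}      &= 10{,}000 \div 100 = 100 \\
\text{header bytes/sec} &= 100 \times 64\ \text{B} = 6{,}400\ \text{B/sec} \\
\text{total}            &= 210{,}000 + 6{,}400 = 216{,}400\ \text{B/sec} \approx 216\ \text{KB/sec} \\
\text{fsyncs/sec}       &= \text{batches/sec} = 100
\end{aligned}
```

Header overhead is therefore about 3% of the event bytes. Because the design uses one fsync per batch, this rate works out to roughly 100 fsyncs/sec.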