tidaldb/docs/planning/milestone-1/phase-2/task-04-deduplication-and-checkpoint.md
jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer
Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00


Task 04: Deduplication, Checkpoint, and WalHandle Public API

Context

Milestone: 1 -- Signal Engine
Phase: m1p2 -- Write-Ahead Log
Status: COMPLETE
Depends On: Task 02 (writer channel types), Task 03 (recover())
Blocks: m1p4 (Signal Ledger uses WalHandle as its durability backend)
Complexity: M

Objective

Deliver three components that complete the WAL:

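  1. DedupWindow — a double-buffered HashSet<u128> that detects duplicate signal events within a sliding window of up to 60 seconds. Each event is keyed by the first 128 bits of its BLAKE3 hash. It has no false positives in practice, because a distinct event is flagged only on a 128-bit hash collision. Memory is bounded.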

  2. CheckpointManager — reads and writes checkpoint.meta, the small 16-byte binary file that records the last-materialized sequence number. Recovery uses it to skip events that are already materialized.

  3. WalHandle — the public API: open(), append(), checkpoint(), truncate_before(), shutdown(). The entry point for m1p4 (Signal Ledger) and m1p5 (Entity CRUD API).
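The false-positive claim can be checked roughly, assuming BLAKE3 output behaves like a uniformly random function. A new event is wrongly flagged only if its 128-bit key equals one of the n keys currently held. A union bound therefore gives the per-check probability below. Here n is taken at a load of 100K events/sec with two full 30-second buffers, which is an illustrative worst case and not a measured figure.

```latex
P_{\mathrm{fp}} \le \frac{n}{2^{128}}, \qquad
n = 2 \times 30\,\mathrm{s} \times 10^{5}\,\mathrm{events/s} = 6 \times 10^{6}
\;\Rightarrow\;
P_{\mathrm{fp}} \le \frac{6 \times 10^{6}}{3.4 \times 10^{38}} \approx 1.8 \times 10^{-32}
```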

Requirements

DedupWindow

  • Two HashSet<u128> buffers, alternating every window_duration (default 30s)
  • Effective dedup coverage: at least one window_duration and up to about two (30-60 seconds by default), depending on where in the current window an event arrives
  • Hash key: first 16 bytes (128 bits) of blake3::hash(event_bytes) interpreted as u128 little-endian
  • check_and_insert(event_bytes: &[u8]) -> bool — returns true if duplicate
  • populate_from_events(events: Vec<EventRecord>) — bulk-insert on startup from replayed events
  • maybe_rotate() — called on each check_and_insert. When rotation_time.elapsed() > window_duration, it swaps the buffers and clears the new current (the former previous), which discards events from two windows ago
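A deterministic clock makes the rotation rules easier to see than the sleep-based test does. The sketch below is a hypothetical test double and not part of the spec. It copies the double-buffer logic, but two things differ from the real type:

- It takes an explicit `now_secs` argument instead of reading `Instant`.
- It takes a precomputed `u128` key instead of hashing bytes with BLAKE3.

It shows that an event is remembered for at least one window and up to about two, depending on where in the window it arrives.

```rust
use std::collections::HashSet;

/// Hypothetical test double: same rotation rules as DedupWindow, but the
/// clock is passed in (seconds) and keys are supplied pre-hashed.
struct ClockedDedup {
    current: HashSet<u128>,
    previous: HashSet<u128>,
    rotated_at: u64,
    window_secs: u64,
}

impl ClockedDedup {
    fn new(window_secs: u64, now_secs: u64) -> Self {
        Self {
            current: HashSet::new(),
            previous: HashSet::new(),
            rotated_at: now_secs,
            window_secs,
        }
    }

    /// Returns true if `key` is in the current or previous buffer.
    fn check_and_insert(&mut self, key: u128, now_secs: u64) -> bool {
        // Lazy rotation, mirroring maybe_rotate(): after the swap, `current`
        // holds the former previous buffer, which is then cleared.
        if now_secs - self.rotated_at > self.window_secs {
            std::mem::swap(&mut self.current, &mut self.previous);
            self.current.clear();
            self.rotated_at = now_secs;
        }
        if self.current.contains(&key) || self.previous.contains(&key) {
            return true;
        }
        self.current.insert(key);
        false
    }
}
```

Take a 30-second window with two keys:

- Key 1 arrives at t=0, just after a rotation. It is still caught at t=61.
- Key 2 arrives at t=30, at the end of the same window. It is caught only from t=30 to t=61.

The rotation at t=62 forgets both. Because rotation is lazy, sparse traffic can stretch retention further. That only lengthens coverage; it never shortens it below one window.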

CheckpointManager

  • checkpoint.meta is a simple binary file: [sequence: u64 LE][timestamp_nanos: u64 LE] (16 bytes)
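  • CheckpointManager::write(dir, seq, timestamp_nanos) — writes atomically:
    1. Write to a temp file and fsync it.
    2. Rename it over checkpoint.meta.
    3. Fsync the directory so the rename itself survives a crash.
  • CheckpointManager::read(dir) -> Result<Option<(u64, u64)>, WalError>: returns None if the file does not exist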
  • File corruption (wrong size) returns WalError::Corruption

WalHandle

  • WalHandle::open(config: WalConfig) -> Result<(Self, Vec<SignalEvent>), WalError>
    • Creates {config.dir}/wal/ if absent
    • Calls recover(), initializes DedupWindow from replayed events
    • Finds or creates current segment
    • Spawns writer thread via std::thread::Builder::new().name("tidaldb-wal-writer")
    • Returns (handle, replayed_events) — replayed events are for m1p4 to feed into the signal materializer
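  • WalHandle::append(event: SignalEvent) -> Result<u64, WalError> — blocks until the event is durably committed. It returns the assigned sequence number, or 0 if the event was deduplicated, so real sequence numbers start at 1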
  • WalHandle::checkpoint(seq: u64) -> Result<(), WalError> — writes checkpoint.meta directly (no writer thread round-trip)
  • WalHandle::truncate_before(seq: u64) -> Result<(), WalError> — dispatches WalCommand::TruncateBefore to writer thread
  • WalHandle::shutdown(self) -> Result<(), WalError> — sends WalCommand::Shutdown, joins writer thread
  • impl Drop for WalHandle — best-effort shutdown if not already shut down (ignores errors)
  • WalHandle: Send + Sync — the Sender<WalCommand> and the writer thread's JoinHandle it holds are both Send + Sync
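The handle/writer-thread split can be sketched with the standard library alone. This is a simplified sketch under several assumptions, all of them hypothetical stand-ins:

- `std::sync::mpsc` replaces the crossbeam channel types from Task 02.
- `WalCommand`, `SignalEvent` and the one-variant `WalError` are placeholders for the real types.
- `open()` takes no config and does no recovery.
- The writer only assigns sequence numbers, skipping dedup, encoding and fsync.
- `checkpoint()` is omitted because it bypasses the writer thread anyway.

Note that `std::sync::mpsc::Sender` is `Sync` only from Rust 1.72 onward.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the real SignalEvent.
pub type SignalEvent = Vec<u8>;

/// Hypothetical, reduced error type.
#[derive(Debug)]
pub enum WalError {
    WriterGone,
}

/// Hypothetical command set; each request carries its own reply channel.
enum WalCommand {
    Append { event: SignalEvent, reply: Sender<Result<u64, WalError>> },
    TruncateBefore { seq: u64, reply: Sender<Result<(), WalError>> },
    Shutdown,
}

pub struct WalHandle {
    tx: Sender<WalCommand>,
    writer: Option<JoinHandle<()>>, // None once shut down
}

impl WalHandle {
    /// Simplified open(): spawns the named writer thread, no recovery.
    pub fn open() -> Result<Self, WalError> {
        let (tx, rx) = mpsc::channel::<WalCommand>();
        let writer = thread::Builder::new()
            .name("tidaldb-wal-writer".to_string())
            .spawn(move || {
                let mut next_seq: u64 = 1; // 0 is reserved for "deduplicated"
                for cmd in rx {
                    match cmd {
                        WalCommand::Append { event: _, reply } => {
                            // Real writer: dedup, encode, group-commit, fsync, then reply.
                            let _ = reply.send(Ok(next_seq));
                            next_seq += 1;
                        }
                        WalCommand::TruncateBefore { seq: _, reply } => {
                            // Real writer: delete segments wholly below `seq`.
                            let _ = reply.send(Ok(()));
                        }
                        WalCommand::Shutdown => break,
                    }
                }
            })
            .map_err(|_| WalError::WriterGone)?;
        Ok(WalHandle { tx, writer: Some(writer) })
    }

    /// Blocks until the writer thread replies.
    pub fn append(&self, event: SignalEvent) -> Result<u64, WalError> {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(WalCommand::Append { event, reply })
            .map_err(|_| WalError::WriterGone)?;
        rx.recv().map_err(|_| WalError::WriterGone)?
    }

    /// Runs on the writer thread, so it cannot race with in-flight appends.
    pub fn truncate_before(&self, seq: u64) -> Result<(), WalError> {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(WalCommand::TruncateBefore { seq, reply })
            .map_err(|_| WalError::WriterGone)?;
        rx.recv().map_err(|_| WalError::WriterGone)?
    }

    pub fn shutdown(mut self) -> Result<(), WalError> {
        self.stop()
    }

    /// Sends Shutdown and joins the writer; a no-op if already stopped.
    fn stop(&mut self) -> Result<(), WalError> {
        if let Some(writer) = self.writer.take() {
            let _ = self.tx.send(WalCommand::Shutdown); // writer may already be gone
            writer.join().map_err(|_| WalError::WriterGone)?;
        }
        Ok(())
    }
}

impl Drop for WalHandle {
    fn drop(&mut self) {
        let _ = self.stop(); // best-effort; errors are ignored
    }
}
```

Keeping the `JoinHandle` in an `Option` lets `shutdown(self)` and `Drop` share one code path. After an explicit shutdown, the later `Drop` finds `None` and does nothing.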

Technical Design

DedupWindow

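use std::collections::HashSet;
use std::time::{Duration, Instant};

pub struct DedupWindow {
    current: HashSet<u128>,
    previous: HashSet<u128>,
    rotation_time: Instant,
    window: Duration,
}

impl DedupWindow {
    pub fn new(window: Duration) -> Self {
        Self {
            current: HashSet::new(),
            previous: HashSet::new(),
            rotation_time: Instant::now(),
            window,
        }
    }

    /// Returns true if the event was already seen in the current or previous window.
    pub fn check_and_insert(&mut self, event_bytes: &[u8]) -> bool {
        self.maybe_rotate();
        let hash = self.hash(event_bytes);
        if self.current.contains(&hash) || self.previous.contains(&hash) {
            return true; // duplicate
        }
        self.current.insert(hash);
        false
    }

    /// Seeds the window with events replayed from the WAL on startup.
    pub fn populate_from_events(&mut self, events: Vec<EventRecord>) {
        for e in events {
            let bytes = e.encode();
            let hash = self.hash(&bytes);
            self.current.insert(hash);
        }
    }

    /// First 16 bytes of the BLAKE3 digest, read as a little-endian u128.
    fn hash(&self, event_bytes: &[u8]) -> u128 {
        u128::from_le_bytes(
            blake3::hash(event_bytes).as_bytes()[..16].try_into().unwrap()
        )
    }

    fn maybe_rotate(&mut self) {
        if self.rotation_time.elapsed() > self.window {
            // After the swap, `current` holds the old previous buffer;
            // clearing it discards events from two windows ago.
            std::mem::swap(&mut self.current, &mut self.previous);
            self.current.clear();
            self.rotation_time = Instant::now();
        }
    }
}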

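Memory estimates, assuming both buffers are full and each window is 30 s:
  • 10K events/sec: ~300K entries/window × 16 bytes × 2 windows ≈ 9.6 MB raw, or ≈ 19 MB including HashSet overhead
  • 100K events/sec: ~3M entries/window × 16 bytes × 2 windows ≈ 96 MB raw, or ≈ 144 MB including HashSet overhead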
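The ≈19 MB and ≈144 MB figures can be approximated with a small calculator. It assumes the SwissTable layout of hashbrown, which backs std::collections::HashSet. That layout is an internal detail and not a stable guarantee:

- The bucket count is the next power of two of entries × 8/7, for a 7/8 maximum load.
- Each bucket holds a 16-byte u128 plus one control byte.
- Each table carries 16 extra control bytes, the SSE2 group width.

```rust
use std::mem::size_of;

/// Approximate bytes for one HashSet<u128> holding `entries` keys, assuming
/// hashbrown's SwissTable layout (an internal detail, not a std guarantee).
fn table_bytes(entries: usize) -> usize {
    // Buckets: next power of two that keeps the load factor at or below ~7/8.
    let buckets = (entries * 8 / 7).next_power_of_two();
    // Each bucket stores one u128 plus one control byte; the control array
    // also has 16 trailing bytes (the SSE2 group width).
    buckets * (size_of::<u128>() + 1) + 16
}

/// Both buffers full, each holding `events_per_sec * window_secs` keys.
fn dedup_bytes(events_per_sec: usize, window_secs: usize) -> usize {
    2 * table_bytes(events_per_sec * window_secs)
}
```

The two loads come out as follows:

- `dedup_bytes(10_000, 30)` gives 17,825,824 bytes (≈ 17.8 MB).
- `dedup_bytes(100_000, 30)` gives 142,606,368 bytes (≈ 142.6 MB).

Both are in line with the rounded estimates. Peak usage is briefly higher while a set resizes, because the old and new tables coexist during the move.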

CheckpointManager

use std::path::Path;

pub struct CheckpointManager;

impl CheckpointManager {
    pub fn write(dir: &Path, seq: u64, timestamp_nanos: u64) -> Result<(), WalError> {
        // Encode [seq LE][timestamp_nanos LE], write to a temp file, fsync it,
        // rename over checkpoint.meta (atomic on POSIX), then fsync the directory
        todo!()
    }

    pub fn read(dir: &Path) -> Result<Option<(u64, u64)>, WalError> {
        // Returns Ok(None) if checkpoint.meta does not exist
        // Returns WalError::Corruption if the file is not exactly 16 bytes
        todo!()
    }
}
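A fuller std-only sketch of both functions follows. Two details are assumptions for illustration: the temp-file name checkpoint.meta.tmp, and the two-variant WalError with Corruption(String). The crate's real WalError will differ.

The directory fsync after the rename is what makes the new checkpoint durable across a power loss on POSIX systems. It runs only on Unix targets, where a directory can be opened and fsynced like a file.

```rust
use std::fs::{self, File};
use std::io::{self, ErrorKind, Write};
use std::path::Path;

const CHECKPOINT_FILE: &str = "checkpoint.meta";
const CHECKPOINT_TMP: &str = "checkpoint.meta.tmp"; // hypothetical temp-file name

/// Hypothetical stand-in for the crate's WalError.
#[derive(Debug)]
pub enum WalError {
    Io(io::Error),
    Corruption(String),
}

impl From<io::Error> for WalError {
    fn from(e: io::Error) -> Self {
        WalError::Io(e)
    }
}

pub struct CheckpointManager;

impl CheckpointManager {
    pub fn write(dir: &Path, seq: u64, timestamp_nanos: u64) -> Result<(), WalError> {
        // Layout: [sequence: u64 LE][timestamp_nanos: u64 LE]
        let mut buf = [0u8; 16];
        buf[..8].copy_from_slice(&seq.to_le_bytes());
        buf[8..].copy_from_slice(&timestamp_nanos.to_le_bytes());

        let tmp = dir.join(CHECKPOINT_TMP);
        let mut file = File::create(&tmp)?;
        file.write_all(&buf)?;
        file.sync_all()?; // contents are on disk before the file becomes visible

        fs::rename(&tmp, dir.join(CHECKPOINT_FILE))?; // atomic replace on POSIX
        if cfg!(unix) {
            File::open(dir)?.sync_all()?; // persist the directory entry change
        }
        Ok(())
    }

    pub fn read(dir: &Path) -> Result<Option<(u64, u64)>, WalError> {
        let bytes = match fs::read(dir.join(CHECKPOINT_FILE)) {
            Ok(bytes) => bytes,
            Err(e) if e.kind() == ErrorKind::NotFound => return Ok(None),
            Err(e) => return Err(e.into()),
        };
        if bytes.len() != 16 {
            return Err(WalError::Corruption(format!(
                "checkpoint.meta is {} bytes, expected 16",
                bytes.len()
            )));
        }
        let seq = u64::from_le_bytes(bytes[..8].try_into().unwrap());
        let timestamp_nanos = u64::from_le_bytes(bytes[8..].try_into().unwrap());
        Ok(Some((seq, timestamp_nanos)))
    }
}
```

A reader never sees a half-written checkpoint. The rename swaps in a fully written, already-fsynced file, so a crash leaves either the old checkpoint.meta or the new one.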

Test Strategy

DedupWindow Tests

#[test]
fn dedup_detects_duplicate() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    let bytes = [1u8; 21];
    assert!(!window.check_and_insert(&bytes)); // first: not duplicate
    assert!(window.check_and_insert(&bytes));  // second: duplicate
}

#[test]
fn dedup_different_events_not_duplicates() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    assert!(!window.check_and_insert(&[1u8; 21]));
    assert!(!window.check_and_insert(&[2u8; 21]));
}

#[test]
fn dedup_rotation_clears_old_events() {
    let mut window = DedupWindow::new(Duration::from_millis(10));
    let bytes = [1u8; 21];
    window.check_and_insert(&bytes);
    std::thread::sleep(Duration::from_millis(11)); // trigger rotation
    // After one rotation: event is in "previous" -- still caught
    assert!(window.check_and_insert(&bytes));
    std::thread::sleep(Duration::from_millis(11)); // trigger second rotation
    // After two rotations: event has left both windows
    assert!(!window.check_and_insert(&bytes));
}

#[test]
fn dedup_populate_from_events_seeds_correctly() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    let events = vec![EventRecord { entity_id: 1, signal_type: 1, weight: 1.0, timestamp_nanos: 0 }];
    window.populate_from_events(events);
    let bytes = EventRecord { entity_id: 1, signal_type: 1, weight: 1.0, timestamp_nanos: 0 }.encode();
    assert!(window.check_and_insert(&bytes)); // seeded event is detected as duplicate
}

CheckpointManager Tests

#[test]
fn checkpoint_read_returns_none_if_absent() {
    let dir = tempfile::tempdir().unwrap();
    assert!(CheckpointManager::read(dir.path()).unwrap().is_none());
}

#[test]
fn checkpoint_write_then_read_roundtrip() {
    let dir = tempfile::tempdir().unwrap();
    CheckpointManager::write(dir.path(), 42, 1_700_000_000_000_000_000).unwrap();
    let result = CheckpointManager::read(dir.path()).unwrap().unwrap();
    assert_eq!(result.0, 42);
    assert_eq!(result.1, 1_700_000_000_000_000_000);
}

#[test]
fn checkpoint_overwrites_previous() {
    let dir = tempfile::tempdir().unwrap();
    CheckpointManager::write(dir.path(), 10, 0).unwrap();
    CheckpointManager::write(dir.path(), 20, 0).unwrap();
    let (seq, _) = CheckpointManager::read(dir.path()).unwrap().unwrap();
    assert_eq!(seq, 20);
}

WalHandle Integration Tests

#[test]
fn open_creates_wal_directory() { /* ... */ }

#[test]
fn append_returns_sequence_number() { /* ... */ }

#[test]
fn dedup_returns_zero() { /* ... */ }

#[test]
fn checkpoint_writes_file() { /* ... */ }

#[test]
fn close_and_reopen_continues_sequence() { /* ... */ }

#[test]
fn drop_shuts_down_cleanly() {
    // WalHandle drops without explicit shutdown — no panic, no thread leak
    let dir = tempfile::tempdir().unwrap();
    let (handle, _) = WalHandle::open(test_config(dir.path())).unwrap();
    drop(handle); // should not hang or panic
}

Acceptance Criteria

  • DedupWindow::check_and_insert() returns true for duplicates, false for new events
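  • Duplicate detection covers a window of up to ~60 seconds (never less than one 30-second window) via double-buffer rotation
  • Zero false positives in practice: a distinct event is dropped only on a 128-bit key collision, which has negligible probability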
  • DedupWindow::populate_from_events() seeds the window from WAL replay
  • CheckpointManager::write() is atomic (temp file + rename on POSIX)
  • CheckpointManager::read() returns None for a fresh WAL with no checkpoint
  • WalHandle::open() returns (handle, replayed_events) where replayed_events contains all events since last checkpoint
  • WalHandle::append() returns Ok(0) for deduplicated events
  • WalHandle::checkpoint() does not go through the writer thread (no deadlock risk if writer is busy)
  • WalHandle::truncate_before() runs inside the writer thread (no race with active writes)
  • impl Drop for WalHandle provides best-effort shutdown without panicking

Research References

  • docs/research/tidaldb_wal.md — Section 6 (Approach 3: bounded sliding window dedup, DedupWindow implementation, memory analysis), Section 5 (checkpoint.meta format, checkpoint process with atomic write)
  • thoughts.md — Part II.1 (WAL convergence lessons from Engram/Citadel/StemeDB)

Implementation Notes

  • blake3 is a direct dependency of the crate (blake3 = "1" in Cargo.toml). The WAL uses it for entry checksums and dedup keys. It is already in the dependency plan per CODING_GUIDELINES.md.
  • crossbeam is already a transitive dependency via fjall. Adding it as a direct dependency makes the version explicit and allows feature selection.
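  • The checkpoint file format (16 bytes of binary) is simpler than JSON and trivially parsed. It has no explicit version field; version 1 is implied by the fixed 16-byte size. If the format ever needs to evolve, add an explicit version (for example a leading version field, which also changes the file size), so that a newer read() can still accept existing 16-byte files.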
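  • WalHandle does not implement Clone — there is exactly one writer thread. Use Arc<WalHandle> to share it across threads. Because shutdown(self) takes ownership, an Arc-shared handle has two options: call shutdown() once Arc::try_unwrap succeeds, or rely on Drop when the last reference is released.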