tidaldb/docs/planning/milestone-1/phase-2/task-04-deduplication-and-checkpoint.md
jordan 29400d48db feat: implement Milestone 1 phases 1-3 — schema, WAL, and storage layer
Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers,
  half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet
  acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture
overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 16:43:24 -07:00


# Task 04: Deduplication, Checkpoint, and WalHandle Public API
## Context
**Milestone:** 1 -- Signal Engine
**Phase:** m1p2 -- Write-Ahead Log
**Status:** COMPLETE
**Depends On:** Task 02 (writer channel types), Task 03 (`recover()`)
**Blocks:** m1p4 (Signal Ledger uses `WalHandle` as its durability backend)
**Complexity:** M
## Objective
Deliver three components that complete the WAL:
1. **`DedupWindow`** — a double-buffered `HashSet<u128>` that detects duplicate signal events within a ~60-second window using the first 128 bits of each event's BLAKE3 hash. No false positives in practice (two distinct events would have to collide on a 128-bit hash prefix). Bounded memory.
2. **`CheckpointManager`** — reads and writes `checkpoint.meta`, the small 16-byte binary file that records the last-materialized sequence number. Enables recovery to skip already-materialized events.
3. **`WalHandle`** — the public API: `open()`, `append()`, `checkpoint()`, `truncate_before()`, `shutdown()`. The entry point for m1p4 (Signal Ledger) and m1p5 (Entity CRUD API).
## Requirements
### DedupWindow
- Two `HashSet<u128>` buffers, alternating every `window_duration` (default 30s)
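- Effective dedup coverage: up to ~60 seconds (current + previous window). An entry inserted just before a rotation is remembered for roughly one further `window_duration` (~30s), so coverage ranges from one to two windows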
- Hash key: first 16 bytes (128 bits) of `blake3::hash(event_bytes)` interpreted as `u128` little-endian
- `check_and_insert(event_bytes: &[u8]) -> bool` — returns `true` if duplicate
- `populate_from_events(events: Vec<EventRecord>)` — bulk-insert on startup from replayed events
- `maybe_rotate()` — called on each `check_and_insert`; swaps buffers when `rotation_time.elapsed() > window_duration`, then clears the new current (the old previous), so the old current becomes the new previous
### CheckpointManager
- `checkpoint.meta` is a simple binary file: `[sequence: u64 LE][timestamp_nanos: u64 LE]` (16 bytes)
- `CheckpointManager::write(dir, seq, timestamp_nanos)` — writes atomically (write to temp file, fsync, rename)
- `CheckpointManager::read(dir) -> Result<Option<(u64, u64)>, WalError>`, returning `None` if the file does not exist
- File corruption (wrong size) returns `WalError::Corruption`
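
The 16-byte layout above can be sketched as a pair of pure functions. This is a minimal illustration, not the project's implementation: `encode_checkpoint`, `decode_checkpoint`, and `CHECKPOINT_LEN` are hypothetical names, and a real `read()` would presumably map the `None` case to `WalError::Corruption`.

```rust
/// Size of `checkpoint.meta`: two little-endian u64 fields.
pub const CHECKPOINT_LEN: usize = 16;

/// Lays out `[sequence: u64 LE][timestamp_nanos: u64 LE]`.
pub fn encode_checkpoint(seq: u64, timestamp_nanos: u64) -> [u8; CHECKPOINT_LEN] {
    let mut buf = [0u8; CHECKPOINT_LEN];
    buf[..8].copy_from_slice(&seq.to_le_bytes());
    buf[8..].copy_from_slice(&timestamp_nanos.to_le_bytes());
    buf
}

/// Returns `None` when the input is not exactly 16 bytes.
/// (Hypothetical: the caller would treat that as corruption.)
pub fn decode_checkpoint(bytes: &[u8]) -> Option<(u64, u64)> {
    if bytes.len() != CHECKPOINT_LEN {
        return None;
    }
    let mut seq = [0u8; 8];
    let mut ts = [0u8; 8];
    seq.copy_from_slice(&bytes[..8]);
    ts.copy_from_slice(&bytes[8..]);
    Some((u64::from_le_bytes(seq), u64::from_le_bytes(ts)))
}
```

Keeping the codec separate from file I/O lets the size check and byte order be tested without touching the filesystem.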
### WalHandle
- `WalHandle::open(config: WalConfig) -> Result<(Self, Vec<SignalEvent>), WalError>`
  - Creates `{config.dir}/wal/` if absent
  - Calls `recover()`, initializes `DedupWindow` from replayed events
  - Finds or creates current segment
  - Spawns writer thread via `std::thread::Builder::new().name("tidaldb-wal-writer")`
  - Returns `(handle, replayed_events)` — replayed events are for m1p4 to feed into the signal materializer
- `WalHandle::append(event: SignalEvent) -> Result<u64, WalError>` — blocks until durably committed
- `WalHandle::checkpoint(seq: u64) -> Result<(), WalError>` — writes checkpoint.meta directly (no writer thread round-trip)
- `WalHandle::truncate_before(seq: u64) -> Result<(), WalError>` — dispatches `WalCommand::TruncateBefore` to writer thread
- `WalHandle::shutdown(self) -> Result<(), WalError>` — sends `WalCommand::Shutdown`, joins writer thread
- `impl Drop for WalHandle` — best-effort shutdown if not already shut down (ignores errors)
- `WalHandle: Send + Sync` — the `Sender<WalCommand>` is `Send + Sync`
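
The handle-plus-writer-thread shape described above might look roughly like the following. It is a simplified sketch under several assumptions: `SketchHandle` and `SketchCommand` are hypothetical stand-ins, it uses `std::sync::mpsc` rather than whatever channel the real writer uses, it performs no I/O, and it starts sequence numbers at 1 only because this task reserves `0` for deduplicated events.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical command set for the sketch.
enum SketchCommand {
    Append { payload: Vec<u8>, reply: Sender<u64> },
    Shutdown,
}

pub struct SketchHandle {
    tx: Sender<SketchCommand>,
    writer: Option<JoinHandle<()>>,
}

impl SketchHandle {
    pub fn open() -> std::io::Result<Self> {
        let (tx, rx) = mpsc::channel::<SketchCommand>();
        let writer = thread::Builder::new()
            .name("tidaldb-wal-writer".into())
            .spawn(move || {
                let mut next_seq = 1u64; // assumption: 0 is reserved for duplicates
                for cmd in rx {
                    match cmd {
                        SketchCommand::Append { payload, reply } => {
                            // A real writer would batch, write, and fsync here.
                            let _ = payload;
                            let _ = reply.send(next_seq);
                            next_seq += 1;
                        }
                        SketchCommand::Shutdown => break,
                    }
                }
            })?;
        Ok(Self { tx, writer: Some(writer) })
    }

    /// Blocks until the writer thread acknowledges the event.
    pub fn append(&self, payload: Vec<u8>) -> Option<u64> {
        let (reply, ack) = mpsc::channel();
        self.tx.send(SketchCommand::Append { payload, reply }).ok()?;
        ack.recv().ok()
    }

    /// Explicit shutdown: signals the writer and joins it.
    pub fn shutdown(mut self) {
        self.stop();
    }

    /// Idempotent: the second call finds `writer` already taken.
    fn stop(&mut self) {
        if let Some(writer) = self.writer.take() {
            let _ = self.tx.send(SketchCommand::Shutdown);
            let _ = writer.join();
        }
    }
}

impl Drop for SketchHandle {
    /// Best-effort shutdown; errors are ignored.
    fn drop(&mut self) {
        self.stop();
    }
}
```

Holding the `JoinHandle` in an `Option` is what lets `shutdown(self)` and `Drop` share one code path without joining twice.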
## Technical Design
### DedupWindow
```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

pub struct DedupWindow {
    current: HashSet<u128>,
    previous: HashSet<u128>,
    rotation_time: Instant,
    window: Duration,
}

impl DedupWindow {
    pub fn new(window: Duration) -> Self {
        Self {
            current: HashSet::new(),
            previous: HashSet::new(),
            rotation_time: Instant::now(),
            window,
        }
    }

    /// Returns `true` if the event was already seen in either buffer.
    pub fn check_and_insert(&mut self, event_bytes: &[u8]) -> bool {
        self.maybe_rotate();
        let hash = self.hash(event_bytes);
        if self.current.contains(&hash) || self.previous.contains(&hash) {
            return true; // duplicate
        }
        self.current.insert(hash);
        false
    }

    /// Seeds the current buffer from events replayed at startup.
    pub fn populate_from_events(&mut self, events: Vec<EventRecord>) {
        for e in events {
            let bytes = e.encode();
            let hash = self.hash(&bytes);
            self.current.insert(hash);
        }
    }

    /// First 16 bytes of the BLAKE3 hash, read as a little-endian u128.
    fn hash(&self, event_bytes: &[u8]) -> u128 {
        u128::from_le_bytes(
            blake3::hash(event_bytes).as_bytes()[..16].try_into().unwrap()
        )
    }

    /// Old `current` becomes `previous`; old `previous` is emptied and reused.
    fn maybe_rotate(&mut self) {
        if self.rotation_time.elapsed() > self.window {
            std::mem::swap(&mut self.current, &mut self.previous);
            self.current.clear();
            self.rotation_time = Instant::now();
        }
    }
}
```
**Memory at 10K events/sec:** ~300K entries/window (30s) * 16 bytes * 2 windows ≈ 9.6 MB of raw keys; ≈ 19 MB with HashSet overhead
**Memory at 100K events/sec:** ~3M entries/window (30s) * 16 bytes * 2 windows ≈ 96 MB of raw keys; ≈ 144 MB with HashSet overhead
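
The raw-key part of these estimates is plain arithmetic and can be checked with a small helper. `raw_dedup_bytes` is a hypothetical name, and the function deliberately ignores `HashSet` overhead, which depends on the hash table's capacity and load factor.

```rust
/// Bytes of raw `u128` keys held by both buffers at a steady event rate.
/// Ignores HashSet overhead (control bytes, spare capacity).
pub fn raw_dedup_bytes(events_per_sec: u64, window_secs: u64) -> u64 {
    let entries_per_window = events_per_sec * window_secs;
    let key_size = std::mem::size_of::<u128>() as u64; // 16 bytes
    entries_per_window * key_size * 2 // two buffers: current + previous
}
```

With the default 30-second window this gives 9,600,000 bytes at 10K events/sec and 96,000,000 bytes at 100K events/sec. The gap to the totals quoted above is table overhead.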
### CheckpointManager
```rust
pub struct CheckpointManager;

impl CheckpointManager {
    pub fn write(dir: &Path, seq: u64, timestamp_nanos: u64) -> Result<(), WalError> {
        // Write to temp file, fsync, rename (atomic on POSIX)
    }

    pub fn read(dir: &Path) -> Result<Option<(u64, u64)>, WalError> {
        // Returns None if checkpoint.meta does not exist
        // Returns Corruption if file is wrong size
    }
}
```
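
One way the elided bodies could be filled in, using only the standard library, is sketched below. It makes several assumptions: the `WalError` enum here is a hypothetical stand-in for the real error type, the temp file name `checkpoint.meta.tmp` is invented for illustration, and the parent-directory sync is an extra durability step this task does not specify.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical stand-in for the WAL error type.
#[derive(Debug)]
pub enum WalError {
    Io(io::Error),
    Corruption(String),
}

impl From<io::Error> for WalError {
    fn from(e: io::Error) -> Self {
        WalError::Io(e)
    }
}

pub fn write_checkpoint(dir: &Path, seq: u64, timestamp_nanos: u64) -> Result<(), WalError> {
    let tmp = dir.join("checkpoint.meta.tmp"); // temp name is an assumption
    let dst = dir.join("checkpoint.meta");
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&seq.to_le_bytes());
    buf[8..].copy_from_slice(&timestamp_nanos.to_le_bytes());

    let mut f = File::create(&tmp)?;
    f.write_all(&buf)?;
    f.sync_all()?; // contents reach disk before the rename publishes them
    drop(f);
    fs::rename(&tmp, &dst)?; // replaces any existing file atomically on POSIX
    if cfg!(unix) {
        // Assumption: also sync the directory so the rename itself is durable.
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}

pub fn read_checkpoint(dir: &Path) -> Result<Option<(u64, u64)>, WalError> {
    let mut bytes = Vec::new();
    match File::open(dir.join("checkpoint.meta")) {
        Ok(mut f) => {
            f.read_to_end(&mut bytes)?;
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(e) => return Err(e.into()),
    }
    if bytes.len() != 16 {
        return Err(WalError::Corruption(format!(
            "checkpoint.meta is {} bytes, expected 16",
            bytes.len()
        )));
    }
    let mut seq = [0u8; 8];
    let mut ts = [0u8; 8];
    seq.copy_from_slice(&bytes[..8]);
    ts.copy_from_slice(&bytes[8..]);
    Ok(Some((u64::from_le_bytes(seq), u64::from_le_bytes(ts))))
}
```

Because readers only ever open `checkpoint.meta`, they see either the old 16 bytes or the new 16 bytes. A crash mid-write leaves at most a stray temp file.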
## Test Strategy
### DedupWindow Tests
```rust
#[test]
fn dedup_detects_duplicate() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    let bytes = [1u8; 21];
    assert!(!window.check_and_insert(&bytes)); // first: not duplicate
    assert!(window.check_and_insert(&bytes)); // second: duplicate
}

#[test]
fn dedup_different_events_not_duplicates() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    assert!(!window.check_and_insert(&[1u8; 21]));
    assert!(!window.check_and_insert(&[2u8; 21]));
}

#[test]
fn dedup_rotation_clears_old_events() {
    let mut window = DedupWindow::new(Duration::from_millis(10));
    let bytes = [1u8; 21];
    window.check_and_insert(&bytes);
    std::thread::sleep(Duration::from_millis(11)); // trigger rotation
    // After one rotation: event is in "previous" -- still caught
    assert!(window.check_and_insert(&bytes));
    std::thread::sleep(Duration::from_millis(11)); // trigger second rotation
    // After two rotations: event has left both windows
    assert!(!window.check_and_insert(&bytes));
}

#[test]
fn dedup_populate_from_events_seeds_correctly() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    let events = vec![EventRecord { entity_id: 1, signal_type: 1, weight: 1.0, timestamp_nanos: 0 }];
    window.populate_from_events(events);
    let bytes = EventRecord { entity_id: 1, signal_type: 1, weight: 1.0, timestamp_nanos: 0 }.encode();
    assert!(window.check_and_insert(&bytes)); // seeded event is detected as duplicate
}
```
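
The rotation test above relies on `sleep`, so it is tied to wall-clock timing. One possible alternative, shown here only as a sketch, is a variant that takes the current time as a parameter so rotation can be driven deterministically. `ClockedDedup` is a hypothetical type that is not part of this task's API, and it operates on pre-hashed `u128` keys to stay free of the `blake3` dependency.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

/// Hypothetical dedup window with an injected clock (`now` is passed in).
pub struct ClockedDedup {
    current: HashSet<u128>,
    previous: HashSet<u128>,
    rotated_at: Instant,
    window: Duration,
}

impl ClockedDedup {
    pub fn new(window: Duration, now: Instant) -> Self {
        Self {
            current: HashSet::new(),
            previous: HashSet::new(),
            rotated_at: now,
            window,
        }
    }

    /// Returns `true` if `key` is present in the current or previous buffer.
    pub fn check_and_insert(&mut self, key: u128, now: Instant) -> bool {
        self.maybe_rotate(now);
        if self.current.contains(&key) || self.previous.contains(&key) {
            return true;
        }
        self.current.insert(key);
        false
    }

    fn maybe_rotate(&mut self, now: Instant) {
        if now.saturating_duration_since(self.rotated_at) > self.window {
            // Old current becomes previous; old previous is emptied and reused.
            std::mem::swap(&mut self.current, &mut self.previous);
            self.current.clear();
            self.rotated_at = now;
        }
    }
}
```

With a 30-second window and a key inserted at `t0`, a check at `t0 + 31s` still reports a duplicate (the key sits in `previous`), while a check at `t0 + 62s` does not, because the second rotation has discarded it.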
### CheckpointManager Tests
```rust
#[test]
fn checkpoint_read_returns_none_if_absent() {
    let dir = tempfile::tempdir().unwrap();
    assert!(CheckpointManager::read(dir.path()).unwrap().is_none());
}

#[test]
fn checkpoint_write_then_read_roundtrip() {
    let dir = tempfile::tempdir().unwrap();
    CheckpointManager::write(dir.path(), 42, 1_700_000_000_000_000_000).unwrap();
    let result = CheckpointManager::read(dir.path()).unwrap().unwrap();
    assert_eq!(result.0, 42);
    assert_eq!(result.1, 1_700_000_000_000_000_000);
}

#[test]
fn checkpoint_overwrites_previous() {
    let dir = tempfile::tempdir().unwrap();
    CheckpointManager::write(dir.path(), 10, 0).unwrap();
    CheckpointManager::write(dir.path(), 20, 0).unwrap();
    let (seq, _) = CheckpointManager::read(dir.path()).unwrap().unwrap();
    assert_eq!(seq, 20);
}
```
### WalHandle Integration Tests
```rust
#[test]
fn open_creates_wal_directory() { /* ... */ }

#[test]
fn append_returns_sequence_number() { /* ... */ }

#[test]
fn dedup_returns_zero() { /* ... */ }

#[test]
fn checkpoint_writes_file() { /* ... */ }

#[test]
fn close_and_reopen_continues_sequence() { /* ... */ }

#[test]
fn drop_shuts_down_cleanly() {
    // WalHandle drops without explicit shutdown — no panic, no thread leak
    let dir = tempfile::tempdir().unwrap();
    let (handle, _) = WalHandle::open(test_config(dir.path())).unwrap();
    drop(handle); // should not hang or panic
}
```
## Acceptance Criteria
- [x] `DedupWindow::check_and_insert()` returns `true` for duplicates, `false` for new events
- [x] Duplicate detection covers ~60-second window via double-buffer rotation
- [x] Zero false positives — no legitimate events are silently dropped
- [x] `DedupWindow::populate_from_events()` seeds the window from WAL replay
- [x] `CheckpointManager::write()` is atomic (temp file + rename on POSIX)
- [x] `CheckpointManager::read()` returns `None` for a fresh WAL with no checkpoint
- [x] `WalHandle::open()` returns `(handle, replayed_events)` where `replayed_events` contains all events since last checkpoint
- [x] `WalHandle::append()` returns `Ok(0)` for deduplicated events
- [x] `WalHandle::checkpoint()` does not go through the writer thread (no deadlock risk if writer is busy)
- [x] `WalHandle::truncate_before()` runs inside the writer thread (no race with active writes)
- [x] `impl Drop for WalHandle` provides best-effort shutdown without panicking
## Research References
- [docs/research/tidaldb_wal.md](../../../research/tidaldb_wal.md) — Section 6 (Approach 3: bounded sliding window dedup, DedupWindow implementation, memory analysis), Section 5 (checkpoint.meta format, checkpoint process with atomic write)
- [thoughts.md](../../../../thoughts.md) — Part II.1 (WAL convergence lessons from Engram/Citadel/StemeDB)
## Implementation Notes
- `blake3` is a direct dependency of the WAL module (`blake3 = "1"` in `Cargo.toml`). Already in the dependency plan per CODING_GUIDELINES.md.
- `crossbeam` is already a transitive dependency via fjall. Adding it as a direct dependency makes the version explicit and allows feature selection.
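- The checkpoint file format (16 bytes binary) is simpler than JSON and trivially parsed. The file carries no version field: format version 1 is implied by the fixed 16-byte layout that `read()` and `write()` assume. If schema evolution is ever needed, bump the format version, which would likely mean adding an explicit version field so old and new files can be told apart.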
- `WalHandle` does not implement `Clone` — there is exactly one writer thread. Use `Arc<WalHandle>` if shared across threads.