Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**

- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**

- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**

- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Task 04: Deduplication, Checkpoint, and WalHandle Public API

## Context

**Milestone:** 1 -- Signal Engine
**Phase:** m1p2 -- Write-Ahead Log
**Status:** COMPLETE
**Depends On:** Task 02 (writer channel types), Task 03 (`recover()`)
**Blocks:** m1p4 (Signal Ledger uses `WalHandle` as its durability backend)
**Complexity:** M

## Objective

Deliver three components that complete the WAL:

1. **`DedupWindow`**: a double-buffered `HashSet<u128>` that detects duplicate signal events within a roughly 60-second window, keyed by the first 128 bits of each event's BLAKE3 hash. No false positives in practice (a false match would require a 128-bit hash collision) and bounded memory.

2. **`CheckpointManager`**: reads and writes `checkpoint.meta`, a small fixed-size binary file that records the last-materialized sequence number. This lets recovery skip already-materialized events.

3. **`WalHandle`**: the public API: `open()`, `append()`, `checkpoint()`, `truncate_before()`, `shutdown()`. It is the entry point for m1p4 (Signal Ledger) and m1p5 (Entity CRUD API).

## Requirements

### DedupWindow

- Two `HashSet<u128>` buffers, alternating every `window_duration` (default 30s)
- Effective dedup coverage: between 30 and ~60 seconds (current + previous window), depending on where in the current window the event arrived
- Hash key: first 16 bytes (128 bits) of `blake3::hash(event_bytes)`, interpreted as a little-endian `u128`
- `check_and_insert(event_bytes: &[u8]) -> bool`: returns `true` if the event is a duplicate
- `populate_from_events(events: Vec<EventRecord>)`: bulk-insert on startup from replayed events
- `maybe_rotate()`: called on each `check_and_insert`; when `rotation_time.elapsed() > window_duration`, swaps the buffers and clears the new `current` (which held the old `previous`)

### CheckpointManager

- `checkpoint.meta` is a simple binary file: `[sequence: u64 LE][timestamp_nanos: u64 LE]` (16 bytes)
- `CheckpointManager::write(dir, seq, timestamp_nanos)`: writes atomically (write to temp file, fsync, rename)
- `CheckpointManager::read(dir) -> Result<Option<(u64, u64)>, WalError>`: returns `None` if the file does not exist
- File corruption (wrong size) returns `WalError::Corruption`

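
The 16-byte layout above can be sketched as a pair of pure functions. This is only an illustration under stated assumptions: `encode_checkpoint`, `decode_checkpoint`, and `CHECKPOINT_LEN` are hypothetical names, and the `WalError` shown here is a one-variant stand-in for the real error type (whose `Corruption` payload is not specified in this document).

```rust
/// Size of `checkpoint.meta`: two little-endian u64 fields.
pub const CHECKPOINT_LEN: usize = 16;

/// Hypothetical stand-in for the WAL's error type; only the variant
/// needed by this sketch is modelled.
#[derive(Debug, PartialEq)]
pub enum WalError {
    Corruption(String),
}

/// Encode `[sequence: u64 LE][timestamp_nanos: u64 LE]`.
pub fn encode_checkpoint(seq: u64, timestamp_nanos: u64) -> [u8; CHECKPOINT_LEN] {
    let mut buf = [0u8; CHECKPOINT_LEN];
    buf[..8].copy_from_slice(&seq.to_le_bytes());
    buf[8..].copy_from_slice(&timestamp_nanos.to_le_bytes());
    buf
}

/// Decode a checkpoint, rejecting any input that is not exactly 16 bytes.
pub fn decode_checkpoint(bytes: &[u8]) -> Result<(u64, u64), WalError> {
    if bytes.len() != CHECKPOINT_LEN {
        return Err(WalError::Corruption(format!(
            "checkpoint.meta: expected {} bytes, found {}",
            CHECKPOINT_LEN,
            bytes.len()
        )));
    }
    // The length check above guarantees both 8-byte conversions succeed.
    let seq = u64::from_le_bytes(bytes[..8].try_into().unwrap());
    let timestamp_nanos = u64::from_le_bytes(bytes[8..].try_into().unwrap());
    Ok((seq, timestamp_nanos))
}
```

Keeping the codec separate from file I/O makes the size check testable without touching the filesystem.
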
### WalHandle

- `WalHandle::open(config: WalConfig) -> Result<(Self, Vec<SignalEvent>), WalError>`
  - Creates `{config.dir}/wal/` if absent
  - Calls `recover()`, initializes `DedupWindow` from replayed events
  - Finds or creates current segment
  - Spawns writer thread via `std::thread::Builder::new().name("tidaldb-wal-writer".into())`
  - Returns `(handle, replayed_events)`; the replayed events are for m1p4 to feed into the signal materializer
- `WalHandle::append(event: SignalEvent) -> Result<u64, WalError>`: blocks until durably committed; returns the assigned sequence number, or `Ok(0)` if the event was deduplicated
- `WalHandle::checkpoint(seq: u64) -> Result<(), WalError>`: writes `checkpoint.meta` directly (no writer thread round-trip)
- `WalHandle::truncate_before(seq: u64) -> Result<(), WalError>`: dispatches `WalCommand::TruncateBefore` to the writer thread
- `WalHandle::shutdown(self) -> Result<(), WalError>`: sends `WalCommand::Shutdown`, joins the writer thread
- `impl Drop for WalHandle`: best-effort shutdown if not already shut down (ignores errors)
- `WalHandle: Send + Sync`, because the `Sender<WalCommand>` it holds is `Send + Sync`

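
The handle-plus-writer-thread shape described above can be sketched with the standard library alone. Everything here is an assumption made for illustration: `Handle` and `Command` are hypothetical stand-ins for `WalHandle` and `WalCommand`, `std::sync::mpsc` replaces whatever channel the real writer uses, an in-memory `Vec` stands in for the segment file, and sequence numbers start at 1.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical command set; the real `WalCommand` comes from Task 02.
enum Command {
    /// Payload plus a reply channel that receives the assigned sequence number.
    Append(Vec<u8>, Sender<u64>),
    Shutdown,
}

/// Hypothetical stand-in for `WalHandle`: owns the command sender and the
/// writer thread's join handle.
pub struct Handle {
    tx: Sender<Command>,
    writer: Option<JoinHandle<Vec<Vec<u8>>>>,
}

impl Handle {
    pub fn open() -> std::io::Result<Self> {
        let (tx, rx) = mpsc::channel::<Command>();
        let writer = thread::Builder::new()
            .name("sketch-wal-writer".into())
            .spawn(move || {
                let mut log = Vec::new(); // stands in for the on-disk segment
                for cmd in rx {
                    match cmd {
                        Command::Append(bytes, reply) => {
                            log.push(bytes);
                            // A real writer would fsync before acknowledging.
                            let _ = reply.send(log.len() as u64);
                        }
                        Command::Shutdown => break,
                    }
                }
                log
            })?;
        Ok(Self { tx, writer: Some(writer) })
    }

    /// Blocks until the writer thread acknowledges the append.
    pub fn append(&self, bytes: Vec<u8>) -> Option<u64> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send(Command::Append(bytes, reply_tx)).ok()?;
        reply_rx.recv().ok()
    }

    /// Explicit shutdown: stops the writer and returns what it logged.
    pub fn shutdown(mut self) -> Vec<Vec<u8>> {
        self.stop().unwrap_or_default()
    }

    /// Shared by `shutdown` and `Drop`; a second call is a no-op.
    fn stop(&mut self) -> Option<Vec<Vec<u8>>> {
        let writer = self.writer.take()?; // None if already stopped
        let _ = self.tx.send(Command::Shutdown);
        writer.join().ok()
    }
}

impl Drop for Handle {
    /// Best-effort shutdown; errors are ignored.
    fn drop(&mut self) {
        let _ = self.stop();
    }
}
```

Storing the join handle in an `Option` is what lets `shutdown(self)` and `Drop` coexist: whichever runs first takes the handle, and the other finds `None` and does nothing.
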
## Technical Design

### DedupWindow

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

pub struct DedupWindow {
    current: HashSet<u128>,
    previous: HashSet<u128>,
    rotation_time: Instant,
    window: Duration,
}

impl DedupWindow {
    pub fn new(window: Duration) -> Self {
        Self {
            current: HashSet::new(),
            previous: HashSet::new(),
            rotation_time: Instant::now(),
            window,
        }
    }

    /// Returns `true` if `event_bytes` was already seen in either buffer.
    pub fn check_and_insert(&mut self, event_bytes: &[u8]) -> bool {
        self.maybe_rotate();
        let hash = self.hash(event_bytes);
        if self.current.contains(&hash) || self.previous.contains(&hash) {
            return true; // duplicate
        }
        self.current.insert(hash);
        false
    }

    /// Seeds the window from events replayed during recovery.
    pub fn populate_from_events(&mut self, events: Vec<EventRecord>) {
        for e in events {
            let bytes = e.encode();
            let hash = self.hash(&bytes);
            self.current.insert(hash);
        }
    }

    /// First 16 bytes of the BLAKE3 digest, read as a little-endian `u128`.
    fn hash(&self, event_bytes: &[u8]) -> u128 {
        u128::from_le_bytes(
            blake3::hash(event_bytes).as_bytes()[..16].try_into().unwrap()
        )
    }

    fn maybe_rotate(&mut self) {
        if self.rotation_time.elapsed() > self.window {
            // After the swap, `current` holds the old `previous`; clearing it
            // drops the oldest generation of hashes.
            std::mem::swap(&mut self.current, &mut self.previous);
            self.current.clear();
            self.rotation_time = Instant::now();
        }
    }
}
```

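
To see how long an event actually stays visible under this rotation scheme, it can help to drive the same logic with an explicit clock instead of `Instant`. The sketch below is a hypothetical test helper, not part of the documented API: `ManualDedupWindow` takes pre-hashed `u128` keys and a caller-supplied `now`, so rotation can be exercised without sleeping.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Variant of the double-buffered window that takes pre-hashed keys and an
/// explicit clock reading (time since an arbitrary start).
pub struct ManualDedupWindow {
    current: HashSet<u128>,
    previous: HashSet<u128>,
    rotated_at: Duration,
    window: Duration,
}

impl ManualDedupWindow {
    pub fn new(window: Duration) -> Self {
        Self {
            current: HashSet::new(),
            previous: HashSet::new(),
            rotated_at: Duration::ZERO,
            window,
        }
    }

    /// Returns `true` if `key` was already seen in either buffer.
    pub fn check_and_insert(&mut self, key: u128, now: Duration) -> bool {
        if now.saturating_sub(self.rotated_at) > self.window {
            // After the swap, `current` holds the old `previous`; clearing it
            // discards the oldest generation of keys.
            std::mem::swap(&mut self.current, &mut self.previous);
            self.current.clear();
            self.rotated_at = now;
        }
        if self.current.contains(&key) || self.previous.contains(&key) {
            return true;
        }
        self.current.insert(key);
        false
    }
}
```

With a 30-second window, a key inserted at t = 29s survives the rotation at t = 31s (it moves to `previous`) and is dropped by the rotation at t = 62s, so it was visible for only about 33 seconds. A key inserted right after a rotation stays visible for close to 60 seconds.
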
**Memory at 10K events/sec:** ~300K entries/window * 16 bytes * 2 windows = 9.6 MB of keys; with HashSet overhead ≈ 19 MB
**Memory at 100K events/sec:** ~3M entries/window * 16 bytes * 2 windows = 96 MB of keys; with HashSet overhead ≈ 144 MB

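
The arithmetic behind these estimates can be written out as a small helper. This is a back-of-the-envelope sketch: both function names are hypothetical, and the overhead factor is a free parameter rather than a measured property of `HashSet`. The figures quoted above correspond to factors of roughly 2.0 and 1.5, which are back-solved from those figures, not benchmarked.

```rust
/// Raw key bytes held by both buffers:
/// (events/sec * window seconds) entries * 16 bytes * 2 windows.
pub fn dedup_key_bytes(events_per_sec: u64, window_secs: u64) -> u64 {
    let entries_per_window = events_per_sec * window_secs;
    entries_per_window * std::mem::size_of::<u128>() as u64 * 2
}

/// Scale the raw key bytes by an assumed HashSet overhead factor and
/// report the result in decimal megabytes.
pub fn dedup_estimate_mb(events_per_sec: u64, window_secs: u64, overhead: f64) -> f64 {
    dedup_key_bytes(events_per_sec, window_secs) as f64 * overhead / 1_000_000.0
}
```

For a 30-second window, `dedup_key_bytes(10_000, 30)` is 9,600,000 bytes and `dedup_key_bytes(100_000, 30)` is 96,000,000 bytes; the real footprint depends on the hash table's load factor and should be measured before it is relied on for capacity planning.
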
### CheckpointManager

```rust
use std::path::Path;

pub struct CheckpointManager;

impl CheckpointManager {
    /// Writes `checkpoint.meta` atomically: write to a temp file, fsync,
    /// then rename over the target (atomic on POSIX).
    pub fn write(dir: &Path, seq: u64, timestamp_nanos: u64) -> Result<(), WalError> {
        todo!() // body elided in this design sketch
    }

    /// Returns `Ok(None)` if `checkpoint.meta` does not exist and
    /// `WalError::Corruption` if the file is the wrong size.
    pub fn read(dir: &Path) -> Result<Option<(u64, u64)>, WalError> {
        todo!() // body elided in this design sketch
    }
}
```

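
The write-to-temp, fsync, rename sequence can be illustrated with standard-library calls only. The helpers below are a sketch under stated assumptions, not the project's implementation: `write_atomic` and `read_optional` are hypothetical names, they return `io::Result` rather than `WalError`, and the `.tmp` suffix for the temporary file is an arbitrary choice.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `dir/name` so that readers see either the old file or
/// the complete new one: write a temp file, fsync it, then rename it over
/// the target.
pub fn write_atomic(dir: &Path, name: &str, bytes: &[u8]) -> io::Result<()> {
    let tmp = dir.join(format!("{name}.tmp"));
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush file contents and metadata to stable storage
    fs::rename(&tmp, dir.join(name)) // replaces the target atomically on POSIX
}

/// Read `dir/name`, mapping "file not found" to `Ok(None)`.
pub fn read_optional(dir: &Path, name: &str) -> io::Result<Option<Vec<u8>>> {
    match fs::read(dir.join(name)) {
        Ok(bytes) => Ok(Some(bytes)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

A caller would pass the 16-byte checkpoint payload and `"checkpoint.meta"` as `name`. On Linux, making the rename itself durable across power loss generally also requires an fsync on the parent directory, which this sketch omits.
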
## Test Strategy

### DedupWindow Tests

```rust
#[test]
fn dedup_detects_duplicate() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    let bytes = [1u8; 21];
    assert!(!window.check_and_insert(&bytes)); // first: not duplicate
    assert!(window.check_and_insert(&bytes)); // second: duplicate
}

#[test]
fn dedup_different_events_not_duplicates() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    assert!(!window.check_and_insert(&[1u8; 21]));
    assert!(!window.check_and_insert(&[2u8; 21]));
}

#[test]
fn dedup_rotation_clears_old_events() {
    let mut window = DedupWindow::new(Duration::from_millis(10));
    let bytes = [1u8; 21];
    window.check_and_insert(&bytes);
    std::thread::sleep(Duration::from_millis(11)); // trigger rotation
    // After one rotation: event is in "previous" -- still caught
    assert!(window.check_and_insert(&bytes));
    std::thread::sleep(Duration::from_millis(11)); // trigger second rotation
    // After two rotations: event has left both windows
    assert!(!window.check_and_insert(&bytes));
}

#[test]
fn dedup_populate_from_events_seeds_correctly() {
    let mut window = DedupWindow::new(Duration::from_secs(30));
    let events = vec![EventRecord { entity_id: 1, signal_type: 1, weight: 1.0, timestamp_nanos: 0 }];
    window.populate_from_events(events);
    let bytes = EventRecord { entity_id: 1, signal_type: 1, weight: 1.0, timestamp_nanos: 0 }.encode();
    assert!(window.check_and_insert(&bytes)); // seeded event is detected as duplicate
}
```

### CheckpointManager Tests

```rust
#[test]
fn checkpoint_read_returns_none_if_absent() {
    let dir = tempfile::tempdir().unwrap();
    assert!(CheckpointManager::read(dir.path()).unwrap().is_none());
}

#[test]
fn checkpoint_write_then_read_roundtrip() {
    let dir = tempfile::tempdir().unwrap();
    CheckpointManager::write(dir.path(), 42, 1_700_000_000_000_000_000).unwrap();
    let result = CheckpointManager::read(dir.path()).unwrap().unwrap();
    assert_eq!(result.0, 42);
    assert_eq!(result.1, 1_700_000_000_000_000_000);
}

#[test]
fn checkpoint_overwrites_previous() {
    let dir = tempfile::tempdir().unwrap();
    CheckpointManager::write(dir.path(), 10, 0).unwrap();
    CheckpointManager::write(dir.path(), 20, 0).unwrap();
    let (seq, _) = CheckpointManager::read(dir.path()).unwrap().unwrap();
    assert_eq!(seq, 20);
}
```

### WalHandle Integration Tests

```rust
#[test]
fn open_creates_wal_directory() { /* ... */ }

#[test]
fn append_returns_sequence_number() { /* ... */ }

#[test]
fn dedup_returns_zero() { /* ... */ }

#[test]
fn checkpoint_writes_file() { /* ... */ }

#[test]
fn close_and_reopen_continues_sequence() { /* ... */ }

#[test]
fn drop_shuts_down_cleanly() {
    // WalHandle drops without explicit shutdown: no panic, no thread leak
    let dir = tempfile::tempdir().unwrap();
    let (handle, _) = WalHandle::open(test_config(dir.path())).unwrap();
    drop(handle); // should not hang or panic
}
```

## Acceptance Criteria

- [x] `DedupWindow::check_and_insert()` returns `true` for duplicates, `false` for new events
- [x] Duplicate detection covers a ~60-second window via double-buffer rotation
- [x] Zero false positives: no legitimate events are silently dropped (barring a 128-bit hash collision)
- [x] `DedupWindow::populate_from_events()` seeds the window from WAL replay
- [x] `CheckpointManager::write()` is atomic (temp file + rename on POSIX)
- [x] `CheckpointManager::read()` returns `None` for a fresh WAL with no checkpoint
- [x] `WalHandle::open()` returns `(handle, replayed_events)` where `replayed_events` contains all events since the last checkpoint
- [x] `WalHandle::append()` returns `Ok(0)` for deduplicated events
- [x] `WalHandle::checkpoint()` does not go through the writer thread (no deadlock risk if the writer is busy)
- [x] `WalHandle::truncate_before()` runs inside the writer thread (no race with active writes)
- [x] `impl Drop for WalHandle` provides best-effort shutdown without panicking

## Research References

- [docs/research/tidaldb_wal.md](../../../research/tidaldb_wal.md): Section 6 (Approach 3: bounded sliding window dedup, DedupWindow implementation, memory analysis), Section 5 (checkpoint.meta format, checkpoint process with atomic write)
- [thoughts.md](../../../../thoughts.md): Part II.1 (WAL convergence lessons from Engram/Citadel/StemeDB)

## Implementation Notes

- `blake3` is a direct dependency of the WAL module (`blake3 = "1"` in `Cargo.toml`). It is already in the dependency plan per CODING_GUIDELINES.md.
- `crossbeam` is already a transitive dependency via fjall. Adding it as a direct dependency makes the version explicit and allows feature selection.
- The checkpoint file format (16 bytes, binary) is simpler than JSON and trivially parsed. The layout carries no version field, so it is implicitly version 1; if schema evolution is ever needed, introduce an explicit format version.
- `WalHandle` does not implement `Clone` because there is exactly one writer thread. Use `Arc<WalHandle>` to share it across threads.

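
Sharing a non-`Clone` handle through `Arc` can be sketched as follows. `Wal` here is a hypothetical stand-in that only hands out sequence numbers from an atomic counter; it is assumed to resemble `WalHandle` in the two properties that matter for this example (it is `Send + Sync`, and `append` takes `&self`).

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for `WalHandle`: not `Clone`, but safe to share.
pub struct Wal {
    next_seq: AtomicU64,
}

impl Wal {
    pub fn new() -> Self {
        Self { next_seq: AtomicU64::new(1) }
    }

    /// Hands out the next sequence number; no real I/O happens here.
    pub fn append(&self) -> u64 {
        self.next_seq.fetch_add(1, Ordering::SeqCst)
    }
}

/// Share one handle across `threads` producers and return every assigned
/// sequence number, sorted.
pub fn append_concurrently(wal: Arc<Wal>, threads: usize, per_thread: usize) -> Vec<u64> {
    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let wal = Arc::clone(&wal); // clones the pointer, not the handle
            thread::spawn(move || (0..per_thread).map(|_| wal.append()).collect::<Vec<u64>>())
        })
        .collect();
    let mut seqs: Vec<u64> = workers
        .into_iter()
        .flat_map(|w| w.join().unwrap())
        .collect();
    seqs.sort_unstable();
    seqs
}
```

Every producer holds its own `Arc` pointing at the same handle, so there is still exactly one underlying writer no matter how many threads append.
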