Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Task 02: Group Commit Writer
Context
Milestone: 1 -- Signal Engine
Phase: m1p2 -- Write-Ahead Log
Status: COMPLETE
Depends On: Task 01 (wire format types, SegmentWriter, WalError)
Blocks: Task 04 (WalHandle public API depends on writer channel)
Complexity: M
Objective
Implement the group commit writer thread that accumulates signal events from concurrent callers, forms batches by count or timeout, writes each batch to the current segment, fsyncs, and notifies all waiting callers with their assigned sequence numbers.
The group commit pattern amortizes fsync cost across concurrent writers. A dedicated thread owns the file handle — no concurrency on the write path. Callers send events through a bounded crossbeam channel and block on a per-caller reply channel until their batch is durably committed.
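The caller side of this pattern can be sketched as follows. This is a minimal illustration, not the tidalDB implementation: it uses `std::sync::mpsc::sync_channel` in place of crossbeam so that it has no dependencies, and the `Append` struct, `append`, and `spawn_toy_writer` are hypothetical names. The toy writer replies immediately instead of batching and fsyncing.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

/// Hypothetical command: an event payload plus a per-caller reply channel.
struct Append {
    event: Vec<u8>,
    reply: SyncSender<u64>,
}

/// Caller side: send the event, then block until the writer reports its
/// sequence number. Returns None if the writer has gone away.
fn append(tx: &SyncSender<Append>, event: Vec<u8>) -> Option<u64> {
    let (reply_tx, reply_rx) = sync_channel(1); // exactly one reply per append
    tx.send(Append { event, reply: reply_tx }).ok()?;
    reply_rx.recv().ok()
}

/// Toy writer: the only owner of the "log". A real writer would batch,
/// write, and fsync before replying; this one replies per event.
fn spawn_toy_writer(rx: Receiver<Append>) -> thread::JoinHandle<Vec<Vec<u8>>> {
    thread::spawn(move || {
        let mut log = Vec::new();
        let mut next_seq = 1u64;
        for cmd in rx {
            log.push(cmd.event);
            let _ = cmd.reply.send(next_seq); // caller may have given up; ignore
            next_seq += 1;
        }
        log // returned once every sender has been dropped
    })
}
```

Because only the writer thread touches `log` and `next_seq`, no lock is needed on the write path; ordering is decided by arrival order on the command channel.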
Requirements
- Single writer thread owns the WAL file handle; no concurrent writes
- crossbeam::channel::bounded(10_000) for the command channel
- Batch accumulation: drain up to 100 events with recv_deadline(10ms); batch fills at count limit or timeout
- One fsync per batch (not per event); called after write_all
- Each event receives a monotonically increasing u64 sequence number starting at 1
- Sequence number monotonicity survives segment rotation
- WalCommand::Append { event, reply }: reply channel receives Result<u64, WalError> (seq or Ok(0) for dedup)
- WalCommand::TruncateBefore { before_seq, reply }: deletes eligible segments from the writer thread (no race with writes)
- WalCommand::Shutdown: flush partial batch, fsync, exit cleanly
- WriterConfig carries: dir, segment_size, batch_size, batch_timeout, dedup_window
- run_writer is a free function taking &Receiver<WalCommand>, &WriterConfig, SegmentWriter, initial next_seq, and DedupWindow
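The command and configuration types listed above might look roughly like this. The sketch makes several assumptions: `EventRecord` and `WalError` are placeholders for the Task 01 types, the reply payload of `TruncateBefore` is a guess (the requirements do not state it), the field types are illustrative, and std's `SyncSender` stands in for the crossbeam `Sender`.

```rust
use std::path::PathBuf;
use std::sync::mpsc::SyncSender;
use std::time::Duration;

/// Placeholder for the Task 01 event type.
#[derive(Debug, Clone, PartialEq)]
pub struct EventRecord(pub Vec<u8>);

/// Placeholder for the Task 01 error type.
#[derive(Debug)]
pub enum WalError {
    Io(std::io::Error),
    Closed,
}

pub enum WalCommand {
    /// Reply carries the assigned sequence number, or Ok(0) for a duplicate.
    Append { event: EventRecord, reply: SyncSender<Result<u64, WalError>> },
    /// Reply type is an assumption; the requirements only name the field.
    TruncateBefore { before_seq: u64, reply: SyncSender<Result<(), WalError>> },
    Shutdown,
}

pub struct WriterConfig {
    pub dir: PathBuf,
    pub segment_size: u64,
    pub batch_size: usize,
    pub batch_timeout: Duration,
    pub dedup_window: usize,
}

impl WriterConfig {
    /// Uses the batch limits stated in the requirements (100 events, 10 ms);
    /// the caller supplies the values this document does not specify.
    pub fn new(dir: impl Into<PathBuf>, segment_size: u64, dedup_window: usize) -> Self {
        WriterConfig {
            dir: dir.into(),
            segment_size,
            batch_size: 100,
            batch_timeout: Duration::from_millis(10),
            dedup_window,
        }
    }
}
```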
Technical Design
Writer Loop
    use crossbeam::channel::{Receiver, RecvTimeoutError, Sender};
    use std::time::Instant;

    pub fn run_writer(
        rx: &Receiver<WalCommand>,
        config: &WriterConfig,
        mut segment: SegmentWriter,
        mut next_seq: u64,
        mut dedup: DedupWindow,
    ) -> Result<(), WalError> {
        // Pending events, each paired with the reply channel of its waiting caller
        let mut batch: Vec<(EventRecord, Sender<Result<u64, WalError>>)> =
            Vec::with_capacity(config.batch_size);
        loop {
            // Block until first command
            match rx.recv() {
                Ok(cmd) => handle_command(cmd, &mut batch, &mut dedup),
                Err(_) => break, // channel closed
            }
            // Drain up to batch_size - 1 more, bounded by one fixed deadline
            let deadline = Instant::now() + config.batch_timeout;
            while batch.len() < config.batch_size {
                match rx.recv_deadline(deadline) {
                    Ok(cmd) => handle_command(cmd, &mut batch, &mut dedup),
                    Err(RecvTimeoutError::Timeout) => break,
                    // All senders gone: flush below, then the next recv() ends the outer loop
                    Err(RecvTimeoutError::Disconnected) => break,
                }
            }
            // Flush batch if non-empty
            if !batch.is_empty() {
                flush_batch(&mut batch, &mut segment, config, &mut next_seq)?;
            }
        }
        // Final flush on shutdown
        if !batch.is_empty() {
            flush_batch(&mut batch, &mut segment, config, &mut next_seq)?;
        }
        segment.flush()?;
        Ok(())
    }
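The batch-accumulation step of the loop can be isolated into a small, testable helper. The sketch below is an assumption-laden stand-in: `collect_batch` is a hypothetical name, and it uses `std::sync::mpsc` rather than crossbeam. Stable std has no `recv_deadline`, so the fixed deadline is emulated by passing the remaining time to `recv_timeout` on each iteration.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Blocks for the first item, then keeps draining until `max` items are
/// collected or a deadline fixed at the first item's arrival has passed.
/// Returns None once the channel is closed and empty.
fn collect_batch<T>(rx: &Receiver<T>, max: usize, timeout: Duration) -> Option<Vec<T>> {
    let first = rx.recv().ok()?;
    let mut batch = Vec::with_capacity(max);
    batch.push(first);

    // Computed once: later arrivals do not extend the window.
    let deadline = Instant::now() + timeout;
    while batch.len() < max {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            Err(RecvTimeoutError::Timeout) => break,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}
```

Passing a fresh `timeout` to every `recv_timeout` call instead of the remaining time would let a steady trickle of events keep a batch open indefinitely, which is the behaviour the fixed deadline avoids.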
Batch Flush
    fn flush_batch(
        batch: &mut Vec<(EventRecord, Sender<Result<u64, WalError>>)>,
        segment: &mut SegmentWriter,
        config: &WriterConfig,
        next_seq: &mut u64,
    ) -> Result<(), WalError> {
        // 1. Assign sequence numbers to non-dedup events
        // 2. Encode all events
        // 3. Encode batch header with BLAKE3
        // 4. Write batch bytes to segment (handles rotation)
        // 5. fsync
        // 6. Notify all waiters with their sequence numbers
        Ok(())
    }
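A simplified version of the flush steps is sketched below under stated assumptions: payloads are plain byte vectors, the framing (8-byte sequence number, 4-byte length, payload) is a toy format rather than the Task 01 wire format, there is no BLAKE3 header, `String` stands in for `WalError`, and the write-plus-fsync step is abstracted as a caller-supplied closure named `write_and_sync`.

```rust
use std::io;
use std::sync::mpsc::SyncSender;

/// Per-caller reply channel; String stands in for the real WalError here.
type Reply = SyncSender<Result<u64, String>>;

/// Assigns consecutive sequence numbers, hands the encoded batch to
/// `write_and_sync`, then notifies every waiter. The batch is drained so no
/// reply sender is retained after its reply has been sent.
fn flush_batch(
    batch: &mut Vec<(Vec<u8>, Reply)>,
    next_seq: &mut u64,
    mut write_and_sync: impl FnMut(&[u8]) -> io::Result<()>,
) -> io::Result<()> {
    let first_seq = *next_seq;
    let count = batch.len() as u64;

    // Toy framing: [seq: u64 BE][len: u32 BE][payload]
    let mut buf = Vec::new();
    for (i, (payload, _)) in batch.iter().enumerate() {
        buf.extend_from_slice(&(first_seq + i as u64).to_be_bytes());
        buf.extend_from_slice(&(payload.len() as u32).to_be_bytes());
        buf.extend_from_slice(payload);
    }

    // One write + sync for the whole batch, then one reply per waiter.
    let outcome = write_and_sync(&buf);
    for (i, (_, reply)) in batch.drain(..).enumerate() {
        let msg = match &outcome {
            Ok(()) => Ok(first_seq + i as u64),
            Err(e) => Err(e.to_string()),
        };
        let _ = reply.send(msg); // a caller that gave up is not an error
    }

    // Sequence numbers advance only when the batch is durable.
    if outcome.is_ok() {
        *next_seq = first_seq + count;
    }
    outcome
}
```

With a real file, the closure would call `write_all` followed by `File::sync_data` or `File::sync_all`; the replies are sent only after that call returns, so a caller never sees a sequence number for data that is not yet on disk.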
Segment Rotation
When SegmentWriter::append_batch() returns true (segment full), the writer:
- Calls segment.flush() on the current segment (fsync already done per batch)
- Creates a new SegmentWriter with first_seq = next_seq
- Continues writing
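The rotation rule can be illustrated with an in-memory stand-in. Everything here is hypothetical: `Segment` replaces the real `SegmentWriter` (which writes files), sizes are passed in rather than measured, and `write_batch` is an invented helper. The point is only that the new segment starts at the current `next_seq`, so numbering never resets.

```rust
/// Toy in-memory stand-in for SegmentWriter.
struct Segment {
    first_seq: u64,
    bytes_written: usize,
    capacity: usize,
}

impl Segment {
    fn new(first_seq: u64, capacity: usize) -> Self {
        Segment { first_seq, bytes_written: 0, capacity }
    }

    /// Records a batch of `len` bytes; returns true when the segment is full.
    fn append_batch(&mut self, len: usize) -> bool {
        self.bytes_written += len;
        self.bytes_written >= self.capacity
    }
}

/// Writes one batch of `count` events totalling `len` bytes. If the segment
/// filled up, it is replaced by a fresh one whose first_seq is the next
/// unassigned sequence number, and the sealed segment is returned.
fn write_batch(
    current: &mut Segment,
    next_seq: &mut u64,
    count: u64,
    len: usize,
) -> Option<Segment> {
    let full = current.append_batch(len);
    *next_seq += count;
    if full {
        let fresh = Segment::new(*next_seq, current.capacity);
        Some(std::mem::replace(current, fresh))
    } else {
        None
    }
}
```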
Deduplication Integration
Before adding an event to the batch, DedupWindow::check_and_insert() is called. If it returns true (duplicate), the event's reply channel gets Ok(0) immediately — the event does not join the batch.
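One way such a window could work is a two-generation ("double-buffered") set of content hashes, sketched below. This is an assumption about the design, not the actual `DedupWindow`: the capacity-based rotation policy is invented for illustration, and `DefaultHasher` stands in for whatever content hash the real implementation uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Two generations of content hashes. When the current generation fills up
/// it becomes the previous one, and the old previous generation is dropped.
struct DedupWindow {
    current: HashSet<u64>,
    previous: HashSet<u64>,
    capacity: usize,
}

impl DedupWindow {
    fn new(capacity: usize) -> Self {
        DedupWindow { current: HashSet::new(), previous: HashSet::new(), capacity }
    }

    /// Returns true if the payload was already seen within the window
    /// (duplicate); otherwise records it and returns false.
    fn check_and_insert(&mut self, payload: &[u8]) -> bool {
        let mut hasher = DefaultHasher::new();
        payload.hash(&mut hasher);
        let key = hasher.finish();

        if self.current.contains(&key) || self.previous.contains(&key) {
            return true;
        }
        if self.current.len() >= self.capacity {
            // Rotate generations: the oldest hashes fall out of the window.
            self.previous = std::mem::take(&mut self.current);
        }
        self.current.insert(key);
        false
    }
}
```

With this layout the window remembers between `capacity` and `2 * capacity` recent hashes, and eviction costs one pointer swap instead of per-entry bookkeeping.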
Test Strategy
Integration Tests
    #[test]
    fn writer_fsyncs_per_batch() {
        // Write 10 events to a WAL
        // Verify the WAL file exists and is non-empty
        // Verify contents are readable by WalReader
    }

    #[test]
    fn writer_sequence_numbers_monotonic() {
        // Spawn 4 threads, each appending 25 events concurrently
        // Collect all 100 sequence numbers
        // Assert they form a contiguous range [1..=100] with no duplicates
    }

    #[test]
    fn writer_respects_segment_size() {
        // Configure segment_size = 1024 (tiny)
        // Write enough events to force multiple rotations
        // Assert multiple segment files exist in the WAL dir
        // Assert all events are readable in order across segments
    }

    #[test]
    fn writer_shutdown_flushes_partial_batch() {
        // Append 5 events (less than batch_size=100)
        // Shutdown immediately
        // Reopen and verify all 5 events are replayed
    }

    #[test]
    fn truncate_before_deletes_old_segments() {
        // Write events across 3 segments
        // Checkpoint at the end of segment 2
        // Truncate before segment 3's first seq
        // Assert segments 1 and 2 are deleted, segment 3 remains
    }
Acceptance Criteria
- Batch accumulates up to 100 events or 10ms, then writes and fsyncs
- One fsync per batch — not per event
- Sequence numbers are monotonically increasing across the lifetime of the WAL
- Concurrent appenders each receive the correct, unique sequence number for their event
- Duplicate events receive Ok(0) and are deduplicated before joining the batch
- Shutdown command flushes any partial batch before the thread exits
- Segment rotation is transparent: sequence numbers continue without reset
- TruncateBefore runs inside the writer thread to prevent races with active writes
- Channel capacity 10,000 provides backpressure under load without deadlock
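The backpressure property of a bounded channel can be demonstrated in isolation. The sketch uses `std::sync::mpsc::sync_channel` as a stand-in for the crossbeam bounded channel, and `fill_until_full` is a hypothetical helper: once the buffer holds `capacity` items and nothing is draining it, `try_send` reports `Full` (a blocking `send` would wait instead).

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Returns how many items a bounded channel accepts before reporting Full
/// when no receiver is draining it.
fn fill_until_full(capacity: usize) -> usize {
    // `_rx` is kept alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<u32>(capacity);
    let mut queued = 0;
    loop {
        match tx.try_send(0) {
            Ok(()) => queued += 1,
            Err(TrySendError::Full(_)) => return queued,
            Err(TrySendError::Disconnected(_)) => return queued,
        }
    }
}
```

In the writer design this means that producers outpacing the writer thread stall at the channel rather than growing an unbounded queue in memory.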
Research References
- docs/research/tidaldb_wal.md — Section 3 (Pattern 4: crossbeam-channel with recv_deadline, full implementation sketch, comparison table), Section 5 (segment rotation strategy)
- thoughts.md — Part V.6 (group commit: batch fsync amortization)
Implementation Notes
- Use crossbeam::channel::bounded(1) as the per-event reply channel; it is bounded to 1 because each append waits for exactly one reply.
- Use recv_deadline (not recv_timeout): recv_deadline takes a fixed instant, so batch accumulation does not reset the timeout on each event.
- Do NOT hold the reply channel senders in the batch after sending replies. Retained senders leak memory if the writer thread is slow and the batch grows large.
- The writer thread name is "tidaldb-wal-writer", visible in top, htop, and crash backtraces.
- std::thread::Builder::new().name(...).spawn(...) is used (not bare thread::spawn) so the name appears in panic messages.
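Spawning the named thread might look like the following. `spawn_named_writer` is a hypothetical wrapper; only `std::thread::Builder` and the thread name come from the notes above.

```rust
use std::io;
use std::thread;

/// Spawns `body` on a thread named "tidaldb-wal-writer". Unlike
/// `thread::spawn`, `Builder::spawn` returns an io::Result, so a failure to
/// create the thread is reported to the caller instead of panicking.
fn spawn_named_writer<F, T>(body: F) -> io::Result<thread::JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("tidaldb-wal-writer".to_string())
        .spawn(body)
}
```

Inside the thread, `thread::current().name()` returns the full name. On Linux the kernel limits thread names to 15 bytes, so tools such as top and htop will likely show a truncated form.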