# Task 02: Group Commit Writer

## Context

**Milestone:** 1 -- Signal Engine
**Phase:** m1p2 -- Write-Ahead Log
**Status:** COMPLETE
**Depends On:** Task 01 (wire format types, `SegmentWriter`, `WalError`)
**Blocks:** Task 04 (WalHandle public API depends on writer channel)
**Complexity:** M

## Objective

Implement the group commit writer thread. It accumulates signal events from concurrent callers, forms batches by count or timeout, writes each batch to the current segment, fsyncs, and notifies every waiting caller with its assigned sequence number.

The group commit pattern amortizes fsync cost across concurrent writers. A dedicated thread owns the file handle, so there is no concurrency on the write path. Callers send events through a bounded crossbeam channel and block on a per-caller reply channel until their batch is durably committed.

## Requirements

- Single writer thread owns the WAL file handle; no concurrent writes
- `crossbeam::channel::bounded(10_000)` for the command channel
- Batch accumulation: drain up to 100 events with `recv_deadline` and a 10 ms deadline; the batch closes at the count limit or on timeout
- One `fsync` per batch (not per event), called after `write_all`
- Each event receives a monotonically increasing u64 sequence number starting at 1
- Sequence number monotonicity survives segment rotation
- `WalCommand::Append { event, reply }`: the reply channel receives `Result<u64, WalError>` (the assigned sequence number, or `Ok(0)` for a deduplicated event)
- `WalCommand::TruncateBefore { before_seq, reply }`: deletes eligible segments from the writer thread (no race with writes)
- `WalCommand::Shutdown`: flush partial batch, fsync, exit cleanly
- `WriterConfig` carries: `dir`, `segment_size`, `batch_size`, `batch_timeout`, `dedup_window`
- `run_writer` is a free function taking `&Receiver<WalCommand>`, `&WriterConfig`, `SegmentWriter`, initial `next_seq`, and `DedupWindow`

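The command and configuration types listed above could be sketched as follows. This is a minimal sketch, not the actual definitions: it uses `std::sync::mpsc::SyncSender` in place of the crossbeam sender so it compiles with the standard library alone, `WalError` and `EventRecord` are placeholders for the Task 01 types, the reply type of `TruncateBefore` is an assumption, and the `segment_size` and `dedup_window` defaults in the hypothetical `with_dir` helper are illustrative values.

```rust
use std::path::PathBuf;
use std::sync::mpsc::SyncSender;
use std::time::Duration;

/// Placeholder error type; the real `WalError` comes from Task 01.
#[derive(Debug, PartialEq)]
pub enum WalError {
    Io(String),
}

/// Placeholder event; the real `EventRecord` comes from Task 01.
#[derive(Debug, Clone, PartialEq)]
pub struct EventRecord {
    pub payload: Vec<u8>,
}

/// Commands accepted by the writer thread.
pub enum WalCommand {
    /// Reply carries the assigned sequence number, or `Ok(0)` for a duplicate.
    Append { event: EventRecord, reply: SyncSender<Result<u64, WalError>> },
    /// Assumption: the reply only signals success or failure.
    TruncateBefore { before_seq: u64, reply: SyncSender<Result<(), WalError>> },
    Shutdown,
}

pub struct WriterConfig {
    pub dir: PathBuf,
    pub segment_size: u64,
    pub batch_size: usize,
    pub batch_timeout: Duration,
    pub dedup_window: usize,
}

impl WriterConfig {
    /// Hypothetical helper: batch limits follow the requirements above,
    /// the other two values are illustrative only.
    pub fn with_dir(dir: impl Into<PathBuf>) -> Self {
        WriterConfig {
            dir: dir.into(),
            segment_size: 64 * 1024 * 1024, // illustrative
            batch_size: 100,
            batch_timeout: Duration::from_millis(10),
            dedup_window: 10_000, // illustrative
        }
    }
}
```

Carrying the reply sender inside each command keeps the command channel one-directional: the writer never needs to know who its callers are.
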
## Technical Design

### Writer Loop

```rust
use std::time::Instant;

use crossbeam::channel::{Receiver, RecvTimeoutError, Sender};

pub fn run_writer(
    rx: &Receiver<WalCommand>,
    config: &WriterConfig,
    mut segment: SegmentWriter,
    mut next_seq: u64,
    mut dedup: DedupWindow,
) -> Result<(), WalError> {
    let mut batch: Vec<(EventRecord, Sender<Result<u64, WalError>>)> =
        Vec::with_capacity(config.batch_size);
    // Assumption: `handle_command` returns `true` once it has seen `Shutdown`.
    let mut shutdown = false;

    while !shutdown {
        // Block until the first command arrives.
        match rx.recv() {
            Ok(cmd) => shutdown = handle_command(cmd, &mut batch, &mut dedup),
            Err(_) => break, // channel closed: every sender was dropped
        }

        // Drain up to batch_size - 1 more commands against one fixed deadline.
        let deadline = Instant::now() + config.batch_timeout;
        while !shutdown && batch.len() < config.batch_size {
            match rx.recv_deadline(deadline) {
                Ok(cmd) => shutdown = handle_command(cmd, &mut batch, &mut dedup),
                Err(RecvTimeoutError::Timeout) => break,
                // Disconnected: flush what we have; the next `recv()` ends the loop.
                Err(RecvTimeoutError::Disconnected) => break,
            }
        }

        // Flush the batch if it is non-empty.
        if !batch.is_empty() {
            flush_batch(&mut batch, &mut segment, config, &mut next_seq)?;
        }
    }

    // Final flush on shutdown (defensive; the loop flushes before it exits).
    if !batch.is_empty() {
        flush_batch(&mut batch, &mut segment, config, &mut next_seq)?;
    }
    segment.flush()?;
    Ok(())
}
```

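The accumulation step of the loop can be isolated into a small, testable function. The sketch below uses `std::sync::mpsc` rather than crossbeam so that it runs with the standard library alone; because stable `std` has no `recv_deadline`, it recomputes the time remaining until one fixed deadline on every iteration. The name `collect_batch` is hypothetical.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collects up to `batch_size` items: blocks for the first one, then keeps
/// receiving until the batch is full or a single fixed deadline passes.
/// Returns `None` when the channel is closed and nothing is left to flush.
pub fn collect_batch<T>(
    rx: &Receiver<T>,
    batch_size: usize,
    timeout: Duration,
) -> Option<Vec<T>> {
    let first = rx.recv().ok()?;
    let mut batch = Vec::with_capacity(batch_size);
    batch.push(first);

    // The deadline is fixed once per batch, not once per event.
    let deadline = Instant::now() + timeout;
    while batch.len() < batch_size {
        // Convert the fixed deadline into the time still remaining.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    Some(batch)
}
```

With five queued items and `batch_size = 3`, the first call returns three items at once and the second returns the remaining two after the timeout expires, which is the "count or timeout" behavior the requirements describe.
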
### Batch Flush

```rust
fn flush_batch(
    batch: &mut Vec<(EventRecord, Sender<Result<u64, WalError>>)>,
    segment: &mut SegmentWriter,
    config: &WriterConfig,
    next_seq: &mut u64,
) -> Result<(), WalError> {
    // Assign sequence numbers to non-dedup events
    // Encode all events
    // Encode batch header with BLAKE3
    // Write batch bytes to segment (handles rotation)
    // fsync
    // Notify all waiters with their sequence numbers
}
```

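The order of the flush steps (assign, encode, write, fsync, notify) can be illustrated with a simplified sketch. It is not the real wire format: each record here is an illustrative `seq (u64) + length (u32) + payload` layout, the BLAKE3 batch header and segment rotation are omitted, errors are reported as `String`, and `flush_batch_sketch` and `Pending` are hypothetical names.

```rust
use std::fs::File;
use std::io::Write;
use std::sync::mpsc::SyncSender;

/// One pending event: raw payload plus the caller's reply channel.
pub type Pending = (Vec<u8>, SyncSender<Result<u64, String>>);

pub fn flush_batch_sketch(
    batch: &mut Vec<Pending>,
    file: &mut File,
    next_seq: &mut u64,
) -> std::io::Result<()> {
    // 1. Assign sequence numbers and encode every event into one buffer.
    let first_seq = *next_seq;
    let count = batch.len() as u64;
    let mut buf = Vec::new();
    for (i, (payload, _)) in batch.iter().enumerate() {
        let seq = first_seq + i as u64;
        buf.extend_from_slice(&seq.to_be_bytes());
        buf.extend_from_slice(&(payload.len() as u32).to_be_bytes());
        buf.extend_from_slice(payload);
    }

    // 2. One write and one fsync for the whole batch.
    let result = file.write_all(&buf).and_then(|_| file.sync_data());

    // 3. Notify waiters only after the fsync outcome is known. Draining the
    //    batch also drops every reply sender, so none are retained.
    for (i, (_, reply)) in batch.drain(..).enumerate() {
        let outcome = match &result {
            Ok(()) => Ok(first_seq + i as u64),
            Err(e) => Err(e.to_string()),
        };
        let _ = reply.send(outcome); // the caller may have gone away
    }

    // 4. Advance the counter only if the batch is durable.
    if result.is_ok() {
        *next_seq = first_seq + count;
    }
    result
}
```

Replying after `sync_data` is what makes the returned sequence number a durability guarantee: a caller that sees `Ok(seq)` knows the event has reached stable storage.
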
### Segment Rotation

When `SegmentWriter::append_batch()` returns `true` (segment full), the writer:

1. Calls `segment.flush()` on the current segment (fsync already done per batch)
2. Creates a new `SegmentWriter` with `first_seq = next_seq`
3. Continues writing

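The rotation steps above can be modeled with a toy in-memory segment to show that sequence numbers carry across segments. This is a sketch under stated assumptions: `Segment` and `write_with_rotation` are hypothetical stand-ins, the real `SegmentWriter` writes to files, and each batch is assumed to hold a fixed number of events.

```rust
/// Toy in-memory segment; the real `SegmentWriter` writes to a file.
pub struct Segment {
    pub first_seq: u64,
    pub bytes: Vec<u8>,
    capacity: usize,
}

impl Segment {
    pub fn new(first_seq: u64, capacity: usize) -> Self {
        Segment { first_seq, bytes: Vec::new(), capacity }
    }

    /// Appends a batch and returns `true` once the segment is full.
    pub fn append_batch(&mut self, batch: &[u8]) -> bool {
        self.bytes.extend_from_slice(batch);
        self.bytes.len() >= self.capacity
    }
}

/// Writes `batches` (each holding `events_per_batch` events), rotating when
/// a segment reports full. Returns all segments and the final `next_seq`.
pub fn write_with_rotation(
    batches: &[Vec<u8>],
    events_per_batch: u64,
    capacity: usize,
) -> (Vec<Segment>, u64) {
    let mut next_seq = 1u64;
    let mut closed = Vec::new();
    let mut current = Segment::new(next_seq, capacity);
    for batch in batches {
        let full = current.append_batch(batch);
        next_seq += events_per_batch;
        if full {
            // The new segment starts at `next_seq`; the counter never resets.
            let fresh = Segment::new(next_seq, capacity);
            closed.push(std::mem::replace(&mut current, fresh));
        }
    }
    closed.push(current);
    (closed, next_seq)
}
```

Because `next_seq` lives outside the segment, rotation only changes where bytes land, never which number the next event receives.
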
### Deduplication Integration

Before an event is added to the batch, `DedupWindow::check_and_insert()` is called. If it returns `true` (duplicate), the event's reply channel receives `Ok(0)` immediately and the event does not join the batch. Because real sequence numbers start at 1, `Ok(0)` cannot be mistaken for an assigned sequence number.

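One possible shape for such a window is sketched below. It is an assumption, not the project's `DedupWindow`: it keeps two generations of digests and rotates them when the current one fills up, and it hashes payloads with the standard library's `DefaultHasher` as a stand-in for a proper content hash (a 64-bit digest can collide, so this is illustrative only).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Two-generation window of payload digests (hypothetical layout).
pub struct DedupWindow {
    current: HashSet<u64>,
    previous: HashSet<u64>,
    capacity: usize,
}

impl DedupWindow {
    pub fn new(capacity: usize) -> Self {
        DedupWindow {
            current: HashSet::new(),
            previous: HashSet::new(),
            capacity,
        }
    }

    /// Returns `true` if `payload` was already seen within the window;
    /// otherwise records it and returns `false`.
    pub fn check_and_insert(&mut self, payload: &[u8]) -> bool {
        let mut hasher = DefaultHasher::new();
        payload.hash(&mut hasher);
        let digest = hasher.finish();

        if self.current.contains(&digest) || self.previous.contains(&digest) {
            return true;
        }
        if self.current.len() >= self.capacity {
            // Rotate: forget the older generation, keep the newer for lookups.
            self.previous = std::mem::take(&mut self.current);
        }
        self.current.insert(digest);
        false
    }
}
```

Rotating two sets bounds memory at roughly twice `capacity` digests while still remembering at least the last `capacity` distinct events.
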
## Test Strategy

### Integration Tests

```rust
#[test]
fn writer_fsyncs_per_batch() {
    // Write 10 events to a WAL
    // Verify the WAL file exists and is non-empty
    // Verify contents are readable by WalReader
}

#[test]
fn writer_sequence_numbers_monotonic() {
    // Spawn 4 threads, each appending 25 events concurrently
    // Collect all 100 sequence numbers
    // Assert they form a contiguous range [1..=100] with no duplicates
}

#[test]
fn writer_respects_segment_size() {
    // Configure segment_size = 1024 (tiny)
    // Write enough events to force multiple rotations
    // Assert multiple segment files exist in the WAL dir
    // Assert all events are readable in order across segments
}

#[test]
fn writer_shutdown_flushes_partial_batch() {
    // Append 5 events (less than batch_size=100)
    // Shutdown immediately
    // Reopen and verify all 5 events are replayed
}

#[test]
fn truncate_before_deletes_old_segments() {
    // Write events across 3 segments
    // Checkpoint at the end of segment 2
    // Truncate before segment 3's first seq
    // Assert segments 1 and 2 are deleted, segment 3 remains
}
```

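The concurrency check in `writer_sequence_numbers_monotonic` could be structured along the following lines. This sketch does not exercise the real WAL: a toy sequencer thread stands in for the writer, `std::sync::mpsc` stands in for crossbeam, and `collect_concurrent_seqs` is a hypothetical helper name.

```rust
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread;

type Reply = SyncSender<u64>;

/// Runs `threads` appenders, each sending `per_thread` requests to a toy
/// single-threaded sequencer, and returns every sequence number, sorted.
pub fn collect_concurrent_seqs(threads: usize, per_thread: usize) -> Vec<u64> {
    let (tx, rx) = sync_channel::<Reply>(10_000);

    // Toy writer: the only thread that ever touches `next_seq`.
    let writer = thread::spawn(move || {
        let mut next_seq = 1u64;
        for reply in rx {
            let _ = reply.send(next_seq);
            next_seq += 1;
        }
    });

    let appenders: Vec<_> = (0..threads)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                (0..per_thread)
                    .map(|_| {
                        // Per-request reply channel with capacity 1.
                        let (reply_tx, reply_rx) = sync_channel(1);
                        tx.send(reply_tx).expect("writer alive");
                        reply_rx.recv().expect("writer replied")
                    })
                    .collect::<Vec<u64>>()
            })
        })
        .collect();
    drop(tx); // the writer exits once every appender's clone is dropped

    let mut seqs: Vec<u64> = appenders
        .into_iter()
        .flat_map(|h| h.join().expect("appender panicked"))
        .collect();
    writer.join().expect("writer panicked");
    seqs.sort_unstable();
    seqs
}
```

Sorting the collected numbers and comparing them with `1..=100` checks uniqueness and contiguity in a single assertion.
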
## Acceptance Criteria

- [x] Batch accumulates up to 100 events or 10 ms, then writes and fsyncs
- [x] One fsync per batch, not per event
- [x] Sequence numbers are monotonically increasing across the lifetime of the WAL
- [x] Concurrent appenders each receive the correct, unique sequence number for their event
- [x] Duplicate events receive `Ok(0)` and are dropped before joining the batch
- [x] `Shutdown` command flushes any partial batch before the thread exits
- [x] Segment rotation is transparent: sequence numbers continue without reset
- [x] `TruncateBefore` runs inside the writer thread to prevent races with active writes
- [x] Channel capacity of 10,000 provides backpressure under load without deadlock

## Research References

- [docs/research/tidaldb_wal.md](../../../research/tidaldb_wal.md): Section 3 (Pattern 4: crossbeam-channel with recv_deadline, full implementation sketch, comparison table), Section 5 (segment rotation strategy)
- [thoughts.md](../../../../thoughts.md): Part V.6 (group commit: batch fsync amortization)

## Implementation Notes

- Use `crossbeam::channel::bounded(1)` as the per-event reply channel; capacity 1 suffices because each append waits for exactly one reply.
- Use `recv_deadline`, not `recv_timeout`: `recv_deadline` takes a fixed instant, so batch accumulation does not restart the timeout on each event.
- Do NOT keep the reply channel senders in the batch after sending replies. Retaining them leaks memory if the writer thread is slow and the batch grows large.
- The writer thread name is `"tidaldb-wal-writer"`, visible in `top`, `htop`, and crash backtraces.
- `std::thread::Builder::new().name(...).spawn(...)` is used (not bare `thread::spawn`) so the name appears in panic messages.

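Spawning the named thread described in the last two notes might look like the sketch below. The helper name `spawn_named` is hypothetical; only `std::thread::Builder` is taken from the notes.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Spawns `body` on a named thread and returns its join handle.
/// Unlike `thread::spawn`, `Builder::spawn` reports OS-level spawn
/// failures as an `io::Error` instead of panicking.
pub fn spawn_named<F, T>(name: &str, body: F) -> io::Result<JoinHandle<T>>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new().name(name.to_string()).spawn(body)
}
```

A call such as `spawn_named("tidaldb-wal-writer", move || run_writer(...))` would give the writer its name. On Linux the kernel-visible thread name is limited to 15 bytes, so tools such as `top` may show a truncated form even though Rust panic messages carry the full name.
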