Implements the foundation of tidalDB's data pipeline:

**Phase 1 – Schema primitives**
- EntityId newtype (u64, big-endian ordering)
- SignalTypeDefinition with pre-computed decay λ, deduped/sorted windows
- SchemaBuilder with full constraint validation (duplicates, identifiers, half-life, windows, velocity)
- LumenError wrapping all subsystems with required From impls

**Phase 2 – Write-Ahead Log**
- Length-prefixed, BLAKE3-protected entry format
- Group-commit writer (batch up to 100 events / 10 ms)
- Double-buffered content-hash deduplication
- Checkpoint, truncation, and crash-recovery with full replay
- Integration, property, and UAT tests (incl. 5,500-event deterministic UAT)
- Proptest coverage scaled to 10 000 events/run (was ≤500) to meet acceptance criterion; cases reduced 100→10 to keep runtime comparable

**Phase 3 – Storage engine**
- StorageEngine trait (get/put/delete/scan/batch/flush)
- Key encoding: [EntityId][0x00][Tag][suffix] with ordering/prefix helpers
- InMemoryBackend (BTreeMap + RwLock)
- FjallStorage with three isolated keyspaces and atomic batch helper
- Property tests for key ordering and round-trip correctness

Also adds planning docs for phases 4-5, research docs, architecture overview, and roadmap updates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

222 lines
8.8 KiB
Markdown
# Task 01: WAL Wire Format and Segment Files

## Context

**Milestone:** 1 -- Signal Engine
**Phase:** m1p2 -- Write-Ahead Log
**Status:** COMPLETE
**Depends On:** None
**Blocks:** Task 02 (Group Commit Writer), Task 03 (Crash Recovery and Replay)
**Complexity:** M

## Objective

Define the on-disk binary format for WAL batches and event records, implement the segment file writer that manages 16 MB rotating files, and define the `WalError` type. This is the foundation everything else builds on: the format dictates how writers produce batches, how readers parse them, and how crash recovery validates them.

The key design decision (already resolved in `docs/research/tidaldb_wal.md`) is batch-oriented framing: frame entire batches rather than individual events. Each batch is a 64-byte, cache-line-aligned header carrying a BLAKE3 checksum, followed by tightly packed 21-byte event records. This matches the group-commit write path exactly and amortizes both checksum and fsync cost across the up to 100 events in a batch.

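
The amortization argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and "16 MB" is assumed to mean 16 MiB.

```rust
/// Sizes taken from the format described in this task.
pub const HEADER_SIZE: u64 = 64;
pub const EVENT_SIZE: u64 = 21;
/// Assumption: "16 MB" is read as 16 MiB.
pub const SEGMENT_SIZE: u64 = 16 * 1024 * 1024;

/// Total on-disk size of one batch frame holding `event_count` events.
pub fn frame_len(event_count: u16) -> u64 {
    HEADER_SIZE + u64::from(event_count) * EVENT_SIZE
}

/// Header bytes attributable to each event in the batch.
/// `event_count` is expected to be at least 1, as the format requires.
pub fn header_overhead_per_event(event_count: u16) -> f64 {
    HEADER_SIZE as f64 / f64::from(event_count)
}

/// Number of whole frames of `event_count` events that fit in one segment.
pub fn frames_per_segment(event_count: u16) -> u64 {
    SEGMENT_SIZE / frame_len(event_count)
}
```

With these figures a full 100-event batch occupies 64 + 100 × 21 = 2,164 bytes, so the header costs 0.64 bytes per event, and one checksum and one fsync cover the whole batch.
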
## Requirements

- `BatchHeader` is exactly 64 bytes (`#[repr(C)]`, compile-time assertion)
- Magic bytes `0x54494C44` (ASCII "TILD") at offset 0 for human-readable crash dumps
- BLAKE3 hash at bytes [32..64] covers `header[0..32] || all_event_bytes`, NOT the hash field itself
- `EventRecord` is exactly 21 bytes, little-endian throughout: entity_id (u64), signal_type (u8), weight (f32), timestamp_nanos (u64)
- `SegmentWriter` opens or creates a segment file and appends batches
- Segment files named `wal-{first_seq:020}.seg`: zero-padded to 20 digits (the width of `u64::MAX`), so lexicographic order equals numeric order
- `list_segments(dir)` returns `Vec<(first_seq, PathBuf)>` sorted by first sequence number
- `WalError` covers: `Io(std::io::Error)`, `Corruption(String)`, `Closed`, `SendFailed`, `ShutdownFailed`

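
One possible shape for the error type listed above is sketched here. Only the variant names and payloads come from the requirements; the doc comments on each variant, the `Display` messages, and the `file_len` helper are assumptions added for illustration.

```rust
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum WalError {
    /// Underlying filesystem or I/O failure.
    Io(io::Error),
    /// A batch or header failed validation (assumed meaning).
    Corruption(String),
    /// The WAL was used after being closed (assumed meaning).
    Closed,
    /// Handing work to the writer failed (assumed meaning).
    SendFailed,
    /// The writer did not shut down cleanly (assumed meaning).
    ShutdownFailed,
}

impl fmt::Display for WalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            WalError::Io(e) => write!(f, "WAL I/O error: {}", e),
            WalError::Corruption(msg) => write!(f, "WAL corruption: {}", msg),
            WalError::Closed => write!(f, "WAL is closed"),
            WalError::SendFailed => write!(f, "WAL send failed"),
            WalError::ShutdownFailed => write!(f, "WAL shutdown failed"),
        }
    }
}

impl std::error::Error for WalError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            WalError::Io(e) => Some(e),
            _ => None,
        }
    }
}

/// Lets `?` convert `std::io::Error` into `WalError::Io`.
impl From<io::Error> for WalError {
    fn from(e: io::Error) -> Self {
        WalError::Io(e)
    }
}

/// Hypothetical helper showing the `?` conversion in use.
pub fn file_len(path: &std::path::Path) -> Result<u64, WalError> {
    Ok(std::fs::metadata(path)?.len())
}
```
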
## Technical Design

### Wire Format

```
BATCH FRAME:
+============================================================================+
| Offset | Size | Field               | Encoding              | Notes        |
+--------+------+---------------------+-----------------------+--------------+
| 0      | 4    | Magic               | [0x54,0x49,0x4C,0x44] | ASCII "TILD" |
| 4      | 1    | Version             | u8                    | Currently 1  |
| 5      | 1    | Flags               | u8                    | Reserved (0) |
| 6      | 2    | Event count         | u16 LE                | 1..=65535    |
| 8      | 8    | First sequence no.  | u64 LE                | Monotonic    |
| 16     | 8    | Batch timestamp     | u64 LE                | Nanos epoch  |
| 24     | 4    | Payload byte length | u32 LE                | count * 21   |
| 28     | 4    | Reserved            | [0u8; 4]              | Future use   |
| 32     | 32   | BLAKE3 checksum     | [u8; 32]              | See below    |
+--------+------+---------------------+-----------------------+--------------+
| 64     | N*21 | Event records       | packed structs        |              |
+============================================================================+

BLAKE3 INPUT: blake3(header[0..32] || event_bytes[..])
  (hash covers magic through reserved; the hash field [32..64] is excluded)

EVENT RECORD (21 bytes each, tightly packed):
| Offset | Size | Field           | Encoding |
|--------|------|-----------------|----------|
| 0      | 8    | Entity ID       | u64 LE   |
| 8      | 1    | Signal type     | u8       |
| 9      | 4    | Weight          | f32 LE   |
| 13     | 8    | Timestamp nanos | u64 LE   |
```

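
The header table maps directly onto byte-slice writes. The following is a minimal sketch of an encoder and decoder for the 64-byte header, not the project's actual implementation: `WalError` is trimmed to the single variant needed here, and the `le_u16`/`le_u32`/`le_u64` helpers are hypothetical names.

```rust
pub const MAGIC: [u8; 4] = [0x54, 0x49, 0x4C, 0x44];
pub const HEADER_SIZE: usize = 64;
pub const FORMAT_VERSION: u8 = 1;

/// Trimmed stand-in for the real error type.
#[derive(Debug)]
pub enum WalError {
    Corruption(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct BatchHeader {
    pub event_count: u16,
    pub first_seq: u64,
    pub batch_timestamp_nanos: u64,
    pub payload_len: u32,
    pub checksum: [u8; 32],
}

// Hypothetical helpers: copy a fixed-width little-endian field out of a slice.
fn le_u16(b: &[u8]) -> u16 { let mut a = [0u8; 2]; a.copy_from_slice(b); u16::from_le_bytes(a) }
fn le_u32(b: &[u8]) -> u32 { let mut a = [0u8; 4]; a.copy_from_slice(b); u32::from_le_bytes(a) }
fn le_u64(b: &[u8]) -> u64 { let mut a = [0u8; 8]; a.copy_from_slice(b); u64::from_le_bytes(a) }

impl BatchHeader {
    pub fn encode(&self) -> [u8; HEADER_SIZE] {
        let mut buf = [0u8; HEADER_SIZE];
        buf[0..4].copy_from_slice(&MAGIC);
        buf[4] = FORMAT_VERSION;
        // buf[5] (flags) and buf[28..32] (reserved) stay zero.
        buf[6..8].copy_from_slice(&self.event_count.to_le_bytes());
        buf[8..16].copy_from_slice(&self.first_seq.to_le_bytes());
        buf[16..24].copy_from_slice(&self.batch_timestamp_nanos.to_le_bytes());
        buf[24..28].copy_from_slice(&self.payload_len.to_le_bytes());
        buf[32..64].copy_from_slice(&self.checksum);
        buf
    }

    pub fn decode(bytes: &[u8; HEADER_SIZE]) -> Result<Self, WalError> {
        if bytes[0..4] != MAGIC[..] {
            return Err(WalError::Corruption(format!("bad magic: {:02X?}", &bytes[0..4])));
        }
        if bytes[4] != FORMAT_VERSION {
            return Err(WalError::Corruption(format!("unsupported version: {}", bytes[4])));
        }
        let mut checksum = [0u8; 32];
        checksum.copy_from_slice(&bytes[32..64]);
        Ok(BatchHeader {
            event_count: le_u16(&bytes[6..8]),
            first_seq: le_u64(&bytes[8..16]),
            batch_timestamp_nanos: le_u64(&bytes[16..24]),
            payload_len: le_u32(&bytes[24..28]),
            checksum,
        })
    }
}
```

Returning a fixed-size array from `encode` means the 64-byte size is enforced by the type system rather than by a runtime check.
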
### Module Structure

```
tidal/src/wal/
  format.rs    -- BatchHeader, EventRecord: encode/decode
  segment.rs   -- SegmentWriter, list_segments
  error.rs     -- WalError
```

### Public API Surface

```rust
// Signatures only; function bodies are omitted.

// === format.rs ===

pub const MAGIC: [u8; 4] = [0x54, 0x49, 0x4C, 0x44]; // ASCII "TILD"
pub const HEADER_SIZE: usize = 64;
pub const EVENT_SIZE: usize = 21;
pub const FORMAT_VERSION: u8 = 1;

#[derive(Debug, Clone, PartialEq)]
pub struct BatchHeader {
    pub event_count: u16,
    pub first_seq: u64,
    pub batch_timestamp_nanos: u64,
    pub payload_len: u32,
    pub checksum: [u8; 32],
}

impl BatchHeader {
    pub fn encode(&self) -> [u8; HEADER_SIZE];
    pub fn decode(bytes: &[u8; HEADER_SIZE]) -> Result<Self, WalError>;
    pub fn compute_checksum(header_prefix: &[u8; 32], events: &[u8]) -> [u8; 32];
}

#[derive(Debug, Clone, PartialEq)]
pub struct EventRecord {
    pub entity_id: u64,
    pub signal_type: u8,
    pub weight: f32,
    pub timestamp_nanos: u64,
}

impl EventRecord {
    pub fn encode(&self) -> [u8; EVENT_SIZE];
    pub fn decode(bytes: &[u8; EVENT_SIZE]) -> Self;
}

// === segment.rs ===

use std::path::{Path, PathBuf};

pub struct SegmentWriter { /* file handle, current size, segment_size limit */ }

impl SegmentWriter {
    pub fn open(dir: &Path, first_seq: u64, segment_size: u64) -> Result<Self, WalError>;
    /// Append raw batch bytes. Returns true if segment is now full.
    pub fn append_batch(&mut self, bytes: &[u8]) -> Result<bool, WalError>;
    pub fn flush(&mut self) -> Result<(), WalError>;
    pub fn segment_size(&self) -> u64;
    pub fn current_size(&self) -> u64;
}

pub fn segment_path(dir: &Path, first_seq: u64) -> PathBuf;
pub fn list_segments(dir: &Path) -> Result<Vec<(u64, PathBuf)>, WalError>;
```

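
As an illustration of the two free functions in `segment.rs`, here is a minimal sketch using only the standard library. It returns `std::io::Result` instead of `WalError` to stay self-contained, silently skips files that do not match the naming scheme (an assumption about the desired behavior), and `parse_segment_name` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Build `dir/wal-{first_seq:020}.seg`.
pub fn segment_path(dir: &Path, first_seq: u64) -> PathBuf {
    dir.join(format!("wal-{:020}.seg", first_seq))
}

/// Hypothetical helper: recover the first sequence number from a file name.
/// Returns `None` for anything that is not `wal-<20 digits>.seg`.
fn parse_segment_name(name: &str) -> Option<u64> {
    let digits = name.strip_prefix("wal-")?.strip_suffix(".seg")?;
    if digits.len() != 20 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Fails (and yields None) if the 20 digits exceed u64::MAX.
    digits.parse().ok()
}

/// List segment files in `dir`, sorted ascending by first sequence number.
pub fn list_segments(dir: &Path) -> io::Result<Vec<(u64, PathBuf)>> {
    let mut segments = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let seq = path
            .file_name()
            .and_then(|n| n.to_str())
            .and_then(parse_segment_name);
        if let Some(seq) = seq {
            segments.push((seq, path));
        }
    }
    // read_dir makes no ordering guarantee, so sort explicitly.
    segments.sort_by_key(|(seq, _)| *seq);
    Ok(segments)
}
```

Sorting on the parsed number rather than on the file name keeps the result correct even if a differently padded name were ever present in the directory.
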
## Test Strategy

### Unit Tests

```rust
// Assumes the items from format.rs, segment.rs, and error.rs are in scope.
use std::path::Path;

#[test]
fn batch_header_roundtrip() {
    let header = BatchHeader {
        event_count: 42,
        first_seq: 1000,
        batch_timestamp_nanos: 1_700_000_000_000_000_000,
        payload_len: 42 * 21,
        checksum: [0xAB; 32],
    };
    let encoded = header.encode();
    let decoded = BatchHeader::decode(&encoded).unwrap();
    assert_eq!(header, decoded);
}

#[test]
fn event_record_roundtrip() {
    let event = EventRecord { entity_id: 999, signal_type: 3, weight: 2.5, timestamp_nanos: 42_000_000_000 };
    let encoded = event.encode();
    let decoded = EventRecord::decode(&encoded);
    assert_eq!(decoded.entity_id, 999);
    assert_eq!(decoded.signal_type, 3);
    // Compare bit patterns so the check is exact rather than float-approximate.
    assert_eq!(decoded.weight.to_bits(), 2.5_f32.to_bits());
    assert_eq!(decoded.timestamp_nanos, 42_000_000_000);
}

#[test]
fn magic_bytes_in_header() {
    let header = BatchHeader { event_count: 1, first_seq: 1, batch_timestamp_nanos: 0, payload_len: 21, checksum: [0u8; 32] };
    let encoded = header.encode();
    assert_eq!(&encoded[0..4], &[0x54, 0x49, 0x4C, 0x44]);
}

#[test]
fn segment_naming_is_ordered() {
    let p1 = segment_path(Path::new("/tmp"), 1);
    let p2 = segment_path(Path::new("/tmp"), 1000);
    // Lexicographic order matches numeric order
    assert!(p1.file_name() < p2.file_name());
}

#[test]
fn list_segments_returns_sorted() {
    let dir = tempfile::tempdir().unwrap();
    // Create segment files out of order
    std::fs::write(segment_path(dir.path(), 200), b"").unwrap();
    std::fs::write(segment_path(dir.path(), 1), b"").unwrap();
    std::fs::write(segment_path(dir.path(), 100), b"").unwrap();
    let segments = list_segments(dir.path()).unwrap();
    assert_eq!(segments.len(), 3);
    assert_eq!(segments[0].0, 1);
    assert_eq!(segments[1].0, 100);
    assert_eq!(segments[2].0, 200);
}

#[test]
fn header_decode_rejects_wrong_magic() {
    let mut bytes = [0u8; 64];
    bytes[0] = 0xFF; // wrong magic
    assert!(matches!(BatchHeader::decode(&bytes), Err(WalError::Corruption(_))));
}

#[test]
fn header_decode_rejects_wrong_version() {
    let mut bytes = [0u8; 64];
    bytes[0..4].copy_from_slice(&[0x54, 0x49, 0x4C, 0x44]); // correct magic
    bytes[4] = 99; // wrong version
    assert!(matches!(BatchHeader::decode(&bytes), Err(WalError::Corruption(_))));
}
```

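
For reference, an `EventRecord` implementation along the following lines would satisfy the round-trip test above. This is a sketch under the layout given in the wire format table, not the project's actual code; the `le_u32`/`le_u64` helpers are hypothetical names.

```rust
pub const EVENT_SIZE: usize = 21;

#[derive(Debug, Clone, PartialEq)]
pub struct EventRecord {
    pub entity_id: u64,
    pub signal_type: u8,
    pub weight: f32,
    pub timestamp_nanos: u64,
}

// Hypothetical helpers: copy a fixed-width little-endian field out of a slice.
fn le_u32(b: &[u8]) -> u32 { let mut a = [0u8; 4]; a.copy_from_slice(b); u32::from_le_bytes(a) }
fn le_u64(b: &[u8]) -> u64 { let mut a = [0u8; 8]; a.copy_from_slice(b); u64::from_le_bytes(a) }

impl EventRecord {
    pub fn encode(&self) -> [u8; EVENT_SIZE] {
        let mut buf = [0u8; EVENT_SIZE];
        buf[0..8].copy_from_slice(&self.entity_id.to_le_bytes());
        buf[8] = self.signal_type;
        // to_bits() reinterprets the float's bit pattern; no numeric conversion.
        buf[9..13].copy_from_slice(&self.weight.to_bits().to_le_bytes());
        buf[13..21].copy_from_slice(&self.timestamp_nanos.to_le_bytes());
        buf
    }

    pub fn decode(bytes: &[u8; EVENT_SIZE]) -> Self {
        EventRecord {
            entity_id: le_u64(&bytes[0..8]),
            signal_type: bytes[8],
            weight: f32::from_bits(le_u32(&bytes[9..13])),
            timestamp_nanos: le_u64(&bytes[13..21]),
        }
    }
}
```

Because the record is written field by field into a byte array, there is no struct padding to worry about, which is how the 8 + 1 + 4 + 8 = 21 byte size is achieved.
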
## Acceptance Criteria

- [x] `BatchHeader` encodes to exactly 64 bytes (compile-time assertion)
- [x] `EventRecord` encodes to exactly 21 bytes (compile-time assertion)
- [x] Magic bytes `0x54494C44` appear at bytes [0..4] of every encoded header
- [x] BLAKE3 checksum covers `header[0..32] || event_bytes` (excludes the hash field itself)
- [x] `BatchHeader::decode()` returns `WalError::Corruption` on wrong magic or unknown version
- [x] `EventRecord::encode`/`decode` roundtrip is lossless for all finite f32 weights
- [x] Segment files are named `wal-{seq:020}.seg`; `list_segments()` returns them sorted ascending
- [x] `SegmentWriter::append_batch()` writes raw bytes and returns `true` when the segment has exceeded its size limit
- [x] All encoding is little-endian, so there is no byte-swap cost on x86/ARM
- [x] `cargo clippy -- -D warnings` passes

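
The `SegmentWriter` behavior described in the criteria can be sketched as follows. Several details here are assumptions rather than facts from this task: "full" is taken to mean `current_size >= segment_size`, `flush` is mapped to `File::sync_data`, reopening an existing segment resumes at its current length, and errors are plain `std::io::Error` to keep the example self-contained.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

pub struct SegmentWriter {
    file: File,
    current_size: u64,
    segment_size: u64,
}

impl SegmentWriter {
    /// Open the segment for `first_seq` in append mode, creating it if needed.
    pub fn open(dir: &Path, first_seq: u64, segment_size: u64) -> io::Result<Self> {
        let path = dir.join(format!("wal-{:020}.seg", first_seq));
        let file = OpenOptions::new().create(true).append(true).open(&path)?;
        // Assumption: an existing segment resumes at its current length.
        let current_size = file.metadata()?.len();
        Ok(SegmentWriter { file, current_size, segment_size })
    }

    /// Append raw batch bytes. Returns true once the size limit is reached,
    /// signalling the caller to rotate to a new segment.
    pub fn append_batch(&mut self, bytes: &[u8]) -> io::Result<bool> {
        self.file.write_all(bytes)?;
        self.current_size += bytes.len() as u64;
        Ok(self.current_size >= self.segment_size)
    }

    /// Assumption: "flush" means forcing file data to stable storage.
    pub fn flush(&mut self) -> io::Result<()> {
        self.file.sync_data()
    }

    pub fn segment_size(&self) -> u64 {
        self.segment_size
    }

    pub fn current_size(&self) -> u64 {
        self.current_size
    }
}
```

In this sketch a batch is never split across segments: the write that crosses the limit completes in the current file, and rotation happens before the next batch.
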
## Research References

- [docs/research/tidaldb_wal.md](../../../research/tidaldb_wal.md): Section 1 (Approach 3: batch-oriented framing with wire format table), Section 5 (segment rotation at 16 MB, naming convention)

## Implementation Notes

- `payload_len` is always `event_count * 21`. The redundancy lets Phase 1 crash validation check bounds before computing BLAKE3, without reading the event data.
- The hash field at `header[32..64]` is written AFTER computing the hash. The hash input is exactly `header[0..32] || events`; the hash field is left out of the input rather than zero-filled, because hashing a zeroed 32-byte suffix would yield a different digest.
- `f32::to_bits()` / `f32::from_bits()` are used for weight encoding: safe, const, and exact. Never cast f32 to u32 via `as`, which performs a numeric conversion rather than reinterpreting the bits.
- Segment files do not need pre-allocation in m1p2. Defer `fallocate` until disk write performance is a measured bottleneck.
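
The first two notes can be sketched as a two-step check: a cheap bounds test that needs only header fields, followed by checksum verification over `header[0..32] || events`. The function names below are hypothetical, and the hash is passed in as a closure so the example builds with the standard library alone; in real code that closure would wrap BLAKE3.

```rust
pub const HEADER_SIZE: usize = 64;
pub const EVENT_SIZE: u32 = 21;

/// Step 1: structural check using header fields only (no hashing, no event reads).
/// `remaining` is the number of bytes left in the segment after the header.
pub fn check_bounds(event_count: u16, payload_len: u32, remaining: u64) -> Result<(), String> {
    if event_count == 0 {
        return Err("event count must be at least 1".to_string());
    }
    // u16::MAX * 21 fits comfortably in a u32, so this cannot overflow.
    let expected = u32::from(event_count) * EVENT_SIZE;
    if payload_len != expected {
        return Err(format!("payload_len {} != event_count * 21 = {}", payload_len, expected));
    }
    if u64::from(payload_len) > remaining {
        return Err(format!("batch truncated: need {} bytes, have {}", payload_len, remaining));
    }
    Ok(())
}

/// The checksum input: the first 32 header bytes followed by the event bytes.
/// The hash field header[32..64] is never part of the input.
pub fn checksum_input(header: &[u8; HEADER_SIZE], events: &[u8]) -> Vec<u8> {
    let mut input = Vec::with_capacity(32 + events.len());
    input.extend_from_slice(&header[0..32]);
    input.extend_from_slice(events);
    input
}

/// Step 2: recompute the digest and compare it with the stored hash field.
/// `hash` is a stand-in for BLAKE3 (an assumption made to avoid external crates).
pub fn verify<H>(header: &[u8; HEADER_SIZE], events: &[u8], hash: H) -> bool
where
    H: Fn(&[u8]) -> [u8; 32],
{
    hash(&checksum_input(header, events))[..] == header[32..64]
}
```

Because the hash field is excluded from the input, writing the digest into `header[32..64]` after computing it does not invalidate the digest.
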