tidaldb/docs/planning/milestone-8/phase-1/task-03-batch-header-v2.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
2026-02-24 13:17:19 -07:00


# Task 03: BatchHeader v2
## Delivers
Extend `BatchHeader` in `tidal/src/wal/format/batch.rs` to the v2 format by adding `shard_id` (bytes 60-61) and `region_id` (bytes 62-63); update encode/decode; preserve v1 backward compatibility (the zero padding v1 wrote at those bytes decodes as shard 0, region 0). Bump `FORMAT_VERSION` to 2.
## Complexity: S
## Dependencies
- Task 01 (ShardId, RegionId types)
## Technical Design
The existing `BatchHeader` is 64 bytes. The current layout (from WAL research doc):
```
Bytes 0-3: MAGIC (0x54494441 = "TIDA")
Bytes 4-7: FORMAT_VERSION (u32 LE)
Bytes 8-15: first_seq (u64 LE)
Bytes 16-23: last_seq (u64 LE)
Bytes 24-31: event_count (u64 LE)
Bytes 32-39: uncompressed_size (u64 LE)
Bytes 40-47: compressed_size (u64 LE)
Bytes 48-55: timestamp_ns (u64 LE)
Bytes 56-59: checksum (u32 LE) <- BLAKE3 first 4 bytes
Bytes 60-61: [RESERVED / ZERO]
Bytes 62-63: [RESERVED / ZERO]
```
v2 adds `shard_id` and `region_id` at the zero-padded bytes:
```
Bytes 56-59: checksum (u32 LE)
Bytes 60-61: shard_id (u16 LE) <- NEW in v2 (was zero padding in v1)
Bytes 62-63: region_id (u16 LE) <- NEW in v2 (was zero padding in v1)
```
This is backward compatible: v1 always wrote zeros at 60-63, so v2 code reading v1 segments correctly interprets shard_id=0, region_id=0.
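The compatibility argument can be checked in isolation. The sketch below is illustrative only: `read_shard_region` and `v1_style_header` are hypothetical helpers that are not part of the WAL code, and the buffer fills in only the version and checksum fields relevant to the argument.

```rust
/// Hypothetical helper: read the two v2 fields from a raw 64-byte header.
/// Returns (shard_id, region_id) as plain u16 values.
pub fn read_shard_region(buf: &[u8; 64]) -> (u16, u16) {
    let shard = u16::from_le_bytes([buf[60], buf[61]]);
    let region = u16::from_le_bytes([buf[62], buf[63]]);
    (shard, region)
}

/// Hypothetical helper: lay out bytes 4-7 and 56-63 the way a v1 writer
/// would, i.e. version 1, a checksum, then four bytes of zero padding.
pub fn v1_style_header(checksum: u32) -> [u8; 64] {
    let mut buf = [0u8; 64];
    buf[4..8].copy_from_slice(&1u32.to_le_bytes()); // FORMAT_VERSION = 1
    buf[56..60].copy_from_slice(&checksum.to_le_bytes());
    // Bytes 60..64 are left at zero, matching the v1 reserved padding.
    buf
}
```

Reading a v1-style buffer through the v2 offsets yields `(0, 0)`, which is the single-node default, so no version-specific branch is needed when decoding these two fields.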
```rust
// tidal/src/wal/format/batch.rs
pub const FORMAT_VERSION_V1: u32 = 1;
pub const FORMAT_VERSION_V2: u32 = 2;
pub const FORMAT_VERSION: u32 = FORMAT_VERSION_V2;

#[derive(Debug, Clone, PartialEq)]
pub struct BatchHeader {
    pub first_seq: u64,
    pub last_seq: u64,
    pub event_count: u64,
    pub uncompressed_size: u64,
    pub compressed_size: u64,
    pub timestamp_ns: u64,
    pub checksum: u32,
    // v2 fields -- default to 0 for single-node deployments
    pub shard_id: ShardId,
    pub region_id: RegionId,
}

impl BatchHeader {
    /// Encode to the 64-byte wire format.
    pub fn encode(&self) -> [u8; 64] {
        let mut buf = [0u8; 64];
        buf[0..4].copy_from_slice(&MAGIC.to_le_bytes());
        buf[4..8].copy_from_slice(&FORMAT_VERSION.to_le_bytes());
        buf[8..16].copy_from_slice(&self.first_seq.to_le_bytes());
        buf[16..24].copy_from_slice(&self.last_seq.to_le_bytes());
        buf[24..32].copy_from_slice(&self.event_count.to_le_bytes());
        buf[32..40].copy_from_slice(&self.uncompressed_size.to_le_bytes());
        buf[40..48].copy_from_slice(&self.compressed_size.to_le_bytes());
        buf[48..56].copy_from_slice(&self.timestamp_ns.to_le_bytes());
        buf[56..60].copy_from_slice(&self.checksum.to_le_bytes());
        buf[60..62].copy_from_slice(&self.shard_id.0.to_le_bytes());
        buf[62..64].copy_from_slice(&self.region_id.0.to_le_bytes());
        buf
    }

    /// Decode from a 64-byte buffer.
    ///
    /// Accepts both v1 (shard_id=0, region_id=0) and v2 format.
    pub fn decode(buf: &[u8; 64]) -> Result<Self, WalError> {
        let magic = u32::from_le_bytes(buf[0..4].try_into().unwrap());
        if magic != MAGIC {
            return Err(WalError::Corruption("bad magic".into()));
        }
        let version = u32::from_le_bytes(buf[4..8].try_into().unwrap());
        if version != FORMAT_VERSION_V1 && version != FORMAT_VERSION_V2 {
            return Err(WalError::Corruption(format!("unknown version {version}")));
        }
        let shard_id = ShardId(u16::from_le_bytes(buf[60..62].try_into().unwrap()));
        let region_id = RegionId(u16::from_le_bytes(buf[62..64].try_into().unwrap()));
        Ok(Self {
            first_seq: u64::from_le_bytes(buf[8..16].try_into().unwrap()),
            last_seq: u64::from_le_bytes(buf[16..24].try_into().unwrap()),
            event_count: u64::from_le_bytes(buf[24..32].try_into().unwrap()),
            uncompressed_size: u64::from_le_bytes(buf[32..40].try_into().unwrap()),
            compressed_size: u64::from_le_bytes(buf[40..48].try_into().unwrap()),
            timestamp_ns: u64::from_le_bytes(buf[48..56].try_into().unwrap()),
            checksum: u32::from_le_bytes(buf[56..60].try_into().unwrap()),
            shard_id,
            region_id,
        })
    }
}
```
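Because `decode` takes a `&[u8; 64]`, callers that hold a variable-length slice (for example, bytes read from a segment file) need to narrow it first. A minimal sketch of that step, assuming a hypothetical helper named `header_bytes` that does not exist in the codebase:

```rust
use std::convert::TryFrom;

/// Hypothetical helper: borrow the first 64 bytes of `raw` as a fixed-size
/// array reference. Returns `None` when fewer than 64 bytes are available,
/// e.g. for a truncated segment.
pub fn header_bytes(raw: &[u8]) -> Option<&[u8; 64]> {
    let head = raw.get(..64)?;
    <&[u8; 64]>::try_from(head).ok()
}
```

Returning `Option` keeps the length check out of `decode` itself, so the `try_into().unwrap()` calls inside `decode` operate only on sub-slices whose lengths are fixed by the constant ranges.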
## Acceptance Criteria
- [ ] `BatchHeader` has `shard_id: ShardId` and `region_id: RegionId` fields
- [ ] `BatchHeader::encode()` writes shard_id at bytes 60-61 (LE) and region_id at bytes 62-63 (LE)
- [ ] `BatchHeader::decode()` reads these bytes; v1 batches (zeros at 60-63) decode as `ShardId(0)`, `RegionId(0)`
- [ ] `FORMAT_VERSION` is bumped to 2; the v2 `decode()` accepts both v1 and v2 version values and rejects any other version
- [ ] Property test: encode + decode roundtrips for random shard_id, region_id values
- [ ] Property test: a buffer written by v1 code (bytes 60-63 zeroed) decodes with `ShardId(0)`, `RegionId(0)` and all other fields intact
- [ ] All existing WAL tests pass (write/read/recovery) -- single-node uses shard=0, region=0 by default
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass
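
The two property-test criteria above could be sketched roughly as follows. This is not the project's test code: `ShardId`, `RegionId`, and `MAGIC` are stand-ins re-declared so the block is self-contained, `MiniHeader` is a hypothetical trimmed header carrying only three fields, errors are plain `String`s rather than `WalError`, and a hand-rolled xorshift generator stands in for a property-testing crate such as `proptest`.

```rust
// Stand-ins for the real types; the actual definitions come from Task 01 and batch.rs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ShardId(pub u16);
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RegionId(pub u16);

pub const MAGIC: u32 = 0x5449_4441;
pub const FORMAT_VERSION_V1: u32 = 1;
pub const FORMAT_VERSION_V2: u32 = 2;

/// Hypothetical trimmed header with only the fields this sketch exercises.
#[derive(Debug, Clone, PartialEq)]
pub struct MiniHeader {
    pub checksum: u32,
    pub shard_id: ShardId,
    pub region_id: RegionId,
}

impl MiniHeader {
    /// Write magic, version 2, checksum, shard_id and region_id at the v2 offsets.
    pub fn encode(&self) -> [u8; 64] {
        let mut buf = [0u8; 64];
        buf[0..4].copy_from_slice(&MAGIC.to_le_bytes());
        buf[4..8].copy_from_slice(&FORMAT_VERSION_V2.to_le_bytes());
        buf[56..60].copy_from_slice(&self.checksum.to_le_bytes());
        buf[60..62].copy_from_slice(&self.shard_id.0.to_le_bytes());
        buf[62..64].copy_from_slice(&self.region_id.0.to_le_bytes());
        buf
    }

    /// Accept version 1 or 2; read the same offsets in both cases.
    pub fn decode(buf: &[u8; 64]) -> Result<Self, String> {
        let magic = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
        if magic != MAGIC {
            return Err("bad magic".to_string());
        }
        let version = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
        if version != FORMAT_VERSION_V1 && version != FORMAT_VERSION_V2 {
            return Err(format!("unknown version {version}"));
        }
        Ok(Self {
            checksum: u32::from_le_bytes([buf[56], buf[57], buf[58], buf[59]]),
            shard_id: ShardId(u16::from_le_bytes([buf[60], buf[61]])),
            region_id: RegionId(u16::from_le_bytes([buf[62], buf[63]])),
        })
    }
}

/// Tiny xorshift64 step so the sketch needs no external crates.
fn next_u64(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// True if encode followed by decode is the identity for `iterations` pseudo-random headers.
pub fn roundtrip_holds(seed: u64, iterations: u32) -> bool {
    let mut state = seed.max(1); // xorshift must not start at zero
    (0..iterations).all(|_| {
        let r = next_u64(&mut state);
        let header = MiniHeader {
            checksum: (r >> 32) as u32,
            shard_id: ShardId((r >> 16) as u16),
            region_id: RegionId(r as u16),
        };
        MiniHeader::decode(&header.encode()) == Ok(header)
    })
}

/// True if a v1-style buffer (version 1, bytes 60..64 zero) decodes to shard 0, region 0.
pub fn v1_buffer_decodes_to_zero() -> bool {
    let zeroed = MiniHeader { checksum: 0xABCD_1234, shard_id: ShardId(0), region_id: RegionId(0) };
    let mut buf = zeroed.encode();
    buf[4..8].copy_from_slice(&FORMAT_VERSION_V1.to_le_bytes());
    MiniHeader::decode(&buf) == Ok(zeroed)
}
```

In the real test suite the same two properties would run against the full `BatchHeader`, with every field randomized rather than just the three shown here.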