tidaldb/docs/planning/milestone-8/phase-1/task-03-batch-header-v2.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
Milestone 8 (phases 1-4):
- Shard-aware WAL segment naming, BatchHeader v2, ShardRouter
- Transport trait, InProcessTransport, WalShipper, FollowerDb
- HLC, PNCounter, LWWRegister, CrdtSignalState, ReconciliationEngine
- Session replication bridge with SeqNo/HWM, idempotency store

Forage application:
- Multi-source discovery engine with MAB exploration
- Embedding-based label system, server handlers, UI refresh

Other:
- QUICKSTART.md, README.md, milestone-8 planning docs
- Hard negative union semantics, RLHF export enhancements
- Recovery benchmark and visibility test expansions
- Split 8 oversized source files per CODING_GUIDELINES §9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 13:17:19 -07:00


Task 03: BatchHeader v2

Delivers

Extend BatchHeader in tidal/src/wal/format/batch.rs to the v2 format by adding shard_id and region_id fields at bytes 60-63 (the former v1 padding). Update encode/decode and keep v1 backward compatibility: the zero bytes in v1 headers decode as shard 0, region 0. Bumps FORMAT_VERSION to 2.

Complexity: S

Dependencies

  • Task 01 (ShardId, RegionId types)

Technical Design

The existing BatchHeader is a fixed 64 bytes. Its current (v1) layout, from the WAL research doc:

Bytes 0-3:   MAGIC (0x54494441 = "TIDA")
Bytes 4-7:   FORMAT_VERSION (u32 LE)
Bytes 8-15:  first_seq (u64 LE)
Bytes 16-23: last_seq (u64 LE)
Bytes 24-31: event_count (u64 LE)
Bytes 32-39: uncompressed_size (u64 LE)
Bytes 40-47: compressed_size (u64 LE)
Bytes 48-55: timestamp_ns (u64 LE)
Bytes 56-59: checksum (u32 LE)         <- BLAKE3 first 4 bytes
Bytes 60-61: [RESERVED / ZERO]
Bytes 62-63: [RESERVED / ZERO]
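A minimal sketch, using hypothetical constant names that do not come from batch.rs, that writes the table above down as offsets and checks at compile time that each field starts where the previous one ends and that the fields tile exactly 64 bytes:

```rust
// Hypothetical offset constants mirroring the v1 BatchHeader table.
// Each field occupies [OFFSET, OFFSET + size) in the 64-byte header.
pub const MAGIC_OFF: usize = 0; // u32
pub const VERSION_OFF: usize = 4; // u32
pub const FIRST_SEQ_OFF: usize = 8; // u64
pub const LAST_SEQ_OFF: usize = 16; // u64
pub const EVENT_COUNT_OFF: usize = 24; // u64
pub const UNCOMPRESSED_OFF: usize = 32; // u64
pub const COMPRESSED_OFF: usize = 40; // u64
pub const TIMESTAMP_OFF: usize = 48; // u64
pub const CHECKSUM_OFF: usize = 56; // u32
pub const RESERVED_OFF: usize = 60; // 4 bytes, always zero in v1
pub const HEADER_LEN: usize = 64;

// Compile-time checks: a layout mistake fails the build instead of a test.
const _: () = {
    assert!(VERSION_OFF == MAGIC_OFF + 4);
    assert!(FIRST_SEQ_OFF == VERSION_OFF + 4);
    assert!(LAST_SEQ_OFF == FIRST_SEQ_OFF + 8);
    assert!(EVENT_COUNT_OFF == LAST_SEQ_OFF + 8);
    assert!(UNCOMPRESSED_OFF == EVENT_COUNT_OFF + 8);
    assert!(COMPRESSED_OFF == UNCOMPRESSED_OFF + 8);
    assert!(TIMESTAMP_OFF == COMPRESSED_OFF + 8);
    assert!(CHECKSUM_OFF == TIMESTAMP_OFF + 8);
    assert!(RESERVED_OFF == CHECKSUM_OFF + 4);
    assert!(HEADER_LEN == RESERVED_OFF + 4);
};
```

The four reserved bytes at RESERVED_OFF are exactly the space v2 needs for two u16 fields, which is why the header size does not change.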

v2 places shard_id and region_id in the four zero-padded bytes at the end of the header; bytes 0-59 are unchanged:

Bytes 56-59: checksum (u32 LE)
Bytes 60-61: shard_id (u16 LE)    <- NEW in v2 (was zero padding in v1)
Bytes 62-63: region_id (u16 LE)   <- NEW in v2 (was zero padding in v1)

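This is backward compatible for reads: v1 always wrote zeros at bytes 60-63, so v2 code reading v1 segments interprets them as shard_id=0, region_id=0, which is the single-node default. Because each field is a u16, v2 can address at most 65,536 shards and 65,536 regions.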
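The compatibility argument depends only on bytes 60-63. A minimal sketch of reading and writing that trailer, with hypothetical helper names and plain u16 values standing in for the Task 01 ShardId/RegionId newtypes, shows why a zeroed v1 trailer comes back as shard 0, region 0 without any version check:

```rust
/// Hypothetical helper: write shard_id (bytes 60-61) and region_id
/// (bytes 62-63) into a 64-byte header, little-endian.
pub fn write_shard_region(buf: &mut [u8; 64], shard_id: u16, region_id: u16) {
    buf[60..62].copy_from_slice(&shard_id.to_le_bytes());
    buf[62..64].copy_from_slice(&region_id.to_le_bytes());
}

/// Hypothetical helper: read (shard_id, region_id) back out.
/// A v1 header always has zeros here, so it yields (0, 0).
pub fn read_shard_region(buf: &[u8; 64]) -> (u16, u16) {
    let shard_id = u16::from_le_bytes([buf[60], buf[61]]);
    let region_id = u16::from_le_bytes([buf[62], buf[63]]);
    (shard_id, region_id)
}
```

For example, writing shard 513 (0x0201) and region 7 stores the bytes 01 02 07 00 at offsets 60-63 and leaves bytes 0-59 untouched.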

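// tidal/src/wal/format/batch.rs
//
// Assumed in scope (defined elsewhere): the MAGIC u32 constant, WalError,
// and the Task 01 ShardId/RegionId newtypes, each wrapping a u16.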

pub const FORMAT_VERSION_V1: u32 = 1;
pub const FORMAT_VERSION_V2: u32 = 2;
pub const FORMAT_VERSION: u32 = FORMAT_VERSION_V2;

#[derive(Debug, Clone, PartialEq)]
pub struct BatchHeader {
    pub first_seq: u64,
    pub last_seq: u64,
    pub event_count: u64,
    pub uncompressed_size: u64,
    pub compressed_size: u64,
    pub timestamp_ns: u64,
    pub checksum: u32,
    // v2 fields -- default to 0 for single-node deployments
    pub shard_id: ShardId,
    pub region_id: RegionId,
}

impl BatchHeader {
    /// Encode to the 64-byte wire format.
    pub fn encode(&self) -> [u8; 64] {
        let mut buf = [0u8; 64];
        buf[0..4].copy_from_slice(&MAGIC.to_le_bytes());
        buf[4..8].copy_from_slice(&FORMAT_VERSION.to_le_bytes());
        buf[8..16].copy_from_slice(&self.first_seq.to_le_bytes());
        buf[16..24].copy_from_slice(&self.last_seq.to_le_bytes());
        buf[24..32].copy_from_slice(&self.event_count.to_le_bytes());
        buf[32..40].copy_from_slice(&self.uncompressed_size.to_le_bytes());
        buf[40..48].copy_from_slice(&self.compressed_size.to_le_bytes());
        buf[48..56].copy_from_slice(&self.timestamp_ns.to_le_bytes());
        buf[56..60].copy_from_slice(&self.checksum.to_le_bytes());
        buf[60..62].copy_from_slice(&self.shard_id.0.to_le_bytes());
        buf[62..64].copy_from_slice(&self.region_id.0.to_le_bytes());
        buf
    }

    /// Decode from a 64-byte buffer.
    ///
    /// Accepts both v1 (shard_id=0, region_id=0) and v2 format.
    pub fn decode(buf: &[u8; 64]) -> Result<Self, WalError> {
        let magic = u32::from_le_bytes(buf[0..4].try_into().unwrap());
        if magic != MAGIC {
            return Err(WalError::Corruption("bad magic".into()));
        }
        let version = u32::from_le_bytes(buf[4..8].try_into().unwrap());
        if version != FORMAT_VERSION_V1 && version != FORMAT_VERSION_V2 {
            return Err(WalError::Corruption(format!("unknown version {version}")));
        }

        let shard_id = ShardId(u16::from_le_bytes(buf[60..62].try_into().unwrap()));
        let region_id = RegionId(u16::from_le_bytes(buf[62..64].try_into().unwrap()));

        Ok(Self {
            first_seq: u64::from_le_bytes(buf[8..16].try_into().unwrap()),
            last_seq: u64::from_le_bytes(buf[16..24].try_into().unwrap()),
            event_count: u64::from_le_bytes(buf[24..32].try_into().unwrap()),
            uncompressed_size: u64::from_le_bytes(buf[32..40].try_into().unwrap()),
            compressed_size: u64::from_le_bytes(buf[40..48].try_into().unwrap()),
            timestamp_ns: u64::from_le_bytes(buf[48..56].try_into().unwrap()),
            checksum: u32::from_le_bytes(buf[56..60].try_into().unwrap()),
            shard_id,
            region_id,
        })
    }
}

Acceptance Criteria

  • BatchHeader has shard_id: ShardId and region_id: RegionId fields
  • BatchHeader::encode() writes shard_id at bytes 60-61 (LE) and region_id at bytes 62-63 (LE)
  • BatchHeader::decode() reads these bytes; v1 batches (zeros at 60-63) decode as ShardId(0), RegionId(0)
  • FORMAT_VERSION is bumped to 2; BatchHeader::decode() accepts both version 1 and version 2 headers and rejects any other version as corruption
  • Property test: encode followed by decode round-trips for random shard_id and region_id values
  • Property test: a buffer laid out by v1 code (version 1, bytes 60-63 zeroed) decodes with ShardId(0) and RegionId(0)
  • All existing WAL tests pass (write/read/recovery) -- single-node uses shard=0, region=0 by default
  • cargo clippy -- -D warnings and cargo fmt --check pass
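The two property-test criteria can be approximated without external crates. This self-contained sketch is not the project's test suite: MAGIC, WalError, and the ShardId/RegionId newtypes are stand-ins for the real definitions; sample_header, roundtrip_holds, and v1_buffer_decodes_as_zero_ids are hypothetical helpers; and an exhaustive sweep over every u16 value replaces a randomized proptest-style generator.

```rust
// Stand-ins for items defined elsewhere in tidal (assumed shapes).
const MAGIC: u32 = 0x5449_4441; // value from the layout table
const FORMAT_VERSION_V1: u32 = 1;
const FORMAT_VERSION_V2: u32 = 2;
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ShardId(pub u16); // stand-in for the Task 01 type
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RegionId(pub u16); // stand-in for the Task 01 type
#[derive(Debug, PartialEq)]
pub enum WalError {
    Corruption(String), // stand-in for tidal's WalError
}

#[derive(Debug, Clone, PartialEq)]
pub struct BatchHeader {
    pub first_seq: u64,
    pub last_seq: u64,
    pub event_count: u64,
    pub uncompressed_size: u64,
    pub compressed_size: u64,
    pub timestamp_ns: u64,
    pub checksum: u32,
    pub shard_id: ShardId,
    pub region_id: RegionId,
}

impl BatchHeader {
    pub fn encode(&self) -> [u8; 64] {
        let mut buf = [0u8; 64];
        buf[0..4].copy_from_slice(&MAGIC.to_le_bytes());
        buf[4..8].copy_from_slice(&FORMAT_VERSION_V2.to_le_bytes());
        let words = [
            self.first_seq, self.last_seq, self.event_count,
            self.uncompressed_size, self.compressed_size, self.timestamp_ns,
        ];
        // The six u64 fields occupy bytes 8..56, eight bytes each.
        for (i, w) in words.iter().enumerate() {
            buf[8 + 8 * i..16 + 8 * i].copy_from_slice(&w.to_le_bytes());
        }
        buf[56..60].copy_from_slice(&self.checksum.to_le_bytes());
        buf[60..62].copy_from_slice(&self.shard_id.0.to_le_bytes());
        buf[62..64].copy_from_slice(&self.region_id.0.to_le_bytes());
        buf
    }

    pub fn decode(buf: &[u8; 64]) -> Result<Self, WalError> {
        let u16_at = |o: usize| u16::from_le_bytes([buf[o], buf[o + 1]]);
        let u32_at = |o: usize| u32::from_le_bytes(buf[o..o + 4].try_into().unwrap());
        let u64_at = |o: usize| u64::from_le_bytes(buf[o..o + 8].try_into().unwrap());
        if u32_at(0) != MAGIC {
            return Err(WalError::Corruption("bad magic".into()));
        }
        let version = u32_at(4);
        if version != FORMAT_VERSION_V1 && version != FORMAT_VERSION_V2 {
            return Err(WalError::Corruption(format!("unknown version {version}")));
        }
        Ok(Self {
            first_seq: u64_at(8),
            last_seq: u64_at(16),
            event_count: u64_at(24),
            uncompressed_size: u64_at(32),
            compressed_size: u64_at(40),
            timestamp_ns: u64_at(48),
            checksum: u32_at(56),
            shard_id: ShardId(u16_at(60)),
            region_id: RegionId(u16_at(62)),
        })
    }
}

fn sample_header(shard_id: ShardId, region_id: RegionId) -> BatchHeader {
    BatchHeader {
        first_seq: 100, last_seq: 163, event_count: 64,
        uncompressed_size: 4096, compressed_size: 1024,
        timestamp_ns: 1_700_000_000_000_000_000, checksum: 0xDEAD_BEEF,
        shard_id, region_id,
    }
}

/// Round-trip criterion: sweep every u16 shard value, pairing each with its
/// bitwise complement so every region value is exercised exactly once too.
pub fn roundtrip_holds() -> bool {
    (0..=u16::MAX).all(|s| {
        let h = sample_header(ShardId(s), RegionId(!s));
        BatchHeader::decode(&h.encode()) == Ok(h)
    })
}

/// v1 criterion: version field 1 and bytes 60-63 zeroed, as a v1 writer leaves them.
pub fn v1_buffer_decodes_as_zero_ids() -> bool {
    let mut buf = sample_header(ShardId(9), RegionId(9)).encode();
    buf[4..8].copy_from_slice(&FORMAT_VERSION_V1.to_le_bytes());
    buf[60..64].fill(0);
    BatchHeader::decode(&buf) == Ok(sample_header(ShardId(0), RegionId(0)))
}
```

Because u16 has only 65,536 values, sweeping the full range of both fields is cheap, so an exhaustive check is a reasonable substitute for random sampling here. A real proptest would additionally randomize the other header fields.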