tidaldb/docs/planning/milestone-8/phase-1/task-03-batch-header-v2.md

# Task 03: BatchHeader v2

## Delivers

Extend `BatchHeader` in `tidal/src/wal/format/batch.rs` to the v2 format by adding `shard_id` and `region_id` fields at bytes 60-63; update encode/decode; keep v1 backward compatibility (the zeroed bytes in v1 headers decode as shard 0, region 0). Bumps `FORMAT_VERSION` to 2.

## Complexity: S

## Dependencies

- Task 01 (ShardId, RegionId types)

## Technical Design

The existing `BatchHeader` is a fixed 64-byte record. Its current (v1) layout, per the WAL research doc:

```
Bytes 0-3:   MAGIC (0x54494441 = "TIDA")
Bytes 4-7:   FORMAT_VERSION (u32 LE)
Bytes 8-15:  first_seq (u64 LE)
Bytes 16-23: last_seq (u64 LE)
Bytes 24-31: event_count (u64 LE)
Bytes 32-39: uncompressed_size (u64 LE)
Bytes 40-47: compressed_size (u64 LE)
Bytes 48-55: timestamp_ns (u64 LE)
Bytes 56-59: checksum (u32 LE)         <- BLAKE3 first 4 bytes
Bytes 60-61: [RESERVED / ZERO]
Bytes 62-63: [RESERVED / ZERO]
```

v2 reuses the four zero-padded bytes at 60-63 for `shard_id` and `region_id`; bytes 0-59 keep their v1 positions:

```
Bytes 56-59: checksum (u32 LE)
Bytes 60-61: shard_id (u16 LE)    <- NEW in v2 (was zero padding in v1)
Bytes 62-63: region_id (u16 LE)   <- NEW in v2 (was zero padding in v1)
```
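
One way to keep the hard-coded offsets in `encode`/`decode` from drifting is to name each field's byte range once and check at compile time that the ranges tile the 64-byte header with no gaps or overlaps. A minimal sketch; the constant names (`SHARD_ID_RANGE`, `FIELDS`, `HEADER_LEN`, and so on) are hypothetical and do not come from `batch.rs`:

```rust
use std::ops::Range;

// Hypothetical named byte ranges for the v2 BatchHeader layout.
pub const HEADER_LEN: usize = 64;
pub const MAGIC_RANGE: Range<usize> = 0..4;
pub const VERSION_RANGE: Range<usize> = 4..8;
pub const FIRST_SEQ_RANGE: Range<usize> = 8..16;
pub const LAST_SEQ_RANGE: Range<usize> = 16..24;
pub const EVENT_COUNT_RANGE: Range<usize> = 24..32;
pub const UNCOMPRESSED_SIZE_RANGE: Range<usize> = 32..40;
pub const COMPRESSED_SIZE_RANGE: Range<usize> = 40..48;
pub const TIMESTAMP_NS_RANGE: Range<usize> = 48..56;
pub const CHECKSUM_RANGE: Range<usize> = 56..60;
pub const SHARD_ID_RANGE: Range<usize> = 60..62; // v2; zero padding in v1
pub const REGION_ID_RANGE: Range<usize> = 62..64; // v2; zero padding in v1

/// Every field in wire order.
pub const FIELDS: [Range<usize>; 11] = [
    MAGIC_RANGE,
    VERSION_RANGE,
    FIRST_SEQ_RANGE,
    LAST_SEQ_RANGE,
    EVENT_COUNT_RANGE,
    UNCOMPRESSED_SIZE_RANGE,
    COMPRESSED_SIZE_RANGE,
    TIMESTAMP_NS_RANGE,
    CHECKSUM_RANGE,
    SHARD_ID_RANGE,
    REGION_ID_RANGE,
];

// Compile-time check: the first field starts at 0, each field starts where
// the previous one ended, and the last one ends exactly at HEADER_LEN.
const _: () = {
    let mut next_start = 0;
    let mut i = 0;
    while i < FIELDS.len() {
        assert!(FIELDS[i].start == next_start);
        next_start = FIELDS[i].end;
        i += 1;
    }
    assert!(next_start == HEADER_LEN);
};
```

With the ranges named, `encode` could write `buf[SHARD_ID_RANGE].copy_from_slice(&self.shard_id.0.to_le_bytes())`, and a layout typo (such as placing the new fields at 58-61) becomes a build error rather than a corrupt segment.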

This is backward compatible: v1 writers always zeroed bytes 60-63, so v2 code reading a v1 segment decodes those bytes as shard_id=0, region_id=0 without needing a version-specific branch.
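
The compatibility argument can be checked in isolation: build a header the way a v1 writer would (version 1, bytes 60-63 left at zero) and decode only the placement fields. A minimal sketch; `decode_placement` and `v1_style_header` are hypothetical helpers, and `MAGIC` is assumed to be the 0x54494441 value from the layout table:

```rust
const MAGIC: u32 = 0x5449_4441; // assumed value, from the layout table
const FORMAT_VERSION_V1: u32 = 1;
const FORMAT_VERSION_V2: u32 = 2;

/// Hypothetical helper: validates magic and version, then returns
/// (shard_id, region_id) read from bytes 60-61 and 62-63.
fn decode_placement(buf: &[u8; 64]) -> Result<(u16, u16), String> {
    let magic = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    if magic != MAGIC {
        return Err("bad magic".to_string());
    }
    let version = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    if version != FORMAT_VERSION_V1 && version != FORMAT_VERSION_V2 {
        return Err(format!("unknown version {version}"));
    }
    // The same bytes are read for both versions; v1 writers left them zeroed,
    // so a v1 header yields (0, 0) with no version-specific branch.
    let shard_id = u16::from_le_bytes([buf[60], buf[61]]);
    let region_id = u16::from_le_bytes([buf[62], buf[63]]);
    Ok((shard_id, region_id))
}

/// Hypothetical helper: a header as a v1 writer would emit it. Only magic,
/// version, and first_seq are filled in; bytes 60-63 stay zero.
fn v1_style_header(first_seq: u64) -> [u8; 64] {
    let mut buf = [0u8; 64];
    buf[0..4].copy_from_slice(&MAGIC.to_le_bytes());
    buf[4..8].copy_from_slice(&FORMAT_VERSION_V1.to_le_bytes());
    buf[8..16].copy_from_slice(&first_seq.to_le_bytes());
    buf
}
```

For example, `decode_placement(&v1_style_header(42))` returns `Ok((0, 0))`, while the same buffer with its version field set to 3 is rejected.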

```rust
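// tidal/src/wal/format/batch.rs
//
// Assumes `MAGIC` and `WalError` are already in scope in this module, and that
// `ShardId`/`RegionId` are the `u16` newtypes from Task 01.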

pub const FORMAT_VERSION_V1: u32 = 1;
pub const FORMAT_VERSION_V2: u32 = 2;
pub const FORMAT_VERSION: u32 = FORMAT_VERSION_V2;

#[derive(Debug, Clone, PartialEq)]
pub struct BatchHeader {
    pub first_seq: u64,
    pub last_seq: u64,
    pub event_count: u64,
    pub uncompressed_size: u64,
    pub compressed_size: u64,
    pub timestamp_ns: u64,
    pub checksum: u32,
    // v2 fields -- default to 0 for single-node deployments
    pub shard_id: ShardId,
    pub region_id: RegionId,
}

impl BatchHeader {
    /// Encode to the 64-byte wire format.
    pub fn encode(&self) -> [u8; 64] {
        let mut buf = [0u8; 64];
        buf[0..4].copy_from_slice(&MAGIC.to_le_bytes());
        buf[4..8].copy_from_slice(&FORMAT_VERSION.to_le_bytes());
        buf[8..16].copy_from_slice(&self.first_seq.to_le_bytes());
        buf[16..24].copy_from_slice(&self.last_seq.to_le_bytes());
        buf[24..32].copy_from_slice(&self.event_count.to_le_bytes());
        buf[32..40].copy_from_slice(&self.uncompressed_size.to_le_bytes());
        buf[40..48].copy_from_slice(&self.compressed_size.to_le_bytes());
        buf[48..56].copy_from_slice(&self.timestamp_ns.to_le_bytes());
        buf[56..60].copy_from_slice(&self.checksum.to_le_bytes());
        buf[60..62].copy_from_slice(&self.shard_id.0.to_le_bytes());
        buf[62..64].copy_from_slice(&self.region_id.0.to_le_bytes());
        buf
    }

    /// Decode from a 64-byte buffer.
    ///
    /// Accepts both v1 (shard_id=0, region_id=0) and v2 format.
    pub fn decode(buf: &[u8; 64]) -> Result<Self, WalError> {
        let magic = u32::from_le_bytes(buf[0..4].try_into().unwrap());
        if magic != MAGIC {
            return Err(WalError::Corruption("bad magic".into()));
        }
        let version = u32::from_le_bytes(buf[4..8].try_into().unwrap());
        if version != FORMAT_VERSION_V1 && version != FORMAT_VERSION_V2 {
            return Err(WalError::Corruption(format!("unknown version {version}")));
        }

        let shard_id = ShardId(u16::from_le_bytes(buf[60..62].try_into().unwrap()));
        let region_id = RegionId(u16::from_le_bytes(buf[62..64].try_into().unwrap()));

        Ok(Self {
            first_seq: u64::from_le_bytes(buf[8..16].try_into().unwrap()),
            last_seq: u64::from_le_bytes(buf[16..24].try_into().unwrap()),
            event_count: u64::from_le_bytes(buf[24..32].try_into().unwrap()),
            uncompressed_size: u64::from_le_bytes(buf[32..40].try_into().unwrap()),
            compressed_size: u64::from_le_bytes(buf[40..48].try_into().unwrap()),
            timestamp_ns: u64::from_le_bytes(buf[48..56].try_into().unwrap()),
            checksum: u32::from_le_bytes(buf[56..60].try_into().unwrap()),
            shard_id,
            region_id,
        })
    }
}
```

## Acceptance Criteria

- [ ] `BatchHeader` has `shard_id: ShardId` and `region_id: RegionId` fields
- [ ] `BatchHeader::encode()` writes shard_id at bytes 60-61 (LE) and region_id at bytes 62-63 (LE)
- [ ] `BatchHeader::decode()` reads these bytes; v1 batches (zeros at 60-63) decode as `ShardId(0)`, `RegionId(0)`
- [ ] `FORMAT_VERSION` is bumped to 2; the updated `decode()` accepts both version 1 and version 2 headers and rejects any other version
- [ ] Property test: encode + decode roundtrips for random shard_id, region_id values
- [ ] Property test: a v1-format buffer (version field 1, bytes 60-63 zeroed) decodes as `ShardId(0)`, `RegionId(0)` with all other fields unchanged
- [ ] All existing WAL tests pass (write/read/recovery) -- single-node uses shard=0, region=0 by default
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass
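
The two property tests above could look roughly like this. A minimal std-only sketch: a real suite would more likely use a property-testing crate such as `proptest`; here a small xorshift generator stands in for it, and `BatchHeader`, `ShardId`, `RegionId`, and `MAGIC` are condensed stand-ins for the crate's own definitions (with `String` in place of `WalError`) so the block compiles on its own. `properties_hold` is a hypothetical name.

```rust
// Stand-ins: MAGIC, the u16 newtypes, and String-for-WalError are assumptions.
const MAGIC: u32 = 0x5449_4441;
const FORMAT_VERSION_V1: u32 = 1;
const FORMAT_VERSION_V2: u32 = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShardId(pub u16);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RegionId(pub u16);

#[derive(Debug, Clone, PartialEq)]
pub struct BatchHeader {
    pub first_seq: u64,
    pub last_seq: u64,
    pub event_count: u64,
    pub uncompressed_size: u64,
    pub compressed_size: u64,
    pub timestamp_ns: u64,
    pub checksum: u32,
    pub shard_id: ShardId,
    pub region_id: RegionId,
}

/// Copies `N` bytes starting at `at` into an array for `from_le_bytes`.
fn take<const N: usize>(buf: &[u8; 64], at: usize) -> [u8; N] {
    buf[at..at + N].try_into().unwrap()
}

impl BatchHeader {
    /// Same layout as this task's design; always writes version 2.
    pub fn encode(&self) -> [u8; 64] {
        let mut buf = [0u8; 64];
        buf[0..4].copy_from_slice(&MAGIC.to_le_bytes());
        buf[4..8].copy_from_slice(&FORMAT_VERSION_V2.to_le_bytes());
        buf[8..16].copy_from_slice(&self.first_seq.to_le_bytes());
        buf[16..24].copy_from_slice(&self.last_seq.to_le_bytes());
        buf[24..32].copy_from_slice(&self.event_count.to_le_bytes());
        buf[32..40].copy_from_slice(&self.uncompressed_size.to_le_bytes());
        buf[40..48].copy_from_slice(&self.compressed_size.to_le_bytes());
        buf[48..56].copy_from_slice(&self.timestamp_ns.to_le_bytes());
        buf[56..60].copy_from_slice(&self.checksum.to_le_bytes());
        buf[60..62].copy_from_slice(&self.shard_id.0.to_le_bytes());
        buf[62..64].copy_from_slice(&self.region_id.0.to_le_bytes());
        buf
    }

    /// Accepts v1 and v2; v1's zeroed bytes 60-63 decode as shard 0, region 0.
    pub fn decode(buf: &[u8; 64]) -> Result<Self, String> {
        if u32::from_le_bytes(take(buf, 0)) != MAGIC {
            return Err("bad magic".to_string());
        }
        let version = u32::from_le_bytes(take(buf, 4));
        if version != FORMAT_VERSION_V1 && version != FORMAT_VERSION_V2 {
            return Err(format!("unknown version {version}"));
        }
        let word = |at| u64::from_le_bytes(take(buf, at));
        Ok(Self {
            first_seq: word(8),
            last_seq: word(16),
            event_count: word(24),
            uncompressed_size: word(32),
            compressed_size: word(40),
            timestamp_ns: word(48),
            checksum: u32::from_le_bytes(take(buf, 56)),
            shard_id: ShardId(u16::from_le_bytes(take(buf, 60))),
            region_id: RegionId(u16::from_le_bytes(take(buf, 62))),
        })
    }
}

/// xorshift64 PRNG standing in for proptest; the seed must be non-zero.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Checks both properties on `cases` random headers: encode/decode round-trip,
/// and the same header in v1 form decoding as ShardId(0)/RegionId(0).
pub fn properties_hold(cases: usize, seed: u64) -> bool {
    let mut s = seed;
    (0..cases).all(|_| {
        // Random v2 header: random bytes, then a valid magic and version.
        let mut buf = [0u8; 64];
        buf.chunks_mut(8).for_each(|c| c.copy_from_slice(&xorshift(&mut s).to_le_bytes()));
        buf[0..4].copy_from_slice(&MAGIC.to_le_bytes());
        buf[4..8].copy_from_slice(&FORMAT_VERSION_V2.to_le_bytes());
        let h = BatchHeader::decode(&buf).unwrap();
        // Property 1: encode reproduces the bytes and decodes back to h.
        let roundtrip = h.encode() == buf && BatchHeader::decode(&h.encode()) == Ok(h.clone());
        // Property 2: a v1 writer would emit version 1 with bytes 60-63 zeroed.
        buf[4..8].copy_from_slice(&FORMAT_VERSION_V1.to_le_bytes());
        buf[60..64].fill(0);
        let v1_expected = BatchHeader { shard_id: ShardId(0), region_id: RegionId(0), ..h };
        roundtrip && BatchHeader::decode(&buf) == Ok(v1_expected)
    })
}
```

Generating headers from random bytes rather than random field values keeps the sketch short and still covers the full `u16` range for both placement fields. In the crate itself these checks would presumably live in `#[cfg(test)]` functions that call `properties_hold` (or its `proptest` equivalent) and assert the result.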