# m8p1: Shard-Aware Foundations

## Delivers

The identity types, WAL segment tagging, and shard routing table that make tidalDB distribution-aware without introducing any network code. After this phase, every WAL segment carries a globally unique ID (`region_id:shard_id:seqno`), every entity operation is routable through a `ShardRouter`, and the existing single-node deployment works identically with the default shard_id=0 / region_id=0 configuration. This is the "build the atoms right" phase -- no new runtime behavior, but every data structure is distribution-ready.

Deliverables:

- `ShardId(u16)`, `RegionId(u16)`, `WalSegmentId { region_id, shard_id, seqno }` identity types
- WAL batch header v2: adds `shard_id` and `region_id` fields (backward-compatible; v1 readers skip unknown fields)
- `ShardRouter`: maps `EntityId -> ShardId` via configurable range boundaries
- `NodeConfig` extending `Config` with cluster role, shard assignment, region assignment
- `ReplicationState` tracking per-shard high-water-mark seqno for follower bookkeeping
- All existing tests pass unchanged (shard_id=0 is the default; single-node is shard 0)

## Dependencies

- **Requires:** M7 complete (WAL format v1, `BatchHeader`, `EventRecord`, `SegmentWriter`, `CheckpointManager`, `Config`, `StorageMode`)
- **Files modified:**
  - `tidal/src/wal/format/batch.rs` -- extend `BatchHeader` with shard/region fields
  - `tidal/src/wal/segment.rs` -- segment filename includes shard_id prefix for multi-shard directories
  - `tidal/src/db/config.rs` -- add `NodeConfig` with cluster fields
  - `tidal/src/wal/checkpoint.rs` -- checkpoint includes shard_id
- **Files created:**
  - `tidal/src/replication/mod.rs` -- module root
  - `tidal/src/replication/shard.rs` -- `ShardId`, `RegionId`, `ShardRouter`
  - `tidal/src/replication/segment_id.rs` -- `WalSegmentId`
  - `tidal/src/replication/state.rs` -- `ReplicationState`

## Research References

- `docs/research/tidaldb_wal.md` -- WAL segment format, batch header layout
- `thoughts.md` -- Part V.12 (subject-prefix key encoding for sharding)

## Acceptance Criteria (Phase Level)

- [ ] `ShardId(u16)` and `RegionId(u16)` are `Copy + Clone + Debug + Eq + Hash + Ord + Serialize + Deserialize`
- [ ] `WalSegmentId { region_id: RegionId, shard_id: ShardId, seqno: u64 }` has total ordering by `(region_id, shard_id, seqno)` and a human-readable `Display` impl producing `"r0:s0:42"`
- [ ] `BatchHeader` v2 adds `shard_id: u16` and `region_id: u16` at bytes 58-61 (within existing 64-byte header); `FORMAT_VERSION` bumped to 2; v1 batches decode as shard_id=0, region_id=0
- [ ] `ShardRouter::route(entity_id: EntityId) -> ShardId` returns the correct shard for hash-based routing; default single-shard config always returns `ShardId(0)`
- [ ] `ShardRouter` is constructable from a `Vec<(ShardId, EntityIdRange)>` with validation that ranges are non-overlapping and cover the full u64 space
- [ ] `NodeConfig` extends `Config` with `role: NodeRole`, `shard_id: ShardId`, `region_id: RegionId`, `peer_shards: Vec`; defaults produce a single-node config
- [ ] `ReplicationState` tracks `HashMap` (high-water-mark seqno per shard) with atomic reads/writes
- [ ] All existing M0-M7 tests pass without modification (single-node = shard 0, region 0)
- [ ] Segment filename format for multi-shard: `wal-s{shard_id:05}-{first_seq:020}.seg`; single-shard (shard_id=0) retains old format `wal-{first_seq:020}.seg` for backward compatibility
- [ ] Property test: 10,000 random EntityIds always route to exactly one shard; routing is a pure function of entity_id and shard_ranges

## Task Execution Order

```
Task 01: Identity Types ─────────┐
                                 ├──> Task 03: BatchHeader v2
Task 02: ShardRouter ────────────┤
                                 ├──> Task 04: Segment Naming
                                 │
                                 └──> Task 05: NodeConfig
                                           │
                                           v
                                      Task 06: ReplicationState
```

Tasks 01 and 02 are fully parallelizable. Tasks 03 and 04 depend on Task 01. Task 05 depends on both 01 and 02. Task 06 depends on 05.
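
The identity-type and range-routing criteria above can be sketched as follows. This is a minimal illustration, not the planned implementation: it assumes `EntityId` is a plain `u64`, models `EntityIdRange` as an inclusive `start..=end` pair, and introduces a hypothetical `RouterError` enum and `single_shard()` constructor. The `Serialize`/`Deserialize` derives are left out so the sketch has no external dependencies.

```rust
use std::fmt;

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ShardId(pub u16);

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct RegionId(pub u16);

/// Field order matters: the derived `Ord` compares region_id, then shard_id, then seqno.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct WalSegmentId {
    pub region_id: RegionId,
    pub shard_id: ShardId,
    pub seqno: u64,
}

impl fmt::Display for WalSegmentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "r{}:s{}:{}", self.region_id.0, self.shard_id.0, self.seqno)
    }
}

/// Assumption: entity IDs are plain u64 values in this sketch.
pub type EntityId = u64;

/// Hypothetical shape: an inclusive range of entity IDs.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct EntityIdRange {
    pub start: u64,
    pub end: u64,
}

/// Hypothetical error type for range validation.
#[derive(Debug, PartialEq, Eq)]
pub enum RouterError {
    Empty,
    InvertedRange,
    GapOrOverlap,
    IncompleteCoverage,
}

#[derive(Debug)]
pub struct ShardRouter {
    // Kept sorted by range start so `route` can binary-search.
    ranges: Vec<(ShardId, EntityIdRange)>,
}

impl ShardRouter {
    /// Default single-node routing: everything maps to shard 0.
    pub fn single_shard() -> Self {
        let full = EntityIdRange { start: 0, end: u64::MAX };
        ShardRouter { ranges: vec![(ShardId(0), full)] }
    }

    /// Validates that the ranges tile the whole u64 space with no gaps or overlaps.
    pub fn new(mut ranges: Vec<(ShardId, EntityIdRange)>) -> Result<Self, RouterError> {
        if ranges.is_empty() {
            return Err(RouterError::Empty);
        }
        if ranges.iter().any(|(_, r)| r.start > r.end) {
            return Err(RouterError::InvertedRange);
        }
        ranges.sort_by_key(|(_, r)| r.start);
        if ranges[0].1.start != 0 || ranges[ranges.len() - 1].1.end != u64::MAX {
            return Err(RouterError::IncompleteCoverage);
        }
        for pair in ranges.windows(2) {
            // Each range must begin exactly one past the previous range's end.
            if pair[0].1.end.checked_add(1) != Some(pair[1].1.start) {
                return Err(RouterError::GapOrOverlap);
            }
        }
        Ok(ShardRouter { ranges })
    }

    /// Pure function of `entity_id` and the validated ranges.
    pub fn route(&self, entity_id: EntityId) -> ShardId {
        // First range whose end is >= entity_id; always exists because the last end is u64::MAX.
        let idx = self.ranges.partition_point(|(_, r)| r.end < entity_id);
        self.ranges[idx].0
    }
}
```

Deriving `Ord` on `WalSegmentId` gives the `(region_id, shard_id, seqno)` ordering only because the fields are declared in that order, so reordering the struct fields would silently change the sort. Validating contiguity once in `new` is what lets `route` stay infallible.
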
## Module Location

| File | Status | Contains |
|------|--------|----------|
| `tidal/src/replication/mod.rs` | NEW | Module root, re-exports |
| `tidal/src/replication/shard.rs` | NEW | `ShardId`, `RegionId`, `ShardRouter`, `EntityIdRange` |
| `tidal/src/replication/segment_id.rs` | NEW | `WalSegmentId`, ordering, Display |
| `tidal/src/replication/state.rs` | NEW | `ReplicationState`, high-water-mark tracking |
| `tidal/src/wal/format/batch.rs` | MODIFIED | `BatchHeader` v2 with shard/region fields |
| `tidal/src/wal/segment.rs` | MODIFIED | Shard-aware segment filename |
| `tidal/src/wal/checkpoint.rs` | MODIFIED | Checkpoint includes shard_id |
| `tidal/src/db/config.rs` | MODIFIED | `NodeConfig`, `NodeRole` enum |
| `tidal/src/lib.rs` | MODIFIED | Add `pub mod replication;` |

## Notes

### Backward compatibility is non-negotiable

WAL v1 segments must be readable by v2 code. The 4 bytes at offsets 58-61 in the v1 header are currently zero-padding; v2 reinterprets them as shard_id and region_id. This is safe because v1 always wrote zeros there.

### Hash-based vs range-based routing

`ShardRouter` supports both: `hash(entity_id) % num_shards` for uniform distribution, and explicit range boundaries for production deployments. The trait abstracts the choice.

### No network code in this phase

Everything is in-process. The `replication` module defines data structures and routing logic only. The `Transport` trait is introduced in Phase 8.2.

## Done When

A developer can construct a `NodeConfig` with 3 regions and 5 shards per region, create a `ShardRouter` from range boundaries, route EntityIds to shards, construct a WAL `BatchHeader` v2 with shard/region tags, and all existing single-node tests pass unchanged.
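
The header tagging and segment naming rules described in the acceptance criteria and in the backward-compatibility note can be sketched on a raw 64-byte buffer. The function names here (`tag_header`, `read_tags`, `segment_file_name`) are hypothetical, and two layout details are assumptions the plan does not state: that `shard_id` occupies bytes 58-59 with `region_id` in bytes 60-61, and that both are little-endian.

```rust
/// Header size stated in the acceptance criteria.
pub const HEADER_LEN: usize = 64;

// Assumed layout within bytes 58-61: shard_id first, then region_id, both little-endian.
const SHARD_OFFSET: usize = 58;
const REGION_OFFSET: usize = 60;

/// Writes the shard and region tags into the former padding bytes of a header.
pub fn tag_header(header: &mut [u8; HEADER_LEN], shard_id: u16, region_id: u16) {
    header[SHARD_OFFSET..SHARD_OFFSET + 2].copy_from_slice(&shard_id.to_le_bytes());
    header[REGION_OFFSET..REGION_OFFSET + 2].copy_from_slice(&region_id.to_le_bytes());
}

/// Reads `(shard_id, region_id)`. A v1 header has zeros at these offsets,
/// so it decodes as `(0, 0)` without any version-specific branch.
pub fn read_tags(header: &[u8; HEADER_LEN]) -> (u16, u16) {
    let shard = u16::from_le_bytes([header[SHARD_OFFSET], header[SHARD_OFFSET + 1]]);
    let region = u16::from_le_bytes([header[REGION_OFFSET], header[REGION_OFFSET + 1]]);
    (shard, region)
}

/// Segment filename: shard 0 keeps the old format, other shards get an `s{shard_id}` prefix.
pub fn segment_file_name(shard_id: u16, first_seq: u64) -> String {
    if shard_id == 0 {
        format!("wal-{:020}.seg", first_seq)
    } else {
        format!("wal-s{:05}-{:020}.seg", shard_id, first_seq)
    }
}
```

Because the reader never inspects the format version to find the tags, the zero-padding guarantee from v1 is the only thing the compatibility argument rests on; if any v1 writer had left those bytes uninitialized, this decoding would misattribute batches.
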