m8p1: Shard-Aware Foundations
Delivers
The identity types, WAL segment tagging, and shard routing table that make
tidalDB distribution-aware without introducing any network code. After this
phase, every WAL segment carries a globally unique ID
(region_id:shard_id:seqno), every entity operation is routable through a
ShardRouter, and the existing single-node deployment works identically with
the default shard_id=0 / region_id=0 configuration. This is the "build the
atoms right" phase -- no new runtime behavior, but every data structure is
distribution-ready.
Deliverables:
- `ShardId(u16)`, `RegionId(u16)`, `WalSegmentId { region_id, shard_id, seqno }` identity types
- WAL batch header v2: adds `shard_id` and `region_id` fields (backward-compatible; v1 readers skip unknown fields)
- `ShardRouter`: maps `EntityId -> ShardId` via configurable range boundaries
- `NodeConfig` extending `Config` with cluster role, shard assignment, region assignment
- `ReplicationState` tracking per-shard high-water-mark seqno for follower bookkeeping
- All existing tests pass unchanged (shard_id=0 is the default; single-node is shard 0)
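A minimal std-only sketch of the identity types, assuming plain newtypes with public fields. Deriving `Ord` on `WalSegmentId` compares fields in declaration order, which gives the `(region_id, shard_id, seqno)` ordering required by the acceptance criteria; the `serde` derives are omitted so the snippet compiles without dependencies.

```rust
use std::fmt;

/// 16-bit shard identifier (sketch; the real type would also derive
/// serde's Serialize/Deserialize).
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ShardId(pub u16);

/// 16-bit region identifier.
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct RegionId(pub u16);

/// Globally unique WAL segment ID. Field order matters: the derived Ord
/// compares region_id first, then shard_id, then seqno.
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct WalSegmentId {
    pub region_id: RegionId,
    pub shard_id: ShardId,
    pub seqno: u64,
}

impl fmt::Display for WalSegmentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Renders as "r{region}:s{shard}:{seqno}", e.g. "r0:s0:42".
        write!(f, "r{}:s{}:{}", self.region_id.0, self.shard_id.0, self.seqno)
    }
}
```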
Dependencies
- Requires: M7 complete (WAL format v1, `BatchHeader`, `EventRecord`, `SegmentWriter`, `CheckpointManager`, `Config`, `StorageMode`)
- Files modified:
  - `tidal/src/wal/format/batch.rs` -- extend `BatchHeader` with shard/region fields
  - `tidal/src/wal/segment.rs` -- segment filename includes shard_id prefix for multi-shard directories
  - `tidal/src/db/config.rs` -- add `NodeConfig` with cluster fields
  - `tidal/src/wal/checkpoint.rs` -- checkpoint includes shard_id
- Files created:
  - `tidal/src/replication/mod.rs` -- module root
  - `tidal/src/replication/shard.rs` -- `ShardId`, `RegionId`, `ShardRouter`
  - `tidal/src/replication/segment_id.rs` -- `WalSegmentId`
  - `tidal/src/replication/state.rs` -- `ReplicationState`
Research References
- `docs/research/tidaldb_wal.md` -- WAL segment format, batch header layout
- `thoughts.md` -- Part V.12 (subject-prefix key encoding for sharding)
Acceptance Criteria (Phase Level)
- `ShardId(u16)` and `RegionId(u16)` are `Copy + Clone + Debug + Eq + Hash + Ord + Serialize + Deserialize`
- `WalSegmentId { region_id: RegionId, shard_id: ShardId, seqno: u64 }` has total ordering by `(region_id, shard_id, seqno)` and a human-readable `Display` impl producing `"r0:s0:42"`
- `BatchHeader` v2 adds `shard_id: u16` and `region_id: u16` at bytes 58-61 (within the existing 64-byte header); `FORMAT_VERSION` is bumped to 2; v1 batches decode as shard_id=0, region_id=0
- `ShardRouter::route(entity_id: EntityId) -> ShardId` returns the correct shard for hash-based routing; the default single-shard config always returns `ShardId(0)`
- `ShardRouter` is constructable from a `Vec<(ShardId, EntityIdRange)>`, with validation that the ranges are non-overlapping and cover the full u64 space
- `NodeConfig` extends `Config` with `role: NodeRole`, `shard_id: ShardId`, `region_id: RegionId`, `peer_shards: Vec<ShardId>`; defaults produce a single-node config
- `ReplicationState` tracks `HashMap<ShardId, u64>` (high-water-mark seqno per shard) with atomic reads/writes
- All existing M0-M7 tests pass without modification (single-node = shard 0, region 0)
- Segment filename format for multi-shard: `wal-s{shard_id:05}-{first_seq:020}.seg`; single-shard (shard_id=0) retains the old format `wal-{first_seq:020}.seg` for backward compatibility
- Property test: 10,000 random EntityIds always route to exactly one shard; routing is a pure function of entity_id and shard_ranges
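One way to produce both filename formats and to parse them back when scanning a WAL directory. The helper names `segment_filename` and `parse_segment_filename` are hypothetical; only the two format strings come from the criteria above.

```rust
/// Builds the WAL segment filename for a shard (hypothetical helper).
/// Shard 0 keeps the v1 name so existing single-node directories are unchanged.
pub fn segment_filename(shard_id: u16, first_seq: u64) -> String {
    if shard_id == 0 {
        format!("wal-{:020}.seg", first_seq)
    } else {
        format!("wal-s{:05}-{:020}.seg", shard_id, first_seq)
    }
}

/// Recovers (shard_id, first_seq) from either format; returns None for
/// files that are not WAL segments (hypothetical helper).
pub fn parse_segment_filename(name: &str) -> Option<(u16, u64)> {
    let stem = name.strip_prefix("wal-")?.strip_suffix(".seg")?;
    match stem.strip_prefix('s') {
        // Multi-shard form: s{shard:05}-{first_seq:020}
        Some(rest) => {
            let (shard, seq) = rest.split_once('-')?;
            Some((shard.parse().ok()?, seq.parse().ok()?))
        }
        // Legacy single-shard form: {first_seq:020}, implicitly shard 0
        None => Some((0, stem.parse().ok()?)),
    }
}
```

Because both fields are zero-padded to fixed widths, a plain lexicographic sort of filenames within one shard also sorts segments by `first_seq`.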
Task Execution Order
```
Task 01: Identity Types ─────────┐
                                 ├──> Task 03: BatchHeader v2
Task 02: ShardRouter ────────────┤
                                 ├──> Task 04: Segment Naming
                                 │
                                 └──> Task 05: NodeConfig
                                           │
                                           v
                                 Task 06: ReplicationState
```
Tasks 01 and 02 are fully parallelizable. Tasks 03 and 04 depend on Task 01. Task 05 depends on both 01 and 02. Task 06 depends on 05.
Module Location
| File | Status | Contains |
|---|---|---|
| `tidal/src/replication/mod.rs` | NEW | Module root, re-exports |
| `tidal/src/replication/shard.rs` | NEW | `ShardId`, `RegionId`, `ShardRouter`, `EntityIdRange` |
| `tidal/src/replication/segment_id.rs` | NEW | `WalSegmentId`, ordering, `Display` |
| `tidal/src/replication/state.rs` | NEW | `ReplicationState`, high-water-mark tracking |
| `tidal/src/wal/format/batch.rs` | MODIFIED | `BatchHeader` v2 with shard/region fields |
| `tidal/src/wal/segment.rs` | MODIFIED | Shard-aware segment filename |
| `tidal/src/wal/checkpoint.rs` | MODIFIED | Checkpoint includes shard_id |
| `tidal/src/db/config.rs` | MODIFIED | `NodeConfig`, `NodeRole` enum |
| `tidal/src/lib.rs` | MODIFIED | Add `pub mod replication;` |
Notes
Backward compatibility is non-negotiable
WAL v1 segments must be readable by v2 code. The 4 bytes at offsets 58-61 in the v1 header are currently zero-padding; v2 reinterprets them as shard_id and region_id. This is safe because v1 always wrote zeros there.
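A sketch of how v2 code could read and write the tags on a raw 64-byte header. Little-endian encoding, `shard_id` at bytes 58-59 and `region_id` at 60-61, and the helper names are assumptions; the sketch ignores `FORMAT_VERSION` and any header checksum. The point it illustrates is that decoding needs no version branch, since a v1 header's zero padding reads back as shard 0, region 0.

```rust
/// Size of the existing batch header in bytes.
pub const HEADER_LEN: usize = 64;
// Assumed layout: shard_id at 58..60, region_id at 60..62, little-endian.
const SHARD_OFFSET: usize = 58;
const REGION_OFFSET: usize = 60;

/// Writes the v2 shard/region tags; every other header byte is untouched.
pub fn write_shard_tags(header: &mut [u8; HEADER_LEN], shard_id: u16, region_id: u16) {
    header[SHARD_OFFSET..SHARD_OFFSET + 2].copy_from_slice(&shard_id.to_le_bytes());
    header[REGION_OFFSET..REGION_OFFSET + 2].copy_from_slice(&region_id.to_le_bytes());
}

/// Reads (shard_id, region_id). v1 wrote zeros in these bytes, so a v1
/// header decodes as (0, 0) through the same code path.
pub fn read_shard_tags(header: &[u8; HEADER_LEN]) -> (u16, u16) {
    let shard = u16::from_le_bytes([header[SHARD_OFFSET], header[SHARD_OFFSET + 1]]);
    let region = u16::from_le_bytes([header[REGION_OFFSET], header[REGION_OFFSET + 1]]);
    (shard, region)
}
```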
Hash-based vs range-based routing
ShardRouter supports both: hash(entity_id) % num_shards for uniform distribution, and explicit range boundaries for production deployments. A routing trait abstracts the choice, so callers use the same route call either way.
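A hedged sketch of both routing modes and of the range validation from the acceptance criteria. It uses an enum where the plan mentions a trait, assumes `EntityId` wraps a `u64` and `EntityIdRange` is an inclusive `start..=end` pair, and picks the splitmix64 finalizer as the fixed hash, because std's `DefaultHasher` does not promise stable output across Rust releases. The constructor names `single_shard`, `hashed`, and `from_ranges` are illustrative.

```rust
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ShardId(pub u16);
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct EntityId(pub u64);

/// Inclusive range [start, end] of entity ids (assumed shape).
#[derive(Copy, Clone, Debug)]
pub struct EntityIdRange { pub start: u64, pub end: u64 }

#[derive(Debug)]
pub enum ShardRouter {
    /// mix64(entity_id) % num_shards
    Hash { num_shards: u16 },
    /// Sorted, contiguous, non-overlapping ranges covering 0..=u64::MAX.
    Range(Vec<(ShardId, EntityIdRange)>),
}

/// splitmix64 finalizer: a fixed, portable mixer, so routing is a pure
/// function of the entity id across processes and Rust versions.
fn mix64(mut x: u64) -> u64 {
    x = (x ^ (x >> 30)).wrapping_mul(0xbf58476d1ce4e5b9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94d049bb133111eb);
    x ^ (x >> 31)
}

impl ShardRouter {
    /// Default single-node config: everything routes to ShardId(0).
    pub fn single_shard() -> Self { Self::hashed(1) }

    pub fn hashed(num_shards: u16) -> Self {
        assert!(num_shards > 0, "num_shards must be non-zero");
        ShardRouter::Hash { num_shards }
    }

    /// Rejects gaps, overlaps, inverted ranges, and incomplete coverage.
    pub fn from_ranges(mut ranges: Vec<(ShardId, EntityIdRange)>) -> Result<Self, String> {
        ranges.sort_by_key(|(_, r)| r.start);
        let mut next: u64 = 0; // first id not yet covered
        let mut done = false;  // set once u64::MAX is covered
        for (shard, r) in &ranges {
            if done || r.start != next || r.end < r.start {
                return Err(format!("gap, overlap, or inverted range at {:?}", shard));
            }
            if r.end == u64::MAX { done = true; } else { next = r.end + 1; }
        }
        if !done {
            return Err("ranges do not cover the full u64 space".into());
        }
        Ok(ShardRouter::Range(ranges))
    }

    pub fn route(&self, id: EntityId) -> ShardId {
        match self {
            ShardRouter::Hash { num_shards } => {
                ShardId((mix64(id.0) % u64::from(*num_shards)) as u16)
            }
            ShardRouter::Range(ranges) => {
                // The first range starts at 0, so at least one range
                // satisfies start <= id; pick the last such range.
                let idx = ranges.partition_point(|(_, r)| r.start <= id.0) - 1;
                ranges[idx].0
            }
        }
    }
}
```

Range routing is a binary search (`partition_point`) over the sorted boundaries, so a lookup costs O(log n) in the number of shards.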
No network code in this phase
Everything is in-process. The replication module defines data structures and routing logic only. The Transport trait is introduced in Phase 8.2.
Done When
A developer can construct a NodeConfig with 3 regions and 5 shards per region, create a ShardRouter from range boundaries, route EntityIds to shards, construct a WAL BatchHeader v2 with shard/region tags, and all existing single-node tests pass unchanged.
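A sketch of the cluster-facing half of `NodeConfig` and of `ReplicationState`, using one node from the 3-region, 5-shards-per-region scenario as the example. The `NodeRole` variants are hypothetical (the plan names the enum but not its members), the existing `Config` fields are omitted, and an empty `peer_shards` as the single-node default is an assumption. The state keeps one `AtomicU64` per shard in a map fixed at construction, so reads and writes are atomic without a lock, and `fetch_max` keeps each high-water mark monotonic.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ShardId(pub u16);
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct RegionId(pub u16);

/// Hypothetical variants; `Single` is the default single-node role.
#[derive(Copy, Clone, Debug, Default, PartialEq, Eq)]
pub enum NodeRole { #[default] Single, Leader, Follower }

/// Cluster fields only; the real NodeConfig also carries the existing Config.
/// Default yields a single node on shard 0, region 0, with no peers.
#[derive(Clone, Debug, Default)]
pub struct NodeConfig {
    pub role: NodeRole,
    pub shard_id: ShardId,
    pub region_id: RegionId,
    pub peer_shards: Vec<ShardId>,
}

/// Per-shard high-water-mark seqno for follower bookkeeping.
pub struct ReplicationState {
    hwm: HashMap<ShardId, AtomicU64>,
}

impl ReplicationState {
    pub fn new(shards: &[ShardId]) -> Self {
        Self { hwm: shards.iter().map(|s| (*s, AtomicU64::new(0))).collect() }
    }

    /// Current HWM, or None for a shard this node does not track.
    pub fn get(&self, shard: ShardId) -> Option<u64> {
        self.hwm.get(&shard).map(|v| v.load(Ordering::Acquire))
    }

    /// Advances the HWM; a stale (lower) seqno leaves it unchanged.
    /// Returns false if the shard is not tracked.
    pub fn advance(&self, shard: ShardId, seqno: u64) -> bool {
        match self.hwm.get(&shard) {
            Some(v) => {
                v.fetch_max(seqno, Ordering::AcqRel);
                true
            }
            None => false,
        }
    }
}
```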