tidaldb/docs/planning/milestone-8/phase-1/task-02-shard-router.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
2026-02-24 13:17:19 -07:00


# Task 02: ShardRouter
## Delivers
`ShardRouter` with `EntityIdRange` type, range-based and hash-based routing, validation that ranges partition the full u64 space, and property tests for deterministic routing. The `ShardRouter` maps any `EntityId` to exactly one `ShardId` and is the single source of truth for shard assignment.
## Complexity: M
## Dependencies
- Task 01 (ShardId, RegionId types)
## Technical Design
```rust
// tidal/src/replication/shard.rs
use crate::EntityId;
// `ShardId` comes from Task 01; bring it into scope from wherever that task defines it.

/// A contiguous, half-open range of EntityIds: [start, end).
///
/// Used to define shard boundaries in range-based routing.
#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
pub struct EntityIdRange {
    pub start: u64, // inclusive
    pub end: u64,   // exclusive; u64::MAX means "includes the last entity"
}

impl EntityIdRange {
    pub fn contains(&self, id: u64) -> bool {
        // `end == u64::MAX` is a sentinel: the range then also owns id u64::MAX,
        // which a strict `id < end` test would leave unrouted.
        id >= self.start && (id < self.end || self.end == u64::MAX)
    }

    /// The full u64 space (single-shard default).
    pub fn full() -> Self {
        Self { start: 0, end: u64::MAX }
    }
}
/// Routing strategy for entity-to-shard mapping.
#[derive(Debug, Clone)]
pub enum RoutingStrategy {
    /// All entities route to the default single shard.
    /// Used for single-node deployments (shard_id=0).
    Single,
    /// Hash-based routing: `hash(entity_id) % num_shards`.
    /// Uniform distribution; no explicit range boundaries.
    Hash { num_shards: u16 },
    /// Range-based routing: each shard owns a contiguous range of EntityIds.
    /// Production deployments use this for controlled data placement.
    Range(Vec<(ShardId, EntityIdRange)>),
}
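/// Routes EntityIds to ShardIds.
///
/// Immutable after construction, so it can be shared across threads. `Clone`
/// copies the strategy (for `Range`, the whole range list); wrap the router in
/// an `Arc` if it is cloned on a hot path.
#[derive(Debug, Clone)]
pub struct ShardRouter {
    strategy: RoutingStrategy,
}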
impl ShardRouter {
    /// Create a single-node router (always returns ShardId(0)).
    pub fn single() -> Self {
        Self { strategy: RoutingStrategy::Single }
    }

    /// Create a hash-based router with `num_shards` shards.
    pub fn hash(num_shards: u16) -> Result<Self, RouterError> {
        if num_shards == 0 {
            return Err(RouterError::ZeroShards);
        }
        Ok(Self { strategy: RoutingStrategy::Hash { num_shards } })
    }

    /// Create a range-based router from a list of (ShardId, EntityIdRange) pairs.
    ///
    /// Validates that:
    /// - Ranges are non-overlapping
    /// - Ranges cover the full u64 space (no gaps)
    /// - ShardIds are unique
    pub fn range(ranges: Vec<(ShardId, EntityIdRange)>) -> Result<Self, RouterError> {
        Self::validate_ranges(&ranges)?;
        Ok(Self { strategy: RoutingStrategy::Range(ranges) })
    }
    /// Route an EntityId to its owning ShardId.
    ///
    /// Always returns exactly one shard. Never panics.
    pub fn route(&self, entity_id: EntityId) -> ShardId {
        let id = entity_id.as_u64();
        match &self.strategy {
            RoutingStrategy::Single => ShardId::SINGLE,
            RoutingStrategy::Hash { num_shards } => {
                // FNV-1a hash for uniform distribution without dependencies.
                // Reduce modulo in u64 first: truncating the hash to u16 before
                // the modulo would discard the upper 48 bits of the hash.
                let hash = fnv1a_hash(id);
                ShardId((hash % u64::from(*num_shards)) as u16)
            }
            RoutingStrategy::Range(ranges) => {
                for (shard_id, range) in ranges {
                    if range.contains(id) {
                        return *shard_id;
                    }
                }
                // Invariant: validated at construction time that ranges cover
                // the full space, so this is unreachable.
                ShardId::SINGLE
            }
        }
    }
    /// Returns all ShardIds known to this router.
    pub fn all_shards(&self) -> Vec<ShardId> {
        match &self.strategy {
            RoutingStrategy::Single => vec![ShardId::SINGLE],
            RoutingStrategy::Hash { num_shards } => {
                (0..*num_shards).map(ShardId).collect()
            }
            RoutingStrategy::Range(ranges) => {
                let mut shards: Vec<_> = ranges.iter().map(|(s, _)| *s).collect();
                shards.sort();
                shards.dedup();
                shards
            }
        }
    }
    fn validate_ranges(ranges: &[(ShardId, EntityIdRange)]) -> Result<(), RouterError> {
        if ranges.is_empty() {
            return Err(RouterError::EmptyRanges);
        }

        // Each ShardId may own only one range.
        let mut ids: Vec<ShardId> = ranges.iter().map(|(s, _)| *s).collect();
        ids.sort();
        if let Some(pair) = ids.windows(2).find(|pair| pair[0] == pair[1]) {
            return Err(RouterError::DuplicateShard { shard: pair[0].0 });
        }

        // Sort by start position to check coverage and overlap.
        let mut sorted: Vec<_> = ranges.iter().collect();
        sorted.sort_by_key(|(_, r)| r.start);

        // Walk the sorted ranges: each must start exactly where the previous ended.
        let mut expected_start = 0u64;
        for (_, range) in &sorted {
            if range.start > expected_start {
                return Err(RouterError::Gap {
                    expected: expected_start,
                    found: range.start,
                });
            }
            if range.start < expected_start {
                return Err(RouterError::Overlap {
                    expected: expected_start,
                    found: range.start,
                });
            }
            if range.end <= range.start {
                return Err(RouterError::EmptyRange { start: range.start });
            }
            expected_start = range.end;
        }

        // Check coverage of full space.
        if expected_start != u64::MAX {
            return Err(RouterError::IncompleteCoverage { ends_at: expected_start });
        }
        Ok(())
    }
}

/// FNV-1a over the little-endian bytes of `value`.
#[inline]
fn fnv1a_hash(value: u64) -> u64 {
    const FNV_OFFSET: u64 = 14_695_981_039_346_656_037;
    const FNV_PRIME: u64 = 1_099_511_628_211;
    let mut hash = FNV_OFFSET;
    for byte in value.to_le_bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

#[derive(Debug, thiserror::Error)]
pub enum RouterError {
    #[error("shard count must be > 0")]
    ZeroShards,
    #[error("range list is empty")]
    EmptyRanges,
    #[error("gap in range: expected start {expected}, found {found}")]
    Gap { expected: u64, found: u64 },
    #[error("overlapping ranges: expected start {expected}, found {found}")]
    Overlap { expected: u64, found: u64 },
    #[error("shard {shard} is assigned more than one range")]
    DuplicateShard { shard: u16 },
    #[error("empty range starting at {start}")]
    EmptyRange { start: u64 },
    #[error("ranges don't cover full u64 space: ends at {ends_at}")]
    IncompleteCoverage { ends_at: u64 },
}
```
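The `Range` arm of `route` scans the range list linearly. If the list is stored sorted by `start` after validation (an assumption; the design above keeps it in caller order), the lookup could instead be a binary search. The sketch below is illustrative only: `route_sorted` is a hypothetical helper, and `ShardId` and `EntityIdRange` are local stand-ins so the block compiles on its own.

```rust
/// Stand-in for the Task 01 `ShardId` type (assumed to be a u16 newtype).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ShardId(pub u16);

/// Stand-in for `EntityIdRange` from the design above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EntityIdRange {
    pub start: u64, // inclusive
    pub end: u64,   // exclusive; u64::MAX also admits the last id
}

/// Hypothetical helper: find the owning shard in a table that is sorted by
/// `start` and has already passed validation (starts at 0, no gaps, no
/// overlaps, last range ends at u64::MAX).
pub fn route_sorted(table: &[(ShardId, EntityIdRange)], id: u64) -> Option<ShardId> {
    // Count the ranges whose start is <= id. Because the ranges partition the
    // space, the last of those ranges is the one that owns `id`.
    let idx = table.partition_point(|(_, r)| r.start <= id);
    // `idx` is 0 only for an empty table, which validation rejects.
    idx.checked_sub(1).map(|i| table[i].0)
}
```

Because the search keys on `start` alone, `u64::MAX` falls into the last range without a special case. Whether the extra sort is worthwhile depends on how many ranges a deployment has; for a handful of shards the linear scan is likely just as fast.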
## Acceptance Criteria
- [ ] `ShardRouter::single()` always returns `ShardId(0)` for any input
- [ ] `ShardRouter::hash(n)` distributes entities uniformly; property test with 10K IDs shows max deviation < 15% from expected bucket size
- [ ] `ShardRouter::range(ranges)` returns the correct shard for boundaries; property test with 10K random IDs within each range
- [ ] `RouterError::Gap` when ranges have a gap; `RouterError::IncompleteCoverage` when ranges don't reach u64::MAX
- [ ] `ShardRouter::all_shards()` returns all shards for each routing strategy
- [ ] Routing is a pure function: same input always returns same output (property test with proptest)
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass
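
The uniformity criterion above needs a concrete definition of "deviation". One possible reading, sketched here, is the largest relative distance of any bucket from an even split. `bucket_counts` and `max_relative_deviation` are hypothetical helper names, and the hash is a copy of the FNV-1a function from the design so the block stands alone.

```rust
/// FNV-1a over the little-endian bytes of `value`, as in the design above.
fn fnv1a_hash(value: u64) -> u64 {
    const FNV_OFFSET: u64 = 14_695_981_039_346_656_037;
    const FNV_PRIME: u64 = 1_099_511_628_211;
    value.to_le_bytes().iter().fold(FNV_OFFSET, |hash, byte| {
        (hash ^ u64::from(*byte)).wrapping_mul(FNV_PRIME)
    })
}

/// Count how many ids land in each of `num_shards` buckets.
/// `num_shards` must be non-zero (the router rejects zero at construction).
fn bucket_counts(ids: impl Iterator<Item = u64>, num_shards: u16) -> Vec<usize> {
    let mut counts = vec![0usize; usize::from(num_shards)];
    for id in ids {
        let bucket = (fnv1a_hash(id) % u64::from(num_shards)) as usize;
        counts[bucket] += 1;
    }
    counts
}

/// Largest relative distance of any bucket from the ideal even split.
/// Returns 0.0 for an empty slice.
fn max_relative_deviation(counts: &[usize]) -> f64 {
    let total: usize = counts.iter().sum();
    let expected = total as f64 / counts.len() as f64;
    counts
        .iter()
        .map(|&c| (c as f64 - expected).abs() / expected)
        .fold(0.0, f64::max)
}
```

Under this reading, the property test would compute `bucket_counts` over 10K IDs and assert `max_relative_deviation(&counts) < 0.15`. The ID distribution to sample from (sequential, random, or both) is left open by the criterion and should be decided when the test is written.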
## Test Strategy
```rust
#[cfg(test)]
mod tests {
    use super::*;
    use proptest::prelude::*;

    #[test]
    fn single_router_always_returns_shard_zero() {
        let router = ShardRouter::single();
        for id in [0u64, 1, 100, u64::MAX - 1, u64::MAX] {
            assert_eq!(router.route(EntityId::from(id)), ShardId(0));
        }
    }

    #[test]
    fn range_router_validates_gap() {
        let result = ShardRouter::range(vec![
            (ShardId(0), EntityIdRange { start: 0, end: 1000 }),
            (ShardId(1), EntityIdRange { start: 2000, end: u64::MAX }),
        ]);
        assert!(matches!(result, Err(RouterError::Gap { .. })));
    }

    proptest! {
        // `any::<u64>()` covers the whole domain, including u64::MAX,
        // which the half-open range `0u64..u64::MAX` would skip.
        #[test]
        fn hash_routing_is_deterministic(id in any::<u64>()) {
            let router = ShardRouter::hash(5).unwrap();
            let entity = EntityId::from(id);
            prop_assert_eq!(router.route(entity), router.route(entity));
        }

        #[test]
        fn hash_routing_stays_in_range(id in any::<u64>()) {
            let router = ShardRouter::hash(5).unwrap();
            let shard = router.route(EntityId::from(id));
            prop_assert!(shard.0 < 5);
        }
    }
}
```
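
The acceptance criteria call for correct routing at range boundaries, which the tests above do not yet exercise. A minimal sketch of how boundary probes might be generated follows, assuming `contains` honours the documented sentinel (`end == u64::MAX` includes the last entity). `boundary_probes` is a hypothetical helper and `EntityIdRange` is a local stand-in so the block is self-contained.

```rust
/// Stand-in for `EntityIdRange`, with the sentinel rule from its doc comment.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EntityIdRange {
    start: u64, // inclusive
    end: u64,   // exclusive; u64::MAX means "includes the last entity"
}

impl EntityIdRange {
    fn contains(&self, id: u64) -> bool {
        id >= self.start && (id < self.end || self.end == u64::MAX)
    }
}

/// Hypothetical helper: ids at and next to both edges of a non-empty range
/// (`end > start`), paired with whether the range should contain them.
fn boundary_probes(range: EntityIdRange) -> Vec<(u64, bool)> {
    // First and last ids strictly inside [start, end).
    let mut probes = vec![(range.start, true), (range.end - 1, true)];
    if range.start > 0 {
        // The id just below the range belongs to the previous shard.
        probes.push((range.start - 1, false));
    }
    if range.end == u64::MAX {
        // Sentinel case: the very last id is owned by this range.
        probes.push((u64::MAX, true));
    } else {
        // Otherwise `end` is the first id of the next shard.
        probes.push((range.end, false));
    }
    probes
}
```

A boundary test could then loop over the probes for each configured range and assert that `contains` (and, by extension, `route`) agrees with the expected flag.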