tidaldb/docs/planning/milestone-8/phase-1/task-02-shard-router.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
2026-02-24 13:17:19 -07:00


Task 02: ShardRouter

Delivers

ShardRouter with EntityIdRange type, range-based and hash-based routing, validation that ranges partition the full u64 space, and property tests for deterministic routing. The ShardRouter maps any EntityId to exactly one ShardId and is the single source of truth for shard assignment.

Complexity: M

Dependencies

  • Task 01 (ShardId, RegionId types)

Technical Design

// tidal/src/replication/shard.rs

use crate::EntityId;

/// A contiguous, half-open range of EntityIds: [start, end).
///
/// Used to define shard boundaries in range-based routing.
#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
pub struct EntityIdRange {
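    pub start: u64,  // inclusive
    pub end: u64,    // exclusive, except that u64::MAX also includes u64::MAX itself
}

impl EntityIdRange {
    /// Returns true if `id` falls in this range.
    ///
    /// A range ending at `u64::MAX` is treated as closed at the top so that
    /// the last possible id is routable; otherwise `end` is exclusive.
    pub fn contains(&self, id: u64) -> bool {
        id >= self.start && (id < self.end || self.end == u64::MAX)
    }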

    /// The full u64 space (single-shard default).
    pub fn full() -> Self {
        Self { start: 0, end: u64::MAX }
    }
}

/// Routing strategy for entity-to-shard mapping.
#[derive(Debug, Clone)]
pub enum RoutingStrategy {
    /// All entities route to the default single shard.
    /// Used for single-node deployments (shard_id=0).
    Single,

    /// Hash-based routing: `hash(entity_id) % num_shards`.
    /// Uniform distribution; no explicit range boundaries.
    Hash { num_shards: u16 },

    /// Range-based routing: each shard owns a contiguous range of EntityIds.
    /// Production deployments use this for controlled data placement.
    Range(Vec<(ShardId, EntityIdRange)>),
}

/// Routes EntityIds to ShardIds.
///
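/// Thread-safe because it is immutable after construction. Cloning copies the
/// strategy (including the range table), so wrap the router in an `Arc` to
/// share it cheaply across threads.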
#[derive(Debug, Clone)]
pub struct ShardRouter {
    strategy: RoutingStrategy,
}

impl ShardRouter {
    /// Create a single-node router (always returns ShardId(0)).
    pub fn single() -> Self {
        Self { strategy: RoutingStrategy::Single }
    }

    /// Create a hash-based router with `num_shards` shards.
    pub fn hash(num_shards: u16) -> Result<Self, RouterError> {
        if num_shards == 0 {
            return Err(RouterError::ZeroShards);
        }
        Ok(Self { strategy: RoutingStrategy::Hash { num_shards } })
    }

    /// Create a range-based router from a list of (ShardId, EntityIdRange) pairs.
    ///
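    /// Validates that:
    /// - Ranges are contiguous when sorted by `start`, which rules out both
    ///   gaps and overlaps (either is reported as `RouterError::Gap`)
    /// - No range is empty
    /// - Ranges cover the full u64 space
    ///
    /// ShardId uniqueness is not checked here; a shard may own several ranges.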
    pub fn range(ranges: Vec<(ShardId, EntityIdRange)>) -> Result<Self, RouterError> {
        Self::validate_ranges(&ranges)?;
        Ok(Self { strategy: RoutingStrategy::Range(ranges) })
    }

    /// Route an EntityId to its owning ShardId.
    ///
    /// Always returns exactly one shard. Never panics.
    pub fn route(&self, entity_id: EntityId) -> ShardId {
        let id = entity_id.as_u64();
        match &self.strategy {
            RoutingStrategy::Single => ShardId::SINGLE,
            RoutingStrategy::Hash { num_shards } => {
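                // FNV-1a hash for uniform distribution without dependencies.
                // Reduce the full 64-bit hash before narrowing: truncating to
                // u16 first would discard the high bits and skew the modulo.
                let hash = fnv1a_hash(id);
                ShardId((hash % u64::from(*num_shards)) as u16)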
            }
            RoutingStrategy::Range(ranges) => {
                for (shard_id, range) in ranges {
                    if range.contains(id) {
                        return *shard_id;
                    }
                }
                // Invariant: validated at construction time that ranges cover
                // the full space, so this is unreachable.
                ShardId::SINGLE
            }
        }
    }

    /// Returns all ShardIds known to this router.
    pub fn all_shards(&self) -> Vec<ShardId> {
        match &self.strategy {
            RoutingStrategy::Single => vec![ShardId::SINGLE],
            RoutingStrategy::Hash { num_shards } => {
                (0..*num_shards).map(ShardId).collect()
            }
            RoutingStrategy::Range(ranges) => {
                let mut shards: Vec<_> = ranges.iter().map(|(s, _)| *s).collect();
                shards.sort();
                shards.dedup();
                shards
            }
        }
    }

    fn validate_ranges(ranges: &[(ShardId, EntityIdRange)]) -> Result<(), RouterError> {
        if ranges.is_empty() {
            return Err(RouterError::EmptyRanges);
        }
        // Sort by start position to check coverage and overlap.
        let mut sorted: Vec<_> = ranges.iter().collect();
        sorted.sort_by_key(|(_, r)| r.start);

        // Check no gaps and no overlaps.
        let mut expected_start = 0u64;
        for (_, range) in &sorted {
            if range.start != expected_start {
                return Err(RouterError::Gap {
                    expected: expected_start,
                    found: range.start,
                });
            }
            if range.end <= range.start {
                return Err(RouterError::EmptyRange { start: range.start });
            }
            expected_start = range.end;
        }
        // Check coverage of full space.
        if expected_start != u64::MAX {
            return Err(RouterError::IncompleteCoverage { ends_at: expected_start });
        }
        Ok(())
    }
}

#[inline]
fn fnv1a_hash(value: u64) -> u64 {
    const FNV_OFFSET: u64 = 14_695_981_039_346_656_037;
    const FNV_PRIME: u64 = 1_099_511_628_211;
    let mut hash = FNV_OFFSET;
    let bytes = value.to_le_bytes();
    for byte in &bytes {
        hash ^= *byte as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

#[derive(Debug, thiserror::Error)]
pub enum RouterError {
    #[error("shard count must be > 0")]
    ZeroShards,
    #[error("range list is empty")]
    EmptyRanges,
    #[error("gap in range: expected start {expected}, found {found}")]
    Gap { expected: u64, found: u64 },
    #[error("empty range starting at {start}")]
    EmptyRange { start: u64 },
    #[error("ranges don't cover full u64 space: ends at {ends_at}")]
    IncompleteCoverage { ends_at: u64 },
}
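
The coverage walk in validate_ranges and the behaviour at the top of the ID space can be tried in isolation. The following is a minimal sketch, not the module itself: `Range` and `check_partition` are hypothetical stand-ins for EntityIdRange and validate_ranges, errors are plain strings instead of RouterError, and it assumes the reading in which a range ending at u64::MAX also owns u64::MAX.

```rust
/// Simplified stand-in for `EntityIdRange` (illustrative only).
#[derive(Debug, Clone, Copy)]
pub struct Range {
    pub start: u64, // inclusive
    pub end: u64,   // exclusive, except that u64::MAX is treated as inclusive
}

impl Range {
    /// Half-open membership test with the assumed top-of-space convention.
    pub fn contains(&self, id: u64) -> bool {
        id >= self.start && (id < self.end || self.end == u64::MAX)
    }
}

/// Hypothetical helper: checks that `ranges` tile the u64 space with no
/// gaps or overlaps, using the same sorted walk as `validate_ranges`.
pub fn check_partition(ranges: &[Range]) -> Result<(), String> {
    if ranges.is_empty() {
        return Err("range list is empty".to_string());
    }
    let mut sorted = ranges.to_vec();
    sorted.sort_by_key(|r| r.start);

    let mut expected_start = 0u64;
    for r in &sorted {
        if r.start != expected_start {
            // An overlap also lands here: the next range starts before
            // the previous one ended.
            return Err(format!(
                "expected start {}, found {}",
                expected_start, r.start
            ));
        }
        if r.end <= r.start {
            return Err(format!("empty range starting at {}", r.start));
        }
        expected_start = r.end;
    }
    if expected_start != u64::MAX {
        return Err(format!("coverage ends at {}", expected_start));
    }
    Ok(())
}
```

Because the walk only compares each `start` with the previous `end`, a gap and an overlap are indistinguishable to it; callers that need to tell them apart would have to compare `found` against `expected` in the reported error.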

Acceptance Criteria

  • ShardRouter::single() always returns ShardId(0) for any input
  • ShardRouter::hash(n) distributes entities uniformly; property test with 10K IDs shows max deviation < 15% from expected bucket size
  • ShardRouter::range(ranges) returns the correct shard for boundaries; property test with 10K random IDs within each range
  • ShardRouter::range(ranges) returns RouterError::Gap when the ranges have a gap, and RouterError::IncompleteCoverage when they don't reach u64::MAX
  • ShardRouter::all_shards() returns all shards for each routing strategy
  • Routing is a pure function: same input always returns same output (property test with proptest)
  • cargo clippy -- -D warnings and cargo fmt --check pass
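
The uniformity criterion above can be measured without the full router. A minimal sketch follows, assuming sequential IDs `0..num_ids` as the sample (the criterion does not say how the 10K IDs are chosen) and reducing the 64-bit hash modulo the shard count; `bucket_counts` and `max_relative_deviation` are hypothetical helper names, not part of the design.

```rust
/// FNV-1a over the little-endian bytes of `value`, as in the design above.
fn fnv1a_hash(value: u64) -> u64 {
    const FNV_OFFSET: u64 = 14_695_981_039_346_656_037;
    const FNV_PRIME: u64 = 1_099_511_628_211;
    let mut hash = FNV_OFFSET;
    for byte in value.to_le_bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Counts how many of the IDs `0..num_ids` land in each shard bucket.
/// Assumes `num_shards > 0`, which `ShardRouter::hash` already enforces.
pub fn bucket_counts(num_shards: u16, num_ids: u64) -> Vec<u64> {
    let mut counts = vec![0u64; usize::from(num_shards)];
    for id in 0..num_ids {
        let shard = fnv1a_hash(id) % u64::from(num_shards);
        counts[shard as usize] += 1;
    }
    counts
}

/// Largest relative deviation of any bucket from the ideal bucket size.
/// A value of 0.15 corresponds to the 15% bound in the criterion.
pub fn max_relative_deviation(counts: &[u64]) -> f64 {
    let total: u64 = counts.iter().sum();
    let expected = total as f64 / counts.len() as f64;
    counts
        .iter()
        .map(|&c| (c as f64 - expected).abs() / expected)
        .fold(0.0, f64::max)
}
```

A test would then assert something like `max_relative_deviation(&bucket_counts(5, 10_000)) < 0.15`. The bound is easier to meet with few shards; with many shards and only 10K IDs the expected bucket size shrinks and sampling noise alone may approach the threshold.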

Test Strategy

#[cfg(test)]
mod tests {
    use super::*;
    use proptest::prelude::*;

    #[test]
    fn single_router_always_returns_shard_zero() {
        let router = ShardRouter::single();
        for id in [0u64, 1, 100, u64::MAX - 1, u64::MAX] {
            assert_eq!(router.route(EntityId::from(id)), ShardId(0));
        }
    }

    #[test]
    fn range_router_validates_gap() {
        let result = ShardRouter::range(vec![
            (ShardId(0), EntityIdRange { start: 0, end: 1000 }),
            (ShardId(1), EntityIdRange { start: 2000, end: u64::MAX }),
        ]);
        assert!(matches!(result, Err(RouterError::Gap { .. })));
    }

    proptest! {
        // `any::<u64>()` also generates u64::MAX, which `0u64..u64::MAX` excludes.
        #[test]
        fn hash_routing_is_deterministic(id in any::<u64>()) {
            let router = ShardRouter::hash(5).unwrap();
            let entity = EntityId::from(id);
            prop_assert_eq!(router.route(entity), router.route(entity));
        }

        #[test]
        fn hash_routing_stays_in_range(id in any::<u64>()) {
            let router = ShardRouter::hash(5).unwrap();
            let shard = router.route(EntityId::from(id));
            prop_assert!(shard.0 < 5);
        }
    }
}
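
The tests above do not yet exercise the range-boundary criterion. One possible shape for that check is sketched below with the standard library only. `route_range` and `first_boundary_mismatch` are hypothetical helpers that work on `(shard, start, end)` tuples rather than the real ShardRouter, and they assume that a range ending at u64::MAX also owns u64::MAX.

```rust
/// Hypothetical stand-in for range routing: returns the shard of the first
/// range whose half-open interval [start, end) holds `id`. A range ending
/// at u64::MAX is assumed to also own u64::MAX itself.
pub fn route_range(ranges: &[(u16, u64, u64)], id: u64) -> Option<u16> {
    ranges
        .iter()
        .find(|&&(_, start, end)| id >= start && (id < end || end == u64::MAX))
        .map(|&(shard, _, _)| shard)
}

/// Probes the first and last ID of every range and returns the first
/// (shard, id) pair that routes somewhere other than its own shard.
pub fn first_boundary_mismatch(ranges: &[(u16, u64, u64)]) -> Option<(u16, u64)> {
    for &(shard, start, end) in ranges {
        // Last ID owned by this range under the convention above.
        let last = if end == u64::MAX {
            u64::MAX
        } else {
            end.saturating_sub(1)
        };
        for id in [start, last] {
            if route_range(ranges, id) != Some(shard) {
                return Some((shard, id));
            }
        }
    }
    None
}
```

In the real test module the same probe would call `ShardRouter::range(...)` and `router.route(EntityId::from(id))`, with proptest supplying the random IDs inside each range.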