tidaldb/docs/planning/milestone-8/phase-1/task-04-segment-naming.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
Milestone 8 (phases 1-4):
- Shard-aware WAL segment naming, BatchHeader v2, ShardRouter
- Transport trait, InProcessTransport, WalShipper, FollowerDb
- HLC, PNCounter, LWWRegister, CrdtSignalState, ReconciliationEngine
- Session replication bridge with SeqNo/HWM, idempotency store

Forage application:
- Multi-source discovery engine with MAB exploration
- Embedding-based label system, server handlers, UI refresh

Other:
- QUICKSTART.md, README.md, milestone-8 planning docs
- Hard negative union semantics, RLHF export enhancements
- Recovery benchmark and visibility test expansions
- Split 8 oversized source files per CODING_GUIDELINES §9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 13:17:19 -07:00


Task 04: Shard-Aware Segment Naming

Delivers

Update segment_filename() in tidal/src/wal/segment.rs to support shard-prefixed filenames. Supersede parse_segment_seq() with parse_segment_filename(), which returns the shard ID along with the first sequence number. Single-shard deployments (shard_id=0) keep the existing filename format for backward compatibility. Multi-shard deployments use a shard-prefixed format.

Complexity: S

Dependencies

  • Task 01 (ShardId type)

Technical Design

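// tidal/src/wal/segment.rs

use std::fs;
use std::path::{Path, PathBuf};

// ShardId comes from Task 01. WalError is assumed to implement
// From<std::io::Error> so that `?` works on fs::read_dir results.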

/// Generate the WAL segment filename for a given shard and sequence number.
///
/// Single-shard (shard_id=0): `wal-{first_seq:020}.seg`
///   -- matches existing format, full backward compatibility
///
/// Multi-shard (shard_id > 0): `wal-s{shard_id:05}-{first_seq:020}.seg`
///   -- includes shard prefix for disambiguation in shared WAL directories
pub fn segment_filename(shard_id: ShardId, first_seq: u64) -> String {
    if shard_id == ShardId::SINGLE {
        format!("wal-{first_seq:020}.seg")
    } else {
        format!("wal-s{:05}-{:020}.seg", shard_id.0, first_seq)
    }
}

/// Parse the shard ID and first_seq from a WAL segment filename.
///
/// Accepts both formats:
///   - `wal-{first_seq:020}.seg` (single-shard, v1)
///   - `wal-s{shard_id:05}-{first_seq:020}.seg` (multi-shard, v2)
///
/// Returns `(ShardId, first_seq)`.
pub fn parse_segment_filename(filename: &str) -> Option<(ShardId, u64)> {
    let name = filename.strip_suffix(".seg")?;

    // Multi-shard format: wal-s{shard_id}-{first_seq}
    if let Some(rest) = name.strip_prefix("wal-s") {
        let dash = rest.find('-')?;
        let shard_id: u16 = rest[..dash].parse().ok()?;
        let first_seq: u64 = rest[dash + 1..].parse().ok()?;
        return Some((ShardId(shard_id), first_seq));
    }

    // Single-shard format: wal-{first_seq}
    if let Some(seq_str) = name.strip_prefix("wal-") {
        let first_seq: u64 = seq_str.parse().ok()?;
        return Some((ShardId::SINGLE, first_seq));
    }

    None
}

/// Scan a directory for WAL segments belonging to `shard_id`, sorted by first_seq.
///
/// Only segments whose parsed shard equals `shard_id` are returned. In a
/// single-shard deployment every segment parses as ShardId::SINGLE, so this
/// returns all segments. In a shared multi-shard directory it also stops
/// ShardId::SINGLE from picking up other shards' prefixed segments.
pub fn list_segments_for_shard(
    dir: &Path,
    shard_id: ShardId,
) -> Result<Vec<(u64, PathBuf)>, WalError> {
    let mut segments = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_name = entry.file_name();
        // Non-UTF-8 names become lossy strings and simply fail to parse.
        let name = file_name.to_string_lossy();
        if let Some((seg_shard, seq)) = parse_segment_filename(&name) {
            if seg_shard == shard_id {
                segments.push((seq, entry.path()));
            }
        }
    }
    segments.sort_by_key(|(seq, _)| *seq);
    Ok(segments)
}
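The parser above is lenient. `str::parse` also accepts unpadded or `+`-signed digits, so a name like `wal-42.seg` maps to the same `(ShardId, first_seq)` key as the canonical 20-digit name. A `wal-s00000-{seq}.seg` name would likewise collide with the v1 name for shard 0. The following is a minimal sketch of a stricter variant. It assumes a simple `ShardId(pub u16)` newtype with a `SINGLE` constant, which is a hypothetical stand-in for the Task 01 type. `parse_segment_filename_strict`, `parse_fixed`, and `select_segments` are illustrative names and are not part of the plan. `select_segments` applies the same filter-and-sort as `list_segments_for_shard` to an in-memory list of names, so it can be exercised without a filesystem.

```rust
use std::path::PathBuf;
use std::str::FromStr;

/// Hypothetical stand-in for the Task 01 `ShardId` newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardId(pub u16);

impl ShardId {
    pub const SINGLE: ShardId = ShardId(0);
}

/// Same naming rule as the plan's `segment_filename`.
pub fn segment_filename(shard_id: ShardId, first_seq: u64) -> String {
    if shard_id == ShardId::SINGLE {
        format!("wal-{first_seq:020}.seg")
    } else {
        format!("wal-s{:05}-{:020}.seg", shard_id.0, first_seq)
    }
}

/// Parses `s` only if it is exactly `width` ASCII digits, so unpadded
/// ("42") and signed ("+...") fields are rejected.
fn parse_fixed<T: FromStr>(s: &str, width: usize) -> Option<T> {
    if s.len() == width && s.bytes().all(|b| b.is_ascii_digit()) {
        s.parse().ok()
    } else {
        None
    }
}

/// Hypothetical strict parser: accepts only names `segment_filename` emits.
pub fn parse_segment_filename_strict(filename: &str) -> Option<(ShardId, u64)> {
    let name = filename.strip_suffix(".seg")?;
    if let Some(rest) = name.strip_prefix("wal-s") {
        let (shard_str, seq_str) = rest.split_once('-')?;
        let shard: u16 = parse_fixed(shard_str, 5)?;
        // Assumption: shard 0 always uses the v1 name, so `wal-s00000-` is
        // rejected to keep exactly one canonical name per (shard, seq).
        if shard == 0 {
            return None;
        }
        return Some((ShardId(shard), parse_fixed(seq_str, 20)?));
    }
    let seq_str = name.strip_prefix("wal-")?;
    Some((ShardId::SINGLE, parse_fixed(seq_str, 20)?))
}

/// In-memory analogue of `list_segments_for_shard`: keeps only names that
/// parse to `shard_id`, sorted by first_seq.
pub fn select_segments(names: &[&str], shard_id: ShardId) -> Vec<(u64, PathBuf)> {
    let mut out: Vec<(u64, PathBuf)> = names
        .iter()
        .filter_map(|n| {
            let (shard, seq) = parse_segment_filename_strict(n)?;
            (shard == shard_id).then(|| (seq, PathBuf::from(*n)))
        })
        .collect();
    out.sort_by_key(|(seq, _)| *seq);
    out
}
```

Rejecting `wal-s00000-` is a design choice that rests on the assumption that shard 0 never uses the prefixed form. If the plan allows both spellings for shard 0, drop that check.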

Acceptance Criteria

  • segment_filename(ShardId(0), 42) returns "wal-00000000000000000042.seg" (existing format)
  • segment_filename(ShardId(3), 42) returns "wal-s00003-00000000000000000042.seg"
  • parse_segment_filename returns Some((ShardId(0), 42)) and Some((ShardId(3), 42)) for the two filenames above
  • parse_segment_filename("not-a-segment.txt") returns None
  • list_segments_for_shard returns segments in sequence order; filters by shard in multi-shard directories
  • All existing WAL tests pass (they use ShardId(0), whose filename format is unchanged)
  • Property test: for all shard and seq, parse_segment_filename(segment_filename(shard, seq)) == Some((shard, seq))
  • cargo clippy -- -D warnings and cargo fmt --check pass
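The round-trip criterion can be checked deterministically at boundary values, either before or alongside a randomized property test (for example, one written with the proptest crate). The sketch below re-declares the plan's `segment_filename` and `parse_segment_filename` logic against a hypothetical `ShardId(pub u16)` stand-in for the Task 01 type. `roundtrip_holds` is an illustrative helper name.

```rust
/// Hypothetical stand-in for the Task 01 `ShardId` newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShardId(pub u16);

impl ShardId {
    pub const SINGLE: ShardId = ShardId(0);
}

// Same logic as the plan's segment_filename.
pub fn segment_filename(shard_id: ShardId, first_seq: u64) -> String {
    if shard_id == ShardId::SINGLE {
        format!("wal-{first_seq:020}.seg")
    } else {
        format!("wal-s{:05}-{:020}.seg", shard_id.0, first_seq)
    }
}

// Same logic as the plan's parse_segment_filename.
pub fn parse_segment_filename(filename: &str) -> Option<(ShardId, u64)> {
    let name = filename.strip_suffix(".seg")?;
    if let Some(rest) = name.strip_prefix("wal-s") {
        let dash = rest.find('-')?;
        let shard_id: u16 = rest[..dash].parse().ok()?;
        let first_seq: u64 = rest[dash + 1..].parse().ok()?;
        return Some((ShardId(shard_id), first_seq));
    }
    let seq_str = name.strip_prefix("wal-")?;
    Some((ShardId::SINGLE, seq_str.parse().ok()?))
}

/// Checks parse(format(shard, seq)) == Some((shard, seq)) over the cross
/// product of boundary shard IDs and sequence numbers, including u16::MAX
/// (5 digits) and u64::MAX (20 digits), the widest values each field holds.
pub fn roundtrip_holds() -> bool {
    let shards = [0u16, 1, 3, 9_999, u16::MAX];
    let seqs = [0u64, 1, 42, 1 << 32, u64::MAX];
    shards.iter().all(|&s| {
        seqs.iter().all(|&q| {
            let name = segment_filename(ShardId(s), q);
            parse_segment_filename(&name) == Some((ShardId(s), q))
        })
    })
}
```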