# Task 04: Shard-Aware Segment Naming

## Delivers

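Update `segment_filename()` and `parse_segment_seq()` in `tidal/src/wal/segment.rs` to support shard-prefixed filenames. Single-shard deployments (`shard_id = 0`) keep the existing `wal-{first_seq:020}.seg` format for backward compatibility. Multi-shard deployments (`shard_id > 0`) use the shard-prefixed format `wal-s{shard_id:05}-{first_seq:020}.seg`. The technical design below defines the parser under the name `parse_segment_filename()`, returning `(ShardId, first_seq)`.
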
## Complexity: S

## Dependencies

- Task 01 (ShardId type)
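
The design in this task relies on only a few properties of the `ShardId` type from Task 01: a public `u16` field (it uses `shard_id.0` and `ShardId(shard_id)`), equality comparison, and a `ShardId::SINGLE` constant for shard 0. A minimal sketch consistent with that usage is shown here; the derive list and the exact layout are assumptions inferred from this task, not Task 01's actual definition.

```rust
/// Hypothetical stand-in for the `ShardId` type delivered by Task 01.
/// Assumed to be a `u16` newtype because this task parses the shard
/// prefix as `u16` and reads the inner value via `.0`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct ShardId(pub u16);

impl ShardId {
    /// Shard 0 denotes a single-shard deployment, which keeps the
    /// existing (unprefixed) segment filename format.
    pub const SINGLE: ShardId = ShardId(0);
}
```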

## Technical Design

```rust
// tidal/src/wal/segment.rs

use std::fs;
use std::path::{Path, PathBuf};

// `ShardId` (Task 01) and `WalError` are assumed to be in scope; the `?` on
// I/O results below assumes `WalError: From<std::io::Error>`.

/// Generate the WAL segment filename for a given shard and sequence number.
///
/// Single-shard (shard_id=0): `wal-{first_seq:020}.seg`
///   -- matches existing format, full backward compatibility
///
/// Multi-shard (shard_id > 0): `wal-s{shard_id:05}-{first_seq:020}.seg`
///   -- includes shard prefix for disambiguation in shared WAL directories
pub fn segment_filename(shard_id: ShardId, first_seq: u64) -> String {
    if shard_id == ShardId::SINGLE {
        format!("wal-{first_seq:020}.seg")
    } else {
        format!("wal-s{:05}-{:020}.seg", shard_id.0, first_seq)
    }
}

/// Parse the shard ID and first_seq from a WAL segment filename.
///
/// Accepts both formats:
/// - `wal-{first_seq:020}.seg` (single-shard, v1)
/// - `wal-s{shard_id:05}-{first_seq:020}.seg` (multi-shard, v2)
///
/// Returns `(ShardId, first_seq)`.
pub fn parse_segment_filename(filename: &str) -> Option<(ShardId, u64)> {
    let name = filename.strip_suffix(".seg")?;

    // Multi-shard format: wal-s{shard_id}-{first_seq}
    if let Some(rest) = name.strip_prefix("wal-s") {
        let dash = rest.find('-')?;
        let shard_id: u16 = rest[..dash].parse().ok()?;
        let first_seq: u64 = rest[dash + 1..].parse().ok()?;
        return Some((ShardId(shard_id), first_seq));
    }

    // Single-shard format: wal-{first_seq}
    if let Some(seq_str) = name.strip_prefix("wal-") {
        let first_seq: u64 = seq_str.parse().ok()?;
        return Some((ShardId::SINGLE, first_seq));
    }

    None
}

/// Scan a directory for WAL segments belonging to `shard_id`.
///
/// In single-shard deployments, returns all segments (no prefix filtering).
/// In multi-shard deployments, filters by shard prefix.
pub fn list_segments_for_shard(
    dir: &Path,
    shard_id: ShardId,
) -> Result<Vec<(u64, PathBuf)>, WalError> {
    let mut segments = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let file_name = entry.file_name();
        let name = file_name.to_string_lossy();
        if let Some((seg_shard, seq)) = parse_segment_filename(&name) {
            if seg_shard == shard_id || shard_id == ShardId::SINGLE {
                segments.push((seq, entry.path()));
            }
        }
    }
    // Return segments ordered by their first sequence number.
    segments.sort_by_key(|(seq, _)| *seq);
    Ok(segments)
}
```
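
To make the two filename formats and the parser's behavior concrete, the following self-contained sketch repeats the formatting and parsing rules from the design so that it compiles on its own. The `ShardId` definition and the `roundtrips` helper are hypothetical and exist only for illustration; the parser uses `split_once`, which splits at the first `-` just as the `find`-based version does.

```rust
/// Hypothetical stand-in for Task 01's `ShardId` (assumed: a `u16` newtype).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShardId(pub u16);

impl ShardId {
    pub const SINGLE: ShardId = ShardId(0);
}

/// Same formatting rules as the design: unprefixed for shard 0,
/// `s{shard_id:05}-` prefix otherwise.
pub fn segment_filename(shard_id: ShardId, first_seq: u64) -> String {
    if shard_id == ShardId::SINGLE {
        format!("wal-{first_seq:020}.seg")
    } else {
        format!("wal-s{:05}-{:020}.seg", shard_id.0, first_seq)
    }
}

/// Same parsing rules as the design: try the shard-prefixed form first,
/// then fall back to the unprefixed single-shard form.
pub fn parse_segment_filename(filename: &str) -> Option<(ShardId, u64)> {
    let name = filename.strip_suffix(".seg")?;
    if let Some(rest) = name.strip_prefix("wal-s") {
        let (shard_str, seq_str) = rest.split_once('-')?;
        let shard_id: u16 = shard_str.parse().ok()?;
        let first_seq: u64 = seq_str.parse().ok()?;
        return Some((ShardId(shard_id), first_seq));
    }
    let seq_str = name.strip_prefix("wal-")?;
    let first_seq: u64 = seq_str.parse().ok()?;
    Some((ShardId::SINGLE, first_seq))
}

/// Hypothetical helper: true when formatting and then parsing gives back
/// the original `(shard_id, first_seq)` pair.
pub fn roundtrips(shard_id: ShardId, first_seq: u64) -> bool {
    let name = segment_filename(shard_id, first_seq);
    parse_segment_filename(&name) == Some((shard_id, first_seq))
}
```

The roundtrip holds in the format-then-parse direction, including at `u16::MAX` and `u64::MAX`, because the maximum values fit the 5- and 20-digit fields exactly. The parser does not check field widths, so a name such as `wal-s3-42.seg` parses to `(ShardId(3), 42)` even though `segment_filename` would never produce it. If only canonical names should be accepted, a width check would need to be added.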

## Acceptance Criteria

- [ ] `segment_filename(ShardId(0), 42)` returns `"wal-00000000000000000042.seg"` (existing format)
- [ ] `segment_filename(ShardId(3), 42)` returns `"wal-s00003-00000000000000000042.seg"`
- [ ] `parse_segment_filename` correctly parses both formats
- [ ] `parse_segment_filename("not-a-segment.txt")` returns `None`
- [ ] `list_segments_for_shard` returns segments in sequence order and filters by shard in multi-shard directories
- [ ] All existing WAL tests pass (they use `ShardId(0)`, which retains the existing filename format)
- [ ] Property test: `parse_segment_filename(segment_filename(shard, seq))` roundtrips correctly
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass
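
The listing criterion above can be exercised with a scratch directory that mixes segments from several shards with an unrelated file. The sketch below is one possible check, not the project's test suite: `ShardId` is a hypothetical stand-in for the Task 01 type, `std::io::Error` stands in for `WalError`, and the `listed_seqs` helper and its directory naming are invented for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for Task 01's `ShardId` (assumed: a `u16` newtype).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ShardId(pub u16);

impl ShardId {
    pub const SINGLE: ShardId = ShardId(0);
}

/// Parser with the same rules as the technical design.
pub fn parse_segment_filename(filename: &str) -> Option<(ShardId, u64)> {
    let name = filename.strip_suffix(".seg")?;
    if let Some(rest) = name.strip_prefix("wal-s") {
        let (shard_str, seq_str) = rest.split_once('-')?;
        let shard_id: u16 = shard_str.parse().ok()?;
        let first_seq: u64 = seq_str.parse().ok()?;
        return Some((ShardId(shard_id), first_seq));
    }
    let first_seq: u64 = name.strip_prefix("wal-")?.parse().ok()?;
    Some((ShardId::SINGLE, first_seq))
}

/// Listing logic from the design; `io::Error` replaces `WalError` here.
pub fn list_segments_for_shard(
    dir: &Path,
    shard_id: ShardId,
) -> io::Result<Vec<(u64, PathBuf)>> {
    let mut segments = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        if let Some((seg_shard, seq)) = parse_segment_filename(&name) {
            if seg_shard == shard_id || shard_id == ShardId::SINGLE {
                segments.push((seq, entry.path()));
            }
        }
    }
    segments.sort_by_key(|(seq, _)| *seq);
    Ok(segments)
}

/// Hypothetical helper: builds a scratch directory holding segments for
/// shards 1 and 2 plus a non-segment file, lists it for `shard_id`, and
/// returns the sequence numbers in the order they were returned.
pub fn listed_seqs(tag: &str, shard_id: ShardId) -> io::Result<Vec<u64>> {
    let dir = std::env::temp_dir()
        .join(format!("wal-naming-sketch-{}-{tag}", std::process::id()));
    fs::create_dir_all(&dir)?;
    for name in [
        "wal-s00002-00000000000000000010.seg",
        "wal-s00001-00000000000000000007.seg",
        "wal-s00001-00000000000000000003.seg",
        "notes.txt",
    ] {
        fs::write(dir.join(name), b"")?;
    }
    let seqs: Vec<u64> = list_segments_for_shard(&dir, shard_id)?
        .into_iter()
        .map(|(seq, _)| seq)
        .collect();
    fs::remove_dir_all(&dir)?;
    Ok(seqs)
}
```

Listing for `ShardId(1)` yields only that shard's segments in sequence order. Listing for `ShardId::SINGLE` yields every segment in the directory, merged by sequence number, which matches the "no prefix filtering" rule in the design's doc comment.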