Task 06: ReplicationLagGauge
Delivers
ReplicationLagGauge in tidal/src/replication/lag.rs tracking per-follower lag (leader_seqno - follower_applied_seqno). Exposed via MetricsState so existing Prometheus scraping picks it up automatically.
Complexity: S
Dependencies
- Phase 8.1 (ReplicationState)
- Task 03 (WalShipper -- for leader_seqno)
Technical Design
// tidal/src/replication/lag.rs
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

use dashmap::DashMap;

// ShardId and ReplicationState are existing crate types; their import paths are omitted here.

/// Tracks per-follower replication lag.
///
/// Lag = leader's latest shipped seqno - follower's applied seqno.
/// A lag of 0 means the follower is fully caught up.
#[derive(Debug, Default)]
pub struct ReplicationLagGauge {
    /// Per-follower: last seqno the leader has shipped.
    leader_seqno: DashMap<ShardId, AtomicU64>,
    /// Per-follower: last seqno the follower has applied.
    follower_applied: Arc<ReplicationState>,
}

impl ReplicationLagGauge {
    pub fn new(replication_state: Arc<ReplicationState>) -> Self {
        Self {
            leader_seqno: DashMap::new(),
            follower_applied: replication_state,
        }
    }

    /// Update the leader's known shipped seqno for a follower.
    pub fn update_leader_seqno(&self, follower: ShardId, seqno: u64) {
        self.leader_seqno
            .entry(follower)
            .or_insert_with(|| AtomicU64::new(0))
            .store(seqno, Ordering::Release);
    }

    /// Get the current lag for a follower in seqno units.
    ///
    /// Unknown followers are treated as seqno 0 on both sides. The result is
    /// negative if the applied seqno is ahead of the last value recorded via
    /// `update_leader_seqno`.
    pub fn lag_seqno(&self, follower: ShardId) -> i64 {
        let leader = self
            .leader_seqno
            .get(&follower)
            .map(|a| a.load(Ordering::Acquire))
            .unwrap_or(0);
        let applied = self.follower_applied.applied_seqno(follower).unwrap_or(0);
        // Plain casts: correct only while both seqnos stay below i64::MAX.
        leader as i64 - applied as i64
    }

    /// Collect Prometheus-style gauge values for all followers.
    pub fn collect_metrics(&self) -> Vec<(ShardId, i64)> {
        // Copy the keys out first so that no DashMap shard guard is held
        // while lag_seqno looks the same entries up again.
        let followers: Vec<ShardId> = self.leader_seqno.iter().map(|e| *e.key()).collect();
        followers
            .into_iter()
            .map(|follower| (follower, self.lag_seqno(follower)))
            .collect()
    }
}
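The subtraction in lag_seqno casts each u64 to i64 before subtracting, which wraps or overflows once a seqno reaches 2^63. If that range ever matters, one option is to widen before subtracting and clamp the result. The helper below is a hypothetical sketch (lag_between is not part of this task's API) using only the standard library:

```rust
/// Signed lag between two u64 sequence numbers, clamped to the i64 range.
///
/// Hypothetical helper for illustration; the name is not from the task spec.
pub fn lag_between(leader: u64, applied: u64) -> i64 {
    // Widen to i128 so the subtraction itself can never overflow.
    let diff = i128::from(leader) - i128::from(applied);
    // After clamping into the i64 range, the cast is lossless.
    diff.clamp(i128::from(i64::MIN), i128::from(i64::MAX)) as i64
}
```

With this helper, lag_seqno could end with lag_between(leader, applied) in place of the two casts. Whether clamping or reporting an error is the better behavior at the extremes is a design choice left to the implementer.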
MetricsState integration
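// tidal/src/db/metrics.rs (existing metrics module)
impl MetricsState {
    // New accessor on the existing impl; wire it into collect() so that
    // /metrics reports the value.
    //
    // Returns 0 when no lag gauge is attached, which is indistinguishable
    // from a follower that is fully caught up.
    pub fn replication_lag_seqno(&self, follower_shard: u16) -> i64 {
        self.lag_gauge
            .as_ref()
            .map(|g| g.lag_seqno(ShardId(follower_shard)))
            .unwrap_or(0)
    }
}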
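How collect() turns these values into scrape output is not shown here. As a rough sketch, the pairs returned by collect_metrics() could be rendered in the Prometheus text exposition format as follows. The function name render_replication_lag, the follower_shard label name, and the HELP text are assumptions made for illustration, and shard ids are passed as plain u16 values to keep the block self-contained:

```rust
use std::fmt::Write;

/// Render (follower shard, lag) pairs as Prometheus text-format gauge samples.
///
/// Hypothetical helper; the label name `follower_shard` is an assumption.
pub fn render_replication_lag(samples: &[(u16, i64)]) -> String {
    let mut out = String::new();
    out.push_str("# HELP replication_lag_seqno Leader seqno minus follower applied seqno.\n");
    out.push_str("# TYPE replication_lag_seqno gauge\n");
    for (shard, lag) in samples {
        // Writing into a String cannot fail, so the Result is ignored.
        let _ = writeln!(
            out,
            "replication_lag_seqno{{follower_shard=\"{}\"}} {}",
            shard, lag
        );
    }
    out
}
```

Each follower becomes one labeled sample under a single gauge name, so dashboards can filter or aggregate by shard.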
Acceptance Criteria
- ReplicationLagGauge::lag_seqno(follower) returns leader_seqno - follower_applied_seqno
- lag_seqno returns 0 when follower is fully caught up
- lag_seqno returns > 0 when follower is behind
- collect_metrics() returns a snapshot of all follower lags
- Integrated into MetricsState so existing /metrics endpoint exposes replication_lag_seqno gauge
- Integration test: leader writes 100 segments; before follower applies them, lag = 100; after apply, lag = 0
- cargo clippy -- -D warnings and cargo fmt pass
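
The integration-test criterion can be sketched with standard-library stand-ins. Everything named Mock* below is hypothetical: the real test would use ReplicationLagGauge, ReplicationState, and the WalShipper path. The sketch also assumes each write advances the leader seqno by exactly one, which is what "100 segments, lag = 100" implies.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Stand-in for ReplicationState: follower shard id -> applied seqno.
#[derive(Default)]
pub struct MockReplicationState {
    applied: RwLock<HashMap<u16, u64>>,
}

impl MockReplicationState {
    pub fn set_applied(&self, follower: u16, seqno: u64) {
        self.applied.write().unwrap().insert(follower, seqno);
    }

    pub fn applied_seqno(&self, follower: u16) -> Option<u64> {
        self.applied.read().unwrap().get(&follower).copied()
    }
}

/// Stand-in gauge that mirrors the lag_seqno contract with std-only types.
pub struct MockLagGauge {
    leader_seqno: RwLock<HashMap<u16, u64>>,
    follower_applied: Arc<MockReplicationState>,
}

impl MockLagGauge {
    pub fn new(state: Arc<MockReplicationState>) -> Self {
        Self {
            leader_seqno: RwLock::new(HashMap::new()),
            follower_applied: state,
        }
    }

    pub fn update_leader_seqno(&self, follower: u16, seqno: u64) {
        self.leader_seqno.write().unwrap().insert(follower, seqno);
    }

    pub fn lag_seqno(&self, follower: u16) -> i64 {
        let leader = self
            .leader_seqno
            .read()
            .unwrap()
            .get(&follower)
            .copied()
            .unwrap_or(0);
        let applied = self.follower_applied.applied_seqno(follower).unwrap_or(0);
        leader as i64 - applied as i64
    }
}

/// Ship `writes` seqnos to one follower, then apply them all.
/// Returns (lag before apply, lag after apply).
pub fn lag_before_and_after_apply(writes: u64) -> (i64, i64) {
    let state = Arc::new(MockReplicationState::default());
    let gauge = MockLagGauge::new(Arc::clone(&state));
    let follower = 1u16;

    // Leader ships seqnos 1..=writes; the follower has applied nothing yet.
    for seqno in 1..=writes {
        gauge.update_leader_seqno(follower, seqno);
    }
    let before = gauge.lag_seqno(follower);

    // Follower catches up to the last shipped seqno.
    state.set_applied(follower, writes);
    let after = gauge.lag_seqno(follower);

    (before, after)
}
```

In the real test the two readings would come from the gauge attached to MetricsState, with the follower apply step driven through FollowerDb instead of a direct setter.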