# Task 05: RollingUpgradeCoordinator

## Delivers

`RollingUpgradeCoordinator` in `tidal/src/replication/upgrade.rs`. Upgrades nodes one at a time with drain → upgrade → rejoin. Uses WAL shipping to keep remaining followers current during the upgrade window. Query availability remains 100% because at least one node is always serving during each upgrade step.

## Complexity: M

## Dependencies

- Task 03 (ControlPlane)
- Phase 8.2, Task 03 (WalShipper)
- Phase 8.2, Task 05 (FollowerDb / NodeRole)

## Technical Design

```rust
// tidal/src/replication/upgrade.rs

use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

/// Coordinates a rolling upgrade across all nodes in a cluster.
///
/// Protocol (per node):
/// 1. `drain(node)` -- stop routing new writes to the target node;
///    let in-flight operations complete; verify replication lag = 0.
/// 2. Caller performs the upgrade (outside this coordinator's scope).
/// 3. `rejoin(node)` -- re-enable routing to the upgraded node;
///    verify it can process new WAL segments.
///
/// At any point, at least (N-1) nodes are serving queries.
pub struct RollingUpgradeCoordinator {
    control_plane: Arc<ControlPlane>,
    wal_shipper: Arc<WalShipper>,
    /// Nodes currently in the "draining" state (not routing new writes).
    drained_nodes: Mutex<HashSet<ShardId>>,
}

/// Status of a single node's upgrade step.
#[derive(Debug, Clone, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
pub enum NodeUpgradeStatus {
    Pending,
    Draining,
    Drained,   // ready for upgrade
    Upgrading, // external process is upgrading the node
    Rejoining, // node is catching up from WAL
    Complete,
    Failed { reason: String },
}

impl RollingUpgradeCoordinator {
    pub fn new(
        control_plane: Arc<ControlPlane>,
        wal_shipper: Arc<WalShipper>,
    ) -> Self {
        Self {
            control_plane,
            wal_shipper,
            drained_nodes: Mutex::new(HashSet::new()),
        }
    }

    /// Drain a node: stop routing writes to it, wait for replication lag = 0.
    ///
    /// Fails if draining this node would leave fewer than 1 serving node.
    pub async fn drain(&self, target_shard: ShardId) -> Result<()> {
        // Safety check: cannot drain if it would leave 0 serving nodes.
        // The check and the insert share one lock acquisition so that two
        // concurrent `drain` calls cannot both pass the check. The guard is
        // scoped so it is released before the `.await` below.
        {
            let mut drained = self.drained_nodes.lock().unwrap();
            let topology = self.control_plane.topology();
            let total_nodes = topology.shards.len();
            let already_drained = drained.len();
            if already_drained + 1 >= total_nodes {
                return Err(TidalError::InvalidState(
                    "cannot drain: would leave no serving nodes".into(),
                ));
            }

            // Mark as draining: routing layer stops sending new writes here.
            drained.insert(target_shard);
        }

        // Wait for replication lag to reach 0 (target has all events).
        self.await_zero_lag(target_shard, Duration::from_secs(30)).await?;

        Ok(())
    }

    /// Rejoin a (newly upgraded) node: ship missed WAL segments, then
    /// re-enable routing.
    ///
    /// The upgraded node may have missed WAL segments during its downtime.
    /// We ship those segments before re-enabling routing.
    pub async fn rejoin(&self, target_shard: ShardId) -> Result<()> {
        // Get the node's current applied seqno (via its reported stats).
        // Falls back to 0 (ship everything) when no stats are available.
        let follower_seqno = self
            .control_plane
            .shard_stats(target_shard)
            .map(|s| s.applied_seqno)
            .unwrap_or(0);

        // Ship missed segments.
        self.wal_shipper
            .ship_segments_since(target_shard, follower_seqno)
            .await?;

        // Wait for the node to apply all shipped segments.
        self.await_zero_lag(target_shard, Duration::from_secs(60)).await?;

        // Re-enable routing to this node.
        self.drained_nodes.lock().unwrap().remove(&target_shard);

        Ok(())
    }

    /// Returns `true` if `shard_id` is currently drained (not receiving writes).
    pub fn is_drained(&self, shard_id: ShardId) -> bool {
        self.drained_nodes.lock().unwrap().contains(&shard_id)
    }

    /// Wait until the replication lag for `target_shard` reaches 0.
    ///
    /// Polls the `ReplicationLagGauge` every 100ms. Times out after `timeout`.
    async fn await_zero_lag(
        &self,
        target_shard: ShardId,
        timeout: Duration,
    ) -> Result<()> {
        let deadline = Instant::now() + timeout;
        loop {
            if Instant::now() > deadline {
                // Shared by `drain` and `rejoin`, so the message names neither.
                return Err(TidalError::Timeout(format!(
                    "zero-lag wait timed out: shard {:?} still has replication lag",
                    target_shard
                )));
            }
            let lag = self.control_plane.lag_for(target_shard);
            if lag == 0 {
                return Ok(());
            }
            tokio::time::sleep(Duration::from_millis(100)).await;
        }
    }
}
```

### Routing Integration

```rust
// In WalShipper (additions)
impl WalShipper {
    /// Skip shipping to drained nodes.
    ///
    /// Returns `true` when no coordinator is attached or the shard is not drained.
    async fn should_ship_to(&self, shard_id: ShardId) -> bool {
        !self.upgrade_coordinator
            .as_ref()
            .map(|c| c.is_drained(shard_id))
            .unwrap_or(false)
    }
}
```

## Acceptance Criteria

- [ ] `drain(shard)` fails with `TidalError::InvalidState` if draining would leave 0 serving nodes
- [ ] `drain(shard)` succeeds once replication lag for that shard reaches 0
- [ ] During drain: writes from `WalShipper` skip the drained shard; reads from other shards succeed
- [ ] `rejoin(shard)` ships all WAL segments the node missed during its downtime, then re-enables routing
- [ ] Rolling upgrade of all N nodes: each drain+rejoin step maintains availability (property: at least 1 node serving throughout)
- [ ] Integration test: 3-node simulated cluster; drain node 0, "upgrade" (simulated by stop+restart), rejoin; verify all signals written during the upgrade are present on the rejoined node
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass
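
### Availability Property Sketch

The availability property in the acceptance criteria can be explored with a simplified, synchronous model of the drained-set bookkeeping. This is only a sketch under stated assumptions: `DrainModel`, `rolling_upgrade`, and the `ShardId = u32` alias are hypothetical names introduced for illustration, and the model omits the control plane, WAL shipping, and lag polling. It reuses the `already_drained + 1 >= total_nodes` guard from `drain` and records the minimum number of serving nodes seen during a full pass.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the real `ShardId` type.
type ShardId = u32;

/// Simplified, synchronous model of the coordinator's drained-node set.
/// It tracks only which shards are drained, nothing about WAL or lag.
#[derive(Debug)]
pub struct DrainModel {
    total_nodes: usize,
    drained: HashSet<ShardId>,
}

impl DrainModel {
    pub fn new(total_nodes: usize) -> Self {
        Self { total_nodes, drained: HashSet::new() }
    }

    /// Number of nodes that are not drained.
    pub fn serving(&self) -> usize {
        self.total_nodes - self.drained.len()
    }

    /// Same guard as `drain`: refuse when no node would be left serving.
    pub fn try_drain(&mut self, shard: ShardId) -> Result<(), String> {
        if self.drained.len() + 1 >= self.total_nodes {
            return Err("cannot drain: would leave no serving nodes".to_string());
        }
        self.drained.insert(shard);
        Ok(())
    }

    /// Mirror of `rejoin`: the shard becomes routable again.
    pub fn rejoin(&mut self, shard: ShardId) {
        self.drained.remove(&shard);
    }
}

/// Upgrade every node one at a time (drain, upgrade, rejoin) and return
/// the minimum number of serving nodes observed during the pass.
pub fn rolling_upgrade(
    model: &mut DrainModel,
    mut upgrade: impl FnMut(ShardId),
) -> Result<usize, String> {
    let mut min_serving = model.serving();
    for shard in 0..model.total_nodes as ShardId {
        model.try_drain(shard)?;
        min_serving = min_serving.min(model.serving());
        upgrade(shard); // external upgrade step, simulated by the caller
        model.rejoin(shard);
    }
    Ok(min_serving)
}
```

In this model, a pass over a 3-node cluster never drops below 2 serving nodes, while a 1-node cluster cannot be drained at all, which matches the `InvalidState` criterion. A property test for the real coordinator could assert the same invariant after each `drain` and `rejoin` call.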