tidaldb/docs/planning/milestone-7/phase-2/task-03-degradation-response-backpressure.md
2026-02-23 22:41:16 -07:00

Task 03: Degradation Level in Response + Backpressure Error

Delivers

degradation_level field on Results and SearchResults so callers can observe the current load state. TidalError::Backpressure variant for WAL queue saturation. Backpressure check in the signal write path before enqueueing to the WAL channel.

Complexity: M

Dependencies

  • task-01 (DegradationLevel enum)
  • m2p5 Results struct (tidal/src/query/retrieve/types.rs)
  • m5p3 SearchResults struct (tidal/src/query/search/types.rs)
  • m1p4 WAL handle (tidal/src/wal/mod.rs, DEFAULT_CHANNEL_CAPACITY)
  • TidalError (tidal/src/schema/error.rs)

Technical Design

1. Add degradation_level to Results

// In tidal/src/query/retrieve/types.rs, add to Results:

/// The response from executing a RETRIEVE query.
pub struct Results {
    pub items: Vec<RetrieveResult>,
    pub next_cursor: Option<Cursor>,
    pub total_candidates: usize,
    pub constraints_satisfied: bool,
    pub warnings: Vec<String>,
    pub session_snapshot: Option<SessionSnapshot>,
    /// The degradation level under which this query was executed.
    ///
    /// `Full` means the query ran at full fidelity. Any other level
    /// indicates that quality was reduced due to load pressure. Callers
    /// should treat non-`Full` responses as best-effort: results are
    /// valid but may be less diverse, use coarser aggregation windows,
    /// or draw from a smaller candidate pool.
    pub degradation_level: crate::load::DegradationLevel,
}

Update ALL construction sites of Results to include the new field. They fall into two groups:

  1. RetrieveExecutor::execute() -- sets degradation_level: self.degradation_level
  2. Any test helpers that construct Results directly -- set degradation_level: DegradationLevel::Full
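On the caller side, the new field can drive simple policy decisions. The following is a minimal sketch, not the crate's API: `DegradationLevel` and `Results` are trimmed stand-ins (only the `Full` and `ReducedCandidates` variants are named in this task; the real enum from task-01 may have more), and `is_cacheable` is a hypothetical helper.

```rust
// Stand-in for crate::load::DegradationLevel (defined in task-01).
// Only `Full` and `ReducedCandidates` are named in this task; the real
// enum may carry additional variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DegradationLevel {
    Full,
    ReducedCandidates,
}

// Trimmed stand-in for Results: only the fields this sketch needs.
pub struct Results {
    pub items: Vec<u64>,
    pub warnings: Vec<String>,
    pub degradation_level: DegradationLevel,
}

/// Hypothetical caller-side policy: only full-fidelity responses are
/// cached, so a best-effort result is never served after load subsides.
pub fn is_cacheable(results: &Results) -> bool {
    matches!(results.degradation_level, DegradationLevel::Full)
}
```

Matching on `Full` rather than enumerating the degraded levels keeps the helper correct if further variants are added later.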

2. Add degradation_level to SearchResults

// In tidal/src/query/search/types.rs, add to SearchResults:

pub struct SearchResults {
    pub items: Vec<SearchResultItem>,
    pub next_cursor: Option<Cursor>,
    pub total_candidates: usize,
    pub constraints_satisfied: bool,
    pub warnings: Vec<String>,
    pub session_snapshot: Option<SessionSnapshot>,
    /// The degradation level under which this search was executed.
    pub degradation_level: crate::load::DegradationLevel,
}

Update the construction site in SearchExecutor::execute().

3. Add TidalError::Backpressure variant

// In tidal/src/schema/error.rs, add to TidalError:

/// The WAL write queue is saturated. The caller should retry after the
/// suggested delay. This is NOT a data loss event -- the signal was
/// never enqueued, so it can be safely retried.
#[error("backpressure: WAL queue full, retry after {retry_after_ms}ms")]
Backpressure {
    /// Suggested delay before retrying, in milliseconds.
    retry_after_ms: u64,
},
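Because a rejected signal was never enqueued, a caller can retry it without risking duplication. The sketch below shows one possible retry loop under stated assumptions: `TidalError` is a stand-in with only the new variant and a hand-written `Display` (the real type derives it via `#[error(...)]`), and `write_with_retry` is a hypothetical helper, not part of the crate.

```rust
use std::error::Error;
use std::fmt;
use std::thread;
use std::time::Duration;

// Stand-in for TidalError containing only the variant added in this task.
#[derive(Debug, PartialEq, Eq)]
pub enum TidalError {
    Backpressure { retry_after_ms: u64 },
}

impl fmt::Display for TidalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TidalError::Backpressure { retry_after_ms } => write!(
                f,
                "backpressure: WAL queue full, retry after {retry_after_ms}ms"
            ),
        }
    }
}

impl Error for TidalError {}

/// Hypothetical helper: calls `write` until it succeeds or `max_attempts`
/// calls have been made, sleeping for the suggested delay after each
/// rejection. At least one attempt is always made.
pub fn write_with_retry<F>(mut write: F, max_attempts: u32) -> Result<(), TidalError>
where
    F: FnMut() -> Result<(), TidalError>,
{
    let mut attempt = 0;
    loop {
        attempt += 1;
        match write() {
            Ok(()) => return Ok(()),
            Err(TidalError::Backpressure { retry_after_ms }) if attempt < max_attempts => {
                thread::sleep(Duration::from_millis(retry_after_ms));
            }
            // Out of attempts: hand the last error back to the caller.
            Err(err) => return Err(err),
        }
    }
}
```

A production caller would likely add jitter or a cap on total wait time; this sketch uses the server-suggested delay unchanged.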

4. Backpressure threshold config

// In tidal/src/load/detector.rs (or a separate backpressure.rs):

/// Configuration for WAL backpressure.
///
/// When the WAL command channel's pending message count reaches
/// `queue_depth_threshold`, signal writes are rejected with
/// `TidalError::Backpressure` to prevent unbounded memory growth
/// and give the writer thread time to drain.
#[derive(Debug, Clone, Copy)]
pub struct BackpressureConfig {
    /// Pending-message count at or above which signal writes are rejected.
    /// Default: 80% of `DEFAULT_CHANNEL_CAPACITY` (8000 out of 10000).
    pub queue_depth_threshold: usize,
    /// Suggested retry delay in milliseconds returned to the caller.
    /// Default: 50ms.
    pub retry_after_ms: u64,
}

impl Default for BackpressureConfig {
    fn default() -> Self {
        Self {
            queue_depth_threshold: 8_000,
            retry_after_ms: 50,
        }
    }
}

Store this config on TidalDb:

// In tidal/src/db/mod.rs:
backpressure_config: crate::load::BackpressureConfig,
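The default of 8,000 is a literal that has to be kept in sync with `DEFAULT_CHANNEL_CAPACITY` by hand. As a sketch of an alternative, the threshold could be derived from the capacity. `for_capacity` is a hypothetical constructor, and the constant below is a stand-in whose value (10,000) is taken from the doc comment on `queue_depth_threshold`.

```rust
// Stand-in for the constant in tidal/src/wal/mod.rs; the value is taken
// from the doc comment above, not from the WAL source.
pub const DEFAULT_CHANNEL_CAPACITY: usize = 10_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BackpressureConfig {
    pub queue_depth_threshold: usize,
    pub retry_after_ms: u64,
}

impl BackpressureConfig {
    /// Hypothetical constructor: places the threshold at `percent` of
    /// `capacity`. `percent` is capped at 100 and the threshold is kept
    /// at 1 or more, because the check uses `>=` and a threshold of 0
    /// would reject every write.
    pub fn for_capacity(capacity: usize, percent: u8, retry_after_ms: u64) -> Self {
        let percent = usize::from(percent.min(100));
        // Divide before multiplying so very large capacities cannot overflow.
        let threshold = capacity / 100 * percent + capacity % 100 * percent / 100;
        Self {
            queue_depth_threshold: threshold.max(1),
            retry_after_ms,
        }
    }
}
```

With this shape, `Default` could delegate to `for_capacity(DEFAULT_CHANNEL_CAPACITY, 80, 50)` and yield the same 8,000 / 50 ms values as the literal version.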

5. Backpressure check in TidalDb::signal()

The TidalDb::signal() method currently writes to the WAL via the WalHandleWriter (which implements signals::WalWriter). The backpressure check must happen BEFORE the WAL enqueue, not inside the WAL writer, because:

  1. The WAL writer trait returns signals::WalError rather than TidalError, so it cannot surface the Backpressure variant directly
  2. The check is a policy decision belonging to the database layer, not the WAL layer

// In tidal/src/db/signals.rs, in TidalDb::signal():

impl TidalDb {
    pub fn signal(
        &self,
        signal_type: &str,
        entity_id: EntityId,
        weight: f64,
        ts: Timestamp,
    ) -> crate::Result<()> {
        // Backpressure check: inspect WAL channel depth before enqueuing.
        // The read is cheap: crossbeam's `Sender::len()` takes no lock. The
        // value is a snapshot, so the threshold acts as a soft limit.
        if let Ok(guard) = self.wal.lock()
            && let Some(wal) = guard.as_ref()
        {
            let queue_depth = wal.channel_len();
            if queue_depth >= self.backpressure_config.queue_depth_threshold {
                tracing::warn!(
                    queue_depth,
                    threshold = self.backpressure_config.queue_depth_threshold,
                    "WAL backpressure: rejecting signal write"
                );
                return Err(TidalError::Backpressure {
                    retry_after_ms: self.backpressure_config.retry_after_ms,
                });
            }
        }

        // ... existing signal write logic ...
    }
}
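The admission decision itself is a pure comparison, so it can be factored out and unit-tested without a WAL. A minimal sketch, assuming stand-in types and a hypothetical free function `check_backpressure`:

```rust
// Stand-ins for the types introduced in sections 3 and 4.
#[derive(Debug, Clone, Copy)]
pub struct BackpressureConfig {
    pub queue_depth_threshold: usize,
    pub retry_after_ms: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TidalError {
    Backpressure { retry_after_ms: u64 },
}

/// Hypothetical helper holding only the admission decision. Uses the same
/// `>=` comparison as `TidalDb::signal()`: a depth equal to the threshold
/// is already rejected.
pub fn check_backpressure(
    queue_depth: usize,
    config: &BackpressureConfig,
) -> Result<(), TidalError> {
    if queue_depth >= config.queue_depth_threshold {
        return Err(TidalError::Backpressure {
            retry_after_ms: config.retry_after_ms,
        });
    }
    Ok(())
}
```

With the default config, a depth of 7,999 is admitted and a depth of 8,000 is rejected, which pins down the boundary behavior of the check.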

6. Expose channel length on WalHandle

The WalHandle currently does not expose the channel's pending message count. Add a method:

// In tidal/src/wal/mod.rs, add to WalHandle:

impl WalHandle {
    /// Return the number of pending commands in the writer channel.
    ///
    /// O(1) operation. Used by the backpressure check in `TidalDb::signal()`
    /// to detect queue saturation before enqueuing.
    #[must_use]
    pub fn channel_len(&self) -> usize {
        self.tx.len()
    }
}

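crossbeam::channel::Sender::len() returns the number of messages currently in the channel. On a bounded channel it reads the channel's head and tail indices atomically and does not take a lock inside the channel. The result is a snapshot: other threads may enqueue or drain between the check and the send, so the threshold acts as a soft limit rather than a hard cap.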

7. Re-export from lib.rs

The DegradationLevel should be accessible from the public API:

// In tidal/src/lib.rs, add:
pub use load::DegradationLevel;

Acceptance Criteria

  • Results.degradation_level field present and set by RetrieveExecutor
  • SearchResults.degradation_level field present and set by SearchExecutor
  • TidalError::Backpressure { retry_after_ms } variant added
  • BackpressureConfig with configurable threshold (default 8000) and retry delay (default 50ms)
  • WalHandle::channel_len() returns pending command count
  • Backpressure check in TidalDb::signal() rejects writes when queue depth reaches the threshold (depth >= threshold)
  • Backpressure does NOT affect query reads (only signal writes)
  • DegradationLevel re-exported as tidaldb::DegradationLevel
  • All existing tests pass (Results construction updated with degradation_level: DegradationLevel::Full)
  • cargo clippy -- -D warnings is clean

Test Strategy

#[cfg(test)]
#[allow(clippy::unwrap_used)]
mod tests {
    use super::*;

    #[test]
    fn results_includes_degradation_level() {
        let results = Results {
            items: vec![],
            next_cursor: None,
            total_candidates: 0,
            constraints_satisfied: true,
            warnings: vec![],
            session_snapshot: None,
            degradation_level: DegradationLevel::ReducedCandidates,
        };
        assert_eq!(
            results.degradation_level,
            DegradationLevel::ReducedCandidates
        );
    }

    #[test]
    fn search_results_includes_degradation_level() {
        let results = SearchResults {
            items: vec![],
            next_cursor: None,
            total_candidates: 0,
            constraints_satisfied: true,
            warnings: vec![],
            session_snapshot: None,
            degradation_level: DegradationLevel::Full,
        };
        assert_eq!(results.degradation_level, DegradationLevel::Full);
    }

    #[test]
    fn backpressure_error_display() {
        let err = TidalError::Backpressure { retry_after_ms: 50 };
        let msg = err.to_string();
        assert!(msg.contains("backpressure"));
        assert!(msg.contains("50"));
    }

    #[test]
    fn wal_channel_len_reports_zero_when_empty() {
        let dir = tempfile::tempdir().unwrap();
        let config = crate::wal::WalConfig {
            dir: dir.path().to_path_buf(),
            ..Default::default()
        };
        let (handle, _, _) = crate::wal::WalHandle::open(config).unwrap();
        // After open with no pending commands, len should be 0 (or very small).
        // The writer thread may have consumed any initial commands.
        assert!(handle.channel_len() < 10);
        handle.shutdown().unwrap();
    }

    #[test]
    fn backpressure_rejects_signal_when_queue_full() {
        // Integration test:
        // 1. Open a TidalDb with a very low backpressure threshold (e.g., 1).
        // 2. Flood the WAL channel by sending commands faster than the writer
        //    can drain them (or use a mock WAL that never consumes).
        // 3. Call db.signal() and assert TidalError::Backpressure is returned.
    }

    #[test]
    fn backpressure_does_not_affect_queries() {
        // Integration test:
        // Even when the WAL queue is saturated, retrieve() and search()
        // should still return Ok results (possibly with degraded quality,
        // but never a Backpressure error).
    }
}
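The two integration tests above are outlines only. One way to make the rejection path deterministic is the "mock WAL that never consumes" option mentioned in the first outline. The sketch below illustrates that idea with hypothetical stand-ins (`NeverDrainingWal`, `StubDb`); it does not use the real `TidalDb` or `WalHandle` APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for TidalError containing only the variant added in this task.
#[derive(Debug, PartialEq, Eq)]
pub enum TidalError {
    Backpressure { retry_after_ms: u64 },
}

/// Hypothetical stub WAL: accepts commands and never drains them, so the
/// pending count only grows.
#[derive(Default)]
pub struct NeverDrainingWal {
    pending: AtomicUsize,
}

impl NeverDrainingWal {
    pub fn channel_len(&self) -> usize {
        self.pending.load(Ordering::Relaxed)
    }

    pub fn enqueue(&self) {
        self.pending.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical miniature of the write path: check depth, then enqueue.
pub struct StubDb {
    wal: NeverDrainingWal,
    threshold: usize,
    retry_after_ms: u64,
}

impl StubDb {
    pub fn new(threshold: usize, retry_after_ms: u64) -> Self {
        Self {
            wal: NeverDrainingWal::default(),
            threshold,
            retry_after_ms,
        }
    }

    pub fn signal(&self) -> Result<(), TidalError> {
        if self.wal.channel_len() >= self.threshold {
            return Err(TidalError::Backpressure {
                retry_after_ms: self.retry_after_ms,
            });
        }
        self.wal.enqueue();
        Ok(())
    }
}
```

With a threshold of 1, the first `signal()` call succeeds and every later call is rejected, with no dependence on writer-thread timing.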