# m7p2: Graceful Degradation, Rate Limiting, and Session Cleanup ## Delivers Automatic quality reduction under load pressure. 4-stage degradation order documented and enforced. Backpressure on write path. Per-agent token-bucket rate limiting. Session TTL auto-cleanup sweeper. All load behavior visible in query responses. ## Dependencies - m7p1 complete (observability, metrics, health check) - m4 sessions (active session tracking, `SessionHandle`, `SessionState`) - m1p4 WAL (write path, `WalHandle`, `WalSender`, channel-based group commit) - m2p5 query executor (`RetrieveExecutor`, `Results`) - m5p3 search executor (`SearchExecutor`, `SearchResults`) ## Research References - `docs/research/tidaldb_signal_ledger.md` -- signal write path, WAL durability guarantees - `CODING_GUIDELINES.md` Section 7 -- error handling, `Result` everywhere - `thoughts.md` -- lessons from Engram on graceful degradation - `ARCHITECTURE.md` -- single-node scaling philosophy ## Acceptance Criteria (Phase Level) - [ ] `DegradationLevel` enum: `Full`, `ReducedCandidates`, `CoarseAggregates`, `NoDiversity` - [ ] Load detection: `AtomicU64` tracking in-flight query count; thresholds: 200->ReducedCandidates, 500->CoarseAggregates, 1000->NoDiversity (configurable) - [ ] `ReducedCandidates`: ANN `top_k` reduced from 500 to 200; BM25 candidate limit halved - [ ] `CoarseAggregates`: windowed count reads use AllTime; velocity reads use 24h window - [ ] `NoDiversity`: diversity pass skipped entirely - [ ] Under 3x overload, all well-formed queries return results (no ServiceUnavailable) - [ ] Degradation level in `Results.degradation_level` and `SearchResults.degradation_level` - [ ] `TidalError::Backpressure { retry_after_ms: u64 }` when WAL queue depth exceeds threshold - [ ] Per-agent token-bucket rate limiting: `RateLimiter` with `DashMap<(AgentId, SessionId), TokenBucket>`; unlimited by default; configure via `RateLimiterConfig::limited(rate, burst)` in `TidalDb::builder()` - [ ] `TidalError::RateLimited { agent_id, limit, retry_after_ms }` on excess - [ ] Rate limiter does not affect non-session signal writes - [ ] Session TTL sweeper: runs every 60s; auto-closes expired sessions; `SessionSummary.auto_closed: bool` - [ ] Sweeper cancellable on db.close(); no dangling threads ## Task Execution Order ``` task-01 (DegradationLevel + LoadDetector) | +------+------+ | | | task-02 task-03 task-04 task-05 (exec) (resp+bp) (ratelimit) (sweeper) | | | | +------+------+------+ | task-06 (load test) ``` Tasks 02-05 can parallelize after task-01. Task-06 depends on all prior tasks. ## Module Location New module: `tidal/src/load/` with submodules: - `mod.rs` -- public re-exports (`DegradationLevel`, `LoadDetector`, `InFlightGuard`, `RateLimiter`) - `detector.rs` -- `LoadDetector` struct, `InFlightGuard` RAII type, degradation threshold config - `rate_limiter.rs` -- `RateLimiter`, `TokenBucket`, per-agent keying Modified modules: - `tidal/src/query/executor/mod.rs` -- `RetrieveExecutor` degradation branches - `tidal/src/query/search/executor.rs` -- `SearchExecutor` degradation branches - `tidal/src/query/retrieve/types.rs` -- `Results.degradation_level` field - `tidal/src/query/search/types.rs` -- `SearchResults.degradation_level` field - `tidal/src/schema/error.rs` -- `TidalError::Backpressure`, `TidalError::RateLimited` - `tidal/src/session/types.rs` -- `SessionSummary.auto_closed` field - `tidal/src/db/mod.rs` -- `LoadDetector` field, sweeper thread field - `tidal/src/db/sessions.rs` -- rate limiter check in `session_signal()`, sweeper logic - `tidal/src/lib.rs` -- `pub mod load;` re-export ## Notes - The `LoadDetector` uses only atomics -- no mutex on the hot path. `AtomicU64::fetch_add` on query entry, `AtomicU64::fetch_sub` via RAII `Drop` on exit. The `DegradationLevel` is computed from the current counter value at query entry time and threaded through the executor, not re-checked mid-pipeline. - The token-bucket `RateLimiter` is keyed by `(AgentId, SessionId)` so that each agent-session pair gets its own bucket. The refill is computed lazily at check time (no background thread), using `Instant::now()` delta. This avoids a timer thread and keeps the hot path lock-free via `DashMap` shard-level locking. - The session sweeper thread is a simple `loop { sleep(60s); scan; }` with a `shutdown: AtomicBool` checked each iteration. It calls `TidalDb::close_session_internal()` (a variant of `close_session` that does not require a `SessionHandle`) and sets the `auto_closed` flag. - Backpressure on the WAL write path is checked by inspecting the channel's `len()` against a configurable threshold. This is O(1) on `crossbeam::channel::bounded`. The check happens in `TidalDb::signal()` before enqueuing, not inside the WAL writer. - The `DegradationLevel` is `Copy + Clone + PartialEq + Eq + Debug` and has a `u8` repr for cheap embedding in response structs and tracing spans. ## Done When All 13 acceptance criteria above pass. `cargo test --manifest-path tidal/Cargo.toml` passes. Load test (task-06) verifies degradation progression under 3x simulated overload. No ServiceUnavailable errors under sustained load.