tidaldb/docs/planning/milestone-7/phase-2/OVERVIEW.md
2026-02-23 22:41:16 -07:00

m7p2: Graceful Degradation, Rate Limiting, and Session Cleanup

Delivers

Automatic quality reduction under load pressure. 4-stage degradation order documented and enforced. Backpressure on write path. Per-agent token-bucket rate limiting. Session TTL auto-cleanup sweeper. All load behavior visible in query responses.

Dependencies

  • m7p1 complete (observability, metrics, health check)
  • m4 sessions (active session tracking, SessionHandle, SessionState)
  • m1p4 WAL (write path, WalHandle, WalSender, channel-based group commit)
  • m2p5 query executor (RetrieveExecutor, Results)
  • m5p3 search executor (SearchExecutor, SearchResults)

Research References

  • docs/research/tidaldb_signal_ledger.md -- signal write path, WAL durability guarantees
  • CODING_GUIDELINES.md Section 7 -- error handling, Result<T, E> everywhere
  • thoughts.md -- lessons from Engram on graceful degradation
  • ARCHITECTURE.md -- single-node scaling philosophy

Acceptance Criteria (Phase Level)

  • DegradationLevel enum: Full, ReducedCandidates, CoarseAggregates, NoDiversity
  • Load detection: AtomicU64 tracking in-flight query count; thresholds: 200->ReducedCandidates, 500->CoarseAggregates, 1000->NoDiversity (configurable)
  • ReducedCandidates: ANN top_k reduced from 500 to 200; BM25 candidate limit halved
  • CoarseAggregates: windowed count reads use AllTime; velocity reads use 24h window
  • NoDiversity: diversity pass skipped entirely
  • Under 3x overload, all well-formed queries return results (no ServiceUnavailable)
  • Degradation level in Results.degradation_level and SearchResults.degradation_level
  • TidalError::Backpressure { retry_after_ms: u64 } when WAL queue depth exceeds threshold
  • Per-agent token-bucket rate limiting: RateLimiter with DashMap<(AgentId, SessionId), TokenBucket>; unlimited by default; configure via RateLimiterConfig::limited(rate, burst) in TidalDb::builder()
  • TidalError::RateLimited { agent_id, limit, retry_after_ms } on excess
  • Rate limiter does not affect non-session signal writes
  • Session TTL sweeper: runs every 60s; auto-closes expired sessions; SessionSummary.auto_closed: bool
  • Sweeper cancellable on db.close(); no dangling threads
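
A minimal sketch of how the enum, the thresholds, and the in-flight counter from the criteria above could fit together. The names Thresholds, level_for, enter, and in_flight are hypothetical, and treating each threshold as inclusive (a count of exactly 200 already degrades) is an assumption, not something the criteria state.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Ordered from full quality to most degraded; u8 repr for cheap embedding.
#[repr(u8)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum DegradationLevel {
    Full = 0,
    ReducedCandidates = 1,
    CoarseAggregates = 2,
    NoDiversity = 3,
}

/// Hypothetical config type; defaults mirror the documented thresholds.
#[derive(Copy, Clone, Debug)]
pub struct Thresholds {
    pub reduced_candidates: u64,
    pub coarse_aggregates: u64,
    pub no_diversity: u64,
}

impl Default for Thresholds {
    fn default() -> Self {
        Self { reduced_candidates: 200, coarse_aggregates: 500, no_diversity: 1000 }
    }
}

pub struct LoadDetector {
    in_flight: AtomicU64,
    thresholds: Thresholds,
}

/// RAII guard: decrements the in-flight counter when dropped.
pub struct InFlightGuard<'a> {
    detector: &'a LoadDetector,
    level: DegradationLevel,
}

impl LoadDetector {
    pub fn new(thresholds: Thresholds) -> Self {
        Self { in_flight: AtomicU64::new(0), thresholds }
    }

    /// Maps an in-flight count to a level (assumption: thresholds are inclusive).
    pub fn level_for(&self, in_flight: u64) -> DegradationLevel {
        let t = &self.thresholds;
        if in_flight >= t.no_diversity {
            DegradationLevel::NoDiversity
        } else if in_flight >= t.coarse_aggregates {
            DegradationLevel::CoarseAggregates
        } else if in_flight >= t.reduced_candidates {
            DegradationLevel::ReducedCandidates
        } else {
            DegradationLevel::Full
        }
    }

    /// Called once at query entry; the level is fixed for the whole query.
    pub fn enter(&self) -> InFlightGuard<'_> {
        // fetch_add returns the previous value, so add 1 to include this query.
        let now = self.in_flight.fetch_add(1, Ordering::Relaxed) + 1;
        InFlightGuard { detector: self, level: self.level_for(now) }
    }

    pub fn in_flight(&self) -> u64 {
        self.in_flight.load(Ordering::Relaxed)
    }
}

impl InFlightGuard<'_> {
    pub fn level(&self) -> DegradationLevel {
        self.level
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        self.detector.in_flight.fetch_sub(1, Ordering::Relaxed);
    }
}
```

An executor would hold the guard for the lifetime of the query and copy guard.level() into Results.degradation_level or SearchResults.degradation_level before returning.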

Task Execution Order

task-01 (DegradationLevel + LoadDetector)
    |
    +-----------+------------+------------+
    |           |            |            |
 task-02     task-03      task-04      task-05
 (exec)     (resp+bp)   (ratelimit)   (sweeper)
    |           |            |            |
    +-----------+------------+------------+
                       |
                    task-06 (load test)

Tasks 02-05 can parallelize after task-01. Task-06 depends on all prior tasks.

Module Location

New module: tidal/src/load/ with submodules:

  • mod.rs -- public re-exports (DegradationLevel, LoadDetector, InFlightGuard, RateLimiter)
  • detector.rs -- LoadDetector struct, InFlightGuard RAII type, degradation threshold config
  • rate_limiter.rs -- RateLimiter, TokenBucket, keying by (AgentId, SessionId)
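
As a rough illustration of what rate_limiter.rs might contain, the sketch below implements a token bucket with lazy refill. Several things here are assumptions: AgentId and SessionId are stand-in u64 aliases, a Mutex<HashMap> replaces DashMap so the example needs only the standard library, the clock is passed in as a parameter for testability, check returns the retry delay in milliseconds rather than a TidalError, and rate_per_sec is assumed to be greater than zero.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Instant;

// Hypothetical stand-ins for the real ID types.
pub type AgentId = u64;
pub type SessionId = u64;

pub struct TokenBucket {
    rate_per_sec: f64,
    burst: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// A new bucket starts full, so a burst is allowed immediately.
    pub fn new(rate_per_sec: f64, burst: f64, now: Instant) -> Self {
        Self { rate_per_sec, burst, tokens: burst, last_refill: now }
    }

    /// Takes one token, or returns Err(retry_after_ms).
    pub fn try_acquire(&mut self, now: Instant) -> Result<(), u64> {
        // Lazy refill: credit the tokens earned since the last check, capped at burst.
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.burst);
        self.last_refill = now;

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            // Time until the missing fraction of a token has been refilled.
            let deficit = 1.0 - self.tokens;
            Err((deficit / self.rate_per_sec * 1000.0).ceil() as u64)
        }
    }
}

pub struct RateLimiter {
    rate_per_sec: f64,
    burst: f64,
    // Stand-in for DashMap<(AgentId, SessionId), TokenBucket>.
    buckets: Mutex<HashMap<(AgentId, SessionId), TokenBucket>>,
}

impl RateLimiter {
    pub fn limited(rate_per_sec: f64, burst: f64) -> Self {
        Self { rate_per_sec, burst, buckets: Mutex::new(HashMap::new()) }
    }

    /// Production code would pass Instant::now() for `now`.
    pub fn check(&self, agent: AgentId, session: SessionId, now: Instant) -> Result<(), u64> {
        let mut buckets = self.buckets.lock().unwrap();
        buckets
            .entry((agent, session))
            .or_insert_with(|| TokenBucket::new(self.rate_per_sec, self.burst, now))
            .try_acquire(now)
    }
}
```

With rate 4.0 and burst 2.0, the first two checks at the same instant succeed and the third reports a 250 ms retry delay; a different (agent, session) pair is unaffected because it draws from its own bucket.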

Modified modules:

  • tidal/src/query/executor/mod.rs -- RetrieveExecutor degradation branches
  • tidal/src/query/search/executor.rs -- SearchExecutor degradation branches
  • tidal/src/query/retrieve/types.rs -- Results.degradation_level field
  • tidal/src/query/search/types.rs -- SearchResults.degradation_level field
  • tidal/src/schema/error.rs -- TidalError::Backpressure, TidalError::RateLimited
  • tidal/src/session/types.rs -- SessionSummary.auto_closed field
  • tidal/src/db/mod.rs -- LoadDetector field, sweeper thread field
  • tidal/src/db/sessions.rs -- rate limiter check in session_signal(), sweeper logic
  • tidal/src/lib.rs -- pub mod load; re-export
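
The two new error variants could be shaped roughly as follows. Only the variant and field names come from this plan; the field types for agent_id and limit, the Display wording, and the check_backpressure helper are assumptions made for illustration. "Exceeds threshold" is read here as strictly greater than.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TidalError {
    /// WAL queue depth is over the configured threshold.
    Backpressure { retry_after_ms: u64 },
    /// An agent-session pair ran out of tokens. Field types are assumed.
    RateLimited { agent_id: u64, limit: u32, retry_after_ms: u64 },
}

impl fmt::Display for TidalError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TidalError::Backpressure { retry_after_ms } => {
                write!(f, "WAL queue over threshold; retry after {retry_after_ms} ms")
            }
            TidalError::RateLimited { agent_id, limit, retry_after_ms } => {
                write!(f, "agent {agent_id} over limit {limit}; retry after {retry_after_ms} ms")
            }
        }
    }
}

impl std::error::Error for TidalError {}

/// Hypothetical helper: `queue_len` would come from the WAL channel's len().
/// The check is advisory, since the depth can change before the enqueue happens.
pub fn check_backpressure(
    queue_len: usize,
    threshold: usize,
    retry_after_ms: u64,
) -> Result<(), TidalError> {
    if queue_len > threshold {
        Err(TidalError::Backpressure { retry_after_ms })
    } else {
        Ok(())
    }
}
```

A caller can match on either variant and sleep for retry_after_ms before retrying, which keeps both signals distinguishable from hard failures.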

Notes

  • The LoadDetector uses only atomics -- no mutex on the hot path. AtomicU64::fetch_add on query entry, AtomicU64::fetch_sub via RAII Drop on exit. The DegradationLevel is computed from the current counter value at query entry time and threaded through the executor, not re-checked mid-pipeline.
  • The token-bucket RateLimiter is keyed by (AgentId, SessionId) so that each agent-session pair gets its own bucket. The refill is computed lazily at check time from the Instant::now() delta, so no background timer thread is needed. DashMap locks only the shard that holds the key, so the hot path takes no global lock, although it is not strictly lock-free.
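  • The session sweeper thread is a simple loop { sleep(60s); scan; } with a shutdown: AtomicBool checked each iteration. It calls TidalDb::close_session_internal() (a variant of close_session that does not require a SessionHandle) and sets the auto_closed flag. Because the flag is only observed between sleeps, db.close() can wait up to one full interval for the thread to exit unless the sleep is interruptible.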
  • Backpressure on the WAL write path is checked by inspecting the channel's len() against a configurable threshold. This is O(1) on crossbeam::channel::bounded. The check happens in TidalDb::signal() before enqueuing, not inside the WAL writer.
  • The DegradationLevel is Copy + Clone + PartialEq + Eq + Debug and has a u8 repr for cheap embedding in response structs and tracing spans.
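
The sweeper note above can be sketched as a small owned thread with a shutdown flag. This version swaps the plain sleep for thread::park_timeout so that stop() can wake the thread immediately; that substitution, the Sweeper type, and the scan closure (standing in for the pass that calls close_session_internal on expired sessions) are all assumptions, not the planned implementation.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

pub struct Sweeper {
    shutdown: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl Sweeper {
    /// Spawns a thread that runs `scan` roughly once per `interval`.
    pub fn spawn<F>(interval: Duration, mut scan: F) -> Self
    where
        F: FnMut() + Send + 'static,
    {
        let shutdown = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&shutdown);
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Acquire) {
                // Returns early if stop() unparks this thread. It may also wake
                // spuriously, so `scan` must tolerate running ahead of schedule.
                thread::park_timeout(interval);
                if flag.load(Ordering::Acquire) {
                    break;
                }
                scan();
            }
        });
        Self { shutdown, handle: Some(handle) }
    }

    /// Signals shutdown, wakes the thread, and joins it. Safe to call twice.
    pub fn stop(&mut self) {
        self.shutdown.store(true, Ordering::Release);
        if let Some(handle) = self.handle.take() {
            handle.thread().unpark();
            let _ = handle.join();
        }
    }
}

impl Drop for Sweeper {
    // Dropping the owner also stops the thread, so none is left dangling.
    fn drop(&mut self) {
        self.stop();
    }
}
```

Joining in stop() is what gives the "no dangling threads" guarantee: once db.close() returns, the sweeper thread has exited.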

Done When

All 13 acceptance criteria above pass. cargo test --manifest-path tidal/Cargo.toml passes. Load test (task-06) verifies degradation progression under 3x simulated overload. No ServiceUnavailable errors under sustained load.