m7p2: Graceful Degradation, Rate Limiting, and Session Cleanup
Delivers
Automatic quality reduction under load pressure. 4-stage degradation order documented and enforced. Backpressure on write path. Per-agent token-bucket rate limiting. Session TTL auto-cleanup sweeper. All load behavior visible in query responses.
Dependencies
- m7p1 complete (observability, metrics, health check)
- m4 sessions (active session tracking, SessionHandle, SessionState)
- m1p4 WAL (write path, WalHandle, WalSender, channel-based group commit)
- m2p5 query executor (RetrieveExecutor, Results)
- m5p3 search executor (SearchExecutor, SearchResults)
Research References
- docs/research/tidaldb_signal_ledger.md -- signal write path, WAL durability guarantees
- CODING_GUIDELINES.md Section 7 -- error handling, Result<T, E> everywhere
- thoughts.md -- lessons from Engram on graceful degradation
- ARCHITECTURE.md -- single-node scaling philosophy
Acceptance Criteria (Phase Level)
- DegradationLevel enum: Full, ReducedCandidates, CoarseAggregates, NoDiversity
- Load detection: AtomicU64 tracking in-flight query count; thresholds: 200 -> ReducedCandidates, 500 -> CoarseAggregates, 1000 -> NoDiversity (configurable)
- ReducedCandidates: ANN top_k reduced from 500 to 200; BM25 candidate limit halved
- CoarseAggregates: windowed count reads use AllTime; velocity reads use 24h window
- NoDiversity: diversity pass skipped entirely
- Under 3x overload, all well-formed queries return results (no ServiceUnavailable)
- Degradation level in Results.degradation_level and SearchResults.degradation_level
- TidalError::Backpressure { retry_after_ms: u64 } when WAL queue depth exceeds threshold
- Per-agent token-bucket rate limiting: RateLimiter with DashMap<(AgentId, SessionId), TokenBucket>; unlimited by default; configure via RateLimiterConfig::limited(rate, burst) in TidalDb::builder()
- TidalError::RateLimited { agent_id, limit, retry_after_ms } on excess
- Rate limiter does not affect non-session signal writes
- Session TTL sweeper: runs every 60s; auto-closes expired sessions; SessionSummary.auto_closed: bool
- Sweeper cancellable on db.close(); no dangling threads
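The last two criteria (periodic sweep, cancellable on close, no dangling threads) can be illustrated with a minimal, std-only sketch. Everything named here is hypothetical: the Sweeper type, its spawn and stop methods, and the scan closure that stands in for the real expired-session scan. One deliberate deviation is assumed: the loop waits with thread::park_timeout instead of a plain sleep, so that stop() can wake the thread immediately rather than waiting out the interval. park_timeout may wake spuriously, which in this sketch only means an occasional early scan.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle for the background sweeper thread.
pub struct Sweeper {
    shutdown: Arc<AtomicBool>,
    handle: Option<JoinHandle<()>>,
}

impl Sweeper {
    /// Spawns the sweeper. `scan` is a stand-in for "find expired sessions
    /// and auto-close them"; the real scan is not shown in this document.
    pub fn spawn<F>(interval: Duration, mut scan: F) -> Self
    where
        F: FnMut() + Send + 'static,
    {
        let shutdown = Arc::new(AtomicBool::new(false));
        let flag = Arc::clone(&shutdown);
        let handle = thread::spawn(move || {
            while !flag.load(Ordering::Acquire) {
                // Wait for one interval, or until stop() unparks this thread.
                thread::park_timeout(interval);
                if flag.load(Ordering::Acquire) {
                    break;
                }
                scan();
            }
        });
        Self { shutdown, handle: Some(handle) }
    }

    /// Signals shutdown, wakes the thread, and joins it.
    /// Safe to call more than once; later calls are no-ops.
    pub fn stop(&mut self) {
        self.shutdown.store(true, Ordering::Release);
        if let Some(handle) = self.handle.take() {
            handle.thread().unpark();
            let _ = handle.join();
        }
    }
}

impl Drop for Sweeper {
    /// Joining in Drop ensures the thread never outlives its owner.
    fn drop(&mut self) {
        self.stop();
    }
}
```

In this sketch, a close path would call stop() (or simply drop the Sweeper), and the production interval would be Duration::from_secs(60).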
Task Execution Order
                 task-01 (DegradationLevel + LoadDetector)
                    |
    +----------+----+-----+-----------+
    |          |          |           |
 task-02    task-03    task-04     task-05
 (exec)    (resp+bp)  (ratelimit)  (sweeper)
    |          |          |           |
    +----------+----+-----+-----------+
                    |
                 task-06 (load test)
Tasks 02-05 can parallelize after task-01. Task-06 depends on all prior tasks.
Module Location
New module: tidal/src/load/ with submodules:
- mod.rs -- public re-exports (DegradationLevel, LoadDetector, InFlightGuard, RateLimiter)
- detector.rs -- LoadDetector struct, InFlightGuard RAII type, degradation threshold config
- rate_limiter.rs -- RateLimiter, TokenBucket, per-agent keying
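As a rough sketch of what detector.rs might contain, the following shows an atomics-only in-flight counter, an RAII guard that decrements on drop, and the mapping from counter value to DegradationLevel using the default thresholds from the acceptance criteria (200, 500, 1000). The names DegradationThresholds, enter, level_for, in_flight, and ann_top_k are assumptions for illustration, as are the choice of >= for threshold comparison and the assumption that deeper levels keep the reduced top_k.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Degradation stages, ordered from least to most degraded.
#[repr(u8)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum DegradationLevel {
    Full = 0,
    ReducedCandidates = 1,
    CoarseAggregates = 2,
    NoDiversity = 3,
}

/// Hypothetical threshold config; defaults follow the acceptance criteria.
#[derive(Copy, Clone, Debug)]
pub struct DegradationThresholds {
    pub reduced_candidates: u64,
    pub coarse_aggregates: u64,
    pub no_diversity: u64,
}

impl Default for DegradationThresholds {
    fn default() -> Self {
        Self { reduced_candidates: 200, coarse_aggregates: 500, no_diversity: 1000 }
    }
}

pub struct LoadDetector {
    in_flight: AtomicU64,
    thresholds: DegradationThresholds,
}

/// RAII guard: decrements the in-flight counter when dropped.
pub struct InFlightGuard<'a> {
    detector: &'a LoadDetector,
    level: DegradationLevel,
}

impl LoadDetector {
    pub fn new(thresholds: DegradationThresholds) -> Self {
        Self { in_flight: AtomicU64::new(0), thresholds }
    }

    /// Maps an in-flight count to a level (>= comparison is an assumption).
    pub fn level_for(&self, in_flight: u64) -> DegradationLevel {
        let t = &self.thresholds;
        if in_flight >= t.no_diversity {
            DegradationLevel::NoDiversity
        } else if in_flight >= t.coarse_aggregates {
            DegradationLevel::CoarseAggregates
        } else if in_flight >= t.reduced_candidates {
            DegradationLevel::ReducedCandidates
        } else {
            DegradationLevel::Full
        }
    }

    /// Registers a query. The level is fixed here, at entry, and carried by
    /// the guard for the rest of the pipeline.
    pub fn enter(&self) -> InFlightGuard<'_> {
        let now = self.in_flight.fetch_add(1, Ordering::Relaxed) + 1;
        InFlightGuard { detector: self, level: self.level_for(now) }
    }

    pub fn in_flight(&self) -> u64 {
        self.in_flight.load(Ordering::Relaxed)
    }
}

impl InFlightGuard<'_> {
    pub fn level(&self) -> DegradationLevel {
        self.level
    }
}

impl Drop for InFlightGuard<'_> {
    fn drop(&mut self) {
        self.detector.in_flight.fetch_sub(1, Ordering::Relaxed);
    }
}

/// ANN top_k per level: 500 at Full, 200 once any degradation applies
/// (assumes deeper levels keep the reduced candidate count).
pub fn ann_top_k(level: DegradationLevel) -> usize {
    if level == DegradationLevel::Full { 500 } else { 200 }
}
```

An executor in this sketch would call enter() once at the start of a query, read guard.level() to choose its branches, and let the guard drop when the query finishes, including on early return or error.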
Modified modules:
- tidal/src/query/executor/mod.rs -- RetrieveExecutor degradation branches
- tidal/src/query/search/executor.rs -- SearchExecutor degradation branches
- tidal/src/query/retrieve/types.rs -- Results.degradation_level field
- tidal/src/query/search/types.rs -- SearchResults.degradation_level field
- tidal/src/schema/error.rs -- TidalError::Backpressure, TidalError::RateLimited
- tidal/src/session/types.rs -- SessionSummary.auto_closed field
- tidal/src/db/mod.rs -- LoadDetector field, sweeper thread field
- tidal/src/db/sessions.rs -- rate limiter check in session_signal(), sweeper logic
- tidal/src/lib.rs -- pub mod load; re-export
Notes
- The LoadDetector uses only atomics -- no mutex on the hot path. AtomicU64::fetch_add on query entry, AtomicU64::fetch_sub via RAII Drop on exit. The DegradationLevel is computed from the current counter value at query entry time and threaded through the executor, not re-checked mid-pipeline.
- The token-bucket RateLimiter is keyed by (AgentId, SessionId) so that each agent-session pair gets its own bucket. The refill is computed lazily at check time from the Instant::now() delta, so no background timer thread is needed. The hot path takes no global lock; contention is limited to DashMap's shard-level locks.
- The session sweeper thread is a simple loop { sleep(60s); scan; } with a shutdown: AtomicBool checked each iteration. It calls TidalDb::close_session_internal() (a variant of close_session that does not require a SessionHandle) and sets the auto_closed flag.
- Backpressure on the WAL write path is checked by comparing the channel's len() against a configurable threshold. This is O(1) on crossbeam::channel::bounded. The check happens in TidalDb::signal() before enqueuing, not inside the WAL writer.
- The DegradationLevel is Copy + Clone + PartialEq + Eq + Debug and has a u8 repr for cheap embedding in response structs and tracing spans.
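The lazy refill described in the notes above can be sketched as a single bucket with no background thread. This is a minimal illustration, not the planned implementation: the field names, the try_acquire method, the f64 token accounting, and passing the current Instant as a parameter (to keep the example deterministic) are all assumptions. The per-key DashMap wrapper is omitted so the sketch stays std-only, and rate_per_sec is assumed to be positive.

```rust
use std::time::Instant;

/// Hypothetical single bucket; the real RateLimiter would hold one per
/// (AgentId, SessionId) key.
#[derive(Debug)]
pub struct TokenBucket {
    rate_per_sec: f64,
    burst: f64,
    tokens: f64,
    last_refill: Instant,
}

impl TokenBucket {
    /// Starts full, so a new agent-session pair can burst immediately.
    pub fn new(rate_per_sec: f64, burst: f64, now: Instant) -> Self {
        Self { rate_per_sec, burst, tokens: burst, last_refill: now }
    }

    /// Takes one token if available. On refusal, returns the number of
    /// milliseconds until one full token will have accumulated, which could
    /// feed a retry_after_ms field.
    pub fn try_acquire(&mut self, now: Instant) -> Result<(), u64> {
        // Lazy refill: credit tokens for the time elapsed since the last
        // check, capped at the burst size. A `now` that is not later than
        // the last refill credits nothing.
        if now > self.last_refill {
            let elapsed = now.duration_since(self.last_refill).as_secs_f64();
            self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.burst);
            self.last_refill = now;
        }

        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            let deficit = 1.0 - self.tokens;
            Err((deficit * 1000.0 / self.rate_per_sec).ceil() as u64)
        }
    }
}
```

A caller would typically pass Instant::now() and map the Err value into the rate-limit error returned to the agent.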
Done When
All 13 acceptance criteria above pass. cargo test --manifest-path tidal/Cargo.toml passes. Load test (task-06) verifies degradation progression under 3x simulated overload. No ServiceUnavailable errors under sustained load.