m7p5: M7 UAT Integration Test
Delivers
End-to-end M7 UAT integration test suite (tidal/tests/m7_uat.rs) proving that all production hardening capabilities work together. Crash recovery, graceful degradation, rate limiting, session auto-cleanup, observability (QueryStats + Prometheus metrics), RLHF export, and cross-session aggregation are all exercised in a single test file with separate #[test] functions.
Dependencies
- m7p1 (Crash Recovery Hardening) -- CrashPoint enum, WAL compaction, checkpoint BLAKE3 integrity, crash fencing for M6 state
- m7p2 (Graceful Degradation, Rate Limiting, Session Cleanup) -- DegradationLevel, TidalError::RateLimited, TidalError::Backpressure, session TTL auto-cleanup sweeper, SessionSummary.auto_closed
- m7p3 (Performance at Scale) -- benchmarks and optimizations; UAT validates behaviour, not scale numbers
- m7p4 (Operational Visibility) -- QueryStats, Prometheus metrics export, db.export_signals(), db.user_session_summary()
Research References
- All M7 phase specifications in docs/planning/ROADMAP.md
- docs/research/tidaldb_wal.md -- crash recovery, segment format
- docs/research/tidaldb_signal_ledger.md -- checkpoint format, running-score formula
- thoughts.md Part V -- graceful degradation, operational simplicity
Acceptance Criteria (Phase Level)
- tidal/tests/m7_uat.rs with separate #[test] functions per UAT step
- Crash recovery tests: write items + signals; simulate crash at WAL-write, checkpoint, and with M6 state; verify recovery produces correct state; verify hard negatives (hidden items, blocked creators) never leak after any crash scenario
- Session cleanup test: create session with 30s TTL; wait 35s; verify sweeper auto-closed the session; verify auto_closed: true in summary
- Degradation test: simulate concurrent load above threshold; verify degradation_level in response matches expected level; verify all queries still return results
- Rate limiting test: configure 10 signals/sec rate limit; write 50 signals in 1 second; first 10 succeed; remaining return TidalError::RateLimited; other sessions unaffected
- QueryStats test: execute RETRIEVE and SEARCH; verify stats field populated with non-zero candidates_considered, scoring_time_us, total_time_us
- Metrics test: verify Prometheus text output contains expected metric names (tidaldb_signal_hot_entries, tidaldb_wal_lag_bytes, etc.)
- Export + aggregation test: write session signals; close session; export_signals() returns expected events; user_session_summary() returns correct counts
- All prior UAT suites pass (m2_uat, m3_uat, m4_uat, m5_uat, m6_uat) -- no regressions
- No individual test exceeds 60 seconds
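The metrics criterion above comes down to checking that each expected name appears as a sample in the Prometheus text output. A minimal sketch of such a check follows. The helpers `metric_names` and `missing_metrics` are hypothetical names introduced here for illustration, and how the real suite obtains the text payload from the database is not shown in this document.

```rust
use std::collections::HashSet;

/// Collects the names of metrics that have at least one sample line in a
/// Prometheus text-format payload. Comment lines (`# HELP`, `# TYPE`) and
/// blank lines are skipped.
/// Hypothetical helper: not part of the tidal API.
fn metric_names(exposition: &str) -> HashSet<&str> {
    exposition
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            // A sample line is `name{labels} value` or `name value`;
            // the name ends at the first `{` or whitespace character.
            line.split(|c: char| c == '{' || c.is_whitespace()).next()
        })
        .filter(|name| !name.is_empty())
        .collect()
}

/// Returns the expected metric names that are absent from the payload.
/// An empty result means the metrics test passes.
fn missing_metrics<'a>(exposition: &str, expected: &[&'a str]) -> Vec<&'a str> {
    let present = metric_names(exposition);
    expected
        .iter()
        .copied()
        .filter(|name| !present.contains(*name))
        .collect()
}
```

Returning the list of missing names, rather than a boolean, lets a failing assertion report exactly which metrics were not exported. A name that appears only in a `# HELP` or `# TYPE` comment counts as missing under this sketch, since no sample was emitted for it.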
Task Execution Order
task-01 (Crash Recovery UAT)  ──┐
                                ├── task-04 (Regression Gate)
task-02 (Degradation + Rate   ──┤
         Limiting + Cleanup)    │
                                │
task-03 (Observability +      ──┘
         Export UAT)
Tasks 01, 02, 03 can be implemented in parallel (they are independent test functions in the same file). Task 04 runs last -- it verifies nothing regressed across all prior UAT suites.
Tasks
| # | Task | Delivers | Complexity |
|---|---|---|---|
| 01 | Crash recovery UAT tests | 3 tests: crash at WAL-write, crash at checkpoint, crash with M6 state; verify correct recovery and hard negative invariant | L |
| 02 | Degradation + rate limiting + session cleanup UAT tests | 3 tests: degradation progression, rate limiting isolation, session auto-cleanup | L |
| 03 | Observability + export UAT tests | 3 tests: QueryStats populated, metrics content, RLHF export + session aggregation | M |
| 04 | Regression gate | Verify all prior UAT suites pass (m2_uat through m6_uat) | S |
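The hard negative invariant in task 01 can be expressed as one reusable check that runs after every crash scenario. The sketch below assumes a hypothetical `ResultRow` type with numeric item and creator ids. The real result type in tidal is not described in this document, so treat the field names as placeholders.

```rust
use std::collections::HashSet;

/// Hypothetical result row; a stand-in for whatever a query returns
/// after the database is reopened.
#[derive(Debug, Clone, PartialEq)]
struct ResultRow {
    item_id: u64,
    creator_id: u64,
}

/// Returns every row that violates the hard negative invariant: an item the
/// user hid, or any item from a creator the user blocked.
/// An empty result means nothing leaked.
fn leaked_hard_negatives<'a>(
    results: &'a [ResultRow],
    hidden_items: &HashSet<u64>,
    blocked_creators: &HashSet<u64>,
) -> Vec<&'a ResultRow> {
    results
        .iter()
        .filter(|row| {
            hidden_items.contains(&row.item_id) || blocked_creators.contains(&row.creator_id)
        })
        .collect()
}
```

Each of the three crash tests could call this helper on the results of its post-recovery query and assert that the returned vector is empty, so the invariant is checked the same way in every scenario.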
Notes
- Tests use small datasets (100-500 items, not 1M) to keep individual test runtime under 60 seconds. Scale performance is already validated by m7p3 benchmarks.
- Crash recovery tests use TempTidalHome for on-disk durability and #[cfg(feature = "test-utils")] for crash injection hooks.
- The session cleanup test is the only test that involves a wall-clock wait (35s). All other tests are compute-bound and complete in under 5 seconds.
- Rate limiting tests use a low token count (10/sec) so they do not need to actually wait -- they exhaust the bucket immediately.
- The regression gate (task-04) is a shell-level check documented in the task file; it does not add Rust code.
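To illustrate why the rate limiting test needs no waiting, here is a minimal token bucket sketch with one bucket per session. It is an illustration under stated assumptions, not tidal's limiter: `TokenBucket`, `SessionLimiter`, and `WriteError` are hypothetical names (the last one stands in for `TidalError::RateLimited`), and time is passed in explicitly as seconds so the example is deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `TidalError::RateLimited` variant.
#[derive(Debug, PartialEq)]
enum WriteError {
    RateLimited,
}

/// Token bucket that holds at most `capacity` tokens and refills at
/// `refill_per_sec` tokens per second. It starts full.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    fn new(per_sec: u32, now_secs: f64) -> Self {
        let cap = f64::from(per_sec);
        Self { capacity: cap, tokens: cap, refill_per_sec: cap, last_refill_secs: now_secs }
    }

    /// Refills for the elapsed time, then spends one token if available.
    fn try_acquire(&mut self, now_secs: f64) -> Result<(), WriteError> {
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            Err(WriteError::RateLimited)
        }
    }
}

/// Keeps one bucket per session, so exhausting one session's bucket
/// leaves every other session unaffected.
struct SessionLimiter {
    per_sec: u32,
    buckets: HashMap<String, TokenBucket>,
}

impl SessionLimiter {
    fn new(per_sec: u32) -> Self {
        Self { per_sec, buckets: HashMap::new() }
    }

    fn write_signal(&mut self, session: &str, now_secs: f64) -> Result<(), WriteError> {
        let per_sec = self.per_sec;
        self.buckets
            .entry(session.to_string())
            .or_insert_with(|| TokenBucket::new(per_sec, now_secs))
            .try_acquire(now_secs)
    }
}
```

With a limit of 10/sec, 50 writes issued at the same instant drain the full bucket on the first 10 calls and the remaining 40 are rejected, which is the pattern the acceptance criteria describe. A write from a different session at that same instant still succeeds because it draws from its own bucket.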
Done When
- cargo test --manifest-path tidal/Cargo.toml --test m7_uat passes with all tests green.
- Each individual test completes in under 60 seconds.
- cargo test --manifest-path tidal/Cargo.toml --test m2_uat --test m3_uat --test m4_uat --test m5_uat --test m6_uat all pass -- no regressions.
- cargo clippy --manifest-path tidal/Cargo.toml -- -D warnings passes.