# m7p5: M7 UAT Integration Test ## Delivers End-to-end M7 UAT integration test suite (`tidal/tests/m7_uat.rs`) proving all production hardening capabilities work together. Crash recovery, graceful degradation, rate limiting, session auto-cleanup, observability (QueryStats + Prometheus metrics), RLHF export, and cross-session aggregation all exercised in a single comprehensive test file with separate `#[test]` functions. ## Dependencies - m7p1 (Crash Recovery Hardening) -- `CrashPoint` enum, WAL compaction, checkpoint BLAKE3 integrity, crash fencing for M6 state - m7p2 (Graceful Degradation, Rate Limiting, Session Cleanup) -- `DegradationLevel`, `TidalError::RateLimited`, `TidalError::Backpressure`, session TTL auto-cleanup sweeper, `SessionSummary.auto_closed` - m7p3 (Performance at Scale) -- benchmarks and optimizations; UAT validates behaviour, not scale numbers - m7p4 (Operational Visibility) -- `QueryStats`, Prometheus metrics export, `db.export_signals()`, `db.user_session_summary()` ## Research References - All M7 phase specifications in `docs/planning/ROADMAP.md` - `docs/research/tidaldb_wal.md` -- crash recovery, segment format - `docs/research/tidaldb_signal_ledger.md` -- checkpoint format, running-score formula - `thoughts.md` Part V -- graceful degradation, operational simplicity ## Acceptance Criteria (Phase Level) - [ ] `tidal/tests/m7_uat.rs` with separate `#[test]` functions per UAT step - [ ] Crash recovery tests: write items + signals; simulate crash at WAL-write, checkpoint, and with M6 state; verify recovery produces correct state; verify hard negatives (hidden items, blocked creators) never leak after any crash scenario - [ ] Session cleanup test: create session with 30s TTL; wait 35s; verify sweeper auto-closed the session; verify `auto_closed: true` in summary - [ ] Degradation test: simulate concurrent load above threshold; verify `degradation_level` in response matches expected level; verify all queries still return results - [ ] Rate limiting test: configure 10 signals/sec rate limit; write 50 signals in 1 second; first 10 succeed; remaining return `TidalError::RateLimited`; other sessions unaffected - [ ] QueryStats test: execute RETRIEVE and SEARCH; verify `stats` field populated with non-zero `candidates_considered`, `scoring_time_us`, `total_time_us` - [ ] Metrics test: verify Prometheus text output contains expected metric names (`tidaldb_signal_hot_entries`, `tidaldb_wal_lag_bytes`, etc.) - [ ] Export + aggregation test: write session signals; close session; `export_signals()` returns expected events; `user_session_summary()` returns correct counts - [ ] All prior UAT suites pass (m2_uat, m3_uat, m4_uat, m5_uat, m6_uat) -- no regressions - [ ] No individual test exceeds 60 seconds ## Task Execution Order ``` task-01 (Crash Recovery UAT) ──┐ ├── task-04 (Regression Gate) task-02 (Degradation + Rate ──┤ Limiting + Cleanup) │ │ task-03 (Observability + ──┘ Export UAT) ``` Tasks 01, 02, 03 can be implemented in parallel (they are independent test functions in the same file). Task 04 runs last -- it verifies nothing regressed across all prior UAT suites. ## Tasks | # | Task | Delivers | Complexity | |---|------|----------|------------| | 01 | Crash recovery UAT tests | 3 tests: crash at WAL-write, crash at checkpoint, crash with M6 state; verify correct recovery and hard negative invariant | L | | 02 | Degradation + rate limiting + session cleanup UAT tests | 3 tests: degradation progression, rate limiting isolation, session auto-cleanup | L | | 03 | Observability + export UAT tests | 3 tests: QueryStats populated, metrics content, RLHF export + session aggregation | M | | 04 | Regression gate | Verify all prior UAT suites pass (m2_uat through m6_uat) | S | ## Notes - Tests use small datasets (100-500 items, not 1M) to keep individual test runtime under 60 seconds. Scale performance is already validated by m7p3 benchmarks. - Crash recovery tests use `TempTidalHome` for on-disk durability and `#[cfg(feature = "test-utils")]` for crash injection hooks. - The session cleanup test is the only test that involves a wall-clock wait (35s). All other tests are compute-bound and complete in under 5 seconds. - Rate limiting tests use a low token count (10/sec) so they do not need to actually wait -- they exhaust the bucket immediately. - The regression gate (task-04) is a shell-level check documented in the task file; it does not add Rust code. ## Done When 1. `cargo test --manifest-path tidal/Cargo.toml --test m7_uat` passes with all tests green. 2. Each individual test completes in under 60 seconds. 3. `cargo test --manifest-path tidal/Cargo.toml --test m2_uat --test m3_uat --test m4_uat --test m5_uat --test m6_uat` all pass -- no regressions. 4. `cargo clippy --manifest-path tidal/Cargo.toml -- -D warnings` passes.