tidaldb/docs/planning/milestone-7/phase-5/OVERVIEW.md

# m7p5: M7 UAT Integration Test

## Delivers

End-to-end M7 UAT integration test suite (`tidal/tests/m7_uat.rs`) proving that all production-hardening capabilities work together. Crash recovery, graceful degradation, rate limiting, session auto-cleanup, observability (QueryStats + Prometheus metrics), RLHF export, and cross-session aggregation are all exercised in a single test file with separate `#[test]` functions.

## Dependencies

- m7p1 (Crash Recovery Hardening) -- `CrashPoint` enum, WAL compaction, checkpoint BLAKE3 integrity, crash fencing for M6 state
- m7p2 (Graceful Degradation, Rate Limiting, Session Cleanup) -- `DegradationLevel`, `TidalError::RateLimited`, `TidalError::Backpressure`, session TTL auto-cleanup sweeper, `SessionSummary.auto_closed`
- m7p3 (Performance at Scale) -- benchmarks and optimizations; UAT validates behaviour, not scale numbers
- m7p4 (Operational Visibility) -- `QueryStats`, Prometheus metrics export, `db.export_signals()`, `db.user_session_summary()`

## Research References

- All M7 phase specifications in `docs/planning/ROADMAP.md`
- `docs/research/tidaldb_wal.md` -- crash recovery, segment format
- `docs/research/tidaldb_signal_ledger.md` -- checkpoint format, running-score formula
- `thoughts.md` Part V -- graceful degradation, operational simplicity

## Acceptance Criteria (Phase Level)

- [ ] `tidal/tests/m7_uat.rs` with separate `#[test]` functions per UAT step
- [ ] Crash recovery tests: write items + signals; simulate crash at WAL-write, checkpoint, and with M6 state; verify recovery produces correct state; verify hard negatives (hidden items, blocked creators) never leak after any crash scenario
- [ ] Session cleanup test: create session with 30s TTL; wait 35s; verify sweeper auto-closed the session; verify `auto_closed: true` in summary
- [ ] Degradation test: simulate concurrent load above threshold; verify `degradation_level` in response matches expected level; verify all queries still return results
- [ ] Rate limiting test: configure a 10 signals/sec rate limit; write 50 signals within 1 second; the first 10 succeed; the remaining 40 return `TidalError::RateLimited`; other sessions are unaffected
- [ ] QueryStats test: execute RETRIEVE and SEARCH; verify `stats` field populated with non-zero `candidates_considered`, `scoring_time_us`, `total_time_us`
- [ ] Metrics test: verify Prometheus text output contains expected metric names (`tidaldb_signal_hot_entries`, `tidaldb_wal_lag_bytes`, etc.)
- [ ] Export + aggregation test: write session signals; close session; `export_signals()` returns expected events; `user_session_summary()` returns correct counts
- [ ] All prior UAT suites pass (m2_uat, m3_uat, m4_uat, m5_uat, m6_uat) -- no regressions
- [ ] No individual test exceeds 60 seconds
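
The metrics criterion above reduces to a name check over Prometheus text-format output. Below is a minimal sketch of such a check, assuming the test already holds the exported text as a string. `metric_names` and `missing_metrics` are hypothetical helpers, not part of tidaldb, and how the text is obtained from the database is not shown here.

```rust
use std::collections::HashSet;

/// Collects the metric names found on sample lines of a Prometheus
/// text-format payload. Comment lines (`# HELP`, `# TYPE`) and blank
/// lines are skipped, so a metric that is only declared does not count.
fn metric_names(text: &str) -> HashSet<&str> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            // The name ends at the first `{` (label set) or whitespace (value).
            line.split(|c: char| c == '{' || c.is_whitespace()).next()
        })
        .collect()
}

/// Returns the expected names that have no sample line in the payload.
/// (Hypothetical helper for illustration only.)
fn missing_metrics<'a>(text: &str, expected: &[&'a str]) -> Vec<&'a str> {
    let present = metric_names(text);
    expected
        .iter()
        .copied()
        .filter(|name| !present.contains(*name))
        .collect()
}
```

In a test, asserting that `missing_metrics(&text, &["tidaldb_signal_hot_entries", "tidaldb_wal_lag_bytes"])` is empty gives a failure message that lists exactly which names are absent, which is easier to debug than a chain of `contains` assertions.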

## Task Execution Order

```
task-01 (Crash Recovery UAT)  ──┐
                                ├──  task-04 (Regression Gate)
task-02 (Degradation + Rate   ──┤
         Limiting + Cleanup)    │
                                │
task-03 (Observability +      ──┘
         Export UAT)
```

Tasks 01, 02, and 03 can be implemented in parallel (they are independent test functions in the same file). Task 04 runs last -- it verifies that none of the prior UAT suites has regressed.

## Tasks

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Crash recovery UAT tests | 3 tests: crash at WAL-write, crash at checkpoint, crash with M6 state; verify correct recovery and hard negative invariant | L |
| 02 | Degradation + rate limiting + session cleanup UAT tests | 3 tests: degradation progression, rate limiting isolation, session auto-cleanup | L |
| 03 | Observability + export UAT tests | 3 tests: QueryStats populated, metrics content, RLHF export + session aggregation | M |
| 04 | Regression gate | Verify all prior UAT suites pass (m2_uat through m6_uat) | S |
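
The session auto-cleanup behaviour that task 02 exercises (30s TTL, observed at 35s, `auto_closed: true`) can be reasoned about with a small model. This is a sketch only: `SessionTable`, `OpenSession`, and this `SessionSummary` are hypothetical stand-ins, not tidaldb types, and measuring the TTL from last activity rather than from creation is an assumption. Time is passed in explicitly so the model can be checked without a wall-clock wait, unlike the real UAT test.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical stand-in for the summary type; only the `auto_closed`
/// field name comes from the plan.
#[derive(Debug, Clone, PartialEq)]
struct SessionSummary {
    session_id: String,
    auto_closed: bool,
}

/// An open session. Times are offsets from an arbitrary test epoch.
struct OpenSession {
    last_activity: Duration,
    ttl: Duration,
}

#[derive(Default)]
struct SessionTable {
    open: HashMap<String, OpenSession>,
    closed: Vec<SessionSummary>,
}

impl SessionTable {
    fn open_session(&mut self, id: &str, ttl: Duration, now: Duration) {
        self.open
            .insert(id.to_string(), OpenSession { last_activity: now, ttl });
    }

    /// One sweeper pass: closes every session that has been idle for longer
    /// than its TTL and returns how many sessions were closed.
    fn sweep(&mut self, now: Duration) -> usize {
        let expired: Vec<String> = self
            .open
            .iter()
            .filter(|(_, s)| now.saturating_sub(s.last_activity) > s.ttl)
            .map(|(id, _)| id.clone())
            .collect();
        for id in &expired {
            self.open.remove(id);
            self.closed.push(SessionSummary {
                session_id: id.clone(),
                auto_closed: true,
            });
        }
        expired.len()
    }

    /// Summary of a closed session, if the sweeper has closed it.
    fn summary(&self, id: &str) -> Option<&SessionSummary> {
        self.closed.iter().find(|s| s.session_id == id)
    }
}
```

With a 30s TTL, a sweep at 29s closes nothing and a sweep at 35s closes the session with `auto_closed` set, which mirrors the 30s/35s pair in the acceptance criteria.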

## Notes

- Tests use small datasets (100-500 items, not 1M) to keep individual test runtime under 60 seconds. Scale performance is already validated by m7p3 benchmarks.
- Crash recovery tests use `TempTidalHome` for on-disk durability and `#[cfg(feature = "test-utils")]` for crash injection hooks.
- The session cleanup test is the only test that involves a wall-clock wait (35s). All other tests are compute-bound and should complete in under 5 seconds.
- Rate limiting tests use a low token count (10/sec) so they do not need to actually wait -- they exhaust the bucket immediately.
- The regression gate (task-04) is a shell-level check documented in the task file; it does not add Rust code.
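
The note on rate limiting relies on a bucket that a burst drains before any refill occurs. Below is a minimal token-bucket sketch of that reasoning, assuming one bucket per session. `TokenBucket`, `SessionLimiter`, `WriteError`, and `burst` are hypothetical names for illustration and say nothing about how tidaldb implements its limiter. The clock is passed in as a plain number of seconds so the behaviour is deterministic.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `TidalError::RateLimited`.
#[derive(Debug, PartialEq)]
enum WriteError {
    RateLimited,
}

/// Minimal token bucket: holds at most `capacity` tokens and refills
/// continuously at `refill_per_sec`.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill_secs: f64,
}

impl TokenBucket {
    /// Starts full, with capacity equal to the per-second rate.
    fn new(rate_per_sec: f64, now_secs: f64) -> Self {
        Self {
            capacity: rate_per_sec,
            tokens: rate_per_sec,
            refill_per_sec: rate_per_sec,
            last_refill_secs: now_secs,
        }
    }

    /// Refills for the elapsed time, then spends one token if available.
    fn try_acquire(&mut self, now_secs: f64) -> Result<(), WriteError> {
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill_secs = now_secs;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            Ok(())
        } else {
            Err(WriteError::RateLimited)
        }
    }
}

/// One bucket per session, so draining one session leaves the others untouched.
struct SessionLimiter {
    rate_per_sec: f64,
    buckets: HashMap<String, TokenBucket>,
}

impl SessionLimiter {
    fn new(rate_per_sec: f64) -> Self {
        Self { rate_per_sec, buckets: HashMap::new() }
    }

    fn write_signal(&mut self, session: &str, now_secs: f64) -> Result<(), WriteError> {
        let rate = self.rate_per_sec;
        self.buckets
            .entry(session.to_string())
            .or_insert_with(|| TokenBucket::new(rate, now_secs))
            .try_acquire(now_secs)
    }
}

/// Fires `n` writes at the same instant and returns (accepted, rejected).
fn burst(limiter: &mut SessionLimiter, session: &str, n: usize, now_secs: f64) -> (usize, usize) {
    let accepted = (0..n)
        .filter(|_| limiter.write_signal(session, now_secs).is_ok())
        .count();
    (accepted, n - accepted)
}
```

In this model a burst of 50 writes at one instant against a 10/sec limiter is split 10 accepted and 40 rejected, while a second session still has its full bucket. A real burst spread across a measurable interval may see a few extra tokens refill, so an exact 10/40 assertion is safest when the writes are issued back to back.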

## Done When

1. `cargo test --manifest-path tidal/Cargo.toml --test m7_uat` passes with all tests green.
2. Each individual test completes in under 60 seconds.
3. `cargo test --manifest-path tidal/Cargo.toml --test m2_uat --test m3_uat --test m4_uat --test m5_uat --test m6_uat` passes for every prior suite -- no regressions.
4. `cargo clippy --manifest-path tidal/Cargo.toml -- -D warnings` passes.