tidaldb/docs/planning/milestone-7/phase-5/OVERVIEW.md
2026-02-23 22:41:16 -07:00

4.8 KiB

m7p5: M7 UAT Integration Test

Delivers

End-to-end M7 UAT integration test suite (tidal/tests/m7_uat.rs) proving all production hardening capabilities work together. Crash recovery, graceful degradation, rate limiting, session auto-cleanup, observability (QueryStats + Prometheus metrics), RLHF export, and cross-session aggregation all exercised in a single comprehensive test file with separate #[test] functions.

Dependencies

  • m7p1 (Crash Recovery Hardening) -- CrashPoint enum, WAL compaction, checkpoint BLAKE3 integrity, crash fencing for M6 state
  • m7p2 (Graceful Degradation, Rate Limiting, Session Cleanup) -- DegradationLevel, TidalError::RateLimited, TidalError::Backpressure, session TTL auto-cleanup sweeper, SessionSummary.auto_closed
  • m7p3 (Performance at Scale) -- benchmarks and optimizations; UAT validates behaviour, not scale numbers
  • m7p4 (Operational Visibility) -- QueryStats, Prometheus metrics export, db.export_signals(), db.user_session_summary()

Research References

  • All M7 phase specifications in docs/planning/ROADMAP.md
  • docs/research/tidaldb_wal.md -- crash recovery, segment format
  • docs/research/tidaldb_signal_ledger.md -- checkpoint format, running-score formula
  • thoughts.md Part V -- graceful degradation, operational simplicity

Acceptance Criteria (Phase Level)

  • tidal/tests/m7_uat.rs with separate #[test] functions per UAT step
  • Crash recovery tests: write items + signals; simulate crash at WAL-write, checkpoint, and with M6 state; verify recovery produces correct state; verify hard negatives (hidden items, blocked creators) never leak after any crash scenario
  • Session cleanup test: create session with 30s TTL; wait 35s; verify sweeper auto-closed the session; verify auto_closed: true in summary
  • Degradation test: simulate concurrent load above threshold; verify degradation_level in response matches expected level; verify all queries still return results
  • Rate limiting test: configure 10 signals/sec rate limit; write 50 signals in 1 second; first 10 succeed; remaining return TidalError::RateLimited; other sessions unaffected
  • QueryStats test: execute RETRIEVE and SEARCH; verify stats field populated with non-zero candidates_considered, scoring_time_us, total_time_us
  • Metrics test: verify Prometheus text output contains expected metric names (tidaldb_signal_hot_entries, tidaldb_wal_lag_bytes, etc.)
  • Export + aggregation test: write session signals; close session; export_signals() returns expected events; user_session_summary() returns correct counts
  • All prior UAT suites pass (m2_uat, m3_uat, m4_uat, m5_uat, m6_uat) -- no regressions
  • No individual test exceeds 60 seconds

Task Execution Order

task-01 (Crash Recovery UAT)  ──┐
                                ├──  task-04 (Regression Gate)
task-02 (Degradation + Rate   ──┤
         Limiting + Cleanup)    │
                                │
task-03 (Observability +       ──┘
         Export UAT)

Tasks 01, 02, 03 can be implemented in parallel (they are independent test functions in the same file). Task 04 runs last -- it verifies nothing regressed across all prior UAT suites.

Tasks

# Task Delivers Complexity
01 Crash recovery UAT tests 3 tests: crash at WAL-write, crash at checkpoint, crash with M6 state; verify correct recovery and hard negative invariant L
02 Degradation + rate limiting + session cleanup UAT tests 3 tests: degradation progression, rate limiting isolation, session auto-cleanup L
03 Observability + export UAT tests 3 tests: QueryStats populated, metrics content, RLHF export + session aggregation M
04 Regression gate Verify all prior UAT suites pass (m2_uat through m6_uat) S

Notes

  • Tests use small datasets (100-500 items, not 1M) to keep individual test runtime under 60 seconds. Scale performance is already validated by m7p3 benchmarks.
  • Crash recovery tests use TempTidalHome for on-disk durability and #[cfg(feature = "test-utils")] for crash injection hooks.
  • The session cleanup test is the only test that involves a wall-clock wait (35s). All other tests are compute-bound and complete in under 5 seconds.
  • Rate limiting tests use a low token count (10/sec) so they do not need to actually wait -- they exhaust the bucket immediately.
  • The regression gate (task-04) is a shell-level check documented in the task file; it does not add Rust code.

Done When

  1. cargo test --manifest-path tidal/Cargo.toml --test m7_uat passes with all tests green.
  2. Each individual test completes in under 60 seconds.
  3. cargo test --manifest-path tidal/Cargo.toml --test m2_uat --test m3_uat --test m4_uat --test m5_uat --test m6_uat all pass -- no regressions.
  4. cargo clippy --manifest-path tidal/Cargo.toml -- -D warnings passes.