tidaldb/docs/planning/milestone-7/phase-5/task-03-observability-export-uat.md
2026-02-23 22:41:16 -07:00

# Task 03: Observability + Export UAT Tests
## Delivers
Three integration tests in `tidal/tests/m7_uat.rs` proving operational visibility features work end-to-end:
1. **`uat_query_stats_populated`** -- Execute RETRIEVE and SEARCH; verify the `stats` field on both results contains non-zero `candidates_considered`, `scoring_time_us`, and `total_time_us`, and that the RETRIEVE result reports the expected `profile_name`.
2. **`uat_metrics_content`** -- Verify Prometheus text output contains the expected metric names (`tidaldb_signal_hot_entries`, `tidaldb_signal_writes_total`) and that every data line is well-formed.
3. **`uat_export_and_session_aggregation`** -- Write signals across two sessions, close both, call `export_signals()` to verify the events are present, then call `user_session_summary()` to verify the aggregated counts.
## Complexity: M
## Dependencies
- m7p4 complete (`QueryStats` struct, `Results.stats`, `SearchResults.stats`, Prometheus metrics export, `db.export_signals()`, `db.user_session_summary()`)
- m7p2 complete (`DegradationLevel` used in `QueryStats`)
- m7p1 complete (WAL segments readable for `export_signals`)
## Technical Design
### Test 7: QueryStats populated on RETRIEVE and SEARCH
```rust
#[test]
fn uat_query_stats_populated() {
    let db = TidalDb::builder()
        .ephemeral()
        .with_schema(m7_uat_schema())
        .open()
        .unwrap();
    let now = Timestamp::now();
    // Write 100 items with metadata + signals so both RETRIEVE and SEARCH
    // have non-trivial work to do.
    for id in 1..=100u64 {
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Stats Item {id}"));
        meta.insert("description".to_string(), format!("Description for item {id}"));
        meta.insert("category".to_string(), "test".to_string());
        db.write_item_with_metadata(EntityId::new(id), &meta).unwrap();
    }
    for id in 1..=50u64 {
        for _ in 0..5 {
            db.signal("view", EntityId::new(id), 1.0, now).unwrap();
        }
    }
    // Flush text index so SEARCH has results.
    db.flush_text_index().unwrap();
    // -- RETRIEVE stats --
    let retrieve_results = db
        .retrieve(
            &Retrieve::builder()
                .profile("trending")
                .limit(20)
                .build()
                .unwrap(),
        )
        .unwrap();
    let stats = &retrieve_results.stats;
    assert!(
        stats.candidates_considered > 0,
        "RETRIEVE stats.candidates_considered should be > 0, got {}",
        stats.candidates_considered
    );
    assert!(
        stats.total_time_us > 0,
        "RETRIEVE stats.total_time_us should be > 0, got {}",
        stats.total_time_us
    );
    assert!(
        stats.scoring_time_us > 0,
        "RETRIEVE stats.scoring_time_us should be > 0, got {}",
        stats.scoring_time_us
    );
    assert!(
        !stats.profile_name.is_empty(),
        "RETRIEVE stats.profile_name should not be empty"
    );
    assert_eq!(
        stats.profile_name, "trending",
        "RETRIEVE stats.profile_name should be 'trending'"
    );
    // -- SEARCH stats --
    let search_results = db
        .search(
            &Search::builder()
                .query("Stats Item")
                .limit(20)
                .build()
                .unwrap(),
        )
        .unwrap();
    let search_stats = &search_results.stats;
    assert!(
        search_stats.candidates_considered > 0,
        "SEARCH stats.candidates_considered should be > 0, got {}",
        search_stats.candidates_considered
    );
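    assert!(
        search_stats.total_time_us > 0,
        "SEARCH stats.total_time_us should be > 0, got {}",
        search_stats.total_time_us
    );
    // The acceptance criteria also require a non-zero scoring time for SEARCH.
    assert!(
        search_stats.scoring_time_us > 0,
        "SEARCH stats.scoring_time_us should be > 0, got {}",
        search_stats.scoring_time_us
    );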
}
```
### Test 8: Prometheus metrics content
```rust
#[test]
fn uat_metrics_content() {
    let db = TidalDb::builder()
        .ephemeral()
        .with_schema(m7_uat_schema())
        .open()
        .unwrap();
    let now = Timestamp::now();
    // Write some data so metrics have non-zero values.
    for id in 1..=50u64 {
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Metrics Item {id}"));
        db.write_item_with_metadata(EntityId::new(id), &meta).unwrap();
    }
    for id in 1..=20u64 {
        db.signal("view", EntityId::new(id), 1.0, now).unwrap();
    }
    // Render Prometheus text metrics.
    let metrics_text = db.render_metrics();
    // Verify the output is non-empty.
    assert!(
        !metrics_text.is_empty(),
        "metrics output should not be empty"
    );
    // Verify expected metric names are present.
    let expected_metrics = [
        "tidaldb_signal_hot_entries",
        "tidaldb_signal_writes_total",
    ];
    for metric_name in &expected_metrics {
        assert!(
            metrics_text.contains(metric_name),
            "metrics output should contain '{metric_name}', got:\n{metrics_text}"
        );
    }
    // Verify Prometheus text format: lines starting with # are comments/HELP/TYPE,
    // data lines are "metric_name{labels} value" or "metric_name value".
    for line in metrics_text.lines() {
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Data line: should have at least a metric name and a numeric value.
        let parts: Vec<&str> = line.split_whitespace().collect();
        assert!(
            parts.len() >= 2,
            "malformed Prometheus line (expected 'name value'): {line}"
        );
        // The last token should be parseable as a number.
        let value_str = parts.last().unwrap();
        assert!(
            value_str.parse::<f64>().is_ok(),
            "Prometheus value should be numeric, got '{value_str}' in line: {line}"
        );
    }
}
```
### Test 9: RLHF export + cross-session aggregation
```rust
#[test]
fn uat_export_and_session_aggregation() {
    let mut builder = SchemaBuilder::new();
    for &(name, half_life_days) in &[("view", 7), ("like", 14), ("skip", 1)] {
        let _ = builder
            .signal(
                name,
                EntityKind::Item,
                DecaySpec::Exponential {
                    half_life: Duration::from_secs(half_life_days * 24 * 3600),
                },
            )
            .windows(&[Window::AllTime])
            .velocity(false)
            .add();
    }
    builder.session_policy(
        "default",
        tidaldb::schema::AgentPolicy::builder()
            .max_session_duration(Duration::from_secs(300))
            .build(),
    );
    let schema = builder.build().unwrap();
    let db = TidalDb::builder()
        .ephemeral()
        .with_schema(schema)
        .open()
        .unwrap();
    let now = Timestamp::now();
    // Write items.
    for id in 1..=10u64 {
        let mut meta = HashMap::new();
        meta.insert("title".to_string(), format!("Export Item {id}"));
        db.write_item_with_metadata(EntityId::new(id), &meta).unwrap();
    }
    let user_id = 42u64;
    // -- Session 1: 5 views + 2 likes --
    let session_1 = db
        .start_session(user_id, "agent_export", "default", HashMap::new())
        .unwrap();
    for id in 1..=5u64 {
        db.session_signal(&session_1, "view", EntityId::new(id), 1.0, now, None)
            .unwrap();
    }
    db.session_signal(&session_1, "like", EntityId::new(1), 1.0, now, Some("loved it".into()))
        .unwrap();
    db.session_signal(&session_1, "like", EntityId::new(2), 1.0, now, None)
        .unwrap();
    let summary_1 = db.close_session(session_1).unwrap();
    assert_eq!(summary_1.signals_written, 7, "session 1 should have 7 signals");
    // -- Session 2: 3 views + 1 skip --
    let session_2 = db
        .start_session(user_id, "agent_export", "default", HashMap::new())
        .unwrap();
    for id in 6..=8u64 {
        db.session_signal(&session_2, "view", EntityId::new(id), 1.0, now, None)
            .unwrap();
    }
    db.session_signal(&session_2, "skip", EntityId::new(9), 1.0, now, None)
        .unwrap();
    let summary_2 = db.close_session(session_2).unwrap();
    assert_eq!(summary_2.signals_written, 4, "session 2 should have 4 signals");
    // -- Export signals --
    // `ExportRequest` and `ExportFormat` come from the imports listed under
    // "Imports needed"; using the short names keeps those imports from being unused.
    let exported = db
        .export_signals(ExportRequest {
            user_id: Some(user_id),
            signal_types: vec!["view".into(), "like".into(), "skip".into()],
            since: Timestamp::from_nanos(0),
            until: Timestamp::from_nanos(u64::MAX),
            format: ExportFormat::JsonLines,
        })
        .unwrap();
    // Should have at least 11 exported events (7 from session 1 + 4 from session 2).
    assert!(
        exported.len() >= 11,
        "export should return at least 11 signal events, got {}",
        exported.len()
    );
    // Verify the annotation survived export.
    let annotated = exported
        .iter()
        .filter(|e| e.annotation.is_some())
        .count();
    assert!(
        annotated >= 1,
        "at least one exported event should have an annotation"
    );
    // Verify signal types are correct.
    let view_count = exported.iter().filter(|e| e.signal_type == "view").count();
    let like_count = exported.iter().filter(|e| e.signal_type == "like").count();
    let skip_count = exported.iter().filter(|e| e.signal_type == "skip").count();
    assert!(view_count >= 8, "should have at least 8 view events, got {view_count}");
    assert!(like_count >= 2, "should have at least 2 like events, got {like_count}");
    assert!(skip_count >= 1, "should have at least 1 skip event, got {skip_count}");
    // -- Cross-session aggregation --
    let summary = db
        .user_session_summary(user_id, Timestamp::from_nanos(0))
        .unwrap();
    assert_eq!(
        summary.sessions_count, 2,
        "user should have 2 closed sessions"
    );
    assert_eq!(
        summary.total_signals, 11,
        "total signals across sessions should be 11"
    );
    assert_eq!(
        summary.total_rejections, 0,
        "no policy rejections should have occurred"
    );
    // top_signal_types should include "view" as the most frequent.
    assert!(
        !summary.top_signal_types.is_empty(),
        "top_signal_types should not be empty"
    );
    let top_type = &summary.top_signal_types[0];
    assert_eq!(
        top_type.0, "view",
        "most frequent signal type should be 'view', got '{}'",
        top_type.0
    );
    assert!(
        top_type.1 >= 8,
        "view count should be at least 8, got {}",
        top_type.1
    );
}
```
### Imports needed (additions to the shared header)
```rust
use tidaldb::{ExportRequest, ExportFormat}; // From m7p4
```
### Note on `render_metrics`
The test calls `db.render_metrics()`, the m7p4 API for producing Prometheus text. If the actual method name differs (e.g., `db.metrics_text()`), the test should call that name instead. If the API is only available behind the `metrics` feature, the test should be gated accordingly:
```rust
#[test]
#[cfg(feature = "metrics")]
fn uat_metrics_content() { ... }
```
The task document uses `render_metrics()` as the expected name per the m7p4 spec. The implementer should match whatever m7p4 actually delivers.
## Acceptance Criteria
- [ ] `uat_query_stats_populated` passes: both RETRIEVE and SEARCH return `stats` with non-zero `candidates_considered`, `total_time_us`, `scoring_time_us`; RETRIEVE `profile_name == "trending"`
- [ ] `uat_metrics_content` passes: Prometheus text output contains `tidaldb_signal_hot_entries` and `tidaldb_signal_writes_total`; all data lines have numeric values; output is non-empty
- [ ] `uat_export_and_session_aggregation` passes: export returns >= 11 events with correct signal type distribution; annotation preserved; `user_session_summary` returns `sessions_count == 2`, `total_signals == 11`, `total_rejections == 0`; `top_signal_types[0]` is `("view", >= 8)`
- [ ] Each test completes in under 10 seconds (no wall-clock waits)
- [ ] `cargo clippy --manifest-path tidal/Cargo.toml -- -D warnings` passes
## Test Strategy
**QueryStats test** validates the instrumentation wiring, not the exact values. The assertions check `> 0` rather than specific numbers because execution time varies by machine. The `profile_name` assertion verifies the stats carry semantic context.
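One consequence of asserting `> 0` on microsecond timings is that truncation matters: `Duration::as_micros()` rounds down, so a phase that finishes in under a microsecond would report 0 and fail the assertion. The sketch below shows one way instrumentation could guard against that, assuming the timings are derived from `std::time::Instant`. The helper names `duration_us_nonzero` and `timed_us` are hypothetical and not part of the m7p4 spec.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: converts a duration to whole microseconds,
/// clamped to at least 1 so a sub-microsecond phase never reports 0.
fn duration_us_nonzero(d: Duration) -> u64 {
    let micros = d.as_micros().max(1);
    // `as_micros` returns u128; saturate rather than wrap on overflow.
    u64::try_from(micros).unwrap_or(u64::MAX)
}

/// Hypothetical helper: runs `work` and returns its result together
/// with the elapsed time in microseconds (always >= 1).
fn timed_us<T>(work: impl FnOnce() -> T) -> (T, u64) {
    let start = Instant::now();
    let out = work();
    (out, duration_us_nonzero(start.elapsed()))
}
```

If m7p4 reports raw truncated values instead, the timing assertions could become flaky on fast machines, which would be worth confirming before relying on strict `> 0` checks.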
**Metrics test** validates Prometheus text format compliance at a basic level: lines are either comments (`#`) or data lines with `name value` structure where value is numeric. The test does not exhaustively check every metric defined in m7p4 -- it verifies the two most important signal system metrics to confirm the pipeline is wired correctly. Additional metrics can be asserted as the m7p4 implementation stabilizes.
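The format check described above can be factored into a standalone helper. The following is a minimal sketch, not the m7p4 implementation: `check_prometheus_text` is a hypothetical name, and it only covers the basic `name[{labels}] value [timestamp]` line shape. Unlike the last-token check in the test, it reads the token immediately after the metric name or label set, so an optional trailing timestamp or a space inside a label value is not mistaken for the sample value.

```rust
/// Hypothetical helper: checks that every non-comment line has the shape
/// `name[{labels}] value [timestamp]` with a numeric value.
/// Returns the number of sample lines, or a message naming the first bad line.
fn check_prometheus_text(text: &str) -> Result<usize, String> {
    let mut samples = 0;
    for line in text.lines() {
        let line = line.trim();
        // Blank lines and `# HELP` / `# TYPE` / comment lines carry no sample.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Skip past the label set, if any, so spaces inside quoted label
        // values are not treated as token separators.
        let rest = match line.rfind('}') {
            Some(end) => &line[end + 1..],
            None => line
                .split_once(char::is_whitespace)
                .map(|(_, r)| r)
                .unwrap_or(""),
        };
        // The first token after the name/labels is the sample value.
        let value = rest
            .split_whitespace()
            .next()
            .ok_or_else(|| format!("missing value: {line}"))?;
        value
            .parse::<f64>()
            .map_err(|_| format!("non-numeric value: {line}"))?;
        samples += 1;
    }
    Ok(samples)
}
```

Returning the sample count lets a caller additionally assert that the output contains at least one data line, rather than only comments.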
**Export + aggregation test** writes a deliberate signal pattern across two sessions with known types and counts (5 views + 2 likes, then 3 views + 1 skip), then verifies the export and aggregation APIs return matching numbers. The annotation on one `like` signal verifies that annotations flow through the WAL and into the export path. `user_session_summary` is called with `since` set to `Timestamp::from_nanos(0)` to capture all historical sessions.
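The expected aggregation numbers can be derived independently of the database. The sketch below rebuilds the test's signal pattern in memory and computes the session count and ranked signal types. The `Event` struct and all function names are hypothetical stand-ins, and the tie-breaking rule (by name) is an assumption rather than documented m7p4 behavior.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for an exported signal event; the real m7p4 type may differ.
struct Event {
    signal_type: String,
    session: u32,
}

/// Counts events per signal type, sorted by count descending.
/// Ties are broken by name so the order is deterministic (an assumption).
fn top_signal_types(events: &[Event]) -> Vec<(String, u64)> {
    let mut counts: HashMap<&str, u64> = HashMap::new();
    for e in events {
        *counts.entry(e.signal_type.as_str()).or_insert(0) += 1;
    }
    let mut ranked: Vec<(String, u64)> = counts
        .into_iter()
        .map(|(name, n)| (name.to_string(), n))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}

/// Number of distinct sessions that contributed at least one event.
fn sessions_count(events: &[Event]) -> usize {
    events.iter().map(|e| e.session).collect::<HashSet<_>>().len()
}

/// Rebuilds the UAT pattern: session 1 has 5 views + 2 likes,
/// session 2 has 3 views + 1 skip.
fn uat_pattern() -> Vec<Event> {
    let mut events = Vec::new();
    let mut push = |ty: &str, session: u32, n: usize| {
        for _ in 0..n {
            events.push(Event { signal_type: ty.to_string(), session });
        }
    };
    push("view", 1, 5);
    push("like", 1, 2);
    push("view", 2, 3);
    push("skip", 2, 1);
    events
}
```

For this pattern the totals are 11 events across 2 sessions, with `view` ranked first at 8, which matches the values the test asserts on `user_session_summary`.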