tidaldb/docs/planning/milestone-7/phase-4/task-01-query-stats.md
2026-02-23 22:41:16 -07:00

# Task 01: QueryStats Struct + Executor Instrumentation
## Delivers
`QueryStats` struct capturing per-query execution statistics. Instrumentation of both `RetrieveExecutor` and `SearchExecutor` to populate stats at each pipeline stage. `Results.stats` and `SearchResults.stats` fields added so every query response carries its execution telemetry.
## Complexity: M
## Dependencies
- None from prior m7p4 tasks (this is the foundation)
- `tidal/src/query/executor/mod.rs` -- `RetrieveExecutor` 6-stage pipeline
- `tidal/src/query/search/executor.rs` -- `SearchExecutor` 8-stage pipeline
- `tidal/src/query/retrieve/types.rs` -- `Results` struct
- `tidal/src/query/search/types.rs` -- `SearchResults` struct
## Technical Design
### 1. QueryStats struct
Create `tidal/src/query/stats.rs`:
```rust
/// Per-query execution statistics.
///
/// Always populated by the executor -- never `None`. Even queries that
/// return zero results carry stats reflecting the work done to determine
/// that. All timing fields are in microseconds (u64).
///
/// Designed as a plain data struct with public fields: the only method is
/// the `new()` constructor, and the only heap allocation is `profile_name`.
/// The executor fills it in incrementally, reading `Instant::elapsed()` at
/// each stage boundary.
#[derive(Debug, Clone)]
pub struct QueryStats {
    /// Total candidates considered before any filtering.
    pub candidates_considered: usize,
    /// Candidates remaining after filter evaluation (Stage 2).
    pub candidates_after_filter: usize,
    /// Candidates remaining after diversity enforcement (Stage 4).
    pub candidates_after_diversity: usize,
    /// Number of filter expressions evaluated.
    pub filters_applied: usize,
    /// Time spent in signal scoring (Stage 3), in microseconds.
    pub scoring_time_us: u64,
    /// Time spent in diversity enforcement (Stage 4), in microseconds.
    pub diversity_time_us: u64,
    /// Total query execution time from executor entry to result assembly,
    /// in microseconds.
    pub total_time_us: u64,
    /// Current degradation level at query time (0 = healthy, 3 = critical).
    /// Mirrors the `DegradationLevel` from m7p2.
    pub degradation_level: u8,
    /// Name of the ranking profile used.
    pub profile_name: String,
}

impl QueryStats {
    /// Create a stats struct with all zeroed counters and the given profile name.
    ///
    /// The executor fills in the fields as it progresses through stages.
    #[must_use]
    pub fn new(profile_name: String) -> Self {
        Self {
            candidates_considered: 0,
            candidates_after_filter: 0,
            candidates_after_diversity: 0,
            filters_applied: 0,
            scoring_time_us: 0,
            diversity_time_us: 0,
            total_time_us: 0,
            degradation_level: 0,
            profile_name,
        }
    }
}
```
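The `degradation_level` field stores the m7p2 `DegradationLevel` as a raw `u8`. The sketch below shows one way that mapping could look. The enum here is a hypothetical stand-in: the real m7p2 type is not shown in this task, only the endpoints (0 = healthy, 3 = critical) come from the doc comment above, and the intermediate variant names and the `as_u8` / `from_u8` helpers are assumptions.

```rust
/// Hypothetical stand-in for the m7p2 `DegradationLevel`.
/// Only `Healthy = 0` and `Critical = 3` are taken from the task text;
/// `Level1` and `Level2` are placeholder names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum DegradationLevel {
    Healthy = 0,
    Level1 = 1,
    Level2 = 2,
    Critical = 3,
}

impl DegradationLevel {
    /// Raw value as stored in `QueryStats::degradation_level`.
    pub fn as_u8(self) -> u8 {
        self as u8
    }

    /// Inverse mapping for consumers of the stats; unknown values yield `None`.
    pub fn from_u8(raw: u8) -> Option<Self> {
        match raw {
            0 => Some(Self::Healthy),
            1 => Some(Self::Level1),
            2 => Some(Self::Level2),
            3 => Some(Self::Critical),
            _ => None,
        }
    }
}
```

Storing a `u8` keeps `QueryStats` independent of the m7p2 module, at the cost of consumers having to handle out-of-range values themselves.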
### 2. Wire QueryStats into Results and SearchResults
In `tidal/src/query/retrieve/types.rs`, add to `Results`:
```rust
use crate::query::stats::QueryStats;

pub struct Results {
    // ... existing fields ...
    /// Per-query execution statistics.
    pub stats: QueryStats,
}
```
In `tidal/src/query/search/types.rs`, add to `SearchResults`:
```rust
use crate::query::stats::QueryStats;

pub struct SearchResults {
    // ... existing fields ...
    /// Per-query execution statistics.
    pub stats: QueryStats,
}
```
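Because `stats` is a plain field on every response, callers can read it without any extra API. A minimal sketch of what that might look like on the consumer side, assuming a trimmed stand-in for `QueryStats` and two hypothetical helpers (`filter_pass_rate`, `summarize`) that are not part of this task:

```rust
/// Trimmed stand-in for the `QueryStats` struct defined in this task.
#[derive(Debug, Clone)]
pub struct QueryStats {
    pub candidates_considered: usize,
    pub candidates_after_filter: usize,
    pub candidates_after_diversity: usize,
    pub total_time_us: u64,
    pub profile_name: String,
}

/// Hypothetical helper: fraction of candidates that survived filtering.
/// Returns `None` when no candidates were considered.
pub fn filter_pass_rate(stats: &QueryStats) -> Option<f64> {
    if stats.candidates_considered == 0 {
        return None;
    }
    Some(stats.candidates_after_filter as f64 / stats.candidates_considered as f64)
}

/// Hypothetical helper: one-line summary suitable for a log message.
pub fn summarize(stats: &QueryStats) -> String {
    format!(
        "profile={} considered={} after_filter={} after_diversity={} total_us={}",
        stats.profile_name,
        stats.candidates_considered,
        stats.candidates_after_filter,
        stats.candidates_after_diversity,
        stats.total_time_us,
    )
}
```

In real code these helpers would take `&results.stats` from a `Results` or `SearchResults` value.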
### 3. Re-export from query module
In `tidal/src/query/mod.rs`, add:
```rust
pub mod stats;
pub use stats::QueryStats;
```
### 4. Instrument RetrieveExecutor
In `tidal/src/query/executor/mod.rs`, wrap the `execute()` method's stages with timing:
```rust
use std::time::Instant;
use crate::query::stats::QueryStats;
// Inside execute():
let query_start = Instant::now();
let mut stats = QueryStats::new(query.profile.name.clone());
// After Stage 1 (candidate generation):
stats.candidates_considered = candidates.len();
// After Stage 2 (filter evaluation):
stats.candidates_after_filter = filtered.len();
stats.filters_applied = query.filters.len();
// Stage 3 (scoring):
let scoring_start = Instant::now();
// ... existing scoring logic ...
stats.scoring_time_us = scoring_start.elapsed().as_micros() as u64;
// Stage 4 (diversity):
let diversity_start = Instant::now();
// ... existing diversity logic ...
stats.diversity_time_us = diversity_start.elapsed().as_micros() as u64;
stats.candidates_after_diversity = diversified.len();
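// Final assembly:
// NOTE: `degradation_level` is not set by the steps above. It must be
// populated from the current m7p2 `DegradationLevel` before returning;
// the accessor for that value is not specified in this task.
stats.total_time_us = query_start.elapsed().as_micros() as u64;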
```
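The snippet above only shows the instrumentation points. To see the pattern end to end, here is a self-contained toy sketch. It is not the real `RetrieveExecutor`: the candidates are plain integers, the "filter" keeps even values, "scoring" is a descending sort, and "diversity" is a truncation to `limit`. The `run_toy_query` and `elapsed_us` names are hypothetical, and the stand-in struct derives `Default` only to keep the example short.

```rust
use std::convert::TryFrom;
use std::time::Instant;

/// Stand-in for the `QueryStats` struct from this task.
#[derive(Debug, Clone, Default)]
pub struct QueryStats {
    pub candidates_considered: usize,
    pub candidates_after_filter: usize,
    pub candidates_after_diversity: usize,
    pub filters_applied: usize,
    pub scoring_time_us: u64,
    pub diversity_time_us: u64,
    pub total_time_us: u64,
    pub degradation_level: u8,
    pub profile_name: String,
}

/// Elapsed microseconds since `start`, saturating instead of truncating.
fn elapsed_us(start: Instant) -> u64 {
    u64::try_from(start.elapsed().as_micros()).unwrap_or(u64::MAX)
}

/// Toy pipeline that mirrors the instrumentation points of the executor.
pub fn run_toy_query(candidates: Vec<u32>, limit: usize) -> (Vec<u32>, QueryStats) {
    let query_start = Instant::now();
    let mut stats = QueryStats {
        profile_name: "toy".to_owned(),
        ..QueryStats::default()
    };

    // Stage 1: candidate generation (here: the input itself).
    stats.candidates_considered = candidates.len();

    // Stage 2: filter evaluation (a single "keep even values" filter).
    let mut filtered: Vec<u32> = candidates.into_iter().filter(|v| v % 2 == 0).collect();
    stats.candidates_after_filter = filtered.len();
    stats.filters_applied = 1;

    // Stage 3: scoring (descending sort stands in for signal scoring).
    let scoring_start = Instant::now();
    filtered.sort_unstable_by(|a, b| b.cmp(a));
    stats.scoring_time_us = elapsed_us(scoring_start);

    // Stage 4: diversity (truncation stands in for diversity enforcement).
    let diversity_start = Instant::now();
    filtered.truncate(limit);
    stats.diversity_time_us = elapsed_us(diversity_start);
    stats.candidates_after_diversity = filtered.len();

    // Final assembly: total time covers every stage above.
    stats.total_time_us = elapsed_us(query_start);
    (filtered, stats)
}
```

Because the scoring and diversity intervals are disjoint and both fall inside the total interval, the timing invariant from the acceptance criteria holds by construction, even when a stage rounds down to 0 microseconds.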
### 5. Instrument SearchExecutor
Same pattern in `tidal/src/query/search/executor.rs`:
```rust
// Inside execute():
let query_start = Instant::now();
let mut stats = QueryStats::new(query.profile.name.clone());
// After Stage 1c (fusion):
stats.candidates_considered = fused.len();
// After Stage 2 (metadata + user filter):
stats.candidates_after_filter = filtered.len();
stats.filters_applied = query.filters.len();
// Stage 3 (profile scoring):
let scoring_start = Instant::now();
// ... existing scoring logic ...
stats.scoring_time_us = scoring_start.elapsed().as_micros() as u64;
// Stage 4 (diversity):
let diversity_start = Instant::now();
// ... existing diversity logic ...
stats.diversity_time_us = diversity_start.elapsed().as_micros() as u64;
stats.candidates_after_diversity = diversified.len();
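// Final assembly:
// NOTE: `degradation_level` is not set by the steps above. It must be
// populated from the current m7p2 `DegradationLevel` before returning;
// the accessor for that value is not specified in this task.
stats.total_time_us = query_start.elapsed().as_micros() as u64;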
```
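Since both executors repeat the same `Instant::now()` / `elapsed()` boilerplate, a small shared helper could remove the duplication. The following is a sketch of one possible shape, not part of the task as written; `time_stage` is a hypothetical name. It also replaces the `as u64` cast with a saturating conversion, since `Duration::as_micros()` returns `u128`.

```rust
use std::convert::TryFrom;
use std::time::Instant;

/// Run `stage`, store its wall-clock duration in microseconds in `slot_us`,
/// and return whatever the stage produced.
pub fn time_stage<T>(slot_us: &mut u64, stage: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = stage();
    // `as_micros()` is u128; saturate rather than silently truncate.
    *slot_us = u64::try_from(start.elapsed().as_micros()).unwrap_or(u64::MAX);
    out
}
```

An executor could then write something like `let scored = time_stage(&mut stats.scoring_time_us, || score(&filtered));`, where `score` stands for the existing scoring logic.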
### 6. Update all Results construction sites
Every place that constructs `Results` or `SearchResults` must now include the `stats` field. Search the codebase for `Results {` in retrieve executor code and `SearchResults {` in search executor code (note that a plain text search for `Results {` also matches `SearchResults {`). Each construction site receives the stats struct built during that execution.
For test code that constructs `Results` or `SearchResults` directly, use `QueryStats::new("test".to_owned())` as a sensible default.
## Acceptance Criteria
- [ ] `QueryStats` struct defined in `tidal/src/query/stats.rs` with all 9 fields
- [ ] `Results.stats: QueryStats` field added
- [ ] `SearchResults.stats: QueryStats` field added
- [ ] `RetrieveExecutor::execute()` populates all `QueryStats` fields with correct values
- [ ] `SearchExecutor::execute()` populates all `QueryStats` fields with correct values
- [ ] `total_time_us >= scoring_time_us + diversity_time_us` (invariant)
- [ ] `candidates_considered >= candidates_after_filter >= candidates_after_diversity` (invariant)
- [ ] `filters_applied` matches the number of filter expressions in the query
- [ ] `profile_name` matches the profile used for scoring
- [ ] All existing tests updated to include `stats` field in constructed results
- [ ] `cargo clippy -- -D warnings` and `cargo fmt --check` pass
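
The two invariants in the list above can be expressed as a single reusable check. This is a minimal sketch, assuming a trimmed stand-in for `QueryStats`; `check_invariants` is a hypothetical helper that tests could call, not something this task requires.

```rust
/// Trimmed stand-in for `QueryStats` with only the fields the invariants use.
#[derive(Debug, Clone)]
pub struct QueryStats {
    pub candidates_considered: usize,
    pub candidates_after_filter: usize,
    pub candidates_after_diversity: usize,
    pub scoring_time_us: u64,
    pub diversity_time_us: u64,
    pub total_time_us: u64,
}

/// Verify the candidate funnel and timing invariants.
/// Returns a description of the first violated invariant, if any.
pub fn check_invariants(stats: &QueryStats) -> Result<(), String> {
    if stats.candidates_after_filter > stats.candidates_considered {
        return Err(format!(
            "after_filter ({}) exceeds considered ({})",
            stats.candidates_after_filter, stats.candidates_considered
        ));
    }
    if stats.candidates_after_diversity > stats.candidates_after_filter {
        return Err(format!(
            "after_diversity ({}) exceeds after_filter ({})",
            stats.candidates_after_diversity, stats.candidates_after_filter
        ));
    }
    // Saturating add so that pathological values cannot overflow the check.
    let staged = stats.scoring_time_us.saturating_add(stats.diversity_time_us);
    if stats.total_time_us < staged {
        return Err(format!(
            "total_time_us ({}) is less than scoring + diversity ({})",
            stats.total_time_us, staged
        ));
    }
    Ok(())
}
```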
## Test Strategy
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn query_stats_new_zeroed() {
        let stats = QueryStats::new("trending".to_owned());
        assert_eq!(stats.candidates_considered, 0);
        assert_eq!(stats.candidates_after_filter, 0);
        assert_eq!(stats.candidates_after_diversity, 0);
        assert_eq!(stats.filters_applied, 0);
        assert_eq!(stats.scoring_time_us, 0);
        assert_eq!(stats.diversity_time_us, 0);
        assert_eq!(stats.total_time_us, 0);
        assert_eq!(stats.degradation_level, 0);
        assert_eq!(stats.profile_name, "trending");
    }

    // The timing invariant (total >= scoring + diversity) and the candidate
    // funnel invariant (considered >= after_filter >= after_diversity) need a
    // real query execution, so they are exercised by integration tests
    // against a live TidalDb instance rather than by empty unit tests here.
}
```
Integration test in `tidal/tests/m7p4_visibility.rs`:
```rust
#[test]
fn retrieve_populates_query_stats() {
    let db = make_test_db_with_items(100);
    let query = Retrieve::builder()
        .profile("trending")
        .limit(10)
        .build()
        .unwrap();
    let results = db.retrieve(&query).unwrap();

    // Candidate funnel invariant.
    assert!(results.stats.candidates_considered > 0);
    assert!(results.stats.candidates_considered >= results.stats.candidates_after_filter);
    assert!(results.stats.candidates_after_filter >= results.stats.candidates_after_diversity);

    // Timing invariant.
    assert!(results.stats.total_time_us > 0);
    assert!(
        results.stats.total_time_us
            >= results.stats.scoring_time_us + results.stats.diversity_time_us
    );

    assert_eq!(results.stats.profile_name, "trending");
}

#[test]
fn search_populates_query_stats() {
    let db = make_test_db_with_items_and_text(100);
    let query = Search::builder().query("test").limit(10).build().unwrap();
    let results = db.search(&query).unwrap();
    assert!(results.stats.total_time_us > 0);
    assert_eq!(results.stats.profile_name, "search");
}
```