tidaldb/docs/planning/milestone-7/phase-4/task-01-query-stats.md
2026-02-23 22:41:16 -07:00

8.5 KiB

Task 01: QueryStats Struct + Executor Instrumentation

Delivers

QueryStats struct capturing per-query execution statistics. Instrumentation of both RetrieveExecutor and SearchExecutor to populate stats at each pipeline stage. Results.stats and SearchResults.stats fields added so every query response carries its execution telemetry.

Complexity: M

Dependencies

  • None from prior m7p4 tasks (this is the foundation)
  • tidal/src/query/executor/mod.rs -- RetrieveExecutor 6-stage pipeline
  • tidal/src/query/search/executor.rs -- SearchExecutor 8-stage pipeline
  • tidal/src/query/retrieve/types.rs -- Results struct
  • tidal/src/query/search/types.rs -- SearchResults struct

Technical Design

1. QueryStats struct

Create tidal/src/query/stats.rs:

/// Per-query execution statistics.
///
/// Always populated by the executor -- never `None`. Even queries that
/// return zero results carry stats reflecting the work done to determine
/// that. All timing fields are in microseconds (u64).
///
/// Designed as a pure data struct: no methods, no builders, no heap
/// allocations. The executor constructs it incrementally using
/// `Instant::elapsed()` at each stage boundary.
#[derive(Debug, Clone)]
pub struct QueryStats {
    /// Total candidates considered before any filtering.
    pub candidates_considered: usize,
    /// Candidates remaining after filter evaluation (Stage 2).
    pub candidates_after_filter: usize,
    /// Candidates remaining after diversity enforcement (Stage 4).
    pub candidates_after_diversity: usize,
    /// Number of filter expressions evaluated.
    pub filters_applied: usize,
    /// Time spent in signal scoring (Stage 3), in microseconds.
    pub scoring_time_us: u64,
    /// Time spent in diversity enforcement (Stage 4), in microseconds.
    pub diversity_time_us: u64,
    /// Total query execution time from executor entry to result assembly,
    /// in microseconds.
    pub total_time_us: u64,
    /// Current degradation level at query time (0 = healthy, 3 = critical).
    /// Mirrors the `DegradationLevel` from m7p2.
    pub degradation_level: u8,
    /// Name of the ranking profile used.
    pub profile_name: String,
}

impl QueryStats {
    /// Create a stats struct with all zeroed counters and the given profile name.
    ///
    /// The executor fills in the fields as it progresses through stages.
    #[must_use]
    pub fn new(profile_name: String) -> Self {
        Self {
            candidates_considered: 0,
            candidates_after_filter: 0,
            candidates_after_diversity: 0,
            filters_applied: 0,
            scoring_time_us: 0,
            diversity_time_us: 0,
            total_time_us: 0,
            degradation_level: 0,
            profile_name,
        }
    }
}

2. Wire QueryStats into Results and SearchResults

In tidal/src/query/retrieve/types.rs, add to Results:

use crate::query::stats::QueryStats;

pub struct Results {
    // ... existing fields ...
    /// Per-query execution statistics.
    pub stats: QueryStats,
}

In tidal/src/query/search/types.rs, add to SearchResults:

use crate::query::stats::QueryStats;

pub struct SearchResults {
    // ... existing fields ...
    /// Per-query execution statistics.
    pub stats: QueryStats,
}

3. Re-export from query module

In tidal/src/query/mod.rs, add:

pub mod stats;
pub use stats::QueryStats;

4. Instrument RetrieveExecutor

In tidal/src/query/executor/mod.rs, wrap the execute() method's stages with timing:

use std::time::Instant;
use crate::query::stats::QueryStats;

// Inside execute():
let query_start = Instant::now();
let mut stats = QueryStats::new(query.profile.name.clone());

// After Stage 1 (candidate generation):
stats.candidates_considered = candidates.len();

// After Stage 2 (filter evaluation):
stats.candidates_after_filter = filtered.len();
stats.filters_applied = query.filters.len();

// Stage 3 (scoring):
let scoring_start = Instant::now();
// ... existing scoring logic ...
stats.scoring_time_us = scoring_start.elapsed().as_micros() as u64;

// Stage 4 (diversity):
let diversity_start = Instant::now();
// ... existing diversity logic ...
stats.diversity_time_us = diversity_start.elapsed().as_micros() as u64;
stats.candidates_after_diversity = diversified.len();

// Final assembly:
stats.total_time_us = query_start.elapsed().as_micros() as u64;

5. Instrument SearchExecutor

Same pattern in tidal/src/query/search/executor.rs:

// Inside execute():
let query_start = Instant::now();
let mut stats = QueryStats::new(query.profile.name.clone());

// After Stage 1c (fusion):
stats.candidates_considered = fused.len();

// After Stage 2 (metadata + user filter):
stats.candidates_after_filter = filtered.len();
stats.filters_applied = query.filters.len();

// Stage 3 (profile scoring):
let scoring_start = Instant::now();
// ... existing scoring logic ...
stats.scoring_time_us = scoring_start.elapsed().as_micros() as u64;

// Stage 4 (diversity):
let diversity_start = Instant::now();
// ... existing diversity logic ...
stats.diversity_time_us = diversity_start.elapsed().as_micros() as u64;
stats.candidates_after_diversity = diversified.len();

// Final assembly:
stats.total_time_us = query_start.elapsed().as_micros() as u64;

6. Update all Results construction sites

Every place that constructs Results or SearchResults must now include the stats field. Search the codebase for Results { in retrieve executor code and SearchResults { in search executor code. Each site gets the stats struct built during that execution.

For test code that constructs Results or SearchResults directly, use QueryStats::new("test".to_owned()) as a sensible default.

Acceptance Criteria

  • QueryStats struct defined in tidal/src/query/stats.rs with all 9 fields
  • Results.stats: QueryStats field added
  • SearchResults.stats: QueryStats field added
  • RetrieveExecutor::execute() populates all QueryStats fields with correct values
  • SearchExecutor::execute() populates all QueryStats fields with correct values
  • total_time_us >= scoring_time_us + diversity_time_us (invariant)
  • candidates_considered >= candidates_after_filter >= candidates_after_diversity (invariant)
  • filters_applied matches the number of filter expressions in the query
  • profile_name matches the profile used for scoring
  • All existing tests updated to include stats field in constructed results
  • cargo clippy -D warnings and cargo fmt --check pass

Test Strategy

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn query_stats_new_zeroed() {
        let stats = QueryStats::new("trending".to_owned());
        assert_eq!(stats.candidates_considered, 0);
        assert_eq!(stats.candidates_after_filter, 0);
        assert_eq!(stats.candidates_after_diversity, 0);
        assert_eq!(stats.filters_applied, 0);
        assert_eq!(stats.scoring_time_us, 0);
        assert_eq!(stats.diversity_time_us, 0);
        assert_eq!(stats.total_time_us, 0);
        assert_eq!(stats.degradation_level, 0);
        assert_eq!(stats.profile_name, "trending");
    }

    #[test]
    fn query_stats_timing_invariant() {
        // After a real query execution, total >= scoring + diversity.
        // Tested via integration test against a live TidalDb instance.
    }

    #[test]
    fn query_stats_candidate_funnel_invariant() {
        // After a real query, considered >= after_filter >= after_diversity.
        // Tested via integration test against a live TidalDb instance.
    }
}

Integration test in tidal/tests/m7p4_visibility.rs:

#[test]
fn retrieve_populates_query_stats() {
    let db = make_test_db_with_items(100);
    let query = Retrieve::builder()
        .profile("trending")
        .limit(10)
        .build()
        .unwrap();
    let results = db.retrieve(&query).unwrap();

    assert!(results.stats.candidates_considered > 0);
    assert!(results.stats.candidates_considered >= results.stats.candidates_after_filter);
    assert!(results.stats.candidates_after_filter >= results.stats.candidates_after_diversity);
    assert!(results.stats.total_time_us > 0);
    assert_eq!(results.stats.profile_name, "trending");
}

#[test]
fn search_populates_query_stats() {
    let db = make_test_db_with_items_and_text(100);
    let query = Search::builder().query("test").limit(10).build().unwrap();
    let results = db.search(&query).unwrap();

    assert!(results.stats.total_time_us > 0);
    assert_eq!(results.stats.profile_name, "search");
}