tidaldb/docs/planning/milestone-2/phase-5/task-01-retrieve-ast-and-parser.md
jordan 6fdaa1584b feat: complete M1 signal engine — m0p3 samples/docs, m1p5 TidalDb API, examples, and periodic checkpoint
2026-02-20 22:45:10 -07:00


Task 01: RETRIEVE AST + Parser

Context

Milestone: 2 -- Ranked Retrieval
Phase: m2p5 -- Query Parser and RETRIEVE Executor
Depends On: None (uses types from m2p2, m2p3, m2p4 but no m2p5 tasks)
Blocks: Task 02 (RETRIEVE Executor Pipeline), Task 03 (M2 UAT Integration Test)
Complexity: M

Objective

Deliver the typed AST for the RETRIEVE query operation and an ergonomic Rust builder API for constructing queries. For M2, there is no text grammar parser -- the "parser" is the RetrieveBuilder, which validates its inputs and constructs a Retrieve struct. The text syntax parser (RETRIEVE items USING PROFILE trending LIMIT 25) is deferred to M5.

This task also defines the response types (Results, RetrieveResult), the pagination cursor, the Signal write command struct (wired to the existing M1 signal write path), and the QueryError enum. These types are consumed by Task 02's executor and returned to the caller.

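The types defined here map directly to the spec's input/output types (Spec 08 Section 3) but are scoped to M2: no for_user, no similar_to, no for_cohort, no window, no context, no for_session. Of these, for_user, similar_to, and context exist on the struct as Option types for forward compatibility. In M2, build() and validate() reject for_user and similar_to if set; context is carried through but not checked by the validation code in this task. The remaining fields (for_cohort, window, for_session) are not yet present on the struct.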

Requirements

  • Retrieve struct: the complete RETRIEVE query request
  • RetrieveBuilder: ergonomic builder pattern for constructing Retrieve queries
  • ProfileRef: profile name + optional version reference
  • Cursor: opaque offset-based pagination cursor with base64 encoding
  • Results: the query response (items, next_cursor, total_scored, constraints_satisfied, warnings)
  • RetrieveResult: one result item (entity_id, score, rank, signal_snapshot)
  • Signal: write command struct wired to TidalDb::signal()
  • QueryError: error enum for query validation and execution failures
  • Validation: limit range, profile reference format, filter compatibility
  • No unsafe code

Technical Design

Module Structure

tidal/src/
  query/
    mod.rs       -- pub mod retrieve; re-exports
    retrieve.rs  -- all types from this task

Public API

// === query/retrieve.rs ===

use crate::schema::{EntityId, EntityKind, Timestamp};
use crate::ranking::diversity::DiversityConstraints;
use crate::storage::indexes::filter::FilterExpr;

/// Reference to a ranking profile by name, optionally pinned to a version.
///
/// The executor resolves this against the `ProfileRegistry` at query time.
/// If `version` is `None`, the latest version is used.
#[derive(Debug, Clone)]
pub struct ProfileRef {
    /// Profile name. Must match a registered profile in the registry.
    pub name: String,
    /// Optional version pin. `None` = latest version.
    pub version: Option<u32>,
}

impl ProfileRef {
    pub fn new(name: impl Into<String>) -> Self {
        Self {
            name: name.into(),
            version: None,
        }
    }

    pub fn versioned(name: impl Into<String>, version: u32) -> Self {
        Self {
            name: name.into(),
            version: Some(version),
        }
    }
}

/// A RETRIEVE query. Declarative: specifies what, not how.
///
/// For M2, this struct is constructed via `RetrieveBuilder` (Rust API).
/// A text syntax parser is deferred to M5.
///
/// The profile determines the candidate generation strategy and scoring
/// formula. The caller never specifies how candidates are found -- only
/// which profile to use and which filters to apply.
///
/// Spec reference: docs/specs/08-query-engine.md Section 3.1
#[derive(Debug, Clone)]
pub struct Retrieve {
    /// Target entity type. For M2, only `EntityKind::Item` is supported.
    pub entity_kind: EntityKind,

    /// Named ranking profile. Determines candidate strategy and scoring.
    pub profile: ProfileRef,

    /// Metadata and signal filters. Combined as AND.
    /// Uses `FilterExpr` from m2p2 for composable filter evaluation.
    pub filters: Vec<FilterExpr>,

    /// Diversity constraints. Applied as a post-scoring pass.
    /// If `None`, no diversity enforcement is applied.
    pub diversity: Option<DiversityConstraints>,

    /// Maximum results to return. Default: 50. Range: [1, 500].
    pub limit: usize,

    /// Explicit item exclusions. Removed from candidate set before scoring.
    pub exclude: Vec<EntityId>,

    /// Pagination cursor from a previous result set.
    /// If `None`, returns the first page.
    pub cursor: Option<Cursor>,

    // --- Fields present for forward compatibility (M3+), validated as unsupported in M2 ---

    /// User context for personalization. M3+.
    pub for_user: Option<u64>,

    /// Anchor item for related/similar queries. M3+.
    pub similar_to: Option<EntityId>,

    /// Surface context for the feedback loop. M3+.
    pub context: Option<String>,
}

/// Builder for constructing `Retrieve` queries ergonomically.
///
/// # Example
///
/// ```ignore
/// let query = Retrieve::builder()
///     .entity(EntityKind::Item)
///     .profile("trending")
///     .filter(FilterExpr::eq("category", "jazz"))
///     .diversity(DiversityConstraints::new().max_per_creator(2))
///     .limit(25)
///     .build()?;
/// ```
pub struct RetrieveBuilder {
    entity_kind: Option<EntityKind>,
    profile: Option<ProfileRef>,
    filters: Vec<FilterExpr>,
    diversity: Option<DiversityConstraints>,
    limit: usize,
    exclude: Vec<EntityId>,
    cursor: Option<Cursor>,
    for_user: Option<u64>,
    similar_to: Option<EntityId>,
    context: Option<String>,
}

impl RetrieveBuilder {
    pub fn new() -> Self {
        Self {
            entity_kind: None,
            profile: None,
            filters: Vec::new(),
            diversity: None,
            limit: 50,
            exclude: Vec::new(),
            cursor: None,
            for_user: None,
            similar_to: None,
            context: None,
        }
    }

    /// Set the target entity kind.
    pub fn entity(mut self, kind: EntityKind) -> Self {
        self.entity_kind = Some(kind);
        self
    }

    /// Set the ranking profile by name.
    pub fn profile(mut self, name: impl Into<String>) -> Self {
        self.profile = Some(ProfileRef::new(name));
        self
    }

    /// Set the ranking profile by name and version.
    pub fn profile_versioned(mut self, name: impl Into<String>, version: u32) -> Self {
        self.profile = Some(ProfileRef::versioned(name, version));
        self
    }

    /// Add a filter expression. Multiple filters are ANDed together.
    pub fn filter(mut self, expr: FilterExpr) -> Self {
        self.filters.push(expr);
        self
    }

    /// Set diversity constraints.
    pub fn diversity(mut self, constraints: DiversityConstraints) -> Self {
        self.diversity = Some(constraints);
        self
    }

    /// Set the maximum number of results. Range: [1, 500]. Default: 50.
    pub fn limit(mut self, limit: usize) -> Self {
        self.limit = limit;
        self
    }

    /// Add an entity ID to the exclusion list.
    pub fn exclude(mut self, id: EntityId) -> Self {
        self.exclude.push(id);
        self
    }

    /// Add multiple entity IDs to the exclusion list.
    pub fn exclude_ids(mut self, ids: impl IntoIterator<Item = EntityId>) -> Self {
        self.exclude.extend(ids);
        self
    }

    /// Set the pagination cursor.
    pub fn cursor(mut self, cursor: Cursor) -> Self {
        self.cursor = Some(cursor);
        self
    }

    /// Validate and build the `Retrieve` query.
    ///
    /// Returns `QueryError::InvalidLimit` if limit is 0 or > 500.
    /// Returns `QueryError::ProfileNotFound` if no profile is set.
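    /// Returns `QueryError::InvalidFilter` if `for_user` or `similar_to` is set (M2).
    /// The builder exposes no setters for those two fields in M2, so this
    /// last check is purely defensive.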
    pub fn build(self) -> Result<Retrieve, QueryError> {
        let entity_kind = self.entity_kind.unwrap_or(EntityKind::Item);

        let profile = self.profile.ok_or_else(|| {
            QueryError::ProfileNotFound("no profile specified".to_string())
        })?;

        if self.limit == 0 || self.limit > 500 {
            return Err(QueryError::InvalidLimit {
                requested: self.limit,
                min: 1,
                max: 500,
            });
        }

        // M2: reject unsupported features
        if self.for_user.is_some() {
            return Err(QueryError::InvalidFilter {
                field: "for_user".to_string(),
                reason: "FOR USER clause is not supported until M3".to_string(),
            });
        }

        if self.similar_to.is_some() {
            return Err(QueryError::InvalidFilter {
                field: "similar_to".to_string(),
                reason: "SIMILAR TO clause is not supported until M3".to_string(),
            });
        }

        Ok(Retrieve {
            entity_kind,
            profile,
            filters: self.filters,
            diversity: self.diversity,
            limit: self.limit,
            exclude: self.exclude,
            cursor: self.cursor,
            for_user: self.for_user,
            similar_to: self.similar_to,
            context: self.context,
        })
    }
}

impl Default for RetrieveBuilder {
    fn default() -> Self {
        Self::new()
    }
}

impl Retrieve {
    /// Start building a RETRIEVE query.
    pub fn builder() -> RetrieveBuilder {
        RetrieveBuilder::new()
    }
}

/// The combined filter expression for the query.
///
/// Multiple filters are ANDed together. This helper constructs the
/// combined filter from the `Retrieve` query's filter list.
impl Retrieve {
    /// Combine all filters into a single AND expression.
    /// Returns `None` if no filters are specified.
    pub fn combined_filter(&self) -> Option<FilterExpr> {
        match self.filters.len() {
            0 => None,
            1 => Some(self.filters[0].clone()),
            _ => Some(FilterExpr::And(self.filters.clone())),
        }
    }
}

// ============================================================
// Response Types
// ============================================================

/// The complete response from a RETRIEVE query.
///
/// Spec reference: docs/specs/08-query-engine.md Section 5.7
#[derive(Debug, Clone)]
pub struct Results {
    /// The ranked result items for this page.
    pub items: Vec<RetrieveResult>,

    /// Cursor for the next page. `None` if this is the last page.
    pub next_cursor: Option<Cursor>,

    /// How many candidates were scored by the profile executor.
    /// This is the count after filtering but before diversity and limit.
    pub total_scored: usize,

    /// Whether all diversity constraints were fully satisfied.
    /// `false` if constraints were relaxed (see `DiversityResult::violations`).
    pub constraints_satisfied: bool,

    /// Non-fatal warnings from query execution.
    ///
    /// Warnings are surfaced when the executor degrades gracefully:
    /// - Metadata enrichment fails for a candidate (the item is treated as a
    ///   unique creator for diversity purposes; results are still returned)
    /// - A filter references a field with no index (predicate fallback used)
    ///
    /// An empty `warnings` vec means clean execution with no degradation.
    pub warnings: Vec<String>,
}

impl Results {
    /// Number of items in this page.
    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Whether this page is empty.
    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

/// A single result item from a RETRIEVE query.
///
/// Includes the entity ID, composite score, rank position, and a
/// signal snapshot for debugging and transparency.
///
/// Spec reference: docs/specs/08-query-engine.md Section 5, Stage 10
#[derive(Debug, Clone)]
pub struct RetrieveResult {
    /// The entity ID of the result.
    pub entity_id: EntityId,

    /// The composite score from the ranking profile, normalized to [0.0, 1.0].
    pub score: f64,

    /// The 1-based rank position in the result set.
    pub rank: usize,

    /// Key signal values used in scoring. For debugging and transparency.
    /// Contains (signal_name, value) pairs for signals referenced by the
    /// profile's scoring rules. Capped at 10 entries.
    pub signal_snapshot: Vec<(String, f64)>,
}

// ============================================================
// Pagination Cursor
// ============================================================

/// Opaque pagination cursor for RETRIEVE queries.
///
/// For M2, this is a simple offset-based cursor encoded as a base64 string.
/// True keyset-based pagination (score + entity_id tiebreaker, Spec 08
/// Section 8.2) is deferred to M5.
///
/// # Limitation: Not Stable Under Concurrent Writes
///
/// Offset-based cursors are not stable when the underlying ranked list
/// changes between page requests (e.g., due to concurrent signal writes).
/// Items may appear on multiple pages or be skipped if the ranking shifts.
/// This is documented and acceptable for M2; the spec says to prefer
/// keyset cursors for production use. Do not use cursor-based pagination
/// in write-heavy workloads until M5.
///
/// The cursor is opaque to the caller -- they receive it as a string and
/// pass it back on the next request. The internal representation is an
/// implementation detail.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Cursor {
    /// The offset into the full result set for the next page.
    offset: usize,
}

impl Cursor {
    /// Create a cursor from an offset.
    pub(crate) fn from_offset(offset: usize) -> Self {
        Self { offset }
    }

    /// Get the offset this cursor represents.
    pub(crate) fn offset(&self) -> usize {
        self.offset
    }

    /// Encode the cursor as an opaque base64 string.
    pub fn encode(&self) -> String {
        use base64::Engine as _;
        let bytes = self.offset.to_le_bytes();
        base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(bytes)
    }

    /// Decode a cursor from an opaque base64 string.
    pub fn decode(encoded: &str) -> Result<Self, QueryError> {
        use base64::Engine as _;
        let bytes = base64::engine::general_purpose::URL_SAFE_NO_PAD
            .decode(encoded)
            .map_err(|e| QueryError::InvalidCursor(format!("invalid base64: {e}")))?;

        if bytes.len() != std::mem::size_of::<usize>() {
            return Err(QueryError::InvalidCursor(format!(
                "expected {} bytes, got {}",
                std::mem::size_of::<usize>(),
                bytes.len()
            )));
        }

        let offset = usize::from_le_bytes(
            bytes
                .try_into()
                .map_err(|_| QueryError::InvalidCursor("byte conversion failed".to_string()))?,
        );
        Ok(Self { offset })
    }
}

// ============================================================
// Signal Write Command
// ============================================================

/// A signal write command.
///
/// For M2, this is a thin wrapper that routes to the existing
/// `TidalDb::signal()` method from M1. The struct form enables
/// future batching and the query language parser (M5) to produce
/// signal writes from parsed text.
#[derive(Debug, Clone)]
pub struct Signal {
    /// The signal type name (e.g., "view", "like", "share").
    pub signal_type: String,

    /// The target entity ID.
    pub entity_id: EntityId,

    /// The signal weight. Typically 1.0 for count-based signals.
    pub weight: f64,

    /// The timestamp of the event.
    pub timestamp: Timestamp,
}

impl Signal {
    /// Create a new signal write command.
    pub fn new(
        signal_type: impl Into<String>,
        entity_id: EntityId,
        weight: f64,
        timestamp: Timestamp,
    ) -> Self {
        Self {
            signal_type: signal_type.into(),
            entity_id,
            weight,
            timestamp,
        }
    }
}

// ============================================================
// Query Error
// ============================================================

/// Errors returned by the query engine.
///
/// Spec reference: docs/specs/08-query-engine.md Section 3.4
#[derive(Debug, Clone)]
pub enum QueryError {
    /// The named profile does not exist in the profile registry.
    ProfileNotFound(String),

    /// A filter references a field or uses a condition that is invalid.
    InvalidFilter {
        field: String,
        reason: String,
    },

    /// The requested limit is out of the valid range [1, 500].
    InvalidLimit {
        requested: usize,
        min: usize,
        max: usize,
    },

    /// A required index (vector, bitmap, range) is not available.
    IndexNotAvailable(String),

    /// A storage engine error occurred during query execution.
    ///
    /// Preserves the original error type so callers can match on specific
    /// storage failure modes (e.g., `StorageError::Corruption`). Use
    /// `From<crate::storage::StorageError>` for automatic conversion.
    StorageError(crate::storage::StorageError),

    /// The pagination cursor is invalid or could not be decoded.
    InvalidCursor(String),

    /// The profile's candidate strategy is not supported in this milestone.
    /// M2 supports: Ann, Scan, SignalRanked. Others require M3+ infrastructure.
    UnsupportedStrategy(String),
}

impl std::fmt::Display for QueryError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            QueryError::ProfileNotFound(name) => {
                write!(f, "ranking profile '{name}' not found in registry")
            }
            QueryError::InvalidFilter { field, reason } => {
                write!(f, "invalid filter on field '{field}': {reason}")
            }
            QueryError::InvalidLimit {
                requested,
                min,
                max,
            } => {
                write!(f, "limit {requested} is out of range [{min}, {max}]")
            }
            QueryError::IndexNotAvailable(name) => {
                write!(f, "required index not available: {name}")
            }
            QueryError::StorageError(e) => {
                write!(f, "storage error during query: {e}")
            }
            QueryError::InvalidCursor(msg) => {
                write!(f, "invalid pagination cursor: {msg}")
            }
            QueryError::UnsupportedStrategy(msg) => {
                write!(f, "unsupported candidate strategy: {msg}")
            }
        }
    }
}

impl std::error::Error for QueryError {}

impl From<crate::storage::StorageError> for QueryError {
    fn from(e: crate::storage::StorageError) -> Self {
        QueryError::StorageError(e)
    }
}
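
The offset semantics of `Cursor` and `Results::next_cursor` can be sketched with the standard library alone. The helper below is hypothetical: `page_at` is not part of this task's API, and the executor in Task 02 may page differently. It only illustrates how an offset and a limit select one page, and when the next offset is `None`.

```rust
/// Hypothetical helper, not part of the task's API: select one page from an
/// already ranked list. Returns the page plus the offset of the next page,
/// or `None` when this page reaches the end of the list.
fn page_at<T: Clone>(ranked: &[T], offset: usize, limit: usize) -> (Vec<T>, Option<usize>) {
    // Clamp so that a stale or oversized offset yields an empty page
    // instead of an out-of-bounds panic.
    let start = offset.min(ranked.len());
    let end = start.saturating_add(limit).min(ranked.len());
    let next = if end < ranked.len() { Some(end) } else { None };
    (ranked[start..end].to_vec(), next)
}
```

With `ranked = ["a", "b", "c", "d", "e"]`, `page_at(&ranked, 0, 2)` yields `a, b` and a next offset of 2. If a new item rises to the top before the second request, offset 2 points at `b` again. That is the duplicate-across-pages behavior described in the `Cursor` limitation note.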

Validation Logic

impl Retrieve {
    /// Validate the query against a `ProfileRegistry`.
    ///
    /// Called by the executor before pipeline execution. Separated from
    /// `build()` because profile existence requires the registry, which
    /// the builder does not have access to.
    pub(crate) fn validate(
        &self,
        registry: &ProfileRegistry,
    ) -> Result<(), QueryError> {
        // 1. Profile existence
        let profile_name = &self.profile.name;
        let profile = match self.profile.version {
            Some(v) => registry.get_versioned(profile_name, v),
            None => registry.get(profile_name),
        };
        if profile.is_none() {
            return Err(QueryError::ProfileNotFound(profile_name.clone()));
        }

        // 2. Limit range (already validated in builder, but defense in depth)
        if self.limit == 0 || self.limit > 500 {
            return Err(QueryError::InvalidLimit {
                requested: self.limit,
                min: 1,
                max: 500,
            });
        }

        // 3. M2: unsupported features
        if self.for_user.is_some() {
            return Err(QueryError::InvalidFilter {
                field: "for_user".to_string(),
                reason: "FOR USER clause requires M3".to_string(),
            });
        }

        if self.similar_to.is_some() {
            return Err(QueryError::InvalidFilter {
                field: "similar_to".to_string(),
                reason: "SIMILAR TO clause requires M3".to_string(),
            });
        }

        // 4. Candidate strategy support check
        let resolved_profile = profile.unwrap();
        match resolved_profile.candidate_strategy() {
            CandidateStrategy::Ann { .. }
            | CandidateStrategy::Scan { .. }
            | CandidateStrategy::SignalRanked { .. } => {}
            other => {
                return Err(QueryError::UnsupportedStrategy(
                    format!("{other:?} requires M3+ infrastructure"),
                ));
            }
        }

        Ok(())
    }
}
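
`validate()` first checks `profile.is_none()` and later calls `profile.unwrap()`. One possible alternative, shown here as a self-contained sketch, resolves the profile once with `ok_or_else` so that no `unwrap` is needed. The `Strategy` enum, its `Collaborative` variant, `CheckError`, and the `HashMap` registry are stand-ins invented for illustration. The real `ProfileRegistry` and `CandidateStrategy` come from earlier phases and may look different.

```rust
use std::collections::HashMap;

/// Stand-in for `CandidateStrategy`. `Collaborative` is a hypothetical
/// example of a strategy that M2 would not support.
#[derive(Debug, Clone, PartialEq)]
enum Strategy {
    Ann,
    Scan,
    SignalRanked,
    Collaborative,
}

/// Stand-in for the two `QueryError` variants this check can produce.
#[derive(Debug, PartialEq)]
enum CheckError {
    ProfileNotFound(String),
    UnsupportedStrategy(String),
}

/// Resolve the profile and check its strategy in one pass.
/// The registry is modeled as a plain map from profile name to strategy.
fn check_profile(registry: &HashMap<String, Strategy>, name: &str) -> Result<(), CheckError> {
    // A missing profile becomes an error here, so the value is never unwrapped.
    let strategy = registry
        .get(name)
        .ok_or_else(|| CheckError::ProfileNotFound(name.to_string()))?;

    match strategy {
        Strategy::Ann | Strategy::Scan | Strategy::SignalRanked => Ok(()),
        other => Err(CheckError::UnsupportedStrategy(format!(
            "{other:?} requires M3+ infrastructure"
        ))),
    }
}
```

The behavior matches the original control flow: a missing profile is reported before any strategy check runs.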

Test Strategy

Unit Tests

// === RetrieveBuilder tests ===

#[test]
fn builder_default_limit_50() {
    let query = Retrieve::builder()
        .profile("trending")
        .build()
        .unwrap();
    assert_eq!(query.limit, 50);
    assert_eq!(query.entity_kind, EntityKind::Item);
    assert!(query.filters.is_empty());
    assert!(query.diversity.is_none());
    assert!(query.exclude.is_empty());
    assert!(query.cursor.is_none());
}

#[test]
fn builder_with_all_fields() {
    let query = Retrieve::builder()
        .entity(EntityKind::Item)
        .profile("hot")
        .filter(FilterExpr::eq("category", "jazz"))
        .filter(FilterExpr::eq("format", "video"))
        .diversity(DiversityConstraints::new().max_per_creator(2))
        .limit(25)
        .exclude(EntityId::new(999))
        .build()
        .unwrap();

    assert_eq!(query.entity_kind, EntityKind::Item);
    assert_eq!(query.profile.name, "hot");
    assert_eq!(query.filters.len(), 2);
    assert!(query.diversity.is_some());
    assert_eq!(query.limit, 25);
    assert_eq!(query.exclude.len(), 1);
}

#[test]
fn builder_rejects_zero_limit() {
    let result = Retrieve::builder()
        .profile("trending")
        .limit(0)
        .build();
    assert!(matches!(result, Err(QueryError::InvalidLimit { .. })));
}

#[test]
fn builder_rejects_limit_over_500() {
    let result = Retrieve::builder()
        .profile("trending")
        .limit(501)
        .build();
    assert!(matches!(result, Err(QueryError::InvalidLimit { .. })));
}

#[test]
fn builder_rejects_missing_profile() {
    let result = Retrieve::builder()
        .entity(EntityKind::Item)
        .limit(25)
        .build();
    assert!(matches!(result, Err(QueryError::ProfileNotFound(_))));
}

#[test]
fn builder_limit_boundary_values() {
    // Min valid
    let r1 = Retrieve::builder().profile("a").limit(1).build();
    assert!(r1.is_ok());
    assert_eq!(r1.unwrap().limit, 1);

    // Max valid
    let r2 = Retrieve::builder().profile("a").limit(500).build();
    assert!(r2.is_ok());
    assert_eq!(r2.unwrap().limit, 500);
}

#[test]
fn builder_profile_versioned() {
    let query = Retrieve::builder()
        .profile_versioned("trending", 3)
        .build()
        .unwrap();
    assert_eq!(query.profile.name, "trending");
    assert_eq!(query.profile.version, Some(3));
}

#[test]
fn builder_multiple_excludes() {
    let query = Retrieve::builder()
        .profile("new")
        .exclude(EntityId::new(1))
        .exclude(EntityId::new(2))
        .exclude_ids(vec![EntityId::new(3), EntityId::new(4)])
        .build()
        .unwrap();
    assert_eq!(query.exclude.len(), 4);
}

#[test]
fn combined_filter_none_when_empty() {
    let query = Retrieve::builder().profile("new").build().unwrap();
    assert!(query.combined_filter().is_none());
}

#[test]
fn combined_filter_single() {
    let query = Retrieve::builder()
        .profile("new")
        .filter(FilterExpr::eq("category", "jazz"))
        .build()
        .unwrap();
    let combined = query.combined_filter().unwrap();
    assert!(matches!(combined, FilterExpr::Eq { .. }));
}

#[test]
fn combined_filter_multiple_becomes_and() {
    let query = Retrieve::builder()
        .profile("new")
        .filter(FilterExpr::eq("category", "jazz"))
        .filter(FilterExpr::eq("format", "video"))
        .build()
        .unwrap();
    let combined = query.combined_filter().unwrap();
    assert!(matches!(combined, FilterExpr::And(_)));
}

// === Cursor tests ===

#[test]
fn cursor_encode_decode_roundtrip() {
    let cursor = Cursor::from_offset(42);
    let encoded = cursor.encode();
    let decoded = Cursor::decode(&encoded).unwrap();
    assert_eq!(cursor, decoded);
}

#[test]
fn cursor_encode_decode_zero() {
    let cursor = Cursor::from_offset(0);
    let encoded = cursor.encode();
    let decoded = Cursor::decode(&encoded).unwrap();
    assert_eq!(cursor, decoded);
}

#[test]
fn cursor_encode_decode_large_offset() {
    let cursor = Cursor::from_offset(100_000);
    let encoded = cursor.encode();
    let decoded = Cursor::decode(&encoded).unwrap();
    assert_eq!(cursor, decoded);
}

#[test]
fn cursor_decode_invalid_base64() {
    let result = Cursor::decode("!!!not-base64!!!");
    assert!(matches!(result, Err(QueryError::InvalidCursor(_))));
}

#[test]
fn cursor_decode_wrong_length() {
    use base64::Engine as _;
    let encoded = base64::engine::general_purpose::URL_SAFE_NO_PAD.encode([1u8, 2, 3]);
    let result = Cursor::decode(&encoded);
    assert!(matches!(result, Err(QueryError::InvalidCursor(_))));
}

// === QueryError display tests ===

#[test]
fn query_error_display_messages() {
    let e1 = QueryError::ProfileNotFound("trending".to_string());
    assert!(e1.to_string().contains("trending"));

    let e2 = QueryError::InvalidLimit {
        requested: 0,
        min: 1,
        max: 500,
    };
    assert!(e2.to_string().contains("0"));
    assert!(e2.to_string().contains("500"));

    let e3 = QueryError::InvalidFilter {
        field: "category".to_string(),
        reason: "unknown field".to_string(),
    };
    assert!(e3.to_string().contains("category"));

    let e4 = QueryError::InvalidCursor("bad cursor".to_string());
    assert!(e4.to_string().contains("bad cursor"));
}

// === Results tests ===

#[test]
fn results_len_and_is_empty() {
    let empty = Results {
        items: vec![],
        next_cursor: None,
        total_scored: 0,
        constraints_satisfied: true,
        warnings: vec![],
    };
    assert_eq!(empty.len(), 0);
    assert!(empty.is_empty());

    let one = Results {
        items: vec![RetrieveResult {
            entity_id: EntityId::new(1),
            score: 0.5,
            rank: 1,
            signal_snapshot: vec![],
        }],
        next_cursor: None,
        total_scored: 1,
        constraints_satisfied: true,
        warnings: vec![],
    };
    assert_eq!(one.len(), 1);
    assert!(!one.is_empty());
}

// === Signal struct tests ===

#[test]
fn signal_new() {
    let sig = Signal::new(
        "view",
        EntityId::new(42),
        1.0,
        Timestamp::from_nanos(1_000_000),
    );
    assert_eq!(sig.signal_type, "view");
    assert_eq!(sig.entity_id, EntityId::new(42));
    assert!((sig.weight - 1.0).abs() < f64::EPSILON);
}

Property Tests

use proptest::prelude::*;

// P1: Cursor encode/decode is lossless for all valid offsets.
proptest! {
    #[test]
    fn cursor_roundtrip(offset in any::<usize>()) {
        let cursor = Cursor::from_offset(offset);
        let encoded = cursor.encode();
        let decoded = Cursor::decode(&encoded).unwrap();
        prop_assert_eq!(cursor, decoded);
    }
}

// P2: Builder always produces valid Retrieve when required fields are set.
proptest! {
    #[test]
    fn builder_valid_with_limit(limit in 1usize..=500) {
        let result = Retrieve::builder()
            .profile("test_profile")
            .limit(limit)
            .build();
        prop_assert!(result.is_ok());
        prop_assert_eq!(result.unwrap().limit, limit);
    }
}

// P3: Builder rejects invalid limits.
proptest! {
    #[test]
    fn builder_rejects_invalid_limit(limit in 501usize..10_000) {
        let result = Retrieve::builder()
            .profile("test_profile")
            .limit(limit)
            .build();
        prop_assert!(matches!(result, Err(QueryError::InvalidLimit { .. })));
    }
}

Acceptance Criteria

  • Retrieve struct with all fields: entity_kind, profile, filters, diversity, limit, exclude, cursor, for_user, similar_to, context
  • RetrieveBuilder with methods: entity(), profile(), profile_versioned(), filter(), diversity(), limit(), exclude(), exclude_ids(), cursor(), build()
  • RetrieveBuilder::build() validates: limit in [1, 500], profile present, for_user and similar_to rejected in M2
  • Retrieve::combined_filter() returns None for empty filters, single filter as-is, AND for multiple
  • ProfileRef with new() (latest version) and versioned() (pinned version)
  • Results struct with items, next_cursor, total_scored, constraints_satisfied, warnings, len(), is_empty()
  • RetrieveResult struct with entity_id, score, rank, signal_snapshot
  • Cursor with from_offset(), offset(), encode(), decode() -- base64 roundtrip is lossless
  • Cursor::decode() returns QueryError::InvalidCursor for invalid input
  • Signal struct with new() constructor and all fields
  • QueryError enum with ProfileNotFound, InvalidFilter, InvalidLimit, IndexNotAvailable, StorageError, InvalidCursor, UnsupportedStrategy
  • QueryError implements Display and Error
  • Retrieve::validate() checks profile existence against ProfileRegistry, rejects unsupported candidate strategies
  • Property test: cursor roundtrip for all valid offsets
  • Property test: builder accepts valid limits [1, 500], rejects [501, ...]
  • No unsafe code
  • cargo clippy -- -D warnings passes
  • All unit tests and property tests pass

Research References

  • docs/specs/08-query-engine.md -- Section 3.1 (Retrieve struct fields), Section 3.4 (QueryError enum variants and validation rules), Section 8.2 (Cursor structure and encoding)

Spec References

  • docs/specs/08-query-engine.md -- Section 2.1 (RETRIEVE operation overview), Section 3.1 (Retrieve input struct), Section 3.4 (QueryError enum), Section 8 (Pagination: cursor design, encoding, semantics)

Implementation Notes

  • Add base64 = "0.22" to [dependencies] in tidal/Cargo.toml. The base64 crate is small (no transitive deps beyond std) and provides the URL_SAFE_NO_PAD engine for cursor encoding.
  • The query/mod.rs file should be created with pub mod retrieve; and re-exports of all public types. The pub mod executor; line is added in Task 02 when the executor module is created.
  • lib.rs should add pub mod query; -- this is the first time the query module exists.
  • FilterExpr is imported from crate::storage::indexes::filter (m2p2). If the exact import path differs from the m2p2 implementation, adapt accordingly.
  • DiversityConstraints is imported from crate::ranking::diversity (m2p4). The DiversityConstraints::new() and .max_per_creator() API must match the m2p4 implementation.
  • The for_user: Option<u64> field uses u64 rather than UserId because UserId (a newtype over u64) is not defined until M3 when user entities are introduced. In M3, this field will be changed to Option<UserId>.
  • Do NOT add serde derives to the query types for M2. Serialization of query types is an M5+ concern when the text parser and network protocol are built.
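
One portability point about the cursor encoding: `Cursor::encode` serializes `usize::to_le_bytes()`, whose length follows the target's pointer width (8 bytes on 64-bit targets, 4 on 32-bit). A cursor produced on one width would therefore be rejected by `Cursor::decode` on the other. If that matters for a deployment, a fixed-width payload is one option. The sketch below is an assumption, not part of the task: the function names are hypothetical, and the base64 step is left out so the example needs only the standard library.

```rust
use std::convert::TryFrom;

/// Hypothetical: serialize an offset as exactly 8 little-endian bytes,
/// independent of the platform's pointer width.
fn offset_to_bytes(offset: usize) -> [u8; 8] {
    (offset as u64).to_le_bytes()
}

/// Hypothetical: parse the 8-byte payload back into an offset.
/// Fails on a wrong length, or when the value does not fit in `usize`
/// (possible only on targets narrower than 64 bits).
fn offset_from_bytes(bytes: &[u8]) -> Result<usize, String> {
    let fixed = <[u8; 8]>::try_from(bytes)
        .map_err(|_| format!("expected 8 bytes, got {}", bytes.len()))?;
    usize::try_from(u64::from_le_bytes(fixed))
        .map_err(|_| "offset does not fit in usize on this platform".to_string())
}
```

In this variant the base64 layer would wrap the 8-byte array in the same way `Cursor::encode` wraps the `usize` bytes today.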

Migration from M1 QueryError Stub

This task must remove the M1 stub QueryError before adding the new one. A stub QueryError struct exists in tidal/src/schema/error.rs (simple { message: String } struct) and is re-exported through schema/mod.rs and wrapped by LumenError::Query. The M2 QueryError is a rich enum that replaces it.

Migration steps (in order):

  1. Remove the stub QueryError struct from tidal/src/schema/error.rs
  2. Add pub use crate::query::retrieve::QueryError; in tidal/src/schema/mod.rs (or update the From<QueryError> impl in LumenError to use the new path)
  3. Update LumenError::Query variant to hold crate::query::retrieve::QueryError
  4. Update the From<QueryError> for LumenError impl to use the new enum
  5. Fix any existing tests in schema/error.rs that construct the old QueryError { message: "..." } struct

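Do NOT create query/retrieve.rs before removing the stub (step 1) -- having two QueryError types reachable at once causes confusing compilation errors. Steps 2-4 refer to crate::query::retrieve::QueryError, so the crate will not build again until the new module exists; make the whole migration in a single change.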
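
Steps 3 and 4 amount to the usual wrap-and-convert pattern. The following is a minimal sketch, assuming `LumenError` is an enum with a `Query` variant (its real definition is not shown in this task) and using a one-variant stand-in for the new `QueryError`. The `decode_offset` and `run` functions are hypothetical and exist only to show `?` converting between the two error types.

```rust
use std::fmt;

/// Stand-in for the M2 `QueryError` enum, reduced to one variant.
#[derive(Debug)]
enum QueryError {
    InvalidCursor(String),
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryError::InvalidCursor(msg) => write!(f, "invalid pagination cursor: {msg}"),
        }
    }
}

impl std::error::Error for QueryError {}

/// Assumed shape of the crate-level error: a variant wrapping `QueryError`.
#[derive(Debug)]
enum LumenError {
    Query(QueryError),
}

/// Step 4: the conversion that lets `?` lift a `QueryError` into `LumenError`.
impl From<QueryError> for LumenError {
    fn from(e: QueryError) -> Self {
        LumenError::Query(e)
    }
}

/// Hypothetical query-layer function that fails with `QueryError`.
fn decode_offset(raw: &str) -> Result<usize, QueryError> {
    raw.parse::<usize>()
        .map_err(|e| QueryError::InvalidCursor(e.to_string()))
}

/// Hypothetical top-level function: `?` applies the `From` impl above.
fn run(raw: &str) -> Result<usize, LumenError> {
    Ok(decode_offset(raw)?)
}
```

Existing call sites that matched on the old `{ message }` struct would then match on enum variants instead, which is what step 5 cleans up in the tests.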

Intentional Spec Deviations

The following fields differ from Spec 08 Section 3.1's Retrieve struct definition. These are intentional improvements:

| This task | Spec 08 | Reason |
|---|---|---|
| entity_kind: EntityKind | entity: EntityKind | More explicit; avoids confusion with entity_id |
| exclude: Vec<EntityId> | exclude_ids: Vec<EntityId> | Shorter; builder method .exclude() reads naturally |
| profile: ProfileRef | profile: String | Richer type; supports version pinning for A/B testing |
| diversity: Option<DiversityConstraints> | diversity: Option<DiversitySpec> | Matches m2p4's concrete type name |

These deviations are safe: the spec defines behavior, not names. Future language parsing (M5) maps text tokens to these struct fields.