tidaldb/docs/planning/milestone-2/phase-5/task-01-retrieve-ast-and-parser.md
jordan 6fdaa1584b feat: complete M1 signal engine — m0p3 samples/docs, m1p5 TidalDb API, examples, and periodic checkpoint
- m0p3: CONTRIBUTING.md with run-samples checklist, all 4 examples
  (quickstart, cli_embedding, axum_embedding, actix_embedding), doc-test
  coverage for every public API surface
- m1p5: TidalDb public API — write_item, signal, read_decay_score,
  read_windowed_count, read_velocity; StorageBox enum routing memory vs
  fjall; WalSender/WalHandleWriter bridge; WAL replay on open
- Periodic checkpoint: 30s background thread for persistent+schema mode;
  FjallBackend::Clone (O(1), fjall::Keyspace is ref-counted); graceful
  shutdown via Arc<AtomicBool> + join before final checkpoint
- ROADMAP.md: M0 and M1 fully marked COMPLETE (341 tests passing)
- Milestone 2 planning scaffolding added under docs/planning/milestone-2/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 22:45:10 -07:00

983 lines
33 KiB
Markdown

# Task 01: RETRIEVE AST + Parser
## Context
**Milestone:** 2 -- Ranked Retrieval
**Phase:** m2p5 -- Query Parser and RETRIEVE Executor
**Depends On:** None (uses types from m2p2, m2p3, m2p4 but no m2p5 tasks)
**Blocks:** Task 02 (RETRIEVE Executor Pipeline), Task 03 (M2 UAT Integration Test)
**Complexity:** M
## Objective
Deliver the typed AST for the RETRIEVE query operation and a Rust builder API for constructing queries ergonomically. For M2, there is no text grammar parser -- the "parser" is the `RetrieveBuilder` which validates and constructs a `Retrieve` struct. The text syntax parser (`RETRIEVE items USING PROFILE trending LIMIT 25`) is deferred to M5.
This task also defines the response types (`Results`, `RetrieveResult`), the pagination cursor, the `Signal` write command struct (wired to the existing M1 signal write path), and the `QueryError` enum. These types are consumed by Task 02's executor and returned to the caller.
The types defined here map directly to the spec's input/output types (Spec 08 Section 3) but are scoped to M2: no `for_user`, no `similar_to`, no `for_cohort`, no `window`, no `context`, no `for_session`. These fields exist on the struct as `Option` types for forward compatibility but are validated as unsupported in M2 if set.
## Requirements
- `Retrieve` struct: the complete RETRIEVE query request
- `RetrieveBuilder`: ergonomic builder pattern for constructing `Retrieve` queries
- `ProfileRef`: profile name + optional version reference
- `Cursor`: opaque offset-based pagination cursor with base64 encoding
- `Results`: the query response (items, cursor, total_scored, constraints_satisfied)
- `RetrieveResult`: one result item (entity_id, score, rank, signal_snapshot)
- `Signal`: write command struct wired to `TidalDb::signal()`
- `QueryError`: error enum for query validation and execution failures
- Validation: limit range, profile reference format, filter compatibility
- No `unsafe` code
## Technical Design
### Module Structure
```
tidal/src/
query/
mod.rs -- pub mod retrieve; re-exports
retrieve.rs -- all types from this task
```
### Public API
```rust
// === query/retrieve.rs ===
use crate::schema::{EntityId, EntityKind, Timestamp};
use crate::ranking::diversity::DiversityConstraints;
use crate::storage::indexes::filter::FilterExpr;
/// Reference to a ranking profile by name, optionally pinned to a version.
///
/// The executor resolves this against the `ProfileRegistry` at query time.
/// If `version` is `None`, the latest version is used.
#[derive(Debug, Clone)]
pub struct ProfileRef {
/// Profile name. Must match a registered profile in the registry.
pub name: String,
/// Optional version pin. `None` = latest version.
pub version: Option<u32>,
}
impl ProfileRef {
pub fn new(name: impl Into<String>) -> Self {
Self {
name: name.into(),
version: None,
}
}
pub fn versioned(name: impl Into<String>, version: u32) -> Self {
Self {
name: name.into(),
version: Some(version),
}
}
}
/// A RETRIEVE query. Declarative: specifies what, not how.
///
/// For M2, this struct is constructed via `RetrieveBuilder` (Rust API).
/// A text syntax parser is deferred to M5.
///
/// The profile determines the candidate generation strategy and scoring
/// formula. The caller never specifies how candidates are found -- only
/// which profile to use and which filters to apply.
///
/// Spec reference: docs/specs/08-query-engine.md Section 3.1
#[derive(Debug, Clone)]
pub struct Retrieve {
/// Target entity type. For M2, only `EntityKind::Item` is supported.
pub entity_kind: EntityKind,
/// Named ranking profile. Determines candidate strategy and scoring.
pub profile: ProfileRef,
/// Metadata and signal filters. Combined as AND.
/// Uses `FilterExpr` from m2p2 for composable filter evaluation.
pub filters: Vec<FilterExpr>,
/// Diversity constraints. Applied as a post-scoring pass.
/// If `None`, no diversity enforcement is applied.
pub diversity: Option<DiversityConstraints>,
/// Maximum results to return. Default: 50. Range: [1, 500].
pub limit: usize,
/// Explicit item exclusions. Removed from candidate set before scoring.
pub exclude: Vec<EntityId>,
/// Pagination cursor from a previous result set.
/// If `None`, returns the first page.
pub cursor: Option<Cursor>,
// --- Fields present for forward compatibility (M3+), validated as unsupported in M2 ---
/// User context for personalization. M3+.
pub for_user: Option<u64>,
/// Anchor item for related/similar queries. M3+.
pub similar_to: Option<EntityId>,
/// Surface context for the feedback loop. M3+.
pub context: Option<String>,
}
/// Builder for constructing `Retrieve` queries ergonomically.
///
/// # Example
///
/// ```ignore
/// let query = Retrieve::builder()
/// .entity(EntityKind::Item)
/// .profile("trending")
/// .filter(FilterExpr::eq("category", "jazz"))
/// .diversity(DiversityConstraints::new().max_per_creator(2))
/// .limit(25)
/// .build()?;
/// ```
pub struct RetrieveBuilder {
entity_kind: Option<EntityKind>,
profile: Option<ProfileRef>,
filters: Vec<FilterExpr>,
diversity: Option<DiversityConstraints>,
limit: usize,
exclude: Vec<EntityId>,
cursor: Option<Cursor>,
for_user: Option<u64>,
similar_to: Option<EntityId>,
context: Option<String>,
}
impl RetrieveBuilder {
pub fn new() -> Self {
Self {
entity_kind: None,
profile: None,
filters: Vec::new(),
diversity: None,
limit: 50,
exclude: Vec::new(),
cursor: None,
for_user: None,
similar_to: None,
context: None,
}
}
/// Set the target entity kind.
pub fn entity(mut self, kind: EntityKind) -> Self {
self.entity_kind = Some(kind);
self
}
/// Set the ranking profile by name.
pub fn profile(mut self, name: impl Into<String>) -> Self {
self.profile = Some(ProfileRef::new(name));
self
}
/// Set the ranking profile by name and version.
pub fn profile_versioned(mut self, name: impl Into<String>, version: u32) -> Self {
self.profile = Some(ProfileRef::versioned(name, version));
self
}
/// Add a filter expression. Multiple filters are ANDed together.
pub fn filter(mut self, expr: FilterExpr) -> Self {
self.filters.push(expr);
self
}
/// Set diversity constraints.
pub fn diversity(mut self, constraints: DiversityConstraints) -> Self {
self.diversity = Some(constraints);
self
}
/// Set the maximum number of results. Range: [1, 500]. Default: 50.
pub fn limit(mut self, limit: usize) -> Self {
self.limit = limit;
self
}
/// Add an entity ID to the exclusion list.
pub fn exclude(mut self, id: EntityId) -> Self {
self.exclude.push(id);
self
}
/// Add multiple entity IDs to the exclusion list.
pub fn exclude_ids(mut self, ids: impl IntoIterator<Item = EntityId>) -> Self {
self.exclude.extend(ids);
self
}
/// Set the pagination cursor.
pub fn cursor(mut self, cursor: Cursor) -> Self {
self.cursor = Some(cursor);
self
}
/// Validate and build the `Retrieve` query.
///
/// Returns `QueryError::InvalidLimit` if limit is 0 or > 500.
/// Returns `QueryError::ProfileNotFound` if no profile is set.
/// Returns `QueryError::InvalidFilter` if `for_user` or `similar_to` are set (M2).
pub fn build(self) -> Result<Retrieve, QueryError> {
let entity_kind = self.entity_kind.unwrap_or(EntityKind::Item);
let profile = self.profile.ok_or_else(|| {
QueryError::ProfileNotFound("no profile specified".to_string())
})?;
if self.limit == 0 || self.limit > 500 {
return Err(QueryError::InvalidLimit {
requested: self.limit,
min: 1,
max: 500,
});
}
// M2: reject unsupported features
if self.for_user.is_some() {
return Err(QueryError::InvalidFilter {
field: "for_user".to_string(),
reason: "FOR USER clause is not supported until M3".to_string(),
});
}
if self.similar_to.is_some() {
return Err(QueryError::InvalidFilter {
field: "similar_to".to_string(),
reason: "SIMILAR TO clause is not supported until M3".to_string(),
});
}
Ok(Retrieve {
entity_kind,
profile,
filters: self.filters,
diversity: self.diversity,
limit: self.limit,
exclude: self.exclude,
cursor: self.cursor,
for_user: self.for_user,
similar_to: self.similar_to,
context: self.context,
})
}
}
impl Default for RetrieveBuilder {
fn default() -> Self {
Self::new()
}
}
impl Retrieve {
/// Start building a RETRIEVE query.
pub fn builder() -> RetrieveBuilder {
RetrieveBuilder::new()
}
}
/// The combined filter expression for the query.
///
/// Multiple filters are ANDed together. This helper constructs the
/// combined filter from the `Retrieve` query's filter list.
impl Retrieve {
/// Combine all filters into a single AND expression.
/// Returns `None` if no filters are specified.
pub fn combined_filter(&self) -> Option<FilterExpr> {
match self.filters.len() {
0 => None,
1 => Some(self.filters[0].clone()),
_ => Some(FilterExpr::And(self.filters.clone())),
}
}
}
// ============================================================
// Response Types
// ============================================================
/// The complete response from a RETRIEVE query.
///
/// Spec reference: docs/specs/08-query-engine.md Section 5.7
#[derive(Debug, Clone)]
pub struct Results {
/// The ranked result items for this page.
pub items: Vec<RetrieveResult>,
/// Cursor for the next page. `None` if this is the last page.
pub next_cursor: Option<Cursor>,
/// How many candidates were scored by the profile executor.
/// This is the count after filtering but before diversity and limit.
pub total_scored: usize,
/// Whether all diversity constraints were fully satisfied.
/// `false` if constraints were relaxed (see `DiversityResult::violations`).
pub constraints_satisfied: bool,
/// Non-fatal warnings from query execution.
///
/// Warnings are surfaced when the executor degrades gracefully:
/// - Metadata enrichment fails for a candidate (the item is treated as a
/// unique creator for diversity purposes; results are still returned)
/// - A filter references a field with no index (predicate fallback used)
///
/// An empty `warnings` vec means clean execution with no degradation.
pub warnings: Vec<String>,
}
impl Results {
/// Number of items in this page.
pub fn len(&self) -> usize {
self.items.len()
}
/// Whether this page is empty.
pub fn is_empty(&self) -> bool {
self.items.is_empty()
}
}
/// A single result item from a RETRIEVE query.
///
/// Includes the entity ID, composite score, rank position, and a
/// signal snapshot for debugging and transparency.
///
/// Spec reference: docs/specs/08-query-engine.md Section 5, Stage 10
#[derive(Debug, Clone)]
pub struct RetrieveResult {
/// The entity ID of the result.
pub entity_id: EntityId,
/// The composite score from the ranking profile, normalized to [0.0, 1.0].
pub score: f64,
/// The 1-based rank position in the result set.
pub rank: usize,
/// Key signal values used in scoring. For debugging and transparency.
/// Contains (signal_name, value) pairs for signals referenced by the
/// profile's scoring rules. Capped at 10 entries.
pub signal_snapshot: Vec<(String, f64)>,
}
// ============================================================
// Pagination Cursor
// ============================================================
/// Opaque pagination cursor for RETRIEVE queries.
///
/// For M2, this is a simple offset-based cursor encoded as a base64 string.
/// True keyset-based pagination (score + entity_id tiebreaker, Spec 08
/// Section 8.2) is deferred to M5.
///
/// # Limitation: Not Stable Under Concurrent Writes
///
/// Offset-based cursors are not stable when the underlying ranked list
/// changes between page requests (e.g., due to concurrent signal writes).
/// Items may appear on multiple pages or be skipped if the ranking shifts.
/// This is documented and acceptable for M2; the spec says to prefer
/// keyset cursors for production use. Do not use cursor-based pagination
/// in write-heavy workloads until M5.
///
/// The cursor is opaque to the caller -- they receive it as a string and
/// pass it back on the next request. The internal representation is an
/// implementation detail.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Cursor {
/// The offset into the full result set for the next page.
offset: usize,
}
impl Cursor {
/// Create a cursor from an offset.
pub(crate) fn from_offset(offset: usize) -> Self {
Self { offset }
}
/// Get the offset this cursor represents.
pub(crate) fn offset(&self) -> usize {
self.offset
}
/// Encode the cursor as an opaque base64 string.
pub fn encode(&self) -> String {
use base64::Engine as _;
let bytes = self.offset.to_le_bytes();
base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(bytes)
}
/// Decode a cursor from an opaque base64 string.
pub fn decode(encoded: &str) -> Result<Self, QueryError> {
use base64::Engine as _;
let bytes = base64::engine::general_purpose::URL_SAFE_NO_PAD
.decode(encoded)
.map_err(|e| QueryError::InvalidCursor(format!("invalid base64: {e}")))?;
if bytes.len() != std::mem::size_of::<usize>() {
return Err(QueryError::InvalidCursor(format!(
"expected {} bytes, got {}",
std::mem::size_of::<usize>(),
bytes.len()
)));
}
let offset = usize::from_le_bytes(
bytes
.try_into()
.map_err(|_| QueryError::InvalidCursor("byte conversion failed".to_string()))?,
);
Ok(Self { offset })
}
}
// ============================================================
// Signal Write Command
// ============================================================
/// A signal write command.
///
/// For M2, this is a thin wrapper that routes to the existing
/// `TidalDb::signal()` method from M1. The struct form enables
/// future batching and the query language parser (M5) to produce
/// signal writes from parsed text.
#[derive(Debug, Clone)]
pub struct Signal {
/// The signal type name (e.g., "view", "like", "share").
pub signal_type: String,
/// The target entity ID.
pub entity_id: EntityId,
/// The signal weight. Typically 1.0 for count-based signals.
pub weight: f64,
/// The timestamp of the event.
pub timestamp: Timestamp,
}
impl Signal {
/// Create a new signal write command.
pub fn new(
signal_type: impl Into<String>,
entity_id: EntityId,
weight: f64,
timestamp: Timestamp,
) -> Self {
Self {
signal_type: signal_type.into(),
entity_id,
weight,
timestamp,
}
}
}
// ============================================================
// Query Error
// ============================================================
/// Errors returned by the query engine.
///
/// Spec reference: docs/specs/08-query-engine.md Section 3.4
#[derive(Debug, Clone)]
pub enum QueryError {
/// The named profile does not exist in the profile registry.
ProfileNotFound(String),
/// A filter references a field or uses a condition that is invalid.
InvalidFilter {
field: String,
reason: String,
},
/// The requested limit is out of the valid range [1, 500].
InvalidLimit {
requested: usize,
min: usize,
max: usize,
},
/// A required index (vector, bitmap, range) is not available.
IndexNotAvailable(String),
/// A storage engine error occurred during query execution.
///
/// Preserves the original error type so callers can match on specific
/// storage failure modes (e.g., `StorageError::Corruption`). Use
/// `From<crate::storage::StorageError>` for automatic conversion.
StorageError(crate::storage::StorageError),
/// The pagination cursor is invalid or could not be decoded.
InvalidCursor(String),
/// The profile's candidate strategy is not supported in this milestone.
/// M2 supports: Ann, Scan, SignalRanked. Others require M3+ infrastructure.
UnsupportedStrategy(String),
}
impl std::fmt::Display for QueryError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
QueryError::ProfileNotFound(name) => {
write!(f, "ranking profile '{name}' not found in registry")
}
QueryError::InvalidFilter { field, reason } => {
write!(f, "invalid filter on field '{field}': {reason}")
}
QueryError::InvalidLimit {
requested,
min,
max,
} => {
write!(f, "limit {requested} is out of range [{min}, {max}]")
}
QueryError::IndexNotAvailable(name) => {
write!(f, "required index not available: {name}")
}
QueryError::StorageError(e) => {
write!(f, "storage error during query: {e}")
}
QueryError::InvalidCursor(msg) => {
write!(f, "invalid pagination cursor: {msg}")
}
QueryError::UnsupportedStrategy(msg) => {
write!(f, "unsupported candidate strategy: {msg}")
}
}
}
}
impl std::error::Error for QueryError {}
impl From<crate::storage::StorageError> for QueryError {
fn from(e: crate::storage::StorageError) -> Self {
QueryError::StorageError(e)
}
}
```
### Validation Logic
```rust
impl Retrieve {
/// Validate the query against a ProfileRegistry and Schema.
///
/// Called by the executor before pipeline execution. Separated from
/// `build()` because profile existence requires the registry, which
/// the builder does not have access to.
pub(crate) fn validate(
&self,
registry: &ProfileRegistry,
) -> Result<(), QueryError> {
// 1. Profile existence
let profile_name = &self.profile.name;
let profile = match self.profile.version {
Some(v) => registry.get_versioned(profile_name, v),
None => registry.get(profile_name),
};
if profile.is_none() {
return Err(QueryError::ProfileNotFound(profile_name.clone()));
}
// 2. Limit range (already validated in builder, but defense in depth)
if self.limit == 0 || self.limit > 500 {
return Err(QueryError::InvalidLimit {
requested: self.limit,
min: 1,
max: 500,
});
}
// 3. M2: unsupported features
if self.for_user.is_some() {
return Err(QueryError::InvalidFilter {
field: "for_user".to_string(),
reason: "FOR USER clause requires M3".to_string(),
});
}
if self.similar_to.is_some() {
return Err(QueryError::InvalidFilter {
field: "similar_to".to_string(),
reason: "SIMILAR TO clause requires M3".to_string(),
});
}
// 4. Candidate strategy support check
let resolved_profile = profile.unwrap();
match resolved_profile.candidate_strategy() {
CandidateStrategy::Ann { .. }
| CandidateStrategy::Scan { .. }
| CandidateStrategy::SignalRanked { .. } => {}
other => {
return Err(QueryError::UnsupportedStrategy(
format!("{other:?} requires M3+ infrastructure"),
));
}
}
Ok(())
}
}
```
## Test Strategy
### Unit Tests
```rust
// === RetrieveBuilder tests ===
#[test]
fn builder_default_limit_50() {
let query = Retrieve::builder()
.profile("trending")
.build()
.unwrap();
assert_eq!(query.limit, 50);
assert_eq!(query.entity_kind, EntityKind::Item);
assert!(query.filters.is_empty());
assert!(query.diversity.is_none());
assert!(query.exclude.is_empty());
assert!(query.cursor.is_none());
}
#[test]
fn builder_with_all_fields() {
let query = Retrieve::builder()
.entity(EntityKind::Item)
.profile("hot")
.filter(FilterExpr::eq("category", "jazz"))
.filter(FilterExpr::eq("format", "video"))
.diversity(DiversityConstraints::new().max_per_creator(2))
.limit(25)
.exclude(EntityId::new(999))
.build()
.unwrap();
assert_eq!(query.entity_kind, EntityKind::Item);
assert_eq!(query.profile.name, "hot");
assert_eq!(query.filters.len(), 2);
assert!(query.diversity.is_some());
assert_eq!(query.limit, 25);
assert_eq!(query.exclude.len(), 1);
}
#[test]
fn builder_rejects_zero_limit() {
let result = Retrieve::builder()
.profile("trending")
.limit(0)
.build();
assert!(matches!(result, Err(QueryError::InvalidLimit { .. })));
}
#[test]
fn builder_rejects_limit_over_500() {
let result = Retrieve::builder()
.profile("trending")
.limit(501)
.build();
assert!(matches!(result, Err(QueryError::InvalidLimit { .. })));
}
#[test]
fn builder_rejects_missing_profile() {
let result = Retrieve::builder()
.entity(EntityKind::Item)
.limit(25)
.build();
assert!(matches!(result, Err(QueryError::ProfileNotFound(_))));
}
#[test]
fn builder_limit_boundary_values() {
// Min valid
let r1 = Retrieve::builder().profile("a").limit(1).build();
assert!(r1.is_ok());
assert_eq!(r1.unwrap().limit, 1);
// Max valid
let r2 = Retrieve::builder().profile("a").limit(500).build();
assert!(r2.is_ok());
assert_eq!(r2.unwrap().limit, 500);
}
#[test]
fn builder_profile_versioned() {
let query = Retrieve::builder()
.profile_versioned("trending", 3)
.build()
.unwrap();
assert_eq!(query.profile.name, "trending");
assert_eq!(query.profile.version, Some(3));
}
#[test]
fn builder_multiple_excludes() {
let query = Retrieve::builder()
.profile("new")
.exclude(EntityId::new(1))
.exclude(EntityId::new(2))
.exclude_ids(vec![EntityId::new(3), EntityId::new(4)])
.build()
.unwrap();
assert_eq!(query.exclude.len(), 4);
}
#[test]
fn combined_filter_none_when_empty() {
let query = Retrieve::builder().profile("new").build().unwrap();
assert!(query.combined_filter().is_none());
}
#[test]
fn combined_filter_single() {
let query = Retrieve::builder()
.profile("new")
.filter(FilterExpr::eq("category", "jazz"))
.build()
.unwrap();
let combined = query.combined_filter().unwrap();
assert!(matches!(combined, FilterExpr::Eq { .. }));
}
#[test]
fn combined_filter_multiple_becomes_and() {
let query = Retrieve::builder()
.profile("new")
.filter(FilterExpr::eq("category", "jazz"))
.filter(FilterExpr::eq("format", "video"))
.build()
.unwrap();
let combined = query.combined_filter().unwrap();
assert!(matches!(combined, FilterExpr::And(_)));
}
// === Cursor tests ===
#[test]
fn cursor_encode_decode_roundtrip() {
let cursor = Cursor::from_offset(42);
let encoded = cursor.encode();
let decoded = Cursor::decode(&encoded).unwrap();
assert_eq!(cursor, decoded);
}
#[test]
fn cursor_encode_decode_zero() {
let cursor = Cursor::from_offset(0);
let encoded = cursor.encode();
let decoded = Cursor::decode(&encoded).unwrap();
assert_eq!(cursor, decoded);
}
#[test]
fn cursor_encode_decode_large_offset() {
let cursor = Cursor::from_offset(100_000);
let encoded = cursor.encode();
let decoded = Cursor::decode(&encoded).unwrap();
assert_eq!(cursor, decoded);
}
#[test]
fn cursor_decode_invalid_base64() {
let result = Cursor::decode("!!!not-base64!!!");
assert!(matches!(result, Err(QueryError::InvalidCursor(_))));
}
#[test]
fn cursor_decode_wrong_length() {
use base64::Engine as _;
let encoded = base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(&[1u8, 2, 3]);
let result = Cursor::decode(&encoded);
assert!(matches!(result, Err(QueryError::InvalidCursor(_))));
}
// === QueryError display tests ===
#[test]
fn query_error_display_messages() {
let e1 = QueryError::ProfileNotFound("trending".to_string());
assert!(e1.to_string().contains("trending"));
let e2 = QueryError::InvalidLimit {
requested: 0,
min: 1,
max: 500,
};
assert!(e2.to_string().contains("0"));
assert!(e2.to_string().contains("500"));
let e3 = QueryError::InvalidFilter {
field: "category".to_string(),
reason: "unknown field".to_string(),
};
assert!(e3.to_string().contains("category"));
let e4 = QueryError::InvalidCursor("bad cursor".to_string());
assert!(e4.to_string().contains("bad cursor"));
}
// === RetrieveResult tests ===
#[test]
fn results_len_and_is_empty() {
let empty = Results {
items: vec![],
next_cursor: None,
total_scored: 0,
constraints_satisfied: true,
warnings: vec![],
};
assert_eq!(empty.len(), 0);
assert!(empty.is_empty());
let one = Results {
items: vec![RetrieveResult {
entity_id: EntityId::new(1),
score: 0.5,
rank: 1,
signal_snapshot: vec![],
}],
next_cursor: None,
total_scored: 1,
constraints_satisfied: true,
warnings: vec![],
};
assert_eq!(one.len(), 1);
assert!(!one.is_empty());
}
// === Signal struct tests ===
#[test]
fn signal_new() {
let sig = Signal::new(
"view",
EntityId::new(42),
1.0,
Timestamp::from_nanos(1_000_000),
);
assert_eq!(sig.signal_type, "view");
assert_eq!(sig.entity_id, EntityId::new(42));
assert!((sig.weight - 1.0).abs() < f64::EPSILON);
}
```
### Property Tests
```rust
use proptest::prelude::*;
// P1: Cursor encode/decode is lossless for all valid offsets.
proptest! {
#[test]
fn cursor_roundtrip(offset in 0usize..1_000_000) {
let cursor = Cursor::from_offset(offset);
let encoded = cursor.encode();
let decoded = Cursor::decode(&encoded).unwrap();
prop_assert_eq!(cursor, decoded);
}
}
// P2: Builder always produces valid Retrieve when required fields are set.
proptest! {
#[test]
fn builder_valid_with_limit(limit in 1usize..=500) {
let result = Retrieve::builder()
.profile("test_profile")
.limit(limit)
.build();
prop_assert!(result.is_ok());
prop_assert_eq!(result.unwrap().limit, limit);
}
}
// P3: Builder rejects invalid limits.
proptest! {
#[test]
fn builder_rejects_invalid_limit(limit in 501usize..10_000) {
let result = Retrieve::builder()
.profile("test_profile")
.limit(limit)
.build();
prop_assert!(matches!(result, Err(QueryError::InvalidLimit { .. })));
}
}
```
## Acceptance Criteria
- [ ] `Retrieve` struct with all fields: `entity_kind`, `profile`, `filters`, `diversity`, `limit`, `exclude`, `cursor`, `for_user`, `similar_to`, `context`
- [ ] `RetrieveBuilder` with methods: `entity()`, `profile()`, `profile_versioned()`, `filter()`, `diversity()`, `limit()`, `exclude()`, `exclude_ids()`, `cursor()`, `build()`
- [ ] `RetrieveBuilder::build()` validates: limit in [1, 500], profile present, `for_user` and `similar_to` rejected in M2
- [ ] `Retrieve::combined_filter()` returns `None` for empty filters, single filter as-is, AND for multiple
- [ ] `ProfileRef` with `new()` (latest version) and `versioned()` (pinned version)
- [ ] `Results` struct with `items`, `next_cursor`, `total_scored`, `constraints_satisfied`, `warnings`, `len()`, `is_empty()`
- [ ] `RetrieveResult` struct with `entity_id`, `score`, `rank`, `signal_snapshot`
- [ ] `Cursor` with `from_offset()`, `offset()`, `encode()`, `decode()` -- base64 roundtrip is lossless
- [ ] `Cursor::decode()` returns `QueryError::InvalidCursor` for invalid input
- [ ] `Signal` struct with `new()` constructor and all fields
- [ ] `QueryError` enum with `ProfileNotFound`, `InvalidFilter`, `InvalidLimit`, `IndexNotAvailable`, `StorageError`, `InvalidCursor`, `UnsupportedStrategy`
- [ ] `QueryError` implements `Display` and `Error`
- [ ] `Retrieve::validate()` checks profile existence against `ProfileRegistry`, rejects unsupported candidate strategies
- [ ] Property test: cursor roundtrip for all valid offsets
- [ ] Property test: builder accepts valid limits [1, 500], rejects [501, ...]
- [ ] No `unsafe` code
- [ ] `cargo clippy -- -D warnings` passes
- [ ] All unit tests and property tests pass
## Research References
- [docs/specs/08-query-engine.md](../../../specs/08-query-engine.md) -- Section 3.1 (`Retrieve` struct fields), Section 3.4 (`QueryError` enum variants and validation rules), Section 8.2 (`Cursor` structure and encoding)
## Spec References
- [docs/specs/08-query-engine.md](../../../specs/08-query-engine.md) -- Section 2.1 (RETRIEVE operation overview), Section 3.1 (Retrieve input struct), Section 3.4 (QueryError enum), Section 8 (Pagination: cursor design, encoding, semantics)
## Implementation Notes
- Add `base64 = "0.22"` to `[dependencies]` in `tidal/Cargo.toml`. The `base64` crate is small (no transitive deps beyond `std`) and provides the `URL_SAFE_NO_PAD` engine for cursor encoding.
- The `query/mod.rs` file should be created with `pub mod retrieve;` and re-exports of all public types. The `pub mod executor;` line is added in Task 02 when the executor module is created.
- `lib.rs` should add `pub mod query;` -- this is the first time the query module exists.
- `FilterExpr` is imported from `crate::storage::indexes::filter` (m2p2). If the exact import path differs from the m2p2 implementation, adapt accordingly.
- `DiversityConstraints` is imported from `crate::ranking::diversity` (m2p4). The `DiversityConstraints::new()` and `.max_per_creator()` API must match the m2p4 implementation.
- The `for_user: Option<u64>` field uses `u64` rather than `UserId` because `UserId` (a newtype over `u64`) is not defined until M3 when user entities are introduced. In M3, this field will be changed to `Option<UserId>`.
- Do NOT add `serde` derives to the query types for M2. Serialization of query types is an M5+ concern when the text parser and network protocol are built.
## Migration from M1 QueryError Stub
**This task must remove the M1 stub `QueryError` before adding the new one.** A stub `QueryError` struct exists in `tidal/src/schema/error.rs` (simple `{ message: String }` struct) and is re-exported through `schema/mod.rs` and wrapped by `LumenError::Query`. The M2 `QueryError` is a rich enum that replaces it.
Migration steps (in order):
1. Remove the stub `QueryError` struct from `tidal/src/schema/error.rs`
2. Add `pub use crate::query::retrieve::QueryError;` in `tidal/src/schema/mod.rs` (or update the `From<QueryError>` impl in `LumenError` to use the new path)
3. Update `LumenError::Query` variant to hold `crate::query::retrieve::QueryError`
4. Update the `From<QueryError> for LumenError` impl to use the new enum
5. Fix any existing tests in `schema/error.rs` that construct the old `QueryError { message: "..." }` struct
Do NOT create `query/retrieve.rs` before completing steps 1-4 -- the name collision will cause confusing compilation errors.
## Intentional Spec Deviations
The following fields differ from Spec 08 Section 3.1's `Retrieve` struct definition. These are intentional improvements:
| This task | Spec 08 | Reason |
|-----------|---------|--------|
| `entity_kind: EntityKind` | `entity: EntityKind` | More explicit — avoids confusion with `entity_id` |
| `exclude: Vec<EntityId>` | `exclude_ids: Vec<EntityId>` | Shorter; builder method `.exclude()` reads naturally |
| `profile: ProfileRef` | `profile: String` | Richer type; supports version pinning for A/B testing |
| `diversity: Option<DiversityConstraints>` | `diversity: Option<DiversitySpec>` | Matches m2p4's concrete type name |
These deviations are safe: the spec defines behavior, not names. Future language parsing (M5) maps text tokens to these struct fields.