tidaldb/docs/planning/milestone-7/phase-4/task-06-structured-error-context.md
2026-02-23 22:41:16 -07:00

8.2 KiB

Task 06: Structured TidalError Context Audit

Delivers

Audit and upgrade of all TidalError variants to carry structured operation context (operation name, entity ID, signal type) instead of bare strings. Every error produced by tidalDB identifies what operation was attempted and which entity or signal type was involved, enabling structured logging and programmatic error handling by embedders.

Complexity: M

Dependencies

  • task-01 complete (establishes the instrumentation pattern; new errors during query execution should carry QueryStats context)
  • tidal/src/schema/error.rs -- TidalError enum definition
  • tidal/src/query/retrieve/errors.rs -- QueryError enum definition
  • All db/*.rs files that construct TidalError variants

Technical Design

1. Add ErrorContext struct

Create a lightweight context struct that can be attached to error variants:

/// Structured context attached to errors for diagnostics.
///
/// Provides operation name, optional entity identification, and optional
/// signal type. Every `TidalError` variant that carries a `String` today
/// is migrated to carry an `ErrorContext` instead.
#[derive(Debug, Clone)]
pub struct ErrorContext {
    /// The operation that failed (e.g., "signal", "retrieve", "write_item").
    pub operation: &'static str,
    /// The entity involved, if any.
    pub entity_id: Option<u64>,
    /// The entity kind involved, if any.
    pub entity_kind: Option<EntityKind>,
    /// The signal type involved, if any.
    pub signal_type: Option<String>,
    /// Additional detail (replaces bare String messages).
    pub detail: String,
}

impl std::fmt::Display for ErrorContext {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "[op={}]", self.operation)?;
        if let Some(id) = self.entity_id {
            write!(f, "[entity={id}]")?;
        }
        if let Some(kind) = &self.entity_kind {
            write!(f, "[kind={kind}]")?;
        }
        if let Some(sig) = &self.signal_type {
            write!(f, "[signal={sig}]")?;
        }
        write!(f, " {}", self.detail)
    }
}

2. Upgrade TidalError::Internal

The Internal(String) variant is the most common bare-string error. Replace:

// Before:
#[error("internal error: {0}")]
Internal(String),

// After:
#[error("internal error: {0}")]
Internal(ErrorContext),

Update all construction sites. Example migration:

// Before:
TidalError::Internal("tantivy open failed".to_string())

// After:
TidalError::Internal(ErrorContext {
    operation: "open",
    entity_id: None,
    entity_kind: None,
    signal_type: None,
    detail: "tantivy open failed".to_string(),
})

3. Add convenience constructor

To keep migration manageable, add a helper:

impl ErrorContext {
    /// Create a context with just an operation and detail message.
    #[must_use]
    pub fn new(operation: &'static str, detail: impl Into<String>) -> Self {
        Self {
            operation,
            entity_id: None,
            entity_kind: None,
            signal_type: None,
            detail: detail.into(),
        }
    }

    /// Attach an entity ID to the context.
    #[must_use]
    pub const fn with_entity(mut self, id: u64) -> Self {
        self.entity_id = Some(id);
        self
    }

    /// Attach an entity kind to the context.
    #[must_use]
    pub const fn with_kind(mut self, kind: EntityKind) -> Self {
        self.entity_kind = Some(kind);
        self
    }

    /// Attach a signal type to the context.
    #[must_use]
    pub fn with_signal(mut self, signal_type: impl Into<String>) -> Self {
        self.signal_type = Some(signal_type.into());
        self
    }
}

impl TidalError {
    /// Convenience: create an `Internal` error with minimal context.
    pub fn internal(operation: &'static str, detail: impl Into<String>) -> Self {
        Self::Internal(ErrorContext::new(operation, detail))
    }
}

4. Audit plan

Systematic search-and-replace across the codebase. The migration touches every file that constructs TidalError::Internal(...):

File Approximate sites Operation name
db/mod.rs 5-8 "open", "close", "shutdown"
db/items.rs 3-5 "write_item", "read_item"
db/users.rs 2-3 "write_user", "read_user"
db/creators.rs 2-3 "write_creator", "read_creator"
db/signals.rs 3-5 "signal", "read_signal"
db/sessions.rs 4-6 "start_session", "close_session", "session_signal"
db/relationships.rs 2-3 "write_relationship"
db/query_ops.rs 2-4 "retrieve", "search"
db/state_rebuild.rs 3-5 "rebuild", "restore"
db/collections.rs 3-4 "create_collection", "add_to_collection"
db/cohorts.rs 2-3 "define_cohort"
text/index.rs 4-6 "text_index_open", "text_index_close"
wal/mod.rs 2-3 "wal_open", "wal_append"
storage/*.rs 3-5 "storage_get", "storage_put"

Estimated total: 40-60 construction sites.

5. QueryError context

QueryError variants already carry structured context (field names, profile names). No changes needed to QueryError -- it already meets the bar. The migration focuses on TidalError::Internal and any other String-only variants.

6. Backward compatibility

The Display implementation of ErrorContext produces output that starts with [op=...] followed by the detail message. Code that pattern-matches on TidalError::Internal will need to update from Internal(String) to Internal(ErrorContext). This is a breaking change to the internal error structure, but TidalError is already #[non_exhaustive]-equivalent (callers match on variant, not destructure).

For callers that only use .to_string(), the output changes from "internal error: foo" to "internal error: [op=signal] foo". This is strictly more informative.

Acceptance Criteria

  • ErrorContext struct defined in tidal/src/schema/error.rs
  • ErrorContext::new(), with_entity(), with_kind(), with_signal() constructors
  • TidalError::Internal(ErrorContext) replaces TidalError::Internal(String)
  • TidalError::internal() convenience constructor
  • All construction sites in db/*.rs migrated with correct operation names
  • All construction sites in text/*.rs migrated with correct operation names
  • All construction sites in wal/*.rs migrated with correct operation names
  • All construction sites in storage/*.rs migrated with correct operation names
  • No bare TidalError::Internal("...".to_string()) remains in the codebase
  • Error messages include [op=...] prefix in .to_string() output
  • Entity ID and signal type included where available
  • All existing tests updated for new error structure
  • cargo clippy -D warnings and cargo fmt --check pass

Test Strategy

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn error_context_display_minimal() {
        let ctx = ErrorContext::new("signal", "WAL append failed");
        assert_eq!(ctx.to_string(), "[op=signal] WAL append failed");
    }

    #[test]
    fn error_context_display_with_entity() {
        let ctx = ErrorContext::new("write_item", "storage full")
            .with_entity(42)
            .with_kind(EntityKind::Item);
        let s = ctx.to_string();
        assert!(s.contains("[op=write_item]"));
        assert!(s.contains("[entity=42]"));
        assert!(s.contains("[kind=item]"));
        assert!(s.contains("storage full"));
    }

    #[test]
    fn error_context_display_with_signal() {
        let ctx = ErrorContext::new("signal", "unknown type")
            .with_signal("view");
        let s = ctx.to_string();
        assert!(s.contains("[signal=view]"));
    }

    #[test]
    fn tidal_error_internal_convenience() {
        let err = TidalError::internal("open", "lock file exists");
        let s = err.to_string();
        assert!(s.contains("internal error"));
        assert!(s.contains("[op=open]"));
        assert!(s.contains("lock file exists"));
    }

    #[test]
    fn no_bare_internal_strings_in_codebase() {
        // This is a grep-based audit test, not a runtime test.
        // Run: grep -rn 'Internal("' tidal/src/ | grep -v 'test' | wc -l
        // Expected: 0
        //
        // Enforced by clippy lint or code review.
    }
}