
Schema Specification

Status: Draft
Author: tidalDB Engineering
Last Updated: 2026-02-20
Prerequisites: 02-entity-model.md, 03-signal-system.md, 04-relationships.md, API.md
Research: thoughts.md (Stage 3 insight: schema encodes behavior, not just shape)


Table of Contents

  1. Design Principles
  2. Type System
  3. Schema Definition API
  4. Schema Versioning
  5. Schema Validation Rules
  6. Schema Migration
  7. Schema Introspection
  8. Defaults and Population Priors
  9. A/B Testing Support
  10. Schema Storage
  11. Example: Video Platform Schema
  12. Invariants and Correctness Guarantees

1. Design Principles

The schema system is the contract between the application and the database. It defines not just what data exists, but how that data behaves -- decay rates, velocity computation, scoring weights, diversity rules, cohort boundaries. This is the Stage 3 insight from thoughts.md: schema encodes behavior, not just shape.

Schema Is the Source of Truth for Behavior

In traditional databases, schema defines columns and types. Application code defines behavior. In tidalDB, the boundary shifts. A signal's half-life is not a magic constant in application code -- it is a declaration in schema that the database enforces. A ranking profile's scoring weights are not buried in a microservice -- they are versioned schema objects the database executes.

This design choice has three consequences:

  1. The query optimizer reasons about behavior. When the database sees USING PROFILE trending, it knows to use velocity signals, skip total-count indexes, and enforce per-creator diversity. A general-purpose database executing the same logic as an opaque UDF cannot optimize it, because the planner cannot see inside the function.

  2. Behavior changes do not require redeployment. Changing a ranking profile's exploration budget from 10% to 15% is a schema operation (define a new profile version and activate it), not a code change. The new version takes effect for the next query that resolves the profile; nothing is redeployed.

  3. Behavior is auditable. Every ranking profile version is stored with a timestamp. "What scoring function was active during the incident last Tuesday?" is answerable by schema introspection.

Additive Changes Are Always Safe

The schema system distinguishes additive changes (always safe, no migration required) from breaking changes (require explicit migration with dry-run validation). This distinction is enforced at the API level -- an additive change is applied immediately; a breaking change returns a MigrationRequired error with a description of what would break.
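A minimal sketch of the additive-versus-breaking split for entity fields, following the Section 4.1 rule that entity fields are append-only. `FieldChange`, `check_field_change`, and this trimmed-down `SchemaError` are hypothetical stand-ins, not the tidalDB API.

```rust
/// Hypothetical, simplified view of a change to an entity's field list.
#[derive(Debug, Clone, PartialEq)]
enum FieldChange {
    /// A new field is appended.
    Added { name: String },
    /// An existing field is dropped.
    Removed { name: String },
    /// An existing field changes type (e.g. Keyword -> Text).
    Retyped { name: String, from: String, to: String },
}

/// Trimmed-down stand-in for the spec's SchemaError.
#[derive(Debug, PartialEq)]
enum SchemaError {
    MigrationRequired { description: String },
}

/// Additive changes are accepted as-is; breaking changes are rejected
/// with a description of what would break.
fn check_field_change(change: &FieldChange) -> Result<(), SchemaError> {
    match change {
        FieldChange::Added { .. } => Ok(()),
        FieldChange::Removed { name } => Err(SchemaError::MigrationRequired {
            description: format!("removing field `{name}` would invalidate its index and break queries"),
        }),
        FieldChange::Retyped { name, from, to } => Err(SchemaError::MigrationRequired {
            description: format!("retyping field `{name}` from {from} to {to} would invalidate its index"),
        }),
    }
}
```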

Immutability Where It Matters

Signal definitions are immutable once created. Changing a signal's decay half-life would retroactively invalidate all historical running scores -- the O(1) running decay formula assumes a constant lambda. Rather than silently producing incorrect scores, the schema system rejects the mutation and requires the application to define a new signal type.
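To see why lambda must stay fixed, here is a minimal sketch of the O(1) running decay update, assuming timestamps in seconds as `f64`. `RunningScore` and its methods are hypothetical names. The stored value is the sum of every past weight decayed with the same lambda; once the events are folded into one scalar, their individual timestamps are gone, so the history cannot be re-decayed under a different half-life.

```rust
/// O(1) running state for one (entity, signal) pair. Hypothetical layout.
#[derive(Debug, Clone, Copy)]
struct RunningScore {
    /// Decayed sum of all weights, valid as of `last_update_secs`.
    value: f64,
    last_update_secs: f64,
}

impl RunningScore {
    fn new() -> Self {
        RunningScore { value: 0.0, last_update_secs: 0.0 }
    }

    /// Score at `now_secs`: the stored sum decayed over the elapsed time.
    fn read(&self, lambda: f64, now_secs: f64) -> f64 {
        self.value * (-lambda * (now_secs - self.last_update_secs)).exp()
    }

    /// Fold a new event in: decay the old sum to `now_secs`, then add `weight`.
    /// Equals sum_i w_i * exp(-lambda * (now - t_i)) only if every update
    /// used the same lambda.
    fn record(&mut self, lambda: f64, now_secs: f64, weight: f64) {
        self.value = self.read(lambda, now_secs) + weight;
        self.last_update_secs = now_secs;
    }
}
```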

Ranking profiles are versioned rather than mutated. Version 1 of for_you and version 2 coexist. The application controls which version is active. Old versions can be queried explicitly for comparison and debugging.

Deep Module, Small Interface

The schema system exposes six definition methods (define_entity, define_signal, define_profile, define_cohort, define_relationship, migrate) and six introspection methods. Everything else -- validation, versioning, storage, cache invalidation, WAL logging -- is internal. The caller never interacts with the schema storage format, the version counter, or the validation engine directly.


2. Type System

This section lists every type that composes the schema: the Rust types that the application constructs and passes to the define_* methods.

Entity Types

/// Definition of an entity type (Item, User, or Creator).
/// Passed to `db.define_entity()`.
pub struct EntityDef {
    /// Which entity kind this definition applies to.
    pub kind: EntityKind,
    /// Metadata fields carried by entities of this kind.
    pub metadata_fields: Vec<Field>,
    /// Embedding slots for vector search.
    pub embedding: EmbeddingDef,
}

/// The three entity kinds. Fixed -- not extensible by the application.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum EntityKind {
    Item,
    User,
    Creator,
}

/// A metadata field declaration.
pub struct Field {
    /// Field name. Lowercase alphanumeric plus underscores. Max 64 chars.
    pub name: String,
    /// Field data type, which determines indexing behavior.
    pub field_type: FieldType,
    /// Writability: who can set this field.
    pub writability: Writability,
}

/// Convenience constructors for Field.
impl Field {
    pub fn text(name: &str) -> Self;
    pub fn keyword(name: &str) -> Self;
    pub fn keywords(name: &str) -> Self;
    pub fn i64(name: &str) -> Self;
    pub fn f64(name: &str) -> Self;
    pub fn bool(name: &str) -> Self;
    pub fn timestamp(name: &str) -> Self;
    pub fn duration(name: &str) -> Self;

    /// A database-computed field with the given underlying storage type.
    /// Writability is automatically set to `DbComputed`.
    pub fn computed(name: &str, underlying: FieldType) -> Self;
}

/// Field data types. Determines storage format, index type, and query semantics.
#[derive(Clone, PartialEq, Eq, Debug)]
pub enum FieldType {
    /// UTF-8 string, BM25-indexed, full-text searchable.
    Text,
    /// UTF-8 string, exact-match indexed, filterable, facetable.
    Keyword,
    /// Vec<String>, each value exact-match indexed.
    Keywords,
    /// 64-bit signed integer, range-filterable, sortable.
    I64,
    /// 64-bit float, range-filterable, sortable.
    F64,
    /// Boolean, equality-filterable.
    Bool,
    /// UTC nanosecond timestamp, range-filterable, sortable.
    Timestamp,
    /// Duration in seconds (f64), range-filterable, sortable.
    Duration,
}

/// Who can write this field.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Writability {
    /// Application writes via write_*() / update_*().
    AppSet,
    /// Database computes from signal patterns and relationships.
    DbComputed,
    /// Database manages as part of signal processing (embeddings).
    DbManaged,
}

Embedding Types

/// Embedding configuration for an entity type.
pub struct EmbeddingDef {
    /// One or more embedding slots. Max 4 per entity type.
    pub slots: Vec<EmbeddingSlot>,
}

/// A single embedding vector slot.
pub struct EmbeddingSlot {
    /// Slot name. Unique within the entity type.
    pub name: String,
    /// Vector dimensions. Range: [2, 4096].
    pub dimensions: u32,
    /// Who provides this embedding.
    pub source: EmbeddingSource,
    /// Storage precision. Default: F16.
    pub precision: EmbeddingPrecision,
}

/// Who computes and writes the embedding.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum EmbeddingSource {
    /// Application computes externally, writes via API.
    External,
    /// Database computes and maintains (e.g., user preference vector).
    DatabaseManaged,
}

/// Storage precision for embedding vectors.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum EmbeddingPrecision {
    /// 16-bit float. Default. ~1% recall loss vs f32, 50% memory savings.
    F16,
    /// 32-bit float. Use when embedding model requires higher precision.
    F32,
    /// 8-bit integer quantization. For memory-constrained deployments.
    I8,
}

impl Default for EmbeddingPrecision {
    fn default() -> Self { Self::F16 }
}
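The memory trade-off between precisions is simple arithmetic on raw vector bytes. The sketch below is illustrative only: `bytes_per_component` and `raw_vector_bytes` are hypothetical helpers, it ignores ANN index overhead and any per-vector scale factors that I8 quantization may need, and 10M items at 768 dimensions is an assumed workload.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EmbeddingPrecision {
    F16,
    F32,
    I8,
}

impl EmbeddingPrecision {
    /// Bytes used by one vector component at this precision.
    fn bytes_per_component(self) -> u64 {
        match self {
            EmbeddingPrecision::F16 => 2,
            EmbeddingPrecision::F32 => 4,
            EmbeddingPrecision::I8 => 1,
        }
    }
}

/// Raw bytes for `count` vectors of `dimensions` components.
/// Excludes ANN graph/index overhead and any quantization scale factors.
fn raw_vector_bytes(count: u64, dimensions: u32, precision: EmbeddingPrecision) -> u64 {
    count * u64::from(dimensions) * precision.bytes_per_component()
}
```

At that size, F16 needs 15.36 GB of raw vector storage versus 30.72 GB for F32, which is where the 50% savings figure comes from.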

Signal Types

/// Definition of a signal type. Passed to `db.define_signal()`.
/// Immutable once created -- changing decay would invalidate historical data.
pub struct SignalDef {
    /// Signal name. Unique globally. Lowercase alphanumeric plus underscores.
    pub name: String,
    /// Which entity type this signal targets.
    pub target: EntityKind,
    /// How the signal weight decays over time.
    pub decay: Decay,
    /// Time windows for which aggregates are maintained.
    pub windows: Vec<Window>,
    /// Whether to compute rate-of-change (velocity) per window.
    pub velocity: bool,
    /// Durability level for this signal type's WAL writes.
    /// Default: Batched { max_batch: 256, max_delay: 10ms }.
    pub durability: Option<DurabilityLevel>,
}

/// How signal weight diminishes over time.
#[derive(Clone, Debug, PartialEq)]
pub enum Decay {
    /// Signal weight halves every `half_life` duration.
    /// Formula: w(t) = w_0 * exp(-lambda * t), lambda = ln(2) / half_life
    /// The database precomputes and stores lambda at definition time.
    Exponential { half_life: Duration },

    /// Signal weight drops linearly to zero over `lifetime`.
    /// Formula: w(t) = w_0 * max(0, 1 - t / lifetime)
    /// Cannot use the O(1) running score trick (not multiplicatively
    /// composable). Uses windowed aggregation with linear interpolation
    /// at the boundary.
    Linear { lifetime: Duration },

    /// Signal weight never decays. For permanent state: hides, blocks.
    Permanent,
}

impl Decay {
    /// Precompute the decay rate constant lambda.
    /// Only meaningful for Exponential decay; returns None otherwise.
    pub fn lambda(&self) -> Option<f64> {
        match self {
            Decay::Exponential { half_life } => {
                Some(std::f64::consts::LN_2 / half_life.as_secs_f64())
            }
            _ => None,
        }
    }
}
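A sketch of the per-event decay factor for each variant, restating a simplified `Decay` so the block compiles alone (`factor` is a hypothetical method name). The test checks the composability property the comments above rely on: exponential factors multiply across consecutive intervals and linear ones do not, which is why Linear cannot use the running-score update.

```rust
use std::time::Duration;

/// Simplified restatement of the spec's Decay enum.
#[derive(Clone, Debug, PartialEq)]
enum Decay {
    Exponential { half_life: Duration },
    Linear { lifetime: Duration },
    Permanent,
}

impl Decay {
    /// Fraction of an event's original weight that remains after `age`.
    fn factor(&self, age: Duration) -> f64 {
        let t = age.as_secs_f64();
        match self {
            // w(t) / w_0 = exp(-lambda * t), lambda = ln(2) / half_life
            Decay::Exponential { half_life } => {
                let lambda = std::f64::consts::LN_2 / half_life.as_secs_f64();
                (-lambda * t).exp()
            }
            // w(t) / w_0 = max(0, 1 - t / lifetime)
            Decay::Linear { lifetime } => (1.0 - t / lifetime.as_secs_f64()).max(0.0),
            Decay::Permanent => 1.0,
        }
    }
}
```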

/// Time window for signal aggregation.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Window {
    /// Fixed-duration sliding window.
    Sliding { duration: Duration },
    /// Unbounded accumulator -- all events since entity creation.
    AllTime,
}

impl Window {
    pub fn hours(n: u64) -> Self {
        Window::Sliding { duration: Duration::from_secs(n * 3600) }
    }
    pub fn days(n: u64) -> Self {
        Window::Sliding { duration: Duration::from_secs(n * 86400) }
    }
    pub fn all_time() -> Self { Window::AllTime }
}

Ranking Profile Types

/// Definition of a ranking profile. Passed to `db.define_profile()`.
/// Versioned -- multiple versions coexist under the same name.
pub struct ProfileDef {
    /// Profile name. Lowercase alphanumeric plus underscores and hyphens.
    pub name: String,
    /// Version number. Must be strictly greater than the latest existing
    /// version for this name (or 1 if no prior versions exist).
    pub version: u32,
    /// How to generate the initial candidate set.
    pub candidate: Candidate,
    /// Signal and relationship boosts applied during scoring.
    pub boosts: Vec<Boost>,
    /// Recency decay applied to candidate age.
    pub decay: Option<ProfileDecay>,
    /// Quality gates -- candidates below threshold are excluded.
    pub gates: Vec<Gate>,
    /// Negative signal penalties subtracted from score.
    pub penalties: Vec<Penalty>,
    /// Hard exclusion predicates evaluated before scoring.
    pub excludes: Vec<Exclude>,
    /// Post-scoring diversity constraints.
    pub diversity: Option<DiversitySpec>,
    /// Fraction of results reserved for exploration (new/unseen creators).
    /// Range: [0.0, 1.0]. Default: 0.0 (no exploration).
    pub exploration: f64,
    /// Optional sort override. If None, results are ordered by computed
    /// score. If Some, the specified sort mode takes precedence.
    pub sort: Option<Sort>,
}

/// How to generate the initial candidate set for scoring.
#[derive(Clone, Debug)]
pub enum Candidate {
    /// Approximate nearest neighbor retrieval over entity embeddings.
    Ann {
        /// Which vector to use as the query.
        query_vector: VectorSource,
        /// Which entity type to search.
        index: EntityKind,
        /// Which embedding slot to search against.
        embedding_slot: Option<String>,
        /// Number of ANN candidates to retrieve before scoring.
        top_k: u32,
    },
    /// Full scan of all entities of a given kind. Used for trending,
    /// browse, and other non-personalized surfaces.
    Scan {
        entity: EntityKind,
    },
    /// Retrieve content from entities connected by a relationship edge.
    /// E.g., items from followed creators.
    Relationship {
        edge: String,
    },
    /// Social graph traversal -- items engaged by users in the
    /// querying user's extended social graph.
    SocialGraph {
        depth: u8,
        edge: String,
        min_weight: f64,
    },
    /// Hybrid text + vector retrieval (for search).
    Hybrid {
        text_weight: f64,
        vector_weight: f64,
        fusion: Fusion,
    },
}

/// Where the query vector comes from.
#[derive(Clone, Debug)]
pub enum VectorSource {
    /// Use the querying user's preference embedding.
    UserPreference,
    /// Use a specific item's embedding (for related/up-next queries).
    ItemEmbedding { item_id: String },
    /// Use a vector provided by the caller (for search).
    Provided,
}

/// Fusion strategy for hybrid text + vector search.
#[derive(Clone, Debug)]
pub enum Fusion {
    /// Reciprocal Rank Fusion. RRF(d) = 1/(k + rank_bm25) + 1/(k + rank_ann).
    /// k=60 is the standard default. Rank-based, no score normalization needed.
    Rrf { k: u32 },
    /// Linear combination: alpha * text_score + (1-alpha) * vector_score.
    /// Requires score normalization. Use only after relevance tuning.
    Linear { alpha: f64 },
}
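A minimal sketch of Fusion::Rrf over any number of ranked lists, assuming 1-based ranks and string document IDs; the `rrf` function is hypothetical. Because only ranks enter the formula, BM25 scores and ANN distances never need to be put on a common scale.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
/// Ranks are 1-based; a document absent from a list gets nothing from it.
fn rrf(lists: &[Vec<&str>], k: u32) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, doc) in list.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (f64::from(k) + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; tie-break by id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With k = 60, a document ranked 2nd by BM25 and 1st by ANN (1/62 + 1/61) edges out one ranked 1st and 3rd (1/61 + 1/63).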

/// A positive scoring boost.
#[derive(Clone, Debug)]
pub enum Boost {
    /// Boost based on a signal's value within a window.
    Signal {
        signal: String,
        window: Window,
        mode: SignalMode,
        weight: f64,
    },
    /// Boost based on a relationship edge weight.
    Relationship {
        edge: String,
        weight: f64,
    },
    /// Boost based on social proof (engagement by user's social graph).
    SocialProof {
        weight: f64,
    },
}

/// What aspect of a signal to use in scoring.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SignalMode {
    /// Raw count within the window.
    Count,
    /// Running decay score (exponentially weighted).
    Value,
    /// Rate of change within the window.
    Velocity,
    /// Ratio of unique users to total count.
    UniqueRatio,
    /// Ratio of this signal to another (e.g., likes / views).
    Ratio,
}

impl Boost {
    pub fn signal(signal: &str, window: Window, mode: SignalMode, weight: f64) -> Self {
        Boost::Signal {
            signal: signal.to_string(),
            window,
            mode,
            weight,
        }
    }
    pub fn relationship(edge: &str, weight: f64) -> Self {
        Boost::Relationship { edge: edge.to_string(), weight }
    }
    pub fn social_proof(weight: f64) -> Self {
        Boost::SocialProof { weight }
    }
}

/// Recency decay applied to candidate age in the profile.
#[derive(Clone, Debug)]
pub struct ProfileDecay {
    /// The timestamp field to use as the age reference.
    pub field: String,
    /// Half-life for age decay.
    pub half_life: Duration,
}

/// Quality gate -- candidates below the threshold are excluded.
#[derive(Clone, Debug)]
pub enum Gate {
    /// Minimum signal value to pass. Candidates below are excluded.
    Min {
        signal: String,
        window: Window,
        threshold: f64,
    },
    /// Minimum ratio of one signal to another.
    MinRatio {
        name: String,
        threshold: f64,
    },
}

impl Gate {
    pub fn min(signal: &str, window: Window, threshold: f64) -> Self {
        Gate::Min {
            signal: signal.to_string(),
            window,
            threshold,
        }
    }
    pub fn min_ratio(name: &str, threshold: f64) -> Self {
        Gate::MinRatio {
            name: name.to_string(),
            threshold,
        }
    }
}

/// Negative signal penalty subtracted from score.
#[derive(Clone, Debug)]
pub struct Penalty {
    /// Signal name.
    pub signal: String,
    /// Window to evaluate.
    pub window: Window,
    /// Penalty weight. Expected to be negative: the weighted signal value
    /// is added to the score, so a negative weight lowers it.
    pub weight: f64,
}

impl Penalty {
    pub fn signal(signal: &str, window: Window, weight: f64) -> Self {
        Penalty {
            signal: signal.to_string(),
            window,
            weight,
        }
    }
}

/// Hard exclusion predicate evaluated before scoring begins.
#[derive(Clone, Debug)]
pub enum Exclude {
    /// Exclude items where this signal exists for the querying user.
    /// E.g., Exclude::signal("hide") excludes all hidden items.
    Signal { signal: String },
    /// Exclude based on relationship. E.g., Exclude::relationship("blocked").
    Relationship { edge: String },
}

impl Exclude {
    pub fn signal(signal: &str) -> Self {
        Exclude::Signal { signal: signal.to_string() }
    }
    pub fn relationship(edge: &str) -> Self {
        Exclude::Relationship { edge: edge.to_string() }
    }
}

/// Post-scoring diversity enforcement.
#[derive(Clone, Debug, Default)]
pub struct DiversitySpec {
    /// Maximum items from the same creator in the result set.
    pub max_per_creator: Option<u32>,
    /// Enforce a mix of content formats (video, short, article, etc.).
    pub format_mix: bool,
    /// Topic diversity via maximal marginal relevance (MMR).
    /// 0.0 = no enforcement, 1.0 = maximize diversity.
    pub topic_diversity: Option<f64>,
}
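The spec does not say how max_per_creator is enforced. One simple option, shown here as an assumption, is a greedy pass over candidates already sorted by score that skips an item once its creator has hit the cap. `cap_per_creator` is a hypothetical helper; format_mix and MMR-based topic_diversity are not shown.

```rust
use std::collections::HashMap;

/// `ranked` holds (item_id, creator_id) pairs in descending score order.
/// Keeps at most `max_per_creator` items per creator, up to `limit` items.
fn cap_per_creator<'a>(
    ranked: &[(&'a str, &'a str)],
    max_per_creator: u32,
    limit: usize,
) -> Vec<&'a str> {
    let mut per_creator: HashMap<&str, u32> = HashMap::new();
    let mut out = Vec::with_capacity(limit);
    for &(item, creator) in ranked {
        if out.len() == limit {
            break;
        }
        let count = per_creator.entry(creator).or_insert(0);
        if *count < max_per_creator {
            *count += 1;
            out.push(item);
        }
    }
    out
}
```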

/// Sort mode override. Can be specified per-profile or per-query.
#[derive(Clone, Debug)]
pub enum Sort {
    Relevance,
    Personalized,
    New,
    Old,
    Hot,
    Trending,
    Rising,
    Controversial,
    HiddenGems,
    TopAllTime,
    TopHour,
    TopToday,
    TopWeek,
    TopMonth,
    TopYear,
    MostViewed,
    MostLiked,
    MostCommented,
    MostShared,
    Shortest,
    Longest,
    AlphabeticalAsc,
    AlphabeticalDesc,
    Shuffle,
    LiveViewerCount,
    DateSaved,
    CreatorEngagementRate,
    /// Sort by a specific metadata field.
    Field(String, SortDirection),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SortDirection {
    Asc,
    Desc,
}

Cohort Types

/// Definition of a named cohort. Passed to `db.define_cohort()`.
/// Cohorts define reusable user segments for cohort-scoped queries.
pub struct CohortDef {
    /// Cohort name. Unique globally. Lowercase alphanumeric plus underscores.
    pub name: String,
    /// Predicate that defines cohort membership.
    pub predicate: Predicate,
    /// How often cohort membership is recomputed.
    pub refresh: RefreshPolicy,
}

/// Composable predicate for cohort membership evaluation.
/// Predicates reference fields on the User entity type.
#[derive(Clone, Debug)]
pub enum Predicate {
    /// Field equals a specific value.
    Eq(String, PredicateValue),
    /// Field does not equal a specific value.
    Neq(String, PredicateValue),
    /// Numeric field is greater than a threshold.
    Gt(String, f64),
    /// Numeric field is less than a threshold.
    Lt(String, f64),
    /// Numeric field is in a range [low, high].
    Range(String, f64, f64),
    /// Keywords field contains a specific value.
    Contains(String, String),
    /// Keywords field contains any of the given values (OR).
    ContainsAny(String, Vec<String>),
    /// All child predicates must be true.
    And(Vec<Predicate>),
    /// At least one child predicate must be true.
    Or(Vec<Predicate>),
    /// Child predicate must be false.
    Not(Box<Predicate>),
}

/// Value types used in predicate comparisons.
#[derive(Clone, Debug)]
pub enum PredicateValue {
    String(String),
    I64(i64),
    F64(f64),
    Bool(bool),
}

/// How often a cohort's membership set is recomputed.
#[derive(Clone, Debug)]
pub enum RefreshPolicy {
    /// Recompute every N minutes.
    Interval { minutes: u32 },
    /// Recompute every hour.
    Hourly,
    /// Recompute every day.
    Daily,
    /// Recompute on every relevant user metadata change.
    /// More expensive but always fresh. Suitable for small cohorts
    /// defined over app-set fields.
    OnWrite,
}
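A sketch of how the background materializer might evaluate a cohort predicate against one user's fields. It uses a reduced predicate set (no Neq or ContainsAny) and a hypothetical `Value`/`User` representation. Treating a missing or mistyped field as false for leaf predicates is an assumption; define_cohort already rejects unknown fields and type mismatches at definition time.

```rust
use std::collections::HashMap;

/// Hypothetical field values on a User entity.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Str(String),
    Num(f64),
    Bool(bool),
    Keywords(Vec<String>),
}

/// Reduced restatement of the spec's Predicate enum.
#[derive(Clone, Debug)]
enum Predicate {
    Eq(String, Value),
    Gt(String, f64),
    Lt(String, f64),
    Range(String, f64, f64),
    Contains(String, String),
    And(Vec<Predicate>),
    Or(Vec<Predicate>),
    Not(Box<Predicate>),
}

type User = HashMap<String, Value>;

fn num(user: &User, field: &str) -> Option<f64> {
    match user.get(field) {
        Some(Value::Num(n)) => Some(*n),
        _ => None,
    }
}

/// Leaf predicates on a missing or mistyped field evaluate to false
/// (so Not(...) over such a leaf evaluates to true).
fn eval(p: &Predicate, user: &User) -> bool {
    match p {
        Predicate::Eq(f, v) => user.get(f) == Some(v),
        Predicate::Gt(f, t) => num(user, f).is_some_and(|n| n > *t),
        Predicate::Lt(f, t) => num(user, f).is_some_and(|n| n < *t),
        Predicate::Range(f, lo, hi) => num(user, f).is_some_and(|n| *lo <= n && n <= *hi),
        Predicate::Contains(f, s) => {
            matches!(user.get(f), Some(Value::Keywords(ks)) if ks.contains(s))
        }
        Predicate::And(ps) => ps.iter().all(|q| eval(q, user)),
        Predicate::Or(ps) => ps.iter().any(|q| eval(q, user)),
        Predicate::Not(q) => !eval(q, user),
    }
}
```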

Relationship Types

/// Definition of a relationship type. Passed to `db.define_relationship()`.
pub struct RelationshipDef {
    /// Relationship name. Unique globally.
    pub name: String,
    /// Source entity kind.
    pub from: EntityKind,
    /// Target entity kind.
    pub to: EntityKind,
    /// Default weight for new edges of this type.
    pub weight_default: f64,
    /// Optional decay for the relationship weight.
    /// None = permanent (follows, blocks).
    /// Some = weight decays toward zero over time.
    pub decay: Option<Decay>,
    /// Whether the relationship is symmetric (A->B implies B->A).
    pub symmetric: bool,
}

Error Types

/// All errors that can occur during schema operations.
#[derive(Debug)]
pub enum SchemaError {
    // -- Entity validation errors --

    /// Entity kind already has a definition.
    EntityAlreadyDefined { kind: EntityKind },
    /// Duplicate field name within an entity type.
    DuplicateFieldName { kind: EntityKind, field: String },
    /// Field name is invalid (not lowercase alphanumeric + underscores).
    InvalidFieldName { field: String, reason: String },
    /// Embedding dimensions out of range [2, 4096].
    InvalidDimensions { slot: String, dimensions: u32 },
    /// Too many embedding slots (max 4 per entity type).
    TooManyEmbeddingSlots { kind: EntityKind, count: usize },
    /// Duplicate embedding slot name within an entity type.
    DuplicateEmbeddingSlot { kind: EntityKind, slot: String },

    // -- Signal validation errors --

    /// Signal name already exists.
    SignalAlreadyDefined { name: String },
    /// Signal name is invalid.
    InvalidSignalName { name: String, reason: String },
    /// Signal targets an entity kind that has no definition.
    UndefinedTargetEntity { signal: String, target: EntityKind },
    /// Permanent-decay signal has velocity enabled (meaningless).
    PermanentWithVelocity { signal: String },
    /// Too many windows on a signal (max 8).
    TooManyWindows { signal: String, count: usize },
    /// Too many signal types per entity type (max 64).
    TooManySignals { target: EntityKind, count: usize },
    /// AllTime window specified with velocity (undefined operation).
    AllTimeWithVelocity { signal: String },
    /// Attempted to modify an immutable signal definition.
    SignalImmutable { name: String },

    // -- Profile validation errors --

    /// Profile version already exists for this name.
    ProfileVersionExists { name: String, version: u32 },
    /// Profile version is not sequential (must be > latest).
    ProfileVersionNotSequential { name: String, expected: u32, got: u32 },
    /// Profile references a signal that is not defined.
    UndefinedSignal { profile: String, signal: String },
    /// Profile references a relationship type that is not defined.
    UndefinedRelationship { profile: String, edge: String },
    /// Profile references an entity type that is not defined.
    UndefinedEntity { profile: String, entity: EntityKind },
    /// Profile candidate strategy references an embedding slot that
    /// does not exist on the target entity type.
    UndefinedEmbeddingSlot { profile: String, slot: String },
    /// Exploration budget out of range [0.0, 1.0].
    InvalidExploration { profile: String, value: f64 },
    /// Topic diversity out of range [0.0, 1.0].
    InvalidTopicDiversity { profile: String, value: f64 },
    /// Profile name is invalid.
    InvalidProfileName { name: String, reason: String },

    // -- Cohort validation errors --

    /// Cohort name already exists.
    CohortAlreadyDefined { name: String },
    /// Cohort predicate references a field not defined on User entity.
    UndefinedCohortField { cohort: String, field: String },
    /// Cohort predicate references a field with incompatible type.
    CohortFieldTypeMismatch {
        cohort: String,
        field: String,
        expected: FieldType,
        got: String,
    },
    /// Maximum number of cohorts exceeded (100).
    TooManyCohorts { count: usize },

    // -- Relationship validation errors --

    /// Relationship name already exists.
    RelationshipAlreadyDefined { name: String },
    /// Relationship references an entity kind that is not defined.
    UndefinedRelationshipEntity { relationship: String, entity: EntityKind },
    /// Default weight out of range [0.0, 1.0].
    InvalidDefaultWeight { relationship: String, weight: f64 },

    // -- Migration errors --

    /// A breaking change was attempted without using the migration API.
    MigrationRequired { description: String },
    /// Migration references objects that no longer exist.
    MigrationTargetNotFound { description: String },
    /// Migration would invalidate active profiles or cohorts.
    MigrationBreaksDependent { migration: String, dependents: Vec<String> },

    // -- Write-path errors --

    /// Attempted to write a computed field via the write API.
    ComputedFieldWrite { entity: EntityKind, field: String },
    /// Entity with this ID already exists (use update_*() instead).
    EntityExists { kind: EntityKind, id: String },
    /// Entity ID collision in BLAKE3 hash space (astronomically unlikely).
    IdCollision { id_a: String, id_b: String },

    // -- Storage errors --

    /// Schema storage operation failed.
    StorageFailure(String),
}

3. Schema Definition API

The schema definition API is the set of methods on TidalDB that declare the structure and behavior of the database. All definitions are WAL-logged for crash recovery and stored in the B-tree backend under the SCHEMA: key prefix.

3.1 Define Entity

impl TidalDB {
    /// Define an entity type's metadata fields and embedding slots.
    ///
    /// Each entity kind (Item, User, Creator) is defined exactly once.
    /// Calling define_entity for an already-defined kind returns
    /// SchemaError::EntityAlreadyDefined.
    ///
    /// After definition, entities of this kind can be written via
    /// write_item(), write_user(), or write_creator().
    pub fn define_entity(&self, def: EntityDef) -> Result<(), SchemaError>;
}

Behavior on commit:

  1. Validate field names (unique, valid characters, max length).
  2. Validate embedding slots (unique names, valid dimensions, max 4 slots).
  3. Validate field types (computed fields have valid underlying type).
  4. WAL-log the schema change (record type 0x04).
  5. Store definition in SCHEMA:entity:{kind} key.
  6. Update in-memory schema cache.
  7. Initialize indexes for all declared fields (inverted index for text fields, term dictionary for keyword fields, sorted numeric index for numeric fields, etc.).
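Steps 1 and 2 can be sketched as a pure validation function. The limits (64-character names, at most 4 slots, dimensions in [2, 4096]) come from the types above; `validate_entity`, `validate_name`, and the trimmed `ValidationError` enum are hypothetical, and rejecting empty names is an added assumption.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum ValidationError {
    InvalidFieldName { field: String, reason: String },
    DuplicateFieldName { field: String },
    InvalidDimensions { slot: String, dimensions: u32 },
    TooManyEmbeddingSlots { count: usize },
    DuplicateEmbeddingSlot { slot: String },
}

const MAX_NAME_LEN: usize = 64;
const MAX_SLOTS: usize = 4;

/// Lowercase ASCII letters, digits, and underscores; 1 to 64 chars.
fn validate_name(name: &str) -> Result<(), String> {
    if name.is_empty() {
        return Err("name is empty".to_string());
    }
    if name.len() > MAX_NAME_LEN {
        return Err(format!("longer than {MAX_NAME_LEN} chars"));
    }
    if !name.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_') {
        return Err("must be lowercase alphanumeric plus underscores".to_string());
    }
    Ok(())
}

/// `fields` are field names; `slots` are (slot name, dimensions) pairs.
fn validate_entity(fields: &[&str], slots: &[(&str, u32)]) -> Result<(), ValidationError> {
    let mut seen_fields = HashSet::new();
    for &field in fields {
        validate_name(field).map_err(|reason| ValidationError::InvalidFieldName {
            field: field.to_string(),
            reason,
        })?;
        if !seen_fields.insert(field) {
            return Err(ValidationError::DuplicateFieldName { field: field.to_string() });
        }
    }
    if slots.len() > MAX_SLOTS {
        return Err(ValidationError::TooManyEmbeddingSlots { count: slots.len() });
    }
    let mut seen_slots = HashSet::new();
    for &(slot, dimensions) in slots {
        if !seen_slots.insert(slot) {
            return Err(ValidationError::DuplicateEmbeddingSlot { slot: slot.to_string() });
        }
        if !(2..=4096).contains(&dimensions) {
            return Err(ValidationError::InvalidDimensions { slot: slot.to_string(), dimensions });
        }
    }
    Ok(())
}
```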

3.2 Define Signal

impl TidalDB {
    /// Define a signal type with its decay, windowing, and velocity behavior.
    ///
    /// Signal names are globally unique. The target entity kind must already
    /// be defined via define_entity.
    ///
    /// Signal definitions are immutable once created. Attempting to redefine
    /// an existing signal returns SchemaError::SignalImmutable.
    ///
    /// On success, existing entities of the target kind behave as if they
    /// have a zeroed signal ledger for this signal type. The ledger is
    /// materialized lazily on each entity's next signal write.
    pub fn define_signal(&self, def: SignalDef) -> Result<(), SchemaError>;
}

Behavior on commit:

  1. Validate signal name (unique, valid characters).
  2. Validate target entity kind is defined.
  3. Validate decay/window/velocity constraints (see Section 5).
  4. Precompute lambda for exponential decay and store alongside definition.
  5. WAL-log the schema change.
  6. Store definition in SCHEMA:signal:{name} key.
  7. Update in-memory schema cache (signal type registry).
  8. Register signal type index (u8) for compact storage in WAL events.
  9. Existing entities of the target kind lazily receive zeroed ledger state for this signal on their next signal write (not eagerly initialized -- this would be O(N) for 10M entities).
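A sketch of step 3 plus step 4's lambda precompute, using only the constraints visible in SchemaError: at most 8 windows, no velocity on Permanent decay, and no velocity alongside an AllTime window. The full rule set lives in Section 5; the types here are simplified stand-ins and `validate_signal` is hypothetical.

```rust
use std::time::Duration;

#[derive(Clone, Debug, PartialEq)]
enum Decay {
    Exponential { half_life: Duration },
    Linear { lifetime: Duration },
    Permanent,
}

#[derive(Clone, Debug, PartialEq, Eq)]
enum Window {
    Sliding { duration: Duration },
    AllTime,
}

#[derive(Debug, PartialEq)]
enum SignalError {
    TooManyWindows { count: usize },
    PermanentWithVelocity,
    AllTimeWithVelocity,
}

const MAX_WINDOWS: usize = 8;

/// On success returns the precomputed lambda (Some only for Exponential).
fn validate_signal(
    decay: &Decay,
    windows: &[Window],
    velocity: bool,
) -> Result<Option<f64>, SignalError> {
    if windows.len() > MAX_WINDOWS {
        return Err(SignalError::TooManyWindows { count: windows.len() });
    }
    // Velocity of a value that never decays is meaningless.
    if velocity && matches!(decay, Decay::Permanent) {
        return Err(SignalError::PermanentWithVelocity);
    }
    // Velocity over an AllTime window is undefined.
    if velocity && windows.contains(&Window::AllTime) {
        return Err(SignalError::AllTimeWithVelocity);
    }
    // Step 4: precompute lambda once, at definition time.
    Ok(match decay {
        Decay::Exponential { half_life } => Some(std::f64::consts::LN_2 / half_life.as_secs_f64()),
        Decay::Linear { .. } | Decay::Permanent => None,
    })
}
```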

3.3 Define Profile

impl TidalDB {
    /// Define a ranking profile version.
    ///
    /// Profile names are reusable -- each call creates a new version.
    /// Version numbers must be strictly increasing for a given name.
    /// The first version for a new name must be version 1.
    ///
    /// New profiles start in Draft status. Call
    /// set_profile_status(name, version, ProfileStatus::Active) to make
    /// them available for queries.
    pub fn define_profile(&self, def: ProfileDef) -> Result<(), SchemaError>;

    /// Transition a profile version's lifecycle status.
    pub fn set_profile_status(
        &self,
        name: &str,
        version: u32,
        status: ProfileStatus,
    ) -> Result<(), SchemaError>;

    /// Retrieve a profile by name. If version is None, returns the
    /// latest active version. If no active version exists, returns
    /// the latest version regardless of status.
    pub fn get_profile(
        &self,
        name: &str,
        version: Option<u32>,
    ) -> Result<ProfileDef, SchemaError>;
}

Behavior on commit:

  1. Validate profile name (valid characters).
  2. Validate version is sequential (> latest version for this name, or 1 if new).
  3. Validate all signal references exist (boost signals, gate signals, penalty signals, exclude signals).
  4. Validate all relationship references exist (boost relationships, exclude relationships, candidate edges).
  5. Validate candidate strategy (entity kind is defined, embedding slot exists, dimensions match).
  6. Validate exploration budget is in [0.0, 1.0].
  7. Validate diversity spec (topic_diversity in [0.0, 1.0] if present).
  8. WAL-log the schema change.
  9. Store definition in SCHEMA:profile:{name}:{version} key.
  10. Set initial status to Draft.
  11. Update in-memory schema cache.

3.4 Define Cohort

impl TidalDB {
    /// Define a named cohort (user segment) for cohort-scoped queries.
    ///
    /// Cohort predicates reference fields defined on the User entity type.
    /// The User entity must be defined before any cohorts can be defined.
    ///
    /// Maximum 100 cohort definitions (bounded by the cohort tracking
    /// storage budget -- see 03-signal-system.md Section 7).
    pub fn define_cohort(&self, def: CohortDef) -> Result<(), SchemaError>;
}

Behavior on commit:

  1. Validate cohort name (unique, valid characters).
  2. Validate total cohort count does not exceed 100.
  3. Validate predicate: all referenced fields exist on the User entity, types are compatible with the predicate operator.
  4. WAL-log the schema change.
  5. Store definition in SCHEMA:cohort:{name} key.
  6. Update in-memory schema cache.
  7. Schedule initial membership computation (background materializer evaluates the predicate against all existing users).

3.5 Define Relationship

impl TidalDB {
    /// Define a relationship type (edge kind) between entity types.
    ///
    /// Both source and target entity kinds must already be defined.
    /// Relationship names are globally unique.
    pub fn define_relationship(&self, def: RelationshipDef) -> Result<(), SchemaError>;
}

Behavior on commit:

  1. Validate relationship name (unique, valid characters).
  2. Validate from/to entity kinds are defined.
  3. Validate default weight is in [0.0, 1.0].
  4. If decay is specified, validate it (same rules as signal decay).
  5. WAL-log the schema change.
  6. Store definition in SCHEMA:relationship:{name} key.
  7. Update in-memory schema cache.

4. Schema Versioning

Different schema objects have different versioning semantics, reflecting the different consequences of change.

4.1 Versioning by Object Type

| Schema Object | Versioning Model | Rationale |
|---|---|---|
| Entity definitions | Append-only fields | Removing or changing a field type would invalidate indexes and break queries. |
| Signal definitions | Immutable | Changing decay invalidates all historical running scores. Lambda is baked into the O(1) formula. |
| Ranking profiles | Explicitly versioned | Profiles are the tuning knob. Multiple versions must coexist for A/B testing and rollback. |
| Cohort definitions | Mutable (predicate can change) | Cohort membership is recomputed periodically. Changing the predicate simply changes the next computation. |
| Relationship definitions | Immutable | Changing from/to entity kinds or decay would invalidate existing edges. |

4.2 Profile Version Lifecycle

Every profile version follows a four-state lifecycle:

                define_profile()
    (none) ─────────────────────────> Draft
                                        │
                  set_profile_status()  │  (validate all references)
                                        v
                                      Active
                                        │
                  set_profile_status()  │  (still queryable by
                                        │   explicit version)
                                        v
                                    Deprecated
                                        │
                  set_profile_status()  │  (no longer queryable;
                                        │   retained for audit)
                                        v
                                    Archived

/// Lifecycle status of a ranking profile version.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ProfileStatus {
    /// Newly defined. Not yet available for queries.
    /// Can be tested via explicit version: get_profile("name", Some(version)).
    Draft,
    /// Available for queries. `get_profile("name", None)` returns
    /// the latest active version.
    Active,
    /// Still queryable by explicit version, but no longer returned
    /// as the "latest" active version. Used during A/B test wind-down.
    Deprecated,
    /// No longer queryable. Retained for audit purposes only.
    /// Querying an archived profile returns SchemaError.
    Archived,
}

Status transition rules:

| Current | Allowed Next | Forbidden |
|---|---|---|
| Draft | Active | Deprecated, Archived |
| Active | Deprecated | Draft, Archived |
| Deprecated | Archived, Active (re-activation) | Draft |
| Archived | (terminal) | Any |
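The transition table can be encoded as a single match on (current, next). Any pair not listed is rejected, which also makes Archived terminal. This is a minimal sketch. The set_status helper and its String error are hypothetical illustrations, not the spec's set_profile_status signature.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ProfileStatus {
    Draft,
    Active,
    Deprecated,
    Archived,
}

/// True if `from -> to` appears in the "Allowed Next" column above.
pub fn transition_allowed(from: ProfileStatus, to: ProfileStatus) -> bool {
    use ProfileStatus::*;
    matches!(
        (from, to),
        (Draft, Active) | (Active, Deprecated) | (Deprecated, Archived) | (Deprecated, Active)
    )
}

/// Hypothetical helper: apply a transition or report why it was refused.
pub fn set_status(current: ProfileStatus, next: ProfileStatus) -> Result<ProfileStatus, String> {
    if transition_allowed(current, next) {
        Ok(next)
    } else {
        Err(format!("forbidden transition {current:?} -> {next:?}"))
    }
}
```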

Multiple active versions. Multiple versions of the same profile name can be Active simultaneously. This is intentional -- it enables A/B testing. The application decides which version to use per query by specifying the version explicitly. get_profile("for_you", None) returns the highest-versioned active version.
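Version resolution for a single profile name can be sketched over a BTreeMap keyed by version number. A BTreeMap is used here because its reverse iteration visits the highest version first. This is an illustration under that assumption, not the actual storage layout.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ProfileStatus {
    Draft,
    Active,
    Deprecated,
    Archived,
}

/// All versions of one profile name (illustrative layout).
type Versions = BTreeMap<u32, ProfileStatus>;

/// get_profile(name, None): the highest-versioned Active version.
fn latest_active(versions: &Versions) -> Option<u32> {
    versions
        .iter()
        .rev()
        .find(|(_, s)| **s == ProfileStatus::Active)
        .map(|(v, _)| *v)
}

/// get_profile(name, Some(v)): Draft, Active and Deprecated are reachable
/// by explicit version; Archived versions are not.
fn explicit(versions: &Versions, v: u32) -> Option<u32> {
    match versions.get(&v) {
        Some(ProfileStatus::Archived) | None => None,
        Some(_) => Some(v),
    }
}
```

Note that while both v1 and v2 are Active, the unversioned lookup returns v2. This is why an experiment's control arm should pin its version explicitly.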

4.3 Schema Version Counter

The database maintains a monotonically increasing schema version counter. Every define_*, add_fields, and set_profile_status call, and every applied migration, increments this counter. The counter serves as a cache invalidation epoch: query plan caches are invalidated when the schema version changes.

impl TidalDB {
    /// Returns the current schema version number.
    /// Incremented on every schema definition or modification.
    pub fn schema_version(&self) -> u64;
}
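The "cache invalidation epoch" idea can be sketched as follows. Each cached plan remembers the schema version it was built under, and a lookup treats a version mismatch as a miss. This avoids having to walk and purge the cache on every schema change. PlanCache and its string-keyed plans are hypothetical and only illustrate the technique.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical plan cache: query key -> (schema version at build time, plan).
struct PlanCache {
    entries: HashMap<String, (u64, String)>,
}

impl PlanCache {
    /// Returns the cached plan only if it was built under `current`.
    fn get(&self, query: &str, current: u64) -> Option<&str> {
        match self.entries.get(query) {
            Some((built_at, plan)) if *built_at == current => Some(plan.as_str()),
            _ => None, // missing, or built under an older schema: a miss
        }
    }

    fn put(&mut self, query: &str, current: u64, plan: String) {
        self.entries.insert(query.to_string(), (current, plan));
    }
}
```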

5. Schema Validation Rules

Every schema definition is validated at definition time. Validation is eager and complete -- a definition that passes validation is guaranteed to be self-consistent and compatible with all existing definitions.

5.1 Validation Rules Reference

| Rule ID | Object | Rule | Error |
|---|---|---|---|
| V-E01 | Entity | Entity kind can only be defined once. | EntityAlreadyDefined |
| V-E02 | Entity | Field names must be unique within an entity type. | DuplicateFieldName |
| V-E03 | Entity | Field names: lowercase [a-z0-9_], max 64 characters, must start with a letter. | InvalidFieldName |
| V-E04 | Entity | Embedding dimensions must be in [2, 4096]. | InvalidDimensions |
| V-E05 | Entity | Maximum 4 embedding slots per entity type. | TooManyEmbeddingSlots |
| V-E06 | Entity | Embedding slot names must be unique within an entity type. | DuplicateEmbeddingSlot |
| V-S01 | Signal | Signal names must be globally unique. | SignalAlreadyDefined |
| V-S02 | Signal | Signal names: lowercase [a-z0-9_], max 64 characters. | InvalidSignalName |
| V-S03 | Signal | Target entity kind must have a definition. | UndefinedTargetEntity |
| V-S04 | Signal | Permanent decay signals must have velocity: false. | PermanentWithVelocity |
| V-S05 | Signal | Maximum 8 windows per signal type. | TooManyWindows |
| V-S06 | Signal | Maximum 64 signal types per entity type. | TooManySignals |
| V-S07 | Signal | AllTime window with velocity is forbidden. | AllTimeWithVelocity |
| V-S08 | Signal | Existing signal definitions cannot be modified. | SignalImmutable |
| V-P01 | Profile | Profile name: lowercase [a-z0-9_-], max 64 characters. | InvalidProfileName |
| V-P02 | Profile | Version must be greater than the latest version for this name (or 1 if new). | ProfileVersionNotSequential |
| V-P03 | Profile | Version must not already exist for this name. | ProfileVersionExists |
| V-P04 | Profile | All boost/penalty/gate/exclude signal references must be defined signals. | UndefinedSignal |
| V-P05 | Profile | All candidate/boost/exclude relationship references must be defined relationship types. | UndefinedRelationship |
| V-P06 | Profile | Candidate entity kind must be defined. | UndefinedEntity |
| V-P07 | Profile | Candidate ANN embedding slot must exist on the target entity. | UndefinedEmbeddingSlot |
| V-P08 | Profile | Exploration must be in [0.0, 1.0]. | InvalidExploration |
| V-P09 | Profile | DiversitySpec.topic_diversity must be in [0.0, 1.0] if present. | InvalidTopicDiversity |
| V-P10 | Profile | ProfileDecay.field must be a timestamp field on the candidate entity. | UndefinedSignal (reused) |
| V-C01 | Cohort | Cohort names must be globally unique. | CohortAlreadyDefined |
| V-C02 | Cohort | Predicate fields must exist on the User entity type. | UndefinedCohortField |
| V-C03 | Cohort | Predicate field types must be compatible with the operator (Eq on keyword, Gt on numeric, Contains on keywords). | CohortFieldTypeMismatch |
| V-C04 | Cohort | Maximum 100 cohort definitions. | TooManyCohorts |
| V-R01 | Relationship | Relationship names must be globally unique. | RelationshipAlreadyDefined |
| V-R02 | Relationship | From and To entity kinds must be defined. | UndefinedRelationshipEntity |
| V-R03 | Relationship | Default weight must be in [0.0, 1.0]. | InvalidDefaultWeight |
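The three name rules (V-E03, V-S02, V-P01) differ only in their character set and in whether the first character must be a letter. The following sketch checks all three with one function. Rejecting the empty name is an assumption, since the table does not state a minimum length.

```rust
/// Which naming rule applies.
#[derive(Clone, Copy)]
enum NameRule {
    /// V-E03: [a-z0-9_], max 64, must start with a letter.
    Field,
    /// V-S02: [a-z0-9_], max 64.
    Signal,
    /// V-P01: [a-z0-9_-], max 64.
    Profile,
}

fn valid_name(name: &str, rule: NameRule) -> bool {
    // Assumption: empty names are invalid. Byte length equals character
    // count for the ASCII-only names that can pass the check below.
    if name.is_empty() || name.len() > 64 {
        return false;
    }
    let allowed = |c: char| {
        c.is_ascii_lowercase()
            || c.is_ascii_digit()
            || c == '_'
            || (matches!(rule, NameRule::Profile) && c == '-')
    };
    if !name.chars().all(allowed) {
        return false;
    }
    match rule {
        NameRule::Field => name.starts_with(|c: char| c.is_ascii_lowercase()),
        NameRule::Signal | NameRule::Profile => true,
    }
}
```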

5.2 Cross-Object Dependency Graph

Schema objects reference each other. The validation system maintains a dependency graph to prevent orphaned references and to power impact analysis during migrations.

EntityDef (Item)
    ^
    |-- SignalDef (view, target: Item)
    |       ^
    |       |-- ProfileDef (for_you, boost: view.velocity(24h))
    |       |-- ProfileDef (trending, boost: view.velocity(6h))
    |
    |-- EmbeddingSlot (content, 1536D)
            ^
            |-- ProfileDef (for_you, candidate: Ann, slot: content)

EntityDef (User)
    ^
    |-- CohortDef (young_us_jazz, predicate: And(...))
    |
    |-- Field (region)
    |       ^
    |       |-- CohortDef (us_users, predicate: Eq(region, "US"))
    |
    |-- Field (inferred_interests)
            ^
            |-- CohortDef (jazz_fans, predicate: Contains(inferred_interests, "jazz"))

RelationshipDef (follows, from: User, to: Creator)
    ^
    |-- ProfileDef (following, candidate: Relationship("follows"))

RelationshipDef (blocked)
    ^
    |-- ProfileDef (for_you, exclude: Relationship("blocked"))

Invariant: no dangling references. Every signal, profile, cohort, and relationship definition references only objects that exist at definition time. The validation engine checks all references eagerly. There are no deferred reference checks.

Invariant: no circular dependencies. Entity definitions depend on nothing. Signal and relationship definitions depend on entity definitions. Cohort definitions depend on User entity field definitions. Profile definitions depend on entity definitions (candidate kind, embedding slot, decay field), signal definitions, and relationship definitions. This is a strict DAG with no cycles.
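Impact analysis over this graph is a reachability query: everything that transitively depends on the object being changed. The following sketch runs a breadth-first search over the same (object_id, dependents) shape that DependencyGraph uses in Section 7. The object ID strings in the usage are hypothetical.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Same shape as DependencyGraph in Section 7: (object_id, direct dependents).
struct DependencyGraph {
    edges: Vec<(String, Vec<String>)>,
}

impl DependencyGraph {
    /// Every object that directly or transitively depends on `root`,
    /// found by breadth-first search; each object is reported once.
    fn transitive_dependents(&self, root: &str) -> Vec<String> {
        let index: HashMap<&str, &Vec<String>> =
            self.edges.iter().map(|(id, deps)| (id.as_str(), deps)).collect();
        let mut seen: HashSet<&str> = HashSet::from([root]);
        let mut queue: VecDeque<&str> = VecDeque::from([root]);
        let mut out = Vec::new();
        while let Some(id) = queue.pop_front() {
            if let Some(deps) = index.get(id).copied() {
                for dep in deps {
                    if seen.insert(dep.as_str()) {
                        out.push(dep.clone());
                        queue.push_back(dep.as_str());
                    }
                }
            }
        }
        out
    }
}
```

Because the graph is a DAG, the `seen` set is only needed to avoid reporting shared dependents (such as for_you below) twice.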


6. Schema Migration

6.1 Additive Changes (Always Safe)

These changes can be applied immediately via the standard define_* methods, add_fields, or set_profile_status. No migration API is required.

| Change | Method | Effect on Existing Data |
|---|---|---|
| Add new field to entity type | add_fields | Existing entities return NULL (or the field's default) for the new field. Indexes are created empty and populated by a background scan. |
| Add new signal type | define_signal | Existing entities lazily receive a zeroed signal ledger on first signal write. |
| Add new ranking profile version | define_profile | New version coexists with old versions. No effect on existing data. |
| Add new cohort definition | define_cohort | Membership computed by background materializer. No effect on existing data. |
| Add new relationship type | define_relationship | No existing edges. Edges created on first write_relationship call. |
| Activate/deprecate/archive a profile | set_profile_status | Changes which versions are queryable (Archived versions are not) and which version get_profile(name, None) returns. |

Adding fields to an entity type. This is the most common schema change. Because an entity kind can only be defined once (V-E01), new fields are added with add_fields, which takes the entity kind and the list of fields to append:

impl TidalDB {
    /// Add fields to an existing entity type definition.
    /// Only new fields are accepted -- existing fields cannot be
    /// modified or removed via this method.
    pub fn add_fields(
        &self,
        kind: EntityKind,
        fields: Vec<Field>,
    ) -> Result<(), SchemaError>;
}

After add_fields, the new fields are available for filtering, sorting, and cohort predicates. Existing entities that have not been updated return NULL for the new fields. Background index population scans existing entities and builds indexes for any non-NULL values.
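The append-only contract and the NULL read for pre-existing entities can be sketched together. This is a simplified model that assumes field names only (types and indexes are omitted). It also assumes that a batch is rejected as a whole when any name collides. The spec does not state batch atomicity, so that is an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum SchemaError {
    DuplicateFieldName(String),
}

/// Simplified entity type: field names only.
struct EntityType {
    fields: Vec<String>,
}

impl EntityType {
    /// Append-only: rejects the whole batch if any name already exists or
    /// repeats within the batch (V-E02), leaving the type unchanged.
    fn add_fields(&mut self, new: &[&str]) -> Result<(), SchemaError> {
        let mut seen: Vec<&str> = self.fields.iter().map(String::as_str).collect();
        for &name in new {
            if seen.contains(&name) {
                return Err(SchemaError::DuplicateFieldName(name.to_string()));
            }
            seen.push(name);
        }
        self.fields.extend(new.iter().map(|s| s.to_string()));
        Ok(())
    }
}

/// An entity written before the field existed has no stored value: NULL.
fn read_field<'a>(entity: &'a HashMap<String, String>, field: &str) -> Option<&'a str> {
    entity.get(field).map(String::as_str)
}
```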

6.2 Breaking Changes (Require Migration)

These changes would invalidate existing data, indexes, or references. They cannot be applied via define_* methods -- attempting to do so returns SchemaError::MigrationRequired.

| Change | Why It Breaks | Migration Requirement |
|---|---|---|
| Remove entity field | Profiles, cohorts, or sorts may reference it. Indexes must be dropped. | Verify no dependents reference the field. Drop index. |
| Change field type | Index format changes. Existing values may not be representable in the new type. | Rebuild index. Validate existing values are compatible. |
| Remove signal type | Profiles may reference it as a boost/gate/penalty/exclude. | Verify no active profiles reference the signal. Mark signal as removed. |
| Change signal decay/windows | Invalidates all historical running scores and windowed aggregates. | Cannot be done. Define a new signal type instead. |
| Remove relationship type | Profiles may reference it in candidate, boost, or exclude. | Verify no active profiles reference the relationship. Delete all edges. |
| Remove cohort definition | No schema object depends on it, but queries and applications that reference the cohort by name lose it. | Safe to remove if confirmed. |

6.3 Migration API

impl TidalDB {
    /// Analyze a proposed migration and return a plan.
    /// Does NOT apply any changes. The plan describes:
    /// - What objects are affected
    /// - What dependents reference the affected objects
    /// - Estimated cost (index rebuild time, storage impact)
    pub fn plan_migration(
        &self,
        migration: Migration,
    ) -> Result<MigrationPlan, SchemaError>;

    /// Apply a previously planned migration.
    /// The plan must have been generated by plan_migration() in the
    /// same schema version (the plan is invalidated if schema changes
    /// between planning and application).
    pub fn apply_migration(
        &self,
        plan: MigrationPlan,
    ) -> Result<(), SchemaError>;
}

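/// A migration describes one or more breaking schema changes.
/// Clone lets the application keep a copy for re-planning, since
/// plan_migration takes the Migration by value.
#[derive(Clone, Debug)]
pub struct Migration {
    /// Human-readable description.
    pub description: String,
    /// The individual operations in this migration.
    pub operations: Vec<MigrationOp>,
}

/// A single migration operation.
#[derive(Clone, Debug)]
pub enum MigrationOp {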
    /// Remove a field from an entity type.
    RemoveField { kind: EntityKind, field: String },
    /// Change a field's type (requires index rebuild + value validation).
    ChangeFieldType { kind: EntityKind, field: String, new_type: FieldType },
    /// Remove a signal type definition.
    RemoveSignal { name: String },
    /// Remove a relationship type definition and all its edges.
    RemoveRelationship { name: String },
    /// Remove a cohort definition.
    RemoveCohort { name: String },
}

/// The result of analyzing a migration.
pub struct MigrationPlan {
    /// The schema version at which this plan was generated.
    /// Plan is invalidated if schema_version changes. Private, so the
    /// application cannot edit a plan between planning and application.
    schema_version: u64,
    /// Objects that will be modified or removed.
    pub affected_objects: Vec<String>,
    /// Active profiles, cohorts, or other objects that reference
    /// the affected objects and must be updated first.
    pub blocked_by: Vec<MigrationBlocker>,
    /// Estimated cost of applying this migration.
    pub estimated_cost: MigrationCost,
}

pub struct MigrationBlocker {
    /// The dependent object (e.g., "profile:for_you:v3").
    pub object: String,
    /// Why it blocks the migration.
    pub reason: String,
}

pub struct MigrationCost {
    /// Estimated time to rebuild affected indexes.
    pub index_rebuild_time: Duration,
    /// Number of entities that need to be scanned.
    pub entities_affected: u64,
    /// Storage that will be freed.
    pub storage_freed: u64,
}
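The staleness rule in apply_migration ("the plan is invalidated if schema changes between planning and application") works like an optimistic concurrency check. The plan records the version it saw, and application fails if that version is no longer current. The following is a minimal sketch with a hypothetical Db and error enum. It also assumes that a plan with outstanding blockers is refused, which is one reading of "must be updated first".

```rust
/// Hypothetical errors for the sketch (not the spec's SchemaError variants).
#[derive(Debug, PartialEq)]
enum MigrationError {
    StalePlan { planned_at: u64, current: u64 },
    Blocked(usize),
}

/// Only the fields the staleness check needs.
struct MigrationPlan {
    schema_version: u64,
    blocked_by: Vec<String>,
}

struct Db {
    schema_version: u64,
}

impl Db {
    fn plan_migration(&self, blocked_by: Vec<String>) -> MigrationPlan {
        MigrationPlan { schema_version: self.schema_version, blocked_by }
    }

    fn apply_migration(&mut self, plan: MigrationPlan) -> Result<(), MigrationError> {
        if plan.schema_version != self.schema_version {
            return Err(MigrationError::StalePlan {
                planned_at: plan.schema_version,
                current: self.schema_version,
            });
        }
        if !plan.blocked_by.is_empty() {
            return Err(MigrationError::Blocked(plan.blocked_by.len()));
        }
        // WAL-log and apply the operations here, then bump the version,
        // which invalidates every other outstanding plan.
        self.schema_version += 1;
        Ok(())
    }
}
```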

Migration workflow:

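1. Application defines the migration. The field lives on User, because cohort predicates can only reference User fields (V-C02):
   let migration = Migration {
       description: "Remove deprecated 'flair' field from User".to_string(),
       operations: vec![MigrationOp::RemoveField {
           kind: EntityKind::User,
           field: "flair".to_string(),
       }],
   };

2. Application plans the migration (dry-run). plan_migration takes the
   Migration by value, so pass a clone and keep the original for
   re-planning in step 4 (this assumes Migration derives Clone):
   let plan = db.plan_migration(migration.clone())?;
   // plan.blocked_by holds one MigrationBlocker, e.g.
   //   object: "cohort:flair_users", reason: "predicate references field 'flair'"
   // Application must remove the cohort first.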

3. Application resolves blockers:
   db.apply_migration(db.plan_migration(Migration {
       description: "Remove flair_users cohort".to_string(),
       operations: vec![MigrationOp::RemoveCohort {
           name: "flair_users".to_string(),
       }],
   })?)?;

4. Application re-plans the original migration:
   let plan = db.plan_migration(migration)?;
   // plan.blocked_by = []  -- no more blockers

5. Application applies the migration:
   db.apply_migration(plan)?;

6.4 Migration Compatibility Matrix

This matrix shows which schema changes are additive (safe) vs breaking (require migration).

| Operation | Entity Fields | Signal Defs | Profiles | Cohorts | Relationships |
|---|---|---|---|---|---|
| Add | Safe | Safe | Safe (new version) | Safe | Safe |
| Remove | Migration | Migration | N/A (archive instead) | Migration | Migration |
| Modify type | Migration | Forbidden | N/A (new version) | Safe (predicate) | Forbidden |
| Modify behavior | N/A | Forbidden | N/A (new version) | Safe (refresh) | Forbidden |
| Rename | Migration | Forbidden | N/A (new name) | Migration | Forbidden |

"Forbidden" means the operation is not supported at all -- the application must create a new object. This applies to signal definitions and relationship definitions where the original declaration's semantics are baked into persisted data (running scores, edge weights).


7. Schema Introspection

The introspection API allows the application to discover the current schema state. All introspection methods are read-only and never touch disk: they read from the in-memory schema cache under a shared read lock (Section 10.2).

impl TidalDB {
    // -- Entity introspection --

    /// List all defined entity types with their field schemas.
    pub fn list_entities(&self) -> Vec<EntityInfo>;

    /// Describe a specific entity type.
    pub fn describe_entity(&self, kind: EntityKind) -> Result<EntityInfo, SchemaError>;

    // -- Signal introspection --

    /// List all defined signal types with their decay/window config.
    pub fn list_signals(&self) -> Vec<SignalInfo>;

    /// Describe a specific signal type.
    pub fn describe_signal(&self, name: &str) -> Result<SignalInfo, SchemaError>;

    // -- Profile introspection --

    /// List all profile names with their version history and statuses.
    pub fn list_profiles(&self) -> Vec<ProfileSummary>;

    /// Describe a specific profile version. If version is None,
    /// returns the latest active version.
    pub fn describe_profile(
        &self,
        name: &str,
        version: Option<u32>,
    ) -> Result<ProfileInfo, SchemaError>;

    // -- Cohort introspection --

    /// List all cohort definitions with their membership counts.
    pub fn list_cohorts(&self) -> Vec<CohortInfo>;

    /// Describe a specific cohort with its full predicate.
    pub fn describe_cohort(&self, name: &str) -> Result<CohortInfo, SchemaError>;

    // -- Relationship introspection --

    /// List all defined relationship types.
    pub fn list_relationships(&self) -> Vec<RelationshipInfo>;

    /// Describe a specific relationship type.
    pub fn describe_relationship(&self, name: &str) -> Result<RelationshipInfo, SchemaError>;

    // -- Global schema state --

    /// Current schema version number. Same method as in Section 4.3,
    /// repeated here for completeness (not a second definition).
    pub fn schema_version(&self) -> u64;

    /// Full dependency graph of all schema objects.
    /// Useful for understanding the impact of a proposed change.
    pub fn schema_dependencies(&self) -> DependencyGraph;
}

Introspection Return Types

/// Summary of an entity type definition.
pub struct EntityInfo {
    pub kind: EntityKind,
    pub fields: Vec<FieldInfo>,
    pub embedding_slots: Vec<EmbeddingSlotInfo>,
    /// Number of active (non-archived) entities of this kind.
    pub entity_count: u64,
    /// Number of signal types targeting this entity kind.
    pub signal_type_count: u32,
}

pub struct FieldInfo {
    pub name: String,
    pub field_type: FieldType,
    pub writability: Writability,
    /// Whether an index exists for this field.
    pub indexed: bool,
}

pub struct EmbeddingSlotInfo {
    pub name: String,
    pub dimensions: u32,
    pub source: EmbeddingSource,
    pub precision: EmbeddingPrecision,
    /// Number of entities with a non-null vector in this slot.
    pub populated_count: u64,
}

/// Summary of a signal type definition.
pub struct SignalInfo {
    pub name: String,
    pub target: EntityKind,
    pub decay: Decay,
    pub lambda: Option<f64>,
    pub windows: Vec<Window>,
    pub velocity: bool,
    pub durability: DurabilityLevel,
}

/// Summary of profile versions for a given name.
pub struct ProfileSummary {
    pub name: String,
    pub versions: Vec<ProfileVersionSummary>,
}

pub struct ProfileVersionSummary {
    pub version: u32,
    pub status: ProfileStatus,
    pub created_at: Timestamp,
}

/// Full profile definition with metrics.
pub struct ProfileInfo {
    pub definition: ProfileDef,
    pub status: ProfileStatus,
    pub created_at: Timestamp,
    /// Total queries executed with this profile version.
    pub query_count: u64,
    /// Average query latency for this profile version.
    pub avg_latency: Duration,
}

/// Summary of a cohort definition.
pub struct CohortInfo {
    pub name: String,
    pub predicate: Predicate,
    pub refresh: RefreshPolicy,
    /// Current membership count (as of last refresh).
    pub member_count: u64,
    /// When membership was last recomputed.
    pub last_refreshed: Timestamp,
}

/// Summary of a relationship type definition.
pub struct RelationshipInfo {
    pub name: String,
    pub from: EntityKind,
    pub to: EntityKind,
    pub weight_default: f64,
    pub decay: Option<Decay>,
    pub symmetric: bool,
    /// Total number of active edges of this type.
    pub edge_count: u64,
}

/// The full dependency graph of all schema objects.
pub struct DependencyGraph {
    /// Each entry is (object_id, Vec<dependent_object_ids>).
    pub edges: Vec<(String, Vec<String>)>,
}

8. Defaults and Population Priors

The database ships with sensible defaults that enable a working system before the application defines any custom profiles. These defaults are overridable -- defining a profile with the same name replaces the built-in.

8.1 Built-in Ranking Profiles

The following profiles are automatically available after entity and signal types are defined. Each is created with ProfileStatus::Active at version 0, a version number reserved for built-ins. Application-defined versions start at 1 and override the built-in.

| Profile | Candidate Strategy | Primary Signal | Sort Semantics |
|---|---|---|---|
| for_you | ANN over user preference vector, top_k=500 | preference match + engagement velocity | Personalized blend of semantic relevance and social proof |
| trending | Scan all items | view.velocity(6h) + share.velocity(6h) | Pure signal velocity, no personalization |
| rising | Scan all items | Relative velocity: velocity(1h) / velocity(24h), age-boosted | Content accelerating relative to its baseline |
| hot | Scan all items | score / (age_hours + 2)^1.8 | Hacker News-style gravity decay over cumulative engagement |
| following | Relationship: follows | N/A | created_at DESC (pure chronological) |
| related | ANN over anchor item embedding, top_k=200 | Semantic similarity + collaborative filtering | Most similar content to the anchor |
| browse | Scan all items | completion_rate * 0.4 + like_ratio * 0.3 + log(views) * 0.3 | Quality-weighted with reach tiebreaker |
| search | Hybrid text + vector, RRF(k=60) | BM25 * 0.6 + semantic_similarity * 0.4 | Relevance with quality boost |
| controversial | Scan all items | sqrt(positive_signals * negative_signals) | Maximize engagement polarity |
| hidden_gems | Scan all items | completion_rate * like_ratio / log(views + 1) | High quality, low reach |
| notification | Relationship: follows, since last_seen | interaction_weight * item_quality | Most important notifications first |
| live | Filter: status=live | interaction_weight * log(viewer_count) | Live content the user cares about |

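Override behavior. When the application defines for_you version 1, the built-in version 0 is automatically archived and the application's version takes precedence. If the application archives all versions of a profile that has a built-in, the built-in is restored as the fallback. Both transitions are performed by the database itself. They are the only exceptions to the rules in Section 4.2, under which Active to Archived is otherwise forbidden and Archived is otherwise terminal.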

8.2 Built-in Signal Types

The database does not define signal types automatically. Signal types must be explicitly defined by the application because they determine storage layout and memory budget. However, the documentation includes a recommended set of 40+ signal types (see 03-signal-system.md Section 11) that covers the common content platform use case.

8.3 Population-Level Priors

These are database-maintained values that serve as defaults for cold-start entities.

| Prior | Definition | Used For |
|---|---|---|
| Population preference vector | Centroid (mean) of all active user preference vectors. Recomputed hourly by the background materializer. | New users with no signal history. Their preference vector is initialized to this centroid. |
| Default signal baselines | Per-signal-type median values across all active items. | Cold-start exploration budget calibration: a new item's signals are compared against these baselines to estimate how much exploration is needed. |
| Global engagement distribution | Distribution of engagement_level across all users (% power_user, regular, casual, dormant, new). | Cohort-scoped queries without explicit cohort: "trending globally" uses the full distribution. |
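The population preference vector is an element-wise mean. The following sketch accumulates in f64 to limit rounding error over millions of users. It uses f32 inputs for simplicity, whereas the schema stores preference vectors as F16. If the ANN index expects unit-length vectors, the centroid would need to be re-normalized afterwards. The spec does not say, so that step is left out.

```rust
/// Element-wise mean of equal-length vectors. Returns None for an empty
/// population or if any vector has a different dimension than the first.
fn centroid(vectors: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dims = vectors.first()?.len();
    let mut sum = vec![0.0f64; dims];
    for v in vectors {
        if v.len() != dims {
            return None;
        }
        for (acc, x) in sum.iter_mut().zip(v) {
            *acc += f64::from(*x);
        }
    }
    let n = vectors.len() as f64;
    Some(sum.into_iter().map(|s| (s / n) as f32).collect())
}
```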

8.4 Cold Start Configuration

Cold start behavior is specified per ranking profile, not globally. The exploration field in ProfileDef controls how much of the result set is reserved for cold-start items.

// Profile with 10% exploration budget
ProfileDef {
    name: "for_you",
    exploration: 0.10,  // 10% of results from new/unseen content
    ..
}

Exploration budget mechanics:

  1. The query executor reserves floor(limit * exploration) slots for exploration items.
  2. Exploration candidates are items that meet ALL of:
    • Created within the last 48 hours (configurable)
    • Fewer than 1000 impressions (configurable)
    • Not hidden or blocked by the querying user
  3. Exploration candidates are ranked by a simplified score: content_similarity * freshness_bonus. No signal-based scoring is applied, because with fewer than 1000 impressions a candidate's signals are too sparse to score reliably.
  4. Exploration slots are distributed evenly through the result set (not clustered at the end).
  5. Once an item crosses either eligibility threshold (age or impression count), it exits the exploration pool and competes normally.
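Steps 1, 2 and 4 reduce to a few lines of arithmetic. The following sketch uses the default thresholds from step 2. For step 4, it places slot i at the midpoint of the i-th of k equal-width buckets, which is one way to "distribute evenly"; the spec does not fix the exact placement.

```rust
/// Step 1: floor(limit * exploration). V-P08 already guarantees the range.
fn exploration_slots(limit: usize, exploration: f64) -> usize {
    debug_assert!((0.0..=1.0).contains(&exploration));
    (limit as f64 * exploration).floor() as usize
}

/// Step 2 with the default thresholds (48 hours, 1000 impressions).
fn is_exploration_candidate(age_hours: f64, impressions: u64, hidden_or_blocked: bool) -> bool {
    age_hours <= 48.0 && impressions < 1000 && !hidden_or_blocked
}

/// Step 4: 0-based result positions for `k` exploration slots in a page
/// of `limit` results, one per equal-width bucket (bucket midpoint).
fn exploration_positions(limit: usize, k: usize) -> Vec<usize> {
    (0..k).map(|i| (2 * i + 1) * limit / (2 * k)).collect()
}
```

For a page of 20 results with a 10% budget, this yields 2 slots at positions 5 and 15.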

9. A/B Testing Support

tidalDB supports A/B testing of ranking profiles through the profile versioning system. The database does not perform traffic splitting -- that is application logic. The database provides the infrastructure: multiple active profile versions, per-version metrics, and deterministic query execution.

9.1 How A/B Testing Works

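// The application maintains its own traffic split logic.
// Pin both arms to explicit versions: while v1 and v2 are both Active,
// get_profile("for_you", None) resolves to the highest active version (v2),
// so an unversioned control arm would silently receive v2.
let version = if user_in_experiment_bucket(user_id) {
    2 // treatment
} else {
    1 // control
};
let profile = db.get_profile("for_you", Some(version))?;

let results = db.retrieve(Retrieve {
    for_user: Some(user_id),
    profile,
    ..
})?;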
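user_in_experiment_bucket is application code, and the database deliberately stays out of it. A common approach is sticky hash bucketing, sketched below. Its signature differs from the one-argument call above: the experiment-name salt and percentage parameter are hypothetical additions. FNV-1a is used because its output is fixed by its definition, whereas std's DefaultHasher makes no stability guarantee across Rust releases.

```rust
/// 64-bit FNV-1a: deterministic across processes and compiler versions.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Sticky assignment: a given (experiment, user) pair always lands in the
/// same bucket. Salting with the experiment name keeps splits for
/// different experiments independent of each other.
fn user_in_experiment_bucket(experiment: &str, user_id: u64, treatment_percent: u64) -> bool {
    let mut key = Vec::with_capacity(experiment.len() + 8);
    key.extend_from_slice(experiment.as_bytes());
    key.extend_from_slice(&user_id.to_be_bytes());
    fnv1a64(&key) % 100 < treatment_percent
}
```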

9.2 Profile Metrics

The database tracks per-profile-version metrics automatically:

pub struct ProfileMetrics {
    /// Total queries executed with this profile version.
    pub query_count: u64,
    /// Latency percentiles (p50, p95, p99).
    pub latency_p50: Duration,
    pub latency_p95: Duration,
    pub latency_p99: Duration,
    /// Average number of candidates scored per query.
    pub avg_candidates_scored: f64,
    /// Average number of results returned per query.
    pub avg_results_returned: f64,
    /// When the first query was executed with this version.
    pub first_query_at: Option<Timestamp>,
    /// When the most recent query was executed.
    pub last_query_at: Option<Timestamp>,
}

impl TidalDB {
    /// Retrieve metrics for a specific profile version.
    pub fn profile_metrics(
        &self,
        name: &str,
        version: u32,
    ) -> Result<ProfileMetrics, SchemaError>;
}

These metrics help the application decide when to promote a new version to Active and deprecate the old one. The database does not make this decision -- it only provides the data.
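The spec does not say how latency_p50/p95/p99 are computed. The following sketch shows the nearest-rank definition over raw samples, with integer rank arithmetic so that p95 of 100 samples is exactly the 95th. Keeping every sample is unbounded in memory. A production implementation would more likely maintain a fixed-size histogram per profile version, so treat this only as a definition of the statistic.

```rust
use std::time::Duration;

/// Nearest-rank percentile: the smallest sample such that at least p% of
/// samples are <= it. `p` is an integer percent in 1..=100.
fn percentile(samples: &mut [Duration], p: usize) -> Option<Duration> {
    if samples.is_empty() || p == 0 || p > 100 {
        return None;
    }
    samples.sort_unstable();
    let rank = (p * samples.len() + 99) / 100; // ceil(p * n / 100), 1-based
    Some(samples[rank - 1])
}
```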

9.3 What the Database Does NOT Do

  • Traffic splitting. The application decides which user sees which profile.
  • Statistical significance testing. The application runs its own hypothesis tests.
  • Automatic promotion. The application calls set_profile_status explicitly.
  • Metric comparison. The application queries profile_metrics for each version and compares.

This is a deliberate design choice. Traffic splitting and experimentation are application-domain concerns with complex requirements (random assignment, sticky bucketing, interaction effects, ramp-up schedules) that vary wildly across organizations. The database provides the building blocks; the application provides the logic.


10. Schema Storage

10.1 Storage Format

Schema definitions are stored in the B-tree backend (redb) under the SCHEMA: key prefix. This is the same backend used for entity metadata and materialized views -- read-heavy, rarely written.

Key Encoding:

SCHEMA:entity:{kind}              -> serialized EntityDef
SCHEMA:signal:{name}              -> serialized SignalDef + precomputed lambda
SCHEMA:profile:{name}:{version}   -> serialized ProfileDef + status + metadata
SCHEMA:cohort:{name}              -> serialized CohortDef + membership bitmap ref
SCHEMA:relationship:{name}        -> serialized RelationshipDef
SCHEMA:version                    -> u64 schema version counter
SCHEMA:metrics:profile:{name}:{v} -> serialized ProfileMetrics
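The key layout maps directly to string builders. Because V-S02 and V-P01 exclude ':' from signal and profile names, the delimiter cannot be forged by a name. The prefix helper is a hypothetical addition that shows why its trailing ':' matters: it keeps a scan for "for_you" from also matching "for_you_v2". One design consequence is that decimal versions sort lexicographically in the B-tree ("10" before "2"). A prefix scan therefore returns versions in string order, and callers must sort numerically.

```rust
/// Builders for the SCHEMA: key layout (entity kind given as a string here).
fn entity_key(kind: &str) -> String {
    format!("SCHEMA:entity:{kind}")
}

fn signal_key(name: &str) -> String {
    format!("SCHEMA:signal:{name}")
}

fn profile_key(name: &str, version: u32) -> String {
    format!("SCHEMA:profile:{name}:{version}")
}

fn profile_metrics_key(name: &str, version: u32) -> String {
    format!("SCHEMA:metrics:profile:{name}:{version}")
}

/// Prefix covering every version of one profile name.
fn profile_prefix(name: &str) -> String {
    format!("SCHEMA:profile:{name}:")
}
```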

10.2 In-Memory Schema Cache

On database open, all SCHEMA:* keys are loaded into an in-memory cache. The cache provides O(1) access to any schema object. All validation and introspection reads come from the cache, never from disk.

/// In-memory representation of the complete schema.
/// Loaded once at startup. Updated atomically on define_*() calls.
pub(crate) struct SchemaCache {
    /// Entity definitions by kind.
    entities: HashMap<EntityKind, EntityDef>,
    /// Signal definitions by name.
    signals: HashMap<String, SignalDef>,
    /// Signal type index: maps signal name to compact u8 index
    /// used in WAL events and hot-tier state.
    signal_type_ids: HashMap<String, u8>,
    /// Profile definitions by (name, version).
    profiles: HashMap<(String, u32), (ProfileDef, ProfileStatus)>,
    /// Cohort definitions by name.
    cohorts: HashMap<String, CohortDef>,
    /// Relationship definitions by name.
    relationships: HashMap<String, RelationshipDef>,
    /// Dependency graph for migration impact analysis.
    dependencies: DependencyGraph,
    /// Schema version counter.
    version: AtomicU64,
}

Cache invalidation. When a define_* method succeeds:

  1. The SchemaChange record is WAL-logged (Section 10.3).
  2. The new definition is written to the B-tree backend.
  3. The schema cache is updated with the new definition.
  4. The schema version counter is incremented (atomic).
  5. Query plan caches that reference the old schema version are invalidated.

The cache update is performed under a RwLock: write-locked during mutation, read-locked during validation and introspection. Schema mutations are rare in production (minutes to hours apart), so write-lock contention is negligible. Read-lock acquisition is uncontended in the common case and therefore cheap.
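The ordering above (persist first, then mutate the cache and bump the version under the write lock) can be sketched with std's RwLock and AtomicU64. The following is a cut-down model, not SchemaCache itself. Definitions are plain strings, and the persist closure stands in for the WAL append and B-tree write. Poisoned locks are recovered rather than unwrapped, in keeping with the project's clippy::unwrap_used = deny rule; a real implementation might surface an error instead.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Cut-down schema cache: object id -> serialized definition.
struct SchemaCache {
    defs: RwLock<HashMap<String, String>>,
    version: AtomicU64,
}

impl SchemaCache {
    /// Validation and introspection take the shared read lock.
    fn get(&self, id: &str) -> Option<String> {
        let defs = self.defs.read().unwrap_or_else(|e| e.into_inner());
        defs.get(id).cloned()
    }

    /// `persist` stands in for the WAL append and B-tree write. The cache
    /// is touched only after it succeeds. Returns the new schema version.
    fn define<E>(
        &self,
        id: &str,
        def: &str,
        persist: impl FnOnce() -> Result<(), E>,
    ) -> Result<u64, E> {
        persist()?;
        let mut defs = self.defs.write().unwrap_or_else(|e| e.into_inner());
        defs.insert(id.to_string(), def.to_string());
        // Bump the version before the write lock is released, so a reader
        // that takes the read lock afterwards sees both changes.
        Ok(self.version.fetch_add(1, Ordering::AcqRel) + 1)
    }
}
```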

10.3 WAL Logging

Every schema change is WAL-logged as a SchemaChange record (type 0x04) before the B-tree write occurs. This ensures crash recovery can replay schema changes and restore the schema to a consistent state.

SchemaChange WAL Record Payload:

+----------+-------+-----------------------------+
| Op Type  | Name  | Serialized Definition       |
| 1 byte   | var   | var                         |
+----------+-------+-----------------------------+

Op Types:
  0x01 = DefineEntity
  0x02 = DefineSignal
  0x03 = DefineProfile
  0x04 = DefineCohort
  0x05 = DefineRelationship
  0x06 = SetProfileStatus
  0x07 = AddFields
  0x08 = ApplyMigration

Recovery sequence. On crash recovery, SchemaChange records are replayed in sequence order. The entity store, signal ledger, and other subsystems are not updated until schema recovery completes -- they depend on having a consistent schema to validate incoming replayed events.
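The payload diagram marks Name and Serialized Definition as variable-length but does not say how the boundary between them is encoded. The following sketch makes two assumptions, neither of which is part of the spec. First, a 2-byte little-endian length prefix precedes the name. Second, the definition runs to the end of the payload, which requires the enclosing WAL record to carry the total length. Decoding rejects unknown op types and truncated payloads instead of panicking, which matters during crash recovery.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
enum SchemaOp {
    DefineEntity = 0x01,
    DefineSignal = 0x02,
    DefineProfile = 0x03,
    DefineCohort = 0x04,
    DefineRelationship = 0x05,
    SetProfileStatus = 0x06,
    AddFields = 0x07,
    ApplyMigration = 0x08,
}

impl SchemaOp {
    fn from_byte(b: u8) -> Option<Self> {
        use SchemaOp::*;
        Some(match b {
            0x01 => DefineEntity,
            0x02 => DefineSignal,
            0x03 => DefineProfile,
            0x04 => DefineCohort,
            0x05 => DefineRelationship,
            0x06 => SetProfileStatus,
            0x07 => AddFields,
            0x08 => ApplyMigration,
            _ => return None,
        })
    }
}

/// [op: u8][name_len: u16 LE][name bytes][definition bytes...]
fn encode(op: SchemaOp, name: &str, definition: &[u8]) -> Option<Vec<u8>> {
    let name_len = u16::try_from(name.len()).ok()?;
    let mut out = Vec::with_capacity(3 + name.len() + definition.len());
    out.push(op as u8);
    out.extend_from_slice(&name_len.to_le_bytes());
    out.extend_from_slice(name.as_bytes());
    out.extend_from_slice(definition);
    Some(out)
}

/// Inverse of `encode`. Returns None for unknown ops, truncation, or non-UTF-8 names.
fn decode(payload: &[u8]) -> Option<(SchemaOp, &str, &[u8])> {
    let (&op_byte, rest) = payload.split_first()?;
    let op = SchemaOp::from_byte(op_byte)?;
    if rest.len() < 2 {
        return None;
    }
    let (len_bytes, rest) = rest.split_at(2);
    let name_len = usize::from(u16::from_le_bytes([len_bytes[0], len_bytes[1]]));
    if rest.len() < name_len {
        return None;
    }
    let (name, definition) = rest.split_at(name_len);
    Some((op, std::str::from_utf8(name).ok()?, definition))
}
```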


11. Example: Video Platform Schema

A complete schema definition for a video streaming platform, demonstrating all five object types. This example produces a working database that supports all 14 use cases from USE_CASES.md.

use tidaldb::{TidalDB, Config};
use tidaldb::schema::*;
use std::time::Duration;

fn define_video_platform_schema(db: &TidalDB) -> Result<(), SchemaError> {

    // =====================================================================
    // 1. ENTITY TYPES
    // =====================================================================

    db.define_entity(EntityDef {
        kind: EntityKind::Item,
        metadata_fields: vec![
            // Text fields (BM25 full-text indexed)
            Field::text("title"),
            Field::text("description"),
            // Keyword fields (exact match, filterable)
            Field::keyword("category"),
            Field::keywords("tags"),
            Field::keyword("format"),          // video, short, live, podcast
            Field::keyword("language"),
            Field::keyword("content_rating"),   // G, PG, PG-13, R
            Field::keyword("status"),           // published, live, scheduled
            Field::keyword("availability"),     // free, premium
            // Numeric
            Field::i64("award_count"),
            // Boolean
            Field::bool("has_subtitles"),
            Field::bool("downloadable"),
            Field::bool("safe_search"),
            // Duration
            Field::duration("duration"),
            // Timestamps
            Field::timestamp("created_at"),
            Field::timestamp("updated_at"),
        ],
        embedding: EmbeddingDef {
            slots: vec![
                EmbeddingSlot {
                    name: "content".to_string(),
                    dimensions: 1536,
                    source: EmbeddingSource::External,
                    precision: EmbeddingPrecision::F16,
                },
            ],
        },
    })?;

    db.define_entity(EntityDef {
        kind: EntityKind::User,
        metadata_fields: vec![
            // Application-set
            Field::keyword("locale"),
            Field::keyword("language"),
            Field::keyword("region"),
            Field::keyword("age_range"),
            Field::keyword("account_type"),
            Field::keywords("explicit_interests"),
            // Database-computed
            Field::computed("inferred_interests", FieldType::Keywords),
            Field::computed("engagement_level", FieldType::Keyword),
            Field::computed("content_format_preference", FieldType::Keyword),
            Field::computed("platform_tenure_days", FieldType::I64),
            Field::computed("followed_creator_count", FieldType::I64),
        ],
        embedding: EmbeddingDef {
            slots: vec![
                EmbeddingSlot {
                    name: "preference".to_string(),
                    dimensions: 1536,
                    source: EmbeddingSource::DatabaseManaged,
                    precision: EmbeddingPrecision::F16,
                },
            ],
        },
    })?;

    db.define_entity(EntityDef {
        kind: EntityKind::Creator,
        metadata_fields: vec![
            Field::text("name"),
            Field::keyword("handle"),
            Field::keyword("language"),
            Field::keyword("region"),
            Field::keywords("categories"),
            Field::bool("verified"),
            // Database-computed
            Field::computed("follower_count", FieldType::I64),
            Field::computed("total_items", FieldType::I64),
            Field::computed("avg_engagement_rate", FieldType::F64),
        ],
        embedding: EmbeddingDef {
            slots: vec![
                EmbeddingSlot {
                    name: "catalog".to_string(),
                    dimensions: 1536,
                    source: EmbeddingSource::DatabaseManaged,
                    precision: EmbeddingPrecision::F16,
                },
            ],
        },
    })?;

    // =====================================================================
    // 2. SIGNAL TYPES
    // =====================================================================

    // -- Positive engagement signals --

    db.define_signal(SignalDef {
        name: "view".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(7 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(6),   // read by the "trending" profile
            Window::hours(24),
            Window::days(7),
            Window::days(30),
            Window::all_time(),
        ],
        velocity: true,
        durability: None, // default: Batched
    })?;

    db.define_signal(SignalDef {
        name: "like".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(7 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(24),
            Window::days(7),
            Window::all_time(),
        ],
        velocity: true,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "share".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(3 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(6),   // read by the "trending" profile
            Window::hours(24),
            Window::days(7),
        ],
        velocity: true,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "comment".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(3 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(24),
            Window::days(7),
            Window::all_time(),
        ],
        velocity: true,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "save".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(7 * 86400) },
        windows: vec![Window::hours(24), Window::days(7), Window::all_time()],
        velocity: false,
        durability: None,
    })?;

    // -- Quality signals --

    db.define_signal(SignalDef {
        name: "completion".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(30 * 86400) },
        windows: vec![Window::all_time()],
        velocity: false,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "dwell_time".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(3 * 86400) },
        windows: vec![Window::hours(24), Window::days(7)],
        velocity: false,
        durability: Some(DurabilityLevel::Eventual),
    })?;

    db.define_signal(SignalDef {
        name: "impression".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(86400) },
        windows: vec![Window::hours(1), Window::hours(24)],
        velocity: false,
        durability: Some(DurabilityLevel::Eventual),
    })?;

    // -- Negative engagement signals --

    db.define_signal(SignalDef {
        name: "skip".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(86400) },
        windows: vec![Window::hours(1), Window::hours(24)],
        velocity: false,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "hide".to_string(),
        target: EntityKind::Item,
        decay: Decay::Permanent,
        windows: vec![],
        velocity: false,
        durability: Some(DurabilityLevel::Immediate),
    })?;

    db.define_signal(SignalDef {
        name: "dislike".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(7 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(24),
            Window::days(7),
            Window::all_time(),
        ],
        velocity: true,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "report".to_string(),
        target: EntityKind::Item,
        decay: Decay::Permanent,
        windows: vec![Window::all_time()],
        velocity: false,
        durability: Some(DurabilityLevel::Immediate),
    })?;

    // =====================================================================
    // 3. RELATIONSHIP TYPES
    // =====================================================================

    db.define_relationship(RelationshipDef {
        name: "follows".to_string(),
        from: EntityKind::User,
        to: EntityKind::Creator,
        weight_default: 1.0,
        decay: None,
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "blocked".to_string(),
        from: EntityKind::User,
        to: EntityKind::Creator,
        weight_default: 1.0,
        decay: None,
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "muted".to_string(),
        from: EntityKind::User,
        to: EntityKind::Creator,
        weight_default: 1.0,
        decay: None,
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "saved".to_string(),
        from: EntityKind::User,
        to: EntityKind::Item,
        weight_default: 1.0,
        decay: None,
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "interaction_weight".to_string(),
        from: EntityKind::User,
        to: EntityKind::Creator,
        weight_default: 0.0,
        decay: Some(Decay::Exponential {
            half_life: Duration::from_secs(30 * 86400),
        }),
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "similarity".to_string(),
        from: EntityKind::Item,
        to: EntityKind::Item,
        weight_default: 0.0,
        decay: None, // recomputed periodically, not decayed
        symmetric: true,
    })?;

    // =====================================================================
    // 4. RANKING PROFILES
    // =====================================================================

    // -- Personalized feed --
    db.define_profile(ProfileDef {
        name: "for_you".to_string(),
        version: 1,
        candidate: Candidate::Ann {
            query_vector: VectorSource::UserPreference,
            index: EntityKind::Item,
            embedding_slot: Some("content".to_string()),
            top_k: 500,
        },
        boosts: vec![
            Boost::signal("view", Window::hours(24), SignalMode::Velocity, 0.3),
            Boost::relationship("interaction_weight", 0.2),
            Boost::social_proof(0.15),
        ],
        decay: Some(ProfileDecay {
            field: "created_at".to_string(),
            half_life: Duration::from_secs(48 * 3600),
        }),
        gates: vec![
            Gate::min("completion", Window::all_time(), 0.3),
        ],
        penalties: vec![
            Penalty::signal("skip", Window::hours(24), -0.5),
        ],
        excludes: vec![
            Exclude::signal("hide"),
            Exclude::relationship("blocked"),
        ],
        diversity: Some(DiversitySpec {
            max_per_creator: Some(2),
            format_mix: true,
            topic_diversity: None,
        }),
        exploration: 0.10,
        sort: None,
    })?;
    db.set_profile_status("for_you", 1, ProfileStatus::Active)?;

    // -- Trending --
    db.define_profile(ProfileDef {
        name: "trending".to_string(),
        version: 1,
        candidate: Candidate::Scan { entity: EntityKind::Item },
        boosts: vec![
            Boost::signal("share", Window::hours(6), SignalMode::Velocity, 0.5),
            Boost::signal("view", Window::hours(6), SignalMode::Velocity, 0.3),
            Boost::signal("view", Window::hours(24), SignalMode::UniqueRatio, 0.2),
        ],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: Some(DiversitySpec {
            max_per_creator: Some(1),
            format_mix: false,
            topic_diversity: None,
        }),
        exploration: 0.0,
        sort: None,
    })?;
    db.set_profile_status("trending", 1, ProfileStatus::Active)?;

    // -- Following feed --
    db.define_profile(ProfileDef {
        name: "following".to_string(),
        version: 1,
        candidate: Candidate::Relationship { edge: "follows".to_string() },
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![
            Exclude::relationship("blocked"),
        ],
        diversity: None,
        exploration: 0.0,
        sort: Some(Sort::New),
    })?;
    db.set_profile_status("following", 1, ProfileStatus::Active)?;

    // -- Search --
    db.define_profile(ProfileDef {
        name: "search".to_string(),
        version: 1,
        candidate: Candidate::Hybrid {
            text_weight: 0.6,
            vector_weight: 0.4,
            fusion: Fusion::Rrf { k: 60 },
        },
        boosts: vec![
            Boost::signal("completion", Window::all_time(), SignalMode::Value, 0.15),
            Boost::signal("like", Window::all_time(), SignalMode::Ratio, 0.10),
        ],
        decay: Some(ProfileDecay {
            field: "created_at".to_string(),
            half_life: Duration::from_secs(90 * 86400),
        }),
        gates: vec![],
        penalties: vec![],
        excludes: vec![
            Exclude::signal("hide"),
            Exclude::relationship("blocked"),
        ],
        diversity: Some(DiversitySpec {
            max_per_creator: Some(2),
            format_mix: false,
            topic_diversity: None,
        }),
        exploration: 0.0,
        sort: None,
    })?;
    db.set_profile_status("search", 1, ProfileStatus::Active)?;

    // -- Hidden gems --
    db.define_profile(ProfileDef {
        name: "hidden_gems".to_string(),
        version: 1,
        candidate: Candidate::Scan { entity: EntityKind::Item },
        boosts: vec![
            Boost::signal("completion", Window::all_time(), SignalMode::Value, 0.4),
            Boost::signal("like", Window::all_time(), SignalMode::Ratio, 0.3),
        ],
        decay: Some(ProfileDecay {
            field: "created_at".to_string(),
            half_life: Duration::from_secs(30 * 86400),
        }),
        gates: vec![
            Gate::min("completion", Window::all_time(), 0.6),
            Gate::min("view", Window::all_time(), 10.0),
        ],
        penalties: vec![
            // Penalize high-reach content (inverse reach scoring)
            Penalty::signal("view", Window::all_time(), -0.3),
        ],
        excludes: vec![
            Exclude::signal("hide"),
            Exclude::relationship("blocked"),
        ],
        diversity: Some(DiversitySpec {
            max_per_creator: Some(1),
            format_mix: true,
            topic_diversity: Some(0.7),
        }),
        exploration: 0.0,
        sort: None,
    })?;
    db.set_profile_status("hidden_gems", 1, ProfileStatus::Active)?;

    // =====================================================================
    // 5. COHORT DEFINITIONS
    // =====================================================================

    db.define_cohort(CohortDef {
        name: "us_young_jazz".to_string(),
        predicate: Predicate::And(vec![
            Predicate::Eq("region".to_string(), PredicateValue::String("US".to_string())),
            Predicate::Eq("age_range".to_string(), PredicateValue::String("18-24".to_string())),
            Predicate::Or(vec![
                Predicate::Contains("explicit_interests".to_string(), "jazz".to_string()),
                Predicate::Contains("inferred_interests".to_string(), "jazz".to_string()),
            ]),
        ]),
        refresh: RefreshPolicy::Hourly,
    })?;

    db.define_cohort(CohortDef {
        name: "power_users".to_string(),
        predicate: Predicate::Eq(
            "engagement_level".to_string(),
            PredicateValue::String("power_user".to_string()),
        ),
        refresh: RefreshPolicy::Hourly,
    })?;

    db.define_cohort(CohortDef {
        name: "new_users".to_string(),
        predicate: Predicate::And(vec![
            Predicate::Eq(
                "engagement_level".to_string(),
                PredicateValue::String("new".to_string()),
            ),
            Predicate::Lt("platform_tenure_days".to_string(), 30.0),
        ]),
        refresh: RefreshPolicy::Hourly,
    })?;

    Ok(())
}
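
The search profile above fuses BM25 and vector results with Fusion::Rrf { k: 60 } and weights of 0.6 and 0.4. The following is a sketch of weighted reciprocal rank fusion, assuming each retriever's weight scales its 1 / (k + rank) term with 1-based ranks. This spec does not say exactly how text_weight and vector_weight combine with RRF, so weighted_rrf is a hypothetical illustration rather than the engine's scoring code.

```rust
use std::collections::HashMap;

/// Weighted reciprocal rank fusion over ranked lists of item IDs.
/// Each list contributes `weight / (k + rank)` per item, with 1-based ranks;
/// an item missing from a list receives nothing from that list.
fn weighted_rrf(lists: &[(&[&str], f64)], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (ranked, weight) in lists {
        for (i, id) in ranked.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += weight / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID so output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

With k = 60 adjacent ranks differ only slightly, so an item that both retrievers rank well tends to beat one that tops a single list.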

What this schema enables:

After defining this schema, the application can execute all of these queries without any additional configuration:

// Personalized For You feed
db.retrieve(Retrieve { profile: "for_you", for_user: Some("user_123"), ..Default::default() })?;

// Global trending
db.retrieve(Retrieve { profile: "trending", ..Default::default() })?;

// Trending in jazz category
db.retrieve(Retrieve {
    profile: "trending",
    filters: vec![Filter::eq("category", "jazz")],
    ..Default::default()
})?;

// Trending among US users aged 18-24 who like jazz
db.retrieve(Retrieve {
    profile: "trending",
    for_cohort: Some("us_young_jazz"),
    ..Default::default()
})?;

// Following feed (chronological)
db.retrieve(Retrieve {
    profile: "following",
    for_user: Some("user_123"),
    ..Default::default()
})?;

// Search with hybrid text + vector
// (query_embedding is the application's embedding of the query text)
db.search(Search {
    query: "jazz piano tutorial",
    vector: Some(&query_embedding),
    profile: "search",
    for_user: Some("user_123"),
    ..Default::default()
})?;

// Hidden gems in the last 30 days
db.retrieve(Retrieve {
    profile: "hidden_gems",
    filters: vec![Filter::created_within(Duration::from_secs(30 * 86400))],
    ..Default::default()
})?;
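
The for_cohort query relies on evaluating the cohort's predicate against each user's fields when the cohort refreshes. Below is a minimal sketch of evaluating a tree shaped like us_young_jazz. Pred, Value, and eval are simplified stand-ins for the spec's Predicate and PredicateValue, and treating a missing or mistyped field as non-matching is an assumption.

```rust
use std::collections::HashMap;

/// Simplified user field value (stand-in for keyword, keywords, and numeric fields).
#[derive(Debug)]
enum Value {
    Str(String),
    List(Vec<String>),
    Num(f64),
}

/// Simplified predicate tree mirroring the shape of the spec's Predicate.
enum Pred {
    And(Vec<Pred>),
    Or(Vec<Pred>),
    Eq(String, String),
    Contains(String, String),
    Lt(String, f64),
}

/// Evaluate a predicate against one user's fields. A missing field or a
/// type mismatch makes the leaf false rather than raising an error.
fn eval(p: &Pred, user: &HashMap<String, Value>) -> bool {
    match p {
        Pred::And(ps) => ps.iter().all(|q| eval(q, user)),
        Pred::Or(ps) => ps.iter().any(|q| eval(q, user)),
        Pred::Eq(f, want) => matches!(user.get(f), Some(Value::Str(s)) if s == want),
        Pred::Contains(f, item) => matches!(user.get(f), Some(Value::List(xs)) if xs.contains(item)),
        Pred::Lt(f, bound) => matches!(user.get(f), Some(Value::Num(n)) if n < bound),
    }
}
```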

12. Invariants and Correctness Guarantees

These invariants must hold at all times. They are encoded as property tests, assertions, and crash recovery tests.

Schema Integrity Invariants

INV-SCH-1: No dangling references. Every signal, profile, cohort, and relationship definition references only objects that exist at the time of definition. Formally: for every reference R in a schema object O, the referenced object exists in the schema when O is defined. No lazy or deferred reference resolution.
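
Here is a sketch of definition-time reference checking for a profile's signal boosts. It also assumes, as the per-signal window lists in the example suggest, that a profile may only read windows its signal declares. Registry, RefError, and check_refs are hypothetical names, and windows are reduced to their length in seconds.

```rust
use std::collections::HashMap;

/// Minimal stand-in for the schema registry: signal name -> declared window lengths (seconds).
struct Registry {
    signals: HashMap<String, Vec<u64>>,
}

#[derive(Debug, PartialEq)]
enum RefError {
    UndefinedSignal(String),
    UndeclaredWindow { signal: String, window_secs: u64 },
}

impl Registry {
    /// Check every (signal, window) reference a profile makes before it is
    /// accepted. Resolution happens now, against the current schema, never lazily.
    fn check_refs(&self, refs: &[(&str, u64)]) -> Result<(), RefError> {
        for &(signal, window_secs) in refs {
            let windows = self
                .signals
                .get(signal)
                .ok_or_else(|| RefError::UndefinedSignal(signal.to_string()))?;
            if !windows.contains(&window_secs) {
                return Err(RefError::UndeclaredWindow { signal: signal.to_string(), window_secs });
            }
        }
        Ok(())
    }
}
```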

INV-SCH-2: No orphaned dependents. A schema object referenced by another schema object cannot be removed unless the referencing object is removed first. The migration API enforces this via the blocked_by field in MigrationPlan.
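
One way to compute blocked_by is to scan each object's outgoing references for the removal target. DepGraph and its methods are hypothetical. The spec's MigrationPlan also reports affected objects and estimated cost, which this sketch omits.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Schema dependency graph: object name -> names of objects it references.
/// For example, the profile "for_you" references the signals "view" and "skip".
struct DepGraph {
    refs: BTreeMap<String, BTreeSet<String>>,
}

impl DepGraph {
    /// Every other object that still references `target`. A non-empty result
    /// blocks the removal (INV-SCH-2); an empty result means it is safe.
    fn blockers(&self, target: &str) -> Vec<String> {
        self.refs
            .iter()
            .filter(|(name, deps)| name.as_str() != target && deps.contains(target))
            .map(|(name, _)| name.clone())
            .collect()
    }

    /// Remove `target` only if nothing references it; otherwise return the blockers.
    fn remove(&mut self, target: &str) -> Result<(), Vec<String>> {
        let blocked_by = self.blockers(target);
        if !blocked_by.is_empty() {
            return Err(blocked_by);
        }
        self.refs.remove(target);
        Ok(())
    }
}
```

Removing a dependent first (here, a profile) clears it from the scan, after which the signal it referenced may become removable.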

INV-SCH-3: Signal immutability. Once a signal definition is committed, its name, target, decay, windows, and velocity fields cannot be changed. Any attempt returns SchemaError::SignalImmutable.
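
The following sketches the redefinition check behind SchemaError::SignalImmutable, using a simplified SignalDef whose decay is reduced to an optional half-life. The local SchemaError, SignalDef, and check_redefinition are stand-ins for illustration. Durability is left changeable only because INV-SCH-3 does not list it.

```rust
use std::time::Duration;

/// Simplified signal definition: the fields INV-SCH-3 freezes, plus durability.
#[derive(Debug, Clone, PartialEq)]
struct SignalDef {
    name: String,
    target: &'static str,
    half_life: Option<Duration>, // None = permanent
    windows: Vec<Duration>,
    velocity: bool,
    durability: Option<&'static str>,
}

#[derive(Debug, PartialEq)]
enum SchemaError {
    SignalImmutable { name: String, field: &'static str },
}

/// Reject any redefinition that changes a frozen field, reporting the first
/// offending field so the error is actionable. Durability is not compared.
fn check_redefinition(old: &SignalDef, new: &SignalDef) -> Result<(), SchemaError> {
    let frozen = [
        ("name", old.name != new.name),
        ("target", old.target != new.target),
        ("decay", old.half_life != new.half_life),
        ("windows", old.windows != new.windows),
        ("velocity", old.velocity != new.velocity),
    ];
    match frozen.iter().find(|(_, changed)| *changed) {
        Some(&(field, _)) => Err(SchemaError::SignalImmutable { name: old.name.clone(), field }),
        None => Ok(()),
    }
}
```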

INV-SCH-4: Profile version monotonicity. For a given profile name, version numbers are strictly increasing. If versions 1, 2, 3 exist, the next must be 4 or greater.
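
The monotonicity rule reduces to tracking the latest committed version per profile name. ProfileVersions and VersionError are hypothetical names for this sketch.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum VersionError {
    NotIncreasing { name: String, latest: u32, attempted: u32 },
}

/// Latest committed version per profile name (hypothetical registry).
#[derive(Default)]
struct ProfileVersions {
    latest: BTreeMap<String, u32>,
}

impl ProfileVersions {
    /// Accept `version` only if it is strictly greater than every existing
    /// version of `name`. Gaps are allowed (3 -> 7); reuse or regression is not.
    fn register(&mut self, name: &str, version: u32) -> Result<(), VersionError> {
        if let Some(&latest) = self.latest.get(name) {
            if version <= latest {
                return Err(VersionError::NotIncreasing {
                    name: name.to_string(),
                    latest,
                    attempted: version,
                });
            }
        }
        self.latest.insert(name.to_string(), version);
        Ok(())
    }
}
```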

INV-SCH-5: Schema cache consistency. The in-memory schema cache is always consistent with the B-tree storage. Formally: cache.get(key) == btree.get(key) for all SCHEMA:* keys, at all times after database open completes.

INV-SCH-6: WAL recoverability. After crash recovery, the schema state is identical to the state before the crash. All SchemaChange WAL records are replayed in order, and the resulting schema matches the pre-crash schema.

INV-SCH-7: Computed field write rejection. Any attempt to write a DbComputed or DbManaged field via the write API returns SchemaError::ComputedFieldWrite. The database never silently ignores a computed field write.
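
Here is a sketch of rejecting computed-field writes before anything is applied. Writability mirrors the DbComputed and DbManaged names above, but validate_write and WriteError are hypothetical. Rejecting unknown fields is an assumption that this invariant does not state.

```rust
use std::collections::HashMap;

/// Who may write a field. DbComputed/DbManaged mirror the names in INV-SCH-7.
#[derive(Clone, Copy, PartialEq)]
enum Writability {
    Application,
    DbComputed,
    DbManaged,
}

#[derive(Debug, PartialEq)]
enum WriteError {
    ComputedFieldWrite { field: String },
    UnknownField { field: String },
}

/// Validate a whole write before applying any of it: a single computed field
/// in the batch rejects the entire write instead of being silently dropped.
fn validate_write(
    schema: &HashMap<&str, Writability>,
    write: &[(&str, &str)],
) -> Result<(), WriteError> {
    for (field, _value) in write {
        match schema.get(field) {
            Some(Writability::Application) => {}
            Some(Writability::DbComputed | Writability::DbManaged) => {
                return Err(WriteError::ComputedFieldWrite { field: field.to_string() });
            }
            None => return Err(WriteError::UnknownField { field: field.to_string() }),
        }
    }
    Ok(())
}
```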

INV-SCH-8: Validation completeness. Every validation rule in Section 5 is checked for every definition. A definition that passes all rules is guaranteed to produce a consistent schema state. A definition that fails any rule is rejected without side effects (no partial writes).
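
One way to guarantee no partial writes is to validate against a scratch copy of the schema and swap it in only when every rule passes. SchemaState and define_all are hypothetical, and the validate closure stands in for the Section 5 rules.

```rust
use std::collections::BTreeMap;

/// Hypothetical schema state: object key -> definition, plus a version counter.
#[derive(Clone, Debug, PartialEq, Default)]
struct SchemaState {
    version: u64,
    objects: BTreeMap<String, String>,
}

/// Apply a batch of definitions all-or-nothing. Every rule runs against a
/// scratch copy; the live schema is replaced only if all of them pass.
fn define_all<F>(live: &mut SchemaState, defs: &[(&str, &str)], validate: F) -> Result<(), String>
where
    F: Fn(&SchemaState, &str, &str) -> Result<(), String>,
{
    let mut candidate = live.clone();
    for &(key, def) in defs {
        // On error, `candidate` is dropped and `live` is never touched.
        validate(&candidate, key, def)?;
        candidate.objects.insert(key.to_string(), def.to_string());
    }
    candidate.version += 1;
    *live = candidate;
    Ok(())
}
```

Cloning the schema is affordable because definitions are small and rare compared to signal writes. A real implementation might also need to pair the swap with the SchemaChange WAL record.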

Property Tests

// P1: Schema operations are atomic -- a failed define_* has no side effects.
proptest! {
    #[test]
    fn failed_define_no_side_effects(
        def in arb_invalid_signal_def(),
    ) {
        let db = TidalDB::open(test_config())?;
        let version_before = db.schema_version();
        // The generator only yields invalid definitions, so this must fail.
        prop_assert!(db.define_signal(def).is_err());
        let version_after = db.schema_version();
        prop_assert_eq!(version_before, version_after);
    }
}

// P2: Profile version ordering is maintained.
proptest! {
    #[test]
    fn profile_versions_strictly_increasing(
        versions in prop::collection::vec(1u32..100, 1..20),
    ) {
        let db = TidalDB::open(test_config())?;
        setup_base_schema(&db)?;
        let mut sorted = versions;
        sorted.sort_unstable();
        sorted.dedup();
        for &v in &sorted {
            let result = db.define_profile(make_profile("test", v));
            prop_assert!(result.is_ok());
        }
        // Re-defining an existing version must be rejected (INV-SCH-4).
        if let Some(&latest) = sorted.last() {
            prop_assert!(db.define_profile(make_profile("test", latest)).is_err());
        }
        // Verify versions are stored in order
        let summary = db.list_profiles();
        let Some(profile) = summary.iter().find(|p| p.name == "test") else {
            return Err(TestCaseError::fail("profile `test` missing from list_profiles()"));
        };
        let stored_versions: Vec<u32> = profile.versions.iter().map(|v| v.version).collect();
        prop_assert_eq!(stored_versions, sorted);
    }
}

// P3: Schema survives crash at any point during define_*.
proptest! {
    #[test]
    fn schema_crash_recovery(
        defs in arb_schema_definition_sequence(1..50),
        crash_point in 0usize..50,
    ) {
        let (wal, expected_schema) = execute_defs_with_crash(&defs, crash_point);
        let recovered_schema = replay_schema_from_wal(wal);
        prop_assert_eq!(expected_schema, recovered_schema);
    }
}

// P4: Validation rejects all invalid states.
proptest! {
    #[test]
    fn validation_rejects_invalid_references(
        signal_name in "[a-z]{1,10}",
    ) {
        let db = TidalDB::open(test_config())?;
        // No entity types defined -- signal should fail validation
        let result = db.define_signal(SignalDef {
            name: signal_name,
            target: EntityKind::Item,
            decay: Decay::Permanent,
            windows: vec![],
            velocity: false,
            durability: None,
        });
        prop_assert!(matches!(result, Err(SchemaError::UndefinedTargetEntity { .. })));
    }
}

// P5: Migration blockers are complete -- no migration succeeds
//     that would leave a dangling reference.
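proptest! {
    #[test]
    fn migration_blockers_complete(
        // Draw the removal from the generated schema so it names a real object.
        (schema, removal) in arb_complete_schema()
            .prop_flat_map(|s| (Just(s.clone()), arb_removal_from_schema(&s))),
    ) {
        let db = TidalDB::open(test_config())?;
        load_schema(&db, &schema)?; // hypothetical test helper: defines every object in `schema`
        let plan = db.plan_migration(removal.clone())?;
        if plan.blocked_by.is_empty() {
            // Migration should succeed without creating dangling refs
            db.apply_migration(plan)?;
            assert_no_dangling_references(&db);
        } else {
            // Migration should be blocked; verify each blocker is a real dependency
            for blocker in &plan.blocked_by {
                prop_assert!(schema_references(&db, &blocker.object, &removal));
            }
        }
    }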
}

Appendix A: Glossary

| Term | Definition |
|---|---|
| Schema | The complete set of entity, signal, profile, cohort, and relationship definitions that describe the structure and behavior of a tidalDB instance. |
| Entity Definition | Declaration of an entity kind's metadata fields and embedding slots. |
| Signal Definition | Immutable declaration of a signal type's decay, windowing, and velocity behavior. |
| Ranking Profile | Versioned, named scoring function combining candidate generation, boosts, gates, penalties, excludes, and diversity constraints. |
| Cohort | A named user segment defined by a predicate over user entity fields. |
| Profile Version | A specific numbered iteration of a ranking profile. Multiple versions can coexist. |
| Profile Lifecycle | The four-state progression: Draft -> Active -> Deprecated -> Archived. |
| Additive Change | A schema modification that does not invalidate existing data (add field, add signal, new profile version). Always safe. |
| Breaking Change | A schema modification that would invalidate existing data or references (remove field, change type). Requires the migration API. |
| Migration Plan | The result of analyzing a proposed breaking change: affected objects, blockers, and estimated cost. |
| Schema Version | A monotonically increasing counter incremented on every schema change. Used for cache invalidation. |
| Lambda | The precomputed decay rate constant: ln(2) / half_life_seconds. Stored alongside signal definitions. |
| Exploration Budget | The fraction of query results reserved for cold-start items. Declared per ranking profile. |
| Population Prior | Database-maintained default values (preference centroid, signal baselines) used for cold-start entities. |
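
The Lambda entry can be checked with a short worked example, assuming the usual exponential form in which an event of age t carries weight e^(-lambda * t). With the view signal's 7-day half-life, an event exactly one week old contributes half its original weight. lambda and decay_weight are hypothetical helper names.

```rust
use std::time::Duration;

/// Decay rate constant for a half-life: lambda = ln(2) / half_life_seconds.
fn lambda(half_life: Duration) -> f64 {
    std::f64::consts::LN_2 / half_life.as_secs_f64()
}

/// Weight of an event of age `age` under exponential decay: exp(-lambda * age).
/// After exactly one half-life this is 0.5; after two, 0.25.
fn decay_weight(half_life: Duration, age: Duration) -> f64 {
    (-lambda(half_life) * age.as_secs_f64()).exp()
}
```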

Appendix B: References

  1. thoughts.md -- Stage 3 insight: "Schema encodes behavior, not just shape."
  2. VISION.md -- Design principles: temporal decay as a type, ranking profiles as data.
  3. API.md -- Schema definition API surface and examples.
  4. 02-entity-model.md -- Entity type definitions, field types, writability model.
  5. 03-signal-system.md -- Signal type declarations, decay computation, windowed aggregation.
  6. 04-relationships.md -- Relationship edge types, weight update mechanics.
  7. CODING_GUIDELINES.md -- Error handling (Result<T, E> everywhere), trait abstraction, module boundaries.
  8. Ousterhout, J. "A Philosophy of Software Design." -- Deep modules, small interfaces.