# Schema Specification

**Status:** Draft
**Author:** tidalDB Engineering
**Last Updated:** 2026-02-20
**Prerequisites:** [02-entity-model.md](02-entity-model.md), [03-signal-system.md](03-signal-system.md), [04-relationships.md](04-relationships.md), [API.md](../../API.md)
**Research:** [thoughts.md](../../thoughts.md) (Stage 3 insight: schema encodes behavior, not just shape)

---

## Table of Contents

1. [Design Principles](#1-design-principles)
2. [Type System](#2-type-system)
3. [Schema Definition API](#3-schema-definition-api)
4. [Schema Versioning](#4-schema-versioning)
5. [Schema Validation Rules](#5-schema-validation-rules)
6. [Schema Migration](#6-schema-migration)
7. [Schema Introspection](#7-schema-introspection)
8. [Defaults and Population Priors](#8-defaults-and-population-priors)
9. [A/B Testing Support](#9-ab-testing-support)
10. [Schema Storage](#10-schema-storage)
11. [Example: Video Platform Schema](#11-example-video-platform-schema)
12. [Invariants and Correctness Guarantees](#12-invariants-and-correctness-guarantees)

---

## 1. Design Principles

The schema system is the contract between the application and the database. It defines not just what data exists, but how that data behaves -- decay rates, velocity computation, scoring weights, diversity rules, cohort boundaries. This is the Stage 3 insight from thoughts.md: **schema encodes behavior, not just shape**.

### Schema Is the Source of Truth for Behavior

In traditional databases, schema defines columns and types. Application code defines behavior. In tidalDB, the boundary shifts. A signal's half-life is not a magic constant in application code -- it is a declaration in schema that the database enforces. A ranking profile's scoring weights are not buried in a microservice -- they are versioned schema objects the database executes.

This design choice has three consequences:

1. **The query optimizer reasons about behavior.** When the database sees `USING PROFILE trending`, it knows to use velocity signals, skip total-count indexes, and enforce per-creator diversity. A general-purpose database executing the same logic as an opaque UDF cannot optimize.

2. **Behavior changes do not require redeployment.** Changing a ranking profile's exploration budget from 10% to 15% is a schema mutation, not a code change. It takes effect immediately for the next query.

3. **Behavior is auditable.** Every ranking profile version is stored with a timestamp. "What scoring function was active during the incident last Tuesday?" is answerable by schema introspection.

### Additive Changes Are Always Safe

The schema system distinguishes additive changes (always safe, no migration required) from breaking changes (require explicit migration with dry-run validation). This distinction is enforced at the API level -- an additive change is applied immediately; a breaking change returns a `MigrationRequired` error with a description of what would break.

### Immutability Where It Matters

Signal definitions are immutable once created. Changing a signal's decay half-life would retroactively invalidate all historical running scores -- the O(1) running decay formula assumes a constant lambda. Rather than silently producing incorrect scores, the schema system rejects the mutation and requires the application to define a new signal type.

Ranking profiles are versioned rather than mutated. Version 1 of `for_you` and version 2 coexist. The application controls which version is active. Old versions can be queried explicitly for comparison and debugging.

### Deep Module, Small Interface

The schema system exposes six definition methods (`define_entity`, `define_signal`, `define_profile`, `define_cohort`, `define_relationship`, `migrate`) and six introspection methods. Everything else -- validation, versioning, storage, cache invalidation, WAL logging -- is internal. The caller never interacts with the schema storage format, the version counter, or the validation engine directly.

---

## 2. Type System

This section lists all types that compose the schema. These are the Rust types that the application constructs and passes to `define_*` methods.

### Entity Types

```rust
/// Definition of an entity type (Item, User, or Creator).
/// Passed to `db.define_entity()`.
pub struct EntityDef {
    /// Which entity kind this definition applies to.
    pub kind: EntityKind,
    /// Metadata fields carried by entities of this kind.
    pub metadata_fields: Vec<Field>,
    /// Embedding slots for vector search.
    pub embedding: EmbeddingDef,
}

/// The three entity kinds. Fixed -- not extensible by the application.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum EntityKind {
    Item,
    User,
    Creator,
}

/// A metadata field declaration.
pub struct Field {
    /// Field name. Lowercase alphanumeric plus underscores. Max 64 chars.
    pub name: String,
    /// Field data type, which determines indexing behavior.
    pub field_type: FieldType,
    /// Writability: who can set this field.
    pub writability: Writability,
}

/// Convenience constructors for Field.
impl Field {
    pub fn text(name: &str) -> Self;
    pub fn keyword(name: &str) -> Self;
    pub fn keywords(name: &str) -> Self;
    pub fn i64(name: &str) -> Self;
    pub fn f64(name: &str) -> Self;
    pub fn bool(name: &str) -> Self;
    pub fn timestamp(name: &str) -> Self;
    pub fn duration(name: &str) -> Self;

    /// A database-computed field with the given underlying storage type.
    /// Writability is automatically set to `DbComputed`.
    pub fn computed(name: &str, underlying: FieldType) -> Self;
}

/// Field data types. Determines storage format, index type, and query semantics.
#[derive(Clone, PartialEq, Eq, Debug)]
pub enum FieldType {
    /// UTF-8 string, BM25-indexed, full-text searchable.
    Text,
    /// UTF-8 string, exact-match indexed, filterable, facetable.
    Keyword,
    /// Vec<String>, each value exact-match indexed.
    Keywords,
    /// 64-bit signed integer, range-filterable, sortable.
    I64,
    /// 64-bit float, range-filterable, sortable.
    F64,
    /// Boolean, equality-filterable.
    Bool,
    /// UTC nanosecond timestamp, range-filterable, sortable.
    Timestamp,
    /// Duration in seconds (f64), range-filterable, sortable.
    Duration,
}

/// Who can write this field.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Writability {
    /// Application writes via write_*() / update_*().
    AppSet,
    /// Database computes from signal patterns and relationships.
    DbComputed,
    /// Database manages as part of signal processing (embeddings).
    DbManaged,
}
```

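The field-name rule stated on `Field::name` (lowercase alphanumeric plus underscores, at most 64 characters) can be sketched as a small check. This is only an illustration: the function name `is_valid_field_name` is hypothetical, "alphanumeric" is read as ASCII, and rejecting the empty string is an assumption that the rule above does not spell out.

```rust
/// Hypothetical helper: returns true when `name` satisfies the documented
/// field-name rule (lowercase ASCII letters, digits, underscores, max 64 chars).
/// Rejecting the empty string is an assumption of this sketch.
fn is_valid_field_name(name: &str) -> bool {
    const MAX_LEN: usize = 64;
    !name.is_empty()
        // Byte length equals character count here because only ASCII passes below.
        && name.len() <= MAX_LEN
        && name
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_')
}
```

A name such as `view_count_7d` passes, while `ViewCount` and `has-hyphen` do not.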
### Embedding Types

```rust
/// Embedding configuration for an entity type.
pub struct EmbeddingDef {
    /// One or more embedding slots. Max 4 per entity type.
    pub slots: Vec<EmbeddingSlot>,
}

/// A single embedding vector slot.
pub struct EmbeddingSlot {
    /// Slot name. Unique within the entity type.
    pub name: String,
    /// Vector dimensions. Range: [2, 4096].
    pub dimensions: u32,
    /// Who provides this embedding.
    pub source: EmbeddingSource,
    /// Storage precision. Default: F16.
    pub precision: EmbeddingPrecision,
}

/// Who computes and writes the embedding.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum EmbeddingSource {
    /// Application computes externally, writes via API.
    External,
    /// Database computes and maintains (e.g., user preference vector).
    DatabaseManaged,
}

/// Storage precision for embedding vectors.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum EmbeddingPrecision {
    /// 16-bit float. Default. ~1% recall loss vs f32, 50% memory savings.
    F16,
    /// 32-bit float. Use when embedding model requires higher precision.
    F32,
    /// 8-bit integer quantization. For memory-constrained deployments.
    I8,
}

impl Default for EmbeddingPrecision {
    fn default() -> Self { Self::F16 }
}
```

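To make the precision trade-off concrete, the sketch below computes the raw payload size of one embedding vector per precision. It redeclares a local copy of `EmbeddingPrecision` so that it compiles on its own. The helper names `bytes_per_element` and `vector_bytes` are hypothetical, and the figures ignore any index overhead or quantization parameters, which this section does not specify.

```rust
/// Local copy of the precision enum so the sketch is self-contained.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EmbeddingPrecision {
    F16,
    F32,
    I8,
}

impl EmbeddingPrecision {
    /// Width of one stored vector component in bytes.
    fn bytes_per_element(self) -> usize {
        match self {
            EmbeddingPrecision::F16 => 2,
            EmbeddingPrecision::F32 => 4,
            EmbeddingPrecision::I8 => 1,
        }
    }
}

/// Raw payload size of a single vector (hypothetical helper; excludes
/// index structures and any per-vector quantization metadata).
fn vector_bytes(dimensions: u32, precision: EmbeddingPrecision) -> usize {
    dimensions as usize * precision.bytes_per_element()
}
```

For a 768-dimension slot this gives 3072 bytes at `F32`, 1536 bytes at `F16` (the 50% saving mentioned above), and 768 bytes at `I8`.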
### Signal Types

```rust
use std::time::Duration;

/// Definition of a signal type. Passed to `db.define_signal()`.
/// Immutable once created -- changing decay would invalidate historical data.
pub struct SignalDef {
    /// Signal name. Unique globally. Lowercase alphanumeric plus underscores.
    pub name: String,
    /// Which entity type this signal targets.
    pub target: EntityKind,
    /// How the signal weight decays over time.
    pub decay: Decay,
    /// Time windows for which aggregates are maintained.
    pub windows: Vec<Window>,
    /// Whether to compute rate-of-change (velocity) per window.
    pub velocity: bool,
    /// Durability level for this signal type's WAL writes.
    /// Default: Batched { max_batch: 256, max_delay: 10ms }.
    pub durability: Option<DurabilityLevel>,
}

/// How signal weight diminishes over time.
#[derive(Clone, Debug, PartialEq)]
pub enum Decay {
    /// Signal weight halves every `half_life` duration.
    /// Formula: w(t) = w_0 * exp(-lambda * t), lambda = ln(2) / half_life
    /// The database precomputes and stores lambda at definition time.
    Exponential { half_life: Duration },

    /// Signal weight drops linearly to zero over `lifetime`.
    /// Formula: w(t) = w_0 * max(0, 1 - t / lifetime)
    /// Cannot use the O(1) running score trick (not multiplicatively
    /// composable). Uses windowed aggregation with linear interpolation
    /// at the boundary.
    Linear { lifetime: Duration },

    /// Signal weight never decays. For permanent state: hides, blocks.
    Permanent,
}

impl Decay {
    /// Precompute the decay rate constant lambda.
    /// Only meaningful for Exponential decay; returns None otherwise.
    pub fn lambda(&self) -> Option<f64> {
        match self {
            Decay::Exponential { half_life } => {
                Some(2.0_f64.ln() / half_life.as_secs_f64())
            }
            _ => None,
        }
    }
}

/// Time window for signal aggregation.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Window {
    /// Fixed-duration sliding window.
    Sliding { duration: Duration },
    /// Unbounded accumulator -- all events since entity creation.
    AllTime,
}

impl Window {
    pub fn hours(n: u64) -> Self {
        Window::Sliding { duration: Duration::from_secs(n * 3600) }
    }
    pub fn days(n: u64) -> Self {
        Window::Sliding { duration: Duration::from_secs(n * 86400) }
    }
    pub fn all_time() -> Self { Window::AllTime }
}
```

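The "O(1) running score" mentioned for exponential decay can be illustrated with a minimal sketch. Because `exp(-lambda * t)` composes multiplicatively, a single stored score plus the time of its last update is enough: decay the stored score to the new event time, then add the new weight. The `RunningScore` type and its method names are hypothetical and are not the ledger layout defined in 03-signal-system.md. The sketch also assumes events arrive in timestamp order.

```rust
use std::time::Duration;

/// lambda = ln(2) / half_life, matching `Decay::lambda()` above.
fn lambda(half_life: Duration) -> f64 {
    std::f64::consts::LN_2 / half_life.as_secs_f64()
}

/// Hypothetical running-score state: one float plus the last update time.
#[derive(Default)]
struct RunningScore {
    score: f64,
    last_update_secs: f64,
}

impl RunningScore {
    /// Score as it would read at `now_secs`, without mutating state.
    fn decayed_to(&self, now_secs: f64, lambda: f64) -> f64 {
        let dt = (now_secs - self.last_update_secs).max(0.0);
        self.score * (-lambda * dt).exp()
    }

    /// O(1) update: decay the stored score to `now_secs`, then add `weight`.
    /// Assumes `now_secs` is not earlier than the previous update.
    fn record(&mut self, weight: f64, now_secs: f64, lambda: f64) {
        self.score = self.decayed_to(now_secs, lambda) + weight;
        self.last_update_secs = now_secs;
    }
}
```

With a one-hour half-life, an event of weight 1.0 at t = 0 reads as 0.5 at t = 3600 s. Recording a second event of weight 1.0 at that moment stores 1.5. If lambda changed between updates, the stored 0.5 would no longer correspond to any consistent history, which is why signal definitions are immutable.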
### Ranking Profile Types

```rust
/// Definition of a ranking profile. Passed to `db.define_profile()`.
/// Versioned -- multiple versions coexist under the same name.
pub struct ProfileDef {
    /// Profile name. Lowercase alphanumeric plus underscores and hyphens.
    pub name: String,
    /// Version number. Must be strictly greater than the latest existing
    /// version for this name (or 1 if no prior versions exist).
    pub version: u32,
    /// How to generate the initial candidate set.
    pub candidate: Candidate,
    /// Signal and relationship boosts applied during scoring.
    pub boosts: Vec<Boost>,
    /// Recency decay applied to candidate age.
    pub decay: Option<ProfileDecay>,
    /// Quality gates -- candidates below threshold are excluded.
    pub gates: Vec<Gate>,
    /// Negative signal penalties subtracted from score.
    pub penalties: Vec<Penalty>,
    /// Hard exclusion predicates evaluated before scoring.
    pub excludes: Vec<Exclude>,
    /// Post-scoring diversity constraints.
    pub diversity: Option<DiversitySpec>,
    /// Fraction of results reserved for exploration (new/unseen creators).
    /// Range: [0.0, 1.0]. Default: 0.0 (no exploration).
    pub exploration: f64,
    /// Optional sort override. If None, results are ordered by computed
    /// score. If Some, the specified sort mode takes precedence.
    pub sort: Option<Sort>,
}

/// How to generate the initial candidate set for scoring.
#[derive(Clone, Debug)]
pub enum Candidate {
    /// Approximate nearest neighbor retrieval over entity embeddings.
    Ann {
        /// Which vector to use as the query.
        query_vector: VectorSource,
        /// Which entity type to search.
        index: EntityKind,
        /// Which embedding slot to search against.
        embedding_slot: Option<String>,
        /// Number of ANN candidates to retrieve before scoring.
        top_k: u32,
    },
    /// Full scan of all entities of a given kind. Used for trending,
    /// browse, and other non-personalized surfaces.
    Scan {
        entity: EntityKind,
    },
    /// Retrieve content from entities connected by a relationship edge.
    /// E.g., items from followed creators.
    Relationship {
        edge: String,
    },
    /// Social graph traversal -- items engaged by users in the
    /// querying user's extended social graph.
    SocialGraph {
        depth: u8,
        edge: String,
        min_weight: f64,
    },
    /// Hybrid text + vector retrieval (for search).
    Hybrid {
        text_weight: f64,
        vector_weight: f64,
        fusion: Fusion,
    },
}

/// Where the query vector comes from.
#[derive(Clone, Debug)]
pub enum VectorSource {
    /// Use the querying user's preference embedding.
    UserPreference,
    /// Use a specific item's embedding (for related/up-next queries).
    ItemEmbedding { item_id: String },
    /// Use a vector provided by the caller (for search).
    Provided,
}

/// Fusion strategy for hybrid text + vector search.
#[derive(Clone, Debug)]
pub enum Fusion {
    /// Reciprocal Rank Fusion. RRF(d) = 1/(k + rank_bm25) + 1/(k + rank_ann).
    /// k=60 is the standard default. Rank-based, no score normalization needed.
    Rrf { k: u32 },
    /// Linear combination: alpha * text_score + (1-alpha) * vector_score.
    /// Requires score normalization. Use only after relevance tuning.
    Linear { alpha: f64 },
}

/// A positive scoring boost.
#[derive(Clone, Debug)]
pub enum Boost {
    /// Boost based on a signal's value within a window.
    Signal {
        signal: String,
        window: Window,
        mode: SignalMode,
        weight: f64,
    },
    /// Boost based on a relationship edge weight.
    Relationship {
        edge: String,
        weight: f64,
    },
    /// Boost based on social proof (engagement by user's social graph).
    SocialProof {
        weight: f64,
    },
}

/// What aspect of a signal to use in scoring.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SignalMode {
    /// Raw count within the window.
    Count,
    /// Running decay score (exponentially weighted).
    Value,
    /// Rate of change within the window.
    Velocity,
    /// Ratio of unique users to total count.
    UniqueRatio,
    /// Ratio of this signal to another (e.g., likes / views).
    Ratio,
}

impl Boost {
    pub fn signal(signal: &str, window: Window, mode: SignalMode, weight: f64) -> Self {
        Boost::Signal {
            signal: signal.to_string(),
            window,
            mode,
            weight,
        }
    }
    pub fn relationship(edge: &str, weight: f64) -> Self {
        Boost::Relationship { edge: edge.to_string(), weight }
    }
    pub fn social_proof(weight: f64) -> Self {
        Boost::SocialProof { weight }
    }
}

/// Recency decay applied to candidate age in the profile.
#[derive(Clone, Debug)]
pub struct ProfileDecay {
    /// The timestamp field to use as the age reference.
    pub field: String,
    /// Half-life for age decay.
    pub half_life: Duration,
}

/// Quality gate -- candidates below the threshold are excluded.
#[derive(Clone, Debug)]
pub enum Gate {
    /// Minimum signal value to pass. Candidates below are excluded.
    Min {
        signal: String,
        window: Window,
        threshold: f64,
    },
    /// Minimum ratio of one signal to another.
    MinRatio {
        name: String,
        threshold: f64,
    },
}

impl Gate {
    pub fn min(signal: &str, window: Window, threshold: f64) -> Self {
        Gate::Min {
            signal: signal.to_string(),
            window,
            threshold,
        }
    }
    pub fn min_ratio(name: &str, threshold: f64) -> Self {
        Gate::MinRatio {
            name: name.to_string(),
            threshold,
        }
    }
}

/// Negative signal penalty subtracted from score.
#[derive(Clone, Debug)]
pub struct Penalty {
    /// Signal name.
    pub signal: String,
    /// Window to evaluate.
    pub window: Window,
    /// Penalty weight (should be negative).
    pub weight: f64,
}

impl Penalty {
    pub fn signal(signal: &str, window: Window, weight: f64) -> Self {
        Penalty {
            signal: signal.to_string(),
            window,
            weight,
        }
    }
}

/// Hard exclusion predicate evaluated before scoring begins.
#[derive(Clone, Debug)]
pub enum Exclude {
    /// Exclude items where this signal exists for the querying user.
    /// E.g., Exclude::signal("hide") excludes all hidden items.
    Signal { signal: String },
    /// Exclude based on relationship. E.g., Exclude::relationship("blocked").
    Relationship { edge: String },
}

impl Exclude {
    pub fn signal(signal: &str) -> Self {
        Exclude::Signal { signal: signal.to_string() }
    }
    pub fn relationship(edge: &str) -> Self {
        Exclude::Relationship { edge: edge.to_string() }
    }
}

/// Post-scoring diversity enforcement.
#[derive(Clone, Debug, Default)]
pub struct DiversitySpec {
    /// Maximum items from the same creator in the result set.
    pub max_per_creator: Option<u32>,
    /// Enforce a mix of content formats (video, short, article, etc.).
    pub format_mix: bool,
    /// Topic diversity via maximal marginal relevance (MMR).
    /// 0.0 = no enforcement, 1.0 = maximize diversity.
    pub topic_diversity: Option<f64>,
}

/// Sort mode override. Can be specified per-profile or per-query.
#[derive(Clone, Debug)]
pub enum Sort {
    Relevance,
    Personalized,
    New,
    Old,
    Hot,
    Trending,
    Rising,
    Controversial,
    HiddenGems,
    TopAllTime,
    TopHour,
    TopToday,
    TopWeek,
    TopMonth,
    TopYear,
    MostViewed,
    MostLiked,
    MostCommented,
    MostShared,
    Shortest,
    Longest,
    AlphabeticalAsc,
    AlphabeticalDesc,
    Shuffle,
    LiveViewerCount,
    DateSaved,
    CreatorEngagementRate,
    /// Sort by a specific metadata field.
    Field(String, SortDirection),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SortDirection {
    Asc,
    Desc,
}
```

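The RRF formula documented on `Fusion::Rrf` can be worked through with a short sketch. The function names `rrf_term` and `rrf_score` are hypothetical and ranks are taken to be 1-based. Treating a document that is missing from one list as contributing zero for that list is an assumption, since the formula above only covers documents present in both rankings.

```rust
/// Contribution of one ranked list: 1 / (k + rank), with a 1-based rank.
fn rrf_term(k: u32, rank: u32) -> f64 {
    1.0 / (k as f64 + rank as f64)
}

/// Fused score RRF(d) = 1/(k + rank_bm25) + 1/(k + rank_ann).
/// `None` means the document did not appear in that list; this sketch
/// assumes such a list contributes zero.
fn rrf_score(k: u32, rank_bm25: Option<u32>, rank_ann: Option<u32>) -> f64 {
    rank_bm25.map_or(0.0, |r| rrf_term(k, r)) + rank_ann.map_or(0.0, |r| rrf_term(k, r))
}
```

With k = 60, a document ranked first in both lists scores 2/61. A document ranked second by BM25 and third by ANN scores 1/62 + 1/63, which is higher than a document ranked first by BM25 but absent from the ANN list (1/61). This shows how RRF favors agreement between the two retrievers without comparing their raw scores.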
### Cohort Types

```rust
/// Definition of a named cohort. Passed to `db.define_cohort()`.
/// Cohorts define reusable user segments for cohort-scoped queries.
pub struct CohortDef {
    /// Cohort name. Unique globally. Lowercase alphanumeric plus underscores.
    pub name: String,
    /// Predicate that defines cohort membership.
    pub predicate: Predicate,
    /// How often cohort membership is recomputed.
    pub refresh: RefreshPolicy,
}

/// Composable predicate for cohort membership evaluation.
/// Predicates reference fields on the User entity type.
#[derive(Clone, Debug)]
pub enum Predicate {
    /// Field equals a specific value.
    Eq(String, PredicateValue),
    /// Field does not equal a specific value.
    Neq(String, PredicateValue),
    /// Numeric field is greater than a threshold.
    Gt(String, f64),
    /// Numeric field is less than a threshold.
    Lt(String, f64),
    /// Numeric field is in a range [low, high].
    Range(String, f64, f64),
    /// Keywords field contains a specific value.
    Contains(String, String),
    /// Keywords field contains any of the given values (OR).
    ContainsAny(String, Vec<String>),
    /// All child predicates must be true.
    And(Vec<Predicate>),
    /// At least one child predicate must be true.
    Or(Vec<Predicate>),
    /// Child predicate must be false.
    Not(Box<Predicate>),
}

/// Value types used in predicate comparisons.
#[derive(Clone, Debug)]
pub enum PredicateValue {
    String(String),
    I64(i64),
    F64(f64),
    Bool(bool),
}

/// How often a cohort's membership set is recomputed.
#[derive(Clone, Debug)]
pub enum RefreshPolicy {
    /// Recompute every N minutes.
    Interval { minutes: u32 },
    /// Recompute every hour.
    Hourly,
    /// Recompute every day.
    Daily,
    /// Recompute on every relevant user metadata change.
    /// More expensive but always fresh. Suitable for small cohorts
    /// defined over app-set fields.
    OnWrite,
}
```

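As an illustration of how a composable predicate might be evaluated, the sketch below implements a reduced subset of `Predicate` (numeric comparisons, `Contains`, and the boolean combinators) over an in-memory map of user fields. Everything here is hypothetical: the `Value` enum, the `UserFields` alias, the `evaluate` function, and the field names used in the usage note. Treating a missing or wrongly typed field as "no match" is also an assumption, not documented behavior.

```rust
use std::collections::HashMap;

/// Reduced copy of the predicate enum, kept small for the sketch.
#[derive(Clone, Debug)]
enum Predicate {
    Gt(String, f64),
    Lt(String, f64),
    Range(String, f64, f64),
    Contains(String, String),
    And(Vec<Predicate>),
    Or(Vec<Predicate>),
    Not(Box<Predicate>),
}

/// Hypothetical in-memory representation of a user's field values.
#[derive(Clone, Debug)]
enum Value {
    Number(f64),
    Keywords(Vec<String>),
}

type UserFields = HashMap<String, Value>;

/// Reads a numeric field; None if the field is absent or not numeric.
fn number(user: &UserFields, field: &str) -> Option<f64> {
    match user.get(field) {
        Some(Value::Number(n)) => Some(*n),
        _ => None,
    }
}

/// Recursively evaluates a predicate. Leaf predicates on missing or
/// mistyped fields evaluate to false (an assumption of this sketch).
fn evaluate(p: &Predicate, user: &UserFields) -> bool {
    match p {
        Predicate::Gt(f, t) => number(user, f).map_or(false, |v| v > *t),
        Predicate::Lt(f, t) => number(user, f).map_or(false, |v| v < *t),
        // Range is inclusive on both ends, matching "[low, high]" above.
        Predicate::Range(f, lo, hi) => {
            number(user, f).map_or(false, |v| v >= *lo && v <= *hi)
        }
        Predicate::Contains(f, needle) => match user.get(f.as_str()) {
            Some(Value::Keywords(ks)) => ks.iter().any(|k| k == needle),
            _ => false,
        },
        Predicate::And(ps) => ps.iter().all(|c| evaluate(c, user)),
        Predicate::Or(ps) => ps.iter().any(|c| evaluate(c, user)),
        Predicate::Not(inner) => !evaluate(inner, user),
    }
}
```

For a user with `age = 27` and `interests = ["jazz"]`, the predicate `And([Range("age", 18, 34), Contains("interests", "jazz")])` evaluates to true. In the real system, type mismatches are rejected at definition time (`CohortFieldTypeMismatch`), so the fallback branches here would only cover missing values.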
### Relationship Types

```rust
/// Definition of a relationship type. Passed to `db.define_relationship()`.
pub struct RelationshipDef {
    /// Relationship name. Unique globally.
    pub name: String,
    /// Source entity kind.
    pub from: EntityKind,
    /// Target entity kind.
    pub to: EntityKind,
    /// Default weight for new edges of this type.
    pub weight_default: f64,
    /// Optional decay for the relationship weight.
    /// None = permanent (follows, blocks).
    /// Some = weight decays toward zero over time.
    pub decay: Option<Decay>,
    /// Whether the relationship is symmetric (A->B implies B->A).
    pub symmetric: bool,
}
```

### Error Types

```rust
/// All errors that can occur during schema operations.
#[derive(Debug)]
pub enum SchemaError {
    // -- Entity validation errors --

    /// Entity kind already has a definition.
    EntityAlreadyDefined { kind: EntityKind },
    /// Duplicate field name within an entity type.
    DuplicateFieldName { kind: EntityKind, field: String },
    /// Field name is invalid (not lowercase alphanumeric + underscores).
    InvalidFieldName { field: String, reason: String },
    /// Embedding dimensions out of range [2, 4096].
    InvalidDimensions { slot: String, dimensions: u32 },
    /// Too many embedding slots (max 4 per entity type).
    TooManyEmbeddingSlots { kind: EntityKind, count: usize },
    /// Duplicate embedding slot name within an entity type.
    DuplicateEmbeddingSlot { kind: EntityKind, slot: String },

    // -- Signal validation errors --

    /// Signal name already exists.
    SignalAlreadyDefined { name: String },
    /// Signal name is invalid.
    InvalidSignalName { name: String, reason: String },
    /// Signal targets an entity kind that has no definition.
    UndefinedTargetEntity { signal: String, target: EntityKind },
    /// Permanent-decay signal has velocity enabled (meaningless).
    PermanentWithVelocity { signal: String },
    /// Too many windows on a signal (max 8).
    TooManyWindows { signal: String, count: usize },
    /// Too many signal types per entity type (max 64).
    TooManySignals { target: EntityKind, count: usize },
    /// AllTime window specified with velocity (undefined operation).
    AllTimeWithVelocity { signal: String },
    /// Attempted to modify an immutable signal definition.
    SignalImmutable { name: String },

    // -- Profile validation errors --

    /// Profile version already exists for this name.
    ProfileVersionExists { name: String, version: u32 },
    /// Profile version is not sequential (must be > latest).
    ProfileVersionNotSequential { name: String, expected: u32, got: u32 },
    /// Profile references a signal that is not defined.
    UndefinedSignal { profile: String, signal: String },
    /// Profile references a relationship type that is not defined.
    UndefinedRelationship { profile: String, edge: String },
    /// Profile references an entity type that is not defined.
    UndefinedEntity { profile: String, entity: EntityKind },
    /// Profile candidate strategy references an embedding slot that
    /// does not exist on the target entity type.
    UndefinedEmbeddingSlot { profile: String, slot: String },
    /// Exploration budget out of range [0.0, 1.0].
    InvalidExploration { profile: String, value: f64 },
    /// Topic diversity out of range [0.0, 1.0].
    InvalidTopicDiversity { profile: String, value: f64 },
    /// Profile name is invalid.
    InvalidProfileName { name: String, reason: String },

    // -- Cohort validation errors --

    /// Cohort name already exists.
    CohortAlreadyDefined { name: String },
    /// Cohort predicate references a field not defined on User entity.
    UndefinedCohortField { cohort: String, field: String },
    /// Cohort predicate references a field with incompatible type.
    CohortFieldTypeMismatch {
        cohort: String,
        field: String,
        expected: FieldType,
        got: String,
    },
    /// Maximum number of cohorts exceeded (100).
    TooManyCohorts { count: usize },

    // -- Relationship validation errors --

    /// Relationship name already exists.
    RelationshipAlreadyDefined { name: String },
    /// Relationship references an entity kind that is not defined.
    UndefinedRelationshipEntity { relationship: String, entity: EntityKind },
    /// Default weight out of range [0.0, 1.0].
    InvalidDefaultWeight { relationship: String, weight: f64 },

    // -- Migration errors --

    /// A breaking change was attempted without using the migration API.
    MigrationRequired { description: String },
    /// Migration references objects that no longer exist.
    MigrationTargetNotFound { description: String },
    /// Migration would invalidate active profiles or cohorts.
    MigrationBreaksDependent { migration: String, dependents: Vec<String> },

    // -- Write-path errors --

    /// Attempted to write a computed field via the write API.
    ComputedFieldWrite { entity: EntityKind, field: String },
    /// Entity with this ID already exists (use update_*() instead).
    EntityExists { kind: EntityKind, id: String },
    /// Entity ID collision in BLAKE3 hash space (astronomically unlikely).
    IdCollision { id_a: String, id_b: String },

    // -- Storage errors --

    /// Schema storage operation failed.
    StorageFailure(String),
}
```

---

## 3. Schema Definition API

The schema definition API is the set of methods on `TidalDB` that declare the structure and behavior of the database. All definitions are WAL-logged for crash recovery and stored in the B-tree backend under the `SCHEMA:` key prefix.

### 3.1 Define Entity

```rust
impl TidalDB {
    /// Define an entity type's metadata fields and embedding slots.
    ///
    /// Each entity kind (Item, User, Creator) is defined exactly once.
    /// Calling define_entity for an already-defined kind returns
    /// SchemaError::EntityAlreadyDefined.
    ///
    /// After definition, entities of this kind can be written via
    /// write_item(), write_user(), or write_creator().
    pub fn define_entity(&self, def: EntityDef) -> Result<(), SchemaError>;
}
```

**Behavior on commit:**

1. Validate field names (unique, valid characters, max length).
2. Validate embedding slots (unique names, valid dimensions, max 4 slots).
3. Validate field types (computed fields have valid underlying type).
4. WAL-log the schema change (record type `0x04`).
5. Store definition in `SCHEMA:entity:{kind}` key.
6. Update in-memory schema cache.
7. Initialize indexes for all declared fields (inverted index for text fields, term dictionary for keyword fields, sorted numeric index for numeric fields, etc.).

### 3.2 Define Signal

```rust
impl TidalDB {
    /// Define a signal type with its decay, windowing, and velocity behavior.
    ///
    /// Signal names are globally unique. The target entity kind must already
    /// be defined via define_entity.
    ///
    /// Signal definitions are immutable once created. Attempting to redefine
    /// an existing signal returns SchemaError::SignalImmutable.
    ///
    /// On success, existing entities of the target kind lazily receive an
    /// initialized (zeroed) signal ledger for this signal type on their
    /// next signal write.
    pub fn define_signal(&self, def: SignalDef) -> Result<(), SchemaError>;
}
```

**Behavior on commit:**

1. Validate signal name (unique, valid characters).
2. Validate target entity kind is defined.
3. Validate decay/window/velocity constraints (see Section 5).
4. Precompute lambda for exponential decay and store alongside definition.
5. WAL-log the schema change.
6. Store definition in `SCHEMA:signal:{name}` key.
7. Update in-memory schema cache (signal type registry).
8. Register signal type index (u8) for compact storage in WAL events.
9. Existing entities of the target kind lazily receive zeroed ledger state for this signal on their next signal write (not eagerly initialized -- eager initialization would be O(N), e.g. a pass over 10M entities).

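Step 3 can be sketched from the error variants listed in Section 2 (`TooManyWindows`, `PermanentWithVelocity`, `AllTimeWithVelocity`). The code below is one possible reading of those variants, not the authoritative rule set, which lives in Section 5. The names `check_signal` and `SignalCheck` are hypothetical, the order of the checks is arbitrary, and rejecting any `AllTime` window whenever `velocity` is enabled is an assumption about what "AllTime window specified with velocity" means.

```rust
use std::time::Duration;

// Local, reduced copies of the schema types so the sketch compiles alone.
enum Decay {
    Exponential { half_life: Duration },
    Linear { lifetime: Duration },
    Permanent,
}

enum Window {
    Sliding { duration: Duration },
    AllTime,
}

/// Hypothetical error type mirroring three SchemaError variants.
#[derive(Debug, PartialEq)]
enum SignalCheck {
    TooManyWindows { count: usize },
    PermanentWithVelocity,
    AllTimeWithVelocity,
}

/// Documented limit: max 8 windows per signal.
const MAX_WINDOWS: usize = 8;

/// One possible reading of the decay/window/velocity constraints.
fn check_signal(decay: &Decay, windows: &[Window], velocity: bool) -> Result<(), SignalCheck> {
    if windows.len() > MAX_WINDOWS {
        return Err(SignalCheck::TooManyWindows { count: windows.len() });
    }
    // Velocity on a signal that never decays is documented as meaningless.
    if velocity && matches!(decay, Decay::Permanent) {
        return Err(SignalCheck::PermanentWithVelocity);
    }
    // Assumption: any AllTime window is rejected when velocity is enabled.
    if velocity && windows.iter().any(|w| matches!(w, Window::AllTime)) {
        return Err(SignalCheck::AllTimeWithVelocity);
    }
    Ok(())
}
```

Under this reading, a permanent signal such as a hide with an `AllTime` window and `velocity: false` passes, while the same definition with `velocity: true` is rejected.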
### 3.3 Define Profile

```rust
impl TidalDB {
    /// Define a ranking profile version.
    ///
    /// Profile names are reusable -- each call creates a new version.
    /// Version numbers must be strictly increasing for a given name.
    /// The first version for a new name must be version 1.
    ///
    /// New profiles start in Draft status. Call set_profile_status()
    /// to make them available for queries.
    pub fn define_profile(&self, def: ProfileDef) -> Result<(), SchemaError>;

    /// Transition a profile version's lifecycle status.
    pub fn set_profile_status(
        &self,
        name: &str,
        version: u32,
        status: ProfileStatus,
    ) -> Result<(), SchemaError>;

    /// Retrieve a profile by name. If version is None, returns the
    /// latest active version. If no active version exists, returns
    /// the latest version regardless of status.
    pub fn get_profile(
        &self,
        name: &str,
        version: Option<u32>,
    ) -> Result<ProfileDef, SchemaError>;
}
```

**Behavior on commit:**

1. Validate profile name (valid characters).
2. Validate version is sequential (> latest version for this name, or 1 if new).
3. Validate all signal references exist (boost signals, gate signals, penalty signals, exclude signals).
4. Validate all relationship references exist (boost relationships, exclude relationships, candidate edges).
5. Validate candidate strategy (entity kind is defined, embedding slot exists, dimensions match).
6. Validate exploration budget is in [0.0, 1.0].
7. Validate diversity spec (topic_diversity in [0.0, 1.0] if present).
8. WAL-log the schema change.
9. Store definition in `SCHEMA:profile:{name}:{version}` key.
10. Set initial status to `Draft`.
11. Update in-memory schema cache.

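The version rule in step 2 can be sketched as a pure function over the latest stored version. The names `check_profile_version` and `VersionError` are hypothetical. Reporting `latest + 1` as the `expected` value is an assumption, since the rule as stated only requires the new version to be strictly greater than the latest one. The sketch also only knows the latest version, so it reports a resubmitted older version as non-sequential rather than as already existing.

```rust
/// Hypothetical error type mirroring ProfileVersionExists and
/// ProfileVersionNotSequential from SchemaError.
#[derive(Debug, PartialEq)]
enum VersionError {
    Exists { version: u32 },
    NotSequential { expected: u32, got: u32 },
}

/// `latest` is the highest stored version for the profile name, or None
/// if the name is new. The first version must be 1; later versions must
/// be strictly greater than `latest`.
fn check_profile_version(latest: Option<u32>, proposed: u32) -> Result<(), VersionError> {
    match latest {
        None if proposed == 1 => Ok(()),
        None => Err(VersionError::NotSequential { expected: 1, got: proposed }),
        Some(l) if proposed == l => Err(VersionError::Exists { version: proposed }),
        Some(l) if proposed > l => Ok(()),
        // Assumption: report the smallest acceptable version as `expected`.
        Some(l) => Err(VersionError::NotSequential {
            expected: l.saturating_add(1),
            got: proposed,
        }),
    }
}
```

For example, with versions 1 and 2 of `for_you` stored, proposing version 3 (or 5) succeeds, proposing version 2 reports that the version exists, and proposing version 1 reports a non-sequential version.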
### 3.4 Define Cohort

```rust
impl TidalDB {
    /// Define a named cohort (user segment) for cohort-scoped queries.
    ///
    /// Cohort predicates reference fields defined on the User entity type.
    /// The User entity must be defined before any cohorts can be defined.
    ///
    /// Maximum 100 cohort definitions (bounded by the cohort tracking
    /// storage budget -- see 03-signal-system.md Section 7).
    pub fn define_cohort(&self, def: CohortDef) -> Result<(), SchemaError>;
}
```

**Behavior on commit:**

1. Validate cohort name (unique, valid characters).
2. Validate total cohort count does not exceed 100.
3. Validate predicate: all referenced fields exist on the User entity, types are compatible with the predicate operator.
4. WAL-log the schema change.
5. Store definition in `SCHEMA:cohort:{name}` key.
6. Update in-memory schema cache.
7. Schedule initial membership computation (background materializer evaluates the predicate against all existing users).

### 3.5 Define Relationship

```rust
impl TidalDB {
    /// Define a relationship type (edge kind) between entity types.
    ///
    /// Both source and target entity kinds must already be defined.
    /// Relationship names are globally unique.
    pub fn define_relationship(&self, def: RelationshipDef) -> Result<(), SchemaError>;
}
```

**Behavior on commit:**

1. Validate relationship name (unique, valid characters).
2. Validate from/to entity kinds are defined.
3. Validate default weight is in [0.0, 1.0].
4. If decay is specified, validate it (same rules as signal decay).
5. WAL-log the schema change.
6. Store definition in `SCHEMA:relationship:{name}` key.
7. Update in-memory schema cache.

---

## 4. Schema Versioning

Different schema objects have different versioning semantics, reflecting the different consequences of change.

### 4.1 Versioning by Object Type

| Schema Object | Versioning Model | Rationale |
|---------------|-----------------|-----------|
| Entity definitions | Append-only fields | Removing or changing a field type would invalidate indexes and break queries. |
| Signal definitions | Immutable | Changing decay invalidates all historical running scores. Lambda is baked into the O(1) formula. |
| Ranking profiles | Explicitly versioned | Profiles are the tuning knob. Multiple versions must coexist for A/B testing and rollback. |
| Cohort definitions | Mutable (predicate can change) | Cohort membership is recomputed periodically. Changing the predicate simply changes the next computation. |
| Relationship definitions | Immutable | Changing from/to entity kinds or decay would invalidate existing edges. |

### 4.2 Profile Version Lifecycle

Every profile version follows a four-state lifecycle:

```
                  define_profile()
    (none) ─────────────────────────> Draft
                                        │
                  set_profile_status()  │  (validate all references)
                                        v
                                      Active
                                        │
                  set_profile_status()  │  (mark as deprecated,
                                        │   still queryable by explicit version)
                                        v
                                    Deprecated
                                        │
                  set_profile_status()  │  (no longer queryable,
                                        │   retained for audit only)
                                        v
                                     Archived
```

```rust
/// Lifecycle status of a ranking profile version.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ProfileStatus {
    /// Newly defined. Not yet available for queries.
    /// Can be tested via explicit version: get_profile("name", Some(version)).
    Draft,
    /// Available for queries. `get_profile("name", None)` returns
    /// the latest active version.
    Active,
    /// Still queryable by explicit version, but no longer returned
    /// as the "latest" active version. Used during A/B test wind-down.
    Deprecated,
    /// No longer queryable. Retained for audit purposes only.
    /// Querying an archived profile returns SchemaError.
    Archived,
}
```

**Status transition rules:**

| Current | Allowed Next | Forbidden |
|---------|-------------|-----------|
| Draft | Active | Deprecated, Archived |
| Active | Deprecated | Draft, Archived |
| Deprecated | Archived, Active (re-activation) | Draft |
| Archived | (terminal) | Any |

**Multiple active versions.** Multiple versions of the same profile name can be `Active` simultaneously. This is intentional -- it enables A/B testing. The application decides which version to use per query by specifying the version explicitly. `get_profile("for_you", None)` returns the highest-versioned active version.

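
The transition table and the "highest-versioned active version" rule can be sketched as two small pure functions. This is a minimal illustration, not the tidalDB implementation: `transition_allowed` and `latest_active` are hypothetical helper names, and treating a self-transition (for example Active to Active) as rejected is an assumption the spec does not state.

```rust
/// Lifecycle states, mirroring `ProfileStatus` from this section.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ProfileStatus {
    Draft,
    Active,
    Deprecated,
    Archived,
}

/// Returns true if `from -> to` appears in the "Allowed Next" column.
/// Assumption: anything not listed, including self-transitions, is rejected.
pub fn transition_allowed(from: ProfileStatus, to: ProfileStatus) -> bool {
    use ProfileStatus::*;
    matches!(
        (from, to),
        (Draft, Active) | (Active, Deprecated) | (Deprecated, Archived) | (Deprecated, Active)
    )
}

/// Resolves an unversioned lookup: the highest version whose status is Active.
pub fn latest_active(versions: &[(u32, ProfileStatus)]) -> Option<u32> {
    versions
        .iter()
        .filter(|(_, status)| *status == ProfileStatus::Active)
        .map(|(version, _)| *version)
        .max()
}
```

With versions 1 and 2 both Active and version 3 still in Draft, `latest_active` resolves to 2, which is why an A/B control arm should pin its version explicitly.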
### 4.3 Schema Version Counter

The database maintains a monotonically increasing schema version counter. Every successful `define_*`, `add_fields`, `set_profile_status`, or `apply_migration` call increments this counter. The counter serves as a cache invalidation epoch -- query plan caches are invalidated when the schema version changes.

```rust
impl TidalDB {
    /// Returns the current schema version number.
    /// Incremented on every schema definition or modification.
    pub fn schema_version(&self) -> u64;
}
```

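
A minimal sketch of how a plan cache might use the counter as an invalidation epoch. `PlanCache` and its string-valued plans are hypothetical stand-ins (the spec does not describe the plan cache's structure); only the "compare epoch, drop everything on change" idea comes from the text above.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical query plan cache; plans are plain strings for illustration.
pub struct PlanCache {
    /// Schema version the cached plans were built against.
    epoch: u64,
    plans: HashMap<String, String>,
}

impl PlanCache {
    pub fn new() -> Self {
        Self { epoch: 0, plans: HashMap::new() }
    }

    /// Returns the cached plan for `query`, first discarding every cached
    /// plan if the schema version has moved since the cache was last used.
    pub fn get_or_build(
        &mut self,
        schema_version: &AtomicU64,
        query: &str,
        build: impl FnOnce() -> String,
    ) -> &str {
        let current = schema_version.load(Ordering::Acquire);
        if current != self.epoch {
            self.plans.clear();
            self.epoch = current;
        }
        self.plans
            .entry(query.to_string())
            .or_insert_with(build)
            .as_str()
    }
}
```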
---

## 5. Schema Validation Rules

Every schema definition is validated at definition time. Validation is eager and complete -- a definition that passes validation is guaranteed to be self-consistent and compatible with all existing definitions.

### 5.1 Validation Rules Reference

| Rule ID | Object | Rule | Error |
|---------|--------|------|-------|
| V-E01 | Entity | Entity kind can only be defined once. | `EntityAlreadyDefined` |
| V-E02 | Entity | Field names must be unique within an entity type. | `DuplicateFieldName` |
| V-E03 | Entity | Field names: lowercase `[a-z0-9_]`, max 64 characters, must start with a letter. | `InvalidFieldName` |
| V-E04 | Entity | Embedding dimensions must be in [2, 4096]. | `InvalidDimensions` |
| V-E05 | Entity | Maximum 4 embedding slots per entity type. | `TooManyEmbeddingSlots` |
| V-E06 | Entity | Embedding slot names must be unique within an entity type. | `DuplicateEmbeddingSlot` |
| V-S01 | Signal | Signal names must be globally unique. | `SignalAlreadyDefined` |
| V-S02 | Signal | Signal names: lowercase `[a-z0-9_]`, max 64 characters. | `InvalidSignalName` |
| V-S03 | Signal | Target entity kind must have a definition. | `UndefinedTargetEntity` |
| V-S04 | Signal | Permanent decay signals must have `velocity: false`. | `PermanentWithVelocity` |
| V-S05 | Signal | Maximum 8 windows per signal type. | `TooManyWindows` |
| V-S06 | Signal | Maximum 64 signal types per entity type. | `TooManySignals` |
| V-S07 | Signal | AllTime window with velocity is forbidden. | `AllTimeWithVelocity` |
| V-S08 | Signal | Existing signal definitions cannot be modified. | `SignalImmutable` |
| V-P01 | Profile | Profile name: lowercase `[a-z0-9_-]`, max 64 characters. | `InvalidProfileName` |
| V-P02 | Profile | Version must be > latest version for this name (or 1 if new). | `ProfileVersionNotSequential` |
| V-P03 | Profile | Version must not already exist for this name. | `ProfileVersionExists` |
| V-P04 | Profile | All boost/penalty/gate/exclude signal references must be defined signals. | `UndefinedSignal` |
| V-P05 | Profile | All boost/exclude/candidate relationship references must be defined relationship types. | `UndefinedRelationship` |
| V-P06 | Profile | Candidate entity kind must be defined. | `UndefinedEntity` |
| V-P07 | Profile | Candidate ANN embedding slot must exist on the target entity. | `UndefinedEmbeddingSlot` |
| V-P08 | Profile | Exploration must be in [0.0, 1.0]. | `InvalidExploration` |
| V-P09 | Profile | DiversitySpec.topic_diversity must be in [0.0, 1.0] if present. | `InvalidTopicDiversity` |
| V-P10 | Profile | ProfileDecay.field must be a timestamp field on the candidate entity. | `UndefinedSignal` (reused) |
| V-C01 | Cohort | Cohort names must be globally unique. | `CohortAlreadyDefined` |
| V-C02 | Cohort | Predicate fields must exist on the User entity type. | `UndefinedCohortField` |
| V-C03 | Cohort | Predicate field types must be compatible with the operator (Eq on keyword, Gt on numeric, Contains on keywords). | `CohortFieldTypeMismatch` |
| V-C04 | Cohort | Maximum 100 cohort definitions. | `TooManyCohorts` |
| V-R01 | Relationship | Relationship names must be globally unique. | `RelationshipAlreadyDefined` |
| V-R02 | Relationship | From and To entity kinds must be defined. | `UndefinedRelationshipEntity` |
| V-R03 | Relationship | Default weight must be in [0.0, 1.0]. | `InvalidDefaultWeight` |

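
As an illustration of how one of these rules reads in code, here is a sketch of the V-E03 field-name check exactly as the table words it. `is_valid_field_name` is a hypothetical helper name; the real validator presumably returns `SchemaError::InvalidFieldName` rather than a bool.

```rust
/// V-E03: lowercase `[a-z0-9_]`, at most 64 characters, first character a letter.
pub fn is_valid_field_name(name: &str) -> bool {
    let mut chars = name.chars();
    // The first character must be a lowercase ASCII letter (this also rejects "").
    match chars.next() {
        Some(c) if c.is_ascii_lowercase() => {}
        _ => return false,
    }
    // Every accepted character is ASCII, so byte length equals character count.
    name.len() <= 64
        && chars.all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_')
}
```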
### 5.2 Cross-Object Dependency Graph

Schema objects reference each other. The validation system maintains a dependency graph to prevent orphaned references and to power impact analysis during migrations.

```
EntityDef (Item)
  ^
  |-- SignalDef (view, target: Item)
  |     ^
  |     |-- ProfileDef (for_you, boost: view.velocity(24h))
  |     |-- ProfileDef (trending, boost: view.velocity(6h))
  |
  |-- EmbeddingSlot (content, 1536D)
        ^
        |-- ProfileDef (for_you, candidate: Ann, slot: content)

EntityDef (User)
  ^
  |-- CohortDef (young_us_jazz, predicate: And(...))
  |
  |-- Field (region)
  |     ^
  |     |-- CohortDef (us_users, predicate: Eq(region, "US"))
  |
  |-- Field (inferred_interests)
        ^
        |-- CohortDef (jazz_fans, predicate: Contains(inferred_interests, "jazz"))

RelationshipDef (follows, from: User, to: Creator)
  ^
  |-- ProfileDef (following, candidate: Relationship("follows"))

RelationshipDef (blocked)
  ^
  |-- ProfileDef (for_you, exclude: Relationship("blocked"))
```

**Invariant: no dangling references.** Every signal, profile, cohort, and relationship definition references only objects that exist at definition time. The validation engine checks all references eagerly. There are no deferred reference checks.

**Invariant: no circular dependencies.** Entity definitions depend on nothing. Signal and relationship definitions depend on entity definitions. Profile definitions depend on entity definitions (candidate kind, embedding slot), signal definitions, and relationship definitions. Cohort definitions depend on User entity field definitions. This is a strict DAG with no cycles.

---

## 6. Schema Migration

### 6.1 Additive Changes (Always Safe)

These changes can be applied immediately via `define_*`, `add_fields`, or `set_profile_status`. No migration API is required.

| Change | Method | Effect on Existing Data |
|--------|--------|------------------------|
| Add new field to entity type | `add_fields` | Existing entities get `NULL` / default for the new field. Indexes are created empty and populated by background scan. |
| Add new signal type | `define_signal` | Existing entities lazily receive zeroed signal ledger on first signal write. |
| Add new ranking profile version | `define_profile` | New version coexists with old versions. No effect on existing data. |
| Add new cohort definition | `define_cohort` | Membership computed by background materializer. No effect on existing data. |
| Add new relationship type | `define_relationship` | No existing edges. Edges created on first `write_relationship` call. |
| Activate/deprecate/archive a profile | `set_profile_status` | Affects which version `get_profile(name, None)` returns; archiving also ends explicit-version queries for that version. |

**Adding fields to an entity type.** This is the most common schema change. Because an entity kind can only be defined once (V-E01), fields are appended through a dedicated method that takes the entity kind and the list of new fields:

```rust
impl TidalDB {
    /// Add fields to an existing entity type definition.
    /// Only new fields are accepted -- existing fields cannot be
    /// modified or removed via this method.
    pub fn add_fields(
        &self,
        kind: EntityKind,
        fields: Vec<Field>,
    ) -> Result<(), SchemaError>;
}
```

After `add_fields`, the new fields are available for filtering, sorting, and cohort predicates. Existing entities that have not been updated return `NULL` for the new fields. Background index population scans existing entities and builds indexes for any non-NULL values.

### 6.2 Breaking Changes (Require Migration)

These changes would invalidate existing data, indexes, or references. They cannot be applied via `define_*` methods -- attempting to do so returns `SchemaError::MigrationRequired`.

| Change | Why It Breaks | Migration Requirement |
|--------|--------------|----------------------|
| Remove entity field | Profiles, cohorts, or sorts may reference it. Indexes must be dropped. | Verify no dependents reference the field. Drop index. |
| Change field type | Index format changes. Existing values may not be representable in the new type. | Rebuild index. Validate existing values are compatible. |
| Remove signal type | Profiles may reference it as a boost/gate/penalty/exclude. | Verify no active profiles reference the signal. Mark signal as removed. |
| Change signal decay/windows | Invalidates all historical running scores and windowed aggregates. | Cannot be done. Define a new signal type instead. |
| Remove relationship type | Profiles may reference it in candidate, boost, or exclude. | Verify no active profiles reference the relationship. Delete all edges. |
| Remove cohort definition | No direct dependents, but users relying on the cohort name lose it. | Safe to remove if confirmed. |

### 6.3 Migration API

```rust
impl TidalDB {
    /// Analyze a proposed migration and return a plan.
    /// Does NOT apply any changes. The plan describes:
    /// - What objects are affected
    /// - What dependents reference the affected objects
    /// - Estimated cost (index rebuild time, storage impact)
    pub fn plan_migration(
        &self,
        migration: Migration,
    ) -> Result<MigrationPlan, SchemaError>;

    /// Apply a previously planned migration.
    /// The plan must have been generated by plan_migration() in the
    /// same schema version (the plan is invalidated if schema changes
    /// between planning and application).
    pub fn apply_migration(
        &self,
        plan: MigrationPlan,
    ) -> Result<(), SchemaError>;
}

/// A migration describes one or more breaking schema changes.
pub struct Migration {
    /// Human-readable description.
    pub description: String,
    /// The individual operations in this migration.
    pub operations: Vec<MigrationOp>,
}

/// A single migration operation.
pub enum MigrationOp {
    /// Remove a field from an entity type.
    RemoveField { kind: EntityKind, field: String },
    /// Change a field's type (requires index rebuild + value validation).
    ChangeFieldType { kind: EntityKind, field: String, new_type: FieldType },
    /// Remove a signal type definition.
    RemoveSignal { name: String },
    /// Remove a relationship type definition and all its edges.
    RemoveRelationship { name: String },
    /// Remove a cohort definition.
    RemoveCohort { name: String },
}

/// The result of analyzing a migration.
pub struct MigrationPlan {
    /// The schema version at which this plan was generated.
    /// Plan is invalidated if schema_version changes.
    schema_version: u64,
    /// Objects that will be modified or removed.
    affected_objects: Vec<String>,
    /// Active profiles, cohorts, or other objects that reference
    /// the affected objects and must be updated first.
    blocked_by: Vec<MigrationBlocker>,
    /// Estimated cost of applying this migration.
    estimated_cost: MigrationCost,
}

pub struct MigrationBlocker {
    /// The dependent object (e.g., "profile:for_you:v3").
    pub object: String,
    /// Why it blocks the migration.
    pub reason: String,
}

pub struct MigrationCost {
    /// Estimated time to rebuild affected indexes.
    pub index_rebuild_time: Duration,
    /// Number of entities that need to be scanned.
    pub entities_affected: u64,
    /// Storage that will be freed.
    pub storage_freed: u64,
}
```

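
One plausible way for `plan_migration` to populate `blocked_by` is to walk the dependency graph from the object being removed and collect everything downstream. The sketch below assumes the `DependencyGraph` shape from Section 7 (`(object_id, dependent_object_ids)` pairs); `dependents_of` and the object id strings used with it are hypothetical, not part of the specified API.

```rust
/// Same shape as `DependencyGraph` in Section 7:
/// each entry is (object_id, ids of the objects that depend on it).
pub struct DependencyGraph {
    pub edges: Vec<(String, Vec<String>)>,
}

/// Collects every object that depends on `target`, directly or transitively,
/// in discovery order. Each dependent is reported once.
pub fn dependents_of(graph: &DependencyGraph, target: &str) -> Vec<String> {
    let mut found: Vec<String> = Vec::new();
    let mut stack = vec![target.to_string()];
    while let Some(current) = stack.pop() {
        for (object, dependents) in &graph.edges {
            if object != &current {
                continue;
            }
            for dependent in dependents {
                if !found.contains(dependent) {
                    found.push(dependent.clone());
                    stack.push(dependent.clone());
                }
            }
        }
    }
    found
}
```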
**Migration workflow:**

```
1. Application defines the migration:
   let migration = Migration {
       description: "Remove deprecated 'flair' field from User".to_string(),
       operations: vec![MigrationOp::RemoveField {
           kind: EntityKind::User,
           field: "flair".to_string(),
       }],
   };

2. Application plans the migration (dry-run):
   // plan_migration takes ownership, so pass a clone to keep `migration`
   // available for step 4 (assumes Migration implements Clone).
   let plan = db.plan_migration(migration.clone())?;
   // plan.blocked_by = ["cohort:flair_users references field 'flair'"]
   // Application must remove the cohort first.

3. Application resolves blockers:
   db.apply_migration(db.plan_migration(Migration {
       description: "Remove flair_users cohort".to_string(),
       operations: vec![MigrationOp::RemoveCohort {
           name: "flair_users".to_string(),
       }],
   })?)?;

4. Application re-plans the original migration:
   let plan = db.plan_migration(migration)?;
   // plan.blocked_by = []  -- no more blockers

5. Application applies the migration:
   db.apply_migration(plan)?;
```

### 6.4 Migration Compatibility Matrix

This matrix shows which schema changes are additive (safe) vs breaking (require migration).

| Operation | Entity Fields | Signal Defs | Profiles | Cohorts | Relationships |
|-----------|:---:|:---:|:---:|:---:|:---:|
| **Add** | Safe | Safe | Safe (new version) | Safe | Safe |
| **Remove** | Migration | Migration | N/A (archive instead) | Migration | Migration |
| **Modify type** | Migration | Forbidden | N/A (new version) | Safe (predicate) | Forbidden |
| **Modify behavior** | N/A | Forbidden | N/A (new version) | Safe (refresh) | Forbidden |
| **Rename** | Migration | Forbidden | N/A (new name) | Migration | Forbidden |

"Forbidden" means the operation is not supported at all -- the application must create a new object. This applies to signal definitions and relationship definitions where the original declaration's semantics are baked into persisted data (running scores, edge weights).

---

## 7. Schema Introspection

The introspection API allows the application to discover the current schema state. All introspection methods are read-only and are served from the in-memory schema cache: they take only the cache's shared read lock (Section 10.2) and never touch disk.

```rust
impl TidalDB {
    // -- Entity introspection --

    /// List all defined entity types with their field schemas.
    pub fn list_entities(&self) -> Vec<EntityInfo>;

    /// Describe a specific entity type.
    pub fn describe_entity(&self, kind: EntityKind) -> Result<EntityInfo, SchemaError>;

    // -- Signal introspection --

    /// List all defined signal types with their decay/window config.
    pub fn list_signals(&self) -> Vec<SignalInfo>;

    /// Describe a specific signal type.
    pub fn describe_signal(&self, name: &str) -> Result<SignalInfo, SchemaError>;

    // -- Profile introspection --

    /// List all profile names with their version history and statuses.
    pub fn list_profiles(&self) -> Vec<ProfileSummary>;

    /// Describe a specific profile version. If version is None,
    /// returns the latest active version.
    pub fn describe_profile(
        &self,
        name: &str,
        version: Option<u32>,
    ) -> Result<ProfileInfo, SchemaError>;

    // -- Cohort introspection --

    /// List all cohort definitions with their membership counts.
    pub fn list_cohorts(&self) -> Vec<CohortInfo>;

    /// Describe a specific cohort with its full predicate.
    pub fn describe_cohort(&self, name: &str) -> Result<CohortInfo, SchemaError>;

    // -- Relationship introspection --

    /// List all defined relationship types.
    pub fn list_relationships(&self) -> Vec<RelationshipInfo>;

    /// Describe a specific relationship type.
    pub fn describe_relationship(&self, name: &str) -> Result<RelationshipInfo, SchemaError>;

    // -- Global schema state --

    /// Current schema version number.
    /// (Same method as in Section 4.3; repeated here for completeness.)
    pub fn schema_version(&self) -> u64;

    /// Full dependency graph of all schema objects.
    /// Useful for understanding the impact of a proposed change.
    pub fn schema_dependencies(&self) -> DependencyGraph;
}
```

### Introspection Return Types

```rust
/// Summary of an entity type definition.
pub struct EntityInfo {
    pub kind: EntityKind,
    pub fields: Vec<FieldInfo>,
    pub embedding_slots: Vec<EmbeddingSlotInfo>,
    /// Number of active (non-archived) entities of this kind.
    pub entity_count: u64,
    /// Number of signal types targeting this entity kind.
    pub signal_type_count: u32,
}

pub struct FieldInfo {
    pub name: String,
    pub field_type: FieldType,
    pub writability: Writability,
    /// Whether an index exists for this field.
    pub indexed: bool,
}

pub struct EmbeddingSlotInfo {
    pub name: String,
    pub dimensions: u32,
    pub source: EmbeddingSource,
    pub precision: EmbeddingPrecision,
    /// Number of entities with a non-null vector in this slot.
    pub populated_count: u64,
}

/// Summary of a signal type definition.
pub struct SignalInfo {
    pub name: String,
    pub target: EntityKind,
    pub decay: Decay,
    pub lambda: Option<f64>,
    pub windows: Vec<Window>,
    pub velocity: bool,
    pub durability: DurabilityLevel,
}

/// Summary of profile versions for a given name.
pub struct ProfileSummary {
    pub name: String,
    pub versions: Vec<ProfileVersionSummary>,
}

pub struct ProfileVersionSummary {
    pub version: u32,
    pub status: ProfileStatus,
    pub created_at: Timestamp,
}

/// Full profile definition with metrics.
pub struct ProfileInfo {
    pub definition: ProfileDef,
    pub status: ProfileStatus,
    pub created_at: Timestamp,
    /// Total queries executed with this profile version.
    pub query_count: u64,
    /// Average query latency for this profile version.
    pub avg_latency: Duration,
}

/// Summary of a cohort definition.
pub struct CohortInfo {
    pub name: String,
    pub predicate: Predicate,
    pub refresh: RefreshPolicy,
    /// Current membership count (as of last refresh).
    pub member_count: u64,
    /// When membership was last recomputed.
    pub last_refreshed: Timestamp,
}

/// Summary of a relationship type definition.
pub struct RelationshipInfo {
    pub name: String,
    pub from: EntityKind,
    pub to: EntityKind,
    pub weight_default: f64,
    pub decay: Option<Decay>,
    pub symmetric: bool,
    /// Total number of active edges of this type.
    pub edge_count: u64,
}

/// The full dependency graph of all schema objects.
pub struct DependencyGraph {
    /// Each entry is (object_id, Vec<dependent_object_ids>).
    pub edges: Vec<(String, Vec<String>)>,
}
```

---

## 8. Defaults and Population Priors

The database ships with sensible defaults that enable a working system before the application defines any custom profiles. These defaults are overridable -- defining a profile with the same name replaces the built-in.

### 8.1 Built-in Ranking Profiles

The following profiles are automatically available after entity and signal types are defined. They are created with `ProfileStatus::Active` and version `0` (a reserved version number for built-ins that application-defined profiles override starting at version 1).

| Profile | Candidate Strategy | Primary Signal | Sort Semantics |
|---------|-------------------|----------------|----------------|
| `for_you` | ANN over user preference vector, top_k=500 | preference match + engagement velocity | Personalized blend of semantic relevance and social proof |
| `trending` | Scan all items | `view.velocity(6h) + share.velocity(6h)` | Pure signal velocity, no personalization |
| `rising` | Scan all items | Relative velocity: `velocity(1h) / velocity(24h)`, age-boosted | Content accelerating relative to its baseline |
| `hot` | Scan all items | `score / (age_hours + 2)^1.8` | Reddit-model age decay over cumulative engagement |
| `following` | Relationship: `follows` | N/A | `created_at DESC` (pure chronological) |
| `related` | ANN over anchor item embedding, top_k=200 | Semantic similarity + collaborative filtering | Most similar content to the anchor |
| `browse` | Scan all items | `completion_rate * 0.4 + like_ratio * 0.3 + log(views) * 0.3` | Quality-weighted with reach tiebreaker |
| `search` | Hybrid text + vector, RRF(k=60) | BM25 * 0.6 + semantic_similarity * 0.4 | Relevance with quality boost |
| `controversial` | Scan all items | `sqrt(positive_signals * negative_signals)` | Maximize engagement polarity |
| `hidden_gems` | Scan all items | `completion_rate * like_ratio / log(views + 1)` | High quality, low reach |
| `notification` | Relationship: `follows`, since last_seen | `interaction_weight * item_quality` | Most important notifications first |
| `live` | Filter: `status=live` | `interaction_weight * log(viewer_count)` | Live content the user cares about |

**Override behavior.** When the application defines `for_you` version 1, the built-in version 0 is automatically archived. The application's version takes precedence. If the application archives all versions of a profile that has a built-in, the built-in is restored as the fallback.

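
Three of the scoring formulas from the Section 8.1 table, written out as a sketch. The function names are hypothetical, and two choices are assumptions rather than anything the spec states: `log` is taken to be the natural logarithm, and `hidden_gems` returns `None` at zero views, where `log(views + 1)` is 0 and the formula as written would divide by zero.

```rust
/// `hot`: score / (age_hours + 2)^1.8
pub fn hot_score(score: f64, age_hours: f64) -> f64 {
    score / (age_hours + 2.0).powf(1.8)
}

/// `controversial`: sqrt(positive_signals * negative_signals).
/// Zero whenever either side is zero, so one-sided engagement never ranks.
pub fn controversial_score(positive: f64, negative: f64) -> f64 {
    (positive * negative).sqrt()
}

/// `hidden_gems`: completion_rate * like_ratio / log(views + 1).
/// Assumption: natural log, and no score for items with zero views.
pub fn hidden_gems_score(completion_rate: f64, like_ratio: f64, views: f64) -> Option<f64> {
    let denominator = (views + 1.0).ln();
    if denominator > 0.0 {
        Some(completion_rate * like_ratio / denominator)
    } else {
        None
    }
}
```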
### 8.2 Built-in Signal Types

The database does not define signal types automatically. Signal types must be explicitly defined by the application because they determine storage layout and memory budget. However, the documentation includes a recommended set of 40+ signal types (see 03-signal-system.md Section 11) that covers the common content platform use case.

### 8.3 Population-Level Priors

These are database-maintained values that serve as defaults for cold-start entities.

| Prior | Definition | Used For |
|-------|-----------|----------|
| Population preference vector | Centroid (mean) of all active user preference vectors. Recomputed hourly by the background materializer. | New users with no signal history. Their preference vector is initialized to this centroid. |
| Default signal baselines | Per-signal-type median values across all active items. | Cold-start exploration budget calibration: a new item's signals are compared against these baselines to estimate how much exploration is needed. |
| Global engagement distribution | Distribution of engagement_level across all users (% power_user, regular, casual, dormant, new). | Cohort-scoped queries without explicit cohort: "trending globally" uses the full distribution. |

### 8.4 Cold Start Configuration

Cold start behavior is specified per ranking profile, not globally. The `exploration` field in `ProfileDef` controls how much of the result set is reserved for cold-start items.

```rust
// Profile with 10% exploration budget
ProfileDef {
    name: "for_you",
    exploration: 0.10,  // 10% of results from new/unseen content
    // ...remaining fields elided
}
```

**Exploration budget mechanics:**

1. The query executor reserves `floor(limit * exploration)` slots for exploration items.
2. Exploration candidates are items that meet ALL of:
   - Created within the last 48 hours (configurable)
   - Fewer than 1000 impressions (configurable)
   - Not hidden or blocked by the querying user
3. Exploration candidates are ranked by a simplified score: `content_similarity * freshness_bonus`. No signal-based scoring is applied, because these items have too little signal history to score reliably.
4. Exploration slots are distributed evenly through the result set (not clustered at the end).
5. As an item accumulates signals, it exits the exploration pool and competes normally.

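
Steps 1 and 4 above can be sketched as follows. This is one possible reading of "distributed evenly", not the executor's actual algorithm: the result list is split into equal strides and one exploration item is placed in the middle of each. `exploration_slots` and `interleave` are hypothetical names.

```rust
/// Step 1: floor(limit * exploration) slots are reserved for exploration.
pub fn exploration_slots(limit: usize, exploration: f64) -> usize {
    (limit as f64 * exploration).floor() as usize
}

/// Step 4 (assumed placement): one exploration item in the middle of each
/// stride of `limit / slots` positions; everything else comes from `ranked`.
pub fn interleave<T: Clone>(
    ranked: &[T],
    explore: &[T],
    limit: usize,
    exploration: f64,
) -> Vec<T> {
    // Never reserve more slots than there are candidates or positions.
    let slots = exploration_slots(limit, exploration)
        .min(explore.len())
        .min(limit);
    let stride = if slots > 0 { limit / slots } else { 0 };
    let mut out = Vec::with_capacity(limit);
    let (mut r, mut e) = (0usize, 0usize);
    for pos in 0..limit {
        // `e < slots` is false when slots == 0, so `% stride` is never
        // evaluated with a zero stride.
        if e < slots && pos % stride == stride / 2 {
            out.push(explore[e].clone());
            e += 1;
        } else if r < ranked.len() {
            out.push(ranked[r].clone());
            r += 1;
        }
    }
    out
}
```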
---

## 9. A/B Testing Support

tidalDB supports A/B testing of ranking profiles through the profile versioning system. The database does not perform traffic splitting -- that is application logic. The database provides the infrastructure: multiple active profile versions, per-version metrics, and deterministic query execution.

### 9.1 How A/B Testing Works

```rust
// The application maintains its own traffic split logic.
let profile_version = if user_in_experiment_bucket(user_id) {
    // Illustrative handle for ("for_you", version 2),
    // i.e. get_profile("for_you", Some(2)).
    "for_you_v2"
} else {
    // Unversioned lookup: resolves to the highest-versioned Active version.
    // This is v1 only while v2 is not Active; if both are Active,
    // pin the control arm with Some(1).
    "for_you"
};

let results = db.retrieve(Retrieve {
    for_user: Some(user_id),
    profile: profile_version,
    // ...remaining fields elided
})?;
```

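
Since bucketing is left to the application, here is one hedged sketch of a sticky assignment function. Everything in it is an assumption for illustration: the three-argument signature (the snippet above calls a one-argument `user_in_experiment_bucket`), the `u64` user id, and the choice of FNV-1a, which is used only because its output is stable across processes and releases, something `std`'s default hasher does not promise.

```rust
/// FNV-1a, 64-bit. Deterministic for a given byte sequence.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Sticky bucketing: the same (experiment, user) pair always lands in the
/// same arm. `percent` is the share of users (0..=100) sent to the new version.
pub fn user_in_experiment_bucket(user_id: u64, experiment: &str, percent: u64) -> bool {
    // Salting with the experiment name keeps assignments independent
    // across experiments.
    let mut key = experiment.as_bytes().to_vec();
    key.extend_from_slice(&user_id.to_le_bytes());
    fnv1a(&key) % 100 < percent
}
```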
### 9.2 Profile Metrics

The database tracks per-profile-version metrics automatically:

```rust
pub struct ProfileMetrics {
    /// Total queries executed with this profile version.
    pub query_count: u64,
    /// Latency percentiles (p50, p95, p99).
    pub latency_p50: Duration,
    pub latency_p95: Duration,
    pub latency_p99: Duration,
    /// Average number of candidates scored per query.
    pub avg_candidates_scored: f64,
    /// Average number of results returned per query.
    pub avg_results_returned: f64,
    /// When the first query was executed with this version.
    pub first_query_at: Option<Timestamp>,
    /// When the most recent query was executed.
    pub last_query_at: Option<Timestamp>,
}

impl TidalDB {
    /// Retrieve metrics for a specific profile version.
    pub fn profile_metrics(
        &self,
        name: &str,
        version: u32,
    ) -> Result<ProfileMetrics, SchemaError>;
}
```

These metrics help the application decide when to promote a new version to `Active` and deprecate the old one. The database does not make this decision -- it only provides the data.

### 9.3 What the Database Does NOT Do

- **Traffic splitting.** The application decides which user sees which profile.
- **Statistical significance testing.** The application runs its own hypothesis tests.
- **Automatic promotion.** The application calls `set_profile_status` explicitly.
- **Metric comparison.** The application queries `profile_metrics` for each version and compares.

This is a deliberate design choice. Traffic splitting and experimentation are application-domain concerns with complex requirements (random assignment, sticky bucketing, interaction effects, ramp-up schedules) that vary wildly across organizations. The database provides the building blocks; the application provides the logic.

---

## 10. Schema Storage

### 10.1 Storage Format

Schema definitions are stored in the B-tree backend (redb) under the `SCHEMA:` key prefix. This is the same backend used for entity metadata and materialized views -- read-heavy, rarely written.

```
Key Encoding:

  SCHEMA:entity:{kind}                -> serialized EntityDef
  SCHEMA:signal:{name}                -> serialized SignalDef + precomputed lambda
  SCHEMA:profile:{name}:{version}     -> serialized ProfileDef + status + metadata
  SCHEMA:cohort:{name}                -> serialized CohortDef + membership bitmap ref
  SCHEMA:relationship:{name}          -> serialized RelationshipDef
  SCHEMA:version                      -> u64 schema version counter
  SCHEMA:metrics:profile:{name}:{v}   -> serialized ProfileMetrics
```

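
The key layout can be captured in a small typed builder so that key strings are never assembled by hand. `SchemaKey` is a hypothetical type, the entity kind is passed as a plain string for simplicity, and rendering the version as unpadded decimal is an assumption: the spec does not say how `{version}` is encoded, and unpadded decimal does not sort numerically under lexicographic key order (`:10` sorts before `:2`).

```rust
/// One variant per key pattern in the encoding table.
pub enum SchemaKey<'a> {
    Entity { kind: &'a str },
    Signal { name: &'a str },
    Profile { name: &'a str, version: u32 },
    Cohort { name: &'a str },
    Relationship { name: &'a str },
    Version,
    ProfileMetrics { name: &'a str, version: u32 },
}

impl SchemaKey<'_> {
    /// Renders the key as the string stored in the B-tree.
    pub fn encode(&self) -> String {
        match self {
            SchemaKey::Entity { kind } => format!("SCHEMA:entity:{kind}"),
            SchemaKey::Signal { name } => format!("SCHEMA:signal:{name}"),
            SchemaKey::Profile { name, version } => {
                format!("SCHEMA:profile:{name}:{version}")
            }
            SchemaKey::Cohort { name } => format!("SCHEMA:cohort:{name}"),
            SchemaKey::Relationship { name } => format!("SCHEMA:relationship:{name}"),
            SchemaKey::Version => "SCHEMA:version".to_string(),
            SchemaKey::ProfileMetrics { name, version } => {
                format!("SCHEMA:metrics:profile:{name}:{version}")
            }
        }
    }
}
```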
### 10.2 In-Memory Schema Cache
|
|
|
|
On database open, all `SCHEMA:*` keys are loaded into an in-memory cache. The cache provides O(1) access to any schema object. All validation and introspection reads come from the cache, never from disk.
|
|
|
|
```rust
/// In-memory representation of the complete schema.
/// Loaded once at startup. Updated atomically on define_*() calls.
pub(crate) struct SchemaCache {
    /// Entity definitions by kind.
    entities: HashMap<EntityKind, EntityDef>,
    /// Signal definitions by name.
    signals: HashMap<String, SignalDef>,
    /// Signal type index: maps signal name to compact u8 index
    /// used in WAL events and hot-tier state.
    signal_type_ids: HashMap<String, u8>,
    /// Profile definitions by (name, version).
    profiles: HashMap<(String, u32), (ProfileDef, ProfileStatus)>,
    /// Cohort definitions by name.
    cohorts: HashMap<String, CohortDef>,
    /// Relationship definitions by name.
    relationships: HashMap<String, RelationshipDef>,
    /// Dependency graph for migration impact analysis.
    dependencies: DependencyGraph,
    /// Schema version counter.
    version: AtomicU64,
}
```

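The `DependencyGraph` type is not defined in this section. One plausible shape, sketched here purely as an assumption, is a reverse index from each schema object to the objects that reference it, which is the lookup a migration impact analysis needs. The string object IDs and method names below are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical identifier for a schema object,
/// e.g. "signal:view" or "profile:for_you:1".
type ObjectId = String;

/// Assumed shape: a reverse index from a referenced object
/// to the set of objects that reference it.
#[derive(Default)]
struct DependencyGraph {
    dependents: HashMap<ObjectId, BTreeSet<ObjectId>>,
}

impl DependencyGraph {
    /// Record that `from` references `to`
    /// (for example, a profile boosting on a signal).
    fn add_reference(&mut self, from: &str, to: &str) {
        self.dependents
            .entry(to.to_string())
            .or_default()
            .insert(from.to_string());
    }

    /// Objects that still reference `target`, in sorted order.
    /// A non-empty result means removing `target` would leave them dangling.
    fn blockers(&self, target: &str) -> Vec<ObjectId> {
        self.dependents
            .get(target)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default()
    }
}
```
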
**Cache invalidation.** When a `define_*` method succeeds:

1. The new definition is written to the B-tree backend (after the corresponding `SchemaChange` WAL record has been logged, see Section 10.3).
2. The schema cache is updated with the new definition.
3. The schema version counter is incremented (atomic).
4. Query plan caches that reference the old schema version are invalidated.

The cache update is performed under a `RwLock` (write-locked during mutation, read-locked during validation and introspection). Schema mutations are rare (minutes to hours between changes in production), so write-lock contention is negligible and read-lock acquisition for validation and introspection is effectively uncontended.

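Steps 2 and 3 of the sequence above can be sketched with standard-library primitives. This is a simplified stand-in, not the actual `SchemaCache`: it collapses the per-type maps into one map of serialized bytes, and the `Cache` type and its method names are assumptions.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Simplified stand-in for the schema cache: one map instead of one per object type.
struct Cache {
    definitions: RwLock<HashMap<String, Vec<u8>>>,
    version: AtomicU64,
}

impl Cache {
    fn new() -> Self {
        Cache {
            definitions: RwLock::new(HashMap::new()),
            version: AtomicU64::new(0),
        }
    }

    /// Steps 2 and 3: install the new definition under the write lock,
    /// then bump the version while the lock is still held.
    /// Returns the new schema version.
    fn apply(&self, key: &str, serialized: Vec<u8>) -> u64 {
        // Recover the guard even if another thread panicked while holding the lock.
        let mut defs = self.definitions.write().unwrap_or_else(|e| e.into_inner());
        defs.insert(key.to_string(), serialized);
        self.version.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Validation and introspection path: takes only the shared read lock.
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let defs = self.definitions.read().unwrap_or_else(|e| e.into_inner());
        defs.get(key).cloned()
    }

    /// Current schema version, as a plan cache would compare against (step 4).
    fn version(&self) -> u64 {
        self.version.load(Ordering::SeqCst)
    }
}
```

A query plan cache could store the version it was built against and treat any entry whose stored version differs from `version()` as stale.
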
### 10.3 WAL Logging

Every schema change is WAL-logged as a `SchemaChange` record (type `0x04`) before the B-tree write occurs. This ensures crash recovery can replay schema changes and restore the schema to a consistent state.

```
SchemaChange WAL Record Payload:

+----------+-------+-----------------------------+
| Op Type  | Name  | Serialized Definition       |
| 1 byte   | var   | var                         |
+----------+-------+-----------------------------+

Op Types:
  0x01 = DefineEntity
  0x02 = DefineSignal
  0x03 = DefineProfile
  0x04 = DefineCohort
  0x05 = DefineRelationship
  0x06 = SetProfileStatus
  0x07 = AddFields
  0x08 = ApplyMigration
```

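The payload layout can be sketched as an encoder and decoder pair. The op-type byte values come from the table above; everything else is an assumption made for illustration. In particular, the spec only says the `Name` and `Serialized Definition` fields are variable-length, so the `u32` little-endian length prefix used here is hypothetical, as are the `OpType`, `encode`, and `decode` names.

```rust
/// Op-type byte values as listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum OpType {
    DefineEntity = 0x01,
    DefineSignal = 0x02,
    DefineProfile = 0x03,
    DefineCohort = 0x04,
    DefineRelationship = 0x05,
    SetProfileStatus = 0x06,
    AddFields = 0x07,
    ApplyMigration = 0x08,
}

impl OpType {
    fn from_byte(b: u8) -> Option<Self> {
        Some(match b {
            0x01 => OpType::DefineEntity,
            0x02 => OpType::DefineSignal,
            0x03 => OpType::DefineProfile,
            0x04 => OpType::DefineCohort,
            0x05 => OpType::DefineRelationship,
            0x06 => OpType::SetProfileStatus,
            0x07 => OpType::AddFields,
            0x08 => OpType::ApplyMigration,
            _ => return None,
        })
    }
}

/// Assumed framing: each variable-length field is prefixed by a u32 LE length.
fn encode(op: OpType, name: &str, definition: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(1 + 4 + name.len() + 4 + definition.len());
    buf.push(op as u8);
    buf.extend_from_slice(&(name.len() as u32).to_le_bytes());
    buf.extend_from_slice(name.as_bytes());
    buf.extend_from_slice(&(definition.len() as u32).to_le_bytes());
    buf.extend_from_slice(definition);
    buf
}

/// Reads one length-prefixed field; returns (field, remaining bytes).
fn read_var(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let b = buf.get(..4)?;
    let len = u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize;
    let rest = &buf[4..];
    if rest.len() < len {
        return None; // truncated record
    }
    Some(rest.split_at(len))
}

/// Returns None for unknown op types, truncated input, or trailing bytes.
fn decode(buf: &[u8]) -> Option<(OpType, String, Vec<u8>)> {
    let (&op_byte, rest) = buf.split_first()?;
    let op = OpType::from_byte(op_byte)?;
    let (name, rest) = read_var(rest)?;
    let (definition, rest) = read_var(rest)?;
    if !rest.is_empty() {
        return None;
    }
    Some((op, String::from_utf8(name.to_vec()).ok()?, definition.to_vec()))
}
```

Rejecting unknown op types and truncated payloads at decode time keeps a torn final WAL record from being applied as a partial schema change.
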
**Recovery sequence.** On crash recovery, `SchemaChange` records are replayed in sequence order. The entity store, signal ledger, and other subsystems are not updated until schema recovery completes -- they depend on having a consistent schema to validate incoming replayed events.

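One reading of the recovery sequence is a two-phase replay: every `SchemaChange` record first, in ascending sequence order, then all remaining records. The sketch below illustrates only that ordering. The `WalRecord` enum, its fields, and `recovery_order` are hypothetical simplifications, not the actual WAL types.

```rust
/// Hypothetical, simplified WAL record: only the fields needed to show ordering.
#[derive(Debug, Clone, PartialEq, Eq)]
enum WalRecord {
    SchemaChange { seq: u64, name: String },
    Other { seq: u64 },
}

impl WalRecord {
    fn seq(&self) -> u64 {
        match self {
            WalRecord::SchemaChange { seq, .. } | WalRecord::Other { seq } => *seq,
        }
    }

    fn is_schema_change(&self) -> bool {
        matches!(self, WalRecord::SchemaChange { .. })
    }
}

/// Returns records in replay order: schema changes first (ascending seq),
/// then all other records (ascending seq).
fn recovery_order(mut records: Vec<WalRecord>) -> Vec<WalRecord> {
    // `false` sorts before `true`, so schema changes come first.
    records.sort_by_key(|r| (!r.is_schema_change(), r.seq()));
    records
}
```
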
---

## 11. Example: Video Platform Schema

A complete schema definition for a video streaming platform, demonstrating all five object types. This example produces a working database that supports all 14 use cases from USE_CASES.md.

```rust
use tidaldb::{TidalDB, Config};
use tidaldb::schema::*;
use std::time::Duration;

fn define_video_platform_schema(db: &TidalDB) -> Result<(), SchemaError> {
    // =====================================================================
    // 1. ENTITY TYPES
    // =====================================================================

    db.define_entity(EntityDef {
        kind: EntityKind::Item,
        metadata_fields: vec![
            // Text fields (BM25 full-text indexed)
            Field::text("title"),
            Field::text("description"),
            // Keyword fields (exact match, filterable)
            Field::keyword("category"),
            Field::keywords("tags"),
            Field::keyword("format"),         // video, short, live, podcast
            Field::keyword("language"),
            Field::keyword("content_rating"), // G, PG, PG-13, R
            Field::keyword("status"),         // published, live, scheduled
            Field::keyword("availability"),   // free, premium
            // Numeric
            Field::i64("award_count"),
            // Boolean
            Field::bool("has_subtitles"),
            Field::bool("downloadable"),
            Field::bool("safe_search"),
            // Duration
            Field::duration("duration"),
            // Timestamps
            Field::timestamp("created_at"),
            Field::timestamp("updated_at"),
        ],
        embedding: EmbeddingDef {
            slots: vec![
                EmbeddingSlot {
                    name: "content".to_string(),
                    dimensions: 1536,
                    source: EmbeddingSource::External,
                    precision: EmbeddingPrecision::F16,
                },
            ],
        },
    })?;

    db.define_entity(EntityDef {
        kind: EntityKind::User,
        metadata_fields: vec![
            // Application-set
            Field::keyword("locale"),
            Field::keyword("language"),
            Field::keyword("region"),
            Field::keyword("age_range"),
            Field::keyword("account_type"),
            Field::keywords("explicit_interests"),
            // Database-computed
            Field::computed("inferred_interests", FieldType::Keywords),
            Field::computed("engagement_level", FieldType::Keyword),
            Field::computed("content_format_preference", FieldType::Keyword),
            Field::computed("platform_tenure_days", FieldType::I64),
            Field::computed("followed_creator_count", FieldType::I64),
        ],
        embedding: EmbeddingDef {
            slots: vec![
                EmbeddingSlot {
                    name: "preference".to_string(),
                    dimensions: 1536,
                    source: EmbeddingSource::DatabaseManaged,
                    precision: EmbeddingPrecision::F16,
                },
            ],
        },
    })?;

    db.define_entity(EntityDef {
        kind: EntityKind::Creator,
        metadata_fields: vec![
            Field::text("name"),
            Field::keyword("handle"),
            Field::keyword("language"),
            Field::keyword("region"),
            Field::keywords("categories"),
            Field::bool("verified"),
            // Database-computed
            Field::computed("follower_count", FieldType::I64),
            Field::computed("total_items", FieldType::I64),
            Field::computed("avg_engagement_rate", FieldType::F64),
        ],
        embedding: EmbeddingDef {
            slots: vec![
                EmbeddingSlot {
                    name: "catalog".to_string(),
                    dimensions: 1536,
                    source: EmbeddingSource::DatabaseManaged,
                    precision: EmbeddingPrecision::F16,
                },
            ],
        },
    })?;

    // =====================================================================
    // 2. SIGNAL TYPES
    // =====================================================================

    // -- Positive engagement signals --

    db.define_signal(SignalDef {
        name: "view".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(7 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(24),
            Window::days(7),
            Window::days(30),
            Window::all_time(),
        ],
        velocity: true,
        durability: None, // default: Batched
    })?;

    db.define_signal(SignalDef {
        name: "like".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(7 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(24),
            Window::days(7),
            Window::all_time(),
        ],
        velocity: true,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "share".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(3 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(24),
            Window::days(7),
        ],
        velocity: true,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "comment".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(3 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(24),
            Window::days(7),
            Window::all_time(),
        ],
        velocity: true,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "save".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(7 * 86400) },
        windows: vec![Window::hours(24), Window::days(7), Window::all_time()],
        velocity: false,
        durability: None,
    })?;

    // -- Quality signals --

    db.define_signal(SignalDef {
        name: "completion".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(30 * 86400) },
        windows: vec![Window::all_time()],
        velocity: false,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "dwell_time".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(3 * 86400) },
        windows: vec![Window::hours(24), Window::days(7)],
        velocity: false,
        durability: Some(DurabilityLevel::Eventual),
    })?;

    db.define_signal(SignalDef {
        name: "impression".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(86400) },
        windows: vec![Window::hours(1), Window::hours(24)],
        velocity: false,
        durability: Some(DurabilityLevel::Eventual),
    })?;

    // -- Negative engagement signals --

    db.define_signal(SignalDef {
        name: "skip".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(86400) },
        windows: vec![Window::hours(1), Window::hours(24)],
        velocity: false,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "hide".to_string(),
        target: EntityKind::Item,
        decay: Decay::Permanent,
        windows: vec![],
        velocity: false,
        durability: Some(DurabilityLevel::Immediate),
    })?;

    db.define_signal(SignalDef {
        name: "dislike".to_string(),
        target: EntityKind::Item,
        decay: Decay::Exponential { half_life: Duration::from_secs(7 * 86400) },
        windows: vec![
            Window::hours(1),
            Window::hours(24),
            Window::days(7),
            Window::all_time(),
        ],
        velocity: true,
        durability: None,
    })?;

    db.define_signal(SignalDef {
        name: "report".to_string(),
        target: EntityKind::Item,
        decay: Decay::Permanent,
        windows: vec![Window::all_time()],
        velocity: false,
        durability: Some(DurabilityLevel::Immediate),
    })?;

    // =====================================================================
    // 3. RELATIONSHIP TYPES
    // =====================================================================

    db.define_relationship(RelationshipDef {
        name: "follows".to_string(),
        from: EntityKind::User,
        to: EntityKind::Creator,
        weight_default: 1.0,
        decay: None,
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "blocked".to_string(),
        from: EntityKind::User,
        to: EntityKind::Creator,
        weight_default: 1.0,
        decay: None,
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "muted".to_string(),
        from: EntityKind::User,
        to: EntityKind::Creator,
        weight_default: 1.0,
        decay: None,
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "saved".to_string(),
        from: EntityKind::User,
        to: EntityKind::Item,
        weight_default: 1.0,
        decay: None,
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "interaction_weight".to_string(),
        from: EntityKind::User,
        to: EntityKind::Creator,
        weight_default: 0.0,
        decay: Some(Decay::Exponential {
            half_life: Duration::from_secs(30 * 86400),
        }),
        symmetric: false,
    })?;

    db.define_relationship(RelationshipDef {
        name: "similarity".to_string(),
        from: EntityKind::Item,
        to: EntityKind::Item,
        weight_default: 0.0,
        decay: None, // recomputed periodically, not decayed
        symmetric: true,
    })?;

    // =====================================================================
    // 4. RANKING PROFILES
    // =====================================================================

    // -- Personalized feed --
    db.define_profile(ProfileDef {
        name: "for_you".to_string(),
        version: 1,
        candidate: Candidate::Ann {
            query_vector: VectorSource::UserPreference,
            index: EntityKind::Item,
            embedding_slot: Some("content".to_string()),
            top_k: 500,
        },
        boosts: vec![
            Boost::signal("view", Window::hours(24), SignalMode::Velocity, 0.3),
            Boost::relationship("interaction_weight", 0.2),
            Boost::social_proof(0.15),
        ],
        decay: Some(ProfileDecay {
            field: "created_at".to_string(),
            half_life: Duration::from_secs(48 * 3600),
        }),
        gates: vec![
            Gate::min("completion", Window::all_time(), 0.3),
        ],
        penalties: vec![
            Penalty::signal("skip", Window::hours(24), -0.5),
        ],
        excludes: vec![
            Exclude::signal("hide"),
            Exclude::relationship("blocked"),
        ],
        diversity: Some(DiversitySpec {
            max_per_creator: Some(2),
            format_mix: true,
            topic_diversity: None,
        }),
        exploration: 0.10,
        sort: None,
    })?;
    db.set_profile_status("for_you", 1, ProfileStatus::Active)?;

    // -- Trending --
    db.define_profile(ProfileDef {
        name: "trending".to_string(),
        version: 1,
        candidate: Candidate::Scan { entity: EntityKind::Item },
        boosts: vec![
            Boost::signal("share", Window::hours(6), SignalMode::Velocity, 0.5),
            Boost::signal("view", Window::hours(6), SignalMode::Velocity, 0.3),
            Boost::signal("view", Window::hours(24), SignalMode::UniqueRatio, 0.2),
        ],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![],
        diversity: Some(DiversitySpec {
            max_per_creator: Some(1),
            format_mix: false,
            topic_diversity: None,
        }),
        exploration: 0.0,
        sort: None,
    })?;
    db.set_profile_status("trending", 1, ProfileStatus::Active)?;

    // -- Following feed --
    db.define_profile(ProfileDef {
        name: "following".to_string(),
        version: 1,
        candidate: Candidate::Relationship { edge: "follows".to_string() },
        boosts: vec![],
        decay: None,
        gates: vec![],
        penalties: vec![],
        excludes: vec![
            Exclude::relationship("blocked"),
        ],
        diversity: None,
        exploration: 0.0,
        sort: Some(Sort::New),
    })?;
    db.set_profile_status("following", 1, ProfileStatus::Active)?;

    // -- Search --
    db.define_profile(ProfileDef {
        name: "search".to_string(),
        version: 1,
        candidate: Candidate::Hybrid {
            text_weight: 0.6,
            vector_weight: 0.4,
            fusion: Fusion::Rrf { k: 60 },
        },
        boosts: vec![
            Boost::signal("completion", Window::all_time(), SignalMode::Value, 0.15),
            Boost::signal("like", Window::all_time(), SignalMode::Ratio, 0.10),
        ],
        decay: Some(ProfileDecay {
            field: "created_at".to_string(),
            half_life: Duration::from_secs(90 * 86400),
        }),
        gates: vec![],
        penalties: vec![],
        excludes: vec![
            Exclude::signal("hide"),
            Exclude::relationship("blocked"),
        ],
        diversity: Some(DiversitySpec {
            max_per_creator: Some(2),
            format_mix: false,
            topic_diversity: None,
        }),
        exploration: 0.0,
        sort: None,
    })?;
    db.set_profile_status("search", 1, ProfileStatus::Active)?;

    // -- Hidden gems --
    db.define_profile(ProfileDef {
        name: "hidden_gems".to_string(),
        version: 1,
        candidate: Candidate::Scan { entity: EntityKind::Item },
        boosts: vec![
            Boost::signal("completion", Window::all_time(), SignalMode::Value, 0.4),
            Boost::signal("like", Window::all_time(), SignalMode::Ratio, 0.3),
        ],
        decay: Some(ProfileDecay {
            field: "created_at".to_string(),
            half_life: Duration::from_secs(30 * 86400),
        }),
        gates: vec![
            Gate::min("completion", Window::all_time(), 0.6),
            Gate::min("view", Window::all_time(), 10.0),
        ],
        penalties: vec![
            // Penalize high-reach content (inverse reach scoring)
            Penalty::signal("view", Window::all_time(), -0.3),
        ],
        excludes: vec![
            Exclude::signal("hide"),
            Exclude::relationship("blocked"),
        ],
        diversity: Some(DiversitySpec {
            max_per_creator: Some(1),
            format_mix: true,
            topic_diversity: Some(0.7),
        }),
        exploration: 0.0,
        sort: None,
    })?;
    db.set_profile_status("hidden_gems", 1, ProfileStatus::Active)?;

    // =====================================================================
    // 5. COHORT DEFINITIONS
    // =====================================================================

    db.define_cohort(CohortDef {
        name: "us_young_jazz".to_string(),
        predicate: Predicate::And(vec![
            Predicate::Eq("region".to_string(), PredicateValue::String("US".to_string())),
            Predicate::Eq("age_range".to_string(), PredicateValue::String("18-24".to_string())),
            Predicate::Or(vec![
                Predicate::Contains("explicit_interests".to_string(), "jazz".to_string()),
                Predicate::Contains("inferred_interests".to_string(), "jazz".to_string()),
            ]),
        ]),
        refresh: RefreshPolicy::Hourly,
    })?;

    db.define_cohort(CohortDef {
        name: "power_users".to_string(),
        predicate: Predicate::Eq(
            "engagement_level".to_string(),
            PredicateValue::String("power_user".to_string()),
        ),
        refresh: RefreshPolicy::Hourly,
    })?;

    db.define_cohort(CohortDef {
        name: "new_users".to_string(),
        predicate: Predicate::And(vec![
            Predicate::Eq(
                "engagement_level".to_string(),
                PredicateValue::String("new".to_string()),
            ),
            Predicate::Lt("platform_tenure_days".to_string(), 30.0),
        ]),
        refresh: RefreshPolicy::Hourly,
    })?;

    Ok(())
}
```

**What this schema enables:**

After defining this schema, the application can execute all of these queries without any additional configuration:

```rust
// Personalized For You feed
db.retrieve(Retrieve { profile: "for_you", for_user: Some("user_123"), .. })?;

// Global trending
db.retrieve(Retrieve { profile: "trending", .. })?;

// Trending in jazz category
db.retrieve(Retrieve {
    profile: "trending",
    filters: vec![Filter::eq("category", "jazz")],
    ..
})?;

// Trending among US users aged 18-24 who like jazz
db.retrieve(Retrieve {
    profile: "trending",
    for_cohort: Some("us_young_jazz"),
    ..
})?;

// Following feed (chronological)
db.retrieve(Retrieve {
    profile: "following",
    for_user: Some("user_123"),
    ..
})?;

// Search with hybrid text + vector
db.search(Search {
    query: "jazz piano tutorial",
    vector: Some(&query_embedding),
    profile: "search",
    for_user: Some("user_123"),
    ..
})?;

// Hidden gems in the last 30 days
db.retrieve(Retrieve {
    profile: "hidden_gems",
    filters: vec![Filter::created_within(Duration::from_secs(30 * 86400))],
    ..
})?;
```

---

## 12. Invariants and Correctness Guarantees

These invariants must hold at all times. They are encoded as property tests, assertions, and crash recovery tests.

### Schema Integrity Invariants

**INV-SCH-1: No dangling references.** Every signal, profile, cohort, and relationship definition references only objects that exist at the time of definition. Formally: for every reference `R` in a schema object `O`, the referenced object exists in the schema when `O` is defined. No lazy or deferred reference resolution.

**INV-SCH-2: No orphaned dependents.** A schema object referenced by another schema object cannot be removed unless the referencing object is removed first. The migration API enforces this via the `blocked_by` field in `MigrationPlan`.

**INV-SCH-3: Signal immutability.** Once a signal definition is committed, its `name`, `target`, `decay`, `windows`, and `velocity` fields cannot be changed. Any attempt returns `SchemaError::SignalImmutable`.

**INV-SCH-4: Profile version monotonicity.** For a given profile name, version numbers are strictly increasing. If versions 1, 2, 3 exist, the next must be 4 or greater.

**INV-SCH-5: Schema cache consistency.** The in-memory schema cache is always consistent with the B-tree storage. Formally: `cache.get(key) == btree.get(key)` for all `SCHEMA:*` keys, at all times after database open completes.

**INV-SCH-6: WAL recoverability.** After crash recovery, the schema state is identical to the state before the crash. All `SchemaChange` WAL records are replayed in order, and the resulting schema matches the pre-crash schema.

**INV-SCH-7: Computed field write rejection.** Any attempt to write a `DbComputed` or `DbManaged` field via the write API returns `SchemaError::ComputedFieldWrite`. The database never silently ignores a computed field write.

**INV-SCH-8: Validation completeness.** Every validation rule in Section 5 is checked for every definition. A definition that passes all rules is guaranteed to produce a consistent schema state. A definition that fails any rule is rejected without side effects (no partial writes).

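As an illustration of INV-SCH-4, the version check amounts to comparing a proposed version against the highest one already stored for that profile name. This is a minimal sketch under assumptions: `check_profile_version` and the `VersionError` type are hypothetical, since the actual `SchemaError` variant for this case is not named in this section.

```rust
/// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum VersionError {
    NotMonotonic { highest_existing: u32, proposed: u32 },
}

/// Accepts `proposed` only if it is strictly greater than every existing
/// version of the same profile. Gaps are allowed (1, 2, 3 may be followed by 7).
fn check_profile_version(existing: &[u32], proposed: u32) -> Result<(), VersionError> {
    match existing.iter().copied().max() {
        Some(highest) if proposed <= highest => Err(VersionError::NotMonotonic {
            highest_existing: highest,
            proposed,
        }),
        // Either no versions exist yet, or `proposed` exceeds all of them.
        _ => Ok(()),
    }
}
```
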
### Property Tests

```rust
// P1: Schema operations are atomic -- a failed define_* has no side effects.
proptest! {
    #[test]
    fn failed_define_no_side_effects(
        def in arb_invalid_signal_def(),
    ) {
        let db = TidalDB::open(test_config())?;
        let version_before = db.schema_version();
        let _ = db.define_signal(def); // expected to fail
        let version_after = db.schema_version();
        prop_assert_eq!(version_before, version_after);
    }
}

// P2: Profile version ordering is maintained.
proptest! {
    #[test]
    fn profile_versions_strictly_increasing(
        versions in prop::collection::vec(1u32..100, 1..20),
    ) {
        let db = TidalDB::open(test_config())?;
        setup_base_schema(&db)?;
        let mut sorted = versions.clone();
        sorted.sort();
        sorted.dedup();
        for &v in &sorted {
            let result = db.define_profile(make_profile("test", v));
            prop_assert!(result.is_ok());
        }
        // Verify versions are stored in order
        let summary = db.list_profiles();
        let stored_versions: Vec<u32> = summary.iter()
            .find(|p| p.name == "test")
            .unwrap()
            .versions.iter()
            .map(|v| v.version)
            .collect();
        prop_assert_eq!(stored_versions, sorted);
    }
}

// P3: Schema survives crash at any point during define_*.
proptest! {
    #[test]
    fn schema_crash_recovery(
        defs in arb_schema_definition_sequence(1..50),
        crash_point in 0usize..50,
    ) {
        let (wal, expected_schema) = execute_defs_with_crash(&defs, crash_point);
        let recovered_schema = replay_schema_from_wal(wal);
        prop_assert_eq!(expected_schema, recovered_schema);
    }
}

// P4: Validation rejects all invalid states.
proptest! {
    #[test]
    fn validation_rejects_invalid_references(
        signal_name in "[a-z]{1,10}",
    ) {
        let db = TidalDB::open(test_config())?;
        // No entity types defined -- signal should fail validation
        let result = db.define_signal(SignalDef {
            name: signal_name,
            target: EntityKind::Item,
            decay: Decay::Permanent,
            windows: vec![],
            velocity: false,
            durability: None,
        });
        prop_assert!(matches!(result, Err(SchemaError::UndefinedTargetEntity { .. })));
    }
}

// P5: Migration blockers are complete -- no migration succeeds
// that would leave a dangling reference.
proptest! {
    #[test]
    fn migration_blockers_complete(
        schema in arb_complete_schema(),
        removal in arb_removal_from_schema(),
    ) {
        // `open_db_with_schema` is a hypothetical test helper: it opens a
        // fresh database and applies every definition in `schema`.
        let db = open_db_with_schema(&schema)?;
        let plan = db.plan_migration(removal.clone())?;
        if plan.blocked_by.is_empty() {
            // Migration should succeed without creating dangling refs
            db.apply_migration(plan)?;
            assert_no_dangling_references(&db);
        } else {
            // Migration should be blocked
            // Verify each blocker is a real dependency
            for blocker in &plan.blocked_by {
                prop_assert!(schema_references(&db, &blocker.object, &removal));
            }
        }
    }
}
```

---

## Appendix A: Glossary

| Term | Definition |
|------|------------|
| **Schema** | The complete set of entity, signal, profile, cohort, and relationship definitions that describe the structure and behavior of a tidalDB instance. |
| **Entity Definition** | Declaration of an entity kind's metadata fields and embedding slots. |
| **Signal Definition** | Immutable declaration of a signal type's decay, windowing, and velocity behavior. |
| **Ranking Profile** | Versioned, named scoring function combining candidate generation, boosts, gates, penalties, excludes, and diversity constraints. |
| **Cohort** | A named user segment defined by a predicate over user entity fields. |
| **Profile Version** | A specific numbered iteration of a ranking profile. Multiple versions can coexist. |
| **Profile Lifecycle** | The four-state progression: Draft -> Active -> Deprecated -> Archived. |
| **Additive Change** | A schema modification that does not invalidate existing data (add field, add signal, new profile version). Always safe. |
| **Breaking Change** | A schema modification that would invalidate existing data or references (remove field, change type). Requires the migration API. |
| **Migration Plan** | The result of analyzing a proposed breaking change: affected objects, blockers, and estimated cost. |
| **Schema Version** | A monotonically increasing counter incremented on every schema change. Used for cache invalidation. |
| **Lambda** | The precomputed decay rate constant: `ln(2) / half_life_seconds`. Stored alongside signal definitions. |
| **Exploration Budget** | The fraction of query results reserved for cold-start items. Declared per ranking profile. |
| **Population Prior** | Database-maintained default values (preference centroid, signal baselines) used for cold-start entities. |

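The **Lambda** entry can be made concrete with a short worked example. The formula `ln(2) / half_life_seconds` is from the glossary; the sketch additionally assumes the standard exponential decay form `value * exp(-lambda * elapsed_seconds)`, which this section does not spell out. The function names are hypothetical.

```rust
use std::time::Duration;

/// Decay rate constant as defined in the glossary: ln(2) / half_life_seconds.
fn lambda(half_life: Duration) -> f64 {
    std::f64::consts::LN_2 / half_life.as_secs_f64()
}

/// Assumed decay form: value * exp(-lambda * elapsed_seconds).
/// After exactly one half-life the value is halved.
fn decayed(value: f64, half_life: Duration, elapsed: Duration) -> f64 {
    value * (-lambda(half_life) * elapsed.as_secs_f64()).exp()
}
```

With the 7-day half-life used by the `view` signal in Section 11, a contribution of 100 decays to 50 after 7 days and to 25 after 14 days. Precomputing lambda once per signal definition avoids recomputing the division on every read.
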
## Appendix B: References

1. thoughts.md -- Stage 3 insight: "Schema encodes behavior, not just shape."
2. VISION.md -- Design principles: temporal decay as a type, ranking profiles as data.
3. API.md -- Schema definition API surface and examples.
4. 02-entity-model.md -- Entity type definitions, field types, writability model.
5. 03-signal-system.md -- Signal type declarations, decay computation, windowed aggregation.
6. 04-relationships.md -- Relationship edge types, weight update mechanics.
7. CODING_GUIDELINES.md -- Error handling (`Result<T, E>` everywhere), trait abstraction, module boundaries.
8. Ousterhout, J. "A Philosophy of Software Design." -- Deep modules, small interfaces.