# 00 -- Architecture Overview

**Status:** Draft
**Author:** tidalDB Engineering
**Date:** 2026-02-20
**Purpose:** Show how the 14 specs connect. The forest before the trees.

---

## 1. Core Insight

The WAL is the single event stream. Everything else is a materialized view.

The signal ledger is a materialized view over signal events. The user preference vector is a materialized view over signal events weighted by item embeddings. The relationship weight between a user and a creator is a materialized view over interaction signals. The cohort-scoped trending counter is a materialized view over signal events filtered by user attributes.

This is not a metaphor. The WAL (spec 01) records every mutation: signal events, entity writes, relationship writes, schema changes. After a record is durable in the WAL, downstream materializers consume it and update their derived state. If any materializer's state is lost, it is rebuilt by replaying the WAL from the last checkpoint. The WAL is truth. Everything else is cache.
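Here is a minimal sketch of that relationship, under loudly simplified assumptions. A hypothetical `Event` record with an integer weight stands in for the spec 01 record format, and `ItemTotals` is a toy per-item view. The view is a left fold over the log. Restoring a checkpoint and replaying only the events after its sequence number yields the same state as replaying the whole log.

```rust
use std::collections::HashMap;

/// Toy WAL record (hypothetical; the real format is defined in spec 01).
#[derive(Clone, Debug)]
pub struct Event {
    pub seqno: u64,
    pub item: u32,
    pub weight: i64,
}

/// Toy materialized view: summed signal weight per item.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ItemTotals {
    pub totals: HashMap<u32, i64>,
    /// Highest sequence number folded into this view (the checkpoint position).
    pub applied_through: u64,
}

impl ItemTotals {
    /// Fold one event into the view.
    pub fn apply(&mut self, e: &Event) {
        *self.totals.entry(e.item).or_insert(0) += e.weight;
        self.applied_through = self.applied_through.max(e.seqno);
    }
}

/// Rebuild a view from a checkpoint plus the WAL tail after it.
pub fn recover(checkpoint: ItemTotals, wal: &[Event]) -> ItemTotals {
    let start = checkpoint.applied_through;
    let mut view = checkpoint;
    for e in wal.iter().filter(|e| e.seqno > start) {
        view.apply(e);
    }
    view
}
```

Passing `ItemTotals::default()` as the checkpoint is the "state lost entirely" case: the whole log is replayed.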

The existing specs already embody this pattern -- spec 03 Section 3 says "immutable events, mutable aggregates," spec 10 Section 2 shows a single signal event updating six subsystems, spec 01 says "the WAL is the source of truth; everything else is derived state." The architecture overview names the pattern explicitly and shows how the 14 specs are instances of it.

---

## 2. System Diagram

```
                              APPLICATION
                                   |
             db.signal() / db.write_item() / db.retrieve()
                                   |
                 +-----------------+-----------------+
                 |                                   |
            WRITE PATH                           READ PATH
                 |                                   |
                 v                                   v
       +-------------------+               +-------------------+
       |        WAL        |               |   QUERY ENGINE    |
       | (append-only log) |               |     (spec 08)     |
       |      spec 01      |               +---------+---------+
       +---------+---------+                         |
                 |                                   |
                 v                                   |
 +-------------------------------+                   |
 |     MATERIALIZER REGISTRY     |                   |
 |  fans out each event to all   |                   |
 |  registered materializers     |                   |
 +---------------+---------------+                   |
                 |                                   |
                 v                                   |
 +-------------------------------+                   |
 | GlobalSignalMaterializer      |                   |
 | UserPreferenceMaterializer    |                   |
 | RelationshipWeightMaterializer|                   |
 | CohortSignalMaterializer      |                   |
 | UserStateMaterializer         |                   |
 +---------------+---------------+                   |
                 | writes                            | reads from
                 v                                   v
 +-------------------------------------------------------------------+
 |                        MATERIALIZED STATE                         |
 |  Signal Ledger (hot/warm)          Entity Store (redb)            |
 |  Rel. Graph (redb)                 User State (redb)              |
 |  Cohort Counters (fjall)                                          |
 +-------------------------------------------------------------------+
 |                         SECONDARY INDEXES                         |
 |  Tantivy Text Index (spec 06)      USearch Vector Index (spec 07) |
 |  Roaring Bitmap Filters (spec 08)  Cohort Rollup Tables (spec 05) |
 +-------------------------------------------------------------------+
```

Write path: event arrives, WAL appends, materializer registry fans out to all registered materializers. Each materializer updates its scoped state.

Read path: query engine reads from materialized state (signal ledger for scores, entity store for metadata, indexes for retrieval, cohort counters for scoped trending). No materializer is invoked on the read path. Reads never touch the WAL.

---

## 3. Materializer Trait

The materializer is the core abstraction boundary between the event stream and derived state. Every piece of state that a query reads -- signal scores, preference vectors, relationship weights, cohort counters, user-item state -- is produced by a materializer.

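```rust
use std::io::{Read, Write};

// WalEvent (wal/record.rs), CohortId (cohort/mod.rs), UserId, EntityId, and the
// crate-level Result alias are defined elsewhere in the crate.

/// The scope at which a materializer operates.
/// Determines what subset of events it processes and what key space it writes to.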
pub enum Scope {
    /// All events. Global signal counters, global trending.
    Global,
    /// Events from users in a specific cohort. Cohort-scoped trending.
    Cohort(CohortId),
    /// Events involving a specific user. Preference vectors, user-item state.
    User(UserId),
    /// Events between two entities. Interaction weights, engagement affinity.
    Relationship(EntityId, EntityId),
}

/// A materializer consumes WAL events and produces derived state.
///
/// Implementations:
///   GlobalSignalMaterializer  -- hot-tier decay scores, windowed counters (M1)
///   UserPreferenceMaterializer -- preference vector shifts (M3)
///   RelationshipWeightMaterializer -- interaction weights, engagement affinity (M3)
///   CohortSignalMaterializer  -- dimensional rollup counters (M4)
///   UserStateMaterializer     -- seen/liked/saved/hidden bitmaps (M3)
pub trait Materializer: Send + Sync {
    /// Process a single WAL event. Called by the registry for every event
    /// after WAL durability is confirmed.
    ///
    /// Implementations must be idempotent: replaying the same event twice
    /// must produce the same state as processing it once.
    fn on_event(&self, event: &WalEvent) -> Result<()>;

    /// Write current state to a checkpoint. Called periodically by the
    /// background checkpoint task. After a successful checkpoint, the WAL
    /// segments before the checkpoint sequence number are eligible for cleanup.
    fn checkpoint(&self, writer: &mut dyn Write) -> Result<()>;

    /// Restore state from a checkpoint. Called during crash recovery
    /// before WAL replay begins. After restore, the materializer's state
    /// matches the checkpoint. WAL events after the checkpoint sequence
    /// number are then replayed via on_event().
    fn restore(&self, reader: &mut dyn Read) -> Result<()>;
}

/// The registry holds all active materializers and fans out events.
pub struct MaterializerRegistry {
    materializers: Vec<Box<dyn Materializer>>,
}

impl MaterializerRegistry {
    /// Fan out a single event to all registered materializers.
    /// Called after WAL append confirms durability.
    pub fn on_event(&self, event: &WalEvent) -> Result<()> {
        for m in &self.materializers {
            m.on_event(event)?;
        }
        Ok(())
    }
}
```

The trait is small by design. Three methods. Each materializer owns its scope, its storage, and its invariants. The registry is a fan-out mechanism, nothing more.
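To make the contract concrete, here is a minimal, self-contained sketch of one implementation. It redeclares a simplified copy of the trait over `std::io::Result`. It also assumes a hypothetical two-field `WalEvent`, a hypothetical `LikeCounter` materializer, and a hypothetical `register` method on the registry. None of these are the crate's actual definitions. Idempotency comes from a sequence-number high-water mark: an event at or below the last applied `seqno` is ignored, so replay after `restore()` cannot double-count. This assumes events arrive in WAL order.

```rust
use std::io::{self, Read, Write};
use std::sync::Mutex;

/// Simplified stand-in for the spec's WalEvent (hypothetical fields).
pub struct WalEvent {
    pub seqno: u64,
    pub signal: &'static str,
}

pub trait Materializer: Send + Sync {
    fn on_event(&self, event: &WalEvent) -> io::Result<()>;
    fn checkpoint(&self, writer: &mut dyn Write) -> io::Result<()>;
    fn restore(&self, reader: &mut dyn Read) -> io::Result<()>;
}

/// Hypothetical materializer that counts "like" events.
#[derive(Default)]
pub struct LikeCounter {
    // (applied_through seqno, like count). A Mutex keeps the sketch short;
    // the real hot path uses atomics (spec 13).
    state: Mutex<(u64, u64)>,
}

impl LikeCounter {
    pub fn count(&self) -> u64 {
        self.state.lock().unwrap().1
    }
}

impl Materializer for LikeCounter {
    fn on_event(&self, event: &WalEvent) -> io::Result<()> {
        let mut s = self.state.lock().unwrap();
        // Idempotency: anything at or below the high-water mark was already applied.
        if event.seqno <= s.0 {
            return Ok(());
        }
        s.0 = event.seqno;
        if event.signal == "like" {
            s.1 += 1;
        }
        Ok(())
    }

    fn checkpoint(&self, writer: &mut dyn Write) -> io::Result<()> {
        let s = self.state.lock().unwrap();
        writer.write_all(&s.0.to_le_bytes())?; // checkpoint seqno
        writer.write_all(&s.1.to_le_bytes())?; // counter value
        Ok(())
    }

    fn restore(&self, reader: &mut dyn Read) -> io::Result<()> {
        let mut buf = [0u8; 8];
        reader.read_exact(&mut buf)?;
        let seqno = u64::from_le_bytes(buf);
        reader.read_exact(&mut buf)?;
        let count = u64::from_le_bytes(buf);
        *self.state.lock().unwrap() = (seqno, count);
        Ok(())
    }
}

#[derive(Default)]
pub struct MaterializerRegistry {
    materializers: Vec<Box<dyn Materializer>>,
}

impl MaterializerRegistry {
    /// Hypothetical registration hook, called once per materializer at startup.
    pub fn register(&mut self, m: Box<dyn Materializer>) {
        self.materializers.push(m);
    }

    /// Fan out one event to every registered materializer.
    pub fn on_event(&self, event: &WalEvent) -> io::Result<()> {
        for m in &self.materializers {
            m.on_event(event)?;
        }
        Ok(())
    }
}
```

The checkpoint format (two little-endian `u64`s) is an arbitrary choice for the sketch. What matters is that the checkpoint records the sequence number it covers, so recovery knows where WAL replay must resume.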

This is a small (S-complexity) addition in M1 that prevents a medium (M-complexity) refactor later. The `GlobalSignalMaterializer` is the first implementation. `UserPreferenceMaterializer`, `RelationshipWeightMaterializer`, and `UserStateMaterializer` arrive in M3. `CohortSignalMaterializer` arrives in M4. The trait boundary means each can be developed and tested in isolation.

---

## 4. Spec Map

Every spec has a role in the data flow. Some define what goes into the event stream. Some define materializers that consume the stream. Some define how the query engine reads materialized state. Some are cross-cutting.

| Spec | Name | Role in Data Flow | Category |
|------|------|-------------------|----------|
| 01 | Storage Engine | WAL format, segment lifecycle, crash recovery, dual-backend (fjall + redb) | **Event Stream** |
| 02 | Entity Model | Entity write events in WAL, entity store as materialized state in redb | **Event Stream + Materialized View** |
| 03 | Signal System | Signal events in WAL, three-tier signal ledger as materialized view, cohort dimensional rollups as materialized views | **Materialized View** (primary) |
| 04 | Relationships | Relationship write events in WAL, edge store as materialized state, implicit edges updated by signal materializers | **Event Stream + Materialized View** |
| 05 | Cohorts | Cohort definitions, membership resolution, scoped signal counters as materialized views | **Materialized View** |
| 06 | Text Retrieval | Tantivy index as materialized view over entity text fields, queried at read time | **Query-Time Index** |
| 07 | Vector Retrieval | USearch HNSW index as materialized view over entity embeddings, queried at read time | **Query-Time Index** |
| 08 | Query Engine | Orchestrator that reads from all materialized state, never writes | **Query-Time Reader** |
| 09 | Ranking/Scoring | Scoring pipeline, profiles, diversity -- reads signals, relationships, vectors at query time | **Query-Time Reader** |
| 10 | Feedback Loop | Defines the semantic mapping from signal events to materializer updates (which signal shifts the preference vector in which direction, which signal increments which relationship weight) | **Materializer Orchestration** |
| 11 | Schema | Definitions for entities, signals, profiles, cohorts -- the contract that all materializers and the query engine validate against | **Cross-Cutting** |
| 12 | Cold Start | Exploration budgets, proxy scoring, cohort priors -- query-time logic for entities with no signal history | **Query-Time Reader** |
| 13 | Concurrency | Lock-free hot path, group commit, thread model, memory ordering -- the mechanism that makes concurrent materialization and querying safe | **Cross-Cutting** |
| 14 | Scale Architecture | Partition keys, capacity model, single-node ceiling -- design constraints that influence WAL format, key encoding, and materializer scope | **Cross-Cutting** |

The pattern: specs 01-05 define the write side (event stream + materialized views). Specs 06-07 define query-time indexes (also materialized views, but read-only from the query engine's perspective). Specs 08, 09, and 12 define the read side. Spec 10 is the bridge between write and read. Specs 11, 13, and 14 are cross-cutting concerns.

---

## 5. Signal Write Walkthrough

Trace one event through the system: **user U likes item I** (where item I was created by creator C).

```
Application calls: db.signal(Signal { kind: "like", item: "item_I", user: "user_U" })

Step 1: DEDUPLICATION CHECK                                    ~100 ns
    BLAKE3(like, item_I, user_U, timestamp_trunc_1s) -> hash
    Check bloom filter -> PASS (not a duplicate)

Step 2: WAL APPEND                                             ~50 us
    Serialize to WAL record:
        type: 0x01 (SignalEvent)
        payload: { kind: "like", item_id: I, user_id: U, weight: 1.0, ts: now }
    Write to current WAL segment, fsync (batched)
    Assign sequence number: seqno 47291

    *** DURABILITY BOUNDARY ***
    Event is now durable. All subsequent updates are derived state.

Step 3: MATERIALIZER REGISTRY FAN-OUT
    registry.on_event(WalEvent { seqno: 47291, type: SignalEvent, ... })
    Invokes each registered materializer:

    3a: GlobalSignalMaterializer                               ~40 ns
        Read item I's HotSignalState for signal "like"
        CAS update: decay_score = decay_score * exp(-lambda * dt) + weight
            (dt = time since the previous update to this state)
        Atomic increment: warm tier minute bucket counter
        Atomic increment: all_time_count
        Result: item I's like score, velocity, windowed counts updated

    3b: UserPreferenceMaterializer                             ~10 us
        Load user U's preference vector (1536D)
        Load item I's content embedding (1536D)
        Signal polarity: positive (like)
        Shift: pref_new = normalize(pref_old + lr * item_embedding)
        Write back updated preference vector
        Result: user U's taste profile reflects this like

    3c: RelationshipWeightMaterializer                         ~5 us
        Resolve item I -> creator C
        Load interaction_weight(U, C), apply time decay, add delta (+0.15)
        Clamp to [0.0, 1.0], write back
        Load engagement_affinity(U, I), update similarly
        Result: U's affinity for creator C increased

    3d: CohortSignalMaterializer                               ~20 us
        Load user U's cached cohort memberships: {region:US, age:18-24, lang:en}
        Increment global counter for item I / like / current_hour
        Increment region:US counter for item I / like / current_hour
        Increment age:18-24 counter for item I / like / current_hour
        Increment lang:en counter for item I / like / current_hour
        Check behavioral segments: U is in "jazz_fans" -> increment that counter
        Result: cohort-scoped trending reflects this engagement

    3e: UserStateMaterializer                                  ~5 us
        Set bitmap: user_U has "liked" item_I
        Result: future queries with FILTER liked include this pair

RETURN Ok(())                                          Total: < 100 us p50
```
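Step 3a's hot-tier update can be sketched as a single compare-and-swap loop. This is an illustration under stated assumptions, not the spec 03 layout. The hypothetical `DecayCell` packs a `u32` last-update timestamp (seconds) and an `f32` score into one `AtomicU64`, so both change in one atomic step. It uses the decay form `score * exp(-lambda * dt)` from Section 10: the stored score is decayed to the event time before the new weight is added. `AtomicU64::fetch_update` re-runs the closure until its CAS succeeds.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical hot-tier cell: high 32 bits = last-update time (seconds),
/// low 32 bits = decayed score as f32 bits. One word means one CAS.
pub struct DecayCell {
    packed: AtomicU64,
    lambda: f64, // decay rate per second
}

fn pack(ts: u32, score: f32) -> u64 {
    ((ts as u64) << 32) | score.to_bits() as u64
}

fn unpack(word: u64) -> (u32, f32) {
    ((word >> 32) as u32, f32::from_bits(word as u32))
}

impl DecayCell {
    pub fn new(lambda: f64) -> Self {
        DecayCell { packed: AtomicU64::new(pack(0, 0.0)), lambda }
    }

    /// Record one event of `weight` at time `now` (seconds).
    pub fn record(&self, weight: f64, now: u32) {
        let lambda = self.lambda;
        // fetch_update retries the closure until compare_exchange succeeds.
        let _ = self.packed.fetch_update(Ordering::AcqRel, Ordering::Acquire, |word| {
            let (last, score) = unpack(word);
            let new_score = if now >= last {
                // Decay the stored score forward to `now`, then add the event.
                score as f64 * (-lambda * (now - last) as f64).exp() + weight
            } else {
                // Late event: keep the newer timestamp and decay the event itself.
                score as f64 + weight * (-lambda * (last - now) as f64).exp()
            };
            Some(pack(now.max(last), new_score as f32))
        });
    }

    /// Read the score decayed to `now` without modifying the cell.
    pub fn score_at(&self, now: u32) -> f64 {
        let (last, score) = unpack(self.packed.load(Ordering::Acquire));
        let dt = now.saturating_sub(last) as f64;
        score as f64 * (-self.lambda * dt).exp()
    }
}
```

Packing into one word trades precision (`f32` score, second-granularity timestamps) for a single-CAS update. A layout that needs more precision would have to pick a different way to keep the score and its timestamp consistent.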

One API call. One WAL append. Five materializer updates. Because the fan-out completes before `db.signal()` returns, any ranking query issued after the call returns sees all of this. No ETL. No Kafka. No stale data.
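Step 3b's preference shift, and the cosine similarity that the query-time personalization boost applies to its result, fit in a few lines. The function names, the `polarity` convention (+1.0 for positive signals such as a like, -1.0 for negative ones), and the learning rate are hypothetical. Spec 10 defines the real signal-to-update mapping. The same code works unchanged for the 1536-dimensional vectors in the walkthrough.

```rust
/// Scale `v` to unit length (an all-zero vector is left unchanged).
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// pref_new = normalize(pref_old + polarity * lr * item_embedding)
pub fn shift_preference(pref: &mut [f32], item: &[f32], lr: f32, polarity: f32) {
    assert_eq!(pref.len(), item.len(), "dimension mismatch");
    for (p, e) in pref.iter_mut().zip(item) {
        *p += polarity * lr * e;
    }
    normalize(pref);
}

/// Cosine similarity between two vectors (0.0 if either is all zeros).
pub fn cosine_sim(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}
```

Re-normalizing after every shift keeps the preference vector on the unit sphere. Repeated likes therefore rotate it toward the liked content instead of letting its magnitude grow without bound.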

---

## 6. Query Walkthrough

Trace a composed query through the system:

```
RETRIEVE items
FOR USER @u1
USING PROFILE for_you
FILTER unseen
WITHIN TRENDING
COHORT locale:US, age:18-24
DIVERSITY max_per_creator:2
LIMIT 50
```

This is a three-layer query: personalized ranking within cohort-scoped trending.

```
Step 1: PARSE AND VALIDATE                                     ~1 us
    Resolve profile "for_you" from schema -> ProfileDef v3
    Resolve cohort predicates: locale:US AND age:18-24
    Validate user @u1 exists
    Validate all filter fields exist in schema

Step 2: COHORT RESOLUTION                                      ~2 ms
    Resolve cohort "locale:US AND age:18-24" to a CohortId
    This is a Level 3 (composite) cohort: intersection of
        Level 1 dimension region:US (dimension_id=1, cohort_value=0x0001)
        Level 1 dimension age_group:18-24 (dimension_id=3, cohort_value=0x0002)
    No pre-computed counters for the composite.
    Plan: fetch Level 1 counters for both dimensions, estimate intersection
    using independence assumption: count(US AND 18-24) ~ count(US) * count(18-24) / count(global)

Step 3: CANDIDATE GENERATION FROM COHORT TRENDING              ~15 ms
    Read cohort_signals CF for dimension region:US, signal "view",
        window: last 24 hours (24 hour-buckets)
    Read cohort_signals CF for dimension age_group:18-24, signal "view",
        window: last 24 hours
    For each item: compute estimated cohort velocity using independence assumption
    Sort by estimated velocity, take top 500 candidates
    This is the "what is trending for US users aged 18-24" candidate set

Step 4: FILTER APPLICATION                                     ~3 ms
    Load RoaringBitmap for user @u1's "seen" items
    Remove seen items from candidate set
    Apply any metadata filters (none beyond "unseen" in this query)
    Surviving candidates: ~400

Step 5: SIGNAL LOADING                                         ~2 ms
    For each surviving candidate, load from hot tier:
        like.decay_score, view.velocity(24h), share.decay_score
    For user @u1, load:
        preference_vector (1536D)
        interaction_weight(u1, candidate.creator) for each candidate's creator
    Hot-tier signal reads are lock-free atomic loads from memory-resident state;
    no read on this path acquires a lock (spec 13)

Step 6: SCORING VIA RANKING PROFILE                            ~5 ms
    Profile "for_you" scoring pipeline (9 stages):
        1. Base score: cohort velocity (from step 3)
        2. Personalization boost: cosine_sim(u1.preference_vector, item.embedding)
        3. Relationship boost: interaction_weight(u1, item.creator)
        4. Signal boosts: like.decay_score, share.decay_score
        5. Recency curve: time_decay(item.created_at)
        6. Penalties: low completion rate, flagged content
        7. Quality gates: minimum signal thresholds
        8. Cold start: exploration budget injection (10% of slots)
        9. Final score composition: weighted sum with normalization

Step 7: DIVERSITY ENFORCEMENT                                  ~1 ms
    Sort by score descending
    Enforce max_per_creator:2
        Greedy scan: for each item, if creator already has 2 items in result,
        demote to end of list
    Take top 50 after diversity enforcement

Step 8: RESULT ASSEMBLY                                        ~1 ms
    Load entity metadata for 50 items from redb
    Build cursor for pagination (encodes last item's score + id)
    Return Results { items, cursor, total_estimate }

TOTAL LATENCY: ~30 ms (within 50 ms budget)
```
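Step 7's greedy scan can be sketched as follows. It assumes a hypothetical `Scored` candidate type carrying an item id, a creator id, and the final score from step 6. Items whose creator is already at the cap are demoted rather than dropped, and they keep their relative score order. So if fewer than `limit` items fit under the cap, the best over-cap items backfill the tail.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
pub struct Scored {
    pub id: u64,
    pub creator: u64,
    pub score: f64,
}

/// Sort by score, keep items whose creator is under the cap, demote the rest
/// to the end (still in score order), then truncate to `limit`.
pub fn enforce_max_per_creator(
    mut items: Vec<Scored>,
    max_per_creator: usize,
    limit: usize,
) -> Vec<Scored> {
    // Descending by score; total_cmp gives a total order even if a score is NaN.
    items.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut kept = Vec::with_capacity(items.len());
    let mut demoted = Vec::new();
    for item in items {
        let n = per_creator.entry(item.creator).or_insert(0);
        if *n < max_per_creator {
            *n += 1;
            kept.push(item);
        } else {
            demoted.push(item);
        }
    }
    kept.extend(demoted);
    kept.truncate(limit);
    kept
}
```

Called as `enforce_max_per_creator(candidates, 2, 50)`, this matches the `DIVERSITY max_per_creator:2 ... LIMIT 50` clause in the query above.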

---

## 7. Three-Layer Trending

Global trending, cohort-scoped trending, and search-within-cohort-trending are not three different systems. They are three scopes applied to the same materializer architecture, using the same math.

**The math:** Velocity is the rate of change of a windowed signal count. For a 24-hour window:

```
velocity(item, signal, window) = count(item, signal, window) / window_duration
```

Acceleration (rising detection) is the rate of change of velocity, measured as the difference between the current window and the previous one:

```
acceleration = velocity(current_window) - velocity(previous_window)
```

This formula is identical at every scope. The only thing that changes is which counter you read.
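A minimal sketch of the two formulas over hour buckets. It assumes a hypothetical slice-of-counts representation (one count per hour for a single scope, item, and signal, oldest first) and expresses velocity in events per hour. The functions do not know which scope the slice came from. That is the point of the paragraph above.

```rust
/// Velocity over the most recent `window_hours` hourly buckets, in events per hour.
/// If fewer buckets exist than the window, the missing hours count as zero.
pub fn velocity(buckets: &[u64], window_hours: usize) -> f64 {
    assert!(window_hours > 0, "window must be non-empty");
    let n = buckets.len().min(window_hours);
    let count: u64 = buckets[buckets.len() - n..].iter().sum();
    count as f64 / window_hours as f64
}

/// Acceleration: velocity(current window) - velocity(previous window).
pub fn acceleration(buckets: &[u64], window_hours: usize) -> f64 {
    let current = velocity(buckets, window_hours);
    let prev_end = buckets.len().saturating_sub(window_hours);
    let previous = velocity(&buckets[..prev_end], window_hours);
    current - previous
}
```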

**Layer 1: Global trending**

```
RETRIEVE items USING PROFILE trending WINDOW 24h LIMIT 25
```

Reads from: `GlobalSignalMaterializer` counters. Level 0 in the dimensional hierarchy. One counter per item per signal per hour bucket. Sum the last 24 buckets, divide by 24h. Sort by velocity. Done.

**Layer 2: Cohort-scoped trending**

```
RETRIEVE items USING PROFILE trending COHORT locale:US, age:18-24 WINDOW 24h LIMIT 25
```

Reads from: `CohortSignalMaterializer` counters for the Level 1 dimensions region:US and age_group:18-24. For a composite cohort (Level 3), the intersection is estimated using the independence assumption. Same velocity formula, different counters. The math does not change. The scope does.
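The independence estimate from the query walkthrough, `count(US AND 18-24) ~ count(US) * count(18-24) / count(global)`, generalizes to any number of dimensions as `global * product(count_i / global)`. The sketch below applies it per item, per signal, per window. The function names are hypothetical. The resulting count then feeds the unchanged velocity formula.

```rust
/// Estimate a composite cohort's count from its Level 1 dimension counts,
/// assuming the dimensions are statistically independent.
pub fn estimate_composite(dim_counts: &[u64], global: u64) -> f64 {
    if global == 0 {
        return 0.0;
    }
    let g = global as f64;
    // Multiply the global count by each dimension's share of it.
    dim_counts.iter().fold(g, |acc, &c| acc * (c as f64 / g))
}

/// Estimated composite velocity in events per hour over `window_hours`.
pub fn estimate_composite_velocity(dim_counts: &[u64], global: u64, window_hours: u32) -> f64 {
    assert!(window_hours > 0, "window must be non-empty");
    estimate_composite(dim_counts, global) / window_hours as f64
}
```

For two dimensions the fold reduces exactly to `count(A) * count(B) / count(global)`. When the dimensions are correlated (for example, if US users skew younger for a given item), the estimate is biased, which is the price of not materializing Level 3 counters.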

**Layer 3: Search within cohort-scoped trending**

```
SEARCH items QUERY "piano tutorial" WITHIN TRENDING COHORT locale:US, age:18-24 WINDOW 24h LIMIT 20
```

Step 1: Generate the cohort-trending candidate set (Layer 2). Step 2: Run text search (Tantivy BM25) restricted to that candidate set. Step 3: Fuse cohort velocity score with BM25 relevance score. Same materializer output, filtered by a text query.

The architecture makes this composable because each layer reads from the same materialized state. The query planner recognizes `WITHIN TRENDING COHORT ...` as "generate candidates from cohort velocity, then filter by text match." No special-case code. No separate trending service. One materializer hierarchy, three query shapes.
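Layer 3 does not pin down how the two scores are fused. Purely as an illustration, the sketch below assumes min-max normalization of each score over the surviving candidates, followed by a weighted sum with a hypothetical weight `alpha`. Reciprocal rank fusion would be an equally plausible choice. The `trending` map stands in for the Layer 2 candidate set and `text_hits` for BM25 results from the text index.

```rust
use std::collections::HashMap;

/// Min-max normalize into [0, 1]; a constant series maps to 1.0.
fn min_max(values: &[f64]) -> Vec<f64> {
    let lo = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let hi = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    values
        .iter()
        .map(|v| if hi > lo { (v - lo) / (hi - lo) } else { 1.0 })
        .collect()
}

/// Restrict text hits to the cohort-trending candidate set, then fuse
/// alpha * norm(velocity) + (1 - alpha) * norm(bm25), best first.
pub fn fuse_within_trending(
    trending: &HashMap<u64, f64>, // item id -> cohort velocity (Layer 2)
    text_hits: &[(u64, f64)],     // (item id, BM25 score)
    alpha: f64,
) -> Vec<(u64, f64)> {
    let hits: Vec<(u64, f64, f64)> = text_hits
        .iter()
        .filter_map(|&(id, bm25)| trending.get(&id).map(|&vel| (id, vel, bm25)))
        .collect();
    let vel = min_max(&hits.iter().map(|h| h.1).collect::<Vec<_>>());
    let rel = min_max(&hits.iter().map(|h| h.2).collect::<Vec<_>>());
    let mut fused: Vec<(u64, f64)> = hits
        .iter()
        .enumerate()
        .map(|(i, h)| (h.0, alpha * vel[i] + (1.0 - alpha) * rel[i]))
        .collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused
}
```

Normalizing both signals first matters because raw velocities (events per hour) and BM25 scores live on unrelated scales. Without it, `alpha` would have no stable meaning.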

---

## 8. Code Module Map

```
tidal/src/
    lib.rs                  # TidalDB struct, public API, lifecycle

    wal/                    # Spec 01: Write-ahead log
        mod.rs              # WAL reader/writer, segment management
        record.rs           # WalEvent enum, serialization
        segment.rs          # Segment file lifecycle, preallocate, seal
        recovery.rs         # Crash recovery: scan, validate, replay

    materializer/           # Architecture overview: core abstraction
        mod.rs              # Materializer trait, Scope enum
        registry.rs         # MaterializerRegistry, fan-out, checkpoint coordination

    storage/                # Spec 01: Dual-backend storage
        mod.rs              # StorageEngine trait
        fjall.rs            # fjall backend: WAL, cold-tier signals, cohort counters
        redb.rs             # redb backend: entities, relationships, user state
        keys.rs             # Key encoding (partition-ready prefixes)

    entity/                 # Spec 02: Items, Users, Creators
        mod.rs              # Entity trait, EntityKind enum
        item.rs             # Item struct, metadata fields, lifecycle
        user.rs             # User struct, attributes, computed fields
        creator.rs          # Creator struct, catalog embedding

    signal/                 # Spec 03: Signal system
        mod.rs              # SignalDef, Decay enum, Window enum
        hot.rs              # HotSignalState (cache-line aligned, atomic)
        warm.rs             # WarmSignalState (per-minute buckets, SWAG)
        cold.rs             # Cold-tier event storage, hourly/daily rollups
        velocity.rs         # Velocity and acceleration computation
        decay.rs            # Exponential/linear decay formulas
        global_mat.rs       # GlobalSignalMaterializer (impl Materializer)
        cohort_mat.rs       # CohortSignalMaterializer (impl Materializer)
        user_pref_mat.rs    # UserPreferenceMaterializer (impl Materializer)
        user_state_mat.rs   # UserStateMaterializer (impl Materializer)

    relationship/           # Spec 04: Edges between entities
        mod.rs              # Edge types, directionality, storage
        weight.rs           # Weight update mechanics, decay
        traversal.rs        # Fan-out queries (following feed, collab filtering)
        rel_mat.rs          # RelationshipWeightMaterializer (impl Materializer)

    cohort/                 # Spec 05: Dynamic population segments
        mod.rs              # CohortDef, CohortId, predicate evaluation
        membership.rs       # Bitmap-based membership resolution
        rollup.rs           # Dimensional hierarchy (Level 0/1/2/3)

    index/                  # Specs 06-07: Secondary indexes
        mod.rs              # Index trait bounds
        text.rs             # TextIndex trait + Tantivy implementation (spec 06)
        vector.rs           # VectorIndex trait + USearch implementation (spec 07)
        bitmap.rs           # RoaringBitmap filter indexes (spec 08)

    query/                  # Spec 08: Query engine
        mod.rs              # retrieve(), search(), suggest() entry points
        parser.rs           # Input validation, schema resolution, AST construction
        planner.rs          # Cost-based plan selection, selectivity estimation
        executor.rs         # Pipeline execution, subsystem coordination
        cursor.rs           # Pagination cursor encoding/decoding
        composition.rs      # WITHIN clause, cohort-scoped candidate generation

    ranking/                # Specs 09 + 12: Scoring and cold start
        mod.rs              # ProfileDef, scoring pipeline (9 stages)
        boosts.rs           # Signal, personalization, relationship, recency boosts
        penalties.rs        # Low-quality, flagged content, repetition penalties
        gates.rs            # Quality gates, minimum thresholds
        diversity.rs        # max_per_creator, format_mix, greedy enforcement
        cold_start.rs       # Exploration budget, proxy scoring, cohort priors
        sort_modes.rs       # 20+ built-in sort modes (trending, hot, rising, etc.)

    schema/                 # Spec 11: Schema system
        mod.rs              # define_entity, define_signal, define_profile, etc.
        validation.rs       # Schema validation rules, breaking change detection
        migration.rs        # Migration planner, dry-run, execute
        version.rs          # Version tracking, introspection

    api/                    # Public Rust API surface
        mod.rs              # Re-exports, builder patterns, error types
```

The materializer implementations live inside their domain modules (`signal/`, `relationship/`), not in `materializer/`. The `materializer/` module owns the trait and the registry. Each domain module owns its materializer implementation. This keeps domain logic co-located with its materializer.

---

## 9. Spec Dependency Graph

```
                  +------------+
                  | 01 Storage |  (foundation: WAL, dual-backend, crash recovery)
                  +-----+------+
                        |
            +-----------+-----------+
            |                       |
      +-----v------+          +-----v------+
      | 02 Entity  |          | 03 Signal  |
      | Model      |          | System     |
      +-----+------+          +-----+------+
            |                       |
   +--------+--------+     +--------+--------+
   |        |        |     |        |        |
+--v---+ +--v---+ +--v-----v--+ +---v----+   |
| 06   | | 07   | | 04 Rela-  | | 05     |   |
| Text | |Vector| | tionships | | Cohort |   |
+--+---+ +--+---+ +-----+-----+ +---+----+   |
   |        |           |           |        |
   +--------+-----------+-----------+--------+
                        |
            +-----------+-----------+
            |                       |
      +-----v------+          +-----v------+
      | 08 Query   |          | 10 Feedback|
      | Engine     |          | Loop       |
      +-----+------+          +------------+
            |
      +-----v------+
      | 09 Ranking |
      | + 12 Cold  |
      |   Start    |
      +------------+

Cross-cutting (not shown as edges -- they constrain everything):
    11 Schema       -- all definitions validated against schema
    13 Concurrency  -- lock-free patterns for all hot-path state
    14 Scale        -- partition-ready key encoding, aggregation scopes
```

Read the graph top-down for implementation order: every spec sits below the specs it depends on. Read it bottom-up to trace the chain of specs a given spec depends on.

**Critical path:** 01 -> 03 -> 05 -> 08 -> 09. This is the longest dependency chain and the path that enables the full three-layer trending query. Every milestone must make progress along this chain.

**Parallel tracks after 01:** Entity model (02), signal system (03), and schema (11) can proceed in parallel once the storage engine exists. Text (06) and vector (07) retrieval can proceed in parallel once the entity model exists. Relationships (04) and cohorts (05) can proceed in parallel once signals exist.

---

## 10. Cross-Cutting Principles

**WAL is truth.** Every mutation is durable in the WAL before it is visible anywhere. Materialized state can be lost and rebuilt. The WAL cannot. This is not a design preference -- it is the correctness foundation. Spec 01 Invariant 2: "A signal event acknowledged to the caller survives any single crash."

**Materializers are the abstraction boundary.** The write path does not know what derived state exists. It appends to the WAL and calls `registry.on_event()`. Adding a new kind of derived state means implementing `Materializer` and registering it. No changes to the write path. No changes to existing materializers.

**Same math at every scope.** Velocity is `count / duration`. Decay is `score * exp(-lambda * dt)`. These formulas do not change when you switch from global to cohort to user-local scope. What changes is which counter you read. Global velocity reads Level 0 counters. Cohort velocity reads Level 1/2 counters and estimates Level 3 intersections. The ranking profile does not know the difference -- it sees a velocity number. This uniformity is what makes three-layer trending a query parameter, not a feature.

**Scale is a design constraint from day one.** The WAL record format includes a partition key field (spec 14). Key encoding in the storage layer uses big-endian prefixes that sort correctly under range partitioning. `SignalDef` carries an `aggregation_scope` field. The `Materializer` trait's `Scope` enum maps directly to partition boundaries. None of this requires a distributed runtime to exist. All of it is required so that when the distributed runtime arrives, it does not require a storage engine rewrite. CockroachDB, TiDB, and Elasticsearch learned this lesson. tidalDB builds on it.
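Why big-endian matters can be shown in a few lines. The key layout below (`[partition: u16][item_id: u64][signal_id: u16][hour: u32]`) is a hypothetical example, not the spec 14 format. Because each field is encoded big-endian, comparing keys byte by byte gives the same order as comparing the fields numerically. Range scans and range partitions therefore line up with id and time order.

```rust
/// Hypothetical hourly signal-counter key, 16 bytes, all fields big-endian.
pub fn counter_key(partition: u16, item_id: u64, signal_id: u16, hour: u32) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[0..2].copy_from_slice(&partition.to_be_bytes()); // partition prefix sorts first
    key[2..10].copy_from_slice(&item_id.to_be_bytes());
    key[10..12].copy_from_slice(&signal_id.to_be_bytes());
    key[12..16].copy_from_slice(&hour.to_be_bytes()); // hours sort chronologically
    key
}
```

With little-endian encoding, 255 would sort after 256 (its first byte is `0xFF` versus `0x00`), so a byte-range split would scatter consecutive ids across partitions.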

**Single-node-first but partition-ready.** A single tidalDB process is a complete, self-contained shard. It runs the full WAL, all materializers, all indexes, and the full query engine. Distribution, when it comes, is the coordination of many such shards -- not a redesign of what a shard does. The atoms are right from day one. The orchestration comes later.

**Readers never block writers. Writers never block readers.** The concurrency model (spec 13) enforces this structurally, not by convention. Hot-tier signal state uses atomic CAS. Warm-tier counters use atomic increments. Entity reads use epoch-based reclamation. The WAL writer is channel-serialized (one writer, many producers). No ranking query ever acquires a lock on the scoring path.
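The channel-serialized WAL writer with group commit can be sketched with the standard library's `mpsc` channels. This is a simplified model under stated assumptions, not spec 13's thread model: an in-memory `Vec` stands in for the segment file, and the fsync is reduced to a comment. Producers send a hypothetical `Append` carrying a reply channel. The single writer thread blocks for the first request, drains whatever else is already queued into the same batch, and assigns sequence numbers in order. It then acknowledges the whole batch after one sync point, amortizing a single fsync over many producers.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// A write request: the record payload plus a reply channel for its seqno.
pub struct Append {
    pub payload: Vec<u8>,
    pub reply: Sender<u64>,
}

/// Spawn the single WAL writer. Returns the producer-side sender and a handle
/// that yields the in-memory "log" once every sender has been dropped.
pub fn spawn_writer() -> (Sender<Append>, JoinHandle<Vec<Vec<u8>>>) {
    let (tx, rx): (Sender<Append>, Receiver<Append>) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut log: Vec<Vec<u8>> = Vec::new(); // stands in for the segment file
        let mut next_seqno = 1u64;
        // Block for the first request, then take everything already queued.
        while let Ok(first) = rx.recv() {
            let mut batch = vec![first];
            batch.extend(rx.try_iter());
            let mut acks = Vec::with_capacity(batch.len());
            for req in batch {
                log.push(req.payload);
                acks.push((req.reply, next_seqno));
                next_seqno += 1;
            }
            // A real writer would write the batch and fsync once, here.
            for (reply, seqno) in acks {
                let _ = reply.send(seqno); // acknowledge only after the sync point
            }
        }
        log
    });
    (tx, handle)
}
```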

**The query engine is stateless.** It holds no data. It reads from materialized state produced by materializers and from secondary indexes (Tantivy, USearch, RoaringBitmaps). If the query engine crashes, no data is lost, no recovery is needed. It restarts and reads from the same materialized state.

**Schema encodes behavior, not just shape.** A signal's half-life, a ranking profile's scoring weights, a cohort's predicate, a diversity constraint -- these are schema declarations, not application code. The database enforces them. The query optimizer reasons about them. Behavior changes are schema mutations, not redeployments. This is the Stage 3 insight from thoughts.md.