# 00 -- Architecture Overview

**Status:** Draft
**Author:** tidalDB Engineering
**Date:** 2026-02-20
**Purpose:** Show how the 14 specs connect. The forest before the trees.

---

## 1. Core Insight

The WAL is the single event stream. Everything else is a materialized view.

The signal ledger is a materialized view over signal events. The user preference vector is a materialized view over signal events weighted by item embeddings. The relationship weight between a user and a creator is a materialized view over interaction signals. The cohort-scoped trending counter is a materialized view over signal events filtered by user attributes.

This is not a metaphor. The WAL (spec 01) records every mutation: signal events, entity writes, relationship writes, schema changes. After a record is durable in the WAL, downstream materializers consume it and update their derived state. If any materializer's state is lost, it is rebuilt by replaying the WAL from the last checkpoint. The WAL is truth. Everything else is cache.

The existing specs already embody this pattern -- spec 03 Section 3 says "immutable events, mutable aggregates," spec 10 Section 2 shows a single signal event updating six subsystems, spec 01 says "the WAL is the source of truth; everything else is derived state." The architecture overview names the pattern explicitly and shows how the 14 specs are instances of it.

---

## 2. System Diagram
```
                    APPLICATION
                         |
   db.signal() / db.write_item() / db.retrieve()
                         |
             +-----------+-----------+
             |                       |
        WRITE PATH               READ PATH
             |                       |
             v                       v
    +------------------+   +-------------------+
    |       WAL        |   |   QUERY ENGINE    |
    | (append-only log)|   |     (spec 08)     |
    |     spec 01      |   |                   |
    +--------+---------+   +----+---------+----+
             |                  |         |
             v                  |    reads from
+------------------------+      |         |
| MATERIALIZER REGISTRY  |      | +-------+-------+-------+-------+
| fans out each event to |      | |       |       |       |       |
| all registered         |      | |       |       |       |       |
| materializers          |      | v       v       v       v       v
+--+----+----+----+------+      | Signal  Entity  Rel.    User    Cohort
   |    |    |    |             | Ledger  Store   Graph   State   Counters
   v    v    v    v             | (hot/   (redb)  (redb)  (redb)  (fjall)
 +----+----+----+------+        |  warm)
 | G  | U  | R  | C    |        |
 | l  | s  | e  | o    |        +--reads from--+
 | o  | e  | l  | h    |                       |
 | b  | r  | a  | o    |             +---------+---------+---------+
 | a  | P  | t  | r    |             |         |         |         |
 | l  | r  | i  | t    |             v         v         v         v
 |    | e  | o  |      |         +-------+ +-------+ +--------+ +-------+
 | S  | f  | n  | S    |         |Tantivy| |USearch| |Roaring | |Cohort |
 | i  |    | s  | i    |         | Text  | |Vector | |Bitmap  | |Rollup |
 | g  | V  | h  | g    |         | Index | | Index | |Filters | |Tables |
 | n  | e  | i  | n    |         |spec 06| |spec 07| |spec 08 | |spec 05|
 | a  | c  | p  | a    |         +-------+ +-------+ +--------+ +-------+
 | l  | t  |    | l    |
 |    | o  | W  |      |
 | M  | r  | e  | M    |
 | a  |    | i  | a    |
 | t  | M  | g  | t    |
 | .  | a  | h  | .    |
 |    | t  | t  |      |
 |    | .  |    |      |
 |    |    | M  |      |
 |    |    | a  |      |
 |    |    | t  |      |
 |    |    | .  |      |
 +----+----+----+------+
```
Write path: event arrives, WAL appends, materializer registry fans out to all registered materializers. Each materializer updates its scoped state.

Read path: query engine reads from materialized state (signal ledger for scores, entity store for metadata, indexes for retrieval, cohort counters for scoped trending). No materializer is invoked on the read path. Reads never touch the WAL.

---

## 3. Materializer Trait

The materializer is the core abstraction boundary between the event stream and derived state. Every piece of state that a query reads -- signal scores, preference vectors, relationship weights, cohort counters, user-item state -- is produced by a materializer.
```rust
// `Read` and `Write` are the std::io traits used by checkpoint/restore.
// `Result`, `WalEvent`, `CohortId`, `UserId`, and `EntityId` are crate-level
// types defined in their own specs.
use std::io::{Read, Write};

/// The scope at which a materializer operates.
/// Determines what subset of events it processes and what key space it writes to.
pub enum Scope {
    /// All events. Global signal counters, global trending.
    Global,
    /// Events from users in a specific cohort. Cohort-scoped trending.
    Cohort(CohortId),
    /// Events involving a specific user. Preference vectors, user-item state.
    User(UserId),
    /// Events between two entities. Interaction weights, engagement affinity.
    Relationship(EntityId, EntityId),
}

/// A materializer consumes WAL events and produces derived state.
///
/// Implementations:
///   GlobalSignalMaterializer       -- hot-tier decay scores, windowed counters (M1)
///   UserPreferenceMaterializer     -- preference vector shifts (M3)
///   RelationshipWeightMaterializer -- interaction weights, engagement affinity (M3)
///   CohortSignalMaterializer       -- dimensional rollup counters (M4)
///   UserStateMaterializer          -- seen/liked/saved/hidden bitmaps (M3)
pub trait Materializer: Send + Sync {
    /// Process a single WAL event. Called by the registry for every event
    /// after WAL durability is confirmed.
    ///
    /// Implementations must be idempotent: replaying the same event twice
    /// must produce the same state as processing it once.
    fn on_event(&self, event: &WalEvent) -> Result<()>;

    /// Write current state to a checkpoint. Called periodically by the
    /// background checkpoint task. After a successful checkpoint, the WAL
    /// segments before the checkpoint sequence number are eligible for cleanup.
    fn checkpoint(&self, writer: &mut dyn Write) -> Result<()>;

    /// Restore state from a checkpoint. Called during crash recovery
    /// before WAL replay begins. After restore, the materializer's state
    /// matches the checkpoint. WAL events after the checkpoint sequence
    /// number are then replayed via on_event().
    fn restore(&self, reader: &mut dyn Read) -> Result<()>;
}

/// The registry holds all active materializers and fans out events.
pub struct MaterializerRegistry {
    materializers: Vec<Box<dyn Materializer>>,
}

impl MaterializerRegistry {
    /// Fan out a single event to all registered materializers.
    /// Called after WAL append confirms durability.
    pub fn on_event(&self, event: &WalEvent) -> Result<()> {
        for m in &self.materializers {
            m.on_event(event)?;
        }
        Ok(())
    }
}
```
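
One way to satisfy the idempotence requirement on `on_event` is to remember the highest sequence number already applied and skip anything at or below it. The sketch below is illustrative only. `ItemWeightMaterializer`, the reduced `WalEvent` fields, the one-method trait, and the `String` error type are all hypothetical stand-ins, sequence numbers are assumed to start at 1, and a `Mutex` is used for brevity even though the spec's hot path is lock-free.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical, reduced stand-in for the spec's `WalEvent`.
#[derive(Debug, Clone)]
pub struct WalEvent {
    pub seqno: u64, // assumed to start at 1; 0 means "nothing applied yet"
    pub item_id: u64,
    pub weight: f64,
}

/// Reduced trait for the sketch: only `on_event`, with a plain error type.
pub trait Materializer: Send + Sync {
    fn on_event(&self, event: &WalEvent) -> Result<(), String>;
}

#[derive(Default)]
struct WeightState {
    last_applied_seqno: u64,
    totals: HashMap<u64, f64>,
}

/// Toy materializer: accumulates total signal weight per item.
#[derive(Default)]
pub struct ItemWeightMaterializer {
    state: Mutex<WeightState>,
}

impl ItemWeightMaterializer {
    /// Current accumulated weight for an item (0.0 if never seen).
    pub fn total(&self, item_id: u64) -> f64 {
        let state = self.state.lock().expect("state mutex poisoned");
        state.totals.get(&item_id).copied().unwrap_or(0.0)
    }
}

impl Materializer for ItemWeightMaterializer {
    fn on_event(&self, event: &WalEvent) -> Result<(), String> {
        let mut state = self.state.lock().map_err(|e| e.to_string())?;
        // Idempotence guard: an event replayed during recovery is a no-op.
        if event.seqno <= state.last_applied_seqno {
            return Ok(());
        }
        *state.totals.entry(event.item_id).or_insert(0.0) += event.weight;
        state.last_applied_seqno = event.seqno;
        Ok(())
    }
}
```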
The trait is small by design. Three methods. Each materializer owns its scope, its storage, and its invariants. The registry is a fan-out mechanism, nothing more.

This is an S-complexity addition in M1 that prevents an M-complexity refactor later. The `GlobalSignalMaterializer` is the first implementation. `UserPreferenceMaterializer` and `RelationshipWeightMaterializer` arrive in M3. `CohortSignalMaterializer` arrives in M4. The trait boundary means each can be developed and tested in isolation.

---

## 4. Spec Map

Every spec has a role in the data flow. Some define what goes into the event stream. Some define materializers that consume the stream. Some define how the query engine reads materialized state. Some are cross-cutting.

| Spec | Name | Role in Data Flow | Category |
|------|------|-------------------|----------|
| 01 | Storage Engine | WAL format, segment lifecycle, crash recovery, dual-backend (fjall + redb) | **Event Stream** |
| 02 | Entity Model | Entity write events in WAL, entity store as materialized state in redb | **Event Stream + Materialized View** |
| 03 | Signal System | Signal events in WAL, three-tier signal ledger as materialized view, cohort dimensional rollups as materialized views | **Materialized View** (primary) |
| 04 | Relationships | Relationship write events in WAL, edge store as materialized state, implicit edges updated by signal materializers | **Event Stream + Materialized View** |
| 05 | Cohorts | Cohort definitions, membership resolution, scoped signal counters as materialized views | **Materialized View** |
| 06 | Text Retrieval | Tantivy index as materialized view over entity text fields, queried at read time | **Query-Time Index** |
| 07 | Vector Retrieval | USearch HNSW index as materialized view over entity embeddings, queried at read time | **Query-Time Index** |
| 08 | Query Engine | Orchestrator that reads from all materialized state, never writes | **Query-Time Reader** |
| 09 | Ranking/Scoring | Scoring pipeline, profiles, diversity -- reads signals, relationships, vectors at query time | **Query-Time Reader** |
| 10 | Feedback Loop | Defines the semantic mapping from signal events to materializer updates (which signal shifts the preference vector in which direction, which signal increments which relationship weight) | **Materializer Orchestration** |
| 11 | Schema | Definitions for entities, signals, profiles, cohorts -- the contract that all materializers and the query engine validate against | **Cross-Cutting** |
| 12 | Cold Start | Exploration budgets, proxy scoring, cohort priors -- query-time logic for entities with no signal history | **Query-Time Reader** |
| 13 | Concurrency | Lock-free hot path, group commit, thread model, memory ordering -- the mechanism that makes concurrent materialization and querying safe | **Cross-Cutting** |
| 14 | Scale Architecture | Partition keys, capacity model, single-node ceiling -- design constraints that influence WAL format, key encoding, and materializer scope | **Cross-Cutting** |

The pattern: specs 01-05 define the write side (event stream + materialized views). Specs 06-07 define query-time indexes (also materialized views, but read-only from the query engine's perspective). Specs 08-09 define the read side. Spec 10 is the bridge between write and read. Specs 11-14 are cross-cutting concerns.

---

## 5. Signal Write Walkthrough

Trace one event through the system: **user U likes item I** (where item I was created by creator C).
```
Application calls: db.signal(Signal { kind: "like", item: "item_I", user: "user_U" })

Step 1: DEDUPLICATION CHECK                             ~100 ns
  BLAKE3(like, item_I, user_U, timestamp_trunc_1s) -> hash
  Check bloom filter -> PASS (not a duplicate)

Step 2: WAL APPEND                                      ~50 us
  Serialize to WAL record:
    type: 0x01 (SignalEvent)
    payload: { kind: "like", item_id: I, user_id: U, weight: 1.0, ts: now }
  Write to current WAL segment, fsync (batched)
  Assign sequence number: seqno 47291

  *** DURABILITY BOUNDARY ***
  Event is now durable. All subsequent updates are derived state.

Step 3: MATERIALIZER REGISTRY FAN-OUT
  registry.on_event(WalEvent { seqno: 47291, type: SignalEvent, ... })
  Invokes each registered materializer:

  3a: GlobalSignalMaterializer                          ~40 ns
    Read item I's HotSignalState for signal "like"
    CAS update: decay_score += weight * exp(-lambda * dt)
    Atomic increment: warm tier minute bucket counter
    Atomic increment: all_time_count
    Result: item I's like score, velocity, windowed counts updated

  3b: UserPreferenceMaterializer                        ~10 us
    Load user U's preference vector (1536D)
    Load item I's content embedding (1536D)
    Signal polarity: positive (like)
    Shift: pref_new = normalize(pref_old + lr * item_embedding)
    Write back updated preference vector
    Result: user U's taste profile reflects this like

  3c: RelationshipWeightMaterializer                    ~5 us
    Resolve item I -> creator C
    Load interaction_weight(U, C), apply time decay, add delta (+0.15)
    Clamp to [0.0, 1.0], write back
    Load engagement_affinity(U, I), update similarly
    Result: U's affinity for creator C increased

  3d: CohortSignalMaterializer                          ~20 us
    Load user U's cached cohort memberships: {region:US, age:18-24, lang:en}
    Increment global counter for item I / like / current_hour
    Increment region:US counter for item I / like / current_hour
    Increment age:18-24 counter for item I / like / current_hour
    Increment lang:en counter for item I / like / current_hour
    Check behavioral segments: U is in "jazz_fans" -> increment that counter
    Result: cohort-scoped trending reflects this engagement

  3e: UserStateMaterializer                             ~5 us
    Set bitmap: user_U has "liked" item_I
    Result: future queries with FILTER liked include this pair

RETURN Ok(())                                           Total: < 100 us p50
```
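
Step 3b's update, `pref_new = normalize(pref_old + lr * item_embedding)`, can be sketched in a few lines. This is a minimal illustration, not the spec's implementation. The function names are hypothetical, and modelling signal polarity as a `+1.0` / `-1.0` multiplier on the learning rate is an assumption (the walkthrough only states that a like is positive).

```rust
/// Scale `v` to unit L2 norm in place. A zero vector is left unchanged.
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Shift a preference vector toward (polarity > 0) or away from
/// (polarity < 0) an item embedding, then renormalize:
///   pref_new = normalize(pref_old + polarity * lr * item_embedding)
/// `polarity` as a signed multiplier is an assumption of this sketch.
pub fn shift_preference(pref: &mut [f32], item_embedding: &[f32], lr: f32, polarity: f32) {
    assert_eq!(pref.len(), item_embedding.len(), "dimension mismatch");
    for (p, e) in pref.iter_mut().zip(item_embedding) {
        *p += polarity * lr * e;
    }
    normalize(pref);
}
```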
One API call. One WAL append. Five materializer updates. The next ranking query -- even 1ms later -- sees all of this. No ETL. No Kafka. No stale data.

---

## 6. Query Walkthrough

Trace a composed query through the system:

```
RETRIEVE items
FOR USER @u1
USING PROFILE for_you
FILTER unseen
WITHIN TRENDING
COHORT locale:US, age:18-24
DIVERSITY max_per_creator:2
LIMIT 50
```

This is a three-layer query: personalized ranking within cohort-scoped trending.
```
Step 1: PARSE AND VALIDATE                              ~1 us
  Resolve profile "for_you" from schema -> ProfileDef v3
  Resolve cohort predicates: locale:US AND age:18-24
  Validate user @u1 exists
  Validate all filter fields exist in schema

Step 2: COHORT RESOLUTION                               ~2 ms
  Resolve cohort "locale:US AND age:18-24" to a CohortId
  This is a Level 3 (composite) cohort: intersection of
    Level 1 dimension region:US (dimension_id=1, cohort_value=0x0001)
    Level 1 dimension age_group:18-24 (dimension_id=3, cohort_value=0x0002)
  No pre-computed counters for the composite.
  Plan: fetch Level 1 counters for both dimensions, estimate intersection
  using independence assumption: count(US AND 18-24) ~ count(US) * count(18-24) / count(global)

Step 3: CANDIDATE GENERATION FROM COHORT TRENDING       ~15 ms
  Read cohort_signals CF for dimension region:US, signal "view",
    window: last 24 hours (24 hour-buckets)
  Read cohort_signals CF for dimension age_group:18-24, signal "view",
    window: last 24 hours
  For each item: compute estimated cohort velocity using independence assumption
  Sort by estimated velocity, take top 500 candidates
  This is the "what is trending for US users aged 18-24" candidate set

Step 4: FILTER APPLICATION                              ~3 ms
  Load RoaringBitmap for user @u1's "seen" items
  Remove seen items from candidate set
  Apply any metadata filters (none beyond "unseen" in this query)
  Surviving candidates: ~400

Step 5: SIGNAL LOADING                                  ~2 ms
  For each surviving candidate, load from hot tier:
    like.decay_score, view.velocity(24h), share.decay_score
  For user @u1, load:
    preference_vector (1536D)
    interaction_weight(u1, candidate.creator) for each candidate's creator
  All reads are lock-free atomic loads from memory-resident state

Step 6: SCORING VIA RANKING PROFILE                     ~5 ms
  Profile "for_you" scoring pipeline (9 stages):
    1. Base score: cohort velocity (from step 3)
    2. Personalization boost: cosine_sim(u1.preference_vector, item.embedding)
    3. Relationship boost: interaction_weight(u1, item.creator)
    4. Signal boosts: like.decay_score, share.decay_score
    5. Recency curve: time_decay(item.created_at)
    6. Penalties: low completion rate, flagged content
    7. Quality gates: minimum signal thresholds
    8. Cold start: exploration budget injection (10% of slots)
    9. Final score composition: weighted sum with normalization

Step 7: DIVERSITY ENFORCEMENT                           ~1 ms
  Sort by score descending
  Enforce max_per_creator:2
  Greedy scan: for each item, if creator already has 2 items in result,
    demote to end of list
  Take top 50 after diversity enforcement

Step 8: RESULT ASSEMBLY                                 ~1 ms
  Load entity metadata for 50 items from redb
  Build cursor for pagination (encodes last item's score + id)
  Return Results { items, cursor, total_estimate }

TOTAL LATENCY: ~30 ms (within 50 ms budget)
```
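
The greedy scan in Step 7 can be sketched as follows: sort by score, keep items while their creator is under the cap, push the rest to the end, then truncate. The `Scored` struct and the function name are hypothetical, and preserving score order among demoted items is an assumption; the walkthrough only says they move to the end of the list.

```rust
use std::collections::HashMap;

/// Hypothetical scored candidate; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub item_id: u64,
    pub creator_id: u64,
    pub score: f64,
}

/// Sort by score descending, then scan greedily: an item whose creator
/// already holds `max_per_creator` slots is demoted to the end of the list.
/// Returns the first `limit` items after enforcement.
pub fn enforce_max_per_creator(
    mut candidates: Vec<Scored>,
    max_per_creator: usize,
    limit: usize,
) -> Vec<Scored> {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));

    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut kept = Vec::with_capacity(candidates.len());
    let mut demoted = Vec::new();

    for c in candidates {
        let seen = per_creator.entry(c.creator_id).or_insert(0);
        if *seen < max_per_creator {
            *seen += 1;
            kept.push(c);
        } else {
            demoted.push(c); // stays in score order among demoted items
        }
    }

    kept.extend(demoted);
    kept.truncate(limit);
    kept
}
```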
---

## 7. Three-Layer Trending

Global trending, cohort-scoped trending, and search-within-cohort-trending are not three different systems. They are three scopes applied to the same materializer architecture, using the same math.

**The math:** Velocity is the rate of change of a windowed signal count. For a 24-hour window:

```
velocity(item, signal, window) = count(item, signal, window) / window_duration
```

Acceleration (rising detection) is the rate of change of velocity:

```
acceleration = velocity(current_window) - velocity(previous_window)
```
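
Both formulas reduce to sums over hour buckets. The sketch below assumes a plain slice of per-hour counts ordered oldest first, which is an illustrative simplification of the counter storage; the function names are hypothetical. Velocity is reported in events per hour, and buckets missing from a short history count as zero.

```rust
/// Velocity over the most recent `window_hours` buckets, in events per hour.
/// `hour_buckets` is ordered oldest first (an assumption of this sketch).
pub fn velocity(hour_buckets: &[u64], window_hours: usize) -> f64 {
    if window_hours == 0 {
        return 0.0;
    }
    let start = hour_buckets.len().saturating_sub(window_hours);
    let count: u64 = hour_buckets[start..].iter().sum();
    count as f64 / window_hours as f64
}

/// Acceleration: velocity of the current window minus velocity of the
/// window immediately before it.
pub fn acceleration(hour_buckets: &[u64], window_hours: usize) -> f64 {
    let split = hour_buckets.len().saturating_sub(window_hours);
    let current = velocity(hour_buckets, window_hours);
    let previous = velocity(&hour_buckets[..split], window_hours);
    current - previous
}
```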
This formula is identical at every scope. The only thing that changes is which counter you read.

**Layer 1: Global trending**

```
RETRIEVE items USING PROFILE trending WINDOW 24h LIMIT 25
```

Reads from: `GlobalSignalMaterializer` counters. Level 0 in the dimensional hierarchy. One counter per item per signal per hour bucket. Sum the last 24 buckets, divide by 24h. Sort by velocity. Done.

**Layer 2: Cohort-scoped trending**

```
RETRIEVE items USING PROFILE trending COHORT locale:US, age:18-24 WINDOW 24h LIMIT 25
```

Reads from: `CohortSignalMaterializer` counters. Level 1 dimensions region:US and age_group:18-24. For a composite cohort (Level 3), estimate the intersection using independence assumption. Same velocity formula, different counters. The math does not change. The scope does.

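
The independence estimate for a composite cohort, `count(A AND B) ~ count(A) * count(B) / count(global)`, is a one-line computation. The sketch below uses a hypothetical function name. The zero-global guard and the clamp to the smaller input are defensive assumptions added here, not something the specs state.

```rust
/// Estimate count(A AND B) from two Level 1 counters and the global counter,
/// assuming membership in A and B is statistically independent:
///   count(A AND B) ~ count(A) * count(B) / count(global)
pub fn estimate_intersection(count_a: u64, count_b: u64, count_global: u64) -> f64 {
    if count_global == 0 {
        return 0.0; // no events at all, so no intersection either
    }
    let estimate = (count_a as f64) * (count_b as f64) / (count_global as f64);
    // Defensive clamp (assumption): an intersection cannot exceed either input.
    estimate.min(count_a.min(count_b) as f64)
}
```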

**Layer 3: Search within cohort-scoped trending**

```
SEARCH items QUERY "piano tutorial" WITHIN TRENDING COHORT locale:US, age:18-24 WINDOW 24h LIMIT 20
```

Step 1: Generate the cohort-trending candidate set (Layer 2). Step 2: Run text search (Tantivy BM25) restricted to that candidate set. Step 3: Fuse cohort velocity score with BM25 relevance score. Same materializer output, filtered by a text query.

The architecture makes this composable because each layer reads from the same materialized state. The query planner recognizes `WITHIN TRENDING COHORT ...` as "generate candidates from cohort velocity, then filter by text match." No special-case code. No separate trending service. One materializer hierarchy, three query shapes.

---

## 8. Code Module Map
```
tidal/src/
  lib.rs                  # TidalDB struct, public API, lifecycle

  wal/                    # Spec 01: Write-ahead log
    mod.rs                # WAL reader/writer, segment management
    record.rs             # WalEvent enum, serialization
    segment.rs            # Segment file lifecycle, preallocate, seal
    recovery.rs           # Crash recovery: scan, validate, replay

  materializer/           # Architecture overview: core abstraction
    mod.rs                # Materializer trait, Scope enum
    registry.rs           # MaterializerRegistry, fan-out, checkpoint coordination

  storage/                # Spec 01: Dual-backend storage
    mod.rs                # StorageEngine trait
    fjall.rs              # fjall backend: WAL, cold-tier signals, cohort counters
    redb.rs               # redb backend: entities, relationships, user state
    keys.rs               # Key encoding (partition-ready prefixes)

  entity/                 # Spec 02: Items, Users, Creators
    mod.rs                # Entity trait, EntityKind enum
    item.rs               # Item struct, metadata fields, lifecycle
    user.rs               # User struct, attributes, computed fields
    creator.rs            # Creator struct, catalog embedding

  signal/                 # Spec 03: Signal system
    mod.rs                # SignalDef, Decay enum, Window enum
    hot.rs                # HotSignalState (cache-line aligned, atomic)
    warm.rs               # WarmSignalState (per-minute buckets, SWAG)
    cold.rs               # Cold-tier event storage, hourly/daily rollups
    velocity.rs           # Velocity and acceleration computation
    decay.rs              # Exponential/linear decay formulas
    global_mat.rs         # GlobalSignalMaterializer (impl Materializer)
    cohort_mat.rs         # CohortSignalMaterializer (impl Materializer)
    user_pref_mat.rs      # UserPreferenceMaterializer (impl Materializer)
    user_state_mat.rs     # UserStateMaterializer (impl Materializer)

  relationship/           # Spec 04: Edges between entities
    mod.rs                # Edge types, directionality, storage
    weight.rs             # Weight update mechanics, decay
    traversal.rs          # Fan-out queries (following feed, collab filtering)
    rel_mat.rs            # RelationshipWeightMaterializer (impl Materializer)

  cohort/                 # Spec 05: Dynamic population segments
    mod.rs                # CohortDef, CohortId, predicate evaluation
    membership.rs         # Bitmap-based membership resolution
    rollup.rs             # Dimensional hierarchy (Level 0/1/2/3)

  index/                  # Specs 06-07: Secondary indexes
    mod.rs                # Index trait bounds
    text.rs               # TextIndex trait + Tantivy implementation (spec 06)
    vector.rs             # VectorIndex trait + USearch implementation (spec 07)
    bitmap.rs             # RoaringBitmap filter indexes (spec 08)

  query/                  # Spec 08: Query engine
    mod.rs                # retrieve(), search(), suggest() entry points
    parser.rs             # Input validation, schema resolution, AST construction
    planner.rs            # Cost-based plan selection, selectivity estimation
    executor.rs           # Pipeline execution, subsystem coordination
    cursor.rs             # Pagination cursor encoding/decoding
    composition.rs        # WITHIN clause, cohort-scoped candidate generation

  ranking/                # Specs 09 + 12: Scoring and cold start
    mod.rs                # ProfileDef, scoring pipeline (9 stages)
    boosts.rs             # Signal, personalization, relationship, recency boosts
    penalties.rs          # Low-quality, flagged content, repetition penalties
    gates.rs              # Quality gates, minimum thresholds
    diversity.rs          # max_per_creator, format_mix, greedy enforcement
    cold_start.rs         # Exploration budget, proxy scoring, cohort priors
    sort_modes.rs         # 20+ built-in sort modes (trending, hot, rising, etc.)

  schema/                 # Spec 11: Schema system
    mod.rs                # define_entity, define_signal, define_profile, etc.
    validation.rs         # Schema validation rules, breaking change detection
    migration.rs          # Migration planner, dry-run, execute
    version.rs            # Version tracking, introspection

  api/                    # Public Rust API surface
    mod.rs                # Re-exports, builder patterns, error types
```
The materializer implementations live inside their domain modules (`signal/`, `relationship/`), not in `materializer/`. The `materializer/` module owns the trait and the registry. Each domain module owns its materializer implementation. This keeps domain logic co-located with its materializer.

---

## 9. Spec Dependency Graph
```
                   +----------+
                   | 11 Schema|  (cross-cutting: all specs validate against schema)
                   +----+-----+
                        |
                   +----v-----+
                   |01 Storage|  (foundation: WAL, dual-backend, crash recovery)
                   +----+-----+
                        |
             +----------+----------+
             |                     |
       +-----v------+        +-----v------+
       | 02 Entity  |        | 03 Signal  |
       |   Model    |        |   System   |
       +-----+------+        +--+----+----+
             |                  |    |
    +--------+--------+     +---+    +--------+
    |        |        |     |                 |
+---v---+ +--v---+ +--v---+ |           +-----v-----+
|06 Text| |07 Vec| |04 Rel| |           | 05 Cohort |
|Retriev| |Retri.| |ation.| |           |           |
+---+---+ +--+---+ +--+---+ |           +-----+-----+
    |        |        |     |                 |
    +--------+--------+-----+--------+--------+
             |                       |
       +-----v------+          +-----v------+
       | 08 Query   |          | 10 Feedback|
       |   Engine   |          |    Loop    |
       +-----+------+          +------------+
             |
       +-----v------+
       | 09 Ranking |
       | + 12 Cold  |
       +------------+

Cross-cutting (not shown as edges -- they constrain everything):
  11 Schema      -- all definitions validated against schema
  13 Concurrency -- lock-free patterns for all hot-path state
  14 Scale       -- partition-ready key encoding, aggregation scopes
```
Read the graph top-down for implementation order: each spec builds on the ones above it. Read it bottom-up to trace what a given spec depends on.

**Critical path:** 01 -> 03 -> 05 -> 08 -> 09. This is the longest dependency chain and the path that enables the full three-layer trending query. Every milestone must make progress along this chain.

**Parallel tracks after 01:** Entity model (02), signal system (03), and schema (11) can proceed in parallel once the storage engine exists. Text (06) and vector (07) retrieval can proceed in parallel once the entity model exists. Relationships (04) and cohorts (05) can proceed in parallel once signals exist.

---

## 10. Cross-Cutting Principles

**WAL is truth.** Every mutation is durable in the WAL before it is visible anywhere. Materialized state can be lost and rebuilt. The WAL cannot. This is not a design preference -- it is the correctness foundation. Spec 01 Invariant 2: "A signal event acknowledged to the caller survives any single crash."

**Materializers are the abstraction boundary.** The write path does not know what derived state exists. It appends to the WAL and calls `registry.on_event()`. Adding a new kind of derived state means implementing `Materializer` and registering it. No changes to the write path. No changes to existing materializers.

**Same math at every scope.** Velocity is `count / duration`. Decay is `score * exp(-lambda * dt)`. These formulas do not change when you switch from global to cohort to user-local scope. What changes is which counter you read. Global velocity reads Level 0 counters. Cohort velocity reads Level 1/2 counters and estimates Level 3 intersections. The ranking profile does not know the difference -- it sees a velocity number. This uniformity is what makes three-layer trending a query parameter, not a feature.

**Scale is a design constraint from day one.** The WAL record format includes a partition key field (spec 14). Key encoding in the storage layer uses big-endian prefixes that sort correctly under range partitioning. `SignalDef` carries an `aggregation_scope` field. The `Materializer` trait's `Scope` enum maps directly to partition boundaries. None of this requires a distributed runtime to exist. All of it is required so that when the distributed runtime arrives, it does not require a storage engine rewrite. CockroachDB, TiDB, and Elasticsearch learned this lesson. tidalDB builds on it.

**Single-node-first but partition-ready.** A single tidalDB process is a complete, self-contained shard. It runs the full WAL, all materializers, all indexes, and the full query engine. Distribution, when it comes, is the coordination of many such shards -- not a redesign of what a shard does. The atoms are right from day one. The orchestration comes later.

**Readers never block writers. Writers never block readers.** The concurrency model (spec 13) enforces this structurally, not by convention. Hot-tier signal state uses atomic CAS. Warm-tier counters use atomic increments. Entity reads use epoch-based reclamation. The WAL writer is channel-serialized (one writer, many producers). No ranking query ever acquires a lock on the scoring path.

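
A lock-free decayed score can be sketched by storing the `f64` bit pattern in an `AtomicU64` and retrying a compare-and-swap until it lands. The sketch below is illustrative only. `AtomicScore` and `lambda_from_half_life` are hypothetical names, the update applies the decay formula `score * exp(-lambda * dt)` and then adds the new weight, and the caller supplies `dt_secs`; a real hot-tier state would also need to track the last-update timestamp alongside the score.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical lock-free f64 cell: the score is stored as its bit pattern.
pub struct AtomicScore {
    bits: AtomicU64,
}

impl AtomicScore {
    pub fn new(value: f64) -> Self {
        Self { bits: AtomicU64::new(value.to_bits()) }
    }

    /// Lock-free read; never blocks a concurrent writer.
    pub fn load(&self) -> f64 {
        f64::from_bits(self.bits.load(Ordering::Acquire))
    }

    /// Decay the stored score over `dt_secs`, add `weight`, and publish the
    /// result with a CAS loop. Returns the value that was written.
    pub fn decay_and_add(&self, weight: f64, lambda: f64, dt_secs: f64) -> f64 {
        let mut current = self.bits.load(Ordering::Relaxed);
        loop {
            let updated = f64::from_bits(current) * (-lambda * dt_secs).exp() + weight;
            match self.bits.compare_exchange_weak(
                current,
                updated.to_bits(),
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return updated,
                // Another writer won the race: retry from the value it wrote.
                Err(observed) => current = observed,
            }
        }
    }
}

/// Convert a half-life in seconds to the decay constant lambda = ln(2) / half_life.
pub fn lambda_from_half_life(half_life_secs: f64) -> f64 {
    std::f64::consts::LN_2 / half_life_secs
}
```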

**The query engine is stateless.** It holds no data. It reads from materialized state produced by materializers and from secondary indexes (Tantivy, USearch, RoaringBitmaps). If the query engine crashes, no data is lost, no recovery is needed. It restarts and reads from the same materialized state.

**Schema encodes behavior, not just shape.** A signal's half-life, a ranking profile's scoring weights, a cohort's predicate, a diversity constraint -- these are schema declarations, not application code. The database enforces them. The query optimizer reasons about them. Behavior changes are schema mutations, not redeployments. This is the Stage 3 insight from thoughts.md.