jordan 4f076c927d feat: M0p1 runtime skeleton, M0p2 tooling & diagnostics, m1p4 signal ledger

## M0p1 — Embeddable Runtime Skeleton (329 tests)
- TidalDb with builder(), health_check(), close(), and Drop-based cleanup
- TidalDbBuilder fluent API: ephemeral(), with_data_dir(), wal_dir(), cache_dir()
- Config, StorageMode, ConfigError types; Config(ConfigError) variant on LumenError
- Paths: single source of truth for directory layout (wal, items, users, creators, cache)
- TempTidalHome: test isolation helper gated behind #[cfg(test)] / test-utils feature
- 8 integration tests: tests/sandboxed_storage.rs

## M0p2 — Tooling & Diagnostics (349 tests)
- Workspace root Cargo.toml (members: ["tidal", "tidalctl"])
- tidal/build.rs: BUILD_HASH from GIT_HASH with option_env!() fallback to "dev"
- MetricsState: always-compiled Arc-shared atomics (uptime, health_ok)
- MetricsHandle (metrics feature): hand-rolled TcpListener HTTP, zero new deps
  - GET /healthz → {"status":"ok","uptime_secs":N}
  - GET /metrics → Prometheus text (tidaldb_uptime_seconds, health_ok, info)
- TidalDbBuilder.enable_metrics(addr) starts background metrics thread
- tidalctl binary: status + paths commands, manual std::env::args() parsing
- 7 metrics integration tests, 9 tidalctl CLI tests

## m1p4 Signal Ledger (in-progress)
- SignalLedger: DashMap<(EntityId, SignalTypeId), EntitySignalEntry>, WAL-first writes
- HotSignalState: #[repr(C, align(64))], lock-free CAS decay, out-of-order handling
- BucketedCounter: 60 per-minute + 168 per-hour circular buffers, trigger-based rotation
- CheckpointMeta + serialize/restore: 983-byte fixed records, atomic WriteBatch
- Property tests: running score matches analytical to 1e-6, decay monotonic, non-negative
- Proptest regression: signals/warm.txt

## Documentation and planning
- ROADMAP: m0p1 COMPLETE (329), m0p2 COMPLETE (349), product track milestones
- PRODUCT_ROADMAP: P0-P4 product milestone track (personal briefing beachhead)
- Milestone planning docs: milestone-0 (phases 1-3), milestone-p (phases 1-5)
- docs/research/tidaldb_tooling_and_diagnostics.md
- ARCHITECTURE.md, CLAUDE.md, VISION.md updates

## Site
- Blog: every-platform-builds-the-same-6-systems.mdx (new)
- Blog: why-tidaldb.mdx (updated)
- next.config.ts, layout.tsx, blog/page.tsx updates

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-20 20:32:00 -07:00

TidalDB Roadmap

Vision Statement

When tidalDB is complete, an engineering team building any content platform -- a media library, a social feed, a marketplace, a discovery surface -- can embed a single Rust database and replace the Elasticsearch + Redis + Kafka + feature store + vector database + ranking service stack. One process, one query interface, one operational model. The query RETRIEVE items FOR USER @user_id USING PROFILE for_you FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50 executes in under 50ms, reflects signals written 100ms ago, enforces diversity without application logic, handles cold-start items without application intervention, and returns results a user would describe as "it knows what I want."

Thesis

A single embeddable database can replace the 6-system content ranking stack by treating signals, ranking profiles, and diversity constraints as database primitives rather than application logic.

Milestone Summary

#	Name	Proves	Enables
M0	Embeddable Runtime	tidalDB can run in-process with zero-config defaults and tooling	Cuts proof-of-concept friction, enables internal dogfooding
M1	Signal Engine	Signals are a database primitive with O(1) decay, not application math	UC-03 (partial), UC-06 (partial), UC-14 (partial)
M2	Ranked Retrieval	A single query retrieves, scores, and ranks content using live signals	UC-03, UC-04, UC-06, UC-08, UC-13, UC-14
M3	Personalized Ranking	User context shapes retrieval and ranking -- the "For You" query works	UC-01, UC-05, UC-07, UC-09 (partial)
M4	Agent Memory	Agents can create sessions, write signals, and enforce policy inside tidalDB	Agent-mediated personalization, RLHF loops, conversational memory
M5	Hybrid Search	Text + semantic + signal-ranked search in one query	UC-02, UC-10, UC-11
M6	Full Surface Coverage	Every use case, every sort mode, every filter, every feedback loop	UC-01 through UC-14 complete
M7	Production Hardening	Crash safety, graceful degradation, operational readiness	All UCs at production quality

Product Milestone Summary (New)

The roadmap now has two tracks:

Engine Track (M0-M7): proves tidalDB capabilities.
Product Track (P0-P4): proves end-user value for the beachhead product.

#	Name	Proves	Depends On
P0	Beachhead Validation	Knowledge workers and consumers care about a personal briefing feed enough to use it repeatedly	M0 (embedding/runtime), partial M1
P1	Concierge Alpha	Daily "Today Brief" with explicit feedback controls creates Day-2 retention in a small cohort	M1 complete, partial M2
PG1	Personalization Core Done (Blocking Gate)	Personalization loop is correct, immediate, and measurably better than baseline	P1 + M1/M2/M3 core slices
P2	Productized Beta	Self-serve onboarding + real-time adaptation + explanation UX works without manual curation	M2 complete, partial M3
P3	Public Launch	The product is reliable, useful, and trusted at real user volume	M3 + M5 core, M6 partial
P4	Scale + Revenue Fit	Sustainable retention and monetization without quality collapse	M6 + M7

Current Status

Phase	Status	Tests
m0p1: Embeddable Runtime Skeleton	COMPLETE	329 passing (293 unit + 36 integration + 3 doc)
m0p2: Tooling & Diagnostics	COMPLETE	349 passing (+7 metrics unit + 7 metrics integration + 9 tidalctl CLI)
m0p3: Samples & Docs	NOT STARTED	--
m1p1: Core Type System and Schema	COMPLETE	77 passing
m1p2: Write-Ahead Log	COMPLETE	passing (unit + integration)
m1p3: Storage Engine Trait and fjall Backend	COMPLETE	140 passing (128 unit + 12 integration)
m1p4: Signal Ledger	IN PROGRESS	--
m1p5: Entity CRUD and Signal Write API	NOT STARTED	--
P0: Beachhead Validation	NOT STARTED	--
P1: Concierge Alpha	NOT STARTED	--
PG1: Personalization Core Done gate	NOT STARTED	--
P2: Productized Beta	NOT STARTED	--
P3: Public Launch	NOT STARTED	--
P4: Scale + Revenue Fit	NOT STARTED	--

Current phase: m0p3 (Samples & Docs) or m1p4 (Signal Ledger) -- m0p1 and m0p2 are complete; m1p2 and m1p3 unblock m1p4.

Lessons learned:

m1p3 keyspaces are organized per EntityKind ("items", "users", "creators"), not by data category. The Tag enum in key encoding provides the data-category namespace within each entity-kind keyspace.
The LumenError name is a legacy artifact from a predecessor project. Will be renamed when convenient but does not block progress.
MSRV was bumped to 1.91 for fjall 3 compatibility.

Product Track: Personal Briefing Feed (Knowledge Workers + Consumers)

This track defines the milestones for the actual product experience (not only the database engine).
Use case reference: docs/personal-briefing-beachhead.md. Dedicated roadmap: docs/planning/PRODUCT_ROADMAP.md.

P0: Beachhead Validation -- "Do users care enough to return?"

Milestone Thesis

Validate that a personal briefing feed solves a painful daily job for users and drives repeat use.

Acceptance Criteria

Recruit 20-50 target users (knowledge workers + high-intent consumers).
Run daily briefing prototype (can include manual source QA).
At least one meaningful feedback action per session for the median user (more, less, hide, mute, save).
User interviews confirm value vs baseline feeds ("less noise", "more useful", "saves time").
D2 retention reaches agreed threshold for target segment.

P1: Concierge Alpha -- "High-value daily brief for a narrow cohort"

Milestone Thesis

Deliver a reliable daily Today Brief experience with immediate visible adaptation after user feedback.

Acceptance Criteria

App surface: ranked brief, reason labels, source links, save/feedback controls.
Feedback loop: next refresh reflects less/hide/mute actions immediately.
Time-budget mode (5/10/20 min) is available and used.
Diversity constraints prevent source/topic domination in top results.
Weekly active usage demonstrates repeated utility.

P2: Productized Beta -- "Self-serve and repeatable without handholding"

Milestone Thesis

Turn the alpha into a self-serve product with stable onboarding, trust UX, and measurable quality.

Acceptance Criteria

Self-serve onboarding completed in under 3 minutes.
"Why this" explanations are present and understandable on every briefing card.
Cohort layer available ("trending for people like you").
Trust controls available (source transparency, mute/hide persistence).
D7 retention and "useful item rate" exceed baseline comparison feed.
PG1 Personalization Core Done gate has passed.

P3: Public Launch -- "Trusted at real volume"

Milestone Thesis

Launch publicly with reliability, quality, and trust guardrails suitable for broad use.

Acceptance Criteria

Reliability and latency SLOs defined and met for briefing generation.
Quality floor enforced (freshness, source quality, duplicate suppression).
Notification cadence controls prevent spam.
Core support and incident process in place for user-facing regressions.

P4: Scale + Revenue Fit -- "Sustainable business without degrading quality"

Milestone Thesis

Prove the product can grow and monetize while preserving user trust and briefing quality.

Acceptance Criteria

Monetization model validated (subscription, team plan, or equivalent).
Revenue metrics tracked alongside quality metrics (no quality-revenue trade-off regressions).
Retention and engagement remain stable as volume increases.
Product roadmap for next segment expansion is data-backed.

PG1: Personalization Core Done (Blocking Gate)

Milestone Thesis

Before product breadth expansion, the core personalization loop must be provably correct and immediately responsive.

Acceptance Criteria

Hard negatives (hide/mute/block) never leak after write, restart, or replay.
Explicit feedback (more/less/skip/save) changes next-refresh ranking within target latency.
User personalization state rebuilds deterministically from checkpoint + WAL replay.
Useful-item rate and repeated-unwanted-item rate outperform a non-personalized baseline.
Diversity guardrails hold while maintaining personalization quality.

Milestone 0: Embeddable Runtime -- "Runs in your process in minutes"

Milestone Thesis

Before we prove any ranking math, developers must be able to embed tidalDB inside an existing service with zero operational prep. M0 delivers the runtime glue (an ergonomic builder API, deterministic storage layout, a tiny admin CLI, and living examples) so the very first experience is cargo add tidaldb, TidalDb::builder().ephemeral().open(), and a passing smoke test.

Phases

Phase 1: Embeddable Runtime Skeleton

Delivers: A cohesive Config/Builder API for single-process use, with in-memory and filesystem-backed defaults, sandboxed data directories, and graceful shutdown hooks developers can call from tests or application drop handlers.

Builder exposes ephemeral() / single_process() shortcuts and eagerly validates directories.
Shutdown hooks drain WAL writer threads and surface errors.
Temp-directory helper guarantees deterministic cleanup (used in doctests).
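
As a rough illustration of the deterministic layout Phase 1 relies on, the sketch below derives every subdirectory from a single root. The Paths name and the subdirectory names (wal, items, users, creators, cache) come from the commit notes; the field names, new(), and all() are assumptions made for illustration, not the actual tidalDB API.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: one struct owns the whole directory layout.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Paths {
    pub root: PathBuf,
    pub wal: PathBuf,
    pub items: PathBuf,
    pub users: PathBuf,
    pub creators: PathBuf,
    pub cache: PathBuf,
}

impl Paths {
    /// Derive every subdirectory from `root`; no I/O happens here.
    pub fn new(root: impl AsRef<Path>) -> Self {
        let root = root.as_ref().to_path_buf();
        Paths {
            wal: root.join("wal"),
            items: root.join("items"),
            users: root.join("users"),
            creators: root.join("creators"),
            cache: root.join("cache"),
            root,
        }
    }

    /// All subdirectories in a fixed order, e.g. for eager validation.
    pub fn all(&self) -> [&Path; 5] {
        [
            self.wal.as_path(),
            self.items.as_path(),
            self.users.as_path(),
            self.creators.as_path(),
            self.cache.as_path(),
        ]
    }
}
```

Because both the builder and tidalctl would construct the same struct from the same root, the CLI cannot drift from the embedded process's layout.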

Phase 2: Tooling & Diagnostics

Delivers: tidalctl (a minimal CLI) for inspecting embedded instances, plus a lightweight metrics surface (Prometheus text or JSON) tagged with the same IDs future distributed deployments will use.

tidalctl status --path <dir> returns JSON with WAL seq, config hash, uptime.
Metrics endpoint optional (disabled by default) exposes /metrics and /healthz.
Tooling reuses the same path helpers from Phase 1.
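
A minimal sketch of the response side of such a metrics surface, assuming a hand-rolled server that only needs to map a request path to a status code, content type, and body. The /healthz body shape and the tidaldb_uptime_seconds name come from the commit notes; the respond() function, the tidaldb_health_ok metric name, and the 503 "unhealthy" branch are assumptions.

```rust
/// Hypothetical sketch: map a request path to (status code, content type, body).
pub fn respond(path: &str, uptime_secs: u64, health_ok: bool) -> (u16, &'static str, String) {
    match path {
        "/healthz" => {
            // The 503/"unhealthy" branch is an assumption; the notes only show the ok case.
            let (code, status) = if health_ok { (200, "ok") } else { (503, "unhealthy") };
            let body = format!(r#"{{"status":"{}","uptime_secs":{}}}"#, status, uptime_secs);
            (code, "application/json", body)
        }
        "/metrics" => {
            // Prometheus text exposition format: one `name value` sample per line.
            let body = format!(
                "# TYPE tidaldb_uptime_seconds gauge\ntidaldb_uptime_seconds {}\n\
                 # TYPE tidaldb_health_ok gauge\ntidaldb_health_ok {}\n",
                uptime_secs, health_ok as u8
            );
            (200, "text/plain; version=0.0.4", body)
        }
        _ => (404, "text/plain", "not found\n".to_string()),
    }
}
```

Keeping the routing a pure function means it can be tested without opening a socket; the listener thread would only parse the request line and write these three parts back.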

Phase 3: Samples & Docs

Delivers: Quick-start samples (For You POC + integration tests) compiled as doctests, and reference snippets for embedding tidalDB inside Axum/Actix or a CLI app. Keeps DX in lockstep with the runtime.

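Quickstart example + doctest run under CI (cargo test --doc and cargo test --examples as separate invocations, since cargo does not allow --doc alongside other target selectors).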
Axum/Actix embedding examples include graceful shutdown + metrics wiring.
CONTRIBUTING updated with “run samples” checklist.

UAT Scenario

Given:
  // in tests/lib.rs
  let db = TidalDb::builder()
      .ephemeral()
      .with_temp_dir()
      .open()
      .unwrap();

When:
  db.health_check();           // ok
  tidalctl status --path <dir> // prints WAL, storage, signal counts
  cargo test --doc             // quick-start snippet compiles & runs

Then:
  - Builder defaults require zero manual config
  - CLI connects to the same files used by the embedded process
  - Samples stay in sync (failing doctest fails CI)

Milestone 1: Signal Engine -- "Signals are a database primitive"

Milestone Thesis

A developer can open a tidalDB instance, define signal types with decay rates, write engagement events, and read back decay-correct scores and windowed aggregates -- all without computing any temporal math in application code. This proves that the hardest primitive (temporal signals with O(1) decay, velocity, and windowed aggregation) works correctly and meets the performance budget.

UAT Scenario

Given:
  A tidalDB instance is opened with a schema defining:
    - Entity type: Item with metadata fields (title, category, created_at)
    - Signal type: "view" with exponential decay, half_life=7d, windows=[1h, 24h, 7d]
    - Signal type: "like" with exponential decay, half_life=14d, windows=[24h, 7d, all_time]
    - Signal type: "skip" with exponential decay, half_life=1d, windows=[1h, 24h]

When:
  1. Write 100 items with metadata
  2. Write 10,000 signal events across the items (views, likes, skips)
     with timestamps spanning the last 7 days
  3. Read the decay score for item #42, signal "view", at current time
  4. Read the windowed count for item #42, signal "view", window=24h
  5. Read the velocity for item #42, signal "view", window=1h
  6. Write a new "view" event for item #42
  7. Immediately re-read the decay score, windowed count, and velocity
  8. Close and reopen the tidalDB instance
  9. Re-read all values for item #42

Then:
  - Step 3: Decay score matches S(t) = sum(w_i * exp(-lambda * (t - t_i)))
    computed analytically from raw events, to 6 decimal places
  - Step 4: Windowed count equals the exact count of "view" events
    within the last 24h window
  - Step 5: Velocity equals windowed_count / window_duration
  - Step 7: All values reflect the new event immediately
    (decay score increased, count incremented, velocity updated)
  - Step 9: All values match step 7 (crash recovery preserves state)
  - Performance: decay score read < 100ns per entity,
    signal write < 100us including WAL fsync (amortized),
    200-entity scoring pass < 5us

Phases

Phase 1: Core Type System and Schema -- COMPLETE

Delivers: The foundational type system -- entity IDs, signal type definitions, decay rate declarations, window specifications, and the error types that every subsequent module depends on. The schema module that validates and stores signal/entity definitions.

Acceptance Criteria:

EntityId is a u64 newtype with Display, Hash, Eq, Ord, to_be_bytes() (big-endian, preserves numeric ordering)
EntityKind enum: Item, User, Creator
SignalTypeDef captures: name, target EntityKind, DecayModel (exponential with pre-computed lambda / linear / permanent), WindowSet, velocity enabled flag
DecayModel::Exponential stores pre-computed lambda = ln(2) / half_life.as_secs_f64() -- no division on hot path
Window enum: OneHour, TwentyFourHours, SevenDays, ThirtyDays, AllTime with duration(), label(), duration_secs_f64()
WindowSet deduplicates and sorts windows; empty() for permanent signals
LumenError enum covers Storage, NotFound, Schema, Durability, Query, Internal variants with From impls for each sub-error
SchemaError enum validates: duplicate signal names, invalid identifiers, zero half-life/lifetime, empty windows for non-permanent signals, velocity without windows
Schema validation via SchemaBuilder rejects invalid configurations at construction time
Property tests: lambda correctness across half-life range, byte ordering preservation
cargo fmt clean, cargo clippy -D warnings clean, all 77 tests pass

Depends On: None
Complexity: M
Research Reference: docs/research/tidaldb_signal_ledger.md (decay formula, EntityState struct)
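
Two of the criteria above are easy to check in isolation: the pre-computed lambda and the big-endian ordering of entity IDs. The sketch below is illustrative only; EntityId and to_be_bytes() follow the criteria, while lambda_for() and decay_factor() are hypothetical helper names.

```rust
use std::time::Duration;

/// Newtype mirroring the roadmap's `EntityId` (u64 with big-endian key bytes).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct EntityId(pub u64);

impl EntityId {
    /// Big-endian bytes sort lexicographically in the same order as the integers.
    pub fn to_be_bytes(self) -> [u8; 8] {
        self.0.to_be_bytes()
    }
}

/// lambda = ln(2) / half_life, computed once when the schema is built.
pub fn lambda_for(half_life: Duration) -> f64 {
    std::f64::consts::LN_2 / half_life.as_secs_f64()
}

/// Multiplicative decay after `dt_secs`; the hot path only multiplies and exponentiates.
pub fn decay_factor(lambda: f64, dt_secs: f64) -> f64 {
    (-lambda * dt_secs).exp()
}
```

After exactly one half-life the factor is 0.5, which is a convenient property-test anchor. Little-endian bytes would break the ordering guarantee (255 would sort after 256), which is why the key encoding uses big-endian.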

Phase 2: Write-Ahead Log -- COMPLETE

Delivers: A durable, append-only log for signal events. Every signal write is fsync'd before acknowledgment. Group commit amortizes fsync cost. Content-addressed events via BLAKE3 for deduplication. The WAL is the source of truth -- all other state is derived.

Acceptance Criteria:

WAL entries are length-prefixed with BLAKE3 checksums
Group commit batches up to 100 events or 10ms, whichever comes first
Duplicate events (same BLAKE3 hash) are silently deduplicated
WAL replay from any checkpoint produces identical state to uninterrupted execution (property test with 10,000+ random event sequences)
fsync is called per batch, not per event
WAL can be truncated after a checkpoint without losing committed state
Crash simulation (kill at random WAL positions) never produces corrupt state -- either the event is committed or it is not

Depends On: Phase 1
Complexity: L
Research Reference: docs/research/tidaldb_wal.md (wire format, group commit, crash detection, deduplication), thoughts.md Part II.1 (WAL convergence), Part V.5-6 (quarantine-first, group commit)
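
The group-commit rule (up to 100 events or 10ms, whichever comes first) can be sketched as a small batcher that is independent of the file I/O. This is an assumption-laden illustration, not the WAL's actual code: GroupCommit, push(), and poll() are hypothetical names, and the caller is assumed to write the returned batch and fsync once per batch.

```rust
use std::time::{Duration, Instant};

pub const MAX_BATCH_EVENTS: usize = 100;
pub const MAX_BATCH_DELAY: Duration = Duration::from_millis(10);

/// Hypothetical batcher: buffers events and reports when a batch is due.
pub struct GroupCommit<E> {
    pending: Vec<E>,
    oldest: Option<Instant>,
}

impl<E> GroupCommit<E> {
    pub fn new() -> Self {
        GroupCommit { pending: Vec::new(), oldest: None }
    }

    /// Buffer one event; returns the batch as soon as the size limit is reached.
    pub fn push(&mut self, event: E, now: Instant) -> Option<Vec<E>> {
        if self.pending.is_empty() {
            self.oldest = Some(now);
        }
        self.pending.push(event);
        if self.pending.len() >= MAX_BATCH_EVENTS {
            self.take()
        } else {
            None
        }
    }

    /// Called periodically; returns the batch once the oldest event has waited 10ms.
    pub fn poll(&mut self, now: Instant) -> Option<Vec<E>> {
        match self.oldest {
            Some(t) if now.duration_since(t) >= MAX_BATCH_DELAY => self.take(),
            _ => None,
        }
    }

    /// Hand the buffered events to the caller, who writes them and fsyncs once.
    fn take(&mut self) -> Option<Vec<E>> {
        self.oldest = None;
        if self.pending.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.pending))
        }
    }
}
```

Passing the clock in as a parameter keeps the policy deterministic under test, which matters for the crash-simulation and replay property tests.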

Phase 3: Storage Engine Trait and fjall Backend -- COMPLETE

Delivers: The StorageEngine trait abstraction and two implementations: FjallBackend (fjall 3 LSM-tree) for production and InMemoryBackend (BTreeMap + RwLock) for deterministic testing. Key encoding follows the subject-prefix pattern with a Tag discriminant. FjallStorage coordinates three keyspaces per entity kind. FjallAtomicBatch provides cross-keyspace atomic writes.

Acceptance Criteria:

StorageEngine trait with get, put, delete, scan_prefix, write_batch, flush operations
Key encoding: [entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...] with Tag enum (Evt=0x01, Sig=0x02, Meta=0x03, Rel=0x04, Mv=0x05, Idx=0x06)
encode_key, parse_key roundtrip correctly for all tag variants and arbitrary suffixes
entity_prefix (9 bytes) and entity_tag_prefix (10 bytes) for scoped prefix scans
Byte-lexicographic key ordering matches numeric entity ID ordering (property tested)
FjallBackend wraps a single fjall Keyspace, implements StorageEngine
FjallStorage owns a fjall Database with three keyspaces: "items", "users", "creators" (one per EntityKind)
FjallStorage::backend(EntityKind) routes to the correct keyspace backend
Entity kind isolation: same key written to different entity kinds does not collide
FjallAtomicBatch provides cross-keyspace atomic writes via fjall::OwnedWriteBatch
Data persists across close and reopen (flush_all + reopen test)
InMemoryBackend uses BTreeMap + RwLock for deterministic, sorted, concurrent testing
WriteBatch and BatchOp types for atomic multi-operation writes
PrefixIterator type alias for boxed prefix scan iterators
Property tests with proptest: encode/parse roundtrip, prefix ordering, prefix containment
Criterion benchmarks passing
cargo fmt clean, cargo clippy -D warnings clean, all 140 tests pass (128 unit + 12 integration)

Depends On: Phase 1
Complexity: L
Research Reference: thoughts.md Part V.9 (hybrid storage), Part V.12 (subject-prefix keys), CODING_GUIDELINES.md section 2
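
The key layout in the criteria above, [entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...], can be sketched as follows. The function names and tag values follow the criteria; the exact signatures and the Option-based error handling are assumptions.

```rust
/// Tag discriminants as listed in the acceptance criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Tag { Evt = 0x01, Sig = 0x02, Meta = 0x03, Rel = 0x04, Mv = 0x05, Idx = 0x06 }

impl Tag {
    fn from_byte(b: u8) -> Option<Tag> {
        match b {
            0x01 => Some(Tag::Evt),
            0x02 => Some(Tag::Sig),
            0x03 => Some(Tag::Meta),
            0x04 => Some(Tag::Rel),
            0x05 => Some(Tag::Mv),
            0x06 => Some(Tag::Idx),
            _ => None,
        }
    }
}

/// Build `[id BE][0x00][tag][suffix]`.
pub fn encode_key(entity_id: u64, tag: Tag, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    key.extend_from_slice(&entity_id.to_be_bytes());
    key.push(0x00);
    key.push(tag as u8);
    key.extend_from_slice(suffix);
    key
}

/// Inverse of `encode_key`; returns None for malformed keys.
pub fn parse_key(key: &[u8]) -> Option<(u64, Tag, &[u8])> {
    if key.len() < 10 || key[8] != 0x00 {
        return None;
    }
    let mut id = [0u8; 8];
    id.copy_from_slice(&key[..8]);
    let tag = Tag::from_byte(key[9])?;
    Some((u64::from_be_bytes(id), tag, &key[10..]))
}

/// 9-byte prefix covering every key of one entity.
pub fn entity_prefix(entity_id: u64) -> [u8; 9] {
    let mut p = [0u8; 9];
    p[..8].copy_from_slice(&entity_id.to_be_bytes());
    p
}

/// 10-byte prefix covering one data category of one entity.
pub fn entity_tag_prefix(entity_id: u64, tag: Tag) -> [u8; 10] {
    let mut p = [0u8; 10];
    p[..8].copy_from_slice(&entity_id.to_be_bytes());
    p[9] = tag as u8;
    p
}
```

Because the entity ID leads the key in big-endian form, a prefix scan over entity_prefix(id) touches one contiguous range, and keys for entity 255 always sort before keys for entity 256 regardless of tag or suffix.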

Phase 4: Signal Ledger -- Decay Scores and Windowed Aggregation

Delivers: The in-memory per-entity signal state with running decay scores (O(1) update, O(1) read) and bucketed windowed counters. Signal writes update the running scores atomically. Signal reads return decay-correct values without scanning raw events. State is checkpointed to storage for crash recovery.

Acceptance Criteria:

EntitySignalState is #[repr(C, align(64))] -- one L1 cache line per hot-path struct
Running decay formula: S(t) = S(t_prev) * exp(-lambda * dt) + weight -- mathematically exact, verified against analytical brute-force computation to 6 decimal places across 10,000 random event sequences (property test)
Out-of-order events handled correctly: when t_event < last_update, weight is pre-decayed: score += weight * exp(-lambda * (last_update - t_event))
Windowed counts use per-minute bucketed counters (BucketedCounter) supporting 1h/24h/7d windows
Velocity = windowed_count / window_duration_seconds
Signal write latency < 100 microseconds including WAL write (amortized), benchmarked with criterion
Decay score read latency < 100ns per entity per lambda, benchmarked with criterion
200-entity scoring pass < 5 microseconds, benchmarked with criterion
State checkpointed to storage every 30 seconds; crash recovery reconstructs from checkpoint + WAL replay
DashMap or sharded map for concurrent entity state access; signal counters use AtomicU64 with Relaxed ordering

Depends On: Phase 2, Phase 3
Complexity: XL
Research Reference: docs/research/tidaldb_signal_ledger.md (running-score formula, SWAG, BucketedCounter, EntityState struct, three-tier architecture)
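
The running-score update and the out-of-order rule from the criteria above can be checked against the brute-force sum in a few lines. This is a single-threaded sketch under simplifying assumptions (timestamps as non-negative f64 seconds, reads at or after the last update); RunningScore, apply(), read(), and analytical() are hypothetical names, and the lock-free CAS variant is not shown.

```rust
/// Running exponentially decayed score (hypothetical, single-threaded sketch).
#[derive(Debug, Clone, Copy)]
pub struct RunningScore {
    pub score: f64,
    pub last_update: f64, // seconds
}

impl RunningScore {
    pub fn new() -> Self {
        RunningScore { score: 0.0, last_update: 0.0 }
    }

    /// O(1) update for one event with weight `weight` at time `t_event`.
    pub fn apply(&mut self, lambda: f64, t_event: f64, weight: f64) {
        if t_event >= self.last_update {
            // In order: S(t) = S(t_prev) * exp(-lambda * dt) + weight
            let dt = t_event - self.last_update;
            self.score = self.score * (-lambda * dt).exp() + weight;
            self.last_update = t_event;
        } else {
            // Out of order: pre-decay the weight to the current reference time.
            self.score += weight * (-lambda * (self.last_update - t_event)).exp();
        }
    }

    /// O(1) read: decay the stored score forward to `now` (assumes now >= last_update).
    pub fn read(&self, lambda: f64, now: f64) -> f64 {
        self.score * (-lambda * (now - self.last_update)).exp()
    }
}

/// Brute-force reference: S(t) = sum(w_i * exp(-lambda * (t - t_i))).
pub fn analytical(lambda: f64, events: &[(f64, f64)], now: f64) -> f64 {
    events.iter().map(|&(t, w)| w * (-lambda * (now - t)).exp()).sum()
}
```

The out-of-order branch is what makes the running score agree with the analytical sum for any arrival order, which is the property the 10,000-sequence test is meant to establish.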

Phase 5: Entity CRUD and Signal Write API

Delivers: The public API surface for Milestone 1. TidalDB::open(), TidalDB::shutdown(), entity write/read, signal write/read. This is the interface the UAT scenario tests against. Includes the signal() method that atomically writes to WAL, updates in-memory state, and returns immediately.

Acceptance Criteria:

TidalDB::open(config) opens storage, restores in-memory state from checkpoint + WAL replay, returns Result<TidalDB>
TidalDB::shutdown() checkpoints all in-memory state, syncs WAL, closes storage cleanly
db.write_item(id, metadata) stores entity metadata
db.signal(signal_type, entity_id, weight, timestamp) atomically: appends to WAL, updates decay scores, updates windowed counters
db.read_decay_score(entity_id, signal_type, lambda_index) returns current decayed score
db.read_windowed_count(entity_id, signal_type, window) returns count within window
db.read_velocity(entity_id, signal_type, window) returns count / window_duration
Full UAT scenario passes as an integration test
TidalDB is Send + Sync -- safe to share across threads behind Arc

Depends On: Phase 4
Complexity: M
Research Reference: CODING_GUIDELINES.md section 9 (public API surface)

Deferred to Later Milestones

User entities and preference vectors -- deferred to M3 because M1 proves the signal primitive without needing user context
Creator entities and relationship edges -- deferred to M2/M3 because M1 only needs items to prove signal correctness
Vector index (USearch) -- deferred to M2 because M1 does not need ANN retrieval
Text index (Tantivy) -- deferred to M5 (Hybrid Search) because M1 does not need full-text search
Ranking profiles -- deferred to M2 because M1 proves signals work; M2 proves ranking over signals works
Query parser -- deferred to M2; M1 uses the Rust API directly
Diversity enforcement -- deferred to M2 because M1 does not produce ranked result sets
Signal rollups (hourly/daily materialization) -- deferred to M5 because the bucketed counter approach serves the performance budget through M4; rollups become necessary only at scale for 30d+ windows
RocksDB backend -- deferred indefinitely; fjall is the primary backend, RocksDB is the trait-abstracted fallback if benchmarks demand it

Integration Test

#[test]
fn milestone_1_uat() {
    // Open tidalDB with signal schema
    let db = TidalDB::open(Config {
        data_dir: temp_dir(),
        schema: Schema::builder()
            .entity_type("item", &["title", "category", "created_at"])
            .signal("view", Decay::exponential(Duration::days(7)),
                    &[Window::Hours(1), Window::Hours(24), Window::Days(7)])
            .signal("like", Decay::exponential(Duration::days(14)),
                    &[Window::Hours(24), Window::Days(7), Window::AllTime])
            .signal("skip", Decay::exponential(Duration::days(1)),
                    &[Window::Hours(1), Window::Hours(24)])
            .build(),
    }).unwrap();

    // Write 100 items
    for i in 0..100 {
        db.write_item(EntityId(i), metadata(i)).unwrap();
    }

    // Write 10,000 signal events spanning 7 days
    let events = generate_events(10_000, Duration::days(7));
    for e in &events {
        db.signal(e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Read and verify item #42
    let now = Timestamp::now();
    let analytical_score = compute_analytical_decay(&events, EntityId(42), "view", now);
    let actual_score = db.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!((actual_score - analytical_score).abs() < 1e-6);

    let analytical_count = count_events_in_window(&events, EntityId(42), "view", now, Duration::hours(24));
    let actual_count = db.read_windowed_count(EntityId(42), "view", Window::Hours(24)).unwrap();
    assert_eq!(actual_count, analytical_count);

    // Write new event and verify immediate visibility
    db.signal("view", EntityId(42), 1.0, now).unwrap();
    let new_score = db.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!(new_score > actual_score);

    // Close, reopen, verify persistence
    db.shutdown().unwrap();
    let db2 = TidalDB::open(same_config()).unwrap();
    let recovered_score = db2.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!((recovered_score - new_score).abs() < 1e-6);
}

Done When

A developer can embed tidalDB as a Rust dependency, define signal types with decay rates and windows in schema, write thousands of signal events, and read back decay-correct scores, windowed counts, and velocity values that match analytical computation to 6 decimal places -- including after a crash and restart. Performance benchmarks pass: signal write < 100us amortized, decay read < 100ns per entity, 200-entity scoring < 5us.

Milestone 2: Ranked Retrieval -- "A single query retrieves, scores, and ranks content"

Milestone Thesis

A developer can write items with metadata and embeddings, write signal events, and execute a RETRIEVE query that returns items ranked by a named profile using live signal scores -- with metadata filters and diversity constraints applied by the database, not the application. This proves that ranking is a database operation, not application logic.

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items with metadata (title, category, format, duration, created_at)
      and 1536-dim embeddings
    - Signal types: view (7d decay), like (14d decay), skip (1d decay),
      share (3d decay), completion (30d decay)
    - 100,000 signal events spanning 7 days across the items
    - Ranking profiles defined:
      * "trending" -- share_velocity(6h) primary, view_velocity(6h) secondary,
        engagement_ratio gate > 0.03
      * "hot" -- score / (age_hours + 2)^1.8
      * "new" -- created_at DESC
      * "top_week" -- quality_score within 7d window
      * "hidden_gems" -- high completion_rate, inverse view_count
      * "controversial" -- max(likes * dislikes)

When:
  1. RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25
  2. RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20
  3. RETRIEVE items USING PROFILE new LIMIT 20
  4. RETRIEVE items USING PROFILE top_week LIMIT 20
  5. RETRIEVE items USING PROFILE hidden_gems FILTER min_completion_rate:0.7 LIMIT 10
  6. RETRIEVE items USING PROFILE controversial LIMIT 10
  7. Write a burst of 100 "share" signals for item #500
  8. Re-execute the trending query

Then:
  - Step 1: Items ordered by share velocity, max 1 per creator, items with
    engagement_ratio < 0.03 excluded
  - Step 2: Only jazz items returned, ordered by hot formula
  - Step 3: Items ordered by created_at descending, no signal computation
  - Step 4: Items ordered by quality score computed from 7d-windowed signals
  - Step 5: Items with high completion but low views, sorted by quality/reach ratio
  - Step 6: Items with highest product of positive and negative signals
  - Step 7: ok
  - Step 8: Item #500 appears higher in trending results (signal written 100ms ago
    is reflected)
  - Performance: end-to-end RETRIEVE < 50ms for 10K items

Phases

Phase 1: Vector Index Integration (USearch)

Delivers: USearch wrapped behind a trait, with mmap persistence, f16 quantization, and the adaptive filtered search planner. Items can be inserted with embeddings and retrieved by ANN similarity.

Acceptance Criteria:

VectorIndex trait with insert(key, vector), remove(key), search(query, k), filtered_search(query, k, predicate), save(), load(), view()
USearch backend implements the trait with f16 quantization (default), mmap persistence
Vectors normalized at insertion time (for unit vectors, ranking by L2 distance is equivalent to ranking by cosine similarity, since ||a - b||^2 = 2 - 2*cos(a, b))
Adaptive query planner: selectivity < 2% triggers pre-filter + brute-force; 2-100% uses filtered_search with predicate callback
ANN retrieval at 10K vectors returns top-100 with recall@10 > 0.95
ANN retrieval latency < 10ms at 10K vectors (benchmarked)
Persistence: save on checkpoint, view() on restart for immediate read serving
#![forbid(unsafe_code)] relaxed only in the USearch FFI boundary module with SAFETY comments

Depends On: m1p3 (storage traits)
Complexity: L
Research Reference: docs/research/ann_for_tidaldb.md (USearch architecture, filtered search, f16, mmap)
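
The normalization criterion rests on a simple identity: for unit vectors, squared L2 distance equals 2 - 2 * cosine similarity, so both metrics produce the same nearest-neighbor ordering. The sketch below checks that identity with plain f32 slices; it does not call USearch, and normalize(), dot(), and l2_squared() are hypothetical helpers.

```rust
/// Scale a vector to unit length; a zero vector is returned unchanged.
pub fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Dot product; equals cosine similarity when both inputs are unit vectors.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance.
pub fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```

Normalizing once at insertion moves the cost off the query path: a search can use whichever distance the index computes fastest without changing the ranking.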

Phase 2: Metadata Indexes and Filter Engine

Delivers: Roaring bitmap indexes for categorical metadata, B-tree indexes for range attributes, and a composable filter engine that evaluates arbitrary filter combinations. The filter engine produces either a bitmap (for pre-filtering ANN) or a predicate closure (for in-graph filtering).

Acceptance Criteria:

Roaring bitmap per high-cardinality metadata value: category, format, creator_id
B-tree index for range attributes: created_at, duration
Filter expressions are composable: AND across dimensions, OR within a dimension
filter.selectivity() estimates the fraction of items matching (for query planner)
filter.to_bitmap() returns a RoaringBitmap for pre-filtering
filter.to_predicate() returns a Fn(EntityId) -> bool for in-graph filtering
Filters tested: category:jazz, format:video, duration_min:5m, created_within:7d, and arbitrary combinations
Filter evaluation < 1 microsecond per candidate (benchmarked)

Depends On: m1p3 (storage engine)
Complexity: M
Research Reference: docs/research/ann_for_tidaldb.md (metadata indexes, selectivity estimation, roaring bitmaps)
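
A small sketch of the composition rule (OR within a dimension, AND across dimensions) and of how selectivity could feed the 2% planner threshold from the vector index phase. To stay dependency-free it uses std BTreeSet in place of Roaring bitmaps and computes selectivity exactly rather than estimating it; MetadataIndex, Plan, and plan_for() are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// dimension -> value -> entity IDs. BTreeSet stands in for a Roaring bitmap.
#[derive(Default)]
pub struct MetadataIndex {
    dims: BTreeMap<String, BTreeMap<String, BTreeSet<u64>>>,
    universe: BTreeSet<u64>,
}

impl MetadataIndex {
    pub fn insert(&mut self, id: u64, dim: &str, value: &str) {
        self.universe.insert(id);
        self.dims
            .entry(dim.to_string())
            .or_default()
            .entry(value.to_string())
            .or_default()
            .insert(id);
    }

    /// OR within a dimension: union of the sets for each accepted value.
    fn any_of(&self, dim: &str, values: &[&str]) -> BTreeSet<u64> {
        let mut out = BTreeSet::new();
        if let Some(by_value) = self.dims.get(dim) {
            for v in values {
                if let Some(ids) = by_value.get(*v) {
                    out.extend(ids.iter().copied());
                }
            }
        }
        out
    }

    /// AND across dimensions: intersect the per-dimension unions.
    pub fn matching(&self, filter: &[(&str, &[&str])]) -> BTreeSet<u64> {
        let mut acc = self.universe.clone();
        for (dim, values) in filter {
            let dim_set = self.any_of(dim, values);
            acc = acc.intersection(&dim_set).copied().collect();
        }
        acc
    }

    /// Fraction of indexed items that match (exact here; an estimate in practice).
    pub fn selectivity(&self, filter: &[(&str, &[&str])]) -> f64 {
        if self.universe.is_empty() {
            return 0.0;
        }
        self.matching(filter).len() as f64 / self.universe.len() as f64
    }
}

#[derive(Debug, PartialEq)]
pub enum Plan { PreFilterBruteForce, FilteredGraphSearch }

/// Below 2% selectivity, pre-filter and brute-force; otherwise filter in-graph.
pub fn plan_for(selectivity: f64) -> Plan {
    if selectivity < 0.02 { Plan::PreFilterBruteForce } else { Plan::FilteredGraphSearch }
}
```

The same matching set serves both consumers named in the criteria: it is the bitmap for pre-filtering, and a membership test against it is the predicate for in-graph filtering.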

Phase 3: Ranking Profile Engine

Delivers: Named ranking profiles declared as data (not compiled code), parsed, validated, stored, and executed by the database. Profiles reference signal scores, windowed aggregates, velocity, metadata fields, and define quality gates. Profiles are versioned and swappable at query time.

Acceptance Criteria:

Profile declaration syntax supports: primary signal, secondary signals with weights, BOOST, GATE (minimum threshold), PENALIZE, EXCLUDE
Profiles stored in schema, versioned, retrievable by name
Profile execution: given a candidate set and a profile, produce a scored and sorted result list
Built-in profiles implemented: trending, hot, new, top_week, top_month, top_all_time, hidden_gems, controversial, most_viewed, most_liked, shuffle
hot formula: score / (age_hours + 2)^gravity with configurable gravity
controversial formula: max(positive_signals * negative_signals)
hidden_gems formula: quality_score * (1 / log(1 + view_count))
Profile change does not require recompile -- profiles are runtime data
200-candidate scoring pass with a profile < 10 microseconds (benchmarked)

Depends On: m1p4 (signal ledger)
Complexity: L
Research Reference: VISION.md (ranking profile declarations), ai-lookup/services/ranking-profiles.md, USE_CASES.md Appendix B (sort mode formulas)
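
The three formulas listed in the criteria can be transcribed directly as per-item scoring functions. The sketch makes two assumptions that the roadmap does not state: "log" is read as the natural logarithm, and the "max(...)" in the controversial formula is read as "rank by the per-item product, highest first". Function names are hypothetical.

```rust
/// hot: score / (age_hours + 2)^gravity
pub fn hot(score: f64, age_hours: f64, gravity: f64) -> f64 {
    score / (age_hours + 2.0).powf(gravity)
}

/// controversial: per-item product of positive and negative signal totals.
/// Ranking by this value descending favors items that are high on both sides.
pub fn controversial(positive: f64, negative: f64) -> f64 {
    positive * negative
}

/// hidden_gems: quality_score * (1 / log(1 + view_count)), log assumed natural.
/// Transcribed literally: view_count = 0 divides by zero and yields infinity
/// for positive quality, so a real profile would need an explicit policy there.
pub fn hidden_gems(quality_score: f64, view_count: u64) -> f64 {
    quality_score / (view_count as f64).ln_1p()
}
```

For a fixed total of votes, the product peaks when positive and negative signals are balanced (10 * 10 beats 19 * 1), which is the behavior a controversial sort is after.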

Phase 4: Diversity Enforcement

Delivers: Post-scoring diversity pass that reorders results to satisfy constraints (max_per_creator, format_mix) without reducing result count. Implemented as a greedy selection pass over the scored candidate list.

Acceptance Criteria:

max_per_creator:N enforced: no more than N items from any single creator in the result set
format_mix:true enforced: no more than 60% of results from any single format
Diversity pass does not reduce result count -- it selects the next-best candidate that satisfies constraints
Diversity pass adds < 1ms for 200 candidates (benchmarked)
When diversity constraints cannot be fully satisfied (too few creators), results are returned with a warning flag, not an error
Property test: diversity constraints hold for 10,000 random candidate sets

Depends On: Phase 3 (ranking profiles produce scored lists) Complexity: M Research Reference: VISION.md (diversity as query constraint), thoughts.md Part V.14 (MMR post-scoring)
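
A minimal sketch of the greedy max_per_creator pass, assuming the input is already sorted by descending score. The Scored and DiversityResult types are hypothetical, and backfilling with over-cap items when the constraint cannot be met is one possible reading of the "warning flag, not an error" criterion, not a confirmed design.

```rust
use std::collections::HashMap;

/// Hypothetical scored candidate.
#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub id: u64,
    pub creator: u64,
    pub score: f64,
}

/// Output of the diversity pass; `relaxed` is the warning flag.
pub struct DiversityResult {
    pub items: Vec<Scored>,
    pub relaxed: bool,
}

/// Greedy selection: walk the score-sorted list and take each candidate whose
/// creator is still under the cap. If the list runs out before `limit` is
/// reached, backfill from the skipped candidates and set the warning flag.
pub fn enforce_max_per_creator(
    sorted: &[Scored],
    max_per_creator: usize,
    limit: usize,
) -> DiversityResult {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    let mut picked: Vec<Scored> = Vec::with_capacity(limit);
    let mut skipped: Vec<Scored> = Vec::new();

    for c in sorted {
        if picked.len() == limit {
            break;
        }
        let n = counts.entry(c.creator).or_insert(0);
        if *n < max_per_creator {
            *n += 1;
            picked.push(c.clone());
        } else {
            skipped.push(c.clone());
        }
    }

    // Constraint cannot be fully satisfied: keep the result count, flag it.
    let relaxed = picked.len() < limit && !skipped.is_empty();
    if relaxed {
        let need = limit - picked.len();
        picked.extend(skipped.into_iter().take(need));
        picked.sort_by(|a, b| b.score.total_cmp(&a.score));
    }
    DiversityResult { items: picked, relaxed }
}
```

The pass is a single linear scan plus at most one sort of the output, so it should fit comfortably in the 1ms budget for 200 candidates.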

Phase 5: Query Parser and RETRIEVE Executor

Delivers: The query parser for the RETRIEVE operation and the executor that orchestrates candidate retrieval, filtering, scoring, diversity, and result assembly. This is the "one query" entry point. For M2, the RETRIEVE query does not require FOR USER (no personalization yet) -- it operates on the full item corpus with filters and profiles.

Acceptance Criteria:

Parser handles: RETRIEVE items, USING PROFILE <name>, FILTER <conditions>, DIVERSITY <constraints>, LIMIT <n>, EXCLUDE [ids]
Parser produces a typed AST; parse errors include position and helpful message
Executor pipeline: candidate retrieval (ANN or full scan based on profile) -> filter -> score -> diversity -> limit -> return
When profile uses velocity/decay signals, executor uses ANN retrieval over embeddings then scores with signal state
When profile is new or alphabetical, executor skips ANN and uses metadata index directly
End-to-end RETRIEVE latency < 50ms at 10K items (benchmarked)
Results include: entity_id, score, and a signal snapshot (key signal values used in scoring) for debugging/transparency
SIGNAL write command also parsed and routed to signal write path from M1
Full M2 UAT scenario passes as an integration test

Depends On: Phase 1, Phase 2, Phase 3, Phase 4 Complexity: L Research Reference: ai-lookup/features/query-language.md, SEQUENCE.md (all sequence diagrams)

Deferred to Later Milestones

FOR USER clause and user preference vectors -- deferred to M3; M2 proves ranking works without personalization
SIMILAR TO clause (related content) -- deferred to M3; requires user context for personalization layer
Relationship graph (follows, blocks) -- deferred to M3; M2 filters on metadata, not relationships
SEARCH query (text + semantic) -- deferred to M5; M2 proves RETRIEVE ranking
Full-text index (Tantivy) -- deferred to M5
Exploration budget / cold start -- deferred to M3; requires user context to be meaningful
User state filters (unseen, saved, liked) -- deferred to M3; requires user entities
Engagement threshold filters (min_views, min_likes) -- partially implemented via signal reads; full composable filter syntax deferred to M6

Integration Test

#[test]
fn milestone_2_uat() {
    let db = open_with_full_schema();

    // Write 10K items with embeddings
    for i in 0..10_000 {
        db.write_item(EntityId(i), metadata(i), Some(embedding(i))).unwrap();
    }

    // Write 100K signal events
    for e in generate_events(100_000, Duration::days(7)) {
        db.signal(e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Trending query with diversity
    let results = db.retrieve(
        "RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25"
    ).unwrap();
    assert_eq!(results.len(), 25);
    assert!(results.windows(2).all(|w| w[0].score >= w[1].score));
    assert!(creator_counts(&results).values().all(|&c| c <= 1));

    // Category filter with hot sort
    let jazz = db.retrieve(
        "RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20"
    ).unwrap();
    assert!(jazz.iter().all(|r| r.metadata["category"] == "jazz"));

    // Signal freshness: write burst, verify ranking change
    let pre_burst = db.retrieve(
        "RETRIEVE items USING PROFILE trending LIMIT 10"
    ).unwrap();
    for _ in 0..100 {
        db.signal("share", EntityId(500), 1.0, Timestamp::now()).unwrap();
    }
    let post_burst = db.retrieve(
        "RETRIEVE items USING PROFILE trending LIMIT 10"
    ).unwrap();
    let pre_rank = pre_burst.iter().position(|r| r.id == EntityId(500));
    let post_rank = post_burst.iter().position(|r| r.id == EntityId(500));
    // If item 500 was outside the top 10 before the burst, treat its rank as 10
    assert!(post_rank.unwrap() < pre_rank.unwrap_or(10));
}

Done When

A developer can write items with embeddings and metadata, write signal events, and execute RETRIEVE queries with any of the 11+ built-in sort modes, metadata filters, and diversity constraints. Results are correctly ranked by the named profile. Signal events written 100ms ago are reflected in the next query. End-to-end latency < 50ms at 10K items. Diversity constraints hold in every result set.

Milestone 3: Personalized Ranking -- "The For You query works"

Milestone Thesis

A developer can write user entities with preference vectors, write relationship edges (follows, blocks), write engagement signals that update user profiles and relationship weights automatically, and execute RETRIEVE items FOR USER @user_id USING PROFILE for_you -- getting results shaped by the user's history, relationships, and implicit preferences. This proves that the feedback loop closes inside the database.

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items across 200 creators, with embeddings
    - 500 users with initial preference embeddings
    - Relationship edges: follows, blocks
    - Signals: view, like, skip, hide, completion, share
    - 500,000 historical signal events establishing user preferences
    - Profiles: for_you, following, related, notification

When:
  1. RETRIEVE items FOR USER @user_42 USING PROFILE for_you
     FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50
  2. RETRIEVE items FOR USER @user_42 FILTER relationship:follows
     USING PROFILE following LIMIT 50
  3. RETRIEVE items SIMILAR TO @item_abc FOR USER @user_42
     USING PROFILE related FILTER unseen LIMIT 10
  4. SIGNAL like item:@item_xyz user:@user_42
  5. Re-execute the for_you query
  6. SIGNAL hide item:@item_999 user:@user_42
  7. SIGNAL block user:@user_42 target_creator:@creator_77
  8. Re-execute the for_you query

Then:
  - Step 1: Results personalized -- items matching user_42's preference vector
    rank higher; items from blocked creators excluded; items already seen excluded;
    max 2 per creator; 10% exploration budget (items from unfollowed creators)
  - Step 2: Only items from followed creators, chronological order
  - Step 3: Items semantically similar to @item_abc, re-ranked by user_42's
    preference match, already-seen excluded
  - Step 4: Signal write atomically updates: item like count, user->creator
    interaction weight, user preference vector shifted toward item embedding
  - Step 5: Results shift -- items similar to @item_xyz's topic rank higher;
    creator of @item_xyz appears more frequently
  - Step 6: @item_999 never appears in any future query for user_42
  - Step 7: All items by creator_77 excluded from all queries for user_42
  - Step 8: No items from creator_77; no item_999; shift from like reflected

Phases

Phase 1: User and Creator Entities with Relationships

Delivers: User and creator entity types with preference vectors and a relationship graph. Relationship edges are weighted, directional, and queryable. Follows, blocks, interaction weights are first-class.

Acceptance Criteria:

User entities store: user_id, preference embedding (mutable, updated on signals), metadata
Creator entities store: creator_id, catalog embedding (aggregated from items), metadata
Relationship edges: (from_entity, to_entity, type, weight, timestamp) with types: follows, blocks, interaction_weight, hide, mute
follows filter: efficiently enumerate all items by creators a user follows (union of the per-creator item roaring bitmaps over the user's follows set)
blocked filter: efficiently exclude all items by blocked creators
unseen filter: roaring bitmap of user's seen item set, inverted
Relationship write/read latency < 50 microseconds

Depends On: m1p3 (storage), m2p2 (bitmap indexes) Complexity: L

Phase 2: Feedback Loop -- Signal Writes Update User State

Delivers: When a signal event is written (like, skip, hide, completion), the database atomically updates the item's signal ledger, the user-to-item relationship, the user-to-creator interaction weight, and the user's preference vector. One write, multiple state updates, no application logic.

Acceptance Criteria:

db.signal("like", item_id, user_id, weight, timestamp) atomically:
1. Appends event to WAL
2. Updates item signal ledger (decay scores, windowed counts)
3. Increments user->creator interaction_weight
4. Shifts user preference vector toward item embedding (configurable learning rate)
db.signal("skip", ...) atomically: updates item skip count, decays user->creator weight, shifts preference vector away from item embedding
db.signal("hide", ...) sets permanent hard-negative on user->item relationship; item excluded from all future queries for this user
db.signal("block", user, creator) sets permanent block; all items by creator excluded from all queries for this user
Preference vector update uses exponential moving average: pref = alpha * item_embedding + (1 - alpha) * pref (positive) or pref = pref - alpha * item_embedding (negative), normalized after update
All updates visible to the next query (no eventual consistency lag within the process)
Property test: 10,000 random signal sequences never produce a state where a hidden item or blocked creator appears in query results

Depends On: Phase 1, m1p4 (signal ledger) Complexity: XL
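
The preference vector update rule above can be sketched as follows. The function name, the f32 slices, and the in-place mutation are assumptions for illustration; only the two update formulas and the normalization step come from the acceptance criteria.

```rust
/// Shift `pref` toward (positive) or away from (negative) `item`, then
/// L2-normalize. `alpha` is the configurable learning rate.
pub fn update_preference(pref: &mut [f32], item: &[f32], alpha: f32, positive: bool) {
    assert_eq!(pref.len(), item.len(), "embedding dimensions must match");

    for (p, &x) in pref.iter_mut().zip(item) {
        *p = if positive {
            // pref = alpha * item_embedding + (1 - alpha) * pref
            alpha * x + (1.0 - alpha) * *p
        } else {
            // pref = pref - alpha * item_embedding
            *p - alpha * x
        };
    }

    // Normalize after update; leave an all-zero vector untouched.
    let norm = pref.iter().map(|v| v * v).sum::<f32>().sqrt();
    if norm > 0.0 {
        for p in pref.iter_mut() {
            *p /= norm;
        }
    }
}
```

For example, with pref = (1, 0), item = (0, 1), and alpha = 0.5, a like moves the vector to the diagonal, while a skip pushes its second component negative.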

Phase 3: Personalized Ranking Profiles

Delivers: Ranking profiles that incorporate user context: preference match (embedding similarity between user and item), user-creator interaction weight, social proof (engagement from user's follows), and user-specific exclusions. The for_you, following, related, and notification profiles.

Acceptance Criteria:

for_you profile: ANN retrieval using user preference vector, scoring = preference_match * engagement_velocity * recency_decay * social_proof, gates on completion_rate, penalizes skip count, 10% exploration budget
following profile: candidate set restricted to followed creators' items, sorted by created_at DESC, tiebreaker on completion_rate
related profile: ANN retrieval using source item's embedding, collaborative filtering boost (items co-engaged with source), personalization re-rank by user preference
notification profile: candidates from followed creators' recent items, scored by relationship_strength * item_quality
Exploration budget: 10% of for_you results are from creators the user does not follow, to prevent filter bubbles
Cold start: new users with no signal history get results ranked by population-level signals (trending, quality)
Cold start: new items with no signals get an exploration window (appear in a small % of for_you feeds)
FOR USER @user_id clause parsed and user state loaded into query context

Depends On: Phase 2, m2p3 (ranking engine), m2p5 (query parser) Complexity: L

Phase 4: User State Filters

Delivers: Filters that depend on user state: unseen, in_progress, saved, liked, in_collection. These require per-user bitmaps or sets maintained by the signal system.

Acceptance Criteria:

unseen filter: excludes items the user has viewed (maintained as roaring bitmap per user, updated on view signal)
unblocked filter: excludes items from blocked creators and hidden items
saved filter: returns only items the user has saved
liked filter: returns only items the user has liked
in_progress filter: returns items with partial completion signal
User state filters compose with all metadata filters from M2
Per-user seen bitmap memory: at most ~125KB per user at 1M items (1M bits when the roaring bitmap is fully dense; sparse seen sets are smaller), or roughly 1.25GB for 10K users, which is manageable in memory

Depends On: Phase 1, Phase 2 Complexity: M
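
The ~125KB figure follows from one bit per item. The sketch below uses a plain dense bitset as a stand-in for a roaring bitmap (it is not the roaring format and does not compress sparse sets); the SeenSet name and its methods are hypothetical.

```rust
/// Dense one-bit-per-item set, standing in for a per-user roaring bitmap.
pub struct SeenSet {
    words: Vec<u64>,
}

impl SeenSet {
    /// Allocate enough 64-bit words to cover `num_items` item slots.
    pub fn with_capacity(num_items: usize) -> Self {
        Self { words: vec![0; (num_items + 63) / 64] }
    }

    /// Record a view signal. Panics if `item` is out of range.
    pub fn mark_seen(&mut self, item: usize) {
        self.words[item / 64] |= 1u64 << (item % 64);
    }

    /// The unseen filter: true when the user has not viewed `item`.
    pub fn is_unseen(&self, item: usize) -> bool {
        self.words[item / 64] & (1u64 << (item % 64)) == 0
    }

    /// Bytes used by the bit storage (8 bytes per word).
    pub fn size_bytes(&self) -> usize {
        self.words.len() * 8
    }
}
```

At 1,000,000 items this allocates 15,625 words, i.e. 125,000 bytes per user, which is the upper bound quoted above.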

Deferred to Later Milestones

SEARCH query with personalization -- deferred to M5; M3 proves personalized RETRIEVE
Tantivy integration -- deferred to M5
People/creator search (UC-10) -- deferred to M5
Social graph traversal for trending ("trending among my follows") -- deferred to M6; requires graph query capabilities beyond simple follows filter
Collaborative filtering -- basic co-engagement signals used in related profile; full matrix-factorization-style CF deferred to M6
User-created collections/boards (UC-09.4) -- deferred to M6
Live content status tracking (UC-12) -- deferred to M6

Integration Test

#[test]
fn milestone_3_uat() {
    let db = open_with_users_and_relationships();

    // User 42 likes jazz, follows creators 1-10, blocked creator 77
    let feed = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();
    assert_eq!(feed.len(), 50);
    assert!(feed.iter().all(|r| !user_42_seen.contains(&r.id)));
    assert!(feed.iter().all(|r| r.creator_id != CreatorId(77)));
    assert!(creator_counts(&feed).values().all(|&c| c <= 2));

    // Like an item, verify preference shift
    db.signal("like", EntityId(500), UserId(42), 1.0, now()).unwrap();
    let feed2 = db.retrieve(same_for_you_query()).unwrap();
    // Items topically similar to item 500 should rank higher
    let topic_500 = db.read_item(EntityId(500)).unwrap().category;
    let topic_match_before = feed.iter().filter(|r| r.category == topic_500).count();
    let topic_match_after = feed2.iter().filter(|r| r.category == topic_500).count();
    assert!(topic_match_after >= topic_match_before);

    // Hide and block, verify exclusion
    db.signal("hide", EntityId(999), UserId(42), 1.0, now()).unwrap();
    db.signal("block", UserId(42), CreatorId(77), 1.0, now()).unwrap();
    let feed3 = db.retrieve(same_for_you_query()).unwrap();
    assert!(feed3.iter().all(|r| r.id != EntityId(999)));
    assert!(feed3.iter().all(|r| r.creator_id != CreatorId(77)));
}

Done When

The full "For You" query works: RETRIEVE items FOR USER @user_id USING PROFILE for_you FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50 returns personalized, diversity-constrained results that reflect the user's engagement history, exclude hidden items and blocked creators, include an exploration budget, handle cold-start users and items, and update in response to new signal events within 100ms. The following, related, and notification profiles also work correctly.

Milestone 4: Agent Memory -- "Agents own the personalization substrate"

Milestone Thesis

Agents mediate the user interaction: they ground LLM responses, collect preferences, and emit feedback. This milestone proves a developer can embed tidalDB alongside an agent runtime, create sessions, append structured feedback signals (reward, tool usage, critiques), enforce per-agent policy, and query session memory in milliseconds.

Phases

Phase 1: Session Schema & Lifecycle

Delivers: SessionId, AgentId, and AgentPolicy types in schema plus builder flags (with_sessions(true)). APIs to start_session, append_session_metadata, close_session. WAL entries tagged with agent metadata and CLI output listing active sessions.

Phase 2: Session Materializers & Short-Lived Aggregates

Delivers: SessionMaterializer (minute-scale decay buckets for reward/pref hints, tool usage counters) registered via the existing materializer trait. Query APIs session_view(session_id) and session_velocity(session_id, signal_type) with <5µs read latency. Integration tests proving hot path throughput at 50k updates/sec.

Phase 3: Policy & Safety Layer

Delivers: Declarative schema-bound policies (allowed signal types, max QPS, storage TTL). Enforcement in the signal write path (reject or queue). Audit log per agent (accessible via CLI/metrics) plus rate-limiters to isolate noisy agents.
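
As a rough sketch of write-path enforcement, the check below covers two of the policy fields named above: allowed signal types and a per-second write cap. The type names, the fixed one-second window, and the reject-only behavior (no queueing) are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical per-agent policy.
pub struct AgentPolicy {
    pub allowed_signals: HashSet<String>,
    pub max_writes_per_sec: u32,
}

#[derive(Debug, PartialEq)]
pub enum PolicyDecision {
    Accept,
    RejectSignalType,
    RejectRateLimit,
}

/// Stateful gate placed in front of the signal write path.
pub struct PolicyGate {
    policy: AgentPolicy,
    window_start_sec: u64,
    writes_in_window: u32,
}

impl PolicyGate {
    pub fn new(policy: AgentPolicy) -> Self {
        Self { policy, window_start_sec: 0, writes_in_window: 0 }
    }

    /// Decide whether a write of `signal_type` at `now_sec` is allowed.
    pub fn check(&mut self, signal_type: &str, now_sec: u64) -> PolicyDecision {
        if !self.policy.allowed_signals.contains(signal_type) {
            return PolicyDecision::RejectSignalType;
        }
        // Fixed one-second window: reset the counter when the second changes.
        if now_sec != self.window_start_sec {
            self.window_start_sec = now_sec;
            self.writes_in_window = 0;
        }
        if self.writes_in_window >= self.policy.max_writes_per_sec {
            return PolicyDecision::RejectRateLimit;
        }
        self.writes_in_window += 1;
        PolicyDecision::Accept
    }
}
```

Each rejection is the natural point to append an audit log entry for the agent.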

Phase 4: Agent-Facing APIs & Explanations

Delivers: retrieve_for_session / search_for_session endpoints returning ranked items plus a session_snapshot (top signals, reasons, reward velocity). Agent-friendly error codes, documentation, and samples (user → agent → tidalDB). Session data plumbed into ranking profiles via new SessionContext.

UAT Scenario

Given:
  An agent opens session S for user @u123 with metadata {tool:"planner"}
  Policy allows signals preference_hint and reward; forbids raw_log

When:
  1. Agent writes preference_hint ("more jazz today")
  2. Agent writes reward(+0.8) after delivering an answer
  3. Agent executes RETRIEVE ... FOR USER @u123 FOR SESSION @S USING PROFILE for_you LIMIT 10
  4. Agent receives ranked items and session_snapshot (reward_velocity, last_tool)
  5. Agent attempts to write raw_log → rejected with policy violation
  6. Session closes; CLI shows duration, writes, rejections

Then:
  - Session aggregates reflect preference/reward immediately
  - Policy enforcement blocks disallowed write with audit trail
  - After closure, querying session S returns archived snapshot with final signals

Milestone 5: Hybrid Search -- "Text + semantic + signals in one query"

Milestone Thesis

A developer can execute SEARCH items QUERY "rust tutorial beginner" VECTOR query_vector FOR USER @user_id USING PROFILE search LIMIT 20 and get results that combine BM25 text relevance, semantic similarity, and user personalization in a single ranked list. This proves that search and retrieval are the same system.

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items with text fields (title, description, tags) indexed for full-text search
    - All items have embeddings
    - 500 users with engagement history
    - Search profile defined: text relevance as floor, semantic similarity,
      personalization adjustment

When:
  1. SEARCH items QUERY "rust tutorial beginner" VECTOR [query_embedding]
     FOR USER @user_42 USING PROFILE search DIVERSITY max_per_creator:2 LIMIT 20
  2. SEARCH items QUERY "jazz piano" FOR USER @user_42
     USING PROFILE search FILTER duration:short, format:video LIMIT 20
  3. SEARCH items QUERY "\"exact phrase match\"" USING PROFILE search LIMIT 10
  4. SEARCH items QUERY "jazz -beginner" USING PROFILE search LIMIT 10
  5. SEARCH creators QUERY "jazz" LIMIT 10
  6. User clicks result #3, record SIGNAL search_click
  7. User searches same query again

Then:
  - Step 1: Results combine BM25 + semantic similarity via RRF;
    personalization re-ranks within relevant set; user_42 (a beginner)
    sees beginner content elevated
  - Step 2: Text-only search (no vector), filtered by duration and format
  - Step 3: Exact phrase match -- only items containing "exact phrase match"
  - Step 4: Boolean exclusion -- no items matching "beginner"
  - Step 5: Creator search by name/topic
  - Step 6: Signal recorded with query context and rank position
  - Step 7: Clicked result may rank higher due to search_click signal
  - Performance: SEARCH < 50ms at 10K items

Phases

Phase 1: Tantivy Integration

Delivers: Tantivy embedded as a derived index for full-text search. DB-primary consistency pattern: entity store is source of truth, Tantivy is a materialized view updated via outbox. BM25 scoring exposed via custom Collector and Weight/Scorer seek pattern.

Acceptance Criteria:

Tantivy index created from schema text field definitions (title, description, tags)
Background indexer reads entity store outbox and feeds Tantivy writer
Tantivy commit stores last-processed sequence number in payload for crash recovery
Custom AllScoresCollector returns all matching doc IDs with BM25 scores
Weight::scorer + DocSet::seek pattern scores specific candidate IDs (for re-ranking ANN results)
External entity ID -> DocAddress mapping maintained and updated on segment merge
Boolean queries supported: AND, OR, NOT, exact phrase, field-scoped
Commit interval: every 1-5 seconds or every N thousand documents
Index rebuild from entity store completes in < 10 minutes at 10K items
BM25 query latency < 10ms at 10K documents (benchmarked)

Depends On: m1p3 (storage engine), m1p5 (entity API) Complexity: L Research Reference: docs/research/tantivy.md (Collector API, consistency pattern, seek scoring, commit model)

Phase 2: Hybrid Fusion (RRF)

Delivers: Reciprocal Rank Fusion combining BM25 ranked lists with ANN ranked lists into a single scored result set. The starting point is RRF with k=60; the architecture supports upgrading to tuned linear combination when relevance labels exist.

Acceptance Criteria:

RRF(d) = 1/(60 + rank_bm25(d)) + 1/(60 + rank_ann(d)) implemented
Documents appearing in only one list contribute only their single-list term
RRF results are re-rankable by personalization (user preference overlay)
When only text query is provided (no vector), pure BM25 ranking used
When only vector is provided (no text), pure ANN ranking used
Fusion adds < 1ms to query time (benchmarked)
k parameter configurable (default 60)

Depends On: Phase 1 (BM25 scores), m2p1 (ANN scores) Complexity: S Research Reference: docs/research/tantivy.md (RRF section, Cormack et al.)
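
The RRF formula above can be sketched in a few lines. The function name and the u64 document IDs are hypothetical, and ranks are assumed to be 1-based positions in each input list; a document missing from one list simply contributes no term for that list.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over two ranked ID lists (best first).
/// Returns (doc_id, fused_score) sorted by descending score, ties by ID.
pub fn rrf_fuse(bm25: &[u64], ann: &[u64], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();

    for list in [bm25, ann] {
        for (i, &doc) in list.iter().enumerate() {
            // Rank is the 1-based position in the list.
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }

    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, BM25 and ANN scores never need to be put on a common scale, which is why RRF works as a starting point before relevance labels exist.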

Phase 3: SEARCH Query Parser and Executor

Delivers: The SEARCH query parser and executor that orchestrates text retrieval, semantic retrieval, fusion, personalization, filtering, diversity, and result assembly.

Acceptance Criteria:

Parser handles: SEARCH items/creators, QUERY "text", VECTOR [embedding], FOR USER, USING PROFILE, FILTER, DIVERSITY, LIMIT
Query text parsing: exact phrase ("..."), boolean operators (AND/OR/NOT/-), field-scoped (title:...), wildcard (term*)
Executor pipeline: text retrieval -> ANN retrieval -> fusion -> personalization -> filter -> diversity -> return
When both QUERY and VECTOR provided, hybrid fusion (RRF)
When only QUERY, BM25-only retrieval
When only VECTOR, ANN-only retrieval
Search results include: entity_id, combined_score, bm25_score, semantic_score, rank
search_click signal writes include query context and rank position
End-to-end SEARCH < 50ms at 10K items (benchmarked)

Depends On: Phase 1, Phase 2, m2p5 (query parser infrastructure) Complexity: M

Phase 4: Creator and People Search

Delivers: Search over creator entities by name, topic, and attributes. "Creators like X" via creator embedding similarity. Enables UC-10.

Acceptance Criteria:

Creator entities indexed in Tantivy (name, handle, bio, topics)
Creator embeddings searchable via ANN (aggregated from catalog)
SEARCH creators QUERY "jazz" LIMIT 10 returns creators matching topic
SEARCH creators SIMILAR TO @creator_id LIMIT 10 returns similar creators by embedding
Creator filters: verified, min_followers, language, followed_by_user
Creator sort modes: follower_count, engagement_rate, posting_frequency

Depends On: Phase 1, m3p1 (creator entities) Complexity: M

Deferred to Later Milestones

Autocomplete and search suggestions (UC-02.3) -- deferred to M6; requires prefix indexes and trending query tracking
Saved searches and alerts (UC-02.4) -- deferred to M6; requires persistent query storage and push notification
Visual search / image search (UC-11) -- deferred to M6; requires multi-modal embedding support
"Did you mean" typo correction -- deferred to M6; requires edit-distance computation on term dictionary
Tuned linear combination (replacing RRF) -- deferred to M6; requires relevance labels for alpha tuning

Done When

A developer can execute SEARCH queries that combine full-text BM25 relevance with semantic vector similarity and user personalization in a single ranked result set. Boolean queries, phrase matching, field-scoped search, and creator search all work. Results reflect engagement signals. End-to-end SEARCH latency < 50ms at 10K items.

Milestone 6: Full Surface Coverage -- "Every use case works"

Milestone Thesis

Every one of the 14 use cases works end-to-end. Every sort mode, every filter dimension, every discovery surface described in USE_CASES.md is operational. The query RETRIEVE items FOR USER @user_id CONTEXT feed USING PROFILE for_you FILTER unseen, unblocked, format:video, duration:short DIVERSITY max_per_creator:2, format_mix:true LIMIT 50 is the complete, production-quality end state query.

UAT Scenario

Given:
  A tidalDB instance loaded with:
    - 100,000 items across 1,000 creators
    - 10,000 users with engagement histories
    - All 14 use case scenarios configured
    - All sort modes and filter dimensions exercised

When:
  All 14 use cases are executed as described in USE_CASES.md:
    UC-01: For You Feed with full diversity and exploration
    UC-02: Search with all filter dimensions, autocomplete, saved searches
    UC-03: Trending (global, category, social-graph scoped)
    UC-04: Following feed (chronological, algorithmic modes)
    UC-05: Related/Up Next with collaborative filtering
    UC-06: Browse with all sort modes, faceted filters, mood filters
    UC-07: Notification prioritization with frequency capping
    UC-08: Creator profile (Top, New, Hot, For You modes)
    UC-09: User library (history, saved, liked, collections, continue watching)
    UC-10: People search with "creators like X"
    UC-11: Visual/semantic search with image embeddings
    UC-12: Live content with real-time viewer count
    UC-13: Hidden gems with breakout detection
    UC-14: Controversial and Hot with dual-signal ranking

Then:
  Every query returns correct results per use case specification.
  All 25+ sort modes produce correctly ordered results.
  All filter dimensions compose correctly.
  Performance: < 50ms for all queries at 100K items.

Phases

(Phases for M6 are provisional -- detailed decomposition happens after M5 ships, informed by what was learned.)

Phase 1: Complete Sort Mode Coverage

Delivers: All 25+ sort modes from Appendix B operational. Windowed top sorts (hour, today, week, month, year, all_time), shuffle, alphabetical, shortest/longest, live_viewer_count, date_saved, creator_engagement_rate.

Depends On: M5 complete Complexity: L

Phase 2: Complete Filter Coverage

Delivers: All filter dimensions from Appendix A operational and composable. Geographic filters, accessibility filters, community signal filters, availability filters, engagement threshold filters.

Depends On: Phase 1 Complexity: L

Phase 3: Social Graph Traversal and Collaborative Filtering

Delivers: Social graph traversal for trending-among-follows, collaborative filtering for related/up-next, "creators followed by people I follow." The graph query capabilities needed for UC-03 (social trending), UC-05 (collaborative filtering), UC-10 (social creator discovery).

Depends On: Phase 1 Complexity: L

Phase 4: User Library, Collections, and Continue Watching

Delivers: UC-09 complete: watch history, saved items, liked items, user-created collections, continue watching (resume position), download state. Collections as rankable entities.

Depends On: Phase 2 Complexity: M

Phase 5: Advanced Search Features

Delivers: Autocomplete, search suggestions, trending searches, saved searches, "did you mean" typo correction, related query suggestions. UC-02.3 and UC-02.4.

Depends On: Phase 1 Complexity: L

Phase 6: Live Content and Notification Systems

Delivers: UC-12 (live content with real-time viewer count, scheduled content, reminders) and UC-07 (notification prioritization with frequency capping, per-creator limits). Real-time signal types for viewer count and schedule awareness.

Depends On: Phase 1 Complexity: M

Deferred to Later Milestones

Signal rollups (hourly/daily materialization) -- built if 100K-item benchmarks show bucketed counters exceeding the latency budget for 30d+ windows
Multi-vector user interest clustering (PinnerSage) -- deferred to M7 or beyond; single preference vector serves through M6
ACORN-1 two-hop expansion for very selective filters -- deferred to M7; USearch predicate callback sufficient through M6

Done When

All 14 use cases pass their UAT scenarios as defined in USE_CASES.md. All 25+ sort modes work. All filter dimensions compose. Every sequence diagram in SEQUENCE.md can be executed. Performance: < 50ms for all queries at 100K items.

Milestone 7: Production Hardening -- "Ready for real workloads"

Milestone Thesis

tidalDB can be embedded in a production application and operated with confidence. Crash recovery is correct and fast. Graceful degradation works under load. Operational visibility exists. Performance meets targets at 1M+ items. The database is trustworthy.

UAT Scenario

Given:
  A tidalDB instance with:
    - 1,000,000 items, 100,000 users, 10,000 creators
    - Sustained write load: 10,000 signal events/second
    - Concurrent read load: 1,000 RETRIEVE queries/second

When:
  1. Run full workload for 1 hour
  2. Kill the process at a random point
  3. Restart and measure recovery time
  4. Verify no data loss and no inconsistency
  5. Run workload at 3x expected load
  6. Verify graceful degradation (reduced precision, not errors)

Then:
  - Step 1: All queries < 50ms p99, all signal writes < 100us amortized
  - Step 3: Recovery time < 30 seconds
  - Step 4: WAL replay produces state identical to pre-crash;
    no phantom items, no lost signals, no inconsistent aggregates
  - Step 5: Under overload, tidalDB reduces candidate set size, uses coarser
    aggregates, skips diversity -- but never returns errors for well-formed queries
  - Step 6: Degradation follows the documented order:
    1. Reduce candidate set (500 -> 200)
    2. Use coarser aggregates
    3. Skip diversity
    4. Return from materialized cache

Phases

(Phases for M7 are provisional -- detailed decomposition happens after M6 ships.)

Phase 1: Crash Recovery Hardening

Delivers: Comprehensive crash recovery testing and hardening. Fault injection at every write-path stage. Recovery time targets. WAL compaction and checkpoint optimization.

Depends On: M6 complete Complexity: XL

Phase 2: Graceful Degradation Under Load

Delivers: Automatic quality reduction under load pressure. Configurable degradation order. Backpressure on write path. Never errors for well-formed queries.

Depends On: Phase 1 Complexity: L
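
One way to picture the documented degradation order is as an ordered ladder where each level includes the steps before it. The enum, the QueryPlan struct, and the cumulative interpretation are assumptions for illustration; the 500 -> 200 candidate reduction is taken from the UAT scenario.

```rust
/// Degradation levels in the documented order (declaration order matters:
/// the derived Ord makes later variants compare greater).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Degradation {
    Normal,
    ReducedCandidates,
    CoarserAggregates,
    SkipDiversity,
    MaterializedCache,
}

/// Hypothetical execution knobs derived from the current level.
#[derive(Debug, PartialEq)]
pub struct QueryPlan {
    pub candidate_count: usize,
    pub coarse_aggregates: bool,
    pub use_diversity: bool,
    pub from_cache: bool,
}

/// Map a degradation level to a plan; each level keeps all earlier reductions.
pub fn plan_for(level: Degradation) -> QueryPlan {
    QueryPlan {
        candidate_count: if level >= Degradation::ReducedCandidates { 200 } else { 500 },
        coarse_aggregates: level >= Degradation::CoarserAggregates,
        use_diversity: level < Degradation::SkipDiversity,
        from_cache: level >= Degradation::MaterializedCache,
    }
}
```

Every level still yields a valid plan, which matches the requirement that well-formed queries lose precision under load but never return errors.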

Phase 3: Performance at Scale

Delivers: Benchmarks and optimization at 1M items, 100K users. USearch performance tuning (M, ef_search, quantization). Tantivy segment management. Signal state memory optimization. Hot/warm/cold tiering for signal state if memory budget requires it.

Depends On: Phase 1 Complexity: XL

Phase 4: Operational Visibility

Delivers: Metrics, diagnostics, and observability. Query execution stats (candidates considered, filters applied, scoring time, diversity adjustments). Signal system health (WAL lag, checkpoint age, memory usage). Index health (segment count, tombstone ratio). Error reporting with context.

Depends On: Phase 1 Complexity: M

Deferred (Post-M7 / Future)

Horizontal distribution -- the single-node architecture scales vertically first; distribution is a separate product decision
Multi-tenancy -- per-tenant isolation within a single tidalDB instance
Streaming query results -- cursor-based streaming for very large result sets
A/B testing infrastructure -- comparing two profile versions within the database
Signal rollup to external cold storage -- S3/GCS archival for compliance
Client libraries -- language-specific wrappers beyond Rust embedding

Done When

tidalDB operates correctly at 1M items under sustained concurrent read/write load. Crash recovery completes in < 30 seconds with zero data loss. Graceful degradation works under 3x overload without returning errors. All performance targets met at p99. A developer can embed tidalDB in a production application and operate it with confidence.

Use Case Coverage Progression

UC	Description	M1	M2	M3	M4	M5	M6	M7
UC-01	For You Feed	-	-	Full	Full	Full	Full	Full
UC-02	Search	-	-	-	-	Core	Full	Full
UC-03	Trending/Rising	Signals	Full	Full	Full	Full	Full	Full
UC-04	Following Feed	-	Partial	Full	Full	Full	Full	Full
UC-05	Related/Up Next	-	-	Core	Core	Core	Full	Full
UC-06	Browse/Category	Signals	Core	Core	Core	Core	Full	Full
UC-07	Notifications	-	-	Core	Core	Core	Full	Full
UC-08	Creator Profile	-	Core	Core	Core	Core	Full	Full
UC-09	User Library	-	-	Partial	Partial	Partial	Full	Full
UC-10	People Search	-	-	-	-	Core	Full	Full
UC-11	Visual/Semantic	-	-	-	-	Partial	Full	Full
UC-12	Live Content	-	-	-	-	-	Full	Full
UC-13	Hidden Gems	-	Full	Full	Full	Full	Full	Full
UC-14	Controversial/Hot	Signals	Full	Full	Full	Full	Full	Full

Legend:

`-` = Not addressed
Signals = Signal primitives exist but no query surface
Partial = Some functionality, not all modes
Core = Primary query path works, some modes/filters missing
Full = All modes, filters, and feedback loops per USE_CASES.md specification

Dependency DAG

m1p1 (Types/Schema) ✓
  |
  +---> m1p2 (WAL) ✓
  |       |
  +---> m1p3 (Storage/fjall) ✓ ---+
  |       |                        |
  |       +---> m1p4 (Signal Ledger)
  |               |
  |               +---> m1p5 (Entity + Signal API)  = M1 COMPLETE
  |               |
  |               +---> m2p3 (Ranking Profiles)
  |                       |
  +---> m2p1 (USearch) ---+
  |                        |
  +---> m2p2 (Filters) ---+---> m2p4 (Diversity)
                           |       |
                           +-------+---> m2p5 (RETRIEVE Query) = M2 COMPLETE
                           |
                           +---> m3p1 (Users/Creators/Relationships)
                           |       |
                           |       +---> m3p2 (Feedback Loop)
                           |       |       |
                           |       |       +---> m3p3 (Personalized Profiles)
                           |       |
                           |       +---> m3p4 (User State Filters)
                           |
                           |       m3p3 + m3p4 = M3 COMPLETE
                           |
                           +---> m4p1 (Tantivy)
                                   |
                                   +---> m4p2 (RRF Fusion)
                                   |       |
                                   |       +---> m4p3 (SEARCH Query)
                                   |
                                   +---> m4p4 (Creator Search)

                                   m4p3 + m4p4 = M4 COMPLETE

                                   M5 phases (provisional) depend on M4
                                   M6 phases (provisional) depend on M5

Parallelization opportunities:

m1p2 (WAL) and m1p3 (Storage) can be built in parallel after m1p1 (both are now complete; m1p3 finished first, then m1p2)
m2p1 (USearch) and m2p2 (Filters) can be built in parallel after m1p3
m3p1 (Entities) and m4p1 (Tantivy) can start in parallel with later M2 phases
m3p4 (User State Filters) can be built in parallel with m3p3 (Profiles)
m4p2 (RRF) and m4p4 (Creator Search) can be built in parallel

Architectural Decisions Locked In

These decisions are made. They are not revisited unless benchmarks prove them wrong.

Decision	Chosen	Alternative	Rationale
Storage engine	fjall (pure Rust)	RocksDB	Pure Rust, `#![forbid(unsafe_code)]`, fast compile, trait-abstracted for swap
Vector index	USearch (C++ FFI)	hnsw_rs	10-100x QPS, predicate callbacks, mmap, f16 quantization
Text search	Tantivy (embedded)	Custom BM25	40K lines of battle-tested code; Collector/Scorer API provides exact hooks needed
Decay formula	Running `S(t) = S(prev) * exp(-lambda * dt) + w`	Raw event scan	O(1) vs O(N), proven exact, 20-60x faster at 50+ events/entity
Windowed aggregation	Bucketed counters (Scotty pattern)	SWAG two-stacks	Simpler, serves multiple window sizes from one set of buckets
Hybrid fusion	RRF (k=60)	Tuned linear combination	Zero-config, robust; linear combo is the upgrade path with relevance labels
Consistency model	DB-primary, Tantivy as derived index	Two-phase commit	Simpler, deterministic recovery, source of truth is always the entity store
WAL checksums	BLAKE3	CRC32C	Content-addressing enables deduplication; BLAKE3 is fast enough
Key encoding	Subject-prefix `[entity_id][0x00][TAG:suffix]`	Separate key namespaces	Co-locates entity data, natural shard boundary, single prefix scan
Embedding format	f16 quantization (default)	float32	Half memory, < 1% recall loss at 1536D
Query language	Custom (RETRIEVE/SEARCH/SIGNAL)	SQL	Domain semantics cannot be expressed in SQL without losing optimization opportunities
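
To make the decay formula row concrete, here is a minimal sketch of the running update next to the raw event scan it replaces. The names `RunningScore` and `raw_scan`, the use of `f64` seconds for timestamps, and the treatment of late events are assumptions for illustration, not the Signal Ledger's actual implementation. Both functions compute the same sum of `w * exp(-lambda * age)`, but the running form touches only the stored state on each write.

```rust
/// Running decayed score: S(t) = S(prev) * exp(-lambda * dt) + w.
/// Invariant: `score` equals the decayed sum of all recorded weights
/// as of `last_ts`.
#[derive(Debug, Clone, Copy)]
pub struct RunningScore {
    score: f64,
    last_ts: f64, // seconds; assumed non-negative in this sketch
}

impl RunningScore {
    pub fn new() -> Self {
        Self { score: 0.0, last_ts: 0.0 }
    }

    /// O(1) update per event, independent of how many events came before.
    pub fn record(&mut self, ts: f64, weight: f64, lambda: f64) {
        if ts >= self.last_ts {
            // In-order event: decay the stored score forward, then add w.
            let dt = ts - self.last_ts;
            self.score = self.score * (-lambda * dt).exp() + weight;
            self.last_ts = ts;
        } else {
            // Late event (one possible handling): decay its weight
            // forward to last_ts so the invariant still holds.
            self.score += weight * (-lambda * (self.last_ts - ts)).exp();
        }
    }

    /// Score as of `now`, assuming `now >= last_ts`.
    pub fn value_at(&self, now: f64, lambda: f64) -> f64 {
        self.score * (-lambda * (now - self.last_ts)).exp()
    }
}

/// O(N) reference: rescan every (timestamp, weight) event.
pub fn raw_scan(events: &[(f64, f64)], now: f64, lambda: f64) -> f64 {
    events
        .iter()
        .map(|&(ts, w)| w * (-lambda * (now - ts)).exp())
        .sum()
}
```

Feeding the same events to both paths and comparing `value_at(now, lambda)` with `raw_scan(&events, now, lambda)` should agree to within floating-point rounding, which is the sense in which the running form is exact.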

What This Roadmap Does NOT Cover

These are explicitly out of scope for the foreseeable future:

Embedding generation -- tidalDB retrieves and ranks over vectors. It does not generate them. Bring your own model.
Horizontal distribution -- Single-node first. Scale vertically. Distribution is a separate product.
ACID transactions across entities -- Signal writes are atomic within an entity's state. Cross-entity transactions are not needed for the ranking problem.
SQL compatibility -- The custom query language exists because SQL cannot express ranking semantics. No SQL layer.
Multi-tenancy -- One tidalDB instance serves one application. Tenant isolation is the application's concern.
Content moderation, authentication, payments, CDN -- tidalDB solves one problem: ranking. Everything else is someone else's job.

