tidaldb/docs/planning/ROADMAP.md
jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions
M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
    usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
    MMR diversity enforcement, and m2_uat integration tests.

M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
    creator entities, session tracking, and m3_uat integration tests.

M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
    ranking executor with explain mode, query executor integration, benchmarks for
    query/ranking/vector/filters/diversity, and m4_uat integration tests.

Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 16:24:48 -07:00


TidalDB Roadmap

Vision Statement

When tidalDB is complete, an engineering team building any content platform -- a media library, a social feed, a marketplace, a discovery surface, or an agentic UX -- can embed a single Rust database and replace the Elasticsearch + Redis + Kafka + feature store + vector database + ranking service stack. One process, one query interface, one operational model. The query RETRIEVE items FOR USER @user_id USING PROFILE for_you FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50 executes in under 50ms, reflects signals written 100ms ago, enforces diversity without application logic, handles cold-start items without application intervention, and returns results a user would describe as "it knows what I want."

The same runtime doubles as the personalization memory substrate for agents: user → agent → tidalDB. Agents ground themselves by reading live session context, write structured signals (preferences, critiques, tool usage) with decay budgets, and immediately query those updates on the next turn. The embeddable runtime is step zero; the exact same WAL + subject-prefix key architecture grows into a multi-region, eventually-consistent fabric so agent memory travels with the user across devices and datacenters without losing correctness.

The long-term model is user-owned personalization across three scopes: global profile, opt-in community overlays, and agent/session context. Users can grant and revoke access per scope, and remove scoped contributions from future ranking without destroying local history.

Thesis

A single embeddable database can replace the 6-system content ranking stack by treating signals, ranking profiles, session policy, and diversity constraints as database primitives rather than application logic. Every agent or product surface gets an always-fresh memory lane without standing up Vespa-scale search clusters or bespoke feature stores.

Differentiation vs Vespa and search platforms

  1. Agent-owned memory lanes. Signals, session context, and reward metadata are schema-level objects. Agents can create scoped sessions, write feedback with decay guarantees, and read it back with zero glue code. Vespa is optimized for serving queries; it assumes you run feature updates elsewhere.
  2. Embeddable-first ergonomics. cargo add tidaldb gives you the full signal + ranking stack with WAL durability and diagnostics in-process. Vespa demands a cluster, config servers, and feed pipelines before you can prototype.
  3. Temporal math on the write path. Decay, windowing, velocity, and diversity guards are computed atomically when signals arrive. There is no deferred "update documents later" pass and no external cron job doing the math.
  4. Session- and policy-aware query language. RETRIEVE ... FOR USER ... FOR SESSION ... USING PROFILE ... encodes permissions, diversity and cohort constraints; agent policies live in schema, not middleware.
  5. Roadmapped scale path. The same WAL segments, subject-prefix keys, and checkpoint formats we ship for the embeddable runtime become the replication log and deterministic conflict-resolution substrate for the distributed fabric (see M8). Vespa already starts distributed; tidalDB grows there without sacrificing the zero-config DX.

Milestone Summary

| # | Name | Proves | Enables |
|---|---|---|---|
| M0 | Embeddable Runtime | tidalDB can run in-process with zero-config defaults and tooling | Cuts proof-of-concept friction, enables internal dogfooding |
| M1 | Signal Engine | Signals are a database primitive with O(1) decay, not application math | UC-03 (partial), UC-06 (partial), UC-14 (partial) |
| M2 | Ranked Retrieval | A single query retrieves, scores, and ranks content using live signals | UC-03, UC-04, UC-06, UC-08, UC-13, UC-14 |
| M3 | Personalized Ranking | User context shapes retrieval and ranking -- the "For You" query works | UC-01, UC-05, UC-07, UC-09 (partial) |
| M4 | Agent Memory | Agents can create sessions, write signals, and enforce policy inside tidalDB | Agent-mediated personalization, RLHF loops, conversational memory |
| M5 | Hybrid Search | Text + semantic + signal-ranked search in one query | UC-02, UC-10, UC-11 |
| M6 | Full Surface Coverage | Every use case, every sort mode, every filter, every feedback loop | UC-01 through UC-14 complete |
| M7 | Production Hardening | Crash safety, graceful degradation, operational readiness | All UCs at production quality |
| M8 | Distributed Fabric | Multi-region, multi-tenant replication keeps agent-memory semantics intact | Hosted tidalDB, cloud/edge deployments, shared agent substrate |
| M9 | Community Sync & Revocation | Local embeddable profiles can opt into community personalization and safely leave/purge contributions | Community personalization, federated taste graphs, shared feeds |
| M10 | Governance & Agent Rights | Community rules and agent-scoped permissions control what signals influence ranking | User-owned AI personalization at scale, policy-compliant agents |

Embeddable → Distributed Path

  1. M0-M2 (Embed & prove primitives): Establish the deterministic builder, WAL, key encoding, and checkpoint semantics that make on-device instances safe to embed. Research refs: docs/research/tidaldb_wal.md, docs/research/tidaldb_signal_ledger.md.
  2. M3-M4 (Session + agent policy): Layer user/creator entities, sessions, and policy enforcement so agents can write/read scoped memory lanes without glue. This also defines the logical replication unit: entity + session keyspaces.
  3. M5-M6 (Surface completeness): Ship hybrid search and every retrieval mode so a single tidalDB node can back any personalization surface or agent prompt grounding workload.
  4. M7 (Operational envelope): Hardening (crash fencing, throttling, observability) creates the guarantees the fabric will rely on when shipping WAL segments across machines.
  5. M8 (Distributed Fabric): Introduce shard-aware keyspaces, WAL shipping + deterministic reconciliation, and multi-region eventual-consistency policies so embeddable instances graduate to hosted, global deployments without rewriting application code.
  6. M9 (Community Sync & Revocation): Add opt-in sharing from local profiles to community layers, plus leave/removal semantics (stop-forward + retroactive purge) with deterministic re-materialization.
  7. M10 (Governance & Agent Rights): Add community policy engines and agent capability boundaries so users and communities can control exactly which signals affect ranking and revoke them safely.

Product Milestone Summary (New)

The roadmap now has two tracks:

  • Engine Track (M0-M7): proves tidalDB capabilities.
  • Product Track (P0-P4): proves end-user value for the beachhead product.
| # | Name | Proves | Depends On |
|---|---|---|---|
| P0 | Beachhead Validation | Knowledge workers and consumers care about a personal briefing feed enough to use it repeatedly | M0 (embedding/runtime), partial M1 |
| P1 | Concierge Alpha | Daily "Today Brief" with explicit feedback controls creates Day-2 retention in a small cohort | M1 complete, partial M2 |
| PG1 | Personalization Core Done (Blocking Gate) | Personalization loop is correct, immediate, and measurably better than baseline | P1 + M1/M2/M3 core slices |
| P2 | Productized Beta | Self-serve onboarding + real-time adaptation + explanation UX works without manual curation | M2 complete, partial M3 |
| P3 | Public Launch | The product is reliable, useful, and trusted at real user volume | M3 + M5 core, M6 partial |
| P4 | Scale + Revenue Fit | Sustainable retention and monetization without quality collapse | M6 + M7 |

Current Status

| Phase | Status | Tests |
|---|---|---|
| m0p1: Embeddable Runtime Skeleton | COMPLETE | 329 passing (293 unit + 36 integration + 3 doc) |
| m0p2: Tooling & Diagnostics | COMPLETE | 349 passing (+7 metrics unit + 7 metrics integration + 9 tidalctl CLI) |
| m0p3: Samples & Docs | COMPLETE | 11 doc tests (14 with features); 4 examples compile and run |
| m1p1: Core Type System and Schema | COMPLETE | 77 passing |
| m1p2: Write-Ahead Log | COMPLETE | passing (unit + integration) |
| m1p3: Storage Engine Trait and fjall Backend | COMPLETE | 140 passing (128 unit + 12 integration) |
| m1p4: Signal Ledger | COMPLETE | 300 passing |
| m1p5: Entity CRUD and Signal Write API | COMPLETE | 305 passing (300 unit + 5 integration) |
| m2p1: Vector Index Integration (USearch) | COMPLETE | passing |
| m2p2: Metadata Indexes and Filter Engine | COMPLETE | passing |
| m2p3: Ranking Profile Engine | COMPLETE | passing |
| m2p4: Diversity Enforcement | COMPLETE | passing |
| m2p5: Query Parser and RETRIEVE Executor | COMPLETE | passing |
| m3p1: User and Creator Entities with Relationships | COMPLETE | passing |
| m3p2: Feedback Loop -- Signal Writes Update User State | COMPLETE | passing |
| m3p3: Personalized Ranking Profiles | COMPLETE | passing |
| m3p4: User State Filters + M3 UAT | COMPLETE | 571 lib + 11 m3_uat + 6 m2_uat + 5 signal_api + 8 vector_usearch passing |
| P0: Beachhead Validation | NOT STARTED | -- |
| P1: Concierge Alpha | NOT STARTED | -- |
| PG1: Personalization Core Done gate | NOT STARTED | -- |
| P2: Productized Beta | NOT STARTED | -- |
| P3: Public Launch | NOT STARTED | -- |
| P4: Scale + Revenue Fit | NOT STARTED | -- |

Current phase: Milestone 3 COMPLETE. All phases (m3p1-m3p4) and all 12 tasks are done. Next: M4 Agent Memory.

Lessons learned:

  • m1p3 keyspaces are organized per EntityKind ("items", "users", "creators"), not by data category. The Tag enum in key encoding provides the data-category namespace within each entity-kind keyspace.
  • The LumenError name is a legacy artifact from a predecessor project. Will be renamed when convenient but does not block progress.
  • MSRV was bumped to 1.91 for fjall 3 compatibility.
  • M2 complete: RETRIEVE query with 11+ sort modes, metadata filters, diversity constraints, and live signal ranking, all operational with end-to-end latency under 50ms at 10K items.

Product Track: Personal Briefing Feed (Knowledge Workers + Consumers)

This track defines the milestones for the actual product experience (not only the database engine).
Use case reference: docs/personal-briefing-beachhead.md. Dedicated roadmap: docs/planning/PRODUCT_ROADMAP.md.

P0: Beachhead Validation -- "Do users care enough to return?"

Milestone Thesis

Validate that a personal briefing feed solves a painful daily job for users and drives repeat use.

Acceptance Criteria

  • Recruit 20-50 target users (knowledge workers + high-intent consumers).
  • Run daily briefing prototype (can include manual source QA).
  • At least one meaningful feedback action per session for the median user (more, less, hide, mute, save).
  • User interviews confirm value vs baseline feeds ("less noise", "more useful", "saves time").
  • D2 retention reaches the agreed threshold for the target segment.

P1: Concierge Alpha -- "High-value daily brief for a narrow cohort"

Milestone Thesis

Deliver a reliable daily Today Brief experience with immediate visible adaptation after user feedback.

Acceptance Criteria

  • App surface: ranked brief, reason labels, source links, save/feedback controls.
  • Feedback loop: next refresh reflects less/hide/mute actions immediately.
  • Time-budget mode (5/10/20 min) is available and used.
  • Diversity constraints prevent source/topic domination in top results.
  • Weekly active usage demonstrates repeated utility.

P2: Productized Beta -- "Self-serve and repeatable without handholding"

Milestone Thesis

Turn the alpha into a self-serve product with stable onboarding, trust UX, and measurable quality.

Acceptance Criteria

  • Self-serve onboarding completed in under 3 minutes.
  • "Why this" explanations are present and understandable on every briefing card.
  • Cohort layer available ("trending for people like you").
  • Trust controls available (source transparency, mute/hide persistence).
  • D7 retention and "useful item rate" exceed baseline comparison feed.
  • PG1 Personalization Core Done gate has passed.

P3: Public Launch -- "Trusted at real volume"

Milestone Thesis

Launch publicly with reliability, quality, and trust guardrails suitable for broad use.

Acceptance Criteria

  • Reliability and latency SLOs defined and met for briefing generation.
  • Quality floor enforced (freshness, source quality, duplicate suppression).
  • Notification cadence controls prevent spam.
  • Core support and incident process in place for user-facing regressions.

P4: Scale + Revenue Fit -- "Sustainable business without degrading quality"

Milestone Thesis

Prove the product can grow and monetize while preserving user trust and briefing quality.

Acceptance Criteria

  • Monetization model validated (subscription, team plan, or equivalent).
  • Revenue metrics tracked alongside quality metrics (no quality-revenue trade-off regressions).
  • Retention and engagement remain stable as volume increases.
  • Product roadmap for next segment expansion is data-backed.

PG1: Personalization Core Done (Blocking Gate)

Milestone Thesis

Before product breadth expansion, the core personalization loop must be provably correct and immediately responsive.

Acceptance Criteria

  • Hard negatives (hide/mute/block) never leak after write, restart, or replay.
  • Explicit feedback (more/less/skip/save) changes next-refresh ranking within target latency.
  • User personalization state rebuilds deterministically from checkpoint + WAL replay.
  • Useful-item rate and repeated-unwanted-item rate outperform a non-personalized baseline.
  • Diversity guardrails hold while maintaining personalization quality.

Milestone 0: Embeddable Runtime -- "Runs in your process in minutes"

Milestone Thesis

Before we prove any ranking math, developers must be able to embed tidalDB inside an existing service with zero operational prep. M0 delivers the runtime glue — an ergonomic builder API, deterministic storage layout, a tiny admin CLI, and living examples — so the very first experience is cargo add tidaldb, TidalDb::builder().in_memory().open(), and a passing smoke test.

Phases

Phase 1: Embeddable Runtime Skeleton

Delivers: A cohesive Config/Builder API for single-process use, with in-memory and filesystem-backed defaults, sandboxed data directories, and graceful shutdown hooks developers can call from tests or application drop handlers.

  • Builder exposes ephemeral() / single_process() shortcuts and eagerly validates directories.
  • Shutdown hooks drain WAL writer threads and surface errors.
  • Temp-directory helper guarantees deterministic cleanup (used in doctests).
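A minimal sketch of the eager-validation idea, assuming a hypothetical Builder with ephemeral() and single_process(dir) constructors and a hypothetical .tidaldb-probe marker file; tidalDB's actual builder types and checks may differ. What it illustrates is that a bad data directory fails at open() rather than on the first write.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical storage modes mirroring the ephemeral()/single_process() shortcuts.
#[derive(Debug, Clone, PartialEq)]
enum Mode {
    Ephemeral,
    SingleProcess(PathBuf),
}

#[derive(Debug)]
struct Builder {
    mode: Mode,
}

#[derive(Debug)]
struct Handle {
    data_dir: Option<PathBuf>, // None for in-memory instances
}

impl Builder {
    fn ephemeral() -> Self {
        Builder { mode: Mode::Ephemeral }
    }

    fn single_process(dir: impl Into<PathBuf>) -> Self {
        Builder { mode: Mode::SingleProcess(dir.into()) }
    }

    /// Validate eagerly: surface directory problems at open time.
    fn open(self) -> io::Result<Handle> {
        match self.mode {
            Mode::Ephemeral => Ok(Handle { data_dir: None }),
            Mode::SingleProcess(dir) => {
                validate_dir(&dir)?;
                Ok(Handle { data_dir: Some(dir) })
            }
        }
    }
}

fn validate_dir(dir: &Path) -> io::Result<()> {
    if dir.exists() && !dir.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("{} exists but is not a directory", dir.display()),
        ));
    }
    fs::create_dir_all(dir)?;
    // Probe writability by creating and removing a marker file (hypothetical name).
    let probe = dir.join(".tidaldb-probe");
    fs::write(&probe, b"ok")?;
    fs::remove_file(&probe)
}
```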

Phase 2: Tooling & Diagnostics

Delivers: tidalctl (a minimal CLI) for inspecting embedded instances, plus a lightweight metrics surface (Prometheus text or JSON) tagged with the same IDs future distributed deployments will use.

  • tidalctl status --path <dir> returns JSON with WAL seq, config hash, uptime.
  • Metrics endpoint optional (disabled by default) exposes /metrics and /healthz.
  • Tooling reuses the same path helpers from Phase 1.

Phase 3: Samples & Docs

Delivers: Quick-start samples (For You POC + integration tests) compiled as doctests, and reference snippets for embedding tidalDB inside Axum/Actix or a CLI app. Keeps DX in lockstep with the runtime.

  • Quickstart example + doctests run under CI (cargo test --doc, plus cargo test --examples; cargo does not accept --doc combined with other target-selection flags).
  • Axum/Actix embedding examples include graceful shutdown + metrics wiring.
  • CONTRIBUTING updated with “run samples” checklist.

UAT Scenario

Given:
  // in tests/lib.rs
  let db = TidalDb::builder()
      .ephemeral()
      .with_temp_dir()
      .open()
      .unwrap();

When:
  db.health_check();           // ok
  tidalctl status --path <dir> // prints WAL, storage, signal counts
  cargo test --doc             // quick-start snippet compiles & runs

Then:
  - Builder defaults require zero manual config
  - CLI connects to the same files used by the embedded process
  - Samples stay in sync (failing doctest fails CI)

Milestone 1: Signal Engine -- "Signals are a database primitive"

Milestone Thesis

A developer can open a tidalDB instance, define signal types with decay rates, write engagement events, and read back decay-correct scores and windowed aggregates -- all without computing any temporal math in application code. This proves that the hardest primitive (temporal signals with O(1) decay, velocity, and windowed aggregation) works correctly and meets the performance budget.

UAT Scenario

Given:
  A tidalDB instance is opened with a schema defining:
    - Entity type: Item with metadata fields (title, category, created_at)
    - Signal type: "view" with exponential decay, half_life=7d, windows=[1h, 24h, 7d]
    - Signal type: "like" with exponential decay, half_life=14d, windows=[24h, 7d, all_time]
    - Signal type: "skip" with exponential decay, half_life=1d, windows=[1h, 24h]

When:
  1. Write 100 items with metadata
  2. Write 10,000 signal events across the items (views, likes, skips)
     with timestamps spanning the last 7 days
  3. Read the decay score for item #42, signal "view", at current time
  4. Read the windowed count for item #42, signal "view", window=24h
  5. Read the velocity for item #42, signal "view", window=1h
  6. Write a new "view" event for item #42
  7. Immediately re-read the decay score, windowed count, and velocity
  8. Close and reopen the tidalDB instance
  9. Re-read all values for item #42

Then:
  - Step 3: Decay score matches S(t) = sum(w_i * exp(-lambda * (t - t_i)))
    computed analytically from raw events, to 6 decimal places
  - Step 4: Windowed count equals the exact count of "view" events
    within the last 24h window
  - Step 5: Velocity equals windowed_count / window_duration
  - Step 7: All values reflect the new event immediately
    (decay score increased, count incremented, velocity updated)
  - Step 9: All values match step 7 (crash recovery preserves state)
  - Performance: decay score read < 100ns per entity,
    signal write < 100us including WAL fsync (amortized),
    200-entity scoring pass < 5us
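Steps 4 and 5 can be checked against a minimal, single-threaded sketch of per-minute bucketed counting (the approach m1p4 calls BucketedCounter). Everything here is an illustrative assumption: integer-second timestamps, a ring of 10,080 one-minute buckets sized for the 7d window, and window edges rounded to whole minutes, so counts near the window boundary are approximate to within one minute.

```rust
/// Illustrative per-minute ring buffer; not tidalDB's BucketedCounter.
const BUCKET_SECS: u64 = 60;
const NUM_BUCKETS: u64 = 7 * 24 * 60; // 10,080 one-minute buckets: enough for a 7d window

struct BucketedCounter {
    counts: Vec<u64>,
    minute_of: Vec<Option<u64>>, // absolute minute each slot currently holds
}

impl BucketedCounter {
    fn new() -> Self {
        BucketedCounter {
            counts: vec![0; NUM_BUCKETS as usize],
            minute_of: vec![None; NUM_BUCKETS as usize],
        }
    }

    fn slot(minute: u64) -> usize {
        (minute % NUM_BUCKETS) as usize
    }

    fn record(&mut self, ts_secs: u64) {
        let minute = ts_secs / BUCKET_SECS;
        let slot = Self::slot(minute);
        match self.minute_of[slot] {
            Some(m) if m == minute => {}
            // The slot holds a newer minute: the event is older than the ring covers.
            Some(m) if m > minute => return,
            // Empty slot, or a stale minute from an earlier lap of the ring: reset it.
            _ => {
                self.minute_of[slot] = Some(minute);
                self.counts[slot] = 0;
            }
        }
        self.counts[slot] += 1;
    }

    /// Events in the last `window_secs`, window edge rounded to whole minutes.
    fn windowed_count(&self, now_secs: u64, window_secs: u64) -> u64 {
        let now_minute = now_secs / BUCKET_SECS;
        let span = (window_secs / BUCKET_SECS).min(NUM_BUCKETS);
        (0..span)
            .filter_map(|back| now_minute.checked_sub(back))
            .map(|m| {
                let slot = Self::slot(m);
                if self.minute_of[slot] == Some(m) { self.counts[slot] } else { 0 }
            })
            .sum()
    }

    /// Velocity = windowed_count / window_duration_seconds.
    fn velocity(&self, now_secs: u64, window_secs: u64) -> f64 {
        self.windowed_count(now_secs, window_secs) as f64 / window_secs as f64
    }
}
```

Reads touch at most one bucket per minute of the window and never scan raw events, which is the property the Signal Ledger phase relies on.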

Phases

Phase 1: Core Type System and Schema -- COMPLETE

Delivers: The foundational type system -- entity IDs, signal type definitions, decay rate declarations, window specifications, and the error types that every subsequent module depends on. The schema module that validates and stores signal/entity definitions.

Acceptance Criteria:

  • EntityId is a u64 newtype with Display, Hash, Eq, Ord, to_be_bytes() (big-endian, preserves numeric ordering)
  • EntityKind enum: Item, User, Creator
  • SignalTypeDef captures: name, target EntityKind, DecayModel (exponential with pre-computed lambda / linear / permanent), WindowSet, velocity enabled flag
  • DecayModel::Exponential stores pre-computed lambda = ln(2) / half_life.as_secs_f64() -- no division on hot path
  • Window enum: OneHour, TwentyFourHours, SevenDays, ThirtyDays, AllTime with duration(), label(), duration_secs_f64()
  • WindowSet deduplicates and sorts windows; empty() for permanent signals
  • LumenError enum covers Storage, NotFound, Schema, Durability, Query, Internal variants with From impls for each sub-error
  • SchemaError enum validates: duplicate signal names, invalid identifiers, zero half-life/lifetime, empty windows for non-permanent signals, velocity without windows
  • Schema validation via SchemaBuilder rejects invalid configurations at construction time
  • Property tests: lambda correctness across half-life range, byte ordering preservation
  • cargo fmt clean, cargo clippy -D warnings clean, all 77 tests pass

Depends On: None Complexity: M Research Reference: docs/research/tidaldb_signal_ledger.md (decay formula, EntityState struct)
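A minimal sketch of three of the Phase 1 types with simplified shapes. The EntityId, DecayModel, and Window names come from the criteria above; the exponential(), factor(), and window_set() helpers are hypothetical. It shows why big-endian bytes preserve numeric order and why lambda is computed once at schema time.

```rust
use std::time::Duration;

/// u64 newtype; big-endian bytes compare the same way as the numbers
/// (little-endian would not: 256 starts with byte 0x00, 255 with 0xFF).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct EntityId(pub u64);

impl EntityId {
    pub fn to_be_bytes(self) -> [u8; 8] {
        self.0.to_be_bytes()
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DecayModel {
    /// lambda = ln(2) / half_life, computed once when the schema is built.
    Exponential { lambda: f64 },
    Permanent,
}

impl DecayModel {
    /// Returns None for a zero half-life, mirroring the SchemaError rule.
    pub fn exponential(half_life: Duration) -> Option<Self> {
        let secs = half_life.as_secs_f64();
        (secs > 0.0).then(|| DecayModel::Exponential { lambda: std::f64::consts::LN_2 / secs })
    }

    /// Hot path: a multiply and an exp, no division.
    pub fn factor(&self, dt_secs: f64) -> f64 {
        match self {
            DecayModel::Exponential { lambda } => (-lambda * dt_secs).exp(),
            DecayModel::Permanent => 1.0,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Window { OneHour, TwentyFourHours, SevenDays, ThirtyDays, AllTime }

/// Deduplicated, sorted (shortest first) set of windows.
pub fn window_set(mut ws: Vec<Window>) -> Vec<Window> {
    ws.sort();
    ws.dedup();
    ws
}
```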

Phase 2: Write-Ahead Log -- COMPLETE

Delivers: A durable, append-only log for signal events. Every signal write is fsync'd before acknowledgment. Group commit amortizes fsync cost. Content-addressed events via BLAKE3 for deduplication. The WAL is the source of truth -- all other state is derived.

Acceptance Criteria:

  • WAL entries are length-prefixed with BLAKE3 checksums
  • Group commit batches up to 100 events or 10ms, whichever comes first
  • Duplicate events (same BLAKE3 hash) are silently deduplicated
  • WAL replay from any checkpoint produces identical state to uninterrupted execution (property test with 10,000+ random event sequences)
  • fsync is called per batch, not per event
  • WAL can be truncated after a checkpoint without losing committed state
  • Crash simulation (kill at random WAL positions) never produces corrupt state -- either the event is committed or it is not

Depends On: Phase 1 Complexity: L Research Reference: docs/research/tidaldb_wal.md (wire format, group commit, crash detection, deduplication), thoughts.md Part II.1 (WAL convergence), Part V.5-6 (quarantine-first, group commit)
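A std-only sketch of length-prefixed framing and torn-tail detection. Two substitutions are assumptions: FNV-1a stands in for BLAKE3 so the example needs no crates, and the frame layout ([len u32][checksum u64][payload], little-endian) is illustrative rather than the wire format in docs/research/tidaldb_wal.md. Replay keeps every complete, checksum-valid frame and stops at the first bad one, which is the "either the event is committed or it is not" property. BLAKE3-based deduplication is not shown.

```rust
use std::convert::TryInto;
use std::time::Duration;

/// Stand-in checksum (FNV-1a, 64-bit); tidalDB specifies BLAKE3.
fn checksum(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hypothetical frame layout: [len: u32 LE][checksum: u64 LE][payload].
fn append_frame(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&checksum(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Replay complete, checksum-valid frames; stop at the first torn or corrupt frame.
fn replay(log: &[u8]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos + 12 <= log.len() {
        let len = u32::from_le_bytes(log[pos..pos + 4].try_into().unwrap()) as usize;
        let sum = u64::from_le_bytes(log[pos + 4..pos + 12].try_into().unwrap());
        let start = pos + 12;
        let end = start + len;
        if end > log.len() {
            break; // torn tail: header written, payload incomplete
        }
        let payload = &log[start..end];
        if checksum(payload) != sum {
            break; // corrupt frame: treat it and everything after as uncommitted
        }
        out.push(payload.to_vec());
        pos = end;
    }
    out
}

/// Group commit: fsync once the batch reaches 100 events or has waited 10ms.
fn should_flush(batch_len: usize, oldest_wait: Duration) -> bool {
    batch_len >= 100 || oldest_wait >= Duration::from_millis(10)
}
```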

Phase 3: Storage Engine Trait and fjall Backend -- COMPLETE

Delivers: The StorageEngine trait abstraction and two implementations: FjallBackend (fjall 3 LSM-tree) for production and InMemoryBackend (BTreeMap + RwLock) for deterministic testing. Key encoding follows the subject-prefix pattern with a Tag discriminant. FjallStorage coordinates three keyspaces per entity kind. FjallAtomicBatch provides cross-keyspace atomic writes.

Acceptance Criteria:

  • StorageEngine trait with get, put, delete, scan_prefix, write_batch, flush operations
  • Key encoding: [entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...] with Tag enum (Evt=0x01, Sig=0x02, Meta=0x03, Rel=0x04, Mv=0x05, Idx=0x06)
  • encode_key, parse_key roundtrip correctly for all tag variants and arbitrary suffixes
  • entity_prefix (9 bytes) and entity_tag_prefix (10 bytes) for scoped prefix scans
  • Byte-lexicographic key ordering matches numeric entity ID ordering (property tested)
  • FjallBackend wraps a single fjall Keyspace, implements StorageEngine
  • FjallStorage owns a fjall Database with three keyspaces: "items", "users", "creators" (one per EntityKind)
  • FjallStorage::backend(EntityKind) routes to the correct keyspace backend
  • Entity kind isolation: same key written to different entity kinds does not collide
  • FjallAtomicBatch provides cross-keyspace atomic writes via fjall::OwnedWriteBatch
  • Data persists across close and reopen (flush_all + reopen test)
  • InMemoryBackend uses BTreeMap + RwLock for deterministic, sorted, concurrent testing
  • WriteBatch and BatchOp types for atomic multi-operation writes
  • PrefixIterator type alias for boxed prefix scan iterators
  • Property tests with proptest: encode/parse roundtrip, prefix ordering, prefix containment
  • Criterion benchmarks passing
  • cargo fmt clean, cargo clippy -D warnings clean, all 140 tests pass (128 unit + 12 integration)

Depends On: Phase 1 Complexity: L Research Reference: thoughts.md Part V.9 (hybrid storage), Part V.12 (subject-prefix keys), CODING_GUIDELINES.md section 2
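The key layout above can be sketched in a few lines. The Tag values are taken from the criteria; the function signatures (plain u64 IDs, Vec<u8> keys, Option on parse failure) are assumptions, not the actual tidalDB API.

```rust
/// Tag discriminants from the key-encoding criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Tag { Evt = 0x01, Sig = 0x02, Meta = 0x03, Rel = 0x04, Mv = 0x05, Idx = 0x06 }

impl Tag {
    fn from_byte(b: u8) -> Option<Tag> {
        Some(match b {
            0x01 => Tag::Evt, 0x02 => Tag::Sig, 0x03 => Tag::Meta,
            0x04 => Tag::Rel, 0x05 => Tag::Mv, 0x06 => Tag::Idx,
            _ => return None,
        })
    }
}

/// [entity_id: 8 bytes BE][0x00][tag: 1 byte][suffix...]
fn encode_key(entity_id: u64, tag: Tag, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    key.extend_from_slice(&entity_id.to_be_bytes());
    key.push(0x00);
    key.push(tag as u8);
    key.extend_from_slice(suffix);
    key
}

fn parse_key(key: &[u8]) -> Option<(u64, Tag, &[u8])> {
    if key.len() < 10 || key[8] != 0x00 {
        return None;
    }
    let mut id = [0u8; 8];
    id.copy_from_slice(&key[..8]);
    Some((u64::from_be_bytes(id), Tag::from_byte(key[9])?, &key[10..]))
}

/// 9-byte prefix: every key belonging to one entity.
fn entity_prefix(entity_id: u64) -> Vec<u8> {
    let mut p = entity_id.to_be_bytes().to_vec();
    p.push(0x00);
    p
}

/// 10-byte prefix: one data category of one entity.
fn entity_tag_prefix(entity_id: u64, tag: Tag) -> Vec<u8> {
    let mut p = entity_prefix(entity_id);
    p.push(tag as u8);
    p
}
```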

Phase 4: Signal Ledger -- Decay Scores and Windowed Aggregation -- COMPLETE

Delivers: The in-memory per-entity signal state with running decay scores (O(1) update, O(1) read) and bucketed windowed counters. Signal writes update the running scores atomically. Signal reads return decay-correct values without scanning raw events. State is checkpointed to storage for crash recovery.

Acceptance Criteria:

  • EntitySignalState is #[repr(C, align(64))] -- one L1 cache line per hot-path struct
  • Running decay formula: S(t) = S(t_prev) * exp(-lambda * dt) + weight -- mathematically exact, verified against analytical brute-force computation to 6 decimal places across 10,000 random event sequences (property test)
  • Out-of-order events handled correctly: when t_event < last_update, weight is pre-decayed: score += weight * exp(-lambda * (last_update - t_event))
  • Windowed counts use per-minute bucketed counters (BucketedCounter) supporting 1h/24h/7d windows
  • Velocity = windowed_count / window_duration_seconds
  • Signal write latency < 100 microseconds including WAL write (amortized), benchmarked with criterion
  • Decay score read latency < 100ns per entity per lambda, benchmarked with criterion
  • 200-entity scoring pass < 5 microseconds, benchmarked with criterion
  • State checkpointed to storage every 30 seconds; crash recovery reconstructs from checkpoint + WAL replay
  • DashMap or sharded map for concurrent entity state access; signal counters use AtomicU64 with Relaxed ordering

Depends On: Phase 2, Phase 3 Complexity: XL Research Reference: docs/research/tidaldb_signal_ledger.md (running-score formula, SWAG, BucketedCounter, EntityState struct, three-tier architecture)
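A minimal sketch of the running-score update, assuming f64 seconds, a single lambda, and non-negative timestamps; it leaves out the cache-line layout, concurrency, and checkpointing from the criteria. The brute-force analytical() function plays the role of the reference the property test compares against.

```rust
/// Running exponential-decay score for one signal and one lambda (sketch).
struct DecayScore {
    lambda: f64,      // ln(2) / half_life_secs
    score: f64,       // S(last_update)
    last_update: f64, // seconds; timestamps assumed non-negative
}

impl DecayScore {
    fn new(lambda: f64) -> Self {
        DecayScore { lambda, score: 0.0, last_update: 0.0 }
    }

    fn add(&mut self, t_event: f64, weight: f64) {
        if t_event >= self.last_update {
            // In order: S(t) = S(t_prev) * exp(-lambda * dt) + weight
            let dt = t_event - self.last_update;
            self.score = self.score * (-self.lambda * dt).exp() + weight;
            self.last_update = t_event;
        } else {
            // Late event: pre-decay its weight to last_update.
            self.score += weight * (-self.lambda * (self.last_update - t_event)).exp();
        }
    }

    /// O(1) read at `now` (now >= last_update).
    fn read(&self, now: f64) -> f64 {
        self.score * (-self.lambda * (now - self.last_update)).exp()
    }
}

/// Brute-force reference: S(t) = sum(w_i * exp(-lambda * (t - t_i))).
fn analytical(events: &[(f64, f64)], lambda: f64, now: f64) -> f64 {
    events.iter().map(|&(t, w)| w * (-lambda * (now - t)).exp()).sum()
}
```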

Phase 5: Entity CRUD and Signal Write API -- COMPLETE

Delivers: The public API surface for Milestone 1. TidalDB::open(), TidalDB::shutdown(), entity write/read, signal write/read. This is the interface the UAT scenario tests against. Includes the signal() method that atomically writes to WAL, updates in-memory state, and returns immediately.

Acceptance Criteria:

  • TidalDB::open(config) opens storage, restores in-memory state from checkpoint + WAL replay, returns Result<TidalDB>
  • TidalDB::shutdown() checkpoints all in-memory state, syncs WAL, closes storage cleanly
  • db.write_item(id, metadata) stores entity metadata
  • db.signal(signal_type, entity_id, weight, timestamp) atomically: appends to WAL, updates decay scores, updates windowed counters
  • db.read_decay_score(entity_id, signal_type, lambda_index) returns current decayed score
  • db.read_windowed_count(entity_id, signal_type, window) returns count within window
  • db.read_velocity(entity_id, signal_type, window) returns count / window_duration
  • Full UAT scenario passes as an integration test
  • TidalDB is Send + Sync -- safe to share across threads behind Arc

Depends On: Phase 4 Complexity: M Research Reference: CODING_GUIDELINES.md section 9 (public API surface)
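A toy model of the write-path contract, not tidalDB's design: a single Mutex stands in for the sharded map and AtomicU64 counters, and an in-memory Vec stands in for the fsync'd WAL. It shows two things. First, the log append and the state update happen under one lock, so a reader never sees state whose event is missing from the log. Second, a compile-time check confirms the type is Send + Sync and can be shared behind Arc.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Default)]
struct Inner {
    wal: Vec<(u64, f64)>,      // (entity_id, weight); a real WAL is on disk and fsync'd
    totals: HashMap<u64, f64>, // derived state, rebuilt from the log on replay
}

#[derive(Default)]
struct ToyDb {
    inner: Mutex<Inner>,
}

impl ToyDb {
    /// Append to the log, then update derived state, under one lock.
    fn signal(&self, entity_id: u64, weight: f64) {
        let mut g = self.inner.lock().unwrap();
        g.wal.push((entity_id, weight));
        *g.totals.entry(entity_id).or_insert(0.0) += weight;
    }

    fn total(&self, entity_id: u64) -> f64 {
        self.inner.lock().unwrap().totals.get(&entity_id).copied().unwrap_or(0.0)
    }

    fn wal_len(&self) -> usize {
        self.inner.lock().unwrap().wal.len()
    }
}

/// Fails to compile if T is not Send + Sync.
fn assert_send_sync<T: Send + Sync>() {}
```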

Deferred to Later Milestones

  • User entities and preference vectors -- deferred to M3 because M1 proves the signal primitive without needing user context
  • Creator entities and relationship edges -- deferred to M2/M3 because M1 only needs items to prove signal correctness
  • Vector index (USearch) -- deferred to M2 because M1 does not need ANN retrieval
  • Text index (Tantivy) -- deferred to M5 (Hybrid Search) because M1 does not need full-text search
  • Ranking profiles -- deferred to M2 because M1 proves signals work; M2 proves ranking over signals works
  • Query parser -- deferred to M2; M1 uses the Rust API directly
  • Diversity enforcement -- deferred to M2 because M1 does not produce ranked result sets
  • Signal rollups (hourly/daily materialization) -- deferred to M5 because the bucketed counter approach serves the performance budget through M4; rollups become necessary only at scale for 30d+ windows
  • RocksDB backend -- deferred indefinitely; fjall is the primary backend, RocksDB is the trait-abstracted fallback if benchmarks demand it

Integration Test

use std::time::Duration;

#[test]
fn milestone_1_uat() {
    // std::time::Duration has no days()/hours() constructors; build spans from seconds.
    const HOUR: Duration = Duration::from_secs(3_600);
    const DAY: Duration = Duration::from_secs(86_400);

    // Keep one data directory so the reopen below sees the same files.
    // temp_dir() (returns a PathBuf), metadata(), generate_events(),
    // compute_analytical_decay(), and count_events_in_window() are test helpers.
    let data_dir = temp_dir();
    let config = || Config {
        data_dir: data_dir.clone(),
        schema: Schema::builder()
            .entity_type("item", &["title", "category", "created_at"])
            .signal("view", Decay::exponential(7 * DAY),
                    &[Window::OneHour, Window::TwentyFourHours, Window::SevenDays])
            .signal("like", Decay::exponential(14 * DAY),
                    &[Window::TwentyFourHours, Window::SevenDays, Window::AllTime])
            .signal("skip", Decay::exponential(DAY),
                    &[Window::OneHour, Window::TwentyFourHours])
            .build(),
    };

    // Open tidalDB with the signal schema
    let db = TidalDB::open(config()).unwrap();

    // Write 100 items
    for i in 0..100 {
        db.write_item(EntityId(i), metadata(i)).unwrap();
    }

    // Write 10,000 signal events spanning 7 days
    let events = generate_events(10_000, 7 * DAY);
    for e in &events {
        db.signal(e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Read item #42 and verify against brute-force computation
    let now = Timestamp::now();
    let analytical_score = compute_analytical_decay(&events, EntityId(42), "view", now);
    let actual_score = db.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!((actual_score - analytical_score).abs() < 1e-6);

    let analytical_count = count_events_in_window(&events, EntityId(42), "view", now, 24 * HOUR);
    let actual_count = db.read_windowed_count(EntityId(42), "view", Window::TwentyFourHours).unwrap();
    assert_eq!(actual_count, analytical_count);

    // Write a new event and verify immediate visibility
    db.signal("view", EntityId(42), 1.0, now).unwrap();
    let new_score = db.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!(new_score > actual_score);

    // Close, reopen on the same data directory, verify persistence
    db.shutdown().unwrap();
    let db2 = TidalDB::open(config()).unwrap();
    let recovered_score = db2.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!((recovered_score - new_score).abs() < 1e-6);
}

Done When

A developer can embed tidalDB as a Rust dependency, define signal types with decay rates and windows in schema, write thousands of signal events, and read back decay-correct scores, windowed counts, and velocity values that match analytical computation to 6 decimal places -- including after a crash and restart. Performance benchmarks pass: signal write < 100us amortized, decay read < 100ns per entity, 200-entity scoring < 5us.


Milestone 2: Ranked Retrieval -- "A single query retrieves, scores, and ranks content"

Milestone Thesis

A developer can write items with metadata and embeddings, write signal events, and execute a RETRIEVE query that returns items ranked by a named profile using live signal scores -- with metadata filters and diversity constraints applied by the database, not the application. This proves that ranking is a database operation, not application logic.

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items with metadata (title, category, format, duration, created_at)
      and 1536-dim embeddings
    - Signal types: view (7d decay), like (14d decay), skip (1d decay),
      share (3d decay), completion (30d decay)
    - 100,000 signal events spanning 7 days across the items
    - Ranking profiles defined:
      * "trending" -- share_velocity(6h) primary, view_velocity(6h) secondary,
        engagement_ratio gate > 0.03
      * "hot" -- score / (age_hours + 2)^1.8
      * "new" -- created_at DESC
      * "top_week" -- quality_score within 7d window
      * "hidden_gems" -- high completion_rate, inverse view_count
      * "controversial" -- max(likes * dislikes)

When:
  1. RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25
  2. RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20
  3. RETRIEVE items USING PROFILE new LIMIT 20
  4. RETRIEVE items USING PROFILE top_week LIMIT 20
  5. RETRIEVE items USING PROFILE hidden_gems FILTER min_completion_rate:0.7 LIMIT 10
  6. RETRIEVE items USING PROFILE controversial LIMIT 10
  7. Write a burst of 100 "share" signals for item #500
  8. Re-execute the trending query

Then:
  - Step 1: Items ordered by share velocity, max 1 per creator, items with
    engagement_ratio < 0.03 excluded
  - Step 2: Only jazz items returned, ordered by hot formula
  - Step 3: Items ordered by created_at descending, no signal computation
  - Step 4: Items ordered by quality score computed from 7d-windowed signals
  - Step 5: Items with high completion but low views, sorted by quality/reach ratio
  - Step 6: Items with highest product of positive and negative signals
  - Step 7: All 100 share writes succeed without error
  - Step 8: Item #500 appears higher in trending results (signal written 100ms ago
    is reflected)
  - Performance: end-to-end RETRIEVE < 50ms for 10K items

Phases

Phase 1: Vector Index Integration (USearch)

Delivers: USearch wrapped behind a trait, with mmap persistence, f16 quantization, and the adaptive filtered search planner. Items can be inserted with embeddings and retrieved by ANN similarity.

Acceptance Criteria:

  • VectorIndex trait with insert(key, vector), remove(key), search(query, k), filtered_search(query, k, predicate), save(), load(), view()
  • USearch backend implements the trait with f16 quantization (default), mmap persistence
  • Vectors normalized to unit length at insertion time, so squared L2 distance equals 2 − 2·cosine similarity and L2 ordering matches cosine ordering
  • Adaptive query planner: selectivity < 2% triggers pre-filter + brute-force; 2-100% uses filtered_search with predicate callback
  • ANN retrieval at 10K vectors returns top-100 with recall@10 > 0.95
  • ANN retrieval latency < 10ms at 10K vectors (benchmarked)
  • Persistence: save on checkpoint, view() on restart for immediate read serving
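  • Crate root declares #![deny(unsafe_code)] (a crate-level forbid cannot be relaxed per module); only the USearch FFI boundary module opts out with #[allow(unsafe_code)], and every unsafe block there carries a SAFETY comment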

Depends On: m1p3 (storage traits) Complexity: L Research Reference: docs/research/ann_for_tidaldb.md (USearch architecture, filtered search, f16, mmap)
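The planner rule and insertion-time normalization above can be sketched in plain Rust. This sketch does not call USearch. `SearchPlan`, `choose_plan`, and `brute_force_top_k` are hypothetical names, and the 2% threshold is the one stated in the acceptance criteria. The brute-force branch scores the pre-filtered candidates exhaustively by squared L2 distance. For unit vectors that distance equals 2 − 2·cos(a, b), so it ranks candidates in the same order as cosine similarity.

```rust
/// Strategy the planner picks for a filtered ANN query (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchPlan {
    /// Very selective filter: materialize matching IDs and score them all.
    PreFilterBruteForce,
    /// Broad filter: let the graph traversal skip non-matching nodes via a predicate.
    FilteredAnn,
}

/// Threshold from the roadmap: selectivity below 2% switches to brute force.
const BRUTE_FORCE_SELECTIVITY: f64 = 0.02;

pub fn choose_plan(selectivity: f64) -> SearchPlan {
    if selectivity < BRUTE_FORCE_SELECTIVITY {
        SearchPlan::PreFilterBruteForce
    } else {
        SearchPlan::FilteredAnn
    }
}

/// Normalize in place to unit L2 length; an all-zero vector is left unchanged.
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// Exhaustive top-k over pre-filtered candidates by squared L2 distance (smaller is closer).
pub fn brute_force_top_k(query: &[f32], candidates: &[(u64, Vec<f32>)], k: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = candidates
        .iter()
        .map(|(id, v)| {
            let d = query.iter().zip(v).map(|(a, b)| (a - b) * (a - b)).sum::<f32>();
            (*id, d)
        })
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(k);
    scored
}
```

The brute-force branch wins at low selectivity because scoring a few hundred materialized IDs costs less than an HNSW walk that rejects most of the nodes it visits.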

Phase 2: Metadata Indexes and Filter Engine

Delivers: Roaring bitmap indexes for categorical metadata, B-tree indexes for range attributes, and a composable filter engine that evaluates arbitrary filter combinations. The filter engine produces either a bitmap (for pre-filtering ANN) or a predicate closure (for in-graph filtering).

Acceptance Criteria:

  • One Roaring bitmap per distinct value of each categorical attribute: category, format, creator_id
  • B-tree index for range attributes: created_at, duration
  • Filter expressions are composable: AND across dimensions, OR within a dimension
  • filter.selectivity() estimates the fraction of items matching (for query planner)
  • filter.to_bitmap() returns a RoaringBitmap for pre-filtering
  • filter.to_predicate() returns a Fn(EntityId) -> bool for in-graph filtering
  • Filters tested: category:jazz, format:video, duration_min:5m, created_within:7d, and arbitrary combinations
  • Filter evaluation < 1 microsecond per candidate (benchmarked)

Depends On: m1p3 (storage engine) Complexity: M Research Reference: docs/research/ann_for_tidaldb.md (metadata indexes, selectivity estimation, roaring bitmaps)
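Below is a minimal sketch of the composition rule (OR within a dimension, AND across dimensions) and the bitmap and predicate outputs. It assumes `std::collections::BTreeSet<u32>` as a stand-in for `RoaringBitmap`. `Clause`, `CategoricalIndex`, and the method bodies are hypothetical. Selectivity is computed exactly here, whereas the criteria call for an estimate. Range attributes (created_at, duration) are not shown.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for RoaringBitmap: a sorted set of item IDs.
pub type Bitmap = BTreeSet<u32>;

/// One dimension (e.g. "category") with its accepted values; values are OR-ed.
pub struct Clause {
    pub field: String,
    pub values: Vec<String>,
}

/// Hypothetical categorical index: field -> value -> items carrying that value.
pub struct CategoricalIndex {
    pub postings: HashMap<String, HashMap<String, Bitmap>>,
    pub total_items: usize,
}

impl CategoricalIndex {
    /// OR within one clause: union the bitmaps of every accepted value.
    fn clause_bitmap(&self, clause: &Clause) -> Bitmap {
        let mut out = Bitmap::new();
        if let Some(by_value) = self.postings.get(&clause.field) {
            for value in &clause.values {
                if let Some(bm) = by_value.get(value) {
                    out.extend(bm.iter().copied());
                }
            }
        }
        out
    }

    /// AND across clauses: intersect the per-clause bitmaps.
    pub fn to_bitmap(&self, clauses: &[Clause]) -> Bitmap {
        let mut iter = clauses.iter().map(|c| self.clause_bitmap(c));
        let first = match iter.next() {
            Some(bm) => bm,
            None => return (0..self.total_items as u32).collect(), // no filter: every item
        };
        iter.fold(first, |acc, bm| acc.intersection(&bm).copied().collect())
    }

    /// Fraction of items matching (exact here; a planner would estimate).
    pub fn selectivity(&self, clauses: &[Clause]) -> f64 {
        if self.total_items == 0 {
            return 0.0;
        }
        self.to_bitmap(clauses).len() as f64 / self.total_items as f64
    }

    /// Predicate form for in-graph filtering during ANN traversal.
    pub fn to_predicate(&self, clauses: &[Clause]) -> impl Fn(u32) -> bool {
        let bm = self.to_bitmap(clauses);
        move |id| bm.contains(&id)
    }
}
```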

Phase 3: Ranking Profile Engine

Delivers: Named ranking profiles declared as data (not compiled code), parsed, validated, stored, and executed by the database. Profiles reference signal scores, windowed aggregates, velocity, metadata fields, and define quality gates. Profiles are versioned and swappable at query time.

Acceptance Criteria:

  • Profile declaration syntax supports: primary signal, secondary signals with weights, BOOST, GATE (minimum threshold), PENALIZE, EXCLUDE
  • Profiles stored in schema, versioned, retrievable by name
  • Profile execution: given a candidate set and a profile, produce a scored and sorted result list
  • Built-in profiles implemented: trending, hot, new, top_week, top_month, top_all_time, hidden_gems, controversial, most_viewed, most_liked, shuffle
  • hot formula: log10(max(|positive - negative|, 1)) / (age_hours + 2)^gravity with configurable gravity
  • controversial formula: (positive * negative) / (positive + negative)^2
  • hidden_gems formula: quality_score * (1 / log10(view_count + 10)) -- the +10 keeps the log term at least 1, so zero-view items neither divide by zero nor receive an outsized boost
  • Profile change does not require recompile -- profiles are runtime data
  • Scoring 200 candidates takes < 10 microseconds with a decay-only profile and < 100 microseconds with a velocity-based profile such as trending (both Criterion-benchmarked)

Depends On: m1p4 (signal ledger) Complexity: L Research Reference: VISION.md (ranking profile declarations), ai-lookup/services/ranking-profiles.md, USE_CASES.md Appendix B (sort mode formulas)
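The three closed-form sort modes above translate directly into scoring functions. This sketch assumes positive and negative counts are already read from the signal ledger as `f64`. The function names are hypothetical. The zero-vote guard in `controversial` is an added assumption, because the formula as written divides by zero when there are no votes.

```rust
/// hot: log10(max(|positive - negative|, 1)) / (age_hours + 2)^gravity
pub fn hot(positive: f64, negative: f64, age_hours: f64, gravity: f64) -> f64 {
    let net = (positive - negative).abs().max(1.0);
    net.log10() / (age_hours + 2.0).powf(gravity)
}

/// controversial: (positive * negative) / (positive + negative)^2; 0 when there are no votes.
pub fn controversial(positive: f64, negative: f64) -> f64 {
    let total = positive + negative;
    if total <= 0.0 {
        return 0.0; // guard: the formula is undefined at zero votes
    }
    (positive * negative) / (total * total)
}

/// hidden_gems: quality_score * (1 / log10(view_count + 10))
pub fn hidden_gems(quality_score: f64, view_count: u64) -> f64 {
    quality_score / ((view_count as f64) + 10.0).log10()
}
```

As written, `controversial` depends only on the positive/negative ratio: 1 like with 1 dislike and 500 likes with 500 dislikes both score 0.25. `hot` uses |positive − negative|, so a net −100 item scores the same as a net +100 item, and any item with a net of at most 1 scores 0.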

Phase 4: Diversity Enforcement

Delivers: Post-scoring diversity pass that reorders results to satisfy constraints (max_per_creator, format_mix) without reducing result count. Implemented as a greedy selection pass over the scored candidate list.

Acceptance Criteria:

  • max_per_creator:N enforced: no more than N items from any single creator in the result set
  • format_mix:true enforced: no more than 60% of results from any single format
  • Diversity pass does not reduce result count -- it selects the next-best candidate that satisfies constraints
  • Diversity pass adds < 1ms for 200 candidates (benchmarked)
  • When diversity constraints cannot be fully satisfied (too few creators), results are returned with a warning flag, not an error
  • Property test: diversity constraints hold for 10,000 random candidate sets

Depends On: Phase 3 (ranking profiles produce scored lists) Complexity: M Research Reference: VISION.md (diversity as query constraint), thoughts.md Part V.14 (MMR post-scoring)
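The following sketches the greedy pass for max_per_creator only (format_mix is omitted). `Scored`, `Diversified`, and `enforce_max_per_creator` are hypothetical names. The pass walks candidates in score order and skips any whose creator is already at the cap. If that leaves the page short, it backfills with the best skipped candidates and sets a warning flag instead of returning an error, which matches the criteria above. The final list is re-sorted by score.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone)]
pub struct Scored {
    pub id: u64,
    pub creator: u32,
    pub score: f64,
}

#[derive(Debug)]
pub struct Diversified {
    pub items: Vec<Scored>,
    /// Set when the cap had to be exceeded to fill `limit` slots.
    pub constraint_warning: bool,
}

pub fn enforce_max_per_creator(mut candidates: Vec<Scored>, max_per_creator: usize, limit: usize) -> Diversified {
    // Highest score first; sort_by is stable, so ties keep input order.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut counts: HashMap<u32, usize> = HashMap::new();
    let mut picked = Vec::with_capacity(limit);
    let mut deferred = Vec::new();
    for c in candidates {
        if picked.len() == limit {
            break;
        }
        let n = counts.entry(c.creator).or_insert(0);
        if *n < max_per_creator {
            *n += 1;
            picked.push(c);
        } else {
            deferred.push(c); // over the cap; kept in score order for backfill
        }
    }
    // Too few distinct creators: backfill rather than shrink the result.
    let mut constraint_warning = false;
    for c in deferred {
        if picked.len() == limit {
            break;
        }
        picked.push(c);
        constraint_warning = true;
    }
    picked.sort_by(|a, b| b.score.total_cmp(&a.score));
    Diversified { items: picked, constraint_warning }
}
```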

Phase 5: Query Parser and RETRIEVE Executor

Delivers: The query parser for the RETRIEVE operation and the executor that orchestrates candidate retrieval, filtering, scoring, diversity, and result assembly. This is the "one query" entry point. For M2, the RETRIEVE query does not require FOR USER (no personalization yet) -- it operates on the full item corpus with filters and profiles.

Acceptance Criteria:

  • Parser handles: RETRIEVE items, USING PROFILE <name>, FILTER <conditions>, DIVERSITY <constraints>, LIMIT <n>, EXCLUDE [ids]
  • Parser produces a typed AST; parse errors include position and helpful message
  • Executor pipeline: candidate retrieval (ANN or full scan based on profile) -> filter -> score -> diversity -> limit -> return
  • When profile uses velocity/decay signals, executor uses ANN retrieval over embeddings then scores with signal state
  • When profile is new or alphabetical, executor skips ANN and uses metadata index directly
  • End-to-end RETRIEVE latency < 50ms at 10K items (benchmarked)
  • Results include: entity_id, score, and a signal snapshot (key signal values used in scoring) for debugging/transparency
  • SIGNAL write command also parsed and routed to signal write path from M1
  • Full M2 UAT scenario passes as an integration test

Depends On: Phase 1, Phase 2, Phase 3, Phase 4 Complexity: L Research Reference: ai-lookup/features/query-language.md, SEQUENCE.md (all sequence diagrams)
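A minimal sketch of a hand-written parser for the M2 subset of RETRIEVE, producing a typed AST and position-bearing errors. `RetrieveQuery`, `ParseError`, and `parse_retrieve` are hypothetical names, not the project's parser. Several simplifications are assumed: FILTER and DIVERSITY accept only `key:value` terms, EXCLUDE is omitted, and keywords are case-insensitive.

```rust
/// Typed AST for the M2 subset of RETRIEVE.
#[derive(Debug, PartialEq)]
pub struct RetrieveQuery {
    pub profile: String,
    pub filters: Vec<(String, String)>,
    pub diversity: Vec<(String, String)>,
    pub limit: Option<usize>,
}

#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub position: usize, // byte offset into the query string
    pub message: String,
}

fn err(position: usize, message: impl Into<String>) -> ParseError {
    ParseError { position, message: message.into() }
}

/// Split on whitespace and commas, remembering each token's byte offset.
fn tokenize(input: &str) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    let mut start: Option<usize> = None;
    for (i, ch) in input.char_indices() {
        if ch.is_whitespace() || ch == ',' {
            if let Some(s) = start.take() {
                out.push((s, &input[s..i]));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        out.push((s, &input[s..]));
    }
    out
}

const CLAUSES: [&str; 4] = ["USING", "FILTER", "DIVERSITY", "LIMIT"];

fn expect_kw(toks: &[(usize, &str)], i: &mut usize, word: &str, end: usize) -> Result<(), ParseError> {
    match toks.get(*i) {
        Some(&(_, t)) if t.eq_ignore_ascii_case(word) => {
            *i += 1;
            Ok(())
        }
        Some(&(pos, t)) => Err(err(pos, format!("expected `{word}`, found `{t}`"))),
        None => Err(err(end, format!("expected `{word}`, found end of input"))),
    }
}

/// Collect `key:value` tokens until the next clause keyword.
fn pairs(toks: &[(usize, &str)], i: &mut usize) -> Result<Vec<(String, String)>, ParseError> {
    let mut out = Vec::new();
    while let Some(&(pos, t)) = toks.get(*i) {
        if CLAUSES.iter().any(|k| t.eq_ignore_ascii_case(k)) {
            break;
        }
        let (k, v) = t
            .split_once(':')
            .ok_or_else(|| err(pos, format!("expected `key:value`, found `{t}`")))?;
        out.push((k.to_string(), v.to_string()));
        *i += 1;
    }
    Ok(out)
}

pub fn parse_retrieve(input: &str) -> Result<RetrieveQuery, ParseError> {
    let toks = tokenize(input);
    let end = input.len();
    let mut i = 0;
    expect_kw(&toks, &mut i, "RETRIEVE", end)?;
    expect_kw(&toks, &mut i, "items", end)?;
    let mut q = RetrieveQuery { profile: String::new(), filters: vec![], diversity: vec![], limit: None };
    while let Some(&(pos, t)) = toks.get(i) {
        i += 1;
        match t.to_ascii_uppercase().as_str() {
            "USING" => {
                expect_kw(&toks, &mut i, "PROFILE", end)?;
                let &(_, name) = toks.get(i).ok_or_else(|| err(end, "expected a profile name"))?;
                q.profile = name.to_string();
                i += 1;
            }
            "FILTER" => q.filters.extend(pairs(&toks, &mut i)?),
            "DIVERSITY" => q.diversity.extend(pairs(&toks, &mut i)?),
            "LIMIT" => {
                let &(p, n) = toks.get(i).ok_or_else(|| err(end, "expected a number after LIMIT"))?;
                q.limit = Some(n.parse::<usize>().map_err(|_| err(p, format!("invalid LIMIT `{n}`")))?);
                i += 1;
            }
            _ => return Err(err(pos, format!("unexpected token `{t}`"))),
        }
    }
    if q.profile.is_empty() {
        return Err(err(end, "missing `USING PROFILE <name>`"));
    }
    Ok(q)
}
```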

Deferred to Later Milestones

  • FOR USER clause and user preference vectors -- deferred to M3; M2 proves ranking works without personalization
  • SIMILAR TO clause (related content) -- deferred to M3; requires user context for personalization layer
  • Relationship graph (follows, blocks) -- deferred to M3; M2 filters on metadata, not relationships
  • SEARCH query (text + semantic) -- deferred to M5; M2 proves RETRIEVE ranking
  • Full-text index (Tantivy) -- deferred to M5
  • Exploration budget / cold start -- deferred to M3; requires user context to be meaningful
  • User state filters (unseen, saved, liked) -- deferred to M3; requires user entities
  • Engagement threshold filters (min_views, min_likes) -- partially implemented via signal reads; full composable filter syntax deferred to M5

Integration Test

#[test]
fn milestone_2_uat() {
    let db = open_with_full_schema();

    // Write 10K items with embeddings
    for i in 0..10_000 {
        db.write_item(EntityId(i), metadata(i), Some(embedding(i))).unwrap();
    }

    // Write 100K signal events
    for e in generate_events(100_000, Duration::days(7)) {
        db.signal(e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Trending query with diversity
    let results = db.retrieve(
        "RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25"
    ).unwrap();
    assert_eq!(results.len(), 25);
    assert!(results.windows(2).all(|w| w[0].score >= w[1].score));
    assert!(creator_counts(&results).values().all(|&c| c <= 1));

    // Category filter with hot sort
    let jazz = db.retrieve(
        "RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20"
    ).unwrap();
    assert!(jazz.iter().all(|r| r.metadata["category"] == "jazz"));

    // Signal freshness: write burst, verify ranking change
    let pre_burst = db.retrieve(
        "RETRIEVE items USING PROFILE trending LIMIT 10"
    ).unwrap();
    for _ in 0..100 {
        db.signal("share", EntityId(500), 1.0, Timestamp::now()).unwrap();
    }
    let post_burst = db.retrieve(
        "RETRIEVE items USING PROFILE trending LIMIT 10"
    ).unwrap();
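    let pre_rank = pre_burst.iter().position(|r| r.id == EntityId(500));
    let post_rank = post_burst
        .iter()
        .position(|r| r.id == EntityId(500))
        .expect("item 500 should reach the trending top 10 after the share burst");
    // Must move up unless it was already first; None means it was outside the top 10
    assert!(pre_rank.map_or(true, |pre| post_rank < pre || pre == 0));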
}

Done When

A developer can write items with embeddings and metadata, write signal events, and execute RETRIEVE queries with any of the 11+ built-in sort modes, metadata filters, and diversity constraints. Results are correctly ranked by the named profile. Signal events written 100ms ago are reflected in the next query. End-to-end latency < 50ms at 10K items. Diversity constraints hold in every result set.


Milestone 3: Personalized Ranking -- "The For You query works"

Milestone Thesis

A developer can write user entities with preference vectors, write relationship edges (follows, blocks), write engagement signals that update user profiles and relationship weights automatically, and execute RETRIEVE items FOR USER @user_id USING PROFILE for_you -- getting results shaped by the user's history, relationships, and implicit preferences. This proves that the feedback loop closes inside the database.

Enables

  • UC-01 (For You Feed) -- Full: personalized ranking with diversity, exploration, cold start
  • UC-04 (Following Feed) -- Full: restricted to followed creators, chronological + quality tiebreaker
  • UC-05 (Related/Up Next) -- Core: ANN retrieval from source item, user preference re-ranking
  • UC-07 (Notifications) -- Core: relationship-strength scoring, recency filtering
  • UC-09 (User Library) -- Partial: unseen/liked/saved filters enable history and library queries

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items across 200 creators, with embeddings
    - 500 users with initial preference embeddings
    - Relationship edges: follows, blocks
    - Signals: view, like, skip, hide, completion, share
    - 500,000 historical signal events establishing user preferences
    - Profiles: for_you, following, related, notification

When:
  1. RETRIEVE items FOR USER @user_42 USING PROFILE for_you
     FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50
  2. RETRIEVE items FOR USER @user_42 FILTER relationship:follows
     USING PROFILE following LIMIT 50
  3. RETRIEVE items SIMILAR TO @item_abc FOR USER @user_42
     USING PROFILE related FILTER unseen LIMIT 10
  4. SIGNAL like item:@item_xyz user:@user_42
  5. Re-execute the for_you query
  6. SIGNAL hide item:@item_999 user:@user_42
  7. SIGNAL block user:@user_42 target_creator:@creator_77
  8. Re-execute the for_you query

Then:
  - Step 1: Results personalized -- items matching user_42's preference vector
    rank higher; items from blocked creators excluded; items already seen excluded;
    max 2 per creator; 10% exploration budget (items from unfollowed creators)
  - Step 2: Only items from followed creators, chronological order
  - Step 3: Items semantically similar to @item_abc, re-ranked by user_42's
    preference match, already-seen excluded
  - Step 4: Signal write atomically updates: item like count, user->creator
    interaction weight, user preference vector shifted toward item embedding
  - Step 5: Results shift -- items similar to @item_xyz's topic rank higher;
    creator of @item_xyz appears more frequently
  - Step 6: @item_999 never appears in any future query for user_42
  - Step 7: All items by creator_77 excluded from all queries for user_42
  - Step 8: No items from creator_77; no item_999; shift from like reflected

Phases

Phase 1: User and Creator Entities with Relationships (m3p1)

Delivers: User and creator entity types stored in their own fjall keyspaces (EntityKind::User, EntityKind::Creator) with preference embeddings, metadata, and a relationship graph. Relationship edges are (from_entity, to_entity, type, weight, timestamp) stored under the Tag::Rel key prefix. Three user-state bitmap indexes (FollowsBitmap, UserSeenBitmap, UserBlockedSet) power the unseen, unblocked, and relationship:follows filters.

Acceptance Criteria:

  • db.write_user(user_id, metadata, Option<embedding>) stores user entity in the users keyspace
  • db.write_creator(creator_id, metadata, Option<embedding>) stores creator entity in the creators keyspace
  • db.write_relationship(from, to, rel_type, weight, timestamp) stores a directional weighted edge
  • db.read_relationship(from, to, rel_type) returns Option<RelationshipEdge>
  • db.list_relationships(from, rel_type) returns all edges of a type from a source entity
  • Relationship types supported: follows, blocks, interaction_weight, hide, mute
  • Key encoding: [from_entity_id][0x00][REL][type_byte][to_entity_id], supporting a direct point lookup of one edge and a prefix scan over all edges of one type from one entity
  • FollowsBitmap::for_user(user_id) returns a RoaringBitmap of item IDs from all followed creators
  • UserSeenBitmap::for_user(user_id) returns a RoaringBitmap of item IDs the user has viewed
  • UserBlockedSet::for_user(user_id) returns blocked creator IDs + hidden item IDs
  • Relationship write/read latency < 50 microseconds (benchmarked)
  • User and creator entities persist across shutdown and restart
  • Relationships persist across shutdown and restart via storage engine

Task Breakdown:

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | User + Creator Entity Types and Storage | UserEntity, CreatorEntity, write/read APIs, metadata codec, embedding slots | M |
| 02 | Relationship Graph | RelationshipEdge, RelationshipType, storage codec, CRUD operations, prefix scan | L |
| 03 | User-State Bitmap Indexes | FollowsBitmap, UserSeenBitmap, UserBlockedSet, bitmap maintenance hooks | M |

Depends On: m1p1 (types), m1p3 (storage engine, key encoding, Tag::Rel), m1p5 (entity write API pattern), m2p1 (vector index for embedding storage), m2p2 (bitmap indexes, FilterExpr, FilterResult) Complexity: L (3 sequential tasks: 01 -> 02 -> 03) Research Reference: docs/research/tidaldb_signal_ledger.md (three-tier storage, subject-prefix keys), docs/research/ann_for_tidaldb.md (user preference vector in embedding slot), thoughts.md Part V.12 (subject-prefix keys), Part V.16 (user preference vector)
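The relationship key layout can be sketched as follows. IDs are encoded big-endian so that byte order matches numeric order, which lets one prefix cover all edges of a type from one entity. The `TAG_REL` and `RelType` byte values are hypothetical placeholders for whatever the storage key codec defines. A `BTreeMap` stands in for the storage engine's ordered prefix iterator.

```rust
use std::collections::BTreeMap;

/// Hypothetical byte values; the real Tag::Rel and type bytes live in the key codec.
const SEP: u8 = 0x00;
const TAG_REL: u8 = 0x52;
const KEY_LEN: usize = 8 + 1 + 1 + 1 + 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum RelType {
    Follows = 1,
    Blocks = 2,
    InteractionWeight = 3,
    Hide = 4,
    Mute = 5,
}

/// Prefix shared by every edge of one type from one entity: [from][0x00][REL][type].
pub fn rel_prefix(from: u64, rel: RelType) -> Vec<u8> {
    let mut k = Vec::with_capacity(KEY_LEN);
    k.extend_from_slice(&from.to_be_bytes()); // big-endian: byte order == numeric order
    k.push(SEP);
    k.push(TAG_REL);
    k.push(rel as u8);
    k
}

/// Full edge key: the prefix followed by the big-endian target id.
pub fn rel_key(from: u64, rel: RelType, to: u64) -> Vec<u8> {
    let mut k = rel_prefix(from, rel);
    k.extend_from_slice(&to.to_be_bytes());
    k
}

/// Recover the target id, or None if the bytes are not a relationship key.
pub fn decode_to(key: &[u8]) -> Option<u64> {
    if key.len() != KEY_LEN || key[8] != SEP || key[9] != TAG_REL {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&key[11..]);
    Some(u64::from_be_bytes(buf))
}

/// list_relationships(from, rel_type) as a prefix scan over an ordered map.
pub fn list_targets(store: &BTreeMap<Vec<u8>, f32>, from: u64, rel: RelType) -> Vec<u64> {
    let prefix = rel_prefix(from, rel);
    store
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .filter_map(|(k, _)| decode_to(k))
        .collect()
}
```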

Phase 2: Feedback Loop -- Signal Writes Update User State (m3p2)

Delivers: Atomic multi-state updates on signal write. When a signal event is written (view, like, skip, hide, block, completion, share), the database atomically updates: the item's signal ledger, the user's preference vector (EMA), the user-to-creator interaction weight, and the user-state bitmap indexes. Four components: (1) user preference vector EMA update with configurable learning rate, (2) interaction weight ledger using the existing decay infrastructure from m1p4, (3) hard negative storage with WAL-backed durability, and (4) an atomic signal dispatch that wires all state updates into a single transactional signal write.

Acceptance Criteria:

  • db.signal("view", item_id, 1.0, ts) with user context atomically: updates item signal ledger, marks item as seen in UserSeenBitmap, increments user->creator interaction weight
  • db.signal("like", item_id, 1.0, ts) with user context atomically: updates item signal ledger, shifts user preference vector toward item embedding (EMA), increments user->creator interaction weight
  • db.signal("skip", item_id, 1.0, ts) with user context atomically: updates item signal ledger, shifts user preference vector away from item embedding, decays user->creator interaction weight
  • db.signal("hide", item_id, 1.0, ts) with user context atomically: writes permanent hide edge, adds item to UserBlockedSet.hidden_items, excludes from all future queries for this user
  • db.signal("block", user_id, creator_id, ...) atomically: writes permanent block edge, adds creator to UserBlockedSet.blocked_creators, excludes all creator items from all future queries
  • Preference vector EMA: pref_new = normalize(alpha * item_embedding + (1 - alpha) * pref_old) with configurable alpha (default 0.1)
  • Interaction weights use the same DecayModel::Exponential infrastructure from m1p4
  • Hard negatives (hide/block) are WAL-backed and survive crash + replay
  • Property test: for any sequence of hide/block/signal events, a RETRIEVE query NEVER returns a hidden item or blocked creator's items
  • All updates visible to the next query (no eventual consistency lag within the process)
  • Signal dispatch overhead < 50 microseconds beyond the base item signal write

Task Breakdown:

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | User Preference Vector | EMA update, normalization, learning rate config, cold-start initialization, storage codec | L |
| 02 | Interaction Weight Ledger | User-to-creator weights using decay infrastructure, update on engagement signals, read API | M |
| 03 | Hard Negatives | Hide/block permanent storage, WAL-backed durability, crash-safe replay, bitmap integration | L |
| 04 | Atomic Signal Dispatch | UserSignalContext wiring, multi-target dispatch, property tests for correctness invariants | L |

Depends On: m3p1 (user/creator entities, relationship graph, user-state bitmaps), m1p4 (signal ledger, decay infrastructure), m1p5 (signal write API), m2p1 (vector index for embedding reads) Complexity: XL (4 tasks; Tasks 01 and 03 can parallelize; Task 04 depends on all three) Research Reference: docs/research/tidaldb_signal_ledger.md (three-tier storage, signal dispatch), docs/research/ann_for_tidaldb.md (user preference vector management), thoughts.md Part V.16 (user preference vector as database-managed embedding)
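Two of the per-signal updates above are small enough to sketch directly. The first is the preference-vector EMA, pref_new = normalize(alpha * item_embedding + (1 - alpha) * pref_old). The second is exponential decay of an interaction weight. The half-life form of the decay and all function names are assumptions for illustration; the actual DecayModel::Exponential parameters come from m1p4. The "shift away" update used for skip signals is not modeled.

```rust
/// Unit-normalize; an all-zero vector is returned unchanged.
fn normalize(mut v: Vec<f32>) -> Vec<f32> {
    let n = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if n > 0.0 {
        v.iter_mut().for_each(|x| *x /= n);
    }
    v
}

/// pref_new = normalize(alpha * item + (1 - alpha) * pref_old)
pub fn ema_update(pref_old: &[f32], item: &[f32], alpha: f32) -> Vec<f32> {
    assert_eq!(pref_old.len(), item.len(), "embedding dimension mismatch");
    normalize(
        pref_old
            .iter()
            .zip(item)
            .map(|(p, i)| alpha * i + (1.0 - alpha) * p)
            .collect(),
    )
}

/// Interaction weight after `elapsed_secs`: w * 0.5^(elapsed / half_life).
pub fn decayed_weight(weight: f64, elapsed_secs: f64, half_life_secs: f64) -> f64 {
    weight * 0.5f64.powf(elapsed_secs / half_life_secs)
}

/// Dot product; equals cosine similarity when both inputs are unit vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

With the default alpha of 0.1, each like moves the preference vector a tenth of the way toward the item embedding before re-normalization. The vector stays unit length, so it can be passed straight to ANN search as a query.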

Phase 3: Personalized Ranking Profiles (m3p3)

Delivers: Four personalized ranking profiles (for_you, following, related, notification) that incorporate user context into scoring, plus cold-start handling for new users and new items. The FOR USER @user_id clause is parsed and resolved into a UserContext that loads the user's preference vector, interaction weights, followed creators, and blocked state. The SIMILAR TO @item_id clause is parsed for the related profile. The profile executor uses this context to score candidates with personalization factors.

Acceptance Criteria:

  • FOR USER @user_id clause parsed by the query parser and resolved into UserContext
  • SIMILAR TO @item_id clause parsed for related-content retrieval
  • UserContext loaded from UserStateIndex, InteractionWeightLedger, preference vector
  • for_you profile: ANN retrieval using user preference vector, scoring = preference_match * engagement_velocity * recency_decay * social_proof, gates on completion_rate, penalizes skip count, 10% exploration budget
  • following profile: candidates restricted to followed creators' items (via FollowsBitmap), sorted by created_at DESC
  • related profile: ANN retrieval using source item embedding, re-ranked by user preference match, seen items excluded
  • notification profile: candidates from followed creators' recent items, scored by relationship_strength * item_quality
  • Cold-start users (no preference vector): fall back to population-level signals (trending/quality)
  • Cold-start items (no signals): exploration window -- appear in ~2% of for_you feeds
  • Exploration budget: ~5 of 50 for_you results from unfollowed creators to prevent filter bubbles
  • ProfileExecutor extended with score_with_user_context() method
  • for_you, following, related, notification added to ProfileRegistry as builtins

Task Breakdown:

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | FOR USER Query Context | UserContext loader, query parser extensions for FOR USER and SIMILAR TO, planner integration | M |
| 02 | Personalized Profiles | for_you, following, related, notification profile implementations, executor extensions | L |
| 03 | Cold Start and Exploration | Cold-start user fallback, cold-start item injection, exploration budget enforcement | M |

Depends On: m3p2 (feedback loop: preference vectors populated, interaction weights updated, user-state bitmaps maintained), m2p3 (ranking profile engine, ProfileExecutor), m2p5 (query parser, RETRIEVE executor), m2p1 (vector index for ANN retrieval with user preference vector) Complexity: L (3 sequential tasks: 01 -> 02 -> 03) Research Reference: docs/research/ann_for_tidaldb.md (ANN retrieval with user preference vector as query), VISION.md (ranking profiles, personalization factors, cold start), USE_CASES.md (UC-01 For You, UC-04 Following, UC-05 Related, UC-07 Notifications)
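A minimal sketch of the for_you scoring product and the exploration budget (about 5 of 50 slots at 10%). The function names are hypothetical. The sketch also assumes the exploration candidates (for example, items from unfollowed creators) are already ranked and disjoint from the exploit list. Exploration items are spread evenly through the page rather than appended at the end; that placement is a design assumption, not something the criteria specify. The ~2% cold-start item injection is not modeled.

```rust
/// Personalization factors for one candidate (names follow the for_you formula).
pub struct Factors {
    pub preference_match: f64,
    pub engagement_velocity: f64,
    pub recency_decay: f64,
    pub social_proof: f64,
}

/// for_you score = preference_match * engagement_velocity * recency_decay * social_proof
pub fn for_you_score(f: &Factors) -> f64 {
    f.preference_match * f.engagement_velocity * f.recency_decay * f.social_proof
}

/// Slots reserved for exploration: the budget fraction of the page, rounded.
pub fn exploration_slots(limit: usize, budget: f64) -> usize {
    (limit as f64 * budget).round() as usize
}

/// Blend the exploit list (ranked by for_you_score) with exploration candidates.
pub fn blend(exploit: &[u64], explore: &[u64], limit: usize, budget: f64) -> Vec<u64> {
    let n_explore = exploration_slots(limit, budget).min(explore.len()).min(limit);
    let n_exploit = (limit - n_explore).min(exploit.len());
    let total = n_exploit + n_explore;
    let stride = if n_explore == 0 { usize::MAX } else { total / n_explore };
    let mut ex = exploit[..n_exploit].iter();
    let mut xp = explore[..n_explore].iter();
    let mut out = Vec::with_capacity(total);
    for pos in 0..total {
        // Every `stride`-th slot goes to exploration; fall back if one side runs dry.
        let next = if (pos + 1) % stride == 0 {
            xp.next().or_else(|| ex.next())
        } else {
            ex.next().or_else(|| xp.next())
        };
        if let Some(&id) = next {
            out.push(id);
        }
    }
    out
}
```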

Phase 4: User State Filters + M3 UAT Integration Test (m3p4)

Delivers: Composable user-state filters (unseen, unblocked, saved, liked, in_progress) integrated with the existing FilterExpr/FilterResult system from m2p2, plus the end-to-end M3 UAT integration test that proves the full "For You" query works. User-state filters require the FOR USER clause (from m3p3) to resolve user context and are evaluated alongside metadata filters during the RETRIEVE pipeline.

Acceptance Criteria:

  • FILTER unseen excludes items the user has viewed (via UserSeenBitmap)
  • FILTER unblocked excludes items from blocked creators and hidden items (via UserBlockedSet)
  • FILTER saved returns only items the user has saved
  • FILTER liked returns only items the user has liked
  • FILTER in_progress returns items with partial completion signal (0.0 < completion < 0.8)
  • User-state filters compose with metadata filters: FILTER unseen, category:jazz, format:video
  • User-state filters require the FOR USER clause; using one without it fails with a LumenError::Query error and a helpful message
  • FilterExpr extended with Unseen, Unblocked, Saved, Liked, InProgress variants
  • Filter evaluation produces FilterResult::Predicate for user-state filters (not bitmap)
  • The RETRIEVE executor intersects user-state predicates with metadata filter bitmaps
  • Full M3 UAT integration test passes (all 8 UAT scenario steps verified)

Task Breakdown:

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | User State Filters | Unseen, Unblocked, Saved, Liked, InProgress filter variants, parser recognition, executor integration | M |
| 02 | M3 UAT Integration Test | End-to-end integration test covering all 8 UAT scenario steps, property tests for hard-negative invariants | L |

Depends On: m3p3 (personalized profiles, FOR USER query context parsing, cold-start handling), m3p2 (feedback loop: seen bitmaps populated, hard negatives enforced), m3p1 (user-state index, relationships), m2p2 (filter engine, FilterExpr, FilterResult) Complexity: M (2 sequential tasks: 01 -> 02) Research Reference: VISION.md (user-state filters as first-class query primitives), USE_CASES.md (Appendix A: user state filters), API.md (FILTER clause syntax)
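The following sketch shows user-state filters evaluated as predicates over the candidates that survived the metadata bitmap, along with the FOR USER requirement. `UserState`, `UserFilter`, `apply`, and `require_user` are hypothetical names. Hash sets stand in for UserSeenBitmap and UserBlockedSet, and a plain `String` error stands in for LumenError::Query. Saved and Liked are omitted because they follow the same pattern as Unseen.

```rust
use std::collections::{HashMap, HashSet};

/// Per-user state a FOR USER clause would load (hypothetical in-memory form).
pub struct UserState {
    pub seen: HashSet<u64>,
    pub hidden_items: HashSet<u64>,
    pub blocked_creators: HashSet<u32>,
    pub completion: HashMap<u64, f32>, // latest completion fraction per item
}

#[derive(Clone, Copy)]
pub enum UserFilter {
    Unseen,
    Unblocked,
    InProgress,
}

impl UserState {
    pub fn matches(&self, f: UserFilter, item: u64, creator: u32) -> bool {
        match f {
            UserFilter::Unseen => !self.seen.contains(&item),
            UserFilter::Unblocked => {
                !self.hidden_items.contains(&item) && !self.blocked_creators.contains(&creator)
            }
            // Partial completion: 0.0 < completion < 0.8
            UserFilter::InProgress => self.completion.get(&item).map_or(false, |&c| c > 0.0 && c < 0.8),
        }
    }
}

/// Intersect user-state predicates with the (item, creator) pairs from the metadata bitmap.
pub fn apply(state: &UserState, filters: &[UserFilter], metadata_hits: &[(u64, u32)]) -> Vec<u64> {
    metadata_hits
        .iter()
        .filter(|&&(item, creator)| filters.iter().all(|&f| state.matches(f, item, creator)))
        .map(|&(item, _)| item)
        .collect()
}

/// User-state filters are rejected when the query has no FOR USER context.
pub fn require_user(user: Option<&UserState>) -> Result<&UserState, String> {
    user.ok_or_else(|| "user-state filters (unseen, unblocked, ...) require a FOR USER clause".to_string())
}
```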

Phase Dependency DAG

m3p1 (Users/Creators/Relationships)
    |
    v
m3p2 (Feedback Loop)      [Tasks 01 & 03 parallel within phase]
    |
    v
m3p3 (Personalized Profiles)
    |
    v
m3p4 (User State Filters + UAT)

All four phases are strictly sequential. m3p2 cannot begin without the entity and relationship foundation from m3p1. m3p3 cannot begin without the preference vectors and interaction weights from m3p2. m3p4 cannot begin without the FOR USER clause parsing and profile execution from m3p3.

Within m3p2, Tasks 01 (User Preference Vector) and 03 (Hard Negatives) can be built in parallel. Task 04 (Atomic Signal Dispatch) depends on all three preceding tasks.

Deferred to Later Milestones

  • SEARCH query with personalization -- deferred to M5; M3 proves personalized RETRIEVE works. Adding text search on top of a proven personalization layer is the correct sequence.
  • Tantivy integration -- deferred to M5; M3 uses ANN retrieval only. Full-text search requires the hybrid fusion layer (RRF) which belongs in M5.
  • People/creator search (UC-10) -- deferred to M5; requires Tantivy indexing of creator entities and "creators like X" similarity search.
  • Social graph traversal for trending ("trending among my follows") -- deferred to M6; requires graph query capabilities beyond the simple follows filter delivered in m3p1. M3 uses population-level signals as a proxy for social proof.
  • Collaborative filtering -- deferred to M6; M3's related profile uses ANN similarity + user preference re-ranking. Full matrix-factorization-style CF (co-engagement signals, "users who liked X also liked Y") adds a new data structure and compute model.
  • User-created collections/boards (UC-09.4) -- deferred to M6; collections are a new entity type with their own ranking surface. M3 delivers the simpler user-state filters (saved, liked, in_progress).
  • Live content status tracking (UC-12) -- deferred to M6; requires real-time viewer count signals and schedule awareness.
  • Notification frequency capping -- deferred to M6; M3's notification profile ranks by recency * relationship_strength without per-creator or per-user caps.
  • Adaptive preference learning rate -- deferred to M6; M3 uses constant alpha (0.1). Adaptive alpha that decays with update count is a refinement that requires tracking per-user update history.
  • Reverse relationship index (creator -> followers) -- deferred to M6; M3 only needs forward traversal (user -> creators they follow). Reverse traversal enables social graph queries.

Integration Test

#[test]
fn milestone_3_uat() {
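    // Hypothetical fixture: the db plus user 42's seen item IDs and followed creator IDs
    let (db, user_42_seen, followed_creators) = open_with_users_and_relationships();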

    // User 42 likes jazz, follows creators 1-10, blocked creator 77
    let feed = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();
    assert_eq!(feed.len(), 50);
    assert!(feed.iter().all(|r| !user_42_seen.contains(&r.id)));
    assert!(feed.iter().all(|r| r.creator_id != CreatorId(77)));
    assert!(creator_counts(&feed).values().all(|&c| c <= 2));

    // Following feed -- only followed creators, chronological
    let following = db.retrieve(
        "RETRIEVE items FOR USER @42 FILTER relationship:follows \
         USING PROFILE following LIMIT 50"
    ).unwrap();
    assert!(following.iter().all(|r| followed_creators.contains(&r.creator_id)));
    assert!(following.windows(2).all(|w| w[0].created_at >= w[1].created_at));

    // Related content -- similar to item_abc, personalized
    let related = db.retrieve(
        "RETRIEVE items SIMILAR TO @item_abc FOR USER @42 \
         USING PROFILE related FILTER unseen LIMIT 10"
    ).unwrap();
    assert!(related.iter().all(|r| !user_42_seen.contains(&r.id)));

    // Like an item, verify preference shift
    db.signal("like", EntityId(500), UserId(42), 1.0, now()).unwrap();
    let feed2 = db.retrieve(same_for_you_query()).unwrap();
    // Items topically similar to item 500 should rank higher
    let topic_500 = db.read_item(EntityId(500)).unwrap().category;
    let topic_match_before = feed.iter().filter(|r| r.category == topic_500).count();
    let topic_match_after = feed2.iter().filter(|r| r.category == topic_500).count();
    assert!(topic_match_after >= topic_match_before);

    // Hide and block, verify exclusion
    db.signal("hide", EntityId(999), UserId(42), 1.0, now()).unwrap();
    db.signal("block", UserId(42), CreatorId(77), 1.0, now()).unwrap();
    let feed3 = db.retrieve(same_for_you_query()).unwrap();
    assert!(feed3.iter().all(|r| r.id != EntityId(999)));
    assert!(feed3.iter().all(|r| r.creator_id != CreatorId(77)));

    // Verify cold-start user gets population-level results
    let cold_feed = db.retrieve(
        "RETRIEVE items FOR USER @new_user USING PROFILE for_you \
         FILTER unseen, unblocked LIMIT 50"
    ).unwrap();
    assert_eq!(cold_feed.len(), 50); // falls back to trending/quality

    // Verify crash recovery preserves hard negatives
    db.shutdown().unwrap();
    let db2 = TidalDb::reopen(same_config()).unwrap();
    let feed4 = db2.retrieve(same_for_you_query()).unwrap();
    assert!(feed4.iter().all(|r| r.id != EntityId(999)));
    assert!(feed4.iter().all(|r| r.creator_id != CreatorId(77)));
}

Done When

The full "For You" query works: RETRIEVE items FOR USER @user_id USING PROFILE for_you FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50 returns personalized, diversity-constrained results that reflect the user's engagement history, exclude hidden items and blocked creators, include an exploration budget, handle cold-start users and items, and update in response to new signal events within 100ms. The following, related, and notification profiles also work correctly. Hard negatives survive crash and restart. All 8 UAT scenario steps pass.


Milestone 4: Agent Memory -- "Agents own the personalization substrate"

Milestone Thesis

M3 proved the feedback loop closes inside the database for direct user interactions. M4 proves that agents -- the dominant interaction mediator -- can create scoped sessions, write structured feedback signals with aggressive decay, enforce declarative policy on the write path, and query live session context as part of ranking, all within the same embeddable runtime. A developer wiring an LLM agent to tidalDB gets instant-on session memory without standing up Redis, a feature store, or a policy middleware.

Enables

  • Agent-mediated personalization -- agents ground LLM responses by reading the user's preference state plus the session's accumulated reward and preference hints, then write structured feedback that immediately shapes the next ranking pass.
  • RLHF-style reward loops -- reward signals with minute-scale decay let agents record how well a recommendation served the user; the next RETRIEVE incorporates reward velocity into scoring.
  • Conversational memory -- multi-turn tool usage and preference hints are short-lived signals scoped to a session; they influence ranking for the session's lifetime and are archived on close.
  • Policy-safe agent integration -- the schema declares which signal types an agent may write per session; the database enforces this, not application middleware. Disallowed writes are rejected with audit trail.
  • Partial UC-01 enhancement -- "For You" queries that incorporate session context (e.g., "more jazz today") produce results shaped by both long-lived user preferences and ephemeral session preferences.
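The write-path policy enforcement described above (declared as planner_policy in the UAT scenario below) can be sketched as a pure check plus an append-only audit log. `AgentPolicy`, `Decision`, and `AuditLog` are hypothetical in-memory stand-ins, not the schema types. The allow-list check runs first so that tool_use is rejected with "signal type not in allowed list", the reason the UAT expects. The deny list then acts as a guard for any type that appears in both lists.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// Hypothetical in-memory form of an AgentPolicy schema entry.
pub struct AgentPolicy {
    pub allowed_signals: HashSet<String>,
    pub denied_signals: HashSet<String>,
    pub max_session_duration: Duration,
    pub max_signals_per_session: usize,
}

#[derive(Debug, PartialEq)]
pub enum Decision {
    Accept,
    Reject(String),
}

impl AgentPolicy {
    /// Evaluate one session signal write before it touches any state.
    pub fn check(&self, signal: &str, session_age: Duration, signals_so_far: usize) -> Decision {
        if !self.allowed_signals.contains(signal) {
            return Decision::Reject("signal type not in allowed list".into());
        }
        // The deny list wins if a signal type appears in both lists.
        if self.denied_signals.contains(signal) {
            return Decision::Reject("signal type explicitly denied".into());
        }
        if session_age > self.max_session_duration {
            return Decision::Reject("session exceeded max_session_duration".into());
        }
        if signals_so_far >= self.max_signals_per_session {
            return Decision::Reject("session reached max_signals_per_session".into());
        }
        Decision::Accept
    }
}

/// Append-only audit trail of accepted and rejected writes.
#[derive(Default)]
pub struct AuditLog {
    pub entries: Vec<(String, Decision)>,
}

impl AuditLog {
    /// Record a decision; returns true if the write was accepted.
    pub fn record(&mut self, signal: &str, decision: Decision) -> bool {
        let accepted = decision == Decision::Accept;
        self.entries.push((signal.to_string(), decision));
        accepted
    }

    /// (accepted, rejected) counts.
    pub fn counts(&self) -> (usize, usize) {
        let accepted = self.entries.iter().filter(|(_, d)| *d == Decision::Accept).count();
        (accepted, self.entries.len() - accepted)
    }
}
```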

UAT Scenario

Given:
  A tidalDB instance with:
    - Schema defining session signal types:
      * "preference_hint" with linear decay (lifetime=30m), target=Item
      * "reward" with exponential decay (half_life=10m), windows=[5m, 15m], velocity=true
      * "tool_use" with linear decay (lifetime=1h), target=Item
    - An AgentPolicy "planner_policy" in schema:
        allowed_signals: [preference_hint, reward]
        denied_signals: [tool_use]
        max_session_duration: 2h
        max_signals_per_session: 1000
    - 100 items with embeddings and metadata (category, format, creator_id)
    - 10,000 signal events establishing item signal state
    - User @42 with preference vector and engagement history
    - Profiles: for_you (updated to accept optional SessionContext)

When:
  1. Agent starts a session:
     let session = db.start_session(user_id: 42, agent_id: "planner",
         policy: "planner_policy", metadata: {"tool": "planner"})?;
     // Returns SessionHandle with SessionId

  2. Agent writes a preference_hint signal:
     db.session_signal(&session, "preference_hint", EntityId(0), 1.0,
         Timestamp::now(), Some("more jazz today".into()))?;
     // Accepted: preference_hint is in allowed_signals

  3. Agent writes a reward signal after delivering an answer:
     db.session_signal(&session, "reward", EntityId(42), 0.8,
         Timestamp::now(), None)?;
     // Accepted: reward is in allowed_signals

  4. Agent queries with session context:
     let query = RetrieveBuilder::new(EntityKind::Item, ProfileRef::new("for_you"))
         .for_user(42)
         .for_session(session.id())
         .limit(10)
         .build()?;
     let results = db.retrieve(&query)?;
     // Returns ranked items with session_snapshot attached

  5. Agent reads session snapshot:
     let snapshot = db.session_snapshot(session.id())?;
     // Returns: signal counts, reward velocity, duration, metadata

  6. Agent attempts a disallowed write:
     let err = db.session_signal(&session, "tool_use", EntityId(0), 1.0,
         Timestamp::now(), None);
     // Returns Err(LumenError::PolicyViolation { signal_type: "tool_use",
     //   policy_name: "planner_policy", reason: "signal type not in allowed list" })

  7. Agent reads audit log:
     let audit = db.session_audit(session.id())?;
     // Contains: 2 accepted writes, 1 rejected write with reason

  8. Agent closes the session:
     let summary = db.close_session(session)?;
     // Returns SessionSummary: duration, signal_counts, rejections, archived

  9. After closure, query the archived snapshot (session_id was captured with session.id() before close_session consumed the handle):
     let archived = db.session_snapshot(session_id)?;
     // Returns the frozen final snapshot (signals no longer decay)

  10. Verify session isolation -- a second session for the same user
      does not see session 1's signals:
      let session2 = db.start_session(user_id: 42, agent_id: "planner",
          policy: "planner_policy", metadata: {})?;
      let snap2 = db.session_snapshot(session2.id())?;
      // snap2 has zero signals -- session 1's data does not leak

Then:
  - Step 1: start_session returns SessionHandle; session appears in db.active_sessions()
  - Step 2: preference_hint recorded; session signal count = 1
  - Step 3: reward recorded; session signal count = 2; reward velocity > 0
  - Step 4: Results shaped by session context -- items matching "jazz" preference
    hint rank higher than without session context; results include a
    session_snapshot field with reward_velocity and hint summary
  - Step 5: Snapshot contains { signals_written: 2, signals_rejected: 0,
    reward_velocity_5m: >0.0, duration_ms: <5000, metadata: {"tool":"planner"} }
  - Step 6: Error returned with LumenError::PolicyViolation; write not persisted
  - Step 7: Audit log has 3 entries (2 accepted, 1 rejected with reason)
  - Step 8: Session marked closed; summary.duration_ms > 0; summary.signals_written == 2;
    summary.rejections == 1
  - Step 9: Archived snapshot readable; signal values frozen at close time (no further decay)
  - Step 10: Session isolation proven -- zero signal leakage between sessions
  - Performance: session_signal write < 200 microseconds (including WAL + policy check);
    session_snapshot read < 50 microseconds; RETRIEVE with session context adds < 5ms
    overhead vs without

Phases

Phase 1: Session Schema and Lifecycle (m4p1)

Delivers: SessionId, AgentId, AgentPolicy, and SessionHandle types in the schema and entities modules. Schema-level session_policy() for declaring per-agent allowed/denied signal lists, duration limits, and signal count caps. Session lifecycle APIs: start_session, close_session, active_sessions. WAL entries tagged with session_id for crash recovery of active sessions. Closed sessions archived to storage as frozen snapshots.
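The move-only handle can be sketched with a plain Rust type that implements neither Clone nor Copy, so a `close_session` that takes it by value turns use-after-close into a compile error. This is a minimal sketch, not tidalDB's implementation: `SessionRegistry` is a hypothetical name, a `Mutex<HashMap>` stands in for the DashMap, and WAL logging, metadata, and archiving are omitted.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Mutex;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SessionId(pub u64);

/// Move-only: no Clone/Copy, so passing it to `close_session` ends the caller's ownership.
#[derive(Debug)]
pub struct SessionHandle {
    id: SessionId,
    user_id: u64,
    closed: AtomicBool, // runtime defense-in-depth flag from the spec
}

impl SessionHandle {
    pub fn id(&self) -> SessionId {
        self.id
    }
}

#[derive(Debug, PartialEq)]
pub struct SessionSummary {
    pub id: SessionId,
    pub user_id: u64,
    pub signals_written: u32,
}

/// Hypothetical registry; a Mutex<HashMap> stands in for the DashMap.
#[derive(Default)]
pub struct SessionRegistry {
    next_id: AtomicU64,
    active: Mutex<HashMap<SessionId, u32>>, // SessionId -> signals written
}

impl SessionRegistry {
    pub fn start_session(&self, user_id: u64) -> SessionHandle {
        let id = SessionId(self.next_id.fetch_add(1, Ordering::Relaxed) + 1);
        self.active.lock().unwrap().insert(id, 0);
        SessionHandle { id, user_id, closed: AtomicBool::new(false) }
    }

    pub fn record_signal(&self, handle: &SessionHandle) -> Result<(), String> {
        // The borrow checker already rules out a handle moved into close_session;
        // this check mirrors the spec's runtime closed flag.
        if handle.closed.load(Ordering::Acquire) {
            return Err("session closed".into());
        }
        let mut active = self.active.lock().unwrap();
        let count = active.get_mut(&handle.id).ok_or("session not found")?;
        *count += 1;
        Ok(())
    }

    /// Takes the handle by value: after this call the caller can no longer use it.
    pub fn close_session(&self, handle: SessionHandle) -> Result<SessionSummary, String> {
        handle.closed.store(true, Ordering::Release);
        let signals_written = self
            .active
            .lock()
            .unwrap()
            .remove(&handle.id)
            .ok_or("session not found")?;
        Ok(SessionSummary { id: handle.id, user_id: handle.user_id, signals_written })
    }

    pub fn active_count(&self) -> usize {
        self.active.lock().unwrap().len()
    }
}
```

Adding `reg.record_signal(&h)` after `reg.close_session(h)` is rejected by the compiler with E0382 (borrow of moved value), which is the compile-time half of the guarantee; the `closed` flag is the runtime half.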

Acceptance Criteria:

  • SessionId is a u64 newtype with Display, Hash, Eq, Ord, monotonically assigned via AtomicU64 counter
  • AgentId is a String newtype (max 64 chars, validated at construction: [a-z0-9_-]+)
  • AgentPolicy struct declared in schema: allowed_signals: Vec<String>, denied_signals: Vec<String>, max_session_duration: Duration, max_signals_per_session: u32; validated at schema build time (no signal name in both allowed and denied; all signal names must exist in schema)
  • SessionHandle is a move-only type containing SessionId, user_id: u64, agent_id: AgentId, policy_name: String, start timestamp, and a closed: AtomicBool flag; SessionHandle is Send + Sync
  • SchemaBuilder::session_policy(name, AgentPolicy) registers policies at schema build time; duplicate names rejected with SchemaError
  • db.start_session(user_id, agent_id, policy_name, metadata) -> Result<SessionHandle> creates a new session: validates policy exists, assigns SessionId, stores session metadata in a DashMap<SessionId, SessionState>, logs session-start event to WAL
  • db.close_session(handle) -> Result<SessionSummary> takes ownership of SessionHandle (move semantics prevent use-after-close), freezes signal state, computes summary (duration, signal counts, rejection count), archives to storage under Tag::Session key prefix, logs session-close event to WAL
  • db.active_sessions() -> Vec<SessionInfo> returns list of open sessions with id, user_id, agent_id, start_time, signal_count
  • Session state survives crash: on WAL replay, session-start events without matching session-close events are restored as active sessions; session-close events mark sessions as archived
  • SessionState contains: DashMap<SignalTypeId, SessionSignalState> for per-signal-type accumulators within the session
  • Closed SessionHandle cannot be used for further writes (compile-time enforcement via move semantics; runtime check via closed flag as defense-in-depth)
  • Session metadata (HashMap<String, String>) persisted to storage and retrievable after close
  • max_session_duration enforced: session_signal on a session that has exceeded its duration returns LumenError::SessionExpired
  • Tag::Session (0x07) added to the key encoding enum for session archive storage
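The SessionId and AgentId criteria translate into small newtypes. A sketch under stated assumptions: `SessionIdAllocator` and `AgentIdError` are hypothetical names, and the Display output (the bare number) is a guess rather than something the roadmap specifies.

```rust
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};

/// u64 newtype with Display, Hash, Eq, Ord.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct SessionId(u64);

impl fmt::Display for SessionId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical allocator handing out increasing ids from an AtomicU64.
pub struct SessionIdAllocator {
    next: AtomicU64,
}

impl SessionIdAllocator {
    pub fn new(first: u64) -> Self {
        Self { next: AtomicU64::new(first) }
    }

    pub fn allocate(&self) -> SessionId {
        // fetch_add is atomic, so concurrent callers never receive the same id.
        SessionId(self.next.fetch_add(1, Ordering::Relaxed))
    }
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct AgentId(String);

#[derive(Debug, PartialEq, Eq)]
pub enum AgentIdError {
    Empty,
    TooLong(usize),
    InvalidChar(char),
}

impl AgentId {
    pub const MAX_LEN: usize = 64;

    /// Validates against [a-z0-9_-]+ with a 64-character limit.
    pub fn new(raw: &str) -> Result<Self, AgentIdError> {
        if raw.is_empty() {
            return Err(AgentIdError::Empty);
        }
        if raw.len() > Self::MAX_LEN {
            return Err(AgentIdError::TooLong(raw.len()));
        }
        let allowed = |c: char| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-';
        if let Some(bad) = raw.chars().find(|&c| !allowed(c)) {
            return Err(AgentIdError::InvalidChar(bad));
        }
        Ok(Self(raw.to_string()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```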

Task Breakdown:

# Task Delivers Complexity
01 Session Types SessionId, AgentId, AgentPolicy, SessionHandle, SessionState, SessionInfo, SessionSummary types with validation, Display, Hash, Eq M
02 Schema Integration SchemaBuilder::session_policy(), validation (signal name cross-check, no duplicates), Schema::policy(name) -> Option<&AgentPolicy> S
03 Session Lifecycle start_session, close_session, active_sessions, SessionState DashMap, move-only handle, duration enforcement, archive to storage L
04 WAL Session Events WalCommand::SessionStart and WalCommand::SessionClose variants, WAL replay restores active sessions, closed sessions restored as archived M
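Task 02's build-time checks amount to a few set lookups. A sketch assuming schema signal names are available as a `HashSet<&str>`; `validate_policy`, `register_policy`, and `PolicyValidationError` are hypothetical names standing in for SchemaBuilder::session_policy() and SchemaError.

```rust
use std::collections::{HashMap, HashSet};
use std::time::Duration;

#[derive(Debug, Clone)]
pub struct AgentPolicy {
    pub allowed_signals: Vec<String>,
    pub denied_signals: Vec<String>,
    pub max_session_duration: Duration,
    pub max_signals_per_session: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyValidationError {
    DuplicateName(String),
    UnknownSignal(String),
    BothAllowedAndDenied(String),
}

/// Every referenced signal must exist in the schema, and none may be both allowed and denied.
pub fn validate_policy(
    policy: &AgentPolicy,
    schema_signals: &HashSet<&str>,
) -> Result<(), PolicyValidationError> {
    for name in policy.allowed_signals.iter().chain(&policy.denied_signals) {
        if !schema_signals.contains(name.as_str()) {
            return Err(PolicyValidationError::UnknownSignal(name.clone()));
        }
    }
    let denied: HashSet<&str> = policy.denied_signals.iter().map(String::as_str).collect();
    if let Some(both) = policy.allowed_signals.iter().find(|s| denied.contains(s.as_str())) {
        return Err(PolicyValidationError::BothAllowedAndDenied(both.clone()));
    }
    Ok(())
}

/// Hypothetical registration step: rejects duplicate names, then validates.
pub fn register_policy(
    policies: &mut HashMap<String, AgentPolicy>,
    name: &str,
    policy: AgentPolicy,
    schema_signals: &HashSet<&str>,
) -> Result<(), PolicyValidationError> {
    if policies.contains_key(name) {
        return Err(PolicyValidationError::DuplicateName(name.to_string()));
    }
    validate_policy(&policy, schema_signals)?;
    policies.insert(name.to_string(), policy);
    Ok(())
}
```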

Depends On: m1p1 (type system), m1p2 (WAL), m1p3 (storage, key encoding, Tag enum), m3p1 (user entities) Complexity: L Research Reference: VISION.md (Sessions / Agent Context), thoughts.md Part V.5 (quarantine-first durability), docs/research/tidaldb_signal_ledger.md (running-score formula reuse for session signals)
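Crash recovery of sessions reduces to folding the WAL in order: a SessionStart without a later SessionClose leaves the session active, and a SessionClose moves it to archived. The sketch uses simplified, hypothetical `WalCommand` payloads (ids plus a bare signal marker) and ignores signals for sessions that are not active; a real replay would also rebuild SessionSignalState and metadata.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified stand-ins for the WalCommand variants named in task 04.
#[derive(Debug, Clone)]
pub enum WalCommand {
    SessionStart { session_id: u64, user_id: u64 },
    SessionSignal { session_id: u64 },
    SessionClose { session_id: u64 },
}

#[derive(Debug, Default, PartialEq)]
pub struct RecoveredSessions {
    /// session_id -> (user_id, signals replayed so far)
    pub active: BTreeMap<u64, (u64, u32)>,
    pub archived: BTreeSet<u64>,
}

/// Folds WAL entries, in log order, into active and archived session sets.
pub fn replay(log: &[WalCommand]) -> RecoveredSessions {
    let mut out = RecoveredSessions::default();
    for cmd in log {
        match *cmd {
            WalCommand::SessionStart { session_id, user_id } => {
                out.active.insert(session_id, (user_id, 0));
            }
            WalCommand::SessionSignal { session_id } => {
                // Signals for unknown or already-closed sessions are ignored here.
                if let Some((_, count)) = out.active.get_mut(&session_id) {
                    *count += 1;
                }
            }
            WalCommand::SessionClose { session_id } => {
                out.active.remove(&session_id);
                out.archived.insert(session_id);
            }
        }
    }
    out
}
```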

Phase 2: Session Signal Engine (m4p2)

Delivers: session_signal() API that writes session-scoped signal events with aggressive decay. Session signals share the existing SignalLedger running-score infrastructure but are keyed by (SessionId, SignalTypeId) instead of (EntityId, SignalTypeId). Preference hints are stored as typed annotations on session signal entries. Session-scoped windowed counts and velocity available via session_snapshot().

Acceptance Criteria:

  • db.session_signal(&SessionHandle, signal_type, entity_id, weight, timestamp, Option<annotation>) -> Result<()> writes a session-scoped signal: validates session is open, updates SessionSignalState running decay score, updates session windowed counters, increments session signal count, logs to WAL with session_id tag
  • SessionSignalState uses the same HotSignalState running-score formula (S(t) = S(t_prev) * exp(-lambda * dt) + w) -- reuse, not rewrite
  • Session windowed counters use BucketedCounter with minute-level granularity (appropriate for session timescales of minutes to hours)
  • db.session_snapshot(session_id) -> Result<SessionSnapshot> returns: signal type -> (decay_score, windowed_counts, velocity), total signals written, total rejections, duration, metadata, annotations (preference hints)
  • Preference hint annotations stored as Vec<(Timestamp, String)> on the session state; capped at 100 per session to bound memory
  • Session signals do NOT update the global item signal ledger -- they are session-scoped only (isolation)
  • Session signals do NOT update user preference vectors or interaction weights -- session influence is read-time only (via Phase 4)
  • For active sessions, decay scores reflect current wall-clock time (lazy decay on read, same as HotSignalState)
  • For archived sessions, signal values are frozen at close time (no further decay applied on read)
  • WAL replay of session signals restores SessionSignalState accumulators correctly (property test: replay produces identical state to uninterrupted execution for 1000 random session signal sequences)
  • session_signal latency < 200 microseconds including WAL write (benchmarked)
  • session_snapshot read latency < 50 microseconds (benchmarked)
  • 50,000 session signals per second throughput (benchmarked)
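The running-score formula and the active/archived read rules fit in one small type. This is a simplified stand-in for the HotSignalState reuse described above, assuming timestamps in seconds as `f64` and exponential decay with lambda = ln 2 / half_life; `DecayingScore` is a hypothetical name.

```rust
/// Running exponentially decayed score with lazy decay and freeze-on-close.
#[derive(Debug, Clone)]
pub struct DecayingScore {
    lambda: f64,            // decay rate per second: ln(2) / half_life
    score: f64,             // S(t_prev)
    last_update: f64,       // t_prev in seconds
    frozen_at: Option<f64>, // close time, once the session is archived
}

impl DecayingScore {
    pub fn with_half_life(half_life_secs: f64) -> Self {
        Self {
            lambda: std::f64::consts::LN_2 / half_life_secs,
            score: 0.0,
            last_update: 0.0,
            frozen_at: None,
        }
    }

    /// Write path: S(t) = S(t_prev) * exp(-lambda * dt) + w.
    pub fn add(&mut self, weight: f64, now: f64) {
        assert!(self.frozen_at.is_none(), "archived sessions are read-only");
        self.score = self.value_at(now) + weight;
        self.last_update = now;
    }

    /// Read path: decays lazily to `now`, or to the close time once frozen.
    pub fn value_at(&self, now: f64) -> f64 {
        let t = self.frozen_at.unwrap_or(now);
        let dt = (t - self.last_update).max(0.0);
        self.score * (-self.lambda * dt).exp()
    }

    /// After this, reads return the value at `closed_at` regardless of read time.
    pub fn freeze(&mut self, closed_at: f64) {
        self.frozen_at = Some(closed_at);
    }
}
```

With the UAT's 10-minute reward half-life, a 0.8 reward reads as 0.4 ten minutes later, and once frozen the value no longer changes with the read time.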

Task Breakdown:

# Task Delivers Complexity
01 SessionSignalState Running decay score, windowed counters, annotation storage, freeze-on-close semantics, reusing HotSignalState internals M
02 session_signal API Write path: validation, WAL event, state update, signal count tracking, annotation capture M
03 session_snapshot API Read path: snapshot assembly, decay-correct reads for active sessions, frozen reads for archived, preference hint list S
04 WAL Integration & Replay WalCommand::SessionSignal variant, replay logic, property test for replay correctness M
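Minute-level windowed counting can be sketched as a ring of per-minute buckets. This is an illustration, not the actual BucketedCounter: `MinuteCounter` is hypothetical, timestamps are seconds, and an event older than the oldest retained minute is dropped rather than counted.

```rust
/// Ring of per-minute counts covering the last `capacity` minutes.
pub struct MinuteCounter {
    counts: Vec<u64>,
    minute_of_slot: Vec<Option<u64>>, // absolute minute each slot currently holds
}

impl MinuteCounter {
    pub fn new(capacity_minutes: usize) -> Self {
        assert!(capacity_minutes > 0, "capacity must be at least one minute");
        Self { counts: vec![0; capacity_minutes], minute_of_slot: vec![None; capacity_minutes] }
    }

    pub fn record(&mut self, ts_secs: u64, n: u64) {
        let minute = ts_secs / 60;
        let slot = (minute % self.counts.len() as u64) as usize;
        let held = self.minute_of_slot[slot];
        match held {
            Some(m) if m == minute => {}
            // Slot already holds a newer minute: the event is older than the ring retains.
            Some(m) if m > minute => return,
            // Empty slot or stale minute: recycle it for this minute.
            _ => {
                self.counts[slot] = 0;
                self.minute_of_slot[slot] = Some(minute);
            }
        }
        self.counts[slot] += n;
    }

    /// Events in the `window_minutes` minutes ending at `now_secs`, current minute included.
    pub fn count_in_window(&self, now_secs: u64, window_minutes: u64) -> u64 {
        let now_minute = now_secs / 60;
        let oldest = now_minute.saturating_sub(window_minutes.saturating_sub(1));
        self.minute_of_slot
            .iter()
            .zip(&self.counts)
            .filter_map(|(slot_minute, count)| match slot_minute {
                Some(m) if (oldest..=now_minute).contains(m) => Some(*count),
                _ => None,
            })
            .sum()
    }
}
```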

Depends On: m4p1 (session types, lifecycle, WAL events), m1p4 (signal ledger infrastructure -- HotSignalState, BucketedCounter) Complexity: L Research Reference: docs/research/tidaldb_signal_ledger.md (running-score formula, BucketedCounter, EntityState struct), VISION.md (session signals with aggressive decay)

Phase 3: Policy Enforcement and Audit (m4p3)

Delivers: Declarative policy enforcement on the session signal write path. Policies declared in schema (m4p1) are enforced at write time: signal type allow/deny lists, per-session signal count caps, and session duration limits. Every write attempt (accepted or rejected) is recorded in a per-session audit log. Rejected writes return structured LumenError::PolicyViolation errors with the policy name, signal type, and human-readable reason.

Acceptance Criteria:

  • session_signal() checks the session's policy before writing: if signal type is in denied_signals, or is not in allowed_signals (when allowed_signals is non-empty), write is rejected with LumenError::PolicyViolation
  • LumenError::PolicyViolation variant added: contains signal_type: String, policy_name: String, reason: String
  • Per-session signal count cap enforced: when signals_written >= max_signals_per_session, further writes return LumenError::PolicyViolation with reason "session signal limit exceeded (N/max)"
  • Session duration limit enforced: when now - session_start > max_session_duration, further writes return LumenError::SessionExpired
  • SessionAuditLog stored per session: Vec<AuditEntry> where AuditEntry = { timestamp, signal_type, outcome: Accepted | Rejected(reason) }
  • db.session_audit(session_id) -> Result<Vec<AuditEntry>> returns the audit log for a session (active or archived)
  • Audit log capped at 10,000 entries per session to bound memory; oldest entries evicted with a "truncated" marker
  • Audit log persisted with session archive on close (retrievable after close)
  • Policy evaluation adds < 1 microsecond per signal write (benchmarked -- it is a HashMap lookup, not a hot path concern)
  • Property test: for any sequence of allowed and denied signal writes, the audit log exactly matches the write outcomes and no denied signal modifies session state
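The evaluator in task 01 is a pure function of the policy, the signal type, and two session counters. A sketch assuming a hypothetical `SessionCounters` view and a `WriteRejection` enum standing in for LumenError::PolicyViolation and LumenError::SessionExpired. The check order (duration, allow list, deny list, count cap) is an assumption, chosen so that the UAT's rejected `tool_use` write reports "signal type not in allowed list".

```rust
use std::time::Duration;

#[derive(Debug, Clone)]
pub struct AgentPolicy {
    pub allowed_signals: Vec<String>,
    pub denied_signals: Vec<String>,
    pub max_session_duration: Duration,
    pub max_signals_per_session: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WriteRejection {
    PolicyViolation { signal_type: String, policy_name: String, reason: String },
    SessionExpired,
}

/// Hypothetical view of the session state the evaluator needs.
pub struct SessionCounters {
    pub signals_written: u32,
    pub elapsed: Duration,
}

pub fn check(
    policy_name: &str,
    policy: &AgentPolicy,
    signal_type: &str,
    session: &SessionCounters,
) -> Result<(), WriteRejection> {
    let violation = |reason: String| WriteRejection::PolicyViolation {
        signal_type: signal_type.to_string(),
        policy_name: policy_name.to_string(),
        reason,
    };
    if session.elapsed > policy.max_session_duration {
        return Err(WriteRejection::SessionExpired);
    }
    // The allow list applies only when it is non-empty.
    if !policy.allowed_signals.is_empty() && !policy.allowed_signals.iter().any(|s| s == signal_type) {
        return Err(violation("signal type not in allowed list".to_string()));
    }
    if policy.denied_signals.iter().any(|s| s == signal_type) {
        return Err(violation("signal type in denied list".to_string()));
    }
    if session.signals_written >= policy.max_signals_per_session {
        return Err(violation(format!(
            "session signal limit exceeded ({}/{})",
            session.signals_written, policy.max_signals_per_session
        )));
    }
    Ok(())
}
```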

Task Breakdown:

# Task Delivers Complexity
01 Policy Evaluator PolicyEvaluator::check(policy, signal_type, session_state) -> Result<(), PolicyViolation>, signal allow/deny, count cap, duration check S
02 Audit Log SessionAuditLog, AuditEntry, append-on-write, cap enforcement, persist with archive S
03 Write Path Integration Wire PolicyEvaluator into session_signal(), wire audit log recording, db.session_audit() API, property tests M
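A capped audit log maps naturally onto a `VecDeque` used as a FIFO. In this sketch the "truncated" marker is a boolean flag on the log rather than an entry inside it; that representation, the `Outcome` enum name, and the method names are assumptions.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Accepted,
    Rejected(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct AuditEntry {
    pub timestamp_ms: u64,
    pub signal_type: String,
    pub outcome: Outcome,
}

pub struct SessionAuditLog {
    entries: VecDeque<AuditEntry>,
    cap: usize,
    truncated: bool, // becomes true once any entry has been evicted
}

impl SessionAuditLog {
    pub fn with_cap(cap: usize) -> Self {
        assert!(cap > 0, "audit log cap must be positive");
        Self { entries: VecDeque::with_capacity(cap), cap, truncated: false }
    }

    /// Appends an entry, evicting the oldest one when the cap is reached.
    pub fn append(&mut self, entry: AuditEntry) {
        if self.entries.len() == self.cap {
            self.entries.pop_front();
            self.truncated = true;
        }
        self.entries.push_back(entry);
    }

    pub fn is_truncated(&self) -> bool {
        self.truncated
    }

    pub fn entries(&self) -> impl Iterator<Item = &AuditEntry> + '_ {
        self.entries.iter()
    }

    /// (accepted, rejected) counts over the retained entries.
    pub fn counts(&self) -> (usize, usize) {
        let accepted = self.entries.iter().filter(|e| e.outcome == Outcome::Accepted).count();
        (accepted, self.entries.len() - accepted)
    }
}
```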

Depends On: m4p1 (session types, AgentPolicy in schema), m4p2 (session signal write path to intercept) Complexity: M Research Reference: VISION.md ("policy guards live in schema, not ad-hoc middleware", "agents can only read/write within their sessions")

Phase 4: Session-Aware Ranking and M4 UAT (m4p4)

Delivers: FOR SESSION @session_id clause in the RETRIEVE query that loads session context and blends it into ranking. Session preference hints boost items matching the hint content. Session reward velocity adjusts the scoring weight. Query results include a session_snapshot alongside ranked items. End-to-end M4 UAT integration test proving the full agent workflow: start session, write signals with policy, query with session context, verify session isolation, close and archive.

Acceptance Criteria:

  • RetrieveBuilder::for_session(session_id) added; Retrieve struct gains for_session: Option<SessionId> field
  • SessionContext struct loaded when for_session is present: contains preference hints (parsed into keyword boost hints), reward velocity, session metadata
  • for_you (and any personalized profile) accepts an optional SessionContext; the scoring formula adds a session boost: session_boost = hint_match_score * 0.3 + reward_velocity_normalized * 0.2
  • hint_match_score computed as: for each preference hint string, extract keywords; if item metadata (category, tags, title) contains any keyword, score = 1.0 per match, normalized to [0, 1]; this is a simple keyword match (semantic session hints deferred to M5)
  • reward_velocity_normalized = reward_velocity / (reward_velocity + 1.0) -- a saturating, sigmoid-shaped normalization that maps non-negative velocities onto [0, 1)
  • Session boost is additive to the existing profile score (does not replace personalization, layers on top)
  • Results struct gains optional session_snapshot: Option<SessionSnapshot> field, populated when for_session is present
  • Session isolation: FOR SESSION @S1 uses only S1's signals; S2's signals are invisible; no global state pollution
  • When for_session references a closed session, archived snapshot is used (read-only, no decay applied)
  • When for_session references a non-existent session, LumenError::Query("session not found") returned
  • RETRIEVE with session context adds < 5ms overhead vs without (benchmarked at 10K items)
  • Full M4 UAT integration test passes covering all 10 UAT scenario steps
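The boost formula can be checked in isolation. A sketch assuming hints and item fields arrive as plain strings, keywords are whitespace-separated tokens compared case-insensitively, and hint_match_score is normalized by dividing matches by the keyword count; the roadmap fixes none of these details. Clamping negative velocities to 0 is also an assumption, since x / (x + 1) is undefined at x = -1.

```rust
use std::collections::HashSet;

/// Fraction of hint keywords found as tokens in the item's text fields (case-insensitive).
/// Dividing by the keyword count keeps the score in [0, 1].
pub fn hint_match_score(hints: &[&str], item_fields: &[&str]) -> f64 {
    let keywords: Vec<String> = hints
        .iter()
        .flat_map(|h| h.split_whitespace())
        .map(str::to_lowercase)
        .collect();
    if keywords.is_empty() {
        return 0.0;
    }
    let tokens: HashSet<String> = item_fields
        .iter()
        .flat_map(|f| f.split_whitespace())
        .map(str::to_lowercase)
        .collect();
    let matches = keywords.iter().filter(|k| tokens.contains(k.as_str())).count();
    matches as f64 / keywords.len() as f64
}

/// Saturating x / (x + 1); negative velocities are clamped to 0 (an assumption).
pub fn reward_velocity_normalized(reward_velocity: f64) -> f64 {
    let v = reward_velocity.max(0.0);
    v / (v + 1.0)
}

pub fn session_boost(hint_match: f64, reward_velocity: f64) -> f64 {
    hint_match * 0.3 + reward_velocity_normalized(reward_velocity) * 0.2
}

/// The boost is additive: it layers on top of the profile score rather than replacing it.
pub fn boosted_score(profile_score: f64, hint_match: f64, reward_velocity: f64) -> f64 {
    profile_score + session_boost(hint_match, reward_velocity)
}
```

With the UAT hint "more jazz today", a jazz item matches one of three keywords and gains 1/3 * 0.3 = 0.1 from the hint term while a rock item gains nothing, so items with equal base scores are ordered jazz first.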

Task Breakdown:

# Task Delivers Complexity
01 FOR SESSION Query Context RetrieveBuilder::for_session(), Retrieve.for_session field, SessionContext loader from active/archived session state M
02 Session-Aware Scoring ProfileExecutor extension: score_with_session_context(), keyword hint matching, reward velocity normalization, additive boost M
03 Session Snapshot in Results Results.session_snapshot field, populated by executor when session context present S
04 M4 UAT Integration Test End-to-end test covering all 10 UAT steps: lifecycle, signals, policy, ranking, isolation, archive, audit L

Depends On: m4p3 (policy enforcement wired into write path), m4p2 (session signals and snapshot readable), m4p1 (session lifecycle), m3p3 (personalized profile executor, UserContext), m2p5 (RETRIEVE executor pipeline) Complexity: L Research Reference: VISION.md ("agents ground themselves by reading live session context, write structured signals with decay budgets, and immediately query those updates on the next turn"), USE_CASES.md UC-01 (For You with session overlay)

Phase Dependency DAG

m4p1 (Session Schema & Lifecycle)
    |
    v
m4p2 (Session Signal Engine)
    |
    v
m4p3 (Policy Enforcement & Audit)
    |
    v
m4p4 (Session-Aware Ranking + UAT)

All four phases are strictly sequential. m4p2 cannot begin without the session types and lifecycle from m4p1. m4p3 cannot begin without the session signal write path from m4p2. m4p4 cannot begin without policy enforcement from m4p3. Within each phase, certain tasks can parallelize (e.g., m4p2 tasks 01 and 03 overlap; m4p3 tasks 01 and 02 are independent).

Deferred to Later Milestones

  • Session forking and merging -- deferred because forking introduces DAG-shaped session graphs with merge conflict semantics; this belongs after M8 (Distributed Fabric) when the CRDT model can inform fork/merge design. Planned for M9/M10.
  • Multi-agent sessions (multiple agents sharing one session) -- deferred because shared-session policy requires capability intersection and concurrent write arbitration; M4 proves single-agent sessions first. Planned for M10.
  • Cross-session aggregation ("what did this user's agents learn across all sessions this week?") -- deferred because it requires a materialization layer rolling up closed sessions into user-level signal state. Planned for M6.
  • Semantic hint matching (preference hints interpreted via embedding similarity) -- deferred because it requires Tantivy integration (M5) for proper text analysis; M4 uses simple keyword matching as a correct baseline. Planned for M5.
  • Session signal influence on global user preference vector -- deferred because the correct boundary between ephemeral session boost and permanent preference update requires careful UX design; M4 keeps session influence strictly read-time. Planned for M6.
  • RLHF training data export -- deferred because export formats and training pipelines are application-specific; tidalDB stores the signals, external tools read them. Planned for M7.
  • Per-agent QPS rate limiting -- deferred because the per-session signal count cap provides coarse-grained protection; fine-grained QPS limiting with token-bucket belongs in M7 (Production Hardening). Planned for M7.
  • Session TTL auto-cleanup (background sweeper for abandoned sessions) -- deferred; max_session_duration enforcement on writes is sufficient for M4. Planned for M7.
  • User revocation of agent-contributed signals -- deferred because revocation requires retroactive signal removal with re-materialization, a core M10 (Governance & Agent Rights) concern. Planned for M10.

Integration Test

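use std::collections::HashMap;
use std::time::Duration;

// tidalDB types (SchemaBuilder, TidalDb, RetrieveBuilder, LumenError, ...) are assumed in scope.
#[test]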
fn milestone_4_uat() {
    let mut schema_builder = SchemaBuilder::new();

    // Session signal types with aggressive decay.
    let _ = schema_builder
        .signal("preference_hint", EntityKind::Item,
            DecaySpec::Linear { lifetime: Duration::from_secs(30 * 60) })
        .windows(&[Window::OneHour])
        .velocity(false)
        .add();
    let _ = schema_builder
        .signal("reward", EntityKind::Item,
            DecaySpec::Exponential { half_life: Duration::from_secs(10 * 60) })
        .windows(&[Window::OneHour])
        .velocity(true)
        .add();
    let _ = schema_builder
        .signal("tool_use", EntityKind::Item,
            DecaySpec::Linear { lifetime: Duration::from_secs(3600) })
        .windows(&[Window::OneHour])
        .velocity(false)
        .add();

    // Standard signals for item ranking.
    for sig in &["view", "like", "skip"] {
        let _ = schema_builder
            .signal(sig, EntityKind::Item,
                DecaySpec::Exponential { half_life: Duration::from_secs(7 * 24 * 3600) })
            .windows(&[Window::OneHour, Window::TwentyFourHours])
            .velocity(true)
            .add();
    }

    // Policy: planner can write preference_hint and reward, not tool_use.
    schema_builder.session_policy("planner_policy", AgentPolicy {
        allowed_signals: vec!["preference_hint".into(), "reward".into()],
        denied_signals: vec!["tool_use".into()],
        max_session_duration: Duration::from_secs(2 * 3600),
        max_signals_per_session: 1000,
    }).unwrap();

    let schema = schema_builder.build().unwrap();
    let db = TidalDb::builder().ephemeral().with_schema(schema).open().unwrap();

    // Write items: some jazz, some rock.
    for i in 1..=50u64 {
        let mut meta = HashMap::new();
        let category = if i <= 25 { "jazz" } else { "rock" };
        meta.insert("category".into(), category.into());
        meta.insert("format".into(), "video".into());
        meta.insert("creator_id".into(), (i % 10).to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();
        db.signal("view", EntityId::new(i), 1.0, Timestamp::now()).unwrap();
    }

    // Write user 42 (metadata only; this test seeds no per-user engagement history).
    let mut user_meta = HashMap::new();
    user_meta.insert("name".into(), "alice".into());
    db.write_user(EntityId::new(42), &user_meta).unwrap();

    // Step 1: Start session.
    let mut session_meta = HashMap::new();
    session_meta.insert("tool".into(), "planner".into());
    let session = db.start_session(42, "planner", "planner_policy", session_meta)
        .unwrap();
    let session_id = session.id();
    assert!(db.active_sessions().iter().any(|s| s.id == session_id));

    // Step 2: Write preference_hint.
    db.session_signal(&session, "preference_hint", EntityId::new(0), 1.0,
        Timestamp::now(), Some("more jazz today".into())).unwrap();

    // Step 3: Write reward.
    db.session_signal(&session, "reward", EntityId::new(42), 0.8,
        Timestamp::now(), None).unwrap();

    // Step 4: Query with session context.
    let query = RetrieveBuilder::new(EntityKind::Item, ProfileRef::new("for_you"))
        .for_user(42)
        .for_session(session_id)
        .limit(10)
        .build()
        .unwrap();
    let results = db.retrieve(&query).unwrap();
    assert!(!results.items.is_empty());
    assert!(results.session_snapshot.is_some());

    // Step 5: Read session snapshot.
    let snapshot = db.session_snapshot(session_id).unwrap();
    assert_eq!(snapshot.signals_written, 2);
    assert_eq!(snapshot.signals_rejected, 0);
    assert!(snapshot.duration_ms < 5_000);
    assert_eq!(snapshot.metadata.get("tool").unwrap(), "planner");

    // Step 6: Disallowed write.
    let err = db.session_signal(&session, "tool_use", EntityId::new(0), 1.0,
        Timestamp::now(), None);
    assert!(err.is_err());
    match err.unwrap_err() {
        LumenError::PolicyViolation { signal_type, policy_name, .. } => {
            assert_eq!(signal_type, "tool_use");
            assert_eq!(policy_name, "planner_policy");
        }
        other => panic!("expected PolicyViolation, got: {other:?}"),
    }

    // Step 7: Audit log.
    let audit = db.session_audit(session_id).unwrap();
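    // AuditEntry.outcome is Accepted | Rejected(reason) per m4p3; the enum name AuditOutcome is assumed.
    let accepted = audit.iter().filter(|e| matches!(e.outcome, AuditOutcome::Accepted)).count();
    let rejected = audit.iter().filter(|e| matches!(e.outcome, AuditOutcome::Rejected(_))).count();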
    assert_eq!(accepted, 2);
    assert_eq!(rejected, 1);

    // Step 8: Close session.
    let summary = db.close_session(session).unwrap();
    assert!(summary.duration_ms > 0);
    assert_eq!(summary.signals_written, 2);
    assert_eq!(summary.rejections, 1);
    assert!(!db.active_sessions().iter().any(|s| s.id == session_id));

    // Step 9: Archived snapshot readable.
    let archived = db.session_snapshot(session_id).unwrap();
    assert_eq!(archived.signals_written, 2);

    // Step 10: Session isolation.
    let session2 = db.start_session(42, "planner", "planner_policy", HashMap::new())
        .unwrap();
    let snap2 = db.session_snapshot(session2.id()).unwrap();
    assert_eq!(snap2.signals_written, 0, "session 2 must not see session 1 signals");

    db.close_session(session2).unwrap();
    db.close().unwrap();
}

Done When

A developer can embed tidalDB alongside an agent runtime and: (1) declare agent policies in schema specifying allowed/denied signal types, session duration limits, and signal count caps; (2) start sessions bound to a user and an agent; (3) write session-scoped signals (preference hints, rewards) that are accepted or rejected by policy with every attempt recorded in the audit log; (4) execute RETRIEVE items FOR USER @user_id FOR SESSION @session_id USING PROFILE for_you LIMIT 10 and receive ranked items incorporating session preference hints and reward velocity as an additive boost; (5) read session snapshots with signal state, velocity, and preference hints; (6) close sessions and retrieve archived snapshots with frozen signal values; (7) verify complete session isolation -- zero signal leakage between sessions. Policy violations return structured LumenError::PolicyViolation errors. Session signal writes complete in < 200 microseconds. RETRIEVE with session context adds < 5ms overhead. All 10 UAT scenario steps pass in the integration test.


Milestone 5: Hybrid Search -- "Text + semantic + signals in one query"

Milestone Thesis

A developer can execute SEARCH items QUERY "rust tutorial beginner" VECTOR query_vector FOR USER @user_id USING PROFILE search LIMIT 20 and get results that combine BM25 text relevance, semantic similarity, and user personalization in a single ranked list. This proves that search and retrieval are the same system.

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items with text fields (title, description, tags) indexed for full-text search
    - All items have embeddings
    - 500 users with engagement history
    - Search profile defined: text relevance as floor, semantic similarity,
      personalization adjustment

When:
  1. SEARCH items QUERY "rust tutorial beginner" VECTOR [query_embedding]
     FOR USER @user_42 USING PROFILE search DIVERSITY max_per_creator:2 LIMIT 20
  2. SEARCH items QUERY "jazz piano" FOR USER @user_42
     USING PROFILE search FILTER duration:short, format:video LIMIT 20
  3. SEARCH items QUERY "\"exact phrase match\"" USING PROFILE search LIMIT 10
  4. SEARCH items QUERY "jazz -beginner" USING PROFILE search LIMIT 10
  5. SEARCH creators QUERY "jazz" LIMIT 10
  6. User clicks result #3, record SIGNAL search_click
  7. User searches same query again

Then:
  - Step 1: Results combine BM25 + semantic similarity via RRF;
    personalization re-ranks within relevant set; user_42 (a beginner)
    sees beginner content elevated
  - Step 2: Text-only search (no vector), filtered by duration and format
  - Step 3: Exact phrase match -- only items containing "exact phrase match"
  - Step 4: Boolean exclusion -- no items matching "beginner"
  - Step 5: Creator search by name/topic
  - Step 6: Signal recorded with query context and rank position
  - Step 7: Clicked result may rank higher due to search_click signal
  - Performance: SEARCH < 50ms at 10K items

Phases

Phase 1: Tantivy Integration

Delivers: Tantivy embedded as a derived index for full-text search. DB-primary consistency pattern: entity store is source of truth, Tantivy is a materialized view updated via outbox. BM25 scoring exposed via custom Collector and Weight/Scorer seek pattern.

Acceptance Criteria:

  • Tantivy index created from schema text field definitions (title, description, tags)
  • Background indexer reads entity store outbox and feeds Tantivy writer
  • Tantivy commit stores last-processed sequence number in payload for crash recovery
  • Custom AllScoresCollector returns all matching doc IDs with BM25 scores
  • Weight::scorer + DocSet::seek pattern scores specific candidate IDs (for re-ranking ANN results)
  • External entity ID -> DocAddress mapping maintained and updated on segment merge
  • Boolean queries supported: AND, OR, NOT, exact phrase, field-scoped
  • Commit interval: every 1-5 seconds or every N thousand documents
  • Index rebuild from entity store completes in < 10 minutes at 10K items
  • BM25 query latency < 10ms at 10K documents (benchmarked)

Depends On: m1p3 (storage engine), m1p5 (entity API) Complexity: L Research Reference: docs/research/tantivy.md (Collector API, consistency pattern, seek scoring, commit model)

Phase 2: Hybrid Fusion (RRF)

Delivers: Reciprocal Rank Fusion combining BM25 ranked lists with ANN ranked lists into a single scored result set. The starting point is RRF with k=60; the architecture supports upgrading to tuned linear combination when relevance labels exist.

Acceptance Criteria:

  • RRF(d) = 1/(60 + rank_bm25(d)) + 1/(60 + rank_ann(d)) implemented
  • Documents appearing in only one list contribute only their single-list term
  • RRF results are re-rankable by personalization (user preference overlay)
  • When only text query is provided (no vector), pure BM25 ranking used
  • When only vector is provided (no text), pure ANN ranking used
  • Fusion adds < 1ms to query time (benchmarked)
  • k parameter configurable (default 60)
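RRF needs only ranks, not raw scores, which is why it can combine BM25 and ANN lists without score calibration. A sketch with 1-based ranks and entity ids as `u64`; `rrf` and `fuse` are hypothetical names, and the single-list branches simply return the input order, which is what pure BM25 or pure ANN ranking produces.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)), ranks starting at 1.
/// A document missing from a list contributes nothing for that list.
pub fn rrf(lists: &[&[u64]], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(&b.0)));
    fused
}

/// Chooses hybrid, BM25-only, or ANN-only ordering depending on which lists exist.
pub fn fuse(bm25: Option<&[u64]>, ann: Option<&[u64]>, k: f64) -> Vec<u64> {
    match (bm25, ann) {
        (Some(text), Some(vector)) => rrf(&[text, vector], k).into_iter().map(|(id, _)| id).collect(),
        (Some(only), None) | (None, Some(only)) => only.to_vec(),
        (None, None) => Vec::new(),
    }
}
```

For BM25 order [1, 2, 3] and ANN order [3, 1, 4] with k = 60, item 1 scores 1/61 + 1/62 and item 3 scores 1/63 + 1/61, so the fused order is [1, 3, 2, 4].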

Depends On: Phase 1 (BM25 scores), m2p1 (ANN scores) Complexity: S Research Reference: docs/research/tantivy.md (RRF section, Cormack et al.)

Phase 3: SEARCH Query Parser and Executor

Delivers: The SEARCH query parser and executor that orchestrates text retrieval, semantic retrieval, fusion, personalization, filtering, diversity, and result assembly.

Acceptance Criteria:

  • Parser handles: SEARCH items/creators, QUERY "text", VECTOR [embedding], FOR USER, USING PROFILE, FILTER, DIVERSITY, LIMIT
  • Query text parsing: exact phrase ("..."), boolean operators (AND/OR/NOT/-), field-scoped (title:...), wildcard (term*)
  • Executor pipeline: text retrieval -> ANN retrieval -> fusion -> personalization -> filter -> diversity -> return
  • When both QUERY and VECTOR provided, hybrid fusion (RRF)
  • When only QUERY, BM25-only retrieval
  • When only VECTOR, ANN-only retrieval
  • Search results include: entity_id, combined_score, bm25_score, semantic_score, rank
  • search_click signal writes include query context and rank position
  • End-to-end SEARCH < 50ms at 10K items (benchmarked)
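The query-text syntax can be prototyped as a small tokenizer that produces OR-separated groups of AND-ed clauses. This is a hand-rolled sketch with assumed names (`parse_query_text`, `Clause`, `Kind`) and simplified precedence (AND binds tighter than OR, no parentheses); Tantivy's own `QueryParser` may cover much of this syntax directly and would be the more likely route in practice.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Kind {
    Term,
    Phrase,
    Prefix,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Clause {
    pub kind: Kind,
    pub field: Option<String>,
    pub text: String,
    pub negated: bool,
}

/// Parses query text into OR-separated groups of AND-ed clauses.
pub fn parse_query_text(input: &str) -> Result<Vec<Vec<Clause>>, String> {
    let mut groups: Vec<Vec<Clause>> = vec![Vec::new()];
    let mut negate_next = false;
    let mut rest = input.trim_start();
    while !rest.is_empty() {
        // A pending NOT or a leading '-' negates the next clause.
        let mut negated = std::mem::take(&mut negate_next);
        if let Some(r) = rest.strip_prefix('-') {
            negated = true;
            rest = r;
        }
        let clause = if let Some(r) = rest.strip_prefix('"') {
            // Exact phrase: everything up to the closing quote.
            let end = r.find('"').ok_or("unterminated phrase")?;
            rest = r[end + 1..].trim_start();
            Clause { kind: Kind::Phrase, field: None, text: r[..end].to_lowercase(), negated }
        } else {
            let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
            let token = &rest[..end];
            rest = rest[end..].trim_start();
            match token {
                "" | "AND" => continue, // AND is implicit between clauses
                "NOT" => {
                    negate_next = true;
                    continue;
                }
                "OR" => {
                    groups.push(Vec::new());
                    continue;
                }
                _ => {}
            }
            // Field scope (title:rust) and trailing-* wildcard (tut*).
            let (field, text) = match token.split_once(':') {
                Some((f, t)) => (Some(f.to_string()), t),
                None => (None, token),
            };
            let (kind, text) = match text.strip_suffix('*') {
                Some(prefix) => (Kind::Prefix, prefix),
                None => (Kind::Term, text),
            };
            Clause { kind, field, text: text.to_lowercase(), negated }
        };
        groups.last_mut().expect("groups starts non-empty").push(clause);
    }
    Ok(groups)
}
```

UAT step 4's `jazz -beginner` parses to a single group holding a positive `jazz` term and a negated `beginner` term.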

Depends On: Phase 1, Phase 2, m2p5 (query parser infrastructure) Complexity: M

Phase 4: Creator Search

Delivers: Search over creator entities by name, topic, and attributes. "Creators like X" via creator embedding similarity. Enables UC-10.

Acceptance Criteria:

  • Creator entities indexed in Tantivy (name, handle, bio, topics)
  • Creator embeddings searchable via ANN (aggregated from catalog)
  • SEARCH creators QUERY "jazz" LIMIT 10 returns creators matching topic
  • SEARCH creators SIMILAR TO @creator_id LIMIT 10 returns similar creators by embedding
  • Creator filters: verified, min_followers, language, followed_by_user
  • Creator sort modes: follower_count, engagement_rate, posting_frequency

Depends On: Phase 1, m3p1 (creator entities) Complexity: M

Deferred to Later Milestones

  • Autocomplete and search suggestions (UC-02.3) -- deferred to M6; requires prefix indexes and trending query tracking
  • Saved searches and alerts (UC-02.4) -- deferred to M6; requires persistent query storage and push notification
  • Visual search / image search (UC-11) -- deferred to M6; requires multi-modal embedding support
  • "Did you mean" typo correction -- deferred to M6; requires edit-distance computation on term dictionary
  • Tuned linear combination (replacing RRF) -- deferred to M6; requires relevance labels for alpha tuning

Done When

A developer can execute SEARCH queries that combine full-text BM25 relevance with semantic vector similarity and user personalization in a single ranked result set. Boolean queries, phrase matching, field-scoped search, and creator search all work. Results reflect engagement signals. End-to-end SEARCH latency < 50ms at 10K items.


Milestone 6: Full Surface Coverage -- "Every use case works"

Milestone Thesis

Every one of the 14 use cases works end-to-end. Every sort mode, every filter dimension, every discovery surface described in USE_CASES.md is operational. The query RETRIEVE items FOR USER @user_id CONTEXT feed USING PROFILE for_you FILTER unseen, unblocked, format:video, duration:short DIVERSITY max_per_creator:2, format_mix:true LIMIT 50 is the complete, production-quality end state query.

UAT Scenario

Given:
  A tidalDB instance loaded with:
    - 100,000 items across 1,000 creators
    - 10,000 users with engagement histories
    - All 14 use case scenarios configured
    - All sort modes and filter dimensions exercised

When:
  All 14 use cases are executed as described in USE_CASES.md:
    UC-01: For You Feed with full diversity and exploration
    UC-02: Search with all filter dimensions, autocomplete, saved searches
    UC-03: Trending (global, category, social-graph scoped)
    UC-04: Following feed (chronological, algorithmic modes)
    UC-05: Related/Up Next with collaborative filtering
    UC-06: Browse with all sort modes, faceted filters, mood filters
    UC-07: Notification prioritization with frequency capping
    UC-08: Creator profile (Top, New, Hot, For You modes)
    UC-09: User library (history, saved, liked, collections, continue watching)
    UC-10: People search with "creators like X"
    UC-11: Visual/semantic search with image embeddings
    UC-12: Live content with real-time viewer count
    UC-13: Hidden gems with breakout detection
    UC-14: Controversial and Hot with dual-signal ranking

Then:
  Every query returns correct results per use case specification.
  All 25+ sort modes produce correctly ordered results.
  All filter dimensions compose correctly.
  Performance: < 50ms for all queries at 100K items.

Phases

(Phases for M6 are provisional -- detailed decomposition happens after M5 ships, informed by what was learned.)

Phase 1: Complete Sort Mode Coverage

Delivers: All 25+ sort modes from Appendix B operational. Windowed top sorts (hour, today, week, month, year, all_time), shuffle, alphabetical, shortest/longest, live_viewer_count, date_saved, creator_engagement_rate.

Depends On: M5 complete Complexity: L
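Windowed top sorts can share one set of time buckets, in line with the bucketed-counter (Scotty pattern) decision recorded under Architectural Decisions. The sketch below assumes hourly buckets in a 30-day ring. `HourlyBuckets` and `window_hours` are hypothetical, and "today" is approximated as the trailing 24 hours. `top_year` and `all_time` would need coarser buckets or a lifetime total, which this sketch omits.

```rust
/// Hypothetical per-entity engagement counter: one slot per hour in a ring,
/// so any window up to the ring length is answered by summing recent slots.
pub struct HourlyBuckets {
    counts: Vec<u64>, // ring of per-hour counts
    head_hour: u64,   // absolute hour index of the newest bucket
}

impl HourlyBuckets {
    pub fn new(capacity_hours: usize) -> Self {
        Self { counts: vec![0; capacity_hours], head_hour: 0 }
    }

    /// Move the ring forward to `hour`, zeroing slots that are being reused.
    fn advance_to(&mut self, hour: u64) {
        if hour <= self.head_hour {
            return;
        }
        let cap = self.counts.len() as u64;
        let steps = (hour - self.head_hour).min(cap);
        for h in (hour - steps + 1)..=hour {
            self.counts[(h % cap) as usize] = 0;
        }
        self.head_hour = hour;
    }

    /// Record `n` events at absolute `hour`; events older than the ring are dropped.
    pub fn record(&mut self, hour: u64, n: u64) {
        self.advance_to(hour);
        let cap = self.counts.len() as u64;
        if self.head_hour - hour >= cap {
            return;
        }
        self.counts[(hour % cap) as usize] += n;
    }

    /// Sum of the `hours` most recent buckets ending at `now_hour`
    /// (assumes `now_hour` is not earlier than the last recorded hour).
    pub fn window_sum(&mut self, now_hour: u64, hours: usize) -> u64 {
        self.advance_to(now_hour);
        let cap = self.counts.len() as u64;
        let hours = (hours as u64).min(cap);
        (0..hours)
            .filter_map(|i| self.head_hour.checked_sub(i))
            .map(|h| self.counts[(h % cap) as usize])
            .sum()
    }
}

/// Hypothetical mapping from windowed sort modes to bucket counts.
pub fn window_hours(mode: &str) -> Option<usize> {
    match mode {
        "top_hour" => Some(1),
        "top_today" => Some(24), // trailing 24h, not a calendar day
        "top_week" => Some(24 * 7),
        "top_month" => Some(24 * 30),
        _ => None,
    }
}
```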

Phase 2: Complete Filter Coverage

Delivers: All filter dimensions from Appendix A operational and composable. Geographic filters, accessibility filters, community signal filters, availability filters, engagement threshold filters.

Depends On: Phase 1 Complexity: L

Phase 3: Social Graph Queries and Collaborative Filtering

Delivers: Social graph traversal for trending-among-follows, collaborative filtering for related/up-next, "creators followed by people I follow." The graph query capabilities needed for UC-03 (social trending), UC-05 (collaborative filtering), UC-10 (social creator discovery).

Depends On: Phase 1 Complexity: L
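The "creators followed by people I follow" query is a two-hop traversal with counting. The minimal in-memory sketch below uses a plain adjacency map rather than tidalDB's relationship storage. `FollowGraph` and `followed_by_my_follows` are hypothetical. Every followed id is treated as a candidate here, where a real implementation would keep only Creator entities.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical follow graph: follower id -> set of followed ids.
pub type FollowGraph = HashMap<String, HashSet<String>>;

/// Ids followed by the people `user` follows, ranked by how many of those
/// people follow them. Excludes `user` and anyone `user` already follows.
pub fn followed_by_my_follows(graph: &FollowGraph, user: &str, limit: usize) -> Vec<(String, usize)> {
    let empty = HashSet::new();
    let mine = graph.get(user).unwrap_or(&empty);
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for friend in mine {
        if let Some(theirs) = graph.get(friend) {
            for candidate in theirs {
                if candidate != user && !mine.contains(candidate) {
                    *counts.entry(candidate.as_str()).or_insert(0) += 1;
                }
            }
        }
    }
    let mut ranked: Vec<(String, usize)> =
        counts.into_iter().map(|(c, n)| (c.to_string(), n)).collect();
    // Deterministic order: more mutual follows first, then id.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(limit);
    ranked
}
```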

Phase 4: User Library, Collections, and Continue Watching

Delivers: UC-09 complete: watch history, saved items, liked items, user-created collections, continue watching (resume position), download state. Collections as rankable entities.

Depends On: Phase 2 Complexity: M

Phase 5: Advanced Search Features

Delivers: Autocomplete, search suggestions, trending searches, saved searches, "did you mean" typo correction, related query suggestions. UC-02.3 and UC-02.4.

Depends On: Phase 1 Complexity: L

Phase 6: Live Content and Notification Systems

Delivers: UC-12 (live content with real-time viewer count, scheduled content, reminders) and UC-07 (notification prioritization with frequency capping, per-creator limits). Real-time signal types for viewer count and schedule awareness.

Depends On: Phase 1 Complexity: M
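Frequency capping for UC-07 reduces to a greedy pass over candidates in priority order. The sketch below assumes a single window with a global cap and a per-creator cap, plus counts of what was already sent in that window. `Candidate` and `cap_notifications` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical notification candidate, already scored by a ranking profile.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub id: u64,
    pub creator: String,
    pub priority: f64,
}

/// Select notifications in priority order while honoring a global cap for
/// the window and a per-creator cap. `already_sent` holds per-creator
/// counts for the same window.
pub fn cap_notifications(
    mut candidates: Vec<Candidate>,
    already_sent: &HashMap<String, u32>,
    global_cap: u32,
    per_creator_cap: u32,
) -> Vec<Candidate> {
    candidates.sort_by(|a, b| b.priority.total_cmp(&a.priority).then(a.id.cmp(&b.id)));
    let mut sent = already_sent.clone();
    let mut total: u32 = already_sent.values().sum();
    let mut out = Vec::new();
    for c in candidates {
        if total >= global_cap {
            break;
        }
        let n = sent.entry(c.creator.clone()).or_insert(0);
        if *n >= per_creator_cap {
            continue; // this creator has used its allowance; try the next one
        }
        *n += 1;
        total += 1;
        out.push(c);
    }
    out
}
```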

Deferred to Later Milestones

  • Signal rollups (hourly/daily materialization) -- built if 100K-item benchmarks show bucketed counters exceeding the latency budget for 30d+ windows
  • Multi-vector user interest clustering (PinnerSage) -- deferred to M7 or beyond; single preference vector serves through M6
  • ACORN-1 two-hop expansion for very selective filters -- deferred to M7; USearch predicate callback sufficient through M6

Done When

All 14 use cases pass their UAT scenarios as defined in USE_CASES.md. All 25+ sort modes work. All filter dimensions compose. Every sequence diagram in SEQUENCE.md can be executed. Performance: < 50ms for all queries at 100K items.


Milestone 7: Production Hardening -- "Ready for real workloads"

Milestone Thesis

tidalDB can be embedded in a production application and operated with confidence. Crash recovery is correct and fast. Graceful degradation works under load. Operational visibility exists. Performance meets targets at 1M+ items. The database is trustworthy.

UAT Scenario

Given:
  A tidalDB instance with:
    - 1,000,000 items, 100,000 users, 10,000 creators
    - Sustained write load: 10,000 signal events/second
    - Concurrent read load: 1,000 RETRIEVE queries/second

When:
  1. Run full workload for 1 hour
  2. Kill the process at a random point
  3. Restart and measure recovery time
  4. Verify no data loss and no inconsistency
  5. Run workload at 3x expected load
  6. Verify graceful degradation (reduced precision, not errors)

Then:
  - Step 1: All queries < 50ms p99, all signal writes < 100us amortized
  - Step 3: Recovery time < 30 seconds
  - Step 4: WAL replay produces state identical to pre-crash;
    no phantom items, no lost signals, no inconsistent aggregates
  - Step 5: Under overload, tidalDB reduces candidate set size, uses coarser
    aggregates, skips diversity -- but never returns errors for well-formed queries
  - Step 6: Degradation follows the documented order:
    1. Reduce candidate set (500 -> 200)
    2. Use coarser aggregates
    3. Skip diversity
    4. Return from materialized cache

Phases

(Phases for M7 are provisional -- detailed decomposition happens after M6 ships.)

Phase 1: Crash Recovery Hardening

Delivers: Comprehensive crash recovery testing and hardening. Fault injection at every write-path stage. Recovery time targets. WAL compaction and checkpoint optimization.

Depends On: M6 complete Complexity: XL

Phase 2: Graceful Degradation Under Load

Delivers: Automatic quality reduction under load pressure. Configurable degradation order. Backpressure on write path. Never errors for well-formed queries.

Depends On: Phase 1 Complexity: L
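The degradation order from the UAT scenario can be modeled as an ordered ladder where each level keeps the reductions of the levels before it. In this sketch, `Degradation`, `QueryPlan`, `level_for`, and the pressure thresholds are hypothetical and illustrative, not tuned values. The thresholds are picked so that 3x load skips diversity without falling back to the cache, matching the UAT.

```rust
/// Hypothetical degradation levels in the documented order; each level
/// includes every reduction from the levels before it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Degradation {
    Full,
    ReducedCandidates, // 500 -> 200 candidates
    CoarseAggregates,  // e.g. daily instead of hourly buckets
    SkipDiversity,
    MaterializedCache, // serve the last materialized result
}

/// Hypothetical query knobs adjusted by the current level.
#[derive(Debug, PartialEq)]
pub struct QueryPlan {
    pub candidate_limit: usize,
    pub coarse_aggregates: bool,
    pub apply_diversity: bool,
    pub from_cache: bool,
}

/// Map load pressure (1.0 = expected capacity) to a level.
pub fn level_for(pressure: f64) -> Degradation {
    match pressure {
        p if p < 1.0 => Degradation::Full,
        p if p < 1.5 => Degradation::ReducedCandidates,
        p if p < 2.0 => Degradation::CoarseAggregates,
        p if p < 4.0 => Degradation::SkipDiversity,
        _ => Degradation::MaterializedCache, // also catches NaN
    }
}

pub fn plan_for(level: Degradation) -> QueryPlan {
    QueryPlan {
        candidate_limit: if level >= Degradation::ReducedCandidates { 200 } else { 500 },
        coarse_aggregates: level >= Degradation::CoarseAggregates,
        apply_diversity: level < Degradation::SkipDiversity,
        from_cache: level >= Degradation::MaterializedCache,
    }
}
```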

Phase 3: Performance at Scale

Delivers: Benchmarks and optimization at 1M items, 100K users. USearch performance tuning (M, ef_search, quantization). Tantivy segment management. Signal state memory optimization. Hot/warm/cold tiering for signal state if memory budget requires it.

Depends On: Phase 1 Complexity: XL

Phase 4: Operational Visibility

Delivers: Metrics, diagnostics, and observability. Query execution stats (candidates considered, filters applied, scoring time, diversity adjustments). Signal system health (WAL lag, checkpoint age, memory usage). Index health (segment count, tombstone ratio). Error reporting with context.

Depends On: Phase 1 Complexity: M

Deferred (Post-M7 / Future)

  • Horizontal distribution -- the single-node architecture scales vertically first; distribution is a separate product decision
  • Multi-tenancy -- per-tenant isolation within a single tidalDB instance
  • Streaming query results -- cursor-based streaming for very large result sets
  • A/B testing infrastructure -- comparing two profile versions within the database
  • Signal rollup to external cold storage -- S3/GCS archival for compliance
  • Client libraries -- language-specific wrappers beyond Rust embedding

Done When

tidalDB operates correctly at 1M items under sustained concurrent read/write load. Crash recovery completes in < 30 seconds with zero data loss. Graceful degradation works under 3x overload without returning errors. All performance targets met at p99. A developer can embed tidalDB in a production application and operate it with confidence.


Milestone 8: Distributed Fabric -- "Agent memory everywhere"

Milestone Thesis

The exact same signal semantics, session policies, and WAL format power a multi-tenant, multi-region deployment. Instances shard deterministically by EntityKind + EntityId, ship WAL segments to peers, reconcile deterministically, and expose an eventually consistent API that still honors agent memory guarantees (no hidden items leaking, no double-counted decay). Hosted tidalDB can now back global agent workloads without rewriting application code.

UAT Scenario

Given:
  - Three regions (us-east, eu-west, ap-south) with 5 shards each
  - Global write throughput: 25K signal events/sec, evenly distributed
  - Fat-client agents pinned to local region but free to roam
  - 1-hour network partition between eu-west and ap-south during sustained load

When:
  1. Write signals for a user in us-east, then read them back in eu-west within 2s
  2. Crash an entire shard primary; observe automatic promotion and replay
  3. Execute global query (`RETRIEVE ... COHORT locale:EU`) while ap-south is partitioned
  4. Heal the partition; verify deterministic reconciliation (no duplicate counts, hides remain hidden)
  5. Move a tenant (agent workspace) to a new region by changing routing config only

Then:
  - Cross-region replication lag < 2s p99
  - No signal loss or duplication after failover/partition
  - Hard negatives (hide/mute/block) never leak, even while eventual state converges
  - Per-tenant resource isolation enforced (quotas, WAL namespaces)
  - Control plane surfaces reconciliation lag, shard health, and tenant placement

Phases

Phase 1: Partitioned Keyspaces and WAL Shipping

Delivers: Deterministic shard IDs derived from subject-prefix keys, WAL segment shipping with per-segment checksums, and follower apply loops that use the same checkpoint format as single-node. Atomicity is scoped to the "entity group": each Item, User, or Creator maps to exactly one shard, and writes are not atomic across shards. Lag metrics (replication_seconds_behind) are exported.

Acceptance Criteria:

  • ShardId = hash(entity_id) mod N (configurable per EntityKind) stored alongside keys; shard map hot-swappable via epoch config.
  • WAL segments have globally unique IDs (region_id:shard_id:seqno); followers detect gaps and request retransmit.
  • Followers reapply segments idempotently using the same EntitySignalState checkpoint format from M1.
  • Lag SLO: < 2s p99 at 25K writes/sec across 5 shards.
  • CLI: tidalctl shard status shows leader, lag, checkpoint age.

Depends On: M7 (hardened WAL/Signal ledger) Complexity: XL Research Reference: docs/research/tidaldb_wal.md, docs/research/tidaldb_signal_ledger.md
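Two Phase 1 mechanics fit in a few lines. The first is shard selection from the entity_id prefix of a subject-prefix key. The second is follower-side gap detection on `region_id:shard_id:seqno` segment IDs. The sketch below assumes seqnos start at 1 per (region, shard) stream and that entity ids contain no 0x00 byte. FNV-1a stands in for the unspecified hash, and `shard_for_key`, `SegmentId`, and `on_segment` are hypothetical.

```rust
/// FNV-1a (64-bit), a stable dependency-free hash standing in for the
/// unspecified `hash(entity_id)`. std's DefaultHasher is avoided because its
/// algorithm is not guaranteed stable across Rust releases.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

/// Hypothetical: shard for a subject-prefix key `[entity_id][0x00][TAG:suffix]`.
/// Only the entity_id prefix is hashed, so all keys of one entity co-locate.
pub fn shard_for_key(key: &[u8], shard_count: u64) -> u64 {
    let entity_id = key.split(|&b| b == 0x00).next().unwrap_or(key);
    fnv1a64(entity_id) % shard_count
}

/// Hypothetical WAL segment ID `region_id:shard_id:seqno`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SegmentId {
    pub region: u16,
    pub shard: u32,
    pub seqno: u64,
}

/// Follower decision for an incoming segment on its (region, shard) stream.
#[derive(Debug, PartialEq, Eq)]
pub enum Apply {
    /// Next expected seqno: apply it.
    Next,
    /// Already applied: idempotent no-op.
    Duplicate,
    /// Seqnos in this range are missing: request retransmit first.
    Gap { from: u64, to: u64 },
}

pub fn on_segment(last_applied: u64, incoming: SegmentId) -> Apply {
    match incoming.seqno {
        s if s <= last_applied => Apply::Duplicate,
        s if s == last_applied + 1 => Apply::Next,
        s => Apply::Gap { from: last_applied + 1, to: s - 1 },
    }
}
```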

Phase 2: Conflict Resolution and Session Semantics

Delivers: Deterministic reconciliation for eventually-consistent writes: CRDT-style counters for windowed aggregates, last-writer-wins timestamps for session state, and per-session sequence numbers so agents can reason about acknowledgements. Adds write-idempotency keys to the WAL and exposes a reconciliation audit log.

Acceptance Criteria:

  • Windowed counters replicated as bounded PN-counters (positive/negative components) with tombstones for expired buckets.
  • Decay scores replay identically because WAL order is preserved per shard; cross-shard dependencies (user->creator) carry causal metadata.
  • Session updates carry (session_id, seqno); duplicates dropped, gaps surfaced via API.
  • reconcile --since <ts> tool emits merged vs diverged entries for auditing.
  • Hides/blocks modeled as LWW registers; vector clocks detect concurrent writes, and a region priority list breaks ties between them.

Depends On: Phase 1 Complexity: XL Research Reference: thoughts.md Part V.5-6 (quarantine-first, group commit), docs/research/tidaldb_signal_ledger.md
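The replicated state in this phase can be sketched with three pieces:

  • A per-bucket PN-counter whose merge takes per-region maxima.
  • A watermark standing in for per-bucket tombstones, so that a late merge cannot resurrect an expired bucket.
  • An LWW flag for hides.

This sketch simplifies the ordering: a wall-clock timestamp replaces the vector clock. Ties go to the higher-priority region and then to `hidden = true`; that last rule is an assumption, chosen to keep hard negatives from leaking. All type names are hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;

/// Hypothetical PN-counter for one time bucket: per-region positive and
/// negative tallies. Merge takes per-region maxima, so it is commutative,
/// associative, and idempotent (re-applying a shipped segment is harmless).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PnCounter {
    p: BTreeMap<u16, u64>,
    n: BTreeMap<u16, u64>,
}

impl PnCounter {
    pub fn incr(&mut self, region: u16, by: u64) {
        *self.p.entry(region).or_insert(0) += by;
    }
    pub fn decr(&mut self, region: u16, by: u64) {
        *self.n.entry(region).or_insert(0) += by;
    }
    pub fn value(&self) -> i64 {
        self.p.values().sum::<u64>() as i64 - self.n.values().sum::<u64>() as i64
    }
    pub fn merge(&mut self, other: &PnCounter) {
        for (r, v) in &other.p {
            let e = self.p.entry(*r).or_insert(0);
            *e = (*e).max(*v);
        }
        for (r, v) in &other.n {
            let e = self.n.entry(*r).or_insert(0);
            *e = (*e).max(*v);
        }
    }
}

/// Hypothetical windowed counter. A single watermark stands in for
/// per-bucket tombstones: every bucket below it is expired for good.
#[derive(Debug, Default)]
pub struct BucketedCounter {
    buckets: BTreeMap<u64, PnCounter>,
    expired_below: u64,
}

impl BucketedCounter {
    /// Mutable access to a live bucket; None if the bucket has expired.
    pub fn bucket_mut(&mut self, bucket: u64) -> Option<&mut PnCounter> {
        if bucket < self.expired_below {
            return None;
        }
        Some(self.buckets.entry(bucket).or_default())
    }
    pub fn expire_below(&mut self, bucket: u64) {
        self.expired_below = self.expired_below.max(bucket);
        self.buckets = self.buckets.split_off(&self.expired_below);
    }
    /// Adopt the peer's watermark, then merge only buckets still live here.
    pub fn merge(&mut self, other: &BucketedCounter) {
        self.expire_below(other.expired_below);
        for (b, c) in &other.buckets {
            if let Some(mine) = self.bucket_mut(*b) {
                mine.merge(c);
            }
        }
    }
    pub fn total(&self) -> i64 {
        self.buckets.values().map(PnCounter::value).sum()
    }
}

/// Hypothetical LWW register for hide/mute/block state. Order is
/// (timestamp, region priority, hidden); rank 0 is the highest-priority region.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LwwFlag {
    pub hidden: bool,
    pub ts_ms: u64,
    pub region_rank: u8,
}

impl LwwFlag {
    pub fn merge(self, other: LwwFlag) -> LwwFlag {
        let key = |f: &LwwFlag| (f.ts_ms, Reverse(f.region_rank), f.hidden);
        if key(&other) > key(&self) { other } else { self }
    }
}
```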

Phase 3: Control Plane, Multi-Tenancy, and Routing

Delivers: Tenant-aware namespaces (per-tenant WAL directories and key prefixes), routing layer that maps tenants + entity IDs to shard endpoints, and policy templates (data residency, read-after-write budgets). Adds hosted-ready observability (lag dashboards, per-tenant quotas) and blue/green deploy tooling for the fabric.

Acceptance Criteria:

  • Tenant config: {tenant_id, shard_set, residency=[regions], rpo, rto} stored in control-plane keyspace.
  • Router SDK chooses nearest healthy region that satisfies residency and read-after-write target; falls back with documented staleness budget.
  • Throttling per tenant (signals/sec, query concurrency) with circuit-breaker events surfaced via metrics + CLI.
  • Rolling upgrade playbook: add shard, rebalance, observe zero dropped writes.
  • Hosted docs: describe how embeddable apps graduate to hosted fabric without rewrites (same query + signal APIs).

Depends On: Phase 2 Complexity: L
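The router rule in the criteria can be sketched as a pure function: pick the nearest healthy region that satisfies residency and the read-after-write target, otherwise fall back and surface the staleness. The `RegionState`, `TenantPolicy`, and `Route` types are hypothetical. Only `residency` and the read-after-write target come from the tenant config above.

```rust
/// Hypothetical per-region snapshot as seen by the router SDK.
#[derive(Debug, Clone)]
pub struct RegionState {
    pub region: &'static str,
    pub healthy: bool,
    pub rtt_ms: u32, // round-trip time from the caller
    pub lag_ms: u32, // replication lag for this tenant's shards
}

/// Hypothetical subset of the tenant config relevant to routing.
pub struct TenantPolicy {
    pub residency: Vec<&'static str>,
    pub read_after_write_ms: u32,
}

#[derive(Debug, PartialEq)]
pub enum Route {
    /// Meets the read-after-write target.
    Fresh(&'static str),
    /// Documented fallback: least-stale eligible region, staleness surfaced.
    Stale { region: &'static str, lag_ms: u32 },
    Unavailable,
}

pub fn route(policy: &TenantPolicy, regions: &[RegionState]) -> Route {
    let eligible = regions
        .iter()
        .filter(|r| r.healthy && policy.residency.contains(&r.region));
    // Nearest region that meets the freshness target.
    if let Some(r) = eligible
        .clone()
        .filter(|r| r.lag_ms <= policy.read_after_write_ms)
        .min_by_key(|r| r.rtt_ms)
    {
        return Route::Fresh(r.region);
    }
    match eligible.min_by_key(|r| (r.lag_ms, r.rtt_ms)) {
        Some(r) => Route::Stale { region: r.region, lag_ms: r.lag_ms },
        None => Route::Unavailable,
    }
}
```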

Done When

tidalDB instances can be deployed as a hosted, multi-region fabric with deterministic replication and reconciliation. Agents anywhere in the world can write signals and rely on hides/mutes/policies holding globally. Operators get tooling for shard health, tenant placement, rolling upgrades, and lag visibility. Embeddable users flip a config switch to opt into the fabric; query and signal APIs remain unchanged.


Milestone 9: Community Sync & Revocation -- "Join, share, leave, purge"

Milestone Thesis

Users keep a local embeddable personalization profile as source of truth, opt into one or more community personalization overlays, and can leave those overlays safely. Community contributions are scope-aware, auditable, and removable (both stop-forward and retroactive purge) without corrupting local personalization state.

UAT Scenario

Given:
  - User U has a local embeddable profile with 90 days of signals
  - Community C has opt-in policy requiring explicit share scope
  - U has one agent (A) writing session signals locally

When:
  1. U joins C with sharing mode `community_share:enabled`
  2. U allows only selected signal intents (`not_for_me`, `save`, `low_quality`) to sync
  3. Community feed query blends local + community layers for U
  4. U leaves C with `stop_forward` (no new contributions)
  5. U requests `purge_prior_contributions` from C
  6. C rematerializes affected aggregates and U re-queries feed

Then:
  - U's local profile remains intact throughout
  - No new signals from U enter C after `stop_forward`
  - Purged contributions no longer affect C's ranking outputs
  - Hard negatives from U do not leak back in after replay/failover
  - Audit log shows join, share scope, leave, purge, rematerialize checkpoints

Phases

Phase 1: Signal Scope and Share Contract

Delivers: explicit signal scope model (local, community, session, agent) and share policy metadata attached to WAL events. Community replication only ships share-eligible events.

Acceptance Criteria:

  • WAL event envelope carries scope, origin, share_policy_version, and idempotency key.
  • Default behavior is local-only; sharing is explicit opt-in.
  • Per-intent share filters supported (skip_for_now, not_for_me, low_quality, hide/mute/block, etc.).
  • tidalctl can inspect scope distribution and share eligibility.

Depends On: M8
Complexity: L
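The envelope and the replication gate can be sketched as below. The sketch assumes eligibility requires three things: the user's explicit opt-in, a community-scoped event, and an explicitly allowed intent. `Envelope`, `SharePolicy`, and `share_eligible` are hypothetical names. `share_policy_version` is carried for audit rather than checked here.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scope {
    Local,
    Community,
    Session,
    Agent,
}

/// Hypothetical WAL event envelope with the fields listed above.
#[derive(Debug, Clone)]
pub struct Envelope {
    pub scope: Scope,
    pub origin: String, // writer: user or agent id
    pub share_policy_version: u32,
    pub idempotency_key: String,
    pub intent: String, // e.g. "not_for_me", "save"
}

/// Hypothetical per-user share policy. Default is local-only: nothing ships.
#[derive(Debug, Default)]
pub struct SharePolicy {
    pub version: u32,
    pub community_share: bool,
    pub allowed_intents: HashSet<String>,
}

/// Community replication ships an event only if the user opted in, the event
/// is community-scoped, and its intent is explicitly allowed.
pub fn share_eligible(e: &Envelope, p: &SharePolicy) -> bool {
    p.community_share && e.scope == Scope::Community && p.allowed_intents.contains(&e.intent)
}
```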

Phase 2: Membership Lifecycle and Stop-Forward Semantics

Delivers: join/leave lifecycle for community overlays with causal checkpoints and stop-forward guarantees.

Acceptance Criteria:

  • Membership states: joined, leaving_stop_forward, left, rejoined.
  • leave(stop_forward) blocks new community contributions in < 1s p99.
  • Rejoin creates a new membership epoch (no ambiguous replay across epochs).
  • Queries expose active membership epoch for debugging and explainability.

Depends On: Phase 1
Complexity: L
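The membership lifecycle is a small state machine. The sketch below assumes `rejoined` behaves like `joined` except that it opens a new epoch. It also assumes `left` is reached by an explicit finalize step after stop-forward. `Membership`, `Event`, and `accepts_contributions` are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Joined,
    LeavingStopForward,
    Left,
    Rejoined,
}

/// Hypothetical membership record for one (user, community) pair.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Membership {
    pub state: State,
    pub epoch: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    Leave,
    FinalizeLeave,
    Join,
}

impl Membership {
    pub fn new() -> Self {
        Membership { state: State::Joined, epoch: 1 }
    }

    /// Apply a lifecycle event; transitions outside the lifecycle are errors.
    pub fn apply(self, ev: Event) -> Result<Membership, &'static str> {
        use State::*;
        match (self.state, ev) {
            (Joined | Rejoined, Event::Leave) => Ok(Membership { state: LeavingStopForward, ..self }),
            (LeavingStopForward, Event::FinalizeLeave) => Ok(Membership { state: Left, ..self }),
            // Rejoining opens a new epoch so replay never mixes epochs.
            (Left, Event::Join) => Ok(Membership { state: Rejoined, epoch: self.epoch + 1 }),
            _ => Err("invalid membership transition"),
        }
    }

    /// Stop-forward: new community contributions are accepted only while joined.
    pub fn accepts_contributions(&self) -> bool {
        matches!(self.state, State::Joined | State::Rejoined)
    }
}
```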

Phase 3: Retroactive Purge and Deterministic Rematerialization

Delivers: remove prior user contributions from community state and rebuild affected aggregates deterministically.

Acceptance Criteria:

  • purge_prior_contributions(user_id, community_id, epoch_range) API implemented.
  • Purge writes tombstones and triggers deterministic rematerialization job.
  • Rebuilt aggregates are identical across repeated replay of same purge operation.
  • Community queries include purge watermark metadata for auditability.

Depends On: Phase 2
Complexity: XL
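Deterministic rematerialization mostly comes down to replaying in a fixed order. Floating-point sums depend on addition order, so the same log and tombstones must always be folded the same way. The sketch below assumes contributions carry a community-log seqno and the writer's membership epoch. `Contribution`, `PurgeTombstone`, and `rematerialize` are hypothetical. A real job would rebuild only the affected aggregates rather than the whole map.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// Hypothetical community contribution replicated from a member.
#[derive(Debug, Clone)]
pub struct Contribution {
    pub seqno: u64, // community-log order
    pub user: String,
    pub epoch: u32, // membership epoch at write time
    pub item: String,
    pub weight: f64,
}

/// Hypothetical tombstone written by a purge over an epoch range.
pub struct PurgeTombstone {
    pub user: String,
    pub epochs: RangeInclusive<u32>,
}

/// Rebuild per-item aggregates, skipping tombstoned contributions. Folding in
/// seqno order into a BTreeMap makes every replay produce identical output.
pub fn rematerialize(log: &[Contribution], tombstones: &[PurgeTombstone]) -> BTreeMap<String, f64> {
    let mut ordered: Vec<&Contribution> = log.iter().collect();
    ordered.sort_by_key(|c| c.seqno);
    let mut agg = BTreeMap::new();
    for c in ordered {
        let purged = tombstones
            .iter()
            .any(|t| t.user == c.user && t.epochs.contains(&c.epoch));
        if !purged {
            *agg.entry(c.item.clone()).or_insert(0.0) += c.weight;
        }
    }
    agg
}
```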

Done When

Users can opt into community personalization, leave safely, and purge prior contributions without damaging local personalization or producing inconsistent community rankings.


Milestone 10: Governance & Agent Rights -- "Who can influence ranking, and how"

Milestone Thesis

Communities and users can govern personalization influence through policy: which signal intents count, what trust thresholds apply, and what agents are allowed to read/write. Agent-contributed signals are fully attributable and revocable by scope.

UAT Scenario

Given:
  - Community C defines ranking governance policy
  - User U has two agents: A_trusted and A_experimental
  - A_experimental is denied community write scope

When:
  1. A_trusted writes allowed community-scoped signals for U
  2. A_experimental attempts the same and is rejected by policy
  3. C changes policy to downweight `skip_for_now`, upweight `low_quality`
  4. U revokes A_trusted community scope and removes A_trusted prior contributions from C
  5. U queries local-only, local+community, and community-only views

Then:
  - Policy enforcement is deterministic and auditable
  - Disallowed agent writes never affect community ranking
  - Policy changes are versioned and explainable in result metadata
  - Agent revocation removes future influence immediately
  - Optional retroactive removal of agent contributions completes within SLA

Phases

Phase 1: Community Governance Policy Engine

Delivers: versioned community policy definitions governing signal eligibility, weighting bounds, and trust/quality thresholds.

Acceptance Criteria:

  • Policy schema includes allowed intents, excluded intents, and weighting constraints.
  • Policy changes are versioned and applied with effective timestamps.
  • Query results can return governing policy version for explanation.
  • Out-of-policy signals are rejected or quarantined by rule.

Depends On: M9
Complexity: L
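Policy evaluation can be sketched in four steps:

  1. Pick the policy version in force at the signal's timestamp.
  2. Reject excluded intents.
  3. Quarantine intents the policy does not mention.
  4. Clamp weights to the policy's bounds.

The reject-versus-quarantine split and all type names here are assumptions, not a settled design.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical versioned community policy.
pub struct Policy {
    pub version: u32,
    pub effective_from_ms: u64,
    pub allowed: HashSet<&'static str>,
    pub excluded: HashSet<&'static str>,
    pub weights: HashMap<&'static str, f64>, // per-intent weight; default 1.0
    pub weight_bounds: (f64, f64),           // clamp range for any weight
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Accept { weight: f64, policy_version: u32 },
    Reject,
    Quarantine,
}

/// Evaluate one signal against the policy in force at `ts_ms`.
/// `policies` must be sorted by `effective_from_ms`.
pub fn evaluate(policies: &[Policy], intent: &str, ts_ms: u64) -> Verdict {
    let Some(p) = policies.iter().rev().find(|p| p.effective_from_ms <= ts_ms) else {
        return Verdict::Quarantine; // no policy in force yet: hold, don't count
    };
    if p.excluded.contains(intent) {
        return Verdict::Reject;
    }
    if !p.allowed.contains(intent) {
        return Verdict::Quarantine;
    }
    let (lo, hi) = p.weight_bounds;
    let weight = p.weights.get(intent).copied().unwrap_or(1.0).clamp(lo, hi);
    Verdict::Accept { weight, policy_version: p.version }
}
```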

Phase 2: Agent Capability and Scope Controls

Delivers: per-agent capabilities for read/write by scope (local, community, session) with hard enforcement in write/read paths.

Acceptance Criteria:

  • Agent capability tokens include scope permissions and TTL.
  • Reads/writes outside granted scope return policy errors and audit events.
  • Revocation invalidates capabilities immediately (< 1s p99).
  • tidalctl can inspect agent capabilities and revocation history.

Depends On: Phase 1
Complexity: L
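Capability checks sit on every read and write path. The sketch below assumes revocation is a set of token ids consulted on each call, which is what makes revocation immediate. It also assumes TTL is stored as an absolute expiry. `Capability`, `authorize`, and `Denied` are hypothetical.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scope {
    Local,
    Community,
    Session,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Access {
    Read,
    Write,
}

/// Hypothetical capability token issued to an agent.
pub struct Capability {
    pub token_id: u64,
    pub agent: String,
    pub grants: HashSet<(Scope, Access)>,
    pub expires_at_ms: u64,
}

#[derive(Debug, PartialEq)]
pub enum Denied {
    Revoked,
    Expired,
    OutOfScope,
}

/// Run on every read/write. Checking `revoked` on each call means a
/// revocation takes effect on the very next request.
pub fn authorize(
    cap: &Capability,
    revoked: &HashSet<u64>,
    scope: Scope,
    access: Access,
    now_ms: u64,
) -> Result<(), Denied> {
    if revoked.contains(&cap.token_id) {
        return Err(Denied::Revoked);
    }
    if now_ms >= cap.expires_at_ms {
        return Err(Denied::Expired);
    }
    if !cap.grants.contains(&(scope, access)) {
        return Err(Denied::OutOfScope);
    }
    Ok(())
}
```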

Phase 3: Provenance, Explainability, and Remove-by-Scope

Delivers: provenance graph for ranking influence and APIs to remove contributions by scope (agent, community, session, local).

Acceptance Criteria:

  • Every ranking-affecting signal has provenance metadata (writer, scope, policy_version, membership_epoch).
  • remove_from_personalization(scope=...) API supports precise, non-global deletion.
  • Explainability endpoint can attribute top-ranked items to policy-allowed signals.
  • Replay/failover preserves remove-by-scope outcomes deterministically.

Depends On: Phase 2
Complexity: XL
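Remove-by-scope and attribution both operate on provenance-tagged contributions. The sketch below assumes provenance is stored per contribution, and that removal matches an exact scope plus an optional writer. The `Provenance` type, this signature of `remove_from_personalization`, and `attribute` are hypothetical; only the `remove_from_personalization(scope=...)` name comes from the criteria above.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Scope {
    Local,
    Community(String),
    Session(String),
    Agent(String),
}

/// Hypothetical provenance-tagged contribution to an item's score.
#[derive(Debug, Clone)]
pub struct Provenance {
    pub writer: String,
    pub scope: Scope,
    pub policy_version: u32,
    pub membership_epoch: u32,
    pub item: String,
    pub weight: f64,
}

/// Remove contributions in `scope`, optionally only those by `writer`.
/// Everything else is kept, so deletion is precise rather than global.
pub fn remove_from_personalization(signals: &mut Vec<Provenance>, scope: &Scope, writer: Option<&str>) -> usize {
    let before = signals.len();
    signals.retain(|s| !(&s.scope == scope && writer.map_or(true, |w| s.writer == w)));
    before - signals.len()
}

/// Explain an item's score as per-writer totals, largest first.
pub fn attribute(signals: &[Provenance], item: &str) -> Vec<(String, f64)> {
    let mut by_writer: BTreeMap<&str, f64> = BTreeMap::new();
    for s in signals.iter().filter(|s| s.item == item) {
        *by_writer.entry(s.writer.as_str()).or_insert(0.0) += s.weight;
    }
    let mut out: Vec<(String, f64)> = by_writer.into_iter().map(|(w, t)| (w.to_string(), t)).collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```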

Done When

Communities and users can control ranking influence with explicit governance and agent rights, while retaining user-owned, revocable personalization semantics end-to-end.


Use Case Coverage Progression

| UC | Description | M1 | M2 | M3 | M4 | M5 | M6 | M7 |
|---|---|---|---|---|---|---|---|---|
| UC-01 | For You Feed | - | - | Full | Full | Full | Full | Full |
| UC-02 | Search | - | - | - | - | Core | Full | Full |
| UC-03 | Trending/Rising | Signals | Full | Full | Full | Full | Full | Full |
| UC-04 | Following Feed | - | Partial | Full | Full | Full | Full | Full |
| UC-05 | Related/Up Next | - | - | Core | Core | Core | Full | Full |
| UC-06 | Browse/Category | Signals | Core | Core | Core | Core | Full | Full |
| UC-07 | Notifications | - | - | Core | Core | Core | Full | Full |
| UC-08 | Creator Profile | - | Core | Core | Core | Core | Full | Full |
| UC-09 | User Library | - | - | Partial | Partial | Partial | Full | Full |
| UC-10 | People Search | - | - | - | - | Core | Full | Full |
| UC-11 | Visual/Semantic | - | - | - | - | Partial | Full | Full |
| UC-12 | Live Content | - | - | - | - | - | Full | Full |
| UC-13 | Hidden Gems | - | Full | Full | Full | Full | Full | Full |
| UC-14 | Controversial/Hot | Signals | Full | Full | Full | Full | Full | Full |

Legend:

  • - = Not addressed
  • Signals = Signal primitives exist but no query surface
  • Partial = Some functionality, not all modes
  • Core = Primary query path works, some modes/filters missing
  • Full = All modes, filters, and feedback loops per USE_CASES.md specification

M8-M10 focus on deployment topology, community sync semantics, and governance controls; they leave UC coverage unchanged while making the existing feature surface globally portable, revocable, and policy-safe.


Dependency DAG

m1p1 (Types/Schema) ✓
  |
  +---> m1p2 (WAL) ✓
  |       |
  +---> m1p3 (Storage/fjall) ✓ ---+
  |       |                        |
  |       +---> m1p4 (Signal Ledger) ✓
  |               |
  |               +---> m1p5 (Entity + Signal API) ✓  = M1 COMPLETE ✓
  |               |
  |               +---> m2p3 (Ranking Profiles) ✓
  |                       |
  +---> m2p1 (USearch) ✓ -+
  |                        |
  +---> m2p2 (Filters) ✓ -+---> m2p4 (Diversity) ✓
                           |       |
                           +-------+---> m2p5 (RETRIEVE Query) ✓ = M2 COMPLETE ✓
                           |
                           +---> m3p1 (Users/Creators/Relationships)
                           |       |
                           |       +---> m3p2 (Feedback Loop)
                           |               |
                           |               +---> m3p3 (Personalized Profiles)
                           |                       |
                           |                       +---> m3p4 (User State Filters + UAT)
                           |
                           |       m3p4 = M3 COMPLETE
                           |
                           +---> m4p1 (Tantivy)
                                   |
                                   +---> m4p2 (RRF Fusion)
                                   |       |
                                   |       +---> m4p3 (SEARCH Query)
                                   |
                                   +---> m4p4 (Creator Search)

                                   m4p3 + m4p4 = M4 COMPLETE

                                   M5 phases (provisional) depend on M4
                                   M6 phases (provisional) depend on M5
                                   M7 phases depend on M6
                                   M8 phases depend on M7
                                   M9 phases depend on M8
                                   M10 phases depend on M9

Parallelization opportunities:

  • m1p2 (WAL) and m1p3 (Storage) can be built in parallel after m1p1 (both are now complete; m1p3 finished first, then m1p2)
  • m2p1 (USearch) and m2p2 (Filters) can be built in parallel after m1p3
  • m3p1 (Entities) and m4p1 (Tantivy) can start in parallel with later M2 phases
  • m3p2 Tasks 01 (User Preference Vector) and 03 (Hard Negatives) can be built in parallel within m3p2
  • m4p2 (RRF) and m4p4 (Creator Search) can be built in parallel

Architectural Decisions Locked In

These decisions are made. They are not revisited unless benchmarks prove them wrong.

| Decision | Chosen | Alternative | Rationale |
|---|---|---|---|
| Storage engine | fjall (pure Rust) | RocksDB | Pure Rust, `#![forbid(unsafe_code)]`, fast compile, trait-abstracted for swap |
| Vector index | USearch (C++ FFI) | hnsw_rs | 10-100x QPS, predicate callbacks, mmap, f16 quantization |
| Text search | Tantivy (embedded) | Custom BM25 | 40K lines of battle-tested code; Collector/Scorer API provides exact hooks needed |
| Decay formula | Running `S(t) = S(t_prev) * exp(-lambda * dt) + w` | Raw event scan | O(1) vs O(N), proven exact, 20-60x faster at 50+ events/entity |
| Windowed aggregation | Bucketed counters (Scotty pattern) | SWAG two-stacks | Simpler, serves multiple window sizes from one set of buckets |
| Hybrid fusion | RRF (k=60) | Tuned linear combination | Zero-config, robust; linear combo is the upgrade path with relevance labels |
| Consistency model | DB-primary, Tantivy as derived index | Two-phase commit | Simpler, deterministic recovery, source of truth is always the entity store |
| WAL checksums | BLAKE3 | CRC32C | Content-addressing enables deduplication; BLAKE3 is fast enough |
| Key encoding | Subject-prefix `[entity_id][0x00][TAG:suffix]` | Separate key namespaces | Co-locates entity data, natural shard boundary, single prefix scan |
| Embedding format | f16 quantization (default) | float32 | Half memory, < 1% recall loss at 1536D |
| Query language | Custom (RETRIEVE/SEARCH/SIGNAL) | SQL | Domain semantics cannot be expressed in SQL without losing optimization opportunities |
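The decay formula and hybrid fusion rows of the decisions table are compact enough to sketch, here in plain `f64`. The running decay keeps only `(score, last_ts)` per entity and folds late events forward instead of rewinding. RRF sums `1 / (k + rank)` over 1-based ranks. `DecayState` and `rrf` are hypothetical names, not tidalDB's API.

```rust
use std::collections::HashMap;

/// Running exponential decay: S(t) = S(t_prev) * exp(-lambda * dt) + w.
/// Only (score, last_ts) is stored per entity, so each update is O(1).
#[derive(Debug, Clone, Copy, Default)]
pub struct DecayState {
    pub score: f64,
    pub last_ts: f64,
}

impl DecayState {
    pub fn add(&mut self, ts: f64, weight: f64, lambda: f64) {
        if ts >= self.last_ts {
            self.score = self.score * (-lambda * (ts - self.last_ts)).exp() + weight;
            self.last_ts = ts;
        } else {
            // Late event: decay its weight forward to last_ts instead of rewinding.
            self.score += weight * (-lambda * (self.last_ts - ts)).exp();
        }
    }

    /// Decayed score at `now`, without mutating state.
    pub fn value_at(&self, now: f64, lambda: f64) -> f64 {
        self.score * (-lambda * (now - self.last_ts).max(0.0)).exp()
    }
}

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
/// with 1-based ranks. Ties are broken by id for deterministic output.
pub fn rrf(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut out: Vec<(String, f64)> = scores.into_iter().map(|(id, s)| (id.to_string(), s)).collect();
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

With k = 60, an item ranked moderately in both lists can outrank one ranked first in only one list. Fusing `[a, b, c]` with `[c, b, d]` puts `c` first. This is the robustness the table credits to RRF.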

What This Roadmap Does NOT Cover

These are explicitly out of scope for the foreseeable future:

  1. Embedding generation -- tidalDB retrieves and ranks over vectors. It does not generate them. Bring your own model.
  2. Generic horizontal distribution -- M8-M10 deliver the tidalDB-specific fabric (WAL shipping, shard routing, community sync/revocation, governance). We are still not building a general-purpose distributed SQL store or OLTP replica mesh.
  3. ACID transactions across entities -- Signal writes are atomic within an entity's state. Cross-entity transactions are not needed for the ranking problem.
  4. SQL compatibility -- The custom query language exists because SQL cannot express ranking semantics. No SQL layer.
  5. Per-request hard multitenancy inside a single shard -- M8-M10 introduce tenant-aware namespaces, quotas, and governance controls for hosted deployments, but strong regulatory isolation (HIPAA, PCI) still requires separate deployments per tenant.
  6. Content moderation, authentication, payments, CDN -- tidalDB solves one problem: ranking. Everything else is someone else's job.