jordan c87e9b0fdd docs: mark M0-M8 complete in roadmap with milestone-level summaries

- Add milestone-level COMPLETE summary bullets for M0–M8 (only M8 had one)
- Fix m8p6 lib test count (1199 → 1206 after latest additions)
- Update iknowyou/Aeries date to 2026-02-24
- Each summary captures the key capabilities proved by that milestone

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-24 23:40:10 -07:00


TidalDB Roadmap

Vision Statement

When tidalDB is complete, an engineering team building any content platform -- a media library, a social feed, a marketplace, a discovery surface, or an agentic UX -- can embed a single Rust database and replace the Elasticsearch + Redis + Kafka + feature store + vector database + ranking service stack. One process, one query interface, one operational model. The query RETRIEVE items FOR USER @user_id USING PROFILE for_you FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50 executes in under 50ms, reflects signals written 100ms ago, enforces diversity without application logic, handles cold-start items without application intervention, and returns results a user would describe as "it knows what I want."

The same runtime doubles as the personalization memory substrate for agents: user → agent → tidalDB. Agents ground themselves by reading live session context, write structured signals (preferences, critiques, tool usage) with decay budgets, and immediately query those updates on the next turn. The embeddable runtime is step zero; the exact same WAL + subject-prefix key architecture grows into a multi-region, eventually-consistent fabric so agent memory travels with the user across devices and datacenters without losing correctness.

The long-term model is user-owned personalization across three scopes: global profile, opt-in community overlays, and agent/session context. Users can grant and revoke access per scope, and remove scoped contributions from future ranking without destroying local history.

Thesis

A single embeddable database can replace the 6-system content ranking stack by treating signals, ranking profiles, session policy, and diversity constraints as database primitives rather than application logic. Every agent or product surface gets an always-fresh memory lane without standing up Vespa-scale search clusters or bespoke feature stores.

Differentiation vs Vespa and search platforms

Agent-owned memory lanes. Signals, session context, and reward metadata are schema-level objects. Agents can create scoped sessions, write feedback with decay guarantees, and read it back with zero glue code. Vespa is optimized for serving queries; it assumes you run feature updates elsewhere.
Embeddable-first ergonomics. cargo add tidaldb gives you the full signal + ranking stack with WAL durability and diagnostics in-process. Vespa demands a cluster, config servers, and feed pipelines before you can prototype.
Temporal math on the write path. Decay, windowing, velocity, and diversity guards are computed atomically when signals arrive. There is no notion of "update documents later" or external CRON math.
Session- and policy-aware query language. RETRIEVE ... FOR USER ... FOR SESSION ... USING PROFILE ... encodes permissions, diversity and cohort constraints; agent policies live in schema, not middleware.
Roadmapped scale path. The same WAL segments, subject-prefix keys, and checkpoint formats we ship for the embeddable runtime become the replication log and deterministic conflict-resolution substrate for the distributed fabric (see M8). Vespa starts out distributed; tidalDB grows into a distributed deployment without giving up its zero-config developer experience.

Milestone Summary

#	Name	Proves	Enables
M0	Embeddable Runtime	tidalDB can run in-process with zero-config defaults and tooling	Cuts proof-of-concept friction, enables internal dogfooding
M1	Signal Engine	Signals are a database primitive with O(1) decay, not application math	UC-03 (partial), UC-06 (partial), UC-14 (partial)
M2	Ranked Retrieval	A single query retrieves, scores, and ranks content using live signals	UC-03, UC-04, UC-06, UC-08, UC-13, UC-14
M3	Personalized Ranking	User context shapes retrieval and ranking -- the "For You" query works	UC-01, UC-05, UC-07, UC-09 (partial)
M4	Agent Memory	Agents can create sessions, write signals, and enforce policy inside tidalDB	Agent-mediated personalization, RLHF loops, conversational memory
M5	Hybrid Search	Text + semantic + signal-ranked search in one query	UC-02, UC-10, UC-11
M6	Full Surface Coverage	Every use case, every sort mode, every filter, every feedback loop	UC-01 through UC-14 complete
M7	Production Hardening	Crash safety, graceful degradation, operational readiness	All UCs at production quality
M8	Distributed Fabric	Multi-region, multi-tenant replication keeps agent-memory semantics intact	Hosted tidalDB, cloud/edge deployments, shared agent substrate
M9	Community Sync & Revocation	Local embeddable profiles can opt into community personalization and safely leave/purge contributions	Community personalization, federated taste graphs, shared feeds
M10	Governance & Agent Rights	Community rules and agent-scoped permissions control what signals influence ranking	User-owned AI personalization at scale, policy-compliant agents

Embeddable → Distributed Path

M0–M2 (Embed & prove primitives): Establish the deterministic builder, WAL, key encoding, and checkpoint semantics that make on-device instances safe to embed. Research refs: docs/research/tidaldb_wal.md, docs/research/tidaldb_signal_ledger.md.
M3–M4 (Session + agent policy): Layer user/creator entities, sessions, and policy enforcement so agents can write/read scoped memory lanes without glue. This also defines the logical replication unit: entity + session keyspaces.
M5–M6 (Surface completeness): Ship hybrid search and every retrieval mode so a single tidalDB node can back any personalization surface or agent prompt grounding workload.
M7 (Operational envelope): Hardening (crash fencing, throttling, observability) creates the guarantees the fabric will rely on when shipping WAL segments across machines.
M8 (Distributed Fabric): Introduce shard-aware keyspaces, WAL shipping + deterministic reconciliation, and multi-region eventual-consistency policies so embeddable instances graduate to hosted, global deployments without rewriting application code.
M9 (Community Sync & Revocation): Add opt-in sharing from local profiles to community layers, plus leave/removal semantics (stop-forward + retroactive purge) with deterministic re-materialization.
M10 (Governance & Agent Rights): Add community policy engines and agent capability boundaries so users and communities can control exactly which signals affect ranking and revoke them safely.

Product Milestone Summary (New)

The roadmap now has two tracks:

Engine Track (M0-M10): proves tidalDB capabilities.
Product Track (P0-P4): proves end-user value for the beachhead product.

#	Name	Proves	Depends On
P0	Beachhead Validation	Knowledge workers and consumers care about a personal briefing feed enough to use it repeatedly	M0 (embedding/runtime), partial M1
P1	Concierge Alpha	Daily "Today Brief" with explicit feedback controls creates Day-2 retention in a small cohort	M1 complete, partial M2
PG1	Personalization Core Done (Blocking Gate)	Personalization loop is correct, immediate, and measurably better than baseline	P1 + M1/M2/M3 core slices
P2	Productized Beta	Self-serve onboarding + real-time adaptation + explanation UX works without manual curation	M2 complete, partial M3
P3	Public Launch	The product is reliable, useful, and trusted at real user volume	M3 + M5 core, M6 partial
P4	Scale + Revenue Fit	Sustainable retention and monetization without quality collapse	M6 + M7

Current Status

Phase	Status	Tests
m0p1: Embeddable Runtime Skeleton	COMPLETE	329 passing (293 unit + 36 integration) + 3 doc tests
m0p2: Tooling & Diagnostics	COMPLETE	349 passing (+7 metrics unit + 7 metrics integration + 9 tidalctl CLI)
m0p3: Samples & Docs	COMPLETE	11 doc tests (14 with features); 4 examples compile and run
m1p1: Core Type System and Schema	COMPLETE	77 passing
m1p2: Write-Ahead Log	COMPLETE	passing (unit + integration)
m1p3: Storage Engine Trait and fjall Backend	COMPLETE	140 passing (128 unit + 12 integration)
m1p4: Signal Ledger	COMPLETE	300 passing
m1p5: Entity CRUD and Signal Write API	COMPLETE	305 passing (300 unit + 5 integration)
m2p1: Vector Index Integration (USearch)	COMPLETE	passing
m2p2: Metadata Indexes and Filter Engine	COMPLETE	passing
m2p3: Ranking Profile Engine	COMPLETE	passing
m2p4: Diversity Enforcement	COMPLETE	passing
m2p5: Query Parser and RETRIEVE Executor	COMPLETE	passing
m3p1: User and Creator Entities with Relationships	COMPLETE	passing
m3p2: Feedback Loop -- Signal Writes Update User State	COMPLETE	passing
m3p3: Personalized Ranking Profiles	COMPLETE	passing
m3p4: User State Filters + M3 UAT	COMPLETE	571 lib + 11 m3_uat + 6 m2_uat + 5 signal_api + 8 vector_usearch passing
m4: Agent Session Layer	COMPLETE	607 lib + 12 m4_uat + 11 m3_uat + 7 m2_uat + 5 signal_api + 8 vector_usearch + 12 storage passing
m5p1: Tantivy Integration	COMPLETE	650 lib + 3 text_index integration = 653 passing; BM25 @ 10K docs = 0.26ms
m5p2: Hybrid Fusion (RRF)	COMPLETE	665 lib passing; RRF fusion @ 1K candidates = 46µs
m5p3: SEARCH Query Executor	COMPLETE	681 lib + 12 m5_search integration = 693 passing
m5p4: Creator and People Search	COMPLETE	705 lib + 9 m5_uat + 6 m5p4_creator_search + 12 m5_search = 732 passing
m6p1: Cohort Engine + Cohort-Scoped Trending	COMPLETE	748 total (739 lib + 9 m6_cohort)
m6p2: Social Graph + Collaborative Filtering	COMPLETE	812 lib + 8 m6_social integration
m6p3: Full Sort Modes + Live Content	COMPLETE	777 lib + 15 m6p3_sorts integration
m6p4: Collections + Watch History + Saved Searches	COMPLETE	971 total (794 lib + 10 m6p4_collections)
m6p5: Query Composition + SUGGEST Autocomplete	COMPLETE	1,084 total (830 lib + 11 m6p5_scope)
m6p6: Notification Capping + Adaptive Preferences + M6 UAT	COMPLETE	1,082 total (835 lib + 247 integration); 9 m6_uat passing
forage-p0: Demo Application + Behavioral Loop (Close the Loop)	COMPLETE	Server + seed corpus + MAB + feed page; dwell→completion→prefs→feed shift; 8 UAT scenarios passing; 911 lib + all prior UATs clean
forage-p1: Real Signal Surface (add_item + /capture endpoint)	COMPLETE	`add_item()` (FNV-1a URL-hashing, idempotent), `/capture` HTTP endpoint, discovered items injected into feed pool (capped VecDeque, 1000 entries); code infrastructure complete
forage-p2: Real Embeddings (Semantic Preference Model)	COMPLETE	`forage-embedder` sidecar (OpenAI text-embedding-3-small + deterministic mock mode); `ForageEngineBuilder::with_embedder(url)`; 1536-dim schema; `semantic_boost: 0.3` blended scoring; `similar_to_saved: true` pool augmentation; `semantic_search(text, limit)` + `similar_to(item_id, limit)` public methods; `read_item_embedding()` added to tidalDB; 12 smoke tests passing; 937 tidalDB lib tests clean
forage-p3: Adaptive MAB (Per-User Exploration Tuning)	COMPLETE	`ExplorationStats` (hits/total/category_signals); `adaptive_ratio()` (0.10/0.14/0.25); UCB1 bonus within exploration slot; `exploration_stats(user_id)` public API; `track_signal_stats()` wired to `signal()` + `signal_dwell()`; `last_explore_items` for outcome detection; exploration stats persisted to `exploration_stats.json` (atomic write; loaded on `open()`); 17 smoke tests passing; 960 tidalDB lib clean
forage-p4: The Surprise Moment (Bridge Item)	COMPLETE	`ItemLabel::Bridge { cat_a, cat_b }` variant; `make_bridge_item()` computes normalized midpoint of top-2 preference dims, queries ANN, injects 1 bridge item per feed (replaces last non-Exploring slot); cold users receive no bridge; feed page renders `bridge: {cat_a} × {cat_b}` badge (teal); 20 smoke tests passing
iknowyou M1: Chat Interface (Aeries)	COMPLETE	Next.js 15 + React 19 + Tailwind v4 (OKLCH dark); SSE streaming from Qwen3-8B via vLLM; Zustand state; port 59521
iknowyou M2: Memory Layer (Synap)	COMPLETE	Conversations persist; observer extracts learnings; top-5 vivid memories injected into system prompt; conversation sidebar with localStorage + Synap
iknowyou M3: Deep Observer	COMPLETE	Two-tier observer: Tier 1 extracts full ObserverOutput (engagement, style, topic, dynamics) on every exchange; Tier 2 synthesizes natural-language observations every 5 turns; dimension-tagged signal storage in Synap
iknowyou M4: Cohort Engine	COMPLETE	Person identity, soft cohort assignment, cohort priors, and profile persistence wired into chat loop
iknowyou M5: Communication Brief	IN PROGRESS	Brief assembly + `/api/brief/[personId]` + prompt injection are live; milestone acceptance validation pending
m7p1: Crash Recovery Hardening	COMPLETE	900 lib + 8 m7_crash_property + 10 m7_crash_m6 + 5 m7_crash_invariant (100 cases); recovery bench passing; WAL compaction, BLAKE3 checkpoint integrity, hard-negative crash invariant
m7p2: Graceful Degradation, Rate Limiting, and Session Cleanup	COMPLETE	896 lib + 12 m7p2_load; 1,191 total (--features test-utils); 4-stage degradation, per-agent token-bucket rate limiter, session TTL sweeper
m7p3: Performance at Scale	COMPLETE	900 lib + all integration; 1,201 total; scale bench (1M items), USearch ef=400, LogMergePolicy, signal trimmer (5M entry cap), social scale tests
m7p4: Operational Visibility	COMPLETE	946 lib + 28 m7p4_visibility (--features test-utils); QueryStats, WAL/signal/index Prometheus metrics, tidalctl diagnostics, RLHF export, cross-session aggregation
Enterprise Readiness + M7 UAT	COMPLETE	960 lib + ~155 integration passing; all P0/P1 gaps resolved; m7_uat.rs passing (crash recovery, degradation, rate limiting, observability, regression gate)
m8p1: Shard-Aware Foundations	COMPLETE	1029 lib; ShardId, RegionId, WalSegmentId, ShardRouter, ReplicationState, NodeConfig/NodeRole, BatchHeader v2, shard-aware segment naming
m8p2: WAL Shipping and Follower Replay	COMPLETE	1054 lib + 8 m8p2_replication integration; Transport trait, InProcessTransport, WalShipper, SegmentReceiver, FollowerDb (ReadOnly guards), ReplicationLagGauge
m8p3: CRDT Counters and Deterministic Reconciliation	COMPLETE	1125 lib + 13 m8p3_crdt property tests; HLC/HlcTimestamp, PNCounter, LWWRegister, CrdtSignalState, ReconciliationEngine, StateSnapshot
m8p4: Session Continuity Across Regions	COMPLETE	1163 lib + 8 m8p4_session integration tests; SessionSeqNo+HWM, IdempotencyKey/Store (lru 0.12), SessionReplicationBridge (sync crossbeam), HardNeg union semantics with HLC gating
m8p5: Control Plane + Multi-Tenancy + Routing	COMPLETE	1194 lib + 5 m8p5_multitenancy integration tests; TenantId/TenantConfig, TenantRateLimiter (AtomicU64 CAS token-bucket), TenantRouter (Jump Consistent Hash), ControlPlane (shard heartbeat + health), TenantMigration state machine, RollingUpgradeCoordinator
m8p6: End-to-End UAT	COMPLETE	1,206 lib + 8 m8_uat tests (0.11s); SimulatedCluster (signal-replay harness, 3 regions), NetworkPartition/ShardCrash RAII fault injection, 5 UAT steps + 3 perf assertions; p99 replication < 2s, failover < 10s, CRDT reconciliation < 100ms
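The m8p5 row cites Jump Consistent Hash for tenant routing. For readers unfamiliar with it, here is a Rust port of the published Lamping and Veach algorithm. It is the textbook version, not necessarily what `TenantRouter` ships, and `jump_hash` is an illustrative name.

```rust
/// Jump Consistent Hash: maps a 64-bit key to a bucket in [0, num_buckets).
/// Each key follows a fixed, strictly increasing sequence of "jump" targets;
/// the result is the last target that is still below num_buckets.
pub fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one bucket");
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < num_buckets as i64 {
        b = j;
        // 64-bit linear congruential step from the paper.
        key = key.wrapping_mul(2_862_933_555_777_941_757).wrapping_add(1);
        // Next jump target; always > b because (key >> 33) + 1 <= 2^31.
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}
```

The property that makes it useful for routing is the one the assertions exercise. When you grow from n to n + 1 shards, a key either stays where it was or moves into the new shard n. It never moves between existing shards.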

M0 Embeddable Runtime: COMPLETE — m0p1 (skeleton), m0p2 (tooling/diagnostics), m0p3 (samples/docs). Zero-config in-process runtime with WAL, fjall backend, and tidalctl CLI operational.

M1 Signal Engine: COMPLETE — m1p1–m1p5 all done. Signals are a database primitive with O(1) decay, windowed aggregation, and velocity — not application math. WAL + fjall durability included.

M2 Ranked Retrieval: COMPLETE — m2p1–m2p5 all done. RETRIEVE query combines vector index (USearch), metadata filters, ranking profiles, and diversity in one operation.

M3 Personalized Ranking: COMPLETE — m3p1–m3p4 all done. User/creator entities, feedback loop, personalized profiles, hard negatives, and "For You" query working end-to-end.

M4 Agent Memory: COMPLETE — Sessions, session policy, RLHF export, cross-session aggregation, and crash recovery for agent-mediated personalization all operational.

M5 Hybrid Search: COMPLETE — m5p1–m5p4 all done. BM25 + ANN + RRF fusion, creator search, similar-to, search_click feedback. Hybrid search < 50ms; creator search < 20ms. Re-verified 2026-02-24: 1,206 lib + 27 M5 integration tests passing.

M6 Full Surface Coverage: COMPLETE — m6p1–m6p6 all done. All 14 use cases, every sort mode, cohort trending, social graph scoping, collections, live content, notification capping, adaptive preferences, SUGGEST autocomplete. Re-verified 2026-02-24: 1,206 lib + 70 M6 integration tests passing.

M7 Production Hardening: COMPLETE — m7p1–m7p4 + Enterprise Readiness all done. Crash recovery (BLAKE3 integrity, WAL compaction), 4-stage graceful degradation, per-agent rate limiting, session TTL sweeper, scale to 1M items, Prometheus metrics, tidalctl diagnostics, RLHF export.

M8 Distributed Fabric: COMPLETE — m8p1–m8p6 all done. Shard-aware keyspaces, WAL shipping + follower replay, CRDT counters + deterministic reconciliation, session continuity across regions, control plane + multi-tenancy + jump-consistent routing, rolling upgrade coordinator. 1,206 lib + all phase integration tests passing.

Forage: COMPLETE — All 5 phases done (P0: demo loop, P1: real signal surface, P2: semantic embeddings, P3: adaptive MAB, P4: bridge/surprise moment). Chrome extension + forage-server + forage-engine + forage-embedder sidecar all operational.

iknowyou / Aeries: IN PROGRESS (as of 2026-02-24) — M1–M4 complete. M5 (Communication Brief) is in progress with core implementation live; acceptance validation pending.

Next (engine): M9 Phase 1 — Signal Scope and Share Contract. Next (product): iknowyou M5 acceptance pass, then M6 Closed Loop (session lifecycle + preference drift validation).

Product Track: Personal Briefing Feed (Knowledge Workers + Consumers)

This track defines the milestones for the actual product experience (not only the database engine).
Use case reference: docs/personal-briefing-beachhead.md. Dedicated roadmap: docs/planning/PRODUCT_ROADMAP.md.

P0: Beachhead Validation -- "Do users care enough to return?"

Milestone Thesis

Validate that a personal briefing feed solves a painful daily job for users and drives repeat use.

Acceptance Criteria

Recruit 20-50 target users (knowledge workers + high-intent consumers).
Run daily briefing prototype (can include manual source QA).
At least one meaningful feedback action per session for the median user (more, less, hide, mute, save).
User interviews confirm value vs baseline feeds ("less noise", "more useful", "saves time").
D2 retention reaches the agreed threshold for the target segment.

P1: Concierge Alpha -- "High-value daily brief for a narrow cohort"

Milestone Thesis

Deliver a reliable daily Today Brief experience with immediate visible adaptation after user feedback.

Acceptance Criteria

App surface: ranked brief, reason labels, source links, save/feedback controls.
Feedback loop: next refresh reflects less/hide/mute actions immediately.
Time-budget mode (5/10/20 min) is available and used.
Diversity constraints prevent source/topic domination in top results.
Weekly active usage demonstrates repeated utility.

P2: Productized Beta -- "Self-serve and repeatable without handholding"

Milestone Thesis

Turn the alpha into a self-serve product with stable onboarding, trust UX, and measurable quality.

Acceptance Criteria

Self-serve onboarding completed in under 3 minutes.
"Why this" explanations are present and understandable on every briefing card.
Cohort layer available ("trending for people like you").
Trust controls available (source transparency, mute/hide persistence).
D7 retention and "useful item rate" exceed those of a baseline comparison feed.
PG1 Personalization Core Done gate has passed.

P3: Public Launch -- "Trusted at real volume"

Milestone Thesis

Launch publicly with reliability, quality, and trust guardrails suitable for broad use.

Acceptance Criteria

Reliability and latency SLOs defined and met for briefing generation.
Quality floor enforced (freshness, source quality, duplicate suppression).
Notification cadence controls prevent spam.
Core support and incident process in place for user-facing regressions.

P4: Scale + Revenue Fit -- "Sustainable business without degrading quality"

Milestone Thesis

Prove the product can grow and monetize while preserving user trust and briefing quality.

Acceptance Criteria

Monetization model validated (subscription, team plan, or equivalent).
Revenue metrics tracked alongside quality metrics (no quality-revenue trade-off regressions).
Retention and engagement remain stable as volume increases.
Product roadmap for next segment expansion is data-backed.

PG1: Personalization Core Done (Blocking Gate)

Milestone Thesis

Before product breadth expansion, the core personalization loop must be provably correct and immediately responsive.

Acceptance Criteria

Hard negatives (hide/mute/block) never leak after write, restart, or replay.
Explicit feedback (more/less/skip/save) changes next-refresh ranking within target latency.
User personalization state rebuilds deterministically from checkpoint + WAL replay.
Useful-item rate and repeated-unwanted-item rate outperform a non-personalized baseline.
Diversity guardrails hold while maintaining personalization quality.

Milestone 0: Embeddable Runtime -- "Runs in your process in minutes"

Milestone Thesis

Before we prove any ranking math, developers must be able to embed tidalDB inside an existing service with zero operational prep. M0 delivers the runtime glue — an ergonomic builder API, deterministic storage layout, a tiny admin CLI, and living examples — so the very first experience is cargo add tidaldb, TidalDb::builder().in_memory().open(), and a passing smoke test.

Phases

Phase 1: Embeddable Runtime Skeleton

Delivers: A cohesive Config/Builder API for single-process use, with in-memory and filesystem-backed defaults, sandboxed data directories, and graceful shutdown hooks developers can call from tests or application drop handlers.

Builder exposes ephemeral() / single_process() shortcuts and eagerly validates directories.
Shutdown hooks drain WAL writer threads and surface errors.
Temp-directory helper guarantees deterministic cleanup (used in doctests).
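Deterministic cleanup is typically an RAII guard. The guard creates a uniquely named directory and deletes it in `Drop`, so doctests clean up even when an assertion panics. This is a minimal std-only sketch; `TempDataDir` and its naming scheme are hypothetical, not the helper tidalDB ships.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};

// Per-process counter so parallel tests in one binary get distinct directories.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical data directory: created on construction, removed on drop.
pub struct TempDataDir {
    path: PathBuf,
}

impl TempDataDir {
    pub fn new(prefix: &str) -> io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let path = std::env::temp_dir().join(format!("{}-{}-{}", prefix, process::id(), id));
        // create_dir (not create_dir_all) errors if the path already exists,
        // so a stale directory is reported eagerly instead of silently reused.
        fs::create_dir(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempDataDir {
    fn drop(&mut self) {
        // Best-effort: Drop cannot return an error.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```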

Phase 2: Tooling & Diagnostics

Delivers: tidalctl (a minimal CLI) for inspecting embedded instances, plus a lightweight metrics surface (Prometheus text or JSON) tagged with the same IDs future distributed deployments will use.

tidalctl status --path <dir> returns JSON with WAL seq, config hash, uptime.
Metrics endpoint optional (disabled by default) exposes /metrics and /healthz.
Tooling reuses the same path helpers from Phase 1.
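A Prometheus text response is simple enough to render without a client library. This sketch shows the exposition format for gauges. Shard and region labels stand in for the distributed IDs mentioned above. The metric and label names are hypothetical, not tidalDB's actual metric set.

```rust
use std::fmt::Write;

/// One gauge sample with its labels.
pub struct Gauge<'a> {
    pub name: &'a str,
    pub help: &'a str,
    pub labels: &'a [(&'a str, &'a str)],
    pub value: f64,
}

/// Escapes a label value as the text format requires (backslash, quote, newline).
pub fn escape_label(value: &str) -> String {
    value.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

/// Renders gauges in the Prometheus text exposition format served at /metrics.
pub fn render(gauges: &[Gauge<'_>]) -> String {
    let mut out = String::new();
    for g in gauges {
        // writeln! into a String cannot fail, so the fmt::Result is discarded.
        let _ = writeln!(out, "# HELP {} {}", g.name, g.help);
        let _ = writeln!(out, "# TYPE {} gauge", g.name);
        let labels: Vec<String> = g
            .labels
            .iter()
            .map(|(k, v)| format!("{}=\"{}\"", k, escape_label(v)))
            .collect();
        if labels.is_empty() {
            let _ = writeln!(out, "{} {}", g.name, g.value);
        } else {
            let _ = writeln!(out, "{}{{{}}} {}", g.name, labels.join(","), g.value);
        }
    }
    out
}
```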

Phase 3: Samples & Docs

Delivers: Quick-start samples (For You POC + integration tests) compiled as doctests, and reference snippets for embedding tidalDB inside Axum/Actix or a CLI app. Keeps DX in lockstep with the runtime.

Quickstart example + doctests run under CI (cargo test --doc and cargo test --examples, run as separate commands because --doc cannot be combined with other target flags).
Axum/Actix embedding examples include graceful shutdown + metrics wiring.
CONTRIBUTING updated with a “run samples” checklist.

UAT Scenario

Given:
  // in tests/lib.rs
  let db = TidalDb::builder()
      .ephemeral()
      .with_temp_dir()
      .open()
      .unwrap();

When:
  db.health_check();           // ok
  tidalctl status --path <dir> // prints WAL, storage, signal counts
  cargo test --doc             // quick-start snippet compiles & runs

Then:
  - Builder defaults require zero manual config
  - CLI connects to the same files used by the embedded process
  - Samples stay in sync (failing doctest fails CI)

Milestone 1: Signal Engine -- "Signals are a database primitive"

Milestone Thesis

A developer can open a tidalDB instance, define signal types with decay rates, write engagement events, and read back decay-correct scores and windowed aggregates -- all without computing any temporal math in application code. This proves that the hardest primitive (temporal signals with O(1) decay, velocity, and windowed aggregation) works correctly and meets the performance budget.

UAT Scenario

Given:
  A tidalDB instance is opened with a schema defining:
    - Entity type: Item with metadata fields (title, category, created_at)
    - Signal type: "view" with exponential decay, half_life=7d, windows=[1h, 24h, 7d]
    - Signal type: "like" with exponential decay, half_life=14d, windows=[24h, 7d, all_time]
    - Signal type: "skip" with exponential decay, half_life=1d, windows=[1h, 24h]

When:
  1. Write 100 items with metadata
  2. Write 10,000 signal events across the items (views, likes, skips)
     with timestamps spanning the last 7 days
  3. Read the decay score for item #42, signal "view", at current time
  4. Read the windowed count for item #42, signal "view", window=24h
  5. Read the velocity for item #42, signal "view", window=1h
  6. Write a new "view" event for item #42
  7. Immediately re-read the decay score, windowed count, and velocity
  8. Close and reopen the tidalDB instance
  9. Re-read all values for item #42

Then:
  - Step 3: Decay score matches S(t) = sum(w_i * exp(-lambda * (t - t_i)))
    computed analytically from raw events, to 6 decimal places
  - Step 4: Windowed count equals the exact count of "view" events
    within the last 24h window
  - Step 5: Velocity equals windowed_count / window_duration
  - Step 7: All values reflect the new event immediately
    (decay score increased, count incremented, velocity updated)
  - Step 9: All values match step 7 (crash recovery preserves state)
  - Performance: decay score read < 100ns per entity,
    signal write < 100us including WAL fsync (amortized),
    200-entity scoring pass < 5us
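The Step 3 check and the running update used in Phase 4 agree by construction. Factoring out the common decay term turns the analytic sum into an update with one multiply. The derivation below assumes events at t_1 <= ... <= t_n with weights w_i and half-life h. The last line works the numbers for the "view" signal.

```latex
\begin{aligned}
S(t) &= \sum_{i=1}^{n} w_i\, e^{-\lambda (t - t_i)}, \qquad \lambda = \frac{\ln 2}{h} \\[4pt]
S(t_{n+1}) &= \sum_{i=1}^{n+1} w_i\, e^{-\lambda (t_{n+1} - t_i)}
            = e^{-\lambda (t_{n+1} - t_n)} \sum_{i=1}^{n} w_i\, e^{-\lambda (t_n - t_i)} + w_{n+1} \\
           &= S(t_n)\, e^{-\lambda\, \Delta t} + w_{n+1}, \qquad \Delta t = t_{n+1} - t_n \\[4pt]
S(t) &= S(t_n)\, e^{-\lambda (t - t_n)} \quad \text{for any read time } t \ge t_n \\[4pt]
S(t_n) &\leftarrow S(t_n) + w_e\, e^{-\lambda (t_n - t_e)} \quad \text{for a late event } t_e < t_n \\[4pt]
h = 7\,\text{d} = 604\,800\,\text{s}: \quad \lambda &\approx 1.1461 \times 10^{-6}\,\text{s}^{-1}, \qquad e^{-\lambda h} = e^{-\ln 2} = \tfrac{1}{2}
\end{aligned}
```

A read only multiplies the stored total by one exponential, so its cost does not depend on how many events the entity has received.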

Phases

Phase 1: Core Type System and Schema

Delivers: The foundational type system -- entity IDs, signal type definitions, decay rate declarations, window specifications, and the error types that every subsequent module depends on. The schema module that validates and stores signal/entity definitions.

Acceptance Criteria:

EntityId is a u64 newtype with Display, Hash, Eq, Ord, to_be_bytes() (big-endian, preserves numeric ordering)
EntityKind enum: Item, User, Creator
SignalTypeDef captures: name, target EntityKind, DecayModel (exponential with pre-computed lambda / linear / permanent), WindowSet, velocity enabled flag
DecayModel::Exponential stores pre-computed lambda = ln(2) / half_life.as_secs_f64() -- no division on hot path
Window enum: OneHour, TwentyFourHours, SevenDays, ThirtyDays, AllTime with duration(), label(), duration_secs_f64()
WindowSet deduplicates and sorts windows; empty() for permanent signals
LumenError enum covers Storage, NotFound, Schema, Durability, Query, Internal variants with From impls for each sub-error
SchemaError enum validates: duplicate signal names, invalid identifiers, zero half-life/lifetime, empty windows for non-permanent signals, velocity without windows
Schema validation via SchemaBuilder rejects invalid configurations at construction time
Property tests: lambda correctness across half-life range, byte ordering preservation
cargo fmt clean, cargo clippy -D warnings clean, all 77 tests pass
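Several of these criteria can be sketched directly: the ordering-preserving `EntityId`, the pre-computed lambda, and the sorted, de-duplicated `WindowSet`. This is a minimal sketch. The shapes are assumptions; in particular, `Window::duration()` returning `None` for `AllTime` is a guess, not the shipped API.

```rust
use std::time::Duration;

/// Entity identifier. Big-endian bytes sort in the same order as the numbers.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EntityId(pub u64);

impl EntityId {
    pub fn to_be_bytes(self) -> [u8; 8] {
        self.0.to_be_bytes()
    }
}

/// Decay model with lambda computed once at schema build time.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DecayModel {
    Exponential { lambda: f64 },
    Permanent,
}

impl DecayModel {
    /// lambda = ln(2) / half_life; None for a zero half-life (mirrors the SchemaError check).
    pub fn exponential(half_life: Duration) -> Option<Self> {
        if half_life.is_zero() {
            return None;
        }
        Some(DecayModel::Exponential {
            lambda: std::f64::consts::LN_2 / half_life.as_secs_f64(),
        })
    }

    /// Decay multiplier over `dt_secs`: a multiply and an exp, no division.
    pub fn factor(&self, dt_secs: f64) -> f64 {
        match *self {
            DecayModel::Exponential { lambda } => (-lambda * dt_secs).exp(),
            DecayModel::Permanent => 1.0,
        }
    }
}

/// Declared in ascending duration, so the derived Ord sorts windows by length.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Window { OneHour, TwentyFourHours, SevenDays, ThirtyDays, AllTime }

impl Window {
    pub fn duration(self) -> Option<Duration> {
        let hours = match self {
            Window::OneHour => 1,
            Window::TwentyFourHours => 24,
            Window::SevenDays => 24 * 7,
            Window::ThirtyDays => 24 * 30,
            Window::AllTime => return None,
        };
        Some(Duration::from_secs(hours * 3600))
    }
}

/// Sorted, de-duplicated windows; empty for permanent signals.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WindowSet(Vec<Window>);

impl WindowSet {
    pub fn new(mut windows: Vec<Window>) -> Self {
        windows.sort();
        windows.dedup();
        WindowSet(windows)
    }

    pub fn empty() -> Self {
        WindowSet(Vec::new())
    }

    pub fn as_slice(&self) -> &[Window] {
        &self.0
    }
}
```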

Depends On: None
Complexity: M
Research Reference: docs/research/tidaldb_signal_ledger.md (decay formula, EntityState struct)

Phase 2: Write-Ahead Log

Delivers: A durable, append-only log for signal events. Every signal write is fsync'd before acknowledgment. Group commit amortizes fsync cost. Content-addressed events via BLAKE3 for deduplication. The WAL is the source of truth -- all other state is derived.

Acceptance Criteria:

WAL entries are length-prefixed with BLAKE3 checksums
Group commit batches up to 100 events or 10ms, whichever comes first
Duplicate events (same BLAKE3 hash) are silently deduplicated
WAL replay from any checkpoint produces identical state to uninterrupted execution (property test with 10,000+ random event sequences)
fsync is called per batch, not per event
WAL can be truncated after a checkpoint without losing committed state
Crash simulation (kill at random WAL positions) never produces corrupt state -- either the event is committed or it is not
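The batching and deduplication rules above reduce to a small state machine. It accepts an event unless its content hash was already seen, and signals a flush when the batch reaches 100 events or its oldest event has waited 10 ms. This is a sketch under stated assumptions:

- `GroupCommit` is a hypothetical name.
- The caller supplies a 32-byte digest standing in for the BLAKE3 hash.
- The file append and fsync are left to the caller.

```rust
use std::collections::HashSet;
use std::time::{Duration, Instant};

// Flush policy from the acceptance criteria: up to 100 events or 10 ms.
const MAX_BATCH_EVENTS: usize = 100;
const MAX_BATCH_DELAY: Duration = Duration::from_millis(10);

/// Hypothetical group-commit buffer.
pub struct GroupCommit {
    pending: Vec<Vec<u8>>,
    // Digests already accepted. Unbounded here for brevity; a real WAL
    // would bound this set or rebuild it from checkpoints.
    seen: HashSet<[u8; 32]>,
    batch_started: Option<Instant>,
}

impl GroupCommit {
    pub fn new() -> Self {
        Self { pending: Vec::new(), seen: HashSet::new(), batch_started: None }
    }

    /// Queues an event; returns false and drops it if the digest was seen before.
    pub fn push(&mut self, digest: [u8; 32], payload: Vec<u8>, now: Instant) -> bool {
        if !self.seen.insert(digest) {
            return false;
        }
        if self.pending.is_empty() {
            self.batch_started = Some(now);
        }
        self.pending.push(payload);
        true
    }

    /// True once the batch is full or its oldest event has waited MAX_BATCH_DELAY.
    pub fn should_flush(&self, now: Instant) -> bool {
        match self.batch_started {
            None => false,
            Some(start) => {
                self.pending.len() >= MAX_BATCH_EVENTS
                    || now.saturating_duration_since(start) >= MAX_BATCH_DELAY
            }
        }
    }

    /// Hands the whole batch to the writer, which appends it and issues one fsync.
    pub fn take_batch(&mut self) -> Vec<Vec<u8>> {
        self.batch_started = None;
        std::mem::take(&mut self.pending)
    }
}
```

One fsync per `take_batch` is what amortizes durability: in a full batch of 100 events, each event pays roughly 1/100 of a sync.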

Depends On: Phase 1
Complexity: L
Research Reference: docs/research/tidaldb_wal.md (wire format, group commit, crash detection, deduplication), thoughts.md Part II.1 (WAL convergence), Part V.5-6 (quarantine-first, group commit)

Phase 3: Storage Engine Trait and fjall Backend

Delivers: The StorageEngine trait abstraction and two implementations: FjallBackend (fjall 3 LSM-tree) for production and InMemoryBackend (BTreeMap + RwLock) for deterministic testing. Key encoding follows the subject-prefix pattern with a Tag discriminant. FjallStorage coordinates three keyspaces per entity kind. FjallAtomicBatch provides cross-keyspace atomic writes.
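The trait boundary and the test backend fit in a few dozen lines. This is a simplified sketch. The signatures are assumptions: methods are infallible, and `scan_prefix` returns a `Vec` instead of the boxed `PrefixIterator`. Only the `BTreeMap + RwLock` design comes from the text.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;
use std::sync::RwLock;

pub type Key = Vec<u8>;
pub type Value = Vec<u8>;

/// One operation inside an atomic batch.
pub enum BatchOp {
    Put(Key, Value),
    Delete(Key),
}

/// Simplified storage trait; a real one would return Results.
pub trait StorageEngine: Send + Sync {
    fn get(&self, key: &[u8]) -> Option<Value>;
    fn put(&self, key: Key, value: Value);
    fn delete(&self, key: &[u8]);
    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Key, Value)>;
    fn write_batch(&self, ops: Vec<BatchOp>);
}

/// Deterministic, sorted in-memory backend for tests.
#[derive(Default)]
pub struct InMemoryBackend {
    map: RwLock<BTreeMap<Key, Value>>,
}

impl StorageEngine for InMemoryBackend {
    fn get(&self, key: &[u8]) -> Option<Value> {
        self.map.read().unwrap().get(key).cloned()
    }

    fn put(&self, key: Key, value: Value) {
        self.map.write().unwrap().insert(key, value);
    }

    fn delete(&self, key: &[u8]) {
        self.map.write().unwrap().remove(key);
    }

    fn scan_prefix(&self, prefix: &[u8]) -> Vec<(Key, Value)> {
        let map = self.map.read().unwrap();
        // Keys are sorted, so every key with this prefix sits in one contiguous run.
        map.range::<[u8], _>((Bound::Included(prefix), Bound::Unbounded))
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }

    fn write_batch(&self, ops: Vec<BatchOp>) {
        // Holding the write lock across the batch makes it atomic to readers.
        let mut map = self.map.write().unwrap();
        for op in ops {
            match op {
                BatchOp::Put(k, v) => {
                    map.insert(k, v);
                }
                BatchOp::Delete(k) => {
                    map.remove(&k);
                }
            }
        }
    }
}
```

Because `BTreeMap` keeps keys sorted, a prefix scan is a range seek followed by `take_while`. That is the same access pattern the subject-prefix key layout relies on in the LSM backend.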

Acceptance Criteria:

StorageEngine trait with get, put, delete, scan_prefix, write_batch, flush operations
Key encoding: [entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...] with Tag enum (Evt=0x01, Sig=0x02, Meta=0x03, Rel=0x04, Mv=0x05, Idx=0x06)
encode_key, parse_key roundtrip correctly for all tag variants and arbitrary suffixes
entity_prefix (9 bytes) and entity_tag_prefix (10 bytes) for scoped prefix scans
Byte-lexicographic key ordering matches numeric entity ID ordering (property tested)
FjallBackend wraps a single fjall Keyspace, implements StorageEngine
FjallStorage owns a fjall Database with three keyspaces: "items", "users", "creators" (one per EntityKind)
FjallStorage::backend(EntityKind) routes to the correct keyspace backend
Entity kind isolation: same key written to different entity kinds does not collide
FjallAtomicBatch provides cross-keyspace atomic writes via fjall::OwnedWriteBatch
Data persists across close and reopen (flush_all + reopen test)
InMemoryBackend uses BTreeMap + RwLock for deterministic, sorted, concurrent testing
WriteBatch and BatchOp types for atomic multi-operation writes
PrefixIterator type alias for boxed prefix scan iterators
Property tests with proptest: encode/parse roundtrip, prefix ordering, prefix containment
Criterion benchmarks passing
cargo fmt clean, cargo clippy -D warnings clean, all 140 tests pass (128 unit + 12 integration)
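The key layout above can be sketched with the standard library alone. The function names follow the acceptance criteria. The signatures are assumptions made to keep the example self-contained: a bare `u64` instead of `EntityId`, and `Option` instead of a typed error. Big-endian IDs make byte order match numeric order. The fixed 9- and 10-byte prefixes let a prefix scan select one entity, or one entity's tag.

```rust
/// Tag discriminants from the acceptance criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum Tag { Evt = 0x01, Sig = 0x02, Meta = 0x03, Rel = 0x04, Mv = 0x05, Idx = 0x06 }

impl Tag {
    fn from_byte(b: u8) -> Option<Tag> {
        Some(match b {
            0x01 => Tag::Evt,
            0x02 => Tag::Sig,
            0x03 => Tag::Meta,
            0x04 => Tag::Rel,
            0x05 => Tag::Mv,
            0x06 => Tag::Idx,
            _ => return None,
        })
    }
}

const SEP: u8 = 0x00;

/// [entity_id: 8 bytes BE][0x00][tag: 1 byte][suffix...]
pub fn encode_key(entity_id: u64, tag: Tag, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    key.extend_from_slice(&entity_id.to_be_bytes());
    key.push(SEP);
    key.push(tag as u8);
    key.extend_from_slice(suffix);
    key
}

/// Inverse of encode_key; None for short keys, a bad separator, or an unknown tag.
pub fn parse_key(key: &[u8]) -> Option<(u64, Tag, &[u8])> {
    if key.len() < 10 || key[8] != SEP {
        return None;
    }
    let id = u64::from_be_bytes(key[..8].try_into().ok()?);
    let tag = Tag::from_byte(key[9])?;
    Some((id, tag, &key[10..]))
}

/// 9-byte prefix matching every key of one entity.
pub fn entity_prefix(entity_id: u64) -> [u8; 9] {
    let mut p = [0u8; 9];
    p[..8].copy_from_slice(&entity_id.to_be_bytes());
    p[8] = SEP;
    p
}

/// 10-byte prefix matching one entity's keys for a single tag.
pub fn entity_tag_prefix(entity_id: u64, tag: Tag) -> [u8; 10] {
    let mut p = [0u8; 10];
    p[..9].copy_from_slice(&entity_prefix(entity_id));
    p[9] = tag as u8;
    p
}
```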

Depends On: Phase 1
Complexity: L
Research Reference: thoughts.md Part V.9 (hybrid storage), Part V.12 (subject-prefix keys), CODING_GUIDELINES.md section 2

Phase 4: Signal Ledger -- Decay Scores and Windowed Aggregation

Delivers: The in-memory per-entity signal state with running decay scores (O(1) update, O(1) read) and bucketed windowed counters. Signal writes update the running scores atomically. Signal reads return decay-correct values without scanning raw events. State is checkpointed to storage for crash recovery.
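Here is a minimal sketch of the per-entity running score, assuming a single lambda and timestamps in seconds as `f64`. `DecayState` is a hypothetical name. The real `EntitySignalState` packs several scores into one cache line. The sketch applies the in-order update S(t) = S(t_prev) * exp(-lambda * dt) + weight and the out-of-order pre-decay rule from this phase's acceptance criteria.

```rust
/// Hypothetical running score for one (entity, signal, lambda).
#[derive(Debug, Clone, Copy, Default)]
pub struct DecayState {
    score: f64,
    last_update: Option<f64>,
}

impl DecayState {
    /// O(1) write path.
    pub fn apply(&mut self, lambda: f64, weight: f64, t_event: f64) {
        match self.last_update {
            None => {
                self.score = weight;
                self.last_update = Some(t_event);
            }
            Some(last) if t_event >= last => {
                // In order: decay the total forward to t_event, then add the weight.
                self.score = self.score * (-lambda * (t_event - last)).exp() + weight;
                self.last_update = Some(t_event);
            }
            Some(last) => {
                // Out of order: pre-decay the late weight to last_update; the anchor stays.
                self.score += weight * (-lambda * (last - t_event)).exp();
            }
        }
    }

    /// O(1) read path: decay the stored total from last_update to `t`.
    pub fn read(&self, lambda: f64, t: f64) -> f64 {
        match self.last_update {
            None => 0.0,
            // Clamp so a read time before last_update never inflates the score.
            Some(last) => self.score * (-lambda * (t - last).max(0.0)).exp(),
        }
    }
}
```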

Acceptance Criteria:

EntitySignalState is #[repr(C, align(64))] -- one L1 cache line per hot-path struct
Running decay formula: S(t) = S(t_prev) * exp(-lambda * dt) + weight -- mathematically exact, verified against analytical brute-force computation to 6 decimal places across 10,000 random event sequences (property test)
Out-of-order events handled correctly: when t_event < last_update, weight is pre-decayed: score += weight * exp(-lambda * (last_update - t_event))
Windowed counts use per-minute bucketed counters (BucketedCounter) supporting 1h/24h/7d windows
Velocity = windowed_count / window_duration_seconds
Signal write latency < 100 microseconds including WAL write (amortized), benchmarked with criterion
Decay score read latency < 100ns per entity per lambda, benchmarked with criterion
200-entity scoring pass < 5 microseconds, benchmarked with criterion
State checkpointed to storage every 30 seconds; crash recovery reconstructs from checkpoint + WAL replay
DashMap or sharded map for concurrent entity state access; signal counters use AtomicU64 with Relaxed ordering
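The per-minute windowed counts and velocity can be illustrated with a ring buffer of minute buckets. This is a sketch, not the shipped `BucketedCounter`. It assumes integer timestamps in seconds and a ring at least as long as the largest window. For clarity, its read is a linear scan rather than the constant-time running sums a production counter would keep.

```rust
/// Marks a slot that has never held a minute.
const EMPTY: u64 = u64::MAX;

/// Hypothetical per-minute counter backed by a ring buffer.
pub struct BucketedCounter {
    counts: Vec<u64>,
    // Absolute minute (t / 60) each slot currently holds.
    minute_of: Vec<u64>,
}

impl BucketedCounter {
    /// `capacity_minutes` must cover the longest window (1440 for 24h, 10080 for 7d).
    pub fn new(capacity_minutes: usize) -> Self {
        assert!(capacity_minutes > 0);
        Self { counts: vec![0; capacity_minutes], minute_of: vec![EMPTY; capacity_minutes] }
    }

    pub fn record(&mut self, t_secs: u64) {
        let minute = t_secs / 60;
        let slot = (minute % self.counts.len() as u64) as usize;
        let held = self.minute_of[slot];
        if held != minute {
            if held != EMPTY && held > minute {
                return; // older than the ring's horizon: cannot be bucketed
            }
            // Slot is unused or holds an expired minute: recycle it.
            self.minute_of[slot] = minute;
            self.counts[slot] = 0;
        }
        self.counts[slot] += 1;
    }

    /// Events in the `window_minutes` minutes ending with the minute containing `now_secs`.
    pub fn windowed_count(&self, now_secs: u64, window_minutes: u64) -> u64 {
        let now_min = now_secs / 60;
        let oldest = now_min.saturating_sub(window_minutes.saturating_sub(1));
        self.minute_of
            .iter()
            .zip(&self.counts)
            // EMPTY (u64::MAX) is never <= now_min, so unused slots drop out here.
            .filter(|(m, _)| **m >= oldest && **m <= now_min)
            .map(|(_, c)| *c)
            .sum()
    }

    /// Velocity = windowed_count / window_duration_seconds.
    pub fn velocity(&self, now_secs: u64, window_minutes: u64) -> f64 {
        self.windowed_count(now_secs, window_minutes) as f64 / (window_minutes * 60) as f64
    }
}
```

Minute buckets make windows accurate to one minute. A "1h" window covers the current partial minute plus the 59 before it.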

Depends On: Phase 2, Phase 3
Complexity: XL
Research Reference: docs/research/tidaldb_signal_ledger.md (running-score formula, SWAG, BucketedCounter, EntityState struct, three-tier architecture)

Phase 5: Entity CRUD and Signal Write API

Delivers: The public API surface for Milestone 1. TidalDB::open(), TidalDB::shutdown(), entity write/read, signal write/read. This is the interface the UAT scenario tests against. Includes the signal() method that atomically writes to WAL, updates in-memory state, and returns immediately.

Acceptance Criteria:

TidalDB::open(config) opens storage, restores in-memory state from checkpoint + WAL replay, returns Result<TidalDB>
TidalDB::shutdown() checkpoints all in-memory state, syncs WAL, closes storage cleanly
db.write_item(id, metadata) stores entity metadata
db.signal(signal_type, entity_id, weight, timestamp) atomically: appends to WAL, updates decay scores, updates windowed counters
db.read_decay_score(entity_id, signal_type, lambda_index) returns current decayed score
db.read_windowed_count(entity_id, signal_type, window) returns count within window
db.read_velocity(entity_id, signal_type, window) returns count / window_duration
Full UAT scenario passes as an integration test
TidalDB is Send + Sync -- safe to share across threads behind Arc

Depends On: Phase 4 Complexity: M Research Reference: CODING_GUIDELINES.md section 9 (public API surface)

Deferred to Later Milestones

User entities and preference vectors -- deferred to M3 because M1 proves the signal primitive without needing user context
Creator entities and relationship edges -- deferred to M2/M3 because M1 only needs items to prove signal correctness
Vector index (USearch) -- deferred to M2 because M1 does not need ANN retrieval
Text index (Tantivy) -- deferred to M4 because M1 does not need full-text search
Ranking profiles -- deferred to M2 because M1 proves signals work; M2 proves ranking over signals works
Query parser -- deferred to M2; M1 uses the Rust API directly
Diversity enforcement -- deferred to M2 because M1 does not produce ranked result sets
Signal rollups (hourly/daily materialization) -- deferred to M5 because the bucketed counter approach serves the performance budget through M4; rollups become necessary only at scale for 30d+ windows
RocksDB backend -- deferred indefinitely; fjall is the primary backend, and RocksDB remains the trait-abstracted fallback if benchmarks demand it

Integration Test

#[test]
fn milestone_1_uat() {
    // Open tidalDB with signal schema
    let db = TidalDB::open(Config {
        data_dir: temp_dir(),
        schema: Schema::builder()
            .entity_type("item", &["title", "category", "created_at"])
            .signal("view", Decay::exponential(Duration::days(7)),
                    &[Window::Hours(1), Window::Hours(24), Window::Days(7)])
            .signal("like", Decay::exponential(Duration::days(14)),
                    &[Window::Hours(24), Window::Days(7), Window::AllTime])
            .signal("skip", Decay::exponential(Duration::days(1)),
                    &[Window::Hours(1), Window::Hours(24)])
            .build(),
    }).unwrap();

    // Write 100 items
    for i in 0..100 {
        db.write_item(EntityId(i), metadata(i)).unwrap();
    }

    // Write 10,000 signal events spanning 7 days
    let events = generate_events(10_000, Duration::days(7));
    for e in &events {
        db.signal(e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Read and verify item #42
    let now = Timestamp::now();
    let analytical_score = compute_analytical_decay(&events, EntityId(42), "view", now);
    let actual_score = db.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!((actual_score - analytical_score).abs() < 1e-6);

    let analytical_count = count_events_in_window(&events, EntityId(42), "view", now, Duration::hours(24));
    let actual_count = db.read_windowed_count(EntityId(42), "view", Window::Hours(24)).unwrap();
    assert_eq!(actual_count, analytical_count);

    // Write new event and verify immediate visibility
    db.signal("view", EntityId(42), 1.0, now).unwrap();
    let new_score = db.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!(new_score > actual_score);

    // Close, reopen, verify persistence
    db.shutdown().unwrap();
    let db2 = TidalDB::open(same_config()).unwrap();
    let recovered_score = db2.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!((recovered_score - new_score).abs() < 1e-6);
}

Done When

A developer can embed tidalDB as a Rust dependency, define signal types with decay rates and windows in schema, write thousands of signal events, and read back decay-correct scores, windowed counts, and velocity values that match analytical computation to 6 decimal places -- including after a crash and restart. Performance benchmarks pass: signal write < 100us amortized, decay read < 100ns per entity, 200-entity scoring < 5us.

Milestone 2: Ranked Retrieval -- "A single query retrieves, scores, and ranks content"

Milestone Thesis

A developer can write items with metadata and embeddings, write signal events, and execute a RETRIEVE query that returns items ranked by a named profile using live signal scores -- with metadata filters and diversity constraints applied by the database, not the application. This proves that ranking is a database operation, not application logic.

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items with metadata (title, category, format, duration, created_at)
      and 1536-dim embeddings
    - Signal types: view (7d decay), like (14d decay), skip (1d decay),
      share (3d decay), completion (30d decay)
    - 100,000 signal events spanning 7 days across the items
    - Ranking profiles defined:
      * "trending" -- share_velocity(6h) primary, view_velocity(6h) secondary,
        engagement_ratio gate > 0.03
      * "hot" -- score / (age_hours + 2)^1.8
      * "new" -- created_at DESC
      * "top_week" -- quality_score within 7d window
      * "hidden_gems" -- high completion_rate, inverse view_count
      * "controversial" -- max(likes * dislikes)

When:
  1. RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25
  2. RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20
  3. RETRIEVE items USING PROFILE new LIMIT 20
  4. RETRIEVE items USING PROFILE top_week LIMIT 20
  5. RETRIEVE items USING PROFILE hidden_gems FILTER min_completion_rate:0.7 LIMIT 10
  6. RETRIEVE items USING PROFILE controversial LIMIT 10
  7. Write a burst of 100 "share" signals for item #500
  8. Re-execute the trending query

Then:
  - Step 1: Items ordered by share velocity, max 1 per creator, items with
    engagement_ratio < 0.03 excluded
  - Step 2: Only jazz items returned, ordered by hot formula
  - Step 3: Items ordered by created_at descending, no signal computation
  - Step 4: Items ordered by quality score computed from 7d-windowed signals
  - Step 5: Items with high completion but low views, sorted by quality/reach ratio
  - Step 6: Items with highest product of positive and negative signals
  - Step 7: All 100 share signal writes succeed
  - Step 8: Item #500 appears higher in trending results (signal written 100ms ago
    is reflected)
  - Performance: end-to-end RETRIEVE < 50ms for 10K items

Phases

Phase 1: Vector Index Integration (USearch)

Delivers: USearch wrapped behind a trait, with mmap persistence, f16 quantization, and the adaptive filtered search planner. Items can be inserted with embeddings and retrieved by ANN similarity.

Acceptance Criteria:

VectorIndex trait with insert(key, vector), remove(key), search(query, k), filtered_search(query, k, predicate), save(), load(), view()
USearch backend implements the trait with f16 quantization (default), mmap persistence
Vectors normalized to unit length at insertion time; for unit vectors, squared L2 distance equals 2 - 2 * cosine similarity, so L2 ordering matches cosine ordering
Adaptive query planner: selectivity < 2% triggers pre-filter + brute-force; 2-100% uses filtered_search with predicate callback
ANN retrieval at 10K vectors returns top-100 with recall@10 > 0.95
ANN retrieval latency < 10ms at 10K vectors (benchmarked)
Persistence: save on checkpoint, view() on restart for immediate read serving
#![forbid(unsafe_code)] relaxed only in the USearch FFI boundary module with SAFETY comments
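Normalizing at insertion time is what lets an L2 index serve cosine queries. For unit vectors a and b, |a - b|^2 = 2 - 2 cos(a, b). Sorting by ascending L2 distance is therefore the same as sorting by descending cosine similarity.

The check below is a minimal sketch of that identity. The helper names (`normalize`, `dot`, `l2_sq`) are illustrative, not USearch APIs.

```rust
/// Scale `v` to unit L2 norm in place; a zero vector is left unchanged.
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// For unit vectors the dot product is the cosine similarity.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Squared Euclidean distance.
pub fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}
```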

Depends On: m1p3 (storage traits) Complexity: L Research Reference: docs/research/ann_for_tidaldb.md (USearch architecture, filtered search, f16, mmap)

Phase 2: Metadata Indexes and Filter Engine

Delivers: Roaring bitmap indexes for categorical metadata, B-tree indexes for range attributes, and a composable filter engine that evaluates arbitrary filter combinations. The filter engine produces either a bitmap (for pre-filtering ANN) or a predicate closure (for in-graph filtering).

Acceptance Criteria:

One roaring bitmap per distinct value of each categorical attribute: category, format, creator_id
B-tree index for range attributes: created_at, duration
Filter expressions are composable: AND across dimensions, OR within a dimension
filter.selectivity() estimates the fraction of items matching (for query planner)
filter.to_bitmap() returns a RoaringBitmap for pre-filtering
filter.to_predicate() returns a Fn(EntityId) -> bool for in-graph filtering
Filters tested: category:jazz, format:video, duration_min:5m, created_within:7d, and arbitrary combinations
Filter evaluation < 1 microsecond per candidate (benchmarked)

Depends On: m1p3 (storage engine) Complexity: M Research Reference: docs/research/ann_for_tidaldb.md (metadata indexes, selectivity estimation, roaring bitmaps)

Phase 3: Ranking Profile Engine

Delivers: Named ranking profiles declared as data (not compiled code), parsed, validated, stored, and executed by the database. Profiles reference signal scores, windowed aggregates, velocity, metadata fields, and define quality gates. Profiles are versioned and swappable at query time.

Acceptance Criteria:

Profile declaration syntax supports: primary signal, secondary signals with weights, BOOST, GATE (minimum threshold), PENALIZE, EXCLUDE
Profiles stored in schema, versioned, retrievable by name
Profile execution: given a candidate set and a profile, produce a scored and sorted result list
Built-in profiles implemented: trending, hot, new, top_week, top_month, top_all_time, hidden_gems, controversial, most_viewed, most_liked, shuffle
hot formula: log10(max(|positive - negative|, 1)) / (age_hours + 2)^gravity with configurable gravity
controversial formula: (positive * negative) / (positive + negative)^2
hidden_gems formula: quality_score * (1 / log10(view_count + 10)) -- the +10 prevents division by zero for items with zero views
Profile change does not require recompile -- profiles are runtime data
200-candidate scoring pass with decay-only profile < 10 microseconds, with velocity-based profile (trending) < 100 microseconds (both Criterion benchmarked)
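The three closed-form formulas above translate directly into scoring functions. The sketch below transcribes them as written. The function names are illustrative, and the zero-vote guard in `controversial` is an added assumption. A gravity of 1.8 matches the `hot` profile in the M2 UAT.

```rust
/// hot: log10(max(|positive - negative|, 1)) / (age_hours + 2)^gravity
pub fn hot(positive: f64, negative: f64, age_hours: f64, gravity: f64) -> f64 {
    let net = (positive - negative).abs().max(1.0);
    net.log10() / (age_hours + 2.0).powf(gravity)
}

/// controversial: (positive * negative) / (positive + negative)^2; 0 when there are no votes.
pub fn controversial(positive: f64, negative: f64) -> f64 {
    let total = positive + negative;
    if total == 0.0 { 0.0 } else { positive * negative / (total * total) }
}

/// hidden_gems: quality_score * (1 / log10(view_count + 10)); the denominator is >= 1.
pub fn hidden_gems(quality_score: f64, view_count: f64) -> f64 {
    quality_score / (view_count + 10.0).log10()
}
```

As written, this `controversial` form peaks at 0.25 whenever positive equals negative, regardless of volume. It therefore differs from the raw likes * dislikes product used in the M2 UAT profile description.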

Depends On: m1p4 (signal ledger) Complexity: L Research Reference: VISION.md (ranking profile declarations), ai-lookup/services/ranking-profiles.md, USE_CASES.md Appendix B (sort mode formulas)

Phase 4: Diversity Enforcement

Delivers: Post-scoring diversity pass that reorders results to satisfy constraints (max_per_creator, format_mix) without reducing result count. Implemented as a greedy selection pass over the scored candidate list.

Acceptance Criteria:

max_per_creator:N enforced: no more than N items from any single creator in the result set
format_mix:true enforced: no more than 60% of results from any single format
Diversity pass does not reduce result count -- it selects the next-best candidate that satisfies constraints
Diversity pass adds < 1ms for 200 candidates (benchmarked)
When diversity constraints cannot be fully satisfied (too few creators), results are returned with a warning flag, not an error
Property test: diversity constraints hold for 10,000 random candidate sets
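A greedy max_per_creator pass might look like the sketch below. It walks the score-sorted candidates and takes the next-best item that keeps every creator within the cap.

The "warning flag" behavior is one possible reading, labeled as an assumption. When too few creators exist to fill the limit, the sketch backfills with the best skipped candidates and sets `constraint_relaxed` instead of returning an error. `Scored`, `Diversified`, and `enforce_max_per_creator` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub id: u64,
    pub creator: u64,
    pub score: f64,
}

pub struct Diversified {
    pub items: Vec<Scored>,
    pub constraint_relaxed: bool, // the "warning flag"
}

/// `candidates` must already be sorted by score, best first.
pub fn enforce_max_per_creator(candidates: &[Scored], max_per_creator: usize, limit: usize) -> Diversified {
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut picked = Vec::with_capacity(limit);
    let mut deferred = Vec::new();
    for c in candidates {
        if picked.len() == limit {
            break;
        }
        let n = per_creator.entry(c.creator).or_insert(0);
        if *n < max_per_creator {
            *n += 1;
            picked.push(c.clone());
        } else {
            deferred.push(c.clone()); // over the cap: keep as a backfill candidate
        }
    }
    // Only reached with room left if every candidate was examined.
    let mut relaxed = false;
    for c in deferred {
        if picked.len() == limit {
            break;
        }
        picked.push(c);
        relaxed = true;
    }
    Diversified { items: picked, constraint_relaxed: relaxed }
}
```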

Depends On: Phase 3 (ranking profiles produce scored lists) Complexity: M Research Reference: VISION.md (diversity as query constraint), thoughts.md Part V.14 (MMR post-scoring)

Phase 5: Query Parser and RETRIEVE Executor

Delivers: The query parser for the RETRIEVE operation and the executor that orchestrates candidate retrieval, filtering, scoring, diversity, and result assembly. This is the "one query" entry point. For M2, the RETRIEVE query does not require FOR USER (no personalization yet) -- it operates on the full item corpus with filters and profiles.

Acceptance Criteria:

Parser handles: RETRIEVE items, USING PROFILE <name>, FILTER <conditions>, DIVERSITY <constraints>, LIMIT <n>, EXCLUDE [ids]
Parser produces a typed AST; parse errors include position and helpful message
Executor pipeline: candidate retrieval (ANN or full scan based on profile) -> filter -> score -> diversity -> limit -> return
When profile uses velocity/decay signals, executor uses ANN retrieval over embeddings then scores with signal state
When profile is new or alphabetical, executor skips ANN and uses metadata index directly
End-to-end RETRIEVE latency < 50ms at 10K items (benchmarked)
Results include: entity_id, score, and a signal snapshot (key signal values used in scoring) for debugging/transparency
SIGNAL write command also parsed and routed to signal write path from M1
Full M2 UAT scenario passes as an integration test
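To make "typed AST with positioned errors" concrete, here is a minimal clause-by-clause parser. It covers only the M2 subset USING PROFILE, FILTER, DIVERSITY, and LIMIT, and leaves out EXCLUDE and SIGNAL. Clauses may appear in any order after `RETRIEVE items`, and error positions are byte offsets. `RetrieveAst`, `ParseError`, and `parse_retrieve` are hypothetical names.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct RetrieveAst {
    pub profile: Option<String>,
    pub filters: Vec<String>,
    pub diversity: Vec<(String, String)>,
    pub limit: Option<usize>,
}

#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub position: usize, // byte offset into the query string
    pub message: String,
}

/// Split on whitespace, remembering each token's byte offset.
fn tokens(input: &str) -> Vec<(usize, &str)> {
    let (mut out, mut start) = (Vec::new(), None);
    for (i, ch) in input.char_indices() {
        if ch.is_whitespace() {
            if let Some(s) = start.take() {
                out.push((s, &input[s..i]));
            }
        } else if start.is_none() {
            start = Some(i);
        }
    }
    if let Some(s) = start {
        out.push((s, &input[s..]));
    }
    out
}

fn next_tok<'a>(toks: &[(usize, &'a str)], i: &mut usize, what: &str, end: usize) -> Result<(usize, &'a str), ParseError> {
    let tok = toks.get(*i).copied().ok_or_else(|| ParseError {
        position: end,
        message: format!("expected {what}, found end of query"),
    })?;
    *i += 1;
    Ok(tok)
}

fn expect_kw(toks: &[(usize, &str)], i: &mut usize, kw: &str, end: usize) -> Result<(), ParseError> {
    let (pos, t) = next_tok(toks, i, &format!("`{kw}`"), end)?;
    if t.eq_ignore_ascii_case(kw) {
        Ok(())
    } else {
        Err(ParseError { position: pos, message: format!("expected `{kw}`, found `{t}`") })
    }
}

pub fn parse_retrieve(input: &str) -> Result<RetrieveAst, ParseError> {
    let (toks, end) = (tokens(input), input.len());
    let (mut i, mut ast) = (0, RetrieveAst::default());
    expect_kw(&toks, &mut i, "RETRIEVE", end)?;
    expect_kw(&toks, &mut i, "items", end)?;
    while let Some(&(pos, kw)) = toks.get(i) {
        i += 1;
        match kw.to_ascii_uppercase().as_str() {
            "USING" => {
                expect_kw(&toks, &mut i, "PROFILE", end)?;
                ast.profile = Some(next_tok(&toks, &mut i, "profile name", end)?.1.to_string());
            }
            // A trailing comma means another filter condition follows.
            "FILTER" => loop {
                let (_, cond) = next_tok(&toks, &mut i, "filter condition", end)?;
                ast.filters.push(cond.trim_end_matches(',').to_string());
                if !cond.ends_with(',') {
                    break;
                }
            },
            "DIVERSITY" => {
                let (p, c) = next_tok(&toks, &mut i, "diversity constraint", end)?;
                let (k, v) = c.split_once(':').ok_or_else(|| ParseError {
                    position: p,
                    message: format!("expected `key:value`, found `{c}`"),
                })?;
                ast.diversity.push((k.to_string(), v.to_string()));
            }
            "LIMIT" => {
                let (p, n) = next_tok(&toks, &mut i, "a row count", end)?;
                ast.limit = Some(n.parse().map_err(|_| ParseError {
                    position: p,
                    message: format!("LIMIT expects an integer, found `{n}`"),
                })?);
            }
            _ => return Err(ParseError { position: pos, message: format!("unexpected `{kw}`") }),
        }
    }
    Ok(ast)
}
```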

Depends On: Phase 1, Phase 2, Phase 3, Phase 4 Complexity: L Research Reference: ai-lookup/features/query-language.md, SEQUENCE.md (all sequence diagrams)

Deferred to Later Milestones

FOR USER clause and user preference vectors -- deferred to M3; M2 proves ranking works without personalization
SIMILAR TO clause (related content) -- deferred to M3; requires user context for personalization layer
Relationship graph (follows, blocks) -- deferred to M3; M2 filters on metadata, not relationships
SEARCH query (text + semantic) -- deferred to M4; M2 proves RETRIEVE ranking
Full-text index (Tantivy) -- deferred to M4
Exploration budget / cold start -- deferred to M3; requires user context to be meaningful
User state filters (unseen, saved, liked) -- deferred to M3; requires user entities
Engagement threshold filters (min_views, min_likes) -- partially implemented via signal reads; full composable filter syntax deferred to M5

Integration Test

#[test]
fn milestone_2_uat() {
    let db = open_with_full_schema();

    // Write 10K items with embeddings
    for i in 0..10_000 {
        db.write_item(EntityId(i), metadata(i), Some(embedding(i))).unwrap();
    }

    // Write 100K signal events
    for e in generate_events(100_000, Duration::days(7)) {
        db.signal(e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Trending query with diversity
    let results = db.retrieve(
        "RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25"
    ).unwrap();
    assert_eq!(results.len(), 25);
    assert!(results.windows(2).all(|w| w[0].score >= w[1].score));
    assert!(creator_counts(&results).values().all(|&c| c <= 1));

    // Category filter with hot sort
    let jazz = db.retrieve(
        "RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20"
    ).unwrap();
    assert!(jazz.iter().all(|r| r.metadata["category"] == "jazz"));

    // Signal freshness: write burst, verify ranking change
    let pre_burst = db.retrieve(
        "RETRIEVE items USING PROFILE trending LIMIT 10"
    ).unwrap();
    for _ in 0..100 {
        db.signal("share", EntityId(500), 1.0, Timestamp::now()).unwrap();
    }
    let post_burst = db.retrieve(
        "RETRIEVE items USING PROFILE trending LIMIT 10"
    ).unwrap();
    let pre_rank = pre_burst.iter().position(|r| r.id == EntityId(500));
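    let post_rank = post_burst.iter().position(|r| r.id == EntityId(500))
        .expect("item #500 should be in the top 10 after the share burst");
    // Absent from the pre-burst top 10 is treated as rank 10, just past the LIMIT
    assert!(post_rank < pre_rank.unwrap_or(10));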
}

Done When

A developer can write items with embeddings and metadata, write signal events, and execute RETRIEVE queries with any of the 11+ built-in sort modes, metadata filters, and diversity constraints. Results are correctly ranked by the named profile. Signal events written 100ms ago are reflected in the next query. End-to-end latency < 50ms at 10K items. Diversity constraints hold in every result set.

Milestone 3: Personalized Ranking -- "The For You query works"

Milestone Thesis

A developer can write user entities with preference vectors, write relationship edges (follows, blocks), write engagement signals that update user profiles and relationship weights automatically, and execute RETRIEVE items FOR USER @user_id USING PROFILE for_you -- getting results shaped by the user's history, relationships, and implicit preferences. This proves that the feedback loop closes inside the database.

Enables

UC-01 (For You Feed) -- Full: personalized ranking with diversity, exploration, cold start
UC-04 (Following Feed) -- Full: restricted to followed creators, chronological + quality tiebreaker
UC-05 (Related/Up Next) -- Core: ANN retrieval from source item, user preference re-ranking
UC-07 (Notifications) -- Core: relationship-strength scoring, recency filtering
UC-09 (User Library) -- Partial: unseen/liked/saved filters enable history and library queries

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items across 200 creators, with embeddings
    - 500 users with initial preference embeddings
    - Relationship edges: follows, blocks
    - Signals: view, like, skip, hide, completion, share
    - 500,000 historical signal events establishing user preferences
    - Profiles: for_you, following, related, notification

When:
  1. RETRIEVE items FOR USER @user_42 USING PROFILE for_you
     FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50
  2. RETRIEVE items FOR USER @user_42 FILTER relationship:follows
     USING PROFILE following LIMIT 50
  3. RETRIEVE items SIMILAR TO @item_abc FOR USER @user_42
     USING PROFILE related FILTER unseen LIMIT 10
  4. SIGNAL like item:@item_xyz user:@user_42
  5. Re-execute the for_you query
  6. SIGNAL hide item:@item_999 user:@user_42
  7. SIGNAL block user:@user_42 target_creator:@creator_77
  8. Re-execute the for_you query

Then:
  - Step 1: Results personalized -- items matching user_42's preference vector
    rank higher; items from blocked creators excluded; items already seen excluded;
    max 2 per creator; 10% exploration budget (items from unfollowed creators)
  - Step 2: Only items from followed creators, chronological order
  - Step 3: Items semantically similar to @item_abc, re-ranked by user_42's
    preference match, already-seen excluded
  - Step 4: Signal write atomically updates: item like count, user->creator
    interaction weight, user preference vector shifted toward item embedding
  - Step 5: Results shift -- items similar to @item_xyz's topic rank higher;
    creator of @item_xyz appears more frequently
  - Step 6: @item_999 never appears in any future query for user_42
  - Step 7: All items by creator_77 excluded from all queries for user_42
  - Step 8: No items from creator_77; no item_999; shift from like reflected

Phases

Phase 1: User and Creator Entities with Relationships (m3p1)

Delivers: User and creator entity types stored in their own fjall keyspaces (EntityKind::User, EntityKind::Creator) with preference embeddings, metadata, and a relationship graph. Relationship edges are (from_entity, to_entity, type, weight, timestamp) stored under the Tag::Rel key prefix. Three user-state bitmap indexes (FollowsBitmap, UserSeenBitmap, UserBlockedSet) power the unseen, unblocked, and relationship:follows filters.

Acceptance Criteria:

db.write_user(user_id, metadata, Option<embedding>) stores user entity in the users keyspace
db.write_creator(creator_id, metadata, Option<embedding>) stores creator entity in the creators keyspace
db.write_relationship(from, to, rel_type, weight, timestamp) stores a directional weighted edge
db.read_relationship(from, to, rel_type) returns Option<RelationshipEdge>
db.list_relationships(from, rel_type) returns all edges of a type from a source entity
Relationship types supported: follows, blocks, interaction_weight, hide, mute
Key encoding: [from_entity_id][0x00][REL][type_byte][to_entity_id] for O(1) lookup and prefix scan by (from, type)
FollowsBitmap::for_user(user_id) returns a RoaringBitmap of item IDs from all followed creators
UserSeenBitmap::for_user(user_id) returns a RoaringBitmap of item IDs the user has viewed
UserBlockedSet::for_user(user_id) returns blocked creator IDs + hidden item IDs
Relationship write/read latency < 50 microseconds (benchmarked)
User and creator entities persist across shutdown and restart
Relationships persist across shutdown and restart via storage engine
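The key layout supports both point lookups and (from, type) prefix scans, provided IDs are encoded so that byte order matches numeric order. The sketch assumes:

- u64 entity IDs in big-endian;
- a hypothetical `TAG_REL` byte value and hypothetical type-byte assignments, since the real `Tag::Rel` and type bytes are not given here.

In the check, a `BTreeMap<Vec<u8>, _>` stands in for an ordered fjall keyspace.

```rust
/// Hypothetical tag byte for relationship keys.
pub const TAG_REL: u8 = 0x52;

/// Hypothetical type bytes for the five relationship types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum RelType {
    Follows = 1,
    Blocks = 2,
    InteractionWeight = 3,
    Hide = 4,
    Mute = 5,
}

/// [from_entity_id: 8 bytes BE][0x00][REL][type_byte][to_entity_id: 8 bytes BE]
pub fn rel_key(from: u64, rel: RelType, to: u64) -> [u8; 19] {
    let mut k = [0u8; 19];
    k[..8].copy_from_slice(&from.to_be_bytes());
    k[8] = 0x00;
    k[9] = TAG_REL;
    k[10] = rel as u8;
    k[11..].copy_from_slice(&to.to_be_bytes());
    k
}

/// First 11 bytes (everything except to_entity_id), used to scan all edges of one type.
pub fn rel_prefix(from: u64, rel: RelType) -> [u8; 11] {
    let mut p = [0u8; 11];
    p.copy_from_slice(&rel_key(from, rel, 0)[..11]);
    p
}
```

Because the target ID sits last, a prefix scan over `rel_prefix(user, Follows)` yields followed creators in ascending ID order. Edges of other types or from other users fall outside the prefix.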

Task Breakdown:

#	Task	Delivers	Complexity
01	User + Creator Entity Types and Storage	`UserEntity`, `CreatorEntity`, write/read APIs, metadata codec, embedding slots	M
02	Relationship Graph	`RelationshipEdge`, `RelationshipType`, storage codec, CRUD operations, prefix scan	L
03	User-State Bitmap Indexes	`FollowsBitmap`, `UserSeenBitmap`, `UserBlockedSet`, bitmap maintenance hooks	M

Depends On: m1p1 (types), m1p3 (storage engine, key encoding, Tag::Rel), m1p5 (entity write API pattern), m2p1 (vector index for embedding storage), m2p2 (bitmap indexes, FilterExpr, FilterResult) Complexity: L (3 sequential tasks: 01 -> 02 -> 03) Research Reference: docs/research/tidaldb_signal_ledger.md (three-tier storage, subject-prefix keys), docs/research/ann_for_tidaldb.md (user preference vector in embedding slot), thoughts.md Part V.12 (subject-prefix keys), Part V.16 (user preference vector)

Phase 2: Feedback Loop -- Signal Writes Update User State (m3p2)

Delivers: Atomic multi-state updates on signal write. When a signal event is written (view, like, skip, hide, block, completion, share), the database atomically updates: the item's signal ledger, the user's preference vector (EMA), the user-to-creator interaction weight, and the user-state bitmap indexes. Four components: (1) user preference vector EMA update with configurable learning rate, (2) interaction weight ledger using the existing decay infrastructure from m1p4, (3) hard negative storage with WAL-backed durability, and (4) an atomic signal dispatch that wires all state updates into a single transactional signal write.

Acceptance Criteria:

db.signal("view", item_id, 1.0, ts) with user context atomically: updates item signal ledger, marks item as seen in UserSeenBitmap, increments user->creator interaction weight
db.signal("like", item_id, 1.0, ts) with user context atomically: updates item signal ledger, shifts user preference vector toward item embedding (EMA), increments user->creator interaction weight
db.signal("skip", item_id, 1.0, ts) with user context atomically: updates item signal ledger, shifts user preference vector away from item embedding, decays user->creator interaction weight
db.signal("hide", item_id, 1.0, ts) with user context atomically: writes permanent hide edge, adds item to UserBlockedSet.hidden_items, excludes from all future queries for this user
db.signal("block", user_id, creator_id, ...) atomically: writes permanent block edge, adds creator to UserBlockedSet.blocked_creators, excludes all creator items from all future queries
Preference vector EMA: pref_new = normalize(alpha * item_embedding + (1 - alpha) * pref_old) with configurable alpha (default 0.1)
Interaction weights use the same DecayModel::Exponential infrastructure from m1p4
Hard negatives (hide/block) are WAL-backed and survive crash + replay
Property test: for any sequence of hide/block/signal events, a RETRIEVE query NEVER returns a hidden item or blocked creator's items
All updates visible to the next query (no eventual consistency lag within the process)
Signal dispatch overhead < 50 microseconds beyond the base item signal write
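The like-path preference update is a single EMA step followed by renormalization. The sketch below transcribes `pref_new = normalize(alpha * item_embedding + (1 - alpha) * pref_old)`. The skip path ("shift away") is not shown because its formula is not specified here. `ema_update` is a hypothetical name.

One property of this sketch worth noting: starting from an all-zero (cold-start) preference vector, the first update simply adopts the item's direction.

```rust
/// Scale `v` to unit L2 norm in place; a zero vector is left unchanged.
pub fn normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

/// pref_new = normalize(alpha * item_embedding + (1 - alpha) * pref_old)
pub fn ema_update(pref: &mut [f32], item: &[f32], alpha: f32) {
    debug_assert_eq!(pref.len(), item.len());
    for (p, &x) in pref.iter_mut().zip(item) {
        *p = alpha * x + (1.0 - alpha) * *p;
    }
    normalize(pref);
}
```

With the default alpha of 0.1, each like moves the preference vector about a tenth of the way toward the liked item, so repeated likes of one topic converge on it.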

Task Breakdown:

#	Task	Delivers	Complexity
01	User Preference Vector	EMA update, normalization, learning rate config, cold-start initialization, storage codec	L
02	Interaction Weight Ledger	User-to-creator weights using decay infrastructure, update on engagement signals, read API	M
03	Hard Negatives	Hide/block permanent storage, WAL-backed durability, crash-safe replay, bitmap integration	L
04	Atomic Signal Dispatch	`UserSignalContext` wiring, multi-target dispatch, property tests for correctness invariants	L

Depends On: m3p1 (user/creator entities, relationship graph, user-state bitmaps), m1p4 (signal ledger, decay infrastructure), m1p5 (signal write API), m2p1 (vector index for embedding reads) Complexity: XL (4 tasks; Tasks 01 and 03 can parallelize; Task 04 depends on all three) Research Reference: docs/research/tidaldb_signal_ledger.md (three-tier storage, signal dispatch), docs/research/ann_for_tidaldb.md (user preference vector management), thoughts.md Part V.16 (user preference vector as database-managed embedding)

Phase 3: Personalized Ranking Profiles (m3p3)

Delivers: Four personalized ranking profiles (for_you, following, related, notification) that incorporate user context into scoring, plus cold-start handling for new users and new items. The FOR USER @user_id clause is parsed and resolved into a UserContext that loads the user's preference vector, interaction weights, followed creators, and blocked state. The SIMILAR TO @item_id clause is parsed for the related profile. The profile executor uses this context to score candidates with personalization factors.

Acceptance Criteria:

FOR USER @user_id clause parsed by the query parser and resolved into UserContext
SIMILAR TO @item_id clause parsed for related-content retrieval
UserContext loaded from UserStateIndex, InteractionWeightLedger, preference vector
for_you profile: ANN retrieval using user preference vector, scoring = preference_match * engagement_velocity * recency_decay * social_proof, gates on completion_rate, penalizes skip count, 10% exploration budget
following profile: candidates restricted to followed creators' items (via FollowsBitmap), sorted by created_at DESC
related profile: ANN retrieval using source item embedding, re-ranked by user preference match, seen items excluded
notification profile: candidates from followed creators' recent items, scored by relationship_strength * item_quality
Cold-start users (no preference vector): fall back to population-level signals (trending/quality)
Cold-start items (no signals): exploration window -- appear in ~2% of for_you feeds
Exploration budget: ~5 of 50 for_you results from unfollowed creators to prevent filter bubbles
ProfileExecutor extended with score_with_user_context() method
for_you, following, related, notification added to ProfileRegistry as builtins
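The exploration budget (about 5 of 50 for_you results from unfollowed creators) can be enforced by reserving a share of result slots. In the sketch below, the exploration items are spaced evenly through the feed, which is an assumption; the acceptance criteria specify only the roughly 10% share.

`blend_with_exploration` is hypothetical. It also assumes the two input lists are disjoint and already ranked.

```rust
/// Fill `limit` slots from `exploit` (personalized, best first), reserving about
/// `budget` of them for `explore` (unfollowed creators), spaced evenly.
pub fn blend_with_exploration(exploit: &[u64], explore: &[u64], limit: usize, budget: f64) -> Vec<u64> {
    let slots = (limit as f64 * budget.clamp(0.0, 1.0)).round() as usize;
    // With 0 < slots <= limit, interval >= 1; the value is unused when slots == 0.
    let interval = if slots == 0 { usize::MAX } else { limit / slots };
    let (mut ex, mut ep) = (exploit.iter(), explore.iter());
    let mut out = Vec::with_capacity(limit);
    let mut used = 0;
    while out.len() < limit {
        // Every `interval`-th position is an exploration slot until the budget is spent.
        let explore_slot = used < slots && (out.len() + 1) % interval == 0;
        if explore_slot {
            used += 1;
        }
        // Fall back to the other list when the preferred one runs dry.
        let next = if explore_slot {
            ep.next().or_else(|| ex.next())
        } else {
            ex.next().or_else(|| ep.next())
        };
        match next {
            Some(&id) => out.push(id),
            None => break,
        }
    }
    out
}
```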

Task Breakdown:

#	Task	Delivers	Complexity
01	FOR USER Query Context	`UserContext` loader, query parser extensions for `FOR USER` and `SIMILAR TO`, planner integration	M
02	Personalized Profiles	`for_you`, `following`, `related`, `notification` profile implementations, executor extensions	L
03	Cold Start and Exploration	Cold-start user fallback, cold-start item injection, exploration budget enforcement	M

Depends On: m3p2 (feedback loop: preference vectors populated, interaction weights updated, user-state bitmaps maintained), m2p3 (ranking profile engine, ProfileExecutor), m2p5 (query parser, RETRIEVE executor), m2p1 (vector index for ANN retrieval with user preference vector) Complexity: L (3 sequential tasks: 01 -> 02 -> 03) Research Reference: docs/research/ann_for_tidaldb.md (ANN retrieval with user preference vector as query), VISION.md (ranking profiles, personalization factors, cold start), USE_CASES.md (UC-01 For You, UC-04 Following, UC-05 Related, UC-07 Notifications)

Phase 4: User State Filters + M3 UAT Integration Test (m3p4)

Delivers: Composable user-state filters (unseen, unblocked, saved, liked, in_progress) integrated with the existing FilterExpr/FilterResult system from m2p2, plus the end-to-end M3 UAT integration test that proves the full "For You" query works. User-state filters require the FOR USER clause (from m3p3) to resolve user context and are evaluated alongside metadata filters during the RETRIEVE pipeline.

Acceptance Criteria:

FILTER unseen excludes items the user has viewed (via UserSeenBitmap)
FILTER unblocked excludes items from blocked creators and hidden items (via UserBlockedSet)
FILTER saved returns only items the user has saved
FILTER liked returns only items the user has liked
FILTER in_progress returns items with partial completion signal (0.0 < completion < 0.8)
User-state filters compose with metadata filters: FILTER unseen, category:jazz, format:video
User-state filters require the FOR USER clause; using one without it returns a LumenError::Query with a helpful message
FilterExpr extended with Unseen, Unblocked, Saved, Liked, InProgress variants
Filter evaluation produces FilterResult::Predicate for user-state filters (not bitmap)
The RETRIEVE executor intersects user-state predicates with metadata filter bitmaps
Full M3 UAT integration test passes (all 8 UAT scenario steps verified)
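The intersection of user-state predicates with metadata bitmaps can be sketched as follows. The metadata filters first narrow the candidates to a bitmap. Each surviving item is then checked against per-user predicates.

The sketch makes these simplifying assumptions:

- `UserState` is a hypothetical stand-in for `UserSeenBitmap` and `UserBlockedSet`, built on std sets;
- `creator_of` is a hypothetical item-to-creator lookup;
- only Unseen, Unblocked, and InProgress are shown.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical per-user state.
pub struct UserState {
    pub seen: HashSet<u64>,
    pub blocked_creators: HashSet<u64>,
    pub hidden_items: HashSet<u64>,
    pub completion: HashMap<u64, f32>, // latest completion fraction per item
}

pub enum UserFilter {
    Unseen,
    Unblocked,
    InProgress,
}

impl UserFilter {
    pub fn matches(&self, u: &UserState, item: u64, creator_of: &impl Fn(u64) -> u64) -> bool {
        match self {
            UserFilter::Unseen => !u.seen.contains(&item),
            UserFilter::Unblocked => {
                !u.hidden_items.contains(&item) && !u.blocked_creators.contains(&creator_of(item))
            }
            // Partial completion: 0.0 < completion < 0.8
            UserFilter::InProgress => u.completion.get(&item).is_some_and(|&c| c > 0.0 && c < 0.8),
        }
    }
}

/// Walk the metadata bitmap and keep candidates that pass every user-state predicate.
pub fn apply_filters(
    metadata_bitmap: &BTreeSet<u64>,
    filters: &[UserFilter],
    user: &UserState,
    creator_of: impl Fn(u64) -> u64,
) -> Vec<u64> {
    metadata_bitmap
        .iter()
        .copied()
        .filter(|&id| filters.iter().all(|f| f.matches(user, id, &creator_of)))
        .collect()
}
```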

Task Breakdown:

#	Task	Delivers	Complexity
01	User State Filters	`Unseen`, `Unblocked`, `Saved`, `Liked`, `InProgress` filter variants, parser recognition, executor integration	M
02	M3 UAT Integration Test	End-to-end integration test covering all 8 UAT scenario steps, property tests for hard-negative invariants	L

Depends On: m3p3 (personalized profiles, FOR USER query context parsing, cold-start handling), m3p2 (feedback loop: seen bitmaps populated, hard negatives enforced), m3p1 (user-state index, relationships), m2p2 (filter engine, FilterExpr, FilterResult) Complexity: M (2 sequential tasks: 01 -> 02) Research Reference: VISION.md (user-state filters as first-class query primitives), USE_CASES.md (Appendix A: user state filters), API.md (FILTER clause syntax)

Phase Dependency DAG

m3p1 (Users/Creators/Relationships)
    |
    v
m3p2 (Feedback Loop)      [Tasks 01 & 03 parallel within phase]
    |
    v
m3p3 (Personalized Profiles)
    |
    v
m3p4 (User State Filters + UAT)

All four phases are strictly sequential. m3p2 cannot begin without the entity and relationship foundation from m3p1. m3p3 cannot begin without the preference vectors and interaction weights from m3p2. m3p4 cannot begin without the FOR USER clause parsing and profile execution from m3p3.

Within m3p2, Tasks 01 (User Preference Vector) and 03 (Hard Negatives) can be built in parallel. Task 04 (Atomic Signal Dispatch) depends on all three preceding tasks.

Deferred to Later Milestones

SEARCH query with personalization -- deferred to M5; M3 proves personalized RETRIEVE works. Adding text search on top of a proven personalization layer is the correct sequence.
Tantivy integration -- deferred to M5; M3 uses ANN retrieval only. Full-text search requires the hybrid fusion layer (RRF) which belongs in M5.
People/creator search (UC-10) -- deferred to M5; requires Tantivy indexing of creator entities and "creators like X" similarity search.
Social graph traversal for trending ("trending among my follows") -- deferred to M6; requires graph query capabilities beyond the simple follows filter delivered in m3p1. M3 uses population-level signals as a proxy for social proof.
Collaborative filtering -- deferred to M6; M3's related profile uses ANN similarity + user preference re-ranking. Full matrix-factorization-style CF (co-engagement signals, "users who liked X also liked Y") adds a new data structure and compute model.
User-created collections/boards (UC-09.4) -- deferred to M6; collections are a new entity type with their own ranking surface. M3 delivers the simpler user-state filters (saved, liked, in_progress).
Live content status tracking (UC-12) -- deferred to M6; requires real-time viewer count signals and schedule awareness.
Notification frequency capping -- deferred to M6; M3's notification profile ranks recent items from followed creators by relationship_strength * item_quality without per-creator or per-user caps.
Adaptive preference learning rate -- deferred to M6; M3 uses constant alpha (0.1). Adaptive alpha that decays with update count is a refinement that requires tracking per-user update history.
Reverse relationship index (creator -> followers) -- deferred to M6; M3 only needs forward traversal (user -> creators they follow). Reverse traversal enables social graph queries.

Integration Test

#[test]
fn milestone_3_uat() {
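    // Fixture: open_with_users_and_relationships(), user_42_seen, followed_creators,
    // creator_counts() and same_for_you_query() are test-harness helpers.
    let db = open_with_users_and_relationships();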

    // User 42 likes jazz, follows creators 1-10, blocked creator 77
    let feed = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();
    assert_eq!(feed.len(), 50);
    assert!(feed.iter().all(|r| !user_42_seen.contains(&r.id)));
    assert!(feed.iter().all(|r| r.creator_id != CreatorId(77)));
    assert!(creator_counts(&feed).values().all(|&c| c <= 2));

    // Following feed -- only followed creators, chronological
    let following = db.retrieve(
        "RETRIEVE items FOR USER @42 FILTER relationship:follows \
         USING PROFILE following LIMIT 50"
    ).unwrap();
    assert!(following.iter().all(|r| followed_creators.contains(&r.creator_id)));
    assert!(following.windows(2).all(|w| w[0].created_at >= w[1].created_at));

    // Related content -- similar to item_abc, personalized
    let related = db.retrieve(
        "RETRIEVE items SIMILAR TO @item_abc FOR USER @42 \
         USING PROFILE related FILTER unseen LIMIT 10"
    ).unwrap();
    assert!(related.iter().all(|r| !user_42_seen.contains(&r.id)));

    // Like an item, verify preference shift
    db.signal("like", EntityId(500), UserId(42), 1.0, now()).unwrap();
    let feed2 = db.retrieve(same_for_you_query()).unwrap();
    // Items topically similar to item 500 should rank higher
    let topic_500 = db.read_item(EntityId(500)).unwrap().category;
    let topic_match_before = feed.iter().filter(|r| r.category == topic_500).count();
    let topic_match_after = feed2.iter().filter(|r| r.category == topic_500).count();
    assert!(topic_match_after >= topic_match_before);

    // Hide and block, verify exclusion
    db.signal("hide", EntityId(999), UserId(42), 1.0, now()).unwrap();
    db.signal("block", CreatorId(77), UserId(42), 1.0, now()).unwrap();
    let feed3 = db.retrieve(same_for_you_query()).unwrap();
    assert!(feed3.iter().all(|r| r.id != EntityId(999)));
    assert!(feed3.iter().all(|r| r.creator_id != CreatorId(77)));

    // Verify cold-start user gets population-level results
    let cold_feed = db.retrieve(
        "RETRIEVE items FOR USER @new_user USING PROFILE for_you \
         FILTER unseen, unblocked LIMIT 50"
    ).unwrap();
    assert_eq!(cold_feed.len(), 50); // falls back to trending/quality

    // Verify crash recovery preserves hard negatives
    db.shutdown().unwrap();
    let db2 = TidalDb::reopen(same_config()).unwrap();
    let feed4 = db2.retrieve(same_for_you_query()).unwrap();
    assert!(feed4.iter().all(|r| r.id != EntityId(999)));
    assert!(feed4.iter().all(|r| r.creator_id != CreatorId(77)));
}
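The UAT above checks the `max_per_creator:2` constraint through a `creator_counts` helper that it never defines. Below is a minimal sketch of that helper, plus one simple greedy way to enforce the cap. It assumes a hypothetical `Row` type with `id`, `creator_id`, and `score` fields. `enforce_max_per_creator` is illustrative only and is not tidalDB's diversity executor.

```rust
use std::collections::HashMap;

/// Hypothetical result row: only the fields the diversity check needs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub id: u64,
    pub creator_id: u64,
    pub score: f64,
}

/// Counts how many rows each creator contributes (what the UAT's
/// `creator_counts` helper is assumed to do).
pub fn creator_counts(rows: &[Row]) -> HashMap<u64, usize> {
    let mut counts = HashMap::new();
    for r in rows {
        *counts.entry(r.creator_id).or_insert(0) += 1;
    }
    counts
}

/// Greedy max_per_creator pass: walk candidates in descending score order and
/// keep a row only while its creator is still under the cap.
pub fn enforce_max_per_creator(mut candidates: Vec<Row>, max_per_creator: usize, limit: usize) -> Vec<Row> {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut taken: HashMap<u64, usize> = HashMap::new();
    let mut out = Vec::with_capacity(limit);
    for row in candidates {
        if out.len() == limit {
            break;
        }
        let n = taken.entry(row.creator_id).or_insert(0);
        if *n < max_per_creator {
            *n += 1;
            out.push(row);
        }
    }
    out
}
```

A greedy pass in score order always honors a hard per-creator cap. When a few creators dominate the candidate pool, however, it can return fewer than `limit` rows. The candidate set passed in therefore has to be deeper than `limit`.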

Done When

The full "For You" query works: RETRIEVE items FOR USER @user_id USING PROFILE for_you FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50 returns personalized, diversity-constrained results that reflect the user's engagement history, exclude hidden items and blocked creators, include an exploration budget, handle cold-start users and items, and update in response to new signal events within 100ms. The following, related, and notification profiles also work correctly. Hard negatives survive crash and restart. All 8 UAT scenario steps pass.

Milestone 4: Agent Memory -- "Agents own the personalization substrate"

Milestone Thesis

M3 proved the feedback loop closes inside the database for direct user interactions. M4 proves that agents -- the dominant interaction mediator -- can create scoped sessions, write structured feedback signals with aggressive decay, enforce declarative policy on the write path, and query live session context as part of ranking, all within the same embeddable runtime. A developer wiring an LLM agent to tidalDB gets instant-on session memory without standing up Redis, a feature store, or a policy middleware.

Enables

Agent-mediated personalization -- agents ground LLM responses by reading the user's preference state plus the session's accumulated reward and preference hints, then write structured feedback that immediately shapes the next ranking pass.
RLHF-style reward loops -- reward signals with minute-scale decay let agents record how well a recommendation served the user; the next RETRIEVE incorporates reward velocity into scoring.
Conversational memory -- multi-turn tool usage and preference hints are short-lived signals scoped to a session; they influence ranking for the session's lifetime and are archived on close.
Policy-safe agent integration -- the schema declares which signal types an agent may write per session; the database enforces this, not application middleware. Disallowed writes are rejected and recorded in an audit trail.
Partial UC-01 enhancement -- "For You" queries that incorporate session context (e.g., "more jazz today") produce results shaped by both long-lived user preferences and ephemeral session preferences.

UAT Scenario

Given:
  A tidalDB instance with:
    - Schema defining session signal types:
      * "preference_hint" with linear decay (lifetime=30m), target=Item
      * "reward" with exponential decay (half_life=10m), windows=[5m, 15m], velocity=true
      * "tool_use" with linear decay (lifetime=1h), target=Item
    - An AgentPolicy "planner_policy" in schema:
        allowed_signals: [preference_hint, reward]
        denied_signals: [tool_use]
        max_session_duration: 2h
        max_signals_per_session: 1000
    - 100 items with embeddings and metadata (category, format, creator_id)
    - 10,000 signal events establishing item signal state
    - User @42 with preference vector and engagement history
    - Profiles: for_you (updated to accept optional SessionContext)

When:
  1. Agent starts a session:
     let session = db.start_session(user_id: 42, agent_id: "planner",
         policy: "planner_policy", metadata: {"tool": "planner"})?;
     // Returns SessionHandle with SessionId

  2. Agent writes a preference_hint signal:
     db.session_signal(&session, "preference_hint", EntityId(0), 1.0,
         Timestamp::now(), Some("more jazz today".into()))?;
     // Accepted: preference_hint is in allowed_signals

  3. Agent writes a reward signal after delivering an answer:
     db.session_signal(&session, "reward", EntityId(42), 0.8,
         Timestamp::now(), None)?;
     // Accepted: reward is in allowed_signals

  4. Agent queries with session context:
     let query = RetrieveBuilder::new(EntityKind::Item, ProfileRef::new("for_you"))
         .for_user(42)
         .for_session(session.id())
         .limit(10)
         .build()?;
     let results = db.retrieve(&query)?;
     // Returns ranked items with session_snapshot attached

  5. Agent reads session snapshot:
     let snapshot = db.session_snapshot(session.id())?;
     // Returns: signal counts, reward velocity, duration, metadata

  6. Agent attempts a disallowed write:
     let err = db.session_signal(&session, "tool_use", EntityId(0), 1.0,
         Timestamp::now(), None);
     // Returns Err(LumenError::PolicyViolation { signal_type: "tool_use",
     //   policy_name: "planner_policy", reason: "signal type not in allowed list" })

  7. Agent reads audit log:
     let audit = db.session_audit(session.id())?;
     // Contains: 2 accepted writes, 1 rejected write with reason

  8. Agent closes the session:
     let summary = db.close_session(session)?;
     // Returns SessionSummary: duration, signal_counts, rejections, archived

  9. After closure, query the archived snapshot:
      // session_id = session.id(), captured before close_session consumed the handle
      let archived = db.session_snapshot(session_id)?;
     // Returns the frozen final snapshot (signals no longer decay)

  10. Verify session isolation -- a second session for the same user
      does not see session 1's signals:
      let session2 = db.start_session(user_id: 42, agent_id: "planner",
          policy: "planner_policy", metadata: {})?;
      let snap2 = db.session_snapshot(session2.id())?;
      // snap2 has zero signals -- session 1's data does not leak

Then:
  - Step 1: start_session returns SessionHandle; session appears in db.active_sessions()
  - Step 2: preference_hint recorded; session signal count = 1
  - Step 3: reward recorded; session signal count = 2; reward velocity > 0
  - Step 4: Results shaped by session context -- items matching "jazz" preference
    hint rank higher than without session context; results include a
    session_snapshot field with reward_velocity and hint summary
  - Step 5: Snapshot contains { signals_written: 2, signals_rejected: 0,
    reward_velocity_5m: >0.0, duration_ms: <5000, metadata: {"tool":"planner"} }
  - Step 6: Error returned with LumenError::PolicyViolation; write not persisted
  - Step 7: Audit log has 3 entries (2 accepted, 1 rejected with reason)
  - Step 8: Session marked closed; summary.duration_ms > 0; summary.signals_written == 2;
    summary.rejections == 1
  - Step 9: Archived snapshot readable; signal values frozen at close time (no further decay)
  - Step 10: Session isolation proven -- zero signal leakage between sessions
  - Performance: session_signal write < 200 microseconds (including WAL + policy check);
    session_snapshot read < 50 microseconds; RETRIEVE with session context adds < 5ms
    overhead vs without

Phases

Phase 1: Session Schema and Lifecycle (m4p1)

Delivers: SessionId, AgentId, AgentPolicy, and SessionHandle types in the schema and entities modules. Schema-level session_policy() for declaring per-agent allowed/denied signal lists, duration limits, and signal count caps. Session lifecycle APIs: start_session, close_session, active_sessions. WAL entries tagged with session_id for crash recovery of active sessions. Closed sessions archived to storage as frozen snapshots.

Acceptance Criteria:

SessionId is a u64 newtype with Display, Hash, Eq, Ord, monotonically assigned via AtomicU64 counter
AgentId is a String newtype (max 64 chars, validated at construction: [a-z0-9_-]+)
AgentPolicy struct declared in schema: allowed_signals: Vec<String>, denied_signals: Vec<String>, max_session_duration: Duration, max_signals_per_session: u32; validated at schema build time (no signal name in both allowed and denied; all signal names must exist in schema)
SessionHandle is a move-only type containing SessionId, user_id: u64, agent_id: AgentId, policy_name: String, start timestamp, and a closed: AtomicBool flag; SessionHandle is Send + Sync
SchemaBuilder::session_policy(name, AgentPolicy) registers policies at schema build time; duplicate names rejected with SchemaError
db.start_session(user_id, agent_id, policy_name, metadata) -> Result<SessionHandle> creates a new session: validates policy exists, assigns SessionId, stores session metadata in a DashMap<SessionId, SessionState>, logs session-start event to WAL
db.close_session(handle) -> Result<SessionSummary> takes ownership of SessionHandle (move semantics prevent use-after-close), freezes signal state, computes summary (duration, signal counts, rejection count), archives to storage under Tag::Session key prefix, logs session-close event to WAL
db.active_sessions() -> Vec<SessionInfo> returns list of open sessions with id, user_id, agent_id, start_time, signal_count
Session state survives crash: on WAL replay, session-start events without matching session-close events are restored as active sessions; session-close events mark sessions as archived
SessionState contains: DashMap<SignalTypeId, SessionSignalState> for per-signal-type accumulators within the session
Closed SessionHandle cannot be used for further writes (compile-time enforcement via move semantics; runtime check via closed flag as defense-in-depth)
Session metadata (HashMap<String, String>) persisted to storage and retrievable after close
max_session_duration enforced: session_signal on a session that has exceeded its duration returns LumenError::SessionExpired
Tag::Session (0x07) added to the key encoding enum for session archive storage
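The identity types in these criteria can be sketched directly. This minimal version makes two assumptions: `SessionId` displays as its bare number, and the allocator is seeded with one past the highest id seen during WAL replay. `SessionIdAllocator` and `AgentIdError` are hypothetical names.

```rust
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};

/// Session identifier: a u64 newtype with Display, Hash, Eq, Ord.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct SessionId(pub u64);

impl fmt::Display for SessionId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical allocator handing out strictly increasing ids.
pub struct SessionIdAllocator {
    next: AtomicU64,
}

impl SessionIdAllocator {
    /// Assumption: `start` is max(replayed id) + 1 after recovery.
    pub fn new(start: u64) -> Self {
        Self { next: AtomicU64::new(start) }
    }

    pub fn allocate(&self) -> SessionId {
        // fetch_add is a single atomic read-modify-write, so concurrent
        // callers never receive the same id.
        SessionId(self.next.fetch_add(1, Ordering::Relaxed))
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum AgentIdError {
    Empty,
    TooLong(usize),
    InvalidChar(char),
}

/// Agent identifier validated as [a-z0-9_-]+ with at most 64 characters.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct AgentId(String);

impl AgentId {
    pub const MAX_LEN: usize = 64;

    pub fn new(raw: &str) -> Result<Self, AgentIdError> {
        if raw.is_empty() {
            return Err(AgentIdError::Empty);
        }
        // Byte length; equal to char count for the ASCII-only alphabet.
        if raw.len() > Self::MAX_LEN {
            return Err(AgentIdError::TooLong(raw.len()));
        }
        let allowed = |c: char| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-';
        if let Some(bad) = raw.chars().find(|&c| !allowed(c)) {
            return Err(AgentIdError::InvalidChar(bad));
        }
        Ok(Self(raw.to_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```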

Task Breakdown:

#	Task	Delivers	Complexity
01	Session Types	`SessionId`, `AgentId`, `AgentPolicy`, `SessionHandle`, `SessionState`, `SessionInfo`, `SessionSummary` types with validation, Display, Hash, Eq	M
02	Schema Integration	`SchemaBuilder::session_policy()`, validation (signal name cross-check, no duplicates), `Schema::policy(name) -> Option<&AgentPolicy>`	S
03	Session Lifecycle	`start_session`, `close_session`, `active_sessions`, `SessionState` DashMap, move-only handle, duration enforcement, archive to storage	L
04	WAL Session Events	`WalCommand::SessionStart` and `WalCommand::SessionClose` variants, WAL replay restores active sessions, closed sessions restored as archived	M
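Task 02's build-time checks can be sketched against a stand-in registry. `PolicyRegistry`, `PolicyError`, and `with_signals` are hypothetical names. The three rules it applies come from the acceptance criteria: no duplicate policy names, every referenced signal must exist in the schema, and no signal may appear in both lists.

```rust
use std::collections::{HashMap, HashSet};
use std::time::Duration;

#[derive(Debug, Clone)]
pub struct AgentPolicy {
    pub allowed_signals: Vec<String>,
    pub denied_signals: Vec<String>,
    pub max_session_duration: Duration,
    pub max_signals_per_session: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    DuplicatePolicy(String),
    UnknownSignal { policy: String, signal: String },
    AllowedAndDenied { policy: String, signal: String },
}

/// Hypothetical stand-in for the schema builder's policy registry.
pub struct PolicyRegistry {
    known_signals: HashSet<String>,
    policies: HashMap<String, AgentPolicy>,
}

impl PolicyRegistry {
    pub fn with_signals(signals: &[&str]) -> Self {
        Self {
            known_signals: signals.iter().map(|s| s.to_string()).collect(),
            policies: HashMap::new(),
        }
    }

    /// Validates and registers a policy; rejects it without side effects on error.
    pub fn session_policy(&mut self, name: &str, policy: AgentPolicy) -> Result<(), PolicyError> {
        if self.policies.contains_key(name) {
            return Err(PolicyError::DuplicatePolicy(name.to_owned()));
        }
        // Every signal named in either list must be declared in the schema.
        for sig in policy.allowed_signals.iter().chain(policy.denied_signals.iter()) {
            if !self.known_signals.contains(sig) {
                return Err(PolicyError::UnknownSignal { policy: name.to_owned(), signal: sig.clone() });
            }
        }
        // No signal may be both allowed and denied.
        let denied: HashSet<&str> = policy.denied_signals.iter().map(String::as_str).collect();
        if let Some(sig) = policy.allowed_signals.iter().find(|s| denied.contains(s.as_str())) {
            return Err(PolicyError::AllowedAndDenied { policy: name.to_owned(), signal: sig.clone() });
        }
        self.policies.insert(name.to_owned(), policy);
        Ok(())
    }

    pub fn policy(&self, name: &str) -> Option<&AgentPolicy> {
        self.policies.get(name)
    }
}
```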

Depends On: m1p1 (type system), m1p2 (WAL), m1p3 (storage, key encoding, Tag enum), m3p1 (user entities) Complexity: L Research Reference: VISION.md (Sessions / Agent Context), thoughts.md Part V.5 (quarantine-first durability), docs/research/tidaldb_signal_ledger.md (running-score formula reuse for session signals)

Phase 2: Session Signal Engine (m4p2)

Delivers: session_signal() API that writes session-scoped signal events with aggressive decay. Session signals share the existing SignalLedger running-score infrastructure but are keyed by (SessionId, SignalTypeId) instead of (EntityId, SignalTypeId). Preference hints are stored as typed annotations on session signal entries. Session-scoped windowed counts and velocity available via session_snapshot().

Acceptance Criteria:

db.session_signal(&SessionHandle, signal_type, entity_id, weight, timestamp, Option<annotation>) -> Result<()> writes a session-scoped signal: validates session is open, updates SessionSignalState running decay score, updates session windowed counters, increments session signal count, logs to WAL with session_id tag
SessionSignalState uses the same HotSignalState running-score formula (S(t) = S(t_prev) * exp(-lambda * dt) + w) -- reuse, not rewrite
Session windowed counters use BucketedCounter with minute-level granularity (appropriate for session timescales of minutes to hours)
db.session_snapshot(session_id) -> Result<SessionSnapshot> returns: signal type -> (decay_score, windowed_counts, velocity), total signals written, total rejections, duration, metadata, annotations (preference hints)
Preference hint annotations stored as Vec<(Timestamp, String)> on the session state; capped at 100 per session to bound memory
Session signals do NOT update the global item signal ledger -- they are session-scoped only (isolation)
Session signals do NOT update user preference vectors or interaction weights -- session influence is read-time only (via Phase 4)
For active sessions, decay scores reflect current wall-clock time (lazy decay on read, same as HotSignalState)
For archived sessions, signal values are frozen at close time (no further decay applied on read)
WAL replay of session signals restores SessionSignalState accumulators correctly (property test: replay produces identical state to uninterrupted execution for 1000 random session signal sequences)
session_signal latency < 200 microseconds including WAL write (benchmarked)
session_snapshot read latency < 50 microseconds (benchmarked)
50,000 session signals per second throughput (benchmarked)
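Below is a minimal sketch of the running-score update together with the lazy-decay read for active sessions and the freeze-on-close read for archived ones. It assumes exponential decay with lambda = ln 2 / half_life and timestamps expressed as f64 seconds. This is not `HotSignalState`: the field names are hypothetical, and linear-decay types such as `preference_hint` would need a different state shape.

```rust
/// Sketch of S(t) = S(t_prev) * exp(-lambda * dt) + w for one
/// (session, signal type) pair, with lambda = ln(2) / half_life.
#[derive(Debug, Clone)]
pub struct SessionSignalState {
    lambda: f64,            // decay rate per second
    score: f64,             // S(t_prev)
    last_update_secs: f64,  // t_prev
    frozen_at: Option<f64>, // Some(close time) once the session is archived
    count: u64,
}

impl SessionSignalState {
    pub fn with_half_life(half_life_secs: f64) -> Self {
        Self {
            lambda: std::f64::consts::LN_2 / half_life_secs,
            score: 0.0,
            last_update_secs: 0.0,
            frozen_at: None,
            count: 0,
        }
    }

    /// Write path: decay the stored score to `now_secs`, then add the weight.
    /// Out-of-order events (dt < 0) are added without decay in this sketch.
    pub fn record(&mut self, weight: f64, now_secs: f64) -> Result<(), &'static str> {
        if self.frozen_at.is_some() {
            return Err("session closed");
        }
        let dt = (now_secs - self.last_update_secs).max(0.0);
        self.score = self.score * (-self.lambda * dt).exp() + weight;
        self.last_update_secs = now_secs;
        self.count += 1;
        Ok(())
    }

    /// Read path: lazy decay to `now_secs` for active sessions; archived
    /// sessions always decay only up to their close time.
    pub fn score_at(&self, now_secs: f64) -> f64 {
        let t = self.frozen_at.unwrap_or(now_secs);
        let dt = (t - self.last_update_secs).max(0.0);
        self.score * (-self.lambda * dt).exp()
    }

    /// Close: pin the read time so later reads return the same value.
    pub fn freeze(&mut self, close_secs: f64) {
        self.frozen_at = Some(close_secs);
    }

    pub fn count(&self) -> u64 {
        self.count
    }
}
```

Storing only (score, last_update) keeps both the write and the read O(1). A frozen session differs from an active one only in which timestamp the read decays to.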

Task Breakdown:

#	Task	Delivers	Complexity
01	SessionSignalState	Running decay score, windowed counters, annotation storage, freeze-on-close semantics, reusing `HotSignalState` internals	M
02	session_signal API	Write path: validation, WAL event, state update, signal count tracking, annotation capture	M
03	session_snapshot API	Read path: snapshot assembly, decay-correct reads for active sessions, frozen reads for archived, preference hint list	S
04	WAL Integration & Replay	`WalCommand::SessionSignal` variant, replay logic, property test for replay correctness	M
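The session replay rule fits in a single pass over the log. A start without a matching close restores an active session, and a start with a close restores an archived one. This sketch assumes a simplified, hypothetical `WalCommand` shape. It also sums weights per signal type instead of replaying decay, so the focus stays on lifecycle reconstruction.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified subset of the session WAL variants.
#[derive(Debug, Clone)]
pub enum WalCommand {
    SessionStart { session: u64, user: u64, at_ms: u64 },
    SessionSignal { session: u64, signal_type: String, weight: f64 },
    SessionClose { session: u64, at_ms: u64 },
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct ReplayedSession {
    pub user: u64,
    pub started_ms: u64,
    pub closed_ms: Option<u64>, // None => restored as active
    pub signals_written: u64,
    pub weight_sum_by_type: BTreeMap<String, f64>,
}

/// Replays the log in order and rebuilds per-session state.
pub fn replay(log: &[WalCommand]) -> HashMap<u64, ReplayedSession> {
    let mut sessions: HashMap<u64, ReplayedSession> = HashMap::new();
    for cmd in log {
        match cmd {
            WalCommand::SessionStart { session, user, at_ms } => {
                sessions.insert(*session, ReplayedSession { user: *user, started_ms: *at_ms, ..Default::default() });
            }
            WalCommand::SessionSignal { session, signal_type, weight } => {
                // Signals for unknown or already-closed sessions are ignored.
                if let Some(s) = sessions.get_mut(session) {
                    if s.closed_ms.is_none() {
                        s.signals_written += 1;
                        *s.weight_sum_by_type.entry(signal_type.clone()).or_insert(0.0) += *weight;
                    }
                }
            }
            WalCommand::SessionClose { session, at_ms } => {
                if let Some(s) = sessions.get_mut(session) {
                    s.closed_ms = Some(*at_ms);
                }
            }
        }
    }
    sessions
}

/// Ids of sessions that come back as active after replay, sorted.
pub fn active_ids(sessions: &HashMap<u64, ReplayedSession>) -> Vec<u64> {
    let mut ids: Vec<u64> = sessions.iter().filter(|(_, s)| s.closed_ms.is_none()).map(|(id, _)| *id).collect();
    ids.sort_unstable();
    ids
}
```

Replay is a pure function of the log, so running it twice over the same log yields identical state. That determinism is the property the replay test in the acceptance criteria relies on.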

Depends On: m4p1 (session types, lifecycle, WAL events), m1p4 (signal ledger infrastructure -- HotSignalState, BucketedCounter) Complexity: L Research Reference: docs/research/tidaldb_signal_ledger.md (running-score formula, BucketedCounter, EntityState struct), VISION.md (session signals with aggressive decay)

Phase 3: Policy Enforcement and Audit (m4p3)

Delivers: Declarative policy enforcement on the session signal write path. Policies declared in schema (m4p1) are enforced at write time: signal type allow/deny lists, per-session signal count caps, and session duration limits. Every write attempt (accepted or rejected) is recorded in a per-session audit log. Rejected writes return structured LumenError::PolicyViolation errors with the policy name, signal type, and human-readable reason.

Acceptance Criteria:

session_signal() checks the session's policy before writing: if signal type is in denied_signals, or is not in allowed_signals (when allowed_signals is non-empty), write is rejected with LumenError::PolicyViolation
LumenError::PolicyViolation variant added: contains signal_type: String, policy_name: String, reason: String
Per-session signal count cap enforced: when signals_written >= max_signals_per_session, further writes return LumenError::PolicyViolation with reason "session signal limit exceeded (N/max)"
Session duration limit enforced: when now - session_start > max_session_duration, further writes return LumenError::SessionExpired
SessionAuditLog stored per session: Vec<AuditEntry> where AuditEntry = { timestamp, signal_type, outcome: Accepted | Rejected(reason) }
db.session_audit(session_id) -> Result<Vec<AuditEntry>> returns the audit log for a session (active or archived)
Audit log capped at 10,000 entries per session to bound memory; oldest entries evicted with a "truncated" marker
Audit log persisted with session archive on close (retrievable after close)
Policy evaluation adds < 1 microsecond per signal write (benchmarked -- it is a HashMap lookup, not a hot path concern)
Property test: for any sequence of allowed and denied signal writes, the audit log exactly matches the write outcomes and no denied signal modifies session state
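Below is a sketch of the write-time check these criteria describe. It assumes the session-expiry check runs first and that the allow list is consulted before the deny list, which reproduces the "signal type not in allowed list" reason shown in UAT step 6. `WriteError` and `SessionCounters` are hypothetical stand-ins for the `LumenError` variants and session state.

```rust
use std::time::Duration;

#[derive(Debug, Clone)]
pub struct AgentPolicy {
    pub allowed_signals: Vec<String>,
    pub denied_signals: Vec<String>,
    pub max_session_duration: Duration,
    pub max_signals_per_session: u32,
}

/// Hypothetical mirror of the two relevant LumenError variants.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    PolicyViolation { signal_type: String, policy_name: String, reason: String },
    SessionExpired,
}

/// Hypothetical view of the session state the check needs.
pub struct SessionCounters {
    pub signals_written: u32,
    pub elapsed: Duration,
}

pub fn check(policy_name: &str, policy: &AgentPolicy, signal_type: &str, session: &SessionCounters) -> Result<(), WriteError> {
    let violation = |reason: String| WriteError::PolicyViolation {
        signal_type: signal_type.to_owned(),
        policy_name: policy_name.to_owned(),
        reason,
    };
    // Duration limit: now - session_start > max_session_duration.
    if session.elapsed > policy.max_session_duration {
        return Err(WriteError::SessionExpired);
    }
    // A non-empty allow list is exhaustive.
    if !policy.allowed_signals.is_empty() && !policy.allowed_signals.iter().any(|s| s.as_str() == signal_type) {
        return Err(violation("signal type not in allowed list".into()));
    }
    if policy.denied_signals.iter().any(|s| s.as_str() == signal_type) {
        return Err(violation("signal type in denied list".into()));
    }
    // Count cap: reject once signals_written reaches the maximum.
    if session.signals_written >= policy.max_signals_per_session {
        return Err(violation(format!(
            "session signal limit exceeded ({}/{})",
            session.signals_written, policy.max_signals_per_session
        )));
    }
    Ok(())
}
```

With a handful of signal names per policy, a linear scan of each list is as cheap as a hash lookup, which is consistent with the sub-microsecond budget above.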

Task Breakdown:

#	Task	Delivers	Complexity
01	Policy Evaluator	`PolicyEvaluator::check(policy, signal_type, session_state) -> Result<(), PolicyViolation>`, signal allow/deny, count cap, duration check	S
02	Audit Log	`SessionAuditLog`, `AuditEntry`, append-on-write, cap enforcement, persist with archive	S
03	Write Path Integration	Wire `PolicyEvaluator` into `session_signal()`, wire audit log recording, `db.session_audit()` API, property tests	M
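The audit log in task 02 behaves as a bounded FIFO. In this sketch, the "truncated" marker is assumed to be a flag plus an eviction count kept beside the entries rather than a sentinel entry. `Outcome`, `SessionAuditLog::with_cap`, and `evicted` are hypothetical names.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Accepted,
    Rejected(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct AuditEntry {
    pub timestamp_ms: u64,
    pub signal_type: String,
    pub outcome: Outcome,
}

/// Bounded per-session audit log: oldest entries are evicted at the cap.
pub struct SessionAuditLog {
    entries: VecDeque<AuditEntry>,
    cap: usize,
    evicted: u64,
}

impl SessionAuditLog {
    pub fn with_cap(cap: usize) -> Self {
        assert!(cap > 0, "audit log cap must be positive");
        Self { entries: VecDeque::with_capacity(cap), cap, evicted: 0 }
    }

    /// Append on every write attempt, accepted or rejected.
    pub fn append(&mut self, entry: AuditEntry) {
        if self.entries.len() == self.cap {
            self.entries.pop_front();
            self.evicted += 1;
        }
        self.entries.push_back(entry);
    }

    /// True once at least one entry has been evicted.
    pub fn truncated(&self) -> bool {
        self.evicted > 0
    }

    pub fn evicted(&self) -> u64 {
        self.evicted
    }

    pub fn entries(&self) -> impl Iterator<Item = &AuditEntry> {
        self.entries.iter()
    }
}
```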

Depends On: m4p1 (session types, AgentPolicy in schema), m4p2 (session signal write path to intercept) Complexity: M Research Reference: VISION.md ("policy guards live in schema, not ad-hoc middleware", "agents can only read/write within their sessions")

Phase 4: Session-Aware Ranking and M4 UAT (m4p4)

Delivers: FOR SESSION @session_id clause in the RETRIEVE query that loads session context and blends it into ranking. Session preference hints boost items matching the hint content. Session reward velocity adjusts the scoring weight. Query results include a session_snapshot alongside ranked items. End-to-end M4 UAT integration test proving the full agent workflow: start session, write signals with policy, query with session context, verify session isolation, close and archive.

Acceptance Criteria:

RetrieveBuilder::for_session(session_id) added; Retrieve struct gains for_session: Option<SessionId> field
SessionContext struct loaded when for_session is present: contains preference hints (parsed into keyword boost hints), reward velocity, session metadata
for_you profile (and any personalized profile) accepts optional SessionContext: scoring formula adds a session boost factor: session_boost = hint_match_score * 0.3 + reward_velocity_normalized * 0.2
hint_match_score computed as: for each preference hint string, extract keywords; if item metadata (category, tags, title) contains any keyword, score = 1.0 per match, normalized to [0, 1]; this is a simple keyword match (semantic session hints deferred to M5)
reward_velocity_normalized = reward_velocity / (reward_velocity + 1.0) -- a sigmoid-shaped saturating normalization that maps non-negative velocity to [0, 1) (the formula assumes reward_velocity >= 0)
Session boost is additive to the existing profile score (does not replace personalization, layers on top)
Results struct gains optional session_snapshot: Option<SessionSnapshot> field, populated when for_session is present
Session isolation: FOR SESSION @S1 uses only S1's signals; S2's signals are invisible; no global state pollution
When for_session references a closed session, archived snapshot is used (read-only, no decay applied)
When for_session references a non-existent session, LumenError::Query("session not found") returned
RETRIEVE with session context adds < 5ms overhead vs without (benchmarked at 10K items)
Full M4 UAT integration test passes covering all 10 UAT scenario steps
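Below is a sketch of the additive session boost. The criteria say only that hint matches are "normalized to [0, 1]", so this sketch takes one reading of that: the fraction of hint keywords found among the item's category, tags, and title words. It also clamps negative reward velocity to 0, which is an assumption. `ItemMeta`, `keywords`, and `score_with_session` are hypothetical.

```rust
/// Hypothetical item view with the metadata fields the hint matcher reads.
pub struct ItemMeta<'a> {
    pub category: &'a str,
    pub tags: &'a [&'a str],
    pub title: &'a str,
}

/// Lowercased keywords with surrounding punctuation stripped.
fn keywords(hint: &str) -> Vec<String> {
    hint.split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_lowercase())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Fraction of hint keywords present in the item's metadata words, in [0, 1].
pub fn hint_match_score(hints: &[&str], item: &ItemMeta) -> f64 {
    let kws: Vec<String> = hints.iter().flat_map(|h| keywords(h)).collect();
    if kws.is_empty() {
        return 0.0;
    }
    let haystack = format!("{} {} {}", item.category, item.tags.join(" "), item.title).to_lowercase();
    let words: Vec<&str> = haystack.split_whitespace().collect();
    let matches = kws.iter().filter(|k| words.contains(&k.as_str())).count();
    matches as f64 / kws.len() as f64
}

/// x / (x + 1): maps [0, inf) onto [0, 1). Negative input clamped (assumption).
pub fn reward_velocity_normalized(v: f64) -> f64 {
    let v = v.max(0.0);
    v / (v + 1.0)
}

/// session_boost = hint_match_score * 0.3 + reward_velocity_normalized * 0.2
pub fn session_boost(hint_score: f64, reward_velocity: f64) -> f64 {
    hint_score * 0.3 + reward_velocity_normalized(reward_velocity) * 0.2
}

/// The boost layers on top of the profile score; it never replaces it.
pub fn score_with_session(base_profile_score: f64, hint_score: f64, reward_velocity: f64) -> f64 {
    base_profile_score + session_boost(hint_score, reward_velocity)
}
```

Under this reading, filler words such as "more" and "today" dilute the hint. A jazz item therefore scores 1/3 on "more jazz today" and a rock item scores 0. The reward term adds the same amount to every item in the query, so it raises the scores of all results together without changing their relative order within the query.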

Task Breakdown:

#	Task	Delivers	Complexity
01	FOR SESSION Query Context	`RetrieveBuilder::for_session()`, `Retrieve.for_session` field, `SessionContext` loader from active/archived session state	M
02	Session-Aware Scoring	`ProfileExecutor` extension: `score_with_session_context()`, keyword hint matching, reward velocity normalization, additive boost	M
03	Session Snapshot in Results	`Results.session_snapshot` field, populated by executor when session context present	S
04	M4 UAT Integration Test	End-to-end test covering all 10 UAT steps: lifecycle, signals, policy, ranking, isolation, archive, audit	L

Depends On: m4p3 (policy enforcement wired into write path), m4p2 (session signals and snapshot readable), m4p1 (session lifecycle), m3p3 (personalized profile executor, UserContext), m2p5 (RETRIEVE executor pipeline) Complexity: L Research Reference: VISION.md ("agents ground themselves by reading live session context, write structured signals with decay budgets, and immediately query those updates on the next turn"), USE_CASES.md UC-01 (For You with session overlay)

Phase Dependency DAG

m4p1 (Session Schema & Lifecycle)
    |
    v
m4p2 (Session Signal Engine)
    |
    v
m4p3 (Policy Enforcement & Audit)
    |
    v
m4p4 (Session-Aware Ranking + UAT)

All four phases are strictly sequential. m4p2 cannot begin without the session types and lifecycle from m4p1. m4p3 cannot begin without the session signal write path from m4p2. m4p4 cannot begin without policy enforcement from m4p3. Within each phase, certain tasks can parallelize (e.g., m4p2 tasks 01 and 03 can proceed in parallel; m4p3 tasks 01 and 02 are independent).

Deferred to Later Milestones

Session forking and merging -- deferred because forking introduces DAG-shaped session graphs with merge conflict semantics; this belongs after M8 (Distributed Fabric) when the CRDT model can inform fork/merge design. Planned for M9/M10.
Multi-agent sessions (multiple agents sharing one session) -- deferred because shared-session policy requires capability intersection and concurrent write arbitration; M4 proves single-agent sessions first. Planned for M10.
Cross-session aggregation ("what did this user's agents learn across all sessions this week?") -- deferred because it requires a materialization layer rolling up closed sessions into user-level signal state. Planned for M6.
Semantic hint matching (preference hints interpreted via embedding similarity) -- deferred because it requires Tantivy integration (M5) for proper text analysis; M4 uses simple keyword matching as a correct baseline. Planned for M5.
Session signal influence on global user preference vector -- deferred because the correct boundary between ephemeral session boost and permanent preference update requires careful UX design; M4 keeps session influence strictly read-time. Planned for M6.
RLHF training data export -- deferred because export formats and training pipelines are application-specific; tidalDB stores the signals, external tools read them. Planned for M7.
Per-agent QPS rate limiting -- deferred because the per-session signal count cap provides coarse-grained protection; fine-grained QPS limiting with token-bucket belongs in M7 (Production Hardening). Planned for M7.
Session TTL auto-cleanup (background sweeper for abandoned sessions) -- deferred; max_session_duration enforcement on writes is sufficient for M4. Planned for M7.
User revocation of agent-contributed signals -- deferred because revocation requires retroactive signal removal with re-materialization, a core M10 (Governance & Agent Rights) concern. Planned for M10.

Integration Test

#[test]
fn milestone_4_uat() {
    let mut schema_builder = SchemaBuilder::new();

    // Session signal types with aggressive decay.
    let _ = schema_builder
        .signal("preference_hint", EntityKind::Item,
            DecaySpec::Linear { lifetime: Duration::from_secs(30 * 60) })
        .windows(&[Window::OneHour])
        .velocity(false)
        .add();
    let _ = schema_builder
        .signal("reward", EntityKind::Item,
            DecaySpec::Exponential { half_life: Duration::from_secs(10 * 60) })
        .windows(&[Window::OneHour])
        .velocity(true)
        .add();
    let _ = schema_builder
        .signal("tool_use", EntityKind::Item,
            DecaySpec::Linear { lifetime: Duration::from_secs(3600) })
        .windows(&[Window::OneHour])
        .velocity(false)
        .add();

    // Standard signals for item ranking.
    for sig in &["view", "like", "skip"] {
        let _ = schema_builder
            .signal(sig, EntityKind::Item,
                DecaySpec::Exponential { half_life: Duration::from_secs(7 * 24 * 3600) })
            .windows(&[Window::OneHour, Window::TwentyFourHours])
            .velocity(true)
            .add();
    }

    // Policy: planner can write preference_hint and reward, not tool_use.
    schema_builder.session_policy("planner_policy", AgentPolicy {
        allowed_signals: vec!["preference_hint".into(), "reward".into()],
        denied_signals: vec!["tool_use".into()],
        max_session_duration: Duration::from_secs(2 * 3600),
        max_signals_per_session: 1000,
    }).unwrap();

    let schema = schema_builder.build().unwrap();
    let db = TidalDb::builder().ephemeral().with_schema(schema).open().unwrap();

    // Write items: some jazz, some rock.
    for i in 1..=50u64 {
        let mut meta = HashMap::new();
        let category = if i <= 25 { "jazz" } else { "rock" };
        meta.insert("category".into(), category.into());
        meta.insert("format".into(), "video".into());
        meta.insert("creator_id".into(), (i % 10).to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();
        db.signal("view", EntityId::new(i), 1.0, Timestamp::now()).unwrap();
    }

    // Write user 42 (metadata only; this test needs no engagement history).
    let mut user_meta = HashMap::new();
    user_meta.insert("name".into(), "alice".into());
    db.write_user(EntityId::new(42), &user_meta).unwrap();

    // Step 1: Start session.
    let mut session_meta = HashMap::new();
    session_meta.insert("tool".into(), "planner".into());
    let session = db.start_session(42, "planner", "planner_policy", session_meta)
        .unwrap();
    let session_id = session.id();
    assert!(db.active_sessions().iter().any(|s| s.id == session_id));

    // Step 2: Write preference_hint.
    db.session_signal(&session, "preference_hint", EntityId::new(0), 1.0,
        Timestamp::now(), Some("more jazz today".into())).unwrap();

    // Step 3: Write reward.
    db.session_signal(&session, "reward", EntityId::new(42), 0.8,
        Timestamp::now(), None).unwrap();

    // Step 4: Query with session context.
    let query = RetrieveBuilder::new(EntityKind::Item, ProfileRef::new("for_you"))
        .for_user(42)
        .for_session(session_id)
        .limit(10)
        .build()
        .unwrap();
    let results = db.retrieve(&query).unwrap();
    assert!(!results.items.is_empty());
    assert!(results.session_snapshot.is_some());

    // Step 5: Read session snapshot.
    let snapshot = db.session_snapshot(session_id).unwrap();
    assert_eq!(snapshot.signals_written, 2);
    assert_eq!(snapshot.signals_rejected, 0);
    assert!(snapshot.duration_ms < 5_000);
    assert_eq!(snapshot.metadata.get("tool").unwrap(), "planner");

    // Step 6: Disallowed write.
    let err = db.session_signal(&session, "tool_use", EntityId::new(0), 1.0,
        Timestamp::now(), None);
    assert!(err.is_err());
    match err.unwrap_err() {
        LumenError::PolicyViolation { signal_type, policy_name, .. } => {
            assert_eq!(signal_type, "tool_use");
            assert_eq!(policy_name, "planner_policy");
        }
        other => panic!("expected PolicyViolation, got: {other:?}"),
    }

    // Step 7: Audit log.
    let audit = db.session_audit(session_id).unwrap();
    let accepted = audit.iter().filter(|e| e.accepted).count();
    let rejected = audit.iter().filter(|e| !e.accepted).count();
    assert_eq!(accepted, 2);
    assert_eq!(rejected, 1);

    // Step 8: Close session.
    let summary = db.close_session(session).unwrap();
    assert!(summary.duration_ms > 0);
    assert_eq!(summary.signals_written, 2);
    assert_eq!(summary.rejections, 1);
    assert!(!db.active_sessions().iter().any(|s| s.id == session_id));

    // Step 9: Archived snapshot readable.
    let archived = db.session_snapshot(session_id).unwrap();
    assert_eq!(archived.signals_written, 2);

    // Step 10: Session isolation.
    let session2 = db.start_session(42, "planner", "planner_policy", HashMap::new())
        .unwrap();
    let snap2 = db.session_snapshot(session2.id()).unwrap();
    assert_eq!(snap2.signals_written, 0, "session 2 must not see session 1 signals");

    db.close_session(session2).unwrap();
    db.close().unwrap();
}

Done When

A developer can embed tidalDB alongside an agent runtime and: (1) declare agent policies in schema specifying allowed/denied signal types, session duration limits, and signal count caps; (2) start sessions bound to a user and an agent; (3) write session-scoped signals (preference hints, rewards) that are accepted or rejected by policy with every attempt recorded in the audit log; (4) execute RETRIEVE items FOR USER @user_id FOR SESSION @session_id USING PROFILE for_you LIMIT 10 and receive ranked items incorporating session preference hints and reward velocity as an additive boost; (5) read session snapshots with signal state, velocity, and preference hints; (6) close sessions and retrieve archived snapshots with frozen signal values; (7) verify complete session isolation -- zero signal leakage between sessions. Policy violations return structured LumenError::PolicyViolation errors. Session signal writes complete in < 200 microseconds. RETRIEVE with session context adds < 5ms overhead. All 10 UAT scenario steps pass in the integration test.

Milestone 5: Hybrid Search -- "Text + semantic + signals in one query"

Milestone Thesis

M4 proved agents can write scoped signals and query session context within a personalized ranking pipeline. M5 proves that text search and vector retrieval are the same system. A developer can execute SEARCH items QUERY "rust tutorial beginner" VECTOR query_vector FOR USER @user_id USING PROFILE search LIMIT 20 and get results that combine BM25 text relevance, semantic similarity, and user personalization in a single ranked list -- with the same signal freshness, diversity enforcement, and feedback loop guarantees that RETRIEVE already provides.

Enables

UC-02 (Search) -- Full: keyword search, exact phrase, boolean operators, field-scoped, hybrid BM25 + semantic, personalized re-ranking, search click feedback loop
UC-10 (People/Creator Search) -- Full: creator discovery by name/topic, "creators like X" via embedding similarity, creator attribute filters
UC-11 (Visual/Semantic Search) -- Core: vector-only search for image similarity, semantic intent queries ("something relaxing to watch")

UAT Scenario

Given:
  A tidalDB instance with:
    - 10,000 items with text fields (title, description, tags) indexed for full-text search
    - All items have 1536-dim embeddings
    - 500 users with engagement history and preference vectors
    - 200 creators with name, handle, and aggregated embeddings
    - Signal types: view (7d decay), like (14d decay), skip (1d decay),
      search_click (3d decay, with query context)
    - Profiles: "search" (text_weight:0.6, vector_weight:0.4, RRF k=60,
      personalization overlay, completion gate > 0.3, diversity max_per_creator:2)

When:
  1. SEARCH items QUERY "rust tutorial beginner" VECTOR [query_embedding]
     FOR USER @user_42 USING PROFILE search DIVERSITY max_per_creator:2 LIMIT 20
  2. SEARCH items QUERY "jazz piano" FOR USER @user_42
     USING PROFILE search FILTER duration:short, format:video LIMIT 20
  3. SEARCH items QUERY "\"exact phrase match\"" USING PROFILE search LIMIT 10
  4. SEARCH items QUERY "jazz -beginner" USING PROFILE search LIMIT 10
  5. SEARCH creators QUERY "jazz" LIMIT 10
  6. SEARCH creators SIMILAR TO @creator_xyz LIMIT 10
  7. SIGNAL search_click item:@item_abc user:@user_42
     context:{ query: "rust tutorial beginner", rank_at_click: 3 }
  8. Re-execute search #1

Then:
  - Step 1: Results combine BM25 + semantic similarity via RRF;
    personalization re-ranks within relevant set; user_42 (a beginner)
    sees beginner content elevated; max 2 per creator enforced
  - Step 2: Text-only search (no vector), filtered by duration and format;
    only short videos returned
  - Step 3: Exact phrase match -- only items containing "exact phrase match"
    as a contiguous sequence
  - Step 4: Boolean exclusion -- no items matching "beginner" appear in results
  - Step 5: Creators returned by name/topic match, ordered by engagement rate
  - Step 6: Creators semantically similar to @creator_xyz by embedding distance
  - Step 7: Signal recorded with query context and rank position;
    item and user-topic affinity updated
  - Step 8: Clicked result @item_abc may rank higher due to search_click signal;
    signal written < 100ms ago is reflected
  - Performance: SEARCH < 50ms at 10K items

Phases

Phase 1: Tantivy Integration (m5p1)

Delivers: Tantivy embedded as a derived index for full-text search. DB-primary consistency pattern: entity store is source of truth, Tantivy is a materialized view updated via an outbox sequence. BM25 scoring exposed via custom Collector and the Weight/Scorer seek pattern. Schema text fields (title, description, tags) automatically indexed. Crash recovery replays from the last committed sequence number stored in Tantivy's commit payload.

Acceptance Criteria:

TextIndex struct wraps Tantivy Index, IndexWriter (behind Mutex), and IndexReader with auto-reload
Tantivy schema created from tidalDB schema text field definitions: text fields get full-text tokenization with Tantivy's default tokenizer; keyword fields get raw (untokenized) indexing for exact match
TextIndexWriter::index_item(entity_id, metadata) adds or updates a document in Tantivy; delete_item(entity_id) removes via delete_term on the entity_id fast field
Background indexer: TextIndexSyncer reads entity store writes (via WAL sequence tracking) and feeds Tantivy writer; commit interval configurable (default: every 1000 documents or 2 seconds, whichever comes first)
Each Tantivy commit stores the last-processed WAL sequence number in the commit payload (IndexWriter::prepare_commit(), then PreparedCommit::set_payload(), then commit()); on crash recovery, the syncer replays the WAL from that sequence number
Custom AllScoresCollector implementing Tantivy's Collector trait returns all matching (EntityId, f32) pairs with BM25 scores; requires_scoring() returns true
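ScoredCandidateCollector implementing Tantivy's Collector trait accepts a candidate set pre-sorted by DocId and returns BM25 scores for only those candidates via DocSet::seek() (used to score ANN results; seek only moves forward, so candidates must be in ascending doc order within each segment)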
External EntityId -> DocAddress mapping maintained via a fast field (entity_id_field) on every Tantivy document; mapping rebuilt on IndexReader::reload() after segment merges
Boolean query parsing: AND, OR, NOT operators; exact phrase ("..."); field-scoped (title:jazz, tag:tutorial); exclusion (-beginner); wildcard prefix (pian*)
Index rebuild from entity store: text_index.rebuild_from(storage) scans all items and rebuilds the Tantivy index; completes in < 10 minutes at 10K items
BM25 query latency < 10ms at 10K documents (Criterion benchmarked)
Tantivy IndexWriter heap budget set to 50MB (conservative for embedded use)
LogMergePolicy configured with defaults; wait_merging_threads() called on shutdown
TextIndex is Send + Sync -- safe to share across threads behind Arc
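The seek-based scoring described for ScoredCandidateCollector is essentially a merge-join between a sorted candidate list and a posting list. A std-only sketch of that join follows, assuming a hypothetical PostingCursor that plays the role of a Tantivy Scorer (doc(), seek(), score()); the types and the TERMINATED sentinel value here are illustrative, not Tantivy's own.

```rust
/// Segment-local document id (u32, as in Tantivy).
type DocId = u32;

/// Hypothetical "cursor exhausted" sentinel for this sketch.
const TERMINATED: DocId = u32::MAX;

/// Hypothetical stand-in for a Tantivy Scorer: a forward-only cursor over
/// (doc, bm25) postings sorted by doc id.
struct PostingCursor {
    postings: Vec<(DocId, f32)>,
    pos: usize,
}

impl PostingCursor {
    fn new(mut postings: Vec<(DocId, f32)>) -> Self {
        postings.sort_by_key(|&(d, _)| d);
        PostingCursor { postings, pos: 0 }
    }

    fn doc(&self) -> DocId {
        self.postings.get(self.pos).map_or(TERMINATED, |&(d, _)| d)
    }

    fn score(&self) -> f32 {
        self.postings[self.pos].1
    }

    /// Advance to the first doc >= target. Like DocSet::seek, it never moves
    /// backwards, so successive targets must be non-decreasing.
    fn seek(&mut self, target: DocId) -> DocId {
        while self.doc() < target {
            self.pos += 1;
        }
        self.doc()
    }
}

/// Score only the given candidates (e.g. ANN hits). Candidates are sorted and
/// deduplicated first so every seek moves forward; misses get no BM25 score.
fn score_candidates(cursor: &mut PostingCursor, candidates: &[DocId]) -> Vec<(DocId, Option<f32>)> {
    let mut sorted = candidates.to_vec();
    sorted.sort_unstable();
    sorted.dedup();
    sorted
        .into_iter()
        .map(|doc| {
            // A candidate matches when the cursor lands exactly on it.
            let hit = doc != TERMINATED && cursor.seek(doc) == doc;
            (doc, if hit { Some(cursor.score()) } else { None })
        })
        .collect()
}
```

Sorting the candidates is what keeps the cost linear in the posting list: an unsorted candidate list would require restarting the cursor for every backward jump.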

Task Breakdown:

#	Task	Delivers	Complexity
01	TextIndex Core	`TextIndex` struct, Tantivy schema generation from tidalDB schema, `IndexWriter`/`IndexReader` lifecycle, `entity_id` fast field, `TextIndex::open()` and `TextIndex::close()`	L
02	Document Write/Delete	`index_item()`, `delete_item()`, field mapping (text -> tokenized, keyword -> raw), metadata-to-document conversion	M
03	Background Syncer	`TextIndexSyncer` reads WAL sequence, feeds writer, configurable commit interval, `set_payload()` with sequence number, crash recovery replay	L
04	BM25 Scoring Collectors	`AllScoresCollector` for full scoring, `ScoredCandidateCollector` for seek-based candidate scoring, entity ID resolution from fast field	M
05	Boolean Query Parsing	AND/OR/NOT, exact phrase, field-scoped, exclusion, wildcard prefix; wraps Tantivy's `QueryParser` with custom syntax extensions	M

Depends On: m1p3 (storage engine, key encoding), m1p5 (entity write API, WAL sequence), m2p2 (metadata fields used for field-scoped queries)
Complexity: XL (5 tasks; Tasks 01-02 sequential, then 03/04/05 can parallelize after 02 completes)
Research Reference: docs/research/tantivy.md (Collector API, consistency pattern, seek scoring, commit model, single-writer lock, segment merge)
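The syncer's commit trigger (every 1000 documents or 2 seconds, whichever comes first) and its recovery rule can be sketched without Tantivy. CommitPolicy, encode_payload, and replay_start are hypothetical helpers; the sketch assumes the payload is the last applied WAL sequence written as decimal text, and that re-indexing an entry is idempotent (index_item upserts), so an unreadable payload can safely fall back to a full replay.

```rust
use std::time::{Duration, Instant};

/// Hypothetical commit trigger: commit after `max_docs` documents or after
/// `max_interval` since the last commit, whichever comes first.
struct CommitPolicy {
    max_docs: usize,
    max_interval: Duration,
    docs_since_commit: usize,
    last_commit: Instant,
}

impl CommitPolicy {
    fn new(max_docs: usize, max_interval: Duration) -> Self {
        CommitPolicy { max_docs, max_interval, docs_since_commit: 0, last_commit: Instant::now() }
    }

    /// Record one indexed document; returns true when a commit is due.
    fn on_document(&mut self, now: Instant) -> bool {
        self.docs_since_commit += 1;
        self.is_due(now)
    }

    /// Also callable from an idle timer tick; never due with nothing pending.
    fn is_due(&self, now: Instant) -> bool {
        self.docs_since_commit > 0
            && (self.docs_since_commit >= self.max_docs
                || now.duration_since(self.last_commit) >= self.max_interval)
    }

    fn committed(&mut self, now: Instant) {
        self.docs_since_commit = 0;
        self.last_commit = now;
    }
}

/// Assumed payload format: the last applied WAL sequence as decimal text.
fn encode_payload(last_wal_seq: u64) -> String {
    last_wal_seq.to_string()
}

/// On open: resume at the entry after the committed one, or replay from 0
/// when the index never committed or the payload cannot be parsed.
fn replay_start(payload: Option<&str>) -> u64 {
    payload
        .and_then(|p| p.parse::<u64>().ok())
        .map_or(0, |seq| seq.saturating_add(1))
}
```

The string from encode_payload is what would be attached to each commit; replay_start reads the committed value back when the index is reopened.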

Phase 2: Hybrid Fusion (RRF) (m5p2)

Delivers: Reciprocal Rank Fusion combining BM25 ranked lists with ANN ranked lists into a single scored result set. The starting point is RRF with k=60; the architecture supports upgrading to tuned linear combination when relevance labels exist. Handles the three retrieval modes: text-only, vector-only, and hybrid.

Acceptance Criteria:

HybridFusion struct with fuse(bm25_results: &[(EntityId, f32)], ann_results: &[(EntityId, f32)], k: u32) -> Vec<(EntityId, f64)> method
RRF formula: score(d) = 1.0 / (k + rank_bm25(d)) + 1.0 / (k + rank_ann(d)), where ranks are 1-based positions in each input list and k = 60 by default
Documents appearing in only one list contribute only their single-list term (the other term is zero)
Results sorted by fused score descending
RRF results are passed to the existing ProfileExecutor for personalization re-ranking (user preference overlay, signal boosts, quality gates)
When only text query is provided (no vector), pure BM25 ranking passed directly to profile executor
When only vector is provided (no text), pure ANN ranking passed directly to profile executor
k parameter configurable per profile or per query (default 60)
Fusion adds < 1ms to query time for 1000 candidates from each list (Criterion benchmarked)
Property test: for any pair of ranked lists, RRF output contains the union of both input document sets with correct score computation to 6 decimal places
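A minimal sketch of HybridFusion::fuse() as specified above, written as a free function with EntityId as a u64 stand-in. Ranks are 1-based, both input lists are assumed to be sorted best-first with no duplicate ids, and ties in fused score are broken by id so the output is deterministic (the tie-break is an assumption, not part of the criteria).

```rust
use std::collections::HashMap;

/// Stand-in for tidalDB's EntityId (assumed to wrap a u64).
type EntityId = u64;

/// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) with 1-based
/// ranks; a document missing from a list gets no term from it. Input scores
/// are ignored; only each list's order matters.
fn fuse(bm25_results: &[(EntityId, f32)], ann_results: &[(EntityId, f32)], k: u32) -> Vec<(EntityId, f64)> {
    let mut fused: HashMap<EntityId, f64> = HashMap::new();
    for list in [bm25_results, ann_results] {
        for (idx, &(id, _score)) in list.iter().enumerate() {
            let rank = idx as f64 + 1.0;
            *fused.entry(id).or_insert(0.0) += 1.0 / (f64::from(k) + rank);
        }
    }
    let mut out: Vec<(EntityId, f64)> = fused.into_iter().collect();
    // Descending by fused score; ties broken by ascending id.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}
```

Because only ranks enter the formula, BM25 scores and cosine similarities never need to be put on a common scale, which is why RRF works as a zero-configuration default.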

Task Breakdown:

#	Task	Delivers	Complexity
01	RRF Implementation	`HybridFusion::fuse()`, rank-to-score conversion, union merge of ranked lists, configurable `k`	S
02	Retrieval Mode Router	Logic to select text-only, vector-only, or hybrid based on query contents; routes to BM25, ANN, or fusion accordingly	S

Depends On: m5p1 (BM25 scored results), m2p1 (ANN scored results)
Complexity: S (2 small tasks; Task 02 depends on 01)
Research Reference: docs/research/tantivy.md (RRF section, Cormack et al. SIGIR 2009, k=60 insensitivity, production system approaches)
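The retrieval mode router reduces to a match on what the query carries. This sketch assumes a blank query string counts as absent, that a SIMILAR TO query has already been resolved to its source embedding, and that a query with neither text nor vector is rejected; the RetrievalMode names and the route function are hypothetical.

```rust
/// Which retrieval path a SEARCH takes (hypothetical names).
#[derive(Debug, PartialEq)]
enum RetrievalMode {
    TextOnly,   // BM25 only, passed straight to the profile executor
    VectorOnly, // ANN only, passed straight to the profile executor
    Hybrid,     // BM25 + ANN, fused with RRF first
}

fn route(query_text: Option<&str>, query_vector: Option<&[f32]>) -> Result<RetrievalMode, &'static str> {
    let has_text = query_text.map_or(false, |t| !t.trim().is_empty());
    let has_vector = query_vector.map_or(false, |v| !v.is_empty());
    match (has_text, has_vector) {
        (true, true) => Ok(RetrievalMode::Hybrid),
        (true, false) => Ok(RetrievalMode::TextOnly),
        (false, true) => Ok(RetrievalMode::VectorOnly),
        (false, false) => Err("SEARCH needs a QUERY, a VECTOR, or both"),
    }
}
```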

Phase 3: SEARCH Query Parser and Executor (m5p3)

Delivers: The SEARCH query operation -- parser, planner, and executor -- that orchestrates text retrieval, semantic retrieval, hybrid fusion, personalization, filtering, diversity, and result assembly. Reuses the existing filter engine (m2p2), diversity enforcement (m2p4), and profile executor (m2p3/m3p3) from prior milestones. The search_click signal type is integrated for feedback loop closure.

Acceptance Criteria:

Search struct with fields: entity_kind, query_text: Option<String>, query_vector: Option<Vec<f32>>, for_user: Option<u64>, for_session: Option<SessionId>, profile: ProfileRef, filters: Vec<FilterExpr>, diversity: Option<DiversityConstraints>, limit: u32
SearchBuilder with fluent API: .query("text"), .vector(&[f32]), .for_user(id), .for_session(id), .using_profile("search"), .filter(expr), .diversity(constraints), .limit(n), .build()
db.search(&Search) -> Result<SearchResults> executes the full pipeline
Search executor pipeline: (1) parse query text into Tantivy query, (2) if vector present, execute ANN retrieval, (3) if both, fuse via RRF, (4) load user context if for_user present, (5) apply profile scoring (personalization, signal boosts, quality gates), (6) apply metadata filters, (7) apply diversity enforcement, (8) assemble results with scores
SearchResults struct contains: items: Vec<SearchResultItem>, next_cursor: Option<Cursor>, total_candidates: u64
SearchResultItem contains: id: EntityId, score: f64, bm25_score: Option<f32>, semantic_score: Option<f32>, signals: SignalSnapshot
Query text parsing handles: bare terms (jazz piano), exact phrase ("jazz piano"), boolean operators (AND, OR, NOT, -), field-scoped (title:jazz, tag:tutorial, creator:handle), wildcard prefix (pian*), hashtag (#jazz)
search_click signal type recognized: db.signal("search_click", item_id, 1.0, ts) with context containing query and rank_at_click fields
Search profile search registered as a builtin: text relevance as floor, personalization adjustment, completion gate, diversity
Session context (FOR SESSION) integrates with search the same way it does with RETRIEVE (preference hint keyword boost, reward velocity factor)
End-to-end SEARCH < 50ms at 10K items (Criterion benchmarked)
Full M5 UAT steps 1-4 and 7-8 pass as integration test assertions

Task Breakdown:

#	Task	Delivers	Complexity
01	Search Types and Builder	`Search`, `SearchBuilder`, `SearchResults`, `SearchResultItem` structs with validation	M
02	Search Executor Pipeline	`SearchExecutor` orchestrating BM25 retrieval, ANN retrieval, fusion, profile scoring, filtering, diversity, result assembly	L
03	Search Profile Builtin	`search` profile definition registered in `ProfileRegistry`, text relevance floor, personalization overlay, configurable RRF k	S
04	search_click Signal Integration	`search_click` signal type with context fields (query, rank_at_click), feedback loop wiring into user-topic affinity	S

Depends On: m5p1 (Tantivy integration, BM25 queries), m5p2 (hybrid fusion), m2p2 (filter engine), m2p3 (profile executor), m2p4 (diversity), m2p5 (query parser infrastructure, RETRIEVE executor pattern to follow), m3p3 (personalized profiles, UserContext), m4p4 (SessionContext for FOR SESSION)
Complexity: L (4 tasks; Task 01 first, then 02 depends on 01; Tasks 03 and 04 can parallelize with 02)
Research Reference: VISION.md (SEARCH query syntax), API.md (SEARCH operation, query syntax table), USE_CASES.md UC-02 (search capabilities), SEQUENCE.md UC-02 (search sequence diagram)

Phase 4: Creator and People Search (m5p4)

Delivers: Search over creator entities by name, topic, and attributes. "Creators like X" via creator embedding similarity. Creator entities indexed in both Tantivy (text fields) and USearch (embeddings). Enables UC-10 (People and Creator Search).

Acceptance Criteria:

Creator entities indexed in Tantivy when written via db.write_creator(): fields name (text, tokenized), handle (keyword, raw), region (keyword), language (keyword), verified (bool)
Creator embeddings indexed in a dedicated USearch index (separate from item embeddings) when provided via write_creator(id, metadata, Some(embedding))
SEARCH creators QUERY "jazz" LIMIT 10 returns creators matching by name or topic, ordered by BM25 relevance
SEARCH creators QUERY "jazz" FILTER verified:true LIMIT 10 filters by creator attributes
SEARCH creators SIMILAR TO @creator_id LIMIT 10 retrieves the source creator's embedding and runs ANN against the creator vector index
Creator search results include: id: EntityId, score: f64, metadata: HashMap<String, String>
Creator sort modes available: Sort::CreatorEngagementRate (average engagement ratio across recent catalog), Sort::MostFollowed (follower count desc)
Creator filters composable: verified, min_followers, max_followers, language, region, followed_by_user (requires FOR USER)
followed_by_user filter uses the existing FollowsBitmap infrastructure from m3p1 to restrict results to creators the user follows
Hybrid search on creators: SEARCH creators QUERY "jazz" VECTOR [query_embedding] LIMIT 10 fuses BM25 name/topic match with embedding similarity via RRF
Creator search latency < 20ms at 200 creators (Criterion benchmarked)
Full M5 UAT steps 5-6 pass as integration test assertions
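The creator filters compose as a conjunction of optional predicates. In this sketch the Creator and CreatorFilter shapes are hypothetical, and a HashSet stands in for the FollowsBitmap (a RoaringBitmap in the real index).

```rust
use std::collections::HashSet;

type EntityId = u64;

/// Creator attributes the filters read (hypothetical shape).
struct Creator {
    id: EntityId,
    verified: bool,
    followers: u64,
    language: String,
    region: String,
}

/// Composable creator filters; every field that is set must match (AND).
#[derive(Default)]
struct CreatorFilter<'a> {
    verified: Option<bool>,
    min_followers: Option<u64>,
    max_followers: Option<u64>,
    language: Option<&'a str>,
    region: Option<&'a str>,
    /// Creators the querying user follows; requires FOR USER.
    followed_by_user: Option<&'a HashSet<EntityId>>,
}

impl CreatorFilter<'_> {
    fn matches(&self, c: &Creator) -> bool {
        self.verified.map_or(true, |v| c.verified == v)
            && self.min_followers.map_or(true, |m| c.followers >= m)
            && self.max_followers.map_or(true, |m| c.followers <= m)
            && self.language.map_or(true, |l| c.language == l)
            && self.region.map_or(true, |r| c.region == r)
            && self.followed_by_user.map_or(true, |f| f.contains(&c.id))
    }
}
```

With a real bitmap, followed_by_user would more likely be applied as an intersection over the whole candidate set before the per-creator checks, rather than a per-item membership test.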

Task Breakdown:

#	Task	Delivers	Complexity
01	Creator Text Indexing	Tantivy indexing for creator entities, field mapping, write/delete hooks in `write_creator()`/`update_creator()`, syncer integration	M
02	Creator Vector Index	Dedicated USearch index for creator embeddings, insertion on `write_creator()`, ANN search, `SIMILAR TO @creator_id` resolution	M
03	Creator Search Executor	`SEARCH creators` routing in search executor, creator-specific filters (verified, followers, language), sort modes, `followed_by_user` via FollowsBitmap, hybrid fusion for creators	M

Depends On: m5p1 (Tantivy integration, syncer infrastructure), m5p3 (SEARCH executor pipeline to extend), m3p1 (creator entities, FollowsBitmap), m2p1 (vector index infrastructure)
Complexity: L (3 tasks; Tasks 01 and 02 can parallelize; Task 03 depends on both)
Research Reference: USE_CASES.md UC-10 (People and Creator Search: name search, "creators like X", social graph discovery), API.md (SEARCH creators examples), docs/research/ann_for_tidaldb.md (creator embedding similarity)

Phase Dependency DAG

m5p1 (Tantivy Integration)
    |         \
    v          \
m5p2 (RRF)     \
    |            \
    v             v
m5p3 (SEARCH Executor)
    |
    v
m5p4 (Creator Search)

m5p1 is the foundation -- everything else depends on having a working text index. m5p2 (RRF fusion) depends on m5p1 for BM25 scores and on the existing m2p1 for ANN scores. m5p3 (SEARCH executor) depends on both m5p1 and m5p2 to orchestrate the full pipeline. m5p4 (Creator search) depends on m5p1 (for creator text indexing) and m5p3 (for the search executor to extend).

Within m5p1, tasks 01-02 are sequential (schema before documents), then tasks 03, 04, and 05 can parallelize once document write is working.

Deferred to Later Milestones

Autocomplete and search suggestions (UC-02.3) -- deferred to M6; requires prefix indexes on the Tantivy term dictionary and trending query tracking infrastructure; M5 proves search works, M6 adds the polish features
Saved searches and alerts (UC-02.4) -- deferred to M6; requires persistent query storage, new-result detection on each indexing pass, and push notification integration; M5 provides the search primitive, M6 builds subscriptions on top
Visual search / image search (UC-11 full) -- deferred to M6; UC-11 core (vector-only search) works via M5's SEARCH items VECTOR [embedding] LIMIT N; the full crop-and-search and multi-modal (text query against image items) workflow requires additional embedding pipeline coordination
"Did you mean" typo correction -- deferred to M6; requires edit-distance computation on the Tantivy term dictionary and a suggestion model; not required for M5's UAT
Tuned linear combination (replacing RRF) -- deferred to M7 or later; requires relevance labels and offline evaluation infrastructure; RRF is the correct zero-configuration starting point
Query composition / SEARCH WITHIN scope (searching within trending, within cohort trending, within following) -- deferred to M6; requires candidate set intersection with scoped retrieval; M5 proves standalone search works first
Semantic session hint matching -- deferred to M6; M4's keyword matching is sufficient; semantic matching via Tantivy text analysis would upgrade hint precision but is not required for M5's UAT
Search result explanation ("why this result?") -- deferred to M6/M7; Tantivy provides Query::explain() per document but it is expensive; not required for M5's UAT

Integration Test

#[test]
fn milestone_5_uat() {
    let db = open_with_search_schema();

    // Write 10K items with text fields, embeddings, and metadata.
    for i in 0..10_000u64 {
        let meta = item_metadata(i); // title, description, tags, category, format, creator_id
        let embedding = item_embedding(i); // 1536-dim
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();
        db.write_item_embedding(EntityId::new(i), &embedding).unwrap();
    }

    // Write 200 creators with names, handles, and embeddings.
    for c in 0..200u64 {
        let meta = creator_metadata(c); // name, handle, verified, language
        let embedding = creator_embedding(c);
        db.write_creator(EntityId::new(c), &meta, Some(&embedding)).unwrap();
    }

    // Write user 42 with engagement history.
    db.write_user(EntityId::new(42), &user_metadata()).unwrap();
    for e in generate_engagement_events(500, EntityId::new(42)) {
        db.signal(&e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Wait for Tantivy syncer to commit.
    db.flush_text_index().unwrap();

    // Step 1: Hybrid search with personalization and diversity.
    let query_vec = embed("rust tutorial beginner");
    let results = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("rust tutorial beginner")
            .vector(&query_vec)
            .for_user(42)
            .diversity(DiversityConstraints { max_per_creator: Some(2), ..Default::default() })
            .limit(20)
            .build().unwrap()
    ).unwrap();
    assert_eq!(results.items.len(), 20);
    assert!(results.items.iter().all(|r| r.score > 0.0));
    assert!(results.items.windows(2).all(|w| w[0].score >= w[1].score));
    assert!(creator_counts(&results.items).values().all(|&c| c <= 2));

    // Step 2: Text-only with filters.
    let filtered = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("jazz piano")
            .for_user(42)
            .filter(FilterExpr::eq("format", "video"))
            .limit(20)
            .build().unwrap()
    ).unwrap();
    assert!(filtered.items.iter().all(|r| r.bm25_score.is_some()));
    assert!(filtered.items.iter().all(|r| r.semantic_score.is_none()));

    // Step 3: Exact phrase match.
    // item_tokens() is an assumed test helper (like creator_counts) returning an item's
    // title/description/tags tokens, lowercased as the default tokenizer would.
    let phrase = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("\"exact phrase match\"")
            .limit(10)
            .build().unwrap()
    ).unwrap();
    // Every returned item contains the three tokens contiguously and in order.
    assert!(phrase.items.iter().all(|r| {
        item_tokens(&db, r.id).windows(3).any(|w| w == ["exact", "phrase", "match"])
    }));

    // Step 4: Boolean exclusion.
    let excluded = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("jazz -beginner")
            .limit(10)
            .build().unwrap()
    ).unwrap();
    // No returned item has the token "beginner" in any text field.
    assert!(excluded.items.iter().all(|r| {
        !item_tokens(&db, r.id).iter().any(|t| t == "beginner")
    }));

    // Step 5: Creator search by topic.
    let creators = db.search(
        &SearchBuilder::new(EntityKind::Creator, ProfileRef::new("search"))
            .query("jazz")
            .limit(10)
            .build().unwrap()
    ).unwrap();
    assert!(!creators.items.is_empty());

    // Step 6: Creators similar to creator_xyz by embedding.
    let similar_creators = db.search(
        &SearchBuilder::new(EntityKind::Creator, ProfileRef::new("search"))
            .similar_to(EntityId::new(5))
            .limit(10)
            .build().unwrap()
    ).unwrap();
    assert!(!similar_creators.items.is_empty());
    assert!(similar_creators.items.iter().all(|r| r.id != EntityId::new(5)));

    // Step 7: Search click on the result at rank 3 (index 2).
    let clicked = results.items[2].id;
    db.signal("search_click", clicked, 1.0, Timestamp::now()).unwrap();

    // Step 8: Re-execute search #1, including its diversity constraint.
    let results2 = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("rust tutorial beginner")
            .vector(&query_vec)
            .for_user(42)
            .diversity(DiversityConstraints { max_per_creator: Some(2), ..Default::default() })
            .limit(20)
            .build().unwrap()
    ).unwrap();
    let rank_before = results.items.iter().position(|r| r.id == clicked).unwrap();
    // The clicked result must still appear, at the same or a better rank.
    let rank_after = results2.items.iter().position(|r| r.id == clicked)
        .expect("clicked result should still be returned after search_click");
    assert!(rank_after <= rank_before);
}

Done When

A developer can execute SEARCH items QUERY "rust tutorial beginner" VECTOR [query_embedding] FOR USER @user_42 USING PROFILE search DIVERSITY max_per_creator:2 LIMIT 20 and receive results that combine BM25 text relevance with semantic vector similarity, re-ranked by user personalization and engagement signals, with diversity constraints enforced. Boolean queries (AND/OR/NOT), exact phrase matching ("..."), field-scoped search (title:...), and wildcard prefix (term*) all work. Creator search returns creators by name, topic, and embedding similarity. The search_click signal closes the feedback loop -- a clicked result influences the next search. End-to-end SEARCH latency < 50ms at 10K items. All 8 UAT scenario steps pass in the integration test.

Milestone 6: Full Surface Coverage -- "Every use case, every sort mode, every filter, every feedback loop"

Milestone Thesis

Milestones 1-5 proved that a single database can ingest signals, rank content, personalize results, manage agent memory, and execute hybrid text+vector search. But a skeptical engineer can still say: "Sure, it handles the happy path, but my platform needs cohort-scoped trending, collections, live content, notification capping, autocomplete, and query composition -- and those are the surfaces that force me back to multiple systems." M6 proves they are wrong. After M6, every query in SEQUENCE.md executes correctly, every Sort variant in API.md works, every filter in USE_CASES.md Appendix A composes, every UC-01 through UC-15 surface is testable end-to-end. The gap between "demo database" and "production database for content ranking" closes here.

Enables

UC-15 (Cohort-Scoped Trending) -- Full: cohort definitions, per-cohort signal aggregation at write time, cohort filter in RETRIEVE
UC-03 full -- Social-graph-scoped trending, cohort-scoped trending, search within cohort trending
UC-05 full -- Collaborative filtering ("users who liked X also liked Y") in the related profile
UC-06 full -- All remaining sort modes: AlphabeticalAsc/Desc, MostCommented, MostShared, Shortest, Longest, LiveViewerCount, DateSaved
UC-07 full -- Notification frequency capping (per-creator and per-user daily caps)
UC-08 full -- Creator profile page modes (top/hot/for_you filtered within one creator's catalog)
UC-09 full -- User library: collections CRUD, in_collection filter, in_progress (continue watching), saved searches as persistent feeds
UC-10 full -- "Creators followed by people I follow" via social graph traversal
UC-12 full -- Live content: viewer_count signal, status=live filter, LiveViewerCount sort
UC-02 full -- SUGGEST autocomplete, trending searches, query composition (SEARCH WITHIN TRENDING / COHORT_TRENDING / FOLLOWING / COLLECTION)
UC-01 through UC-15 -- Full end-to-end UAT for all 15 use cases

UAT Scenario

Given:
  A TidalDb instance with:
    - 500 items across 50 creators, embeddings, signals (view, like, share, comment,
      skip, completion, hide, follow), metadata (category, format, duration, language,
      status, title)
    - 20 users with locale, age_range, explicit_interests, engagement_level attributes
      and preference vectors
    - Relationship graph: follows, blocks, interaction weights
    - 2 named cohorts: "us_young_music" (locale=en-US, age_range in [18-24, 25-34],
      primary_categories includes music) and "jp_casual" (region=JP, engagement_level=casual)
    - 3 user-created collections
    - 5 items with status=live and active viewer_count signals
    - Signal history creating measurable velocity differences between cohort members
      and non-members

When/Then:

1. Cohort-scoped trending (UC-15):
   db.retrieve(Retrieve::builder().profile("cohort_trending").cohort("us_young_music").limit(10))
   → Items ranked by velocity from cohort members only; different from global trending.

2. Social-graph-scoped trending (UC-03):
   db.retrieve(Retrieve::builder().for_user(user_a).profile("trending")
       .filter(FilterExpr::social_graph(user_a, 2)).limit(10))
   → Only items engaged by users that user_a follows appear.

3. All sort modes (UC-06):
   - Sort::AlphabeticalAsc → titles A-Z order
   - Sort::Shortest → ascending duration order
   - Sort::LiveViewerCount + FilterExpr::eq("status","live") → live items by viewer count
   - Sort::DateSaved + FilterExpr::user_state("saved") + for_user → by save timestamp

4. Collections and user library (UC-09):
   db.create_collection(user_a, "jazz_favorites", Visibility::Private)
   db.add_to_collection(coll_id, item_1); db.add_to_collection(coll_id, item_2)
   db.retrieve(Retrieve::builder().filter(FilterExpr::in_collection(coll_id)).limit(10))
   → Exactly 2 results.
   FilterExpr::user_state("in_progress") → Items partially watched by user_a.

5. Live content (UC-12):
   db.signal("viewer_count", live_item, 1.0, ts)
   db.retrieve(Retrieve::builder().filter(FilterExpr::eq("status","live"))
       .sort(Sort::LiveViewerCount).limit(5))
   → Live items ordered by current viewer count.

6. Notification caps (UC-07):
   db.retrieve(Retrieve::builder().for_user(user_a).profile("notification")
       .filter(FilterExpr::since(last_seen))
       .notification_caps(NotificationCaps { max_per_creator_per_day: 1, max_total_per_day: 3 })
       .limit(20))
   → At most 1 item per creator, at most 3 total.

7. Query composition (UC-02.5):
   db.search(Search::builder().query("jazz piano").within(WithinScope::Trending { window_hours: 24 }).limit(10))
   → Only items that are BOTH relevant to "jazz piano" AND trending.
   db.search(Search::builder().query("jazz piano")
       .within(WithinScope::CohortTrending { cohort: "us_young_music", window_hours: 24 }).limit(10))
   → Intersection of cohort-scoped trending and text relevance.

8. SUGGEST autocomplete (UC-02.3):
   db.suggest(&Suggest { prefix: "jazz pia", for_user: None, limit: 5 })
   → ["jazz piano", "jazz piano tutorial", ...]
   db.suggest(&Suggest { prefix: "", for_user: None, limit: 10 })
   → Trending search terms returned.
   Latency < 20ms.

9. Collaborative filtering in related (UC-05):
   After recording co-engagement (users who liked item_X also liked item_Y),
   db.retrieve(Retrieve::builder().for_user(user_a).profile("related").similar_to(item_x).limit(10))
   → item_Y appears in results via co-engagement signal, not just embedding proximity.
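Step 6's caps act as a post-ranking constraint: walk the ranked list and skip items whose creator, or the user's daily total, is already capped. The scenario does not say where today's sent counts live or how caps interact with LIMIT, so this sketch assumes the counts are passed in and that the caller applies LIMIT afterwards; apply_caps is a hypothetical helper, and only the NotificationCaps field names come from the scenario.

```rust
use std::collections::HashMap;

type EntityId = u64;

/// Caps from UAT step 6 (field names as in the scenario; the type is a sketch).
struct NotificationCaps {
    max_per_creator_per_day: u32,
    max_total_per_day: u32,
}

/// Greedy pass in ranked order. `already_sent` holds today's per-creator
/// notification counts for this user.
fn apply_caps(
    ranked: &[(EntityId, EntityId)], // (item, creator), best first
    caps: &NotificationCaps,
    already_sent: &HashMap<EntityId, u32>,
) -> Vec<EntityId> {
    let mut per_creator = already_sent.clone();
    let mut total: u32 = already_sent.values().sum();
    let mut out = Vec::new();
    for &(item, creator) in ranked {
        if total >= caps.max_total_per_day {
            break; // daily total exhausted: nothing more can be sent today
        }
        let count = per_creator.entry(creator).or_insert(0);
        if *count < caps.max_per_creator_per_day {
            *count += 1;
            total += 1;
            out.push(item);
        }
    }
    out
}
```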

Phases

Phase 1: Cohort Definitions and Cohort-Scoped Trending (m6p1)

Delivers: Cohort definitions stored in schema with compound predicates over user attributes. Cohort membership resolved at signal write time. Per-cohort signal aggregation in a CohortSignalLedger. cohort() parameter on Retrieve. cohort_trending built-in profile. Cohort filter in queries.

Acceptance Criteria:

db.define_cohort(CohortDef { name, predicate, aggregation }) stores a named cohort predicate over user attributes (locale, age_range, primary_categories, engagement_level, region); validated at write time; duplicate names rejected
Predicate enum supports: Eq(field, value), Any(field, values), Range(field, lo, hi), And(Vec<Predicate>), Or(Vec<Predicate>) -- sufficient to express all cohort definitions in USE_CASES.md
On db.signal(kind, entity_id, weight, ts) with user context, the signal is attributed to every materialized cohort the user belongs to; cohort membership resolved from user metadata at write time via CohortResolver
CohortResolver evaluates user metadata against all registered cohort predicates; result cached in DashMap<EntityId, Vec<CohortName>> with invalidation on user metadata write
CohortSignalLedger: DashMap<(CohortName, EntityId, SignalTypeId), HotEntry> with the same decay/velocity/windowed semantics as the global SignalLedger; updated atomically with the global ledger on each signal write
RetrieveBuilder::cohort(name) scopes signal reads to CohortSignalLedger instead of global ledger; errors with TidalError::NotFound if cohort is not defined
RetrieveBuilder::cohort_predicate(Predicate) supports ad-hoc cohort queries by resolving matching users at query time and aggregating their signals on-demand (slower, but correct; no pre-materialized state required)
cohort_trending built-in profile registered: CandidateStrategy::Scan, velocity-based scoring reading from cohort ledger, same gates as trending
Cohort-scoped trending produces results different from global trending when cohort signal velocity diverges from global velocity (verified via a test with intentionally divergent engagement patterns)
Per-cohort ledger state checkpointed alongside global ledger; survives crash and restart
Performance: cohort-scoped RETRIEVE with materialized cohort completes in < 50ms at 500 items / 20 users
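Cohort predicates evaluate recursively over user attributes. A std-only sketch, assuming attributes are stored as a map from field name to one or more string values (so Eq on a multi-valued field like primary_categories means "includes") and that Range parses the field's first value as a number. The numeric age field used with Range in the usage test is hypothetical; the UAT cohorts use age_range buckets, which match with Any.

```rust
use std::collections::HashMap;

/// User attributes; multi-valued fields hold several values.
type UserAttrs = HashMap<String, Vec<String>>;

#[derive(Debug, Clone)]
enum Predicate {
    Eq(String, String),
    Any(String, Vec<String>),
    Range(String, f64, f64),
    And(Vec<Predicate>),
    Or(Vec<Predicate>),
}

/// All values recorded for a field (empty when the field is absent).
fn values<'a>(attrs: &'a UserAttrs, field: &str) -> &'a [String] {
    attrs.get(field).map(Vec::as_slice).unwrap_or(&[])
}

impl Predicate {
    fn eval(&self, attrs: &UserAttrs) -> bool {
        match self {
            // On a multi-valued field, Eq means "includes".
            Predicate::Eq(f, v) => values(attrs, f).iter().any(|x| x == v),
            Predicate::Any(f, vs) => values(attrs, f).iter().any(|x| vs.contains(x)),
            // Inclusive numeric range over the field's first value.
            Predicate::Range(f, lo, hi) => values(attrs, f)
                .first()
                .and_then(|x| x.parse::<f64>().ok())
                .map_or(false, |n| *lo <= n && n <= *hi),
            Predicate::And(ps) => ps.iter().all(|p| p.eval(attrs)),
            Predicate::Or(ps) => ps.iter().any(|p| p.eval(attrs)),
        }
    }
}
```

CohortResolver would run eval for every registered cohort when a user's metadata changes and cache the matching names, so the signal write path only reads the cached list.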

Task Breakdown:

#	Task	Delivers	Complexity
01	`CohortDef` + `Predicate` types + schema storage	Schema-level cohort definitions with compound predicates; stored in-memory and checkpointed	M
02	`CohortResolver`	Evaluates user metadata against all registered predicates; `DashMap` cache with invalidation on user write	M
03	`CohortSignalLedger`	Per-cohort `DashMap<(CohortName, EntityId, SignalTypeId), HotEntry>` with decay/velocity/windowed semantics identical to global ledger	L
04	Wire cohort attribution into signal write path	On `db.signal()`, resolve user's cohorts, call `cohort_ledger.record(cohort, entity, signal, weight, ts)` for each	M
05	`RetrieveBuilder::cohort()` + executor integration	Thread cohort name through `Retrieve` → `RetrieveExecutor`; read from `CohortSignalLedger` in scoring stage	M
06	Ad-hoc cohort predicate (`cohort_predicate`)	Resolve matching users at query time; aggregate their signals on-demand	M
07	`cohort_trending` builtin profile + tests	Register profile; integration test proving cohort vs. global trending divergence	M
08	Checkpoint/restore for cohort state	Serialize `CohortSignalLedger` alongside global ledger in periodic checkpoint	S

Depends On: None (foundation phase; all other M6 phases can start in parallel once task 03 is complete)
Complexity: XL
Research Reference: VISION.md:46-50 (cohort model), USE_CASES.md:554-591 (UC-15), SEQUENCE.md:306-347 (cohort trending sequence), docs/research/tidaldb_signal_ledger.md (signal aggregation architecture)
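The write-path attribution in task 04 amounts to one global update plus one update per resolved cohort. The sketch assumes an exponential half-life decay for HotEntry (the criteria only say "same semantics as the global SignalLedger"), millisecond timestamps, and HashMap in place of DashMap. Because it runs under &mut self, it does not model the atomicity requirement between the two ledgers.

```rust
use std::collections::HashMap;

type EntityId = u64;
type SignalTypeId = u16;
type CohortName = String;

/// Decayed accumulator (assumed model: exponential decay with a half-life).
#[derive(Debug, Clone, Copy, Default)]
struct HotEntry {
    value: f64,
    last_ts_ms: u64,
}

impl HotEntry {
    fn record(&mut self, weight: f64, ts_ms: u64, half_life_ms: f64) {
        // Out-of-order writes get dt = 0, i.e. no extra decay.
        let dt = ts_ms.saturating_sub(self.last_ts_ms) as f64;
        self.value = self.value * 0.5f64.powf(dt / half_life_ms) + weight;
        self.last_ts_ms = self.last_ts_ms.max(ts_ms);
    }
}

#[derive(Default)]
struct Ledgers {
    global: HashMap<(EntityId, SignalTypeId), HotEntry>,
    cohort: HashMap<(CohortName, EntityId, SignalTypeId), HotEntry>,
}

impl Ledgers {
    /// One signal write: update the global entry, then one entry per cohort
    /// the writing user belongs to (as resolved by CohortResolver).
    fn record(&mut self, user_cohorts: &[CohortName], entity: EntityId, signal: SignalTypeId,
              weight: f64, ts_ms: u64, half_life_ms: f64) {
        self.global.entry((entity, signal)).or_default().record(weight, ts_ms, half_life_ms);
        for c in user_cohorts {
            self.cohort
                .entry((c.clone(), entity, signal))
                .or_default()
                .record(weight, ts_ms, half_life_ms);
        }
    }
}
```

The usage test shows why cohort trending diverges from global trending: a non-member's signal moves only the global entry.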

Phase 2: Social Graph Trending and Collaborative Filtering (m6p2)

Delivers: Reverse relationship index (creator→followers), social-graph-scoped trending filter (FilterExpr::social_graph), co-engagement signal tracking, collaborative filtering boost in the related profile.

Acceptance Criteria:

Reverse relationship index maintained: given a creator_id, retrieve all users who follow them. Implemented as DashMap<EntityId, RoaringBitmap> mapping entity → inbound follower set; updated on every write_relationship call
Reverse index persisted alongside relationship data; survives crash and restart
FilterExpr::social_graph(user_id, depth: u8) implemented: depth=1 constrains candidates to items from creators the user follows; depth=2 additionally includes items engaged by the users that user_id follows
Social-graph-scoped trending: when FilterExpr::social_graph is used with a trending profile, velocity reads are scoped to signals from users in the resolved social subgraph
Co-engagement tracking: on a positive engagement signal (like or completion ≥ 0.8), record pairwise co-engagement edges between the engaged item and the user's last 50 positively-engaged items. Edge weight incremented per co-occurrence. Stored in DashMap<(EntityId, EntityId), f32> with LRU eviction at configurable capacity (default: 50K pairs)
related profile scoring incorporates co-engagement: final = embedding_sim × 0.6 + co_engagement × 0.3 + signal_score × 0.1
Co-engagement is asymmetric: (A,B) and (B,A) are separate entries; query uses the seed item as the first key
Performance: social_graph filter at depth 2 with 20 users / 10 followed creators completes in < 50ms
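The co-engagement rules above can be sketched as follows. This is a minimal, single-threaded sketch, not tidalDB's implementation: `HashMap` stands in for `DashMap`, a `BTreeMap` keyed by a logical clock provides the LRU order, and `EntityId` is a plain `u64`. Two choices are assumptions the roadmap leaves open: each edge is keyed (earlier item, newly engaged item), and raw counts are scaled by the largest count among the current candidates before the 0.6/0.3/0.1 blend.

```rust
use std::collections::{BTreeMap, HashMap, VecDeque};

type EntityId = u64; // stand-in for tidalDB's EntityId

/// Hypothetical sketch of `CoEngagementIndex` (single-threaded for clarity).
pub struct CoEngagementIndex {
    capacity: usize,    // max edges kept (roadmap default: 50_000)
    history_len: usize, // recent positive items remembered per user (roadmap: 50)
    tick: u64,          // logical clock used for LRU ordering
    /// (seed, other) -> (weight, last-touched tick); asymmetric by construction
    edges: HashMap<(EntityId, EntityId), (f32, u64)>,
    /// last-touched tick -> edge key; the smallest tick is least recently used
    lru: BTreeMap<u64, (EntityId, EntityId)>,
    /// per-user recent positive engagements, oldest at the front
    recent: HashMap<EntityId, VecDeque<EntityId>>,
}

impl CoEngagementIndex {
    pub fn new(capacity: usize, history_len: usize) -> Self {
        Self {
            capacity,
            history_len,
            tick: 0,
            edges: HashMap::new(),
            lru: BTreeMap::new(),
            recent: HashMap::new(),
        }
    }

    /// Record a positive engagement (like, or completion >= 0.8) by `user` on `item`.
    /// Assumption: edges are keyed (earlier_item, item), so a lookup seeded by A
    /// returns items users went on to engage with after A.
    pub fn record_positive(&mut self, user: EntityId, item: EntityId) {
        let history = self.recent.entry(user).or_default();
        let prior: Vec<EntityId> = history.iter().copied().filter(|&p| p != item).collect();
        history.push_back(item);
        if history.len() > self.history_len {
            history.pop_front();
        }
        for p in prior {
            self.bump((p, item));
        }
    }

    fn bump(&mut self, key: (EntityId, EntityId)) {
        self.tick += 1;
        let tick = self.tick;
        let entry = self.edges.entry(key).or_insert((0.0, 0));
        if entry.1 != 0 {
            self.lru.remove(&entry.1); // drop the stale recency slot
        }
        entry.0 += 1.0;
        entry.1 = tick;
        self.lru.insert(tick, key);
        // Evict least-recently-touched edges once over capacity.
        while self.edges.len() > self.capacity {
            match self.lru.pop_first() {
                Some((_, old)) => {
                    self.edges.remove(&old);
                }
                None => break,
            }
        }
    }

    /// Raw co-engagement weight of `candidate` for `seed` (reads do not refresh recency).
    pub fn score(&self, seed: EntityId, candidate: EntityId) -> f32 {
        self.edges.get(&(seed, candidate)).map_or(0.0, |e| e.0)
    }
}

/// `related` blend: 0.6 * embedding_sim + 0.3 * co_engagement + 0.1 * signal_score.
/// Assumption: raw counts are scaled into [0, 1] by the largest candidate count.
pub fn related_score(embedding_sim: f32, co_raw: f32, co_max: f32, signal_score: f32) -> f32 {
    let co = if co_max > 0.0 { co_raw / co_max } else { 0.0 };
    0.6 * embedding_sim + 0.3 * co + 0.1 * signal_score
}
```

Keying the LRU by a monotonically increasing tick keeps eviction at O(log n) without a linked list; a concurrent version would need a different structure.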

Task Breakdown:

#	Task	Delivers	Complexity
01	Reverse relationship index	`DashMap<EntityId, RoaringBitmap>`, updated on `write_relationship`, persisted to storage	M
02	`FilterExpr::social_graph(user_id, depth)`	New filter variant; depth-1 resolves followed creators → their items via `CreatorItemsBitmap`; depth-2 expands to engaged items of followed users	L
03	Co-engagement tracker	`CoEngagementIndex`: pairwise co-occurrence counting on positive engagement signals; bounded with LRU eviction	L
04	Wire co-engagement into `related` profile	In executor scoring, fetch co-engagement scores for all candidates relative to seed item; blend into final score	M
05	Social-graph-scoped trending	When `social_graph` filter is present with a trending profile, scope velocity reads to signals from users in the resolved subgraph	M
06	Persistence + crash recovery	Serialize reverse index and co-engagement map at checkpoint; restore on open	S
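The depth-1/depth-2 resolution in task 02 amounts to a bounded walk over the follow graph. A minimal sketch, assuming the followed accounts serve both as the creators whose items count at depth 1 and as the users whose engagements count at depth 2; `SocialGraph` and `resolve_social_graph` are hypothetical names, and `HashSet`s stand in for RoaringBitmaps.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

type EntityId = u64; // stand-in for tidalDB's EntityId

/// Hypothetical in-memory view of the graph data the filter reads.
pub struct SocialGraph {
    pub follows: HashMap<EntityId, HashSet<EntityId>>,       // user -> followed accounts
    pub creator_items: HashMap<EntityId, HashSet<EntityId>>, // creator -> items (CreatorItemsBitmap)
    pub engaged: HashMap<EntityId, HashSet<EntityId>>,       // user -> positively engaged items
}

/// Resolve the candidate set for `social_graph(user, depth)`.
pub fn resolve_social_graph(g: &SocialGraph, user: EntityId, depth: u8) -> BTreeSet<EntityId> {
    let mut out = BTreeSet::new();
    let Some(followed) = g.follows.get(&user) else {
        return out; // follows nobody: empty scope
    };
    if depth == 0 {
        return out;
    }
    for account in followed {
        // depth 1: items published by followed creators
        if let Some(items) = g.creator_items.get(account) {
            out.extend(items.iter().copied());
        }
        // depth 2: also items those followed accounts engaged with
        if depth >= 2 {
            if let Some(items) = g.engaged.get(account) {
                out.extend(items.iter().copied());
            }
        }
    }
    out
}
```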

Depends On: Phase 1 (cohort ledger patterns reused for social-graph signal scoping) Complexity: L Research Reference: USE_CASES.md:248-270 (UC-05 collaborative filtering), SEQUENCE.md:95-141 (UC-03 social trending), thoughts.md:104-113 (lock-free concurrency), docs/research/ann_for_tidaldb.md (PinnerSage multi-query retrieval)

Phase 3: Full Sort Mode Coverage + Live Content + Engagement Filters (m6p3)

Delivers: All missing Sort variants, viewer_count signal for live content, engagement threshold filters, geographic post-filter, live built-in profile.

Acceptance Criteria:

Sort enum extended with: AlphabeticalAsc, AlphabeticalDesc, Shortest, Longest, MostCommented { window: Window }, MostShared { window: Window }, LiveViewerCount, DateSaved
AlphabeticalAsc / AlphabeticalDesc: sort by item "title" metadata field, case-insensitive, with missing-title items last
Shortest / Longest: sort by item "duration" metadata field in seconds; items without duration last
MostCommented / MostShared: sort by windowed count of "comment" / "share" signal, following the same pattern as existing MostViewed / MostLiked
LiveViewerCount: sort by current decayed score of "viewer_count" signal; items without the signal score 0.0
DateSaved: sort by timestamp when the querying user saved the item (from UserStateIndex); requires for_user; returns TidalError::Query if absent
viewer_count signal type pre-registered in default schema: exponential decay with 5-minute half-life, no windowed aggregation, no velocity (represents current concurrent viewer count)
live built-in profile registered: CandidateStrategy::Scan, Sort::LiveViewerCount, relationship_weight boost from querying user's follows, diversity max_per_creator:1
FilterExpr::MinSignal { signal: String, threshold: f64 } and FilterExpr::MaxSignal { signal: String, threshold: f64 } evaluate AllTime windowed count against a threshold
FilterExpr::NearLocation { lat: f64, lng: f64, radius_km: f64 } evaluates Haversine distance against item "latitude" / "longitude" metadata; evaluated as a post-filter (not index-backed)
All 20 Sort enum variants (covering all 27 Appendix B sort modes via window parameterization) have at least one unit test proving correct ordering semantics
Performance: metadata-based sorts (alphabetical, duration) complete in < 50ms at 500 items
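The missing-field rules for the metadata sorts are easy to get wrong when the direction flips: simply reversing an ascending comparator would move missing titles to the front in descending mode. A sketch of comparators that keep missing or unparseable values last in both directions, assuming metadata is a `HashMap<String, String>`; the function names are hypothetical. `to_lowercase` is Unicode lowercasing, not locale-aware collation.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

type Metadata = HashMap<String, String>; // assumed metadata representation

/// AlphabeticalAsc / AlphabeticalDesc: case-insensitive title order,
/// items without a "title" last regardless of direction.
pub fn cmp_title(a: &Metadata, b: &Metadata, descending: bool) -> Ordering {
    let ka = a.get("title").map(|t| t.to_lowercase());
    let kb = b.get("title").map(|t| t.to_lowercase());
    match (ka, kb) {
        (Some(x), Some(y)) => {
            if descending { y.cmp(&x) } else { x.cmp(&y) }
        }
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    }
}

/// Shortest / Longest: "duration" in seconds; missing or unparseable values last.
pub fn cmp_duration(a: &Metadata, b: &Metadata, longest_first: bool) -> Ordering {
    let parse = |m: &Metadata| m.get("duration").and_then(|d| d.trim().parse::<f64>().ok());
    match (parse(a), parse(b)) {
        (Some(x), Some(y)) => {
            let o = x.total_cmp(&y);
            if longest_first { o.reverse() } else { o }
        }
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    }
}
```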

Task Breakdown:

#	Task	Delivers	Complexity
01	Extend `Sort` enum with 8 new variants	Add enum variants to `ranking/profile.rs`	S
02	Metadata-based sort scoring	Case-insensitive title sort; duration-in-seconds parse; missing-field handling	M
03	Signal-count sort variants	Wire windowed count reads for "comment"/"share"; LiveViewerCount reads decay score of "viewer_count"	M
04	`DateSaved` sort	Read save timestamp from `UserStateIndex` per candidate; requires `for_user`; sort descending	M
05	`viewer_count` signal + `live` builtin	Register signal in default schema; register `live` profile	S
06	Engagement threshold filters	`MinSignal` / `MaxSignal` filter variants with AllTime windowed count evaluation	M
07	Geographic filter	`NearLocation` filter variant with Haversine distance post-filter	S
08	Unit + integration tests	One test per Sort variant; tests for engagement and geographic filters	M
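A sketch of the `NearLocation` post-filter using the standard Haversine formula, d = 2R·asin(√a) with a = sin²(Δφ/2) + cos φ₁·cos φ₂·sin²(Δλ/2). The mean Earth radius of 6371 km, string-encoded "latitude" / "longitude" metadata, and excluding items without coordinates are assumptions; the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Mean Earth radius in kilometres (assumed value).
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance between two (lat, lng) points given in degrees.
pub fn haversine_km(lat1: f64, lng1: f64, lat2: f64, lng2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let dphi = (lat2 - lat1).to_radians();
    let dlambda = (lng2 - lng1).to_radians();
    let a = (dphi / 2.0).sin().powi(2) + phi1.cos() * phi2.cos() * (dlambda / 2.0).sin().powi(2);
    // Clamp guards against sqrt(a) drifting just above 1.0 from rounding.
    2.0 * EARTH_RADIUS_KM * a.sqrt().min(1.0).asin()
}

/// Post-filter: true if the item's coordinates lie within `radius_km` of (lat, lng).
pub fn near_location(meta: &HashMap<String, String>, lat: f64, lng: f64, radius_km: f64) -> bool {
    let coord = |k: &str| meta.get(k).and_then(|v| v.trim().parse::<f64>().ok());
    match (coord("latitude"), coord("longitude")) {
        (Some(ilat), Some(ilng)) => haversine_km(lat, lng, ilat, ilng) <= radius_km,
        _ => false, // items without parseable coordinates never match
    }
}
```

Because this runs per candidate after retrieval, its cost scales with the candidate set rather than the catalog, which is why the roadmap keeps it as a post-filter rather than an index.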

Depends On: None (parallel with phases 1 and 2; touches different modules) Complexity: L Research Reference: USE_CASES.md:280-336 (UC-06 sort modes), USE_CASES.md:475-501 (UC-12 live content), USE_CASES.md:594-696 (Appendix A filters), API.md:1001-1035 (Sort enum spec)

Phase 4: User Collections + Watch History + Saved Searches (m6p4)

Delivers: Collection entity type, collection CRUD API, in_collection filter, in_progress user state filter, saved searches as persistent feeds, cross-session preference aggregation onto global user vector.

Acceptance Criteria:

db.create_collection(owner: EntityId, name: &str, visibility: Visibility) -> Result<CollectionId> creates a named collection; Visibility is Private, Shared, Public
db.add_to_collection(collection_id: &CollectionId, item_id: EntityId) -> Result<()> adds an item; idempotent (adding the same item twice is not an error)
db.remove_from_collection(collection_id: &CollectionId, item_id: EntityId) -> Result<()> removes an item
db.list_collections(owner: EntityId) -> Result<Vec<CollectionInfo>> returns the user's collections
Item membership stored as DashMap<CollectionId, RoaringBitmap> for O(1) membership check; persisted to fjall
FilterExpr::in_collection(collection_id) constrains candidates to the collection's RoaringBitmap
FilterExpr::user_state("in_progress") returns items where the user has a partial_completion or completion signal with weight < 0.9; scans user state for items matching this predicate
db.save_search(user, name, query_text, filters) -> Result<()> persists a search configuration; db.list_saved_searches(user) -> Result<Vec<SavedSearchInfo>>; db.retrieve_saved_search(user, name, since) -> Result<SearchResults> re-executes the search with created_after: since; db.delete_saved_search(user, name) -> Result<()>
Cross-session preference aggregation: on close_session, session-level preference hints are merged into the user's global preference vector with a configurable damping factor (default: 0.1 × session hint weight); closes the M4 deferred item ("session signal influence on global user preference vector")
Collections and saved searches survive crash and restart
Performance: in_collection filter with 100 items completes in < 10ms
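The roadmap does not pin down the exact merge rule for cross-session preference aggregation. One plausible reading, sketched here as an assumption, treats `damping × hint_weight` as an EMA step that moves the global vector that fraction of the way toward the session hint. `merge_session_hint` is a hypothetical name, and renormalization of the result is omitted.

```rust
/// Hypothetical merge of one session preference hint into the global vector.
/// step = damping * hint_weight (default damping 0.1), clamped to [0, 1].
pub fn merge_session_hint(global: &mut [f32], hint: &[f32], hint_weight: f32, damping: f32) {
    assert_eq!(global.len(), hint.len(), "dimension mismatch");
    let step = (damping * hint_weight).clamp(0.0, 1.0);
    for (g, h) in global.iter_mut().zip(hint) {
        // Move each component `step` of the way toward the hint.
        *g += step * (h - *g);
    }
}
```

With the default damping, a single session with hint weight 1.0 moves the global vector 10% of the way toward its hint, so one unusual session cannot overwrite long-term taste.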

Task Breakdown:

#	Task	Delivers	Complexity
01	Collection storage model	`Collection` struct, `CollectionId` newtype, `Visibility` enum; item membership `DashMap<CollectionId, RoaringBitmap>`; persisted to fjall	M
02	Collection CRUD API	`create_collection`, `add_to_collection`, `remove_from_collection`, `list_collections` on `TidalDb`; idempotent add	M
03	`FilterExpr::in_collection`	New filter variant; evaluator checks membership in the bitmap	S
04	`FilterExpr::user_state("in_progress")`	Extend user state filter: detect partial completion from user state index	M
05	Saved search storage + CRUD	`SavedSearch` struct in users keyspace; `save_search`, `list_saved_searches`, `delete_saved_search`, `retrieve_saved_search`	M
06	Cross-session preference aggregation	On `close_session`, extract preference delta from `SessionSnapshot`, apply to global preference vector with damping	M
07	Persistence + integration tests	Collections and saved searches survive restart; CRUD tests; `in_progress` filter test	M
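The `in_progress` predicate from task 04 reduces to a filter over the user's completion state. A sketch assuming a per-item map holding the latest completion-type signal; `CompletionState` and `in_progress` are hypothetical names, and the 0.9 cutoff comes from the acceptance criteria.

```rust
use std::collections::HashMap;

type EntityId = u64; // stand-in for tidalDB's EntityId

/// Hypothetical per-user state entry: latest completion-type signal for an item.
pub struct CompletionState {
    pub signal: String, // e.g. "partial_completion", "completion", "like"
    pub weight: f32,    // completion fraction in [0, 1]
}

/// Items started but not finished: a partial_completion or completion signal
/// with weight below 0.9. Sorted for stable output.
pub fn in_progress(state: &HashMap<EntityId, CompletionState>) -> Vec<EntityId> {
    let mut ids: Vec<EntityId> = state
        .iter()
        .filter(|&(_, s)| {
            matches!(s.signal.as_str(), "partial_completion" | "completion") && s.weight < 0.9
        })
        .map(|(&id, _)| id)
        .collect();
    ids.sort_unstable();
    ids
}
```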

Depends On: None (parallel with phases 1-3; operates on different storage surfaces) Complexity: L Research Reference: USE_CASES.md:381-421 (UC-09), API.md:1196-1210 (Collections), API.md:1177-1192 (Saved Searches), VISION.md:52-53 (session→global preference boundary)

Phase 5: Query Composition + SUGGEST Autocomplete (m6p5)

Delivers: WithinScope on SEARCH queries (Trending, CohortTrending, Following, Category, Collection), db.suggest() autocomplete with prefix match and trending searches.

Acceptance Criteria:

SearchBuilder::within(WithinScope) constrains the candidate set before BM25+ANN retrieval; the pre-filter produces a RoaringBitmap passed to both Tantivy (post-filter) and USearch (predicate callback)
WithinScope::Trending { window_hours: u64 } -- candidates are items with view+share velocity above the p75 threshold in the specified window; computed from the global signal ledger
WithinScope::CohortTrending { cohort: String, window_hours: u64 } -- candidates are items with cohort-scoped velocity above threshold; requires cohort to be defined (TidalError::NotFound otherwise)
WithinScope::Following -- candidates are items from creators the querying user follows; requires for_user (TidalError::Query otherwise)
WithinScope::Category { name: String } -- candidates are items matching the category metadata value
WithinScope::Collection { id: CollectionId } -- candidates are items in the specified collection's bitmap
db.suggest(Suggest { prefix: String, for_user: Option<EntityId>, limit: u32 }) -> Result<Vec<Suggestion>>
Prefix autocomplete: SuggestionIndex maintains a sorted Vec<String> of terms extracted from item titles at write time; binary search on prefix; updated incrementally on write_item_with_metadata
Trending searches: when prefix is empty, returns top N search terms by recent query frequency; tracked via DashMap<String, AtomicU64> incremented on each db.search() call; periodic pruning of stale entries
Performance: SEARCH with WithinScope completes in < 50ms; SUGGEST completes in < 20ms
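Resolving `WithinScope::Trending` means computing a percentile over current velocities and keeping the items above it. A sketch that assumes combined view+share velocities have already been read from the global signal ledger into a map, and uses the nearest-rank percentile method; the roadmap does not specify the method, and `trending_scope` is a hypothetical name. Items tied at exactly the p75 value are excluded because the comparison is strict.

```rust
use std::collections::{BTreeSet, HashMap};

type EntityId = u64; // stand-in for tidalDB's EntityId

/// Items whose velocity is strictly above the nearest-rank 75th percentile.
pub fn trending_scope(velocity: &HashMap<EntityId, f64>) -> BTreeSet<EntityId> {
    if velocity.is_empty() {
        return BTreeSet::new();
    }
    let mut vals: Vec<f64> = velocity.values().copied().collect();
    vals.sort_by(f64::total_cmp);
    // Nearest-rank: 1-based rank = ceil(0.75 * n).
    let rank = ((0.75 * vals.len() as f64).ceil() as usize).max(1);
    let p75 = vals[rank - 1];
    velocity
        .iter()
        .filter(|&(_, &v)| v > p75)
        .map(|(&id, _)| id)
        .collect()
}
```

The resulting set is what the roadmap then converts to a RoaringBitmap and hands to both the Tantivy post-filter and the USearch predicate.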

Task Breakdown:

#	Task	Delivers	Complexity
01	`WithinScope` enum + `SearchBuilder::within()`	Define enum with 5 variants; add field to `Search` struct	S
02	`ScopeResolver` -- scope to bitmap	Converts `WithinScope` into `RoaringBitmap`: Trending/CohortTrending via velocity scan, Following via FollowsBitmap, Category via bitmap index, Collection via collection bitmap	L
03	Wire scope bitmap into `SearchExecutor`	Pass scope bitmap to BM25 (Tantivy post-filter) and ANN (USearch predicate callback) before candidate scoring	L
04	`SuggestionIndex` + prefix autocomplete	Sorted `Vec<String>` of title terms; incremental updates on `write_item`; binary search on prefix match	M
05	Trending search frequency counter	`DashMap<String, AtomicU64>` incremented on `db.search()`; `suggest(prefix="")` returns top N	M
06	`db.suggest()` public API	Delegates to `SuggestionIndex` for prefix; trending counter for empty prefix; optional interest-category boost when `for_user` is set	S
07	Integration tests	Each `WithinScope` variant tested independently; composition test (search within cohort trending); suggest with prefix and empty prefix	M
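Prefix completion over a sorted term list and the trending-search counter can both be sketched with the standard library alone. Assumptions: whitespace tokenization with lowercasing, an `RwLock<HashMap<String, AtomicU64>>` standing in for `DashMap`, and no stale-entry pruning. `partition_point` finds the first term not less than the prefix, and all matches are contiguous from there because the list is sorted. Sorted-`Vec` insertion is O(n), which is likely fine at the 500-item test scale but worth revisiting for large catalogs.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::RwLock;

/// Hypothetical `SuggestionIndex`: sorted, deduplicated lowercase title terms.
#[derive(Default)]
pub struct SuggestionIndex {
    terms: Vec<String>,
}

impl SuggestionIndex {
    /// Incremental update on item write: insert each new term at its sorted position.
    pub fn add_title(&mut self, title: &str) {
        for term in title.split_whitespace().map(str::to_lowercase) {
            if let Err(pos) = self.terms.binary_search(&term) {
                self.terms.insert(pos, term);
            }
        }
    }

    /// Up to `limit` terms starting with `prefix`, in sorted order.
    pub fn complete(&self, prefix: &str, limit: usize) -> Vec<&str> {
        let prefix = prefix.to_lowercase();
        let start = self.terms.partition_point(|t| t.as_str() < prefix.as_str());
        self.terms[start..]
            .iter()
            .take_while(|t| t.starts_with(&prefix))
            .take(limit)
            .map(String::as_str)
            .collect()
    }
}

/// Hypothetical trending-search counter, incremented on each search call.
#[derive(Default)]
pub struct QueryFrequency {
    counts: RwLock<HashMap<String, AtomicU64>>,
}

impl QueryFrequency {
    pub fn record(&self, query: &str) {
        let q = query.trim().to_lowercase();
        // Fast path: bump an existing counter under the shared read lock.
        if let Some(c) = self.counts.read().unwrap().get(&q) {
            c.fetch_add(1, Ordering::Relaxed);
            return;
        }
        self.counts.write().unwrap().entry(q).or_default().fetch_add(1, Ordering::Relaxed);
    }

    /// Top `n` queries by count; ties broken alphabetically for stable output.
    pub fn top(&self, n: usize) -> Vec<(String, u64)> {
        let map = self.counts.read().unwrap();
        let mut v: Vec<(String, u64)> =
            map.iter().map(|(k, c)| (k.clone(), c.load(Ordering::Relaxed))).collect();
        v.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
        v.truncate(n);
        v
    }
}
```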

Depends On: Phase 1 (CohortTrending scope needs cohort ledger), Phase 4 (Collection scope needs collection bitmap) Complexity: L Research Reference: USE_CASES.md:148-174 (UC-02.3-5), API.md:822-895 (WithinScope + SUGGEST spec), SEQUENCE.md:306-347 (search within cohort trending)

Phase 6: Notification Capping + Adaptive Preferences + Creator Profile Modes + M6 UAT (m6p6)

Delivers: Notification frequency capping, adaptive preference learning rate, for_creator query constraint for creator profile pages, comprehensive M6 UAT test suite proving all 15 use cases.

Acceptance Criteria:

NotificationCaps { max_per_creator_per_day: u32, max_total_per_day: u32 } type defined; RetrieveBuilder::notification_caps(caps) adds it to the query
Notification caps enforced as a post-diversity pass: count results per creator (using a HashMap<EntityId, u32>) and cap at max_per_creator_per_day; cap total results at max_total_per_day
Per-creator notification tracking: DashMap<(EntityId, EntityId, NaiveDate), u32> counting notifications delivered (user, creator, date); updated after each notification-profile RETRIEVE; reset implied by date key expiry
Adaptive preference learning rate: EMA alpha decays logarithmically with update count per user: alpha = base_alpha / (1 + ln(update_count + 1)); base_alpha configurable in schema (default: 0.1); update count tracked in UserStateIndex alongside preference vector
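With the default base_alpha of 0.1, alpha falls from 0.1 at update count 0 to 0.1 / (1 + ln 1001) ≈ 0.0126 at update count 1000, so after 1000 preference updates a new signal shifts the vector about 12.6% as much as the first signal -- verified by a unit test comparing shift magnitude at update counts 1, 100, 1000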
RetrieveBuilder::for_creator(creator_id: EntityId) adds FilterExpr::eq("creator_id", creator_id.to_string()) and restricts candidate generation to items from that creator via CreatorItemsBitmap
Creator profile page modes verified: for_creator(x) + for_you profile returns creator x's items ranked by the querying user's preferences; for_creator(x) + hot returns hot-sorted items within x's catalog
M6 UAT integration test (tidal/tests/m6_uat.rs): all 9 UAT steps from the scenario above, each as a separate #[test] function; data setup shared via setup_m6_test_db()
All prior milestone integration tests (m2_uat, m3_uat, m4_uat, m5_uat, m5_search, m5p4_creator_search) continue to pass
Total of 25 built-in profiles: 16 existing + cohort_trending (m6p1) + live (m6p3) + 7 sort-mode profiles added in m6p3 (alphabetical_asc, alphabetical_desc, shortest, longest, most_commented, most_shared, date_saved)
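The adaptive learning-rate formula and the EMA update it feeds can be checked numerically. Under this formula, alpha at update count 1000 is about 0.0126, roughly 12.6% of the initial 0.1; the decay is logarithmic, so it flattens slowly. `adaptive_alpha` and `update_preference` are hypothetical names for the logic task 03 places in `update_preference_vector()`.

```rust
/// alpha = base_alpha / (1 + ln(update_count + 1)), as specified for m6p6.
pub fn adaptive_alpha(base_alpha: f64, update_count: u64) -> f64 {
    base_alpha / (1.0 + ((update_count + 1) as f64).ln())
}

/// EMA step of the preference vector toward a new signal embedding,
/// using the alpha for this user's current update count.
pub fn update_preference(pref: &mut [f64], signal: &[f64], base_alpha: f64, update_count: u64) {
    let alpha = adaptive_alpha(base_alpha, update_count);
    for (p, s) in pref.iter_mut().zip(signal) {
        *p += alpha * (s - *p);
    }
}
```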

Task Breakdown:

#	Task	Delivers	Complexity
01	`NotificationCaps` type + `RetrieveBuilder::notification_caps()`	Define struct; add optional field to `Retrieve`	S
02	Notification cap enforcement in executor	Post-diversity pass with per-creator and total count tracking	M
03	Adaptive preference learning rate	Modify `update_preference_vector()` to read update count; alpha formula; unit test for decay curve	M
04	`RetrieveBuilder::for_creator(creator_id)`	Convenience method adding creator filter + `CreatorItemsBitmap` restriction	S
05	Creator profile mode integration tests	Verify `for_creator` + `for_you` and `for_creator` + `hot` produce correct scoped rankings	S
06	M6 UAT test suite	`tidal/tests/m6_uat.rs`: 9 test functions covering all UAT steps; shared `setup_m6_test_db()` fixture; all 15 UCs exercised	XL
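A sketch of the post-diversity cap pass from task 02. It walks results in rank order, skips items whose creator has reached `max_per_creator_per_day`, and stops once `max_total_per_day` is reached. Seeding the counters from the per-(user, creator, date) tracker via `already_sent` and `sent_today_total` is an assumption; passing an empty map and 0 gives a pure per-batch pass. The tuple input and `apply_caps` are hypothetical.

```rust
use std::collections::HashMap;

type EntityId = u64; // stand-in for tidalDB's EntityId

pub struct NotificationCaps {
    pub max_per_creator_per_day: u32,
    pub max_total_per_day: u32,
}

/// Returns the item ids that survive the caps, preserving rank order.
pub fn apply_caps(
    ranked: &[(EntityId, EntityId)], // (item_id, creator_id), best first
    caps: &NotificationCaps,
    already_sent: &HashMap<EntityId, u32>, // today's delivered count per creator
    sent_today_total: u32,
) -> Vec<EntityId> {
    let mut per_creator: HashMap<EntityId, u32> = HashMap::new();
    let mut total = sent_today_total;
    let mut out = Vec::new();
    for &(item, creator) in ranked {
        if total >= caps.max_total_per_day {
            break; // daily total exhausted
        }
        let prior = already_sent.get(&creator).copied().unwrap_or(0);
        let n = per_creator.entry(creator).or_insert(prior);
        if *n >= caps.max_per_creator_per_day {
            continue; // this creator is capped; lower-ranked items from others may still fit
        }
        *n += 1;
        total += 1;
        out.push(item);
    }
    out
}
```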

Depends On: All prior M6 phases (UAT exercises everything built in m6p1-m6p5) Complexity: L Research Reference: USE_CASES.md:339-362 (UC-07 notification caps), USE_CASES.md:366-378 (UC-08 creator profile modes), VISION.md:43-44 (user preferences update continuously)

Phase Dependency DAG

m6p1 (Cohort Engine)
    |          \
    v           \
m6p2 (Social     m6p3 (Sort + Live)    m6p4 (Collections)
  Graph)                |                      |
    |                   |                      |
    +-------------------+----------+-----------+
                                   |
                              m6p5 (Query Composition + SUGGEST)
                                   |
                              m6p6 (Notification Caps + M6 UAT)

m6p1 is the cohort foundation used by m6p2 (social-graph signal scoping patterns) and m6p5 (CohortTrending scope)
m6p3 and m6p4 can execute in parallel with m6p1 and with each other -- they touch different modules; m6p2 reuses m6p1's cohort-ledger patterns and can start once m6p1 task 03 is complete
m6p5 requires m6p1 (CohortTrending scope) and m6p4 (Collection scope) to be complete
m6p6 requires all prior phases complete (UAT exercises everything)

Deferred to Later Milestones

Topic-cluster notification capping (UC-07.3) -- capping at a maximum per topic cluster per batch requires a cluster-assignment data structure (topic embeddings → cluster ID) and a per-cluster counter alongside the per-creator counter in NotificationTracker; the per-creator and per-user-total caps shipped in m6p6 cover the primary UC-07 need, so topic-cluster capping is a refinement (planned M7)
"Did you mean" typo correction (UC-02.3) -- requires edit-distance automata over the Tantivy term dictionary; prefix autocomplete covers the primary use case for M6 (planned M7)
Personalized suggestions from full search history -- tracking per-user query history as a signal stream and using it for SUGGEST personalization beyond interest-category boosting (planned M7)
Collaborative collections (multi-user boards) -- multi-user write access requires access control beyond owner-only; single-owner collections ship in M6 (planned M7)
Visual search / crop-and-search (UC-06.4, UC-11.1) -- requires image segmentation and region embedding, which is generation; out of scope per VISION.md ("tidalDB does not generate embeddings")
Mood/aesthetic embedding regions (UC-06.3) -- requires application-provided mood anchor embeddings to define regions; database infrastructure exists but semantic regions must come from the application
Signal rollups (hourly/daily materialization for 30d+ windows) -- build only if 500-item benchmarks show bucketed counters exceeding the 50ms budget; not required for M6 test scale (planned M7)
Multi-vector user interest clustering (PinnerSage) -- single preference vector serves through M6; multi-vector clustering adds a new data structure and requires offline training (planned M7+)
Search result explanation ("why this result?") -- Tantivy provides Query::explain() per document but it is expensive at query time; useful for debugging tools, not production serving (planned M7)
Cross-session aggregation dashboards -- the preference merging on session close (m6p4 task 06) closes the correctness gap; a full "what did my agents learn this week?" analytics API requires materialization over closed session archives (planned M7)
Horizontal distribution / partitioned keyspaces -- the key encoding and WAL format are partitioning-ready; actual multi-node deployment is M8

Integration Test

// tidal/tests/m6_uat.rs

fn setup_m6_test_db() -> TidalDb {
    // 500 items, 50 creators, 20 users, 2 cohorts, 3 collections, 5 live items
    // ... (full setup in tests/m6_uat.rs)
}

#[test]
fn uat_step_1_cohort_scoped_trending() {
    let db = setup_m6_test_db();
    // US young music users engage heavily with items 1-10
    for user in us_young_music_users() {
        for item_id in 1u64..=10 {
            db.signal("view", EntityId::new(item_id), 1.0, Timestamp::now()).unwrap();
            db.signal("share", EntityId::new(item_id), 1.0, Timestamp::now()).unwrap();
        }
    }
    let cohort_trending = db.retrieve(
        &Retrieve::builder().profile("cohort_trending")
            .cohort("us_young_music").limit(10).build().unwrap()
    ).unwrap();
    let global_trending = db.retrieve(
        &Retrieve::builder().profile("trending").limit(10).build().unwrap()
    ).unwrap();
    let cohort_ids: Vec<u64> = cohort_trending.results.iter().map(|r| r.id.raw()).collect();
    let global_ids: Vec<u64> = global_trending.results.iter().map(|r| r.id.raw()).collect();
    assert_ne!(cohort_ids, global_ids, "cohort trending must differ from global trending");
    assert!(cohort_ids.iter().all(|&id| id <= 10),
        "cohort trending should reflect US music engagement, got: {:?}", cohort_ids);
}

#[test]
fn uat_step_3_sort_modes() {
    let db = setup_m6_test_db();
    let alpha = db.retrieve(
        &Retrieve::builder().sort(Sort::AlphabeticalAsc).limit(20).build().unwrap()
    ).unwrap();
    // Missing titles sort last, so compare (is_missing, lowercase title) keys.
    assert!(alpha.results.windows(2).all(|w| {
        let key = |i: usize| {
            let title = w[i].metadata.get("title").map(|t| t.to_lowercase());
            (title.is_none(), title)
        };
        key(0) <= key(1)
    }));
    let live = db.retrieve(
        &Retrieve::builder()
            .filter(FilterExpr::eq("status", "live"))
            .sort(Sort::LiveViewerCount)
            .limit(5).build().unwrap()
    ).unwrap();
    assert!(live.results.windows(2).all(|w| w[0].score >= w[1].score));
}

#[test]
fn uat_step_4_collections() {
    let db = setup_m6_test_db();
    let user_a = EntityId::new(1001);
    let coll = db.create_collection(user_a, "jazz_faves", Visibility::Private).unwrap();
    db.add_to_collection(&coll, EntityId::new(1)).unwrap();
    db.add_to_collection(&coll, EntityId::new(2)).unwrap();
    let results = db.retrieve(
        &Retrieve::builder()
            .for_user(user_a)
            .filter(FilterExpr::in_collection(&coll))
            .limit(10).build().unwrap()
    ).unwrap();
    assert_eq!(results.results.len(), 2);
}

#[test]
fn uat_step_7_search_within_trending() {
    let db = setup_m6_test_db();
    // generate trending jazz items...
    let results = db.search(
        &Search::builder()
            .query("jazz")
            .within(WithinScope::Trending { window_hours: 24 })
            .limit(10).build().unwrap()
    ).unwrap();
    assert!(!results.items.is_empty());
    assert!(results.items.iter().all(|r| r.bm25_score.unwrap_or(0.0) > 0.0));
}

#[test]
fn uat_step_8_suggest() {
    let db = setup_m6_test_db();
    let suggestions = db.suggest(&Suggest {
        prefix: "jazz".into(),
        for_user: None,
        limit: 5,
    }).unwrap();
    assert!(!suggestions.is_empty());
    assert!(suggestions.iter().all(|s| s.text.to_lowercase().starts_with("jazz")));
    let trending = db.suggest(&Suggest {
        prefix: "".into(),
        for_user: None,
        limit: 5,
    }).unwrap();
    assert!(!trending.is_empty(), "empty prefix must return trending searches");
}

Done When

A developer can embed TidalDb, define 2 cohorts, write 500 items with metadata and embeddings across 50 creators, register 20 users with demographic attributes, build a relationship graph, create user collections, mark items as live, record engagement signals, and then verify all 9 UAT steps pass:

Cohort-scoped trending returns items trending within the cohort -- distinct from global trending
Social-graph-scoped trending returns items engaged by the user's follow graph
All 20 Sort enum variants (including AlphabeticalAsc, Shortest, LiveViewerCount, DateSaved) produce correctly ordered results
Collection CRUD and in_collection filter work end-to-end; in_progress returns partially-watched items
Live content filters by status=live and sorts by viewer_count signal
Notification caps enforce per-creator and total daily limits
SEARCH with WithinScope (Trending, CohortTrending, Following, Collection) correctly intersects scope with text+vector retrieval
SUGGEST returns prefix completions and trending searches in < 20ms
Related profile incorporates co-engagement alongside embedding similarity

All prior milestone tests (m2_uat, m3_uat, m4_uat, m5_uat) continue to pass. Every query at the 500-item test scale completes in under 50ms. UC-01 through UC-15 are verifiable end-to-end.

Milestone 7: Production Hardening -- "Ready for real workloads"

Milestone Thesis

M6 proved that tidalDB can handle every discovery surface, sort mode, filter, and feedback loop. But "feature-complete" is not "production-ready." A skeptical SRE can still say: "Sure, it handles 500 items in a test. What happens at 1M items when the process crashes mid-checkpoint? What happens under 3x read load? How do I know the WAL is healthy?" M7 proves they are wrong. After M7, tidalDB can be embedded in a production application and operated with confidence -- crash recovery is correct and fast, graceful degradation works under load, performance meets targets at 1M+ items, abandoned sessions are cleaned up, rate limiting protects against runaway agents, and operational visibility exists. The database is trustworthy.

UAT Scenario

Given:
  A tidalDB instance with:
    - 1,000,000 items, 100,000 users, 10,000 creators
    - 10 signal types with 5 windows each
    - 2 cohorts with materialized signal aggregation
    - 50 active agent sessions with policies
    - Sustained write load: 10,000 signal events/second
    - Concurrent read load: 1,000 RETRIEVE queries/second

When:
  1. Run full workload for 1 hour
  2. Kill the process at a random point (mid-checkpoint, mid-WAL-write,
     mid-signal-aggregation)
  3. Restart and measure recovery time
  4. Verify no data loss and no inconsistency: no phantom items, no lost
     signals, no inconsistent cohort aggregates, no orphaned collections
  5. Verify abandoned sessions (>2h old) have been cleaned up
  6. Run workload at 3x expected load (30K signals/sec, 3K queries/sec)
  7. Verify graceful degradation (reduced precision, not errors)
  8. Inject a runaway agent writing 500 signals/sec to a single session
  9. Verify per-agent rate limiting rejects excess writes without affecting
     other agents
  10. Read QueryStats from results and verify timing breakdown is present
  11. Read /metrics endpoint and verify signal write latency, WAL lag,
      index health, and degradation level are all exported
  12. Run `tidalctl diagnostics --path <dir>` and verify human-readable
      health summary

Then:
  - Step 1: All queries < 50ms p99 (RETRIEVE), < 100ms p99 (SEARCH),
    all signal writes < 100us amortized
  - Step 3: Recovery time < 30 seconds (1M items checkpoint + 5min WAL backlog)
  - Step 4: WAL replay produces state identical to pre-crash; checkpoint
    integrity verified via BLAKE3; cohort ledger, collection index, and
    co-engagement index all recovered correctly; hard negatives (hidden items,
    blocked creators) never leak after any crash scenario
  - Step 5: Sessions exceeding max_session_duration are auto-closed with
    summary archived; sweeper runs every 60 seconds
  - Step 6-7: Under overload, tidalDB reduces candidate set size, uses
    coarser aggregates, skips diversity -- but never returns errors for
    well-formed queries; degradation level exposed in query response
  - Step 8-9: Agent exceeding configured rate limit gets TidalError::RateLimited
    (rate limiting is opt-in; unlimited by default; configure via
    `RateLimiterConfig::limited(rate, burst)` in builder); other agents unaffected;
    rate limit tracked per (agent_id, session_id)
  - Step 10: QueryStats includes candidates_considered, scoring_time_us,
    total_time_us, degradation_level, filters_applied
  - Step 11: Prometheus metrics include tidaldb_wal_lag_bytes,
    tidaldb_checkpoint_age_seconds, tidaldb_signal_hot_entries,
    tidaldb_tantivy_segment_count, tidaldb_usearch_index_size,
    tidaldb_degradation_level
  - Step 12: tidalctl diagnostics prints WAL state, checkpoint age, signal
    state size, index sizes, session count, degradation level

Phases

Phase 1: Crash Recovery Hardening (m7p1)

Delivers: Fault injection test harness, WAL compaction, checkpoint integrity verification, recovery time measurement, and crash fencing for all M6 state surfaces. Every write-path stage tested for crash safety. Recovery < 30 seconds guaranteed at 1M items.

Acceptance Criteria:

Fault injection harness: CrashPoint enum covering WAL pre-write, WAL post-write, checkpoint pre-flush, checkpoint post-flush, signal aggregation update, cohort ledger update, collection index update, co-engagement update; injection hooks are compiled in only under #[cfg(test)], so production builds carry no injection code
Property tests for each crash point: generate N random event sequences (N >= 1000), inject crash at random position within the write path, restart, verify state matches expected from WAL replay to 6 decimal places for decay scores and exact match for counters
WAL compaction: after successful checkpoint, WAL segments with seqno <= checkpoint seqno are atomically deleted; compaction verifies the new checkpoint is readable before deleting old segments (write-new-then-delete-old pattern)
Checkpoint integrity: CheckpointMeta extended with a BLAKE3 hash of the checkpoint payload; on open, hash verified before applying checkpoint state; corrupt checkpoint triggers fallback to WAL-only replay with a warning log
Recovery time < 30 seconds for 1M items checkpoint + 5 minutes of WAL backlog (Criterion benchmarked)
tidalctl recover --path <dir> --verify-only dry-runs WAL replay and reports: event count, last seqno, inconsistency count, estimated recovery time, without writing any state
Crash fencing for cohort state: CohortSignalLedger checkpoint/restore roundtrips correctly under all crash points; cohort membership cache rebuilt from user metadata on restart
Crash fencing for collection state: CollectionIndex persisted bitmaps survive all crash points; orphaned bitmap entries (collection deleted but bitmap persists) detected and cleaned on recovery
Crash fencing for co-engagement state: CoEngagementIndex recovered from checkpoint; bounded-LRU invariant preserved across restart
Crash fencing for session state: active sessions with WAL session-start but no session-close are correctly restored; signal counts and audit logs match WAL replay
No phantom items (items in index state but not in WAL replay) after any crash scenario
No lost signals (signals in WAL but missing from state after recovery) after any crash scenario
No leaked hard negatives (hidden items or blocked creators appearing in query results after crash recovery)
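A minimal sketch of a `CrashInjector` with a crash-at-Nth-hit trigger, plus a toy write path showing why hook placement matters. Assumptions: a simulated crash is an early `Err` return followed by rebuilding state from the WAL, rather than aborting the process; the `Vec<u64>` WAL and `write_signal` are hypothetical. A crash at `WalPostWrite` leaves the record durable but unapplied, and replay recovers it, which is the "no lost signals" property.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CrashPoint {
    WalPreWrite,
    WalPostWrite,
    CheckpointPreFlush,
    CheckpointPostFlush,
    SignalAggregation,
    CohortLedger,
    CollectionIndex,
    CoEngagement,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InjectedCrash(pub CrashPoint);

/// Fires once, on the `fire_at`-th time the write path reaches `target`.
pub struct CrashInjector {
    target: CrashPoint,
    fire_at: u64,
    hits: AtomicU64,
}

impl CrashInjector {
    pub fn new(target: CrashPoint, fire_at: u64) -> Self {
        Self { target, fire_at, hits: AtomicU64::new(0) }
    }

    /// Called by write-path hooks; Err tells the caller to abandon the write here.
    pub fn check(&self, point: CrashPoint) -> Result<(), InjectedCrash> {
        if point != self.target {
            return Ok(());
        }
        let n = self.hits.fetch_add(1, Ordering::SeqCst) + 1;
        if n == self.fire_at { Err(InjectedCrash(point)) } else { Ok(()) }
    }
}

/// Toy write path: append to the WAL, then apply to in-memory state.
pub fn write_signal(
    inj: &CrashInjector,
    wal: &mut Vec<u64>,
    state: &mut u64,
    v: u64,
) -> Result<(), InjectedCrash> {
    inj.check(CrashPoint::WalPreWrite)?;
    wal.push(v); // record is now "durable"
    inj.check(CrashPoint::WalPostWrite)?;
    *state += v; // applied only after the WAL append
    Ok(())
}
```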

Task Breakdown:

#	Task	Delivers	Complexity
01	`CrashPoint` enum + fault injection hooks	Test-gated hooks at 8 write-path locations; `CrashInjector` struct with configurable trigger (crash at Nth write, random probability)	M
02	Property tests for signal ledger crash points	4 crash points (WAL pre/post, checkpoint pre/post) x 1000 random event sequences; verify decay scores match analytical formula to 6 decimal places after recovery	L
03	WAL compaction	Atomic deletion of pre-checkpoint segments; write-new-checkpoint-then-delete-old pattern; compaction after each successful periodic checkpoint	M
04	Checkpoint BLAKE3 integrity	Extend `CheckpointMeta` with 32-byte BLAKE3 hash of checkpoint payload; verify on open; fallback to WAL-only replay on corruption	M
05	Recovery time benchmark	Generate 1M-item checkpoint + 5min WAL backlog; measure cold-start to ready time; assert < 30s	S
06	`tidalctl recover --verify-only`	Dry-run WAL replay; report event count, last seqno, inconsistencies, estimated recovery time	S
07	Crash fencing for M6 state (cohort, collection, co-engagement, session)	Property tests for crash recovery of CohortSignalLedger, CollectionIndex, CoEngagementIndex, active sessions; checkpoint/restore roundtrip correctness	L
08	Hard negative crash invariant test	After any crash scenario, RETRIEVE never returns hidden items or blocked creators; 1000 random crash+restart sequences with hide/block interspersed	M
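The write-new-then-delete-old ordering behind WAL compaction can be sketched with `std::fs`: write the checkpoint to a temp file, `sync_all`, `rename` it over the final name, read it back, and only then delete the segments it covers. The file names, the `wal-<last_seqno>.seg` naming scheme, and the 8-byte seqno header are assumptions. The BLAKE3 payload hash from task 04 is omitted to keep the sketch std-only. The directory fsync after `rename` runs only on Unix.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Atomically replace checkpoint.bin: temp file, fsync, rename.
pub fn write_checkpoint(dir: &Path, seqno: u64, payload: &[u8]) -> io::Result<()> {
    let tmp = dir.join("checkpoint.tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(&seqno.to_le_bytes())?; // header: covered seqno
    f.write_all(payload)?;
    f.sync_all()?;
    fs::rename(&tmp, dir.join("checkpoint.bin"))?;
    if cfg!(unix) {
        File::open(dir)?.sync_all()?; // make the rename itself durable
    }
    Ok(())
}

/// Read back the covered seqno; fails on a missing or truncated checkpoint.
pub fn read_checkpoint_seqno(dir: &Path) -> io::Result<u64> {
    let bytes = fs::read(dir.join("checkpoint.bin"))?;
    let head: [u8; 8] = bytes
        .get(..8)
        .and_then(|b| b.try_into().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "truncated checkpoint"))?;
    Ok(u64::from_le_bytes(head))
}

/// Delete WAL segments whose last seqno is covered by the checkpoint.
/// Verify-before-delete: nothing is removed unless the checkpoint reads back.
pub fn compact_wal(dir: &Path) -> io::Result<usize> {
    let ckpt = read_checkpoint_seqno(dir)?;
    let mut removed = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
        let last = name
            .strip_prefix("wal-")
            .and_then(|s| s.strip_suffix(".seg"))
            .and_then(|s| s.parse::<u64>().ok());
        if let Some(last) = last {
            if last <= ckpt {
                fs::remove_file(&path)?;
                removed += 1;
            }
        }
    }
    Ok(removed)
}
```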

Depends On: M6 complete Complexity: XL Research Reference: docs/research/tidaldb_wal.md (crash recovery, segment format, deduplication), thoughts.md Part V.5-6 (quarantine-first, group commit), docs/research/tidaldb_signal_ledger.md (checkpoint format, running-score formula)

Phase 2: Graceful Degradation, Rate Limiting, and Session Cleanup (m7p2)

Delivers: Automatic quality reduction under load pressure. 4-stage degradation order documented and enforced. Backpressure on write path. Per-agent token-bucket rate limiting. Session TTL auto-cleanup sweeper. All load behavior visible in query responses.

Acceptance Criteria:

DegradationLevel enum: Full, ReducedCandidates, CoarseAggregates, NoDiversity -- applied in this order under increasing load
Load detection: AtomicU64 tracking in-flight query count; threshold configurable per level (defaults: 200 -> ReducedCandidates, 500 -> CoarseAggregates, 1000 -> NoDiversity)
ReducedCandidates: ANN top_k reduced from 500 to 200; BM25 candidate limit halved
CoarseAggregates: windowed count reads use AllTime instead of fine-grained windows for scoring; velocity reads use 24h window regardless of profile configuration
NoDiversity: diversity pass skipped entirely; results returned after scoring only
Under 3x overload (3000 concurrent queries), all well-formed queries return results (no ServiceUnavailable or panic); malformed queries still return errors
Degradation level exposed in query response: Results.degradation_level: DegradationLevel and SearchResults.degradation_level: DegradationLevel
Write backpressure: when WAL batch queue depth exceeds configurable threshold (default: 1000 pending batches), db.signal() returns TidalError::Backpressure { retry_after_ms: u64 } with exponential backoff hint
TidalError::Backpressure variant added with retry_after_ms field
Per-agent token-bucket rate limiting: RateLimiter struct with configurable tokens/second per (AgentId, SessionId) pair (default: unlimited; opt-in via RateLimiterConfig::limited(rate, burst) in builder); excess writes return TidalError::RateLimited { agent_id, limit, retry_after_ms }
TidalError::RateLimited variant added with agent_id, limit, and retry_after_ms fields
Rate limiter does not affect non-session signal writes (global db.signal() is not rate-limited per-agent)
Session TTL auto-cleanup sweeper: background task runs every 60 seconds; sessions exceeding max_session_duration are auto-closed with SessionSummary archived; SessionSummary.auto_closed: bool field added
Sweeper is cancellable on db.close() / db.shutdown(); no dangling threads after shutdown
Load test: simulate 3x overload for 60 seconds; verify all queries return results; verify degradation progression matches thresholds; verify signal writes under backpressure retry successfully after delay
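The load detector and the four-stage ladder can be sketched as follows. This is a minimal illustration, not the shipped code. It assumes threshold comparisons are inclusive, so 200 in-flight queries already means ReducedCandidates. `LoadDetector`, `DegradationThresholds`, and `InFlightGuard` are hypothetical names. The RAII guard from task 01 keeps the counter correct even when a query returns early or unwinds.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Degradation stages, ordered from least to most degraded.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DegradationLevel {
    Full,
    ReducedCandidates,
    CoarseAggregates,
    NoDiversity,
}

/// In-flight counts at which each degraded level begins (roadmap defaults).
#[derive(Debug, Clone, Copy)]
pub struct DegradationThresholds {
    pub reduced_candidates: u64,
    pub coarse_aggregates: u64,
    pub no_diversity: u64,
}

impl Default for DegradationThresholds {
    fn default() -> Self {
        Self { reduced_candidates: 200, coarse_aggregates: 500, no_diversity: 1000 }
    }
}

/// Hypothetical load detector: a shared counter of in-flight queries.
pub struct LoadDetector {
    in_flight: Arc<AtomicU64>,
    thresholds: DegradationThresholds,
}

/// RAII guard: decrements the in-flight counter when the query ends or unwinds.
pub struct InFlightGuard {
    in_flight: Arc<AtomicU64>,
    pub level: DegradationLevel,
}

impl Drop for InFlightGuard {
    fn drop(&mut self) {
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}

impl LoadDetector {
    pub fn new(thresholds: DegradationThresholds) -> Self {
        Self { in_flight: Arc::new(AtomicU64::new(0)), thresholds }
    }

    /// Called on query entry: registers the query and picks its level.
    pub fn enter(&self) -> InFlightGuard {
        // fetch_add returns the previous value; +1 counts this query.
        let n = self.in_flight.fetch_add(1, Ordering::AcqRel) + 1;
        let t = self.thresholds;
        let level = if n >= t.no_diversity {
            DegradationLevel::NoDiversity
        } else if n >= t.coarse_aggregates {
            DegradationLevel::CoarseAggregates
        } else if n >= t.reduced_candidates {
            DegradationLevel::ReducedCandidates
        } else {
            DegradationLevel::Full
        };
        InFlightGuard { in_flight: Arc::clone(&self.in_flight), level }
    }

    pub fn in_flight(&self) -> u64 {
        self.in_flight.load(Ordering::Acquire)
    }
}
```

The level is computed once, from the counter value observed at entry. A query therefore keeps the same level for its whole lifetime even if load changes mid-flight, which keeps the reported degradation_level consistent with what the executor actually did.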

Task Breakdown:

#	Task	Delivers	Complexity
01	`DegradationLevel` enum + load detector	AtomicU64 in-flight counter; threshold config; level computed on each query entry; RAII guard struct for decrement on drop	M
02	Query executor degradation branches	Wire `DegradationLevel` into `RetrieveExecutor` and `SearchExecutor`: ReducedCandidates, CoarseAggregates, NoDiversity	M
03	Degradation level in response + backpressure error	Add `degradation_level` field to `Results` and `SearchResults`; add `Backpressure` variant to `TidalError`; WAL queue depth check before enqueue	M
04	Per-agent token-bucket rate limiter	`RateLimiter` struct with `DashMap<(AgentId, SessionId), TokenBucket>`; refill rate configurable; wire into `session_signal()` write path; `TidalError::RateLimited` variant	M
05	Session TTL auto-cleanup sweeper	Background task scanning `active_sessions` every 60s; auto-close expired sessions; `auto_closed` flag on `SessionSummary`; cancellation on shutdown	M
06	Load test	Simulate 3x overload (concurrent query + write threads); verify degradation progression, backpressure behavior, rate limiting isolation, session cleanup	L
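A per-(AgentId, SessionId) token bucket for task 04 might look like the sketch below. It is an illustration with several substitutions:
- It uses a plain `HashMap` behind `&mut self` instead of the `DashMap` named above.
- It takes the clock as a parameter so tests are deterministic.
- `AgentId`/`SessionId` are hypothetical `u64` aliases.
- `RateDecision` is a hypothetical stand-in for returning `TidalError::RateLimited`.

```rust
use std::collections::HashMap;
use std::time::Instant;

pub type AgentId = u64; // hypothetical stand-ins for tidalDB's ID types
pub type SessionId = u64;

#[derive(Debug, PartialEq)]
pub enum RateDecision {
    Allowed,
    Limited { retry_after_ms: u64 },
}

struct TokenBucket {
    tokens: f64,
    last_refill: Instant,
}

pub struct RateLimiter {
    rate_per_sec: f64,
    burst: f64,
    buckets: HashMap<(AgentId, SessionId), TokenBucket>,
}

impl RateLimiter {
    /// Mirrors the shape of RateLimiterConfig::limited(rate, burst).
    pub fn limited(rate_per_sec: f64, burst: f64) -> Self {
        Self { rate_per_sec, burst, buckets: HashMap::new() }
    }

    /// Try to spend one token for this (agent, session) pair at time `now`.
    pub fn try_acquire(&mut self, agent: AgentId, session: SessionId, now: Instant) -> RateDecision {
        let (rate, burst) = (self.rate_per_sec, self.burst);
        // A pair seen for the first time starts with a full bucket.
        let bucket = self
            .buckets
            .entry((agent, session))
            .or_insert(TokenBucket { tokens: burst, last_refill: now });
        // Refill in proportion to elapsed time, capped at the burst size.
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * rate).min(burst);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            RateDecision::Allowed
        } else {
            // Time until one whole token is available again.
            let wait_secs = (1.0 - bucket.tokens) / rate;
            RateDecision::Limited { retry_after_ms: (wait_secs * 1000.0).ceil() as u64 }
        }
    }
}
```

Passing `now` explicitly is a choice made for testability; a production write path would read `Instant::now()` once per write. Because buckets are keyed by the pair, exhausting one session's bucket has no effect on another session of the same agent.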

Depends On: m7p1 (stable crash recovery before load testing) Complexity: L Research Reference: thoughts.md Part V (graceful degradation), VISION.md design principles, M4 deferrals (per-agent QPS rate limiting, session TTL auto-cleanup)

Phase 3: Performance at Scale (m7p3)

Delivers: Benchmarks and optimization at 1M items, 100K users, 10K creators. USearch parameter tuning. Tantivy segment management. Signal state memory footprint optimization. Signal rollups for 30d+ windows if bucketed counters exceed the 50ms budget at scale.

Acceptance Criteria:

Criterion benchmark suite at 1M items: RETRIEVE (for_you profile) p99 < 50ms, SEARCH (hybrid BM25+ANN) p99 < 100ms, signal write p99 < 100us amortized
USearch index tuning: M={8,16,32} and ef_construction={100,200,400} benchmarked at 1M vectors; optimal config documented and applied; ANN recall@10 > 0.95 within latency budget
Tantivy segment management: LogMergePolicy tuned for 1M docs; segment count < 20 at steady state after indexing 1M documents; background merge verified not to block foreground reads (concurrent read/write benchmark)
Signal state memory footprint measured and documented: bytes per hot entry at 1M items x 10 signal types x 5 windows; total footprint < 10 GB; if footprint exceeds budget, implement signal state trimming (evict entries with no signal activity in the last 30 days)
Signal rollup evaluation: benchmark bucketed counters at 1M items for 30d windows; if p99 windowed-count read exceeds 50ms, implement hourly rollup materialization (background thread computes hourly aggregates, stores under Tag::Rollup key prefix, merge with live counters at read time); if p99 is within budget, document the result and defer rollups
Profile execution path profiled with cargo flamegraph or equivalent; top-3 hotspots documented; any hotspot representing > 10% of total RETRIEVE time optimized or documented with rationale for deferral
CoEngagementIndex LRU eviction verified at capacity: insert 2x capacity, verify memory stays bounded; verify evicted entries are the least-recently-accessed
Cross-session preference aggregation verified at scale: 100K users with 10 closed sessions each; preference vector merge completes within the close_session latency budget (< 1ms per merge)
tidal/benches/social.rs extended (or new benchmark) covering 1M-item RETRIEVE with social graph filter, cohort-scoped trending, and collection filter
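For the recall@10 > 0.95 target above, recall is conventionally measured against an exact, brute-force top-k. A minimal sketch of the metric follows. It assumes item IDs are `u64` and both result lists are ranked best-first; the function names are hypothetical.

```rust
use std::collections::HashSet;

/// recall@k: fraction of the exact top-k that also appears in the approximate top-k.
/// Duplicates in `approx` are ignored so they cannot inflate the score.
pub fn recall_at_k(approx: &[u64], exact: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 1.0; // nothing to find
    }
    let found: HashSet<u64> = approx.iter().take(k).copied().collect();
    found.intersection(&truth).count() as f64 / truth.len() as f64
}

/// Mean recall@k over a batch of (approximate, exact) result pairs.
pub fn mean_recall_at_k(queries: &[(Vec<u64>, Vec<u64>)], k: usize) -> f64 {
    if queries.is_empty() {
        return 0.0;
    }
    let total: f64 = queries
        .iter()
        .map(|(approx, exact)| recall_at_k(approx, exact, k))
        .sum();
    total / queries.len() as f64
}
```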

Task Breakdown:

#	Task	Delivers	Complexity
01	Scale benchmark suite	Criterion benches at 1M items for RETRIEVE (for_you, trending, following), SEARCH (hybrid, text-only), signal write; establish baselines	L
02	USearch parameter tuning	Benchmark M x ef_construction matrix at 1M vectors; document recall/latency tradeoff; apply optimal config	M
03	Tantivy merge policy tuning	Configure `LogMergePolicy`; benchmark segment count evolution during sustained indexing; verify concurrent read/write latency	M
04	Signal state memory analysis + trimming	Measure bytes per hot entry; document memory model; implement LRU trimming of inactive entries if footprint exceeds 10 GB	L
05	Signal rollup evaluation (conditional)	Benchmark 30d windowed count at 1M items; implement hourly rollups if p99 > 50ms; otherwise document and defer	L
06	Flamegraph profiling + hotspot optimization	Profile RETRIEVE + SEARCH hot paths; document top-3 hotspots; optimize any > 10% of total time	L
07	CoEngagementIndex LRU + social scale verification	Eviction correctness test at 2x capacity; social graph filter benchmark at 1M items	M

Depends On: m7p1 (stable crash recovery before benchmarking at scale) Complexity: XL Research Reference: docs/research/ann_for_tidaldb.md (USearch parameter guidance, M and ef_construction tradeoffs), docs/research/tantivy.md (segment management, LogMergePolicy), docs/research/tidaldb_signal_ledger.md (memory model, three-tier architecture), M6 deferrals (signal rollups for 30d+ windows)

Phase 4: Operational Visibility (m7p4)

Delivers: Query execution stats, signal system health metrics, index health metrics, structured error reporting with context, tidalctl diagnostics, zero-overhead metrics feature flag, RLHF training data export API, and cross-session aggregation query.

Acceptance Criteria:

QueryStats struct returned alongside query results: candidates_considered: u64, candidates_after_filter: u64, candidates_after_diversity: u64, filters_applied: Vec<String>, scoring_time_us: u64, diversity_time_us: u64, total_time_us: u64, degradation_level: DegradationLevel, profile_name: String
Results.stats: QueryStats and SearchResults.stats: QueryStats populated by executors
Signal system health metrics exported at /metrics (Prometheus text format, gated by #[cfg(feature = "metrics")]): tidaldb_wal_lag_bytes (gauge), tidaldb_wal_compacted_segments_total (counter), tidaldb_checkpoint_age_seconds (gauge), tidaldb_signal_hot_entries (gauge), tidaldb_signal_writes_total (counter), tidaldb_signal_write_latency_us (histogram with buckets chosen so p50/p99/p999 can be derived with histogram_quantile(); Prometheus histograms expose buckets rather than precomputed quantiles)
Index health metrics: tidaldb_tantivy_segment_count (gauge), tidaldb_tantivy_indexed_docs (gauge), tidaldb_usearch_index_size_bytes (gauge), tidaldb_usearch_vector_count (gauge), tidaldb_bitmap_index_cardinality (gauge per bitmap name)
Cohort ledger health: tidaldb_cohort_ledger_entries (gauge), tidaldb_cohort_count (gauge)
Session health: tidaldb_active_sessions (gauge), tidaldb_closed_sessions_total (counter), tidaldb_session_auto_closed_total (counter), tidaldb_rate_limited_total (counter)
Degradation level gauge: tidaldb_degradation_level (gauge, 0=Full/1=ReducedCandidates/2=CoarseAggregates/3=NoDiversity)
tidalctl diagnostics --path <dir> prints a human-readable health summary: WAL state (lag bytes, last seqno, segment count), checkpoint age, signal state size (entry count, estimated memory), index sizes (Tantivy docs/segments, USearch vectors/bytes), session count (active/closed), degradation level, collection count, cohort count
All TidalError variants include structured context: operation name, entity ID where relevant, signal type where relevant; no bare "error" strings in any variant
RLHF training data export: db.export_signals(ExportRequest { user_id: Option<u64>, signal_types: Vec<String>, since: Timestamp, until: Timestamp, format: ExportFormat }) -> Result<Vec<ExportedSignal>> reads signal events from WAL segments within the time range; ExportedSignal contains user_id, entity_id, signal_type, weight, timestamp, session_id: Option<SessionId>, annotation: Option<String>; ExportFormat::JsonLines supported
Cross-session aggregation: db.user_session_summary(user_id, since: Timestamp) -> Result<UserSessionSummary> scans closed session archives; returns sessions_count, total_signals, total_rejections, top_signal_types: Vec<(String, u64)>, preference_drift: f64 (cosine distance between the preference vector as of the since timestamp and the current one)
Metrics are zero-overhead when the metrics feature is disabled: all metrics calls wrapped in #[cfg(feature = "metrics")]; verified by compiling without the feature
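The text exposition format behind /metrics is simple enough to sketch for gauges and counters. This assumes a hypothetical `Sample` type and `render_prometheus` helper. Histograms such as tidaldb_signal_write_latency_us need `_bucket`, `_sum`, and `_count` series and are not covered here.

```rust
use std::fmt::Write;

#[derive(Debug, Clone, Copy)]
pub enum MetricKind {
    Gauge,
    Counter,
}

/// One metric sample to expose (hypothetical shape).
pub struct Sample<'a> {
    pub name: &'a str,
    pub help: &'a str,
    pub kind: MetricKind,
    pub value: f64,
}

/// Prometheus spells special floats as NaN, +Inf, and -Inf.
fn fmt_value(v: f64) -> String {
    if v.is_nan() {
        "NaN".to_string()
    } else if v == f64::INFINITY {
        "+Inf".to_string()
    } else if v == f64::NEG_INFINITY {
        "-Inf".to_string()
    } else {
        v.to_string()
    }
}

/// Render samples as HELP/TYPE/value lines in the text exposition format.
pub fn render_prometheus(samples: &[Sample<'_>]) -> String {
    let mut out = String::new();
    for s in samples {
        let kind = match s.kind {
            MetricKind::Gauge => "gauge",
            MetricKind::Counter => "counter",
        };
        // Writing into a String cannot fail, so the fmt::Result is ignored.
        let _ = writeln!(out, "# HELP {} {}", s.name, s.help);
        let _ = writeln!(out, "# TYPE {} {}", s.name, kind);
        let _ = writeln!(out, "{} {}", s.name, fmt_value(s.value));
    }
    out
}
```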

Task Breakdown:

#	Task	Delivers	Complexity
01	`QueryStats` struct + executor instrumentation	Define struct; instrument `RetrieveExecutor` and `SearchExecutor` to record timing and counts at each pipeline stage; populate `Results.stats` and `SearchResults.stats`	M
02	Signal system + WAL metrics	Wire counters/gauges into WAL (lag, compaction count), checkpoint (age), signal ledger (entry count, write latency histogram)	M
03	Index health metrics	Expose Tantivy segment count + doc count; USearch vector count + byte size; bitmap index cardinality per bitmap	M
04	Session + cohort + degradation metrics	Active/closed/auto-closed session gauges; cohort ledger entry count; degradation level gauge; rate-limited counter	S
05	`tidalctl diagnostics`	Print human-readable health summary; covers WAL, checkpoint, signals, indexes, sessions, cohorts, collections	M
06	Structured `TidalError` context audit	Audit all `TidalError` variants; add operation name, entity ID, signal type context where missing; remove bare string errors	M
07	`metrics` feature flag + zero-overhead verification	Wrap all metrics calls in `#[cfg(feature = "metrics")]`; compile without feature; verify no metrics overhead	S
08	RLHF training data export	`db.export_signals()` API reading WAL segments by time range; `ExportedSignal` type; `ExportFormat::JsonLines` output; integration test	M
09	Cross-session aggregation query	`db.user_session_summary()` API scanning closed archives; `UserSessionSummary` type with session count, signal totals, top types, preference drift	M
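preference_drift is defined as cosine distance, i.e. 1 minus cosine similarity, so 0 means unchanged and 2 means reversed. A minimal sketch follows. It assumes preference vectors are equal-length `f32` slices. Returning `None` for a zero vector is also an assumption, since the distance is undefined there.

```rust
/// Cosine distance (1 - cosine similarity) between two preference vectors.
/// Returns None if the dimensions differ or either vector has zero norm.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    // Accumulate in f64 to limit rounding error on long vectors.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (x, y) in a.iter().zip(b) {
        let (x, y) = (*x as f64, *y as f64);
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a.sqrt() * norm_b.sqrt()))
}
```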

Depends On: m7p1 (stable internals before instrumenting them), m7p2 (degradation level must exist to report it) Complexity: L Research Reference: docs/research/tidaldb_tooling_and_diagnostics.md, thoughts.md Part V (operational simplicity), M4 deferrals (RLHF training data export), M6 deferrals (cross-session aggregation dashboards)

Phase 5: M7 UAT Integration Test (m7p5)

Delivers: End-to-end M7 UAT integration test proving all production hardening capabilities work together. Crash recovery, graceful degradation, rate limiting, session cleanup, observability, and scale performance all exercised in a single comprehensive test suite.

Acceptance Criteria:

tidal/tests/m7_uat.rs test suite with separate #[test] functions for each UAT step
Crash recovery tests: write 10K items + 100K signals; inject crash via CrashPoint at 3 write-path stages; verify recovery produces identical state; verify checkpoint BLAKE3 integrity; verify WAL compaction removed old segments; verify hard negatives don't leak after recovery
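Session cleanup test: create a session with a 30-second TTL and run the sweeper on a shortened interval (the default 60-second sweep may not fire within the wait); wait 35 seconds; verify the sweeper auto-closed the session; verify auto_closed: true in the summary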
Degradation test: simulate concurrent load above threshold; verify degradation_level in response matches expected level; verify all queries return results
Rate limiting test: configure a 10 signals/sec rate limit with a burst of 10; write 50 signals back-to-back (well under the 100 ms it takes to refill one token); verify the first 10 succeed and the remaining 40 return TidalError::RateLimited; verify other sessions are unaffected
QueryStats test: execute RETRIEVE and SEARCH; verify stats field populated with non-zero candidates_considered, scoring_time_us, total_time_us
Metrics test: verify Prometheus text output from /metrics contains expected metric names and is non-empty
Export + aggregation test: write session signals; close session; call export_signals() and verify output contains expected events; call user_session_summary() and verify counts
All prior milestone integration tests (m2_uat, m3_uat, m4_uat, m5_uat, m6_uat) continue to pass
No individual test takes longer than 60 seconds (crash recovery tests use small datasets; load tests use short duration)

Task Breakdown:

#	Task	Delivers	Complexity
01	Crash recovery UAT tests	3 tests: crash at WAL-write, crash at checkpoint, crash with M6 state (cohort, collection); verify correct recovery and hard negative invariant	L
02	Degradation + rate limiting + session cleanup UAT tests	3 tests: degradation progression under load, per-agent rate limiting isolation, session auto-cleanup after TTL	L
03	Observability + export UAT tests	3 tests: QueryStats populated, metrics endpoint content, RLHF export + session aggregation	M
04	Regression gate	Verify all prior UAT suites pass (m2_uat through m6_uat); no regressions introduced by M7 changes	S

Depends On: m7p1, m7p2, m7p3, m7p4 (UAT exercises everything built in M7) Complexity: L Research Reference: All M7 phase specifications above

Phase Dependency DAG

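m7p1 (Crash Recovery Hardening)
        |
        +------------------+------------------+
        |                  |                  |
      m7p2               m7p3               m7p4*
 (Degrade + Rate       (Scale)          (Observability)
    + Sweep)               |                  |
        |                  |                  |
        +------------------+------------------+
                           |
                         m7p5 (M7 UAT -- depends on all four prior phases)

* m7p4 tasks 04, 08, 09 also wait on m7p2 (degradation level, rate-limited counter, session auto-close counter)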

m7p1 is the foundation -- crash recovery hardening must be stable before load testing, scale optimization, or instrumentation
m7p2 depends on m7p1 (stable system before stress testing); delivers degradation level, rate limiting, and session cleanup
m7p3 depends on m7p1 (stable system before benchmarking at scale); can parallelize with m7p2
m7p4 depends on m7p1 (stable internals to instrument); tasks 01-03, 05-07 can start in parallel with m7p2; tasks 04, 08, 09 depend on m7p2 being complete (degradation level gauge, rate limited counter, session auto-close counter)
m7p5 depends on all four prior phases (UAT exercises everything)

Deferred to Later Milestones

Horizontal distribution / partitioned keyspaces -- deferred to M8; the single-node architecture scales vertically first; distribution is a separate product decision requiring shard-aware keyspaces, WAL shipping, and deterministic reconciliation. Planned for M8.
Multi-tenancy -- deferred to M8+; per-tenant isolation within a single tidalDB instance requires the distributed fabric's namespace and routing infrastructure. Planned for M8+.
A/B testing infrastructure -- deferred to M8+; comparing two profile versions within the database requires tenant-level isolation and traffic routing. Planned for M8+.
Signal rollup to external cold storage -- deferred to M8+; S3/GCS archival for compliance requires the distributed fabric's WAL shipping infrastructure. Planned for M8+.
Client libraries (Python, Node, Go bindings) -- deferred to M8+; language-specific wrappers beyond Rust embedding require a stable API surface; M7 may still refine APIs. Planned for M8+.
Streaming query results -- deferred post-M7; cursor-based streaming for very large result sets is a refinement once core performance targets are met at 1M items.
Multi-vector user interest clustering (PinnerSage) -- deferred post-M7; single preference vector serves through M7; multi-vector clustering adds a new data structure and requires offline training.
"Did you mean" typo correction -- deferred to M8+. Prefix autocomplete (m6p5) covers the primary use case for search suggestions. Edit-distance automata over the Tantivy term dictionary is a quality-of-life improvement, not a production hardening requirement. M7's scope is crash safety, load handling, and operational readiness; typo correction belongs in a surface-quality milestone after the system is production-hardened. Planned for M8+.
Search result explanation ("why this result?") -- deferred to M8+. Tantivy's Query::explain() is expensive at query time and produces per-document scoring breakdowns useful for debugging, not production serving. M7 delivers QueryStats (pipeline-level timing and count visibility) which serves the production operator's need. Per-result explanations belong in a developer experience milestone. Planned for M8+.
Collaborative collections (multi-user boards) -- deferred to M8+; multi-user write access requires access control beyond single-owner, which intersects with the multi-tenancy work in M8. Single-owner collections work in M6.
Personalized suggestions from full search history -- deferred to M8+; tracking per-user query history as a signal stream for SUGGEST personalization beyond interest-category boosting (m6p5) is a refinement that belongs after production hardening.
Topic-cluster notification capping -- deferred to M8+; requires topic embedding clustering and per-cluster counters; per-creator and per-user-total caps (M6) cover the primary notification spam prevention need; topic-level refinement belongs after production hardening.
Tuned linear combination (replacing RRF for hybrid search) -- deferred to M8+; requires relevance labels and offline evaluation infrastructure; RRF is the correct zero-configuration starting point through M7.

Done When

tidalDB operates correctly at 1M items under sustained concurrent read/write load. Crash recovery completes in < 30 seconds with zero data loss -- verified by fault injection at every write-path stage including cohort, collection, co-engagement, and session state. WAL compaction atomically removes pre-checkpoint segments. Checkpoint integrity is BLAKE3-verified on every open. Graceful degradation works under 3x overload without returning errors for well-formed queries, following the documented 4-stage progression (Full -> ReducedCandidates -> CoarseAggregates -> NoDiversity). Per-agent token-bucket rate limiting protects against runaway agents without affecting other sessions. Abandoned sessions are automatically cleaned up by the background sweeper. RETRIEVE p99 < 50ms and SEARCH p99 < 100ms at 1M items. Signal writes p99 < 100us amortized. Signal state memory footprint < 10 GB for 1M items x 10 signal types x 5 windows. QueryStats are returned with every query result. Prometheus metrics expose WAL lag, checkpoint age, signal state, index health, session health, cohort health, and degradation level. tidalctl diagnostics prints a human-readable health summary. Signal events are exportable as RLHF training data. Cross-session aggregation answers "what did my agents learn this week?" All prior milestone integration tests (m2_uat through m6_uat) continue to pass. A developer can embed tidalDB in a production application and operate it with confidence.

Milestone 8: Distributed Fabric -- "Agent memory everywhere"

Milestone Thesis

The exact same signal semantics, session policies, and WAL format power a multi-tenant, multi-region deployment. Instances shard deterministically by EntityKind + EntityId, ship WAL segments to peers, reconcile deterministically, and expose an eventually consistent API that still honors agent memory guarantees (no hidden items leaking, no double-counted decay). Hosted tidalDB can now back global agent workloads without rewriting application code.

UAT Scenario

Given:
  - Three regions (us-east, eu-west, ap-south) with 5 shards each
  - Global write throughput: 25K signal events/sec, evenly distributed
  - Fat-client agents pinned to local region but free to roam
  - 1-hour network partition between eu-west and ap-south during sustained load

When:
  1. Write signals for a user in us-east, then read them in eu-west within 2s
  2. Crash an entire shard primary; observe automatic promotion and replay
  3. Execute global query (`RETRIEVE ... COHORT locale:EU`) while ap-south is partitioned
  4. Heal the partition; verify deterministic reconciliation (no duplicate counts, hides remain hidden)
  5. Move a tenant (agent workspace) to a new region by changing routing config only

Then:
  - Cross-region replication lag < 2s p99
  - No signal loss or duplication after failover/partition
  - Hard negatives (hide/mute/block) never leak, even while eventual state converges
  - Per-tenant resource isolation enforced (quotas, WAL namespaces)
  - Control plane surfaces reconciliation lag, shard health, and tenant placement

Phases

Phase 1: Shard-Aware Foundations (m8p1) -- COMPLETE

Delivers: Identity types (ShardId, RegionId, WalSegmentId, NodeRole), ShardRouter for entity placement, BatchHeader v2 (backward-compatible WAL extension), shard-aware segment naming, NodeConfig in TidalDbBuilder, and ReplicationState per-shard high-water-mark. No network I/O in this phase -- just the data structure layer that everything else builds on.

Acceptance Criteria:

ShardId(u16) and RegionId(u16) are Copy + Hash + Ord + Serialize; TenantId(0) single-node default unchanged.
WalSegmentId::parse("r0:s0:42") and Display round-trip deterministically.
BatchHeader v2 reads bytes 60-63 for shard/region IDs; v1 segments decode as shard=0, region=0 because v1 headers always zero-padded those bytes.
ShardRouter::route(entity_id) with N=1 always returns ShardId(0) (single-node default).
ReplicationState::advance_hwm(shard, seqno) is monotonic via compare_exchange.
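The identity and header rules above can be illustrated with a short sketch. It rests on several assumptions that this roadmap does not confirm:
- The third `WalSegmentId` component is a `u64` segment sequence number.
- The header is 64 bytes.
- Shard and region are little-endian `u16`s at bytes 60-61 and 62-63.

`Serialize` is omitted to keep the example dependency-free.

```rust
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct ShardId(pub u16);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct RegionId(pub u16);

/// Segment identity rendered as "r<region>:s<shard>:<seq>".
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct WalSegmentId {
    pub region: RegionId,
    pub shard: ShardId,
    pub seq: u64,
}

impl fmt::Display for WalSegmentId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "r{}:s{}:{}", self.region.0, self.shard.0, self.seq)
    }
}

impl FromStr for WalSegmentId {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let parts: Vec<&str> = s.split(':').collect();
        let [r, sh, seq] = parts.as_slice() else {
            return Err(format!("expected r<region>:s<shard>:<seq>, got {s:?}"));
        };
        let region: u16 = r.strip_prefix('r').and_then(|v| v.parse().ok()).ok_or("bad region")?;
        let shard: u16 = sh.strip_prefix('s').and_then(|v| v.parse().ok()).ok_or("bad shard")?;
        let seq: u64 = seq.parse().map_err(|_| "bad seq".to_string())?;
        let id = WalSegmentId { region: RegionId(region), shard: ShardId(shard), seq };
        // Reject non-canonical spellings ("r01", "+5") so parse and Display round-trip exactly.
        if id.to_string() != s {
            return Err(format!("non-canonical segment id {s:?}"));
        }
        Ok(id)
    }
}

/// Assumed v2 layout: shard at bytes 60..62, region at 62..64, little-endian.
/// v1 headers left these bytes zeroed, so they decode as shard 0 / region 0.
pub fn decode_shard_region(header: &[u8; 64]) -> (ShardId, RegionId) {
    let shard = u16::from_le_bytes([header[60], header[61]]);
    let region = u16::from_le_bytes([header[62], header[63]]);
    (ShardId(shard), RegionId(region))
}
```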

Depends On: M7 (hardened WAL/Signal ledger) Complexity: L Task Files: docs/planning/milestone-8/phase-1/ Research Reference: docs/research/tidaldb_wal.md, docs/research/tidaldb_signal_ledger.md
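The monotonic high-water mark can be sketched with a compare-exchange loop. `ReplicationState` here indexes shards by `usize` and is a hypothetical shape, not the m8p1 type. The `lag_seqno` helper shows the m8p2 lag formula, with a saturating subtraction added as a safeguard.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-shard replication high-water marks.
pub struct ReplicationState {
    hwms: Vec<AtomicU64>,
}

impl ReplicationState {
    pub fn new(num_shards: usize) -> Self {
        Self { hwms: (0..num_shards).map(|_| AtomicU64::new(0)).collect() }
    }

    /// Advance the shard's mark to `seqno` only if that moves it forward.
    /// Returns true if this call changed the mark.
    pub fn advance_hwm(&self, shard: usize, seqno: u64) -> bool {
        let hwm = &self.hwms[shard];
        let mut current = hwm.load(Ordering::Acquire);
        while seqno > current {
            match hwm.compare_exchange_weak(current, seqno, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return true,
                // Lost a race (or spurious failure): re-check against the newer value.
                Err(observed) => current = observed,
            }
        }
        false
    }

    pub fn hwm(&self, shard: usize) -> u64 {
        self.hwms[shard].load(Ordering::Acquire)
    }
}

/// lag = leader_hwm - follower_applied, saturating so it never underflows.
pub fn lag_seqno(leader_hwm: u64, follower_applied: u64) -> u64 {
    leader_hwm.saturating_sub(follower_applied)
}
```

`AtomicU64::fetch_max` would give the same monotonic behavior in a single call. The loop is shown because it mirrors the compare_exchange approach named above and reports whether this particular call moved the mark.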

Phase 2: WAL Shipping and Follower Replay (m8p2) -- COMPLETE

Delivers: Transport trait, InProcessTransport (for tests), WalShipper background task, SegmentReceiver with BLAKE3 validation and idempotent replay, FollowerDb (read-only mode with TidalError::ReadOnly), ReplicationLagGauge, and an 8-test integration suite (m8p2_replication.rs).

Acceptance Criteria:

WalShipper ships sealed segments to followers in parallel; lagging follower catches up within 2s on in-process transport.
SegmentReceiver validates BLAKE3 checksum; returns TidalError::CorruptedWal on mismatch.
Followers reject all write methods with TidalError::ReadOnly.
ReplicationLagGauge::lag_seqno(shard) = leader_hwm - follower_applied; reaches 0 after convergence.
m8p2_replication.rs 8 tests pass.

Depends On: Phase 1 Complexity: XL Task Files: docs/planning/milestone-8/phase-2/

Phase 3: CRDT Counters and Deterministic Reconciliation (m8p3) -- COMPLETE

Delivers: HlcTimestamp and HLC (Hybrid Logical Clock), PNCounter (per-node P/N vectors), LWWRegister<T> (HLC-timestamped, used for hard negatives), CrdtSignalState (per-node decay accumulators that sum on merge), ReconciliationEngine (plan() + apply() idempotent), and property tests (m8p3_crdt.rs).

Acceptance Criteria:

PNCounter::merge is commutative, associative, and idempotent (10K proptest cases each).
CrdtSignalState::decay_score = sum of per-node contributions; no double-counting after merge of disjoint node histories (key-aligned HashMap lookup, not zip).
LWWRegister::merge resolves concurrent writes by (wall_ns, logical, node_id) ordering.
ReconciliationEngine::plan(local, remote).apply() produces identical state to single-node replay of all events (verified to 6 decimal places).
m8p3_crdt.rs 13 property tests pass.
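The merge rules for the two core CRDTs can be sketched as follows. This is an illustration, not the m8p3 code. Node IDs are bare `u16`s, echoing ShardId as node identifier. The `HlcTimestamp` fields are declared in an order chosen so that the derived `Ord` compares (wall_ns, logical, node_id) lexicographically, matching the tie-break rule above.

```rust
use std::collections::BTreeMap;

/// PN-counter: per-node grow-only totals for increments (P) and decrements (N).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct PNCounter {
    p: BTreeMap<u16, u64>,
    n: BTreeMap<u16, u64>,
}

/// Element-wise max per node: this is what makes merge commutative,
/// associative, and idempotent.
fn merge_max(into: &mut BTreeMap<u16, u64>, from: &BTreeMap<u16, u64>) {
    for (node, v) in from {
        let e = into.entry(*node).or_insert(0);
        *e = (*e).max(*v);
    }
}

impl PNCounter {
    pub fn increment(&mut self, node: u16, by: u64) {
        *self.p.entry(node).or_insert(0) += by;
    }
    pub fn decrement(&mut self, node: u16, by: u64) {
        *self.n.entry(node).or_insert(0) += by;
    }
    pub fn value(&self) -> i64 {
        self.p.values().sum::<u64>() as i64 - self.n.values().sum::<u64>() as i64
    }
    pub fn merge(&mut self, other: &Self) {
        merge_max(&mut self.p, &other.p);
        merge_max(&mut self.n, &other.n);
    }
}

/// Derived Ord compares fields in declaration order: (wall_ns, logical, node_id).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct HlcTimestamp {
    pub wall_ns: u64,
    pub logical: u32,
    pub node_id: u16,
}

/// Last-writer-wins register ordered by HLC timestamp.
#[derive(Debug, Clone, PartialEq)]
pub struct LwwRegister<T> {
    pub value: T,
    pub ts: HlcTimestamp,
}

impl<T: Clone> LwwRegister<T> {
    pub fn merge(&mut self, other: &Self) {
        if other.ts > self.ts {
            *self = other.clone();
        }
    }
}
```

LWW on its own would let a later unhide win. The hide-wins union semantics that m8p4 applies during convergence is a separate rule layered on top of it.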

Depends On: Phase 1 (ShardId as node identifier) Complexity: L Task Files: docs/planning/milestone-8/phase-3/
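One way to get a decay score that is a sum of per-node contributions, without double counting, is sketched below. `CrdtSignalState` here is a hypothetical reconstruction, not the shipped type. It assumes:
- exponential decay with a fixed half-life;
- timestamps in seconds as `f64`;
- each node's accumulator only moves forward in that node's own time, so merge can keep the newer accumulator per node key.

```rust
use std::collections::HashMap;

/// Per-node decayed accumulator: `value` as of `updated_at` (seconds).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Accumulator {
    value: f64,
    updated_at: f64,
}

/// Exponential decay: the value halves every `half_life` seconds.
fn decay(value: f64, dt: f64, half_life: f64) -> f64 {
    value * 0.5f64.powf(dt / half_life)
}

/// Hypothetical CRDT signal state: one accumulator per node, keyed by node id.
#[derive(Debug, Clone, Default)]
pub struct CrdtSignalState {
    per_node: HashMap<u16, Accumulator>,
}

impl CrdtSignalState {
    /// Record a weighted event seen by `node` at time `t` (t non-decreasing per node).
    pub fn record(&mut self, node: u16, weight: f64, t: f64, half_life: f64) {
        let acc = self
            .per_node
            .entry(node)
            .or_insert(Accumulator { value: 0.0, updated_at: t });
        acc.value = decay(acc.value, t - acc.updated_at, half_life) + weight;
        acc.updated_at = t;
    }

    /// Key-aligned merge: for each node id keep whichever copy is newer.
    /// Two copies of the same node's accumulator are never added together.
    pub fn merge(&mut self, other: &Self) {
        for (node, theirs) in &other.per_node {
            let keep_ours = self
                .per_node
                .get(node)
                .map_or(false, |ours| ours.updated_at >= theirs.updated_at);
            if !keep_ours {
                self.per_node.insert(*node, *theirs);
            }
        }
    }

    /// Score at `now`: each node's accumulator decayed to `now`, then summed.
    pub fn decay_score(&self, now: f64, half_life: f64) -> f64 {
        self.per_node
            .values()
            .map(|a| decay(a.value, now - a.updated_at, half_life))
            .sum()
    }
}
```

Exponential decay is linear, so a node's accumulator equals the decayed sum of that node's own events. Summing across nodes therefore matches a single-node replay of all events, which is the property the m8p3 reconciliation criterion checks.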

Phase 4: Session Continuity and Agent Memory Across Regions (m8p4) -- COMPLETE

Delivers: SessionSeqNo(u64) monotonic per-session write counter, IdempotencyKey(u128) BLAKE3-derived per-operation key, IdempotencyStore (bounded LRU 100K), SessionReplicationBridge (ships session journal entries via Transport), hard-negative union-semantics during convergence (hide always wins during partition), and cross-region session tests (m8p4_session.rs).

Acceptance Criteria: ✅ COMPLETE

Session started in region A is visible in region B within 2s (in-process transport).
Duplicate session events (same idempotency key) produce exactly one state change.
Hard negatives: hide(t=100) + unhide(t=50) → item stays hidden on both regions after replication.
m8p4_session.rs 8 tests pass.
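A bounded LRU for IdempotencyStore can be built from the standard library alone. This sketch assumes keys arrive already derived; the BLAKE3 derivation of IdempotencyKey is not shown. It uses a monotonically increasing tick as the recency stamp, and the type and method names are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical bounded LRU set of idempotency keys.
pub struct IdempotencyStore {
    capacity: usize,
    tick: u64,
    last_seen: HashMap<u128, u64>,   // key -> most recent tick
    by_recency: BTreeMap<u64, u128>, // tick -> key, oldest first
}

impl IdempotencyStore {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, tick: 0, last_seen: HashMap::new(), by_recency: BTreeMap::new() }
    }

    /// Returns true if `key` is new (apply the event), false if it is a duplicate.
    /// Either way the key becomes the most recently used entry.
    pub fn check_and_record(&mut self, key: u128) -> bool {
        self.tick += 1;
        let is_new = match self.last_seen.insert(key, self.tick) {
            Some(old_tick) => {
                self.by_recency.remove(&old_tick);
                false
            }
            None => true,
        };
        self.by_recency.insert(self.tick, key);
        // Evict the least recently used key once over capacity.
        if self.last_seen.len() > self.capacity {
            if let Some((_, oldest)) = self.by_recency.pop_first() {
                self.last_seen.remove(&oldest);
            }
        }
        is_new
    }

    pub fn len(&self) -> usize {
        self.last_seen.len()
    }
}
```

The trade-off of a bounded store is that a duplicate arriving after its key has been evicted is treated as new. The capacity (100K in m8p4) therefore has to cover the redelivery window.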

Depends On: Phase 2 (WAL shipping), Phase 3 (LWWRegister, HLC) Complexity: L Task Files: docs/planning/milestone-8/phase-4/

Phase 5: Control Plane, Multi-Tenancy, and Routing (m8p5) ✅ COMPLETE

Delivers: TenantId(u64) + TenantConfig (quotas + residency policy), TenantRateLimiter (token bucket), TenantRouter (Jump Consistent Hash with residency constraint), ControlPlane (embedded leader-local cluster health), TenantMigration (dual-write zero-downtime migration state machine), RollingUpgradeCoordinator (drain+rejoin), and multi-tenancy tests (m8p5_multitenancy.rs).

Acceptance Criteria: ✅ COMPLETE

TidalError::QuotaExceeded returned within 1ms when token bucket empty.
Tenant migration: all signals present on target after migration; source has 0 after GC; zero downtime during dual-write.
Rolling upgrade: signals written during drain window present on rejoined node.
WAL directory for TenantId(42) is {data_dir}/tenants/42/wal/.
m8p5_multitenancy.rs 5 tests pass.
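TenantRouter is described as Jump Consistent Hash with a residency constraint. The hash below is the published Lamping-Veach algorithm. `place_tenant` is a hypothetical way to apply residency: it hashes only over the regions a tenant's policy allows. With one bucket the hash always returns 0, which matches the ShardRouter N=1 rule in m8p1.

```rust
/// Jump Consistent Hash (Lamping & Veach): maps `key` to a bucket in
/// [0, num_buckets). Growing from n to n+1 buckets moves only the keys
/// that land in the new bucket n.
pub fn jump_consistent_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "need at least one bucket");
    let mut b: i64 = -1;
    let mut j: i64 = 0;
    while j < num_buckets as i64 {
        b = j;
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        j = ((b + 1) as f64 * ((1u64 << 31) as f64 / ((key >> 33) + 1) as f64)) as i64;
    }
    b as u32
}

/// Hypothetical residency-aware placement: hash only over the regions the
/// tenant's policy allows, so placement can never violate residency.
/// `regions` must keep a stable order, with new regions appended at the end.
pub fn place_tenant(tenant_id: u64, regions: &[u16], allowed: &[u16]) -> Option<u16> {
    let candidates: Vec<u16> = regions
        .iter()
        .copied()
        .filter(|r| allowed.contains(r))
        .collect();
    if candidates.is_empty() {
        return None;
    }
    let idx = jump_consistent_hash(tenant_id, candidates.len() as u32);
    Some(candidates[idx as usize])
}
```

Returning `None` when no allowed region exists surfaces a misconfigured residency policy instead of silently placing the tenant elsewhere.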

Depends On: Phase 2 (WAL shipping), Phase 3 (reconciliation), Phase 4 (session continuity) Complexity: L Task Files: docs/planning/milestone-8/phase-5/

Phase 6: End-to-End UAT (m8p6) ✅ COMPLETE

Delivers: SimulatedCluster test harness (signal-replay, N regions), NetworkPartition + ShardCrash RAII fault injection, m8_uat.rs (5 UAT scenario tests + 3 perf assertions). 1199 lib tests, 8 m8_uat tests, all pass in 0.11s.

Acceptance Criteria: ✅ COMPLETE

UAT Step 1: Cross-region replication < 2s; decay scores match to 6 decimal places.
UAT Step 2: Failover within 10s; no data loss on promoted follower.
UAT Step 3: Degraded query succeeds with 2/3 regions available; partitioned region lags visibly.
UAT Step 4: Post-reconciliation: no duplicate counts; hard negatives propagated; CRDT merge correct.
UAT Step 5: Tenant migration zero downtime; full state machine traversal.
cargo test --test m8_uat passes in < 60 seconds (actual: 0.11s).

Depends On: Phases 1–5 Complexity: M Task Files: docs/planning/milestone-8/phase-6/

✅ M8 COMPLETE

cargo test --test m8_uat passes all 5 UAT scenario steps. Signal replication, failover, partition/heal, CRDT reconciliation, and tenant migration all verified. 1199 lib tests + all M8 integration suites green. Embeddable users have the full distributed fabric primitive set available without any API changes to signal/retrieve paths.

Milestone Thesis

Users keep a local embeddable personalization profile as source of truth, opt into one or more community personalization overlays, and can leave those overlays safely. Community contributions are scope-aware, auditable, and removable (both stop-forward and retroactive purge) without corrupting local personalization state.

UAT Scenario

Given:
  - User U has a local embeddable profile with 90 days of signals
  - Community C has opt-in policy requiring explicit share scope
  - U has one agent (A) writing session signals locally

When:
  1. U joins C with sharing mode `community_share:enabled`
  2. U allows only selected signal intents (`not_for_me`, `save`, `low_quality`) to sync
  3. Community feed query blends local + community layers for U
  4. U leaves C with `stop_forward` (no new contributions)
  5. U requests `purge_prior_contributions` from C
  6. C rematerializes affected aggregates and U re-queries feed

Then:
  - U's local profile remains intact throughout
  - No new signals from U enter C after `stop_forward`
  - Purged contributions no longer affect C's ranking outputs
  - Hard negatives from U do not leak back in after replay/failover
  - Audit log shows join, share scope, leave, purge, rematerialize checkpoints

Phases

Phase 1: Signal Scope Model and Share Policy

Delivers: explicit signal scope model (local, community, session, agent) and share policy metadata attached to WAL events. Community replication only ships share-eligible events.

Acceptance Criteria:

WAL event envelope carries scope, origin, share_policy_version, and idempotency key.
Default behavior is local-only; sharing is explicit opt-in.
Per-intent share filters supported (skip_for_now, not_for_me, low_quality, hide/mute/block, etc.).
tidalctl can inspect scope distribution and share eligibility.
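Share eligibility for community replication can be sketched as a pure predicate over the event envelope. The field names follow the acceptance criteria above. The `Scope` enum and the `SharePolicy` type are assumptions made for illustration. So is the rule that session- and agent-scoped events never ship.

```rust
use std::collections::HashSet;

/// Signal scopes from the scope model.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scope {
    Local,
    Community,
    Session,
    Agent,
}

/// WAL event envelope fields named in the acceptance criteria (types assumed).
#[derive(Debug, Clone)]
pub struct EventEnvelope {
    pub scope: Scope,
    pub origin: u64,
    pub share_policy_version: u32,
    pub idempotency_key: u128,
    pub intent: String,
}

/// Per-user share policy. `Default` is local-only: sharing off, no intents allowed.
#[derive(Debug, Default)]
pub struct SharePolicy {
    pub version: u32,
    pub community_share_enabled: bool,
    pub allowed_intents: HashSet<String>,
}

impl SharePolicy {
    pub fn is_share_eligible(&self, ev: &EventEnvelope) -> bool {
        self.community_share_enabled
            // Assumption: session and agent context is never shipped to a community.
            && matches!(ev.scope, Scope::Local | Scope::Community)
            && self.allowed_intents.contains(&ev.intent)
    }
}

/// The subset of events that community replication would ship.
pub fn shippable<'a>(
    policy: &'a SharePolicy,
    events: &'a [EventEnvelope],
) -> impl Iterator<Item = &'a EventEnvelope> + 'a {
    events.iter().filter(move |e| policy.is_share_eligible(e))
}
```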

Depends On: M8
Complexity: L

Phase 2: Membership Lifecycle and Stop-Forward Semantics

Delivers: join/leave lifecycle for community overlays with causal checkpoints and stop-forward guarantees.

Acceptance Criteria:

Membership states: joined, leaving_stop_forward, left, rejoined.
leave(stop_forward) blocks new community contributions in < 1s p99.
Rejoin creates a new membership epoch (no ambiguous replay across epochs).
Queries expose active membership epoch for debugging and explainability.
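The membership lifecycle can be sketched as a small state machine in which every contribution carries the epoch it was written under. State names follow the acceptance criteria. Two rules are assumptions, along with all type names: the separate `finish_leave` step, and accepting contributions only for the current epoch.

```rust
/// Membership states from the acceptance criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MembershipState {
    Joined,
    LeavingStopForward,
    Left,
    Rejoined,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: MembershipState,
    pub action: &'static str,
}

/// Hypothetical per-(user, community) membership record.
#[derive(Debug)]
pub struct Membership {
    pub state: MembershipState,
    pub epoch: u32,
}

impl Membership {
    pub fn join() -> Self {
        Self { state: MembershipState::Joined, epoch: 1 }
    }

    /// Stop-forward: from this call on, no new contributions are accepted.
    pub fn leave_stop_forward(&mut self) -> Result<(), InvalidTransition> {
        match self.state {
            MembershipState::Joined | MembershipState::Rejoined => {
                self.state = MembershipState::LeavingStopForward;
                Ok(())
            }
            from => Err(InvalidTransition { from, action: "leave_stop_forward" }),
        }
    }

    /// Moves to Left; what triggers this step is left open here.
    pub fn finish_leave(&mut self) -> Result<(), InvalidTransition> {
        match self.state {
            MembershipState::LeavingStopForward => {
                self.state = MembershipState::Left;
                Ok(())
            }
            from => Err(InvalidTransition { from, action: "finish_leave" }),
        }
    }

    /// Rejoining opens a fresh epoch so old-epoch events cannot replay into it.
    pub fn rejoin(&mut self) -> Result<(), InvalidTransition> {
        match self.state {
            MembershipState::Left => {
                self.state = MembershipState::Rejoined;
                self.epoch += 1;
                Ok(())
            }
            from => Err(InvalidTransition { from, action: "rejoin" }),
        }
    }

    /// Accept a contribution only while active and only for the current epoch.
    pub fn accepts(&self, contribution_epoch: u32) -> bool {
        matches!(self.state, MembershipState::Joined | MembershipState::Rejoined)
            && contribution_epoch == self.epoch
    }
}
```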

Depends On: Phase 1
Complexity: L

Phase 3: Retroactive Purge and Deterministic Rematerialization

Delivers: remove prior user contributions from community state and rebuild affected aggregates deterministically.

Acceptance Criteria:

purge_prior_contributions(user_id, community_id, epoch_range) API implemented.
Purge writes tombstones and triggers deterministic rematerialization job.
Rebuilt aggregates are identical across repeated replay of same purge operation.
Community queries include purge watermark metadata for auditability.
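Deterministic rematerialization largely comes down to a fixed replay order. In this sketch the types are hypothetical, and it assumes one log per community, so community_id is omitted. Contributions are sorted by sequence number before summing. This matters because floating-point addition is not associative: a different read order could change the low bits of an aggregate.

```rust
use std::collections::BTreeMap;
use std::ops::RangeInclusive;

/// One user contribution to a community aggregate (hypothetical shape).
#[derive(Debug, Clone)]
pub struct Contribution {
    pub seq: u64,   // position in the community's replicated log
    pub user_id: u64,
    pub epoch: u32, // membership epoch the event was written under
    pub item_id: u64,
    pub weight: f64,
}

/// Tombstone for one user's contributions in an inclusive epoch range.
pub struct PurgeTombstone {
    pub user_id: u64,
    pub epochs: RangeInclusive<u32>,
}

/// Rebuild per-item aggregates from the log, skipping tombstoned contributions.
/// Sorting by `seq` fixes the summation order, so replaying the same log and
/// tombstones always yields bit-identical totals.
pub fn rematerialize(log: &[Contribution], tombstones: &[PurgeTombstone]) -> BTreeMap<u64, f64> {
    let mut rows: Vec<&Contribution> = log.iter().collect();
    rows.sort_by_key(|c| c.seq);
    let mut aggregates = BTreeMap::new();
    for c in rows {
        let purged = tombstones
            .iter()
            .any(|t| t.user_id == c.user_id && t.epochs.contains(&c.epoch));
        if !purged {
            *aggregates.entry(c.item_id).or_insert(0.0) += c.weight;
        }
    }
    aggregates
}
```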

Depends On: Phase 2
Complexity: XL

Done When

Users can opt into community personalization, leave safely, and purge prior contributions without damaging local personalization or producing inconsistent community rankings.

Milestone 10: Governance & Agent Rights -- "Who can influence ranking, and how"

Milestone Thesis

Communities and users can govern personalization influence through policy: which signal intents count, what trust thresholds apply, and what agents are allowed to read/write. Agent-contributed signals are fully attributable and revocable by scope.

UAT Scenario

Given:
  - Community C defines ranking governance policy
  - User U has two agents: A_trusted and A_experimental
  - A_experimental is denied community write scope

When:
  1. A_trusted writes allowed community-scoped signals for U
  2. A_experimental attempts the same and is rejected by policy
  3. C changes policy to downweight `skip_for_now`, upweight `low_quality`
  4. U revokes A_trusted's community scope and removes A_trusted's prior contributions from C
  5. U queries local-only, local+community, and community-only views

Then:
  - Policy enforcement is deterministic and auditable
  - Disallowed agent writes never affect community ranking
  - Policy changes are versioned and explainable in result metadata
  - Agent revocation removes future influence immediately
  - Optional retroactive removal of agent contributions completes within SLA

Phases

Phase 1: Community Governance Policy Engine

Delivers: versioned community policy definitions governing signal eligibility, weighting bounds, and trust/quality thresholds.

Acceptance Criteria:

Policy schema includes allowed intents, excluded intents, and weighting constraints.
Policy changes are versioned and applied with effective timestamps.
Query results can return governing policy version for explanation.
Out-of-policy signals are rejected or quarantined by rule.
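A versioned policy check might look like the sketch below. The verdict rules are one plausible reading of the criteria, and all names are hypothetical:
- Excluded intents are rejected.
- Intents not on the allowed list are quarantined.
- Accepted weights are scaled by a per-intent multiplier and then clamped.

```rust
use std::collections::{HashMap, HashSet};

/// Outcome of checking one signal against a community policy version.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Accept { weight: f64, policy_version: u32 },
    Reject { policy_version: u32 },
    Quarantine { policy_version: u32 },
}

/// Hypothetical versioned community policy.
pub struct CommunityPolicy {
    pub version: u32,
    pub effective_from: u64,
    pub allowed_intents: HashSet<String>,
    pub excluded_intents: HashSet<String>,
    pub weight_multipliers: HashMap<String, f64>,
    pub weight_bounds: (f64, f64), // (min, max), applied after scaling
}

/// Convenience for building intent sets.
pub fn intents(names: &[&str]) -> HashSet<String> {
    names.iter().map(|s| s.to_string()).collect()
}

impl CommunityPolicy {
    pub fn evaluate(&self, intent: &str, weight: f64) -> Verdict {
        let policy_version = self.version;
        if self.excluded_intents.contains(intent) {
            return Verdict::Reject { policy_version };
        }
        if !self.allowed_intents.contains(intent) {
            return Verdict::Quarantine { policy_version };
        }
        let multiplier = self.weight_multipliers.get(intent).copied().unwrap_or(1.0);
        let (min, max) = self.weight_bounds;
        Verdict::Accept { weight: (weight * multiplier).clamp(min, max), policy_version }
    }
}

/// Policy in force at `ts`: latest effective_from <= ts, ties broken by version.
pub fn policy_at(policies: &[CommunityPolicy], ts: u64) -> Option<&CommunityPolicy> {
    policies
        .iter()
        .filter(|p| p.effective_from <= ts)
        .max_by_key(|p| (p.effective_from, p.version))
}
```

Returning the policy version in every verdict is what lets query results cite the governing version, as the criteria require.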

Depends On: M9
Complexity: L

Phase 2: Agent Capability and Scope Controls

Delivers: per-agent capabilities for read/write by scope (local, community, session) with hard enforcement in write/read paths.

Acceptance Criteria:

Agent capability tokens include scope permissions and TTL.
Reads/writes outside granted scope return policy errors and audit events.
Revocation invalidates capabilities immediately (< 1s p99).
tidalctl can inspect agent capabilities and revocation history.
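A sketch of scope enforcement on the read/write path. It assumes capability tokens are checked against an in-memory revocation set on every access, with TTL stored as an absolute expiry time, and that each denial is appended to an audit log. `CapabilityToken`, `CapabilityRegistry`, `Scope`, `Access`, and `PolicyError` are hypothetical names.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scope { Local, Community, Session }

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Access { Read, Write }

/// Capability granted to one agent (hypothetical shape).
#[derive(Debug, Clone)]
pub struct CapabilityToken {
    pub token_id: u64,
    pub agent_id: String,
    pub grants: HashSet<(Scope, Access)>,
    /// TTL expressed as an absolute expiry time.
    pub expires_at_ms: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum PolicyError { Revoked, Expired, OutOfScope(Scope, Access) }

#[derive(Debug)]
pub struct AuditEvent {
    pub agent_id: String,
    pub scope: Scope,
    pub access: Access,
    pub error: PolicyError,
}

#[derive(Default)]
pub struct CapabilityRegistry {
    revoked: HashSet<u64>,
    pub audit_log: Vec<AuditEvent>,
}

impl CapabilityRegistry {
    /// Takes effect on the very next `check`; no grant is cached past it.
    pub fn revoke(&mut self, token_id: u64) {
        self.revoked.insert(token_id);
    }

    /// Called on every read and write before touching `scope`.
    pub fn check(&mut self, token: &CapabilityToken, scope: Scope, access: Access, now_ms: u64) -> Result<(), PolicyError> {
        let denial = if self.revoked.contains(&token.token_id) {
            Some(PolicyError::Revoked)
        } else if now_ms >= token.expires_at_ms {
            Some(PolicyError::Expired)
        } else if !token.grants.contains(&(scope, access)) {
            Some(PolicyError::OutOfScope(scope, access))
        } else {
            None
        };
        match denial {
            None => Ok(()),
            Some(error) => {
                // Every denial is auditable, matching the second criterion.
                self.audit_log.push(AuditEvent { agent_id: token.agent_id.clone(), scope, access, error: error.clone() });
                Err(error)
            }
        }
    }
}
```

Within one process, revocation takes effect on the next check. Meeting the < 1s p99 target across nodes would additionally require propagating the revocation set, which this sketch does not model.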

Depends On: Phase 1
Complexity: L

Phase 3: Provenance, Explainability, and Remove-by-Scope

Delivers: provenance graph for ranking influence and APIs to remove contributions by scope (agent, community, session, local).

Acceptance Criteria:

Every ranking-affecting signal has provenance metadata (writer, scope, policy_version, membership_epoch).
`remove_from_personalization(scope=...)` API supports precise, non-global deletion.
Explainability endpoint can attribute top-ranked items to policy-allowed signals.
Replay/failover preserves remove-by-scope outcomes deterministically.
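A sketch of remove-by-scope as a masking rule rather than a physical delete, which is one way to keep local history while removing scoped contributions from ranking. It assumes contributions are additive weights per item and that a removal masks every contribution whose provenance matches its selector. `Ledger`, `Selector`, and this signature of `remove_from_personalization` are hypothetical; only the provenance fields come from the criteria above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope { Local, Community, Session }

/// Provenance carried by every ranking-affecting signal.
#[derive(Debug, Clone)]
pub struct Provenance {
    pub writer: String,
    pub scope: Scope,
    pub policy_version: u32,
    pub membership_epoch: u64,
}

#[derive(Debug, Clone)]
pub struct Contribution {
    pub item_id: u64,
    pub weight: f64,
    pub prov: Provenance,
}

/// What a removal request targets (hypothetical selector).
#[derive(Debug, Clone)]
pub enum Selector {
    Agent(String),
    InScope(Scope),
    AgentInScope(String, Scope),
}

impl Selector {
    fn matches(&self, p: &Provenance) -> bool {
        match self {
            Selector::Agent(w) => p.writer == *w,
            Selector::InScope(s) => p.scope == *s,
            Selector::AgentInScope(w, s) => p.writer == *w && p.scope == *s,
        }
    }
}

#[derive(Default)]
pub struct Ledger {
    contributions: Vec<Contribution>,
    removals: Vec<Selector>,
}

impl Ledger {
    pub fn record(&mut self, c: Contribution) {
        self.contributions.push(c);
    }

    /// Masks matching contributions from ranking; the history itself is kept.
    pub fn remove_from_personalization(&mut self, selector: Selector) {
        self.removals.push(selector);
    }

    /// Contributions to `item_id` visible in the requested view and not removed.
    fn live(&self, item_id: u64, views: &[Scope]) -> Vec<&Contribution> {
        self.contributions
            .iter()
            .filter(|c| {
                c.item_id == item_id
                    && views.contains(&c.prov.scope)
                    && !self.removals.iter().any(|r| r.matches(&c.prov))
            })
            .collect()
    }

    /// e.g. views = [Local] for local-only, [Local, Community] for local+community.
    pub fn score(&self, item_id: u64, views: &[Scope]) -> f64 {
        self.live(item_id, views).iter().map(|c| c.weight).sum()
    }

    /// Explainability: provenance of every signal still influencing the item.
    pub fn explain(&self, item_id: u64, views: &[Scope]) -> Vec<&Provenance> {
        self.live(item_id, views).into_iter().map(|c| &c.prov).collect()
    }

    pub fn history_len(&self) -> usize {
        self.contributions.len()
    }
}
```

Because removals here are a list of pure predicates over provenance, replaying the same log yields the same masked set. Note that in this sketch a removal also masks any later contributions that match it.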

Depends On: Phase 2
Complexity: XL

Done When

Communities and users can control ranking influence with explicit governance and agent rights, while retaining user-owned, revocable personalization semantics end-to-end.

Use Case Coverage Progression

UC	Description	M1	M2	M3	M4	M5	M6	M7
UC-01	For You Feed	-	-	Full	Full	Full	Full	Full
UC-02	Search	-	-	-	-	Core	Full	Full
UC-03	Trending/Rising	Signals	Full	Full	Full	Full	Full	Full
UC-04	Following Feed	-	Partial	Full	Full	Full	Full	Full
UC-05	Related/Up Next	-	-	Core	Core	Core	Full	Full
UC-06	Browse/Category	Signals	Core	Core	Core	Core	Full	Full
UC-07	Notifications	-	-	Core	Core	Core	Full	Full
UC-08	Creator Profile	-	Core	Core	Core	Core	Full	Full
UC-09	User Library	-	-	Partial	Partial	Partial	Full	Full
UC-10	People Search	-	-	-	-	Core	Full	Full
UC-11	Visual/Semantic	-	-	-	-	Partial	Full	Full
UC-12	Live Content	-	-	-	-	-	Full	Full
UC-13	Hidden Gems	-	Full	Full	Full	Full	Full	Full
UC-14	Controversial/Hot	Signals	Full	Full	Full	Full	Full	Full

Legend:

- = Not addressed
Signals = Signal primitives exist but no query surface
Partial = Some functionality, not all modes
Core = Primary query path works, some modes/filters missing
Full = All modes, filters, and feedback loops per USE_CASES.md specification

M8-M10 focus on deployment topology, community sync semantics, and governance controls; they leave UC coverage unchanged while making the existing feature surface globally portable, revocable, and policy-safe.

Dependency DAG

m1p1 (Types/Schema) ✓
  |
  +---> m1p2 (WAL) ✓
  |       |
  +---> m1p3 (Storage/fjall) ✓ ---+
  |       |                        |
  |       +---> m1p4 (Signal Ledger) ✓
  |               |
  |               +---> m1p5 (Entity + Signal API) ✓  = M1 COMPLETE ✓
  |               |
  |               +---> m2p3 (Ranking Profiles) ✓
  |                       |
  +---> m2p1 (USearch) ✓ -+
  |                        |
  +---> m2p2 (Filters) ✓ -+---> m2p4 (Diversity) ✓
                           |       |
                           +-------+---> m2p5 (RETRIEVE Query) ✓ = M2 COMPLETE ✓
                           |
                           +---> m3p1 (Users/Creators/Relationships) ✓
                           |       |
                           |       +---> m3p2 (Feedback Loop) ✓
                           |               |
                           |               +---> m3p3 (Personalized Profiles) ✓
                           |                       |
                           |                       +---> m3p4 (User State Filters + UAT) ✓
                           |
                           |       m3p4 = M3 COMPLETE ✓
                           |
                           +---> m4 (Agent Session Layer) ✓  = M4 COMPLETE ✓
                                   |
                                   +---> m5p1 (Tantivy) ✓
                                           |
                                           +---> m5p2 (RRF Fusion) ✓
                                           |       |
                                           |       +---> m5p3 (SEARCH Query) ✓
                                           |
                                           +---> m5p4 (Creator Search) ✓

                                           m5p3 + m5p4 = M5 COMPLETE ✓

                                           M6 COMPLETE ✓ (6 phases: cohort, social, sorts, collections, scope, notifications)
                                           M7 COMPLETE ✓ (crash recovery, degradation, scale, observability, UAT + enterprise readiness)

                                           M8 COMPLETE ✓ (Distributed Fabric):
                                             m8p1 (Shard-Aware Foundations) ✓
                                               |
                                               +---> m8p2 (WAL Shipping + Follower Replay) ✓
                                               |       |
                                               +---> m8p3 (CRDT Reconciliation) ✓
                                                       |
                                                       +---> m8p4 (Session Continuity) ✓
                                                       |       |
                                                       +-------+---> m8p5 (Control Plane + Multi-Tenancy) ✓
                                                                       |
                                                                       +---> m8p6 (End-to-End UAT) ✓

                                           M9 phases depend on M8
                                           M10 phases depend on M9

Parallelization opportunities:

m1p2 (WAL) and m1p3 (Storage) can be built in parallel after m1p1 (both complete; m1p3 finished first, m1p2 followed)
m2p1 (USearch) and m2p2 (Filters) can be built in parallel after m1p3
m3p1 (Entities) and m5p1 (Tantivy) can start in parallel with later M2 phases (M4 Agent Memory sits between M3 and M5)
m3p2 tasks 01 (User Preference Vector) and 03 (Hard Negatives) can be built in parallel within m3p2
m5p2 (RRF) and m5p4 (Creator Search) can be built in parallel (both complete)
m8p2 (WAL Shipping) and m8p3 (CRDT Reconciliation) can be built in parallel after m8p1 (both complete)
m8p4 (Session Continuity) tasks 01 and 02 are parallelizable within the phase
m8p5 (Multi-Tenancy) tasks 01 and 02 are parallelizable within the phase

Architectural Decisions Locked In

These decisions are made. They are not revisited unless benchmarks prove them wrong.

Decision	Chosen	Alternative	Rationale
Storage engine	fjall (pure Rust)	RocksDB	Pure Rust, `#![forbid(unsafe_code)]`, fast compile, trait-abstracted for swap
Vector index	USearch (C++ FFI)	hnsw_rs	10-100x higher QPS, predicate callbacks, mmap, f16 quantization
Text search	Tantivy (embedded)	Custom BM25	40K lines of battle-tested code; Collector/Scorer API provides exact hooks needed
Decay formula	Running $S(t) = S(t_{\text{prev}})\, e^{-\lambda \Delta t} + w$	Raw event scan	O(1) vs O(N), proven exact, 20-60x faster at 50+ events/entity
Windowed aggregation	Bucketed counters (Scotty pattern)	SWAG two-stacks	Simpler, serves multiple window sizes from one set of buckets
Hybrid fusion	RRF (k=60)	Tuned linear combination	Zero-config, robust; linear combo is the upgrade path with relevance labels
Consistency model	DB-primary, Tantivy as derived index	Two-phase commit	Simpler, deterministic recovery, source of truth is always the entity store
WAL checksums	BLAKE3	CRC32C	Content-addressing enables deduplication; BLAKE3 is fast enough
Key encoding	Subject-prefix `[entity_id][0x00][TAG:suffix]`	Separate key namespaces	Co-locates entity data, natural shard boundary, single prefix scan
Embedding format	f16 quantization (default)	float32	Half memory, < 1% recall loss at 1536D
Query language	Custom (RETRIEVE/SEARCH/SIGNAL)	SQL	Domain semantics cannot be expressed in SQL without losing optimization opportunities
Replication model	Primary-backup WAL shipping	Raft consensus	No distributed consensus needed; signal CRDTs handle conflict-free merge
Signal CRDTs	PNCounter (per-node P/N vectors) + CrdtSignalState	Per-event dedup (BLAKE3)	O(nodes) memory vs O(events); commutative/associative/idempotent merge
Hard negative CRDTs	LWWRegister with HLC timestamps	G-Set (union only)	LWW allows unhide; HLC provides causal ordering even with clock skew
Causal ordering	HLC (Hybrid Logical Clock)	NTP / Lamport clocks	Tolerates wall-clock skew; causal ordering within bounded drift (Kulkarni et al. 2014)
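For the Decay formula row, here is a minimal sketch of the running update. It assumes one `f64` score per (entity, signal) pair and a fixed decay rate λ. Because $e^{-\lambda(t - t_i)} = e^{-\lambda(t - t_{\text{prev}})} \cdot e^{-\lambda(t_{\text{prev}} - t_i)}$, multiplying the stored score by $e^{-\lambda \Delta t}$ decays every earlier event at once. That is why the O(1) update matches an O(N) raw scan exactly, up to floating-point rounding. `DecayScore` and `raw_scan` are hypothetical names. The late-event branch is an assumption about how out-of-order writes could be handled.

```rust
/// Running exponentially decayed score for one (entity, signal) pair (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct DecayScore {
    lambda: f64, // decay rate per time unit
    score: f64,  // S(t_prev)
    last_t: f64, // t_prev
}

impl DecayScore {
    pub fn new(lambda: f64) -> Self {
        Self { lambda, score: 0.0, last_t: 0.0 }
    }

    /// O(1) write: S(t) = S(t_prev) * exp(-lambda * dt) + w.
    pub fn record(&mut self, t: f64, w: f64) {
        if t >= self.last_t {
            self.score = self.score * (-self.lambda * (t - self.last_t)).exp() + w;
            self.last_t = t;
        } else {
            // Late event: decay its weight forward to t_prev instead.
            self.score += w * (-self.lambda * (self.last_t - t)).exp();
        }
    }

    /// O(1) read at any t >= t_prev, without mutating state.
    pub fn value_at(&self, t: f64) -> f64 {
        self.score * (-self.lambda * (t - self.last_t)).exp()
    }
}

/// O(N) reference: sum every event's decayed weight from scratch.
pub fn raw_scan(lambda: f64, events: &[(f64, f64)], t: f64) -> f64 {
    events.iter().map(|&(ti, wi)| wi * (-lambda * (t - ti)).exp()).sum()
}
```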
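For the Hybrid fusion row, here is a sketch of Reciprocal Rank Fusion with the chosen k = 60. Each document scores $\sum_r 1/(k + \text{rank}_r)$ over the ranked lists it appears in, using 1-based ranks, so BM25 and vector similarity scores never need to be normalized against each other. `rrf_fuse` is a hypothetical helper. Breaking ties by id is an assumption added to make the output deterministic.

```rust
use std::collections::HashMap;

/// Fuses ranked lists of document ids (best first) with Reciprocal Rank Fusion.
pub fn rrf_fuse(lists: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for list in lists {
        for (i, &doc) in list.iter().enumerate() {
            // Rank is 1-based: the top hit contributes 1 / (k + 1).
            *scores.entry(doc).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    // Descending score; ties broken by ascending id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

With lists `[1, 2, 3]` (text) and `[3, 1, 4]` (vector), document 1 wins because it ranks high in both, even though it tops only one list.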
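For the Key encoding row, here is a sketch of the subject-prefix layout and the single prefix scan it enables. An ordered `BTreeMap` stands in for the fjall keyspace. Entity ids are assumed to be strings that contain no 0x00 byte, and the tag names `SIG` and `META` are illustrative. The trailing 0x00 in the scan prefix keeps entity `item42` from matching `item420`.

```rust
use std::collections::BTreeMap;

const SEP: u8 = 0x00;

/// Builds `[entity_id][0x00][TAG:suffix]` (assumes entity ids contain no 0x00 byte).
pub fn encode_key(entity_id: &str, tag: &str, suffix: &str) -> Vec<u8> {
    let mut key = Vec::with_capacity(entity_id.len() + tag.len() + suffix.len() + 2);
    key.extend_from_slice(entity_id.as_bytes());
    key.push(SEP);
    key.extend_from_slice(tag.as_bytes());
    key.push(b':');
    key.extend_from_slice(suffix.as_bytes());
    key
}

/// Everything stored for one entity, via one ordered range scan over its prefix.
pub fn scan_entity<'a>(store: &'a BTreeMap<Vec<u8>, String>, entity_id: &str) -> Vec<(&'a [u8], &'a str)> {
    let mut prefix = entity_id.as_bytes().to_vec();
    prefix.push(SEP);
    store
        .range(prefix.clone()..)
        // Keys are sorted, so the entity's rows are contiguous; stop at the first miss.
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, v)| (k.as_slice(), v.as_str()))
        .collect()
}
```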
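For the Signal CRDTs row, here is a sketch of the PNCounter half. It assumes each replica owns one slot, keyed by a hypothetical `NodeId`, in a grow-only increment vector P and a decrement vector N. Merge takes the per-node maximum. As a result, memory grows with the number of nodes rather than the number of events, and merging is commutative, associative, and idempotent. `CrdtSignalState` is not modeled here.

```rust
use std::collections::HashMap;

pub type NodeId = u32;

/// PN-Counter: per-node increment (P) and decrement (N) vectors.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PNCounter {
    p: HashMap<NodeId, u64>,
    n: HashMap<NodeId, u64>,
}

/// Element-wise max: the join of two grow-only vectors.
fn merge_max(into: &mut HashMap<NodeId, u64>, from: &HashMap<NodeId, u64>) {
    for (&node, &count) in from {
        let slot = into.entry(node).or_insert(0);
        *slot = (*slot).max(count);
    }
}

impl PNCounter {
    /// A node only ever bumps its own slot.
    pub fn increment(&mut self, node: NodeId, by: u64) {
        *self.p.entry(node).or_insert(0) += by;
    }

    pub fn decrement(&mut self, node: NodeId, by: u64) {
        *self.n.entry(node).or_insert(0) += by;
    }

    pub fn value(&self) -> i64 {
        let p: u64 = self.p.values().sum();
        let n: u64 = self.n.values().sum();
        p as i64 - n as i64
    }

    /// Replicas converge regardless of delivery order or duplicate delivery.
    pub fn merge(&mut self, other: &PNCounter) {
        merge_max(&mut self.p, &other.p);
        merge_max(&mut self.n, &other.n);
    }
}
```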
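For the Hard negative CRDTs and Causal ordering rows, here is a sketch of an HLC that follows the update rules in Kulkarni et al. (2014). It is paired with a last-writer-wins register whose stamp is `(HLC, node id)`. The test reproduces the case the rationale describes, assuming physical time in milliseconds. Node 2's wall clock lags node 1's by 500 ms, yet an unhide issued on node 2 after it observed the hide still wins. With raw wall-clock timestamps (505 vs 1000), that unhide would lose. `Hlc`, `HlcClock`, and `LwwRegister` are hypothetical names.

```rust
/// HLC timestamp: `l` tracks the max physical time seen, `c` breaks ties.
/// Field order gives the derived Ord its (l, c) lexicographic meaning.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Default)]
pub struct Hlc {
    pub l: u64,
    pub c: u32,
}

#[derive(Default)]
pub struct HlcClock {
    last: Hlc,
}

impl HlcClock {
    /// Local or send event at physical time `pt`.
    pub fn now(&mut self, pt: u64) -> Hlc {
        let l = self.last.l.max(pt);
        let c = if l == self.last.l { self.last.c + 1 } else { 0 };
        self.last = Hlc { l, c };
        self.last
    }

    /// Receive event carrying remote timestamp `m`.
    pub fn observe(&mut self, m: Hlc, pt: u64) -> Hlc {
        let l = self.last.l.max(m.l).max(pt);
        let c = if l == self.last.l && l == m.l {
            self.last.c.max(m.c) + 1
        } else if l == self.last.l {
            self.last.c + 1
        } else if l == m.l {
            m.c + 1
        } else {
            0
        };
        self.last = Hlc { l, c };
        self.last
    }
}

/// Last-writer-wins register; node id breaks exact HLC ties deterministically.
#[derive(Debug, Clone, PartialEq)]
pub struct LwwRegister<T> {
    pub value: T,
    pub stamp: (Hlc, u32),
}

impl<T: Clone> LwwRegister<T> {
    pub fn set(&mut self, value: T, ts: Hlc, node: u32) {
        if (ts, node) > self.stamp {
            self.value = value;
            self.stamp = (ts, node);
        }
    }

    pub fn merge(&mut self, other: &Self) {
        if other.stamp > self.stamp {
            self.value = other.value.clone();
            self.stamp = other.stamp;
        }
    }
}
```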

What This Roadmap Does NOT Cover

These are explicitly out of scope for the foreseeable future:

Embedding generation -- tidalDB retrieves and ranks over vectors. It does not generate them. Bring your own model.
Generic horizontal distribution -- M8-M10 deliver the tidalDB-specific fabric (WAL shipping, shard routing, community sync/revocation, governance). We are still not building a general-purpose distributed SQL store or OLTP replica mesh.
ACID transactions across entities -- Signal writes are atomic within an entity's state. Cross-entity transactions are not needed for the ranking problem.
SQL compatibility -- The custom query language exists because SQL cannot express ranking semantics. No SQL layer.
Per-request hard multitenancy inside a single shard -- M8-M10 introduce tenant-aware namespaces, quotas, and governance controls for hosted deployments, but strong regulatory isolation (HIPAA, PCI) still requires separate deployments per tenant.
Content moderation, authentication, payments, CDN -- tidalDB solves one problem: ranking. Everything else is someone else's job.
