# TidalDB Roadmap

## Vision Statement

When tidalDB is complete, an engineering team building any content platform -- a media library, a social feed, a marketplace, a discovery surface, or an agentic UX -- can embed a single Rust database and replace the Elasticsearch + Redis + Kafka + feature store + vector database + ranking service stack. One process, one query interface, one operational model. The query `RETRIEVE items FOR USER @user_id USING PROFILE for_you FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50` executes in under 50ms, reflects signals written 100ms ago, enforces diversity without application logic, handles cold-start items without application intervention, and returns results a user would describe as "it knows what I want."

The same runtime doubles as the personalization **memory substrate for agents**: user → agent → tidalDB. Agents ground themselves by reading live session context, write structured signals (preferences, critiques, tool usage) with decay budgets, and immediately query those updates on the next turn. The embeddable runtime is step zero; the exact same WAL + subject-prefix key architecture grows into a multi-region, eventually-consistent fabric so agent memory travels with the user across devices and datacenters without losing correctness.

The long-term model is user-owned personalization across three scopes: global profile, opt-in community overlays, and agent/session context. Users can grant and revoke access per scope, and remove scoped contributions from future ranking without destroying local history.

## Thesis

A single embeddable database can replace the 6-system content ranking stack by treating signals, ranking profiles, session policy, and diversity constraints as database primitives rather than application logic. Every agent or product surface gets an always-fresh memory lane without standing up Vespa-scale search clusters or bespoke feature stores.

## Differentiation vs Vespa and search platforms

1. **Agent-owned memory lanes.** Signals, session context, and reward metadata are schema-level objects. Agents can create scoped sessions, write feedback with decay guarantees, and read it back with zero glue code. Vespa is optimized for serving queries; it assumes you run feature updates elsewhere.
2. **Embeddable-first ergonomics.** `cargo add tidaldb` gives you the full signal + ranking stack with WAL durability and diagnostics in-process. Vespa demands a cluster, config servers, and feed pipelines before you can prototype.
3. **Temporal math on the write path.** Decay, windowing, velocity, and diversity guards are computed atomically when signals arrive. There is no notion of "update documents later" and no external cron job doing the math.
4. **Session- and policy-aware query language.** `RETRIEVE ... FOR USER ... FOR SESSION ... USING PROFILE ...` encodes permissions, diversity and cohort constraints; agent policies live in schema, not middleware.
5. **Roadmapped scale path.** The same WAL segments, subject-prefix keys, and checkpoint formats we ship for the embeddable runtime become the replication log and deterministic conflict-resolution substrate for the distributed fabric (see M8). Vespa already starts distributed; tidalDB grows there without sacrificing the zero-config DX.

---

## Milestone Summary

| #   | Name                  | Proves                                                                       | Enables                                                           |
| --- | --------------------- | ---------------------------------------------------------------------------- | ----------------------------------------------------------------- |
| M0  | Embeddable Runtime    | tidalDB can run in-process with zero-config defaults and tooling             | Cuts proof-of-concept friction, enables internal dogfooding       |
| M1  | Signal Engine         | Signals are a database primitive with O(1) decay, not application math       | UC-03 (partial), UC-06 (partial), UC-14 (partial)                 |
| M2  | Ranked Retrieval      | A single query retrieves, scores, and ranks content using live signals       | UC-03, UC-04, UC-06, UC-08, UC-13, UC-14                          |
| M3  | Personalized Ranking  | User context shapes retrieval and ranking -- the "For You" query works       | UC-01, UC-05, UC-07, UC-09 (partial)                              |
| M4  | Agent Memory          | Agents can create sessions, write signals, and enforce policy inside tidalDB | Agent-mediated personalization, RLHF loops, conversational memory |
| M5  | Hybrid Search         | Text + semantic + signal-ranked search in one query                          | UC-02, UC-10, UC-11                                               |
| M6  | Full Surface Coverage | Every use case, every sort mode, every filter, every feedback loop           | UC-01 through UC-14 complete                                      |
| M7  | Production Hardening  | Crash safety, graceful degradation, operational readiness                    | All UCs at production quality                                     |
| M8  | Distributed Fabric    | Multi-region, multi-tenant replication keeps agent-memory semantics intact   | Hosted tidalDB, cloud/edge deployments, shared agent substrate    |
| M9  | Community Sync & Revocation | Local embeddable profiles can opt into community personalization and safely leave/purge contributions | Community personalization, federated taste graphs, shared feeds |
| M10 | Governance & Agent Rights | Community rules and agent-scoped permissions control what signals influence ranking | User-owned AI personalization at scale, policy-compliant agents |

### Embeddable → Distributed Path

1. **M0–M2 (Embed & prove primitives):** Establish the deterministic builder, WAL, key encoding, and checkpoint semantics that make on-device instances safe to embed. Research refs: `docs/research/tidaldb_wal.md`, `docs/research/tidaldb_signal_ledger.md`.
2. **M3–M4 (Session + agent policy):** Layer user/creator entities, sessions, and policy enforcement so agents can write/read scoped memory lanes without glue. This also defines the logical replication unit: entity + session keyspaces.
3. **M5–M6 (Surface completeness):** Ship hybrid search and every retrieval mode so a single tidalDB node can back any personalization surface or agent prompt grounding workload.
4. **M7 (Operational envelope):** Hardening (crash fencing, throttling, observability) creates the guarantees the fabric will rely on when shipping WAL segments across machines.
5. **M8 (Distributed Fabric):** Introduce shard-aware keyspaces, WAL shipping + deterministic reconciliation, and multi-region eventual-consistency policies so embeddable instances graduate to hosted, global deployments without rewriting application code.
6. **M9 (Community Sync & Revocation):** Add opt-in sharing from local profiles to community layers, plus leave/removal semantics (stop-forward + retroactive purge) with deterministic re-materialization.
7. **M10 (Governance & Agent Rights):** Add community policy engines and agent capability boundaries so users and communities can control exactly which signals affect ranking and revoke them safely.

### Product Milestone Summary

The roadmap has two tracks:

- **Engine Track (M0-M7):** proves tidalDB capabilities.
- **Product Track (P0-P4):** proves end-user value for the beachhead product.

| #   | Name                                      | Proves                                                                                          | Depends On                         |
| --- | ----------------------------------------- | ----------------------------------------------------------------------------------------------- | ---------------------------------- |
| P0  | Beachhead Validation                      | Knowledge workers and consumers care about a personal briefing feed enough to use it repeatedly | M0 (embedding/runtime), partial M1 |
| P1  | Concierge Alpha                           | Daily "Today Brief" with explicit feedback controls creates Day-2 retention in a small cohort   | M1 complete, partial M2            |
| PG1 | Personalization Core Done (Blocking Gate) | Personalization loop is correct, immediate, and measurably better than baseline                 | P1 + M1/M2/M3 core slices          |
| P2  | Productized Beta                          | Self-serve onboarding + real-time adaptation + explanation UX works without manual curation     | M2 complete, partial M3            |
| P3  | Public Launch                             | The product is reliable, useful, and trusted at real user volume                                | M3 + M5 core, M6 partial           |
| P4  | Scale + Revenue Fit                       | Sustainable retention and monetization without quality collapse                                 | M6 + M7                            |

---

## Current Status

| Phase                                            | Status      | Tests                                                                  |
| ------------------------------------------------ | ----------- | ---------------------------------------------------------------------- |
| **m0p1: Embeddable Runtime Skeleton**            | COMPLETE    | 329 passing (293 unit + 36 integration), plus 3 doc tests              |
| **m0p2: Tooling & Diagnostics**                  | COMPLETE    | 349 passing (+7 metrics unit + 7 metrics integration + 9 tidalctl CLI) |
| **m0p3: Samples & Docs**                         | COMPLETE    | 11 doc tests (14 with features); 4 examples compile and run            |
| **m1p1: Core Type System and Schema**            | COMPLETE    | 77 passing                                                             |
| **m1p2: Write-Ahead Log**                        | COMPLETE    | passing (unit + integration)                                           |
| **m1p3: Storage Engine Trait and fjall Backend** | COMPLETE    | 140 passing (128 unit + 12 integration)                                |
| **m1p4: Signal Ledger**                          | COMPLETE    | 300 passing                                                            |
| **m1p5: Entity CRUD and Signal Write API**       | COMPLETE    | 305 passing (300 unit + 5 integration)                                 |
| **m2p1: Vector Index Integration (USearch)**     | COMPLETE    | passing                                                                |
| **m2p2: Metadata Indexes and Filter Engine**     | COMPLETE    | passing                                                                |
| **m2p3: Ranking Profile Engine**                 | COMPLETE    | passing                                                                |
| **m2p4: Diversity Enforcement**                  | COMPLETE    | passing                                                                |
| **m2p5: Query Parser and RETRIEVE Executor**     | COMPLETE    | passing                                                                |
| **m3p1: User and Creator Entities with Relationships** | COMPLETE | passing                                                          |
| **m3p2: Feedback Loop -- Signal Writes Update User State** | COMPLETE | passing                                                      |
| **m3p3: Personalized Ranking Profiles**          | COMPLETE    | passing                                                                |
| **m3p4: User State Filters + M3 UAT**            | COMPLETE    | 571 lib + 11 m3_uat + 6 m2_uat + 5 signal_api + 8 vector_usearch passing |
| **m4: Agent Session Layer**                      | COMPLETE    | 607 lib + 12 m4_uat + 11 m3_uat + 7 m2_uat + 5 signal_api + 8 vector_usearch + 12 storage passing |
| **m5p1: Tantivy Integration**                    | COMPLETE    | 650 lib + 3 text_index integration = 653 passing; BM25 @ 10K docs = 0.26ms |
| **m5p2: Hybrid Fusion (RRF)**                    | COMPLETE    | 665 lib passing; RRF fusion @ 1K candidates = 46µs                     |
| **m5p3: SEARCH Query Executor**                  | COMPLETE    | 681 lib + 12 m5_search integration = 693 passing                       |
| **m5p4: Creator and People Search**              | COMPLETE    | 705 lib + 9 m5_uat + 6 m5p4_creator_search + 12 m5_search = 732 passing |
| **m6p1: Cohort Engine + Cohort-Scoped Trending**    | COMPLETE    | 748 total (739 lib + 9 m6_cohort) |
| **m6p2: Social Graph + Collaborative Filtering**     | COMPLETE    | 812 lib + 8 m6_social integration |
| **m6p3: Full Sort Modes + Live Content**             | COMPLETE    | 777 lib + 15 m6p3_sorts integration |
| **m6p4: Collections + Watch History + Saved Searches** | COMPLETE  | 971 total (794 lib + 10 m6p4_collections) |
| **m6p5: Query Composition + SUGGEST Autocomplete**   | COMPLETE    | 1,084 total (830 lib + 11 m6p5_scope) |
| **m6p6: Notification Capping + Adaptive Preferences + M6 UAT** | COMPLETE | 1,082 total (835 lib + 247 integration); 9 m6_uat passing |
| **forage-p0: Demo Application + Behavioral Loop (Close the Loop)** | COMPLETE | Server + seed corpus + MAB + feed page; dwell→completion→prefs→feed shift; 8 UAT scenarios passing; 911 lib + all prior UATs clean |
| **forage-p1: Real Signal Surface (add_item + /capture endpoint)** | COMPLETE | `add_item()` (FNV-1a URL-hashing, idempotent), `/capture` HTTP endpoint, discovered items injected into feed pool (capped VecDeque, 1000 entries); code infrastructure complete |
| **forage-p2: Real Embeddings (Semantic Preference Model)** | COMPLETE | `forage-embedder` sidecar (OpenAI text-embedding-3-small + deterministic mock mode); `ForageEngineBuilder::with_embedder(url)`; 1536-dim schema; `semantic_boost: 0.3` blended scoring; `similar_to_saved: true` pool augmentation; `semantic_search(text, limit)` + `similar_to(item_id, limit)` public methods; `read_item_embedding()` added to tidalDB; 12 smoke tests passing; 937 tidalDB lib tests clean |
| **forage-p3: Adaptive MAB (Per-User Exploration Tuning)** | COMPLETE | `ExplorationStats` (hits/total/category_signals); `adaptive_ratio()` (0.10/0.14/0.25); UCB1 bonus within exploration slot; `exploration_stats(user_id)` public API; `track_signal_stats()` wired to `signal()` + `signal_dwell()`; `last_explore_items` for outcome detection; exploration stats persisted to `exploration_stats.json` (atomic write; loaded on `open()`); 17 smoke tests passing; 960 tidalDB lib clean |
| **forage-p4: The Surprise Moment (Bridge Item)** | COMPLETE | `ItemLabel::Bridge { cat_a, cat_b }` variant; `make_bridge_item()` computes normalized midpoint of top-2 preference dims, queries ANN, injects 1 bridge item per feed (replaces last non-Exploring slot); cold users receive no bridge; feed page renders `bridge: {cat_a} × {cat_b}` badge (teal); 20 smoke tests passing |
| **m7p1: Crash Recovery Hardening** | COMPLETE | 900 lib + 8 m7_crash_property + 10 m7_crash_m6 + 5 m7_crash_invariant (100 cases); recovery bench passing; WAL compaction, BLAKE3 checkpoint integrity, hard-negative crash invariant |
| **m7p2: Graceful Degradation, Rate Limiting, and Session Cleanup** | COMPLETE | 896 lib + 12 m7p2_load; 1,191 total (--features test-utils); 4-stage degradation, per-agent token-bucket rate limiter, session TTL sweeper |
| **m7p3: Performance at Scale** | COMPLETE | 900 lib + all integration; 1,201 total; scale bench (1M items), USearch ef=400, LogMergePolicy, signal trimmer (5M entry cap), social scale tests |
| **m7p4: Operational Visibility** | COMPLETE | 946 lib + 28 m7p4_visibility (--features test-utils); QueryStats, WAL/signal/index Prometheus metrics, tidalctl diagnostics, RLHF export, cross-session aggregation |
| **Enterprise Readiness + M7 UAT** | COMPLETE | 960 lib + ~155 integration passing; all P0/P1 gaps resolved; m7_uat.rs passing (crash recovery, degradation, rate limiting, observability, regression gate) |

**Next:** M8 Distributed Fabric (multi-region WAL shipping, shard routing, deterministic reconciliation). The engine track is complete through M7, including Production Hardening and Enterprise Readiness.

---

## Product Track: Personal Briefing Feed (Knowledge Workers + Consumers)

This track defines the milestones for the **actual product experience** (not only the database engine).

- Use case reference: `docs/personal-briefing-beachhead.md`.
- Dedicated roadmap: `docs/planning/PRODUCT_ROADMAP.md`.

### P0: Beachhead Validation -- "Do users care enough to return?"

**Milestone Thesis**

Validate that a personal briefing feed solves a painful daily job for users and drives repeat use.

**Acceptance Criteria**

- [ ] Recruit 20-50 target users (knowledge workers + high-intent consumers).
- [ ] Run daily briefing prototype (can include manual source QA).
- [ ] At least one meaningful feedback action per session for the median user (`more`, `less`, `hide`, `mute`, `save`).
- [ ] User interviews confirm value vs baseline feeds ("less noise", "more useful", "saves time").
- [ ] D2 retention reaches agreed threshold for target segment.

### P1: Concierge Alpha -- "High-value daily brief for a narrow cohort"

**Milestone Thesis**

Deliver a reliable daily `Today Brief` experience with immediate visible adaptation after user feedback.

**Acceptance Criteria**

- [ ] App surface: ranked brief, reason labels, source links, save/feedback controls.
- [ ] Feedback loop: next refresh reflects `less/hide/mute` actions immediately.
- [ ] Time-budget mode (`5/10/20` min) is available and used.
- [ ] Diversity constraints prevent source/topic domination in top results.
- [ ] Weekly active usage demonstrates repeated utility.

### P2: Productized Beta -- "Self-serve and repeatable without handholding"

**Milestone Thesis**

Turn the alpha into a self-serve product with stable onboarding, trust UX, and measurable quality.

**Acceptance Criteria**

- [ ] Self-serve onboarding completed in under 3 minutes.
- [ ] "Why this" explanations are present and understandable on every briefing card.
- [ ] Cohort layer available ("trending for people like you").
- [ ] Trust controls available (source transparency, mute/hide persistence).
- [ ] D7 retention and "useful item rate" exceed baseline comparison feed.
- [ ] **PG1 Personalization Core Done gate has passed.**

### P3: Public Launch -- "Trusted at real volume"

**Milestone Thesis**

Launch publicly with reliability, quality, and trust guardrails suitable for broad use.

**Acceptance Criteria**

- [ ] Reliability and latency SLOs defined and met for briefing generation.
- [ ] Quality floor enforced (freshness, source quality, duplicate suppression).
- [ ] Notification cadence controls prevent spam.
- [ ] Core support and incident process in place for user-facing regressions.

### P4: Scale + Revenue Fit -- "Sustainable business without degrading quality"

**Milestone Thesis**

Prove the product can grow and monetize while preserving user trust and briefing quality.

**Acceptance Criteria**

- [ ] Monetization model validated (subscription, team plan, or equivalent).
- [ ] Revenue metrics tracked alongside quality metrics (no quality-revenue trade-off regressions).
- [ ] Retention and engagement remain stable as volume increases.
- [ ] Product roadmap for next segment expansion is data-backed.

### PG1: Personalization Core Done (Blocking Gate)

**Milestone Thesis**

Before product breadth expansion, the core personalization loop must be provably correct and immediately responsive.

**Acceptance Criteria**

- [ ] Hard negatives (`hide/mute/block`) never leak after write, restart, or replay.
- [ ] Explicit feedback (`more/less/skip/save`) changes next-refresh ranking within target latency.
- [ ] User personalization state rebuilds deterministically from checkpoint + WAL replay.
- [ ] Useful-item rate and repeated-unwanted-item rate outperform a non-personalized baseline.
- [ ] Diversity guardrails hold while maintaining personalization quality.

---

## Milestone 0: Embeddable Runtime -- "Runs in your process in minutes"

### Milestone Thesis

Before we prove any ranking math, developers must be able to embed tidalDB inside an existing service with zero operational prep. M0 delivers the runtime glue — an ergonomic builder API, deterministic storage layout, a tiny admin CLI, and living examples — so the very first experience is `cargo add tidaldb`, `TidalDb::builder().in_memory().open()`, and a passing smoke test.

### Phases

#### Phase 1: Embeddable Runtime Skeleton

**Delivers:** A cohesive `Config`/`Builder` API for single-process use, with in-memory and filesystem-backed defaults, sandboxed data directories, and graceful shutdown hooks developers can call from tests or application drop handlers.

- Builder exposes `ephemeral()` / `single_process()` shortcuts and eagerly validates directories.
- Shutdown hooks drain WAL writer threads and surface errors.
- Temp-directory helper guarantees deterministic cleanup (used in doctests).

#### Phase 2: Tooling & Diagnostics

**Delivers:** `tidalctl` (a minimal CLI) for inspecting embedded instances, plus a lightweight metrics surface (Prometheus text or JSON) tagged with the same IDs future distributed deployments will use.

- `tidalctl status --path <dir>` returns JSON with WAL seq, config hash, uptime.
- The optional metrics endpoint (disabled by default) exposes `/metrics` and `/healthz`.
- Tooling reuses the same path helpers from Phase 1.

#### Phase 3: Samples & Docs

**Delivers:** Quick-start samples (For You POC + integration tests) compiled as doctests, and reference snippets for embedding tidalDB inside Axum/Actix or a CLI app. Keeps DX in lockstep with the runtime.

- Quickstart example + doctest run under CI (`cargo test --doc --examples`).
- Axum/Actix embedding examples include graceful shutdown + metrics wiring.
- CONTRIBUTING updated with “run samples” checklist.

### UAT Scenario

```
Given:
  // in tests/lib.rs
  let db = TidalDb::builder()
      .ephemeral()
      .with_temp_dir()
      .open()
      .unwrap();

When:
  db.health_check();           // ok
  tidalctl status --path <dir> // prints WAL, storage, signal counts
  cargo test --doc             // quick-start snippet compiles & runs

Then:
  - Builder defaults require zero manual config
  - CLI connects to the same files used by the embedded process
  - Samples stay in sync (failing doctest fails CI)
```

---

## Milestone 1: Signal Engine -- "Signals are a database primitive"

### Milestone Thesis

A developer can open a tidalDB instance, define signal types with decay rates, write engagement events, and read back decay-correct scores and windowed aggregates -- all without computing any temporal math in application code. This proves that the hardest primitive (temporal signals with O(1) decay, velocity, and windowed aggregation) works correctly and meets the performance budget.

### UAT Scenario

```
Given:
  A tidalDB instance is opened with a schema defining:
    - Entity type: Item with metadata fields (title, category, created_at)
    - Signal type: "view" with exponential decay, half_life=7d, windows=[1h, 24h, 7d]
    - Signal type: "like" with exponential decay, half_life=14d, windows=[24h, 7d, all_time]
    - Signal type: "skip" with exponential decay, half_life=1d, windows=[1h, 24h]

When:
  1. Write 100 items with metadata
  2. Write 10,000 signal events across the items (views, likes, skips)
     with timestamps spanning the last 7 days
  3. Read the decay score for item #42, signal "view", at current time
  4. Read the windowed count for item #42, signal "view", window=24h
  5. Read the velocity for item #42, signal "view", window=1h
  6. Write a new "view" event for item #42
  7. Immediately re-read the decay score, windowed count, and velocity
  8. Close and reopen the tidalDB instance
  9. Re-read all values for item #42

Then:
  - Step 3: Decay score matches S(t) = sum(w_i * exp(-lambda * (t - t_i)))
    computed analytically from raw events, to 6 decimal places
  - Step 4: Windowed count equals the exact count of "view" events
    within the last 24h window
  - Step 5: Velocity equals windowed_count / window_duration
  - Step 7: All values reflect the new event immediately
    (decay score increased, count incremented, velocity updated)
  - Step 9: All values match step 7 (crash recovery preserves state)
  - Performance: decay score read < 100ns per entity,
    signal write < 100us including WAL fsync (amortized),
    200-entity scoring pass < 5us
```
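
The reference values in the "Then" clauses can be computed directly from the raw events. The following is a minimal sketch of such a brute-force oracle, not the engine's implementation: the `Event` struct, the function names, and the half-open window convention `(now - window, now]` are assumptions made for illustration.

```rust
/// One raw signal event: a weight observed at `t_secs` (seconds on any fixed clock).
/// Hypothetical type used only for this reference computation.
#[derive(Clone, Copy, Debug)]
pub struct Event {
    pub t_secs: f64,
    pub weight: f64,
}

/// Decay constant for a half-life: lambda = ln(2) / half_life.
pub fn lambda_from_half_life(half_life_secs: f64) -> f64 {
    std::f64::consts::LN_2 / half_life_secs
}

/// Brute-force S(t) = sum(w_i * exp(-lambda * (t - t_i))) over events at or before `now`.
pub fn analytic_decay_score(events: &[Event], lambda: f64, now: f64) -> f64 {
    events
        .iter()
        .filter(|e| e.t_secs <= now)
        .map(|e| e.weight * (-lambda * (now - e.t_secs)).exp())
        .sum()
}

/// Exact count of events inside the window (now - window_secs, now].
/// The half-open boundary is an assumption; the scenario does not specify it.
pub fn windowed_count(events: &[Event], window_secs: f64, now: f64) -> usize {
    events
        .iter()
        .filter(|e| e.t_secs > now - window_secs && e.t_secs <= now)
        .count()
}

/// Velocity as the scenario defines it: windowed_count / window_duration (events per second).
pub fn velocity(events: &[Event], window_secs: f64, now: f64) -> f64 {
    windowed_count(events, window_secs, now) as f64 / window_secs
}
```

An integration test could feed the same event list to both this oracle and the database, then compare the decay score to 6 decimal places and the windowed count exactly.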

### Phases

#### Phase 1: Core Type System and Schema

**Delivers:** The foundational type system -- entity IDs, signal type definitions, decay rate declarations, window specifications, and the error types that every subsequent module depends on. The schema module that validates and stores signal/entity definitions.

**Acceptance Criteria:**

- [x] `EntityId` is a u64 newtype with `Display`, `Hash`, `Eq`, `Ord`, `to_be_bytes()` (big-endian, preserves numeric ordering)
- [x] `EntityKind` enum: `Item`, `User`, `Creator`
- [x] `SignalTypeDef` captures: name, target `EntityKind`, `DecayModel` (exponential with pre-computed lambda / linear / permanent), `WindowSet`, velocity enabled flag
- [x] `DecayModel::Exponential` stores pre-computed `lambda = ln(2) / half_life.as_secs_f64()` -- no division on hot path
- [x] `Window` enum: `OneHour`, `TwentyFourHours`, `SevenDays`, `ThirtyDays`, `AllTime` with `duration()`, `label()`, `duration_secs_f64()`
- [x] `WindowSet` deduplicates and sorts windows; `empty()` for permanent signals
- [x] `LumenError` enum covers Storage, NotFound, Schema, Durability, Query, Internal variants with `From` impls for each sub-error
- [x] `SchemaError` enum validates: duplicate signal names, invalid identifiers, zero half-life/lifetime, empty windows for non-permanent signals, velocity without windows
- [x] Schema validation via `SchemaBuilder` rejects invalid configurations at construction time
- [x] Property tests: lambda correctness across half-life range, byte ordering preservation
- [x] `cargo fmt` clean, `cargo clippy -D warnings` clean, all 77 tests pass

**Depends On:** None
**Complexity:** M
**Research Reference:** `docs/research/tidaldb_signal_ledger.md` (decay formula, EntityState struct)
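
Two of the criteria above lend themselves to a short illustration: the big-endian `EntityId` encoding and the pre-computed lambda. This is a sketch under assumptions, not the crate's actual definitions; the field layout of `DecayModel` and the `Option` return type are hypothetical (the real schema layer reports a `SchemaError` for a zero half-life).

```rust
use std::fmt;
use std::time::Duration;

/// Sketch of the `EntityId` newtype: a u64 whose big-endian bytes sort like the number.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EntityId(pub u64);

impl EntityId {
    /// Big-endian bytes, so byte-lexicographic order equals numeric order.
    pub fn to_be_bytes(self) -> [u8; 8] {
        self.0.to_be_bytes()
    }
}

impl fmt::Display for EntityId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical shape of the decay model; only the variant names come from the criteria.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DecayModel {
    Exponential { lambda: f64 },
    Linear { lifetime_secs: f64 },
    Permanent,
}

impl DecayModel {
    /// Computes lambda = ln(2) / half_life once, at schema construction time,
    /// so the hot path only multiplies. Returns `None` for a zero half-life.
    pub fn exponential(half_life: Duration) -> Option<Self> {
        if half_life.is_zero() {
            return None;
        }
        Some(DecayModel::Exponential {
            lambda: std::f64::consts::LN_2 / half_life.as_secs_f64(),
        })
    }
}
```

With a 7-day half-life, a weight that is exactly 7 days old is scaled by `exp(-lambda * 604800)`, which is 0.5 up to floating-point error.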

#### Phase 2: Write-Ahead Log

**Delivers:** A durable, append-only log for signal events. Every signal write is fsync'd before acknowledgment. Group commit amortizes fsync cost. Content-addressed events via BLAKE3 for deduplication. The WAL is the source of truth -- all other state is derived.

**Acceptance Criteria:**

- [x] WAL entries are length-prefixed with BLAKE3 checksums
- [x] Group commit batches up to 100 events or 10ms, whichever comes first
- [x] Duplicate events (same BLAKE3 hash) are silently deduplicated
- [x] WAL replay from any checkpoint produces identical state to uninterrupted execution (property test with 10,000+ random event sequences)
- [x] `fsync` is called per batch, not per event
- [x] WAL can be truncated after a checkpoint without losing committed state
- [x] Crash simulation (kill at random WAL positions) never produces corrupt state -- either the event is committed or it is not

**Depends On:** Phase 1
**Complexity:** L
**Research Reference:** `docs/research/tidaldb_wal.md` (wire format, group commit, crash detection, deduplication), `thoughts.md` Part II.1 (WAL convergence), Part V.5-6 (quarantine-first, group commit)
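
The "100 events or 10ms, whichever comes first" rule can be sketched as a small buffer with a size trigger and a time trigger. This is an illustrative sketch only: `GroupCommit`, `push`, and `poll` are hypothetical names, the checksum and length-prefix framing are omitted, and the caller is assumed to write the returned batch and then call `fsync` once for the whole batch.

```rust
use std::time::{Duration, Instant};

/// Hypothetical group-commit buffer: events accumulate until either limit is hit.
pub struct GroupCommit {
    max_events: usize,
    max_wait: Duration,
    pending: Vec<Vec<u8>>,
    /// Arrival time of the oldest event still waiting for a flush.
    oldest: Option<Instant>,
}

impl GroupCommit {
    pub fn new(max_events: usize, max_wait: Duration) -> Self {
        GroupCommit { max_events, max_wait, pending: Vec::new(), oldest: None }
    }

    /// Queues one encoded event. Returns the full batch when the size limit is reached.
    pub fn push(&mut self, event: Vec<u8>, now: Instant) -> Option<Vec<Vec<u8>>> {
        if self.pending.is_empty() {
            self.oldest = Some(now);
        }
        self.pending.push(event);
        if self.pending.len() >= self.max_events {
            Some(self.take_batch())
        } else {
            None
        }
    }

    /// Called from a timer. Returns the batch once the oldest event has waited `max_wait`.
    pub fn poll(&mut self, now: Instant) -> Option<Vec<Vec<u8>>> {
        match self.oldest {
            Some(start) if now.duration_since(start) >= self.max_wait => Some(self.take_batch()),
            _ => None,
        }
    }

    /// Hands the pending events to the caller and resets the wait clock.
    fn take_batch(&mut self) -> Vec<Vec<u8>> {
        self.oldest = None;
        std::mem::take(&mut self.pending)
    }
}
```

Passing `now` in explicitly keeps the policy deterministic under test, which fits the property-test style used elsewhere in this phase.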

#### Phase 3: Storage Engine Trait and fjall Backend

**Delivers:** The `StorageEngine` trait abstraction and two implementations: `FjallBackend` (fjall 3 LSM-tree) for production and `InMemoryBackend` (BTreeMap + RwLock) for deterministic testing. Key encoding follows the subject-prefix pattern with a `Tag` discriminant. `FjallStorage` coordinates three keyspaces per entity kind. `FjallAtomicBatch` provides cross-keyspace atomic writes.

**Acceptance Criteria:**

- [x] `StorageEngine` trait with `get`, `put`, `delete`, `scan_prefix`, `write_batch`, `flush` operations
- [x] Key encoding: `[entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...]` with `Tag` enum (`Evt`=0x01, `Sig`=0x02, `Meta`=0x03, `Rel`=0x04, `Mv`=0x05, `Idx`=0x06)
- [x] `encode_key`, `parse_key` roundtrip correctly for all tag variants and arbitrary suffixes
- [x] `entity_prefix` (9 bytes) and `entity_tag_prefix` (10 bytes) for scoped prefix scans
- [x] Byte-lexicographic key ordering matches numeric entity ID ordering (property tested)
- [x] `FjallBackend` wraps a single fjall `Keyspace`, implements `StorageEngine`
- [x] `FjallStorage` owns a fjall `Database` with three keyspaces: "items", "users", "creators" (one per `EntityKind`)
- [x] `FjallStorage::backend(EntityKind)` routes to the correct keyspace backend
- [x] Entity kind isolation: same key written to different entity kinds does not collide
- [x] `FjallAtomicBatch` provides cross-keyspace atomic writes via `fjall::OwnedWriteBatch`
- [x] Data persists across close and reopen (`flush_all` + reopen test)
- [x] `InMemoryBackend` uses `BTreeMap` + `RwLock` for deterministic, sorted, concurrent testing
- [x] `WriteBatch` and `BatchOp` types for atomic multi-operation writes
- [x] `PrefixIterator` type alias for boxed prefix scan iterators
- [x] Property tests with proptest: encode/parse roundtrip, prefix ordering, prefix containment
- [x] Criterion benchmarks passing
- [x] `cargo fmt` clean, `cargo clippy -D warnings` clean, all 140 tests pass (128 unit + 12 integration)

**Depends On:** Phase 1
**Complexity:** L
**Research Reference:** `thoughts.md` Part V.9 (hybrid storage), Part V.12 (subject-prefix keys), `CODING_GUIDELINES.md` section 2
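
The key layout `[entity_id: 8 bytes BE][0x00][Tag: 1 byte][suffix...]` can be sketched with the standard library alone. The tag values and function names below follow the acceptance criteria, but the signatures (raw `u64` IDs, `Option` return from `parse_key`) are assumptions for illustration rather than the crate's real API.

```rust
/// Tag discriminants as listed in the acceptance criteria.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum Tag {
    Evt = 0x01,
    Sig = 0x02,
    Meta = 0x03,
    Rel = 0x04,
    Mv = 0x05,
    Idx = 0x06,
}

impl Tag {
    fn from_byte(b: u8) -> Option<Tag> {
        match b {
            0x01 => Some(Tag::Evt),
            0x02 => Some(Tag::Sig),
            0x03 => Some(Tag::Meta),
            0x04 => Some(Tag::Rel),
            0x05 => Some(Tag::Mv),
            0x06 => Some(Tag::Idx),
            _ => None,
        }
    }
}

/// 9-byte prefix covering every key that belongs to one entity.
pub fn entity_prefix(entity_id: u64) -> [u8; 9] {
    let mut p = [0u8; 9];
    p[..8].copy_from_slice(&entity_id.to_be_bytes());
    p
}

/// 10-byte prefix covering one tag of one entity.
pub fn entity_tag_prefix(entity_id: u64, tag: Tag) -> [u8; 10] {
    let mut p = [0u8; 10];
    p[..9].copy_from_slice(&entity_prefix(entity_id));
    p[9] = tag as u8;
    p
}

/// Builds a full key: entity prefix, tag byte, then the caller's suffix.
pub fn encode_key(entity_id: u64, tag: Tag, suffix: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(10 + suffix.len());
    key.extend_from_slice(&entity_tag_prefix(entity_id, tag));
    key.extend_from_slice(suffix);
    key
}

/// Splits a key back into (entity_id, tag, suffix); `None` if it is malformed.
pub fn parse_key(key: &[u8]) -> Option<(u64, Tag, &[u8])> {
    if key.len() < 10 || key[8] != 0x00 {
        return None;
    }
    let mut id_bytes = [0u8; 8];
    id_bytes.copy_from_slice(&key[..8]);
    let tag = Tag::from_byte(key[9])?;
    Some((u64::from_be_bytes(id_bytes), tag, &key[10..]))
}
```

Because the ID is big-endian and comes first, a prefix scan over `entity_prefix(id)` visits all of one entity's keys contiguously, and keys for entity 1 sort before keys for entity 256.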

#### Phase 4: Signal Ledger -- Decay Scores and Windowed Aggregation

**Delivers:** The in-memory per-entity signal state with running decay scores (O(1) update, O(1) read) and bucketed windowed counters. Signal writes update the running scores atomically. Signal reads return decay-correct values without scanning raw events. State is checkpointed to storage for crash recovery.

**Acceptance Criteria:**

- [x] `EntitySignalState` is `#[repr(C, align(64))]` -- one L1 cache line per hot-path struct
- [x] Running decay formula: `S(t) = S(t_prev) * exp(-lambda * dt) + weight` -- mathematically exact, verified against analytical brute-force computation to 6 decimal places across 10,000 random event sequences (property test)
- [x] Out-of-order events handled correctly: when `t_event < last_update`, weight is pre-decayed: `score += weight * exp(-lambda * (last_update - t_event))`
- [x] Windowed counts use per-minute bucketed counters (BucketedCounter) supporting 1h/24h/7d windows
- [x] Velocity = windowed_count / window_duration_seconds
- [x] Signal write latency < 100 microseconds including WAL write (amortized), benchmarked with criterion
- [x] Decay score read latency < 100ns per entity per lambda, benchmarked with criterion
- [x] 200-entity scoring pass < 5 microseconds, benchmarked with criterion
- [x] State checkpointed to storage every 30 seconds; crash recovery reconstructs from checkpoint + WAL replay
- [x] DashMap or sharded map for concurrent entity state access; signal counters use AtomicU64 with Relaxed ordering

**Depends On:** Phase 2, Phase 3
**Complexity:** XL
**Research Reference:** `docs/research/tidaldb_signal_ledger.md` (running-score formula, SWAG, BucketedCounter, EntityState struct, three-tier architecture)
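
The running-score update and the out-of-order rule from the criteria above fit in a few lines. This is a minimal sketch: `RunningScore` and its fields are illustrative names, it ignores the cache-line layout and atomics the real `EntitySignalState` calls for, and timestamps are plain `f64` seconds.

```rust
/// Running decayed score for one (entity, signal) pair. Names are illustrative.
#[derive(Clone, Copy, Debug, Default)]
pub struct RunningScore {
    /// Score as of `last_update`.
    score: f64,
    /// Timestamp (seconds) the score was last decayed to.
    last_update: f64,
}

impl RunningScore {
    /// O(1) write: no raw events are scanned.
    pub fn record(&mut self, weight: f64, t_event: f64, lambda: f64) {
        if t_event >= self.last_update {
            // In-order event: S(t) = S(t_prev) * exp(-lambda * dt) + weight.
            let dt = t_event - self.last_update;
            self.score = self.score * (-lambda * dt).exp() + weight;
            self.last_update = t_event;
        } else {
            // Out-of-order event: pre-decay the weight to `last_update`
            // and leave the clock where it is.
            let age = self.last_update - t_event;
            self.score += weight * (-lambda * age).exp();
        }
    }

    /// O(1) read: decay the stored score forward to `now`.
    /// Reads earlier than `last_update` return the stored score unchanged.
    pub fn read(&self, lambda: f64, now: f64) -> f64 {
        let dt = (now - self.last_update).max(0.0);
        self.score * (-lambda * dt).exp()
    }
}
```

Both branches yield the same value as summing `w_i * exp(-lambda * (t - t_i))` over all events, because exponential decay factors multiply: decaying a weight to `last_update` and then to `now` equals decaying it to `now` in one step.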

#### Phase 5: Entity CRUD and Signal Write API

**Delivers:** The public API surface for Milestone 1. `TidalDB::open()`, `TidalDB::shutdown()`, entity write/read, signal write/read. This is the interface the UAT scenario tests against. Includes the `signal()` method that atomically writes to WAL, updates in-memory state, and returns immediately.

**Acceptance Criteria:**

- [x] `TidalDB::open(config)` opens storage, restores in-memory state from checkpoint + WAL replay, returns `Result<TidalDB>`
- [x] `TidalDB::shutdown()` checkpoints all in-memory state, syncs WAL, closes storage cleanly
- [x] `db.write_item(id, metadata)` stores entity metadata
- [x] `db.signal(signal_type, entity_id, weight, timestamp)` atomically: appends to WAL, updates decay scores, updates windowed counters
- [x] `db.read_decay_score(entity_id, signal_type, lambda_index)` returns current decayed score
- [x] `db.read_windowed_count(entity_id, signal_type, window)` returns count within window
- [x] `db.read_velocity(entity_id, signal_type, window)` returns count / window_duration
- [x] Full UAT scenario passes as an integration test
- [x] `TidalDB` is `Send + Sync` -- safe to share across threads behind `Arc`

**Depends On:** Phase 4
**Complexity:** M
**Research Reference:** `CODING_GUIDELINES.md` section 9 (public API surface)

### Deferred to Later Milestones

- **User entities and preference vectors** -- deferred to M3 because M1 proves the signal primitive without needing user context
- **Creator entities and relationship edges** -- deferred to M2/M3 because M1 only needs items to prove signal correctness
- **Vector index (USearch)** -- deferred to M2 because M1 does not need ANN retrieval
- **Text index (Tantivy)** -- deferred to M4 because M1 does not need full-text search
- **Ranking profiles** -- deferred to M2 because M1 proves signals work; M2 proves ranking over signals works
- **Query parser** -- deferred to M2; M1 uses the Rust API directly
- **Diversity enforcement** -- deferred to M2 because M1 does not produce ranked result sets
- **Signal rollups (hourly/daily materialization)** -- deferred to M5 because the bucketed counter approach serves the performance budget through M4; rollups become necessary only at scale for 30d+ windows
- **RocksDB backend** -- deferred indefinitely; fjall is the primary backend, RocksDB is the trait-abstracted fallback if benchmarks demand it

### Integration Test

```rust
#[test]
fn milestone_1_uat() {
    // Open tidalDB with signal schema
    let db = TidalDB::open(Config {
        data_dir: temp_dir(),
        schema: Schema::builder()
            .entity_type("item", &["title", "category", "created_at"])
            .signal("view", Decay::exponential(Duration::days(7)),
                    &[Window::Hours(1), Window::Hours(24), Window::Days(7)])
            .signal("like", Decay::exponential(Duration::days(14)),
                    &[Window::Hours(24), Window::Days(7), Window::AllTime])
            .signal("skip", Decay::exponential(Duration::days(1)),
                    &[Window::Hours(1), Window::Hours(24)])
            .build(),
    }).unwrap();

    // Write 100 items
    for i in 0..100 {
        db.write_item(EntityId(i), metadata(i)).unwrap();
    }

    // Write 10,000 signal events spanning 7 days
    let events = generate_events(10_000, Duration::days(7));
    for e in &events {
        db.signal(e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Read and verify item #42
    let now = Timestamp::now();
    let analytical_score = compute_analytical_decay(&events, EntityId(42), "view", now);
    let actual_score = db.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!((actual_score - analytical_score).abs() < 1e-6);

    let analytical_count = count_events_in_window(&events, EntityId(42), "view", now, Duration::hours(24));
    let actual_count = db.read_windowed_count(EntityId(42), "view", Window::Hours(24)).unwrap();
    assert_eq!(actual_count, analytical_count);

    // Write new event and verify immediate visibility
    db.signal("view", EntityId(42), 1.0, now).unwrap();
    let new_score = db.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!(new_score > actual_score);

    // Close, reopen, verify persistence
    db.shutdown().unwrap();
    let db2 = TidalDB::open(same_config()).unwrap();
    let recovered_score = db2.read_decay_score(EntityId(42), "view", 0).unwrap();
    assert!((recovered_score - new_score).abs() < 1e-6);
}
```

### Done When

A developer can embed tidalDB as a Rust dependency, define signal types with decay rates and windows in schema, write thousands of signal events, and read back decay-correct scores, windowed counts, and velocity values that match analytical computation to 6 decimal places -- including after a crash and restart. Performance benchmarks pass: signal write < 100us amortized, decay read < 100ns per entity, 200-entity scoring < 5us.

---

## Milestone 2: Ranked Retrieval -- "A single query retrieves, scores, and ranks content"

### Milestone Thesis

A developer can write items with metadata and embeddings, write signal events, and execute a RETRIEVE query that returns items ranked by a named profile using live signal scores -- with metadata filters and diversity constraints applied by the database, not the application. This proves that ranking is a database operation, not application logic.

### UAT Scenario

```
Given:
  A tidalDB instance with:
    - 10,000 items with metadata (title, category, format, duration, created_at)
      and 1536-dim embeddings
    - Signal types: view (7d decay), like (14d decay), skip (1d decay),
      share (3d decay), completion (30d decay)
    - 100,000 signal events spanning 7 days across the items
    - Ranking profiles defined:
      * "trending" -- share_velocity(6h) primary, view_velocity(6h) secondary,
        engagement_ratio gate > 0.03
      * "hot" -- score / (age_hours + 2)^1.8
      * "new" -- created_at DESC
      * "top_week" -- quality_score within 7d window
      * "hidden_gems" -- high completion_rate, inverse view_count
      * "controversial" -- max(likes * dislikes)

When:
  1. RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25
  2. RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20
  3. RETRIEVE items USING PROFILE new LIMIT 20
  4. RETRIEVE items USING PROFILE top_week LIMIT 20
  5. RETRIEVE items USING PROFILE hidden_gems FILTER min_completion_rate:0.7 LIMIT 10
  6. RETRIEVE items USING PROFILE controversial LIMIT 10
  7. Write a burst of 100 "share" signals for item #500
  8. Re-execute the trending query

Then:
  - Step 1: Items ordered by share velocity, max 1 per creator, items with
    engagement_ratio < 0.03 excluded
  - Step 2: Only jazz items returned, ordered by hot formula
  - Step 3: Items ordered by created_at descending, no signal computation
  - Step 4: Items ordered by quality score computed from 7d-windowed signals
  - Step 5: Items with high completion but low views, sorted by quality/reach ratio
  - Step 6: Items with highest product of positive and negative signals
  - Step 7: ok
  - Step 8: Item #500 appears higher in trending results (signal written 100ms ago
    is reflected)
  - Performance: end-to-end RETRIEVE < 50ms for 10K items
```

### Phases

#### Phase 1: Vector Index Integration (USearch)

**Delivers:** USearch wrapped behind a trait, with mmap persistence, f16 quantization, and the adaptive filtered search planner. Items can be inserted with embeddings and retrieved by ANN similarity.

**Acceptance Criteria:**

- [x] `VectorIndex` trait with `insert(key, vector)`, `remove(key)`, `search(query, k)`, `filtered_search(query, k, predicate)`, `save()`, `load()`, `view()`
- [x] USearch backend implements the trait with f16 quantization (default), mmap persistence
- [x] Vectors normalized at insertion time (for unit vectors, L2 distance produces the same ranking as cosine similarity)
- [x] Adaptive query planner: selectivity < 2% triggers pre-filter + brute-force; 2-100% uses `filtered_search` with predicate callback
- [x] ANN retrieval at 10K vectors returns top-100 with recall@10 > 0.95
- [x] ANN retrieval latency < 10ms at 10K vectors (benchmarked)
- [x] Persistence: save on checkpoint, view() on restart for immediate read serving
- [x] `#![forbid(unsafe_code)]` relaxed only in the USearch FFI boundary module with SAFETY comments

**Depends On:** m1p3 (storage traits)
**Complexity:** L
**Research Reference:** `docs/research/ann_for_tidaldb.md` (USearch architecture, filtered search, f16, mmap)
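
Two of the criteria above lend themselves to a short sketch: insertion-time normalization and the selectivity threshold in the adaptive planner. The code below is illustrative only. `normalize`, `l2_squared`, `plan`, `SearchStrategy`, and `brute_force_top_k` are hypothetical names, and the brute-force path scans a plain slice rather than calling USearch.

```rust
/// Scale a vector to unit length (left unchanged if it is all zeros).
pub fn normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Squared L2 distance. For unit vectors this equals 2 - 2 * cosine similarity,
/// so ordering by it matches ordering by cosine.
pub fn l2_squared(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SearchStrategy {
    /// Materialize the matching ids, then scan them exhaustively.
    PreFilterBruteForce,
    /// Walk the ANN graph and test each visited node with a predicate.
    FilteredGraphSearch,
}

/// Pick a strategy from the estimated fraction of items passing the filter.
pub fn plan(selectivity: f64) -> SearchStrategy {
    if selectivity < 0.02 {
        SearchStrategy::PreFilterBruteForce
    } else {
        SearchStrategy::FilteredGraphSearch
    }
}

/// Exhaustive top-k over an already filtered candidate list of (id, unit vector).
pub fn brute_force_top_k(query: &[f32], candidates: &[(u64, Vec<f32>)], k: usize) -> Vec<u64> {
    let mut scored: Vec<(f32, u64)> = candidates
        .iter()
        .map(|(id, v)| (l2_squared(query, v), *id))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}
```

At 10K vectors a 2% filter leaves roughly 200 candidates, which is small enough that an exact scan is likely cheaper than a graph walk that rejects most visited nodes.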

#### Phase 2: Metadata Indexes and Filter Engine

**Delivers:** Roaring bitmap indexes for categorical metadata, B-tree indexes for range attributes, and a composable filter engine that evaluates arbitrary filter combinations. The filter engine produces either a bitmap (for pre-filtering ANN) or a predicate closure (for in-graph filtering).

**Acceptance Criteria:**

- [x] Roaring bitmap per distinct value of each categorical metadata field: category, format, creator_id
- [x] B-tree index for range attributes: created_at, duration
- [x] Filter expressions are composable: AND across dimensions, OR within a dimension
- [x] `filter.selectivity()` estimates the fraction of items matching (for query planner)
- [x] `filter.to_bitmap()` returns a RoaringBitmap for pre-filtering
- [x] `filter.to_predicate()` returns a `Fn(EntityId) -> bool` for in-graph filtering
- [x] Filters tested: category:jazz, format:video, duration_min:5m, created_within:7d, and arbitrary combinations
- [x] Filter evaluation < 1 microsecond per candidate (benchmarked)

**Depends On:** m1p3 (storage engine)
**Complexity:** M
**Research Reference:** `docs/research/ann_for_tidaldb.md` (metadata indexes, selectivity estimation, roaring bitmaps)
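
The composition rule (OR within a dimension, AND across dimensions) and the three filter outputs can be sketched with std collections. This is a hedged illustration: `Bitmap` is a `BTreeSet<u32>` standing in for a roaring bitmap, and `MetadataIndex` and this `FilterExpr` shape are hypothetical, not the engine's real types.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for a roaring bitmap of entity ids (std-only for this sketch).
pub type Bitmap = BTreeSet<u32>;

pub enum FilterExpr {
    Eq(&'static str, &'static str), // field:value
    Or(Vec<FilterExpr>),
    And(Vec<FilterExpr>),
}

#[derive(Default)]
pub struct MetadataIndex {
    postings: HashMap<(String, String), Bitmap>, // (field, value) -> ids
    all: Bitmap,
}

impl MetadataIndex {
    pub fn insert(&mut self, id: u32, field: &str, value: &str) {
        self.all.insert(id);
        self.postings
            .entry((field.to_string(), value.to_string()))
            .or_default()
            .insert(id);
    }

    /// Evaluate the expression to the set of matching ids.
    pub fn to_bitmap(&self, expr: &FilterExpr) -> Bitmap {
        match expr {
            FilterExpr::Eq(field, value) => self
                .postings
                .get(&(field.to_string(), value.to_string()))
                .cloned()
                .unwrap_or_default(),
            // Union of all branches.
            FilterExpr::Or(parts) => parts.iter().flat_map(|p| self.to_bitmap(p)).collect(),
            // Intersection of all branches; an empty AND matches everything.
            FilterExpr::And(parts) => {
                let mut sets = parts.iter().map(|p| self.to_bitmap(p));
                match sets.next() {
                    None => self.all.clone(),
                    Some(first) => sets.fold(first, |acc, s| &acc & &s),
                }
            }
        }
    }

    /// Fraction of indexed items that match (exact here; an estimate in practice).
    pub fn selectivity(&self, expr: &FilterExpr) -> f64 {
        if self.all.is_empty() {
            return 0.0;
        }
        self.to_bitmap(expr).len() as f64 / self.all.len() as f64
    }

    /// Predicate form for in-graph filtering.
    pub fn to_predicate(&self, expr: &FilterExpr) -> impl Fn(u32) -> bool {
        let bitmap = self.to_bitmap(expr);
        move |id| bitmap.contains(&id)
    }
}
```

Here `selectivity` computes the exact fraction by materializing the bitmap; a real planner would more likely estimate it from per-value cardinalities to avoid that cost.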

#### Phase 3: Ranking Profile Engine

**Delivers:** Named ranking profiles declared as data (not compiled code), parsed, validated, stored, and executed by the database. Profiles reference signal scores, windowed aggregates, velocity, metadata fields, and define quality gates. Profiles are versioned and swappable at query time.

**Acceptance Criteria:**

- [x] Profile declaration syntax supports: primary signal, secondary signals with weights, BOOST, GATE (minimum threshold), PENALIZE, EXCLUDE
- [x] Profiles stored in schema, versioned, retrievable by name
- [x] Profile execution: given a candidate set and a profile, produce a scored and sorted result list
- [x] Built-in profiles implemented: `trending`, `hot`, `new`, `top_week`, `top_month`, `top_all_time`, `hidden_gems`, `controversial`, `most_viewed`, `most_liked`, `shuffle`
- [x] `hot` formula: `log10(max(|positive - negative|, 1)) / (age_hours + 2)^gravity` with configurable gravity
- [x] `controversial` formula: `(positive * negative) / (positive + negative)^2`
- [x] `hidden_gems` formula: `quality_score * (1 / log10(view_count + 10))` -- the `+10` keeps the denominator at or above 1 (`log10(10) = 1`), so items with zero views cannot cause a division by zero
- [x] Profile change does not require recompile -- profiles are runtime data
- [x] 200-candidate scoring pass with decay-only profile < 10 microseconds, with velocity-based profile (trending) < 100 microseconds (both Criterion benchmarked)

**Depends On:** m1p4 (signal ledger)
**Complexity:** L
**Research Reference:** `VISION.md` (ranking profile declarations), `ai-lookup/services/ranking-profiles.md`, `USE_CASES.md` Appendix B (sort mode formulas)
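
The three formulas listed in the criteria translate directly into code. The functions below are a sketch with hypothetical signatures, taking pre-aggregated `f64` inputs. The zero-vote guard in `controversial` is an assumption, since the criteria do not say what an item with no votes should score.

```rust
/// hot = log10(max(|positive - negative|, 1)) / (age_hours + 2)^gravity
pub fn hot(positive: f64, negative: f64, age_hours: f64, gravity: f64) -> f64 {
    (positive - negative).abs().max(1.0).log10() / (age_hours + 2.0).powf(gravity)
}

/// controversial = (positive * negative) / (positive + negative)^2
/// Peaks at 0.25 when votes are evenly split.
pub fn controversial(positive: f64, negative: f64) -> f64 {
    let total = positive + negative;
    if total == 0.0 {
        return 0.0; // assumption: an item with no votes is not controversial
    }
    (positive * negative) / (total * total)
}

/// hidden_gems = quality_score * (1 / log10(view_count + 10))
/// With zero views the denominator is log10(10) = 1.
pub fn hidden_gems(quality_score: f64, view_count: f64) -> f64 {
    quality_score / (view_count + 10.0).log10()
}
```

With `gravity = 1.8`, an item with a net score of 100 that is 8 hours old scores `2 / 10^1.8`, about 0.032. An evenly split 50/50 item reaches the `controversial` maximum of 0.25.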

#### Phase 4: Diversity Enforcement

**Delivers:** Post-scoring diversity pass that reorders results to satisfy constraints (max_per_creator, format_mix) without reducing result count. Implemented as a greedy selection pass over the scored candidate list.

**Acceptance Criteria:**

- [x] `max_per_creator:N` enforced: no more than N items from any single creator in the result set
- [x] `format_mix:true` enforced: no more than 60% of results from any single format
- [x] Diversity pass does not reduce result count -- it selects the next-best candidate that satisfies constraints
- [x] Diversity pass adds < 1ms for 200 candidates (benchmarked)
- [x] When diversity constraints cannot be fully satisfied (too few creators), results are returned with a warning flag, not an error
- [x] Property test: diversity constraints hold for 10,000 random candidate sets

**Depends On:** Phase 3 (ranking profiles produce scored lists)
**Complexity:** M
**Research Reference:** `VISION.md` (diversity as query constraint), `thoughts.md` Part V.14 (MMR post-scoring)
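
A greedy `max_per_creator` pass with the warning-flag behavior might look like the sketch below. `Scored`, `Diversified`, and `constraint_relaxed` are hypothetical names. Backfilling with the best skipped candidates is one possible reading of "results are returned with a warning flag" and is an assumption here.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Scored {
    pub id: u64,
    pub creator: u64,
    pub score: f64,
}

#[derive(Debug)]
pub struct Diversified {
    pub items: Vec<Scored>,
    /// True when the cap had to be relaxed to fill the result set.
    pub constraint_relaxed: bool,
}

/// `ranked` must already be sorted by score, highest first.
pub fn max_per_creator(ranked: &[Scored], n: usize, limit: usize) -> Diversified {
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut picked: Vec<Scored> = Vec::with_capacity(limit);
    let mut skipped: Vec<Scored> = Vec::new();

    for c in ranked {
        if picked.len() == limit {
            break;
        }
        let used = per_creator.entry(c.creator).or_insert(0);
        if *used < n {
            // Next-best candidate that still satisfies the cap.
            *used += 1;
            picked.push(c.clone());
        } else {
            skipped.push(c.clone());
        }
    }

    // Too few distinct creators: backfill from skipped items and raise the flag.
    let constraint_relaxed = picked.len() < limit && !skipped.is_empty();
    if constraint_relaxed {
        let need = limit - picked.len();
        picked.extend(skipped.into_iter().take(need));
        picked.sort_by(|a, b| b.score.total_cmp(&a.score));
    }
    Diversified { items: picked, constraint_relaxed }
}
```

The pass is a single linear scan over the scored list plus one hash lookup per candidate, which is why a 200-candidate budget well under 1ms is plausible.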

#### Phase 5: Query Parser and RETRIEVE Executor

**Delivers:** The query parser for the RETRIEVE operation and the executor that orchestrates candidate retrieval, filtering, scoring, diversity, and result assembly. This is the "one query" entry point. For M2, the RETRIEVE query does not require `FOR USER` (no personalization yet) -- it operates on the full item corpus with filters and profiles.

**Acceptance Criteria:**

- [x] Parser handles: `RETRIEVE items`, `USING PROFILE <name>`, `FILTER <conditions>`, `DIVERSITY <constraints>`, `LIMIT <n>`, `EXCLUDE [ids]`
- [x] Parser produces a typed AST; parse errors include position and helpful message
- [x] Executor pipeline: candidate retrieval (ANN or full scan based on profile) -> filter -> score -> diversity -> limit -> return
- [x] When profile uses velocity/decay signals, executor uses ANN retrieval over embeddings then scores with signal state
- [x] When profile is `new` or `alphabetical`, executor skips ANN and uses metadata index directly
- [x] End-to-end RETRIEVE latency < 50ms at 10K items (benchmarked)
- [x] Results include: entity_id, score, and a signal snapshot (key signal values used in scoring) for debugging/transparency
- [x] `SIGNAL` write command also parsed and routed to signal write path from M1
- [x] Full M2 UAT scenario passes as an integration test

**Depends On:** Phase 1, Phase 2, Phase 3, Phase 4
**Complexity:** L
**Research Reference:** `ai-lookup/features/query-language.md`, `SEQUENCE.md` (all sequence diagrams)
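
To make the "typed AST with positioned errors" criterion concrete, here is a deliberately small sketch that handles only `RETRIEVE <entity>`, `FILTER`, `USING PROFILE`, and `LIMIT`. It is not the project's parser: `RetrieveQuery`, `ParseError`, and `parse_retrieve` are hypothetical, filter conditions are kept as raw strings, and `DIVERSITY` and `EXCLUDE` are left out.

```rust
#[derive(Debug, PartialEq)]
pub struct RetrieveQuery {
    pub entity: String,
    pub profile: Option<String>,
    pub filters: Vec<String>,
    pub limit: Option<usize>,
}

#[derive(Debug, PartialEq)]
pub struct ParseError {
    pub position: usize, // byte offset into the query text
    pub message: String,
}

pub fn parse_retrieve(input: &str) -> Result<RetrieveQuery, ParseError> {
    // Whitespace-separated tokens paired with their byte offsets.
    let tokens: Vec<(usize, &str)> = input
        .split_whitespace()
        .map(|t| (t.as_ptr() as usize - input.as_ptr() as usize, t))
        .collect();
    let err = |position: usize, message: &str| ParseError {
        position,
        message: message.to_string(),
    };
    let mut it = tokens.iter().copied().peekable();

    match it.next() {
        Some((_, "RETRIEVE")) => {}
        Some((pos, _)) => return Err(err(pos, "expected RETRIEVE")),
        None => return Err(err(0, "empty query")),
    }
    let entity = match it.next() {
        Some((_, e)) => e.to_string(),
        None => return Err(err(input.len(), "expected entity type after RETRIEVE")),
    };
    let mut q = RetrieveQuery { entity, profile: None, filters: vec![], limit: None };

    while let Some((pos, tok)) = it.next() {
        match tok {
            "USING" => match (it.next(), it.next()) {
                (Some((_, "PROFILE")), Some((_, name))) => q.profile = Some(name.to_string()),
                _ => return Err(err(pos, "expected USING PROFILE <name>")),
            },
            "LIMIT" => {
                let (npos, n) = it
                    .next()
                    .ok_or_else(|| err(input.len(), "expected a number after LIMIT"))?;
                let parsed = n
                    .parse::<usize>()
                    .map_err(|_| err(npos, "LIMIT must be a non-negative integer"))?;
                q.limit = Some(parsed);
            }
            "FILTER" => {
                // Consume comma-separated conditions until the next clause keyword.
                while let Some(&(_, next)) = it.peek() {
                    if matches!(next, "USING" | "LIMIT" | "FILTER") {
                        break;
                    }
                    q.filters.push(next.trim_end_matches(',').to_string());
                    it.next();
                }
            }
            _ => return Err(err(pos, "unexpected token")),
        }
    }
    Ok(q)
}
```

Because clauses are matched in a loop rather than in fixed order, both `FILTER ... USING PROFILE ...` and `USING PROFILE ... FILTER ...` are accepted, matching the two orderings that appear in the UAT queries.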

### Deferred to Later Milestones

- **FOR USER clause and user preference vectors** -- deferred to M3; M2 proves ranking works without personalization
- **SIMILAR TO clause (related content)** -- deferred to M3; requires user context for personalization layer
- **Relationship graph (follows, blocks)** -- deferred to M3; M2 filters on metadata, not relationships
- **SEARCH query (text + semantic)** -- deferred to M4; M2 proves RETRIEVE ranking
- **Full-text index (Tantivy)** -- deferred to M4
- **Exploration budget / cold start** -- deferred to M3; requires user context to be meaningful
- **User state filters (unseen, saved, liked)** -- deferred to M3; requires user entities
- **Engagement threshold filters (min_views, min_likes)** -- partially implemented via signal reads; full composable filter syntax deferred to M5

### Integration Test

```rust
#[test]
fn milestone_2_uat() {
    let db = open_with_full_schema();

    // Write 10K items with embeddings
    for i in 0..10_000 {
        db.write_item(EntityId(i), metadata(i), Some(embedding(i))).unwrap();
    }

    // Write 100K signal events
    for e in generate_events(100_000, Duration::days(7)) {
        db.signal(e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Trending query with diversity
    let results = db.retrieve(
        "RETRIEVE items USING PROFILE trending DIVERSITY max_per_creator:1 LIMIT 25"
    ).unwrap();
    assert_eq!(results.len(), 25);
    assert!(results.windows(2).all(|w| w[0].score >= w[1].score));
    assert!(creator_counts(&results).values().all(|&c| c <= 1));

    // Category filter with hot sort
    let jazz = db.retrieve(
        "RETRIEVE items FILTER category:jazz USING PROFILE hot LIMIT 20"
    ).unwrap();
    assert!(jazz.iter().all(|r| r.metadata["category"] == "jazz"));

    // Signal freshness: write burst, verify ranking change
    let pre_burst = db.retrieve(
        "RETRIEVE items USING PROFILE trending LIMIT 10"
    ).unwrap();
    for _ in 0..100 {
        db.signal("share", EntityId(500), 1.0, Timestamp::now()).unwrap();
    }
    let post_burst = db.retrieve(
        "RETRIEVE items USING PROFILE trending LIMIT 10"
    ).unwrap();
    let pre_rank = pre_burst.iter().position(|r| r.id == EntityId(500));
    let post_rank = post_burst.iter().position(|r| r.id == EntityId(500));
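    // An item absent before the burst is treated as ranked just past the LIMIT 10 window
    assert!(post_rank.expect("item 500 should trend after the burst")
        < pre_rank.unwrap_or(pre_burst.len()));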
}
```

### Done When

A developer can write items with embeddings and metadata, write signal events, and execute RETRIEVE queries with any of the 11+ built-in sort modes, metadata filters, and diversity constraints. Results are correctly ranked by the named profile. Signal events written 100ms ago are reflected in the next query. End-to-end latency < 50ms at 10K items. Diversity constraints hold in every result set.

---

## Milestone 3: Personalized Ranking -- "The For You query works"

### Milestone Thesis

A developer can write user entities with preference vectors, write relationship edges (follows, blocks), write engagement signals that update user profiles and relationship weights automatically, and execute `RETRIEVE items FOR USER @user_id USING PROFILE for_you` -- getting results shaped by the user's history, relationships, and implicit preferences. This proves that the feedback loop closes inside the database.

### Enables

- **UC-01** (For You Feed) -- Full: personalized ranking with diversity, exploration, cold start
- **UC-04** (Following Feed) -- Full: restricted to followed creators, chronological + quality tiebreaker
- **UC-05** (Related/Up Next) -- Core: ANN retrieval from source item, user preference re-ranking
- **UC-07** (Notifications) -- Core: relationship-strength scoring, recency filtering
- **UC-09** (User Library) -- Partial: unseen/liked/saved filters enable history and library queries

### UAT Scenario

```
Given:
  A tidalDB instance with:
    - 10,000 items across 200 creators, with embeddings
    - 500 users with initial preference embeddings
    - Relationship edges: follows, blocks
    - Signals: view, like, skip, hide, completion, share
    - 500,000 historical signal events establishing user preferences
    - Profiles: for_you, following, related, notification

When:
  1. RETRIEVE items FOR USER @user_42 USING PROFILE for_you
     FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50
  2. RETRIEVE items FOR USER @user_42 FILTER relationship:follows
     USING PROFILE following LIMIT 50
  3. RETRIEVE items SIMILAR TO @item_abc FOR USER @user_42
     USING PROFILE related FILTER unseen LIMIT 10
  4. SIGNAL like item:@item_xyz user:@user_42
  5. Re-execute the for_you query
  6. SIGNAL hide item:@item_999 user:@user_42
  7. SIGNAL block user:@user_42 target_creator:@creator_77
  8. Re-execute the for_you query

Then:
  - Step 1: Results personalized -- items matching user_42's preference vector
    rank higher; items from blocked creators excluded; items already seen excluded;
    max 2 per creator; 10% exploration budget (items from unfollowed creators)
  - Step 2: Only items from followed creators, chronological order
  - Step 3: Items semantically similar to @item_abc, re-ranked by user_42's
    preference match, already-seen excluded
  - Step 4: Signal write atomically updates: item like count, user->creator
    interaction weight, user preference vector shifted toward item embedding
  - Step 5: Results shift -- items similar to @item_xyz's topic rank higher;
    creator of @item_xyz appears more frequently
  - Step 6: @item_999 never appears in any future query for user_42
  - Step 7: All items by creator_77 excluded from all queries for user_42
  - Step 8: No items from creator_77; no item_999; shift from like reflected
```

### Phases

#### Phase 1: User and Creator Entities with Relationships (m3p1)

**Delivers:** User and creator entity types stored in their own fjall keyspaces (`EntityKind::User`, `EntityKind::Creator`) with preference embeddings, metadata, and a relationship graph. Relationship edges are `(from_entity, to_entity, type, weight, timestamp)` stored under the `Tag::Rel` key prefix. Three user-state bitmap indexes (`FollowsBitmap`, `UserSeenBitmap`, `UserBlockedSet`) power the `unseen`, `unblocked`, and `relationship:follows` filters.

**Acceptance Criteria:**

- [x] `db.write_user(user_id, metadata, Option<embedding>)` stores user entity in the users keyspace
- [x] `db.write_creator(creator_id, metadata, Option<embedding>)` stores creator entity in the creators keyspace
- [x] `db.write_relationship(from, to, rel_type, weight, timestamp)` stores a directional weighted edge
- [x] `db.read_relationship(from, to, rel_type)` returns `Option<RelationshipEdge>`
- [x] `db.list_relationships(from, rel_type)` returns all edges of a type from a source entity
- [x] Relationship types supported: `follows`, `blocks`, `interaction_weight`, `hide`, `mute`
- [x] Key encoding: `[from_entity_id][0x00][REL][type_byte][to_entity_id]` for direct point lookup of a single edge and prefix scan by (from, type)
- [x] `FollowsBitmap::for_user(user_id)` returns a `RoaringBitmap` of item IDs from all followed creators
- [x] `UserSeenBitmap::for_user(user_id)` returns a `RoaringBitmap` of item IDs the user has viewed
- [x] `UserBlockedSet::for_user(user_id)` returns blocked creator IDs + hidden item IDs
- [x] Relationship write/read latency < 50 microseconds (benchmarked)
- [x] User and creator entities persist across shutdown and restart
- [x] Relationships persist across shutdown and restart via storage engine

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | User + Creator Entity Types and Storage | `UserEntity`, `CreatorEntity`, write/read APIs, metadata codec, embedding slots | M |
| 02 | Relationship Graph | `RelationshipEdge`, `RelationshipType`, storage codec, CRUD operations, prefix scan | L |
| 03 | User-State Bitmap Indexes | `FollowsBitmap`, `UserSeenBitmap`, `UserBlockedSet`, bitmap maintenance hooks | M |

**Depends On:** m1p1 (types), m1p3 (storage engine, key encoding, `Tag::Rel`), m1p5 (entity write API pattern), m2p1 (vector index for embedding storage), m2p2 (bitmap indexes, `FilterExpr`, `FilterResult`)
**Complexity:** L (3 sequential tasks: 01 -> 02 -> 03)
**Research Reference:** `docs/research/tidaldb_signal_ledger.md` (three-tier storage, subject-prefix keys), `docs/research/ann_for_tidaldb.md` (user preference vector in embedding slot), `thoughts.md` Part V.12 (subject-prefix keys), Part V.16 (user preference vector)
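
The relationship key layout from the acceptance criteria can be illustrated as follows. Several details are assumptions made for the sketch: ids are `u64` encoded big-endian so that byte order matches numeric order, `TAG_REL` and the `RelType` discriminants are invented byte values, and a `BTreeMap` stands in for the fjall keyspace.

```rust
use std::collections::BTreeMap;

const SEP: u8 = 0x00;
const TAG_REL: u8 = 0x03; // hypothetical byte value for the REL tag

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum RelType {
    Follows = 1,
    Blocks = 2,
    InteractionWeight = 3,
    Hide = 4,
    Mute = 5,
}

/// `[from_entity_id][0x00][REL][type_byte]`: shared by all edges of one type from one source.
pub fn rel_prefix(from: u64, ty: RelType) -> Vec<u8> {
    let mut k = Vec::with_capacity(19);
    k.extend_from_slice(&from.to_be_bytes());
    k.push(SEP);
    k.push(TAG_REL);
    k.push(ty as u8);
    k
}

/// Full key: prefix followed by `[to_entity_id]`.
pub fn rel_key(from: u64, ty: RelType, to: u64) -> Vec<u8> {
    let mut k = rel_prefix(from, ty);
    k.extend_from_slice(&to.to_be_bytes());
    k
}

/// Prefix scan: every target id reachable from `from` via edges of type `ty`.
pub fn list_targets(store: &BTreeMap<Vec<u8>, f32>, from: u64, ty: RelType) -> Vec<u64> {
    let prefix = rel_prefix(from, ty);
    store
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, _)| {
            let mut to = [0u8; 8];
            to.copy_from_slice(&k[prefix.len()..]);
            u64::from_be_bytes(to)
        })
        .collect()
}
```

Because every component is fixed-width, a prefix for one `(from, type)` pair can never match keys that belong to another pair, so the scan stops at the first key that does not start with the prefix.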

#### Phase 2: Feedback Loop -- Signal Writes Update User State (m3p2)

**Delivers:** Atomic multi-state updates on signal write. When a signal event is written (view, like, skip, hide, block, completion, share), the database atomically updates: the item's signal ledger, the user's preference vector (EMA), the user-to-creator interaction weight, and the user-state bitmap indexes. Four components: (1) user preference vector EMA update with configurable learning rate, (2) interaction weight ledger using the existing decay infrastructure from m1p4, (3) hard negative storage with WAL-backed durability, and (4) an atomic signal dispatch that wires all state updates into a single transactional signal write.

**Acceptance Criteria:**

- [x] `db.signal("view", item_id, 1.0, ts)` with user context atomically: updates item signal ledger, marks item as seen in `UserSeenBitmap`, increments user->creator interaction weight
- [x] `db.signal("like", item_id, 1.0, ts)` with user context atomically: updates item signal ledger, shifts user preference vector toward item embedding (EMA), increments user->creator interaction weight
- [x] `db.signal("skip", item_id, 1.0, ts)` with user context atomically: updates item signal ledger, shifts user preference vector away from item embedding, decays user->creator interaction weight
- [x] `db.signal("hide", item_id, 1.0, ts)` with user context atomically: writes permanent hide edge, adds item to `UserBlockedSet.hidden_items`, excludes from all future queries for this user
- [x] `db.signal("block", user_id, creator_id, ...)` atomically: writes permanent block edge, adds creator to `UserBlockedSet.blocked_creators`, excludes all creator items from all future queries
- [x] Preference vector EMA: `pref_new = normalize(alpha * item_embedding + (1 - alpha) * pref_old)` with configurable alpha (default 0.1)
- [x] Interaction weights use the same `DecayModel::Exponential` infrastructure from m1p4
- [x] Hard negatives (hide/block) are WAL-backed and survive crash + replay
- [x] Property test: for any sequence of hide/block/signal events, a RETRIEVE query NEVER returns a hidden item or blocked creator's items
- [x] All updates visible to the next query (no eventual consistency lag within the process)
- [x] Signal dispatch overhead < 50 microseconds beyond the base item signal write

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | User Preference Vector | EMA update, normalization, learning rate config, cold-start initialization, storage codec | L |
| 02 | Interaction Weight Ledger | User-to-creator weights using decay infrastructure, update on engagement signals, read API | M |
| 03 | Hard Negatives | Hide/block permanent storage, WAL-backed durability, crash-safe replay, bitmap integration | L |
| 04 | Atomic Signal Dispatch | `UserSignalContext` wiring, multi-target dispatch, property tests for correctness invariants | L |

**Depends On:** m3p1 (user/creator entities, relationship graph, user-state bitmaps), m1p4 (signal ledger, decay infrastructure), m1p5 (signal write API), m2p1 (vector index for embedding reads)
**Complexity:** XL (4 tasks; Tasks 01 and 03 can parallelize; Task 04 depends on all three)
**Research Reference:** `docs/research/tidaldb_signal_ledger.md` (three-tier storage, signal dispatch), `docs/research/ann_for_tidaldb.md` (user preference vector management), `thoughts.md` Part V.16 (user preference vector as database-managed embedding)
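
The preference-vector rule `pref_new = normalize(alpha * item_embedding + (1 - alpha) * pref_old)` is small enough to show in full. The function below is a sketch with a hypothetical name and plain `f32` slices. It covers only the positive (like) direction, since the criteria do not spell out the exact update used for skips.

```rust
/// One EMA step toward `item`, followed by re-normalization to unit length.
pub fn ema_update(pref: &[f32], item: &[f32], alpha: f32) -> Vec<f32> {
    assert_eq!(pref.len(), item.len(), "embedding dimensions must match");
    let mut next: Vec<f32> = pref
        .iter()
        .zip(item)
        .map(|(p, e)| alpha * e + (1.0 - alpha) * p)
        .collect();
    let norm = next.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut next {
            *x /= norm;
        }
    }
    next
}

/// Dot product; equals cosine similarity when both inputs are unit vectors.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

With the default `alpha = 0.1`, a user whose preference is `[1, 0]` and who likes an item at `[0, 1]` moves to roughly `[0.994, 0.110]`. Repeated likes keep rotating the vector toward the item while it stays on the unit sphere.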

#### Phase 3: Personalized Ranking Profiles (m3p3)

**Delivers:** Four personalized ranking profiles (`for_you`, `following`, `related`, `notification`) that incorporate user context into scoring, plus cold-start handling for new users and new items. The `FOR USER @user_id` clause is parsed and resolved into a `UserContext` that loads the user's preference vector, interaction weights, followed creators, and blocked state. The `SIMILAR TO @item_id` clause is parsed for the `related` profile. The profile executor uses this context to score candidates with personalization factors.

**Acceptance Criteria:**

- [x] `FOR USER @user_id` clause parsed by the query parser and resolved into `UserContext`
- [x] `SIMILAR TO @item_id` clause parsed for related-content retrieval
- [x] `UserContext` loaded from `UserStateIndex`, `InteractionWeightLedger`, preference vector
- [x] `for_you` profile: ANN retrieval using user preference vector, scoring = preference_match * engagement_velocity * recency_decay * social_proof, gates on completion_rate, penalizes skip count, 10% exploration budget
- [x] `following` profile: candidates restricted to followed creators' items (via `FollowsBitmap`), sorted by `created_at` DESC
- [x] `related` profile: ANN retrieval using source item embedding, re-ranked by user preference match, seen items excluded
- [x] `notification` profile: candidates from followed creators' recent items, scored by relationship_strength * item_quality
- [x] Cold-start users (no preference vector): fall back to population-level signals (trending/quality)
- [x] Cold-start items (no signals): exploration window -- appear in ~2% of for_you feeds
- [x] Exploration budget: ~5 of 50 for_you results from unfollowed creators to prevent filter bubbles
- [x] `ProfileExecutor` extended with `score_with_user_context()` method
- [x] `for_you`, `following`, `related`, `notification` added to `ProfileRegistry` as builtins

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | FOR USER Query Context | `UserContext` loader, query parser extensions for `FOR USER` and `SIMILAR TO`, planner integration | M |
| 02 | Personalized Profiles | `for_you`, `following`, `related`, `notification` profile implementations, executor extensions | L |
| 03 | Cold Start and Exploration | Cold-start user fallback, cold-start item injection, exploration budget enforcement | M |

**Depends On:** m3p2 (feedback loop: preference vectors populated, interaction weights updated, user-state bitmaps maintained), m2p3 (ranking profile engine, `ProfileExecutor`), m2p5 (query parser, RETRIEVE executor), m2p1 (vector index for ANN retrieval with user preference vector)
**Complexity:** L (3 sequential tasks: 01 -> 02 -> 03)
**Research Reference:** `docs/research/ann_for_tidaldb.md` (ANN retrieval with user preference vector as query), `VISION.md` (ranking profiles, personalization factors, cold start), `USE_CASES.md` (UC-01 For You, UC-04 Following, UC-05 Related, UC-07 Notifications)
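
One way to enforce the exploration budget is to reserve slots for the best-ranked candidates from unfollowed creators before filling the rest in rank order. The sketch below takes that approach. It is an assumption about the mechanism, with hypothetical names (`Candidate`, `with_exploration`), and it treats the budget as a minimum rather than an exact share.

```rust
use std::collections::{BTreeSet, HashSet};

#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub id: u64,
    pub creator: u64,
}

/// `ranked` is sorted best-first. Returns up to `limit` items, of which at least
/// `round(limit * fraction)` come from creators outside `followed` when available.
pub fn with_exploration(
    ranked: &[Candidate],
    followed: &HashSet<u64>,
    limit: usize,
    fraction: f64,
) -> Vec<Candidate> {
    let budget = ((limit as f64) * fraction).round() as usize;

    // Reserve slots for the best-ranked candidates from unfollowed creators.
    let mut chosen: BTreeSet<usize> = ranked
        .iter()
        .enumerate()
        .filter(|(_, c)| !followed.contains(&c.creator))
        .map(|(i, _)| i)
        .take(budget.min(limit))
        .collect();

    // Fill the remaining slots in rank order.
    for i in 0..ranked.len() {
        if chosen.len() >= limit {
            break;
        }
        chosen.insert(i);
    }

    // Emit in original rank order so relative scores are preserved.
    chosen.into_iter().map(|i| ranked[i].clone()).collect()
}
```

For `limit = 50` and `fraction = 0.1` this reserves 5 slots, which matches the "~5 of 50" figure in the criteria.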

#### Phase 4: User State Filters + M3 UAT Integration Test (m3p4)

**Delivers:** Composable user-state filters (`unseen`, `unblocked`, `saved`, `liked`, `in_progress`) integrated with the existing `FilterExpr`/`FilterResult` system from m2p2, plus the end-to-end M3 UAT integration test that proves the full "For You" query works. User-state filters require the `FOR USER` clause (from m3p3) to resolve user context and are evaluated alongside metadata filters during the RETRIEVE pipeline.

**Acceptance Criteria:**

- [x] `FILTER unseen` excludes items the user has viewed (via `UserSeenBitmap`)
- [x] `FILTER unblocked` excludes items from blocked creators and hidden items (via `UserBlockedSet`)
- [x] `FILTER saved` returns only items the user has saved
- [x] `FILTER liked` returns only items the user has liked
- [x] `FILTER in_progress` returns items with partial completion signal (0.0 < completion < 0.8)
- [x] User-state filters compose with metadata filters: `FILTER unseen, category:jazz, format:video`
- [x] User-state filters require the `FOR USER` clause; using one without it returns a `LumenError::Query` error with a helpful message
- [x] `FilterExpr` extended with `Unseen`, `Unblocked`, `Saved`, `Liked`, `InProgress` variants
- [x] Filter evaluation produces `FilterResult::Predicate` for user-state filters (not bitmap)
- [x] The RETRIEVE executor intersects user-state predicates with metadata filter bitmaps
- [x] Full M3 UAT integration test passes (all 8 UAT scenario steps verified)

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | User State Filters | `Unseen`, `Unblocked`, `Saved`, `Liked`, `InProgress` filter variants, parser recognition, executor integration | M |
| 02 | M3 UAT Integration Test | End-to-end integration test covering all 8 UAT scenario steps, property tests for hard-negative invariants | L |

**Depends On:** m3p3 (personalized profiles, `FOR USER` query context parsing, cold-start handling), m3p2 (feedback loop: seen bitmaps populated, hard negatives enforced), m3p1 (user-state index, relationships), m2p2 (filter engine, `FilterExpr`, `FilterResult`)
**Complexity:** M (2 sequential tasks: 01 -> 02)
**Research Reference:** `VISION.md` (user-state filters as first-class query primitives), `USE_CASES.md` (Appendix A: user state filters), `API.md` (FILTER clause syntax)
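
The interaction between metadata bitmaps and user-state predicates, including the `FOR USER` requirement, can be sketched as below. All names are hypothetical: `QueryError` stands in for the query error variant, `UserState` is a simplified view of the bitmap indexes, only `Unseen` and `Unblocked` are modeled, and `creator_of` is an assumed lookup from item id to creator id.

```rust
use std::collections::BTreeSet;

pub enum UserFilter {
    Unseen,
    Unblocked,
}

#[derive(Debug, PartialEq)]
pub struct QueryError(pub String);

/// Simplified per-user state (stand-in for the seen bitmap and blocked set).
pub struct UserState {
    pub seen: BTreeSet<u64>,
    pub hidden_items: BTreeSet<u64>,
    pub blocked_creators: BTreeSet<u64>,
}

/// Apply user-state predicates to the ids that already passed metadata filters.
pub fn apply_user_filters(
    metadata_matches: &BTreeSet<u64>,
    filters: &[UserFilter],
    user: Option<&UserState>,
    creator_of: impl Fn(u64) -> u64,
) -> Result<Vec<u64>, QueryError> {
    if filters.is_empty() {
        return Ok(metadata_matches.iter().copied().collect());
    }
    // User-state filters are meaningless without a resolved user context.
    let user = user.ok_or_else(|| {
        QueryError("user-state filters require a FOR USER clause".to_string())
    })?;

    // Every requested filter must accept the item.
    let keep = |id: u64| {
        filters.iter().all(|f| match f {
            UserFilter::Unseen => !user.seen.contains(&id),
            UserFilter::Unblocked => {
                !user.hidden_items.contains(&id)
                    && !user.blocked_creators.contains(&creator_of(id))
            }
        })
    };
    Ok(metadata_matches.iter().copied().filter(|&id| keep(id)).collect())
}
```

Evaluating the predicate only over ids that survived the metadata bitmap keeps the per-user work proportional to the filtered candidate set rather than the whole corpus.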

### Phase Dependency DAG

```
m3p1 (Users/Creators/Relationships)
    |
    v
m3p2 (Feedback Loop)      [Tasks 01 & 03 parallel within phase]
    |
    v
m3p3 (Personalized Profiles)
    |
    v
m3p4 (User State Filters + UAT)
```

All four phases are strictly sequential. m3p2 cannot begin without the entity and relationship foundation from m3p1. m3p3 cannot begin without the preference vectors and interaction weights from m3p2. m3p4 cannot begin without the `FOR USER` clause parsing and profile execution from m3p3.

Within m3p2, Tasks 01 (User Preference Vector) and 03 (Hard Negatives) can be built in parallel. Task 04 (Atomic Signal Dispatch) depends on all three preceding tasks.

### Deferred to Later Milestones

- **SEARCH query with personalization** -- deferred to M5; M3 proves personalized RETRIEVE works. Adding text search on top of a proven personalization layer is the correct sequence.
- **Tantivy integration** -- deferred to M5; M3 uses ANN retrieval only. Full-text search requires the hybrid fusion layer (RRF) which belongs in M5.
- **People/creator search (UC-10)** -- deferred to M5; requires Tantivy indexing of creator entities and "creators like X" similarity search.
- **Social graph traversal for trending ("trending among my follows")** -- deferred to M6; requires graph query capabilities beyond the simple follows filter delivered in m3p1. M3 uses population-level signals as a proxy for social proof.
- **Collaborative filtering** -- deferred to M6; M3's `related` profile uses ANN similarity + user preference re-ranking. Full matrix-factorization-style CF (co-engagement signals, "users who liked X also liked Y") adds a new data structure and compute model.
- **User-created collections/boards (UC-09.4)** -- deferred to M6; collections are a new entity type with their own ranking surface. M3 delivers the simpler user-state filters (saved, liked, in_progress).
- **Live content status tracking (UC-12)** -- deferred to M6; requires real-time viewer count signals and schedule awareness.
- **Notification frequency capping** -- deferred to M6; M3's `notification` profile ranks by recency * relationship_strength without per-creator or per-user caps.
- **Adaptive preference learning rate** -- deferred to M6; M3 uses constant alpha (0.1). Adaptive alpha that decays with update count is a refinement that requires tracking per-user update history.
- **Reverse relationship index (creator -> followers)** -- deferred to M6; M3 only needs forward traversal (user -> creators they follow). Reverse traversal enables social graph queries.

### Integration Test

```rust
#[test]
fn milestone_3_uat() {
    let db = open_with_users_and_relationships();

    // User 42 likes jazz, follows creators 1-10, blocked creator 77
    let feed = db.retrieve(
        "RETRIEVE items FOR USER @42 USING PROFILE for_you \
         FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50"
    ).unwrap();
    assert_eq!(feed.len(), 50);
    assert!(feed.iter().all(|r| !user_42_seen.contains(&r.id)));
    assert!(feed.iter().all(|r| r.creator_id != CreatorId(77)));
    assert!(creator_counts(&feed).values().all(|&c| c <= 2));

    // Following feed -- only followed creators, chronological
    let following = db.retrieve(
        "RETRIEVE items FOR USER @42 FILTER relationship:follows \
         USING PROFILE following LIMIT 50"
    ).unwrap();
    assert!(following.iter().all(|r| followed_creators.contains(&r.creator_id)));
    assert!(following.windows(2).all(|w| w[0].created_at >= w[1].created_at));

    // Related content -- similar to item_abc, personalized
    let related = db.retrieve(
        "RETRIEVE items SIMILAR TO @item_abc FOR USER @42 \
         USING PROFILE related FILTER unseen LIMIT 10"
    ).unwrap();
    assert!(related.iter().all(|r| !user_42_seen.contains(&r.id)));

    // Like an item, verify preference shift
    db.signal("like", EntityId(500), UserId(42), 1.0, now()).unwrap();
    let feed2 = db.retrieve(same_for_you_query()).unwrap();
    // Items topically similar to item 500 should rank higher
    let topic_500 = db.read_item(EntityId(500)).unwrap().category;
    let topic_match_before = feed.iter().filter(|r| r.category == topic_500).count();
    let topic_match_after = feed2.iter().filter(|r| r.category == topic_500).count();
    assert!(topic_match_after >= topic_match_before);

    // Hide and block, verify exclusion
    db.signal("hide", EntityId(999), UserId(42), 1.0, now()).unwrap();
    db.signal("block", UserId(42), CreatorId(77), 1.0, now()).unwrap();
    let feed3 = db.retrieve(same_for_you_query()).unwrap();
    assert!(feed3.iter().all(|r| r.id != EntityId(999)));
    assert!(feed3.iter().all(|r| r.creator_id != CreatorId(77)));

    // Verify cold-start user gets population-level results
    let cold_feed = db.retrieve(
        "RETRIEVE items FOR USER @new_user USING PROFILE for_you \
         FILTER unseen, unblocked LIMIT 50"
    ).unwrap();
    assert_eq!(cold_feed.len(), 50); // falls back to trending/quality

    // Verify hard negatives survive shutdown and reopen
    db.shutdown().unwrap();
    let db2 = TidalDb::reopen(same_config()).unwrap();
    let feed4 = db2.retrieve(same_for_you_query_user_42()).unwrap();
    assert!(feed4.iter().all(|r| r.id != EntityId(999)));
    assert!(feed4.iter().all(|r| r.creator_id != CreatorId(77)));
}
```
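
The test above calls a `creator_counts` helper that this section does not define. A minimal sketch of what it might look like, together with a greedy `max_per_creator` cap, is shown below. The `Ranked` struct and the `cap_per_creator` function are hypothetical stand-ins for illustration, not tidalDB's actual result type or diversity implementation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one ranked result row.
#[derive(Debug, Clone, PartialEq)]
pub struct Ranked {
    pub id: u64,
    pub creator_id: u64,
    pub score: f64,
}

/// Count how many results each creator contributes to a feed.
pub fn creator_counts(feed: &[Ranked]) -> HashMap<u64, usize> {
    let mut counts = HashMap::new();
    for r in feed {
        *counts.entry(r.creator_id).or_insert(0) += 1;
    }
    counts
}

/// Greedy diversity pass (assumed approach): walk candidates in the order
/// given (expected to be sorted by descending score) and keep an item only
/// while its creator is still under the cap.
pub fn cap_per_creator(
    candidates: &[Ranked],
    max_per_creator: usize,
    limit: usize,
) -> Vec<Ranked> {
    let mut counts: HashMap<u64, usize> = HashMap::new();
    let mut out = Vec::new();
    for c in candidates {
        if out.len() == limit {
            break;
        }
        let n = counts.entry(c.creator_id).or_insert(0);
        if *n < max_per_creator {
            *n += 1;
            out.push(c.clone());
        }
    }
    out
}
```

A greedy pass keeps the highest-scored items per creator and backfills from lower-ranked creators, which is one simple way to satisfy the `creator_counts(&feed).values().all(|&c| c <= 2)` assertion.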

### Done When

The full "For You" query works: `RETRIEVE items FOR USER @user_id USING PROFILE for_you FILTER unseen, unblocked DIVERSITY max_per_creator:2 LIMIT 50` returns personalized, diversity-constrained results that reflect the user's engagement history, exclude hidden items and blocked creators, include an exploration budget, handle cold-start users and items, and update in response to new signal events within 100ms. The `following`, `related`, and `notification` profiles also work correctly. Hard negatives survive crash and restart. All 8 UAT scenario steps pass.

---

## Milestone 4: Agent Memory -- "Agents own the personalization substrate"

### Milestone Thesis

M3 proved the feedback loop closes inside the database for direct user interactions. M4 proves that agents -- the dominant interaction mediator -- can create scoped sessions, write structured feedback signals with aggressive decay, enforce declarative policy on the write path, and query live session context as part of ranking, all within the same embeddable runtime. A developer wiring an LLM agent to tidalDB gets instant-on session memory without standing up Redis, a feature store, or a policy middleware.

### Enables

- **Agent-mediated personalization** -- agents ground LLM responses by reading the user's preference state plus the session's accumulated reward and preference hints, then write structured feedback that immediately shapes the next ranking pass.
- **RLHF-style reward loops** -- reward signals with minute-scale decay let agents record how well a recommendation served the user; the next RETRIEVE incorporates reward velocity into scoring.
- **Conversational memory** -- multi-turn tool usage and preference hints are short-lived signals scoped to a session; they influence ranking for the session's lifetime and are archived on close.
- **Policy-safe agent integration** -- the schema declares which signal types an agent may write per session; the database enforces this, not application middleware. Disallowed writes are rejected with audit trail.
- **Partial UC-01 enhancement** -- "For You" queries that incorporate session context (e.g., "more jazz today") produce results shaped by both long-lived user preferences and ephemeral session preferences.

### UAT Scenario

```
Given:
  A tidalDB instance with:
    - Schema defining session signal types:
      * "preference_hint" with linear decay (lifetime=30m), target=Item
      * "reward" with exponential decay (half_life=10m), windows=[5m, 15m], velocity=true
      * "tool_use" with linear decay (lifetime=1h), target=Item
    - An AgentPolicy "planner_policy" in schema:
        allowed_signals: [preference_hint, reward]
        denied_signals: [tool_use]
        max_session_duration: 2h
        max_signals_per_session: 1000
    - 100 items with embeddings and metadata (category, format, creator_id)
    - 10,000 signal events establishing item signal state
    - User @42 with preference vector and engagement history
    - Profiles: for_you (updated to accept optional SessionContext)

When:
  1. Agent starts a session:
     let session = db.start_session(user_id: 42, agent_id: "planner",
         policy: "planner_policy", metadata: {"tool": "planner"})?;
     // Returns SessionHandle with SessionId

  2. Agent writes a preference_hint signal:
     db.session_signal(&session, "preference_hint", EntityId(0), 1.0,
         Timestamp::now(), Some("more jazz today".into()))?;
     // Accepted: preference_hint is in allowed_signals

  3. Agent writes a reward signal after delivering an answer:
     db.session_signal(&session, "reward", EntityId(42), 0.8,
         Timestamp::now(), None)?;
     // Accepted: reward is in allowed_signals

  4. Agent queries with session context:
     let query = RetrieveBuilder::new(EntityKind::Item, ProfileRef::new("for_you"))
         .for_user(42)
         .for_session(session.id())
         .limit(10)
         .build()?;
     let results = db.retrieve(&query)?;
     // Returns ranked items with session_snapshot attached

  5. Agent reads session snapshot:
     let snapshot = db.session_snapshot(session.id())?;
     // Returns: signal counts, reward velocity, duration, metadata

  6. Agent attempts a disallowed write:
     let err = db.session_signal(&session, "tool_use", EntityId(0), 1.0,
         Timestamp::now(), None);
     // Returns Err(LumenError::PolicyViolation { signal_type: "tool_use",
     //   policy_name: "planner_policy", reason: "signal type not in allowed list" })

  7. Agent reads audit log:
     let audit = db.session_audit(session.id())?;
     // Contains: 2 accepted writes, 1 rejected write with reason

  8. Agent closes the session:
     let summary = db.close_session(session)?;
     // Returns SessionSummary: duration, signal_counts, rejections, archived

  9. After closure, query the archived snapshot:
     let archived = db.session_snapshot(session_id)?;
     // Returns the frozen final snapshot (signals no longer decay)

  10. Verify session isolation -- a second session for the same user
      does not see session 1's signals:
      let session2 = db.start_session(user_id: 42, agent_id: "planner",
          policy: "planner_policy", metadata: {})?;
      let snap2 = db.session_snapshot(session2.id())?;
      // snap2 has zero signals -- session 1's data does not leak

Then:
  - Step 1: start_session returns SessionHandle; session appears in db.active_sessions()
  - Step 2: preference_hint recorded; session signal count = 1
  - Step 3: reward recorded; session signal count = 2; reward velocity > 0
  - Step 4: Results shaped by session context -- items matching "jazz" preference
    hint rank higher than without session context; results include a
    session_snapshot field with reward_velocity and hint summary
  - Step 5: Snapshot contains { signals_written: 2, signals_rejected: 0,
    reward_velocity_5m: >0.0, duration_ms: <5000, metadata: {"tool":"planner"} }
  - Step 6: Error returned with LumenError::PolicyViolation; write not persisted
  - Step 7: Audit log has 3 entries (2 accepted, 1 rejected with reason)
  - Step 8: Session marked closed; summary.duration_ms > 0; summary.signals_written == 2;
    summary.rejections == 1
  - Step 9: Archived snapshot readable; signal values frozen at close time (no further decay)
  - Step 10: Session isolation proven -- zero signal leakage between sessions
  - Performance: session_signal write < 200 microseconds (including WAL + policy check);
    session_snapshot read < 50 microseconds; RETRIEVE with session context adds < 5ms
    overhead vs without
```

### Phases

#### Phase 1: Session Schema and Lifecycle (m4p1)

**Delivers:** `SessionId`, `AgentId`, `AgentPolicy`, and `SessionHandle` types in the schema and entities modules. Schema-level `session_policy()` for declaring per-agent allowed/denied signal lists, duration limits, and signal count caps. Session lifecycle APIs: `start_session`, `close_session`, `active_sessions`. WAL entries tagged with `session_id` for crash recovery of active sessions. Closed sessions archived to storage as frozen snapshots.

**Acceptance Criteria:**

- [x] `SessionId` is a u64 newtype with `Display`, `Hash`, `Eq`, `Ord`, monotonically assigned via `AtomicU64` counter
- [x] `AgentId` is a `String` newtype (max 64 chars, validated at construction: `[a-z0-9_-]+`)
- [x] `AgentPolicy` struct declared in schema: `allowed_signals: Vec<String>`, `denied_signals: Vec<String>`, `max_session_duration: Duration`, `max_signals_per_session: u32`; validated at schema build time (no signal name in both allowed and denied; all signal names must exist in schema)
- [x] `SessionHandle` is a move-only type containing `SessionId`, `user_id: u64`, `agent_id: AgentId`, `policy_name: String`, start timestamp, and a `closed: AtomicBool` flag; `SessionHandle` is `Send + Sync`
- [x] `SchemaBuilder::session_policy(name, AgentPolicy)` registers policies at schema build time; duplicate names rejected with `SchemaError`
- [x] `db.start_session(user_id, agent_id, policy_name, metadata) -> Result<SessionHandle>` creates a new session: validates policy exists, assigns `SessionId`, stores session metadata in a `DashMap<SessionId, SessionState>`, logs session-start event to WAL
- [x] `db.close_session(handle) -> Result<SessionSummary>` takes ownership of `SessionHandle` (move semantics prevent use-after-close), freezes signal state, computes summary (duration, signal counts, rejection count), archives to storage under `Tag::Session` key prefix, logs session-close event to WAL
- [x] `db.active_sessions() -> Vec<SessionInfo>` returns list of open sessions with id, user_id, agent_id, start_time, signal_count
- [x] Session state survives crash: on WAL replay, session-start events without matching session-close events are restored as active sessions; session-close events mark sessions as archived
- [x] `SessionState` contains: `DashMap<SignalTypeId, SessionSignalState>` for per-signal-type accumulators within the session
- [x] Closed `SessionHandle` cannot be used for further writes (compile-time enforcement via move semantics; runtime check via `closed` flag as defense-in-depth)
- [x] Session metadata (`HashMap<String, String>`) persisted to storage and retrievable after close
- [x] `max_session_duration` enforced: `session_signal` on a session that has exceeded its duration returns `LumenError::SessionExpired`
- [x] `Tag::Session` (0x07) added to the key encoding enum for session archive storage
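
The type-level criteria above can be illustrated with a small sketch. It assumes a standalone `SessionIdAllocator` wrapper, an `AgentIdError` enum, and a free `close_session` function, all of which are hypothetical names; the real `SessionHandle` also carries the user id, policy name, start timestamp, and `closed` flag, which are omitted here.

```rust
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};

/// u64 newtype; ordering follows allocation order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct SessionId(u64);

impl fmt::Display for SessionId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Hypothetical allocator: hands out monotonically increasing ids.
pub struct SessionIdAllocator {
    next: AtomicU64,
}

impl SessionIdAllocator {
    pub fn new() -> Self {
        Self { next: AtomicU64::new(1) }
    }

    pub fn allocate(&self) -> SessionId {
        SessionId(self.next.fetch_add(1, Ordering::Relaxed))
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum AgentIdError {
    Empty,
    TooLong,
    InvalidChar(char),
}

/// String newtype validated against `[a-z0-9_-]+`, at most 64 chars.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct AgentId(String);

impl AgentId {
    pub fn new(raw: &str) -> Result<Self, AgentIdError> {
        if raw.is_empty() {
            return Err(AgentIdError::Empty);
        }
        if raw.chars().count() > 64 {
            return Err(AgentIdError::TooLong);
        }
        let allowed =
            |c: char| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-';
        if let Some(bad) = raw.chars().find(|c| !allowed(*c)) {
            return Err(AgentIdError::InvalidChar(bad));
        }
        Ok(Self(raw.to_owned()))
    }
}

/// Move-only handle: deliberately neither `Clone` nor `Copy`.
pub struct SessionHandle {
    id: SessionId,
    agent_id: AgentId,
}

impl SessionHandle {
    pub fn new(id: SessionId, agent_id: AgentId) -> Self {
        Self { id, agent_id }
    }

    pub fn id(&self) -> SessionId {
        self.id
    }

    pub fn agent_id(&self) -> &AgentId {
        &self.agent_id
    }
}

/// Taking the handle by value means the caller cannot use it afterwards.
pub fn close_session(handle: SessionHandle) -> SessionId {
    handle.id
}
```

Because `close_session` consumes the handle, a later `session_signal(&handle, ...)` call fails to compile, which is the compile-time half of the use-after-close guarantee described above.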

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Session Types | `SessionId`, `AgentId`, `AgentPolicy`, `SessionHandle`, `SessionState`, `SessionInfo`, `SessionSummary` types with validation, Display, Hash, Eq | M |
| 02 | Schema Integration | `SchemaBuilder::session_policy()`, validation (signal name cross-check, no duplicates), `Schema::policy(name) -> Option<&AgentPolicy>` | S |
| 03 | Session Lifecycle | `start_session`, `close_session`, `active_sessions`, `SessionState` DashMap, move-only handle, duration enforcement, archive to storage | L |
| 04 | WAL Session Events | `WalCommand::SessionStart` and `WalCommand::SessionClose` variants, WAL replay restores active sessions, closed sessions restored as archived | M |
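
The replay rule for session events (a start without a matching close stays active; a close archives the session) can be sketched as a fold over the log. The variant names `SessionStart` and `SessionClose` come from the task table; their payloads, the `Recovered` struct, and the `replay` function are assumptions made for this illustration.

```rust
use std::collections::BTreeSet;

/// Only the two session variants are modeled; payload shape is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WalCommand {
    SessionStart { session_id: u64 },
    SessionClose { session_id: u64 },
}

/// Hypothetical result of replaying the session portion of the WAL.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct Recovered {
    pub active: BTreeSet<u64>,
    pub archived: BTreeSet<u64>,
}

/// Replay events in log order. Sessions started but never closed remain
/// active; closed sessions move to the archived set.
pub fn replay(log: &[WalCommand]) -> Recovered {
    let mut state = Recovered::default();
    for cmd in log {
        match *cmd {
            WalCommand::SessionStart { session_id } => {
                state.active.insert(session_id);
            }
            WalCommand::SessionClose { session_id } => {
                // A close with no preceding start is ignored in this sketch.
                if state.active.remove(&session_id) {
                    state.archived.insert(session_id);
                }
            }
        }
    }
    state
}
```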

**Depends On:** m1p1 (type system), m1p2 (WAL), m1p3 (storage, key encoding, Tag enum), m3p1 (user entities)
**Complexity:** L
**Research Reference:** `VISION.md` (Sessions / Agent Context), `thoughts.md` Part V.5 (quarantine-first durability), `docs/research/tidaldb_signal_ledger.md` (running-score formula reuse for session signals)

#### Phase 2: Session Signal Engine (m4p2)

**Delivers:** `session_signal()` API that writes session-scoped signal events with aggressive decay. Session signals share the existing `SignalLedger` running-score infrastructure but are keyed by `(SessionId, SignalTypeId)` instead of `(EntityId, SignalTypeId)`. Preference hints are stored as typed annotations on session signal entries. Session-scoped windowed counts and velocity available via `session_snapshot()`.

**Acceptance Criteria:**

- [x] `db.session_signal(&SessionHandle, signal_type, entity_id, weight, timestamp, Option<annotation>) -> Result<()>` writes a session-scoped signal: validates session is open, updates `SessionSignalState` running decay score, updates session windowed counters, increments session signal count, logs to WAL with session_id tag
- [x] `SessionSignalState` uses the same `HotSignalState` running-score formula (`S(t) = S(t_prev) * exp(-lambda * dt) + w`) -- reuse, not rewrite
- [x] Session windowed counters use `BucketedCounter` with minute-level granularity (appropriate for session timescales of minutes to hours)
- [x] `db.session_snapshot(session_id) -> Result<SessionSnapshot>` returns: signal type -> (decay_score, windowed_counts, velocity), total signals written, total rejections, duration, metadata, annotations (preference hints)
- [x] Preference hint annotations stored as `Vec<(Timestamp, String)>` on the session state; capped at 100 per session to bound memory
- [x] Session signals do NOT update the global item signal ledger -- they are session-scoped only (isolation)
- [x] Session signals do NOT update user preference vectors or interaction weights -- session influence is read-time only (via Phase 4)
- [x] For active sessions, decay scores reflect current wall-clock time (lazy decay on read, same as `HotSignalState`)
- [x] For archived sessions, signal values are frozen at close time (no further decay applied on read)
- [x] WAL replay of session signals restores `SessionSignalState` accumulators correctly (property test: replay produces identical state to uninterrupted execution for 1000 random session signal sequences)
- [x] `session_signal` latency < 200 microseconds including WAL write (benchmarked)
- [x] `session_snapshot` read latency < 50 microseconds (benchmarked)
- [x] 50,000 session signals per second throughput (benchmarked)
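
As an illustration of the running-score criteria above, the sketch below applies `S(t) = S(t_prev) * exp(-lambda * dt) + w` on write, decays lazily on read, and stops decaying once frozen. It covers only the exponential case with `lambda = ln(2) / half_life`; the `RunningScore` type and its method names are hypothetical and are not the `HotSignalState` implementation.

```rust
/// Hypothetical exponential running score with lazy decay and freeze.
#[derive(Debug, Clone, Copy)]
pub struct RunningScore {
    score: f64,
    last_update_secs: f64,
    lambda: f64,
    frozen_at_secs: Option<f64>,
}

impl RunningScore {
    /// Derive the decay rate from a half-life: lambda = ln(2) / half_life.
    pub fn with_half_life(half_life_secs: f64) -> Self {
        Self {
            score: 0.0,
            last_update_secs: 0.0,
            lambda: std::f64::consts::LN_2 / half_life_secs,
            frozen_at_secs: None,
        }
    }

    /// Write path: decay the stored score to `now`, then add the weight.
    pub fn record(&mut self, weight: f64, now_secs: f64) {
        // Out-of-order timestamps are clamped so dt is never negative.
        let t = now_secs.max(self.last_update_secs);
        let dt = t - self.last_update_secs;
        self.score = self.score * (-self.lambda * dt).exp() + weight;
        self.last_update_secs = t;
    }

    /// Read path: decay to `now` without mutating state. A frozen score
    /// decays only up to the freeze time.
    pub fn read(&self, now_secs: f64) -> f64 {
        let t = self.frozen_at_secs.unwrap_or(now_secs);
        let dt = (t - self.last_update_secs).max(0.0);
        self.score * (-self.lambda * dt).exp()
    }

    /// Called when the session closes; later reads return the frozen value.
    pub fn freeze(&mut self, close_secs: f64) {
        self.frozen_at_secs = Some(close_secs);
    }
}
```

With a 10-minute half-life, a single reward of weight 1.0 reads as 0.5 ten minutes later; after `freeze`, the value no longer changes with the read time.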

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | SessionSignalState | Running decay score, windowed counters, annotation storage, freeze-on-close semantics, reusing `HotSignalState` internals | M |
| 02 | session_signal API | Write path: validation, WAL event, state update, signal count tracking, annotation capture | M |
| 03 | session_snapshot API | Read path: snapshot assembly, decay-correct reads for active sessions, frozen reads for archived, preference hint list | S |
| 04 | WAL Integration & Replay | `WalCommand::SessionSignal` variant, replay logic, property test for replay correctness | M |
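
To make the minute-granularity windowed counters concrete, here is a rough sketch of a ring of per-minute buckets. `MinuteCounter` is a hypothetical name and layout chosen for this example; the actual `BucketedCounter` is specified in the signal ledger research doc and may differ. Velocity is left out because this section does not define how it is derived.

```rust
/// Hypothetical ring of per-minute buckets.
pub struct MinuteCounter {
    counts: Vec<u32>,
    /// Absolute minute index currently stored in each slot.
    minutes: Vec<u64>,
}

impl MinuteCounter {
    pub fn new(capacity_minutes: usize) -> Self {
        assert!(capacity_minutes > 0, "capacity must be at least one minute");
        Self {
            counts: vec![0; capacity_minutes],
            // u64::MAX marks a slot that has never been written.
            minutes: vec![u64::MAX; capacity_minutes],
        }
    }

    /// Record one event at `ts_secs`, recycling a slot if it holds an
    /// older minute.
    pub fn record(&mut self, ts_secs: u64) {
        let minute = ts_secs / 60;
        let slot = (minute % self.counts.len() as u64) as usize;
        if self.minutes[slot] != minute {
            self.minutes[slot] = minute;
            self.counts[slot] = 0;
        }
        self.counts[slot] += 1;
    }

    /// Count events in the last `window_minutes` minutes, including the
    /// current (partial) minute.
    pub fn count(&self, now_secs: u64, window_minutes: u64) -> u32 {
        if window_minutes == 0 {
            return 0;
        }
        let now_min = now_secs / 60;
        let window = window_minutes.min(self.counts.len() as u64);
        let oldest = now_min.saturating_sub(window - 1);
        self.minutes
            .iter()
            .zip(&self.counts)
            .filter(|(m, _)| **m >= oldest && **m <= now_min)
            .map(|(_, c)| *c)
            .sum()
    }
}
```

Memory stays fixed at one slot per minute of capacity, which suits sessions that last minutes to hours.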

**Depends On:** m4p1 (session types, lifecycle, WAL events), m1p4 (signal ledger infrastructure -- `HotSignalState`, `BucketedCounter`)
**Complexity:** L
**Research Reference:** `docs/research/tidaldb_signal_ledger.md` (running-score formula, BucketedCounter, EntityState struct), `VISION.md` (session signals with aggressive decay)

#### Phase 3: Policy Enforcement and Audit (m4p3)

**Delivers:** Declarative policy enforcement on the session signal write path. Policies declared in schema (m4p1) are enforced at write time: signal type allow/deny lists, per-session signal count caps, and session duration limits. Every write attempt (accepted or rejected) is recorded in a per-session audit log. Rejected writes return structured `LumenError::PolicyViolation` errors with the policy name, signal type, and human-readable reason.

**Acceptance Criteria:**

- [x] `session_signal()` checks the session's policy before writing: if signal type is in `denied_signals`, or is not in `allowed_signals` (when `allowed_signals` is non-empty), write is rejected with `LumenError::PolicyViolation`
- [x] `LumenError::PolicyViolation` variant added: contains `signal_type: String`, `policy_name: String`, `reason: String`
- [x] Per-session signal count cap enforced: when `signals_written >= max_signals_per_session`, further writes return `LumenError::PolicyViolation` with reason "session signal limit exceeded (N/max)"
- [x] Session duration limit enforced: when `now - session_start > max_session_duration`, further writes return `LumenError::SessionExpired`
- [x] `SessionAuditLog` stored per session: `Vec<AuditEntry>` where `AuditEntry = { timestamp, signal_type, outcome: Accepted | Rejected(reason) }`
- [x] `db.session_audit(session_id) -> Result<Vec<AuditEntry>>` returns the audit log for a session (active or archived)
- [x] Audit log capped at 10,000 entries per session to bound memory; oldest entries evicted with a "truncated" marker
- [x] Audit log persisted with session archive on close (retrievable after close)
- [x] Policy evaluation adds < 1 microsecond per signal write (benchmarked -- it is a HashMap lookup, not a hot path concern)
- [x] Property test: for any sequence of allowed and denied signal writes, the audit log exactly matches the write outcomes and no denied signal modifies session state
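
The write-path checks listed above can be sketched as one pure function. This is an illustration under assumptions: `WriteError` stands in for the relevant `LumenError` variants, the free function `check` and its parameter list are hypothetical, and the order of the checks (duration, allow list, deny list, count cap) is a guess chosen so that the UAT's `tool_use` write yields the "not in allowed list" reason.

```rust
use std::time::Duration;

pub struct AgentPolicy {
    pub allowed_signals: Vec<String>,
    pub denied_signals: Vec<String>,
    pub max_session_duration: Duration,
    pub max_signals_per_session: u32,
}

/// Stand-in for the `LumenError` variants named in the criteria.
#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    PolicyViolation {
        signal_type: String,
        policy_name: String,
        reason: String,
    },
    SessionExpired,
}

/// Decide whether one session signal write is permitted.
pub fn check(
    policy_name: &str,
    policy: &AgentPolicy,
    signal_type: &str,
    signals_written: u32,
    elapsed: Duration,
) -> Result<(), WriteError> {
    if elapsed > policy.max_session_duration {
        return Err(WriteError::SessionExpired);
    }
    let violation = |reason: String| WriteError::PolicyViolation {
        signal_type: signal_type.to_owned(),
        policy_name: policy_name.to_owned(),
        reason,
    };
    // An empty allow list means "no allow-list restriction".
    if !policy.allowed_signals.is_empty()
        && !policy.allowed_signals.iter().any(|s| s == signal_type)
    {
        return Err(violation("signal type not in allowed list".to_owned()));
    }
    if policy.denied_signals.iter().any(|s| s == signal_type) {
        return Err(violation("signal type is in denied list".to_owned()));
    }
    if signals_written >= policy.max_signals_per_session {
        return Err(violation(format!(
            "session signal limit exceeded ({}/{})",
            signals_written, policy.max_signals_per_session
        )));
    }
    Ok(())
}
```

Keeping the evaluator free of side effects makes the property test straightforward: a rejected write never reaches the code that mutates session state.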

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Policy Evaluator | `PolicyEvaluator::check(policy, signal_type, session_state) -> Result<(), PolicyViolation>`, signal allow/deny, count cap, duration check | S |
| 02 | Audit Log | `SessionAuditLog`, `AuditEntry`, append-on-write, cap enforcement, persist with archive | S |
| 03 | Write Path Integration | Wire `PolicyEvaluator` into `session_signal()`, wire audit log recording, `db.session_audit()` API, property tests | M |

**Depends On:** m4p1 (session types, `AgentPolicy` in schema), m4p2 (session signal write path to intercept)
**Complexity:** M
**Research Reference:** `VISION.md` ("policy guards live in schema, not ad-hoc middleware", "agents can only read/write within their sessions")
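
The capped audit log from this phase can be pictured as a bounded queue that drops its oldest entry once full. In the sketch below, the `Outcome` enum name, the `timestamp_ms` field, and the boolean `truncated` marker are assumptions; the criteria only state that an `AuditEntry` holds a timestamp, a signal type, and an accepted-or-rejected outcome, and that eviction leaves a "truncated" marker.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    Accepted,
    Rejected(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AuditEntry {
    pub timestamp_ms: u64,
    pub signal_type: String,
    pub outcome: Outcome,
}

/// Bounded per-session audit log; oldest entries are evicted first.
pub struct SessionAuditLog {
    entries: VecDeque<AuditEntry>,
    cap: usize,
    truncated: bool,
}

impl SessionAuditLog {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "cap must be positive");
        Self { entries: VecDeque::new(), cap, truncated: false }
    }

    /// Append one write attempt, evicting the oldest entry when full.
    pub fn append(&mut self, entry: AuditEntry) {
        if self.entries.len() == self.cap {
            self.entries.pop_front();
            self.truncated = true;
        }
        self.entries.push_back(entry);
    }

    /// True once at least one entry has been evicted.
    pub fn is_truncated(&self) -> bool {
        self.truncated
    }

    /// Iterate entries from oldest to newest.
    pub fn entries(&self) -> impl Iterator<Item = &AuditEntry> {
        self.entries.iter()
    }
}
```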

#### Phase 4: Session-Aware Ranking and M4 UAT (m4p4)

**Delivers:** `FOR SESSION @session_id` clause in the RETRIEVE query that loads session context and blends it into ranking. Session preference hints boost items matching the hint content. Session reward velocity adjusts the scoring weight. Query results include a `session_snapshot` alongside ranked items. End-to-end M4 UAT integration test proving the full agent workflow: start session, write signals with policy, query with session context, verify session isolation, close and archive.

**Acceptance Criteria:**

- [x] `RetrieveBuilder::for_session(session_id)` added; `Retrieve` struct gains `for_session: Option<SessionId>` field
- [x] `SessionContext` struct loaded when `for_session` is present: contains preference hints (parsed into keyword boost hints), reward velocity, session metadata
- [x] `for_you` profile (and any personalized profile) accepts optional `SessionContext`; when present, the scoring formula adds a session boost term: `session_boost = hint_match_score * 0.3 + reward_velocity_normalized * 0.2`
- [x] `hint_match_score` computed as: for each preference hint string, extract keywords; if item metadata (category, tags, title) contains any keyword, score = 1.0 per match, normalized to [0, 1]; this is a simple keyword match (semantic session hints deferred to M5)
- [x] `reward_velocity_normalized` = `reward_velocity / (reward_velocity + 1.0)` -- saturating normalization that maps non-negative velocity to [0, 1)
- [x] Session boost is additive to the existing profile score (does not replace personalization, layers on top)
- [x] Results struct gains optional `session_snapshot: Option<SessionSnapshot>` field, populated when `for_session` is present
- [x] Session isolation: `FOR SESSION @S1` uses only S1's signals; S2's signals are invisible; no global state pollution
- [x] When `for_session` references a closed session, archived snapshot is used (read-only, no decay applied)
- [x] When `for_session` references a non-existent session, `LumenError::Query("session not found")` returned
- [x] RETRIEVE with session context adds < 5ms overhead vs without (benchmarked at 10K items)
- [x] Full M4 UAT integration test passes covering all 10 UAT scenario steps
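
The session boost arithmetic above can be worked through in a short sketch. Several details are assumptions made for illustration: the `ItemMeta` struct, tokenizing on non-alphanumeric characters, and normalizing `hint_match_score` as the fraction of hint keywords found in the item metadata (the criteria say only "normalized to [0, 1]").

```rust
/// Hypothetical view of the item metadata consulted by hint matching.
pub struct ItemMeta {
    pub category: String,
    pub tags: Vec<String>,
    pub title: String,
}

/// Lowercase alphanumeric tokens of a string.
fn tokens(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Assumed normalization: matched keywords / total keywords.
pub fn hint_match_score(hints: &[&str], item: &ItemMeta) -> f64 {
    let haystack = format!("{} {} {}", item.category, item.tags.join(" "), item.title);
    let words = tokens(&haystack);
    let keywords: Vec<String> = hints.iter().flat_map(|h| tokens(h)).collect();
    if keywords.is_empty() {
        return 0.0;
    }
    let matched = keywords.iter().filter(|k| words.contains(*k)).count();
    matched as f64 / keywords.len() as f64
}

/// v / (v + 1), with negative velocity clamped to zero; result is in [0, 1).
pub fn reward_velocity_normalized(reward_velocity: f64) -> f64 {
    let v = reward_velocity.max(0.0);
    v / (v + 1.0)
}

/// session_boost = hint_match_score * 0.3 + reward_velocity_normalized * 0.2
pub fn session_boost(hint_match: f64, reward_velocity: f64) -> f64 {
    hint_match * 0.3 + reward_velocity_normalized(reward_velocity) * 0.2
}

/// The boost is added to the profile score rather than replacing it.
pub fn score_with_session(profile_score: f64, boost: f64) -> f64 {
    profile_score + boost
}
```

Under these assumptions, the hint "more jazz today" gives a jazz item a hint score of 1/3 and a rock item 0, so with equal profile scores the jazz item ranks higher. The maximum possible boost approaches 0.5 as hint match reaches 1.0 and reward velocity grows without bound.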

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | FOR SESSION Query Context | `RetrieveBuilder::for_session()`, `Retrieve.for_session` field, `SessionContext` loader from active/archived session state | M |
| 02 | Session-Aware Scoring | `ProfileExecutor` extension: `score_with_session_context()`, keyword hint matching, reward velocity normalization, additive boost | M |
| 03 | Session Snapshot in Results | `Results.session_snapshot` field, populated by executor when session context present | S |
| 04 | M4 UAT Integration Test | End-to-end test covering all 10 UAT steps: lifecycle, signals, policy, ranking, isolation, archive, audit | L |

**Depends On:** m4p3 (policy enforcement wired into write path), m4p2 (session signals and snapshot readable), m4p1 (session lifecycle), m3p3 (personalized profile executor, `UserContext`), m2p5 (RETRIEVE executor pipeline)
**Complexity:** L
**Research Reference:** `VISION.md` ("agents ground themselves by reading live session context, write structured signals with decay budgets, and immediately query those updates on the next turn"), `USE_CASES.md` UC-01 (For You with session overlay)

### Phase Dependency DAG

```
m4p1 (Session Schema & Lifecycle)
    |
    v
m4p2 (Session Signal Engine)
    |
    v
m4p3 (Policy Enforcement & Audit)
    |
    v
m4p4 (Session-Aware Ranking + UAT)
```

All four phases are strictly sequential. m4p2 cannot begin without the session types and lifecycle from m4p1. m4p3 cannot begin without the session signal write path from m4p2. m4p4 cannot begin without policy enforcement from m4p3. Within each phase, certain tasks can parallelize (e.g., m4p2 tasks 01 and 03 overlap; m4p3 tasks 01 and 02 are independent).

### Deferred to Later Milestones

- **Session forking and merging** -- deferred because forking introduces DAG-shaped session graphs with merge conflict semantics; this belongs after M8 (Distributed Fabric) when the CRDT model can inform fork/merge design. Planned for M9/M10.
- **Multi-agent sessions** (multiple agents sharing one session) -- deferred because shared-session policy requires capability intersection and concurrent write arbitration; M4 proves single-agent sessions first. Planned for M10.
- **Cross-session aggregation** ("what did this user's agents learn across all sessions this week?") -- deferred because it requires a materialization layer rolling up closed sessions into user-level signal state. Planned for M6.
- **Semantic hint matching** (preference hints interpreted via embedding similarity) -- deferred because it requires Tantivy integration (M5) for proper text analysis; M4 uses simple keyword matching as a correct baseline. Planned for M5.
- **Session signal influence on global user preference vector** -- deferred because the correct boundary between ephemeral session boost and permanent preference update requires careful UX design; M4 keeps session influence strictly read-time. Planned for M6.
- **RLHF training data export** -- deferred because export formats and training pipelines are application-specific; tidalDB stores the signals, external tools read them. Planned for M7.
- **Per-agent QPS rate limiting** -- deferred because the per-session signal count cap provides coarse-grained protection; fine-grained QPS limiting with token-bucket belongs in M7 (Production Hardening). Planned for M7.
- **Session TTL auto-cleanup** (background sweeper for abandoned sessions) -- deferred; `max_session_duration` enforcement on writes is sufficient for M4. Planned for M7.
- **User revocation of agent-contributed signals** -- deferred because revocation requires retroactive signal removal with re-materialization, a core M10 (Governance & Agent Rights) concern. Planned for M10.

### Integration Test

```rust
#[test]
fn milestone_4_uat() {
    let mut schema_builder = SchemaBuilder::new();

    // Session signal types with aggressive decay.
    let _ = schema_builder
        .signal("preference_hint", EntityKind::Item,
            DecaySpec::Linear { lifetime: Duration::from_secs(30 * 60) })
        .windows(&[Window::OneHour])
        .velocity(false)
        .add();
    let _ = schema_builder
        .signal("reward", EntityKind::Item,
            DecaySpec::Exponential { half_life: Duration::from_secs(10 * 60) })
        .windows(&[Window::OneHour])
        .velocity(true)
        .add();
    let _ = schema_builder
        .signal("tool_use", EntityKind::Item,
            DecaySpec::Linear { lifetime: Duration::from_secs(3600) })
        .windows(&[Window::OneHour])
        .velocity(false)
        .add();

    // Standard signals for item ranking.
    for sig in &["view", "like", "skip"] {
        let _ = schema_builder
            .signal(sig, EntityKind::Item,
                DecaySpec::Exponential { half_life: Duration::from_secs(7 * 24 * 3600) })
            .windows(&[Window::OneHour, Window::TwentyFourHours])
            .velocity(true)
            .add();
    }

    // Policy: planner can write preference_hint and reward, not tool_use.
    schema_builder.session_policy("planner_policy", AgentPolicy {
        allowed_signals: vec!["preference_hint".into(), "reward".into()],
        denied_signals: vec!["tool_use".into()],
        max_session_duration: Duration::from_secs(2 * 3600),
        max_signals_per_session: 1000,
    }).unwrap();

    let schema = schema_builder.build().unwrap();
    let db = TidalDb::builder().ephemeral().with_schema(schema).open().unwrap();

    // Write items: some jazz, some rock.
    for i in 1..=50u64 {
        let mut meta = HashMap::new();
        let category = if i <= 25 { "jazz" } else { "rock" };
        meta.insert("category".into(), category.into());
        meta.insert("format".into(), "video".into());
        meta.insert("creator_id".into(), (i % 10).to_string());
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();
        db.signal("view", EntityId::new(i), 1.0, Timestamp::now()).unwrap();
    }

    // Write user 42 with preference history.
    let mut user_meta = HashMap::new();
    user_meta.insert("name".into(), "alice".into());
    db.write_user(EntityId::new(42), &user_meta).unwrap();

    // Step 1: Start session.
    let mut session_meta = HashMap::new();
    session_meta.insert("tool".into(), "planner".into());
    let session = db.start_session(42, "planner", "planner_policy", session_meta)
        .unwrap();
    let session_id = session.id();
    assert!(db.active_sessions().iter().any(|s| s.id == session_id));

    // Step 2: Write preference_hint.
    db.session_signal(&session, "preference_hint", EntityId::new(0), 1.0,
        Timestamp::now(), Some("more jazz today".into())).unwrap();

    // Step 3: Write reward.
    db.session_signal(&session, "reward", EntityId::new(42), 0.8,
        Timestamp::now(), None).unwrap();

    // Step 4: Query with session context.
    let query = RetrieveBuilder::new(EntityKind::Item, ProfileRef::new("for_you"))
        .for_user(42)
        .for_session(session_id)
        .limit(10)
        .build()
        .unwrap();
    let results = db.retrieve(&query).unwrap();
    assert!(!results.items.is_empty());
    assert!(results.session_snapshot.is_some());

    // Step 5: Read session snapshot.
    let snapshot = db.session_snapshot(session_id).unwrap();
    assert_eq!(snapshot.signals_written, 2);
    assert_eq!(snapshot.signals_rejected, 0);
    assert!(snapshot.duration_ms > 0);
    assert_eq!(snapshot.metadata.get("tool").unwrap(), "planner");

    // Step 6: Disallowed write.
    let err = db.session_signal(&session, "tool_use", EntityId::new(0), 1.0,
        Timestamp::now(), None);
    assert!(err.is_err());
    match err.unwrap_err() {
        LumenError::PolicyViolation { signal_type, policy_name, .. } => {
            assert_eq!(signal_type, "tool_use");
            assert_eq!(policy_name, "planner_policy");
        }
        other => panic!("expected PolicyViolation, got: {other:?}"),
    }

    // Step 7: Audit log.
    let audit = db.session_audit(session_id).unwrap();
    let accepted = audit.iter().filter(|e| e.accepted).count();
    let rejected = audit.iter().filter(|e| !e.accepted).count();
    assert_eq!(accepted, 2);
    assert_eq!(rejected, 1);

    // Step 8: Close session.
    let summary = db.close_session(session).unwrap();
    assert!(summary.duration_ms > 0);
    assert_eq!(summary.signals_written, 2);
    assert_eq!(summary.rejections, 1);
    assert!(!db.active_sessions().iter().any(|s| s.id == session_id));

    // Step 9: Archived snapshot readable.
    let archived = db.session_snapshot(session_id).unwrap();
    assert_eq!(archived.signals_written, 2);

    // Step 10: Session isolation.
    let session2 = db.start_session(42, "planner", "planner_policy", HashMap::new())
        .unwrap();
    let snap2 = db.session_snapshot(session2.id()).unwrap();
    assert_eq!(snap2.signals_written, 0, "session 2 must not see session 1 signals");

    db.close_session(session2).unwrap();
    db.close().unwrap();
}
```

### Done When

A developer can embed tidalDB alongside an agent runtime and: (1) declare agent policies in schema specifying allowed/denied signal types, session duration limits, and signal count caps; (2) start sessions bound to a user and an agent; (3) write session-scoped signals (preference hints, rewards) that are accepted or rejected by policy with every attempt recorded in the audit log; (4) execute `RETRIEVE items FOR USER @user_id FOR SESSION @session_id USING PROFILE for_you LIMIT 10` and receive ranked items incorporating session preference hints and reward velocity as an additive boost; (5) read session snapshots with signal state, velocity, and preference hints; (6) close sessions and retrieve archived snapshots with frozen signal values; (7) verify complete session isolation -- zero signal leakage between sessions. Policy violations return structured `LumenError::PolicyViolation` errors. Session signal writes complete in < 200 microseconds. RETRIEVE with session context adds < 5ms overhead. All 10 UAT scenario steps pass in the integration test.

---

## Milestone 5: Hybrid Search -- "Text + semantic + signals in one query"

### Milestone Thesis

M4 proved agents can write scoped signals and query session context within a personalized ranking pipeline. M5 proves that text search and vector retrieval are the same system. A developer can execute `SEARCH items QUERY "rust tutorial beginner" VECTOR query_vector FOR USER @user_id USING PROFILE search LIMIT 20` and get results that combine BM25 text relevance, semantic similarity, and user personalization in a single ranked list -- with the same signal freshness, diversity enforcement, and feedback loop guarantees that RETRIEVE already provides.

### Enables

- **UC-02** (Search) -- Core: keyword search, exact phrase, boolean operators, field-scoped, hybrid BM25 + semantic, personalized re-ranking, search click feedback loop (autocomplete, saved searches, and query composition are deferred to M6)
- **UC-10** (People/Creator Search) -- Full: creator discovery by name/topic, "creators like X" via embedding similarity, creator attribute filters
- **UC-11** (Visual/Semantic Search) -- Core: vector-only search for image similarity, semantic intent queries ("something relaxing to watch")

### UAT Scenario

```
Given:
  A tidalDB instance with:
    - 10,000 items with text fields (title, description, tags) indexed for full-text search
    - All items have 1536-dim embeddings
    - 500 users with engagement history and preference vectors
    - 200 creators with name, handle, and aggregated embeddings
    - Signal types: view (7d decay), like (14d decay), skip (1d decay),
      search_click (3d decay, with query context)
    - Profiles: "search" (text_weight:0.6, vector_weight:0.4, RRF k=60,
      personalization overlay, completion gate > 0.3, diversity max_per_creator:2)

When:
  1. SEARCH items QUERY "rust tutorial beginner" VECTOR [query_embedding]
     FOR USER @user_42 USING PROFILE search DIVERSITY max_per_creator:2 LIMIT 20
  2. SEARCH items QUERY "jazz piano" FOR USER @user_42
     USING PROFILE search FILTER duration:short, format:video LIMIT 20
  3. SEARCH items QUERY "\"exact phrase match\"" USING PROFILE search LIMIT 10
  4. SEARCH items QUERY "jazz -beginner" USING PROFILE search LIMIT 10
  5. SEARCH creators QUERY "jazz" LIMIT 10
  6. SEARCH creators SIMILAR TO @creator_xyz LIMIT 10
  7. SIGNAL search_click item:@item_abc user:@user_42
     context:{ query: "rust tutorial beginner", rank_at_click: 3 }
  8. Re-execute search #1

Then:
  - Step 1: Results combine BM25 + semantic similarity via RRF;
    personalization re-ranks within relevant set; user_42 (a beginner)
    sees beginner content elevated; max 2 per creator enforced
  - Step 2: Text-only search (no vector), filtered by duration and format;
    only short videos returned
  - Step 3: Exact phrase match -- only items containing "exact phrase match"
    as a contiguous sequence
  - Step 4: Boolean exclusion -- no items matching "beginner" appear in results
  - Step 5: Creators returned by name/topic match, ordered by engagement rate
  - Step 6: Creators semantically similar to @creator_xyz by embedding distance
  - Step 7: Signal recorded with query context and rank position;
    item and user-topic affinity updated
  - Step 8: Clicked result @item_abc may rank higher due to search_click signal;
    signal written < 100ms ago is reflected
  - Performance: SEARCH < 50ms at 10K items
```

### Phases

#### Phase 1: Tantivy Integration (m5p1)

**Delivers:** Tantivy embedded as a derived index for full-text search. DB-primary consistency pattern: entity store is source of truth, Tantivy is a materialized view updated via an outbox sequence. BM25 scoring exposed via custom Collector and the Weight/Scorer seek pattern. Schema text fields (title, description, tags) automatically indexed. Crash recovery replays from the last committed sequence number stored in Tantivy's commit payload.

**Acceptance Criteria:**

- [x] `TextIndex` struct wraps Tantivy `Index`, `IndexWriter` (behind `Mutex`), and `IndexReader` with auto-reload
- [x] Tantivy schema created from tidalDB schema text field definitions: `text` fields get full-text tokenization with Tantivy's default tokenizer; `keyword` fields get raw (untokenized) indexing for exact match
- [x] `TextIndexWriter::index_item(entity_id, metadata)` adds or updates a document in Tantivy; `delete_item(entity_id)` removes via `delete_term` on the entity_id fast field
- [x] Background indexer: `TextIndexSyncer` reads entity store writes (via WAL sequence tracking) and feeds Tantivy writer; commit interval configurable (default: every 1000 documents or 2 seconds, whichever comes first)
- [x] Each Tantivy `commit()` stores the last-processed WAL sequence number in the commit payload via `set_payload()`; on crash recovery, replay from that sequence number
- [x] Custom `AllScoresCollector` implementing Tantivy's `Collector` trait returns all matching `(EntityId, f32)` pairs with BM25 scores; `requires_scoring()` returns `true`
- [x] `ScoredCandidateCollector` implementing Tantivy's `Collector` trait accepts a pre-sorted candidate set and returns BM25 scores for only those candidates via `DocSet::seek()` (for scoring ANN results)
- [x] External `EntityId -> DocAddress` mapping maintained via a fast field (`entity_id_field`) on every Tantivy document; mapping rebuilt on `IndexReader::reload()` after segment merges
- [x] Boolean query parsing: AND, OR, NOT operators; exact phrase (`"..."`); field-scoped (`title:jazz`, `tag:tutorial`); exclusion (`-beginner`); wildcard prefix (`pian*`)
- [x] Index rebuild from entity store: `text_index.rebuild_from(storage)` scans all items and rebuilds the Tantivy index; completes in < 10 minutes at 10K items
- [x] BM25 query latency < 10ms at 10K documents (Criterion benchmarked)
- [x] Tantivy `IndexWriter` heap budget set to 50MB (conservative for embedded use)
- [x] `LogMergePolicy` configured with defaults; `wait_merging_threads()` called on shutdown
- [x] `TextIndex` is `Send + Sync` -- safe to share across threads behind `Arc`
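
The commit-interval rule and the replay-from-payload recovery step are easier to reason about in isolation. The sketch below uses only the standard library and does not touch Tantivy. `CommitPolicy`, `WalRecord`, and `records_to_replay` are hypothetical names, and the choice to skip commits when nothing is pending is an assumption rather than something the criteria state.

```rust
use std::time::Duration;

/// Hypothetical commit policy: commit after `max_docs` pending documents or
/// after `max_interval` since the last commit, whichever comes first.
struct CommitPolicy {
    max_docs: usize,
    max_interval: Duration,
}

impl Default for CommitPolicy {
    fn default() -> Self {
        // Defaults taken from the acceptance criteria: 1000 documents or 2 seconds.
        CommitPolicy { max_docs: 1000, max_interval: Duration::from_secs(2) }
    }
}

impl CommitPolicy {
    fn should_commit(&self, pending_docs: usize, since_last_commit: Duration) -> bool {
        // Assumption: with nothing pending there is nothing to commit,
        // no matter how much time has passed.
        pending_docs > 0
            && (pending_docs >= self.max_docs || since_last_commit >= self.max_interval)
    }
}

/// Simplified WAL record (assumption: sequence numbers are strictly increasing).
struct WalRecord {
    seq: u64,
    entity_id: u64,
}

/// Records to re-apply after a crash: everything newer than the sequence
/// number recovered from the last commit payload.
fn records_to_replay(wal: &[WalRecord], last_committed_seq: Option<u64>) -> &[WalRecord] {
    match last_committed_seq {
        // No commit has happened yet: rebuild from the start of the WAL.
        None => wal,
        Some(seq) => {
            // Binary search for the first record with a larger sequence number.
            let start = wal.partition_point(|r| r.seq <= seq);
            &wal[start..]
        }
    }
}
```

Storing the sequence number in the same commit as the documents it covers is what makes the replay safe: a crash between commits loses at most the uncommitted tail, and that tail is exactly the slice `records_to_replay` returns.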

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | TextIndex Core | `TextIndex` struct, Tantivy schema generation from tidalDB schema, `IndexWriter`/`IndexReader` lifecycle, `entity_id` fast field, `TextIndex::open()` and `TextIndex::close()` | L |
| 02 | Document Write/Delete | `index_item()`, `delete_item()`, field mapping (text -> tokenized, keyword -> raw), metadata-to-document conversion | M |
| 03 | Background Syncer | `TextIndexSyncer` reads WAL sequence, feeds writer, configurable commit interval, `set_payload()` with sequence number, crash recovery replay | L |
| 04 | BM25 Scoring Collectors | `AllScoresCollector` for full scoring, `ScoredCandidateCollector` for seek-based candidate scoring, entity ID resolution from fast field | M |
| 05 | Boolean Query Parsing | AND/OR/NOT, exact phrase, field-scoped, exclusion, wildcard prefix; wraps Tantivy's `QueryParser` with custom syntax extensions | M |

**Depends On:** m1p3 (storage engine, key encoding), m1p5 (entity write API, WAL sequence), m2p2 (metadata fields used for field-scoped queries)
**Complexity:** XL (5 tasks; Tasks 01-02 sequential, then 03/04/05 can parallelize after 02 completes)
**Research Reference:** `docs/research/tantivy.md` (Collector API, consistency pattern, seek scoring, commit model, single-writer lock, segment merge)

#### Phase 2: Hybrid Fusion (RRF) (m5p2)

**Delivers:** Reciprocal Rank Fusion combining BM25 ranked lists with ANN ranked lists into a single scored result set. The starting point is RRF with k=60; the architecture supports upgrading to tuned linear combination when relevance labels exist. Handles the three retrieval modes: text-only, vector-only, and hybrid.

**Acceptance Criteria:**

- [x] `HybridFusion` struct with `fuse(bm25_results: &[(EntityId, f32)], ann_results: &[(EntityId, f32)], k: u32) -> Vec<(EntityId, f64)>` method
- [x] RRF formula: `score(d) = 1.0 / (k + rank_bm25(d)) + 1.0 / (k + rank_ann(d))` where `k = 60` by default
- [x] Documents appearing in only one list contribute only their single-list term (the other term is zero)
- [x] Results sorted by fused score descending
- [x] RRF results are passed to the existing `ProfileExecutor` for personalization re-ranking (user preference overlay, signal boosts, quality gates)
- [x] When only text query is provided (no vector), pure BM25 ranking passed directly to profile executor
- [x] When only vector is provided (no text), pure ANN ranking passed directly to profile executor
- [x] `k` parameter configurable per profile or per query (default 60)
- [x] Fusion adds < 1ms to query time for 1000 candidates from each list (Criterion benchmarked)
- [x] Property test: for any pair of ranked lists, RRF output contains the union of both input document sets with correct score computation to 6 decimal places
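
A minimal sketch of the fusion step, following the formula and signature in the criteria above. It is written as a free function rather than a `HybridFusion` method, and `EntityId` is stubbed as `u64`. Three things are assumptions: ranks are 1-based, each input list is already sorted best-first without duplicates, and ties are broken by id to keep the output deterministic.

```rust
use std::collections::HashMap;

/// Stand-in for tidalDB's `EntityId` (assumption: a plain integer id).
type EntityId = u64;

/// Reciprocal Rank Fusion over two ranked lists.
/// Only list positions are used; the raw BM25 and ANN scores are ignored.
fn fuse(
    bm25_results: &[(EntityId, f32)],
    ann_results: &[(EntityId, f32)],
    k: u32,
) -> Vec<(EntityId, f64)> {
    let mut fused: HashMap<EntityId, f64> = HashMap::new();
    for list in [bm25_results, ann_results] {
        for (idx, (id, _score)) in list.iter().enumerate() {
            // Ranks are 1-based (assumption), so the top hit adds 1 / (k + 1).
            let rank = (idx + 1) as f64;
            *fused.entry(*id).or_insert(0.0) += 1.0 / (k as f64 + rank);
        }
    }
    // A document present in only one list keeps just that list's term.
    let mut out: Vec<(EntityId, f64)> = fused.into_iter().collect();
    // Fused score descending; ties broken by id for deterministic output.
    out.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    out
}
```

With BM25 order `[1, 2, 3]`, ANN order `[3, 1, 4]`, and `k = 60`, the fused order is `[1, 3, 2, 4]`: documents found by both retrievers outrank documents found by only one, even when the single-list hit ranked higher in its own list.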

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | RRF Implementation | `HybridFusion::fuse()`, rank-to-score conversion, union merge of ranked lists, configurable `k` | S |
| 02 | Retrieval Mode Router | Logic to select text-only, vector-only, or hybrid based on query contents; routes to BM25, ANN, or fusion accordingly | S |

**Depends On:** m5p1 (BM25 scored results), m2p1 (ANN scored results)
**Complexity:** S (2 small tasks; Task 02 depends on 01)
**Research Reference:** `docs/research/tantivy.md` (RRF section, Cormack et al. SIGIR 2009, k=60 insensitivity, production system approaches)
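
The retrieval mode router (Task 02) reduces to a decision over which query parts are present. The sketch below is one possible shape. `RetrievalMode` and `select_mode` are hypothetical names, and treating blank text or an empty vector as absent is an assumption.

```rust
/// Hypothetical retrieval modes matching the three cases described above.
#[derive(Debug, PartialEq)]
enum RetrievalMode {
    TextOnly,
    VectorOnly,
    Hybrid,
}

/// Chooses a mode from the query contents. Returns `None` when the query
/// carries neither text nor a vector, so the caller can reject it.
fn select_mode(query_text: Option<&str>, query_vector: Option<&[f32]>) -> Option<RetrievalMode> {
    // Assumption: whitespace-only text and zero-length vectors count as absent.
    let has_text = query_text.map_or(false, |t| !t.trim().is_empty());
    let has_vector = query_vector.map_or(false, |v| !v.is_empty());
    match (has_text, has_vector) {
        (true, true) => Some(RetrievalMode::Hybrid),       // BM25 + ANN, fused via RRF
        (true, false) => Some(RetrievalMode::TextOnly),    // BM25 ranking passed through
        (false, true) => Some(RetrievalMode::VectorOnly),  // ANN ranking passed through
        (false, false) => None,
    }
}
```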

#### Phase 3: SEARCH Query Parser and Executor (m5p3)

**Delivers:** The `SEARCH` query operation -- parser, planner, and executor -- that orchestrates text retrieval, semantic retrieval, hybrid fusion, personalization, filtering, diversity, and result assembly. Reuses the existing filter engine (m2p2), diversity enforcement (m2p4), and profile executor (m2p3/m3p3) from prior milestones. The `search_click` signal type is integrated for feedback loop closure.

**Acceptance Criteria:**

- [x] `Search` struct with fields: `entity_kind`, `query_text: Option<String>`, `query_vector: Option<Vec<f32>>`, `for_user: Option<u64>`, `for_session: Option<SessionId>`, `profile: ProfileRef`, `filters: Vec<FilterExpr>`, `diversity: Option<DiversityConstraints>`, `limit: u32`
- [x] `SearchBuilder` with fluent API: `.query("text")`, `.vector(&[f32])`, `.for_user(id)`, `.for_session(id)`, `.using_profile("search")`, `.filter(expr)`, `.diversity(constraints)`, `.limit(n)`, `.build()`
- [x] `db.search(&Search) -> Result<SearchResults>` executes the full pipeline
- [x] Search executor pipeline: (1) parse query text into Tantivy query, (2) if vector present, execute ANN retrieval, (3) if both, fuse via RRF, (4) load user context if `for_user` present, (5) apply profile scoring (personalization, signal boosts, quality gates), (6) apply metadata filters, (7) apply diversity enforcement, (8) assemble results with scores
- [x] `SearchResults` struct contains: `items: Vec<SearchResultItem>`, `next_cursor: Option<Cursor>`, `total_candidates: u64`
- [x] `SearchResultItem` contains: `id: EntityId`, `score: f64`, `bm25_score: Option<f32>`, `semantic_score: Option<f32>`, `signals: SignalSnapshot`
- [x] Query text parsing handles: bare terms (`jazz piano`), exact phrase (`"jazz piano"`), boolean operators (`AND`, `OR`, `NOT`, `-`), field-scoped (`title:jazz`, `tag:tutorial`, `creator:handle`), wildcard prefix (`pian*`), hashtag (`#jazz`)
- [x] `search_click` signal type recognized: `db.signal("search_click", item_id, 1.0, ts)` with context containing `query` and `rank_at_click` fields
- [x] Search profile `search` registered as a builtin: text relevance as floor, personalization adjustment, completion gate, diversity
- [x] Session context (`FOR SESSION`) integrates with search the same way it does with RETRIEVE (preference hint keyword boost, reward velocity factor)
- [x] End-to-end SEARCH < 50ms at 10K items (Criterion benchmarked)
- [x] Full M5 UAT steps 1-4 and 7-8 pass as integration test assertions
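
To make the query text syntax concrete, here is a small standalone tokenizer sketch. It is not the Tantivy-backed parser described in m5p1 and all names (`Clause`, `parse_query`, `classify`) are hypothetical. It classifies whitespace-separated tokens and quoted phrases only. The keyword operators `AND`, `OR`, and `NOT` and quoted values after a field prefix are deliberately not handled and would come through as plain terms.

```rust
/// Hypothetical clause types for the query syntax listed above.
#[derive(Debug, PartialEq)]
enum Clause {
    Term(String),
    Phrase(String),
    Exclude(String),
    Field { field: String, value: String },
    Prefix(String),
    Hashtag(String),
}

/// Splits query text into clauses: quoted phrases first, then single words.
fn parse_query(input: &str) -> Vec<Clause> {
    let mut clauses = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '"' {
            chars.next(); // consume the opening quote
            // Everything up to the closing quote (or end of input) is one phrase;
            // `take_while` also consumes the closing quote.
            let phrase: String = chars.by_ref().take_while(|&ch| ch != '"').collect();
            clauses.push(Clause::Phrase(phrase));
        } else {
            let mut word = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_whitespace() {
                    break;
                }
                word.push(ch);
                chars.next();
            }
            clauses.push(classify(&word));
        }
    }
    clauses
}

/// Classifies one word by its prefix, separator, or suffix.
fn classify(word: &str) -> Clause {
    if let Some(rest) = word.strip_prefix('-') {
        if !rest.is_empty() {
            return Clause::Exclude(rest.to_string());
        }
    }
    if let Some(rest) = word.strip_prefix('#') {
        if !rest.is_empty() {
            return Clause::Hashtag(rest.to_string());
        }
    }
    if let Some((field, value)) = word.split_once(':') {
        if !field.is_empty() && !value.is_empty() {
            return Clause::Field { field: field.to_string(), value: value.to_string() };
        }
    }
    if let Some(stem) = word.strip_suffix('*') {
        if !stem.is_empty() {
            return Clause::Prefix(stem.to_string());
        }
    }
    Clause::Term(word.to_string())
}
```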

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Search Types and Builder | `Search`, `SearchBuilder`, `SearchResults`, `SearchResultItem` structs with validation | M |
| 02 | Search Executor Pipeline | `SearchExecutor` orchestrating BM25 retrieval, ANN retrieval, fusion, profile scoring, filtering, diversity, result assembly | L |
| 03 | Search Profile Builtin | `search` profile definition registered in `ProfileRegistry`, text relevance floor, personalization overlay, configurable RRF k | S |
| 04 | search_click Signal Integration | `search_click` signal type with context fields (query, rank_at_click), feedback loop wiring into user-topic affinity | S |

**Depends On:** m5p1 (Tantivy integration, BM25 queries), m5p2 (hybrid fusion), m2p2 (filter engine), m2p3 (profile executor), m2p4 (diversity), m2p5 (query parser infrastructure, RETRIEVE executor pattern to follow), m3p3 (personalized profiles, UserContext), m4p4 (SessionContext for FOR SESSION)
**Complexity:** L (4 tasks; Task 01 first; Task 02 depends on 01; Tasks 03 and 04 can parallelize with 02)
**Research Reference:** `VISION.md` (SEARCH query syntax), `API.md` (SEARCH operation, query syntax table), `USE_CASES.md` UC-02 (search capabilities), `SEQUENCE.md` UC-02 (search sequence diagram)

#### Phase 4: Creator and People Search (m5p4)

**Delivers:** Search over creator entities by name, topic, and attributes. "Creators like X" via creator embedding similarity. Creator entities indexed in both Tantivy (text fields) and USearch (embeddings). Enables UC-10 (People and Creator Search).

**Acceptance Criteria:**

- [x] Creator entities indexed in Tantivy when written via `db.write_creator()`: fields `name` (text, tokenized), `handle` (keyword, raw), `region` (keyword), `language` (keyword), `verified` (bool)
- [x] Creator embeddings indexed in a dedicated USearch index (separate from item embeddings) when provided via `write_creator(id, metadata, Some(embedding))`
- [x] `SEARCH creators QUERY "jazz" LIMIT 10` returns creators matching by name or topic, ordered by BM25 relevance
- [x] `SEARCH creators QUERY "jazz" FILTER verified:true LIMIT 10` filters by creator attributes
- [x] `SEARCH creators SIMILAR TO @creator_id LIMIT 10` retrieves the source creator's embedding and runs ANN against the creator vector index
- [x] Creator search results include: `id: EntityId`, `score: f64`, `metadata: HashMap<String, String>`
- [x] Creator sort modes available: `Sort::CreatorEngagementRate` (average engagement ratio across recent catalog), `Sort::MostFollowed` (follower count desc)
- [x] Creator filters composable: `verified`, `min_followers`, `max_followers`, `language`, `region`, `followed_by_user` (requires FOR USER)
- [x] `followed_by_user` filter uses the existing `FollowsBitmap` infrastructure from m3p1 to restrict results to creators the user follows
- [x] Hybrid search on creators: `SEARCH creators QUERY "jazz" VECTOR [query_embedding] LIMIT 10` fuses BM25 name/topic match with embedding similarity via RRF
- [x] Creator search latency < 20ms at 200 creators (Criterion benchmarked)
- [x] Full M5 UAT steps 5-6 pass as integration test assertions

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Creator Text Indexing | Tantivy indexing for creator entities, field mapping, write/delete hooks in `write_creator()`/`update_creator()`, syncer integration | M |
| 02 | Creator Vector Index | Dedicated USearch index for creator embeddings, insertion on `write_creator()`, ANN search, `SIMILAR TO @creator_id` resolution | M |
| 03 | Creator Search Executor | `SEARCH creators` routing in search executor, creator-specific filters (verified, followers, language), sort modes, `followed_by_user` via FollowsBitmap, hybrid fusion for creators | M |

**Depends On:** m5p1 (Tantivy integration, syncer infrastructure), m5p3 (SEARCH executor pipeline to extend), m3p1 (creator entities, FollowsBitmap), m2p1 (vector index infrastructure)
**Complexity:** L (3 tasks; Tasks 01 and 02 can parallelize; Task 03 depends on both)
**Research Reference:** `USE_CASES.md` UC-10 (People and Creator Search: name search, "creators like X", social graph discovery), `API.md` (SEARCH creators examples), `docs/research/ann_for_tidaldb.md` (creator embedding similarity)
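
The `SIMILAR TO @creator_id` resolution can be sketched without an ANN index. The version below swaps USearch for an exact brute-force scan with cosine similarity, which is workable at the 200-creator scale of the UAT but is not how the phase is specified. `similar_creators` and `cosine` are hypothetical helpers, and excluding the source creator from its own results mirrors the integration test's expectation.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Exact nearest creators by cosine similarity (brute force, not ANN).
/// Looks up the source creator's embedding, scores every other creator,
/// and returns the top `limit` as (creator_id, similarity), best first.
fn similar_creators(
    source: u64,
    embeddings: &[(u64, Vec<f32>)],
    limit: usize,
) -> Vec<(u64, f32)> {
    let src = match embeddings.iter().find(|(id, _)| *id == source) {
        Some((_, v)) => v,
        None => return Vec::new(), // unknown creator or no embedding stored
    };
    let mut scored: Vec<(u64, f32)> = embeddings
        .iter()
        .filter(|(id, _)| *id != source) // never return the source itself
        .map(|(id, v)| (*id, cosine(src, v)))
        .collect();
    // Similarity descending; ties broken by id for deterministic output.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    scored.truncate(limit);
    scored
}
```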

### Phase Dependency DAG

```
m5p1 (Tantivy Integration)
    |         \
    v          \
m5p2 (RRF)     \
    |            \
    v             v
m5p3 (SEARCH Executor)
    |
    v
m5p4 (Creator Search)
```

m5p1 is the foundation -- everything else depends on having a working text index. m5p2 (RRF fusion) depends on m5p1 for BM25 scores and on the existing m2p1 for ANN scores. m5p3 (SEARCH executor) depends on both m5p1 and m5p2 to orchestrate the full pipeline. m5p4 (Creator search) depends on m5p1 (for creator text indexing) and m5p3 (for the search executor to extend).

Within m5p1, tasks 01-02 are sequential (schema before documents), then tasks 03, 04, and 05 can parallelize once document write is working.

### Deferred to Later Milestones

- **Autocomplete and search suggestions (UC-02.3)** -- deferred to M6; requires prefix indexes on the Tantivy term dictionary and trending query tracking infrastructure; M5 proves search works, M6 adds the polish features
- **Saved searches and alerts (UC-02.4)** -- deferred to M6; requires persistent query storage, new-result detection on each indexing pass, and push notification integration; M5 provides the search primitive, M6 builds subscriptions on top
- **Visual search / image search (UC-11 full)** -- deferred to M6; UC-11 core (vector-only search) works via M5's `SEARCH items VECTOR [embedding] LIMIT N`; the full crop-and-search and multi-modal (text query against image items) workflow requires additional embedding pipeline coordination
- **"Did you mean" typo correction** -- deferred to M6; requires edit-distance computation on the Tantivy term dictionary and a suggestion model; not required for M5's UAT
- **Tuned linear combination (replacing RRF)** -- deferred to M7 or later; requires relevance labels and offline evaluation infrastructure; RRF is the correct zero-configuration starting point
- **Query composition / SEARCH WITHIN scope** (searching within trending, within cohort trending, within following) -- deferred to M6; requires candidate set intersection with scoped retrieval; M5 proves standalone search works first
- **Semantic session hint matching** -- deferred to M6; M4's keyword matching is sufficient; semantic matching via Tantivy text analysis would upgrade hint precision but is not required for M5's UAT
- **Search result explanation** ("why this result?") -- deferred to M6/M7; Tantivy provides `Query::explain()` per document but it is expensive; not required for M5's UAT

### Integration Test

```rust
#[test]
fn milestone_5_uat() {
    let db = open_with_search_schema();

    // Write 10K items with text fields, embeddings, and metadata.
    for i in 0..10_000u64 {
        let meta = item_metadata(i); // title, description, tags, category, format, creator_id
        let embedding = item_embedding(i); // 1536-dim
        db.write_item_with_metadata(EntityId::new(i), &meta).unwrap();
        db.write_item_embedding(EntityId::new(i), &embedding).unwrap();
    }

    // Write 200 creators with names, handles, and embeddings.
    for c in 0..200u64 {
        let meta = creator_metadata(c); // name, handle, verified, language
        let embedding = creator_embedding(c);
        db.write_creator(EntityId::new(c), &meta).unwrap();
        db.write_creator_embedding(EntityId::new(c), &embedding).unwrap();
    }

    // Write user 42 with engagement history.
    db.write_user(EntityId::new(42), &user_metadata()).unwrap();
    for e in generate_engagement_events(500, EntityId::new(42)) {
        db.signal(&e.signal_type, e.entity_id, e.weight, e.timestamp).unwrap();
    }

    // Wait for Tantivy syncer to commit.
    db.flush_text_index().unwrap();

    // Step 1: Hybrid search with personalization and diversity.
    let query_vec = embed("rust tutorial beginner");
    let results = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("rust tutorial beginner")
            .vector(&query_vec)
            .for_user(42)
            .diversity(DiversityConstraints { max_per_creator: Some(2), ..Default::default() })
            .limit(20)
            .build().unwrap()
    ).unwrap();
    assert_eq!(results.items.len(), 20);
    assert!(results.items.iter().all(|r| r.score > 0.0));
    assert!(results.items.windows(2).all(|w| w[0].score >= w[1].score));
    assert!(creator_counts(&results.items).values().all(|&c| c <= 2));

    // Step 2: Text-only with filters.
    let filtered = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("jazz piano")
            .for_user(42)
            .filter(FilterExpr::eq("format", "video"))
            .limit(20)
            .build().unwrap()
    ).unwrap();
    assert!(filtered.items.iter().all(|r| r.bm25_score.is_some()));
    assert!(filtered.items.iter().all(|r| r.semantic_score.is_none()));

    // Step 3: Exact phrase match.
    let phrase = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("\"exact phrase match\"")
            .limit(10)
            .build().unwrap()
    ).unwrap();
    // All returned items must contain the exact phrase in some text field.

    // Step 4: Boolean exclusion.
    let excluded = db.search(
        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("jazz -beginner")
            .limit(10)
            .build().unwrap()
    ).unwrap();
    // No returned items should match "beginner" in any text field.

    // Step 5: Creator search by topic.
    let creators = db.search(
        &SearchBuilder::new(EntityKind::Creator, ProfileRef::new("search"))
            .query("jazz")
            .limit(10)
            .build().unwrap()
    ).unwrap();
    assert!(!creators.items.is_empty());

    // Step 6: Creators similar to creator_xyz by embedding.
    let similar_creators = db.search(
        &SearchBuilder::new(EntityKind::Creator, ProfileRef::new("search"))
            .similar_to(EntityId::new(5))
            .limit(10)
            .build().unwrap()
    ).unwrap();
    assert!(!similar_creators.items.is_empty());
    assert!(similar_creators.items.iter().all(|r| r.id != EntityId::new(5)));

    // Step 7: Search click signal with context.
    let clicked = results.items[2].id;
    db.signal("search_click", clicked, 1.0, Timestamp::now()).unwrap();

    // Step 8: Re-search -- clicked result may rank higher.
    let results2 = db.search(
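        &SearchBuilder::new(EntityKind::Item, ProfileRef::new("search"))
            .query("rust tutorial beginner")
            .vector(&query_vec)
            .for_user(42)
            // Same diversity constraint as Step 1, so this is a true re-execution.
            .diversity(DiversityConstraints { max_per_creator: Some(2), ..Default::default() })
            .limit(20)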
            .build().unwrap()
    ).unwrap();
    let rank_before = results.items.iter().position(|r| r.id == clicked).unwrap();
    let rank_after = results2.items.iter().position(|r| r.id == clicked);
    // The clicked result should appear at the same or better rank.
    if let Some(ra) = rank_after {
        assert!(ra <= rank_before);
    }
}
```

### Done When

A developer can execute `SEARCH items QUERY "rust tutorial beginner" VECTOR [query_embedding] FOR USER @user_42 USING PROFILE search DIVERSITY max_per_creator:2 LIMIT 20` and receive results that combine BM25 text relevance with semantic vector similarity, re-ranked by user personalization and engagement signals, with diversity constraints enforced. Boolean queries (`AND`/`OR`/`NOT`), exact phrase matching (`"..."`), field-scoped search (`title:...`), and wildcard prefix (`term*`) all work. Creator search returns creators by name, topic, and embedding similarity. The `search_click` signal closes the feedback loop -- a clicked result influences the next search. End-to-end SEARCH latency < 50ms at 10K items. All 8 UAT scenario steps pass in the integration test.

---

## Milestone 6: Full Surface Coverage -- "Every use case, every sort mode, every filter, every feedback loop"

### Milestone Thesis

Milestones 1-5 proved that a single database can ingest signals, rank content, personalize results, manage agent memory, and execute hybrid text+vector search. But a skeptical engineer can still say: "Sure, it handles the happy path, but my platform needs cohort-scoped trending, collections, live content, notification capping, autocomplete, and query composition -- and those are the surfaces that force me back to multiple systems." M6 proves they are wrong. After M6, every query in SEQUENCE.md executes correctly, every Sort variant in API.md works, every filter in USE_CASES.md Appendix A composes, every UC-01 through UC-15 surface is testable end-to-end. The gap between "demo database" and "production database for content ranking" closes here.

### Enables

- **UC-15** (Cohort-Scoped Trending) -- Full: cohort definitions, per-cohort signal aggregation at write time, cohort filter in RETRIEVE
- **UC-03 full** -- Social-graph-scoped trending, cohort-scoped trending, search within cohort trending
- **UC-05 full** -- Collaborative filtering ("users who liked X also liked Y") in the `related` profile
- **UC-06 full** -- All remaining sort modes: AlphabeticalAsc/Desc, MostCommented, MostShared, Shortest, Longest, LiveViewerCount, DateSaved
- **UC-07 full** -- Notification frequency capping (per-creator and per-user daily caps)
- **UC-08 full** -- Creator profile page modes (top/hot/for_you filtered within one creator's catalog)
- **UC-09 full** -- User library: collections CRUD, `in_collection` filter, `in_progress` (continue watching), saved searches as persistent feeds
- **UC-10 full** -- "Creators followed by people I follow" via social graph traversal
- **UC-12 full** -- Live content: viewer_count signal, `status=live` filter, `LiveViewerCount` sort
- **UC-02 full** -- SUGGEST autocomplete, trending searches, query composition (SEARCH WITHIN TRENDING / COHORT_TRENDING / FOLLOWING / COLLECTION)
- **UC-01 through UC-15** -- Full end-to-end UAT for all 15 use cases

### UAT Scenario

```
Given:
  A tidalDB instance with:
    - 500 items across 50 creators, embeddings, signals (view, like, share, comment,
      skip, completion, hide, follow), metadata (category, format, duration, language,
      status, title)
    - 20 users with locale, age_range, explicit_interests, engagement_level attributes
      and preference vectors
    - Relationship graph: follows, blocks, interaction weights
    - 2 named cohorts: "us_young_music" (locale=en-US, age_range in [18-24, 25-34],
      primary_categories includes music) and "jp_casual" (region=JP, engagement_level=casual)
    - 3 user-created collections
    - 5 items with status=live and active viewer_count signals
    - Signal history creating measurable velocity differences between cohort members
      and non-members

When/Then:

1. Cohort-scoped trending (UC-15):
   db.retrieve(Retrieve::builder().profile("cohort_trending").cohort("us_young_music").limit(10))
   → Items ranked by velocity from cohort members only; different from global trending.

2. Social-graph-scoped trending (UC-03):
   db.retrieve(Retrieve::builder().for_user(user_a).profile("trending")
       .filter(FilterExpr::social_graph(user_a, 2)).limit(10))
   → Only items engaged by users that user_a follows appear.

3. All sort modes (UC-06):
   - Sort::AlphabeticalAsc → titles A-Z order
   - Sort::Shortest → ascending duration order
   - Sort::LiveViewerCount + Filter::eq("status","live") → live items by viewer count
   - Sort::DateSaved + Filter::user_state("saved") + for_user → by save timestamp

4. Collections and user library (UC-09):
   db.create_collection(user_a, "jazz_favorites", Visibility::Private)
   db.add_to_collection(coll_id, item_1); db.add_to_collection(coll_id, item_2)
   db.retrieve(Retrieve::builder().filter(FilterExpr::in_collection(coll_id)).limit(10))
   → Exactly 2 results.
   Filter::user_state("in_progress") → Items partially watched by user_a.

5. Live content (UC-12):
   db.signal("viewer_count", live_item, 1.0, ts)
   db.retrieve(Retrieve::builder().filter(FilterExpr::eq("status","live"))
       .sort(Sort::LiveViewerCount).limit(5))
   → Live items ordered by current viewer count.

6. Notification caps (UC-07):
   db.retrieve(Retrieve::builder().for_user(user_a).profile("notification")
       .filter(FilterExpr::since(last_seen))
       .notification_caps(NotificationCaps { max_per_creator_per_day: 1, max_total_per_day: 3 })
       .limit(20))
   → At most 1 item per creator, at most 3 total.

7. Query composition (UC-02.5):
   db.search(Search::builder().query("jazz piano").within(WithinScope::Trending { window_hours: 24 }).limit(10))
   → Only items that are BOTH relevant to "jazz piano" AND trending.
   db.search(Search::builder().query("jazz piano")
       .within(WithinScope::CohortTrending { cohort: "us_young_music", window_hours: 24 }).limit(10))
   → Intersection of cohort-scoped trending and text relevance.

8. SUGGEST autocomplete (UC-02.3):
   db.suggest(&Suggest { prefix: "jazz pia", for_user: None, limit: 5 })
   → ["jazz piano", "jazz piano tutorial", ...]
   db.suggest(&Suggest { prefix: "", for_user: None, limit: 10 })
   → Trending search terms returned.
   Latency < 20ms.

9. Collaborative filtering in related (UC-05):
   After recording co-engagement (users who liked item_X also liked item_Y),
   db.retrieve(Retrieve::builder().for_user(user_a).profile("related").similar_to(item_x).limit(10))
   → item_Y appears in results via co-engagement signal, not just embedding proximity.
```
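
Scenario 6 combines two caps in one pass over a ranked list. The sketch below shows one way that pass could work. The field names follow the `NotificationCaps` literal in the scenario, while `apply_caps` and the `(item_id, creator_id)` tuple shape are hypothetical. It assumes all candidates fall within the same day and that nothing has already been sent to the user that day.

```rust
use std::collections::HashMap;

/// Caps as named in the scenario above.
struct NotificationCaps {
    max_per_creator_per_day: usize,
    max_total_per_day: usize,
}

/// Walks `ranked` best-first and keeps an item only while both caps allow it.
/// `ranked` holds (item_id, creator_id) pairs for one user and one day.
fn apply_caps(ranked: &[(u64, u64)], caps: &NotificationCaps) -> Vec<u64> {
    let mut per_creator: HashMap<u64, usize> = HashMap::new();
    let mut kept = Vec::new();
    for &(item, creator) in ranked {
        if kept.len() >= caps.max_total_per_day {
            break; // total cap reached, nothing further can be kept
        }
        let count = per_creator.entry(creator).or_insert(0);
        if *count < caps.max_per_creator_per_day {
            *count += 1;
            kept.push(item);
        }
        // Otherwise skip this item and let lower-ranked creators fill the slot.
    }
    kept
}
```

Skipping over-cap items instead of stopping at them keeps the result as full as the total cap allows, which matters when one prolific creator dominates the top of the ranking.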

### Phases

---

#### Phase 1: Cohort Engine + Cohort-Scoped Trending (m6p1)

**Delivers:** Cohort definitions stored in schema with compound predicates over user attributes. Cohort membership resolved at signal write time. Per-cohort signal aggregation in a `CohortSignalLedger`. `cohort()` parameter on `Retrieve`. `cohort_trending` built-in profile. Cohort filter in queries.

**Acceptance Criteria:**

- [x] `db.define_cohort(CohortDef { name, predicate, aggregation })` stores a named cohort predicate over user attributes (locale, age_range, primary_categories, engagement_level, region); validated at write time; duplicate names rejected
- [x] `Predicate` enum supports: `Eq(field, value)`, `Any(field, values)`, `Range(field, lo, hi)`, `And(Vec<Predicate>)`, `Or(Vec<Predicate>)` -- sufficient to express all cohort definitions in USE_CASES.md
- [x] On `db.signal(kind, entity_id, weight, ts)` with user context, the signal is attributed to every materialized cohort the user belongs to; cohort membership resolved from user metadata at write time via `CohortResolver`
- [x] `CohortResolver` evaluates user metadata against all registered cohort predicates; result cached in `DashMap<EntityId, Vec<CohortName>>` with invalidation on user metadata write
- [x] `CohortSignalLedger`: `DashMap<(CohortName, EntityId, SignalTypeId), HotEntry>` with the same decay/velocity/windowed semantics as the global `SignalLedger`; updated atomically with the global ledger on each signal write
- [x] `RetrieveBuilder::cohort(name)` scopes signal reads to `CohortSignalLedger` instead of global ledger; errors with `TidalError::NotFound` if cohort is not defined
- [x] `RetrieveBuilder::cohort_predicate(Predicate)` supports ad-hoc cohort queries by resolving matching users at query time and aggregating their signals on-demand (slower, but correct; no pre-materialized state required)
- [x] `cohort_trending` built-in profile registered: `CandidateStrategy::Scan`, velocity-based scoring reading from cohort ledger, same gates as `trending`
- [x] Cohort-scoped trending produces results different from global trending when cohort signal velocity diverges from global velocity (verified via a test with intentionally divergent engagement patterns)
- [x] Per-cohort ledger state checkpointed alongside global ledger; survives crash and restart
- [x] Performance: cohort-scoped RETRIEVE with materialized cohort completes in < 50ms at 500 items / 20 users
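
The compound predicate model in the criteria above can be sketched as a small recursive evaluator. This is a minimal illustration, not the tidalDB implementation: the `Value` type, the `matches` method, inclusive `Range` bounds, and the rule that a missing field never matches are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical attribute value; the real schema types are not shown in this roadmap.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Num(f64),
}

/// Mirrors the variant list in the acceptance criteria.
#[derive(Debug, Clone)]
pub enum Predicate {
    Eq(String, Value),
    Any(String, Vec<Value>),
    Range(String, f64, f64), // inclusive bounds (assumption)
    And(Vec<Predicate>),
    Or(Vec<Predicate>),
}

impl Predicate {
    /// Returns true when `attrs` satisfies the predicate.
    /// A missing field never matches (assumption).
    pub fn matches(&self, attrs: &HashMap<String, Value>) -> bool {
        match self {
            Predicate::Eq(field, want) => attrs.get(field) == Some(want),
            Predicate::Any(field, options) => {
                attrs.get(field).map_or(false, |v| options.contains(v))
            }
            Predicate::Range(field, lo, hi) => match attrs.get(field) {
                Some(Value::Num(n)) => lo <= n && n <= hi,
                _ => false,
            },
            Predicate::And(parts) => parts.iter().all(|p| p.matches(attrs)),
            Predicate::Or(parts) => parts.iter().any(|p| p.matches(attrs)),
        }
    }
}
```

A `CohortResolver` along these lines would call `matches` once per registered cohort when user metadata changes and cache the resulting cohort names.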

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | `CohortDef` + `Predicate` types + schema storage | Schema-level cohort definitions with compound predicates; stored in-memory and checkpointed | M |
| 02 | `CohortResolver` | Evaluates user metadata against all registered predicates; `DashMap` cache with invalidation on user write | M |
| 03 | `CohortSignalLedger` | Per-cohort `DashMap<(CohortName, EntityId, SignalTypeId), HotEntry>` with decay/velocity/windowed semantics identical to global ledger | L |
| 04 | Wire cohort attribution into signal write path | On `db.signal()`, resolve user's cohorts, call `cohort_ledger.record(cohort, entity, signal, weight, ts)` for each | M |
| 05 | `RetrieveBuilder::cohort()` + executor integration | Thread cohort name through `Retrieve` → `RetrieveExecutor`; read from `CohortSignalLedger` in scoring stage | M |
| 06 | Ad-hoc cohort predicate (`cohort_predicate`) | Resolve matching users at query time; aggregate their signals on-demand | M |
| 07 | `cohort_trending` builtin profile + tests | Register profile; integration test proving cohort vs. global trending divergence | M |
| 08 | Checkpoint/restore for cohort state | Serialize `CohortSignalLedger` alongside global ledger in periodic checkpoint | S |

**Depends On:** None (foundation phase; all other M6 phases can start in parallel once task 03 is complete)
**Complexity:** XL
**Research Reference:** `VISION.md:46-50` (cohort model), `USE_CASES.md:554-591` (UC-15), `SEQUENCE.md:306-347` (cohort trending sequence), `docs/research/tidaldb_signal_ledger.md` (signal aggregation architecture)
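
The write-time fan-out in tasks 03 and 04 can be illustrated with a simplified ledger. This sketch assumes exponential decay applied lazily on write and read, uses a plain `HashMap` in place of `DashMap`, reduces `HotEntry` to two fields, and assumes timestamps arrive in non-decreasing order; velocity and windowed counts are omitted.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `HotEntry`: a decayed score and its last update time.
#[derive(Debug, Clone, Copy)]
pub struct HotEntry {
    pub score: f64,
    pub last_ts_secs: f64,
}

/// Keyed by (cohort name, entity id, signal type id), as in the acceptance criteria.
pub struct CohortSignalLedger {
    half_life_secs: f64,
    entries: HashMap<(String, u64, u32), HotEntry>,
}

impl CohortSignalLedger {
    pub fn new(half_life_secs: f64) -> Self {
        Self { half_life_secs, entries: HashMap::new() }
    }

    /// Exponential decay: the score halves every `half_life_secs`.
    fn decay(&self, score: f64, elapsed_secs: f64) -> f64 {
        score * 0.5f64.powf(elapsed_secs.max(0.0) / self.half_life_secs)
    }

    /// Attribute one signal to every cohort the writing user belongs to.
    pub fn record(&mut self, cohorts: &[String], entity: u64, signal: u32, weight: f64, ts_secs: f64) {
        for cohort in cohorts {
            let key = (cohort.clone(), entity, signal);
            let prev = self.entries.get(&key).copied();
            let decayed = prev.map_or(0.0, |e| self.decay(e.score, ts_secs - e.last_ts_secs));
            self.entries.insert(key, HotEntry { score: decayed + weight, last_ts_secs: ts_secs });
        }
    }

    /// Decayed score as of `now_secs`; 0.0 for unknown keys.
    pub fn score(&self, cohort: &str, entity: u64, signal: u32, now_secs: f64) -> f64 {
        self.entries
            .get(&(cohort.to_string(), entity, signal))
            .map_or(0.0, |e| self.decay(e.score, now_secs - e.last_ts_secs))
    }
}
```

Because a user can belong to several cohorts, one signal write touches one ledger entry per cohort, which is why membership is resolved and cached before the write path runs.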

---

#### Phase 2: Social Graph Extension + Collaborative Filtering (m6p2)

**Delivers:** Reverse relationship index (creator→followers), social-graph-scoped trending filter (`FilterExpr::social_graph`), co-engagement signal tracking, collaborative filtering boost in the `related` profile.

**Acceptance Criteria:**

- [x] Reverse relationship index maintained: given a `creator_id`, retrieve all users who follow them. Implemented as `DashMap<EntityId, RoaringBitmap>` mapping entity → inbound follower set; updated on every `write_relationship` call
- [x] Reverse index persisted alongside relationship data; survives crash and restart
- [x] `FilterExpr::social_graph(user_id, depth: u8)` implemented: depth=1 constrains candidates to items from creators the user follows; depth=2 also includes items engaged with by the users that user follows
- [x] Social-graph-scoped trending: when `FilterExpr::social_graph` is used with a trending profile, velocity reads are scoped to signals from users in the resolved social subgraph
- [x] Co-engagement tracking: on a positive engagement signal (like or completion ≥ 0.8), record pairwise co-engagement edges between the engaged item and the user's last 50 positively-engaged items. Edge weight incremented per co-occurrence. Stored in `DashMap<(EntityId, EntityId), f32>` with LRU eviction at configurable capacity (default: 50K pairs)
- [x] `related` profile scoring incorporates co-engagement: `final = embedding_sim × 0.6 + co_engagement × 0.3 + signal_score × 0.1`
- [x] Co-engagement is asymmetric: `(A,B)` and `(B,A)` are separate entries; query uses the seed item as the first key
- [x] Performance: `social_graph` filter at depth 2 with 20 users / 10 followed creators completes in < 50ms
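
The asymmetric edge storage and the `related` blend above can be sketched as follows. The edge direction (earlier item to newly engaged item) and the normalization of raw edge weight into [0, 1] by a caller-supplied `max_weight` are assumptions for this example; LRU eviction is left out.

```rust
use std::collections::HashMap;

/// Directed co-engagement edges: `(a, b)` and `(b, a)` are separate entries.
#[derive(Default)]
pub struct CoEngagementIndex {
    edges: HashMap<(u64, u64), f32>,
}

impl CoEngagementIndex {
    /// Called on a positive engagement with `item`; `recent` holds the user's
    /// earlier positively-engaged items. Direction `recent -> item` is assumed.
    pub fn record(&mut self, item: u64, recent: &[u64]) {
        for &prev in recent.iter().filter(|&&p| p != item) {
            *self.edges.entry((prev, item)).or_insert(0.0) += 1.0;
        }
    }

    /// Edge weight with the seed item as the first key, scaled into [0, 1].
    pub fn score(&self, seed: u64, candidate: u64, max_weight: f32) -> f32 {
        let w = self.edges.get(&(seed, candidate)).copied().unwrap_or(0.0);
        if max_weight > 0.0 { (w / max_weight).min(1.0) } else { 0.0 }
    }
}

/// Blend weights from the acceptance criteria: 0.6 / 0.3 / 0.1.
pub fn related_score(embedding_sim: f32, co_engagement: f32, signal_score: f32) -> f32 {
    embedding_sim * 0.6 + co_engagement * 0.3 + signal_score * 0.1
}
```

Since the three weights sum to 1.0, the blended score stays in [0, 1] whenever each input does.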

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Reverse relationship index | `DashMap<EntityId, RoaringBitmap>`, updated on `write_relationship`, persisted to storage | M |
| 02 | `FilterExpr::social_graph(user_id, depth)` | New filter variant; depth-1 resolves followed creators → their items via `CreatorItemsBitmap`; depth-2 expands to engaged items of followed users | L |
| 03 | Co-engagement tracker | `CoEngagementIndex`: pairwise co-occurrence counting on positive engagement signals; bounded with LRU eviction | L |
| 04 | Wire co-engagement into `related` profile | In executor scoring, fetch co-engagement scores for all candidates relative to seed item; blend into final score | M |
| 05 | Social-graph-scoped trending | When `social_graph` filter is present with a trending profile, scope velocity reads to signals from users in the resolved subgraph | M |
| 06 | Persistence + crash recovery | Serialize reverse index and co-engagement map at checkpoint; restore on open | S |

**Depends On:** Phase 1 (cohort ledger patterns reused for social-graph signal scoping)
**Complexity:** L
**Research Reference:** `USE_CASES.md:248-270` (UC-05 collaborative filtering), `SEQUENCE.md:95-141` (UC-03 social trending), `thoughts.md:104-113` (lock-free concurrency), `docs/research/ann_for_tidaldb.md` (PinnerSage multi-query retrieval)
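
A rough sketch of the reverse index and depth-based candidate resolution, assuming followed accounts can act as both creators and engaging users. `BTreeSet` stands in for `RoaringBitmap`, and the `SocialGraph` type and its method names other than `write_relationship` are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory graph; the roadmap stores these sets as bitmaps.
#[derive(Default)]
pub struct SocialGraph {
    follows: HashMap<u64, BTreeSet<u64>>,   // user -> accounts they follow
    followers: HashMap<u64, BTreeSet<u64>>, // reverse index: account -> followers
    items_by_creator: HashMap<u64, BTreeSet<u64>>,
    engaged_by_user: HashMap<u64, BTreeSet<u64>>,
}

impl SocialGraph {
    /// Forward and reverse edges are updated together on every relationship write.
    pub fn write_relationship(&mut self, follower: u64, followee: u64) {
        self.follows.entry(follower).or_default().insert(followee);
        self.followers.entry(followee).or_default().insert(follower);
    }

    pub fn add_item(&mut self, creator: u64, item: u64) {
        self.items_by_creator.entry(creator).or_default().insert(item);
    }

    pub fn add_engagement(&mut self, user: u64, item: u64) {
        self.engaged_by_user.entry(user).or_default().insert(item);
    }

    pub fn followers_of(&self, account: u64) -> BTreeSet<u64> {
        self.followers.get(&account).cloned().unwrap_or_default()
    }

    /// depth 1: items created by followed accounts.
    /// depth 2: additionally, items those accounts engaged with.
    pub fn candidates(&self, user: u64, depth: u8) -> BTreeSet<u64> {
        let mut out = BTreeSet::new();
        if let Some(followed) = self.follows.get(&user) {
            for account in followed {
                if let Some(items) = self.items_by_creator.get(account) {
                    out.extend(items);
                }
                if depth >= 2 {
                    if let Some(items) = self.engaged_by_user.get(account) {
                        out.extend(items);
                    }
                }
            }
        }
        out
    }
}
```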

---

#### Phase 3: Full Sort Mode Coverage + Live Content + Engagement Filters (m6p3)

**Delivers:** All missing `Sort` variants, `viewer_count` signal for live content, engagement threshold filters, geographic post-filter, `live` built-in profile.

**Acceptance Criteria:**

- [x] `Sort` enum extended with: `AlphabeticalAsc`, `AlphabeticalDesc`, `Shortest`, `Longest`, `MostCommented { window: Window }`, `MostShared { window: Window }`, `LiveViewerCount`, `DateSaved`
- [x] `AlphabeticalAsc` / `AlphabeticalDesc`: sort by item "title" metadata field, case-insensitive, with missing-title items last
- [x] `Shortest` / `Longest`: sort by item "duration" metadata field in seconds; items without duration last
- [x] `MostCommented` / `MostShared`: sort by windowed count of "comment" / "share" signal, following the same pattern as existing `MostViewed` / `MostLiked`
- [x] `LiveViewerCount`: sort by current decayed score of "viewer_count" signal; items without the signal score 0.0
- [x] `DateSaved`: sort by timestamp when the querying user saved the item (from `UserStateIndex`); requires `for_user`; returns `TidalError::Query` if absent
- [x] `viewer_count` signal type pre-registered in default schema: exponential decay with 5-minute half-life, no windowed aggregation, no velocity (represents current concurrent viewer count)
- [x] `live` built-in profile registered: `CandidateStrategy::Scan`, `Sort::LiveViewerCount`, relationship_weight boost from querying user's follows, diversity max_per_creator:1
- [x] `FilterExpr::MinSignal { signal: String, threshold: f64 }` and `FilterExpr::MaxSignal { signal: String, threshold: f64 }` evaluate AllTime windowed count against a threshold
- [x] `FilterExpr::NearLocation { lat: f64, lng: f64, radius_km: f64 }` evaluates Haversine distance against item "latitude" / "longitude" metadata; evaluated as a post-filter (not index-backed)
- [x] All 20 Sort enum variants (covering all 27 Appendix B sort modes via window parameterization) have at least one unit test proving correct ordering semantics
- [x] Performance: metadata-based sorts (alphabetical, duration) complete in < 50ms at 500 items
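
The `NearLocation` post-filter relies on the Haversine great-circle distance. A minimal sketch follows, assuming a mean Earth radius of 6371 km and assuming that items without coordinates are excluded; the function names are illustrative.

```rust
/// Mean Earth radius in kilometres (assumed constant for this sketch).
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance between two (lat, lng) points given in degrees.
pub fn haversine_km(lat1: f64, lng1: f64, lat2: f64, lng2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lng2 - lng1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    // Clamp before asin to guard against rounding slightly above 1.0.
    2.0 * EARTH_RADIUS_KM * a.sqrt().min(1.0).asin()
}

/// Post-filter: keep an item only if it has coordinates and lies within the radius.
pub fn near_location(item: Option<(f64, f64)>, lat: f64, lng: f64, radius_km: f64) -> bool {
    match item {
        Some((item_lat, item_lng)) => haversine_km(lat, lng, item_lat, item_lng) <= radius_km,
        None => false,
    }
}
```

Because the check runs per candidate after other filters, its cost scales with the surviving candidate count rather than catalog size.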

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Extend `Sort` enum with 8 new variants | Add enum arms to `ranking/profile.rs` | S |
| 02 | Metadata-based sort scoring | Case-insensitive title sort; duration-in-seconds parse; missing-field handling | M |
| 03 | Signal-count sort variants | Wire windowed count reads for "comment"/"share"; LiveViewerCount reads decay score of "viewer_count" | M |
| 04 | `DateSaved` sort | Read save timestamp from `UserStateIndex` per candidate; requires `for_user`; sort descending | M |
| 05 | `viewer_count` signal + `live` builtin | Register signal in default schema; register `live` profile | S |
| 06 | Engagement threshold filters | `MinSignal` / `MaxSignal` filter variants with AllTime windowed count evaluation | M |
| 07 | Geographic filter | `NearLocation` filter variant with Haversine distance post-filter | S |
| 08 | Unit + integration tests | One test per Sort variant; tests for engagement and geographic filters | M |

**Depends On:** None (parallel with phases 1 and 2; touches different modules)
**Complexity:** L
**Research Reference:** `USE_CASES.md:280-336` (UC-06 sort modes), `USE_CASES.md:475-501` (UC-12 live content), `USE_CASES.md:594-696` (Appendix A filters), `API.md:1001-1035` (Sort enum spec)
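
The "missing-title items last" rule applies to both alphabetical directions, so the descending sort cannot be a plain reversal of the ascending one. A small sketch of comparators with that property, using hypothetical function names and `Option<String>` in place of the real metadata lookup:

```rust
use std::cmp::Ordering;

/// Case-insensitive ascending order; a missing title sorts after every present title.
pub fn cmp_title_asc(a: Option<&str>, b: Option<&str>) -> Ordering {
    match (a, b) {
        (Some(x), Some(y)) => x.to_lowercase().cmp(&y.to_lowercase()),
        (Some(_), None) => Ordering::Less,
        (None, Some(_)) => Ordering::Greater,
        (None, None) => Ordering::Equal,
    }
}

/// Descending order reverses present titles only; missing titles still sort last.
pub fn cmp_title_desc(a: Option<&str>, b: Option<&str>) -> Ordering {
    match (a, b) {
        (Some(_), Some(_)) => cmp_title_asc(a, b).reverse(),
        _ => cmp_title_asc(a, b),
    }
}

/// Stable sort of titles in place.
pub fn sort_titles(titles: &mut [Option<String>], descending: bool) {
    titles.sort_by(|a, b| {
        let (a, b) = (a.as_deref(), b.as_deref());
        if descending { cmp_title_desc(a, b) } else { cmp_title_asc(a, b) }
    });
}
```

The same pattern would apply to `Shortest` / `Longest` with a numeric duration in place of the lowercase title.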

---

#### Phase 4: User Collections + Watch History + Saved Searches (m6p4)

**Delivers:** `Collection` entity type, collection CRUD API, `in_collection` filter, `in_progress` user state filter, saved searches as persistent feeds, cross-session preference aggregation onto global user vector.

**Acceptance Criteria:**

- [x] `db.create_collection(owner: EntityId, name: &str, visibility: Visibility) -> Result<CollectionId>` creates a named collection; `Visibility` is `Private`, `Shared`, `Public`
- [x] `db.add_to_collection(collection_id: &CollectionId, item_id: EntityId) -> Result<()>` adds an item; idempotent (adding the same item twice is not an error)
- [x] `db.remove_from_collection(collection_id: &CollectionId, item_id: EntityId) -> Result<()>` removes an item
- [x] `db.list_collections(owner: EntityId) -> Result<Vec<CollectionInfo>>` returns the user's collections
- [x] Item membership stored as `DashMap<CollectionId, RoaringBitmap>` for O(1) membership check; persisted to fjall
- [x] `FilterExpr::in_collection(collection_id)` constrains candidates to the collection's `RoaringBitmap`
- [x] `FilterExpr::user_state("in_progress")` returns items where the user has a `partial_completion` or `completion` signal with weight < 0.9; scans user state for items matching this predicate
- [x] `db.save_search(user, name, query_text, filters) -> Result<()>` persists a search configuration; `db.list_saved_searches(user) -> Result<Vec<SavedSearchInfo>>`; `db.retrieve_saved_search(user, name, since) -> Result<SearchResults>` re-executes the search with `created_after: since`; `db.delete_saved_search(user, name) -> Result<()>`
- [x] Cross-session preference aggregation: on `close_session`, session-level preference hints are merged into the user's global preference vector with a configurable damping factor (default: 0.1 × session hint weight); closes the M4 deferred item ("session signal influence on global user preference vector")
- [x] Collections and saved searches survive crash and restart
- [x] Performance: `in_collection` filter with 100 items completes in < 10ms
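
Idempotent membership and the `in_collection` filter can be sketched with a set per collection. `BTreeSet` and `HashMap` stand in for `RoaringBitmap` and `DashMap`, collection IDs are plain `u64` values here, and the `CollectionIndex` method names are assumptions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Collection id -> member item ids.
#[derive(Default)]
pub struct CollectionIndex {
    members: HashMap<u64, BTreeSet<u64>>,
}

impl CollectionIndex {
    /// Idempotent: adding an item that is already present changes nothing.
    pub fn add(&mut self, collection: u64, item: u64) {
        self.members.entry(collection).or_default().insert(item);
    }

    /// Removing an absent item is a no-op.
    pub fn remove(&mut self, collection: u64, item: u64) {
        if let Some(set) = self.members.get_mut(&collection) {
            set.remove(&item);
        }
    }

    /// `in_collection` filter: keep only member candidates, preserving input order.
    pub fn filter(&self, collection: u64, candidates: &[u64]) -> Vec<u64> {
        match self.members.get(&collection) {
            Some(set) => candidates.iter().copied().filter(|id| set.contains(id)).collect(),
            None => Vec::new(),
        }
    }
}
```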

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Collection storage model | `Collection` struct, `CollectionId` newtype, `Visibility` enum; item membership `DashMap<CollectionId, RoaringBitmap>`; persisted to fjall | M |
| 02 | Collection CRUD API | `create_collection`, `add_to_collection`, `remove_from_collection`, `list_collections` on `TidalDb`; idempotent add | M |
| 03 | `FilterExpr::in_collection` | New filter variant; evaluator checks membership in the bitmap | S |
| 04 | `FilterExpr::user_state("in_progress")` | Extend user state filter: detect partial completion from user state index | M |
| 05 | Saved search storage + CRUD | `SavedSearch` struct in users keyspace; `save_search`, `list_saved_searches`, `delete_saved_search`, `retrieve_saved_search` | M |
| 06 | Cross-session preference aggregation | On `close_session`, extract preference delta from `SessionSnapshot`, apply to global preference vector with damping | M |
| 07 | Persistence + integration tests | Collections and saved searches survive restart; CRUD tests; `in_progress` filter test | M |

**Depends On:** None (parallel with phases 1-3; operates on different storage surfaces)
**Complexity:** L
**Research Reference:** `USE_CASES.md:381-421` (UC-09), `API.md:1196-1210` (Collections), `API.md:1177-1192` (Saved Searches), `VISION.md:52-53` (session→global preference boundary)
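
The damped merge on `close_session` (task 06) could look like the sketch below. It assumes an additive update where each component moves by `damping * hint_weight * hint[i]`; the roadmap does not specify whether the result is re-normalized, so none is applied here, and the function name is hypothetical.

```rust
/// Merge a closed session's preference hint into the global preference vector.
/// Additive merge without re-normalization is an assumption of this sketch.
pub fn merge_session_hint(global: &mut [f32], hint: &[f32], hint_weight: f32, damping: f32) {
    assert_eq!(global.len(), hint.len(), "vectors must share a dimension");
    let step = damping * hint_weight;
    for (g, h) in global.iter_mut().zip(hint) {
        *g += step * h;
    }
}
```

With the default damping of 0.1, a single session can nudge the global vector but cannot overwrite it, which keeps short-lived session intent from dominating long-term preferences.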

---

#### Phase 5: Query Composition + SUGGEST Autocomplete (m6p5)

**Delivers:** `WithinScope` on SEARCH queries (Trending, CohortTrending, Following, Category, Collection), `db.suggest()` autocomplete with prefix match and trending searches.

**Acceptance Criteria:**

- [x] `SearchBuilder::within(WithinScope)` constrains the candidate set before BM25+ANN retrieval; the pre-filter produces a `RoaringBitmap` passed to both Tantivy (post-filter) and USearch (predicate callback)
- [x] `WithinScope::Trending { window_hours: u64 }` -- candidates are items with view+share velocity above the p75 threshold in the specified window; computed from the global signal ledger
- [x] `WithinScope::CohortTrending { cohort: String, window_hours: u64 }` -- candidates are items with cohort-scoped velocity above threshold; requires cohort to be defined (`TidalError::NotFound` otherwise)
- [x] `WithinScope::Following` -- candidates are items from creators the querying user follows; requires `for_user` (`TidalError::Query` otherwise)
- [x] `WithinScope::Category { name: String }` -- candidates are items matching the category metadata value
- [x] `WithinScope::Collection { id: CollectionId }` -- candidates are items in the specified collection's bitmap
- [x] `db.suggest(Suggest { prefix: String, for_user: Option<EntityId>, limit: u32 }) -> Result<Vec<Suggestion>>`
- [x] Prefix autocomplete: `SuggestionIndex` maintains a sorted `Vec<String>` of terms extracted from item titles at write time; binary search on prefix; updated incrementally on `write_item_with_metadata`
- [x] Trending searches: when prefix is empty, returns top N search terms by recent query frequency; tracked via `DashMap<String, AtomicU64>` incremented on each `db.search()` call; periodic pruning of stale entries
- [x] Performance: SEARCH with `WithinScope` completes in < 50ms; SUGGEST completes in < 20ms
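
Prefix autocomplete over a sorted `Vec<String>` reduces to one binary search followed by a short forward scan. The sketch below is a stand-in for `SuggestionIndex`; lowercasing terms on insert and the method names are assumptions.

```rust
/// Sorted, de-duplicated, lowercased terms.
#[derive(Default)]
pub struct SuggestionIndex {
    terms: Vec<String>,
}

impl SuggestionIndex {
    /// Incremental insert that keeps `terms` sorted and free of duplicates.
    pub fn insert(&mut self, term: &str) {
        let term = term.to_lowercase();
        if let Err(pos) = self.terms.binary_search(&term) {
            self.terms.insert(pos, term);
        }
    }

    /// Binary-search to the first term >= prefix, then take terms while the
    /// prefix still matches, up to `limit`.
    pub fn suggest(&self, prefix: &str, limit: usize) -> Vec<&str> {
        let prefix = prefix.to_lowercase();
        let start = self.terms.partition_point(|t| t.as_str() < prefix.as_str());
        self.terms[start..]
            .iter()
            .take_while(|t| t.starts_with(prefix.as_str()))
            .take(limit)
            .map(|t| t.as_str())
            .collect()
    }
}
```

Lookup is O(log n) plus the number of matches returned; insertion into the middle of a `Vec` is O(n), which is acceptable at the 500-item test scale.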

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | `WithinScope` enum + `SearchBuilder::within()` | Define enum with 5 variants; add field to `Search` struct | S |
| 02 | `ScopeResolver` -- scope to bitmap | Converts `WithinScope` into `RoaringBitmap`: Trending/CohortTrending via velocity scan, Following via FollowsBitmap, Category via bitmap index, Collection via collection bitmap | L |
| 03 | Wire scope bitmap into `SearchExecutor` | Pass scope bitmap to BM25 (Tantivy post-filter) and ANN (USearch predicate callback) before candidate scoring | L |
| 04 | `SuggestionIndex` + prefix autocomplete | Sorted `Vec<String>` of title terms; incremental updates on `write_item`; binary search on prefix match | M |
| 05 | Trending search frequency counter | `DashMap<String, AtomicU64>` incremented on `db.search()`; `suggest(prefix="")` returns top N | M |
| 06 | `db.suggest()` public API | Delegates to `SuggestionIndex` for prefix; trending counter for empty prefix; optional interest-category boost when `for_user` is set | S |
| 07 | Integration tests | Each `WithinScope` variant tested independently; composition test (search within cohort trending); suggest with prefix and empty prefix | M |

**Depends On:** Phase 1 (CohortTrending scope needs cohort ledger), Phase 4 (Collection scope needs collection bitmap)
**Complexity:** L
**Research Reference:** `USE_CASES.md:148-174` (UC-02.3-5), `API.md:822-895` (WithinScope + SUGGEST spec), `SEQUENCE.md:306-347` (search within cohort trending)
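
The trending-search counter from task 05 can be approximated with a frequency map and a top-N read. This sketch uses a single-threaded `HashMap<String, u64>` instead of `DashMap<String, AtomicU64>`, normalizes queries by trimming and lowercasing (an assumption), and omits stale-entry pruning.

```rust
use std::collections::HashMap;

/// Query frequency counter backing `suggest` with an empty prefix.
#[derive(Default)]
pub struct TrendingSearches {
    counts: HashMap<String, u64>,
}

impl TrendingSearches {
    /// Called once per search; blank queries are ignored.
    pub fn record(&mut self, query: &str) {
        let key = query.trim().to_lowercase();
        if !key.is_empty() {
            *self.counts.entry(key).or_insert(0) += 1;
        }
    }

    /// Top `n` queries by count; ties break alphabetically for deterministic output.
    pub fn top(&self, n: usize) -> Vec<(&str, u64)> {
        let mut all: Vec<(&str, u64)> =
            self.counts.iter().map(|(k, v)| (k.as_str(), *v)).collect();
        all.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
        all.truncate(n);
        all
    }
}
```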

---

#### Phase 6: Notification Capping + Adaptive Preferences + Creator Profile Modes + M6 UAT (m6p6)

**Delivers:** Notification frequency capping, adaptive preference learning rate, `for_creator` query constraint for creator profile pages, comprehensive M6 UAT test suite proving all 15 use cases.

**Acceptance Criteria:**

- [x] `NotificationCaps { max_per_creator_per_day: u32, max_total_per_day: u32 }` type defined; `RetrieveBuilder::notification_caps(caps)` adds it to the query
- [x] Notification caps enforced as a post-diversity pass: count results per creator (using a `HashMap<EntityId, u32>`) and cap at `max_per_creator_per_day`; cap total results at `max_total_per_day`
- [x] Per-creator notification tracking: `DashMap<(EntityId, EntityId, NaiveDate), u32>` counting notifications delivered per `(user, creator, date)`; updated after each `notification`-profile RETRIEVE; reset implied by date key expiry
- [x] Adaptive preference learning rate: EMA alpha decays logarithmically with update count per user: `alpha = base_alpha / (1 + ln(update_count + 1))`; `base_alpha` configurable in schema (default: 0.1); update count tracked in `UserStateIndex` alongside preference vector
- [x] After 1000 preference updates, new signals shift the vector < 5% as much as the first signal -- verified by a unit test comparing shift magnitude at update counts 1, 100, 1000
- [x] `RetrieveBuilder::for_creator(creator_id: EntityId)` adds `FilterExpr::eq("creator_id", creator_id.to_string())` and restricts candidate generation to items from that creator via `CreatorItemsBitmap`
- [x] Creator profile page modes verified: `for_creator(x)` + `for_you` profile returns creator x's items ranked by the querying user's preferences; `for_creator(x)` + `hot` returns hot-sorted items within x's catalog
- [x] M6 UAT integration test (`tidal/tests/m6_uat.rs`): all 9 UAT steps from the scenario above, each as a separate `#[test]` function; data setup shared via `setup_m6_test_db()`
- [x] All prior milestone integration tests (m2_uat, m3_uat, m4_uat, m5_uat, m5_search, m5p4_creator_search) continue to pass
- [x] Total of 25 built-in profiles: 16 existing + `cohort_trending` (m6p1) + `live` (m6p3) + 7 sort-mode profiles added in m6p3 (`alphabetical_asc`, `alphabetical_desc`, `shortest`, `longest`, `most_commented`, `most_shared`, `date_saved`)
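
The post-diversity cap pass described above can be sketched as a single scan over the ranked results. This example covers only the per-query counting with a `HashMap`; carrying over counts already delivered earlier in the day (the `DashMap` tracker) is not modeled, and `apply_caps` is a hypothetical name.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy)]
pub struct NotificationCaps {
    pub max_per_creator_per_day: u32,
    pub max_total_per_day: u32,
}

/// `ranked` holds (item_id, creator_id) pairs in final ranking order.
/// Returns the item ids that survive both caps, preserving order.
pub fn apply_caps(ranked: &[(u64, u64)], caps: NotificationCaps) -> Vec<u64> {
    let mut per_creator: HashMap<u64, u32> = HashMap::new();
    let mut kept = Vec::new();
    for &(item, creator) in ranked {
        // Stop as soon as the total cap is reached.
        if kept.len() as u32 >= caps.max_total_per_day {
            break;
        }
        let seen = per_creator.entry(creator).or_insert(0);
        if *seen < caps.max_per_creator_per_day {
            *seen += 1;
            kept.push(item);
        }
    }
    kept
}
```

Running the pass after diversity means higher-ranked items from a capped creator win, and lower-ranked items from other creators fill the remaining slots.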

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | `NotificationCaps` type + `RetrieveBuilder::notification_caps()` | Define struct; add optional field to `Retrieve` | S |
| 02 | Notification cap enforcement in executor | Post-diversity pass with per-creator and total count tracking | M |
| 03 | Adaptive preference learning rate | Modify `update_preference_vector()` to read update count; alpha formula; unit test for decay curve | M |
| 04 | `RetrieveBuilder::for_creator(creator_id)` | Convenience method adding creator filter + `CreatorItemsBitmap` restriction | S |
| 05 | Creator profile mode integration tests | Verify `for_creator` + `for_you` and `for_creator` + `hot` produce correct scoped rankings | S |
| 06 | M6 UAT test suite | `tidal/tests/m6_uat.rs`: 9 test functions covering all UAT steps; shared `setup_m6_test_db()` fixture; all 15 UCs exercised | XL |

**Depends On:** All prior M6 phases (UAT exercises everything built in m6p1-m6p5)
**Complexity:** L
**Research Reference:** `USE_CASES.md:339-362` (UC-07 notification caps), `USE_CASES.md:366-378` (UC-08 creator profile modes), `VISION.md:43-44` (user preferences update continuously)
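
The adaptive learning rate formula from the acceptance criteria is easy to evaluate directly. The sketch below implements the stated `alpha` formula and one EMA step; the function names and the use of `f64` slices for the preference vector are assumptions.

```rust
/// alpha = base_alpha / (1 + ln(update_count + 1)), as stated in the criteria.
pub fn adaptive_alpha(base_alpha: f64, update_count: u64) -> f64 {
    base_alpha / (1.0 + ((update_count + 1) as f64).ln())
}

/// One EMA step toward `signal`; returns the L2 norm of the shift applied.
pub fn update_preference(vector: &mut [f64], signal: &[f64], alpha: f64) -> f64 {
    assert_eq!(vector.len(), signal.len(), "vectors must share a dimension");
    let mut squared = 0.0;
    for (v, s) in vector.iter_mut().zip(signal) {
        let delta = alpha * (s - *v);
        *v += delta;
        squared += delta * delta;
    }
    squared.sqrt()
}
```

With `base_alpha = 0.1`, alpha is 0.1 at update count 0 and roughly 0.0126 at update count 1000. The observed shift magnitude also scales with how far the incoming signal is from the current vector, so a shift comparison across update counts depends on both factors.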

---

### Phase Dependency DAG

```
m6p1 (Cohort Engine)
    |          \
    v           \
m6p2 (Social     m6p3 (Sort + Live)    m6p4 (Collections)
  Graph)                |                      |
    |                   |                      |
    +-------------------+----------+-----------+
                                   |
                              m6p5 (Query Composition + SUGGEST)
                                   |
                              m6p6 (Notification Caps + M6 UAT)
```

- **m6p1** is the cohort foundation used by m6p2 (social-graph signal scoping patterns) and m6p5 (CohortTrending scope)
- **m6p3, m6p4** have no dependencies and can execute in parallel with m6p1 and with each other -- they touch different modules; **m6p2** starts once the m6p1 cohort ledger patterns it reuses are in place
- **m6p5** requires m6p1 (CohortTrending scope) and m6p4 (Collection scope) to be complete
- **m6p6** requires all prior phases complete (UAT exercises everything)

### Deferred to Later Milestones

- **Topic-cluster notification capping (UC-07.3)** -- capping notifications per topic cluster per batch is not implemented. It requires a cluster assignment data structure (topic embeddings → cluster ID) and a per-cluster counter alongside the per-creator counter in `NotificationTracker`. The per-creator and per-user total caps implemented in m6p6 cover the primary UC-07 need (planned M7)
- **"Did you mean" typo correction (UC-02.3)** -- requires edit-distance automata over the Tantivy term dictionary; prefix autocomplete covers the primary use case for M6 (planned M7)
- **Personalized suggestions from full search history** -- tracking per-user query history as a signal stream and using it for SUGGEST personalization beyond interest-category boosting (planned M7)
- **Collaborative collections (multi-user boards)** -- multi-user write access requires access control beyond owner-only; single-owner collections ship in M6 (planned M7)
- **Visual search / crop-and-search (UC-06.4, UC-11.1)** -- requires image segmentation and region embedding, which is embedding generation; out of scope per VISION.md ("tidalDB does not generate embeddings")
- **Mood/aesthetic embedding regions (UC-06.3)** -- requires application-provided mood anchor embeddings to define regions; database infrastructure exists but semantic regions must come from the application
- **Signal rollups (hourly/daily materialization for 30d+ windows)** -- build only if 500-item benchmarks show bucketed counters exceeding the 50ms budget; not required for M6 test scale (planned M7)
- **Multi-vector user interest clustering (PinnerSage)** -- single preference vector serves through M6; multi-vector clustering adds a new data structure and requires offline training (planned M7+)
- **Search result explanation ("why this result?")** -- Tantivy provides `Query::explain()` per document but it is expensive at query time; useful for debugging tools, not production serving (planned M7)
- **Cross-session aggregation dashboards** -- the preference merging on session close (m6p4 task 06) closes the correctness gap; a full "what did my agents learn this week?" analytics API requires materialization over closed session archives (planned M7)
- **Horizontal distribution / partitioned keyspaces** -- the key encoding and WAL format are partitioning-ready; actual multi-node deployment is M8

### Integration Test

```rust
// tidal/tests/m6_uat.rs

fn setup_m6_test_db() -> TidalDb {
    // 500 items, 50 creators, 20 users, 2 cohorts, 3 collections, 5 live items
    // ... (full setup in tests/m6_uat.rs)
}

#[test]
fn uat_step_1_cohort_scoped_trending() {
    let db = setup_m6_test_db();
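    // US young music users engage heavily with items 1-10.
    // `_user` is unused in this excerpt: attaching each user's context to the
    // signal (which cohort attribution requires) is elided here.
    for _user in us_young_music_users() {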
        for item_id in 1u64..=10 {
            db.signal("view", EntityId::new(item_id), 1.0, Timestamp::now()).unwrap();
            db.signal("share", EntityId::new(item_id), 1.0, Timestamp::now()).unwrap();
        }
    }
    let cohort_trending = db.retrieve(
        &Retrieve::builder().profile("cohort_trending")
            .cohort("us_young_music").limit(10).build().unwrap()
    ).unwrap();
    let global_trending = db.retrieve(
        &Retrieve::builder().profile("trending").limit(10).build().unwrap()
    ).unwrap();
    let cohort_ids: Vec<u64> = cohort_trending.results.iter().map(|r| r.id.raw()).collect();
    let global_ids: Vec<u64> = global_trending.results.iter().map(|r| r.id.raw()).collect();
    assert_ne!(cohort_ids, global_ids, "cohort trending must differ from global trending");
    assert!(cohort_ids.iter().all(|&id| id <= 10),
        "cohort trending should reflect US music engagement, got: {:?}", cohort_ids);
}

#[test]
fn uat_step_3_sort_modes() {
    let db = setup_m6_test_db();
    let alpha = db.retrieve(
        &Retrieve::builder().sort(Sort::AlphabeticalAsc).limit(20).build().unwrap()
    ).unwrap();
    assert!(alpha.results.windows(2).all(|w|
        w[0].metadata.get("title").unwrap_or(&String::new()).to_lowercase()
        <= w[1].metadata.get("title").unwrap_or(&String::new()).to_lowercase()
    ));
    let live = db.retrieve(
        &Retrieve::builder()
            .filter(FilterExpr::eq("status", "live"))
            .sort(Sort::LiveViewerCount)
            .limit(5).build().unwrap()
    ).unwrap();
    assert!(live.results.windows(2).all(|w| w[0].score >= w[1].score));
}

#[test]
fn uat_step_4_collections() {
    let db = setup_m6_test_db();
    let user_a = EntityId::new(1001);
    let coll = db.create_collection(user_a, "jazz_faves", Visibility::Private).unwrap();
    db.add_to_collection(&coll, EntityId::new(1)).unwrap();
    db.add_to_collection(&coll, EntityId::new(2)).unwrap();
    let results = db.retrieve(
        &Retrieve::builder()
            .for_user(user_a)
            .filter(FilterExpr::in_collection(&coll))
            .limit(10).build().unwrap()
    ).unwrap();
    assert_eq!(results.results.len(), 2);
}

#[test]
fn uat_step_7_search_within_trending() {
    let db = setup_m6_test_db();
    // generate trending jazz items...
    let results = db.search(
        &Search::builder()
            .query("jazz")
            .within(WithinScope::Trending { window_hours: 24 })
            .limit(10).build().unwrap()
    ).unwrap();
    assert!(!results.items.is_empty());
    assert!(results.items.iter().all(|r| r.bm25_score.unwrap_or(0.0) > 0.0));
}

#[test]
fn uat_step_8_suggest() {
    let db = setup_m6_test_db();
    let suggestions = db.suggest(&Suggest {
        prefix: "jazz".into(),
        for_user: None,
        limit: 5,
    }).unwrap();
    assert!(!suggestions.is_empty());
    assert!(suggestions.iter().all(|s| s.text.to_lowercase().starts_with("jazz")));
    let trending = db.suggest(&Suggest {
        prefix: "".into(),
        for_user: None,
        limit: 5,
    }).unwrap();
    assert!(!trending.is_empty(), "empty prefix must return trending searches");
}
```

### Done When

A developer can embed tidalDB, define 2 cohorts, write 500 items with metadata and embeddings across 50 creators, register 20 users with demographic attributes, build a relationship graph, create user collections, mark items as live, record engagement signals, and then verify all 9 UAT steps pass:

1. Cohort-scoped trending returns items trending within the cohort -- distinct from global trending
2. Social-graph-scoped trending returns items engaged by the user's follow graph
3. All 20 Sort enum variants (including AlphabeticalAsc, Shortest, LiveViewerCount, DateSaved) produce correctly ordered results
4. Collection CRUD and `in_collection` filter work end-to-end; `in_progress` returns partially-watched items
5. Live content filters by `status=live` and sorts by `viewer_count` signal
6. Notification caps enforce per-creator and total daily limits
7. SEARCH with `WithinScope` (Trending, CohortTrending, Following, Collection) correctly intersects scope with text+vector retrieval
8. SUGGEST returns prefix completions and trending searches in < 20ms
9. Related profile incorporates co-engagement alongside embedding similarity

All prior milestone tests (m2_uat, m3_uat, m4_uat, m5_uat) continue to pass. Every query at the 500-item test scale completes in under 50ms. UC-01 through UC-15 are verifiable end-to-end.

---

## Milestone 7: Production Hardening -- "Ready for real workloads"

### Milestone Thesis

M6 proved that tidalDB can handle every discovery surface, sort mode, filter, and feedback loop. But "feature-complete" is not "production-ready." A skeptical SRE can still say: "Sure, it handles 500 items in a test. What happens at 1M items when the process crashes mid-checkpoint? What happens under 3x the expected load? How do I know the WAL is healthy?" M7 answers each of those questions. After M7, tidalDB can be embedded in a production application and operated with confidence -- crash recovery is correct and fast, graceful degradation works under load, performance meets targets at 1M+ items, abandoned sessions are cleaned up, rate limiting protects against runaway agents, and operational visibility exists. The database is trustworthy.

### UAT Scenario

```
Given:
  A tidalDB instance with:
    - 1,000,000 items, 100,000 users, 10,000 creators
    - 10 signal types with 5 windows each
    - 2 cohorts with materialized signal aggregation
    - 50 active agent sessions with policies
    - Sustained write load: 10,000 signal events/second
    - Concurrent read load: 1,000 RETRIEVE queries/second

When:
  1. Run full workload for 1 hour
  2. Kill the process at a random point (mid-checkpoint, mid-WAL-write,
     mid-signal-aggregation)
  3. Restart and measure recovery time
  4. Verify no data loss and no inconsistency: no phantom items, no lost
     signals, no inconsistent cohort aggregates, no orphaned collections
  5. Verify abandoned sessions (>2h old) have been cleaned up
  6. Run workload at 3x expected load (30K signals/sec, 3K queries/sec)
  7. Verify graceful degradation (reduced precision, not errors)
  8. Inject a runaway agent writing 500 signals/sec to a single session
  9. Verify per-agent rate limiting rejects excess writes without affecting
     other agents
  10. Read QueryStats from results and verify timing breakdown is present
  11. Read /metrics endpoint and verify signal write latency, WAL lag,
      index health, and degradation level are all exported
  12. Run `tidalctl diagnostics --path <dir>` and verify human-readable
      health summary

Then:
  - Step 1: All queries < 50ms p99 (RETRIEVE), < 100ms p99 (SEARCH),
    all signal writes < 100us amortized
  - Step 3: Recovery time < 30 seconds (1M items checkpoint + 5min WAL backlog)
  - Step 4: WAL replay produces state identical to pre-crash; checkpoint
    integrity verified via BLAKE3; cohort ledger, collection index, and
    co-engagement index all recovered correctly; hard negatives (hidden items,
    blocked creators) never leak after any crash scenario
  - Step 5: Sessions exceeding max_session_duration are auto-closed with
    summary archived; sweeper runs every 60 seconds
  - Steps 6-7: Under overload, tidalDB reduces candidate set size, uses
    coarser aggregates, skips diversity -- but never returns errors for
    well-formed queries; degradation level exposed in query response
  - Steps 8-9: An agent exceeding its configured rate limit gets TidalError::RateLimited
    (rate limiting is opt-in; unlimited by default; configure via
    `RateLimiterConfig::limited(rate, burst)` in builder); other agents unaffected;
    rate limit tracked per (agent_id, session_id)
  - Step 10: QueryStats includes candidates_considered, scoring_time_us,
    total_time_us, degradation_level, filters_applied
  - Step 11: Prometheus metrics include tidaldb_wal_lag_bytes,
    tidaldb_checkpoint_age_seconds, tidaldb_signal_hot_entries,
    tidaldb_tantivy_segment_count, tidaldb_usearch_index_size,
    tidaldb_degradation_level
  - Step 12: tidalctl diagnostics prints WAL state, checkpoint age, signal
    state size, index sizes, session count, degradation level
```
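
Steps 8-9 describe per-agent rate limiting keyed by `(agent_id, session_id)`. One common way to realize a `(rate, burst)` configuration is a token bucket, sketched below. The token-bucket choice, the `RateLimiter` type, and `try_acquire` are assumptions for illustration; only `RateLimiterConfig::limited(rate, burst)` is named in the scenario.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy)]
pub struct RateLimiterConfig {
    rate_per_sec: f64,
    burst: f64,
}

impl RateLimiterConfig {
    pub fn limited(rate_per_sec: f64, burst: f64) -> Self {
        Self { rate_per_sec, burst }
    }
}

struct Bucket {
    tokens: f64,
    last_secs: f64,
}

/// One token bucket per (agent_id, session_id), so agents cannot starve each other.
pub struct RateLimiter {
    config: RateLimiterConfig,
    buckets: HashMap<(u64, u64), Bucket>,
}

impl RateLimiter {
    pub fn new(config: RateLimiterConfig) -> Self {
        Self { config, buckets: HashMap::new() }
    }

    /// Returns true if the write is admitted. A `false` result would map to
    /// `TidalError::RateLimited` in the scenario above.
    pub fn try_acquire(&mut self, agent: u64, session: u64, now_secs: f64) -> bool {
        let cfg = self.config;
        let bucket = self
            .buckets
            .entry((agent, session))
            .or_insert(Bucket { tokens: cfg.burst, last_secs: now_secs });
        // Refill in proportion to elapsed time, never exceeding the burst size.
        let elapsed = (now_secs - bucket.last_secs).max(0.0);
        bucket.tokens = (bucket.tokens + elapsed * cfg.rate_per_sec).min(cfg.burst);
        bucket.last_secs = now_secs;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

Keeping a separate bucket per key is what lets a runaway agent be rejected while other agents continue to write at full rate.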

### Phases

---

#### Phase 1: Crash Recovery Hardening (m7p1)

**Delivers:** Fault injection test harness, WAL compaction, checkpoint integrity verification, recovery time measurement, and crash fencing for all M6 state surfaces. Every write-path stage tested for crash safety. Recovery < 30 seconds guaranteed at 1M items.

**Acceptance Criteria:**

- [x] Fault injection harness: `CrashPoint` enum covering WAL pre-write, WAL post-write, checkpoint pre-flush, checkpoint post-flush, signal aggregation update, cohort ledger update, collection index update, co-engagement update; hooks are gated behind `#[cfg(test)]` conditional compilation, so they are absent from release builds
- [x] Property tests for each crash point: generate N >= 1000 random event sequences, inject a crash at a random position within the write path, restart, and verify that recovered state matches the state expected from WAL replay (decay scores to 6 decimal places, counters exactly)
- [x] WAL compaction: after successful checkpoint, WAL segments with seqno <= checkpoint seqno are atomically deleted; compaction verifies the new checkpoint is readable before deleting old segments (write-new-then-delete-old pattern)
- [x] Checkpoint integrity: `CheckpointMeta` extended with a BLAKE3 hash of the checkpoint payload; on open, hash verified before applying checkpoint state; corrupt checkpoint triggers fallback to WAL-only replay with a warning log
- [x] Recovery time < 30 seconds for 1M items checkpoint + 5 minutes of WAL backlog (Criterion benchmarked)
- [x] `tidalctl recover --path <dir> --verify-only` dry-runs WAL replay and reports: event count, last seqno, inconsistency count, estimated recovery time, without writing any state
- [x] Crash fencing for cohort state: `CohortSignalLedger` checkpoint/restore roundtrips correctly under all crash points; cohort membership cache rebuilt from user metadata on restart
- [x] Crash fencing for collection state: `CollectionIndex` persisted bitmaps survive all crash points; orphaned bitmap entries (collection deleted but bitmap persists) detected and cleaned on recovery
- [x] Crash fencing for co-engagement state: `CoEngagementIndex` recovered from checkpoint; bounded-LRU invariant preserved across restart
- [x] Crash fencing for session state: active sessions with WAL session-start but no session-close are correctly restored; signal counts and audit logs match WAL replay
- [x] No phantom items (items in index state but not in WAL replay) after any crash scenario
- [x] No lost signals (signals in WAL but missing from state after recovery) after any crash scenario
- [x] No leaked hard negatives (hidden items or blocked creators appearing in query results after crash recovery)
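
The fault injection criterion above can be sketched as follows. This is a minimal illustration, not the harness itself: the `CrashPoint` variants mirror the eight locations listed in the criteria, while the `CrashInjector` fields, `at_nth`, and `should_crash` are hypothetical names, and only the "crash at Nth write" trigger is shown.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// The eight write-path locations named in the acceptance criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrashPoint {
    WalPreWrite,
    WalPostWrite,
    CheckpointPreFlush,
    CheckpointPostFlush,
    SignalAggregationUpdate,
    CohortLedgerUpdate,
    CollectionIndexUpdate,
    CoEngagementUpdate,
}

/// Hypothetical injector: fires once, on the Nth time the armed point is reached.
pub struct CrashInjector {
    armed: CrashPoint,
    crash_on_hit: u64,
    hits: AtomicU64,
}

impl CrashInjector {
    pub fn at_nth(armed: CrashPoint, crash_on_hit: u64) -> Self {
        Self { armed, crash_on_hit, hits: AtomicU64::new(0) }
    }

    /// Called from a write-path hook. Returns true when the simulated crash
    /// should happen; a real harness would abort the write here and restart.
    pub fn should_crash(&self, point: CrashPoint) -> bool {
        if point != self.armed {
            return false;
        }
        // fetch_add returns the previous value, so +1 is this hit's ordinal.
        let hit = self.hits.fetch_add(1, Ordering::SeqCst) + 1;
        hit == self.crash_on_hit
    }
}
```

A property test would arm one point with a random N, drive a random event sequence through the write path, and compare the recovered state against WAL replay once `should_crash` returns true.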

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | `CrashPoint` enum + fault injection hooks | Test-gated hooks at 8 write-path locations; `CrashInjector` struct with configurable trigger (crash at Nth write, random probability) | M |
| 02 | Property tests for signal ledger crash points | 4 crash points (WAL pre/post, checkpoint pre/post) x 1000 random event sequences; verify decay scores match analytical formula to 6 decimal places after recovery | L |
| 03 | WAL compaction | Atomic deletion of pre-checkpoint segments; write-new-checkpoint-then-delete-old pattern; compaction after each successful periodic checkpoint | M |
| 04 | Checkpoint BLAKE3 integrity | Extend `CheckpointMeta` with 32-byte BLAKE3 hash of checkpoint payload; verify on open; fallback to WAL-only replay on corruption | M |
| 05 | Recovery time benchmark | Generate 1M-item checkpoint + 5min WAL backlog; measure cold-start to ready time; assert < 30s | S |
| 06 | `tidalctl recover --verify-only` | Dry-run WAL replay; report event count, last seqno, inconsistencies, estimated recovery time | S |
| 07 | Crash fencing for M6 state (cohort, collection, co-engagement, session) | Property tests for crash recovery of CohortSignalLedger, CollectionIndex, CoEngagementIndex, active sessions; checkpoint/restore roundtrip correctness | L |
| 08 | Hard negative crash invariant test | After any crash scenario, RETRIEVE never returns hidden items or blocked creators; 1000 random crash+restart sequences with hide/block interspersed | M |
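
The write-new-then-delete-old rule from task 03 reduces to a small eligibility check. The sketch below assumes a segment is described by the highest seqno it contains; `WalSegment` and `compactable_segments` are hypothetical names, and the actual file deletion is left out.

```rust
/// Hypothetical description of one WAL segment on disk.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WalSegment {
    pub id: u64,
    /// Highest sequence number stored in this segment (assumed layout).
    pub max_seqno: u64,
}

/// Returns the IDs of segments that may be deleted after a checkpoint.
/// Nothing is eligible unless the new checkpoint was read back successfully,
/// which is the write-new-then-delete-old ordering.
pub fn compactable_segments(
    segments: &[WalSegment],
    checkpoint_seqno: u64,
    checkpoint_verified: bool,
) -> Vec<u64> {
    if !checkpoint_verified {
        return Vec::new();
    }
    segments
        .iter()
        .filter(|s| s.max_seqno <= checkpoint_seqno)
        .map(|s| s.id)
        .collect()
}
```

Keeping the decision in a pure function makes it straightforward to property-test separately from filesystem behavior.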

**Depends On:** M6 complete
**Complexity:** XL
**Research Reference:** `docs/research/tidaldb_wal.md` (crash recovery, segment format, deduplication), `thoughts.md` Part V.5-6 (quarantine-first, group commit), `docs/research/tidaldb_signal_ledger.md` (checkpoint format, running-score formula)

---

#### Phase 2: Graceful Degradation, Rate Limiting, and Session Cleanup (m7p2)

**Delivers:** Automatic quality reduction under load pressure. 4-stage degradation order documented and enforced. Backpressure on write path. Per-agent token-bucket rate limiting. Session TTL auto-cleanup sweeper. All load behavior visible in query responses.

**Acceptance Criteria:**

- [x] `DegradationLevel` enum: `Full`, `ReducedCandidates`, `CoarseAggregates`, `NoDiversity` -- applied in this order under increasing load
- [x] Load detection: `AtomicU64` tracking in-flight query count; threshold configurable per level (defaults: 200 -> ReducedCandidates, 500 -> CoarseAggregates, 1000 -> NoDiversity)
- [x] `ReducedCandidates`: ANN `top_k` reduced from 500 to 200; BM25 candidate limit halved
- [x] `CoarseAggregates`: windowed count reads use AllTime instead of fine-grained windows for scoring; velocity reads use 24h window regardless of profile configuration
- [x] `NoDiversity`: diversity pass skipped entirely; results returned after scoring only
- [x] Under 3x overload (3000 concurrent queries), all well-formed queries return results (no `ServiceUnavailable` or panic); malformed queries still return errors
- [x] Degradation level exposed in query response: `Results.degradation_level: DegradationLevel` and `SearchResults.degradation_level: DegradationLevel`
- [x] Write backpressure: when WAL batch queue depth exceeds configurable threshold (default: 1000 pending batches), `db.signal()` returns `TidalError::Backpressure { retry_after_ms: u64 }` with exponential backoff hint
- [x] `TidalError::Backpressure` variant added with `retry_after_ms` field
- [x] Per-agent token-bucket rate limiting: `RateLimiter` struct with configurable tokens/second per `(AgentId, SessionId)` pair (default: unlimited; opt-in via `RateLimiterConfig::limited(rate, burst)` in builder); excess writes return `TidalError::RateLimited { agent_id, limit, retry_after_ms }`
- [x] `TidalError::RateLimited` variant added with `agent_id`, `limit`, and `retry_after_ms` fields
- [x] Rate limiter does not affect non-session signal writes (global `db.signal()` is not rate-limited per-agent)
- [x] Session TTL auto-cleanup sweeper: background task runs every 60 seconds; sessions exceeding `max_session_duration` are auto-closed with `SessionSummary` archived; `SessionSummary.auto_closed: bool` field added
- [x] Sweeper is cancellable on `db.close()` / `db.shutdown()`; no dangling threads after shutdown
- [x] Load test: simulate 3x overload for 60 seconds; verify all queries return results; verify degradation progression matches thresholds; verify signal writes under backpressure retry successfully after delay
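
A minimal sketch of the load detector described above, using the default thresholds from the criteria. `LoadDetector`, `QueryGuard`, `enter`, and `level_for` are hypothetical names, and treating each threshold as inclusive (`>=`) is an assumption.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DegradationLevel {
    Full,
    ReducedCandidates,
    CoarseAggregates,
    NoDiversity,
}

pub struct LoadDetector {
    in_flight: AtomicU64,
    reduced_at: u64,
    coarse_at: u64,
    no_diversity_at: u64,
}

/// RAII guard: decrements the in-flight counter when the query finishes.
pub struct QueryGuard<'a> {
    detector: &'a LoadDetector,
    pub level: DegradationLevel,
}

impl LoadDetector {
    pub fn with_defaults() -> Self {
        Self { in_flight: AtomicU64::new(0), reduced_at: 200, coarse_at: 500, no_diversity_at: 1000 }
    }

    /// Maps an in-flight query count to a level (thresholds assumed inclusive).
    pub fn level_for(&self, in_flight: u64) -> DegradationLevel {
        if in_flight >= self.no_diversity_at {
            DegradationLevel::NoDiversity
        } else if in_flight >= self.coarse_at {
            DegradationLevel::CoarseAggregates
        } else if in_flight >= self.reduced_at {
            DegradationLevel::ReducedCandidates
        } else {
            DegradationLevel::Full
        }
    }

    /// Called on query entry; the level is fixed for the lifetime of the query.
    pub fn enter(&self) -> QueryGuard<'_> {
        let now = self.in_flight.fetch_add(1, Ordering::Relaxed) + 1;
        QueryGuard { detector: self, level: self.level_for(now) }
    }

    pub fn in_flight(&self) -> u64 {
        self.in_flight.load(Ordering::Relaxed)
    }
}

impl Drop for QueryGuard<'_> {
    fn drop(&mut self) {
        self.detector.in_flight.fetch_sub(1, Ordering::Relaxed);
    }
}
```

Because the guard decrements on drop, an early return or a panic inside the executor cannot leave the counter inflated.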

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | `DegradationLevel` enum + load detector | AtomicU64 in-flight counter; threshold config; level computed on each query entry; RAII guard struct for decrement on drop | M |
| 02 | Query executor degradation branches | Wire `DegradationLevel` into `RetrieveExecutor` and `SearchExecutor`: ReducedCandidates, CoarseAggregates, NoDiversity | M |
| 03 | Degradation level in response + backpressure error | Add `degradation_level` field to `Results` and `SearchResults`; add `Backpressure` variant to `TidalError`; WAL queue depth check before enqueue | M |
| 04 | Per-agent token-bucket rate limiter | `RateLimiter` struct with `DashMap<(AgentId, SessionId), TokenBucket>`; refill rate configurable; wire into `session_signal()` write path; `TidalError::RateLimited` variant | M |
| 05 | Session TTL auto-cleanup sweeper | Background task scanning `active_sessions` every 60s; auto-close expired sessions; `auto_closed` flag on `SessionSummary`; cancellation on shutdown | M |
| 06 | Load test | Simulate 3x overload (concurrent query + write threads); verify degradation progression, backpressure behavior, rate limiting isolation, session cleanup | L |
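
The per-agent token bucket from task 04 might look like the sketch below. It is an illustration under stated assumptions: a `Mutex<HashMap>` stands in for the `DashMap` named in the task so the example needs only the standard library, the caller passes a monotonic `now` offset so the logic is deterministic in tests, and `try_acquire` returning the retry hint in milliseconds is a hypothetical shape rather than the real `TidalError::RateLimited`.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::Duration;

pub type AgentId = u64;
pub type SessionId = u64;

struct TokenBucket {
    tokens: f64,
    last_refill: Duration,
}

pub struct RateLimiter {
    rate_per_sec: f64,
    burst: f64,
    // Stand-in for DashMap<(AgentId, SessionId), TokenBucket>.
    buckets: Mutex<HashMap<(AgentId, SessionId), TokenBucket>>,
}

impl RateLimiter {
    pub fn limited(rate_per_sec: f64, burst: f64) -> Self {
        Self { rate_per_sec, burst, buckets: Mutex::new(HashMap::new()) }
    }

    /// Ok(()) when a token was taken; Err(retry_after_ms) when the bucket is empty.
    pub fn try_acquire(&self, key: (AgentId, SessionId), now: Duration) -> Result<(), u64> {
        let mut buckets = self.buckets.lock().unwrap();
        let bucket = buckets
            .entry(key)
            .or_insert(TokenBucket { tokens: self.burst, last_refill: now });

        // Refill in proportion to elapsed time, capped at the burst size.
        let elapsed = now.saturating_sub(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * self.rate_per_sec).min(self.burst);
        bucket.last_refill = bucket.last_refill.max(now);

        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            Ok(())
        } else {
            let missing = 1.0 - bucket.tokens;
            Err((missing * 1000.0 / self.rate_per_sec).ceil() as u64)
        }
    }
}
```

With a rate of 10 per second and a burst of 10, fifty back-to-back writes from one `(agent_id, session_id)` pair admit ten and reject forty, while a different pair draws from its own bucket and is unaffected.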

**Depends On:** m7p1 (stable crash recovery before load testing)
**Complexity:** L
**Research Reference:** `thoughts.md` Part V (graceful degradation), `VISION.md` design principles, M4 deferrals (per-agent QPS rate limiting, session TTL auto-cleanup)

---

#### Phase 3: Performance at Scale (m7p3)

**Delivers:** Benchmarks and optimization at 1M items, 100K users, 10K creators. USearch parameter tuning. Tantivy segment management. Signal state memory footprint optimization. Signal rollups for 30d+ windows if bucketed counters exceed the 50ms budget at scale.

**Acceptance Criteria:**

- [x] Criterion benchmark suite at 1M items: RETRIEVE (for_you profile) p99 < 50ms, SEARCH (hybrid BM25+ANN) p99 < 100ms, signal write p99 < 100us amortized
- [x] USearch index tuning: M={8,16,32} and ef_construction={100,200,400} benchmarked at 1M vectors; optimal config documented and applied; ANN recall@10 > 0.95 within latency budget
- [x] Tantivy segment management: `LogMergePolicy` tuned for 1M docs; segment count < 20 at steady state after 1M document indexing; background merge verified to not block foreground reads (concurrent read/write benchmark)
- [x] Signal state memory footprint measured and documented: bytes per hot entry at 1M items x 10 signal types x 5 windows; total footprint < 10 GB; if footprint exceeds budget, implement signal state trimming (evict entries with no signal activity in the last 30 days)
- [x] Signal rollup evaluation: benchmark bucketed counters at 1M items for 30d windows; if p99 windowed-count read exceeds 50ms, implement hourly rollup materialization (background thread computes hourly aggregates, stores under `Tag::Rollup` key prefix, merge with live counters at read time); if p99 is within budget, document the result and defer rollups
- [x] Profile execution path profiled with `cargo flamegraph` or equivalent; top-3 hotspots documented; any hotspot representing > 10% of total RETRIEVE time optimized or documented with rationale for deferral
- [x] CoEngagementIndex LRU eviction verified at capacity: insert 2x capacity, verify memory stays bounded; verify evicted entries are the least-recently-accessed
- [x] Cross-session preference aggregation verified at scale: 100K users with 10 closed sessions each; preference vector merge completes within the close_session latency budget (< 1ms per merge)
- [x] `tidal/benches/social.rs` extended (or new benchmark) covering 1M-item RETRIEVE with social graph filter, cohort-scoped trending, and collection filter
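
The memory footprint criterion above implies a per-entry byte budget. The arithmetic below assumes one hot entry per (item, signal type, window) combination, which the criterion does not state explicitly; the function names are hypothetical.

```rust
/// Hot entries implied by the Phase 3 sizing: items x signal types x windows.
/// Assumes one entry per combination.
pub const fn hot_entries(items: u64, signal_types: u64, windows: u64) -> u64 {
    items * signal_types * windows
}

/// Largest whole number of bytes per entry that keeps the total under budget.
pub const fn max_bytes_per_entry(budget_bytes: u64, entries: u64) -> u64 {
    budget_bytes / entries
}
```

At 1M items x 10 signal types x 5 windows there are 50M entries, so a 10 GB budget (decimal) leaves 200 bytes per entry, or 214 bytes if the budget is read as 10 GiB.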

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Scale benchmark suite | Criterion benches at 1M items for RETRIEVE (for_you, trending, following), SEARCH (hybrid, text-only), signal write; establish baselines | L |
| 02 | USearch parameter tuning | Benchmark M x ef_construction matrix at 1M vectors; document recall/latency tradeoff; apply optimal config | M |
| 03 | Tantivy merge policy tuning | Configure `LogMergePolicy`; benchmark segment count evolution during sustained indexing; verify concurrent read/write latency | M |
| 04 | Signal state memory analysis + trimming | Measure bytes per hot entry; document memory model; implement LRU trimming of inactive entries if footprint exceeds 10 GB | L |
| 05 | Signal rollup evaluation (conditional) | Benchmark 30d windowed count at 1M items; implement hourly rollups if p99 > 50ms; otherwise document and defer | L |
| 06 | Flamegraph profiling + hotspot optimization | Profile RETRIEVE + SEARCH hot paths; document top-3 hotspots; optimize any > 10% of total time | L |
| 07 | CoEngagementIndex LRU + social scale verification | Eviction correctness test at 2x capacity; social graph filter benchmark at 1M items | M |
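
The eviction check in task 07 can be illustrated with a small bounded LRU. This is not the `CoEngagementIndex` implementation; `BoundedLru` is a hypothetical stand-in with `u64` keys and `u32` values, tracking recency with a monotonically increasing tick.

```rust
use std::collections::{BTreeMap, HashMap};

/// Minimal bounded LRU, for illustration only.
pub struct BoundedLru {
    capacity: usize,
    tick: u64,
    entries: HashMap<u64, (u32, u64)>, // key -> (value, last-access tick)
    by_tick: BTreeMap<u64, u64>,       // last-access tick -> key
}

impl BoundedLru {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self { capacity, tick: 0, entries: HashMap::new(), by_tick: BTreeMap::new() }
    }

    /// Records an access: stores the value and moves the key to the newest tick.
    fn touch(&mut self, key: u64, value: u32) {
        self.tick += 1;
        if let Some((_, old_tick)) = self.entries.insert(key, (value, self.tick)) {
            self.by_tick.remove(&old_tick);
        }
        self.by_tick.insert(self.tick, key);
    }

    pub fn insert(&mut self, key: u64, value: u32) {
        self.touch(key, value);
        while self.entries.len() > self.capacity {
            // The smallest tick belongs to the least-recently-accessed key.
            let oldest = *self.by_tick.keys().next().expect("non-empty");
            let victim = self.by_tick.remove(&oldest).expect("tick present");
            self.entries.remove(&victim);
        }
    }

    pub fn get(&mut self, key: u64) -> Option<u32> {
        let value = self.entries.get(&key)?.0;
        self.touch(key, value);
        Some(value)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn contains(&self, key: u64) -> bool {
        self.entries.contains_key(&key)
    }
}
```

Inserting twice the capacity and asserting that `len()` never exceeds it covers the memory bound; reading a key before the overflow and checking that it survives covers the least-recently-accessed ordering.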

**Depends On:** m7p1 (stable crash recovery before benchmarking at scale)
**Complexity:** XL
**Research Reference:** `docs/research/ann_for_tidaldb.md` (USearch parameter guidance, M and ef_construction tradeoffs), `docs/research/tantivy.md` (segment management, LogMergePolicy), `docs/research/tidaldb_signal_ledger.md` (memory model, three-tier architecture), M6 deferrals (signal rollups for 30d+ windows)

---

#### Phase 4: Operational Visibility (m7p4)

**Delivers:** Query execution stats, signal system health metrics, index health metrics, structured error reporting with context, `tidalctl diagnostics`, zero-overhead metrics feature flag, RLHF training data export API, and cross-session aggregation query.

**Acceptance Criteria:**

- [x] `QueryStats` struct returned alongside query results: `candidates_considered: u64`, `candidates_after_filter: u64`, `candidates_after_diversity: u64`, `filters_applied: Vec<String>`, `scoring_time_us: u64`, `diversity_time_us: u64`, `total_time_us: u64`, `degradation_level: DegradationLevel`, `profile_name: String`
- [x] `Results.stats: QueryStats` and `SearchResults.stats: QueryStats` populated by executors
- [x] Signal system health metrics exported at `/metrics` (Prometheus text format, gated by `#[cfg(feature = "metrics")]`): `tidaldb_wal_lag_bytes` (gauge), `tidaldb_wal_compacted_segments_total` (counter), `tidaldb_checkpoint_age_seconds` (gauge), `tidaldb_signal_hot_entries` (gauge), `tidaldb_signal_writes_total` (counter), `tidaldb_signal_write_latency_us` (histogram with p50/p99/p999 quantiles)
- [x] Index health metrics: `tidaldb_tantivy_segment_count` (gauge), `tidaldb_tantivy_indexed_docs` (gauge), `tidaldb_usearch_index_size_bytes` (gauge), `tidaldb_usearch_vector_count` (gauge), `tidaldb_bitmap_index_cardinality` (gauge per bitmap name)
- [x] Cohort ledger health: `tidaldb_cohort_ledger_entries` (gauge), `tidaldb_cohort_count` (gauge)
- [x] Session health: `tidaldb_active_sessions` (gauge), `tidaldb_closed_sessions_total` (counter), `tidaldb_session_auto_closed_total` (counter), `tidaldb_rate_limited_total` (counter)
- [x] Degradation level gauge: `tidaldb_degradation_level` (gauge, 0=Full/1=ReducedCandidates/2=CoarseAggregates/3=NoDiversity)
- [x] `tidalctl diagnostics --path <dir>` prints a human-readable health summary: WAL state (lag bytes, last seqno, segment count), checkpoint age, signal state size (entry count, estimated memory), index sizes (Tantivy docs/segments, USearch vectors/bytes), session count (active/closed), degradation level, collection count, cohort count
- [x] All `TidalError` variants include structured context: operation name, entity ID where relevant, signal type where relevant; no bare `"error"` strings in any variant
- [x] RLHF training data export: `db.export_signals(ExportRequest { user_id: Option<u64>, signal_types: Vec<String>, since: Timestamp, until: Timestamp, format: ExportFormat }) -> Result<Vec<ExportedSignal>>` reads signal events from WAL segments within the time range; `ExportedSignal` contains `user_id`, `entity_id`, `signal_type`, `weight`, `timestamp`, `session_id: Option<SessionId>`, `annotation: Option<String>`; `ExportFormat::JsonLines` supported
- [x] Cross-session aggregation: `db.user_session_summary(user_id, since: Timestamp) -> Result<UserSessionSummary>` scans closed session archives; returns `sessions_count`, `total_signals`, `total_rejections`, `top_signal_types: Vec<(String, u64)>`, `preference_drift: f64` (cosine distance between preference vector at `since` and now)
- [x] Metrics are zero-overhead when the `metrics` feature is disabled: all metrics calls wrapped in `#[cfg(feature = "metrics")]`; verified by compiling without the feature
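
The `preference_drift` field above is defined as a cosine distance, that is, one minus the cosine similarity of the two preference vectors. A minimal sketch follows; returning `None` for mismatched lengths or zero-norm vectors is an assumption about edge-case handling, not documented behavior.

```rust
/// Cosine distance = 1 - (a . b) / (|a| * |b|).
/// Returns None when the vectors differ in length or either has zero norm.
pub fn cosine_distance(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}
```

A drift of 0 means the preference direction is unchanged since `since`, 1 means the two vectors are orthogonal, and 2 means they point in opposite directions.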

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | `QueryStats` struct + executor instrumentation | Define struct; instrument `RetrieveExecutor` and `SearchExecutor` to record timing and counts at each pipeline stage; populate `Results.stats` and `SearchResults.stats` | M |
| 02 | Signal system + WAL metrics | Wire counters/gauges into WAL (lag, compaction count), checkpoint (age), signal ledger (entry count, write latency histogram) | M |
| 03 | Index health metrics | Expose Tantivy segment count + doc count; USearch vector count + byte size; bitmap index cardinality per bitmap | M |
| 04 | Session + cohort + degradation metrics | Active/closed/auto-closed session gauges; cohort ledger entry count; degradation level gauge; rate-limited counter | S |
| 05 | `tidalctl diagnostics` | Print human-readable health summary; covers WAL, checkpoint, signals, indexes, sessions, cohorts, collections | M |
| 06 | Structured `TidalError` context audit | Audit all `TidalError` variants; add operation name, entity ID, signal type context where missing; remove bare string errors | M |
| 07 | `metrics` feature flag + zero-overhead verification | Wrap all metrics calls in `#[cfg(feature = "metrics")]`; compile without feature; verify no metrics overhead | S |
| 08 | RLHF training data export | `db.export_signals()` API reading WAL segments by time range; `ExportedSignal` type; `ExportFormat::JsonLines` output; integration test | M |
| 09 | Cross-session aggregation query | `db.user_session_summary()` API scanning closed archives; `UserSessionSummary` type with session count, signal totals, top types, preference drift | M |
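
For the degradation gauge in task 04, the Prometheus text exposition format needs only a `# HELP` line, a `# TYPE` line, and a sample line. The sketch below hand-renders one gauge; `render_gauge` and `degradation_gauge_value` are hypothetical helpers, the help string is illustrative, and a real build would more likely use a metrics library behind the `metrics` feature.

```rust
#[derive(Debug, Clone, Copy)]
pub enum DegradationLevel {
    Full,
    ReducedCandidates,
    CoarseAggregates,
    NoDiversity,
}

/// Numeric encoding from the acceptance criteria:
/// 0=Full, 1=ReducedCandidates, 2=CoarseAggregates, 3=NoDiversity.
pub fn degradation_gauge_value(level: DegradationLevel) -> u64 {
    match level {
        DegradationLevel::Full => 0,
        DegradationLevel::ReducedCandidates => 1,
        DegradationLevel::CoarseAggregates => 2,
        DegradationLevel::NoDiversity => 3,
    }
}

/// Renders a single unlabeled gauge in Prometheus text format.
pub fn render_gauge(name: &str, help: &str, value: u64) -> String {
    format!("# HELP {name} {help}\n# TYPE {name} gauge\n{name} {value}\n")
}
```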

**Depends On:** m7p1 (stable internals before instrumenting them), m7p2 (degradation level must exist to report it)
**Complexity:** L
**Research Reference:** `docs/research/tidaldb_tooling_and_diagnostics.md`, `thoughts.md` Part V (operational simplicity), M4 deferrals (RLHF training data export), M6 deferrals (cross-session aggregation dashboards)

---

#### Phase 5: M7 UAT Integration Test (m7p5)

**Delivers:** End-to-end M7 UAT integration test proving all production hardening capabilities work together. Crash recovery, graceful degradation, rate limiting, session cleanup, observability, and scale performance all exercised in a single comprehensive test suite.

**Acceptance Criteria:**

- [x] `tidal/tests/m7_uat.rs` test suite with separate `#[test]` functions for each UAT step
- [x] Crash recovery tests: write 10K items + 100K signals; inject crash via `CrashPoint` at 3 write-path stages; verify recovery produces identical state; verify checkpoint BLAKE3 integrity; verify WAL compaction removed old segments; verify hard negatives don't leak after recovery
- [x] Session cleanup test: create session with 30-second TTL; wait 35 seconds; verify sweeper auto-closed the session; verify `auto_closed: true` in summary
- [x] Degradation test: simulate concurrent load above threshold; verify `degradation_level` in response matches expected level; verify all queries return results
- [x] Rate limiting test: configure 10 signals/sec rate limit; write 50 signals in 1 second; verify first 10 succeed and remaining return `TidalError::RateLimited`; verify other sessions unaffected
- [x] QueryStats test: execute RETRIEVE and SEARCH; verify `stats` field populated with non-zero `candidates_considered`, `scoring_time_us`, `total_time_us`
- [x] Metrics test: verify Prometheus text output from `/metrics` contains expected metric names and is non-empty
- [x] Export + aggregation test: write session signals; close session; call `export_signals()` and verify output contains expected events; call `user_session_summary()` and verify counts
- [x] All prior milestone integration tests (m2_uat, m3_uat, m4_uat, m5_uat, m6_uat) continue to pass
- [x] No individual test takes longer than 60 seconds (crash recovery tests use small datasets; load tests use short duration)

**Task Breakdown:**

| # | Task | Delivers | Complexity |
|---|------|----------|------------|
| 01 | Crash recovery UAT tests | 3 tests: crash at WAL-write, crash at checkpoint, crash with M6 state (cohort, collection); verify correct recovery and hard negative invariant | L |
| 02 | Degradation + rate limiting + session cleanup UAT tests | 3 tests: degradation progression under load, per-agent rate limiting isolation, session auto-cleanup after TTL | L |
| 03 | Observability + export UAT tests | 3 tests: QueryStats populated, metrics endpoint content, RLHF export + session aggregation | M |
| 04 | Regression gate | Verify all prior UAT suites pass (m2_uat through m6_uat); no regressions introduced by M7 changes | S |

**Depends On:** m7p1, m7p2, m7p3, m7p4 (UAT exercises everything built in M7)
**Complexity:** L
**Research Reference:** All M7 phase specifications above

---

### Phase Dependency DAG

```
m7p1 (Crash Recovery Hardening)
        |
        +--------+--------+
        |        |        |
     m7p2     m7p3     m7p4*
  (Degrade   (Scale)  (Observ.)
   + Rate
   + Sweep)
        |
        +--- m7p4 (needs degradation level from m7p2)
                 |
              m7p5 (M7 UAT -- depends on all four prior phases)
```

- **m7p1** is the foundation -- crash recovery hardening must be stable before load testing, scale optimization, or instrumentation
- **m7p2** depends on m7p1 (stable system before stress testing); delivers degradation level, rate limiting, and session cleanup
- **m7p3** depends on m7p1 (stable system before benchmarking at scale); can parallelize with m7p2
- **m7p4** depends on m7p1 (stable internals to instrument); tasks 01-03, 05-07 can start in parallel with m7p2; tasks 04, 08, 09 depend on m7p2 being complete (degradation level gauge, rate limited counter, session auto-close counter)
- **m7p5** depends on all four prior phases (UAT exercises everything)

### Deferred to Later Milestones

- **Horizontal distribution / partitioned keyspaces** -- deferred to M8; the single-node architecture scales vertically first; distribution is a separate product decision requiring shard-aware keyspaces, WAL shipping, and deterministic reconciliation.
- **Multi-tenancy** -- deferred to M8+; per-tenant isolation within a single tidalDB instance requires the distributed fabric's namespace and routing infrastructure.
- **A/B testing infrastructure** -- deferred to M8+; comparing two profile versions within the database requires tenant-level isolation and traffic routing.
- **Signal rollup to external cold storage** -- deferred to M8+; S3/GCS archival for compliance requires the distributed fabric's WAL shipping infrastructure.
- **Client libraries (Python, Node, Go bindings)** -- deferred to M8+; language-specific wrappers beyond Rust embedding require a stable API surface; M7 may still refine APIs.
- **Streaming query results** -- deferred post-M7; cursor-based streaming for very large result sets is a refinement once core performance targets are met at 1M items.
- **Multi-vector user interest clustering (PinnerSage)** -- deferred post-M7; single preference vector serves through M7; multi-vector clustering adds a new data structure and requires offline training.
- **"Did you mean" typo correction** -- deferred to M8+. Prefix autocomplete (m6p5) covers the primary use case for search suggestions. Edit-distance automata over the Tantivy term dictionary are a quality-of-life improvement, not a production hardening requirement. M7's scope is crash safety, load handling, and operational readiness; typo correction belongs in a surface-quality milestone after the system is production-hardened.
- **Search result explanation ("why this result?")** -- deferred to M8+. Tantivy's `Query::explain()` is expensive at query time and produces per-document scoring breakdowns useful for debugging, not production serving. M7 delivers `QueryStats` (pipeline-level timing and count visibility), which serves the production operator's need. Per-result explanations belong in a developer experience milestone.
- **Collaborative collections (multi-user boards)** -- deferred to M8+; multi-user write access requires access control beyond single-owner, which intersects with the multi-tenancy work in M8. Single-owner collections work in M6.
- **Personalized suggestions from full search history** -- deferred to M8+; tracking per-user query history as a signal stream for SUGGEST personalization beyond interest-category boosting (m6p5) is a refinement that belongs after production hardening.
- **Topic-cluster notification capping** -- deferred to M8+; requires topic embedding clustering and per-cluster counters; per-creator and per-user-total caps (M6) cover the primary notification spam prevention need; topic-level refinement belongs after production hardening.
- **Tuned linear combination (replacing RRF for hybrid search)** -- deferred to M8+; requires relevance labels and offline evaluation infrastructure; RRF is the correct zero-configuration starting point through M7.

### Done When

tidalDB operates correctly at 1M items under sustained concurrent read/write load. Crash recovery completes in < 30 seconds with zero data loss -- verified by fault injection at every write-path stage including cohort, collection, co-engagement, and session state. WAL compaction atomically removes pre-checkpoint segments. Checkpoint integrity is BLAKE3-verified on every open. Graceful degradation works under 3x overload without returning errors for well-formed queries, following the documented 4-stage progression (Full -> ReducedCandidates -> CoarseAggregates -> NoDiversity). Per-agent token-bucket rate limiting protects against runaway agents without affecting other sessions. Abandoned sessions are automatically cleaned up by the background sweeper. RETRIEVE p99 < 50ms and SEARCH p99 < 100ms at 1M items. Signal writes p99 < 100us amortized. Signal state memory footprint < 10 GB for 1M items x 10 signal types x 5 windows. QueryStats are returned with every query result. Prometheus metrics expose WAL lag, checkpoint age, signal state, index health, session health, cohort health, and degradation level. `tidalctl diagnostics` prints a human-readable health summary. Signal events are exportable as RLHF training data. Cross-session aggregation answers "what did my agents learn this week?" All prior milestone integration tests (m2_uat through m6_uat) continue to pass. A developer can embed tidalDB in a production application and operate it with confidence.

---

## Milestone 8: Distributed Fabric -- "Agent memory everywhere"

### Milestone Thesis

The exact same signal semantics, session policies, and WAL format power a multi-tenant, multi-region deployment. Instances shard deterministically by `EntityKind` + `EntityId`, ship WAL segments to peers, reconcile deterministically, and expose an eventually consistent API that still honors agent memory guarantees (no hidden items leaking, no double-counted decay). Hosted tidalDB can now back global agent workloads without rewriting application code.

### UAT Scenario

```
Given:
  - Three regions (us-east, eu-west, ap-south) with 5 shards each
  - Global write throughput: 25K signal events/sec, evenly distributed
  - Fat-client agents pinned to local region but free to roam
  - 1-hour network partition between eu-west and ap-south during sustained load

When:
  1. Write signals for a user in us-east, then read in eu-west after < 2s
  2. Crash an entire shard primary; observe automatic promotion and replay
  3. Execute global query (`RETRIEVE ... COHORT locale:EU`) while ap-south is partitioned
  4. Heal the partition; verify deterministic reconciliation (no duplicate counts, hides remain hidden)
  5. Move a tenant (agent workspace) to a new region by changing routing config only

Then:
  - Cross-region replication lag < 2s p99
  - No signal loss or duplication after failover/partition
  - Hard negatives (hide/mute/block) never leak, even while eventual state converges
  - Per-tenant resource isolation enforced (quotas, WAL namespaces)
  - Control plane surfaces reconciliation lag, shard health, and tenant placement
```

### Phases

#### Phase 1: Partitioned Keyspaces and WAL Shipping

**Delivers:** Deterministic shard IDs derived from subject-prefix keys, WAL segment shipping with per-segment checksums, follower apply loops using the same checkpoint format as single-node. Cross-shard atomicity defined at the "entity group" boundary (Item, User, Creator each map to a shard). Lag metrics (`replication_seconds_behind`) exported.

**Acceptance Criteria:**

- [ ] `ShardId = hash(entity_id) mod N` (configurable per `EntityKind`) stored alongside keys; shard map hot-swappable via epoch config.
- [ ] WAL segments have globally unique IDs (`region_id:shard_id:seqno`); followers detect gaps and request retransmit.
- [ ] Followers reapply segments idempotently using the same `EntitySignalState` checkpoint format from M1.
- [ ] Lag SLO: < 2s p99 at 25K writes/sec across 5 shards.
- [ ] CLI: `tidalctl shard status` shows leader, lag, checkpoint age.
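
The shard mapping and gap detection above can be sketched as follows. The criteria do not name a hash function; FNV-1a is used here only because it is stable across processes and compiler versions, and `shard_id`, `segment_id`, and `missing_seqnos` are hypothetical helper names.

```rust
/// FNV-1a 64-bit hash (an assumed choice; any stable hash would do).
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// ShardId = hash(entity_id) mod N.
pub fn shard_id(entity_id: u64, shard_count: u32) -> u32 {
    assert!(shard_count > 0);
    (fnv1a64(&entity_id.to_le_bytes()) % u64::from(shard_count)) as u32
}

/// Globally unique segment ID in the `region_id:shard_id:seqno` shape.
pub fn segment_id(region_id: &str, shard: u32, seqno: u64) -> String {
    format!("{region_id}:{shard}:{seqno}")
}

/// Seqnos a follower should request again, given the last applied seqno
/// and the seqnos it has received since.
pub fn missing_seqnos(last_applied: u64, received: &[u64]) -> Vec<u64> {
    let max = match received.iter().max() {
        Some(m) => *m,
        None => return Vec::new(),
    };
    (last_applied + 1..=max)
        .filter(|s| !received.contains(s))
        .collect()
}
```

A process-seeded hasher would break the "deterministic shard IDs" requirement, because two nodes could disagree on placement for the same entity.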

**Depends On:** M7 (hardened WAL/Signal ledger)
**Complexity:** XL
**Research Reference:** `docs/research/tidaldb_wal.md`, `docs/research/tidaldb_signal_ledger.md`

#### Phase 2: Conflict Resolution and Session Semantics

**Delivers:** Deterministic reconciliation for eventually-consistent writes: CRDT-style counters for windowed aggregates, last-writer-wins timestamps for session state, and per-session sequence numbers so agents can reason about acknowledgements. Adds write-idempotency keys to the WAL and exposes a reconciliation audit log.

**Acceptance Criteria:**

- [ ] Windowed counters replicated as bounded PN-counters (positive/negative components) with tombstones for expired buckets.
- [ ] Decay scores replay identically because WAL order is preserved per shard; cross-shard dependencies (user->creator) carry causal metadata.
- [ ] Session updates carry `(session_id, seqno)`; duplicates dropped, gaps surfaced via API.
- [ ] `reconcile --since <ts>` tool emits merged vs diverged entries for auditing.
- [ ] Hides/blocks modeled as LWW registers with vector-clock tie-breakers (region priority list).
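
A PN-counter, as referenced in the first criterion, keeps per-replica increment and decrement totals and merges by taking the per-replica maximum. The sketch below shows only that core; bounding and tombstones for expired buckets are omitted, and the type and method names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Minimal PN-counter keyed by replica name (for example a region).
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct PnCounter {
    increments: BTreeMap<String, u64>,
    decrements: BTreeMap<String, u64>,
}

impl PnCounter {
    pub fn add(&mut self, replica: &str, amount: u64) {
        *self.increments.entry(replica.to_string()).or_insert(0) += amount;
    }

    pub fn sub(&mut self, replica: &str, amount: u64) {
        *self.decrements.entry(replica.to_string()).or_insert(0) += amount;
    }

    /// Current value: sum of increments minus sum of decrements.
    pub fn value(&self) -> i64 {
        let p: u64 = self.increments.values().sum();
        let n: u64 = self.decrements.values().sum();
        p as i64 - n as i64
    }

    /// Per-replica max makes merge commutative, associative, and idempotent.
    pub fn merge(&mut self, other: &PnCounter) {
        for (replica, v) in &other.increments {
            let slot = self.increments.entry(replica.clone()).or_insert(0);
            *slot = (*slot).max(*v);
        }
        for (replica, v) in &other.decrements {
            let slot = self.decrements.entry(replica.clone()).or_insert(0);
            *slot = (*slot).max(*v);
        }
    }
}
```

Idempotent merge is what allows a healed partition to replay the same segments without double-counting.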

**Depends On:** Phase 1
**Complexity:** XL
**Research Reference:** `thoughts.md` Part V.5-6 (quarantine-first, group commit), `docs/research/tidaldb_signal_ledger.md`

#### Phase 3: Control Plane, Multi-Tenancy, and Routing

**Delivers:** Tenant-aware namespaces (per-tenant WAL directories and key prefixes), routing layer that maps tenants + entity IDs to shard endpoints, and policy templates (data residency, read-after-write budgets). Adds hosted-ready observability (lag dashboards, per-tenant quotas) and blue/green deploy tooling for the fabric.

**Acceptance Criteria:**

- [ ] Tenant config: `{tenant_id, shard_set, residency=[regions], rpo, rto}` stored in control-plane keyspace.
- [ ] Router SDK chooses nearest healthy region that satisfies residency and read-after-write target; falls back with documented staleness budget.
- [ ] Throttling per tenant (signals/sec, query concurrency) with circuit-breaker events surfaced via metrics + CLI.
- [ ] Rolling upgrade playbook: add shard, rebalance, observe zero dropped writes.
- [ ] Hosted docs: describe how embeddable apps graduate to hosted fabric without rewrites (same query + signal APIs).
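
The router criterion above could be approximated by the selection function below. Everything here is a hypothetical sketch: "nearest" is interpreted as lowest round-trip time, freshness is judged by comparing replication lag against the tenant's read-after-write target, and the fallback picks the least-stale eligible region and reports its lag.

```rust
#[derive(Debug, Clone)]
pub struct RegionStatus {
    pub name: String,
    pub healthy: bool,
    pub rtt_ms: u32,
    pub replication_lag_ms: u32,
}

/// Subset of the tenant config relevant to routing (field names assumed).
pub struct TenantPolicy {
    pub residency: Vec<String>,
    pub read_after_write_ms: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Region meets the read-after-write target.
    Fresh(String),
    /// Fallback: region is allowed and healthy but staler than the target.
    Stale { region: String, lag_ms: u32 },
}

pub fn choose_region(policy: &TenantPolicy, regions: &[RegionStatus]) -> Option<Route> {
    // Residency and health are hard constraints.
    let eligible: Vec<&RegionStatus> = regions
        .iter()
        .filter(|r| r.healthy && policy.residency.contains(&r.name))
        .collect();

    // Prefer the nearest region that is fresh enough.
    let fresh = eligible
        .iter()
        .filter(|r| r.replication_lag_ms <= policy.read_after_write_ms)
        .min_by_key(|r| r.rtt_ms);
    if let Some(r) = fresh {
        return Some(Route::Fresh(r.name.clone()));
    }

    // Otherwise fall back to the least-stale eligible region.
    eligible
        .iter()
        .min_by_key(|r| r.replication_lag_ms)
        .map(|r| Route::Stale { region: r.name.clone(), lag_ms: r.replication_lag_ms })
}
```

Returning `None` when no healthy region satisfies residency keeps the router from silently violating a data-residency policy.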

**Depends On:** Phase 2
**Complexity:** L

### Done When

tidalDB instances can be deployed as a hosted, multi-region fabric with deterministic replication and reconciliation. Agents anywhere in the world can write signals and rely on hides/mutes/policies holding globally. Operators get tooling for shard health, tenant placement, rolling upgrades, and lag visibility. Embeddable users flip a config switch to opt into the fabric; query and signal APIs remain unchanged.

---

## Milestone 9: Community Sync & Revocation -- "Join, share, leave, purge"

### Milestone Thesis

Users keep a local embeddable personalization profile as source of truth, opt into one or more community personalization overlays, and can leave those overlays safely. Community contributions are scope-aware, auditable, and removable (both stop-forward and retroactive purge) without corrupting local personalization state.

### UAT Scenario

```
Given:
  - User U has a local embeddable profile with 90 days of signals
  - Community C has opt-in policy requiring explicit share scope
  - U has one agent (A) writing session signals locally

When:
  1. U joins C with sharing mode `community_share:enabled`
  2. U allows only selected signal intents (`not_for_me`, `save`, `low_quality`) to sync
  3. Community feed query blends local + community layers for U
  4. U leaves C with `stop_forward` (no new contributions)
  5. U requests `purge_prior_contributions` from C
  6. C rematerializes affected aggregates and U re-queries feed

Then:
  - U's local profile remains intact throughout
  - No new signals from U enter C after `stop_forward`
  - Purged contributions no longer affect C's ranking outputs
  - Hard negatives from U do not leak back in after replay/failover
  - Audit log shows join, share scope, leave, purge, rematerialize checkpoints
```

### Phases

#### Phase 1: Signal Scope and Share Contract

**Delivers:** explicit signal scope model (`local`, `community`, `session`, `agent`) and share policy metadata attached to WAL events. Community replication only ships share-eligible events.

**Acceptance Criteria:**

- [ ] WAL event envelope carries `scope`, `origin`, `share_policy_version`, and idempotency key.
- [ ] Default behavior is local-only; sharing is explicit opt-in.
- [ ] Per-intent share filters supported (`skip_for_now`, `not_for_me`, `low_quality`, `hide`, `mute`, `block`, etc.).
- [ ] `tidalctl` can inspect scope distribution and share eligibility.

**Depends On:** M8  
**Complexity:** L
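
A minimal sketch of how the envelope and share filter above could fit together. The field names `scope`, `origin`, `share_policy_version`, and the idempotency key come from the acceptance criteria; the `SharePolicy` type, the `intent` field, the field types, and the rule that only community-scoped events are candidates for shipping are assumptions made for illustration, not the committed design.

```rust
use std::collections::HashSet;

/// Scope a signal is written under (names from the Phase 1 scope model).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scope {
    Local,
    Community,
    Session,
    Agent,
}

/// Hypothetical WAL event envelope; field types are assumptions.
#[derive(Debug, Clone)]
pub struct WalEnvelope {
    pub scope: Scope,
    pub origin: String, // writer identity, e.g. a user or agent id (assumed format)
    pub share_policy_version: u32,
    pub idempotency_key: u64,
    pub intent: String, // e.g. "not_for_me", "save"
}

/// Hypothetical per-user share policy. `Default` shares nothing,
/// which matches the "local-only unless opted in" criterion.
#[derive(Debug, Clone, Default)]
pub struct SharePolicy {
    pub version: u32,
    pub sharing_enabled: bool,
    pub shared_intents: HashSet<String>,
}

impl SharePolicy {
    /// Assumed rule: an event is share-eligible only if sharing is enabled,
    /// the event is community-scoped, and its intent is allow-listed.
    pub fn is_share_eligible(&self, ev: &WalEnvelope) -> bool {
        self.sharing_enabled
            && ev.scope == Scope::Community
            && self.shared_intents.contains(&ev.intent)
    }
}

/// Select the events a community replicator would ship,
/// dropping duplicates by idempotency key.
pub fn share_eligible<'a>(
    policy: &SharePolicy,
    events: &'a [WalEnvelope],
) -> Vec<&'a WalEnvelope> {
    let mut seen = HashSet::new();
    events
        .iter()
        .filter(|ev| policy.is_share_eligible(ev))
        .filter(|ev| seen.insert(ev.idempotency_key))
        .collect()
}
```

Deriving `Default` keeps the opt-in property structural: a policy that nobody configured cannot ship anything.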

#### Phase 2: Membership Lifecycle and Stop-Forward Semantics

**Delivers:** join/leave lifecycle for community overlays with causal checkpoints and stop-forward guarantees.

**Acceptance Criteria:**

- [ ] Membership states: `joined`, `leaving_stop_forward`, `left`, `rejoined`.
- [ ] `leave(stop_forward)` blocks new community contributions in < 1s p99.
- [ ] Rejoin creates a new membership epoch (no ambiguous replay across epochs).
- [ ] Queries expose active membership epoch for debugging and explainability.

**Depends On:** Phase 1  
**Complexity:** L
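
The lifecycle above can be read as a small state machine. The sketch below uses the four state names from the acceptance criteria; the transition set, the `Membership` and `InvalidTransition` types, and the choice to start epochs at 1 are assumptions for illustration.

```rust
/// Membership states named in the acceptance criteria.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MembershipState {
    Joined,
    LeavingStopForward,
    Left,
    Rejoined,
}

/// Hypothetical membership record: a state plus a monotonically increasing epoch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Membership {
    pub state: MembershipState,
    pub epoch: u64,
}

/// Returned when a transition is not allowed from the current state.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidTransition {
    pub from: MembershipState,
}

impl Membership {
    /// First join starts epoch 1 (assumed starting value).
    pub fn join() -> Self {
        Membership { state: MembershipState::Joined, epoch: 1 }
    }

    /// Stop-forward: contributions are accepted only while joined or rejoined.
    pub fn accepts_contributions(&self) -> bool {
        matches!(self.state, MembershipState::Joined | MembershipState::Rejoined)
    }

    /// Begin leaving; new community contributions are blocked from here on.
    pub fn leave_stop_forward(self) -> Result<Self, InvalidTransition> {
        match self.state {
            MembershipState::Joined | MembershipState::Rejoined => Ok(Membership {
                state: MembershipState::LeavingStopForward,
                ..self
            }),
            from => Err(InvalidTransition { from }),
        }
    }

    /// Complete the leave once in-flight work has drained.
    pub fn finish_leave(self) -> Result<Self, InvalidTransition> {
        match self.state {
            MembershipState::LeavingStopForward => Ok(Membership {
                state: MembershipState::Left,
                ..self
            }),
            from => Err(InvalidTransition { from }),
        }
    }

    /// Rejoin bumps the epoch so replay never mixes events across memberships.
    pub fn rejoin(self) -> Result<Self, InvalidTransition> {
        match self.state {
            MembershipState::Left => Ok(Membership {
                state: MembershipState::Rejoined,
                epoch: self.epoch + 1,
            }),
            from => Err(InvalidTransition { from }),
        }
    }
}
```

Tagging every shared event with the epoch it was written under is what would let a later purge address "everything from epoch 1" without touching contributions made after a rejoin.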

#### Phase 3: Retroactive Purge and Deterministic Rematerialization

**Delivers:** remove prior user contributions from community state and rebuild affected aggregates deterministically.

**Acceptance Criteria:**

- [ ] `purge_prior_contributions(user_id, community_id, epoch_range)` API implemented.
- [ ] Purge writes tombstones and triggers deterministic rematerialization job.
- [ ] Rebuilt aggregates are identical across repeated replays of the same purge operation.
- [ ] Community queries include purge watermark metadata for auditability.

**Depends On:** Phase 2  
**Complexity:** XL
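
One way to get the determinism required above is to treat the purge as tombstones over `(user, epoch)` pairs and rebuild aggregates by replaying the contribution log in its fixed order. The sketch below models a single community, so the `community_id` parameter from the API is omitted; the `Contribution` type, the tombstone key shape, and the per-item weight sum are assumptions chosen for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::ops::RangeInclusive;

/// Hypothetical community-side record of one shared signal.
#[derive(Debug, Clone, PartialEq)]
pub struct Contribution {
    pub user_id: u64,
    pub epoch: u64, // membership epoch the signal was shared under
    pub item_id: u64,
    pub weight: f64,
}

/// Tombstones keyed by (user_id, membership epoch).
pub type Tombstones = BTreeSet<(u64, u64)>;

/// Record a purge for every epoch in the range. Inserting into a set makes
/// the operation idempotent: replaying the same purge changes nothing.
pub fn purge_prior_contributions(
    tombstones: &mut Tombstones,
    user_id: u64,
    epoch_range: RangeInclusive<u64>,
) {
    for epoch in epoch_range {
        tombstones.insert((user_id, epoch));
    }
}

/// Rebuild per-item aggregates from the log, skipping tombstoned entries.
/// The log is replayed in its stored order, so floating-point sums come out
/// bit-identical on every run; BTreeMap keeps the output order stable too.
pub fn rematerialize(log: &[Contribution], tombstones: &Tombstones) -> BTreeMap<u64, f64> {
    let mut aggregates = BTreeMap::new();
    for c in log {
        if tombstones.contains(&(c.user_id, c.epoch)) {
            continue;
        }
        *aggregates.entry(c.item_id).or_insert(0.0) += c.weight;
    }
    aggregates
}
```

Because the original log entries are skipped rather than rewritten, the user's local history is untouched; only the community-side aggregate changes.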

### Done When

Users can opt into community personalization, leave safely, and purge prior contributions without damaging local personalization or producing inconsistent community rankings.

---

## Milestone 10: Governance & Agent Rights -- "Who can influence ranking, and how"

### Milestone Thesis

Communities and users can govern personalization influence through policy: which signal intents count, what trust thresholds apply, and which agents are allowed to read and write. Agent-contributed signals are fully attributable and revocable by scope.

### UAT Scenario

```
Given:
  - Community C defines ranking governance policy
  - User U has two agents: A_trusted and A_experimental
  - A_experimental is denied community write scope

When:
  1. A_trusted writes allowed community-scoped signals for U
  2. A_experimental attempts the same and is rejected by policy
  3. C changes policy to downweight `skip_for_now`, upweight `low_quality`
  4. U revokes A_trusted's community scope and removes A_trusted's prior contributions from C
  5. U queries local-only, local+community, and community-only views

Then:
  - Policy enforcement is deterministic and auditable
  - Disallowed agent writes never affect community ranking
  - Policy changes are versioned and explainable in result metadata
  - Agent revocation removes future influence immediately
  - Optional retroactive removal of agent contributions completes within SLA
```

### Phases

#### Phase 1: Community Governance Policy Engine

**Delivers:** versioned community policy definitions governing signal eligibility, weighting bounds, and trust/quality thresholds.

**Acceptance Criteria:**

- [ ] Policy schema includes allowed intents, excluded intents, and weighting constraints.
- [ ] Policy changes are versioned and applied with effective timestamps.
- [ ] Query results can return governing policy version for explanation.
- [ ] Out-of-policy signals are rejected or quarantined by rule.

**Depends On:** M9  
**Complexity:** L
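
A sketch of versioned policy lookup by effective timestamp, under stated assumptions. The `PolicyVersion`, `PolicyHistory`, and `Decision` types are hypothetical, as is the specific rule used here: explicitly excluded intents are rejected, intents that are neither allowed nor excluded are quarantined, and accepted weights are clamped into the policy's bounds.

```rust
use std::collections::BTreeMap;

/// Hypothetical policy definition mirroring the schema criteria.
#[derive(Debug, Clone)]
pub struct PolicyVersion {
    pub version: u32,
    pub allowed_intents: Vec<String>,
    pub excluded_intents: Vec<String>,
    pub min_weight: f64, // weighting bounds; min_weight must be <= max_weight
    pub max_weight: f64,
}

/// Outcome of evaluating one signal; every variant carries the governing
/// policy version so results can be explained later.
#[derive(Debug, PartialEq)]
pub enum Decision {
    Accept { policy_version: u32, weight: f64 },
    Quarantine { policy_version: u32 },
    Reject { policy_version: u32 },
}

/// Policies indexed by the timestamp at which they take effect.
#[derive(Default)]
pub struct PolicyHistory {
    by_effective_ts: BTreeMap<u64, PolicyVersion>,
}

impl PolicyHistory {
    pub fn publish(&mut self, effective_ts: u64, policy: PolicyVersion) {
        self.by_effective_ts.insert(effective_ts, policy);
    }

    /// The governing policy is the latest one effective at or before `ts`.
    pub fn governing(&self, ts: u64) -> Option<&PolicyVersion> {
        self.by_effective_ts.range(..=ts).next_back().map(|(_, p)| p)
    }

    /// Returns None when no policy is in effect at `ts`.
    pub fn evaluate(&self, ts: u64, intent: &str, weight: f64) -> Option<Decision> {
        let p = self.governing(ts)?;
        let policy_version = p.version;
        if p.excluded_intents.iter().any(|i| i.as_str() == intent) {
            return Some(Decision::Reject { policy_version });
        }
        if !p.allowed_intents.iter().any(|i| i.as_str() == intent) {
            return Some(Decision::Quarantine { policy_version });
        }
        Some(Decision::Accept {
            policy_version,
            weight: weight.clamp(p.min_weight, p.max_weight),
        })
    }
}
```

Evaluating against the signal's own timestamp rather than wall-clock time means a replay after failover reaches the same decision under the same policy version.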

#### Phase 2: Agent Capability and Scope Controls

**Delivers:** per-agent capabilities for read/write by scope (`local`, `community`, `session`) with hard enforcement in write/read paths.

**Acceptance Criteria:**

- [ ] Agent capability tokens include scope permissions and TTL.
- [ ] Reads/writes outside granted scope return policy errors and audit events.
- [ ] Revocation invalidates capabilities immediately (< 1s p99).
- [ ] `tidalctl` can inspect agent capabilities and revocation history.

**Depends On:** Phase 1  
**Complexity:** L
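
A sketch of the capability check described above, assuming an in-memory revocation set and a plain string audit log. `CapabilityToken`, `CapabilityRegistry`, `PolicyError`, and the integer timestamps are hypothetical names and types; a real token would also need to be signed or otherwise made tamper-proof, which is omitted here.

```rust
use std::collections::HashSet;

/// Scopes an agent can be granted in this phase.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Scope {
    Local,
    Community,
    Session,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
}

/// Hypothetical capability token: per-scope permissions plus an expiry (TTL).
#[derive(Debug, Clone)]
pub struct CapabilityToken {
    pub token_id: u64,
    pub agent_id: String,
    pub read_scopes: HashSet<Scope>,
    pub write_scopes: HashSet<Scope>,
    pub expires_at: u64, // absolute expiry, same clock as `now` below
}

#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    Revoked,
    Expired,
    ScopeDenied,
}

/// Holds revoked token ids and an audit trail of denied attempts.
#[derive(Default)]
pub struct CapabilityRegistry {
    revoked: HashSet<u64>,
    pub audit: Vec<String>,
}

impl CapabilityRegistry {
    /// Revocation takes effect on the very next check.
    pub fn revoke(&mut self, token_id: u64) {
        self.revoked.insert(token_id);
    }

    /// Enforce revocation, expiry, and scope; every denial is audited.
    pub fn check(
        &mut self,
        token: &CapabilityToken,
        access: Access,
        scope: Scope,
        now: u64,
    ) -> Result<(), PolicyError> {
        let granted = match access {
            Access::Read => &token.read_scopes,
            Access::Write => &token.write_scopes,
        };
        let outcome = if self.revoked.contains(&token.token_id) {
            Err(PolicyError::Revoked)
        } else if now >= token.expires_at {
            Err(PolicyError::Expired)
        } else if !granted.contains(&scope) {
            Err(PolicyError::ScopeDenied)
        } else {
            Ok(())
        };
        if let Err(e) = &outcome {
            self.audit.push(format!(
                "agent={} denied {:?} {:?}: {:?}",
                token.agent_id, access, scope, e
            ));
        }
        outcome
    }
}
```

Checking the revocation set before anything else is a deliberate ordering in this sketch: a revoked token reports `Revoked` even if it has also expired, which keeps the audit trail unambiguous.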

#### Phase 3: Provenance, Explainability, and Remove-by-Scope

**Delivers:** provenance graph for ranking influence and APIs to remove contributions by scope (`agent`, `community`, `session`, `local`).

**Acceptance Criteria:**

- [ ] Every ranking-affecting signal has provenance metadata (`writer`, `scope`, `policy_version`, `membership_epoch`).
- [ ] `remove_from_personalization(scope=...)` API supports precise, non-global deletion.
- [ ] Explainability endpoint can attribute top-ranked items to policy-allowed signals.
- [ ] Replay/failover preserves remove-by-scope outcomes deterministically.

**Depends On:** Phase 2  
**Complexity:** XL
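
To make remove-by-scope concrete, here is a sketch over an in-memory list of signals. The provenance field names (`writer`, `scope`, `policy_version`, `membership_epoch`) come from the acceptance criteria; the `Signal` and `RemoveSelector` types, the optional writer filter, and the shape of the `explain` output are assumptions for illustration only.

```rust
/// Scopes that contributions can be removed by.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    Agent,
    Community,
    Session,
    Local,
}

/// Hypothetical ranking-affecting signal with its provenance metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct Signal {
    pub item_id: u64,
    pub weight: f64,
    pub writer: String,
    pub scope: Scope,
    pub policy_version: u32,
    pub membership_epoch: u64,
}

/// Selector for a precise, non-global removal: one scope,
/// optionally narrowed to a single writer (e.g. one agent).
pub struct RemoveSelector {
    pub scope: Scope,
    pub writer: Option<String>,
}

/// Remove matching signals in place and return how many were removed.
/// Signals in other scopes, or from other writers, are left untouched.
pub fn remove_from_personalization(signals: &mut Vec<Signal>, sel: &RemoveSelector) -> usize {
    let before = signals.len();
    signals.retain(|s| {
        let scope_matches = s.scope == sel.scope;
        let writer_matches = sel.writer.as_ref().map_or(true, |w| *w == s.writer);
        !(scope_matches && writer_matches)
    });
    before - signals.len()
}

/// Attribute an item's score to the signals behind it: (writer, scope, weight).
pub fn explain(signals: &[Signal], item_id: u64) -> Vec<(&str, Scope, f64)> {
    signals
        .iter()
        .filter(|s| s.item_id == item_id)
        .map(|s| (s.writer.as_str(), s.scope, s.weight))
        .collect()
}
```

For the UAT step where U removes A_trusted's community contributions, the selector would be `RemoveSelector { scope: Scope::Community, writer: Some("A_trusted".to_string()) }`, which leaves U's own community and local signals in place.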

### Done When

Communities and users can control ranking influence with explicit governance and agent rights, while retaining user-owned, revocable personalization semantics end-to-end.

---

## Use Case Coverage Progression

| UC    | Description       | M1      | M2       | M3       | M4      | M5       | M6       | M7   |
| ----- | ----------------- | ------- | -------- | -------- | ------- | -------- | -------- | ---- |
| UC-01 | For You Feed      | -       | -        | **Full** | Full    | Full     | Full     | Full |
| UC-02 | Search            | -       | -        | -        | -       | **Core** | **Full** | Full |
| UC-03 | Trending/Rising   | Signals | **Full** | Full     | Full    | Full     | Full     | Full |
| UC-04 | Following Feed    | -       | Partial  | **Full** | Full    | Full     | Full     | Full |
| UC-05 | Related/Up Next   | -       | -        | **Core** | Core    | Core     | **Full** | Full |
| UC-06 | Browse/Category   | Signals | **Core** | Core     | Core    | Core     | **Full** | Full |
| UC-07 | Notifications     | -       | -        | **Core** | Core    | Core     | **Full** | Full |
| UC-08 | Creator Profile   | -       | **Core** | Core     | Core    | Core     | **Full** | Full |
| UC-09 | User Library      | -       | -        | Partial  | Partial | Partial  | **Full** | Full |
| UC-10 | People Search     | -       | -        | -        | -       | **Core** | **Full** | Full |
| UC-11 | Visual/Semantic   | -       | -        | -        | -       | Partial  | **Full** | Full |
| UC-12 | Live Content      | -       | -        | -        | -       | -        | **Full** | Full |
| UC-13 | Hidden Gems       | -       | **Full** | Full     | Full    | Full     | Full     | Full |
| UC-14 | Controversial/Hot | Signals | **Full** | Full     | Full    | Full     | Full     | Full |

Legend:

- `-` = Not addressed
- `Signals` = Signal primitives exist but no query surface
- `Partial` = Some functionality, not all modes
- `Core` = Primary query path works, some modes/filters missing
- `Full` = All modes, filters, and feedback loops per USE_CASES.md specification
- **Bold** = Milestone in which a use case first reaches `Core` or `Full`

M8-M10 focus on deployment topology, community sync semantics, and governance controls; they leave UC coverage unchanged while making the existing feature surface globally portable, revocable, and policy-safe.

---

## Dependency DAG

```
m1p1 (Types/Schema) ✓
  |
  +---> m1p2 (WAL) ✓
  |       |
  +---> m1p3 (Storage/fjall) ✓ ---+
  |       |                        |
  |       +---> m1p4 (Signal Ledger) ✓
  |               |
  |               +---> m1p5 (Entity + Signal API) ✓  = M1 COMPLETE ✓
  |               |
  |               +---> m2p3 (Ranking Profiles) ✓
  |                       |
  +---> m2p1 (USearch) ✓ -+
  |                        |
  +---> m2p2 (Filters) ✓ -+---> m2p4 (Diversity) ✓
                           |       |
                           +-------+---> m2p5 (RETRIEVE Query) ✓ = M2 COMPLETE ✓
                           |
                           +---> m3p1 (Users/Creators/Relationships) ✓
                           |       |
                           |       +---> m3p2 (Feedback Loop) ✓
                           |               |
                           |               +---> m3p3 (Personalized Profiles) ✓
                           |                       |
                           |                       +---> m3p4 (User State Filters + UAT) ✓
                           |
                           |       m3p4 = M3 COMPLETE ✓
                           |
                           +---> m4 (Agent Session Layer) ✓  = M4 COMPLETE ✓
                                   |
                                   +---> m5p1 (Tantivy) ✓
                                           |
                                           +---> m5p2 (RRF Fusion) ✓
                                           |       |
                                           |       +---> m5p3 (SEARCH Query) ✓
                                           |
                                           +---> m5p4 (Creator Search) ✓

                                           m5p3 + m5p4 = M5 COMPLETE ✓

                                           M6 COMPLETE ✓ (6 phases: cohort, social, sorts, collections, scope, notifications)
                                           M7 COMPLETE ✓ (crash recovery, degradation, scale, observability, UAT + enterprise readiness)
                                           M8 phases depend on M7
                                           M9 phases depend on M8
                                           M10 phases depend on M9
```

**Parallelization opportunities:**

- m1p2 (WAL) and m1p3 (Storage) are parallel after m1p1 (both now complete: m1p3 was completed first, m1p2 followed)
- m2p1 (USearch) and m2p2 (Filters) can be built in parallel after m1p3
- m3p1 (Entities) and m5p1 (Tantivy) can start in parallel with later M2 phases (M4 Agent Memory sits between M3 and M5)
- m3p2 Tasks 01 (User Preference Vector) and 03 (Hard Negatives) can be built in parallel within m3p2
- m5p2 (RRF) and m5p4 (Creator Search) can be built in parallel

---

## Architectural Decisions Locked In

These decisions are made. They are not revisited unless benchmarks prove them wrong.

| Decision             | Chosen                                         | Alternative              | Rationale                                                                             |
| -------------------- | ---------------------------------------------- | ------------------------ | ------------------------------------------------------------------------------------- |
| Storage engine       | fjall (pure Rust)                              | RocksDB                  | Pure Rust, `#![forbid(unsafe_code)]`, fast compile, trait-abstracted for swap         |
| Vector index         | USearch (C++ FFI)                              | hnsw_rs                  | 10-100x QPS, predicate callbacks, mmap, f16 quantization                              |
| Text search          | Tantivy (embedded)                             | Custom BM25              | 40K lines of battle-tested code; Collector/Scorer API provides exact hooks needed     |
| Decay formula        | Running S(t)=S(prev)*exp(-lambda*dt)+w         | Raw event scan           | O(1) vs O(N), proven exact, 20-60x faster at 50+ events/entity                        |
| Windowed aggregation | Bucketed counters (Scotty pattern)             | SWAG two-stacks          | Simpler, serves multiple window sizes from one set of buckets                         |
| Hybrid fusion        | RRF (k=60)                                     | Tuned linear combination | Zero-config, robust; linear combo is the upgrade path with relevance labels           |
| Consistency model    | DB-primary, Tantivy as derived index           | Two-phase commit         | Simpler, deterministic recovery, source of truth is always the entity store           |
| WAL checksums        | BLAKE3                                         | CRC32C                   | Content-addressing enables deduplication; BLAKE3 is fast enough                       |
| Key encoding         | Subject-prefix `[entity_id][0x00][TAG:suffix]` | Separate key namespaces  | Co-locates entity data, natural shard boundary, single prefix scan                    |
| Embedding format     | f16 quantization (default)                     | float32                  | Half memory, < 1% recall loss at 1536D                                                |
| Query language       | Custom (RETRIEVE/SEARCH/SIGNAL)                | SQL                      | Domain semantics cannot be expressed in SQL without losing optimization opportunities |
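
Two rows of the table are compact enough to sketch directly: the running decay formula and RRF with k=60. The code below is an illustration under stated assumptions, not tidalDB's implementation. `DecayedScore`, `raw_scan`, and `rrf` are hypothetical names, timestamps are taken to be seconds, events are assumed to arrive in non-decreasing timestamp order, and ranks in RRF are 1-based.

```rust
use std::collections::HashMap;

/// Running decayed score: S(t) = S(prev) * exp(-lambda * dt) + w.
#[derive(Debug, Clone, Copy)]
pub struct DecayedScore {
    pub score: f64,   // S(prev), valid as of `last_ts`
    pub last_ts: f64, // timestamp of the last update (assumed seconds)
    pub lambda: f64,  // decay rate per second
}

impl DecayedScore {
    pub fn new(lambda: f64) -> Self {
        DecayedScore { score: 0.0, last_ts: 0.0, lambda }
    }

    /// Read the score at `now` without mutating state.
    pub fn at(&self, now: f64) -> f64 {
        self.score * (-self.lambda * (now - self.last_ts)).exp()
    }

    /// O(1) update: decay the stored score forward to `ts`, then add `w`.
    pub fn add(&mut self, ts: f64, w: f64) {
        self.score = self.at(ts) + w;
        self.last_ts = ts;
    }
}

/// O(N) reference: rescan every raw (timestamp, weight) event.
pub fn raw_scan(events: &[(f64, f64)], lambda: f64, now: f64) -> f64 {
    events.iter().map(|&(ts, w)| w * (-lambda * (now - ts)).exp()).sum()
}

/// Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank).
/// Returns (id, score) sorted by descending score, ties broken by id.
pub fn rrf(rankings: &[Vec<u64>], k: f64) -> Vec<(u64, f64)> {
    let mut scores: HashMap<u64, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(u64, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .expect("scores are finite")
            .then(a.0.cmp(&b.0))
    });
    fused
}
```

The running form agrees with the raw scan because each earlier event's weight picks up one decay factor per update, and those factors multiply to `exp(-lambda * (now - ts))`. RRF uses only ranks, never raw scores, which is why it needs no tuning to combine lexical and vector results.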

---

## What This Roadmap Does NOT Cover

These are explicitly out of scope for the foreseeable future:

1. **Embedding generation** -- tidalDB retrieves and ranks over vectors. It does not generate them. Bring your own model.
2. **Generic horizontal distribution** -- M8-M10 deliver the tidalDB-specific fabric (WAL shipping, shard routing, community sync/revocation, governance). We are still not building a general-purpose distributed SQL store or OLTP replica mesh.
3. **ACID transactions across entities** -- Signal writes are atomic within an entity's state. Cross-entity transactions are not needed for the ranking problem.
4. **SQL compatibility** -- The custom query language exists because SQL cannot express ranking semantics. No SQL layer.
5. **Per-request hard multitenancy inside a single shard** -- M8-M10 introduce tenant-aware namespaces, quotas, and governance controls for hosted deployments, but strong regulatory isolation (HIPAA, PCI) still requires separate deployments per tenant.
6. **Content moderation, authentication, payments, CDN** -- tidalDB solves one problem: ranking. Everything else is someone else's job.