tidaldb/docs/planning/milestone-3/phase-1/OVERVIEW.md
jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions
M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
    usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
    MMR diversity enforcement, and m2_uat integration tests.

M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
    creator entities, session tracking, and m3_uat integration tests.

M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
    ranking executor with explain mode, query executor integration, benchmarks for
    query/ranking/vector/filters/diversity, and m4_uat integration tests.

Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 16:24:48 -07:00


# Milestone 3, Phase 1: User and Creator Entities with Relationships
## Phase Deliverable
User and creator entity types stored in their own fjall keyspaces (`EntityKind::User`, `EntityKind::Creator`) with preference embeddings, metadata, and a relationship graph. Relationship edges are `(from_entity, to_entity, type, weight, timestamp)` tuples stored under the source entity's key prefix with the `Tag::Rel` tag. The phase also delivers three user-state bitmap indexes (`FollowsBitmap`, `UserSeenBitmap`, `UserBlockedSet`) that power the `unseen`, `unblocked`, and `relationship:follows` filters in the RETRIEVE executor. "Only from followed creators" becomes a roaring bitmap intersection and "exclude blocked creators" becomes a bitmap difference, so both filters reduce to bitmap operations.

This phase builds the entity and relationship foundation that m3p2 (Feedback Loop) and m3p3 (Personalized Profiles) depend on. Without user entities and relationship edges, the `FOR USER` clause has nothing to load and the feedback loop has nowhere to write user state.
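A minimal sketch of how the RETRIEVE executor might combine the three user-state indexes. It assumes a std-only `BTreeSet<u32>` as a stand-in for `RoaringBitmap` (both support `&` and `-` on references), and the `UserState` struct, its fields, and `apply_user_filters` are hypothetical names. Blocked creators are assumed to have already been expanded into their item IDs.

```rust
use std::collections::BTreeSet;

// Stand-in for roaring::RoaringBitmap so the sketch builds with std only.
type ItemSet = BTreeSet<u32>;

/// Hypothetical per-user bundle of the three user-state indexes.
struct UserState {
    /// Items from creators the user follows (FollowsBitmap).
    follows: ItemSet,
    /// Items the user has viewed (UserSeenBitmap).
    seen: ItemSet,
    /// Items by creators the user blocked, assumed already expanded from
    /// creator IDs to item IDs (creator half of UserBlockedSet).
    blocked_items: ItemSet,
    /// Items the user has hidden (item half of UserBlockedSet).
    hidden: ItemSet,
}

/// Applies `relationship:follows` (optional), `unseen`, and `unblocked`
/// to a candidate set using only set intersection and difference.
fn apply_user_filters(candidates: &ItemSet, state: &UserState, only_followed: bool) -> ItemSet {
    // relationship:follows keeps only candidates that are also followed items.
    let mut out = if only_followed {
        candidates & &state.follows
    } else {
        candidates.clone()
    };
    // unseen removes everything the user has already viewed.
    out = &out - &state.seen;
    // unblocked removes items by blocked creators and items the user hid.
    out = &out - &state.blocked_items;
    &out - &state.hidden
}
```

Intersection and difference are both linear in the sizes of the sets involved, which is why the follows and blocked filters are framed as bitmap operations rather than per-item lookups.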
## Acceptance Criteria
- [ ] `db.write_user(user_id, metadata, Option<embedding>)` stores a user entity in the users keyspace
- [ ] `db.write_creator(creator_id, metadata, Option<embedding>)` stores a creator entity in the creators keyspace
- [ ] `db.write_relationship(from, to, rel_type, weight, timestamp)` stores a directional weighted edge
- [ ] `db.read_relationship(from, to, rel_type)` returns `Option<RelationshipEdge>`
- [ ] `db.list_relationships(from, rel_type)` returns all edges of a type from a source entity
- [ ] Relationship types: `follows`, `blocks`, `interaction_weight`, `hide`, `mute`
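- [ ] Key encoding: `[from_entity_id][0x00][REL][type_byte][to_entity_id]`, so reading one edge is a single point lookup on the full key and listing edges by (from, type) is a prefix scan over `[from_entity_id][0x00][REL][type_byte]`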
- [ ] `FollowsBitmap::for_user(user_id)` returns a `RoaringBitmap` of item IDs from all followed creators
- [ ] `UserSeenBitmap::for_user(user_id)` returns a `RoaringBitmap` of item IDs the user has viewed
- [ ] `UserBlockedSet::for_user(user_id)` returns a set of creator IDs the user has blocked, plus a set of item IDs the user has hidden
- [ ] Relationship write/read latency < 50 microseconds (benchmarked)
- [ ] User and creator entities persist across shutdown and restart
- [ ] Relationships persist across shutdown and restart via the storage engine
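The key layout in the acceptance criteria can be sketched as follows. This assumes 8-byte big-endian entity IDs (so byte order matches numeric order), a hypothetical `0x05` byte for `Tag::Rel`, and hypothetical type bytes; `rel_prefix`, `rel_key`, and `list_targets` are illustrative names, not the planned API. A `BTreeMap` stands in for an fjall keyspace, since both iterate keys in lexicographic byte order.

```rust
use std::collections::BTreeMap;

/// Separator between the subject prefix and the tag, per the key layout.
const SEP: u8 = 0x00;
/// Hypothetical byte value for `Tag::Rel`.
const TAG_REL: u8 = 0x05;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RelationshipType {
    Follows,
    Blocks,
    InteractionWeight,
    Hide,
    Mute,
}

impl RelationshipType {
    /// Hypothetical one-byte discriminant stored in the key.
    fn type_byte(self) -> u8 {
        match self {
            Self::Follows => 1,
            Self::Blocks => 2,
            Self::InteractionWeight => 3,
            Self::Hide => 4,
            Self::Mute => 5,
        }
    }
}

/// `[from_entity_id][0x00][REL][type_byte]`: shared by every edge of one type from one source.
fn rel_prefix(from: u64, rel: RelationshipType) -> Vec<u8> {
    let mut key = Vec::with_capacity(19);
    key.extend_from_slice(&from.to_be_bytes()); // big-endian keeps numeric order
    key.push(SEP);
    key.push(TAG_REL);
    key.push(rel.type_byte());
    key
}

/// Full edge key: the prefix followed by `[to_entity_id]`, read with one point lookup.
fn rel_key(from: u64, rel: RelationshipType, to: u64) -> Vec<u8> {
    let mut key = rel_prefix(from, rel);
    key.extend_from_slice(&to.to_be_bytes());
    key
}

/// Prefix scan backing `list_relationships(from, rel_type)`; returns target IDs.
fn list_targets(store: &BTreeMap<Vec<u8>, Vec<u8>>, from: u64, rel: RelationshipType) -> Vec<u64> {
    let prefix = rel_prefix(from, rel);
    store
        .range(prefix.clone()..)
        // Keys are sorted, so the first key without the prefix ends the scan.
        .take_while(|(key, _)| key.starts_with(&prefix))
        .map(|(key, _)| {
            let mut to = [0u8; 8];
            to.copy_from_slice(&key[prefix.len()..prefix.len() + 8]);
            u64::from_be_bytes(to)
        })
        .collect()
}
```

Because the type byte sits before the target ID, all `follows` edges of one user are contiguous and a scan never touches that user's `blocks` or `mute` edges.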
## Dependencies
- **Requires:** m1p1 (types: `EntityId`, `EntityKind`), m1p3 (storage: `StorageEngine`, `FjallStorage`, key encoding, `Tag::Rel`, `Tag::Meta`), m1p5 (entity write API pattern), m2p1 (vector index for embedding storage), m2p2 (bitmap indexes, `FilterExpr`, `FilterResult`)
- **Blocks:** m3p2 (Feedback Loop), m3p3 (Personalized Profiles), m3p4 (User State Filters)
## Research References
- [docs/research/tidaldb_signal_ledger.md](../../../research/tidaldb_signal_ledger.md) -- Three-tier storage, subject-prefix key encoding
- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- User preference vector stored in embedding slot
- [thoughts.md](../../../../thoughts.md) -- Part V.12 (subject-prefix keys), Part V.16 (user preference vector as database-managed embedding)
## Task Index
| # | Task | Delivers | Depends On | Complexity |
|---|------|----------|------------|------------|
| 01 | User + Creator Entity Types and Storage | `UserEntity`, `CreatorEntity`, write/read APIs, metadata codec, embedding slots | None | M |
| 02 | Relationship Graph | `RelationshipEdge`, `RelationshipType`, storage codec, CRUD operations, prefix scan | Task 01 | L |
| 03 | User-State Bitmap Indexes | `FollowsBitmap`, `UserSeenBitmap`, `UserBlockedSet`, bitmap maintenance hooks | Task 02 | M |
## Task Dependency DAG
```
Task 01: User + Creator Entity Types and Storage
|
v
Task 02: Relationship Graph
|
v
Task 03: User-State Bitmap Indexes
```
All three tasks are sequential: Task 02 needs entity types from Task 01, and Task 03 needs relationship data from Task 02 to build bitmap indexes.
## File Layout
```
tidal/src/
  entities/
    mod.rs           -- Entity types, UserEntity, CreatorEntity, re-exports (Task 01)
    user.rs          -- UserEntity, write/read, metadata codec (Task 01)
    creator.rs       -- CreatorEntity, write/read, metadata codec (Task 01)
    relationship.rs  -- RelationshipEdge, RelationshipType, storage codec, CRUD (Task 02)
    user_state.rs    -- FollowsBitmap, UserSeenBitmap, UserBlockedSet (Task 03)
  db/
    mod.rs           -- Extended with user/creator/relationship public APIs
  storage/
    keys.rs          -- Relationship key suffix encoding helpers
tidal/tests/
  m3p1_entities.rs   -- Phase integration tests
```
## Open Questions
1. **User preference vector initial state**: When a user is created with `embedding: None`, what is the initial preference vector? Options: (a) zero vector (no preference), (b) mean of population vectors (average taste), (c) `None` handled as cold-start in the query planner. Recommendation: (c) -- the query planner detects `None` and falls back to population-level signals. The preference vector is populated on the first signal write in m3p2.
2. **Creator-to-items mapping**: The `FollowsBitmap` needs to know which items belong to which creator. This mapping exists in the items keyspace metadata (creator_id field). Should we maintain a reverse index `[creator_id] -> [item_ids bitmap]` or compute it on demand? Recommendation: maintain a `CreatorItemsBitmap` that is updated on item write. This amortizes the cost at write time rather than scanning at query time.
3. **Relationship storage location**: Relationships are directional. `user_42 follows creator_7` is stored in the users keyspace under user_42's key prefix. Should we also store a reverse index in the creators keyspace? For M3, no -- we only need forward traversal (user -> creators they follow). Reverse traversal (creator -> followers) is deferred to M6 (social graph queries).
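The recommendation in question 2 could look roughly like this: a hypothetical `CreatorItemsBitmap` updated from the item write path, with `FollowsBitmap::for_user` reduced to a union over the creator IDs returned by `list_relationships(user, follows)`. `BTreeSet<u32>` again stands in for `RoaringBitmap`, the method names are illustrative, and item deletion and creator reassignment are not handled.

```rust
use std::collections::{BTreeSet, HashMap};

// Stand-in for roaring::RoaringBitmap so the sketch builds with std only.
type ItemSet = BTreeSet<u32>;

/// Hypothetical reverse index: creator_id -> item IDs, maintained at write time.
#[derive(Default)]
struct CreatorItemsBitmap {
    by_creator: HashMap<u64, ItemSet>,
}

impl CreatorItemsBitmap {
    /// Called from the item write path with the item's `creator_id` metadata field.
    fn on_item_write(&mut self, creator_id: u64, item_id: u32) {
        self.by_creator.entry(creator_id).or_default().insert(item_id);
    }

    /// Union of the item sets of the given creators, e.g. every creator a user follows.
    fn items_for(&self, creators: &[u64]) -> ItemSet {
        let mut out = ItemSet::new();
        for creator in creators {
            if let Some(items) = self.by_creator.get(creator) {
                out.extend(items.iter().copied());
            }
        }
        out
    }
}
```

The same union over a user's blocked creators yields the item IDs that the `unblocked` filter subtracts, so one reverse index serves both `FollowsBitmap` and `UserBlockedSet`.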