M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
MMR diversity enforcement, and m2_uat integration tests.
M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
creator entities, session tracking, and m3_uat integration tests.
M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
ranking executor with explain mode, query executor integration, benchmarks for
query/ranking/vector/filters/diversity, and m4_uat integration tests.
Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
83 lines
5.7 KiB
Markdown
# Milestone 3, Phase 1: User and Creator Entities with Relationships

## Phase Deliverable

User and creator entity types stored in their own fjall keyspaces (`EntityKind::User`, `EntityKind::Creator`) with preference embeddings, metadata, and a relationship graph. Relationship edges are `(from_entity, to_entity, type, weight, timestamp)` tuples stored under the `Tag::Rel` key prefix. The phase delivers three user-state indexes (`FollowsBitmap`, `UserSeenBitmap`, `UserBlockedSet`) that power the `unseen`, `unblocked`, and `relationship:follows` filters in the RETRIEVE executor. Items can be efficiently filtered by "only from followed creators" and "exclude blocked creators" using roaring bitmap set operations (intersection for the follows filter, difference for the exclusion filters).

This phase builds the entity and relationship foundation that m3p2 (Feedback Loop) and m3p3 (Personalized Profiles) depend on. Without user entities and relationship edges, the `FOR USER` clause has nothing to load and the feedback loop has nowhere to write user state.

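The filter composition described above can be sketched as follows. This is a minimal illustration, not the actual executor: `BTreeSet<u32>` stands in for a roaring bitmap, and `UserState`, `apply_user_filters`, and the `u32` item ID width are hypothetical names and assumptions. It also assumes blocked creators have already been resolved to item IDs.

```rust
use std::collections::BTreeSet;

/// Stand-in for a roaring bitmap of item IDs (assumption: u32 item IDs).
pub type ItemSet = BTreeSet<u32>;

/// Hypothetical per-user state; field names are illustrative only.
pub struct UserState {
    /// Items from followed creators.
    pub follows: ItemSet,
    /// Items the user has already viewed.
    pub seen: ItemSet,
    /// Items from blocked creators or hidden by the user (already resolved to item IDs).
    pub blocked: ItemSet,
}

/// Applies `relationship:follows`, `unseen`, and `unblocked` to a candidate set.
pub fn apply_user_filters(candidates: &ItemSet, user: &UserState) -> ItemSet {
    candidates
        .intersection(&user.follows) // keep only items from followed creators
        .filter(|id| !user.seen.contains(*id)) // unseen: drop viewed items
        .filter(|id| !user.blocked.contains(*id)) // unblocked: drop blocked/hidden items
        .copied()
        .collect()
}
```

With a real roaring bitmap the same composition would be one intersection followed by two differences, which is why all three filters can share one candidate bitmap.
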
## Acceptance Criteria

- [ ] `db.write_user(user_id, metadata, Option<embedding>)` stores user entity in the users keyspace
- [ ] `db.write_creator(creator_id, metadata, Option<embedding>)` stores creator entity in the creators keyspace
- [ ] `db.write_relationship(from, to, rel_type, weight, timestamp)` stores a directional weighted edge
- [ ] `db.read_relationship(from, to, rel_type)` returns `Option<RelationshipEdge>`
- [ ] `db.list_relationships(from, rel_type)` returns all edges of a type from a source entity
- [ ] Relationship types: `follows`, `blocks`, `interaction_weight`, `hide`, `mute`
- [ ] Key encoding: `[from_entity_id][0x00][REL][type_byte][to_entity_id]`, supporting a single point lookup by full key and a prefix scan by `(from, type)`
- [ ] `FollowsBitmap::for_user(user_id)` returns a `RoaringBitmap` of item IDs from all followed creators
- [ ] `UserSeenBitmap::for_user(user_id)` returns a `RoaringBitmap` of item IDs the user has viewed
- [ ] `UserBlockedSet::for_user(user_id)` returns a set of creator IDs the user has blocked, plus a set of item IDs the user has hidden
- [ ] Relationship write/read latency < 50 microseconds (benchmarked)
- [ ] User and creator entities persist across shutdown and restart
- [ ] Relationships persist across shutdown and restart via storage engine

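The relationship key layout in the criteria above can be sketched like this. It is a hedged illustration only: the `TAG_REL` byte value, the `RelType` discriminants, the fixed 8-byte big-endian entity IDs, and the function names are all assumptions, and a `BTreeMap` stands in for the ordered storage engine.

```rust
use std::collections::BTreeMap;

/// Hypothetical tag byte; the real `Tag::Rel` value is not specified here.
const TAG_REL: u8 = 0x03;

/// Relationship types from the criteria; the byte values are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum RelType {
    Follows = 1,
    Blocks = 2,
    InteractionWeight = 3,
    Hide = 4,
    Mute = 5,
}

/// Prefix shared by all edges of one type from one source:
/// `[from_entity_id][0x00][REL][type_byte]`.
pub fn rel_prefix(from: u64, rel: RelType) -> Vec<u8> {
    let mut key = Vec::with_capacity(19);
    key.extend_from_slice(&from.to_be_bytes()); // big-endian keeps byte order == numeric order
    key.push(0x00);
    key.push(TAG_REL);
    key.push(rel as u8);
    key
}

/// Full edge key: the prefix followed by `[to_entity_id]`.
pub fn rel_key(from: u64, rel: RelType, to: u64) -> Vec<u8> {
    let mut key = rel_prefix(from, rel);
    key.extend_from_slice(&to.to_be_bytes());
    key
}

/// Prefix scan by `(from, type)` over an ordered map of key -> edge weight.
pub fn list_targets(
    store: &BTreeMap<Vec<u8>, f32>,
    from: u64,
    rel: RelType,
) -> Vec<(u64, f32)> {
    let prefix = rel_prefix(from, rel);
    store
        .range(prefix.clone()..) // seek to the first key >= prefix
        .take_while(|(k, _)| k.starts_with(&prefix)) // stop at the first key outside the prefix
        .map(|(k, w)| {
            let mut id = [0u8; 8];
            id.copy_from_slice(&k[prefix.len()..]); // remaining 8 bytes are the target ID
            (u64::from_be_bytes(id), *w)
        })
        .collect()
}
```

Because the target ID is the last key component, `read_relationship` maps to one point lookup on `rel_key(...)`, and `list_relationships` maps to one range scan on `rel_prefix(...)`.
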
## Dependencies

- **Requires:** m1p1 (types: `EntityId`, `EntityKind`), m1p3 (storage: `StorageEngine`, `FjallStorage`, key encoding, `Tag::Rel`, `Tag::Meta`), m1p5 (entity write API pattern), m2p1 (vector index for embedding storage), m2p2 (bitmap indexes, `FilterExpr`, `FilterResult`)
- **Blocks:** m3p2 (Feedback Loop), m3p3 (Personalized Profiles), m3p4 (User State Filters)

## Research References

- [docs/research/tidaldb_signal_ledger.md](../../../research/tidaldb_signal_ledger.md) -- Three-tier storage, subject-prefix key encoding
- [docs/research/ann_for_tidaldb.md](../../../research/ann_for_tidaldb.md) -- User preference vector stored in embedding slot
- [thoughts.md](../../../../thoughts.md) -- Part V.12 (subject-prefix keys), Part V.16 (user preference vector as database-managed embedding)

## Task Index

| # | Task | Delivers | Depends On | Complexity |
|---|------|----------|------------|------------|
| 01 | User + Creator Entity Types and Storage | `UserEntity`, `CreatorEntity`, write/read APIs, metadata codec, embedding slots | None | M |
| 02 | Relationship Graph | `RelationshipEdge`, `RelationshipType`, storage codec, CRUD operations, prefix scan | Task 01 | L |
| 03 | User-State Bitmap Indexes | `FollowsBitmap`, `UserSeenBitmap`, `UserBlockedSet`, bitmap maintenance hooks | Task 02 | M |

## Task Dependency DAG

```
Task 01: User + Creator Entity Types and Storage
    |
    v
Task 02: Relationship Graph
    |
    v
Task 03: User-State Bitmap Indexes
```

All three tasks are sequential: Task 02 needs entity types from Task 01, and Task 03 needs relationship data from Task 02 to build bitmap indexes.

## File Layout

```
tidal/src/
  entities/
    mod.rs              -- Entity types, UserEntity, CreatorEntity, re-exports (Task 01)
    user.rs             -- UserEntity, write/read, metadata codec (Task 01)
    creator.rs          -- CreatorEntity, write/read, metadata codec (Task 01)
    relationship.rs     -- RelationshipEdge, RelationshipType, storage codec, CRUD (Task 02)
    user_state.rs       -- FollowsBitmap, UserSeenBitmap, UserBlockedSet (Task 03)
  db/
    mod.rs              -- Extended with user/creator/relationship public APIs
  storage/
    keys.rs             -- Relationship key suffix encoding helpers
tidal/tests/
  m3p1_entities.rs      -- Phase integration tests
```

## Open Questions

1. **User preference vector initial state**: When a user is created with `embedding: None`, what is the initial preference vector? Options: (a) zero vector (no preference), (b) mean of population vectors (average taste), (c) `None` handled as cold-start in the query planner. Recommendation: (c) -- the query planner detects `None` and falls back to population-level signals. The preference vector is populated on the first signal write in m3p2.

2. **Creator-to-items mapping**: The `FollowsBitmap` needs to know which items belong to which creator. This mapping exists in the items keyspace metadata (creator_id field). Should we maintain a reverse index `[creator_id] -> [item_ids bitmap]` or compute it on demand? Recommendation: maintain a `CreatorItemsBitmap` that is updated on item write. This amortizes the cost at write time rather than scanning at query time.

3. **Relationship storage location**: Relationships are directional. `user_42 follows creator_7` is stored in the users keyspace under user_42's key prefix. Should we also store a reverse index in the creators keyspace? For M3, no -- we only need forward traversal (user -> creators they follow). Reverse traversal (creator -> followers) is deferred to M6 (social graph queries).

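The recommendation in question 2 can be sketched as a write-time reverse index. This is an illustration under stated assumptions, not the planned implementation: `CreatorItems`, `on_item_write`, and `follows_items` are hypothetical names, `BTreeSet<u32>` stands in for a roaring bitmap, and the ID widths are assumed.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for the proposed `CreatorItemsBitmap`: creator ID -> item IDs.
/// A `BTreeSet<u32>` substitutes for a roaring bitmap in this sketch.
#[derive(Default)]
pub struct CreatorItems {
    by_creator: HashMap<u64, BTreeSet<u32>>,
}

impl CreatorItems {
    /// Hypothetical hook invoked on every item write, so the reverse index
    /// is maintained incrementally instead of rebuilt by scanning items.
    pub fn on_item_write(&mut self, creator_id: u64, item_id: u32) {
        self.by_creator.entry(creator_id).or_default().insert(item_id);
    }

    /// Union of the item sets of all followed creators; this is the shape of
    /// result that `FollowsBitmap::for_user` is expected to return.
    pub fn follows_items(&self, followed: &[u64]) -> BTreeSet<u32> {
        let mut out = BTreeSet::new();
        for creator in followed {
            // Creators with no items yet simply contribute nothing.
            if let Some(items) = self.by_creator.get(creator) {
                out.extend(items.iter().copied());
            }
        }
        out
    }
}
```

The list of followed creators would come from the forward prefix scan described in question 3, so no reverse (creator -> followers) index is needed for this path.
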