M2: RETRIEVE query pipeline with 5-stage execution (candidate → filter → score → diversify → limit),
usearch HNSW vector index, bitmap/range/universe filters, ranking profiles with signal scoring,
MMR diversity enforcement, and m2_uat integration tests.
M3: Entity system with typed metadata, relationship graph (follows/blocks/interactions),
creator entities, session tracking, and m3_uat integration tests.
M4: Advanced ranking with builtin functions (freshness, trending, controversy, wilson),
ranking executor with explain mode, query executor integration, benchmarks for
query/ranking/vector/filters/diversity, and m4_uat integration tests.
Includes: 9 new blog posts, marketing site updates, updated roadmap, and updated vision doc.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Milestone 3, Phase 1: User and Creator Entities with Relationships
Phase Deliverable
User and creator entity types stored in their own fjall keyspaces (EntityKind::User, EntityKind::Creator) with preference embeddings, metadata, and a relationship graph. Relationship edges are (from_entity, to_entity, type, weight, timestamp) stored under the Tag::Rel key prefix. The phase delivers three user-state bitmap indexes (FollowsBitmap, UserSeenBitmap, UserBlockedSet) that power the unseen, unblocked, and relationship:follows filters in the RETRIEVE executor. Items can be efficiently filtered by "only from followed creators" and "exclude blocked creators" using roaring bitmap intersection.
This phase builds the entity and relationship foundation that m3p2 (Feedback Loop) and m3p3 (Personalized Profiles) depend on. Without user entities and relationship edges, the FOR USER clause has nothing to load and the feedback loop has nowhere to write user state.
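The filter combination described above can be sketched with plain set operations. This is a minimal illustration, not the RETRIEVE executor itself: `BTreeSet<u32>` stands in for a roaring bitmap so the example needs only the standard library, and `UserState`, `apply_user_filters`, and the choice of `u32` item IDs are hypothetical names and assumptions introduced here.

```rust
use std::collections::BTreeSet;

/// Stand-in for a roaring bitmap of item IDs (assumption: u32 item IDs).
type ItemSet = BTreeSet<u32>;

/// Hypothetical per-user state, loaded once before the filter stage runs.
struct UserState {
    follows: ItemSet, // items from followed creators (role of FollowsBitmap)
    seen: ItemSet,    // items the user has viewed (role of UserSeenBitmap)
    blocked: ItemSet, // items hidden or from blocked creators (derived from UserBlockedSet)
}

/// Keep candidates that are from followed creators, unseen, and unblocked:
/// (candidates ∩ follows) − seen − blocked.
fn apply_user_filters(candidates: &ItemSet, user: &UserState) -> ItemSet {
    candidates
        .intersection(&user.follows)
        .filter(|&id| !user.seen.contains(id) && !user.blocked.contains(id))
        .copied()
        .collect()
}

fn main() {
    let candidates: ItemSet = (1..=8).collect();
    let user = UserState {
        follows: ItemSet::from([2, 3, 5, 8, 13]),
        seen: ItemSet::from([3]),
        blocked: ItemSet::from([8]),
    };
    let kept: Vec<u32> = apply_user_filters(&candidates, &user).into_iter().collect();
    assert_eq!(kept, vec![2, 5]);
    println!("{kept:?}");
}
```

With a real roaring bitmap the same expression would presumably become an intersection followed by two set differences on compressed containers, which is what makes the per-user filters cheap relative to scanning item metadata.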
Acceptance Criteria
- db.write_user(user_id, metadata, Option<embedding>) stores user entity in the users keyspace
- db.write_creator(creator_id, metadata, Option<embedding>) stores creator entity in the creators keyspace
- db.write_relationship(from, to, rel_type, weight, timestamp) stores a directional weighted edge
- db.read_relationship(from, to, rel_type) returns Option<RelationshipEdge>
- db.list_relationships(from, rel_type) returns all edges of a type from a source entity
- Relationship types: follows, blocks, interaction_weight, hide, mute
- Key encoding: [from_entity_id][0x00][REL][type_byte][to_entity_id] for O(1) lookup and prefix scan by (from, type)
- FollowsBitmap::for_user(user_id) returns a RoaringBitmap of item IDs from all followed creators
- UserSeenBitmap::for_user(user_id) returns a RoaringBitmap of item IDs the user has viewed
- UserBlockedSet::for_user(user_id) returns a set of creator IDs the user has blocked, plus a set of item IDs the user has hidden
- Relationship write/read latency < 50 microseconds (benchmarked)
- User and creator entities persist across shutdown and restart
- Relationships persist across shutdown and restart via storage engine
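To make the key encoding criterion concrete, here is a rough sketch of how such keys could be built and scanned. It is an illustration under several assumptions that the criteria do not state: entity IDs are fixed-width big-endian `u64` values, `TAG_REL` and the `RelType` discriminants are made-up byte values, the stored value is reduced to the edge weight, and a `BTreeMap` stands in for the ordered fjall keyspace.

```rust
use std::collections::BTreeMap;

const SEP: u8 = 0x00;
const TAG_REL: u8 = 0x52; // hypothetical byte value for Tag::Rel

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum RelType {
    Follows = 1, // discriminant values are illustrative only
    Blocks = 2,
    InteractionWeight = 3,
    Hide = 4,
    Mute = 5,
}

/// [from_entity_id][0x00][REL][type_byte]: shared by all edges of one type from one source.
fn rel_prefix(from: u64, ty: RelType) -> Vec<u8> {
    let mut key = Vec::with_capacity(19);
    key.extend_from_slice(&from.to_be_bytes());
    key.push(SEP);
    key.push(TAG_REL);
    key.push(ty as u8);
    key
}

/// Full key: prefix followed by [to_entity_id].
fn rel_key(from: u64, ty: RelType, to: u64) -> Vec<u8> {
    let mut key = rel_prefix(from, ty);
    key.extend_from_slice(&to.to_be_bytes());
    key
}

/// Prefix scan: seek to the prefix, read while keys still start with it,
/// and decode the trailing 8 bytes as the target entity ID.
fn list_targets(store: &BTreeMap<Vec<u8>, f32>, from: u64, ty: RelType) -> Vec<u64> {
    let prefix = rel_prefix(from, ty);
    store
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .map(|(k, _)| {
            let mut id = [0u8; 8];
            id.copy_from_slice(&k[prefix.len()..]);
            u64::from_be_bytes(id)
        })
        .collect()
}

fn main() {
    let mut store = BTreeMap::new();
    store.insert(rel_key(42, RelType::Follows, 7), 1.0);
    store.insert(rel_key(42, RelType::Follows, 9), 0.5);
    store.insert(rel_key(42, RelType::Blocks, 3), 1.0);
    store.insert(rel_key(43, RelType::Follows, 7), 1.0);

    // Point lookup by full key, then a scan over (from, type).
    assert_eq!(store.get(&rel_key(42, RelType::Blocks, 3)), Some(&1.0));
    assert_eq!(list_targets(&store, 42, RelType::Follows), vec![7, 9]);
}
```

Big-endian encoding is chosen here so that byte order matches numeric order, which keeps scan results sorted by target ID; whether the real codec does the same is not stated in this document.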
Dependencies
- Requires: m1p1 (types: EntityId, EntityKind), m1p3 (storage: StorageEngine, FjallStorage, key encoding, Tag::Rel, Tag::Meta), m1p5 (entity write API pattern), m2p1 (vector index for embedding storage), m2p2 (bitmap indexes, FilterExpr, FilterResult)
- Blocks: m3p2 (Feedback Loop), m3p3 (Personalized Profiles), m3p4 (User State Filters)
Research References
- docs/research/tidaldb_signal_ledger.md -- Three-tier storage, subject-prefix key encoding
- docs/research/ann_for_tidaldb.md -- User preference vector stored in embedding slot
- thoughts.md -- Part V.12 (subject-prefix keys), Part V.16 (user preference vector as database-managed embedding)
Task Index
| # | Task | Delivers | Depends On | Complexity |
|---|---|---|---|---|
| 01 | User + Creator Entity Types and Storage | UserEntity, CreatorEntity, write/read APIs, metadata codec, embedding slots | None | M |
| 02 | Relationship Graph | RelationshipEdge, RelationshipType, storage codec, CRUD operations, prefix scan | Task 01 | L |
| 03 | User-State Bitmap Indexes | FollowsBitmap, UserSeenBitmap, UserBlockedSet, bitmap maintenance hooks | Task 02 | M |
Task Dependency DAG
Task 01: User + Creator Entity Types and Storage
|
v
Task 02: Relationship Graph
|
v
Task 03: User-State Bitmap Indexes
All three tasks are sequential: Task 02 needs entity types from Task 01, and Task 03 needs relationship data from Task 02 to build bitmap indexes.
File Layout
    tidal/src/
      entities/
        mod.rs           -- Entity types, UserEntity, CreatorEntity, re-exports (Task 01)
        user.rs          -- UserEntity, write/read, metadata codec (Task 01)
        creator.rs       -- CreatorEntity, write/read, metadata codec (Task 01)
        relationship.rs  -- RelationshipEdge, RelationshipType, storage codec, CRUD (Task 02)
        user_state.rs    -- FollowsBitmap, UserSeenBitmap, UserBlockedSet (Task 03)
      db/
        mod.rs           -- Extended with user/creator/relationship public APIs
      storage/
        keys.rs          -- Relationship key suffix encoding helpers
    tidal/tests/
      m3p1_entities.rs   -- Phase integration tests
Open Questions
- User preference vector initial state: When a user is created with embedding: None, what is the initial preference vector? Options: (a) zero vector (no preference), (b) mean of population vectors (average taste), (c) None handled as cold-start in the query planner. Recommendation: (c) -- the query planner detects None and falls back to population-level signals. The preference vector is populated on the first signal write in m3p2.
- Creator-to-items mapping: The FollowsBitmap needs to know which items belong to which creator. This mapping exists in the items keyspace metadata (creator_id field). Should we maintain a reverse index [creator_id] -> [item_ids bitmap] or compute it on demand? Recommendation: maintain a CreatorItemsBitmap that is updated on item write. This amortizes the cost at write time rather than scanning at query time.
- Relationship storage location: Relationships are directional. user_42 follows creator_7 is stored in the users keyspace under user_42's key prefix. Should we also store a reverse index in the creators keyspace? For M3, no -- we only need forward traversal (user -> creators they follow). Reverse traversal (creator -> followers) is deferred to M6 (social graph queries).
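The first two recommendations can be sketched together. This is only one way they might look, with assumptions stated up front: `CreatorItems`, `on_item_write`, `follows_bitmap`, `Retrieval`, and `plan_retrieval` are hypothetical names, `BTreeSet<u32>` stands in for a roaring bitmap, and the list of followed creators is passed in directly rather than read from relationship edges.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for a roaring bitmap of item IDs (assumption: u32 item IDs).
type ItemSet = BTreeSet<u32>;

/// Hypothetical reverse index [creator_id] -> [item_ids], maintained at write time.
#[derive(Default)]
struct CreatorItems {
    by_creator: HashMap<u64, ItemSet>,
}

impl CreatorItems {
    /// Would be called from the item write path, so no metadata scan is needed at query time.
    fn on_item_write(&mut self, creator_id: u64, item_id: u32) {
        self.by_creator.entry(creator_id).or_default().insert(item_id);
    }

    /// Union of the item sets of every followed creator; unknown creators contribute nothing.
    fn follows_bitmap(&self, followed_creators: &[u64]) -> ItemSet {
        followed_creators
            .iter()
            .filter_map(|creator| self.by_creator.get(creator))
            .flat_map(|items| items.iter().copied())
            .collect()
    }
}

/// Option (c): a missing embedding is treated as a cold start, not as a zero vector.
#[derive(Debug, PartialEq)]
enum Retrieval {
    Personalized(Vec<f32>),
    PopulationFallback,
}

fn plan_retrieval(user_embedding: Option<Vec<f32>>) -> Retrieval {
    match user_embedding {
        Some(vector) => Retrieval::Personalized(vector),
        None => Retrieval::PopulationFallback,
    }
}

fn main() {
    let mut index = CreatorItems::default();
    index.on_item_write(7, 100);
    index.on_item_write(7, 101);
    index.on_item_write(9, 300);
    index.on_item_write(11, 400); // creator 11 is not followed below

    let items: Vec<u32> = index.follows_bitmap(&[7, 9]).into_iter().collect();
    assert_eq!(items, vec![100, 101, 300]);
    assert_eq!(plan_retrieval(None), Retrieval::PopulationFallback);
}
```

Keeping `None` distinct from a zero vector lets the planner tell "no preference data yet" apart from a genuine embedding, which is the behavior option (c) relies on.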