tidaldb/docs/planning/milestone-3/phase-1/OVERVIEW.md
jordan 39ada28c6e feat: complete Milestones 2–4 — RETRIEVE query, vector index, ranking profiles, diversity, entity system, sessions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-21 16:24:48 -07:00

Milestone 3, Phase 1: User and Creator Entities with Relationships

Phase Deliverable

User and creator entity types stored in their own fjall keyspaces (EntityKind::User, EntityKind::Creator) with preference embeddings, metadata, and a relationship graph. Relationship edges are (from_entity, to_entity, type, weight, timestamp) stored under the Tag::Rel key prefix. The phase delivers three user-state bitmap indexes (FollowsBitmap, UserSeenBitmap, UserBlockedSet) that power the unseen, unblocked, and relationship:follows filters in the RETRIEVE executor. Items can be efficiently filtered by "only from followed creators" and "exclude blocked creators" using roaring bitmap intersection.

This phase builds the entity and relationship foundation that m3p2 (Feedback Loop) and m3p3 (Personalized Profiles) depend on. Without user entities and relationship edges, the FOR USER clause has nothing to load and the feedback loop has nowhere to write user state.
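The filter composition described in the deliverable can be sketched as plain set algebra. This is a minimal illustration, not the executor's code: `ItemBitmap` is a `BTreeSet<u32>` standing in for a roaring bitmap, and `UserState`, its field names, and `apply_user_filters` are hypothetical names introduced here.

```rust
use std::collections::BTreeSet;

/// Stand-in for a roaring bitmap of item IDs (assumption: the real index
/// would use a compressed bitmap; a sorted set has the same set semantics).
type ItemBitmap = BTreeSet<u32>;

/// Hypothetical per-user state mirroring the three indexes named above.
struct UserState {
    follows_items: ItemBitmap, // items from followed creators
    seen_items: ItemBitmap,    // items the user has already viewed
    blocked_items: ItemBitmap, // items from blocked creators, plus hidden items
}

/// Keep only candidates that come from followed creators and are
/// neither seen nor blocked.
fn apply_user_filters(candidates: &ItemBitmap, state: &UserState) -> ItemBitmap {
    candidates
        .intersection(&state.follows_items) // relationship:follows
        .filter(|id| !state.seen_items.contains(*id)) // unseen
        .filter(|id| !state.blocked_items.contains(*id)) // unblocked
        .copied()
        .collect()
}
```

With a real bitmap type the same pipeline would be one intersection followed by two differences, which is why all three filters are expressed as per-user bitmaps.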

Acceptance Criteria

  • db.write_user(user_id, metadata, Option<embedding>) stores user entity in the users keyspace
  • db.write_creator(creator_id, metadata, Option<embedding>) stores creator entity in the creators keyspace
  • db.write_relationship(from, to, rel_type, weight, timestamp) stores a directional weighted edge
  • db.read_relationship(from, to, rel_type) returns Option<RelationshipEdge>
  • db.list_relationships(from, rel_type) returns all edges of a type from a source entity
  • Relationship types: follows, blocks, interaction_weight, hide, mute
  • Key encoding: [from_entity_id][0x00][REL][type_byte][to_entity_id], so that reading one edge is a single point lookup and listing all edges for (from, type) is a single prefix scan
  • FollowsBitmap::for_user(user_id) returns a RoaringBitmap of item IDs from all followed creators
  • UserSeenBitmap::for_user(user_id) returns a RoaringBitmap of item IDs the user has viewed
  • UserBlockedSet::for_user(user_id) returns a set of creator IDs the user has blocked, plus a set of item IDs the user has hidden
  • Relationship write/read latency < 50 microseconds (benchmarked)
  • User and creator entities persist across shutdown and restart
  • Relationships persist across shutdown and restart via storage engine
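The key layout in the criteria above can be sketched as follows. Everything concrete here is an assumption made for illustration: entity IDs are taken to be `u64` encoded big-endian, the `TAG_REL` byte value and the `RelType` discriminants are invented, and a `BTreeMap` stands in for the storage engine's ordered keyspace.

```rust
use std::collections::BTreeMap;

const SEP: u8 = 0x00;
const TAG_REL: u8 = 0x02; // hypothetical byte value for Tag::Rel

/// Relationship types from the criteria; discriminant values are assumed.
#[derive(Clone, Copy)]
#[repr(u8)]
enum RelType {
    Follows = 1,
    Blocks = 2,
    InteractionWeight = 3,
    Hide = 4,
    Mute = 5,
}

/// Prefix shared by every edge of one type from one source entity.
fn rel_prefix(from: u64, rel: RelType) -> Vec<u8> {
    let mut key = Vec::with_capacity(19);
    key.extend_from_slice(&from.to_be_bytes()); // fixed width, sorts numerically
    key.push(SEP);
    key.push(TAG_REL);
    key.push(rel as u8);
    key
}

/// Full key for one edge: prefix followed by the target entity ID.
fn rel_key(from: u64, rel: RelType, to: u64) -> Vec<u8> {
    let mut key = rel_prefix(from, rel);
    key.extend_from_slice(&to.to_be_bytes());
    key
}

/// Prefix scan over an ordered map: returns the target IDs of all
/// edges of type `rel` leaving `from`.
fn list_targets(store: &BTreeMap<Vec<u8>, Vec<u8>>, from: u64, rel: RelType) -> Vec<u64> {
    let prefix = rel_prefix(from, rel);
    store
        .range(prefix.clone()..)
        .take_while(|(k, _)| k.starts_with(&prefix))
        .filter_map(|(k, _)| {
            let rest = &k[prefix.len()..];
            if rest.len() != 8 {
                return None; // not a well-formed edge key
            }
            let mut tail = [0u8; 8];
            tail.copy_from_slice(rest);
            Some(u64::from_be_bytes(tail))
        })
        .collect()
}
```

Because the source ID comes first and the type byte precedes the target ID, all edges for one (from, type) pair are contiguous in key order, which is what makes the scan a single range read.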

Dependencies

  • Requires: m1p1 (types: EntityId, EntityKind), m1p3 (storage: StorageEngine, FjallStorage, key encoding, Tag::Rel, Tag::Meta), m1p5 (entity write API pattern), m2p1 (vector index for embedding storage), m2p2 (bitmap indexes, FilterExpr, FilterResult)
  • Blocks: m3p2 (Feedback Loop), m3p3 (Personalized Profiles), m3p4 (User State Filters)

Research References

Task Index

| # | Task | Delivers | Depends On | Complexity |
|----|------|----------|------------|------------|
| 01 | User + Creator Entity Types and Storage | UserEntity, CreatorEntity, write/read APIs, metadata codec, embedding slots | None | M |
| 02 | Relationship Graph | RelationshipEdge, RelationshipType, storage codec, CRUD operations, prefix scan | Task 01 | L |
| 03 | User-State Bitmap Indexes | FollowsBitmap, UserSeenBitmap, UserBlockedSet, bitmap maintenance hooks | Task 02 | M |
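Task 02 lists a storage codec for relationship edges. Since from, to, and type are already carried by the key, one plausible value layout holds only weight and timestamp. The sketch below assumes an `f32` weight and a `u64` millisecond timestamp in a fixed 12-byte little-endian layout; `EdgeValue` and both function names are hypothetical, and the real codec may differ.

```rust
/// Hypothetical value payload for one relationship edge.
#[derive(Debug, Clone, Copy, PartialEq)]
struct EdgeValue {
    weight: f32,
    timestamp_ms: u64,
}

/// 4 bytes of weight followed by 8 bytes of timestamp.
const EDGE_VALUE_LEN: usize = 12;

fn encode_edge_value(v: &EdgeValue) -> [u8; EDGE_VALUE_LEN] {
    let mut buf = [0u8; EDGE_VALUE_LEN];
    buf[..4].copy_from_slice(&v.weight.to_le_bytes());
    buf[4..].copy_from_slice(&v.timestamp_ms.to_le_bytes());
    buf
}

/// Returns None when the slice is not exactly EDGE_VALUE_LEN bytes,
/// so a truncated or foreign value is rejected instead of misread.
fn decode_edge_value(bytes: &[u8]) -> Option<EdgeValue> {
    if bytes.len() != EDGE_VALUE_LEN {
        return None;
    }
    let mut w = [0u8; 4];
    let mut t = [0u8; 8];
    w.copy_from_slice(&bytes[..4]);
    t.copy_from_slice(&bytes[4..]);
    Some(EdgeValue {
        weight: f32::from_le_bytes(w),
        timestamp_ms: u64::from_le_bytes(t),
    })
}
```

A fixed-width value keeps decoding allocation-free, which matters if the latency target for relationship reads is tight.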

Task Dependency DAG

Task 01: User + Creator Entity Types and Storage
    |
    v
Task 02: Relationship Graph
    |
    v
Task 03: User-State Bitmap Indexes

All three tasks are sequential: Task 02 needs entity types from Task 01, and Task 03 needs relationship data from Task 02 to build bitmap indexes.

File Layout

tidal/src/
  entities/
    mod.rs        -- Entity types, UserEntity, CreatorEntity, re-exports (Task 01)
    user.rs       -- UserEntity, write/read, metadata codec (Task 01)
    creator.rs    -- CreatorEntity, write/read, metadata codec (Task 01)
    relationship.rs -- RelationshipEdge, RelationshipType, storage codec, CRUD (Task 02)
    user_state.rs -- FollowsBitmap, UserSeenBitmap, UserBlockedSet (Task 03)
  db/
    mod.rs        -- Extended with user/creator/relationship public APIs
  storage/
    keys.rs       -- Relationship key suffix encoding helpers
tidal/tests/
  m3p1_entities.rs -- Phase integration tests

Open Questions

  1. User preference vector initial state: When a user is created with embedding: None, what is the initial preference vector? Options: (a) zero vector (no preference), (b) mean of population vectors (average taste), (c) None handled as cold-start in the query planner. Recommendation: (c) -- the query planner detects None and falls back to population-level signals. The preference vector is populated on the first signal write in m3p2.

  2. Creator-to-items mapping: The FollowsBitmap needs to know which items belong to which creator. This mapping exists in the items keyspace metadata (creator_id field). Should we maintain a reverse index [creator_id] -> [item_ids bitmap] or compute it on demand? Recommendation: maintain a CreatorItemsBitmap that is updated on item write. This amortizes the cost at write time rather than scanning at query time.

  3. Relationship storage location: Relationships are directional. The edge "user_42 follows creator_7" is stored in the users keyspace under user_42's key prefix. Should we also store a reverse index in the creators keyspace? For M3, no -- only forward traversal (user -> creators they follow) is needed. Reverse traversal (creator -> followers) is deferred to M6 (social graph queries).
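The recommendation in question 2 (maintain a creator-to-items reverse index at write time) can be sketched as below. This is an illustration under stated assumptions: `ItemBitmap` is a `BTreeSet<u32>` standing in for a roaring bitmap, and `CreatorItemsIndex`, `on_item_write`, and `follows_bitmap` are hypothetical names, not the phase's actual API.

```rust
use std::collections::{BTreeSet, HashMap};

/// Stand-in for a roaring bitmap of item IDs.
type ItemBitmap = BTreeSet<u32>;

/// Hypothetical reverse index: creator ID -> items authored by that creator.
#[derive(Default)]
struct CreatorItemsIndex {
    by_creator: HashMap<u64, ItemBitmap>,
}

impl CreatorItemsIndex {
    /// Maintenance hook, assumed to run on every item write.
    fn on_item_write(&mut self, creator_id: u64, item_id: u32) {
        self.by_creator.entry(creator_id).or_default().insert(item_id);
    }

    /// Union of the item sets of all followed creators. Creators with
    /// no indexed items contribute nothing.
    fn follows_bitmap(&self, followed: &[u64]) -> ItemBitmap {
        let mut out = ItemBitmap::new();
        for creator_id in followed {
            if let Some(items) = self.by_creator.get(creator_id) {
                out.extend(items.iter().copied());
            }
        }
        out
    }
}
```

The list of followed creators would come from a forward prefix scan over the user's follows edges, so this design needs no reverse relationship index, consistent with the answer to question 3.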