# Specification: Relationships Relationships are first-class edges between entities in tidalDB. They model the social graph, content interactions, and behavioral affinity that power personalized ranking, content filtering, and candidate generation. This specification covers edge types, storage format, graph traversal, weight update mechanics, collaborative filtering, and integration points with the query engine, signal system, and ranking pipeline. --- ## Table of Contents - [1. Design Principles](#1-design-principles) - [2. Edge Type Reference](#2-edge-type-reference) - [3. Edge Properties](#3-edge-properties) - [4. Directionality](#4-directionality) - [5. Storage Format](#5-storage-format) - [6. Social Graph Queries](#6-social-graph-queries) - [7. Relationship-Based Candidate Generation](#7-relationship-based-candidate-generation) - [8. Weight Update Mechanics](#8-weight-update-mechanics) - [9. Collaborative Filtering Edges](#9-collaborative-filtering-edges) - [10. Relationship Lifecycle](#10-relationship-lifecycle) - [11. Scale Considerations](#11-scale-considerations) - [12. Integration Points](#12-integration-points) - [13. Performance Targets](#13-performance-targets) --- ## 1. Design Principles **Relationships are not metadata.** A "follows" edge is not a row in a junction table. It is a live, weighted, directional edge that participates in ranking queries, drives candidate generation, and updates atomically with signal events. **Implicit relationships are database-managed.** The application writes explicit relationships (follows, blocks). The database derives implicit relationships (interaction_weight, engagement_affinity, similarity) from signal events. The application never computes these. **Traversal is a query primitive.** "Content from creators I follow" and "content engaged by people I follow" are not application-level loops over API calls. They are query-level operations that the database executes with fan-out control and weight filtering. **Negative relationships are hard constraints.** A blocked edge is not a negative weight in a scoring function. It is a permanent exclusion predicate evaluated before any scoring occurs. Blocked content never enters the candidate set. --- ## 2. Edge Type Reference ### Explicit Relationships (Application-Written) These are created and deleted by the application via `db.write_relationship()` and `db.delete_relationship()`. | Kind | From | To | Weight | Decay | Semantics | |------|------|----|--------|-------|-----------| | `follows` | User | Creator | 1.0 (binary) | Permanent | Subscription. Powers Following feed candidate set. | | `blocked` | User | Creator | 1.0 (binary) | Permanent | Hard exclusion. All content by this creator is excluded from every query for this user. | | `blocked` | User | Item | 1.0 (binary) | Permanent | Hard exclusion. This item is excluded from every query for this user. | | `muted` | User | Creator | 1.0 (binary) | Permanent | Soft exclusion. Excluded from algorithmic feeds and notifications. Visible in explicit search and Following feed if also followed. | | `saved` | User | Item | 1.0 (binary) | Permanent | Bookmark. Powers `Filter::user_state("saved")` and `Sort::DateSaved`. | | `subscribed` | User | Collection | 1.0 (binary) | Permanent | Collection subscription. User receives updates when items are added. | | `member_of` | Creator | Community | 1.0 (binary) | Permanent | Community membership. Powers community-scoped queries. | ### Implicit Relationships (Database-Computed) These are created and updated by the database as a side-effect of signal writes. The application never writes these directly. | Kind | From | To | Weight | Decay | Semantics | |------|------|----|--------|-------|-----------| | `interaction_weight` | User | Creator | 0.0 - 1.0 | Slow (30d half-life) | Cumulative engagement strength. Updated on every signal involving this user and any item by this creator. Powers notification prioritization, social proof, and personalized ranking boosts. | | `engagement_affinity` | User | Item | 0.0 - 1.0 | Medium (7d half-life) | Per-item engagement depth. Computed from signal history (view, like, completion, dwell_time). Powers "continue watching," user library sorting, and user state filters. | | `similarity` | Item | Item | 0.0 - 1.0 | Recomputed periodically | Co-engagement similarity. "Users who engaged with A also engaged with B." Powers related content and up-next queries. | | `creator_similarity` | Creator | Creator | 0.0 - 1.0 | Recomputed periodically | Catalog embedding similarity between creators. Powers "creators like X" discovery (UC-10). | --- ## 3. Edge Properties Every relationship edge carries the following properties: ```rust pub struct RelationshipEdge { /// The relationship type. pub kind: RelationshipKind, /// Source entity. pub from_type: EntityKind, pub from_id: EntityId, /// Target entity. pub to_type: EntityKind, pub to_id: EntityId, /// Edge weight. /// 1.0 for binary relationships (follows, blocked, saved). /// 0.0-1.0 for weighted relationships (interaction_weight, similarity). pub weight: f64, /// When the edge was created or last updated. pub timestamp: Timestamp, /// Optional metadata. Used sparingly. /// Examples: notification preferences on follows, save context. pub metadata: Option>, } ``` ### RelationshipKind ```rust #[derive(Clone, Copy, PartialEq, Eq, Hash)] pub enum RelationshipKind { // Explicit Follows, Blocked, Muted, Saved, Subscribed, MemberOf, // Implicit InteractionWeight, EngagementAffinity, Similarity, CreatorSimilarity, } ``` ### Weight Semantics | Weight Value | Meaning | |-------------|---------| | 0.0 | No relationship or fully decayed | | 0.0 - 0.3 | Weak affinity (occasional engagement) | | 0.3 - 0.7 | Moderate affinity (regular engagement) | | 0.7 - 1.0 | Strong affinity (frequent, deep engagement) | | 1.0 | Binary positive (follows, saved) or maximum affinity | Weight is always in the closed interval [0.0, 1.0]. The system clamps after every update. --- ## 4. Directionality ### Unidirectional A follows B does not imply B follows A. Storage and traversal treat each direction independently. **Types:** `follows`, `blocked`, `muted`, `saved`, `subscribed`, `member_of`, `interaction_weight`, `engagement_affinity` **Storage:** Only the forward edge (from -> to) is stored. Reverse lookups use the reverse index. ### Bidirectional (Symmetric) A is similar to B implies B is similar to A with the same weight. **Types:** `similarity`, `creator_similarity` **Storage:** Store only one edge per pair, using canonical ordering (lower entity ID first). The reverse index provides lookups from either direction. This halves storage for symmetric relationships. ### Asymmetric A's interaction_weight toward B can differ from B's interaction_weight toward A. A fan's weight toward a creator is high; the creator's weight toward that individual fan may be zero. **Types:** `interaction_weight` **Storage:** Each direction is a separate edge. Forward and reverse indexes reflect the actual asymmetric weights. ### Directionality Impact on Storage ``` Unidirectional (follows, blocked, muted, saved): Forward: user_123 -> creator_xyz (stored) Reverse: creator_xyz <- user_123 (indexed for reverse lookup) user_123 -> creator_abc (separate edge, independent) Bidirectional (similarity): Canonical: item_aaa -> item_bbb (stored, aaa < bbb lexicographically) Lookup: item_bbb -> item_aaa (served from reverse index, same weight) Asymmetric (interaction_weight): Forward: user_123 -> creator_xyz weight=0.85 (stored) Forward: creator_xyz -> user_123 weight=0.02 (separate edge, different weight) ``` --- ## 5. Storage Format ### Dual-Index Architecture Relationships use a dual-index storage layout: a forward index for outbound edge traversal and a reverse index for inbound edge lookups. ``` Forward Index: "Who does this entity relate to?" Key: [from_id: u64 BE][0x00][REL:{kind}:{to_id: u64 BE}] Value: [weight: f64][timestamp: u64 BE][metadata_len: u16][metadata: var] Reverse Index: "Who relates to this entity?" Key: [to_id: u64 BE][0x00][RREL:{kind}:{from_id: u64 BE}] Value: [weight: f64][timestamp: u64 BE] ``` This follows the subject-prefix key encoding pattern established in CODING_GUIDELINES.md Section 2. All relationships for a single entity are co-located under its entity ID prefix, enabling efficient prefix scans. ### Key Encoding Detail ``` Example: user_123 follows creator_xyz Forward key: [00 00 00 00 00 00 00 7B] -- user_123 as u64 BE [00] -- separator [52 45 4C 3A] -- "REL:" tag [66 6F 6C 6C 6F 77 73] -- "follows" [3A] -- ":" [00 00 00 00 00 00 01 86] -- creator_xyz as u64 BE Reverse key: [00 00 00 00 00 00 01 86] -- creator_xyz as u64 BE [00] -- separator [52 52 45 4C 3A] -- "RREL:" tag [66 6F 6C 6C 6F 77 73] -- "follows" [3A] -- ":" [00 00 00 00 00 00 00 7B] -- user_123 as u64 BE ``` ### Prefix Scan Patterns | Query | Prefix | |-------|--------| | All relationships from user_123 | `[user_123][0x00]REL:` | | All "follows" edges from user_123 | `[user_123][0x00]REL:follows:` | | All followers of creator_xyz | `[creator_xyz][0x00]RREL:follows:` | | All blocked creators for user_123 | `[user_123][0x00]REL:blocked:` | | All users who blocked creator_bad | `[creator_bad][0x00]RREL:blocked:` | ### Sorted Storage for Top-K For weighted relationship types (`interaction_weight`, `similarity`, `creator_similarity`), an additional weight-sorted index enables efficient top-K retrieval without scanning all edges: ``` Weight-Sorted Index: Key: [from_id: u64 BE][0x00][RELW:{kind}:{inverted_weight: u64 BE}:{to_id: u64 BE}] Value: (empty -- weight is encoded in key) ``` The weight is stored as `u64::MAX - weight.to_bits()` so that byte-lexicographic ordering yields descending weight order. A prefix scan on `[from_id][0x00]RELW:interaction_weight:` returns edges sorted by weight, highest first, enabling early termination for top-K queries. ### Storage Namespace Relationships are stored in a dedicated storage namespace (column family / keyspace), separate from entity metadata and signal ledgers. This ensures that relationship-heavy operations (social graph traversal, fan-out queries) do not contend with signal writes or metadata reads. ``` Namespace: "relationships" Forward index: REL:{kind}:{to_id} Reverse index: RREL:{kind}:{from_id} Weight-sorted index: RELW:{kind}:{inverted_weight}:{to_id} ``` ### Storage Layout Diagram ``` Entity Storage (separate namespace) +------------------------------------------+ | [entity_id][0x00]META -> metadata blob | | [entity_id][0x00]EMB -> embedding vec | +------------------------------------------+ Signal Storage (separate namespace) +------------------------------------------+ | [entity_id][0x00]SIG:view:24h -> agg | | [entity_id][0x00]SIG:like:7d -> agg | +------------------------------------------+ Relationship Storage (this spec) +------------------------------------------+ | Forward edges: | | [user_A][0x00]REL:follows:creator_X | | [user_A][0x00]REL:follows:creator_Y | | [user_A][0x00]REL:blocked:creator_Z | | [user_A][0x00]REL:interaction_weight:cX | | | | Reverse edges: | | [creator_X][0x00]RREL:follows:user_A | | [creator_X][0x00]RREL:follows:user_B | | [creator_X][0x00]RREL:follows:user_C | | | | Weight-sorted edges: | | [user_A][0x00]RELW:interaction_weight:... | +------------------------------------------+ ``` --- ## 6. Social Graph Queries ### Depth-Limited BFS Traversal Social graph queries retrieve entity IDs reachable within a bounded number of hops. The result is a set of entity IDs used as a candidate filter or boost source in ranking queries. ``` Algorithm: depth_limited_bfs(start, edge_kind, max_depth, max_fan_out, min_weight) Input: start: EntityId -- the user whose graph we traverse edge_kind: RelationshipKind -- which edges to follow (e.g., follows) max_depth: u8 -- maximum hops (1 or 2 in practice) max_fan_out: u32 -- max edges to traverse per node per hop min_weight: f64 -- skip edges below this weight Output: Set -- all entities reachable within constraints ``` ### Traversal Pseudocode ``` fn depth_limited_bfs( start: EntityId, edge_kind: RelationshipKind, max_depth: u8, max_fan_out: u32, min_weight: f64, ) -> HashSet { let mut visited: HashSet = HashSet::new(); let mut current_frontier: Vec = vec![start]; let mut result: HashSet = HashSet::new(); for depth in 0..max_depth { let mut next_frontier: Vec = Vec::new(); for entity_id in ¤t_frontier { if !visited.insert(*entity_id) { continue; // already visited } // Scan forward edges, weight-sorted, up to max_fan_out let edges = scan_forward_edges_by_weight( *entity_id, edge_kind, min_weight, max_fan_out, ); for edge in edges { result.insert(edge.to_id); next_frontier.push(edge.to_id); } } current_frontier = next_frontier; } result } ``` ### Depth 1: Direct Graph "Content from creators I follow" -- one hop from user to creators, then items by those creators. ``` User @u123 --follows--> Creator A Creator B Creator C Result: {Creator A, Creator B, Creator C} Candidate set: items where creator_id IN result ``` ### Depth 2: Extended Social Graph "Content engaged by people my follows follow" -- two hops through the social graph. ``` User @u123 --follows--> Creator A --engaged_by--> User X --follows--> Creator D Creator B User Y Creator E Creator C Hop 1: {Creator A, Creator B, Creator C} Hop 2: Traverse items engaged by followers of Creator A/B/C, collect their creators or items Result: Items engaged by users who also follow @u123's followed creators ``` ### Fan-Out Control Without fan-out limits, a depth-2 traversal through a creator with 1M followers is catastrophic. Fan-out control bounds the work at each hop. | Parameter | Default | Purpose | |-----------|---------|---------| | `max_fan_out` | 100 per hop | Caps edges traversed per node. Top-K by weight, so highest-affinity edges are always included. | | `min_weight` | 0.0 | Filters low-affinity edges. Setting to 0.1 skips decayed interaction weights. | | `max_depth` | 2 | Hard cap on traversal depth. Depth 3+ is computationally expensive and produces noisy results. | ### Weight-Filtered Traversal For `interaction_weight` edges, the weight-sorted index enables efficient traversal of only high-affinity connections: ``` // "Content from creators I interact with most" // Only follows where interaction_weight > 0.3 depth_limited_bfs( start: user_123, edge_kind: Follows, // traversal edge max_depth: 1, max_fan_out: 50, min_weight: 0.3, // interaction_weight threshold ) ``` This requires a join between the `follows` edge set and the `interaction_weight` edge set for the same user. The implementation reads the `follows` forward index and lookups each target's `interaction_weight` to apply the threshold. For users with fewer than ~500 follows, this is efficient. For users with thousands of follows, the weight-sorted index on `interaction_weight` is used as the primary scan, filtered against the `follows` set. --- ## 7. Relationship-Based Candidate Generation ### Candidate Sources Ranking profiles reference relationships for candidate generation and scoring: ```rust // Following feed: candidates are items from followed creators Candidate::Relationship { edge: "follows" } // Traverses user -> follows -> creator, then retrieves items by those creators // Social graph scoped: candidates engaged by extended graph Candidate::SocialGraph { depth: 2, edge: "follows", min_weight: 0.1 } // BFS through social graph, collects item IDs engaged by discovered users ``` ### Filter Integration Relationships power several query filters defined in API.md: | Filter | Relationship Used | Behavior | |--------|-------------------|----------| | `Filter::relationship("follows")` | `follows` edge | Items from followed creators only | | `Filter::social_graph(user_id, Depth::Two)` | `follows` + `interaction_weight` | Items engaged by extended social graph | | `Filter::not_blocked()` | `blocked` edge | Exclude items/creators with blocked edges | | `Filter::unseen()` | `engagement_affinity` edge (existence check) | Exclude items with any engagement_affinity edge | | `Filter::user_state("saved")` | `saved` edge | Include only saved items | | `Filter::user_state("liked")` | Signal history (not a relationship) | -- | | `Filter::creator_followed_by_user()` | `follows` edge | Items from followed creators | | `Filter::creator_new_to_user()` | No `interaction_weight` or `engagement_affinity` edges | Creators the user has never engaged with | ### Exclude Predicates in Ranking Profiles ```rust // Ranking profile exclusions evaluated before scoring begins Exclude::relationship("blocked") // Loads user's blocked edge set -> Roaring bitmap of excluded entity IDs // Applied as a pre-filter on the candidate set // Cost: one prefix scan on [user_id][0x00]REL:blocked: at query start ``` ### Boost Predicates in Ranking Profiles ```rust // Ranking profile boosts applied during scoring Boost::relationship("interaction_weight", 0.2) // For each candidate item: // 1. Lookup item's creator_id // 2. Lookup user -> creator interaction_weight edge // 3. Multiply weight * boost_factor and add to score // Cost: one key lookup per candidate (amortized by creator dedup) Boost::social_proof(0.15) // For each candidate item: // 1. Reverse-lookup: which users in the social graph engaged with this item? // 2. Count or weight-sum those engagements // 3. Multiply by boost_factor and add to score // Cost: one reverse scan per candidate (expensive -- cache at query start) ``` ### Social Proof Implementation Social proof requires knowing which items were engaged by users in the querying user's social graph. This is expensive to compute per-candidate. The implementation strategy: 1. At query start, compute the social graph set: `graph_users = bfs(user, follows, depth=2, fan_out=100)` 2. For each user in `graph_users`, load their recent engagement_affinity edges: `engaged_items = scan(graph_user, engagement_affinity, limit=50)` 3. Build a HashMap: `social_proof_map: HashMap` mapping item IDs to aggregate social proof scores 4. During candidate scoring, lookup `social_proof_map.get(candidate_id)` -- O(1) per candidate This pre-computation runs once per query and is bounded by `depth * fan_out * engagement_limit = 2 * 100 * 50 = 10,000` edge reads. At ~1 microsecond per edge read, this is ~10ms -- within budget if cached across pagination requests. --- ## 8. Weight Update Mechanics ### Signal-Driven Weight Updates When a signal event is written, the database atomically updates implicit relationship weights as part of the same transaction. These updates are not configurable by the application -- they are hardcoded behavior of each signal type. ### Interaction Weight Update `interaction_weight` (user -> creator) is the primary implicit relationship. It captures how much a user engages with a specific creator's content. **Update formula:** ``` On signal event for item I by creator C, from user U: current = load_edge(U, interaction_weight, C) // Apply decay since last update dt = now - current.timestamp decayed = current.weight * exp(-lambda_iw * dt) // Apply signal delta delta = signal_weight_map[signal_kind] new_weight = clamp(decayed + delta, 0.0, 1.0) // Store store_edge(U, interaction_weight, C, new_weight, now) ``` **Signal weight map (delta per signal type):** | Signal | Delta | Rationale | |--------|-------|-----------| | `view` | +0.01 | Weak positive. Viewing is passive. | | `completion` | +0.03 * completion_ratio | Moderate positive, scaled by how much was consumed. | | `like` | +0.05 | Strong positive. Explicit approval. | | `share` | +0.07 | Very strong positive. Social endorsement. | | `comment` | +0.04 | Strong positive. Active engagement. | | `save` | +0.03 | Moderate positive. Intent to return. | | `skip` | -0.02 | Weak negative. Single skip is noisy. | | `hide` | -0.10 | Strong negative. Explicit rejection. | | `not_interested` | -0.08 | Strong negative. Topic-level rejection. | | `block` | -> 0.0 | Zeroes weight. Also creates blocked edge. | **Decay rate:** lambda_iw = ln(2) / (30 * 24 * 3600) -- 30-day half-life. Without reinforcing signals, interaction_weight decays to half its value in 30 days. **Bounds:** Weight is clamped to [0.0, 1.0] after every update. A weight that decays below a threshold (0.001) is pruned from storage to bound edge count. ### Engagement Affinity Update `engagement_affinity` (user -> item) tracks per-item engagement depth. **Update formula:** ``` On signal event for item I from user U: current = load_edge(U, engagement_affinity, I) // If no edge exists, create with initial weight if current is None: weight = signal_affinity_map[signal_kind] store_edge(U, engagement_affinity, I, weight, now) return // Existing edge: decay + increment dt = now - current.timestamp decayed = current.weight * exp(-lambda_ea * dt) delta = signal_affinity_map[signal_kind] new_weight = clamp(decayed + delta, 0.0, 1.0) store_edge(U, engagement_affinity, I, new_weight, now) ``` **Signal affinity map:** | Signal | Delta | |--------|-------| | `view` | +0.10 | | `completion` | +0.30 * completion_ratio | | `like` | +0.25 | | `share` | +0.20 | | `save` | +0.15 | | `skip` | -0.15 | | `hide` | -> sets to 0.0, creates permanent exclusion | **Decay rate:** lambda_ea = ln(2) / (7 * 24 * 3600) -- 7-day half-life. Item-level affinity decays faster than creator-level, reflecting the transient nature of individual content interest. ### Block Cascade When a user blocks a creator, a cascade of relationship updates occurs atomically: ``` On block(user U, creator C): 1. Create edge: (U, blocked, C, weight=1.0, now) 2. Delete edge: (U, follows, C) -- if exists 3. Set edge: (U, interaction_weight, C, weight=0.0, now) 4. For each item I by creator C where edge(U, engagement_affinity, I) exists: Set edge: (U, engagement_affinity, I, weight=0.0, now) // Note: do NOT delete -- the zero-weight edge serves as a // permanent exclusion marker for unseen filters ``` The cascade is bounded by the number of items the user has engaged with from this creator, which is typically small (tens, not thousands). For the rare case of blocking a creator after extensive engagement, the cascade is still O(items_engaged), not O(creator_catalog_size). ### Mute Behavior Muting is less aggressive than blocking: ``` On mute(user U, creator C): 1. Create edge: (U, muted, C, weight=1.0, now) // No cascade. No weight changes. No follows removal. // The muted edge is checked during candidate filtering: // - Excluded from algorithmic feeds (for_you, trending, browse) // - Excluded from notifications // - Included in Following feed if the user also follows this creator // - Included in explicit search results ``` --- ## 9. Collaborative Filtering Edges ### Item-Item Similarity `similarity` (item -> item) captures co-engagement patterns: "users who engaged with A also engaged with B." This is not computed in real-time. It is a background job that periodically recomputes similarity edges. ### Computation ``` Algorithm: compute_item_similarity() For each item A with sufficient engagement (min 50 unique engagers): engagers_A = set of users with engagement_affinity edge to A For each item B where |engagers_A intersection engagers_B| >= min_co_engagement: jaccard = |engagers_A & engagers_B| / |engagers_A | engagers_B| // Weight by engagement depth: users who completed both // contribute more than users who merely viewed both weighted_sim = sum( min(affinity(u, A), affinity(u, B)) for u in engagers_A & engagers_B ) / max(|engagers_A|, |engagers_B|) similarity = 0.5 * jaccard + 0.5 * weighted_sim if similarity > min_threshold: store_edge(A, similarity, B, similarity, now) ``` ### Top-K Storage The full N x N similarity matrix is intractable. For 10M items, even 0.1% density means 10 billion edges. Instead, store only the top-K most similar items per item: | Parameter | Value | Rationale | |-----------|-------|-----------| | K (max similar items per item) | 50 | Sufficient for related/up-next queries. Beyond 50, similarity scores are noise. | | min_threshold | 0.05 | Below this, co-engagement is likely coincidental. | | min_co_engagement | 5 users | Below this, sample size is too small for meaningful similarity. | | min_engagers | 50 | Items with fewer engagers lack signal for collaborative filtering. Cold-start items use embedding similarity instead. | ### Update Frequency | Update Type | Frequency | Scope | |-------------|-----------|-------| | Full recomputation | Weekly | All items with sufficient engagement | | Incremental update | Hourly | Items whose engagement count changed by >10% since last computation | | Hot item update | Every 15 minutes | Items in the top 1% by engagement velocity | ### Use in Ranking The `related` profile uses similarity edges for candidate generation: ```rust // In the related/up-next profile: // Given anchor item A, retrieve candidates via: // 1. ANN: items near A's embedding (semantic similarity) // 2. Collaborative: items with similarity edges from A (co-engagement) // Merge both candidate sets, deduplicate, then score Candidate::Hybrid { ann: AnnSource { anchor: item_A, top_k: 200 }, collaborative: CollaborativeSource { anchor: item_A, top_k: 100 }, merge: Merge::Union, } ``` ### Creator-Creator Similarity `creator_similarity` (creator -> creator) is simpler: it uses embedding distance between creator vectors, not co-engagement patterns. ``` For each creator A: top_k_similar = ann_query(A.embedding, top_k=20, entity_type=Creator) for (creator_B, distance) in top_k_similar: similarity = 1.0 - distance // assuming normalized L2 store_edge(A, creator_similarity, B, similarity, now) ``` Update frequency: whenever a creator's embedding is recomputed (typically when their catalog changes significantly). --- ## 10. Relationship Lifecycle ### Create **Explicit relationships:** ```rust db.write_relationship(Relationship { kind: "follows", from: ("user", "user_123"), to: ("creator", "creator_xyz"), weight: 1.0, timestamp: Utc::now(), })?; ``` On commit: 1. Write forward edge: `[user_123][0x00]REL:follows:creator_xyz` 2. Write reverse edge: `[creator_xyz][0x00]RREL:follows:user_123` 3. Append to WAL for durability 4. If follows: initialize `interaction_weight` edge if none exists (weight = 0.1, giving the new follow a small initial boost) **Implicit relationships:** Created automatically on first signal event involving the user-entity pair. No API call needed. ### Update **Explicit relationships:** Idempotent write. Writing a follows edge that already exists updates the timestamp. Weight changes on explicit relationships are not supported -- they are binary. **Implicit relationships:** Updated atomically as part of signal processing. See Section 8 for formulas. ### Delete **Explicit relationships:** ```rust db.delete_relationship(RelationshipDelete { kind: "follows", from: ("user", "user_123"), to: ("creator", "creator_xyz"), })?; ``` On commit: 1. Delete forward edge 2. Delete reverse edge 3. Delete weight-sorted entry (if applicable) 4. Append tombstone to WAL 5. If unfollowing: decay `interaction_weight` by 50% immediately (the user is actively disengaging) **Implicit relationships:** Never explicitly deleted by the application. They decay to zero over time (pruned when weight < 0.001) or are zeroed by cascade operations (block). ### Cascade on Block See Section 8 "Block Cascade" for the full cascade behavior. Summary: ``` block(user, creator): + create blocked edge - delete follows edge - zero interaction_weight - zero all engagement_affinity edges to creator's items ``` ### Cascade on Unblock ``` unblock(user, creator): - delete blocked edge // Does NOT restore follows, interaction_weight, or engagement_affinity. // The user must explicitly re-follow. // interaction_weight starts from zero and rebuilds organically. ``` --- ## 11. Scale Considerations ### Power-Law Distribution Social graphs follow a power-law distribution. Most users follow 10-100 creators. Some follow 10,000+. Most creators have 100-10,000 followers. Some have millions. | Metric | Median | p99 | Max | |--------|--------|-----|-----| | Follows per user | 50 | 2,000 | 50,000 | | Followers per creator | 500 | 100,000 | 10,000,000 | | Interaction_weight edges per user | 30 | 500 | 5,000 | | Engagement_affinity edges per user | 200 | 5,000 | 50,000 | | Similarity edges per item | 20 | 50 | 50 (capped) | ### Storage Budget For 1M users and 100K creators: | Relationship Type | Edge Count (est.) | Bytes per Edge | Total Storage | |-------------------|-------------------|----------------|---------------| | follows | 50M (50 avg/user) | 40 bytes (fwd + rev) | 2.0 GB | | blocked | 2M (2 avg/user) | 40 bytes | 80 MB | | muted | 500K | 40 bytes | 20 MB | | saved | 20M (20 avg/user) | 40 bytes | 800 MB | | interaction_weight | 30M (30 active creators/user) | 56 bytes (fwd + rev + weight-sorted) | 1.7 GB | | engagement_affinity | 200M (200 items/user) | 40 bytes | 8.0 GB | | similarity | 50M (50/item, 1M items) | 24 bytes (canonical + rev) | 1.2 GB | | creator_similarity | 2M (20/creator, 100K creators) | 24 bytes | 48 MB | | **Total** | | | **~14 GB** | ### Hot Relationship State For sub-millisecond ranking query performance, frequently accessed relationship data must be cached in memory: **Must be in memory:** - Blocked edges for the querying user (loaded once at query start, used as exclusion bitmap) - Muted edges for the querying user (loaded once at query start) - Follows edges for the querying user (loaded for Following feed candidate generation) **Should be in memory (LRU cache):** - Interaction_weight edges for the querying user (top-K by weight, for ranking boosts) - Social proof map (computed per query, cached across pagination) **Disk-resident (acceptable latency):** - Similarity edges (read during related/up-next queries, not on hot path for feed queries) - Engagement_affinity edges (read for user state filters, indexed for fast existence checks) - Reverse indexes (read for follower count, notification queries) ### Memory Budget for Relationship Cache | Component | Size | Notes | |-----------|------|-------| | Per-user blocked set | ~64 bytes + 8 bytes per blocked entity | Roaring bitmap. Median: ~80 bytes. | | Per-user muted set | ~64 bytes + 8 bytes per muted entity | Roaring bitmap. Median: ~72 bytes. | | Per-user follows set | ~64 bytes + 8 bytes per follow | Roaring bitmap. Median: ~464 bytes. | | Per-user top-50 interaction weights | 50 * 16 bytes = 800 bytes | (entity_id, weight) pairs. | | Social proof map (per query) | ~80 KB for 10K entries | HashMap. Ephemeral. | For 10K concurrent users with cached relationship state: ~15 MB. Well within the memory budget. ### Fan-Out for Popular Creators A creator with 1M followers means the reverse index `[creator_id][0x00]RREL:follows:` prefix scan returns 1M entries. This scan is never performed during a ranking query -- it is only needed for: 1. Follower count display (use a materialized counter, not a scan) 2. Notification fan-out (background job with rate limiting) 3. "Creators followed by people who follow X" (bounded by fan-out control) **Materialized follower count:** Maintain an atomic counter per creator that increments on follow, decrements on unfollow. Never scan the reverse index for counting. --- ## 12. Integration Points ### Query Engine Integration ``` Query Execution Pipeline: 1. Parse query 2. Load user relationship state <-- RELATIONSHIPS - blocked set -> Roaring bitmap - muted set -> Roaring bitmap - follows set -> Roaring bitmap (if Following feed) 3. Generate candidates - Candidate::Relationship <-- RELATIONSHIPS - Candidate::SocialGraph <-- RELATIONSHIPS - Candidate::Ann (vector search) - Candidate::Scan (full scan) 4. Pre-filter candidates - Remove blocked entities <-- RELATIONSHIPS - Remove muted entities (if algorithmic feed) <-- RELATIONSHIPS - Apply unseen filter <-- RELATIONSHIPS (engagement_affinity existence) 5. Score candidates - Signal-based scoring <-- SIGNALS - Relationship boost <-- RELATIONSHIPS (interaction_weight lookup) - Social proof boost <-- RELATIONSHIPS (pre-computed map) 6. Diversity pass 7. Return results ``` ### Signal System Integration ``` Signal Write Transaction: 1. Append signal event to WAL 2. Update item signal ledger (windowed aggregates, velocity, decay) 3. Update user preference vector 4. Update user -> creator interaction_weight <-- RELATIONSHIPS 5. Update user -> item engagement_affinity <-- RELATIONSHIPS 6. Commit All steps are part of the same atomic transaction. Signal writes are the primary source of implicit relationship updates. ``` ### Feedback Loop Integration ``` Engagement Feedback Loop: User sees item in feed | v User engages (view, like, skip, hide, block) | v db.signal(Signal { kind, item, user, ... }) | +-- Update item signal ledger +-- Update user preference vector +-- Update interaction_weight (user -> creator) <-- RELATIONSHIPS +-- Update engagement_affinity (user -> item) <-- RELATIONSHIPS +-- If block: cascade relationships <-- RELATIONSHIPS | v Next query reflects all updates (within 100ms) ``` ### Ranking Profile Integration Relationships appear in ranking profiles in three positions: ```rust ProfileDef { // 1. Candidate generation candidate: Candidate::Relationship { edge: "follows" }, // 2. Scoring boosts boosts: vec![ Boost::relationship("interaction_weight", 0.2), Boost::social_proof(0.15), ], // 3. Exclusion predicates excludes: vec![ Exclude::relationship("blocked"), Exclude::relationship("muted"), // for algorithmic feeds ], } ``` --- ## 13. Performance Targets | Operation | Target | Method | |-----------|--------|--------| | Load user blocked set | < 100 microseconds | Prefix scan on `REL:blocked:`, build Roaring bitmap | | Load user follows set | < 500 microseconds | Prefix scan on `REL:follows:`, build Roaring bitmap | | Load top-50 interaction weights | < 200 microseconds | Weight-sorted index scan, early termination at 50 | | Single interaction_weight lookup | < 5 microseconds | Point key lookup | | Social graph BFS depth 1 | < 2 ms | Prefix scan + fan-out limit 100 | | Social graph BFS depth 2 | < 10 ms | Two rounds of prefix scan + fan-out limit 100 per hop | | Social proof map construction | < 10 ms | BFS + engagement_affinity scan for graph users | | Write explicit relationship | < 50 microseconds | Forward + reverse index write, WAL append | | Update implicit relationship (within signal write) | < 10 microseconds | Point read + point write (amortized with signal transaction) | | Similarity edge lookup for item | < 100 microseconds | Prefix scan on `REL:similarity:`, top-50 | | Block cascade | < 5 ms | Follows delete + interaction_weight zero + engagement_affinity scan and zero | ### Benchmark Criteria These targets must be validated with `criterion` benchmarks from the first implementation: ```rust // benchmarks/relationships.rs // Relationship read benchmarks bench_load_blocked_set(100_blocked_creators) // target: < 100 us bench_load_follows_set(500_follows) // target: < 500 us bench_top_k_interaction_weights(50_from_300) // target: < 200 us bench_single_weight_lookup() // target: < 5 us bench_social_graph_bfs_depth_1(fan_out_100) // target: < 2 ms bench_social_graph_bfs_depth_2(fan_out_100) // target: < 10 ms // Relationship write benchmarks bench_write_explicit_relationship() // target: < 50 us bench_update_interaction_weight() // target: < 10 us bench_block_cascade(20_engaged_items) // target: < 5 ms ``` --- ## Appendix A: Relationship Trait Interface The relationship subsystem exposes a trait that the query engine and signal system consume. No storage engine types leak across module boundaries. ```rust pub trait RelationshipStore: Send + Sync { /// Write an explicit relationship edge. fn write_edge(&self, edge: &RelationshipEdge) -> Result<()>; /// Delete an explicit relationship edge. fn delete_edge( &self, kind: RelationshipKind, from: EntityId, to: EntityId, ) -> Result<()>; /// Read a single edge weight. Returns None if no edge exists. fn get_weight( &self, kind: RelationshipKind, from: EntityId, to: EntityId, ) -> Result>; /// Load all edges of a given kind from an entity. /// Returns edges sorted by weight descending. fn scan_forward( &self, from: EntityId, kind: RelationshipKind, limit: Option, ) -> Result>; /// Load all edges of a given kind pointing to an entity. fn scan_reverse( &self, to: EntityId, kind: RelationshipKind, limit: Option, ) -> Result>; /// Load the set of entity IDs with a given edge kind from an entity. /// Returns as a Roaring bitmap for efficient set operations. fn load_edge_set( &self, from: EntityId, kind: RelationshipKind, ) -> Result; /// Update an implicit relationship weight atomically. /// Applies decay, adds delta, clamps to [0.0, 1.0]. fn update_weight( &self, kind: RelationshipKind, from: EntityId, to: EntityId, delta: f64, timestamp: Timestamp, ) -> Result; // returns new weight /// Depth-limited BFS traversal. fn traverse_graph( &self, start: EntityId, edge_kind: RelationshipKind, max_depth: u8, max_fan_out: u32, min_weight: f64, ) -> Result>; } ``` --- ## Appendix B: Property Test Invariants The following properties must hold under all inputs (validated with `proptest`): 1. **Blocked exclusion is absolute.** If a blocked edge exists from user U to creator C, no item by C ever appears in any query result for U. No exceptions. 2. **Weight bounds.** All implicit relationship weights are in [0.0, 1.0] after every update, regardless of signal sequence. 3. **Decay monotonicity.** Without new signals, interaction_weight and engagement_affinity monotonically decrease over time. 4. **Symmetric similarity.** For similarity edges, `weight(A, B) == weight(B, A)` always. 5. **Block cascade completeness.** After blocking creator C, `interaction_weight(U, C) == 0.0` and `engagement_affinity(U, I) == 0.0` for all items I by C. 6. **Unblock does not restore.** After unblocking, no follows, interaction_weight, or engagement_affinity edges are restored. They must be rebuilt organically. 7. **Idempotent explicit writes.** Writing the same explicit relationship twice produces the same state as writing it once (timestamp may differ). 8. **Forward-reverse consistency.** For every forward edge `(A, kind, B)`, a corresponding reverse entry `(B, kind, A)` exists in the reverse index. No orphaned forward or reverse entries. 9. **WAL replay produces identical state.** Replaying the relationship WAL from empty storage produces the same forward index, reverse index, and weight-sorted index as uninterrupted execution. 10. **Fan-out bounds are respected.** `traverse_graph` with `max_fan_out=N` never reads more than N edges per node per hop, regardless of actual edge count.