jordan 413b712c0a chore: initialize tidalDB repository with schema foundation and standards

- Schema phase 1 (tasks 01-02): EntityId, EntityKind, Timestamp, Score, SignalTypeDef, DecayModel, Window, WindowSet — all with property tests and benchmarks scaffolding
- Stub modules for storage, signals, query, ranking
- Full documentation suite: VISION, USE_CASES, SEQUENCE, API, CODING_GUIDELINES, ai-lookup, research docs, specs, roadmap, planning docs
- Marketing site (Next.js) with blog infrastructure
- .claude/ agents and skills for the tidalDB development workflow
- Foundation standards enforced: thiserror + tracing declared as dependencies, clippy::unwrap_used = deny added to lint config
- .gitignore hardened: .next/, node_modules/, .env, secrets, logs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-20 12:52:20 -07:00

41 KiB

Raw Blame History

Specification: Relationships

Relationships are first-class edges between entities in tidalDB. They model the social graph, content interactions, and behavioral affinity that power personalized ranking, content filtering, and candidate generation.

This specification covers edge types, storage format, graph traversal, weight update mechanics, collaborative filtering, and integration points with the query engine, signal system, and ranking pipeline.

1. Design Principles
2. Edge Type Reference
3. Edge Properties
4. Directionality
5. Storage Format
6. Social Graph Queries
7. Relationship-Based Candidate Generation
8. Weight Update Mechanics
9. Collaborative Filtering Edges
10. Relationship Lifecycle
11. Scale Considerations
12. Integration Points
13. Performance Targets

1. Design Principles

Relationships are not metadata. A "follows" edge is not a row in a junction table. It is a live, weighted, directional edge that participates in ranking queries, drives candidate generation, and updates atomically with signal events.

Implicit relationships are database-managed. The application writes explicit relationships (follows, blocks). The database derives implicit relationships (interaction_weight, engagement_affinity, similarity) from signal events. The application never computes these.

Traversal is a query primitive. "Content from creators I follow" and "content engaged by people I follow" are not application-level loops over API calls. They are query-level operations that the database executes with fan-out control and weight filtering.

Negative relationships are hard constraints. A blocked edge is not a negative weight in a scoring function. It is a permanent exclusion predicate evaluated before any scoring occurs. Blocked content never enters the candidate set.

2. Edge Type Reference

Explicit Relationships (Application-Written)

These are created and deleted by the application via db.write_relationship() and db.delete_relationship().

Kind	From	To	Weight	Decay	Semantics
`follows`	User	Creator	1.0 (binary)	Permanent	Subscription. Powers Following feed candidate set.
`blocked`	User	Creator	1.0 (binary)	Permanent	Hard exclusion. All content by this creator is excluded from every query for this user.
`blocked`	User	Item	1.0 (binary)	Permanent	Hard exclusion. This item is excluded from every query for this user.
`muted`	User	Creator	1.0 (binary)	Permanent	Soft exclusion. Excluded from algorithmic feeds and notifications. Visible in explicit search and Following feed if also followed.
`saved`	User	Item	1.0 (binary)	Permanent	Bookmark. Powers `Filter::user_state("saved")` and `Sort::DateSaved`.
`subscribed`	User	Collection	1.0 (binary)	Permanent	Collection subscription. User receives updates when items are added.
`member_of`	Creator	Community	1.0 (binary)	Permanent	Community membership. Powers community-scoped queries.

Implicit Relationships (Database-Computed)

These are created and updated by the database as a side-effect of signal writes. The application never writes these directly.

Kind	From	To	Weight	Decay	Semantics
`interaction_weight`	User	Creator	0.0 - 1.0	Slow (30d half-life)	Cumulative engagement strength. Updated on every signal involving this user and any item by this creator. Powers notification prioritization, social proof, and personalized ranking boosts.
`engagement_affinity`	User	Item	0.0 - 1.0	Medium (7d half-life)	Per-item engagement depth. Computed from signal history (view, like, completion, dwell_time). Powers "continue watching," user library sorting, and user state filters.
`similarity`	Item	Item	0.0 - 1.0	Recomputed periodically	Co-engagement similarity. "Users who engaged with A also engaged with B." Powers related content and up-next queries.
`creator_similarity`	Creator	Creator	0.0 - 1.0	Recomputed periodically	Catalog embedding similarity between creators. Powers "creators like X" discovery (UC-10).

3. Edge Properties

Every relationship edge carries the following properties:

pub struct RelationshipEdge {
    /// The relationship type.
    pub kind: RelationshipKind,

    /// Source entity.
    pub from_type: EntityKind,
    pub from_id: EntityId,

    /// Target entity.
    pub to_type: EntityKind,
    pub to_id: EntityId,

    /// Edge weight.
    /// 1.0 for binary relationships (follows, blocked, saved).
    /// 0.0-1.0 for weighted relationships (interaction_weight, similarity).
    pub weight: f64,

    /// When the edge was created or last updated.
    pub timestamp: Timestamp,

    /// Optional metadata. Used sparingly.
    /// Examples: notification preferences on follows, save context.
    pub metadata: Option<Vec<u8>>,
}

RelationshipKind

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub enum RelationshipKind {
    // Explicit
    Follows,
    Blocked,
    Muted,
    Saved,
    Subscribed,
    MemberOf,

    // Implicit
    InteractionWeight,
    EngagementAffinity,
    Similarity,
    CreatorSimilarity,
}

Weight Semantics

Weight Value	Meaning
0.0	No relationship or fully decayed
0.0 - 0.3	Weak affinity (occasional engagement)
0.3 - 0.7	Moderate affinity (regular engagement)
0.7 - 1.0	Strong affinity (frequent, deep engagement)
1.0	Binary positive (follows, saved) or maximum affinity

Weight is always in the closed interval [0.0, 1.0]. The system clamps after every update.

4. Directionality

Unidirectional

A follows B does not imply B follows A. Storage and traversal treat each direction independently.

Types: follows, blocked, muted, saved, subscribed, member_of, interaction_weight, engagement_affinity

Storage: Only the forward edge (from -> to) is stored. Reverse lookups use the reverse index.

Bidirectional (Symmetric)

A is similar to B implies B is similar to A with the same weight.

Types: similarity, creator_similarity

Storage: Store only one edge per pair, using canonical ordering (lower entity ID first). The reverse index provides lookups from either direction. This halves storage for symmetric relationships.

Asymmetric

A's interaction_weight toward B can differ from B's interaction_weight toward A. A fan's weight toward a creator is high; the creator's weight toward that individual fan may be zero.

Types: interaction_weight

Storage: Each direction is a separate edge. Forward and reverse indexes reflect the actual asymmetric weights.

Directionality Impact on Storage

Unidirectional (follows, blocked, muted, saved):
    Forward:  user_123 -> creator_xyz  (stored)
    Reverse:  creator_xyz <- user_123  (indexed for reverse lookup)

    user_123 -> creator_abc  (separate edge, independent)

Bidirectional (similarity):
    Canonical: item_aaa -> item_bbb  (stored, aaa < bbb lexicographically)
    Lookup:    item_bbb -> item_aaa  (served from reverse index, same weight)

Asymmetric (interaction_weight):
    Forward:  user_123 -> creator_xyz  weight=0.85 (stored)
    Forward:  creator_xyz -> user_123  weight=0.02 (separate edge, different weight)

5. Storage Format

Dual-Index Architecture

Relationships use a dual-index storage layout: a forward index for outbound edge traversal and a reverse index for inbound edge lookups.

Forward Index: "Who does this entity relate to?"
    Key:   [from_id: u64 BE][0x00][REL:{kind}:{to_id: u64 BE}]
    Value: [weight: f64][timestamp: u64 BE][metadata_len: u16][metadata: var]

Reverse Index: "Who relates to this entity?"
    Key:   [to_id: u64 BE][0x00][RREL:{kind}:{from_id: u64 BE}]
    Value: [weight: f64][timestamp: u64 BE]

This follows the subject-prefix key encoding pattern established in CODING_GUIDELINES.md Section 2. All relationships for a single entity are co-located under its entity ID prefix, enabling efficient prefix scans.

Key Encoding Detail

Example: user_123 follows creator_xyz

Forward key:
    [00 00 00 00 00 00 00 7B]  -- user_123 as u64 BE
    [00]                        -- separator
    [52 45 4C 3A]               -- "REL:" tag
    [66 6F 6C 6C 6F 77 73]     -- "follows"
    [3A]                        -- ":"
    [00 00 00 00 00 00 01 86]   -- creator_xyz as u64 BE

Reverse key:
    [00 00 00 00 00 00 01 86]  -- creator_xyz as u64 BE
    [00]                        -- separator
    [52 52 45 4C 3A]            -- "RREL:" tag
    [66 6F 6C 6C 6F 77 73]     -- "follows"
    [3A]                        -- ":"
    [00 00 00 00 00 00 00 7B]  -- user_123 as u64 BE

Prefix Scan Patterns

Query	Prefix
All relationships from user_123	`[user_123][0x00]REL:`
All "follows" edges from user_123	`[user_123][0x00]REL:follows:`
All followers of creator_xyz	`[creator_xyz][0x00]RREL:follows:`
All blocked creators for user_123	`[user_123][0x00]REL:blocked:`
All users who blocked creator_bad	`[creator_bad][0x00]RREL:blocked:`

Sorted Storage for Top-K

For weighted relationship types (interaction_weight, similarity, creator_similarity), an additional weight-sorted index enables efficient top-K retrieval without scanning all edges:

Weight-Sorted Index:
    Key:   [from_id: u64 BE][0x00][RELW:{kind}:{inverted_weight: u64 BE}:{to_id: u64 BE}]
    Value: (empty -- weight is encoded in key)

The weight is stored as u64::MAX - weight.to_bits() so that byte-lexicographic ordering yields descending weight order. A prefix scan on [from_id][0x00]RELW:interaction_weight: returns edges sorted by weight, highest first, enabling early termination for top-K queries.

Storage Namespace

Relationships are stored in a dedicated storage namespace (column family / keyspace), separate from entity metadata and signal ledgers. This ensures that relationship-heavy operations (social graph traversal, fan-out queries) do not contend with signal writes or metadata reads.

Namespace: "relationships"
    Forward index:       REL:{kind}:{to_id}
    Reverse index:       RREL:{kind}:{from_id}
    Weight-sorted index: RELW:{kind}:{inverted_weight}:{to_id}

Storage Layout Diagram

Entity Storage (separate namespace)
+------------------------------------------+
| [entity_id][0x00]META  -> metadata blob  |
| [entity_id][0x00]EMB   -> embedding vec  |
+------------------------------------------+

Signal Storage (separate namespace)
+------------------------------------------+
| [entity_id][0x00]SIG:view:24h -> agg     |
| [entity_id][0x00]SIG:like:7d  -> agg     |
+------------------------------------------+

Relationship Storage (this spec)
+------------------------------------------+
| Forward edges:                            |
| [user_A][0x00]REL:follows:creator_X       |
| [user_A][0x00]REL:follows:creator_Y       |
| [user_A][0x00]REL:blocked:creator_Z       |
| [user_A][0x00]REL:interaction_weight:cX   |
|                                           |
| Reverse edges:                            |
| [creator_X][0x00]RREL:follows:user_A      |
| [creator_X][0x00]RREL:follows:user_B      |
| [creator_X][0x00]RREL:follows:user_C      |
|                                           |
| Weight-sorted edges:                      |
| [user_A][0x00]RELW:interaction_weight:... |
+------------------------------------------+

Depth-Limited BFS Traversal

Social graph queries retrieve entity IDs reachable within a bounded number of hops. The result is a set of entity IDs used as a candidate filter or boost source in ranking queries.

Algorithm: depth_limited_bfs(start, edge_kind, max_depth, max_fan_out, min_weight)

Input:
    start:        EntityId          -- the user whose graph we traverse
    edge_kind:    RelationshipKind  -- which edges to follow (e.g., follows)
    max_depth:    u8                -- maximum hops (1 or 2 in practice)
    max_fan_out:  u32               -- max edges to traverse per node per hop
    min_weight:   f64               -- skip edges below this weight

Output:
    Set<EntityId>  -- all entities reachable within constraints

Traversal Pseudocode

fn depth_limited_bfs(
    start: EntityId,
    edge_kind: RelationshipKind,
    max_depth: u8,
    max_fan_out: u32,
    min_weight: f64,
) -> HashSet<EntityId> {
    let mut visited: HashSet<EntityId> = HashSet::new();
    let mut current_frontier: Vec<EntityId> = vec![start];
    let mut result: HashSet<EntityId> = HashSet::new();

    for depth in 0..max_depth {
        let mut next_frontier: Vec<EntityId> = Vec::new();

        for entity_id in &current_frontier {
            if !visited.insert(*entity_id) {
                continue;  // already visited
            }

            // Scan forward edges, weight-sorted, up to max_fan_out
            let edges = scan_forward_edges_by_weight(
                *entity_id,
                edge_kind,
                min_weight,
                max_fan_out,
            );

            for edge in edges {
                result.insert(edge.to_id);
                next_frontier.push(edge.to_id);
            }
        }

        current_frontier = next_frontier;
    }

    result
}

Depth 1: Direct Graph

"Content from creators I follow" -- one hop from user to creators, then items by those creators.

User @u123 --follows--> Creator A
                         Creator B
                         Creator C

Result: {Creator A, Creator B, Creator C}
Candidate set: items where creator_id IN result

"Content engaged by people my follows follow" -- two hops through the social graph.

User @u123 --follows--> Creator A --engaged_by--> User X --follows--> Creator D
                         Creator B                  User Y               Creator E
                         Creator C

Hop 1: {Creator A, Creator B, Creator C}
Hop 2: Traverse items engaged by followers of Creator A/B/C,
        collect their creators or items
Result: Items engaged by users who also follow @u123's followed creators

Fan-Out Control

Without fan-out limits, a depth-2 traversal through a creator with 1M followers is catastrophic. Fan-out control bounds the work at each hop.

Parameter	Default	Purpose
`max_fan_out`	100 per hop	Caps edges traversed per node. Top-K by weight, so highest-affinity edges are always included.
`min_weight`	0.0	Filters low-affinity edges. Setting to 0.1 skips decayed interaction weights.
`max_depth`	2	Hard cap on traversal depth. Depth 3+ is computationally expensive and produces noisy results.

Weight-Filtered Traversal

For interaction_weight edges, the weight-sorted index enables efficient traversal of only high-affinity connections:

// "Content from creators I interact with most"
// Only follows where interaction_weight > 0.3
depth_limited_bfs(
    start: user_123,
    edge_kind: Follows,     // traversal edge
    max_depth: 1,
    max_fan_out: 50,
    min_weight: 0.3,        // interaction_weight threshold
)

This requires a join between the follows edge set and the interaction_weight edge set for the same user. The implementation reads the follows forward index and lookups each target's interaction_weight to apply the threshold. For users with fewer than ~500 follows, this is efficient. For users with thousands of follows, the weight-sorted index on interaction_weight is used as the primary scan, filtered against the follows set.

7. Relationship-Based Candidate Generation

Candidate Sources

Ranking profiles reference relationships for candidate generation and scoring:

// Following feed: candidates are items from followed creators
Candidate::Relationship { edge: "follows" }
// Traverses user -> follows -> creator, then retrieves items by those creators

// Social graph scoped: candidates engaged by extended graph
Candidate::SocialGraph { depth: 2, edge: "follows", min_weight: 0.1 }
// BFS through social graph, collects item IDs engaged by discovered users

Filter Integration

Relationships power several query filters defined in API.md:

Filter	Relationship Used	Behavior
`Filter::relationship("follows")`	`follows` edge	Items from followed creators only
`Filter::social_graph(user_id, Depth::Two)`	`follows` + `interaction_weight`	Items engaged by extended social graph
`Filter::not_blocked()`	`blocked` edge	Exclude items/creators with blocked edges
`Filter::unseen()`	`engagement_affinity` edge (existence check)	Exclude items with any engagement_affinity edge
`Filter::user_state("saved")`	`saved` edge	Include only saved items
`Filter::user_state("liked")`	Signal history (not a relationship)	--
`Filter::creator_followed_by_user()`	`follows` edge	Items from followed creators
`Filter::creator_new_to_user()`	No `interaction_weight` or `engagement_affinity` edges	Creators the user has never engaged with

Exclude Predicates in Ranking Profiles

// Ranking profile exclusions evaluated before scoring begins
Exclude::relationship("blocked")
// Loads user's blocked edge set -> Roaring bitmap of excluded entity IDs
// Applied as a pre-filter on the candidate set
// Cost: one prefix scan on [user_id][0x00]REL:blocked: at query start

Boost Predicates in Ranking Profiles

// Ranking profile boosts applied during scoring
Boost::relationship("interaction_weight", 0.2)
// For each candidate item:
//   1. Lookup item's creator_id
//   2. Lookup user -> creator interaction_weight edge
//   3. Multiply weight * boost_factor and add to score
// Cost: one key lookup per candidate (amortized by creator dedup)

Boost::social_proof(0.15)
// For each candidate item:
//   1. Reverse-lookup: which users in the social graph engaged with this item?
//   2. Count or weight-sum those engagements
//   3. Multiply by boost_factor and add to score
// Cost: one reverse scan per candidate (expensive -- cache at query start)

Social proof requires knowing which items were engaged by users in the querying user's social graph. This is expensive to compute per-candidate. The implementation strategy:

At query start, compute the social graph set: graph_users = bfs(user, follows, depth=2, fan_out=100)
For each user in graph_users, load their recent engagement_affinity edges: engaged_items = scan(graph_user, engagement_affinity, limit=50)
Build a HashMap: social_proof_map: HashMap<EntityId, f64> mapping item IDs to aggregate social proof scores
During candidate scoring, lookup social_proof_map.get(candidate_id) -- O(1) per candidate

This pre-computation runs once per query and is bounded by depth * fan_out * engagement_limit = 2 * 100 * 50 = 10,000 edge reads. At ~1 microsecond per edge read, this is ~10ms -- within budget if cached across pagination requests.

8. Weight Update Mechanics

Signal-Driven Weight Updates

When a signal event is written, the database atomically updates implicit relationship weights as part of the same transaction. These updates are not configurable by the application -- they are hardcoded behavior of each signal type.

Interaction Weight Update

interaction_weight (user -> creator) is the primary implicit relationship. It captures how much a user engages with a specific creator's content.

Update formula:

On signal event for item I by creator C, from user U:

    current = load_edge(U, interaction_weight, C)

    // Apply decay since last update
    dt = now - current.timestamp
    decayed = current.weight * exp(-lambda_iw * dt)

    // Apply signal delta
    delta = signal_weight_map[signal_kind]
    new_weight = clamp(decayed + delta, 0.0, 1.0)

    // Store
    store_edge(U, interaction_weight, C, new_weight, now)

Signal weight map (delta per signal type):

Signal	Delta	Rationale
`view`	+0.01	Weak positive. Viewing is passive.
`completion`	+0.03 * completion_ratio	Moderate positive, scaled by how much was consumed.
`like`	+0.05	Strong positive. Explicit approval.
`share`	+0.07	Very strong positive. Social endorsement.
`comment`	+0.04	Strong positive. Active engagement.
`save`	+0.03	Moderate positive. Intent to return.
`skip`	-0.02	Weak negative. Single skip is noisy.
`hide`	-0.10	Strong negative. Explicit rejection.
`not_interested`	-0.08	Strong negative. Topic-level rejection.
`block`	-> 0.0	Zeroes weight. Also creates blocked edge.

Decay rate: lambda_iw = ln(2) / (30 * 24 * 3600) -- 30-day half-life. Without reinforcing signals, interaction_weight decays to half its value in 30 days.

Bounds: Weight is clamped to [0.0, 1.0] after every update. A weight that decays below a threshold (0.001) is pruned from storage to bound edge count.

Engagement Affinity Update

engagement_affinity (user -> item) tracks per-item engagement depth.

Update formula:

On signal event for item I from user U:

    current = load_edge(U, engagement_affinity, I)

    // If no edge exists, create with initial weight
    if current is None:
        weight = signal_affinity_map[signal_kind]
        store_edge(U, engagement_affinity, I, weight, now)
        return

    // Existing edge: decay + increment
    dt = now - current.timestamp
    decayed = current.weight * exp(-lambda_ea * dt)
    delta = signal_affinity_map[signal_kind]
    new_weight = clamp(decayed + delta, 0.0, 1.0)

    store_edge(U, engagement_affinity, I, new_weight, now)

Signal affinity map:

Signal	Delta
`view`	+0.10
`completion`	+0.30 * completion_ratio
`like`	+0.25
`share`	+0.20
`save`	+0.15
`skip`	-0.15
`hide`	-> sets to 0.0, creates permanent exclusion

Decay rate: lambda_ea = ln(2) / (7 * 24 * 3600) -- 7-day half-life. Item-level affinity decays faster than creator-level, reflecting the transient nature of individual content interest.

Block Cascade

When a user blocks a creator, a cascade of relationship updates occurs atomically:

On block(user U, creator C):

    1. Create edge: (U, blocked, C, weight=1.0, now)
    2. Delete edge: (U, follows, C)         -- if exists
    3. Set edge:   (U, interaction_weight, C, weight=0.0, now)
    4. For each item I by creator C where edge(U, engagement_affinity, I) exists:
       Set edge: (U, engagement_affinity, I, weight=0.0, now)
       // Note: do NOT delete -- the zero-weight edge serves as a
       // permanent exclusion marker for unseen filters

The cascade is bounded by the number of items the user has engaged with from this creator, which is typically small (tens, not thousands). For the rare case of blocking a creator after extensive engagement, the cascade is still O(items_engaged), not O(creator_catalog_size).

Mute Behavior

Muting is less aggressive than blocking:

On mute(user U, creator C):

    1. Create edge: (U, muted, C, weight=1.0, now)
    // No cascade. No weight changes. No follows removal.
    // The muted edge is checked during candidate filtering:
    //   - Excluded from algorithmic feeds (for_you, trending, browse)
    //   - Excluded from notifications
    //   - Included in Following feed if the user also follows this creator
    //   - Included in explicit search results

9. Collaborative Filtering Edges

Item-Item Similarity

similarity (item -> item) captures co-engagement patterns: "users who engaged with A also engaged with B."

This is not computed in real-time. It is a background job that periodically recomputes similarity edges.

Computation

Algorithm: compute_item_similarity()

For each item A with sufficient engagement (min 50 unique engagers):
    engagers_A = set of users with engagement_affinity edge to A

    For each item B where |engagers_A intersection engagers_B| >= min_co_engagement:
        jaccard = |engagers_A & engagers_B| / |engagers_A | engagers_B|

        // Weight by engagement depth: users who completed both
        // contribute more than users who merely viewed both
        weighted_sim = sum(
            min(affinity(u, A), affinity(u, B))
            for u in engagers_A & engagers_B
        ) / max(|engagers_A|, |engagers_B|)

        similarity = 0.5 * jaccard + 0.5 * weighted_sim

        if similarity > min_threshold:
            store_edge(A, similarity, B, similarity, now)

Top-K Storage

The full N x N similarity matrix is intractable. For 10M items, even 0.1% density means 10 billion edges. Instead, store only the top-K most similar items per item:

Parameter	Value	Rationale
K (max similar items per item)	50	Sufficient for related/up-next queries. Beyond 50, similarity scores are noise.
min_threshold	0.05	Below this, co-engagement is likely coincidental.
min_co_engagement	5 users	Below this, sample size is too small for meaningful similarity.
min_engagers	50	Items with fewer engagers lack signal for collaborative filtering. Cold-start items use embedding similarity instead.

Update Frequency

Update Type	Frequency	Scope
Full recomputation	Weekly	All items with sufficient engagement
Incremental update	Hourly	Items whose engagement count changed by >10% since last computation
Hot item update	Every 15 minutes	Items in the top 1% by engagement velocity

Use in Ranking

The related profile uses similarity edges for candidate generation:

// In the related/up-next profile:
// Given anchor item A, retrieve candidates via:
// 1. ANN: items near A's embedding (semantic similarity)
// 2. Collaborative: items with similarity edges from A (co-engagement)
// Merge both candidate sets, deduplicate, then score

Candidate::Hybrid {
    ann: AnnSource { anchor: item_A, top_k: 200 },
    collaborative: CollaborativeSource { anchor: item_A, top_k: 100 },
    merge: Merge::Union,
}

Creator-Creator Similarity

creator_similarity (creator -> creator) is simpler: it uses embedding distance between creator vectors, not co-engagement patterns.

For each creator A:
    top_k_similar = ann_query(A.embedding, top_k=20, entity_type=Creator)
    for (creator_B, distance) in top_k_similar:
        similarity = 1.0 - distance  // assuming normalized L2
        store_edge(A, creator_similarity, B, similarity, now)

Update frequency: whenever a creator's embedding is recomputed (typically when their catalog changes significantly).

10. Relationship Lifecycle

Create

Explicit relationships:

db.write_relationship(Relationship {
    kind: "follows",
    from: ("user", "user_123"),
    to: ("creator", "creator_xyz"),
    weight: 1.0,
    timestamp: Utc::now(),
})?;

On commit:

Write forward edge: [user_123][0x00]REL:follows:creator_xyz
Write reverse edge: [creator_xyz][0x00]RREL:follows:user_123
Append to WAL for durability
If follows: initialize interaction_weight edge if none exists (weight = 0.1, giving the new follow a small initial boost)

Implicit relationships:

Created automatically on first signal event involving the user-entity pair. No API call needed.

Update

Explicit relationships: Idempotent write. Writing a follows edge that already exists updates the timestamp. Weight changes on explicit relationships are not supported -- they are binary.

Implicit relationships: Updated atomically as part of signal processing. See Section 8 for formulas.

Delete

Explicit relationships:

db.delete_relationship(RelationshipDelete {
    kind: "follows",
    from: ("user", "user_123"),
    to: ("creator", "creator_xyz"),
})?;

On commit:

Delete forward edge
Delete reverse edge
Delete weight-sorted entry (if applicable)
Append tombstone to WAL
If unfollowing: decay interaction_weight by 50% immediately (the user is actively disengaging)

Implicit relationships: Never explicitly deleted by the application. They decay to zero over time (pruned when weight < 0.001) or are zeroed by cascade operations (block).

Cascade on Block

See Section 8 "Block Cascade" for the full cascade behavior. Summary:

block(user, creator):
    + create blocked edge
    - delete follows edge
    - zero interaction_weight
    - zero all engagement_affinity edges to creator's items

Cascade on Unblock

unblock(user, creator):
    - delete blocked edge
    // Does NOT restore follows, interaction_weight, or engagement_affinity.
    // The user must explicitly re-follow.
    // interaction_weight starts from zero and rebuilds organically.

11. Scale Considerations

Power-Law Distribution

Social graphs follow a power-law distribution. Most users follow 10-100 creators. Some follow 10,000+. Most creators have 100-10,000 followers. Some have millions.

Metric	Median	p99	Max
Follows per user	50	2,000	50,000
Followers per creator	500	100,000	10,000,000
Interaction_weight edges per user	30	500	5,000
Engagement_affinity edges per user	200	5,000	50,000
Similarity edges per item	20	50	50 (capped)

Storage Budget

For 1M users and 100K creators:

Relationship Type	Edge Count (est.)	Bytes per Edge	Total Storage
follows	50M (50 avg/user)	40 bytes (fwd + rev)	2.0 GB
blocked	2M (2 avg/user)	40 bytes	80 MB
muted	500K	40 bytes	20 MB
saved	20M (20 avg/user)	40 bytes	800 MB
interaction_weight	30M (30 active creators/user)	56 bytes (fwd + rev + weight-sorted)	1.7 GB
engagement_affinity	200M (200 items/user)	40 bytes	8.0 GB
similarity	50M (50/item, 1M items)	24 bytes (canonical + rev)	1.2 GB
creator_similarity	2M (20/creator, 100K creators)	24 bytes	48 MB
Total			~14 GB

Hot Relationship State

For sub-millisecond ranking query performance, frequently accessed relationship data must be cached in memory:

Must be in memory:

Blocked edges for the querying user (loaded once at query start, used as exclusion bitmap)
Muted edges for the querying user (loaded once at query start)
Follows edges for the querying user (loaded for Following feed candidate generation)

Should be in memory (LRU cache):

Interaction_weight edges for the querying user (top-K by weight, for ranking boosts)
Social proof map (computed per query, cached across pagination)

Disk-resident (acceptable latency):

Similarity edges (read during related/up-next queries, not on hot path for feed queries)
Engagement_affinity edges (read for user state filters, indexed for fast existence checks)
Reverse indexes (read for follower count, notification queries)

Memory Budget for Relationship Cache

Component	Size	Notes
Per-user blocked set	~64 bytes + 8 bytes per blocked entity	Roaring bitmap. Median: ~80 bytes.
Per-user muted set	~64 bytes + 8 bytes per muted entity	Roaring bitmap. Median: ~72 bytes.
Per-user follows set	~64 bytes + 8 bytes per follow	Roaring bitmap. Median: ~464 bytes.
Per-user top-50 interaction weights	50 * 16 bytes = 800 bytes	(entity_id, weight) pairs.
Social proof map (per query)	~80 KB for 10K entries	HashMap<EntityId, f64>. Ephemeral.

For 10K concurrent users with cached relationship state: ~15 MB. Well within the memory budget.

Fan-Out for Popular Creators

A creator with 1M followers means the reverse index [creator_id][0x00]RREL:follows: prefix scan returns 1M entries. This scan is never performed during a ranking query -- it is only needed for:

Follower count display (use a materialized counter, not a scan)
Notification fan-out (background job with rate limiting)
"Creators followed by people who follow X" (bounded by fan-out control)

Materialized follower count: Maintain an atomic counter per creator that increments on follow, decrements on unfollow. Never scan the reverse index for counting.

12. Integration Points

Query Engine Integration

Query Execution Pipeline:

    1. Parse query
    2. Load user relationship state        <-- RELATIONSHIPS
       - blocked set -> Roaring bitmap
       - muted set -> Roaring bitmap
       - follows set -> Roaring bitmap (if Following feed)
    3. Generate candidates
       - Candidate::Relationship           <-- RELATIONSHIPS
       - Candidate::SocialGraph            <-- RELATIONSHIPS
       - Candidate::Ann (vector search)
       - Candidate::Scan (full scan)
    4. Pre-filter candidates
       - Remove blocked entities           <-- RELATIONSHIPS
       - Remove muted entities (if algorithmic feed) <-- RELATIONSHIPS
       - Apply unseen filter               <-- RELATIONSHIPS (engagement_affinity existence)
    5. Score candidates
       - Signal-based scoring              <-- SIGNALS
       - Relationship boost                <-- RELATIONSHIPS (interaction_weight lookup)
       - Social proof boost                <-- RELATIONSHIPS (pre-computed map)
    6. Diversity pass
    7. Return results

Signal System Integration

Signal Write Transaction:

    1. Append signal event to WAL
    2. Update item signal ledger (windowed aggregates, velocity, decay)
    3. Update user preference vector
    4. Update user -> creator interaction_weight    <-- RELATIONSHIPS
    5. Update user -> item engagement_affinity      <-- RELATIONSHIPS
    6. Commit

All steps are part of the same atomic transaction.
Signal writes are the primary source of implicit relationship updates.

Feedback Loop Integration

Engagement Feedback Loop:

    User sees item in feed
        |
        v
    User engages (view, like, skip, hide, block)
        |
        v
    db.signal(Signal { kind, item, user, ... })
        |
        +-- Update item signal ledger
        +-- Update user preference vector
        +-- Update interaction_weight (user -> creator)     <-- RELATIONSHIPS
        +-- Update engagement_affinity (user -> item)       <-- RELATIONSHIPS
        +-- If block: cascade relationships                 <-- RELATIONSHIPS
        |
        v
    Next query reflects all updates (within 100ms)

Ranking Profile Integration

Relationships appear in ranking profiles in three positions:

ProfileDef {
    // 1. Candidate generation
    candidate: Candidate::Relationship { edge: "follows" },

    // 2. Scoring boosts
    boosts: vec![
        Boost::relationship("interaction_weight", 0.2),
        Boost::social_proof(0.15),
    ],

    // 3. Exclusion predicates
    excludes: vec![
        Exclude::relationship("blocked"),
        Exclude::relationship("muted"),  // for algorithmic feeds
    ],
}

13. Performance Targets

Operation	Target	Method
Load user blocked set	< 100 microseconds	Prefix scan on `REL:blocked:`, build Roaring bitmap
Load user follows set	< 500 microseconds	Prefix scan on `REL:follows:`, build Roaring bitmap
Load top-50 interaction weights	< 200 microseconds	Weight-sorted index scan, early termination at 50
Single interaction_weight lookup	< 5 microseconds	Point key lookup
Social graph BFS depth 1	< 2 ms	Prefix scan + fan-out limit 100
Social graph BFS depth 2	< 10 ms	Two rounds of prefix scan + fan-out limit 100 per hop
Social proof map construction	< 10 ms	BFS + engagement_affinity scan for graph users
Write explicit relationship	< 50 microseconds	Forward + reverse index write, WAL append
Update implicit relationship (within signal write)	< 10 microseconds	Point read + point write (amortized with signal transaction)
Similarity edge lookup for item	< 100 microseconds	Prefix scan on `REL:similarity:`, top-50
Block cascade	< 5 ms	Follows delete + interaction_weight zero + engagement_affinity scan and zero

Benchmark Criteria

These targets must be validated with criterion benchmarks from the first implementation:

// benchmarks/relationships.rs

// Relationship read benchmarks
bench_load_blocked_set(100_blocked_creators)       // target: < 100 us
bench_load_follows_set(500_follows)                // target: < 500 us
bench_top_k_interaction_weights(50_from_300)       // target: < 200 us
bench_single_weight_lookup()                       // target: < 5 us
bench_social_graph_bfs_depth_1(fan_out_100)        // target: < 2 ms
bench_social_graph_bfs_depth_2(fan_out_100)        // target: < 10 ms

// Relationship write benchmarks
bench_write_explicit_relationship()                // target: < 50 us
bench_update_interaction_weight()                  // target: < 10 us
bench_block_cascade(20_engaged_items)              // target: < 5 ms

Appendix A: Relationship Trait Interface

The relationship subsystem exposes a trait that the query engine and signal system consume. No storage engine types leak across module boundaries.

pub trait RelationshipStore: Send + Sync {
    /// Write an explicit relationship edge.
    fn write_edge(&self, edge: &RelationshipEdge) -> Result<()>;

    /// Delete an explicit relationship edge.
    fn delete_edge(
        &self,
        kind: RelationshipKind,
        from: EntityId,
        to: EntityId,
    ) -> Result<()>;

    /// Read a single edge weight. Returns None if no edge exists.
    fn get_weight(
        &self,
        kind: RelationshipKind,
        from: EntityId,
        to: EntityId,
    ) -> Result<Option<f64>>;

    /// Load all edges of a given kind from an entity.
    /// Returns edges sorted by weight descending.
    fn scan_forward(
        &self,
        from: EntityId,
        kind: RelationshipKind,
        limit: Option<u32>,
    ) -> Result<Vec<RelationshipEdge>>;

    /// Load all edges of a given kind pointing to an entity.
    fn scan_reverse(
        &self,
        to: EntityId,
        kind: RelationshipKind,
        limit: Option<u32>,
    ) -> Result<Vec<RelationshipEdge>>;

    /// Load the set of entity IDs with a given edge kind from an entity.
    /// Returns as a Roaring bitmap for efficient set operations.
    fn load_edge_set(
        &self,
        from: EntityId,
        kind: RelationshipKind,
    ) -> Result<RoaringBitmap>;

    /// Update an implicit relationship weight atomically.
    /// Applies decay, adds delta, clamps to [0.0, 1.0].
    fn update_weight(
        &self,
        kind: RelationshipKind,
        from: EntityId,
        to: EntityId,
        delta: f64,
        timestamp: Timestamp,
    ) -> Result<f64>;  // returns new weight

    /// Depth-limited BFS traversal.
    fn traverse_graph(
        &self,
        start: EntityId,
        edge_kind: RelationshipKind,
        max_depth: u8,
        max_fan_out: u32,
        min_weight: f64,
    ) -> Result<Vec<EntityId>>;
}

Appendix B: Property Test Invariants

The following properties must hold under all inputs (validated with proptest):

Blocked exclusion is absolute. If a blocked edge exists from user U to creator C, no item by C ever appears in any query result for U. No exceptions.
Weight bounds. All implicit relationship weights are in [0.0, 1.0] after every update, regardless of signal sequence.
Decay monotonicity. Without new signals, interaction_weight and engagement_affinity monotonically decrease over time.
Symmetric similarity. For similarity edges, weight(A, B) == weight(B, A) always.
Block cascade completeness. After blocking creator C, interaction_weight(U, C) == 0.0 and engagement_affinity(U, I) == 0.0 for all items I by C.
Unblock does not restore. After unblocking, no follows, interaction_weight, or engagement_affinity edges are restored. They must be rebuilt organically.
Idempotent explicit writes. Writing the same explicit relationship twice produces the same state as writing it once (timestamp may differ).
Forward-reverse consistency. For every forward edge (A, kind, B), a corresponding reverse entry (B, kind, A) exists in the reverse index. No orphaned forward or reverse entries.
WAL replay produces identical state. Replaying the relationship WAL from empty storage produces the same forward index, reverse index, and weight-sorted index as uninterrupted execution.
Fan-out bounds are respected. traverse_graph with max_fan_out=N never reads more than N edges per node per hop, regardless of actual edge count.

41 KiB Raw Blame History