# Task 07: CoEngagementIndex LRU + Social Scale Verification

## Delivers

Verification that `CoEngagementIndex` eviction is correct at 2x capacity (memory stays bounded, weight-based ordering preserved). Social graph filter benchmark at 1M items confirming the `social_graph_bitmap` filter meets the < 50ms target. Cross-session preference merge at 100K users confirmed < 1ms per merge.

## Complexity

M

## Dependencies

- task-01 complete (1M-item benchmark infrastructure)
- Existing bench: `tidal/benches/social.rs` (10-creator, 20-follower baseline)

## Technical Design

### 1. CoEngagementIndex eviction at 2x capacity

The current implementation uses weight-based batch eviction: when `edges.len() > capacity`, it sorts all edges by weight ascending and removes the lowest-weight entries. This is correct but has two concerns at scale:

1. **Sort cost:** O(N log N) on the full edge map. At 100K edges, this is ~1.7M comparisons per eviction (100,000 x log2(100,000), with log2(100,000) ≈ 16.6).
2. **Memory correctness:** After eviction, `edges.len()` must not exceed `capacity`, matching the `edge_count <= capacity` invariant in the acceptance criteria.
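To make the eviction rule concrete, here is a minimal sketch of weight-based batch eviction over a plain `HashMap`. It is not the `CoEngagementIndex` implementation: the `EdgeKey` type, the `evict_lowest_weight` function, and the choice to evict down to exactly `capacity` are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical edge key: a pair of item ids (not the real index layout).
type EdgeKey = (u64, u64);

/// Removes the lowest-weight edges until at most `capacity` remain.
/// Sorting the full edge list is the O(N log N) step discussed above.
fn evict_lowest_weight(edges: &mut HashMap<EdgeKey, f32>, capacity: usize) {
    if edges.len() <= capacity {
        return; // nothing to evict
    }
    let excess = edges.len() - capacity;

    // Snapshot (key, weight) pairs and sort by weight ascending.
    let mut by_weight: Vec<(EdgeKey, f32)> =
        edges.iter().map(|(&k, &w)| (k, w)).collect();
    by_weight.sort_by(|a, b| a.1.total_cmp(&b.1));

    // Drop the `excess` weakest edges.
    for (key, _) in by_weight.into_iter().take(excess) {
        edges.remove(&key);
    }
}

#[test]
fn strongest_edges_survive() {
    let mut edges: HashMap<EdgeKey, f32> = HashMap::new();
    for i in 0..10u64 {
        edges.insert((i, i + 1), i as f32); // weights 0.0 ..= 9.0
    }
    evict_lowest_weight(&mut edges, 4);
    assert_eq!(edges.len(), 4);
    assert!(edges.contains_key(&(9, 10))); // heaviest edge kept
    assert!(!edges.contains_key(&(0, 1))); // lightest edge evicted
}
```

In this sketch the map ends at exactly `capacity` after an eviction; whether the real index does the same or evicts a larger batch is something the tests in this task should confirm rather than assume.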
#### Benchmark: eviction latency at scale

Extend `tidal/benches/social.rs`:

```rust
fn bench_co_engagement_eviction_at_scale(c: &mut Criterion) {
    let mut group = c.benchmark_group("co_engagement_eviction");

    // Benchmark eviction at increasing capacities; 100K is production-like
    for &capacity in &[10_000, 50_000, 100_000] {
        group.bench_function(
            BenchmarkId::new("eviction", format!("cap_{capacity}")),
            |b| {
                let index = CoEngagementIndex::with_capacity(capacity);

                // Pre-fill to at least capacity
                let items_per_user = 50;
                let users_needed = capacity / items_per_user + 1;
                for user in 0..users_needed as u64 {
                    for item in 0..items_per_user as u64 {
                        index.record_positive(user, EntityId::new(user * 1000 + item));
                    }
                }

                // Now each record_positive may trigger eviction
                let mut next_user = users_needed as u64;
                b.iter(|| {
                    index.record_positive(
                        black_box(next_user),
                        black_box(EntityId::new(next_user * 1000)),
                    );
                    index.record_positive(
                        black_box(next_user),
                        black_box(EntityId::new(next_user * 1000 + 1)),
                    );
                    next_user += 1;
                });
            },
        );
    }

    group.finish();
}
```

#### Correctness test: 2x capacity stress

```rust
#[test]
fn eviction_correctness_at_2x_capacity() {
    let capacity = 1_000;
    let index = CoEngagementIndex::with_capacity(capacity);

    // Drive past 2x capacity: 200 users x 20 items = 4,000 insertions
    let total_users = 200u64;
    let items_per_user = 20u64;
    for user in 0..total_users {
        for item in 0..items_per_user {
            index.record_positive(user, EntityId::new(user * 100 + item));
        }
    }

    // Invariant: edge_count <= capacity once insertion has finished
    assert!(
        index.edge_count() <= capacity,
        "edge_count {} exceeds capacity {capacity}",
        index.edge_count()
    );

    // Verify surviving edges have higher weights than evicted edges would have.
    // Since weight-based eviction preserves strongest edges, surviving edges
    // should have weight >= any evicted edge.
    let edges = index.iter_edges();
    let min_surviving_weight = edges
        .iter()
        .map(|&(_, _, w)| w)
        .fold(f32::INFINITY, f32::min);

    // With uniform engagement patterns, all edges have weight 1.0,
    // so min_surviving_weight should be >= 1.0
    assert!(
        min_surviving_weight >= 1.0,
        "surviving edge weight {min_surviving_weight} unexpectedly low"
    );
}
```

### 2. Social graph filter at 1M items

The existing benchmark operates at 10 creators / 20 followers / 50 items. At 1M items with 10K creators and larger social graphs, the bitmap construction may behave differently.

#### Scale benchmark

```rust
fn bench_social_graph_bitmap_1m(c: &mut Criterion) {
    let mut group = c.benchmark_group("social_graph_bitmap_1m");
    group.sample_size(10);
    group.measurement_time(Duration::from_secs(20));

    // Build a social graph at scale:
    // - 100 followed creators (realistic for an active user)
    // - 500 followers per creator (moderate fan-out)
    // - 10,000 items per creator (100 * 10K = 1M items)
    // Uses the 1M-scale builder defined below, not the small-scale
    // `build_social_state` already in `social.rs`.
    let (user_state, creator_items) = build_social_state_1m(100, 500, 10_000);

    group.bench_function("depth1_100creators_10k_items", |b| {
        b.iter(|| {
            social_graph_bitmap(
                black_box(1),
                black_box(1),
                &user_state,
                &creator_items,
            )
        });
    });

    group.bench_function("depth2_100creators_500followers_10k_items", |b| {
        b.iter(|| {
            social_graph_bitmap(
                black_box(1),
                black_box(2),
                &user_state,
                &creator_items,
            )
        });
    });

    group.finish();
}

/// Extended social state builder for 1M-item scale.
///
/// Reuses the `build_social_state` pattern from `social.rs` but at higher scale.
fn build_social_state_1m(
    num_creators: usize,
    followers_per_creator: usize,
    items_per_creator: usize,
) -> (UserStateIndex, CreatorItemsBitmap) {
    let user_state = UserStateIndex::new();
    let creator_items = CreatorItemsBitmap::new();
    let user_id = 1u64;

    for c in 0..num_creators {
        let creator_id = (100 + c) as u64;
        user_state.add_follow(user_id, creator_id);
        user_state.add_creator_follower(creator_id, user_id);

        for f in 0..followers_per_creator {
            let follower_id = (10_000 + c * followers_per_creator + f) as u64;
            user_state.add_creator_follower(creator_id, follower_id);
            // Each co-follower has seen some items
            for i in 0..5u32 {
                user_state.mark_seen(follower_id, (follower_id as u32) * 100 + i);
            }
        }

        // Item ids are 1-based and unique across creators
        for i in 0..items_per_creator {
            let item_id = ((c * items_per_creator + i) as u32) + 1;
            creator_items.add_item(creator_id, item_id);
        }
    }

    (user_state, creator_items)
}
```

### 3. Cross-session preference merge at 100K users

The `close_session` hook blends signaled item embeddings into `PreferenceVectors` via `update_with_custom_rate`. At 100K users x 10 sessions, verify each merge completes in < 1ms.
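For orientation, the per-merge work being timed is a single pass over the vector. A minimal sketch, assuming `update_with_custom_rate` applies the EMA rule `new = (1 - rate) * old + rate * update` that the accuracy test in the Test Strategy section expects; `ema_merge` is a hypothetical stand-in, not the real `PreferenceVectors` method:

```rust
/// Hypothetical stand-in for the per-user blend: an exponential moving
/// average applied element-wise, in place.
fn ema_merge(pref: &mut [f32], update: &[f32], rate: f32) {
    assert_eq!(pref.len(), update.len(), "dimension mismatch");
    for (p, u) in pref.iter_mut().zip(update) {
        *p = (1.0 - rate) * *p + rate * *u;
    }
}

#[test]
fn ten_merges_decay_geometrically() {
    let mut pref = vec![1.0f32; 4];
    let update = vec![0.0f32; 4];
    for _ in 0..10 {
        ema_merge(&mut pref, &update, 0.1);
    }
    // Merging toward zero ten times leaves 0.9^10 of the original value.
    let expected = 0.9f32.powi(10);
    for &v in &pref {
        assert!((v - expected).abs() < 1e-5);
    }
}
```

Under this assumption a 128-dimensional merge is 128 multiply-adds, so any latency near the 1ms budget would more likely come from lookup or locking around the stored vector than from the arithmetic.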
```rust
use rand::Rng; // brings `random` into scope for `rand::rng()`

fn bench_preference_merge_100k(c: &mut Criterion) {
    let mut group = c.benchmark_group("preference_merge");

    let pref_vectors = PreferenceVectors::new();

    // Pre-populate 100K users with initial preference vectors
    let dim = 128;
    let mut rng = rand::rng();
    for user_id in 0..100_000u64 {
        let vec: Vec<f32> = (0..dim).map(|_| rng.random::<f32>() - 0.5).collect();
        pref_vectors.set(user_id, &vec);
    }

    // Benchmark a single merge (EMA update with a new embedding)
    let update_vec: Vec<f32> = (0..dim).map(|_| rng.random::<f32>() - 0.5).collect();

    group.bench_function("single_merge_128d", |b| {
        let mut user_counter = 0u64;
        b.iter(|| {
            // Cycle through all users so each merge touches a different vector
            let user_id = user_counter % 100_000;
            user_counter += 1;
            pref_vectors.update_with_custom_rate(
                black_box(user_id),
                black_box(&update_vec),
                black_box(0.1), // DAMPING
            );
        });
    });

    // Benchmark batch merge (10 sessions worth of updates for one user)
    group.bench_function("10_session_merge_128d", |b| {
        let updates: Vec<Vec<f32>> = (0..10)
            .map(|_| (0..dim).map(|_| rng.random::<f32>() - 0.5).collect())
            .collect();

        b.iter(|| {
            for update in black_box(&updates) {
                pref_vectors.update_with_custom_rate(
                    black_box(42),
                    update,
                    black_box(0.1),
                );
            }
        });
    });

    group.finish();
}
```

### 4. CoEngagementIndex LRU ordering verification

The current eviction is weight-based, not time-based LRU.
The task acceptance criteria say "LRU ordering correct", so verify that the weight-based strategy is indeed the intended design (per the `co_engagement.rs` doc comment), and that it produces the correct ordering under adversarial input:

```rust
#[test]
fn eviction_preserves_high_weight_edges_under_skewed_input() {
    let capacity = 100;
    let index = CoEngagementIndex::with_capacity(capacity);

    // Phase 1: Create a few high-weight edges (multiple co-occurrences)
    // Users 1-10 all engage with items 1 and 2 -> edge (2, 1) gets weight 10
    for user in 1..=10u64 {
        index.record_positive(user, EntityId::new(1));
        index.record_positive(user, EntityId::new(2));
    }

    let high_weight = index.score(EntityId::new(2), EntityId::new(1));
    assert!(high_weight >= 10.0, "high-weight edge should have weight >= 10");

    // Phase 2: Flood with low-weight edges to trigger eviction
    for user in 100..300u64 {
        for item in (user * 10)..(user * 10 + 5) {
            index.record_positive(user, EntityId::new(item));
        }
    }

    // The high-weight edge should survive eviction
    let surviving_weight = index.score(EntityId::new(2), EntityId::new(1));
    assert!(
        surviving_weight >= 10.0,
        "high-weight edge (weight={surviving_weight}) should survive eviction"
    );

    // Capacity invariant holds
    assert!(
        index.edge_count() <= capacity,
        "edge_count {} exceeds capacity {capacity}",
        index.edge_count()
    );
}
```

### 5. Memory bounding verification

```rust
#[test]
fn co_engagement_memory_bounded_at_2x_insertions() {
    let capacity = 10_000;
    let index = CoEngagementIndex::with_capacity(capacity);

    // Insert 2x capacity worth of unique edges
    let total_edges_attempted = capacity * 2;
    let mut edge_count_history = Vec::new();

    for user in 0..(total_edges_attempted as u64 / 10) {
        for item in 0..10u64 {
            index.record_positive(user, EntityId::new(user * 100 + item));
        }
        // Sample the edge count once per user
        edge_count_history.push(index.edge_count());
    }

    // No point in the history should exceed capacity
    let max_observed = edge_count_history.iter().max().copied().unwrap_or(0);
    assert!(
        max_observed <= capacity,
        "max observed edge count {max_observed} exceeds capacity {capacity}"
    );

    // Final state should be at or below capacity
    assert!(index.edge_count() <= capacity);
}
```

## Acceptance Criteria

- [ ] CoEngagementIndex eviction at 2x capacity: `edge_count <= capacity` invariant holds
- [ ] Weight-based eviction preserves highest-weight edges (verified with skewed input)
- [ ] Eviction latency at 50K and 100K capacity benchmarked and documented
- [ ] Social graph bitmap at 1M items (100 creators x 10K items): depth-1 and depth-2 benchmarked
- [ ] Social graph filter p99 < 50ms at 1M items
- [ ] Cross-session preference merge: single merge < 1ms, 10-session batch < 10ms at 100K users
- [ ] Memory bounding: no edge count exceeds capacity at any point during 2x insertions
- [ ] Results documented in `docs/profiling/social-scale.md`

## Test Strategy

1. **Eviction invariant (property test):**

   ```rust
   proptest! {
       #[test]
       fn edge_count_never_exceeds_capacity(
           capacity in 10usize..1000,
           users in 5u64..50,
           items_per_user in 2u64..20,
       ) {
           let index = CoEngagementIndex::with_capacity(capacity);
           for user in 0..users {
               for item in 0..items_per_user {
                   index.record_positive(user, EntityId::new(user * 100 + item));
                   prop_assert!(
                       index.edge_count() <= capacity,
                       "edge_count {} > capacity {capacity} after record_positive(user={user}, item={item})",
                       index.edge_count()
                   );
               }
           }
       }
   }
   ```

2. **Social graph correctness:** Verify that the bitmap returned by `social_graph_bitmap` at 1M items contains only items from followed creators (no false positives from bitmap overflow).

3. **Preference merge accuracy:** After 10 merges with known vectors, verify the resulting preference vector is the correct EMA:

   ```rust
   #[test]
   fn preference_ema_accuracy() {
       let pref = PreferenceVectors::new();
       let initial = vec![1.0f32; 128];
       pref.set(1, &initial);

       let update = vec![0.0f32; 128];
       let lr = 0.1;

       // After one merge: (1 - 0.1) * 1.0 + 0.1 * 0.0 = 0.9
       pref.update_with_custom_rate(1, &update, lr);
       let result = pref.get(1).unwrap();
       for &v in &result {
           assert!((v - 0.9).abs() < 1e-5, "expected 0.9, got {v}");
       }
   }
   ```

4. **Benchmark regression:** Compare new social bench results against existing `tidal/benches/social.rs` baselines to ensure no degradation at small scale.
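Strategy item 2 is the only check listed without code. One way to phrase it is as a set difference: every id in the result must belong to some followed creator. The sketch below uses `BTreeSet<u32>` as a stand-in for the real bitmap type and a hypothetical `false_positives` helper; it assumes the depth-1 result is meant to be a subset of the followed creators' items, and how the actual bitmap is iterated would need to be adapted.

```rust
use std::collections::{BTreeSet, HashMap};

/// Returns the item ids in `result` that belong to no followed creator.
/// An empty return value means the filter produced no false positives.
/// (Hypothetical helper; `BTreeSet<u32>` stands in for the bitmap.)
fn false_positives(
    result: &BTreeSet<u32>,
    followed: &[u64],
    items_by_creator: &HashMap<u64, BTreeSet<u32>>,
) -> Vec<u32> {
    // Union of all items published by followed creators.
    let allowed: BTreeSet<u32> = followed
        .iter()
        .filter_map(|creator| items_by_creator.get(creator))
        .flatten()
        .copied()
        .collect();
    result.difference(&allowed).copied().collect()
}

#[test]
fn detects_items_from_unfollowed_creators() {
    let mut items_by_creator: HashMap<u64, BTreeSet<u32>> = HashMap::new();
    items_by_creator.insert(100, BTreeSet::from([1, 2]));
    items_by_creator.insert(101, BTreeSet::from([3]));
    items_by_creator.insert(999, BTreeSet::from([50])); // not followed

    let followed = [100u64, 101];
    let clean = BTreeSet::from([1, 3]);
    let leaky = BTreeSet::from([1, 50]);

    assert!(false_positives(&clean, &followed, &items_by_creator).is_empty());
    assert_eq!(false_positives(&leaky, &followed, &items_by_creator), vec![50]);
}
```

Returning the offending ids rather than a boolean makes a failure at 1M items easier to diagnose, since the assertion message can show which ids leaked.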