# Forage: Architecture

## Overview

Forage has two layers: a reusable **engine** and a demo **server**. The engine is the thing that transfers to other applications. The server is the demo that proves it. `plan.md` is the canonical build spec when details conflict.

```
applications/forage/
├── engine/    ← library crate: tidalDB wrapper + MAB + signal schema
└── server/    ← binary crate: Axum HTTP server + feed page (depends on engine)
```

Any application that wants a foraging loop embeds `forage-engine` directly. The Axum server and the feed page are one instantiation of that engine, not the thing itself.

Runtime default for the demo server:

- Persistent state at `~/.forage/data`
- Optional `--ephemeral` mode for throwaway sessions

### System Diagram

```
┌──────────────────────────────────────────────┐
│  Feed Page (browser, localhost:4242)         │
│                                              │
│  User (or Claude) clicks, skips, saves       │
│  JS posts signals directly via fetch()       │
│  Page polls /feed every 5s, re-renders       │
└──────────────────┬───────────────────────────┘
                   │ HTTP (localhost:4242)
                   │ POST /signal (from page JS)
                   │ GET /feed
                   ▼
┌──────────────────────────────────────────────┐
│  forage-server (Axum, thin)                  │
│  routes → handlers → ForageEngine            │
└──────────────────┬───────────────────────────┘
                   │ Rust function calls
                   ▼
┌──────────────────────────────────────────────┐
│  forage-engine (library crate)               │
│                                              │
│  ForageEngine { db: TidalDb }                │
│  fn signal(user, item, type) -> Result<()>   │
│  fn feed(user, limit) -> Result<Vec<_>>      │
│  fn seed(corpus) -> Result<()>               │
│                                              │
│  MAB layer (epsilon-greedy, labels)          │
│  Signal schema (view/dwell/save/skip/share)  │
│  Ranking profiles (default/explore/converge) │
└──────────────────┬───────────────────────────┘
                   │ embedded
                   ▼
┌──────────────────────────────────────────────┐
│  tidalDB                                     │
│  Entities · Signals · Profiles · HNSW · BM25 │
└──────────────────────────────────────────────┘
```

### Chrome Extension Role

In **P0**, the Chrome extension is a **light observer**, not a driver.
The feed page handles its own signal posting via plain JS `fetch()`, so no MCP tool call is needed for every click. Claude uses the extension to check in occasionally:

- **Once at session start**: `navigate` to the feed page
- **Periodically**: `read_page` to snapshot the current feed state (one call, not per-interaction)
- **At the end**: compare snapshots, report what shifted

This keeps token usage low. The interesting loop (signal → re-rank → new feed) runs entirely in the browser and server without any Claude involvement. In P0, Claude's role is observer and reporter, not puppeteer.

---

## Data Flow

### Write Path (Signal)

```
User clicks an item card on the feed page
  → POST /signal { user_id: 1, item_id: 42, signal_type: "view" }
  → forage-server receives request
  → forage-engine::signal(user, item, SignalType::View)
  → db.signal("view", EntityId::new(42), 1.0, Timestamp::now())   // value derived in engine
  → tidalDB writes to hot-tier SignalLedger (in-memory DashMap)
  → tidalDB updates user PreferenceVector (EMA blend toward item embedding, applied at the next session flush)
  → tidalDB persists WAL entry (fjall, durability)
  → HTTP 200
  ← total: < 5ms
```

### Read Path (Feed)

```
Feed page requests feed
  → GET /feed?user=1&limit=7
  → forage-server calls engine.feed(user_id, 7)
  → forage-engine MAB layer:
      exploit_pool = db.retrieve(
        Retrieve::builder()
          .for_user(user_id)
          .using_profile("forage_default")
          .filter(FilterExpr::unseen_by(user_id))
          .diversity(max_per_category: 2)
          .limit(20)
          .build()
      )
      explore_candidates = items where category_signal_count(user, cat) < EXPLORE_THRESHOLD
      final_7 = interleave(exploit_pool[0..6], explore_candidates[0..1], label each)
  → serialize to JSON with label, score, why_reason
  → HTTP 200 { items: [...] }
  ← total: < 50ms
```

### Preference Evolution

tidalDB's `apply_session_preference_update` is called on session close, not per-signal.
Forage uses a **periodic flush** pattern: a background task closes and reopens each user's session every 60 seconds, triggering the EMA blend of signaled item embeddings into the preference vector.

```
// forage-engine background task (spawned at startup)
every 60s:
  for each active user:
    db.close_session(user_id, session_id)   → triggers apply_session_preference_update
    db.open_session(user_id)                → fresh session for next window
```

Effective learning rates per signal type (via `update_with_custom_rate`):

```
"view"  → lr=0.05   (mild positive)
"dwell" → lr=0.10   (stronger: reading time is intent)
"save"  → lr=0.20   (strong intent)
"skip"  → lr=-0.02  (mild negative)
"share" → lr=0.30   (strongest positive)
```

The 60s flush means preference vectors lag signals by up to 60s. That is acceptable for a foraging engine: the feed refreshes every 5s, but deep preference shifts evolve over sessions, not seconds. The adaptive learning rate (tidalDB M6p6: `alpha = base / (1 + ln(n+1))`) shrinks as `n` grows, so early signals have more influence and later signals refine without overcorrecting.
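A minimal sketch of how the per-signal rates and the adaptive formula above could combine, assuming the update is a plain EMA step toward the item embedding and that `n` counts prior signals. `base_rate`, `adaptive_rate`, and `ema_blend` are hypothetical helper names for illustration, not tidalDB APIs; how tidalDB handles negative rates internally is not specified here.

```rust
/// Signed base learning rate per signal type, copied from the list above.
/// Returns None for signal types the schema does not declare.
fn base_rate(signal_type: &str) -> Option<f32> {
    match signal_type {
        "view" => Some(0.05),
        "dwell" => Some(0.10),
        "save" => Some(0.20),
        "skip" => Some(-0.02),
        "share" => Some(0.30),
        _ => None,
    }
}

/// alpha = base / (1 + ln(n + 1)).
/// Assumption: `n` is the number of signals already folded in.
fn adaptive_rate(base: f32, n: u32) -> f32 {
    base / (1.0 + ((n as f32) + 1.0).ln())
}

/// One EMA step: moves `pref` toward `item` by `alpha`
/// (or away from it when `alpha` is negative, as for "skip").
fn ema_blend(pref: &mut [f32], item: &[f32], alpha: f32) {
    for (p, x) in pref.iter_mut().zip(item) {
        *p += alpha * (*x - *p);
    }
}
```

With `n = 0` the logarithm term is zero, so the first signal is applied at the full base rate; every later signal is applied at a smaller fraction of it.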
---

## Signal Schema

```rust
// Declared on startup in forage-engine/src/schema.rs
let schema = SchemaBuilder::new()
    .signal("view", EntityKind::Item, DecaySpec::Exponential { half_life: days(7) })
        .windows(&[Window::TwentyFourHours, Window::AllTime])
        .velocity(true)
        .add()
    .signal("dwell", EntityKind::Item, DecaySpec::Exponential { half_life: days(3) })
        .windows(&[Window::TwentyFourHours, Window::AllTime])
        .velocity(false)
        .add()
    .signal("save", EntityKind::Item, DecaySpec::Exponential { half_life: days(30) })
        .windows(&[Window::AllTime])
        .velocity(false)
        .add()
    .signal("skip", EntityKind::Item, DecaySpec::Exponential { half_life: days(1) })
        .windows(&[Window::TwentyFourHours])
        .velocity(false)
        .add()
    .signal("share", EntityKind::Item, DecaySpec::Exponential { half_life: days(14) })
        .windows(&[Window::AllTime])
        .velocity(false)
        .add()
    .build()?;
```

Signal semantics:

| Signal | Half-life | Meaning | Learning Rate |
|--------|-----------|---------|---------------|
| `view` | 7 days | User opened the item | 0.05 |
| `dwell` | 3 days | User read for ≥30s (proxy for completion) | 0.10 |
| `save` | 30 days | User explicitly bookmarked it | 0.20 |
| `skip` | 1 day | User dismissed it | −0.02 |
| `share` | 14 days | User sent it to someone | 0.30 |

---

## Ranking Profiles

Three profiles covering the exploration/exploitation spectrum:

### `forage_default` (primary)

- Personalized blend: preference_match 0.5, signal_recency 0.3, quality 0.2
- Exploration budget: 14% (roughly 1 in 7 items)
- Diversity: `max_per_category: 2`
- Unseen filter: always on

### `forage_explore` (cold start / adventurous users)

- Exploration budget: 35%
- Boosts `hidden_gems` profile weighting (high quality, low view count)
- Wider diversity: `max_per_category: 1` (forces breadth across categories)

### `forage_converge` (power users with strong preferences)

- Exploration budget: 5%
- Pure preference match + recency
- Looser diversity cap: `max_per_category: 3` (allows depth in known interests)

---

## MAB Layer

The epsilon-greedy MAB lives in `forage-engine/src/mab.rs`. It wraps tidalDB queries; it does not replace them.

```rust
pub struct MabConfig {
    pub exploration_ratio: f32, // default 0.14
    pub explore_threshold: u64, // categories with < N user signals are exploration-eligible
}

// `LabeledItem` is a placeholder name for the labeled feed entry type.
pub fn rank(db: &TidalDb, user_id: u64, limit: usize, cfg: &MabConfig) -> Result<Vec<LabeledItem>> {
    // Step 1: Split the slots, then get the exploit pool (2× limit so we have headroom).
    // Rounding the explore share gives 1 slot for 0.14 × 7; taking ceil() of the
    // exploit share would claim all 7 slots and leave none for exploration.
    let explore_count = ((cfg.exploration_ratio * limit as f32).round() as usize).min(limit);

    let exploit_pool = db.retrieve(
        Retrieve::builder()
            .for_user(EntityId::new(user_id))
            .using_profile("forage_default")
            .filter(FilterExpr::unseen_by(user_id))
            .diversity(DiversityConstraints { max_per_category: Some(2), ..Default::default() })
            .limit(limit * 2)
            .build()?
    )?;

    // Step 2: Find exploration candidates (categories with < threshold signals)
    let explore_pool = exploit_pool.iter()
        .filter(|item| category_signal_count(db, user_id, item.category()) < cfg.explore_threshold)
        .take(explore_count * 3) // more candidates = better exploration variety
        .collect::<Vec<_>>();

    // Step 3: Interleave, label, return
    let mut result = Vec::with_capacity(limit);
    // Exploit slots skip anything that qualifies as an exploration candidate.
    let mut exploit_iter = exploit_pool.iter()
        .filter(|item| category_signal_count(db, user_id, item.category()) >= cfg.explore_threshold);
    let mut explore_iter = explore_pool.iter();
    let stride = (limit / explore_count.max(1)).max(1);

    for i in 0..limit {
        let is_explore_slot = explore_count > 0 && (i + 1) % stride == 0;
        if is_explore_slot {
            if let Some(item) = explore_iter.next() {
                result.push(label(item, ItemLabel::Exploring));
                continue;
            }
        }
        if let Some(item) = exploit_iter.next() {
            result.push(label(item, determine_label(item, user_id, db)));
        }
    }

    Ok(result)
}
```

Labels assigned at ranking time, returned in the feed response:

- `"match"`: cosine similarity to preference vector above threshold
- `"exploring"`: from underexplored category bucket
- `"trending"`: high velocity regardless of personalization
- `"resurfaced"`: prior low engagement, being re-evaluated after decay

---

## HTTP API

### `POST /signal`
```json
{
  "user_id": 1,
  "item_id": 42,
  "signal_type": "view",
  "duration_ms": null
}
```

Response: `200 OK { "ok": true }`

For `dwell` signals, `duration_ms` is used internally to scale signal strength: `value = min(duration_ms / 30000.0, 3.0)`.

### `GET /feed?user=X&limit=7`

```json
{
  "user_id": 1,
  "items": [
    {
      "id": 42,
      "title": "Toward a Theory of Generative Systems",
      "source": "mitpress.mit.edu",
      "category": "science",
      "reading_time_min": 8,
      "description": "You have engaged with complexity theory and emergent systems. This paper bridges those interests with formal generative grammar.",
      "label": "match",
      "score": 0.847,
      "url": "https://..."
    }
  ],
  "generated_at_ms": 1708720000000
}
```

### `GET /items`

Returns all seed items. Used by the feed page for initial render and by Claude for browsing context.

### `GET /`

Serves `static/index.html`, the feed page.

---

## Seed Data

100 items, 8 categories, reproducible via seeded RNG (`seed = 42`).

| Category | Count | Sample titles |
|----------|-------|---------------|
| `tech` | 15 | "Consistent Hashing and Load Distribution", "CRDT Primer: Convergent Data Structures", "Why Your Database Lies About Durability" |
| `music` | 10 | "Brian Eno's Oblique Strategies", "Sidechaining as Musical Grammar", "Why Lo-Fi Works" |
| `jazz` | 15 | "Coltrane Changes: Why They Work", "West African Rhythm and American Jazz", "The Harmony of Ornette Coleman" |
| `cooking` | 12 | "The Chemistry of Sourdough", "Miso in Three Steps", "Lacto-Fermentation Without Fear" |
| `fitness` | 10 | "Loaded Carries and Their Underuse", "Joint Mobility vs. Flexibility", "Walking Is Enough" |
| `travel` | 10 | "Night Trains Through Central Europe", "Walking Cities by Sound", "Markets, Routes, and Street Cartography" |
| `science` | 15 | "Emergence: From Cells to Consciousness", "Small Worlds and Scale-Free Networks", "Power Laws in Nature" |
| `literature` | 13 | "Joan Didion on Self-Respect", "Montaigne's Recursive Method", "David Foster Wallace on Attention" |

Items include realistic metadata: `created_at`, `reading_time`, `word_count`, `source_domain`, `author`.

### P0 Embeddings Strategy

P0 uses **category-axis vectors**, so no embedding service is required. Each category is assigned a basis vector in 8-dimensional space (one dimension per category). Items within the same category get similar vectors; items in different categories get nearly orthogonal ones (exactly orthogonal before noise is added). A small random offset (seeded, deterministic) gives intra-category variation.

```rust
// forage-engine/src/seed.rs
fn category_vector(category: &str, item_offset: u64) -> Vec<f32> {
    let mut v = vec![0.0f32; 8];
    let dim = category_index(category); // 0..=7
    v[dim] = 0.9;
    // small deterministic noise, seeded by item_offset
    add_seeded_noise(&mut v, item_offset, 0.1);
    // normalize in place, then return the vector
    l2_normalize(&mut v);
    v
}
```

This makes semantic similarity actually demonstrate something in P0: items in the same category cluster together, and cross-category exploration is genuinely "far" from the user's centroid. When preference vectors form, they point toward the user's engaged categories and `similar_to` queries return items from those categories.

P2 replaces these with real embeddings from an external service. The seed corpus entries and the vector shape in tidalDB are identical; only the values change.

---

## Feed Page

Minimal. Static HTML, no framework. Under 200 lines.
```
┌──────────────────────────────────────────────────────────────┐
│  ◦ forage      [user: 1 ▾]  [7 items]  last updated: 2s ago  │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐           │
│  │ [match]     │  │ [exploring] │  │ [match]     │           │
│  │             │  │             │  │             │           │
│  │ Title       │  │ Title       │  │ Title       │           │
│  │ source · 8m │  │ source · 4m │  │ source · 12m│           │
│  │             │  │             │  │             │           │
│  │ Description │  │ Description │  │ Description │           │
│  │ paragraph   │  │ paragraph   │  │ paragraph   │           │
│  │             │  │             │  │             │           │
│  │ [skip] [▸]  │  │ [skip] [▸]  │  │ [skip] [▸]  │           │
│  └─────────────┘  └─────────────┘  └─────────────┘           │
│                                                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐           │
│  │ [trending]  │  │ [match]     │  │ [match]     │           │
│  │ ...         │  │ ...         │  │ ...         │           │
│  └─────────────┘  └─────────────┘  └─────────────┘           │
│                                                              │
│  ┌─────────────┐                                             │
│  │ [exploring] │                                             │
│  │ ...         │                                             │
│  └─────────────┘                                             │
└──────────────────────────────────────────────────────────────┘
```

Interactions:

- **Click card** → `POST /signal view` + open URL in new tab
- **Hover ≥3s** → `POST /signal dwell` (JS timer, fires on mouseleave if threshold met)
- **[skip]** → `POST /signal skip` + animate card out + pull next item
- **[▸]** (save) → `POST /signal save` + animate bookmark indicator
- **Auto-refresh** → polls `/feed` every 5s, diffs result, animates re-ordering

---

## Project Layout

```
applications/forage/
├── vision.md                # What it is and why
├── plan.md                  # Phased build plan
├── architecture.md          # This file
├── readme.md                # How to run it
│
├── engine/                  # Library crate: the reusable core
│   ├── Cargo.toml
│   └── src/
│       ├── lib.rs           # ForageEngine public API
│       ├── schema.rs        # tidalDB schema declaration
│       ├── seed.rs          # Deterministic seed corpus builder
│       ├── mab.rs           # Epsilon-greedy MAB wrapper
│       └── labels.rs        # Label assignment logic
│
└── server/                  # Binary crate: the demo
    ├── Cargo.toml
    └── src/
        ├── main.rs          # Axum startup
        ├── handlers.rs      # HTTP handlers (signal, feed, items)
        └── static/
            └── index.html   # Feed page (plain HTML/JS, ~150 lines)
```

### Crate dependencies

`forage-engine/Cargo.toml`:

```toml
[lib]
name = "forage_engine"

[dependencies]
tidaldb = { path = "../../../tidal" }
serde = { version = "1", features = ["derive"] }
```

`forage-server/Cargo.toml`:

```toml
[[bin]]
name = "forage-server"

[dependencies]
forage-engine = { path = "../engine" }
axum = "0.7"
tokio = { version = "1", features = ["full"] }
serde_json = "1"
tower-http = { version = "0.5", features = ["cors", "fs"] }
```

The `cors` feature covers the case where the feed page is loaded from a different origin than the server (for example, opened from disk): the browser then blocks `fetch()` calls to `/signal` and `/feed` unless the Axum server sends CORS headers. When the page is served from `/` on the same server, those calls are same-origin and need no CORS headers.

### Embedding in another application

Any Rust application that wants the foraging loop:

```toml
[dependencies]
forage-engine = { path = "path/to/forage/engine" }
```

```rust
// Assumes `SignalType` is exported from the crate root alongside `ForageEngine`.
use forage_engine::{ForageEngine, SignalType};
use std::path::Path;

let engine = ForageEngine::persistent(Path::new("/home/you/.forage/data"))?;
engine.seed_default_corpus()?;

// Write a signal
engine.signal(user_id, item_id, SignalType::View)?;

// Get a ranked feed with MAB labels
let feed = engine.feed(user_id, 7)?;
```

The Axum server is optional. The engine is the thing that transfers.

---

## What tidalDB Handles (Nothing to Reimplement)

- Preference vector maintenance and EMA updates
- Signal decay, velocity, windowed aggregation
- HNSW vector index (semantic similarity)
- BM25 full-text index (keyword search)
- Diversity constraints (max per category, max per creator)
- Cold-start exploration budget (items with no signals)
- Session persistence and WAL durability
- Filter evaluation (unseen, category, signal threshold)
- The `hidden_gems` profile (high quality, low reach)

The MAB layer is the only thing Forage adds on top of tidalDB. Everything else is a query.
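Because the MAB layer is the one piece Forage owns, its slot arithmetic is worth checking in isolation. The sketch below is a hypothetical standalone helper, not code from `mab.rs`: `explore_slots` and `slot_plan` are illustrative names, and rounding to the nearest slot is an assumption chosen so that the 14% budget yields one exploration item in a feed of 7.

```rust
/// Number of exploration slots in a feed of `limit` items.
/// Rounds to nearest (an assumption) so that 0.14 × 7 yields one slot.
fn explore_slots(limit: usize, exploration_ratio: f32) -> usize {
    let raw = (exploration_ratio.clamp(0.0, 1.0) * limit as f32).round() as usize;
    raw.min(limit)
}

/// Marks which feed positions are exploration slots (true),
/// spacing them at a fixed stride of `limit / explore`.
fn slot_plan(limit: usize, exploration_ratio: f32) -> Vec<bool> {
    let explore = explore_slots(limit, exploration_ratio);
    if explore == 0 {
        return vec![false; limit];
    }
    let stride = limit / explore; // >= 1 because explore <= limit
    let mut used = 0;
    (0..limit)
        .map(|i| {
            let hit = used < explore && (i + 1) % stride == 0;
            if hit {
                used += 1;
            }
            hit
        })
        .collect()
}
```

Under these assumptions, a 7-item feed gets one exploration slot (the last position) at 14%, two slots at 35%, and none at 5%, so the `forage_converge` budget only produces exploration items in longer feeds.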