Milestone 8 (phases 1-4): - Shard-aware WAL segment naming, BatchHeader v2, ShardRouter - Transport trait, InProcessTransport, WalShipper, FollowerDb - HLC, PNCounter, LWWRegister, CrdtSignalState, ReconciliationEngine - Session replication bridge with SeqNo/HWM, idempotency store Forage application: - Multi-source discovery engine with MAB exploration - Embedding-based label system, server handlers, UI refresh Other: - QUICKSTART.md, README.md, milestone-8 planning docs - Hard negative union semantics, RLHF export enhancements - Recovery benchmark and visibility test expansions - Split 8 oversized source files per CODING_GUIDELINES §9 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
489 lines
23 KiB
Markdown
489 lines
23 KiB
Markdown
# Forage — Build Plan
|
||
|
||
## What We Are Proving
|
||
|
||
Each phase proves something specific. Do not build phase N+1 until phase N has proven its thesis.
|
||
|
||
| Phase | Proves | Delivers |
|
||
|-------|--------|----------|
|
||
| **P0** | The loop closes — signal in, re-rank out, observable in real time | Local server + seed data + Claude observes interactions |
|
||
| **P1** | Claude can discover content without the user browsing — reactions alone drive the loop | Autonomous discovery agent + browse-tasks API + source registry |
|
||
| **P2** | Semantic search works over content Forage finds on the real web | Embedding service + real web crawl |
|
||
| **P3** | The MAB sharpens — exploration items hit more often over time | Adaptive exploration budget, centroid tracking, exploration-hit instrumentation |
|
||
| **P4** | The surprise moment — cross-centroid discoveries emerge naturally | Multi-session preference evolution, intersection surfacing |
|
||
|
||
---
|
||
|
||
## Phase 0 — Close the Loop (MVP Demo)
|
||
|
||
**Goal:** A running demo where a user interacts with a local feed page, signals are posted by the page itself, and Claude's Chrome extension observes visible ranking shifts. No real web crawl. No real embeddings. Proves the feedback loop.
|
||
|
||
**What we build:**
|
||
|
||
### `forage-engine` (library crate)
|
||
|
||
The reusable core. Wraps tidalDB with the foraging-specific schema, seed corpus, MAB layer, and public API. This is what transfers to other applications.
|
||
|
||
```rust
|
||
pub struct ForageEngine { db: TidalDb }
|
||
|
||
impl ForageEngine {
|
||
pub fn ephemeral() -> Result<Self>
|
||
pub fn persistent(data_dir: &Path) -> Result<Self>
|
||
pub fn seed_default_corpus(&self) -> Result<()>
|
||
pub fn signal(&self, user: u64, item: u64, kind: SignalKind) -> Result<()>
|
||
pub fn signal_dwell(&self, user: u64, item: u64, duration_ms: u64) -> Result<()>
|
||
pub fn feed(&self, user: u64, limit: usize) -> Result<Vec<ForageItem>>
|
||
pub fn all_items(&self) -> &[SeedItem]
|
||
pub fn add_item(&self, item: ForageItemInput) -> Result<u64> // P1
|
||
}
|
||
```
|
||
|
||
### `forage-server` (Axum binary)
|
||
|
||
A thin HTTP wrapper over `forage-engine`. Serves:
|
||
|
||
```
|
||
POST /signal { user_id, item_id, signal_type, duration_ms? }
|
||
GET /feed ?user=X&limit=7
|
||
GET /items (all items, for page render)
|
||
GET / (serves the feed HTML page)
|
||
```
|
||
|
||
Runtime mode:
|
||
- Persistent by default (`~/.forage/data`)
|
||
- Optional `--ephemeral` mode for throwaway demo runs
|
||
|
||
Schema on startup:
|
||
- 100 seed items across 8 categories (tech, music, jazz, cooking, fitness, travel, science, literature)
|
||
- Each item: title, url (placeholder), category, source, reading_time, description
|
||
- Seeded RNG — same items every run, deterministic
|
||
- 3 pre-built users: `cold` (no signals), `explorer` (light signals), `convergent` (heavy signals in 2 categories)
|
||
|
||
Signal types registered:
|
||
- `view` — half-life 7d, AllTime + 24h windows
|
||
- `dwell` — half-life 3d (reading time is stronger signal than click)
|
||
- `save` — half-life 30d (strong intent)
|
||
- `skip` — half-life 1d (mild negative, decays fast)
|
||
- `share` — half-life 14d (strongest positive)
|
||
|
||
Ranking profiles:
|
||
- `forage_default` — personalized, 14% exploration (~1 in 7), max_per_category:2
|
||
- `forage_explore` — heavy exploration, weighted toward underexplored categories
|
||
- `forage_converge` — pure exploitation, no exploration
|
||
|
||
MAB layer (thin wrapper over tidalDB query):
|
||
```
|
||
candidate_pool = RETRIEVE items FOR USER @u USING PROFILE forage_default LIMIT 20
|
||
exploit = candidate_pool[0..6] // top 6 by score
|
||
explore = candidate_pool filtered by (category_signal_count < 5) // pick 1 from underexplored
|
||
final = interleave(exploit, explore, ratio=0.14) // ~1 in 7
|
||
```
|
||
|
||
Item labels returned in feed response:
|
||
- `"match"` — near a known centroid
|
||
- `"exploring"` — from exploration budget
|
||
- `"trending"` — high velocity regardless of personalization
|
||
- `"resurfaced"` — user had prior engagement, decayed, being re-checked
|
||
|
||
### Feed Page (`/`)
|
||
|
||
Static HTML + minimal JS. No framework.
|
||
|
||
- Grid of 7 item cards
|
||
- Each card: title, source, category chip, reading time, description, label badge
|
||
- Click card → `POST /signal {signal_type: "view"}`, open URL in new tab
|
||
- Hover for >3s → `POST /signal {signal_type: "dwell", duration_ms: N}`
|
||
- "Skip" button on card → `POST /signal {signal_type: "skip"}`
|
||
- "Save" button → `POST /signal {signal_type: "save"}`
|
||
- Auto-refresh feed every 5s (or on any signal write)
|
||
- Visual: ranking shift animation when feed re-orders
|
||
|
||
### What Claude Does in P0
|
||
|
||
Claude uses the Chrome extension **lightly** — as an observer, not a puppeteer. The feed page handles signal posting itself via JS `fetch()`. Claude's role:
|
||
|
||
1. `navigate` to `localhost:4242` — one call
|
||
2. `read_page` to snapshot the initial feed state — one call
|
||
3. Wait while a human (or scripted JS) interacts with the feed for 10+ interactions
|
||
4. `read_page` again to snapshot the final feed state — one call
|
||
5. Report: what categories dominated before vs. after, which exploration items appeared, how the labels shifted
|
||
|
||
Three MCP tool calls per session. That is the ceiling. The interesting loop — signal → re-rank → new feed — runs entirely in the browser without Claude's involvement. Claude observes the outcome, it does not produce it.
|
||
|
||
This is the demo. This is the proof-of-concept that makes the thesis visible.
|
||
|
||
**Deliverables:**
|
||
- `applications/forage/engine/` — `ForageEngine` library crate (tidalDB + MAB + schema)
|
||
- `applications/forage/server/` — thin Axum binary wrapping the engine
|
||
- `applications/forage/server/static/index.html` — feed page (plain HTML/JS, signals via fetch())
|
||
- CORS headers on the server so the feed page can post signals without browser errors
|
||
|
||
---
|
||
|
||
## Phase 1 — Autonomous Discovery Loop
|
||
|
||
**Goal:** Claude discovers content proactively. The user never browses. The loop closes entirely through reactions to what Claude finds.
|
||
|
||
**Thesis:** A personalized feed can be driven without the user visiting a single page. Claude browses on behalf of the user, Forage ranks what it finds, and the user's reactions (save/skip/dwell) teach Claude where to look next. The feedback signal is not visits — it is choices.
|
||
|
||
**The loop:**
|
||
|
||
```
|
||
Background task in forage-server
|
||
↓ emits browse-tasks (topics weighted by user preference + tag affinity, source list)
|
||
Claude (--chrome, persistent session)
|
||
↓ navigates sources → finds article links
|
||
↓ reads each article in full
|
||
↓ analyses: topics, entities, content type, summary, quality
|
||
↓ POST /capture with enriched metadata
|
||
forage-server
|
||
↓ stores rich metadata, fires view signals, broadcasts via SSE
|
||
Feed page (localhost:4242)
|
||
↓ shows enriched cards (tags, content type, entities, Claude summary) live
|
||
User reacts (save / skip / dwell)
|
||
↓ signals update preference vector AND tag affinity counters
|
||
Next browse-tasks call
|
||
↓ returns topics + tag weights → Claude targets specific subtopics next cycle
|
||
Loop repeats with higher precision
|
||
```
|
||
|
||
**Why Claude's enrichment matters here:**
|
||
|
||
A JavaScript content script extracts what the page declares about itself. Claude reads and understands what the page actually says. These are different things.
|
||
|
||
- `<meta name="description">` on a jazz article: *"The latest from Blue Note Records"*
|
||
- Claude's analysis: `topics: ["hard bop", "trumpet"], entities: ["Lee Morgan", "Blue Note"], content_type: "review", summary: "A career retrospective on Lee Morgan's 1960s Blue Note recordings, focusing on his development of the hard bop trumpet style."`
|
||
|
||
The preference model runs on Claude's output, not the page's self-description. This is what makes tag-level personalization possible before real embeddings arrive in P2.
|
||
|
||
---
|
||
|
||
### What we build
|
||
|
||
#### Source Registry (in `forage-engine`)
|
||
|
||
A hardcoded per-category list of seed URLs Claude can navigate to find articles. Each source is a page where the top-level links are articles (list pages, front pages, RSS-style feeds).
|
||
|
||
```
|
||
technology: news.ycombinator.com, lobste.rs
|
||
science: phys.org, news.ycombinator.com?q=science
|
||
jazz: pitchfork.com/reviews/albums, allaboutjazz.com/news
|
||
travel: theguardian.com/travel, cntraveler.com/latest-news
|
||
cooking: seriouseats.com, bonappetit.com/recipe
|
||
design: designobserver.com, dezeen.com/news
|
||
history: historytoday.com, smithsonianmag.com/history
|
||
health: health.harvard.edu/blog, theatlantic.com/health
|
||
```
|
||
|
||
`ForageEngine` gains:
|
||
```rust
|
||
pub fn browse_tasks(&self, user_id: u64, limit_per_topic: usize) -> BrowsePlan
|
||
```
|
||
|
||
`BrowsePlan` contains:
|
||
- `topics: Vec<BrowseTopic>` — ordered by preference weight + tag affinity, cold-start gets equal weight across all 8
|
||
- `limit_per_topic: usize` — how many articles to capture per source
|
||
- `should_run: bool` — false if last discovery was recent and feed has ≥5 items
|
||
- `tag_hints: Vec<String>` — top tags from saved/dwelled items the agent should bias toward within each topic (e.g. `["modal jazz", "improvisation"]` tells Claude to prefer theory-heavy jazz sources over jazz news)
|
||
|
||
#### `GET /browse-tasks` (forage-server)
|
||
|
||
Returns a `BrowsePlan` as JSON for user 1:
|
||
```json
|
||
{
|
||
"should_run": true,
|
||
"interval_minutes": 30,
|
||
"limit_per_topic": 5,
|
||
"tag_hints": ["modal jazz", "improvisation", "music theory"],
|
||
"topics": [
|
||
{ "name": "jazz", "priority": 0.72, "sources": ["pitchfork.com/reviews/albums", "allaboutjazz.com/news"] },
|
||
{ "name": "technology", "priority": 0.51, "sources": ["news.ycombinator.com", "lobste.rs"] },
|
||
{ "name": "science", "priority": 0.28, "sources": ["phys.org"] }
|
||
]
|
||
}
|
||
```
|
||
|
||
`tag_hints` comes from the top tags across the user's positively-signaled items (save + dwell ≥15s). The agent uses these to bias which articles it chooses to read in depth within each source — skipping news roundups and prioritizing analysis pieces that match the hints.
|
||
|
||
Cold-start response: all 8 categories at equal `priority: 0.125`, 2 sources each, `tag_hints: []`.
|
||
|
||
#### `POST /discovery/heartbeat` (forage-server)
|
||
|
||
Agent calls this on every cycle start. Server records `agent_last_seen` timestamp. Used by feed page to show connection status.
|
||
|
||
#### `GET /discovery/status` (forage-server)
|
||
|
||
```json
|
||
{
|
||
"agent_connected": true,
|
||
"last_discovery_at": "2026-02-24T10:30:00Z",
|
||
"items_found_last_run": 23,
|
||
"next_run_in_minutes": 12
|
||
}
|
||
```
|
||
|
||
`agent_connected: true` when `agent_last_seen` is within the last 5 minutes.
|
||
|
||
#### Discovery state in `AppState`
|
||
|
||
```rust
|
||
pub struct DiscoveryState {
|
||
pub last_discovery_at: Mutex<Option<std::time::Instant>>,
|
||
pub agent_last_seen: Mutex<Option<std::time::Instant>>,
|
||
pub items_last_run: Mutex<u32>,
|
||
}
|
||
```
|
||
|
||
Added to `AppState` alongside `engine` and `events`. Handlers update it; feed page polls it.
|
||
|
||
#### Feed page status indicator
|
||
|
||
A small status bar below the header:
|
||
- `● Active — last run 4 min ago` (green dot) — `agent_connected: true`
|
||
- `○ Agent not connected` (grey dot) — no heartbeat in 5 min
|
||
- `⟳ Discovering...` (spinning) — between heartbeat and items appearing
|
||
|
||
#### Enriched capture payload
|
||
|
||
`ForageItemInput` and `POST /capture` gain Claude-specific fields:
|
||
|
||
```rust
|
||
pub struct ForageItemInput {
|
||
pub url: String,
|
||
pub title: String,
|
||
pub source: String,
|
||
pub category: String,
|
||
pub reading_time_min: u32,
|
||
pub description: String,
|
||
// Claude-enriched fields (all optional; empty = not provided)
|
||
pub tags: Vec<String>, // specific subtopics: ["modal jazz", "music theory", "john coltrane"]
|
||
pub entities: Vec<String>, // named entities: ["John Coltrane", "Blue Note Records"]
|
||
pub content_type: String, // "analysis" | "news" | "tutorial" | "opinion" | "review" | "interview" | "research" | ""
|
||
pub summary: String, // Claude's 2-sentence summary of what the article actually says
|
||
}
|
||
```
|
||
|
||
All fields serialized into item metadata storage. `tags` stored as `"tags"` (comma-separated string) so existing metadata retrieval works without schema changes.
|
||
|
||
`CaptureReq` in `handlers.rs` gains the same optional fields with `#[serde(default)]`.
|
||
|
||
#### Tag affinity in `ForageEngine`
|
||
|
||
`top_tags(user_id, limit) -> Vec<String>` — scans metadata of positively-signaled items (save + strong dwell), splits the `"tags"` metadata field, returns the top-N by frequency. Used to populate `tag_hints` in `BrowsePlan`.
|
||
|
||
No schema changes to tidalDB. Tag affinity runs entirely over item metadata; the preference vector stays 8-dimensional and tracks category-level signal. Tags are a secondary signal that guides the agent's article selection within a source, not a replacement for the embedding.
|
||
|
||
#### Enriched feed cards
|
||
|
||
Feed cards gain three new display elements:
|
||
- **Tag chips** — top 3 tags from the item, rendered as small outlined badges below the description. Tap a tag → future feed cards filtered/boosted for that tag (stored as a `localStorage` tag preference that biases the next `/browse-tasks` call via a `?prefer_tags=` query param)
|
||
- **Content type badge** — right of the category chip, distinct color: `analysis` (blue), `tutorial` (green), `news` (grey), `opinion` (amber), `review` (purple)
|
||
- **Claude summary** — shown instead of the meta description when non-empty; clearly signals what Claude learned from reading, not what the page says about itself
|
||
|
||
#### The discovery agent prompt
|
||
|
||
A file at `applications/forage/agent.md` — the instruction set Claude runs with `--chrome`.
|
||
|
||
Core loop:
|
||
1. `GET localhost:4242/browse-tasks`
|
||
2. If `should_run: false` → wait `interval_minutes`, repeat from step 1
|
||
3. `POST localhost:4242/discovery/heartbeat`
|
||
4. For each topic (in priority order):
|
||
- For each source URL:
|
||
- Navigate to the source page
|
||
- Find article links on the page (exclude nav, footer, sidebar links; prefer main content area)
|
||
- **Select** up to `limit_per_topic` articles that appear relevant to `tag_hints` (if hints are present); prefer depth over breadth — analysis and tutorial pieces over news roundups
|
||
- For each selected article:
|
||
- Navigate to the article
|
||
- **Read the full page text** (`get_page_text`)
|
||
- **Analyse**:
|
||
- `title` — headline (from `<h1>` if better than `<title>`)
|
||
- `canonical_url` — from `<link rel="canonical">`
|
||
- `reading_time_min` — word count ÷ 200, rounded up
|
||
- `tags` — 2–5 specific subtopic tags, lowercase, singular nouns or short phrases (e.g. `"modal jazz"` not `"jazz"`)
|
||
- `entities` — up to 5 named people, companies, technologies, or places central to the article
|
||
- `content_type` — one of: `analysis`, `news`, `tutorial`, `opinion`, `review`, `interview`, `research`
|
||
- `summary` — 2 sentences: what the article argues or reports, not what the site says about it
|
||
- Skip if: title is empty, contains "Sign In" / "Subscribe" / "Login" / "Create Account", or URL is localhost / chrome://
|
||
- `POST localhost:4242/capture` with all enriched fields
|
||
- Wait 1–2 seconds (politeness)
|
||
5. Wait `interval_minutes` minutes
|
||
6. Repeat
|
||
|
||
**What Claude must NOT do:** summarise the meta description. The point is that Claude reads the article and describes what it actually contains. A meta description that says "Read our latest article on jazz" is useless. Claude's summary should say "Argues that Coltrane's 1965 transition to free jazz was less a rejection of hard bop than an extension of it into harmonic territory bebop had not explored."
|
||
|
||
#### Invocation
|
||
|
||
One shell script at repo root:
|
||
```bash
|
||
#!/usr/bin/env bash
|
||
# forage-discover.sh — start the Forage discovery agent
|
||
# Prerequisites: forage-server running at localhost:4242, claude CLI with --chrome support
|
||
claude --chrome "$(cat applications/forage/agent.md)"
|
||
```
|
||
|
||
User starts the system with two terminal tabs:
|
||
```bash
|
||
# Tab 1 — server
|
||
cargo run -p forage-server --manifest-path applications/forage/server/Cargo.toml
|
||
|
||
# Tab 2 — agent
|
||
./forage-discover.sh
|
||
```
|
||
|
||
Then opens `localhost:4242` and reacts.
|
||
|
||
---
|
||
|
||
### Edge cases
|
||
|
||
| Situation | Handled by |
|
||
|-----------|------------|
|
||
| Cold start (no prefs, no tags) | Equal weight all 8 categories, `tag_hints: []`, agent reads broadly |
|
||
| Agent not running | Feed shows "Agent not connected"; `should_run: true` stays set |
|
||
| Navigation 404 / timeout | Agent skips to next URL, cycle continues |
|
||
| Paywall / login page | Agent skips on title check ("Sign In", blank, "Subscribe") |
|
||
| Empty title | `POST /capture` returns 400; agent skips |
|
||
| Duplicate URL | `add_item` idempotent via FNV-1a; same ID, no duplicate; enrichment not re-written |
|
||
| Feed sparse (< 5 items) | `should_run: true` overrides interval immediately |
|
||
| Two agents running | Both browse; idempotent captures; harmless double coverage |
|
||
| Server restart | `last_discovery_at` resets to null; agent runs on next cycle |
|
||
| Prefs shift mid-cycle | Current cycle finishes with old plan; next call picks up new weights |
|
||
| Claude context grows | Agent processes one topic at a time, not all sources in one turn |
|
||
| Rate limit (HTTP 429) | Agent skips source, logs, continues to next |
|
||
| Article has no meta description | Agent uses first paragraph or derives from full read; summary field carries real content |
|
||
| Tags on first article in a new category | Tags come from Claude's reading, not from existing tag history; tag affinity starts building immediately |
|
||
| User taps tag chip on feed card | Stored as `localStorage` tag preference; next `/browse-tasks?prefer_tags=modal+jazz` biases hints |
|
||
| Item enrichment fails (Claude unsure) | Fields default to empty string / empty array; `POST /capture` still succeeds; card renders with basic metadata only |
|
||
|
||
---
|
||
|
||
### What this does NOT build
|
||
|
||
- A Chrome Web Store extension
|
||
- Server-push to Claude (server cannot initiate Claude actions; Claude polls)
|
||
- Per-user discovery (single user, user 1, as per multi-user scope constraint)
|
||
- Configurable source lists via UI (source registry is hardcoded for P1)
|
||
|
||
---
|
||
|
||
### Acceptance criteria
|
||
|
||
1. Two terminal commands start the full system: one for the server, one for the agent
|
||
2. Within 5 minutes of starting, the feed contains ≥10 real articles discovered by Claude
|
||
3. Items appear in the feed in real-time via SSE as Claude captures them (no manual refresh)
|
||
4. Feed page shows `● Active` status while the agent is running
|
||
5. ≥80% of captured items have non-empty `tags`, `content_type`, and `summary` fields — Claude is analysing, not just extracting
|
||
6. At least one feed card's summary is observably different from and more informative than its meta description
|
||
7. After saving ≥5 items tagged `"modal jazz"`, the next `/browse-tasks` response includes `"modal jazz"` in `tag_hints`
|
||
8. After 20 user reactions (≥5 saves on jazz items), the next `/browse-tasks` response ranks jazz sources first
|
||
9. A navigation failure (404, timeout) during a discovery cycle does not crash the agent or the server
|
||
10. Re-running the agent after a server restart re-populates the feed within one cycle (items already in DB are not re-added)
|
||
|
||
---
|
||
|
||
## Phase 2 — Real Embeddings
|
||
|
||
**Goal:** Semantic search and similarity-based recommendations over content Forage actually finds.
|
||
|
||
**What changes:**
|
||
|
||
A thin embedding sidecar (separate process, any language):
|
||
```
|
||
POST /embed { text: string } → { vector: f32[1536] }
|
||
```
|
||
|
||
Default: OpenAI `text-embedding-3-small`. Swappable. Forage calls this when writing new items.
|
||
|
||
The text embedded is now significantly richer than P0's title-only approach. With Claude's enrichment from P1 available, the embedder receives:
|
||
```
|
||
{title} — {summary}. Topics: {tags}. Entities: {entities}.
|
||
```
|
||
This embeds Claude's understanding of the article, not the page's self-description. The preference centroid that emerges is a semantic model of what the user actually engages with.
|
||
|
||
With real embeddings:
|
||
- `SearchBuilder::semantic("jazz theory")` works for real
|
||
- `SearchBuilder::similar_to(item_id)` produces genuine similarity
|
||
- Preference vectors actually mean something — they are in embedding space
|
||
|
||
The feed profile adds:
|
||
- `semantic_boost: 0.3` — items semantically near preference centroid score higher
|
||
- `similar_to_saved: true` — items near saved items get boosted
|
||
|
||
**Proves:** The preference vector is not just a signal frequency map — it is a semantic model of what the user cares about, queryable by meaning.
|
||
|
||
---
|
||
|
||
## Phase 3 — Adaptive MAB
|
||
|
||
**Goal:** The exploration budget adapts per-user based on their exploration-hit history.
|
||
|
||
**What changes:**
|
||
|
||
Track per-user: `exploration_hits` / `exploration_total` → hit rate.
|
||
|
||
```
|
||
if hit_rate > 0.5: exploration_ratio = 0.25 (adventurous user)
|
||
if hit_rate < 0.2: exploration_ratio = 0.10 (convergent user)
|
||
else: exploration_ratio = 0.14 (default)
|
||
```
|
||
|
||
UCB1 bonus on underexplored categories:
|
||
```
|
||
ucb_bonus = sqrt(2 * ln(total_signals) / category_signal_count)
|
||
```
|
||
Categories with few signals get a score boost, naturally surfacing exploration candidates higher.
|
||
|
||
Instrumentation persists per-user exploration outcomes (`exploration_hits`, `exploration_total`) and feeds adaptation logic.
|
||
|
||
**Proves:** The MAB is not static noise — it learns the user's exploration tolerance and adjusts. Power users feel the system getting bolder with them.
|
||
|
||
---
|
||
|
||
## Phase 4 — The Surprise Moment
|
||
|
||
**Goal:** Cross-centroid discoveries emerge. Users find things at the intersection of two interests they did not know were related.
|
||
|
||
**What changes:**
|
||
|
||
Centroid intersection query:
|
||
```
|
||
centroids = top_2_active_centroids(user)
|
||
midpoint = (centroid_a.vector + centroid_b.vector) / 2
|
||
intersection_candidates = ANN(midpoint, limit=5)
|
||
inject 1 intersection candidate into every 7-item feed
|
||
label it: "bridge: {category_a} × {category_b}"
|
||
```
|
||
|
||
Over time, if intersection items hit consistently, they form a new centroid — the user's interests have genuinely merged into a new territory.
|
||
|
||
**Proves:** This is not a feature that could be added to a recommendation system after the fact. It is the natural consequence of having a semantic preference model that updates with the feedback loop. The "surprise moment" is an emergent property of the system working correctly.
|
||
|
||
---
|
||
|
||
## What We Are Not Building
|
||
|
||
- A Chrome extension we publish to the Chrome Web Store (P0 is Claude-driven, not user-installed)
|
||
- A mobile app
|
||
- Multi-user / server-hosted version (single user, local process)
|
||
- Content moderation, NSFW filtering, language filtering
|
||
- Payment, accounts, authentication
|
||
- A scraper that violates robots.txt or rate limits
|
||
|
||
---
|
||
|
||
## Phase 0 Acceptance Criteria
|
||
|
||
The P0 demo is complete when:
|
||
|
||
1. `cargo run -p forage-server --manifest-path applications/forage/server/Cargo.toml` starts a server at `localhost:4242`
|
||
2. The feed page loads with 7 items across ≥3 categories
|
||
3. A user generates 10+ signals from the feed page (mix of views, skips, saves) while Claude observes before/after state
|
||
4. After 10 signals, the feed has visibly shifted toward the signaled categories
|
||
5. At least 1 item in the feed is labeled `exploring` (from the exploration budget)
|
||
6. The signal-to-re-rank latency is < 200ms (measured by feed refresh after `POST /signal`)
|
||
7. A second user (`?user=2`) with no signals gets a different, more exploratory feed than a user with 20+ signals
|
||
|
||
If these 7 criteria are met, the loop is closed and the thesis is proven.
|