tidaldb/applications/forage/plan.md
jordan f4cfd6c81f feat: complete M8 replication primitives + forage enhancements + docs
Milestone 8 (phases 1-4):
- Shard-aware WAL segment naming, BatchHeader v2, ShardRouter
- Transport trait, InProcessTransport, WalShipper, FollowerDb
- HLC, PNCounter, LWWRegister, CrdtSignalState, ReconciliationEngine
- Session replication bridge with SeqNo/HWM, idempotency store

Forage application:
- Multi-source discovery engine with MAB exploration
- Embedding-based label system, server handlers, UI refresh

Other:
- QUICKSTART.md, README.md, milestone-8 planning docs
- Hard negative union semantics, RLHF export enhancements
- Recovery benchmark and visibility test expansions
- Split 8 oversized source files per CODING_GUIDELINES §9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-24 13:17:19 -07:00


Forage — Build Plan

What We Are Proving

Each phase proves something specific. Do not build phase N+1 until phase N has proven its thesis.

| Phase | Proves | Delivers |
|-------|--------|----------|
| P0 | The loop closes — signal in, re-rank out, observable in real time | Local server + seed data + Claude observes interactions |
| P1 | Claude can discover content without the user browsing — reactions alone drive the loop | Autonomous discovery agent + browse-tasks API + source registry |
| P2 | Semantic search works over content Forage finds on the real web | Embedding service + real web crawl |
| P3 | The MAB sharpens — exploration items hit more often over time | Adaptive exploration budget, centroid tracking, exploration-hit instrumentation |
| P4 | The surprise moment — cross-centroid discoveries emerge naturally | Multi-session preference evolution, intersection surfacing |

Phase 0 — Close the Loop (MVP Demo)

Goal: A running demo where a user interacts with a local feed page, signals are posted by the page itself, and Claude's Chrome extension observes visible ranking shifts. No real web crawl. No real embeddings. Proves the feedback loop.

What we build:

forage-engine (library crate)

The reusable core. Wraps tidalDB with the foraging-specific schema, seed corpus, MAB layer, and public API. This is what transfers to other applications.

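pub struct ForageEngine { db: TidalDb }

// Public API surface (signatures only; bodies omitted in this plan).
impl ForageEngine {
    // Constructors: in-memory for throwaway runs, on-disk for normal use.
    pub fn ephemeral() -> Result<Self>
    pub fn persistent(data_dir: &Path) -> Result<Self>
    // Loads the deterministic seed corpus described under "Schema on startup".
    pub fn seed_default_corpus(&self) -> Result<()>
    // Signal writes; dwell carries its duration.
    pub fn signal(&self, user: u64, item: u64, kind: SignalKind) -> Result<()>
    pub fn signal_dwell(&self, user: u64, item: u64, duration_ms: u64) -> Result<()>
    // Reads: ranked feed for one user, and the full item list for page render.
    pub fn feed(&self, user: u64, limit: usize) -> Result<Vec<ForageItem>>
    pub fn all_items(&self) -> &[SeedItem]
    // Added in P1: inserts a captured item and returns its ID.
    pub fn add_item(&self, item: ForageItemInput) -> Result<u64>
}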

forage-server (Axum binary)

A thin HTTP wrapper over forage-engine. Serves:

POST /signal    { user_id, item_id, signal_type, duration_ms? }
GET  /feed      ?user=X&limit=7
GET  /items     (all items, for page render)
GET  /          (serves the feed HTML page)

Runtime mode:

  • Persistent by default (~/.forage/data)
  • Optional --ephemeral mode for throwaway demo runs

Schema on startup:

  • 100 seed items across 8 categories (tech, music, jazz, cooking, fitness, travel, science, literature)
  • Each item: title, url (placeholder), category, source, reading_time, description
  • Seeded RNG — same items every run, deterministic
  • 3 pre-built users: cold (no signals), explorer (light signals), convergent (heavy signals in 2 categories)

Signal types registered:

  • view — half-life 7d, AllTime + 24h windows
  • dwell — half-life 3d (reading time is a stronger signal than a click)
  • save — half-life 30d (strong intent)
  • skip — half-life 1d (mild negative, decays fast)
  • share — half-life 14d (strongest positive)
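The half-lives above imply that a signal's contribution fades with age. The plan does not state the decay formula tidalDB uses, so the following is only a sketch that assumes plain exponential half-life decay; `decayed_weight` is a hypothetical helper, not part of the tidalDB API.

```rust
/// Fraction of a signal's original weight that remains after `age_days`,
/// assuming the weight halves every `half_life_days`.
/// Hypothetical helper: tidalDB's real decay function may differ.
fn decayed_weight(age_days: f64, half_life_days: f64) -> f64 {
    0.5_f64.powf(age_days / half_life_days)
}
```

Under this assumption, a three-day-old skip (1d half-life) keeps 12.5% of its weight, while a three-day-old save (30d half-life) keeps roughly 93%.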

Ranking profiles:

  • forage_default — personalized, 14% exploration (~1 in 7), max_per_category:2
  • forage_explore — heavy exploration, weighted toward underexplored categories
  • forage_converge — pure exploitation, no exploration

MAB layer (thin wrapper over tidalDB query):

candidate_pool = RETRIEVE items FOR USER @u USING PROFILE forage_default LIMIT 20
exploit = candidate_pool[0..6]  // top 6 by score
explore = candidate_pool filtered by (category_signal_count < 5) // pick 1 from underexplored
final = interleave(exploit, explore, ratio=0.14)  // ~1 in 7
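The pseudocode above can be sketched in Rust as follows. This is an illustration under assumptions, not the engine's implementation: `Candidate`, `build_feed`, and the choice to place the exploration item in the last slot are all hypothetical, and the explore pick is taken from outside the exploit slots so the same item cannot appear twice.

```rust
use std::collections::HashMap;

/// Hypothetical candidate record; the real engine type is not shown in the plan.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    id: u64,
    category: String,
    score: f64,
}

/// Builds a feed of up to `feed_size` labeled items: the top-scored candidates,
/// with the final slot given to the best-scored candidate from an underexplored
/// category (fewer than `min_signals` signals), when such a candidate exists
/// outside the exploit slots.
fn build_feed(
    mut pool: Vec<Candidate>,
    signal_counts: &HashMap<String, u32>,
    feed_size: usize,
    min_signals: u32,
) -> Vec<(Candidate, &'static str)> {
    if feed_size == 0 {
        return Vec::new();
    }
    // Highest score first.
    pool.sort_by(|a, b| b.score.total_cmp(&a.score));
    let exploit_len = feed_size - 1;
    let is_underexplored =
        |c: &Candidate| signal_counts.get(&c.category).copied().unwrap_or(0) < min_signals;
    // Look for an exploration pick below the exploit slots.
    let explore_idx = pool
        .iter()
        .skip(exploit_len)
        .position(|c| is_underexplored(c))
        .map(|offset| offset + exploit_len);
    let explore = explore_idx.map(|i| pool.remove(i));
    // Without an exploration pick, fill every slot by score.
    let exploit_take = if explore.is_some() { exploit_len } else { feed_size };
    let mut feed: Vec<(Candidate, &'static str)> = pool
        .into_iter()
        .take(exploit_take)
        .map(|c| (c, "match"))
        .collect();
    if let Some(c) = explore {
        feed.push((c, "exploring"));
    }
    feed
}
```

With `feed_size = 7` this yields six exploit items plus one exploration item, which matches the roughly 1-in-7 ratio of `forage_default`.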

Item labels returned in feed response:

  • "match" — near a known centroid
  • "exploring" — from exploration budget
  • "trending" — high velocity regardless of personalization
  • "resurfaced" — user had prior engagement, decayed, being re-checked

Feed Page (/)

Static HTML + minimal JS. No framework.

  • Grid of 7 item cards
  • Each card: title, source, category chip, reading time, description, label badge
  • Click card → POST /signal {signal_type: "view"}, open URL in new tab
  • Hover for >3s → POST /signal {signal_type: "dwell", duration_ms: N}
  • "Skip" button on card → POST /signal {signal_type: "skip"}
  • "Save" button → POST /signal {signal_type: "save"}
  • Auto-refresh feed every 5s (or on any signal write)
  • Visual: ranking shift animation when feed re-orders

What Claude Does in P0

Claude uses the Chrome extension lightly — as an observer, not a puppeteer. The feed page handles signal posting itself via JS fetch(). Claude's role:

  1. navigate to localhost:4242 — one call
  2. read_page to snapshot the initial feed state — one call
  3. Wait while a human (or scripted JS) interacts with the feed for 10+ interactions
  4. read_page again to snapshot the final feed state — one call
  5. Report: what categories dominated before vs. after, which exploration items appeared, how the labels shifted

Three MCP tool calls per session. That is the ceiling. The interesting loop (signal → re-rank → new feed) runs entirely in the browser without Claude's involvement. Claude observes the outcome; it does not produce it.

This is the demo. This is the proof-of-concept that makes the thesis visible.

Deliverables:

  • applications/forage/engine/ — ForageEngine library crate (tidalDB + MAB + schema)
  • applications/forage/server/ — thin Axum binary wrapping the engine
  • applications/forage/server/static/index.html — feed page (plain HTML/JS, signals via fetch())
  • CORS headers on the server so the feed page can post signals without browser errors

Phase 1 — Autonomous Discovery Loop

Goal: Claude discovers content proactively. The user never browses. The loop closes entirely through reactions to what Claude finds.

Thesis: A personalized feed can be driven without the user visiting a single page. Claude browses on behalf of the user, Forage ranks what it finds, and the user's reactions (save/skip/dwell) teach Claude where to look next. The feedback signal is not visits — it is choices.

The loop:

Background task in forage-server
    ↓ emits browse-tasks (topics weighted by user preference + tag affinity, source list)
Claude (--chrome, persistent session)
    ↓ navigates sources → finds article links
    ↓ reads each article in full
    ↓ analyses: topics, entities, content type, summary, quality
    ↓ POST /capture with enriched metadata
forage-server
    ↓ stores rich metadata, fires view signals, broadcasts via SSE
Feed page (localhost:4242)
    ↓ shows enriched cards (tags, content type, entities, Claude summary) live
User reacts (save / skip / dwell)
    ↓ signals update preference vector AND tag affinity counters
Next browse-tasks call
    ↓ returns topics + tag weights → Claude targets specific subtopics next cycle
Loop repeats with higher precision

Why Claude's enrichment matters here:

A JavaScript content script extracts what the page declares about itself. Claude reads and understands what the page actually says. These are different things.

  • <meta name="description"> on a jazz article: "The latest from Blue Note Records"
  • Claude's analysis: topics: ["hard bop", "trumpet"], entities: ["Lee Morgan", "Blue Note"], content_type: "review", summary: "A career retrospective on Lee Morgan's 1960s Blue Note recordings, focusing on his development of the hard bop trumpet style."

The preference model runs on Claude's output, not the page's self-description. This is what makes tag-level personalization possible before real embeddings arrive in P2.


What we build

Source Registry (in forage-engine)

A hardcoded per-category list of seed URLs Claude can navigate to find articles. Each source is a page where the top-level links are articles (list pages, front pages, RSS-style feeds).

technology:  news.ycombinator.com, lobste.rs
science:     phys.org, news.ycombinator.com?q=science
jazz:        pitchfork.com/reviews/albums, allaboutjazz.com/news
travel:      theguardian.com/travel, cntraveler.com/latest-news
cooking:     seriouseats.com, bonappetit.com/recipe
design:      designobserver.com, dezeen.com/news
history:     historytoday.com, smithsonianmag.com/history
health:      health.harvard.edu/blog, theatlantic.com/health

ForageEngine gains:

pub fn browse_tasks(&self, user_id: u64, limit_per_topic: usize) -> BrowsePlan

BrowsePlan contains:

  • topics: Vec<BrowseTopic> — ordered by preference weight + tag affinity, cold-start gets equal weight across all 8
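  • limit_per_topic: usize — how many articles to capture per source
  • interval_minutes — how long the agent waits between discovery cycles (shown in the JSON response for GET /browse-tasks)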
  • should_run: bool — false if last discovery was recent and feed has ≥5 items
  • tag_hints: Vec<String> — top tags from saved/dwelled items the agent should bias toward within each topic (e.g. ["modal jazz", "improvisation"] tells Claude to prefer theory-heavy jazz sources over jazz news)
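The `should_run` rule can be written as a small pure function. This is a sketch that combines the bullet above with the sparse-feed edge case listed later in this phase; the function signature and the idea of passing the elapsed time in as a parameter are assumptions made for testability.

```rust
use std::time::Duration;

/// Decides whether the agent should start a discovery cycle now.
/// Returns false only when the last discovery was recent AND the feed already
/// holds at least `min_feed_items` items. `since_last_discovery` is `None` when
/// no discovery has happened yet (for example, right after a server restart).
fn should_run(
    since_last_discovery: Option<Duration>,
    feed_len: usize,
    interval: Duration,
    min_feed_items: usize,
) -> bool {
    // A sparse feed overrides the interval immediately.
    if feed_len < min_feed_items {
        return true;
    }
    match since_last_discovery {
        None => true,
        Some(elapsed) => elapsed >= interval,
    }
}
```

For the values used in this plan, a call would look like `should_run(elapsed, feed_len, Duration::from_secs(30 * 60), 5)`.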

GET /browse-tasks (forage-server)

Returns a BrowsePlan as JSON for user 1:

{
  "should_run": true,
  "interval_minutes": 30,
  "limit_per_topic": 5,
  "tag_hints": ["modal jazz", "improvisation", "music theory"],
  "topics": [
    { "name": "jazz",       "priority": 0.72, "sources": ["pitchfork.com/reviews/albums", "allaboutjazz.com/news"] },
    { "name": "technology", "priority": 0.51, "sources": ["news.ycombinator.com", "lobste.rs"] },
    { "name": "science",    "priority": 0.28, "sources": ["phys.org"] }
  ]
}

tag_hints comes from the top tags across the user's positively-signaled items (save + dwell ≥15s). The agent uses these to bias which articles it chooses to read in depth within each source — skipping news roundups and prioritizing analysis pieces that match the hints.

Cold-start response: all 8 categories at equal priority: 0.125, 2 sources each, tag_hints: [].

POST /discovery/heartbeat (forage-server)

Agent calls this on every cycle start. Server records agent_last_seen timestamp. Used by feed page to show connection status.

GET /discovery/status (forage-server)

{
  "agent_connected": true,
  "last_discovery_at": "2026-02-24T10:30:00Z",
  "items_found_last_run": 23,
  "next_run_in_minutes": 12
}

agent_connected: true when agent_last_seen is within the last 5 minutes.
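A minimal sketch of that check, using only the standard library. `agent_connected` and `HEARTBEAT_WINDOW` are hypothetical names; the five-minute window comes from the rule above, and `now` is passed in so the function stays easy to test.

```rust
use std::time::{Duration, Instant};

/// Heartbeats older than this count as a disconnected agent.
const HEARTBEAT_WINDOW: Duration = Duration::from_secs(5 * 60);

/// True when a heartbeat was recorded within the last five minutes.
fn agent_connected(agent_last_seen: Option<Instant>, now: Instant) -> bool {
    match agent_last_seen {
        // saturating_duration_since returns zero instead of panicking
        // if `seen` is somehow later than `now`.
        Some(seen) => now.saturating_duration_since(seen) <= HEARTBEAT_WINDOW,
        None => false,
    }
}
```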

Discovery state in AppState

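pub struct DiscoveryState {
    // Wall-clock time, so /discovery/status can report an RFC 3339 timestamp
    // (a monotonic Instant cannot be converted to a calendar time).
    pub last_discovery_at: Mutex<Option<std::time::SystemTime>>,
    // Monotonic clock suffices: only the time elapsed since the heartbeat matters.
    pub agent_last_seen:   Mutex<Option<std::time::Instant>>,
    // Number of items captured during the most recent discovery run.
    pub items_last_run:    Mutex<u32>,
}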

Added to AppState alongside engine and events. Handlers update it; feed page polls it.

Feed page status indicator

A small status bar below the header:

  • ● Active — last run 4 min ago (green dot) — agent_connected: true
  • ○ Agent not connected (grey dot) — no heartbeat in 5 min
  • ⟳ Discovering... (spinning) — between heartbeat and items appearing

Enriched capture payload

ForageItemInput and POST /capture gain Claude-specific fields:

pub struct ForageItemInput {
    pub url: String,
    pub title: String,
    pub source: String,
    pub category: String,
    pub reading_time_min: u32,
    pub description: String,
    // Claude-enriched fields (all optional; empty = not provided)
    pub tags: Vec<String>,       // specific subtopics: ["modal jazz", "music theory", "john coltrane"]
    pub entities: Vec<String>,   // named entities: ["John Coltrane", "Blue Note Records"]
    pub content_type: String,    // "analysis" | "news" | "tutorial" | "opinion" | "review" | "interview" | "research" | ""
    pub summary: String,         // Claude's 2-sentence summary of what the article actually says
}

All fields serialized into item metadata storage. tags stored as "tags" (comma-separated string) so existing metadata retrieval works without schema changes.

CaptureReq in handlers.rs gains the same optional fields with #[serde(default)].

Tag affinity in ForageEngine

top_tags(user_id, limit) -> Vec<String> — scans metadata of positively-signaled items (save + strong dwell), splits the "tags" metadata field, returns the top-N by frequency. Used to populate tag_hints in BrowsePlan.
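The counting step of `top_tags` could look like the sketch below. It takes the comma-separated "tags" strings of the positively-signaled items directly, rather than a user ID, so it stays self-contained; lowercasing and the alphabetical tie-break are assumptions added to make the output deterministic.

```rust
use std::collections::HashMap;

/// Returns the `limit` most frequent tags across the given comma-separated
/// "tags" metadata fields. Tags are trimmed and lowercased before counting;
/// ties are broken alphabetically (an assumption, for determinism).
fn top_tags(tag_fields: &[&str], limit: usize) -> Vec<String> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for field in tag_fields {
        for tag in field.split(',').map(str::trim).filter(|t| !t.is_empty()) {
            *counts.entry(tag.to_lowercase()).or_insert(0) += 1;
        }
    }
    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    // Most frequent first, then alphabetical.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(limit).map(|(tag, _)| tag).collect()
}
```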

No schema changes to tidalDB. Tag affinity runs entirely over item metadata; the preference vector stays 8-dimensional and tracks category-level signal. Tags are a secondary signal that guides the agent's article selection within a source, not a replacement for the embedding.

Enriched feed cards

Feed cards gain three new display elements:

  • Tag chips — top 3 tags from the item, rendered as small outlined badges below the description. Tap a tag → future feed cards filtered/boosted for that tag (stored as a localStorage tag preference that biases the next /browse-tasks call via a ?prefer_tags= query param)
  • Content type badge — right of the category chip, distinct color: analysis (blue), tutorial (green), news (grey), opinion (amber), review (purple)
  • Claude summary — shown instead of the meta description when non-empty; clearly signals what Claude learned from reading, not what the page says about itself

The discovery agent prompt

A file at applications/forage/agent.md — the instruction set Claude runs with --chrome.

Core loop:

  1. GET localhost:4242/browse-tasks
  2. If should_run: false → wait interval_minutes, repeat from step 1
  3. POST localhost:4242/discovery/heartbeat
  4. For each topic (in priority order):
    • For each source URL:
      • Navigate to the source page
      • Find article links on the page (exclude nav, footer, sidebar links; prefer main content area)
      • Select up to limit_per_topic articles that appear relevant to tag_hints (if hints are present); prefer depth over breadth — analysis and tutorial pieces over news roundups
      • For each selected article:
        • Navigate to the article
        • Read the full page text (get_page_text)
        • Analyse:
          • title — headline (from <h1> if better than <title>)
          • canonical_url — from <link rel="canonical">
          • reading_time_min — word count ÷ 200, rounded up
          • tags — 2 to 5 specific subtopic tags, lowercase, singular nouns or short phrases (e.g. "modal jazz" not "jazz")
          • entities — up to 5 named people, companies, technologies, or places central to the article
          • content_type — one of: analysis, news, tutorial, opinion, review, interview, research
          • summary — 2 sentences: what the article argues or reports, not what the site says about it
        • Skip if: title is empty, contains "Sign In" / "Subscribe" / "Login" / "Create Account", or URL is localhost / chrome://
        • POST localhost:4242/capture with all enriched fields
        • Wait 12 seconds (politeness)
  5. Wait interval_minutes minutes
  6. Repeat
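Two of the rules in the loop above are mechanical enough to pin down in code: the reading-time estimate and the skip check. The sketch below is illustrative only. The function names are hypothetical, and treating the blocked phrases as case-sensitive substrings is an assumption the plan does not settle.

```rust
/// Reading time in minutes: word count divided by 200, rounded up.
fn reading_time_min(text: &str) -> u32 {
    let words = text.split_whitespace().count() as u32;
    (words + 199) / 200
}

/// True when the URL's host is exactly "localhost" (with or without a port).
fn is_localhost(url: &str) -> bool {
    let rest = url
        .strip_prefix("http://")
        .or_else(|| url.strip_prefix("https://"))
        .unwrap_or(url);
    let host = rest.split(|c| c == '/' || c == ':').next().unwrap_or("");
    host == "localhost"
}

/// Applies the skip rule: empty title, a login or paywall phrase in the title,
/// or a localhost / chrome:// URL.
fn should_skip(title: &str, url: &str) -> bool {
    const BLOCKED: [&str; 4] = ["Sign In", "Subscribe", "Login", "Create Account"];
    let title = title.trim();
    title.is_empty()
        || BLOCKED.iter().any(|phrase| title.contains(*phrase))
        || url.starts_with("chrome://")
        || is_localhost(url)
}
```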

What Claude must NOT do: summarise the meta description. The point is that Claude reads the article and describes what it actually contains. A meta description that says "Read our latest article on jazz" is useless. Claude's summary should say "Argues that Coltrane's 1965 transition to free jazz was less a rejection of hard bop than an extension of it into harmonic territory bebop had not explored."

Invocation

One shell script at repo root:

#!/usr/bin/env bash
# forage-discover.sh — start the Forage discovery agent
# Prerequisites: forage-server running at localhost:4242, claude CLI with --chrome support
claude --chrome "$(cat applications/forage/agent.md)"

User starts the system with two terminal tabs:

# Tab 1 — server
cargo run -p forage-server --manifest-path applications/forage/server/Cargo.toml

# Tab 2 — agent
./forage-discover.sh

Then opens localhost:4242 and reacts.


Edge cases

| Situation | Handled by |
|-----------|------------|
| Cold start (no prefs, no tags) | Equal weight all 8 categories, tag_hints: [], agent reads broadly |
| Agent not running | Feed shows "Agent not connected"; should_run: true stays set |
| Navigation 404 / timeout | Agent skips to next URL, cycle continues |
| Paywall / login page | Agent skips on title check ("Sign In", blank, "Subscribe") |
| Empty title | POST /capture returns 400; agent skips |
| Duplicate URL | add_item idempotent via FNV-1a; same ID, no duplicate; enrichment not re-written |
| Feed sparse (< 5 items) | should_run: true overrides interval immediately |
| Two agents running | Both browse; idempotent captures; harmless double coverage |
| Server restart | last_discovery_at resets to null; agent runs on next cycle |
| Prefs shift mid-cycle | Current cycle finishes with old plan; next call picks up new weights |
| Claude context grows | Agent processes one topic at a time, not all sources in one turn |
| Rate limit (HTTP 429) | Agent skips source, logs, continues to next |
| Article has no meta description | Agent uses first paragraph or derives from full read; summary field carries real content |
| Tags on first article in a new category | Tags come from Claude's reading, not from existing tag history; tag affinity starts building immediately |
| User taps tag chip on feed card | Stored as localStorage tag preference; next /browse-tasks?prefer_tags=modal+jazz biases hints |
| Item enrichment fails (Claude unsure) | Fields default to empty string / empty array; POST /capture still succeeds; card renders with basic metadata only |
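The duplicate-URL case relies on deriving the item ID from the URL with FNV-1a, so the same URL always maps to the same ID. A sketch of that derivation follows. The plan names only the hash family; the 64-bit variant (chosen because item IDs are `u64`), hashing the raw URL bytes without normalization, and the name `item_id_for_url` are assumptions.

```rust
/// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Derives a stable item ID from a URL using 64-bit FNV-1a:
/// XOR each byte into the hash, then multiply by the prime (wrapping).
/// Hypothetical helper; the engine's actual key derivation is not shown in the plan.
fn item_id_for_url(url: &str) -> u64 {
    url.bytes().fold(FNV_OFFSET_BASIS, |hash, byte| {
        (hash ^ u64::from(byte)).wrapping_mul(FNV_PRIME)
    })
}
```

Because the function is pure, two agents capturing the same URL compute the same ID, which is what makes concurrent captures harmless.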

What this does NOT build

  • A Chrome Web Store extension
  • Server-push to Claude (server cannot initiate Claude actions; Claude polls)
  • Per-user discovery (single user, user 1, as per multi-user scope constraint)
  • Configurable source lists via UI (source registry is hardcoded for P1)

Acceptance criteria

  1. Two terminal commands start the full system: one for the server, one for the agent
  2. Within 5 minutes of starting, the feed contains ≥10 real articles discovered by Claude
  3. Items appear in the feed in real-time via SSE as Claude captures them (no manual refresh)
  4. Feed page shows ● Active status while the agent is running
  5. ≥80% of captured items have non-empty tags, content_type, and summary fields — Claude is analysing, not just extracting
  6. At least one feed card's summary is observably different from and more informative than its meta description
  7. After saving ≥5 items tagged "modal jazz", the next /browse-tasks response includes "modal jazz" in tag_hints
  8. After 20 user reactions (≥5 saves on jazz items), the next /browse-tasks response ranks jazz sources first
  9. A navigation failure (404, timeout) during a discovery cycle does not crash the agent or the server
  10. Re-running the agent after a server restart re-populates the feed within one cycle (items already in DB are not re-added)

Phase 2 — Real Embeddings

Goal: Semantic search and similarity-based recommendations over content Forage actually finds.

What changes:

A thin embedding sidecar (separate process, any language):

POST /embed  { text: string } → { vector: f32[1536] }

Default: OpenAI text-embedding-3-small. Swappable. Forage calls this when writing new items.

The text embedded is now significantly richer than P0's title-only approach. With Claude's enrichment from P1 available, the embedder receives:

{title} — {summary}. Topics: {tags}. Entities: {entities}.
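Filling that template is simple string assembly. The sketch below assumes a hypothetical `EnrichedItem` view over the P1 capture fields and trims a trailing period from the summary so the output does not end a sentence twice; how empty tag or entity lists should be rendered is left open by the plan and is not handled specially here.

```rust
/// Hypothetical borrowed view over the enriched fields captured in P1.
struct EnrichedItem<'a> {
    title: &'a str,
    summary: &'a str,
    tags: &'a [&'a str],
    entities: &'a [&'a str],
}

/// Builds the text sent to the embedding sidecar, following the template
/// "{title} — {summary}. Topics: {tags}. Entities: {entities}."
fn embedding_text(item: &EnrichedItem) -> String {
    format!(
        "{} — {}. Topics: {}. Entities: {}.",
        item.title.trim(),
        // Avoid a doubled period when the summary already ends with one.
        item.summary.trim().trim_end_matches('.'),
        item.tags.join(", "),
        item.entities.join(", "),
    )
}
```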

This embeds Claude's understanding of the article, not the page's self-description. The preference centroid that emerges is a semantic model of what the user actually engages with.

With real embeddings:

  • SearchBuilder::semantic("jazz theory") works for real
  • SearchBuilder::similar_to(item_id) produces genuine similarity
  • Preference vectors actually mean something — they are in embedding space

The feed profile adds:

  • semantic_boost: 0.3 — items semantically near preference centroid score higher
  • similar_to_saved: true — items near saved items get boosted

Proves: The preference vector is not just a signal frequency map — it is a semantic model of what the user cares about, queryable by meaning.


Phase 3 — Adaptive MAB

Goal: The exploration budget adapts per-user based on their exploration-hit history.

What changes:

Track per-user: exploration_hits / exploration_total → hit rate.

if hit_rate > 0.5:    exploration_ratio = 0.25  (adventurous user)
elif hit_rate < 0.2:  exploration_ratio = 0.10  (convergent user)
else:                 exploration_ratio = 0.14  (default)
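The same thresholds as a Rust function, written as a sketch. The function name is hypothetical, and falling back to the default ratio when a user has no exploration history yet is an assumption the plan does not spell out.

```rust
/// Picks the exploration ratio from a user's exploration-hit history.
/// With no history (`total == 0`) the default ratio is used (assumption).
fn exploration_ratio(exploration_hits: u32, exploration_total: u32) -> f64 {
    if exploration_total == 0 {
        return 0.14;
    }
    let hit_rate = f64::from(exploration_hits) / f64::from(exploration_total);
    if hit_rate > 0.5 {
        0.25 // adventurous user
    } else if hit_rate < 0.2 {
        0.10 // convergent user
    } else {
        0.14 // default
    }
}
```

The branches are mutually exclusive, so a hit rate above 0.5 cannot fall through to the default.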

UCB1 bonus on underexplored categories:

ucb_bonus = sqrt(2 * ln(total_signals) / category_signal_count)

Categories with few signals get a score boost, naturally surfacing exploration candidates higher.
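A sketch of the bonus with its edge cases made explicit. The formula divides by `category_signal_count`, which is zero for a category the user has never touched, and `ln(0)` is undefined for a user with no signals at all. Clamping both inputs to at least 1 is one way to keep the result finite; that clamp is an assumption, not something the plan specifies.

```rust
/// UCB1-style exploration bonus: sqrt(2 * ln(total_signals) / category_signal_count).
/// Both inputs are clamped to a minimum of 1 so the result is always finite
/// (assumption; classic UCB1 instead tries every arm once before scoring).
fn ucb_bonus(total_signals: u64, category_signal_count: u64) -> f64 {
    let total = total_signals.max(1) as f64;
    let count = category_signal_count.max(1) as f64;
    (2.0 * total.ln() / count).sqrt()
}
```

For a user with 100 signals, a category with 2 signals gets a bonus of about 2.15, while one with 50 signals gets about 0.43.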

Instrumentation persists per-user exploration outcomes (exploration_hits, exploration_total) and feeds adaptation logic.

Proves: The MAB is not static noise — it learns the user's exploration tolerance and adjusts. Power users feel the system getting bolder with them.


Phase 4 — The Surprise Moment

Goal: Cross-centroid discoveries emerge. Users find things at the intersection of two interests they did not know were related.

What changes:

Centroid intersection query:

centroids = top_2_active_centroids(user)
midpoint = (centroid_a.vector + centroid_b.vector) / 2
intersection_candidates = ANN(midpoint, limit=5)
inject 1 intersection candidate into every 7-item feed
label it: "bridge: {category_a} × {category_b}"
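The midpoint step is an element-wise average of the two centroid vectors. A minimal sketch, assuming centroids are plain `f32` slices of equal dimension; the function name and the `Option` return for mismatched lengths are illustrative choices.

```rust
/// Element-wise midpoint of two centroid vectors.
/// Returns `None` when the dimensions differ.
fn midpoint(a: &[f32], b: &[f32]) -> Option<Vec<f32>> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x + y) / 2.0).collect())
}
```

If the ANN index compares vectors by cosine similarity, the midpoint's length does not affect the ranking; an index using Euclidean distance over unit-length vectors may need the result re-normalized first.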

Over time, if intersection items hit consistently, they form a new centroid — the user's interests have genuinely merged into a new territory.

Proves: This is not a feature that could be added to a recommendation system after the fact. It is the natural consequence of having a semantic preference model that updates with the feedback loop. The "surprise moment" is an emergent property of the system working correctly.


What We Are Not Building

  • A Chrome extension we publish to the Chrome Web Store (P0 is Claude-driven, not user-installed)
  • A mobile app
  • Multi-user / server-hosted version (single user, local process)
  • Content moderation, NSFW filtering, language filtering
  • Payment, accounts, authentication
  • A scraper that violates robots.txt or rate limits

Phase 0 Acceptance Criteria

The P0 demo is complete when:

  1. cargo run -p forage-server --manifest-path applications/forage/server/Cargo.toml starts a server at localhost:4242
  2. The feed page loads with 7 items across ≥3 categories
  3. A user generates 10+ signals from the feed page (mix of views, skips, saves) while Claude observes before/after state
  4. After 10 signals, the feed has visibly shifted toward the signaled categories
  5. At least 1 item in the feed is labeled exploring (from the exploration budget)
  6. The signal-to-re-rank latency is < 200ms (measured by feed refresh after POST /signal)
  7. A second user (?user=2) with no signals gets a different, more exploratory feed than a user with 20+ signals

If these 7 criteria are met, the loop is closed and the thesis is proven.