stemedb/applications/findmyhealth/architecture.md

FindMyHealth: Technical Architecture

The Automated Ingestion Pipeline

This architecture builds on the existing StemeDB backbone but shifts the focus from "database" to "automated research engine." The pipeline detects trending topics, gathers the evidence, and structures it into Truth Lenses with minimal human effort.


Pipeline Overview

[Watchtower]          Trend detection & topic selection
     |
     v
[Harvester]           Fan-out scraping across all tiers
     |
     v
[Extraction Cortex]   LLM-powered claim extraction
     |
     v
[StemeDB Spine]       Assertion storage, indexing, lens resolution
     |
     v
[Output Engine]       Newsletter generation & premium alerts
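
To make the hand-offs in the diagram concrete, here is a minimal sketch of one research sprint as a chain of typed stages. The interface names and data shapes (Stages, RawDocument, SprintAssertion, runSprint) are assumptions made for illustration, not part of StemeDB or the existing codebase.

```typescript
// Hypothetical stage contracts: the names mirror the diagram, the shapes are assumptions.
type TopicID = string;

interface RawDocument {
  topicId: TopicID;
  tier: number;
  text: string;
}

interface SprintAssertion {
  subject: string;
  predicate: string;
  object: string;
  source_class: number;
}

interface Stages {
  watchtower: () => Promise<TopicID | null>;
  harvester: (topic: TopicID) => Promise<RawDocument[]>;
  cortex: (docs: RawDocument[]) => Promise<SprintAssertion[]>;
  spine: (assertions: SprintAssertion[]) => Promise<{ conflictScore: number }>;
  output: (topic: TopicID, conflictScore: number) => Promise<void>;
}

// Runs one sprint end to end; resolves to false when nothing is trending.
export async function runSprint(stages: Stages): Promise<boolean> {
  const topic = await stages.watchtower();
  if (topic === null) return false;

  const docs = await stages.harvester(topic);
  const assertions = await stages.cortex(docs);
  const { conflictScore } = await stages.spine(assertions);
  await stages.output(topic, conflictScore);
  return true;
}
```

Passing the stages in as functions keeps each layer replaceable and lets the orchestrator wrap any of them with retries.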

1. The Watchtower (Trigger Layer)

A cron job (or serverless function) that identifies the "Subject" for the research sprint.

  • Google Trends API: Filter for Health and Science categories with >50% breakout velocity.
  • Reddit Scraper: Monitor r/all, r/Biohackers, and r/Nootropics for keyword frequency spikes.
  • PubMed RSS: Watch for new publications in high-impact journals (NEJM, Lancet).
  • Output: A TopicID (e.g., magnesium_threonate_sleep) sent to the Orchestrator.
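
As a rough illustration of the trigger logic, the sketch below picks the fastest-growing keyword and turns it into a TopicID. It assumes "velocity" means growth of the current mention count over a trailing baseline (0.5 = +50%). The KeywordCount shape and the function names are hypothetical.

```typescript
// Hypothetical input: mention counts for one keyword in the current
// window versus a trailing baseline window.
interface KeywordCount {
  keyword: string;
  current: number;
  baseline: number;
}

// Growth of current over baseline as a fraction (0.5 means +50%).
export function velocity(c: KeywordCount): number {
  if (c.baseline <= 0) return c.current > 0 ? Infinity : 0;
  return (c.current - c.baseline) / c.baseline;
}

// Lowercase, underscore-separated identifier, in the style of the TopicID example above.
export function toTopicId(keyword: string): string {
  return keyword
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

// Returns the TopicID of the fastest-growing keyword above the threshold, or null.
export function selectTopic(counts: KeywordCount[], threshold = 0.5): string | null {
  let best: KeywordCount | null = null;
  for (const c of counts) {
    if (velocity(c) > threshold && (best === null || velocity(c) > velocity(best))) {
      best = c;
    }
  }
  return best === null ? null : toTopicId(best.keyword);
}
```

The same selectTopic call could rank candidates from Google Trends, Reddit, and PubMed once each source is reduced to keyword counts.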

2. The Harvester (Scraping Layer)

Once a topic is picked, the system fans out to gather the "Evidence Stack."

Tier 0/1 (Regulatory/Clinical):

  • PubMed/NCBI API: Fetch abstracts of the top 5 most cited papers on the topic.
  • FDA/EMA Crawlers: Search official label databases and Adverse Event reports.

Tier 5 (Anecdotal):

  • Reddit API: Fetch the top 3 threads from the last 90 days.
  • Twitter/X API: Sample recent high-engagement posts for sentiment signal.

Output: A collection of raw text blobs associated with the TopicID.
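
The fan-out can be sketched as one concurrent call per source, with failures isolated so that a single unavailable API does not sink the sprint. The EvidenceSource and RawBlob shapes are assumptions. A real source would wrap the PubMed, FDA, or Reddit client behind fetchTexts.

```typescript
// Hypothetical adapter interface: each scraper or API client exposes fetchTexts.
interface EvidenceSource {
  name: string;
  tier: number; // 0/1 = regulatory/clinical, 5 = anecdotal
  fetchTexts: (topicId: string) => Promise<string[]>;
}

interface RawBlob {
  topicId: string;
  source: string;
  tier: number;
  text: string;
}

// Queries every source concurrently and tags each blob with its origin and tier.
export async function harvest(topicId: string, sources: EvidenceSource[]): Promise<RawBlob[]> {
  const perSource = await Promise.all(
    sources.map(async (s): Promise<RawBlob[]> => {
      try {
        const texts = await s.fetchTexts(topicId);
        return texts.map((text) => ({ topicId, source: s.name, tier: s.tier, text }));
      } catch (err) {
        // A failing source contributes nothing instead of aborting the fan-out.
        console.warn(`harvest: source ${s.name} failed`, err);
        return [];
      }
    })
  );
  return perSource.reduce<RawBlob[]>((all, blobs) => all.concat(blobs), []);
}
```

Tagging each blob with its tier at this stage lets the extraction layer fill in source_class without guessing.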

3. The Extraction Cortex (LLM Layer)

A high-reasoning LLM performs structured extraction, turning each raw text blob into StemeDB Signed Assertions.

Prompt Instruction:

"Extract every distinct claim regarding [Topic]. For each claim, identify: The Proposition (Subject-Predicate-Object), the Date of the claim, the Source Type, and the Confidence level of the author. Output as a JSON array of StemeDB assertions."

Transformation Example:

  • Input: "Reddit user says: 'I took Mag Threonate and had vivid nightmares for a week.'"
  • Output:
{
  "subject": "magnesium_threonate",
  "predicate": "side_effect",
  "object": "vivid_nightmares",
  "source_class": 5,
  "confidence": 0.9,
  "timestamp": 1707340800,
  "source_metadata": {"user": "u/jdoe", "platform": "reddit"}
}
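
LLM output should be validated before anything reaches the database. The sketch below mirrors the fields of the example above and separates schema-valid assertions from rejects. The 0 to 5 tier range and the 0 to 1 confidence range are assumptions, and the function names are hypothetical.

```typescript
// Shape of one extracted claim, mirroring the JSON example above.
export interface Assertion {
  subject: string;
  predicate: string;
  object: string;
  source_class: number; // evidence tier; the 0-5 range is an assumption
  confidence: number; // assumed to lie in [0, 1]
  timestamp: number; // Unix seconds
  source_metadata: Record<string, unknown>;
}

const isNonEmptyString = (v: unknown): v is string =>
  typeof v === "string" && v.length > 0;

// Type guard that rejects anything outside the expected schema.
export function isAssertion(value: unknown): value is Assertion {
  if (typeof value !== "object" || value === null) return false;
  const { subject, predicate, object, source_class, confidence, timestamp, source_metadata } =
    value as Record<string, unknown>;
  return (
    isNonEmptyString(subject) &&
    isNonEmptyString(predicate) &&
    isNonEmptyString(object) &&
    typeof source_class === "number" &&
    Number.isInteger(source_class) &&
    source_class >= 0 &&
    source_class <= 5 &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1 &&
    typeof timestamp === "number" &&
    Number.isInteger(timestamp) &&
    timestamp > 0 &&
    typeof source_metadata === "object" &&
    source_metadata !== null
  );
}

// Splits raw model output into valid assertions and rejected items.
export function parseAssertions(llmOutput: string): { valid: Assertion[]; rejected: unknown[] } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(llmOutput);
  } catch (err) {
    return { valid: [], rejected: [llmOutput] }; // not JSON at all
  }
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  const valid: Assertion[] = [];
  const rejected: unknown[] = [];
  for (const item of items) {
    if (isAssertion(item)) valid.push(item);
    else rejected.push(item);
  }
  return { valid, rejected };
}
```

Keeping the rejected items makes it possible to log them or feed them back to the model in a retry prompt.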

4. The Spine (StemeDB Integration)

Extracted assertions are pushed into StemeDB.

  • Latticing: StemeDB automatically indexes new assertions against the existing graph.
  • Lens Resolution: The system runs a SkepticLens query. If the Tier 5 "Social" cluster deviates significantly from the Tier 0 "Regulatory" consensus, a Conflict Flag is raised.
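
The document does not specify how SkepticLens measures deviation. Purely as an illustration, the sketch below scores each proposition by the absolute gap between mean Tier 5 confidence and mean Tier 0 confidence, treating a tier with no assertions as zero support. This scoring rule and all names here are assumptions, not StemeDB's actual lens logic.

```typescript
// Illustrative only: NOT the SkepticLens implementation.
interface ScoredAssertion {
  subject: string;
  predicate: string;
  object: string;
  source_class: number;
  confidence: number;
}

const propositionKey = (a: ScoredAssertion): string =>
  `${a.subject}|${a.predicate}|${a.object}`;

// Mean confidence of a group; an empty group counts as zero support.
function meanConfidence(items: ScoredAssertion[]): number {
  if (items.length === 0) return 0;
  return items.reduce((sum, a) => sum + a.confidence, 0) / items.length;
}

// Maps each proposition to |mean Tier 5 confidence - mean Tier 0 confidence|.
export function conflictScores(assertions: ScoredAssertion[]): Map<string, number> {
  const byProposition = new Map<string, ScoredAssertion[]>();
  for (const a of assertions) {
    const key = propositionKey(a);
    const group = byProposition.get(key) || [];
    group.push(a);
    byProposition.set(key, group);
  }

  const scores = new Map<string, number>();
  byProposition.forEach((group, key) => {
    const social = meanConfidence(group.filter((a) => a.source_class === 5));
    const regulatory = meanConfidence(group.filter((a) => a.source_class === 0));
    scores.set(key, Math.abs(social - regulatory));
  });
  return scores;
}
```

Under this rule, a side effect reported confidently on Reddit but absent from regulatory sources scores high, which is the kind of gap a Conflict Flag is meant to surface.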

5. The Output Engine (Newsletter/App)

The final layer converts database state into human-readable intelligence.

  • Automated Summarization: An LLM reads the resolved state of StemeDB (the output of the Truth Lens) and writes a 200-word summary for the newsletter.
  • Alert Trigger: If ConflictScore > 0.8, the system pushes a notification to Premium users: "Emerging Signal: High volume of anecdotal reports for [Topic] contradicts clinical data."
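
The two output rules can be sketched as small pure functions: one enforces the 200-word limit on the generated summary, the other applies the ConflictScore threshold and builds the notification text quoted above. The function names and the PremiumAlert shape are hypothetical.

```typescript
const CONFLICT_THRESHOLD = 0.8;

interface PremiumAlert {
  topic: string;
  message: string;
}

// Trims an LLM-written summary to at most maxWords words.
export function clampWords(text: string, maxWords = 200): string {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  if (words.length <= maxWords) return words.join(" ");
  return words.slice(0, maxWords).join(" ") + "...";
}

// Returns an alert only when the score is strictly above the threshold.
export function buildAlert(topic: string, conflictScore: number): PremiumAlert | null {
  if (!(conflictScore > CONFLICT_THRESHOLD)) return null;
  return {
    topic,
    message: `Emerging Signal: High volume of anecdotal reports for ${topic} contradicts clinical data.`,
  };
}
```

Because buildAlert has no side effects, the threshold can be tested without sending any notifications.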

Email Architecture: The Dual-Track System

Resend handles two tracks: Transactional (high-priority, immediate) and Broadcast (bulk, scheduled).

| Track | Type | Usage | Strategy |
|---|---|---|---|
| Track A: Transactional | Individual API calls | Alerts, password resets, opt-ins | resend.emails.send() |
| Track B: Broadcasts | Batch/Audience API | Daily Evidence Pulse, Weekly Trends | resend.broadcasts.create() |

Broadcast Engine (Newsletter)

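// /lib/email/broadcast.ts
import { Resend } from 'resend';
import { render } from '@react-email/render';
// Assumed template location; adjust to the project layout.
import { DigestTemplate } from './templates/digest';

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendDailyDigest(topicData: { topic_name: string; [key: string]: unknown }) {
  // Create the broadcast; it is stored as a draft until it is sent.
  const { data, error } = await resend.broadcasts.create({
    audienceId: process.env.FMH_AUDIENCE_ID!,
    from: 'FindMyHealth Intel <digest@findmyhealth.com>',
    subject: `[Evidence Alert] ${topicData.topic_name}: Signal Shift Detected`,
    html: await render(DigestTemplate({ data: topicData })),
  });
  if (error || !data) throw new Error(`Broadcast creation failed: ${error?.message}`);

  // Deliver the draft to the audience.
  const sent = await resend.broadcasts.send(data.id);
  if (sent.error) throw new Error(`Broadcast send failed: ${sent.error.message}`);
}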

Evidence Alert (Transactional)

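// /api/alerts/trigger.ts
import { Resend } from 'resend';
// Assumed template location; adjust to the project layout.
import { AlertTemplate } from '../../lib/email/templates/alert';

const resend = new Resend(process.env.RESEND_API_KEY);

// user, substance, and conflictDetails come from the enclosing alert handler.
const { data, error } = await resend.emails.send({
  from: 'FindMyHealth Alerts <alerts@findmyhealth.com>',
  to: user.email,
  subject: `Urgent: New Research Conflict for ${substance}`,
  react: AlertTemplate({ substance, conflictDetails }),
  tags: [{ name: 'category', value: 'conflict_alert' }],
});
if (error) throw new Error(`Alert email failed: ${error.message}`);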

Subscriber Flow

  1. Opt-in: User signs up on the homepage.
  2. Double Opt-in (Mandatory): Transactional email with unique verification link.
  3. Audience Sync: Add to Resend Audience only after verification click.
  4. Tagging: Add metadata (e.g., specialty: "oncology", tier: "premium") for segmented broadcasts.
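
The "unique verification link" in step 2 needs a token that cannot be forged. One common approach, shown here as a sketch rather than the project's actual mechanism, is an HMAC-signed token carrying the email and an expiry time. The token format and function names are assumptions. The only library used is Node's built-in crypto module.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed token format: base64url(email) + "." + expiry (Unix seconds) + "." + hex HMAC.
export function createVerificationToken(email: string, secret: string, expiresAt: number): string {
  const payload = `${Buffer.from(email, "utf8").toString("base64url")}.${expiresAt}`;
  const signature = createHmac("sha256", secret).update(payload).digest("hex");
  return `${payload}.${signature}`;
}

// Returns the verified email, or null if the token is malformed, forged, or expired.
export function verifyToken(token: string, secret: string, now: number): string | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [encodedEmail, expiry, signature] = parts;

  const expected = createHmac("sha256", secret).update(`${encodedEmail}.${expiry}`).digest("hex");
  const given = Buffer.from(signature, "hex");
  const wanted = Buffer.from(expected, "hex");
  // Constant-time comparison; the lengths must match before timingSafeEqual is called.
  if (given.length !== wanted.length || !timingSafeEqual(given, wanted)) return null;

  const expiresAt = Number(expiry);
  if (!Number.isFinite(expiresAt) || expiresAt < now) return null;
  return Buffer.from(encodedEmail, "base64url").toString("utf8");
}
```

Only when verifyToken returns an email would the subscriber be added to the Resend Audience, which keeps unverified addresses out of broadcasts (step 3).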

Webhook Feedback Loop

  • email.bounced: Mark user as inactive to protect sender reputation.
  • email.clicked: Track which Truth Tiers users engage with to inform ingestion priority.
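
A webhook receiver can reduce each event to an action before touching the database. In the sketch below, the payload fields (type, data.to, data.click.link) are a minimal assumed subset and should be checked against Resend's webhook documentation. The ?tier= query parameter on tracked links is a hypothetical convention for recording which Truth Tier a link belongs to.

```typescript
// Minimal assumed subset of a webhook payload.
interface EmailWebhookEvent {
  type: string;
  data: { to: string[]; click?: { link: string } };
}

type WebhookAction =
  | { kind: "deactivate"; emails: string[] }
  | { kind: "record_click"; emails: string[]; tier: number | null }
  | { kind: "ignore" };

// Reads the hypothetical ?tier=N query parameter from a tracked link.
function tierFromLink(link: string): number | null {
  try {
    const raw = new URL(link).searchParams.get("tier");
    if (raw === null || raw.trim() === "") return null;
    const tier = Number(raw);
    return Number.isInteger(tier) ? tier : null;
  } catch (err) {
    return null; // not a parseable URL
  }
}

// Maps a webhook event to the action the feedback loop should take.
export function routeWebhook(event: EmailWebhookEvent): WebhookAction {
  switch (event.type) {
    case "email.bounced":
      return { kind: "deactivate", emails: event.data.to };
    case "email.clicked":
      return {
        kind: "record_click",
        emails: event.data.to,
        tier: event.data.click ? tierFromLink(event.data.click.link) : null,
      };
    default:
      return { kind: "ignore" };
  }
}
```

Returning an action instead of writing to the database directly keeps the routing logic testable with plain objects.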

Deliverability Checklist

  • Subdomain Isolation: digest.findmyhealth.com for newsletters, auth.findmyhealth.com for transactional.
  • DKIM/SPF: Authenticate via Resend DNS settings.
  • Plain Text Fallback: Always include a text version via @react-email/components.
  • Batching: Use resend.batch.send() (up to 100 per call) to stay under rate limits.
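
The batching item comes down to splitting the recipient list into groups of at most 100 before each call. A minimal sketch follows, with the actual send left as a comment because it depends on the surrounding email code. The chunk helper is hypothetical.

```typescript
// Splits items into consecutive groups of at most `size` elements.
export function chunk<T>(items: T[], size = 100): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Usage sketch (assumes `emails` is an array of email payloads and
// `resend` is an initialised client):
//
//   for (const group of chunk(emails, 100)) {
//     await resend.batch.send(group);
//   }
```

Sending the groups sequentially rather than in parallel is the simpler way to stay under the rate limit.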

Tech Stack

| Component | Technology | Why |
|---|---|---|
| Orchestrator | Temporal.io or Node-RED | Long-running, retriable scraping workflows |
| Scrapers | Firecrawl or Apify | Turns complex websites into LLM-ready Markdown |
| The Cortex | Claude API | Best-in-class at complex JSON extraction schemas |
| The Spine | StemeDB | Custom probabilistic knowledge graph |
| Frontend | Next.js + Tailwind | Fast, SEO-friendly, Linear/Stripe aesthetic |
| Email | Resend + React Email | Developer-first, compliance built-in |