stemedb/applications/findmyhealth/architecture.md
jordan 58594bc7b9 feat: add feed endpoint, dashboard feed panel, and FindMyHealth app
- Add /v1/feed API endpoint with handler and tests
- Remove health endpoint rate limiting (behind firewall, caused spurious 429s)
- Add dashboard feed panel with list, row, empty state, and loading skeleton
- Update home page to show feed instead of redirecting to skeptic
- Improve API key auth middleware and DTO create/query params
- Add OpenAPI conceptual guide (api-intro.md) with semaglutide examples
- Add FindMyHealth application scaffolding (vision, architecture, prototypes)
- Add FindMyHealth designer/writer and Aphoria founder-CEO agents
- Update roadmap with current progress

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 17:16:17 -07:00


# FindMyHealth: Technical Architecture
## The Automated Ingestion Pipeline
This architecture leverages the existing StemeDB backbone and shifts the focus from "database" to **"automated research engine."** The pipeline identifies what is trending, finds the evidence, and structures the Truth Lenses with minimal human effort.
---
## Pipeline Overview
```
[Watchtower]          Trend detection & topic selection
      |
      v
[Harvester]           Fan-out scraping across all tiers
      |
      v
[Extraction Cortex]   LLM-powered claim extraction
      |
      v
[StemeDB Spine]       Assertion storage, indexing, lens resolution
      |
      v
[Output Engine]       Newsletter generation & premium alerts
```
---
## 1. The Watchtower (Trigger Layer)
A cron job (or serverless function) identifies the "Subject" for each research sprint.
- **Google Trends API:** Filter for `Health` and `Science` categories with >50% breakout velocity.
- **Reddit Scraper:** Monitor `r/all`, `r/Biohackers`, and `r/Nootropics` for keyword frequency spikes.
- **PubMed RSS:** Watch for new publications in high-impact journals (NEJM, Lancet).
- **Output:** A `TopicID` (e.g., `magnesium_threonate_sleep`) sent to the Orchestrator.
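The trigger logic above could be sketched roughly as follows. This is a minimal illustration, not the project's actual detector: the helper names `isSpike` and `toTopicId`, the trailing-mean rule, and the default ratio of 1.5 are all assumptions.

```typescript
// Hypothetical helpers; the spike rule is an assumed stand-in for the real trigger.

/** Returns true when the latest count exceeds the trailing mean by `ratio`. */
function isSpike(counts: number[], ratio: number = 1.5): boolean {
  if (counts.length < 2) return false;
  const history = counts.slice(0, -1);
  const latest = counts[counts.length - 1];
  const mean = history.reduce((sum, c) => sum + c, 0) / history.length;
  // A keyword with no prior mentions counts as a spike as soon as it appears.
  if (mean === 0) return latest > 0;
  return latest / mean >= ratio;
}

/** Normalizes a free-text topic into a snake_case TopicID. */
function toTopicId(topic: string): string {
  return topic
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_') // collapse runs of non-alphanumerics
    .replace(/^_+|_+$/g, ''); // trim leading and trailing underscores
}
```

For example, daily mention counts of `[10, 10, 10, 30]` would qualify as a spike under this rule, and the resulting topic string would be normalized before being handed to the Orchestrator.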
## 2. The Harvester (Scraping Layer)
Once a topic is picked, the system fans out to gather the "Evidence Stack."
**Tier 0/1 (Regulatory/Clinical):**
- **PubMed/NCBI API:** Fetch abstracts of the top 5 most cited papers on the topic.
- **FDA/EMA Crawlers:** Search official label databases and Adverse Event reports.
**Tier 5 (Anecdotal):**
- **Reddit API:** Fetch the top 3 threads from the last 90 days.
- **Twitter/X API:** Sample recent high-engagement posts for sentiment signal.
**Output:** A collection of raw text blobs associated with the `TopicID`.
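A fan-out of this kind might look like the sketch below. The `RawBlob` shape, the `Fetcher` type, and the stub fetchers are assumptions made for illustration; real PubMed, FDA, or Reddit clients would replace the stubs.

```typescript
// Sketch only: RawBlob, Fetcher, and the stubs are hypothetical.
type RawBlob = { topicId: string; tier: number; source: string; text: string };
type Fetcher = (topicId: string) => Promise<RawBlob[]>;

/** Runs every fetcher in parallel and keeps results from the ones that succeed. */
async function harvest(topicId: string, fetchers: Fetcher[]): Promise<RawBlob[]> {
  const perSource = await Promise.all(
    fetchers.map((fetcher) =>
      // A failed source (rate limit, outage) should not abort the whole sprint.
      fetcher(topicId).catch((): RawBlob[] => []),
    ),
  );
  return perSource.reduce<RawBlob[]>((all, blobs) => all.concat(blobs), []);
}

// Stub fetchers standing in for real API clients.
const pubmedStub: Fetcher = async (topicId) => [
  { topicId, tier: 1, source: 'pubmed', text: 'Abstract text...' },
];
const redditStub: Fetcher = async () => {
  throw new Error('429 Too Many Requests');
};
```

Swallowing per-source failures keeps a sprint alive when one tier is unavailable. A production version would likely log the failure and retry through the Orchestrator rather than drop it silently.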
## 3. The Extraction Cortex (LLM Layer)
Raw text becomes **StemeDB Signed Assertions** via structural extraction using a high-reasoning model.
**Prompt Instruction:**
> "Extract every distinct claim regarding [Topic]. For each claim, identify: The Proposition (Subject-Predicate-Object), the Date of the claim, the Source Type, and the Confidence level of the author. Output as a JSON array of StemeDB assertions."
**Transformation Example:**
- **Input:** "Reddit user says: 'I took Mag Threonate and had vivid nightmares for a week.'"
- **Output:**
```json
{
  "subject": "magnesium_threonate",
  "predicate": "side_effect",
  "object": "vivid_nightmares",
  "source_class": 5,
  "confidence": 0.9,
  "timestamp": 1707340800,
  "source_metadata": {"user": "u/jdoe", "platform": "reddit"}
}
```
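Because model output is not guaranteed to be well-formed, it may help to validate it before anything reaches StemeDB. The sketch below assumes the field names from the example above; the `Assertion` interface, `parseAssertions`, and the validation rules are hypothetical and are not StemeDB's actual schema.

```typescript
// Field names follow the example above; the validation rules are assumptions.
interface Assertion {
  subject: string;
  predicate: string;
  object: string;
  source_class: number;
  confidence: number;
  timestamp: number;
  source_metadata?: Record<string, string>;
}

/** Parses model output and keeps only assertions whose required fields are valid. */
function parseAssertions(raw: string): Assertion[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return []; // The model returned something that is not JSON.
  }
  // Accept either a JSON array or a single object.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  return items.filter((item): item is Assertion => {
    if (typeof item !== 'object' || item === null) return false;
    const a = item as Record<string, unknown>;
    const confidence = a.confidence;
    return (
      typeof a.subject === 'string' &&
      typeof a.predicate === 'string' &&
      typeof a.object === 'string' &&
      typeof a.source_class === 'number' &&
      typeof confidence === 'number' &&
      confidence >= 0 &&
      confidence <= 1 &&
      typeof a.timestamp === 'number'
    );
  });
}
```

Rejected items are dropped here for brevity; a real pipeline might instead route them back to the model for a retry.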
## 4. The Spine (StemeDB Integration)
Extracted assertions are pushed into StemeDB.
- **Latticing:** StemeDB automatically indexes new assertions against the existing graph.
- **Lens Resolution:** The system runs a `SkepticLens` query. If the Tier 5 "Social" cluster deviates significantly from the Tier 0 "Regulatory" consensus, a **Conflict Flag** is raised.
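To make the idea of a tier deviation concrete, here is one possible scoring scheme. It is only an illustration: StemeDB's actual `SkepticLens` logic is not shown in this document, and the `supports` flag, `tierSupport`, and `conflictScore` are hypothetical names introduced for the sketch.

```typescript
// Assumption: each assertion about a proposition carries a `supports` flag.
// This is NOT StemeDB's SkepticLens; it is a stand-in to illustrate the idea.
type Stance = { source_class: number; confidence: number; supports: boolean };

/** Confidence-weighted share of a tier's assertions that support the proposition. */
function tierSupport(stances: Stance[], tier: number): number | null {
  const inTier = stances.filter((s) => s.source_class === tier);
  const total = inTier.reduce((sum, s) => sum + s.confidence, 0);
  if (total === 0) return null; // No usable evidence in this tier.
  const supporting = inTier
    .filter((s) => s.supports)
    .reduce((sum, s) => sum + s.confidence, 0);
  return supporting / total;
}

/** Absolute gap between Tier 5 and Tier 0 support, in [0, 1]. */
function conflictScore(stances: Stance[]): number {
  const social = tierSupport(stances, 5);
  const regulatory = tierSupport(stances, 0);
  // Without evidence on both sides there is nothing to compare.
  if (social === null || regulatory === null) return 0;
  return Math.abs(social - regulatory);
}
```

Under this scheme, unanimous anecdotal support for a side effect that regulatory sources reject yields a score of 1, while full agreement between the tiers yields 0.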
## 5. The Output Engine (Newsletter/App)
The final layer converts database state into human-readable intelligence.
- **Automated Summarization:** An LLM reads the *resolved* state of StemeDB (the output of the Truth Lens) and writes a 200-word summary for the newsletter.
- **Alert Trigger:** If `ConflictScore > 0.8`, the system pushes a notification to Premium users: *"Emerging Signal: High volume of anecdotal reports for [Topic] contradicts clinical data."*
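The alert rule above reduces to a small threshold check. In this sketch the 0.8 threshold and the message text come from the rule itself, while the function and constant names are hypothetical.

```typescript
// Threshold and wording are taken from the alert rule; the names are assumptions.
const CONFLICT_ALERT_THRESHOLD = 0.8;

/** Returns the Premium alert text, or null when the score does not exceed the threshold. */
function buildPremiumAlert(topic: string, score: number): string | null {
  if (score <= CONFLICT_ALERT_THRESHOLD) return null;
  return `Emerging Signal: High volume of anecdotal reports for ${topic} contradicts clinical data.`;
}
```

Returning `null` instead of throwing lets the caller skip the notification step without special error handling.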
---
## Email Architecture: The Dual-Track System
Resend handles two tracks: **Transactional** (high-priority, immediate) and **Broadcast** (bulk, scheduled).
| Track | Type | Usage | Strategy |
|-------|------|-------|----------|
| **Track A: Transactional** | Individual API calls | Alerts, password resets, opt-ins | `resend.emails.send()` |
| **Track B: Broadcasts** | Batch/Audience API | Daily Evidence Pulse, Weekly Trends | `resend.broadcasts.create()` |
### Broadcast Engine (Newsletter)
```typescript
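// /lib/email/broadcast.ts
import { Resend } from 'resend';
import { render } from '@react-email/render';
// Assumed template location; adjust to the project layout.
import { DigestTemplate } from './templates/digest';

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendDailyDigest(topicData: { topic_name: string }) {
  // Creating a broadcast only saves a draft; it must be sent explicitly.
  const { data, error } = await resend.broadcasts.create({
    audienceId: process.env.FMH_AUDIENCE_ID!,
    from: 'FindMyHealth Intel <digest@findmyhealth.com>',
    subject: `[Evidence Alert] ${topicData.topic_name}: Signal Shift Detected`,
    html: await render(DigestTemplate({ data: topicData })),
  });
  if (error || !data) {
    throw new Error(`Broadcast creation failed: ${error?.message ?? 'no data returned'}`);
  }
  await resend.broadcasts.send(data.id);
}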
```
### Evidence Alert (Transactional)
```typescript
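// /api/alerts/trigger.ts
import { Resend } from 'resend';
// Assumed template location; adjust to the project layout.
import { AlertTemplate } from '../../lib/email/templates/alert';

const resend = new Resend(process.env.RESEND_API_KEY);

// Hypothetical wrapper; the parameter shapes are assumptions.
export async function sendConflictAlert(user: { email: string }, substance: string, conflictDetails: string) {
  const { data, error } = await resend.emails.send({
    from: 'FindMyHealth Alerts <alerts@findmyhealth.com>',
    to: user.email,
    subject: `Urgent: New Research Conflict for ${substance}`,
    react: AlertTemplate({ substance, conflictDetails }),
    tags: [{ name: 'category', value: 'conflict_alert' }],
  });
  if (error) throw new Error(`Alert send failed: ${error.message}`);
  return data;
}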
```
### Subscriber Flow
1. **Opt-in:** User signs up on the homepage.
2. **Double Opt-in (Mandatory):** Transactional email with unique verification link.
3. **Audience Sync:** Add to Resend `Audience` only after verification click.
4. **Tagging:** Add metadata (e.g., `specialty: "oncology"`, `tier: "premium"`) for segmented broadcasts.
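The verification link in step 2 needs a token that cannot be forged. One common approach is an HMAC-signed token, sketched below with Node's built-in `crypto` module. The token format, the 24-hour expiry, and the function names are assumptions, not the project's actual scheme.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Sketch only: token format, secret handling, and expiry window are assumptions.
const DAY_MS = 24 * 60 * 60 * 1000;

function sign(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload).digest('hex');
}

/** Builds a token of the form `<base64url(email)>.<expiresAt>.<signature>`. */
function createVerificationToken(email: string, secret: string, now: number = Date.now()): string {
  const payload = `${Buffer.from(email).toString('base64url')}.${now + DAY_MS}`;
  return `${payload}.${sign(payload, secret)}`;
}

/** Returns the verified email, or null if the token is forged or expired. */
function verifyToken(token: string, secret: string, now: number = Date.now()): string | null {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [encodedEmail, expiresAt, signature] = parts;
  const expected = Buffer.from(sign(`${encodedEmail}.${expiresAt}`, secret));
  const given = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  if (Number(expiresAt) < now) return null;
  return Buffer.from(encodedEmail, 'base64url').toString('utf8');
}
```

Only when `verifyToken` returns an email would the subscriber be added to the Resend `Audience`, which matches step 3.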
### Webhook Feedback Loop
- **`email.bounced`:** Mark user as `inactive` to protect sender reputation.
- **`email.clicked`:** Track which Truth Tiers users engage with to inform ingestion priority.
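The two webhook rules could be handled by a small reducer like the one below. The payload shape (`type`, `data.to`, `data.click.link`) is an assumption that should be checked against the Resend webhook documentation, and `SubscriberState` is a hypothetical in-memory stand-in for the real user store.

```typescript
// Assumed payload shape; verify field names against the Resend webhook docs.
type WebhookEvent = {
  type: string;
  data: { to: string[]; click?: { link: string } };
};

// Hypothetical in-memory state standing in for the user database.
type SubscriberState = {
  inactive: Set<string>;
  clicksByLink: Map<string, number>;
};

function applyWebhookEvent(state: SubscriberState, event: WebhookEvent): void {
  switch (event.type) {
    case 'email.bounced':
      // Stop mailing bounced addresses to protect sender reputation.
      for (const address of event.data.to) state.inactive.add(address);
      break;
    case 'email.clicked': {
      const link = event.data.click?.link;
      if (link) state.clicksByLink.set(link, (state.clicksByLink.get(link) ?? 0) + 1);
      break;
    }
    default:
      break; // Other event types are ignored in this sketch.
  }
}
```

Mapping each clicked link back to a Truth Tier is left out here; it would depend on how newsletter links are tagged.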
### Deliverability Checklist
- **Subdomain Isolation:** `digest.findmyhealth.com` for newsletters, `auth.findmyhealth.com` for transactional.
- **DKIM/SPF:** Authenticate via Resend DNS settings.
- **Plain Text Fallback:** Always include a `text` version, for example by rendering the React Email template a second time with the `plainText: true` option of `render`.
- **Batching:** Use `resend.batch.send()` (up to 100 per call) to stay under rate limits.
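The batching item implies splitting a recipient list into groups of at most 100 before each call. A minimal sketch follows; `chunk` and `BATCH_LIMIT` are hypothetical names, and the limit of 100 is the per-call figure quoted above.

```typescript
// Sketch only: `chunk` is a hypothetical helper; 100 is the limit quoted above.
const BATCH_LIMIT = 100;

/** Splits `items` into consecutive batches of at most `size` elements. */
function chunk<T>(items: T[], size: number = BATCH_LIMIT): T[][] {
  if (size < 1) throw new RangeError('size must be at least 1');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each resulting batch could then be passed to `resend.batch.send()` in turn, with a short pause between calls if rate limits are a concern.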
---
## Tech Stack
| Component | Technology | Why |
|-----------|------------|-----|
| **Orchestrator** | Temporal.io or Node-RED | Long-running, retriable scraping workflows |
| **Scrapers** | Firecrawl or Apify | Turns complex websites into LLM-ready Markdown |
| **The Cortex** | Claude API | Handles complex JSON extraction schemas reliably |
| **The Spine** | StemeDB | Custom probabilistic knowledge graph |
| **Frontend** | Next.js + Tailwind | Fast, SEO-friendly, Linear/Stripe aesthetic |
| **Email** | Resend + React Email | Developer-first, compliance built-in |