# FindMyHealth: Technical Architecture

## The Automated Ingestion Pipeline

This architecture leverages the existing StemeDB backbone and shifts the focus from "database" to **"automated research engine."** The pipeline identifies what is trending, finds the evidence, and structures the Truth Lenses with minimal human effort.

---

## Pipeline Overview

```
[Watchtower]         Trend detection & topic selection
      |
      v
[Harvester]          Fan-out scraping across all tiers
      |
      v
[Extraction Cortex]  LLM-powered claim extraction
      |
      v
[StemeDB Spine]      Assertion storage, indexing, lens resolution
      |
      v
[Output Engine]      Newsletter generation & premium alerts
```
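
Read as code, the diagram is a chain of async stages in which each stage consumes the previous one's output. The sketch below assumes hypothetical types (`TopicId`, `RawBlob`, `Assertion`) and a hypothetical `runPipeline` driver; none of these are StemeDB or Orchestrator APIs, and the sketch deliberately leaves out the retries and durability the Orchestrator is meant to provide.

```typescript
// Hypothetical data contracts between pipeline stages (illustrative only).
type TopicId = string;
interface RawBlob { topicId: TopicId; tier: number; text: string }
interface Assertion { subject: string; predicate: string; object: string; source_class: number }

// Each stage is injected as an async function, mirroring the diagram's boxes.
interface PipelineStages {
  watchtower: () => Promise<TopicId[]>;
  harvester: (topic: TopicId) => Promise<RawBlob[]>;
  cortex: (blobs: RawBlob[]) => Promise<Assertion[]>;
  spine: (topic: TopicId, assertions: Assertion[]) => Promise<{ conflict: boolean }>;
  output: (topic: TopicId, conflict: boolean) => Promise<void>;
}

// Runs one research sprint per selected topic, in diagram order.
// No retries or persistence here: that is the Orchestrator's job.
export async function runPipeline(stages: PipelineStages): Promise<TopicId[]> {
  const processed: TopicId[] = [];
  for (const topic of await stages.watchtower()) {
    const blobs = await stages.harvester(topic);
    const assertions = await stages.cortex(blobs);
    const { conflict } = await stages.spine(topic, assertions);
    await stages.output(topic, conflict);
    processed.push(topic);
  }
  return processed;
}
```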

---

## 1. The Watchtower (Trigger Layer)

A cron job (or serverless function) identifies the "Subject" for each research sprint.

- **Google Trends API:** Filter for `Health` and `Science` categories with >50% breakout velocity.
- **Reddit Scraper:** Monitor `r/all`, `r/Biohackers`, and `r/Nootropics` for keyword frequency spikes.
- **PubMed RSS:** Watch for new publications in high-impact journals (NEJM, Lancet).
- **Output:** A `TopicID` (e.g., `magnesium_threonate_sleep`) sent to the Orchestrator.
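
A minimal sketch of the spike check and `TopicID` construction. It assumes "breakout velocity" means the relative growth of the current window's mention count over the mean of earlier windows (the exact metric is not pinned down here), and `breakoutVelocity`, `isBreakout`, and `toTopicId` are hypothetical helpers, not part of any Trends or Reddit client.

```typescript
// Assumed metric: (current - baseline) / baseline, where baseline is the
// mean count over previous windows. Returns 0 when there is no history.
export function breakoutVelocity(history: number[], current: number): number {
  if (history.length === 0) return 0;
  const baseline = history.reduce((sum, n) => sum + n, 0) / history.length;
  if (baseline === 0) return current > 0 ? Infinity : 0;
  return (current - baseline) / baseline;
}

// Flags a topic when velocity strictly exceeds the threshold (0.5 = ">50%").
export function isBreakout(history: number[], current: number, threshold = 0.5): boolean {
  return breakoutVelocity(history, current) > threshold;
}

// Builds a snake_case TopicID such as "magnesium_threonate_sleep".
export function toTopicId(...parts: string[]): string {
  return parts
    .join(" ")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse any non-alphanumeric run to "_"
    .replace(/^_+|_+$/g, "");    // trim leading/trailing underscores
}
```

For example, a keyword averaging 100 mentions per window that jumps to 160 has a velocity of 0.6 and is flagged.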

## 2. The Harvester (Scraping Layer)

Once a topic is picked, the system fans out to gather the "Evidence Stack."

**Tier 0/1 (Regulatory/Clinical):**
- **PubMed/NCBI API:** Fetch abstracts of the top 5 most cited papers on the topic.
- **FDA/EMA Crawlers:** Search official label databases and Adverse Event reports.

**Tier 5 (Anecdotal):**
- **Reddit API:** Fetch the top 3 threads from the last 90 days.
- **Twitter/X API:** Sample recent high-engagement posts for sentiment signal.

**Output:** A collection of raw text blobs associated with the `TopicID`.
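
Because the tiers are independent, the fan-out can run concurrently, and one failing API should not abort the whole Evidence Stack. A sketch using `Promise.allSettled`, assuming a hypothetical `Source` adapter interface: real PubMed, FDA/EMA, Reddit, or X clients would be wrapped as `fetch` implementations, and the tier numbers follow the labels above.

```typescript
// Hypothetical source adapter: each tier-tagged source returns raw text blobs.
interface Source {
  name: string;
  tier: number; // e.g. 0/1 = regulatory/clinical, 5 = anecdotal
  fetch: (topicId: string) => Promise<string[]>;
}

interface RawBlob { topicId: string; source: string; tier: number; text: string }

// Queries every source concurrently. Rejections are collected in `failed`
// instead of thrown, so a single flaky API does not sink the sprint.
export async function harvest(topicId: string, sources: Source[]) {
  const results = await Promise.allSettled(sources.map((s) => s.fetch(topicId)));
  const blobs: RawBlob[] = [];
  const failed: string[] = [];
  results.forEach((result, i) => {
    const { name, tier } = sources[i];
    if (result.status === "fulfilled") {
      for (const text of result.value) blobs.push({ topicId, source: name, tier, text });
    } else {
      failed.push(name);
    }
  });
  return { blobs, failed };
}
```

Reporting failures alongside results lets the Orchestrator decide whether a partial Evidence Stack is good enough or the sprint should be retried.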

## 3. The Extraction Cortex (LLM Layer)

Structured extraction with a high-reasoning model turns raw text into **StemeDB Signed Assertions**.

**Prompt Instruction:**

> "Extract every distinct claim regarding [Topic]. For each claim, identify: The Proposition (Subject-Predicate-Object), the Date of the claim, the Source Type, and the Confidence level of the author. Output as a JSON array of StemeDB assertions."

**Transformation Example:**

- **Input:** "Reddit user says: 'I took Mag Threonate and had vivid nightmares for a week.'"
- **Output** (one element of the returned JSON array):

```json
{
  "subject": "magnesium_threonate",
  "predicate": "side_effect",
  "object": "vivid_nightmares",
  "source_class": 5,
  "confidence": 0.9,
  "timestamp": 1707340800,
  "source_metadata": {"user": "u/jdoe", "platform": "reddit"}
}
```
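
Model output should be validated before it reaches the Spine. A sketch of a parser for the JSON array the prompt requests, assuming the field names in the example above, `source_class` as an integer tier from 0 to 5, and `confidence` in [0, 1]; `Assertion` and `parseAssertions` are hypothetical names, not a StemeDB API.

```typescript
// Hypothetical shape of one extracted assertion (mirrors the JSON example).
interface Assertion {
  subject: string;
  predicate: string;
  object: string;
  source_class: number; // assumed integer tier, 0 (regulatory) .. 5 (anecdotal)
  confidence: number;   // assumed range [0, 1]
  timestamp: number;    // Unix seconds
  source_metadata?: Record<string, string>;
}

// Checks the required fields and their ranges; source_metadata is not checked.
function isAssertion(x: unknown): x is Assertion {
  if (typeof x !== "object" || x === null) return false;
  const a = x as Record<string, unknown>;
  return (
    ["subject", "predicate", "object"].every((k) => typeof a[k] === "string" && a[k] !== "") &&
    typeof a.source_class === "number" && Number.isInteger(a.source_class) &&
    a.source_class >= 0 && a.source_class <= 5 &&
    typeof a.confidence === "number" && a.confidence >= 0 && a.confidence <= 1 &&
    typeof a.timestamp === "number" && Number.isInteger(a.timestamp)
  );
}

// Parses the model's reply (a bare object is tolerated); invalid items are
// dropped and counted so extraction quality can be monitored.
export function parseAssertions(raw: string): { valid: Assertion[]; rejected: number } {
  const parsed: unknown = JSON.parse(raw);
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  const valid = items.filter(isAssertion);
  return { valid, rejected: items.length - valid.length };
}
```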

## 4. The Spine (StemeDB Integration)

Extracted assertions are pushed into StemeDB.

- **Latticing:** StemeDB automatically indexes new assertions against the existing graph.
- **Lens Resolution:** The system runs a `SkepticLens` query. If the Tier 5 "Social" cluster deviates significantly from the Tier 0 "Regulatory" consensus, a **Conflict Flag** is raised.
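
How `SkepticLens` measures deviation is not specified here. Purely as an illustration, one simple conflict measure compares the mean stance of each cluster toward a single claim and takes the absolute difference, giving a score in [0, 1]. The stance scale (1 = supports, 0 = contradicts), the tier defaults, and `conflictScore`/`conflictFlag` are all assumptions, not StemeDB behavior.

```typescript
// One source's position on a single claim (assumed scale: 1 supports, 0 contradicts).
interface Observation { tier: number; stance: number }

// Mean stance of one tier's cluster, or null if the tier has no observations.
function meanStance(obs: Observation[], tier: number): number | null {
  const xs = obs.filter((o) => o.tier === tier).map((o) => o.stance);
  return xs.length === 0 ? null : xs.reduce((s, x) => s + x, 0) / xs.length;
}

// |social - regulatory| in [0, 1]; 0 when either cluster is empty (nothing to compare).
export function conflictScore(obs: Observation[], socialTier = 5, regulatoryTier = 0): number {
  const social = meanStance(obs, socialTier);
  const regulatory = meanStance(obs, regulatoryTier);
  if (social === null || regulatory === null) return 0;
  return Math.abs(social - regulatory);
}

// Raises the Conflict Flag when the score strictly exceeds the threshold.
export function conflictFlag(obs: Observation[], threshold: number): boolean {
  return conflictScore(obs) > threshold;
}
```

With two Tier 0 sources contradicting a side effect and three of four Tier 5 sources reporting it, the score is |0.75 - 0| = 0.75.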

## 5. The Output Engine (Newsletter/App)

The final layer converts database state into human-readable intelligence.

- **Automated Summarization:** An LLM reads the *resolved* state of StemeDB (the output of the Truth Lens) and writes a 200-word summary for the newsletter.
- **Alert Trigger:** If `ConflictScore > 0.8`, the system pushes a notification to Premium users: *"Emerging Signal: High volume of anecdotal reports for [Topic] contradicts clinical data."*
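
The alert rule is a strict threshold check followed by a fan-out to Premium users. A sketch assuming a hypothetical `Subscriber` shape whose `tier` field mirrors the `tier: "premium"` tag used for segmentation; `buildConflictAlerts` only builds the notifications, and delivery would go through the transactional email track.

```typescript
// Hypothetical subscriber record.
interface Subscriber { email: string; tier: "free" | "premium" }

interface AlertNotice { to: string; message: string }

const ALERT_THRESHOLD = 0.8; // from the ConflictScore > 0.8 rule

// Returns one notice per Premium subscriber when the score is strictly above
// the threshold; otherwise returns an empty list.
export function buildConflictAlerts(topic: string, conflictScore: number, subscribers: Subscriber[]): AlertNotice[] {
  if (!(conflictScore > ALERT_THRESHOLD)) return [];
  const message = `Emerging Signal: High volume of anecdotal reports for ${topic} contradicts clinical data.`;
  return subscribers
    .filter((s) => s.tier === "premium")
    .map((s) => ({ to: s.email, message }));
}
```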

---

## Email Architecture: The Dual-Track System

Resend handles two tracks: **Transactional** (high-priority, immediate) and **Broadcast** (bulk, scheduled).

| Track | Type | Usage | SDK Call |
|-------|------|-------|----------|
| **Track A: Transactional** | Individual API calls | Alerts, password resets, opt-ins | `resend.emails.send()` |
| **Track B: Broadcasts** | Batch/Audience API | Daily Evidence Pulse, Weekly Trends | `resend.broadcasts.create()`, then `resend.broadcasts.send()` |
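
### Broadcast Engine (Newsletter)

```typescript
// /lib/email/broadcast.ts
import { Resend } from 'resend';
import { render } from '@react-email/components';
import { DigestTemplate } from '@/emails/digest'; // hypothetical template location

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendDailyDigest(topicData: any) {
  // broadcasts.create() only stores the broadcast; nothing goes out until it is sent.
  const { data, error } = await resend.broadcasts.create({
    audienceId: process.env.FMH_AUDIENCE_ID!,
    from: 'FindMyHealth Intel <digest@findmyhealth.com>',
    subject: `[Evidence Alert] ${topicData.topic_name}: Signal Shift Detected`,
    html: await render(DigestTemplate({ data: topicData })),
  });
  if (error || !data) throw new Error(`Broadcast creation failed: ${error?.message}`);

  const sent = await resend.broadcasts.send(data.id);
  if (sent.error) throw new Error(`Broadcast send failed: ${sent.error.message}`);
}
```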
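
### Evidence Alert (Transactional)

```typescript
// /api/alerts/trigger.ts
import { Resend } from 'resend';
import { AlertTemplate } from '@/emails/alert'; // hypothetical template location

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendConflictAlert(user: { email: string }, substance: string, conflictDetails: string) {
  const { data, error } = await resend.emails.send({
    from: 'FindMyHealth Alerts <alerts@findmyhealth.com>',
    to: user.email,
    subject: `Urgent: New Research Conflict for ${substance}`,
    react: AlertTemplate({ substance, conflictDetails }),
    tags: [{ name: 'category', value: 'conflict_alert' }],
  });
  if (error) throw new Error(`Alert email failed: ${error.message}`);
  return data; // contains the id of the sent email
}
```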

### Subscriber Flow

1. **Opt-in:** User signs up on the homepage.
2. **Double Opt-in (Mandatory):** A transactional email delivers a unique verification link.
3. **Audience Sync:** Add the contact to the Resend `Audience` only after the verification link is clicked.
4. **Tagging:** Add metadata (e.g., `specialty: "oncology"`, `tier: "premium"`) for segmented broadcasts.
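
Step 2 needs a verification link that cannot be forged and eventually expires. One common approach, sketched here with Node's built-in `node:crypto`, is an HMAC-signed token that carries the email and an expiry time. The token format, the 24-hour lifetime, lowercasing of emails, and the helper names are assumptions, not part of the Resend API; the contact would be synced to the Audience (step 3) only after `verifyOptInToken` returns an address.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TTL_SECONDS = 24 * 60 * 60; // assumed link lifetime

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

// Token = base64url("email|expiry") + "." + HMAC-SHA256 of that payload.
export function createOptInToken(email: string, secret: string, nowSec: number): string {
  const payload = Buffer.from(`${email.toLowerCase()}|${nowSec + TTL_SECONDS}`).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

// Returns the verified email, or null if the token is malformed, forged, or expired.
export function verifyOptInToken(token: string, secret: string, nowSec: number): string | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [payload, mac] = parts;
  const expected = Buffer.from(sign(payload, secret));
  const given = Buffer.from(mac);
  // Constant-time comparison; lengths must match before timingSafeEqual.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const decoded = Buffer.from(payload, "base64url").toString();
  const sep = decoded.lastIndexOf("|");
  if (sep <= 0) return null;
  const expiry = Number(decoded.slice(sep + 1));
  if (!Number.isFinite(expiry) || nowSec > expiry) return null;
  return decoded.slice(0, sep);
}
```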

### Webhook Feedback Loop

- **`email.bounced`:** Mark user as `inactive` to protect sender reputation.
- **`email.clicked`:** Track which Truth Tiers users engage with to inform ingestion priority.
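
A minimal handler for the two events. It assumes a simplified payload with a `type` string, a `data.to` recipient array, and a `data.click.link` field on click events (check Resend's webhook payload reference for the exact shape). The `Store` interface and the `tier` query parameter used to attribute clicks to a Truth Tier are hypothetical, and webhook signature verification is omitted.

```typescript
// Simplified, assumed payload shape; confirm field names against Resend's docs.
interface ResendWebhookEvent {
  type: string;
  data: { to: string[]; click?: { link: string } };
}

// Hypothetical persistence layer for subscriber state and engagement stats.
interface Store {
  markInactive(email: string): void;
  recordTierClick(tier: string): void;
}

// Assumes digest links carry a hypothetical ?tier=<n> query parameter.
function tierFromLink(link: string | undefined): string | null {
  if (!link) return null;
  try {
    return new URL(link).searchParams.get("tier");
  } catch {
    return null; // malformed URL: ignore rather than fail the webhook
  }
}

export function handleEmailEvent(event: ResendWebhookEvent, store: Store): void {
  switch (event.type) {
    case "email.bounced":
      // Stop mailing bounced addresses to protect sender reputation.
      event.data.to.forEach((email) => store.markInactive(email));
      break;
    case "email.clicked": {
      const tier = tierFromLink(event.data.click?.link);
      if (tier) store.recordTierClick(tier);
      break;
    }
    default:
      break; // other event types are ignored in this sketch
  }
}
```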

### Deliverability Checklist

- **Subdomain Isolation:** `digest.findmyhealth.com` for newsletters, `auth.findmyhealth.com` for transactional mail.
- **DKIM/SPF:** Authenticate via Resend DNS settings.
- **Plain Text Fallback:** Always include a `text` version, generated with `render(..., { plainText: true })` from `@react-email/components`.
- **Batching:** Use `resend.batch.send()` (up to 100 per call) to stay under rate limits.
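
Since each `resend.batch.send()` call takes at most 100 emails, recipient lists have to be chunked first. A dependency-free sketch of that step: `chunk` and `buildBatches` are hypothetical helpers, each inner array would be passed to one batch call, and pacing between calls to respect rate limits is left out.

```typescript
// Splits items into consecutive groups of at most `size` (100 per batch call).
export function chunk<T>(items: T[], size = 100): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new RangeError("size must be a positive integer");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Builds one email object per recipient, grouped into batch-sized arrays.
export function buildBatches(recipients: string[], subject: string, html: string) {
  return chunk(recipients).map((group) =>
    group.map((to) => ({ from: "FindMyHealth Intel <digest@findmyhealth.com>", to, subject, html })),
  );
}
```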

---

## Tech Stack

| Component | Technology | Why |
|-----------|------------|-----|
| **Orchestrator** | Temporal.io or Node-RED | Long-running, retriable scraping workflows |
| **Scrapers** | Firecrawl or Apify | Turns complex websites into LLM-ready Markdown |
| **The Cortex** | Claude API | Best-in-class at complex JSON extraction schemas |
| **The Spine** | StemeDB | Custom probabilistic knowledge graph |
| **Frontend** | Next.js + Tailwind | Fast, SEO-friendly, Linear/Stripe aesthetic |
| **Email** | Resend + React Email | Developer-first, compliance built-in |