# FindMyHealth: Technical Architecture

## The Automated Ingestion Pipeline

This architecture builds on the existing StemeDB backbone but shifts the focus from "database" to **"automated research engine."** The pipeline detects trending topics, gathers the evidence, and structures the Truth Lenses with minimal human effort.

---

## Pipeline Overview

```
[Watchtower]          Trend detection & topic selection
     |
     v
[Harvester]           Fan-out scraping across all tiers
     |
     v
[Extraction Cortex]   LLM-powered claim extraction
     |
     v
[StemeDB Spine]       Assertion storage, indexing, lens resolution
     |
     v
[Output Engine]       Newsletter generation & premium alerts
```

---

## 1. The Watchtower (Trigger Layer)

A cron job (or serverless function) that identifies the "Subject" for the research sprint.

- **Google Trends API:** Filter for `Health` and `Science` categories with >50% breakout velocity.
- **Reddit Scraper:** Monitor `r/all`, `r/Biohackers`, and `r/Nootropics` for keyword frequency spikes.
- **PubMed RSS:** Watch for new publications in high-impact journals (NEJM, Lancet).
- **Output:** A `TopicID` (e.g., `magnesium_threonate_sleep`) sent to the Orchestrator.
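
A minimal sketch of how a keyword frequency spike might be turned into a `TopicID`, assuming mention counts per time window have already been collected. The `KeywordSeries` type, the `detectSpikes` and `toTopicId` helpers, and the default ratio of 1.5 are all hypothetical and only illustrate the idea:

```typescript
// Hypothetical sketch: flag keywords whose latest count jumps well above
// their recent baseline. Names and thresholds are illustrative assumptions.

interface KeywordSeries {
  keyword: string;
  counts: number[]; // mention counts per time window, oldest first
}

interface Spike {
  keyword: string;
  baseline: number;
  latest: number;
  ratio: number;
}

function detectSpikes(series: KeywordSeries[], minRatio = 1.5, minBaseline = 1): Spike[] {
  const spikes: Spike[] = [];
  for (const { keyword, counts } of series) {
    if (counts.length < 2) continue; // need at least one baseline window
    const latest = counts[counts.length - 1];
    const history = counts.slice(0, -1);
    const mean = history.reduce((sum, c) => sum + c, 0) / history.length;
    const baseline = Math.max(mean, minBaseline); // floor avoids division by zero
    const ratio = latest / baseline;
    if (ratio >= minRatio) spikes.push({ keyword, baseline, latest, ratio });
  }
  // Highest ratio first, so the top entry can become the TopicID candidate.
  return spikes.sort((a, b) => b.ratio - a.ratio);
}

// Normalize a free-text keyword into a snake_case identifier.
function toTopicId(keyword: string): string {
  return keyword
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_')
    .replace(/^_+|_+$/g, '');
}
```

The `minBaseline` floor keeps rare keywords from producing huge ratios off a near-zero history; a production version would likely also require a minimum absolute count.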

## 2. The Harvester (Scraping Layer)

Once a topic is picked, the system fans out to gather the "Evidence Stack."

**Tier 0/1 (Regulatory/Clinical):**
- **PubMed/NCBI API:** Fetch abstracts of the top 5 most cited papers on the topic.
- **FDA/EMA Crawlers:** Search official label databases and Adverse Event reports.

**Tier 5 (Anecdotal):**
- **Reddit API:** Fetch the top 3 threads from the last 90 days.
- **Twitter/X API:** Sample recent high-engagement posts for sentiment signal.

**Output:** A collection of raw text blobs associated with the `TopicID`.
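
The fan-out can be sketched as below, under the assumption that each source is wrapped in a small fetcher function. The `RawBlob`, `SourceConfig`, and `harvest` names are hypothetical, and no real PubMed, FDA, Reddit, or X client API is implied; the fetchers are injected:

```typescript
// Hypothetical sketch of the Harvester fan-out. Fetchers are injected, so this
// code makes no assumption about any real scraping or API client.

interface RawBlob {
  topicId: string;
  tier: number;   // 0/1 = regulatory/clinical, 5 = anecdotal
  source: string; // e.g. "pubmed", "reddit"
  text: string;
}

interface SourceConfig {
  source: string;
  tier: number;
  fetch: (topicId: string) => Promise<string[]>;
}

async function harvest(
  topicId: string,
  sources: SourceConfig[],
): Promise<{ blobs: RawBlob[]; failed: string[] }> {
  // Run all sources concurrently; one failing source should not sink the sprint.
  const results = await Promise.allSettled(sources.map((s) => s.fetch(topicId)));

  const blobs: RawBlob[] = [];
  const failed: string[] = [];
  results.forEach((result, i) => {
    const { source, tier } = sources[i];
    if (result.status === 'fulfilled') {
      for (const text of result.value) blobs.push({ topicId, tier, source, text });
    } else {
      failed.push(source); // surfaced so the orchestrator can retry
    }
  });
  return { blobs, failed };
}
```

Using `Promise.allSettled` rather than `Promise.all` means a rate-limited social source does not discard clinical results that were fetched successfully.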

## 3. The Extraction Cortex (LLM Layer)

Raw text becomes **StemeDB Signed Assertions** via structured extraction with a high-reasoning model.

**Prompt Instruction:**

> "Extract every distinct claim regarding [Topic]. For each claim, identify: The Proposition (Subject-Predicate-Object), the Date of the claim, the Source Type, and the Confidence level of the author. Output as a JSON array of StemeDB assertions."

**Transformation Example:**

- **Input:** "Reddit user says: 'I took Mag Threonate and had vivid nightmares for a week.'"
- **Output** (one element of the returned JSON array):

```json
{
  "subject": "magnesium_threonate",
  "predicate": "side_effect",
  "object": "vivid_nightmares",
  "source_class": 5,
  "confidence": 0.9,
  "timestamp": 1707340800,
  "source_metadata": {"user": "u/jdoe", "platform": "reddit"}
}
```
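
Because LLM output is not guaranteed to match the schema, it is worth validating before anything reaches StemeDB. A minimal sketch, assuming the field names from the JSON example above; the `Assertion` type, the validation rules (for instance, confidence within 0 to 1), and the helper names are assumptions rather than a StemeDB contract:

```typescript
// Hypothetical sketch: validate extracted assertions before storage.
// Field names follow the JSON example; the rules themselves are assumptions.

interface Assertion {
  subject: string;
  predicate: string;
  object: string;
  source_class: number;
  confidence: number; // 0..1
  timestamp: number;  // Unix seconds
  source_metadata: Record<string, unknown>;
}

function isAssertion(value: unknown): value is Assertion {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const nonEmpty = (x: unknown) => typeof x === 'string' && x.length > 0;
  return (
    nonEmpty(v.subject) &&
    nonEmpty(v.predicate) &&
    nonEmpty(v.object) &&
    typeof v.source_class === 'number' && Number.isInteger(v.source_class) && v.source_class >= 0 &&
    typeof v.confidence === 'number' && v.confidence >= 0 && v.confidence <= 1 &&
    typeof v.timestamp === 'number' && Number.isInteger(v.timestamp) &&
    typeof v.source_metadata === 'object' && v.source_metadata !== null
  );
}

function parseAssertions(raw: string): { valid: Assertion[]; rejected: number } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { valid: [], rejected: 0 }; // not JSON at all; caller may retry the prompt
  }
  // Accept either a single object or the JSON array the prompt asks for.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  const valid = items.filter(isAssertion);
  return { valid, rejected: items.length - valid.length };
}
```

Counting rejected items rather than failing the whole batch lets the pipeline keep well-formed claims while still tracking how often the model drifts from the schema.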

## 4. The Spine (StemeDB Integration)

Extracted assertions are pushed into StemeDB.

- **Latticing:** StemeDB automatically indexes new assertions against the existing graph.
- **Lens Resolution:** The system runs a `SkepticLens` query. If the Tier 5 "Social" cluster deviates significantly from the Tier 0 "Regulatory" consensus, a **Conflict Flag** is raised.
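
This document does not define how the deviation between tiers is measured. Purely as an illustration, one simple option is the confidence-weighted share of Tier 5 claims that have no matching claim in Tier 0/1. The `LensAssertion` type and `conflictScore` function below are hypothetical and are not StemeDB's `SkepticLens` implementation:

```typescript
// Hypothetical sketch of a conflict score. StemeDB's actual SkepticLens logic
// is not described here; this is one simple way such a score could be defined.

interface LensAssertion {
  predicate: string;
  object: string;
  source_class: number; // 0/1 = regulatory/clinical, 5 = anecdotal
  confidence: number;   // 0..1
}

function conflictScore(assertions: LensAssertion[]): number {
  const key = (a: LensAssertion) => `${a.predicate}:${a.object}`;

  // Claims that the regulatory/clinical tiers already acknowledge.
  const official = new Set(assertions.filter((a) => a.source_class <= 1).map(key));
  const social = assertions.filter((a) => a.source_class === 5);

  let total = 0;
  let unsupported = 0;
  for (const a of social) {
    total += a.confidence;
    if (!official.has(key(a))) unsupported += a.confidence;
  }
  // No anecdotal evidence means nothing contradicts the official record.
  return total === 0 ? 0 : unsupported / total;
}
```

A score of 0 means every anecdotal claim is already reflected in the official tiers; a score of 1 means none are.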

## 5. The Output Engine (Newsletter/App)

The final layer converts database state into human-readable intelligence.

- **Automated Summarization:** An LLM reads the *resolved* state of StemeDB (the output of the Truth Lens) and writes a 200-word summary for the newsletter.
- **Alert Trigger:** If `ConflictScore > 0.8`, the system pushes a notification to Premium users: *"Emerging Signal: High volume of anecdotal reports for [Topic] contradicts clinical data."*
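
The alert rule can be sketched as a small pure function. The `buildPremiumAlert` name is hypothetical; the threshold and message text are taken from the rule above:

```typescript
// Hypothetical helper: decide whether a Premium alert should fire.
const CONFLICT_THRESHOLD = 0.8; // threshold stated in the alert rule

function buildPremiumAlert(topic: string, conflictScore: number): string | null {
  // Written as a negated comparison so that NaN never triggers an alert.
  if (!(conflictScore > CONFLICT_THRESHOLD)) return null;
  return `Emerging Signal: High volume of anecdotal reports for ${topic} contradicts clinical data.`;
}
```

Keeping the decision separate from delivery makes the threshold easy to test without sending any email.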

---

## Email Architecture: The Dual-Track System

Resend handles two tracks: **Transactional** (high-priority, immediate) and **Broadcast** (bulk, scheduled).

| Track | Type | Usage | Strategy |
|-------|------|-------|----------|
| **Track A: Transactional** | Individual API calls | Alerts, password resets, opt-ins | `resend.emails.send()` |
| **Track B: Broadcasts** | Batch/Audience API | Daily Evidence Pulse, Weekly Trends | `resend.broadcasts.create()`, then `resend.broadcasts.send()` |

### Broadcast Engine (Newsletter)

```typescript
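// /lib/email/broadcast.ts
import { Resend } from 'resend';
import { render } from '@react-email/render';
// Assumed location of the digest template; adjust to the project layout.
import { DigestTemplate } from '@/emails/DigestTemplate';

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendDailyDigest(topicData: any) {
  // Create the broadcast for the verified audience.
  const { data, error } = await resend.broadcasts.create({
    audienceId: process.env.FMH_AUDIENCE_ID!,
    from: 'FindMyHealth Intel <digest@findmyhealth.com>',
    subject: `[Evidence Alert] ${topicData.topic_name}: Signal Shift Detected`,
    html: await render(DigestTemplate({ data: topicData })),
  });
  if (error || !data) throw new Error(`Broadcast creation failed: ${error?.message}`);

  // create() only stores the broadcast; send() starts delivery.
  await resend.broadcasts.send(data.id);
}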
```

### Evidence Alert (Transactional)

```typescript
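// /api/alerts/trigger.ts
import { Resend } from 'resend';
// Assumed location of the alert template; adjust to the project layout.
import { AlertTemplate } from '@/emails/AlertTemplate';

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendConflictAlert(email: string, substance: string, conflictDetails: string) {
  const { data, error } = await resend.emails.send({
    from: 'FindMyHealth Alerts <alerts@findmyhealth.com>',
    to: email,
    subject: `Urgent: New Research Conflict for ${substance}`,
    react: AlertTemplate({ substance, conflictDetails }),
    tags: [{ name: 'category', value: 'conflict_alert' }],
  });
  if (error) throw new Error(`Alert send failed: ${error.message}`);
  return data;
}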
```

### Subscriber Flow

1. **Opt-in:** User signs up on the homepage.
2. **Double Opt-in (Mandatory):** Transactional email with unique verification link.
3. **Audience Sync:** Add to Resend `Audience` only after verification click.
4. **Tagging:** Add metadata (e.g., `specialty: "oncology"`, `tier: "premium"`) for segmented broadcasts.
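
One way to implement the unique verification link in step 2 is a signed, expiring token, so no pending-signup table is needed. This is a sketch under that assumption; the `createVerificationToken` and `verifyToken` helpers are hypothetical, and the secret would come from server-side configuration:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical sketch of a stateless double opt-in token:
// base64url(JSON payload) + "." + HMAC-SHA256 signature.

function createVerificationToken(
  email: string,
  secret: string,
  ttlMs: number,
  now: number = Date.now(),
): string {
  const payload = Buffer.from(JSON.stringify({ email, exp: now + ttlMs })).toString('base64url');
  const sig = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

// Returns the verified email address, or null if the token is invalid or expired.
function verifyToken(token: string, secret: string, now: number = Date.now()): string | null {
  const [payload, sig] = token.split('.');
  if (!payload || !sig) return null;

  const expected = createHmac('sha256', secret).update(payload).digest();
  const given = Buffer.from(sig, 'base64url');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;

  try {
    const { email, exp } = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8'));
    if (typeof email !== 'string' || typeof exp !== 'number' || now > exp) return null;
    return email;
  } catch {
    return null; // payload was not valid JSON
  }
}
```

Only when `verifyToken` returns an address would the user be added to the Resend `Audience` in step 3. A stored, single-use token is an equally valid design if link reuse must be prevented.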

### Webhook Feedback Loop

- **`email.bounced`:** Mark user as `inactive` to protect sender reputation.
- **`email.clicked`:** Track which Truth Tiers users engage with to inform ingestion priority.
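
The feedback loop can be sketched as a pure function that maps a webhook event to an action. The payload shape below is a simplified assumption about the provider's format, and the `tier` query parameter on newsletter links is an invented convention for attributing clicks to a Truth Tier:

```typescript
// Hypothetical sketch of the webhook feedback loop. The payload shape is a
// simplified assumption; the `tier` link parameter is an invented convention.

interface WebhookEvent {
  type: string;
  data: { to: string[]; click?: { link: string } };
}

type FeedbackAction =
  | { kind: 'deactivate'; emails: string[] }
  | { kind: 'record_click'; tier: number }
  | { kind: 'ignore' };

function handleWebhook(event: WebhookEvent): FeedbackAction {
  switch (event.type) {
    case 'email.bounced':
      // Caller marks these users as inactive.
      return { kind: 'deactivate', emails: event.data.to };
    case 'email.clicked': {
      const link = event.data.click?.link;
      if (!link) return { kind: 'ignore' };
      try {
        const raw = new URL(link).searchParams.get('tier');
        const tier = raw === null || raw === '' ? NaN : Number(raw);
        return Number.isInteger(tier) ? { kind: 'record_click', tier } : { kind: 'ignore' };
      } catch {
        return { kind: 'ignore' }; // malformed URL
      }
    }
    default:
      return { kind: 'ignore' };
  }
}
```

A real endpoint would also verify the webhook signature before trusting the payload.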

### Deliverability Checklist

- **Subdomain Isolation:** `digest.findmyhealth.com` for newsletters, `auth.findmyhealth.com` for transactional.
- **DKIM/SPF:** Authenticate via Resend DNS settings.
- **Plain Text Fallback:** Always include a `text` version, for example by rendering the template with the `plainText` option of `render` from `@react-email/components`.
- **Batching:** Use `resend.batch.send()` (up to 100 per call) to stay under rate limits.
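
The batching item can be sketched with a generic chunk helper. The `chunk` and `sendInBatches` names are hypothetical, and the sender is injected so the sketch does not depend on any particular client call:

```typescript
// Split a list into consecutive groups of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical wrapper: `send` is whatever delivers one batch.
// Batches go out sequentially to stay under rate limits.
async function sendInBatches<T>(
  emails: T[],
  send: (batch: T[]) => Promise<unknown>,
  size = 100, // per-call cap from the checklist
): Promise<number> {
  const batches = chunk(emails, size);
  for (const batch of batches) {
    await send(batch);
  }
  return batches.length;
}
```

In practice `send` would wrap `resend.batch.send()`, possibly with a short delay between calls.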

---

## Tech Stack

| Component | Technology | Why |
|-----------|------------|-----|
| **Orchestrator** | Temporal.io or Node-RED | Long-running, retriable scraping workflows |
| **Scrapers** | Firecrawl or Apify | Turns complex websites into LLM-ready Markdown |
| **The Cortex** | Claude API | Strong at extracting claims into complex JSON schemas |
| **The Spine** | StemeDB | Custom probabilistic knowledge graph |
| **Frontend** | Next.js + Tailwind | Fast, SEO-friendly, Linear/Stripe aesthetic |
| **Email** | Resend + React Email | Developer-first, compliance built-in |