# FindMyHealth: Technical Architecture

## The Automated Ingestion Pipeline

This architecture builds on the existing StemeDB backbone and shifts the focus from a "database" to an **"automated research engine."** The pipeline detects what is trending, finds the evidence, and structures the Truth Lenses with minimal human effort.

---

## Pipeline Overview

```
[Watchtower]         Trend detection & topic selection
      |
      v
[Harvester]          Fan-out scraping across all tiers
      |
      v
[Extraction Cortex]  LLM-powered claim extraction
      |
      v
[StemeDB Spine]      Assertion storage, indexing, lens resolution
      |
      v
[Output Engine]      Newsletter generation & premium alerts
```

---

## 1. The Watchtower (Trigger Layer)

A cron job (or scheduled serverless function) selects the "Subject" for each research sprint.

- **Google Trends API:** Filter the `Health` and `Science` categories for topics with >50% breakout velocity.
- **Reddit Scraper:** Monitor `r/all`, `r/Biohackers`, and `r/Nootropics` for spikes in keyword frequency.
- **PubMed RSS:** Watch for new publications in high-impact journals (NEJM, Lancet).
- **Output:** A `TopicID` (e.g., `magnesium_threonate_sleep`) sent to the Orchestrator.

## 2. The Harvester (Scraping Layer)

Once a topic is selected, the system fans out to gather the "Evidence Stack."

**Tier 0/1 (Regulatory/Clinical):**

- **PubMed/NCBI API:** Fetch abstracts of the 5 most-cited papers on the topic.
- **FDA/EMA Crawlers:** Search official label databases and adverse event reports.

**Tier 5 (Anecdotal):**

- **Reddit API:** Fetch the top 3 threads from the last 90 days.
- **Twitter/X API:** Sample recent high-engagement posts for sentiment signal.

**Output:** A collection of raw text blobs associated with the `TopicID`.

## 3. The Extraction Cortex (LLM Layer)

A high-reasoning model converts raw text into **StemeDB Signed Assertions** through structured extraction.

**Prompt Instruction:**

> "Extract every distinct claim regarding [Topic].
> For each claim, identify: The Proposition (Subject-Predicate-Object), the Date of the claim, the Source Type, and the Confidence level of the author. Output as a JSON array of StemeDB assertions."

**Transformation Example:**

- **Input:** "Reddit user says: 'I took Mag Threonate and had vivid nightmares for a week.'"
- **Output** (one element of the array):

```json
{
  "subject": "magnesium_threonate",
  "predicate": "side_effect",
  "object": "vivid_nightmares",
  "source_class": 5,
  "confidence": 0.9,
  "timestamp": 1707340800,
  "source_metadata": {"user": "u/jdoe", "platform": "reddit"}
}
```

## 4. The Spine (StemeDB Integration)

Extracted assertions are pushed into StemeDB.

- **Latticing:** StemeDB automatically indexes new assertions against the existing graph.
- **Lens Resolution:** The system runs a `SkepticLens` query. If the Tier 5 "Social" cluster deviates significantly from the Tier 0 "Regulatory" consensus, it raises a **Conflict Flag**.

## 5. The Output Engine (Newsletter/App)

The final layer converts database state into human-readable intelligence.

- **Automated Summarization:** An LLM reads the *resolved* state of StemeDB (the output of the Truth Lens) and writes a 200-word summary for the newsletter.
- **Alert Trigger:** If `ConflictScore > 0.8`, the system pushes a notification to Premium users: *"Emerging Signal: High volume of anecdotal reports for [Topic] contradicts clinical data."*

---

## Email Architecture: The Dual-Track System

Resend handles two tracks: **Transactional** (high-priority, immediate) and **Broadcast** (bulk, scheduled).
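
| Track | API | Usage | SDK call |
|-------|-----|-------|----------|
| **Track A: Transactional** | Individual API calls | Alerts, password resets, opt-in confirmations | `resend.emails.send()` |
| **Track B: Broadcasts** | Broadcast API (Audience-based) | Daily Evidence Pulse, Weekly Trends | `resend.broadcasts.create()` + `resend.broadcasts.send()` |

### Broadcast Engine (Newsletter)

```typescript
// /lib/email/broadcast.ts
import { Resend } from 'resend';
import { render } from '@react-email/components';
import { DigestTemplate } from '@/emails/DigestTemplate'; // app's own React Email template (path assumed)

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendDailyDigest(topicData: any) {
  // Step 1: create the broadcast. Resend stores it as a draft; nothing is sent yet.
  const { data, error } = await resend.broadcasts.create({
    audienceId: process.env.FMH_AUDIENCE_ID!,
    // Sender read from an assumed env var; it should sit on the digest.findmyhealth.com subdomain.
    from: `FindMyHealth Intel <${process.env.FMH_DIGEST_FROM}>`,
    subject: `[Evidence Alert] ${topicData.topic_name}: Signal Shift Detected`,
    html: await render(DigestTemplate({ data: topicData })),
  });
  if (error || !data) throw new Error(`Broadcast creation failed: ${error?.message ?? 'no data'}`);

  // Step 2: deliver the draft to every contact in the audience.
  const sent = await resend.broadcasts.send(data.id);
  if (sent.error) throw new Error(`Broadcast send failed: ${sent.error.message}`);
}
```

### Evidence Alert (Transactional)

```typescript
// /api/alerts/trigger.ts
import { Resend } from 'resend';
import { AlertTemplate } from '@/emails/AlertTemplate'; // app's own React Email template (path assumed)

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendConflictAlert(user: { email: string }, substance: string, conflictDetails: string) {
  const { data, error } = await resend.emails.send({
    from: `FindMyHealth Alerts <${process.env.FMH_ALERTS_FROM}>`, // assumed env var (auth.* subdomain)
    to: user.email,
    subject: `Urgent: New Research Conflict for ${substance}`,
    react: AlertTemplate({ substance, conflictDetails }),
    tags: [{ name: 'category', value: 'conflict_alert' }],
  });
  if (error) throw new Error(`Alert send failed: ${error.message}`);
  return data;
}
```

### Subscriber Flow

1. **Opt-in:** User signs up on the homepage.
2. **Double Opt-in (Mandatory):** Send a transactional email containing a unique verification link.
3. **Audience Sync:** Add the user to the Resend `Audience` only after they click the verification link.
4. **Tagging:** Attach metadata (e.g., `specialty: "oncology"`, `tier: "premium"`) for segmented broadcasts.

### Webhook Feedback Loop

- **`email.bounced`:** Mark the user as `inactive` to protect sender reputation.
- **`email.clicked`:** Track which Truth Tiers users engage with to inform ingestion priority.

### Deliverability Checklist

- **Subdomain Isolation:** `digest.findmyhealth.com` for newsletters, `auth.findmyhealth.com` for transactional mail.
- **DKIM/SPF:** Authenticate both subdomains using the DNS records provided in Resend's domain settings.
- **Plain Text Fallback:** Always include a `text` version, e.g. generated with `render(..., { plainText: true })` from `@react-email/components`.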
- **Batching:** When sending transactional alerts to many users at once, use `resend.batch.send()` (up to 100 emails per call) to stay under API rate limits.

---

## Tech Stack

| Component | Technology | Why |
|-----------|------------|-----|
| **Orchestrator** | Temporal.io or Node-RED | Long-running, retriable scraping workflows |
| **Scrapers** | Firecrawl or Apify | Turn complex websites into LLM-ready Markdown |
| **The Cortex** | Claude API | Strong at following complex JSON extraction schemas |
| **The Spine** | StemeDB | Custom probabilistic knowledge graph |
| **Frontend** | Next.js + Tailwind | Fast, SEO-friendly, Linear/Stripe aesthetic |
| **Email** | Resend + React Email | Developer-first, compliance built-in |
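Because the Cortex depends on the model returning well-formed JSON that matches the assertion schema from Section 3, a thin validation gate between the Cortex and the Spine keeps malformed or out-of-range output out of StemeDB. The TypeScript sketch below shows one way to do this under stated assumptions: `StemeAssertion`, `isStemeAssertion`, and `parseExtraction` are hypothetical names (not a StemeDB or Claude API), and the rules it enforces (Truth Tier `source_class` from 0 to 5, `confidence` from 0 to 1, integer Unix timestamps, string-valued `source_metadata`) are inferred from the JSON example rather than taken from a published schema.

```typescript
// Hypothetical shape of one extracted assertion, inferred from the JSON
// example in Section 3. This is NOT an official StemeDB type.
export interface StemeAssertion {
  subject: string;
  predicate: string;
  object: string;
  source_class: number; // assumed Truth Tier: 0 (regulatory) to 5 (anecdotal)
  confidence: number; // assumed range: 0 to 1
  timestamp: number; // Unix epoch seconds
  source_metadata: Record<string, string>;
}

const isNonEmptyString = (v: unknown): v is string =>
  typeof v === "string" && v.trim().length > 0;

const isIntegerIn = (v: unknown, min: number, max: number): boolean =>
  typeof v === "number" && Number.isInteger(v) && v >= min && v <= max;

// Returns true only if every field is present with a plausible value
// (validation rules are assumptions, see the interface comments).
export function isStemeAssertion(v: unknown): v is StemeAssertion {
  if (typeof v !== "object" || v === null) return false;
  const a = v as Record<string, unknown>;
  const conf = a.confidence;
  const meta = a.source_metadata;
  return (
    isNonEmptyString(a.subject) &&
    isNonEmptyString(a.predicate) &&
    isNonEmptyString(a.object) &&
    isIntegerIn(a.source_class, 0, 5) &&
    typeof conf === "number" && conf >= 0 && conf <= 1 &&
    isIntegerIn(a.timestamp, 1, Number.MAX_SAFE_INTEGER) &&
    typeof meta === "object" && meta !== null && !Array.isArray(meta) &&
    Object.values(meta).every((x) => typeof x === "string")
  );
}

// Parses the model's raw reply (expected to be a JSON array) and separates
// assertions that are safe to write to the Spine from items to reject or retry.
export function parseExtraction(raw: string): { valid: StemeAssertion[]; rejected: unknown[] } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Not JSON at all: reject the whole reply.
    return { valid: [], rejected: [raw] };
  }
  // Tolerate a single object even though the prompt asks for an array.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  const valid: StemeAssertion[] = [];
  const rejected: unknown[] = [];
  for (const item of items) {
    if (isStemeAssertion(item)) valid.push(item);
    else rejected.push(item);
  }
  return { valid, rejected };
}
```

Rejected items can be logged or sent back to the model for a retry instead of being written to the Spine. A schema library such as Zod could replace the hand-written checks; plain type guards are used here only to keep the sketch dependency-free.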