stemedb/latent/roadmap.md

# Latent: Implementation Roadmap (Solo Engineer)

**Status:** Phase 1 Complete. Phase 2 In Progress.

---

## Phase 1: The "Semaglutide" Vertical (COMPLETED)
*Goal: End-to-end signal detection for one drug family.*

### Week 1: Tier 0 (Regulatory) Ground Truth
- [x] **Infrastructure:** Set up a local StemeDB instance.
- [x] **Source:** OpenFDA API (Free, JSON).
- [x] **Ingestor:** Build `latent-ingest-fda` (Rust/Python) to fetch labels for: `Semaglutide`, `Tirzepatide`, `Liraglutide`.
- [x] **Extract:** Parse "Adverse Reactions" section into Assertions.
- [x] **Output:** A graph with the "Official Truth" for 3 drugs.
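
The fetch-and-extract steps above can be sketched as follows. This is a minimal illustration, not the actual `latent-ingest-fda` code: it assumes the OpenFDA drug label endpoint and its `adverse_reactions` field, and the assertion dictionary shape (`tier`, `drug`, `source`, `text`) is a hypothetical stand-in for whatever StemeDB expects.

```python
from urllib.parse import urlencode

OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"


def build_label_query(generic_name: str, limit: int = 1) -> str:
    """Build an OpenFDA label search URL for one generic drug name."""
    params = {"search": f'openfda.generic_name:"{generic_name}"', "limit": limit}
    return f"{OPENFDA_LABEL_URL}?{urlencode(params)}"


def extract_adverse_reactions(label_response: dict) -> list:
    """Turn the "Adverse Reactions" text of each label into assertion dicts.

    The assertion schema here is an assumption made for illustration.
    """
    assertions = []
    for result in label_response.get("results", []):
        names = result.get("openfda", {}).get("generic_name", ["unknown"])
        for text in result.get("adverse_reactions", []):
            assertions.append(
                {
                    "tier": 0,  # Tier 0 = regulatory ground truth
                    "drug": names[0].lower(),
                    "source": "openfda_label",
                    "text": text.strip(),
                }
            )
    return assertions
```

The JSON returned by fetching `build_label_query("semaglutide")` can be passed straight to `extract_adverse_reactions`; keeping the HTTP call outside these functions makes the parsing step easy to test against saved responses.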

### Week 2: Tier 5 (Social) Noise
- [x] **Source:** Reddit (Manual API script or Apify if budget permits).
- [x] **Targets:** `/r/Ozempic`, `/r/Mounjaro`.
- [x] **Ingestor:** Build `latent-ingest-reddit` to fetch last 30 days of posts.
- [x] **Filter:** Simple keyword matching: `stomach`, `paralysis`, `vomit`, `hair loss`.
- [x] **Extract:** Use `gpt-4o-mini` to turn matched posts into Assertions.
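
The keyword filter can be sketched as below. It is a rough illustration that assumes posts arrive as dictionaries with Reddit's `title` and `selftext` fields; the `keywords` key added to each kept post is a hypothetical convenience, and the matching is plain case-insensitive substring search so that `vomit` also catches "vomiting".

```python
import re

KEYWORDS = ["stomach", "paralysis", "vomit", "hair loss"]

# One case-insensitive alternation; substring matching is intentional.
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def matched_keywords(text: str) -> list:
    """Return the sorted, de-duplicated keywords found in the text."""
    return sorted({m.group(0).lower() for m in _PATTERN.finditer(text)})


def filter_posts(posts: list) -> list:
    """Keep only posts whose title or body mentions a keyword."""
    kept = []
    for post in posts:
        text = f"{post.get('title', '')} {post.get('selftext', '')}"
        hits = matched_keywords(text)
        if hits:
            kept.append({**post, "keywords": hits})
    return kept
```

Filtering before the `gpt-4o-mini` extraction step keeps API costs proportional to the number of relevant posts rather than the size of the subreddit.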

### Week 3: The Divergence Engine
- [x] **Logic:** Implement the "Skeptic Lens" query in StemeDB.
- [x] **Algorithm:** Compare Tier 0 (Official) vs Tier 5 (Social).
- [x] **Scoring:** Calculate divergence score based on frequency of Social clusters vs presence in Tier 0.
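
One possible shape for the scoring step is sketched below. The roadmap only says the score depends on social cluster frequency versus presence in Tier 0, so the exact formula here is an assumption: frequency is the share of posts mentioning a symptom, and symptoms already on the label are down-weighted by a hypothetical `labeled_weight` factor.

```python
def divergence_scores(
    social_counts: dict,
    total_posts: int,
    official_terms: set,
    labeled_weight: float = 0.1,
) -> dict:
    """Score each social symptom cluster against the official label.

    social_counts maps symptom -> number of posts mentioning it.
    official_terms is the set of symptoms present in Tier 0.
    labeled_weight (assumed value) discounts symptoms the label already lists.
    """
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")

    official = {term.lower() for term in official_terms}
    scores = {}
    for symptom, count in social_counts.items():
        frequency = count / total_posts
        weight = labeled_weight if symptom.lower() in official else 1.0
        scores[symptom] = round(frequency * weight, 4)
    return scores
```

With this formula, a symptom that is frequent on Reddit but absent from the label scores highest, which is the case the "Skeptic Lens" is meant to surface.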

### Week 4: The Minimal Dashboard
- [x] **UI:** Simple Next.js page showing the "Semaglutide Conflict Heatmap".
- [x] **Ship:** Deployed local prototype with sample data.
- **Milestone:** A working URL showing "Reddit hates Ozempic's side effects more than the FDA does."

---

## Phase 2: Expansion & Hardening (Weeks 5-8)
*Goal: Add credibility and history.*

### Week 5: Tier 1 (Clinical) Context
- [ ] **Source:** ClinicalTrials.gov API (Free).
- [ ] **Ingestor:** Fetch completed trials for target drugs.
- [ ] **Data:** Extract "Serious Adverse Events" tables.
- [ ] **Value:** Enables "Reddit vs. Trials" conflicts, a stronger signal than Reddit vs. Label alone.

### Week 6: Time Travel (Backfilling)
- [ ] **Backfill:** Scrape Reddit back to 2021.
- [ ] **History:** Ingest historical FDA labels (from DailyMed archives).
- [ ] **Analysis:** Generate the "Knowledge Lag" chart. Prove that Latent *would have* predicted the gastroparesis warning.
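
The "Knowledge Lag" behind that chart could be computed roughly as follows. This is a sketch under assumptions: the signal date is taken as the first day on which daily social mentions reach a hypothetical `threshold`, and the lag is the number of days from that date to the date the term first appears in a historical label.

```python
from datetime import date
from typing import Optional


def first_signal_date(daily_mentions: dict, threshold: int) -> Optional[date]:
    """Return the earliest day whose mention count reaches the threshold."""
    for day in sorted(daily_mentions):
        if daily_mentions[day] >= threshold:
            return day
    return None


def knowledge_lag_days(
    daily_mentions: dict, threshold: int, label_date: date
) -> Optional[int]:
    """Days between the first social signal and the label update.

    Positive means social data led the label; None means no signal was found.
    """
    signal = first_signal_date(daily_mentions, threshold)
    if signal is None:
        return None
    return (label_date - signal).days
```

Here `daily_mentions` maps a `datetime.date` to a post count for one symptom, and `label_date` would come from the backfilled label history. A positive lag supports the "would have predicted" claim, while a negative or missing lag argues against it.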

### Week 7: The Daily Cron
- [ ] **Automation:** Move scripts to a cron job or a Temporal workflow.
- [ ] **Alerting:** Simple email/Discord alert when Divergence Score spikes.
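
A simple spike check for the alerting step might look like the sketch below. The z-score rule, the `z_threshold` and `min_history` defaults, and the message wording are all assumptions; the payload uses the `content` field that Discord webhooks accept, and actually posting it is left out.

```python
from statistics import mean, pstdev


def is_spike(
    history: list,
    latest: float,
    z_threshold: float = 3.0,
    min_history: int = 7,
) -> bool:
    """Flag the latest score if it sits far above the trailing history."""
    if len(history) < min_history:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest > mu  # flat history: any increase counts
    return (latest - mu) / sigma >= z_threshold


def build_discord_payload(drug: str, symptom: str, score: float) -> dict:
    """Build the JSON body for a Discord webhook message."""
    return {
        "content": f"Divergence spike: {drug} / {symptom} score={score:.3f}"
    }
```

The daily job would call `is_spike` once per drug and symptom pair, then send the payload to the webhook or an email relay only when it returns `True`.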

### Week 8: Marketing The Signal
- [ ] **Artifact:** Write a blog post: "How Latent predicted the Ozempic warnings 6 months early."
- [ ] **Outreach:** Send the report to 10 biotech hedge funds.

---

## Phase 3: Commercialization (Weeks 9-12)
*Goal: First paying customer.*

- [ ] **Expansion:** Add 5 more drugs from high-volatility areas (e.g., Alzheimer's, new oncology therapies).
- [ ] **Polish:** Clean up the UI. Add export to CSV.
- [ ] **Sales:** Demo the "Alpha Signal" to investors.

---

## "Solo Scraper" Tech Stack
*Cheap, resilient, manageable.*

*   **Language:** Python (for scraping/NLP), Rust (for StemeDB).
*   **Database:** SQLite (local cache) -> StemeDB (Graph).
*   **Proxies:** BrightData (Pay-as-you-go) or ScraperAPI. *Only use when strictly necessary.*
*   **Orchestration:** Simple `systemd` timers or a lightweight Go scheduler. No Kubernetes.
*   **Compute:** One robust server (64GB RAM, plenty of cores) running everything.