# Latent: Implementation Roadmap (Solo Engineer)

**Status:** Phase 1 Complete. Phase 2 In Progress.

---

## Phase 1: The "Semaglutide" Vertical (COMPLETED)

*Goal: End-to-end signal detection for one drug family.*

### Week 1: Tier 0 (Regulatory) Ground Truth

- [x] **Infrastructure:** Set up a local StemeDB instance.
- [x] **Source:** OpenFDA API (Free, JSON).
- [x] **Ingestor:** Build `latent-ingest-fda` (Rust/Python) to fetch labels for: `Semaglutide`, `Tirzepatide`, `Liraglutide`.
- [x] **Extract:** Parse the "Adverse Reactions" section into Assertions.
- [x] **Output:** A graph with the "Official Truth" for 3 drugs.

### Week 2: Tier 5 (Social) Noise

- [x] **Source:** Reddit (custom API script, or Apify if budget permits).
- [x] **Targets:** `/r/Ozempic`, `/r/Mounjaro`.
- [x] **Ingestor:** Build `latent-ingest-reddit` to fetch the last 30 days of posts.
- [x] **Filter:** Simple keyword matching: `stomach`, `paralysis`, `vomit`, `hair loss`.
- [x] **Extract:** Use `gpt-4o-mini` to turn matched posts into Assertions.

### Week 3: The Divergence Engine

- [x] **Logic:** Implement the "Skeptic Lens" query in StemeDB.
- [x] **Algorithm:** Compare Tier 0 (Official) vs. Tier 5 (Social).
- [x] **Scoring:** Calculate a divergence score from the frequency of Social clusters vs. their presence in Tier 0.

### Week 4: The Minimal Dashboard

- [x] **UI:** Simple Next.js page showing the "Semaglutide Conflict Heatmap".
- [x] **Ship:** Deployed local prototype with sample data.
- **Milestone:** A working URL showing "Reddit hates Ozempic's side effects more than the FDA does."

---

## Phase 2: Expansion & Hardening (Weeks 5-8)

*Goal: Add credibility and history.*

### Week 5: Tier 1 (Clinical) Context

- [ ] **Source:** ClinicalTrials.gov API (Free).
- [ ] **Ingestor:** Fetch completed trials for target drugs.
- [ ] **Data:** Extract "Serious Adverse Events" tables.
- [ ] **Value:** Now you can show "Reddit vs. Trials" conflicts (stronger than just Reddit vs. Label).
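
For the Week 5 "Data" step, here is a minimal sketch of pulling serious adverse events out of a single study record. It assumes the JSON layout returned by the ClinicalTrials.gov v2 API (`resultsSection.adverseEventsModule.seriousEvents[*].stats[*]`, with `numAffected` and `numAtRisk` per arm); check those field names against the live schema before relying on them. Fetching the study is left out, and `extract_serious_aes` is a hypothetical helper, not an existing part of the `latent-ingest-*` tools.

```python
from collections import defaultdict
from typing import Any


def extract_serious_aes(study: dict[str, Any]) -> dict[str, dict[str, int]]:
    """Aggregate serious adverse events per term for one ClinicalTrials.gov study.

    Assumed input layout (ClinicalTrials.gov API v2):
    resultsSection.adverseEventsModule.seriousEvents[*].stats[*]
    """
    module = study.get("resultsSection", {}).get("adverseEventsModule", {})
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"affected": 0, "at_risk": 0})

    for event in module.get("seriousEvents", []):
        # Normalise the term so it can later be matched against Tier 5 Assertions.
        term = event.get("term", "").strip().lower()
        if not term:
            continue
        # Pools every arm (including any placebo/comparator arm) into one count.
        for stat in event.get("stats", []):
            totals[term]["affected"] += stat.get("numAffected", 0)
            totals[term]["at_risk"] += stat.get("numAtRisk", 0)

    return dict(totals)
```

Pooling arms keeps the sketch short. If you need drug-vs-placebo rates for the "Reddit vs. Trials" comparison, key the counts by `groupId` instead.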
### Week 6: Time Travel (Backfilling)

- [ ] **Backfill:** Scrape Reddit back to 2021.
- [ ] **History:** Ingest historical FDA labels (from DailyMed archives).
- [ ] **Analysis:** Generate the "Knowledge Lag" chart. Prove that Latent *would have* predicted the gastroparesis warning.

### Week 7: The Daily Cron

- [ ] **Automation:** Move scripts to a cron job or a Temporal workflow.
- [ ] **Alerting:** Simple email/Discord alert when the Divergence Score spikes.

### Week 8: Marketing the Signal

- [ ] **Artifact:** Write a blog post: "How Latent predicted the Ozempic warnings 6 months early."
- [ ] **Outreach:** Send the report to 10 biotech hedge funds.

---

## Phase 3: Commercialization (Weeks 9-12)

*Goal: First paying customer.*

- [ ] **Expansion:** Add 5 more high-volatility drugs (e.g., Alzheimer's treatments, new oncology drugs).
- [ ] **Polish:** Clean up the UI. Add CSV export.
- [ ] **Sales:** Demo the "Alpha Signal" to investors.

---

## "Solo Scraper" Tech Stack

*Cheap, resilient, manageable.*

* **Language:** Python (for scraping/NLP), Rust (for StemeDB).
* **Database:** SQLite (local cache) -> StemeDB (Graph).
* **Proxies:** BrightData (Pay-as-you-go) or ScraperAPI. *Only use when strictly necessary.*
* **Orchestration:** Simple `systemd` timers or a lightweight Go scheduler. No Kubernetes.
* **Compute:** One robust server (64 GB RAM, plenty of cores) running everything.
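
The stack puts SQLite in front of StemeDB as a local cache. Below is a minimal sketch of that layer, assuming raw API/scrape payloads are JSON-serialisable. The `raw_payloads` table and the `open_cache`/`cached_fetch` helpers are hypothetical names, not existing Latent code. Caching raw responses lets you re-run an extractor (for example after changing the `gpt-4o-mini` prompt) without re-hitting Reddit or OpenFDA, which also keeps proxy spend down.

```python
import json
import sqlite3
import time
from typing import Any, Callable


def open_cache(path: str = "latent_cache.db") -> sqlite3.Connection:
    """Open (or create) the local SQLite cache of raw source payloads."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_payloads (
               source     TEXT NOT NULL,  -- e.g. 'openfda', 'reddit'
               key        TEXT NOT NULL,  -- e.g. drug name or post id
               fetched_at REAL NOT NULL,  -- unix timestamp of the fetch
               body       TEXT NOT NULL,  -- raw JSON payload
               PRIMARY KEY (source, key))"""
    )
    return conn


def cached_fetch(
    conn: sqlite3.Connection,
    source: str,
    key: str,
    fetch: Callable[[], Any],
    max_age_s: float = 86_400.0,
) -> Any:
    """Return a cached payload if it is younger than max_age_s, else fetch and store it."""
    row = conn.execute(
        "SELECT fetched_at, body FROM raw_payloads WHERE source = ? AND key = ?",
        (source, key),
    ).fetchone()
    if row is not None and time.time() - row[0] < max_age_s:
        return json.loads(row[1])

    # Cache miss or stale entry: call the real fetcher (network/proxy) once.
    payload = fetch()
    conn.execute(
        "INSERT OR REPLACE INTO raw_payloads (source, key, fetched_at, body) "
        "VALUES (?, ?, ?, ?)",
        (source, key, time.time(), json.dumps(payload)),
    )
    conn.commit()
    return payload
```

The one-day default for `max_age_s` is an arbitrary choice that roughly matches the planned daily cron. Downstream extraction then writes Assertions into StemeDB from the cached bodies.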