stemedb/latent/roadmap.md
jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:24:14 -07:00


# Latent: Implementation Roadmap (Solo Engineer)
**Status:** Phase 1 Complete. Phase 2 In Progress.

---
## Phase 1: The "Semaglutide" Vertical (COMPLETED)
*Goal: End-to-end signal detection for one drug family.*
### Week 1: Tier 0 (Regulatory) Ground Truth
- [x] **Infrastructure:** Set up a local StemeDB instance.
- [x] **Source:** OpenFDA API (Free, JSON).
- [x] **Ingestor:** Build `latent-ingest-fda` (Rust/Python) to fetch labels for: `Semaglutide`, `Tirzepatide`, `Liraglutide`.
- [x] **Extract:** Parse "Adverse Reactions" section into Assertions.
- [x] **Output:** A graph with the "Official Truth" for 3 drugs.
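This is a minimal sketch of the Week 1 fetch-and-extract step, written in Python to match the scraping stack below. It assumes the public openFDA drug label endpoint (`https://api.fda.gov/drug/label.json`), searched on the `openfda.generic_name` field. It reads each label's `adverse_reactions` section, which openFDA returns as a list of text blocks. Both function names are hypothetical and are not the actual `latent-ingest-fda` code. Turning the returned paragraphs into Assertions is left to the extraction step.

```python
from urllib.parse import urlencode

# Public openFDA drug label endpoint (assumed target for latent-ingest-fda).
OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"


def build_label_url(generic_name: str, limit: int = 1) -> str:
    """Build a label query filtered on openFDA's harmonized generic name."""
    params = {
        "search": f'openfda.generic_name:"{generic_name.lower()}"',
        "limit": limit,
    }
    return f"{OPENFDA_LABEL_URL}?{urlencode(params)}"


def adverse_reaction_paragraphs(payload: dict) -> list[str]:
    """Flatten the `adverse_reactions` text blocks of every returned label.

    openFDA returns label sections as lists of strings. Labels that lack
    the section are skipped, and internal whitespace is collapsed.
    """
    paragraphs = []
    for label in payload.get("results", []):
        for block in label.get("adverse_reactions", []):
            text = " ".join(block.split())  # normalize newlines and runs of spaces
            if text:
                paragraphs.append(text)
    return paragraphs
```

To run a real fetch, open `build_label_url("Semaglutide")` with `urllib.request.urlopen` and pass `json.load(response)` to `adverse_reaction_paragraphs`. openFDA rate-limits clients without an API key more tightly, so a key is worth requesting before the job runs daily.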
### Week 2: Tier 5 (Social) Noise
- [x] **Source:** Reddit (Manual API script or Apify if budget permits).
- [x] **Targets:** `/r/Ozempic`, `/r/Mounjaro`.
- [x] **Ingestor:** Build `latent-ingest-reddit` to fetch last 30 days of posts.
- [x] **Filter:** Simple keyword matching: `stomach`, `paralysis`, `vomit`, `hair loss`.
- [x] **Extract:** Use `gpt-4o-mini` to turn matched posts into Assertions.
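The sketch below shows one way to implement the Week 2 keyword filter. It assumes posts arrive as plain strings. Each keyword matches case-insensitively with a leading word boundary only, so `vomit` also catches "vomiting" and "vomited". The helper names are illustrative.

```python
import re

# Keyword list taken from the Week 2 plan.
KEYWORDS = ["stomach", "paralysis", "vomit", "hair loss"]


def _compile(keyword: str) -> re.Pattern:
    # Leading \b only, so "vomit" also matches "vomiting"; multi-word
    # keywords tolerate any run of whitespace between their words.
    body = r"\s+".join(re.escape(part) for part in keyword.split())
    return re.compile(r"\b" + body, re.IGNORECASE)


PATTERNS = {kw: _compile(kw) for kw in KEYWORDS}


def matched_keywords(text: str) -> list[str]:
    """Return the keywords (in KEYWORDS order) that appear in the text."""
    return [kw for kw, pattern in PATTERNS.items() if pattern.search(text)]


def filter_posts(posts):
    """Yield (post, hits) for every post with at least one keyword hit."""
    for post in posts:
        hits = matched_keywords(post)
        if hits:
            yield post, hits
```

Prefix matching favors recall over precision, since `stomach` also hits "stomachache". That tradeoff is acceptable here because the `gpt-4o-mini` extraction step reviews every matched post anyway.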
### Week 3: The Divergence Engine
- [x] **Logic:** Implement the "Skeptic Lens" query in StemeDB.
- [x] **Algorithm:** Compare Tier 0 (Official) vs Tier 5 (Social).
- [x] **Scoring:** Calculate a divergence score from how often each Social cluster appears versus whether it is present in Tier 0.
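The roadmap does not pin down the scoring formula, so the following is a hypothetical version of the Week 3 divergence score. It assumes each Tier 5 assertion has already been normalized to a single side-effect term, which stands in for a "Social cluster". Each term is scored by its share of all Social assertions. Terms already present in Tier 0 are multiplied by an assumed `label_discount`, so frequent effects that the label omits rank highest.

```python
from collections import Counter


def divergence_scores(social_effects: list[str], official_effects: set[str],
                      label_discount: float = 0.25) -> dict[str, float]:
    """Score each Social effect term by how loud it is relative to Tier 0.

    social_effects: one normalized effect term per Tier 5 assertion.
    official_effects: effect terms found in the Tier 0 labels.
    label_discount: assumed weight for effects the label already lists.
    """
    total = len(social_effects)
    if total == 0:
        return {}
    scores = {}
    for effect, count in Counter(social_effects).items():
        share = count / total  # frequency of this Social cluster
        weight = label_discount if effect in official_effects else 1.0
        scores[effect] = share * weight
    return scores


def rank_effects(scores: dict[str, float]) -> list[tuple[str, float]]:
    """Order effects from most to least divergent (heatmap row order)."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

For example, if 60% of Social assertions mention nausea and the label lists it, while 40% mention hair loss and the label does not, hair loss ranks first with 0.4 against nausea's 0.15.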
### Week 4: The Minimal Dashboard
- [x] **UI:** Simple Next.js page showing the "Semaglutide Conflict Heatmap".
- [x] **Ship:** Deployed local prototype with sample data.
- **Milestone:** A working URL showing "Reddit hates Ozempic's side effects more than the FDA does."
---
## Phase 2: Expansion & Hardening (Weeks 5-8)
*Goal: Add credibility and history.*
### Week 5: Tier 1 (Clinical) Context
- [ ] **Source:** ClinicalTrials.gov API (Free).
- [ ] **Ingestor:** Fetch completed trials for target drugs.
- [ ] **Data:** Extract "Serious Adverse Events" tables.
- [ ] **Value:** Now you can show "Reddit vs. Trials" conflicts (stronger than just Reddit vs. Label).
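This is a sketch of the Week 5 ingestor against the ClinicalTrials.gov v2 REST API. The `/api/v2/studies` endpoint, the `query.intr` and `filter.overallStatus` parameters, and the `resultsSection.adverseEventsModule.seriousEvents` path are assumptions based on the v2 schema. Check them against the official API documentation before relying on them. Pagination (`nextPageToken`) is omitted, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# ClinicalTrials.gov v2 studies endpoint (assumed; verify against the docs).
CTGOV_STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_trials_url(drug: str, page_size: int = 50) -> str:
    """Query completed studies whose interventions mention the drug."""
    params = {
        "query.intr": drug,                  # intervention/treatment search
        "filter.overallStatus": "COMPLETED",
        "pageSize": page_size,
    }
    return f"{CTGOV_STUDIES_URL}?{urlencode(params)}"


def serious_adverse_events(study: dict) -> list[tuple[str, int, int]]:
    """Return (term, participants affected, participants at risk) per event.

    Counts are summed over every arm, including any placebo or comparator
    arm; split by each stat's groupId if per-arm rates are needed.
    Studies without posted results yield an empty list.
    """
    module = study.get("resultsSection", {}).get("adverseEventsModule", {})
    rows = []
    for event in module.get("seriousEvents", []):
        stats = event.get("stats", [])
        affected = sum(s.get("numAffected", 0) for s in stats)
        at_risk = sum(s.get("numAtRisk", 0) for s in stats)
        rows.append((event.get("term", ""), affected, at_risk))
    return rows
```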
### Week 6: Time Travel (Backfilling)
- [ ] **Backfill:** Scrape Reddit back to 2021.
- [ ] **History:** Ingest historical FDA labels (from DailyMed archives).
- [ ] **Analysis:** Generate the "Knowledge Lag" chart (how long the Social signal preceded the official label change). Prove that Latent *would have* predicted the gastroparesis warning.
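Assume "Knowledge Lag" means the time between the divergence score first crossing an alert threshold and the date the label gained the matching warning. Under that assumption, the Week 6 chart reduces to one date subtraction per drug/effect pair. The threshold and the daily `(date, score)` series shape are also assumptions. A positive lag means the Social signal came first.

```python
from datetime import date
from typing import Optional


def first_crossing(series: list[tuple[date, float]],
                   threshold: float) -> Optional[date]:
    """Earliest date on which the divergence score reached the threshold."""
    for day, score in sorted(series):  # backfilled data may arrive unordered
        if score >= threshold:
            return day
    return None


def knowledge_lag_days(series: list[tuple[date, float]], threshold: float,
                       label_change: date) -> Optional[int]:
    """Days from the first Social signal to the label change (None if no signal)."""
    signal_day = first_crossing(series, threshold)
    if signal_day is None:
        return None
    return (label_change - signal_day).days
```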
### Week 7: The Daily Cron
- [ ] **Automation:** Move the scripts to a cron job or a Temporal workflow.
- [ ] **Alerting:** Simple email/Discord alert when Divergence Score spikes.
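The roadmap leaves "spikes" undefined. One hypothetical rule for the Week 7 alert is to fire when today's score exceeds the trailing mean by `k` population standard deviations. A minimum history length keeps a newly added drug from alerting on day one. The defaults are assumptions to tune against the Week 6 backfill. Sending the email or Discord message is left out.

```python
from statistics import mean, pstdev


def is_spike(history: list[float], current: float, k: float = 3.0,
             min_history: int = 7) -> bool:
    """Return True if `current` is an outlier relative to `history`.

    history: recent daily divergence scores for one drug, oldest first.
    k: standard deviations above the trailing mean that count as a spike.
    """
    if len(history) < min_history:
        return False  # not enough baseline to judge yet
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current > mu  # flat baseline: any rise counts
    return current > mu + k * sigma
```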
### Week 8: Marketing The Signal
- [ ] **Artifact:** Write a blog post: "How Latent predicted the Ozempic warnings 6 months early."
- [ ] **Outreach:** Send the report to 10 biotech hedge funds.
---
## Phase 3: Commercialization (Weeks 9-12)
*Goal: First paying customer.*
- [ ] **Expansion:** Add 5 more high-volatility drugs (e.g., new Alzheimer's and oncology drugs).
- [ ] **Polish:** Clean up the UI. Add export to CSV.
- [ ] **Sales:** Demo the "Alpha Signal" to investors.
---
## "Solo Scraper" Tech Stack
*Cheap, resilient, manageable.*
* **Language:** Python (for scraping/NLP), Rust (for StemeDB).
* **Database:** SQLite (local cache) -> StemeDB (Graph).
* **Proxies:** BrightData (Pay-as-you-go) or ScraperAPI. *Only use when strictly necessary.*
* **Orchestration:** Simple `systemd` timers or a lightweight Go scheduler. No Kubernetes.
* **Compute:** One robust server (64GB RAM, plenty of cores) running everything.
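Here is a minimal sketch of the "SQLite (local cache)" layer, using Python's built-in `sqlite3`. Raw fetched documents are keyed by `(source, doc_id)`, so re-running a scraper is idempotent and only newly cached documents need to go on to StemeDB. The table and column names are hypothetical.

```python
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local cache of raw scraped documents."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_docs (
               source     TEXT NOT NULL,   -- e.g. 'reddit', 'openfda'
               doc_id     TEXT NOT NULL,   -- source-native identifier
               body       TEXT NOT NULL,   -- raw JSON or text as fetched
               fetched_at TEXT NOT NULL DEFAULT (datetime('now')),
               PRIMARY KEY (source, doc_id)
           )"""
    )
    return conn


def cache_doc(conn: sqlite3.Connection, source: str, doc_id: str,
              body: str) -> bool:
    """Insert a document; return True only if it was not cached before."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO raw_docs (source, doc_id, body) VALUES (?, ?, ?)",
        (source, doc_id, body),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the primary key already existed
```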