# Latent: Implementation Roadmap (Solo Engineer)

**Status:** Phase 1 Complete. Phase 2 In Progress.

---

## Phase 1: The "Semaglutide" Vertical (COMPLETED)
*Goal: End-to-end signal detection for one drug family.*

### Week 1: Tier 0 (Regulatory) Ground Truth
- [x] **Infrastructure:** Set up a local StemeDB instance.
- [x] **Source:** OpenFDA API (Free, JSON).
- [x] **Ingestor:** Build `latent-ingest-fda` (Rust/Python) to fetch labels for: `Semaglutide`, `Tirzepatide`, `Liraglutide`.
- [x] **Extract:** Parse "Adverse Reactions" section into Assertions.
- [x] **Output:** A graph with the "Official Truth" for 3 drugs.

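
A minimal sketch of the Week 1 fetch-and-extract step, assuming the public openFDA drug label endpoint and its `adverse_reactions` field. The function names and the assertion dict shape are hypothetical; the actual `latent-ingest-fda` code and the StemeDB Assertion schema may look different.

```python
import json
import urllib.parse
import urllib.request

# openFDA drug label endpoint.
OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"
DRUGS = ["semaglutide", "tirzepatide", "liraglutide"]


def build_label_url(generic_name: str, limit: int = 1) -> str:
    """Build a label search URL for one generic drug name."""
    query = urllib.parse.urlencode(
        {"search": f'openfda.generic_name:"{generic_name}"', "limit": limit}
    )
    return f"{OPENFDA_LABEL_URL}?{query}"


def fetch_label(generic_name: str) -> dict:
    """Fetch the label search response for one drug (network call)."""
    with urllib.request.urlopen(build_label_url(generic_name), timeout=30) as resp:
        return json.load(resp)


def extract_adverse_reactions(payload: dict) -> list:
    """Turn each label's adverse_reactions text into a raw assertion dict.

    The subject/predicate/object/tier shape is a hypothetical stand-in
    for whatever StemeDB expects.
    """
    assertions = []
    for label in payload.get("results", []):
        names = label.get("openfda", {}).get("generic_name", ["unknown"])
        for text in label.get("adverse_reactions", []):
            assertions.append({
                "subject": names[0].lower(),
                "predicate": "has_adverse_reactions_text",
                "object": text.strip(),
                "tier": 0,
            })
    return assertions
```

Keeping the parser separate from the network call makes it possible to test extraction against saved JSON responses without hitting the API.
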
### Week 2: Tier 5 (Social) Noise
- [x] **Source:** Reddit (Manual API script or Apify if budget permits).
- [x] **Targets:** `/r/Ozempic`, `/r/Mounjaro`.
- [x] **Ingestor:** Build `latent-ingest-reddit` to fetch last 30 days of posts.
- [x] **Filter:** Simple keyword matching: `stomach`, `paralysis`, `vomit`, `hair loss`.
- [x] **Extract:** Use `gpt-4o-mini` to turn matched posts into Assertions.

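
The keyword filter above could be sketched roughly as follows. This assumes each post is a dict with Reddit-style `title` and `selftext` fields; the function names and the added `keywords` field are hypothetical.

```python
import re

KEYWORDS = ["stomach", "paralysis", "vomit", "hair loss"]

# Match each keyword as a prefix at a word boundary, so "vomit" also
# catches "vomiting" but "paralysis" does not match inside another word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")", re.IGNORECASE
)


def matched_keywords(text: str) -> set:
    """Return the lowercase keywords found in the text."""
    return {m.lower() for m in _PATTERN.findall(text)}


def filter_posts(posts: list) -> list:
    """Keep posts that mention a keyword and tag them with the hits."""
    kept = []
    for post in posts:
        text = f"{post.get('title', '')} {post.get('selftext', '')}"
        hits = matched_keywords(text)
        if hits:
            kept.append({**post, "keywords": sorted(hits)})
    return kept
```

Filtering before the LLM extraction step keeps the number of `gpt-4o-mini` calls proportional to matched posts rather than to everything fetched.
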
### Week 3: The Divergence Engine
- [x] **Logic:** Implement the "Skeptic Lens" query in StemeDB.
- [x] **Algorithm:** Compare Tier 0 (Official) vs Tier 5 (Social).
- [x] **Scoring:** Calculate a divergence score from how often a Social cluster appears versus whether it is present in Tier 0.

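
One possible shape for the scoring step, as a sketch only. The formula here (mention frequency, discounted when the symptom already appears in Tier 0) is an assumption for illustration, not the scoring StemeDB actually uses; `divergence_scores` and `official_weight` are hypothetical names.

```python
from collections import Counter


def divergence_scores(social_mentions, total_posts, official_terms,
                      official_weight=0.1):
    """Score each symptom cluster by how far Social runs ahead of Tier 0.

    social_mentions: one symptom label per matched Social assertion.
    total_posts:     number of posts scanned in the window.
    official_terms:  symptoms already present in Tier 0 (the label).
    official_weight: hypothetical discount for symptoms Tier 0 already lists.
    """
    if total_posts <= 0:
        raise ValueError("total_posts must be positive")
    official = {t.lower() for t in official_terms}
    scores = {}
    for symptom, count in Counter(social_mentions).items():
        frequency = count / total_posts
        weight = official_weight if symptom.lower() in official else 1.0
        scores[symptom] = round(frequency * weight, 4)
    return scores
```

Under this formula a symptom that is frequent on Reddit but absent from the label scores highest, which is the conflict the heatmap is meant to surface.
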
### Week 4: The Minimal Dashboard
- [x] **UI:** Simple Next.js page showing the "Semaglutide Conflict Heatmap".
- [x] **Ship:** Deployed local prototype with sample data.
- **Milestone:** A working URL showing "Reddit hates Ozempic's side effects more than the FDA does."

---

## Phase 2: Expansion & Hardening (Weeks 5-8)
*Goal: Add credibility and history.*

### Week 5: Tier 1 (Clinical) Context
- [ ] **Source:** ClinicalTrials.gov API (Free).
- [ ] **Ingestor:** Fetch completed trials for target drugs.
- [ ] **Data:** Extract "Serious Adverse Events" tables.
- [ ] **Value:** Now you can show "Reddit vs. Trials" conflicts (stronger than just Reddit vs. Label).

### Week 6: Time Travel (Backfilling)
- [ ] **Backfill:** Scrape Reddit back to 2021.
- [ ] **History:** Ingest historical FDA labels (from DailyMed archives).
- [ ] **Analysis:** Generate the "Knowledge Lag" chart. Prove that Latent *would have* predicted the gastroparesis warning.

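
The "Knowledge Lag" for a term can be read as the gap between the first sustained Social signal and the first label version that mentions it. A sketch under that assumption follows; the data shapes, the `threshold` default, and all function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def first_social_signal(weekly_counts: Dict[date, int],
                        threshold: int) -> Optional[date]:
    """Earliest week whose mention count reaches the threshold."""
    for week in sorted(weekly_counts):
        if weekly_counts[week] >= threshold:
            return week
    return None


def first_label_mention(label_versions: List[Tuple[date, str]],
                        term: str) -> Optional[date]:
    """Effective date of the earliest label version containing the term."""
    for effective, text in sorted(label_versions):
        if term.lower() in text.lower():
            return effective
    return None


def knowledge_lag_days(weekly_counts, label_versions, term,
                       threshold=10) -> Optional[int]:
    """Days between the Social signal and the label catching up.

    Returns None if either side never mentions the term; a positive
    value means Social led the label.
    """
    social = first_social_signal(weekly_counts, threshold)
    official = first_label_mention(label_versions, term)
    if social is None or official is None:
        return None
    return (official - social).days
```

Plotting this value per symptom gives one bar per term, with the backfilled Reddit data supplying `weekly_counts` and the DailyMed history supplying `label_versions`.
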
### Week 7: The Daily Cron
- [ ] **Automation:** Move scripts to a cron job/temporal workflow.
- [ ] **Alerting:** Simple email/Discord alert when Divergence Score spikes.

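
What counts as a "spike" is not defined above. One simple reading, sketched here as an assumption, is a score that exceeds the recent mean by `k` standard deviations. The names `is_spike` and `format_alert` and the defaults are hypothetical, and delivery (email or Discord) is left out.

```python
from statistics import mean, pstdev


def is_spike(history, latest, k=3.0, min_points=7):
    """True if the latest score sits more than k std devs above the mean.

    history:    previous daily divergence scores, oldest first.
    min_points: do not alert until there is enough history to judge.
    """
    if len(history) < min_points:
        return False
    baseline = mean(history)
    # Guard against a perfectly flat history, where the std dev is zero.
    spread = max(pstdev(history), 1e-9)
    return latest > baseline + k * spread


def format_alert(drug, symptom, latest, history):
    """Build the alert text to hand to whichever channel is configured."""
    return (
        f"Divergence spike: {drug} / {symptom} "
        f"scored {latest:.3f} (recent mean {mean(history):.3f})"
    )
```

The daily job would call `is_spike` once per drug and symptom pair after scoring, and send `format_alert(...)` only when it returns true.
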
### Week 8: Marketing The Signal
- [ ] **Artifact:** Write a blog post: "How Latent predicted the Ozempic warnings 6 months early."
- [ ] **Outreach:** Send the report to 10 BioTech Hedge Funds.

---

## Phase 3: Commercialization (Weeks 9-12)
*Goal: First paying customer.*

- [ ] **Expansion:** Add 5 more high-volatility drugs (e.g., Alzheimer's treatments, new oncology drugs).
- [ ] **Polish:** Clean up the UI. Add export to CSV.
- [ ] **Sales:** Demo the "Alpha Signal" to investors.

---

## "Solo Scraper" Tech Stack
*Cheap, resilient, manageable.*

* **Language:** Python (for scraping/NLP), Rust (for StemeDB).
* **Database:** SQLite (local cache) -> StemeDB (Graph).
* **Proxies:** BrightData (Pay-as-you-go) or ScraperAPI. *Only use when strictly necessary.*
* **Orchestration:** Simple `systemd` timers or a lightweight Go scheduler. No Kubernetes.
* **Compute:** One robust server (64GB RAM, plenty of cores) running everything.
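
The "SQLite (local cache) -> StemeDB" flow could look something like the sketch below: scraped items land in SQLite first, keyed by source and ID so re-runs do not duplicate them, and are marked once pushed downstream. The table layout and function names are hypothetical, and the StemeDB push itself is not shown.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the local cache database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_items (
               source  TEXT NOT NULL,
               item_id TEXT NOT NULL,
               payload TEXT NOT NULL,
               pushed  INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (source, item_id)
           )"""
    )
    return conn


def cache_item(conn, source, item_id, payload):
    """Store one scraped item; return False if it was already cached."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO raw_items (source, item_id, payload) "
        "VALUES (?, ?, ?)",
        (source, item_id, json.dumps(payload)),
    )
    conn.commit()
    return cur.rowcount == 1


def pending_items(conn):
    """Items not yet pushed to the graph, in a stable order."""
    rows = conn.execute(
        "SELECT source, item_id, payload FROM raw_items "
        "WHERE pushed = 0 ORDER BY source, item_id"
    ).fetchall()
    return [(s, i, json.loads(p)) for s, i, p in rows]


def mark_pushed(conn, source, item_id):
    """Flag an item as delivered so the next run skips it."""
    conn.execute(
        "UPDATE raw_items SET pushed = 1 WHERE source = ? AND item_id = ?",
        (source, item_id),
    )
    conn.commit()
```

Because the cache is the only state the scrapers own, a failed run can simply be restarted: already-seen items are ignored on insert and unpushed ones are retried.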