Latent: Implementation Roadmap (Solo Engineer)
Status: Phase 1 Complete. Phase 2 In Progress.
Phase 1: The "Semaglutide" Vertical (COMPLETED)
Goal: End-to-end signal detection for one drug family.
Week 1: Tier 0 (Regulatory) Ground Truth
- Infrastructure: Set up a local StemeDB instance.
- Source: OpenFDA API (Free, JSON).
- Ingestor: Build `latent-ingest-fda` (Rust/Python) to fetch labels for: `Semaglutide`, `Tirzepatide`, `Liraglutide`.
- Extract: Parse "Adverse Reactions" section into Assertions.
- Output: A graph with the "Official Truth" for 3 drugs.
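The Week 1 fetch-and-extract step could be sketched roughly as below. It assumes the OpenFDA drug label endpoint (`/drug/label.json`) and its `adverse_reactions` field; the assertion dictionary shape (`subject`, `predicate`, `object`, `tier`, `source`) is a hypothetical stand-in, since the real StemeDB Assertion schema is not described here.

```python
from urllib.parse import urlencode

OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"


def label_query_url(generic_name: str, limit: int = 1) -> str:
    """Build an OpenFDA drug-label query URL for one generic name."""
    params = {"search": f'openfda.generic_name:"{generic_name}"', "limit": limit}
    return f"{OPENFDA_LABEL_URL}?{urlencode(params)}"


def extract_assertions(payload: dict, drug: str) -> list[dict]:
    """Turn each adverse_reactions text block of a label response into a Tier 0 record.

    The record layout is an assumption made for illustration only.
    """
    assertions = []
    for result in payload.get("results", []):
        for text in result.get("adverse_reactions", []):
            assertions.append({
                "subject": drug,
                "predicate": "label_adverse_reactions",  # hypothetical predicate name
                "object": text.strip(),
                "tier": 0,
                "source": "openfda",
            })
    return assertions
```

Keeping URL construction and parsing separate from the HTTP call means the parser can be exercised against saved JSON responses without touching the network.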
Week 2: Tier 5 (Social) Noise
- Source: Reddit (Manual API script or Apify if budget permits).
- Targets: `/r/Ozempic`, `/r/Mounjaro`.
- Ingestor: Build `latent-ingest-reddit` to fetch last 30 days of posts.
- Filter: Simple keyword matching: `stomach`, `paralysis`, `vomit`, `hair loss`.
- Extract: Use `gpt-4o-mini` to turn matched posts into Assertions.
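A minimal sketch of the keyword filter, which runs before any post is sent to the LLM. The post fields (`title`, `body`) and the added `keywords` field are assumptions about what the Reddit ingester emits.

```python
import re

KEYWORDS = ["stomach", "paralysis", "vomit", "hair loss"]

# One case-insensitive pattern. The word boundary is on the left only,
# so "vomit" also matches "vomiting".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")",
    re.IGNORECASE,
)


def matched_keywords(text: str) -> set[str]:
    """Return the lowercase keywords found in the text."""
    return {m.group(1).lower() for m in _PATTERN.finditer(text)}


def filter_posts(posts: list[dict]) -> list[dict]:
    """Keep posts whose title or body mentions a keyword, and attach the matches."""
    kept = []
    for post in posts:
        hits = matched_keywords(f"{post.get('title', '')} {post.get('body', '')}")
        if hits:
            kept.append({**post, "keywords": sorted(hits)})
    return kept
```

Filtering first keeps the LLM bill proportional to matched posts rather than to total subreddit volume.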
Week 3: The Divergence Engine
- Logic: Implement the "Skeptic Lens" query in StemeDB.
- Algorithm: Compare Tier 0 (Official) vs Tier 5 (Social).
- Scoring: Calculate divergence score based on frequency of Social clusters vs presence in Tier 0.
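One way to read the scoring rule is sketched below: the score for a symptom is the share of social posts that mention it, discounted when the official label already lists it. The formula, the `on_label_weight` parameter, and its default of 0.25 are illustrative assumptions, not the scoring StemeDB actually uses.

```python
def divergence_scores(
    social_counts: dict[str, int],
    total_posts: int,
    label_terms: set[str],
    on_label_weight: float = 0.25,  # assumed discount for terms already on the label
) -> dict[str, float]:
    """Score each term by social frequency, discounted if Tier 0 already covers it."""
    if total_posts <= 0:
        return {}
    scores = {}
    for term, count in social_counts.items():
        frequency = count / total_posts
        weight = on_label_weight if term in label_terms else 1.0
        scores[term] = round(frequency * weight, 4)
    # Highest divergence first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Under this sketch, a term that is frequent on Reddit but absent from the label outranks a more frequent term the label already documents.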
Week 4: The Minimal Dashboard
- UI: Simple Next.js page showing the "Semaglutide Conflict Heatmap".
- Ship: Deployed local prototype with sample data.
- Milestone: A working URL showing "Reddit hates Ozempic's side effects more than the FDA does."
Phase 2: Expansion & Hardening (Weeks 5-8)
Goal: Add credibility and history.
Week 5: Tier 1 (Clinical) Context
- Source: ClinicalTrials.gov API (Free).
- Ingestor: Fetch completed trials for target drugs.
- Data: Extract "Serious Adverse Events" tables.
- Value: Now you can show "Reddit vs. Trials" conflicts (stronger than just Reddit vs. Label).
Week 6: Time Travel (Backfilling)
- Backfill: Scrape Reddit back to 2021.
- History: Ingest historical FDA labels (from DailyMed archives).
- Analysis: Generate the "Knowledge Lag" chart. Prove that Latent would have predicted the gastroparesis warning.
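The "Knowledge Lag" for one symptom could be computed as the gap between the first month social mentions cross a threshold and the date the label was updated. The threshold rule and the monthly bucketing are assumptions made for this sketch.

```python
from datetime import date
from typing import Optional


def first_signal_date(monthly_mentions: dict[date, int], threshold: int) -> Optional[date]:
    """Earliest month whose mention count reaches the threshold, or None."""
    for month in sorted(monthly_mentions):
        if monthly_mentions[month] >= threshold:
            return month
    return None


def knowledge_lag_days(
    monthly_mentions: dict[date, int],
    threshold: int,
    label_update: date,
) -> Optional[int]:
    """Days from first social signal to the label change; None if no signal was seen."""
    signal = first_signal_date(monthly_mentions, threshold)
    if signal is None:
        return None
    return (label_update - signal).days
```

A positive result means the social signal preceded the label change; a negative one means the label came first.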
Week 7: The Daily Cron
- Automation: Move scripts to a cron job or a Temporal workflow.
- Alerting: Simple email/Discord alert when Divergence Score spikes.
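A spike check for the daily job might look like the sketch below, which flags today's score when it sits several standard deviations above recent history. The z-score rule and the threshold of 3.0 are assumptions. The payload uses the `content` field that Discord webhooks accept; actually sending it is left out.

```python
from statistics import mean, stdev


def is_spike(history: list[float], today: float, z_threshold: float = 3.0) -> bool:
    """Flag today's score if it is z_threshold standard deviations above the recent mean."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu  # flat history: any increase counts
    return (today - mu) / sigma >= z_threshold


def alert_payload(drug: str, term: str, score: float) -> dict:
    """Build the JSON body for a Discord webhook message."""
    return {"content": f"Divergence spike: {drug} / {term} score={score:.2f}"}
```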
Week 8: Marketing The Signal
- Artifact: Write a blog post: "How Latent predicted the Ozempic warnings 6 months early."
- Outreach: Send the report to 10 BioTech Hedge Funds.
Phase 3: Commercialization (Weeks 9-12)
Goal: First paying customer.
- Expansion: Add 5 more high-volatility drugs (e.g., Alzheimer's and new oncology therapies).
- Polish: Clean up the UI. Add export to CSV.
- Sales: Demo the "Alpha Signal" to investors.
"Solo Scraper" Tech Stack
Cheap, resilient, manageable.
- Language: Python (for scraping/NLP), Rust (for StemeDB).
- Database: SQLite (local cache) -> StemeDB (Graph).
- Proxies: BrightData (Pay-as-you-go) or ScraperAPI. Only use when strictly necessary.
- Orchestration: Simple `systemd` timers or a lightweight Go scheduler. No Kubernetes.
- Compute: One robust server (64GB RAM, plenty of cores) running everything.
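The SQLite local cache from the stack above could be as small as the following sketch, using Python's standard `sqlite3` module. The table name `raw_posts`, its columns, and the `exported` flag are hypothetical. The idea is that scraper re-runs stay idempotent and only rows not yet pushed are forwarded to StemeDB.

```python
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local cache; the schema here is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS raw_posts (
               post_id  TEXT PRIMARY KEY,
               source   TEXT NOT NULL,
               body     TEXT NOT NULL,
               exported INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def cache_post(conn: sqlite3.Connection, post_id: str, source: str, body: str) -> bool:
    """Store a post once; returns True only if the row was newly inserted."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO raw_posts (post_id, source, body) VALUES (?, ?, ?)",
        (post_id, source, body),
    )
    conn.commit()
    return cur.rowcount == 1


def pending_posts(conn: sqlite3.Connection) -> list[tuple]:
    """Rows not yet pushed downstream."""
    return conn.execute(
        "SELECT post_id, source, body FROM raw_posts WHERE exported = 0 ORDER BY post_id"
    ).fetchall()


def mark_exported(conn: sqlite3.Connection, post_id: str) -> None:
    """Mark a row as pushed so later runs skip it."""
    conn.execute("UPDATE raw_posts SET exported = 1 WHERE post_id = ?", (post_id,))
    conn.commit()
```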