Latent: Data Sources Catalog
Latent maps the world's drug safety data into the StemeDB Source Class hierarchy.
Tier 0: Regulatory (The Ground Truth)
Static, authoritative, legally mandated.
| Source | Access Method | Update Frequency | Data Format |
|---|---|---|---|
| FDA Labels | OpenFDA API | Weekly | Structured JSON |
| EMA Post-Auth | Web Scraper / RSS | Monthly | PDF / HTML |
| DailyMed | NIH API / Bulk | Daily | SPL (XML) |
| PMDA (Japan) | Web Scraper | Quarterly | HTML (Japanese) |
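
As an illustration of the "OpenFDA API" access method in the table above, the following is a minimal sketch of querying the public openFDA drug label endpoint by brand name. The helper names `build_label_query` and `fetch_label` are hypothetical and are not taken from the Latent codebase; the actual ingester may be structured quite differently.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public openFDA drug labeling endpoint.
OPENFDA_LABEL_ENDPOINT = "https://api.fda.gov/drug/label.json"


def build_label_query(brand_name: str, limit: int = 1) -> str:
    """Build an openFDA label search URL for one brand name (hypothetical helper)."""
    params = {
        # openFDA search syntax: field:"exact phrase"
        "search": f'openfda.brand_name:"{brand_name}"',
        "limit": limit,
    }
    return f"{OPENFDA_LABEL_ENDPOINT}?{urlencode(params)}"


def fetch_label(brand_name: str) -> list:
    """Fetch matching label records; requires network access (hypothetical helper)."""
    with urlopen(build_label_query(brand_name), timeout=10) as resp:
        # Matching records are returned under the top-level "results" key.
        return json.load(resp)["results"]
```

Keeping URL construction separate from the network call lets the query logic be unit-tested offline.
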
Tier 1: Clinical (The Science)
Rigorous, peer-reviewed, baseline statistics.
| Source | Access Method | Data Points |
|---|---|---|
| ClinicalTrials.gov | CT.gov API v2 | Adverse Event Tables, Trial Status |
| EudraCT | Web Scraper | European Clinical Trial results |
| Registry Metadata | Crossref API | Publication status of completed trials |
Tier 2: Observational & Expert (The Narrative)
Case reports, specialist guidelines, real-world studies.
| Source | Access Method | Role |
|---|---|---|
| PubMed / MEDLINE | Entrez E-utilities | Case reports of rare adverse events |
| bioRxiv / medRxiv | API | Pre-print signals (Fast but unverified) |
| NICE Guidelines | Web Scraper | Standard of care changes |
Tier 4: Aggregated Community (The Volume)
Structured reports from non-regulatory sources.
| Source | Access Method | Role |
|---|---|---|
| FAERS | OpenFDA API | Public side-effect reporting (Noisy) |
| VAERS | CDC WONDER / VAERS bulk CSV download | Vaccine-specific adverse events |
| PatientsLikeMe | Web Scraper | Structured patient-reported outcomes |
Tier 5: Anecdotal (The Early Warning)
Unstructured, high-velocity, messy.
| Source | Access Method | Target Channels |
|---|---|---|
| Reddit | Apify / Reddit API | r/Ozempic, r/Medicine, r/Biohackers |
| Twitter / X | Apify | #MedTwitter, #PharmaSafety |
| TikTok | Web Scraper | Trending side-effect "storytimes" |
Ingestion Strategy
1. The "Golden Path" (High Confidence)
Automatic ingestion of Tier 0 and Tier 1 data. These sources are considered permanent and override all others in the Authority Lens.
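
One way to picture the override rule is as a precedence filter over conflicting assertions. The sketch below is an assumption about how such a lens could behave, not the Authority Lens implementation: the `Assertion` type, the `resolve_by_authority` function, and the fallback rule (lowest-numbered tier wins when no Tier 0 or Tier 1 source is present) are all hypothetical.

```python
from dataclasses import dataclass

# Tiers ingested on the "Golden Path"; they override every other tier.
GOLDEN_TIERS = {0, 1}


@dataclass(frozen=True)
class Assertion:
    """Hypothetical record: one claim from one source at a given tier."""
    source: str
    tier: int
    claim: str


def resolve_by_authority(assertions: list) -> list:
    """Return the assertions that win under a tier-precedence rule."""
    if not assertions:
        return []
    golden = [a for a in assertions if a.tier in GOLDEN_TIERS]
    if golden:
        # Any Tier 0 or Tier 1 assertion overrides all lower-authority ones.
        return golden
    # Assumed fallback: keep only the most authoritative tier present.
    best = min(a.tier for a in assertions)
    return [a for a in assertions if a.tier == best]
```

For example, a Reddit cluster (Tier 5) that contradicts an FDA label (Tier 0) would be dropped from the resolved view, while remaining available for signal comparison.
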
2. The "Signal Path" (Predictive)
Clustering of Tier 5 data.
- Individual reports are ignored.
- Clusters (e.g., 50+ mentions of a symptom in 7 days) are promoted to "Latent Signals" and flagged for comparison against Tier 0.
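
The clustering rule above can be sketched as a sliding-window count. This is a minimal illustration using the example numbers from the text (50 mentions, 7 days); the function name `find_latent_signals` and the choice to key clusters by a `(drug, symptom)` pair are assumptions, and a production pipeline would likely need deduplication and symptom normalization first.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)   # example window from the text
THRESHOLD = 50               # example mention count from the text


def find_latent_signals(mentions, window=WINDOW, threshold=THRESHOLD):
    """Return the (drug, symptom) pairs that reach `threshold` mentions
    inside any span shorter than `window`.

    `mentions` is an iterable of (timestamp, drug, symptom) tuples.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    signals = set()
    for ts, drug, symptom in sorted(mentions):
        key = (drug, symptom)
        stamps = recent[key]
        stamps.append(ts)
        # Evict mentions that have aged out of the window ending at `ts`.
        while ts - stamps[0] >= window:
            stamps.popleft()
        if len(stamps) >= threshold:
            signals.add(key)
    return signals
```

Because individual reports never leave the per-key deque on their own, a single post contributes nothing until enough neighbors accumulate around it, which matches the "individual reports are ignored" rule.
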
3. Language Translation
Latent uses google-cloud-translate or local marian-nmt models to normalize Tier 0 data from the PMDA (Japan) and EMA (EU) into English assertions for global conflict detection.
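
A minimal sketch of the normalization step, assuming the translation backend sits behind a small interface so a cloud service and a local model are interchangeable. The `Translator` protocol, `RawAssertion` type, and `GlossaryTranslator` stand-in are hypothetical; no real translation library is called here, and a real backend would wrap whichever client the project actually uses.

```python
from dataclasses import dataclass, replace
from typing import Protocol


class Translator(Protocol):
    """Hypothetical interface a cloud or local NMT backend would implement."""

    def translate(self, text: str, source_lang: str) -> str: ...


@dataclass(frozen=True)
class RawAssertion:
    """Hypothetical record for one extracted statement."""
    source: str
    lang: str  # ISO 639-1 code, e.g. "ja"
    text: str


class GlossaryTranslator:
    """Stand-in backend for tests: looks phrases up in a fixed glossary."""

    def __init__(self, glossary: dict):
        self._glossary = glossary

    def translate(self, text: str, source_lang: str) -> str:
        return self._glossary[(source_lang, text)]


def normalize_to_english(assertion: RawAssertion, translator: Translator) -> RawAssertion:
    """Return an English copy of the assertion; English input passes through."""
    if assertion.lang == "en":
        return assertion
    english = translator.translate(assertion.text, assertion.lang)
    return replace(assertion, lang="en", text=english)
```

Usage, with a one-entry glossary standing in for a real model:

```python
backend = GlossaryTranslator({("ja", "膵炎"): "pancreatitis"})
pmda = RawAssertion(source="PMDA", lang="ja", text="膵炎")
print(normalize_to_english(pmda, backend).text)
```
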