# Latent: Data Sources Catalog

Latent maps the world's drug safety data into the StemeDB Source Class hierarchy.

## Tier 0: Regulatory (The Ground Truth)

*Static, authoritative, legally mandated.*

| Source | Access Method | Update Frequency | Data Format |
| :--- | :--- | :--- | :--- |
| **FDA Labels** | OpenFDA API | Weekly | Structured JSON |
| **EMA Post-Auth** | Web Scraper / RSS | Monthly | PDF / HTML |
| **DailyMed** | NIH API / Bulk | Daily | SPL (XML) |
| **PMDA (Japan)** | Web Scraper | Quarterly | HTML (Japanese) |

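As a rough illustration of the "OpenFDA API" access method, the sketch below builds a query URL for the openFDA drug label endpoint. The helper name `build_label_query` and the choice to search by generic name are assumptions made for this example; Latent's actual ingester may query differently.

```python
from urllib.parse import urlencode

# Public openFDA drug labeling endpoint.
OPENFDA_LABEL_ENDPOINT = "https://api.fda.gov/drug/label.json"


def build_label_query(generic_name: str, limit: int = 1) -> str:
    """Build a label search URL for one generic drug name.

    Hypothetical helper: only the URL is constructed here, no request is sent.
    """
    params = {
        # openFDA search syntax is field:"value"
        "search": f'openfda.generic_name:"{generic_name}"',
        "limit": limit,
    }
    return f"{OPENFDA_LABEL_ENDPOINT}?{urlencode(params)}"


url = build_label_query("semaglutide", limit=5)
```

The resulting URL can be fetched with any HTTP client; the response is the structured JSON listed in the table above.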
## Tier 1: Clinical (The Science)

*Rigorous, peer-reviewed, baseline statistics.*

| Source | Access Method | Data Points |
| :--- | :--- | :--- |
| **ClinicalTrials.gov** | CT.gov API v2 | Adverse Event Tables, Trial Status |
| **EudraCT** | Web Scraper | European Clinical Trial results |
| **Registry Metadata** | Crossref API | Publication status of completed trials |

## Tier 2: Observational & Expert (The Narrative)

*Case reports, specialist guidelines, real-world studies.*

| Source | Access Method | Role |
| :--- | :--- | :--- |
| **PubMed / MEDLINE** | Entrez E-utilities | Case reports of rare adverse events |
| **bioRxiv / medRxiv** | API | Pre-print signals (Fast but unverified) |
| **NICE Guidelines** | Web Scraper | Standard of care changes |

## Tier 4: Aggregated Community (The Volume)

*Structured reports from non-regulatory sources.*

| Source | Access Method | Role |
| :--- | :--- | :--- |
| **FAERS** | OpenFDA API | Public side-effect reporting (Noisy) |
| **VAERS** | OpenFDA API | Vaccine-specific adverse events |
| **PatientsLikeMe** | Web Scraper | Structured patient-reported outcomes |

## Tier 5: Anecdotal (The Early Warning)

*Unstructured, high-velocity, messy.*

| Source | Access Method | Target Channels |
| :--- | :--- | :--- |
| **Reddit** | Apify / Reddit API | r/Ozempic, r/Medicine, r/Biohackers |
| **Twitter / X** | Apify | #MedTwitter, #PharmaSafety |
| **TikTok** | Web Scraper | Trending side-effect "storytimes" |

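The tier tables above could be represented in code roughly as follows. This is a minimal sketch, not StemeDB's actual Source Class schema: the `Tier` enum, the `Source` dataclass, and `sources_in` are hypothetical names, and only the source names, tier numbers, and access methods come from this catalog.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Tier numbers as used in this catalog (there is no Tier 3 here)."""
    REGULATORY = 0
    CLINICAL = 1
    OBSERVATIONAL_EXPERT = 2
    AGGREGATED_COMMUNITY = 4
    ANECDOTAL = 5


@dataclass(frozen=True)
class Source:
    name: str
    tier: Tier
    access: str


CATALOG = [
    Source("FDA Labels", Tier.REGULATORY, "OpenFDA API"),
    Source("EMA Post-Auth", Tier.REGULATORY, "Web Scraper / RSS"),
    Source("DailyMed", Tier.REGULATORY, "NIH API / Bulk"),
    Source("PMDA (Japan)", Tier.REGULATORY, "Web Scraper"),
    Source("ClinicalTrials.gov", Tier.CLINICAL, "CT.gov API v2"),
    Source("EudraCT", Tier.CLINICAL, "Web Scraper"),
    Source("Registry Metadata", Tier.CLINICAL, "Crossref API"),
    Source("PubMed / MEDLINE", Tier.OBSERVATIONAL_EXPERT, "Entrez E-utilities"),
    Source("bioRxiv / medRxiv", Tier.OBSERVATIONAL_EXPERT, "API"),
    Source("NICE Guidelines", Tier.OBSERVATIONAL_EXPERT, "Web Scraper"),
    Source("FAERS", Tier.AGGREGATED_COMMUNITY, "OpenFDA API"),
    Source("VAERS", Tier.AGGREGATED_COMMUNITY, "OpenFDA API"),
    Source("PatientsLikeMe", Tier.AGGREGATED_COMMUNITY, "Web Scraper"),
    Source("Reddit", Tier.ANECDOTAL, "Apify / Reddit API"),
    Source("Twitter / X", Tier.ANECDOTAL, "Apify"),
    Source("TikTok", Tier.ANECDOTAL, "Web Scraper"),
]


def sources_in(catalog: list[Source], tier: Tier) -> list[Source]:
    """Return the catalog entries that belong to one tier, in catalog order."""
    return [s for s in catalog if s.tier == tier]


regulatory_names = [s.name for s in sources_in(CATALOG, Tier.REGULATORY)]
```

Using an `IntEnum` keeps tiers comparable by number, which is convenient if lower tiers are meant to outrank higher ones.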
## Ingestion Strategy

### 1. The "Golden Path" (High Confidence)

Automatic ingestion of **Tier 0 and Tier 1** data. These sources are considered permanent and override all others in the **Authority Lens**.

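One way to picture the override rule is a resolver that, for each subject, keeps the claim from the lowest-numbered tier. This is a sketch under assumptions: the `Claim` shape and the `authority_lens` function are hypothetical, and the real Authority Lens may weigh more than the tier number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str   # what the claim is about, e.g. "semaglutide:nausea"
    value: str     # the asserted value
    tier: int      # source tier from this catalog (0 = regulatory)
    source: str    # source name from this catalog


def authority_lens(claims: list[Claim]) -> dict[str, Claim]:
    """Keep, per subject, the claim from the lowest-numbered tier.

    Assumption: a lower tier number means higher authority.
    Ties keep the claim that was seen first.
    """
    winners: dict[str, Claim] = {}
    for claim in claims:
        best = winners.get(claim.subject)
        if best is None or claim.tier < best.tier:
            winners[claim.subject] = claim
    return winners


claims = [
    Claim("semaglutide:nausea", "frequently reported", 5, "Reddit"),
    Claim("semaglutide:nausea", "listed adverse reaction", 0, "FDA Labels"),
    Claim("semaglutide:hair loss", "reported", 5, "Reddit"),
]
resolved = authority_lens(claims)
```

Here the FDA label claim wins for the first subject, while the Reddit claim survives for the second because nothing more authoritative contradicts it.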
### 2. The "Signal Path" (Predictive)

Clustering of **Tier 5** data.

- Individual reports are ignored.
- **Clusters** (e.g., 50+ mentions of a symptom in 7 days) are promoted to "Latent Signals" and flagged for comparison against Tier 0.

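The cluster rule above can be sketched as a sliding-window count per symptom. The `Mention` type, the function name, and the exact window semantics (inclusive 7-day span) are assumptions for illustration; the 50-mention and 7-day defaults are taken from the example threshold in the text.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Mention:
    symptom: str
    seen_at: datetime


def find_latent_signals(
    mentions: list[Mention],
    min_mentions: int = 50,
    window: timedelta = timedelta(days=7),
) -> set[str]:
    """Return symptoms with at least `min_mentions` inside any `window`-long span."""
    times_by_symptom: dict[str, list[datetime]] = defaultdict(list)
    for mention in mentions:
        times_by_symptom[mention.symptom].append(mention.seen_at)

    signals: set[str] = set()
    for symptom, times in times_by_symptom.items():
        times.sort()
        recent: deque[datetime] = deque()
        for t in times:
            recent.append(t)
            # Drop mentions that fell out of the window ending at t.
            while t - recent[0] > window:
                recent.popleft()
            if len(recent) >= min_mentions:
                signals.add(symptom)
                break
    return signals


start = datetime(2024, 1, 1)
burst = [Mention("nausea", start + timedelta(hours=2 * i)) for i in range(50)]
sparse = [Mention("hair loss", start + timedelta(days=2 * i)) for i in range(50)]
few = [Mention("rash", start + timedelta(hours=i)) for i in range(49)]
signals = find_latent_signals(burst + sparse + few)
```

Only the burst qualifies: the sparse symptom never has more than a handful of mentions in any 7-day span, and the third symptom falls one mention short of the threshold.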
### 3. Language Translation

Latent uses `google-cloud-translate` or local `marian-nmt` models to normalize Tier 0 data from the PMDA (Japan) and EMA (EU) into English assertions for global conflict detection.

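A minimal sketch of how the normalization step might stay independent of the translation backend: the translator is passed in as a plain callable, so either a cloud client or a local model could sit behind it. `RawDocument`, `Translator`, and `normalize_to_english` are hypothetical names, and the stand-in translator below only tags the text rather than translating it.

```python
from dataclasses import dataclass
from typing import Callable

# (text, source_language_code) -> English text. Hypothetical interface;
# a real backend would wrap a cloud client or a local model.
Translator = Callable[[str, str], str]


@dataclass(frozen=True)
class RawDocument:
    source: str     # catalog source name, e.g. "PMDA (Japan)"
    language: str   # ISO 639-1 code of the original text
    text: str


def normalize_to_english(doc: RawDocument, translate: Translator) -> str:
    """Return English text for a document, translating only when needed."""
    if doc.language == "en":
        return doc.text
    return translate(doc.text, doc.language)


def fake_translate(text: str, source_language: str) -> str:
    """Stand-in backend for illustration; it does not actually translate."""
    return f"[{source_language}->en] {text}"


pmda_doc = RawDocument("PMDA (Japan)", "ja", "副作用")
english = normalize_to_english(pmda_doc, fake_translate)
```

Keeping the backend behind a callable also makes the step easy to test without network access or model weights.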