# Latent: Data Sources Catalog

Latent maps the world's drug safety data into the StemeDB Source Class hierarchy.

## Tier 0: Regulatory (The Ground Truth)

*Static, authoritative, legally mandated.*

| Source | Access Method | Update Frequency | Data Format |
| :--- | :--- | :--- | :--- |
| **FDA Labels** | OpenFDA API | Weekly | Structured JSON |
| **EMA Post-Auth** | Web Scraper / RSS | Monthly | PDF / HTML |
| **DailyMed** | NIH API / Bulk | Daily | SPL (XML) |
| **PMDA (Japan)** | Web Scraper | Quarterly | HTML (Japanese) |

## Tier 1: Clinical (The Science)

*Rigorous, peer-reviewed, baseline statistics.*

| Source | Access Method | Data Points |
| :--- | :--- | :--- |
| **ClinicalTrials.gov** | CT.gov API v2 | Adverse Event Tables, Trial Status |
| **EudraCT** | Web Scraper | European Clinical Trial results |
| **Registry Metadata** | Crossref API | Publication status of completed trials |

## Tier 2: Observational & Expert (The Narrative)

*Case reports, specialist guidelines, real-world studies.*

| Source | Access Method | Role |
| :--- | :--- | :--- |
| **PubMed / MEDLINE** | Entrez E-utilities | Case reports of rare adverse events |
| **bioRxiv / medRxiv** | API | Pre-print signals (Fast but unverified) |
| **NICE Guidelines** | Web Scraper | Standard of care changes |

## Tier 4: Aggregated Community (The Volume)

*Structured reports from non-regulatory sources.*

| Source | Access Method | Role |
| :--- | :--- | :--- |
| **FAERS** | OpenFDA API | Public side-effect reporting (Noisy) |
| **VAERS** | OpenFDA API | Vaccine-specific adverse events |
| **PatientsLikeMe** | Web Scraper | Structured patient-reported outcomes |

## Tier 5: Anecdotal (The Early Warning)

*Unstructured, high-velocity, messy.*

| Source | Access Method | Target Channels |
| :--- | :--- | :--- |
| **Reddit** | Apify / Reddit API | r/Ozempic, r/Medicine, r/Biohackers |
| **Twitter / X** | Apify | #MedTwitter, #PharmaSafety |
| **TikTok** | Web Scraper | Trending side-effect "storytimes" |

## Ingestion Strategy

### 1. The "Golden Path" (High Confidence)

Automatic ingestion of **Tier 0 and Tier 1** data. These sources are considered permanent and override all others in the **Authority Lens**.

### 2. The "Signal Path" (Predictive)

Clustering of **Tier 5** data.

- Individual reports are ignored.
- **Clusters** (e.g., 50+ mentions of a symptom in 7 days) are promoted to "Latent Signals" and flagged for comparison against Tier 0.

### 3. Language Translation

Latent uses `google-cloud-translate` or local `marian-nmt` models to normalize Tier 0 data from the PMDA (Japan) and EMA (EU) into English assertions for global conflict detection.
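
### Example: Signal Path Threshold (Sketch)

The clustering rule in the "Signal Path" section (individual reports ignored, 50+ mentions in 7 days promoted) could be approximated with a sliding-window count. The sketch below is an illustration, not Latent's actual implementation: `Mention`, `find_latent_signals`, the per-(drug, symptom) grouping, and the assumption that symptoms have already been extracted and normalized from raw posts are all hypothetical. The threshold and window simply reuse the example figures from the text.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical defaults that mirror the example in the Signal Path section.
MENTION_THRESHOLD = 50
WINDOW = timedelta(days=7)


@dataclass(frozen=True)
class Mention:
    """One Tier 5 post mentioning a symptom for a drug (assumed shape)."""
    drug: str
    symptom: str
    seen_at: datetime


def find_latent_signals(mentions, threshold=MENTION_THRESHOLD, window=WINDOW):
    """Return {(drug, symptom): peak_count} for every pair whose mention
    count reaches `threshold` within any sliding window of length `window`.
    Pairs that never reach the threshold are omitted (reports ignored)."""
    recent = defaultdict(deque)  # (drug, symptom) -> timestamps inside the window
    peaks = {}
    for m in sorted(mentions, key=lambda m: m.seen_at):
        key = (m.drug, m.symptom)
        stamps = recent[key]
        stamps.append(m.seen_at)
        # Evict timestamps that fall outside the window ending at this mention.
        while m.seen_at - stamps[0] >= window:
            stamps.popleft()
        if len(stamps) >= threshold:
            peaks[key] = max(peaks.get(key, 0), len(stamps))
    return peaks


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # 60 mentions within about five days: dense enough to form a cluster.
    burst = [Mention("drug-a", "nausea", start + timedelta(hours=2 * i))
             for i in range(60)]
    # 60 mentions spread three days apart: never more than 3 per window.
    sparse = [Mention("drug-a", "rash", start + timedelta(days=3 * i))
              for i in range(60)]
    print(find_latent_signals(burst + sparse))
```

Only the dense burst is returned as a candidate signal; the sparse series never exceeds three mentions per window and is dropped. In a real pipeline the returned pairs would then be compared against Tier 0 label data, which this sketch does not attempt.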