# Latent: System Architecture

Latent is an intelligence layer built on top of **StemeDB**. It transforms raw unstructured health data into a knowledge graph of conflicting safety assertions.

## 1. High-Level Architecture

```text
[ EXTERNAL SOURCES ]
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│  Ingestion Pods  │      │    Extraction    │
│  (The Sensors)   │─────►│  (LLM Pipeline)  │
└──────────────────┘      └────────┬─────────┘
                                   │ (Signed Assertions)
                                   ▼
┌──────────────────┐      ┌──────────────────┐
│  StemeDB Spine   │◄─────┤    Assertion     │
│  (Storage/WAL)   │      │     Manager      │
└────────┬─────────┘      └──────────────────┘
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│   Lens Engine    │      │    Divergence    │
│   (Resolution)   │◄─────┤     Analyzer     │
└────────┬─────────┘      └────────┬─────────┘
         │                         │
         ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│  Web Dashboard   │      │     Alerting     │
│    (Next.js)     │      │  (Slack/Email)   │
└──────────────────┘      └──────────────────┘
```

## 2. Component Breakdown

### 2.1. Ingestion Pods (Sensors)

Distributed workers responsible for pulling data from the `latent/sources.md` catalog.

- **Regulatory Sensor:** Polls OpenFDA and DailyMed for label changes (SPL XML/JSON).
- **Clinical Sensor:** Tracks CT.gov for trial completions and PubMed for case reports.
- **Social Sensor:** Utilizes headless browsers or API bridges (Apify) to monitor Reddit/Twitter clusters.

### 2.2. Extraction Pipeline (The Brain)

Converts raw text/PDFs into structured **Assertions**.

- **Model:** GPT-4o-mini (Cloud) or Llama-3-70B (Local) for PII-sensitive paths.
- **Process:**
  1. Entity Recognition (Molecules, Symptoms).
  2. Relation Extraction (Mechanism of action, Adverse event).
  3. Sentiment/Magnitude normalization.
- **Output:** A StemeDB-compatible `Assertion` object.

### 2.3. Assertion Manager

The gatekeeper for the knowledge graph.

- **Signing:** Every assertion extracted by the pipeline is cryptographically signed by the Latent Extraction Agent.
- **Deduplication:** Uses content-addressing (StemeDB hashes) to ensure the same Reddit post isn't ingested twice.

### 2.4. Divergence Analyzer

A specialized background service that queries StemeDB using the **Skeptic Lens**.

- **Logic:** Compares Tier 0 (Regulatory) against Tier 5 (Social).
- **Score Calculation:** `Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence`
- **Indexing:** Updates materialized views in StemeDB for O(1) molecule status lookups.

## 3. Data Privacy & Compliance

- **De-identification:** All Social data (Tier 5) is stripped of usernames and PII before being written to the permanent StemeDB ledger.
- **Auditability:** Every divergence alert carries a "Lineage Hash" back to the raw source snippet.

## 4. Scalability Strategy

- **Rust Core:** The Extraction and Assertion managers are written in Rust for high-concurrency ingestion.
- **Vector Search:** Uses StemeDB's vector index to find "Semantic Clusters" of side effects across different languages (e.g., "stomach paralysis" vs "gastric stasis").
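
As an illustration of the score calculation in Section 2.4, the formula can be sketched in Python. This is a minimal sketch, not Latent's actual implementation: the function name `divergence_score`, the parameter names, and the guard against a non-positive denominator are all assumptions. The document does not define the scale of any input or how a zero `RegulatorySilence` is handled.

```python
def divergence_score(
    social_magnitude: float,
    social_confidence: float,
    regulatory_silence: float,
) -> float:
    """Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence."""
    # Assumption: the source does not say how a zero or negative
    # RegulatorySilence is treated, so this sketch rejects it outright.
    if regulatory_silence <= 0:
        raise ValueError("regulatory_silence must be positive")
    return (social_magnitude * social_confidence) / regulatory_silence


if __name__ == "__main__":
    # Hypothetical inputs chosen only to make the arithmetic easy to follow.
    print(divergence_score(8.0, 0.5, 2.0))  # (8.0 * 0.5) / 2.0 = 2.0
```

Raising on a non-positive denominator is one possible design choice. A production service might instead clamp the value to a small floor so that scoring never fails.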