# Latent: System Architecture

Latent is an intelligence layer built on top of **StemeDB**. It transforms raw, unstructured health data into a knowledge graph of conflicting safety assertions.

## 1. High-Level Architecture

```text
  [ EXTERNAL SOURCES ]
          │
          ▼
  ┌──────────────────┐      ┌──────────────────┐
  │  Ingestion Pods  │      │  Extraction      │
  │  (The Sensors)   │─────►│  (LLM Pipeline)  │
  └──────────────────┘      └────────┬─────────┘
                                     │ (Signed Assertions)
                                     ▼
  ┌──────────────────┐      ┌──────────────────┐
  │  StemeDB Spine   │◄─────┤  Assertion       │
  │  (Storage/WAL)   │      │  Manager         │
  └────────┬─────────┘      └──────────────────┘
           │
           ▼
  ┌──────────────────┐      ┌──────────────────┐
  │  Lens Engine     │      │  Divergence      │
  │  (Resolution)    │◄─────┤  Analyzer        │
  └────────┬─────────┘      └────────┬─────────┘
           │                         │
           ▼                         ▼
  ┌──────────────────┐      ┌──────────────────┐
  │  Web Dashboard   │      │  Alerting        │
  │  (Next.js)       │      │  (Slack/Email)   │
  └──────────────────┘      └──────────────────┘
```
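The top half of this flow (Ingestion Pods to Extraction to Assertion Manager to the StemeDB Spine) can be sketched as threads joined by `std::sync::mpsc` channels. This is a minimal sketch under stated assumptions. The `RawDoc` and `Assertion` types and the `extract`/`run_pipeline` functions are hypothetical. The one-assertion-per-line extraction is a stub standing in for the LLM. The returned `Vec` stands in for writes to StemeDB.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical raw document emitted by an Ingestion Pod.
#[derive(Debug, Clone)]
struct RawDoc {
    source: String,
    text: String,
}

/// Hypothetical, heavily simplified assertion produced by Extraction.
#[derive(Debug, Clone, PartialEq)]
struct Assertion {
    source: String,
    claim: String,
}

/// Stub for the LLM extraction step: one assertion per non-empty line.
fn extract(doc: &RawDoc) -> Vec<Assertion> {
    doc.text
        .lines()
        .map(str::trim)
        .filter(|l| !l.is_empty())
        .map(|l| Assertion { source: doc.source.clone(), claim: l.to_string() })
        .collect()
}

/// Runs ingestion -> extraction -> assertion manager; returns what the
/// manager "stored" (a Vec standing in for the StemeDB Spine).
fn run_pipeline(docs: Vec<RawDoc>) -> Vec<Assertion> {
    let (raw_tx, raw_rx) = mpsc::channel::<RawDoc>();
    let (asrt_tx, asrt_rx) = mpsc::channel::<Assertion>();

    // Ingestion Pod: pushes raw documents downstream, then drops raw_tx,
    // which closes the channel and lets the extraction loop finish.
    let ingest = thread::spawn(move || {
        for doc in docs {
            raw_tx.send(doc).expect("extraction stage hung up");
        }
    });

    // Extraction stage: turns each raw document into assertions.
    let extract_stage = thread::spawn(move || {
        for doc in raw_rx {
            for a in extract(&doc) {
                asrt_tx.send(a).expect("assertion manager hung up");
            }
        }
    });

    // Assertion Manager (this thread): collects until every sender is dropped.
    let stored: Vec<Assertion> = asrt_rx.iter().collect();
    ingest.join().unwrap();
    extract_stage.join().unwrap();
    stored
}

fn main() {
    let docs = vec![
        RawDoc { source: "reddit".into(), text: "felt dizzy\n\nnausea on day 2".into() },
        RawDoc { source: "dailymed".into(), text: "label adds dizziness".into() },
    ];
    let stored = run_pipeline(docs);
    for a in &stored {
        println!("[{}] {}", a.source, a.claim);
    }
    println!("{} assertions stored", stored.len());
}
```

Unbounded channels keep the sketch simple. A real deployment would more likely use bounded channels (`mpsc::sync_channel`) so that a slow extraction stage applies backpressure to the pods.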

## 2. Component Breakdown

### 2.1. Ingestion Pods (Sensors)
Distributed workers responsible for pulling data from the sources listed in the `latent/sources.md` catalog.
- **Regulatory Sensor:** Polls OpenFDA and DailyMed for label changes (SPL XML/JSON).
- **Clinical Sensor:** Tracks ClinicalTrials.gov for trial completions and PubMed for case reports.
- **Social Sensor:** Uses headless browsers or API bridges (e.g., Apify) to monitor Reddit and Twitter clusters.
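All three pod types can share a common interface, with each pod tracking a cursor so that repeated polls return only new items. The sketch below is hypothetical throughout: `Sensor`, `SensorKind`, `RawItem`, `StubSensor`, `poll_all`, and the item IDs are illustrative names. The stubs replay canned data instead of calling OpenFDA, DailyMed, ClinicalTrials.gov, PubMed, or Apify.

```rust
/// Which family of pod produced an item (mirrors the three sensors above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SensorKind {
    Regulatory,
    Clinical,
    Social,
}

/// Hypothetical raw item handed to the Extraction Pipeline.
#[derive(Debug, Clone)]
struct RawItem {
    kind: SensorKind,
    source_id: String,
    body: String,
}

/// Hypothetical interface every Ingestion Pod implements.
trait Sensor {
    /// Returns only items not returned by a previous poll.
    fn poll(&mut self) -> Vec<RawItem>;
}

/// Stub pod that replays canned (source_id, body) pairs and advances a
/// cursor, so a second poll with no new data returns nothing.
struct StubSensor {
    kind: SensorKind,
    feed: Vec<(String, String)>,
    cursor: usize,
}

impl Sensor for StubSensor {
    fn poll(&mut self) -> Vec<RawItem> {
        let kind = self.kind;
        let new: Vec<RawItem> = self.feed[self.cursor..]
            .iter()
            .map(|(id, body)| RawItem { kind, source_id: id.clone(), body: body.clone() })
            .collect();
        self.cursor = self.feed.len();
        new
    }
}

/// One polling round across all pods.
fn poll_all(sensors: &mut [Box<dyn Sensor>]) -> Vec<RawItem> {
    sensors.iter_mut().flat_map(|s| s.poll()).collect()
}

fn main() {
    let mut sensors: Vec<Box<dyn Sensor>> = vec![
        Box::new(StubSensor { kind: SensorKind::Regulatory, feed: vec![("spl-1".into(), "label revised".into())], cursor: 0 }),
        Box::new(StubSensor { kind: SensorKind::Clinical, feed: vec![("trial-1".into(), "trial completed".into())], cursor: 0 }),
        Box::new(StubSensor { kind: SensorKind::Social, feed: vec![("post-1".into(), "anyone else dizzy?".into())], cursor: 0 }),
    ];
    let first = poll_all(&mut sensors);
    let second = poll_all(&mut sensors);
    for item in &first {
        println!("{:?} {} {}", item.kind, item.source_id, item.body);
    }
    println!("first round: {}, second round: {}", first.len(), second.len());
}
```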

### 2.2. Extraction Pipeline (The Brain)
Converts raw text/PDFs into structured **Assertions**.
- **Model:** GPT-4o-mini (cloud), or Llama-3-70B (local) for PII-sensitive paths.
- **Process:** 
    1. Entity Recognition (Molecules, Symptoms).
    2. Relation Extraction (Mechanism of action, Adverse event).
    3. Sentiment/Magnitude normalization.
- **Output:** A StemeDB-compatible `Assertion` object.
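The three steps above end in a single structured record. The sketch below shows what such a record might hold, with step 3 implemented as a clamp-and-rescale onto [0, 1]. It rests on several assumptions. StemeDB's actual `Assertion` schema is not described here, so the field names, the `Relation` enum, the lowercase canonicalization, and the 1-to-10 raw severity scale are all hypothetical. "ExampleMol" is a placeholder, not a real molecule.

```rust
/// Relation types from step 2 (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Relation {
    MechanismOfAction,
    AdverseEvent,
}

/// Hypothetical shape of an extracted assertion.
#[derive(Debug, Clone, PartialEq)]
struct Assertion {
    molecule: String,   // step 1: entity recognition
    relation: Relation, // step 2: relation extraction
    target: String,     // symptom or mechanism, also from step 1
    magnitude: f64,     // step 3: normalized to [0, 1]
    confidence: f64,    // model confidence, clamped to [0, 1]
}

/// Step 3 sketch: rescale a raw score on an assumed `lo..=hi` scale onto
/// [0, 1], clamping out-of-range values.
fn normalize_magnitude(raw: f64, lo: f64, hi: f64) -> f64 {
    assert!(hi > lo, "scale must be non-empty");
    ((raw - lo) / (hi - lo)).clamp(0.0, 1.0)
}

/// Builds an assertion; lowercasing entity names is an assumed canonicalization.
fn build_assertion(molecule: &str, relation: Relation, target: &str, raw_severity: f64, confidence: f64) -> Assertion {
    Assertion {
        molecule: molecule.to_lowercase(),
        relation,
        target: target.to_lowercase(),
        magnitude: normalize_magnitude(raw_severity, 1.0, 10.0), // assumed 1..=10 scale
        confidence: confidence.clamp(0.0, 1.0),
    }
}

fn main() {
    let assertions = [
        build_assertion("ExampleMol", Relation::AdverseEvent, "Nausea", 7.0, 0.8),
        build_assertion("ExampleMol", Relation::MechanismOfAction, "Receptor agonism", 1.0, 0.6),
    ];
    for a in &assertions {
        println!("{} -[{:?}]-> {} (magnitude {:.2}, confidence {:.2})",
                 a.molecule, a.relation, a.target, a.magnitude, a.confidence);
    }
}
```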

### 2.3. Assertion Manager
The gatekeeper for the knowledge graph.
- **Signing:** Every assertion extracted by the pipeline is cryptographically signed by the Latent Extraction Agent.
- **Deduplication:** Uses content-addressing (StemeDB hashes) to ensure the same Reddit post isn't ingested twice.
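The deduplication idea is that identical content maps to the same address, so a second copy is rejected. The sketch below is a minimal version under several assumptions. `content_address` and `DedupGate` are hypothetical names. Whitespace normalization before hashing is an assumed design choice so that a re-scraped copy of the same post still matches. `std`'s `DefaultHasher` (64-bit SipHash, not cryptographic) stands in for StemeDB's content hash, which would more plausibly be a cryptographic digest such as SHA-256. Signing is not shown.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in content address. DefaultHasher is NOT cryptographic and 64 bits
/// can collide; it only illustrates the content-addressing idea.
fn content_address(content: &str) -> u64 {
    // Assumption: collapse whitespace so trivially different copies match.
    let normalized = content.split_whitespace().collect::<Vec<_>>().join(" ");
    let mut h = DefaultHasher::new();
    normalized.hash(&mut h);
    h.finish()
}

/// Hypothetical gate that admits content only the first time its address is seen.
struct DedupGate {
    seen: HashSet<u64>,
}

impl DedupGate {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Returns true (and records the address) if the content is new,
    /// false if it is a duplicate.
    fn admit(&mut self, content: &str) -> bool {
        self.seen.insert(content_address(content))
    }
}

fn main() {
    let mut gate = DedupGate::new();
    let posts = ["felt dizzy after dose 2", "felt  dizzy after dose 2\n", "no issues so far"];
    for p in posts {
        let verdict = if gate.admit(p) { "ingest" } else { "skip (duplicate)" };
        println!("{:?} -> {}", p, verdict);
    }
}
```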

### 2.4. Divergence Analyzer
A specialized background service that queries StemeDB using the **Skeptic Lens**.
- **Logic:** Compares Tier 0 (Regulatory) against Tier 5 (Social).
- **Score Calculation:** 
    `Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence`
- **Indexing:** Updates materialized views in StemeDB for O(1) molecule status lookups.
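The score formula can be transcribed literally, and the results ranked the way a per-molecule status view might order them. The sketch below covers only that arithmetic. It does not model the Skeptic Lens query or StemeDB's materialized views. The input ranges are assumptions: magnitude and confidence in [0, 1], and `regulatory_silence` positive. Returning `None` for a non-positive denominator is also an assumption, because the formula is undefined there. `DivergenceInputs`, `divergence`, `rank`, and the molecule names are hypothetical. As written, the score rises as RegulatorySilence falls, so how that term is measured and scaled determines what a high score means.

```rust
/// Inputs to the score; ranges are assumptions, not part of the spec.
#[derive(Debug, Clone, Copy)]
struct DivergenceInputs {
    social_magnitude: f64,   // Tier 5 signal strength, assumed in [0, 1]
    social_confidence: f64,  // extraction confidence, assumed in [0, 1]
    regulatory_silence: f64, // Tier 0 term, assumed > 0 (definition unspecified)
}

/// Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence.
/// Returns None when the denominator is not positive (this also catches NaN).
fn divergence(i: DivergenceInputs) -> Option<f64> {
    if i.regulatory_silence > 0.0 {
        Some(i.social_magnitude * i.social_confidence / i.regulatory_silence)
    } else {
        None
    }
}

/// Scores each molecule, drops undefined scores, and sorts highest first.
fn rank<'a>(items: &[(&'a str, DivergenceInputs)]) -> Vec<(&'a str, f64)> {
    let mut scored: Vec<(&'a str, f64)> = items
        .iter()
        .filter_map(|(name, inp)| divergence(*inp).map(|d| (*name, d)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}

fn main() {
    let molecules = [
        ("mol-a", DivergenceInputs { social_magnitude: 0.9, social_confidence: 0.8, regulatory_silence: 2.0 }),
        ("mol-b", DivergenceInputs { social_magnitude: 0.4, social_confidence: 0.5, regulatory_silence: 1.0 }),
        ("mol-c", DivergenceInputs { social_magnitude: 0.7, social_confidence: 0.9, regulatory_silence: 0.0 }),
    ];
    for (name, score) in rank(&molecules) {
        println!("{name}: {score:.3}");
    }
}
```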

## 3. Data Privacy & Compliance
- **De-identification:** All Social data (Tier 5) is stripped of usernames and PII before being written to the permanent StemeDB ledger.
- **Auditability:** Every divergence alert carries a "Lineage Hash" back to the raw source snippet.
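The two policies above can work together. The lineage hash is computed over the raw snippet before scrubbing, so an alert can point back to the original without the ledger storing the username. The sketch below is minimal and rests on several assumptions. It only redacts tokens that look like Reddit (`u/name`, `/u/name`) or Twitter (`@name`) handles. Real de-identification also has to handle names, locations, dates, and other free-text identifiers. `ScrubbedRecord`, `scrub`, and `looks_like_handle` are hypothetical. `DefaultHasher` is a non-cryptographic stand-in for the real lineage hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical record written to the permanent ledger for a Tier 5 item.
#[derive(Debug)]
struct ScrubbedRecord {
    text: String,
    /// Hash of the *raw* snippet, so an alert can be traced back to it.
    /// DefaultHasher is a non-cryptographic stand-in.
    lineage_hash: u64,
}

/// True for tokens that look like Reddit (u/name, /u/name) or Twitter (@name) handles.
fn looks_like_handle(token: &str) -> bool {
    let t = token.trim_start_matches('/');
    (t.starts_with("u/") && t.len() > 2) || (token.starts_with('@') && token.len() > 1)
}

/// Hashes the raw text, then replaces handle-like tokens with "[user]".
/// Note: splitting on whitespace also collapses runs of spaces and newlines.
fn scrub(raw: &str) -> ScrubbedRecord {
    let mut h = DefaultHasher::new();
    raw.hash(&mut h);
    let text = raw
        .split_whitespace()
        .map(|tok| if looks_like_handle(tok) { "[user]" } else { tok })
        .collect::<Vec<_>>()
        .join(" ");
    ScrubbedRecord { text, lineage_hash: h.finish() }
}

fn main() {
    let rec = scrub("u/example_user said @someone had the same stomach paralysis");
    println!("{} (lineage {:016x})", rec.text, rec.lineage_hash);
}
```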

## 4. Scalability Strategy
- **Rust Core:** The Extraction Pipeline and Assertion Manager are written in Rust for high-concurrency ingestion.
- **Vector Search:** Uses StemeDB's vector index to find "Semantic Clusters" of side effects described in different vocabularies (e.g., lay "stomach paralysis" vs. clinical "gastric stasis").
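Semantic clustering works because phrases with nearby embeddings get grouped even when they share no words. The sketch below is a minimal illustration using cosine similarity and a greedy, seed-based threshold grouping. It rests on several assumptions. The 3-dimensional vectors are made-up toy values, not model output, and the 0.9 threshold is arbitrary. `cosine` and `cluster` are hypothetical helpers, and StemeDB's vector index API is not shown. Greedy grouping is order-dependent. A production path would more likely rely on the index's nearest-neighbour search or a proper clustering algorithm.

```rust
/// Cosine similarity of two equal-length vectors (0.0 if either is all zeros).
fn cosine(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Greedy grouping: each phrase joins the first cluster whose seed (first
/// member) is at least `threshold` similar; otherwise it starts a new cluster.
fn cluster<'a>(items: &[(&'a str, Vec<f64>)], threshold: f64) -> Vec<Vec<&'a str>> {
    let mut seeds: Vec<&[f64]> = Vec::new();
    let mut clusters: Vec<Vec<&'a str>> = Vec::new();
    for (label, emb) in items {
        match seeds.iter().position(|s| cosine(s, emb) >= threshold) {
            Some(i) => clusters[i].push(*label),
            None => {
                seeds.push(emb.as_slice());
                clusters.push(vec![*label]);
            }
        }
    }
    clusters
}

fn main() {
    // Toy "embeddings" (hypothetical values, not from any model).
    let phrases = vec![
        ("stomach paralysis", vec![0.9, 0.1, 0.0]),
        ("gastric stasis", vec![0.85, 0.15, 0.05]),
        ("hair loss", vec![0.0, 0.2, 0.95]),
    ];
    for c in cluster(&phrases, 0.9) {
        println!("{:?}", c);
    }
}
```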