stemedb/latent/architecture.md
jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:24:14 -07:00


# Latent: System Architecture
Latent is an intelligence layer built on top of **StemeDB**. It transforms raw, unstructured health data into a knowledge graph of conflicting safety assertions.
## 1. High-Level Architecture
```text
[ EXTERNAL SOURCES ]
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│  Ingestion Pods  │      │    Extraction    │
│  (The Sensors)   │─────►│  (LLM Pipeline)  │
└──────────────────┘      └────────┬─────────┘
                                   │ (Signed Assertions)
                                   ▼
┌──────────────────┐      ┌──────────────────┐
│  StemeDB Spine   │◄─────┤    Assertion     │
│  (Storage/WAL)   │      │     Manager      │
└────────┬─────────┘      └──────────────────┘
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│   Lens Engine    │      │    Divergence    │
│   (Resolution)   │◄─────┤     Analyzer     │
└────────┬─────────┘      └────────┬─────────┘
         │                         │
         ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│  Web Dashboard   │      │     Alerting     │
│    (Next.js)     │      │  (Slack/Email)   │
└──────────────────┘      └──────────────────┘
```
## 2. Component Breakdown
### 2.1. Ingestion Pods (Sensors)
Distributed workers that pull data from the sources listed in the `latent/sources.md` catalog.
- **Regulatory Sensor:** Polls OpenFDA and DailyMed for label changes (SPL XML/JSON).
- **Clinical Sensor:** Tracks ClinicalTrials.gov (CT.gov) for trial completions and PubMed for case reports.
- **Social Sensor:** Uses headless browsers or API bridges (Apify) to monitor Reddit/Twitter clusters.
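
A polling sensor mostly has to answer one question: has the document changed since the last poll? The sketch below shows one way to do that by hashing each fetched body. The `LabelChangeSensor` class and the injected `fetch` callable are hypothetical and stand in for whatever client talks to OpenFDA or DailyMed; no real API calls are made here.

```python
import hashlib


class LabelChangeSensor:
    """Detects changes by comparing a digest of each fetched document."""

    def __init__(self, fetch):
        # fetch is a hypothetical callable: source_id -> bytes
        self._fetch = fetch
        self._last_seen = {}  # source_id -> hex digest of the last body

    def poll(self, source_id: str) -> bool:
        """Return True if the document is new or differs from the last poll."""
        body = self._fetch(source_id)
        digest = hashlib.sha256(body).hexdigest()
        changed = self._last_seen.get(source_id) != digest
        self._last_seen[source_id] = digest
        return changed
```

Injecting `fetch` keeps the change-detection logic testable without network access; the first poll of any source reports a change because nothing has been seen yet.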
### 2.2. Extraction Pipeline (The Brain)
Converts raw text/PDFs into structured **Assertions**.
- **Model:** GPT-4o-mini (cloud), or Llama-3-70B (local) for PII-sensitive paths.
- **Process:**
  1. Entity Recognition (Molecules, Symptoms).
  2. Relation Extraction (Mechanism of action, Adverse event).
  3. Sentiment/Magnitude normalization.
- **Output:** A StemeDB-compatible `Assertion` object.
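
The steps above can be sketched as a small function that assembles the final object once the LLM stages have run. This is a minimal illustration, not the StemeDB schema: the `Assertion` fields, `choose_model`, `normalize_magnitude`, and `build_assertion` are all assumed names, and the entity and relation inputs stand in for the model's output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    # Hypothetical fields; the real StemeDB Assertion schema is not shown here.
    subject: str      # recognized molecule
    relation: str     # e.g. "adverse_event" or "mechanism_of_action"
    obj: str          # recognized symptom
    magnitude: float  # normalized to [0.0, 1.0]
    tier: int         # source tier (0 = regulatory, 5 = social)


def choose_model(contains_pii: bool) -> str:
    """Route PII-sensitive text to the local model, everything else to the cloud."""
    return "Llama-3-70B" if contains_pii else "GPT-4o-mini"


def normalize_magnitude(raw: float, scale_max: float) -> float:
    """Step 3: map a raw severity score onto [0.0, 1.0], clamping outliers."""
    if scale_max <= 0:
        raise ValueError("scale_max must be positive")
    return max(0.0, min(1.0, raw / scale_max))


def build_assertion(entities: dict, relation: str,
                    raw_severity: float, scale_max: float, tier: int) -> Assertion:
    """Combine the outputs of steps 1 and 2 with the normalized magnitude."""
    return Assertion(
        subject=entities["molecule"],
        relation=relation,
        obj=entities["symptom"],
        magnitude=normalize_magnitude(raw_severity, scale_max),
        tier=tier,
    )
```

Keeping the assertion immutable (`frozen=True`) fits a pipeline whose output is later signed and content-addressed, since any mutation would invalidate both.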
### 2.3. Assertion Manager
The gatekeeper for the knowledge graph.
- **Signing:** Every assertion extracted by the pipeline is cryptographically signed by the Latent Extraction Agent.
- **Deduplication:** Uses content-addressing (StemeDB hashes) to ensure the same Reddit post isn't ingested twice.
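
To make the signing and deduplication behaviour concrete, here is a minimal sketch under two loudly stated assumptions: SHA-256 over canonical JSON stands in for StemeDB's content hash, and HMAC-SHA256 stands in for the agent's signature (a production system would more likely use an asymmetric scheme). The `AssertionManager` class and its methods are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Sorted keys and fixed separators give a stable byte form for hashing.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_address(payload: dict) -> str:
    """Assumed stand-in for a StemeDB hash: SHA-256 of the canonical payload."""
    return hashlib.sha256(_canonical(payload)).hexdigest()


def sign(payload: dict, key: bytes) -> str:
    """Assumed stand-in for the agent signature: HMAC-SHA256 (symmetric)."""
    return hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()


class AssertionManager:
    def __init__(self, key: bytes):
        self._key = key
        self._store = {}  # content address -> signed record

    def submit(self, payload: dict) -> bool:
        """Store a signed record; return False if the content was already seen."""
        digest = content_address(payload)
        if digest in self._store:
            return False
        self._store[digest] = {"payload": payload, "signature": sign(payload, self._key)}
        return True

    def verify(self, digest: str) -> bool:
        """Recompute the signature and compare in constant time."""
        record = self._store[digest]
        return hmac.compare_digest(record["signature"], sign(record["payload"], self._key))
```

Because the address is derived from canonical content rather than arrival order or field order, re-ingesting the same post yields the same key and is rejected.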
### 2.4. Divergence Analyzer
A specialized background service that queries StemeDB using the **Skeptic Lens**.
- **Logic:** Compares Tier 0 (Regulatory) against Tier 5 (Social).
- **Score Calculation:**
  `Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence`
- **Indexing:** Updates materialized views in StemeDB for O(1) molecule status lookups.
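
The score and the lookup structure can be sketched together. The formula is taken as written above; the source does not define the range of each term, so the epsilon guard against a zero denominator, the `alert_threshold` parameter, and the dict-based view are assumptions made for illustration.

```python
def divergence(social_magnitude: float, social_confidence: float,
               regulatory_silence: float, eps: float = 1e-9) -> float:
    """Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence."""
    if regulatory_silence < 0:
        raise ValueError("regulatory_silence must be non-negative")
    # Assumption: clamp the denominator so a zero value does not raise.
    return (social_magnitude * social_confidence) / max(regulatory_silence, eps)


def refresh_view(view: dict, molecule: str, score: float,
                 alert_threshold: float) -> None:
    """Update a hash-map 'materialized view' keyed by molecule."""
    view[molecule] = {"score": score, "alert": score >= alert_threshold}


# Usage: recompute in the background, then read status with one dict lookup.
status_view = {}
refresh_view(status_view, "molecule-x", divergence(0.8, 0.5, 0.4), alert_threshold=0.75)
```

A plain dictionary gives average-case constant-time reads, which is the property the O(1) status lookup relies on; how StemeDB implements its materialized views is not shown here.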
## 3. Data Privacy & Compliance
- **De-identification:** All Social data (Tier 5) is stripped of usernames and PII before being written to the permanent StemeDB ledger.
- **Auditability:** Every divergence alert carries a "Lineage Hash" that links back to the raw source snippet.
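
A rough sketch of both ideas follows. The regular expressions cover only Reddit-style `u/name` handles, `@handles`, and email addresses, so they illustrate the approach rather than provide complete PII removal. The `deidentify` and `lineage_hash` functions, and the choice of SHA-256, are assumptions.

```python
import hashlib
import re

# Emails are replaced first so their "@" is not mistaken for a handle.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Matches u/name, /u/name and @name, but not "u/" inside a longer word or path.
_USERNAME = re.compile(r"(?<![\w/])(?:/?u/|@)[A-Za-z0-9_-]{2,}")


def deidentify(text: str) -> str:
    """Replace emails and user handles with neutral placeholders."""
    text = _EMAIL.sub("[email]", text)
    return _USERNAME.sub("[user]", text)


def lineage_hash(raw_snippet: str) -> str:
    """Assumed form of a lineage hash: SHA-256 of the raw snippet bytes."""
    return hashlib.sha256(raw_snippet.encode("utf-8")).hexdigest()


raw = "u/throwaway_123 emailed jane@example.com and cc'd @some-user about nausea"
record = {"text": deidentify(raw), "lineage": lineage_hash(raw)}
```

Hashing the snippet before de-identification lets an auditor confirm which raw source produced an alert without the ledger itself storing the identifying text.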
## 4. Scalability Strategy
- **Rust Core:** The Extraction Pipeline and Assertion Manager are written in Rust for high-concurrency ingestion.
- **Vector Search:** Uses StemeDB's vector index to find "Semantic Clusters" of side effects across different vocabularies and languages (e.g., lay "stomach paralysis" vs clinical "gastric stasis").
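
The clustering idea can be illustrated with cosine similarity and a greedy grouping pass. This sketch does not use StemeDB's vector index; the three-dimensional vectors are hand-made toy values rather than real embeddings, and `cluster` with its `threshold` parameter is a hypothetical helper.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(embeddings: dict, threshold: float = 0.9) -> list:
    """Greedy grouping: join the first cluster whose seed term is similar enough."""
    clusters = []
    for term, vec in embeddings.items():
        for group in clusters:
            if cosine(vec, embeddings[group[0]]) >= threshold:
                group.append(term)
                break
        else:
            clusters.append([term])
    return clusters


# Toy vectors chosen by hand so the two synonymous terms point the same way.
toy = {
    "stomach paralysis": [0.90, 0.10, 0.00],
    "gastric stasis": [0.85, 0.15, 0.05],
    "headache": [0.00, 0.20, 0.95],
}
groups = cluster(toy)
```

With a real embedding model, terms that share meaning but not wording should land close together in the same way, which is what allows lay and clinical phrasings to be counted as one signal.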