# Latent: System Architecture

Latent is an intelligence layer built on top of **StemeDB**. It transforms raw, unstructured health data into a knowledge graph of conflicting safety assertions.

## 1. High-Level Architecture

```text
[ EXTERNAL SOURCES ]
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│  Ingestion Pods  │      │    Extraction    │
│  (The Sensors)   │─────►│  (LLM Pipeline)  │
└──────────────────┘      └────────┬─────────┘
                                   │ (Signed Assertions)
                                   ▼
┌──────────────────┐      ┌──────────────────┐
│  StemeDB Spine   │◄─────┤    Assertion     │
│  (Storage/WAL)   │      │     Manager      │
└────────┬─────────┘      └──────────────────┘
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│   Lens Engine    │      │    Divergence    │
│   (Resolution)   │◄─────┤     Analyzer     │
└────────┬─────────┘      └────────┬─────────┘
         │                         │
         ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│  Web Dashboard   │      │     Alerting     │
│    (Next.js)     │      │  (Slack/Email)   │
└──────────────────┘      └──────────────────┘
```
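
The path from the sensors through extraction to the Assertion Manager is a staged pipeline. Staged pipelines map naturally onto threads connected by channels. The sketch below is a minimal std-only illustration with several assumed simplifications. Documents and assertions are plain `String`s, the `format!` call stands in for the LLM pipeline, and signing and deduplication are left out. `run_pipeline` is a hypothetical name, not part of Latent.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical toy pipeline: one sensor thread per source -> one extraction
/// thread -> the calling thread acting as the assertion manager.
/// Returns what the "manager" would have written to storage, sorted.
pub fn run_pipeline(sources: Vec<Vec<String>>) -> Vec<String> {
    let (raw_tx, raw_rx) = mpsc::channel::<String>();
    let (assert_tx, assert_rx) = mpsc::channel::<String>();

    // Ingestion pods: one producer thread per source, each with its own Sender.
    let mut sensors = Vec::new();
    for docs in sources {
        let tx = raw_tx.clone();
        sensors.push(thread::spawn(move || {
            for doc in docs {
                tx.send(doc).expect("extraction stage hung up");
            }
        }));
    }
    drop(raw_tx); // raw_rx stops yielding once every sensor's Sender is dropped

    // Extraction: placeholder transform standing in for the LLM pipeline.
    let extractor = thread::spawn(move || {
        for doc in raw_rx {
            let assertion = format!("assertion({})", doc.trim());
            assert_tx.send(assertion).expect("manager hung up");
        }
        // assert_tx is dropped here, which ends the manager's loop below
    });

    // Assertion manager: drain the channel and "store" the results.
    let mut stored: Vec<String> = assert_rx.iter().collect();
    for s in sensors {
        s.join().expect("sensor panicked");
    }
    extractor.join().expect("extractor panicked");
    stored.sort(); // arrival order across sensor threads is nondeterministic
    stored
}
```

Dropping the original `raw_tx` after cloning it into each sensor is what lets the extraction loop finish. An `mpsc` receiver stops yielding only once every sender is gone.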

## 2. Component Breakdown

### 2.1. Ingestion Pods (Sensors)

Distributed workers that pull data from the sources listed in the `latent/sources.md` catalog.

- **Regulatory Sensor:** Polls OpenFDA and DailyMed for label changes (SPL XML/JSON).
- **Clinical Sensor:** Tracks ClinicalTrials.gov (CT.gov) for trial completions and PubMed for case reports.
- **Social Sensor:** Uses headless browsers or API bridges (e.g., Apify) to monitor Reddit and Twitter clusters.
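
A Regulatory Sensor that polls repeatedly should forward only the labels that actually changed. The sketch below assumes each parsed record carries a set ID and an integer version, since SPL documents are identified by a set ID plus a version number. It keeps the highest version seen per set ID. `LabelRecord` and `ChangeTracker` are hypothetical names, and the cursor lives only in memory. The HTTP polling and the XML/JSON parsing are omitted.

```rust
use std::collections::HashMap;

/// Hypothetical parsed label record, as a sensor might see it after polling.
#[derive(Debug, Clone, PartialEq)]
pub struct LabelRecord {
    pub set_id: String,
    pub version: u32,
    pub body: String,
}

/// Hypothetical change detector: remembers the highest version seen per set ID
/// so repeated polls only forward labels that changed. In-memory only; a real
/// sensor would persist this cursor between runs.
#[derive(Default)]
pub struct ChangeTracker {
    seen: HashMap<String, u32>,
}

impl ChangeTracker {
    /// Returns the records from one poll whose version is newer than the last
    /// one recorded for their set ID, updating the cursor as it goes.
    pub fn filter_changed(&mut self, batch: Vec<LabelRecord>) -> Vec<LabelRecord> {
        let mut changed = Vec::new();
        for record in batch {
            let is_newer = match self.seen.get(&record.set_id) {
                Some(&last) => record.version > last,
                None => true, // first time this label has been seen
            };
            if is_newer {
                self.seen.insert(record.set_id.clone(), record.version);
                changed.push(record);
            }
        }
        changed
    }
}
```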

### 2.2. Extraction Pipeline (The Brain)

Converts raw text and PDFs into structured **Assertions**.

- **Model:** GPT-4o-mini (cloud), or Llama-3-70B (local) for PII-sensitive paths.
- **Process:**
  1. Entity recognition (molecules, symptoms).
  2. Relation extraction (mechanism of action, adverse event).
  3. Sentiment/magnitude normalization.
- **Output:** A StemeDB-compatible `Assertion` object.
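
The three numbered steps compose into one function from raw text to an `Assertion`. In this sketch the LLM is replaced by keyword lookups so that only the control flow remains. Several parts are assumptions, not StemeDB's actual `Assertion` schema:

- the `Assertion` fields
- the placeholder lexicons (`molecule_x`, `molecule_y`, and the two symptom phrases)
- the intensity cues
- the 0.0 to 1.0 magnitude scale

```rust
/// Hypothetical, simplified stand-in for a StemeDB-compatible assertion.
#[derive(Debug, PartialEq)]
pub struct Assertion {
    pub molecule: String,
    pub relation: &'static str,
    pub symptom: String,
    pub magnitude: f64, // assumed normalized scale: 0.0 to 1.0
}

// Hypothetical toy lexicons standing in for LLM entity recognition.
const MOLECULES: &[&str] = &["molecule_x", "molecule_y"];
const SYMPTOMS: &[&str] = &["stomach paralysis", "gastric stasis"];
// Hypothetical intensity cues for magnitude normalization.
const CUES: &[(&str, f64)] = &[("mild", 0.25), ("moderate", 0.5), ("severe", 1.0)];
const DEFAULT_MAGNITUDE: f64 = 0.5;

/// Step 1: entity recognition (first molecule and first symptom found).
fn recognize(lower: &str) -> Option<(String, String)> {
    let molecule = MOLECULES.iter().find(|m| lower.contains(**m))?;
    let symptom = SYMPTOMS.iter().find(|s| lower.contains(**s))?;
    Some((molecule.to_string(), symptom.to_string()))
}

/// Step 2: relation extraction (mechanism of action vs. adverse event).
fn relation(lower: &str) -> &'static str {
    if lower.contains("inhibits") || lower.contains("blocks") {
        "mechanism_of_action"
    } else {
        "adverse_event"
    }
}

/// Step 3: magnitude normalization (strongest cue wins; default if none).
fn normalize_magnitude(lower: &str) -> f64 {
    let mut best: Option<f64> = None;
    for &(cue, score) in CUES {
        if lower.contains(cue) {
            best = Some(best.map_or(score, |b| b.max(score)));
        }
    }
    best.unwrap_or(DEFAULT_MAGNITUDE)
}

/// Runs the three steps in order; None if no molecule/symptom pair is found.
pub fn extract(text: &str) -> Option<Assertion> {
    let lower = text.to_lowercase();
    let (molecule, symptom) = recognize(&lower)?;
    Some(Assertion {
        molecule,
        relation: relation(&lower),
        symptom,
        magnitude: normalize_magnitude(&lower),
    })
}
```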

### 2.3. Assertion Manager

The gatekeeper for the knowledge graph.

- **Signing:** Every assertion extracted by the pipeline is cryptographically signed by the Latent Extraction Agent.
- **Deduplication:** Uses content-addressing (StemeDB hashes) to ensure the same Reddit post isn't ingested twice.
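
Content addressing turns deduplication into a set-membership check. Hash a canonical form of the post, and skip the post if that hash has already been stored. The sketch below uses 64-bit FNV-1a only because it fits in a few lines of std-only Rust. It is not cryptographic, and StemeDB's own content hashes would be used in practice. The canonicalization rule (source ID plus a whitespace-collapsed body) and the `Deduplicator` type are hypothetical. Signing is left out.

```rust
use std::collections::HashSet;

/// 64-bit FNV-1a: a tiny, non-cryptographic stand-in for StemeDB's content hash.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a 64-bit prime
    }
    hash
}

/// Hypothetical canonical form: source ID, a unit-separator byte, then the body
/// with whitespace runs collapsed, so trivially reformatted copies hash equally.
fn canonical(source_id: &str, body: &str) -> String {
    let words: Vec<&str> = body.split_whitespace().collect();
    format!("{}\u{1f}{}", source_id, words.join(" "))
}

/// Hypothetical in-memory deduplicator keyed by content hash.
#[derive(Default)]
pub struct Deduplicator {
    seen: HashSet<u64>,
}

impl Deduplicator {
    /// Returns the content hash if the post is new, or None if already ingested.
    pub fn admit(&mut self, source_id: &str, body: &str) -> Option<u64> {
        let key = fnv1a64(canonical(source_id, body).as_bytes());
        if self.seen.insert(key) { Some(key) } else { None }
    }
}
```

A 64-bit hash also carries a small but nonzero collision risk at scale. That is one more reason the real check should rely on StemeDB's hashes.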

### 2.4. Divergence Analyzer

A specialized background service that queries StemeDB using the **Skeptic Lens**.

- **Logic:** Compares Tier 0 (Regulatory) against Tier 5 (Social).
- **Score Calculation:** `Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence`
- **Indexing:** Updates materialized views in StemeDB for O(1) molecule status lookups.
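
The score formula translates directly into code. The sketch below also keeps a `HashMap` keyed by molecule as a stand-in for the materialized view, so a status lookup is a single hash probe (expected O(1)). This section does not specify how `SocialMagnitude`, `SocialConfidence`, and `RegulatorySilence` are measured, so the inputs are plain numbers. The guard that rejects a non-positive `RegulatorySilence` is an added assumption to avoid dividing by zero. `SocialSignal` and `DivergenceView` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical aggregated inputs for one molecule.
pub struct SocialSignal {
    pub social_magnitude: f64,
    pub social_confidence: f64,
    pub regulatory_silence: f64,
}

/// Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence.
/// Returns None when the denominator is not positive (assumed guard).
pub fn divergence(s: &SocialSignal) -> Option<f64> {
    if s.regulatory_silence > 0.0 {
        Some(s.social_magnitude * s.social_confidence / s.regulatory_silence)
    } else {
        None
    }
}

/// Hypothetical in-memory stand-in for the materialized view:
/// molecule -> latest divergence score.
#[derive(Default)]
pub struct DivergenceView {
    scores: HashMap<String, f64>,
}

impl DivergenceView {
    /// Recomputes and upserts the score for one molecule; drops any stale
    /// score if the inputs no longer yield a valid one.
    pub fn refresh(&mut self, molecule: &str, signal: &SocialSignal) {
        match divergence(signal) {
            Some(score) => {
                self.scores.insert(molecule.to_string(), score);
            }
            None => {
                self.scores.remove(molecule);
            }
        }
    }

    /// Single hash lookup (expected O(1)).
    pub fn status(&self, molecule: &str) -> Option<f64> {
        self.scores.get(molecule).copied()
    }
}
```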

## 3. Data Privacy & Compliance

- **De-identification:** All Social data (Tier 5) is stripped of usernames and PII before being written to the permanent StemeDB ledger.
- **Auditability:** Every divergence alert carries a "Lineage Hash" back to the raw source snippet.
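
Below is a minimal sketch of the username part of de-identification. It assumes usernames appear as Reddit-style `u/name` or `/u/name` tokens or as `@handle` tokens. The token rules and the `[REDACTED_USER]` placeholder are hypothetical. Full PII removal (names, locations, dates, free-text identifiers) needs considerably more than this.

```rust
/// Hypothetical username scrubber: replaces tokens that look like `u/name`,
/// `/u/name`, or `@handle` with a fixed placeholder, keeping trailing
/// punctuation. Note: whitespace in the output is normalized to single spaces.
pub fn scrub_usernames(text: &str) -> String {
    text.split_whitespace()
        .map(|token| {
            // Split off trailing punctuation such as "," or "." so it survives.
            let core = token
                .trim_end_matches(|c: char| !c.is_alphanumeric() && c != '_' && c != '-');
            let suffix = &token[core.len()..];
            let is_user = core.starts_with("u/")
                || core.starts_with("/u/")
                || (core.starts_with('@') && core.len() > 1);
            if is_user {
                format!("[REDACTED_USER]{}", suffix)
            } else {
                token.to_string()
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```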

## 4. Scalability Strategy

- **Rust Core:** The Extraction Pipeline and Assertion Manager are written in Rust for high-concurrency ingestion.
- **Vector Search:** Uses StemeDB's vector index to find "Semantic Clusters" of side effects across different phrasings and languages (e.g., the lay "stomach paralysis" vs. the clinical "gastric stasis").
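
Finding a semantic cluster comes down to grouping phrases whose embedding vectors point in nearly the same direction. The sketch below uses cosine similarity with greedy leader clustering: each phrase joins the first cluster whose first member is at least `threshold` similar. Embeddings are passed in as plain `Vec<f32>`. How StemeDB's vector index produces or stores them is not covered here, and both `cluster` and the threshold value are hypothetical.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either is all zeros.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical greedy leader clustering: each embedding joins the first
/// cluster whose seed (first member) has cosine similarity >= `threshold`,
/// otherwise it starts a new cluster. Returns indices into `embeddings`.
pub fn cluster(embeddings: &[Vec<f32>], threshold: f32) -> Vec<Vec<usize>> {
    let mut clusters: Vec<Vec<usize>> = Vec::new();
    for (i, e) in embeddings.iter().enumerate() {
        match clusters
            .iter_mut()
            .find(|c| cosine(&embeddings[c[0]], e) >= threshold)
        {
            Some(c) => c.push(i),
            None => clusters.push(vec![i]),
        }
    }
    clusters
}
```

Comparing only against each cluster's seed keeps the pass simple, but the result depends on input order. At scale, a nearest-neighbor query against the vector index would replace the linear scan.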