Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Latent: System Architecture
Latent is an intelligence layer built on top of StemeDB. It transforms raw unstructured health data into a knowledge graph of conflicting safety assertions.
1. High-Level Architecture
[ EXTERNAL SOURCES ]
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│  Ingestion Pods  │      │    Extraction    │
│  (The Sensors)   │─────►│  (LLM Pipeline)  │
└──────────────────┘      └────────┬─────────┘
                                   │ (Signed Assertions)
                                   ▼
┌──────────────────┐      ┌──────────────────┐
│  StemeDB Spine   │◄─────┤    Assertion     │
│  (Storage/WAL)   │      │     Manager      │
└────────┬─────────┘      └──────────────────┘
         │
         ▼
┌──────────────────┐      ┌──────────────────┐
│   Lens Engine    │      │    Divergence    │
│   (Resolution)   │◄─────┤     Analyzer     │
└────────┬─────────┘      └────────┬─────────┘
         │                         │
         ▼                         ▼
┌──────────────────┐      ┌──────────────────┐
│  Web Dashboard   │      │     Alerting     │
│    (Next.js)     │      │  (Slack/Email)   │
└──────────────────┘      └──────────────────┘
2. Component Breakdown
2.1. Ingestion Pods (Sensors)
Distributed workers that pull data from the sources cataloged in latent/sources.md.
- Regulatory Sensor: Polls OpenFDA and DailyMed for label changes (SPL XML/JSON).
- Clinical Sensor: Tracks ClinicalTrials.gov for trial completions and PubMed for case reports.
- Social Sensor: Uses headless browsers or API bridges (Apify) to monitor Reddit/Twitter clusters.
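As a rough illustration of how a polling sensor such as the Regulatory Sensor might detect a label change, the sketch below hashes each fetched payload and reports only the ones whose digest differs from the last one seen. The names `PollingSensor` and `fetch`, and the `(source_id, payload)` shape, are assumptions made for this example and are not taken from Latent or StemeDB. A real sensor would call the upstream APIs rather than the stub used here.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical shape: a fetcher returns (source_id, raw_payload) pairs.
Fetcher = Callable[[], Iterable[Tuple[str, bytes]]]


class PollingSensor:
    """Illustrative sensor that reports only new or changed payloads."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self._last_digest: Dict[str, str] = {}  # source_id -> last seen SHA-256

    def poll(self) -> List[Tuple[str, bytes]]:
        changed = []
        for source_id, payload in self._fetch():
            digest = hashlib.sha256(payload).hexdigest()
            if self._last_digest.get(source_id) != digest:
                self._last_digest[source_id] = digest
                changed.append((source_id, payload))
        return changed


# Stub standing in for a real HTTP call (assumption for the example).
_labels = {"label-001": b"<spl>version 1</spl>"}


def fake_fetch() -> List[Tuple[str, bytes]]:
    return list(_labels.items())


sensor = PollingSensor(fake_fetch)
first = sensor.poll()    # label is new, so it is reported
second = sensor.poll()   # unchanged, so nothing is reported
_labels["label-001"] = b"<spl>version 2</spl>"
third = sensor.poll()    # content changed, so it is reported again
```

Keeping only a digest per source means the sensor does not need to retain full payloads between polls to notice a change.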
2.2. Extraction Pipeline (The Brain)
Converts raw text/PDFs into structured Assertions.
- Model: GPT-4o-mini (cloud) by default, or Llama-3-70B (local) for PII-sensitive paths.
- Process:
- Entity Recognition (Molecules, Symptoms).
- Relation Extraction (Mechanism of action, Adverse event).
- Sentiment/Magnitude normalization.
- Output: A StemeDB-compatible Assertion object.
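The three process steps above can be sketched as a small chain of functions that ends in an Assertion-like record. Everything here is an assumption for illustration: the `Assertion` fields, the toy vocabularies, the placeholder molecule `drugx`, and the `adverse_event` predicate are invented, and the real StemeDB Assertion schema is not shown in this document. Keyword matching stands in for the LLM calls the pipeline would actually make.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Assertion:
    """Hypothetical stand-in for a StemeDB-compatible Assertion."""
    subject: str      # molecule
    predicate: str    # relation type
    object: str       # symptom
    magnitude: float  # normalized severity in [0, 1]


# Toy vocabularies standing in for LLM-based recognition (assumptions).
MOLECULES = {"drugx"}
SYMPTOMS = {"nausea", "dizziness"}
MAGNITUDE_WORDS = {"mild": 0.25, "moderate": 0.5, "severe": 1.0}


def recognize_entities(text: str) -> Tuple[List[str], List[str], List[str]]:
    """Step 1: find molecules and symptoms among the lowercased words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    molecules = [w for w in words if w in MOLECULES]
    symptoms = [w for w in words if w in SYMPTOMS]
    return molecules, symptoms, words


def normalize_magnitude(words: List[str]) -> float:
    """Step 3: map the first severity word found onto a 0..1 scale."""
    for w in words:
        if w in MAGNITUDE_WORDS:
            return MAGNITUDE_WORDS[w]
    return 0.5  # assumed default when no severity word is present


def extract(text: str) -> List[Assertion]:
    molecules, symptoms, words = recognize_entities(text)
    magnitude = normalize_magnitude(words)
    # Step 2: naive relation extraction that pairs every molecule
    # with every symptom mentioned in the same text.
    return [
        Assertion(m, "adverse_event", s, magnitude)
        for m in molecules
        for s in symptoms
    ]


result = extract("Severe nausea after starting drugx.")
```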
2.3. Assertion Manager
The gatekeeper for the knowledge graph.
- Signing: Every assertion extracted by the pipeline is cryptographically signed by the Latent Extraction Agent.
- Deduplication: Uses content-addressing (StemeDB hashes) to ensure the same Reddit post isn't ingested twice.
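A minimal sketch of the two responsibilities above, under stated assumptions: SHA-256 over a canonical JSON encoding stands in for StemeDB's content hash, and HMAC-SHA256 with a shared key stands in for the agent's signature, because the document names neither scheme. A production system would more likely use an asymmetric signature so that verifiers do not hold the signing key. `AssertionManager`, `submit`, and `AGENT_KEY` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Dict, Optional

# Assumption: a demo shared secret; not a real key management scheme.
AGENT_KEY = b"demo-key-not-for-production"


def canonical_bytes(assertion: dict) -> bytes:
    """Serialize deterministically so equal content yields equal bytes."""
    return json.dumps(assertion, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_address(assertion: dict) -> str:
    return hashlib.sha256(canonical_bytes(assertion)).hexdigest()


def sign(assertion: dict, key: bytes = AGENT_KEY) -> str:
    return hmac.new(key, canonical_bytes(assertion), hashlib.sha256).hexdigest()


def verify(assertion: dict, signature: str, key: bytes = AGENT_KEY) -> bool:
    return hmac.compare_digest(sign(assertion, key), signature)


class AssertionManager:
    """Illustrative gatekeeper: signs new assertions, rejects duplicates."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    def submit(self, assertion: dict) -> Optional[str]:
        """Return the content address if stored, or None for a duplicate."""
        address = content_address(assertion)
        if address in self._store:
            return None
        self._store[address] = {"assertion": assertion, "signature": sign(assertion)}
        return address


manager = AssertionManager()
post = {"source": "reddit", "text": "example post body"}
first = manager.submit(post)
# Same content with keys in a different order maps to the same address.
second = manager.submit({"text": "example post body", "source": "reddit"})
```

Canonical serialization matters here: without sorted keys, two logically identical assertions could hash differently and slip past deduplication.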
2.4. Divergence Analyzer
A specialized background service that queries StemeDB using the Skeptic Lens.
- Logic: Compares Tier 0 (Regulatory) against Tier 5 (Social).
- Score Calculation: Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence
- Indexing: Updates materialized views in StemeDB for O(1) molecule status lookups.
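To make the score concrete, here is a small worked sketch of the formula together with a dictionary standing in for the materialized view. The document does not define the scale or units of `RegulatorySilence`, so this example simply assumes a positive number and rejects zero or negative values to avoid dividing by zero. The input values and the molecule name `drugx` are invented for illustration.

```python
from typing import Dict


def divergence(
    social_magnitude: float,
    social_confidence: float,
    regulatory_silence: float,
) -> float:
    """Divergence = (SocialMagnitude * SocialConfidence) / RegulatorySilence."""
    if regulatory_silence <= 0:
        # Assumption: the denominator must be strictly positive.
        raise ValueError("regulatory_silence must be positive")
    return (social_magnitude * social_confidence) / regulatory_silence


# Hypothetical in-memory stand-in for a materialized view:
# a dict gives average O(1) lookup of a molecule's latest score.
molecule_status: Dict[str, float] = {}


def update_view(molecule: str, score: float) -> None:
    molecule_status[molecule] = score


# Worked example: (0.8 * 0.5) / 0.25 = 1.6
update_view("drugx", divergence(0.8, 0.5, 0.25))
score = molecule_status["drugx"]
```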
3. Data Privacy & Compliance
- De-identification: All Social data (Tier 5) is stripped of usernames and PII before being written to the permanent StemeDB ledger.
- Auditability: Every divergence alert carries a "Lineage Hash" back to the raw source snippet.
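The two guarantees above might look roughly like the following sketch. The regular expressions cover only Reddit-style `u/name` mentions, `@handles`, and email addresses, which is far narrower than real PII removal would need to be. Treating the "Lineage Hash" as a SHA-256 digest of the raw snippet is an assumption, as are the function names and the record layout.

```python
import hashlib
import re

# Assumed patterns for illustration; real PII coverage must be much broader.
USERNAME = re.compile(r"(?<!\w)(?:/?u/|@)[A-Za-z0-9_-]+")
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def deidentify(text: str) -> str:
    """Replace emails and user handles with neutral placeholders."""
    # Emails go first so the handle pattern does not split them at the '@'.
    text = EMAIL.sub("[email]", text)
    return USERNAME.sub("[user]", text)


def lineage_hash(raw_snippet: str) -> str:
    """Digest of the raw snippet, so an alert can be traced to its source."""
    return hashlib.sha256(raw_snippet.encode("utf-8")).hexdigest()


raw = "u/some_user said it helped; contact me at jane@example.com or @jane_d"
record = {"text": deidentify(raw), "lineage_hash": lineage_hash(raw)}
```

Only the de-identified text and the digest go into the record; the raw snippet itself would have to live wherever the audit trail is allowed to keep it.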
4. Scalability Strategy
- Rust Core: The Extraction Pipeline and the Assertion Manager are written in Rust for high-concurrency ingestion.
- Vector Search: Uses StemeDB's vector index to find "Semantic Clusters" of side effects across different languages and vocabularies (e.g., "stomach paralysis" vs "gastric stasis").
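The idea behind semantic clustering can be sketched with cosine similarity over embedding vectors. The three-dimensional vectors below are made up purely to show the mechanics and do not come from any model or from StemeDB; the greedy grouping and the 0.9 threshold are likewise assumptions, not a description of StemeDB's vector index.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[List[str]]:
    """Greedy grouping: a term joins the first cluster whose seed
    (first member) is at least `threshold` similar, else starts a new one."""
    clusters: List[List[str]] = []
    for term, vec in embeddings.items():
        for group in clusters:
            if cosine_similarity(vec, embeddings[group[0]]) >= threshold:
                group.append(term)
                break
        else:
            clusters.append([term])
    return clusters


# Invented toy vectors (assumption); a real system would use model embeddings.
toy = {
    "stomach paralysis": [0.9, 0.1, 0.0],
    "gastric stasis": [0.85, 0.15, 0.05],
    "headache": [0.0, 0.2, 0.95],
}
groups = cluster(toy)
```

With these toy vectors the two gastric terms land in one group and "headache" in another, which is the behavior the bullet above describes for lay and clinical phrasings of the same effect.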