Latent: FDA Ingestor (Tier 0)
This is the "Ground Truth" ingestor for the Latent system. It fetches the latest Structured Product Labels (SPL) from the OpenFDA API for target molecules and converts them into StemeDB Assertions.
Scope (Week 1)
- Source: OpenFDA API (api.fda.gov)
- Target Molecules: Semaglutide, Tirzepatide, Liraglutide
- Sections Extracted:
  - boxed_warning (Black box warnings)
  - adverse_reactions (Side effects list)
  - warnings_and_precautions (General safety)
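The fetch-and-convert flow described above could look roughly like the sketch below. It is not the contents of main.py: the helper names and the assertion fields `subject`, `predicate`, and `object` are assumptions made for illustration, and only `source_class: 0` comes from this README. The endpoint and the `openfda.generic_name` search field follow the public OpenFDA drug label API.

```python
from urllib.parse import urlencode

OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"
TARGET_MOLECULES = ["semaglutide", "tirzepatide", "liraglutide"]
SECTIONS = ["boxed_warning", "adverse_reactions", "warnings_and_precautions"]


def build_query_url(molecule, limit=1):
    """Build an OpenFDA label search URL for one generic name."""
    params = {"search": f'openfda.generic_name:"{molecule}"', "limit": limit}
    return f"{OPENFDA_LABEL_URL}?{urlencode(params)}"


def label_to_assertions(molecule, label):
    """Turn one label record into one assertion dict per section text block.

    The field names subject/predicate/object are hypothetical; the real
    StemeDB Assertion schema may differ.
    """
    assertions = []
    for section in SECTIONS:
        # OpenFDA returns each label section as a list of text blocks;
        # sections missing from a label are simply skipped.
        for text in label.get(section, []):
            assertions.append({
                "subject": molecule,
                "predicate": section,
                "object": text,
                "source_class": 0,  # Tier 0: regulatory ground truth
            })
    return assertions
```

A caller would fetch `build_query_url(m)` for each entry in `TARGET_MOLECULES`, then pass each item of the response's `results` list to `label_to_assertions`.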
Usage
1. Install dependencies: pip install -r requirements.txt
2. Run the ingestor: python main.py
3. Output:
   - Creates tier0_regulatory_graph.jsonl.
   - Each line is a JSON object representing a StemeDB Assertion with source_class: 0.
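As a quick sanity check on the output file, a reader along these lines may be useful. It is a minimal sketch that assumes only what is stated above: one JSON object per line, each carrying `source_class: 0`. The function names are hypothetical and not part of the ingestor.

```python
import json


def write_assertions(path, assertions):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for assertion in assertions:
            f.write(json.dumps(assertion) + "\n")


def read_assertions(path):
    """Read a JSONL file and verify every record is Tier 0."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if record.get("source_class") != 0:
                raise ValueError(
                    f"line {line_no}: expected source_class 0, "
                    f"got {record.get('source_class')!r}"
                )
            records.append(record)
    return records
```

For example, `read_assertions("tier0_regulatory_graph.jsonl")` returns the parsed records, or raises `ValueError` naming the first line that is not Tier 0.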
Next Steps
- Implement NLP entity extraction to break the large text blocks into granular assertions (e.g., "causes nausea" instead of the full text block).
- Connect directly to the StemeDB Rust bindings instead of outputting JSONL.
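To make the first next step concrete, here is a deliberately naive placeholder that splits one large text block into sentence-level assertions. It is not NLP entity extraction, only a regex splitter standing in for it, and it will mis-split abbreviations such as "e.g.". The `object` field name is an assumption for illustration, not the confirmed StemeDB schema.

```python
import re

# Split on whitespace that follows a period or semicolon.
_SENTENCE_BREAK = re.compile(r"(?<=[.;])\s+")


def split_into_granular(assertion):
    """Return one copy of the assertion per sentence-like fragment.

    Assumes the assertion stores its text under the hypothetical
    key "object"; all other fields are copied unchanged.
    """
    fragments = [
        part.strip()
        for part in _SENTENCE_BREAK.split(assertion["object"])
        if part.strip()
    ]
    return [{**assertion, "object": fragment} for fragment in fragments]
```

A real implementation would replace the regex with an entity or relation extractor, so that "Nausea was common." becomes something closer to "causes nausea".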