stemedb/latent/ingest-fda/README.md
jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
2026-02-04 01:24:14 -07:00

Latent: FDA Ingestor (Tier 0)

This is the "Ground Truth" ingestor for the Latent system. For each target molecule, it fetches the latest Structured Product Labeling (SPL) documents from the OpenFDA API and converts them into StemeDB Assertions.

Scope (Week 1)

  • Source: OpenFDA API (api.fda.gov)
  • Target Molecules: Semaglutide, Tirzepatide, Liraglutide
  • Sections Extracted:
    • boxed_warning (Black box warnings)
    • adverse_reactions (Side effects list)
    • warnings_and_precautions (General safety)
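
As a rough illustration of this scope, the sketch below builds a query against the OpenFDA drug label endpoint and pulls the three in-scope sections out of one label record. It is not the code in main.py: the function names, the choice to search on `openfda.generic_name`, and the `limit` default are assumptions made for the example, and picking the "latest" label among several results is not handled here.

```python
from urllib.parse import urlencode

# OpenFDA drug label endpoint; label sections arrive as lists of strings.
OPENFDA_LABEL_URL = "https://api.fda.gov/drug/label.json"
TARGET_MOLECULES = ["semaglutide", "tirzepatide", "liraglutide"]
SECTIONS = ["boxed_warning", "adverse_reactions", "warnings_and_precautions"]


def build_label_query(molecule: str, limit: int = 1) -> str:
    """Build a label query URL for one generic name (hypothetical helper)."""
    params = {"search": f'openfda.generic_name:"{molecule}"', "limit": limit}
    return f"{OPENFDA_LABEL_URL}?{urlencode(params)}"


def extract_sections(label: dict) -> dict:
    """Return the in-scope sections present in one label record.

    Sections missing from the record are skipped rather than emitted empty,
    since not every label carries a boxed warning.
    """
    found = {}
    for name in SECTIONS:
        paragraphs = label.get(name)
        if paragraphs:
            found[name] = "\n".join(paragraphs)
    return found
```

Each entry in the `results` array of the API response could be passed to `extract_sections` as-is; the HTTP call itself is left out so the sketch stays runnable offline.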

Usage

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Run the ingestor:

    python main.py
    
  3. Output:

    • Creates tier0_regulatory_graph.jsonl.
    • Each line is a JSON object representing a StemeDB Assertion with source_class: 0.
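
To make the output format concrete, here is a minimal sketch of how one extracted section might be serialized as a JSONL line. Only `source_class: 0` comes from this README; the other field names (`subject`, `predicate`, `object`) and both helper functions are hypothetical and may not match the real StemeDB Assertion schema.

```python
import json
from typing import Iterable


def make_assertion(molecule: str, section: str, text: str) -> dict:
    """Build one assertion record (field names other than source_class are assumed)."""
    return {
        "subject": molecule,    # assumed field: the target molecule
        "predicate": section,   # assumed field: the label section name
        "object": text,         # assumed field: the full section text
        "source_class": 0,      # from the README: Tier 0 regulatory source
    }


def to_jsonl(assertions: Iterable[dict]) -> str:
    """Serialize assertions as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(a, ensure_ascii=False) + "\n" for a in assertions)
```

Because `json.dumps` escapes newlines inside strings, a multi-paragraph section still occupies exactly one line of the output file.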

Next Steps

  • Implement NLP entity extraction to break the large text blocks into granular assertions (e.g., "causes nausea" instead of the full text block).
  • Connect directly to the StemeDB Rust bindings instead of outputting JSONL.