jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI
Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-04 01:24:14 -07:00

Latent: Reddit Ingestor (Tier 5)

This component monitors social signals ("The Noise") to detect latent safety issues before they appear in clinical literature.

Scope (Week 2)

  • Source: Reddit API, accessed via PRAW (the Python Reddit API Wrapper)
  • Targets: r/Ozempic, r/Mounjaro, r/Semaglutide, r/Wegovy
  • Method: Fetches new posts, keeps those that mention a severity keyword (paralysis, er for emergency room, hospital), and extracts structured assertions from them.
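The keyword filter can be sketched as follows. This is a minimal illustration, not the code in main.py: the keyword tuple and the function names severity_hits and is_severe are hypothetical. It matches on word boundaries because a plain substring test for "er" would match almost every post.

```python
import re

# Hypothetical keyword list mirroring the README; main.py may use a different one.
SEVERITY_KEYWORDS = ("paralysis", "er", "hospital")

# \b word boundaries stop short keywords like "er" from matching inside
# ordinary words such as "never" or "better".
_SEVERITY_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in SEVERITY_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def severity_hits(text: str) -> list:
    """Return the severity keywords found in text, lowercased, in order of appearance."""
    return [m.group(1).lower() for m in _SEVERITY_RE.finditer(text)]


def is_severe(title: str, body: str) -> bool:
    """Keep a post if its title or body mentions any severity keyword."""
    return bool(severity_hits(f"{title}\n{body}"))
```

For example, is_severe("Week 3 on Ozempic", "Ended up in the ER last night") returns True, while is_severe("Never felt better", "") returns False. The trade-off is that word boundaries also reject inflected forms such as "hospitalized"; listing such variants explicitly is one way around that.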

Setup

  1. Credentials: You need a Reddit app of the "script" type. Create one at https://www.reddit.com/prefs/apps.

    Create a .env file in latent/ingest-reddit/:

    REDDIT_CLIENT_ID=your_id_here
    REDDIT_CLIENT_SECRET=your_secret_here
    REDDIT_USER_AGENT=LatentBot/0.1
    
    # Optional: enables LLM-based extraction (without it, a regex-based mock extractor is used)
    OPENAI_API_KEY=sk-...
    
  2. Install:

    pip install -r requirements.txt
    
  3. Run:

    python main.py
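A minimal sketch of how the variables above might be read and validated, assuming they have already been loaded into the process environment (for example by python-dotenv's load_dotenv()). IngestConfig, load_config, and the "llm" / "regex_mock" labels are hypothetical names for illustration, not taken from main.py.

```python
import os
from dataclasses import dataclass
from typing import Optional

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


@dataclass(frozen=True)
class IngestConfig:
    client_id: str
    client_secret: str
    user_agent: str
    openai_api_key: Optional[str]

    @property
    def extractor(self) -> str:
        # Without an OpenAI key, fall back to the regex-based mock extractor.
        return "llm" if self.openai_api_key else "regex_mock"


def load_config(env=os.environ) -> IngestConfig:
    """Build the config from environment variables, failing fast on missing credentials."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return IngestConfig(
        client_id=env["REDDIT_CLIENT_ID"],
        client_secret=env["REDDIT_CLIENT_SECRET"],
        user_agent=env["REDDIT_USER_AGENT"],
        # Treat an empty string the same as an unset key.
        openai_api_key=env.get("OPENAI_API_KEY") or None,
    )
```

The three Reddit values map directly onto PRAW's constructor, praw.Reddit(client_id=..., client_secret=..., user_agent=...), which with only these credentials yields a read-only client. All four target subreddits can then be read as one stream with reddit.subreddit("Ozempic+Mounjaro+Semaglutide+Wegovy").new(limit=100).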
    

Output

Writes assertions to tier5_social_graph.jsonl, one JSON object per line. An entry looks like this (pretty-printed here for readability):

{
  "subject": "semaglutide",
  "predicate": "side_effect",
  "object": "gastroparesis",
  "source_class": 5,
  "source_metadata": {
    "type": "reddit_post",
    "subreddit": "Ozempic",
    "severity": "high"
  }
}
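The regex fallback and the output format can be sketched together. The vocabularies and the functions extract_assertions, to_jsonl, and append_jsonl are hypothetical; only the record layout is taken from the example above. The sketch pairs every drug mentioned in a post with every side-effect term it finds, which is crude but shows the shape of a regex mock.

```python
import json
import re

# Hypothetical vocabularies. Brand names are normalized to the active ingredient:
# Ozempic and Wegovy are semaglutide, Mounjaro is tirzepatide.
DRUG_TERMS = {
    "ozempic": "semaglutide",
    "wegovy": "semaglutide",
    "semaglutide": "semaglutide",
    "mounjaro": "tirzepatide",
    "tirzepatide": "tirzepatide",
}
SIDE_EFFECT_TERMS = ("gastroparesis", "pancreatitis", "nausea")


def _find_terms(text, terms):
    """Return whole-word, case-insensitive matches of terms in text, lowercased."""
    pattern = r"\b(" + "|".join(map(re.escape, terms)) + r")\b"
    return [m.group(1).lower() for m in re.finditer(pattern, text, re.IGNORECASE)]


def extract_assertions(text, subreddit, severity):
    """Build one side_effect record per (drug, side effect) pair mentioned in text."""
    drugs = sorted({DRUG_TERMS[t] for t in _find_terms(text, DRUG_TERMS)})
    effects = sorted(set(_find_terms(text, SIDE_EFFECT_TERMS)))
    return [
        {
            "subject": drug,
            "predicate": "side_effect",
            "object": effect,
            "source_class": 5,
            "source_metadata": {
                "type": "reddit_post",
                "subreddit": subreddit,
                "severity": severity,
            },
        }
        for drug in drugs
        for effect in effects
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines: one compact object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


def append_jsonl(path, records):
    """Append records to a .jsonl file so repeated runs accumulate."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_jsonl(records))
```

With these vocabularies, extract_assertions("Diagnosed with gastroparesis after 6 months on Ozempic", "Ozempic", "high") yields exactly the record shown above.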

Privacy Note

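Author names are hashed (hash(post["author"])) before storage. This gives basic pseudonymization while still supporting cluster analysis, i.e. telling whether a signal comes from one user posting repeatedly or from many distinct users. Note that Python's built-in hash() on strings is randomized per process unless PYTHONHASHSEED is set, so these values can only be compared within a single run.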
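One way to get pseudonyms that stay stable across runs is a keyed hash. A minimal sketch with HMAC-SHA256 follows; the AUTHOR_HASH_KEY variable and the helper names are hypothetical. Keeping the key secret matters because Reddit usernames are public: an unkeyed hash of a username can be reversed simply by hashing candidate names and comparing.

```python
import hashlib
import hmac
import os
from typing import Optional


def load_hash_key(env=os.environ) -> bytes:
    """Read the secret pseudonymization key (hypothetical AUTHOR_HASH_KEY variable)."""
    key = env.get("AUTHOR_HASH_KEY")
    if not key:
        raise RuntimeError("AUTHOR_HASH_KEY is not set")
    return key.encode("utf-8")


def author_pseudonym(author: Optional[str], key: bytes) -> Optional[str]:
    """Map a username to a stable keyed pseudonym; deleted authors (None) stay None."""
    if author is None:
        return None
    digest = hmac.new(key, author.encode("utf-8"), hashlib.sha256)
    # Truncate to 16 hex chars (64 bits) to keep records compact.
    return digest.hexdigest()[:16]
```

As long as the same key is reused, one user's posts share a pseudonym across runs, which is what the cluster analysis needs; rotating the key deliberately breaks that linkage.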