# Latent: Reddit Ingestor (Tier 5)
This component monitors social signals ("The Noise") to detect latent safety issues before they appear in clinical literature.
## Scope (Week 2)
- **Source:** Reddit API (PRAW)
- **Targets:** r/Ozempic, r/Mounjaro, r/Semaglutide, r/Wegovy
- **Method:** Fetches posts from each subreddit's `new` listing, keeps those that mention severity keywords (`paralysis`, `er` as in emergency room, `hospital`), and extracts structured assertions from them.
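The fetch-and-filter step could look roughly like the sketch below. It is not taken from `main.py`: the function names, the dict shape of a post, and the choice of case-insensitive whole-word matching are assumptions. Whole-word matching matters for a keyword as short as `er`, which a plain substring test would also find in "water" or "never".

```python
import re
from typing import Iterable, Iterator, Mapping

# Keywords copied from the README; main.py may use a different list (assumption).
SEVERITY_KEYWORDS = ("paralysis", "er", "hospital")

# Case-insensitive whole-word match: "went to the ER" hits, "water" does not.
SEVERITY_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SEVERITY_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def severity_hits(text: str) -> set[str]:
    """Return the lowercased severity keywords that occur in `text`."""
    return {m.group(1).lower() for m in SEVERITY_PATTERN.finditer(text)}


def filter_posts(posts: Iterable[Mapping[str, str]]) -> Iterator[dict]:
    """Yield posts whose title or body mentions at least one severity keyword."""
    for post in posts:
        text = f"{post.get('title', '')}\n{post.get('selftext', '')}"
        hits = severity_hits(text)
        if hits:
            yield {**post, "severity_keywords": sorted(hits)}


def fetch_new_posts(limit: int = 100) -> Iterator[dict]:
    """Fetch the newest submissions from the target subreddits via PRAW (hypothetical helper)."""
    import os
    import praw  # imported lazily so the filter above works without PRAW installed

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent=os.environ["REDDIT_USER_AGENT"],
    )
    # "A+B" combines several subreddits into a single listing.
    targets = reddit.subreddit("Ozempic+Mounjaro+Semaglutide+Wegovy")
    for sub in targets.new(limit=limit):
        yield {
            "id": sub.id,
            "subreddit": sub.subreddit.display_name,
            "author": sub.author.name if sub.author else None,  # None for deleted accounts
            "title": sub.title,
            "selftext": sub.selftext,
        }
```

Usage: `for post in filter_posts(fetch_new_posts(limit=50)): ...`. Keeping the filter a pure function means it can be tested without network access or Reddit credentials.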
## Setup
1. **Credentials:**
   Create a Reddit app of type *script* at [https://www.reddit.com/prefs/apps](https://www.reddit.com/prefs/apps).
   Then create a `.env` file in `latent/ingest-reddit/`:

   ```env
   REDDIT_CLIENT_ID=your_id_here
   REDDIT_CLIENT_SECRET=your_secret_here
   REDDIT_USER_AGENT=LatentBot/0.1
   # Optional: enables real (OpenAI-based) extraction; without it, a regex mock is used
   OPENAI_API_KEY=sk-...
   ```

2. **Install:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Run:**

   ```bash
   python main.py
   ```
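Failing fast on missing credentials avoids confusing errors later in a run. The sketch below reads the variables listed in the credentials step and picks the extractor the way the `.env` comment describes: real extraction when `OPENAI_API_KEY` is set, the regex mock otherwise. `load_config`, `IngestConfig`, and the extractor labels are hypothetical, not taken from `main.py`. Python does not read `.env` files by itself, so the sketch assumes the variables are already in the environment (exported in the shell, or loaded with python-dotenv's `load_dotenv()`, for example).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


@dataclass(frozen=True)
class IngestConfig:
    client_id: str
    client_secret: str
    user_agent: str
    openai_api_key: Optional[str]

    @property
    def extractor(self) -> str:
        # Hypothetical labels: LLM extraction with a key, regex mock without one.
        return "openai" if self.openai_api_key else "regex_mock"


def load_config(env: Mapping[str, str] = os.environ) -> IngestConfig:
    """Validate required Reddit variables and build the config."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing Reddit credentials: {', '.join(missing)}")
    return IngestConfig(
        client_id=env["REDDIT_CLIENT_ID"],
        client_secret=env["REDDIT_CLIENT_SECRET"],
        user_agent=env["REDDIT_USER_AGENT"],
        # Treat an empty OPENAI_API_KEY the same as an unset one.
        openai_api_key=env.get("OPENAI_API_KEY") or None,
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to test.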
## Output
Writes `tier5_social_graph.jsonl` (JSON Lines: one JSON object per line).
A single entry, pretty-printed for readability:
```json
{
  "subject": "semaglutide",
  "predicate": "side_effect",
  "object": "gastroparesis",
  "source_class": 5,
  "source_metadata": {
    "type": "reddit_post",
    "subreddit": "Ozempic",
    "severity": "high"
  }
}
```
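In JSON Lines each record occupies exactly one physical line, so the entry above would be written without its line breaks. A minimal sketch of producing and reading such records, assuming the field layout shown above; `make_assertion`, `to_jsonl`, `from_jsonl`, and `append_jsonl` are hypothetical names, and lowercasing `subject`/`object` is an assumed normalization.

```python
import json
from typing import Iterable

SOURCE_CLASS_SOCIAL = 5  # Tier 5 (social signals), as in the example entry


def make_assertion(drug: str, side_effect: str, subreddit: str, severity: str) -> dict:
    """Build one assertion in the shape of the example entry."""
    return {
        "subject": drug.lower(),  # lowercasing is an assumed normalization
        "predicate": "side_effect",
        "object": side_effect.lower(),
        "source_class": SOURCE_CLASS_SOCIAL,
        "source_metadata": {
            "type": "reddit_post",
            "subreddit": subreddit,
            "severity": severity,
        },
    }


def to_jsonl(records: Iterable[dict]) -> str:
    # One compact JSON object per line; json.dumps escapes embedded newlines.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def from_jsonl(text: str) -> list[dict]:
    # Skip blank lines so a trailing newline does not produce an empty record.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def append_jsonl(path: str, records: Iterable[dict]) -> None:
    # Appending keeps earlier runs' records in the same file.
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(to_jsonl(records))
```

For example, `append_jsonl("tier5_social_graph.jsonl", [make_assertion("Semaglutide", "Gastroparesis", "Ozempic", "high")])` adds one line matching the entry above.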
## Privacy Note
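Author names are hashed (`hash(post["author"])`) before storage. This provides basic anonymization while still allowing "cluster" analysis: detecting whether a signal comes from one user posting repeatedly or from many distinct users. Note that Python's built-in `hash()` is not a cryptographic hash, and for strings it is salted per interpreter process (unless `PYTHONHASHSEED` is fixed), so the same author maps to different values in different runs and clustering is only reliable within a single run.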
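One way to get author pseudonyms that stay stable across runs, and that cannot be reversed simply by hashing a list of candidate usernames, is a keyed hash such as HMAC-SHA256. The sketch below is an illustration, not what `main.py` does: `pseudonymize`, `distinct_authors`, the secret key (which would have to be kept out of the output file, for example in `.env`), and truncating the digest to 16 hex characters are all assumptions.

```python
import hashlib
import hmac
from collections import defaultdict
from typing import Iterable, Tuple


def pseudonymize(author: str, key: bytes) -> str:
    """Keyed pseudonym: same author and key give the same token in every run."""
    # Truncation to 16 hex chars (64 bits) is an arbitrary choice for this sketch.
    return hmac.new(key, author.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def distinct_authors(pairs: Iterable[Tuple[str, str]]) -> dict[str, int]:
    """Count distinct pseudonymized authors per side effect.

    `pairs` holds (side_effect, author_token) tuples; this input shape is hypothetical.
    """
    authors: dict[str, set[str]] = defaultdict(set)
    for side_effect, token in pairs:
        authors[side_effect].add(token)
    return {effect: len(tokens) for effect, tokens in authors.items()}
```

A side effect reported many times by a single token suggests one user posting repeatedly; the same count spread over many tokens suggests independent reports.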