# Latent: Reddit Ingestor (Tier 5)

This component monitors social signals ("The Noise") to detect latent safety issues before they appear in clinical literature.

## Scope (Week 2)
- **Source:** Reddit API (PRAW)
- **Targets:** r/Ozempic, r/Mounjaro, r/Semaglutide, r/Wegovy
- **Method:** Fetches `new` posts, filters by severity keywords (`paralysis`, `er`, `hospital`), and extracts structured assertions.

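The keyword filter described under **Method** can be sketched as follows. This is a minimal illustration, not the ingester's actual code: the function names and the `title`/`selftext` post fields are assumptions, and only the three keywords come from the list above. Word-boundary matching is used because a plain substring test for `er` would also match words such as "never" or "after".

```python
import re

# Keywords taken from the Scope list; everything else here is illustrative.
SEVERITY_KEYWORDS = ("paralysis", "er", "hospital")

# \b word boundaries keep the short keyword "er" from matching inside
# longer words such as "never" or "after".
SEVERITY_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in SEVERITY_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matched_keywords(text):
    """Return the distinct severity keywords found in text, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in SEVERITY_PATTERN.finditer(text)})


def filter_posts(posts):
    """Keep only posts that mention a severity keyword.

    Each post is assumed to be a dict with optional "title" and "selftext" keys
    (hypothetical field names chosen for this sketch).
    """
    kept = []
    for post in posts:
        text = f'{post.get("title", "")} {post.get("selftext", "")}'
        if matched_keywords(text):
            kept.append(post)
    return kept
```
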
## Setup

1. **Credentials:**
   You need a Reddit App (Script type). Go to [https://www.reddit.com/prefs/apps](https://www.reddit.com/prefs/apps).

   Create a `.env` file in `latent/ingest-reddit/`:
   ```env
   REDDIT_CLIENT_ID=your_id_here
   REDDIT_CLIENT_SECRET=your_secret_here
   REDDIT_USER_AGENT=LatentBot/0.1

   # Optional: For real extraction (otherwise uses Regex Mock)
   OPENAI_API_KEY=sk-...
   ```

2. **Install:**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run:**
   ```bash
   python main.py
   ```

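As a rough sketch of how the settings from step 1 might be consumed, the helper below validates the three required variables and reports whether the optional key is present. It assumes the `.env` values have already been exported into the process environment (for example by a dotenv loader); the function name and return shape are hypothetical and not taken from `main.py`. The dictionary keys are chosen to match the `client_id`, `client_secret`, and `user_agent` keyword arguments of `praw.Reddit`.

```python
import os

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def load_reddit_config(env=None):
    """Return (reddit_kwargs, use_llm_extraction) built from environment variables.

    Raises RuntimeError listing every required variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    reddit_kwargs = {
        "client_id": env["REDDIT_CLIENT_ID"],
        "client_secret": env["REDDIT_CLIENT_SECRET"],
        "user_agent": env["REDDIT_USER_AGENT"],
    }
    # Without OPENAI_API_KEY the README says the regex mock extractor is used.
    use_llm_extraction = bool(env.get("OPENAI_API_KEY"))
    return reddit_kwargs, use_llm_extraction
```

The returned dictionary could then be passed as `praw.Reddit(**reddit_kwargs)`.
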
## Output
Generates `tier5_social_graph.jsonl`.
Entries look like:
```json
{
  "subject": "semaglutide",
  "predicate": "side_effect",
  "object": "gastroparesis",
  "source_class": 5,
  "source_metadata": {
    "type": "reddit_post",
    "subreddit": "Ozempic",
    "severity": "high"
  }
}
```

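For downstream consumers, a file in this shape can be read one JSON object per line. The sketch below is illustrative only: the helper names are hypothetical, and it relies solely on the fields shown in the sample entry above.

```python
import json
from collections import Counter


def read_assertions(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_by_object(entries, severity="high"):
    """Count reported objects (e.g. side effects) among entries of a given severity."""
    return Counter(
        entry["object"]
        for entry in entries
        if entry.get("source_metadata", {}).get("severity") == severity
    )
```

For example, `count_by_object(read_assertions("tier5_social_graph.jsonl"))` would tally how often each high-severity `object` appears.
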
## Privacy Note
Author names are hashed (`hash(post["author"])`) before storage. This provides basic anonymization while still allowing "Cluster" analysis: distinguishing one user posting repeatedly from many unique users reporting the same issue.
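One caveat if the built-in `hash()` is used literally: Python randomizes string hashes per interpreter process unless `PYTHONHASHSEED` is fixed, so the same author can map to different values across runs, which would undermine cluster analysis over time. A possible alternative, shown here as a sketch and not as the project's implementation, is a keyed SHA-256 digest from the standard library. The function name and the `secret_key` parameter are assumptions.

```python
import hashlib
import hmac


def pseudonymize_author(author, secret_key):
    """Return a stable, keyed SHA-256 hex digest for a Reddit username.

    secret_key is a bytes value kept out of the dataset (hypothetical parameter);
    the same key and author always produce the same digest across runs.
    """
    return hmac.new(secret_key, author.encode("utf-8"), hashlib.sha256).hexdigest()
```

Keeping the key secret makes it harder to recover usernames by hashing a list of known accounts, while identical authors still collapse to the same identifier.
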