VulnBank - Security Scanner Comparison Demo
VulnBank is an intentionally vulnerable multi-language application designed to demonstrate the precision difference between Aphoria's knowledge-graph approach and traditional pattern-matching security scanners.
DO NOT USE IN PRODUCTION
This codebase contains real security vulnerabilities for testing purposes. Never deploy this code.
Quick Start
# Build Aphoria
cd /path/to/stemedb
cargo build --release -p aphoria
# Run the benchmark
cd docs/demo/vulnbank
./benchmark.sh
What's Inside
| Language | Files | Vulnerability Types |
|---|---|---|
| Rust | 5 | JWT, CORS, TLS, Crypto, Secrets |
| Python | 3 | TLS, SQL Injection, Command Injection |
| Go | 3 | JWT, CORS, TLS, Crypto |
| Node.js | 3 | TLS, JWT, SQL Injection, Command Injection |
| Config | 2 | Secrets, Misconfigurations |
Expected Results
Aphoria Findings (~40)
| Extractor | Count | Examples |
|---|---|---|
| jwt_config | 6 | validate_aud = false, Algorithm::None |
| tls_verify | 6 | danger_accept_invalid_certs, verify=False |
| weak_crypto | 6 | Md5::new(), sha1.Sum(), rc4.NewCipher() |
| hardcoded_secrets | 5 | API keys, passwords in source |
| cors_config | 5 | Access-Control-Allow-Origin: * |
| sql_injection | 4 | f"SELECT * FROM users WHERE id = {id}" |
| command_injection | 4 | shell=True, os.system() |
| rate_limit | 3 | rate_limit_enabled: false |
Precision: 100% - Every finding is a real vulnerability backed by RFC/OWASP specifications.
Semgrep Findings (~100-150)
Semgrep finds many more "issues" because it uses pattern matching:
- Flags every `unsafe` block in Rust
- Flags every `.args()` call
- Flags benign code patterns that "look dangerous"
Precision: ~20-30% - Most findings are false positives or low-severity noise.
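For reference, precision here means the share of reported findings that are real vulnerabilities. A minimal sketch of the arithmetic follows. The Aphoria total of 39 is the sum of the counts in the table above; the Semgrep numbers are purely illustrative placeholders chosen to fall inside the quoted ranges, not measured results.

```python
def precision(true_positives: int, total_findings: int) -> float:
    """Fraction of reported findings that are real vulnerabilities."""
    if total_findings == 0:
        raise ValueError("precision is undefined with zero findings")
    return true_positives / total_findings


# 39 = sum of the per-extractor counts listed in the Aphoria table.
aphoria = precision(true_positives=39, total_findings=39)

# HYPOTHETICAL numbers for illustration only: 120 findings, 30 of them real.
semgrep = precision(true_positives=30, total_findings=120)

print(f"Aphoria: {aphoria:.0%}, Semgrep: {semgrep:.0%}")
```

To reproduce the figures for a real run, each finding has to be triaged by hand as a true or false positive before the ratio is computed.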
Why the Difference?
Semgrep Approach
Pattern: "md5" → ALERT!
Pattern: "shell=True" → ALERT!
Pattern: "*.args()" → ALERT!
No context. No understanding. Just pattern matching.
Aphoria Approach
Assertion: "MD5 is cryptographically broken" (RFC 6151)
Code: `Md5::new()` for password hashing
Conflict: Code contradicts RFC 6151 → BLOCK
Every finding is backed by an authoritative source.
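The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration, not Aphoria's or Semgrep's actual implementation: the `Assertion` and `Finding` types, the comment-skipping rule, and the regular expression are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    claim: str           # what the authoritative source states
    source: str          # where the claim comes from
    usage: re.Pattern    # code construct that contradicts the claim (assumed)


@dataclass(frozen=True)
class Finding:
    line_no: int
    code: str
    claim: str
    source: str
    verdict: str


ASSERTIONS = [
    Assertion("MD5 is cryptographically broken", "RFC 6151",
              re.compile(r"\bMd5::new\(\)")),
]


def naive_scan(text: str) -> list[int]:
    """Pattern matching: flag every line that mentions 'md5' at all."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if "md5" in line.lower()]


def assertion_scan(text: str) -> list[Finding]:
    """Flag only real usages, and attach the source that the code contradicts."""
    findings = []
    for n, line in enumerate(text.splitlines(), start=1):
        code = line.strip()
        if code.startswith("//"):  # comments are not executable code
            continue
        for a in ASSERTIONS:
            if a.usage.search(code):
                findings.append(Finding(n, code, a.claim, a.source, "BLOCK"))
    return findings


SAMPLE = "// legacy: replace md5 before release\nlet mut hasher = Md5::new();\n"
print(naive_scan(SAMPLE))
for f in assertion_scan(SAMPLE):
    print(f"line {f.line_no}: {f.code} contradicts {f.source} -> {f.verdict}")
```

On this sample the naive scan reports both lines, including the comment, while the assertion-based scan reports only the call on line 2 and names RFC 6151 as the reason.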
Files Overview
vulnbank/
├── rust/
│ ├── src/
│ │ ├── auth.rs # JWT vulnerabilities
│ │ ├── cors.rs # CORS misconfigurations
│ │ ├── tls.rs # TLS verification disabled
│ │ ├── crypto.rs # Weak cryptography
│ │ └── config.rs # Hardcoded secrets
│ └── Cargo.toml
├── python/
│ ├── app.py # TLS, secrets, rate limiting
│ ├── db.py # SQL injection
│ └── runner.py # Command injection
├── go/
│ ├── main.go
│ ├── handler.go # JWT, CORS, TLS
│ └── crypto.go # Weak cryptography
├── node/
│ ├── server.js # TLS, CORS, JWT
│ ├── db.js # SQL injection
│ └── exec.js # Command injection
├── config/
│ ├── production.yaml # Config vulnerabilities
│ └── .env.example # Hardcoded secrets
├── benchmark.sh # Aphoria vs Semgrep comparison
└── README.md # This file
Running Individual Scans
Aphoria Only
aphoria scan . --format table
Semgrep Only
semgrep --config=auto .
JSON Output
aphoria scan . --format json > aphoria-results.json
semgrep --config=auto --json . > semgrep-results.json
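Once both JSON files exist, a short script can tally findings per rule. The sketch below relies on Semgrep's `--json` output storing findings under a top-level `results` list with a `check_id` per entry. The Aphoria layout is an assumption: the `findings` key and `extractor` field are hypothetical names, so inspect `aphoria-results.json` and adjust them before relying on the output.

```python
import json
from collections import Counter
from pathlib import Path


def load_items(path: str, list_key: str) -> list:
    """Return the list stored under list_key in a JSON report (top-level object assumed)."""
    data = json.loads(Path(path).read_text())
    return data.get(list_key, [])


def count_by(items: list, field: str) -> Counter:
    """Count report entries grouped by the given field."""
    return Counter(str(item.get(field, "unknown")) for item in items)


if __name__ == "__main__":
    reports = [
        # ASSUMED layout for Aphoria: "findings" and "extractor" are hypothetical names.
        ("aphoria-results.json", "findings", "extractor"),
        ("semgrep-results.json", "results", "check_id"),
    ]
    for path, list_key, field in reports:
        if not Path(path).exists():
            print(f"{path}: not found, run the scan first")
            continue
        counts = count_by(load_items(path, list_key), field)
        print(f"{path}: {sum(counts.values())} findings")
        for name, n in counts.most_common():
            print(f"  {n:4d}  {name}")
```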
Understanding the Vulnerabilities
JWT Misconfigurations
- `validate_aud = false` - Tokens can be replayed across services
- `Algorithm::None` - Attackers can forge unsigned tokens
- Missing algorithm validation - Algorithm confusion attacks
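To make these three failure modes concrete, here is a minimal standard-library sketch of a verifier that pins the algorithm and checks the audience, together with a forged `alg: none` token that it rejects. The helper names are hypothetical and the audience check is simplified to a single string; a real service would normally use a maintained JWT library rather than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(claims: dict, key: bytes) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"


def forge_unsigned(claims: dict) -> str:
    """What an attacker sends when the server accepts alg 'none'."""
    header = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    return f"{header}.{b64url(json.dumps(claims).encode())}."


def verify(token: str, key: bytes, audience: str) -> dict:
    header_b64, payload_b64, sig_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    # Pin the algorithm: never let the token choose how it is verified.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    # Check the audience so a token minted for another service is refused.
    if claims.get("aud") != audience:
        raise ValueError("wrong audience")
    return claims
```

With this verifier, a token signed for `aud: "billing"` is accepted only by the billing service, and the output of `forge_unsigned` fails at the algorithm check before any claims are read.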
TLS Issues
- `verify=False` / `InsecureSkipVerify` - Man-in-the-middle attacks
- TLS 1.0/1.1 allowed - Known protocol vulnerabilities
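For contrast with the vulnerable samples, a safe client configuration in Python might look like the sketch below. It uses only the standard `ssl` module; the function name `strict_client_context` is made up for this example.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses TLS 1.0/1.1."""
    # create_default_context() enables certificate and hostname verification.
    ctx = ssl.create_default_context()
    # Raise the protocol floor so legacy TLS versions are never negotiated.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context can then be passed to a client call, for example `urllib.request.urlopen(url, context=strict_client_context())`.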
Weak Cryptography
- MD5 - Collision attacks practical since 2004
- SHA1 - SHAttered attack demonstrated in 2017
- DES/RC4 - Brute-forceable / multiple known attacks
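As a point of comparison for the password-hashing case, the sketch below replaces a bare MD5 digest with a salted, iterated key-derivation function from Python's standard library. The function names are hypothetical, and the default iteration count is an assumption that should be checked against current guidance before use.

```python
import hashlib
import hmac
import os


def hash_password(password: str, *, iterations: int = 600_000) -> tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    *, iterations: int = 600_000) -> bool:
    """Recompute the digest and compare it in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(digest, expected)
```

The salt and iteration count must be stored alongside the digest so that verification can repeat the same derivation.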
Injection Vulnerabilities
- SQL Injection - f-strings, format(), concatenation
- Command Injection - shell=True, os.system(), exec()
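The safe counterparts to these patterns are parameterized queries and argument lists without a shell. A minimal standard-library sketch follows; the `users` table layout and both function names are assumptions made for the example.

```python
import sqlite3
import subprocess
import sys


def find_user(conn: sqlite3.Connection, user_id) -> list:
    """Parameterized query: user_id is bound as data, never spliced into the SQL."""
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchall()


def echo_argument(value: str) -> str:
    """Pass value as a single argv entry; no shell ever parses it."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

An input such as `1 OR 1=1` is then compared as a literal value instead of being executed as SQL, and `hello; echo pwned` reaches the child process as one plain string.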
Secrets Management
- Hardcoded API keys in source code
- Passwords in configuration files
- Credentials in .env.example files
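A common alternative to all three patterns is to read secrets from the process environment at startup and fail fast when one is missing. The sketch below shows the idea; the function name and the `VULNBANK_API_KEY` variable in the usage note are hypothetical.

```python
import os


def require_secret(name: str) -> str:
    """Read a secret from the environment; refuse to start without it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value
```

An application would call `require_secret("VULNBANK_API_KEY")` during startup, and a committed `.env.example` would then contain only the variable name with an empty or placeholder value.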