# VulnBank - Security Scanner Comparison Demo

VulnBank is an intentionally vulnerable multi-language application designed to demonstrate the precision difference between Aphoria's knowledge-graph approach and traditional pattern-matching security scanners.

## DO NOT USE IN PRODUCTION

This codebase contains **real security vulnerabilities** for testing purposes. Never deploy this code.

## Quick Start

```bash
# Build Aphoria
cd /path/to/stemedb
cargo build --release -p aphoria

# Run the benchmark
cd docs/demo/vulnbank
./benchmark.sh
```

## What's Inside

| Language | Files | Vulnerability Types |
|----------|-------|---------------------|
| Rust | 5 | JWT, CORS, TLS, Crypto, Secrets |
| Python | 3 | TLS, SQL Injection, Command Injection |
| Go | 3 | JWT, CORS, TLS, Crypto |
| Node.js | 3 | TLS, JWT, SQL Injection, Command Injection |
| Config | 2 | Secrets, Misconfigurations |

## Expected Results

### Aphoria Findings (~40)

| Extractor | Count | Examples |
|-----------|-------|----------|
| jwt_config | 6 | `validate_aud = false`, `Algorithm::None` |
| tls_verify | 6 | `danger_accept_invalid_certs`, `verify=False` |
| weak_crypto | 6 | `Md5::new()`, `sha1.Sum()`, `rc4.NewCipher()` |
| hardcoded_secrets | 5 | API keys, passwords in source |
| cors_config | 5 | `Access-Control-Allow-Origin: *` |
| sql_injection | 4 | `f"SELECT * FROM users WHERE id = {id}"` |
| command_injection | 4 | `shell=True`, `os.system()` |
| rate_limit | 3 | `rate_limit_enabled: false` |

**Precision: 100%** - Every finding is a real vulnerability backed by RFC/OWASP specifications.

### Semgrep Findings (~100-150)

Semgrep finds many more "issues" because it uses pattern matching:
- Flags every `unsafe` block in Rust
- Flags every `.args()` call
- Flags benign code patterns that "look dangerous"

**Precision: ~20-30%** - Most findings are false positives or low-severity noise.

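Precision here is the share of reported findings that turn out to be real vulnerabilities. The arithmetic can be sketched as follows; the `precision` helper is hypothetical, and the Semgrep counts are illustrative numbers chosen to fall inside the range quoted above, not measured results.

```python
def precision(true_positives: int, total_findings: int) -> float:
    """Return the fraction of reported findings that are real vulnerabilities."""
    if total_findings <= 0:
        raise ValueError("total_findings must be positive")
    if not 0 <= true_positives <= total_findings:
        raise ValueError("true_positives must be between 0 and total_findings")
    return true_positives / total_findings


# The Aphoria table lists 39 findings; all are assumed to be true positives.
aphoria = precision(39, 39)

# Hypothetical Semgrep run: 120 findings, 30 confirmed after manual triage.
semgrep = precision(30, 120)

print(f"Aphoria: {aphoria:.0%}, Semgrep: {semgrep:.0%}")
```
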
## Why the Difference?

### Semgrep Approach
```
Pattern: "md5" → ALERT!
Pattern: "shell=True" → ALERT!
Pattern: "*.args()" → ALERT!
```
No context. No understanding. Just pattern matching.

### Aphoria Approach
```
Assertion: "MD5 is cryptographically broken" (RFC 6151)
Code: `Md5::new()` for password hashing
Conflict: Code contradicts RFC 6151 → BLOCK
```
Every finding is backed by an authoritative source.

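The contrast between the two approaches can be illustrated with a toy scanner. This is a sketch only and is not Aphoria's implementation: the `ASSERTIONS` table, the `SECURITY_CONTEXT` keyword list, and both scan functions are hypothetical stand-ins for a real knowledge graph and real extractors.

```python
import re

# Toy knowledge base: each entry pairs an assertion with the source backing it.
# The structure and the keyword list below are assumptions made for illustration.
ASSERTIONS = {
    "md5": ("MD5 is cryptographically broken", "RFC 6151"),
}
SECURITY_CONTEXT = re.compile(r"password|token|signature", re.IGNORECASE)
MD5 = re.compile(r"md5", re.IGNORECASE)


def pattern_scan(lines):
    """Pattern matching: flag every line that mentions md5, whatever it is used for."""
    return [n for n, line in enumerate(lines, start=1) if MD5.search(line)]


def context_scan(lines):
    """Flag md5 only when it appears in a security-sensitive context,
    and attach the assertion and source that the code contradicts."""
    findings = []
    for n, line in enumerate(lines, start=1):
        code = line.split("#", 1)[0]  # ignore comments
        if MD5.search(code) and SECURITY_CONTEXT.search(code):
            claim, source = ASSERTIONS["md5"]
            findings.append((n, claim, source))
    return findings


SAMPLE = [
    "# md5 is banned here, see the security policy",
    "etag = hashlib.md5(body).hexdigest()",
    "stored = hashlib.md5(password.encode()).hexdigest()",
]

print(pattern_scan(SAMPLE))
print(context_scan(SAMPLE))
```

The pattern scan reports all three lines, including the comment and the cache-key use, while the context scan reports only the password hash and names the source it conflicts with.
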
## Files Overview

```
vulnbank/
├── rust/
│   ├── src/
│   │   ├── auth.rs      # JWT vulnerabilities
│   │   ├── cors.rs      # CORS misconfigurations
│   │   ├── tls.rs       # TLS verification disabled
│   │   ├── crypto.rs    # Weak cryptography
│   │   └── config.rs    # Hardcoded secrets
│   └── Cargo.toml
├── python/
│   ├── app.py           # TLS, secrets, rate limiting
│   ├── db.py            # SQL injection
│   └── runner.py        # Command injection
├── go/
│   ├── main.go
│   ├── handler.go       # JWT, CORS, TLS
│   └── crypto.go        # Weak cryptography
├── node/
│   ├── server.js        # TLS, CORS, JWT
│   ├── db.js            # SQL injection
│   └── exec.js          # Command injection
├── config/
│   ├── production.yaml  # Config vulnerabilities
│   └── .env.example     # Hardcoded secrets
├── benchmark.sh         # Aphoria vs Semgrep comparison
└── README.md            # This file
```

## Running Individual Scans

### Aphoria Only
```bash
aphoria scan . --format table
```

### Semgrep Only
```bash
semgrep --config=auto .
```

### JSON Output
```bash
aphoria scan . --format json > aphoria-results.json
semgrep --config=auto --json . > semgrep-results.json
```

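Once the JSON files exist, the Semgrep report can be summarized per rule to see which patterns produce the most findings. The sketch below relies only on the top-level `results` array and the `check_id` field of Semgrep's JSON output; the `count_by_rule` helper is hypothetical. Aphoria's JSON schema is not described in this README, so the sketch does not try to parse it.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_rule(report: dict) -> Counter:
    """Count Semgrep findings per rule ID (the `check_id` of each result)."""
    return Counter(result["check_id"] for result in report.get("results", []))


if __name__ == "__main__":
    path = Path("semgrep-results.json")
    if path.exists():
        counts = count_by_rule(json.loads(path.read_text()))
        # Print rules from noisiest to quietest, then the total.
        for rule, n in counts.most_common():
            print(f"{n:4d}  {rule}")
        print(f"{sum(counts.values()):4d}  total")
```
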
## Understanding the Vulnerabilities

### JWT Misconfigurations
- `validate_aud = false` - Tokens can be replayed across services
- `Algorithm::None` - Attackers can forge unsigned tokens
- Missing algorithm validation - Algorithm confusion attacks

### TLS Issues
- `verify=False` / `InsecureSkipVerify` - Man-in-the-middle attacks
- TLS 1.0/1.1 allowed - Known protocol vulnerabilities

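The safe counterpart to both TLS issues, sketched with Python's standard `ssl` module: keep certificate and hostname verification on, and refuse anything older than TLS 1.2. The `strict_client_context` helper name is an assumption for illustration.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and refuses TLS 1.0/1.1."""
    # create_default_context() enables CERT_REQUIRED and hostname checking.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = strict_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname, ctx.minimum_version.name)
```

The resulting context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=ctx)`.
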
### Weak Cryptography
- MD5 - Collision attacks practical since 2004
- SHA-1 - SHAttered collision attack demonstrated in 2017
- DES/RC4 - Brute-forceable / multiple known attacks

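As a point of comparison, this sketch shows what the weak primitives are typically replaced with in Python: a salted, iterated key-derivation function for passwords and SHA-256 for fingerprints. The helper names and the iteration count are assumptions, not values taken from VulnBank.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor; tune it for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a password hash with PBKDF2-HMAC-SHA256 and a random 16-byte salt."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def fingerprint(data: bytes) -> str:
    """Content fingerprint using SHA-256 instead of MD5 or SHA-1."""
    return hashlib.sha256(data).hexdigest()


salt, stored = hash_password("correct horse battery staple")
print(check_password("correct horse battery staple", salt, stored))
print(check_password("wrong guess", salt, stored))
```
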
### Injection Vulnerabilities
- SQL Injection - f-strings, `format()`, concatenation
- Command Injection - `shell=True`, `os.system()`, `exec()`

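The fixes for both injection classes follow the same idea: keep untrusted input out of the string that gets parsed. A minimal sketch using the standard library, where the `users` table and both helper functions are hypothetical:

```python
import sqlite3
import subprocess
import sys


def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up a user with a bound parameter; `name` is never parsed as SQL."""
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


def echo_argument(value: str) -> str:
    """Pass untrusted input as a single argv element; no shell interprets it."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", value],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

print(find_user(conn, "alice"))               # matches one row
print(find_user(conn, "alice' OR '1'='1"))    # treated as a literal name, matches nothing
print(echo_argument("hello; echo injected"))  # the semicolon is plain text, not a separator
```
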
### Secrets Management
- Hardcoded API keys in source code
- Passwords in configuration files
- Credentials in `.env.example` files

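A common alternative to hardcoding is to read secrets from the environment at startup and fail fast when one is missing. This is a sketch under assumptions: the `require_secret` helper and the `VULNBANK_API_KEY` variable name are hypothetical and do not appear in the VulnBank sources.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_secret(name: str) -> str:
    """Return the named environment variable, or fail loudly if it is unset or empty."""
    value = os.environ.get(name, "")
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Hypothetical usage: the value comes from the deployment environment, not the repo.
os.environ.setdefault("VULNBANK_API_KEY", "set-by-deployment")
api_key = require_secret("VULNBANK_API_KEY")
print(len(api_key) > 0)
```
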
## Learn More

- [Aphoria Documentation](../../../applications/aphoria/README.md)
- [Episteme Knowledge Graph](../../../what-is-episteme.md)
- [RFC 7519 - JWT](https://datatracker.ietf.org/doc/html/rfc7519)
- [OWASP Top 10](https://owasp.org/www-project-top-ten/)