# Lessons Learned: Benchmark Against Semgrep

**Date:** 2026-02-03
**Type:** Post-mortem / Improvement Plan

---

## Issues Discovered

### Issue 1: Corpus Not Integrated in Ephemeral Mode

**Severity:** HIGH
**Impact:** Only 11 hardcoded assertions used instead of full RFC/OWASP corpus

**Root Cause:** `EphemeralDetector::new()` calls `create_authoritative_corpus()` which returns the hardcoded 11 assertions. The `CorpusRegistry` and RFC/OWASP builders from Phase 1 are not invoked.

**Current Flow:**
```
EphemeralDetector::new()
  → create_authoritative_corpus()  // 11 hardcoded assertions
  → ConceptIndex::build()
```

**Expected Flow:**
```
EphemeralDetector::new()
  → CorpusRegistry::new()
  → registry.register(HardcodedCorpusBuilder)
  → registry.register(RfcCorpusBuilder)      // RFC 7519, 5246, etc.
  → registry.register(OwaspCorpusBuilder)    // Cheat sheets
  → registry.register(VendorCorpusBuilder)   // PostgreSQL, Redis, etc.
  → registry.build_all()
  → ConceptIndex::build()
```

**Fix:**
```rust
// In src/episteme/mod.rs
impl EphemeralDetector {
    pub fn new(signing_key: &SigningKey) -> Self {
        // Build full corpus from all sources.
        // `mut` is needed if `register` takes `&mut self`.
        let mut registry = CorpusRegistry::new();
        registry.register(Box::new(HardcodedCorpusBuilder::new()));
        registry.register(Box::new(RfcCorpusBuilder::new()));
        registry.register(Box::new(OwaspCorpusBuilder::new()));
        registry.register(Box::new(VendorCorpusBuilder::new()));

        let corpus = registry.build_all(signing_key);
        let index = ConceptIndex::build(&corpus);

        Self { corpus, index }
    }
}
```
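
The registry wiring above can be sketched in isolation. This is a minimal illustration, assuming `register` takes `&mut self` and `build_all` concatenates builder output in registration order. `Assertion`, `CorpusBuilder`, `StaticBuilder`, and all field names are hypothetical stand-ins rather than Aphoria's actual types, and signing is left out entirely.

```rust
/// Hypothetical stand-in for an authoritative assertion.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
}

/// Anything that can contribute authoritative assertions to the corpus.
pub trait CorpusBuilder {
    fn name(&self) -> &str;
    fn build(&self) -> Vec<Assertion>;
}

/// Collects builders and merges their output into one corpus.
#[derive(Default)]
pub struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Takes `&mut self`, so the caller's binding must be declared `mut`.
    pub fn register(&mut self, builder: Box<dyn CorpusBuilder>) {
        self.builders.push(builder);
    }

    /// Concatenates every builder's assertions in registration order.
    pub fn build_all(&self) -> Vec<Assertion> {
        self.builders.iter().flat_map(|b| b.build()).collect()
    }
}

/// Toy builder that emits one assertion per configured path.
pub struct StaticBuilder {
    pub source: &'static str,
    pub paths: Vec<&'static str>,
}

impl CorpusBuilder for StaticBuilder {
    fn name(&self) -> &str {
        self.source
    }

    fn build(&self) -> Vec<Assertion> {
        self.paths
            .iter()
            .map(|p| Assertion {
                concept_path: format!("{}://{}", self.source, p),
                predicate: "required".to_string(),
                value: "true".to_string(),
            })
            .collect()
    }
}
```

With a shape like this, a regression test could assert that the ephemeral corpus holds more than the 11 hardcoded assertions, which would have caught the issue before the benchmark.
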

**Effort:** 2 hours
**Priority:** P0 (blocks value demonstration)

---

### Issue 2: No Diagnostic Output for Claim→Conflict Pipeline

**Severity:** MEDIUM
**Impact:** Can't debug why 458 claims produced 0 conflicts

**Root Cause:** No visibility into:
- What claims are extracted (subjects, predicates, values)
- What index keys are generated
- What lookups are attempted
- Why no matches are found

**Fix:** Add a `--debug` flag that outputs:
```json
{
  "claims": [
    {
      "concept_path": "code://rust/sqlx/sqlx-mysql/connection/auth/crypto",
      "predicate": "algorithm",
      "value": "sha1",
      "index_key": "auth/crypto::algorithm",
      "corpus_match": null,
      "reason": "No authoritative assertion for auth/crypto::algorithm"
    }
  ]
}
```
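
One way the trace record above might be produced is sketched below. The key scheme (last two path segments, then `::`, then the predicate) is only inferred from the single example in the JSON and may not match the real `ConceptIndex`. `ClaimTrace`, `index_key`, and `trace_claim` are hypothetical names, and the corpus is modelled as a plain map from index key to expected value.

```rust
use std::collections::HashMap;

/// One entry of the hypothetical `--debug` output.
#[derive(Debug, PartialEq)]
pub struct ClaimTrace {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
    pub index_key: String,
    pub corpus_match: Option<String>,
    pub reason: String,
}

/// Assumed key scheme: last two path segments, "::", then the predicate.
pub fn index_key(concept_path: &str, predicate: &str) -> String {
    // Drop the scheme prefix ("code://") if there is one.
    let path = concept_path
        .split_once("://")
        .map(|(_, rest)| rest)
        .unwrap_or(concept_path);
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let start = segments.len().saturating_sub(2);
    format!("{}::{}", segments[start..].join("/"), predicate)
}

/// Looks the claim up in the corpus and records why it did or did not match.
pub fn trace_claim(
    concept_path: &str,
    predicate: &str,
    value: &str,
    corpus: &HashMap<String, String>,
) -> ClaimTrace {
    let key = index_key(concept_path, predicate);
    let corpus_match = corpus.get(&key).cloned();
    let reason = match &corpus_match {
        Some(expected) if expected == value => {
            format!("Matches authoritative value for {}", key)
        }
        Some(expected) => {
            format!("Conflicts with authoritative value '{}' for {}", expected, key)
        }
        None => format!("No authoritative assertion for {}", key),
    };
    ClaimTrace {
        concept_path: concept_path.to_string(),
        predicate: predicate.to_string(),
        value: value.to_string(),
        index_key: key,
        corpus_match,
        reason,
    }
}
```

Recording the reason at lookup time keeps the trace and the detection logic from drifting apart, since both read the same `corpus_match` result.
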

**Effort:** 4 hours
**Priority:** P1 (blocks debugging)

---

### Issue 3: Extractors Without Corpus Coverage

**Severity:** MEDIUM
**Impact:** Wasted extraction work, confusing claim counts

**Current State:**
| Extractor | Claims Extracted | Conflicts Possible |
|-----------|------------------|--------------------|
| `tls_verify` | Yes | Yes |
| `jwt_config` | Yes | Yes |
| `timeout_config` | Yes | **No** |
| `dep_versions` | Yes | **No** |

`timeout_config` and `dep_versions` extract claims, but there is no authoritative corpus to compare them against. This inflates `claims_extracted` without producing any conflicts.

**Fix Options:**

A) Add authoritative assertions for timeouts/deps:
```rust
create_authoritative_assertion(
    "vendor://postgresql/connection/idle_timeout",
    "config_value",
    ObjectValue::Text("recommended_range:1000-30000ms"),
    SourceClass::Observational,
    "PostgreSQL docs recommend idle timeout between 1-30 seconds",
);
```

B) Don't count claims that can't produce conflicts:
```rust
// Only extract claims for concepts with authoritative coverage
if !corpus_index.has_coverage(concept_path) {
    continue;
}
```

C) Report separately:
```
Claims extracted: 458 (97 with corpus coverage, 361 without)
Conflicts found: 3
```
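
Option C could be computed with a single pass over the extracted claims. The sketch below assumes each claim and each corpus entry can be reduced to a string index key. `CoverageReport` and its methods are hypothetical names, not existing Aphoria APIs.

```rust
use std::collections::HashSet;

/// Counts of extracted claims, split by whether the corpus can judge them.
#[derive(Debug, PartialEq)]
pub struct CoverageReport {
    pub covered: usize,
    pub uncovered: usize,
}

impl CoverageReport {
    /// Classifies each claim key by membership in the set of corpus keys.
    pub fn from_keys<'a>(
        claim_keys: impl IntoIterator<Item = &'a str>,
        corpus_keys: &HashSet<String>,
    ) -> Self {
        let mut report = CoverageReport { covered: 0, uncovered: 0 };
        for key in claim_keys {
            if corpus_keys.contains(key) {
                report.covered += 1;
            } else {
                report.uncovered += 1;
            }
        }
        report
    }

    /// Renders the two-line summary in the format proposed in option C.
    pub fn summary(&self, conflicts: usize) -> String {
        format!(
            "Claims extracted: {} ({} with corpus coverage, {} without)\nConflicts found: {}",
            self.covered + self.uncovered,
            self.covered,
            self.uncovered,
            conflicts
        )
    }
}
```

The same membership check could back option B: instead of counting an uncovered claim, the extractor would skip it.
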

**Effort:** 4-8 hours (depending on approach)
**Priority:** P2 (UX improvement)

---

### Issue 4: Missing Extractors for Common Issues

**Severity:** MEDIUM
**Impact:** Aphoria misses security patterns that other tools catch

**Missing Extractors:**

| Pattern | Priority | Example |
|---------|----------|---------|
| Weak crypto (MD5/SHA1 for non-protocol) | P1 | `Md5::new()` for password hashing |
| Hardcoded crypto constants | P1 | `const IV: [u8; 16] = [0; 16]` |
| Weak RNG | P1 | `rand::thread_rng()` for crypto keys |
| SQL in strings | P2 | `format!("SELECT * FROM {} WHERE", table)` |
| Command injection | P2 | `Command::new(user_input)` |

**Fix:** Create new extractors with corresponding authoritative assertions.
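
As a rough illustration of the first table row, a weak-hash extractor might start as a line scanner like the one below. This is a deliberately naive sketch: the `Claim` struct, the pattern list, and the substring matching are assumptions made for the example, and a real extractor would presumably work on parsed syntax rather than raw text.

```rust
/// Hypothetical claim shape: where the pattern was seen and what it asserts.
#[derive(Debug, PartialEq)]
pub struct Claim {
    pub line: usize,
    pub predicate: &'static str,
    pub value: &'static str,
}

/// Constructor calls treated as evidence of a weak hash algorithm.
const WEAK_HASH_PATTERNS: [(&str, &str); 2] = [
    ("Md5::new(", "md5"),
    ("Sha1::new(", "sha1"),
];

/// Scans source text line by line and emits one claim per weak-hash call.
/// Lines that are entirely `//` comments are ignored.
pub fn extract_weak_hash_claims(source: &str) -> Vec<Claim> {
    let mut claims = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let code = line.trim_start();
        if code.starts_with("//") {
            continue;
        }
        for (pattern, algorithm) in WEAK_HASH_PATTERNS {
            if code.contains(pattern) {
                claims.push(Claim {
                    line: idx + 1, // report 1-based line numbers
                    predicate: "algorithm",
                    value: algorithm,
                });
            }
        }
    }
    claims
}
```

Each claim only becomes a conflict once the corpus contains a matching authoritative assertion, so the extractor and its assertion should land together.
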

**Effort:** 2-4 hours per extractor
**Priority:** P1 (expands value)

---

### Issue 5: Test File Skipping Not Visible

**Severity:** LOW
**Impact:** Confusion when comparing to other tools

**Current Output:**
```
Scanned: 56 files | Claims: 127 | Conflicts: 0
```

**Expected Output:**
```
Scanned: 56 files (24 test files skipped) | Claims: 127 | Conflicts: 0
```

**Fix:** Track and report skipped files.
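
Tracking the skipped count could be as small as the sketch below. The `ScanStats` struct and the test-file heuristic (a `tests` directory component or a `_test` file-stem suffix) are assumptions for illustration. Aphoria's actual skip rules are not described here and may differ.

```rust
use std::path::Path;

/// Running totals for one scan.
#[derive(Debug, Default)]
pub struct ScanStats {
    pub scanned: usize,
    pub skipped_tests: usize,
    pub claims: usize,
    pub conflicts: usize,
}

/// Assumed heuristic: any `tests` path component or a `_test` file stem.
pub fn is_test_file(path: &Path) -> bool {
    let in_tests_dir = path.components().any(|c| c.as_os_str() == "tests");
    let has_test_suffix = path
        .file_stem()
        .and_then(|s| s.to_str())
        .map_or(false, |s| s.ends_with("_test"));
    in_tests_dir || has_test_suffix
}

impl ScanStats {
    /// Counts the file as scanned or skipped; returns true if it should be scanned.
    pub fn record_file(&mut self, path: &Path) -> bool {
        if is_test_file(path) {
            self.skipped_tests += 1;
            false
        } else {
            self.scanned += 1;
            true
        }
    }

    /// Renders the one-line summary, mentioning skipped files only when there are any.
    pub fn summary(&self) -> String {
        let skipped = if self.skipped_tests > 0 {
            format!(" ({} test files skipped)", self.skipped_tests)
        } else {
            String::new()
        };
        format!(
            "Scanned: {} files{} | Claims: {} | Conflicts: {}",
            self.scanned, skipped, self.claims, self.conflicts
        )
    }
}
```
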

**Effort:** 1 hour
**Priority:** P3 (polish)

---

### Issue 6: Library vs Application Not Documented

**Severity:** LOW
**Impact:** Users may be confused when Aphoria finds nothing in well-maintained libraries

**Root Cause:** Aphoria is designed for application code where developers make configuration mistakes. Libraries like reqwest/sqlx have correct defaults.

**Fix:** Add to README and skill:
```markdown
## Best For

Aphoria excels at scanning **application code** where developers:
- Configure TLS, JWT, CORS, rate limiting
- Hardcode secrets or API keys
- Override secure defaults

It's less useful for **library code** which typically:
- Has secure defaults by design
- Exposes configuration APIs rather than making configuration decisions
```

**Effort:** 30 minutes
**Priority:** P3 (documentation)

---

## Action Items

| Priority | Issue | Fix | Effort |
|----------|-------|-----|--------|
| **P0** | Corpus not integrated | Wire CorpusRegistry into EphemeralDetector | 2h |
| **P1** | No diagnostic output | Add `--debug` flag with claim→conflict trace | 4h |
| **P1** | Missing extractors | Add weak crypto, hardcoded constants extractors | 4-8h |
| **P2** | Extractors without corpus | Add authoritative assertions OR report separately | 4-8h |
| **P3** | Test file skipping invisible | Add "(N test files skipped)" to output | 1h |
| **P3** | Library vs app undocumented | Update README and skill | 30m |

**Total Effort:** 15-24 hours

---

## Architectural Insight

The benchmark validated Aphoria's approach:
- **100% precision** proves the knowledge-graph method works
- But **recall is limited by corpus coverage**

The path forward is:
1. Expand the corpus (more RFCs, more OWASP, more vendor docs)
2. Add extractors for patterns that have authoritative guidance
3. Don't extract patterns that have no corpus coverage (it wastes the user's mental energy)

The "Claims extracted" number should represent "things we checked against authoritative sources", not "things we regex-matched."