Lessons Learned: Benchmark Against Semgrep
Date: 2026-02-03 Type: Post-mortem / Improvement Plan
Issues Discovered
Issue 1: Corpus Not Integrated in Ephemeral Mode
Severity: HIGH Impact: Only 11 hardcoded assertions used instead of full RFC/OWASP corpus
Root Cause: EphemeralDetector::new() calls create_authoritative_corpus() which returns the hardcoded 11 assertions. The CorpusRegistry and RFC/OWASP builders from Phase 1 are not invoked.
Current Flow:
EphemeralDetector::new()
→ create_authoritative_corpus() // 11 hardcoded assertions
→ ConceptIndex::build()
Expected Flow:
EphemeralDetector::new()
→ CorpusRegistry::new()
→ registry.register(HardcodedCorpusBuilder)
→ registry.register(RfcCorpusBuilder) // RFC 7519, 5246, etc.
→ registry.register(OwaspCorpusBuilder) // Cheat sheets
→ registry.register(VendorCorpusBuilder) // PostgreSQL, Redis, etc.
→ registry.build_all()
→ ConceptIndex::build()
Fix:
// In src/episteme/mod.rs
impl EphemeralDetector {
    pub fn new(signing_key: &SigningKey) -> Self {
        // Build the full corpus from all sources.
        // The registry must be mutable because register() adds builders to it.
        let mut registry = CorpusRegistry::new();
        registry.register(Box::new(HardcodedCorpusBuilder::new()));
        registry.register(Box::new(RfcCorpusBuilder::new()));
        registry.register(Box::new(OwaspCorpusBuilder::new()));
        registry.register(Box::new(VendorCorpusBuilder::new()));
        let corpus = registry.build_all(signing_key);
        // Index the merged corpus so claims can be looked up by concept.
        let index = ConceptIndex::build(&corpus);
        Self { corpus, index }
    }
}
Effort: 2 hours Priority: P0 (blocks value demonstration)
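The registry pattern in the expected flow can be sketched in isolation. This is a minimal sketch, not Aphoria's actual code: the `CorpusBuilder` trait, the `Assertion` fields, `source_names`, and `StaticBuilder` are all hypothetical stand-ins, and the signing key that `build_all` takes in the fix above is left out to keep the example self-contained.

```rust
/// Simplified stand-in for an authoritative assertion (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
}

/// Hypothetical trait: one implementation per corpus source.
pub trait CorpusBuilder {
    fn name(&self) -> &str;
    fn build(&self) -> Vec<Assertion>;
}

/// Collects builders and merges their output into one corpus.
#[derive(Default)]
pub struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registration mutates the registry, so callers need a `mut` binding.
    pub fn register(&mut self, builder: Box<dyn CorpusBuilder>) {
        self.builders.push(builder);
    }

    /// Concatenates the assertions of all builders in registration order.
    pub fn build_all(&self) -> Vec<Assertion> {
        self.builders.iter().flat_map(|b| b.build()).collect()
    }

    /// Names of the registered sources, useful for diagnostics.
    pub fn source_names(&self) -> Vec<&str> {
        self.builders.iter().map(|b| b.name()).collect()
    }
}

/// Toy builder returning a fixed list; stands in for the RFC, OWASP,
/// and vendor builders, whose real implementations are not shown here.
pub struct StaticBuilder {
    name: &'static str,
    assertions: Vec<Assertion>,
}

impl StaticBuilder {
    pub fn new(name: &'static str, entries: &[(&str, &str, &str)]) -> Self {
        let assertions = entries
            .iter()
            .map(|&(path, predicate, value)| Assertion {
                concept_path: path.to_string(),
                predicate: predicate.to_string(),
                value: value.to_string(),
            })
            .collect();
        Self { name, assertions }
    }
}

impl CorpusBuilder for StaticBuilder {
    fn name(&self) -> &str {
        self.name
    }

    fn build(&self) -> Vec<Assertion> {
        self.assertions.clone()
    }
}
```

With this shape, adding a new corpus source means registering one more builder; the detector's constructor does not need to know how each source produces its assertions.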
Issue 2: No Diagnostic Output for Claim→Conflict Pipeline
Severity: MEDIUM Impact: Can't debug why 458 claims produced 0 conflicts
Root Cause: No visibility into:
- What claims are extracted (subjects, predicates, values)
- What index keys are generated
- What lookups are attempted
- Why no matches are found
Fix: Add --debug flag that outputs:
{
"claims": [
{
"concept_path": "code://rust/sqlx/sqlx-mysql/connection/auth/crypto",
"predicate": "algorithm",
"value": "sha1",
"index_key": "auth/crypto::algorithm",
"corpus_match": null,
"reason": "No authoritative assertion for auth/crypto::algorithm"
}
]
}
Effort: 4 hours Priority: P1 (blocks debugging)
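One way to produce such a trace record is sketched below. It is a sketch under assumptions: the key derivation (last two path segments plus the predicate) is inferred only from the single example above, and `Claim`, `ClaimTrace`, `index_key`, and `trace_claim` are hypothetical names. The corpus is modelled as a plain map from index key to authoritative value, and the two non-empty reason strings are invented for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified claim as produced by an extractor.
pub struct Claim {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
}

/// One entry of the debug trace (field names follow the JSON example).
#[derive(Debug)]
pub struct ClaimTrace {
    pub index_key: String,
    pub corpus_match: Option<String>,
    pub reason: String,
}

/// Assumed key derivation: last two path segments, then "::" and the predicate.
pub fn index_key(concept_path: &str, predicate: &str) -> String {
    let segments: Vec<&str> = concept_path
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    let start = segments.len().saturating_sub(2);
    format!("{}::{}", segments[start..].join("/"), predicate)
}

/// Looks the claim up in the corpus and records why it did or did not match.
pub fn trace_claim(claim: &Claim, corpus: &HashMap<String, String>) -> ClaimTrace {
    let key = index_key(&claim.concept_path, &claim.predicate);
    let corpus_match = corpus.get(&key).cloned();
    let reason = match &corpus_match {
        None => format!("No authoritative assertion for {}", key),
        Some(expected) if *expected == claim.value => {
            format!("Claim agrees with authoritative value {}", expected)
        }
        Some(expected) => format!(
            "Conflict: authoritative value is {}, claim has {}",
            expected, claim.value
        ),
    };
    ClaimTrace {
        index_key: key,
        corpus_match,
        reason,
    }
}
```

Recording the reason at lookup time, rather than reconstructing it afterwards, keeps the trace honest about which key was actually queried.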
Issue 3: Extractors Without Corpus Coverage
Severity: MEDIUM Impact: Wasted extraction work, confusing claim counts
Current State:
| Extractor | Claims Extracted | Conflicts Possible |
|---|---|---|
| tls_verify | Yes | Yes |
| jwt_config | Yes | Yes |
| timeout_config | Yes | No |
| dep_versions | Yes | No |
timeout_config and dep_versions extract claims but there's no authoritative corpus to compare against. This inflates "claims_extracted" without producing conflicts.
Fix Options:
A) Add authoritative assertions for timeouts/deps:
create_authoritative_assertion(
"vendor://postgresql/connection/idle_timeout",
"config_value",
ObjectValue::Text("recommended_range:1000-30000ms"),
SourceClass::Observational,
"PostgreSQL docs recommend idle timeout between 1-30 seconds",
);
B) Skip extraction for concepts that have no authoritative coverage, so they are never counted:
// Only extract claims for concepts with authoritative coverage
if !corpus_index.has_coverage(concept_path) {
continue;
}
C) Report separately:
Claims extracted: 458 (97 with corpus coverage, 361 without)
Conflicts found: 3
Effort: 4-8 hours (depending on approach) Priority: P2 (UX improvement)
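Option C amounts to partitioning claims by whether their index key exists in the corpus. A minimal sketch, assuming claims and corpus entries can both be reduced to string keys; `CoverageReport`, `coverage_report`, and `format_summary` are hypothetical names, not existing Aphoria functions.

```rust
use std::collections::HashSet;

/// Counts of claims with and without an authoritative counterpart.
#[derive(Debug, PartialEq)]
pub struct CoverageReport {
    pub covered: usize,
    pub uncovered: usize,
}

/// Splits claim keys by whether the corpus has an assertion for them.
pub fn coverage_report(claim_keys: &[&str], corpus_keys: &HashSet<String>) -> CoverageReport {
    let covered = claim_keys
        .iter()
        .filter(|key| corpus_keys.contains(**key))
        .count();
    CoverageReport {
        covered,
        uncovered: claim_keys.len() - covered,
    }
}

/// Renders the two-line summary in the format proposed for option C.
pub fn format_summary(report: &CoverageReport, conflicts: usize) -> String {
    format!(
        "Claims extracted: {} ({} with corpus coverage, {} without)\nConflicts found: {}",
        report.covered + report.uncovered,
        report.covered,
        report.uncovered,
        conflicts
    )
}
```

The same `contains` check could back the `has_coverage` guard in option B; the difference is only whether uncovered claims are dropped or reported.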
Issue 4: Missing Extractors for Common Issues
Severity: MEDIUM Impact: Aphoria misses security patterns that other tools catch
Missing Extractors:
| Pattern | Priority | Example |
|---|---|---|
| Weak crypto (MD5/SHA1 for non-protocol) | P1 | Md5::new() for password hashing |
| Hardcoded crypto constants | P1 | const IV: [u8; 16] = [0; 16] |
| Weak RNG | P1 | rand::thread_rng() for crypto keys |
| SQL in strings | P2 | format!("SELECT * FROM {} WHERE", table) |
| Command injection | P2 | Command::new(user_input) |
Fix: Create new extractors with corresponding authoritative assertions.
Effort: 2-4 hours per extractor Priority: P1 (expands value)
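As a rough illustration of what the simplest of these extractors might look like, the sketch below flags weak hash constructors by plain substring matching. Everything here is an assumption for illustration: `ExtractedClaim`, `WEAK_HASH_PATTERNS`, and `extract_weak_crypto` are hypothetical names, the pattern list is deliberately tiny, and the comment stripping is naive (it would also cut a line at a `//` inside a string literal). A production extractor would more likely work on a parsed syntax tree.

```rust
/// Hypothetical claim shape emitted by the extractor.
#[derive(Debug, PartialEq)]
pub struct ExtractedClaim {
    pub line: usize,
    pub predicate: &'static str,
    pub value: &'static str,
}

/// Constructor calls treated as evidence of a weak hash (illustrative list).
const WEAK_HASH_PATTERNS: &[(&str, &str)] = &[
    ("Md5::new(", "md5"),
    ("Sha1::new(", "sha1"),
];

/// Scans source text line by line and emits one claim per weak-hash call.
pub fn extract_weak_crypto(source: &str) -> Vec<ExtractedClaim> {
    let mut claims = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        // Naive comment stripping: ignore everything after the first "//".
        let code = line.split("//").next().unwrap_or("");
        for &(pattern, algorithm) in WEAK_HASH_PATTERNS.iter() {
            if code.contains(pattern) {
                claims.push(ExtractedClaim {
                    line: idx + 1, // report 1-based line numbers
                    predicate: "algorithm",
                    value: algorithm,
                });
            }
        }
    }
    claims
}
```

Each claim produced this way still needs a matching authoritative assertion in the corpus; without one it would fall into the "extracted but no conflict possible" category described in Issue 3.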
Issue 5: Test File Skipping Not Visible
Severity: LOW Impact: Confusion when comparing to other tools
Current Output:
Scanned: 56 files | Claims: 127 | Conflicts: 0
Expected Output:
Scanned: 56 files (24 test files skipped) | Claims: 127 | Conflicts: 0
Fix: Track and report skipped files.
Effort: 1 hour Priority: P3 (polish)
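Tracking skipped files only requires counting them at the point where they are filtered out. The sketch below assumes a path-based heuristic for what counts as a test file, since the rule Aphoria actually uses is not described here; `is_test_file`, `ScanStats`, `classify`, and `format_scan_line` are hypothetical names. It also assumes the "Scanned" count excludes skipped files.

```rust
/// Assumed heuristic: files under a tests/ directory or ending in _test.rs.
pub fn is_test_file(path: &str) -> bool {
    path.starts_with("tests/") || path.contains("/tests/") || path.ends_with("_test.rs")
}

/// Counts of scanned and skipped files for one run.
#[derive(Debug, Default, PartialEq)]
pub struct ScanStats {
    pub scanned: usize,
    pub skipped_tests: usize,
}

/// Classifies each path and counts it as scanned or skipped.
pub fn classify(paths: &[&str]) -> ScanStats {
    let mut stats = ScanStats::default();
    for path in paths {
        if is_test_file(path) {
            stats.skipped_tests += 1;
        } else {
            stats.scanned += 1;
        }
    }
    stats
}

/// Renders the summary line in the expected output format.
pub fn format_scan_line(stats: &ScanStats, claims: usize, conflicts: usize) -> String {
    format!(
        "Scanned: {} files ({} test files skipped) | Claims: {} | Conflicts: {}",
        stats.scanned, stats.skipped_tests, claims, conflicts
    )
}
```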
Issue 6: Library vs Application Not Documented
Severity: LOW Impact: Users may be confused when Aphoria finds nothing in well-maintained libraries
Root Cause: Aphoria is designed for application code where developers make configuration mistakes. Libraries like reqwest/sqlx have correct defaults.
Fix: Add to README and skill:
## Best For
Aphoria excels at scanning **application code** where developers:
- Configure TLS, JWT, CORS, rate limiting
- Hardcode secrets or API keys
- Override secure defaults
It's less useful for **library code** which typically:
- Has secure defaults by design
- Exposes configuration APIs rather than making configuration decisions
Effort: 30 minutes Priority: P3 (documentation)
Action Items
| Priority | Issue | Fix | Effort |
|---|---|---|---|
| P0 | Corpus not integrated | Wire CorpusRegistry into EphemeralDetector | 2h |
| P1 | No diagnostic output | Add --debug flag with claim→conflict trace | 4h |
| P1 | Missing extractors | Add weak crypto, hardcoded constants extractors | 4-8h |
| P2 | Extractors without corpus | Add authoritative assertions OR report separately | 4-8h |
| P3 | Test file skipping invisible | Add "(N test files skipped)" to output | 1h |
| P3 | Library vs app undocumented | Update README and skill | 30m |
Total Effort: 15-24 hours
Architectural Insight
The benchmark validated Aphoria's approach:
- 100% precision (every reported conflict was a true positive) supports the knowledge-graph method
- But recall is limited by corpus coverage
The path forward is:
- Expand the corpus (more RFCs, more OWASP, more vendor docs)
- Add extractors for patterns that have authoritative guidance
- Don't extract patterns that have no corpus coverage (reporting them wastes the user's attention)
The "Claims extracted" number should represent "things we checked against authoritative sources" not "things we regex-matched."