# Lessons Learned: Benchmark Against Semgrep
**Date:** 2026-02-03
**Type:** Post-mortem / Improvement Plan
---
## Issues Discovered
### Issue 1: Corpus Not Integrated in Ephemeral Mode
**Severity:** HIGH
**Impact:** Only 11 hardcoded assertions used instead of full RFC/OWASP corpus
**Root Cause:** `EphemeralDetector::new()` calls `create_authoritative_corpus()`, which returns only the 11 hardcoded assertions. The `CorpusRegistry` and the RFC/OWASP builders from Phase 1 are never invoked.
**Current Flow:**
```
EphemeralDetector::new()
  → create_authoritative_corpus()   // 11 hardcoded assertions
  → ConceptIndex::build()
```
**Expected Flow:**
```
EphemeralDetector::new()
  → CorpusRegistry::new()
  → registry.register(HardcodedCorpusBuilder)
  → registry.register(RfcCorpusBuilder)      // RFC 7519, 5246, etc.
  → registry.register(OwaspCorpusBuilder)    // Cheat sheets
  → registry.register(VendorCorpusBuilder)   // PostgreSQL, Redis, etc.
  → registry.build_all()
  → ConceptIndex::build()
```
**Fix:**
```rust
// In src/episteme/mod.rs
impl EphemeralDetector {
    pub fn new(signing_key: &SigningKey) -> Self {
        // Build the full corpus from all sources.
        // The binding is `mut` on the assumption that `register` takes `&mut self`.
        let mut registry = CorpusRegistry::new();
        registry.register(Box::new(HardcodedCorpusBuilder::new()));
        registry.register(Box::new(RfcCorpusBuilder::new()));
        registry.register(Box::new(OwaspCorpusBuilder::new()));
        registry.register(Box::new(VendorCorpusBuilder::new()));

        let corpus = registry.build_all(signing_key);
        let index = ConceptIndex::build(&corpus);

        Self { corpus, index }
    }
}
```
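
The registry wiring above can be tried in isolation with a minimal sketch. Everything here is a hypothetical stand-in: `Assertion`, the `CorpusBuilder` trait, and `StaticBuilder` are illustration-only names, signing is left out, and the real Aphoria types may differ.

```rust
// Hypothetical stand-ins for illustration; not Aphoria's actual types.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
}

/// A source of authoritative assertions (hardcoded, RFC, OWASP, vendor, ...).
pub trait CorpusBuilder {
    fn build(&self) -> Vec<Assertion>;
}

#[derive(Default)]
pub struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registration mutates the registry, so callers need a `mut` binding.
    pub fn register(&mut self, builder: Box<dyn CorpusBuilder>) {
        self.builders.push(builder);
    }

    /// Concatenate the assertions of every builder, in registration order.
    pub fn build_all(&self) -> Vec<Assertion> {
        self.builders.iter().flat_map(|b| b.build()).collect()
    }
}

/// Toy builder that returns a fixed list of assertions.
pub struct StaticBuilder {
    pub assertions: Vec<Assertion>,
}

impl CorpusBuilder for StaticBuilder {
    fn build(&self) -> Vec<Assertion> {
        self.assertions.clone()
    }
}
```

With this shape, adding a new corpus source is one extra `register` call, and the detector never needs to know how many sources exist.
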
**Effort:** 2 hours
**Priority:** P0 (blocks value demonstration)
---
### Issue 2: No Diagnostic Output for Claim→Conflict Pipeline
**Severity:** MEDIUM
**Impact:** Can't debug why 458 claims produced 0 conflicts
**Root Cause:** No visibility into:
- What claims are extracted (subjects, predicates, values)
- What index keys are generated
- What lookups are attempted
- Why no matches were found
**Fix:** Add a `--debug` flag that outputs:
```json
{
  "claims": [
    {
      "concept_path": "code://rust/sqlx/sqlx-mysql/connection/auth/crypto",
      "predicate": "algorithm",
      "value": "sha1",
      "index_key": "auth/crypto::algorithm",
      "corpus_match": null,
      "reason": "No authoritative assertion for auth/crypto::algorithm"
    }
  ]
}
```
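
A rough sketch of how one trace entry could be produced. The key scheme (last two path segments plus the predicate) is inferred from the single example above and is an assumption, as are the `index_key` and `trace_claim` helpers and the use of a plain `HashMap` as the corpus index.

```rust
use std::collections::HashMap;

/// Assumed key scheme: last two path segments, then "::", then the predicate.
pub fn index_key(concept_path: &str, predicate: &str) -> String {
    // Drop the scheme prefix ("code://") if present.
    let path = concept_path.split("://").last().unwrap_or(concept_path);
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let start = segments.len().saturating_sub(2);
    format!("{}::{}", segments[start..].join("/"), predicate)
}

/// One row of the hypothetical `--debug` trace.
#[derive(Debug, PartialEq)]
pub struct TraceEntry {
    pub index_key: String,
    pub corpus_match: Option<String>,
    pub reason: String,
}

/// Look the claim up in the corpus and record why it did or did not match.
pub fn trace_claim(
    concept_path: &str,
    predicate: &str,
    corpus: &HashMap<String, String>,
) -> TraceEntry {
    let key = index_key(concept_path, predicate);
    let corpus_match = corpus.get(&key).cloned();
    let reason = match corpus_match {
        Some(_) => format!("Matched authoritative assertion for {}", key),
        None => format!("No authoritative assertion for {}", key),
    };
    TraceEntry { index_key: key, corpus_match, reason }
}
```

Recording the reason at lookup time means a run with zero conflicts still explains itself: every claim shows either the assertion it was compared against or the key that had no coverage.
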
**Effort:** 4 hours
**Priority:** P1 (blocks debugging)
---
### Issue 3: Extractors Without Corpus Coverage
**Severity:** MEDIUM
**Impact:** Wasted extraction work, confusing claim counts
**Current State:**
| Extractor | Claims Extracted | Conflicts Possible |
|-----------|------------------|-------------------|
| `tls_verify` | Yes | Yes |
| `jwt_config` | Yes | Yes |
| `timeout_config` | Yes | **No** |
| `dep_versions` | Yes | **No** |
`timeout_config` and `dep_versions` extract claims, but no authoritative corpus exists to compare them against. This inflates `claims_extracted` with claims that can never produce a conflict.
**Fix Options:**
A) Add authoritative assertions for timeouts/deps:
```rust
create_authoritative_assertion(
    "vendor://postgresql/connection/idle_timeout",
    "config_value",
    ObjectValue::Text("recommended_range:1000-30000ms"),
    SourceClass::Observational,
    "PostgreSQL docs recommend idle timeout between 1-30 seconds",
);
```
B) Skip extraction for concepts that can't produce conflicts:
```rust
// Only extract claims for concepts with authoritative coverage
if !corpus_index.has_coverage(concept_path) {
    continue;
}
```
C) Report separately:
```
Claims extracted: 458 (97 with corpus coverage, 361 without)
Conflicts found: 3
```
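
Option C needs little more than a count over the claim list. A minimal sketch, assuming each claim carries its index key and that coverage is a set of keys; `Claim` and `coverage_summary` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical claim shape: only the index key matters for coverage.
pub struct Claim {
    pub index_key: String,
}

/// Build the "Claims extracted" line, split by corpus coverage.
pub fn coverage_summary(claims: &[Claim], covered_keys: &HashSet<String>) -> String {
    let with_coverage = claims
        .iter()
        .filter(|c| covered_keys.contains(&c.index_key))
        .count();
    let without_coverage = claims.len() - with_coverage;
    format!(
        "Claims extracted: {} ({} with corpus coverage, {} without)",
        claims.len(),
        with_coverage,
        without_coverage
    )
}
```
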
**Effort:** 4-8 hours (depending on approach)
**Priority:** P2 (UX improvement)
---
### Issue 4: Missing Extractors for Common Issues
**Severity:** MEDIUM
**Impact:** Aphoria misses security patterns that other tools catch
**Missing Extractors:**
| Pattern | Priority | Example |
|---------|----------|---------|
| Weak crypto (MD5/SHA1 for non-protocol) | P1 | `Md5::new()` for password hashing |
| Hardcoded crypto constants | P1 | `const IV: [u8; 16] = [0; 16]` |
| Weak RNG | P1 | `SmallRng::seed_from_u64(42)` for crypto keys |
| SQL in strings | P2 | `format!("SELECT * FROM {} WHERE", table)` |
| Command injection | P2 | `Command::new(user_input)` |
**Fix:** Create new extractors with corresponding authoritative assertions.
**Effort:** 2-4 hours per extractor
**Priority:** P1 (expands value)
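
As a starting point, a weak-hash extractor could be as small as the sketch below. It uses plain substring matching rather than parsing, so it is deliberately naive. The `Claim` shape, the pattern list, and `extract_weak_hashes` are assumptions for illustration and do not reflect Aphoria's extractor API.

```rust
/// Hypothetical claim shape for this sketch.
#[derive(Debug, PartialEq)]
pub struct Claim {
    pub line: usize, // 1-based line number
    pub predicate: &'static str,
    pub value: &'static str,
}

/// Constructor calls that indicate a weak hash, with the algorithm they imply.
const WEAK_HASH_PATTERNS: [(&str, &str); 2] = [("Md5::new(", "md5"), ("Sha1::new(", "sha1")];

/// Emit one `algorithm` claim per weak-hash constructor found in the source.
pub fn extract_weak_hashes(source: &str) -> Vec<Claim> {
    let mut claims = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        // Naive: drop everything after "//" so line comments are ignored.
        let code = line.split("//").next().unwrap_or("");
        for &(pattern, algorithm) in WEAK_HASH_PATTERNS.iter() {
            if code.contains(pattern) {
                claims.push(Claim {
                    line: idx + 1,
                    predicate: "algorithm",
                    value: algorithm,
                });
            }
        }
    }
    claims
}
```

Each emitted claim only becomes a conflict once the corpus holds a matching authoritative assertion, so the extractor and its assertion should land together.
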
---
### Issue 5: Test File Skipping Not Visible
**Severity:** LOW
**Impact:** Confusion when comparing to other tools
**Current Output:**
```
Scanned: 56 files | Claims: 127 | Conflicts: 0
```
**Expected Output:**
```
Scanned: 56 files (24 test files skipped) | Claims: 127 | Conflicts: 0
```
**Fix:** Track and report skipped files.
**Effort:** 1 hour
**Priority:** P3 (polish)
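
A minimal sketch of the tracking, assuming test files are recognized by a `tests` directory segment or a `_test.rs` suffix. That heuristic and the function names are illustrative and may not match Aphoria's actual skip rules.

```rust
/// Assumed heuristic: anything under a `tests` directory or ending in `_test.rs`.
pub fn is_test_file(path: &str) -> bool {
    path.split('/').any(|segment| segment == "tests") || path.ends_with("_test.rs")
}

/// Split paths into (files to scan, skipped test files), preserving order.
pub fn split_test_files<'a>(paths: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    paths.iter().copied().partition(|p| !is_test_file(p))
}

/// Format the summary line, including the skipped-file count.
pub fn summary_line(scanned: usize, skipped: usize, claims: usize, conflicts: usize) -> String {
    format!(
        "Scanned: {} files ({} test files skipped) | Claims: {} | Conflicts: {}",
        scanned, skipped, claims, conflicts
    )
}
```
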
---
### Issue 6: Library vs Application Not Documented
**Severity:** LOW
**Impact:** Users may be confused when Aphoria finds nothing in well-maintained libraries
**Root Cause:** Aphoria is designed for application code, where developers make configuration mistakes. Well-maintained libraries such as reqwest and sqlx generally ship with secure defaults, so there is little for Aphoria to flag.
**Fix:** Add to README and skill:
```markdown
## Best For
Aphoria excels at scanning **application code** where developers:
- Configure TLS, JWT, CORS, rate limiting
- Hardcode secrets or API keys
- Override secure defaults
It's less useful for **library code** which typically:
- Has secure defaults by design
- Exposes configuration APIs rather than making configuration decisions
```
**Effort:** 30 minutes
**Priority:** P3 (documentation)
---
## Action Items
| Priority | Issue | Fix | Effort |
|----------|-------|-----|--------|
| **P0** | Corpus not integrated | Wire CorpusRegistry into EphemeralDetector | 2h |
| **P1** | No diagnostic output | Add `--debug` flag with claim→conflict trace | 4h |
| **P1** | Missing extractors | Add weak crypto, hardcoded constants extractors | 4-8h |
| **P2** | Extractors without corpus | Add authoritative assertions OR report separately | 4-8h |
| **P3** | Test file skipping invisible | Add "(N test files skipped)" to output | 1h |
| **P3** | Library vs app undocumented | Update README and skill | 30m |
**Total Effort:** 15-24 hours
---
## Architectural Insight
The benchmark validated Aphoria's approach:
- **100% precision** supports the knowledge-graph method: every conflict it reported was a true positive
- But **recall is limited by corpus coverage**
The path forward is:
1. Expand the corpus (more RFCs, more OWASP, more vendor docs)
2. Add extractors for patterns that have authoritative guidance
3. Don't extract patterns that have no corpus coverage (they cost the user attention without any chance of a finding)
The "Claims extracted" number should represent "things we checked against authoritative sources," not "things we regex-matched."
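
For reference, the two metrics discussed above are precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below only illustrates the arithmetic. The helper names are hypothetical, and any counts passed to them are made-up inputs, not benchmark results.

```rust
/// Fraction of reported conflicts that were real. `None` if nothing was reported.
pub fn precision(true_positives: u32, false_positives: u32) -> Option<f64> {
    let reported = true_positives + false_positives;
    if reported == 0 {
        None
    } else {
        Some(f64::from(true_positives) / f64::from(reported))
    }
}

/// Fraction of real issues that were reported. `None` if there were no real issues.
pub fn recall(true_positives: u32, false_negatives: u32) -> Option<f64> {
    let actual = true_positives + false_negatives;
    if actual == 0 {
        None
    } else {
        Some(f64::from(true_positives) / f64::from(actual))
    }
}
```

A tool that reports 3 real conflicts and no false ones has precision 1.0 regardless of how much it misses. If it also misses 9 real issues, its recall is 0.25, which is the gap that corpus expansion is meant to close.
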