
# Lessons Learned: Benchmark Against Semgrep

**Date:** 2026-02-03
**Type:** Post-mortem / Improvement Plan

---

## Issues Discovered

### Issue 1: Corpus Not Integrated in Ephemeral Mode

**Severity:** HIGH
**Impact:** Only 11 hardcoded assertions used instead of full RFC/OWASP corpus

**Root Cause:** `EphemeralDetector::new()` calls `create_authoritative_corpus()`, which returns only the 11 hardcoded assertions. The `CorpusRegistry` and the RFC/OWASP builders from Phase 1 are never invoked.

**Current Flow:**
```
EphemeralDetector::new()
  → create_authoritative_corpus()  // 11 hardcoded assertions
  → ConceptIndex::build()
```

**Expected Flow:**
```
EphemeralDetector::new()
  → CorpusRegistry::new()
  → registry.register(HardcodedCorpusBuilder)
  → registry.register(RfcCorpusBuilder)      // RFC 7519, 5246, etc.
  → registry.register(OwaspCorpusBuilder)    // Cheat sheets
  → registry.register(VendorCorpusBuilder)   // PostgreSQL, Redis, etc.
  → registry.build_all()
  → ConceptIndex::build()
```

**Fix:**
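```rust
// In src/episteme/mod.rs
impl EphemeralDetector {
    pub fn new(signing_key: &SigningKey) -> Self {
        // Build the full corpus from all sources. `register` adds a builder
        // to the registry, so the binding must be `mut`.
        let mut registry = CorpusRegistry::new();
        registry.register(Box::new(HardcodedCorpusBuilder::new())); // 11 baseline assertions
        registry.register(Box::new(RfcCorpusBuilder::new()));       // RFC 7519, 5246, etc.
        registry.register(Box::new(OwaspCorpusBuilder::new()));     // OWASP cheat sheets
        registry.register(Box::new(VendorCorpusBuilder::new()));    // PostgreSQL, Redis, etc.

        // Run every registered builder (passing the signing key through),
        // then index the combined corpus for lookup.
        let corpus = registry.build_all(signing_key);
        let index = ConceptIndex::build(&corpus);

        Self { corpus, index }
    }
}
```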
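The fix relies on a registry that runs several builders and concatenates their output. The sketch below is a minimal, self-contained model of that pattern, not the real `CorpusRegistry`: `Assertion`, `StaticBuilder`, `assertion()` and `sources()` are hypothetical names, signing is left out, and `register` is assumed to take `&mut self`.

```rust
// Hypothetical, self-contained model of the registry pattern used above.
// `Assertion`, `StaticBuilder`, `assertion()` and `sources()` are illustrative
// names, and signing is omitted; the real types and signatures may differ.

/// One authoritative assertion: concept path, predicate, expected value.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
}

/// A source of authoritative assertions (hardcoded, RFC, OWASP, vendor, ...).
pub trait CorpusBuilder {
    fn name(&self) -> &str;
    fn build(&self) -> Vec<Assertion>;
}

/// Holds builders and runs them in registration order.
#[derive(Default)]
pub struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Takes `&mut self`, which is why the caller's binding must be `mut`.
    pub fn register(&mut self, builder: Box<dyn CorpusBuilder>) {
        self.builders.push(builder);
    }

    /// Names of the registered sources, in registration order.
    pub fn sources(&self) -> Vec<&str> {
        self.builders.iter().map(|b| b.name()).collect()
    }

    /// Concatenates every builder's output into a single corpus.
    pub fn build_all(&self) -> Vec<Assertion> {
        self.builders.iter().flat_map(|b| b.build()).collect()
    }
}

/// Stand-in builder that returns a fixed list of assertions.
pub struct StaticBuilder {
    pub name: &'static str,
    pub items: Vec<Assertion>,
}

impl CorpusBuilder for StaticBuilder {
    fn name(&self) -> &str {
        self.name
    }
    fn build(&self) -> Vec<Assertion> {
        self.items.clone()
    }
}

pub fn assertion(path: &str, predicate: &str, value: &str) -> Assertion {
    Assertion {
        concept_path: path.to_string(),
        predicate: predicate.to_string(),
        value: value.to_string(),
    }
}

fn main() {
    let mut registry = CorpusRegistry::new();
    registry.register(Box::new(StaticBuilder {
        name: "hardcoded",
        items: vec![assertion("tls/verify", "enabled", "true")],
    }));
    registry.register(Box::new(StaticBuilder {
        name: "rfc",
        items: vec![assertion("jwt/algorithm", "allowed", "not:none")],
    }));

    let corpus = registry.build_all();
    println!("{} assertions from {:?}", corpus.len(), registry.sources());
}
```

Because each source is a trait object, adding another corpus later is one more `register` call, and nothing downstream of `build_all` has to change.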

**Effort:** 2 hours
**Priority:** P0 (blocks value demonstration)

---

### Issue 2: No Diagnostic Output for Claim→Conflict Pipeline

**Severity:** MEDIUM
**Impact:** Can't debug why 458 claims produced 0 conflicts

**Root Cause:** No visibility into:
- What claims are extracted (subjects, predicates, values)
- What index keys are generated
- What lookups are attempted
- Why no matches found

**Fix:** Add `--debug` flag that outputs:
```json
{
  "claims": [
    {
      "concept_path": "code://rust/sqlx/sqlx-mysql/connection/auth/crypto",
      "predicate": "algorithm",
      "value": "sha1",
      "index_key": "auth/crypto::algorithm",
      "corpus_match": null,
      "reason": "No authoritative assertion for auth/crypto::algorithm"
    }
  ]
}
```
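A minimal sketch of how one trace entry could be assembled. The index-key rule here (last two path segments, then `::`, then the predicate) is only inferred from the single example above and is an assumption, as are the `ClaimTrace`, `index_key` and `trace_claim` names; the corpus is modeled as a plain key-to-assertion-id map.

```rust
// Hypothetical sketch of one `--debug` trace entry. The index-key rule
// (last two path segments, "::", predicate) is inferred from the single
// example above and is an ASSUMPTION, not the real ConceptIndex scheme.
use std::collections::HashMap;

#[derive(Debug)]
pub struct ClaimTrace {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
    pub index_key: String,
    pub corpus_match: Option<String>,
    pub reason: String,
}

/// Derives a lookup key such as "auth/crypto::algorithm".
pub fn index_key(concept_path: &str, predicate: &str) -> String {
    // Strip the scheme ("code://"), then keep the final two path segments.
    let path = concept_path.rsplit("://").next().unwrap_or(concept_path);
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let tail = &segments[segments.len().saturating_sub(2)..];
    format!("{}::{}", tail.join("/"), predicate)
}

/// Looks a claim up in a key -> assertion-id map and records the outcome.
pub fn trace_claim(
    corpus: &HashMap<String, String>,
    concept_path: &str,
    predicate: &str,
    value: &str,
) -> ClaimTrace {
    let key = index_key(concept_path, predicate);
    let corpus_match = corpus.get(&key).cloned();
    let reason = match &corpus_match {
        Some(id) => format!("Matched authoritative assertion {id}"),
        None => format!("No authoritative assertion for {key}"),
    };
    ClaimTrace {
        concept_path: concept_path.to_string(),
        predicate: predicate.to_string(),
        value: value.to_string(),
        index_key: key,
        corpus_match,
        reason,
    }
}

fn main() {
    let corpus: HashMap<String, String> = HashMap::new(); // empty: every lookup misses
    let trace = trace_claim(
        &corpus,
        "code://rust/sqlx/sqlx-mysql/connection/auth/crypto",
        "algorithm",
        "sha1",
    );
    println!("{trace:#?}");
}
```

Emitting `index_key` alongside `reason` is the useful part: if the extractor and the corpus build keys differently, the mismatch shows up directly in the trace.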

**Effort:** 4 hours
**Priority:** P1 (blocks debugging)

---

### Issue 3: Extractors Without Corpus Coverage

**Severity:** MEDIUM
**Impact:** Wasted extraction work, confusing claim counts

**Current State:**
| Extractor | Claims Extracted | Conflicts Possible |
|-----------|------------------|-------------------|
| `tls_verify` | Yes | Yes |
| `jwt_config` | Yes | Yes |
| `timeout_config` | Yes | **No** |
| `dep_versions` | Yes | **No** |

`timeout_config` and `dep_versions` extract claims but there's no authoritative corpus to compare against. This inflates "claims_extracted" without producing conflicts.

**Fix Options:**

A) Add authoritative assertions for timeouts/deps:
```rust
create_authoritative_assertion(
    "vendor://postgresql/connection/idle_timeout",
    "config_value",
    ObjectValue::Text("recommended_range:1000-30000ms"),
    SourceClass::Observational,
    "PostgreSQL docs recommend idle timeout between 1-30 seconds",
);
```

B) Don't count claims that can't produce conflicts:
```rust
// Only extract claims for concepts with authoritative coverage
if !corpus_index.has_coverage(concept_path) {
    continue;
}
```

C) Report separately:
```
Claims extracted: 458 (97 with corpus coverage, 361 without)
Conflicts found: 3
```
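Option C amounts to partitioning claims by coverage before printing. A minimal sketch, assuming a claim can be reduced to its index key and that coverage can be modeled as a set of keys; `Claim`, `CoverageReport` and `covered_keys` are hypothetical stand-ins for the real claim type and `ConceptIndex`.

```rust
// Hypothetical sketch of option C. `Claim` and the `covered_keys` set stand
// in for the real claim type and ConceptIndex; only the counting is shown.
use std::collections::HashSet;

pub struct Claim {
    pub index_key: String,
}

pub struct CoverageReport {
    pub total: usize,
    pub covered: usize,
    pub conflicts: usize,
}

impl CoverageReport {
    /// Counts how many claims have at least one authoritative assertion.
    pub fn build(claims: &[Claim], covered_keys: &HashSet<String>, conflicts: usize) -> Self {
        let covered = claims
            .iter()
            .filter(|c| covered_keys.contains(&c.index_key))
            .count();
        Self { total: claims.len(), covered, conflicts }
    }

    /// Renders the two-line summary proposed in option C.
    pub fn render(&self) -> String {
        format!(
            "Claims extracted: {} ({} with corpus coverage, {} without)\nConflicts found: {}",
            self.total,
            self.covered,
            self.total - self.covered,
            self.conflicts
        )
    }
}

fn main() {
    let covered_keys: HashSet<String> = HashSet::from(["tls::verify".to_string()]);
    let claims = vec![
        Claim { index_key: "tls::verify".to_string() },
        Claim { index_key: "connection::timeout".to_string() },
        Claim { index_key: "deps::version".to_string() },
    ];
    let report = CoverageReport::build(&claims, &covered_keys, 0);
    println!("{}", report.render());
}
```

The same `covered_keys.contains(...)` test is what option B would use to drop uncovered claims instead of counting them, so B and C could share one coverage check.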

**Effort:** 4-8 hours (depending on approach)
**Priority:** P2 (UX improvement)

---

### Issue 4: Missing Extractors for Common Issues

**Severity:** MEDIUM
**Impact:** Aphoria misses security patterns that other tools catch

**Missing Extractors:**

| Pattern | Priority | Example |
|---------|----------|---------|
| Weak crypto (MD5/SHA1 outside protocol-mandated use) | P1 | `Md5::new()` for password hashing |
| Hardcoded crypto constants | P1 | `const IV: [u8; 16] = [0; 16]` |
| Weak RNG | P1 | `SmallRng::seed_from_u64(seed)` for crypto keys |
| SQL in strings | P2 | `format!("SELECT * FROM {} WHERE", table)` |
| Command injection | P2 | `Command::new(user_input)` |

**Fix:** Create new extractors with corresponding authoritative assertions.
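As a starting point for the weak-crypto row, a deliberately simple line-based sketch. The matched call patterns, the `code://<file>/crypto/hash` concept path and the `extract_weak_hashes` name are all hypothetical; a production extractor would inspect the syntax tree. It also cannot tell a password hash from a protocol-mandated SHA-1 (MySQL's native password handshake, as in the sqlx path from Issue 2, uses SHA-1), so the matching authoritative assertion would have to carry that context.

```rust
// Hypothetical text-based extractor for the weak-hash row above. The call
// patterns and the `code://<file>/crypto/hash` concept path are assumptions;
// a production extractor would inspect the syntax tree instead of raw lines.

#[derive(Debug, PartialEq)]
pub struct Claim {
    pub concept_path: String,
    pub predicate: &'static str,
    pub value: &'static str,
    pub line: usize,
}

/// Constructor calls that indicate a weak hash, with the algorithm they imply.
const WEAK_HASH_CALLS: &[(&str, &str)] = &[("Md5::new()", "md5"), ("Sha1::new()", "sha1")];

pub fn extract_weak_hashes(file: &str, source: &str) -> Vec<Claim> {
    let mut claims = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        // Crude comment stripping: drops everything after "//", which also
        // truncates string literals that contain "//".
        let code = line.split("//").next().unwrap_or("");
        for &(pattern, algorithm) in WEAK_HASH_CALLS {
            if code.contains(pattern) {
                claims.push(Claim {
                    concept_path: format!("code://{file}/crypto/hash"),
                    predicate: "algorithm",
                    value: algorithm,
                    line: idx + 1, // 1-based line number for reporting
                });
            }
        }
    }
    claims
}

fn main() {
    let source = "let digest = Md5::new();\nlet ok = Sha256::new();\n";
    for claim in extract_weak_hashes("src/auth.rs", source) {
        println!("{claim:?}");
    }
}
```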

**Effort:** 2-4 hours per extractor
**Priority:** P1 (expands value)

---

### Issue 5: Test File Skipping Not Visible

**Severity:** LOW
**Impact:** Confusion when comparing to other tools

**Current Output:**
```
Scanned: 56 files | Claims: 127 | Conflicts: 0
```

**Expected Output:**
```
Scanned: 56 files (24 test files skipped) | Claims: 127 | Conflicts: 0
```

**Fix:** Track and report skipped files.
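A minimal sketch of the tracking, assuming the scanner decides per path whether a file is a test file. The `is_test_file` heuristic below (a `tests` directory, or a `*_test.rs` / `tests.rs` file name) and the `ScanStats`/`scan` names are hypothetical; substitute whatever rule the scanner actually applies. The sketch also chooses to count only non-test files as "Scanned", since the current output does not say whether the 56 includes skipped files.

```rust
// Hypothetical sketch: count test files as they are skipped and report the
// count. `is_test_file` is an ASSUMED heuristic (a `tests` directory, or a
// `*_test.rs` / `tests.rs` file name); substitute the scanner's real rule.
use std::path::Path;

pub fn is_test_file(path: &Path) -> bool {
    let in_tests_dir = path.components().any(|c| c.as_os_str() == "tests");
    let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
    in_tests_dir || name.ends_with("_test.rs") || name == "tests.rs"
}

#[derive(Default)]
pub struct ScanStats {
    pub scanned: usize,
    pub skipped_tests: usize,
    pub claims: usize,
    pub conflicts: usize,
}

impl ScanStats {
    pub fn summary(&self) -> String {
        format!(
            "Scanned: {} files ({} test files skipped) | Claims: {} | Conflicts: {}",
            self.scanned, self.skipped_tests, self.claims, self.conflicts
        )
    }
}

/// Walks the file list, counting skipped test files instead of dropping them silently.
pub fn scan(paths: &[&str]) -> ScanStats {
    let mut stats = ScanStats::default();
    for &p in paths {
        if is_test_file(Path::new(p)) {
            stats.skipped_tests += 1;
            continue;
        }
        stats.scanned += 1;
        // Claim extraction and conflict detection would run here.
    }
    stats
}

fn main() {
    let stats = scan(&["src/lib.rs", "src/client.rs", "tests/integration.rs"]);
    println!("{}", stats.summary());
}
```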

**Effort:** 1 hour
**Priority:** P3 (polish)

---

### Issue 6: Library vs Application Not Documented

**Severity:** LOW
**Impact:** Users may be confused when Aphoria finds nothing in well-maintained libraries

**Root Cause:** Aphoria is designed for application code, where developers make configuration mistakes. Well-maintained libraries such as reqwest and sqlx already ship secure defaults, so there is little misconfiguration for Aphoria to find.

**Fix:** Add to README and skill:
```markdown
## Best For

Aphoria excels at scanning **application code** where developers:
- Configure TLS, JWT, CORS, rate limiting
- Hardcode secrets or API keys
- Override secure defaults

It's less useful for **library code** which typically:
- Has secure defaults by design
- Exposes configuration APIs rather than making configuration decisions
```

**Effort:** 30 minutes
**Priority:** P3 (documentation)

---

## Action Items

| Priority | Issue | Fix | Effort |
|----------|-------|-----|--------|
| **P0** | Corpus not integrated | Wire CorpusRegistry into EphemeralDetector | 2h |
| **P1** | No diagnostic output | Add `--debug` flag with claim→conflict trace | 4h |
| **P1** | Missing extractors | Add weak crypto, hardcoded constants extractors | 4-8h |
| **P2** | Extractors without corpus | Add authoritative assertions OR report separately | 4-8h |
| **P3** | Test file skipping invisible | Add "(N test files skipped)" to output | 1h |
| **P3** | Library vs app undocumented | Update README and skill | 30m |

**Total Effort:** 15-24 hours

---

## Architectural Insight

The benchmark validated Aphoria's approach:
- **100% precision** (every reported conflict was a real issue) supports the knowledge-graph method
- But **recall is limited by corpus coverage**
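The coverage limit can be stated briefly, assuming Aphoria can only report a conflict for an issue that some authoritative assertion covers, and counting each issue once. Let $I$ be the set of real issues in the scanned code, $I_{\text{cov}} \subseteq I$ the issues with corpus coverage, and $TP$, $FP$, $FN$ the usual true-positive, false-positive and false-negative counts, so that $TP + FN = |I|$ and $TP \le |I_{\text{cov}}|$:

```math
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN} = \frac{TP}{|I|} \le \frac{|I_{\text{cov}}|}{|I|}
```

Precision can therefore sit at 100% while recall stays low; only growing $I_{\text{cov}}$ raises the ceiling.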

The path forward is:
1. Expand the corpus (more RFCs, more OWASP, more vendor docs)
2. Add extractors for patterns that have authoritative guidance
3. Don't extract patterns that lack corpus coverage (they cost users attention without ever producing a conflict)

The "Claims extracted" number should represent "things we checked against authoritative sources" not "things we regex-matched."