
Lessons Learned: Benchmark Against Semgrep

Date: 2026-02-03
Type: Post-mortem / Improvement Plan


Issues Discovered

Issue 1: Corpus Not Integrated in Ephemeral Mode

Severity: HIGH
Impact: Only 11 hardcoded assertions used instead of the full RFC/OWASP corpus

Root Cause: EphemeralDetector::new() calls create_authoritative_corpus() which returns the hardcoded 11 assertions. The CorpusRegistry and RFC/OWASP builders from Phase 1 are not invoked.

Current Flow:

EphemeralDetector::new()
  → create_authoritative_corpus()  // 11 hardcoded assertions
  → ConceptIndex::build()

Expected Flow:

EphemeralDetector::new()
  → CorpusRegistry::new()
  → registry.register(HardcodedCorpusBuilder)
  → registry.register(RfcCorpusBuilder)      // RFC 7519, 5246, etc.
  → registry.register(OwaspCorpusBuilder)    // Cheat sheets
  → registry.register(VendorCorpusBuilder)   // PostgreSQL, Redis, etc.
  → registry.build_all()
  → ConceptIndex::build()

Fix:

// In src/episteme/mod.rs
impl EphemeralDetector {
    pub fn new(signing_key: &SigningKey) -> Self {
        // Build the full corpus from all sources, not just the 11 hardcoded assertions.
        // `mut` is required if `register` takes `&mut self`.
        let mut registry = CorpusRegistry::new();
        registry.register(Box::new(HardcodedCorpusBuilder::new()));
        registry.register(Box::new(RfcCorpusBuilder::new()));
        registry.register(Box::new(OwaspCorpusBuilder::new()));
        registry.register(Box::new(VendorCorpusBuilder::new()));

        // Sign and merge every builder's assertions, then index them for lookup.
        let corpus = registry.build_all(signing_key);
        let index = ConceptIndex::build(&corpus);

        Self { corpus, index }
    }
}
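For readers unfamiliar with the registry pattern behind this fix, here is a minimal, self-contained sketch of how such a registry could merge several builders into one corpus. The `Assertion` struct, the `CorpusBuilder` trait, and the placeholder assertions are assumptions made for illustration; the real Phase 1 types may differ, and signing is omitted entirely.

```rust
/// Hypothetical stand-in for an authoritative assertion (signing omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
}

/// Hypothetical trait: each corpus source builds its own assertions.
pub trait CorpusBuilder {
    fn build(&self) -> Vec<Assertion>;
}

/// Collects builders and merges their output into a single corpus.
#[derive(Default)]
pub struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Takes `&mut self`, so the caller's binding must be `mut`.
    pub fn register(&mut self, builder: Box<dyn CorpusBuilder>) {
        self.builders.push(builder);
    }

    /// Runs every registered builder in registration order.
    pub fn build_all(&self) -> Vec<Assertion> {
        self.builders.iter().flat_map(|b| b.build()).collect()
    }
}

/// Small helper for the placeholder data below.
fn assertion(path: &str, predicate: &str, value: &str) -> Assertion {
    Assertion {
        concept_path: path.to_string(),
        predicate: predicate.to_string(),
        value: value.to_string(),
    }
}

pub struct HardcodedCorpusBuilder;
impl CorpusBuilder for HardcodedCorpusBuilder {
    fn build(&self) -> Vec<Assertion> {
        // Placeholder content, not the real hardcoded assertions.
        vec![assertion("example://tls/verify", "enabled", "true")]
    }
}

pub struct RfcCorpusBuilder;
impl CorpusBuilder for RfcCorpusBuilder {
    fn build(&self) -> Vec<Assertion> {
        // Placeholder content, not taken from any RFC.
        vec![
            assertion("example://jwt/signature", "required", "true"),
            assertion("example://tls/min_version", "config_value", "placeholder"),
        ]
    }
}
```

With both builders registered, `build_all()` returns three assertions here, which is the behaviour the Expected Flow relies on: the index is built from the union of all sources rather than from the hardcoded set alone.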

Effort: 2 hours
Priority: P0 (blocks value demonstration)


Issue 2: No Diagnostic Output for Claim→Conflict Pipeline

Severity: MEDIUM
Impact: Can't debug why 458 claims produced 0 conflicts

Root Cause: No visibility into:

  • What claims are extracted (subjects, predicates, values)
  • What index keys are generated
  • What lookups are attempted
  • Why no matches are found

Fix: Add --debug flag that outputs:

{
  "claims": [
    {
      "concept_path": "code://rust/sqlx/sqlx-mysql/connection/auth/crypto",
      "predicate": "algorithm",
      "value": "sha1",
      "index_key": "auth/crypto::algorithm",
      "corpus_match": null,
      "reason": "No authoritative assertion for auth/crypto::algorithm"
    }
  ]
}
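The record above could be produced by a small tracing step between extraction and lookup. The sketch below is one possible shape, using only the standard library. The `ClaimTrace` struct, the `trace_claim` function, and the key rule (last two path segments plus `::predicate`) are assumptions inferred from the sample output, not the actual index implementation.

```rust
use std::collections::HashMap;

/// Hypothetical per-claim diagnostic record mirroring the JSON fields above.
#[derive(Debug, PartialEq)]
pub struct ClaimTrace {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
    pub index_key: String,
    pub corpus_match: Option<String>,
    pub reason: String,
}

/// Assumed key rule: the last two path segments joined by '/',
/// followed by "::" and the predicate.
pub fn index_key(concept_path: &str, predicate: &str) -> String {
    let segments: Vec<&str> = concept_path
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    let start = segments.len().saturating_sub(2);
    format!("{}::{}", segments[start..].join("/"), predicate)
}

/// Looks the claim up in a key -> authoritative value map and records
/// why it did or did not produce a conflict.
pub fn trace_claim(
    concept_path: &str,
    predicate: &str,
    value: &str,
    corpus: &HashMap<String, String>,
) -> ClaimTrace {
    let key = index_key(concept_path, predicate);
    let corpus_match = corpus.get(&key).cloned();
    let reason = match &corpus_match {
        None => format!("No authoritative assertion for {}", key),
        Some(expected) if expected.as_str() == value => {
            format!("Claim agrees with authoritative value {}", expected)
        }
        Some(expected) => format!(
            "Conflict: authoritative value is {}, claim is {}",
            expected, value
        ),
    };
    ClaimTrace {
        concept_path: concept_path.to_string(),
        predicate: predicate.to_string(),
        value: value.to_string(),
        index_key: key,
        corpus_match,
        reason,
    }
}
```

Recording a reason for every claim, including the ones that match nothing, is what makes a "458 claims, 0 conflicts" run debuggable: the trace shows whether the lookups missed because of key mismatches or because the corpus has no entry at all.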

Effort: 4 hours
Priority: P1 (blocks debugging)


Issue 3: Extractors Without Corpus Coverage

Severity: MEDIUM
Impact: Wasted extraction work, confusing claim counts

Current State:

| Extractor      | Claims Extracted | Conflicts Possible |
|----------------|------------------|--------------------|
| tls_verify     | Yes              | Yes                |
| jwt_config     | Yes              | Yes                |
| timeout_config | Yes              | No                 |
| dep_versions   | Yes              | No                 |

timeout_config and dep_versions extract claims, but there is no authoritative corpus to compare them against. This inflates "claims_extracted" without ever producing conflicts.

Fix Options:

A) Add authoritative assertions for timeouts/deps:

create_authoritative_assertion(
    "vendor://postgresql/connection/idle_timeout",
    "config_value",
    ObjectValue::Text("recommended_range:1000-30000ms"),
    SourceClass::Observational,
    "PostgreSQL docs recommend idle timeout between 1-30 seconds",
);

B) Don't count claims that can't produce conflicts:

// Only extract claims for concepts with authoritative coverage
if !corpus_index.has_coverage(concept_path) {
    continue;
}

C) Report separately:

Claims extracted: 458 (97 with corpus coverage, 361 without)
Conflicts found: 3
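Option C can be sketched as a partition of the extracted claims by corpus coverage, followed by a formatting step. The `Claim` and `CoverageSummary` types and the set of covered concept paths below are hypothetical simplifications; a real implementation would presumably query the concept index rather than a plain `HashSet`.

```rust
use std::collections::HashSet;

/// Hypothetical minimal claim: only the field needed for the coverage check.
pub struct Claim {
    pub concept_path: String,
}

/// Counts of claims with and without an authoritative counterpart.
#[derive(Debug, PartialEq)]
pub struct CoverageSummary {
    pub covered: usize,
    pub uncovered: usize,
}

/// Splits claims by whether their concept path appears in the covered set.
pub fn summarize(claims: &[Claim], covered_paths: &HashSet<String>) -> CoverageSummary {
    let covered = claims
        .iter()
        .filter(|c| covered_paths.contains(&c.concept_path))
        .count();
    CoverageSummary {
        covered,
        uncovered: claims.len() - covered,
    }
}

/// Renders the two-line report proposed in option C.
pub fn format_summary(summary: &CoverageSummary, conflicts: usize) -> String {
    format!(
        "Claims extracted: {} ({} with corpus coverage, {} without)\nConflicts found: {}",
        summary.covered + summary.uncovered,
        summary.covered,
        summary.uncovered,
        conflicts
    )
}
```

This keeps the extractors untouched and changes only the reporting, which is why it is likely the cheapest of the three options.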

Effort: 4-8 hours (depending on approach)
Priority: P2 (UX improvement)


Issue 4: Missing Extractors for Common Issues

Severity: MEDIUM
Impact: Aphoria misses security patterns that other tools catch

Missing Extractors:

| Pattern                                 | Priority | Example                                     |
|-----------------------------------------|----------|---------------------------------------------|
| Weak crypto (MD5/SHA1 for non-protocol) | P1       | Md5::new() for password hashing             |
| Hardcoded crypto constants              | P1       | const IV: [u8; 16] = [0; 16]                |
| Weak RNG                                | P1       | rand::rngs::SmallRng for crypto keys        |
| SQL in strings                          | P2       | format!("SELECT * FROM {} WHERE", table)    |
| Command injection                       | P2       | Command::new(user_input)                    |

Fix: Create new extractors with corresponding authoritative assertions.
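As a rough illustration of what a weak-crypto extractor might look like, the sketch below scans source text line by line for `Md5::new(` and `Sha1::new(` and emits one claim per hit. It is deliberately naive: the `Claim` struct and `extract_weak_crypto` function are hypothetical, the matching is plain substring search rather than AST analysis, and it cannot tell password hashing apart from legitimate protocol use.

```rust
/// Hypothetical claim emitted by the extractor.
#[derive(Debug, PartialEq)]
pub struct Claim {
    /// 1-based line number where the pattern was found.
    pub line: usize,
    pub predicate: &'static str,
    pub value: &'static str,
}

/// Constructor call patterns and the algorithm name each one implies.
const WEAK_HASH_PATTERNS: [(&str, &str); 2] = [
    ("Md5::new(", "md5"),
    ("Sha1::new(", "sha1"),
];

/// Emits an `algorithm` claim for every non-comment line that
/// constructs a weak hash.
pub fn extract_weak_crypto(source: &str) -> Vec<Claim> {
    let mut claims = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let code = line.trim_start();
        // Skip whole-line comments; trailing comments are not handled.
        if code.starts_with("//") {
            continue;
        }
        for (pattern, algorithm) in WEAK_HASH_PATTERNS {
            if code.contains(pattern) {
                claims.push(Claim {
                    line: idx + 1,
                    predicate: "algorithm",
                    value: algorithm,
                });
            }
        }
    }
    claims
}
```

An extractor like this only pays off once the corpus has a matching authoritative assertion for the `algorithm` predicate; otherwise it adds to the uncovered-claim problem described in Issue 3.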

Effort: 2-4 hours per extractor
Priority: P1 (expands value)


Issue 5: Test File Skipping Not Visible

Severity: LOW
Impact: Confusion when comparing to other tools

Current Output:

Scanned: 56 files | Claims: 127 | Conflicts: 0

Expected Output:

Scanned: 56 files (24 test files skipped) | Claims: 127 | Conflicts: 0

Fix: Track and report skipped files.
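One way to implement this is to count skipped files in the same pass that selects files for scanning. The sketch below assumes a simple path heuristic for test files (a `tests` directory component or a `_test.rs` suffix) and treats "Scanned" as files actually scanned, excluding skipped ones. Aphoria's real skip rules are not described here, so `is_test_file`, `ScanStats`, and `tally` are hypothetical.

```rust
/// Hypothetical scan counters backing the summary line.
#[derive(Debug, Default, PartialEq)]
pub struct ScanStats {
    pub scanned: usize,
    pub skipped_tests: usize,
    pub claims: usize,
    pub conflicts: usize,
}

/// Assumed heuristic: a path is a test file if it has a `tests`
/// directory component or ends in `_test.rs`.
pub fn is_test_file(path: &str) -> bool {
    path.split('/').any(|segment| segment == "tests") || path.ends_with("_test.rs")
}

/// Classifies each path as scanned or skipped; claims and conflicts
/// are filled in later by the detector.
pub fn tally(paths: &[&str]) -> ScanStats {
    let mut stats = ScanStats::default();
    for path in paths {
        if is_test_file(path) {
            stats.skipped_tests += 1;
        } else {
            stats.scanned += 1;
        }
    }
    stats
}

/// Renders the summary in the expected output format.
pub fn summary_line(stats: &ScanStats) -> String {
    format!(
        "Scanned: {} files ({} test files skipped) | Claims: {} | Conflicts: {}",
        stats.scanned, stats.skipped_tests, stats.claims, stats.conflicts
    )
}
```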

Effort: 1 hour
Priority: P3 (polish)


Issue 6: Library vs Application Not Documented

Severity: LOW
Impact: Users may be confused when Aphoria finds nothing in well-maintained libraries

Root Cause: Aphoria is designed for application code, where developers make configuration mistakes. Well-maintained libraries such as reqwest and sqlx generally ship secure defaults, so there is little for Aphoria to flag.

Fix: Add to README and skill:

## Best For

Aphoria excels at scanning **application code** where developers:
- Configure TLS, JWT, CORS, rate limiting
- Hardcode secrets or API keys
- Override secure defaults

It's less useful for **library code** which typically:
- Has secure defaults by design
- Exposes configuration APIs rather than making configuration decisions

Effort: 30 minutes
Priority: P3 (documentation)


Action Items

| Priority | Issue                        | Fix                                               | Effort |
|----------|------------------------------|---------------------------------------------------|--------|
| P0       | Corpus not integrated        | Wire CorpusRegistry into EphemeralDetector        | 2h     |
| P1       | No diagnostic output         | Add --debug flag with claim→conflict trace        | 4h     |
| P1       | Missing extractors           | Add weak crypto, hardcoded constants extractors   | 4-8h   |
| P2       | Extractors without corpus    | Add authoritative assertions OR report separately | 4-8h   |
| P3       | Test file skipping invisible | Add "(N test files skipped)" to output            | 1h     |
| P3       | Library vs app undocumented  | Update README and skill                           | 30m    |

Total Effort: 15-24 hours


Architectural Insight

The benchmark validated Aphoria's approach:

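  • 100% precision (every reported conflict was a true positive) is evidence that the knowledge-graph method works
  • Recall, however, is limited by corpus coverage: a claim with no authoritative assertion can never produce a conflict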

The path forward is:

  1. Expand the corpus (more RFCs, more OWASP, more vendor docs)
  2. Add extractors for patterns that have authoritative guidance
  3. Don't extract patterns that have no corpus coverage; claims that can never conflict only waste the user's attention

The "Claims extracted" number should represent "things we checked against authoritative sources", not "things we regex-matched."