jordan b3e8a9a058 feat: Multi-application expansion with chaos testing and community UI

Major additions:
- Community Next.js app (port 18187) for browsing claims with API docs
- stemedb-chaos crate: Fault injection, chaos testing, CRDT properties
- Latent ingestion system: Reddit/FDA ingesters with ADK-Go agents
- Disputed claims handling: Manual review workflows and validation
- Aphoria security scanner: New extractors (SQL injection, command
  injection, weak crypto, TLS version), policy-based ignores, UAT reports
- Docker infrastructure: Dockerfile, docker-compose.yml for full stack
- VulnBank demo: Intentionally vulnerable multi-language test corpus

SDK & API enhancements:
- Source registry handlers for tracking data provenance
- Metrics endpoint
- Skeptic filtering improvements

Code quality:
- Split 14 large files (>500 lines) into focused modules
- All files now under 500-line limit per project guidelines

Documentation:
- Chaos testing guide, circuit breakers, observability docs
- Phase 7 UAT documentation updates
- Martin Kleppmann technical writer agent

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 01:24:14 -07:00

6.6 KiB


Lessons Learned: Benchmark Against Semgrep

Date: 2026-02-03
Type: Post-mortem / Improvement Plan

Issues Discovered

Issue 1: Corpus Not Integrated in Ephemeral Mode

Severity: HIGH
Impact: Only 11 hardcoded assertions used instead of full RFC/OWASP corpus

Root Cause: EphemeralDetector::new() calls create_authoritative_corpus(), which returns only the 11 hardcoded assertions. The CorpusRegistry and the RFC/OWASP builders from Phase 1 are never invoked.

Current Flow:

EphemeralDetector::new()
  → create_authoritative_corpus()  // 11 hardcoded assertions
  → ConceptIndex::build()

Expected Flow:

EphemeralDetector::new()
  → CorpusRegistry::new()
  → registry.register(HardcodedCorpusBuilder)
  → registry.register(RfcCorpusBuilder)      // RFC 7519, 5246, etc.
  → registry.register(OwaspCorpusBuilder)    // Cheat sheets
  → registry.register(VendorCorpusBuilder)   // PostgreSQL, Redis, etc.
  → registry.build_all()
  → ConceptIndex::build()

Fix:

// In src/episteme/mod.rs
impl EphemeralDetector {
    pub fn new(signing_key: &SigningKey) -> Self {
        // Build full corpus from all sources
        let mut registry = CorpusRegistry::new();
        registry.register(Box::new(HardcodedCorpusBuilder::new()));
        registry.register(Box::new(RfcCorpusBuilder::new()));
        registry.register(Box::new(OwaspCorpusBuilder::new()));
        registry.register(Box::new(VendorCorpusBuilder::new()));

        let corpus = registry.build_all(signing_key);
        let index = ConceptIndex::build(&corpus);

        Self { corpus, index }
    }
}

Effort: 2 hours
Priority: P0 (blocks value demonstration)
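The registry pattern in the expected flow can be sketched in isolation. This is a minimal, std-only illustration, not the actual Aphoria code: the `Assertion` fields, the `CorpusBuilder` trait, and `StaticBuilder` are assumptions made for the example, and the signing key is left out. It shows why the registry binding has to be mutable if `register` takes `&mut self`.

```rust
/// A single authoritative assertion (hypothetical shape for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
}

/// Anything that can contribute assertions to the corpus (assumed trait).
pub trait CorpusBuilder {
    fn name(&self) -> &str;
    fn build(&self) -> Vec<Assertion>;
}

/// Collects builders and merges their output into one corpus.
#[derive(Default)]
pub struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Takes `&mut self`, so the caller's binding must be `let mut`.
    pub fn register(&mut self, builder: Box<dyn CorpusBuilder>) {
        self.builders.push(builder);
    }

    /// Runs every registered builder in registration order.
    pub fn build_all(&self) -> Vec<Assertion> {
        self.builders.iter().flat_map(|b| b.build()).collect()
    }
}

/// Stand-in builder that returns a fixed set of assertions.
pub struct StaticBuilder {
    pub name: String,
    pub assertions: Vec<Assertion>,
}

impl CorpusBuilder for StaticBuilder {
    fn name(&self) -> &str {
        &self.name
    }

    fn build(&self) -> Vec<Assertion> {
        self.assertions.clone()
    }
}
```

With this shape, a source that is never registered contributes nothing to `build_all()`, which is the failure mode described above.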

Issue 2: No Diagnostic Output for Claim→Conflict Pipeline

Severity: MEDIUM
Impact: Can't debug why 458 claims produced 0 conflicts

Root Cause: No visibility into:

- What claims are extracted (subjects, predicates, values)
- What index keys are generated
- What lookups are attempted
- Why no matches were found

Fix: Add --debug flag that outputs:

{
  "claims": [
    {
      "concept_path": "code://rust/sqlx/sqlx-mysql/connection/auth/crypto",
      "predicate": "algorithm",
      "value": "sha1",
      "index_key": "auth/crypto::algorithm",
      "corpus_match": null,
      "reason": "No authoritative assertion for auth/crypto::algorithm"
    }
  ]
}

Effort: 4 hours
Priority: P1 (blocks debugging)
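A rough sketch of how one trace record could be produced. Everything here is an assumption for illustration: the `ClaimTrace` struct mirrors the JSON fields above, the corpus is modelled as a plain map from index key to authoritative value, and the index key is guessed to be the last two path segments joined with the predicate (which happens to reproduce the example key). The real key derivation may differ.

```rust
use std::collections::HashMap;

/// One record of the hypothetical `--debug` trace.
#[derive(Debug, Clone, PartialEq)]
pub struct ClaimTrace {
    pub concept_path: String,
    pub predicate: String,
    pub value: String,
    pub index_key: String,
    pub corpus_match: Option<String>,
    pub reason: String,
}

/// ASSUMPTION: key = last two path segments + "::" + predicate.
pub fn index_key(concept_path: &str, predicate: &str) -> String {
    let segments: Vec<&str> = concept_path
        .split('/')
        .filter(|s| !s.is_empty())
        .collect();
    let start = segments.len().saturating_sub(2);
    format!("{}::{}", segments[start..].join("/"), predicate)
}

/// Looks the claim up in the corpus and records why it did or did not match.
pub fn trace_claim(
    corpus: &HashMap<String, String>,
    concept_path: &str,
    predicate: &str,
    value: &str,
) -> ClaimTrace {
    let key = index_key(concept_path, predicate);
    let corpus_match = corpus.get(&key).cloned();
    let reason = match corpus_match.as_deref() {
        None => format!("No authoritative assertion for {}", key),
        Some(expected) if expected == value => {
            format!("Matches authoritative value {}", expected)
        }
        Some(expected) => format!("Conflicts with authoritative value {}", expected),
    };
    ClaimTrace {
        concept_path: concept_path.to_string(),
        predicate: predicate.to_string(),
        value: value.to_string(),
        index_key: key,
        corpus_match,
        reason,
    }
}
```

Recording the reason at lookup time, rather than reconstructing it afterwards, keeps the trace honest about which key was actually tried.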

Issue 3: Extractors Without Corpus Coverage

Severity: MEDIUM
Impact: Wasted extraction work, confusing claim counts

Current State:

Extractor	Claims Extracted	Conflicts Possible
`tls_verify`	Yes	Yes
`jwt_config`	Yes	Yes
`timeout_config`	Yes	No
`dep_versions`	Yes	No

timeout_config and dep_versions extract claims, but the corpus has no authoritative assertions to compare them against. This inflates "claims_extracted" without ever producing conflicts.

Fix Options:

A) Add authoritative assertions for timeouts/deps:

create_authoritative_assertion(
    "vendor://postgresql/connection/idle_timeout",
    "config_value",
    ObjectValue::Text("recommended_range:1000-30000ms"),
    SourceClass::Observational,
    "PostgreSQL docs recommend idle timeout between 1-30 seconds",
);

B) Don't count claims that can't produce conflicts:

// Only extract claims for concepts with authoritative coverage
if !corpus_index.has_coverage(concept_path) {
    continue;
}

C) Report separately:

Claims extracted: 458 (97 with corpus coverage, 361 without)
Conflicts found: 3

Effort: 4-8 hours (depending on approach)
Priority: P2 (UX improvement)
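Options B and C both depend on a coverage check. A minimal sketch, assuming coverage can be modelled as a set of concept paths (the `CorpusIndex` here is a stand-in, not the real index type, and `coverage_summary` is a hypothetical helper):

```rust
use std::collections::HashSet;

/// Hypothetical index of concept paths that have at least one
/// authoritative assertion.
pub struct CorpusIndex {
    covered: HashSet<String>,
}

impl CorpusIndex {
    pub fn new<I, S>(paths: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        Self {
            covered: paths.into_iter().map(Into::into).collect(),
        }
    }

    /// Option B: callers can skip claims whose concept has no coverage.
    pub fn has_coverage(&self, concept_path: &str) -> bool {
        self.covered.contains(concept_path)
    }
}

/// Option C: report covered and uncovered claims separately.
pub fn coverage_summary(index: &CorpusIndex, claim_paths: &[&str], conflicts: usize) -> String {
    let covered = claim_paths
        .iter()
        .filter(|path| index.has_coverage(path))
        .count();
    let uncovered = claim_paths.len() - covered;
    format!(
        "Claims extracted: {} ({} with corpus coverage, {} without)\nConflicts found: {}",
        claim_paths.len(),
        covered,
        uncovered,
        conflicts
    )
}
```

Option C keeps the uncovered claims visible, which is useful for deciding where to expand the corpus next; Option B hides them entirely.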

Issue 4: Missing Extractors for Common Issues

Severity: MEDIUM
Impact: Aphoria misses security patterns that other tools catch

Missing Extractors:

Pattern	Priority	Example
Weak crypto (MD5/SHA1 for non-protocol)	P1	`Md5::new()` for password hashing
Hardcoded crypto constants	P1	`const IV: [u8; 16] = [0; 16]`
Weak RNG	P1	`rand::rngs::SmallRng` for crypto keys
SQL in strings	P2	`format!("SELECT * FROM {} WHERE", table)`
Command injection	P2	`Command::new(user_input)`

Fix: Create new extractors with corresponding authoritative assertions.

Effort: 2-4 hours per extractor
Priority: P1 (expands value)
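To give a sense of scale, a weak-crypto extractor could start as small as the sketch below. It uses plain substring matching as a stand-in for whatever matching the existing extractors use, and the `Claim` struct and the needle list are assumptions for illustration only. A real extractor would also need a matching authoritative assertion so the claim can produce a conflict.

```rust
/// A claim emitted by an extractor (hypothetical shape for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub line: usize,
    pub predicate: String,
    pub value: String,
}

/// ASSUMPTION: constructor calls treated as a weak-hash signal.
/// Illustrative only, not an exhaustive list.
const WEAK_HASH_CALLS: [(&str, &str); 2] = [
    ("Md5::new(", "md5"),
    ("Sha1::new(", "sha1"),
];

/// Scans source text line by line and emits one claim per weak-hash call.
/// Lines that start with `//` are skipped so comments do not produce claims.
pub fn extract_weak_crypto(source: &str) -> Vec<Claim> {
    let mut claims = Vec::new();
    for (idx, line) in source.lines().enumerate() {
        let code = line.trim_start();
        if code.starts_with("//") {
            continue;
        }
        for &(needle, algorithm) in WEAK_HASH_CALLS.iter() {
            if code.contains(needle) {
                claims.push(Claim {
                    line: idx + 1, // 1-based line number
                    predicate: "algorithm".to_string(),
                    value: algorithm.to_string(),
                });
            }
        }
    }
    claims
}
```

Substring matching cannot tell password hashing from a protocol-mandated use of the same algorithm, so this only covers the extraction half of the "non-protocol" distinction in the table.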

Issue 5: Test File Skipping Not Visible

Severity: LOW
Impact: Confusion when comparing to other tools

Current Output:

Scanned: 56 files | Claims: 127 | Conflicts: 0

Expected Output:

Scanned: 56 files (24 test files skipped) | Claims: 127 | Conflicts: 0

Fix: Track and report skipped files.

Effort: 1 hour
Priority: P3 (polish)
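A possible shape for the tracking, sketched with std only. The `ScanStats` struct and the `is_test_file` heuristic are assumptions; the real skip rule may differ. The sketch also assumes the "Scanned" count excludes skipped files.

```rust
/// Running counters for one scan (hypothetical struct for this sketch).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ScanStats {
    pub scanned: usize,
    pub skipped_tests: usize,
    pub claims: usize,
    pub conflicts: usize,
}

/// ASSUMPTION: a file counts as a test file if it lives under a `tests`
/// directory or its name ends in `_test.rs`.
pub fn is_test_file(path: &str) -> bool {
    path.split('/').any(|part| part == "tests") || path.ends_with("_test.rs")
}

impl ScanStats {
    /// Records a file and returns true if it should be scanned.
    pub fn record_file(&mut self, path: &str) -> bool {
        if is_test_file(path) {
            self.skipped_tests += 1;
            false
        } else {
            self.scanned += 1;
            true
        }
    }

    /// Formats the one-line summary, mentioning skipped files only if any.
    pub fn summary(&self) -> String {
        let skipped = if self.skipped_tests > 0 {
            format!(" ({} test files skipped)", self.skipped_tests)
        } else {
            String::new()
        };
        format!(
            "Scanned: {} files{} | Claims: {} | Conflicts: {}",
            self.scanned, skipped, self.claims, self.conflicts
        )
    }
}
```

Counting skips in the same place that makes the skip decision avoids a second pass over the file list.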

Issue 6: Library vs Application Not Documented

Severity: LOW
Impact: Users may be confused when Aphoria finds nothing in well-maintained libraries

Root Cause: Aphoria is designed for application code, where developers make configuration mistakes. Libraries such as reqwest and sqlx generally ship with secure defaults, so there is little for it to flag.

Fix: Add to README and skill:

## Best For

Aphoria excels at scanning **application code** where developers:
- Configure TLS, JWT, CORS, rate limiting
- Hardcode secrets or API keys
- Override secure defaults

It's less useful for **library code** which typically:
- Has secure defaults by design
- Exposes configuration APIs rather than making configuration decisions

Effort: 30 minutes
Priority: P3 (documentation)

Action Items

Priority	Issue	Fix	Effort
P0	Corpus not integrated	Wire CorpusRegistry into EphemeralDetector	2h
P1	No diagnostic output	Add `--debug` flag with claim→conflict trace	4h
P1	Missing extractors	Add weak crypto, hardcoded constants extractors	4-8h
P2	Extractors without corpus	Add authoritative assertions OR report separately	4-8h
P3	Test file skipping invisible	Add "(N test files skipped)" to output	1h
P3	Library vs app undocumented	Update README and skill	30m

Total Effort: 15-24 hours

Architectural Insight

The benchmark validated Aphoria's approach:

- 100% precision proves the knowledge-graph method works
- But recall is limited by corpus coverage

The path forward is:

- Expand the corpus (more RFCs, more OWASP, more vendor docs)
- Add extractors for patterns that have authoritative guidance
- Don't extract patterns that have no corpus coverage (it wastes the user's attention)

The "Claims extracted" number should represent "things we checked against authoritative sources", not "things we regex-matched."
