# Aphoria Roadmap

---

## Phase 0: StemeDB Foundation

Changes to the core database that Aphoria depends on. These ship before the CLI.

### 0.1 ConceptPath Type

Add the `ConceptPath` struct to `stemedb-core`. Parsing, validation, wire format (`scheme://segments/leaf`), prefix matching, parent traversal. Backward-compatible: bare strings parse as `custom://{string}`.

**Depends on:** [concept-hierarchy spec](../../docs/specs/concept-hierarchy.md)

**Crate:** `stemedb-core`

### 0.2 ConceptPath in Assertion

Replace `Assertion.subject: EntityId` with `Assertion.subject: ConceptPath`. Update rkyv serialization. Update all downstream consumers (ingestion, query, lenses, API, tests).

**Crate:** `stemedb-core`, `stemedb-ingest`, `stemedb-query`, `stemedb-lens`, `stemedb-api`

### 0.3 Hierarchical Index

Update `IndexStore` key construction to use ConceptPath wire format. Verify that `scan_prefix` on `S:{concept_path}/` returns all descendants. No new index structure needed: the `/` in the path maps to byte-level prefix scanning.

**Crate:** `stemedb-storage`

### 0.4 Alias Store

Add `CA:` (alias → canonical) and `CAR:` (canonical → all aliases) key prefixes. Implement alias resolution in the query path: lookup aliases before index scan, merge results, deduplicate. Transitive alias resolution.

**Crate:** `stemedb-storage`, `stemedb-query`

### 0.5 Source Class Inference

Wire scheme-based tier inference into ingestion. If no explicit `source_class` is set, infer from ConceptPath scheme. `rfc://` → Tier 0, `code://` → Tier 3, etc.

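A minimal sketch of the inference rule above, assuming tiers are modeled as a small enum and paths are still plain wire-format strings. `Tier`, `split_scheme`, and `infer_tier` are hypothetical names, not the `stemedb-ingest` API. The `owasp://` and `vendor://` tiers are taken from Phase 1, and schemes this roadmap does not assign a tier to are left uninferred.

```rust
/// Source tiers as numbered in this roadmap (rfc = 0, owasp = 1,
/// vendor = 2, code = 3). ASSUMPTION: the enum itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Tier0,
    Tier1,
    Tier2,
    Tier3,
}

/// Split a wire-format path into (scheme, rest).
/// Bare strings fall back to the `custom` scheme, as described in 0.1.
pub fn split_scheme(path: &str) -> (&str, &str) {
    match path.split_once("://") {
        Some((scheme, rest)) => (scheme, rest),
        None => ("custom", path),
    }
}

/// Return the explicit tier if one was set; otherwise infer it from the
/// scheme. Schemes without a tier in this roadmap yield `None`.
pub fn infer_tier(path: &str, explicit: Option<Tier>) -> Option<Tier> {
    if explicit.is_some() {
        return explicit;
    }
    match split_scheme(path).0 {
        "rfc" => Some(Tier::Tier0),
        "owasp" => Some(Tier::Tier1),
        "vendor" => Some(Tier::Tier2),
        "code" => Some(Tier::Tier3),
        _ => None,
    }
}
```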

**Crate:** `stemedb-ingest`

### 0.6 Concept API Endpoints

```
POST   /v1/concepts/alias              Create alias
GET    /v1/concepts/aliases/{path}     List aliases for a path
DELETE /v1/concepts/alias              Remove alias
GET    /v1/concepts/tree/{prefix}      Browse hierarchy under prefix
GET    /v1/concepts/suggest            Suggested aliases (shared leaf detection)
```

**Crate:** `stemedb-api`

---

## Phase 1: Authoritative Corpus

Before Aphoria can find conflicts, Episteme needs the authoritative sources to conflict against.

### 1.1 RFC Ingester

A CLI tool (or ingestion module) that:

- Fetches RFC text from `rfc-editor.org` (text format, no PDF parsing needed)
- Extracts normative statements (MUST, MUST NOT, SHOULD, SHALL per RFC 2119)
- Maps each statement to a ConceptPath: `rfc://{number}/{topic}/{claim}`
- Ingests as Tier 0 assertions

Start with a curated list of security-relevant RFCs:

| RFC | Topic |
|-----|-------|
| 7519 | JWT |
| 6749 | OAuth 2.0 |
| 6750 | Bearer tokens |
| 8446 | TLS 1.3 |
| 7525 | TLS best practices |
| 6238 | TOTP |
| 7617 | HTTP Basic Auth |
| 9110 | HTTP Semantics |

### 1.2 OWASP Ingester

Parse OWASP Cheat Sheets (markdown source on GitHub):

- Extract each recommendation as a claim
- Map to `owasp://cheatsheet/{topic}/{claim}`
- Ingest as Tier 1 assertions

Priority cheat sheets: Authentication, JWT, TLS, Secrets Management, Input Validation, Session Management.

### 1.3 Vendor Docs (Manual Bootstrap)

For v1, manually curate a small set of vendor doc claims:

- Postgres connection pool recommendations
- Redis timeout defaults
- Common HTTP client library defaults (reqwest, hyper, net/http)

These are `vendor://{product}/{topic}/{claim}` at Tier 2.

This doesn't need to be exhaustive. It needs to cover the claims that Aphoria's extractors will actually find in code.

---

## Phase 2: CLI Core

The Aphoria binary itself.

### 2.1 Project Walker

Input: a project root path.

Output: a list of files to scan, each tagged with:

- Language (rust, go, python, typescript, yaml, toml, json)
- ConceptPath prefix derived from directory structure

```
crates/citadeldb/src/auth/jwt.rs
  → language: rust
  → prefix: code://rust/citadeldb/auth/jwt
```

Normalization rules:

- Strip `src/`, `lib/`, `pkg/`, `internal/` (language boilerplate)
- Strip `crates/`, `packages/`, `apps/` (monorepo wrappers)
- Map `config/`, `deploy/`, `infra/` to `code://config/{project}/...`
- File extension determines language, not directory

### 2.2 Extractors

Each extractor is a module that:

- Takes a file path + content + language
- Returns a `Vec<ExtractedClaim>`

Ship these extractors in v1:

| Extractor | What it finds | Languages |
|-----------|--------------|-----------|
| `tls_verify` | TLS certificate verification disabled | rust, go, python, js/ts |
| `jwt_config` | JWT validation settings (aud, exp, alg) | rust, go, python, js/ts |
| `hardcoded_secrets` | Credentials in source (not .env) | all |
| `timeout_config` | HTTP/DB/Redis timeout values | all (config files) |
| `dep_versions` | Known-vulnerable dependency versions | Cargo.toml, go.mod, package.json, requirements.txt |
| `cors_config` | CORS allow-origin settings | rust, go, js/ts |
| `rate_limit` | Rate limiting disabled or unreasonable | rust, go, js/ts |

Extractors use regex + AST patterns, not LLMs.
Each extractor declares:

- The patterns it searches for
- The ConceptPath leaf it maps to
- The predicate (e.g., `config_value`, `enabled`, `version`)
- How to extract the ObjectValue from the match

### 2.3 Ingestion Bridge

Connect extractor output to the Episteme ingestion pipeline:

```
ExtractedClaim {
    path: code://rust/citadeldb/auth/jwt/audience_validation
    predicate: "enabled"
    value: Boolean(false)
    source_location: "src/auth/jwt.rs:47"
    confidence: 1.0  // regex match, not heuristic
}
        ↓
Assertion {
    subject: ConceptPath::parse("code://rust/citadeldb/auth/jwt/audience_validation")
    predicate: "enabled"
    object: ObjectValue::Boolean(false)
    source_class: SourceClass::Expert  // inferred from code:// scheme
    source_hash: blake3(file_content)
    source_metadata: { "file": "src/auth/jwt.rs", "line": 47 }
    confidence: 1.0
    lifecycle: LifecycleStage::Approved  // code is deployed, it's a fact about the code
}
```

The bridge handles:

- ConceptPath construction from extractor output
- Source hash computation (BLAKE3 of the file at scan time)
- Source metadata encoding (file path, line number, extraction method)
- Signing with the Aphoria agent's keypair

### 2.4 Conflict Query

After ingestion, query Episteme for each extracted concept:

```rust
for claim in extracted_claims {
    let results = query_engine.query(Query {
        subject: Some(claim.path.to_string()),
        resolve_aliases: true,
        hierarchical: false,
        lens: Some("skeptic"),
        ..Default::default()
    });

    if results.conflict_score > threshold {
        report.add_conflict(claim, results);
    }
}
```

The Skeptic lens returns all claims for the concept across all aliased paths, with a conflict score. If the code claim (Tier 3) contradicts an RFC claim (Tier 0), the conflict score will be high because of the tier spread.

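The roadmap does not define how the conflict score is computed, so the sketch below takes the score as given and only illustrates two pieces around it: the tier spread (the lowest and highest tier among the claims for a concept) and a threshold check that turns a score into a verdict. `Verdict`, `tier_spread`, `verdict`, the `Pass` variant, and both cutoff parameters are hypothetical and chosen for illustration only.

```rust
/// Verdict for one concept. ASSUMPTION: `Pass` is a hypothetical name
/// for "no conflict reported"; the roadmap only names BLOCK and FLAG.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Block,
    Flag,
    Pass,
}

/// Lowest and highest tier among the claims found for one concept,
/// e.g. (0, 3) when an RFC claim and a code claim are both present.
/// Returns `None` when there are no claims.
pub fn tier_spread(tiers: &[u8]) -> Option<(u8, u8)> {
    let min = tiers.iter().copied().min()?;
    let max = tiers.iter().copied().max()?;
    Some((min, max))
}

/// Map a conflict score to a verdict using two cutoffs.
/// ASSUMPTION: both cutoffs are illustrative parameters; the roadmap
/// only refers to a single `threshold`.
pub fn verdict(score: f64, flag_at: f64, block_at: f64) -> Verdict {
    if score > block_at {
        Verdict::Block
    } else if score > flag_at {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}
```

With illustrative cutoffs of 0.5 and 0.8, `verdict(0.92, 0.5, 0.8)` yields `Verdict::Block` and `verdict(0.54, 0.5, 0.8)` yields `Verdict::Flag`.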
### 2.5 Report Output

```
$ aphoria scan ./citadeldb --format table

┌─────────────────────────────────────────────────────────────────────┐
│ Aphoria Report: citadeldb                                           │
│ Scanned: 142 files │ Claims: 23 │ Conflicts: 3                      │
├──────────┬───────────────────────────────────────┬──────────┬───────┤
│ Verdict  │ Concept                               │ Score    │ Tier  │
├──────────┼───────────────────────────────────────┼──────────┼───────┤
│ BLOCK    │ auth/jwt/audience_validation          │ 0.92     │ 0↔3   │
│ BLOCK    │ net/tls/cert_verification             │ 0.87     │ 1↔3   │
│ FLAG     │ http/timeout                          │ 0.54     │ 2↔3   │
└──────────┴───────────────────────────────────────┴──────────┴───────┘

Details:

  BLOCK  code://rust/citadeldb/auth/jwt/audience_validation
    Your code:  aud validation disabled (src/auth/jwt.rs:47)
    RFC 7519:   aud validation MUST be enabled (Tier 0)
    Action:     Fix or acknowledge with: aphoria ack --reason "..."

  BLOCK  code://rust/citadeldb/net/tls/cert_verification
    Your code:  verify = false (src/net/client.rs:23)
    OWASP:      verification required (Tier 1)
    Action:     Fix or acknowledge with: aphoria ack --reason "..."

  FLAG   code://rust/citadeldb/http/timeout
    Your code:  timeout = 0 (infinite) (config/production.yaml:8)
    reqwest:    default timeout 30s (Tier 2)
    Action:     Review recommended
```

Output formats: `table` (default), `json`, `sarif` (for CI integration), `markdown`.

### 2.6 Acknowledge Command

```
$ aphoria ack code://rust/citadeldb/auth/jwt/audience_validation \
    --reason "Internal service, no external JWT consumers. Accepted risk per SEC-2024-003."
```

This creates a new Assertion:

- Subject: `internal://decision/citadeldb/auth/jwt/audience_validation`
- Predicate: `deviation_accepted`
- Object: Text with the reason
- SourceClass: Expert (Tier 3)
- Aliased to: `code://rust/citadeldb/auth/jwt/audience_validation`

The conflict still exists in Episteme, but the acknowledgment is recorded. Next scan, the conflict still shows but with context: "Acknowledged by [agent] on [date]: [reason]."
The Skeptic lens sees the acknowledgment as an additional claim in the space.

---

## Phase 3: Skill Integration

### 3.1 Claude Code Skill

A `/aphoria` skill that wraps the CLI:

```
/aphoria scan              Scan current project, report conflicts
/aphoria scan --fix        Scan and offer to fix each conflict
/aphoria ack               Acknowledge a conflict with a reason
/aphoria status            Show current conflict summary
/aphoria diff              Show new conflicts since last scan
```

The skill runs the CLI binary, parses the JSON output, and presents results inline in the Claude Code session.

### 3.2 Agent Pre-Flight Hook

A Claude Code hook that runs Aphoria before certain operations:

```json
{
  "hooks": {
    "pre-commit": "aphoria scan --format sarif --exit-code",
    "pre-deploy": "aphoria scan --strict --exit-code"
  }
}
```

`--exit-code` returns non-zero if any BLOCK verdicts exist, preventing the commit or deploy.

### 3.3 Alias Suggestion Workflow

When Aphoria scans a new project and finds concepts that share leaf names with existing authoritative paths, it prompts:

```
New concept detected: code://rust/newproject/auth/jwt/audience_validation

Suggested alias:
  → rfc://7519/jwt/audience_validation (Tier 0, RFC 7519 Section 4.1.3)

Accept? [y/n/defer]
```

Accepting creates the alias. Deferring flags it for later review. Rejecting records that these are intentionally different concepts.

---

## Phase 4: CI Integration

### 4.1 GitHub Action

```yaml
- name: Aphoria Scan
  uses: orchard9/aphoria-action@v1
  with:
    episteme-url: ${{ secrets.EPISTEME_URL }}
    fail-on: block
    format: sarif
```

Publishes SARIF results to GitHub Security tab. BLOCK verdicts fail the check. FLAG verdicts appear as warnings.

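One way the verdict handling described above could be sketched, assuming BLOCK maps to the SARIF `error` level and FLAG to `warning`, and that `--exit-code` uses exit status 1. `Verdict`, `sarif_level`, and `exit_code` are hypothetical names; the roadmap only states that BLOCK fails the check, FLAG appears as a warning, and the exit status is non-zero when any BLOCK verdict exists.

```rust
/// Verdicts that appear in the Aphoria report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Block,
    Flag,
}

/// SARIF result levels are "none", "note", "warning", and "error".
/// ASSUMPTION: BLOCK is emitted as "error" and FLAG as "warning".
pub fn sarif_level(v: Verdict) -> &'static str {
    match v {
        Verdict::Block => "error",
        Verdict::Flag => "warning",
    }
}

/// Exit status for `--exit-code`: non-zero if any BLOCK verdict exists.
/// ASSUMPTION: the value 1 is illustrative; the roadmap says "non-zero".
pub fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Block) {
        1
    } else {
        0
    }
}
```

Under these assumptions, `exit_code(&[Verdict::Flag, Verdict::Block])` returns 1, while a scan with only FLAG verdicts returns 0, so warnings alone never fail the check.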
### 4.2 PR Comment Bot

On pull request, Aphoria scans the diff (not the whole project) and comments:

```
## Aphoria Report

This PR introduces 1 new conflict:

| File | Conflict | Score |
|------|----------|-------|
| src/auth/jwt.rs:47 | Disables aud validation (RFC 7519 requires it) | 0.92 |

Run `aphoria ack` to acknowledge, or fix before merge.
```

### 4.3 Baseline Mode

For existing projects with many conflicts, `aphoria baseline` records the current state. Subsequent scans only report *new* conflicts. This prevents the "500 warnings so we ignore all of them" problem.

```
$ aphoria baseline
Baseline recorded: 12 existing conflicts frozen.
Future scans will only report new conflicts.
```

---

## Phase 5: Research Agent Loop

### 5.1 Gap Detection

When Aphoria extracts a claim and no authoritative source exists for that concept, log it as a gap:

```
GAP: code://rust/citadeldb/cache/redis/max_memory_policy
     No authoritative source found for redis/max_memory_policy
     Seen in 3 projects
```

### 5.2 Research Agent Trigger

When a gap is seen across N projects (configurable, default 3), dispatch a research agent:

1. Agent searches for authoritative documentation on `redis max_memory_policy`
2. Finds Redis official docs
3. Extracts normative claims: "default is `noeviction`, recommended `allkeys-lru` for cache use cases"
4. Ingests as `vendor://redis/cache/max_memory_policy` at Tier 2
5. Future Aphoria scans now have something to conflict against

### 5.3 Community Corpus Contributions

Users who run Aphoria can opt in to contribute their alias mappings and acknowledgment patterns (anonymized) to a shared corpus.
Common patterns propagate:

- "Every Rust project has this JWT pattern" → pre-built alias set for Rust JWT libraries
- "This Redis config is always flagged and always acknowledged" → raise the default conflict threshold for that concept, so it is reported less readily
- "This TLS pattern is always a real bug" → lower the default conflict threshold for that concept, so it is reported more readily

---

## Milestone Summary

| Phase | Deliverable | Depends On |
|-------|-------------|------------|
| 0 | ConceptPath in StemeDB | concept-hierarchy spec |
| 1 | Authoritative corpus (RFCs, OWASP) | Phase 0 |
| 2 | Aphoria CLI (scan, report, ack) | Phase 0, Phase 1 |
| 3 | Claude Code skill + hooks | Phase 2 |
| 4 | CI integration (GitHub Action, PR bot) | Phase 2 |
| 5 | Research agent loop | Phase 2, Phase 4 (gap data) |

Phase 0 and Phase 1 can run in parallel: the corpus ingestion uses the ConceptPath types as they're built. Phase 2 is the critical path. Everything after Phase 2 is distribution and flywheel.