# Aphoria Roadmap

---

## Phase 0: StemeDB Foundation ✅

> **Tracked in:** [roadmap.md § 5D. Concept Hierarchy](../../roadmap.md)

Changes to the core database that Aphoria depends on. Shipped as **Phase 5D** of the main StemeDB roadmap.

| Aphoria Phase 0 | StemeDB Phase 5D | Status |
|-----------------|------------------|--------|
| 0.1 ConceptPath Type | 5D.1 ConceptPath Type | ✅ |
| 0.2 ConceptPath in Assertion | (implicit in 5D.1) | ✅ |
| 0.3 Hierarchical Index | 5D.4 Hierarchical Query | ✅ |
| 0.4 Alias Store | 5D.3 Alias Store + 5D.5 Alias Resolution | ✅ |
| 0.5 Source Class Inference | 5D.6 Source Class Inference | ✅ |
| 0.6 Concept API Endpoints | 5D.7 Concept API Endpoints | ✅ |

**Spec:** [docs/specs/concept-hierarchy.md](../../docs/specs/concept-hierarchy.md)

---

## Phase 2: CLI Core ✅

> Phase 2 was built before Phase 1 (authoritative corpus expansion). The CLI pipeline works end-to-end with a bootstrapped corpus of 11 hardcoded assertions covering TLS, JWT, CORS, secrets, and rate limiting.

| Task | Status |
|------|--------|
| 2.1 Project Walker | ✅ `walker/mod.rs`, `walker/path_mapper.rs`, `walker/language.rs` |
| 2.2 Extractors (10) | ✅ `tls_verify`, `jwt_config`, `hardcoded_secrets`, `timeout_config`, `dep_versions`, `cors_config`, `rate_limit`, `weak_crypto`, `command_injection`, `sql_injection` |
| 2.3 Ingestion Bridge | ✅ `bridge.rs`: BLAKE3 hashing, Ed25519 signing, claim→assertion conversion |
| 2.4 Conflict Query | ✅ `episteme.rs`: LocalEpisteme with check_conflicts() |
| 2.5 Report Output | ✅ `report/`: table (comfy-table), JSON, SARIF 2.1.0, markdown |
| 2.6 Acknowledge Command | ✅ `lib.rs` acknowledge() |
| Baseline & Diff | ✅ `lib.rs` set_baseline(), show_diff() |
| Status Command | ✅ `lib.rs` show_status() |

183 tests pass. Clippy and fmt clean.

### Phase 2 Code Quality Fixes ✅

Code review improvements to extractors:

| Issue | Fix | Status |
|-------|-----|--------|
| DES/RC4 concept path misclassification | Split `check_pattern()` into `check_hash_pattern()` and `check_encryption_pattern()`; DES/RC4 now use `crypto/encryption/algorithm` path | ✅ |
| SHA1 edge case undocumented | Added comments and a test documenting that SHA1 detection is intentionally broad (it also triggers on git hashes, etc.) | ✅ |
| JS exec() regex overly broad | Tightened regex to require `child_process.` prefix or non-word/non-dot preceding character; prevents `RegExp.exec()` false positives | ✅ |

---

## Phase 2A: Concept Matching ✅

> **Status:** Complete. Tail-path matching (2A.1), alias-aware queries (2A.2), and auto-alias creation (2A.3) all implemented.

### 2A.1 Leaf-Based Concept Matching (Aphoria-side fix) ✅

Implemented in `episteme.rs` via `ConceptIndex`:

- `make_key(subject, predicate)` extracts tail 2 path segments + predicate
- `build(assertions)` creates in-memory index keyed by tail path
- `lookup(subject, predicate)` finds matching authoritative assertions
- `check_conflicts()` uses `ConceptIndex` instead of `QueryEngine` for cross-scheme matching

Integration tests confirm that TLS and JWT conflicts are detected correctly.

### 2A.2 Alias Resolution in QueryEngine (StemeDB-side fix) ✅

Wired `AliasStore` into `QueryEngine.execute()`:

- Added `resolve_aliases: bool` field to `Query` (defaults to `false`)
- Added an optional `alias_store` field (an `Option`) to `QueryEngine`
- Added `.with_alias_store()` builder method
- When `resolve_aliases: true`, expands subject via `AliasStore.resolve_all()` before index lookup
- Added `fetch_by_subjects()` and `fetch_by_subjects_predicate()` for multi-subject deduplication
- Modified `Query.matches()` to skip subject filtering when aliases are resolved
- Skips fast path (MV lookup) when `resolve_aliases: true`
- Gracefully degrades when no alias store is configured

7 unit tests in `engine/tests/alias_resolution.rs`.

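To make the two mechanisms concrete, here is a minimal Rust sketch of tail-path keying (2A.1) and alias expansion with graceful degradation (2A.2). Everything in it is an illustrative assumption rather than the actual StemeDB or Aphoria code: `Assertion`, `SketchAliasStore`, `query_subject`, the `#` key separator, and the `required` predicate are hypothetical stand-ins, and the real `ConceptIndex`, `AliasStore`, and `QueryEngine` signatures may differ.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a stored assertion.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Assertion {
    subject: String,
    predicate: String,
    object: String,
}

/// Build a cross-scheme key from the last two path segments plus the predicate.
fn make_key(subject: &str, predicate: &str) -> String {
    // Drop the scheme prefix ("code://", "rfc://") when present.
    let path = subject.split_once("://").map(|(_, rest)| rest).unwrap_or(subject);
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let start = segments.len().saturating_sub(2);
    format!("{}#{}", segments[start..].join("/"), predicate)
}

/// Hypothetical in-memory alias store (bidirectional, direct aliases only).
#[derive(Default)]
struct SketchAliasStore {
    aliases: HashMap<String, BTreeSet<String>>,
}

impl SketchAliasStore {
    /// Record the alias in both directions; inserting twice is a no-op.
    fn set_alias(&mut self, a: &str, b: &str) {
        self.aliases.entry(a.to_string()).or_default().insert(b.to_string());
        self.aliases.entry(b.to_string()).or_default().insert(a.to_string());
    }

    /// Return the subject itself plus every direct alias.
    fn resolve_all(&self, subject: &str) -> BTreeSet<String> {
        let mut out = BTreeSet::from([subject.to_string()]);
        if let Some(set) = self.aliases.get(subject) {
            out.extend(set.iter().cloned());
        }
        out
    }
}

/// Look up assertions for a subject, expanding aliases only when requested
/// and only when a store is configured (otherwise fall back to the subject).
fn query_subject(
    assertions: &[Assertion],
    store: Option<&SketchAliasStore>,
    subject: &str,
    resolve_aliases: bool,
) -> Vec<Assertion> {
    let subjects = match (resolve_aliases, store) {
        (true, Some(s)) => s.resolve_all(subject),
        _ => BTreeSet::from([subject.to_string()]),
    };
    let mut hits: Vec<Assertion> = assertions
        .iter()
        .filter(|a| subjects.contains(&a.subject))
        .cloned()
        .collect();
    // Several expanded subjects may yield the same assertion; deduplicate.
    hits.sort();
    hits.dedup();
    hits
}
```

In this sketch, a code-scheme subject and an RFC-scheme subject that share their last two path segments produce the same key, and a query with `resolve_aliases` set to `false`, or with no store configured, only ever sees the literal subject.
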
Alias resolution in `QueryEngine` is the architecturally correct long-term fix; it complements leaf matching.

### 2A.3 Auto-Alias Creation ✅

When Aphoria ingests authoritative assertions and code claims that share leaf names, it automatically creates aliases:

- `code://rust/myapp/tls/cert_verification` ↔ `rfc://5246/tls/cert_verification`
- `code://rust/myapp/auth/jwt/audience_validation` ↔ `rfc://7519/jwt/audience_validation`

This bridges 2A.1 (leaf matching) with 2A.2 (alias resolution): leaf matching identifies candidates, and aliases persist the relationship.

**Implementation:**

- Added `auto_create_aliases: bool` config option to `AliasConfig` (defaults to `true`)
- Added `AliasOrigin::AutoDetected` variant to `stemedb-core` for tracking auto-created aliases
- Wired `GenericAliasStore` into `LocalEpisteme` for alias persistence
- In `check_conflicts()`, when a code claim matches an authoritative claim by leaf, calls `AliasStore.set_alias()` to persist the relationship with `AliasOrigin::AutoDetected`
- Alias creation is idempotent (skips if alias already exists)
- 4 unit tests verify: alias creation on conflict, no creation when disabled, correct origin, idempotency

---

## Phase 1: Authoritative Corpus Expansion ✅

> Expanded from 11 hardcoded assertions to a pluggable corpus system with RFC, OWASP, and Vendor sources.

### Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      aphoria corpus build                       │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ RFC Ingester │  │ OWASP        │  │ Vendor Bootstrapper   │  │
│  │ (Tier 0)     │  │ Ingester     │  │ (Tier 2)              │  │
│  │              │  │ (Tier 1)     │  │                       │  │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬───────────┘  │
│         │                 │                      │              │
│         └─────────────────┼──────────────────────┘              │
│                           ▼                                     │
│                  ┌─────────────────┐                            │
│                  │ CorpusRegistry  │                            │
│                  └────────┬────────┘                            │
│                           ▼                                     │
│                  ┌─────────────────┐                            │
│                  │ LocalEpisteme   │                            │
│                  │ ingest_         │                            │
│                  │ authoritative() │                            │
│                  └─────────────────┘                            │
└─────────────────────────────────────────────────────────────────┘
```

### 1.1 CorpusBuilder Trait ✅

| Task | Status |
|------|--------|
| `CorpusBuilder` trait | ✅ `corpus/mod.rs`: name, scheme, default_tier, build, requires_network |
| `CorpusRegistry` | ✅ Manages multiple builders, build_all(), list_builders() |
| `CorpusBuildResult` | ✅ Stats per builder, total assertions, success/fail/skip counts |

### 1.2 RFC Ingester ✅

| Task | Status |
|------|--------|
| `RfcCorpusBuilder` | ✅ `corpus/rfc.rs` |
| HTTP fetching | ✅ Via `ureq`, cached to `~/.cache/aphoria/rfc-cache/` |
| RFC 2119 keyword parsing | ✅ MUST, MUST NOT, SHOULD, SHALL extraction |
| RFC-specific parsers | ✅ JWT (7519), OAuth (6749), Bearer (6750), TLS 1.3 (8446), TLS BCP (7525), TOTP (6238), Basic Auth (7617), HTTP (9110) |
| Concept mapping | ✅ `rfc://{number}/{topic}` at Tier 0 (Regulatory) |

### 1.3 OWASP Ingester ✅

| Task | Status |
|------|--------|
| `OwaspCorpusBuilder` | ✅ `corpus/owasp.rs` |
| HTTP fetching | ✅ From GitHub raw content, cached to `~/.cache/aphoria/owasp-cache/` |
| Markdown parsing | ✅ MUST/SHOULD statements, section context |
| Cheat sheet parsers | ✅ Authentication, JWT, TLS, Secrets, Input Validation, Session, CSRF, Password Storage, HTTP Headers |
| Concept mapping | ✅ `owasp://cheatsheet/{topic}/{claim}` at Tier 1 (Clinical) |

### 1.4 Vendor Docs ✅

| Task | Status |
|------|--------|
| `VendorCorpusBuilder` | ✅ `corpus/vendor.rs` |
| PostgreSQL claims | ✅ pool_size, idle_timeout, ssl_mode |
| Redis claims | ✅ timeout, max_retries, tls |
| reqwest claims | ✅ cert_verification, connect_timeout, request_timeout |
| hyper claims | ✅ keep_alive_timeout, max_concurrent_streams |
| Go net/http claims | ✅ read_timeout, write_timeout, idle_timeout, min_tls_version |
| tokio-postgres claims | ✅ pool_size, ssl_mode |
| SQLx claims | ✅ max_connections, idle_timeout |
| Concept mapping | ✅ `vendor://{product}/{topic}/{claim}` at Tier 2 (Observational) |

### 1.5 Hardcoded Refactor ✅

| Task | Status |
|------|--------|
| `HardcodedCorpusBuilder` | ✅ `corpus/hardcoded.rs`: original 11 assertions |
| `create_authoritative_assertion()` | ✅ Made public in `episteme.rs` for corpus builders |

### 1.6 CLI Integration ✅

| Task | Status |
|------|--------|
| `aphoria corpus build` | ✅ Fetches and ingests from all sources |
| `--only rfc,owasp,vendor` | ✅ Filter to specific sources |
| `--offline` | ✅ Skip network-requiring sources |
| `--clear-cache` | ✅ Clear cache before building |
| `aphoria corpus list` | ✅ List available corpus sources |
| `CorpusConfig` | ✅ `cache_dir`, `include_*`, `rfc_list` options |

### 1.7 Error Handling ✅

| Task | Status |
|------|--------|
| `RfcFetch` error | ✅ Per-RFC fetch failures with context |
| `OwaspFetch` error | ✅ Per-cheat-sheet fetch failures with context |
| `CorpusBuild` error | ✅ General corpus build failures |
| Graceful degradation | ✅ Continue with other sources if one fails |

**Files:** `corpus/mod.rs`, `corpus/hardcoded.rs`, `corpus/rfc.rs`, `corpus/owasp.rs`, `corpus/vendor.rs`

---

## Phase 3: Skill Integration ✅

> Complete. Aphoria is now usable in Claude Code agent workflows.

### 3.1 Claude Code Skill ✅

| Task | Status |
|------|--------|
| `skill/SKILL.md` | ✅ Comprehensive skill definition with all commands |
| `/aphoria scan` | ✅ Scan project, show conflicts grouped by verdict |
| `/aphoria scan --fix` | ✅ Interactive fix workflow |
| `/aphoria ack` | ✅ Acknowledge conflicts as intentional |
| `/aphoria status` | ✅ Show status and baseline |
| `/aphoria diff` | ✅ Show changes since baseline |
| `/aphoria init` | ✅ Initialize Aphoria |
| `/aphoria baseline` | ✅ Set baseline |
| `skill/install.sh` | ✅ Install script for `~/.claude/skills/aphoria/` |

**Files:** `skill/SKILL.md`, `skill/install.sh`, `skill/hooks.json`

### 3.2 Agent Pre-Flight Hook ✅

| Task | Status |
|------|--------|
| `--exit-code` flag | ✅ Returns 2 for BLOCK, 1 for FLAG only, 0 for clean |
| `--strict` flag | ✅ Lower thresholds (FLAG at 0.3, BLOCK at 0.5) |
| Hook template | ✅ `skill/hooks.json` with PreCommit and PrePush examples |

**Usage:**

```json
{
  "hooks": {
    "PreCommit": [{"command": "aphoria scan --format sarif --exit-code"}],
    "PrePush": [{"command": "aphoria scan --strict --exit-code"}]
  }
}
```

### 3.3 Alias Suggestion Workflow ✅

Alias creation is now automatic (Phase 2A.3). When Aphoria scans:

1. Tail-path matching finds authoritative assertions
2. Aliases are auto-created with `AliasOrigin::AutoDetected`
3. Future queries use the alias automatically

The skill documents the suggestion flow for manual alias management:

- **y (Accept)**: Creates alias
- **n (Reject)**: Records intentional difference
- **defer**: Flags for later review

---

## Phase 4: Pre-Commit Integration ⬜

> Depends on Phase 3 (skill validates the UX before hook automation).

### 4.1 Git Pre-Commit Hook ⬜

A git pre-commit hook that runs Aphoria before every commit:

```bash
#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --exit-code --format table
if [ $? -eq 2 ]; then
  echo "BLOCKED: Fix conflicts before committing"
  exit 1
fi
```

Or using the pre-commit framework (`.pre-commit-config.yaml`):

```yaml
repos:
  - repo: local
    hooks:
      - id: aphoria
        name: Aphoria Security Lint
        entry: aphoria scan --exit-code
        language: system
        pass_filenames: false
```

### 4.2 Baseline Mode ✅

Already implemented in Phase 2. For existing projects with many conflicts:

```
$ aphoria baseline
Baseline recorded: 12 existing conflicts frozen.
Future scans will only report new conflicts.
```

### 4.3 Diff-Only Scanning ⬜

Scan only changed files instead of the whole project:

```bash
# Scan only staged files
aphoria scan --staged

# Scan only files changed since baseline
aphoria scan --since-baseline
```

This makes pre-commit hooks fast even in large projects.

---

## Phase 4.5: Ephemeral Scan Mode ✅

> Performance optimization: 40x faster scans by skipping Episteme storage when persistence isn't needed.

### Problem

Every `aphoria scan` was slow because it initialized the full Episteme stack:

- WAL recovery (O(n) on every startup)
- Dual backend initialization (fjall + redb)
- Store and index initialization

But conflict detection runs entirely in memory: it never reads from the KV store. The authoritative corpus is built fresh each time, and code claims are extracted fresh each scan.

### Solution

Added `ScanMode` enum with two modes:

| Mode | Use Case | Storage | Performance |
|------|----------|---------|-------------|
| **Ephemeral** (default) | CI, pre-commit, quick checks | None | ~0.25 seconds |
| **Persistent** | Baseline/diff tracking, alias creation | WAL + store | ~1-2 seconds |

### Implementation ✅

| Task | Status |
|------|--------|
| `ScanMode` enum | ✅ `types.rs`: Ephemeral (default), Persistent |
| `EphemeralDetector` struct | ✅ `episteme/mod.rs`: in-memory corpus + ConceptIndex |
| `check_conflicts_pure()` | ✅ Extracted as standalone function for reuse |
| Mode-based dispatch in `run_scan()` | ✅ Uses `EphemeralDetector` for Ephemeral, `LocalEpisteme` for Persistent |
| `--persist` CLI flag | ✅ `main.rs`: opt-in to persistent mode |
| Tests for both modes | ✅ `test_ephemeral_scan_no_storage_created`, `test_persistent_scan_creates_storage`, `test_scan_modes_produce_same_conflicts` |

### Usage

```bash
# Fast ephemeral scan (default): no storage created
aphoria scan .

# Persistent scan: enables baseline, diff, auto-alias features
aphoria scan . --persist
```

### Performance

| Mode | Time | Storage |
|------|------|---------|
| Ephemeral | ~0.25s | None |
| Persistent | ~1-2s | WAL + store directories |

**Files:** `types.rs`, `episteme/mod.rs`, `lib.rs`, `main.rs`, `tests.rs`

---

## Phase 5: Research Agent Loop ✅

> Research agent fills gaps in authoritative coverage by researching official documentation.

### 5.1 Gap Detection ✅

| Task | Status |
|------|--------|
| `Gap` struct | ✅ `research/gap_detector.rs`: concept_path, topic, predicate, source info |
| `detect_gaps()` | ✅ Compares claims against ConceptIndex, identifies missing coverage |
| Topic normalization | ✅ Extracts last 2 path segments for cross-scheme matching |
| Deduplication | ✅ Deduplicates gaps by topic+predicate key |

### 5.2 Gap Storage ✅

| Task | Status |
|------|--------|
| `GapRecord` | ✅ `research/gap_store.rs`: tracking metadata, project count, research status |
| `GapStore` | ✅ JSON-backed persistent storage with atomic saves |
| Project tracking | ✅ Records which projects reported each gap |
| Research eligibility | ✅ `is_eligible_for_research()` with threshold and cooldown |
| Gap pruning | ✅ `prune_old_gaps()` removes stale entries |

### 5.3 Quality Validation ✅

| Task | Status |
|------|--------|
| `QualityValidator` | ✅ `research/quality.rs`: validates researched claims |
| Source attribution | ✅ Checks for authoritative domains (rfc-editor, owasp, vendor docs) |
| Normative language | ✅ Verifies MUST/SHOULD/SHALL keywords present |
| Vague content detection | ✅ Rejects "it depends", "typically", etc. |
| Consistency scoring | ✅ Detects conflicting claims on same subject |
| `QualityReport` | ✅ Detailed per-claim validation results |
| `filter_passed()` | ✅ Returns only claims meeting quality threshold |

### 5.4 Research Execution ✅

| Task | Status |
|------|--------|
| `Researcher` | ✅ `research/researcher.rs`: orchestrates research pipeline |
| `DocumentationSource` | ✅ Configurable sources with URL patterns and topics |
| Default sources | ✅ Redis, PostgreSQL, Go, Rust, OWASP, Kafka, MongoDB |
| Content fetching | ✅ HTTP with timeout and size limits |
| Normative extraction | ✅ Regex-based MUST/SHOULD/SHALL extraction |
| Section tracking | ✅ Extracts heading context for attribution |
| Confidence scoring | ✅ Based on keyword strength, statement length, content size |

### 5.5 CLI Integration ✅

| Task | Status |
|------|--------|
| `aphoria research run` | ✅ Run research agent with configurable threshold |
| `aphoria research status` | ✅ Show gap statistics and research progress |
| `aphoria research gaps` | ✅ List gaps by project count |
| `--threshold` | ✅ Minimum projects before researching (default: 3) |
| `--strict` | ✅ Use strict quality validation |
| `--prune` | ✅ Remove stale gaps before researching |
| `--ready` | ✅ Show only gaps ready for research |

**Files:** `research/mod.rs`, `research/gap_detector.rs`, `research/gap_store.rs`, `research/quality.rs`, `research/researcher.rs`, `research/tests.rs`

### 5.7 Security Extractors ✅

Extended Phase 2 extractors with OWASP-aligned security vulnerability detection:

| Extractor | Detects | Languages |
|-----------|---------|-----------|
| `weak_crypto` | MD5, SHA1, DES, RC4 usage | Rust, Go, Python, JS/TS |
| `command_injection` | Shell execution, os.system, subprocess shell=True | Rust, Go, Python, JS/TS |
| `sql_injection` | String concatenation in SQL queries | Rust, Go, Python, JS/TS |

**Concept paths:**

- `crypto/hashing/algorithm`: MD5, SHA1
- `crypto/encryption/algorithm`: DES, RC4
- `os/command/input`, `os/shell_mode`: command injection
- `db/query/input`: SQL injection

### 5.6 Community Corpus Contributions ⬜

> Future: Users can opt in to contribute patterns anonymously.

- "Every Rust project has this JWT pattern" → pre-built alias set
- "This Redis config is always acknowledged" → adjust default threshold
- "This TLS pattern is always a real bug" → elevate threshold

---

## Phase 6: Federated Policy & Trust Packs ✅

> Allow teams to define their own authoritative truths and distribute them as signed Trust Packs. This enables "Enterprise Grade" compliance across distributed teams.

### 6.1 Trust Pack Format ✅

| Task | Status |
|------|--------|
| `TrustPack` schema | ✅ `policy.rs`: Assertions, Aliases, Metadata, Signature |
| `PackHeader` | ✅ Name, version, issuer, timestamp |
| Serialization | ✅ `rkyv` for zero-copy efficiency |
| Signing | ✅ `ed25519-dalek` signing and verification |

### 6.2 Policy Management ✅

| Task | Status |
|------|--------|
| `PolicyManager` | ✅ Loads local and remote (HTTP/HTTPS) policies |
| Caching | ✅ Caches remote policies in `~/.cache/aphoria/policies/` |
| `aphoria.toml` config | ✅ `policies` list support |

### 6.3 Core Integration ✅

| Task | Status |
|------|--------|
| `EphemeralDetector` integration | ✅ Ingests policies into memory corpus/index |
| `check_conflicts_pure` update | ✅ Resolves policy aliases before authoritative lookup |
| `LocalEpisteme` export helpers | ✅ `fetch_acknowledgments`, `fetch_manual_aliases` |

### 6.4 CLI Commands ✅

| Task | Status |
|------|--------|
| `aphoria policy export` | ✅ Exports local `ack` decisions as a Trust Pack |
| `aphoria scan` policy loading | ✅ Auto-loads policies from config |

**Files:** `policy.rs`, `config.rs`, `episteme/mod.rs`, `lib.rs`, `main.rs`

---

## Phase 7: Declarative Extractors ⬜

> Enable users to define new extractors in config/policy files (YAML/TOML) without writing Rust code.
> This removes the recompilation bottleneck for custom pattern enforcement.

### 7.1 Declarative Schema ⬜

Define a schema for pattern-based extraction:

```yaml
extractors:
  - name: "api_style"
    language: "go"
    pattern: 'func \w+\(.*\) \[\]\w+'
    claim:
      subject: "api/response_format"
      predicate: "structure"
      object: "raw_array"
```

### 7.2 Implementation Tasks ⬜

| Task | Description |
|------|-------------|
| `DeclarativeExtractor` | Generic extractor implementation reading from config |
| `ExtractorConfig` update | Load declarative definitions from `aphoria.toml` and Trust Packs |
| `Regex` optimization | Pre-compile all declarative patterns |
| Validation | Ensure valid regex and claim structure at load time |

---

## Milestone Summary

| Phase | Deliverable | Depends On | Status |
|-------|-------------|------------|--------|
| 0 | ConceptPath in StemeDB | concept-hierarchy spec | ✅ |
| 2 | Aphoria CLI (scan, report, ack) | Phase 0 | ✅ |
| 2A.1 | Leaf-based concept matching | Phase 2 | ✅ |
| 2A.2 | Alias resolution in QueryEngine | Phase 2 | ✅ |
| 2A.3 | Auto-alias creation | Phase 2A.2 | ✅ |
| 1 | Authoritative corpus expansion | Phase 0 | ✅ |
| 3 | Claude Code skill + hooks | Phase 2A | ✅ |
| 4.5 | Ephemeral scan mode (40x faster) | Phase 2 | ✅ |
| 5 | Research agent loop | Phase 3 | ✅ |
| 5.7 | Security extractors (weak_crypto, command_injection, sql_injection) | Phase 2 | ✅ |
| 6 | Federated Policy & Trust Packs | Phase 4.5 | ✅ |
| **7** | **Declarative Extractors** | **Phase 6** | **⬜ NEXT** |
| **4** | **Pre-commit integration (git hooks, diff scanning)** | **Phase 3, 4.5** | **⬜ PLANNED** |

**Current state:**

- Phase 1 is complete: RFC, OWASP, and Vendor corpus builders with `aphoria corpus build` CLI
- Phase 2 expanded: 10 extractors including `weak_crypto`, `command_injection`, `sql_injection`
- Phase 2 code quality: DES/RC4 concept paths fixed, SHA1 behavior documented, JS exec() regex tightened
- Phase 2A is complete: conflict detection via tail-path matching, alias-aware QueryEngine, and auto-alias creation
- Phase 3 is complete: `/aphoria` skill installed to `~/.claude/skills/aphoria/`, hook templates ready
- Phase 4.5 is complete: Ephemeral scan mode with 40x faster performance for CI/pre-commit use
- Phase 5 is complete: Research agent with gap detection, quality validation, and official doc research
- Phase 6 is complete: Federated Policy & Trust Packs implemented (export, import, signing, remote loading)
- **183 tests pass. Clippy and fmt clean.**

**Next:** Phase 7: Declarative Extractors. This will allow Trust Packs to ship both the *Policy* (Assertion) and the *Detection Logic* (Extractor) in a single file.