# Aphoria Roadmap

---

## Phase 0: StemeDB Foundation ✅

> **Tracked in:** [roadmap.md § 5D. Concept Hierarchy](../../roadmap.md)

Changes to the core database that Aphoria depends on. Shipped as **Phase 5D** of the main StemeDB roadmap.

| Aphoria Phase 0 | StemeDB Phase 5D | Status |
|-----------------|------------------|--------|
| 0.1 ConceptPath Type | 5D.1 ConceptPath Type | ✅ |
| 0.2 ConceptPath in Assertion | (implicit in 5D.1) | ✅ |
| 0.3 Hierarchical Index | 5D.4 Hierarchical Query | ✅ |
| 0.4 Alias Store | 5D.3 Alias Store + 5D.5 Alias Resolution | ✅ |
| 0.5 Source Class Inference | 5D.6 Source Class Inference | ✅ |
| 0.6 Concept API Endpoints | 5D.7 Concept API Endpoints | ✅ |

**Spec:** [docs/specs/concept-hierarchy.md](../../docs/specs/concept-hierarchy.md)

---

## Phase 2: CLI Core ✅

> Phase 2 was built before Phase 1 (authoritative corpus expansion). The CLI pipeline works end-to-end with a bootstrapped corpus of 11 hardcoded assertions covering TLS, JWT, CORS, secrets, and rate limiting.

| Task | Status |
|------|--------|
| 2.1 Project Walker | ✅ `walker/mod.rs`, `walker/path_mapper.rs`, `walker/language.rs` |
| 2.2 Extractors (10) | ✅ `tls_verify`, `jwt_config`, `hardcoded_secrets`, `timeout_config`, `dep_versions`, `cors_config`, `rate_limit`, `weak_crypto`, `command_injection`, `sql_injection` |
| 2.3 Ingestion Bridge | ✅ `bridge.rs` — BLAKE3 hashing, Ed25519 signing, claim→assertion conversion |
| 2.4 Conflict Query | ✅ `episteme.rs` — LocalEpisteme with check_conflicts() |
| 2.5 Report Output | ✅ `report/` — table (comfy-table), JSON, SARIF 2.1.0, markdown |
| 2.6 Acknowledge Command | ✅ `lib.rs` acknowledge() |
| Baseline & Diff | ✅ `lib.rs` set_baseline(), show_diff() |
| Status Command | ✅ `lib.rs` show_status() |

183 tests pass. Clippy and fmt clean.

### Phase 2 Code Quality Fixes ✅

Code review improvements to extractors:

| Issue | Fix | Status |
|-------|-----|--------|
| DES/RC4 concept path misclassification | Split `check_pattern()` into `check_hash_pattern()` and `check_encryption_pattern()`; DES/RC4 now use `crypto/encryption/algorithm` path | ✅ |
| SHA1 edge case undocumented | Added comments and test documenting that SHA1 detection is intentionally broad (triggers for git hashes, etc.) | ✅ |
| JS exec() regex overly broad | Tightened regex to require `child_process.` prefix or non-word/non-dot preceding character; prevents `RegExp.exec()` false positives | ✅ |

---

## Phase 2A: Concept Matching ✅

> **Status:** Complete. Tail-path matching (2A.1), alias-aware queries (2A.2), and auto-alias creation (2A.3) all implemented.

### 2A.1 Leaf-Based Concept Matching (Aphoria-side fix) ✅

Implemented in `episteme.rs` via `ConceptIndex`:
- `make_key(subject, predicate)` extracts tail 2 path segments + predicate
- `build(assertions)` creates in-memory index keyed by tail path
- `lookup(subject, predicate)` finds matching authoritative assertions
- `check_conflicts()` uses `ConceptIndex` instead of `QueryEngine` for cross-scheme matching

Integration tests prove TLS and JWT conflicts are detected correctly.

### 2A.2 Alias Resolution in QueryEngine (StemeDB-side fix) ✅

Wired `AliasStore` into `QueryEngine.execute()`:
- Added `resolve_aliases: bool` field to `Query` (defaults to `false`)
- Added an optional `alias_store` field (`Option<…>`) to `QueryEngine`
- Added `.with_alias_store()` builder method
- When `resolve_aliases: true`, expands subject via `AliasStore.resolve_all()` before index lookup
- Added `fetch_by_subjects()` and `fetch_by_subjects_predicate()` for multi-subject deduplication
- Modified `Query.matches()` to skip subject filtering when aliases are resolved
- Skips fast path (MV lookup) when `resolve_aliases: true`
- Gracefully degrades when no alias store is configured

7 unit tests in `engine/tests/alias_resolution.rs`.

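A minimal sketch of the subject expansion in 2A.2, assuming a plain in-memory map in place of the real `AliasStore`. `AliasMap` and `subjects_to_query` are hypothetical names; only the `resolve_all` name comes from this roadmap, and its signature here is a guess.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for an alias store: concept path -> known aliases.
struct AliasMap {
    aliases: HashMap<String, Vec<String>>,
}

impl AliasMap {
    /// Returns the subject plus every alias recorded for it (assumed behavior).
    fn resolve_all(&self, subject: &str) -> Vec<String> {
        let mut out = vec![subject.to_string()];
        if let Some(list) = self.aliases.get(subject) {
            out.extend(list.iter().cloned());
        }
        out
    }
}

/// Expands a query subject into the subjects to look up in the index.
/// Falls back to the bare subject when alias resolution is off or no
/// store is configured (the graceful-degradation case).
fn subjects_to_query(
    subject: &str,
    resolve_aliases: bool,
    store: Option<&AliasMap>,
) -> Vec<String> {
    match (resolve_aliases, store) {
        (true, Some(s)) => {
            // BTreeSet deduplicates and yields a stable, sorted order.
            let unique: BTreeSet<String> = s.resolve_all(subject).into_iter().collect();
            unique.into_iter().collect()
        }
        _ => vec![subject.to_string()],
    }
}
```

Each returned subject would then be fetched and the results deduplicated, which is the role the roadmap gives to `fetch_by_subjects()`.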
Alias resolution is the architecturally correct long-term fix that complements leaf matching.

### 2A.3 Auto-Alias Creation ✅

When Aphoria ingests authoritative assertions and code claims that share leaf names, automatically create aliases:
- `code://rust/myapp/tls/cert_verification` ↔ `rfc://5246/tls/cert_verification`
- `code://rust/myapp/auth/jwt/audience_validation` ↔ `rfc://7519/jwt/audience_validation`

This bridges 2A.1 (leaf matching) with 2A.2 (alias resolution): leaf matching identifies candidates, aliases persist the relationship.

**Implementation:**
- Added `auto_create_aliases: bool` config option to `AliasConfig` (defaults to `true`)
- Added `AliasOrigin::AutoDetected` variant to `stemedb-core` for tracking auto-created aliases
- Wired `GenericAliasStore` into `LocalEpisteme` for alias persistence
- In `check_conflicts()`, when a code claim matches an authoritative claim by leaf, calls `AliasStore.set_alias()` to persist the relationship with `AliasOrigin::AutoDetected`
- Alias creation is idempotent (skips if alias already exists)
- 4 unit tests verify: alias creation on conflict, no creation when disabled, correct origin, idempotency

---

## Phase 1: Authoritative Corpus Expansion ✅

> Expanded from 11 hardcoded assertions to a pluggable corpus system with RFC, OWASP, and Vendor sources.

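To make "pluggable" concrete, here is a rough sketch of a builder trait and registry. The method names (`name`, `scheme`, `default_tier`, `build`, `requires_network`, `build_all`, `list_builders`) follow section 1.1; every signature, the `StaticBuilder` type, and the `BuildSummary` fields are assumptions for illustration only.

```rust
/// Illustrative trait: names follow the roadmap, signatures are assumed.
trait CorpusBuilder {
    fn name(&self) -> &str;
    fn scheme(&self) -> &str;
    fn default_tier(&self) -> u8;
    fn requires_network(&self) -> bool;
    /// Returns assertion subjects; a real builder would return full assertions.
    fn build(&self) -> Result<Vec<String>, String>;
}

/// Hypothetical builder backed by a fixed list, used only for this sketch.
struct StaticBuilder {
    name: &'static str,
    scheme: &'static str,
    tier: u8,
    network: bool,
    broken: bool,
    subjects: Vec<&'static str>,
}

impl CorpusBuilder for StaticBuilder {
    fn name(&self) -> &str { self.name }
    fn scheme(&self) -> &str { self.scheme }
    fn default_tier(&self) -> u8 { self.tier }
    fn requires_network(&self) -> bool { self.network }
    fn build(&self) -> Result<Vec<String>, String> {
        if self.broken {
            return Err(format!("{}: fetch failed", self.name));
        }
        Ok(self.subjects.iter().map(|s| s.to_string()).collect())
    }
}

#[derive(Debug, Default, PartialEq)]
struct BuildSummary {
    total_assertions: usize,
    succeeded: usize,
    failed: usize,
    skipped: usize,
}

struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    fn list_builders(&self) -> Vec<&str> {
        self.builders.iter().map(|b| b.name()).collect()
    }

    /// Runs every builder; one failing source does not stop the others.
    fn build_all(&self, offline: bool) -> BuildSummary {
        let mut summary = BuildSummary::default();
        for builder in &self.builders {
            if offline && builder.requires_network() {
                summary.skipped += 1;
                continue;
            }
            match builder.build() {
                Ok(assertions) => {
                    summary.succeeded += 1;
                    summary.total_assertions += assertions.len();
                }
                Err(_) => summary.failed += 1,
            }
        }
        summary
    }
}
```

Counting failures instead of returning early is one way to get the graceful degradation listed in 1.7.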
### Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      aphoria corpus build                       │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ RFC Ingester │  │    OWASP     │  │  Vendor Bootstrapper  │  │
│  │   (Tier 0)   │  │   Ingester   │  │       (Tier 2)        │  │
│  │              │  │   (Tier 1)   │  │                       │  │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬───────────┘  │
│         │                 │                      │              │
│         └─────────────────┼──────────────────────┘              │
│                           ▼                                     │
│                  ┌─────────────────┐                            │
│                  │ CorpusRegistry  │                            │
│                  └────────┬────────┘                            │
│                           ▼                                     │
│                  ┌─────────────────┐                            │
│                  │  LocalEpisteme  │                            │
│                  │  ingest_        │                            │
│                  │  authoritative()│                            │
│                  └─────────────────┘                            │
└─────────────────────────────────────────────────────────────────┘
```

### 1.1 CorpusBuilder Trait ✅

| Task | Status |
|------|--------|
| `CorpusBuilder` trait | ✅ `corpus/mod.rs` — name, scheme, default_tier, build, requires_network |
| `CorpusRegistry` | ✅ Manages multiple builders, build_all(), list_builders() |
| `CorpusBuildResult` | ✅ Stats per builder, total assertions, success/fail/skip counts |

### 1.2 RFC Ingester ✅

| Task | Status |
|------|--------|
| `RfcCorpusBuilder` | ✅ `corpus/rfc.rs` |
| HTTP fetching | ✅ Via `ureq`, cached to `~/.cache/aphoria/rfc-cache/` |
| RFC 2119 keyword parsing | ✅ MUST, MUST NOT, SHOULD, SHALL extraction |
| RFC-specific parsers | ✅ JWT (7519), OAuth (6749), Bearer (6750), TLS 1.3 (8446), TLS BCP (7525), TOTP (6238), Basic Auth (7617), HTTP (9110) |
| Concept mapping | ✅ `rfc://{number}/{topic}` at Tier 0 (Regulatory) |

### 1.3 OWASP Ingester ✅

| Task | Status |
|------|--------|
| `OwaspCorpusBuilder` | ✅ `corpus/owasp.rs` |
| HTTP fetching | ✅ From GitHub raw content, cached to `~/.cache/aphoria/owasp-cache/` |
| Markdown parsing | ✅ MUST/SHOULD statements, section context |
| Cheat sheet parsers | ✅ Authentication, JWT, TLS, Secrets, Input Validation, Session, CSRF, Password Storage, HTTP Headers |
| Concept mapping | ✅ `owasp://cheatsheet/{topic}/{claim}` at Tier 1 (Clinical) |

### 1.4 Vendor Docs ✅

| Task | Status |
|------|--------|
| `VendorCorpusBuilder` | ✅ `corpus/vendor.rs` |
| PostgreSQL claims | ✅ pool_size, idle_timeout, ssl_mode |
| Redis claims | ✅ timeout, max_retries, tls |
| reqwest claims | ✅ cert_verification, connect_timeout, request_timeout |
| hyper claims | ✅ keep_alive_timeout, max_concurrent_streams |
| Go net/http claims | ✅ read_timeout, write_timeout, idle_timeout, min_tls_version |
| tokio-postgres claims | ✅ pool_size, ssl_mode |
| SQLx claims | ✅ max_connections, idle_timeout |
| Concept mapping | ✅ `vendor://{product}/{topic}/{claim}` at Tier 2 (Observational) |

### 1.5 Hardcoded Refactor ✅

| Task | Status |
|------|--------|
| `HardcodedCorpusBuilder` | ✅ `corpus/hardcoded.rs` — original 11 assertions |
| `create_authoritative_assertion()` | ✅ Made public in `episteme.rs` for corpus builders |

### 1.6 CLI Integration ✅

| Task | Status |
|------|--------|
| `aphoria corpus build` | ✅ Fetches and ingests from all sources |
| `--only rfc,owasp,vendor` | ✅ Filter to specific sources |
| `--offline` | ✅ Skip network-requiring sources |
| `--clear-cache` | ✅ Clear cache before building |
| `aphoria corpus list` | ✅ List available corpus sources |
| `CorpusConfig` | ✅ cache_dir, include_*, rfc_list options |

### 1.7 Error Handling ✅

| Task | Status |
|------|--------|
| `RfcFetch` error | ✅ Per-RFC fetch failures with context |
| `OwaspFetch` error | ✅ Per-cheat-sheet fetch failures with context |
| `CorpusBuild` error | ✅ General corpus build failures |
| Graceful degradation | ✅ Continue with other sources if one fails |

**Files:** `corpus/mod.rs`, `corpus/hardcoded.rs`, `corpus/rfc.rs`, `corpus/owasp.rs`, `corpus/vendor.rs`

---

## Phase 3: Skill Integration ✅

> Complete. Aphoria is now usable in Claude Code agent workflows.

### 3.1 Claude Code Skill ✅

| Task | Status |
|------|--------|
| `skill/SKILL.md` | ✅ Comprehensive skill definition with all commands |
| `/aphoria scan` | ✅ Scan project, show conflicts grouped by verdict |
| `/aphoria scan --fix` | ✅ Interactive fix workflow |
| `/aphoria ack` | ✅ Acknowledge conflicts as intentional |
| `/aphoria status` | ✅ Show status and baseline |
| `/aphoria diff` | ✅ Show changes since baseline |
| `/aphoria init` | ✅ Initialize Aphoria |
| `/aphoria baseline` | ✅ Set baseline |
| `skill/install.sh` | ✅ Install script for `~/.claude/skills/aphoria/` |

**Files:** `skill/SKILL.md`, `skill/install.sh`, `skill/hooks.json`

### 3.2 Agent Pre-Flight Hook ✅

| Task | Status |
|------|--------|
| `--exit-code` flag | ✅ Returns 2 for BLOCK, 1 for FLAG only, 0 for clean |
| `--strict` flag | ✅ Lower thresholds (FLAG at 0.3, BLOCK at 0.5) |
| Hook template | ✅ `skill/hooks.json` with PreCommit and PrePush examples |

**Usage:**

```json
{
  "hooks": {
    "PreCommit": [{"command": "aphoria scan --format sarif --exit-code"}],
    "PrePush": [{"command": "aphoria scan --strict --exit-code"}]
  }
}
```

### 3.3 Alias Suggestion Workflow ✅

Alias creation is now automatic (Phase 2A.3). When Aphoria scans:
1. Tail-path matching finds authoritative assertions
2. Aliases are auto-created with `AliasOrigin::AutoDetected`
3. Future queries use the alias automatically

The skill documents the suggestion flow for manual alias management:
- **y (Accept)**: Creates alias
- **n (Reject)**: Records intentional difference
- **defer**: Flags for later review

---

## Phase 4: Full-Cycle Pre-Commit (Scan + Sync) ✅

> **Vision:** The pre-commit hook is a **bidirectional knowledge sync**, not just a read-only linter. Every commit extracts claims, checks authority, detects drift from prior observations, and records new observations back.

**Spec:** [uat/2026-02-04-full-cycle-precommit-vision.md](uat/2026-02-04-full-cycle-precommit-vision.md)

```
┌─────────────────────────────────────────────────────────────┐
│                       PRE-COMMIT FLOW                       │
├─────────────────────────────────────────────────────────────┤
│  1. EXTRACT   → What claims does this code make?            │
│  2. CHECK     → Against authority + own prior claims        │
│  3. CLASSIFY  → Authority conflict | Self conflict | Novel  │
│  4. UPDATE    → Record observations to local Episteme       │
│  5. GATE      → Exit code (BLOCK=2, FLAG=1, PASS=0)         │
└─────────────────────────────────────────────────────────────┘
```

### 4.1 Git Pre-Commit Hook ✅

All flags needed for pre-commit integration are implemented. Because `--sync` requires `--persist` (see 4A), the full-cycle hook passes both:

```bash
#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --staged --persist --sync --exit-code
```

Or using pre-commit framework:

```yaml
repos:
  - repo: local
    hooks:
      - id: aphoria
        name: Aphoria Truth Sync
        entry: aphoria scan --staged --persist --sync --exit-code
        language: system
        pass_filenames: false
```

### 4.2 Baseline Mode ✅

Already implemented in Phase 2.

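The GATE step of the pre-commit flow can be pictured as a small mapping from verdicts to a process exit code. The sketch below is illustrative: the exit codes (2, 1, 0) and the strict thresholds (0.3, 0.5) come from this roadmap, while the `Thresholds` type, the notion of a numeric conflict score, and the `>=` comparisons are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Pass,
    Flag,
    Drift,
    Block,
}

/// Hypothetical threshold pair applied to a conflict score in 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
struct Thresholds {
    flag: f64,
    block: f64,
}

/// Values documented for `--strict`; default thresholds are not stated here.
const STRICT: Thresholds = Thresholds { flag: 0.3, block: 0.5 };

/// Assumed classification rule: compare the score against both thresholds.
fn classify(score: f64, t: Thresholds) -> Verdict {
    if score >= t.block {
        Verdict::Block
    } else if score >= t.flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}

/// Maps a scan's verdicts to an exit code: 2 if anything blocks,
/// 1 if there are only flags or drift, 0 when clean.
fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Block) {
        2
    } else if verdicts
        .iter()
        .any(|v| matches!(v, Verdict::Flag | Verdict::Drift))
    {
        1
    } else {
        0
    }
}
```

A hook script only needs to propagate this code; git aborts the commit on any non-zero status.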
### 4A: Observational Claims ✅

Record code claims as Tier 4 (Community) assertions when no authority conflict exists:

| Task | Status |
|------|--------|
| `sync: bool` in ScanArgs | ✅ `types/command.rs` |
| `observations_recorded: usize` in ScanResult | ✅ `types/result.rs` |
| `--sync` CLI flag | ✅ `cli.rs` — requires `--persist` |
| `claim_to_observation()` | ✅ `bridge.rs` — creates Tier 4 (Community, 0.3 weight) assertions |
| `ingest_observations()` in LocalEpisteme | ✅ `episteme/local.rs` — writes to WAL + predicate index |
| Scan flow integration | ✅ `scan.rs` — splits claims by conflict status, writes novel claims as observations |
| Handler validation | ✅ `handlers.rs` — `--sync requires --persist` error |
| Report output | ✅ `report/table.rs`, `report/json.rs` — shows observation count |
| Tests | ✅ 5 new tests for observation write-back |

```
Code:      connection_pool.max_size = 25
Authority: (nothing)
Action:    Record as Tier 4 observation (project memory)
```

**Usage:**

```bash
# Scan with observation write-back
aphoria scan --persist --sync

# Output:
# Recorded 45 observations (project memory)
```

### 4B: Self-Conflict Detection ✅

Detect drift from the project's own prior observations:

| Task | Status |
|------|--------|
| Query prior claims before conflict check | ✅ `fetch_observations_for_concept()` |
| Compare current vs stored observations | ✅ `check_drift()` compares values |
| Report changes as SELF-CONFLICT | ✅ DriftResult with prior/current values |
| New verdict: `Drift` (distinct from Block/Flag) | ✅ `Verdict::Drift` |
| Drift reporting in all formats | ✅ table, json, markdown, sarif |
| Exit code includes drift | ✅ `--exit-code` returns 1 for drift |

```
Prior:  db/pool_size = 25 (recorded 2026-01-15)
Now:    db/pool_size = 100
Result: DRIFT — "You changed pool_size from 25 to 100. Intentional?"
```

**Files:** `types/result.rs`, `types/verdict.rs`, `episteme/local.rs`, `scan.rs`, `report/*.rs`

### 4C: Diff-Only Scanning ✅

Fast scanning for pre-commit hooks:

| Task | Status |
|------|--------|
| `FileSource` enum (All, Staged) | ✅ `types/command.rs` |
| `--staged` flag (git diff --cached) | ✅ `cli.rs`, `handlers.rs` |
| `walker/git.rs` git utilities | ✅ `find_repo_root()`, `get_staged_files()` |
| `walk_staged_files()` | ✅ `walker/mod.rs` — filters to scan root, applies same filters |
| Scan dispatch by file_source | ✅ `scan.rs` |
| Error handling (NotGitRepo, GitCommand) | ✅ `error.rs` |
| Tests | ✅ 9 tests in `tests/staged_scanning.rs` |
| Target: < 500ms for staged-only | ✅ |

**Files:** `types/command.rs`, `walker/git.rs`, `walker/mod.rs`, `scan.rs`, `cli.rs`, `handlers.rs`, `error.rs`

**Usage:**

```bash
# Pre-commit hook (fast, staged files only)
aphoria scan --staged --exit-code

# Full cycle with observation sync
aphoria scan --staged --persist --sync --exit-code
```

### 4D: Enhanced Ack ✅

Acknowledgments with rationale and policy updates:

| Task | Status |
|------|--------|
| `--reason "text"` flag | ✅ `cli.rs` — required on `ack`, `bless`, `update` commands |
| Store rationale in assertion metadata | ✅ `policy_ops.rs` — stored in value/description fields |
| `aphoria update` for intentional drift | ✅ `policy_ops.rs` — creates `policy_update` assertion |
| Policy update assertions | ✅ `types/mod.rs` — `predicates::POLICY_UPDATE` |

**Files:** `cli.rs`, `handlers.rs`, `policy_ops.rs`, `types/command.rs`, `types/mod.rs`

```bash
$ aphoria ack db/pool_size --reason "Scaling for Black Friday"
$ aphoria update db/pool_size 100 --reason "New baseline after load test"
```

### 4E: Hosted Mode ✅

Organizations run their own StemeDB server and all team members automatically sync observations:

| Task | Status |
|------|--------|
| `HostedConfig` in config.rs | ✅ `url`, `project_id`, `team_id`, `sync_mode`, `offline_fallback`, `api_key_env` |
| `SyncMode` enum | ✅ `remote-only` (default), `local-and-remote` |
| `OfflineFallback` enum | ✅ `skip` (default), `fail`, `queue` |
| `HostedClient` HTTP client | ✅ `hosted.rs` — retry logic, auth headers, observation push |
| `POST /v1/aphoria/observations` endpoint | ✅ Server receives observations with project/team metadata |
| Scan integration | ✅ Auto-enables sync when `[hosted]` configured |
| `Hosted(String)` error variant | ✅ For connection/auth failures |
| Graceful offline fallback | ✅ Based on `offline_fallback` config |
| Tests | ✅ Config parsing, client creation, assertion conversion |

```toml
# aphoria.toml
[hosted]
url = "https://episteme.acme.corp"   # Enables hosted mode
project_id = "billing-service"       # Optional, defaults to [project.name]
team_id = "platform-team"            # Optional, for multi-team servers
sync_mode = "remote-only"            # "remote-only" | "local-and-remote"
offline_fallback = "skip"            # "skip" | "fail" | "queue"
api_key_env = "APHORIA_API_KEY"      # Env var for auth token
```

**Architecture:**

```
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Developer A  │  │ Developer B  │  │ Developer C  │
│ aphoria scan │  │ aphoria scan │  │ aphoria scan │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
              ┌─────────────────────┐
              │ Team StemeDB Server │
              │ POST /v1/aphoria/   │
              │ observations        │
              └─────────────────────┘
                         │
                         ▼
              Aggregated team patterns
```

**Files:** `config.rs`, `hosted.rs`, `scan.rs`, `error.rs`, `lib.rs`, `crates/stemedb-api/src/handlers/aphoria.rs`, `crates/stemedb-api/src/dto/aphoria.rs`

---

## Phase 4.5: Ephemeral Scan Mode ✅

> Performance optimization: 40x faster scans by skipping Episteme storage when persistence isn't needed.

### Problem

Every `aphoria scan` was slow because it initialized the full Episteme stack:
- WAL recovery (O(n) on every startup)
- Dual backend initialization (fjall + redb)
- Store and index initialization

But conflict detection is actually 100% in-memory — it never reads from the KV store. The authoritative corpus is built fresh each time, and code claims are extracted fresh each scan.

### Solution

Added `ScanMode` enum with two modes:

| Mode | Use Case | Storage | Performance |
|------|----------|---------|-------------|
| **Ephemeral** (default) | CI, pre-commit, quick checks | None | ~0.25 seconds |
| **Persistent** | Baseline/diff tracking, alias creation | WAL + store | ~1-2 seconds |

### Implementation ✅

| Task | Status |
|------|--------|
| `ScanMode` enum | ✅ `types.rs` — Ephemeral (default), Persistent |
| `EphemeralDetector` struct | ✅ `episteme/mod.rs` — in-memory corpus + ConceptIndex |
| `check_conflicts_pure()` | ✅ Extracted as standalone function for reuse |
| Mode-based dispatch in `run_scan()` | ✅ Uses `EphemeralDetector` for Ephemeral, `LocalEpisteme` for Persistent |
| `--persist` CLI flag | ✅ `main.rs` — opt-in to persistent mode |
| Tests for both modes | ✅ `test_ephemeral_scan_no_storage_created`, `test_persistent_scan_creates_storage`, `test_scan_modes_produce_same_conflicts` |

### Usage

```bash
# Fast ephemeral scan (default) — no storage created
aphoria scan .

# Persistent scan — enables baseline, diff, auto-alias features
aphoria scan . --persist
```

### Performance

| Mode | Time | Storage |
|------|------|---------|
| Ephemeral | ~0.25s | None |
| Persistent | ~1-2s | WAL + store directories |

**Files:** `types.rs`, `episteme/mod.rs`, `lib.rs`, `main.rs`, `tests.rs`

---

## Phase 5: Research Agent Loop ✅

> Research agent fills gaps in authoritative coverage by researching official documentation.

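As a rough picture of the first step in this loop, the sketch below finds claims with no authoritative coverage. It follows the behavior listed in 5.1 (last 2 path segments as the topic, deduplication by topic plus predicate); the function names, the `Gap` fields shown, and the tuple-based inputs are simplifications assumed for illustration.

```rust
use std::collections::HashSet;

/// Last two path segments of a concept path, ignoring the scheme prefix.
fn tail_topic(path: &str) -> String {
    // Drop a "scheme://" prefix if one is present.
    let rest = path.split_once("://").map(|(_, r)| r).unwrap_or(path);
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    let start = segments.len().saturating_sub(2);
    segments[start..].join("/")
}

/// Simplified gap record; the real struct carries more source info.
#[derive(Debug, PartialEq)]
struct Gap {
    topic: String,
    predicate: String,
}

/// Returns one gap per (topic, predicate) pair that no authoritative
/// assertion covers. `claims` holds (subject, predicate) pairs and
/// `covered` holds the (topic, predicate) keys already in the corpus.
fn detect_gaps(claims: &[(&str, &str)], covered: &HashSet<(String, String)>) -> Vec<Gap> {
    let mut seen = HashSet::new();
    let mut gaps = Vec::new();
    for &(subject, predicate) in claims {
        let key = (tail_topic(subject), predicate.to_string());
        if covered.contains(&key) {
            continue;
        }
        // `insert` returns false for a duplicate key, so each gap appears once.
        if seen.insert(key.clone()) {
            gaps.push(Gap { topic: key.0, predicate: key.1 });
        }
    }
    gaps
}
```

Because the topic ignores scheme and project name, two projects reporting `db/pool_size` count toward the same gap.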
### 5.1 Gap Detection ✅

| Task | Status |
|------|--------|
| `Gap` struct | ✅ `research/gap_detector.rs` — concept_path, topic, predicate, source info |
| `detect_gaps()` | ✅ Compares claims against ConceptIndex, identifies missing coverage |
| Topic normalization | ✅ Extracts last 2 path segments for cross-scheme matching |
| Deduplication | ✅ Deduplicates gaps by topic+predicate key |

### 5.2 Gap Storage ✅

| Task | Status |
|------|--------|
| `GapRecord` | ✅ `research/gap_store.rs` — tracking metadata, project count, research status |
| `GapStore` | ✅ JSON-backed persistent storage with atomic saves |
| Project tracking | ✅ Records which projects reported each gap |
| Research eligibility | ✅ `is_eligible_for_research()` with threshold and cooldown |
| Gap pruning | ✅ `prune_old_gaps()` removes stale entries |

### 5.3 Quality Validation ✅

| Task | Status |
|------|--------|
| `QualityValidator` | ✅ `research/quality.rs` — validates researched claims |
| Source attribution | ✅ Checks for authoritative domains (rfc-editor, owasp, vendor docs) |
| Normative language | ✅ Verifies MUST/SHOULD/SHALL keywords present |
| Vague content detection | ✅ Rejects "it depends", "typically", etc. |
| Consistency scoring | ✅ Detects conflicting claims on same subject |
| `QualityReport` | ✅ Detailed per-claim validation results |
| `filter_passed()` | ✅ Returns only claims meeting quality threshold |

### 5.4 Research Execution ✅

| Task | Status |
|------|--------|
| `Researcher` | ✅ `research/researcher.rs` — orchestrates research pipeline |
| `DocumentationSource` | ✅ Configurable sources with URL patterns and topics |
| Default sources | ✅ Redis, PostgreSQL, Go, Rust, OWASP, Kafka, MongoDB |
| Content fetching | ✅ HTTP with timeout and size limits |
| Normative extraction | ✅ Regex-based MUST/SHOULD/SHALL extraction |
| Section tracking | ✅ Extracts heading context for attribution |
| Confidence scoring | ✅ Based on keyword strength, statement length, content size |

### 5.5 CLI Integration ✅

| Task | Status |
|------|--------|
| `aphoria research run` | ✅ Run research agent with configurable threshold |
| `aphoria research status` | ✅ Show gap statistics and research progress |
| `aphoria research gaps` | ✅ List gaps by project count |
| `--threshold` | ✅ Minimum projects before researching (default: 3) |
| `--strict` | ✅ Use strict quality validation |
| `--prune` | ✅ Remove stale gaps before researching |
| `--ready` | ✅ Show only gaps ready for research |

**Files:** `research/mod.rs`, `research/gap_detector.rs`, `research/gap_store.rs`, `research/quality.rs`, `research/researcher.rs`, `research/tests.rs`

### 5.7 Security Extractors ✅

Extended Phase 2 extractors with OWASP-aligned security vulnerability detection:

| Extractor | Detects | Languages |
|-----------|---------|-----------|
| `weak_crypto` | MD5, SHA1, DES, RC4 usage | Rust, Go, Python, JS/TS |
| `command_injection` | Shell execution, os.system, subprocess shell=True | Rust, Go, Python, JS/TS |
| `sql_injection` | String concatenation in SQL queries | Rust, Go, Python, JS/TS |

**Concept paths:**
- `crypto/hashing/algorithm` — MD5, SHA1
- `crypto/encryption/algorithm` — DES, RC4
- `os/command/input`, `os/shell_mode` — command injection
- `db/query/input` — SQL injection

### 5.6 Community Corpus Contributions ✅

> Users can opt in to contribute patterns anonymously to a central corpus, enabling community consensus to adjust default thresholds.

| Task | Status |
|------|--------|
| `CommunityConfig` | ✅ `config/mod.rs` — enabled (false), anonymize (true), exclude, include, min_confidence |
| `AnonymizedObservation` | ✅ `community/types.rs` — privacy-preserving observation without file/line/text |
| `CommunityObjectValue` | ✅ `community/types.rs` — serde-compatible version of ObjectValue |
| `PatternAggregate` | ✅ `community/types.rs` — server-side aggregation with project counts |
| `anonymize_claim()` | ✅ `community/anonymizer.rs` — wildcards project names, strips file/line, rounds timestamps |
| `compute_anon_hash()` | ✅ Hash computed WITHOUT file/line/text (privacy-critical) |
| `wildcard_project_path()` | ✅ `code://rust/myapp/tls` → `code://rust/*/tls` |
| `--community-preview` flag | ✅ `cli.rs` — dry-run showing what WOULD be shared |
| `PatternAggregateStore` | ✅ `stemedb-storage` — server-side pattern aggregation |
| Project deduplication | ✅ Uses project_hash to prevent double-counting |
| `POST /v1/aphoria/community/observations` | ✅ Push anonymized observations |
| `GET /v1/aphoria/patterns` | ✅ Retrieve high-confidence community patterns |

**Privacy Model:**
- Project names wildcarded: `myapp` → `*`
- File paths, line numbers, matched text NEVER shared
- Timestamps rounded to hour (k-anonymity)
- Server receives `project_hash`, not raw project names
- `enabled` defaults to `false` (explicit opt-in required)
- `anonymize` defaults to `true` (privacy-preserving by default)

**Usage:**

```bash
# Preview what would be shared (no network)
aphoria scan --community-preview

# Enable in aphoria.toml:
#   [community]
#   enabled = true
#   anonymize = true
#   min_confidence = 0.8
#   exclude = ["vendor://acme/internal/*"]

# Scan with sync to share patterns
aphoria scan --persist --sync
```

**Files:** `community/mod.rs`, `community/types.rs`, `community/anonymizer.rs`, `config/mod.rs`, `cli.rs`, `handlers.rs`, `stemedb-storage/src/pattern_aggregate_store/`

---

## Phase 6: Federated Policy & Trust Packs ✅

> Allow teams to define their own authoritative truths and distribute them as signed Trust Packs. This enables "Enterprise Grade" compliance across distributed teams.

### 6.1 Trust Pack Format ✅

| Task | Status |
|------|--------|
| `TrustPack` schema | ✅ `policy.rs` — Assertions, Aliases, Metadata, Signature |
| `PackHeader` | ✅ Name, version, issuer, timestamp |
| Serialization | ✅ `rkyv` for zero-copy efficiency |
| Signing | ✅ `ed25519-dalek` signing and verification |

### 6.2 Policy Management ✅

| Task | Status |
|------|--------|
| `PolicyManager` | ✅ Loads local and remote (HTTP/HTTPS) policies |
| Caching | ✅ Caches remote policies in `~/.cache/aphoria/policies/` |
| `aphoria.toml` config | ✅ `policies` list support |

### 6.3 Core Integration ✅

| Task | Status |
|------|--------|
| `EphemeralDetector` integration | ✅ Ingests policies into memory corpus/index |
| `check_conflicts_pure` update | ✅ Resolves policy aliases before authoritative lookup |
| `LocalEpisteme` export helpers | ✅ `fetch_acknowledgments`, `fetch_manual_aliases` |

### 6.4 CLI Commands ✅

| Task | Status |
|------|--------|
| `aphoria policy export` | ✅ Exports local `ack` decisions as a Trust Pack |
| `aphoria scan` policy loading | ✅ Auto-loads policies from config |

**Files:** `policy.rs`, `config.rs`, `episteme/mod.rs`, `lib.rs`, `main.rs`

---

## Phase 6.5: Trust Pack Extensions ✅

> Enhancements to Trust Packs for semantic predicate matching and key management.

### 6.5.1 Predicate Aliases ✅

**Status:** Complete
**Implemented:** 2026-02-06

**User Story:**
> As a security architect, when my policy uses `required=true` but the extractor emits `enabled=true`, I need them to match semantically.

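One way to picture the semantic match this story asks for is to normalize every predicate to the name of its alias group before comparing. The sketch below assumes alias groups shaped like the `[predicate_aliases]` table in this section; the function names are hypothetical and this is not necessarily how `ConceptIndex` does it.

```rust
use std::collections::HashMap;

/// Inverts alias groups into a lookup from each alias to its group name.
fn build_alias_index(groups: &HashMap<String, Vec<String>>) -> HashMap<String, String> {
    let mut index = HashMap::new();
    for (canonical, members) in groups {
        for member in members {
            index.insert(member.clone(), canonical.clone());
        }
    }
    index
}

/// Returns the group name for an aliased predicate, or the predicate itself.
fn normalize_predicate<'a>(predicate: &'a str, index: &'a HashMap<String, String>) -> &'a str {
    index.get(predicate).map(String::as_str).unwrap_or(predicate)
}

/// Two predicates match when they normalize to the same name.
fn predicates_match(a: &str, b: &str, index: &HashMap<String, String>) -> bool {
    normalize_predicate(a, index) == normalize_predicate(b, index)
}
```

With a `security_enabled` group containing `enabled` and `required`, a policy predicate `required` and an extracted predicate `enabled` land on the same key, so the value mismatch becomes a detectable conflict.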
**Problem:**
- Policy blesses: `code://standard/tls/cert_verification` with predicate `required`, value `true`
- Extractor emits: `code://config/tls/cert_verification` with predicate `enabled`, value `false`
- Tail-path matching finds the concept (`tls/cert_verification`) ✓
- But predicates differ: `required` vs `enabled` — no conflict detected ✗

**Solution:**

| Task | Description |
|------|-------------|
| `predicate_aliases` field | Add to Trust Pack schema |
| Default aliases | `enabled` ↔ `required` ↔ `mandatory` ↔ `enforced` |
| ConceptIndex update | Check aliases during lookup |
| Pack-defined aliases | Allow packs to specify custom alias sets |

**Trust Pack Schema Extension:**

```toml
# In Trust Pack
[predicate_aliases]
security_enabled = ["enabled", "required", "mandatory", "enforced", "active"]
version_minimum = ["min_version", "minimum_version", "tls_min_version"]
```

**Implementation Plan:**
1. Add `predicate_aliases: HashMap<String, Vec<String>>` to `TrustPack`
2. Store aliases alongside assertions during import
3. Update `ConceptIndex.make_key()` to normalize predicates via aliases
4. Match during conflict detection: if `predicate_a` aliases to `predicate_b`, treat as same concept

### 6.5.2 Pack Signing Key Rotation ✅

**Status:** Complete
**Implemented:** 2026-02-06

**User Story:**
> As a security admin, when our signing key is rotated, I need to re-sign all packs without losing policy content.

**Problem:**
- Trust Packs are signed with Ed25519 keys
- When keys are rotated (security best practice), existing packs become unverifiable
- Need to re-sign packs with new key while preserving content hash

**Solution:**

| Task | Description |
|------|-------------|
| `aphoria policy resign` | CLI command to re-sign pack with new key |
| Content hash preservation | Keep `content_hash` unchanged, only update signature |
| Key rotation audit | Log key rotation events |
| Old signature archival | Optionally keep old signature for audit trail |

**CLI:**

```bash
# Re-sign pack with new key
aphoria policy resign my-standards.pack --key-file new-private-key.pem

# Re-sign with signature chain (audit trail)
aphoria policy resign my-standards.pack --key-file new-key.pem --chain-signatures
```

**Trust Pack Schema Extension:**

```rust
pub struct TrustPack {
    // Existing fields...
    pub signature: Signature,
    // New field for key rotation audit
    pub signature_chain: Option<Vec<SignatureRecord>>,
}

pub struct SignatureRecord {
    pub issuer_public_key: [u8; 32],
    pub signature: Signature,
    pub signed_at: DateTime<Utc>,
    pub reason: Option<String>, // "Key rotation", "Security incident", etc.
}
```

### 6.5.3 Priority

| Feature | Priority | Trigger |
|---------|----------|---------|
| Predicate Aliases | Medium | Enterprise feedback showing predicate naming conflicts |
| Key Rotation | Low | Enterprise security key management requirements |

**Documented in:** [uat/future-scenarios.md](uat/future-scenarios.md)

---

## Phase 7: Declarative Extractors ✅

> Enable users to define new extractors in config/policy files (TOML) without writing Rust code. This removes the recompilation bottleneck for custom pattern enforcement.

**User Outcome:** "I added a custom extractor to my aphoria.toml that detects our company's deprecated API patterns. Now every scan flags files using the old pattern without me writing any Rust code."

### 7.1 Core Types ✅

| Task | Status |
|------|--------|
| `DeclarativeExtractorDef` | ✅ `extractors/declarative.rs` — name, description, languages, pattern, claim, confidence |
| `DeclarativeClaimDef` | ✅ subject, predicate, value specification |
| `DeclarativeValue` enum | ✅ MatchedText, Boolean, Text variants |
| `DeclarativeExtractor` | ✅ Compiled extractor with `Extractor` trait impl |

### 7.2 Configuration ✅

| Task | Status |
|------|--------|
| `ExtractorConfig.declarative` | ✅ `config/mod.rs` — `Vec<DeclarativeExtractorDef>` |
| TOML parsing | ✅ Serde deserialization with `#[serde(untagged)]` for value types |
| Example config | ✅ Documented in module and config docs |

**Example aphoria.toml:**

```toml
[[extractors.declarative]]
name = "deprecated_api_v1"
description = "Detects usage of deprecated v1 API endpoints"
languages = ["go", "rust", "python"]
pattern = '/api/v1/\w+'
claim.subject = "api/deprecated_endpoint"
claim.predicate = "version"
claim.value = "v1"
confidence = 1.0

[[extractors.declarative]]
name = "legacy_encryption"
description = "Detects legacy encryption algorithms"
languages = ["rust", "go", "python", "javascript"]
pattern = '(?i)blowfish|twofish|cast5'
claim.subject = "crypto/encryption/algorithm"
claim.predicate = "algorithm"
claim.value_from_match = true
confidence = 0.9
```

### 7.3 Validation & Security ✅

| Task | Status |
|------|--------|
| Name validation | ✅ Non-empty required |
| Subject/predicate validation | ✅ Non-empty required |
| Confidence validation | ✅ Must be 0.0-1.0 |
| Regex validation | ✅ Compiled at load time, not scan time |
| ReDoS protection | ✅ `RegexBuilder` with 10MB size limits |
| Language parsing | ✅ `Language::from_str()` with `FromStr` trait |
| Graceful failure | ✅ Invalid extractors logged as warnings, don't block others |

### 7.4 Registry Integration ✅

| Task | Status |
|------|--------|
| Module export | ✅ `extractors/mod.rs` — public types |
| Registry registration | ✅ `ExtractorRegistry::new()` loads from config |
| Enable/disable support | ✅ Declarative extractors respect `disabled` list |
| Runtime addition | ✅ `add_from_definitions()` for Trust Pack integration |

### 7.5 Error Handling ✅

| Task | Status |
|------|--------|
| `DeclarativeExtractor` error variant | ✅ `error.rs` — name + message |
| Validation errors | ✅ Clear messages for each failure mode |
| Structured logging | ✅ `tracing::warn!` for compilation failures |

### 7.6 Tests ✅

| Task | Status |
|------|--------|
| Unit tests | ✅ 22 tests in `declarative.rs` |
| Registry tests | ✅ 7 tests for integration |
| Validation tests | ✅ Empty name, subject, predicate; invalid confidence, regex, language |
| Extraction tests | ✅ Boolean, text, matched_text value types |
| Deserialization tests | ✅ TOML parsing for all value types |

**Files:** `extractors/declarative.rs`, `extractors/mod.rs`, `config/mod.rs`, `types/language.rs`, `error.rs`

---

## Phase 7.5: LLM-in-the-Loop Extraction ✅

> Use LLM (Gemini) to extract claims semantically during persistent scans. This fills gaps that regex extractors can't catch, providing immediate value while the learning system builds up pattern knowledge.

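The gating rules listed in 7.5.2 and 7.5.3 (high-value file, nothing found by regex, budget remaining) might combine roughly as follows. The directory list, the `is_high_value_file` and `within_budget` names, and the 50k default come from this roadmap; `TokenBudget`, `should_call_llm`, the use of `AtomicUsize`, and matching on directory segments only are assumptions of this sketch.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Directory names the roadmap lists as high-value.
const HIGH_VALUE_DIRS: [&str; 10] = [
    "auth", "config", "crypto", "security", "secrets",
    "certs", "ssl", "tls", "keys", "credentials",
];

/// True when any directory component of the path is a high-value name.
/// File-name keyword matching is omitted here for brevity.
fn is_high_value_file(path: &str) -> bool {
    let mut segments: Vec<&str> = path.split('/').collect();
    segments.pop(); // drop the file name, keep directories only
    segments.iter().any(|s| HIGH_VALUE_DIRS.contains(s))
}

/// Hypothetical per-scan token counter, shareable across threads.
struct TokenBudget {
    used: AtomicUsize,
    max_per_scan: usize,
}

impl TokenBudget {
    /// Would spending `estimated` more tokens stay within the scan budget?
    fn within_budget(&self, estimated: usize) -> bool {
        self.used.load(Ordering::Relaxed) + estimated <= self.max_per_scan
    }

    /// Records tokens consumed by a completed call.
    fn record(&self, tokens: usize) {
        self.used.fetch_add(tokens, Ordering::Relaxed);
    }
}

/// LLM extraction runs only when regex extractors found nothing,
/// the file is high-value, and the budget allows another call.
fn should_call_llm(path: &str, regex_claims: usize, budget: &TokenBudget, estimated: usize) -> bool {
    regex_claims == 0 && is_high_value_file(path) && budget.within_budget(estimated)
}
```

Checking the cheap conditions first means the budget counter is only read for files that are actually candidates.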
### Vision

```
Code file → Regex extractors → Claims found
                ↓
    High-value files (auth, config, crypto)
                ↓
    LLM Extractor → Additional semantic claims
                ↓
    Combined claims → Conflict detection
```

### 7.5.1 LLM Extractor Implementation ✅

| Task | Status |
|------|--------|
| `GeminiClient` struct | ✅ `llm/client.rs` - Gemini API client using ureq |
| `LlmExtractor` struct | ✅ `llm/extractor.rs` - orchestrates extraction with budget tracking |
| Prompt engineering | ✅ Security-focused extraction prompt with structured JSON output |
| Response parsing | ✅ Parse Gemini's JSON response into `ExtractedClaim` format |
| Error handling | ✅ Graceful degradation when API unavailable or key missing |

### 7.5.2 Selective Triggering ✅

| Task | Status |
|------|--------|
| `is_high_value_file()` | ✅ `llm/extractor.rs` - auth/, config/, crypto/, security/, secrets/, certs/, ssl/, tls/, keys/, credentials/ directories |
| High-value file names | ✅ secret, password, credential, token, auth, login, session, jwt, tls, ssl, cert, key, config, settings, security, crypto, encrypt, decrypt, oauth, saml, ldap, api_key, apikey, access_key, private |
| Token budget | ✅ `max_tokens_per_scan` (default 50k), `max_tokens_per_file` (default 4k) |
| Skip conditions | ✅ Only runs when regex extractors found nothing AND file is high-value |

### 7.5.3 Cost Controls ✅

| Task | Status |
|------|--------|
| Token tracking | ✅ `Arc` for thread-safe budget tracking across files |
| BLAKE3 caching | ✅ `llm/cache.rs` - content hash + model + prompt version for cache key |
| Cache location | ✅ `~/.cache/aphoria/llm-cache/` |
| Budget enforcement | ✅ `within_budget()` check before each LLM call |

### 7.5.4 Configuration ✅

```toml
# aphoria.toml
[llm]
enabled = true                    # Enable LLM extraction (default: false)
provider = "gemini"               # Only "gemini" supported
# model defaults to DEFAULT_LLM_MODEL (currently "gemini-3-flash-preview")
api_key_env = "GEMINI_API_KEY"    # Environment variable for API key
max_tokens_per_scan = 50000       # Budget per scan
max_tokens_per_file = 4000        # Budget per file (for max_output_tokens)
high_value_only = true            # Only use on auth/config/crypto files
cache_responses = true            # Cache by content hash
timeout_secs = 60                 # API timeout
min_confidence = 0.7              # Filter claims below this confidence
```

**Files:** `llm/mod.rs`, `llm/client.rs`, `llm/extractor.rs`, `llm/cache.rs`, `config/mod.rs`, `scan.rs`, `error.rs`

---

## Phase 7.6: Pattern Learning Store ✅

> When LLM extracts something that regex extractors missed, remember the pattern. Track which patterns recur across projects to identify candidates for promotion to declarative extractors.

### Vision

```
LLM extracts claim from code
        ↓
Pattern not in learned store?
        ↓
Store: { example_code, claim, project_hash }
        ↓
Same pattern seen in 5+ projects?
        ↓
Flag for promotion to declarative extractor
```

### 7.6.1 LearnedPattern Schema ✅

| Task | Status |
|------|--------|
| `ValueType` enum | ✅ `learning/types.rs` - Text, Number, Boolean |
| `ClaimTemplate` struct | ✅ `learning/types.rs` - subject_template, predicate, value_type, description |
| `LearnedPattern` struct | ✅ `learning/types.rs` - full schema with timestamps, project hashes, confidence tracking |
| Serde serialization | ✅ JSON serialization with chrono timestamps |
| Tests | ✅ 5 unit tests for types |

### 7.6.2 PatternStore Implementation ✅

| Task | Status |
|------|--------|
| `PatternStore` trait | ✅ `learning/store.rs` - abstract storage interface |
| `LocalPatternStore` | ✅ JSON-backed local storage at `~/.aphoria/learning/patterns.json` |
| `RwLock` thread safety | ✅ Write-through cache with in-memory HashMap |
| Deduplication | ✅ `find_similar()` with Levenshtein similarity threshold 0.8 |
| Pruning | ✅ `prune_stale()` removes patterns not seen in N days |
| Tests | ✅ 8 unit tests for store operations |

### 7.6.3 Pattern Normalization ✅

| Task | Status |
|------|--------|
| `normalize_pattern()` | ✅ `learning/normalizer.rs` - replaces literals with placeholders |
| Version detection | ✅ `"1.0"`, `"TLSv1.2"` → `` |
| Boolean detection | ✅ `true`/`false` → `` |
| Number detection | ✅ Standalone numbers → `` |
| String detection | ✅ Remaining quoted strings → `` |
| `pattern_similarity()` | ✅ Levenshtein distance normalized to 0.0-1.0 |
| Tests | ✅ 17 unit tests for normalization |

### 7.6.4 Configuration ✅

```toml
# aphoria.toml
[learning]
enabled = true              # Enable pattern learning (default: false)
store = "local"             # "local" | "hosted"
min_confidence = 0.7        # Minimum LLM confidence to learn
prune_after_days = 90       # Remove patterns not seen in N days

[learning.promotion]
min_projects = 5            # Projects needed before promotion
min_confidence = 0.8        # Average confidence needed
auto_promote = false        # Require human approval (Phase 7.7)
```

### 7.6.5 Scan Integration ✅

| Task | Status |
|------|--------|
| Initialize pattern store | ✅ `scan.rs` - only in persistent mode with learning enabled |
| Project hash computation | ✅ BLAKE3 hash for privacy-preserving project identification |
| Record LLM-extracted claims | ✅ After LLM extraction, record patterns meeting min_confidence |
| Update existing patterns | ✅ Merge observations when similar pattern found |
| Logging | ✅ Reports patterns_recorded count on scan completion |

### 7.6.6 Error Handling ✅

| Task | Status |
|------|--------|
| `LearningStore` error variant | ✅ `error.rs` - for storage/cache failures |
| Graceful degradation | ✅ Store failures logged, don't block scan |

**Files:** `learning/mod.rs`, `learning/types.rs`, `learning/normalizer.rs`, `learning/store.rs`, `config/mod.rs`, `scan.rs`, `error.rs`, `lib.rs`

**Tests:** 30 tests covering types, normalization, and store operations.

---

## Phase 7.6 (Legacy Documentation)

> **Note:** The following is the original spec for reference. See above for implemented status.

### Original Schema (Reference)

```rust
/// A pattern learned from LLM extraction that could become a declarative extractor.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LearnedPattern {
    /// Unique identifier
    pub id: Uuid,

    /// Example code that triggered this pattern
    pub example_code: String,

    /// Normalized pattern (variables replaced with placeholders)
    /// e.g., "const TLS_MIN_VERSION = \"1.0\"" → "const TLS_MIN_VERSION = "
    pub normalized_pattern: String,

    /// The claim this pattern produces
    pub claim_template: ClaimTemplate,

    /// Language this pattern applies to
    pub language: Language,

    /// When first seen
    pub first_seen: DateTime,

    /// When last seen
    pub last_seen: DateTime,

    /// Projects that have this pattern (hashed for privacy)
    pub project_hashes: HashSet,

    /// Total occurrences across all projects
    pub occurrences: u32,

    /// Average LLM confidence when extracting this
    pub avg_confidence: f32,

    /// Has this been promoted to a declarative extractor?
    pub promoted: bool,

    /// If promoted, the extractor ID
    pub promoted_to: Option,
}

/// Template for generating claims from a learned pattern.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ClaimTemplate {
    pub subject_template: String,      // "tls/min_version"
    pub predicate: String,             // "version"
    pub value_type: ValueType,         // String, Boolean, Number
    pub description_template: String,
}
```

### Original PatternStore Trait (Reference)

```rust
pub trait PatternStore: Send + Sync {
    /// Record a pattern learned from LLM extraction
    fn record_pattern(&self, pattern: &LearnedPattern) -> Result<()>;

    /// Find existing pattern matching this example
    fn find_similar(&self, normalized: &str, language: Language, threshold: f32) -> Option;

    /// Get patterns ready for promotion (threshold met)
    fn get_promotion_candidates(&self, min_projects: usize, min_confidence: f32) -> Vec;

    /// Mark pattern as promoted
    fn mark_promoted(&self, id: &Uuid, extractor_name: &str) -> Result<()>;

    /// Prune old patterns
    async fn prune_stale(&self, max_age_days: u32) -> Result;
}
```

### 7.6.3 Pattern Normalization ⬜

| Task | Description |
|------|-------------|
| Variable extraction | Identify literals that vary (versions, names, values) |
| Placeholder insertion | Replace literals with typed placeholders |
| Similarity scoring | Compare normalized patterns for dedup |

```rust
fn normalize_pattern(code: &str, claim: &ExtractedClaim) -> String {
    // "const TLS_MIN = \"1.0\"" → "const TLS_MIN = "
    // "pool_size: 25" → "pool_size: "
    // "verify_ssl: false" → "verify_ssl: "
}

fn similarity_score(a: &str, b: &str) -> f32 {
    // Levenshtein distance normalized to 0.0-1.0
    // Patterns with score > 0.8 are considered duplicates
}
```

### 7.6.4 Integration with Scan ⬜

```rust
// In scan.rs, after LLM extraction
for claim in llm_claims {
    // Check if this is a new pattern
    if let Some(existing) = pattern_store.find_similar(&claim.matched_text, language).await {
        // Update existing pattern
        pattern_store.increment_occurrence(&existing.id, project_hash).await?;
    } else {
        // Record new pattern
        let pattern = LearnedPattern::from_claim(&claim, &code_context, project_hash);
        pattern_store.record_pattern(&pattern).await?;
    }
}
```

### 7.6.5 Configuration ⬜

```toml
# aphoria.toml
[learning]
enabled = true              # Enable pattern learning
store = "local"             # "local" | "hosted"
min_confidence = 0.7        # Minimum LLM confidence to learn
prune_after_days = 90       # Remove patterns not seen in N days

[learning.promotion]
min_projects = 5            # Projects needed before promotion
min_confidence = 0.8        # Average confidence needed
auto_promote = false        # Require human approval (Phase 7.7)
```

**Files:** `learning/mod.rs`, `learning/pattern.rs`, `learning/store.rs`, `learning/normalize.rs`

---

## Phase 7.7: Pattern → Extractor Promotion ✅

> High-frequency learned patterns get promoted to declarative extractors. This closes the learning loop: patterns discovered by LLM become permanent, fast regex extractors.
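
As a rough illustration of what "high-frequency" means here, the promotion thresholds documented for the pattern store (`min_projects = 5`, `min_confidence = 0.8`) can be applied as a simple filter. The `PatternSummary` type and `promotion_candidates` function below are hypothetical, trimmed-down stand-ins for `LearnedPattern` and `get_promotion_candidates()`, and treating both thresholds as inclusive is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced view of a learned pattern: only the fields the
/// promotion thresholds need.
pub struct PatternSummary {
    pub name: String,
    pub project_hashes: HashSet<String>,
    pub avg_confidence: f32,
    pub promoted: bool,
}

impl PatternSummary {
    /// Convenience constructor for examples; `projects` are already-hashed IDs.
    pub fn new(name: &str, projects: &[&str], avg_confidence: f32) -> Self {
        Self {
            name: name.to_string(),
            project_hashes: projects.iter().map(|p| p.to_string()).collect(),
            avg_confidence,
            promoted: false,
        }
    }
}

/// Return patterns that are not yet promoted, were seen in at least
/// `min_projects` distinct projects, and meet the average confidence floor.
pub fn promotion_candidates(
    patterns: &[PatternSummary],
    min_projects: usize,
    min_confidence: f32,
) -> Vec<&PatternSummary> {
    patterns
        .iter()
        .filter(|p| {
            !p.promoted
                && p.project_hashes.len() >= min_projects
                && p.avg_confidence >= min_confidence
        })
        .collect()
}
```

Counting distinct project hashes rather than raw occurrences keeps one noisy repository from pushing a pattern over the threshold on its own.
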
### Vision

```
LearnedPattern (5+ projects, >0.8 confidence)
        ↓
Claude: "Generate regex for this pattern"
        ↓
Candidate declarative extractor
        ↓
Validate against stored examples
        ↓
Human review (optional) → Approve/Reject
        ↓
Merge to project's .aphoria/extractors/
```

### 7.7.1 Promotion Pipeline ✅

| Task | Status |
|------|--------|
| `PromotionPipeline` | ✅ `promotion/pipeline.rs` - orchestrates full promotion flow |
| `RegexGenerator` | ✅ `promotion/regex_gen.rs` - Gemini LLM integration |
| `ExtractorValidator` | ✅ `promotion/validator.rs` - ReDoS detection, timing validation |
| `YamlWriter` | ✅ `promotion/writer.rs` - outputs to `.aphoria/extractors/learned/` |
| `InteractiveReviewer` | ✅ `promotion/review.rs` - CLI review workflow |
| `PromotionCandidate` | ✅ `promotion/types.rs` |
| `ValidationResult` | ✅ `promotion/types.rs` |

```rust
pub struct PromotionPipeline {
    pattern_store: Arc,
    llm_client: ClaudeClient,
    validator: ExtractorValidator,
}

impl PromotionPipeline {
    /// Get patterns ready for promotion
    pub async fn get_candidates(&self) -> Vec {
        let patterns = self.pattern_store
            .get_promotion_candidates(5, 0.8)
            .await?;

        patterns.into_iter()
            .map(|p| self.generate_candidate(p))
            .collect()
    }

    /// Generate declarative extractor from pattern
    async fn generate_candidate(&self, pattern: LearnedPattern) -> PromotionCandidate {
        // Ask Claude to generate regex
        let regex = self.llm_client.generate_regex(&pattern).await?;

        // Build declarative extractor
        let extractor = DeclarativeExtractor {
            name: pattern.id.to_string(),
            language: pattern.language,
            pattern: regex,
            claim: pattern.claim_template.clone(),
            source: ExtractorSource::Learned {
                pattern_id: pattern.id,
                projects: pattern.project_hashes.len(),
            },
        };

        // Validate against examples
        let validation = self.validator.validate(&extractor, &pattern).await;

        PromotionCandidate { pattern, extractor, validation }
    }
}
```

### 7.7.2 Regex Generation ✅

| Task | Status |
|------|--------|
| Multi-example prompt | ✅ Includes all examples in generation prompt |
| Regex safety | ✅ ReDoS detection prevents catastrophic backtracking |
| Test coverage | ✅ Validates against stored examples |

```rust
async fn generate_regex(examples: &[String], claim: &ClaimTemplate) -> Result {
    let prompt = format!(
        "Generate a regex pattern that matches all these code examples:\n\n{}\n\n\
         The regex should extract the value for claim: {}\n\
         Requirements:\n\
         - Must match ALL examples\n\
         - Use named capture groups for extracted values\n\
         - Avoid catastrophic backtracking (no nested quantifiers)\n\
         - Return ONLY the regex, no explanation",
        examples.join("\n---\n"),
        claim.subject_template
    );

    let response = claude.message(&prompt).await?;
    validate_regex_safety(&response)?;
    Ok(response)
}
```

### 7.7.3 Validation Suite ✅

| Task | Status |
|------|--------|
| Positive tests | ✅ Must match all stored examples |
| ReDoS detection | ✅ Detects catastrophic backtracking patterns |
| Performance test | ✅ Timing validation with configurable threshold |
| False positive check | ⬜ Deferred to Phase 9 (sample codebase FP testing) |

```rust
pub struct ExtractorValidator {
    sample_codebases: Vec,  // Known-good projects for FP testing
}

impl ExtractorValidator {
    pub async fn validate(
        &self,
        extractor: &DeclarativeExtractor,
        pattern: &LearnedPattern
    ) -> ValidationResult {
        let mut result = ValidationResult::default();

        // Must match all positive examples
        for example in &pattern.examples {
            if !extractor.matches(example) {
                result.positive_failures.push(example.clone());
            }
        }

        // Must not have excessive false positives
        for codebase in &self.sample_codebases {
            let fps = self.count_false_positives(extractor, codebase).await;
            if fps > 10 {
                result.false_positive_warning = true;
            }
        }

        // Must be fast
        let duration = self.benchmark(extractor);
        if duration > Duration::from_millis(100) {
            result.performance_warning = true;
        }

        result
    }
}
```

### 7.7.4 Human Review Gate ✅

| Task | Status |
|------|--------|
| `aphoria extractors review` | ✅ CLI to review pending promotions |
| `aphoria extractors stats` | ✅ Show pattern store statistics |
| `aphoria extractors candidates` | ✅ List promotion candidates |
| `aphoria extractors promote` | ✅ Promote pattern to extractor |
| Approval workflow | ✅ Approve, reject, or skip via InteractiveReviewer |
| Rejection tracking | ⬜ Deferred to Phase 9 (rejection reason persistence) |
| Auto-approve mode | ⬜ Deferred to Phase 9 (>0.95 confidence auto-promote) |

```bash
$ aphoria extractors review

Pending promotions: 3

[1/3] Pattern: tls_min_version_const
      Examples: 47 (across 8 projects)
      Confidence: 0.91

      Generated regex:
      (?i)(tls|ssl)_?(min|minimum)_?version\s*[:=]\s*["']?(1\.[01])["']?

      Sample matches:
        const TLS_MIN_VERSION = "1.0"     ✓ matches
        TLS_MINIMUM_VERSION: "1.1"        ✓ matches
        ssl_min_version = "1.2"           ✗ no match (TLS 1.2 is safe)

      [a]pprove  [r]eject  [e]dit  [s]kip  [q]uit: _
```

### 7.7.5 Extractor Output ✅

Promoted patterns become declarative extractors in `.aphoria/extractors/learned/`:

```yaml
# .aphoria/extractors/learned/tls_min_version_const.yaml
# Auto-generated from learned pattern. DO NOT EDIT.
# Pattern ID: 550e8400-e29b-41d4-a716-446655440000
# Learned from: 8 projects, 47 occurrences
# Confidence: 0.91
# Promoted: 2026-02-10

name: "tls_min_version_const"
language: ["rust", "go", "python", "javascript", "typescript"]
pattern: '(?i)(tls|ssl)_?(min|minimum)_?version\s*[:=]\s*["'']?(1\.[01])["'']?'

claim:
  subject: "tls/min_version"
  predicate: "version"
  value_capture: 3  # Capture group for version
  description: "TLS minimum version set to deprecated {value}"

metadata:
  source: "learned"
  pattern_id: "550e8400-e29b-41d4-a716-446655440000"
  projects: 8
  occurrences: 47
  confidence: 0.91
```

### 7.7.6 Configuration ✅

```toml
# aphoria.toml
[promotion]
enabled = true                # Enable promotion pipeline
auto_promote = false          # Require human approval
output_dir = ".aphoria/extractors/learned"
min_confidence = 0.8          # Minimum to consider
min_projects = 5              # Projects needed before promotion
require_validation = true     # Must pass validation suite
```

**Files:** `promotion/mod.rs`, `promotion/pipeline.rs`, `promotion/regex_gen.rs`, `promotion/validator.rs`, `promotion/review.rs`, `promotion/writer.rs`, `promotion/types.rs`, `handlers/extractors.rs`

**Tests:** 43 tests covering pipeline, validation, regex generation, and YAML output.

---

## Phase 9: Autonomous Extractor Generation ✅

> The system generates, tests, and deploys extractors without human approval for high-confidence patterns. This is the endgame: a fully self-improving extraction system.

### Vision

```
Learned pattern exceeds autonomous threshold (>0.95 confidence, >10 projects)
        ↓
Auto-generate extractor
        ↓
Validate against comprehensive test suite
        ↓
A/B test: run new extractor in shadow mode
        ↓
If FP rate < 5%: auto-deploy
        ↓
If FP rate spikes: auto-rollback
```

---

## Phase 7.8: LLM Prompt Evaluation ✅

> Measure and improve LLM extraction quality through golden fixtures and regression detection. Essential for prompt engineering without breaking existing quality.
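
The metrics this phase relies on are the standard ones: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean. A minimal sketch follows; the `Metrics` struct here is a simplified stand-in for the type in `eval/metrics.rs`, and treating the regression threshold as an absolute drop (0.05 = five points) is an assumption.

```rust
/// Simplified stand-in for the evaluation metrics (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Metrics {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Compute precision, recall, and F1 from raw counts.
/// Empty denominators yield 0.0 rather than NaN.
pub fn compute_metrics(true_pos: u32, false_pos: u32, false_neg: u32) -> Metrics {
    let ratio = |num: u32, den: u32| {
        if den == 0 { 0.0 } else { f64::from(num) / f64::from(den) }
    };
    let precision = ratio(true_pos, true_pos + false_pos);
    let recall = ratio(true_pos, true_pos + false_neg);
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };
    Metrics { precision, recall, f1 }
}

/// Flag a regression when any metric drops by more than `threshold`
/// relative to the baseline (absolute difference, assumed).
pub fn is_regression(baseline: &Metrics, current: &Metrics, threshold: f64) -> bool {
    baseline.f1 - current.f1 > threshold
        || baseline.precision - current.precision > threshold
        || baseline.recall - current.recall > threshold
}
```

Improvements never trip the check, since only drops relative to the baseline are compared against the threshold.
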
### Vision ``` Golden Fixtures (TOML) Evaluation Harness ├── tls-001: verify=False ├── Load fixtures ├── jwt-001: algorithm=none --> ├── Run extraction (live/cached/mock) └── secrets-001: hardcoded key ├── Match against expectations ├── Compute precision/recall/F1 └── Compare to baseline (regression detection) ``` ### 7.8.1 Fixture Format ✅ | Task | Status | |------|--------| | `Fixture` type | ✅ `eval/fixture.rs` — TOML-based test cases | | `ExpectedClaim` | ✅ Subject/predicate/value expectations | | `must_contain` | ✅ Claims that MUST be extracted (recall) | | `must_not_contain` | ✅ Claims that MUST NOT appear (precision) | | `FixtureLoader` | ✅ Load fixtures from directory tree | | `CorpusManifest` | ✅ Corpus metadata + baseline metrics | | Validation | ✅ Duplicate ID, empty content, missing expectations | ```toml # tests/llm_fixtures/tls/tls-001-disabled-verification.toml [metadata] id = "tls-001" name = "TLS verification disabled in Python requests" category = "tls" language = "python" [input] filename = "api_client.py" content = """ response = requests.get(url, verify=False) """ [expected] must_contain = [ { subject = "tls/cert_verification", predicate = "enabled", value = false } ] must_not_contain = [ { subject = "tls/cert_verification", predicate = "enabled", value = true } ] ``` ### 7.8.2 Claim Matching ✅ | Task | Status | |------|--------| | `ClaimMatcher` | ✅ `eval/matcher.rs` — Flexible claim comparison | | Tail-path matching | ✅ Last 2 segments for subject comparison | | Type coercion | ✅ Boolean↔string ("true"/"yes"), number↔string | | Confidence thresholds | ✅ Optional min_confidence per expectation | | `count_false_positives()` | ✅ Detect unexpected claims | ### 7.8.3 Metrics Computation ✅ | Task | Status | |------|--------| | `Metrics` | ✅ `eval/metrics.rs` — Aggregate evaluation metrics | | Precision/Recall/F1 | ✅ Standard information retrieval metrics | | Per-category breakdown | ✅ Metrics by fixture category | | Cost estimation | ✅ Token-based 
cost tracking | | `BaselineComparison` | ✅ Compare current run to stored baseline | | Regression detection | ✅ Flag if F1/precision/recall drop > threshold | ### 7.8.4 Evaluation Harness ✅ | Task | Status | |------|--------| | `EvalHarness` | ✅ `eval/harness.rs` — Orchestrates evaluation runs | | `EvalMode::Live` | ✅ Real LLM API calls | | `EvalMode::Cached` | ✅ Use cached responses (deterministic CI) | | `EvalMode::Mock` | ✅ No LLM, tests harness itself | | `EvalVerdict` | ✅ Pass, Regression, Review, Error | | `update_baseline()` | ✅ Save current metrics as new baseline | ### 7.8.5 Report Generation ✅ | Task | Status | |------|--------| | `Report` | ✅ `eval/report.rs` — Multi-format output | | Table format | ✅ Terminal tables with color-coded results | | JSON format | ✅ Machine-readable for CI/CD integration | | Markdown format | ✅ Documentation and PR comments | | Failed fixture details | ✅ Shows unmatched expectations with rationale | ### 7.8.6 CLI Commands ✅ | Task | Status | |------|--------| | `aphoria eval run` | ✅ Run evaluation against fixtures | | `aphoria eval baseline` | ✅ Show current baseline metrics | | `aphoria eval update-baseline` | ✅ Update baseline (--force required) | | `aphoria eval list-fixtures` | ✅ List available fixtures by category | | `aphoria eval validate-fixtures` | ✅ Validate fixture format | | `--fail-on-regression` | ✅ Exit code 1 if regression detected | | `--threshold` | ✅ Configurable regression threshold (default 5%) | | `--mode` | ✅ live, cached, or mock | ```bash # Run evaluation in mock mode aphoria eval run --fixtures tests/llm_fixtures --mode mock # CI: fail on regression aphoria eval run --mode cached --fail-on-regression --threshold 0.05 # Update baseline after prompt improvements aphoria eval update-baseline --fixtures tests/llm_fixtures --force # List fixtures by category aphoria eval list-fixtures --category tls ``` ### 7.8.7 Seed Fixtures ✅ | Category | Fixture | Description | |----------|---------|-------------| | 
tls | tls-001 | Python requests verify=False | | tls | tls-002 | Node.js TLSv1 deprecated protocol | | jwt | jwt-001 | Algorithm 'none' allowed | | jwt | jwt-002 | Go WithoutClaimsValidation | | secrets | secrets-001 | Hardcoded API key | | secrets | secrets-002 | High-entropy JWT in config | | auth | auth-001 | Debug authentication bypass | | negative | negative-001 | Safe TLS config (no findings expected) | | negative | negative-002 | Env-loaded secrets (no findings expected) | | edge | edge-001 | Empty file edge case | **Files:** `eval/mod.rs`, `eval/fixture.rs`, `eval/matcher.rs`, `eval/metrics.rs`, `eval/harness.rs`, `eval/report.rs`, `handlers/eval.rs`, `cli.rs`, `tests/llm_fixtures/` **Documentation:** [docs/llm-optimization/](docs/llm-optimization/index.md) — Full optimization playbook with decision trees, research templates, and baseline tracking. --- ### 9.1 Autonomous Promotion ✅ | Task | Description | Status | |------|-------------|--------| | `AutonomousConfig` | Configuration with kill switch (enabled: false default) | ✅ | | High-confidence threshold | Skip human review for >0.95 confidence | ✅ | | Project threshold | Require >10 projects for autonomous | ✅ | | Validation strictness | Zero failures, zero warnings required | ✅ | | `should_auto_promote()` | Decision logic on `PromotionCandidate` | ✅ | | `auto_promotion_blockers()` | Explains why pattern can't be auto-promoted | ✅ | | `AutonomousAuditLog` | JSONL audit trail for all decisions | ✅ | | `smart_auto_promote_all()` | Pipeline integration with audit logging | ✅ | | YAML header enhancement | "AUTO-PROMOTED" + "Approved by: autonomous" | ✅ | | CLI command | `aphoria extractors auto-promote [--dry-run]` | ✅ | **Safety Features:** - Kill switch: `enabled: false` by default (opt-in only) - Auditability: All decisions logged to `~/.aphoria/audit/autonomous-decisions.jsonl` - Reversibility: Can delete YAML + reset pattern.promoted - Blast radius: One pattern = one YAML file - Traceability: YAML 
header shows approval source **Files:** `config/types/autonomous.rs`, `promotion/audit.rs`, `promotion/types.rs`, `promotion/pipeline.rs`, `promotion/writer.rs`, `handlers/extractors.rs` **Configuration:** ```toml [autonomous] enabled = true # Master switch (default: false) min_confidence = 0.95 # Stricter than standard 0.8 min_projects = 10 # Stricter than standard 5 require_zero_failures = true require_zero_warnings = true audit_log = true audit_dir = "~/.aphoria/audit/" ``` **CLI Usage:** ```bash # Preview what would be auto-promoted aphoria extractors auto-promote --dry-run # Run autonomous promotion aphoria extractors auto-promote # Override thresholds aphoria extractors auto-promote --min-confidence 0.97 --min-projects 15 ``` ### 9.2 Shadow Mode Testing ✅ | Task | Description | Status | |------|-------------|--------| | `ShadowConfig` | Configuration for shadow mode (min_scans, max_fp_rate, rollback_threshold) | ✅ | | `ShadowTest`, `ShadowStatus`, `ShadowMetrics` | Core types for tracking shadow extractors | ✅ | | `ShadowStore` | JSONL persistence for tests, matches, and decisions | ✅ | | `ShadowExtractorRegistry` | Loads shadow extractors from learned/ directory | ✅ | | `ShadowExecutor` | Runs shadow extractors during scans, stores matches separately | ✅ | | `FeedbackCollector` | TP/FP feedback collection and metrics update | ✅ | | `GraduationManager` | Shadow → production promotion and rollback logic | ✅ | | CLI commands | `shadow-status`, `feedback`, `graduate`, `rollback` | ✅ | **Safety Features:** - Shadow isolation: Matches stored separately, not in production output - Metrics transparency: FP rate visible via `shadow-status` - Graduation gate: Must meet min_scans (100) + max_fp_rate (5%) + feedback exists - Manual control: `rollback` command for immediate removal - Audit trail: All decisions logged to `decisions.jsonl` **Files:** `shadow/mod.rs`, `shadow/types.rs`, `shadow/store.rs`, `shadow/registry.rs`, `shadow/executor.rs`, `shadow/feedback.rs`, 
`shadow/graduation.rs`, `handlers/shadow.rs`, `config/types/shadow.rs` **Configuration:** ```toml [shadow] enabled = true # Shadow mode on by default min_scans = 100 # Scans before graduation eligible max_fp_rate = 0.05 # Maximum FP rate for graduation rollback_threshold = 0.15 # FP rate that triggers rollback retention_days = 30 # Days to retain shadow data ``` **CLI Usage:** ```bash # View shadow test status aphoria extractors shadow-status [-v] # Provide TP/FP feedback on matches aphoria extractors feedback [--limit 10] # Graduate shadow test to production aphoria extractors graduate [--force] # Rollback a shadow test aphoria extractors rollback --reason "too many FPs" ``` **Tests:** 44 tests covering types, store, registry, executor, feedback, graduation, and auto-rollback. ### 9.3 Auto-Rollback ✅ | Task | Description | Status | |------|-------------|--------| | `auto_rollback_enabled` config | Toggle to enable/disable auto-rollback (default: true) | ✅ | | Feedback-time check | Auto-rollback triggered immediately after FP feedback | ✅ | | `FeedbackWithRollback` return | `record_feedback()` returns rollback info | ✅ | | `AutoRollbackResult` | Track checked count, rolled back names, errors | ✅ | | CLI command | `aphoria extractors auto-check` for manual batch checking | ✅ | | Audit trail | Decision logged as `ShadowDecisionKind::AutoRollback` | ✅ | | YAML deletion | Extractor file deleted from learned/ on rollback | ✅ | **Safety Features:** - Toggle: `auto_rollback_enabled` can disable feature for testing or manual-only workflows - Threshold configurable: `rollback_threshold` in config (default: 15%) - Minimum reviews: Requires 10+ reviewed matches before auto-rollback triggers - Audit trail: All auto-rollback decisions logged to `decisions.jsonl` - CLI fallback: `auto-check` command for manual verification **Files:** `shadow/feedback.rs`, `shadow/graduation.rs`, `config/types/shadow.rs`, `handlers/shadow.rs`, `cli.rs` **Configuration:** ```toml [shadow] enabled 
= true auto_rollback_enabled = true # NEW: Enable automatic rollback (default: true) rollback_threshold = 0.15 # FP rate that triggers auto-rollback ``` **CLI Usage:** ```bash # Automatic: Rollback happens immediately when feedback pushes FP rate over threshold aphoria extractors feedback --limit 10 # If FP rate exceeds 15%, you'll see: # ⚠️ AUTO-ROLLBACK TRIGGERED: # Manual batch check: Scan all active tests and rollback any over threshold aphoria extractors auto-check # Output: "⚠️ Auto-rolled back 1 of 5 shadow test(s): ..." ``` **Tests:** 3 new tests covering auto-rollback triggering, disabled toggle, and threshold boundary. ### 9.4 Cross-Project Learning ✅ | Task | Description | Status | |------|-------------|--------| | Hosted pattern sync | Patterns from all projects aggregate on server | ✅ | | Global promotion | Promote patterns seen across many orgs | ✅ | | Privacy preservation | Only normalized patterns shared, no code | ✅ | | Opt-in distribution | Orgs can opt-in to receive community extractors | ✅ | ``` Org A: Pattern seen in 3 projects → shared to hosted Org B: Same pattern in 5 projects → shared to hosted Org C: Same pattern in 4 projects → shared to hosted ↓ Hosted aggregates: 12 projects total ↓ Promotes to community extractor ↓ All orgs receive new extractor (if opted in) ``` **Implementation:** - `CrossProjectConfig` with opt-in flags (`contribute_patterns`, `receive_community`) - `PatternSyncer` for uploading anonymized patterns to hosted server - `CommunityExtractorLoader` for pulling community extractors as YAML files - BLAKE3 hashing for pattern deduplication and org anonymization - Privacy guarantees: `normalized_pattern` shared, but NOT `example_code` or `project_hashes` - CLI commands: `aphoria patterns sync`, `aphoria patterns status`, `aphoria patterns pull-community` **Files:** `config/types/cross_project.rs`, `community/pattern_syncer.rs`, `community/extractor_loader.rs`, `handlers/patterns.rs` **Tests:** 7 new tests covering pattern 
hashing, subject exclusion, anonymization, and extractor loading. ### 9.5 Extractor Versioning ✅ | Task | Description | Status | |------|-------------|--------| | Version tracking | Track which version caught which issues | ✅ `ExtractorVersion` + `VersionStore` | | Changelog | Record changes between versions | ✅ `ExtractorChangelog` + `ChangelogEntry` | | Rollback support | Revert to previous version | ✅ `aphoria extractors rollback-version` | | A/B metrics | Compare versions side-by-side | ✅ `aphoria extractors compare` + `compute_metrics_delta()` | | CLI commands | versions, compare, rollback-version | ✅ Full CLI implementation | | Tests | Unit tests for all components | ✅ 15+ version/changelog tests | **Files:** - `promotion/version.rs` - Core types (`ExtractorVersion`, `ChangelogEntry`, `MetricsDelta`, `ExtractorChangelog`, `VersionStore`) - `promotion/writer.rs` - Versioned YAML output (`write_versioned()`) - `promotion/types.rs` - Version field in `PromotionMetadata` - `handlers/extractors.rs` - CLI handlers (`handle_versions`, `handle_compare`, `handle_rollback_version`) - `cli.rs` - CLI commands (`Versions`, `Compare`, `RollbackVersion`) **CLI Usage:** ```bash # List versions aphoria extractors versions learned_tls_min_version # Version History: learned_tls_min_version # Version Date Changes # ------------------------------------------------------------ # 2 2026-03-15 Added support for YAML configs # 1 2026-02-01 Initial promotion from learned pattern # Compare versions aphoria extractors compare learned_tls_min_version -a 1 -b 2 # Comparison: learned_tls_min_version v1 vs v2 # Matches +15% # False Positives -3% # Rollback aphoria extractors rollback-version learned_tls_min_version --version 1 --reason "v2 edge case bug" # Rolled back learned_tls_min_version to v1 ``` **YAML Output:** ```yaml # Generated from learned pattern. Review before editing. 
# Pattern ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Version: 2 (previous: 1)
# Promoted: 2026-03-15 14:30:00 UTC
name: learned_tls_min_version
description: TLS minimum version set to deprecated value
version: 2
previous_version: 1
languages:
  - rust
  - go
pattern: '(?i)tls_?min_?(version)?\s*[:=]\s*["\']?(?P1\.[01])["\']?'
claim:
  subject: tls/min_version
  predicate: version
  value_from_match: true
  confidence: 0.97
metadata:
  source: learned
  pattern_id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
  version: 2
changelog:
  - version: 2
    date: 2026-03-15
    changes: "Added support for YAML configs"
    metrics:
      matches: "+15%"
      false_positives: "-3%"
  - version: 1
    date: 2026-02-01
    changes: "Initial promotion from learned pattern"
```

### 9.6 Configuration ⬜

```toml
# aphoria.toml
[autonomous]
enabled = false              # Opt-in to autonomous mode
min_confidence = 0.95        # Higher threshold for auto
min_projects = 10            # More evidence required
shadow_scans = 100           # Scans before promotion
max_fp_rate = 0.05           # Auto-rollback threshold

[autonomous.distribution]
receive_community = true     # Receive community extractors
contribute_patterns = true   # Share patterns to community
```

**Files:** `autonomous/mod.rs`, `autonomous/shadow.rs`, `autonomous/rollback.rs`, `autonomous/distribution.rs`

---

## Milestone Summary

| Phase | Deliverable | Depends On | Status |
|-------|-------------|------------|--------|
| 0 | ConceptPath in StemeDB | concept-hierarchy spec | ✅ |
| 2 | Aphoria CLI (scan, report, ack) | Phase 0 | ✅ |
| 2A | Concept matching (leaf, alias, auto-alias) | Phase 2 | ✅ |
| 1 | Authoritative corpus expansion | Phase 0 | ✅ |
| 3 | Claude Code skill + hooks | Phase 2A | ✅ |
| 4.5 | Ephemeral scan mode (40x faster) | Phase 2 | ✅ |
| 5 | Research agent loop | Phase 3 | ✅ |
| 6 | Federated Policy & Trust Packs | Phase 4.5 | ✅ |
| **6.5** | **Trust Pack Extensions (Predicate Aliases, Key Rotation)** | Phase 6 | ✅ |
| 4A | Observational claims (Tier 4 write-back) | Phase 6 | ✅ |
| 4B | Self-conflict detection (drift) | Phase 4A | ✅ |
| 4C | Diff-only scanning (--staged) | Phase 4B | ✅ |
| 4E | Hosted mode (team aggregation) | Phase 4C | ✅ |
| 4D | Enhanced ack (--reason, policy updates) | Phase 4C | ✅ |
| 5.6 | Community Corpus Contributions | Phase 4E | ✅ |
| 7 | Declarative Extractors | Phase 6 | ✅ |
| **7.5** | **LLM-in-the-Loop Extraction (Gemini)** | Phase 7 | ✅ |
| **7.6** | **Pattern Learning Store** | Phase 7.5 | ✅ |
| **7.7** | **Pattern → Extractor Promotion** | Phase 7.6 | ✅ |
| **7.8** | **LLM Prompt Evaluation** | Phase 7.5 | ✅ |
| 8 | Enterprise Extractors (8.1-8.11) | Phase 7.5 | ✅ |
| **8.2** | **Framework-Specific Extractors (10 frameworks)** | Phase 8 | ✅ |
| **9.1** | **Autonomous Promotion** | Phase 8 | ✅ |
| **9.2** | **Shadow Mode Testing** | Phase 9.1 | ✅ |
| **9.3** | **Auto-Rollback** | Phase 9.2 | ✅ |
| **9.4** | **Cross-Project Learning** | Phase 9.1 | ✅ |
| **9.5** | **Extractor Versioning** | Phase 9.4 | ✅ |

**Current state:**

- Phases 0-3, 4.5, 4A-4E, 5, 5.6, 6, 7, 7.5, 7.6, 7.7, 7.8, 8, 9.1, 9.2, 9.3, 9.4, 9.5 complete (clippy clean)
- Full corpus: RFC, OWASP, Vendor sources
- **36 extractors** including:
  - Security: weak_crypto, command_injection, sql_injection, high_entropy_secrets, auth_bypass, insecure_cookies, path_traversal, unvalidated_redirects, weak_password, security_headers, insecure_deserialization, ssrf, orm_injection, xxe
  - Framework-specific: django, express, flask, fastapi, nestjs, nextjs, spring, laravel, rails, aspnet
- Trust Packs: signed policy bundles with import/export
- Ephemeral mode: 40x faster for CI
- Observation write-back: `--sync` records novel claims as Tier 4 project memory
- **Autonomous promotion**: High-confidence patterns (>0.95, 10+ projects) can skip human review with full audit trail
- **Shadow mode testing**: Auto-promoted extractors run in shadow mode to measure FP rate before graduation
- **Auto-rollback**: Shadow extractors exceeding FP threshold (15%) are automatically rolled back
- Drift detection: Detects changes from prior observations
- Staged scanning: `--staged` flag for fast pre-commit hooks
- Hosted mode: Team aggregation via central StemeDB server
- Enhanced ack: `--reason` flag, `aphoria update` for policy changes
- Community Corpus: Opt-in anonymous pattern sharing with privacy-preserving anonymization
- Declarative Extractors: TOML-defined custom extractors without Rust code
- LLM Extraction: Gemini-powered semantic claim extraction for high-value files
- Pattern Learning: LLM-extracted claims recorded for promotion to declarative extractors
- Pattern Promotion: CLI workflow to promote learned patterns to declarative extractors with Gemini regex generation and validation
- **LLM Prompt Evaluation**: Golden fixtures with precision/recall metrics, baseline comparison, and regression detection for prompt engineering
- **Cross-Project Learning**: Privacy-preserving pattern sync to hosted server, community extractor pull, BLAKE3-based deduplication, opt-in sharing with `CrossProjectConfig`
- **Extractor Versioning**: Version tracking with changelogs, safe rollback to previous versions, A/B metrics comparison between versions via `VersionStore`

**Phase 9 Complete!** Autonomous Generation pipeline is fully self-improving.
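
The promotion and rollback gates described above can be pictured with a small sketch. This is an illustration only, not the code in `autonomous/`: the struct, enum, and function names are hypothetical, the fields mirror the `[autonomous]` keys in `aphoria.toml`, and the false-positive rate is assumed to be false positives divided by total shadow-mode matches.

```rust
/// Thresholds for autonomous promotion. The struct and field names are
/// assumptions modeled on the `[autonomous]` keys in `aphoria.toml`.
#[derive(Debug, Clone, Copy)]
pub struct AutonomousThresholds {
    pub min_confidence: f64,
    pub min_projects: u32,
    pub shadow_scans: u32,
    pub max_fp_rate: f64,
}

/// Outcome of evaluating an extractor that is running in shadow mode.
#[derive(Debug, PartialEq, Eq)]
pub enum ShadowVerdict {
    KeepTesting,
    Graduate,
    Rollback,
}

/// A pattern may skip human review only if it clears both thresholds.
pub fn eligible_for_auto_promotion(
    confidence: f64,
    projects: u32,
    t: &AutonomousThresholds,
) -> bool {
    confidence >= t.min_confidence && projects >= t.min_projects
}

/// Assumption: FP rate = false positives / total matches seen in shadow mode.
pub fn shadow_verdict(
    scans: u32,
    matches: u32,
    false_positives: u32,
    t: &AutonomousThresholds,
) -> ShadowVerdict {
    let fp_rate = if matches == 0 {
        0.0
    } else {
        f64::from(false_positives) / f64::from(matches)
    };
    if fp_rate > t.max_fp_rate {
        // Too noisy: roll back regardless of how many scans have run.
        ShadowVerdict::Rollback
    } else if scans >= t.shadow_scans {
        ShadowVerdict::Graduate
    } else {
        ShadowVerdict::KeepTesting
    }
}
```

Keeping the rollback threshold as a parameter rather than a constant means the same check works whichever FP threshold a deployment configures.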
### The Self-Learning Vision

```
Phase 7: Declarative Extractors (foundation) ✅ COMPLETE
    ↓
Phase 7.5: LLM-in-the-Loop (Gemini semantic extraction) ✅ COMPLETE
    ↓
Phase 7.6: Pattern Learning (remember what LLM finds) ✅ COMPLETE
    ↓
Phase 7.7: Pattern Promotion (patterns → extractors) ✅ COMPLETE
    ↓
Phase 7.8: LLM Prompt Evaluation (measure & improve) ✅ COMPLETE
    ↓
Phase 8: Enterprise Extractors (36 total) ✅ COMPLETE
    ├── 8.1: High-entropy secrets ✅
    ├── 8.2: Framework extractors (10 frameworks) ✅
    ├── 8.3: Config deep parsing ✅
    ├── 8.4-8.11: Security patterns ✅
    ↓
Phase 9: Autonomous Generation (fully self-improving) ✅ COMPLETE
    ├── 9.1: Autonomous Promotion ✅ COMPLETE
    ├── 9.2: Shadow Mode Testing ✅ COMPLETE
    ├── 9.3: Auto-Rollback ✅ COMPLETE
    ├── 9.4: Cross-Project Learning ✅ COMPLETE
    └── 9.5: Extractor Versioning ✅ COMPLETE
```

**The endgame:** Every PR teaches Aphoria. After a month, it knows your security patterns better than your team does.

### Bidirectional Knowledge Sync (Complete)

The pre-commit hook is now a bidirectional knowledge sync:

1. **4A** ✅: Record code claims as Tier 4 observations (project memory)
2. **4B** ✅: Detect drift from prior observations (self-conflict)
3. **4C** ✅: Fast diff-only scanning for pre-commit hooks (`--staged`)
4. **4E** ✅: Team aggregation via hosted StemeDB server
5. **4D** ✅: Enhanced ack with rationale and policy updates

This transforms Aphoria from a linter into a learning system that builds institutional memory per-project and collective intelligence across teams via hosted mode.

---

## Phase 8: Enterprise Extractor Improvements ✅

> **Goal:** Transform extractors from "toy examples" to enterprise-grade detection that catches real violations in production codebases.
### Current State Audit

| Extractor | Languages | Strengths | Weaknesses |
|-----------|-----------|-----------|------------|
| `tls_verify` | 8 | Multi-lang, configs | Misses custom wrappers |
| `tls_version` | 8 | API patterns | Misses semantic (const = "1.0") |
| `hardcoded_secrets` | 8 | Placeholders, test files | No entropy detection |
| `weak_crypto` | 5 | MD5/SHA1/DES/RC4 | SHA1 false positives, misses bcrypt cost |
| `sql_injection` | 5 | Interpolation patterns | Misses ORM unsafe methods |
| `jwt_config` | 8 | alg:none, skip sig | Library-specific gaps |
| `cors_config` | 8 | Wildcard + credentials | Misses dynamic origin reflection |
| `rate_limit` | 8 | Basic patterns | Limited depth |
| `timeout_config` | 8 | Basic patterns | Limited depth |
| `command_injection` | 5 | exec/system calls | Indirect injection |
| `dep_versions` | 3 | Version parsing | No CVE correlation |

**Enterprise Reality:** Current extractors catch ~30% of real-world security misconfigurations. Config files are highest value (patterns consistent), code is lowest (semantic understanding required).
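
The audit lists "No entropy detection" as a weakness of `hardcoded_secrets`. As a rough illustration of what such a check involves, the sketch below computes Shannon entropy (in bits per character) over a string. It is not the project's implementation: the helper `looks_like_secret` and its parameters are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Shannon entropy of `s` in bits per character: H = -sum(p_i * log2(p_i)).
pub fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical gate: long enough and random-looking enough to flag.
pub fn looks_like_secret(s: &str, min_entropy: f64, min_len: usize) -> bool {
    s.chars().count() >= min_len && shannon_entropy(s) >= min_entropy
}
```

Because the entropy of a string with k distinct symbols is at most log2(k), a string needs at least 23 distinct characters to reach 4.5 bits per character, so a length floor and an entropy floor interact when tuning thresholds.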
---

### 8.1 High-Entropy Secret Detection ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `HighEntropySecretsExtractor` | ✅ `extractors/high_entropy_secrets.rs` |
| Shannon entropy algorithm | ✅ `shannon_entropy()` with 4.5 threshold |
| Charset variety check | ✅ 0.4 minimum variety ratio |
| Known secret prefixes | ✅ AWS (AKIA), Stripe (sk_live_, sk_test_), GitHub (ghp_, gho_), GitLab (glpat-), Slack (xox[baprs]-) |
| High-entropy context patterns | ✅ api_key, secret, token, credential, auth_key contexts |
| False positive exclusions | ✅ UUIDs, git SHAs (40-char hex), file hashes (64-char hex) |
| Test file confidence reduction | ✅ 0.6 confidence for test files |
| Tests | ✅ 10+ tests covering all patterns |

**Configuration:**

```toml
# aphoria.toml
[extractors.entropy]
min_entropy = 4.5           # Shannon entropy threshold
min_charset_variety = 0.4   # Unique chars / length ratio
min_length = 20             # Minimum string length
max_length = 200            # Maximum string length
```

**Languages:** Rust, Go, Python, JavaScript, TypeScript, YAML, TOML, JSON, Dotenv

---

### 8.2 Framework-Specific Extractors ✅

**Impact:** HIGH | **Effort:** HIGH | **Status:** Complete

**Research Document:** [`docs/architecture/framework-security-extractors.md`](./docs/architecture/framework-security-extractors.md)

All 10 framework-specific extractors implemented and tested:

| Framework | Extractor | Languages | Tests |
|-----------|-----------|-----------|-------|
| Spring Boot | `spring_security` | Java, YAML, Properties | 7 |
| Django | `django_security` | Python | 7 |
| Express.js | `express_security` | JavaScript, TypeScript | 5 |
| Rails | `rails_security` | Ruby, YAML | 6 |
| ASP.NET Core | `aspnet_security` | C# (via regex), JSON | 6 |
| Laravel | `laravel_security` | PHP (via regex) | 5 |
| FastAPI | `fastapi_security` | Python | 5 |
| Next.js | `nextjs_security` | JavaScript, TypeScript | 5 |
| Flask | `flask_security` | Python | 6 |
| NestJS | `nestjs_security` | TypeScript | 5 |

**Total:** 10 extractors, 57+ tests, 100+ patterns

**Files:** `extractors/{django,express,flask,fastapi,nestjs,nextjs,spring,laravel,rails,aspnet}_security.rs`

#### 8.2.1 Spring Boot Security

```yaml
# application.yml misconfigs
security:
  basic:
    enabled: false          # Auth disabled
  csrf:
    enabled: false          # CSRF disabled
  headers:
    frame-options: DISABLE  # Clickjacking
```

```java
// Java code patterns
@EnableWebSecurity
public class Config extends WebSecurityConfigurerAdapter {
    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.csrf().disable(); // CSRF disabled
        http.authorizeRequests().antMatchers("/**").permitAll(); // Auth bypass
    }
}
```

#### 8.2.2 Django Security

```python
# settings.py misconfigs
DEBUG = True                   # Debug in production
ALLOWED_HOSTS = ['*']          # All hosts
CSRF_COOKIE_SECURE = False     # Insecure cookies
SESSION_COOKIE_SECURE = False
```

#### 8.2.3 Express.js Security

```javascript
// Missing security middleware
app.use(helmet()); // helmet() should exist
app.use(cors({ origin: '*', credentials: true })); // CORS + creds
app.disable('x-powered-by'); // Should be disabled
```

#### 8.2.4 Rails Security

```ruby
# config/environments/production.rb
config.force_ssl = false # Should be true
config.action_dispatch.cookies_same_site_protection = :none
```

---

### 8.3 Config File Deep Parsing ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `ConfigValue` enum | ✅ `extractors/config_parser.rs` |
| YAML/JSON/TOML parsers | ✅ Using `serde_yaml`, `serde_json`, `toml` |
| Tree walker with path tracking | ✅ `walk_config()` with dot-path |
| `ConfigSecurityExtractor` | ✅ `extractors/config_security.rs` |
| Security rules (11 rules) | ✅ TLS, CSRF, debug, password, cookies, CORS, rate limit |
| Dev file exclusion | ✅ Skip debug warnings in dev/test configs |
| Tests | ✅ 26 tests for parsing + security rules |

**Patterns now caught (nested to any depth):**

- `*.tls.verify: false` - TLS verification disabled
- `*.insecure_skip_verify: true` - Skip verification enabled
- `*.security.enabled: false` - Security disabled
- `*.csrf.enabled: false` - CSRF protection disabled
- `debug: true` - Debug mode (only in production files)
- `*.password.min_length < 8` - Weak password policy
- `*.cookie.secure: false` - Cookie secure flag disabled
- `*.cookie.httpOnly: false` - Cookie httpOnly disabled
- `*.cors.allow_origin: "*"` - CORS allows all origins
- `*.rate_limit.enabled: false` - Rate limiting disabled

**Languages:** YAML, JSON, TOML

---

### 8.4 Semantic TLS Version Detection ✅

**Impact:** MEDIUM | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| Add `Language::Terraform` variant | ✅ `types/language.rs` |
| Semantic pattern (cross-language) | ✅ Catches `TLS_MIN_VERSION = "1.0"` with type annotations |
| Environment variable pattern | ✅ `.env` files with `TLS_MIN_VERSION=1.0` |
| Terraform HCL pattern | ✅ `min_tls_version = "TLS1_0"` |
| Kubernetes camelCase pattern | ✅ `minTLSVersion: VersionTLS10` |
| False positive prevention | ✅ TLS 1.2/1.3 not flagged |
| Tests | ✅ 16 new tests (27 total for TLS extractor) |

**Patterns now caught:**

- `const TLS_MIN_VERSION: &str = "1.0";` (Rust with type annotation)
- `let sslVersion = "TLSv1";` (JavaScript camelCase)
- `TLS_MINIMUM_VERSION = "1.1"` (Python assignment)
- `TLS_MIN_VERSION=1.0` (dotenv)
- `export SSL_VERSION=TLSv1` (shell export)
- `min_tls_version = "TLS1_0"` (Terraform)
- `minTLSVersion: VersionTLS10` (Kubernetes YAML)

**Languages:** Rust, Go, Python, TypeScript, JavaScript, Yaml, Toml, Json, Terraform, Dotenv

---

### 8.5 ORM SQL Injection Detection ✅

**Impact:** MEDIUM | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `OrmInjectionExtractor` | ✅ `extractors/orm_injection.rs` |
| Django .raw() with interpolation | ✅ `f"SELECT..."`, `.format()` patterns |
| Django .extra() with interpolation | ✅ `where=["...{}...".format()]` |
| SQLAlchemy text() with interpolation | ✅ `text(f"SELECT...")` |
| SQLAlchemy execute() with f-string | ✅ `execute(f"...")` |
| Sequelize raw query | ✅ `` sequelize.query(`...${...}`) `` |
| TypeORM where() | ✅ `` .where(`...${...}`) `` |
| GORM Raw() with Sprintf | ✅ `.Raw(fmt.Sprintf(...))` |
| Prisma $queryRawUnsafe | ✅ `` $queryRawUnsafe(`...${...}`) `` |
| Tests | ✅ 8+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go

Current `sql_injection` catches raw string interpolation but misses ORM escape hatches:

```python
# SQLAlchemy
db.execute(text(f"SELECT * FROM users WHERE id = {user_id}"))
User.query.filter(text("name = '" + name + "'"))

# Django
User.objects.raw("SELECT * FROM users WHERE id = %s" % user_id)
User.objects.extra(where=["name = '%s'" % name])
```

```javascript
// Sequelize
sequelize.query(`SELECT * FROM users WHERE id = ${userId}`);
Model.findAll({ where: sequelize.literal(`id = ${id}`) });

// Prisma
prisma.$queryRawUnsafe(`SELECT * FROM users WHERE id = ${id}`);
```

```ruby
# ActiveRecord
User.where("name = '#{name}'")
User.find_by_sql("SELECT * FROM users WHERE id = #{id}")
```

---

### 8.6 Authentication Bypass Patterns ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `AuthBypassExtractor` | ✅ `extractors/auth_bypass.rs` |
| Hardcoded admin credentials | ✅ `username == "admin" && password == "..."` patterns |
| Debug auth headers | ✅ X-Debug-Auth, X-Internal-Auth, X-Admin-Auth |
| Skip auth env vars | ✅ SKIP_AUTH, BYPASS_AUTH, NO_AUTH, DEBUG_AUTH |
| Backdoor patterns | ✅ `if username == "backdoor"`, `if user == "test"` |
| Default credentials | ✅ admin/admin, root/root, test/test, guest/guest |
| Test file confidence reduction | ✅ 0.5 confidence for test files |
| Tests | ✅ 11+ tests covering all patterns |

**Detected patterns:**

```python
# Hardcoded credentials
if username == "admin" and password == "admin":

# Debug auth headers
if request.headers.get("X-Debug-Auth") == "secret":

# Skip auth env vars
if os.environ.get("SKIP_AUTH") == "true":
```

**Languages:** Python, JavaScript, TypeScript, Go, Rust

---

### 8.7 Insecure Deserialization ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `InsecureDeserializationExtractor` | ✅ `extractors/insecure_deserialization.rs` |
| Python pickle (critical) | ✅ `pickle.load()`, `pickle.loads()`, `Unpickler()` |
| Python yaml.load without SafeLoader | ✅ Detects missing SafeLoader |
| Python marshal | ✅ `marshal.load()`, `marshal.loads()` |
| Python eval/exec with user input | ✅ `eval(request...)`, `exec(user...)` |
| JavaScript node-serialize | ✅ `require('node-serialize')`, `.unserialize()` |
| Go gob decoder | ✅ `gob.NewDecoder()`, `gob.Decode()` |
| Java ObjectInputStream (polyglot) | ✅ `ObjectInputStream`, `readObject()` |
| Tests | ✅ 10+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go

Unsafe deserialization of untrusted data:

```python
# Python
pickle.loads(user_input)
yaml.load(user_input)  # Without Loader=SafeLoader
eval(user_input)
exec(user_input)
```

```java
// Java
ObjectInputStream ois = new ObjectInputStream(userInput);
ois.readObject(); // Dangerous!
```

```ruby
# Ruby
Marshal.load(user_input)
YAML.load(user_input) # Should use safe_load
```

---

### 8.8 Path Traversal Patterns ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `PathTraversalExtractor` | ✅ `extractors/path_traversal.rs` |
| Python open/read/write with user input | ✅ `open(request...)`, `read(params...)` |
| Python os.path.join with user input | ✅ `os.path.join(base, request...)` |
| JavaScript fs operations | ✅ `fs.readFile(req...)`, `fs.writeFile(params...)` |
| JavaScript path.join/resolve | ✅ `path.join(base, req.query...)` |
| JavaScript res.sendFile | ✅ `res.sendFile(req.params...)` |
| Go filepath operations | ✅ `filepath.Join(base, r...)`, `os.Open(req...)` |
| Rust path operations | ✅ `Path::new(request...)`, `std::fs::read(user...)` |
| Traversal literals | ✅ `../`, `%2e%2e` URL-encoded patterns |
| Tests | ✅ 8+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go, Rust

File operations with user input:

```python
# Python
open(user_input)
os.path.join(base, user_input)  # Doesn't prevent ../
shutil.copy(user_input, dest)
```

```javascript
// JavaScript
fs.readFile(userInput)
path.join(base, userInput) // Doesn't prevent ../
res.sendFile(userInput)
```

---

### 8.9 SSRF Patterns ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `SsrfExtractor` | ✅ `extractors/ssrf.rs` |
| Python requests library | ✅ `requests.get(url)`, `requests.post(target)` |
| Python urllib | ✅ `urllib.request.urlopen(url)` |
| Python httpx | ✅ `httpx.get(url)`, `AsyncClient` |
| JavaScript fetch | ✅ `fetch(url)`, `fetch(req.query...)` |
| JavaScript axios | ✅ `axios.get(url)`, `axios.post(target)` |
| JavaScript got | ✅ `got(url)` |
| Go http.Get/Post | ✅ `http.Get(url)`, `http.NewRequest(...)` |
| Rust reqwest | ✅ `reqwest::get(url)`, `reqwest::Client` |
| URL sink patterns | ✅ `proxy_url`, `webhook_url`, `callback_url` from request |
| Tests | ✅ 10+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go, Rust

HTTP requests with user-controlled URLs:

```python
# Python
requests.get(user_url)
urllib.request.urlopen(user_input)
```

```javascript
// JavaScript
fetch(userUrl)
axios.get(userUrl)
http.get(userUrl)
```

```go
// Go
http.Get(userURL)
client.Do(req) // Where req.URL is user-controlled
```

---

### 8.10 Missing Security Headers ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `SecurityHeadersExtractor` | ✅ `extractors/security_headers.rs` |
| X-Frame-Options disabled | ✅ `X-Frame-Options: none`, `ALLOWALL` |
| X-Content-Type-Options disabled | ✅ `X-Content-Type-Options: disabled` |
| X-XSS-Protection disabled | ✅ `X-XSS-Protection: false` |
| Django SECURE_* settings | ✅ `SECURE_BROWSER_XSS_FILTER = False`, etc. |
| YAML headers disabled | ✅ `x_frame_options: false`, `hsts: no` |
| CSP disabled or unsafe | ✅ `unsafe-inline`, `unsafe-eval` directives |
| HSTS disabled | ✅ `Strict-Transport-Security: none`, `hsts_seconds = 0` |
| Tests | ✅ 7+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go, YAML, JSON, TOML

Detect when security headers are explicitly removed or not set:

```python
# Response headers missing
response.headers.pop('X-Content-Type-Options')
response.headers['X-Frame-Options'] = 'ALLOWALL'
```

```javascript
// Express without helmet
app.use(cors()); // CORS without other security
// No app.use(helmet()) found
```

---

### 8.11 Insecure Cookie Flags ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `InsecureCookiesExtractor` | ✅ `extractors/insecure_cookies.rs` |
| Missing Secure flag | ✅ `secure=False`, `secure: false` |
| Missing HttpOnly flag | ✅ `httponly=False`, `httpOnly: false` |
| SameSite=None without Secure | ✅ `sameSite: 'none'`, `SameSite=None` |
| Django settings | ✅ SESSION_COOKIE_SECURE, CSRF_COOKIE_SECURE = False |
| Go cookie patterns | ✅ `Secure: false`, `HttpOnly: false` |
| Rust actix-web patterns | ✅ `.secure(false)`, `.http_only(false)` |
| Test file confidence reduction | ✅ 0.5 confidence for test files |
| Tests | ✅ 8+ tests covering all patterns |

**Detected patterns:**

```python
# Python/Flask/Django
response.set_cookie('session', value, secure=False)
SESSION_COOKIE_SECURE = False
```

```javascript
// JavaScript/Express
res.cookie('session', value, { httpOnly: false });
res.cookie('auth', value, { sameSite: 'none' });
```

**Languages:** Python, JavaScript, TypeScript, Go, Rust, Ruby, YAML

---

### 8.12 Unvalidated Redirects ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `UnvalidatedRedirectsExtractor` | ✅ `extractors/unvalidated_redirects.rs` |
| Python redirect with user input | ✅ `redirect(request.GET['next'])`, `HttpResponseRedirect(url)` |
| Python Flask redirect | ✅ `redirect(request.args.get(...))` |
| JavaScript res.redirect | ✅ `res.redirect(req.query.next)` |
| JavaScript window.location | ✅ `window.location = url`, `location.href = params...` |
| Go http.Redirect | ✅ `http.Redirect(w, r, r.Query...)` |
| URL parameter patterns | ✅ `redirect_url`, `return_url`, `next`, `goto` from request |
| Tests | ✅ 7+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go

Open redirect vulnerabilities:

```python
# Python
return redirect(request.args.get('next'))
return redirect(request.GET['url'])
```

```javascript
// JavaScript
res.redirect(req.query.redirect);
window.location = userInput;
window.location.href = params.url;
```

---

### 8.13 XXE (XML External Entity) ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `XxeExtractor` | ✅ `extractors/xxe.rs` |
| Python lxml/etree | ✅ `etree.parse()`, `lxml.fromstring()` |
| Python xml.etree.ElementTree | ✅ `ET.parse()`, `ET.fromstring()` |
| Python xml.dom.minidom | ✅ `minidom.parse()`, `minidom.parseString()` |
| Python xml.sax | ✅ `xml.sax.parse()`, `xml.sax.make_parser()` |
| JavaScript xml2js | ✅ `xml2js.parseString()`, `xml2js.Parser()` |
| JavaScript libxmljs | ✅ `libxmljs.parseXml()` |
| Go encoding/xml | ✅ `xml.Unmarshal()`, `xml.NewDecoder()` |
| Java patterns (polyglot) | ✅ `DocumentBuilderFactory`, `SAXParser`, `XMLReader` |
| DTD entity declarations | ✅ ``, `` |
| defusedxml detection | ✅ Lower confidence when defusedxml is imported |
| Tests | ✅ 9+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go

Unsafe XML parsing:

```python
# Python
etree.parse(user_input)  # Without disabling entities
xml.etree.ElementTree.parse(user_input)
```

```java
// Java
DocumentBuilderFactory.newInstance() // Without setFeature to disable XXE
SAXParserFactory.newInstance() // Without secure processing
```

---

### 8.14 Weak Password Requirements ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `WeakPasswordExtractor` | ✅ `extractors/weak_password.rs` |
| Minimum length < 8 | ✅ `password_min_length: 6`, `minLength: 4` |
| Bcrypt cost < 10 | ✅ `bcrypt_cost = 8`, `hash_rounds = 5` |
| Simple length checks | ✅ `len(password) >= 6` in code |
| Complexity disabled | ✅ `require_special_chars: false`, `require_uppercase = false` |
| Number requirement disabled | ✅ `require_numbers: no`, `require_digit = 0` |
| Tests | ✅ 7+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go, Rust, YAML, JSON, TOML

Password validation that's too weak:

```python
# Python
if len(password) >= 4:  # Too short
if len(password) >= 6:  # Still weak
MIN_PASSWORD_LENGTH = 6  # Config too low
```

```javascript
// JavaScript
if (password.length >= 4) { /* ... */ }
const MIN_LENGTH = 6;
/^.{4,}$/ // Regex allows 4+ chars
```

---

### 8.15 LLM-Assisted Extraction (Future) ⬜

**Impact:** VERY HIGH | **Effort:** VERY HIGH

Use Claude to understand code semantically:

```rust
// Pseudo-implementation
// `Claim` is a placeholder name for the extracted-claim type.
async fn extract_with_llm(code: &str, file: &str) -> Vec<Claim> {
    let prompt = format!(
        "Analyze this code for security issues. Return JSON with:\n\
         - concept_path: security concept (e.g., 'tls/cert_verification')\n\
         - predicate: what aspect (e.g., 'enabled')\n\
         - value: the value found\n\
         - confidence: 0.0-1.0\n\
         - description: why this is an issue\n\n\
         Code:\n```\n{}\n```",
        code
    );
    let response = claude_api.message(&prompt).await?;
    parse_claims_from_llm_response(&response)
}
```

**When to use:**

- High-value files (auth, crypto, config)
- After regex extractors find nothing
- For code review mode (not CI)

**Considerations:**

- Cost per scan
- Latency
- Rate limits
- Privacy (code leaves machine)

---

### Implementation Priority

| Phase | Extractors | Impact | Effort | Enterprise Value | Status |
|-------|------------|--------|--------|------------------|--------|
| **8.1** | High-entropy secrets | HIGH | MEDIUM | Catches real leaked secrets | ✅ |
| **8.2** | Framework-specific | HIGH | HIGH | Spring/Django/Express coverage | ✅ |
| **8.3** | Config deep parsing | HIGH | MEDIUM | Nested YAML/JSON understanding | ✅ |
| **8.4** | Semantic TLS | MEDIUM | MEDIUM | Catches const TLS_MIN = "1.0" | ✅ |
| **8.5** | ORM SQL injection | MEDIUM | MEDIUM | SQLAlchemy, Django, Sequelize | ✅ |
| **8.6** | Auth bypass | HIGH | MEDIUM | Backdoors, hardcoded creds | ✅ |
| **8.7** | Deserialization | HIGH | MEDIUM | pickle, Marshal, eval | ✅ |
| **8.8** | Path traversal | MEDIUM | LOW | ../../../etc/passwd | ✅ |
| **8.9** | SSRF | HIGH | MEDIUM | Internal network access | ✅ |
| **8.10** | Security headers | MEDIUM | LOW | Missing helmet(), CSP | ✅ |
| **8.11** | Cookie flags | MEDIUM | LOW | httpOnly, secure, sameSite | ✅ |
| **8.12** | Open redirects | MEDIUM | LOW | Phishing via redirect | ✅ |
| **8.13** | XXE | HIGH | MEDIUM | XML entity injection | ✅ |
| **8.14** | Weak passwords | MEDIUM | LOW | MIN_LENGTH = 4 | ✅ |
| **8.15** | LLM extraction | VERY HIGH | VERY HIGH | Semantic understanding | ✅ (Phase 7.5) |

**Phase 8 Complete (8.1-8.14):** All extractors implemented including 10 framework-specific extractors (Spring, Django, Express, Rails, ASP.NET, Laravel, FastAPI, Next.js, Flask, NestJS).

---

### Success Metrics

| Metric | Current | Target | How to Measure |
|--------|---------|--------|----------------|
| Detection rate (known vulns) | ~30% | >70% | Run against OWASP benchmark |
| False positive rate | Unknown | <10% | Manual review of 100 findings |
| Config file coverage | Regex only | Full parse | Structure-aware extraction |
| Framework coverage | 0 | 4 major | Spring, Django, Express, Rails |
| Enterprise pilot feedback | N/A | >4/5 | Post-pilot survey |

---

## Phase 10: UX & Enterprise Polish ⬜

> **Goal:** Address enterprise buyer feedback from pilot demos. Close gaps between pitch claims and actual functionality.

> **Source:** Skeptical buyer review of `applications/aphoria-pitch/` materials.

### 10.1 Acknowledgment Expiry ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Priority:** P1

Add `--expires` flag to `aphoria ack` command for time-limited exceptions.
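
As an illustration of what parsing an `--expires` value could look like, here is a minimal, std-only sketch that accepts a relative form such as `90d` and an ISO 8601 date such as `2026-12-31`. It is not the code in `src/expiry.rs`: the `Expiry` type and the `parse_expiry_sketch` function are hypothetical names, and the date check is deliberately shallow (it does not validate month lengths or leap years).

```rust
/// Parsed form of an `--expires` value. Type and variant names are hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum Expiry {
    /// Relative: expire this many days after the acknowledgment.
    InDays(u32),
    /// Absolute calendar date.
    OnDate { year: u16, month: u8, day: u8 },
}

fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit())
}

pub fn parse_expiry_sketch(input: &str) -> Result<Expiry, String> {
    let input = input.trim();

    // Relative form: "<N>d".
    if let Some(days) = input.strip_suffix('d') {
        if !all_digits(days) {
            return Err(format!("invalid day count: {:?}", days));
        }
        return days
            .parse::<u32>()
            .map(Expiry::InDays)
            .map_err(|e| format!("invalid day count {:?}: {}", days, e));
    }

    // Absolute form: "YYYY-MM-DD".
    let parts: Vec<&str> = input.split('-').collect();
    let well_formed = parts.len() == 3
        && parts[0].len() == 4
        && parts[1].len() == 2
        && parts[2].len() == 2
        && parts.iter().all(|p| all_digits(p));
    if !well_formed {
        return Err(format!("expected <N>d or YYYY-MM-DD, got {:?}", input));
    }
    let year: u16 = parts[0].parse().map_err(|_| "bad year".to_string())?;
    let month: u8 = parts[1].parse().map_err(|_| "bad month".to_string())?;
    let day: u8 = parts[2].parse().map_err(|_| "bad day".to_string())?;

    // Shallow range check only; month lengths are not validated here.
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return Err(format!("date out of range: {:?}", input));
    }
    Ok(Expiry::OnDate { year, month, day })
}
```

Keeping the relative and absolute forms as separate variants lets the caller decide when to resolve `InDays` against the acknowledgment timestamp, which keeps the parser free of clock access and easy to test.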
| Task | Status |
|------|--------|
| Add `expires_at: Option<String>` to `AcknowledgmentInfo` struct (ISO 8601 format) | ✅ |
| Add `--expires` CLI flag to `Commands::Ack` in `cli.rs` | ✅ |
| Parse durations: `--expires 90d`, `--expires 2026-12-31` (ISO 8601 date only) | ✅ |
| Filter expired acks in `check_conflicts()` | ✅ |
| Show "Ack expired, resurfaces as BLOCK" in output | ✅ |
| Add expiry to JSON export for audit trail | ✅ |
| Tests for expiry parsing and behavior | ✅ |

**Implementation Notes:**

- Created `src/expiry.rs` module with `parse_expiry()`, `is_expired()`, and `format_expiry()` functions
- Ack payloads stored as JSON with `{reason, expires_at}` for backwards compatibility
- Legacy plain-text acks treated as permanent (no expiry)
- Expired acks preserved for audit trail per patent claim 25
- Updated all report formatters (table, JSON, markdown) to show expiry info

**CLI changes (`cli.rs`):**

```rust
Ack {
    concept_path: String,
    #[arg(short, long)]
    reason: String,
    /// Optional expiry (e.g., "90d", "2026-12-31")
    #[arg(long)]
    expires: Option<String>,
},
```

**Usage:**

```bash
# Expire after 90 days
aphoria ack code://go/auth/tls/cert_verification \
  --reason "Integration test environment" \
  --expires 90d

# Expire on specific date (ISO 8601)
aphoria ack code://go/auth/tls/cert_verification \
  --reason "Legacy migration - ends Q2" \
  --expires 2026-12-31
```

**Output after expiry:**

```
BLOCK code://go/auth/tls/cert_verification
  Your code: TLS certificate verification is disabled (main.go:12)
  Note: Previous acknowledgment expired 2026-12-31
  Action: Re-acknowledge or fix the issue
```

**Enterprise Value:** "Exceptions don't become permanent." SOC 2 auditors love time-limited exceptions because they force periodic review.

---

### 10.2 Human-Readable Signer Names ⬜

**Impact:** MEDIUM | **Effort:** MEDIUM | **Priority:** P2

Map issuer hex IDs to human-readable team names in output.
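
A minimal sketch of the intended fallback behavior, under assumptions: the struct below is a hypothetical subset of `PackHeader` (the real struct has other fields), `issuer_hex` is an assumed field name for the issuer ID, and truncating the hex to 8 characters is an arbitrary display choice. Packs without a name fall back to the hex ID.

```rust
/// Hypothetical subset of a trust-pack header, for illustration only.
pub struct PackHeader {
    /// Issuer ID rendered as hex (assumed field name).
    pub issuer_hex: String,
    /// Human-readable team name; absent in older packs.
    pub signer_name: Option<String>,
    /// Slack channel or email; absent in older packs.
    pub contact: Option<String>,
}

/// Prefer the human-readable name; fall back to a shortened hex ID.
pub fn signer_label(header: &PackHeader) -> String {
    match header
        .signer_name
        .as_deref()
        .map(str::trim)
        .filter(|name| !name.is_empty())
    {
        Some(name) => name.to_string(),
        None => {
            let short: String = header.issuer_hex.chars().take(8).collect();
            format!("issuer {}", short)
        }
    }
}

/// Contact line for conflict output, if the pack provides one.
pub fn contact_line(header: &PackHeader) -> Option<String> {
    header.contact.as_deref().map(|c| format!("Contact: {}", c))
}
```

Modeling both new fields as `Option<String>` keeps older packs readable without a format version bump: missing fields simply deserialize to `None` and the output degrades to the hex form.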
| Task | Status |
|------|--------|
| Add `signer_name: Option` to `PackHeader` | ⬜ |
| Add `contact: Option` to `PackHeader` (Slack channel, email) | ⬜ |
| Update `policy export/import` to preserve new fields | ⬜ |
| Show "Signed by Platform Security Team" instead of hex in output | ⬜ |
| Show contact info in conflict output | ⬜ |
| Backward-compat: gracefully handle packs without new fields | ⬜ |

**Output with signer name:**

```
BLOCK  code://go/auth/tls/cert_verification
  Your code:  TLS certificate verification is disabled (main.go:12)
  Source:     Acme Security Standard v3.2 (Platform Security Team)
  Contact:    #security-policy
  Action:     Fix or acknowledge with: aphoria ack --reason "..."
```

**Enterprise Value:** Developers know who to contact. Auditors see clear attribution.

---

### 10.3 Speed Benchmarks ⬜

**Impact:** LOW | **Effort:** LOW | **Priority:** P3

Document and automate speed benchmark testing.

| Task | Status |
|------|--------|
| Create `benchmarks/` directory with test corpora | ⬜ |
| Automate `time aphoria scan` on standard corpus | ⬜ |
| Document test conditions in benchmark results | ⬜ |
| Add `aphoria scan --benchmark` flag for self-test | ⬜ |
| Include benchmarks in CI (optional, non-blocking) | ⬜ |

**Usage:**

```bash
# Run benchmark on current directory
aphoria scan --benchmark

# Output includes timing breakdown
Benchmark Results:
  Files scanned: 767
  Lines of code: 187,918
  Claims extracted: 722
  Conflicts found: 186
  Total time: 652ms
    - File discovery: 45ms
    - Extraction: 487ms
    - Conflict query: 120ms
```

**Enterprise Value:** "Show me the benchmark on a 100K-line codebase" → `aphoria scan --benchmark`

---

### Phase 10 Completion Criteria

| Metric | Target |
|--------|--------|
| Ack expiry working with 90d default | ✓ |
| Demo output matches pitch slides exactly | ✓ |
| Buyer can see who signed a policy (name, not hex) | ✓ |
| Buyer can see how to contact policy owner | ✓ |
| Speed benchmarks documented and reproducible | ✓ |

---

## Phase 11: Evidence-Based Authority ✅

> **Vision:** Authority comes from evidence, not titles. Merit over tenure.

**Problem:** All patterns treated equally. A random commit carries the same weight as a pattern backed by RFC research and product specs.

**Principle:** The system rewards documentation, not tenure.

### Evidence Levels

| Level | Example | Authority Weight | Graduation Threshold |
|-------|---------|------------------|---------------------|
| ProductSpec | `specs/api-design.md → REQ-API-001` | 0.95 | 1 usage |
| Standard | RFC 7519, OWASP A03:2021 | 0.85 | 3 usages |
| Research | ADR-042, docs/decision-log.md | 0.70 | 5 usages |
| Commit | Just code, no context | 0.40 | 10 usages |

### 11.1 Evidence Level Types ✅

| Task | Status |
|------|--------|
| Create `src/evidence/mod.rs` module | ✅ |
| Define `EvidenceLevel` enum (Commit, Research, Standard, ProductSpec) | ✅ |
| Implement `authority_weight()` method | ✅ |
| Add evidence level to `LearnedPattern` struct | ✅ |
| Update pattern display to show evidence level | ✅ |

### 11.2 Evidence Source Detection ✅

| Task | Status |
|------|--------|
| Create `EvidenceSource` enum | ✅ |
| Implement commit message parsing for RFC/standard references | ✅ |
| Implement ADR file detection (docs/adr/*.md patterns) | ✅ |
| Implement spec file detection (specs/*.md, *.spec.md) | ✅ |
| Add `PatternEvidence::detect()` auto-detection | ✅ |

### 11.3 Evidence-Aware Graduation ✅

| Task | Status |
|------|--------|
| Update `GraduationManager` thresholds based on evidence | ✅ |
| ProductSpec: 1 usage → promotion candidate | ✅ |
| Standard: 3 usages → promotion candidate | ✅ |
| Research: 5 usages → promotion candidate | ✅ |
| Commit-only: 10 usages → promotion candidate | ✅ |
| Add evidence boost to shadow mode evaluation | ✅ |

### 11.4 Evidence Display ✅

| Task | Status |
|------|--------|
| Update `aphoria patterns show` to display evidence chain | ✅ |
| Show evidence level badge in table/JSON output | ✅ |
| Show linked sources (ADR, spec, RFC) in conflict output | ✅ |
| Add `--evidence` flag to filter patterns by evidence level | ✅ |

### Phase 11 Completion Criteria

| Metric | Target |
|--------|--------|
| Evidence detection working for 4 source types | ✅ |
| Graduation thresholds vary by evidence level | ✅ |
| Pattern display shows evidence chain | ✅ |
| ProductSpec-backed patterns graduate with 1 usage | ✅ |

### Implementation Notes

**Files Created:**
- `src/evidence/mod.rs` - Module exports with flow documentation
- `src/evidence/types.rs` - `EvidenceLevel`, `EvidenceSource`, `PatternEvidence` types
- `src/evidence/detection.rs` - `EvidenceDetector` with regex-based parsing

**Files Modified:**
- `src/learning/types.rs` - Added `evidence` field to `LearnedPattern`
- `src/learning/store.rs` - Added `get_all_patterns()`, `get_pattern_by_id()`
- `src/shadow/types.rs` - Added `evidence_level`, `evidence_sources` to `ShadowTest`
- `src/shadow/graduation.rs` - Added `effective_min_scans()`, `meets_evidence_aware_criteria()`
- `src/cli.rs` - Added `Show` variant to `PatternCommands`
- `src/handlers/patterns.rs` - Implemented `handle_pattern_show()`

**Tests:** 29 evidence tests + 15 graduation tests passing (817 total)

---

## Phase 12: Knowledge Scope Hierarchy ✅

> **Vision:** Knowledge applies at the right level - org, team, or project.

**Problem:** All knowledge exists at one flat level. No way to say "this applies org-wide" vs "this is just our team's preference."
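
As a rough illustration of the distinction the problem statement asks for, the sketch below models the three scope levels and the project → team → org resolution order listed in 12.2. Only `ScopeLevel` and its variants come from this roadmap; `KnowledgeKind`, `ScopedEntry`, and `resolve()` are hypothetical names, and the real `ScopeResolver` (with its Replace/Merge/NoInherit policies) may behave differently.

```rust
/// Scope levels named in 12.1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScopeLevel {
    Organization,
    Team,
    Project,
}

/// Hypothetical classification mirroring the three inheritance rules in 12.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KnowledgeKind {
    SecurityPolicy,
    Convention,
    Observation,
}

/// Hypothetical record: one piece of knowledge about a single concept.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScopedEntry {
    pub level: ScopeLevel,
    pub kind: KnowledgeKind,
    pub value: String,
}

/// Pick the entry that applies to a project, given every entry recorded
/// for the same concept at any level.
pub fn resolve(entries: &[ScopedEntry]) -> Option<&ScopedEntry> {
    // Org-level security policies auto-apply with no opt-out.
    let policy = entries.iter().find(|e| {
        e.level == ScopeLevel::Organization && e.kind == KnowledgeKind::SecurityPolicy
    });
    if policy.is_some() {
        return policy;
    }
    // Otherwise the most specific level wins: project, then team, then org.
    for level in [ScopeLevel::Project, ScopeLevel::Team, ScopeLevel::Organization] {
        let found = entries.iter().find(|e| {
            e.level == level
                // Observations are never inherited from a broader scope.
                && (level == ScopeLevel::Project || e.kind != KnowledgeKind::Observation)
        });
        if found.is_some() {
            return found;
        }
    }
    None
}
```

With an org-level convention and a team-level convention for the same concept, `resolve()` returns the team entry; an org-level security policy is returned even when a project-level entry disagrees.
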

### Scope Levels

```
Organization Level (applies to all teams)
├── Security policies (TLS, auth, secrets) - NO opt-out
├── Compliance requirements (GDPR, SOC 2)
└── Architecture decisions (API gateway, event bus)

Team Level (applies to team's projects)
├── Coding conventions (naming, error handling)
├── Technology choices (frameworks, libraries)
└── Domain patterns (payment flows, user lifecycle)

Project Level (applies to single project)
├── Local overrides (justified exceptions)
├── Experimental patterns (not yet proven)
└── Context-specific decisions
```

### 12.1 Scope Level Types ✅

| Task | Status |
|------|--------|
| Create `src/scope/mod.rs` module | ✅ |
| Define `ScopeLevel` enum (Organization, Team, Project) | ✅ |
| Add `scope_level` and `scope_id` to `LearnedPattern` | ✅ |
| Add `ScopeConfig` to `.aphoria.toml` | ✅ |
| Implement `--scope` flag for CLI commands | ✅ |

### 12.2 Scope Inheritance ✅

| Task | Status |
|------|--------|
| Implement inheritance resolution (project → team → org) | ✅ |
| Security policies: auto-apply, no opt-out | ✅ |
| Conventions: auto-apply, teams can override with justification | ✅ |
| Observations: never inherited, team-specific only | ✅ |
| Add `ScopedKnowledge` struct with `inherited_from` chain | ✅ |

### 12.3 Scope Override Workflow ✅

| Task | Status |
|------|--------|
| Implement `aphoria scope override` command | ✅ |
| Require justification for overrides | ✅ |
| Require evidence link (spec, ADR, ticket) for overrides | ✅ |
| Store override audit trail | ✅ |
| Show overrides in SOC 2 reports | ⬜ |

### 12.4 Cross-Scope Queries ✅

| Task | Status |
|------|--------|
| `aphoria patterns --scope org` (org-level only) | ✅ |
| `aphoria patterns --scope team --exclude-inherited` | ✅ |
| `aphoria patterns --scope project --only-local` | ✅ |
| Show scope in pattern list output | ✅ |

### Phase 12 Completion Criteria

| Metric | Target |
|--------|--------|
| 3 scope levels working (org/team/project) | ✅ |
| Inheritance resolution correct | ✅ |
| Overrides require justification + evidence | ✅ |
| Cross-scope queries functional | ✅ |

**Implementation Notes:**
- `src/scope/mod.rs` - ScopeLevel, ScopeId, ScopeContext with inheritance chain
- `src/scope/config.rs` - ScopeConfig for `.aphoria.toml`
- `src/scope/resolver.rs` - ScopeResolver with Replace/Merge/NoInherit policies
- `src/scope/override_record.rs` - ScopeOverride with OverrideValue, expiration
- `src/scope/store.rs` - OverrideStore with persistence to ~/.aphoria/scope/
- `src/handlers/scope.rs` - CLI command handlers (status, override, list, remove)

**Tests:** 884 tests passing, all scope tests passing

---

## Phase 13: Knowledge Lifecycle Management ✅

> **Vision:** Knowledge ages. Patterns can be deprecated and superseded.

**Problem:** Knowledge exists forever. No way to deprecate patterns or track evolution.

### Knowledge Status

```
Active      → Pattern is current, enforced
Deprecated  → Pattern is being phased out, migration guidance provided
Superseded  → Pattern replaced by another, link to replacement
Archived    → Pattern removed from active use, historical only
```

### 13.1 Knowledge Status Types ✅

| Task | Status |
|------|--------|
| Create `src/lifecycle/mod.rs` module | ✅ |
| Define `KnowledgeStatus` enum | ✅ |
| Add `Deprecated` variant with reason, superseded_by, sunset_date | ✅ |
| Add `KnowledgeLifecycle` struct with status history | ✅ |
| Store lifecycle in pattern metadata | ✅ |

### 13.2 Deprecation Command ✅

| Task | Status |
|------|--------|
| Implement `aphoria deprecate ` command | ✅ |
| Require `--reason` flag | ✅ |
| Optional `--superseded-by ` | ✅ |
| Optional `--sunset-date ` | ✅ |
| Notify connected teams on deprecation | ⬜ |

### 13.3 Migration Guidance ✅

| Task | Status |
|------|--------|
| Show deprecation warning in scan output | ✅ |
| Link to superseding pattern when available | ✅ |
| Show migration guide/ADR when linked | ✅ |
| FLAG (not BLOCK) deprecated pattern usage | ✅ |
| Track migration progress across projects | ✅ |

### 13.4 Migration Tracking Dashboard ✅

| Task | Status |
|------|--------|
| Implement `aphoria migrations status` command | ✅ |
| Show progress by team (X/Y endpoints migrated) | ✅ |
| Show days remaining until sunset | ✅ |
| Show blockers (acknowledged exceptions) | ✅ |
| Export migration status for reporting | ✅ |

### Phase 13 Completion Criteria

| Metric | Target |
|--------|--------|
| Deprecation command working | ✅ |
| Deprecated patterns show warning in scan | ✅ |
| Migration tracking across projects | ✅ |
| SOC 2 report includes migration status | ⬜ |

**Implementation Notes:**
- `src/lifecycle/mod.rs` - KnowledgeStatus, KnowledgeLifecycle, StatusTransition
- `src/lifecycle/store.rs` - LifecycleStore for persistence
- `src/lifecycle/migration.rs` - MigrationStore, MigrationProgress tracking
- `src/handlers/lifecycle.rs` - CLI handlers for deprecate, archive, reactivate, history, list
- `src/handlers/lifecycle.rs` - Migration handlers for status, export, blockers
- `KnowledgeLifecycle` added to `LearnedPattern` for pattern-level lifecycle tracking

**Tests:** 884 tests passing (35 lifecycle-specific tests)

---

## Phase 14: Governance Workflows 🎯

> **Vision:** Clear approval paths for pattern promotion with audit trails.

**Problem:** Governance is binary: manual review or >0.95 auto-promote. No structured approval workflows.
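
To contrast with the binary rule, here is a minimal sketch of the multi-stage flow planned in 14.2 (pending → approved/rejected). Phase 14 is not implemented yet, so everything here is an assumption: `ApprovalWorkflow` is the struct name from 14.1, while `ApprovalState`, the field layout, and the method names are hypothetical. Timestamps, timeouts, and escalation are left out.

```rust
/// Hypothetical workflow states; `stage` indexes into the workflow's stage list.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ApprovalState {
    Pending { stage: usize },
    Approved,
    Rejected { reason: String },
}

/// Hypothetical shape for the `ApprovalWorkflow` struct named in 14.1.
pub struct ApprovalWorkflow {
    stages: Vec<String>,
    state: ApprovalState,
    history: Vec<String>,
}

impl ApprovalWorkflow {
    /// A workflow with no stages is treated as approved immediately.
    pub fn new(stages: Vec<String>) -> Self {
        let state = if stages.is_empty() {
            ApprovalState::Approved
        } else {
            ApprovalState::Pending { stage: 0 }
        };
        Self { stages, state, history: Vec::new() }
    }

    /// Approve the current stage and advance, or finish after the last stage.
    pub fn approve(&mut self, approver: &str) -> Result<(), String> {
        match self.state {
            ApprovalState::Pending { stage } => {
                self.history
                    .push(format!("{} approved stage '{}'", approver, self.stages[stage]));
                self.state = if stage + 1 < self.stages.len() {
                    ApprovalState::Pending { stage: stage + 1 }
                } else {
                    ApprovalState::Approved
                };
                Ok(())
            }
            _ => Err("workflow is already closed".to_string()),
        }
    }

    /// Reject at the current stage; the workflow closes immediately.
    pub fn reject(&mut self, approver: &str, reason: &str) -> Result<(), String> {
        match self.state {
            ApprovalState::Pending { stage } => {
                self.history
                    .push(format!("{} rejected stage '{}'", approver, self.stages[stage]));
                self.state = ApprovalState::Rejected { reason: reason.to_string() };
                Ok(())
            }
            _ => Err("workflow is already closed".to_string()),
        }
    }

    pub fn state(&self) -> &ApprovalState {
        &self.state
    }

    pub fn history(&self) -> &[String] {
        &self.history
    }
}
```

Keeping the history as an append-only list is one simple way to feed the audit trail described in 14.4; a real implementation would also record approver identity and a timestamp per entry.
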

### 14.1 Approval Workflow Definition ⬜

| Task | Status |
|------|--------|
| Create `src/governance/mod.rs` module | ⬜ |
| Define `ApprovalWorkflow` struct | ⬜ |
| Define `ApprovalStage` with required approvers | ⬜ |
| Support evidence-based auto-approve thresholds | ⬜ |
| Config: define workflows in `.aphoria.toml` | ⬜ |

### 14.2 Approval State Machine ⬜

| Task | Status |
|------|--------|
| Implement state transitions (pending → approved/rejected) | ⬜ |
| Multi-stage approval support | ⬜ |
| Timeout and escalation policies | ⬜ |
| Store approval history with timestamps | ⬜ |

### 14.3 Approval CLI ⬜

| Task | Status |
|------|--------|
| `aphoria governance pending` - list pending approvals | ⬜ |
| `aphoria governance approve --comment "..."` | ⬜ |
| `aphoria governance reject --reason "..."` | ⬜ |
| `aphoria governance escalate ` | ⬜ |
| Show approval status in pattern list | ⬜ |

### 14.4 SOC 2 Audit Trail ⬜

| Task | Status |
|------|--------|
| Full audit log for all governance actions | ⬜ |
| `aphoria audit trail --pattern ` - show timeline | ⬜ |
| Export governance history for auditors | ⬜ |
| Include approver identity and timestamp | ⬜ |

### Phase 14 Completion Criteria

| Metric | Target |
|--------|--------|
| Multi-stage approval working | ✓ |
| Approval/reject with comments | ✓ |
| Full audit trail exportable | ✓ |
| SOC 2 evidence includes approval chain | ✓ |

---

## Phase 15: Evidence Source Integration ⬜

> **Vision:** ADRs, specs, and standards automatically link to patterns.

**Problem:** Evidence sources aren't automatically detected. Developers must manually reference them.
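
A minimal sketch of what automatic detection could look like, using the reference formats and numbers from the Phase 11 Evidence Levels table. `EvidenceLevel` and `authority_weight()` are names from 11.1; `graduation_threshold()` and `detect_level()` are hypothetical. The shipped `EvidenceDetector` is described as regex-based, so the plain token matching here is a stand-in chosen to keep the example dependency-free.

```rust
/// Evidence levels from the Phase 11 table, declared weakest to strongest
/// so that the derived ordering can pick the best match.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EvidenceLevel {
    Commit,
    Research,
    Standard,
    ProductSpec,
}

impl EvidenceLevel {
    /// Authority weights as listed in the Evidence Levels table.
    pub fn authority_weight(self) -> f64 {
        match self {
            EvidenceLevel::ProductSpec => 0.95,
            EvidenceLevel::Standard => 0.85,
            EvidenceLevel::Research => 0.70,
            EvidenceLevel::Commit => 0.40,
        }
    }

    /// Usages required before a pattern becomes a promotion candidate (11.3).
    pub fn graduation_threshold(self) -> u32 {
        match self {
            EvidenceLevel::ProductSpec => 1,
            EvidenceLevel::Standard => 3,
            EvidenceLevel::Research => 5,
            EvidenceLevel::Commit => 10,
        }
    }
}

fn all_digits(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_digit())
}

/// Hypothetical helper: return the strongest evidence level referenced in a
/// commit message, falling back to `Commit` when nothing is recognized.
pub fn detect_level(message: &str) -> EvidenceLevel {
    // Split on everything except letters, digits, and hyphens so that
    // identifiers such as "REQ-API-001" and "ADR-042" stay in one token.
    let tokens: Vec<&str> = message
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .filter(|t| !t.is_empty())
        .collect();

    let mut best = EvidenceLevel::Commit;
    for (i, tok) in tokens.iter().copied().enumerate() {
        let next = tokens.get(i + 1).copied().unwrap_or("");
        let level = if tok.strip_prefix("REQ-").map_or(false, |rest| !rest.is_empty()) {
            EvidenceLevel::ProductSpec // e.g. REQ-API-001
        } else if tok == "RFC" && all_digits(next) {
            EvidenceLevel::Standard // e.g. RFC 7519
        } else if tok == "OWASP" && next.strip_prefix('A').map_or(false, all_digits) {
            EvidenceLevel::Standard // e.g. OWASP A03:2021
        } else if tok.strip_prefix("ADR-").map_or(false, all_digits) {
            EvidenceLevel::Research // e.g. ADR-042
        } else {
            EvidenceLevel::Commit
        };
        best = best.max(level);
    }
    best
}
```

A message that cites both a requirement ID and an ADR resolves to `ProductSpec`, which under the table above needs only one usage to become a promotion candidate.
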

### 15.1 ADR Auto-Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/adr.rs` | ⬜ |
| Detect ADR-XXX patterns in commit messages | ⬜ |
| Scan for ADR files in standard locations | ⬜ |
| Parse ADR content for related patterns | ⬜ |
| Link ADR to patterns automatically | ⬜ |

### 15.2 Spec File Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/spec.rs` | ⬜ |
| Detect spec files (specs/*.md, *.spec.md) | ⬜ |
| Parse requirement IDs (REQ-XXX) | ⬜ |
| Link requirements to patterns | ⬜ |
| Show requirement coverage in reports | ⬜ |

### 15.3 Standard Reference Extraction ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/standards.rs` | ⬜ |
| Parse RFC references (RFC 7519) | ⬜ |
| Parse OWASP references (OWASP A03:2021) | ⬜ |
| Parse NIST references (NIST SP 800-53) | ⬜ |
| Auto-link to authoritative corpus | ⬜ |

### 15.4 Evidence Display ⬜

| Task | Status |
|------|--------|
| Show full evidence chain in pattern output | ⬜ |
| Link to source files (ADR, spec) | ⬜ |
| Show external standard references | ⬜ |
| `aphoria patterns --by-evidence` grouping | ⬜ |

### Phase 15 Completion Criteria

| Metric | Target |
|--------|--------|
| ADR auto-detection working | ✓ |
| Spec file linking working | ✓ |
| Standard references extracted | ✓ |
| Evidence chain visible in output | ✓ |

---

## Enterprise Pilot Success Metrics

### 90-Day Pilot Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Patterns captured | 100+ observations | Count in knowledge graph |
| Patterns promoted | 10+ conventions | Count with status=Active |
| Cross-team adoption | 2+ teams connected | Unique team_ids |
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
| False positive rate | <10% | FP feedback / total flags |
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |

### 180-Day Production Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Knowledge retention | 0 lost patterns on departures | Audit log |
| Onboarding velocity | 50% faster ramp | Time to first PR |
| Convention adoption | 80% across org | Compliance rate |
| SOC 2 evidence | Audit pass | External validation |
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |

---

## Enterprise Simulation UAT

See: `uat/enterprise-simulation-uat.md`

6-month simulation covering:
- Month 1: Platform team adopts, baseline patterns captured
- Month 2: Payments team joins, cross-team patterns emerge
- Month 3: New hire guided by existing patterns
- Month 4: Mobile team joins, org-level promotion
- Month 5: API versioning deprecated, migration tracked
- Month 6: SOC 2 audit evidence generated