stemedb/applications/aphoria/roadmap.md

# Aphoria Roadmap

> Completed phases archived in [`roadmap-archive.md`](./roadmap-archive.md)

---

## Status Overview

| Phase | Deliverable | Status |
|-------|-------------|--------|
| 0–9, 11–13, 16–17 | Core CLI, Extractors (42), LLM, Learning, Enterprise, Lifecycle, Pattern Enrichment | ✅ Archived |
| 10 | UX & Enterprise Polish | 🔄 Partial (10.1 ✅, 10.2–10.3 ⬜) |
| 14 | Governance Workflows | 🎯 Current |
| 15 | Evidence Source Integration | ⬜ Future |
| A6 | AST-Aware Observation & Claim Verification | ⬜ Future |

### Current State

- 42 built-in extractors + declarative custom extractors
- Full corpus: RFC, OWASP, Vendor sources
- Ephemeral mode (~0.25s), persistent mode with drift detection
- Observation/claim distinction (A1–A5 complete, see main `roadmap.md`)
- `aphoria verify run|map` for claim verification
- 10 claims dogfooded in `.aphoria/claims.toml`
- Self-improving: LLM extraction → pattern learning → autonomous promotion → shadow testing → auto-rollback

---

## Phase 10: UX & Enterprise Polish (Partial)

> 10.1 Acknowledgment Expiry ✅ — archived

### 10.2 Human-Readable Signer Names ⬜

**Impact:** MEDIUM | **Effort:** MEDIUM | **Priority:** P2

Map issuer hex IDs to human-readable team names in output.

| Task | Status |
|------|--------|
| Add `signer_name: Option<String>` to `PackHeader` | ⬜ |
| Add `contact: Option<String>` to `PackHeader` (Slack channel, email) | ⬜ |
| Update `policy export/import` to preserve new fields | ⬜ |
| Show "Signed by Platform Security Team" instead of hex in output | ⬜ |
| Backward-compat: gracefully handle packs without new fields | ⬜ |
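
A minimal sketch of how the new fields and the backward-compat fallback might fit together. The `issuer_id` field name and the `signer_label` helper are assumptions for illustration; only `signer_name` and `contact` come from the task list above.

```rust
/// Sketch of the pack header with the two optional fields from the task list.
/// `issuer_id` is a hypothetical name for the existing hex issuer field.
#[derive(Debug, Clone, Default)]
pub struct PackHeader {
    pub issuer_id: String,
    pub signer_name: Option<String>,
    pub contact: Option<String>,
}

/// Builds the "Signed by ..." line. Packs written before the new fields
/// existed carry `None`, so the label falls back to the hex ID.
pub fn signer_label(header: &PackHeader) -> String {
    let who = header
        .signer_name
        .as_deref()
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .unwrap_or(header.issuer_id.as_str());

    match header.contact.as_deref().map(str::trim) {
        Some(contact) if !contact.is_empty() => {
            format!("Signed by {} ({})", who, contact)
        }
        _ => format!("Signed by {}", who),
    }
}
```

Keeping both fields as `Option<String>` means an older pack needs no migration: it simply renders with the hex ID, as it does today.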

### 10.3 Speed Benchmarks ⬜

**Impact:** LOW | **Effort:** LOW | **Priority:** P3

| Task | Status |
|------|--------|
| Create `benchmarks/` directory with test corpora | ⬜ |
| Add `aphoria scan --benchmark` flag for self-test | ⬜ |
| Document test conditions in benchmark results | ⬜ |

---

## Phase 14: Governance Workflows 🎯

> **Vision:** Clear approval paths for pattern promotion with audit trails.

### 14.1 Approval Workflow Definition ⬜

| Task | Status |
|------|--------|
| Create `src/governance/mod.rs` module | ⬜ |
| Define `ApprovalWorkflow` struct | ⬜ |
| Define `ApprovalStage` with required approvers | ⬜ |
| Support evidence-based auto-approve thresholds | ⬜ |
| Config: define workflows in `.aphoria.toml` | ⬜ |
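
One possible shape for the two structs, sketched under assumptions: every field name here (`required_approvers`, `min_approvals`, `auto_approve_threshold`) and the idea of a single numeric evidence score are hypothetical, not a settled design.

```rust
/// One stage of a workflow: who may approve and how many approvals it needs.
#[derive(Debug, Clone)]
pub struct ApprovalStage {
    pub name: String,
    pub required_approvers: Vec<String>,
    pub min_approvals: usize,
}

impl ApprovalStage {
    /// True when enough of the listed approvers have signed off.
    pub fn is_satisfied(&self, approvals: &[&str]) -> bool {
        let count = self
            .required_approvers
            .iter()
            .filter(|approver| approvals.contains(&approver.as_str()))
            .count();
        count >= self.min_approvals
    }
}

/// An ordered list of stages plus an optional evidence-based shortcut.
#[derive(Debug, Clone)]
pub struct ApprovalWorkflow {
    pub name: String,
    pub stages: Vec<ApprovalStage>,
    /// Assumed: evidence score at or above which human review is skipped.
    pub auto_approve_threshold: Option<f64>,
}

impl ApprovalWorkflow {
    /// Without a configured threshold, nothing is auto-approved.
    pub fn auto_approves(&self, evidence_score: f64) -> bool {
        self.auto_approve_threshold
            .map_or(false, |threshold| evidence_score >= threshold)
    }
}
```

Making the threshold optional keeps auto-approval opt-in per workflow, which seems the safer default for a governance feature.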

### 14.2 Approval State Machine ⬜

| Task | Status |
|------|--------|
| Implement state transitions (pending → approved/rejected) | ⬜ |
| Multi-stage approval support | ⬜ |
| Timeout and escalation policies | ⬜ |
| Store approval history with timestamps | ⬜ |
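
The transitions above could be sketched roughly as follows. All type and method names are hypothetical, and timeout and escalation are deliberately left out; the sketch covers only pending → approved/rejected, multi-stage advancement, and timestamped history.

```rust
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalState {
    Pending { stage: usize },
    Approved,
    Rejected,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Approve,
    Reject,
}

/// One recorded transition, kept for the approval history.
#[derive(Debug, Clone)]
pub struct HistoryEntry {
    pub approver: String,
    pub decision: Decision,
    pub from: ApprovalState,
    pub to: ApprovalState,
    pub at: SystemTime,
}

#[derive(Debug)]
pub struct ApprovalRequest {
    pub state: ApprovalState,
    pub total_stages: usize,
    pub history: Vec<HistoryEntry>,
}

impl ApprovalRequest {
    pub fn new(total_stages: usize) -> Self {
        Self {
            state: ApprovalState::Pending { stage: 0 },
            total_stages,
            history: Vec::new(),
        }
    }

    /// Applies a decision. Approved and Rejected are terminal, so any
    /// further decision is refused rather than silently ignored.
    pub fn decide(&mut self, approver: &str, decision: Decision) -> Result<ApprovalState, String> {
        let stage = match self.state {
            ApprovalState::Pending { stage } => stage,
            closed => return Err(format!("request already closed: {:?}", closed)),
        };
        let next = match decision {
            Decision::Reject => ApprovalState::Rejected,
            Decision::Approve if stage + 1 >= self.total_stages => ApprovalState::Approved,
            Decision::Approve => ApprovalState::Pending { stage: stage + 1 },
        };
        self.history.push(HistoryEntry {
            approver: approver.to_string(),
            decision,
            from: self.state,
            to: next,
            at: SystemTime::now(),
        });
        self.state = next;
        Ok(next)
    }
}
```

Recording both `from` and `to` on every entry makes the history replayable on its own, which should help when the audit trail in 14.4 is built on top of it.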

### 14.3 Approval CLI ⬜

| Task | Status |
|------|--------|
| `aphoria governance pending` — list pending approvals | ⬜ |
| `aphoria governance approve <id> --comment "..."` | ⬜ |
| `aphoria governance reject <id> --reason "..."` | ⬜ |
| `aphoria governance escalate <id>` | ⬜ |
| Show approval status in pattern list | ⬜ |

### 14.4 SOC 2 Audit Trail ⬜

| Task | Status |
|------|--------|
| Full audit log for all governance actions | ⬜ |
| `aphoria audit trail --pattern <id>` — show timeline | ⬜ |
| Export governance history for auditors | ⬜ |
| Include approver identity and timestamp | ⬜ |

---

## Phase 15: Evidence Source Integration ⬜

> **Vision:** ADRs, specs, and standards automatically link to patterns.

### 15.1 ADR Auto-Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/adr.rs` | ⬜ |
| Detect ADR-XXX patterns in commit messages | ⬜ |
| Scan for ADR files in standard locations | ⬜ |
| Parse ADR content for related patterns | ⬜ |
| Link ADR to patterns automatically | ⬜ |
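
As an illustration of the commit-message detection step, here is a small standard-library-only sketch. The function name `find_adr_refs` is hypothetical, and it assumes references are written as an uppercase `ADR-` followed by digits; a real implementation would likely also check word boundaries.

```rust
/// Returns the distinct ADR numbers referenced as `ADR-<digits>` in a
/// commit message, in order of first appearance.
pub fn find_adr_refs(message: &str) -> Vec<u32> {
    let mut found = Vec::new();
    let mut rest = message;

    while let Some(pos) = rest.find("ADR-") {
        // "ADR-" is four ASCII bytes, so this slice stays on a char boundary.
        let after = &rest[pos + 4..];
        let digits: String = after
            .chars()
            .take_while(|c| c.is_ascii_digit())
            .collect();

        // An empty or overflowing digit run fails to parse and is skipped.
        if let Ok(number) = digits.parse::<u32>() {
            if !found.contains(&number) {
                found.push(number);
            }
        }
        rest = after;
    }
    found
}
```

For example, `find_adr_refs("Rotate tokens per ADR-012, supersedes ADR-7")` yields the numbers 12 and 7, with leading zeros normalized away.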

### 15.2 Spec File Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/spec.rs` | ⬜ |
| Detect spec files (`specs/*.md`, `*.spec.md`) | ⬜ |
| Parse requirement IDs (REQ-XXX) | ⬜ |
| Link requirements to patterns | ⬜ |
| Show requirement coverage in reports | ⬜ |

### 15.3 Standard Reference Extraction ⬜

| Task | Status |
|------|--------|
| Parse RFC references (RFC 7519) | ⬜ |
| Parse OWASP references (OWASP A03:2021) | ⬜ |
| Parse NIST references (NIST SP 800-53) | ⬜ |
| Auto-link to authoritative corpus | ⬜ |
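
A rough sketch of reference extraction for the three formats in the table, using only the standard library. `StandardRef` and `extract_standard_refs` are hypothetical names, and the matching rules (token-based, uppercase prefixes only) are simplifying assumptions.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StandardRef {
    Rfc(u32),
    Owasp(String),
    Nist(String),
}

/// Scans whitespace-separated tokens for `RFC <n>`, `OWASP A<nn>:<year>`,
/// and `NIST SP <id>` references.
pub fn extract_standard_refs(text: &str) -> Vec<StandardRef> {
    // Trim surrounding punctuation so "(RFC 7519)," still tokenizes cleanly.
    let tokens: Vec<&str> = text
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| !c.is_ascii_alphanumeric()))
        .collect();

    let mut refs = Vec::new();
    for i in 0..tokens.len() {
        let next = tokens.get(i + 1).copied();
        let after_next = tokens.get(i + 2).copied();
        match (tokens[i], next, after_next) {
            ("RFC", Some(num), _) => {
                if let Ok(n) = num.parse::<u32>() {
                    refs.push(StandardRef::Rfc(n));
                }
            }
            ("OWASP", Some(id), _) if id.starts_with('A') && id.contains(':') => {
                refs.push(StandardRef::Owasp(id.to_string()));
            }
            ("NIST", Some("SP"), Some(id))
                if id.starts_with(|c: char| c.is_ascii_digit()) =>
            {
                refs.push(StandardRef::Nist(format!("SP {}", id)));
            }
            _ => {}
        }
    }
    refs
}
```

Returning a typed enum rather than raw strings should make the later auto-link step a simple lookup keyed on the variant.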

### 15.4 Evidence Display ⬜

| Task | Status |
|------|--------|
| Show full evidence chain in pattern output | ⬜ |
| `aphoria patterns --by-evidence` grouping | ⬜ |

---

## Phase A6: AST-Aware Observation & Claim Verification ⬜

> Evolved from the "Scout & Judge" proposal (2026-02-05). The original focused on LLM cost reduction via AST snippet extraction. Reframed through the observations/claims distinction: the **Scout** produces structurally richer observations that regex can't, and the **Judge** verifies authored claims against code rather than classifying security issues.

### Why This Matters

The 42 regex extractors work well for direct pattern matching (~0.25s). But they can't follow indirection:

```python
# Regex sees `requests.get(url, verify=should_verify)` — no match
# AST sees `should_verify = False` in scope — match
should_verify = False
requests.get(url, verify=should_verify)
```

And they can't verify authored claims. When a claim says "Wallet MUST NOT derive Clone", regex can find `#[derive(` but can't determine scope or negation semantics. An AST-aware scout + LLM judge can.

### A6.1 Tree-sitter Infrastructure ⬜

| Task | Status |
|------|--------|
| Add `tree-sitter` + language grammars to `Cargo.toml` | ⬜ |
| Create `src/scout/mod.rs` module | ⬜ |
| `src/scout/engine.rs` — parse files, run SCM queries | ⬜ |
| `CandidateSnippet` type with structural context | ⬜ |
| `src/scout/queries/` — `.scm` query files per category/language | ⬜ |
| Language support: Python, Go, Rust, JavaScript/TypeScript | ⬜ |

```rust
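use std::collections::HashMap;

/// A region of interest found by the scout, with enough context for the judge.
/// `Language` is assumed to be a project-defined enum of supported languages.
pub struct CandidateSnippet {
    pub file_path: String,
    pub language: Language,
    /// Line span of the snippet within `file_path`.
    pub start_line: usize,
    pub end_line: usize,
    /// Source text of the snippet.
    pub code: String,
    /// In-scope variable definitions relevant to the match (assumed: name -> value text).
    pub context_variables: HashMap<String, String>,
    /// ID of the `.scm` query that produced this candidate.
    pub query_id: String,
}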
```

### A6.2 Scout as Observation Producer ⬜

AST-aware region-of-interest (ROI) detection for patterns regex can't follow.

| Task | Status |
|------|--------|
| Variable indirection tracking (assign → use across lines) | ⬜ |
| Context expansion: function scope, variable defs, comments | ⬜ |
| Deduplication with existing regex extractors | ⬜ |
| SCM queries for TLS, secrets, auth, crypto categories | ⬜ |
| Integration: run scout after regex, drop overlaps, combine | ⬜ |

**Key design:** Scout runs alongside regex extractors, not in place of them. Regex handles roughly 90% of cases at zero LLM cost; scout handles the indirection cases regex misses.
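
The "run scout after regex, drop overlaps, combine" step might look roughly like this. The `Finding` type, its fields, and the overlap rule (same file, same category, intersecting line ranges) are all assumptions made for the sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    Regex,
    Scout,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub file_path: String,
    pub category: String,
    pub start_line: usize,
    pub end_line: usize,
    pub source: Source,
}

impl Finding {
    pub fn new(file_path: &str, category: &str, start_line: usize, end_line: usize, source: Source) -> Self {
        Self {
            file_path: file_path.to_string(),
            category: category.to_string(),
            start_line,
            end_line,
            source,
        }
    }
}

/// Two findings overlap when they share a file and category and their
/// inclusive line ranges intersect.
fn overlaps(a: &Finding, b: &Finding) -> bool {
    a.file_path == b.file_path
        && a.category == b.category
        && a.start_line <= b.end_line
        && b.start_line <= a.end_line
}

/// Keeps every regex finding and appends only the scout findings that do
/// not overlap one of them.
pub fn combine(regex: Vec<Finding>, scout: Vec<Finding>) -> Vec<Finding> {
    let mut combined = regex;
    let regex_count = combined.len();
    for candidate in scout {
        let duplicate = combined[..regex_count]
            .iter()
            .any(|existing| overlaps(existing, &candidate));
        if !duplicate {
            combined.push(candidate);
        }
    }
    combined
}
```

Preferring the regex finding on overlap keeps existing output stable, so enabling the scout can only add observations, never change ones users already see.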

### A6.3 Judge as Claim Verifier ⬜

LLM receives focused snippet + authored claim → structured verdict.

| Task | Status |
|------|--------|
| Refactor `LlmExtractor` to accept `CandidateSnippet` + `AuthoredClaim` | ⬜ |
| Verification prompt: "Does this code satisfy this claim?" | ⬜ |
| Structured output: `{ verdict: PASS\|FAIL\|UNCERTAIN, evidence: "..." }` | ⬜ |
| Wire into `aphoria verify` Direction 2 (walk claims, verify in code) | ⬜ |
| Maps to `Extractor::verify()` from vision-gaps | ⬜ |

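**Token efficiency:** Sending a snippet (~100 tokens) instead of the whole file (~2000 tokens) cuts input tokens per verification by about 95% (1 − 100/2000 = 0.95), not counting the fixed prompt and claim text.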
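
A small sketch of the verdict type and the savings arithmetic. The names `Verdict`, `JudgeVerdict`, and `token_savings` are hypothetical, and mapping any unrecognized label to `Uncertain` is an assumed policy rather than a decided one.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
    Uncertain,
}

impl Verdict {
    /// Parses the verdict label from the LLM response. Anything the judge
    /// returns outside PASS/FAIL degrades to `Uncertain` (assumed policy).
    pub fn from_label(label: &str) -> Verdict {
        match label.trim().to_ascii_uppercase().as_str() {
            "PASS" => Verdict::Pass,
            "FAIL" => Verdict::Fail,
            _ => Verdict::Uncertain,
        }
    }
}

/// Structured judge output: the verdict plus the supporting evidence text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JudgeVerdict {
    pub verdict: Verdict,
    pub evidence: String,
}

/// Fraction of input tokens saved by sending a snippet instead of the
/// whole file. Returns 0.0 for an empty file to avoid dividing by zero.
pub fn token_savings(snippet_tokens: u32, file_tokens: u32) -> f64 {
    if file_tokens == 0 {
        return 0.0;
    }
    1.0 - f64::from(snippet_tokens) / f64::from(file_tokens)
}
```

With the figures quoted above, `token_savings(100, 2000)` works out to 0.95.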

### A6.4 Scout for Claim Suggestion ⬜

Scout identifies ROIs without matching authored claims, feeds context to `aphoria-suggest`.

| Task | Status |
|------|--------|
| Identify ROIs with no matching claim in `.aphoria/claims.toml` | ⬜ |
| Enrich context for skill: snippet + function name + surrounding comments | ⬜ |
| Feed to `aphoria-suggest` skill for claim drafting | ⬜ |

### A6.5 Evaluation ⬜

| Task | Status |
|------|--------|
| Scout recall: "Did scout find the vulnerable line in fixture?" | ⬜ |
| Judge precision: "Given snippet + claim, did LLM classify correctly?" | ⬜ |
| Cost metric: `tokens_per_verification` vs monolithic approach | ⬜ |
| Parallel run: shadow mode alongside regex for tuning | ⬜ |
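
The three metrics could be computed along these lines. This sketch assumes the standard definitions (recall = TP / (TP + FN), precision = TP / (TP + FP)), which may be stricter than what "classified correctly" is meant to cover; `EvalCounts` and its fields are hypothetical names.

```rust
/// Confusion counts gathered from fixture runs.
#[derive(Debug, Default, Clone, Copy)]
pub struct EvalCounts {
    pub true_positives: u32,
    pub false_positives: u32,
    pub false_negatives: u32,
}

impl EvalCounts {
    /// Scout recall: share of known-vulnerable lines the scout found.
    /// `None` when the fixtures contain no positives at all.
    pub fn recall(&self) -> Option<f64> {
        let total = self.true_positives + self.false_negatives;
        if total == 0 {
            return None;
        }
        Some(f64::from(self.true_positives) / f64::from(total))
    }

    /// Judge precision: share of flagged snippets that were real violations.
    /// `None` when nothing was flagged.
    pub fn precision(&self) -> Option<f64> {
        let total = self.true_positives + self.false_positives;
        if total == 0 {
            return None;
        }
        Some(f64::from(self.true_positives) / f64::from(total))
    }
}

/// Cost metric: average tokens spent per verification, if any were run.
pub fn tokens_per_verification(total_tokens: u64, verifications: u64) -> Option<f64> {
    if verifications == 0 {
        return None;
    }
    Some(total_tokens as f64 / verifications as f64)
}
```

Returning `Option` instead of 0.0 for empty inputs keeps "no data yet" distinguishable from "scored zero" in shadow-mode reports.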

### Phase A6 Priority

Lower priority than A5 flywheel completion and Phase 14 governance. Build when any of the following holds:

1. Regex extractors hit limits on specific indirection patterns
2. `aphoria verify` Direction 2 needs LLM-backed verification
3. `aphoria-suggest` needs richer context than regex observations provide

---

## Enterprise Pilot Success Metrics

### 90-Day Pilot Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Patterns captured | 100+ observations | Count in knowledge graph |
| Patterns promoted | 10+ conventions | Count with status=Active |
| Cross-team adoption | 2+ teams connected | Unique team_ids |
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
| False positive rate | <10% | FP feedback / total flags |
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |

### 180-Day Production Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Knowledge retention | 0 lost patterns on departures | Audit log |
| Onboarding velocity | 50% faster ramp | Time to first PR |
| Convention adoption | 80% across org | Compliance rate |
| SOC 2 evidence | Audit pass | External validation |
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |

---

## Enterprise Simulation UAT

See: `uat/enterprise-simulation-uat.md`