# Aphoria Roadmap
> Completed phases archived in [`roadmap-archive.md`](./roadmap-archive.md)
---
## Status Overview
| Phase | Deliverable | Status |
|-------|-------------|--------|
| 0–9, 11–13, 16–17 | Core CLI, Extractors (42), LLM, Learning, Enterprise, Lifecycle, Pattern Enrichment | ✅ Archived |
| 10 | UX & Enterprise Polish | 🔄 Partial (10.1 ✅, 10.2–10.3 ⬜) |
| 14 | Governance Workflows | 🎯 Current |
| 15 | Evidence Source Integration | ⬜ Future |
| A6 | AST-Aware Observation & Claim Verification | ⬜ Future |
### Current State
- 42 built-in extractors + declarative custom extractors
- Full corpus: RFC, OWASP, Vendor sources
- Ephemeral mode (~0.25s), persistent mode with drift detection
- Observation/claim distinction (A1–A5 complete, see main `roadmap.md`)
- `aphoria verify run|map` for claim verification
- 10 claims dogfooded in `.aphoria/claims.toml`
- Self-improving: LLM extraction → pattern learning → autonomous promotion → shadow testing → auto-rollback
---
## Phase 10: UX & Enterprise Polish (Partial)
> 10.1 Acknowledgment Expiry ✅ — archived
### 10.2 Human-Readable Signer Names ⬜
**Impact:** MEDIUM | **Effort:** MEDIUM | **Priority:** P2
Map issuer hex IDs to human-readable team names in output.
| Task | Status |
|------|--------|
| Add `signer_name: Option<String>` to `PackHeader` | ⬜ |
| Add `contact: Option<String>` to `PackHeader` (Slack channel, email) | ⬜ |
| Update `policy export/import` to preserve new fields | ⬜ |
| Show "Signed by Platform Security Team" instead of hex in output | ⬜ |
| Backward-compat: gracefully handle packs without new fields | ⬜ |
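
The backward-compatible fallback could look roughly like the sketch below. Only `signer_name` and `contact` come from the task list; the `issuer_id` field name and the `signer_label` helper are assumptions made for illustration, not the actual `PackHeader` definition.

```rust
/// Hypothetical subset of `PackHeader`; `issuer_id` is an assumed field name.
pub struct PackHeader {
    pub issuer_id: String,           // hex ID shown in output today
    pub signer_name: Option<String>, // proposed in 10.2
    pub contact: Option<String>,     // proposed in 10.2 (Slack channel, email)
}

/// Builds the "Signed by ..." label. Packs created before the new
/// fields existed carry `None` and fall back to the hex ID.
pub fn signer_label(header: &PackHeader) -> String {
    let who = header
        .signer_name
        .as_deref()
        .unwrap_or(header.issuer_id.as_str());
    match header.contact.as_deref() {
        Some(contact) => format!("Signed by {} ({})", who, contact),
        None => format!("Signed by {}", who),
    }
}
```

Keeping both new fields as `Option` means older packs need no migration: they simply render with the hex ID as before.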
### 10.3 Speed Benchmarks ⬜
**Impact:** LOW | **Effort:** LOW | **Priority:** P3
| Task | Status |
|------|--------|
| Create `benchmarks/` directory with test corpora | ⬜ |
| Add `aphoria scan --benchmark` flag for self-test | ⬜ |
| Document test conditions in benchmark results | ⬜ |
---
## Phase 14: Governance Workflows 🎯
> **Vision:** Clear approval paths for pattern promotion with audit trails.
### 14.1 Approval Workflow Definition ⬜
| Task | Status |
|------|--------|
| Create `src/governance/mod.rs` module | ⬜ |
| Define `ApprovalWorkflow` struct | ⬜ |
| Define `ApprovalStage` with required approvers | ⬜ |
| Support evidence-based auto-approve thresholds | ⬜ |
| Config: define workflows in `.aphoria.toml` | ⬜ |
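
One possible shape for these types is sketched below. All field names are assumptions, and "evidence-based threshold" is read here as a minimum count of linked evidence items, which the roadmap does not specify.

```rust
/// One stage of a workflow; field names are illustrative assumptions.
#[derive(Debug, Clone)]
pub struct ApprovalStage {
    pub name: String,
    pub required_approvers: Vec<String>,
}

/// An ordered list of stages plus an optional evidence threshold
/// at or above which promotion needs no human sign-off.
#[derive(Debug, Clone)]
pub struct ApprovalWorkflow {
    pub stages: Vec<ApprovalStage>,
    pub auto_approve_min_evidence: Option<u32>,
}

impl ApprovalWorkflow {
    /// True when a threshold is configured and the evidence count meets it.
    pub fn auto_approves(&self, evidence_count: u32) -> bool {
        self.auto_approve_min_evidence
            .map_or(false, |min| evidence_count >= min)
    }
}
```

Leaving the threshold as `None` keeps a workflow fully manual, which seems like the safer default for a governance feature.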
### 14.2 Approval State Machine ⬜
| Task | Status |
|------|--------|
| Implement state transitions (pending → approved/rejected) | ⬜ |
| Multi-stage approval support | ⬜ |
| Timeout and escalation policies | ⬜ |
| Store approval history with timestamps | ⬜ |
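
The transitions above could be modeled as a small pure function. This is a sketch under assumptions: the type and variant names are hypothetical, and timeouts, escalation, and history storage are left out.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum ApprovalState {
    /// Waiting on the stage at this index.
    Pending { stage: usize },
    Approved,
    Rejected { reason: String },
}

#[derive(Debug, Clone)]
pub enum Decision {
    Approve,
    Reject(String),
}

/// Applies one decision. Approval advances to the next stage or finishes;
/// rejection ends the workflow; terminal states never change.
pub fn transition(state: ApprovalState, decision: Decision, total_stages: usize) -> ApprovalState {
    match (state, decision) {
        (ApprovalState::Pending { stage }, Decision::Approve) => {
            if stage + 1 < total_stages {
                ApprovalState::Pending { stage: stage + 1 }
            } else {
                ApprovalState::Approved
            }
        }
        (ApprovalState::Pending { .. }, Decision::Reject(reason)) => {
            ApprovalState::Rejected { reason }
        }
        (terminal, _) => terminal,
    }
}
```

Because `transition` takes and returns values without side effects, each call can be logged with a timestamp by the caller to build the approval history.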
### 14.3 Approval CLI ⬜
| Task | Status |
|------|--------|
| `aphoria governance pending` — list pending approvals | ⬜ |
| `aphoria governance approve <id> --comment "..."` | ⬜ |
| `aphoria governance reject <id> --reason "..."` | ⬜ |
| `aphoria governance escalate <id>` | ⬜ |
| Show approval status in pattern list | ⬜ |
### 14.4 SOC 2 Audit Trail ⬜
| Task | Status |
|------|--------|
| Full audit log for all governance actions | ⬜ |
| `aphoria audit trail --pattern <id>` — show timeline | ⬜ |
| Export governance history for auditors | ⬜ |
| Include approver identity and timestamp | ⬜ |
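
A per-pattern timeline, as `aphoria audit trail --pattern <id>` would need, might be derived from an append-only log as in this sketch. `AuditEntry` and its fields are hypothetical names, and timestamps are assumed to be Unix seconds.

```rust
/// One governance action; all field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct AuditEntry {
    pub pattern_id: String,
    pub approver: String,
    pub action: String,
    pub timestamp_secs: u64,
}

/// Returns the entries for one pattern, oldest first.
pub fn trail_for<'a>(log: &'a [AuditEntry], pattern_id: &str) -> Vec<&'a AuditEntry> {
    let mut timeline: Vec<&AuditEntry> = log
        .iter()
        .filter(|entry| entry.pattern_id == pattern_id)
        .collect();
    timeline.sort_by_key(|entry| entry.timestamp_secs);
    timeline
}
```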
---
## Phase 15: Evidence Source Integration ⬜
> **Vision:** ADRs, specs, and standards automatically link to patterns.
### 15.1 ADR Auto-Detection ⬜
| Task | Status |
|------|--------|
| Create `src/evidence/adr.rs` | ⬜ |
| Detect ADR-XXX patterns in commit messages | ⬜ |
| Scan for ADR files in standard locations | ⬜ |
| Parse ADR content for related patterns | ⬜ |
| Link ADR to patterns automatically | ⬜ |
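
Commit-message detection could start as simply as the sketch below. It assumes "ADR-XXX" means the literal prefix `ADR-` followed by one or more digits; `find_adr_refs` is a hypothetical helper name, and the standard library is used instead of a regex crate to keep the example dependency-free.

```rust
/// Returns every `ADR-<digits>` reference in `message`, in order of appearance.
pub fn find_adr_refs(message: &str) -> Vec<String> {
    let mut refs = Vec::new();
    let mut rest = message;
    while let Some(pos) = rest.find("ADR-") {
        // Skip past the 4-byte ASCII prefix and collect the digits that follow.
        let after = &rest[pos + 4..];
        let digits: String = after.chars().take_while(|c| c.is_ascii_digit()).collect();
        if !digits.is_empty() {
            refs.push(format!("ADR-{}", digits));
        }
        // Continue scanning after the digits (or right after a bare "ADR-").
        rest = &after[digits.len()..];
    }
    refs
}
```

The sketch does not check word boundaries, so a token such as `XADR-1` would also match; a real implementation would likely tighten that.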
### 15.2 Spec File Detection ⬜
| Task | Status |
|------|--------|
| Create `src/evidence/spec.rs` | ⬜ |
| Detect spec files (specs/*.md, *.spec.md) | ⬜ |
| Parse requirement IDs (REQ-XXX) | ⬜ |
| Link requirements to patterns | ⬜ |
| Show requirement coverage in reports | ⬜ |
### 15.3 Standard Reference Extraction ⬜
| Task | Status |
|------|--------|
| Parse RFC references (RFC 7519) | ⬜ |
| Parse OWASP references (OWASP A03:2021) | ⬜ |
| Parse NIST references (NIST SP 800-53) | ⬜ |
| Auto-link to authoritative corpus | ⬜ |
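
The three reference formats listed above could be parsed into a typed value along these lines. `StandardRef` and `parse_standard_ref` are hypothetical names, and the sketch handles one already-isolated reference string rather than scanning free text.

```rust
#[derive(Debug, PartialEq)]
pub enum StandardRef {
    Rfc(u32),
    Owasp { category: String, year: u32 },
    Nist(String),
}

/// Parses one reference such as "RFC 7519", "OWASP A03:2021", or "NIST SP 800-53".
pub fn parse_standard_ref(text: &str) -> Option<StandardRef> {
    let text = text.trim();
    if let Some(num) = text.strip_prefix("RFC ") {
        return num.trim().parse::<u32>().ok().map(StandardRef::Rfc);
    }
    if let Some(rest) = text.strip_prefix("OWASP ") {
        // "A03:2021" splits into category "A03" and year 2021.
        let (category, year) = rest.trim().split_once(':')?;
        return Some(StandardRef::Owasp {
            category: category.to_string(),
            year: year.parse::<u32>().ok()?,
        });
    }
    if let Some(id) = text.strip_prefix("NIST SP ") {
        let id = id.trim();
        if !id.is_empty() {
            return Some(StandardRef::Nist(id.to_string()));
        }
    }
    None
}
```

A typed result like this gives the auto-linking step a stable key to look up in the corpus instead of a raw string.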
### 15.4 Evidence Display ⬜
| Task | Status |
|------|--------|
| Show full evidence chain in pattern output | ⬜ |
| `aphoria patterns --by-evidence` grouping | ⬜ |
---
## Phase A6: AST-Aware Observation & Claim Verification ⬜
> Evolved from the "Scout & Judge" proposal (2026-02-05). The original focused on LLM cost reduction via AST snippet extraction. Reframed through the observations/claims distinction: the **Scout** produces structurally richer observations that regex can't, and the **Judge** verifies authored claims against code rather than classifying security issues.
### Why This Matters
The 42 regex extractors work well for direct pattern matching (~0.25s). But they can't follow indirection:
```python
# Regex sees `requests.get(url, verify=should_verify)` — no match
# AST sees `should_verify = False` in scope — match
should_verify = False
requests.get(url, verify=should_verify)
```
And they can't verify authored claims. When a claim says "Wallet MUST NOT derive Clone", regex can find `#[derive(` but can't determine scope or negation semantics. An AST-aware scout + LLM judge can.
### A6.1 Tree-sitter Infrastructure ⬜
| Task | Status |
|------|--------|
| Add `tree-sitter` + language grammars to `Cargo.toml` | ⬜ |
| Create `src/scout/mod.rs` module | ⬜ |
| `src/scout/engine.rs` — parse files, run SCM queries | ⬜ |
| `CandidateSnippet` type with structural context | ⬜ |
| `src/scout/queries/`: `.scm` query files per category/language | ⬜ |
| Language support: Python, Go, Rust, JavaScript/TypeScript | ⬜ |
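```rust
use std::collections::HashMap;

/// A region of code selected by a tree-sitter query, plus its structural context.
pub struct CandidateSnippet {
    pub file_path: String,
    /// `Language` is assumed to be an enum defined elsewhere in the scout module.
    pub language: Language,
    pub start_line: usize,
    pub end_line: usize,
    pub code: String,
    /// In-scope variables that give the snippet context (e.g. `should_verify` → `False`).
    pub context_variables: HashMap<String, String>,
    /// Identifier of the SCM query that produced this snippet.
    pub query_id: String,
}
```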
### A6.2 Scout as Observation Producer ⬜
AST-aware region-of-interest (ROI) detection for patterns regex can't follow.
| Task | Status |
|------|--------|
| Variable indirection tracking (assign → use across lines) | ⬜ |
| Context expansion: function scope, variable defs, comments | ⬜ |
| Deduplication with existing regex extractors | ⬜ |
| SCM queries for TLS, secrets, auth, crypto categories | ⬜ |
| Integration: run scout after regex, drop overlaps, combine | ⬜ |
**Key design:** Scout runs alongside (not instead of) regex extractors. Regex handles 90% at zero cost; scout handles the indirection cases regex misses.
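
The "run scout after regex, drop overlaps, combine" step might look like the sketch below. The `Observation` type and the choice of (file, line, category) as the overlap key are assumptions; the real deduplication rule is not specified in this roadmap.

```rust
use std::collections::HashSet;

/// Minimal stand-in for an observation; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct Observation {
    pub file_path: String,
    pub line: usize,
    pub category: String,
    pub source: &'static str, // "regex" or "scout"
}

/// Keeps every regex observation and adds only the scout observations
/// that regex did not already report for the same file, line, and category.
pub fn combine(regex: Vec<Observation>, scout: Vec<Observation>) -> Vec<Observation> {
    let seen: HashSet<(String, usize, String)> = regex
        .iter()
        .map(|o| (o.file_path.clone(), o.line, o.category.clone()))
        .collect();
    let mut combined = regex;
    combined.extend(scout.into_iter().filter(|o| {
        !seen.contains(&(o.file_path.clone(), o.line, o.category.clone()))
    }));
    combined
}
```

Regex results win on overlap here, so existing output stays stable when the scout is switched on.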
### A6.3 Judge as Claim Verifier ⬜
LLM receives focused snippet + authored claim → structured verdict.
| Task | Status |
|------|--------|
| Refactor `LlmExtractor` to accept `CandidateSnippet` + `AuthoredClaim` | ⬜ |
| Verification prompt: "Does this code satisfy this claim?" | ⬜ |
| Structured output: `{ verdict: PASS\|FAIL\|UNCERTAIN, evidence: "..." }` | ⬜ |
| Wire into `aphoria verify` Direction 2 (walk claims, verify in code) | ⬜ |
| Maps to `Extractor::verify()` from vision-gaps | ⬜ |
**Token efficiency:** Snippet (~100 tokens) vs whole file (~2000 tokens) = 95% cost reduction per verification.
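
The 95% figure follows from 1 − 100/2000 = 0.95. A small helper, sketched here with a hypothetical name, makes the same calculation reusable for measured token counts:

```rust
/// Fraction of code tokens saved by sending a snippet instead of the whole file.
/// Returns 0.0 when the file token count is zero to avoid dividing by zero.
pub fn token_reduction(snippet_tokens: u32, file_tokens: u32) -> f64 {
    if file_tokens == 0 {
        return 0.0;
    }
    1.0 - f64::from(snippet_tokens) / f64::from(file_tokens)
}
```

This counts only the code portion of the prompt. The claim text and instructions add a fixed overhead to both approaches, so the end-to-end saving per verification would be somewhat lower.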
### A6.4 Scout for Claim Suggestion ⬜
Scout identifies ROIs without matching authored claims, feeds context to `aphoria-suggest`.
| Task | Status |
|------|--------|
| Identify ROIs with no matching claim in `.aphoria/claims.toml` | ⬜ |
| Enrich context for skill: snippet + function name + surrounding comments | ⬜ |
| Feed to `aphoria-suggest` skill for claim drafting | ⬜ |
### A6.5 Evaluation ⬜
| Task | Status |
|------|--------|
| Scout recall: "Did scout find the vulnerable line in fixture?" | ⬜ |
| Judge precision: "Given snippet + claim, did LLM classify correctly?" | ⬜ |
| Cost metric: `tokens_per_verification` vs monolithic approach | ⬜ |
| Parallel run: shadow mode alongside regex for tuning | ⬜ |
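
The first two metrics could be computed from fixture labels as in this sketch. The function names are hypothetical, and the judge metric is read here as the share of verdicts that agree with the fixture label, since the roadmap does not give a formula.

```rust
/// Share of known-vulnerable fixture lines that the scout flagged.
/// Returns `None` when the fixture lists no expected lines.
pub fn scout_recall(expected_lines: &[usize], flagged_lines: &[usize]) -> Option<f64> {
    if expected_lines.is_empty() {
        return None;
    }
    let hits = expected_lines
        .iter()
        .filter(|&&line| flagged_lines.contains(&line))
        .count();
    Some(hits as f64 / expected_lines.len() as f64)
}

/// Share of cases where the judge's pass/fail verdict matches the fixture label.
/// Each case is `(expected_pass, judged_pass)`.
pub fn judge_agreement(cases: &[(bool, bool)]) -> Option<f64> {
    if cases.is_empty() {
        return None;
    }
    let correct = cases.iter().filter(|case| case.0 == case.1).count();
    Some(correct as f64 / cases.len() as f64)
}
```

`UNCERTAIN` verdicts are not represented in this two-valued form; an evaluation harness would need to decide whether to count them as misses or report them separately.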
### Phase A6 Priority
Lower priority than A5 flywheel completion and Phase 14 governance. Build when:
1. Regex extractors hit limits on specific indirection patterns
2. `aphoria verify` Direction 2 needs LLM-backed verification
3. `aphoria-suggest` needs richer context than regex observations provide
---
## Enterprise Pilot Success Metrics
### 90-Day Pilot Targets
| Metric | Target | Measurement |
|--------|--------|-------------|
| Patterns captured | 100+ observations | Count in knowledge graph |
| Patterns promoted | 10+ conventions | Count with status=Active |
| Cross-team adoption | 2+ teams connected | Unique team_ids |
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
| False positive rate | <10% | FP feedback / total flags |
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |
### 180-Day Production Targets
| Metric | Target | Measurement |
|--------|--------|-------------|
| Knowledge retention | 0 lost patterns on departures | Audit log |
| Onboarding velocity | 50% faster ramp | Time to first PR |
| Convention adoption | 80% across org | Compliance rate |
| SOC 2 evidence | Audit pass | External validation |
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |
---
## Enterprise Simulation UAT
See: `uat/enterprise-simulation-uat.md`