stemedb/applications/aphoria/roadmap.md
jml e95c978481 feat(aphoria): add inline claim markers and claim enrichment infrastructure
This commit implements Phase 17 of the Aphoria roadmap, adding:

**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)

**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)

**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns

**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion

**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries

**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion

Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 20:18:20 +00:00

Aphoria Roadmap

Completed phases archived in roadmap-archive.md


Status Overview

| Phase | Deliverable | Status |
|---|---|---|
| 0–9, 11–13, 16–17 | Core CLI, Extractors (42), LLM, Learning, Enterprise, Lifecycle, Pattern Enrichment | Archived |
| 10 | UX & Enterprise Polish | 🔄 Partial (10.1 archived; 10.2–10.3 remaining) |
| 14 | Governance Workflows | 🎯 Current |
| 15 | Evidence Source Integration | Future |
| A6 | AST-Aware Observation & Claim Verification | Future |

Current State

  • 42 built-in extractors + declarative custom extractors
  • Full corpus: RFC, OWASP, Vendor sources
  • Ephemeral mode (~0.25s), persistent mode with drift detection
  • Observation/claim distinction (A1–A5 complete, see main roadmap.md)
  • aphoria verify run|map for claim verification
  • 10 claims dogfooded in .aphoria/claims.toml
  • Self-improving: LLM extraction → pattern learning → autonomous promotion → shadow testing → auto-rollback

Phase 10: UX & Enterprise Polish (Partial)

10.1 Acknowledgment Expiry — archived

10.2 Human-Readable Signer Names

Impact: MEDIUM | Effort: MEDIUM | Priority: P2

Map issuer hex IDs to human-readable team names in output.

Task Status
Add signer_name: Option<String> to PackHeader
Add contact: Option<String> to PackHeader (Slack channel, email)
Update policy export/import to preserve new fields
Show "Signed by Platform Security Team" instead of hex in output
Backward-compat: gracefully handle packs without new fields
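
The fallback behaviour described in these tasks could be sketched as follows. This is a minimal illustration, not the actual `PackHeader`: only `signer_name` and `contact` come from the task list, while the `issuer_id` field name and the `signer_label` helper are hypothetical.

```rust
/// Hypothetical, trimmed-down pack header. Only `signer_name` and `contact`
/// are named in the roadmap; `issuer_id` is an assumed field name.
#[derive(Debug, Clone, Default)]
pub struct PackHeader {
    pub issuer_id: String,
    pub signer_name: Option<String>,
    pub contact: Option<String>,
}

impl PackHeader {
    /// Builds the "Signed by ..." label, falling back to the hex issuer ID
    /// when the pack predates the new optional fields.
    pub fn signer_label(&self) -> String {
        let who = self
            .signer_name
            .as_deref()
            .unwrap_or(self.issuer_id.as_str());
        match self.contact.as_deref() {
            Some(contact) => format!("Signed by {} ({})", who, contact),
            None => format!("Signed by {}", who),
        }
    }
}
```

Because both new fields are `Option<String>`, a pack written before the change deserializes to `None` for each and still renders with its hex ID, which is one way to satisfy the backward-compatibility task.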

10.3 Speed Benchmarks

Impact: LOW | Effort: LOW | Priority: P3

Task Status
Create benchmarks/ directory with test corpora
Add aphoria scan --benchmark flag for self-test
Document test conditions in benchmark results

Phase 14: Governance Workflows 🎯

Vision: Clear approval paths for pattern promotion with audit trails.

14.1 Approval Workflow Definition

Task Status
Create src/governance/mod.rs module
Define ApprovalWorkflow struct
Define ApprovalStage with required approvers
Support evidence-based auto-approve thresholds
Config: define workflows in .aphoria.toml

14.2 Approval State Machine

Task Status
Implement state transitions (pending → approved/rejected)
Multi-stage approval support
Timeout and escalation policies
Store approval history with timestamps
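
A minimal sketch of the state machine, assuming a fixed number of sequential stages and one decision per stage. All type and method names here (`Approval`, `Decision`, `HistoryEntry`, `decide`) are hypothetical, and timeout and escalation handling are left out.

```rust
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApprovalState {
    Pending,
    Approved,
    Rejected,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Approve,
    Reject,
}

/// One recorded decision, kept for the approval history.
#[derive(Debug, Clone)]
pub struct HistoryEntry {
    pub stage: usize,
    pub approver: String,
    pub decision: Decision,
    pub at: SystemTime,
}

#[derive(Debug)]
pub struct Approval {
    pub state: ApprovalState,
    /// Index of the stage currently awaiting a decision.
    pub stage: usize,
    pub total_stages: usize,
    pub history: Vec<HistoryEntry>,
}

impl Approval {
    pub fn new(total_stages: usize) -> Self {
        Approval {
            state: ApprovalState::Pending,
            stage: 0,
            total_stages,
            history: Vec::new(),
        }
    }

    /// Applies one decision. A rejection is terminal; an approval advances
    /// to the next stage and only the final stage yields `Approved`.
    pub fn decide(&mut self, approver: &str, decision: Decision) -> Result<ApprovalState, String> {
        if self.state != ApprovalState::Pending {
            return Err(format!("approval already {:?}", self.state));
        }
        self.history.push(HistoryEntry {
            stage: self.stage,
            approver: approver.to_string(),
            decision,
            at: SystemTime::now(),
        });
        match decision {
            Decision::Reject => self.state = ApprovalState::Rejected,
            Decision::Approve => {
                self.stage += 1;
                if self.stage >= self.total_stages {
                    self.state = ApprovalState::Approved;
                }
            }
        }
        Ok(self.state)
    }
}
```

Returning an error for decisions on a terminal state keeps the history append-only, which also suits the audit-trail tasks in 14.4.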

14.3 Approval CLI

Task Status
aphoria governance pending — list pending approvals
aphoria governance approve <id> --comment "..."
aphoria governance reject <id> --reason "..."
aphoria governance escalate <id>
Show approval status in pattern list

14.4 SOC 2 Audit Trail

Task Status
Full audit log for all governance actions
aphoria audit trail --pattern <id> — show timeline
Export governance history for auditors
Include approver identity and timestamp

Phase 15: Evidence Source Integration

Vision: ADRs, specs, and standards automatically link to patterns.

15.1 ADR Auto-Detection

Task Status
Create src/evidence/adr.rs
Detect ADR-XXX patterns in commit messages
Scan for ADR files in standard locations
Parse ADR content for related patterns
Link ADR to patterns automatically
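
Detecting ADR references in a commit message can be sketched with the standard library alone. The function name `find_adr_refs` is hypothetical, and the sketch assumes references take the form `ADR-` followed by one or more ASCII digits.

```rust
/// Returns every `ADR-<digits>` reference in `message`, in order of first
/// appearance and without duplicates. A match must not be preceded by an
/// ASCII letter or digit, so "QUADR-9" is ignored.
pub fn find_adr_refs(message: &str) -> Vec<String> {
    let bytes = message.as_bytes();
    let mut refs: Vec<String> = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let at_boundary = i == 0 || !bytes[i - 1].is_ascii_alphanumeric();
        if at_boundary && bytes[i..].starts_with(b"ADR-") {
            let start = i + 4;
            let mut end = start;
            while end < bytes.len() && bytes[end].is_ascii_digit() {
                end += 1;
            }
            if end > start {
                // Only ASCII bytes were consumed, so the slice bounds are
                // valid character boundaries.
                let id = format!("ADR-{}", &message[start..end]);
                if !refs.contains(&id) {
                    refs.push(id);
                }
                i = end;
                continue;
            }
        }
        i += 1;
    }
    refs
}
```

For example, `find_adr_refs("enforce TLS verify per ADR-012, see ADR-7")` yields the two identifiers `ADR-012` and `ADR-7`, which could then be used as keys when linking ADR files to patterns.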

15.2 Spec File Detection

Task Status
Create src/evidence/spec.rs
Detect spec files (specs/*.md, *.spec.md)
Parse requirement IDs (REQ-XXX)
Link requirements to patterns
Show requirement coverage in reports

15.3 Standard Reference Extraction

Task Status
Parse RFC references (RFC 7519)
Parse OWASP references (OWASP A03:2021)
Parse NIST references (NIST SP 800-53)
Auto-link to authoritative corpus
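
The three reference formats listed above could be parsed into a typed value roughly as follows. This is a sketch under stated assumptions: the `StandardRef` enum and `parse_standard_ref` function are hypothetical, the input is a single, already isolated reference string, and only the exact shapes shown in the examples (`RFC 7519`, `OWASP A03:2021`, `NIST SP 800-53`) are accepted.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StandardRef {
    Rfc(u32),
    Owasp { category: String, year: u16 },
    NistSp(String),
}

/// Parses one isolated reference string; returns `None` for anything that
/// does not match one of the three supported shapes.
pub fn parse_standard_ref(text: &str) -> Option<StandardRef> {
    let text = text.trim();
    if let Some(rest) = text.strip_prefix("RFC ") {
        return rest.trim().parse::<u32>().ok().map(StandardRef::Rfc);
    }
    if let Some(rest) = text.strip_prefix("OWASP ") {
        let (category, year) = rest.trim().split_once(':')?;
        // Expect a category like "A03": the letter A plus two digits.
        let well_formed = category.len() == 3
            && category.starts_with('A')
            && category[1..].chars().all(|c| c.is_ascii_digit());
        if !well_formed {
            return None;
        }
        let year = year.trim().parse::<u16>().ok()?;
        return Some(StandardRef::Owasp {
            category: category.to_string(),
            year,
        });
    }
    if let Some(rest) = text.strip_prefix("NIST SP ") {
        let id = rest.trim();
        let valid = !id.is_empty()
            && id.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
        return if valid {
            Some(StandardRef::NistSp(id.to_string()))
        } else {
            None
        };
    }
    None
}
```

A typed result like this gives the auto-linking step a stable key to look up in the authoritative corpus, rather than matching on free text.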

15.4 Evidence Display

Task Status
Show full evidence chain in pattern output
aphoria patterns --by-evidence grouping

Phase A6: AST-Aware Observation & Claim Verification

Evolved from the "Scout & Judge" proposal (2026-02-05). The original proposal focused on reducing LLM cost via AST snippet extraction. It is reframed here through the observations/claims distinction: the Scout produces structurally richer observations than regex can, and the Judge verifies authored claims against code rather than classifying security issues.

Why This Matters

The 42 regex extractors work well for direct pattern matching (~0.25s). But they can't follow indirection:

# Regex sees `requests.get(url, verify=should_verify)` — no match
# AST sees `should_verify = False` in scope — match
should_verify = False
requests.get(url, verify=should_verify)

And they can't verify authored claims. When a claim says "Wallet MUST NOT derive Clone", regex can find #[derive( but can't determine scope or negation semantics. An AST-aware scout + LLM judge can.

A6.1 Tree-sitter Infrastructure

Task Status
Add tree-sitter + language grammars to Cargo.toml
Create src/scout/mod.rs module
src/scout/engine.rs — parse files, run SCM queries
CandidateSnippet type with structural context
src/scout/queries/.scm query files per category/language
Language support: Python, Go, Rust, JavaScript/TypeScript
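
    use std::collections::HashMap;

    // `Language` is assumed to be an enum defined elsewhere in the scout module.
    pub struct CandidateSnippet {
        pub file_path: String,
        pub language: Language,
        pub start_line: usize,
        pub end_line: usize,
        pub code: String,
        pub context_variables: HashMap<String, String>,
        pub query_id: String,
    }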

A6.2 Scout as Observation Producer

AST-aware ROI detection for patterns regex can't follow.

Task Status
Variable indirection tracking (assign → use across lines)
Context expansion: function scope, variable defs, comments
Deduplication with existing regex extractors
SCM queries for TLS, secrets, auth, crypto categories
Integration: run scout after regex, drop overlaps, combine

Key design: Scout runs alongside (not instead of) regex extractors. Regex handles 90% of cases at zero LLM cost; scout handles the indirection cases regex misses.
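
The "run scout after regex, drop overlaps, combine" step might look like the sketch below. The `Observation` type and `combine` function are hypothetical, and "overlap" is assumed to mean the same file with intersecting line ranges; a real implementation would probably also compare the finding category.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Observation {
    pub file_path: String,
    pub start_line: usize,
    pub end_line: usize,
    /// Which producer emitted this observation: "regex" or "scout".
    pub source: &'static str,
}

impl Observation {
    /// True when both observations are in the same file and their
    /// inclusive line ranges intersect.
    fn overlaps(&self, other: &Observation) -> bool {
        self.file_path == other.file_path
            && self.start_line <= other.end_line
            && other.start_line <= self.end_line
    }
}

/// Keeps every regex observation and appends only those scout observations
/// that do not overlap a regex observation.
pub fn combine(regex: Vec<Observation>, scout: Vec<Observation>) -> Vec<Observation> {
    let mut combined = regex;
    let regex_count = combined.len();
    for candidate in scout {
        let duplicate = combined[..regex_count]
            .iter()
            .any(|r| r.overlaps(&candidate));
        if !duplicate {
            combined.push(candidate);
        }
    }
    combined
}
```

Preferring the regex result on overlap matches the stated design: the cheap extractor stays authoritative for what it already finds, and scout output only fills the gaps.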

A6.3 Judge as Claim Verifier

LLM receives focused snippet + authored claim → structured verdict.

Task Status
Refactor LlmExtractor to accept CandidateSnippet + AuthoredClaim
Verification prompt: "Does this code satisfy this claim?"
Structured output: `{ verdict: PASS | FAIL | … }`
Wire into aphoria verify Direction 2 (walk claims, verify in code)
Maps to Extractor::verify() from vision-gaps

Token efficiency: a focused snippet (~100 tokens) replaces a whole-file prompt (~2000 tokens), cutting input tokens per verification by about 95% (1 − 100/2000).

A6.4 Scout for Claim Suggestion

Scout identifies ROIs without matching authored claims, feeds context to aphoria-suggest.

Task Status
Identify ROIs with no matching claim in .aphoria/claims.toml
Enrich context for skill: snippet + function name + surrounding comments
Feed to aphoria-suggest skill for claim drafting

A6.5 Evaluation

Task Status
Scout recall: "Did scout find the vulnerable line in fixture?"
Judge precision: "Given snippet + claim, did LLM classify correctly?"
Cost metric: tokens_per_verification vs monolithic approach
Parallel run: shadow mode alongside regex for tuning
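
The evaluation metrics could be computed along these lines. This is a sketch with hypothetical names (`ScoutFixture`, `scout_recall`, `judge_match_rate`, `token_savings`). The roadmap's "did LLM classify correctly?" is interpreted here as the fraction of verdicts matching the expected label; a stricter precision definition would count only positive verdicts.

```rust
/// One evaluation fixture: the line known to be vulnerable and the
/// inclusive line ranges the scout flagged in that file.
pub struct ScoutFixture {
    pub vulnerable_line: usize,
    pub flagged_ranges: Vec<(usize, usize)>,
}

/// Fraction of fixtures where some flagged range covers the vulnerable line.
pub fn scout_recall(fixtures: &[ScoutFixture]) -> Option<f64> {
    if fixtures.is_empty() {
        return None;
    }
    let hits = fixtures
        .iter()
        .filter(|f| {
            f.flagged_ranges
                .iter()
                .any(|&(start, end)| start <= f.vulnerable_line && f.vulnerable_line <= end)
        })
        .count();
    Some(hits as f64 / fixtures.len() as f64)
}

/// Fraction of cases where the judge's verdict equals the expected one.
/// Each case is `(expected_pass, judged_pass)`.
pub fn judge_match_rate(cases: &[(bool, bool)]) -> Option<f64> {
    if cases.is_empty() {
        return None;
    }
    let correct = cases.iter().filter(|&&(expected, judged)| expected == judged).count();
    Some(correct as f64 / cases.len() as f64)
}

/// Relative token saving of the snippet approach over the monolithic one.
pub fn token_savings(snippet_tokens: f64, monolithic_tokens: f64) -> Option<f64> {
    if monolithic_tokens <= 0.0 {
        return None;
    }
    Some(1.0 - snippet_tokens / monolithic_tokens)
}
```

With the figures quoted in A6.3, `token_savings(100.0, 2000.0)` comes out at roughly 0.95. Returning `Option` avoids reporting a misleading 0 or NaN when an evaluation run has no fixtures.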

Phase A6 Priority

Lower priority than A5 flywheel completion and Phase 14 governance. Build when:

  1. Regex extractors hit limits on specific indirection patterns
  2. aphoria verify Direction 2 needs LLM-backed verification
  3. aphoria-suggest needs richer context than regex observations provide

Enterprise Pilot Success Metrics

90-Day Pilot Targets

| Metric | Target | Measurement |
|---|---|---|
| Patterns captured | 100+ observations | Count in knowledge graph |
| Patterns promoted | 10+ conventions | Count with status=Active |
| Cross-team adoption | 2+ teams connected | Unique team_ids |
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
| False positive rate | <10% | FP feedback / total flags |
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |

180-Day Production Targets

| Metric | Target | Measurement |
|---|---|---|
| Knowledge retention | 0 lost patterns on departures | Audit log |
| Onboarding velocity | 50% faster ramp | Time to first PR |
| Convention adoption | 80% across org | Compliance rate |
| SOC 2 evidence | Audit pass | External validation |
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |

Enterprise Simulation UAT

See: uat/enterprise-simulation-uat.md