# Aphoria Roadmap

> Completed phases archived in [`roadmap-archive.md`](./roadmap-archive.md)

---

## Status Overview

| Phase | Deliverable | Status |
|-------|-------------|--------|
| 0–9, 11–13, 16–17 | Core CLI, Extractors (42), LLM, Learning, Enterprise, Lifecycle, Pattern Enrichment | ✅ Archived |
| CC | Corpus Infrastructure (Community Corpus, Wiki Import, Pattern Aggregation, **Async Default**) | ✅ Complete |
| 10 | UX & Enterprise Polish | 🔄 Partial (10.1 ✅, 10.2–10.3 ⬜) |
| 14 | Governance Workflows | 🎯 Current |
| **DF-1** | **Dogfood: Database Connection Pool** | 🎯 **ACTIVE** |
| 15 | Evidence Source Integration | ⬜ Future |
| A6 | AST-Aware Observation & Claim Verification | ⬜ Future |

### Current State

- 42 built-in extractors + declarative custom extractors
- **Emergent corpus**: RFC, OWASP, Vendor sources + **community-driven patterns (CC.6 ✅)**
- **Community corpus enabled by default** (CC.7 ✅): `use_community: true`, proper async, no runtime hacks
- **Pattern aggregation active**: Observations auto-feed pattern aggregates after each scan
- **No hardcoded assertions**: Bootstrap via wiki import or Trust Packs
- Ephemeral mode (~0.25s), persistent mode with drift detection
- Observation/claim distinction (A1–A5 complete)
- `aphoria verify run|map` for claim verification
- 10 claims dogfooded in `.aphoria/claims.toml`
- Self-improving: LLM extraction → pattern learning → autonomous promotion → shadow testing → auto-rollback

### Recently Completed: Corpus Infrastructure (Phase CC ✅)

**Phase CC.1-CC.3: Removed hardcoded corpus, built emergent system** (Feb 6-7)

- Deleted `hardcoded.rs` (369 lines, 19 assertions)
- Pattern aggregates stored in StemeDB: `community://pattern/{BLAKE3(SPV)}`
- Multi-tier promotion: 95%+ (Regulatory), 80%+ (Clinical), 50%+ (Emerging, review required)
- Wiki import: `aphoria corpus import wiki ~/docs` parses MUST/SHOULD patterns

**Phase CC.6: Pattern Aggregation (Emergent Learning)** (Feb 8) ✅

- Observations now automatically feed back into pattern aggregates
- Every scan with `--persist --sync` contributes to community learning
- Config: `aggregation_enabled: true` (default)
- Tracks project_count and observation_count per pattern
- Privacy-preserving: wildcarded subjects, project deduplication

**Phase CC.7: Make Community Corpus Default** (Feb 8) ✅

- Created `AsyncCorpusBuilder` trait for async-native corpus builders
- Refactored `CommunityCorpusBuilder` to implement `AsyncCorpusBuilder`
- **Removed `rt.block_on()` hack** that caused "runtime within runtime" errors
- Made the entire corpus-building chain properly async (16 functions updated)
- Enabled `use_community: true` by default in `CorpusConfig`
- All 1189 tests pass, no clippy warnings, no runtime errors

**Philosophy:** The corpus isn't written by experts. It's discovered by the community and validated by authorities.

---

## Phase 10: UX & Enterprise Polish (Partial)

> 10.1 Acknowledgment Expiry ✅ — archived

### 10.2 Human-Readable Signer Names ⬜

**Impact:** MEDIUM | **Effort:** MEDIUM | **Priority:** P2

Map issuer hex IDs to human-readable team names in output.

| Task | Status |
|------|--------|
| Add `signer_name: Option<String>` to `PackHeader` | ⬜ |
| Add `contact: Option<String>` to `PackHeader` (Slack channel, email) | ⬜ |
| Update `policy export/import` to preserve new fields | ⬜ |
| Show "Signed by Platform Security Team" instead of hex in output | ⬜ |
| Backward-compat: gracefully handle packs without new fields | ⬜ |

### 10.3 Speed Benchmarks ⬜

**Impact:** LOW | **Effort:** LOW | **Priority:** P3

| Task | Status |
|------|--------|
| Create `benchmarks/` directory with test corpora | ⬜ |
| Add `aphoria scan --benchmark` flag for self-test | ⬜ |
| Document test conditions in benchmark results | ⬜ |

---

## Phase CC: Corpus Infrastructure (Community Corpus) ✅

> **Completed:** 2026-02-08 | Removed hardcoded corpus, built emergent community-driven system

### Philosophy

The corpus isn't written by experts.
It's discovered by the community and validated by authorities. 95% adoption = "This is what the community does" = Authoritative.

### CC.1 Delete Hardcoded Corpus ✅

| Task | Status |
|------|--------|
| Remove `applications/aphoria/src/corpus/hardcoded.rs` (369 lines) | ✅ |
| Remove `include_hardcoded` from `CorpusConfig` | ✅ |
| Remove from `CorpusRegistry::with_defaults()` | ✅ |
| Update tests to use community corpus | ✅ |
| Fix 5 pre-existing clippy errors in stemedb-api | ✅ |

**Implemented:** Destructive pre-release approach: no deprecation warnings, just deleted.

### CC.2 Community Corpus Builder ✅

| Task | Status |
|------|--------|
| Create `applications/aphoria/src/corpus/community.rs` (393 lines) | ✅ |
| Create `applications/aphoria/src/corpus/thresholds.rs` (230 lines) | ✅ |
| Create `applications/aphoria/src/corpus/resolver.rs` (220 lines) | ✅ |
| Create `applications/aphoria/src/community/pattern_store.rs` (332 lines) | ✅ |
| Implement `PatternAggregateStore` trait with StemeDB backend | ✅ |
| Multi-tier promotion: 95% (Regulatory), 80% (Clinical), 50% (Emerging) | ✅ |
| Content-addressed storage: `community://pattern/{BLAKE3(SPV)}` | ✅ |
| Config integration: `use_community` flag (opt-in) | ✅ |
| Full scan flow integration | ✅ |

**Storage Architecture:**

- Pattern aggregates stored as StemeDB assertions (no TOML files)
- Predicate: `pattern_aggregate` with JSON metadata
- Deduplication via content-addressed subjects
- Privacy-preserving: wildcarded subjects, k-anonymity

### CC.3 Wiki Import Bootstrap ✅

| Task | Status |
|------|--------|
| Create `applications/aphoria/src/corpus/wiki_importer.rs` (332 lines) | ✅ |
| Regex extraction of MUST/SHOULD patterns from markdown | ✅ |
| Authority source parsing (RFC, OWASP, CWE references) | ✅ |
| Smart subject normalization (TLS → tls/cert_verification) | ✅ |
| CLI command: `aphoria corpus import wiki <path>` | ✅ |
| PatternAggregator write path (stores to StemeDB) | ✅ |
| Integration tests with fixtures | ✅ (6 tests) |
| Documentation: `docs/bootstrap-corpus.md` | ✅ |

**Usage:**

```bash
# Create wiki with best practices
mkdir -p .aphoria/wiki
echo "TLS cert verification MUST be enabled. Authority: RFC 5246" > .aphoria/wiki/tls.md

# Import patterns
aphoria corpus import wiki .aphoria/wiki
# → Patterns now in StemeDB, available for conflict detection
```

### CC.4 Trust Pack Bootstrap ⬜

| Task | Status |
|------|--------|
| Extend Trust Packs to include pattern aggregates | ⬜ Future |
| `aphoria trust-pack install <pack>` writes patterns to StemeDB | ⬜ Future |
| Create `rfc-owasp-baseline.toml` with ~20 common patterns | ⬜ Future |

**Status:** Infrastructure exists, implementation deferred. Wiki import covers bootstrap needs.

### CC.5 Skill-Driven Cold Start ⬜

| Task | Status |
|------|--------|
| Enhance `aphoria-suggest` skill with bootstrap mode | ⬜ Future |
| Detect empty corpus during scan | ⬜ Future |
| Analyze project structure (Cargo.toml, package.json) | ⬜ Future |
| Suggest 3-5 baseline patterns based on detected stack | ⬜ Future |

**Status:** Skill exists, bootstrap mode not implemented. Manual wiki creation works well.
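A minimal sketch of the CC.5 detection steps, assuming bootstrap mode only needs to know which stacks are present and leaves the actual pattern drafting to the `aphoria-suggest` skill. `cold_start_context` and `stack_for_manifest` are hypothetical names, and the manifest table covers only the two files named in the task list.

```rust
use std::collections::BTreeSet;

/// Hypothetical manifest -> stack table. Cargo.toml and package.json are the
/// two manifests the CC.5 task list names; anything else is ignored here.
fn stack_for_manifest(file_name: &str) -> Option<&'static str> {
    match file_name {
        "Cargo.toml" => Some("rust"),
        "package.json" => Some("javascript"),
        _ => None,
    }
}

/// Returns the detected stacks to hand to the skill, or `None` when the
/// corpus already holds patterns (bootstrap mode is only for a cold start).
fn cold_start_context(corpus_pattern_count: usize, paths: &[&str]) -> Option<BTreeSet<&'static str>> {
    if corpus_pattern_count > 0 {
        return None;
    }
    let stacks: BTreeSet<&'static str> = paths
        .iter()
        // Match on the last path component, so "web/package.json" counts too.
        .filter_map(|p| stack_for_manifest(p.rsplit('/').next().unwrap_or(*p)))
        .collect();
    Some(stacks)
}

fn main() {
    let paths = ["Cargo.toml", "web/package.json", "README.md"];
    // Empty corpus: stacks are detected and returned for the skill.
    println!("{:?}", cold_start_context(0, &paths));
    // Populated corpus: bootstrap mode stays off.
    println!("{:?}", cold_start_context(12, &paths));
}
```

Returning `None` for a non-empty corpus keeps bootstrap mode from firing on every scan; the 3-5 suggested patterns themselves would still come from the skill.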
### CC.6 Pattern Aggregation (Emergent Learning) ✅

> **Completed:** 2026-02-08 | Observations now feed back into pattern aggregates automatically

| Task | Status |
|------|--------|
| Add `aggregation_enabled` config field (default: `true`) | ✅ |
| Implement `aggregate_observations_to_patterns()` in scanner | ✅ |
| Add `StemeDBPatternStore::get_pattern_by_spv()` for lookup | ✅ |
| Add `StemeDBPatternStore::update_pattern()` for updates | ✅ |
| Add `compute_project_hash()` for deduplication | ✅ |
| Hook into scan flow after observation recording | ✅ |
| Group observations by (subject, predicate, value) | ✅ |
| Wildcard project paths for anonymization | ✅ |
| Create or update PatternAggregate records | ✅ |
| Track project_count and observation_count | ✅ |

**Implementation:**

```rust
// scanner.rs:344-357
if config.corpus.aggregation_enabled && should_persist_locally {
    let project_hash = compute_project_hash(project_root);
    aggregate_observations_to_patterns(&novel_claims, &episteme, &project_hash).await?;
}
```

**Flow:**

1. Scan extracts observations → recorded as Tier 4 assertions
2. Observations aggregated by (wildcarded_subject, predicate, value)
3. For each unique pattern:
   - If it exists: increment observation_count; if the scan comes from a project not yet counted, increment project_count
   - If it is new: create a PatternAggregate with initial counts
4. Stored as assertions with predicate `"pattern_aggregate"`

**Result:** The corpus is now **emergent**. Every scan with `--persist --sync` feeds the learning loop.
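The flow above can be sketched in plain Rust with in-memory maps standing in for StemeDB. Everything here is hypothetical: `aggregate_scan`, the `*` wildcard rule, and the `seen_projects` set used for deduplication. The `tier` helper additionally assumes that "adoption" means `project_count` divided by the number of participating projects, mapped onto the 95% / 80% / 50% thresholds from CC.2.

```rust
use std::collections::{HashMap, HashSet};

/// A scan observation, reduced to the triple the flow groups on.
struct Observation {
    subject: String,
    predicate: String,
    value: String,
}

/// Running counts for one (wildcarded_subject, predicate, value) pattern.
#[derive(Debug, Default)]
struct PatternAggregate {
    observation_count: u64,
    project_count: u64,
    /// Project hashes already counted, so re-scanning a project never bumps project_count.
    seen_projects: HashSet<String>,
}

type PatternKey = (String, String, String);

/// Hypothetical anonymization rule: replace the project root with `*`.
fn wildcard_subject(subject: &str, project_root: &str) -> String {
    match subject.strip_prefix(project_root) {
        Some(rest) => format!("*{rest}"),
        None => subject.to_string(),
    }
}

/// Steps 2-3 of the flow: group one scan's observations, then create or update aggregates.
fn aggregate_scan(
    store: &mut HashMap<PatternKey, PatternAggregate>,
    observations: &[Observation],
    project_root: &str,
    project_hash: &str,
) {
    // Step 2: group by the wildcarded triple, counting occurrences in this scan.
    let mut groups: HashMap<PatternKey, u64> = HashMap::new();
    for obs in observations {
        let key = (
            wildcard_subject(&obs.subject, project_root),
            obs.predicate.clone(),
            obs.value.clone(),
        );
        *groups.entry(key).or_insert(0) += 1;
    }
    // Step 3: `or_default` covers the "new" branch; the increments cover "exists".
    for (key, count) in groups {
        let agg = store.entry(key).or_default();
        agg.observation_count += count;
        // `insert` returns true only the first time this project hash is seen.
        if agg.seen_projects.insert(project_hash.to_string()) {
            agg.project_count += 1;
        }
    }
}

#[derive(Debug, PartialEq)]
enum Tier {
    Regulatory,
    Clinical,
    Emerging,
    Unpromoted,
}

/// Integer math (count * 100 >= total * pct) avoids float rounding at the boundaries.
fn tier(project_count: u64, total_projects: u64) -> Tier {
    let meets = |pct: u64| project_count * 100 >= total_projects * pct;
    if total_projects == 0 {
        Tier::Unpromoted
    } else if meets(95) {
        Tier::Regulatory
    } else if meets(80) {
        Tier::Clinical
    } else if meets(50) {
        Tier::Emerging
    } else {
        Tier::Unpromoted
    }
}

fn main() {
    let obs = |s: &str| Observation {
        subject: s.to_string(),
        predicate: "tls_verify".to_string(),
        value: "true".to_string(),
    };
    let mut store = HashMap::new();
    aggregate_scan(&mut store, &[obs("/a/src/http.rs")], "/a", "hash-a");
    aggregate_scan(&mut store, &[obs("/b/src/http.rs")], "/b", "hash-b");
    for (key, agg) in &store {
        println!("{key:?}: {agg:?} -> {:?}", tier(agg.project_count, 2));
    }
}
```

Both scans land on the same `*/src/http.rs` key, which is what lets two projects count toward one pattern without either project's path being stored.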
---

### What Remains (Future Enhancement)

**CC.4 Trust Pack Bootstrap ⬜** _(Unchanged - Future enhancement)_

**CC.5 Skill-Driven Cold Start ⬜** _(Unchanged - Future enhancement)_

---

### CC.7 Make Community Corpus Default ✅

> **Completed:** 2026-02-08 | Community corpus now enabled by default, async runtime issue resolved

| Task | Status |
|------|--------|
| Create `AsyncCorpusBuilder` trait for async corpus builders | ✅ |
| Implement dual registry (sync + async builders) | ✅ |
| Refactor `CommunityCorpusBuilder` to implement `AsyncCorpusBuilder` | ✅ |
| Remove `rt.block_on()` hack, use proper `.await` | ✅ |
| Make `build_corpus_with_stores()` async | ✅ |
| Make `create_authoritative_corpus()` async | ✅ |
| Make `EphemeralDetector::new()` async | ✅ |
| Make `extract_claims_from_files()` async | ✅ |
| Update all 16 function callers to use `.await` | ✅ |
| Change `use_community: false` → `true` in defaults | ✅ |
| Verify tests pass with community corpus enabled | ✅ (1189 tests) |

**Architecture Improvement:**

- **Before**: The sync `CorpusBuilder` trait forced async operations to use `rt.block_on()`, causing runtime errors in async contexts
- **After**: A dual-trait approach (`CorpusBuilder` + `AsyncCorpusBuilder`) lets sync builders (RFC, OWASP, Vendor) stay simple while the community builder uses proper async
- **Result**: No `block_on()` hacks anywhere, proper async/await throughout

**Verification:**

```bash
RUST_LOG=aphoria=debug aphoria scan --persist --sync .
# Logs show:
# ✅ "Registered community corpus builder (async)"
# ✅ "Building corpus (async)" for Community builder
# ✅ "Querying popular patterns from StemeDB"
# ✅ No "Cannot start a runtime from within a runtime" errors
```

---

### CC.4 Trust Pack System (Bootstrap Option 2) ⬜

| Task | Status |
|------|--------|
| `aphoria trust-pack export --source community` | ⬜ |
| `aphoria trust-pack install <pack>` | ⬜ |
| Create `rfc-owasp-bootstrap` Trust Pack from old hardcoded corpus | ⬜ |
| Trust Pack validation and signing | ⬜ |
| Trust Pack registry/sharing mechanism | ⬜ |

**Usage:**

```bash
aphoria trust-pack install rfc-owasp-bootstrap
# Installs 19 baseline assertions for new projects
```

### CC.5 Corpus Management CLI ⬜

| Task | Status |
|------|--------|
| `aphoria corpus build` - Build community corpus | ⬜ |
| `aphoria corpus list` - Show loaded corpus assertions | ⬜ |
| `aphoria corpus candidates --min-adoption 0.50` - List promotion candidates | ⬜ |
| `aphoria corpus promote ` - Manual promotion | ⬜ |
| Update `aphoria-corpus-curator` skill for manual review | ⬜ |

### CC.6 Multi-Layer Corpus Resolver ⬜

| Task | Status |
|------|--------|
| Create `applications/aphoria/src/corpus/resolver.rs` | ⬜ |
| Priority layers: Manual overrides > Trust Packs > Community > (deprecated hardcoded) | ⬜ |
| Conflict resolution: higher priority overwrites lower | ⬜ |
| Config: `use_community = true` default | ⬜ |
| Config: `include_hardcoded = false` default (post-migration) | ⬜ |

---

## Phase 14: Governance Workflows 🎯

> **Vision:** Clear approval paths for pattern promotion with audit trails.
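The pending → approved/rejected path described in 14.1 and 14.2 could be modeled as a small state machine. This sketch borrows the planned `ApprovalWorkflow` name, but its fields, `ApprovalState`, `Action`, and `HistoryEntry` are all hypothetical; timeouts and escalation are left out, and timestamps are supplied by the caller.

```rust
/// Where a pattern sits in a (hypothetical) multi-stage approval workflow.
#[derive(Debug, Clone, PartialEq)]
enum ApprovalState {
    Pending { stage: usize },
    Approved,
    Rejected { reason: String },
}

enum Action<'a> {
    Approve,
    Reject(&'a str),
}

/// Ordered stages; each must approve before the pattern is promoted.
struct ApprovalWorkflow {
    stages: Vec<&'static str>,
}

/// One audited transition, recorded with a caller-supplied timestamp.
#[derive(Debug)]
struct HistoryEntry {
    at_unix_secs: u64,
    from: ApprovalState,
    to: ApprovalState,
}

fn transition(
    wf: &ApprovalWorkflow,
    state: &ApprovalState,
    action: Action<'_>,
    at_unix_secs: u64,
    history: &mut Vec<HistoryEntry>,
) -> Result<ApprovalState, String> {
    let next = match (state, action) {
        // The last stage approving completes the workflow.
        (ApprovalState::Pending { stage }, Action::Approve) if stage + 1 >= wf.stages.len() => {
            ApprovalState::Approved
        }
        // Earlier stages hand off to the next one.
        (ApprovalState::Pending { stage }, Action::Approve) => ApprovalState::Pending { stage: stage + 1 },
        (ApprovalState::Pending { .. }, Action::Reject(reason)) => ApprovalState::Rejected {
            reason: reason.to_string(),
        },
        // Approved and Rejected are terminal; nothing is recorded.
        (terminal, _) => return Err(format!("no transition out of {terminal:?}")),
    };
    history.push(HistoryEntry { at_unix_secs, from: state.clone(), to: next.clone() });
    Ok(next)
}

fn main() {
    let wf = ApprovalWorkflow { stages: vec!["team-lead", "security"] };
    let mut history = Vec::new();
    let mut state = ApprovalState::Pending { stage: 0 };
    for t in [100, 200] {
        state = transition(&wf, &state, Action::Approve, t, &mut history).expect("valid transition");
    }
    println!("{state:?} after {} transitions: {history:?}", history.len());
}
```

Keeping the history append inside `transition` means every successful state change produces an audit entry, which is the property 14.4 relies on.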
### 14.1 Approval Workflow Definition ⬜

| Task | Status |
|------|--------|
| Create `src/governance/mod.rs` module | ⬜ |
| Define `ApprovalWorkflow` struct | ⬜ |
| Define `ApprovalStage` with required approvers | ⬜ |
| Support evidence-based auto-approve thresholds | ⬜ |
| Config: define workflows in `.aphoria.toml` | ⬜ |

### 14.2 Approval State Machine ⬜

| Task | Status |
|------|--------|
| Implement state transitions (pending → approved/rejected) | ⬜ |
| Multi-stage approval support | ⬜ |
| Timeout and escalation policies | ⬜ |
| Store approval history with timestamps | ⬜ |

### 14.3 Approval CLI ⬜

| Task | Status |
|------|--------|
| `aphoria governance pending` — list pending approvals | ⬜ |
| `aphoria governance approve --comment "..."` | ⬜ |
| `aphoria governance reject --reason "..."` | ⬜ |
| `aphoria governance escalate ` | ⬜ |
| Show approval status in pattern list | ⬜ |

### 14.4 SOC 2 Audit Trail ⬜

| Task | Status |
|------|--------|
| Full audit log for all governance actions | ⬜ |
| `aphoria audit trail --pattern ` — show timeline | ⬜ |
| Export governance history for auditors | ⬜ |
| Include approver identity and timestamp | ⬜ |

---

## Phase DF-1: Dogfood Project - Database Connection Pool 🎯

> **Status:** ACTIVE | **Start:** 2026-02-09 | **Target:** 2026-02-14 (5 days)
>
> **Vision:** Build a production-ready database connection pool with intentional violations, then use Aphoria to detect them and guide remediation. This demonstrates real-world value in preventing production incidents.
### Overview

**Product:** `dbpool` - Safe, opinionated PostgreSQL connection pool for Rust

**Why This Matters:**

- Connection pool misconfigurations cause real P0 incidents
- Clear authority sources (HikariCP, PostgreSQL docs)
- Demonstrates Aphoria preventing actual production problems
- "Aphoria caught this before deployment" is compelling ROI

**Key Metrics:**

- Claims to extract: 25-30
- Intentional violations: 7-8
- Expected detection rate: 100%
- Final state: 0 conflicts, production-ready

### DF-1.1 Preparation & Corpus Building (Day 1) 🔄

**Goal:** Extract claims from authority sources and populate the corpus database

| Task | Status |
|------|--------|
| Create project structure at `applications/aphoria/dogfood/dbpool/` | ✅ |
| Write comprehensive plan in `dogfood/dbpool/plan.md` | ✅ |
| Fetch HikariCP configuration documentation | ⏳ |
| Fetch PostgreSQL connection pooling guide | ⏳ |
| Extract OWASP A07 credential guidance | ⏳ |
| Create 25-30 claims via CLI (`aphoria corpus create`) | ⏳ |
| Verify all claims queryable via API | ⏳ |
| Document claim templates for future dogfoods | ⏳ |

**Deliverables:**

- `docs/sources/hikaricp-config.md`
- `docs/sources/postgresql-pooling.md`
- `docs/sources/owasp-credentials.md`
- 25-30 claims in corpus database
- Verification report

### DF-1.2 Initial Implementation with Violations (Day 2) ⏳

**Goal:** Write working code that compiles but violates best practices

| Task | Status |
|------|--------|
| Create Rust project with Cargo.toml | ⏳ |
| Implement PoolConfig with 5 violations | ⏳ |
| Implement ConnectionPool with 2 violations | ⏳ |
| Add basic tests (that pass despite violations) | ⏳ |
| Verify compilation successful | ⏳ |

**Intentional Violations:**

1. ❌ Unbounded max_connections (CRITICAL)
2. ❌ Plaintext password in connection string (CRITICAL)
3. ❌ Missing max_lifetime (CRITICAL)
4. ❌ Excessive connection_timeout (ERROR)
5. ❌ Zero min_connections (ERROR)
6. ❌ Missing connection validation (ERROR)
7. ⚠️ No metrics exposed (WARNING)
8. ⚠️ Missing leak detection (WARNING)

### DF-1.3 First Scan & Verification (Day 3) ✅

**Goal:** Run an Aphoria scan and verify that all violations are detected

| Task | Status |
|------|--------|
| Create `.aphoria/config.toml` | ✅ |
| Run initial scan, save results JSON | ✅ |
| Verify 7-8 violations detected (100% accuracy) | ⚠️ Gap identified |
| Generate markdown report | ✅ |
| Take screenshots for demo | ⏳ |
| Verify 0 false positives | ✅ |

**Actual Results:**

- 0/7 violations detected (expected; documented in planning as Scenario 1)
- Built-in extractors cover security patterns, not library API patterns
- All 7 claims authored successfully via the A2 system
- The verify system worked correctly (all claims returned a "missing" verdict)
- **Key Finding:** Extractor coverage gap identified and documented

**Discovered Limitation:** Aphoria's 42 built-in extractors excel at **security/infrastructure patterns** (TLS, JWT, CORS, SQL injection, rate limits) but don't cover **library API design validation** (struct field types, missing fields, numeric constraints, function call patterns).
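To make the gap concrete, here is what violations 1-5 look like when expressed as plain field-presence and numeric checks. This is a hypothetical sketch against an assumed `PoolConfig` value, not the dogfood's code; the 30-second ceiling and the `password=` heuristic are illustrative. A real extractor would first have to recover these values from source text, which is exactly the part the built-in extractors don't do.

```rust
use std::time::Duration;

/// Hypothetical shape of a pool config; not the dogfood's actual `PoolConfig`.
struct PoolConfig {
    max_connections: Option<u32>, // None means unbounded
    min_connections: u32,
    max_lifetime: Option<Duration>,
    connection_timeout: Duration,
    connection_string: String,
}

#[derive(Debug, PartialEq)]
enum Severity {
    Critical,
    Error,
}

/// Illustrative ceiling; the real limit would come from the authored claim.
const MAX_CONNECTION_TIMEOUT: Duration = Duration::from_secs(30);

/// Checks violations 1-5 from DF-1.2 as field presence and numeric constraints.
fn check(cfg: &PoolConfig) -> Vec<(Severity, &'static str)> {
    let mut findings = Vec::new();
    if cfg.max_connections.is_none() {
        findings.push((Severity::Critical, "unbounded max_connections"));
    }
    // Crude heuristic: a libpq-style `password=` key in the connection string.
    if cfg.connection_string.contains("password=") {
        findings.push((Severity::Critical, "plaintext password in connection string"));
    }
    if cfg.max_lifetime.is_none() {
        findings.push((Severity::Critical, "missing max_lifetime"));
    }
    if cfg.connection_timeout > MAX_CONNECTION_TIMEOUT {
        findings.push((Severity::Error, "excessive connection_timeout"));
    }
    if cfg.min_connections == 0 {
        findings.push((Severity::Error, "zero min_connections"));
    }
    findings
}

fn main() {
    let cfg = PoolConfig {
        max_connections: None,
        min_connections: 0,
        max_lifetime: None,
        connection_timeout: Duration::from_secs(300),
        connection_string: "host=db user=app password=hunter2".to_string(),
    };
    for (severity, message) in check(&cfg) {
        println!("{severity:?}: {message}");
    }
}
```

Violations 6-8 (validation, metrics, leak detection) are about missing behavior rather than field values, so they would need call-pattern checks that this sketch does not attempt.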
**Why This Matters:**

- This is the **expected outcome** documented in STATE-2026-02-10.md (Scenario 1)
- Validates Aphoria's architecture (claims, verify, and scanning all work correctly)
- Identifies a product gap: custom extractors require Rust code, not TOML
- Confirms the LLM automation requirement for the flywheel (needs the `/aphoria-custom-extractor-creator` skill)

See `dogfood/dbpool/DAY3-FINDINGS.md` for the complete analysis.

### DF-1.4 Remediation & Re-verification (Day 4) ⏳

**Goal:** Fix violations incrementally, re-scanning after each fix

| Task | Status |
|------|--------|
| Fix unbounded max_connections → re-scan | ⏳ |
| Fix plaintext password → re-scan | ⏳ |
| Fix missing max_lifetime → re-scan | ⏳ |
| Fix excessive timeouts → re-scan | ⏳ |
| Fix zero min_connections → re-scan | ⏳ |
| Add connection validation → re-scan | ⏳ |
| Add metrics exposure → re-scan | ⏳ |
| Add leak detection → re-scan | ⏳ |
| Final verification: 0 conflicts | ⏳ |

**Deliverables:**

- Progressive scan results (v1 through v6)
- Git tags for each fix milestone
- Final clean scan report

### DF-1.5 Documentation & Demo Preparation (Day 5) ⏳

**Goal:** Create compelling documentation and demo materials

| Task | Status |
|------|--------|
| Write success story document | ⏳ |
| Create demo script for live presentation | ⏳ |
| Record performance metrics | ⏳ |
| Create before/after visual comparison | ⏳ |
| Document prevented incidents with cost estimates | ⏳ |
| Update this roadmap with completion status | ⏳ |

**Deliverables:**

- `docs/SUCCESS-STORY.md` - Comprehensive case study
- `demo.sh` - Automated demo script
- Screenshots and visuals
- Metrics report (accuracy, performance)

### Success Metrics

| Metric | Target | Actual |
|--------|--------|--------|
| Claims Extracted | 25-30 | TBD |
| Violations Detected | 7-8 | TBD |
| Detection Accuracy | 100% | TBD |
| False Positives | 0 | TBD |
| Scan Performance | ≤0.3s | TBD |
| Final Conflicts | 0 | TBD |

### Lessons Learned

**From Day 3 (2026-02-10):**

1. **Extractor Coverage Gap Validated**
   - Built-in extractors (42 total) cover security patterns excellently
   - Library API design patterns (struct fields, type constraints) need custom extractors
   - Custom extractors require Rust code (~10-20 hours), not TOML configuration
   - This was documented in planning (Scenario 1 vs 2) and validated through execution
2. **Authored Claims System Works**
   - The A2 system successfully created 7 claims with full provenance/invariant/consequence
   - Claims loaded correctly; the verify system worked as designed
   - All claims returned a "missing" verdict (correct: no matching observations)
   - Demonstrates the claim authoring workflow even without detection
3. **Flywheel Automation is Critical**
   - Manual TOML configuration cannot address the gap
   - Requires LLM-driven extractor generation (`/aphoria-custom-extractor-creator` skill)
   - Confirms vision.md's emphasis on LLM automation as core, not optional
   - The manual CLI is a debug interface, not the primary workflow
4. **Dogfooding Reveals Product Gaps**
   - Time investment: Day 3 took 8 hours (3x planned) due to troubleshooting
   - Found a fundamental limitation, not an implementation bug
   - "Failure" to detect is actually success at identifying product needs
   - The documentation produced (CUSTOM-EXTRACTOR-GUIDE.md) is valuable even though the approach did not work
5. **Next Priority Clear**
   - Implement `/aphoria-custom-extractor-creator` skill (Priority 1)
   - LLM reads violation examples → generates Rust extractor code
   - Re-run dogfood to validate end-to-end automation
   - Expand built-in extractor library with common API patterns

### Next Dogfoods

Potential follow-up dogfooding projects:

- Health check service (`healthd`)
- Rate limiter middleware (`ratelimit-rs`)
- Secrets manager client (`secrets-rs`)

**Full Plan:** See [`applications/aphoria/dogfood/dbpool/plan.md`](dogfood/dbpool/plan.md)

---

## Phase 15: Evidence Source Integration ⬜

> **Vision:** ADRs, specs, and standards automatically link to patterns.
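Most of this phase reduces to recognizing reference identifiers in free text such as commit messages, ADRs, and specs. A dependency-free sketch of that step follows; `extract_refs` and `EvidenceRef` are hypothetical, the tokenizer is deliberately naive, and the OWASP and NIST forms are omitted. A real implementation might use regexes instead, as the wiki importer does.

```rust
/// Evidence references this sketch recognizes (OWASP and NIST forms are left out).
#[derive(Debug, PartialEq)]
enum EvidenceRef {
    Adr(u32),
    Requirement(u32),
    Rfc(u32),
}

/// Scans free text for ADR-NNN, REQ-NNN and "RFC NNNN".
fn extract_refs(text: &str) -> Vec<EvidenceRef> {
    // Split on anything that is not alphanumeric or '-', so "ADR-012." yields "ADR-012".
    let tokens: Vec<&str> = text
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .filter(|t| !t.is_empty())
        .collect();
    let mut refs = Vec::new();
    for (i, tok) in tokens.iter().enumerate() {
        if let Some(n) = tok.strip_prefix("ADR-").and_then(|d| d.parse::<u32>().ok()) {
            refs.push(EvidenceRef::Adr(n));
        } else if let Some(n) = tok.strip_prefix("REQ-").and_then(|d| d.parse::<u32>().ok()) {
            refs.push(EvidenceRef::Requirement(n));
        } else if *tok == "RFC" {
            // "RFC 7519": the number is the following token.
            if let Some(n) = tokens.get(i + 1).and_then(|d| d.parse::<u32>().ok()) {
                refs.push(EvidenceRef::Rfc(n));
            }
        }
    }
    refs
}

fn main() {
    let msg = "Implements ADR-012 and REQ-042; tokens follow RFC 7519.";
    println!("{:?}", extract_refs(msg));
}
```

Normalizing to numeric ids (so `ADR-012` and `ADR-12` compare equal) is a design choice of this sketch; linking them to patterns and to the authoritative corpus would happen downstream.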
### 15.1 ADR Auto-Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/adr.rs` | ⬜ |
| Detect ADR-XXX patterns in commit messages | ⬜ |
| Scan for ADR files in standard locations | ⬜ |
| Parse ADR content for related patterns | ⬜ |
| Link ADR to patterns automatically | ⬜ |

### 15.2 Spec File Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/spec.rs` | ⬜ |
| Detect spec files (`specs/*.md`, `*.spec.md`) | ⬜ |
| Parse requirement IDs (REQ-XXX) | ⬜ |
| Link requirements to patterns | ⬜ |
| Show requirement coverage in reports | ⬜ |

### 15.3 Standard Reference Extraction ⬜

| Task | Status |
|------|--------|
| Parse RFC references (RFC 7519) | ⬜ |
| Parse OWASP references (OWASP A03:2021) | ⬜ |
| Parse NIST references (NIST SP 800-53) | ⬜ |
| Auto-link to authoritative corpus | ⬜ |

### 15.4 Evidence Display ⬜

| Task | Status |
|------|--------|
| Show full evidence chain in pattern output | ⬜ |
| `aphoria patterns --by-evidence` grouping | ⬜ |

---

## Phase A6: AST-Aware Observation & Claim Verification ⬜

> Evolved from the "Scout & Judge" proposal (2026-02-05). The original proposal focused on LLM cost reduction via AST snippet extraction. It is now reframed through the observations/claims distinction: the **Scout** produces structurally richer observations than regex can, and the **Judge** verifies authored claims against code rather than classifying security issues.

### Why This Matters

The 42 regex extractors work well for direct pattern matching (~0.25s), but they can't follow indirection:

```python
import requests

url = "https://example.com"

# Regex sees `requests.get(url, verify=should_verify)` — no match
# AST sees `should_verify = False` in scope — match
should_verify = False
requests.get(url, verify=should_verify)
```

Nor can they verify authored claims. When a claim says "Wallet MUST NOT derive Clone", regex can find `#[derive(` but can't determine scope or negation semantics. An AST-aware scout + LLM judge can.
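The indirection case can be illustrated without tree-sitter. The sketch below is a deliberately crude, line-based stand-in (hypothetical `find_disabled_verify`, no scoping, one level of indirection only) that shows why resolving bindings catches what a single-line `verify=False` regex misses. The planned scout would do this with real scoped AST queries instead.

```rust
use std::collections::HashMap;

/// Returns 1-based line numbers where a `verify=` argument resolves to `False`,
/// either directly or through one earlier `name = value` assignment.
/// Line-based and scope-blind: a stand-in for scoped AST resolution.
fn find_disabled_verify(source: &str) -> Vec<usize> {
    let mut bindings: HashMap<&str, &str> = HashMap::new();
    let mut hits = Vec::new();
    for (idx, raw) in source.lines().enumerate() {
        let line = raw.trim();
        // Record simple assignments such as `should_verify = False`.
        if let Some((lhs, rhs)) = line.split_once(" = ") {
            if !lhs.is_empty() && lhs.chars().all(|c| c.is_alphanumeric() || c == '_') {
                bindings.insert(lhs, rhs.trim());
            }
        }
        if let Some(pos) = line.find("verify=") {
            // Read the identifier or literal passed as the argument.
            let arg: String = line[pos + "verify=".len()..]
                .chars()
                .take_while(|c| c.is_alphanumeric() || *c == '_')
                .collect();
            // Follow one level of indirection; unbound names stay as written.
            let resolved = bindings.get(arg.as_str()).copied().unwrap_or(arg.as_str());
            if resolved == "False" {
                hits.push(idx + 1);
            }
        }
    }
    hits
}

fn main() {
    let src = "should_verify = False\nrequests.get(url, verify=should_verify)\n";
    // A plain `verify=False` pattern misses line 2; resolving the binding catches it.
    println!("{:?}", find_disabled_verify(src));
}
```

The obvious failure modes of this stand-in (shadowed names, assignments in other functions, multi-hop aliases) are exactly what scope-aware tree-sitter queries are meant to handle.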
### A6.1 Tree-sitter Infrastructure ⬜

| Task | Status |
|------|--------|
| Add `tree-sitter` + language grammars to `Cargo.toml` | ⬜ |
| Create `src/scout/mod.rs` module | ⬜ |
| `src/scout/engine.rs` — parse files, run SCM queries | ⬜ |
| `CandidateSnippet` type with structural context | ⬜ |
| `src/scout/queries/` — `.scm` query files per category/language | ⬜ |
| Language support: Python, Go, Rust, JavaScript/TypeScript | ⬜ |

```rust
use std::collections::HashMap;

pub struct CandidateSnippet {
    pub file_path: String,
    // Source language of the snippet (type to be defined by the scout module)
    pub language: Language,
    pub start_line: usize,
    pub end_line: usize,
    pub code: String,
    // In-scope variable name -> definition text (key/value types assumed)
    pub context_variables: HashMap<String, String>,
    pub query_id: String,
}
```

### A6.2 Scout as Observation Producer ⬜

AST-aware ROI detection for patterns regex can't follow.

| Task | Status |
|------|--------|
| Variable indirection tracking (assign → use across lines) | ⬜ |
| Context expansion: function scope, variable defs, comments | ⬜ |
| Deduplication with existing regex extractors | ⬜ |
| SCM queries for TLS, secrets, auth, crypto categories | ⬜ |
| Integration: run scout after regex, drop overlaps, combine | ⬜ |

**Key design:** Scout runs alongside (not instead of) regex extractors. Regex handles 90% at zero cost; scout handles the indirection cases regex misses.

### A6.3 Judge as Claim Verifier ⬜

LLM receives focused snippet + authored claim → structured verdict.

| Task | Status |
|------|--------|
| Refactor `LlmExtractor` to accept `CandidateSnippet` + `AuthoredClaim` | ⬜ |
| Verification prompt: "Does this code satisfy this claim?" | ⬜ |
| Structured output: `{ verdict: PASS\|FAIL\|UNCERTAIN, evidence: "..." }` | ⬜ |
| Wire into `aphoria verify` Direction 2 (walk claims, verify in code) | ⬜ |
| Maps to `Extractor::verify()` from vision-gaps | ⬜ |

**Token efficiency:** Snippet (~100 tokens) vs whole file (~2000 tokens) = 95% cost reduction per verification.

### A6.4 Scout for Claim Suggestion ⬜

Scout identifies ROIs that have no matching authored claim and feeds their context to `aphoria-suggest`.
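The first step in the A6.4 task list amounts to a set difference between scout hits and authored claims. A sketch, assuming query ids of the form `<category>/<name>` and that authored claims can be reduced to a set of categories; `Roi` and `uncovered` are hypothetical stand-ins, with `Roi` keeping only a few `CandidateSnippet`-like fields.

```rust
use std::collections::HashSet;

/// Stand-in for `CandidateSnippet`, keeping only the fields this sketch needs.
struct Roi {
    file_path: String,
    start_line: usize,
    query_id: String,
}

/// Keeps the ROIs whose query category has no authored claim: the snippets
/// worth handing to `aphoria-suggest`.
fn uncovered<'a>(rois: &'a [Roi], claimed_categories: &HashSet<&str>) -> Vec<&'a Roi> {
    rois.iter()
        .filter(|roi| {
            // Assumed id format "<category>/<name>", e.g. "tls/verify_disabled".
            let category = roi.query_id.split('/').next().unwrap_or("");
            !claimed_categories.contains(category)
        })
        .collect()
}

fn main() {
    let rois = vec![
        Roi { file_path: "src/http.rs".into(), start_line: 10, query_id: "tls/verify_disabled".into() },
        Roi { file_path: "src/auth.rs".into(), start_line: 42, query_id: "secrets/hardcoded_key".into() },
    ];
    let claimed: HashSet<&str> = ["tls"].into_iter().collect();
    for roi in uncovered(&rois, &claimed) {
        println!("{}:{} ({})", roi.file_path, roi.start_line, roi.query_id);
    }
}
```

Matching on category is the coarsest workable granularity; matching individual claims to snippets would need the judge from A6.3.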
| Task | Status |
|------|--------|
| Identify ROIs with no matching claim in `.aphoria/claims.toml` | ⬜ |
| Enrich context for skill: snippet + function name + surrounding comments | ⬜ |
| Feed to `aphoria-suggest` skill for claim drafting | ⬜ |

### A6.5 Evaluation ⬜

| Task | Status |
|------|--------|
| Scout recall: "Did scout find the vulnerable line in fixture?" | ⬜ |
| Judge precision: "Given snippet + claim, did LLM classify correctly?" | ⬜ |
| Cost metric: `tokens_per_verification` vs monolithic approach | ⬜ |
| Parallel run: shadow mode alongside regex for tuning | ⬜ |

### Phase A6 Priority

Lower priority than A5 flywheel completion and Phase 14 governance. Build when:

1. Regex extractors hit limits on specific indirection patterns
2. `aphoria verify` Direction 2 needs LLM-backed verification
3. `aphoria-suggest` needs richer context than regex observations provide

---

## Enterprise Pilot Success Metrics

### 90-Day Pilot Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Patterns captured | 100+ observations | Count in knowledge graph |
| Patterns promoted | 10+ conventions | Count with status=Active |
| Cross-team adoption | 2+ teams connected | Unique team_ids |
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
| False positive rate | <10% | FP feedback / total flags |
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |

### 180-Day Production Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Knowledge retention | 0 lost patterns on departures | Audit log |
| Onboarding velocity | 50% faster ramp | Time to first PR |
| Convention adoption | 80% across org | Compliance rate |
| SOC 2 evidence | Audit pass | External validation |
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |

---

## Enterprise Simulation UAT

See: `uat/enterprise-simulation-uat.md`