Implements Phase 4 (A4) - Community corpus as first-class citizens: - **Community Corpus Builder** - Queries StemeDB pattern aggregates - **Wiki Import** - Bootstrap corpus from markdown docs (aphoria corpus import wiki) - **Pattern Aggregation** - Automatic learning from local scans (--sync flag) - **Storage Layer** - StemeDBPatternStore with content-addressed deduplication - **Promotion Logic** - Multi-tier thresholds (95%/80%/50% adoption rates) - **Corpus Build** - Unified registry for RFC/OWASP/Vendor/Community sources - **Trust Packs** - Export corpus as signed, distributable artifacts - **Documentation** - bootstrap-corpus.md guide + CLI reference updates Technical details: - Pattern aggregates stored as assertions with predicate "pattern_aggregate" - Content-addressed subjects via BLAKE3(subject:predicate:value) - PatternAggregator handles write path (observations → patterns) - StemeDBPatternStore handles read path (pattern queries) - Integration tests + fixtures in tests/wiki_import_test.rs Deleted hardcoded.rs (368 lines) - corpus now fully emergent from StemeDB. Deleted enriched-corpus-patterns.md (677 lines) - feature shipped. Closes VG-026 (community corpus), part of A4 milestone. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
504 lines
19 KiB
Markdown
504 lines
19 KiB
Markdown
# Aphoria Roadmap
|
||
|
||
> Completed phases archived in [`roadmap-archive.md`](./roadmap-archive.md)
|
||
|
||
---
|
||
|
||
## Status Overview
|
||
|
||
| Phase | Deliverable | Status |
|
||
|-------|-------------|--------|
|
||
| 0–9, 11–13, 16–17 | Core CLI, Extractors (42), LLM, Learning, Enterprise, Lifecycle, Pattern Enrichment | ✅ Archived |
|
||
| CC | Corpus Infrastructure (Community Corpus, Wiki Import, Pattern Aggregation, **Async Default**) | ✅ Complete |
|
||
| 10 | UX & Enterprise Polish | 🔄 Partial (10.1 ✅, 10.2–10.3 ⬜) |
|
||
| 14 | Governance Workflows | 🎯 Current |
|
||
| 15 | Evidence Source Integration | ⬜ Future |
|
||
| A6 | AST-Aware Observation & Claim Verification | ⬜ Future |
|
||
|
||
### Current State
|
||
|
||
- 42 built-in extractors + declarative custom extractors
|
||
- **Emergent corpus**: RFC, OWASP, Vendor sources + **community-driven patterns (CC.6 ✅)**
|
||
- **Community corpus enabled by default** (CC.7 ✅): `use_community: true`, proper async, no runtime hacks
|
||
- **Pattern aggregation active**: Observations auto-feed pattern aggregates after each scan
|
||
- **No hardcoded assertions**: Bootstrap via wiki import or Trust Packs
|
||
- Ephemeral mode (~0.25s), persistent mode with drift detection
|
||
- Observation/claim distinction (A1–A5 complete)
|
||
- `aphoria verify run|map` for claim verification
|
||
- 10 claims dogfooded in `.aphoria/claims.toml`
|
||
- Self-improving: LLM extraction → pattern learning → autonomous promotion → shadow testing → auto-rollback
|
||
|
||
### Recently Completed: Corpus Infrastructure (Phase CC ✅)
|
||
|
||
**Phase CC.1-CC.3: Removed hardcoded corpus, built emergent system** (Feb 6-7)
|
||
- Deleted `hardcoded.rs` (369 lines, 19 assertions)
|
||
- Pattern aggregates stored in StemeDB: `community://pattern/{BLAKE3(SPV)}`
|
||
- Multi-tier promotion: 95%+ (Regulatory), 80%+ (Clinical), 50%+ (Emerging, review required)
|
||
- Wiki import: `aphoria corpus import wiki ~/docs` parses MUST/SHOULD patterns
|
||
|
||
**Phase CC.6: Pattern Aggregation (Emergent Learning)** (Feb 8) ✅
|
||
- Observations now automatically feed back into pattern aggregates
|
||
- Every scan with `--persist --sync` contributes to community learning
|
||
- Config: `aggregation_enabled: true` (default)
|
||
- Tracks project_count and observation_count per pattern
|
||
- Privacy-preserving: wildcarded subjects, project deduplication
|
||
|
||
**Phase CC.7: Make Community Corpus Default** (Feb 8) ✅
|
||
- Created `AsyncCorpusBuilder` trait for async-native corpus builders
|
||
- Refactored `CommunityCorpusBuilder` to implement `AsyncCorpusBuilder`
|
||
- **Removed `rt.block_on()` hack** that caused "runtime within runtime" errors
|
||
- Made entire corpus building chain properly async (16 functions updated)
|
||
- Enabled `use_community: true` by default in `CorpusConfig`
|
||
- All 1189 tests pass, no clippy warnings, no runtime errors
|
||
|
||
**Philosophy:** The corpus isn't written by experts. It's discovered by the community and validated by authorities.
|
||
|
||
---
|
||
|
||
## Phase 10: UX & Enterprise Polish (Partial)
|
||
|
||
> 10.1 Acknowledgment Expiry ✅ — archived
|
||
|
||
### 10.2 Human-Readable Signer Names ⬜
|
||
|
||
**Impact:** MEDIUM | **Effort:** MEDIUM | **Priority:** P2
|
||
|
||
Map issuer hex IDs to human-readable team names in output.
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Add `signer_name: Option<String>` to `PackHeader` | ⬜ |
|
||
| Add `contact: Option<String>` to `PackHeader` (Slack channel, email) | ⬜ |
|
||
| Update `policy export/import` to preserve new fields | ⬜ |
|
||
| Show "Signed by Platform Security Team" instead of hex in output | ⬜ |
|
||
| Backward-compat: gracefully handle packs without new fields | ⬜ |
|
||
|
||
### 10.3 Speed Benchmarks ⬜
|
||
|
||
**Impact:** LOW | **Effort:** LOW | **Priority:** P3
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Create `benchmarks/` directory with test corpora | ⬜ |
|
||
| Add `aphoria scan --benchmark` flag for self-test | ⬜ |
|
||
| Document test conditions in benchmark results | ⬜ |
|
||
|
||
---
|
||
|
||
## Phase CC: Corpus Infrastructure (Community Corpus) ✅
|
||
|
||
> **Completed:** 2026-02-08 | Removed hardcoded corpus, built emergent community-driven system
|
||
|
||
### Philosophy
|
||
|
||
The corpus isn't written by experts. It's discovered by the community and validated by authorities. 95% adoption = "This is what the community does" = Authoritative.
|
||
|
||
### CC.1 Delete Hardcoded Corpus ✅
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Remove `applications/aphoria/src/corpus/hardcoded.rs` (369 lines) | ✅ |
|
||
| Remove `include_hardcoded` from `CorpusConfig` | ✅ |
|
||
| Remove from `CorpusRegistry::with_defaults()` | ✅ |
|
||
| Update tests to use community corpus | ✅ |
|
||
| Fix 5 pre-existing clippy errors in stemedb-api | ✅ |
|
||
|
||
**Implemented:** Destructive pre-release approach - no deprecation warnings, just deleted.
|
||
|
||
### CC.2 Community Corpus Builder ✅
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Create `applications/aphoria/src/corpus/community.rs` (393 lines) | ✅ |
|
||
| Create `applications/aphoria/src/corpus/thresholds.rs` (230 lines) | ✅ |
|
||
| Create `applications/aphoria/src/corpus/resolver.rs` (220 lines) | ✅ |
|
||
| Create `applications/aphoria/src/community/pattern_store.rs` (332 lines) | ✅ |
|
||
| Implement `PatternAggregateStore` trait with StemeDB backend | ✅ |
|
||
| Multi-tier promotion: 95% (Regulatory), 80% (Clinical), 50% (Emerging) | ✅ |
|
||
| Content-addressed storage: `community://pattern/{BLAKE3(SPV)}` | ✅ |
|
||
| Config integration: `use_community` flag (opt-in) | ✅ |
|
||
| Full scan flow integration | ✅ |
|
||
|
||
**Storage Architecture:**
|
||
- Pattern aggregates stored as StemeDB assertions (no TOML files)
|
||
- Predicate: `pattern_aggregate` with JSON metadata
|
||
- Deduplication via content-addressed subjects
|
||
- Privacy-preserving: wildcarded subjects, k-anonymity
|
||
|
||
### CC.3 Wiki Import Bootstrap ✅
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Create `applications/aphoria/src/corpus/wiki_importer.rs` (332 lines) | ✅ |
|
||
| Regex extraction of MUST/SHOULD patterns from markdown | ✅ |
|
||
| Authority source parsing (RFC, OWASP, CWE references) | ✅ |
|
||
| Smart subject normalization (TLS → tls/cert_verification) | ✅ |
|
||
| CLI command: `aphoria corpus import wiki <path>` | ✅ |
|
||
| PatternAggregator write path (stores to StemeDB) | ✅ |
|
||
| Integration tests with fixtures | ✅ (6 tests) |
|
||
| Documentation: `docs/bootstrap-corpus.md` | ✅ |
|
||
|
||
**Usage:**
|
||
```bash
|
||
# Create wiki with best practices
|
||
mkdir -p .aphoria/wiki
|
||
echo "TLS cert verification MUST be enabled. Authority: RFC 5246" > .aphoria/wiki/tls.md
|
||
|
||
# Import patterns
|
||
aphoria corpus import wiki .aphoria/wiki
|
||
# → Patterns now in StemeDB, available for conflict detection
|
||
```
|
||
|
||
### CC.4 Trust Pack Bootstrap ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Extend Trust Packs to include pattern aggregates | ⬜ Future |
|
||
| `aphoria trust-pack install <name>` writes patterns to StemeDB | ⬜ Future |
|
||
| Create `rfc-owasp-baseline.toml` with ~20 common patterns | ⬜ Future |
|
||
|
||
**Status:** Infrastructure exists, implementation deferred. Wiki import covers bootstrap needs.
|
||
|
||
### CC.5 Skill-Driven Cold Start ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Enhance `aphoria-suggest` skill with bootstrap mode | ⬜ Future |
|
||
| Detect empty corpus during scan | ⬜ Future |
|
||
| Analyze project structure (Cargo.toml, package.json) | ⬜ Future |
|
||
| Suggest 3-5 baseline patterns based on detected stack | ⬜ Future |
|
||
|
||
**Status:** Skill exists, bootstrap mode not implemented. Manual wiki creation works well.
|
||
|
||
### CC.6 Pattern Aggregation (Emergent Learning) ✅
|
||
|
||
> **Completed:** 2026-02-08 | Observations now feed back into pattern aggregates automatically
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Add `aggregation_enabled` config field (default: `true`) | ✅ |
|
||
| Implement `aggregate_observations_to_patterns()` in scanner | ✅ |
|
||
| Add `StemeDBPatternStore::get_pattern_by_spv()` for lookup | ✅ |
|
||
| Add `StemeDBPatternStore::update_pattern()` for updates | ✅ |
|
||
| Add `compute_project_hash()` for deduplication | ✅ |
|
||
| Hook into scan flow after observation recording | ✅ |
|
||
| Group observations by (subject, predicate, value) | ✅ |
|
||
| Wildcard project paths for anonymization | ✅ |
|
||
| Create or update PatternAggregate records | ✅ |
|
||
| Track project_count and observation_count | ✅ |
|
||
|
||
**Implementation:**
|
||
```rust
|
||
// scanner.rs:344-357
|
||
if config.corpus.aggregation_enabled && should_persist_locally {
|
||
let project_hash = compute_project_hash(project_root);
|
||
aggregate_observations_to_patterns(&novel_claims, &episteme, &project_hash).await?;
|
||
}
|
||
```
|
||
|
||
**Flow:**
|
||
1. Scan extracts observations → recorded as Tier 4 assertions
|
||
2. Observations aggregated by (wildcarded_subject, predicate, value)
|
||
3. For each unique pattern:
|
||
- If exists: increment observation_count, check new project → increment project_count
|
||
- If new: create PatternAggregate with initial counts
|
||
4. Stored as assertions with predicate `"pattern_aggregate"`
|
||
|
||
**Result:** The corpus is now **emergent**. Every scan with `--persist --sync` feeds the learning loop.
|
||
|
||
---
|
||
|
||
### What Remains (Future Enhancement)
|
||
|
||
**CC.4 Trust Pack Bootstrap ⬜**
|
||
_(Unchanged - Future enhancement)_
|
||
|
||
**CC.5 Skill-Driven Cold Start ⬜**
|
||
_(Unchanged - Future enhancement)_
|
||
|
||
---
|
||
|
||
### CC.7 Make Community Corpus Default ✅
|
||
|
||
> **Completed:** 2026-02-08 | Community corpus now enabled by default, async runtime issue resolved
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Create `AsyncCorpusBuilder` trait for async corpus builders | ✅ |
|
||
| Implement dual registry (sync + async builders) | ✅ |
|
||
| Refactor `CommunityCorpusBuilder` to implement `AsyncCorpusBuilder` | ✅ |
|
||
| Remove `rt.block_on()` hack, use proper `.await` | ✅ |
|
||
| Make `build_corpus_with_stores()` async | ✅ |
|
||
| Make `create_authoritative_corpus()` async | ✅ |
|
||
| Make `EphemeralDetector::new()` async | ✅ |
|
||
| Make `extract_claims_from_files()` async | ✅ |
|
||
| Update all 16 function callers to use `.await` | ✅ |
|
||
| Change `use_community: false` → `true` in defaults | ✅ |
|
||
| Verify tests pass with community corpus enabled | ✅ (1189 tests) |
|
||
|
||
**Architecture Improvement:**
|
||
- **Before**: Sync `CorpusBuilder` trait forced async operations to use `rt.block_on()`, causing runtime errors in async contexts
|
||
- **After**: Dual-trait approach (`CorpusBuilder` + `AsyncCorpusBuilder`) allows sync builders (RFC, OWASP, Vendor) to stay simple while community builder uses proper async
|
||
- **Result**: No `block_on()` hacks anywhere, proper async/await throughout
|
||
|
||
**Verification:**
|
||
```bash
|
||
RUST_LOG=aphoria=debug aphoria scan --persist --sync .
|
||
# Logs show:
|
||
# ✅ "Registered community corpus builder (async)"
|
||
# ✅ "Building corpus (async)" for Community builder
|
||
# ✅ "Querying popular patterns from StemeDB"
|
||
# ✅ No "Cannot start a runtime from within a runtime" errors
|
||
```
|
||
|
||
---
|
||
|
||
### CC.4 Trust Pack System (Bootstrap Option 2) ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| `aphoria trust-pack export --source community` | ⬜ |
|
||
| `aphoria trust-pack install <name>` | ⬜ |
|
||
| Create `rfc-owasp-bootstrap` Trust Pack from old hardcoded corpus | ⬜ |
|
||
| Trust Pack validation and signing | ⬜ |
|
||
| Trust Pack registry/sharing mechanism | ⬜ |
|
||
|
||
**Usage:**
|
||
```bash
|
||
aphoria trust-pack install rfc-owasp-bootstrap
|
||
# Installs 19 baseline assertions for new projects
|
||
```
|
||
|
||
### CC.5 Corpus Management CLI ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| `aphoria corpus build` - Build community corpus | ⬜ |
|
||
| `aphoria corpus list` - Show loaded corpus assertions | ⬜ |
|
||
| `aphoria corpus candidates --min-adoption 0.50` - List promotion candidates | ⬜ |
|
||
| `aphoria corpus promote <pattern-id>` - Manual promotion | ⬜ |
|
||
| Update `aphoria-corpus-curator` skill for manual review | ⬜ |
|
||
|
||
### CC.6 Multi-Layer Corpus Resolver ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Create `applications/aphoria/src/corpus/resolver.rs` | ⬜ |
|
||
| Priority layers: Manual overrides > Trust Packs > Community > (deprecated hardcoded) | ⬜ |
|
||
| Conflict resolution: higher priority overwrites lower | ⬜ |
|
||
| Config: `use_community = true` default | ⬜ |
|
||
| Config: `include_hardcoded = false` default (post-migration) | ⬜ |
|
||
|
||
---
|
||
|
||
## Phase 14: Governance Workflows 🎯
|
||
|
||
> **Vision:** Clear approval paths for pattern promotion with audit trails.
|
||
|
||
### 14.1 Approval Workflow Definition ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Create `src/governance/mod.rs` module | ⬜ |
|
||
| Define `ApprovalWorkflow` struct | ⬜ |
|
||
| Define `ApprovalStage` with required approvers | ⬜ |
|
||
| Support evidence-based auto-approve thresholds | ⬜ |
|
||
| Config: define workflows in `.aphoria.toml` | ⬜ |
|
||
|
||
### 14.2 Approval State Machine ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Implement state transitions (pending → approved/rejected) | ⬜ |
|
||
| Multi-stage approval support | ⬜ |
|
||
| Timeout and escalation policies | ⬜ |
|
||
| Store approval history with timestamps | ⬜ |
|
||
|
||
### 14.3 Approval CLI ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| `aphoria governance pending` — list pending approvals | ⬜ |
|
||
| `aphoria governance approve <id> --comment "..."` | ⬜ |
|
||
| `aphoria governance reject <id> --reason "..."` | ⬜ |
|
||
| `aphoria governance escalate <id>` | ⬜ |
|
||
| Show approval status in pattern list | ⬜ |
|
||
|
||
### 14.4 SOC 2 Audit Trail ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Full audit log for all governance actions | ⬜ |
|
||
| `aphoria audit trail --pattern <id>` — show timeline | ⬜ |
|
||
| Export governance history for auditors | ⬜ |
|
||
| Include approver identity and timestamp | ⬜ |
|
||
|
||
---
|
||
|
||
## Phase 15: Evidence Source Integration ⬜
|
||
|
||
> **Vision:** ADRs, specs, and standards automatically link to patterns.
|
||
|
||
### 15.1 ADR Auto-Detection ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Create `src/evidence/adr.rs` | ⬜ |
|
||
| Detect ADR-XXX patterns in commit messages | ⬜ |
|
||
| Scan for ADR files in standard locations | ⬜ |
|
||
| Parse ADR content for related patterns | ⬜ |
|
||
| Link ADR to patterns automatically | ⬜ |
|
||
|
||
### 15.2 Spec File Detection ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Create `src/evidence/spec.rs` | ⬜ |
|
||
| Detect spec files (specs/*.md, *.spec.md) | ⬜ |
|
||
| Parse requirement IDs (REQ-XXX) | ⬜ |
|
||
| Link requirements to patterns | ⬜ |
|
||
| Show requirement coverage in reports | ⬜ |
|
||
|
||
### 15.3 Standard Reference Extraction ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Parse RFC references (RFC 7519) | ⬜ |
|
||
| Parse OWASP references (OWASP A03:2021) | ⬜ |
|
||
| Parse NIST references (NIST SP 800-53) | ⬜ |
|
||
| Auto-link to authoritative corpus | ⬜ |
|
||
|
||
### 15.4 Evidence Display ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Show full evidence chain in pattern output | ⬜ |
|
||
| `aphoria patterns --by-evidence` grouping | ⬜ |
|
||
|
||
---
|
||
|
||
## Phase A6: AST-Aware Observation & Claim Verification ⬜
|
||
|
||
> Evolved from the "Scout & Judge" proposal (2026-02-05). The original focused on LLM cost reduction via AST snippet extraction. Reframed through the observations/claims distinction: the **Scout** produces structurally richer observations that regex can't, and the **Judge** verifies authored claims against code rather than classifying security issues.
|
||
|
||
### Why This Matters
|
||
|
||
The 42 regex extractors work well for direct pattern matching (~0.25s). But they can't follow indirection:
|
||
|
||
```python
|
||
# Regex sees `requests.get(url, verify=should_verify)` — no match
|
||
# AST sees `should_verify = False` in scope — match
|
||
should_verify = False
|
||
requests.get(url, verify=should_verify)
|
||
```
|
||
|
||
And they can't verify authored claims. When a claim says "Wallet MUST NOT derive Clone", regex can find `#[derive(` but can't determine scope or negation semantics. An AST-aware scout + LLM judge can.
|
||
|
||
### A6.1 Tree-sitter Infrastructure ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Add `tree-sitter` + language grammars to `Cargo.toml` | ⬜ |
|
||
| Create `src/scout/mod.rs` module | ⬜ |
|
||
| `src/scout/engine.rs` — parse files, run SCM queries | ⬜ |
|
||
| `CandidateSnippet` type with structural context | ⬜ |
|
||
| `src/scout/queries/` — `.scm` query files per category/language | ⬜ |
|
||
| Language support: Python, Go, Rust, JavaScript/TypeScript | ⬜ |
|
||
|
||
```rust
|
||
pub struct CandidateSnippet {
|
||
pub file_path: String,
|
||
pub language: Language,
|
||
pub start_line: usize,
|
||
pub end_line: usize,
|
||
pub code: String,
|
||
pub context_variables: HashMap<String, String>,
|
||
pub query_id: String,
|
||
}
|
||
```
|
||
|
||
### A6.2 Scout as Observation Producer ⬜
|
||
|
||
AST-aware ROI detection for patterns regex can't follow.
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Variable indirection tracking (assign → use across lines) | ⬜ |
|
||
| Context expansion: function scope, variable defs, comments | ⬜ |
|
||
| Deduplication with existing regex extractors | ⬜ |
|
||
| SCM queries for TLS, secrets, auth, crypto categories | ⬜ |
|
||
| Integration: run scout after regex, drop overlaps, combine | ⬜ |
|
||
|
||
**Key design:** Scout runs alongside (not instead of) regex extractors. Regex handles 90% at zero cost; scout handles the indirection cases regex misses.
|
||
|
||
### A6.3 Judge as Claim Verifier ⬜
|
||
|
||
LLM receives focused snippet + authored claim → structured verdict.
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Refactor `LlmExtractor` to accept `CandidateSnippet` + `AuthoredClaim` | ⬜ |
|
||
| Verification prompt: "Does this code satisfy this claim?" | ⬜ |
|
||
| Structured output: `{ verdict: PASS|FAIL|UNCERTAIN, evidence: "..." }` | ⬜ |
|
||
| Wire into `aphoria verify` Direction 2 (walk claims, verify in code) | ⬜ |
|
||
| Maps to `Extractor::verify()` from vision-gaps | ⬜ |
|
||
|
||
**Token efficiency:** Snippet (~100 tokens) vs whole file (~2000 tokens) = 95% cost reduction per verification.
|
||
|
||
### A6.4 Scout for Claim Suggestion ⬜
|
||
|
||
Scout identifies ROIs without matching authored claims, feeds context to `aphoria-suggest`.
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Identify ROIs with no matching claim in `.aphoria/claims.toml` | ⬜ |
|
||
| Enrich context for skill: snippet + function name + surrounding comments | ⬜ |
|
||
| Feed to `aphoria-suggest` skill for claim drafting | ⬜ |
|
||
|
||
### A6.5 Evaluation ⬜
|
||
|
||
| Task | Status |
|
||
|------|--------|
|
||
| Scout recall: "Did scout find the vulnerable line in fixture?" | ⬜ |
|
||
| Judge precision: "Given snippet + claim, did LLM classify correctly?" | ⬜ |
|
||
| Cost metric: `tokens_per_verification` vs monolithic approach | ⬜ |
|
||
| Parallel run: shadow mode alongside regex for tuning | ⬜ |
|
||
|
||
### Phase A6 Priority
|
||
|
||
Lower priority than A5 flywheel completion and Phase 14 governance. Build when:
|
||
1. Regex extractors hit limits on specific indirection patterns
|
||
2. `aphoria verify` Direction 2 needs LLM-backed verification
|
||
3. `aphoria-suggest` needs richer context than regex observations provide
|
||
|
||
---
|
||
|
||
## Enterprise Pilot Success Metrics
|
||
|
||
### 90-Day Pilot Targets
|
||
|
||
| Metric | Target | Measurement |
|
||
|--------|--------|-------------|
|
||
| Patterns captured | 100+ observations | Count in knowledge graph |
|
||
| Patterns promoted | 10+ conventions | Count with status=Active |
|
||
| Cross-team adoption | 2+ teams connected | Unique team_ids |
|
||
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
|
||
| False positive rate | <10% | FP feedback / total flags |
|
||
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |
|
||
|
||
### 180-Day Production Targets
|
||
|
||
| Metric | Target | Measurement |
|
||
|--------|--------|-------------|
|
||
| Knowledge retention | 0 lost patterns on departures | Audit log |
|
||
| Onboarding velocity | 50% faster ramp | Time to first PR |
|
||
| Convention adoption | 80% across org | Compliance rate |
|
||
| SOC 2 evidence | Audit pass | External validation |
|
||
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |
|
||
|
||
---
|
||
|
||
## Enterprise Simulation UAT
|
||
|
||
See: `uat/enterprise-simulation-uat.md`
|