# Aphoria Roadmap

> Completed phases archived in [`roadmap-archive.md`](./roadmap-archive.md)

---

## Status Overview

| Phase | Deliverable | Status |
|-------|-------------|--------|
| 0–9, 11–13, 16–17 | Core CLI, Extractors (42), LLM, Learning, Enterprise, Lifecycle, Pattern Enrichment | ✅ Archived |
| CC | Corpus Infrastructure (Community Corpus, Wiki Import, Pattern Aggregation, **Async Default**) | ✅ Complete |
| 10 | UX & Enterprise Polish | 🔄 Partial (10.1 ✅, 10.2–10.3 ⬜) |
| 14 | Governance Workflows | 🎯 Current |
| 15 | Evidence Source Integration | ⬜ Future |
| A6 | AST-Aware Observation & Claim Verification | ⬜ Future |

### Current State

- 42 built-in extractors + declarative custom extractors
- **Emergent corpus**: RFC, OWASP, Vendor sources + **community-driven patterns (CC.6 ✅)**
- **Community corpus enabled by default** (CC.7 ✅): `use_community: true`, proper async, no runtime hacks
- **Pattern aggregation active**: Observations auto-feed pattern aggregates after each scan
- **No hardcoded assertions**: Bootstrap via wiki import or Trust Packs
- Ephemeral mode (~0.25s), persistent mode with drift detection
- Observation/claim distinction (A1–A5 complete)
- `aphoria verify run|map` for claim verification
- 10 claims dogfooded in `.aphoria/claims.toml`
- Self-improving: LLM extraction → pattern learning → autonomous promotion → shadow testing → auto-rollback

### Recently Completed: Corpus Infrastructure (Phase CC ✅)

**Phase CC.1–CC.3: Removed hardcoded corpus, built emergent system** (Feb 6–7) ✅
- Deleted `hardcoded.rs` (369 lines, 19 assertions)
- Pattern aggregates stored in StemeDB: `community://pattern/{BLAKE3(SPV)}`
- Multi-tier promotion: 95%+ (Regulatory), 80%+ (Clinical), 50%+ (Emerging, review required)
- Wiki import: `aphoria corpus import wiki ~/docs` parses MUST/SHOULD patterns

**Phase CC.6: Pattern Aggregation (Emergent Learning)** (Feb 8) ✅
- Observations now automatically feed back into pattern aggregates
- Every scan with `--persist --sync` contributes to community learning
- Config: `aggregation_enabled: true` (default)
- Tracks project_count and observation_count per pattern
- Privacy-preserving: wildcarded subjects, project deduplication

**Phase CC.7: Make Community Corpus Default** (Feb 8) ✅
- Created `AsyncCorpusBuilder` trait for async-native corpus builders
- Refactored `CommunityCorpusBuilder` to implement `AsyncCorpusBuilder`
- **Removed `rt.block_on()` hack** that caused "runtime within runtime" errors
- Made entire corpus building chain properly async (16 functions updated)
- Enabled `use_community: true` by default in `CorpusConfig`
- All 1189 tests pass, no clippy warnings, no runtime errors

**Philosophy:** The corpus isn't written by experts. It's discovered by the community and validated by authorities.

---

## Phase 10: UX & Enterprise Polish (Partial)

> 10.1 Acknowledgment Expiry ✅ — archived

### 10.2 Human-Readable Signer Names ⬜

**Impact:** MEDIUM | **Effort:** MEDIUM | **Priority:** P2

Map issuer hex IDs to human-readable team names in output.

| Task | Status |
|------|--------|
| Add `signer_name: Option<String>` to `PackHeader` | ⬜ |
| Add `contact: Option<String>` to `PackHeader` (Slack channel, email) | ⬜ |
| Update `policy export/import` to preserve new fields | ⬜ |
| Show "Signed by Platform Security Team" instead of hex in output | ⬜ |
| Backward-compat: gracefully handle packs without new fields | ⬜ |
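
The fallback behavior in the last two tasks can be sketched as follows. This is a minimal illustration, not the planned implementation: only `signer_name` and `contact` come from the task list, while the `issuer_hex` field, the `signed_by` function, and the 12-character truncation are assumptions made for the example.

```rust
/// Hypothetical header shape after 10.2. Only `signer_name` and `contact`
/// are named in the roadmap; `issuer_hex` is an assumed field name.
#[derive(Debug, Clone, Default)]
pub struct PackHeader {
    pub issuer_hex: String,
    pub signer_name: Option<String>,
    pub contact: Option<String>,
}

/// Render the "Signed by ..." line, falling back to a shortened hex ID
/// for packs that were created before the new fields existed.
pub fn signed_by(header: &PackHeader) -> String {
    let who = match header.signer_name.as_deref() {
        Some(name) if !name.trim().is_empty() => name.to_string(),
        _ => {
            // Assumed display rule: show the first 12 hex characters.
            let short: String = header.issuer_hex.chars().take(12).collect();
            format!("issuer {short}")
        }
    };
    match header.contact.as_deref() {
        Some(contact) if !contact.trim().is_empty() => {
            format!("Signed by {who} ({contact})")
        }
        _ => format!("Signed by {who}"),
    }
}
```

Keeping both new fields as `Option<String>` means an older pack simply yields `None` for them, so no migration step is needed to keep existing packs readable.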

### 10.3 Speed Benchmarks ⬜

**Impact:** LOW | **Effort:** LOW | **Priority:** P3

| Task | Status |
|------|--------|
| Create `benchmarks/` directory with test corpora | ⬜ |
| Add `aphoria scan --benchmark` flag for self-test | ⬜ |
| Document test conditions in benchmark results | ⬜ |

---

## Phase CC: Corpus Infrastructure (Community Corpus) ✅

> **Completed:** 2026-02-08 | Removed hardcoded corpus, built emergent community-driven system

### Philosophy

The corpus isn't written by experts. It's discovered by the community and validated by authorities. 95% adoption = "This is what the community does" = Authoritative.

### CC.1 Delete Hardcoded Corpus ✅

| Task | Status |
|------|--------|
| Remove `applications/aphoria/src/corpus/hardcoded.rs` (369 lines) | ✅ |
| Remove `include_hardcoded` from `CorpusConfig` | ✅ |
| Remove from `CorpusRegistry::with_defaults()` | ✅ |
| Update tests to use community corpus | ✅ |
| Fix 5 pre-existing clippy errors in stemedb-api | ✅ |

**Implemented:** Destructive pre-release approach: the file was deleted outright, with no deprecation period or warnings.

### CC.2 Community Corpus Builder ✅

| Task | Status |
|------|--------|
| Create `applications/aphoria/src/corpus/community.rs` (393 lines) | ✅ |
| Create `applications/aphoria/src/corpus/thresholds.rs` (230 lines) | ✅ |
| Create `applications/aphoria/src/corpus/resolver.rs` (220 lines) | ✅ |
| Create `applications/aphoria/src/community/pattern_store.rs` (332 lines) | ✅ |
| Implement `PatternAggregateStore` trait with StemeDB backend | ✅ |
| Multi-tier promotion: 95% (Regulatory), 80% (Clinical), 50% (Emerging) | ✅ |
| Content-addressed storage: `community://pattern/{BLAKE3(SPV)}` | ✅ |
| Config integration: `use_community` flag (opt-in at this stage; default since CC.7) | ✅ |
| Full scan flow integration | ✅ |
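
The multi-tier promotion rule can be sketched as a small classifier. The thresholds and tier names come from the table above; the function name, the `PromotionTier` enum, and the choice of "adopting projects / total projects" as the adoption measure are assumptions for illustration, and the actual logic in `thresholds.rs` may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PromotionTier {
    Regulatory, // 95%+ adoption
    Clinical,   // 80%+ adoption
    Emerging,   // 50%+ adoption, review required
}

/// Classify a pattern by adoption. Returns `None` below 50% or when the
/// inputs are not meaningful (no projects, or more adopters than projects).
pub fn promotion_tier(adopting_projects: u64, total_projects: u64) -> Option<PromotionTier> {
    if total_projects == 0 || adopting_projects > total_projects {
        return None;
    }
    // Integer percentage (rounded down) avoids floating-point surprises
    // exactly at the tier boundaries.
    let pct = adopting_projects.saturating_mul(100) / total_projects;
    if pct >= 95 {
        Some(PromotionTier::Regulatory)
    } else if pct >= 80 {
        Some(PromotionTier::Clinical)
    } else if pct >= 50 {
        Some(PromotionTier::Emerging)
    } else {
        None
    }
}

/// Only the Emerging tier is flagged as needing human review.
pub fn requires_review(tier: PromotionTier) -> bool {
    tier == PromotionTier::Emerging
}
```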

**Storage Architecture:**
- Pattern aggregates stored as StemeDB assertions (no TOML files)
- Predicate: `pattern_aggregate` with JSON metadata
- Deduplication via content-addressed subjects
- Privacy-preserving: wildcarded subjects, k-anonymity

### CC.3 Wiki Import Bootstrap ✅

| Task | Status |
|------|--------|
| Create `applications/aphoria/src/corpus/wiki_importer.rs` (332 lines) | ✅ |
| Regex extraction of MUST/SHOULD patterns from markdown | ✅ |
| Authority source parsing (RFC, OWASP, CWE references) | ✅ |
| Smart subject normalization (TLS → tls/cert_verification) | ✅ |
| CLI command: `aphoria corpus import wiki <path>` | ✅ |
| PatternAggregator write path (stores to StemeDB) | ✅ |
| Integration tests with fixtures | ✅ (6 tests) |
| Documentation: `docs/bootstrap-corpus.md` | ✅ |

**Usage:**
```bash
# Create wiki with best practices
mkdir -p .aphoria/wiki
echo "TLS cert verification MUST be enabled. Authority: RFC 5246" > .aphoria/wiki/tls.md

# Import patterns
aphoria corpus import wiki .aphoria/wiki
# → Patterns now in StemeDB, available for conflict detection
```
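
To make the extraction step concrete, here is a simplified stand-in for it. It uses plain string search instead of regex so that it needs no external crates, and it skips subject normalization entirely. The `WikiPattern` type and `parse_wiki_line` function are hypothetical names, not taken from `wiki_importer.rs`.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Level {
    Must,
    Should,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WikiPattern {
    pub subject: String,
    pub level: Level,
    pub requirement: String,
    pub authority: Option<String>,
}

/// Parse one line of the form
/// "<subject> MUST|SHOULD <requirement>. Authority: <source>".
pub fn parse_wiki_line(line: &str) -> Option<WikiPattern> {
    // Split off an optional trailing "Authority: ..." clause.
    let (statement, authority) = match line.split_once("Authority:") {
        Some((s, a)) => (s, Some(a.trim().to_string()).filter(|a| !a.is_empty())),
        None => (line, None),
    };
    // Use whichever keyword appears first in the statement.
    let must = statement.find(" MUST ").map(|i| (i, " MUST ".len(), Level::Must));
    let should = statement.find(" SHOULD ").map(|i| (i, " SHOULD ".len(), Level::Should));
    let (idx, kw_len, level) = match (must, should) {
        (Some(m), Some(s)) => if m.0 <= s.0 { m } else { s },
        (Some(m), None) => m,
        (None, Some(s)) => s,
        (None, None) => return None,
    };
    let subject = statement[..idx].trim();
    let requirement = statement[idx + kw_len..].trim().trim_end_matches('.');
    if subject.is_empty() || requirement.is_empty() {
        return None;
    }
    Some(WikiPattern {
        subject: subject.to_string(),
        level,
        requirement: requirement.to_string(),
        authority,
    })
}
```

Applied to the line written to `tls.md` above, this sketch yields the subject `TLS cert verification`, level `Must`, requirement `be enabled`, and authority `RFC 5246`.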

### CC.4 Trust Pack Bootstrap ⬜

| Task | Status |
|------|--------|
| Extend Trust Packs to include pattern aggregates | ⬜ Future |
| `aphoria trust-pack install <name>` writes patterns to StemeDB | ⬜ Future |
| Create `rfc-owasp-baseline.toml` with ~20 common patterns | ⬜ Future |

**Status:** Infrastructure exists, implementation deferred. Wiki import covers bootstrap needs.

### CC.5 Skill-Driven Cold Start ⬜

| Task | Status |
|------|--------|
| Enhance `aphoria-suggest` skill with bootstrap mode | ⬜ Future |
| Detect empty corpus during scan | ⬜ Future |
| Analyze project structure (Cargo.toml, package.json) | ⬜ Future |
| Suggest 3-5 baseline patterns based on detected stack | ⬜ Future |

**Status:** Skill exists, bootstrap mode not implemented. Manual wiki creation works well.

### CC.6 Pattern Aggregation (Emergent Learning) ✅

> **Completed:** 2026-02-08 | Observations now feed back into pattern aggregates automatically

| Task | Status |
|------|--------|
| Add `aggregation_enabled` config field (default: `true`) | ✅ |
| Implement `aggregate_observations_to_patterns()` in scanner | ✅ |
| Add `StemeDBPatternStore::get_pattern_by_spv()` for lookup | ✅ |
| Add `StemeDBPatternStore::update_pattern()` for updates | ✅ |
| Add `compute_project_hash()` for deduplication | ✅ |
| Hook into scan flow after observation recording | ✅ |
| Group observations by (subject, predicate, value) | ✅ |
| Wildcard project paths for anonymization | ✅ |
| Create or update PatternAggregate records | ✅ |
| Track project_count and observation_count | ✅ |

**Implementation:**
```rust
// scanner.rs:344-357
if config.corpus.aggregation_enabled && should_persist_locally {
    let project_hash = compute_project_hash(project_root);
    aggregate_observations_to_patterns(&novel_claims, &episteme, &project_hash).await?;
}
```

**Flow:**
1. Scan extracts observations → recorded as Tier 4 assertions
2. Observations aggregated by (wildcarded_subject, predicate, value)
3. For each unique pattern:
   - If it exists: increment `observation_count`; if this project has not contributed to the pattern before, also increment `project_count`
   - If it is new: create a `PatternAggregate` with initial counts
4. Stored as assertions with predicate `"pattern_aggregate"`
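
The flow above can be sketched with an in-memory map standing in for StemeDB. Everything here is illustrative: the `PatternKey` type, the `aggregate` function, and the wildcard rule (replace the first path segment with `*`) are assumptions, and the real `aggregate_observations_to_patterns()` is async and writes assertions rather than updating a `HashMap`.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PatternKey {
    pub subject: String,
    pub predicate: String,
    pub value: String,
}

#[derive(Debug, Default)]
pub struct PatternAggregate {
    pub observation_count: u64,
    // Project hashes already counted, so a project is counted only once.
    projects: HashSet<String>,
}

impl PatternAggregate {
    pub fn project_count(&self) -> usize {
        self.projects.len()
    }
}

/// Assumed wildcard rule: drop the project-specific leading segment.
pub fn wildcard_subject(subject: &str) -> String {
    match subject.split_once('/') {
        Some((_project, rest)) => format!("*/{rest}"),
        None => subject.to_string(),
    }
}

/// Fold one scan's observations (subject, predicate, value) into the store.
pub fn aggregate(
    store: &mut HashMap<PatternKey, PatternAggregate>,
    project_hash: &str,
    observations: &[(&str, &str, &str)],
) {
    for (subject, predicate, value) in observations {
        let key = PatternKey {
            subject: wildcard_subject(subject),
            predicate: predicate.to_string(),
            value: value.to_string(),
        };
        // Creates the aggregate on first sight, otherwise updates it.
        let agg = store.entry(key).or_default();
        agg.observation_count += 1;
        agg.projects.insert(project_hash.to_string());
    }
}
```

Storing the set of project hashes, rather than a bare counter, is what makes the project count idempotent when the same project is scanned repeatedly.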

**Result:** The corpus is now **emergent**. Every scan with `--persist --sync` feeds the learning loop.

---

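### What Remains (Future Enhancement)

**CC.4 Trust Pack Bootstrap ⬜**
Deferred. Tasks are listed in the CC.4 table above; wiki import covers bootstrap needs in the meantime.

**CC.5 Skill-Driven Cold Start ⬜**
Deferred. Tasks are listed in the CC.5 table above; manual wiki creation is the current workaround.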

---

### CC.7 Make Community Corpus Default ✅

> **Completed:** 2026-02-08 | Community corpus now enabled by default, async runtime issue resolved

| Task | Status |
|------|--------|
| Create `AsyncCorpusBuilder` trait for async corpus builders | ✅ |
| Implement dual registry (sync + async builders) | ✅ |
| Refactor `CommunityCorpusBuilder` to implement `AsyncCorpusBuilder` | ✅ |
| Remove `rt.block_on()` hack, use proper `.await` | ✅ |
| Make `build_corpus_with_stores()` async | ✅ |
| Make `create_authoritative_corpus()` async | ✅ |
| Make `EphemeralDetector::new()` async | ✅ |
| Make `extract_claims_from_files()` async | ✅ |
| Update all 16 function callers to use `.await` | ✅ |
| Change `use_community: false` → `true` in defaults | ✅ |
| Verify tests pass with community corpus enabled | ✅ (1189 tests) |

**Architecture Improvement:**
- **Before**: Sync `CorpusBuilder` trait forced async operations to use `rt.block_on()`, causing runtime errors in async contexts
- **After**: Dual-trait approach (`CorpusBuilder` + `AsyncCorpusBuilder`) allows sync builders (RFC, OWASP, Vendor) to stay simple while community builder uses proper async
- **Result**: No `block_on()` hacks anywhere, proper async/await throughout
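
A rough sketch of the dual-trait idea, using only the standard library. The trait names come from the task list; the method signatures, the `DualRegistry` type, the two example builders, and the string-based corpus are assumptions for illustration. The tiny `run_to_completion` driver exists only so the example can run without an async runtime; it stands in for the single top-level entry into the real runtime.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Boxed future keeps the async trait usable as `dyn` without extra crates.
pub type BoxFuture<'a, T> = Pin<Box<dyn Future<Output = T> + 'a>>;

/// Sync builders (RFC, OWASP, Vendor in the roadmap) stay simple.
pub trait CorpusBuilder {
    fn build(&self) -> Vec<String>;
}

/// Async builders hand back a future instead of blocking internally.
pub trait AsyncCorpusBuilder {
    fn build(&self) -> BoxFuture<'_, Vec<String>>;
}

#[derive(Default)]
pub struct DualRegistry {
    pub sync_builders: Vec<Box<dyn CorpusBuilder>>,
    pub async_builders: Vec<Box<dyn AsyncCorpusBuilder>>,
}

impl DualRegistry {
    /// Sync builders are called directly and async ones are awaited,
    /// so no executor is ever started from inside another.
    pub async fn build_all(&self) -> Vec<String> {
        let mut corpus = Vec::new();
        for b in &self.sync_builders {
            corpus.extend(b.build());
        }
        for b in &self.async_builders {
            corpus.extend(b.build().await);
        }
        corpus
    }
}

pub struct RfcBuilder;
impl CorpusBuilder for RfcBuilder {
    fn build(&self) -> Vec<String> {
        vec!["rfc-assertion".to_string()]
    }
}

pub struct CommunityBuilder;
impl AsyncCorpusBuilder for CommunityBuilder {
    fn build(&self) -> BoxFuture<'_, Vec<String>> {
        // A real builder would await a StemeDB query here.
        Box::pin(async { vec!["community-pattern".to_string()] })
    }
}

/// Demo-only driver: polls a future until it is ready.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

pub fn run_to_completion<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The point of the split is that only the community builder pays the complexity of returning a future, while the registry exposes a single async entry point to callers.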

**Verification:**
```bash
RUST_LOG=aphoria=debug aphoria scan --persist --sync .
# Logs show:
# ✅ "Registered community corpus builder (async)"
# ✅ "Building corpus (async)" for Community builder
# ✅ "Querying popular patterns from StemeDB"
# ✅ No "Cannot start a runtime from within a runtime" errors
```

---

### CC.4 Trust Pack System (Bootstrap Option 2) ⬜

| Task | Status |
|------|--------|
| `aphoria trust-pack export --source community` | ⬜ |
| `aphoria trust-pack install <name>` | ⬜ |
| Create `rfc-owasp-bootstrap` Trust Pack from old hardcoded corpus | ⬜ |
| Trust Pack validation and signing | ⬜ |
| Trust Pack registry/sharing mechanism | ⬜ |

**Usage:**
```bash
aphoria trust-pack install rfc-owasp-bootstrap
# Installs 19 baseline assertions for new projects
```

### CC.5 Corpus Management CLI ⬜

| Task | Status |
|------|--------|
| `aphoria corpus build` - Build community corpus | ⬜ |
| `aphoria corpus list` - Show loaded corpus assertions | ⬜ |
| `aphoria corpus candidates --min-adoption 0.50` - List promotion candidates | ⬜ |
| `aphoria corpus promote <pattern-id>` - Manual promotion | ⬜ |
| Update `aphoria-corpus-curator` skill for manual review | ⬜ |

### CC.6 Multi-Layer Corpus Resolver ⬜

| Task | Status |
|------|--------|
| Create `applications/aphoria/src/corpus/resolver.rs` | ✅ (done in CC.2) |
| Priority layers: Manual overrides > Trust Packs > Community (the hardcoded layer was deleted in CC.1) | ⬜ |
| Conflict resolution: higher priority overwrites lower | ⬜ |
| Config: `use_community = true` default | ✅ (done in CC.7) |
| Config: `include_hardcoded = false` default (post-migration) | Superseded (`include_hardcoded` removed in CC.1) |

---

## Phase 14: Governance Workflows 🎯

> **Vision:** Clear approval paths for pattern promotion with audit trails.

### 14.1 Approval Workflow Definition ⬜

| Task | Status |
|------|--------|
| Create `src/governance/mod.rs` module | ⬜ |
| Define `ApprovalWorkflow` struct | ⬜ |
| Define `ApprovalStage` with required approvers | ⬜ |
| Support evidence-based auto-approve thresholds | ⬜ |
| Config: define workflows in `.aphoria.toml` | ⬜ |

### 14.2 Approval State Machine ⬜

| Task | Status |
|------|--------|
| Implement state transitions (pending → approved/rejected) | ⬜ |
| Multi-stage approval support | ⬜ |
| Timeout and escalation policies | ⬜ |
| Store approval history with timestamps | ⬜ |
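
One possible shape for this state machine, as a sketch only. Phase 14 is not yet built, so every name here (`ApprovalState`, `Action`, `Approval`, `apply`) is hypothetical, timestamps are plain Unix seconds supplied by the caller, and timeout and escalation policies are left out.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ApprovalState {
    /// Waiting on the approver for stage `stage` (zero-based).
    Pending { stage: usize },
    Approved,
    Rejected { reason: String },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Approve,
    Reject(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HistoryEntry {
    pub approver: String,
    pub action: Action,
    pub unix_ts: u64,
}

#[derive(Debug)]
pub struct Approval {
    pub state: ApprovalState,
    pub total_stages: usize,
    pub history: Vec<HistoryEntry>,
}

impl Approval {
    pub fn new(total_stages: usize) -> Self {
        Approval {
            state: ApprovalState::Pending { stage: 0 },
            total_stages: total_stages.max(1),
            history: Vec::new(),
        }
    }

    /// Apply one decision. Terminal states reject further actions, and
    /// only accepted actions are appended to the history.
    pub fn apply(&mut self, approver: &str, action: Action, unix_ts: u64) -> Result<(), String> {
        let stage = match self.state {
            ApprovalState::Pending { stage } => stage,
            _ => return Err("approval is already in a terminal state".to_string()),
        };
        self.state = match &action {
            // More stages remain: advance to the next one.
            Action::Approve if stage + 1 < self.total_stages => {
                ApprovalState::Pending { stage: stage + 1 }
            }
            Action::Approve => ApprovalState::Approved,
            Action::Reject(reason) => ApprovalState::Rejected { reason: reason.clone() },
        };
        self.history.push(HistoryEntry { approver: approver.to_string(), action, unix_ts });
        Ok(())
    }
}
```

Recording approver and timestamp on every accepted transition would also give the audit trail in 14.4 its raw data.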

### 14.3 Approval CLI ⬜

| Task | Status |
|------|--------|
| `aphoria governance pending` — list pending approvals | ⬜ |
| `aphoria governance approve <id> --comment "..."` | ⬜ |
| `aphoria governance reject <id> --reason "..."` | ⬜ |
| `aphoria governance escalate <id>` | ⬜ |
| Show approval status in pattern list | ⬜ |

### 14.4 SOC 2 Audit Trail ⬜

| Task | Status |
|------|--------|
| Full audit log for all governance actions | ⬜ |
| `aphoria audit trail --pattern <id>` — show timeline | ⬜ |
| Export governance history for auditors | ⬜ |
| Include approver identity and timestamp | ⬜ |

---

## Phase 15: Evidence Source Integration ⬜

> **Vision:** ADRs, specs, and standards automatically link to patterns.

### 15.1 ADR Auto-Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/adr.rs` | ⬜ |
| Detect ADR-XXX patterns in commit messages | ⬜ |
| Scan for ADR files in standard locations | ⬜ |
| Parse ADR content for related patterns | ⬜ |
| Link ADR to patterns automatically | ⬜ |
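
Detecting IDs in commit messages could look roughly like the sketch below. It assumes IDs take the form `ADR-` followed by digits, which is one reading of "ADR-XXX", and the `find_ids` helper is a hypothetical name. It uses plain string scanning rather than a regex crate so that it stays self-contained.

```rust
/// Return every `<PREFIX>-<digits>` ID in `text`, in order of first
/// appearance and without duplicates.
pub fn find_ids(text: &str, prefix: &str) -> Vec<String> {
    let needle = format!("{prefix}-");
    let mut ids: Vec<String> = Vec::new();
    let mut rest = text;
    while let Some(pos) = rest.find(&needle) {
        let after = &rest[pos + needle.len()..];
        let digits: String = after.chars().take_while(|c| c.is_ascii_digit()).collect();
        // Require a word boundary before the prefix so "BADR-1" is skipped.
        let boundary = rest[..pos]
            .chars()
            .next_back()
            .map_or(true, |c| !c.is_alphanumeric());
        if !digits.is_empty() && boundary {
            let id = format!("{needle}{digits}");
            if !ids.contains(&id) {
                ids.push(id);
            }
        }
        // Continue scanning after this occurrence of the prefix.
        rest = after;
    }
    ids
}
```

The same helper would cover the `REQ-XXX` requirement IDs in 15.2 by passing `"REQ"` as the prefix.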

### 15.2 Spec File Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/spec.rs` | ⬜ |
| Detect spec files (specs/*.md, *.spec.md) | ⬜ |
| Parse requirement IDs (REQ-XXX) | ⬜ |
| Link requirements to patterns | ⬜ |
| Show requirement coverage in reports | ⬜ |

### 15.3 Standard Reference Extraction ⬜

| Task | Status |
|------|--------|
| Parse RFC references (RFC 7519) | ⬜ |
| Parse OWASP references (OWASP A03:2021) | ⬜ |
| Parse NIST references (NIST SP 800-53) | ⬜ |
| Auto-link to authoritative corpus | ⬜ |

### 15.4 Evidence Display ⬜

| Task | Status |
|------|--------|
| Show full evidence chain in pattern output | ⬜ |
| `aphoria patterns --by-evidence` grouping | ⬜ |

---

## Phase A6: AST-Aware Observation & Claim Verification ⬜

> Evolved from the "Scout & Judge" proposal (2026-02-05). The original focused on LLM cost reduction via AST snippet extraction. Reframed through the observations/claims distinction: the **Scout** produces structurally richer observations that regex can't, and the **Judge** verifies authored claims against code rather than classifying security issues.

### Why This Matters

The 42 regex extractors work well for direct pattern matching (~0.25s). But they can't follow indirection:

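```python
import requests

url = "https://example.com"  # placeholder target for illustration

# Regex sees `requests.get(url, verify=should_verify)`: no literal `verify=False`, so no match
# AST sees `should_verify = False` in scope: match
should_verify = False
requests.get(url, verify=should_verify)
```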

And they can't verify authored claims. When a claim says "Wallet MUST NOT derive Clone", regex can find `#[derive(` but can't determine scope or negation semantics. An AST-aware scout + LLM judge can.

### A6.1 Tree-sitter Infrastructure ⬜

| Task | Status |
|------|--------|
| Add `tree-sitter` + language grammars to `Cargo.toml` | ⬜ |
| Create `src/scout/mod.rs` module | ⬜ |
| `src/scout/engine.rs` — parse files, run SCM queries | ⬜ |
| `CandidateSnippet` type with structural context | ⬜ |
| `src/scout/queries/` — `.scm` query files per category/language | ⬜ |
| Language support: Python, Go, Rust, JavaScript/TypeScript | ⬜ |

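```rust
use std::collections::HashMap;

/// A region of interest found by the scout, with structural context.
/// `Language` is the scout's language enum, defined elsewhere in the module.
pub struct CandidateSnippet {
    pub file_path: String,
    pub language: Language,
    pub start_line: usize,
    pub end_line: usize,
    pub code: String,
    pub context_variables: HashMap<String, String>,
    pub query_id: String,
}
```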

### A6.2 Scout as Observation Producer ⬜

AST-aware ROI detection for patterns regex can't follow.

| Task | Status |
|------|--------|
| Variable indirection tracking (assign → use across lines) | ⬜ |
| Context expansion: function scope, variable defs, comments | ⬜ |
| Deduplication with existing regex extractors | ⬜ |
| SCM queries for TLS, secrets, auth, crypto categories | ⬜ |
| Integration: run scout after regex, drop overlaps, combine | ⬜ |

**Key design:** Scout runs alongside (not instead of) regex extractors. Regex handles 90% at zero cost; scout handles the indirection cases regex misses.
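
The "run scout after regex, drop overlaps, combine" step might be sketched like this. The `Finding` type and the overlap rule (same file and intersecting line ranges) are assumptions; the planned deduplication may use a different key, such as the extracted subject.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Finding {
    pub file: String,
    pub start_line: usize,
    pub end_line: usize,
    /// Which stage produced the finding: "regex" or "scout".
    pub source: &'static str,
}

impl Finding {
    pub fn new(file: &str, start_line: usize, end_line: usize, source: &'static str) -> Self {
        Finding { file: file.to_string(), start_line, end_line, source }
    }
}

/// Two findings overlap when they are in the same file and their
/// inclusive line ranges intersect.
fn overlaps(a: &Finding, b: &Finding) -> bool {
    a.file == b.file && a.start_line <= b.end_line && b.start_line <= a.end_line
}

/// Keep every regex finding, then add only the scout findings that do
/// not overlap any of them.
pub fn combine(regex: Vec<Finding>, scout: Vec<Finding>) -> Vec<Finding> {
    let extra: Vec<Finding> = scout
        .into_iter()
        .filter(|s| !regex.iter().any(|r| overlaps(r, s)))
        .collect();
    let mut all = regex;
    all.extend(extra);
    all
}
```

Giving regex findings precedence matches the design note above: the scout only contributes what regex missed.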

### A6.3 Judge as Claim Verifier ⬜

LLM receives focused snippet + authored claim → structured verdict.

| Task | Status |
|------|--------|
| Refactor `LlmExtractor` to accept `CandidateSnippet` + `AuthoredClaim` | ⬜ |
| Verification prompt: "Does this code satisfy this claim?" | ⬜ |
| Structured output: `{ verdict: PASS\|FAIL\|UNCERTAIN, evidence: "..." }` | ⬜ |
| Wire into `aphoria verify` Direction 2 (walk claims, verify in code) | ⬜ |
| Maps to `Extractor::verify()` from vision-gaps | ⬜ |

**Token efficiency:** Snippet (~100 tokens) vs whole file (~2000 tokens) = 95% cost reduction per verification.
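
The arithmetic behind that figure, as a small helper that could also back the `tokens_per_verification` metric in A6.5. The function name is hypothetical and the token counts are the roadmap's rough estimates, not measurements.

```rust
/// Percentage of input tokens saved by sending a snippet instead of the
/// whole file, rounded down. Returns `None` for meaningless inputs.
pub fn token_savings_percent(snippet_tokens: u64, file_tokens: u64) -> Option<u64> {
    if file_tokens == 0 || snippet_tokens > file_tokens {
        return None;
    }
    Some((file_tokens - snippet_tokens) * 100 / file_tokens)
}
```

With the estimates above, `token_savings_percent(100, 2000)` gives `Some(95)`: (2000 - 100) / 2000 = 0.95.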

### A6.4 Scout for Claim Suggestion ⬜

Scout identifies ROIs without matching authored claims, feeds context to `aphoria-suggest`.

| Task | Status |
|------|--------|
| Identify ROIs with no matching claim in `.aphoria/claims.toml` | ⬜ |
| Enrich context for skill: snippet + function name + surrounding comments | ⬜ |
| Feed to `aphoria-suggest` skill for claim drafting | ⬜ |

### A6.5 Evaluation ⬜

| Task | Status |
|------|--------|
| Scout recall: "Did scout find the vulnerable line in fixture?" | ⬜ |
| Judge precision: "Given snippet + claim, did LLM classify correctly?" | ⬜ |
| Cost metric: `tokens_per_verification` vs monolithic approach | ⬜ |
| Parallel run: shadow mode alongside regex for tuning | ⬜ |

### Phase A6 Priority

Lower priority than A5 flywheel completion and Phase 14 governance. Build when:
1. Regex extractors hit limits on specific indirection patterns
2. `aphoria verify` Direction 2 needs LLM-backed verification
3. `aphoria-suggest` needs richer context than regex observations provide

---

## Enterprise Pilot Success Metrics

### 90-Day Pilot Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Patterns captured | 100+ observations | Count in knowledge graph |
| Patterns promoted | 10+ conventions | Count with status=Active |
| Cross-team adoption | 2+ teams connected | Unique team_ids |
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
| False positive rate | <10% | FP feedback / total flags |
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |

### 180-Day Production Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Knowledge retention | 0 lost patterns on departures | Audit log |
| Onboarding velocity | 50% faster ramp | Time to first PR |
| Convention adoption | 80% across org | Compliance rate |
| SOC 2 evidence | Audit pass | External validation |
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |

---

## Enterprise Simulation UAT

See: `uat/enterprise-simulation-uat.md`