# Aphoria Roadmap

---

## Phase 0: StemeDB Foundation ✅

> **Tracked in:** [roadmap.md § 5D. Concept Hierarchy](../../roadmap.md)

Changes to the core database that Aphoria depends on. Shipped as **Phase 5D** of the main StemeDB roadmap.

| Aphoria Phase 0 | StemeDB Phase 5D | Status |
|-----------------|------------------|--------|
| 0.1 ConceptPath Type | 5D.1 ConceptPath Type | ✅ |
| 0.2 ConceptPath in Assertion | (implicit in 5D.1) | ✅ |
| 0.3 Hierarchical Index | 5D.4 Hierarchical Query | ✅ |
| 0.4 Alias Store | 5D.3 Alias Store + 5D.5 Alias Resolution | ✅ |
| 0.5 Source Class Inference | 5D.6 Source Class Inference | ✅ |
| 0.6 Concept API Endpoints | 5D.7 Concept API Endpoints | ✅ |

**Spec:** [docs/specs/concept-hierarchy.md](../../docs/specs/concept-hierarchy.md)

---

## Phase 2: CLI Core ✅

> Phase 2 was built before Phase 1 (authoritative corpus expansion). The CLI pipeline works end-to-end with a bootstrapped corpus of 11 hardcoded assertions covering TLS, JWT, CORS, secrets, and rate limiting.

| Task | Status |
|------|--------|
| 2.1 Project Walker | ✅ `walker/mod.rs`, `walker/path_mapper.rs`, `walker/language.rs` |
| 2.2 Extractors (10) | ✅ `tls_verify`, `jwt_config`, `hardcoded_secrets`, `timeout_config`, `dep_versions`, `cors_config`, `rate_limit`, `weak_crypto`, `command_injection`, `sql_injection` |
| 2.3 Ingestion Bridge | ✅ `bridge.rs`: BLAKE3 hashing, Ed25519 signing, claim→assertion conversion |
| 2.4 Conflict Query | ✅ `episteme.rs`: LocalEpisteme with check_conflicts() |
| 2.5 Report Output | ✅ `report/`: table (comfy-table), JSON, SARIF 2.1.0, markdown |
| 2.6 Acknowledge Command | ✅ `lib.rs` acknowledge() |
| Baseline & Diff | ✅ `lib.rs` set_baseline(), show_diff() |
| Status Command | ✅ `lib.rs` show_status() |

183 tests pass. Clippy and fmt clean.

### Phase 2 Code Quality Fixes ✅

Code review improvements to extractors:

| Issue | Fix | Status |
|-------|-----|--------|
| DES/RC4 concept path misclassification | Split `check_pattern()` into `check_hash_pattern()` and `check_encryption_pattern()`; DES/RC4 now use `crypto/encryption/algorithm` path | ✅ |
| SHA1 edge case undocumented | Added comments and test documenting that SHA1 detection is intentionally broad (triggers for git hashes, etc.) | ✅ |
| JS exec() regex overly broad | Tightened regex to require `child_process.` prefix or non-word/non-dot preceding character; prevents `RegExp.exec()` false positives | ✅ |

---

## Phase 2A: Concept Matching ✅

> **Status:** Complete. Tail-path matching (2A.1), alias-aware queries (2A.2), and auto-alias creation (2A.3) all implemented.

### 2A.1 Leaf-Based Concept Matching (Aphoria-side fix) ✅

Implemented in `episteme.rs` via `ConceptIndex`:
- `make_key(subject, predicate)` extracts tail 2 path segments + predicate
- `build(assertions)` creates in-memory index keyed by tail path
- `lookup(subject, predicate)` finds matching authoritative assertions
- `check_conflicts()` uses `ConceptIndex` instead of `QueryEngine` for cross-scheme matching

Integration tests prove TLS and JWT conflicts are detected correctly.

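
The tail-path matching described in 2A.1 can be sketched roughly as follows. This is an illustration rather than the code in `episteme.rs`: the `#` separator in the key, the tuple-based assertion input, and the `by_tail` field name are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical key format: last two path segments of the subject, then the predicate.
fn make_key(subject: &str, predicate: &str) -> String {
    // Drop the scheme ("code://", "rfc://") if one is present.
    let path = subject.split_once("://").map(|(_, rest)| rest).unwrap_or(subject);
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let tail_start = segments.len().saturating_sub(2);
    format!("{}#{}", segments[tail_start..].join("/"), predicate)
}

/// Illustrative in-memory index: tail key -> authoritative subjects.
#[derive(Default)]
struct ConceptIndex {
    by_tail: HashMap<String, Vec<String>>,
}

impl ConceptIndex {
    /// Builds the index from (subject, predicate) pairs.
    fn build(assertions: &[(&str, &str)]) -> Self {
        let mut index = ConceptIndex::default();
        for (subject, predicate) in assertions {
            index
                .by_tail
                .entry(make_key(subject, predicate))
                .or_default()
                .push(subject.to_string());
        }
        index
    }

    /// Returns authoritative subjects sharing the same tail path and predicate.
    fn lookup(&self, subject: &str, predicate: &str) -> &[String] {
        self.by_tail
            .get(&make_key(subject, predicate))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

Because the scheme and leading segments are discarded, a `code://` claim and an `rfc://` assertion land on the same key whenever their last two segments and predicate agree, which is what allows cross-scheme matching.
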
### 2A.2 Alias Resolution in QueryEngine (StemeDB-side fix) ✅

Wired `AliasStore` into `QueryEngine.execute()`:
- Added `resolve_aliases: bool` field to `Query` (defaults to `false`)
- Added `alias_store: Option<Arc<dyn AliasStore>>` to `QueryEngine`
- Added `.with_alias_store()` builder method
- When `resolve_aliases: true`, expands subject via `AliasStore.resolve_all()` before index lookup
- Added `fetch_by_subjects()` and `fetch_by_subjects_predicate()` for multi-subject deduplication
- Modified `Query.matches()` to skip subject filtering when aliases are resolved
- Skips fast path (MV lookup) when `resolve_aliases: true`
- Gracefully degrades when no alias store is configured

7 unit tests in `engine/tests/alias_resolution.rs`. This is the architecturally correct long-term fix that complements leaf matching.

### 2A.3 Auto-Alias Creation ✅

When Aphoria ingests authoritative assertions and code claims that share leaf names, automatically create aliases:
- `code://rust/myapp/tls/cert_verification` ↔ `rfc://5246/tls/cert_verification`
- `code://rust/myapp/auth/jwt/audience_validation` ↔ `rfc://7519/jwt/audience_validation`

This bridges 2A.1 (leaf matching) with 2A.2 (alias resolution): leaf matching identifies candidates, and aliases persist the relationship.

**Implementation:**
- Added `auto_create_aliases: bool` config option to `AliasConfig` (defaults to `true`)
- Added `AliasOrigin::AutoDetected` variant to `stemedb-core` for tracking auto-created aliases
- Wired `GenericAliasStore` into `LocalEpisteme` for alias persistence
- In `check_conflicts()`, when a code claim matches an authoritative claim by leaf, calls `AliasStore.set_alias()` to persist the relationship with `AliasOrigin::AutoDetected`
- Alias creation is idempotent (skips if alias already exists)
- 4 unit tests verify: alias creation on conflict, no creation when disabled, correct origin, idempotency

---

## Phase 1: Authoritative Corpus Expansion ✅

> Expanded from 11 hardcoded assertions to a pluggable corpus system with RFC, OWASP, and Vendor sources.

### Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                      aphoria corpus build                       │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ RFC Ingester │  │    OWASP     │  │  Vendor Bootstrapper  │  │
│  │   (Tier 0)   │  │   Ingester   │  │       (Tier 2)        │  │
│  │              │  │   (Tier 1)   │  │                       │  │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬───────────┘  │
│         │                 │                      │              │
│         └─────────────────┼──────────────────────┘              │
│                           ▼                                     │
│                  ┌─────────────────┐                            │
│                  │ CorpusRegistry  │                            │
│                  └────────┬────────┘                            │
│                           ▼                                     │
│                  ┌─────────────────┐                            │
│                  │  LocalEpisteme  │                            │
│                  │     ingest_     │                            │
│                  │ authoritative() │                            │
│                  └─────────────────┘                            │
└─────────────────────────────────────────────────────────────────┘
```

### 1.1 CorpusBuilder Trait ✅

| Task | Status |
|------|--------|
| `CorpusBuilder` trait | ✅ `corpus/mod.rs`: name, scheme, default_tier, build, requires_network |
| `CorpusRegistry` | ✅ Manages multiple builders, build_all(), list_builders() |
| `CorpusBuildResult` | ✅ Stats per builder, total assertions, success/fail/skip counts |

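
A minimal sketch of how the trait and registry from 1.1 could fit together, assuming the method names listed in the table. The return types, the `offline` parameter, the `register` method, and the `StaticBuilder` stand-in are all hypothetical; the real definitions live in `corpus/mod.rs`.

```rust
/// Hypothetical trait shape inferred from the method names listed in 1.1.
trait CorpusBuilder {
    fn name(&self) -> &str;
    fn scheme(&self) -> &str;
    fn default_tier(&self) -> u8;
    fn requires_network(&self) -> bool;
    /// Simplified: returns assertion subjects instead of full assertions.
    fn build(&self) -> Result<Vec<String>, String>;
}

#[derive(Debug, Default, PartialEq)]
struct CorpusBuildResult {
    total_assertions: usize,
    succeeded: usize,
    failed: usize,
    skipped: usize,
}

#[derive(Default)]
struct CorpusRegistry {
    builders: Vec<Box<dyn CorpusBuilder>>,
}

impl CorpusRegistry {
    fn register(&mut self, builder: Box<dyn CorpusBuilder>) {
        self.builders.push(builder);
    }

    fn list_builders(&self) -> Vec<&str> {
        self.builders.iter().map(|b| b.name()).collect()
    }

    /// Runs every builder; a failing source is counted, not fatal.
    fn build_all(&self, offline: bool) -> CorpusBuildResult {
        let mut result = CorpusBuildResult::default();
        for builder in &self.builders {
            if offline && builder.requires_network() {
                result.skipped += 1;
                continue;
            }
            match builder.build() {
                Ok(assertions) => {
                    result.succeeded += 1;
                    result.total_assertions += assertions.len();
                }
                Err(_) => result.failed += 1,
            }
        }
        result
    }
}

/// Stand-in builder used only for this illustration.
struct StaticBuilder {
    name: &'static str,
    tier: u8,
    network: bool,
    fail: bool,
}

impl CorpusBuilder for StaticBuilder {
    fn name(&self) -> &str { self.name }
    fn scheme(&self) -> &str { self.name }
    fn default_tier(&self) -> u8 { self.tier }
    fn requires_network(&self) -> bool { self.network }
    fn build(&self) -> Result<Vec<String>, String> {
        if self.fail {
            Err(format!("{} build failed", self.name))
        } else {
            Ok(vec![format!("{}://example/claim", self.name)])
        }
    }
}
```

Counting failures instead of returning early mirrors the graceful-degradation behaviour listed under error handling: one unreachable source should not prevent the others from being ingested.
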
### 1.2 RFC Ingester ✅

| Task | Status |
|------|--------|
| `RfcCorpusBuilder` | ✅ `corpus/rfc.rs` |
| HTTP fetching | ✅ Via `ureq`, cached to `~/.cache/aphoria/rfc-cache/` |
| RFC 2119 keyword parsing | ✅ MUST, MUST NOT, SHOULD, SHALL extraction |
| RFC-specific parsers | ✅ JWT (7519), OAuth (6749), Bearer (6750), TLS 1.3 (8446), TLS BCP (7525), TOTP (6238), Basic Auth (7617), HTTP (9110) |
| Concept mapping | ✅ `rfc://{number}/{topic}` at Tier 0 (Regulatory) |

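
RFC 2119 keyword extraction can be approximated with a few lines of standard-library Rust. The sketch below is not the parser in `corpus/rfc.rs`; the `Keyword` enum, both function names, and the naive split on `.` (which would mis-split text such as "TLS 1.3") are assumptions chosen to keep the example short.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Keyword {
    MustNot,
    Must,
    Shall,
    Should,
}

/// Returns the first upper-case RFC 2119 keyword found in a sentence.
fn classify(sentence: &str) -> Option<Keyword> {
    // Split on non-letters so "MUST," and "MUST." still match as whole words.
    let words: Vec<&str> = sentence
        .split(|c: char| !c.is_ascii_alphabetic())
        .filter(|w| !w.is_empty())
        .collect();
    for (i, word) in words.iter().enumerate() {
        match *word {
            // "MUST NOT" has to be checked before plain "MUST".
            "MUST" if words.get(i + 1) == Some(&"NOT") => return Some(Keyword::MustNot),
            "MUST" => return Some(Keyword::Must),
            "SHALL" => return Some(Keyword::Shall),
            "SHOULD" => return Some(Keyword::Should),
            _ => {}
        }
    }
    None
}

/// Naive sentence splitter: keeps only sentences carrying a normative keyword.
fn extract_normative(text: &str) -> Vec<(Keyword, String)> {
    text.split('.')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .filter_map(|s| classify(s).map(|k| (k, s.to_string())))
        .collect()
}
```

Matching is case-sensitive on purpose: RFC 2119 gives the keywords their normative meaning only when they appear in capitals, so a lower-case "should" in running prose is ignored.
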
### 1.3 OWASP Ingester ✅

| Task | Status |
|------|--------|
| `OwaspCorpusBuilder` | ✅ `corpus/owasp.rs` |
| HTTP fetching | ✅ From GitHub raw content, cached to `~/.cache/aphoria/owasp-cache/` |
| Markdown parsing | ✅ MUST/SHOULD statements, section context |
| Cheat sheet parsers | ✅ Authentication, JWT, TLS, Secrets, Input Validation, Session, CSRF, Password Storage, HTTP Headers |
| Concept mapping | ✅ `owasp://cheatsheet/{topic}/{claim}` at Tier 1 (Clinical) |

### 1.4 Vendor Docs ✅

| Task | Status |
|------|--------|
| `VendorCorpusBuilder` | ✅ `corpus/vendor.rs` |
| PostgreSQL claims | ✅ pool_size, idle_timeout, ssl_mode |
| Redis claims | ✅ timeout, max_retries, tls |
| reqwest claims | ✅ cert_verification, connect_timeout, request_timeout |
| hyper claims | ✅ keep_alive_timeout, max_concurrent_streams |
| Go net/http claims | ✅ read_timeout, write_timeout, idle_timeout, min_tls_version |
| tokio-postgres claims | ✅ pool_size, ssl_mode |
| SQLx claims | ✅ max_connections, idle_timeout |
| Concept mapping | ✅ `vendor://{product}/{topic}/{claim}` at Tier 2 (Observational) |

### 1.5 Hardcoded Refactor ✅

| Task | Status |
|------|--------|
| `HardcodedCorpusBuilder` | ✅ `corpus/hardcoded.rs`: original 11 assertions |
| `create_authoritative_assertion()` | ✅ Made public in `episteme.rs` for corpus builders |

### 1.6 CLI Integration ✅

| Task | Status |
|------|--------|
| `aphoria corpus build` | ✅ Fetches and ingests from all sources |
| `--only rfc,owasp,vendor` | ✅ Filter to specific sources |
| `--offline` | ✅ Skip network-requiring sources |
| `--clear-cache` | ✅ Clear cache before building |
| `aphoria corpus list` | ✅ List available corpus sources |
| `CorpusConfig` | ✅ cache_dir, include_*, rfc_list options |

### 1.7 Error Handling ✅

| Task | Status |
|------|--------|
| `RfcFetch` error | ✅ Per-RFC fetch failures with context |
| `OwaspFetch` error | ✅ Per-cheat-sheet fetch failures with context |
| `CorpusBuild` error | ✅ General corpus build failures |
| Graceful degradation | ✅ Continue with other sources if one fails |

**Files:** `corpus/mod.rs`, `corpus/hardcoded.rs`, `corpus/rfc.rs`, `corpus/owasp.rs`, `corpus/vendor.rs`

---

## Phase 3: Skill Integration ✅

> Complete. Aphoria is now usable in Claude Code agent workflows.

### 3.1 Claude Code Skill ✅

| Task | Status |
|------|--------|
| `skill/SKILL.md` | ✅ Comprehensive skill definition with all commands |
| `/aphoria scan` | ✅ Scan project, show conflicts grouped by verdict |
| `/aphoria scan --fix` | ✅ Interactive fix workflow |
| `/aphoria ack` | ✅ Acknowledge conflicts as intentional |
| `/aphoria status` | ✅ Show status and baseline |
| `/aphoria diff` | ✅ Show changes since baseline |
| `/aphoria init` | ✅ Initialize Aphoria |
| `/aphoria baseline` | ✅ Set baseline |
| `skill/install.sh` | ✅ Install script for `~/.claude/skills/aphoria/` |

**Files:** `skill/SKILL.md`, `skill/install.sh`, `skill/hooks.json`

### 3.2 Agent Pre-Flight Hook ✅

| Task | Status |
|------|--------|
| `--exit-code` flag | ✅ Returns 2 for BLOCK, 1 for FLAG only, 0 for clean |
| `--strict` flag | ✅ Lower thresholds (FLAG at 0.3, BLOCK at 0.5) |
| Hook template | ✅ `skill/hooks.json` with PreCommit and PrePush examples |

**Usage:**
```json
{
  "hooks": {
    "PreCommit": [{"command": "aphoria scan --format sarif --exit-code"}],
    "PrePush": [{"command": "aphoria scan --strict --exit-code"}]
  }
}
```

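
The exit-code contract behind `--exit-code` reduces to a small mapping. The sketch below assumes a `Verdict` enum with these four variants (only `Verdict::Drift` is named elsewhere in this roadmap) and a free function `exit_code`; both are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Pass,
    Flag,
    Drift,
    Block,
}

/// Maps scan verdicts to a process exit code: BLOCK=2, FLAG or DRIFT=1, clean=0.
fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Block) {
        2
    } else if verdicts
        .iter()
        .any(|v| matches!(v, Verdict::Flag | Verdict::Drift))
    {
        1
    } else {
        0
    }
}
```

A hook runner only needs to treat any non-zero status as a failure, while CI can distinguish 1 from 2 to decide between a warning and a hard stop.
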
### 3.3 Alias Suggestion Workflow ✅

Alias creation is now automatic (Phase 2A.3). When Aphoria scans:
1. Tail-path matching finds authoritative assertions
2. Aliases are auto-created with `AliasOrigin::AutoDetected`
3. Future queries use the alias automatically

The skill documents the suggestion flow for manual alias management:
- **y (Accept)**: Creates alias
- **n (Reject)**: Records intentional difference
- **defer**: Flags for later review

---

## Phase 4: Full-Cycle Pre-Commit (Scan + Sync) ✅

> **Vision:** The pre-commit hook is a **bidirectional knowledge sync**, not just a read-only linter. Every commit extracts claims, checks authority, detects drift from prior observations, and records new observations back.

**Spec:** [uat/2026-02-04-full-cycle-precommit-vision.md](uat/2026-02-04-full-cycle-precommit-vision.md)

```
┌────────────────────────────────────────────────────────────┐
│                      PRE-COMMIT FLOW                       │
├────────────────────────────────────────────────────────────┤
│  1. EXTRACT  → What claims does this code make?            │
│  2. CHECK    → Against authority + own prior claims        │
│  3. CLASSIFY → Authority conflict | Self conflict | Novel  │
│  4. UPDATE   → Record observations to local Episteme       │
│  5. GATE     → Exit code (BLOCK=2, FLAG=1, PASS=0)         │
└────────────────────────────────────────────────────────────┘
```

### 4.1 Git Pre-Commit Hook ✅

All flags needed for pre-commit integration are implemented. Note that `--sync` requires `--persist` (see 4A):

```bash
#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --staged --persist --sync --exit-code
```

Or using the pre-commit framework:

```yaml
repos:
  - repo: local
    hooks:
      - id: aphoria
        name: Aphoria Truth Sync
        entry: aphoria scan --staged --persist --sync --exit-code
        language: system
        pass_filenames: false
```

### 4.2 Baseline Mode ✅

Already implemented in Phase 2.

### 4A: Observational Claims ✅

Record code claims as Tier 4 (Community) assertions when no authority conflict exists:

| Task | Status |
|------|--------|
| `sync: bool` in ScanArgs | ✅ `types/command.rs` |
| `observations_recorded: usize` in ScanResult | ✅ `types/result.rs` |
| `--sync` CLI flag | ✅ `cli.rs`: requires `--persist` |
| `claim_to_observation()` | ✅ `bridge.rs`: creates Tier 4 (Community, 0.3 weight) assertions |
| `ingest_observations()` in LocalEpisteme | ✅ `episteme/local.rs`: writes to WAL + predicate index |
| Scan flow integration | ✅ `scan.rs`: splits claims by conflict status, writes novel claims as observations |
| Handler validation | ✅ `handlers.rs`: `--sync requires --persist` error |
| Report output | ✅ `report/table.rs`, `report/json.rs`: shows observation count |
| Tests | ✅ 5 new tests for observation write-back |

```
Code:      connection_pool.max_size = 25
Authority: (nothing)
Action:    Record as Tier 4 observation (project memory)
```

**Usage:**
```bash
# Scan with observation write-back
aphoria scan --persist --sync

# Output:
# Recorded 45 observations (project memory)
```

### 4B: Self-Conflict Detection ✅

Detect drift from the project's own prior observations:

| Task | Status |
|------|--------|
| Query prior claims before conflict check | ✅ `fetch_observations_for_concept()` |
| Compare current vs stored observations | ✅ `check_drift()` compares values |
| Report changes as SELF-CONFLICT | ✅ DriftResult with prior/current values |
| New verdict: `Drift` (distinct from Block/Flag) | ✅ `Verdict::Drift` |
| Drift reporting in all formats | ✅ table, json, markdown, sarif |
| Exit code includes drift | ✅ `--exit-code` returns 1 for drift |

```
Prior:  db/pool_size = 25 (recorded 2026-01-15)
Now:    db/pool_size = 100
Result: DRIFT, "You changed pool_size from 25 to 100. Intentional?"
```

**Files:** `types/result.rs`, `types/verdict.rs`, `episteme/local.rs`, `scan.rs`, `report/*.rs`

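
The drift comparison in 4B amounts to a lookup of each current claim against the stored observation for the same concept. The following is a simplified sketch: the string-valued claims, the `HashMap` of prior observations, and the field names on `DriftResult` are assumptions, not the types in `types/result.rs`.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
struct DriftResult {
    concept: String,
    prior: String,
    current: String,
}

/// Reports every concept whose current value differs from the recorded one.
/// Concepts with no prior observation are novel, not drift.
fn check_drift(
    prior: &HashMap<String, String>,
    current: &[(String, String)],
) -> Vec<DriftResult> {
    current
        .iter()
        .filter_map(|(concept, value)| {
            let previous = prior.get(concept)?;
            if previous == value {
                None
            } else {
                Some(DriftResult {
                    concept: concept.clone(),
                    prior: previous.clone(),
                    current: value.clone(),
                })
            }
        })
        .collect()
}
```

Keeping novel claims out of the drift list matters for the flow in 4A: those claims are recorded as new observations rather than reported to the developer.
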
### 4C: Diff-Only Scanning ✅

Fast scanning for pre-commit hooks:

| Task | Status |
|------|--------|
| `FileSource` enum (All, Staged) | ✅ `types/command.rs` |
| `--staged` flag (git diff --cached) | ✅ `cli.rs`, `handlers.rs` |
| `walker/git.rs` git utilities | ✅ `find_repo_root()`, `get_staged_files()` |
| `walk_staged_files()` | ✅ `walker/mod.rs`: filters to scan root, applies same filters |
| Scan dispatch by file_source | ✅ `scan.rs` |
| Error handling (NotGitRepo, GitCommand) | ✅ `error.rs` |
| Tests | ✅ 9 tests in `tests/staged_scanning.rs` |
| Target: < 500ms for staged-only | ✅ |

**Files:** `types/command.rs`, `walker/git.rs`, `walker/mod.rs`, `scan.rs`, `cli.rs`, `handlers.rs`, `error.rs`

**Usage:**
```bash
# Pre-commit hook (fast, staged files only)
aphoria scan --staged --exit-code

# Full cycle with observation sync
aphoria scan --staged --persist --sync --exit-code
```

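
Listing staged files is a thin wrapper around `git diff --cached`. The sketch below shows one way to do it with `std::process::Command`; it is not the code in `walker/git.rs`. The `--diff-filter=ACMR` choice (skip deleted files), the `io::Error` return type, and the `parse_staged_output` helper are assumptions for illustration.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Turns the newline-separated output of `git diff --name-only` into paths.
fn parse_staged_output(stdout: &str) -> Vec<PathBuf> {
    stdout
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(PathBuf::from)
        .collect()
}

/// Asks git for staged files (added, copied, modified, renamed) under `repo_root`.
fn get_staged_files(repo_root: &Path) -> io::Result<Vec<PathBuf>> {
    let output = Command::new("git")
        .arg("-C")
        .arg(repo_root)
        .args(["diff", "--cached", "--name-only", "--diff-filter=ACMR"])
        .output()?;
    if !output.status.success() {
        // Covers both "not a git repository" and other git failures.
        return Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(parse_staged_output(&String::from_utf8_lossy(&output.stdout)))
}
```

Separating parsing from process execution keeps the path handling testable without a repository on disk.
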
### 4D: Enhanced Ack ✅

Acknowledgments with rationale and policy updates:

| Task | Status |
|------|--------|
| `--reason "text"` flag | ✅ `cli.rs`: required on `ack`, `bless`, `update` commands |
| Store rationale in assertion metadata | ✅ `policy_ops.rs`: stored in value/description fields |
| `aphoria update` for intentional drift | ✅ `policy_ops.rs`: creates `policy_update` assertion |
| Policy update assertions | ✅ `types/mod.rs`: `predicates::POLICY_UPDATE` |

**Files:** `cli.rs`, `handlers.rs`, `policy_ops.rs`, `types/command.rs`, `types/mod.rs`

```bash
$ aphoria ack db/pool_size --reason "Scaling for Black Friday"
$ aphoria update db/pool_size 100 --reason "New baseline after load test"
```

### 4E: Hosted Mode ✅

Organizations run their own StemeDB server and all team members automatically sync observations:

| Task | Status |
|------|--------|
| `HostedConfig` in config.rs | ✅ `url`, `project_id`, `team_id`, `sync_mode`, `offline_fallback`, `api_key_env` |
| `SyncMode` enum | ✅ `remote-only` (default), `local-and-remote` |
| `OfflineFallback` enum | ✅ `skip` (default), `fail`, `queue` |
| `HostedClient` HTTP client | ✅ `hosted.rs`: retry logic, auth headers, observation push |
| `POST /v1/aphoria/observations` endpoint | ✅ Server receives observations with project/team metadata |
| Scan integration | ✅ Auto-enables sync when `[hosted]` configured |
| `Hosted(String)` error variant | ✅ For connection/auth failures |
| Graceful offline fallback | ✅ Based on `offline_fallback` config |
| Tests | ✅ Config parsing, client creation, assertion conversion |

```toml
# aphoria.toml
[hosted]
url = "https://episteme.acme.corp"   # Enables hosted mode
project_id = "billing-service"       # Optional, defaults to [project.name]
team_id = "platform-team"            # Optional, for multi-team servers
sync_mode = "remote-only"            # "remote-only" | "local-and-remote"
offline_fallback = "skip"            # "skip" | "fail" | "queue"
api_key_env = "APHORIA_API_KEY"      # Env var for auth token
```

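
The two string-valued options in `[hosted]` map naturally onto enums with defaults. This sketch uses `FromStr` from the standard library purely for illustration; the actual `config.rs` may well rely on a deserialization library instead, and the variant names here are assumed from the documented string values.

```rust
use std::str::FromStr;

#[derive(Debug, Default, PartialEq, Clone, Copy)]
enum SyncMode {
    #[default]
    RemoteOnly,
    LocalAndRemote,
}

#[derive(Debug, Default, PartialEq, Clone, Copy)]
enum OfflineFallback {
    #[default]
    Skip,
    Fail,
    Queue,
}

impl FromStr for SyncMode {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "remote-only" => Ok(SyncMode::RemoteOnly),
            "local-and-remote" => Ok(SyncMode::LocalAndRemote),
            other => Err(format!("unknown sync_mode: {other}")),
        }
    }
}

impl FromStr for OfflineFallback {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "skip" => Ok(OfflineFallback::Skip),
            "fail" => Ok(OfflineFallback::Fail),
            "queue" => Ok(OfflineFallback::Queue),
            other => Err(format!("unknown offline_fallback: {other}")),
        }
    }
}
```

Rejecting unknown strings at parse time means a typo in `aphoria.toml` surfaces as a configuration error rather than silently falling back to a default.
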
**Architecture:**
```
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Developer A  │  │ Developer B  │  │ Developer C  │
│ aphoria scan │  │ aphoria scan │  │ aphoria scan │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
              ┌─────────────────────┐
              │ Team StemeDB Server │
              │ POST /v1/aphoria/   │
              │   observations      │
              └─────────────────────┘
                         │
                         ▼
             Aggregated team patterns
```

**Files:** `config.rs`, `hosted.rs`, `scan.rs`, `error.rs`, `lib.rs`, `crates/stemedb-api/src/handlers/aphoria.rs`, `crates/stemedb-api/src/dto/aphoria.rs`

---

## Phase 4.5: Ephemeral Scan Mode ✅

> Performance optimization: 40x faster scans by skipping Episteme storage when persistence isn't needed.

### Problem

Every `aphoria scan` was slow because it initialized the full Episteme stack:
- WAL recovery (O(n) on every startup)
- Dual backend initialization (fjall + redb)
- Store and index initialization

Conflict detection, however, runs entirely in memory and never reads from the KV store. The authoritative corpus is built fresh each time, and code claims are extracted fresh on each scan.

### Solution

Added `ScanMode` enum with two modes:

| Mode | Use Case | Storage | Performance |
|------|----------|---------|-------------|
| **Ephemeral** (default) | CI, pre-commit, quick checks | None | ~0.25 seconds |
| **Persistent** | Baseline/diff tracking, alias creation | WAL + store | ~1-2 seconds |

### Implementation ✅

| Task | Status |
|------|--------|
| `ScanMode` enum | ✅ `types.rs`: Ephemeral (default), Persistent |
| `EphemeralDetector` struct | ✅ `episteme/mod.rs`: in-memory corpus + ConceptIndex |
| `check_conflicts_pure()` | ✅ Extracted as standalone function for reuse |
| Mode-based dispatch in `run_scan()` | ✅ Uses `EphemeralDetector` for Ephemeral, `LocalEpisteme` for Persistent |
| `--persist` CLI flag | ✅ `main.rs`: opt-in to persistent mode |
| Tests for both modes | ✅ `test_ephemeral_scan_no_storage_created`, `test_persistent_scan_creates_storage`, `test_scan_modes_produce_same_conflicts` |

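
How the `--persist` and `--sync` flags could resolve to a scan mode is sketched below. The `resolve_scan_mode` function is hypothetical; only the `ScanMode` variants, the Ephemeral default, and the `--sync requires --persist` rule come from this roadmap.

```rust
#[derive(Debug, Default, PartialEq, Clone, Copy)]
enum ScanMode {
    /// No storage is created; conflicts are computed in memory.
    #[default]
    Ephemeral,
    /// WAL and store are opened so baselines, diffs, and aliases can persist.
    Persistent,
}

/// Derives the scan mode from CLI flags, rejecting `--sync` without `--persist`.
fn resolve_scan_mode(persist: bool, sync: bool) -> Result<ScanMode, String> {
    match (persist, sync) {
        (false, true) => Err("--sync requires --persist".to_string()),
        (true, _) => Ok(ScanMode::Persistent),
        (false, false) => Ok(ScanMode::Ephemeral),
    }
}
```

Validating the flag combination before any storage is opened keeps the fast path free of side effects: an invalid invocation fails without creating WAL or store directories.
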
### Usage

```bash
# Fast ephemeral scan (default): no storage created
aphoria scan .

# Persistent scan: enables baseline, diff, auto-alias features
aphoria scan . --persist
```

### Performance

| Mode | Time | Storage |
|------|------|---------|
| Ephemeral | ~0.25s | None |
| Persistent | ~1-2s | WAL + store directories |

**Files:** `types.rs`, `episteme/mod.rs`, `lib.rs`, `main.rs`, `tests.rs`

---

## Phase 5: Research Agent Loop ✅

> Research agent fills gaps in authoritative coverage by researching official documentation.

### 5.1 Gap Detection ✅

| Task | Status |
|------|--------|
| `Gap` struct | ✅ `research/gap_detector.rs`: concept_path, topic, predicate, source info |
| `detect_gaps()` | ✅ Compares claims against ConceptIndex, identifies missing coverage |
| Topic normalization | ✅ Extracts last 2 path segments for cross-scheme matching |
| Deduplication | ✅ Deduplicates gaps by topic+predicate key |

### 5.2 Gap Storage ✅

| Task | Status |
|------|--------|
| `GapRecord` | ✅ `research/gap_store.rs`: tracking metadata, project count, research status |
| `GapStore` | ✅ JSON-backed persistent storage with atomic saves |
| Project tracking | ✅ Records which projects reported each gap |
| Research eligibility | ✅ `is_eligible_for_research()` with threshold and cooldown |
| Gap pruning | ✅ `prune_old_gaps()` removes stale entries |

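
Research eligibility combines a project-count threshold with a cooldown. A minimal sketch, assuming Unix-second timestamps and the field names shown here (the real `GapRecord` in `research/gap_store.rs` carries more metadata, and its signature may differ):

```rust
/// Simplified gap record; field names are illustrative.
struct GapRecord {
    /// Number of distinct projects that reported this gap.
    project_count: usize,
    /// Unix timestamp (seconds) of the last research attempt, if any.
    last_researched_at: Option<u64>,
}

/// A gap qualifies once enough projects report it and the cooldown has elapsed.
fn is_eligible_for_research(
    gap: &GapRecord,
    threshold: usize,
    cooldown_secs: u64,
    now: u64,
) -> bool {
    if gap.project_count < threshold {
        return false;
    }
    match gap.last_researched_at {
        None => true,
        // saturating_sub guards against clock skew producing a negative age.
        Some(last) => now.saturating_sub(last) >= cooldown_secs,
    }
}
```

With the documented default of `--threshold 3`, a gap seen by two projects stays dormant, and a gap that was just researched is not retried until the cooldown passes.
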
### 5.3 Quality Validation ✅

| Task | Status |
|------|--------|
| `QualityValidator` | ✅ `research/quality.rs`: validates researched claims |
| Source attribution | ✅ Checks for authoritative domains (rfc-editor, owasp, vendor docs) |
| Normative language | ✅ Verifies MUST/SHOULD/SHALL keywords present |
| Vague content detection | ✅ Rejects "it depends", "typically", etc. |
| Consistency scoring | ✅ Detects conflicting claims on same subject |
| `QualityReport` | ✅ Detailed per-claim validation results |
| `filter_passed()` | ✅ Returns only claims meeting quality threshold |

### 5.4 Research Execution ✅

| Task | Status |
|------|--------|
| `Researcher` | ✅ `research/researcher.rs`: orchestrates research pipeline |
| `DocumentationSource` | ✅ Configurable sources with URL patterns and topics |
| Default sources | ✅ Redis, PostgreSQL, Go, Rust, OWASP, Kafka, MongoDB |
| Content fetching | ✅ HTTP with timeout and size limits |
| Normative extraction | ✅ Regex-based MUST/SHOULD/SHALL extraction |
| Section tracking | ✅ Extracts heading context for attribution |
| Confidence scoring | ✅ Based on keyword strength, statement length, content size |

### 5.5 CLI Integration ✅

| Task | Status |
|------|--------|
| `aphoria research run` | ✅ Run research agent with configurable threshold |
| `aphoria research status` | ✅ Show gap statistics and research progress |
| `aphoria research gaps` | ✅ List gaps by project count |
| `--threshold` | ✅ Minimum projects before researching (default: 3) |
| `--strict` | ✅ Use strict quality validation |
| `--prune` | ✅ Remove stale gaps before researching |
| `--ready` | ✅ Show only gaps ready for research |

**Files:** `research/mod.rs`, `research/gap_detector.rs`, `research/gap_store.rs`, `research/quality.rs`, `research/researcher.rs`, `research/tests.rs`

### 5.7 Security Extractors ✅

Extended Phase 2 extractors with OWASP-aligned security vulnerability detection:

| Extractor | Detects | Languages |
|-----------|---------|-----------|
| `weak_crypto` | MD5, SHA1, DES, RC4 usage | Rust, Go, Python, JS/TS |
| `command_injection` | Shell execution, os.system, subprocess shell=True | Rust, Go, Python, JS/TS |
| `sql_injection` | String concatenation in SQL queries | Rust, Go, Python, JS/TS |

**Concept paths:**
- `crypto/hashing/algorithm`: MD5, SHA1
- `crypto/encryption/algorithm`: DES, RC4
- `os/command/input`, `os/shell_mode`: command injection
- `db/query/input`: SQL injection

### 5.6 Community Corpus Contributions ✅

> Users can opt in to contribute patterns anonymously to a central corpus, enabling community consensus to adjust default thresholds.

| Task | Status |
|------|--------|
| `CommunityConfig` | ✅ `config/mod.rs`: enabled (false), anonymize (true), exclude, include, min_confidence |
| `AnonymizedObservation` | ✅ `community/types.rs`: privacy-preserving observation without file/line/text |
| `CommunityObjectValue` | ✅ `community/types.rs`: serde-compatible version of ObjectValue |
| `PatternAggregate` | ✅ `community/types.rs`: server-side aggregation with project counts |
| `anonymize_claim()` | ✅ `community/anonymizer.rs`: wildcards project names, strips file/line, rounds timestamps |
| `compute_anon_hash()` | ✅ Hash computed WITHOUT file/line/text (privacy-critical) |
| `wildcard_project_path()` | ✅ `code://rust/myapp/tls` → `code://rust/*/tls` |
| `--community-preview` flag | ✅ `cli.rs`: dry-run showing what WOULD be shared |
| `PatternAggregateStore` | ✅ `stemedb-storage`: server-side pattern aggregation |
| Project deduplication | ✅ Uses project_hash to prevent double-counting |
| `POST /v1/aphoria/community/observations` | ✅ Push anonymized observations |
| `GET /v1/aphoria/patterns` | ✅ Retrieve high-confidence community patterns |

**Privacy Model:**
- Project names wildcarded: `myapp` → `*`
- File paths, line numbers, matched text NEVER shared
- Timestamps rounded to hour (k-anonymity)
- Server receives `project_hash`, not raw project names
- `enabled` defaults to `false` (explicit opt-in required)
- `anonymize` defaults to `true` (privacy-preserving by default)

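
Two of the anonymization steps in the privacy model are simple enough to sketch directly. The version below assumes that, for `code://` paths, the project name is the second segment after the scheme (as in `code://rust/myapp/tls`) and that other schemes are passed through unchanged; `round_to_hour` is a hypothetical helper name. The code in `community/anonymizer.rs` may handle more cases.

```rust
/// Replaces the project segment of a `code://<lang>/<project>/...` path with `*`.
/// Paths with other schemes, or too few segments, are returned unchanged.
fn wildcard_project_path(path: &str) -> String {
    let Some((scheme, rest)) = path.split_once("://") else {
        return path.to_string();
    };
    if scheme != "code" {
        return path.to_string();
    }
    let mut segments: Vec<&str> = rest.split('/').collect();
    if segments.len() < 2 {
        return path.to_string();
    }
    // Assumed layout: segments[0] is the language, segments[1] the project name.
    segments[1] = "*";
    format!("{}://{}", scheme, segments.join("/"))
}

/// Truncates a Unix timestamp (seconds) to the start of its hour.
fn round_to_hour(timestamp_secs: u64) -> u64 {
    timestamp_secs - timestamp_secs % 3600
}
```

Both transformations are lossy by design: once the project segment and sub-hour timing are gone, the server cannot reconstruct them from the observation alone.
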
**Usage:**
```bash
# Preview what would be shared (no network)
aphoria scan --community-preview

# Enable in aphoria.toml:
#   [community]
#   enabled = true
#   anonymize = true
#   min_confidence = 0.8
#   exclude = ["vendor://acme/internal/*"]

# Scan with sync to share patterns
aphoria scan --persist --sync
```

**Files:** `community/mod.rs`, `community/types.rs`, `community/anonymizer.rs`, `config/mod.rs`, `cli.rs`, `handlers.rs`, `stemedb-storage/src/pattern_aggregate_store/`

---

## Phase 6: Federated Policy & Trust Packs ✅

> Allow teams to define their own authoritative truths and distribute them as signed Trust Packs. This enables "Enterprise Grade" compliance across distributed teams.

### 6.1 Trust Pack Format ✅

| Task | Status |
|------|--------|
| `TrustPack` schema | ✅ `policy.rs`: Assertions, Aliases, Metadata, Signature |
| `PackHeader` | ✅ Name, version, issuer, timestamp |
| Serialization | ✅ `rkyv` for zero-copy efficiency |
| Signing | ✅ `ed25519-dalek` signing and verification |

### 6.2 Policy Management ✅

| Task | Status |
|------|--------|
| `PolicyManager` | ✅ Loads local and remote (HTTP/HTTPS) policies |
| Caching | ✅ Caches remote policies in `~/.cache/aphoria/policies/` |
| `aphoria.toml` config | ✅ `policies` list support |

### 6.3 Core Integration ✅

| Task | Status |
|------|--------|
| `EphemeralDetector` integration | ✅ Ingests policies into memory corpus/index |
| `check_conflicts_pure` update | ✅ Resolves policy aliases before authoritative lookup |
| `LocalEpisteme` export helpers | ✅ `fetch_acknowledgments`, `fetch_manual_aliases` |

### 6.4 CLI Commands ✅

| Task | Status |
|------|--------|
| `aphoria policy export` | ✅ Exports local `ack` decisions as a Trust Pack |
| `aphoria scan` policy loading | ✅ Auto-loads policies from config |

**Files:** `policy.rs`, `config.rs`, `episteme/mod.rs`, `lib.rs`, `main.rs`

---

## Phase 6.5: Trust Pack Extensions ✅

> Enhancements to Trust Packs for semantic predicate matching and key management.

### 6.5.1 Predicate Aliases ✅

**Status:** Complete
**Implemented:** 2026-02-06

**User Story:**
> As a security architect, when my policy uses `required=true` but the extractor emits `enabled=true`, I need them to match semantically.

**Problem:**
- Policy blesses: `code://standard/tls/cert_verification` with predicate `required`, value `true`
- Extractor emits: `code://config/tls/cert_verification` with predicate `enabled`, value `false`
- Tail-path matching finds the concept (`tls/cert_verification`) ✓
- But predicates differ (`required` vs `enabled`), so no conflict is detected ✗

**Solution:**

| Task | Description |
|------|-------------|
| `predicate_aliases` field | Add to Trust Pack schema |
| Default aliases | `enabled` ↔ `required` ↔ `mandatory` ↔ `enforced` |
| ConceptIndex update | Check aliases during lookup |
| Pack-defined aliases | Allow packs to specify custom alias sets |

**Trust Pack Schema Extension:**
```toml
# In Trust Pack
[predicate_aliases]
security_enabled = ["enabled", "required", "mandatory", "enforced", "active"]
version_minimum = ["min_version", "minimum_version", "tls_min_version"]
```

**Implementation Plan:**
1. Add `predicate_aliases: HashMap<String, Vec<String>>` to `TrustPack`
2. Store aliases alongside assertions during import
3. Update `ConceptIndex.make_key()` to normalize predicates via aliases
4. Match during conflict detection: if `predicate_a` aliases to `predicate_b`, treat as same concept

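
Steps 3 and 4 of the plan above come down to mapping every predicate to a canonical group name before comparison. The sketch below takes the same `HashMap<String, Vec<String>>` shape as the schema; the `PredicateAliases` type and its method names are hypothetical and not taken from `policy.rs`.

```rust
use std::collections::HashMap;

/// Reverse lookup from each alias to its canonical group name.
struct PredicateAliases {
    canonical: HashMap<String, String>,
}

impl PredicateAliases {
    /// Builds the reverse map from `group -> [aliases]` as stored in a Trust Pack.
    fn from_groups(groups: &HashMap<String, Vec<String>>) -> Self {
        let mut canonical = HashMap::new();
        for (group, aliases) in groups {
            // The group name itself also normalizes to the group.
            canonical.insert(group.clone(), group.clone());
            for alias in aliases {
                canonical.insert(alias.clone(), group.clone());
            }
        }
        PredicateAliases { canonical }
    }

    /// Returns the canonical name, or the predicate itself when it has no alias.
    fn normalize<'a>(&'a self, predicate: &'a str) -> &'a str {
        self.canonical
            .get(predicate)
            .map(String::as_str)
            .unwrap_or(predicate)
    }

    /// Two predicates match when they normalize to the same name.
    fn same_predicate(&self, a: &str, b: &str) -> bool {
        self.normalize(a) == self.normalize(b)
    }
}
```

Normalizing at key-construction time keeps lookup cost unchanged: the index still does a single hash lookup, only on the canonical predicate instead of the raw one. If one alias appeared in two groups, the last group processed would win, so packs would presumably need to keep alias sets disjoint.
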
### 6.5.2 Pack Signing Key Rotation ✅

**Status:** Complete
**Implemented:** 2026-02-06

**User Story:**
> As a security admin, when our signing key is rotated, I need to re-sign all packs without losing policy content.

**Problem:**
- Trust Packs are signed with Ed25519 keys
- When keys are rotated (security best practice), existing packs become unverifiable
- Need to re-sign packs with new key while preserving content hash

**Solution:**

| Task | Description |
|------|-------------|
| `aphoria policy resign` | CLI command to re-sign pack with new key |
| Content hash preservation | Keep `content_hash` unchanged, only update signature |
| Key rotation audit | Log key rotation events |
| Old signature archival | Optionally keep old signature for audit trail |

**CLI:**
```bash
# Re-sign pack with new key
aphoria policy resign my-standards.pack --key-file new-private-key.pem

# Re-sign with signature chain (audit trail)
aphoria policy resign my-standards.pack --key-file new-key.pem --chain-signatures
```

**Trust Pack Schema Extension:**
```rust
pub struct TrustPack {
    // Existing fields...
    pub signature: Signature,

    // New field for key rotation audit
    pub signature_chain: Option<Vec<SignatureRecord>>,
}

pub struct SignatureRecord {
    pub issuer_public_key: [u8; 32],
    pub signature: Signature,
    pub signed_at: DateTime<Utc>,
    pub reason: Option<String>,  // "Key rotation", "Security incident", etc.
}
```

### 6.5.3 Priority

| Feature | Priority | Trigger |
|---------|----------|---------|
| Predicate Aliases | Medium | Enterprise feedback showing predicate naming conflicts |
| Key Rotation | Low | Enterprise security key management requirements |

**Documented in:** [uat/future-scenarios.md](uat/future-scenarios.md)

---

## Phase 7: Declarative Extractors ✅

> Enable users to define new extractors in config/policy files (TOML) without writing Rust code. This removes the recompilation bottleneck for custom pattern enforcement.

**User Outcome:** "I added a custom extractor to my aphoria.toml that detects our company's deprecated API patterns. Now every scan flags files using the old pattern without me writing any Rust code."

### 7.1 Core Types ✅

| Task | Status |
|------|--------|
| `DeclarativeExtractorDef` | ✅ `extractors/declarative.rs`: name, description, languages, pattern, claim, confidence |
| `DeclarativeClaimDef` | ✅ subject, predicate, value specification |
| `DeclarativeValue` enum | ✅ MatchedText, Boolean, Text variants |
| `DeclarativeExtractor` | ✅ Compiled extractor with `Extractor` trait impl |

### 7.2 Configuration ✅

| Task | Status |
|------|--------|
| `ExtractorConfig.declarative` | ✅ `config/mod.rs`: `Vec<DeclarativeExtractorDef>` |
| TOML parsing | ✅ Serde deserialization with `#[serde(untagged)]` for value types |
| Example config | ✅ Documented in module and config docs |

**Example aphoria.toml:**
```toml
[[extractors.declarative]]
name = "deprecated_api_v1"
description = "Detects usage of deprecated v1 API endpoints"
languages = ["go", "rust", "python"]
pattern = '/api/v1/\w+'
claim.subject = "api/deprecated_endpoint"
claim.predicate = "version"
claim.value = "v1"
confidence = 1.0

[[extractors.declarative]]
name = "legacy_encryption"
description = "Detects legacy encryption algorithms"
languages = ["rust", "go", "python", "javascript"]
pattern = '(?i)blowfish|twofish|cast5'
claim.subject = "crypto/encryption/algorithm"
claim.predicate = "algorithm"
claim.value_from_match = true
confidence = 0.9
```

### 7.3 Validation & Security ✅

| Task | Status |
|------|--------|
| Name validation | ✅ Non-empty required |
| Subject/predicate validation | ✅ Non-empty required |
| Confidence validation | ✅ Must be 0.0-1.0 |
| Regex validation | ✅ Compiled at load time, not scan time |
| ReDoS protection | ✅ `RegexBuilder` with 10MB size limits |
| Language parsing | ✅ `Language::from_str()` with `FromStr` trait |
| Graceful failure | ✅ Invalid extractors logged as warnings, don't block others |

### 7.4 Registry Integration ✅

| Task | Status |
|------|--------|
| Module export | ✅ `extractors/mod.rs`: public types |
| Registry registration | ✅ `ExtractorRegistry::new()` loads from config |
| Enable/disable support | ✅ Declarative extractors respect `disabled` list |
| Runtime addition | ✅ `add_from_definitions()` for Trust Pack integration |

### 7.5 Error Handling ✅

| Task | Status |
|------|--------|
| `DeclarativeExtractor` error variant | ✅ `error.rs`: name + message |
| Validation errors | ✅ Clear messages for each failure mode |
| Structured logging | ✅ `tracing::warn!` for compilation failures |

### 7.6 Tests ✅

| Task | Status |
|------|--------|
| Unit tests | ✅ 22 tests in `declarative.rs` |
| Registry tests | ✅ 7 tests for integration |
| Validation tests | ✅ Empty name, subject, predicate; invalid confidence, regex, language |
| Extraction tests | ✅ Boolean, text, matched_text value types |
| Deserialization tests | ✅ TOML parsing for all value types |

**Files:** `extractors/declarative.rs`, `extractors/mod.rs`, `config/mod.rs`, `types/language.rs`, `error.rs`

---
## Phase 7.5: LLM-in-the-Loop Extraction ✅

> Use LLM (Gemini) to extract claims semantically during persistent scans. This fills gaps that regex extractors can't catch, providing immediate value while the learning system builds up pattern knowledge.

### Vision

```
Code file → Regex extractors → Claims found
    ↓
High-value files (auth, config, crypto)
    ↓
LLM Extractor → Additional semantic claims
    ↓
Combined claims → Conflict detection
```

### 7.5.1 LLM Extractor Implementation ✅

| Task | Status |
|------|--------|
| `GeminiClient` struct | ✅ `llm/client.rs`: Gemini API client using ureq |
| `LlmExtractor` struct | ✅ `llm/extractor.rs`: orchestrates extraction with budget tracking |
| Prompt engineering | ✅ Security-focused extraction prompt with structured JSON output |
| Response parsing | ✅ Parse Gemini's JSON response into `ExtractedClaim` format |
| Error handling | ✅ Graceful degradation when API unavailable or key missing |

### 7.5.2 Selective Triggering ✅

| Task | Status |
|------|--------|
| `is_high_value_file()` | ✅ `llm/extractor.rs`: auth/, config/, crypto/, security/, secrets/, certs/, ssl/, tls/, keys/, credentials/ directories |
| High-value file names | ✅ secret, password, credential, token, auth, login, session, jwt, tls, ssl, cert, key, config, settings, security, crypto, encrypt, decrypt, oauth, saml, ldap, api_key, apikey, access_key, private |
| Token budget | ✅ `max_tokens_per_scan` (default 50k), `max_tokens_per_file` (default 4k) |
| Skip conditions | ✅ Only runs when regex extractors found nothing AND file is high-value |

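
As a rough illustration of the triggering rule in the table above, the check might look like the sketch below. The function name `is_high_value_file` comes from the table, but its body here is an assumption: the keyword list is abbreviated, substring matching on the file stem is a guess at the matching strategy, and `should_run_llm` is a hypothetical helper.

```rust
use std::path::Path;

/// Directory names treated as high-value (taken from the table above).
const HIGH_VALUE_DIRS: &[&str] = &[
    "auth", "config", "crypto", "security", "secrets",
    "certs", "ssl", "tls", "keys", "credentials",
];

/// Abbreviated subset of the high-value file-name keywords listed above.
const HIGH_VALUE_NAME_PARTS: &[&str] = &[
    "secret", "password", "credential", "token", "auth", "jwt",
    "tls", "ssl", "cert", "key", "config", "crypto", "oauth",
];

/// Sketch: a file is high-value if any parent directory or its file stem
/// matches one of the keywords. Substring matching is an assumption.
pub fn is_high_value_file(path: &Path) -> bool {
    let in_high_value_dir = path.parent().map_or(false, |dir| {
        dir.components().any(|component| {
            let name = component.as_os_str().to_string_lossy().to_lowercase();
            HIGH_VALUE_DIRS.contains(&name.as_str())
        })
    });

    let stem = path
        .file_stem()
        .map(|s| s.to_string_lossy().to_lowercase())
        .unwrap_or_default();
    let has_high_value_name = HIGH_VALUE_NAME_PARTS
        .iter()
        .any(|part| stem.contains(*part));

    in_high_value_dir || has_high_value_name
}

/// Hypothetical helper for the skip condition: call the LLM only when the
/// regex extractors found nothing and the file is high-value.
pub fn should_run_llm(path: &Path, regex_claims_found: usize) -> bool {
    regex_claims_found == 0 && is_high_value_file(path)
}
```
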
### 7.5.3 Cost Controls ✅

| Task | Status |
|------|--------|
| Token tracking | ✅ `Arc<AtomicUsize>` for thread-safe budget tracking across files |
| BLAKE3 caching | ✅ `llm/cache.rs`: content hash + model + prompt version for cache key |
| Cache location | ✅ `~/.cache/aphoria/llm-cache/` |
| Budget enforcement | ✅ `within_budget()` check before each LLM call |

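
A shared token budget built on `Arc<AtomicUsize>` could be sketched as follows. Only `within_budget()` and the `Arc<AtomicUsize>` counter are named in the table; the `TokenBudget` type, `try_reserve`, and `used` are hypothetical names introduced for this example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical wrapper around the shared per-scan token counter.
#[derive(Clone)]
pub struct TokenBudget {
    used: Arc<AtomicUsize>,
    max_per_scan: usize,
}

impl TokenBudget {
    pub fn new(max_per_scan: usize) -> Self {
        Self { used: Arc::new(AtomicUsize::new(0)), max_per_scan }
    }

    /// Tokens consumed so far across all clones of this budget.
    pub fn used(&self) -> usize {
        self.used.load(Ordering::Acquire)
    }

    /// Read-only check: would `estimated_tokens` still fit in the budget?
    pub fn within_budget(&self, estimated_tokens: usize) -> bool {
        self.used().saturating_add(estimated_tokens) <= self.max_per_scan
    }

    /// Atomically reserve tokens. Returns false (and reserves nothing) when
    /// the request would exceed the budget, so two threads cannot both
    /// pass the check and overshoot together.
    pub fn try_reserve(&self, tokens: usize) -> bool {
        self.used
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |used| {
                let next = used.checked_add(tokens)?;
                (next <= self.max_per_scan).then_some(next)
            })
            .is_ok()
    }
}
```

A plain `within_budget()` check followed by a separate add can race when files are processed in parallel; reserving with `fetch_update` is one way to close that gap, at the cost of reserving an estimate before the actual usage is known.
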
### 7.5.4 Configuration ✅

```toml
# aphoria.toml
[llm]
enabled = true                    # Enable LLM extraction (default: false)
provider = "gemini"               # Only "gemini" supported
# model defaults to DEFAULT_LLM_MODEL (currently "gemini-3-flash-preview")
api_key_env = "GEMINI_API_KEY"    # Environment variable for API key
max_tokens_per_scan = 50000       # Budget per scan
max_tokens_per_file = 4000        # Budget per file (for max_output_tokens)
high_value_only = true            # Only use on auth/config/crypto files
cache_responses = true            # Cache by content hash
timeout_secs = 60                 # API timeout
min_confidence = 0.7              # Filter claims below this confidence
```

**Files:** `llm/mod.rs`, `llm/client.rs`, `llm/extractor.rs`, `llm/cache.rs`, `config/mod.rs`, `scan.rs`, `error.rs`

---
## Phase 7.6: Pattern Learning Store ✅

> When LLM extracts something that regex extractors missed, remember the pattern. Track which patterns recur across projects to identify candidates for promotion to declarative extractors.

### Vision

```
LLM extracts claim from code
    ↓
Pattern not in learned store?
    ↓
Store: { example_code, claim, project_hash }
    ↓
Same pattern seen in 5+ projects?
    ↓
Flag for promotion to declarative extractor
```

### 7.6.1 LearnedPattern Schema ✅

| Task | Status |
|------|--------|
| `ValueType` enum | ✅ `learning/types.rs`: Text, Number, Boolean |
| `ClaimTemplate` struct | ✅ `learning/types.rs`: subject_template, predicate, value_type, description |
| `LearnedPattern` struct | ✅ `learning/types.rs`: full schema with timestamps, project hashes, confidence tracking |
| Serde serialization | ✅ JSON serialization with chrono timestamps |
| Tests | ✅ 5 unit tests for types |

### 7.6.2 PatternStore Implementation ✅

| Task | Status |
|------|--------|
| `PatternStore` trait | ✅ `learning/store.rs`: abstract storage interface |
| `LocalPatternStore` | ✅ JSON-backed local storage at `~/.aphoria/learning/patterns.json` |
| `RwLock` thread safety | ✅ Write-through cache with in-memory HashMap |
| Deduplication | ✅ `find_similar()` with Levenshtein similarity threshold 0.8 |
| Pruning | ✅ `prune_stale()` removes patterns not seen in N days |
| Tests | ✅ 8 unit tests for store operations |

### 7.6.3 Pattern Normalization ✅

| Task | Status |
|------|--------|
| `normalize_pattern()` | ✅ `learning/normalizer.rs`: replaces literals with placeholders |
| Version detection | ✅ `"1.0"`, `"TLSv1.2"` → `<string:version>` |
| Boolean detection | ✅ `true`/`false` → `<boolean>` |
| Number detection | ✅ Standalone numbers → `<number>` |
| String detection | ✅ Remaining quoted strings → `<string>` |
| `pattern_similarity()` | ✅ Levenshtein distance normalized to 0.0-1.0 |
| Tests | ✅ 17 unit tests for normalization |

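
The similarity score used for deduplication can be illustrated with a small, dependency-free sketch. The name `pattern_similarity` and the 0.8 threshold come from the tables above; the body below is an assumed implementation (character-level Levenshtein distance divided by the longer length), not necessarily what `learning/normalizer.rs` does.

```rust
/// Character-level Levenshtein distance using two rolling rows.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] holds the distance between a[..i] and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr: Vec<usize> = vec![0; b.len() + 1];

    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost) // substitution (or match)
                .min(prev[j + 1] + 1)      // deletion
                .min(curr[j] + 1);         // insertion
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Similarity in 0.0..=1.0, where 1.0 means identical strings.
pub fn pattern_similarity(a: &str, b: &str) -> f32 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0; // two empty patterns are identical
    }
    1.0 - levenshtein(a, b) as f32 / max_len as f32
}
```

Under this definition, `pool_size: <number>` and `pool_size = <number>` differ by two edits over 20 characters, giving a similarity of 0.9, which is above the 0.8 deduplication threshold.
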
### 7.6.4 Configuration ✅

```toml
# aphoria.toml
[learning]
enabled = true           # Enable pattern learning (default: false)
store = "local"          # "local" | "hosted"
min_confidence = 0.7     # Minimum LLM confidence to learn
prune_after_days = 90    # Remove patterns not seen in N days

[learning.promotion]
min_projects = 5         # Projects needed before promotion
min_confidence = 0.8     # Average confidence needed
auto_promote = false     # Require human approval (Phase 7.7)
```

### 7.6.5 Scan Integration ✅

| Task | Status |
|------|--------|
| Initialize pattern store | ✅ `scan.rs`: only in persistent mode with learning enabled |
| Project hash computation | ✅ BLAKE3 hash for privacy-preserving project identification |
| Record LLM-extracted claims | ✅ After LLM extraction, record patterns meeting min_confidence |
| Update existing patterns | ✅ Merge observations when similar pattern found |
| Logging | ✅ Reports patterns_recorded count on scan completion |

### 7.6.6 Error Handling ✅

| Task | Status |
|------|--------|
| `LearningStore` error variant | ✅ `error.rs`: for storage/cache failures |
| Graceful degradation | ✅ Store failures logged, don't block scan |

**Files:** `learning/mod.rs`, `learning/types.rs`, `learning/normalizer.rs`, `learning/store.rs`, `config/mod.rs`, `scan.rs`, `error.rs`, `lib.rs`

**Tests:** 30 tests covering types, normalization, and store operations.

---
## Phase 7.6 (Legacy Documentation)

> **Note:** The following is the original spec for reference. See above for implemented status.

### Original Schema (Reference)

```rust
/// A pattern learned from LLM extraction that could become a declarative extractor.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct LearnedPattern {
    /// Unique identifier
    pub id: Uuid,

    /// Example code that triggered this pattern
    pub example_code: String,

    /// Normalized pattern (variables replaced with placeholders)
    /// e.g., "const TLS_MIN_VERSION = \"1.0\"" → "const TLS_MIN_VERSION = <version>"
    pub normalized_pattern: String,

    /// The claim this pattern produces
    pub claim_template: ClaimTemplate,

    /// Language this pattern applies to
    pub language: Language,

    /// When first seen
    pub first_seen: DateTime<Utc>,

    /// When last seen
    pub last_seen: DateTime<Utc>,

    /// Projects that have this pattern (hashed for privacy)
    pub project_hashes: HashSet<String>,

    /// Total occurrences across all projects
    pub occurrences: u32,

    /// Average LLM confidence when extracting this
    pub avg_confidence: f32,

    /// Has this been promoted to a declarative extractor?
    pub promoted: bool,

    /// If promoted, the extractor ID
    pub promoted_to: Option<String>,
}

/// Template for generating claims from a learned pattern.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ClaimTemplate {
    pub subject_template: String,      // "tls/min_version"
    pub predicate: String,             // "version"
    pub value_type: ValueType,         // String, Boolean, Number
    pub description_template: String,
}
```

### Original PatternStore Trait (Reference)

```rust
pub trait PatternStore: Send + Sync {
    /// Record a pattern learned from LLM extraction
    fn record_pattern(&self, pattern: &LearnedPattern) -> Result<()>;

    /// Find existing pattern matching this example
    fn find_similar(&self, normalized: &str, language: Language, threshold: f32) -> Option<LearnedPattern>;

    /// Get patterns ready for promotion (threshold met)
    fn get_promotion_candidates(&self, min_projects: usize, min_confidence: f32) -> Vec<LearnedPattern>;

    /// Mark pattern as promoted
    fn mark_promoted(&self, id: &Uuid, extractor_name: &str) -> Result<()>;

    /// Prune old patterns (synchronous, like the other methods, so the trait
    /// stays usable as `dyn PatternStore`)
    fn prune_stale(&self, max_age_days: u32) -> Result<usize>;
}
```

### 7.6.3 Pattern Normalization ⬜

| Task | Description |
|------|-------------|
| Variable extraction | Identify literals that vary (versions, names, values) |
| Placeholder insertion | Replace literals with typed placeholders |
| Similarity scoring | Compare normalized patterns for dedup |

```rust
fn normalize_pattern(code: &str, claim: &ExtractedClaim) -> String {
    // "const TLS_MIN = \"1.0\"" → "const TLS_MIN = <string:version>"
    // "pool_size: 25" → "pool_size: <number>"
    // "verify_ssl: false" → "verify_ssl: <boolean>"
    todo!("spec sketch: replace varying literals with typed placeholders")
}

fn similarity_score(a: &str, b: &str) -> f32 {
    // Levenshtein distance normalized to 0.0-1.0
    // Patterns with score > 0.8 are considered duplicates
    todo!("spec sketch: normalized Levenshtein similarity")
}
```

### 7.6.4 Integration with Scan ⬜

```rust
// In scan.rs, after LLM extraction
for claim in llm_claims {
    // Check if this is a new pattern (0.8 = duplicate threshold from the spec above)
    if let Some(existing) = pattern_store.find_similar(&claim.matched_text, language, 0.8) {
        // Update existing pattern
        // (`increment_occurrence` is part of this sketch, not of the trait above)
        pattern_store.increment_occurrence(&existing.id, project_hash)?;
    } else {
        // Record new pattern
        let pattern = LearnedPattern::from_claim(&claim, &code_context, project_hash);
        pattern_store.record_pattern(&pattern)?;
    }
}
```

### 7.6.5 Configuration ⬜

```toml
# aphoria.toml
[learning]
enabled = true           # Enable pattern learning
store = "local"          # "local" | "hosted"
min_confidence = 0.7     # Minimum LLM confidence to learn
prune_after_days = 90    # Remove patterns not seen in N days

[learning.promotion]
min_projects = 5         # Projects needed before promotion
min_confidence = 0.8     # Average confidence needed
auto_promote = false     # Require human approval (Phase 7.7)
```

**Files:** `learning/mod.rs`, `learning/pattern.rs`, `learning/store.rs`, `learning/normalize.rs`

---
## Phase 7.7: Pattern → Extractor Promotion ✅

> High-frequency learned patterns get promoted to declarative extractors. This closes the learning loop: patterns discovered by LLM become permanent, fast regex extractors.

### Vision

```
LearnedPattern (5+ projects, >0.8 confidence)
    ↓
LLM (Gemini): "Generate regex for this pattern"
    ↓
Candidate declarative extractor
    ↓
Validate against stored examples
    ↓
Human review (optional) → Approve/Reject
    ↓
Merge to project's .aphoria/extractors/
```

### 7.7.1 Promotion Pipeline ✅

| Task | Status |
|------|--------|
| `PromotionPipeline` | ✅ `promotion/pipeline.rs`: orchestrates full promotion flow |
| `RegexGenerator` | ✅ `promotion/regex_gen.rs`: Gemini LLM integration |
| `ExtractorValidator` | ✅ `promotion/validator.rs`: ReDoS detection, timing validation |
| `YamlWriter` | ✅ `promotion/writer.rs`: outputs to `.aphoria/extractors/learned/` |
| `InteractiveReviewer` | ✅ `promotion/review.rs`: CLI review workflow |
| `PromotionCandidate` | ✅ `promotion/types.rs` |
| `ValidationResult` | ✅ `promotion/types.rs` |

```rust
pub struct PromotionPipeline {
    pattern_store: Arc<dyn PatternStore>,
    llm_client: GeminiClient,
    validator: ExtractorValidator,
}

impl PromotionPipeline {
    /// Get patterns ready for promotion
    pub async fn get_candidates(&self) -> Result<Vec<PromotionCandidate>> {
        let patterns = self.pattern_store.get_promotion_candidates(5, 0.8);

        // Generate candidates one at a time; each step awaits an LLM call.
        let mut candidates = Vec::with_capacity(patterns.len());
        for pattern in patterns {
            candidates.push(self.generate_candidate(pattern).await?);
        }
        Ok(candidates)
    }

    /// Generate declarative extractor from pattern
    async fn generate_candidate(&self, pattern: LearnedPattern) -> Result<PromotionCandidate> {
        // Ask the LLM to generate regex
        let regex = self.llm_client.generate_regex(&pattern).await?;

        // Build declarative extractor
        let extractor = DeclarativeExtractor {
            name: pattern.id.to_string(),
            language: pattern.language,
            pattern: regex,
            claim: pattern.claim_template.clone(),
            source: ExtractorSource::Learned {
                pattern_id: pattern.id,
                projects: pattern.project_hashes.len(),
            },
        };

        // Validate against examples
        let validation = self.validator.validate(&extractor, &pattern).await;

        Ok(PromotionCandidate { pattern, extractor, validation })
    }
}
```

### 7.7.2 Regex Generation ✅

| Task | Status |
|------|--------|
| Multi-example prompt | ✅ Includes all examples in generation prompt |
| Regex safety | ✅ ReDoS detection prevents catastrophic backtracking |
| Test coverage | ✅ Validates against stored examples |

```rust
// Illustrative: the client is passed in explicitly, and `message` stands in
// for whatever completion call the LLM client exposes.
async fn generate_regex(
    llm_client: &GeminiClient,
    examples: &[String],
    claim: &ClaimTemplate,
) -> Result<String> {
    let prompt = format!(
        "Generate a regex pattern that matches all these code examples:\n\n{}\n\n\
         The regex should extract the value for claim: {}\n\
         Requirements:\n\
         - Must match ALL examples\n\
         - Use named capture groups for extracted values\n\
         - Avoid catastrophic backtracking (no nested quantifiers)\n\
         - Return ONLY the regex, no explanation",
        examples.join("\n---\n"),
        claim.subject_template
    );

    let response = llm_client.message(&prompt).await?;
    validate_regex_safety(&response)?;
    Ok(response)
}
```

### 7.7.3 Validation Suite ✅

| Task | Status |
|------|--------|
| Positive tests | ✅ Must match all stored examples |
| ReDoS detection | ✅ Detects catastrophic backtracking patterns |
| Performance test | ✅ Timing validation with configurable threshold |
| False positive check | ⬜ Deferred to Phase 9 (sample codebase FP testing) |

```rust
pub struct ExtractorValidator {
    sample_codebases: Vec<PathBuf>,  // Known-good projects for FP testing
}

impl ExtractorValidator {
    pub async fn validate(
        &self,
        extractor: &DeclarativeExtractor,
        pattern: &LearnedPattern
    ) -> ValidationResult {
        let mut result = ValidationResult::default();

        // Must match all positive examples
        for example in &pattern.examples {
            if !extractor.matches(example) {
                result.positive_failures.push(example.clone());
            }
        }

        // Must not have excessive false positives
        for codebase in &self.sample_codebases {
            let fps = self.count_false_positives(extractor, codebase).await;
            if fps > 10 {
                result.false_positive_warning = true;
            }
        }

        // Must be fast
        let duration = self.benchmark(extractor);
        if duration > Duration::from_millis(100) {
            result.performance_warning = true;
        }

        result
    }
}
```

### 7.7.4 Human Review Gate ✅

| Task | Status |
|------|--------|
| `aphoria extractors review` | ✅ CLI to review pending promotions |
| `aphoria extractors stats` | ✅ Show pattern store statistics |
| `aphoria extractors candidates` | ✅ List promotion candidates |
| `aphoria extractors promote` | ✅ Promote pattern to extractor |
| Approval workflow | ✅ Approve, reject, or skip via InteractiveReviewer |
| Rejection tracking | ⬜ Deferred to Phase 9 (rejection reason persistence) |
| Auto-approve mode | ⬜ Deferred to Phase 9 (>0.95 confidence auto-promote) |

```bash
$ aphoria extractors review

Pending promotions: 3

[1/3] Pattern: tls_min_version_const
      Examples: 47 (across 8 projects)
      Confidence: 0.91

      Generated regex: (?i)(tls|ssl)_?(min|minimum)_?version\s*[:=]\s*["']?(1\.[01])["']?

      Sample matches:
        const TLS_MIN_VERSION = "1.0"     ✓ matches
        TLS_MINIMUM_VERSION: "1.1"        ✓ matches
        ssl_min_version = "1.2"           ✗ no match (TLS 1.2 is safe, correctly excluded)

[a]pprove  [r]eject  [e]dit  [s]kip  [q]uit: _
```

### 7.7.5 Extractor Output ✅

Promoted patterns become declarative extractors in `.aphoria/extractors/learned/`:

```yaml
# .aphoria/extractors/learned/tls_min_version_const.yaml
# Auto-generated from learned pattern. DO NOT EDIT.
# Pattern ID: 550e8400-e29b-41d4-a716-446655440000
# Learned from: 8 projects, 47 occurrences
# Confidence: 0.91
# Promoted: 2026-02-10

name: "tls_min_version_const"
language: ["rust", "go", "python", "javascript", "typescript"]
# Single quotes inside a single-quoted YAML scalar are escaped by doubling them.
pattern: '(?i)(tls|ssl)_?(min|minimum)_?version\s*[:=]\s*["'']?(1\.[01])["'']?'
claim:
  subject: "tls/min_version"
  predicate: "version"
  value_capture: 3  # Capture group for version (groups 1 and 2 are the name parts)
  description: "TLS minimum version set to deprecated {value}"
metadata:
  source: "learned"
  pattern_id: "550e8400-e29b-41d4-a716-446655440000"
  projects: 8
  occurrences: 47
  confidence: 0.91
```

### 7.7.6 Configuration ✅

```toml
# aphoria.toml
[promotion]
enabled = true               # Enable promotion pipeline
auto_promote = false         # Require human approval
output_dir = ".aphoria/extractors/learned"
min_confidence = 0.8         # Minimum to consider
min_projects = 5             # Projects needed before promotion
require_validation = true    # Must pass validation suite
```

**Files:** `promotion/mod.rs`, `promotion/pipeline.rs`, `promotion/regex_gen.rs`, `promotion/validator.rs`, `promotion/review.rs`, `promotion/writer.rs`, `promotion/types.rs`, `handlers/extractors.rs`

**Tests:** 43 tests covering pipeline, validation, regex generation, and YAML output.

---
## Phase 9: Autonomous Extractor Generation ✅

> The system generates, tests, and deploys extractors without human approval for high-confidence patterns. This is the endgame: a fully self-improving extraction system.

### Vision

```
Learned pattern exceeds autonomous threshold (>0.95 confidence, >10 projects)
    ↓
Auto-generate extractor
    ↓
Validate against comprehensive test suite
    ↓
A/B test: run new extractor in shadow mode
    ↓
If FP rate < 5%: auto-deploy
    ↓
If FP rate spikes: auto-rollback
```

---
## Phase 7.8: LLM Prompt Evaluation ✅

> Measure and improve LLM extraction quality through golden fixtures and regression detection. Essential for prompt engineering without breaking existing quality.

### Vision

```
Golden Fixtures (TOML)               Evaluation Harness
├── tls-001: verify=False            ├── Load fixtures
├── jwt-001: algorithm=none     -->  ├── Run extraction (live/cached/mock)
└── secrets-001: hardcoded key       ├── Match against expectations
                                     ├── Compute precision/recall/F1
                                     └── Compare to baseline (regression detection)
```

### 7.8.1 Fixture Format ✅

| Task | Status |
|------|--------|
| `Fixture` type | ✅ `eval/fixture.rs`: TOML-based test cases |
| `ExpectedClaim` | ✅ Subject/predicate/value expectations |
| `must_contain` | ✅ Claims that MUST be extracted (recall) |
| `must_not_contain` | ✅ Claims that MUST NOT appear (precision) |
| `FixtureLoader` | ✅ Load fixtures from directory tree |
| `CorpusManifest` | ✅ Corpus metadata + baseline metrics |
| Validation | ✅ Duplicate ID, empty content, missing expectations |

```toml
# tests/llm_fixtures/tls/tls-001-disabled-verification.toml
[metadata]
id = "tls-001"
name = "TLS verification disabled in Python requests"
category = "tls"
language = "python"

[input]
filename = "api_client.py"
content = """
response = requests.get(url, verify=False)
"""

[expected]
must_contain = [
    { subject = "tls/cert_verification", predicate = "enabled", value = false }
]
must_not_contain = [
    { subject = "tls/cert_verification", predicate = "enabled", value = true }
]
```

### 7.8.2 Claim Matching ✅

| Task | Status |
|------|--------|
| `ClaimMatcher` | ✅ `eval/matcher.rs`: flexible claim comparison |
| Tail-path matching | ✅ Last 2 segments for subject comparison |
| Type coercion | ✅ Boolean↔string ("true"/"yes"), number↔string |
| Confidence thresholds | ✅ Optional min_confidence per expectation |
| `count_false_positives()` | ✅ Detect unexpected claims |

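
Tail-path matching and type coercion could look roughly like the sketch below. All names here (`tail_path`, `subjects_match`, `Value`, `values_match`) are hypothetical and chosen for illustration; treating `"false"`/`"no"` as boolean false is an assumption that mirrors the `"true"`/`"yes"` rule in the table.

```rust
/// Keep only the last `segments` path segments of a subject,
/// ignoring any `scheme://` prefix.
pub fn tail_path(subject: &str, segments: usize) -> String {
    let path = subject.rsplit("://").next().unwrap_or(subject);
    let parts: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let start = parts.len().saturating_sub(segments);
    parts[start..].join("/")
}

/// Subjects match when their last two segments are equal.
pub fn subjects_match(expected: &str, actual: &str) -> bool {
    tail_path(expected, 2) == tail_path(actual, 2)
}

/// Hypothetical claim value type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Number(f64),
    Text(String),
}

fn as_bool(value: &Value) -> Option<bool> {
    match value {
        Value::Bool(b) => Some(*b),
        Value::Text(s) => match s.trim().to_lowercase().as_str() {
            "true" | "yes" => Some(true),
            "false" | "no" => Some(false), // assumption: mirror of "true"/"yes"
            _ => None,
        },
        Value::Number(_) => None,
    }
}

fn as_number(value: &Value) -> Option<f64> {
    match value {
        Value::Number(n) => Some(*n),
        Value::Text(s) => s.trim().parse().ok(),
        Value::Bool(_) => None,
    }
}

/// Compare values after coercing booleans and numbers from strings.
pub fn values_match(expected: &Value, actual: &Value) -> bool {
    if let (Some(a), Some(b)) = (as_bool(expected), as_bool(actual)) {
        return a == b;
    }
    if let (Some(a), Some(b)) = (as_number(expected), as_number(actual)) {
        return (a - b).abs() < f64::EPSILON;
    }
    match (expected, actual) {
        (Value::Text(a), Value::Text(b)) => a.eq_ignore_ascii_case(b),
        _ => false,
    }
}
```
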
### 7.8.3 Metrics Computation ✅

| Task | Status |
|------|--------|
| `Metrics` | ✅ `eval/metrics.rs`: aggregate evaluation metrics |
| Precision/Recall/F1 | ✅ Standard information retrieval metrics |
| Per-category breakdown | ✅ Metrics by fixture category |
| Cost estimation | ✅ Token-based cost tracking |
| `BaselineComparison` | ✅ Compare current run to stored baseline |
| Regression detection | ✅ Flag if F1/precision/recall drop > threshold |

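
For reference, the standard definitions are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). A minimal sketch of the computation and the regression check follows; the `Metrics` struct here is a simplified stand-in for the one in `eval/metrics.rs`, and `compute_metrics` and `is_regression` are hypothetical names. Treating the threshold as an absolute drop is also an assumption.

```rust
/// Simplified stand-in for the evaluation metrics.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Metrics {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

/// Compute precision, recall, and F1 from true positives, false positives,
/// and false negatives. Empty denominators yield 0.0 rather than NaN.
pub fn compute_metrics(tp: usize, fp: usize, fn_: usize) -> Metrics {
    let ratio = |num: usize, den: usize| {
        if den == 0 { 0.0 } else { num as f64 / den as f64 }
    };
    let precision = ratio(tp, tp + fp);
    let recall = ratio(tp, tp + fn_);
    let f1 = if precision + recall == 0.0 {
        0.0
    } else {
        2.0 * precision * recall / (precision + recall)
    };
    Metrics { precision, recall, f1 }
}

/// Flag a regression when any metric drops by more than `threshold`
/// (absolute difference, e.g. 0.05 for five percentage points).
pub fn is_regression(baseline: &Metrics, current: &Metrics, threshold: f64) -> bool {
    baseline.f1 - current.f1 > threshold
        || baseline.precision - current.precision > threshold
        || baseline.recall - current.recall > threshold
}
```

For example, 8 true positives with 2 false positives and 2 false negatives give precision, recall, and F1 of 0.8 each; a later run with 6 true positives, 2 false positives, and 4 false negatives drops recall to 0.6 and would be flagged at a 0.05 threshold.
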
### 7.8.4 Evaluation Harness ✅

| Task | Status |
|------|--------|
| `EvalHarness` | ✅ `eval/harness.rs`: orchestrates evaluation runs |
| `EvalMode::Live` | ✅ Real LLM API calls |
| `EvalMode::Cached` | ✅ Use cached responses (deterministic CI) |
| `EvalMode::Mock` | ✅ No LLM, tests harness itself |
| `EvalVerdict` | ✅ Pass, Regression, Review, Error |
| `update_baseline()` | ✅ Save current metrics as new baseline |

### 7.8.5 Report Generation ✅

| Task | Status |
|------|--------|
| `Report` | ✅ `eval/report.rs`: multi-format output |
| Table format | ✅ Terminal tables with color-coded results |
| JSON format | ✅ Machine-readable for CI/CD integration |
| Markdown format | ✅ Documentation and PR comments |
| Failed fixture details | ✅ Shows unmatched expectations with rationale |

### 7.8.6 CLI Commands ✅

| Task | Status |
|------|--------|
| `aphoria eval run` | ✅ Run evaluation against fixtures |
| `aphoria eval baseline` | ✅ Show current baseline metrics |
| `aphoria eval update-baseline` | ✅ Update baseline (--force required) |
| `aphoria eval list-fixtures` | ✅ List available fixtures by category |
| `aphoria eval validate-fixtures` | ✅ Validate fixture format |
| `--fail-on-regression` | ✅ Exit code 1 if regression detected |
| `--threshold` | ✅ Configurable regression threshold (default 5%) |
| `--mode` | ✅ live, cached, or mock |

```bash
# Run evaluation in mock mode
aphoria eval run --fixtures tests/llm_fixtures --mode mock

# CI: fail on regression
aphoria eval run --mode cached --fail-on-regression --threshold 0.05

# Update baseline after prompt improvements
aphoria eval update-baseline --fixtures tests/llm_fixtures --force

# List fixtures by category
aphoria eval list-fixtures --category tls
```

### 7.8.7 Seed Fixtures ✅

| Category | Fixture | Description |
|----------|---------|-------------|
| tls | tls-001 | Python requests verify=False |
| tls | tls-002 | Node.js TLSv1 deprecated protocol |
| jwt | jwt-001 | Algorithm 'none' allowed |
| jwt | jwt-002 | Go WithoutClaimsValidation |
| secrets | secrets-001 | Hardcoded API key |
| secrets | secrets-002 | High-entropy JWT in config |
| auth | auth-001 | Debug authentication bypass |
| negative | negative-001 | Safe TLS config (no findings expected) |
| negative | negative-002 | Env-loaded secrets (no findings expected) |
| edge | edge-001 | Empty file edge case |

**Files:** `eval/mod.rs`, `eval/fixture.rs`, `eval/matcher.rs`, `eval/metrics.rs`, `eval/harness.rs`, `eval/report.rs`, `handlers/eval.rs`, `cli.rs`, `tests/llm_fixtures/`

**Documentation:** [docs/llm-optimization/](docs/llm-optimization/index.md), the full optimization playbook with decision trees, research templates, and baseline tracking.

---
### 9.1 Autonomous Promotion ✅

| Task | Description | Status |
|------|-------------|--------|
| `AutonomousConfig` | Configuration with kill switch (enabled: false default) | ✅ |
| High-confidence threshold | Skip human review for >0.95 confidence | ✅ |
| Project threshold | Require >10 projects for autonomous | ✅ |
| Validation strictness | Zero failures, zero warnings required | ✅ |
| `should_auto_promote()` | Decision logic on `PromotionCandidate` | ✅ |
| `auto_promotion_blockers()` | Explains why pattern can't be auto-promoted | ✅ |
| `AutonomousAuditLog` | JSONL audit trail for all decisions | ✅ |
| `smart_auto_promote_all()` | Pipeline integration with audit logging | ✅ |
| YAML header enhancement | "AUTO-PROMOTED" + "Approved by: autonomous" | ✅ |
| CLI command | `aphoria extractors auto-promote [--dry-run]` | ✅ |

**Safety Features:**
- Kill switch: `enabled: false` by default (opt-in only)
- Auditability: All decisions logged to `~/.aphoria/audit/autonomous-decisions.jsonl`
- Reversibility: Can delete YAML + reset pattern.promoted
- Blast radius: One pattern = one YAML file
- Traceability: YAML header shows approval source

**Files:** `config/types/autonomous.rs`, `promotion/audit.rs`, `promotion/types.rs`, `promotion/pipeline.rs`, `promotion/writer.rs`, `handlers/extractors.rs`

**Configuration:**
```toml
[autonomous]
enabled = true                  # Master switch (default: false)
min_confidence = 0.95           # Stricter than standard 0.8
min_projects = 10               # Stricter than standard 5
require_zero_failures = true
require_zero_warnings = true
audit_log = true
audit_dir = "~/.aphoria/audit/"
```

**CLI Usage:**
```bash
# Preview what would be auto-promoted
aphoria extractors auto-promote --dry-run

# Run autonomous promotion
aphoria extractors auto-promote

# Override thresholds
aphoria extractors auto-promote --min-confidence 0.97 --min-projects 15
```

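
The decision logic behind `should_auto_promote()` and `auto_promotion_blockers()` might be sketched as below. The struct fields mirror the `[autonomous]` config keys, but `CandidateSummary` is a hypothetical stand-in for `PromotionCandidate`, and treating `min_confidence` and `min_projects` as inclusive minimums is an assumption.

```rust
/// Mirrors the `[autonomous]` config keys shown above.
#[derive(Debug, Clone)]
pub struct AutonomousConfig {
    pub enabled: bool,
    pub min_confidence: f32,
    pub min_projects: usize,
    pub require_zero_failures: bool,
    pub require_zero_warnings: bool,
}

/// Hypothetical stand-in for the fields of `PromotionCandidate` that matter here.
#[derive(Debug, Clone)]
pub struct CandidateSummary {
    pub avg_confidence: f32,
    pub project_count: usize,
    pub validation_failures: usize,
    pub validation_warnings: usize,
}

impl CandidateSummary {
    /// List every reason this candidate cannot be auto-promoted.
    pub fn auto_promotion_blockers(&self, cfg: &AutonomousConfig) -> Vec<String> {
        let mut blockers = Vec::new();
        if !cfg.enabled {
            blockers.push("autonomous promotion is disabled".to_string());
        }
        if self.avg_confidence < cfg.min_confidence {
            blockers.push(format!(
                "confidence {:.2} is below the minimum {:.2}",
                self.avg_confidence, cfg.min_confidence
            ));
        }
        if self.project_count < cfg.min_projects {
            blockers.push(format!(
                "seen in {} projects, need at least {}",
                self.project_count, cfg.min_projects
            ));
        }
        if cfg.require_zero_failures && self.validation_failures > 0 {
            blockers.push(format!("{} validation failure(s)", self.validation_failures));
        }
        if cfg.require_zero_warnings && self.validation_warnings > 0 {
            blockers.push(format!("{} validation warning(s)", self.validation_warnings));
        }
        blockers
    }

    /// Auto-promote only when there are no blockers at all.
    pub fn should_auto_promote(&self, cfg: &AutonomousConfig) -> bool {
        self.auto_promotion_blockers(cfg).is_empty()
    }
}
```

Deriving the boolean from the blocker list keeps the two functions from drifting apart, and the same list can be written to the audit log or printed by `--dry-run`.
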
### 9.2 Shadow Mode Testing ✅

| Task | Description | Status |
|------|-------------|--------|
| `ShadowConfig` | Configuration for shadow mode (min_scans, max_fp_rate, rollback_threshold) | ✅ |
| `ShadowTest`, `ShadowStatus`, `ShadowMetrics` | Core types for tracking shadow extractors | ✅ |
| `ShadowStore` | JSONL persistence for tests, matches, and decisions | ✅ |
| `ShadowExtractorRegistry` | Loads shadow extractors from learned/ directory | ✅ |
| `ShadowExecutor` | Runs shadow extractors during scans, stores matches separately | ✅ |
| `FeedbackCollector` | TP/FP feedback collection and metrics update | ✅ |
| `GraduationManager` | Shadow → production promotion and rollback logic | ✅ |
| CLI commands | `shadow-status`, `feedback`, `graduate`, `rollback` | ✅ |

**Safety Features:**
- Shadow isolation: Matches stored separately, not in production output
- Metrics transparency: FP rate visible via `shadow-status`
- Graduation gate: Must meet min_scans (100) + max_fp_rate (5%) + feedback exists
- Manual control: `rollback` command for immediate removal
- Audit trail: All decisions logged to `decisions.jsonl`

**Files:** `shadow/mod.rs`, `shadow/types.rs`, `shadow/store.rs`, `shadow/registry.rs`, `shadow/executor.rs`, `shadow/feedback.rs`, `shadow/graduation.rs`, `handlers/shadow.rs`, `config/types/shadow.rs`

**Configuration:**
```toml
[shadow]
enabled = true               # Shadow mode on by default
min_scans = 100              # Scans before graduation eligible
max_fp_rate = 0.05           # Maximum FP rate for graduation
rollback_threshold = 0.15    # FP rate that triggers rollback
retention_days = 30          # Days to retain shadow data
```

**CLI Usage:**
|
|
```bash
|
|
# View shadow test status
|
|
aphoria extractors shadow-status [-v]
|
|
|
|
# Provide TP/FP feedback on matches
|
|
aphoria extractors feedback <test-name> [--limit 10]
|
|
|
|
# Graduate shadow test to production
|
|
aphoria extractors graduate <test-name> [--force]
|
|
|
|
# Rollback a shadow test
|
|
aphoria extractors rollback <test-name> --reason "too many FPs"
|
|
```
|
|
|
|
**Tests:** 44 tests covering types, store, registry, executor, feedback, graduation, and auto-rollback.
|
|
|
|
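The graduation gate listed under Safety Features (minimum scans, maximum FP rate, feedback present) can be sketched as a single predicate. The type and field names below are hypothetical stand-ins rather than the real `ShadowConfig` and `ShadowMetrics`, and the FP rate is assumed to be false positives divided by reviewed matches.

```rust
/// Hypothetical subset of the `[shadow]` settings relevant to graduation.
pub struct GraduationConfig {
    pub min_scans: u32,
    pub max_fp_rate: f64,
}

/// Hypothetical counters collected while an extractor runs in shadow mode.
pub struct ShadowCounters {
    pub scans: u32,
    pub true_positives: u32,
    pub false_positives: u32,
}

impl ShadowCounters {
    /// FP rate over reviewed matches; `None` when no feedback exists yet.
    pub fn fp_rate(&self) -> Option<f64> {
        let reviewed = self.true_positives + self.false_positives;
        if reviewed == 0 {
            None
        } else {
            Some(f64::from(self.false_positives) / f64::from(reviewed))
        }
    }
}

/// All three gate conditions must hold: enough scans, feedback present,
/// and an FP rate at or below the configured maximum.
pub fn can_graduate(cfg: &GraduationConfig, m: &ShadowCounters) -> bool {
    match m.fp_rate() {
        Some(rate) => m.scans >= cfg.min_scans && rate <= cfg.max_fp_rate,
        None => false,
    }
}
```

Returning `None` for the no-feedback case makes "feedback exists" an explicit condition instead of letting an unreviewed extractor pass with an apparent 0% FP rate.
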
### 9.3 Auto-Rollback ✅

| Task | Description | Status |
|------|-------------|--------|
| `auto_rollback_enabled` config | Toggle to enable/disable auto-rollback (default: true) | ✅ |
| Feedback-time check | Auto-rollback triggered immediately after FP feedback | ✅ |
| `FeedbackWithRollback` return | `record_feedback()` returns rollback info | ✅ |
| `AutoRollbackResult` | Track checked count, rolled back names, errors | ✅ |
| CLI command | `aphoria extractors auto-check` for manual batch checking | ✅ |
| Audit trail | Decision logged as `ShadowDecisionKind::AutoRollback` | ✅ |
| YAML deletion | Extractor file deleted from learned/ on rollback | ✅ |

**Safety Features:**
- Toggle: `auto_rollback_enabled` can disable feature for testing or manual-only workflows
- Threshold configurable: `rollback_threshold` in config (default: 15%)
- Minimum reviews: Requires 10+ reviewed matches before auto-rollback triggers
- Audit trail: All auto-rollback decisions logged to `decisions.jsonl`
- CLI fallback: `auto-check` command for manual verification

**Files:** `shadow/feedback.rs`, `shadow/graduation.rs`, `config/types/shadow.rs`, `handlers/shadow.rs`, `cli.rs`

**Configuration:**
```toml
[shadow]
enabled = true
auto_rollback_enabled = true   # NEW: Enable automatic rollback (default: true)
rollback_threshold = 0.15      # FP rate that triggers auto-rollback
```

**CLI Usage:**
```bash
# Automatic: Rollback happens immediately when feedback pushes FP rate over threshold
aphoria extractors feedback <test-name> --limit 10
# If FP rate exceeds 15%, you'll see:
# ⚠️ AUTO-ROLLBACK TRIGGERED: <extractor-name>

# Manual batch check: Scan all active tests and rollback any over threshold
aphoria extractors auto-check
# Output: "⚠️ Auto-rolled back 1 of 5 shadow test(s): ..."
```

**Tests:** 3 new tests covering auto-rollback triggering, disabled toggle, and threshold boundary.

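The auto-rollback conditions above (toggle on, at least 10 reviewed matches, FP rate over the threshold) combine into a small check. This is only a sketch: `RollbackConfig`, `MIN_REVIEWED`, and the function name are hypothetical, and the strict `>` comparison is an assumption based on the wording "exceeds 15%".

```rust
/// Hypothetical subset of the `[shadow]` settings used by the rollback check.
pub struct RollbackConfig {
    pub auto_rollback_enabled: bool,
    pub rollback_threshold: f64,
}

/// Minimum number of reviewed matches before a rollback may trigger.
pub const MIN_REVIEWED: u32 = 10;

/// Returns true when the extractor should be rolled back automatically.
pub fn should_auto_rollback(cfg: &RollbackConfig, true_positives: u32, false_positives: u32) -> bool {
    let reviewed = true_positives + false_positives;
    // Too little feedback (or the toggle being off) never triggers a rollback.
    if !cfg.auto_rollback_enabled || reviewed < MIN_REVIEWED {
        return false;
    }
    let fp_rate = f64::from(false_positives) / f64::from(reviewed);
    fp_rate > cfg.rollback_threshold
}
```

The minimum-review guard matters because a single early false positive would otherwise produce a 100% FP rate and remove an extractor after one piece of feedback.
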
### 9.4 Cross-Project Learning ✅

| Task | Description | Status |
|------|-------------|--------|
| Hosted pattern sync | Patterns from all projects aggregate on server | ✅ |
| Global promotion | Promote patterns seen across many orgs | ✅ |
| Privacy preservation | Only normalized patterns shared, no code | ✅ |
| Opt-in distribution | Orgs can opt-in to receive community extractors | ✅ |

```
Org A: Pattern seen in 3 projects → shared to hosted
Org B: Same pattern in 5 projects → shared to hosted
Org C: Same pattern in 4 projects → shared to hosted
                    ↓
        Hosted aggregates: 12 projects total
                    ↓
        Promotes to community extractor
                    ↓
        All orgs receive new extractor (if opted in)
```

**Implementation:**
- `CrossProjectConfig` with opt-in flags (`contribute_patterns`, `receive_community`)
- `PatternSyncer` for uploading anonymized patterns to hosted server
- `CommunityExtractorLoader` for pulling community extractors as YAML files
- BLAKE3 hashing for pattern deduplication and org anonymization
- Privacy guarantees: `normalized_pattern` shared, but NOT `example_code` or `project_hashes`
- CLI commands: `aphoria patterns sync`, `aphoria patterns status`, `aphoria patterns pull-community`

**Files:** `config/types/cross_project.rs`, `community/pattern_syncer.rs`, `community/extractor_loader.rs`, `handlers/patterns.rs`

**Tests:** 7 new tests covering pattern hashing, subject exclusion, anonymization, and extractor loading.

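The aggregation step in the diagram above (3 + 5 + 4 = 12 projects) can be illustrated with a small in-memory sketch. Everything here is hypothetical: the `Contribution` type, the function names, and the promotion threshold passed in by the caller. The hashes are treated as opaque strings; the BLAKE3 hashing itself is not reproduced.

```rust
use std::collections::BTreeMap;

/// Hypothetical upload record: only hashes and a count, never source code.
pub struct Contribution {
    pub org_hash: String,     // anonymized org identifier (opaque)
    pub pattern_hash: String, // hash of the normalized pattern (opaque)
    pub project_count: u32,   // projects in that org where the pattern was seen
}

/// Sums project counts per pattern hash across all contributing orgs.
pub fn aggregate(contribs: &[Contribution]) -> BTreeMap<String, u32> {
    let mut totals = BTreeMap::new();
    for c in contribs {
        *totals.entry(c.pattern_hash.clone()).or_insert(0) += c.project_count;
    }
    totals
}

/// Pattern hashes whose aggregated project count reaches `min_projects`.
pub fn community_candidates(totals: &BTreeMap<String, u32>, min_projects: u32) -> Vec<String> {
    totals
        .iter()
        .filter(|(_, n)| **n >= min_projects)
        .map(|(hash, _)| hash.clone())
        .collect()
}
```

With the diagram's numbers and an assumed threshold of 10, the shared pattern totals 12 projects and becomes a community candidate, while a pattern seen in only one org's 2 projects does not.
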
### 9.5 Extractor Versioning ✅

| Task | Description | Status |
|------|-------------|--------|
| Version tracking | Track which version caught which issues | ✅ `ExtractorVersion` + `VersionStore` |
| Changelog | Record changes between versions | ✅ `ExtractorChangelog` + `ChangelogEntry` |
| Rollback support | Revert to previous version | ✅ `aphoria extractors rollback-version` |
| A/B metrics | Compare versions side-by-side | ✅ `aphoria extractors compare` + `compute_metrics_delta()` |
| CLI commands | versions, compare, rollback-version | ✅ Full CLI implementation |
| Tests | Unit tests for all components | ✅ 15+ version/changelog tests |

**Files:**
- `promotion/version.rs` - Core types (`ExtractorVersion`, `ChangelogEntry`, `MetricsDelta`, `ExtractorChangelog`, `VersionStore`)
- `promotion/writer.rs` - Versioned YAML output (`write_versioned()`)
- `promotion/types.rs` - Version field in `PromotionMetadata`
- `handlers/extractors.rs` - CLI handlers (`handle_versions`, `handle_compare`, `handle_rollback_version`)
- `cli.rs` - CLI commands (`Versions`, `Compare`, `RollbackVersion`)

**CLI Usage:**
```bash
# List versions
aphoria extractors versions learned_tls_min_version
# Version History: learned_tls_min_version
# Version  Date        Changes
# ------------------------------------------------------------
# 2        2026-03-15  Added support for YAML configs
# 1        2026-02-01  Initial promotion from learned pattern

# Compare versions
aphoria extractors compare learned_tls_min_version -a 1 -b 2
# Comparison: learned_tls_min_version v1 vs v2
# Matches          +15%
# False Positives  -3%

# Rollback
aphoria extractors rollback-version learned_tls_min_version --version 1 --reason "v2 edge case bug"
# Rolled back learned_tls_min_version to v1
```

**YAML Output:**
```yaml
# Generated from learned pattern. Review before editing.
# Pattern ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
# Version: 2 (previous: 1)
# Promoted: 2026-03-15 14:30:00 UTC

name: learned_tls_min_version
description: TLS minimum version set to deprecated value
version: 2
previous_version: 1
languages:
  - rust
  - go
# Single quotes inside a single-quoted YAML scalar are escaped by doubling ('').
pattern: '(?i)tls_?min_?(version)?\s*[:=]\s*["'']?(?P<value>1\.[01])["'']?'
claim:
  subject: tls/min_version
  predicate: version
  value_from_match: true
confidence: 0.97
metadata:
  source: learned
  pattern_id: a1b2c3d4-e5f6-7890-abcd-ef1234567890
  version: 2
changelog:
  - version: 2
    date: 2026-03-15
    changes: "Added support for YAML configs"
    metrics:
      matches: "+15%"
      false_positives: "-3%"
  - version: 1
    date: 2026-02-01
    changes: "Initial promotion from learned pattern"
```

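The `+15%` / `-3%` figures shown by `compare` and in the changelog suggest a relative change between two versions' metrics. The sketch below shows one way such a delta could be computed and formatted. It is an assumption that the delta is a relative percentage change of raw counts; the types and function names are hypothetical and are not the signatures of `compute_metrics_delta()` or `MetricsDelta`.

```rust
/// Hypothetical per-version counters.
pub struct VersionMetrics {
    pub matches: u64,
    pub false_positives: u64,
}

/// Relative change in percent, or `None` when the baseline is zero.
pub fn pct_change(old: u64, new: u64) -> Option<f64> {
    if old == 0 {
        None
    } else {
        Some((new as f64 - old as f64) / old as f64 * 100.0)
    }
}

/// Formats a delta with an explicit sign, e.g. "+15%" or "-3%".
pub fn format_pct(delta: Option<f64>) -> String {
    match delta {
        Some(v) => format!("{:+.0}%", v),
        None => "n/a".to_string(),
    }
}

/// Returns (matches delta, false-positive delta) from version `a` to version `b`.
pub fn metrics_delta_sketch(a: &VersionMetrics, b: &VersionMetrics) -> (String, String) {
    (
        format_pct(pct_change(a.matches, b.matches)),
        format_pct(pct_change(a.false_positives, b.false_positives)),
    )
}
```

A zero baseline is reported as `n/a` instead of dividing by zero, which is the case for a first version that produced no matches.
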
### 9.6 Configuration ⬜

```toml
# aphoria.toml
[autonomous]
enabled = false                # Opt-in to autonomous mode
min_confidence = 0.95          # Higher threshold for auto
min_projects = 10              # More evidence required
shadow_scans = 100             # Scans before promotion
max_fp_rate = 0.05             # Auto-rollback threshold

[autonomous.distribution]
receive_community = true       # Receive community extractors
contribute_patterns = true     # Share patterns to community
```

**Files:** `autonomous/mod.rs`, `autonomous/shadow.rs`, `autonomous/rollback.rs`, `autonomous/distribution.rs`

---

## Milestone Summary

| Phase | Deliverable | Depends On | Status |
|-------|-------------|------------|--------|
| 0 | ConceptPath in StemeDB | concept-hierarchy spec | ✅ |
| 2 | Aphoria CLI (scan, report, ack) | Phase 0 | ✅ |
| 2A | Concept matching (leaf, alias, auto-alias) | Phase 2 | ✅ |
| 1 | Authoritative corpus expansion | Phase 0 | ✅ |
| 3 | Claude Code skill + hooks | Phase 2A | ✅ |
| 4.5 | Ephemeral scan mode (40x faster) | Phase 2 | ✅ |
| 5 | Research agent loop | Phase 3 | ✅ |
| 6 | Federated Policy & Trust Packs | Phase 4.5 | ✅ |
| **6.5** | **Trust Pack Extensions (Predicate Aliases, Key Rotation)** | Phase 6 | ✅ |
| 4A | Observational claims (Tier 4 write-back) | Phase 6 | ✅ |
| 4B | Self-conflict detection (drift) | Phase 4A | ✅ |
| 4C | Diff-only scanning (--staged) | Phase 4B | ✅ |
| 4E | Hosted mode (team aggregation) | Phase 4C | ✅ |
| 4D | Enhanced ack (--reason, policy updates) | Phase 4C | ✅ |
| 5.6 | Community Corpus Contributions | Phase 4E | ✅ |
| 7 | Declarative Extractors | Phase 6 | ✅ |
| **7.5** | **LLM-in-the-Loop Extraction (Gemini)** | Phase 7 | ✅ |
| **7.6** | **Pattern Learning Store** | Phase 7.5 | ✅ |
| **7.7** | **Pattern → Extractor Promotion** | Phase 7.6 | ✅ |
| **7.8** | **LLM Prompt Evaluation** | Phase 7.5 | ✅ |
| 8 | Enterprise Extractors (8.1-8.13) | Phase 7.5 | ✅ |
| **8.2** | **Framework-Specific Extractors (10 frameworks)** | Phase 8 | ✅ |
| **9.1** | **Autonomous Promotion** | Phase 8 | ✅ |
| **9.2** | **Shadow Mode Testing** | Phase 9.1 | ✅ |
| **9.3** | **Auto-Rollback** | Phase 9.2 | ✅ |
| **9.4** | **Cross-Project Learning** | Phase 9.1 | ✅ |
| **9.5** | **Extractor Versioning** | Phase 9.4 | ✅ |

**Current state:**
- Phases 0-3, 4.5, 4A-4E, 5, 5.6, 6, 7, 7.5, 7.6, 7.7, 7.8, 8, 9.1, 9.2, 9.3, 9.4, 9.5 complete (clippy clean)
- Full corpus: RFC, OWASP, Vendor sources
- **36 extractors** including:
  - Security: weak_crypto, command_injection, sql_injection, high_entropy_secrets, auth_bypass, insecure_cookies, path_traversal, unvalidated_redirects, weak_password, security_headers, insecure_deserialization, ssrf, orm_injection, xxe
  - Framework-specific: django, express, flask, fastapi, nestjs, nextjs, spring, laravel, rails, aspnet
- Trust Packs: signed policy bundles with import/export
- Ephemeral mode: 40x faster for CI
- Observation write-back: `--sync` records novel claims as Tier 4 project memory
- **Autonomous promotion**: High-confidence patterns (>0.95, 10+ projects) can skip human review with full audit trail
- **Shadow mode testing**: Auto-promoted extractors run in shadow mode to measure FP rate before graduation
- **Auto-rollback**: Shadow extractors exceeding FP threshold (15%) are automatically rolled back
- Drift detection: Detects changes from prior observations
- Staged scanning: `--staged` flag for fast pre-commit hooks
- Hosted mode: Team aggregation via central StemeDB server
- Enhanced ack: `--reason` flag, `aphoria update` for policy changes
- Community Corpus: Opt-in anonymous pattern sharing with privacy-preserving anonymization
- Declarative Extractors: TOML-defined custom extractors without Rust code
- LLM Extraction: Gemini-powered semantic claim extraction for high-value files
- Pattern Learning: LLM-extracted claims recorded for promotion to declarative extractors
- Pattern Promotion: CLI workflow to promote learned patterns to declarative extractors with Gemini regex generation and validation
- **LLM Prompt Evaluation**: Golden fixtures with precision/recall metrics, baseline comparison, and regression detection for prompt engineering
- **Cross-Project Learning**: Privacy-preserving pattern sync to hosted server, community extractor pull, BLAKE3-based deduplication, opt-in sharing with `CrossProjectConfig`
- **Extractor Versioning**: Version tracking with changelogs, safe rollback to previous versions, A/B metrics comparison between versions via `VersionStore`

**Phase 9 Complete!** Autonomous Generation pipeline is fully self-improving.

### The Self-Learning Vision

```
Phase 7: Declarative Extractors (foundation) ✅ COMPLETE
    ↓
Phase 7.5: LLM-in-the-Loop (Gemini semantic extraction) ✅ COMPLETE
    ↓
Phase 7.6: Pattern Learning (remember what LLM finds) ✅ COMPLETE
    ↓
Phase 7.7: Pattern Promotion (patterns → extractors) ✅ COMPLETE
    ↓
Phase 7.8: LLM Prompt Evaluation (measure & improve) ✅ COMPLETE
    ↓
Phase 8: Enterprise Extractors (36 total) ✅ COMPLETE
    ├── 8.1: High-entropy secrets ✅
    ├── 8.2: Framework extractors (10 frameworks) ✅
    ├── 8.3: Config deep parsing ✅
    ├── 8.4-8.13: Security patterns ✅
    ↓
Phase 9: Autonomous Generation (fully self-improving) ✅ COMPLETE
    ├── 9.1: Autonomous Promotion ✅ COMPLETE
    ├── 9.2: Shadow Mode Testing ✅ COMPLETE
    ├── 9.3: Auto-Rollback ✅ COMPLETE
    ├── 9.4: Cross-Project Learning ✅ COMPLETE
    └── 9.5: Extractor Versioning ✅ COMPLETE
```

**The endgame:** Every PR teaches Aphoria. After a month, it knows your security patterns better than your team does.

### Bidirectional Knowledge Sync (Complete)

The pre-commit hook is now a bidirectional knowledge sync:
1. **4A** ✅: Record code claims as Tier 4 observations (project memory)
2. **4B** ✅: Detect drift from prior observations (self-conflict)
3. **4C** ✅: Fast diff-only scanning for pre-commit hooks (`--staged`)
4. **4E** ✅: Team aggregation via hosted StemeDB server
5. **4D** ✅: Enhanced ack with rationale and policy updates

This transforms Aphoria from a linter into a learning system that builds institutional memory per project and, through hosted mode, collective intelligence across teams.

---

## Phase 8: Enterprise Extractor Improvements ✅

> **Goal:** Transform extractors from "toy examples" to enterprise-grade detection that catches real violations in production codebases.

### Current State Audit

| Extractor | Languages | Strengths | Weaknesses |
|-----------|-----------|-----------|------------|
| `tls_verify` | 8 | Multi-lang, configs | Misses custom wrappers |
| `tls_version` | 8 | API patterns | Misses semantic (const = "1.0") |
| `hardcoded_secrets` | 8 | Placeholders, test files | No entropy detection |
| `weak_crypto` | 5 | MD5/SHA1/DES/RC4 | SHA1 false positives, misses bcrypt cost |
| `sql_injection` | 5 | Interpolation patterns | Misses ORM unsafe methods |
| `jwt_config` | 8 | alg:none, skip sig | Library-specific gaps |
| `cors_config` | 8 | Wildcard + credentials | Misses dynamic origin reflection |
| `rate_limit` | 8 | Basic patterns | Limited depth |
| `timeout_config` | 8 | Basic patterns | Limited depth |
| `command_injection` | 5 | exec/system calls | Indirect injection |
| `dep_versions` | 3 | Version parsing | No CVE correlation |

**Enterprise Reality:** Current extractors catch ~30% of real-world security misconfigurations. Config files are highest value (patterns consistent), code is lowest (semantic understanding required).

---

### 8.1 High-Entropy Secret Detection ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `HighEntropySecretsExtractor` | ✅ `extractors/high_entropy_secrets.rs` |
| Shannon entropy algorithm | ✅ `shannon_entropy()` with 4.5 threshold |
| Charset variety check | ✅ 0.4 minimum variety ratio |
| Known secret prefixes | ✅ AWS (AKIA), Stripe (sk_live_, sk_test_), GitHub (ghp_, gho_), GitLab (glpat-), Slack (xox[baprs]-) |
| High-entropy context patterns | ✅ api_key, secret, token, credential, auth_key contexts |
| False positive exclusions | ✅ UUIDs, git SHAs (40-char hex), file hashes (64-char hex) |
| Test file confidence reduction | ✅ 0.6 confidence for test files |
| Tests | ✅ 10+ tests covering all patterns |

**Configuration:**
```toml
# aphoria.toml
[extractors.entropy]
min_entropy = 4.5              # Shannon entropy threshold
min_charset_variety = 0.4      # Unique chars / length ratio
min_length = 20                # Minimum string length
max_length = 200               # Maximum string length
```

**Languages:** Rust, Go, Python, JavaScript, TypeScript, YAML, TOML, JSON, Dotenv

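The entropy and charset-variety checks configured above can be sketched with the standard library alone. This is an illustrative sketch and not the code in `extractors/high_entropy_secrets.rs`: entropy is assumed to be measured in bits per character over the string's own character distribution, and `charset_variety` and `looks_like_secret` are hypothetical helper names. One consequence of that definition is worth noting: a string with `k` distinct characters has at most `log2(k)` bits of entropy per character, so at least 23 distinct characters are needed to reach 4.5, and a pure hex string (16 symbols, at most 4.0 bits) can never cross the threshold.

```rust
use std::collections::HashMap;

/// Shannon entropy in bits per character of `s`.
pub fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for ch in s.chars() {
        *counts.entry(ch).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Ratio of distinct characters to total characters (0.0 for an empty string).
pub fn charset_variety(s: &str) -> f64 {
    let total = s.chars().count();
    if total == 0 {
        return 0.0;
    }
    let mut seen: Vec<char> = s.chars().collect();
    seen.sort_unstable();
    seen.dedup();
    seen.len() as f64 / total as f64
}

/// Applies the length, entropy, and variety thresholds from the config table.
pub fn looks_like_secret(s: &str) -> bool {
    let len = s.chars().count();
    (20..=200).contains(&len) && shannon_entropy(s) >= 4.5 && charset_variety(s) >= 0.4
}
```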
---

### 8.2 Framework-Specific Extractors ✅

**Impact:** HIGH | **Effort:** HIGH | **Status:** Complete

**Research Document:** [`docs/architecture/framework-security-extractors.md`](./docs/architecture/framework-security-extractors.md)

All 10 framework-specific extractors implemented and tested:

| Framework | Extractor | Languages | Tests |
|-----------|-----------|-----------|-------|
| Spring Boot | `spring_security` | Java, YAML, Properties | 7 |
| Django | `django_security` | Python | 7 |
| Express.js | `express_security` | JavaScript, TypeScript | 5 |
| Rails | `rails_security` | Ruby, YAML | 6 |
| ASP.NET Core | `aspnet_security` | C# (via regex), JSON | 6 |
| Laravel | `laravel_security` | PHP (via regex) | 5 |
| FastAPI | `fastapi_security` | Python | 5 |
| Next.js | `nextjs_security` | JavaScript, TypeScript | 5 |
| Flask | `flask_security` | Python | 6 |
| NestJS | `nestjs_security` | TypeScript | 5 |

**Total:** 10 extractors, 57+ tests, 100+ patterns

**Files:** `extractors/{django,express,flask,fastapi,nestjs,nextjs,spring,laravel,rails,aspnet}_security.rs`

#### 8.2.1 Spring Boot Security
```yaml
# application.yml misconfigs
security:
  basic:
    enabled: false          # Auth disabled
  csrf:
    enabled: false          # CSRF disabled
  headers:
    frame-options: DISABLE  # Clickjacking
```

```java
// Java code patterns
@EnableWebSecurity
public class Config extends WebSecurityConfigurerAdapter {
    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.csrf().disable();                                    // CSRF disabled
        http.authorizeRequests().antMatchers("/**").permitAll();  // Auth bypass
    }
}
```

#### 8.2.2 Django Security
```python
# settings.py misconfigs
DEBUG = True                   # Debug in production
ALLOWED_HOSTS = ['*']          # All hosts
CSRF_COOKIE_SECURE = False     # Insecure cookies
SESSION_COOKIE_SECURE = False
```

#### 8.2.3 Express.js Security
```javascript
// Security middleware checks
app.use(helmet());                                  // Expected; flagged when missing
app.use(cors({ origin: '*', credentials: true }));  // Flagged: wildcard origin with credentials
app.disable('x-powered-by');                        // Expected; flagged when missing
```

#### 8.2.4 Rails Security
```ruby
# config/environments/production.rb
config.force_ssl = false       # Should be true
config.action_dispatch.cookies_same_site_protection = :none
```

---

### 8.3 Config File Deep Parsing ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `ConfigValue` enum | ✅ `extractors/config_parser.rs` |
| YAML/JSON/TOML parsers | ✅ Using `serde_yaml`, `serde_json`, `toml` |
| Tree walker with path tracking | ✅ `walk_config()` with dot-path |
| `ConfigSecurityExtractor` | ✅ `extractors/config_security.rs` |
| Security rules (11 rules) | ✅ TLS, CSRF, debug, password, cookies, CORS, rate limit |
| Dev file exclusion | ✅ Skip debug warnings in dev/test configs |
| Tests | ✅ 26 tests for parsing + security rules |

**Patterns now caught (nested to any depth):**
- `*.tls.verify: false`: TLS verification disabled
- `*.insecure_skip_verify: true`: Skip verification enabled
- `*.security.enabled: false`: Security disabled
- `*.csrf.enabled: false`: CSRF protection disabled
- `debug: true`: Debug mode (only in production files)
- `*.password.min_length < 8`: Weak password policy
- `*.cookie.secure: false`: Cookie secure flag disabled
- `*.cookie.httpOnly: false`: Cookie httpOnly disabled
- `*.cors.allow_origin: "*"`: CORS allows all origins
- `*.rate_limit.enabled: false`: Rate limiting disabled

**Languages:** YAML, JSON, TOML

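A tree walker with dot-path tracking, as listed in the task table, might look like the sketch below. It is written against a hand-rolled `ConfigValue` enum so that it compiles without `serde`; the variants, the `walk_config` signature, the `[i]` notation for list indices, and the example rule are assumptions and may differ from `extractors/config_parser.rs`. Matching on the path suffix is what lets a rule such as `*.tls.verify: false` fire at any nesting depth.

```rust
use std::collections::BTreeMap;

/// Hypothetical format-independent config tree (after YAML/JSON/TOML parsing).
#[derive(Debug, Clone, PartialEq)]
pub enum ConfigValue {
    Bool(bool),
    Number(f64),
    Text(String),
    List(Vec<ConfigValue>),
    Map(BTreeMap<String, ConfigValue>),
}

/// Depth-first walk that calls `visit` for every leaf with its dot-path.
pub fn walk_config<F: FnMut(&str, &ConfigValue)>(value: &ConfigValue, path: &str, visit: &mut F) {
    match value {
        ConfigValue::Map(entries) => {
            for (key, child) in entries {
                let child_path = if path.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", path, key)
                };
                walk_config(child, &child_path, visit);
            }
        }
        ConfigValue::List(items) => {
            for (i, child) in items.iter().enumerate() {
                walk_config(child, &format!("{}[{}]", path, i), visit);
            }
        }
        leaf => visit(path, leaf),
    }
}

/// Example rule: report every path ending in `tls.verify` whose value is `false`.
pub fn find_disabled_tls_verify(root: &ConfigValue) -> Vec<String> {
    let mut hits = Vec::new();
    walk_config(root, "", &mut |path: &str, v: &ConfigValue| {
        let is_target = path == "tls.verify" || path.ends_with(".tls.verify");
        if is_target && *v == ConfigValue::Bool(false) {
            hits.push(path.to_string());
        }
    });
    hits
}
```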
---

### 8.4 Semantic TLS Version Detection ✅

**Impact:** MEDIUM | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| Add `Language::Terraform` variant | ✅ `types/language.rs` |
| Semantic pattern (cross-language) | ✅ Catches `TLS_MIN_VERSION = "1.0"` with type annotations |
| Environment variable pattern | ✅ `.env` files with `TLS_MIN_VERSION=1.0` |
| Terraform HCL pattern | ✅ `min_tls_version = "TLS1_0"` |
| Kubernetes camelCase pattern | ✅ `minTLSVersion: VersionTLS10` |
| False positive prevention | ✅ TLS 1.2/1.3 not flagged |
| Tests | ✅ 16 new tests (27 total for TLS extractor) |

**Patterns now caught:**
- `const TLS_MIN_VERSION: &str = "1.0";` (Rust with type annotation)
- `let sslVersion = "TLSv1";` (JavaScript camelCase)
- `TLS_MINIMUM_VERSION = "1.1"` (Python assignment)
- `TLS_MIN_VERSION=1.0` (dotenv)
- `export SSL_VERSION=TLSv1` (shell export)
- `min_tls_version = "TLS1_0"` (Terraform)
- `minTLSVersion: VersionTLS10` (Kubernetes YAML)

**Languages:** Rust, Go, Python, TypeScript, JavaScript, YAML, TOML, JSON, Terraform, Dotenv

---

### 8.5 ORM SQL Injection Detection ✅

**Impact:** MEDIUM | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `OrmInjectionExtractor` | ✅ `extractors/orm_injection.rs` |
| Django .raw() with interpolation | ✅ `f"SELECT..."`, `.format()` patterns |
| Django .extra() with interpolation | ✅ `where=["...{}...".format()]` |
| SQLAlchemy text() with interpolation | ✅ `text(f"SELECT...")` |
| SQLAlchemy execute() with f-string | ✅ `execute(f"...")` |
| Sequelize raw query | ✅ `` sequelize.query(`...${...}`) `` |
| TypeORM where() | ✅ `` .where(`...${...}`) `` |
| GORM Raw() with Sprintf | ✅ `.Raw(fmt.Sprintf(...))` |
| Prisma $queryRawUnsafe | ✅ `` $queryRawUnsafe(`...${...}`) `` |
| Tests | ✅ 8+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go

Current `sql_injection` catches raw string interpolation but misses ORM escape hatches:

```python
# SQLAlchemy
db.execute(text(f"SELECT * FROM users WHERE id = {user_id}"))
User.query.filter(text("name = '" + name + "'"))

# Django
User.objects.raw("SELECT * FROM users WHERE id = %s" % user_id)
User.objects.extra(where=["name = '%s'" % name])
```

```javascript
// Sequelize
sequelize.query(`SELECT * FROM users WHERE id = ${userId}`);
Model.findAll({ where: sequelize.literal(`id = ${id}`) });

// Prisma
prisma.$queryRawUnsafe(`SELECT * FROM users WHERE id = ${id}`);
```

```ruby
# ActiveRecord
User.where("name = '#{name}'")
User.find_by_sql("SELECT * FROM users WHERE id = #{id}")
```

---

### 8.6 Authentication Bypass Patterns ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `AuthBypassExtractor` | ✅ `extractors/auth_bypass.rs` |
| Hardcoded admin credentials | ✅ `username == "admin" && password == "..."` patterns |
| Debug auth headers | ✅ X-Debug-Auth, X-Internal-Auth, X-Admin-Auth |
| Skip auth env vars | ✅ SKIP_AUTH, BYPASS_AUTH, NO_AUTH, DEBUG_AUTH |
| Backdoor patterns | ✅ `if username == "backdoor"`, `if user == "test"` |
| Default credentials | ✅ admin/admin, root/root, test/test, guest/guest |
| Test file confidence reduction | ✅ 0.5 confidence for test files |
| Tests | ✅ 11+ tests covering all patterns |

**Detected patterns:**
```python
# Hardcoded credentials
if username == "admin" and password == "admin":
    ...

# Debug auth headers
if request.headers.get("X-Debug-Auth") == "secret":
    ...

# Skip auth env vars
if os.environ.get("SKIP_AUTH") == "true":
    ...
```

**Languages:** Python, JavaScript, TypeScript, Go, Rust

---

### 8.7 Insecure Deserialization ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `InsecureDeserializationExtractor` | ✅ `extractors/insecure_deserialization.rs` |
| Python pickle (critical) | ✅ `pickle.load()`, `pickle.loads()`, `Unpickler()` |
| Python yaml.load without SafeLoader | ✅ Detects missing SafeLoader |
| Python marshal | ✅ `marshal.load()`, `marshal.loads()` |
| Python eval/exec with user input | ✅ `eval(request...)`, `exec(user...)` |
| JavaScript node-serialize | ✅ `require('node-serialize')`, `.unserialize()` |
| Go gob decoder | ✅ `gob.NewDecoder()`, `gob.Decode()` |
| Java ObjectInputStream (polyglot) | ✅ `ObjectInputStream`, `readObject()` |
| Tests | ✅ 10+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go

Unsafe deserialization of untrusted data:

```python
# Python
pickle.loads(user_input)
yaml.load(user_input)  # Without Loader=SafeLoader
eval(user_input)
exec(user_input)
```

```java
// Java
ObjectInputStream ois = new ObjectInputStream(userInput);
ois.readObject();  // Dangerous!
```

```ruby
# Ruby
Marshal.load(user_input)
YAML.load(user_input)  # Should use safe_load
```

---

### 8.8 Path Traversal Patterns ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `PathTraversalExtractor` | ✅ `extractors/path_traversal.rs` |
| Python open/read/write with user input | ✅ `open(request...)`, `read(params...)` |
| Python os.path.join with user input | ✅ `os.path.join(base, request...)` |
| JavaScript fs operations | ✅ `fs.readFile(req...)`, `fs.writeFile(params...)` |
| JavaScript path.join/resolve | ✅ `path.join(base, req.query...)` |
| JavaScript res.sendFile | ✅ `res.sendFile(req.params...)` |
| Go filepath operations | ✅ `filepath.Join(base, r...)`, `os.Open(req...)` |
| Rust path operations | ✅ `Path::new(request...)`, `std::fs::read(user...)` |
| Traversal literals | ✅ `../`, `%2e%2e` URL-encoded patterns |
| Tests | ✅ 8+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go, Rust

File operations with user input:

```python
# Python
open(user_input)
os.path.join(base, user_input)  # Doesn't prevent ../
shutil.copy(user_input, dest)
```

```javascript
// JavaScript
fs.readFile(userInput)
path.join(base, userInput)  // Doesn't prevent ../
res.sendFile(userInput)
```

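The comments above note that a plain join does not prevent `../`. The sketch below illustrates both sides using only the standard library: a check for the traversal literals named in the task table, and one possible lexical join that refuses anything able to escape the base directory. Both function names are hypothetical and neither is taken from `extractors/path_traversal.rs`. The join is purely lexical, so symlinks are not resolved, and the absolute-path rejection shown assumes Unix-style paths.

```rust
use std::path::{Component, Path, PathBuf};

/// True when the input contains a raw or URL-encoded `..` traversal literal.
pub fn contains_traversal_literal(input: &str) -> bool {
    let lower = input.to_ascii_lowercase();
    lower.contains("../") || lower.contains("%2e%2e")
}

/// Joins `user` onto `base`, rejecting components that could leave `base`.
pub fn safe_join(base: &Path, user: &str) -> Option<PathBuf> {
    let mut out = base.to_path_buf();
    for comp in Path::new(user).components() {
        match comp {
            Component::Normal(part) => out.push(part),
            Component::CurDir => {}
            // `..`, a leading `/`, or a drive prefix would escape `base`.
            _ => return None,
        }
    }
    Some(out)
}
```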
---

### 8.9 SSRF Patterns ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `SsrfExtractor` | ✅ `extractors/ssrf.rs` |
| Python requests library | ✅ `requests.get(url)`, `requests.post(target)` |
| Python urllib | ✅ `urllib.request.urlopen(url)` |
| Python httpx | ✅ `httpx.get(url)`, `AsyncClient` |
| JavaScript fetch | ✅ `fetch(url)`, `fetch(req.query...)` |
| JavaScript axios | ✅ `axios.get(url)`, `axios.post(target)` |
| JavaScript got | ✅ `got(url)` |
| Go http.Get/Post | ✅ `http.Get(url)`, `http.NewRequest(...)` |
| Rust reqwest | ✅ `reqwest::get(url)`, `reqwest::Client` |
| URL sink patterns | ✅ `proxy_url`, `webhook_url`, `callback_url` from request |
| Tests | ✅ 10+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go, Rust

HTTP requests with user-controlled URLs:

```python
# Python
requests.get(user_url)
urllib.request.urlopen(user_input)
```

```javascript
// JavaScript
fetch(userUrl)
axios.get(userUrl)
http.get(userUrl)
```

```go
// Go
http.Get(userURL)
client.Do(req)  // Where req.URL is user-controlled
```

---

### 8.10 Missing Security Headers ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `SecurityHeadersExtractor` | ✅ `extractors/security_headers.rs` |
| X-Frame-Options disabled | ✅ `X-Frame-Options: none`, `ALLOWALL` |
| X-Content-Type-Options disabled | ✅ `X-Content-Type-Options: disabled` |
| X-XSS-Protection disabled | ✅ `X-XSS-Protection: false` |
| Django SECURE_* settings | ✅ `SECURE_BROWSER_XSS_FILTER = False`, etc. |
| YAML headers disabled | ✅ `x_frame_options: false`, `hsts: no` |
| CSP disabled or unsafe | ✅ `unsafe-inline`, `unsafe-eval` directives |
| HSTS disabled | ✅ `Strict-Transport-Security: none`, `hsts_seconds = 0` |
| Tests | ✅ 7+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go, YAML, JSON, TOML

Detect when security headers are explicitly removed or not set:

```python
# Response headers missing
response.headers.pop('X-Content-Type-Options')
response.headers['X-Frame-Options'] = 'ALLOWALL'
```

```javascript
// Express without helmet
app.use(cors());  // CORS without other security
// No app.use(helmet()) found
```

---

### 8.11 Insecure Cookie Flags ✅
|
|
|
|
**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete
|
|
|
|
| Task | Status |
|
|
|------|--------|
|
|
| `InsecureCookiesExtractor` | ✅ `extractors/insecure_cookies.rs` |
|
|
| Missing Secure flag | ✅ `secure=False`, `secure: false` |
|
|
| Missing HttpOnly flag | ✅ `httponly=False`, `httpOnly: false` |
|
|
| SameSite=None without Secure | ✅ `sameSite: 'none'`, `SameSite=None` |
|
|
| Django settings | ✅ SESSION_COOKIE_SECURE, CSRF_COOKIE_SECURE = False |
|
|
| Go cookie patterns | ✅ `Secure: false`, `HttpOnly: false` |
|
|
| Rust actix-web patterns | ✅ `.secure(false)`, `.http_only(false)` |
|
|
| Test file confidence reduction | ✅ 0.5 confidence for test files |
|
|
| Tests | ✅ 8+ tests covering all patterns |
|
|
|
|
**Detected patterns:**
|
|
```python
|
|
# Python/Flask/Django
|
|
response.set_cookie('session', value, secure=False)
|
|
SESSION_COOKIE_SECURE = False
|
|
```
|
|
|
|
```javascript
|
|
// JavaScript/Express
|
|
res.cookie('session', value, { httpOnly: false });
|
|
res.cookie('auth', value, { sameSite: 'none' });
|
|
```
|
|
|
|
**Languages:** Python, JavaScript, TypeScript, Go, Rust, Ruby, YAML
|
|
|
|
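The patterns in the table above can be approximated with plain substring checks on a normalized line. The following is a minimal sketch, not the actual `InsecureCookiesExtractor`: the names `CookieIssue`, `detect_cookie_issues`, and `confidence_for`, and the test-file path heuristic, are assumptions made for illustration. Only the 0.5 confidence for test files comes from the roadmap.

```rust
/// Hypothetical issue kinds; the real extractor's types are not shown in this roadmap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CookieIssue {
    SecureDisabled,
    HttpOnlyDisabled,
    SameSiteNone,
}

/// Lowercase the line and drop whitespace and quotes, so that
/// `httpOnly: false` and `httponly:false` compare equal.
fn normalize(line: &str) -> String {
    line.chars()
        .filter(|c| !c.is_whitespace() && *c != '\'' && *c != '"')
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Report which insecure cookie settings appear on a single source line.
fn detect_cookie_issues(line: &str) -> Vec<CookieIssue> {
    let n = normalize(line);
    let has = |patterns: &[&str]| patterns.iter().any(|p| n.contains(*p));
    let mut issues = Vec::new();
    if has(&["secure=false", "secure:false", ".secure(false)"]) {
        issues.push(CookieIssue::SecureDisabled);
    }
    if has(&["httponly=false", "httponly:false", ".http_only(false)"]) {
        issues.push(CookieIssue::HttpOnlyDisabled);
    }
    // SameSite=None is only a finding when Secure is not enabled alongside it.
    if has(&["samesite=none", "samesite:none"]) && !has(&["secure=true", "secure:true"]) {
        issues.push(CookieIssue::SameSiteNone);
    }
    issues
}

/// Cap confidence at 0.5 for files that look like tests (assumed path heuristic).
fn confidence_for(path: &str, base: f64) -> f64 {
    let p = path.to_lowercase();
    let is_test = p.contains("/tests/") || p.contains("_test.") || p.contains(".test.");
    if is_test { base.min(0.5) } else { base }
}
```

A line-based check like this cannot see a `Secure` flag set on a different line, which is one reason a real extractor would need more context than this sketch uses.
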
---

### 8.12 Unvalidated Redirects ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `UnvalidatedRedirectsExtractor` | ✅ `extractors/unvalidated_redirects.rs` |
| Python redirect with user input | ✅ `redirect(request.GET['next'])`, `HttpResponseRedirect(url)` |
| Python Flask redirect | ✅ `redirect(request.args.get(...))` |
| JavaScript res.redirect | ✅ `res.redirect(req.query.next)` |
| JavaScript window.location | ✅ `window.location = url`, `location.href = params...` |
| Go http.Redirect | ✅ `http.Redirect(w, r, r.Query...)` |
| URL parameter patterns | ✅ `redirect_url`, `return_url`, `next`, `goto` from request |
| Tests | ✅ 7+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go

Examples of open redirect vulnerabilities:

```python
# Python
return redirect(request.args.get('next'))
return redirect(request.GET['url'])
```

```javascript
// JavaScript
res.redirect(req.query.redirect);
window.location = userInput;
window.location.href = params.url;
```

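One way to read the table above is "a redirect sink whose argument mentions a request-derived value". The sketch below implements only that narrow rule with substring matching. The function name `is_unvalidated_redirect` and the `SINKS` and `SOURCES` lists are illustrative assumptions, not the contents of `extractors/unvalidated_redirects.rs`.

```rust
/// Redirect sinks, lowercased. `redirect(` also covers `res.redirect(`,
/// `HttpResponseRedirect(` and `http.Redirect(` after lowercasing.
const SINKS: &[&str] = &["redirect(", "window.location", "location.href"];

/// Request-derived sources, lowercased (assumed list for illustration).
const SOURCES: &[&str] = &[
    "request.get",
    "request.args",
    "req.query",
    "req.params",
    "r.url.query",
    "params.",
];

/// Returns true when a redirect sink is followed, on the same line,
/// by something that looks like user-controlled request data.
fn is_unvalidated_redirect(line: &str) -> bool {
    let lower = line.to_lowercase();
    SINKS.iter().any(|sink| match lower.find(*sink) {
        Some(pos) => {
            // Only inspect the text after the sink: that is the redirect target.
            let target = &lower[pos + sink.len()..];
            SOURCES.iter().any(|src| target.contains(*src))
        }
        None => false,
    })
}
```

This deliberately misses `window.location = userInput;`, because deciding whether a bare variable is user-controlled needs data-flow tracking that a single-line check cannot provide.
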
---

### 8.13 XXE (XML External Entity) ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Status:** Complete

| Task | Status |
|------|--------|
| `XxeExtractor` | ✅ `extractors/xxe.rs` |
| Python lxml/etree | ✅ `etree.parse()`, `lxml.fromstring()` |
| Python xml.etree.ElementTree | ✅ `ET.parse()`, `ET.fromstring()` |
| Python xml.dom.minidom | ✅ `minidom.parse()`, `minidom.parseString()` |
| Python xml.sax | ✅ `xml.sax.parse()`, `xml.sax.make_parser()` |
| JavaScript xml2js | ✅ `xml2js.parseString()`, `xml2js.Parser()` |
| JavaScript libxmljs | ✅ `libxmljs.parseXml()` |
| Go encoding/xml | ✅ `xml.Unmarshal()`, `xml.NewDecoder()` |
| Java patterns (polyglot) | ✅ `DocumentBuilderFactory`, `SAXParser`, `XMLReader` |
| DTD entity declarations | ✅ `<!ENTITY ... SYSTEM>`, `<!ENTITY ... PUBLIC>` |
| defusedxml detection | ✅ Lower confidence when defusedxml is imported |
| Tests | ✅ 9+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go

Unsafe XML parsing:

```python
# Python
etree.parse(user_input)  # Without disabling entities
xml.etree.ElementTree.parse(user_input)
```

```java
// Java
DocumentBuilderFactory.newInstance()  // Without setFeature to disable XXE
SAXParserFactory.newInstance()  // Without secure processing
```

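The "defusedxml detection" row describes lowering confidence when a hardened parser library is imported. A minimal sketch of that idea for Python sources follows. The function name `xxe_confidence` and the two confidence values (0.8 and 0.3) are placeholders chosen for illustration; the roadmap does not state the actual numbers.

```rust
/// Python XML parser calls that may resolve external entities (subset of the table above).
const PARSER_CALLS: &[&str] = &[
    "etree.parse(",
    "etree.fromstring(",
    "ElementTree.parse(",
    "ET.parse(",
    "ET.fromstring(",
    "minidom.parse(",
    "minidom.parseString(",
];

/// Returns `None` when no XML parser call is present, otherwise a confidence
/// score that is reduced when the file imports `defusedxml`.
/// The scores 0.8 and 0.3 are assumed values, not taken from the extractor.
fn xxe_confidence(source: &str) -> Option<f64> {
    if !PARSER_CALLS.iter().any(|call| source.contains(*call)) {
        return None;
    }
    let imports_defusedxml = source
        .lines()
        .map(str::trim_start)
        .filter(|l| l.starts_with("import ") || l.starts_with("from "))
        .any(|l| l.contains("defusedxml"));
    Some(if imports_defusedxml { 0.3 } else { 0.8 })
}
```

Lowering the score rather than suppressing the finding keeps the claim visible for review while reflecting that the call is probably going through the hardened wrapper.
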
---

### 8.14 Weak Password Requirements ✅

**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete

| Task | Status |
|------|--------|
| `WeakPasswordExtractor` | ✅ `extractors/weak_password.rs` |
| Minimum length < 8 | ✅ `password_min_length: 6`, `minLength: 4` |
| Bcrypt cost < 10 | ✅ `bcrypt_cost = 8`, `hash_rounds = 5` |
| Simple length checks | ✅ `len(password) >= 6` in code |
| Complexity disabled | ✅ `require_special_chars: false`, `require_uppercase = false` |
| Number requirement disabled | ✅ `require_numbers: no`, `require_digit = 0` |
| Tests | ✅ 7+ tests covering all patterns |

**Languages:** Python, JavaScript, TypeScript, Go, Rust, YAML, JSON, TOML

Examples of password validation that is too weak:

```python
# Python
if len(password) >= 4:  # Too short
if len(password) >= 6:  # Still weak
MIN_PASSWORD_LENGTH = 6  # Config too low
```

```javascript
// JavaScript
if (password.length >= 4)
const MIN_LENGTH = 6;
/^.{4,}$/  // Regex allows 4+ chars
```

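The two numeric thresholds in the table (minimum length below 8, bcrypt cost below 10) can be checked on simple `key: value` or `key = value` settings. This is a rough sketch under that assumption. `Weakness` and `check_setting` are hypothetical names, and in-code comparisons such as `len(password) >= 6` are out of scope here.

```rust
/// Hypothetical finding type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Weakness {
    MinLengthTooShort(u32),
    HashCostTooLow(u32),
}

/// Inspect one `key: value` or `key = value` line and apply the thresholds
/// from the table: minimum length < 8 and bcrypt cost / hash rounds < 10.
fn check_setting(line: &str) -> Option<Weakness> {
    let (key, value) = line.split_once(|c: char| c == ':' || c == '=')?;
    let key = key.trim().to_lowercase();
    // Tolerate trailing `,` or `;` as found in JSON-like and JavaScript sources.
    let value: u32 = value
        .trim()
        .trim_end_matches(|c: char| c == ',' || c == ';')
        .trim()
        .parse()
        .ok()?;

    let is_min_length = key.contains("min") && key.contains("length");
    let is_hash_cost = key.contains("bcrypt_cost") || key.contains("hash_rounds");

    if is_min_length && value < 8 {
        Some(Weakness::MinLengthTooShort(value))
    } else if is_hash_cost && value < 10 {
        Some(Weakness::HashCostTooLow(value))
    } else {
        None
    }
}
```

Matching on key fragments keeps the check language-agnostic, at the cost of possible false positives on unrelated keys that happen to contain both "min" and "length".
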
---

### 8.15 LLM-Assisted Extraction (Future) ⬜

**Impact:** VERY HIGH | **Effort:** VERY HIGH

Use Claude to understand code semantically:

```rust
// Pseudo-implementation: `claude_api`, `LlmError`, and
// `parse_claims_from_llm_response` are placeholders.
async fn extract_with_llm(code: &str, file: &str) -> Result<Vec<ExtractedClaim>, LlmError> {
    let prompt = format!(
        "Analyze this code from {} for security issues. Return JSON with:\n\
         - concept_path: security concept (e.g., 'tls/cert_verification')\n\
         - predicate: what aspect (e.g., 'enabled')\n\
         - value: the value found\n\
         - confidence: 0.0-1.0\n\
         - description: why this is an issue\n\n\
         Code:\n```\n{}\n```",
        file, code
    );

    // `?` propagates API failures, so the function returns a `Result`.
    let response = claude_api.message(&prompt).await?;
    Ok(parse_claims_from_llm_response(&response))
}
```

**When to use:**
- High-value files (auth, crypto, config)
- After regex extractors find nothing
- For code review mode (not CI)

**Considerations:**
- Cost per scan
- Latency
- Rate limits
- Privacy (code leaves machine)

---

### Implementation Priority

| Phase | Extractors | Impact | Effort | Enterprise Value | Status |
|-------|------------|--------|--------|------------------|--------|
| **8.1** | High-entropy secrets | HIGH | MEDIUM | Catches real leaked secrets | ✅ |
| **8.2** | Framework-specific | HIGH | HIGH | Spring/Django/Express coverage | ✅ |
| **8.3** | Config deep parsing | HIGH | MEDIUM | Nested YAML/JSON understanding | ✅ |
| **8.4** | Semantic TLS | MEDIUM | MEDIUM | Catches `const TLS_MIN = "1.0"` | ✅ |
| **8.5** | ORM SQL injection | MEDIUM | MEDIUM | SQLAlchemy, Django, Sequelize | ✅ |
| **8.6** | Auth bypass | HIGH | MEDIUM | Backdoors, hardcoded creds | ✅ |
| **8.7** | Deserialization | HIGH | MEDIUM | pickle, Marshal, eval | ✅ |
| **8.8** | Path traversal | MEDIUM | LOW | `../../../etc/passwd` | ✅ |
| **8.9** | SSRF | HIGH | MEDIUM | Internal network access | ✅ |
| **8.10** | Security headers | MEDIUM | LOW | Missing `helmet()`, CSP | ✅ |
| **8.11** | Cookie flags | MEDIUM | LOW | httpOnly, secure, sameSite | ✅ |
| **8.12** | Open redirects | MEDIUM | LOW | Phishing via redirect | ✅ |
| **8.13** | XXE | HIGH | MEDIUM | XML entity injection | ✅ |
| **8.14** | Weak passwords | MEDIUM | LOW | `MIN_LENGTH = 4` | ✅ |
| **8.15** | LLM extraction | VERY HIGH | VERY HIGH | Semantic understanding | ✅ (Phase 7.5) |

**Phase 8 Complete (8.1-8.14):** All extractors implemented including 10 framework-specific extractors (Spring, Django, Express, Rails, ASP.NET, Laravel, FastAPI, Next.js, Flask, NestJS).

---

### Success Metrics

| Metric | Current | Target | How to Measure |
|--------|---------|--------|----------------|
| Detection rate (known vulns) | ~30% | >70% | Run against OWASP benchmark |
| False positive rate | Unknown | <10% | Manual review of 100 findings |
| Config file coverage | Regex only | Full parse | Structure-aware extraction |
| Framework coverage | 0 | 4 major | Spring, Django, Express, Rails |
| Enterprise pilot feedback | N/A | >4/5 | Post-pilot survey |

---

## Phase 10: UX & Enterprise Polish ⬜

> **Goal:** Address enterprise buyer feedback from pilot demos. Close gaps between pitch claims and actual functionality.
> **Source:** Skeptical buyer review of `applications/aphoria-pitch/` materials.

### 10.1 Acknowledgment Expiry ✅

**Impact:** HIGH | **Effort:** MEDIUM | **Priority:** P1

Add `--expires` flag to `aphoria ack` command for time-limited exceptions.

| Task | Status |
|------|--------|
| Add `expires_at: Option<String>` to `AcknowledgmentInfo` struct (ISO 8601 format) | ✅ |
| Add `--expires` CLI flag to `Commands::Ack` in `cli.rs` | ✅ |
| Parse durations: `--expires 90d`, `--expires 2026-12-31` (ISO 8601 date only) | ✅ |
| Filter expired acks in `check_conflicts()` | ✅ |
| Show "Ack expired, resurfaces as BLOCK" in output | ✅ |
| Add expiry to JSON export for audit trail | ✅ |
| Tests for expiry parsing and behavior | ✅ |

**Implementation Notes:**
- Created `src/expiry.rs` module with `parse_expiry()`, `is_expired()`, and `format_expiry()` functions
- Ack payloads stored as JSON with `{reason, expires_at}` for backwards compatibility
- Legacy plain-text acks treated as permanent (no expiry)
- Expired acks preserved for audit trail per patent claim 25
- Updated all report formatters (table, JSON, markdown) to show expiry info

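To make the two accepted `--expires` forms concrete, here is a standard-library-only sketch of how a relative duration such as `90d` or an ISO 8601 date could be turned into an `expires_at` string. The function names mirror those listed above, but the signatures, the explicit `today` parameter, and the rule that an ack expires once the date has passed are assumptions for illustration, not the contents of `src/expiry.rs`.

```rust
/// Days since 1970-01-01 for a Gregorian date (Howard Hinnant's civil-date algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400;
    let mp = (m + 9) % 12; // months counted from March
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: returns (year, month, day).
fn civil_from_days(z: i64) -> (i64, i64, i64) {
    let z = z + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z - era * 146_097;
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let d = doy - (153 * mp + 2) / 5 + 1;
    let m = if mp < 10 { mp + 3 } else { mp - 9 };
    let y = yoe + era * 400 + if m <= 2 { 1 } else { 0 };
    (y, m, d)
}

/// Parse a strict `YYYY-MM-DD` date into a day number; rejects impossible dates.
fn parse_date(s: &str) -> Option<i64> {
    let well_formed = s.len() == 10
        && s.bytes().enumerate().all(|(i, b)| {
            if i == 4 || i == 7 { b == b'-' } else { b.is_ascii_digit() }
        });
    if !well_formed {
        return None;
    }
    let (y, m, d): (i64, i64, i64) =
        (s[0..4].parse().ok()?, s[5..7].parse().ok()?, s[8..10].parse().ok()?);
    let days = days_from_civil(y, m, d);
    // A round trip catches values such as 2026-02-30 or month 13.
    (civil_from_days(days) == (y, m, d)).then_some(days)
}

/// Turn `90d` or `2026-12-31` into an ISO 8601 expiry date, relative to `today`.
fn parse_expiry(input: &str, today: &str) -> Option<String> {
    let days = match input.strip_suffix('d') {
        Some(n) => parse_date(today)? + i64::from(n.parse::<u32>().ok()?),
        None => parse_date(input)?,
    };
    let (y, m, d) = civil_from_days(days);
    Some(format!("{y:04}-{m:02}-{d:02}"))
}

/// Acks without an expiry are permanent; ISO dates compare correctly as strings.
fn is_expired(expires_at: Option<&str>, today: &str) -> bool {
    expires_at.map_or(false, |e| today > e)
}
```

Passing `today` explicitly keeps the functions deterministic and easy to test; a production version would more likely read the clock or use a date library.
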
**CLI changes (`cli.rs`):**

```rust
Ack {
    concept_path: String,
    #[arg(short, long)]
    reason: String,
    /// Optional expiry (e.g., "90d", "2026-12-31")
    #[arg(long)]
    expires: Option<String>,
},
```

**Usage:**

```bash
# Expire after 90 days
aphoria ack code://go/auth/tls/cert_verification \
  --reason "Integration test environment" \
  --expires 90d

# Expire on specific date (ISO 8601)
aphoria ack code://go/auth/tls/cert_verification \
  --reason "Legacy migration - ends Q2" \
  --expires 2026-12-31
```

**Output after expiry:**

```
BLOCK code://go/auth/tls/cert_verification
  Your code: TLS certificate verification is disabled (main.go:12)
  Note: Previous acknowledgment expired 2026-12-31
  Action: Re-acknowledge or fix the issue
```

**Enterprise Value:** "Exceptions don't become permanent." SOC 2 auditors love time-limited exceptions because they force periodic review.

---

### 10.2 Human-Readable Signer Names ⬜

**Impact:** MEDIUM | **Effort:** MEDIUM | **Priority:** P2

Map issuer hex IDs to human-readable team names in output.

| Task | Status |
|------|--------|
| Add `signer_name: Option<String>` to `PackHeader` | ⬜ |
| Add `contact: Option<String>` to `PackHeader` (Slack channel, email) | ⬜ |
| Update `policy export/import` to preserve new fields | ⬜ |
| Show "Signed by Platform Security Team" instead of hex in output | ⬜ |
| Show contact info in conflict output | ⬜ |
| Backward-compat: gracefully handle packs without new fields | ⬜ |

**Output with signer name:**

```
BLOCK code://go/auth/tls/cert_verification
  Your code: TLS certificate verification is disabled (main.go:12)
  Source: Acme Security Standard v3.2 (Platform Security Team)
  Contact: #security-policy
  Action: Fix or acknowledge with: aphoria ack <path> --reason "..."
```

**Enterprise Value:** Developers know who to contact. Auditors see clear attribution.

---

### 10.3 Speed Benchmarks ⬜

**Impact:** LOW | **Effort:** LOW | **Priority:** P3

Document and automate speed benchmark testing.

| Task | Status |
|------|--------|
| Create `benchmarks/` directory with test corpora | ⬜ |
| Automate `time aphoria scan` on standard corpus | ⬜ |
| Document test conditions in benchmark results | ⬜ |
| Add `aphoria scan --benchmark` flag for self-test | ⬜ |
| Include benchmarks in CI (optional, non-blocking) | ⬜ |

**Usage:**

```bash
# Run benchmark on current directory
aphoria scan --benchmark

# Output includes timing breakdown
Benchmark Results:
  Files scanned: 767
  Lines of code: 187,918
  Claims extracted: 722
  Conflicts found: 186
  Total time: 652ms
    - File discovery: 45ms
    - Extraction: 487ms
    - Conflict query: 120ms
```

**Enterprise Value:** "Show me the benchmark on a 100K-line codebase" → `aphoria scan --benchmark`

---

### Phase 10 Completion Criteria

| Metric | Target |
|--------|--------|
| Ack expiry working with 90d default | ✓ |
| Demo output matches pitch slides exactly | ✓ |
| Buyer can see who signed a policy (name, not hex) | ✓ |
| Buyer can see how to contact policy owner | ✓ |
| Speed benchmarks documented and reproducible | ✓ |

---

## Phase 11: Evidence-Based Authority ✅

> **Vision:** Authority comes from evidence, not titles. Merit over tenure.

**Problem:** All patterns are treated equally. A random commit carries the same weight as a pattern backed by RFC research and product specs.

**Principle:** The system rewards documentation, not tenure.

### Evidence Levels

| Level | Example | Authority Weight | Graduation Threshold |
|-------|---------|------------------|----------------------|
| ProductSpec | `specs/api-design.md → REQ-API-001` | 0.95 | 1 usage |
| Standard | RFC 7519, OWASP A03:2021 | 0.85 | 3 usages |
| Research | ADR-042, docs/decision-log.md | 0.70 | 5 usages |
| Commit | Just code, no context | 0.40 | 10 usages |

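The table above maps directly onto an enum with two lookup methods. The sketch below uses the `EvidenceLevel` variants and the `authority_weight()` method named later in this phase, with weights and thresholds copied from the table. The methods `graduation_threshold()` and `is_promotion_candidate()` are hypothetical names added for illustration.

```rust
/// Evidence levels ordered from weakest to strongest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EvidenceLevel {
    Commit,
    Research,
    Standard,
    ProductSpec,
}

impl EvidenceLevel {
    /// Authority weight from the Evidence Levels table.
    fn authority_weight(self) -> f64 {
        match self {
            EvidenceLevel::ProductSpec => 0.95,
            EvidenceLevel::Standard => 0.85,
            EvidenceLevel::Research => 0.70,
            EvidenceLevel::Commit => 0.40,
        }
    }

    /// Usages required before a pattern becomes a promotion candidate.
    /// (Hypothetical method name; values are from the table.)
    fn graduation_threshold(self) -> u32 {
        match self {
            EvidenceLevel::ProductSpec => 1,
            EvidenceLevel::Standard => 3,
            EvidenceLevel::Research => 5,
            EvidenceLevel::Commit => 10,
        }
    }

    /// Stronger evidence graduates with fewer observed usages.
    fn is_promotion_candidate(self, usages: u32) -> bool {
        usages >= self.graduation_threshold()
    }
}
```

Deriving `Ord` with the variants listed weakest-first means comparisons such as `level >= EvidenceLevel::Research` can express "Research or better", which matches how the pilot metrics later in this roadmap talk about "Research+ evidence".
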
### 11.1 Evidence Level Types ✅

| Task | Status |
|------|--------|
| Create `src/evidence/mod.rs` module | ✅ |
| Define `EvidenceLevel` enum (Commit, Research, Standard, ProductSpec) | ✅ |
| Implement `authority_weight()` method | ✅ |
| Add evidence level to `LearnedPattern` struct | ✅ |
| Update pattern display to show evidence level | ✅ |

### 11.2 Evidence Source Detection ✅

| Task | Status |
|------|--------|
| Create `EvidenceSource` enum | ✅ |
| Implement commit message parsing for RFC/standard references | ✅ |
| Implement ADR file detection (docs/adr/*.md patterns) | ✅ |
| Implement spec file detection (specs/*.md, *.spec.md) | ✅ |
| Add `PatternEvidence::detect()` auto-detection | ✅ |

### 11.3 Evidence-Aware Graduation ✅

| Task | Status |
|------|--------|
| Update `GraduationManager` thresholds based on evidence | ✅ |
| ProductSpec: 1 usage → promotion candidate | ✅ |
| Standard: 3 usages → promotion candidate | ✅ |
| Research: 5 usages → promotion candidate | ✅ |
| Commit-only: 10 usages → promotion candidate | ✅ |
| Add evidence boost to shadow mode evaluation | ✅ |

### 11.4 Evidence Display ✅

| Task | Status |
|------|--------|
| Update `aphoria patterns show` to display evidence chain | ✅ |
| Show evidence level badge in table/JSON output | ✅ |
| Show linked sources (ADR, spec, RFC) in conflict output | ✅ |
| Add `--evidence` flag to filter patterns by evidence level | ✅ |

### Phase 11 Completion Criteria

| Metric | Status |
|--------|--------|
| Evidence detection working for 4 source types | ✅ |
| Graduation thresholds vary by evidence level | ✅ |
| Pattern display shows evidence chain | ✅ |
| ProductSpec-backed patterns graduate with 1 usage | ✅ |

### Implementation Notes

**Files Created:**
- `src/evidence/mod.rs` - Module exports with flow documentation
- `src/evidence/types.rs` - `EvidenceLevel`, `EvidenceSource`, `PatternEvidence` types
- `src/evidence/detection.rs` - `EvidenceDetector` with regex-based parsing

**Files Modified:**
- `src/learning/types.rs` - Added `evidence` field to `LearnedPattern`
- `src/learning/store.rs` - Added `get_all_patterns()`, `get_pattern_by_id()`
- `src/shadow/types.rs` - Added `evidence_level`, `evidence_sources` to `ShadowTest`
- `src/shadow/graduation.rs` - Added `effective_min_scans()`, `meets_evidence_aware_criteria()`
- `src/cli.rs` - Added `Show` variant to `PatternCommands`
- `src/handlers/patterns.rs` - Implemented `handle_pattern_show()`

**Tests:** 29 evidence tests + 15 graduation tests passing (817 total)

---

## Phase 12: Knowledge Scope Hierarchy ✅

> **Vision:** Knowledge applies at the right level - org, team, or project.

**Problem:** All knowledge exists at one flat level. No way to say "this applies org-wide" vs "this is just our team's preference."

### Scope Levels

```
Organization Level (applies to all teams)
├── Security policies (TLS, auth, secrets) - NO opt-out
├── Compliance requirements (GDPR, SOC 2)
└── Architecture decisions (API gateway, event bus)

Team Level (applies to team's projects)
├── Coding conventions (naming, error handling)
├── Technology choices (frameworks, libraries)
└── Domain patterns (payment flows, user lifecycle)

Project Level (applies to single project)
├── Local overrides (justified exceptions)
├── Experimental patterns (not yet proven)
└── Context-specific decisions
```

### 12.1 Scope Level Types ✅

| Task | Status |
|------|--------|
| Create `src/scope/mod.rs` module | ✅ |
| Define `ScopeLevel` enum (Organization, Team, Project) | ✅ |
| Add `scope_level` and `scope_id` to `LearnedPattern` | ✅ |
| Add `ScopeConfig` to `.aphoria.toml` | ✅ |
| Implement `--scope` flag for CLI commands | ✅ |

### 12.2 Scope Inheritance ✅

| Task | Status |
|------|--------|
| Implement inheritance resolution (project → team → org) | ✅ |
| Security policies: auto-apply, no opt-out | ✅ |
| Conventions: auto-apply, teams can override with justification | ✅ |
| Observations: never inherited, team-specific only | ✅ |
| Add `ScopedKnowledge` struct with `inherited_from` chain | ✅ |

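The three inheritance rules in 12.2 can be sketched as a lookup that walks project → team → org. This is an illustrative model only: `Kind`, `Entry`, `chain`, and `resolve` are hypothetical names, and the real `ScopeResolver` may represent knowledge and precedence differently.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScopeLevel {
    Organization,
    Team,
    Project,
}

/// Hypothetical knowledge categories matching the three rules in 12.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    SecurityPolicy,
    Convention,
    Observation,
}

/// One piece of knowledge about a single concept, recorded at some scope.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Entry {
    level: ScopeLevel,
    kind: Kind,
    value: String,
}

/// Lookup order from the most specific scope up to the organization.
fn chain(from: ScopeLevel) -> &'static [ScopeLevel] {
    match from {
        ScopeLevel::Project => &[ScopeLevel::Project, ScopeLevel::Team, ScopeLevel::Organization],
        ScopeLevel::Team => &[ScopeLevel::Team, ScopeLevel::Organization],
        ScopeLevel::Organization => &[ScopeLevel::Organization],
    }
}

/// Pick the effective entry for a concept as seen from scope `from`.
fn resolve(entries: &[Entry], from: ScopeLevel) -> Option<&Entry> {
    // Rule 1: org-level security policies always apply (no opt-out).
    if let Some(policy) = entries
        .iter()
        .find(|e| e.level == ScopeLevel::Organization && e.kind == Kind::SecurityPolicy)
    {
        return Some(policy);
    }
    // Rule 2: otherwise the most specific scope wins, so teams can override conventions.
    // Rule 3: observations are never inherited; they count only at their own scope.
    chain(from).iter().find_map(|level| {
        entries
            .iter()
            .find(|e| e.level == *level && (e.kind != Kind::Observation || e.level == from))
    })
}
```

Checking the org-level security policy before walking the chain is what enforces "no opt-out": a project-level entry can never shadow it, regardless of how specific it is.
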
### 12.3 Scope Override Workflow ✅

| Task | Status |
|------|--------|
| Implement `aphoria scope override` command | ✅ |
| Require justification for overrides | ✅ |
| Require evidence link (spec, ADR, ticket) for overrides | ✅ |
| Store override audit trail | ✅ |
| Show overrides in SOC 2 reports | ⬜ |

### 12.4 Cross-Scope Queries ✅

| Task | Status |
|------|--------|
| `aphoria patterns --scope org` (org-level only) | ✅ |
| `aphoria patterns --scope team --exclude-inherited` | ✅ |
| `aphoria patterns --scope project --only-local` | ✅ |
| Show scope in pattern list output | ✅ |

### Phase 12 Completion Criteria

| Metric | Status |
|--------|--------|
| 3 scope levels working (org/team/project) | ✅ |
| Inheritance resolution correct | ✅ |
| Overrides require justification + evidence | ✅ |
| Cross-scope queries functional | ✅ |

**Implementation Notes:**
- `src/scope/mod.rs` - ScopeLevel, ScopeId, ScopeContext with inheritance chain
- `src/scope/config.rs` - ScopeConfig for `.aphoria.toml`
- `src/scope/resolver.rs` - ScopeResolver with Replace/Merge/NoInherit policies
- `src/scope/override_record.rs` - ScopeOverride with OverrideValue, expiration
- `src/scope/store.rs` - OverrideStore with persistence to ~/.aphoria/scope/
- `src/handlers/scope.rs` - CLI command handlers (status, override, list, remove)

**Tests:** 884 tests passing, all scope tests passing

---

## Phase 13: Knowledge Lifecycle Management ✅

> **Vision:** Knowledge ages. Patterns can be deprecated and superseded.

**Problem:** Knowledge exists forever. No way to deprecate patterns or track evolution.

### Knowledge Status

```
Active     → Pattern is current, enforced
Deprecated → Pattern is being phased out, migration guidance provided
Superseded → Pattern replaced by another, link to replacement
Archived   → Pattern removed from active use, historical only
```

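The four statuses above, combined with the later requirement that deprecated usage is flagged rather than blocked, suggest a small enum plus a mapping to scan behavior. In the sketch below, the `Deprecated` fields follow the roadmap (`reason`, `superseded_by`, `sunset_date`); the `ScanAction` type, the `scan_action` function, the `replacement` field, and the treatment of `Superseded` and `Archived` are assumptions.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum KnowledgeStatus {
    Active,
    Deprecated {
        reason: String,
        superseded_by: Option<String>,
        sunset_date: Option<String>, // ISO 8601 date
    },
    Superseded { replacement: String },
    Archived,
}

/// Hypothetical outcome of evaluating a pattern during a scan.
#[derive(Debug, PartialEq, Eq)]
enum ScanAction {
    /// Pattern is current: apply it normally.
    Enforce,
    /// Pattern is on its way out: warn (FLAG, not BLOCK) with guidance.
    Flag(String),
    /// Pattern is historical only: do not evaluate it.
    Skip,
}

fn scan_action(status: &KnowledgeStatus) -> ScanAction {
    match status {
        KnowledgeStatus::Active => ScanAction::Enforce,
        KnowledgeStatus::Deprecated { reason, superseded_by, sunset_date } => {
            let mut msg = format!("deprecated: {reason}");
            if let Some(pattern) = superseded_by {
                msg.push_str(&format!("; use {pattern} instead"));
            }
            if let Some(date) = sunset_date {
                msg.push_str(&format!("; sunset {date}"));
            }
            ScanAction::Flag(msg)
        }
        KnowledgeStatus::Superseded { replacement } => {
            ScanAction::Flag(format!("superseded by {replacement}"))
        }
        KnowledgeStatus::Archived => ScanAction::Skip,
    }
}
```

Carrying the migration hints inside the `Deprecated` variant means the warning text can always point at the replacement and the sunset date without a second lookup.
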
### 13.1 Knowledge Status Types ✅

| Task | Status |
|------|--------|
| Create `src/lifecycle/mod.rs` module | ✅ |
| Define `KnowledgeStatus` enum | ✅ |
| Add `Deprecated` variant with reason, superseded_by, sunset_date | ✅ |
| Add `KnowledgeLifecycle` struct with status history | ✅ |
| Store lifecycle in pattern metadata | ✅ |

### 13.2 Deprecation Command ✅

| Task | Status |
|------|--------|
| Implement `aphoria deprecate <pattern-id>` command | ✅ |
| Require `--reason` flag | ✅ |
| Optional `--superseded-by <new-pattern>` | ✅ |
| Optional `--sunset-date <ISO-8601>` | ✅ |
| Notify connected teams on deprecation | ⬜ |

### 13.3 Migration Guidance ✅

| Task | Status |
|------|--------|
| Show deprecation warning in scan output | ✅ |
| Link to superseding pattern when available | ✅ |
| Show migration guide/ADR when linked | ✅ |
| FLAG (not BLOCK) deprecated pattern usage | ✅ |
| Track migration progress across projects | ✅ |

### 13.4 Migration Tracking Dashboard ✅

| Task | Status |
|------|--------|
| Implement `aphoria migrations status` command | ✅ |
| Show progress by team (X/Y endpoints migrated) | ✅ |
| Show days remaining until sunset | ✅ |
| Show blockers (acknowledged exceptions) | ✅ |
| Export migration status for reporting | ✅ |

### Phase 13 Completion Criteria

| Metric | Status |
|--------|--------|
| Deprecation command working | ✅ |
| Deprecated patterns show warning in scan | ✅ |
| Migration tracking across projects | ✅ |
| SOC 2 report includes migration status | ⬜ |

**Implementation Notes:**
- `src/lifecycle/mod.rs` - KnowledgeStatus, KnowledgeLifecycle, StatusTransition
- `src/lifecycle/store.rs` - LifecycleStore for persistence
- `src/lifecycle/migration.rs` - MigrationStore, MigrationProgress tracking
- `src/handlers/lifecycle.rs` - CLI handlers for deprecate, archive, reactivate, history, list
- `src/handlers/lifecycle.rs` - Migration handlers for status, export, blockers
- `KnowledgeLifecycle` added to `LearnedPattern` for pattern-level lifecycle tracking

**Tests:** 884 tests passing (35 lifecycle-specific tests)

---

## Phase 14: Governance Workflows 🎯

> **Vision:** Clear approval paths for pattern promotion with audit trails.

**Problem:** Governance is binary: either manual review or auto-promotion above the 0.95 threshold. There are no structured approval workflows.

### 14.1 Approval Workflow Definition ⬜

| Task | Status |
|------|--------|
| Create `src/governance/mod.rs` module | ⬜ |
| Define `ApprovalWorkflow` struct | ⬜ |
| Define `ApprovalStage` with required approvers | ⬜ |
| Support evidence-based auto-approve thresholds | ⬜ |
| Config: define workflows in `.aphoria.toml` | ⬜ |

### 14.2 Approval State Machine ⬜

| Task | Status |
|------|--------|
| Implement state transitions (pending → approved/rejected) | ⬜ |
| Multi-stage approval support | ⬜ |
| Timeout and escalation policies | ⬜ |
| Store approval history with timestamps | ⬜ |

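A minimal way to picture the state machine in 14.2 is a request that moves through numbered stages and records every decision. The sketch below covers only pending → approved/rejected with multi-stage support and timestamped history. All type shapes here (`ApprovalStatus`, `Action`, `ApprovalRequest` and its fields) are illustrative guesses rather than the planned `governance` types, and timeouts and escalation are left out.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum ApprovalStatus {
    /// Waiting for a decision at the given zero-based stage.
    Pending { stage: usize },
    Approved,
    Rejected { reason: String },
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Action {
    Approve,
    Reject(String),
}

/// One recorded decision: who acted, when (ISO 8601 string), and what they did.
#[derive(Debug, Clone, PartialEq, Eq)]
struct HistoryEntry {
    approver: String,
    at: String,
    action: Action,
}

struct ApprovalRequest {
    total_stages: usize,
    status: ApprovalStatus,
    history: Vec<HistoryEntry>,
}

impl ApprovalRequest {
    fn new(total_stages: usize) -> Self {
        Self {
            total_stages,
            status: ApprovalStatus::Pending { stage: 0 },
            history: Vec::new(),
        }
    }

    /// Apply a decision. Only pending requests accept further actions.
    fn apply(&mut self, approver: &str, at: &str, action: Action) -> Result<(), String> {
        let stage = match self.status {
            ApprovalStatus::Pending { stage } => stage,
            _ => return Err("request is already closed".to_string()),
        };
        self.status = match &action {
            // Approving the last stage closes the request; otherwise advance one stage.
            Action::Approve if stage + 1 >= self.total_stages => ApprovalStatus::Approved,
            Action::Approve => ApprovalStatus::Pending { stage: stage + 1 },
            Action::Reject(reason) => ApprovalStatus::Rejected { reason: reason.clone() },
        };
        self.history.push(HistoryEntry {
            approver: approver.to_string(),
            at: at.to_string(),
            action,
        });
        Ok(())
    }
}
```

Because every accepted action is appended to `history` and closed requests reject further actions, the history doubles as an append-only record of who decided what and when, which is the shape an audit export would need.
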
### 14.3 Approval CLI ⬜

| Task | Status |
|------|--------|
| `aphoria governance pending` - list pending approvals | ⬜ |
| `aphoria governance approve <id> --comment "..."` | ⬜ |
| `aphoria governance reject <id> --reason "..."` | ⬜ |
| `aphoria governance escalate <id>` | ⬜ |
| Show approval status in pattern list | ⬜ |

### 14.4 SOC 2 Audit Trail ⬜

| Task | Status |
|------|--------|
| Full audit log for all governance actions | ⬜ |
| `aphoria audit trail --pattern <id>` - show timeline | ⬜ |
| Export governance history for auditors | ⬜ |
| Include approver identity and timestamp | ⬜ |

### Phase 14 Completion Criteria

| Metric | Target |
|--------|--------|
| Multi-stage approval working | ✓ |
| Approval/reject with comments | ✓ |
| Full audit trail exportable | ✓ |
| SOC 2 evidence includes approval chain | ✓ |

---

## Phase 15: Evidence Source Integration ⬜

> **Vision:** ADRs, specs, and standards automatically link to patterns.

**Problem:** Evidence sources aren't automatically detected. Developers must manually reference them.

### 15.1 ADR Auto-Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/adr.rs` | ⬜ |
| Detect ADR-XXX patterns in commit messages | ⬜ |
| Scan for ADR files in standard locations | ⬜ |
| Parse ADR content for related patterns | ⬜ |
| Link ADR to patterns automatically | ⬜ |

### 15.2 Spec File Detection ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/spec.rs` | ⬜ |
| Detect spec files (specs/*.md, *.spec.md) | ⬜ |
| Parse requirement IDs (REQ-XXX) | ⬜ |
| Link requirements to patterns | ⬜ |
| Show requirement coverage in reports | ⬜ |

### 15.3 Standard Reference Extraction ⬜

| Task | Status |
|------|--------|
| Create `src/evidence/standards.rs` | ⬜ |
| Parse RFC references (RFC 7519) | ⬜ |
| Parse OWASP references (OWASP A03:2021) | ⬜ |
| Parse NIST references (NIST SP 800-53) | ⬜ |
| Auto-link to authoritative corpus | ⬜ |

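The three reference formats listed in 15.3 are regular enough to extract with a simple token scan. The following sketch handles exactly the shapes shown in the table (`RFC 7519`, `OWASP A03:2021`, `NIST SP 800-53`). The names `StandardRef`, `is_owasp_id`, and `extract_refs` are hypothetical, and a real implementation in `src/evidence/standards.rs` might well use regular expressions instead.

```rust
#[derive(Debug, PartialEq, Eq)]
enum StandardRef {
    Rfc(u32),
    Owasp(String),
    NistSp(String),
}

/// Matches OWASP Top 10 identifiers of the form `A03:2021`.
fn is_owasp_id(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 8
        && b[0] == b'A'
        && b[3] == b':'
        && b[1..3].iter().chain(&b[4..]).all(u8::is_ascii_digit)
}

/// Scan free text (for example a commit message) for standard references.
fn extract_refs(text: &str) -> Vec<StandardRef> {
    // Split on whitespace and strip surrounding punctuation such as "(" or ").".
    let words: Vec<&str> = text
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_ascii_alphanumeric()))
        .collect();

    let mut refs = Vec::new();
    for (i, word) in words.iter().enumerate() {
        let next = words.get(i + 1).copied();
        let after_next = words.get(i + 2).copied();
        match (word.to_ascii_uppercase().as_str(), next, after_next) {
            ("RFC", Some(n), _) => {
                if let Ok(number) = n.parse::<u32>() {
                    refs.push(StandardRef::Rfc(number));
                }
            }
            ("OWASP", Some(id), _) if is_owasp_id(id) => {
                refs.push(StandardRef::Owasp(id.to_string()));
            }
            ("NIST", Some(sp), Some(publication))
                if sp.eq_ignore_ascii_case("SP")
                    && publication.starts_with(|c: char| c.is_ascii_digit()) =>
            {
                refs.push(StandardRef::NistSp(publication.to_string()));
            }
            _ => {}
        }
    }
    refs
}
```

Variants such as `RFC7519` (no space) or `RFC-7519` are not recognized by this sketch; supporting them is a question of how permissive the extractor should be.
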
### 15.4 Evidence Display ⬜

| Task | Status |
|------|--------|
| Show full evidence chain in pattern output | ⬜ |
| Link to source files (ADR, spec) | ⬜ |
| Show external standard references | ⬜ |
| `aphoria patterns --by-evidence` grouping | ⬜ |

### Phase 15 Completion Criteria

| Metric | Target |
|--------|--------|
| ADR auto-detection working | ✓ |
| Spec file linking working | ✓ |
| Standard references extracted | ✓ |
| Evidence chain visible in output | ✓ |

---

## Enterprise Pilot Success Metrics

### 90-Day Pilot Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Patterns captured | 100+ observations | Count in knowledge graph |
| Patterns promoted | 10+ conventions | Count with status=Active |
| Cross-team adoption | 2+ teams connected | Unique team_ids |
| New hire guidance events | 5+ accepted suggestions | Accept rate tracking |
| False positive rate | <10% | FP feedback / total flags |
| Evidence-backed patterns | >50% | Patterns with Research+ evidence |

### 180-Day Production Targets

| Metric | Target | Measurement |
|--------|--------|-------------|
| Knowledge retention | 0 lost patterns on departures | Audit log |
| Onboarding velocity | 50% faster ramp | Time to first PR |
| Convention adoption | 80% across org | Compliance rate |
| SOC 2 evidence | Audit pass | External validation |
| Deprecated pattern migration | 90% complete by sunset | Migration tracking |

---

## Enterprise Simulation UAT

See: `uat/enterprise-simulation-uat.md`

6-month simulation covering:
- Month 1: Platform team adopts, baseline patterns captured
- Month 2: Payments team joins, cross-team patterns emerge
- Month 3: New hire guided by existing patterns
- Month 4: Mobile team joins, org-level promotion
- Month 5: API versioning deprecated, migration tracked
- Month 6: SOC 2 audit evidence generated