# Aphoria Comprehensive Vision UAT Plan

**Date:** 2026-02-06

**Status:** Complete (90/90 tests passing)

**Purpose:** Verify that Aphoria delivers on its complete vision across all user personas and use cases

---

## Vision Summary

Aphoria's complete vision encompasses three layers:

1. **Core Value:** A "code-level truth linter" that validates code against authoritative sources (RFCs, OWASP, vendor docs)
2. **Enterprise Value:** Federated policy management via Trust Packs: "turn your decisions into enforceable standards"
3. **Protocol Vision:** The Epistemic Assertion Protocol (EAP), a universal standard for truth publishing that makes Aphoria the "DNS for Truth"

---

## User Personas

| Persona | Role | Primary Use Cases |
|---------|------|-------------------|
| **Solo Developer** | Individual contributor | Pre-commit checks, RFC compliance, avoiding common mistakes |
| **Security Engineer** | AppSec team member | Scan projects for security misconfigurations, create org-wide policies |
| **Platform Lead** | Staff engineer | Define "Golden Path" patterns, distribute standards to teams |
| **Compliance Officer** | GRC team member | Audit multiple projects, trace conflicts to authoritative sources |
| **AI Agent** | Autonomous code agent | Pre-flight check before commits, query authority before implementing |

---

## UAT Categories

### Category 1: Core Detection (The "Linter" Value)

> **Vision claim:** "Aphoria scans a codebase, extracts the decisions embedded in config and code, and checks them against authoritative sources."
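
The extract-then-check flow in that claim can be pictured with a toy example. This is only an illustrative sketch, not Aphoria's implementation: the `AUTHORITY` table, the `tls.verify` concept name, the `Conflict` record, and the regex are all hypothetical, and the single rule simply mirrors test 1.1.1 (Python `verify=False` against RFC 5246).

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical authority table: concept -> (required value, citation, verdict).
# The one entry mirrors test 1.1.1; the layout is an assumption for illustration.
AUTHORITY = {
    "tls.verify": ("true", "RFC 5246", "BLOCK"),
}

# Hypothetical extractor: finds Python keyword arguments of the form verify=<bool>.
VERIFY_RE = re.compile(r"\bverify\s*=\s*(True|False)\b")


@dataclass
class Conflict:
    line: int
    concept: str
    observed: str
    citation: str
    verdict: str


def scan(source: str) -> list[Conflict]:
    """Extract verify=... claims and compare each one with the authority table."""
    conflicts = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        match = VERIFY_RE.search(text)
        if match is None:
            continue
        observed = match.group(1).lower()
        required, citation, verdict = AUTHORITY["tls.verify"]
        if observed != required:
            conflicts.append(Conflict(lineno, "tls.verify", observed, citation, verdict))
    return conflicts


sample = 'import requests\nrequests.get("https://example.com", verify=False)\n'
for conflict in scan(sample):
    print(conflict)
```

A real extractor would need language-aware parsing and a much larger authority corpus; the point here is only the shape of a finding: a location, an observed value, a citation, and a verdict.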
#### 1.1 Authoritative Source Conflict Detection

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 1.1.1 | TLS verification disabled (Python `verify=False`) | Conflict with RFC 5246, BLOCK verdict | P0 |
| 1.1.2 | TLS verification disabled (Rust `danger_accept_invalid_certs`) | Conflict with RFC 5246, BLOCK verdict | P0 |
| 1.1.3 | TLS verification disabled (Go `InsecureSkipVerify`) | Conflict with RFC 5246, BLOCK verdict | P0 |
| 1.1.4 | JWT audience validation disabled | Conflict with RFC 7519, BLOCK verdict | P0 |
| 1.1.5 | Hardcoded secrets in source | Conflict with OWASP Secrets Cheatsheet, BLOCK verdict | P0 |
| 1.1.6 | CORS allow-all-origins | Conflict with OWASP Headers Cheatsheet, FLAG verdict | P0 |
| 1.1.7 | Zero timeout configuration | Conflict with vendor best practices, FLAG verdict | P1 |
| 1.1.8 | SQL injection pattern (string concat) | Conflict with OWASP Input Validation, BLOCK verdict | P0 |
| 1.1.9 | Command injection pattern | Conflict with OWASP Input Validation, BLOCK verdict | P0 |
| 1.1.10 | Weak crypto (MD5/SHA1 for security) | Conflict with OWASP Crypto Cheatsheet, BLOCK verdict | P0 |

**Success Criteria:**

- [ ] All P0 tests pass with correct verdict
- [ ] Precision ≥95% (minimal false positives)
- [ ] Every BLOCK verdict has an RFC/OWASP citation

#### 1.2 Cross-Language Consistency

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 1.2.1 | Same TLS issue detected in Rust, Go, Python, JS | Same conflict, same verdict across languages | P0 |
| 1.2.2 | Same JWT issue detected across languages | Same conflict, same verdict | P0 |
| 1.2.3 | YAML/TOML config file detection | Config issues detected regardless of language | P0 |

**Success Criteria:**

- [ ] Language parity: same issue → same verdict in all supported languages

#### 1.3 Precision and Recall

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 1.3.1 | VulnBank benchmark (intentionally vulnerable) | ≥50 findings, 100% precision | P0 |
| 1.3.2 | Real-world project scan (Citadel/Masq) | Findings with ≥95% precision | P0 |
| 1.3.3 | False positive rate on clean codebase | <5% false positive rate | P0 |
| 1.3.4 | Test file handling | Lower confidence, not flagged as BLOCK | P1 |

**Success Criteria:**

- [ ] VulnBank: 100% precision (every finding is real)
- [ ] Real-world: ≥95% precision, ≥5 distinct issues

---

### Category 2: Enterprise Policy (The "Trust Pack" Value)

> **Vision claim:** "Organizations often have internal rules that override or extend public standards. Aphoria allows you to export these decisions as Trust Packs."

#### 2.1 Policy Creation Workflow

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 2.1.1 | `aphoria bless` creates policy assertion | Assertion stored with reason, signed | P0 |
| 2.1.2 | `aphoria ack` creates acknowledgment | Acknowledgment stored with reason | P0 |
| 2.1.3 | `aphoria policy export` creates .pack file | Signed binary pack with assertions | P0 |
| 2.1.4 | Export includes both blessed and acked assertions | All policy decisions exported | P0 |

**Success Criteria:**

- [ ] Complete round-trip: bless → export → import → conflict

#### 2.2 Policy Distribution

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 2.2.1 | Local `.pack` file import | Assertions imported, conflicts detected | P0 |
| 2.2.2 | HTTP URL policy import | Remote pack downloaded, cached | P0 |
| 2.2.3 | Multiple packs, no conflict | Both policies enforced | P0 |
| 2.2.4 | Multiple packs, same concept, different values | Conflict visible, user can choose | P1 |
| 2.2.5 | Pack version update (v1 → v2) | v2 supersedes v1 | P1 |

**Success Criteria:**

- [ ] Enterprise workflow script passes (12/12)
- [ ] Multi-pack import works without data loss

#### 2.3 Policy Attribution

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 2.3.1 | Conflict shows pack name | `policy_source.pack_name` in JSON | P0 |
| 2.3.2 | Conflict shows pack version | `policy_source.pack_version` in JSON | P0 |
| 2.3.3 | Conflict shows issuer | `policy_source.issuer_hex` in JSON | P0 |
| 2.3.4 | Attribution in all formats | JSON, table, markdown, SARIF | P0 |

**Success Criteria:**

- [ ] Developer can trace any conflict to "who decided this"

#### 2.4 Predicate Aliases

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 2.4.1 | `enabled` matches `required` | Same-meaning predicates conflict | P1 |
| 2.4.2 | Pack-defined aliases | Custom alias sets work | P2 |

**Success Criteria:**

- [ ] Semantic predicate matching prevents bypasses

---

### Category 3: Pre-Commit Integration (The "Full Cycle" Value)

> **Vision claim:** "The pre-commit hook is a bidirectional knowledge sync, not just a read-only linter."
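
One concrete part of the hook contract is the exit-code behaviour tested in 3.4 (plus 3.3.3 for drift). A minimal sketch of that mapping, assuming verdicts arrive as plain strings; the function name `exit_code_for`, its parameters, and the treatment of `ACK` as non-failing are illustrative assumptions rather than confirmed Aphoria behaviour:

```python
def exit_code_for(verdicts, exit_code_flag=True):
    """Map the verdicts of one scan to a process exit code.

    Mirrors tests 3.4.1 to 3.4.4 and 3.3.3 from this plan.
    """
    if not exit_code_flag:
        # 3.4.4: without --exit-code the scan always exits 0 (interactive mode).
        return 0
    if "BLOCK" in verdicts:
        # 3.4.3: any BLOCK should fail the build.
        return 2
    if "FLAG" in verdicts or "DRIFT" in verdicts:
        # 3.4.2 and 3.3.3: FLAG or DRIFT without BLOCK returns 1.
        return 1
    # 3.4.1: no conflicts. Assumption: ACK on its own does not fail the scan.
    return 0


print(exit_code_for(["FLAG", "DRIFT"]))
print(exit_code_for(["FLAG", "BLOCK"]))
print(exit_code_for(["BLOCK"], exit_code_flag=False))
```

Checking `BLOCK` first means the most severe verdict wins when a scan reports a mix of verdicts, which is what a CI gate typically needs.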
#### 3.1 Fast Scanning

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 3.1.1 | Ephemeral scan (default) | <0.5s for typical project | P0 |
| 3.1.2 | Staged-only scan (`--staged`) | <0.5s, only staged files scanned | P0 |
| 3.1.3 | No storage created in ephemeral mode | No WAL/store directories | P0 |

**Success Criteria:**

- [ ] Pre-commit hook doesn't slow down development workflow

#### 3.2 Observation Recording

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 3.2.1 | `--sync` records observations | Novel claims stored as Tier 4 | P1 |
| 3.2.2 | Observations survive across commits | Persistent local knowledge | P1 |
| 3.2.3 | `--sync` requires `--persist` | Validation error otherwise | P0 |

**Success Criteria:**

- [ ] Project builds local memory over time

#### 3.3 Drift Detection

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 3.3.1 | Value changed from prior observation | DRIFT verdict shown | P1 |
| 3.3.2 | Drift in table/json/markdown output | All formats show drift | P1 |
| 3.3.3 | `--exit-code` returns 1 for drift | CI can catch unintentional changes | P1 |

**Success Criteria:**

- [ ] Accidental configuration changes are caught

#### 3.4 Exit Codes

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 3.4.1 | No conflicts → exit 0 | Clean scan passes | P0 |
| 3.4.2 | FLAG only → exit 1 | Review recommended | P0 |
| 3.4.3 | BLOCK → exit 2 | Build should fail | P0 |
| 3.4.4 | Without `--exit-code` → always exit 0 | Interactive mode | P0 |

**Success Criteria:**

- [ ] CI/CD integration works correctly

---

### Category 4: LLM Extraction (The "Intelligent" Value)

> **Vision claim:** "Use LLM to extract claims semantically during persistent scans. This fills gaps that regex extractors can't catch."
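
The trigger conditions tested in 4.1 can be read as a simple gate in front of the LLM call. The sketch below is one hedged interpretation: `should_run_llm`, its parameters, and the way the token budget is tracked are hypothetical, and the only details taken from this plan are the `auth/` and `crypto/` high-value directories, the skip for files already covered by regex extractors, and the budget stop.

```python
from pathlib import PurePosixPath

# Directory names listed as high-value in test 4.1.1.
HIGH_VALUE_DIRS = {"auth", "crypto"}


def should_run_llm(path, covered_by_regex, tokens_needed, tokens_remaining):
    """Decide whether LLM extraction is worth running for one file."""
    # 4.1.1 / 4.1.2: only files under a high-value directory qualify.
    directories = PurePosixPath(path).parts[:-1]
    if not HIGH_VALUE_DIRS.intersection(directories):
        return False
    # 4.1.3: skip files the regex extractors already cover.
    if covered_by_regex:
        return False
    # 4.1.4: stop gracefully (return False, do not raise) once the budget is spent.
    return tokens_needed <= tokens_remaining


print(should_run_llm("src/auth/login.py", False, 800, 5000))
print(should_run_llm("src/ui/button.py", False, 800, 5000))
```

Returning `False` instead of raising when the budget is exceeded is what keeps the "graceful stop, no crash" expectation of 4.1.4 easy to satisfy in the caller.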
#### 4.1 LLM Triggering

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 4.1.1 | High-value file (auth/, crypto/) | LLM extraction runs | P1 |
| 4.1.2 | Non-high-value file | LLM extraction skipped | P1 |
| 4.1.3 | File already covered by regex extractors | LLM extraction skipped | P1 |
| 4.1.4 | Token budget exceeded | Graceful stop, no crash | P1 |

**Success Criteria:**

- [ ] LLM only runs when valuable, stays within budget

#### 4.2 LLM Quality

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 4.2.1 | Evaluation fixtures pass | Baseline quality maintained | P1 |
| 4.2.2 | No regressions from prompt changes | Regression tests pass | P2 |
| 4.2.3 | Response parsing handles edge cases | No crashes on malformed JSON | P1 |

**Success Criteria:**

- [ ] LLM extraction quality is measurable and stable

#### 4.3 Pattern Learning

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 4.3.1 | LLM-extracted claim → pattern stored | LocalPatternStore updated | P2 |
| 4.3.2 | Similar pattern → merged, not duplicated | Deduplication works | P2 |
| 4.3.3 | Pattern seen in 5+ projects → promotion candidate | Threshold triggers | P2 |

**Success Criteria:**

- [ ] Learning system builds knowledge over time

---

### Category 5: Declarative Extractors (The "Extensibility" Value)

> **Vision claim:** "Enable users to define new extractors in config/policy files (TOML) without writing Rust code."
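
The load-time behaviour that 5.1 tests for (a bad pattern is rejected with a clear error but does not block the other extractors) can be sketched as follows. Everything here is an assumption made for illustration: the definitions are shown as already-parsed dictionaries rather than real TOML, the `name` and `pattern` keys are hypothetical, `value_from_match` is treated as a capture-group index, and the nested-quantifier check is a deliberately crude stand-in for real ReDoS analysis.

```python
import re

# Crude heuristic (assumption): a quantified group that itself contains a
# quantifier, such as (a+)+, is treated as ReDoS-prone and rejected.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*]")


def load_extractors(definitions):
    """Compile each definition; collect errors instead of aborting the load."""
    loaded, errors = [], []
    for definition in definitions:
        name, pattern = definition["name"], definition["pattern"]
        if NESTED_QUANTIFIER.search(pattern):
            errors.append(f"{name}: pattern rejected (nested quantifier, possible ReDoS)")
            continue
        try:
            compiled = re.compile(pattern)
        except re.error as exc:
            errors.append(f"{name}: invalid regex ({exc})")
            continue
        loaded.append((name, compiled, definition.get("value_from_match")))
    return loaded, errors


def extract(loaded, text):
    """Run every loaded extractor over the text and return (name, value) claims."""
    claims = []
    for name, compiled, group in loaded:
        for match in compiled.finditer(text):
            # With no group configured, fall back to the whole match.
            value = match.group(group) if group is not None else match.group(0)
            claims.append((name, value))
    return claims


definitions = [
    {"name": "timeout", "pattern": r"timeout\s*=\s*(\d+)", "value_from_match": 1},
    {"name": "broken", "pattern": r"(unclosed"},
    {"name": "redos", "pattern": r"(a+)+$"},
]
loaded, errors = load_extractors(definitions)
print(errors)
print(extract(loaded, "timeout = 0"))
```

Collecting errors in a list, instead of raising on the first one, is what lets a single invalid definition fail loudly while the remaining extractors still load.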
#### 5.1 Custom Extractors

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 5.1.1 | TOML-defined extractor runs | Claims extracted using custom regex | P0 |
| 5.1.2 | Invalid regex rejected at load time | Clear error, doesn't block others | P0 |
| 5.1.3 | ReDoS-vulnerable regex rejected | Security protection | P0 |
| 5.1.4 | `value_from_match` captures groups | Dynamic claim values | P1 |

**Success Criteria:**

- [ ] Users can add extractors without recompiling

#### 5.2 Extractor Promotion

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 5.2.1 | `aphoria extractors candidates` lists promotable patterns | Threshold-meeting patterns shown | P2 |
| 5.2.2 | `aphoria extractors promote` generates YAML | Extractor file created | P2 |
| 5.2.3 | Interactive review workflow | Approve/reject/skip options | P2 |

**Success Criteria:**

- [ ] Learning → promotion pipeline is functional

---

### Category 6: Output Formats (The "Integration" Value)

> **Vision claim:** "SARIF for CI integration... structured JSON/SARIF for dashboard integration."
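
Before running a full schema validation (6.1.2), a quick structural check can catch the most common breakages in SARIF output. The sketch below is not a replacement for validating against the SARIF 2.1.0 schema: it only checks a handful of properties the specification requires (`version`, `runs`, `tool.driver.name`, and `message` on each result), it expects `runs` to be an array, and the function name `sarif_problems` and the sample tool name are hypothetical.

```python
import json


def sarif_problems(text):
    """Return a list of structural problems; an empty list means the checks passed."""
    try:
        log = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(log, dict):
        return ["top level must be an object"]

    problems = []
    if log.get("version") != "2.1.0":
        problems.append("version must be '2.1.0'")
    runs = log.get("runs")
    if not isinstance(runs, list):
        problems.append("runs must be an array")
        return problems

    for i, run in enumerate(runs):
        if not isinstance(run, dict):
            problems.append(f"runs[{i}] must be an object")
            continue
        driver = (run.get("tool") or {}).get("driver") or {}
        if "name" not in driver:
            problems.append(f"runs[{i}].tool.driver.name is missing")
        for j, result in enumerate(run.get("results") or []):
            if "message" not in result:
                problems.append(f"runs[{i}].results[{j}].message is missing")
    return problems


sample = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "aphoria"}},  # hypothetical tool name
        "results": [{"message": {"text": "TLS verification disabled"}}],
    }],
})
print(sarif_problems(sample))
```

The same function doubles as a check for 6.1.1, since output that is not valid JSON is reported before any SARIF-specific rule runs.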
#### 6.1 Format Correctness

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 6.1.1 | JSON output is valid JSON | Parses correctly | P0 |
| 6.1.2 | SARIF output is valid SARIF 2.1.0 | Schema validates | P0 |
| 6.1.3 | Markdown output is valid markdown | Renders correctly | P0 |
| 6.1.4 | Table output is human-readable | Aligned, clear | P0 |

**Success Criteria:**

- [ ] All formats pass validation

#### 6.2 Format Completeness

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 6.2.1 | All formats show file location | File + line for each conflict | P0 |
| 6.2.2 | All formats show conflict score | Score visible | P0 |
| 6.2.3 | All formats show verdict | BLOCK/FLAG/ACK/DRIFT visible | P0 |
| 6.2.4 | All formats show policy source (if applicable) | Attribution visible | P0 |

**Success Criteria:**

- [ ] No information loss between formats

---

### Category 7: Domain-Specific Audits (The "Vertical" Value)

> **Vision claim:** "Aphoria is not limited to web security. It includes specialized corpora for different domains."
#### 7.1 Unreal Engine

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 7.1.1 | `LoadSynchronous()` on game thread detected | BLOCK verdict | P1 |
| 7.1.2 | Hardcoded asset paths detected | FLAG verdict | P2 |
| 7.1.3 | Exposed console commands detected | FLAG verdict | P2 |

**Success Criteria:**

- [ ] Masq UAT passes (7 findings, 100% precision)

#### 7.2 Framework-Specific Security

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 7.2.1 | Spring Security misconfiguration | Conflict detected | P2 |
| 7.2.2 | Django `ALLOWED_HOSTS = ["*"]` | Conflict detected | P2 |
| 7.2.3 | Flask `DEBUG=True` in production | Conflict detected | P2 |

**Success Criteria:**

- [ ] Framework extractors detect common misconfigurations

---

### Category 8: The "Protocol Vision" (Long-Term)

> **Vision claim:** "Aphoria is not just a linter; it is the Reference Implementation (Browser) for this new web of data."

#### 8.1 EAP Readiness (Future)

| Test ID | Scenario | Expected Outcome | Priority |
|---------|----------|------------------|----------|
| 8.1.1 | Consume `.eap.json` manifest | EAP format supported | P3 |
| 8.1.2 | Publish project observations as EAP | Export to EAP format | P3 |
| 8.1.3 | Multi-source aggregation | RFC + OWASP + Vendor + Policy unified | P3 |

**Success Criteria:**

- [ ] Foundation for "DNS for Truth" is laid

---

## UAT Execution Plan

### Phase 1: Core Detection (Week 1)

**Goal:** Prove the core value proposition works across languages.

| Day | Focus | Tests |
|-----|-------|-------|
| 1 | VulnBank benchmark | 1.3.1 |
| 2 | Cross-language TLS/JWT | 1.1.1-1.1.5, 1.2.1-1.2.3 |
| 3 | OWASP patterns | 1.1.6-1.1.10 |
| 4 | False positive analysis | 1.3.3-1.3.4 |
| 5 | Report validation | 6.1.1-6.2.4 |

**Deliverable:** UAT report with precision/recall metrics

### Phase 2: Enterprise Policy (Week 2)

**Goal:** Prove Trust Pack workflow is production-ready.

| Day | Focus | Tests |
|-----|-------|-------|
| 1 | Policy creation | 2.1.1-2.1.4 |
| 2 | Policy distribution | 2.2.1-2.2.5 |
| 3 | Policy attribution | 2.3.1-2.3.4 |
| 4 | Multi-pack scenarios | 2.2.3-2.2.4 |
| 5 | End-to-end workflow | Full enterprise script |

**Deliverable:** UAT report with enterprise workflow validation

### Phase 3: Pre-Commit Integration (Week 3)

**Goal:** Prove Aphoria works seamlessly in development workflow.

| Day | Focus | Tests |
|-----|-------|-------|
| 1 | Performance | 3.1.1-3.1.3 |
| 2 | Exit codes | 3.4.1-3.4.4 |
| 3 | Observation recording | 3.2.1-3.2.3 |
| 4 | Drift detection | 3.3.1-3.3.3 |
| 5 | CI/CD integration | GitHub Actions, pre-commit hook |

**Deliverable:** UAT report with performance benchmarks

### Phase 4: Advanced Features (Week 4)

**Goal:** Prove LLM, learning, and extensibility work.

| Day | Focus | Tests |
|-----|-------|-------|
| 1 | LLM triggering | 4.1.1-4.1.4 |
| 2 | LLM quality | 4.2.1-4.2.3 |
| 3 | Declarative extractors | 5.1.1-5.1.4 |
| 4 | Domain-specific | 7.1.1-7.2.3 |
| 5 | End-to-end user journey | All personas |

**Deliverable:** UAT report with feature completeness matrix

---

## Automated Test Scripts

### All Scripts

| Script | Purpose | Tests | Status |
|--------|---------|-------|--------|
| `test-core-detection.sh` | Category 1: Core detection tests | 10 | PASS (10/10) |
| `test-cross-language.sh` | Category 1.2: Cross-language parity | 3 | PASS (3/3) |
| `test-declarative-extractors.sh` | Category 5: Custom extractor loading | 6 | PASS (6/6) |
| `test-domain-frameworks.sh` | Category 7.2: Framework security | 11 | PASS (11/11) |
| `test-domain-unreal.sh` | Category 7.1: Unreal Engine | 4 | PASS (4/4) |
| `test-drift-detection.sh` | Category 3.2-3.3: Observation/drift | 6 | PASS (6/6) |
| `test-enterprise-workflow.sh` | Category 2: Trust Pack round-trip | 12 | PASS (12/12) |
| `test-eval-harness.sh` | Category 4.2: LLM evaluation harness | 4 | PASS (4/4) |
| `test-exit-codes.sh` | Category 3.4: Exit code validation | 4 | PASS (4/4) |
| `test-llm-extraction.sh` | Category 4.1: LLM quality gates | 5 | PASS (5/5) |
| `test-multi-pack-conflict.sh` | Category 2.2: Multiple pack behavior | 7 | PASS (7/7) |
| `test-output-formats.sh` | Category 6: Format validation | 8 | PASS (8/8) |
| `test-pack-version-update.sh` | Category 2.2.5: Version supersession | 6 | PASS (6/6) |
| `test-precommit-performance.sh` | Category 3.1: Performance benchmarks | 4 | PASS (4/4) |

**Total: 14 scripts, 90 tests**

### Summary by Category

| Category | Scripts | Tests | Status |
|----------|---------|-------|--------|
| 1. Core Detection | 2 | 13 | PASS |
| 2. Enterprise Policy | 3 | 25 | PASS |
| 3. Pre-Commit | 3 | 14 | PASS |
| 4. LLM Extraction | 2 | 9 | PASS |
| 5. Declarative Extractors | 1 | 6 | PASS |
| 6. Output Formats | 1 | 8 | PASS |
| 7. Domain-Specific | 2 | 15 | PASS |

---

## Success Criteria Summary

### Minimum Viable UAT (MVP)

| Criterion | Threshold | Measured By |
|-----------|-----------|-------------|
| Core precision | ≥95% | VulnBank + real-world scan |
| Cross-language parity | 100% | Same issue → same verdict |
| Enterprise workflow | 12/12 pass | `test-enterprise-workflow.sh` |
| Ephemeral scan time | <0.5s | Performance benchmark |
| Exit code correctness | 4/4 pass | `test-exit-codes.sh` |
| Format validity | 4/4 valid | `test-output-formats.sh` |

### Full Vision UAT

| Criterion | Threshold | Measured By |
|-----------|-----------|-------------|
| All P0 tests pass | 100% | Test matrix |
| All P1 tests pass | ≥90% | Test matrix |
| User journey complete | All 5 personas | End-to-end walkthrough |
| Drift detection works | DRIFT shown, exit 1 | `test-drift-detection.sh` |
| LLM extraction quality | Baseline maintained | Eval fixtures |

---

## Appendix: Test Fixtures

### Fixture: VulnBank

Location: External (clone separately)
Purpose: Intentionally vulnerable polyglot codebase for precision testing

### Fixture: Citadel/Masq

Location: Real customer project (NDA)
Purpose: Real-world precision testing

### Fixture: Clean Codebase

Location: `uat/fixtures/clean-project/`
Purpose: False positive rate testing

### Fixture: LLM Evaluation

Location: `applications/aphoria/tests/fixtures/` (via eval harness)
Purpose: LLM extraction quality regression

---

## Change Log

| Date | Version | Changes |
|------|---------|---------|
| 2026-02-06 | 1.0 | Initial comprehensive UAT plan |
| 2026-02-06 | 2.0 | All 14 test scripts implemented, 90/90 tests passing |