# Aphoria CLI UAT Report

**Date:** 2026-02-08
**Binary:** `aphoria 0.1.0` (release build, 13MB)
**Target:** StemeDB codebase (~573 files, 112K LoC)
**Claims file:** `applications/aphoria/.aphoria/claims.toml` (10 claims)

---

## Executive Summary

| Metric | Value |
|--------|-------|
| Commands tested | 46 |
| Pass (exit 0, correct output) | 43 |
| Partial (works with caveats) | 2 |
| Fail (exit != 0 or wrong output) | 1 |
| **Weighted overall score** | **88.5 / 100** |
| **Verdict** | **PASS** |

---

## Group 1: Smoke Tests (4 commands)

| ID | Command | Exit | Time | Grade | Notes |
|----|---------|------|------|-------|-------|
| 1.1 | `--help` | 0 | <1s | **97** | Lists all 27 subcommands, clean formatting. Missing examples section. |
| 1.2 | `scan` (table) | 0 | 10.9s | **78** | Works correctly. 2 BLOCKs found. Slightly over the 10s target. Parallel extraction uses all cores. |
| 1.3 | `status` | 0 | <1s | **92** | Shows data dir, project root, baseline, agent key. Clean. |
| 1.4 | `scan --format json` | 0 | ~11s | **90** | Valid JSON with keys: conflicts, deprecated_usages, drifts, project, scan_id, summary. |

**Group 1 average: 89.3**

---

## Group 2: Scan Variants (7 commands)

| ID | Command | Exit | Grade | Notes |
|----|---------|------|-------|-------|
| 2.1 | `scan --format markdown` | 0 | **95** | Clean markdown with table and detail sections. Ready for CI integration. |
| 2.2 | `scan --format sarif` | 0 | **95** | Valid SARIF 2.1.0 with schema URL, 1 run, 2 results. IDE-ready. |
| 2.3 | `scan --show-claims` | 0 | **90** | Shows all 2288 observations in 4607 lines. Table format with concept/value/file/line/confidence. |
| 2.4 | `scan --benchmark` | 0 | **93** | Shows timing breakdown: discovery 18ms, extraction 11243ms, conflict 1ms. Very useful. |
| 2.5 | `scan --staged` | 0 | **92** | Scans 13 staged files, 99 claims, 0 conflicts. Fast. |
| 2.6 | `scan --strict` | 0 | **60** | Output identical to default scan. No visible difference in thresholds or behavior. Either strict is a no-op or thresholds only matter when scores are marginal. |
| 2.7 | `scan --debug` | 0 | **65** | Adds an "Authority: Tier X" line per finding. No conflict resolution traces, scoring breakdown, or query plan. The name implies more depth. |

**Group 2 average: 84.3**

---

## Group 3: Claims (6 commands)

**Note:** Claims commands require cwd = directory containing `.aphoria/claims.toml`. From project root, `claims list` shows "No claims found." This is a discoverability issue.

| ID | Command | Exit | Grade | Notes |
|----|---------|------|-------|-------|
| 3.1 | `claims list` | 0 | **88** | Shows 10 claims in table: ID, Category, Tier, Status, Invariant. Clean formatting. |
| 3.2 | `claims list --format json` | 0 | **92** | Valid JSON array of 10 claims. |
| 3.3 | `claims explain` | 0 | **95** | Detailed markdown with concept, predicate, invariant, consequence, provenance, authority, evidence, status, author. Grouped by category. |
| 3.4 | `claims explain --format json` | 0 | **78** | Valid JSON but returns a flat array, not a structured object with a `type` field. Inconsistent with `explain --format json`, which has `type: "onboarding"`. |
| 3.5 | `claims explain --claim ` | 0 | **95** | Single claim detail, clean markdown. |
| 3.6 | `claims list --category security` | 0 | **95** | Filtered to 6 security claims. Works correctly. |

**Group 3 average: 90.5**

---

## Group 4: Verification (5 commands)

| ID | Command | Exit | Grade | Notes |
|----|---------|------|-------|-------|
| 4.1 | `verify run` | 0 | **95** | Shows 1 PASS, 6 CONFLICT, 3 MISSING with observation evidence and consequences. Rich, actionable output. |
| 4.2 | `verify run --format json` | 0 | **92** | Valid JSON: `{results: [...], summary: {pass:1, conflict:6, missing:3, unclaimed:1239}}`. |
| 4.3 | `verify run --show-unclaimed` | 0 | **90** | Appends 1239 unclaimed observations. Long but correct. |
| 4.4 | `verify map` | 0 | **97** | Shows claim→extractor mapping. 7/10 have extractors, 3 have "NO EXTRACTOR". Lists 2 extractors with predicates but no matching claims. Excellent. |
| 4.5 | `verify run --format table` | 0 | **85** | Same as default (table is default). Flag accepted, no error. |

**Group 4 average: 91.8**

---

## Group 5: Coverage & Docs (9 commands)

| ID | Command | Exit | Grade | Notes |
|----|---------|------|-------|-------|
| 5.1 | `coverage` | 0 | **93** | Per-module table: Claims, Observations, Claimed, Unclaimed, Missing, Density. 33 modules. Summary at bottom. |
| 5.2 | `coverage --format json` | 0 | **95** | Valid JSON: `{modules, project, summary}`. |
| 5.3 | `coverage --format markdown` | 0 | **95** | Clean markdown with summary section and table. |
| 5.4 | `coverage --sort-by density` | 0 | **85** | Sorts, but many modules show 0.0% density, so ordering among zeroes is arbitrary. Works for non-zero modules. |
| 5.5 | `coverage --sort-by unclaimed` | 0 | **90** | Correctly sorts by unclaimed count descending. Extractors (355) first. |
| 5.6 | `explain` | 0 | **97** | Onboarding summary: categories table, verification health, coverage snapshot, top uncovered modules. Excellent first-touch UX. |
| 5.7 | `explain --format json` | 0 | **97** | Valid JSON: `type: "onboarding"`, with categories, coverage, verification. |
| 5.8 | `docs generate` | 0 | **90** | Full 224-line reference doc combining claims explain + verification + coverage. Comprehensive. |
| 5.9 | `docs generate --format json` | 0 | **92** | Valid JSON: `type: "full_docs"`, with claims, coverage, verification. |

**Group 5 average: 92.7**

---

## Group 6: Corpus & Trust Packs (3 commands)

| ID | Command | Exit | Grade | Notes |
|----|---------|------|-------|-------|
| 6.1 | `corpus list` | 0 | **90** | Shows 4 source types: hardcoded (Tier 0), RFC (Tier 0), OWASP (Tier 1), Vendor (Tier 2). Lists specific sources. |
| 6.2 | `corpus build --offline` | 0 | **90** | Builds 30 assertions (19 hardcoded + 11 vendor). Cleanly skips network sources. |
| 6.3 | `trust-pack list` | 0 | **92** | Lists 3 packs: security-hardening, rfc-compliance, owasp-top10. Shows install command. |

**Group 6 average: 90.7**

---

## Group 7: Learning & Extractors (4 commands)

| ID | Command | Exit | Grade | Notes |
|----|---------|------|-------|-------|
| 7.1 | `extractors stats` | 0 | **88** | Shows zero counts and promotion thresholds. Helpful even when empty. |
| 7.2 | `extractors candidates` | 0 | **88** | "No patterns eligible" with explanation of eligibility criteria. Good empty state. |
| 7.3 | `extractors shadow-status` | 0 | **88** | "No shadow tests" with config hint for enabling. Good guidance. |
| 7.4 | `patterns status` | 0 | **90** | Shows store location, cross-project config, hosted server status. Comprehensive. |

**Group 7 average: 88.5**

---

## Group 8: Advanced Features (8 commands)

| ID | Command | Exit | Grade | Notes |
|----|---------|------|-------|-------|
| 8.1 | `scope status` | 0 | **88** | Shows hierarchy, inheritance chain, overrides. Config hint provided. |
| 8.2 | `lifecycle list` | 0 | **82** | "No patterns found." Terse. Could show which statuses are available. |
| 8.3 | `governance pending` | 1 | **55** | Exits non-zero with a "Governance is not enabled" message. Should exit 0 with an informative message; non-zero suggests an error. |
| 8.4 | `governance pending --format json` | 1 | **45** | Same non-zero exit. No JSON output; prints a plain-text error. Format flag silently ignored on the error path. |
| 8.5 | `audit summary` | 0 | **88** | Shows request counts and total audit events (638). Works without governance enabled. |
| 8.6 | `audit summary --format json` | 0 | **90** | Valid JSON with 8 fields including approval_rate and avg_approval_days. |
| 8.7 | `migrations status` | 0 | **82** | "No deprecated patterns found." Correct but minimal. |
| 8.8 | `research status` | 0 | **82** | "Gap store: not initialized" with guidance. |

**Group 8 average: 76.5**

---

## Scoring Summary

| Group | Weight | Average | Weighted |
|-------|--------|---------|----------|
| G1: Smoke Tests | 15% | 89.3 | 13.4 |
| G2: Scan Variants | 15% | 84.3 | 12.6 |
| G3: Claims | 12.5% | 90.5 | 11.3 |
| G4: Verification | 12.5% | 91.8 | 11.5 |
| G5: Coverage & Docs | 20% | 92.7 | 18.5 |
| G6: Corpus & Trust Packs | 8.3% | 90.7 | 7.5 |
| G7: Learning & Extractors | 8.3% | 88.5 | 7.3 |
| G8: Advanced Features | 8.3% | 76.5 | 6.3 |
| **Total** | **100%** | | **88.5** |

**Weighted overall: 88.5 / 100 (PASS)**

---

## Top Issues Found

### P1: Critical (fix before next release)

1. **Governance exits non-zero when disabled (8.3, 8.4):** `governance pending` returns exit code 1 when governance isn't enabled. CI and scripts that check exit codes will treat this as a failure. It should exit 0 with an informative message, or return empty JSON with `--format json`.
2. **`--format json` ignored on error path (8.4):** When `governance pending --format json` fails, it prints plain text instead of JSON. Any format flag should produce structured error output: `{"error": "governance_not_enabled", "message": "..."}`.

### P2: Important (fix soon)

3. **Claims commands require specific cwd (3.1):** `claims list` from project root shows "No claims found" even though `.aphoria/claims.toml` exists in `applications/aphoria/`. The command should auto-discover `.aphoria/` directories or accept a `--project` flag. The current behavior confuses users who run from their repo root.
4. **`--strict` has no visible effect (2.6):** `scan --strict` produces output identical to `scan`. Either the strict thresholds are too similar to the defaults, or the flag isn't applied correctly. Users who opt into strict mode expect stricter behavior.
5. **`--debug` is underwhelming (2.7):** Only adds "Authority: Tier X" per finding. No conflict resolution trace, scoring breakdown, or query plan. Rename to `--show-authority` or add actual debug output (concept matching attempts, score calculation, index lookups).
6. **`claims explain --format json` inconsistent (3.4):** Returns a flat array while `explain --format json` returns `{type: "onboarding", ...}`. Should wrap in `{type: "claims_explain", claims: [...]}` for consistency.

### P3: Polish (improve when convenient)

7. **Scan takes ~11s on 573 files (1.2):** Extraction dominates at 11.2s. Discovery and conflict are fast (<20ms). This is acceptable but could be improved with better parallelism or caching.
8. **Coverage density sorting among zeroes (5.4):** Most modules show 0.0% density, making sort-by-density less useful until more claims are authored.
9. **Empty state messages vary in helpfulness (8.2, 8.7):** `lifecycle list` and `migrations status` just say "no X found" without guidance. Compare with `extractors candidates`, which explains how to become eligible.

---

## Recommendations

1. **Standardize error handling:** All commands should exit 0 for "nothing to show" and reserve non-zero for actual errors. `--format json` must always produce JSON, even for errors.
2. **Add `--project` flag:** Allow `aphoria claims list --project ./applications/aphoria` or auto-discover `.aphoria/` directories.
3. **Improve debug output:** Add `--trace` for detailed resolution traces (concept matching, score calculation, tier comparison). Keep `--debug` for general verbosity.
4. **Document `--strict` behavior:** If it works, show which threshold changed and what would pass under default but fail under strict.
5. **Consistent JSON envelopes:** All `--format json` outputs should use the `{type: "...", ...}` pattern. The `explain` and `docs generate` commands do this well; extend to `claims explain`.
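
Recommendations 1 and 5 can be illustrated with a minimal Python sketch. It is not Aphoria's implementation: the function names, the `governance_pending` type string, the `enabled`/`pending` fields, and the `config_unreadable` error code are all hypothetical, chosen only to show the convention. A disabled feature is reported with exit 0, JSON results always carry a `type` field, and genuine errors are rendered in the requested format.

```python
import json

EXIT_OK = 0     # success, including "feature disabled" and "nothing to show"
EXIT_ERROR = 1  # reserved for genuine failures


def render(type_name, payload, fmt):
    """Render a result; JSON output always carries a `type` discriminator."""
    if fmt == "json":
        return json.dumps({"type": type_name, **payload}, sort_keys=True)
    # Text fallback for this sketch: one "key: value" line per field.
    return "\n".join(f"{key}: {value}" for key, value in payload.items())


def render_error(code, message, fmt):
    """Render a failure in the requested format, so JSON callers never get plain text."""
    if fmt == "json":
        return json.dumps({"error": code, "message": message}, sort_keys=True)
    return f"error: {message}"


def governance_pending(enabled, pending, fmt):
    """Return (output, exit_code) for a hypothetical `governance pending` handler."""
    if not enabled:
        # A disabled feature is a state, not a failure: explain it and exit 0.
        payload = {
            "enabled": False,
            "pending": [],
            "message": "Governance is not enabled",
        }
        return render("governance_pending", payload, fmt), EXIT_OK
    payload = {"enabled": True, "pending": pending}
    return render("governance_pending", payload, fmt), EXIT_OK


# Disabled governance: structured output, exit code 0.
output, exit_code = governance_pending(enabled=False, pending=[], fmt="json")
print(output)
print("exit code:", exit_code)

# A genuine failure (hypothetical error code) still honors --format json.
print(render_error("config_unreadable", "could not read configuration", "json"))
```

Under this convention a CI script can branch on the exit code alone and parse stdout as JSON unconditionally whenever it passes `--format json`.
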

---

## Commands Skipped (46 tested / ~85 total)

State-modifying commands intentionally excluded: `init`, `baseline`, `diff`, `bless`, `update`, `ack`, `claims create/update/supersede/deprecate`, `extractors review/promote/auto-promote/feedback/graduate/rollback`, `governance approve/reject/escalate/create`, `lifecycle deprecate/archive/reactivate`, `scope override/remove`, `patterns sync/pull-community`, `policy export/import/resign`, `corpus export-pack`, `trust-pack install`, `eval run/baseline/update-baseline`, `research run/gaps`.

---

*Report generated by Aphoria CLI UAT, 2026-02-08*