jordan 8f6506b70a feat: Aphoria scan modes + stemedb-ontology crate + consumer health UAT

Major additions:
- Staged scanning modes (working tree, staged, committed) with git integration
- Drift detection for baseline vs current state comparisons
- Hosted API handlers for policy CRUD operations via StemeDB API
- stemedb-ontology crate with domain definitions and medical extractors
- Consumer health vertical UAT scenarios (GLP-1, gastroparesis, etc.)
- Aphoria development skill documentation

Code organization:
- Split large files into focused modules to stay under 500-line limit
- Extracted config tests, episteme helpers/drift/aliases, API helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 21:57:33 -07:00

30 KiB


Aphoria Roadmap

Phase 0: StemeDB Foundation ✅

Tracked in: roadmap.md § 5D. Concept Hierarchy

Changes to the core database that Aphoria depends on. Shipped as Phase 5D of the main StemeDB roadmap.

Aphoria Phase 0	StemeDB Phase 5D	Status
0.1 ConceptPath Type	5D.1 ConceptPath Type	✅
0.2 ConceptPath in Assertion	(implicit in 5D.1)	✅
0.3 Hierarchical Index	5D.4 Hierarchical Query	✅
0.4 Alias Store	5D.3 Alias Store + 5D.5 Alias Resolution	✅
0.5 Source Class Inference	5D.6 Source Class Inference	✅
0.6 Concept API Endpoints	5D.7 Concept API Endpoints	✅

Spec: docs/specs/concept-hierarchy.md

Phase 2: CLI Core ✅

Phase 2 was built before Phase 1 (authoritative corpus expansion). The CLI pipeline works end-to-end with a bootstrapped corpus of 11 hardcoded assertions covering TLS, JWT, CORS, secrets, and rate limiting.

Task	Status
2.1 Project Walker	✅ `walker/mod.rs`, `walker/path_mapper.rs`, `walker/language.rs`
2.2 Extractors (10)	✅ `tls_verify`, `jwt_config`, `hardcoded_secrets`, `timeout_config`, `dep_versions`, `cors_config`, `rate_limit`, `weak_crypto`, `command_injection`, `sql_injection`
2.3 Ingestion Bridge	✅ `bridge.rs` — BLAKE3 hashing, Ed25519 signing, claim→assertion conversion
2.4 Conflict Query	✅ `episteme.rs` — LocalEpisteme with check_conflicts()
2.5 Report Output	✅ `report/` — table (comfy-table), JSON, SARIF 2.1.0, markdown
2.6 Acknowledge Command	✅ `lib.rs` acknowledge()
Baseline & Diff	✅ `lib.rs` set_baseline(), show_diff()
Status Command	✅ `lib.rs` show_status()

183 tests pass. Clippy and fmt clean.

Phase 2 Code Quality Fixes ✅

Code review improvements to extractors:

Issue	Fix	Status
DES/RC4 concept path misclassification	Split `check_pattern()` into `check_hash_pattern()` and `check_encryption_pattern()`; DES/RC4 now use `crypto/encryption/algorithm` path	✅
SHA1 edge case undocumented	Added comments and test documenting that SHA1 detection is intentionally broad (triggers for git hashes, etc.)	✅
JS exec() regex overly broad	Tightened regex to require `child_process.` prefix or non-word/non-dot preceding character; prevents `RegExp.exec()` false positives	✅

Phase 2A: Concept Matching ✅

Status: Complete. Tail-path matching (2A.1), alias-aware queries (2A.2), and auto-alias creation (2A.3) all implemented.

2A.1 Leaf-Based Concept Matching (Aphoria-side fix) ✅

Implemented in episteme.rs via ConceptIndex:

make_key(subject, predicate) builds a key from the last two path segments of the subject plus the predicate
build(assertions) creates an in-memory index keyed by that tail path
lookup(subject, predicate) finds matching authoritative assertions
check_conflicts() uses ConceptIndex instead of QueryEngine for cross-scheme matching

Integration tests prove TLS and JWT conflicts are detected correctly.
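Here is a minimal sketch of the tail-path key. It assumes subjects look like `scheme://segment/.../leaf` and that an authoritative assertion is a plain subject/predicate/object triple. The `Assertion` struct, the `#` key separator, and the `required` predicate used in the test are illustrative assumptions. Only the method names come from `episteme.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an authoritative assertion: a subject/predicate/object triple.
#[derive(Debug, Clone, PartialEq)]
pub struct Assertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// In-memory index keyed by "<last two path segments>#<predicate>".
#[derive(Default)]
pub struct ConceptIndex {
    by_key: HashMap<String, Vec<Assertion>>,
}

impl ConceptIndex {
    /// e.g. `rfc://5246/tls/cert_verification` + `required` -> `tls/cert_verification#required`.
    pub fn make_key(subject: &str, predicate: &str) -> String {
        // Strip the scheme so `code://...` and `rfc://...` paths can share a key.
        let path = subject.split_once("://").map_or(subject, |(_, rest)| rest);
        let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
        let tail = &segments[segments.len().saturating_sub(2)..];
        format!("{}#{}", tail.join("/"), predicate)
    }

    /// Indexes every authoritative assertion under its tail-path key.
    pub fn build(assertions: &[Assertion]) -> Self {
        let mut index = ConceptIndex::default();
        for a in assertions {
            index
                .by_key
                .entry(Self::make_key(&a.subject, &a.predicate))
                .or_default()
                .push(a.clone());
        }
        index
    }

    /// Returns authoritative assertions whose tail path and predicate match the claim.
    pub fn lookup(&self, subject: &str, predicate: &str) -> &[Assertion] {
        self.by_key
            .get(&Self::make_key(subject, predicate))
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

Dropping the scheme and the leading segments lets `code://rust/myapp/tls/cert_verification` and `rfc://5246/tls/cert_verification` land on the same key. The trade-off is that unrelated concepts sharing a two-segment tail would also match.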

2A.2 Alias Resolution in QueryEngine (StemeDB-side fix) ✅

Wired AliasStore into QueryEngine.execute():

Added resolve_aliases: bool field to Query (defaults to false)
Added alias_store: Option<Arc<dyn AliasStore>> to QueryEngine
Added .with_alias_store() builder method
When resolve_aliases: true, expands subject via AliasStore.resolve_all() before index lookup
Added fetch_by_subjects() and fetch_by_subjects_predicate() for multi-subject deduplication
Modified Query.matches() to skip subject filtering when aliases are resolved
Skips fast path (MV lookup) when resolve_aliases: true
Gracefully degrades when no alias store is configured

7 unit tests in engine/tests/alias_resolution.rs. This is the architecturally correct long-term fix that complements leaf matching.
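The subject-expansion step can be sketched as follows. This is a simplified in-memory stand-in, not StemeDB's `AliasStore` trait. `MapAliasStore`, `fetch_ids`, and the subject-to-id map are hypothetical, and the real `resolve_all()` signature may differ. The sketch shows two behaviours listed above:
- expansion happens only when `resolve_aliases` is set and a store is configured;
- results are deduplicated across the resolved subjects.

```rust
use std::collections::{BTreeSet, HashMap};

/// Simplified stand-in for StemeDB's `AliasStore` (hypothetical signature).
pub trait AliasStore {
    /// Returns the subject itself plus every subject it is aliased to.
    fn resolve_all(&self, subject: &str) -> Vec<String>;
}

/// Hypothetical map-backed alias store for illustration.
pub struct MapAliasStore(pub HashMap<String, Vec<String>>);

impl AliasStore for MapAliasStore {
    fn resolve_all(&self, subject: &str) -> Vec<String> {
        let mut out = vec![subject.to_string()];
        if let Some(aliases) = self.0.get(subject) {
            out.extend(aliases.iter().cloned());
        }
        out
    }
}

/// Expands the subject through the alias store (if enabled and present),
/// then collects assertion ids for every resolved subject without duplicates.
pub fn fetch_ids(
    subject: &str,
    resolve_aliases: bool,
    alias_store: Option<&dyn AliasStore>,
    index: &HashMap<String, Vec<u64>>, // subject -> assertion ids (illustrative)
) -> Vec<u64> {
    let subjects = match (resolve_aliases, alias_store) {
        (true, Some(store)) => store.resolve_all(subject),
        // Graceful degradation: resolution disabled or no store configured.
        _ => vec![subject.to_string()],
    };
    let mut ids = BTreeSet::new();
    for s in &subjects {
        if let Some(found) = index.get(s) {
            ids.extend(found.iter().copied());
        }
    }
    ids.into_iter().collect()
}
```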

2A.3 Auto-Alias Creation ✅

When authoritative assertions and code claims share leaf names, Aphoria automatically creates aliases:

code://rust/myapp/tls/cert_verification ↔ rfc://5246/tls/cert_verification
code://rust/myapp/auth/jwt/audience_validation ↔ rfc://7519/jwt/audience_validation

This bridges 2A.1 (leaf matching) with 2A.2 (alias resolution) — leaf matching identifies candidates, aliases persist the relationship.

Implementation:

Added auto_create_aliases: bool config option to AliasConfig (defaults to true)
Added AliasOrigin::AutoDetected variant to stemedb-core for tracking auto-created aliases
Wired GenericAliasStore into LocalEpisteme for alias persistence
In check_conflicts(), when a code claim matches an authoritative claim by leaf, calls AliasStore.set_alias() to persist the relationship with AliasOrigin::AutoDetected
Alias creation is idempotent (skips if alias already exists)
4 unit tests verify: alias creation on conflict, no creation when disabled, correct origin, idempotency

Phase 1: Authoritative Corpus Expansion ✅

Expanded from 11 hardcoded assertions to a pluggable corpus system with RFC, OWASP, and Vendor sources.

Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│                     aphoria corpus build                         │
│                                                                  │
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ RFC Ingester │  │ OWASP        │  │ Vendor Bootstrapper   │  │
│  │ (Tier 0)     │  │ Ingester     │  │ (Tier 2)              │  │
│  │              │  │ (Tier 1)     │  │                       │  │
│  └──────┬───────┘  └──────┬───────┘  └───────────┬───────────┘  │
│         │                 │                      │              │
│         └─────────────────┼──────────────────────┘              │
│                           ▼                                     │
│                  ┌─────────────────┐                            │
│                  │ CorpusRegistry  │                            │
│                  └────────┬────────┘                            │
│                           ▼                                     │
│                  ┌─────────────────┐                            │
│                  │ LocalEpisteme   │                            │
│                  │ ingest_         │                            │
│                  │ authoritative() │                            │
│                  └─────────────────┘                            │
└─────────────────────────────────────────────────────────────────┘
```

1.1 CorpusBuilder Trait ✅

Task	Status
`CorpusBuilder` trait	✅ `corpus/mod.rs` — name, scheme, default_tier, build, requires_network
`CorpusRegistry`	✅ Manages multiple builders, build_all(), list_builders()
`CorpusBuildResult`	✅ Stats per builder, total assertions, success/fail/skip counts

1.2 RFC Ingester ✅

Task	Status
`RfcCorpusBuilder`	✅ `corpus/rfc.rs`
HTTP fetching	✅ Via `ureq`, cached to `~/.cache/aphoria/rfc-cache/`
RFC 2119 keyword parsing	✅ MUST, MUST NOT, SHOULD, SHALL extraction
RFC-specific parsers	✅ JWT (7519), OAuth (6749), Bearer (6750), TLS 1.3 (8446), TLS BCP (7525), TOTP (6238), Basic Auth (7617), HTTP (9110)
Concept mapping	✅ `rfc://{number}/{topic}` at Tier 0 (Regulatory)
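The keyword step can be illustrated with a small classifier. This is a sketch, not the code in `corpus/rfc.rs`. The `Requirement` enum and `classify` function are hypothetical, and the sketch tokenizes on non-letters rather than using a regex.

Two design points:
- Negated forms are checked first, so `MUST NOT` is never reported as `MUST`.
- Matching is case-sensitive, because RFC 8174 restricts the special meaning to upper-case keywords.

```rust
/// Hypothetical requirement levels for the RFC 2119 keywords the ingester extracts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Requirement {
    MustNot,
    Must,
    ShouldNot,
    Should,
}

/// True if any word is in `set`.
fn has_word(words: &[&str], set: &[&str]) -> bool {
    words.iter().any(|w| set.contains(w))
}

/// True if any word in `heads` is immediately followed by "NOT".
fn has_negated(words: &[&str], heads: &[&str]) -> bool {
    words.windows(2).any(|pair| heads.contains(&pair[0]) && pair[1] == "NOT")
}

/// Classifies a sentence by its strongest keyword (case-sensitive, per RFC 8174).
pub fn classify(sentence: &str) -> Option<Requirement> {
    let words: Vec<&str> = sentence
        .split(|c: char| !c.is_ascii_alphabetic())
        .filter(|w| !w.is_empty())
        .collect();
    // Negated absolute requirements first, then absolute, then recommendations.
    if has_negated(&words, &["MUST", "SHALL"]) {
        Some(Requirement::MustNot)
    } else if has_word(&words, &["MUST", "SHALL", "REQUIRED"]) {
        Some(Requirement::Must)
    } else if has_negated(&words, &["SHOULD"]) {
        Some(Requirement::ShouldNot)
    } else if has_word(&words, &["SHOULD"]) {
        Some(Requirement::Should)
    } else {
        None
    }
}
```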

1.3 OWASP Ingester ✅

Task	Status
`OwaspCorpusBuilder`	✅ `corpus/owasp.rs`
HTTP fetching	✅ From GitHub raw content, cached to `~/.cache/aphoria/owasp-cache/`
Markdown parsing	✅ MUST/SHOULD statements, section context
Cheat sheet parsers	✅ Authentication, JWT, TLS, Secrets, Input Validation, Session, CSRF, Password Storage, HTTP Headers
Concept mapping	✅ `owasp://cheatsheet/{topic}/{claim}` at Tier 1 (Clinical)

1.4 Vendor Docs ✅

Task	Status
`VendorCorpusBuilder`	✅ `corpus/vendor.rs`
PostgreSQL claims	✅ pool_size, idle_timeout, ssl_mode
Redis claims	✅ timeout, max_retries, tls
reqwest claims	✅ cert_verification, connect_timeout, request_timeout
hyper claims	✅ keep_alive_timeout, max_concurrent_streams
Go net/http claims	✅ read_timeout, write_timeout, idle_timeout, min_tls_version
tokio-postgres claims	✅ pool_size, ssl_mode
SQLx claims	✅ max_connections, idle_timeout
Concept mapping	✅ `vendor://{product}/{topic}/{claim}` at Tier 2 (Observational)

1.5 Hardcoded Refactor ✅

Task	Status
`HardcodedCorpusBuilder`	✅ `corpus/hardcoded.rs` — original 11 assertions
`create_authoritative_assertion()`	✅ Made public in `episteme.rs` for corpus builders

1.6 CLI Integration ✅

Task	Status
`aphoria corpus build`	✅ Fetches and ingests from all sources
`--only rfc,owasp,vendor`	✅ Filter to specific sources
`--offline`	✅ Skip network-requiring sources
`--clear-cache`	✅ Clear cache before building
`aphoria corpus list`	✅ List available corpus sources
`CorpusConfig`	✅ cache_dir, include_*, rfc_list options

1.7 Error Handling ✅

Task	Status
`RfcFetch` error	✅ Per-RFC fetch failures with context
`OwaspFetch` error	✅ Per-cheat-sheet fetch failures with context
`CorpusBuild` error	✅ General corpus build failures
Graceful degradation	✅ Continue with other sources if one fails

Files: corpus/mod.rs, corpus/hardcoded.rs, corpus/rfc.rs, corpus/owasp.rs, corpus/vendor.rs

Phase 3: Skill Integration ✅

Complete. Aphoria is now usable in Claude Code agent workflows.

3.1 Claude Code Skill ✅

Task	Status
`skill/SKILL.md`	✅ Comprehensive skill definition with all commands
`/aphoria scan`	✅ Scan project, show conflicts grouped by verdict
`/aphoria scan --fix`	✅ Interactive fix workflow
`/aphoria ack`	✅ Acknowledge conflicts as intentional
`/aphoria status`	✅ Show status and baseline
`/aphoria diff`	✅ Show changes since baseline
`/aphoria init`	✅ Initialize Aphoria
`/aphoria baseline`	✅ Set baseline
`skill/install.sh`	✅ Install script for `~/.claude/skills/aphoria/`

Files: skill/SKILL.md, skill/install.sh, skill/hooks.json

3.2 Agent Pre-Flight Hook ✅

Task	Status
`--exit-code` flag	✅ Returns 2 for BLOCK, 1 for FLAG only, 0 for clean
`--strict` flag	✅ Lower thresholds (FLAG at 0.3, BLOCK at 0.5)
Hook template	✅ `skill/hooks.json` with PreCommit and PrePush examples

Usage:

```json
{
  "hooks": {
    "PreCommit": [{"command": "aphoria scan --format sarif --exit-code"}],
    "PrePush": [{"command": "aphoria scan --strict --exit-code"}]
  }
}
```
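The exit-code contract maps to a small function. The `Verdict` enum here is a hypothetical mirror of the verdict names used in this roadmap; its `Pass` variant is an assumption. The function also folds in the Phase 4B rule that drift returns 1.

```rust
/// Hypothetical mirror of the scan verdicts named in this roadmap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Flag,
    Drift,
    Block,
}

/// `--exit-code` semantics: 2 if anything blocks, 1 for flags or drift, 0 otherwise.
pub fn exit_code(verdicts: &[Verdict]) -> i32 {
    if verdicts.contains(&Verdict::Block) {
        2
    } else if verdicts.iter().any(|v| matches!(v, Verdict::Flag | Verdict::Drift)) {
        1
    } else {
        0
    }
}
```

A CLI entry point would then call `std::process::exit(exit_code(&verdicts))` after printing the report.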

3.3 Alias Suggestion Workflow ✅

Alias creation is now automatic (Phase 2A.3). When Aphoria scans:

Tail-path matching finds authoritative assertions
Aliases are auto-created with AliasOrigin::AutoDetected
Future queries use the alias automatically

The skill documents the suggestion flow for manual alias management:

y (Accept): Creates alias
n (Reject): Records intentional difference
defer: Flags for later review

Phase 4: Full-Cycle Pre-Commit (Scan + Sync) ⬜

Vision: The pre-commit hook is a bidirectional knowledge sync, not just a read-only linter. Every commit extracts claims, checks authority, detects drift from prior observations, and records new observations back.

Spec: uat/2026-02-04-full-cycle-precommit-vision.md

```text
┌─────────────────────────────────────────────────────────────┐
│                     PRE-COMMIT FLOW                          │
├─────────────────────────────────────────────────────────────┤
│  1. EXTRACT     → What claims does this code make?           │
│  2. CHECK       → Against authority + own prior claims       │
│  3. CLASSIFY    → Authority conflict | Self conflict | Novel │
│  4. UPDATE      → Record observations to local Episteme      │
│  5. GATE        → Exit code (BLOCK=2, FLAG=1, PASS=0)        │
└─────────────────────────────────────────────────────────────┘
```

4.1 Git Pre-Commit Hook ⬜

```sh
#!/bin/sh
# .git/hooks/pre-commit
# --sync requires --persist (see 4A), so both flags are passed.
aphoria scan --staged --persist --sync --exit-code
```

Or, using the pre-commit framework:

```yaml
repos:
  - repo: local
    hooks:
      - id: aphoria
        name: Aphoria Truth Sync
        entry: aphoria scan --staged --persist --sync --exit-code
        language: system
        pass_filenames: false
```

4.2 Baseline Mode ✅

Already implemented in Phase 2.

4A: Observational Claims ✅

Record code claims as Tier 4 (Community) assertions when no authority conflict exists:

Task	Status
`sync: bool` in ScanArgs	✅ `types/command.rs`
`observations_recorded: usize` in ScanResult	✅ `types/result.rs`
`--sync` CLI flag	✅ `cli.rs` — requires `--persist`
`claim_to_observation()`	✅ `bridge.rs` — creates Tier 4 (Community, 0.3 weight) assertions
`ingest_observations()` in LocalEpisteme	✅ `episteme/local.rs` — writes to WAL + predicate index
Scan flow integration	✅ `scan.rs` — splits claims by conflict status, writes novel claims as observations
Handler validation	✅ `handlers.rs` — `--sync requires --persist` error
Report output	✅ `report/table.rs`, `report/json.rs` — shows observation count
Tests	✅ 5 new tests for observation write-back

```text
Code: connection_pool.max_size = 25
Authority: (nothing)
Action: Record as Tier 4 observation (project memory)
```

Usage:

```sh
# Scan with observation write-back
aphoria scan --persist --sync

# Output:
# Recorded 45 observations (project memory)
```
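The claim-splitting step in `scan.rs` can be sketched as a partition. The `Claim` and `Observation` shapes are assumptions. So is the `has_authority_conflict` predicate, which stands in for the ConceptIndex lookup. The tier and weight constants come from the 4A table.

```rust
/// Tier 4 (Community) with 0.3 weight, as listed in the 4A table.
pub const OBSERVATION_TIER: u8 = 4;
pub const OBSERVATION_WEIGHT: f64 = 0.3;

/// Hypothetical claim shape: a concept path and the value the code asserts.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub subject: String,
    pub object: String,
}

/// Hypothetical observation record written back as project memory.
#[derive(Debug, Clone, PartialEq)]
pub struct Observation {
    pub subject: String,
    pub object: String,
    pub tier: u8,
    pub weight: f64,
}

/// Splits claims by conflict status: conflicting claims go to the report,
/// novel claims (no authority conflict) become Tier 4 observations.
pub fn split_claims(
    claims: Vec<Claim>,
    has_authority_conflict: impl Fn(&Claim) -> bool,
) -> (Vec<Claim>, Vec<Observation>) {
    let (conflicting, novel): (Vec<Claim>, Vec<Claim>) =
        claims.into_iter().partition(|c| has_authority_conflict(c));
    let observations = novel
        .into_iter()
        .map(|c| Observation {
            subject: c.subject,
            object: c.object,
            tier: OBSERVATION_TIER,
            weight: OBSERVATION_WEIGHT,
        })
        .collect();
    (conflicting, observations)
}
```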

4B: Self-Conflict Detection ✅

Detect drift from the project's own prior observations:

Task	Status
Query prior claims before conflict check	✅ `fetch_observations_for_concept()`
Compare current vs stored observations	✅ `check_drift()` compares values
Report changes as SELF-CONFLICT	✅ DriftResult with prior/current values
New verdict: `Drift` (distinct from Block/Flag)	✅ `Verdict::Drift`
Drift reporting in all formats	✅ table, json, markdown, sarif
Exit code includes drift	✅ `--exit-code` returns 1 for drift

```text
Prior: db/pool_size = 25 (recorded 2026-01-15)
Now:   db/pool_size = 100
Result: DRIFT — "You changed pool_size from 25 to 100. Intentional?"
```
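Here is a minimal sketch of the comparison behind `check_drift()`. It assumes prior observations and current claims have both been reduced to a map from concept path to value. The field names on this `DriftResult` are illustrative. A concept with no prior observation is novel rather than drift, so the sketch skips it; it would be recorded as an observation instead.

```rust
use std::collections::HashMap;

/// Illustrative drift record: what was observed before vs. what the code says now.
#[derive(Debug, Clone, PartialEq)]
pub struct DriftResult {
    pub concept: String,
    pub prior: String,
    pub current: String,
}

/// Reports every concept whose current value differs from its prior observation.
/// Concepts with no prior observation are skipped (novel, not drift).
pub fn check_drift(
    prior: &HashMap<String, String>,
    current: &HashMap<String, String>,
) -> Vec<DriftResult> {
    let mut drifts: Vec<DriftResult> = current
        .iter()
        .filter_map(|(concept, now)| {
            let before = prior.get(concept)?;
            (before != now).then(|| DriftResult {
                concept: concept.clone(),
                prior: before.clone(),
                current: now.clone(),
            })
        })
        .collect();
    // HashMap iteration order is arbitrary; sort for a stable report.
    drifts.sort_by(|a, b| a.concept.cmp(&b.concept));
    drifts
}
```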

Files: types/result.rs, types/verdict.rs, episteme/local.rs, scan.rs, report/*.rs

4C: Diff-Only Scanning ✅

Fast scanning for pre-commit hooks:

Task	Status
`FileSource` enum (All, Staged)	✅ `types/command.rs`
`--staged` flag (git diff --cached)	✅ `cli.rs`, `handlers.rs`
`walker/git.rs` git utilities	✅ `find_repo_root()`, `get_staged_files()`
`walk_staged_files()`	✅ `walker/mod.rs` — filters to scan root, applies same filters
Scan dispatch by file_source	✅ `scan.rs`
Error handling (NotGitRepo, GitCommand)	✅ `error.rs`
Tests	✅ 9 tests in `tests/staged_scanning.rs`
Target: < 500ms for staged-only	✅

Files: types/command.rs, walker/git.rs, walker/mod.rs, scan.rs, cli.rs, handlers.rs, error.rs

Usage:

```sh
# Pre-commit hook (fast, staged files only)
aphoria scan --staged --exit-code

# Full cycle with observation sync
aphoria scan --staged --persist --sync --exit-code
```
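The staged-file lookup can be sketched with `std::process::Command`. The function names are hypothetical; the real helpers are `get_staged_files()` and `walk_staged_files()`. The git flags are standard:
- `-z` emits NUL-separated paths.
- `--diff-filter=ACMR` drops staged deletions, which no longer exist on disk.

Paths from `git diff --cached` are relative to the repository root, so the sketch joins them to it before filtering to the scan root.

```rust
use std::io;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Turns NUL-separated, repo-relative paths from `git diff --cached --name-only -z`
/// into absolute paths, keeping only those under `scan_root`.
pub fn staged_under_root(git_output: &str, repo_root: &Path, scan_root: &Path) -> Vec<PathBuf> {
    git_output
        .split('\0')
        .filter(|p| !p.is_empty())
        .map(|p| repo_root.join(p))
        .filter(|p| p.starts_with(scan_root))
        .collect()
}

/// Lists staged files (added, copied, modified, renamed) under `scan_root`.
/// Both paths are assumed canonical so `starts_with` compares like with like.
pub fn staged_files(repo_root: &Path, scan_root: &Path) -> io::Result<Vec<PathBuf>> {
    let out = Command::new("git")
        .args(&["diff", "--cached", "--name-only", "-z", "--diff-filter=ACMR"])
        .current_dir(repo_root)
        .output()?;
    if !out.status.success() {
        let msg = String::from_utf8_lossy(&out.stderr).into_owned();
        return Err(io::Error::new(io::ErrorKind::Other, msg));
    }
    Ok(staged_under_root(&String::from_utf8_lossy(&out.stdout), repo_root, scan_root))
}
```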

4D: Enhanced Ack ⬜

Acknowledgments with rationale and policy updates:

Task	Status
`--reason "text"` flag	⬜
Store rationale in assertion metadata	⬜
`aphoria update` for intentional drift	⬜
Policy update assertions	⬜

```sh
$ aphoria ack db/pool_size --reason "Scaling for Black Friday"
$ aphoria update db/pool_size 100 --reason "New baseline after load test"
```

4E: Hosted Mode ✅

Organizations run their own StemeDB server and all team members automatically sync observations:

Task	Status
`HostedConfig` in config.rs	✅ `url`, `project_id`, `team_id`, `sync_mode`, `offline_fallback`, `api_key_env`
`SyncMode` enum	✅ `remote-only` (default), `local-and-remote`
`OfflineFallback` enum	✅ `skip` (default), `fail`, `queue`
`HostedClient` HTTP client	✅ `hosted.rs` — retry logic, auth headers, observation push
`POST /v1/aphoria/observations` endpoint	✅ Server receives observations with project/team metadata
Scan integration	✅ Auto-enables sync when `[hosted]` configured
`Hosted(String)` error variant	✅ For connection/auth failures
Graceful offline fallback	✅ Based on `offline_fallback` config
Tests	✅ Config parsing, client creation, assertion conversion

```toml
# aphoria.toml
[hosted]
url = "https://episteme.acme.corp"    # Enables hosted mode
project_id = "billing-service"         # Optional, defaults to [project.name]
team_id = "platform-team"              # Optional, for multi-team servers
sync_mode = "remote-only"              # "remote-only" | "local-and-remote"
offline_fallback = "skip"              # "skip" | "fail" | "queue"
api_key_env = "APHORIA_API_KEY"        # Env var for auth token
```

Architecture:

```text
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Developer A  │  │ Developer B  │  │ Developer C  │
│ aphoria scan │  │ aphoria scan │  │ aphoria scan │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       └─────────────────┼─────────────────┘
                         ▼
              ┌─────────────────────┐
              │ Team StemeDB Server │
              │ POST /v1/aphoria/   │
              │      observations   │
              └─────────────────────┘
                         │
                         ▼
              Aggregated team patterns
```

Files: config.rs, hosted.rs, scan.rs, error.rs, lib.rs, crates/stemedb-api/src/handlers/aphoria.rs, crates/stemedb-api/src/dto/aphoria.rs
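The `offline_fallback` setting amounts to a three-way branch when a push fails. This sketch leaves out the retry logic in `hosted.rs`. `push_with_fallback`, `PushOutcome`, and the in-memory `queue` are hypothetical. The real `queue` mode presumably keeps batches somewhere durable.

```rust
use std::fmt::Display;

/// Mirrors the `offline_fallback` values from `aphoria.toml`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OfflineFallback {
    Skip,
    Fail,
    Queue,
}

/// Hypothetical result of one push attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum PushOutcome {
    Pushed,
    Skipped,
    Queued(usize),
}

/// Pushes a batch of observations; on failure, applies the configured fallback.
/// `queue` is an in-memory stand-in for wherever queued batches are really kept.
pub fn push_with_fallback<T, E: Display>(
    batch: Vec<T>,
    push: impl FnOnce(&[T]) -> Result<(), E>,
    fallback: OfflineFallback,
    queue: &mut Vec<T>,
) -> Result<PushOutcome, String> {
    match push(batch.as_slice()) {
        Ok(()) => Ok(PushOutcome::Pushed),
        Err(e) => match fallback {
            // Default: drop this batch and let the scan continue.
            OfflineFallback::Skip => Ok(PushOutcome::Skipped),
            // Surface the failure so the scan (and any hook) fails.
            OfflineFallback::Fail => Err(format!("hosted sync failed: {}", e)),
            // Keep the batch for a later attempt and report the queue depth.
            OfflineFallback::Queue => {
                queue.extend(batch);
                Ok(PushOutcome::Queued(queue.len()))
            }
        },
    }
}
```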

Phase 4.5: Ephemeral Scan Mode ✅

Performance optimization: 40x faster scans by skipping Episteme storage when persistence isn't needed.

Problem

Every aphoria scan was slow because it initialized the full Episteme stack:

WAL recovery (O(n) on every startup)
Dual backend initialization (fjall + redb)
Store and index initialization

But conflict detection is actually 100% in-memory — it never reads from the KV store. The authoritative corpus is built fresh each time, and code claims are extracted fresh each scan.

Solution

Added ScanMode enum with two modes:

Mode	Use Case	Storage	Performance
Ephemeral (default)	CI, pre-commit, quick checks	None	~0.25 seconds
Persistent	Baseline/diff tracking, alias creation	WAL + store	~1-2 seconds

Implementation ✅

Task	Status
`ScanMode` enum	✅ `types.rs` — Ephemeral (default), Persistent
`EphemeralDetector` struct	✅ `episteme/mod.rs` — in-memory corpus + ConceptIndex
`check_conflicts_pure()`	✅ Extracted as standalone function for reuse
Mode-based dispatch in `run_scan()`	✅ Uses `EphemeralDetector` for Ephemeral, `LocalEpisteme` for Persistent
`--persist` CLI flag	✅ `main.rs` — opt-in to persistent mode
Tests for both modes	✅ `test_ephemeral_scan_no_storage_created`, `test_persistent_scan_creates_storage`, `test_scan_modes_produce_same_conflicts`

Usage

```sh
# Fast ephemeral scan (default): no storage created
aphoria scan .

# Persistent scan: enables baseline, diff, auto-alias features
aphoria scan . --persist
```

Performance

Mode	Time	Storage
Ephemeral	~0.25s	None
Persistent	~1-2s	WAL + store directories

Files: types.rs, episteme/mod.rs, lib.rs, main.rs, tests.rs

Phase 5: Research Agent Loop ✅

Research agent fills gaps in authoritative coverage by researching official documentation.

5.1 Gap Detection ✅

Task	Status
`Gap` struct	✅ `research/gap_detector.rs` — concept_path, topic, predicate, source info
`detect_gaps()`	✅ Compares claims against ConceptIndex, identifies missing coverage
Topic normalization	✅ Extracts last 2 path segments for cross-scheme matching
Deduplication	✅ Deduplicates gaps by topic+predicate key
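Gap detection and deduplication can be sketched together. The sketch makes three assumptions:
- Claims arrive as `(concept_path, predicate)` pairs.
- Authoritative coverage is a set of `(topic, predicate)` keys built with the same last-two-segments rule.
- The first concept path seen is kept as the gap's example.

`detect_gaps` here is illustrative, not the signature in `gap_detector.rs`.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical gap record: a normalized topic, its predicate, and one example path.
#[derive(Debug, Clone, PartialEq)]
pub struct Gap {
    pub topic: String,
    pub predicate: String,
    pub concept_path: String,
}

/// Topic normalization: the last two non-empty path segments.
fn topic(path: &str) -> String {
    let segs: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    segs[segs.len().saturating_sub(2)..].join("/")
}

/// Returns one gap per (topic, predicate) pair that no authoritative assertion covers.
pub fn detect_gaps(claims: &[(&str, &str)], covered: &HashSet<(String, String)>) -> Vec<Gap> {
    // BTreeMap both deduplicates by key and gives a deterministic output order.
    let mut gaps = BTreeMap::new();
    for &(path, predicate) in claims {
        let key = (topic(path), predicate.to_string());
        if !covered.contains(&key) {
            gaps.entry(key.clone()).or_insert_with(|| Gap {
                topic: key.0,
                predicate: key.1,
                concept_path: path.to_string(),
            });
        }
    }
    gaps.into_iter().map(|(_, gap)| gap).collect()
}
```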

5.2 Gap Storage ✅

Task	Status
`GapRecord`	✅ `research/gap_store.rs` — tracking metadata, project count, research status
`GapStore`	✅ JSON-backed persistent storage with atomic saves
Project tracking	✅ Records which projects reported each gap
Research eligibility	✅ `is_eligible_for_research()` with threshold and cooldown
Gap pruning	✅ `prune_old_gaps()` removes stale entries
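Here is a sketch of the eligibility rule. It assumes a gap becomes eligible once enough distinct projects have reported it and a cooldown has passed since the last research attempt. The field names and the explicit `now` parameter (which keeps the check testable) are assumptions. The default threshold of 3 matches the `--threshold` flag.

```rust
use std::collections::BTreeSet;
use std::time::{Duration, SystemTime};

/// Hypothetical subset of the tracking metadata kept per gap.
pub struct GapRecord {
    pub projects: BTreeSet<String>,
    pub last_researched: Option<SystemTime>,
}

impl GapRecord {
    /// Eligible when enough distinct projects report the gap and the cooldown has elapsed.
    pub fn is_eligible_for_research(&self, threshold: usize, cooldown: Duration, now: SystemTime) -> bool {
        let enough_projects = self.projects.len() >= threshold;
        let cooled_down = match self.last_researched {
            None => true,
            // If the clock went backwards, treat the cooldown as not yet elapsed.
            Some(t) => now.duration_since(t).map_or(false, |elapsed| elapsed >= cooldown),
        };
        enough_projects && cooled_down
    }
}
```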

5.3 Quality Validation ✅

Task	Status
`QualityValidator`	✅ `research/quality.rs` — validates researched claims
Source attribution	✅ Checks for authoritative domains (rfc-editor, owasp, vendor docs)
Normative language	✅ Verifies MUST/SHOULD/SHALL keywords present
Vague content detection	✅ Rejects "it depends", "typically", etc.
Consistency scoring	✅ Detects conflicting claims on same subject
`QualityReport`	✅ Detailed per-claim validation results
`filter_passed()`	✅ Returns only claims meeting quality threshold
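Three of these checks (attribution, normative language, vague content) could look roughly like the sketch below. The host list and vague phrases are examples drawn from the table, not the validator's actual lists. Consistency scoring is left out. `validate` and `Rejection` are hypothetical names.

```rust
/// Example lists only; the real validator's lists are not shown in this roadmap.
const NORMATIVE: &[&str] = &["MUST", "SHOULD", "SHALL"];
const VAGUE: &[&str] = &["it depends", "typically"];
const AUTHORITATIVE_HOSTS: &[&str] = &["rfc-editor.org", "owasp.org"];

#[derive(Debug, PartialEq, Eq)]
pub enum Rejection {
    NoAttribution,
    NotNormative,
    Vague,
}

/// Accepts a researched statement only if it is attributed to an authoritative host,
/// uses a normative keyword, and avoids hedging phrases.
pub fn validate(statement: &str, source_url: &str) -> Result<(), Rejection> {
    // Host part of "scheme://host/path"; empty if the URL has no scheme.
    let host = source_url.split("://").nth(1).unwrap_or("").split('/').next().unwrap_or("");
    let attributed = AUTHORITATIVE_HOSTS
        .iter()
        .any(|d| host == *d || host.ends_with(&format!(".{}", d)));
    if !attributed {
        return Err(Rejection::NoAttribution);
    }
    let normative = statement
        .split_whitespace()
        .any(|w| NORMATIVE.contains(&w.trim_matches(|c: char| !c.is_alphabetic())));
    if !normative {
        return Err(Rejection::NotNormative);
    }
    let lower = statement.to_lowercase();
    if VAGUE.iter().any(|p| lower.contains(*p)) {
        return Err(Rejection::Vague);
    }
    Ok(())
}
```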

5.4 Research Execution ✅

Task	Status
`Researcher`	✅ `research/researcher.rs` — orchestrates research pipeline
`DocumentationSource`	✅ Configurable sources with URL patterns and topics
Default sources	✅ Redis, PostgreSQL, Go, Rust, OWASP, Kafka, MongoDB
Content fetching	✅ HTTP with timeout and size limits
Normative extraction	✅ Regex-based MUST/SHOULD/SHALL extraction
Section tracking	✅ Extracts heading context for attribution
Confidence scoring	✅ Based on keyword strength, statement length, content size

5.5 CLI Integration ✅

Task	Status
`aphoria research run`	✅ Run research agent with configurable threshold
`aphoria research status`	✅ Show gap statistics and research progress
`aphoria research gaps`	✅ List gaps by project count
`--threshold`	✅ Minimum projects before researching (default: 3)
`--strict`	✅ Use strict quality validation
`--prune`	✅ Remove stale gaps before researching
`--ready`	✅ Show only gaps ready for research

Files: research/mod.rs, research/gap_detector.rs, research/gap_store.rs, research/quality.rs, research/researcher.rs, research/tests.rs

5.7 Security Extractors ✅

Extended Phase 2 extractors with OWASP-aligned security vulnerability detection:

Extractor	Detects	Languages
`weak_crypto`	MD5, SHA1, DES, RC4 usage	Rust, Go, Python, JS/TS
`command_injection`	Shell execution, os.system, subprocess shell=True	Rust, Go, Python, JS/TS
`sql_injection`	String concatenation in SQL queries	Rust, Go, Python, JS/TS

Concept paths:

crypto/hashing/algorithm — MD5, SHA1
crypto/encryption/algorithm — DES, RC4
os/command/input, os/shell_mode — command injection
db/query/input — SQL injection

5.6 Community Corpus Contributions ⬜

Future: Users can opt in to contribute patterns anonymously.

"Every Rust project has this JWT pattern" → pre-built alias set
"This Redis config is always acknowledged" → adjust default threshold
"This TLS pattern is always a real bug" → elevate threshold

Phase 6: Federated Policy & Trust Packs ✅

Teams can define their own authoritative truths and distribute them as signed Trust Packs, so distributed teams enforce the same compliance rules.

6.1 Trust Pack Format ✅

Task	Status
`TrustPack` schema	✅ `policy.rs` — Assertions, Aliases, Metadata, Signature
`PackHeader`	✅ Name, version, issuer, timestamp
Serialization	✅ `rkyv` for zero-copy efficiency
Signing	✅ `ed25519-dalek` signing and verification

6.2 Policy Management ✅

Task	Status
`PolicyManager`	✅ Loads local and remote (HTTP/HTTPS) policies
Caching	✅ Caches remote policies in `~/.cache/aphoria/policies/`
`aphoria.toml` config	✅ `policies` list support

6.3 Core Integration ✅

Task	Status
`EphemeralDetector` integration	✅ Ingests policies into memory corpus/index
`check_conflicts_pure` update	✅ Resolves policy aliases before authoritative lookup
`LocalEpisteme` export helpers	✅ `fetch_acknowledgments`, `fetch_manual_aliases`

6.4 CLI Commands ✅

Task	Status
`aphoria policy export`	✅ Exports local `ack` decisions as a Trust Pack
`aphoria scan` policy loading	✅ Auto-loads policies from config

Files: policy.rs, config.rs, episteme/mod.rs, lib.rs, main.rs

Phase 7: Declarative Extractors ⬜

Enable users to define new extractors in config/policy files (YAML/TOML) without writing Rust code. This removes the recompilation bottleneck for custom pattern enforcement.

7.1 Declarative Schema ⬜

Define a schema for pattern-based extraction:

```yaml
extractors:
  - name: "api_style"
    language: "go"
    pattern: 'func \w+\(.*\) \[\]\w+'
    claim:
      subject: "api/response_format"
      predicate: "structure"
      object: "raw_array"
```

7.2 Implementation Tasks ⬜

Task	Description
`DeclarativeExtractor`	Generic extractor implementation reading from config
`ExtractorConfig` update	Load declarative definitions from `aphoria.toml` and Trust Packs
`Regex` optimization	Pre-compile all declarative patterns
Validation	Ensure valid regex and claim structure at load time

Milestone Summary

Phase	Deliverable	Depends On	Status
0	ConceptPath in StemeDB	concept-hierarchy spec	✅
2	Aphoria CLI (scan, report, ack)	Phase 0	✅
2A	Concept matching (leaf, alias, auto-alias)	Phase 2	✅
1	Authoritative corpus expansion	Phase 0	✅
3	Claude Code skill + hooks	Phase 2A	✅
4.5	Ephemeral scan mode (40x faster)	Phase 2	✅
5	Research agent loop	Phase 3	✅
6	Federated Policy & Trust Packs	Phase 4.5	✅
4A	Observational claims (Tier 4 write-back)	Phase 6	✅
4B	Self-conflict detection (drift)	Phase 4A	✅
4C	Diff-only scanning (--staged)	Phase 4B	✅
4E	Hosted mode (team aggregation)	Phase 4C	✅
4D	Enhanced ack (--reason, policy updates)	Phase 4C	⬜ NEXT
7	Declarative Extractors	Phase 4	⬜

Current state:

Phases 0-3, 4.5, 4A-4C, 4E, 5, 6 complete (258 tests, clippy clean)
Full corpus: RFC, OWASP, Vendor sources
10 extractors including security (weak_crypto, command_injection, sql_injection)
Trust Packs: signed policy bundles with import/export
Ephemeral mode: 40x faster for CI
Observation write-back: --sync records novel claims as Tier 4 project memory
Drift detection: Detects changes from prior observations
Staged scanning: --staged flag for fast pre-commit hooks
Hosted mode: Team aggregation via central StemeDB server

Next: Phase 4D — Enhanced Ack (--reason, policy updates)

The pre-commit hook is now a bidirectional knowledge sync:

4A ✅: Record code claims as Tier 4 observations (project memory)
4B ✅: Detect drift from prior observations (self-conflict)
4C ✅: Fast diff-only scanning for pre-commit hooks (--staged)
4E ✅: Team aggregation via hosted StemeDB server
4D ⬜: Enhanced ack with rationale and policy updates

This transforms Aphoria from a linter into a learning system that builds institutional memory per-project and collective intelligence across teams via hosted mode.
