# Aphoria
**A code-level truth linter powered by Episteme.**
Aphoria scans your codebase for configuration patterns that contradict authoritative technical standards (RFCs, OWASP, vendor docs). Unlike linters that check syntax or SAST tools that find vulnerability patterns, Aphoria validates **intent against authority**.
```bash
$ aphoria scan .
BLOCK code://python/requests/tls/cert_verification
  Your code: verify=False (api/client.py:42)
  RFC 5246: TLS certificate verification MUST be enabled
  Conflict: 0.92

1 conflict found (1 BLOCK).
```
---
## Quick Start
### Install
```bash
# From source
cd applications/aphoria
cargo install --path .
# Verify
aphoria --version
```
### Initialize
```bash
aphoria init
```
This sets up your local database. The corpus (RFCs, OWASP guidelines, community patterns) is built dynamically during scans.
**Bootstrap corpus (optional):**
```bash
# Import patterns from wiki documentation
aphoria corpus import wiki ~/docs/security-best-practices/
```
### Scan
```bash
# Quick scan (ephemeral, fast)
aphoria scan .
# With persistence (enables diff/baseline)
aphoria scan --persist
# With sync (enables community learning)
aphoria scan --persist --sync
# CI mode (exit code 1 on BLOCK)
aphoria scan --exit-code
# Pre-commit (staged files only)
aphoria scan --staged --exit-code
```
**Community Learning:** When you run `aphoria scan --persist --sync`, observations from your scan are aggregated into community pattern records. Patterns seen across many projects (95%+ adoption plus authority backing) auto-promote to the corpus, creating an emergent, self-improving knowledge base.
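As a rough illustration of the promotion rule, the sketch below checks an aggregated pattern against the 95% adoption threshold and the authority-backing requirement. The `PatternAggregate` class, its field names, and `should_promote` are hypothetical names chosen for this example; they are not Aphoria's actual types or storage schema.

```python
from dataclasses import dataclass

# Assumption: a single threshold taken from the "95%+ adoption" figure above.
PROMOTION_THRESHOLD = 0.95


@dataclass(frozen=True)
class PatternAggregate:
    """Illustrative aggregate of scan observations (not Aphoria's real schema)."""
    concept_path: str
    projects_seen: int       # projects where the concept was observed at all
    projects_adopting: int   # projects that use this specific pattern
    has_authority: bool      # True if an RFC/OWASP/vendor source backs the pattern

    @property
    def adoption_rate(self) -> float:
        # Avoid dividing by zero when nothing has been observed yet.
        if self.projects_seen == 0:
            return 0.0
        return self.projects_adopting / self.projects_seen


def should_promote(agg: PatternAggregate) -> bool:
    """Promote only with authority backing and at least 95% adoption."""
    return agg.has_authority and agg.adoption_rate >= PROMOTION_THRESHOLD
```

Under these assumptions, a pattern adopted by 194 of 200 projects with authority backing would be promoted, while the same pattern without authority backing would not.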
### Handle Conflicts
**Fix the code:**
```python
# Before: verify=False
# After:
requests.get(url, verify=True)
```
**Or acknowledge intentionally:**
```bash
aphoria ack "code://python/requests/tls/cert_verification" \
  --reason "Local dev environment with self-signed certs"
```
---
## Key Concepts: Observations vs Claims
Aphoria distinguishes between two types of extracted information:
| Type | What it is | Who creates it | Example |
|------|-----------|----------------|---------|
| **Observation** | Pattern match: "this code does X" | Extractors (automated) | `imports/tokio: true` |
| **Claim** | Rule: "code MUST do X because Y" | Humans (you!) | "Core MUST NOT import tokio because it creates runtime coupling" |
**Observations** are what extractors find - they're grep results with confidence scores. They have no opinion about whether something is good or bad.
**Claims** are human-authored rules with:
- **Provenance** - Where the rule came from (RFC, security review, architecture decision)
- **Invariant** - What must stay true ("Wallet MUST NOT derive Clone")
- **Consequence** - What breaks if violated ("Multiple wallet instances → double-spend")
- **Authority tier** - How much weight this rule carries
- **Evidence** - Supporting artifacts (ADRs, test cases, etc.)
When you run `aphoria scan`, it compares observations against:
1. **Authoritative corpus** - RFC/OWASP standards + community patterns (emergent from real usage)
2. **Your authored claims** - Project-specific rules in `.aphoria/claims.toml`
The corpus is **emergent**: patterns with 95%+ adoption across projects, plus authority backing, auto-promote to authoritative status.
See [Claims-Based Verification](#claims-based-verification) below for creating your own claims.
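To make the distinction concrete, here is a minimal sketch of the two record types as Python dataclasses. The class and field names are assumptions that mirror the table and bullet list above; Aphoria's internal representation may differ, and the `concept_path` values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What an extractor found. Carries no opinion about good or bad."""
    concept_path: str   # hypothetical path, e.g. "core/imports"
    predicate: str
    value: str
    confidence: float   # extractor confidence, 0.0 to 1.0


@dataclass(frozen=True)
class Claim:
    """A human-authored rule with the fields listed above."""
    claim_id: str
    concept_path: str
    provenance: str     # where the rule came from
    invariant: str      # what must stay true
    consequence: str    # what breaks if violated
    tier: str           # authority tier, e.g. "expert"
    evidence: tuple[str, ...] = ()  # supporting artifacts (ADRs, tests)


# The table's example pair, expressed with these hypothetical types.
observation = Observation("core/imports", "imports/tokio", "true", 0.9)
claim = Claim(
    claim_id="core-no-tokio-001",
    concept_path="core/imports",
    provenance="Architecture decision",
    invariant="Core MUST NOT import tokio",
    consequence="Creates runtime coupling",
    tier="expert",
)
```

The observation only records that tokio is imported; the claim is what says that this is a problem, and why.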
---
## Output Formats
```bash
aphoria scan --format table # Human-readable (default)
aphoria scan --format json # Machine-readable
aphoria scan --format sarif # GitHub Security tab
aphoria scan --format markdown # Documentation
```
---
## Pre-commit Integration
```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: aphoria
        name: Aphoria truth check
        entry: aphoria scan --staged --exit-code
        language: system
        pass_filenames: false
```
---
## CI Integration (GitHub Actions)
```yaml
- name: Install Aphoria
  run: cargo install --path applications/aphoria
- name: Run Aphoria Scan
  run: aphoria scan --exit-code --format sarif > results.sarif
- name: Upload SARIF
  # Run even when the scan step fails on a BLOCK, so results still reach the Security tab
  if: always()
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: results.sarif
```
---
## Key Commands
### Scanning
| Command | Description |
|---------|-------------|
| `aphoria scan` | Scan for conflicts with authoritative sources |
| `aphoria ack` | Acknowledge a conflict as intentional |
| `aphoria bless` | Define a pattern as your authoritative standard |
### Claims Management
| Command | Description |
|---------|-------------|
| `aphoria claims create` | Author a new claim with provenance and consequences |
| `aphoria claims list` | List all authored claims |
| `aphoria claims explain` | Generate detailed claim explanations |
| `aphoria claims update` | Update an existing claim |
| `aphoria claims supersede` | Mark claim as superseded by newer claim |
| `aphoria claims deprecate` | Deprecate a claim with reason |
### Inline Markers
| Command | Description |
|---------|-------------|
| `aphoria claims list-markers` | List pending inline claim markers |
| `aphoria claims formalize-marker` | Convert marker to full claim |
| `aphoria claims reject-marker` | Reject an inline marker |
### Verification
| Command | Description |
|---------|-------------|
| `aphoria verify run` | Verify authored claims against codebase |
| `aphoria verify map` | Show extractor-to-claim coverage map |
### Policy & Governance
| Command | Description |
|---------|-------------|
| `aphoria policy export` | Export standards as a Trust Pack |
| `aphoria policy import` | Import a Trust Pack from your security team |
| `aphoria governance pending` | List approval requests (Phase 14) |
| `aphoria audit export` | Export audit trail for SOC 2 compliance |
See [CLI Reference](docs/cli-reference.md) for complete command documentation.
---
## Claims-Based Verification
Beyond scanning for RFC/OWASP conflicts, Aphoria supports **human-authored claims** that encode your project's architectural decisions and safety invariants.
### Quick Example
```bash
# Author a claim
aphoria claims create \
  --id wallet-no-clone-001 \
  --concept-path maxwell/core/wallet/type/wallet/derives \
  --predicate traits \
  --value Clone \
  --comparison not_contains \
  --provenance "Wallet is singleton with atomic state" \
  --invariant "Wallet type MUST NOT derive Clone" \
  --consequence "Clone allows multiple instances, breaking single-balance invariant" \
  --tier expert \
  --category safety \
  --by jml
# Verify claim against codebase
aphoria verify run
# Output:
# PASS wallet-no-clone-001 | maxwell/core/wallet/type/wallet/derives/traits
# Clone not found (as expected)
```
### Comparison Modes
Claims support six comparison modes for different verification patterns:
- `equals` - Value must be exactly X
- `not_equals` - Value must NOT be X
- `present` - Something must exist at this path
- `absent` - Nothing should exist at this path
- `contains` - Value must contain substring/list element (e.g., "Serialize" in "Clone,Debug,Serialize")
- `not_contains` - Value must NOT contain substring/list element (e.g., "Clone" NOT in derives)
See [Comparison Modes Guide](docs/comparison-modes.md) for detailed examples and decision tree.
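The six modes can be read as a small dispatch over the observed value. The sketch below is one plausible interpretation, not Aphoria's implementation: the `compare` function is hypothetical, `observed` is assumed to be `None` when nothing exists at the path, and `contains` is treated as a plain substring test (which also matches elements of a comma-separated list).

```python
from typing import Optional


def compare(mode: str, observed: Optional[str], expected: Optional[str] = None) -> bool:
    """Return True when the observed value satisfies the claim.

    Assumption: `observed` is None when nothing was found at the concept path.
    """
    # Existence checks do not need an expected value.
    if mode == "present":
        return observed is not None
    if mode == "absent":
        return observed is None

    if expected is None:
        raise ValueError(f"mode {mode!r} needs an expected value")

    if mode == "equals":
        return observed == expected
    if mode == "not_equals":
        return observed != expected
    if mode == "contains":
        # Substring test; "Serialize" matches "Clone,Debug,Serialize".
        return observed is not None and expected in observed
    if mode == "not_contains":
        return not compare("contains", observed, expected)
    raise ValueError(f"unknown comparison mode: {mode!r}")
```

With this reading, the `wallet-no-clone-001` claim passes when the observed derives are `"Debug"` and fails when they are `"Clone,Debug"`. Note that a substring test would also match `"Clone"` inside a longer identifier; a stricter version could split on commas and compare whole elements.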
### Inline Markers
Mark claims directly in code with special comments:
```rust
// @aphoria:claim[safety] Wallet MUST NOT derive Clone
#[derive(Debug)]
pub struct Wallet { ... }
```
Then formalize them:
```bash
aphoria claims list-markers
aphoria claims formalize-marker marker-001 --id wallet-no-clone-001 --by jml
```
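For intuition about what a marker scanner has to do, the following sketch finds comments in the `// @aphoria:claim[category] text` form shown above. The regular expression, the `Marker` tuple, and `find_markers` are assumptions for illustration; the marker grammar Aphoria actually accepts may be broader.

```python
import re
from typing import NamedTuple

# Matches comments like: // @aphoria:claim[safety] Wallet MUST NOT derive Clone
MARKER_RE = re.compile(r"//\s*@aphoria:claim\[(?P<category>[^\]]+)\]\s*(?P<text>.+)")


class Marker(NamedTuple):
    line_no: int    # 1-based line number of the comment
    category: str   # text inside the square brackets
    text: str       # the claim sentence after the bracket


def find_markers(source: str) -> list[Marker]:
    """Return every inline claim marker found in a source file's text."""
    markers = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        match = MARKER_RE.search(line)
        if match:
            markers.append(Marker(line_no, match["category"], match["text"].strip()))
    return markers
```

Running `find_markers` on the Rust snippet above would yield a single marker on line 1 with category `safety`.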
### Git Commit Tracking
Aphoria automatically captures the git commit hash when claims and observations are ingested. This provides:
- **Temporal context** - Know exactly which code version a claim was authored against
- **Audit trail** - Trace architectural decisions through git history
- **Graceful degradation** - Works seamlessly in non-git environments
The commit hash is stored in assertion metadata and captured at ingestion time (not when TOML files are edited), avoiding the "double-commit problem."
```json
{
"authored": true,
"git_commit": "de7af7c1b9e...",
"claim_id": "wallet-no-clone-001",
"provenance": "Wallet is singleton with atomic state"
}
```
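Capture-at-ingestion with graceful degradation could look roughly like the sketch below. It shells out to `git rev-parse HEAD` and omits the `git_commit` field when no repository is available. The function names are hypothetical, and Aphoria may read the commit through a different mechanism.

```python
import subprocess
from typing import Optional


def current_git_commit(repo_dir: str = ".") -> Optional[str]:
    """Return the HEAD commit hash, or None when it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git not installed, directory missing, or not a repository:
        # degrade gracefully instead of failing ingestion.
        return None
    return result.stdout.strip() or None


def build_metadata(claim_id: str, provenance: str, repo_dir: str = ".") -> dict:
    """Build assertion metadata at ingestion time (illustrative field set)."""
    metadata = {"authored": True, "claim_id": claim_id, "provenance": provenance}
    commit = current_git_commit(repo_dir)
    if commit is not None:
        metadata["git_commit"] = commit
    return metadata
```

Because the hash is read when the claim is ingested rather than written into the TOML file, editing a claim never requires a follow-up commit just to record the commit it belongs to.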
---
## Conflict Verdicts
| Verdict | Description | CI Behavior |
|---------|-------------|-------------|
| **BLOCK** | High-confidence conflict with RFC/OWASP | Fails with `--exit-code` |
| **FLAG** | Moderate-confidence conflict | Passes, visible in report |
| **ACK** | Acknowledged conflict | Passes, tracked for audit |
| **PASS** | No conflict | - |
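The CI behavior in this table reduces to a simple rule: only BLOCK fails the run, and only when `--exit-code` is passed. A minimal sketch of that rule follows; the `Verdict` enum and `exit_code` function are hypothetical names, not part of Aphoria's API.

```python
from enum import Enum


class Verdict(Enum):
    BLOCK = "BLOCK"
    FLAG = "FLAG"
    ACK = "ACK"
    PASS = "PASS"


def exit_code(verdicts: list[Verdict], exit_code_flag: bool) -> int:
    """Return 1 only when --exit-code is set and at least one BLOCK is present."""
    if exit_code_flag and Verdict.BLOCK in verdicts:
        return 1
    # FLAG and ACK are reported but never fail the run.
    return 0
```

This is why acknowledging a conflict with `aphoria ack` unblocks CI: the verdict moves from BLOCK to ACK, which stays in the report but no longer affects the exit status.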
---
## Web Dashboard
Aphoria includes a web-based dashboard for visualizing scan results, managing claims, and exploring the authoritative corpus. See [`applications/aphoria-dashboard/`](../aphoria-dashboard/) for setup instructions.
Features:
- Real-time scan visualization
- Claims management interface
- Corpus exploration and search
- Policy governance workflows
---
## Documentation
### Guides
| Guide | Audience | Time |
|-------|----------|------|
| [Solo Developer Guide](docs/guides/solo-developer-guide.md) | Individual developers, side projects | 2 min |
| [Enterprise Pilot Guide](docs/guides/enterprise-pilot-guide.md) | Security teams running pilots | 4 weeks |
| [Enterprise Quick Start](docs/guides/enterprise-quick-start.md) | Platform engineering | 5 min |
| [The First Scan](docs/guides/the-first-scan.md) | Everyone | 10 min |
### Reference
| Document | Description |
|----------|-------------|
| [CLI Reference](docs/cli-reference.md) | Complete command documentation |
| [Comparison Modes](docs/comparison-modes.md) | Guide to claim comparison modes |
| [Vision & Gaps](docs/vision-gaps.md) | Architecture and implementation status |
---
## Research & Reference
### Vision & Architecture
| Document | Description |
|----------|-------------|
| [Vision](vision.md) | Product vision and aspirational architecture |
| [Protocol Vision](protocol_vision.md) | Protocol-level design philosophy |
| [Vision & Gaps](docs/vision-gaps.md) | Honest assessment of current state vs. vision |
| [Architecture Docs](docs/architecture/README.md) | System design, concept matching, extension points |
### Testing & Validation
| Document | Description |
|----------|-------------|
| [UAT Reports](../../uat/README.md) | User acceptance testing results |
| [Phase 6 UAT](../../uat/phase6-uat.md) | Detailed validation of policy workflows |
| [Real-World Policy Source UAT](../../uat/2026-02-04-uat-real-world-policy-source.md) | Trust Pack workflow validation |
### Gap Analysis & Research
| Document | Description |
|----------|-------------|
| [Gap Analysis: Institutional Knowledge](docs/gap-analysis-institutional-knowledge.md) | Analysis of knowledge capture gaps |
| [Gap Fixes Summary](docs/gap-fixes-summary.md) | Summary of addressed gaps |
---
## What Aphoria Is Not
- **Not a linter.** Linters check syntax. Aphoria checks decisions against authoritative sources.
- **Not SAST.** SAST finds vulnerability patterns. Aphoria finds contradictions to specific standards.
- **Not AI autocomplete.** Copilot suggests code from the internet. Aphoria surfaces *your org's* decisions at the moment you contradict them.
---
## License
See [LICENSE](../../LICENSE) for details.