jml e95c978481 feat(aphoria): add inline claim markers and claim enrichment infrastructure
This commit implements Phase 17 of the Aphoria roadmap, adding:

**Inline Claim Markers (@aphoria:claim):**
- New extractor for detecting inline markers in comments
- Pending markers tracked in .aphoria/pending_markers.toml
- CLI commands: list-markers, formalize-marker, reject-marker
- Support for all major comment styles (Rust, Python, SQL, etc.)
- Auto-sync during scan (configurable)

**Claim Enrichment:**
- ClaimEnrichment type with source attribution (inline, extractor, manual)
- EnrichedClaimInfo with full enrichment metadata
- Extended AuthoredClaim with optional enrichment field
- API endpoints for enriched claim queries
- Dashboard UI components (enrichment badge, verdict badge)

**Enhanced Extractor Trait:**
- verifiable_predicates() method for declaring (tail_path, predicate) pairs
- 10 security extractors now implement verifiable_predicates
- Enables claim suggester skill to find unclaimed patterns

**Documentation:**
- Phase 17 summary with complete implementation details
- Gap fixes summary documenting 8 closed vision gaps
- Updated CLI reference with new commands
- New aphoria-docs skill for documentation maintenance
- Updated roadmap with Phase 17 completion

**Integration:**
- ClaimsFile support for claim enrichment persistence
- Pattern aggregate store support for enrichment queries
- Dashboard filters and display for enrichment metadata
- API handlers for list-markers and enrichment queries

**Tests:**
- New gap_fixes_integration test suite
- Corpus enricher module with best practices ingestion

Closes: VG-005, VG-017, VG-018, VG-019, VG-020, VG-021, VG-022, VG-023

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-08 20:18:20 +00:00


# Aphoria Technical Spec
**Status:** Draft

**Date:** 2026-02-02

---
## Overview
Aphoria is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources.
```
aphoria init
aphoria scan <project-root> [--config aphoria.toml] [--format table|json|sarif|markdown]
aphoria ack <concept-path> --reason "..."
aphoria baseline
aphoria diff
aphoria status
```
---
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                         aphoria CLI                          │
│                                                              │
│  ┌──────────┐   ┌────────────┐   ┌──────────┐   ┌────────┐   │
│  │  Walker  │──▶│ Extractors │──▶│ Ingester │──▶│ Report │   │
│  │          │   │            │   │          │   │        │   │
│  │ fs walk  │   │ tls_verify │   │ bridge   │   │ table  │   │
│  │ lang det │   │ jwt_config │   │ to       │   │ json   │   │
│  │ path map │   │ secrets    │   │ episteme │   │ sarif  │   │
│  │          │   │ timeouts   │   │          │   │ md     │   │
│  │          │   │ deps       │   │          │   │        │   │
│  │          │   │ cors       │   │          │   │        │   │
│  │          │   │ rate_limit │   │          │   │        │   │
│  └──────────┘   └────────────┘   └──────────┘   └────────┘   │
│                                       │             ▲        │
│                                       ▼             │        │
│                                ┌──────────────┐     │        │
│                                │   Episteme   │─────┘        │
│                                │    (local)   │  query +     │
│                                │              │  conflict    │
│                                └──────────────┘  scores      │
└──────────────────────────────────────────────────────────────┘
```
Aphoria depends on:
- `stemedb-core` (types: ConceptPath, Assertion, SourceClass)
- `stemedb-storage` (KVStore, IndexStore, AliasStore)
- `stemedb-ingest` (ingestion pipeline)
- `stemedb-query` (query engine, lenses)

It does **not** depend on `stemedb-api`. Aphoria talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed).

---
## Crate Structure
```
crates/
  aphoria/
    Cargo.toml
    src/
      main.rs                 CLI entrypoint (clap)
      config.rs               aphoria.toml parsing
      walker/
        mod.rs                Project walker orchestration
        language.rs           Language detection
        path_mapper.rs        Directory → ConceptPath mapping
        normalizer.rs         Path normalization rules per language
      extractors/
        mod.rs                Extractor trait + registry
        tls_verify.rs         TLS certificate verification
        jwt_config.rs         JWT validation settings
        hardcoded_secrets.rs  Credentials in source
        timeout_config.rs     HTTP/DB/Redis timeouts
        dep_versions.rs       Vulnerable dependency versions
        cors_config.rs        CORS allow-origin
        rate_limit.rs         Rate limiting config
      corpus/
        mod.rs                CorpusBuilder trait
        rfc.rs                RFC ingestion (Tier 0)
        owasp.rs              OWASP ingestion (Tier 1)
        vendor.rs             Vendor docs (Tier 2)
        policy.rs             Local policy ingestion (Tier 0 Override)
      bridge.rs               Observation → Assertion conversion
      conflict.rs             Conflict query + scoring
      report/
        mod.rs                Report generation orchestration
        table.rs              Terminal table output
        json.rs               JSON output
        sarif.rs              SARIF for CI integration
        markdown.rs           Markdown output
      ack.rs                  Acknowledge command
      baseline.rs             Baseline snapshot
      diff.rs                 Delta since last scan
```
---
## Configuration
`aphoria.toml` at project root (optional, sensible defaults):
```toml
[project]
name = "citadeldb"
language = "rust" # auto-detected if omitted
[episteme]
data_dir = "~/.aphoria/db" # local Episteme instance
# url = "http://localhost:18180" # future: remote instance
[thresholds]
block = 0.7 # conflict score >= this → BLOCK
flag = 0.4 # conflict score >= this → FLAG
# below flag threshold → PASS (not reported)
[extractors]
enabled = ["tls_verify", "jwt_config", "hardcoded_secrets", "timeout_config", "dep_versions", "cors_config", "rate_limit"]
# disabled = ["rate_limit"] # alternative: disable specific ones
[extractors.timeout_config]
min_reasonable_ms = 1000 # flag timeouts below this
max_reasonable_ms = 300000 # flag timeouts above this
[extractors.dep_versions]
advisory_db = "~/.aphoria/advisory-db" # rustsec/advisory-db clone
[scan]
exclude = ["target/", "node_modules/", ".git/", "vendor/"]
max_file_size = 1048576 # skip files > 1MB
[aliases]
auto_suggest = true # suggest aliases when shared leaves detected
auto_accept_tier0 = true # auto-accept alias suggestions to Tier 0 sources
```
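The `[extractors]` table offers two ways to select extractors: an `enabled` list, or a `disabled` list. Below is a minimal sketch of how the two might combine. It assumes that `enabled`, when present, sets the base set, and that anything in `disabled` is then removed. That precedence rule and the function name `active_extractors` are hypothetical.

```rust
/// The seven built-in extractors listed in `[extractors] enabled` above.
pub const ALL_EXTRACTORS: &[&str] = &[
    "tls_verify",
    "jwt_config",
    "hardcoded_secrets",
    "timeout_config",
    "dep_versions",
    "cors_config",
    "rate_limit",
];

/// Hypothetical resolution of the active extractor set: start from `enabled`
/// (or every built-in extractor when it is omitted), then drop anything
/// named in `disabled`. The order of the base list is preserved.
pub fn active_extractors(enabled: Option<&[&str]>, disabled: &[&str]) -> Vec<String> {
    let base: &[&str] = enabled.unwrap_or(ALL_EXTRACTORS);
    base.iter()
        .filter(|name| !disabled.contains(*name))
        .map(|name| name.to_string())
        .collect()
}
```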
---
## Walker
### Language Detection
Priority order:
1. Explicit `language` in `aphoria.toml`
2. Dominant language heuristic (count files by extension)
3. Per-file extension mapping

| Extension | Language |
|-----------|----------|
| `.rs` | rust |
| `.go` | go |
| `.py` | python |
| `.ts`, `.tsx` | typescript |
| `.js`, `.jsx` | javascript |
| `.yaml`, `.yml` | yaml |
| `.toml` | toml |
| `.json` | json |
| `.env`, `.env.*` | dotenv |
| `Dockerfile`, `docker-compose.*` | docker |
| `Cargo.toml` | cargo-manifest |
| `go.mod` | go-mod |
| `package.json` | npm-manifest |
| `requirements.txt`, `pyproject.toml` | python-manifest |
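Here is a minimal sketch of the per-file mapping in the table. It assumes that exact file names are matched before generic extensions, so `Cargo.toml` resolves to `cargo-manifest` rather than `toml`. The `Language` enum and `detect_language` are hypothetical names. The dominant-language heuristic (step 2) is not shown.

```rust
use std::path::Path;

/// Hypothetical language enum mirroring the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Language {
    Rust, Go, Python, TypeScript, JavaScript, Yaml, Toml, Json, Dotenv, Docker,
    CargoManifest, GoMod, NpmManifest, PythonManifest,
}

/// Map a single file to a language. Exact file names are checked first
/// (assumption: manifests take precedence over their generic extension).
pub fn detect_language(path: &Path) -> Option<Language> {
    let name = path.file_name()?.to_str()?;

    // 1. Exact file names and name prefixes.
    match name {
        "Cargo.toml" => return Some(Language::CargoManifest),
        "go.mod" => return Some(Language::GoMod),
        "package.json" => return Some(Language::NpmManifest),
        "requirements.txt" | "pyproject.toml" => return Some(Language::PythonManifest),
        "Dockerfile" => return Some(Language::Docker),
        ".env" => return Some(Language::Dotenv),
        _ => {}
    }
    if name.starts_with("docker-compose.") {
        return Some(Language::Docker);
    }
    if name.starts_with(".env.") {
        return Some(Language::Dotenv);
    }

    // 2. Generic extensions.
    match path.extension()?.to_str()? {
        "rs" => Some(Language::Rust),
        "go" => Some(Language::Go),
        "py" => Some(Language::Python),
        "ts" | "tsx" => Some(Language::TypeScript),
        "js" | "jsx" => Some(Language::JavaScript),
        "yaml" | "yml" => Some(Language::Yaml),
        "toml" => Some(Language::Toml),
        "json" => Some(Language::Json),
        _ => None,
    }
}
```

Checking names before extensions is what keeps `docker-compose.yml` from being classified as plain YAML.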
### Path Mapping
Directory structure maps to ConceptPath segments. Language-specific stripping rules remove boilerplate directories:

**Rust:**
```
Strip: src/, crates/
Keep: everything else
crates/citadeldb/src/auth/jwt.rs
→ ["rust", "citadeldb", "auth", "jwt"]
→ code://rust/citadeldb/auth/jwt/{leaf from extractor}
```
**Go:**
```
Strip: cmd/, internal/, pkg/
Keep: everything else
internal/auth/jwt/validator.go
→ ["go", "{module_name}", "auth", "jwt"]
→ code://go/{module_name}/auth/jwt/{leaf}
```
**Python:**
```
Strip: src/, lib/
Keep: everything else
src/auth/jwt_handler.py
→ ["python", "{package_name}", "auth", "jwt_handler"]
→ code://python/{package_name}/auth/jwt_handler/{leaf}
```
**Config files:**
```
config/production.yaml
→ code://config/{project_name}/production/{leaf}
.env.production
→ code://config/{project_name}/env_production/{leaf}
docker-compose.yml
→ code://docker/{project_name}/{leaf}
```
The project name comes from:
1. `aphoria.toml` `project.name`
2. `Cargo.toml` `[package] name`
3. `go.mod` module name (last segment)
4. `package.json` `name`
5. Directory name of project root
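The Rust stripping rule can be sketched with `std::path` alone. `rust_path_segments` and `concept_uri` are hypothetical helper names. The sketch covers only the Rust rule listed above. It does not special-case files such as `mod.rs` or `lib.rs`.

```rust
use std::path::Path;

/// Rust strip rules listed above: drop `src/` and `crates/`.
const RUST_STRIP: &[&str] = &["src", "crates"];

/// Hypothetical helper: turn a project-relative Rust source path into
/// ConceptPath segments, e.g. `crates/citadeldb/src/auth/jwt.rs`
/// becomes `["rust", "citadeldb", "auth", "jwt"]`.
pub fn rust_path_segments(rel_path: &str) -> Vec<String> {
    let path = Path::new(rel_path);
    let mut segments = vec!["rust".to_string()];

    // Keep every parent directory that is not on the strip list.
    if let Some(parent) = path.parent() {
        for dir in parent.iter() {
            let dir = dir.to_string_lossy();
            if !RUST_STRIP.iter().any(|s| dir == *s) {
                segments.push(dir.into_owned());
            }
        }
    }

    // The file name without its extension becomes the last segment.
    if let Some(stem) = path.file_stem() {
        segments.push(stem.to_string_lossy().into_owned());
    }
    segments
}

/// Append an extractor-supplied leaf to form the full `code://` URI.
pub fn concept_uri(segments: &[String], leaf: &str) -> String {
    format!("code://{}/{}", segments.join("/"), leaf)
}
```

The Go and Python rules differ in two ways:
- The module or package name is injected from project metadata rather than taken from the path.
- The Go example drops the file name entirely (`validator.go` contributes no segment).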
### File Filtering
Skip:
- Directories in `scan.exclude` list
- Files larger than `scan.max_file_size`
- Binary files (detected by null byte in first 8KB)
- Generated files (`*.generated.*`, `*.pb.go`, `*_generated.rs`)
- Test files (configurable: include or exclude)
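The binary-file and size rules need only the standard library. `looks_binary`, `file_looks_binary`, and `exceeds_max_size` are hypothetical names. The size check simply compares file metadata against `scan.max_file_size`.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Number of leading bytes inspected, per the rule above (8 KB).
const SNIFF_LEN: usize = 8 * 1024;

/// True if a NUL byte appears within the first 8 KB of `bytes`.
pub fn looks_binary(bytes: &[u8]) -> bool {
    bytes.iter().take(SNIFF_LEN).any(|&b| b == 0)
}

/// Read at most SNIFF_LEN bytes from disk and apply `looks_binary`.
pub fn file_looks_binary(path: &Path) -> io::Result<bool> {
    let mut buf = Vec::with_capacity(SNIFF_LEN);
    File::open(path)?.take(SNIFF_LEN as u64).read_to_end(&mut buf)?;
    Ok(looks_binary(&buf))
}

/// True if the file is larger than `scan.max_file_size` bytes.
pub fn exceeds_max_size(path: &Path, max_file_size: u64) -> io::Result<bool> {
    Ok(fs::metadata(path)?.len() > max_file_size)
}
```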
---
## Extractors
### Trait Definition
```rust
/// A claim extractor that finds implicit decisions in source code.
pub trait Extractor: Send + Sync {
    /// Unique identifier for this extractor.
    fn name(&self) -> &str;

    /// File types this extractor operates on.
    fn languages(&self) -> &[Language];

    /// Extract claims from a file's content.
    ///
    /// - `path_segments`: The ConceptPath segments derived from the file's location.
    /// - `content`: The file content as a string.
    /// - `language`: The detected language of the file.
    ///
    /// Returns zero or more extracted observations.
    fn extract(
        &self,
        path_segments: &[String],
        content: &str,
        language: Language,
    ) -> Vec<Observation>;
}
```
### Observation
```rust
/// An observation extracted from source code by an Extractor.
#[derive(Clone)] // `check_conflict` copies the claim into its ConflictResult
pub struct Observation {
    /// The full ConceptPath for this claim.
    /// Scheme is always "code" for code-extracted claims.
    pub concept_path: ConceptPath,

    /// The predicate describing what aspect of the concept this claims.
    /// Examples: "enabled", "config_value", "version", "allow_origin"
    pub predicate: String,

    /// The extracted value.
    pub value: ObjectValue,

    /// Source file path relative to project root.
    pub file: String,

    /// Line number in the source file (1-indexed).
    pub line: usize,

    /// The matched source text (the actual code/config that was matched).
    pub matched_text: String,

    /// Confidence of extraction.
    /// 1.0 for exact regex matches.
    /// Lower for heuristic matches.
    pub confidence: f32,

    /// Human-readable description of what was found.
    /// Example: "JWT audience validation is disabled"
    pub description: String,
}
```
### Extractor: tls_verify
**What it finds:** TLS/SSL certificate verification disabled.
**Patterns:**
Rust (reqwest):
```
Pattern: danger_accept_invalid_certs\s*\(\s*true\s*\)
Leaf: cert_verification
Predicate: enabled
Value: Boolean(false)
```
Rust (native-tls):
```
Pattern: accept_invalid_certs\s*\(\s*true\s*\)
```
Go (net/http):
```
Pattern: InsecureSkipVerify\s*:\s*true
```
Python (requests):
```
Pattern: verify\s*=\s*False
```
Node.js:
```
Pattern: rejectUnauthorized\s*:\s*false
Pattern: NODE_TLS_REJECT_UNAUTHORIZED.*['"]0['"]
```
YAML/TOML/JSON config:
```
Pattern: (tls_verify|ssl_verify|verify_ssl|verify_tls)\s*[:=]\s*(false|no|0|off)
```
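Here is a dependency-free sketch of one extractor pass. It approximates the Go and Node.js patterns above without the `regex` crate. Each line has its whitespace removed and is then searched for `InsecureSkipVerify:true` or `rejectUnauthorized:false`. This is slightly looser than `\s*`, because it would also accept whitespace inside the identifier.

Two things here are assumptions:
- `SimpleObservation` is a hypothetical stand-in for the spec's `Observation`, with plain field types.
- Reusing the `cert_verification` leaf from the reqwest pattern for Go and Node.js is assumed, not specified.

```rust
/// Simplified stand-in for `Observation` (strings and a bool instead of
/// ConceptPath / ObjectValue).
#[derive(Debug, Clone, PartialEq)]
pub struct SimpleObservation {
    pub concept_path: String,
    pub predicate: String,
    pub value: bool,
    pub file: String,
    pub line: usize,
    pub matched_text: String,
    pub confidence: f32,
    pub description: String,
}

/// Whitespace-free needles approximating `InsecureSkipVerify\s*:\s*true`
/// and `rejectUnauthorized\s*:\s*false`.
const NEEDLES: &[&str] = &["InsecureSkipVerify:true", "rejectUnauthorized:false"];

/// Scan `content` line by line and emit one observation per matching line.
pub fn extract_tls_verify(path_segments: &[String], file: &str, content: &str) -> Vec<SimpleObservation> {
    let mut out = Vec::new();
    for (idx, line) in content.lines().enumerate() {
        // Dropping all whitespace makes `a : true` and `a:true` compare equal.
        let squashed: String = line.chars().filter(|c| !c.is_whitespace()).collect();
        if NEEDLES.iter().any(|n| squashed.contains(*n)) {
            out.push(SimpleObservation {
                concept_path: format!("code://{}/cert_verification", path_segments.join("/")),
                predicate: "enabled".to_string(),
                value: false,
                file: file.to_string(),
                line: idx + 1, // 1-indexed, as in the Observation spec
                matched_text: line.trim().to_string(),
                confidence: 1.0,
                description: "TLS certificate verification is disabled".to_string(),
            });
        }
    }
    out
}
```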
### Extractor: jwt_config
**What it finds:** JWT validation settings.
**Claims extracted per finding:**
| Leaf | Predicate | What it means |
|------|-----------|---------------|
| `audience_validation` | `enabled` | Whether `aud` claim is validated |
| `expiry_validation` | `enabled` | Whether `exp` claim is validated |
| `algorithm_restriction` | `config_value` | Allowed algorithms (or "none" if unrestricted) |
| `signature_verification` | `enabled` | Whether signatures are verified |
**Patterns (Rust, jsonwebtoken crate):**
```
// aud validation
Pattern: set_audience.*\[\]|validate_aud.*false|aud.*None
Leaf: audience_validation
Value: Boolean(false)
// Dangerous: algorithm none
Pattern: Algorithm::None|alg.*none|allow_none.*true
Leaf: algorithm_restriction
Value: Text("none_allowed")
// Signature skip
Pattern: dangerous_insecure|skip_signature|verify.*false
Leaf: signature_verification
Value: Boolean(false)
```
**Patterns (Go, golang-jwt):**
```
Pattern: jwt\.Parse\(.*func\(.*\*jwt\.Token\).*\{[^}]*return.*signingKey
(without any algorithm check in the callback)
```
This is a heuristic match (confidence < 1.0): detecting missing validation is harder than detecting explicit disabling.
### Extractor: hardcoded_secrets
**What it finds:** Credentials, API keys, tokens in source (not in .env or .gitignore'd files).
**Patterns:**
```
// API keys
Pattern: (api[_-]?key|apikey)\s*[:=]\s*["'][A-Za-z0-9_\-]{20,}["']
Leaf: api_key_storage
Predicate: storage_method
Value: Text("hardcoded")
// Passwords
Pattern: (password|passwd|pwd)\s*[:=]\s*["'][^"']+["']
(excluding: "password", "changeme", "placeholder", "CHANGE_ME", "xxx", test patterns)
Leaf: password_storage
Predicate: storage_method
Value: Text("hardcoded")
// AWS keys
Pattern: (AKIA[0-9A-Z]{16})
Leaf: aws_credentials
Predicate: storage_method
Value: Text("hardcoded")
// Private keys
Pattern: -----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----
Leaf: private_key_storage
Predicate: storage_method
Value: Text("hardcoded_in_source")
```
**Test and example files:** Files matching `*test*`, `*example*`, `*fixture*`, or `*mock*` are still scanned, but their findings are marked with lower confidence (0.5).
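The AWS key pattern `AKIA[0-9A-Z]{16}` and the test-file confidence rule can be sketched without a regex engine. `find_aws_access_key_ids` and `secret_confidence` are hypothetical names. Matching the path fragments case-insensitively is an assumption that the glob patterns above don't settle.

```rust
/// Find substrings matching `AKIA[0-9A-Z]{16}` without a regex engine.
pub fn find_aws_access_key_ids(line: &str) -> Vec<&str> {
    let bytes = line.as_bytes();
    let mut found = Vec::new();
    for (start, _) in line.match_indices("AKIA") {
        let end = start + 20; // "AKIA" + 16 characters
        if end <= bytes.len()
            && bytes[start + 4..end]
                .iter()
                .all(|b| b.is_ascii_digit() || b.is_ascii_uppercase())
        {
            found.push(&line[start..end]);
        }
    }
    found
}

/// Path fragments that lower confidence to 0.5 instead of skipping the file.
const LOW_CONFIDENCE_MARKERS: &[&str] = &["test", "example", "fixture", "mock"];

/// 1.0 for an exact match in ordinary source; 0.5 in test-like paths.
pub fn secret_confidence(rel_path: &str) -> f32 {
    let lower = rel_path.to_ascii_lowercase();
    if LOW_CONFIDENCE_MARKERS.iter().any(|m| lower.contains(*m)) {
        0.5
    } else {
        1.0
    }
}
```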
### Extractor: timeout_config
**What it finds:** HTTP client, database, and cache timeout values.
**Patterns:**
```
// Zero/infinite timeout
Pattern: timeout\s*[:=]\s*(0|None|null|nil|infinity|Inf)
Leaf: {context}/timeout (context from surrounding code: http, db, redis, etc.)
Predicate: config_value
Value: Number(0.0)
Description: "Timeout disabled (infinite wait)"
// Unreasonably low timeout
Pattern: timeout\s*[:=]\s*(\d+)
where value_ms < config.min_reasonable_ms
Leaf: {context}/timeout
Value: Number(extracted_value)
Description: "Timeout {value}ms below minimum reasonable {min}ms"
// Unreasonably high timeout
Pattern: timeout\s*[:=]\s*(\d+)
where value_ms > config.max_reasonable_ms
```
**Unit detection:** Heuristic based on magnitude and surrounding context:
- Value > 1,000,000 → likely nanoseconds
- 1,000 < value < 1,000,000 → likely milliseconds
- Value < 100 → likely seconds
- Explicit unit in context (`ms`, `sec`, `Duration::from_secs`) → use that unit
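Here is a sketch of the unit heuristic. It rests on two assumptions:
- Explicit markers take precedence over magnitude.
- Values that the listed ranges leave out (100 to 1,000, and exactly 1,000,000) stay undetermined.

The marker checks are naive substring tests, so a context like `items` would also match `ms`. The function names are hypothetical.

```rust
/// Units the heuristic can infer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeUnit {
    Nanoseconds,
    Milliseconds,
    Seconds,
}

/// Explicit unit markers from the list above. `ms` is tested before `sec`
/// so that a context such as `timeout_msec` reads as milliseconds.
fn explicit_unit(context: &str) -> Option<TimeUnit> {
    if context.contains("Duration::from_secs") {
        Some(TimeUnit::Seconds)
    } else if context.contains("ms") {
        Some(TimeUnit::Milliseconds)
    } else if context.contains("sec") {
        Some(TimeUnit::Seconds)
    } else {
        None
    }
}

/// Magnitude-only fallback. Values the listed rules do not cover
/// (100 to 1,000 inclusive, and exactly 1,000,000) yield `None`.
fn unit_from_magnitude(value: u64) -> Option<TimeUnit> {
    match value {
        0..=99 => Some(TimeUnit::Seconds),
        1_001..=999_999 => Some(TimeUnit::Milliseconds),
        1_000_001..=u64::MAX => Some(TimeUnit::Nanoseconds),
        _ => None,
    }
}

/// Hypothetical entry point: explicit markers win, magnitude is the fallback.
pub fn guess_unit(value: u64, context: &str) -> Option<TimeUnit> {
    explicit_unit(context).or_else(|| unit_from_magnitude(value))
}

/// Convert to milliseconds for comparison with the configured bounds.
pub fn to_millis(value: u64, unit: TimeUnit) -> u64 {
    match unit {
        TimeUnit::Nanoseconds => value / 1_000_000,
        TimeUnit::Milliseconds => value,
        TimeUnit::Seconds => value.saturating_mul(1_000),
    }
}
```

The result of `to_millis` is what would be compared against `min_reasonable_ms` and `max_reasonable_ms` from `[extractors.timeout_config]`.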
### Extractor: dep_versions
**What it finds:** Dependencies with known vulnerabilities.
**Sources:**
- `Cargo.toml`: checked against the RustSec Advisory DB
- `go.mod`: checked against the Go Vulnerability Database
- `package.json`: checked against npm audit advisories
- `requirements.txt` / `pyproject.toml`: checked against PyPI advisory data
**Output:**
```
Leaf: dep/{package_name}/version
Predicate: installed_version
Value: Text("1.0.2")
Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX"
```
The advisory databases are downloaded locally and refreshed periodically. Aphoria doesn't call external APIs during scan.
### Extractor: cors_config
**What it finds:** Overly permissive CORS configuration.
**Patterns:**
```
// Allow all origins
Pattern: allow_origin\s*\(\s*["']\*["']\s*\)|Access-Control-Allow-Origin.*\*|AllowAllOrigins.*true
Leaf: cors/allow_origin
Predicate: config_value
Value: Text("*")
Description: "CORS allows all origins"
// Allow credentials with wildcard
Pattern: (allow_credentials|AllowCredentials).*true
(in proximity to allow_origin *)
Leaf: cors/credentials_with_wildcard
Predicate: enabled
Value: Boolean(true)
Description: "CORS allows credentials with wildcard origin"
```
### Extractor: rate_limit
**What it finds:** Rate limiting disabled or set unreasonably high.
**Patterns:**
```
// Rate limiting disabled
Pattern: (rate_limit|ratelimit).*disabled|rate_limit\s*[:=]\s*(0|false|off|none)
Leaf: rate_limit/enabled
Predicate: enabled
Value: Boolean(false)
// Unreasonably high limit
Pattern: (rate_limit|ratelimit|max_requests)\s*[:=]\s*(\d+)
where value > 10000 per minute (configurable)
Leaf: rate_limit/max_requests
Predicate: config_value
Value: Number(extracted_value)
```
---
## Dynamic Application Policy (Phase 6)
### PolicyCorpusBuilder
A corpus builder that ingests assertions from a local `aphoria-policy.yaml` file. This allows teams to define "Application Truth" that overrides RFCs or Vendor defaults.
**File Format (`aphoria-policy.yaml`):**
```yaml
rules:
  - path: "code://rust/my-app/db/pool_size"
    predicate: "config_value"
    value: 50
    tier: "Regulatory"   # Tier 0 (overrides everything)
    message: "Internal policy: max 50 conns to prevent storms."
  - path: "code://go/legacy-service/tls/version"
    predicate: "min_version"
    value: "1.2"
    tier: "Clinical"     # Tier 1
    message: "Legacy clients require TLS 1.2 support."
```
**Ingestion:**
- Each rule becomes a Tier 0 or Tier 1 Assertion.
- Source is set to `SourceClass::Regulatory` (for Tier 0) or `SourceClass::Clinical` (for Tier 1).
- Conflict detection treats these as authoritative truths.

**Enterprise Lens:**
A specialized StemeDB Lens that resolves conflicts by prioritizing `Policy` assertions over `RFC` or `Vendor` assertions when they overlap on the same ConceptPath.

---
## Ingestion Bridge
### Claim → Assertion Mapping
```rust
fn to_assertion(
    claim: &Observation,
    agent_keypair: &Ed25519Keypair,
    scan_timestamp: u64,
) -> Assertion {
    // Provenance for the claim; a serialization failure becomes `None` via `.ok()` below.
    let source_metadata = serde_json::to_vec(&json!({
        "file": claim.file,
        "line": claim.line,
        "matched_text": claim.matched_text,
        "extractor": claim.concept_path.leaf(), // ConceptPath leaf; Observation has no extractor-name field
        "scan_tool": "aphoria",
        "scan_version": env!("CARGO_PKG_VERSION"),
    }));

    // Same file, line, and matched text always yield the same hash.
    let source_hash = blake3::hash(
        format!("{}:{}:{}", claim.file, claim.line, claim.matched_text).as_bytes(),
    );

    Assertion {
        subject: claim.concept_path.to_string(), // EntityId = String
        predicate: claim.predicate.clone(),
        object: claim.value.clone(),
        parent_hash: None,
        source_hash: *source_hash.as_bytes(),
        source_class: SourceClass::Expert, // code:// scheme default
        visual_hash: None,
        epoch: None,
        source_metadata: source_metadata.ok(),
        lifecycle: LifecycleStage::Approved,
        signatures: vec![sign(agent_keypair, claim)], // `claim` is already a reference
        confidence: claim.confidence,
        timestamp: scan_timestamp,
        vector: None,
    }
}
```
### Idempotency
Same code produces the same claims. Same claims produce the same assertion hashes (content-addressed). Re-scanning a project that hasn't changed ingests nothing new. This is guaranteed by BLAKE3 content addressing in the existing Episteme pipeline.
When code changes between scans, new assertions are created. Old assertions remain (append-only). The `diff` command compares the current scan's assertions against the last scan's to show what changed.
### Scan Metadata
Each scan is recorded as an assertion about itself:
```
Subject:   aphoria://scan/{project_name}/{scan_id}
Predicate: completed
Object:    Text(json!({
               "project": "citadeldb",
               "files_scanned": 142,
               "claims_extracted": 23,
               "conflicts_found": 3,
               "blocks": 2,
               "flags": 1,
               "timestamp": 1706832000
           }).to_string())
```
This enables `aphoria diff` to compare two scan records and their associated assertions.

---
## Conflict Detection
### Query Strategy
After ingestion, for each extracted claim:
```rust
async fn check_conflict(
    claim: &Observation,
    query_engine: &QueryEngine,
    threshold_block: f32, // [thresholds].block in aphoria.toml (default 0.7)
    threshold_flag: f32,  // [thresholds].flag in aphoria.toml (default 0.4)
) -> Option<ConflictResult> {
    // 1. Query with Skeptic lens, resolving aliases
    let results = query_engine.query(Query {
        subject: Some(claim.concept_path.to_string()),
        predicate: Some(claim.predicate.clone()),
        lens: Some("skeptic".to_string()),
        resolve_aliases: true,
        source_class_decay: true,
        ..Default::default()
    }).await;

    // 2. Check if any authoritative source disagrees
    let code_value = &claim.value;
    let mut conflicts = Vec::new();
    for assertion in &results.assertions {
        if assertion.source_class.tier() < 3 // Tier 0, 1, or 2
            && assertion.object != *code_value // Different value
        {
            conflicts.push(ConflictingSource {
                path: assertion.subject.clone(),
                source_class: assertion.source_class,
                value: assertion.object.clone(),
                confidence: assertion.confidence,
            });
        }
    }
    if conflicts.is_empty() {
        return None;
    }

    // 3. Compute conflict score
    // Higher when the disagreeing source is more authoritative and more confident
    let max_tier_weight = conflicts.iter()
        .map(|c| c.source_class.authority_weight())
        .max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
        .unwrap_or(0.0);
    let code_weight = SourceClass::Expert.authority_weight(); // 0.5
    let conflict_score = max_tier_weight * (1.0 - code_weight);
    // Tier 0 vs Tier 3: 1.0 * 0.5 = 0.50

    // Scale by the confidence of the most confident authoritative source
    // (with confidence in [0, 1] this scales the score down, never up)
    let boosted_score = conflict_score
        * conflicts.iter()
            .map(|c| c.confidence)
            .max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
            .unwrap_or(1.0);

    // Normalize by the most authoritative disagreeing tier:
    // Tier 0 → 0.95, Tier 1 → ~0.77, Tier 2 → ~0.58 (Tier 3 would map to 0.4)
    let best_tier = conflicts.iter()
        .map(|c| c.source_class.tier())
        .min()
        .unwrap_or(3) as f32;
    let normalized = 0.4 + (3.0 - best_tier) / 3.0 * 0.55;
    let final_score = normalized.max(boosted_score);

    Some(ConflictResult {
        claim: claim.clone(),
        conflicts,
        conflict_score: final_score,
        verdict: if final_score >= threshold_block { Verdict::Block }
            else if final_score >= threshold_flag { Verdict::Flag }
            else { Verdict::Pass },
    })
}
```
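The scoring arithmetic in `check_conflict` can be checked in isolation. Below is a minimal, dependency-free sketch. It assumes tiers are plain integers and that authority weights are supplied by the caller. The spec only pins Expert at 0.5 and, via the Tier 0 comment, Tier 0 at 1.0. `Disagreement`, `conflict_score`, and `verdict` are hypothetical names.

```rust
/// One authoritative source that disagrees with the code (simplified).
pub struct Disagreement {
    pub tier: u8,              // 0..=2; only tiers < 3 count as conflicts
    pub authority_weight: f32, // caller-supplied; assumed <= 1.0
    pub confidence: f32,       // assumed in [0, 1]
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Block,
    Flag,
    Pass,
}

/// Authority weight of code-extracted claims (SourceClass::Expert).
const CODE_WEIGHT: f32 = 0.5;

/// Same formula as `check_conflict`, minus the query step.
pub fn conflict_score(conflicts: &[Disagreement]) -> Option<f32> {
    if conflicts.is_empty() {
        return None;
    }
    let max_weight = conflicts.iter().map(|c| c.authority_weight).fold(0.0_f32, f32::max);
    let max_conf = conflicts.iter().map(|c| c.confidence).fold(0.0_f32, f32::max);
    let boosted = max_weight * (1.0 - CODE_WEIGHT) * max_conf;

    let best_tier = conflicts.iter().map(|c| c.tier).min().unwrap_or(3) as f32;
    let normalized = 0.4 + (3.0 - best_tier) / 3.0 * 0.55;
    Some(normalized.max(boosted))
}

/// Threshold comparison used for BLOCK / FLAG / PASS.
pub fn verdict(score: f32, threshold_block: f32, threshold_flag: f32) -> Verdict {
    if score >= threshold_block {
        Verdict::Block
    } else if score >= threshold_flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}
```

With authority weights and confidences at most 1.0, `boosted` can never exceed 0.5, while `normalized` is at least about 0.58 for Tiers 0 to 2. Under these assumptions, the final score therefore depends only on the most authoritative disagreeing tier:

| Tier | Final score |
|------|-------------|
| 0 | 0.95 |
| 1 | about 0.77 |
| 2 | about 0.58 |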
### Verdict Levels
| Verdict | Condition | Meaning |
|---------|-----------|---------|
| BLOCK | `conflict_score >= 0.7` | Authoritative source strongly contradicts. Fix or explicitly acknowledge. |
| FLAG | `conflict_score >= 0.4` | Potential disagreement. Review recommended. |
| PASS | `conflict_score < 0.4` | No significant conflict (or no authoritative data). |
| ACK | Any score, acknowledged | Conflict exists but has been explicitly accepted. |
### Acknowledged Conflicts
When a conflict has been acknowledged (via `aphoria ack`), an acknowledgment assertion is stored in Episteme. The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG:
```
ACK  code://rust/citadeldb/auth/jwt/audience_validation
     Your code:    aud validation disabled (src/auth/jwt.rs:47)
     RFC 7519:     aud validation MUST be enabled (Tier 0)
     Acknowledged: 2026-01-15 by jordan
     Reason:       "Internal service, no external JWT consumers. SEC-2024-003."
```
The acknowledgment doesn't suppress the conflict. It adds context. A future `--strict` mode can treat acknowledged conflicts as blocks again (for audits).

---
## Report Formats
### SARIF (for CI)
SARIF (Static Analysis Results Interchange Format) is the standard for CI security tools. GitHub, GitLab, and Azure DevOps all consume it.
```json
{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/sarif-2.1/schema/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "aphoria",
        "version": "0.1.0",
        "informationUri": "https://github.com/orchard9/aphoria"
      }
    },
    "results": [{
      "ruleId": "epistemic-drift/tls-verify",
      "level": "error",
      "message": {
        "text": "TLS certificate verification disabled. OWASP requires verification (Tier 1, conflict score 0.87)."
      },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "src/net/client.rs" },
          "region": { "startLine": 23 }
        }
      }]
    }]
  }]
}
```
### JSON (for programmatic consumption)
```json
{
  "project": "citadeldb",
  "scan_id": "abc123",
  "timestamp": 1706832000,
  "summary": {
    "files_scanned": 142,
    "claims_extracted": 23,
    "conflicts": 3,
    "blocks": 2,
    "flags": 1
  },
  "conflicts": [
    {
      "concept_path": "code://rust/citadeldb/auth/jwt/audience_validation",
      "predicate": "enabled",
      "code_value": false,
      "file": "src/auth/jwt.rs",
      "line": 47,
      "conflict_score": 0.92,
      "verdict": "BLOCK",
      "conflicting_sources": [
        {
          "path": "rfc://7519/jwt/audience_validation",
          "source_class": "Regulatory",
          "value": true,
          "confidence": 1.0
        }
      ],
      "acknowledged": null
    }
  ]
}
```
---
## Baseline and Diff
### Baseline
`aphoria baseline` records the current scan as the baseline. Subsequent scans only report *new* conflicts.
Implementation: store the baseline scan ID in `.aphoria/baseline` in the project root. The `diff` logic compares the current scan's conflict set against the baseline's.
```
.aphoria/
  baseline      # scan ID of the baseline
  config.toml   # symlink or copy of aphoria.toml
  agent.key     # Ed25519 keypair for this project's Aphoria agent
```
### Diff
`aphoria diff` shows:
- New conflicts (in current scan but not baseline)
- Resolved conflicts (in baseline but not current scan)
- Changed conflicts (same concept, different score or verdict)
```
$ aphoria diff

NEW       code://rust/citadeldb/cache/redis/max_connections
          Your code: max_connections = 10000 (config/redis.yaml:5)
          Vendor:    recommended max 128 per instance (Tier 2)
          Conflict:  0.48 → FLAG

RESOLVED  code://rust/citadeldb/net/tls/cert_verification
          Previously: verify = false → BLOCK
          Current:    verify = true → PASS

1 new conflict, 1 resolved, 0 changed.
```
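The new/resolved/changed classification amounts to a comparison between two scans. Below is a minimal sketch. It assumes each scan's reported conflicts are keyed by concept path, and that "changed" means a different score or verdict. `ConflictSummary`, `ScanDiff`, and `diff_scans` are hypothetical names. A `BTreeMap` keeps the output in a stable, sorted order.

```rust
use std::collections::BTreeMap;

/// What a scan reported for one concept path (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct ConflictSummary {
    pub score: f32,
    pub verdict: String,
}

/// Concept paths grouped the way `aphoria diff` prints them.
#[derive(Debug, Default, PartialEq)]
pub struct ScanDiff {
    pub new: Vec<String>,
    pub resolved: Vec<String>,
    pub changed: Vec<String>,
}

pub fn diff_scans(
    baseline: &BTreeMap<String, ConflictSummary>,
    current: &BTreeMap<String, ConflictSummary>,
) -> ScanDiff {
    let mut diff = ScanDiff::default();
    for (path, now) in current {
        match baseline.get(path) {
            None => diff.new.push(path.clone()), // only in the current scan
            Some(before) if before != now => diff.changed.push(path.clone()), // score or verdict moved
            Some(_) => {} // unchanged
        }
    }
    for path in baseline.keys() {
        if !current.contains_key(path) {
            diff.resolved.push(path.clone()); // no longer reported (e.g. now PASS)
        }
    }
    diff
}
```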
---
## Agent Keypair
Aphoria signs assertions with a per-project Ed25519 keypair stored in `.aphoria/agent.key`. Generated on first `aphoria scan` if it doesn't exist.
The keypair identifies "Aphoria scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables:
- Per-project audit trails ("which Aphoria agent found this?")
- TrustRank per Aphoria instance (a well-calibrated Aphoria gains reputation)
- Distinguishing human-authored assertions from Aphoria-extracted ones
---
## Episteme Instance
### Local Mode (Default)
Aphoria ships with an embedded Episteme instance. No server needed. The database lives at `~/.aphoria/db/` (configurable). Multiple projects share the same local instance; their assertions are namespaced by ConceptPath (`code://rust/citadeldb/...` vs `code://go/other-project/...`).
The authoritative corpus (RFCs, OWASP) is also in the local instance. `aphoria init` bootstraps it.
```
$ aphoria init
Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested.
Downloading OWASP cheat sheets ... 89 assertions ingested.
Ready. Run `aphoria scan <project>` to begin.
```
### Remote Mode (Future)
```toml
[episteme]
url = "https://episteme.example.com"
api_key = "${APHORIA_API_KEY}"
```
In remote mode, Aphoria ingests into and queries from a shared Episteme instance. This enables:
- Cross-project conflict detection ("same JWT misconfiguration in 12 repos")
- Shared authoritative corpus (ingested once, used by all Aphoria agents)
- Centralized acknowledgment management
---
## Exit Codes
| Code | Meaning |
|------|---------|
| 0 | No conflicts above threshold |
| 1 | FLAG-level conflicts found (with `--exit-code`) |
| 2 | BLOCK-level conflicts found (with `--exit-code`) |
| 3 | Scan error (file access, Episteme connection, etc.) |
`--exit-code` enables non-zero exits. Without it, Aphoria always exits 0 (for interactive use where the report is the output, not the exit code).
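Here is a sketch of the exit-code mapping in the table above. It follows the prose literally: without `--exit-code`, every outcome exits 0, including scan errors. It also assumes two things:
- BLOCK outranks FLAG when both are present.
- `blocks` and `flags` already exclude acknowledged conflicts.

`ScanOutcome` and `exit_code` are hypothetical names.

```rust
/// Counts collected at the end of a scan (hypothetical shape).
pub struct ScanOutcome {
    pub scan_error: bool,
    pub blocks: usize,
    pub flags: usize,
}

/// Map an outcome to a process exit code per the table above.
pub fn exit_code(outcome: &ScanOutcome, exit_code_flag: bool) -> i32 {
    if !exit_code_flag {
        return 0; // interactive use: the report is the output
    }
    if outcome.scan_error {
        3
    } else if outcome.blocks > 0 {
        2 // BLOCK outranks FLAG when both are present
    } else if outcome.flags > 0 {
        1
    } else {
        0
    }
}
```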
---
## Performance Targets
| Metric | Target |
|--------|--------|
| Scan time, 1000-file Rust project | < 5 seconds |
| Scan time, 10000-file monorepo | < 30 seconds |
| Per-file extraction (all extractors) | < 5ms |
| Conflict query per claim | < 10ms |
| Local Episteme startup | < 100ms |
| Memory usage during scan | < 256MB |
The expected bottleneck is I/O (reading files), not extraction (regex matching). The conflict query is a local KV lookup, not a network call.

---
## Dependencies
| Dependency | Purpose |
|------------|---------|
| `clap` | CLI argument parsing |
| `ignore` | File walking (respects .gitignore, fast) |
| `regex` | Pattern matching in extractors |
| `serde` + `serde_json` | Config parsing, JSON output |
| `toml` | aphoria.toml parsing |
| `comfy-table` | Terminal table output |
| `stemedb-core` | Types |
| `stemedb-storage` | Local KV store |
| `stemedb-ingest` | Assertion ingestion |
| `stemedb-query` | Conflict queries |
| `ed25519-dalek` | Agent keypair + signing |
| `blake3` | Content hashing |
| `tokio` | Async runtime for KV store operations |
No LLM dependency. No network dependency (in local mode). No runtime other than tokio (for async KV store operations).