Aphoria Technical Spec
Status: Draft | Date: 2026-02-02
Overview
Aphoria is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources.
```
aphoria scan <project-root> [--config aphoria.toml] [--format table|json|sarif|markdown]
aphoria ack <concept-path> --reason "..."
aphoria baseline
aphoria diff
aphoria status
```
Architecture
```
┌─────────────────────────────────────────────────────────────┐
│  aphoria CLI                                                │
│                                                             │
│  ┌──────────┐   ┌────────────┐   ┌──────────┐   ┌────────┐  │
│  │ Walker   │──▶│ Extractors │──▶│ Ingester │──▶│ Report │  │
│  │          │   │            │   │          │   │        │  │
│  │ fs walk  │   │ tls_verify │   │ bridge   │   │ table  │  │
│  │ lang det │   │ jwt_config │   │ to       │   │ json   │  │
│  │ path map │   │ secrets    │   │ episteme │   │ sarif  │  │
│  │          │   │ timeouts   │   │          │   │ md     │  │
│  │          │   │ deps       │   │          │   │        │  │
│  │          │   │ cors       │   │          │   │        │  │
│  │          │   │ rate_limit │   │          │   │        │  │
│  └──────────┘   └────────────┘   └──────────┘   └────────┘  │
│                                       │              ▲      │
│                                       ▼              │      │
│                                ┌──────────────┐      │      │
│                                │   Episteme   │──────┘      │
│                                │   (local)    │ query +     │
│                                │              │ conflict    │
│                                └──────────────┘ scores      │
└─────────────────────────────────────────────────────────────┘
```
Aphoria depends on:
- `stemedb-core` (types: `ConceptPath`, `Assertion`, `SourceClass`)
- `stemedb-storage` (`KVStore`, `IndexStore`, `AliasStore`)
- `stemedb-ingest` (ingestion pipeline)
- `stemedb-query` (query engine, lenses)
It does not depend on stemedb-api. Aphoria talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed).
Crate Structure
```
crates/
  aphoria/
    Cargo.toml
    src/
      main.rs                CLI entrypoint (clap)
      config.rs              aphoria.toml parsing
      walker/
        mod.rs               Project walker orchestration
        language.rs          Language detection
        path_mapper.rs       Directory → ConceptPath mapping
        normalizer.rs        Path normalization rules per language
      extractors/
        mod.rs               Extractor trait + registry
        tls_verify.rs        TLS certificate verification
        jwt_config.rs        JWT validation settings
        hardcoded_secrets.rs Credentials in source
        timeout_config.rs    HTTP/DB/Redis timeouts
        dep_versions.rs      Vulnerable dependency versions
        cors_config.rs       CORS allow-origin
        rate_limit.rs        Rate limiting config
      corpus/
        mod.rs               CorpusBuilder trait
        rfc.rs               RFC ingestion (Tier 0)
        owasp.rs             OWASP ingestion (Tier 1)
        vendor.rs            Vendor docs (Tier 2)
        policy.rs            Local policy ingestion (Tier 0 Override)
      bridge.rs              ExtractedClaim → Assertion conversion
      conflict.rs            Conflict query + scoring
      report/
        mod.rs               Report generation orchestration
        table.rs             Terminal table output
        json.rs              JSON output
        sarif.rs             SARIF for CI integration
        markdown.rs          Markdown output
      ack.rs                 Acknowledge command
      baseline.rs            Baseline snapshot
      diff.rs                Delta since last scan
```
Configuration
aphoria.toml at project root (optional, sensible defaults):
```toml
[project]
name = "citadeldb"
language = "rust"                  # auto-detected if omitted

[episteme]
data_dir = "~/.aphoria/db"         # local Episteme instance
# url = "http://localhost:18180"   # future: remote instance

[thresholds]
block = 0.7                        # conflict score >= this → BLOCK
flag = 0.4                         # conflict score >= this → FLAG
                                   # below flag threshold → PASS (not reported)

[extractors]
enabled = ["tls_verify", "jwt_config", "hardcoded_secrets", "timeout_config", "dep_versions", "cors_config", "rate_limit"]
# disabled = ["rate_limit"]        # alternative: disable specific ones

[extractors.timeout_config]
min_reasonable_ms = 1000           # flag timeouts below this
max_reasonable_ms = 300000         # flag timeouts above this

[extractors.dep_versions]
advisory_db = "~/.aphoria/advisory-db"   # rustsec/advisory-db clone

[scan]
exclude = ["target/", "node_modules/", ".git/", "vendor/"]
max_file_size = 1048576            # skip files > 1MB

[aliases]
auto_suggest = true                # suggest aliases when shared leaves detected
auto_accept_tier0 = true           # auto-accept alias suggestions to Tier 0 sources
```
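The `[extractors]` table takes an `enabled` allow-list or, alternatively, a `disabled` deny-list. A minimal sketch of how the active set might be resolved, assuming that an absent `enabled` means "all built-in extractors" and that `disabled` is applied afterwards and therefore wins. `ALL_EXTRACTORS` and `resolve_extractors` are hypothetical names, not Aphoria's actual API:

```rust
/// Built-in extractor names, as listed in the default config above.
const ALL_EXTRACTORS: &[&str] = &[
    "tls_verify",
    "jwt_config",
    "hardcoded_secrets",
    "timeout_config",
    "dep_versions",
    "cors_config",
    "rate_limit",
];

/// Hypothetical resolution of `[extractors] enabled / disabled`:
/// - `enabled = None` keeps every built-in extractor;
/// - `disabled` is subtracted afterwards, so it overrides `enabled`;
/// - unknown names are silently ignored.
fn resolve_extractors(enabled: Option<&[&str]>, disabled: &[&str]) -> Vec<&'static str> {
    ALL_EXTRACTORS
        .iter()
        .copied()
        .filter(|name| enabled.map_or(true, |list| list.contains(name)))
        .filter(|name| !disabled.contains(name))
        .collect()
}
```

With `disabled = ["rate_limit"]` and no `enabled` key, this yields the other six extractors.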
Walker
Language Detection
Priority order:
- Explicit `language` in `aphoria.toml`
- Dominant language heuristic (count files by extension)
- Per-file extension mapping
| Extension or file name | Language |
|---|---|
| `.rs` | rust |
| `.go` | go |
| `.py` | python |
| `.ts`, `.tsx` | typescript |
| `.js`, `.jsx` | javascript |
| `.yaml`, `.yml` | yaml |
| `.toml` | toml |
| `.json` | json |
| `.env`, `.env.*` | dotenv |
| `Dockerfile`, `docker-compose.*` | docker |
| `Cargo.toml` | cargo-manifest |
| `go.mod` | go-mod |
| `package.json` | npm-manifest |
| `requirements.txt`, `pyproject.toml` | python-manifest |
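One way to implement the per-file mapping and the dominant-language heuristic, shown as a std-only sketch. It assumes exact file names (`Cargo.toml`, `go.mod`, ...) are checked before extensions, so that a manifest is not classified as plain TOML or JSON. It also assumes that only source languages count towards the dominant language. The `Language` enum and both functions are illustrative:

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Rust,
    Go,
    Python,
    TypeScript,
    JavaScript,
    Yaml,
    Toml,
    Json,
    Dotenv,
    Docker,
    CargoManifest,
    GoMod,
    NpmManifest,
    PythonManifest,
}

/// Per-file mapping: exact names first, then name prefixes, then extensions.
fn detect_language(path: &Path) -> Option<Language> {
    let name = path.file_name()?.to_str()?;
    match name {
        "Cargo.toml" => return Some(Language::CargoManifest),
        "go.mod" => return Some(Language::GoMod),
        "package.json" => return Some(Language::NpmManifest),
        "requirements.txt" | "pyproject.toml" => return Some(Language::PythonManifest),
        "Dockerfile" => return Some(Language::Docker),
        _ => {}
    }
    if name.starts_with("docker-compose.") {
        return Some(Language::Docker);
    }
    // `Path` sees no extension on ".env", so dotenv files are matched by name.
    if name == ".env" || name.starts_with(".env.") {
        return Some(Language::Dotenv);
    }
    match path.extension()?.to_str()? {
        "rs" => Some(Language::Rust),
        "go" => Some(Language::Go),
        "py" => Some(Language::Python),
        "ts" | "tsx" => Some(Language::TypeScript),
        "js" | "jsx" => Some(Language::JavaScript),
        "yaml" | "yml" => Some(Language::Yaml),
        "toml" => Some(Language::Toml),
        "json" => Some(Language::Json),
        _ => None,
    }
}

/// Dominant-language heuristic: the source language with the most files.
/// On a tie, `max_by_key` keeps the later entry in `SOURCE`.
fn dominant_language(paths: &[&Path]) -> Option<Language> {
    const SOURCE: [Language; 5] = [
        Language::Rust,
        Language::Go,
        Language::Python,
        Language::TypeScript,
        Language::JavaScript,
    ];
    SOURCE
        .iter()
        .map(|&lang| {
            let count = paths.iter().filter(|p| detect_language(p) == Some(lang)).count();
            (lang, count)
        })
        .filter(|&(_, count)| count > 0)
        .max_by_key(|&(_, count)| count)
        .map(|(lang, _)| lang)
}
```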
Path Mapping
Directory structure maps to ConceptPath segments. Language-specific stripping rules remove boilerplate directories:
Rust:

```
Strip: src/, crates/
Keep:  everything else

crates/citadeldb/src/auth/jwt.rs
  → ["rust", "citadeldb", "auth", "jwt"]
  → code://rust/citadeldb/auth/jwt/{leaf from extractor}
```

Go:

```
Strip: cmd/, internal/, pkg/
Keep:  everything else

internal/auth/jwt/validator.go
  → ["go", "{module_name}", "auth", "jwt"]
  → code://go/{module_name}/auth/jwt/{leaf}
```

Python:

```
Strip: src/, lib/
Keep:  everything else

src/auth/jwt_handler.py
  → ["python", "{package_name}", "auth", "jwt_handler"]
  → code://python/{package_name}/auth/jwt_handler/{leaf}
```

Config files:

```
config/production.yaml
  → code://config/{project_name}/production/{leaf}
.env.production
  → code://config/{project_name}/env_production/{leaf}
docker-compose.yml
  → code://docker/{project_name}/{leaf}
```
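Apart from the stripped directories, the three source-language rules above differ in two ways. Go and Python insert the module or package name after the language segment. Go also drops the file name, because the package directory is the unit. A sketch that encodes those rules as data and reproduces the three examples. `PathRules`, `path_segments` and `code_uri` are hypothetical, and config-file mapping is not covered:

```rust
/// Hypothetical encoding of one language's path-mapping rules.
struct PathRules {
    lang: &'static str,             // first ConceptPath segment
    strip: &'static [&'static str], // boilerplate directories to drop
    insert_name: bool,              // insert module/package name after the language
    keep_file_stem: bool,           // false for Go: the package directory is the unit
}

const RUST: PathRules = PathRules { lang: "rust", strip: &["src", "crates"], insert_name: false, keep_file_stem: true };
const GO: PathRules = PathRules { lang: "go", strip: &["cmd", "internal", "pkg"], insert_name: true, keep_file_stem: false };
const PYTHON: PathRules = PathRules { lang: "python", strip: &["src", "lib"], insert_name: true, keep_file_stem: true };

/// Map a project-relative path (with `/` separators) to ConceptPath segments.
fn path_segments(rules: &PathRules, name: &str, rel_path: &str) -> Vec<String> {
    let mut dirs: Vec<&str> = rel_path.split('/').filter(|p| !p.is_empty()).collect();
    let file = dirs.pop().unwrap_or("");

    let mut segments = vec![rules.lang.to_string()];
    if rules.insert_name {
        segments.push(name.to_string());
    }
    segments.extend(
        dirs.into_iter()
            .filter(|dir| !rules.strip.iter().any(|s| s == dir))
            .map(String::from),
    );
    if rules.keep_file_stem {
        // "jwt.rs" -> "jwt", "jwt_handler.py" -> "jwt_handler"
        let stem = file.rsplit_once('.').map_or(file, |(stem, _)| stem);
        segments.push(stem.to_string());
    }
    segments
}

/// Append the extractor's leaf and render the result as a `code://` URI.
fn code_uri(segments: &[String], leaf: &str) -> String {
    format!("code://{}/{}", segments.join("/"), leaf)
}
```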
The project name comes from:
- `aphoria.toml` `project.name`
- `Cargo.toml` `[package] name`
- `go.mod` module name (last segment)
- `package.json` `name`
- Directory name of project root
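A sketch of that fallback chain, assuming the first source that yields a name wins. It also shows one way to extract the last `module` path segment from `go.mod`. Both function names are hypothetical, and reading `Cargo.toml` or `package.json` is left to the caller:

```rust
/// Module name from go.mod: last `/` segment of the `module` directive.
fn go_module_name(go_mod: &str) -> Option<&str> {
    go_mod
        .lines()
        .find_map(|line| line.trim().strip_prefix("module "))
        .map(|module| module.trim().trim_matches('"'))
        .and_then(|module| module.rsplit('/').next())
        .filter(|name| !name.is_empty())
}

/// First available name wins, in the priority order listed above.
fn project_name<'a>(
    aphoria_toml: Option<&'a str>,
    cargo_package: Option<&'a str>,
    go_module: Option<&'a str>,
    package_json: Option<&'a str>,
    root_dir: &'a str,
) -> &'a str {
    aphoria_toml
        .or(cargo_package)
        .or(go_module)
        .or(package_json)
        .unwrap_or(root_dir)
}
```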
File Filtering
Skip:
- Directories in the `scan.exclude` list
- Files larger than `scan.max_file_size`
- Binary files (detected by a null byte in the first 8KB)
- Generated files (`*.generated.*`, `*.pb.go`, `*_generated.rs`)
- Test files (configurable: include or exclude)
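The binary and generated-file checks are simple enough to write down directly. A std-only sketch follows. `should_skip` is a hypothetical helper, and it omits the `scan.exclude` list and test-file handling:

```rust
/// Binary detection as described above: a NUL byte within the first 8 KiB.
fn is_binary(bytes: &[u8]) -> bool {
    bytes.iter().take(8 * 1024).any(|&b| b == 0)
}

/// Generated-file patterns: `*.generated.*`, `*.pb.go`, `*_generated.rs`.
fn is_generated(file_name: &str) -> bool {
    file_name.contains(".generated.")
        || file_name.ends_with(".pb.go")
        || file_name.ends_with("_generated.rs")
}

/// Skip decision for one file, given its size and the first bytes read.
fn should_skip(file_name: &str, size: u64, max_file_size: u64, head: &[u8]) -> bool {
    size > max_file_size || is_generated(file_name) || is_binary(head)
}
```

Checking size first lets a walker avoid reading oversized files at all.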
Extractors
Trait Definition
```rust
/// A claim extractor that finds implicit decisions in source code.
pub trait Extractor: Send + Sync {
    /// Unique identifier for this extractor.
    fn name(&self) -> &str;

    /// File types this extractor operates on.
    fn languages(&self) -> &[Language];

    /// Extract claims from a file's content.
    ///
    /// - `path_segments`: The ConceptPath segments derived from the file's location.
    /// - `content`: The file content as a string.
    /// - `language`: The detected language of the file.
    ///
    /// Returns zero or more extracted claims.
    fn extract(
        &self,
        path_segments: &[String],
        content: &str,
        language: Language,
    ) -> Vec<ExtractedClaim>;
}
```
ExtractedClaim
```rust
/// A claim extracted from source code by an Extractor.
// Clone is required: check_conflict copies the claim into its ConflictResult.
#[derive(Clone)]
pub struct ExtractedClaim {
    /// The full ConceptPath for this claim.
    /// Scheme is always "code" for code-extracted claims.
    pub concept_path: ConceptPath,

    /// The predicate describing what aspect of the concept this claims.
    /// Examples: "enabled", "config_value", "version", "allow_origin"
    pub predicate: String,

    /// The extracted value.
    pub value: ObjectValue,

    /// Source file path relative to project root.
    pub file: String,

    /// Line number in the source file (1-indexed).
    pub line: usize,

    /// The matched source text (the actual code/config that was matched).
    pub matched_text: String,

    /// Confidence of extraction.
    /// 1.0 for exact regex matches.
    /// Lower for heuristic matches.
    pub confidence: f32,

    /// Human-readable description of what was found.
    /// Example: "JWT audience validation is disabled"
    pub description: String,
}
```
Extractor: tls_verify
What it finds: TLS/SSL certificate verification disabled.
Patterns:
```
Rust (reqwest):
  Pattern: danger_accept_invalid_certs\s*\(\s*true\s*\)
  Leaf: cert_verification
  Predicate: enabled
  Value: Boolean(false)

Rust (native-tls):
  Pattern: accept_invalid_certs\s*\(\s*true\s*\)

Go (net/http):
  Pattern: InsecureSkipVerify\s*:\s*true

Python (requests):
  Pattern: verify\s*=\s*False

Node.js:
  Pattern: rejectUnauthorized\s*:\s*false
  Pattern: NODE_TLS_REJECT_UNAUTHORIZED.*['"]0['"]

YAML/TOML/JSON config:
  Pattern: (tls_verify|ssl_verify|verify_ssl|verify_tls)\s*[:=]\s*(false|no|0|off)
```
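Most of these patterns have the shape `KEY\s*SEP\s*VALUE`. To keep the example dependency-free, this sketch implements that shape with string operations in place of the `regex` crate listed under Dependencies. It applies the matcher to the Go pattern and reports 1-indexed line numbers, as `ExtractedClaim` expects. `Finding` is a simplified, hypothetical stand-in for `ExtractedClaim`:

```rust
/// True if `line` contains `key`, optional whitespace, `sep`, optional
/// whitespace, then `value`: the shape of the regex `key\s*SEP\s*value`.
fn matches_key_sep_value(line: &str, key: &str, sep: char, value: &str) -> bool {
    line.match_indices(key).any(|(idx, _)| {
        let rest = line[idx + key.len()..].trim_start();
        match rest.strip_prefix(sep) {
            Some(after) => after.trim_start().starts_with(value),
            None => false,
        }
    })
}

/// Simplified finding: 1-indexed line number plus the matched line.
#[derive(Debug, PartialEq)]
struct Finding {
    line: usize,
    matched_text: String,
}

/// Go (net/http): `InsecureSkipVerify\s*:\s*true`.
fn find_insecure_skip_verify(content: &str) -> Vec<Finding> {
    content
        .lines()
        .enumerate()
        .filter(|(_, line)| matches_key_sep_value(line, "InsecureSkipVerify", ':', "true"))
        .map(|(i, line)| Finding { line: i + 1, matched_text: line.trim().to_string() })
        .collect()
}
```

The same helper covers the Node.js pattern with `("rejectUnauthorized", ':', "false")`.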
Extractor: jwt_config
What it finds: JWT validation settings.
Claims extracted per finding:
| Leaf | Predicate | What it means |
|---|---|---|
| audience_validation | enabled | Whether the `aud` claim is validated |
| expiry_validation | enabled | Whether the `exp` claim is validated |
| algorithm_restriction | config_value | Allowed algorithms (or "none" if unrestricted) |
| signature_verification | enabled | Whether signatures are verified |
Patterns (Rust, jsonwebtoken crate):
```
// aud validation
Pattern: set_audience.*\[\]|validate_aud.*false|aud.*None
Leaf: audience_validation
Value: Boolean(false)

// Dangerous: algorithm none
Pattern: Algorithm::None|alg.*none|allow_none.*true
Leaf: algorithm_restriction
Value: Text("none_allowed")

// Signature skip
Pattern: dangerous_insecure|skip_signature|verify.*false
Leaf: signature_verification
Value: Boolean(false)
```

Patterns (Go, golang-jwt):

```
Pattern: jwt\.Parse\(.*func\(.*\*jwt\.Token\).*\{[^}]*return.*signingKey
         (without any algorithm check in the callback)
```
This is a heuristic match (confidence < 1.0) — detecting missing validation is harder than detecting explicit disabling.
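A rough sketch of the Go heuristic. For each `jwt.Parse(` call, it treats the text up to the first `})` as the keyfunc callback. It reports the call if that text never touches the token's `Method` field, which is the usual way to check the signing algorithm (for example `token.Method.(*jwt.SigningMethodHMAC)`). The 0.6 confidence and the `})` boundary are assumptions; the spec only says heuristic matches score below 1.0:

```rust
/// Assumed confidence for this heuristic; the spec only requires < 1.0.
const MISSING_CHECK_CONFIDENCE: f32 = 0.6;

/// Report `jwt.Parse(` calls whose callback never references the token's
/// `Method` field. Returns (1-indexed line of the call, confidence).
fn find_unchecked_jwt_parse(content: &str) -> Vec<(usize, f32)> {
    let mut findings = Vec::new();
    for (idx, _) in content.match_indices("jwt.Parse(") {
        let rest = &content[idx..];
        // Crude boundary: the first "})" is taken as the end of the callback.
        let call = match rest.find("})") {
            Some(end) => &rest[..end + 2],
            None => rest,
        };
        if !call.contains(".Method") {
            let line = content[..idx].matches('\n').count() + 1;
            findings.push((line, MISSING_CHECK_CONFIDENCE));
        }
    }
    findings
}
```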
Extractor: hardcoded_secrets
What it finds: Credentials, API keys, tokens in source (not in .env or .gitignore'd files).
Patterns:
```
// API keys
Pattern: (api[_-]?key|apikey)\s*[:=]\s*["'][A-Za-z0-9_\-]{20,}["']
Leaf: api_key_storage
Predicate: storage_method
Value: Text("hardcoded")

// Passwords
Pattern: (password|passwd|pwd)\s*[:=]\s*["'][^"']+["']
  (excluding: "password", "changeme", "placeholder", "CHANGE_ME", "xxx", test patterns)
Leaf: password_storage
Predicate: storage_method
Value: Text("hardcoded")

// AWS keys
Pattern: (AKIA[0-9A-Z]{16})
Leaf: aws_credentials
Predicate: storage_method
Value: Text("hardcoded")

// Private keys
Pattern: -----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----
Leaf: private_key_storage
Predicate: storage_method
Value: Text("hardcoded_in_source")
```
Test-like files: files matching `*test*`, `*example*`, `*fixture*`, or `*mock*` are still scanned, but their findings are marked with a lower confidence (0.5).
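The AWS key pattern and the test-file confidence rule can both be sketched without the `regex` crate. `find_aws_key_ids` reproduces `AKIA[0-9A-Z]{16}` by hand. `finding_confidence` applies the 0.5 downgrade, assuming the `*test*`-style globs are matched case-sensitively against the file name. Both names are hypothetical:

```rust
/// Hand-rolled equivalent of `AKIA[0-9A-Z]{16}`: "AKIA" followed by
/// exactly 16 ASCII digits or uppercase letters.
fn find_aws_key_ids(content: &str) -> Vec<&str> {
    content
        .match_indices("AKIA")
        .filter_map(|(idx, _)| {
            let candidate = content.get(idx..idx + 20)?;
            candidate[4..]
                .bytes()
                .all(|b| b.is_ascii_digit() || b.is_ascii_uppercase())
                .then_some(candidate)
        })
        .collect()
}

/// Findings in test-like files keep being reported, at confidence 0.5.
fn finding_confidence(file_name: &str, base: f32) -> f32 {
    let test_like = ["test", "example", "fixture", "mock"]
        .iter()
        .any(|marker| file_name.contains(*marker));
    if test_like { 0.5 } else { base }
}
```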
Extractor: timeout_config
What it finds: HTTP client, database, and cache timeout values.
Patterns:
```
// Zero/infinite timeout
Pattern: timeout\s*[:=]\s*(0|None|null|nil|infinity|Inf)
Leaf: {context}/timeout  (context from surrounding code: http, db, redis, etc.)
Predicate: config_value
Value: Number(0.0)
Description: "Timeout disabled (infinite wait)"

// Unreasonably low timeout
Pattern: timeout\s*[:=]\s*(\d+)
  where value_ms < config.min_reasonable_ms
Leaf: {context}/timeout
Value: Number(extracted_value)
Description: "Timeout {value}ms below minimum reasonable {min}ms"

// Unreasonably high timeout
Pattern: timeout\s*[:=]\s*(\d+)
  where value_ms > config.max_reasonable_ms
```
Unit detection: Heuristic based on magnitude and surrounding context:
- Value > 1000000 → likely nanoseconds
- Value > 1000 and < 1000000 → likely milliseconds
- Value < 100 → likely seconds
- Presence of "ms", "sec", "Duration::from_secs" → explicit unit
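These magnitude rules leave some values unclassified: nothing covers 100 through 1000 inclusive, or exactly 1,000,000. The sketch below assumes an explicit unit marker takes precedence over magnitude and returns `None` for the uncovered ranges. It then checks the normalized value against `min_reasonable_ms` and `max_reasonable_ms`. `Unit`, `detect_unit`, `to_millis` and `classify` are illustrative names:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Unit {
    Nanos,
    Millis,
    Secs,
}

/// Guess the unit of a bare timeout value. Explicit markers in the
/// surrounding text win ("ms" is checked first, so "msec" reads as millis;
/// "Duration::from_secs" is caught by "sec"); otherwise magnitude decides.
fn detect_unit(value: u64, context: &str) -> Option<Unit> {
    if context.contains("ms") {
        return Some(Unit::Millis);
    }
    if context.contains("sec") {
        return Some(Unit::Secs);
    }
    match value {
        v if v > 1_000_000 => Some(Unit::Nanos),
        v if v > 1_000 && v < 1_000_000 => Some(Unit::Millis),
        v if v < 100 => Some(Unit::Secs),
        _ => None, // 100..=1000 and exactly 1_000_000: no rule applies
    }
}

fn to_millis(value: u64, unit: Unit) -> u64 {
    match unit {
        Unit::Nanos => value / 1_000_000,
        Unit::Millis => value,
        Unit::Secs => value.saturating_mul(1_000),
    }
}

/// Compare against the `[extractors.timeout_config]` bounds.
fn classify(ms: u64, min_ms: u64, max_ms: u64) -> &'static str {
    if ms == 0 {
        "disabled"
    } else if ms < min_ms {
        "too_low"
    } else if ms > max_ms {
        "too_high"
    } else {
        "ok"
    }
}
```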
Extractor: dep_versions
What it finds: Dependencies with known vulnerabilities.
Sources:
- `Cargo.toml` → check against RustSec Advisory DB
- `go.mod` → check against Go Vulnerability Database
- `package.json` → check against npm audit advisories
- `requirements.txt` / `pyproject.toml` → check against PyPI advisory data
Output:
```
Leaf: dep/{package_name}/version
Predicate: installed_version
Value: Text("1.0.2")
Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX"
```
The advisory databases are downloaded locally and refreshed periodically, so Aphoria makes no external API calls during a scan.
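The spec does not fix an advisory record format. Purely as an illustration, this sketch assumes a simplified record in which every version below a `patched` version is affected, and compares plain `major.minor.patch` versions as tuples. Real advisory databases express affected ranges with richer version requirements, and pre-release suffixes are not handled here. `Advisory` and both functions are hypothetical:

```rust
/// (major, minor, patch); pre-release and build suffixes are not handled.
type Version = (u64, u64, u64);

fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.trim().split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next().unwrap_or(Some(0))?;
    let patch = parts.next().unwrap_or(Some(0))?;
    Some((major, minor, patch))
}

/// Hypothetical, simplified advisory: every version below `patched` is affected.
struct Advisory {
    package: &'static str,
    id: &'static str,
    patched: Version,
}

/// IDs of the advisories that affect `package` at version `installed`.
fn matching_advisories(db: &[Advisory], package: &str, installed: &str) -> Vec<&'static str> {
    let Some(version) = parse_version(installed) else {
        return Vec::new();
    };
    db.iter()
        .filter(|adv| adv.package == package && version < adv.patched)
        .map(|adv| adv.id)
        .collect()
}
```

Tuples compare lexicographically, which matches semantic-version ordering for plain numeric versions.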
Extractor: cors_config
What it finds: Overly permissive CORS configuration.
Patterns:
```
// Allow all origins
Pattern: allow_origin\s*\(\s*["']\*["']\s*\)|Access-Control-Allow-Origin.*\*|AllowAllOrigins.*true
Leaf: cors/allow_origin
Predicate: config_value
Value: Text("*")
Description: "CORS allows all origins"

// Allow credentials with wildcard
Pattern: (allow_credentials|AllowCredentials).*true
  (in proximity to allow_origin *)
Leaf: cors/credentials_with_wildcard
Predicate: enabled
Value: Boolean(true)
Description: "CORS allows credentials with wildcard origin"
```
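"In proximity to" is not defined above. This sketch assumes a hypothetical window of 10 lines. It reports the 1-indexed credential lines that fall near a wildcard-origin line, using substring checks that only approximate the two patterns:

```rust
/// Hypothetical proximity window, in lines.
const PROXIMITY_LINES: usize = 10;

/// Rough substring version of the "allow all origins" pattern.
fn is_wildcard_origin(line: &str) -> bool {
    (line.contains("Access-Control-Allow-Origin") && line.contains('*'))
        || (line.contains("allow_origin") && (line.contains("\"*\"") || line.contains("'*'")))
        || (line.contains("AllowAllOrigins") && line.contains("true"))
}

/// Rough substring version of `(allow_credentials|AllowCredentials).*true`.
fn is_credentials_enabled(line: &str) -> bool {
    (line.contains("allow_credentials") || line.contains("AllowCredentials"))
        && line.contains("true")
}

/// 1-indexed lines enabling credentials within PROXIMITY_LINES of a wildcard origin.
fn credentials_with_wildcard(content: &str) -> Vec<usize> {
    let lines: Vec<&str> = content.lines().collect();
    let wildcard: Vec<usize> = lines
        .iter()
        .enumerate()
        .filter(|(_, line)| is_wildcard_origin(line))
        .map(|(i, _)| i)
        .collect();
    lines
        .iter()
        .enumerate()
        .filter(|(i, line)| {
            is_credentials_enabled(line)
                && wildcard.iter().any(|w| w.abs_diff(*i) <= PROXIMITY_LINES)
        })
        .map(|(i, _)| i + 1)
        .collect()
}
```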
Extractor: rate_limit
What it finds: Rate limiting disabled or set unreasonably high.
Patterns:
```
// Rate limiting disabled
Pattern: (rate_limit|ratelimit).*disabled|rate_limit\s*[:=]\s*(0|false|off|none)
Leaf: rate_limit/enabled
Predicate: enabled
Value: Boolean(false)

// Unreasonably high limit
Pattern: (rate_limit|ratelimit|max_requests)\s*[:=]\s*(\d+)
  where value > 10000 per minute (configurable)
Leaf: rate_limit/max_requests
Predicate: config_value
Value: Number(extracted_value)
```
Dynamic Application Policy (Phase 6)
PolicyCorpusBuilder
A corpus builder that ingests assertions from a local aphoria-policy.yaml file. This allows teams to define "Application Truth" that overrides RFCs or Vendor defaults.
File Format (aphoria-policy.yaml):
```yaml
rules:
  - path: "code://rust/my-app/db/pool_size"
    predicate: "config_value"
    value: 50
    tier: "Regulatory"   # Tier 0 (overrides everything)
    message: "Internal policy: max 50 conns to prevent storms."

  - path: "code://go/legacy-service/tls/version"
    predicate: "min_version"
    value: "1.2"
    tier: "Clinical"     # Tier 1
    message: "Legacy clients require TLS 1.2 support."
```
Ingestion:
- Each rule becomes a Tier 0 or Tier 1 Assertion.
- Source is set to `SourceClass::Regulatory` (for Tier 0) or `SourceClass::Clinical` (for Tier 1).
- Conflict detection treats these as authoritative truths.
Enterprise Lens:
A specialized StemeDB Lens that resolves conflicts by prioritizing Policy assertions over RFC or Vendor assertions when they overlap on the same ConceptPath.
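Policy rules reuse `SourceClass::Regulatory` and `SourceClass::Clinical`, so a Tier 0 policy rule and an RFC share the same class. The lens therefore needs some other provenance marker to prefer the policy. This sketch assumes a hypothetical `origin` tag on each candidate and a tie-break order of policy first, then lowest tier, then highest confidence. It is not StemeDB's lens API:

```rust
/// Tier for a policy rule's `tier` field, per the mapping above.
fn policy_tier(tier: &str) -> Option<u8> {
    match tier {
        "Regulatory" => Some(0),
        "Clinical" => Some(1),
        _ => None,
    }
}

/// Hypothetical view of one assertion on a shared ConceptPath.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    origin: &'static str, // hypothetical provenance tag: "policy", "rfc", "vendor", ...
    tier: u8,
    value: String,
    confidence: f32,
}

/// Enterprise-lens sketch: policy assertions win outright; otherwise the
/// lowest tier wins, then the higher confidence.
fn resolve(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by(|a, b| {
        let a_policy = a.origin == "policy";
        let b_policy = b.origin == "policy";
        b_policy
            .cmp(&a_policy) // a policy candidate sorts first
            .then(a.tier.cmp(&b.tier))
            .then(b.confidence.total_cmp(&a.confidence))
    })
}
```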
Ingestion Bridge
Claim → Assertion Mapping
```rust
use serde_json::json; // json! macro

fn to_assertion(
    claim: &ExtractedClaim,
    agent_keypair: &Ed25519Keypair,
    scan_timestamp: u64,
) -> Assertion {
    // Provenance metadata, serialized to JSON bytes
    let source_metadata = serde_json::to_vec(&json!({
        "file": claim.file,
        "line": claim.line,
        "matched_text": claim.matched_text,
        // ExtractedClaim has no extractor-name field; the ConceptPath leaf is recorded here
        "extractor": claim.concept_path.leaf(),
        "scan_tool": "aphoria",
        "scan_version": env!("CARGO_PKG_VERSION"),
    }));

    // Provenance hash over file, line and matched text
    let source_hash = blake3::hash(
        format!("{}:{}:{}", claim.file, claim.line, claim.matched_text).as_bytes()
    );

    Assertion {
        subject: claim.concept_path.to_string(), // EntityId = String
        predicate: claim.predicate.clone(),
        object: claim.value.clone(),
        parent_hash: None,
        source_hash: *source_hash.as_bytes(),
        source_class: SourceClass::Expert, // code:// scheme default
        visual_hash: None,
        epoch: None,
        source_metadata: source_metadata.ok(),
        lifecycle: LifecycleStage::Approved,
        signatures: vec![sign(agent_keypair, &claim)],
        confidence: claim.confidence,
        timestamp: scan_timestamp,
        vector: None,
    }
}
```
Idempotency
Same code produces the same claims. Same claims produce the same assertion hashes (content-addressed). Re-scanning a project that hasn't changed ingests nothing new. This is guaranteed by BLAKE3 content addressing in the existing Episteme pipeline.
When code changes between scans, new assertions are created. Old assertions remain (append-only). The diff command compares the current scan's assertions against the last scan's to show what changed.
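A sketch of the idempotency property. It assumes the content address covers the claim's subject, predicate, value and source location but not `scan_timestamp`. If the timestamp were hashed, every re-scan would produce new addresses. std's `DefaultHasher` stands in for BLAKE3 only so the example has no dependencies; it is neither cryptographic nor stable across Rust releases. `ClaimKey` and `Store` are hypothetical:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Fields that identify a claim. `scan_timestamp` is deliberately absent.
#[derive(Hash)]
struct ClaimKey<'a> {
    subject: &'a str,
    predicate: &'a str,
    value: &'a str,
    file: &'a str,
    line: usize,
    matched_text: &'a str,
}

/// Content address; DefaultHasher stands in for BLAKE3 in this sketch only.
fn content_address(key: &ClaimKey) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Append-only store: re-inserting a known address is a no-op.
#[derive(Default)]
struct Store {
    addresses: HashSet<u64>,
}

impl Store {
    /// Ingest one scan's claims and return how many were new.
    fn ingest(&mut self, claims: &[ClaimKey]) -> usize {
        claims
            .iter()
            .filter(|claim| self.addresses.insert(content_address(claim)))
            .count()
    }
}
```

Scanning unchanged code twice ingests the claims once; moving the same line of code by one line yields a new claim, because `line` is part of the address.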
Scan Metadata
Each scan is recorded as an assertion about itself:
```
Subject: aphoria://scan/{project_name}/{scan_id}
Predicate: completed
Object: Text(json!({
  "project": "citadeldb",
  "files_scanned": 142,
  "claims_extracted": 23,
  "conflicts_found": 3,
  "blocks": 2,
  "flags": 1,
  "timestamp": 1706832000
}))
```
This enables aphoria diff — compare two scan records and their associated assertions.
Conflict Detection
Query Strategy
After ingestion, for each extracted claim:
```rust
async fn check_conflict(
    claim: &ExtractedClaim,
    query_engine: &QueryEngine,
    threshold_block: f32, // [thresholds] block (default 0.7)
    threshold_flag: f32,  // [thresholds] flag (default 0.4)
) -> Option<ConflictResult> {
    // 1. Query with Skeptic lens, resolving aliases
    let results = query_engine.query(Query {
        subject: Some(claim.concept_path.to_string()),
        predicate: Some(claim.predicate.clone()),
        lens: Some("skeptic".to_string()),
        resolve_aliases: true,
        source_class_decay: true,
        ..Default::default()
    }).await;

    // 2. Check if any authoritative source disagrees
    let code_value = &claim.value;
    let mut conflicts = Vec::new();
    for assertion in &results.assertions {
        if assertion.source_class.tier() < 3 // Tier 0, 1, or 2
            && assertion.object != *code_value // Different value
        {
            conflicts.push(ConflictingSource {
                path: assertion.subject.clone(),
                source_class: assertion.source_class,
                value: assertion.object.clone(),
                confidence: assertion.confidence,
            });
        }
    }
    if conflicts.is_empty() {
        return None;
    }

    // 3. Compute conflict score
    // Authority term: strongest conflicting source vs. the code's own weight
    let max_tier_weight = conflicts.iter()
        .map(|c| c.source_class.authority_weight())
        .max_by(|a, b| a.total_cmp(b))
        .unwrap_or(0.0);
    let code_weight = SourceClass::Expert.authority_weight(); // 0.5
    let conflict_score = max_tier_weight * (1.0 - code_weight);
    // Tier 0 vs Tier 3: 1.0 * 0.5 = 0.50

    // Scaled by the highest confidence among the conflicting sources.
    // With confidence in [0, 1], this can only lower the score.
    let scaled_score = conflict_score
        * conflicts.iter()
            .map(|c| c.confidence)
            .max_by(|a, b| a.total_cmp(b))
            .unwrap_or(1.0);

    // Tier term from the most authoritative (lowest) conflicting tier:
    // tier 0 → 0.95, tier 1 → ~0.77, tier 2 → ~0.58 (tier 3 would give 0.4)
    let best_tier = conflicts.iter()
        .map(|c| c.source_class.tier())
        .min()
        .unwrap_or(3) as f32;
    let normalized = 0.4 + (3.0 - best_tier) / 3.0 * 0.55;
    let final_score = normalized.max(scaled_score);

    Some(ConflictResult {
        claim: claim.clone(),
        conflicts,
        conflict_score: final_score,
        verdict: if final_score >= threshold_block { Verdict::Block }
            else if final_score >= threshold_flag { Verdict::Flag }
            else { Verdict::Pass },
    })
}
```
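The scoring arithmetic can be pulled out into plain numbers and checked in isolation. This sketch assumes the Tier 0 authority weight of 1.0 from the comment above is the maximum weight. Under that assumption the confidence-scaled term is at most 0.5, while the tier term is at least about 0.58 for any Tier 0 to 2 conflict. In practice, then, the tier term decides the score, and under the default thresholds Tier 0 and Tier 1 conflicts land at BLOCK and Tier 2 at FLAG:

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Block,
    Flag,
    Pass,
}

/// The arithmetic from `check_conflict` with its inputs passed in directly.
/// `best_tier` is the lowest conflicting tier (0..=2); `max_authority_weight`
/// and `max_confidence` are maxima over the conflicting sources.
fn conflict_score(best_tier: u8, max_authority_weight: f32, max_confidence: f32) -> f32 {
    let code_weight = 0.5_f32; // SourceClass::Expert.authority_weight()
    let scaled = max_authority_weight * (1.0 - code_weight) * max_confidence;
    let normalized = 0.4 + (3.0 - best_tier as f32) / 3.0 * 0.55;
    normalized.max(scaled)
}

fn verdict(score: f32, block: f32, flag: f32) -> Verdict {
    if score >= block {
        Verdict::Block
    } else if score >= flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}
```

With full weight and confidence, tiers 0, 1 and 2 score about 0.95, 0.77 and 0.58 respectively.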
Verdict Levels
| Verdict | Condition (default thresholds) | Meaning |
|---|---|---|
| BLOCK | conflict_score >= 0.7 | Authoritative source strongly contradicts. Fix or explicitly acknowledge. |
| FLAG | 0.4 <= conflict_score < 0.7 | Potential disagreement. Review recommended. |
| PASS | conflict_score < 0.4 | No significant conflict (or no authoritative data). |
| ACK | Any score, acknowledged | Conflict exists but has been explicitly accepted. |
Acknowledged Conflicts
When a conflict has been acknowledged (via aphoria ack), the acknowledgment assertion exists in Episteme. The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG:
```
ACK  code://rust/citadeldb/auth/jwt/audience_validation
     Your code:    aud validation disabled (src/auth/jwt.rs:47)
     RFC 7519:     aud validation MUST be enabled (Tier 0)
     Acknowledged: 2026-01-15 by jordan
     Reason:       "Internal service, no external JWT consumers. SEC-2024-003."
```
The acknowledgment doesn't suppress the conflict. It adds context. A future --strict mode can treat acknowledged conflicts as blocks again (for audits).
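The following sketch shows how the reported verdict could be derived. It assumes an acknowledgment changes only the label and leaves the score untouched. It also assumes the future `--strict` mode, modelled here as a hypothetical `strict` flag, reports every acknowledged conflict as BLOCK:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
}

/// Reported verdict for one conflict. The score is never modified; an
/// acknowledgment only changes the label. `strict` models the future
/// `--strict` mode (hypothetical flag).
fn reported_verdict(score: f32, block: f32, flag: f32, acknowledged: bool, strict: bool) -> Verdict {
    let by_score = if score >= block {
        Verdict::Block
    } else if score >= flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    };
    match (acknowledged, strict) {
        (false, _) => by_score,
        (true, false) => Verdict::Ack,
        (true, true) => Verdict::Block,
    }
}
```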
Report Formats
SARIF (for CI)
SARIF (Static Analysis Results Interchange Format) is the standard for CI security tools. GitHub, GitLab, and Azure DevOps all consume it.
```json
{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/sarif-2.1/schema/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "aphoria",
        "version": "0.1.0",
        "informationUri": "https://github.com/orchard9/aphoria"
      }
    },
    "results": [{
      "ruleId": "epistemic-drift/tls-verify",
      "level": "error",
      "message": {
        "text": "TLS certificate verification disabled. OWASP requires verification (Tier 1, conflict score 0.87)."
      },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "src/net/client.rs" },
          "region": { "startLine": 23 }
        }
      }]
    }]
  }]
}
```
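Only `"level": "error"` appears in the example. SARIF 2.1.0 defines the result levels `none`, `note`, `warning` and `error`. The verdict mapping below, including ACK as `note`, is an assumption. The rule-ID helper simply mirrors `epistemic-drift/tls-verify` from the example:

```rust
/// Assumed mapping from Aphoria verdicts to SARIF result levels.
fn sarif_level(verdict: &str) -> &'static str {
    match verdict {
        "BLOCK" => "error",
        "FLAG" => "warning",
        "ACK" => "note",
        _ => "none",
    }
}

/// Rule ID in the style of the example: extractor name with `_` replaced by `-`.
fn sarif_rule_id(extractor: &str) -> String {
    format!("epistemic-drift/{}", extractor.replace('_', "-"))
}
```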
JSON (for programmatic consumption)
```json
{
  "project": "citadeldb",
  "scan_id": "abc123",
  "timestamp": 1706832000,
  "summary": {
    "files_scanned": 142,
    "claims_extracted": 23,
    "conflicts": 3,
    "blocks": 2,
    "flags": 1
  },
  "conflicts": [
    {
      "concept_path": "code://rust/citadeldb/auth/jwt/audience_validation",
      "predicate": "enabled",
      "code_value": false,
      "file": "src/auth/jwt.rs",
      "line": 47,
      "conflict_score": 0.92,
      "verdict": "BLOCK",
      "conflicting_sources": [
        {
          "path": "rfc://7519/jwt/audience_validation",
          "source_class": "Regulatory",
          "value": true,
          "confidence": 1.0
        }
      ],
      "acknowledged": null
    }
  ]
}
```
Baseline and Diff
Baseline
aphoria baseline records the current scan as the baseline. Subsequent scans only report new conflicts.
Implementation: store the baseline scan ID in .aphoria/baseline in the project root. The diff logic compares the current scan's conflict set against the baseline's.
```
.aphoria/
  baseline       # scan ID of the baseline
  config.toml    # symlink or copy of aphoria.toml
  agent.key      # Ed25519 keypair for this project's Aphoria agent
```
Diff
aphoria diff shows:
- New conflicts (in current scan but not baseline)
- Resolved conflicts (in baseline but not current scan)
- Changed conflicts (same concept, different score or verdict)
```
$ aphoria diff

NEW       code://rust/citadeldb/cache/redis/max_connections
          Your code: max_connections = 10000 (config/redis.yaml:5)
          Vendor:    recommended max 128 per instance (Tier 2)
          Conflict:  0.48 — FLAG

RESOLVED  code://rust/citadeldb/net/tls/cert_verification
          Previously: verify = false → BLOCK
          Current:    verify = true → PASS

1 new conflict, 1 resolved, 0 changed.
```
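Here is a sketch of the three diff categories. It assumes each scan's conflicts are keyed by ConceptPath and that "changed" means the score or verdict differs between the two scans. `Entry`, `Diff`, `diff` and `summary` are hypothetical names:

```rust
use std::collections::BTreeMap;

/// One conflict's state in a scan.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    score: f32,
    verdict: &'static str,
}

#[derive(Debug, Default)]
struct Diff {
    new: Vec<String>,
    resolved: Vec<String>,
    changed: Vec<String>,
}

/// Compare two conflict sets keyed by ConceptPath.
fn diff(baseline: &BTreeMap<String, Entry>, current: &BTreeMap<String, Entry>) -> Diff {
    let mut d = Diff::default();
    for (path, entry) in current {
        match baseline.get(path) {
            None => d.new.push(path.clone()),
            Some(old) if old != entry => d.changed.push(path.clone()),
            Some(_) => {}
        }
    }
    d.resolved = baseline
        .keys()
        .filter(|path| !current.contains_key(*path))
        .cloned()
        .collect();
    d
}

/// Summary line in the style of the example output.
fn summary(d: &Diff) -> String {
    let plural = if d.new.len() == 1 { "" } else { "s" };
    format!(
        "{} new conflict{}, {} resolved, {} changed.",
        d.new.len(),
        plural,
        d.resolved.len(),
        d.changed.len()
    )
}
```

`BTreeMap` keeps the report order deterministic by ConceptPath.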
Agent Keypair
Aphoria signs assertions with a per-project Ed25519 keypair stored in .aphoria/agent.key. Generated on first aphoria scan if it doesn't exist.
The keypair identifies "Aphoria scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables:
- Per-project audit trails ("which Aphoria agent found this?")
- TrustRank per Aphoria instance (a well-calibrated Aphoria gains reputation)
- Distinguishing human-authored assertions from Aphoria-extracted ones
Episteme Instance
Local Mode (Default)
Aphoria ships with an embedded Episteme instance. No server needed. The database lives at ~/.aphoria/db/ (configurable). Multiple projects share the same local instance — their assertions are namespaced by ConceptPath (code://rust/citadeldb/... vs code://go/other-project/...).
The authoritative corpus (RFCs, OWASP) is also in the local instance. aphoria init bootstraps it.
```
$ aphoria init
Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested.
Downloading OWASP cheat sheets ... 89 assertions ingested.
Ready. Run `aphoria scan <project>` to begin.
```
Remote Mode (Future)
```toml
[episteme]
url = "https://episteme.example.com"
api_key = "${APHORIA_API_KEY}"
```
In remote mode, Aphoria ingests into and queries from a shared Episteme instance. This enables:
- Cross-project conflict detection ("same JWT misconfiguration in 12 repos")
- Shared authoritative corpus (ingested once, used by all Aphoria agents)
- Centralized acknowledgment management
Exit Codes
| Code | Meaning |
|---|---|
| 0 | No conflicts above threshold |
| 1 | FLAG-level conflicts found (with --exit-code) |
| 2 | BLOCK-level conflicts found (with --exit-code) |
| 3 | Scan error (file access, Episteme connection, etc.) |
--exit-code enables non-zero exits. Without it, Aphoria always exits 0 (for interactive use where the report is the output, not the exit code).
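The exit-code table can be sketched as follows. The table gives no `--exit-code` qualifier for row 3, so this sketch reads scan errors as returning 3 with or without the flag. That is an assumption: the sentence above could also be read as "always 0". It also assumes acknowledged conflicts are not counted in `blocks` or `flags`. `ScanOutcome` and `exit_code` are hypothetical:

```rust
/// Counts from a finished scan (hypothetical shape).
struct ScanOutcome {
    blocks: usize,
    flags: usize,
}

/// Exit code per the table above.
fn exit_code(outcome: Result<ScanOutcome, String>, exit_code_flag: bool) -> i32 {
    match outcome {
        Err(_) => 3,                   // scan error
        Ok(_) if !exit_code_flag => 0, // interactive use: the report is the output
        Ok(o) if o.blocks > 0 => 2,
        Ok(o) if o.flags > 0 => 1,
        Ok(_) => 0,
    }
}
```

BLOCK takes precedence over FLAG, so a scan with both exits with 2.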
Performance Targets
| Metric | Target |
|---|---|
| Scan time, 1000-file Rust project | < 5 seconds |
| Scan time, 10000-file monorepo | < 30 seconds |
| Per-file extraction (all extractors) | < 5ms |
| Conflict query per claim | < 10ms |
| Local Episteme startup | < 100ms |
| Memory usage during scan | < 256MB |
The performance bottleneck is I/O (reading files), not extraction (regex matching). The conflict query is a local KV lookup, not a network call.
Dependencies
| Dependency | Purpose |
|---|---|
| clap | CLI argument parsing |
| ignore | File walking (respects .gitignore, fast) |
| regex | Pattern matching in extractors |
| serde + serde_json | Config parsing, JSON output |
| toml | aphoria.toml parsing |
| comfy-table | Terminal table output |
| stemedb-core | Types |
| stemedb-storage | Local KV store |
| stemedb-ingest | Assertion ingestion |
| stemedb-query | Conflict queries |
| ed25519-dalek | Agent keypair + signing |
| blake3 | Content hashing |
No LLM dependency. No network dependency (in local mode). No runtime other than tokio (for async KV store operations).