
# Aphoria Technical Spec

**Status:** Draft
**Date:** 2026-02-02

---

## Overview

Aphoria is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources.

```
aphoria init
aphoria scan <project-root> [--config aphoria.toml] [--format table|json|sarif|markdown] [--exit-code]
aphoria ack <concept-path> --reason "..."
aphoria baseline
aphoria diff
aphoria status
```

---

## Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                        aphoria CLI                          │
│                                                              │
│  ┌──────────┐   ┌────────────┐   ┌──────────┐   ┌────────┐ │
│  │ Walker   │──▶│ Extractors │──▶│ Ingester │──▶│ Report │ │
│  │          │   │            │   │          │   │        │ │
│  │ fs walk  │   │ tls_verify │   │ bridge   │   │ table  │ │
│  │ lang det │   │ jwt_config │   │ to       │   │ json   │ │
│  │ path map │   │ secrets    │   │ episteme │   │ sarif  │ │
│  │          │   │ timeouts   │   │          │   │ md     │ │
│  │          │   │ deps       │   │          │   │        │ │
│  │          │   │ cors       │   │          │   │        │ │
│  │          │   │ rate_limit │   │          │   │        │ │
│  └──────────┘   └────────────┘   └──────────┘   └────────┘ │
│                                       │              ▲      │
│                                       ▼              │      │
│                              ┌──────────────┐        │      │
│                              │   Episteme   │────────┘      │
│                              │   (local)    │  query +      │
│                              │              │  conflict     │
│                              └──────────────┘  scores       │
└──────────────────────────────────────────────────────────────┘
```

Aphoria depends on:
- `stemedb-core` (types: ConceptPath, Assertion, SourceClass)
- `stemedb-storage` (KVStore, IndexStore, AliasStore)
- `stemedb-ingest` (ingestion pipeline)
- `stemedb-query` (query engine, lenses)

It does **not** depend on `stemedb-api`. Aphoria talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed).

---

## Crate Structure

```
crates/
  aphoria/
    Cargo.toml
    src/
      main.rs              CLI entrypoint (clap)
      config.rs            aphoria.toml parsing
      walker/
        mod.rs             Project walker orchestration
        language.rs        Language detection
        path_mapper.rs     Directory → ConceptPath mapping
        normalizer.rs      Path normalization rules per language
      extractors/
        mod.rs             Extractor trait + registry
        tls_verify.rs      TLS certificate verification
        jwt_config.rs      JWT validation settings
        hardcoded_secrets.rs  Credentials in source
        timeout_config.rs  HTTP/DB/Redis timeouts
        dep_versions.rs    Vulnerable dependency versions
        cors_config.rs     CORS allow-origin
        rate_limit.rs      Rate limiting config
      bridge.rs            ExtractedClaim → Assertion conversion
      conflict.rs          Conflict query + scoring
      report/
        mod.rs             Report generation orchestration
        table.rs           Terminal table output
        json.rs            JSON output
        sarif.rs           SARIF for CI integration
        markdown.rs        Markdown output
      ack.rs               Acknowledge command
      baseline.rs          Baseline snapshot
      diff.rs              Delta since last scan
```

---

## Configuration

`aphoria.toml` at project root (optional, sensible defaults):

```toml
[project]
name = "citadeldb"
language = "rust"            # auto-detected if omitted

[episteme]
data_dir = "~/.aphoria/db"  # local Episteme instance
# url = "http://localhost:3000"  # future: remote instance

[thresholds]
block = 0.7                  # conflict score >= this → BLOCK
flag = 0.4                   # conflict score >= this → FLAG
# below flag threshold → PASS (not reported)

[extractors]
enabled = ["tls_verify", "jwt_config", "hardcoded_secrets", "timeout_config", "dep_versions", "cors_config", "rate_limit"]
# disabled = ["rate_limit"]  # alternative: disable specific ones

[extractors.timeout_config]
min_reasonable_ms = 1000     # flag timeouts below this
max_reasonable_ms = 300000   # flag timeouts above this

[extractors.dep_versions]
advisory_db = "~/.aphoria/advisory-db"  # rustsec/advisory-db clone

[scan]
exclude = ["target/", "node_modules/", ".git/", "vendor/"]
max_file_size = 1048576      # skip files > 1MB

[aliases]
auto_suggest = true          # suggest aliases when shared leaves detected
auto_accept_tier0 = true     # auto-accept alias suggestions to Tier 0 sources
```

---

## Walker

### Language Detection

Priority order:
1. Explicit `language` in `aphoria.toml`
2. Dominant language heuristic (count files by extension)
3. Per-file extension mapping

| Extension | Language |
|-----------|----------|
| `.rs` | rust |
| `.go` | go |
| `.py` | python |
| `.ts`, `.tsx` | typescript |
| `.js`, `.jsx` | javascript |
| `.yaml`, `.yml` | yaml |
| `.toml` | toml |
| `.json` | json |
| `.env`, `.env.*` | dotenv |
| `Dockerfile`, `docker-compose.*` | docker |
| `Cargo.toml` | cargo-manifest |
| `go.mod` | go-mod |
| `package.json` | npm-manifest |
| `requirements.txt`, `pyproject.toml` | python-manifest |
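
The table and the dominant-language heuristic can be sketched roughly as follows. This is an illustration only: `classify_file` and `dominant_language` are hypothetical names, checking exact file names before extensions is an assumption (the spec does not say how `Cargo.toml` is disambiguated from a generic `.toml`), and counting only source languages for the heuristic is likewise assumed.

```rust
use std::collections::HashMap;

/// Classify one file by name, following the extension table.
/// Assumption: exact file names (manifests, Dockerfile) win over extensions.
fn classify_file(file_name: &str) -> Option<&'static str> {
    match file_name {
        "Cargo.toml" => return Some("cargo-manifest"),
        "go.mod" => return Some("go-mod"),
        "package.json" => return Some("npm-manifest"),
        "requirements.txt" | "pyproject.toml" => return Some("python-manifest"),
        "Dockerfile" => return Some("docker"),
        _ => {}
    }
    if file_name.starts_with("docker-compose.") {
        return Some("docker");
    }
    if file_name == ".env" || file_name.starts_with(".env.") {
        return Some("dotenv");
    }
    // Fall back to the extension after the last dot.
    let ext = file_name.rsplit_once('.')?.1;
    match ext {
        "rs" => Some("rust"),
        "go" => Some("go"),
        "py" => Some("python"),
        "ts" | "tsx" => Some("typescript"),
        "js" | "jsx" => Some("javascript"),
        "yaml" | "yml" => Some("yaml"),
        "toml" => Some("toml"),
        "json" => Some("json"),
        _ => None,
    }
}

/// Dominant-language heuristic: the source language with the most files.
/// Ties are broken by language name so the result is deterministic.
fn dominant_language<'a>(file_names: impl IntoIterator<Item = &'a str>) -> Option<&'static str> {
    const SOURCE_LANGS: [&str; 5] = ["rust", "go", "python", "typescript", "javascript"];
    let mut counts: HashMap<&'static str, usize> = HashMap::new();
    for name in file_names {
        if let Some(lang) = classify_file(name) {
            if SOURCE_LANGS.contains(&lang) {
                *counts.entry(lang).or_insert(0) += 1;
            }
        }
    }
    counts
        .into_iter()
        .max_by_key(|&(lang, n)| (n, lang))
        .map(|(lang, _)| lang)
}
```

An explicit `language` in `aphoria.toml` would bypass `dominant_language` entirely.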

### Path Mapping

Directory structure maps to ConceptPath segments. Language-specific stripping rules remove boilerplate directories:

**Rust:**
```
Strip: src/, crates/
Keep: everything else

crates/citadeldb/src/auth/jwt.rs
  → ["rust", "citadeldb", "auth", "jwt"]
  → code://rust/citadeldb/auth/jwt/{leaf from extractor}
```
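
A minimal sketch of the Rust rule above, assuming paths are relative to the project root and use `/` separators. The function names are hypothetical, and special files such as `mod.rs` or `lib.rs` are not handled here.

```rust
/// Map a Rust source path to ConceptPath segments by dropping the
/// boilerplate directories (`src`, `crates`) and the `.rs` extension.
fn rust_path_segments(rel_path: &str) -> Vec<String> {
    const STRIP: [&str; 2] = ["src", "crates"];
    let mut segments = vec!["rust".to_string()];
    for part in rel_path.split('/') {
        if part.is_empty() || STRIP.contains(&part) {
            continue;
        }
        let stem = part.strip_suffix(".rs").unwrap_or(part);
        segments.push(stem.to_string());
    }
    segments
}

/// Render the segments plus an extractor-supplied leaf as a `code://` path.
fn code_uri(segments: &[String], leaf: &str) -> String {
    format!("code://{}/{}", segments.join("/"), leaf)
}
```

With the example path, `rust_path_segments("crates/citadeldb/src/auth/jwt.rs")` yields `["rust", "citadeldb", "auth", "jwt"]`, and adding the leaf `audience_validation` gives `code://rust/citadeldb/auth/jwt/audience_validation`.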

**Go:**
```
Strip: cmd/, internal/, pkg/
Keep: everything else

internal/auth/jwt/validator.go
  → ["go", "{module_name}", "auth", "jwt"]
  → code://go/{module_name}/auth/jwt/{leaf}
```

**Python:**
```
Strip: src/, lib/
Keep: everything else

src/auth/jwt_handler.py
  → ["python", "{package_name}", "auth", "jwt_handler"]
  → code://python/{package_name}/auth/jwt_handler/{leaf}
```

**Config files:**
```
config/production.yaml
  → code://config/{project_name}/production/{leaf}

.env.production
  → code://config/{project_name}/env_production/{leaf}

docker-compose.yml
  → code://docker/{project_name}/{leaf}
```

The project name comes from:
1. `aphoria.toml` `project.name`
2. `Cargo.toml` `[package] name`
3. `go.mod` module name (last segment)
4. `package.json` `name`
5. Directory name of project root
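
The fallback chain can be expressed as a sequence of `Option` fallbacks. The sketch below assumes the candidate names have already been read from their files; `resolve_project_name` and its parameters are hypothetical.

```rust
/// Pick the project name using the priority order listed above.
/// Each argument is the value read from the corresponding file, if present.
fn resolve_project_name(
    aphoria_toml_name: Option<&str>,
    cargo_package_name: Option<&str>,
    go_module: Option<&str>,
    npm_name: Option<&str>,
    root_dir_name: &str,
) -> String {
    aphoria_toml_name
        .or(cargo_package_name)
        // For go.mod, only the last path segment of the module name is used.
        .or_else(|| go_module.and_then(|m| m.rsplit('/').next()))
        .or(npm_name)
        .unwrap_or(root_dir_name)
        .to_string()
}
```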

### File Filtering

Skip:
- Directories in `scan.exclude` list
- Files larger than `scan.max_file_size`
- Binary files (detected by null byte in first 8KB)
- Generated files (`*.generated.*`, `*.pb.go`, `*_generated.rs`)
- Test files (configurable: include or exclude)
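
A rough sketch of these checks, excluding the configurable test-file rule. The helper names are hypothetical, and treating an exclude entry such as `target/` as matching at any directory depth is an assumption.

```rust
/// Only the first 8KB is inspected for a null byte.
const BINARY_SNIFF_BYTES: usize = 8 * 1024;

fn looks_binary(content: &[u8]) -> bool {
    content.iter().take(BINARY_SNIFF_BYTES).any(|&b| b == 0)
}

/// Matches `*.generated.*`, `*.pb.go`, and `*_generated.rs`.
fn is_generated(file_name: &str) -> bool {
    file_name.contains(".generated.")
        || file_name.ends_with(".pb.go")
        || file_name.ends_with("_generated.rs")
}

/// Assumption: an exclude entry like "target/" matches at any depth.
fn is_excluded(rel_path: &str, exclude: &[&str]) -> bool {
    exclude
        .iter()
        .any(|dir| rel_path.starts_with(dir) || rel_path.contains(&format!("/{}", dir)))
}

fn should_skip(rel_path: &str, content: &[u8], exclude: &[&str], max_file_size: usize) -> bool {
    let file_name = rel_path.rsplit('/').next().unwrap_or(rel_path);
    is_excluded(rel_path, exclude)
        || content.len() > max_file_size
        || looks_binary(content)
        || is_generated(file_name)
}
```

A real walker would check the size and exclude rules before reading the file, so large or excluded files are never loaded.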

---

## Extractors

### Trait Definition

```rust
/// A claim extractor that finds implicit decisions in source code.
pub trait Extractor: Send + Sync {
    /// Unique identifier for this extractor.
    fn name(&self) -> &str;

    /// File types this extractor operates on.
    fn languages(&self) -> &[Language];

    /// Extract claims from a file's content.
    ///
    /// - `path_segments`: The ConceptPath segments derived from the file's location.
    /// - `content`: The file content as a string.
    /// - `language`: The detected language of the file.
    ///
    /// Returns zero or more extracted claims.
    fn extract(
        &self,
        path_segments: &[String],
        content: &str,
        language: Language,
    ) -> Vec<ExtractedClaim>;
}
```

### ExtractedClaim

```rust
/// A claim extracted from source code by an Extractor.
pub struct ExtractedClaim {
    /// The full ConceptPath for this claim.
    /// Scheme is always "code" for code-extracted claims.
    pub concept_path: ConceptPath,

    /// The predicate describing what aspect of the concept this claims.
    /// Examples: "enabled", "config_value", "version", "allow_origin"
    pub predicate: String,

    /// The extracted value.
    pub value: ObjectValue,

    /// Source file path relative to project root.
    pub file: String,

    /// Line number in the source file (1-indexed).
    pub line: usize,

    /// The matched source text (the actual code/config that was matched).
    pub matched_text: String,

    /// Confidence of extraction.
    /// 1.0 for exact regex matches.
    /// Lower for heuristic matches.
    pub confidence: f32,

    /// Human-readable description of what was found.
    /// Example: "JWT audience validation is disabled"
    pub description: String,
}
```

### Extractor: tls_verify

**What it finds:** TLS/SSL certificate verification disabled.

**Patterns:**

Rust (reqwest):
```
Pattern: danger_accept_invalid_certs\s*\(\s*true\s*\)
Leaf: cert_verification
Predicate: enabled
Value: Boolean(false)
```

Rust (native-tls):
```
Pattern: accept_invalid_certs\s*\(\s*true\s*\)
```

Go (net/http):
```
Pattern: InsecureSkipVerify\s*:\s*true
```

Python (requests):
```
Pattern: verify\s*=\s*False
```

Node.js:
```
Pattern: rejectUnauthorized\s*:\s*false
Pattern: NODE_TLS_REJECT_UNAUTHORIZED.*['"]0['"]
```

YAML/TOML/JSON config:
```
Pattern: (tls_verify|ssl_verify|verify_ssl|verify_tls)\s*[:=]\s*(false|no|0|off)
```
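
To illustrate how one of these patterns turns into a claim, here is a std-only sketch for the Go pattern. It hand-rolls the match for `InsecureSkipVerify\s*:\s*true` rather than using the `regex` crate, and it only considers the first occurrence on each line. `Finding` is a simplified, hypothetical stand-in for `ExtractedClaim`.

```rust
/// Simplified stand-in for ExtractedClaim (only the fields used here).
#[derive(Debug, PartialEq)]
struct Finding {
    leaf: &'static str,
    predicate: &'static str,
    enabled: bool,
    line: usize, // 1-indexed
    matched_text: String,
}

/// Hand-rolled equivalent of `InsecureSkipVerify\s*:\s*true`.
/// Returns the matched slice of the line, if any.
fn match_insecure_skip_verify(line: &str) -> Option<&str> {
    const KEY: &str = "InsecureSkipVerify";
    let start = line.find(KEY)?;
    let after_key = line[start + KEY.len()..].trim_start();
    let after_colon = after_key.strip_prefix(':')?.trim_start();
    if !after_colon.starts_with("true") {
        return None;
    }
    // `after_colon` is a suffix of `line`, so its offset is recoverable.
    let end = line.len() - after_colon.len() + "true".len();
    Some(&line[start..end])
}

/// Scan Go source and emit one finding per matching line.
fn scan_go_tls(content: &str) -> Vec<Finding> {
    content
        .lines()
        .enumerate()
        .filter_map(|(i, line)| {
            match_insecure_skip_verify(line).map(|m| Finding {
                leaf: "cert_verification",
                predicate: "enabled",
                enabled: false, // verification is disabled
                line: i + 1,
                matched_text: m.to_string(),
            })
        })
        .collect()
}
```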

### Extractor: jwt_config

**What it finds:** JWT validation settings.

**Claims extracted per finding:**

| Leaf | Predicate | What it means |
|------|-----------|---------------|
| `audience_validation` | `enabled` | Whether `aud` claim is validated |
| `expiry_validation` | `enabled` | Whether `exp` claim is validated |
| `algorithm_restriction` | `config_value` | Allowed algorithms (or "none" if unrestricted) |
| `signature_verification` | `enabled` | Whether signatures are verified |

**Patterns (Rust, jsonwebtoken crate):**

```
// aud validation
Pattern: set_audience.*\[\]|validate_aud.*false|aud.*None
Leaf: audience_validation
Value: Boolean(false)

// Dangerous: algorithm none
Pattern: Algorithm::None|alg.*none|allow_none.*true
Leaf: algorithm_restriction
Value: Text("none_allowed")

// Signature skip
Pattern: dangerous_insecure|skip_signature|verify.*false
Leaf: signature_verification
Value: Boolean(false)
```

**Patterns (Go, golang-jwt):**

```
Pattern: jwt\.Parse\(.*func\(.*\*jwt\.Token\).*\{[^}]*return.*signingKey
         (without any algorithm check in the callback)
```

This is a heuristic match (confidence < 1.0) — detecting missing validation is harder than detecting explicit disabling.

### Extractor: hardcoded_secrets

**What it finds:** Credentials, API keys, tokens in source (not in .env or .gitignore'd files).

**Patterns:**

```
// API keys
Pattern: (api[_-]?key|apikey)\s*[:=]\s*["'][A-Za-z0-9_\-]{20,}["']
Leaf: api_key_storage
Predicate: storage_method
Value: Text("hardcoded")

// Passwords
Pattern: (password|passwd|pwd)\s*[:=]\s*["'][^"']+["']
  (excluding: "password", "changeme", "placeholder", "CHANGE_ME", "xxx", test patterns)
Leaf: password_storage
Predicate: storage_method
Value: Text("hardcoded")

// AWS keys
Pattern: (AKIA[0-9A-Z]{16})
Leaf: aws_credentials
Predicate: storage_method
Value: Text("hardcoded")

// Private keys
Pattern: -----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----
Leaf: private_key_storage
Predicate: storage_method
Value: Text("hardcoded_in_source")
```

**Reduced confidence:** Files matching `*test*`, `*example*`, `*fixture*`, `*mock*` are still scanned, but their findings are reported with lower confidence (0.5).
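
As an illustration, the AWS key pattern and the confidence rule for test-like files could look like the sketch below. It is std-only, so `AKIA[0-9A-Z]{16}` is matched by hand instead of with the `regex` crate. The function names are hypothetical, and matching the file markers case-insensitively against the whole relative path is an assumption.

```rust
/// Hand-rolled equivalent of `AKIA[0-9A-Z]{16}`: returns the first match.
fn find_aws_access_key_id(text: &str) -> Option<&str> {
    let mut from = 0;
    while let Some(pos) = text[from..].find("AKIA") {
        let start = from + pos;
        let end = start + 20; // "AKIA" plus 16 characters
        let tail_ok = text
            .as_bytes()
            .get(start + 4..end)
            .map_or(false, |tail| {
                tail.iter().all(|b| b.is_ascii_digit() || b.is_ascii_uppercase())
            });
        if tail_ok {
            return Some(&text[start..end]);
        }
        from = start + 1;
    }
    None
}

/// Findings in test, example, fixture, or mock files get confidence 0.5.
/// Assumption: markers are matched case-insensitively on the relative path.
fn secret_confidence(rel_path: &str) -> f32 {
    let lower = rel_path.to_ascii_lowercase();
    let test_like = ["test", "example", "fixture", "mock"]
        .iter()
        .any(|marker| lower.contains(marker));
    if test_like { 0.5 } else { 1.0 }
}
```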

### Extractor: timeout_config

**What it finds:** HTTP client, database, and cache timeout values.

**Patterns:**

```
// Zero/infinite timeout
Pattern: timeout\s*[:=]\s*(0|None|null|nil|infinity|Inf)
Leaf: {context}/timeout  (context from surrounding code: http, db, redis, etc.)
Predicate: config_value
Value: Number(0.0)
Description: "Timeout disabled (infinite wait)"

// Unreasonably low timeout
Pattern: timeout\s*[:=]\s*(\d+)
  where value_ms < config.min_reasonable_ms
Leaf: {context}/timeout
Value: Number(extracted_value)
Description: "Timeout {value}ms below minimum reasonable {min}ms"

// Unreasonably high timeout
Pattern: timeout\s*[:=]\s*(\d+)
  where value_ms > config.max_reasonable_ms
```

**Unit detection:** Heuristic based on magnitude and surrounding context:
- Value > 1000000 → likely nanoseconds
- Value > 1000 and < 1000000 → likely milliseconds
- Value < 100 → likely seconds
- Presence of "ms", "sec", "Duration::from_secs" → explicit unit
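
The heuristic can be sketched as below. Explicit markers are checked first, then magnitude. The list above leaves some values unclassified (for example 100 to 1000); this sketch returns `Unknown` for those, which is an assumption, as are the names `TimeUnit` and `guess_unit`. Substring checks for "ms" and "sec" are crude and would need word-boundary handling in practice.

```rust
#[derive(Debug, PartialEq)]
enum TimeUnit {
    Seconds,
    Milliseconds,
    Nanoseconds,
    Unknown,
}

/// Guess the unit of a timeout value from its surrounding text and magnitude.
fn guess_unit(value: u64, context: &str) -> TimeUnit {
    // Explicit unit markers take precedence over magnitude.
    if context.contains("Duration::from_secs") || context.contains("sec") {
        return TimeUnit::Seconds;
    }
    if context.contains("ms") {
        return TimeUnit::Milliseconds;
    }
    // Magnitude-based fallback, following the thresholds listed above.
    if value > 1_000_000 {
        TimeUnit::Nanoseconds
    } else if value > 1000 && value < 1_000_000 {
        TimeUnit::Milliseconds
    } else if value < 100 {
        TimeUnit::Seconds
    } else {
        TimeUnit::Unknown // not covered by the listed rules
    }
}
```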

### Extractor: dep_versions

**What it finds:** Dependencies with known vulnerabilities.

**Sources:**
- `Cargo.toml` → check against RustSec Advisory DB
- `go.mod` → check against Go Vulnerability Database
- `package.json` → check against npm audit advisories
- `requirements.txt` / `pyproject.toml` → check against PyPI advisory data

**Output:**

```
Leaf: dep/{package_name}/version
Predicate: installed_version
Value: Text("1.0.2")
Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX"
```

The advisory databases are downloaded locally and refreshed periodically. Aphoria doesn't call external APIs during scan.

### Extractor: cors_config

**What it finds:** Overly permissive CORS configuration.

**Patterns:**

```
// Allow all origins
Pattern: allow_origin\s*\(\s*["']\*["']\s*\)|Access-Control-Allow-Origin.*\*|AllowAllOrigins.*true
Leaf: cors/allow_origin
Predicate: config_value
Value: Text("*")
Description: "CORS allows all origins"

// Allow credentials with wildcard
Pattern: (allow_credentials|AllowCredentials).*true
  (in proximity to allow_origin *)
Leaf: cors/credentials_with_wildcard
Predicate: enabled
Value: Boolean(true)
Description: "CORS allows credentials with wildcard origin"
```

### Extractor: rate_limit

**What it finds:** Rate limiting disabled or set unreasonably high.

**Patterns:**

```
// Rate limiting disabled
Pattern: (rate_limit|ratelimit).*disabled|rate_limit\s*[:=]\s*(0|false|off|none)
Leaf: rate_limit/enabled
Predicate: enabled
Value: Boolean(false)

// Unreasonably high limit
Pattern: (rate_limit|ratelimit|max_requests)\s*[:=]\s*(\d+)
  where value > 10000 per minute (configurable)
Leaf: rate_limit/max_requests
Predicate: config_value
Value: Number(extracted_value)
```

---

## Ingestion Bridge

### Claim → Assertion Mapping

```rust
fn to_assertion(
    claim: &ExtractedClaim,
    agent_keypair: &Ed25519Keypair,
    scan_timestamp: u64,
) -> Assertion {
    let source_metadata = serde_json::to_vec(&json!({
        "file": claim.file,
        "line": claim.line,
        "matched_text": claim.matched_text,
        "extractor": claim.concept_path.leaf(), // leaf segment; ExtractedClaim carries no extractor name
        "scan_tool": "aphoria",
        "scan_version": env!("CARGO_PKG_VERSION"),
    }));

    let source_hash = blake3::hash(
        format!("{}:{}:{}", claim.file, claim.line, claim.matched_text).as_bytes()
    );

    Assertion {
        subject: claim.concept_path.to_string(),  // EntityId = String
        predicate: claim.predicate.clone(),
        object: claim.value.clone(),
        parent_hash: None,
        source_hash: *source_hash.as_bytes(),
        source_class: SourceClass::Expert,  // code:// scheme default
        visual_hash: None,
        epoch: None,
        source_metadata: source_metadata.ok(),
        lifecycle: LifecycleStage::Approved,
        signatures: vec![sign(agent_keypair, &claim)],
        confidence: claim.confidence,
        timestamp: scan_timestamp,
        vector: None,
    }
}
```

### Idempotency

Same code produces the same claims. Same claims produce the same assertion hashes (content-addressed). Re-scanning a project that hasn't changed ingests nothing new. This is guaranteed by BLAKE3 content addressing in the existing Episteme pipeline.

When code changes between scans, new assertions are created. Old assertions remain (append-only). The `diff` command compares the current scan's assertions against the last scan's to show what changed.
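
The idempotency property can be illustrated with a toy content-addressed store. This sketch is not the Episteme pipeline: it uses the standard library's `DefaultHasher` as a stand-in for BLAKE3 purely to stay std-only, and it keys on the same `file:line:matched_text` string used for `source_hash` above. `content_key` and `ingest` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in content hash (the spec uses BLAKE3, not DefaultHasher).
fn content_key(file: &str, line: usize, matched_text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    format!("{}:{}:{}", file, line, matched_text).hash(&mut hasher);
    hasher.finish()
}

/// Append-only ingest: existing keys are kept, and only unseen keys are added.
/// Returns the number of newly stored claims.
fn ingest(store: &mut HashSet<u64>, claims: &[(&str, usize, &str)]) -> usize {
    claims
        .iter()
        .filter(|(file, line, text)| store.insert(content_key(file, *line, text)))
        .count()
}
```

Ingesting the same claim list twice adds nothing on the second pass. A claim whose line number or matched text changed produces a new key, while the old one stays in the store.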

### Scan Metadata

Each scan is recorded as an assertion about itself:

```
Subject: aphoria://scan/{project_name}/{scan_id}
Predicate: completed
Object: Text(json!({
    "project": "citadeldb",
    "files_scanned": 142,
    "claims_extracted": 23,
    "conflicts_found": 3,
    "blocks": 2,
    "flags": 1,
    "timestamp": 1706832000
}))
```

This enables `aphoria diff` — compare two scan records and their associated assertions.

---

## Conflict Detection

### Query Strategy

After ingestion, for each extracted claim:

```rust
async fn check_conflict(
    claim: &ExtractedClaim,
    query_engine: &QueryEngine,
) -> Option<ConflictResult> {
    // 1. Query with Skeptic lens, resolving aliases
    let results = query_engine.query(Query {
        subject: Some(claim.concept_path.to_string()),
        predicate: Some(claim.predicate.clone()),
        lens: Some("skeptic".to_string()),
        resolve_aliases: true,
        source_class_decay: true,
        ..Default::default()
    }).await;

    // 2. Check if any authoritative source disagrees
    let code_value = &claim.value;
    let mut conflicts = Vec::new();

    for assertion in &results.assertions {
        if assertion.source_class.tier() < 3  // Tier 0, 1, or 2
            && assertion.object != *code_value  // Different value
        {
            conflicts.push(ConflictingSource {
                path: assertion.subject.clone(),
                source_class: assertion.source_class,
                value: assertion.object.clone(),
                confidence: assertion.confidence,
            });
        }
    }

    if conflicts.is_empty() {
        return None;
    }

    // 3. Compute conflict score
    //    Higher when tier spread is larger and authoritative sources are confident
    let max_tier_weight = conflicts.iter()
        .map(|c| c.source_class.authority_weight())
        .max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
        .unwrap_or(0.0);

    let code_weight = SourceClass::Expert.authority_weight(); // 0.5

    let conflict_score = max_tier_weight * (1.0 - code_weight);
    // Tier 0 vs Tier 3: 1.0 * 0.5 = 0.50 before confidence weighting
    // Weighted by the highest confidence among the conflicting sources (<= 1.0)

    let weighted_score = conflict_score
        * conflicts.iter().map(|c| c.confidence).max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)).unwrap_or(1.0);

    // Normalize: tier spread 0→3 maps to 0.4→0.95
    // Spread is the distance from the code tier (3) to the most
    // authoritative conflicting tier.
    let min_tier = conflicts.iter()
        .map(|c| c.source_class.tier())
        .min()
        .unwrap_or(3) as f32;
    let tier_spread = 3.0 - min_tier;
    let normalized = 0.4 + tier_spread / 3.0 * 0.55;
    let final_score = normalized.max(weighted_score);

    Some(ConflictResult {
        claim: claim.clone(),
        conflicts,
        conflict_score: final_score,
        verdict: if final_score >= threshold_block { Verdict::Block }
                 else if final_score >= threshold_flag { Verdict::Flag }
                 else { Verdict::Pass },
    })
}
```
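
The scoring arithmetic in step 3 can be isolated into a small pure function for a worked example. This is a sketch: `conflict_score` and its parameters are hypothetical, and the authority weights are passed in because only the Tier 0 (1.0) and Expert (0.5) weights are stated here.

```rust
/// Standalone version of the step 3 arithmetic.
/// - `min_tier`: most authoritative tier among the conflicting sources (0..=3)
/// - `max_authority_weight`: highest authority weight among them
/// - `code_weight`: authority weight of the code claim (0.5 for Expert)
/// - `max_confidence`: highest confidence among the conflicting sources
fn conflict_score(
    min_tier: u8,
    max_authority_weight: f32,
    code_weight: f32,
    max_confidence: f32,
) -> f32 {
    let confidence_weighted = max_authority_weight * (1.0 - code_weight) * max_confidence;
    // Tier spread 0..3 maps linearly onto 0.4..0.95.
    let tier_spread = 3.0 - f32::from(min_tier);
    let tier_floor = 0.4 + tier_spread / 3.0 * 0.55;
    tier_floor.max(confidence_weighted)
}
```

For a Tier 0 source with weight 1.0 and confidence 1.0, the confidence-weighted term is 1.0 × 0.5 × 1.0 = 0.50 and the tier floor is 0.4 + (3/3) × 0.55 = 0.95, so the result is 0.95. For a Tier 2 source the floor is 0.4 + (1/3) × 0.55 ≈ 0.583. As long as authority weights and confidences stay at or below 1.0, the confidence-weighted term cannot exceed 0.5, so under this formula the tier floor decides the score for Tiers 0 to 2.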

### Verdict Levels

| Verdict | Condition | Meaning |
|---------|-----------|---------|
| BLOCK | `conflict_score >= 0.7` | Authoritative source strongly contradicts. Fix or explicitly acknowledge. |
| FLAG | `conflict_score >= 0.4` | Potential disagreement. Review recommended. |
| PASS | `conflict_score < 0.4` | No significant conflict (or no authoritative data). |
| ACK | Any score, acknowledged | Conflict exists but has been explicitly accepted. |
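
A minimal sketch of the table as a function, with thresholds passed in so they can come from `[thresholds]` in `aphoria.toml`. The name `classify` is hypothetical. Following the table, an acknowledgment takes precedence at any score.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
}

/// Map a conflict score to a verdict using the configured thresholds.
fn classify(score: f32, block: f32, flag: f32, acknowledged: bool) -> Verdict {
    if acknowledged {
        Verdict::Ack // any score, as long as it has been acknowledged
    } else if score >= block {
        Verdict::Block
    } else if score >= flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}
```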

### Acknowledged Conflicts

When a conflict has been acknowledged (via `aphoria ack`), the acknowledgment assertion exists in Episteme. The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG:

```
  ACK    code://rust/citadeldb/auth/jwt/audience_validation
         Your code:  aud validation disabled        (src/auth/jwt.rs:47)
         RFC 7519:   aud validation MUST be enabled  (Tier 0)
         Acknowledged: 2026-01-15 by jordan
         Reason: "Internal service, no external JWT consumers. SEC-2024-003."
```

The acknowledgment doesn't suppress the conflict. It adds context. A future `--strict` mode can treat acknowledged conflicts as blocks again (for audits).

---

## Report Formats

### SARIF (for CI)

SARIF (Static Analysis Results Interchange Format) is the standard for CI security tools. GitHub, GitLab, and Azure DevOps all consume it.

```json
{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/sarif-2.1/schema/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "aphoria",
        "version": "0.1.0",
        "informationUri": "https://github.com/orchard9/aphoria"
      }
    },
    "results": [{
      "ruleId": "epistemic-drift/tls-verify",
      "level": "error",
      "message": {
        "text": "TLS certificate verification disabled. OWASP requires verification (Tier 1, conflict score 0.87)."
      },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "src/net/client.rs" },
          "region": { "startLine": 23 }
        }
      }]
    }]
  }]
}
```

### JSON (for programmatic consumption)

```json
{
  "project": "citadeldb",
  "scan_id": "abc123",
  "timestamp": 1706832000,
  "summary": {
    "files_scanned": 142,
    "claims_extracted": 23,
    "conflicts": 3,
    "blocks": 2,
    "flags": 1
  },
  "conflicts": [
    {
      "concept_path": "code://rust/citadeldb/auth/jwt/audience_validation",
      "predicate": "enabled",
      "code_value": false,
      "file": "src/auth/jwt.rs",
      "line": 47,
      "conflict_score": 0.92,
      "verdict": "BLOCK",
      "conflicting_sources": [
        {
          "path": "rfc://7519/jwt/audience_validation",
          "source_class": "Regulatory",
          "value": true,
          "confidence": 1.0
        }
      ],
      "acknowledged": null
    }
  ]
}
```

---

## Baseline and Diff

### Baseline

`aphoria baseline` records the current scan as the baseline. Subsequent scans only report *new* conflicts.

Implementation: store the baseline scan ID in `.aphoria/baseline` in the project root. The `diff` logic compares the current scan's conflict set against the baseline's.

```
.aphoria/
  baseline        # scan ID of the baseline
  config.toml     # symlink or copy of aphoria.toml
  agent.key       # Ed25519 keypair for this project's Aphoria agent
```

### Diff

`aphoria diff` shows:
- New conflicts (in current scan but not baseline)
- Resolved conflicts (in baseline but not current scan)
- Changed conflicts (same concept, different score or verdict)

```
$ aphoria diff

  NEW    code://rust/citadeldb/cache/redis/max_connections
         Your code:  max_connections = 10000         (config/redis.yaml:5)
         Vendor:     recommended max 128 per instance (Tier 2)
         Conflict:   0.48 — FLAG

  RESOLVED  code://rust/citadeldb/net/tls/cert_verification
         Previously: verify = false → BLOCK
         Current:    verify = true  → PASS

1 new conflict, 1 resolved, 0 changed.
```
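
The three categories amount to a comparison of two keyed conflict sets. The sketch below assumes each scan's conflicts are available as a map from concept path to `(score, verdict)`; the type alias and function name are hypothetical.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
enum Change {
    New,
    Resolved,
    Changed,
}

/// Concept path -> (conflict score, verdict label).
type ConflictSet<'a> = BTreeMap<&'a str, (f32, &'a str)>;

/// Compare the current scan's conflicts against the baseline's.
fn diff_conflicts<'a>(
    baseline: &ConflictSet<'a>,
    current: &ConflictSet<'a>,
) -> Vec<(String, Change)> {
    let mut changes = Vec::new();
    for (path, cur) in current {
        match baseline.get(path) {
            None => changes.push((path.to_string(), Change::New)),
            // Same concept, different score or verdict.
            Some(old) if old != cur => changes.push((path.to_string(), Change::Changed)),
            Some(_) => {}
        }
    }
    for path in baseline.keys() {
        if !current.contains_key(path) {
            changes.push((path.to_string(), Change::Resolved));
        }
    }
    changes
}
```

Using a `BTreeMap` keeps the output order stable across runs, which matters when the diff is printed in CI logs.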

---

## Agent Keypair

Aphoria signs assertions with a per-project Ed25519 keypair stored in `.aphoria/agent.key`. The keypair is generated on the first `aphoria scan` if it does not already exist.

The keypair identifies "Aphoria scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables:
- Per-project audit trails ("which Aphoria agent found this?")
- TrustRank per Aphoria instance (a well-calibrated Aphoria gains reputation)
- Distinguishing human-authored assertions from Aphoria-extracted ones

---

## Episteme Instance

### Local Mode (Default)

Aphoria ships with an embedded Episteme instance. No server needed. The database lives at `~/.aphoria/db/` (configurable). Multiple projects share the same local instance — their assertions are namespaced by ConceptPath (`code://rust/citadeldb/...` vs `code://go/other-project/...`).

The authoritative corpus (RFCs, OWASP) is also in the local instance. `aphoria init` bootstraps it.

```
$ aphoria init
Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested.
Downloading OWASP cheat sheets ... 89 assertions ingested.
Ready. Run `aphoria scan <project>` to begin.
```

### Remote Mode (Future)

```toml
[episteme]
url = "https://episteme.example.com"
api_key = "${APHORIA_API_KEY}"
```

In remote mode, Aphoria ingests into and queries from a shared Episteme instance. This enables:
- Cross-project conflict detection ("same JWT misconfiguration in 12 repos")
- Shared authoritative corpus (ingested once, used by all Aphoria agents)
- Centralized acknowledgment management

---

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | No conflicts above threshold |
| 1 | FLAG-level conflicts found (with `--exit-code`) |
| 2 | BLOCK-level conflicts found (with `--exit-code`) |
| 3 | Scan error (file access, Episteme connection, etc.) |

`--exit-code` enables non-zero exits. Without it, Aphoria always exits 0 (for interactive use where the report is the output, not the exit code).
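
A sketch of the exit-code selection follows. The function name and parameters are hypothetical. It assumes a scan error returns 3 regardless of `--exit-code`, since the table does not qualify that row; if scan errors should also be gated by the flag, the first check would move below the second.

```rust
/// Choose the process exit code from the scan outcome.
fn exit_code(blocks: usize, flags: usize, scan_error: bool, exit_code_flag: bool) -> i32 {
    if scan_error {
        return 3; // assumption: not gated by --exit-code
    }
    if !exit_code_flag {
        return 0; // interactive use: the report is the output
    }
    if blocks > 0 {
        2
    } else if flags > 0 {
        1
    } else {
        0
    }
}
```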

---

## Performance Targets

| Metric | Target |
|--------|--------|
| Scan time, 1000-file Rust project | < 5 seconds |
| Scan time, 10000-file monorepo | < 30 seconds |
| Per-file extraction (all extractors) | < 5ms |
| Conflict query per claim | < 10ms |
| Local Episteme startup | < 100ms |
| Memory usage during scan | < 256MB |

The performance bottleneck is I/O (reading files), not extraction (regex matching). The conflict query is a local KV lookup, not a network call.

---

## Dependencies

| Dependency | Purpose |
|------------|---------|
| `clap` | CLI argument parsing |
| `ignore` | File walking (respects .gitignore, fast) |
| `regex` | Pattern matching in extractors |
| `serde` + `serde_json` | Config parsing, JSON output |
| `toml` | aphoria.toml parsing |
| `comfy-table` | Terminal table output |
| `stemedb-core` | Types |
| `stemedb-storage` | Local KV store |
| `stemedb-ingest` | Assertion ingestion |
| `stemedb-query` | Conflict queries |
| `ed25519-dalek` | Agent keypair + signing |
| `blake3` | Content hashing |

No LLM dependency. No network dependency (in local mode). No runtime other than tokio (for async KV store operations).