# Aphoria Technical Spec

**Status:** Draft

**Date:** 2026-02-02

---

## Overview

Aphoria is a CLI binary that scans codebases, extracts implicit claims from config and code, ingests them into a local Episteme instance, and reports conflicts against authoritative sources.

```
aphoria scan [--config aphoria.toml] [--format table|json|sarif|markdown]
aphoria ack --reason "..."
aphoria baseline
aphoria diff
aphoria status
```

---

## Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                         aphoria CLI                          │
│                                                              │
│  ┌──────────┐   ┌────────────┐   ┌──────────┐   ┌────────┐   │
│  │  Walker  │──▶│ Extractors │──▶│ Ingester │──▶│ Report │   │
│  │          │   │            │   │          │   │        │   │
│  │ fs walk  │   │ tls_verify │   │ bridge   │   │ table  │   │
│  │ lang det │   │ jwt_config │   │ to       │   │ json   │   │
│  │ path map │   │ secrets    │   │ episteme │   │ sarif  │   │
│  │          │   │ timeouts   │   │          │   │ md     │   │
│  │          │   │ deps       │   │          │   │        │   │
│  │          │   │ cors       │   │          │   │        │   │
│  │          │   │ rate_limit │   │          │   │        │   │
│  └──────────┘   └────────────┘   └──────────┘   └────────┘   │
│                                       │               ▲      │
│                                       ▼               │      │
│                               ┌──────────────┐        │      │
│                               │   Episteme   │────────┘      │
│                               │   (local)    │  query +      │
│                               │              │  conflict     │
│                               └──────────────┘  scores       │
└──────────────────────────────────────────────────────────────┘
```

Aphoria depends on:

- `stemedb-core` (types: ConceptPath, Assertion, SourceClass)
- `stemedb-storage` (KVStore, IndexStore, AliasStore)
- `stemedb-ingest` (ingestion pipeline)
- `stemedb-query` (query engine, lenses)

It does **not** depend on `stemedb-api`. Aphoria talks to Episteme directly through the Rust crate APIs, not over HTTP. This makes it fast (no network round-trip) and self-contained (no server process needed).
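
As a rough illustration of the data flow in the diagram, the sketch below chains a walker, one toy extractor, and a report stage. Every name in it (`SourceFile`, `Claim`, `walk`, `extract`, `report`, `scan`) is hypothetical and not taken from the Aphoria or stemedb crates; the ingestion and conflict-query stages are left out because they rely on the Episteme crate APIs.

```rust
// Hypothetical, simplified stand-ins for the stages in the diagram.
// None of these names come from the Aphoria or stemedb crates.

/// A file handed from the walker to the extractors.
pub struct SourceFile {
    pub path: String,
    pub content: String,
}

/// A simplified claim produced by an extractor.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub file: String,
    pub line: usize,
    pub description: String,
}

/// Walker stage: drop files whose path starts with an excluded directory.
pub fn walk(files: Vec<SourceFile>, exclude: &[&str]) -> Vec<SourceFile> {
    files
        .into_iter()
        .filter(|f| !exclude.iter().any(|dir| f.path.starts_with(dir)))
        .collect()
}

/// Extractor stage: one toy rule standing in for the real extractors.
pub fn extract(file: &SourceFile) -> Vec<Claim> {
    file.content
        .lines()
        .enumerate()
        .filter(|(_, text)| text.contains("InsecureSkipVerify: true"))
        .map(|(idx, _)| Claim {
            file: file.path.clone(),
            line: idx + 1, // line numbers are 1-indexed
            description: "TLS certificate verification disabled".to_string(),
        })
        .collect()
}

/// Report stage: one line of text per claim.
pub fn report(claims: &[Claim]) -> String {
    claims
        .iter()
        .map(|c| format!("{}:{} {}", c.file, c.line, c.description))
        .collect::<Vec<_>>()
        .join("\n")
}

/// Walker -> Extractors -> Report, with ingestion omitted.
pub fn scan(files: Vec<SourceFile>, exclude: &[&str]) -> String {
    let kept = walk(files, exclude);
    let claims: Vec<Claim> = kept.iter().flat_map(extract).collect();
    report(&claims)
}
```

Keeping each stage a plain function over in-memory values is only a convenience for the sketch; it says nothing about how the real stages are wired together.
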

---

## Crate Structure

```
crates/
  aphoria/
    Cargo.toml
    src/
      main.rs                   CLI entrypoint (clap)
      config.rs                 aphoria.toml parsing
      walker/
        mod.rs                  Project walker orchestration
        language.rs             Language detection
        path_mapper.rs          Directory → ConceptPath mapping
        normalizer.rs           Path normalization rules per language
      extractors/
        mod.rs                  Extractor trait + registry
        tls_verify.rs           TLS certificate verification
        jwt_config.rs           JWT validation settings
        hardcoded_secrets.rs    Credentials in source
        timeout_config.rs       HTTP/DB/Redis timeouts
        dep_versions.rs         Vulnerable dependency versions
        cors_config.rs          CORS allow-origin
        rate_limit.rs           Rate limiting config
      corpus/
        mod.rs                  CorpusBuilder trait
        rfc.rs                  RFC ingestion (Tier 0)
        owasp.rs                OWASP ingestion (Tier 1)
        vendor.rs               Vendor docs (Tier 2)
        policy.rs               Local policy ingestion (Tier 0 Override)
      bridge.rs                 Observation → Assertion conversion
      conflict.rs               Conflict query + scoring
      report/
        mod.rs                  Report generation orchestration
        table.rs                Terminal table output
        json.rs                 JSON output
        sarif.rs                SARIF for CI integration
        markdown.rs             Markdown output
      ack.rs                    Acknowledge command
      baseline.rs               Baseline snapshot
      diff.rs                   Delta since last scan
```

---

## Configuration

`aphoria.toml` at project root (optional, sensible defaults):

```toml
[project]
name = "citadeldb"
language = "rust"                       # auto-detected if omitted

[episteme]
data_dir = "~/.aphoria/db"              # local Episteme instance
# url = "http://localhost:18180"        # future: remote instance

[thresholds]
block = 0.7                             # conflict score >= this → BLOCK
flag = 0.4                              # conflict score >= this → FLAG
                                        # below flag threshold → PASS (not reported)

[extractors]
enabled = ["tls_verify", "jwt_config", "hardcoded_secrets", "timeout_config", "dep_versions", "cors_config", "rate_limit"]
# disabled = ["rate_limit"]             # alternative: disable specific ones

[extractors.timeout_config]
min_reasonable_ms = 1000                # flag timeouts below this
max_reasonable_ms = 300000              # flag timeouts above this

[extractors.dep_versions]
advisory_db = "~/.aphoria/advisory-db"  # rustsec/advisory-db clone

[scan]
exclude = ["target/", "node_modules/", ".git/", "vendor/"]
max_file_size = 1048576                 # skip files > 1MB

[aliases]
auto_suggest = true                     # suggest aliases when shared leaves detected
auto_accept_tier0 = true                # auto-accept alias suggestions to Tier 0 sources
```

---

## Walker

### Language Detection

Priority order:

1. Explicit `language` in `aphoria.toml`
2. Dominant language heuristic (count files by extension)
3. Per-file extension mapping

| Extension | Language |
|-----------|----------|
| `.rs` | rust |
| `.go` | go |
| `.py` | python |
| `.ts`, `.tsx` | typescript |
| `.js`, `.jsx` | javascript |
| `.yaml`, `.yml` | yaml |
| `.toml` | toml |
| `.json` | json |
| `.env`, `.env.*` | dotenv |
| `Dockerfile`, `docker-compose.*` | docker |
| `Cargo.toml` | cargo-manifest |
| `go.mod` | go-mod |
| `package.json` | npm-manifest |
| `requirements.txt`, `pyproject.toml` | python-manifest |

### Path Mapping

Directory structure maps to ConceptPath segments. Language-specific stripping rules remove boilerplate directories:

**Rust:**
```
Strip: src/, crates/
Keep:  everything else

crates/citadeldb/src/auth/jwt.rs
  → ["rust", "citadeldb", "auth", "jwt"]
  → code://rust/citadeldb/auth/jwt/{leaf from extractor}
```

**Go:**
```
Strip: cmd/, internal/, pkg/
Keep:  everything else

internal/auth/jwt/validator.go
  → ["go", "{module_name}", "auth", "jwt"]
  → code://go/{module_name}/auth/jwt/{leaf}
```

**Python:**
```
Strip: src/, lib/
Keep:  everything else

src/auth/jwt_handler.py
  → ["python", "{package_name}", "auth", "jwt_handler"]
  → code://python/{package_name}/auth/jwt_handler/{leaf}
```

**Config files:**
```
config/production.yaml  → code://config/{project_name}/production/{leaf}
.env.production         → code://config/{project_name}/env_production/{leaf}
docker-compose.yml      → code://docker/{project_name}/{leaf}
```

The project name comes from:

1. `aphoria.toml` `project.name`
2. `Cargo.toml` `[package] name`
3. `go.mod` module name (last segment)
4. `package.json` `name`
5. Directory name of project root

### File Filtering

Skip:

- Directories in `scan.exclude` list
- Files larger than `scan.max_file_size`
- Binary files (detected by null byte in first 8KB)
- Generated files (`*.generated.*`, `*.pb.go`, `*_generated.rs`)
- Test files (configurable: include or exclude)

---

## Extractors

### Trait Definition

```rust
/// A claim extractor that finds implicit decisions in source code.
pub trait Extractor: Send + Sync {
    /// Unique identifier for this extractor.
    fn name(&self) -> &str;

    /// File types this extractor operates on.
    fn languages(&self) -> &[Language];

    /// Extract claims from a file's content.
    ///
    /// - `path_segments`: The ConceptPath segments derived from the file's location.
    /// - `content`: The file content as a string.
    /// - `language`: The detected language of the file.
    ///
    /// Returns zero or more extracted observations.
    fn extract(
        &self,
        path_segments: &[String],
        content: &str,
        language: Language,
    ) -> Vec<Observation>;
}
```

### Observation

```rust
/// An observation extracted from source code by an Extractor.
pub struct Observation {
    /// The full ConceptPath for this claim.
    /// Scheme is always "code" for code-extracted claims.
    pub concept_path: ConceptPath,

    /// The predicate describing what aspect of the concept this claims.
    /// Examples: "enabled", "config_value", "version", "allow_origin"
    pub predicate: String,

    /// The extracted value.
    pub value: ObjectValue,

    /// Source file path relative to project root.
    pub file: String,

    /// Line number in the source file (1-indexed).
    pub line: usize,

    /// The matched source text (the actual code/config that was matched).
    pub matched_text: String,

    /// Confidence of extraction.
    /// 1.0 for exact regex matches.
    /// Lower for heuristic matches.
    pub confidence: f32,

    /// Human-readable description of what was found.
    /// Example: "JWT audience validation is disabled"
    pub description: String,
}
```

### Extractor: tls_verify

**What it finds:** TLS/SSL certificate verification disabled.
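
A minimal sketch of how an extractor of this kind might implement the trait, assuming cut-down stand-ins for the spec's types. `Finding`, `SimpleExtractor`, and `TlsVerify` are hypothetical names, and a whitespace-stripped substring check stands in for the `regex` match on the reqwest pattern listed for this extractor so that the example has no dependencies.

```rust
/// Hypothetical, simplified stand-in for the spec's `Observation`.
#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    /// File-derived segments plus the extractor's leaf.
    pub path_segments: Vec<String>,
    pub predicate: String,
    /// Stand-in for `ObjectValue::Boolean`.
    pub enabled: bool,
    /// 1-indexed line number.
    pub line: usize,
    pub matched_text: String,
    pub confidence: f32,
}

/// Hypothetical cut-down version of the `Extractor` trait (no `Language`).
pub trait SimpleExtractor {
    fn name(&self) -> &str;
    fn extract(&self, path_segments: &[String], content: &str) -> Vec<Finding>;
}

pub struct TlsVerify;

impl SimpleExtractor for TlsVerify {
    fn name(&self) -> &str {
        "tls_verify"
    }

    fn extract(&self, path_segments: &[String], content: &str) -> Vec<Finding> {
        let mut findings = Vec::new();
        for (idx, text) in content.lines().enumerate() {
            // Dropping whitespace approximates the `\s*` parts of the regex.
            let compact: String = text.chars().filter(|c| !c.is_whitespace()).collect();
            if compact.contains("danger_accept_invalid_certs(true)") {
                let mut segments = path_segments.to_vec();
                segments.push("cert_verification".to_string());
                findings.push(Finding {
                    path_segments: segments,
                    predicate: "enabled".to_string(),
                    enabled: false,
                    line: idx + 1,
                    matched_text: text.trim().to_string(),
                    confidence: 1.0, // exact match, not a heuristic
                });
            }
        }
        findings
    }
}
```
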

**Patterns:**

Rust (reqwest):
```
Pattern: danger_accept_invalid_certs\s*\(\s*true\s*\)
Leaf: cert_verification
Predicate: enabled
Value: Boolean(false)
```

Rust (native-tls):
```
Pattern: accept_invalid_certs\s*\(\s*true\s*\)
```

Go (net/http):
```
Pattern: InsecureSkipVerify\s*:\s*true
```

Python (requests):
```
Pattern: verify\s*=\s*False
```

Node.js:
```
Pattern: rejectUnauthorized\s*:\s*false
Pattern: NODE_TLS_REJECT_UNAUTHORIZED.*['"]0['"]
```

YAML/TOML/JSON config:
```
Pattern: (tls_verify|ssl_verify|verify_ssl|verify_tls)\s*[:=]\s*(false|no|0|off)
```

### Extractor: jwt_config

**What it finds:** JWT validation settings.

**Claims extracted per finding:**

| Leaf | Predicate | What it means |
|------|-----------|---------------|
| `audience_validation` | `enabled` | Whether `aud` claim is validated |
| `expiry_validation` | `enabled` | Whether `exp` claim is validated |
| `algorithm_restriction` | `config_value` | Allowed algorithms (or "none" if unrestricted) |
| `signature_verification` | `enabled` | Whether signatures are verified |

**Patterns (Rust, jsonwebtoken crate):**
```
// aud validation
Pattern: set_audience.*\[\]|validate_aud.*false|aud.*None
Leaf: audience_validation
Value: Boolean(false)

// Dangerous: algorithm none
Pattern: Algorithm::None|alg.*none|allow_none.*true
Leaf: algorithm_restriction
Value: Text("none_allowed")

// Signature skip
Pattern: dangerous_insecure|skip_signature|verify.*false
Leaf: signature_verification
Value: Boolean(false)
```

**Patterns (Go, golang-jwt):**
```
Pattern: jwt\.Parse\(.*func\(.*\*jwt\.Token\).*\{[^}]*return.*signingKey
         (without any algorithm check in the callback)
```

This is a heuristic match (confidence < 1.0): detecting missing validation is harder than detecting explicit disabling.

### Extractor: hardcoded_secrets

**What it finds:** Credentials, API keys, tokens in source (not in .env or .gitignore'd files).

**Patterns:**
```
// API keys
Pattern: (api[_-]?key|apikey)\s*[:=]\s*["'][A-Za-z0-9_\-]{20,}["']
Leaf: api_key_storage
Predicate: storage_method
Value: Text("hardcoded")

// Passwords
Pattern: (password|passwd|pwd)\s*[:=]\s*["'][^"']+["']
         (excluding: "password", "changeme", "placeholder", "CHANGE_ME", "xxx", test patterns)
Leaf: password_storage
Predicate: storage_method
Value: Text("hardcoded")

// AWS keys
Pattern: (AKIA[0-9A-Z]{16})
Leaf: aws_credentials
Predicate: storage_method
Value: Text("hardcoded")

// Private keys
Pattern: -----BEGIN (RSA |EC |DSA )?PRIVATE KEY-----
Leaf: private_key_storage
Predicate: storage_method
Value: Text("hardcoded_in_source")
```

**Exclusions:** Files matching `*test*`, `*example*`, `*fixture*`, `*mock*` are scanned but findings are marked with lower confidence (0.5).

### Extractor: timeout_config

**What it finds:** HTTP client, database, and cache timeout values.

**Patterns:**
```
// Zero/infinite timeout
Pattern: timeout\s*[:=]\s*(0|None|null|nil|infinity|Inf)
Leaf: {context}/timeout  (context from surrounding code: http, db, redis, etc.)
Predicate: config_value
Value: Number(0.0)
Description: "Timeout disabled (infinite wait)"

// Unreasonably low timeout
Pattern: timeout\s*[:=]\s*(\d+)  where value_ms < config.min_reasonable_ms
Leaf: {context}/timeout
Value: Number(extracted_value)
Description: "Timeout {value}ms below minimum reasonable {min}ms"

// Unreasonably high timeout
Pattern: timeout\s*[:=]\s*(\d+)  where value_ms > config.max_reasonable_ms
```

**Unit detection:** Heuristic based on magnitude and surrounding context:

- Value > 1000000 → likely nanoseconds
- Value > 1000 and < 1000000 → likely milliseconds
- Value < 100 → likely seconds
- Presence of "ms", "sec", "Duration::from_secs" → explicit unit

### Extractor: dep_versions

**What it finds:** Dependencies with known vulnerabilities.

**Sources:**

- `Cargo.toml` → check against RustSec Advisory DB
- `go.mod` → check against Go Vulnerability Database
- `package.json` → check against npm audit advisories
- `requirements.txt` / `pyproject.toml` → check against PyPI advisory data

**Output:**
```
Leaf: dep/{package_name}/version
Predicate: installed_version
Value: Text("1.0.2")
Description: "openssl 1.0.2 has known vulnerability CVE-2024-XXXX"
```

The advisory databases are downloaded locally and refreshed periodically. Aphoria doesn't call external APIs during scan.

### Extractor: cors_config

**What it finds:** Overly permissive CORS configuration.

**Patterns:**
```
// Allow all origins
Pattern: allow_origin\s*\(\s*["']\*["']\s*\)|Access-Control-Allow-Origin.*\*|AllowAllOrigins.*true
Leaf: cors/allow_origin
Predicate: config_value
Value: Text("*")
Description: "CORS allows all origins"

// Allow credentials with wildcard
Pattern: (allow_credentials|AllowCredentials).*true  (in proximity to allow_origin *)
Leaf: cors/credentials_with_wildcard
Predicate: enabled
Value: Boolean(true)
Description: "CORS allows credentials with wildcard origin"
```

### Extractor: rate_limit

**What it finds:** Rate limiting disabled or set unreasonably high.

**Patterns:**
```
// Rate limiting disabled
Pattern: (rate_limit|ratelimit).*disabled|rate_limit\s*[:=]\s*(0|false|off|none)
Leaf: rate_limit/enabled
Predicate: enabled
Value: Boolean(false)

// Unreasonably high limit
Pattern: (rate_limit|ratelimit|max_requests)\s*[:=]\s*(\d+)
         where value > 10000 per minute (configurable)
Leaf: rate_limit/max_requests
Predicate: config_value
Value: Number(extracted_value)
```

---

## Dynamic Application Policy (Phase 6)

### PolicyCorpusBuilder

A corpus builder that ingests assertions from a local `aphoria-policy.yaml` file. This allows teams to define "Application Truth" that overrides RFCs or Vendor defaults.

**File Format (`aphoria-policy.yaml`):**

```yaml
rules:
  - path: "code://rust/my-app/db/pool_size"
    predicate: "config_value"
    value: 50
    tier: "Regulatory"  # Tier 0 (overrides everything)
    message: "Internal policy: max 50 conns to prevent storms."

  - path: "code://go/legacy-service/tls/version"
    predicate: "min_version"
    value: "1.2"
    tier: "Clinical"  # Tier 1
    message: "Legacy clients require TLS 1.2 support."
```

**Ingestion:**

- Each rule becomes a Tier 0 or Tier 1 Assertion.
- Source is set to `SourceClass::Regulatory` (for Tier 0) or `SourceClass::Clinical` (for Tier 1).
- Conflict detection treats these as authoritative truths.

**Enterprise Lens:**

A specialized StemeDB Lens that resolves conflicts by prioritizing `Policy` assertions over `RFC` or `Vendor` assertions when they overlap on the same ConceptPath.

---

## Ingestion Bridge

### Claim → Assertion Mapping

```rust
fn to_assertion(
    claim: &Observation,
    agent_keypair: &Ed25519Keypair,
    scan_timestamp: u64,
) -> Assertion {
    let source_metadata = serde_json::to_vec(&json!({
        "file": claim.file,
        "line": claim.line,
        "matched_text": claim.matched_text,
        "extractor": claim.concept_path.leaf(),
        "scan_tool": "aphoria",
        "scan_version": env!("CARGO_PKG_VERSION"),
    }));

    let source_hash = blake3::hash(
        format!("{}:{}:{}", claim.file, claim.line, claim.matched_text).as_bytes()
    );

    Assertion {
        subject: claim.concept_path.to_string(), // EntityId = String
        predicate: claim.predicate.clone(),
        object: claim.value.clone(),
        parent_hash: None,
        source_hash: *source_hash.as_bytes(),
        source_class: SourceClass::Expert, // code:// scheme default
        visual_hash: None,
        epoch: None,
        source_metadata: source_metadata.ok(),
        lifecycle: LifecycleStage::Approved,
        signatures: vec![sign(agent_keypair, &claim)],
        confidence: claim.confidence,
        timestamp: scan_timestamp,
        vector: None,
    }
}
```

### Idempotency

Same code produces the same claims. Same claims produce the same assertion hashes (content-addressed).
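
The content-addressing idea can be sketched with the standard library alone. This is an illustration only: `content_key` and `Store` are hypothetical names, and `DefaultHasher` is a dependency-free stand-in for BLAKE3, keyed on the same `file:line:matched_text` string that `to_assertion` hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Content key for a claim. `DefaultHasher` is only a std-library
/// stand-in for BLAKE3 so that the sketch needs no dependencies.
pub fn content_key(file: &str, line: usize, matched_text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    format!("{}:{}:{}", file, line, matched_text).hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical append-only store that skips keys it has already seen.
#[derive(Default)]
pub struct Store {
    seen: HashSet<u64>,
}

impl Store {
    /// Ingests `(file, line, matched_text)` claims and returns how many were new.
    pub fn ingest(&mut self, claims: &[(&str, usize, &str)]) -> usize {
        let mut added = 0;
        for &(file, line, text) in claims {
            // `insert` returns false when the key is already present.
            if self.seen.insert(content_key(file, line, text)) {
                added += 1;
            }
        }
        added
    }
}
```

Ingesting the same claim list twice adds nothing on the second pass, while a claim whose matched text has changed gets a new key and is stored alongside the old one.
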
Re-scanning a project that hasn't changed ingests nothing new. This is guaranteed by BLAKE3 content addressing in the existing Episteme pipeline.

When code changes between scans, new assertions are created. Old assertions remain (append-only). The `diff` command compares the current scan's assertions against the last scan's to show what changed.

### Scan Metadata

Each scan is recorded as an assertion about itself:

```
Subject:   aphoria://scan/{project_name}/{scan_id}
Predicate: completed
Object:    Text(json!({
  "project": "citadeldb",
  "files_scanned": 142,
  "claims_extracted": 23,
  "conflicts_found": 3,
  "blocks": 2,
  "flags": 1,
  "timestamp": 1706832000
}))
```

This enables `aphoria diff`: it compares two scan records and their associated assertions.

---

## Conflict Detection

### Query Strategy

After ingestion, for each extracted claim:

```rust
async fn check_conflict(
    claim: &Observation,
    query_engine: &QueryEngine,
) -> Option<ConflictResult> {
    // 1. Query with Skeptic lens, resolving aliases
    let results = query_engine.query(Query {
        subject: Some(claim.concept_path.to_string()),
        predicate: Some(claim.predicate.clone()),
        lens: Some("skeptic".to_string()),
        resolve_aliases: true,
        source_class_decay: true,
        ..Default::default()
    }).await;

    // 2. Check if any authoritative source disagrees
    let code_value = &claim.value;
    let mut conflicts = Vec::new();

    for assertion in &results.assertions {
        if assertion.source_class.tier() < 3      // Tier 0, 1, or 2
            && assertion.object != *code_value    // Different value
        {
            conflicts.push(ConflictingSource {
                path: assertion.subject.clone(),
                source_class: assertion.source_class,
                value: assertion.object.clone(),
                confidence: assertion.confidence,
            });
        }
    }

    if conflicts.is_empty() {
        return None;
    }

    // 3. Compute conflict score
    // Higher when tier spread is larger and authoritative sources are confident
    let max_tier_weight = conflicts.iter()
        .map(|c| c.source_class.authority_weight())
        .max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
        .unwrap_or(0.0);

    let code_weight = SourceClass::Expert.authority_weight(); // 0.5
    let conflict_score = max_tier_weight * (1.0 - code_weight);
    // Tier 0 vs Tier 3: 1.0 * 0.5 = 0.50 (minimum, boosted below)

    // Boosted by confidence of the authoritative source
    let boosted_score = conflict_score
        * conflicts.iter()
            .map(|c| c.confidence)
            .max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
            .unwrap_or(1.0);

    // Normalize by the most authoritative (lowest) conflicting tier:
    // tier 0 maps to 0.95, tier 3 maps to 0.40
    let tier_spread = conflicts.iter()
        .map(|c| c.source_class.tier())
        .min()
        .unwrap_or(3) as f32;
    let normalized = 0.4 + (3.0 - tier_spread) / 3.0 * 0.55;

    let final_score = normalized.max(boosted_score);

    // `threshold_block` and `threshold_flag` are the `[thresholds]` values
    // (`block`, `flag`) from aphoria.toml.
    Some(ConflictResult {
        claim: claim.clone(),
        conflicts,
        conflict_score: final_score,
        verdict: if final_score >= threshold_block {
            Verdict::Block
        } else if final_score >= threshold_flag {
            Verdict::Flag
        } else {
            Verdict::Pass
        },
    })
}
```

### Verdict Levels

| Verdict | Condition | Meaning |
|---------|-----------|---------|
| BLOCK | `conflict_score >= 0.7` | Authoritative source strongly contradicts. Fix or explicitly acknowledge. |
| FLAG | `conflict_score >= 0.4` | Potential disagreement. Review recommended. |
| PASS | `conflict_score < 0.4` | No significant conflict (or no authoritative data). |
| ACK | Any score, acknowledged | Conflict exists but has been explicitly accepted. |

### Acknowledged Conflicts

When a conflict has been acknowledged (via `aphoria ack`), the acknowledgment assertion exists in Episteme.
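
As a worked check of the scoring and verdict rules above, including the ACK row, here is a small std-only sketch. The function names and the flat parameter list are hypothetical; the arithmetic follows `check_conflict`, with the Tier 0 weight of 1.0 and the code weight of 0.5 taken from its comments. The weight 0.6 used for a Tier 2 source in the usage note is an assumption, not a value from the spec.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
}

/// Final score as computed in `check_conflict`: the larger of the
/// tier-normalized score and the confidence-scaled weight product.
pub fn final_score(
    min_tier: u8,              // lowest (most authoritative) conflicting tier
    max_authority_weight: f32, // highest authority weight among conflicts
    max_confidence: f32,       // highest confidence among conflicts
    code_weight: f32,          // authority weight of the code claim (0.5)
) -> f32 {
    let boosted = max_authority_weight * (1.0 - code_weight) * max_confidence;
    let normalized = 0.4 + (3.0 - f32::from(min_tier)) / 3.0 * 0.55;
    normalized.max(boosted)
}

/// Verdict for the report: an acknowledged conflict is shown as ACK
/// at any score; otherwise the thresholds decide.
pub fn verdict(score: f32, block: f32, flag: f32, acknowledged: bool) -> Verdict {
    if acknowledged {
        Verdict::Ack
    } else if score >= block {
        Verdict::Block
    } else if score >= flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}
```

With the default thresholds, `final_score(0, 1.0, 1.0, 0.5)` is about 0.95 and yields `Block`, while `final_score(2, 0.6, 0.9, 0.5)` is about 0.58 and yields `Flag`. Assuming authority weights and confidences do not exceed 1.0, the confidence-scaled term stays at or below 0.5, so in this sketch the tier-normalized term decides the result for every tier below 3.
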
The conflict still has a score, but the report marks it as ACK instead of BLOCK/FLAG:

```
ACK  code://rust/citadeldb/auth/jwt/audience_validation
  Your code: aud validation disabled (src/auth/jwt.rs:47)
  RFC 7519: aud validation MUST be enabled (Tier 0)
  Acknowledged: 2026-01-15 by jordan
  Reason: "Internal service, no external JWT consumers. SEC-2024-003."
```

The acknowledgment doesn't suppress the conflict. It adds context. A future `--strict` mode can treat acknowledged conflicts as blocks again (for audits).

---

## Report Formats

### SARIF (for CI)

SARIF (Static Analysis Results Interchange Format) is the standard for CI security tools. GitHub, GitLab, and Azure DevOps all consume it.

```json
{
  "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/main/sarif-2.1/schema/sarif-schema-2.1.0.json",
  "version": "2.1.0",
  "runs": [{
    "tool": {
      "driver": {
        "name": "aphoria",
        "version": "0.1.0",
        "informationUri": "https://github.com/orchard9/aphoria"
      }
    },
    "results": [{
      "ruleId": "epistemic-drift/tls-verify",
      "level": "error",
      "message": {
        "text": "TLS certificate verification disabled. OWASP requires verification (Tier 1, conflict score 0.87)."
      },
      "locations": [{
        "physicalLocation": {
          "artifactLocation": { "uri": "src/net/client.rs" },
          "region": { "startLine": 23 }
        }
      }]
    }]
  }]
}
```

### JSON (for programmatic consumption)

```json
{
  "project": "citadeldb",
  "scan_id": "abc123",
  "timestamp": 1706832000,
  "summary": {
    "files_scanned": 142,
    "claims_extracted": 23,
    "conflicts": 3,
    "blocks": 2,
    "flags": 1
  },
  "conflicts": [
    {
      "concept_path": "code://rust/citadeldb/auth/jwt/audience_validation",
      "predicate": "enabled",
      "code_value": false,
      "file": "src/auth/jwt.rs",
      "line": 47,
      "conflict_score": 0.92,
      "verdict": "BLOCK",
      "conflicting_sources": [
        {
          "path": "rfc://7519/jwt/audience_validation",
          "source_class": "Regulatory",
          "value": true,
          "confidence": 1.0
        }
      ],
      "acknowledged": null
    }
  ]
}
```

---

## Baseline and Diff

### Baseline

`aphoria baseline` records the current scan as the baseline. Subsequent scans only report *new* conflicts.

Implementation: store the baseline scan ID in `.aphoria/baseline` in the project root. The `diff` logic compares the current scan's conflict set against the baseline's.

```
.aphoria/
  baseline       # scan ID of the baseline
  config.toml    # symlink or copy of aphoria.toml
  agent.key      # Ed25519 keypair for this project's Aphoria agent
```

### Diff

`aphoria diff` shows:

- New conflicts (in current scan but not baseline)
- Resolved conflicts (in baseline but not current scan)
- Changed conflicts (same concept, different score or verdict)

```
$ aphoria diff

NEW       code://rust/citadeldb/cache/redis/max_connections
  Your code: max_connections = 10000 (config/redis.yaml:5)
  Vendor: recommended max 128 per instance (Tier 2)
  Conflict: 0.48 — FLAG

RESOLVED  code://rust/citadeldb/net/tls/cert_verification
  Previously: verify = false → BLOCK
  Current: verify = true → PASS

1 new conflict, 1 resolved, 0 changed.
```

---

## Agent Keypair

Aphoria signs assertions with a per-project Ed25519 keypair stored in `.aphoria/agent.key`. Generated on first `aphoria scan` if it doesn't exist.
The keypair identifies "Aphoria scanning project X" as a distinct agent in Episteme's trust system. Multiple projects have different keypairs. This enables:

- Per-project audit trails ("which Aphoria agent found this?")
- TrustRank per Aphoria instance (a well-calibrated Aphoria gains reputation)
- Distinguishing human-authored assertions from Aphoria-extracted ones

---

## Episteme Instance

### Local Mode (Default)

Aphoria ships with an embedded Episteme instance. No server needed. The database lives at `~/.aphoria/db/` (configurable).

Multiple projects share the same local instance; their assertions are namespaced by ConceptPath (`code://rust/citadeldb/...` vs `code://go/other-project/...`).

The authoritative corpus (RFCs, OWASP) is also in the local instance. `aphoria init` bootstraps it.

```
$ aphoria init
Downloading RFC corpus (auth, crypto, TLS) ... 127 assertions ingested.
Downloading OWASP cheat sheets ... 89 assertions ingested.
Ready. Run `aphoria scan` to begin.
```

### Remote Mode (Future)

```toml
[episteme]
url = "https://episteme.example.com"
api_key = "${APHORIA_API_KEY}"
```

In remote mode, Aphoria ingests into and queries from a shared Episteme instance. This enables:

- Cross-project conflict detection ("same JWT misconfiguration in 12 repos")
- Shared authoritative corpus (ingested once, used by all Aphoria agents)
- Centralized acknowledgment management

---

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | No conflicts above threshold |
| 1 | FLAG-level conflicts found (with `--exit-code`) |
| 2 | BLOCK-level conflicts found (with `--exit-code`) |
| 3 | Scan error (file access, Episteme connection, etc.) |

`--exit-code` enables non-zero exits. Without it, Aphoria always exits 0 (for interactive use where the report is the output, not the exit code).
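
The exit-code table can be sketched as a single mapping function. `exit_code` and the local `Verdict` enum are hypothetical names for illustration. The sketch also assumes that a scan error returns 3 even without `--exit-code`, which is one reading of the table (only codes 1 and 2 are marked as requiring the flag); the spec's wording would also allow returning 0 in that case.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
}

/// Maps a scan outcome to a process exit code.
///
/// `outcome` is `Err` when the scan itself failed (file access,
/// Episteme connection, and so on). `exit_code_flag` mirrors `--exit-code`.
pub fn exit_code(outcome: Result<&[Verdict], String>, exit_code_flag: bool) -> i32 {
    let verdicts = match outcome {
        Ok(v) => v,
        // Assumption: scan errors are reported regardless of the flag.
        Err(_) => return 3,
    };
    if !exit_code_flag {
        return 0; // interactive use: the report is the output
    }
    if verdicts.contains(&Verdict::Block) {
        2
    } else if verdicts.contains(&Verdict::Flag) {
        1
    } else {
        0 // only PASS or ACK results
    }
}
```

In a binary, the returned value would typically be handed to `std::process::exit` after the report has been written.
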

---

## Performance Targets

| Metric | Target |
|--------|--------|
| Scan time, 1000-file Rust project | < 5 seconds |
| Scan time, 10000-file monorepo | < 30 seconds |
| Per-file extraction (all extractors) | < 5ms |
| Conflict query per claim | < 10ms |
| Local Episteme startup | < 100ms |
| Memory usage during scan | < 256MB |

The performance bottleneck is I/O (reading files), not extraction (regex matching). The conflict query is a local KV lookup, not a network call.

---

## Dependencies

| Dependency | Purpose |
|------------|---------|
| `clap` | CLI argument parsing |
| `ignore` | File walking (respects .gitignore, fast) |
| `regex` | Pattern matching in extractors |
| `serde` + `serde_json` | Config parsing, JSON output |
| `toml` | aphoria.toml parsing |
| `comfy-table` | Terminal table output |
| `stemedb-core` | Types |
| `stemedb-storage` | Local KV store |
| `stemedb-ingest` | Assertion ingestion |
| `stemedb-query` | Conflict queries |
| `ed25519-dalek` | Agent keypair + signing |
| `blake3` | Content hashing |

No LLM dependency. No network dependency (in local mode). No runtime other than tokio (for async KV store operations).