---
name: aphoria-dev
description: Development guidelines for Aphoria - the code-level truth linter powered by Episteme
---

# Aphoria Development Skill

You are an expert Aphoria developer. Aphoria is a **code-level truth linter** that validates code against authoritative sources (RFCs, OWASP, vendor docs). Unlike traditional linters (syntax/style) or SAST tools (vulnerability patterns), Aphoria validates **intent against authority** using Episteme's probabilistic knowledge graph.

## Core Concept

Aphoria extracts **implicit claims** from code and configs, then checks them against **tiered authoritative sources**:

| Tier | Source | Example |
|------|--------|---------|
| 0 | Regulatory | RFC 7519: "JWT audience validation is mandatory" |
| 1 | Clinical | OWASP: "TLS certificate verification required" |
| 2 | Observational | Vendor docs: "Redis timeout should be > 0" |
| 3 | Expert | Team policy: "Our pool size is 50" |
| 4 | Community | Prior observations from this codebase |

**Example conflict:**

```
code://rust/myapp/auth/jwt/audience_validation = false
  → Conflicts with rfc://7519/auth/jwt/audience_validation = true (Tier 0, confidence 1.0)
  → Verdict: BLOCK
```

## Principles

### 1. Claims Over Facts

Aphoria stores **claims** (assertions with provenance, confidence, timestamps), not absolute facts. Conflicts are normal and resolved via Lenses at query time.

### 2. Tiered Authority

Lower tier = higher authority. Tier 0 (RFC) outranks Tier 3 (team policy). Conflict scores weight by tier.

### 3. Leaf-Path Matching

Cross-scheme matching uses the last 2 path segments:

- `code://rust/myapp/tls/cert_verification` matches
- `rfc://5246/tls/cert_verification`

### 4. Ephemeral by Default

Fast (~0.25s) in-memory scans for CI/pre-commit. Use `--persist` only when drift detection or observation write-back is needed.

### 5. Non-Blocking Workflow

Conflicts don't fail unless `--exit-code` is passed. Let developers acknowledge known conflicts with `aphoria ack`.
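The leaf-path rule from Principle 3 can be illustrated with a small standalone sketch. The function names `leaf_key` and `leaf_paths_match` are hypothetical and are not taken from the Aphoria codebase; the real matcher (which also involves ConceptIndex) may normalize paths differently. The sketch only shows the idea of comparing the last two path segments while ignoring the scheme and source.

```rust
/// Hypothetical helper: returns the last two path segments of a concept
/// path as (parent, leaf), ignoring the "scheme://" prefix.
/// Returns None when fewer than two segments are present.
fn leaf_key(concept_path: &str) -> Option<(&str, &str)> {
    // Drop the "scheme://" prefix if there is one.
    let path = concept_path
        .split_once("://")
        .map_or(concept_path, |(_, rest)| rest);

    // Walk the segments from the right, skipping empty ones.
    let mut segments = path.rsplit('/').filter(|s| !s.is_empty());
    let leaf = segments.next()?;
    let parent = segments.next()?;
    Some((parent, leaf))
}

/// Hypothetical helper: two concept paths match when their last two
/// segments are identical, regardless of scheme or source.
fn leaf_paths_match(a: &str, b: &str) -> bool {
    match (leaf_key(a), leaf_key(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}

fn main() {
    let code = "code://rust/myapp/tls/cert_verification";
    let rfc = "rfc://5246/tls/cert_verification";

    // Same trailing "tls/cert_verification", so the paths match.
    assert!(leaf_paths_match(code, rfc));

    // Different trailing segments, so no match.
    assert!(!leaf_paths_match(code, "rfc://7519/auth/jwt/audience_validation"));
}
```

Because only the trailing segments are compared, an extractor that emits an inconsistent category or concept name will silently miss its authority match, which is why consistent path structure matters.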
## Architecture

```
┌─────────────────────────────────────────────┐
│            Aphoria CLI Pipeline             │
├─────────────────────────────────────────────┤
│ 1. WALK     → Traverse project (respects    │
│               .gitignore)                   │
│ 2. EXTRACT  → Pattern-based claim           │
│               extraction (12 extractors)    │
│ 3. INGEST   → Convert to Episteme           │
│               assertions (BLAKE3+Ed25519)   │
│ 4. CONFLICT → Query for authority matches   │
│               (ConceptIndex + Leaf path)    │
│ 5. REPORT   → Output in multiple formats    │
│ 6. SYNC     → (Optional) Write-back         │
│               observations to local store   │
└─────────────────────────────────────────────┘
```

## Key Modules

| Module | Purpose | Notes |
|--------|---------|-------|
| `scan.rs` | Main entry; mode dispatch | Core orchestrator |
| `walker/` | Project traversal | Respects .gitignore |
| `extractors/` | 12 pattern-based extractors | Regex, not AST |
| `episteme/` | LocalEpisteme + EphemeralDetector | Conflict detection |
| `bridge.rs` | ExtractedClaim → Assertion | BLAKE3 + Ed25519 |
| `report/` | Table, JSON, SARIF, Markdown | Output formatting |
| `policy_ops.rs` | Bless, ack, export/import | Trust Pack workflow |
| `types/` | ScanArgs, ConflictResult, Verdict | Domain types |
| `config.rs` | aphoria.toml parsing | Configuration |

## Key Types

```rust
// From code/config
pub struct ExtractedClaim {
    pub concept_path: String, // e.g., "code://rust/myapp/auth/jwt/aud_validation"
    pub predicate: String,    // e.g., "enabled"
    pub value: ObjectValue,   // true/false/number/text
    pub file: String,         // relative path
    pub line: usize,          // 1-indexed
    pub confidence: f32,      // 0.0-1.0
}

// Conflict detection result
pub struct ConflictResult {
    pub claim: ExtractedClaim,
    pub conflicts: Vec,
    pub conflict_score: f32, // 0.0-1.0
    pub verdict: Verdict,    // Block/Flag/Pass/Ack/Drift
}

// Verdict determination
pub enum Verdict {
    Block, // score >= 0.7 (configurable)
    Flag,  // score >= 0.5 (configurable)
    Pass,  // below thresholds
    Ack,   // acknowledged by user
    Drift, // changed from prior observation
}

// Scan modes
pub enum ScanMode {
    Ephemeral,  // Fast, in-memory (~0.25s)
    Persistent, // Full Episteme stack (~1-2s)
}

// File sources
pub enum FileSource {
    All,    // Entire project
    Staged, // Git-staged files only
}
```

## Step Back: Before Implementing

Before writing code, challenge your assumptions:

### 1. Is This Claim Extraction or Detection?

> "Am I adding a new extractor (claim extraction) or improving conflict detection?"

- Extractors live in `src/extractors/` and implement the `Extractor` trait
- Detection logic lives in `src/episteme/` and uses ConceptIndex
- Don't mix concerns

### 2. Does This Need Persistence?

> "Does this feature require WAL/KV store, or can it work ephemerally?"

- Prefer ephemeral for speed
- Use persistence only for: drift detection, observation write-back, baseline tracking
- `--sync` requires `--persist`

### 3. What's the Authority Tier?

> "What tier is this authoritative source?"

- Tiers 0-2 come from corpus builders (RFC, OWASP, Vendor)
- Tier 3 is team policy (bless/ack commands)
- Tier 4 holds prior observations (auto-generated from code with no conflicts)

### 4. Will This Break Fast Scans?

> "Does this change affect ephemeral scan performance (~0.25s target)?"

- Avoid disk I/O in ephemeral mode
- Don't load the full Episteme stack unless `--persist` is set
- Profile before/after

**After step back:** If unsure, trace through `scan.rs` to see where your change fits.

## Do

1. **Use the correct scan mode.** Ephemeral for CI/pre-commit, Persistent for drift/sync.
2. **Implement new extractors with regex.** Not AST parsing. Keep them simple and fast.
3. **Return empty vec from extractors on no match.** Never panic or error for missing patterns.
4. **Use structured concept paths.** Format: `scheme://source/path/to/concept`
5. **Add tests for new extractors.** In `src/tests/` with `tempfile::TempDir` for isolation.
6. **Update `roadmap.md` when completing phases.** Keep status accurate.
7. **Use `#[instrument]` on critical path functions.** Walker, extractors, episteme, report.
8. **Log with `tracing` macros.** `info!`, `warn!`, `error!`, not `println!`.
9. **Validate `--sync` requires `--persist`.** This is enforced in `handlers.rs`.
10. **Support multiple report formats.** Table (default), JSON, SARIF 2.1.0, Markdown.

## Do Not

1. **Use `unwrap()` or `expect()` in production code.** Clippy denies these.
2. **Add disk I/O to ephemeral mode.** It must stay fast (~0.25s).
3. **Mix claim extraction with conflict detection.** Separate concerns.
4. **Hardcode concept paths.** Build them programmatically from file context.
5. **Skip the confidence field.** Every claim needs a confidence score (0.0-1.0).
6. **Forget the source file and line.** Extractors must track provenance.
7. **Use `println!` in library code.** Only allowed in CLI binaries (main.rs, handlers.rs).
8. **Ignore SARIF format requirements.** Security tools expect SARIF 2.1.0 compliance.
9. **Break leaf-path matching.** Cross-scheme matching depends on consistent path structure.
10. **Commit without running `cargo clippy --workspace -- -D warnings`.** CI will fail.

## Decision Points

### Adding a New Extractor

Stop. Questions:

- What languages does this pattern appear in?
- What's the concept path scheme? (`code://lang/project/category/concept`)
- What authoritative source defines the expected value?
- What regex reliably detects this pattern without false positives?

### Modifying Conflict Detection

Stop. Questions:

- Does this change affect ephemeral mode?
- Does it require new indexes in LocalEpisteme?
- How does it interact with existing leaf-path matching?
- What's the performance impact?

### Adding CLI Commands

Stop. Questions:

- Does this command need persistence?
- What's the exit code contract?
- Does it need validation (like `--sync` requires `--persist`)?
- What report format should it output?
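The two CLI contracts this skill names for the scan command, flag validation (`--sync` requires `--persist`) and the exit-code mapping (0 pass, 1 flag/drift, 2 block, applied only when `--exit-code` is passed), can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `handlers.rs`: `ScanFlags`, `validate_flags`, and `exit_code` are hypothetical names, and treating `Ack` as exit code 0 is an assumption the source does not confirm.

```rust
// Local copy of the Verdict enum from Key Types, so the sketch compiles alone.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
    Drift,
}

/// Hypothetical subset of scan flags; the real ScanArgs type may differ.
struct ScanFlags {
    persist: bool,
    sync: bool,
    exit_code: bool,
}

/// Rejects flag combinations that the scan command does not allow.
fn validate_flags(flags: &ScanFlags) -> Result<(), String> {
    if flags.sync && !flags.persist {
        return Err("--sync requires --persist".to_string());
    }
    Ok(())
}

/// Maps the worst verdict in a scan to a process exit code.
/// Without --exit-code the scan never fails the process.
fn exit_code(flags: &ScanFlags, verdicts: &[Verdict]) -> i32 {
    if !flags.exit_code {
        return 0;
    }
    verdicts
        .iter()
        .map(|v| match v {
            Verdict::Block => 2,
            Verdict::Flag | Verdict::Drift => 1,
            // Assumption: acknowledged conflicts do not fail the scan.
            Verdict::Pass | Verdict::Ack => 0,
        })
        .max()
        .unwrap_or(0) // no verdicts at all counts as a pass
}

fn main() {
    let bad = ScanFlags { persist: false, sync: true, exit_code: false };
    assert!(validate_flags(&bad).is_err());

    let strict = ScanFlags { persist: false, sync: false, exit_code: true };
    let verdicts = [Verdict::Pass, Verdict::Flag, Verdict::Block];
    assert_eq!(exit_code(&strict, &verdicts), 2);

    let lenient = ScanFlags { persist: false, sync: false, exit_code: false };
    assert_eq!(exit_code(&lenient, &verdicts), 0);
}
```

Returning a `Result` from validation, rather than panicking, keeps the sketch in line with the rule against `unwrap()` and `expect()` in production code.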
## Constraints

**NEVER:**

- Use `unwrap()` or `expect()` in production code
- Add disk I/O to ephemeral scan mode
- Break the 0.25s target for ephemeral scans
- Mutate existing Episteme assertions (append-only)
- Skip Ed25519 signing when creating assertions

**ALWAYS:**

- Run `cargo clippy --workspace -- -D warnings` before commit
- Add tests for new functionality
- Update roadmap.md for completed phases
- Use `#[instrument]` on public methods in critical paths
- Respect .gitignore in walker traversal

## Testing Commands

```bash
# Full test suite
cargo test -p aphoria --workspace

# Specific test
cargo test -p aphoria test_ephemeral_scan

# Lint check
cargo clippy -p aphoria -- -D warnings

# Format check
cargo fmt -p aphoria --check

# Quick ephemeral scan (should be ~0.25s)
cargo run -p aphoria -- scan .

# Staged files only (pre-commit mode)
cargo run -p aphoria -- scan --staged --exit-code

# Persistent with sync
cargo run -p aphoria -- scan . --persist --sync

# Export report
cargo run -p aphoria -- scan . --format sarif > report.sarif.json
```

## Common Workflows

### Adding a New Extractor

1. Create `src/extractors/{name}.rs`
2. Implement the `Extractor` trait (name, languages, extract)
3. Register in `src/extractors/mod.rs`
4. Add tests in `src/tests/`
5. Update roadmap.md if this completes a phase

### Debugging Conflict Detection

1. Run with `RUST_LOG=aphoria=debug`
2. Check the concept path format (the last 2 segments must line up for leaf-path matching)
3. Verify the authoritative source exists in the corpus
4. Check confidence and tier of both claims
5. Inspect `ConflictTrace` if available

### Pre-Commit Integration

```bash
#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --staged --exit-code
```

Exit codes: 0 (pass), 1 (flag/drift), 2 (block)

## Output Format

When implementing features or fixing bugs, provide:

```
## Summary
[One-line description]

## Changes
- [File]: [What changed]

## Testing
- [How to verify]

## Roadmap Impact
- [Phase affected, if any]
```

## Phase Status Reference

| Phase | Status | Focus |
|-------|--------|-------|
| 0-3 | Complete | - |
| 4.5 | Complete | Ephemeral mode |
| 4A | Complete | Observation write-back |
| 4B | Complete | Drift detection |
| 4C | Complete | Staged scanning |
| **4D** | **Next** | Enhanced ack |
| 4E | Planned | Community contribution |
| 5 | Complete | Research agent loop |
| 6 | Complete | Trust Packs |
| 7 | Planned | Declarative extractors |