stemedb/.claude/skills/aphoria-dev/SKILL.md

---
name: aphoria-dev
description: Development guidelines for Aphoria - the code-level truth linter powered by Episteme
---

# Aphoria Development Skill

You are an expert Aphoria developer. Aphoria is a **code-level truth linter** that validates code against authoritative sources (RFCs, OWASP, vendor docs). Unlike traditional linters (syntax/style) or SAST tools (vulnerability patterns), Aphoria validates **intent against authority** using Episteme's probabilistic knowledge graph.

## Core Concept

Aphoria extracts **implicit claims** from code and configs, then checks them against **tiered authoritative sources**:

| Tier | Source | Example |
|------|--------|---------|
| 0 | Regulatory | RFC 7519: "JWT audience validation is mandatory" |
| 1 | Clinical | OWASP: "TLS certificate verification required" |
| 2 | Observational | Vendor docs: "Redis timeout should be > 0" |
| 3 | Expert | Team policy: "Our pool size is 50" |
| 4 | Community | Prior observations from this codebase |

**Example conflict:**
```
code://rust/myapp/auth/jwt/audience_validation = false
→ Conflicts with rfc://7519/auth/jwt/audience_validation = true (Tier 0, confidence 1.0)
→ Verdict: BLOCK
```

## Principles

### 1. Claims Over Facts
Aphoria stores **claims** (assertions with provenance, confidence, timestamps), not absolute facts. Conflicts are normal and resolved via Lenses at query time.

### 2. Tiered Authority
Lower tier = higher authority. Tier 0 (RFC) outranks Tier 3 (team policy). Conflict scores weight by tier.

### 3. Leaf-Path Matching
Cross-scheme matching uses last 2 path segments:
- `code://rust/myapp/tls/cert_verification` matches
- `rfc://5246/tls/cert_verification`

### 4. Ephemeral by Default
Fast (~0.25s) in-memory scans for CI/pre-commit. Use `--persist` only when drift detection or observation write-back is needed.

### 5. Non-Blocking Workflow
Conflicts don't fail unless `--exit-code` is passed. Let developers acknowledge known conflicts with `aphoria ack`.

## Architecture

```
┌─────────────────────────────────────────────┐
│           Aphoria CLI Pipeline              │
├─────────────────────────────────────────────┤
│  1. WALK      → Traverse project (respects  │
│                 .gitignore)                 │
│  2. EXTRACT   → Pattern-based claim         │
│                 extraction (12 extractors)  │
│  3. INGEST    → Convert to Episteme         │
│                 assertions (BLAKE3+Ed25519) │
│  4. CONFLICT  → Query for authority matches │
│                 (ConceptIndex + Leaf path)  │
│  5. REPORT    → Output in multiple formats  │
│  6. SYNC      → (Optional) Write-back       │
│                 observations to local store │
└─────────────────────────────────────────────┘
```

## Key Modules

| Module | Purpose | Key File |
|--------|---------|----------|
| `scan.rs` | Main entry; mode dispatch | Core orchestrator |
| `walker/` | Project traversal | Respects .gitignore |
| `extractors/` | 12 pattern-based extractors | Regex, not AST |
| `episteme/` | LocalEpisteme + EphemeralDetector | Conflict detection |
| `bridge.rs` | ExtractedClaim → Assertion | BLAKE3 + Ed25519 |
| `report/` | Table, JSON, SARIF, Markdown | Output formatting |
| `policy_ops.rs` | Bless, ack, export/import | Trust Pack workflow |
| `types/` | ScanArgs, ConflictResult, Verdict | Domain types |
| `config.rs` | aphoria.toml parsing | Configuration |

## Key Types

```rust
// From code/config
pub struct ExtractedClaim {
    pub concept_path: String,      // e.g., "code://rust/myapp/auth/jwt/aud_validation"
    pub predicate: String,         // e.g., "enabled"
    pub value: ObjectValue,        // true/false/number/text
    pub file: String,              // relative path
    pub line: usize,               // 1-indexed
    pub confidence: f32,           // 0.0-1.0
}

// Conflict detection result
pub struct ConflictResult {
    pub claim: ExtractedClaim,
    pub conflicts: Vec<ConflictingSource>,
    pub conflict_score: f32,       // 0.0-1.0
    pub verdict: Verdict,          // Block/Flag/Pass/Ack/Drift
}

// Verdict determination
pub enum Verdict {
    Block,  // score >= 0.7 (configurable)
    Flag,   // score >= 0.5 (configurable)
    Pass,   // below thresholds
    Ack,    // acknowledged by user
    Drift,  // changed from prior observation
}

// Scan modes
pub enum ScanMode {
    Ephemeral,   // Fast, in-memory (~0.25s)
    Persistent,  // Full Episteme stack (~1-2s)
}

// File sources
pub enum FileSource {
    All,     // Entire project
    Staged,  // Git-staged files only
}
```

## Step Back: Before Implementing

Before writing code, challenge your assumptions:

### 1. Is This Claim Extraction or Detection?
> "Am I adding a new extractor (claim extraction) or improving conflict detection?"

- Extractors live in `src/extractors/` and implement the `Extractor` trait
- Detection logic lives in `src/episteme/` and uses ConceptIndex
- Don't mix concerns

### 2. Does This Need Persistence?
> "Does this feature require WAL/KV store, or can it work ephemerally?"

- Prefer ephemeral for speed
- Use persistence only for: drift detection, observation write-back, baseline tracking
- `--sync` requires `--persist`

### 3. What's the Authority Tier?
> "What tier is this authoritative source?"

- Tier 0-2 come from corpus builders (RFC, OWASP, Vendor)
- Tier 3 is team policy (bless/ack commands)
- Tier 4 is observational (auto-generated from code with no conflicts)

### 4. Will This Break Fast Scans?
> "Does this change affect ephemeral scan performance (~0.25s target)?"

- Avoid disk I/O in ephemeral mode
- Don't load full Episteme stack unless `--persist`
- Profile before/after

**After step back:** If unsure, trace through `scan.rs` to see where your change fits.

## Do

1. **Use the correct scan mode.** Ephemeral for CI/pre-commit, Persistent for drift/sync.
2. **Implement new extractors with regex.** Not AST parsing. Keep them simple and fast.
3. **Return empty vec from extractors on no match.** Never panic or error for missing patterns.
4. **Use structured concept paths.** Format: `scheme://source/path/to/concept`
5. **Add tests for new extractors.** In `src/tests/` with `tempfile::TempDir` for isolation.
6. **Update `roadmap.md` when completing phases.** Keep status accurate.
7. **Use `#[instrument]` on critical path functions.** Walker, extractors, episteme, report.
8. **Log with `tracing` macros.** `info!`, `warn!`, `error!` — not `println!`.
9. **Validate `--sync` requires `--persist`.** This is enforced in `handlers.rs`.
10. **Support multiple report formats.** Table (default), JSON, SARIF 2.1.0, Markdown.

## Do Not

1. **Use `unwrap()` or `expect()` in production code.** Clippy denies these.
2. **Add disk I/O to ephemeral mode.** It must stay fast (~0.25s).
3. **Mix claim extraction with conflict detection.** Separate concerns.
4. **Hardcode concept paths.** Build them programmatically from file context.
5. **Skip the confidence field.** Every claim needs a confidence score (0.0-1.0).
6. **Forget the source file and line.** Extractors must track provenance.
7. **Use `println!` in library code.** Only allowed in CLI binaries (main.rs, handlers.rs).
8. **Ignore SARIF format requirements.** Security tools expect SARIF 2.1.0 compliance.
9. **Break leaf-path matching.** Cross-scheme matching depends on consistent path structure.
10. **Commit without running `cargo clippy --workspace -- -D warnings`.** CI will fail.
11. **Write inline timestamp code.** Use `crate::current_timestamp()` or `crate::current_timestamp_millis()` — never inline `SystemTime::now()` or `Utc::now().timestamp()`. Canonical implementation is in `episteme/corpus.rs`.
12. **Use generic `.map_err(|e| AphoriaError::X(e.to_string()))`.** Always include operation context in error messages. Use `format!("Failed to X at Y: {e}")` pattern instead.

## Decision Points

### Adding a New Extractor

Stop. Questions:
- What languages does this pattern appear in?
- What's the concept path scheme? (`code://lang/project/category/concept`)
- What authoritative source defines the expected value?
- What regex reliably detects this pattern without false positives?

### Modifying Conflict Detection

Stop. Questions:
- Does this change affect ephemeral mode?
- Does it require new indexes in LocalEpisteme?
- How does it interact with existing leaf-path matching?
- What's the performance impact?

### Adding CLI Commands

Stop. Questions:
- Does this command need persistence?
- What's the exit code contract?
- Does it need validation (like `--sync` requires `--persist`)?
- What report format should it output?

## Constraints

**NEVER:**
- Use `unwrap()` or `expect()` in production code
- Add disk I/O to ephemeral scan mode
- Break the 0.25s target for ephemeral scans
- Mutate existing Episteme assertions (append-only)
- Skip Ed25519 signing when creating assertions
- Write inline timestamp code (use `current_timestamp()` from crate root)

**ALWAYS:**
- Run `cargo clippy --workspace -- -D warnings` before commit
- Add tests for new functionality
- Update roadmap.md for completed phases
- Use `#[instrument]` on public methods in critical paths
- Respect .gitignore in walker traversal
- Use `crate::current_timestamp()` for Unix timestamps in seconds
- Use `crate::current_timestamp_millis()` for millisecond precision
- Use context-aware error mapping: `.map_err(|e| AphoriaError::X(format!("Failed to Y: {e}")))`

## Testing Commands

```bash
# Full test suite
cargo test -p aphoria --workspace

# Specific test
cargo test -p aphoria test_ephemeral_scan

# Lint check
cargo clippy -p aphoria -- -D warnings

# Format check
cargo fmt -p aphoria --check

# Quick ephemeral scan (should be ~0.25s)
cargo run -p aphoria -- scan .

# Staged files only (pre-commit mode)
cargo run -p aphoria -- scan --staged --exit-code

# Persistent with sync
cargo run -p aphoria -- scan . --persist --sync

# Export report
cargo run -p aphoria -- scan . --format sarif > report.sarif.json
```

## Common Workflows

### Adding a New Extractor

1. Create `src/extractors/{name}.rs`
2. Implement `Extractor` trait (name, languages, extract)
3. Register in `src/extractors/mod.rs`
4. Add tests in `src/tests/`
5. Update roadmap.md if this completes a phase

### Debugging Conflict Detection

1. Run with `RUST_LOG=aphoria=debug`
2. Check concept path format (must use leaf-path matching)
3. Verify authoritative source exists in corpus
4. Check confidence and tier of both claims
5. Inspect `ConflictTrace` if available

### Pre-Commit Integration

```bash
#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --staged --exit-code
```

Exit codes: 0 (pass), 1 (flag/drift), 2 (block)

## Output Format

When implementing features or fixing bugs, provide:

```
## Summary
[One-line description]

## Changes
- [File]: [What changed]

## Testing
- [How to verify]

## Roadmap Impact
- [Phase affected, if any]
```

## Phase Status Reference

| Phase | Status | Next |
|-------|--------|------|
| 0-3 | Complete | - |
| 4.5 | Complete | Ephemeral mode |
| 4A | Complete | Observation write-back |
| 4B | Complete | Drift detection |
| 4C | Complete | Staged scanning |
| 4D | Planned | Enhanced ack |
| 4E | Planned | Community contribution |
| 5 | Complete | Research agent loop |
| 6 | Complete | Trust Packs |
| 7 | Planned | Declarative extractors |
| A1 | Complete | Observations vs Claims type system |
| A2 | Complete | Claim authoring workflow + CLI |
| A3 | Complete | Verification engine + verify command |
| A4 | Complete | Corpus as assertions + authority lens |
| A5.1 | Complete | Coverage metrics (coverage.rs) |
| A5.2 | Complete | Docs generation (explain.rs + claims_explain) |
| **A5.3** | **Next** | Claim suggester skill (aphoria-suggest) |
| A5.4 | Complete | Onboarding mode (aphoria explain) |

## Related Skills

| Skill | Purpose |
|-------|---------|
| `aphoria-claims` | Review diffs for claimable changes (reactive) |
| `aphoria-suggest` | Suggest claims from patterns + gaps (proactive) |
| `aphoria-self-review` | Evaluate scan quality and noise |
| `aphoria-llm-optimization` | Optimize LLM extraction quality |
| `extract-claims` | Extract claims from prose text |
| `aphoria-install` | Install Aphoria for local dev |