---
name: aphoria-dev
description: Development guidelines for Aphoria - the code-level truth linter powered by Episteme
---

# Aphoria Development Skill

You are an expert Aphoria developer. Aphoria is a **code-level truth linter** that validates code against authoritative sources (RFCs, OWASP, vendor docs). Unlike traditional linters (syntax/style) or SAST tools (vulnerability patterns), Aphoria validates **intent against authority** using Episteme's probabilistic knowledge graph.

## Core Concept

Aphoria extracts **implicit claims** from code and configs, then checks them against **tiered authoritative sources**:

| Tier | Tier name | Example |
|------|-----------|---------|
| 0 | Regulatory | RFC 7519: "JWT audience validation is mandatory" |
| 1 | Clinical | OWASP: "TLS certificate verification required" |
| 2 | Observational | Vendor docs: "Redis timeout should be > 0" |
| 3 | Expert | Team policy: "Our pool size is 50" |
| 4 | Community | Prior observations from this codebase |

**Example conflict:**

```
code://rust/myapp/auth/jwt/audience_validation = false
  → Conflicts with rfc://7519/auth/jwt/audience_validation = true (Tier 0, confidence 1.0)
  → Verdict: BLOCK
```

## Principles

### 1. Claims Over Facts

Aphoria stores **claims** (assertions with provenance, confidence, timestamps), not absolute facts. Conflicts are normal and resolved via Lenses at query time.

### 2. Tiered Authority

Lower tier = higher authority. Tier 0 (RFC) outranks Tier 3 (team policy). Conflict scores are weighted by tier.

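To make "lower tier = higher authority" concrete, here is a minimal sketch of picking the winning source among several that disagree. `Tier`, `Source`, and `most_authoritative` are hypothetical names (the variants follow the tier table in Core Concept), and breaking ties by higher confidence is an assumption, not Aphoria's documented scoring rule.

```rust
/// Hypothetical mirror of the five authority tiers; the real type may differ.
/// Deriving `Ord` orders variants by declaration, so Regulatory < Community.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Regulatory = 0,
    Clinical = 1,
    Observational = 2,
    Expert = 3,
    Community = 4,
}

/// Minimal stand-in for an authoritative source attached to a conflict.
#[derive(Debug, Clone)]
pub struct Source {
    pub concept_path: String,
    pub tier: Tier,
    pub confidence: f32, // 0.0-1.0
}

/// Returns the most authoritative source: the lowest tier wins, and
/// (assumption) ties are broken in favour of higher confidence.
pub fn most_authoritative(sources: &[Source]) -> Option<&Source> {
    sources.iter().min_by(|a, b| {
        a.tier
            .cmp(&b.tier)
            // Reversed comparison so the higher confidence sorts first.
            .then_with(|| b.confidence.total_cmp(&a.confidence))
    })
}
```

Because the enum derives `Ord` in declaration order, `Tier::Regulatory < Tier::Expert`, so `min_by` on tier selects a Tier 0 source over a Tier 3 one.
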
### 3. Leaf-Path Matching

Cross-scheme matching compares the last two path segments. For example, `code://rust/myapp/tls/cert_verification` matches `rfc://5246/tls/cert_verification` because both end in `tls/cert_verification`.

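A minimal sketch of the leaf-path rule, assuming concept paths are `scheme://` URIs with `/`-separated segments. `leaf` and `leaf_match` are illustrative names, not Aphoria's API; the real detector combines this kind of comparison with ConceptIndex lookups.

```rust
/// Returns the last `n` segments of a concept path such as
/// `code://rust/myapp/tls/cert_verification`, ignoring the scheme.
fn leaf(path: &str, n: usize) -> Option<Vec<&str>> {
    // Drop the `scheme://` prefix if present.
    let rest = path.split_once("://").map_or(path, |(_, r)| r);
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    if segments.len() < n {
        return None; // Too short to have a two-segment leaf.
    }
    Some(segments[segments.len() - n..].to_vec())
}

/// Two concept paths match across schemes when their last two segments agree.
pub fn leaf_match(a: &str, b: &str) -> bool {
    match (leaf(a, 2), leaf(b, 2)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```

Comparing only the leaf is what lets a `code://rust/myapp/...` path line up with an `rfc://5246/...` path even though the schemes and prefixes differ. It is also why renaming a final segment silently breaks matching.
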
### 4. Ephemeral by Default

Fast (~0.25s) in-memory scans for CI/pre-commit. Use `--persist` only when drift detection or observation write-back is needed.

### 5. Non-Blocking Workflow

Conflicts don't fail the run unless `--exit-code` is passed. Let developers acknowledge known conflicts with `aphoria ack`.

## Architecture

```
┌─────────────────────────────────────────────┐
│            Aphoria CLI Pipeline             │
├─────────────────────────────────────────────┤
│ 1. WALK     → Traverse project (respects    │
│               .gitignore)                   │
│ 2. EXTRACT  → Pattern-based claim           │
│               extraction (12 extractors)    │
│ 3. INGEST   → Convert to Episteme           │
│               assertions (BLAKE3+Ed25519)   │
│ 4. CONFLICT → Query for authority matches   │
│               (ConceptIndex + Leaf path)    │
│ 5. REPORT   → Output in multiple formats    │
│ 6. SYNC     → (Optional) Write-back         │
│               observations to local store   │
└─────────────────────────────────────────────┘
```

## Key Modules

| Module | Purpose | Notes |
|--------|---------|-------|
| `scan.rs` | Main entry; mode dispatch | Core orchestrator |
| `walker/` | Project traversal | Respects .gitignore |
| `extractors/` | 12 pattern-based extractors | Regex, not AST |
| `episteme/` | LocalEpisteme + EphemeralDetector | Conflict detection |
| `bridge.rs` | ExtractedClaim → Assertion | BLAKE3 + Ed25519 |
| `report/` | Table, JSON, SARIF, Markdown | Output formatting |
| `policy_ops.rs` | Bless, ack, export/import | Trust Pack workflow |
| `types/` | ScanArgs, ConflictResult, Verdict | Domain types |
| `config.rs` | aphoria.toml parsing | Configuration |

## Key Types

```rust
// Excerpt: `ObjectValue` and `ConflictingSource` are defined elsewhere (not shown).

// From code/config
pub struct ExtractedClaim {
    pub concept_path: String, // e.g., "code://rust/myapp/auth/jwt/audience_validation"
    pub predicate: String,    // e.g., "enabled"
    pub value: ObjectValue,   // true/false/number/text
    pub file: String,         // relative path
    pub line: usize,          // 1-indexed
    pub confidence: f32,      // 0.0-1.0
}

// Conflict detection result
pub struct ConflictResult {
    pub claim: ExtractedClaim,
    pub conflicts: Vec<ConflictingSource>,
    pub conflict_score: f32, // 0.0-1.0
    pub verdict: Verdict,    // Block/Flag/Pass/Ack/Drift
}

// Verdict determination
pub enum Verdict {
    Block, // score >= 0.7 (configurable)
    Flag,  // score >= 0.5 (configurable)
    Pass,  // below thresholds
    Ack,   // acknowledged by user
    Drift, // changed from prior observation
}

// Scan modes
pub enum ScanMode {
    Ephemeral,  // Fast, in-memory (~0.25s)
    Persistent, // Full Episteme stack (~1-2s)
}

// File sources
pub enum FileSource {
    All,    // Entire project
    Staged, // Git-staged files only
}
```

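The threshold comments on `Verdict` imply a simple cascade from conflict score to verdict. A sketch under those defaults (0.7 and 0.5): `Thresholds`, `ScoreVerdict`, and `verdict_for` are hypothetical, and the sketch covers only the score-driven verdicts, since `Ack` and `Drift` depend on user acknowledgements and persisted history rather than the score.

```rust
/// Hypothetical subset of `Verdict`: only the variants a score can produce.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScoreVerdict {
    Block,
    Flag,
    Pass,
}

/// Configurable cut-offs; defaults follow the comments on `Verdict`.
pub struct Thresholds {
    pub block: f32,
    pub flag: f32,
}

impl Default for Thresholds {
    fn default() -> Self {
        Self { block: 0.7, flag: 0.5 }
    }
}

/// Checks the stricter threshold first so a 0.9 score is Block, not Flag.
pub fn verdict_for(score: f32, t: &Thresholds) -> ScoreVerdict {
    if score >= t.block {
        ScoreVerdict::Block
    } else if score >= t.flag {
        ScoreVerdict::Flag
    } else {
        ScoreVerdict::Pass
    }
}
```
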
## Step Back: Before Implementing

Before writing code, challenge your assumptions:

### 1. Is This Claim Extraction or Detection?

> "Am I adding a new extractor (claim extraction) or improving conflict detection?"

- Extractors live in `src/extractors/` and implement the `Extractor` trait
- Detection logic lives in `src/episteme/` and uses ConceptIndex
- Don't mix concerns

### 2. Does This Need Persistence?

> "Does this feature require WAL/KV store, or can it work ephemerally?"

- Prefer ephemeral for speed
- Use persistence only for: drift detection, observation write-back, baseline tracking
- `--sync` requires `--persist`

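A sketch of the `--sync`/`--persist` rule as a pre-scan check. The source says the rule is enforced in `handlers.rs`; `ScanFlags` and `validate_scan_flags` here are hypothetical stand-ins, and the real `ScanArgs` fields and error type may differ.

```rust
/// Hypothetical subset of the scan flags relevant to this check.
pub struct ScanFlags {
    pub persist: bool,
    pub sync: bool,
}

/// Rejects `--sync` without `--persist`: observation write-back
/// needs the persistent store, which ephemeral mode never opens.
pub fn validate_scan_flags(flags: &ScanFlags) -> Result<(), String> {
    if flags.sync && !flags.persist {
        return Err(
            "`--sync` requires `--persist`: write-back needs the persistent store".to_string(),
        );
    }
    Ok(())
}
```

Running the check before any walking or extraction keeps the failure cheap and the message specific.
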
### 3. What's the Authority Tier?

> "What tier is this authoritative source?"

- Tier 0-2 come from corpus builders (RFC, OWASP, Vendor)
- Tier 3 is team policy (bless/ack commands)
- Tier 4 (Community) is observational: auto-generated from code with no conflicts

### 4. Will This Break Fast Scans?

> "Does this change affect ephemeral scan performance (~0.25s target)?"

- Avoid disk I/O in ephemeral mode
- Don't load the full Episteme stack unless `--persist` is set
- Profile before and after the change

**After step back:** If unsure, trace through `scan.rs` to see where your change fits.

## Do

1. **Use the correct scan mode.** Ephemeral for CI/pre-commit, Persistent for drift/sync.
2. **Implement new extractors with regex, not AST parsing.** Keep them simple and fast.
3. **Return an empty vec from extractors on no match.** Never panic or return an error for missing patterns.
4. **Use structured concept paths.** Format: `scheme://source/path/to/concept`
5. **Add tests for new extractors.** Put them in `src/tests/` and use `tempfile::TempDir` for isolation.
6. **Update `roadmap.md` when completing phases.** Keep status accurate.
7. **Use `#[instrument]` on critical-path functions.** Walker, extractors, episteme, report.
8. **Log with `tracing` macros.** Use `info!`, `warn!`, `error!`, not `println!`.
9. **Validate that `--sync` requires `--persist`.** This is enforced in `handlers.rs`.
10. **Support multiple report formats.** Table (default), JSON, SARIF 2.1.0, Markdown.

## Do Not

1. **Use `unwrap()` or `expect()` in production code.** Clippy denies these.
2. **Add disk I/O to ephemeral mode.** It must stay fast (~0.25s).
3. **Mix claim extraction with conflict detection.** Keep the concerns separate.
4. **Hardcode concept paths.** Build them programmatically from file context.
5. **Skip the confidence field.** Every claim needs a confidence score (0.0-1.0).
6. **Forget the source file and line.** Extractors must track provenance.
7. **Use `println!` in library code.** It is only allowed in the CLI layer (`main.rs`, `handlers.rs`).
8. **Ignore SARIF format requirements.** Security tools expect SARIF 2.1.0 compliance.
9. **Break leaf-path matching.** Cross-scheme matching depends on consistent path structure.
10. **Commit without running `cargo clippy --workspace -- -D warnings`.** CI will fail.
11. **Write inline timestamp code.** Use `crate::current_timestamp()` or `crate::current_timestamp_millis()`; never inline `SystemTime::now()` or `Utc::now().timestamp()`. The canonical implementation is in `episteme/corpus.rs`.
12. **Use generic `.map_err(|e| AphoriaError::X(e.to_string()))`.** Always include operation context in error messages; use the `format!("Failed to X at Y: {e}")` pattern instead.

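Rule 12 in practice. This sketch uses a hypothetical single-variant `AphoriaError` (the real enum's variants are not shown here) and a hypothetical `read_config` to illustrate the `format!("Failed to X at Y: {e}")` pattern: the message names the operation and the path, then appends the underlying error.

```rust
use std::fmt;
use std::fs;
use std::path::Path;

/// Hypothetical stand-in for `AphoriaError`; the real variants may differ.
#[derive(Debug)]
pub enum AphoriaError {
    Config(String),
}

impl fmt::Display for AphoriaError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AphoriaError::Config(msg) => write!(f, "config error: {msg}"),
        }
    }
}

/// Reads a config file, recording what failed and where instead of
/// forwarding only the bare `io::Error` text.
pub fn read_config(path: &Path) -> Result<String, AphoriaError> {
    fs::read_to_string(path).map_err(|e| {
        AphoriaError::Config(format!("Failed to read config at {}: {e}", path.display()))
    })
}
```

With a bare `e.to_string()`, the caller would see only the OS error text (on Linux, for example, "No such file or directory (os error 2)") with no hint of which file or step failed.
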
## Decision Points

### Adding a New Extractor

Stop. Questions:
- What languages does this pattern appear in?
- What's the concept path scheme? (`code://lang/project/category/concept`)
- What authoritative source defines the expected value?
- What regex reliably detects this pattern without false positives?

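For the concept path question, a sketch of assembling a `code://lang/project/category/concept` path from extraction context rather than hardcoding the full string. `code_concept_path` is a hypothetical helper; where the language, project, and category values come from is up to the extractor.

```rust
/// Builds `code://lang/project/category.../concept` from its parts.
/// `category` may hold several segments (e.g. `["auth", "jwt"]`).
pub fn code_concept_path(lang: &str, project: &str, category: &[&str], concept: &str) -> String {
    let mut parts = vec![lang, project];
    parts.extend_from_slice(category);
    parts.push(concept);
    format!("code://{}", parts.join("/"))
}
```

Keeping the final two segments (`category` tail plus `concept`) aligned with the authoritative paths is what preserves leaf-path matching.
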
### Modifying Conflict Detection

Stop. Questions:
- Does this change affect ephemeral mode?
- Does it require new indexes in LocalEpisteme?
- How does it interact with existing leaf-path matching?
- What's the performance impact?

### Adding CLI Commands

Stop. Questions:
- Does this command need persistence?
- What's the exit code contract?
- Does it need validation (like `--sync` requires `--persist`)?
- What report format should it output?

## Constraints

**NEVER:**
- Use `unwrap()` or `expect()` in production code
- Add disk I/O to ephemeral scan mode
- Break the 0.25s target for ephemeral scans
- Mutate existing Episteme assertions (append-only)
- Skip Ed25519 signing when creating assertions
- Write inline timestamp code (use `current_timestamp()` from crate root)

**ALWAYS:**
- Run `cargo clippy --workspace -- -D warnings` before commit
- Add tests for new functionality
- Update roadmap.md for completed phases
- Use `#[instrument]` on public methods in critical paths
- Respect .gitignore in walker traversal
- Use `crate::current_timestamp()` for Unix timestamps in seconds
- Use `crate::current_timestamp_millis()` for millisecond precision
- Use context-aware error mapping: `.map_err(|e| AphoriaError::X(format!("Failed to Y: {e}")))`

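The timestamp rules exist so that clock handling is decided in one place. The canonical helpers live in `episteme/corpus.rs` and are not reproduced here; the sketch below is a hypothetical version that assumes `u64` seconds and `u128` milliseconds, and returns 0 instead of panicking if the system clock reads earlier than the Unix epoch.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Sketch of a crate-root helper: seconds since the Unix epoch.
/// A pre-1970 clock yields 0 rather than a panic (no `unwrap()`).
pub fn current_timestamp() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

/// Millisecond-precision variant with the same fallback.
pub fn current_timestamp_millis() -> u128 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis())
        .unwrap_or(0)
}
```

Centralising the fallback means every caller gets the same behaviour on a skewed clock, which scattered `SystemTime::now()` calls cannot guarantee.
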
## Testing Commands

```bash
# Full test suite
cargo test -p aphoria --workspace

# Specific test
cargo test -p aphoria test_ephemeral_scan

# Lint check
cargo clippy -p aphoria -- -D warnings

# Format check
cargo fmt -p aphoria --check

# Quick ephemeral scan (should be ~0.25s)
cargo run -p aphoria -- scan .

# Staged files only (pre-commit mode)
cargo run -p aphoria -- scan --staged --exit-code

# Persistent with sync
cargo run -p aphoria -- scan . --persist --sync

# Export report
cargo run -p aphoria -- scan . --format sarif > report.sarif.json
```

## Common Workflows

### Adding a New Extractor

1. Create `src/extractors/{name}.rs`
2. Implement the `Extractor` trait (`name`, `languages`, `extract`)
3. Register it in `src/extractors/mod.rs`
4. Add tests in `src/tests/`
5. Update roadmap.md if this completes a phase

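A hedged sketch of steps 1-2 for a toy extractor. The trait shape (`name`, `languages`, `extract`) follows the step list, but the exact signatures, the simplified `Observation` struct, and `TlsVerifyExtractor` are assumptions. It matches reqwest's `danger_accept_invalid_certs(true)` with a plain substring so the example has no dependencies; a real extractor would use a regex as the Do list requires. The 0.9 confidence is illustrative.

```rust
/// Hypothetical, simplified observation; the real type carries an
/// `ObjectValue` instead of `bool`.
#[derive(Debug, Clone, PartialEq)]
pub struct Observation {
    pub concept_path: String,
    pub predicate: String,
    pub value: bool,
    pub file: String,     // relative path
    pub line: usize,      // 1-indexed
    pub confidence: f32,  // 0.0-1.0
}

/// Assumed shape of the `Extractor` trait.
pub trait Extractor {
    fn name(&self) -> &'static str;
    fn languages(&self) -> &'static [&'static str];
    fn extract(&self, project: &str, file: &str, content: &str) -> Vec<Observation>;
}

/// Reports Rust code that disables TLS certificate verification.
pub struct TlsVerifyExtractor;

impl Extractor for TlsVerifyExtractor {
    fn name(&self) -> &'static str {
        "tls_verify"
    }

    fn languages(&self) -> &'static [&'static str] {
        &["rust"]
    }

    /// No match yields an empty `Vec`; this never panics or errors.
    fn extract(&self, project: &str, file: &str, content: &str) -> Vec<Observation> {
        content
            .lines()
            .enumerate()
            // Substring stand-in for a compiled regex.
            .filter(|(_, line)| line.contains("danger_accept_invalid_certs(true)"))
            .map(|(idx, _)| Observation {
                concept_path: format!("code://rust/{project}/tls/cert_verification"),
                predicate: "enabled".to_string(),
                value: false, // verification is turned off
                file: file.to_string(),
                line: idx + 1, // `enumerate` is 0-based
                confidence: 0.9,
            })
            .collect()
    }
}
```

The sketch follows the Do/Do Not rules: no match returns an empty `Vec`, every observation carries file, 1-indexed line, and confidence, and the concept path is assembled from the project name.
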
### Debugging Conflict Detection

1. Run with `RUST_LOG=aphoria=debug`
2. Check the concept path format (the last two segments must line up for leaf-path matching)
3. Verify the authoritative source exists in the corpus
4. Check the confidence and tier of both claims
5. Inspect `ConflictTrace` if available

### Pre-Commit Integration

```bash
#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --staged --exit-code
```

Exit codes (with `--exit-code`): 0 (pass), 1 (flag/drift), 2 (block)

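A sketch of how verdicts could map to these exit codes. `Verdict` mirrors the enum under Key Types; `exit_code` is a hypothetical function, and treating `Ack` as passing is an assumption.

```rust
/// Mirror of the `Verdict` enum from Key Types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
    Drift,
}

/// Worst verdict wins: 2 if anything blocks, 1 if anything is flagged or
/// drifted, else 0. Without `--exit-code`, conflicts never produce a
/// non-zero code.
pub fn exit_code(verdicts: &[Verdict], exit_code_flag: bool) -> i32 {
    if !exit_code_flag {
        return 0;
    }
    if verdicts.contains(&Verdict::Block) {
        2
    } else if verdicts
        .iter()
        .any(|v| matches!(v, Verdict::Flag | Verdict::Drift))
    {
        1
    } else {
        0 // only Pass / Ack (assumption: Ack counts as passing)
    }
}
```

A caller would pass the result to `std::process::exit`. Taking the worst verdict means a single BLOCK outranks any number of flags.
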
## Output Format

When implementing features or fixing bugs, provide:

```
## Summary
[One-line description]

## Changes
- [File]: [What changed]

## Testing
- [How to verify]

## Roadmap Impact
- [Phase affected, if any]
```

## Phase Status Reference

| Phase | Status | Scope |
|-------|--------|-------|
| 0-3 | Complete | - |
| 4.5 | Complete | Ephemeral mode |
| 4A | Complete | Observation write-back |
| 4B | Complete | Drift detection |
| 4C | Complete | Staged scanning |
| 4D | Planned | Enhanced ack |
| 4E | Planned | Community contribution |
| 5 | Complete | Research agent loop |
| 6 | Complete | Trust Packs |
| 7 | Planned | Declarative extractors |
| A1 | Complete | Observations vs Claims type system |
| A2 | Complete | Claim authoring workflow + CLI |
| A3 | Complete | Verification engine + verify command |
| A4 | Complete | Corpus as assertions + authority lens |
| A5.1 | Complete | Coverage metrics (coverage.rs) |
| A5.2 | Complete | Docs generation (explain.rs + claims_explain) |
| **A5.3** | **Next** | Claim suggester skill (aphoria-suggest) |
| A5.4 | Complete | Onboarding mode (aphoria explain) |

## Related Skills

| Skill | Purpose |
|-------|---------|
| `aphoria-claims` | Review diffs for claimable changes (reactive) |
| `aphoria-suggest` | Suggest claims from patterns + gaps (proactive) |
| `aphoria-self-review` | Evaluate scan quality and noise |
| `aphoria-llm-optimization` | Optimize LLM extraction quality |
| `extract-claims` | Extract claims from prose text |
| `aphoria-install` | Install Aphoria for local dev |