| name | description |
|---|---|
| aphoria-dev | Development guidelines for Aphoria - the code-level truth linter powered by Episteme |
Aphoria Development Skill
You are an expert Aphoria developer. Aphoria is a code-level truth linter that validates code against authoritative sources (RFCs, OWASP, vendor docs). Unlike traditional linters (syntax/style) or SAST tools (vulnerability patterns), Aphoria validates intent against authority using Episteme's probabilistic knowledge graph.
Core Concept
Aphoria extracts implicit claims from code and configs, then checks them against tiered authoritative sources:
| Tier | Source | Example |
|---|---|---|
| 0 | Regulatory | RFC 7519: "JWT audience validation is mandatory" |
| 1 | Clinical | OWASP: "TLS certificate verification required" |
| 2 | Observational | Vendor docs: "Redis timeout should be > 0" |
| 3 | Expert | Team policy: "Our pool size is 50" |
| 4 | Community | Prior observations from this codebase |
Example conflict:
code://rust/myapp/auth/jwt/audience_validation = false
→ Conflicts with rfc://7519/auth/jwt/audience_validation = true (Tier 0, confidence 1.0)
→ Verdict: BLOCK
Principles
1. Claims Over Facts
Aphoria stores claims (assertions with provenance, confidence, timestamps), not absolute facts. Conflicts are normal and resolved via Lenses at query time.
2. Tiered Authority
Lower tier = higher authority. Tier 0 (RFC) outranks Tier 3 (team policy). Conflict scores weight by tier.
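The ordering rule above can be sketched as a small enum. This is an illustration only: the `Tier` enum, `outranks`, and `most_authoritative` are hypothetical names, and the variant names are taken from the tier table rather than from Aphoria's source.

```rust
/// Authority tiers, named after the table above. Lower number = higher authority.
/// Hypothetical type for illustration; not taken from the Aphoria codebase.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Regulatory = 0,
    Clinical = 1,
    Observational = 2,
    Expert = 3,
    Community = 4,
}

/// True when `a` carries strictly more authority than `b`.
/// Derived `Ord` follows declaration order, so a smaller tier compares as "less".
fn outranks(a: Tier, b: Tier) -> bool {
    a < b
}

/// Picks the most authoritative tier among several sources, if any.
fn most_authoritative(tiers: &[Tier]) -> Option<Tier> {
    tiers.iter().copied().min()
}
```

With these definitions, `outranks(Tier::Regulatory, Tier::Expert)` holds, which mirrors "Tier 0 (RFC) outranks Tier 3 (team policy)". How the real conflict score weights each tier is not specified here, so no weighting formula is shown.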
3. Leaf-Path Matching
Cross-scheme matching uses the last two path segments:
code://rust/myapp/tls/cert_verification matches rfc://5246/tls/cert_verification
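A minimal sketch of that comparison, assuming the match key is simply the last two non-empty segments after the scheme. The function names `leaf_key` and `leaf_paths_match` are hypothetical; the real matcher lives in the episteme module and may normalize paths differently.

```rust
/// Returns the last two path segments of a concept path, ignoring the scheme.
/// Returns None when the path has fewer than two segments.
fn leaf_key(concept_path: &str) -> Option<(&str, &str)> {
    // Drop the "scheme://" prefix if present.
    let path = match concept_path.split_once("://") {
        Some((_scheme, rest)) => rest,
        None => concept_path,
    };
    // Walk segments from the right, skipping empty ones left by stray slashes.
    let mut segments = path.rsplit('/').filter(|s| !s.is_empty());
    let leaf = segments.next()?;
    let parent = segments.next()?;
    Some((parent, leaf))
}

/// Two concept paths match across schemes when their leaf keys are equal.
fn leaf_paths_match(a: &str, b: &str) -> bool {
    match (leaf_key(a), leaf_key(b)) {
        (Some(ka), Some(kb)) => ka == kb,
        _ => false,
    }
}
```

Under this sketch the two paths above both reduce to the key `("tls", "cert_verification")`, so they match even though the scheme and leading segments differ.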
4. Ephemeral by Default
Fast (~0.25s) in-memory scans for CI/pre-commit. Use --persist only when drift detection or observation write-back is needed.
5. Non-Blocking Workflow
Conflicts don't fail unless --exit-code is passed. Let developers acknowledge known conflicts with aphoria ack.
Architecture
┌─────────────────────────────────────────────┐
│            Aphoria CLI Pipeline             │
├─────────────────────────────────────────────┤
│ 1. WALK     → Traverse project (respects    │
│               .gitignore)                   │
│ 2. EXTRACT  → Pattern-based claim           │
│               extraction (12 extractors)    │
│ 3. INGEST   → Convert to Episteme           │
│               assertions (BLAKE3+Ed25519)   │
│ 4. CONFLICT → Query for authority matches   │
│               (ConceptIndex + Leaf path)    │
│ 5. REPORT   → Output in multiple formats    │
│ 6. SYNC     → (Optional) Write-back         │
│               observations to local store   │
└─────────────────────────────────────────────┘
Key Modules
| Module | Purpose | Notes |
|---|---|---|
| scan.rs | Main entry; mode dispatch | Core orchestrator |
| walker/ | Project traversal | Respects .gitignore |
| extractors/ | 12 pattern-based extractors | Regex, not AST |
| episteme/ | LocalEpisteme + EphemeralDetector | Conflict detection |
| bridge.rs | ExtractedClaim → Assertion | BLAKE3 + Ed25519 |
| report/ | Table, JSON, SARIF, Markdown | Output formatting |
| policy_ops.rs | Bless, ack, export/import | Trust Pack workflow |
| types/ | ScanArgs, ConflictResult, Verdict | Domain types |
| config.rs | aphoria.toml parsing | Configuration |
Key Types
// From code/config
pub struct ExtractedClaim {
    pub concept_path: String, // e.g., "code://rust/myapp/auth/jwt/aud_validation"
    pub predicate: String,    // e.g., "enabled"
    pub value: ObjectValue,   // true/false/number/text
    pub file: String,         // relative path
    pub line: usize,          // 1-indexed
    pub confidence: f32,      // 0.0-1.0
}

// Conflict detection result
pub struct ConflictResult {
    pub claim: ExtractedClaim,
    pub conflicts: Vec<ConflictingSource>,
    pub conflict_score: f32, // 0.0-1.0
    pub verdict: Verdict,    // Block/Flag/Pass/Ack/Drift
}

// Verdict determination
pub enum Verdict {
    Block, // score >= 0.7 (configurable)
    Flag,  // score >= 0.5 (configurable)
    Pass,  // below thresholds
    Ack,   // acknowledged by user
    Drift, // changed from prior observation
}

// Scan modes
pub enum ScanMode {
    Ephemeral,  // Fast, in-memory (~0.25s)
    Persistent, // Full Episteme stack (~1-2s)
}

// File sources
pub enum FileSource {
    All,    // Entire project
    Staged, // Git-staged files only
}
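The threshold comments on Verdict can be read as a simple score-to-verdict mapping. The sketch below assumes exactly that reading; `Thresholds` and `verdict_for_score` are hypothetical names, and Ack and Drift are left out because they depend on acknowledgements and prior observations rather than on the score.

```rust
/// Score-driven subset of the verdicts listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Block,
    Flag,
    Pass,
}

/// Configurable cutoffs; the defaults mirror the comments in the type listing.
/// Hypothetical struct, not the actual aphoria.toml schema.
struct Thresholds {
    block: f32,
    flag: f32,
}

impl Default for Thresholds {
    fn default() -> Self {
        Thresholds { block: 0.7, flag: 0.5 }
    }
}

/// Maps a conflict score in 0.0-1.0 to a verdict, checking the stricter cutoff first.
fn verdict_for_score(score: f32, thresholds: &Thresholds) -> Verdict {
    if score >= thresholds.block {
        Verdict::Block
    } else if score >= thresholds.flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}
```

Checking the block cutoff before the flag cutoff matters: a score of 0.9 satisfies both comparisons, and the stricter verdict should win.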
Step Back: Before Implementing
Before writing code, challenge your assumptions:
1. Is This Claim Extraction or Detection?
"Am I adding a new extractor (claim extraction) or improving conflict detection?"
- Extractors live in src/extractors/ and implement the Extractor trait
- Detection logic lives in src/episteme/ and uses ConceptIndex
- Don't mix concerns
2. Does This Need Persistence?
"Does this feature require WAL/KV store, or can it work ephemerally?"
- Prefer ephemeral for speed
- Use persistence only for: drift detection, observation write-back, baseline tracking
- --sync requires --persist
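The flag rules above could be checked in one place before a scan starts. This is a sketch under assumptions: `ScanFlags` and `resolve_scan_mode` are hypothetical stand-ins, not the actual ScanArgs type or the validation in handlers.rs.

```rust
/// Mirrors the ScanMode enum from Key Types.
#[derive(Debug, PartialEq)]
enum ScanMode {
    Ephemeral,
    Persistent,
}

/// Hypothetical subset of the CLI flags relevant to mode selection.
#[derive(Debug, Default)]
struct ScanFlags {
    persist: bool,
    sync: bool,
}

/// Rejects --sync without --persist, then picks the cheapest mode that satisfies the flags.
fn resolve_scan_mode(flags: &ScanFlags) -> Result<ScanMode, String> {
    if flags.sync && !flags.persist {
        return Err("--sync requires --persist".to_string());
    }
    Ok(if flags.persist {
        ScanMode::Persistent
    } else {
        ScanMode::Ephemeral
    })
}
```

Returning a Result instead of panicking keeps the check compatible with the no-unwrap rule, and the default flags resolve to the ephemeral mode.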
3. What's the Authority Tier?
"What tier is this authoritative source?"
- Tier 0-2 come from corpus builders (RFC, OWASP, Vendor)
- Tier 3 is team policy (bless/ack commands)
- Tier 4 is community knowledge: prior observations from this codebase (auto-generated from code with no conflicts)
4. Will This Break Fast Scans?
"Does this change affect ephemeral scan performance (~0.25s target)?"
- Avoid disk I/O in ephemeral mode
- Don't load full Episteme stack unless --persist
- Profile before/after
After step back: If unsure, trace through scan.rs to see where your change fits.
Do
- Use the correct scan mode. Ephemeral for CI/pre-commit, Persistent for drift/sync.
- Implement new extractors with regex. Not AST parsing. Keep them simple and fast.
- Return empty vec from extractors on no match. Never panic or error for missing patterns.
- Use structured concept paths. Format: scheme://source/path/to/concept
- Add tests for new extractors. In src/tests/ with tempfile::TempDir for isolation.
- Update roadmap.md when completing phases. Keep status accurate.
- Use #[instrument] on critical path functions. Walker, extractors, episteme, report.
- Log with tracing macros. info!, warn!, error!, not println!.
- Validate that --sync requires --persist. This is enforced in handlers.rs.
- Support multiple report formats. Table (default), JSON, SARIF 2.1.0, Markdown.
Do Not
- Use unwrap() or expect() in production code. Clippy denies these.
- Add disk I/O to ephemeral mode. It must stay fast (~0.25s).
- Mix claim extraction with conflict detection. Separate concerns.
- Hardcode concept paths. Build them programmatically from file context.
- Skip the confidence field. Every claim needs a confidence score (0.0-1.0).
- Forget the source file and line. Extractors must track provenance.
- Use println! in library code. Only allowed in CLI binaries (main.rs, handlers.rs).
- Ignore SARIF format requirements. Security tools expect SARIF 2.1.0 compliance.
- Break leaf-path matching. Cross-scheme matching depends on consistent path structure.
- Commit without running cargo clippy --workspace -- -D warnings. CI will fail.
Decision Points
Adding a New Extractor
Stop. Questions:
- What languages does this pattern appear in?
- What's the concept path scheme? (code://lang/project/category/concept)
- What authoritative source defines the expected value?
- What regex reliably detects this pattern without false positives?
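For the concept path question, one way to avoid hardcoded paths is to assemble them from file context. The sketch below assumes the language segment can be derived from the file extension; `language_for` and `code_concept_path` are hypothetical helpers, and the extension table is deliberately tiny.

```rust
use std::path::Path;

/// Maps a file extension to the language segment of a code:// path.
/// Illustrative table only; a real extractor would cover more languages.
fn language_for(file: &str) -> Option<&'static str> {
    match Path::new(file).extension()?.to_str()? {
        "rs" => Some("rust"),
        "py" => Some("python"),
        "go" => Some("go"),
        _ => None,
    }
}

/// Builds code://lang/project/category/concept from its parts.
/// `category` may hold several segments, e.g. ["auth", "jwt"].
fn code_concept_path(
    project: &str,
    file: &str,
    category: &[&str],
    concept: &str,
) -> Option<String> {
    let lang = language_for(file)?;
    let mut segments = vec![lang, project];
    segments.extend_from_slice(category);
    segments.push(concept);
    Some(format!("code://{}", segments.join("/")))
}
```

Keeping the concept as the final segment and the innermost category right before it preserves the last-two-segment structure that leaf-path matching relies on.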
Modifying Conflict Detection
Stop. Questions:
- Does this change affect ephemeral mode?
- Does it require new indexes in LocalEpisteme?
- How does it interact with existing leaf-path matching?
- What's the performance impact?
Adding CLI Commands
Stop. Questions:
- Does this command need persistence?
- What's the exit code contract?
- Does it need validation (like --sync requires --persist)?
- What report format should it output?
Constraints
NEVER:
- Use unwrap() or expect() in production code
- Add disk I/O to ephemeral scan mode
- Break the 0.25s target for ephemeral scans
- Mutate existing Episteme assertions (append-only)
- Skip Ed25519 signing when creating assertions
ALWAYS:
- Run cargo clippy --workspace -- -D warnings before commit
- Add tests for new functionality
- Update roadmap.md for completed phases
- Use #[instrument] on public methods in critical paths
- Respect .gitignore in walker traversal
Testing Commands
# Full test suite
cargo test -p aphoria --workspace
# Specific test
cargo test -p aphoria test_ephemeral_scan
# Lint check
cargo clippy -p aphoria -- -D warnings
# Format check
cargo fmt -p aphoria --check
# Quick ephemeral scan (should be ~0.25s)
cargo run -p aphoria -- scan .
# Staged files only (pre-commit mode)
cargo run -p aphoria -- scan --staged --exit-code
# Persistent with sync
cargo run -p aphoria -- scan . --persist --sync
# Export report
cargo run -p aphoria -- scan . --format sarif > report.sarif.json
Common Workflows
Adding a New Extractor
- Create src/extractors/{name}.rs
- Implement Extractor trait (name, languages, extract)
- Register in src/extractors/mod.rs
- Add tests in src/tests/
- Update roadmap.md if this completes a phase
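As a rough picture of the second step, the sketch below shows what a small extractor might look like. Everything here is an assumption for illustration: the trait signature is guessed from the three method names above, ObjectValue is reduced to one variant, the confidence value is arbitrary, and plain substring search stands in for the regex matching that real extractors use so the example stays dependency-free.

```rust
/// Simplified stand-in for the real ObjectValue (true/false/number/text).
#[derive(Debug, Clone, PartialEq)]
enum ObjectValue {
    Bool(bool),
}

/// Same fields as ExtractedClaim in Key Types.
#[derive(Debug, Clone, PartialEq)]
struct ExtractedClaim {
    concept_path: String,
    predicate: String,
    value: ObjectValue,
    file: String,
    line: usize,
    confidence: f32,
}

/// Hypothetical trait shape; the real signature may differ.
trait Extractor {
    fn name(&self) -> &'static str;
    fn languages(&self) -> &'static [&'static str];
    fn extract(&self, project: &str, file: &str, content: &str) -> Vec<ExtractedClaim>;
}

/// Flags HTTP clients built with certificate verification turned off.
struct TlsVerifyExtractor;

impl Extractor for TlsVerifyExtractor {
    fn name(&self) -> &'static str {
        "tls_verify"
    }

    fn languages(&self) -> &'static [&'static str] {
        &["rust"]
    }

    fn extract(&self, project: &str, file: &str, content: &str) -> Vec<ExtractedClaim> {
        content
            .lines()
            .enumerate()
            // Substring check in place of a regex; no match yields an empty Vec.
            .filter(|(_, text)| text.contains("danger_accept_invalid_certs(true)"))
            .map(|(idx, _)| ExtractedClaim {
                concept_path: format!("code://rust/{project}/tls/cert_verification"),
                predicate: "enabled".to_string(),
                value: ObjectValue::Bool(false),
                file: file.to_string(),
                line: idx + 1, // claims use 1-indexed lines
                confidence: 0.9, // illustrative value, not a calibrated score
            })
            .collect()
    }
}
```

The extractor never errors on input it does not recognize, and each claim carries its file and line so provenance survives into the report.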
Debugging Conflict Detection
- Run with RUST_LOG=aphoria=debug
- Check concept path format (must use leaf-path matching)
- Verify authoritative source exists in corpus
- Check confidence and tier of both claims
- Inspect ConflictTrace if available
Pre-Commit Integration
#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --staged --exit-code
Exit codes: 0 (pass), 1 (flag/drift), 2 (block)
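The exit code contract can be sketched as a reduction over all verdicts in a scan. This assumes the worst verdict decides the code and that acknowledged conflicts count as passing; both are assumptions, and `exit_code` is a hypothetical helper rather than the CLI's actual handler.

```rust
/// Mirrors the Verdict enum from Key Types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Verdict {
    Block,
    Flag,
    Pass,
    Ack,
    Drift,
}

/// Computes the process exit code for a scan.
/// Without --exit-code the scan never fails, matching the non-blocking default.
fn exit_code(verdicts: &[Verdict], exit_code_flag: bool) -> i32 {
    if !exit_code_flag {
        return 0;
    }
    verdicts
        .iter()
        .map(|v| match v {
            Verdict::Block => 2,
            Verdict::Flag | Verdict::Drift => 1,
            // Assumption: acknowledged conflicts are treated like a pass.
            Verdict::Pass | Verdict::Ack => 0,
        })
        .max()
        .unwrap_or(0) // an empty scan passes
}
```

A single Block among many passing claims yields 2, so the pre-commit hook above would abort the commit only in that case or when a Flag or Drift returns 1.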
Output Format
When implementing features or fixing bugs, provide:
## Summary
[One-line description]
## Changes
- [File]: [What changed]
## Testing
- [How to verify]
## Roadmap Impact
- [Phase affected, if any]
Phase Status Reference
| Phase | Status | Feature |
|---|---|---|
| 0-3 | Complete | - |
| 4.5 | Complete | Ephemeral mode |
| 4A | Complete | Observation write-back |
| 4B | Complete | Drift detection |
| 4C | Complete | Staged scanning |
| 4D | Next | Enhanced ack |
| 4E | Planned | Community contribution |
| 5 | Complete | Research agent loop |
| 6 | Complete | Trust Packs |
| 7 | Planned | Declarative extractors |