jordan 8f6506b70a feat: Aphoria scan modes + stemedb-ontology crate + consumer health UAT

Major additions:
- Staged scanning modes (working tree, staged, committed) with git integration
- Drift detection for baseline vs current state comparisons
- Hosted API handlers for policy CRUD operations via StemeDB API
- stemedb-ontology crate with domain definitions and medical extractors
- Consumer health vertical UAT scenarios (GLP-1, gastroparesis, etc.)
- Aphoria development skill documentation

Code organization:
- Split large files into focused modules to stay under the 500-line limit
- Extracted config tests, episteme helpers/drift/aliases, API helpers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 21:57:33 -07:00


name	description
aphoria-dev	Development guidelines for Aphoria - the code-level truth linter powered by Episteme

Aphoria Development Skill

You are an expert Aphoria developer. Aphoria is a code-level truth linter that validates code against authoritative sources (RFCs, OWASP, vendor docs). Unlike traditional linters (syntax/style) or SAST tools (vulnerability patterns), Aphoria validates intent against authority using Episteme's probabilistic knowledge graph.

Core Concept

Aphoria extracts implicit claims from code and configs, then checks them against tiered authoritative sources:

Tier	Source	Example
0	Regulatory	RFC 7519: "JWT audience validation is mandatory"
1	Clinical	OWASP: "TLS certificate verification required"
2	Observational	Vendor docs: "Redis timeout should be > 0"
3	Expert	Team policy: "Our pool size is 50"
4	Community	Prior observations from this codebase

Example conflict:

code://rust/myapp/auth/jwt/audience_validation = false
→ Conflicts with rfc://7519/auth/jwt/audience_validation = true (Tier 0, confidence 1.0)
→ Verdict: BLOCK

Principles

1. Claims Over Facts

Aphoria stores claims (assertions with provenance, confidence, timestamps), not absolute facts. Conflicts are normal and resolved via Lenses at query time.

2. Tiered Authority

Lower tier = higher authority. Tier 0 (RFC) outranks Tier 3 (team policy). Conflict scores are weighted by tier.
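The "lower tier wins" rule can be sketched as choosing the most authoritative source among several conflicting ones. This is a minimal sketch that assumes a hypothetical `AuthoritySource` struct with `tier` and `confidence` fields (not Aphoria's real type), and breaks ties by confidence. It does not reproduce how Aphoria actually weights conflict scores by tier.

```rust
/// Hypothetical stand-in for an authoritative source (not Aphoria's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct AuthoritySource {
    pub uri: String,
    pub tier: u8,        // 0 = RFC ... 4 = community observation
    pub confidence: f32, // 0.0-1.0
}

/// Returns the most authoritative source: lowest tier first,
/// then highest confidence as a tie-breaker. None if the slice is empty.
pub fn most_authoritative(sources: &[AuthoritySource]) -> Option<&AuthoritySource> {
    sources.iter().min_by(|a, b| {
        a.tier
            .cmp(&b.tier)
            // Compare b to a so that higher confidence sorts first.
            .then_with(|| b.confidence.total_cmp(&a.confidence))
    })
}
```

`f32::total_cmp` is used because `f32` is not `Ord`; it gives a total order without `unwrap()`.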

3. Leaf-Path Matching

Cross-scheme matching uses the last two path segments. Both paths below end in tls/cert_verification, so:

code://rust/myapp/tls/cert_verification matches
rfc://5246/tls/cert_verification
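A minimal sketch of leaf-path matching, assuming paths take the form `scheme://segment/.../segment`. The helpers `leaf_key` and `leaf_paths_match` are hypothetical; Aphoria's real matcher (ConceptIndex plus leaf path) may normalize paths differently.

```rust
/// Returns the last two path segments of a concept URI, e.g.
/// "code://rust/myapp/tls/cert_verification" -> Some(("tls", "cert_verification")).
pub fn leaf_key(concept_path: &str) -> Option<(&str, &str)> {
    // Drop the "scheme://" prefix if present.
    let path = concept_path
        .split_once("://")
        .map_or(concept_path, |(_, rest)| rest);
    // Walk segments from the end, skipping empties from trailing slashes.
    let mut segments = path.rsplit('/').filter(|s| !s.is_empty());
    let leaf = segments.next()?;
    let parent = segments.next()?;
    Some((parent, leaf))
}

/// Two concept paths match across schemes when their last two segments agree.
pub fn leaf_paths_match(a: &str, b: &str) -> bool {
    match (leaf_key(a), leaf_key(b)) {
        (Some(ka), Some(kb)) => ka == kb,
        _ => false,
    }
}
```

This is why consistent leaf names matter: `aud_validation` and `audience_validation` would never match.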

4. Ephemeral by Default

Fast (~0.25s) in-memory scans for CI/pre-commit. Use --persist only when drift detection or observation write-back is needed.

5. Non-Blocking Workflow

Conflicts don't fail the scan unless --exit-code is passed. Let developers acknowledge known conflicts with aphoria ack.

Architecture

┌─────────────────────────────────────────────┐
│           Aphoria CLI Pipeline              │
├─────────────────────────────────────────────┤
│  1. WALK      → Traverse project (respects  │
│                 .gitignore)                 │
│  2. EXTRACT   → Pattern-based claim         │
│                 extraction (12 extractors)  │
│  3. INGEST    → Convert to Episteme         │
│                 assertions (BLAKE3+Ed25519) │
│  4. CONFLICT  → Query for authority matches │
│                 (ConceptIndex + Leaf path)  │
│  5. REPORT    → Output in multiple formats  │
│  6. SYNC      → (Optional) Write-back       │
│                 observations to local store │
└─────────────────────────────────────────────┘

Key Modules

Module	Purpose	Notes
`scan.rs`	Main entry; mode dispatch	Core orchestrator
`walker/`	Project traversal	Respects .gitignore
`extractors/`	12 pattern-based extractors	Regex, not AST
`episteme/`	LocalEpisteme + EphemeralDetector	Conflict detection
`bridge.rs`	ExtractedClaim → Assertion	BLAKE3 + Ed25519
`report/`	Table, JSON, SARIF, Markdown	Output formatting
`policy_ops.rs`	Bless, ack, export/import	Trust Pack workflow
`types/`	ScanArgs, ConflictResult, Verdict	Domain types
`config.rs`	aphoria.toml parsing	Configuration

Key Types

// From code/config
pub struct ExtractedClaim {
    pub concept_path: String,      // e.g., "code://rust/myapp/auth/jwt/audience_validation"
    pub predicate: String,         // e.g., "enabled"
    pub value: ObjectValue,        // true/false/number/text
    pub file: String,              // relative path
    pub line: usize,               // 1-indexed
    pub confidence: f32,           // 0.0-1.0
}

// Conflict detection result
pub struct ConflictResult {
    pub claim: ExtractedClaim,
    pub conflicts: Vec<ConflictingSource>,
    pub conflict_score: f32,       // 0.0-1.0
    pub verdict: Verdict,          // Block/Flag/Pass/Ack/Drift
}

// Verdict determination
pub enum Verdict {
    Block,  // score >= 0.7 (configurable)
    Flag,   // score >= 0.5 (configurable)
    Pass,   // below thresholds
    Ack,    // acknowledged by user
    Drift,  // changed from prior observation
}

// Scan modes
pub enum ScanMode {
    Ephemeral,   // Fast, in-memory (~0.25s)
    Persistent,  // Full Episteme stack (~1-2s)
}

// File sources
pub enum FileSource {
    All,     // Entire project
    Staged,  // Git-staged files only
}
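The threshold comments on `Verdict` can be read as a simple score-to-verdict mapping. A minimal sketch, assuming a hypothetical `Thresholds` config struct that defaults to the documented 0.7 and 0.5; `Verdict` is redeclared so the block compiles alone. Ack and Drift are not derived from the score (they depend on the ack list and prior observations), so this sketch only yields Block, Flag, or Pass.

```rust
/// Redeclared to mirror the Verdict enum above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict { Block, Flag, Pass, Ack, Drift }

/// Hypothetical threshold config; defaults match the documented values.
pub struct Thresholds { pub block: f32, pub flag: f32 }

impl Default for Thresholds {
    fn default() -> Self { Self { block: 0.7, flag: 0.5 } }
}

/// Maps a conflict score (0.0-1.0) to Block, Flag, or Pass.
pub fn verdict_from_score(score: f32, t: &Thresholds) -> Verdict {
    if score >= t.block {
        Verdict::Block
    } else if score >= t.flag {
        Verdict::Flag
    } else {
        Verdict::Pass
    }
}
```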

Step Back: Before Implementing

Before writing code, challenge your assumptions:

1. Is This Claim Extraction or Detection?

"Am I adding a new extractor (claim extraction) or improving conflict detection?"

Extractors live in src/extractors/ and implement the Extractor trait
Detection logic lives in src/episteme/ and uses ConceptIndex
Don't mix concerns

2. Does This Need Persistence?

"Does this feature require WAL/KV store, or can it work ephemerally?"

Prefer ephemeral for speed
Use persistence only for: drift detection, observation write-back, baseline tracking
--sync requires --persist
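The --sync/--persist rule can be checked up front, returning an error instead of panicking. A minimal sketch, assuming a hypothetical `ScanFlags` struct standing in for the real `ScanArgs` in `src/types/`; the actual check lives in handlers.rs and may be shaped differently.

```rust
/// Hypothetical subset of scan flags (the real ScanArgs lives in src/types/).
#[derive(Debug, Default)]
pub struct ScanFlags {
    pub persist: bool,
    pub sync: bool,
}

/// Rejects --sync without --persist; never panics.
pub fn validate_flags(flags: &ScanFlags) -> Result<(), String> {
    if flags.sync && !flags.persist {
        return Err("--sync requires --persist".to_string());
    }
    Ok(())
}
```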

3. What's the Authority Tier?

"What tier is this authoritative source?"

Tier 0-2 come from corpus builders (RFC, OWASP, Vendor)
Tier 3 is team policy (bless/ack commands)
Tier 4 is observational (auto-generated from code with no conflicts)

4. Will This Break Fast Scans?

"Does this change affect ephemeral scan performance (~0.25s target)?"

Avoid disk I/O in ephemeral mode
Don't load full Episteme stack unless --persist
Profile before/after

After stepping back: if you are still unsure, trace through scan.rs to see where your change fits.

Do

Use the correct scan mode. Ephemeral for CI/pre-commit, Persistent for drift/sync.
Implement new extractors with regex, not AST parsing. Keep them simple and fast.
Return an empty Vec from extractors when nothing matches. Never panic or return an error for missing patterns.
Use structured concept paths. Format: scheme://source/path/to/concept
Add tests for new extractors. Put them in src/tests/ and use tempfile::TempDir for isolation.
Update roadmap.md when completing phases. Keep status accurate.
Use #[instrument] on critical-path functions: walker, extractors, episteme, and report.
Log with tracing macros. Use info!, warn!, and error!, not println!.
Validate that --sync is only used with --persist. This check is enforced in handlers.rs.
Support multiple report formats. Table (default), JSON, SARIF 2.1.0, Markdown.

Do Not

Use unwrap() or expect() in production code. Clippy denies these.
Add disk I/O to ephemeral mode. It must stay fast (~0.25s).
Mix claim extraction with conflict detection. Separate concerns.
Hardcode concept paths. Build them programmatically from file context.
Skip the confidence field. Every claim needs a confidence score (0.0-1.0).
Forget the source file and line. Extractors must track provenance.
Use println! in library code. Only allowed in CLI binaries (main.rs, handlers.rs).
Ignore SARIF format requirements. Security tools expect SARIF 2.1.0 compliance.
Break leaf-path matching. Cross-scheme matching depends on consistent path structure.
Commit without running cargo clippy --workspace -- -D warnings. CI will fail.

Decision Points

Adding a New Extractor

Stop. Questions:

What languages does this pattern appear in?
What's the concept path scheme? (code://lang/project/category/concept)
What authoritative source defines the expected value?
What regex reliably detects this pattern without false positives?

Modifying Conflict Detection

Stop. Questions:

Does this change affect ephemeral mode?
Does it require new indexes in LocalEpisteme?
How does it interact with existing leaf-path matching?
What's the performance impact?

Adding CLI Commands

Stop. Questions:

Does this command need persistence?
What's the exit code contract?
Does it need validation (like --sync requires --persist)?
What report format should it output?

Constraints

NEVER:

Use unwrap() or expect() in production code
Add disk I/O to ephemeral scan mode
Break the 0.25s target for ephemeral scans
Mutate existing Episteme assertions (append-only)
Skip Ed25519 signing when creating assertions

ALWAYS:

Run cargo clippy --workspace -- -D warnings before commit
Add tests for new functionality
Update roadmap.md for completed phases
Use #[instrument] on public methods in critical paths
Respect .gitignore in walker traversal

Testing Commands

# Full test suite
cargo test -p aphoria --workspace

# Specific test
cargo test -p aphoria test_ephemeral_scan

# Lint check
cargo clippy -p aphoria -- -D warnings

# Format check
cargo fmt -p aphoria --check

# Quick ephemeral scan (should be ~0.25s)
cargo run -p aphoria -- scan .

# Staged files only (pre-commit mode)
cargo run -p aphoria -- scan --staged --exit-code

# Persistent with sync
cargo run -p aphoria -- scan . --persist --sync

# Export report
cargo run -p aphoria -- scan . --format sarif > report.sarif.json

Common Workflows

Adding a New Extractor

Create src/extractors/{name}.rs
Implement the Extractor trait (name, languages, extract)
Register in src/extractors/mod.rs
Add tests in src/tests/
Update roadmap.md if this completes a phase
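The steps above can be sketched as a tiny extractor. This assumes a hypothetical trait shape (`name`, `languages`, `extract` returning `Vec<Claim>`) and a simplified `Claim` type; the real `Extractor` trait and `ExtractedClaim` may differ. It uses plain substring matching in place of regex only to stay dependency-free, and the 0.9 confidence is illustrative. The pattern it looks for, `danger_accept_invalid_certs(true)`, is reqwest's switch for disabling certificate verification.

```rust
/// Simplified, hypothetical claim type (compare ExtractedClaim above).
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub concept_path: String,
    pub predicate: String,
    pub value: bool,
    pub file: String,
    pub line: usize,     // 1-indexed
    pub confidence: f32, // 0.0-1.0
}

/// Assumed trait shape; the real signature may differ.
pub trait Extractor {
    fn name(&self) -> &'static str;
    fn languages(&self) -> &'static [&'static str];
    fn extract(&self, project: &str, file: &str, content: &str) -> Vec<Claim>;
}

/// Hypothetical extractor: reports TLS cert verification as disabled.
pub struct TlsVerifyExtractor;

impl Extractor for TlsVerifyExtractor {
    fn name(&self) -> &'static str { "tls_verify" }
    fn languages(&self) -> &'static [&'static str] { &["rust"] }
    fn extract(&self, project: &str, file: &str, content: &str) -> Vec<Claim> {
        content
            .lines()
            .enumerate()
            .filter(|(_, text)| text.contains("danger_accept_invalid_certs(true)"))
            .map(|(idx, _)| Claim {
                // Built from file context rather than hardcoded.
                concept_path: format!("code://rust/{project}/tls/cert_verification"),
                predicate: "enabled".to_string(),
                value: false,
                file: file.to_string(),
                line: idx + 1, // enumerate is 0-based; claims are 1-indexed
                confidence: 0.9,
            })
            .collect() // Empty Vec when nothing matches.
    }
}
```

The leaf of the generated path, `tls/cert_verification`, is what lets it match `rfc://5246/tls/cert_verification`.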

Debugging Conflict Detection

Run with RUST_LOG=aphoria=debug
Check the concept path format (leaf-path matching compares the last two segments)
Verify authoritative source exists in corpus
Check confidence and tier of both claims
Inspect ConflictTrace if available

Pre-Commit Integration

#!/bin/sh
# .git/hooks/pre-commit
aphoria scan --staged --exit-code

Exit codes: 0 (pass), 1 (flag/drift), 2 (block)
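The exit-code contract can be sketched as "worst verdict wins". This is a minimal sketch under stated assumptions: `exit_code` is a hypothetical helper, Ack is treated as passing, and without --exit-code the scan exits 0 because conflicts do not fail the run. `Verdict` is redeclared so the block compiles alone.

```rust
/// Redeclared to mirror the Verdict enum in Key Types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict { Block, Flag, Pass, Ack, Drift }

/// 0 = pass, 1 = flag/drift, 2 = block; the highest code across all verdicts wins.
pub fn exit_code(verdicts: &[Verdict], exit_code_flag: bool) -> i32 {
    if !exit_code_flag {
        return 0; // Non-blocking by default.
    }
    verdicts
        .iter()
        .map(|v| match v {
            Verdict::Block => 2,
            Verdict::Flag | Verdict::Drift => 1,
            Verdict::Pass | Verdict::Ack => 0,
        })
        .max()
        .unwrap_or(0) // No claims at all counts as a pass.
}
```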

Output Format

When implementing features or fixing bugs, provide:

## Summary
[One-line description]

## Changes
- [File]: [What changed]

## Testing
- [How to verify]

## Roadmap Impact
- [Phase affected, if any]

Phase Status Reference

Phase	Status	Scope
0-3	Complete	-
4.5	Complete	Ephemeral mode
4A	Complete	Observation write-back
4B	Complete	Drift detection
4C	Complete	Staged scanning
4D	Next	Enhanced ack
4E	Planned	Community contribution
5	Complete	Research agent loop
6	Complete	Trust Packs
7	Planned	Declarative extractors
