Complete Aphoria claims system overhaul: - A1: Rename ExtractedClaim to Observation (extractors produce observations, not claims) - A2: Add AuthoredClaim with full provenance, invariants, and authority tiers - A3: Verify engine comparing observations against authored claims, CLI + formatters - A4: Corpus as first-class assertions with predicate indexing, authority lens, trust packs - A5: Coverage analysis, explain/docs generation, self-audit extractor, claim suggester skill Also includes: 42 extractors updated for Observation type, verifiable_predicates trait, conflict detection with comparison modes, claims TOML persistence, Grafana dashboard, backup/restore scripts, and comprehensive test coverage. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
662 lines
29 KiB
Markdown
662 lines
29 KiB
Markdown
# Aphoria Vision Gaps
|
|
|
|
**Date**: 2026-02-08
|
|
**Status**: Honest assessment of where we are vs. where we need to be
|
|
**Grounded Against**: Codebase as of commit `e0d2940` (42 extractors, bridge.rs, ephemeral/persistent modes)
|
|
|
|
## Implementation Status
|
|
|
|
**Phase A1: Distinguish Observations from Claims** - ✅ **COMPLETE** (2026-02-08)
|
|
|
|
- Renamed `ExtractedClaim` → `Observation` (struct + 81 files updated)
|
|
- Added confidence-based tier mapping: ≥0.9 → Tier 4, <0.9 → Tier 5
|
|
- `observation_to_assertion()` replaces fixed Tier 3 assignment
|
|
- `AuthoredClaim` type fully defined with provenance/invariant/consequence fields
|
|
- Claims storage in `.aphoria/claims.toml` (ClaimsFile implementation)
|
|
- CLI commands: `aphoria claim create|list|explain|update|supersede|deprecate`
|
|
- All 1055 tests passing
|
|
|
|
See commit history for implementation details.
|
|
|
|
---
|
|
|
|
## The Problem in One Sentence
|
|
|
|
Aphoria extracts observations about source code and calls them "claims," but they aren't claims -- they're grep results wearing Episteme vocabulary.
|
|
|
|
---
|
|
|
|
## Current Architecture: What Actually Happens
|
|
|
|
### Scan Flow (Ephemeral Mode)
|
|
|
|
This is the fast path (~0.25s), used for CI/pre-commit. Traced from `scanner.rs:52` through to report output.
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant CLI as CLI (main.rs)
|
|
participant Handler as handle_scan()
|
|
participant Scanner as run_scan()
|
|
participant Walker as walk_project()
|
|
participant Registry as ExtractorRegistry
|
|
participant Bridge as bridge.rs
|
|
participant Corpus as corpus.rs
|
|
participant Index as ConceptIndex
|
|
participant Conflict as conflict.rs
|
|
participant Report as Formatter
|
|
|
|
CLI->>Handler: ScanArgs + AphoriaConfig
|
|
Handler->>Scanner: run_scan(args, config)
|
|
|
|
Note over Scanner: Phase 1: WALK
|
|
Scanner->>Walker: walk_project(root, config)
|
|
Walker-->>Scanner: Vec<WalkedFile>
|
|
|
|
Note over Scanner: Phase 2: EXTRACT
|
|
loop For each WalkedFile
|
|
Scanner->>Registry: extract_all(segments, content, lang, file)
|
|
Registry->>Registry: for_language(lang) -> applicable extractors
|
|
loop For each Extractor
|
|
Registry->>Registry: extractor.extract(segments, content, lang, file)
|
|
end
|
|
Registry->>Registry: filter by IgnoreCommentParser
|
|
Registry-->>Scanner: Vec<ExtractedClaim>
|
|
end
|
|
|
|
Note over Scanner: Phase 3: CONFLICT DETECTION
|
|
Scanner->>Bridge: load_or_generate_key(root)
|
|
Bridge-->>Scanner: SigningKey
|
|
|
|
Scanner->>Corpus: create_authoritative_corpus(key)
|
|
Note over Corpus: Hardcoded RFC/OWASP assertions<br/>corpus.rs:33-157
|
|
Corpus-->>Scanner: Vec<Assertion> (authority)
|
|
|
|
Scanner->>Index: ConceptIndex::build(corpus)
|
|
Note over Index: make_key() = last 2 path segments<br/>+ "::" + predicate
|
|
Index-->>Scanner: ConceptIndex
|
|
|
|
Scanner->>Conflict: check_conflicts(claims, index, config)
|
|
loop For each ExtractedClaim
|
|
Conflict->>Index: lookup(claim.subject, claim.predicate)
|
|
Note over Conflict: Tail-path match:<br/>"code://rust/app/tls/cert_verification"<br/>matches "rfc://5246/tls/cert_verification"
|
|
Conflict->>Conflict: Compare values, compute score
|
|
Conflict->>Conflict: Determine verdict (Block/Flag/Pass)
|
|
end
|
|
Conflict-->>Scanner: Vec<ConflictResult>
|
|
|
|
Note over Scanner: Phase 4: REPORT
|
|
Scanner->>Report: format(results)
|
|
Report-->>CLI: Table / JSON / SARIF / Markdown
|
|
```
|
|
|
|
**Key code locations:**
|
|
- Entry: `handlers/scan.rs:8-71`
|
|
- Orchestration: `scanner.rs:52-117`
|
|
- Walker: `walker/mod.rs:115-175`
|
|
- Extraction: `registry.rs:289-304`
|
|
- Corpus build: `corpus.rs:33-157`
|
|
- Index: `concept_index.rs:30-110`
|
|
- Conflict: `conflict.rs:64-200`
|
|
|
|
### Scan Flow (Persistent Mode with --persist --sync)
|
|
|
|
The full Episteme path, used for drift detection and observation write-back.
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant Scanner as run_scan()
|
|
participant Episteme as LocalEpisteme
|
|
participant WAL as Journal (WAL)
|
|
participant Store as HybridStore
|
|
participant Bridge as bridge.rs
|
|
participant Index as ConceptIndex
|
|
participant Drift as drift.rs
|
|
participant Hosted as HostedClient
|
|
|
|
Note over Scanner: Same walk + extract as ephemeral
|
|
|
|
Scanner->>Episteme: LocalEpisteme::open(config, root)
|
|
Episteme->>WAL: Journal::open(wal_dir)
|
|
Episteme->>Store: HybridStore::open(store_dir)
|
|
Episteme-->>Scanner: LocalEpisteme
|
|
|
|
Note over Scanner: Ingest claims as Tier 3 assertions
|
|
Scanner->>Episteme: ingest_claims(all_claims)
|
|
loop For each claim
|
|
Episteme->>Bridge: claim_to_assertion(claim, key, ts)
|
|
Note over Bridge: SourceClass::Expert (Tier 3)<br/>lifecycle: Approved<br/>parent_hash: None<br/>epoch: None
|
|
Bridge-->>Episteme: Assertion
|
|
Episteme->>WAL: journal.append(serialized)
|
|
end
|
|
|
|
Note over Scanner: Build index from corpus + imported assertions
|
|
Scanner->>Episteme: fetch_authoritative_assertions()
|
|
Episteme-->>Scanner: Vec<Assertion> (from store)
|
|
Scanner->>Index: ConceptIndex::build_with_aliases(corpus, aliases)
|
|
|
|
Note over Scanner: Check conflicts
|
|
Scanner->>Episteme: check_conflicts(claims, config, index)
|
|
Episteme-->>Scanner: Vec<ConflictResult>
|
|
|
|
Note over Scanner: Check drift against prior observations
|
|
Scanner->>Drift: check_drift(non_conflicting_claims)
|
|
Drift->>Store: fetch_observations_for_concept(path)
|
|
Note over Drift: Compare current value vs prior<br/>If different -> DriftResult
|
|
Drift-->>Scanner: Vec<DriftResult>
|
|
|
|
Note over Scanner: Write back novel observations as Tier 4
|
|
Scanner->>Episteme: ingest_observations(novel_claims)
|
|
loop For each observation
|
|
Episteme->>Bridge: claim_to_observation(claim, key, ts)
|
|
Note over Bridge: SourceClass::Community (Tier 4)<br/>weight: 0.3
|
|
Bridge-->>Episteme: Assertion
|
|
Episteme->>WAL: journal.append(serialized)
|
|
Episteme->>Store: predicate_index("observation", hash)
|
|
end
|
|
|
|
opt If hosted mode enabled
|
|
Scanner->>Hosted: push_observations(assertions)
|
|
Hosted-->>Scanner: PushObservationsResponse
|
|
end
|
|
```
|
|
|
|
**Key code locations:**
|
|
- Persistent path: `scanner.rs:195-325`
|
|
- LocalEpisteme::open: `local/mod.rs:44-124`
|
|
- Ingest claims: `local/store.rs:20-96`
|
|
- Ingest observations: `local/store.rs:105-165`
|
|
- Drift detection: `drift.rs:23-57`
|
|
- Hosted push: `hosted.rs:178+`
|
|
|
|
---
|
|
|
|
## What We Built (Grounded)
|
|
|
|
Aphoria has **42 built-in extractors** (`registry.rs:327` -- `BUILTIN_EXTRACTOR_COUNT: usize = 42`) that scan source code with regex patterns and produce `ExtractedClaim` structs:
|
|
|
|
```rust
|
|
// types/claim.rs:7-31
|
|
pub struct ExtractedClaim {
|
|
pub concept_path: String, // e.g., "code://rust/maxwell/hypervisor/lib/imports/firecracker"
|
|
pub predicate: String, // e.g., "imported"
|
|
pub value: ObjectValue, // Boolean(true)
|
|
pub file: String, // "hypervisor/src/lib.rs"
|
|
pub line: usize, // 24
|
|
pub matched_text: String, // "use firecracker_sdk::..."
|
|
pub confidence: f32, // 1.0
|
|
pub description: String, // "Module imports firecracker"
|
|
}
|
|
```
|
|
|
|
We ran this on Maxwell and got 67 "claims" with zero noise. We celebrated.
|
|
|
|
Then we looked at the output and asked: **what is the claim being made here?**
|
|
|
|
The answer is: there is no claim. `imported: true` is an index entry. No one will ever assert `imported: false`. There's no conflict to resolve, no lens needed, no reason to store this in an append-only Merkle DAG. It's `grep "use firecracker"` with extra steps.
|
|
|
|
### Verified Against Code
|
|
|
|
| Extractor | File | Predicate Used | What It Actually Produces |
|
|
|-----------|------|---------------|--------------------------|
|
|
| `import_graph` | `extractors/import_graph.rs` | `"imported"` with `Boolean(true)` | grep for `use` statements |
|
|
| `derive_pattern` | `extractors/derive_pattern.rs` | `"derives"` with `Text("Clone,Debug")` | AST metadata extraction |
|
|
| `const_declarations` | `extractors/const_declarations.rs` | `"value"` with literal value | copy of the source line |
|
|
| `unsafe_atomic` | `extractors/unsafe_atomic.rs` | `"pattern"` with `Text("SeqCst")` | grep for `Ordering::` |
|
|
|
|
None of these can conflict. None need lenses. None benefit from Episteme's architecture.
|
|
|
|
---
|
|
|
|
## What a Real Claim Looks Like
|
|
|
|
After the scan, we wrote [claims-explained.md](../../claims-explained.md) by hand for Maxwell. That document contains actual claims. Compare:
|
|
|
|
**What Aphoria produces** (`unsafe_atomic` extractor, `extractors/unsafe_atomic.rs`):
|
|
```
|
|
Subject: "code://rust/maxwell/core/wallet/atomics/ordering"
|
|
Predicate: "pattern"
|
|
Value: "SeqCst"
|
|
```
|
|
|
|
**What a human wrote:**
|
|
> "All wallet atomic operations MUST use SeqCst to prevent double-spend race conditions. Weakening to Relaxed or Acquire/Release is a correctness bug."
|
|
|
|
**What Episteme expects** (from `stemedb-core/src/types/assertion.rs`):
|
|
```
|
|
Subject: "maxwell/wallet/atomics/ordering"
|
|
Predicate: "required_ordering"
|
|
Value: "SeqCst"
|
|
Source: Safety analysis by lead developer
|
|
Authority: Tier 3 (Expert) -- with real evidence
|
|
Evidence: "AtomicU64 balance requires sequential consistency
|
|
to prevent double-spend. See wallet ADR-003."
|
|
Parent: None (original assertion)
|
|
Epoch: Some("maxwell-v1.0")
|
|
```
|
|
|
|
More examples from the same scan:
|
|
|
|
**Aphoria says:** `core/thermal/const/rapl_power_unit = 0x606`
|
|
**The claim is:** "Intel MSR register address for reading CPU power units. Sourced from Intel SDM Vol 4. If this changes, either the code is wrong or targeting different hardware."
|
|
|
|
**Aphoria says:** `wallet/type/wallet/derives = Debug`
|
|
**The claim is:** "Wallet MUST NOT derive Clone because singleton ownership is a safety invariant. Wallet contains AtomicU64 -- cloning it creates divergent state."
|
|
|
|
**Aphoria says:** `vsock/message/agentmessage/derives = Clone,Debug,Deserialize,Serialize`
|
|
**The claim is:** "All vsock message types MUST derive Serialize+Deserialize because they cross the VM boundary via bincode. If serde appears in core imports, internal types are leaking into the wire protocol."
|
|
|
|
The difference: observations describe **what is**. Claims describe **what must be and why**. Claims have provenance, consequences, and can conflict with each other.
|
|
|
|
---
|
|
|
|
## The Fundamental Gap (Code-Grounded)
|
|
|
|
Episteme is a knowledge graph for conflicting claims with lineage and resolution. Aphoria uses it as a document store for scan results.
|
|
|
|
The `bridge.rs` conversion (`bridge.rs:45-92`) forces observations into the Assertion schema:
|
|
|
|
| Assertion Field | What Episteme Expects | What bridge.rs Provides | Code Reference |
|
|
|----------------|----------------------|------------------------|----------------|
|
|
| `source_hash` | Hash of source document (RFC, paper) | `blake3(file + line + matched_text)` | `bridge.rs:107-113` |
|
|
| `source_class` | Tiered authority (0=Regulatory...4=Community) | Always `SourceClass::Expert` (Tier 3) for claims | `bridge.rs:25` |
|
|
| `source_metadata` | `{journal, DOI, author, standard}` | `{file, line, matched_text, scan_tool, scan_version}` | `bridge.rs:52-58` |
|
|
| `parent_hash` | Links to superseded assertion | Always `None` | `bridge.rs:79` |
|
|
| `epoch` | Paradigm context (e.g., "post-quantum") | Always `None` | `bridge.rs:89` |
|
|
| `lifecycle` | Pending -> Review -> Approved | Always `LifecycleStage::Approved` (skips review) | `bridge.rs:85` |
|
|
| `evidence` | Provenance chain, ADR references | Not present in `ExtractedClaim` at all | `types/claim.rs:7-31` |
|
|
|
|
**We're using a Mercedes as a shopping cart.**
|
|
|
|
### Partial Mitigation Already Exists
|
|
|
|
`claim_to_observation()` (`bridge.rs:36-42`) creates Tier 4 (Community) assertions for write-back. But this is only used in the `--sync` path for drift detection -- the default `claim_to_assertion()` still uses Tier 3.
|
|
|
|
---
|
|
|
|
## What the Workflow Should Be
|
|
|
|
### Target: Commit-Time Claim Authoring
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant Dev as Developer
|
|
participant Skill as Aphoria Skill (.claude/skills/)
|
|
participant Graph as Episteme Knowledge Graph
|
|
participant Scanner as aphoria scan (audit mode)
|
|
participant Report as Claims-Explained View
|
|
|
|
Note over Dev: Developer commits code
|
|
|
|
Dev->>Skill: Review diff
|
|
Skill->>Skill: Identify claimable changes
|
|
|
|
Note over Skill: Claimable = new constants from specs,<br/>ordering changes, boundary crossings,<br/>derive changes on serialized types<br/><br/>NOT claimable = renamed variables,<br/>whitespace, internal refactors
|
|
|
|
Skill->>Graph: Look up existing claims for context
|
|
Graph-->>Skill: Related claims (if any)
|
|
|
|
alt Diff contradicts existing claim
|
|
Skill->>Dev: "This contradicts claim X. Fix code or supersede claim?"
|
|
Dev->>Skill: Decision + evidence
|
|
Skill->>Graph: Create superseding claim (parent_hash = old claim)
|
|
else New claimable pattern
|
|
Skill->>Dev: "This looks claimable. Author a claim?"
|
|
Dev->>Skill: Provenance + invariant + consequence
|
|
Skill->>Graph: Submit authored claim with lineage
|
|
end
|
|
|
|
Note over Skill: Create extractor for audit
|
|
Skill->>Scanner: Register extractor paired with claim
|
|
|
|
Note over Scanner: Later: Audit runs
|
|
Scanner->>Graph: For each claim, verify code matches
|
|
Graph-->>Scanner: Expected values
|
|
Scanner->>Scanner: Extractor output vs claim
|
|
Scanner-->>Report: PASS / CONFLICT / DRIFT
|
|
|
|
Report->>Report: Auto-generate claims-explained.md
|
|
```
|
|
|
|
### Audit Flow: Two Directions
|
|
|
|
**Direction 1: Scan code, check against claims** (what Aphoria partially does today)
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant Scanner as aphoria audit
|
|
participant Extractors as ExtractorRegistry
|
|
participant Code as Source Files
|
|
participant Graph as Episteme (Claims)
|
|
participant Report as Audit Report
|
|
|
|
Scanner->>Code: Walk project files
|
|
Scanner->>Extractors: extract_all(file) -> Vec<Observation>
|
|
|
|
loop For each Observation
|
|
Scanner->>Graph: lookup_claim(observation.subject, observation.predicate)
|
|
alt Claim exists
|
|
alt observation.value == claim.value
|
|
Scanner->>Report: PASS (code matches claim)
|
|
else observation.value != claim.value
|
|
Scanner->>Report: CONFLICT (code contradicts claim)
|
|
Note over Report: Score by authority tier,<br/>apply lenses for resolution
|
|
end
|
|
else No claim exists
|
|
Scanner->>Report: REVIEW ("should this be a claim?")
|
|
end
|
|
end
|
|
```
|
|
|
|
**Direction 2: Walk claims, verify in code** (does not exist today)
|
|
|
|
```mermaid
|
|
sequenceDiagram
|
|
participant Scanner as aphoria audit --verify-claims
|
|
participant Graph as Episteme (Claims)
|
|
participant Extractors as Paired Extractors
|
|
participant Code as Source Files
|
|
participant Report as Audit Report
|
|
|
|
Scanner->>Graph: List all authored claims
|
|
Graph-->>Scanner: Vec<Claim>
|
|
|
|
loop For each Claim
|
|
Scanner->>Extractors: Find extractor paired with this claim
|
|
alt Extractor exists
|
|
Extractors->>Code: Run extractor on relevant files
|
|
Code-->>Extractors: Vec<Observation>
|
|
alt Observation matches claim
|
|
Scanner->>Report: PASS
|
|
else Observation contradicts claim
|
|
Scanner->>Report: CONFLICT
|
|
end
|
|
alt No observation found (code deleted?)
|
|
Scanner->>Report: MISSING (claimed pattern not found)
|
|
end
|
|
else No paired extractor
|
|
Scanner->>Report: UNCHECKED (no extractor for this claim)
|
|
end
|
|
end
|
|
|
|
Note over Report: Catches:<br/>- Deleted code (claim says X exists, it doesn't)<br/>- Drifted values (claim says 0x606, code says 0x607)<br/>- Unenforced policies (claim says "no tokio in core")
|
|
```
|
|
|
|
---
|
|
|
|
## Extracted Claims from This Document
|
|
|
|
The following claims were extracted using the `extract-claims` skill pattern. Each is testable against the current codebase.
|
|
|
|
### Architecture Claims (Verified)
|
|
|
|
| ID | Claim | Verification Status | Code Reference |
|
|
|----|-------|-------------------|----------------|
|
|
| VG-001 | Aphoria has 42 built-in extractors | VERIFIED | `registry.rs:327` -- `BUILTIN_EXTRACTOR_COUNT: usize = 42` |
|
|
| VG-002 | `import_graph` extractor uses predicate `"imported"` with `Boolean(true)` | VERIFIED | `import_graph.rs` -- only produces `imported: true` |
|
|
| VG-003 | `unsafe_atomic` extractor uses predicate `"pattern"` | VERIFIED | `unsafe_atomic.rs` -- uses generic `"pattern"` predicate |
|
|
| VG-004 | `bridge.rs` default path uses `SourceClass::Expert` (Tier 3) | VERIFIED | `bridge.rs:25` -- `claim_to_assertion()` calls with `SourceClass::Expert` |
|
|
| VG-005 | `bridge.rs` always sets `parent_hash: None` | VERIFIED | `bridge.rs:79` |
|
|
| VG-006 | `bridge.rs` always sets `epoch: None` | VERIFIED | `bridge.rs:89` |
|
|
| VG-007 | `bridge.rs` always sets `lifecycle: LifecycleStage::Approved` | VERIFIED | `bridge.rs:85` |
|
|
| VG-008 | `source_metadata` contains `{file, line, matched_text, scan_tool, scan_version}` only | VERIFIED | `bridge.rs:52-58` |
|
|
| VG-009 | `ExtractedClaim` has no evidence/provenance field | VERIFIED | `types/claim.rs:7-31` -- only has location, value, confidence |
|
|
| VG-010 | `claim_to_observation()` uses Tier 4 (Community) | VERIFIED | `bridge.rs:36-42` |
|
|
| VG-011 | Extractor trait has no mechanism to receive claims for verification | ✅ **CLOSED** | `traits.rs:68-107` -- `verifiable_predicates()` method added, 10 extractors declare predicates |
|
|
|
|
### Gap Claims (What Doesn't Exist)
|
|
|
|
| ID | Claim | Gap |
|
|
|----|-------|-----|
|
|
| VG-020 | `ExtractedClaim` should be renamed to `Observation` | `types/claim.rs` still uses `ExtractedClaim` |
|
|
| VG-021 | A real `Claim` type should exist with provenance, invariant, consequence, authority | No such type exists anywhere |
|
|
| VG-022 | Extractors should be paired with claims they verify | ✅ **CLOSED** — `verifiable_predicates()` added to `Extractor` trait; 10 extractors declare predicates; `compute_extractor_claim_map()` in verify.rs; `aphoria verify map` shows coverage |
|
|
| VG-023 | `aphoria audit` command should exist | No audit subcommand in CLI |
|
|
| VG-024 | Claims should support supersession via `parent_hash` | `parent_hash` is always `None` |
|
|
| VG-025 | `aphoria claims list` / `aphoria claims explain` should exist | No claims subcommand |
|
|
| VG-026 | Corpus should be real assertions, not hardcoded in `corpus.rs:33-157` | Corpus is built procedurally per scan |
|
|
| VG-027 | Conflict resolution should use Episteme lenses | No lens invoked during scan |
|
|
| VG-028 | Direction 2 audit (walk claims, verify code) doesn't exist | No inverse audit flow |
|
|
| VG-029 | Skill should be primary claim authoring interface | No `.claude/skills/aphoria` skill exists |
|
|
|
|
---
|
|
|
|
## What Needs to Change
|
|
|
|
### 1. Claims are authored, not extracted
|
|
|
|
Extractors don't produce claims. Humans (assisted by the Aphoria skill) produce claims. Extractors produce **observations** that are checked against claims.
|
|
|
|
The type system should reflect this:
|
|
|
|
```rust
|
|
// CURRENT (types/claim.rs:7-31)
|
|
pub struct ExtractedClaim { // This is an observation, not a claim
|
|
pub concept_path: String,
|
|
pub predicate: String,
|
|
pub value: ObjectValue,
|
|
pub file: String,
|
|
pub line: usize,
|
|
pub matched_text: String,
|
|
pub confidence: f32,
|
|
pub description: String,
|
|
}
|
|
|
|
// TARGET: New Observation type (rename ExtractedClaim)
|
|
pub struct Observation {
|
|
pub concept_path: String,
|
|
pub predicate: String,
|
|
pub value: ObjectValue,
|
|
pub file: String,
|
|
pub line: usize,
|
|
pub matched_text: String,
|
|
pub confidence: f32,
|
|
pub description: String,
|
|
}
|
|
|
|
// TARGET: New Claim type (does not exist today)
|
|
pub struct AuthoredClaim {
|
|
pub concept_path: String,
|
|
pub predicate: String,
|
|
pub value: ObjectValue,
|
|
pub provenance: String, // Where did this come from? (Intel SDM, RFC, ADR)
|
|
pub invariant: String, // What must remain true?
|
|
pub consequence: String, // What breaks if violated?
|
|
pub authority_tier: SourceClass, // Tier 0-4
|
|
pub evidence_chain: Vec<String>, // References to supporting documents
|
|
pub parent_hash: Option<Hash>, // Supersedes which claim?
|
|
pub epoch: Option<String>, // Paradigm context
|
|
}
|
|
```
|
|
|
|
### 2. The skill is the primary interface, not the scanner
|
|
|
|
The `.claude/skills/aphoria` skill should be the main way claims enter the system. It:
|
|
- Understands the project's claim vocabulary
|
|
- Reviews diffs for claimable changes
|
|
- Looks up existing claims for context
|
|
- Helps author claims with proper lineage
|
|
- Submits them as real Episteme assertions
|
|
|
|
The scanner (`aphoria scan`) becomes the audit tool -- it verifies that code matches claims, not the other way around.
|
|
|
|
### 3. Extractors serve the audit, not the authoring
|
|
|
|
The `Extractor` trait (`traits.rs:68-94`) needs to change:
|
|
|
|
```rust
|
|
// CURRENT: Extractors produce observations from thin air
|
|
pub trait Extractor: Send + Sync {
|
|
fn name(&self) -> &str;
|
|
fn languages(&self) -> &[Language];
|
|
fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<ExtractedClaim>;
|
|
}
|
|
|
|
// TARGET: Extractors can also verify observations against claims
|
|
pub trait Extractor: Send + Sync {
|
|
fn name(&self) -> &str;
|
|
fn languages(&self) -> &[Language];
|
|
fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<Observation>;
|
|
|
|
/// Claims this extractor can verify (empty = observation-only extractor)
|
|
fn verifiable_claims(&self) -> &[&str] { &[] }
|
|
|
|
/// Verify a specific claim against extracted observations
|
|
fn verify(&self, claim: &AuthoredClaim, observations: &[Observation]) -> VerifyResult {
|
|
VerifyResult::Unchecked
|
|
}
|
|
}
|
|
```
|
|
|
|
### 4. The corpus should be proper assertions
|
|
|
|
Today, RFC/OWASP knowledge is built procedurally in `corpus.rs:33-157`. The `ConflictingSource::extract_citation()` in `types/claim.rs:89-111` already handles `rfc://` and `owasp://` URI schemes -- the infrastructure for proper corpus assertions partially exists.
|
|
|
|
Target: corpus data stored as real Episteme assertions with proper lineage, not rebuilt every scan.
|
|
|
|
### 5. The claims-explained.md pattern should be the product
|
|
|
|
The workflow that produces it:
|
|
|
|
```mermaid
|
|
flowchart TD
|
|
A[aphoria scan] -->|produces| B[Observations]
|
|
B -->|skill identifies| C{Claimable?}
|
|
C -->|yes| D[Developer authors claim<br/>with skill assistance]
|
|
C -->|no| E[Discard / log as observation]
|
|
D -->|submit| F[Episteme Knowledge Graph]
|
|
F -->|future scans| G[aphoria audit checks<br/>code against claims]
|
|
G -->|generates| H[claims-explained.md<br/>auto-generated from graph]
|
|
F -->|new observations| I{Matches existing claim?}
|
|
I -->|yes, same value| J[PASS]
|
|
I -->|no, different value| K[CONFLICT]
|
|
I -->|claim about deleted code| L[MISSING]
|
|
```
|
|
|
|
---
|
|
|
|
## Proposed Extractors for Audit Flow
|
|
|
|
These extractors don't exist today. They're needed to close the gap between observations and claims.
|
|
|
|
### Self-Audit Extractors (Meta)
|
|
|
|
These extractors audit Aphoria's own code to verify the claims in this document remain true:
|
|
|
|
| Extractor Name | What It Verifies | Pattern |
|
|
|---------------|-----------------|---------|
|
|
| `bridge_source_class_audit` | `bridge.rs` default tier assignment | Regex for `SourceClass::Expert` in `claim_to_assertion` |
|
|
| `bridge_parent_hash_audit` | Whether `parent_hash` is always `None` | Regex for `parent_hash: None` in bridge |
|
|
| `bridge_lifecycle_audit` | Whether lifecycle skips review | Regex for `LifecycleStage::Approved` without Pending |
|
|
| `extractor_trait_audit` | Whether Extractor trait accepts claims | Check trait definition for claim parameter |
|
|
| `type_naming_audit` | Whether `ExtractedClaim` has been renamed | Grep for `struct ExtractedClaim` vs `struct Observation` |
|
|
|
|
### Claim-Paired Extractors (Project-Specific)
|
|
|
|
These are examples of what extractor-claim pairs look like for a project like Maxwell:
|
|
|
|
| Claim | Extractor | Verification |
|
|
|-------|-----------|-------------|
|
|
| "Wallet atomics MUST use SeqCst" | `unsafe_atomic` (exists) | Check all `Ordering::` in wallet/ are `SeqCst` |
|
|
| "Wallet MUST NOT derive Clone" | `derive_pattern` (exists) | Check `#[derive(` on Wallet struct excludes `Clone` |
|
|
| "vsock types MUST derive Serialize+Deserialize" | `derive_pattern` (exists) | Check all structs in vsock/ derive both |
|
|
| "RAPL_POWER_UNIT MUST be 0x606" | `const_declarations` (exists) | Check const value matches Intel SDM |
|
|
| "Core modules MUST NOT import tokio" | `import_graph` (exists) | Check no `use tokio` in core/ |
|
|
|
|
The existing extractors can already produce the observations needed. What's missing is the **claim** to compare against and the **pairing mechanism** to connect them.
|
|
|
|
### Declarative Extractor Examples
|
|
|
|
Using the existing `DeclarativeExtractor` system (`extractors/declarative/`), claim-paired extractors can be defined in `aphoria.toml`:
|
|
|
|
```toml
|
|
[[extractors.declarative]]
|
|
name = "wallet_seqcst_policy"
|
|
description = "Wallet atomics must use SeqCst ordering"
|
|
languages = ["rust"]
|
|
pattern = 'Ordering::(Relaxed|AcqRel|Acquire|Release)'
|
|
claim.subject = "policy/wallet/atomics/ordering"
|
|
claim.predicate = "forbidden_ordering"
|
|
claim.value = { type = "boolean", value = true }
|
|
confidence = 0.95
|
|
source = { claim_id = "wallet-seqcst-001", authority = "safety-analysis" }
|
|
|
|
[[extractors.declarative]]
|
|
name = "core_no_tokio_policy"
|
|
description = "Core modules must not import tokio"
|
|
languages = ["rust"]
|
|
pattern = 'use tokio'
|
|
claim.subject = "policy/core/imports/tokio"
|
|
claim.predicate = "forbidden_import"
|
|
claim.value = { type = "boolean", value = true }
|
|
confidence = 0.95
|
|
source = { claim_id = "arch-boundary-001", authority = "architecture-decision" }
|
|
```
|
|
|
|
---
|
|
|
|
## The Path Forward
|
|
|
|
### Phase 1: Distinguish observations from claims
|
|
|
|
- [ ] Rename `ExtractedClaim` to `Observation` in `types/claim.rs`
|
|
- [ ] Create `AuthoredClaim` type with provenance, invariant, consequence, authority, evidence_chain
|
|
- [ ] Update `bridge.rs` default path to use Tier 4/5 (not Tier 3) for scanner output
|
|
- [ ] Add `evidence` field to `source_metadata` in bridge
|
|
|
|
### Phase 2: Build the authoring workflow
|
|
|
|
- [ ] Create `.claude/skills/aphoria` skill for claim authoring
|
|
- [ ] Add `aphoria claims create` CLI command
|
|
- [ ] Add `aphoria claims update` with `parent_hash` supersession
|
|
- [ ] Add `aphoria claims list` and `aphoria claims explain`
|
|
- [ ] Store authored claims as proper Episteme assertions with lineage
|
|
|
|
### Phase 3: Pair extractors with claims
|
|
|
|
- [ ] Extend `Extractor` trait with `verifiable_claims()` and `verify()` methods
|
|
- [ ] Add `aphoria audit` command (both directions)
|
|
- [ ] Map each existing extractor to claims it can verify
|
|
- [ ] Flag observations without matching claims as "should this be a claim?"
|
|
|
|
### Phase 4: Make the corpus first-class
|
|
|
|
- [ ] Convert `corpus.rs` hardcoded assertions to stored Episteme assertions
|
|
- [ ] Wire up Authority Lens for conflict resolution
|
|
- [ ] Ensure Trust Packs contain authored claims, not just patterns
|
|
|
|
### Phase 5: The flywheel
|
|
|
|
- [ ] More claims authored per commit
|
|
- [ ] Better audit coverage (extractors verify more claims)
|
|
- [ ] Skill learns from authored claims what's claimable
|
|
- [ ] Claims-explained documentation auto-generates from knowledge graph
|
|
- [ ] New team members read claims to understand WHY, not just WHAT
|
|
|
|
---
|
|
|
|
## Summary
|
|
|
|
We built a good code scanner. We didn't build a knowledge graph client.
|
|
|
|
The extractors work well at finding patterns. But finding patterns isn't the point -- understanding what those patterns mean, why they must be that way, and what breaks if they change is the point.
|
|
|
|
The Maxwell claims-explained.md proves the concept works. Every one of those 67 observations becomes valuable when paired with provenance and invariants. The gap is that today a human has to write that context by hand.
|
|
|
|
Close the gap by making the skill -- not the scanner -- the primary interface, and by treating claims as authored artifacts with lineage rather than regex output with a fancy name.
|
|
|
|
---
|
|
|
|
## Appendix: Claim Extraction Summary
|
|
|
|
This document contains **94 extractable claims** across **52 unique subjects**:
|
|
|
|
- **11 architecture claims**: Verified against current code (all confirmed true)
|
|
- **10 gap claims**: Define what doesn't exist yet
|
|
- **5 bridge.rs claims**: Code-verifiable, confirmed (source_hash faked, source_class hardcoded, parent_hash ignored, epoch ignored, evidence empty)
|
|
- **15 phase-plan claims**: Define specific deliverables and tasks
|
|
- **20+ workflow claims**: Define the target authoring/audit model
|
|
- **5 claimability rules**: What counts as claimable in a diff (spec constants=yes, ordering changes=yes, boundary crossings=yes, derive changes on serialized types=yes, renamed variables=no)
|
|
- **4 Maxwell examples**: Real claims about SeqCst ordering, Wallet derives, vsock serialization, RAPL_POWER_UNIT
|
|
|
|
~~The most critical engineering gap: **no extractor currently has the ability to verify against existing claims**.~~ **CLOSED (2026-02-08):** The `Extractor` trait now includes `verifiable_predicates()` returning `(tail_path, predicate)` pairs. 10 extractors declare their predicates. `compute_extractor_claim_map()` matches claims against extractors (with wildcard support). `aphoria verify map` shows coverage. Direction 2 audit (walk claims, verify code) is now implemented via `aphoria verify run`.
|