Major documentation restructure to improve discoverability and reduce duplication. ## Changes **Deleted (Archived/Consolidated)**: - Removed duplicate getting started guides - Archived outdated planning documents - Consolidated corpus and configuration docs - Removed obsolete vision/spec files (superseded by vision.md) - Cleaned up scrapyard and old PDFs **New Structure**: - docs/about/ - Project overview and introduction - docs/guides/ - User guides (moved from root) - docs/specs/ - Technical specifications - docs/sdk/ - SDK documentation (Go) - docs/references/ - API references - docs/archive/ - Archived historical docs - applications/aphoria/docs/advanced/ - Advanced topics - applications/aphoria/docs/reference/ - CLI reference - applications/aphoria/docs/archive/ - Archived aphoria docs **Updated**: - README.md - New root README with clear navigation - CONTRIBUTING.md - Contribution guidelines - CLAUDE.md - Updated paths to new structure - roadmap.md - Added recent completions ## Files Changed - 57 files changed - 1,977 insertions(+) - 961 deletions(-) **Net change**: +1,016 lines (added CONTRIBUTING.md, README.md, reorganized content) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
30 KiB
Aphoria Vision Gaps
Date: 2026-02-08
Status: Honest assessment of where we are vs. where we need to be
Grounded Against: Codebase as of commit e0d2940 (42 extractors, bridge.rs, ephemeral/persistent modes)
Implementation Status
Phase A1: Distinguish Observations from Claims - ✅ COMPLETE (2026-02-08)
- Renamed
Observation→Observation(struct + 81 files updated) - Added confidence-based tier mapping: ≥0.9 → Tier 4, <0.9 → Tier 5
observation_to_assertion()replaces fixed Tier 3 assignmentAuthoredClaimtype fully defined with provenance/invariant/consequence fields- Claims storage in
.aphoria/claims.toml(ClaimsFile implementation) - CLI commands:
aphoria claim create|list|explain|update|supersede|deprecate - All 1055 tests passing
Verification Engine Enhancements - ✅ COMPLETE (2026-02-08)
- Added
ContainsandNotContainscomparison modes for substring/list checking verify runcommand verifies authored claims against observationsverify mapshows extractor-to-claim coverage- Inline marker support:
@aphoria:claim[category]comments in code - Marker workflow:
list-markers,formalize-marker,reject-markercommands - All 47 verification tests passing (39 existing + 8 new for Contains/NotContains)
- Maxwell dogfooding: 10/10 claims verified, false negative bug eliminated
See commit history for implementation details.
The Problem in One Sentence
Aphoria extracts observations about source code and calls them "claims," but they aren't claims -- they're grep results wearing Episteme vocabulary.
Current Architecture: What Actually Happens
Scan Flow (Ephemeral Mode)
This is the fast path (~0.25s), used for CI/pre-commit. Traced from scanner.rs:52 through to report output.
sequenceDiagram
participant CLI as CLI (main.rs)
participant Handler as handle_scan()
participant Scanner as run_scan()
participant Walker as walk_project()
participant Registry as ExtractorRegistry
participant Bridge as bridge.rs
participant Corpus as corpus.rs
participant Index as ConceptIndex
participant Conflict as conflict.rs
participant Report as Formatter
CLI->>Handler: ScanArgs + AphoriaConfig
Handler->>Scanner: run_scan(args, config)
Note over Scanner: Phase 1: WALK
Scanner->>Walker: walk_project(root, config)
Walker-->>Scanner: Vec<WalkedFile>
Note over Scanner: Phase 2: EXTRACT
loop For each WalkedFile
Scanner->>Registry: extract_all(segments, content, lang, file)
Registry->>Registry: for_language(lang) -> applicable extractors
loop For each Extractor
Registry->>Registry: extractor.extract(segments, content, lang, file)
end
Registry->>Registry: filter by IgnoreCommentParser
Registry-->>Scanner: Vec<Observation>
end
Note over Scanner: Phase 3: CONFLICT DETECTION
Scanner->>Bridge: load_or_generate_key(root)
Bridge-->>Scanner: SigningKey
Scanner->>Corpus: create_authoritative_corpus(key)
Note over Corpus: Hardcoded RFC/OWASP assertions<br/>corpus.rs:33-157
Corpus-->>Scanner: Vec<Assertion> (authority)
Scanner->>Index: ConceptIndex::build(corpus)
Note over Index: make_key() = last 2 path segments<br/>+ "::" + predicate
Index-->>Scanner: ConceptIndex
Scanner->>Conflict: check_conflicts(claims, index, config)
loop For each Observation
Conflict->>Index: lookup(claim.subject, claim.predicate)
Note over Conflict: Tail-path match:<br/>"code://rust/app/tls/cert_verification"<br/>matches "rfc://5246/tls/cert_verification"
Conflict->>Conflict: Compare values, compute score
Conflict->>Conflict: Determine verdict (Block/Flag/Pass)
end
Conflict-->>Scanner: Vec<ConflictResult>
Note over Scanner: Phase 4: REPORT
Scanner->>Report: format(results)
Report-->>CLI: Table / JSON / SARIF / Markdown
Key code locations:
- Entry:
handlers/scan.rs:8-71 - Orchestration:
scanner.rs:52-117 - Walker:
walker/mod.rs:115-175 - Extraction:
registry.rs:289-304 - Corpus build:
corpus.rs:33-157 - Index:
concept_index.rs:30-110 - Conflict:
conflict.rs:64-200
Scan Flow (Persistent Mode with --persist --sync)
The full Episteme path, used for drift detection and observation write-back.
sequenceDiagram
participant Scanner as run_scan()
participant Episteme as LocalEpisteme
participant WAL as Journal (WAL)
participant Store as HybridStore
participant Bridge as bridge.rs
participant Index as ConceptIndex
participant Drift as drift.rs
participant Hosted as HostedClient
Note over Scanner: Same walk + extract as ephemeral
Scanner->>Episteme: LocalEpisteme::open(config, root)
Episteme->>WAL: Journal::open(wal_dir)
Episteme->>Store: HybridStore::open(store_dir)
Episteme-->>Scanner: LocalEpisteme
Note over Scanner: Ingest claims as Tier 3 assertions
Scanner->>Episteme: ingest_claims(all_claims)
loop For each claim
Episteme->>Bridge: claim_to_assertion(claim, key, ts)
Note over Bridge: SourceClass::Expert (Tier 3)<br/>lifecycle: Approved<br/>parent_hash: None<br/>epoch: None
Bridge-->>Episteme: Assertion
Episteme->>WAL: journal.append(serialized)
end
Note over Scanner: Build index from corpus + imported assertions
Scanner->>Episteme: fetch_authoritative_assertions()
Episteme-->>Scanner: Vec<Assertion> (from store)
Scanner->>Index: ConceptIndex::build_with_aliases(corpus, aliases)
Note over Scanner: Check conflicts
Scanner->>Episteme: check_conflicts(claims, config, index)
Episteme-->>Scanner: Vec<ConflictResult>
Note over Scanner: Check drift against prior observations
Scanner->>Drift: check_drift(non_conflicting_claims)
Drift->>Store: fetch_observations_for_concept(path)
Note over Drift: Compare current value vs prior<br/>If different -> DriftResult
Drift-->>Scanner: Vec<DriftResult>
Note over Scanner: Write back novel observations as Tier 4
Scanner->>Episteme: ingest_observations(novel_claims)
loop For each observation
Episteme->>Bridge: claim_to_observation(claim, key, ts)
Note over Bridge: SourceClass::Community (Tier 4)<br/>weight: 0.3
Bridge-->>Episteme: Assertion
Episteme->>WAL: journal.append(serialized)
Episteme->>Store: predicate_index("observation", hash)
end
opt If hosted mode enabled
Scanner->>Hosted: push_observations(assertions)
Hosted-->>Scanner: PushObservationsResponse
end
Key code locations:
- Persistent path:
scanner.rs:195-325 - LocalEpisteme::open:
local/mod.rs:44-124 - Ingest claims:
local/store.rs:20-96 - Ingest observations:
local/store.rs:105-165 - Drift detection:
drift.rs:23-57 - Hosted push:
hosted.rs:178+
What We Built (Grounded)
Aphoria has 42 built-in extractors (registry.rs:327 -- BUILTIN_EXTRACTOR_COUNT: usize = 42) that scan source code with regex patterns and produce Observation structs:
// types/claim.rs:7-31
pub struct Observation {
pub concept_path: String, // e.g., "code://rust/maxwell/hypervisor/lib/imports/firecracker"
pub predicate: String, // e.g., "imported"
pub value: ObjectValue, // Boolean(true)
pub file: String, // "hypervisor/src/lib.rs"
pub line: usize, // 24
pub matched_text: String, // "use firecracker_sdk::..."
pub confidence: f32, // 1.0
pub description: String, // "Module imports firecracker"
}
We ran this on Maxwell and got 67 "claims" with zero noise. We celebrated.
Then we looked at the output and asked: what is the claim being made here?
The answer is: there is no claim. imported: true is an index entry. No one will ever assert imported: false. There's no conflict to resolve, no lens needed, no reason to store this in an append-only Merkle DAG. It's grep "use firecracker" with extra steps.
Verified Against Code
| Extractor | File | Predicate Used | What It Actually Produces |
|---|---|---|---|
import_graph |
extractors/import_graph.rs |
"imported" with Boolean(true) |
grep for use statements |
derive_pattern |
extractors/derive_pattern.rs |
"derives" with Text("Clone,Debug") |
AST metadata extraction |
const_declarations |
extractors/const_declarations.rs |
"value" with literal value |
copy of the source line |
unsafe_atomic |
extractors/unsafe_atomic.rs |
"pattern" with Text("SeqCst") |
grep for Ordering:: |
None of these can conflict. None need lenses. None benefit from Episteme's architecture.
What a Real Claim Looks Like
After the scan, we wrote claims-explained.md by hand for Maxwell. That document contains actual claims. Compare:
What Aphoria produces (unsafe_atomic extractor, extractors/unsafe_atomic.rs):
Subject: "code://rust/maxwell/core/wallet/atomics/ordering"
Predicate: "pattern"
Value: "SeqCst"
What a human wrote:
"All wallet atomic operations MUST use SeqCst to prevent double-spend race conditions. Weakening to Relaxed or Acquire/Release is a correctness bug."
What Episteme expects (from stemedb-core/src/types/assertion.rs):
Subject: "maxwell/wallet/atomics/ordering"
Predicate: "required_ordering"
Value: "SeqCst"
Source: Safety analysis by lead developer
Authority: Tier 3 (Expert) -- with real evidence
Evidence: "AtomicU64 balance requires sequential consistency
to prevent double-spend. See wallet ADR-003."
Parent: None (original assertion)
Epoch: Some("maxwell-v1.0")
More examples from the same scan:
Aphoria says: core/thermal/const/rapl_power_unit = 0x606
The claim is: "Intel MSR register address for reading CPU power units. Sourced from Intel SDM Vol 4. If this changes, either the code is wrong or targeting different hardware."
Aphoria says: wallet/type/wallet/derives = Debug
The claim is: "Wallet MUST NOT derive Clone because singleton ownership is a safety invariant. Wallet contains AtomicU64 -- cloning it creates divergent state."
Aphoria says: vsock/message/agentmessage/derives = Clone,Debug,Deserialize,Serialize
The claim is: "All vsock message types MUST derive Serialize+Deserialize because they cross the VM boundary via bincode. If serde appears in core imports, internal types are leaking into the wire protocol."
The difference: observations describe what is. Claims describe what must be and why. Claims have provenance, consequences, and can conflict with each other.
The Fundamental Gap (Code-Grounded)
Episteme is a knowledge graph for conflicting claims with lineage and resolution. Aphoria uses it as a document store for scan results.
The bridge.rs conversion (bridge.rs:45-92) forces observations into the Assertion schema:
| Assertion Field | What Episteme Expects | What bridge.rs Provides | Code Reference |
|---|---|---|---|
source_hash |
Hash of source document (RFC, paper) | blake3(file + line + matched_text) |
bridge.rs:107-113 |
source_class |
Tiered authority (0=Regulatory...4=Community) | Always SourceClass::Expert (Tier 3) for claims |
bridge.rs:25 |
source_metadata |
{journal, DOI, author, standard} |
{file, line, matched_text, scan_tool, scan_version} |
bridge.rs:52-58 |
parent_hash |
Links to superseded assertion | Always None |
bridge.rs:79 |
epoch |
Paradigm context (e.g., "post-quantum") | Always None |
bridge.rs:89 |
lifecycle |
Pending -> Review -> Approved | Always LifecycleStage::Approved (skips review) |
bridge.rs:85 |
evidence |
Provenance chain, ADR references | Not present in Observation at all |
types/claim.rs:7-31 |
We're using a Mercedes as a shopping cart.
Partial Mitigation Already Exists
claim_to_observation() (bridge.rs:36-42) creates Tier 4 (Community) assertions for write-back. But this is only used in the --sync path for drift detection -- the default claim_to_assertion() still uses Tier 3.
What the Workflow Should Be
Target: Commit-Time Claim Authoring
sequenceDiagram
participant Dev as Developer
participant Skill as Aphoria Skill (.claude/skills/)
participant Graph as Episteme Knowledge Graph
participant Scanner as aphoria scan (audit mode)
participant Report as Claims-Explained View
Note over Dev: Developer commits code
Dev->>Skill: Review diff
Skill->>Skill: Identify claimable changes
Note over Skill: Claimable = new constants from specs,<br/>ordering changes, boundary crossings,<br/>derive changes on serialized types<br/><br/>NOT claimable = renamed variables,<br/>whitespace, internal refactors
Skill->>Graph: Look up existing claims for context
Graph-->>Skill: Related claims (if any)
alt Diff contradicts existing claim
Skill->>Dev: "This contradicts claim X. Fix code or supersede claim?"
Dev->>Skill: Decision + evidence
Skill->>Graph: Create superseding claim (parent_hash = old claim)
else New claimable pattern
Skill->>Dev: "This looks claimable. Author a claim?"
Dev->>Skill: Provenance + invariant + consequence
Skill->>Graph: Submit authored claim with lineage
end
Note over Skill: Create extractor for audit
Skill->>Scanner: Register extractor paired with claim
Note over Scanner: Later: Audit runs
Scanner->>Graph: For each claim, verify code matches
Graph-->>Scanner: Expected values
Scanner->>Scanner: Extractor output vs claim
Scanner-->>Report: PASS / CONFLICT / DRIFT
Report->>Report: Auto-generate claims-explained.md
Audit Flow: Two Directions
Direction 1: Scan code, check against claims (what Aphoria partially does today)
sequenceDiagram
participant Scanner as aphoria audit
participant Extractors as ExtractorRegistry
participant Code as Source Files
participant Graph as Episteme (Claims)
participant Report as Audit Report
Scanner->>Code: Walk project files
Scanner->>Extractors: extract_all(file) -> Vec<Observation>
loop For each Observation
Scanner->>Graph: lookup_claim(observation.subject, observation.predicate)
alt Claim exists
alt observation.value == claim.value
Scanner->>Report: PASS (code matches claim)
else observation.value != claim.value
Scanner->>Report: CONFLICT (code contradicts claim)
Note over Report: Score by authority tier,<br/>apply lenses for resolution
end
else No claim exists
Scanner->>Report: REVIEW ("should this be a claim?")
end
end
Direction 2: Walk claims, verify in code (does not exist today)
sequenceDiagram
participant Scanner as aphoria audit --verify-claims
participant Graph as Episteme (Claims)
participant Extractors as Paired Extractors
participant Code as Source Files
participant Report as Audit Report
Scanner->>Graph: List all authored claims
Graph-->>Scanner: Vec<Claim>
loop For each Claim
Scanner->>Extractors: Find extractor paired with this claim
alt Extractor exists
Extractors->>Code: Run extractor on relevant files
Code-->>Extractors: Vec<Observation>
alt Observation matches claim
Scanner->>Report: PASS
else Observation contradicts claim
Scanner->>Report: CONFLICT
end
alt No observation found (code deleted?)
Scanner->>Report: MISSING (claimed pattern not found)
end
else No paired extractor
Scanner->>Report: UNCHECKED (no extractor for this claim)
end
end
Note over Report: Catches:<br/>- Deleted code (claim says X exists, it doesn't)<br/>- Drifted values (claim says 0x606, code says 0x607)<br/>- Unenforced policies (claim says "no tokio in core")
Extracted Claims from This Document
The following claims were extracted using the extract-claims skill pattern. Each is testable against the current codebase.
Architecture Claims (Verified)
| ID | Claim | Verification Status | Code Reference |
|---|---|---|---|
| VG-001 | Aphoria has 42 built-in extractors | VERIFIED | registry.rs:327 -- BUILTIN_EXTRACTOR_COUNT: usize = 42 |
| VG-002 | import_graph extractor uses predicate "imported" with Boolean(true) |
VERIFIED | import_graph.rs -- only produces imported: true |
| VG-003 | unsafe_atomic extractor uses predicate "pattern" |
VERIFIED | unsafe_atomic.rs -- uses generic "pattern" predicate |
| VG-004 | bridge.rs default path uses SourceClass::Expert (Tier 3) |
VERIFIED | bridge.rs:25 -- claim_to_assertion() calls with SourceClass::Expert |
| VG-005 | bridge.rs always sets parent_hash: None |
VERIFIED | bridge.rs:79 |
| VG-006 | bridge.rs always sets epoch: None |
VERIFIED | bridge.rs:89 |
| VG-007 | bridge.rs always sets lifecycle: LifecycleStage::Approved |
VERIFIED | bridge.rs:85 |
| VG-008 | source_metadata contains {file, line, matched_text, scan_tool, scan_version} only |
VERIFIED | bridge.rs:52-58 |
| VG-009 | Observation has no evidence/provenance field |
VERIFIED | types/claim.rs:7-31 -- only has location, value, confidence |
| VG-010 | claim_to_observation() uses Tier 4 (Community) |
VERIFIED | bridge.rs:36-42 |
| VG-011 | Extractor trait has no mechanism to receive claims for verification | ✅ CLOSED | traits.rs:68-107 -- verifiable_predicates() method added, 10 extractors declare predicates |
Gap Claims (What Doesn't Exist)
| ID | Claim | Gap |
|---|---|---|
| VG-020 | Observation type exists and is properly named |
✅ CLOSED — Observation renamed to Observation in Phase A1 |
| VG-021 | A real Claim type should exist with provenance, invariant, consequence, authority |
No such type exists anywhere |
| VG-022 | Extractors should be paired with claims they verify | ✅ CLOSED — verifiable_predicates() added to Extractor trait; 10 extractors declare predicates; compute_extractor_claim_map() in verify.rs; aphoria verify map shows coverage |
| VG-023 | aphoria audit command should exist |
No audit subcommand in CLI |
| VG-024 | Claims should support supersession via parent_hash |
parent_hash is always None |
| VG-025 | aphoria claims list / aphoria claims explain should exist |
No claims subcommand |
| VG-026 | Corpus should be emergent from community patterns, not hardcoded | ✅ CLOSED — Community corpus builder queries StemeDB pattern aggregates; hardcoded.rs deleted; bootstrap via wiki import or Trust Packs |
| VG-027 | Conflict resolution should use Episteme lenses | No lens invoked during scan |
| VG-028 | Direction 2 audit (walk claims, verify code) doesn't exist | ✅ CLOSED — aphoria verify run walks claims and checks code |
| VG-029 | Skill should be primary claim authoring interface | No .claude/skills/aphoria skill exists |
What Needs to Change
1. Claims are authored, not extracted
Extractors don't produce claims. Humans (assisted by the Aphoria skill) produce claims. Extractors produce observations that are checked against claims.
The type system should reflect this:
// CURRENT (types/claim.rs:7-31) - Phase A1 COMPLETE
pub struct Observation {
pub concept_path: String,
pub predicate: String,
pub value: ObjectValue,
pub file: String,
pub line: usize,
pub matched_text: String,
pub confidence: f32,
pub description: String,
}
// Already exists as Observation (was Observation before A1)
pub struct Observation {
pub concept_path: String,
pub predicate: String,
pub value: ObjectValue,
pub file: String,
pub line: usize,
pub matched_text: String,
pub confidence: f32,
pub description: String,
}
// TARGET: New Claim type (does not exist today)
pub struct AuthoredClaim {
pub concept_path: String,
pub predicate: String,
pub value: ObjectValue,
pub provenance: String, // Where did this come from? (Intel SDM, RFC, ADR)
pub invariant: String, // What must remain true?
pub consequence: String, // What breaks if violated?
pub authority_tier: SourceClass, // Tier 0-4
pub evidence_chain: Vec<String>, // References to supporting documents
pub parent_hash: Option<Hash>, // Supersedes which claim?
pub epoch: Option<String>, // Paradigm context
}
2. The skill is the primary interface, not the scanner
The .claude/skills/aphoria skill should be the main way claims enter the system. It:
- Understands the project's claim vocabulary
- Reviews diffs for claimable changes
- Looks up existing claims for context
- Helps author claims with proper lineage
- Submits them as real Episteme assertions
The scanner (aphoria scan) becomes the audit tool -- it verifies that code matches claims, not the other way around.
3. Extractors serve the audit, not the authoring
The Extractor trait (traits.rs:68-94) needs to change:
// CURRENT: Extractors produce observations from thin air
pub trait Extractor: Send + Sync {
fn name(&self) -> &str;
fn languages(&self) -> &[Language];
fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<Observation>;
}
// TARGET: Extractors can also verify observations against claims
pub trait Extractor: Send + Sync {
fn name(&self) -> &str;
fn languages(&self) -> &[Language];
fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<Observation>;
/// Claims this extractor can verify (empty = observation-only extractor)
fn verifiable_claims(&self) -> &[&str] { &[] }
/// Verify a specific claim against extracted observations
fn verify(&self, claim: &AuthoredClaim, observations: &[Observation]) -> VerifyResult {
VerifyResult::Unchecked
}
}
4. The corpus should be proper assertions
Today, RFC/OWASP knowledge is built procedurally in corpus.rs:33-157. The ConflictingSource::extract_citation() in types/claim.rs:89-111 already handles rfc:// and owasp:// URI schemes -- the infrastructure for proper corpus assertions partially exists.
Target: corpus data stored as real Episteme assertions with proper lineage, not rebuilt every scan.
5. The claims-explained.md pattern should be the product
The workflow that produces it:
flowchart TD
A[aphoria scan] -->|produces| B[Observations]
B -->|skill identifies| C{Claimable?}
C -->|yes| D[Developer authors claim<br/>with skill assistance]
C -->|no| E[Discard / log as observation]
D -->|submit| F[Episteme Knowledge Graph]
F -->|future scans| G[aphoria audit checks<br/>code against claims]
G -->|generates| H[claims-explained.md<br/>auto-generated from graph]
F -->|new observations| I{Matches existing claim?}
I -->|yes, same value| J[PASS]
I -->|no, different value| K[CONFLICT]
I -->|claim about deleted code| L[MISSING]
Proposed Extractors for Audit Flow
These extractors don't exist today. They're needed to close the gap between observations and claims.
Self-Audit Extractors (Meta)
These extractors audit Aphoria's own code to verify the claims in this document remain true:
| Extractor Name | What It Verifies | Pattern |
|---|---|---|
bridge_source_class_audit |
bridge.rs default tier assignment |
Regex for SourceClass::Expert in claim_to_assertion |
bridge_parent_hash_audit |
Whether parent_hash is always None |
Regex for parent_hash: None in bridge |
bridge_lifecycle_audit |
Whether lifecycle skips review | Regex for LifecycleStage::Approved without Pending |
extractor_trait_audit |
Whether Extractor trait accepts claims | Check trait definition for claim parameter |
type_naming_audit |
Whether Observation has been renamed |
Grep for struct Observation vs struct Observation |
Claim-Paired Extractors (Project-Specific)
These are examples of what extractor-claim pairs look like for a project like Maxwell:
| Claim | Extractor | Verification |
|---|---|---|
| "Wallet atomics MUST use SeqCst" | unsafe_atomic (exists) |
Check all Ordering:: in wallet/ are SeqCst |
| "Wallet MUST NOT derive Clone" | derive_pattern (exists) |
Check #[derive( on Wallet struct excludes Clone |
| "vsock types MUST derive Serialize+Deserialize" | derive_pattern (exists) |
Check all structs in vsock/ derive both |
| "RAPL_POWER_UNIT MUST be 0x606" | const_declarations (exists) |
Check const value matches Intel SDM |
| "Core modules MUST NOT import tokio" | import_graph (exists) |
Check no use tokio in core/ |
The existing extractors can already produce the observations needed. What's missing is the claim to compare against and the pairing mechanism to connect them.
Declarative Extractor Examples
Using the existing DeclarativeExtractor system (extractors/declarative/), claim-paired extractors can be defined in aphoria.toml:
[[extractors.declarative]]
name = "wallet_seqcst_policy"
description = "Wallet atomics must use SeqCst ordering"
languages = ["rust"]
pattern = 'Ordering::(Relaxed|AcqRel|Acquire|Release)'
claim.subject = "policy/wallet/atomics/ordering"
claim.predicate = "forbidden_ordering"
claim.value = { type = "boolean", value = true }
confidence = 0.95
source = { claim_id = "wallet-seqcst-001", authority = "safety-analysis" }
[[extractors.declarative]]
name = "core_no_tokio_policy"
description = "Core modules must not import tokio"
languages = ["rust"]
pattern = 'use tokio'
claim.subject = "policy/core/imports/tokio"
claim.predicate = "forbidden_import"
claim.value = { type = "boolean", value = true }
confidence = 0.95
source = { claim_id = "arch-boundary-001", authority = "architecture-decision" }
The Path Forward
Phase 1: Distinguish observations from claims
- Rename
ObservationtoObservationintypes/claim.rs✅ COMPLETE (Phase A1) - Create
AuthoredClaimtype with provenance, invariant, consequence, authority, evidence_chain - Update
bridge.rsdefault path to use Tier 4/5 (not Tier 3) for scanner output - Add
evidencefield tosource_metadatain bridge
Phase 2: Build the authoring workflow
- Create
.claude/skills/aphoriaskill for claim authoring - Add
aphoria claims createCLI command - Add
aphoria claims updatewithparent_hashsupersession - Add
aphoria claims listandaphoria claims explain - Store authored claims as proper Episteme assertions with lineage
Phase 3: Pair extractors with claims
- Extend
Extractortrait withverifiable_claims()andverify()methods - Add
aphoria auditcommand (both directions) - Map each existing extractor to claims it can verify
- Flag observations without matching claims as "should this be a claim?"
Phase 4: Make the corpus first-class
- Community corpus queries StemeDB for pattern aggregates (not hardcoded)
- Wiki import (
aphoria corpus import --from-wiki) for bootstrap - Trust Packs store assertions in StemeDB (not TOML files)
- Wire up Authority Lens for conflict resolution
Phase 5: The flywheel
- More claims authored per commit
- Better audit coverage (extractors verify more claims)
- Skill learns from authored claims what's claimable
- Claims-explained documentation auto-generates from knowledge graph
- New team members read claims to understand WHY, not just WHAT
Summary
We built a good code scanner. We didn't build a knowledge graph client.
The extractors work well at finding patterns. But finding patterns isn't the point -- understanding what those patterns mean, why they must be that way, and what breaks if they change is the point.
The Maxwell claims-explained.md proves the concept works. Every one of those 67 observations becomes valuable when paired with provenance and invariants. The gap is that today a human has to write that context by hand.
Close the gap by making the skill -- not the scanner -- the primary interface, and by treating claims as authored artifacts with lineage rather than regex output with a fancy name.
Appendix: Claim Extraction Summary
This document contains 94 extractable claims across 52 unique subjects:
- 11 architecture claims: Verified against current code (all confirmed true)
- 10 gap claims: Define what doesn't exist yet
- 5 bridge.rs claims: Code-verifiable, confirmed (source_hash faked, source_class hardcoded, parent_hash ignored, epoch ignored, evidence empty)
- 15 phase-plan claims: Define specific deliverables and tasks
- 20+ workflow claims: Define the target authoring/audit model
- 5 claimability rules: What counts as claimable in a diff (spec constants=yes, ordering changes=yes, boundary crossings=yes, derive changes on serialized types=yes, renamed variables=no)
- 4 Maxwell examples: Real claims about SeqCst ordering, Wallet derives, vsock serialization, RAPL_POWER_UNIT
The most critical engineering gap: no extractor currently has the ability to verify against existing claims. CLOSED (2026-02-08): The Extractor trait now includes verifiable_predicates() returning (tail_path, predicate) pairs. 10 extractors declare their predicates. compute_extractor_claim_map() matches claims against extractors (with wildcard support). aphoria verify map shows coverage. Direction 2 audit (walk claims, verify code) is now implemented via aphoria verify run.