stemedb/applications/aphoria/docs/vision-gaps.md
jml cce54358d2 feat(aphoria): add git commit tracking + comprehensive documentation
2026-02-08 18:36:46 +00:00


Aphoria Vision Gaps

Date: 2026-02-08
Status: Honest assessment of where we are vs. where we need to be
Grounded Against: Codebase as of commit e0d2940 (42 extractors, bridge.rs, ephemeral/persistent modes)

Implementation Status

Phase A1: Distinguish Observations from Claims - COMPLETE (2026-02-08)

  • Renamed ExtractedClaim → Observation (struct + 81 files updated)
  • Added confidence-based tier mapping: ≥0.9 → Tier 4, <0.9 → Tier 5
  • observation_to_assertion() replaces fixed Tier 3 assignment
  • AuthoredClaim type fully defined with provenance/invariant/consequence fields
  • Claims storage in .aphoria/claims.toml (ClaimsFile implementation)
  • CLI commands: aphoria claim create|list|explain|update|supersede|deprecate
  • All 1055 tests passing
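
The confidence-to-tier rule in the list above can be sketched as a single function. This is a minimal illustration, not the code in bridge.rs: SourceTier and tier_for_confidence are hypothetical names, and only the 0.9 threshold and the Tier 4 / Tier 5 split come from the list.

```rust
/// Hypothetical stand-in for the tiers scanner output may receive.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceTier {
    Tier4,
    Tier5,
}

/// Map extractor confidence to a tier: >= 0.9 -> Tier 4, otherwise Tier 5.
pub fn tier_for_confidence(confidence: f32) -> SourceTier {
    if confidence >= 0.9 {
        SourceTier::Tier4
    } else {
        SourceTier::Tier5
    }
}
```

The point of the rule is that scanner output never lands in Tier 3 by default; an observation has to be authored as a claim to earn that authority.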

Verification Engine Enhancements - COMPLETE (2026-02-08)

  • Added Contains and NotContains comparison modes for substring/list checking
  • verify run command verifies authored claims against observations
  • verify map shows extractor-to-claim coverage
  • Inline marker support: @aphoria:claim[category] comments in code
  • Marker workflow: list-markers, formalize-marker, reject-marker commands
  • All 47 verification tests passing (39 existing + 8 new for Contains/NotContains)
  • Maxwell dogfooding: 10/10 claims verified, false negative bug eliminated
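
To make the Contains / NotContains modes concrete, here is a minimal sketch of how such a comparison might behave. The Comparison enum and satisfies function are hypothetical; treating a comma-separated observed value as a list is an assumption based on derive values such as "Clone,Debug" elsewhere in this document.

```rust
/// Hypothetical comparison modes; only the Contains / NotContains names
/// come from the list above.
#[derive(Debug, Clone, Copy)]
pub enum Comparison {
    Equals,
    Contains,
    NotContains,
}

/// List check when the observed value is comma-separated (an assumption),
/// substring check otherwise.
fn found(expected: &str, observed: &str) -> bool {
    if observed.contains(',') {
        observed.split(',').any(|item| item.trim() == expected)
    } else {
        observed.contains(expected)
    }
}

/// Returns true when the observed value satisfies the claim under `mode`.
pub fn satisfies(mode: Comparison, expected: &str, observed: &str) -> bool {
    match mode {
        Comparison::Equals => observed == expected,
        Comparison::Contains => found(expected, observed),
        Comparison::NotContains => !found(expected, observed),
    }
}
```

Under this sketch, a "MUST NOT derive Clone" claim becomes NotContains with expected "Clone", which passes against an observed value of "Debug".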

See commit history for implementation details.


The Problem in One Sentence

Aphoria extracts observations about source code and calls them "claims," but they aren't claims -- they're grep results wearing Episteme vocabulary.


Current Architecture: What Actually Happens

Scan Flow (Ephemeral Mode)

This is the fast path (~0.25s), used for CI/pre-commit. Traced from scanner.rs:52 through to report output.

sequenceDiagram
    participant CLI as CLI (main.rs)
    participant Handler as handle_scan()
    participant Scanner as run_scan()
    participant Walker as walk_project()
    participant Registry as ExtractorRegistry
    participant Bridge as bridge.rs
    participant Corpus as corpus.rs
    participant Index as ConceptIndex
    participant Conflict as conflict.rs
    participant Report as Formatter

    CLI->>Handler: ScanArgs + AphoriaConfig
    Handler->>Scanner: run_scan(args, config)

    Note over Scanner: Phase 1: WALK
    Scanner->>Walker: walk_project(root, config)
    Walker-->>Scanner: Vec<WalkedFile>

    Note over Scanner: Phase 2: EXTRACT
    loop For each WalkedFile
        Scanner->>Registry: extract_all(segments, content, lang, file)
        Registry->>Registry: for_language(lang) -> applicable extractors
        loop For each Extractor
            Registry->>Registry: extractor.extract(segments, content, lang, file)
        end
        Registry->>Registry: filter by IgnoreCommentParser
        Registry-->>Scanner: Vec<ExtractedClaim>
    end

    Note over Scanner: Phase 3: CONFLICT DETECTION
    Scanner->>Bridge: load_or_generate_key(root)
    Bridge-->>Scanner: SigningKey

    Scanner->>Corpus: create_authoritative_corpus(key)
    Note over Corpus: Hardcoded RFC/OWASP assertions<br/>corpus.rs:33-157
    Corpus-->>Scanner: Vec<Assertion> (authority)

    Scanner->>Index: ConceptIndex::build(corpus)
    Note over Index: make_key() = last 2 path segments<br/>+ "::" + predicate
    Index-->>Scanner: ConceptIndex

    Scanner->>Conflict: check_conflicts(claims, index, config)
    loop For each ExtractedClaim
        Conflict->>Index: lookup(claim.subject, claim.predicate)
        Note over Conflict: Tail-path match:<br/>"code://rust/app/tls/cert_verification"<br/>matches "rfc://5246/tls/cert_verification"
        Conflict->>Conflict: Compare values, compute score
        Conflict->>Conflict: Determine verdict (Block/Flag/Pass)
    end
    Conflict-->>Scanner: Vec<ConflictResult>

    Note over Scanner: Phase 4: REPORT
    Scanner->>Report: format(results)
    Report-->>CLI: Table / JSON / SARIF / Markdown

Key code locations:

  • Entry: handlers/scan.rs:8-71
  • Orchestration: scanner.rs:52-117
  • Walker: walker/mod.rs:115-175
  • Extraction: registry.rs:289-304
  • Corpus build: corpus.rs:33-157
  • Index: concept_index.rs:30-110
  • Conflict: conflict.rs:64-200
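
The tail-path matching noted in the diagram (last 2 path segments + "::" + predicate) can be sketched as follows. This is an approximation under stated assumptions, not the code in concept_index.rs: the "/" separator inside the key, the scheme stripping, the build_index helper, and the "enabled" predicate used in the usage note are all illustrative.

```rust
use std::collections::HashMap;

/// Build a lookup key from the last two path segments plus the predicate.
pub fn make_key(subject: &str, predicate: &str) -> String {
    // Drop the URI scheme ("code://", "rfc://") if present.
    let path = subject.split_once("://").map_or(subject, |(_, rest)| rest);
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    let start = segments.len().saturating_sub(2);
    format!("{}::{}", segments[start..].join("/"), predicate)
}

/// Index authoritative (subject, predicate, value) triples by tail-path key.
pub fn build_index(corpus: &[(&str, &str, &str)]) -> HashMap<String, String> {
    corpus
        .iter()
        .map(|(subject, predicate, value)| (make_key(subject, predicate), value.to_string()))
        .collect()
}
```

With this keying, "code://rust/app/tls/cert_verification" and "rfc://5246/tls/cert_verification" both reduce to "tls/cert_verification::enabled" for a predicate of "enabled", which is why a code observation can be looked up against an RFC assertion despite the different schemes and prefixes.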

Scan Flow (Persistent Mode with --persist --sync)

The full Episteme path, used for drift detection and observation write-back.

sequenceDiagram
    participant Scanner as run_scan()
    participant Episteme as LocalEpisteme
    participant WAL as Journal (WAL)
    participant Store as HybridStore
    participant Bridge as bridge.rs
    participant Index as ConceptIndex
    participant Drift as drift.rs
    participant Hosted as HostedClient

    Note over Scanner: Same walk + extract as ephemeral

    Scanner->>Episteme: LocalEpisteme::open(config, root)
    Episteme->>WAL: Journal::open(wal_dir)
    Episteme->>Store: HybridStore::open(store_dir)
    Episteme-->>Scanner: LocalEpisteme

    Note over Scanner: Ingest claims as Tier 3 assertions
    Scanner->>Episteme: ingest_claims(all_claims)
    loop For each claim
        Episteme->>Bridge: claim_to_assertion(claim, key, ts)
        Note over Bridge: SourceClass::Expert (Tier 3)<br/>lifecycle: Approved<br/>parent_hash: None<br/>epoch: None
        Bridge-->>Episteme: Assertion
        Episteme->>WAL: journal.append(serialized)
    end

    Note over Scanner: Build index from corpus + imported assertions
    Scanner->>Episteme: fetch_authoritative_assertions()
    Episteme-->>Scanner: Vec<Assertion> (from store)
    Scanner->>Index: ConceptIndex::build_with_aliases(corpus, aliases)

    Note over Scanner: Check conflicts
    Scanner->>Episteme: check_conflicts(claims, config, index)
    Episteme-->>Scanner: Vec<ConflictResult>

    Note over Scanner: Check drift against prior observations
    Scanner->>Drift: check_drift(non_conflicting_claims)
    Drift->>Store: fetch_observations_for_concept(path)
    Note over Drift: Compare current value vs prior<br/>If different -> DriftResult
    Drift-->>Scanner: Vec<DriftResult>

    Note over Scanner: Write back novel observations as Tier 4
    Scanner->>Episteme: ingest_observations(novel_claims)
    loop For each observation
        Episteme->>Bridge: claim_to_observation(claim, key, ts)
        Note over Bridge: SourceClass::Community (Tier 4)<br/>weight: 0.3
        Bridge-->>Episteme: Assertion
        Episteme->>WAL: journal.append(serialized)
        Episteme->>Store: predicate_index("observation", hash)
    end

    opt If hosted mode enabled
        Scanner->>Hosted: push_observations(assertions)
        Hosted-->>Scanner: PushObservationsResponse
    end

Key code locations:

  • Persistent path: scanner.rs:195-325
  • LocalEpisteme::open: local/mod.rs:44-124
  • Ingest claims: local/store.rs:20-96
  • Ingest observations: local/store.rs:105-165
  • Drift detection: drift.rs:23-57
  • Hosted push: hosted.rs:178+
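
The drift step in the diagram (compare current value vs prior; if different, emit a DriftResult) can be sketched with plain maps. The field names on DriftResult and the check_drift signature here are assumptions for illustration; the real drift.rs works against the store rather than an in-memory map.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub struct DriftResult {
    pub concept_path: String,
    pub prior: String,
    pub current: String,
}

/// Compare current observations against the prior value stored per concept path.
/// Paths with no prior observation are novel, not drifted, so they are skipped.
pub fn check_drift(
    prior: &HashMap<String, String>,
    current: &[(String, String)],
) -> Vec<DriftResult> {
    current
        .iter()
        .filter_map(|(path, value)| {
            let old = prior.get(path)?;
            (old != value).then(|| DriftResult {
                concept_path: path.clone(),
                prior: old.clone(),
                current: value.clone(),
            })
        })
        .collect()
}
```

Novel paths fall through to the write-back step instead, which is how they become the prior observations for the next scan.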

What We Built (Grounded)

Aphoria has 42 built-in extractors (registry.rs:327 -- BUILTIN_EXTRACTOR_COUNT: usize = 42) that scan source code with regex patterns and produce ExtractedClaim structs:

// types/claim.rs:7-31
pub struct ExtractedClaim {
    pub concept_path: String,      // e.g., "code://rust/maxwell/hypervisor/lib/imports/firecracker"
    pub predicate: String,         // e.g., "imported"
    pub value: ObjectValue,        // Boolean(true)
    pub file: String,              // "hypervisor/src/lib.rs"
    pub line: usize,               // 24
    pub matched_text: String,      // "use firecracker_sdk::..."
    pub confidence: f32,           // 1.0
    pub description: String,       // "Module imports firecracker"
}

We ran this on Maxwell and got 67 "claims" with zero noise. We celebrated.

Then we looked at the output and asked: what is the claim being made here?

The answer is: there is no claim. imported: true is an index entry. No one will ever assert imported: false. There's no conflict to resolve, no lens needed, no reason to store this in an append-only Merkle DAG. It's grep "use firecracker" with extra steps.

Verified Against Code

| Extractor | File | Predicate Used | What It Actually Produces |
|---|---|---|---|
| import_graph | extractors/import_graph.rs | "imported" with Boolean(true) | grep for use statements |
| derive_pattern | extractors/derive_pattern.rs | "derives" with Text("Clone,Debug") | AST metadata extraction |
| const_declarations | extractors/const_declarations.rs | "value" with literal value | copy of the source line |
| unsafe_atomic | extractors/unsafe_atomic.rs | "pattern" with Text("SeqCst") | grep for Ordering:: |

None of these can conflict. None need lenses. None benefit from Episteme's architecture.


What a Real Claim Looks Like

After the scan, we wrote claims-explained.md by hand for Maxwell. That document contains actual claims. Compare:

What Aphoria produces (unsafe_atomic extractor, extractors/unsafe_atomic.rs):

Subject:    "code://rust/maxwell/core/wallet/atomics/ordering"
Predicate:  "pattern"
Value:      "SeqCst"

What a human wrote:

"All wallet atomic operations MUST use SeqCst to prevent double-spend race conditions. Weakening to Relaxed or Acquire/Release is a correctness bug."

What Episteme expects (from stemedb-core/src/types/assertion.rs):

Subject:    "maxwell/wallet/atomics/ordering"
Predicate:  "required_ordering"
Value:      "SeqCst"
Source:     Safety analysis by lead developer
Authority:  Tier 3 (Expert) -- with real evidence
Evidence:   "AtomicU64 balance requires sequential consistency
            to prevent double-spend. See wallet ADR-003."
Parent:     None (original assertion)
Epoch:      Some("maxwell-v1.0")

More examples from the same scan:

Aphoria says: core/thermal/const/rapl_power_unit = 0x606
The claim is: "Intel MSR register address for reading CPU power units. Sourced from Intel SDM Vol 4. If this changes, either the code is wrong or targeting different hardware."

Aphoria says: wallet/type/wallet/derives = Debug
The claim is: "Wallet MUST NOT derive Clone because singleton ownership is a safety invariant. Wallet contains AtomicU64 -- cloning it creates divergent state."

Aphoria says: vsock/message/agentmessage/derives = Clone,Debug,Deserialize,Serialize
The claim is: "All vsock message types MUST derive Serialize+Deserialize because they cross the VM boundary via bincode. If serde appears in core imports, internal types are leaking into the wire protocol."

The difference: observations describe what is. Claims describe what must be and why. Claims have provenance, consequences, and can conflict with each other.


The Fundamental Gap (Code-Grounded)

Episteme is a knowledge graph for conflicting claims with lineage and resolution. Aphoria uses it as a document store for scan results.

The bridge.rs conversion (bridge.rs:45-92) forces observations into the Assertion schema:

| Assertion Field | What Episteme Expects | What bridge.rs Provides | Code Reference |
|---|---|---|---|
| source_hash | Hash of source document (RFC, paper) | blake3(file + line + matched_text) | bridge.rs:107-113 |
| source_class | Tiered authority (0=Regulatory...4=Community) | Always SourceClass::Expert (Tier 3) for claims | bridge.rs:25 |
| source_metadata | {journal, DOI, author, standard} | {file, line, matched_text, scan_tool, scan_version} | bridge.rs:52-58 |
| parent_hash | Links to superseded assertion | Always None | bridge.rs:79 |
| epoch | Paradigm context (e.g., "post-quantum") | Always None | bridge.rs:89 |
| lifecycle | Pending -> Review -> Approved | Always LifecycleStage::Approved (skips review) | bridge.rs:85 |
| evidence | Provenance chain, ADR references | Not present in ExtractedClaim at all | types/claim.rs:7-31 |

We're using a Mercedes as a shopping cart.

Partial Mitigation Already Exists

claim_to_observation() (bridge.rs:36-42) creates Tier 4 (Community) assertions for write-back. But this is only used in the --sync path for drift detection -- the default claim_to_assertion() still uses Tier 3.


What the Workflow Should Be

Target: Commit-Time Claim Authoring

sequenceDiagram
    participant Dev as Developer
    participant Skill as Aphoria Skill (.claude/skills/)
    participant Graph as Episteme Knowledge Graph
    participant Scanner as aphoria scan (audit mode)
    participant Report as Claims-Explained View

    Note over Dev: Developer commits code

    Dev->>Skill: Review diff
    Skill->>Skill: Identify claimable changes

    Note over Skill: Claimable = new constants from specs,<br/>ordering changes, boundary crossings,<br/>derive changes on serialized types<br/><br/>NOT claimable = renamed variables,<br/>whitespace, internal refactors

    Skill->>Graph: Look up existing claims for context
    Graph-->>Skill: Related claims (if any)

    alt Diff contradicts existing claim
        Skill->>Dev: "This contradicts claim X. Fix code or supersede claim?"
        Dev->>Skill: Decision + evidence
        Skill->>Graph: Create superseding claim (parent_hash = old claim)
    else New claimable pattern
        Skill->>Dev: "This looks claimable. Author a claim?"
        Dev->>Skill: Provenance + invariant + consequence
        Skill->>Graph: Submit authored claim with lineage
    end

    Note over Skill: Create extractor for audit
    Skill->>Scanner: Register extractor paired with claim

    Note over Scanner: Later: Audit runs
    Scanner->>Graph: For each claim, verify code matches
    Graph-->>Scanner: Expected values
    Scanner->>Scanner: Extractor output vs claim
    Scanner-->>Report: PASS / CONFLICT / DRIFT

    Report->>Report: Auto-generate claims-explained.md

Audit Flow: Two Directions

Direction 1: Scan code, check against claims (what Aphoria partially does today)

sequenceDiagram
    participant Scanner as aphoria audit
    participant Extractors as ExtractorRegistry
    participant Code as Source Files
    participant Graph as Episteme (Claims)
    participant Report as Audit Report

    Scanner->>Code: Walk project files
    Scanner->>Extractors: extract_all(file) -> Vec<Observation>

    loop For each Observation
        Scanner->>Graph: lookup_claim(observation.subject, observation.predicate)
        alt Claim exists
            alt observation.value == claim.value
                Scanner->>Report: PASS (code matches claim)
            else observation.value != claim.value
                Scanner->>Report: CONFLICT (code contradicts claim)
                Note over Report: Score by authority tier,<br/>apply lenses for resolution
            end
        else No claim exists
            Scanner->>Report: REVIEW ("should this be a claim?")
        end
    end
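
Direction 1 reduces to a three-way lookup per observation. The sketch below assumes claims are keyed by (subject, predicate) and compared as plain strings; Verdict and check_observation are hypothetical names, and authority-tier scoring and lens resolution are left out.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Pass,     // code matches the claim
    Conflict, // code contradicts the claim
    Review,   // no claim exists: "should this be a claim?"
}

/// `claims` maps (subject, predicate) to the expected value.
pub fn check_observation(
    claims: &HashMap<(String, String), String>,
    subject: &str,
    predicate: &str,
    value: &str,
) -> Verdict {
    match claims.get(&(subject.to_string(), predicate.to_string())) {
        Some(expected) if expected.as_str() == value => Verdict::Pass,
        Some(_) => Verdict::Conflict,
        None => Verdict::Review,
    }
}
```

The Review arm is what separates this from today's scan: an observation with no matching claim is a prompt for a human, not a stored assertion.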

Direction 2: Walk claims, verify in code (does not exist today)

sequenceDiagram
    participant Scanner as aphoria audit --verify-claims
    participant Graph as Episteme (Claims)
    participant Extractors as Paired Extractors
    participant Code as Source Files
    participant Report as Audit Report

    Scanner->>Graph: List all authored claims
    Graph-->>Scanner: Vec<Claim>

    loop For each Claim
        Scanner->>Extractors: Find extractor paired with this claim
        alt Extractor exists
            Extractors->>Code: Run extractor on relevant files
            Code-->>Extractors: Vec<Observation>
            alt Observation matches claim
                Scanner->>Report: PASS
            else Observation contradicts claim
                Scanner->>Report: CONFLICT
            end
            alt No observation found (code deleted?)
                Scanner->>Report: MISSING (claimed pattern not found)
            end
        else No paired extractor
            Scanner->>Report: UNCHECKED (no extractor for this claim)
        end
    end

    Note over Report: Catches:<br/>- Deleted code (claim says X exists, it doesn't)<br/>- Drifted values (claim says 0x606, code says 0x607)<br/>- Unenforced policies (claim says "no tokio in core")
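
The four outcomes of Direction 2 can be sketched as one function over a claim's expected value and whatever its paired extractor found. AuditVerdict and verify_claim are hypothetical names, and values are compared as plain strings for brevity.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum AuditVerdict {
    Pass,
    Conflict,
    Missing,   // extractor ran but the claimed pattern was not found
    Unchecked, // no extractor is paired with the claim
}

/// `observed` is None when no extractor is paired with the claim,
/// and Some(values) with whatever the paired extractor found.
pub fn verify_claim(expected: &str, observed: Option<&[&str]>) -> AuditVerdict {
    match observed {
        None => AuditVerdict::Unchecked,
        Some([]) => AuditVerdict::Missing,
        Some(values) if values.iter().all(|v| *v == expected) => AuditVerdict::Pass,
        Some(_) => AuditVerdict::Conflict,
    }
}
```

Missing and Unchecked are kept distinct on purpose: the first means the code changed, the second means the audit has a coverage hole.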

Extracted Claims from This Document

The following claims were extracted using the extract-claims skill pattern. Each is testable against the current codebase.

Architecture Claims (Verified)

| ID | Claim | Verification Status | Code Reference |
|---|---|---|---|
| VG-001 | Aphoria has 42 built-in extractors | VERIFIED | registry.rs:327 -- BUILTIN_EXTRACTOR_COUNT: usize = 42 |
| VG-002 | import_graph extractor uses predicate "imported" with Boolean(true) | VERIFIED | import_graph.rs -- only produces imported: true |
| VG-003 | unsafe_atomic extractor uses predicate "pattern" | VERIFIED | unsafe_atomic.rs -- uses generic "pattern" predicate |
| VG-004 | bridge.rs default path uses SourceClass::Expert (Tier 3) | VERIFIED | bridge.rs:25 -- claim_to_assertion() calls with SourceClass::Expert |
| VG-005 | bridge.rs always sets parent_hash: None | VERIFIED | bridge.rs:79 |
| VG-006 | bridge.rs always sets epoch: None | VERIFIED | bridge.rs:89 |
| VG-007 | bridge.rs always sets lifecycle: LifecycleStage::Approved | VERIFIED | bridge.rs:85 |
| VG-008 | source_metadata contains {file, line, matched_text, scan_tool, scan_version} only | VERIFIED | bridge.rs:52-58 |
| VG-009 | ExtractedClaim has no evidence/provenance field | VERIFIED | types/claim.rs:7-31 -- only has location, value, confidence |
| VG-010 | claim_to_observation() uses Tier 4 (Community) | VERIFIED | bridge.rs:36-42 |
| VG-011 | Extractor trait has no mechanism to receive claims for verification | CLOSED | traits.rs:68-107 -- verifiable_predicates() method added, 10 extractors declare predicates |

Gap Claims (What Doesn't Exist)

| ID | Claim | Gap |
|---|---|---|
| VG-020 | ExtractedClaim should be renamed to Observation | types/claim.rs still uses ExtractedClaim |
| VG-021 | A real Claim type should exist with provenance, invariant, consequence, authority | No such type exists anywhere |
| VG-022 | Extractors should be paired with claims they verify | CLOSED: verifiable_predicates() added to Extractor trait; 10 extractors declare predicates; compute_extractor_claim_map() in verify.rs; aphoria verify map shows coverage |
| VG-023 | aphoria audit command should exist | No audit subcommand in CLI |
| VG-024 | Claims should support supersession via parent_hash | parent_hash is always None |
| VG-025 | aphoria claims list / aphoria claims explain should exist | No claims subcommand |
| VG-026 | Corpus should be real assertions, not hardcoded in corpus.rs:33-157 | Corpus is built procedurally per scan |
| VG-027 | Conflict resolution should use Episteme lenses | No lens invoked during scan |
| VG-028 | Direction 2 audit (walk claims, verify code) doesn't exist | No inverse audit flow |
| VG-029 | Skill should be primary claim authoring interface | No .claude/skills/aphoria skill exists |

What Needs to Change

1. Claims are authored, not extracted

Extractors don't produce claims. Humans (assisted by the Aphoria skill) produce claims. Extractors produce observations that are checked against claims.

The type system should reflect this:

// CURRENT (types/claim.rs:7-31)
pub struct ExtractedClaim {  // This is an observation, not a claim
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub file: String,
    pub line: usize,
    pub matched_text: String,
    pub confidence: f32,
    pub description: String,
}

// TARGET: New Observation type (rename ExtractedClaim)
pub struct Observation {
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub file: String,
    pub line: usize,
    pub matched_text: String,
    pub confidence: f32,
    pub description: String,
}

// TARGET: New Claim type (does not exist today)
pub struct AuthoredClaim {
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub provenance: String,          // Where did this come from? (Intel SDM, RFC, ADR)
    pub invariant: String,           // What must remain true?
    pub consequence: String,         // What breaks if violated?
    pub authority_tier: SourceClass,  // Tier 0-4
    pub evidence_chain: Vec<String>, // References to supporting documents
    pub parent_hash: Option<Hash>,   // Supersedes which claim?
    pub epoch: Option<String>,       // Paradigm context
}
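
To illustrate what parent_hash buys, here is a minimal sketch of resolving supersession: a claim is current only if no other claim names it as a parent. ClaimRecord is a hypothetical, trimmed-down stand-in for AuthoredClaim, and hashes are plain strings rather than the real Hash type.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down claim record for illustration only.
pub struct ClaimRecord {
    pub hash: String,
    pub parent_hash: Option<String>,
    pub value: String,
}

/// Claims that no other claim names as its parent are the current heads.
pub fn active_claims(claims: &[ClaimRecord]) -> Vec<&ClaimRecord> {
    let superseded: HashSet<&str> = claims
        .iter()
        .filter_map(|c| c.parent_hash.as_deref())
        .collect();
    claims
        .iter()
        .filter(|c| !superseded.contains(c.hash.as_str()))
        .collect()
}
```

Superseded claims stay in the append-only store; they just stop being the value an audit compares against.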

2. The skill is the primary interface, not the scanner

The .claude/skills/aphoria skill should be the main way claims enter the system. It:

  • Understands the project's claim vocabulary
  • Reviews diffs for claimable changes
  • Looks up existing claims for context
  • Helps author claims with proper lineage
  • Submits them as real Episteme assertions

The scanner (aphoria scan) becomes the audit tool -- it verifies that code matches claims, not the other way around.

3. Extractors serve the audit, not the authoring

The Extractor trait (traits.rs:68-94) needs to change:

// CURRENT: Extractors produce observations from thin air
pub trait Extractor: Send + Sync {
    fn name(&self) -> &str;
    fn languages(&self) -> &[Language];
    fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<ExtractedClaim>;
}

// TARGET: Extractors can also verify observations against claims
pub trait Extractor: Send + Sync {
    fn name(&self) -> &str;
    fn languages(&self) -> &[Language];
    fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<Observation>;

    /// Claims this extractor can verify (empty = observation-only extractor)
    fn verifiable_claims(&self) -> &[&str] { &[] }

    /// Verify a specific claim against extracted observations
    fn verify(&self, claim: &AuthoredClaim, observations: &[Observation]) -> VerifyResult {
        VerifyResult::Unchecked
    }
}

4. The corpus should be proper assertions

Today, RFC/OWASP knowledge is built procedurally in corpus.rs:33-157. The ConflictingSource::extract_citation() in types/claim.rs:89-111 already handles rfc:// and owasp:// URI schemes -- the infrastructure for proper corpus assertions partially exists.

Target: corpus data stored as real Episteme assertions with proper lineage, not rebuilt every scan.

5. The claims-explained.md pattern should be the product

The workflow that produces it:

flowchart TD
    A[aphoria scan] -->|produces| B[Observations]
    B -->|skill identifies| C{Claimable?}
    C -->|yes| D[Developer authors claim<br/>with skill assistance]
    C -->|no| E[Discard / log as observation]
    D -->|submit| F[Episteme Knowledge Graph]
    F -->|future scans| G[aphoria audit checks<br/>code against claims]
    G -->|generates| H[claims-explained.md<br/>auto-generated from graph]
    F -->|new observations| I{Matches existing claim?}
    I -->|yes, same value| J[PASS]
    I -->|no, different value| K[CONFLICT]
    I -->|claim about deleted code| L[MISSING]

Proposed Extractors for Audit Flow

These extractors don't exist today. They're needed to close the gap between observations and claims.

Self-Audit Extractors (Meta)

These extractors audit Aphoria's own code to verify the claims in this document remain true:

| Extractor Name | What It Verifies | Pattern |
|---|---|---|
| bridge_source_class_audit | bridge.rs default tier assignment | Regex for SourceClass::Expert in claim_to_assertion |
| bridge_parent_hash_audit | Whether parent_hash is always None | Regex for parent_hash: None in bridge |
| bridge_lifecycle_audit | Whether lifecycle skips review | Regex for LifecycleStage::Approved without Pending |
| extractor_trait_audit | Whether Extractor trait accepts claims | Check trait definition for claim parameter |
| type_naming_audit | Whether ExtractedClaim has been renamed | Grep for struct ExtractedClaim vs struct Observation |

Claim-Paired Extractors (Project-Specific)

These are examples of what extractor-claim pairs look like for a project like Maxwell:

| Claim | Extractor | Verification |
|---|---|---|
| "Wallet atomics MUST use SeqCst" | unsafe_atomic (exists) | Check all Ordering:: in wallet/ are SeqCst |
| "Wallet MUST NOT derive Clone" | derive_pattern (exists) | Check #[derive( on Wallet struct excludes Clone |
| "vsock types MUST derive Serialize+Deserialize" | derive_pattern (exists) | Check all structs in vsock/ derive both |
| "RAPL_POWER_UNIT MUST be 0x606" | const_declarations (exists) | Check const value matches Intel SDM |
| "Core modules MUST NOT import tokio" | import_graph (exists) | Check no use tokio in core/ |

The existing extractors can already produce the observations needed. What's missing is the claim to compare against and the pairing mechanism to connect them.
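
As a concrete instance of such a pairing, the "Wallet MUST NOT derive Clone" check can be sketched with std-only string handling. This is not how derive_pattern is implemented: derives_of is a hypothetical helper, and it assumes the #[derive(...)] attribute sits on the line directly above the struct declaration.

```rust
/// Return the derive list attached to `struct <name>`, assuming the attribute
/// is on the line directly above the declaration (a simplifying assumption).
pub fn derives_of(source: &str, name: &str) -> Option<Vec<String>> {
    let lines: Vec<&str> = source.lines().map(str::trim).collect();
    let decl = format!("struct {}", name);
    // Find the declaration, rejecting longer names such as `WalletConfig`.
    let idx = lines.iter().position(|l| {
        l.trim_start_matches("pub ")
            .strip_prefix(&decl)
            .map_or(false, |rest| {
                !rest.starts_with(|c: char| c.is_alphanumeric() || c == '_')
            })
    })?;
    let attr = idx.checked_sub(1).and_then(|i| lines.get(i).copied())?;
    let inner = attr.strip_prefix("#[derive(")?.strip_suffix(")]")?;
    Some(
        inner
            .split(',')
            .map(|d| d.trim().to_string())
            .filter(|d| !d.is_empty())
            .collect(),
    )
}
```

The claim then holds when the returned list does not contain "Clone"; a None result corresponds to the MISSING outcome rather than a pass.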

Declarative Extractor Examples

Using the existing DeclarativeExtractor system (extractors/declarative/), claim-paired extractors can be defined in aphoria.toml:

[[extractors.declarative]]
name = "wallet_seqcst_policy"
description = "Wallet atomics must use SeqCst ordering"
languages = ["rust"]
pattern = 'Ordering::(Relaxed|AcqRel|Acquire|Release)'
claim.subject = "policy/wallet/atomics/ordering"
claim.predicate = "forbidden_ordering"
claim.value = { type = "boolean", value = true }
confidence = 0.95
source = { claim_id = "wallet-seqcst-001", authority = "safety-analysis" }

[[extractors.declarative]]
name = "core_no_tokio_policy"
description = "Core modules must not import tokio"
languages = ["rust"]
pattern = 'use tokio'
claim.subject = "policy/core/imports/tokio"
claim.predicate = "forbidden_import"
claim.value = { type = "boolean", value = true }
confidence = 0.95
source = { claim_id = "arch-boundary-001", authority = "architecture-decision" }
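
For intuition, the core_no_tokio_policy check amounts to a line scan like the one below. This sketch swaps the regex pattern for plain string matching so it stays std-only; forbidden_imports is a hypothetical helper, not part of the DeclarativeExtractor system.

```rust
/// Return (1-based line number, trimmed line) for each `use <crate_name>` import.
/// Plain string matching stands in for the regex used by declarative extractors.
pub fn forbidden_imports(content: &str, crate_name: &str) -> Vec<(usize, String)> {
    let needle = format!("use {}", crate_name);
    content
        .lines()
        .enumerate()
        .filter(|(_, line)| {
            line.trim_start()
                .strip_prefix(&needle)
                // Require `::` or `;` next so `tokio` does not match `tokio_util`.
                .map_or(false, |rest| rest.starts_with("::") || rest.starts_with(';'))
        })
        .map(|(i, line)| (i + 1, line.trim().to_string()))
        .collect()
}
```

Any hit is an observation that contradicts the paired claim; an empty result is the PASS case. Note that the bare 'use tokio' pattern in the TOML above would also match crates such as tokio_util, which the sketch deliberately excludes.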

The Path Forward

Phase 1: Distinguish observations from claims

  • Rename ExtractedClaim to Observation in types/claim.rs
  • Create AuthoredClaim type with provenance, invariant, consequence, authority, evidence_chain
  • Update bridge.rs default path to use Tier 4/5 (not Tier 3) for scanner output
  • Add evidence field to source_metadata in bridge

Phase 2: Build the authoring workflow

  • Create .claude/skills/aphoria skill for claim authoring
  • Add aphoria claims create CLI command
  • Add aphoria claims update with parent_hash supersession
  • Add aphoria claims list and aphoria claims explain
  • Store authored claims as proper Episteme assertions with lineage

Phase 3: Pair extractors with claims

  • Extend Extractor trait with verifiable_claims() and verify() methods
  • Add aphoria audit command (both directions)
  • Map each existing extractor to claims it can verify
  • Flag observations without matching claims as "should this be a claim?"

Phase 4: Make the corpus first-class

  • Convert corpus.rs hardcoded assertions to stored Episteme assertions
  • Wire up Authority Lens for conflict resolution
  • Ensure Trust Packs contain authored claims, not just patterns

Phase 5: The flywheel

  • More claims authored per commit
  • Better audit coverage (extractors verify more claims)
  • Skill learns from authored claims what's claimable
  • Claims-explained documentation auto-generates from knowledge graph
  • New team members read claims to understand WHY, not just WHAT

Summary

We built a good code scanner. We didn't build a knowledge graph client.

The extractors work well at finding patterns. But finding patterns isn't the point -- understanding what those patterns mean, why they must be that way, and what breaks if they change is the point.

The Maxwell claims-explained.md proves the concept works. Every one of those 67 observations becomes valuable when paired with provenance and invariants. The gap is that today a human has to write that context by hand.

Close the gap by making the skill -- not the scanner -- the primary interface, and by treating claims as authored artifacts with lineage rather than regex output with a fancy name.


Appendix: Claim Extraction Summary

This document contains 94 extractable claims across 52 unique subjects:

  • 11 architecture claims: Verified against current code (all confirmed true)
  • 10 gap claims: Define what doesn't exist yet
  • 5 bridge.rs claims: Code-verifiable, confirmed (source_hash faked, source_class hardcoded, parent_hash ignored, epoch ignored, evidence empty)
  • 15 phase-plan claims: Define specific deliverables and tasks
  • 20+ workflow claims: Define the target authoring/audit model
  • 5 claimability rules: What counts as claimable in a diff (spec constants=yes, ordering changes=yes, boundary crossings=yes, derive changes on serialized types=yes, renamed variables=no)
  • 4 Maxwell examples: Real claims about SeqCst ordering, Wallet derives, vsock serialization, RAPL_POWER_UNIT

The most critical engineering gap was that no extractor could verify its output against existing claims.

CLOSED (2026-02-08): The Extractor trait now includes verifiable_predicates(), which returns (tail_path, predicate) pairs, and 10 extractors declare their predicates. compute_extractor_claim_map() matches claims against extractors (with wildcard support), and aphoria verify map shows coverage. Direction 2 audit (walk claims, verify code) is now implemented via aphoria verify run.
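
The pairing of claims to extractors via (tail_path, predicate) pairs can be sketched as below. This is not compute_extractor_claim_map() itself: ExtractorDecl and extractors_for_claim are hypothetical, and the trailing "*" wildcard syntax is an assumption, since this document only says wildcards are supported.

```rust
/// True when `pattern` matches `tail_path`; a trailing "*" matches any suffix
/// (the wildcard syntax is an assumption).
fn path_matches(pattern: &str, tail_path: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => tail_path.starts_with(prefix),
        None => pattern == tail_path,
    }
}

/// Hypothetical declaration of what an extractor says it can verify.
pub struct ExtractorDecl<'a> {
    pub name: &'a str,
    pub predicates: &'a [(&'a str, &'a str)],
}

/// Names of the extractors that declare they can verify the given claim.
pub fn extractors_for_claim<'a>(
    extractors: &[ExtractorDecl<'a>],
    tail_path: &str,
    predicate: &str,
) -> Vec<&'a str> {
    extractors
        .iter()
        .filter(|e| {
            e.predicates
                .iter()
                .any(|(p, pred)| path_matches(p, tail_path) && *pred == predicate)
        })
        .map(|e| e.name)
        .collect()
}
```

A claim that maps to no extractor is the UNCHECKED case, which is what a coverage view needs to surface.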