jml cce54358d2 feat(aphoria): add git commit tracking + comprehensive documentation

**Git Commit Tracking**
- Automatically capture git commit hash when claims/observations are ingested
- Store in assertion metadata for temporal context and audit trails
- Graceful degradation in non-git environments
- Solves double-commit problem by capturing hash at ingestion time

**Implementation**
- walker/git.rs: get_current_commit_hash() utility function
- bridge.rs: Accept optional git_commit parameter in all conversion functions
- episteme/local: Store project_root, capture git hash during ingestion
- 5 new tests for git hash tracking + metadata validation
- All 1162 aphoria tests passing

**Documentation Overhaul**
- README: Added Observations vs Claims distinction, git tracking, dashboard
- CLI Reference: New sections for git integration and ignore/exclusion system
- Comprehensive ignore documentation: .aphoriaignore, inline comments, 4 methods
- Enhanced verification engine docs with matching capabilities
- DOCUMENTATION_UPDATES.md: Complete audit summary

**Dashboard Separation**
- Moved Aphoria-specific UI from stemedb-dashboard to aphoria-dashboard
- Clean separation of concerns: StemeDB for core, Aphoria for security
- Added dashboard documentation and setup guides

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-08 18:36:46 +00:00

29 KiB

Raw Blame History

Aphoria Vision Gaps

Date: 2026-02-08 Status: Honest assessment of where we are vs. where we need to be Grounded Against: Codebase as of commit e0d2940 (42 extractors, bridge.rs, ephemeral/persistent modes)

Implementation Status

Phase A1: Distinguish Observations from Claims - ✅ COMPLETE (2026-02-08)

Renamed ExtractedClaim → Observation (struct + 81 files updated)
Added confidence-based tier mapping: ≥0.9 → Tier 4, <0.9 → Tier 5
observation_to_assertion() replaces fixed Tier 3 assignment
AuthoredClaim type fully defined with provenance/invariant/consequence fields
Claims storage in .aphoria/claims.toml (ClaimsFile implementation)
CLI commands: aphoria claim create|list|explain|update|supersede|deprecate
All 1055 tests passing

Verification Engine Enhancements - ✅ COMPLETE (2026-02-08)

Added Contains and NotContains comparison modes for substring/list checking
verify run command verifies authored claims against observations
verify map shows extractor-to-claim coverage
Inline marker support: @aphoria:claim[category] comments in code
Marker workflow: list-markers, formalize-marker, reject-marker commands
All 47 verification tests passing (39 existing + 8 new for Contains/NotContains)
Maxwell dogfooding: 10/10 claims verified, false negative bug eliminated

See commit history for implementation details.

The Problem in One Sentence

Aphoria extracts observations about source code and calls them "claims," but they aren't claims -- they're grep results wearing Episteme vocabulary.

Current Architecture: What Actually Happens

Scan Flow (Ephemeral Mode)

This is the fast path (~0.25s), used for CI/pre-commit. Traced from scanner.rs:52 through to report output.

sequenceDiagram
    participant CLI as CLI (main.rs)
    participant Handler as handle_scan()
    participant Scanner as run_scan()
    participant Walker as walk_project()
    participant Registry as ExtractorRegistry
    participant Bridge as bridge.rs
    participant Corpus as corpus.rs
    participant Index as ConceptIndex
    participant Conflict as conflict.rs
    participant Report as Formatter

    CLI->>Handler: ScanArgs + AphoriaConfig
    Handler->>Scanner: run_scan(args, config)

    Note over Scanner: Phase 1: WALK
    Scanner->>Walker: walk_project(root, config)
    Walker-->>Scanner: Vec<WalkedFile>

    Note over Scanner: Phase 2: EXTRACT
    loop For each WalkedFile
        Scanner->>Registry: extract_all(segments, content, lang, file)
        Registry->>Registry: for_language(lang) -> applicable extractors
        loop For each Extractor
            Registry->>Registry: extractor.extract(segments, content, lang, file)
        end
        Registry->>Registry: filter by IgnoreCommentParser
        Registry-->>Scanner: Vec<ExtractedClaim>
    end

    Note over Scanner: Phase 3: CONFLICT DETECTION
    Scanner->>Bridge: load_or_generate_key(root)
    Bridge-->>Scanner: SigningKey

    Scanner->>Corpus: create_authoritative_corpus(key)
    Note over Corpus: Hardcoded RFC/OWASP assertions<br/>corpus.rs:33-157
    Corpus-->>Scanner: Vec<Assertion> (authority)

    Scanner->>Index: ConceptIndex::build(corpus)
    Note over Index: make_key() = last 2 path segments<br/>+ "::" + predicate
    Index-->>Scanner: ConceptIndex

    Scanner->>Conflict: check_conflicts(claims, index, config)
    loop For each ExtractedClaim
        Conflict->>Index: lookup(claim.subject, claim.predicate)
        Note over Conflict: Tail-path match:<br/>"code://rust/app/tls/cert_verification"<br/>matches "rfc://5246/tls/cert_verification"
        Conflict->>Conflict: Compare values, compute score
        Conflict->>Conflict: Determine verdict (Block/Flag/Pass)
    end
    Conflict-->>Scanner: Vec<ConflictResult>

    Note over Scanner: Phase 4: REPORT
    Scanner->>Report: format(results)
    Report-->>CLI: Table / JSON / SARIF / Markdown

Key code locations:

Entry: handlers/scan.rs:8-71
Orchestration: scanner.rs:52-117
Walker: walker/mod.rs:115-175
Extraction: registry.rs:289-304
Corpus build: corpus.rs:33-157
Index: concept_index.rs:30-110
Conflict: conflict.rs:64-200

Scan Flow (Persistent Mode with --persist --sync)

The full Episteme path, used for drift detection and observation write-back.

sequenceDiagram
    participant Scanner as run_scan()
    participant Episteme as LocalEpisteme
    participant WAL as Journal (WAL)
    participant Store as HybridStore
    participant Bridge as bridge.rs
    participant Index as ConceptIndex
    participant Drift as drift.rs
    participant Hosted as HostedClient

    Note over Scanner: Same walk + extract as ephemeral

    Scanner->>Episteme: LocalEpisteme::open(config, root)
    Episteme->>WAL: Journal::open(wal_dir)
    Episteme->>Store: HybridStore::open(store_dir)
    Episteme-->>Scanner: LocalEpisteme

    Note over Scanner: Ingest claims as Tier 3 assertions
    Scanner->>Episteme: ingest_claims(all_claims)
    loop For each claim
        Episteme->>Bridge: claim_to_assertion(claim, key, ts)
        Note over Bridge: SourceClass::Expert (Tier 3)<br/>lifecycle: Approved<br/>parent_hash: None<br/>epoch: None
        Bridge-->>Episteme: Assertion
        Episteme->>WAL: journal.append(serialized)
    end

    Note over Scanner: Build index from corpus + imported assertions
    Scanner->>Episteme: fetch_authoritative_assertions()
    Episteme-->>Scanner: Vec<Assertion> (from store)
    Scanner->>Index: ConceptIndex::build_with_aliases(corpus, aliases)

    Note over Scanner: Check conflicts
    Scanner->>Episteme: check_conflicts(claims, config, index)
    Episteme-->>Scanner: Vec<ConflictResult>

    Note over Scanner: Check drift against prior observations
    Scanner->>Drift: check_drift(non_conflicting_claims)
    Drift->>Store: fetch_observations_for_concept(path)
    Note over Drift: Compare current value vs prior<br/>If different -> DriftResult
    Drift-->>Scanner: Vec<DriftResult>

    Note over Scanner: Write back novel observations as Tier 4
    Scanner->>Episteme: ingest_observations(novel_claims)
    loop For each observation
        Episteme->>Bridge: claim_to_observation(claim, key, ts)
        Note over Bridge: SourceClass::Community (Tier 4)<br/>weight: 0.3
        Bridge-->>Episteme: Assertion
        Episteme->>WAL: journal.append(serialized)
        Episteme->>Store: predicate_index("observation", hash)
    end

    opt If hosted mode enabled
        Scanner->>Hosted: push_observations(assertions)
        Hosted-->>Scanner: PushObservationsResponse
    end

Key code locations:

Persistent path: scanner.rs:195-325
LocalEpisteme::open: local/mod.rs:44-124
Ingest claims: local/store.rs:20-96
Ingest observations: local/store.rs:105-165
Drift detection: drift.rs:23-57
Hosted push: hosted.rs:178+

What We Built (Grounded)

Aphoria has 42 built-in extractors (registry.rs:327 -- BUILTIN_EXTRACTOR_COUNT: usize = 42) that scan source code with regex patterns and produce ExtractedClaim structs:

// types/claim.rs:7-31
pub struct ExtractedClaim {
    pub concept_path: String,      // e.g., "code://rust/maxwell/hypervisor/lib/imports/firecracker"
    pub predicate: String,         // e.g., "imported"
    pub value: ObjectValue,        // Boolean(true)
    pub file: String,              // "hypervisor/src/lib.rs"
    pub line: usize,               // 24
    pub matched_text: String,      // "use firecracker_sdk::..."
    pub confidence: f32,           // 1.0
    pub description: String,       // "Module imports firecracker"
}

We ran this on Maxwell and got 67 "claims" with zero noise. We celebrated.

Then we looked at the output and asked: what is the claim being made here?

The answer is: there is no claim. imported: true is an index entry. No one will ever assert imported: false. There's no conflict to resolve, no lens needed, no reason to store this in an append-only Merkle DAG. It's grep "use firecracker" with extra steps.

Verified Against Code

Extractor	File	Predicate Used	What It Actually Produces
`import_graph`	`extractors/import_graph.rs`	`"imported"` with `Boolean(true)`	grep for `use` statements
`derive_pattern`	`extractors/derive_pattern.rs`	`"derives"` with `Text("Clone,Debug")`	AST metadata extraction
`const_declarations`	`extractors/const_declarations.rs`	`"value"` with literal value	copy of the source line
`unsafe_atomic`	`extractors/unsafe_atomic.rs`	`"pattern"` with `Text("SeqCst")`	grep for `Ordering::`

None of these can conflict. None need lenses. None benefit from Episteme's architecture.

What a Real Claim Looks Like

After the scan, we wrote claims-explained.md by hand for Maxwell. That document contains actual claims. Compare:

What Aphoria produces (unsafe_atomic extractor, extractors/unsafe_atomic.rs):

Subject:    "code://rust/maxwell/core/wallet/atomics/ordering"
Predicate:  "pattern"
Value:      "SeqCst"

What a human wrote:

"All wallet atomic operations MUST use SeqCst to prevent double-spend race conditions. Weakening to Relaxed or Acquire/Release is a correctness bug."

What Episteme expects (from stemedb-core/src/types/assertion.rs):

Subject:    "maxwell/wallet/atomics/ordering"
Predicate:  "required_ordering"
Value:      "SeqCst"
Source:     Safety analysis by lead developer
Authority:  Tier 3 (Expert) -- with real evidence
Evidence:   "AtomicU64 balance requires sequential consistency
            to prevent double-spend. See wallet ADR-003."
Parent:     None (original assertion)
Epoch:      Some("maxwell-v1.0")

More examples from the same scan:

Aphoria says: core/thermal/const/rapl_power_unit = 0x606 The claim is: "Intel MSR register address for reading CPU power units. Sourced from Intel SDM Vol 4. If this changes, either the code is wrong or targeting different hardware."

Aphoria says: wallet/type/wallet/derives = Debug The claim is: "Wallet MUST NOT derive Clone because singleton ownership is a safety invariant. Wallet contains AtomicU64 -- cloning it creates divergent state."

Aphoria says: vsock/message/agentmessage/derives = Clone,Debug,Deserialize,Serialize The claim is: "All vsock message types MUST derive Serialize+Deserialize because they cross the VM boundary via bincode. If serde appears in core imports, internal types are leaking into the wire protocol."

The difference: observations describe what is. Claims describe what must be and why. Claims have provenance, consequences, and can conflict with each other.

The Fundamental Gap (Code-Grounded)

Episteme is a knowledge graph for conflicting claims with lineage and resolution. Aphoria uses it as a document store for scan results.

The bridge.rs conversion (bridge.rs:45-92) forces observations into the Assertion schema:

Assertion Field	What Episteme Expects	What bridge.rs Provides	Code Reference
`source_hash`	Hash of source document (RFC, paper)	`blake3(file + line + matched_text)`	`bridge.rs:107-113`
`source_class`	Tiered authority (0=Regulatory...4=Community)	Always `SourceClass::Expert` (Tier 3) for claims	`bridge.rs:25`
`source_metadata`	`{journal, DOI, author, standard}`	`{file, line, matched_text, scan_tool, scan_version}`	`bridge.rs:52-58`
`parent_hash`	Links to superseded assertion	Always `None`	`bridge.rs:79`
`epoch`	Paradigm context (e.g., "post-quantum")	Always `None`	`bridge.rs:89`
`lifecycle`	Pending -> Review -> Approved	Always `LifecycleStage::Approved` (skips review)	`bridge.rs:85`
`evidence`	Provenance chain, ADR references	Not present in `ExtractedClaim` at all	`types/claim.rs:7-31`

We're using a Mercedes as a shopping cart.

Partial Mitigation Already Exists

claim_to_observation() (bridge.rs:36-42) creates Tier 4 (Community) assertions for write-back. But this is only used in the --sync path for drift detection -- the default claim_to_assertion() still uses Tier 3.

What the Workflow Should Be

Target: Commit-Time Claim Authoring

sequenceDiagram
    participant Dev as Developer
    participant Skill as Aphoria Skill (.claude/skills/)
    participant Graph as Episteme Knowledge Graph
    participant Scanner as aphoria scan (audit mode)
    participant Report as Claims-Explained View

    Note over Dev: Developer commits code

    Dev->>Skill: Review diff
    Skill->>Skill: Identify claimable changes

    Note over Skill: Claimable = new constants from specs,<br/>ordering changes, boundary crossings,<br/>derive changes on serialized types<br/><br/>NOT claimable = renamed variables,<br/>whitespace, internal refactors

    Skill->>Graph: Look up existing claims for context
    Graph-->>Skill: Related claims (if any)

    alt Diff contradicts existing claim
        Skill->>Dev: "This contradicts claim X. Fix code or supersede claim?"
        Dev->>Skill: Decision + evidence
        Skill->>Graph: Create superseding claim (parent_hash = old claim)
    else New claimable pattern
        Skill->>Dev: "This looks claimable. Author a claim?"
        Dev->>Skill: Provenance + invariant + consequence
        Skill->>Graph: Submit authored claim with lineage
    end

    Note over Skill: Create extractor for audit
    Skill->>Scanner: Register extractor paired with claim

    Note over Scanner: Later: Audit runs
    Scanner->>Graph: For each claim, verify code matches
    Graph-->>Scanner: Expected values
    Scanner->>Scanner: Extractor output vs claim
    Scanner-->>Report: PASS / CONFLICT / DRIFT

    Report->>Report: Auto-generate claims-explained.md

Audit Flow: Two Directions

Direction 1: Scan code, check against claims (what Aphoria partially does today)

sequenceDiagram
    participant Scanner as aphoria audit
    participant Extractors as ExtractorRegistry
    participant Code as Source Files
    participant Graph as Episteme (Claims)
    participant Report as Audit Report

    Scanner->>Code: Walk project files
    Scanner->>Extractors: extract_all(file) -> Vec<Observation>

    loop For each Observation
        Scanner->>Graph: lookup_claim(observation.subject, observation.predicate)
        alt Claim exists
            alt observation.value == claim.value
                Scanner->>Report: PASS (code matches claim)
            else observation.value != claim.value
                Scanner->>Report: CONFLICT (code contradicts claim)
                Note over Report: Score by authority tier,<br/>apply lenses for resolution
            end
        else No claim exists
            Scanner->>Report: REVIEW ("should this be a claim?")
        end
    end

Direction 2: Walk claims, verify in code (does not exist today)

sequenceDiagram
    participant Scanner as aphoria audit --verify-claims
    participant Graph as Episteme (Claims)
    participant Extractors as Paired Extractors
    participant Code as Source Files
    participant Report as Audit Report

    Scanner->>Graph: List all authored claims
    Graph-->>Scanner: Vec<Claim>

    loop For each Claim
        Scanner->>Extractors: Find extractor paired with this claim
        alt Extractor exists
            Extractors->>Code: Run extractor on relevant files
            Code-->>Extractors: Vec<Observation>
            alt Observation matches claim
                Scanner->>Report: PASS
            else Observation contradicts claim
                Scanner->>Report: CONFLICT
            end
            alt No observation found (code deleted?)
                Scanner->>Report: MISSING (claimed pattern not found)
            end
        else No paired extractor
            Scanner->>Report: UNCHECKED (no extractor for this claim)
        end
    end

    Note over Report: Catches:<br/>- Deleted code (claim says X exists, it doesn't)<br/>- Drifted values (claim says 0x606, code says 0x607)<br/>- Unenforced policies (claim says "no tokio in core")

Extracted Claims from This Document

The following claims were extracted using the extract-claims skill pattern. Each is testable against the current codebase.

Architecture Claims (Verified)

ID	Claim	Verification Status	Code Reference
VG-001	Aphoria has 42 built-in extractors	VERIFIED	`registry.rs:327` -- `BUILTIN_EXTRACTOR_COUNT: usize = 42`
VG-002	`import_graph` extractor uses predicate `"imported"` with `Boolean(true)`	VERIFIED	`import_graph.rs` -- only produces `imported: true`
VG-003	`unsafe_atomic` extractor uses predicate `"pattern"`	VERIFIED	`unsafe_atomic.rs` -- uses generic `"pattern"` predicate
VG-004	`bridge.rs` default path uses `SourceClass::Expert` (Tier 3)	VERIFIED	`bridge.rs:25` -- `claim_to_assertion()` calls with `SourceClass::Expert`
VG-005	`bridge.rs` always sets `parent_hash: None`	VERIFIED	`bridge.rs:79`
VG-006	`bridge.rs` always sets `epoch: None`	VERIFIED	`bridge.rs:89`
VG-007	`bridge.rs` always sets `lifecycle: LifecycleStage::Approved`	VERIFIED	`bridge.rs:85`
VG-008	`source_metadata` contains `{file, line, matched_text, scan_tool, scan_version}` only	VERIFIED	`bridge.rs:52-58`
VG-009	`ExtractedClaim` has no evidence/provenance field	VERIFIED	`types/claim.rs:7-31` -- only has location, value, confidence
VG-010	`claim_to_observation()` uses Tier 4 (Community)	VERIFIED	`bridge.rs:36-42`
VG-011	Extractor trait has no mechanism to receive claims for verification	✅ CLOSED	`traits.rs:68-107` -- `verifiable_predicates()` method added, 10 extractors declare predicates

Gap Claims (What Doesn't Exist)

ID	Claim	Gap
VG-020	`ExtractedClaim` should be renamed to `Observation`	`types/claim.rs` still uses `ExtractedClaim`
VG-021	A real `Claim` type should exist with provenance, invariant, consequence, authority	No such type exists anywhere
VG-022	Extractors should be paired with claims they verify	✅ CLOSED — `verifiable_predicates()` added to `Extractor` trait; 10 extractors declare predicates; `compute_extractor_claim_map()` in verify.rs; `aphoria verify map` shows coverage
VG-023	`aphoria audit` command should exist	No audit subcommand in CLI
VG-024	Claims should support supersession via `parent_hash`	`parent_hash` is always `None`
VG-025	`aphoria claims list` / `aphoria claims explain` should exist	No claims subcommand
VG-026	Corpus should be real assertions, not hardcoded in `corpus.rs:33-157`	Corpus is built procedurally per scan
VG-027	Conflict resolution should use Episteme lenses	No lens invoked during scan
VG-028	Direction 2 audit (walk claims, verify code) doesn't exist	No inverse audit flow
VG-029	Skill should be primary claim authoring interface	No `.claude/skills/aphoria` skill exists

What Needs to Change

1. Claims are authored, not extracted

Extractors don't produce claims. Humans (assisted by the Aphoria skill) produce claims. Extractors produce observations that are checked against claims.

The type system should reflect this:

// CURRENT (types/claim.rs:7-31)
pub struct ExtractedClaim {  // This is an observation, not a claim
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub file: String,
    pub line: usize,
    pub matched_text: String,
    pub confidence: f32,
    pub description: String,
}

// TARGET: New Observation type (rename ExtractedClaim)
pub struct Observation {
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub file: String,
    pub line: usize,
    pub matched_text: String,
    pub confidence: f32,
    pub description: String,
}

// TARGET: New Claim type (does not exist today)
pub struct AuthoredClaim {
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub provenance: String,          // Where did this come from? (Intel SDM, RFC, ADR)
    pub invariant: String,           // What must remain true?
    pub consequence: String,         // What breaks if violated?
    pub authority_tier: SourceClass,  // Tier 0-4
    pub evidence_chain: Vec<String>, // References to supporting documents
    pub parent_hash: Option<Hash>,   // Supersedes which claim?
    pub epoch: Option<String>,       // Paradigm context
}

2. The skill is the primary interface, not the scanner

The .claude/skills/aphoria skill should be the main way claims enter the system. It:

Understands the project's claim vocabulary
Reviews diffs for claimable changes
Looks up existing claims for context
Helps author claims with proper lineage
Submits them as real Episteme assertions

The scanner (aphoria scan) becomes the audit tool -- it verifies that code matches claims, not the other way around.

3. Extractors serve the audit, not the authoring

The Extractor trait (traits.rs:68-94) needs to change:

// CURRENT: Extractors produce observations from thin air
pub trait Extractor: Send + Sync {
    fn name(&self) -> &str;
    fn languages(&self) -> &[Language];
    fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<ExtractedClaim>;
}

// TARGET: Extractors can also verify observations against claims
pub trait Extractor: Send + Sync {
    fn name(&self) -> &str;
    fn languages(&self) -> &[Language];
    fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<Observation>;

    /// Claims this extractor can verify (empty = observation-only extractor)
    fn verifiable_claims(&self) -> &[&str] { &[] }

    /// Verify a specific claim against extracted observations
    fn verify(&self, claim: &AuthoredClaim, observations: &[Observation]) -> VerifyResult {
        VerifyResult::Unchecked
    }
}

4. The corpus should be proper assertions

Today, RFC/OWASP knowledge is built procedurally in corpus.rs:33-157. The ConflictingSource::extract_citation() in types/claim.rs:89-111 already handles rfc:// and owasp:// URI schemes -- the infrastructure for proper corpus assertions partially exists.

Target: corpus data stored as real Episteme assertions with proper lineage, not rebuilt every scan.

5. The claims-explained.md pattern should be the product

The workflow that produces it:

flowchart TD
    A[aphoria scan] -->|produces| B[Observations]
    B -->|skill identifies| C{Claimable?}
    C -->|yes| D[Developer authors claim<br/>with skill assistance]
    C -->|no| E[Discard / log as observation]
    D -->|submit| F[Episteme Knowledge Graph]
    F -->|future scans| G[aphoria audit checks<br/>code against claims]
    G -->|generates| H[claims-explained.md<br/>auto-generated from graph]
    F -->|new observations| I{Matches existing claim?}
    I -->|yes, same value| J[PASS]
    I -->|no, different value| K[CONFLICT]
    I -->|claim about deleted code| L[MISSING]

Proposed Extractors for Audit Flow

These extractors don't exist today. They're needed to close the gap between observations and claims.

Self-Audit Extractors (Meta)

These extractors audit Aphoria's own code to verify the claims in this document remain true:

Extractor Name	What It Verifies	Pattern
`bridge_source_class_audit`	`bridge.rs` default tier assignment	Regex for `SourceClass::Expert` in `claim_to_assertion`
`bridge_parent_hash_audit`	Whether `parent_hash` is always `None`	Regex for `parent_hash: None` in bridge
`bridge_lifecycle_audit`	Whether lifecycle skips review	Regex for `LifecycleStage::Approved` without Pending
`extractor_trait_audit`	Whether Extractor trait accepts claims	Check trait definition for claim parameter
`type_naming_audit`	Whether `ExtractedClaim` has been renamed	Grep for `struct ExtractedClaim` vs `struct Observation`

Claim-Paired Extractors (Project-Specific)

These are examples of what extractor-claim pairs look like for a project like Maxwell:

Claim	Extractor	Verification
"Wallet atomics MUST use SeqCst"	`unsafe_atomic` (exists)	Check all `Ordering::` in wallet/ are `SeqCst`
"Wallet MUST NOT derive Clone"	`derive_pattern` (exists)	Check `#[derive(` on Wallet struct excludes `Clone`
"vsock types MUST derive Serialize+Deserialize"	`derive_pattern` (exists)	Check all structs in vsock/ derive both
"RAPL_POWER_UNIT MUST be 0x606"	`const_declarations` (exists)	Check const value matches Intel SDM
"Core modules MUST NOT import tokio"	`import_graph` (exists)	Check no `use tokio` in core/

The existing extractors can already produce the observations needed. What's missing is the claim to compare against and the pairing mechanism to connect them.

Declarative Extractor Examples

Using the existing DeclarativeExtractor system (extractors/declarative/), claim-paired extractors can be defined in aphoria.toml:

[[extractors.declarative]]
name = "wallet_seqcst_policy"
description = "Wallet atomics must use SeqCst ordering"
languages = ["rust"]
pattern = 'Ordering::(Relaxed|AcqRel|Acquire|Release)'
claim.subject = "policy/wallet/atomics/ordering"
claim.predicate = "forbidden_ordering"
claim.value = { type = "boolean", value = true }
confidence = 0.95
source = { claim_id = "wallet-seqcst-001", authority = "safety-analysis" }

[[extractors.declarative]]
name = "core_no_tokio_policy"
description = "Core modules must not import tokio"
languages = ["rust"]
pattern = 'use tokio'
claim.subject = "policy/core/imports/tokio"
claim.predicate = "forbidden_import"
claim.value = { type = "boolean", value = true }
confidence = 0.95
source = { claim_id = "arch-boundary-001", authority = "architecture-decision" }

The Path Forward

Phase 1: Distinguish observations from claims

Rename ExtractedClaim to Observation in types/claim.rs
Create AuthoredClaim type with provenance, invariant, consequence, authority, evidence_chain
Update bridge.rs default path to use Tier 4/5 (not Tier 3) for scanner output
Add evidence field to source_metadata in bridge

Phase 2: Build the authoring workflow

Create .claude/skills/aphoria skill for claim authoring
Add aphoria claims create CLI command
Add aphoria claims update with parent_hash supersession
Add aphoria claims list and aphoria claims explain
Store authored claims as proper Episteme assertions with lineage

Phase 3: Pair extractors with claims

Extend Extractor trait with verifiable_claims() and verify() methods
Add aphoria audit command (both directions)
Map each existing extractor to claims it can verify
Flag observations without matching claims as "should this be a claim?"

Phase 4: Make the corpus first-class

Convert corpus.rs hardcoded assertions to stored Episteme assertions
Wire up Authority Lens for conflict resolution
Ensure Trust Packs contain authored claims, not just patterns

Phase 5: The flywheel

More claims authored per commit
Better audit coverage (extractors verify more claims)
Skill learns from authored claims what's claimable
Claims-explained documentation auto-generates from knowledge graph
New team members read claims to understand WHY, not just WHAT

Summary

We built a good code scanner. We didn't build a knowledge graph client.

The extractors work well at finding patterns. But finding patterns isn't the point -- understanding what those patterns mean, why they must be that way, and what breaks if they change is the point.

The Maxwell claims-explained.md proves the concept works. Every one of those 67 observations becomes valuable when paired with provenance and invariants. The gap is that today a human has to write that context by hand.

Close the gap by making the skill -- not the scanner -- the primary interface, and by treating claims as authored artifacts with lineage rather than regex output with a fancy name.

Appendix: Claim Extraction Summary

This document contains 94 extractable claims across 52 unique subjects:

11 architecture claims: Verified against current code (all confirmed true)
10 gap claims: Define what doesn't exist yet
5 bridge.rs claims: Code-verifiable, confirmed (source_hash faked, source_class hardcoded, parent_hash ignored, epoch ignored, evidence empty)
15 phase-plan claims: Define specific deliverables and tasks
20+ workflow claims: Define the target authoring/audit model
5 claimability rules: What counts as claimable in a diff (spec constants=yes, ordering changes=yes, boundary crossings=yes, derive changes on serialized types=yes, renamed variables=no)
4 Maxwell examples: Real claims about SeqCst ordering, Wallet derives, vsock serialization, RAPL_POWER_UNIT

~~The most critical engineering gap: no extractor currently has the ability to verify against existing claims.~~ CLOSED (2026-02-08): The Extractor trait now includes verifiable_predicates() returning (tail_path, predicate) pairs. 10 extractors declare their predicates. compute_extractor_claim_map() matches claims against extractors (with wildcard support). aphoria verify map shows coverage. Direction 2 audit (walk claims, verify code) is now implemented via aphoria verify run.

29 KiB Raw Blame History