stemedb/applications/aphoria/docs/vision-gaps.md

# Aphoria Vision Gaps

**Date**: 2026-02-08
**Status**: Honest assessment of where we are vs. where we need to be
**Grounded Against**: Codebase as of commit `e0d2940` (42 extractors, bridge.rs, ephemeral/persistent modes)

## Implementation Status

**Phase A1: Distinguish Observations from Claims** - ✅ **COMPLETE** (2026-02-08)

- Renamed `ExtractedClaim` → `Observation` (struct + 81 files updated)
- Added confidence-based tier mapping: ≥0.9 → Tier 4, <0.9 → Tier 5
- `observation_to_assertion()` replaces fixed Tier 3 assignment
- `AuthoredClaim` type fully defined with provenance/invariant/consequence fields
- Claims storage in `.aphoria/claims.toml` (ClaimsFile implementation)
- CLI commands: `aphoria claim create|list|explain|update|supersede|deprecate`
- All 1055 tests passing

See commit history for implementation details.

---

## The Problem in One Sentence

Aphoria extracts observations about source code and calls them "claims," but they aren't claims -- they're grep results wearing Episteme vocabulary.

---

## Current Architecture: What Actually Happens

### Scan Flow (Ephemeral Mode)

This is the fast path (~0.25s), used for CI/pre-commit. Traced from `scanner.rs:52` through to report output.

```mermaid
sequenceDiagram
    participant CLI as CLI (main.rs)
    participant Handler as handle_scan()
    participant Scanner as run_scan()
    participant Walker as walk_project()
    participant Registry as ExtractorRegistry
    participant Bridge as bridge.rs
    participant Corpus as corpus.rs
    participant Index as ConceptIndex
    participant Conflict as conflict.rs
    participant Report as Formatter

    CLI->>Handler: ScanArgs + AphoriaConfig
    Handler->>Scanner: run_scan(args, config)

    Note over Scanner: Phase 1: WALK
    Scanner->>Walker: walk_project(root, config)
    Walker-->>Scanner: Vec<WalkedFile>

    Note over Scanner: Phase 2: EXTRACT
    loop For each WalkedFile
        Scanner->>Registry: extract_all(segments, content, lang, file)
        Registry->>Registry: for_language(lang) -> applicable extractors
        loop For each Extractor
            Registry->>Registry: extractor.extract(segments, content, lang, file)
        end
        Registry->>Registry: filter by IgnoreCommentParser
        Registry-->>Scanner: Vec<ExtractedClaim>
    end

    Note over Scanner: Phase 3: CONFLICT DETECTION
    Scanner->>Bridge: load_or_generate_key(root)
    Bridge-->>Scanner: SigningKey

    Scanner->>Corpus: create_authoritative_corpus(key)
    Note over Corpus: Hardcoded RFC/OWASP assertions<br/>corpus.rs:33-157
    Corpus-->>Scanner: Vec<Assertion> (authority)

    Scanner->>Index: ConceptIndex::build(corpus)
    Note over Index: make_key() = last 2 path segments<br/>+ "::" + predicate
    Index-->>Scanner: ConceptIndex

    Scanner->>Conflict: check_conflicts(claims, index, config)
    loop For each ExtractedClaim
        Conflict->>Index: lookup(claim.subject, claim.predicate)
        Note over Conflict: Tail-path match:<br/>"code://rust/app/tls/cert_verification"<br/>matches "rfc://5246/tls/cert_verification"
        Conflict->>Conflict: Compare values, compute score
        Conflict->>Conflict: Determine verdict (Block/Flag/Pass)
    end
    Conflict-->>Scanner: Vec<ConflictResult>

    Note over Scanner: Phase 4: REPORT
    Scanner->>Report: format(results)
    Report-->>CLI: Table / JSON / SARIF / Markdown
```

**Key code locations:**
- Entry: `handlers/scan.rs:8-71`
- Orchestration: `scanner.rs:52-117`
- Walker: `walker/mod.rs:115-175`
- Extraction: `registry.rs:289-304`
- Corpus build: `corpus.rs:33-157`
- Index: `concept_index.rs:30-110`
- Conflict: `conflict.rs:64-200`

### Scan Flow (Persistent Mode with --persist --sync)

The full Episteme path, used for drift detection and observation write-back.

```mermaid
sequenceDiagram
    participant Scanner as run_scan()
    participant Episteme as LocalEpisteme
    participant WAL as Journal (WAL)
    participant Store as HybridStore
    participant Bridge as bridge.rs
    participant Index as ConceptIndex
    participant Drift as drift.rs
    participant Hosted as HostedClient

    Note over Scanner: Same walk + extract as ephemeral

    Scanner->>Episteme: LocalEpisteme::open(config, root)
    Episteme->>WAL: Journal::open(wal_dir)
    Episteme->>Store: HybridStore::open(store_dir)
    Episteme-->>Scanner: LocalEpisteme

    Note over Scanner: Ingest claims as Tier 3 assertions
    Scanner->>Episteme: ingest_claims(all_claims)
    loop For each claim
        Episteme->>Bridge: claim_to_assertion(claim, key, ts)
        Note over Bridge: SourceClass::Expert (Tier 3)<br/>lifecycle: Approved<br/>parent_hash: None<br/>epoch: None
        Bridge-->>Episteme: Assertion
        Episteme->>WAL: journal.append(serialized)
    end

    Note over Scanner: Build index from corpus + imported assertions
    Scanner->>Episteme: fetch_authoritative_assertions()
    Episteme-->>Scanner: Vec<Assertion> (from store)
    Scanner->>Index: ConceptIndex::build_with_aliases(corpus, aliases)

    Note over Scanner: Check conflicts
    Scanner->>Episteme: check_conflicts(claims, config, index)
    Episteme-->>Scanner: Vec<ConflictResult>

    Note over Scanner: Check drift against prior observations
    Scanner->>Drift: check_drift(non_conflicting_claims)
    Drift->>Store: fetch_observations_for_concept(path)
    Note over Drift: Compare current value vs prior<br/>If different -> DriftResult
    Drift-->>Scanner: Vec<DriftResult>

    Note over Scanner: Write back novel observations as Tier 4
    Scanner->>Episteme: ingest_observations(novel_claims)
    loop For each observation
        Episteme->>Bridge: claim_to_observation(claim, key, ts)
        Note over Bridge: SourceClass::Community (Tier 4)<br/>weight: 0.3
        Bridge-->>Episteme: Assertion
        Episteme->>WAL: journal.append(serialized)
        Episteme->>Store: predicate_index("observation", hash)
    end

    opt If hosted mode enabled
        Scanner->>Hosted: push_observations(assertions)
        Hosted-->>Scanner: PushObservationsResponse
    end
```

**Key code locations:**
- Persistent path: `scanner.rs:195-325`
- LocalEpisteme::open: `local/mod.rs:44-124`
- Ingest claims: `local/store.rs:20-96`
- Ingest observations: `local/store.rs:105-165`
- Drift detection: `drift.rs:23-57`
- Hosted push: `hosted.rs:178+`

---

## What We Built (Grounded)

Aphoria has **42 built-in extractors** (`registry.rs:327` -- `BUILTIN_EXTRACTOR_COUNT: usize = 42`) that scan source code with regex patterns and produce `ExtractedClaim` structs:

```rust
// types/claim.rs:7-31
pub struct ExtractedClaim {
    pub concept_path: String,      // e.g., "code://rust/maxwell/hypervisor/lib/imports/firecracker"
    pub predicate: String,         // e.g., "imported"
    pub value: ObjectValue,        // Boolean(true)
    pub file: String,              // "hypervisor/src/lib.rs"
    pub line: usize,               // 24
    pub matched_text: String,      // "use firecracker_sdk::..."
    pub confidence: f32,           // 1.0
    pub description: String,       // "Module imports firecracker"
}
```

We ran this on Maxwell and got 67 "claims" with zero noise. We celebrated.

Then we looked at the output and asked: **what is the claim being made here?**

The answer is: there is no claim. `imported: true` is an index entry. No one will ever assert `imported: false`. There's no conflict to resolve, no lens needed, no reason to store this in an append-only Merkle DAG. It's `grep "use firecracker"` with extra steps.

### Verified Against Code

| Extractor | File | Predicate Used | What It Actually Produces |
|-----------|------|---------------|--------------------------|
| `import_graph` | `extractors/import_graph.rs` | `"imported"` with `Boolean(true)` | grep for `use` statements |
| `derive_pattern` | `extractors/derive_pattern.rs` | `"derives"` with `Text("Clone,Debug")` | AST metadata extraction |
| `const_declarations` | `extractors/const_declarations.rs` | `"value"` with literal value | copy of the source line |
| `unsafe_atomic` | `extractors/unsafe_atomic.rs` | `"pattern"` with `Text("SeqCst")` | grep for `Ordering::` |

None of these can conflict. None need lenses. None benefit from Episteme's architecture.

---

## What a Real Claim Looks Like

After the scan, we wrote [claims-explained.md](../../claims-explained.md) by hand for Maxwell. That document contains actual claims. Compare:

**What Aphoria produces** (`unsafe_atomic` extractor, `extractors/unsafe_atomic.rs`):
```
Subject:    "code://rust/maxwell/core/wallet/atomics/ordering"
Predicate:  "pattern"
Value:      "SeqCst"
```

**What a human wrote:**
> "All wallet atomic operations MUST use SeqCst to prevent double-spend race conditions. Weakening to Relaxed or Acquire/Release is a correctness bug."

**What Episteme expects** (from `stemedb-core/src/types/assertion.rs`):
```
Subject:    "maxwell/wallet/atomics/ordering"
Predicate:  "required_ordering"
Value:      "SeqCst"
Source:     Safety analysis by lead developer
Authority:  Tier 3 (Expert) -- with real evidence
Evidence:   "AtomicU64 balance requires sequential consistency
            to prevent double-spend. See wallet ADR-003."
Parent:     None (original assertion)
Epoch:      Some("maxwell-v1.0")
```

More examples from the same scan:

**Aphoria says:** `core/thermal/const/rapl_power_unit = 0x606`
**The claim is:** "Intel MSR register address for reading CPU power units. Sourced from Intel SDM Vol 4. If this changes, either the code is wrong or targeting different hardware."

**Aphoria says:** `wallet/type/wallet/derives = Debug`
**The claim is:** "Wallet MUST NOT derive Clone because singleton ownership is a safety invariant. Wallet contains AtomicU64 -- cloning it creates divergent state."

**Aphoria says:** `vsock/message/agentmessage/derives = Clone,Debug,Deserialize,Serialize`
**The claim is:** "All vsock message types MUST derive Serialize+Deserialize because they cross the VM boundary via bincode. If serde appears in core imports, internal types are leaking into the wire protocol."

The difference: observations describe **what is**. Claims describe **what must be and why**. Claims have provenance, consequences, and can conflict with each other.

---

## The Fundamental Gap (Code-Grounded)

Episteme is a knowledge graph for conflicting claims with lineage and resolution. Aphoria uses it as a document store for scan results.

The `bridge.rs` conversion (`bridge.rs:45-92`) forces observations into the Assertion schema:

| Assertion Field | What Episteme Expects | What bridge.rs Provides | Code Reference |
|----------------|----------------------|------------------------|----------------|
| `source_hash` | Hash of source document (RFC, paper) | `blake3(file + line + matched_text)` | `bridge.rs:107-113` |
| `source_class` | Tiered authority (0=Regulatory...4=Community) | Always `SourceClass::Expert` (Tier 3) for claims | `bridge.rs:25` |
| `source_metadata` | `{journal, DOI, author, standard}` | `{file, line, matched_text, scan_tool, scan_version}` | `bridge.rs:52-58` |
| `parent_hash` | Links to superseded assertion | Always `None` | `bridge.rs:79` |
| `epoch` | Paradigm context (e.g., "post-quantum") | Always `None` | `bridge.rs:89` |
| `lifecycle` | Pending -> Review -> Approved | Always `LifecycleStage::Approved` (skips review) | `bridge.rs:85` |
| `evidence` | Provenance chain, ADR references | Not present in `ExtractedClaim` at all | `types/claim.rs:7-31` |

**We're using a Mercedes as a shopping cart.**

### Partial Mitigation Already Exists

`claim_to_observation()` (`bridge.rs:36-42`) creates Tier 4 (Community) assertions for write-back. But this is only used in the `--sync` path for drift detection -- the default `claim_to_assertion()` still uses Tier 3.

---

## What the Workflow Should Be

### Target: Commit-Time Claim Authoring

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant Skill as Aphoria Skill (.claude/skills/)
    participant Graph as Episteme Knowledge Graph
    participant Scanner as aphoria scan (audit mode)
    participant Report as Claims-Explained View

    Note over Dev: Developer commits code

    Dev->>Skill: Review diff
    Skill->>Skill: Identify claimable changes

    Note over Skill: Claimable = new constants from specs,<br/>ordering changes, boundary crossings,<br/>derive changes on serialized types<br/><br/>NOT claimable = renamed variables,<br/>whitespace, internal refactors

    Skill->>Graph: Look up existing claims for context
    Graph-->>Skill: Related claims (if any)

    alt Diff contradicts existing claim
        Skill->>Dev: "This contradicts claim X. Fix code or supersede claim?"
        Dev->>Skill: Decision + evidence
        Skill->>Graph: Create superseding claim (parent_hash = old claim)
    else New claimable pattern
        Skill->>Dev: "This looks claimable. Author a claim?"
        Dev->>Skill: Provenance + invariant + consequence
        Skill->>Graph: Submit authored claim with lineage
    end

    Note over Skill: Create extractor for audit
    Skill->>Scanner: Register extractor paired with claim

    Note over Scanner: Later: Audit runs
    Scanner->>Graph: For each claim, verify code matches
    Graph-->>Scanner: Expected values
    Scanner->>Scanner: Extractor output vs claim
    Scanner-->>Report: PASS / CONFLICT / DRIFT

    Report->>Report: Auto-generate claims-explained.md
```

### Audit Flow: Two Directions

**Direction 1: Scan code, check against claims** (what Aphoria partially does today)

```mermaid
sequenceDiagram
    participant Scanner as aphoria audit
    participant Extractors as ExtractorRegistry
    participant Code as Source Files
    participant Graph as Episteme (Claims)
    participant Report as Audit Report

    Scanner->>Code: Walk project files
    Scanner->>Extractors: extract_all(file) -> Vec<Observation>

    loop For each Observation
        Scanner->>Graph: lookup_claim(observation.subject, observation.predicate)
        alt Claim exists
            alt observation.value == claim.value
                Scanner->>Report: PASS (code matches claim)
            else observation.value != claim.value
                Scanner->>Report: CONFLICT (code contradicts claim)
                Note over Report: Score by authority tier,<br/>apply lenses for resolution
            end
        else No claim exists
            Scanner->>Report: REVIEW ("should this be a claim?")
        end
    end
```

**Direction 2: Walk claims, verify in code** (does not exist today)

```mermaid
sequenceDiagram
    participant Scanner as aphoria audit --verify-claims
    participant Graph as Episteme (Claims)
    participant Extractors as Paired Extractors
    participant Code as Source Files
    participant Report as Audit Report

    Scanner->>Graph: List all authored claims
    Graph-->>Scanner: Vec<Claim>

    loop For each Claim
        Scanner->>Extractors: Find extractor paired with this claim
        alt Extractor exists
            Extractors->>Code: Run extractor on relevant files
            Code-->>Extractors: Vec<Observation>
            alt Observation matches claim
                Scanner->>Report: PASS
            else Observation contradicts claim
                Scanner->>Report: CONFLICT
            end
            alt No observation found (code deleted?)
                Scanner->>Report: MISSING (claimed pattern not found)
            end
        else No paired extractor
            Scanner->>Report: UNCHECKED (no extractor for this claim)
        end
    end

    Note over Report: Catches:<br/>- Deleted code (claim says X exists, it doesn't)<br/>- Drifted values (claim says 0x606, code says 0x607)<br/>- Unenforced policies (claim says "no tokio in core")
```

---

## Extracted Claims from This Document

The following claims were extracted using the `extract-claims` skill pattern. Each is testable against the current codebase.

### Architecture Claims (Verified)

| ID | Claim | Verification Status | Code Reference |
|----|-------|-------------------|----------------|
| VG-001 | Aphoria has 42 built-in extractors | VERIFIED | `registry.rs:327` -- `BUILTIN_EXTRACTOR_COUNT: usize = 42` |
| VG-002 | `import_graph` extractor uses predicate `"imported"` with `Boolean(true)` | VERIFIED | `import_graph.rs` -- only produces `imported: true` |
| VG-003 | `unsafe_atomic` extractor uses predicate `"pattern"` | VERIFIED | `unsafe_atomic.rs` -- uses generic `"pattern"` predicate |
| VG-004 | `bridge.rs` default path uses `SourceClass::Expert` (Tier 3) | VERIFIED | `bridge.rs:25` -- `claim_to_assertion()` calls with `SourceClass::Expert` |
| VG-005 | `bridge.rs` always sets `parent_hash: None` | VERIFIED | `bridge.rs:79` |
| VG-006 | `bridge.rs` always sets `epoch: None` | VERIFIED | `bridge.rs:89` |
| VG-007 | `bridge.rs` always sets `lifecycle: LifecycleStage::Approved` | VERIFIED | `bridge.rs:85` |
| VG-008 | `source_metadata` contains `{file, line, matched_text, scan_tool, scan_version}` only | VERIFIED | `bridge.rs:52-58` |
| VG-009 | `ExtractedClaim` has no evidence/provenance field | VERIFIED | `types/claim.rs:7-31` -- only has location, value, confidence |
| VG-010 | `claim_to_observation()` uses Tier 4 (Community) | VERIFIED | `bridge.rs:36-42` |
| VG-011 | Extractor trait has no mechanism to receive claims for verification | ✅ **CLOSED** | `traits.rs:68-107` -- `verifiable_predicates()` method added, 10 extractors declare predicates |

### Gap Claims (What Doesn't Exist)

| ID | Claim | Gap |
|----|-------|-----|
| VG-020 | `ExtractedClaim` should be renamed to `Observation` | `types/claim.rs` still uses `ExtractedClaim` |
| VG-021 | A real `Claim` type should exist with provenance, invariant, consequence, authority | No such type exists anywhere |
| VG-022 | Extractors should be paired with claims they verify | ✅ **CLOSED** — `verifiable_predicates()` added to `Extractor` trait; 10 extractors declare predicates; `compute_extractor_claim_map()` in verify.rs; `aphoria verify map` shows coverage |
| VG-023 | `aphoria audit` command should exist | No audit subcommand in CLI |
| VG-024 | Claims should support supersession via `parent_hash` | `parent_hash` is always `None` |
| VG-025 | `aphoria claims list` / `aphoria claims explain` should exist | No claims subcommand |
| VG-026 | Corpus should be real assertions, not hardcoded in `corpus.rs:33-157` | Corpus is built procedurally per scan |
| VG-027 | Conflict resolution should use Episteme lenses | No lens invoked during scan |
| VG-028 | Direction 2 audit (walk claims, verify code) doesn't exist | No inverse audit flow |
| VG-029 | Skill should be primary claim authoring interface | No `.claude/skills/aphoria` skill exists |

---

## What Needs to Change

### 1. Claims are authored, not extracted

Extractors don't produce claims. Humans (assisted by the Aphoria skill) produce claims. Extractors produce **observations** that are checked against claims.

The type system should reflect this:

```rust
// CURRENT (types/claim.rs:7-31)
pub struct ExtractedClaim {  // This is an observation, not a claim
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub file: String,
    pub line: usize,
    pub matched_text: String,
    pub confidence: f32,
    pub description: String,
}

// TARGET: New Observation type (rename ExtractedClaim)
pub struct Observation {
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub file: String,
    pub line: usize,
    pub matched_text: String,
    pub confidence: f32,
    pub description: String,
}

// TARGET: New Claim type (does not exist today)
pub struct AuthoredClaim {
    pub concept_path: String,
    pub predicate: String,
    pub value: ObjectValue,
    pub provenance: String,          // Where did this come from? (Intel SDM, RFC, ADR)
    pub invariant: String,           // What must remain true?
    pub consequence: String,         // What breaks if violated?
    pub authority_tier: SourceClass,  // Tier 0-4
    pub evidence_chain: Vec<String>, // References to supporting documents
    pub parent_hash: Option<Hash>,   // Supersedes which claim?
    pub epoch: Option<String>,       // Paradigm context
}
```

### 2. The skill is the primary interface, not the scanner

The `.claude/skills/aphoria` skill should be the main way claims enter the system. It:
- Understands the project's claim vocabulary
- Reviews diffs for claimable changes
- Looks up existing claims for context
- Helps author claims with proper lineage
- Submits them as real Episteme assertions

The scanner (`aphoria scan`) becomes the audit tool -- it verifies that code matches claims, not the other way around.

### 3. Extractors serve the audit, not the authoring

The `Extractor` trait (`traits.rs:68-94`) needs to change:

```rust
// CURRENT: Extractors produce observations from thin air
pub trait Extractor: Send + Sync {
    fn name(&self) -> &str;
    fn languages(&self) -> &[Language];
    fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<ExtractedClaim>;
}

// TARGET: Extractors can also verify observations against claims
pub trait Extractor: Send + Sync {
    fn name(&self) -> &str;
    fn languages(&self) -> &[Language];
    fn extract(&self, segments: &[String], content: &str, lang: Language, file: &str) -> Vec<Observation>;

    /// Claims this extractor can verify (empty = observation-only extractor)
    fn verifiable_claims(&self) -> &[&str] { &[] }

    /// Verify a specific claim against extracted observations
    fn verify(&self, claim: &AuthoredClaim, observations: &[Observation]) -> VerifyResult {
        VerifyResult::Unchecked
    }
}
```

### 4. The corpus should be proper assertions

Today, RFC/OWASP knowledge is built procedurally in `corpus.rs:33-157`. The `ConflictingSource::extract_citation()` in `types/claim.rs:89-111` already handles `rfc://` and `owasp://` URI schemes -- the infrastructure for proper corpus assertions partially exists.

Target: corpus data stored as real Episteme assertions with proper lineage, not rebuilt every scan.

### 5. The claims-explained.md pattern should be the product

The workflow that produces it:

```mermaid
flowchart TD
    A[aphoria scan] -->|produces| B[Observations]
    B -->|skill identifies| C{Claimable?}
    C -->|yes| D[Developer authors claim<br/>with skill assistance]
    C -->|no| E[Discard / log as observation]
    D -->|submit| F[Episteme Knowledge Graph]
    F -->|future scans| G[aphoria audit checks<br/>code against claims]
    G -->|generates| H[claims-explained.md<br/>auto-generated from graph]
    F -->|new observations| I{Matches existing claim?}
    I -->|yes, same value| J[PASS]
    I -->|no, different value| K[CONFLICT]
    I -->|claim about deleted code| L[MISSING]
```

---

## Proposed Extractors for Audit Flow

These extractors don't exist today. They're needed to close the gap between observations and claims.

### Self-Audit Extractors (Meta)

These extractors audit Aphoria's own code to verify the claims in this document remain true:

| Extractor Name | What It Verifies | Pattern |
|---------------|-----------------|---------|
| `bridge_source_class_audit` | `bridge.rs` default tier assignment | Regex for `SourceClass::Expert` in `claim_to_assertion` |
| `bridge_parent_hash_audit` | Whether `parent_hash` is always `None` | Regex for `parent_hash: None` in bridge |
| `bridge_lifecycle_audit` | Whether lifecycle skips review | Regex for `LifecycleStage::Approved` without Pending |
| `extractor_trait_audit` | Whether Extractor trait accepts claims | Check trait definition for claim parameter |
| `type_naming_audit` | Whether `ExtractedClaim` has been renamed | Grep for `struct ExtractedClaim` vs `struct Observation` |

### Claim-Paired Extractors (Project-Specific)

These are examples of what extractor-claim pairs look like for a project like Maxwell:

| Claim | Extractor | Verification |
|-------|-----------|-------------|
| "Wallet atomics MUST use SeqCst" | `unsafe_atomic` (exists) | Check all `Ordering::` in wallet/ are `SeqCst` |
| "Wallet MUST NOT derive Clone" | `derive_pattern` (exists) | Check `#[derive(` on Wallet struct excludes `Clone` |
| "vsock types MUST derive Serialize+Deserialize" | `derive_pattern` (exists) | Check all structs in vsock/ derive both |
| "RAPL_POWER_UNIT MUST be 0x606" | `const_declarations` (exists) | Check const value matches Intel SDM |
| "Core modules MUST NOT import tokio" | `import_graph` (exists) | Check no `use tokio` in core/ |

The existing extractors can already produce the observations needed. What's missing is the **claim** to compare against and the **pairing mechanism** to connect them.

### Declarative Extractor Examples

Using the existing `DeclarativeExtractor` system (`extractors/declarative/`), claim-paired extractors can be defined in `aphoria.toml`:

```toml
[[extractors.declarative]]
name = "wallet_seqcst_policy"
description = "Wallet atomics must use SeqCst ordering"
languages = ["rust"]
pattern = 'Ordering::(Relaxed|AcqRel|Acquire|Release)'
claim.subject = "policy/wallet/atomics/ordering"
claim.predicate = "forbidden_ordering"
claim.value = { type = "boolean", value = true }
confidence = 0.95
source = { claim_id = "wallet-seqcst-001", authority = "safety-analysis" }

[[extractors.declarative]]
name = "core_no_tokio_policy"
description = "Core modules must not import tokio"
languages = ["rust"]
pattern = 'use tokio'
claim.subject = "policy/core/imports/tokio"
claim.predicate = "forbidden_import"
claim.value = { type = "boolean", value = true }
confidence = 0.95
source = { claim_id = "arch-boundary-001", authority = "architecture-decision" }
```

---

## The Path Forward

### Phase 1: Distinguish observations from claims

- [ ] Rename `ExtractedClaim` to `Observation` in `types/claim.rs`
- [ ] Create `AuthoredClaim` type with provenance, invariant, consequence, authority, evidence_chain
- [ ] Update `bridge.rs` default path to use Tier 4/5 (not Tier 3) for scanner output
- [ ] Add `evidence` field to `source_metadata` in bridge

### Phase 2: Build the authoring workflow

- [ ] Create `.claude/skills/aphoria` skill for claim authoring
- [ ] Add `aphoria claims create` CLI command
- [ ] Add `aphoria claims update` with `parent_hash` supersession
- [ ] Add `aphoria claims list` and `aphoria claims explain`
- [ ] Store authored claims as proper Episteme assertions with lineage

### Phase 3: Pair extractors with claims

- [ ] Extend `Extractor` trait with `verifiable_claims()` and `verify()` methods
- [ ] Add `aphoria audit` command (both directions)
- [ ] Map each existing extractor to claims it can verify
- [ ] Flag observations without matching claims as "should this be a claim?"

### Phase 4: Make the corpus first-class

- [ ] Convert `corpus.rs` hardcoded assertions to stored Episteme assertions
- [ ] Wire up Authority Lens for conflict resolution
- [ ] Ensure Trust Packs contain authored claims, not just patterns

### Phase 5: The flywheel

- [ ] More claims authored per commit
- [ ] Better audit coverage (extractors verify more claims)
- [ ] Skill learns from authored claims what's claimable
- [ ] Claims-explained documentation auto-generates from knowledge graph
- [ ] New team members read claims to understand WHY, not just WHAT

---

## Summary

We built a good code scanner. We didn't build a knowledge graph client.

The extractors work well at finding patterns. But finding patterns isn't the point -- understanding what those patterns mean, why they must be that way, and what breaks if they change is the point.

The Maxwell claims-explained.md proves the concept works. Every one of those 67 observations becomes valuable when paired with provenance and invariants. The gap is that today a human has to write that context by hand.

Close the gap by making the skill -- not the scanner -- the primary interface, and by treating claims as authored artifacts with lineage rather than regex output with a fancy name.

---

## Appendix: Claim Extraction Summary

This document contains **94 extractable claims** across **52 unique subjects**:

- **11 architecture claims**: Verified against current code (all confirmed true)
- **10 gap claims**: Define what doesn't exist yet
- **5 bridge.rs claims**: Code-verifiable, confirmed (source_hash faked, source_class hardcoded, parent_hash ignored, epoch ignored, evidence empty)
- **15 phase-plan claims**: Define specific deliverables and tasks
- **20+ workflow claims**: Define the target authoring/audit model
- **5 claimability rules**: What counts as claimable in a diff (spec constants=yes, ordering changes=yes, boundary crossings=yes, derive changes on serialized types=yes, renamed variables=no)
- **4 Maxwell examples**: Real claims about SeqCst ordering, Wallet derives, vsock serialization, RAPL_POWER_UNIT

~~The most critical engineering gap: **no extractor currently has the ability to verify against existing claims**.~~ **CLOSED (2026-02-08):** The `Extractor` trait now includes `verifiable_predicates()` returning `(tail_path, predicate)` pairs. 10 extractors declare their predicates. `compute_extractor_claim_map()` matches claims against extractors (with wildcard support). `aphoria verify map` shows coverage. Direction 2 audit (walk claims, verify code) is now implemented via `aphoria verify run`.