Episteme: Git for Truth
Internal Codename: StemeDB
Category: Infrastructure / Probabilistic Knowledge Database
Role: The shared memory for AI research agents that disagree
The Problem: Databases Force False Certainty
Current databases (Postgres, Neo4j, Vector DBs) suffer from The Tower of Babel problem: they store Data, not Evidence. They are deterministic, stateless, and brittle.
When multiple agents observe the world and report different things, traditional databases force you to:
- Pick a winner (losing the disagreement)
- Maintain version tables (complexity explodes)
- Scatter resolution logic across the application (authority weighting, decay, cascades)
Real example: A woman researching Semaglutide found her doctor saying "well-tolerated" while Reddit flagged gastroparesis months before the FDA added the warning. She had no way to weigh these sources structurally. The Reddit signal was right. The system failed her.
The Solution: Store Claims, Resolve at Read Time
Episteme rejects the idea of a single, static "database state." Instead, it models knowledge as a Probabilistic Marketplace:
- Assertions are immutable. Every claim is signed, timestamped, and preserved forever.
  Status: PARTIALLY IMPLEMENTED. StemeDB Assertions follow this model. Aphoria's AuthoredClaim entries are currently stored in a mutable TOML file (.aphoria/claims.toml), not yet routed through the append-only DAG. Bridging claims into StemeDB as Assertions is tracked in the gap closure plan.
- Contradictions coexist. The database holds disagreement without forcing resolution.
- Lenses resolve at query time. Different readers can apply different resolution strategies.
- Source authority is structural. A regulatory filing outweighs a Reddit post by design.
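To make the "store claims, resolve at read time" idea concrete, here is a minimal in-memory sketch. It is not the StemeDB implementation: `Claim`, `ClaimLog`, and both methods are hypothetical names, and signatures and hashing are left out. The store exposes no update or delete, and a read returns every matching claim, including contradictory ones, so the caller decides how to resolve them.

```rust
/// Hypothetical, simplified claim (no signatures, hashes, or source class).
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub timestamp: u64,
}

/// Append-only log: the API offers no way to mutate or remove an entry.
#[derive(Default)]
pub struct ClaimLog {
    entries: Vec<Claim>,
}

impl ClaimLog {
    /// Appends a claim and returns its position in the log.
    pub fn append(&mut self, claim: Claim) -> usize {
        self.entries.push(claim);
        self.entries.len() - 1
    }

    /// Returns every claim about (subject, predicate), contradictions included.
    pub fn claims_about(&self, subject: &str, predicate: &str) -> Vec<&Claim> {
        self.entries
            .iter()
            .filter(|c| c.subject == subject && c.predicate == predicate)
            .collect()
    }
}
```

A reader that wants "last writer wins" can take the result of `claims_about` and pick the entry with the largest `timestamp`; another reader can keep all of them and inspect the disagreement.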
The Four Pillars
Every use case must demonstrate at least one pillar. If Postgres could do it, it's not a compelling use case.
| Pillar | What It Enables | Postgres Gap |
|---|---|---|
| First-Class Contradiction | Hold conflicting facts without forcing resolution | Must pick one value or version-table chaos |
| Invalidation Cascades | Retracted evidence flags all downstream decisions | Recursive CTEs don't scale, app logic drifts |
| Multi-Signature Consensus | Weighted trust via cryptographic co-signatures | Join tables have no cryptographic proof |
| Semantic Decay | Old data fades from hot path but remains auditable | Manual WHERE clauses, inconsistent decay rates |
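The Invalidation Cascades pillar can be illustrated with a small graph traversal. This is a sketch under assumptions: the `Dependents` index (evidence id mapped to the ids derived from it) and the `u32` ids are hypothetical stand-ins, not StemeDB types. Retracting one id flags everything transitively downstream of it.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical dependency index: evidence id -> ids of items derived from it.
pub type Dependents = HashMap<u32, Vec<u32>>;

/// Returns every id transitively downstream of `retracted` (excluding itself),
/// found by breadth-first traversal. The `flagged` set also guards against cycles.
pub fn invalidation_cascade(deps: &Dependents, retracted: u32) -> HashSet<u32> {
    let mut flagged = HashSet::new();
    let mut queue = VecDeque::from([retracted]);
    while let Some(id) = queue.pop_front() {
        if let Some(children) = deps.get(&id) {
            for &child in children {
                // `insert` returns false when the id was already flagged.
                if child != retracted && flagged.insert(child) {
                    queue.push_back(child);
                }
            }
        }
    }
    flagged
}
```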
The Core Data Model: The Signed Assertion
The atomic unit is not a Row, Document, or Embedding. It is the Signed Assertion:
struct Assertion {
    // The Proposition (what is being claimed)
    subject: EntityId,                // "semaglutide", "Tesla_Inc"
    predicate: RelationId,            // "has_side_effect", "annual_revenue"
    object: ObjectValue,              // "gastroparesis", "$96.7B"

    // The Lineage (why we believe it)
    source_hash: Hash,                // Content-addressed link to source document
    source_class: SourceClass,        // Authority tier (0=Regulatory...5=Anecdotal)
    source_metadata: Option<JSON>,    // Rich provenance (journal, DOI, etc.)
    visual_hash: Option<PHash>,       // Perceptual hash for image provenance
    epoch: Option<EpochId>,           // Paradigm context ("covid-2020", "gaap-2024")

    // The Meta-Cognition (who said it, how confident)
    signatures: Vec<SignatureEntry>,  // Ed25519 cryptographic proofs
    confidence: f32,                  // 0.0-1.0 subjective certainty
    timestamp: u64,                   // When created
    lifecycle: LifecycleStage,        // Proposed → Approved → Deprecated

    // The Semantic (meaning for similarity search)
    vector: Option<Vec<f32>>,         // Embedding for k-NN queries
}
Current state: Aphoria uses a separate AuthoredClaim struct with fields like concept_path, predicate, value, comparison, invariant, and consequence. A bridge function (authored_claim_to_assertion()) exists to convert between the two representations but is not yet used in the primary claim storage path. Claims are currently persisted in .aphoria/claims.toml, not as Assertion entries in the DAG. Routing claims through StemeDB is planned.
The Source Class Hierarchy
Every assertion has a source class that structurally affects resolution weight and decay:
| Tier | Class | Example | Decay Half-Life | Authority Weight |
|---|---|---|---|---|
| 0 | Regulatory | FDA label, SEC filing | Never | 1.0 |
| 1 | Clinical | Peer-reviewed RCTs | 2 years | 0.9 |
| 2 | Observational | Real-world evidence | 1 year | 0.7 |
| 3 | Expert | Physician guidelines | 6 months | 0.5 |
| 4 | Community | Patient registries | 3 months | 0.2 |
| 5 | Anecdotal | Reddit posts, social | 30 days | 0.1 |
A million Tier-5 anecdotal assertions cannot outvote a single Tier-0 regulatory assertion. But the million anecdotes can signal "something is happening here" via cluster escalation.
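The table above can be encoded directly. The sketch below assumes exponential decay, which is what "half-life" suggests but which this document does not spell out, and it converts the table's months and years into days (6 months = 180 days, 1 year = 365 days). The method and function names are hypothetical.

```rust
/// Source tiers from the table above (0 = Regulatory ... 5 = Anecdotal).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceClass {
    Regulatory = 0,
    Clinical = 1,
    Observational = 2,
    Expert = 3,
    Community = 4,
    Anecdotal = 5,
}

impl SourceClass {
    /// Authority weight column of the table.
    pub fn authority_weight(self) -> f64 {
        match self {
            SourceClass::Regulatory => 1.0,
            SourceClass::Clinical => 0.9,
            SourceClass::Observational => 0.7,
            SourceClass::Expert => 0.5,
            SourceClass::Community => 0.2,
            SourceClass::Anecdotal => 0.1,
        }
    }

    /// Half-life in days; `None` means the class never decays.
    /// The day counts are an assumed conversion of the table's durations.
    pub fn half_life_days(self) -> Option<f64> {
        match self {
            SourceClass::Regulatory => None,
            SourceClass::Clinical => Some(730.0),
            SourceClass::Observational => Some(365.0),
            SourceClass::Expert => Some(180.0),
            SourceClass::Community => Some(90.0),
            SourceClass::Anecdotal => Some(30.0),
        }
    }
}

/// Assumed decay rule: confidence halves once per elapsed half-life.
pub fn decayed_confidence(class: SourceClass, confidence: f64, age_days: f64) -> f64 {
    match class.half_life_days() {
        None => confidence,
        Some(half_life) => confidence * 0.5_f64.powf(age_days / half_life),
    }
}
```

Under these assumptions, an anecdotal claim asserted with confidence 0.8 drops to 0.4 after 30 days, while a regulatory claim keeps its confidence regardless of age.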
The Query Engine: Truth Lenses
Reading applies a Lens to collapse the probabilistic field into a concrete answer. Materialized Views ensure sub-millisecond latency for common patterns.
Resolution Lenses (Pick a Winner)
| Lens | Behavior |
|---|---|
| Recency | Last writer wins |
| Consensus | Highest cluster density of object values |
| Authority | Filter by TrustRank reputation |
| Vote-Aware | Weight by Ballot Box votes |
| EpochAware | Filter out superseded paradigms |
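Two of the resolution lenses are simple enough to sketch. The code below is illustrative only: `Claim` is a hypothetical stripped-down type, and the Consensus lens is simplified to "most frequently asserted object value", which is one plausible reading of "highest cluster density". Ties in the consensus count are broken arbitrarily.

```rust
use std::collections::HashMap;

/// Hypothetical minimal claim: only the fields these two lenses need.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub object: String,
    pub timestamp: u64,
}

/// Recency lens: last writer wins.
pub fn recency_lens(claims: &[Claim]) -> Option<&Claim> {
    claims.iter().max_by_key(|c| c.timestamp)
}

/// Consensus lens, simplified: the object value asserted most often.
pub fn consensus_lens(claims: &[Claim]) -> Option<&str> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for c in claims {
        *counts.entry(c.object.as_str()).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .max_by_key(|&(_, n)| n)
        .map(|(value, _)| value)
}
```

The same set of claims can yield different answers under different lenses, which is the point of resolving at read time rather than at write time.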
Analysis Lenses (Surface Disagreement)
| Lens | Behavior |
|---|---|
| Skeptic | Return all claims with conflict score and weight shares |
| Layered Consensus | Per-source-class resolution (tier-by-tier visibility) |
| Constraints | Pre-flight check for must_use/forbidden predicates |
The Skeptic and Layered Consensus lenses are key differentiators: they answer "where do sources agree and disagree?" rather than hiding the variance.
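A Skeptic-style report might be computed as below. This document does not define the conflict score, so the sketch assumes one possible definition: 1 minus the largest weight share (0 when all weight backs a single value, approaching 1 as weight spreads across many values). `WeightedClaim` and `SkepticReport` are hypothetical types.

```rust
use std::collections::BTreeMap;

/// Hypothetical input: an object value and the weight behind it.
pub struct WeightedClaim {
    pub object: String,
    pub weight: f64,
}

/// Hypothetical output: per-value weight shares plus a conflict score.
pub struct SkepticReport {
    pub shares: BTreeMap<String, f64>,
    pub conflict_score: f64,
}

/// Returns all values with their share of total weight instead of one winner.
/// Returns `None` when there is no positive weight to distribute.
pub fn skeptic_lens(claims: &[WeightedClaim]) -> Option<SkepticReport> {
    let total: f64 = claims.iter().map(|c| c.weight).sum();
    if total <= 0.0 {
        return None;
    }
    let mut shares: BTreeMap<String, f64> = BTreeMap::new();
    for c in claims {
        *shares.entry(c.object.clone()).or_insert(0.0) += c.weight / total;
    }
    // Assumed definition: conflict = 1 - share of the leading value.
    let top = shares.values().cloned().fold(0.0_f64, f64::max);
    Some(SkepticReport { shares, conflict_score: 1.0 - top })
}
```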
Key Capabilities
Time-Travel Queries
"What was the known risk profile when I started Semaglutide in June 2023?"
GET /query?subject=semaglutide&predicate=side_effects&as_of=1687000000
The append-only DAG preserves every historical state. Time travel is a hash lookup, not a reconstruction.
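The semantics of `as_of` can be shown with a plain filter over an append-only log. This sketch only illustrates what the query returns; it scans linearly and does not model the hash lookup described above. `Entry` and `as_of` are hypothetical names.

```rust
/// Hypothetical log entry with a creation timestamp (Unix seconds).
pub struct Entry {
    pub object: String,
    pub timestamp: u64,
}

/// Returns the entries that already existed at `cutoff`.
/// Because the log is append-only, nothing visible at `cutoff` can have
/// been altered or removed since.
pub fn as_of(log: &[Entry], cutoff: u64) -> Vec<&Entry> {
    log.iter().filter(|e| e.timestamp <= cutoff).collect()
}
```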
Semantic Decay
Confidence decays based on source class. Old Reddit posts fade; regulatory filings persist:
GET /query?subject=semaglutide&predicate=efficacy&source_class_decay=true
Conflict Analysis
Instead of "here is the answer," show "here is the shape of disagreement":
GET /skeptic?subject=semaglutide&predicate=gastroparesis_risk
Returns: which tiers agree, which disagree, emerging signals without clinical evidence.
Query Audit Trail
Every query is logged with full provenance. "Why did you believe that?" is answerable:
GET /audit/query/{query_id}
The Ballot Box: High-Velocity Consensus
To avoid write contention on assertions, agents vote separately:
struct Vote {
    assertion_hash: Hash,
    agent_id: PublicKey,
    weight: f32,
    signature: Signature,
}
Votes are append-only. A background Materializer aggregates votes to update O(1) read views.
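The split between append-only votes and a materialized read view can be sketched as follows. This is an in-memory illustration, not the real Materializer: `BallotBox` and its methods are hypothetical, `agent_id` is a `u64` placeholder for a public key, signatures are omitted, and `materialize` is called explicitly rather than running in the background.

```rust
use std::collections::HashMap;

/// Placeholder for a content hash (assumed to be 32 bytes).
pub type AssertionHash = [u8; 32];

/// Simplified vote: no signature, and a numeric stand-in for the agent key.
pub struct Vote {
    pub assertion_hash: AssertionHash,
    pub agent_id: u64,
    pub weight: f32,
}

pub struct BallotBox {
    votes: Vec<Vote>,                      // append-only vote log
    materialized: usize,                   // number of votes already folded in
    tallies: HashMap<AssertionHash, f32>,  // read view, one lookup per query
}

impl BallotBox {
    pub fn new() -> Self {
        BallotBox { votes: Vec::new(), materialized: 0, tallies: HashMap::new() }
    }

    /// Writers only append; they never touch the assertion or the tallies.
    pub fn cast(&mut self, vote: Vote) {
        self.votes.push(vote);
    }

    /// Folds votes cast since the last run into the read view.
    pub fn materialize(&mut self) {
        for v in &self.votes[self.materialized..] {
            *self.tallies.entry(v.assertion_hash).or_insert(0.0) += v.weight;
        }
        self.materialized = self.votes.len();
    }

    /// Reads the aggregated weight; votes not yet materialized are not visible.
    pub fn tally(&self, hash: &AssertionHash) -> f32 {
        self.tallies.get(hash).copied().unwrap_or(0.0)
    }
}
```

The trade-off in this design is staleness: a read reflects votes only up to the last materialization pass.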
Trust Packs: The Curator Economy
Users subscribe to "Trust Packs" (curated lists of trusted agents) to filter reality:
- "The Skeptical Cardio Pack" filters out low-quality cardiac studies
- "Mayo Clinic Curated" only shows assertions from verified Mayo researchers
Trust Packs are BitSet overlays that filter the Consensus Lens efficiently.
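A bitset overlay can be sketched with a vector of 64-bit words, one bit per agent. The assumption here is that each agent has a dense integer index; `TrustPack` and its methods are hypothetical, not the StemeDB API. Membership is a single word lookup, and combining two packs is a word-wise AND.

```rust
/// Hypothetical trust pack: bit `i` is set when agent index `i` is trusted.
pub struct TrustPack {
    bits: Vec<u64>,
}

impl TrustPack {
    /// Builds a pack from a list of trusted agent indices.
    pub fn from_agents(agent_indices: &[usize]) -> Self {
        let mut bits: Vec<u64> = Vec::new();
        for &i in agent_indices {
            let word = i / 64;
            if bits.len() <= word {
                bits.resize(word + 1, 0);
            }
            bits[word] |= 1u64 << (i % 64);
        }
        TrustPack { bits }
    }

    /// Membership test; indices beyond the stored words are untrusted.
    pub fn contains(&self, agent: usize) -> bool {
        self.bits
            .get(agent / 64)
            .map_or(false, |&w| (w >> (agent % 64)) & 1 == 1)
    }

    /// Agents trusted by both packs (word-wise AND, truncated to the shorter pack).
    pub fn intersect(&self, other: &TrustPack) -> TrustPack {
        TrustPack {
            bits: self.bits.iter().zip(&other.bits).map(|(a, b)| a & b).collect(),
        }
    }
}
```

A lens would then skip any assertion whose signer index fails `contains` before counting it.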
The Meter: Economics of Reasoning
Deep Research is computationally expensive. Episteme enforces token-bucket quotas:
- Assert: 10 tokens
- Vote: 1 token
- Query: 5 + lens complexity tokens
- Default: 10,000 tokens/agent/hour
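The costs above fit a standard token bucket. In this sketch the bucket holds at most 10,000 tokens and refills continuously at 10,000 tokens per hour; the continuous refill is an assumption, since this document only states the hourly quota. `Op`, `Meter`, and `try_spend` are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```rust
/// Metered operations with the costs listed above.
pub enum Op {
    Assert,
    Vote,
    Query { lens_complexity: u32 },
}

impl Op {
    pub fn cost(&self) -> u32 {
        match self {
            Op::Assert => 10,
            Op::Vote => 1,
            Op::Query { lens_complexity } => 5 + lens_complexity,
        }
    }
}

/// Per-agent token bucket (assumed: starts full, refills continuously).
pub struct Meter {
    capacity: f64,
    tokens: f64,
    last_refill_secs: u64,
}

impl Meter {
    pub fn new(now_secs: u64) -> Self {
        Meter { capacity: 10_000.0, tokens: 10_000.0, last_refill_secs: now_secs }
    }

    /// Refills for elapsed time, then spends the operation's cost if affordable.
    pub fn try_spend(&mut self, op: &Op, now_secs: u64) -> bool {
        let elapsed = now_secs.saturating_sub(self.last_refill_secs) as f64;
        self.tokens = (self.tokens + elapsed * self.capacity / 3600.0).min(self.capacity);
        // Never move the clock backwards, so time cannot be credited twice.
        self.last_refill_secs = self.last_refill_secs.max(now_secs);
        let cost = op.cost() as f64;
        if self.tokens >= cost {
            self.tokens -= cost;
            true
        } else {
            false
        }
    }
}
```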
Architecture: The Biological Stack
| Layer | Crate | Role |
|---|---|---|
| The Spine | stemedb-wal | Append-only WAL for durability |
| The Lattice | stemedb-storage | KV store, indexes, vector/visual indices |
| The Cortex | stemedb-query, stemedb-lens | Query engine, Lenses, Materializer |
| The Surface | stemedb-api | HTTP API with OpenAPI docs |
The biological metaphor:
- Spine: Raw persistence. Never loses a claim.
- Lattice: Connectivity. O(1) lookups via compound indexes.
- Cortex: Reasoning. Collapse probability into answers.
Future Vision
Forking Reality (Planned)
Agents simulate futures on copy-on-write branches built on Sparse Merkle Trees, so experiments never pollute the main branch.
The Super Curator (Planned)
A specialized swarm of reviewer agents that audits high-variance facts and escalates emerging signals.
The Simulator (Planned)
A pipeline that converts high-confidence failure logs into synthetic training trajectories.
The Git Analogy
| Git Concept | Episteme Equivalent |
|---|---|
| Commit | Assertion (immutable, content-addressed) |
| Branch | Epoch (paradigm context) |
| Merge | Lens resolution |
| Revert | Epoch supersession cascade |
| Blame | Signature/agent audit trail |
| History | Append-only DAG preserved forever |
When to Use Episteme
Use Episteme when:
- Multiple sources report conflicting information
- You need to weight sources by authority, not just timestamp
- You need to surface disagreement, not hide it
- Guidance changes and you need to notify prior consumers
- You need to audit "why did you believe that?"
- You need historical snapshots ("what was true on this date?")
Use Postgres when:
- You have a single source of truth
- Data never conflicts
- Temporal validity doesn't matter
- Consensus has already been reached by humans
For everything else: Episteme is the database.