StemeDB Data Structures
Last Updated: 2026-01-31
Source: crates/stemedb-core/src/types.rs
This document describes the core data structures in StemeDB (Episteme). These types form the foundation of the "Git for Truth" knowledge graph.
Design Principles
- Append-Only: Data is never mutated. New assertions create new records.
- Content-Addressed: Every assertion's ID is a BLAKE3 hash of its content.
- Zero-Copy: Uses rkyv for serialization, so data can be read directly from disk without parsing.
- Provenance-First: Every fact carries its source, signers, and confidence.
Primitive Types
pub type Hash = [u8; 32]; // BLAKE3 256-bit hash
pub type PHash = [u8; 8]; // Perceptual hash for images (8 bytes)
pub type EntityId = String; // Subject or object identifier
pub type RelationId = String; // Predicate identifier
pub type EpochId = Hash; // Paradigm/era identifier
pub type QueryId = Hash; // Query audit record identifier
The Assertion (Atomic Unit of Knowledge)
The Assertion is the fundamental unit. It represents a single claim about the world.
pub struct Assertion {
// ═══════════════════════════════════════════════════════════
// 1. THE FACT (What is being claimed)
// ═══════════════════════════════════════════════════════════
/// The entity this assertion is about (e.g., "Semaglutide", "Tesla_Inc")
pub subject: EntityId,
/// The relationship or property (e.g., "has_side_effect", "annual_revenue")
pub predicate: RelationId,
/// The claimed value
pub object: ObjectValue,
// ═══════════════════════════════════════════════════════════
// 2. THE LINEAGE (Why we believe it)
// ═══════════════════════════════════════════════════════════
/// If this modifies/forks another assertion, its hash
pub parent_hash: Option<Hash>,
/// Hash of the source evidence (PDF, URL, database export)
pub source_hash: Hash,
/// Authority tier of the source (enables indexing and decay rates)
pub source_class: SourceClass,
/// Perceptual hash of a visual anchor (e.g., screenshot of table)
pub visual_hash: Option<PHash>,
/// Which paradigm/era this belongs to (for paradigm shifts)
pub epoch: Option<EpochId>,
/// Lifecycle stage (Proposed → Approved → Deprecated)
pub lifecycle: LifecycleStage,
// ═══════════════════════════════════════════════════════════
// 3. META-COGNITION (Who said it, how sure are they)
// ═══════════════════════════════════════════════════════════
/// Cryptographic signatures from agents vouching for this
pub signatures: Vec<SignatureEntry>,
/// Subjective confidence score (0.0 to 1.0)
pub confidence: f32,
/// Unix timestamp when created
pub timestamp: u64,
/// Semantic embedding vector for similarity search
pub vector: Option<Vec<f32>>,
}
ObjectValue
The value in a subject-predicate-object triple:
pub enum ObjectValue {
Text(String), // "muscle loss"
Number(f64), // 96.7
Boolean(bool), // true
Reference(EntityId), // Points to another entity (graph edge)
}
LifecycleStage
Assertions progress through stages (as new assertions, not mutations):
Proposed → UnderReview → Approved
↘ Rejected
↘ Deprecated
pub enum LifecycleStage {
Proposed, // Initial idea, not for production use
UnderReview, // Gathering votes and feedback
Approved, // Accepted as current truth
Deprecated, // Was true, now superseded
Rejected, // Explicitly declined
}
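
The stage diagram above can be read as a small transition table. The sketch below is one possible reading, not the crate's implementation: `can_transition` is a hypothetical helper, and it assumes that Rejected branches off UnderReview and that Deprecated follows Approved, which the flattened diagram suggests but does not state outright.

```rust
/// Lifecycle stages, mirroring the enum in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleStage {
    Proposed,
    UnderReview,
    Approved,
    Deprecated,
    Rejected,
}

/// Hypothetical helper (not part of stemedb-core): returns true when `to`
/// is a legal successor of `from`.
/// ASSUMPTION: Rejected branches off UnderReview, Deprecated follows Approved.
pub fn can_transition(from: LifecycleStage, to: LifecycleStage) -> bool {
    use LifecycleStage::*;
    matches!(
        (from, to),
        (Proposed, UnderReview)
            | (UnderReview, Approved)
            | (UnderReview, Rejected)
            | (Approved, Deprecated)
    )
}
```

Because the store is append-only, a check like this would run when a new assertion with a `parent_hash` is submitted, comparing the parent's stage with the new one, rather than when an existing record is edited.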
SourceClass
Authority tier classification for sources. Enables indexing by tier and tier-based decay rates:
| Tier | Class | Example | Default Decay |
|---|---|---|---|
| 0 | Regulatory | FDA, EMA, WHO | Never |
| 1 | Clinical | Phase III trials, peer-reviewed RCTs | 2 years |
| 2 | Observational | Real-world evidence, cohort studies | 1 year |
| 3 | Expert | Medical professional opinions, guidelines | 6 months |
| 4 | Community | Curated forums, patient advocacy groups | 3 months |
| 5 | Anecdotal | Reddit posts, individual testimonials | 1 month |
pub enum SourceClass {
Regulatory, // Tier 0: Highest authority, never decays
Clinical, // Tier 1: Peer-reviewed research
Observational, // Tier 2: Real-world evidence
Expert, // Tier 3: Professional opinions (default)
Community, // Tier 4: Curated community knowledge
Anecdotal, // Tier 5: Individual reports, fast decay
}
impl SourceClass {
pub fn tier(&self) -> u8; // Returns 0-5
pub fn default_decay_days(&self) -> Option<u32>;
pub fn authority_weight(&self) -> f32; // 1.0 for Regulatory, 0.1 for Anecdotal
}
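
For illustration, the three methods could be filled in from the tier table as follows. This is a sketch under stated assumptions: the day counts approximate the table (2 years as 730 days, 6 months as 180 days, and so on), and only the two endpoint weights (1.0 and 0.1) come from this document. The intermediate weights are an assumed linear interpolation.

```rust
/// Authority tiers, mirroring the enum in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SourceClass {
    Regulatory,
    Clinical,
    Observational,
    Expert,
    Community,
    Anecdotal,
}

impl SourceClass {
    /// Tier number from the table (0 = highest authority).
    pub fn tier(&self) -> u8 {
        match self {
            SourceClass::Regulatory => 0,
            SourceClass::Clinical => 1,
            SourceClass::Observational => 2,
            SourceClass::Expert => 3,
            SourceClass::Community => 4,
            SourceClass::Anecdotal => 5,
        }
    }

    /// Default decay window in days; None means "never decays".
    /// ASSUMPTION: months and years are rounded to 30 and 365 days.
    pub fn default_decay_days(&self) -> Option<u32> {
        match self {
            SourceClass::Regulatory => None,
            SourceClass::Clinical => Some(730),
            SourceClass::Observational => Some(365),
            SourceClass::Expert => Some(180),
            SourceClass::Community => Some(90),
            SourceClass::Anecdotal => Some(30),
        }
    }

    /// Authority weight. Only 1.0 (Regulatory) and 0.1 (Anecdotal) are
    /// documented; the values in between are ASSUMED (linear steps of 0.18).
    pub fn authority_weight(&self) -> f32 {
        match self {
            SourceClass::Regulatory => 1.0,
            SourceClass::Clinical => 0.82,
            SourceClass::Observational => 0.64,
            SourceClass::Expert => 0.46,
            SourceClass::Community => 0.28,
            SourceClass::Anecdotal => 0.1,
        }
    }
}
```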
Key Benefits:
- Indexing: The SC:{source_class} index enables "show me only regulatory sources"
- Decay rates: Anecdotal claims decay faster than clinical evidence
- Trust weighting: Lenses can weight sources by authority tier in conflict resolution
SignatureEntry
Cryptographic proof that an agent vouches for an assertion:
pub struct SignatureEntry {
pub agent_id: [u8; 32], // Ed25519 public key
pub signature: [u8; 64], // Ed25519 signature over assertion content
pub timestamp: u64, // When the agent signed
}
The Vote (High-Velocity Consensus)
Votes are separated from assertions to enable thousands of agents to vote simultaneously without lock contention (the "Ballot Box" pattern).
A vote is not just "I agree" - it's a cryptographic witness: "I saw this exact text at this URL at this time." This enables browser extension products where votes represent observations, not opinions.
pub struct Vote {
/// Hash of the assertion being voted on
pub assertion_hash: Hash,
/// Ed25519 public key of the voter
pub agent_id: [u8; 32],
/// Weight of the vote (0.0 = reject, 1.0 = full endorsement)
pub weight: f32,
/// Signature over the assertion_hash
pub signature: [u8; 64],
/// When the vote was cast
pub timestamp: u64,
/// The URL where the claim was observed (optional)
/// Enables provenance tracking: "I saw this at example.com/article"
pub source_url: Option<String>,
/// Optional context (page snippet, etc.) stored as bytes
/// Same pattern as source_metadata on Assertion for rkyv zero-copy
pub observed_context: Option<Vec<u8>>,
}
Key Insight: Votes are append-only. An agent can change their vote by submitting a new one with a later timestamp.
Provenance Witness: The source_url and observed_context fields transform votes from opinions into observations, enabling the browser extension to count "How many people saw this claim on this page?" rather than just "How many people agree?"
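
Since votes are append-only, a reader has to collapse the log to each agent's most recent vote before tallying. A minimal sketch of that step follows. The `Vote` struct is trimmed to the fields needed here, `effective_votes` and `aggregate_weight` are hypothetical helper names, and the tie rule (on equal timestamps the earlier log entry is kept) is an assumption.

```rust
use std::collections::HashMap;

/// Trimmed-down vote: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct Vote {
    pub assertion_hash: [u8; 32],
    pub agent_id: [u8; 32],
    pub weight: f32,
    pub timestamp: u64,
}

/// Hypothetical helper: collapse an append-only vote log (all votes for the
/// same assertion) to the latest vote per agent.
/// ASSUMPTION: on equal timestamps the earlier log entry is kept.
pub fn effective_votes(log: &[Vote]) -> HashMap<[u8; 32], Vote> {
    let mut latest: HashMap<[u8; 32], Vote> = HashMap::new();
    for vote in log {
        let is_newer = latest
            .get(&vote.agent_id)
            .map_or(true, |prev| vote.timestamp > prev.timestamp);
        if is_newer {
            latest.insert(vote.agent_id, vote.clone());
        }
    }
    latest
}

/// Sum of the effective weights, comparable to the VW:{assertion_hash} cache.
pub fn aggregate_weight(log: &[Vote]) -> f32 {
    effective_votes(log).values().map(|v| v.weight).sum()
}
```

An agent that first endorses an assertion and later submits a weight of 0.0 therefore contributes nothing to the aggregate, even though both records remain in storage.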
The Epoch (Paradigm Shifts)
Epochs represent distinct periods of truth. When knowledge paradigms shift, old epochs can be superseded.
pub struct Epoch {
pub id: EpochId,
pub name: String, // "Pre-2024", "Newtonian"
pub supersedes: Option<EpochId>, // What this replaces
pub supersession_type: Option<SupersessionType>,
pub start_timestamp: u64,
pub end_timestamp: Option<u64>,
}
pub enum SupersessionType {
Invalidation, // Old epoch was factually wrong (e.g., "Earth is flat")
Temporal, // Old epoch was correct but outdated (e.g., "President is Obama")
Refinement, // Old epoch was a simplification (e.g., Newtonian → Relativity)
}
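
Because `supersedes` points from a newer epoch to the one it replaces, the history of a paradigm can be recovered by following that link backwards. The sketch below shows one way to do so. `lineage` is a hypothetical helper, the `Epoch` struct is trimmed to the relevant fields, and the in-memory `HashMap` stands in for whatever lookup the store provides.

```rust
use std::collections::HashMap;

pub type EpochId = [u8; 32];

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SupersessionType {
    Invalidation,
    Temporal,
    Refinement,
}

/// Trimmed-down epoch: timestamps omitted for brevity.
#[derive(Debug, Clone)]
pub struct Epoch {
    pub id: EpochId,
    pub name: String,
    pub supersedes: Option<EpochId>,
    pub supersession_type: Option<SupersessionType>,
}

/// Hypothetical helper: epoch names from `start` back to its oldest ancestor.
/// Stops at an unknown id, and never visits more epochs than exist, which
/// guards against an accidental cycle in the `supersedes` links.
pub fn lineage(epochs: &HashMap<EpochId, Epoch>, start: EpochId) -> Vec<String> {
    let mut names = Vec::new();
    let mut cursor = Some(start);
    while let Some(id) = cursor {
        let epoch = match epochs.get(&id) {
            Some(e) => e,
            None => break,
        };
        if names.len() >= epochs.len() {
            break;
        }
        names.push(epoch.name.clone());
        cursor = epoch.supersedes;
    }
    names
}
```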
Query Results
MaterializedView (O(1) Winner Lookup)
Pre-computed resolution stored at MV:{subject}:{predicate}:
pub struct MaterializedView {
/// The winning assertion from lens resolution
pub winner: Assertion,
/// Which lens produced this (e.g., "VoteAwareConsensus")
pub lens_name: String,
/// Confidence in the resolution (0.0 to 1.0)
pub resolution_confidence: f32,
/// How many candidates were considered
pub candidates_count: usize,
/// When this view was computed
pub materialized_at: u64,
}
ConflictAnalysis (Trust but Verify)
Returned by the SkepticLens, which surfaces all competing claims instead of picking a winner:
pub struct ConflictAnalysis {
/// Overall status: Unanimous, Agreed, or Contested
pub status: ResolutionStatus,
/// Conflict score (0.0 = unanimous, 1.0 = maximum chaos)
/// Calculated using normalized Shannon entropy
pub conflict_score: f32,
/// All distinct claims, ranked by weight_share descending
pub claims: Vec<ClaimSummary>,
/// Total candidates considered
pub candidates_count: usize,
}
pub enum ResolutionStatus {
Unanimous, // All agree (entropy < 0.1)
Agreed, // Strong majority (entropy < 0.4)
Contested, // Significant disagreement (entropy >= 0.4)
}
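
To make the conflict score concrete, here is a minimal sketch of normalized Shannon entropy over the claims' weight shares, followed by the threshold mapping from the enum comments. The function names are hypothetical, and normalizing by the logarithm of the number of claims with positive weight is an assumption. The document only states that the entropy is normalized.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResolutionStatus {
    Unanimous,
    Agreed,
    Contested,
}

/// Normalized Shannon entropy of the weight shares, in [0.0, 1.0].
/// ASSUMPTION: normalization divides by ln(n), where n is the number of
/// claims with positive weight; a single claim scores 0.0.
pub fn conflict_score(weight_shares: &[f32]) -> f32 {
    let positive: Vec<f32> = weight_shares.iter().copied().filter(|w| *w > 0.0).collect();
    let total: f32 = positive.iter().sum();
    if positive.len() <= 1 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f32 = positive
        .iter()
        .map(|w| {
            let p = w / total; // re-normalize in case shares do not sum to 1
            -p * p.ln()
        })
        .sum();
    let max_entropy = (positive.len() as f32).ln();
    (entropy / max_entropy).clamp(0.0, 1.0)
}

/// Thresholds taken from the comments on ResolutionStatus.
pub fn classify(score: f32) -> ResolutionStatus {
    if score < 0.1 {
        ResolutionStatus::Unanimous
    } else if score < 0.4 {
        ResolutionStatus::Agreed
    } else {
        ResolutionStatus::Contested
    }
}
```

Under this reading, two claims with equal support score close to 1.0 (Contested), while a 95/5 split scores roughly 0.29 and lands in Agreed.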
ClaimSummary
A single competing claim within a ConflictAnalysis:
pub struct ClaimSummary {
/// The claimed value
pub value: ObjectValue,
/// This claim's share of total support (0.0 to 1.0)
pub weight_share: f32,
/// Number of assertions making this claim
pub assertion_count: u32,
/// Hash of the highest-confidence assertion (for drill-down)
pub representative_hash: Hash,
/// Source provenance
pub source: SourceSummary,
/// Agents who signed assertions for this claim
pub supporting_agents: Vec<AgentSummary>,
}
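
As a rough illustration of how `weight_share` and `assertion_count` might be derived, the sketch below groups candidate assertions by claimed value, normalizes each group's weight, and sorts descending. It is simplified on purpose: values are plain strings rather than `ObjectValue`, the per-assertion weight is supplied by the caller (how a lens computes it is not covered here), and `summarize` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Simplified claim summary: text values only, provenance fields omitted.
#[derive(Debug, Clone, PartialEq)]
pub struct ClaimSummary {
    pub value: String,
    pub weight_share: f32,
    pub assertion_count: u32,
}

/// Hypothetical helper: group (value, weight) pairs into distinct claims,
/// ranked by weight_share descending (ties broken by value for stability).
pub fn summarize(candidates: &[(&str, f32)]) -> Vec<ClaimSummary> {
    let mut groups: HashMap<&str, (f32, u32)> = HashMap::new();
    for (value, weight) in candidates {
        let entry = groups.entry(*value).or_insert((0.0, 0));
        entry.0 += *weight;
        entry.1 += 1;
    }
    let total: f32 = groups.values().map(|(w, _)| *w).sum();
    let mut claims: Vec<ClaimSummary> = groups
        .into_iter()
        .map(|(value, (weight, count))| ClaimSummary {
            value: value.to_string(),
            weight_share: if total > 0.0 { weight / total } else { 0.0 },
            assertion_count: count,
        })
        .collect();
    claims.sort_by(|a, b| {
        b.weight_share
            .partial_cmp(&a.weight_share)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then_with(|| a.value.cmp(&b.value))
    });
    claims
}
```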
SourceSummary & AgentSummary
Provenance types for "show me the proof" UX:
pub struct SourceSummary {
pub source_hash: Hash, // Hash of source document
pub visual_hash: Option<PHash>, // Visual anchor (screenshot)
}
pub struct AgentSummary {
pub agent_id: [u8; 32], // Agent's public key
pub trust_score: f32, // Trust score at query time
}
Query Audit Trail
Every query is logged for "Why did you think that?" debugging:
pub struct QueryAudit {
pub query_id: QueryId,
pub agent_id: Option<[u8; 32]>, // Who queried (from X-Agent-Id header)
pub timestamp: u64,
pub params: QueryParams,
pub result_hash: Option<Hash>, // Winning assertion hash
pub result_confidence: f32,
pub contributing_assertions: Vec<ContributingAssertion>,
}
pub struct QueryParams {
pub subject: Option<EntityId>,
pub predicate: Option<RelationId>,
pub lifecycle: Option<LifecycleStage>,
pub epoch: Option<EpochId>,
pub lens: Option<String>,
}
pub struct ContributingAssertion {
pub assertion_hash: Hash,
pub weight: f32, // How much this influenced the result
pub source_hash: Hash,
pub lifecycle: LifecycleStage,
}
Storage Layout
Key patterns in the KV store:
| Key Pattern | Value | Purpose |
|---|---|---|
| H:{hash} | Serialized Assertion | Primary assertion storage |
| S:{subject} | Vec<Hash> | Subject index |
| SP:{subject}:{predicate} | Vec<Hash> | Compound index (O(1) lookup) |
| MV:{subject}:{predicate} | MaterializedView | Pre-computed winner |
| V:{assertion_hash}:{vote_hash} | Vote | Individual votes |
| VC:{assertion_hash} | u64 | Vote count cache |
| VW:{assertion_hash} | f32 | Aggregate vote weight cache |
| TR:{agent_id} | TrustRank | Agent reputation |
| TP:{pack_id} | TrustPack | Curated agent lists |
| AUD:{query_id} | QueryAudit | Query audit record |
| E:{epoch_id} | Epoch | Epoch definitions |
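
The key patterns can be expressed as small builder functions, sketched below for a few of the prefixes. The function names are hypothetical, and rendering hashes as lowercase hex inside keys is an assumption: this document does not say whether keys embed raw bytes or an encoded form.

```rust
/// Lowercase hex rendering of a byte slice.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// H:{hash} - primary assertion storage.
/// ASSUMPTION: the hash is embedded as lowercase hex.
pub fn assertion_key(hash: &[u8; 32]) -> String {
    format!("H:{}", to_hex(hash))
}

/// S:{subject} - subject index.
pub fn subject_key(subject: &str) -> String {
    format!("S:{}", subject)
}

/// SP:{subject}:{predicate} - compound index.
pub fn compound_key(subject: &str, predicate: &str) -> String {
    format!("SP:{}:{}", subject, predicate)
}

/// MV:{subject}:{predicate} - pre-computed winner.
pub fn materialized_view_key(subject: &str, predicate: &str) -> String {
    format!("MV:{}:{}", subject, predicate)
}

/// V:{assertion_hash}:{vote_hash} - individual votes.
pub fn vote_key(assertion_hash: &[u8; 32], vote_hash: &[u8; 32]) -> String {
    format!("V:{}:{}", to_hex(assertion_hash), to_hex(vote_hash))
}
```

One thing a sketch like this makes visible: since `:` is the separator, a subject or predicate that itself contains `:` would produce ambiguous keys unless identifiers are restricted or escaped.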
The Trust Pack (Curator Economy)
Trust Packs are the "App Store for Trust" - curated lists of trusted agents that filter consensus through domain expertise.
pub struct TrustPack {
/// Content-addressed pack ID (BLAKE3 hash)
pub id: PackId,
/// Human-readable name (e.g., "Mayo_Clinic_Experts")
pub name: String,
/// Ed25519 public key of the pack maintainer
pub maintainer: [u8; 32],
/// Agent public keys in this pack
/// Future: Replace with RoaringBitmap for O(1) membership
pub agents: Vec<[u8; 32]>,
/// Unix timestamp when pack was created
pub created_at: u64,
/// Unix timestamp of last modification
pub updated_at: u64,
}
Key Methods:
- add_agent(agent_id) - Idempotent agent addition
- remove_agent(agent_id) - Safe removal
- contains_agent(agent_id) -> bool - Membership check
Use Case: Users subscribe to packs like "Skeptical Cardio Pack" to filter GLP-1 side effect claims through vetted cardiologists.
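
The three methods could look roughly like the sketch below. The struct is trimmed to the fields involved, and two details are assumptions that go beyond the signatures listed above: the extra `now` parameter used to refresh `updated_at`, and the boolean return values that report whether anything changed.

```rust
/// Trimmed-down trust pack: id, maintainer and created_at omitted.
#[derive(Debug, Clone, Default)]
pub struct TrustPack {
    pub name: String,
    pub agents: Vec<[u8; 32]>,
    pub updated_at: u64,
}

impl TrustPack {
    /// Membership check (linear scan over the Vec).
    pub fn contains_agent(&self, agent_id: &[u8; 32]) -> bool {
        self.agents.contains(agent_id)
    }

    /// Idempotent add: a second call with the same key changes nothing.
    /// ASSUMPTION: `now` and the bool return are illustrative additions.
    pub fn add_agent(&mut self, agent_id: [u8; 32], now: u64) -> bool {
        if self.contains_agent(&agent_id) {
            return false;
        }
        self.agents.push(agent_id);
        self.updated_at = now;
        true
    }

    /// Safe removal: removing an absent agent is a no-op.
    pub fn remove_agent(&mut self, agent_id: &[u8; 32], now: u64) -> bool {
        let before = self.agents.len();
        self.agents.retain(|a| a != agent_id);
        let removed = self.agents.len() != before;
        if removed {
            self.updated_at = now;
        }
        removed
    }
}
```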
Serialization
All types use rkyv for zero-copy deserialization:
use stemedb_core::serde::{serialize, deserialize};
// Serialize
let bytes: Vec<u8> = serialize(&assertion)?;
// Deserialize (zero-copy when possible)
let assertion: Assertion = deserialize(&bytes)?;
Critical Rule: Never use raw AllocSerializer in production code. Always use stemedb_core::serde::{serialize, deserialize}.
Relationship Diagram
┌─────────────────────────────────────────────┐
│                  ASSERTION                  │
│  ┌─────────┐  ┌───────────┐  ┌─────────────┐  │
│  │ subject │  │ predicate │  │   object    │  │
│  └─────────┘  └───────────┘  └─────────────┘  │
│                                             │
│  ┌─────────────────┐  ┌──────────────────┐  │
│  │   source_hash   │  │   signatures[]   │  │
│  └────────┬────────┘  └────────┬─────────┘  │
│           │                    │            │
└───────────┼────────────────────┼────────────┘
            │                    │
┌───────────▼───────┐   ┌────────▼────────┐
│  SOURCE DOCUMENT  │   │     AGENTS      │
│   (PDF, URL...)   │   │ (Ed25519 keys)  │
└───────────────────┘   └────────┬────────┘
                                 │
                        ┌────────▼────────┐
                        │   TRUST RANK    │
                        │  (reputation)   │
                        └─────────────────┘

┌─────────────────┐         ┌─────────────────┐
│      VOTE       │◄────────│    ASSERTION    │
│  (Ballot Box)   │  votes  │    (target)     │
│  weight: 0.0-1.0│   on    │                 │
└─────────────────┘         └─────────────────┘

┌─────────────────┐         ┌─────────────────┐
│     EPOCH B     │◄────────│     EPOCH A     │
│  supersedes: A  │  older  │                 │
│  type: Temporal │  epoch  │                 │
└─────────────────┘         └─────────────────┘
API Representation
All binary data (hashes, signatures, agent IDs) is hex-encoded in JSON APIs:
{
"subject": "Semaglutide",
"predicate": "muscle_effect",
"object": { "type": "Text", "value": "Significant loss" },
"source_hash": "a1b2c3d4e5f6...",
"signatures": [
{
"agent_id": "deadbeef...",
"signature": "cafebabe...",
"timestamp": 1706745600
}
],
"confidence": 0.85,
"timestamp": 1706745600
}
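
On the client side, hex-encoded fields have to be converted to and from fixed-size byte arrays. The sketch below shows one way to do that with the standard library only. `encode_hex` and `decode_hash` are hypothetical names, and rejecting anything other than exactly 64 hex digits is an assumed validation rule.

```rust
/// Lowercase hex encoding, as used for hashes, signatures and agent IDs.
pub fn encode_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Decode a 64-character hex string into a 32-byte hash.
/// Returns None for wrong length or any non-hex character.
/// ASSUMPTION: strict validation; the API's actual error handling may differ.
pub fn decode_hash(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, slot) in out.iter_mut().enumerate() {
        // Slicing by byte index is safe here: the input is pure ASCII.
        *slot = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}
```

Truncated examples such as "a1b2c3d4e5f6..." in the JSON above are placeholders. A real `source_hash` would be the full 64 hex characters.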