# StemeDB Data Structures

> **Last Updated:** 2026-02-19
> **Source:** `crates/stemedb-core/src/types.rs`

This document describes the core data structures in StemeDB (Episteme). These types form the foundation of the "Git for Truth" knowledge graph.

---

## Design Principles

1. **Append-Only**: Data is never mutated. New assertions create new records.
2. **Content-Addressed**: Every assertion's ID is a BLAKE3 hash of its content.
3. **Zero-Copy**: Uses `rkyv` for serialization, so data can be read directly from disk without parsing.
4. **Provenance-First**: Every fact carries its source, signers, and confidence.

---

## Primitive Types

```rust
pub type Hash = [u8; 32];      // BLAKE3 256-bit hash
pub type PHash = [u8; 8];      // Perceptual hash for images (8 bytes)
pub type EntityId = String;    // Subject or object identifier
pub type RelationId = String;  // Predicate identifier
pub type EpochId = Hash;       // Paradigm/era identifier
pub type QueryId = Hash;       // Query audit record identifier
```

---

## The Assertion (Atomic Unit of Knowledge)

The `Assertion` is the fundamental unit. It represents a single claim about the world.

```rust
pub struct Assertion {
    // ═══════════════════════════════════════════════════════════
    // 1. THE FACT (What is being claimed)
    // ═══════════════════════════════════════════════════════════

    /// The entity this assertion is about (e.g., "Semaglutide", "Tesla_Inc")
    pub subject: EntityId,

    /// The relationship or property (e.g., "has_side_effect", "annual_revenue")
    pub predicate: RelationId,

    /// The claimed value
    pub object: ObjectValue,

    // ═══════════════════════════════════════════════════════════
    // 2. THE LINEAGE (Why we believe it)
    // ═══════════════════════════════════════════════════════════

    /// If this modifies/forks another assertion, its hash
    pub parent_hash: Option<Hash>,

    /// Hash of the source evidence (PDF, URL, database export)
    pub source_hash: Hash,

    /// Authority tier of the source (enables indexing and decay rates)
    pub source_class: SourceClass,

    /// Perceptual hash of a visual anchor (e.g., screenshot of table)
    pub visual_hash: Option<PHash>,

    /// Which paradigm/era this belongs to (for paradigm shifts)
    pub epoch: Option<EpochId>,

    /// Lifecycle stage (Proposed → Approved → Deprecated)
    pub lifecycle: LifecycleStage,

    // ═══════════════════════════════════════════════════════════
    // 3. META-COGNITION (Who said it, how sure are they)
    // ═══════════════════════════════════════════════════════════

    /// Cryptographic signatures from agents vouching for this
    pub signatures: Vec<SignatureEntry>,

    /// Subjective confidence score (0.0 to 1.0)
    pub confidence: f32,

    /// Unix timestamp when created
    pub timestamp: u64,

    /// Semantic embedding vector for similarity search
    pub vector: Option<Vec<f32>>,
}
```

### ObjectValue

The value in a subject-predicate-object triple:

```rust
pub enum ObjectValue {
    Text(String),         // "muscle loss"
    Number(f64),          // 96.7
    Boolean(bool),        // true
    Reference(EntityId),  // Points to another entity (graph edge)
}
```

### LifecycleStage

Assertions progress through stages (as new assertions, not mutations):

```
Proposed → UnderReview → Approved
                ↘ Rejected    ↘ Deprecated
```

```rust
pub enum LifecycleStage {
    Proposed,     // Initial idea, not for production use
    UnderReview,  // Gathering votes and feedback
    Approved,     // Accepted as current truth
    Deprecated,   // Was true, now superseded
    Rejected,     // Explicitly declined
}
```

### SourceClass

Authority tier classification for sources.
Enables indexing by tier and tier-based decay rates:

| Tier | Class | Example | Default Decay |
|------|-------|---------|---------------|
| 0 | Regulatory | FDA, EMA, WHO | Never |
| 1 | Clinical | Phase III trials, peer-reviewed RCTs | 2 years |
| 2 | Observational | Real-world evidence, cohort studies | 1 year |
| 3 | Expert | Medical professional opinions, guidelines | 6 months |
| 4 | Community | Curated forums, patient advocacy groups | 3 months |
| 5 | Anecdotal | Reddit posts, individual testimonials | 1 month |

```rust
pub enum SourceClass {
    Regulatory,     // Tier 0: Highest authority, never decays
    Clinical,       // Tier 1: Peer-reviewed research
    Observational,  // Tier 2: Real-world evidence
    Expert,         // Tier 3: Professional opinions (default)
    Community,      // Tier 4: Curated community knowledge
    Anecdotal,      // Tier 5: Individual reports, fast decay
}

// Signatures only (bodies omitted)
impl SourceClass {
    pub fn tier(&self) -> u8;                         // Returns 0-5
    pub fn default_decay_days(&self) -> Option<u32>;  // None when the tier never decays
    pub fn authority_weight(&self) -> f32;            // 1.0 for Regulatory, 0.1 for Anecdotal
}
```

**Key Benefits:**

- **Indexing**: `SC:{source_class}` index enables "show me only regulatory sources"
- **Decay rates**: Anecdotal claims decay faster than clinical evidence
- **Trust weighting**: Lenses can weight sources by authority tier in conflict resolution

### SignatureEntry

Cryptographic proof that an agent vouches for an assertion:

```rust
pub struct SignatureEntry {
    pub agent_id: [u8; 32],   // Ed25519 public key
    pub signature: [u8; 64],  // Ed25519 signature over assertion content
    pub timestamp: u64,       // When the agent signed
}
```

---

## The Vote (High-Velocity Consensus)

Votes are separated from assertions to enable thousands of agents to vote simultaneously without lock contention (the "Ballot Box" pattern).

A vote is not just "I agree" - it's a **cryptographic witness**: "I saw this exact text at this URL at this time." This enables browser extension products where votes represent observations, not opinions.
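
The Ballot Box idea can be sketched roughly as follows. This is a minimal in-memory illustration, not the StemeDB implementation: `BallotVote`, `BallotBox`, `cast`, and `tally` are hypothetical names, and the signature and provenance fields are omitted. Casting a vote only appends a record; each agent's most recent vote is resolved when the tally is read.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down vote record (no signature or provenance fields).
#[derive(Debug, Clone, PartialEq)]
pub struct BallotVote {
    pub assertion_hash: [u8; 32],
    pub agent_id: [u8; 32],
    pub weight: f32,
    pub timestamp: u64,
}

/// Hypothetical append-only store: existing votes are never modified.
#[derive(Default)]
pub struct BallotBox {
    votes: Vec<BallotVote>,
}

impl BallotBox {
    /// Casting a vote only appends a new record.
    pub fn cast(&mut self, vote: BallotVote) {
        self.votes.push(vote);
    }

    /// Returns (distinct voters, summed weight) for one assertion,
    /// counting only each agent's most recent vote.
    pub fn tally(&self, assertion_hash: &[u8; 32]) -> (usize, f32) {
        let mut latest: HashMap<[u8; 32], &BallotVote> = HashMap::new();
        for vote in self
            .votes
            .iter()
            .filter(|v| &v.assertion_hash == assertion_hash)
        {
            let entry = latest.entry(vote.agent_id).or_insert(vote);
            // On equal timestamps the later-appended vote wins.
            if vote.timestamp >= entry.timestamp {
                *entry = vote;
            }
        }
        let total: f32 = latest.values().map(|v| v.weight).sum();
        (latest.len(), total)
    }
}
```

In this sketch, an agent who votes with weight 1.0 and later with 0.5 contributes only 0.5 to the tally, while both records remain in the store.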
```rust
pub struct Vote {
    /// Hash of the assertion being voted on
    pub assertion_hash: Hash,

    /// Ed25519 public key of the voter
    pub agent_id: [u8; 32],

    /// Weight of the vote (0.0 = reject, 1.0 = full endorsement)
    pub weight: f32,

    /// Signature over the assertion_hash
    pub signature: [u8; 64],

    /// When the vote was cast
    pub timestamp: u64,

    /// The URL where the claim was observed (optional)
    /// Enables provenance tracking: "I saw this at example.com/article"
    pub source_url: Option<String>,

    /// Optional context (page snippet, etc.) stored as bytes
    /// Same pattern as source_metadata on Assertion for rkyv zero-copy
    pub observed_context: Option<Vec<u8>>,
}
```

**Key Insight**: Votes are append-only. An agent can change their vote by submitting a new one with a later timestamp.

**Provenance Witness**: The `source_url` and `observed_context` fields transform votes from opinions into observations, enabling the browser extension to count "How many people saw this claim on this page?" rather than just "How many people agree?"

---

## The Epoch (Paradigm Shifts)

Epochs represent distinct periods of truth. When knowledge paradigms shift, old epochs can be superseded.
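
Supersession can be pictured as a chain of epochs, each optionally pointing at the one it replaces. The sketch below is an illustration under assumptions, not code from `stemedb-core`: `EpochLink` is a hypothetical, trimmed-down stand-in for an epoch record, and `current_epoch` is a hypothetical helper that follows the chain forward to the newest epoch.

```rust
use std::collections::{HashMap, HashSet};

type EpochId = [u8; 32];

/// Hypothetical stand-in holding only the fields needed to follow the chain.
pub struct EpochLink {
    pub id: EpochId,
    pub name: String,
    pub supersedes: Option<EpochId>,
}

/// Starting from `start`, follow "superseded by" links and return the
/// name of the newest epoch in the chain (None if `start` is unknown).
pub fn current_epoch<'a>(epochs: &'a [EpochLink], start: EpochId) -> Option<&'a str> {
    // Map each superseded epoch to the epoch that replaced it.
    let successor: HashMap<EpochId, &EpochLink> = epochs
        .iter()
        .filter_map(|e| e.supersedes.map(|old| (old, e)))
        .collect();
    let by_id: HashMap<EpochId, &EpochLink> = epochs.iter().map(|e| (e.id, e)).collect();

    let mut current = *by_id.get(&start)?;
    // The `seen` set stops the walk if the data ever contains a cycle.
    let mut seen = HashSet::new();
    while seen.insert(current.id) {
        match successor.get(&current.id) {
            Some(next) => current = *next,
            None => break,
        }
    }
    Some(current.name.as_str())
}
```

With a "Newtonian" epoch superseded by a "Relativity" epoch, a lookup starting from either one resolves to "Relativity" in this sketch.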
```rust
pub struct Epoch {
    pub id: EpochId,
    pub name: String,                                 // "Pre-2024", "Newtonian"
    pub supersedes: Option<EpochId>,                  // What this replaces
    pub supersession_type: Option<SupersessionType>,
    pub start_timestamp: u64,
    pub end_timestamp: Option<u64>,
}

pub enum SupersessionType {
    Invalidation,  // Old epoch was factually wrong (e.g., "Earth is flat")
    Temporal,      // Old epoch was correct but outdated (e.g., "President is Obama")
    Refinement,    // Old epoch was a simplification (e.g., Newtonian → Relativity)
}
```

---

## Query Results

### MaterializedView (O(1) Winner Lookup)

Pre-computed resolution stored at `MV:{subject}:{predicate}`:

```rust
pub struct MaterializedView {
    /// The winning assertion from lens resolution
    pub winner: Assertion,

    /// Which lens produced this (e.g., "VoteAwareConsensus")
    pub lens_name: String,

    /// Confidence in the resolution (0.0 to 1.0)
    pub resolution_confidence: f32,

    /// How many candidates were considered
    pub candidates_count: usize,

    /// When this view was computed
    pub materialized_at: u64,
}
```

### ConflictAnalysis (Trust but Verify)

Used by the SkepticLens, which surfaces all competing claims instead of picking a winner:

```rust
pub struct ConflictAnalysis {
    /// Overall status: Unanimous, Agreed, or Contested
    pub status: ResolutionStatus,

    /// Conflict score (0.0 = unanimous, 1.0 = maximum chaos)
    /// Calculated using normalized Shannon entropy
    pub conflict_score: f32,

    /// All distinct claims, ranked by weight_share descending
    pub claims: Vec<ClaimSummary>,

    /// Total candidates considered
    pub candidates_count: usize,
}

pub enum ResolutionStatus {
    Unanimous,  // All agree (entropy < 0.1)
    Agreed,     // Strong majority (entropy < 0.4)
    Contested,  // Significant disagreement (entropy >= 0.4)
}
```

### ClaimSummary

A single competing claim within a ConflictAnalysis:

```rust
pub struct ClaimSummary {
    /// The claimed value
    pub value: ObjectValue,

    /// This claim's share of total support (0.0 to 1.0)
    pub weight_share: f32,

    /// Number of assertions making this claim
    pub assertion_count: u32,

    /// Hash of the
    /// highest-confidence assertion (for drill-down)
    pub representative_hash: Hash,

    /// Source provenance
    pub source: SourceSummary,

    /// Agents who signed assertions for this claim
    pub supporting_agents: Vec<AgentSummary>,
}
```

### SourceSummary & AgentSummary

Provenance types for "show me the proof" UX:

```rust
pub struct SourceSummary {
    pub source_hash: Hash,           // Hash of source document
    pub visual_hash: Option<PHash>,  // Visual anchor (screenshot)
}

pub struct AgentSummary {
    pub agent_id: [u8; 32],  // Agent's public key
    pub trust_score: f32,    // Trust score at query time
}
```

---

## Query Audit Trail

Every query is logged for "Why did you think that?" debugging:

```rust
pub struct QueryAudit {
    pub query_id: QueryId,
    pub agent_id: Option<[u8; 32]>,  // Who queried (from X-Agent-Id header)
    pub timestamp: u64,
    pub params: QueryParams,
    pub result_hash: Option<Hash>,   // Winning assertion hash
    pub result_confidence: f32,
    pub contributing_assertions: Vec<ContributingAssertion>,
}

pub struct QueryParams {
    pub subject: Option<EntityId>,
    pub predicate: Option<RelationId>,
    pub lifecycle: Option<LifecycleStage>,
    pub epoch: Option<EpochId>,
    pub lens: Option<String>,
}

pub struct ContributingAssertion {
    pub assertion_hash: Hash,
    pub weight: f32,  // How much this influenced the result
    pub source_hash: Hash,
    pub lifecycle: LifecycleStage,
}
```

---

## Storage Layout

Key patterns in the KV store:

| Key Pattern | Value | Purpose |
|-------------|-------|---------|
| `H:{hash}` | Serialized Assertion | Primary assertion storage |
| `S:{subject}` | `Vec<Hash>` | Subject index |
| `SP:{subject}:{predicate}` | `Vec<Hash>` | Compound index (O(1) lookup) |
| `MV:{subject}:{predicate}` | MaterializedView | Pre-computed winner |
| `V:{assertion_hash}:{vote_hash}` | Vote | Individual votes |
| `VC:{assertion_hash}` | u64 | Vote count cache |
| `VW:{assertion_hash}` | f32 | Aggregate vote weight cache |
| `TR:{agent_id}` | TrustRank | Agent reputation |
| `TP:{pack_id}` | TrustPack | Curated agent lists |
| `AUD:{query_id}` | QueryAudit | Query audit record |
| `E:{epoch_id}` | Epoch | Epoch definitions |

---

## The Trust Pack (Curator Economy)

Trust Packs are the "App Store for Trust": curated lists of trusted agents that filter consensus through domain expertise.

```rust
pub struct TrustPack {
    /// Content-addressed pack ID (BLAKE3 hash)
    pub id: PackId,

    /// Human-readable name (e.g., "Mayo_Clinic_Experts")
    pub name: String,

    /// Ed25519 public key of the pack maintainer
    pub maintainer: [u8; 32],

    /// Agent public keys in this pack
    /// Future: Replace with RoaringBitmap for O(1) membership
    pub agents: Vec<[u8; 32]>,

    /// Unix timestamp when pack was created
    pub created_at: u64,

    /// Unix timestamp of last modification
    pub updated_at: u64,
}
```

**Key Methods:**

- `add_agent(agent_id)` - Idempotent agent addition
- `remove_agent(agent_id)` - Safe removal
- `contains_agent(agent_id) -> bool` - Membership check

**Use Case:** Users subscribe to packs like "Skeptical Cardio Pack" to filter GLP-1 side effect claims through vetted cardiologists.

---

## The SourceRecord (Source Registry)

The Source Registry maps content-addressed source hashes to human-readable metadata. This enables the dashboard to show "FDA Approval Letter for Wegovy" instead of a raw BLAKE3 hash.

```rust
pub struct SourceRecord {
    /// Content-addressed hash of the source (BLAKE3, 32 bytes).
    pub hash: [u8; 32],

    /// Human-readable label.
    pub label: String,

    /// Optional URL where the source can be accessed.
    pub url: Option<String>,

    /// Authority tier (0-5), matching SourceClass.
    pub tier: u8,

    /// Current status (Active, Deprecated, Quarantined).
    pub status: SourceStatus,

    /// HLC timestamp when the record was created.
    pub created_at: u64,

    /// HLC timestamp of the last update.
    pub updated_at: u64,

    /// Optional curator notes about the source.
    pub notes: Option<String>,

    /// Optional full-text content of the source document.
    /// Populated by pipelines that extract text from PDFs.
    /// Max size: 1 MB (MAX_SOURCE_CONTENT_LEN).
    pub content: Option<String>,
}
```

**Key Points:**

- **Status lifecycle:** Active → Deprecated or Quarantined (curator-driven)
- **Content field:** Stores extracted document text (e.g., from `pdftotext`). Stripped from list responses (`GET /v1/sources`) to avoid returning megabytes; included in single-source responses (`GET /v1/sources/{hash}`)
- **rkyv compat:** Uses `deserialize_source_record_compat()` for backward compatibility with data written before the `content` field was added

---

## Serialization

All types use `rkyv` for zero-copy deserialization:

```rust
use stemedb_core::serde::{serialize, deserialize};

// Serialize
let bytes: Vec<u8> = serialize(&assertion)?;

// Deserialize (zero-copy when possible)
let assertion: Assertion = deserialize(&bytes)?;
```

**Critical Rule**: Never use raw `AllocSerializer` in production code. Always use `stemedb_core::serde::{serialize, deserialize}`.

### Schema Evolution (rkyv Compat)

rkyv does **not** support schema evolution. When a field is added to a struct, data written with the old layout can no longer be deserialized into the new struct. The solution is a legacy compat pattern:

| Type | Compat Function | Legacy Struct |
|------|----------------|---------------|
| `Assertion` | `deserialize_assertion_compat()` | `LegacyAssertion` (pre-`narrative`) |
| `SourceRecord` | `deserialize_source_record_compat()` | `LegacySourceRecord` (pre-`content`) |

All assertion deserialization should use `deserialize_assertion_compat()`. All source record deserialization should use `deserialize_source_record_compat()`.

When adding fields to rkyv structs in the future, always add a legacy compat deserializer following this pattern.
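
The shape of the legacy compat pattern can be sketched as "try the current layout, fall back to the legacy layout, then upgrade". The example below is only an illustration under stated assumptions: it uses a toy length-prefixed byte format instead of rkyv, and `Record`, `LegacyRecord`, `write_str`, `read_str`, and `decode_compat` are hypothetical names. The internals of the real `deserialize_*_compat()` functions are not shown in this document.

```rust
use std::convert::TryInto;

/// Hypothetical legacy layout (before the optional field was added).
#[derive(Debug, PartialEq)]
pub struct LegacyRecord {
    pub label: String,
}

/// Hypothetical current layout with the added optional field.
#[derive(Debug, PartialEq)]
pub struct Record {
    pub label: String,
    pub content: Option<String>,
}

impl From<LegacyRecord> for Record {
    fn from(old: LegacyRecord) -> Self {
        // Fields added after the legacy layout take their "absent" value.
        Record { label: old.label, content: None }
    }
}

/// Toy encoding: u32 little-endian length followed by UTF-8 bytes.
pub fn write_str(out: &mut Vec<u8>, s: &str) {
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
}

/// Reads one length-prefixed string and returns it with the remaining bytes.
fn read_str(bytes: &[u8]) -> Option<(String, &[u8])> {
    if bytes.len() < 4 {
        return None;
    }
    let (head, rest) = bytes.split_at(4);
    let len = u32::from_le_bytes(head.try_into().ok()?) as usize;
    if rest.len() < len {
        return None;
    }
    let (text, rest) = rest.split_at(len);
    Some((String::from_utf8(text.to_vec()).ok()?, rest))
}

/// Legacy layout: a single string and nothing else.
fn decode_legacy(bytes: &[u8]) -> Option<LegacyRecord> {
    let (label, rest) = read_str(bytes)?;
    rest.is_empty().then(|| LegacyRecord { label })
}

/// Current layout: label, then a tag byte (0 = no content, 1 = content follows).
fn decode_current(bytes: &[u8]) -> Option<Record> {
    let (label, rest) = read_str(bytes)?;
    let (&tag, rest) = rest.split_first()?;
    let (content, rest) = match tag {
        0 => (None, rest),
        1 => {
            let (text, rest) = read_str(rest)?;
            (Some(text), rest)
        }
        _ => return None,
    };
    rest.is_empty().then(|| Record { label, content })
}

/// Compat entry point: current layout first, legacy layout as fallback.
pub fn decode_compat(bytes: &[u8]) -> Option<Record> {
    decode_current(bytes).or_else(|| decode_legacy(bytes).map(Record::from))
}
```

Routing every read through a single compat entry point, as the rule above asks, keeps callers unaware of which layout the bytes were written with.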

---

## Relationship Diagram

```
┌─────────────────────────────────────────────┐
│                  ASSERTION                  │
│  ┌─────────┐ ┌───────────┐ ┌─────────────┐  │
│  │ subject │ │ predicate │ │   object    │  │
│  └─────────┘ └───────────┘ └─────────────┘  │
│                                             │
│  ┌─────────────────┐  ┌──────────────────┐  │
│  │   source_hash   │  │   signatures[]   │  │
│  └────────┬────────┘  └────────┬─────────┘  │
│           │                    │            │
└───────────┼────────────────────┼────────────┘
            │                    │
┌───────────▼───────┐   ┌────────▼────────┐
│  SOURCE DOCUMENT  │   │     AGENTS      │
│   (PDF, URL...)   │   │ (Ed25519 keys)  │
└───────────────────┘   └────────┬────────┘
                                 │
                        ┌────────▼────────┐
                        │   TRUST RANK    │
                        │  (reputation)   │
                        └─────────────────┘

┌─────────────────┐         ┌─────────────────┐
│      VOTE       │◄────────│    ASSERTION    │
│  (Ballot Box)   │  votes  │    (target)     │
│ weight: 0.0-1.0 │   on    │                 │
└─────────────────┘         └─────────────────┘

┌─────────────────┐         ┌─────────────────┐
│     EPOCH B     │◄────────│     EPOCH A     │
│  supersedes: A  │  older  │                 │
│ type: Temporal  │  epoch  │                 │
└─────────────────┘         └─────────────────┘
```

---

## API Representation

All binary data (hashes, signatures, agent IDs) is hex-encoded in JSON APIs:

```json
{
  "subject": "Semaglutide",
  "predicate": "muscle_effect",
  "object": { "type": "Text", "value": "Significant loss" },
  "source_hash": "a1b2c3d4e5f6...",
  "signatures": [
    {
      "agent_id": "deadbeef...",
      "signature": "cafebabe...",
      "timestamp": 1706745600
    }
  ],
  "confidence": 0.85,
  "timestamp": 1706745600
}
```

---

## See Also

- [Architecture Overview](../architecture.md)
- [Lens Documentation](../ai-lookup/services/lens.md)
- [API Endpoints Guide](../.claude/guides/backend/api-endpoints.md)