
Episteme Technical Specification for Patent Disclosure

  • Subject: System and Method for Storing and Resolving Conflicting Assertions in a Probabilistic Knowledge Graph
  • Date: 2026-02-04

Field of the Invention

The present invention relates generally to database systems and knowledge management, and more particularly to methods and systems for storing conflicting assertions with authority weighting and resolving them at query time using configurable lens algorithms.


Background of the Invention

Technical Problem

Database systems have evolved from flat files to relational tables to document stores to graph databases. Despite this evolution, a fundamental assumption persists: each attribute has one correct value at any given time.

This assumption creates critical limitations:

  1. Forced Resolution at Write Time: When conflicting data arrives from multiple sources, the database forces a choice. The epistemic signal of disagreement is lost.

  2. Authority Blindness: All data is structurally equal. A regulatory filing has the same weight as a social media post. Application logic must implement authority weighting, leading to inconsistent implementations.

  3. Temporal Flatness: Data does not decay. Old claims persist with the same relevance as recent evidence. Manual expiration logic is error-prone.

  4. Cascade Blindness: When upstream evidence is retracted, downstream conclusions remain unchanged. No structural mechanism propagates invalidation.

  5. Consensus Opacity: Query results return a single answer, hiding the variance in underlying evidence. Users cannot see where sources agree or disagree.

Prior Art Limitations

Relational Databases (PostgreSQL, MySQL): Force single values per cell. Temporal tables add versioning complexity but do not model disagreement structurally.

Event Sourcing (Datomic, EventStore): Store events immutably but assume events are sequential transformations, not contradicting observations.

Blockchain Systems (Ethereum, Cosmos): Achieve consensus before write. Cannot store contradictions that persist indefinitely.

Knowledge Graphs (Neo4j, RDF Stores): Store triples but treat all triples equally. No source authority weighting or decay.

Probabilistic Databases (Academic): Handle uncertainty but lack source class hierarchies, cryptographic signatures, and production-grade implementation.


Summary of the Invention

The present invention provides a database system and method for storing and resolving conflicting assertions. In one embodiment, a system comprises:

  • A storage engine configured to store signed assertions with source class authority weights
  • An assertion index that preserves contradictions without forced resolution
  • A lens engine that resolves conflicts at query time using configurable strategies
  • A semantic decay module that adjusts assertion relevance based on source class half-life
  • A Trust Pack module that enables personalized consensus filtering
  • A query audit module that logs provenance for debugging

The system outputs query results that reflect the caller's chosen resolution strategy, enabling different users to receive different answers from the same underlying data.


Detailed Description of Preferred Embodiments

1. The Signed Assertion (Atomic Unit)

The fundamental data structure is the Signed Assertion, replacing the traditional database row or document:

struct Assertion {
    // ═══════════════════════════════════════════════════════════
    // 1. THE PROPOSITION (What is being claimed)
    // ═══════════════════════════════════════════════════════════

    /// The entity this assertion is about (e.g., "Semaglutide", "Tesla_Inc")
    pub subject: EntityId,

    /// The relationship or property (e.g., "has_side_effect", "annual_revenue")
    pub predicate: RelationId,

    /// The claimed value
    pub object: ObjectValue,

    // ═══════════════════════════════════════════════════════════
    // 2. THE LINEAGE (Why we believe it)
    // ═══════════════════════════════════════════════════════════

    /// If this modifies/forks another assertion, its hash
    pub parent_hash: Option<Hash>,

    /// Hash of the source evidence (PDF, URL, database export)
    pub source_hash: Hash,

    /// Authority tier of the source (enables decay rates)
    pub source_class: SourceClass,

    /// Optional structured metadata about the source
    pub source_metadata: Option<SourceMetadata>,

    /// Perceptual hash of a visual anchor (e.g., screenshot of table)
    pub visual_hash: Option<PHash>,

    /// Which paradigm/era this belongs to (for paradigm shifts)
    pub epoch: Option<EpochId>,

    /// Lifecycle stage (Proposed → Approved → Deprecated)
    pub lifecycle: LifecycleStage,

    // ═══════════════════════════════════════════════════════════
    // 3. META-COGNITION (Who said it, how confident)
    // ═══════════════════════════════════════════════════════════

    /// Cryptographic signatures from agents vouching for this
    pub signatures: Vec<SignatureEntry>,

    /// Subjective confidence score (0.0 to 1.0)
    pub confidence: f32,

    /// Unix timestamp when created
    pub timestamp: u64,

    /// Semantic embedding vector for similarity search
    pub vector: Option<Vec<f32>>,
}

ObjectValue Variants

pub enum ObjectValue {
    Text(String),           // "gastroparesis", "approved"
    Number(f64),            // 96.7, 0.85
    Boolean(bool),          // true, false
    Reference(EntityId),    // Points to another entity (graph edge)
}

SignatureEntry Structure

pub struct SignatureEntry {
    pub agent_id: [u8; 32],   // Ed25519 public key
    pub signature: [u8; 64],  // Ed25519 signature over assertion content
    pub timestamp: u64,       // When the agent signed
}

Key Innovation: The assertion is content-addressed. Its identifier is a BLAKE3 hash of its content, enabling deduplication and Merkle DAG formation.
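
Below is a minimal sketch of content addressing and the deduplication it enables. It assumes the hashed fields are the proposition, the source hash, and the timestamp. Signatures are excluded because they are computed over that content. `MiniAssertion`, `canonical_bytes`, `content_id`, and `ContentStore` are hypothetical names. BLAKE3 is not in Rust's standard library, so std's `DefaultHasher` serves as a clearly non-cryptographic placeholder.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Hypothetical reduced assertion carrying only the fields assumed to define identity.
#[derive(Clone, Debug, PartialEq)]
pub struct MiniAssertion {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub source_hash: u64,
    pub timestamp: u64,
}

/// Canonical encoding: fixed field order, each string length-prefixed so that
/// ("ab", "c") and ("a", "bc") produce different bytes.
pub fn canonical_bytes(a: &MiniAssertion) -> Vec<u8> {
    let mut out = Vec::new();
    for field in [&a.subject, &a.predicate, &a.object] {
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out.extend_from_slice(&a.source_hash.to_le_bytes());
    out.extend_from_slice(&a.timestamp.to_le_bytes());
    out
}

/// PLACEHOLDER: std's DefaultHasher (64-bit SipHash, not collision-resistant)
/// stands in for BLAKE3, which is not part of the standard library.
pub fn content_id(a: &MiniAssertion) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&canonical_bytes(a));
    h.finish()
}

/// Hypothetical content-addressed store: storing identical content twice keeps one copy.
#[derive(Default)]
pub struct ContentStore {
    by_id: HashMap<u64, MiniAssertion>,
}

impl ContentStore {
    /// Returns the content id and whether the assertion was newly stored.
    pub fn put(&mut self, a: MiniAssertion) -> (u64, bool) {
        let id = content_id(&a);
        let is_new = !self.by_id.contains_key(&id);
        self.by_id.entry(id).or_insert(a);
        (id, is_new)
    }

    pub fn len(&self) -> usize {
        self.by_id.len()
    }
}
```

Length-prefixing each string matters because plain concatenation would give ("ab", "c") and ("a", "bc") identical bytes, and therefore the same identifier.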


2. Source Class Hierarchy

A core inventive step is the hierarchical classification of sources with associated authority weights and decay half-lives:

| Tier | Class | Authority Weight (W_a) | Decay Half-Life | Example Sources |
|---|---|---|---|---|
| 0 | Regulatory | 1.0 | Never | FDA labels, SEC filings, WHO guidelines |
| 1 | Clinical | 0.9 | 2 years | Peer-reviewed RCTs, Phase III trials |
| 2 | Observational | 0.7 | 1 year | Real-world evidence, cohort studies |
| 3 | Expert | 0.5 | 6 months | Physician guidelines, professional opinions |
| 4 | Community | 0.2 | 3 months | Patient registries, curated forums |
| 5 | Anecdotal | 0.1 | 30 days | Reddit posts, individual testimonials |

pub enum SourceClass {
    Regulatory,    // Tier 0: Highest authority, never decays
    Clinical,      // Tier 1: Peer-reviewed research
    Observational, // Tier 2: Real-world evidence
    Expert,        // Tier 3: Professional opinions
    Community,     // Tier 4: Curated community knowledge
    Anecdotal,     // Tier 5: Individual reports, fast decay
}

impl SourceClass {
    pub fn tier(&self) -> u8 {
        match self {
            SourceClass::Regulatory => 0,
            SourceClass::Clinical => 1,
            SourceClass::Observational => 2,
            SourceClass::Expert => 3,
            SourceClass::Community => 4,
            SourceClass::Anecdotal => 5,
        }
    }

    pub fn authority_weight(&self) -> f32 {
        match self {
            SourceClass::Regulatory => 1.0,
            SourceClass::Clinical => 0.9,
            SourceClass::Observational => 0.7,
            SourceClass::Expert => 0.5,
            SourceClass::Community => 0.2,
            SourceClass::Anecdotal => 0.1,
        }
    }

    pub fn decay_half_life_days(&self) -> Option<u32> {
        match self {
            SourceClass::Regulatory => None,  // Never decays
            SourceClass::Clinical => Some(730),
            SourceClass::Observational => Some(365),
            SourceClass::Expert => Some(180),
            SourceClass::Community => Some(90),
            SourceClass::Anecdotal => Some(30),
        }
    }
}

Rationale: This hierarchy enables the system to mathematically distinguish between "this violates the law" (Tier 0 conflict) and "this contradicts a Reddit post" (Tier 5 conflict), automating triage that would otherwise require human judgment.
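
The triage idea can be illustrated with a small sketch. It makes one hypothetical assumption: a conflict is ranked by the most authoritative source involved, meaning the lowest tier among the contradicting assertions. The `Triage` buckets and their boundaries are invented for illustration and are not part of this specification.

```rust
/// Declaration order is tier order, so the derived Ord ranks Regulatory lowest
/// (most authoritative) and Anecdotal highest.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum SourceClass { Regulatory, Clinical, Observational, Expert, Community, Anecdotal }

/// Hypothetical triage buckets, for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub enum Triage { Escalate, Review, Ignore }

/// Hypothetical rule: a conflict is as serious as the most authoritative
/// (lowest-tier) source among the contradicting assertions.
pub fn conflict_tier(classes: &[SourceClass]) -> Option<SourceClass> {
    classes.iter().copied().min()
}

/// Hypothetical bucket boundaries.
pub fn triage(classes: &[SourceClass]) -> Triage {
    match conflict_tier(classes) {
        Some(SourceClass::Regulatory | SourceClass::Clinical) => Triage::Escalate,
        Some(SourceClass::Observational | SourceClass::Expert) => Triage::Review,
        _ => Triage::Ignore,
    }
}
```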


3. Semantic Decay Calculation

Assertion relevance decays based on source class half-life:

effective_confidence = original_confidence × decay_factor

decay_factor = exp(-ln(2) × elapsed_days / half_life_days)

For source classes with half_life_days = None (Regulatory), decay_factor = 1.0 always.

Example Calculation:

  • Assertion: Anecdotal (Tier 5), confidence = 0.8, age = 45 days
  • Half-life: 30 days
  • Decay factor: exp(-ln(2) × 45 / 30) = exp(-1.040) = 2^(-1.5) ≈ 0.354
  • Effective confidence: 0.8 × 0.354 ≈ 0.28

Example Calculation (Regulatory):

  • Assertion: Regulatory (Tier 0), confidence = 0.9, age = 3650 days (10 years)
  • Half-life: None (never decays)
  • Decay factor: 1.0
  • Effective confidence: 0.9 × 1.0 = 0.9
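
The decay formula and both worked examples above can be checked with a short sketch. It mirrors `decay_half_life_days()` from Section 2. The helper names, the use of `f64`, and the conversion from Unix-second timestamps to days are assumptions.

```rust
use std::f64::consts::LN_2;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SourceClass { Regulatory, Clinical, Observational, Expert, Community, Anecdotal }

impl SourceClass {
    /// Same half-lives as SourceClass::decay_half_life_days() in Section 2.
    pub fn decay_half_life_days(&self) -> Option<u32> {
        match self {
            SourceClass::Regulatory => None,
            SourceClass::Clinical => Some(730),
            SourceClass::Observational => Some(365),
            SourceClass::Expert => Some(180),
            SourceClass::Community => Some(90),
            SourceClass::Anecdotal => Some(30),
        }
    }
}

const SECONDS_PER_DAY: f64 = 86_400.0;

/// Hypothetical helper: days between two Unix timestamps (seconds),
/// clamped at 0 for future-dated assertions.
pub fn elapsed_days(created_at: u64, now: u64) -> f64 {
    now.saturating_sub(created_at) as f64 / SECONDS_PER_DAY
}

/// decay_factor = exp(-ln(2) * days / half_life_days); 1.0 when there is no half-life.
pub fn decay_factor(class: SourceClass, days: f64) -> f64 {
    match class.decay_half_life_days() {
        None => 1.0,
        Some(half_life) => (-LN_2 * days / f64::from(half_life)).exp(),
    }
}

/// effective_confidence = original_confidence * decay_factor
pub fn effective_confidence(class: SourceClass, confidence: f64, days: f64) -> f64 {
    confidence * decay_factor(class, days)
}
```

With these definitions, `effective_confidence(SourceClass::Anecdotal, 0.8, 45.0)` comes out at about 0.283, and the Regulatory case returns exactly 0.9.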

4. Resolution Lenses

Lenses collapse the probabilistic assertion space into concrete query results. Multiple lens types serve different use cases:

4.1 Winner-Picking Lenses

| Lens | Algorithm |
|---|---|
| Recency | Return the assertion with the most recent timestamp |
| Consensus | Return the assertion whose object value has the highest weighted support (Section 4.3) |
| Authority | Weight by the signing agent's TrustRank reputation |
| Vote-Aware | Weight by votes from the Ballot Box stream |
| EpochAware | Filter out assertions from superseded epochs |
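
Several of these lenses compose naturally as a filter followed by a selector. The minimal sketch below applies EpochAware filtering and then Recency selection. It assumes u64 hashes, u32 epoch ids, and a hash-based tie-break, none of which the table specifies. `Candidate`, `epoch_aware`, and `recency` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical minimal candidate; u64/u32 stand in for Hash/EpochId.
#[derive(Debug)]
pub struct Candidate {
    pub hash: u64,
    pub timestamp: u64,
    pub epoch: Option<u32>,
}

/// EpochAware: drop candidates whose epoch has been superseded.
/// Assertions without an epoch are kept.
pub fn epoch_aware<'a>(
    candidates: impl IntoIterator<Item = &'a Candidate>,
    superseded: &HashSet<u32>,
) -> Vec<&'a Candidate> {
    candidates
        .into_iter()
        .filter(|c| c.epoch.map_or(true, |e| !superseded.contains(&e)))
        .collect()
}

/// Recency: latest timestamp wins; the hash breaks ties deterministically (assumption).
pub fn recency<'a>(candidates: impl IntoIterator<Item = &'a Candidate>) -> Option<&'a Candidate> {
    candidates.into_iter().max_by_key(|c| (c.timestamp, c.hash))
}
```

Applying `recency` alone would return an assertion from a superseded epoch whenever it is the newest. Filtering first ensures that the winner comes from a live epoch.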

4.2 Analysis Lenses

| Lens | Algorithm |
|---|---|
| Skeptic | Return all competing claims with conflict score and weight shares |
| Layered | Per-source-class resolution (tier-by-tier visibility) |
| Constraints | Return must_use/forbidden assertions for a context |
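
Here is a minimal sketch of the Layered lens. It assumes each tier is resolved independently by summed confidence and that every tier's winner is reported side by side. The table does not specify the per-tier rule. `Claim` and `layered` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical flattened claim: tier from SourceClass::tier(), object rendered as a string.
pub struct Claim {
    pub tier: u8,
    pub object: String,
    pub confidence: f32,
}

/// Resolve each tier independently: per tier, the object value with the highest
/// summed confidence. Ties go to the lexicographically last value, because
/// BTreeMap iterates in key order and max_by keeps the last maximum.
pub fn layered(claims: &[Claim]) -> BTreeMap<u8, (String, f32)> {
    let mut per_tier: BTreeMap<u8, BTreeMap<String, f32>> = BTreeMap::new();
    for c in claims {
        *per_tier
            .entry(c.tier)
            .or_default()
            .entry(c.object.clone())
            .or_insert(0.0) += c.confidence;
    }
    per_tier
        .into_iter()
        .filter_map(|(tier, values)| {
            values
                .into_iter()
                .max_by(|a, b| a.1.total_cmp(&b.1))
                .map(|winner| (tier, winner))
        })
        .collect()
}
```

The output makes tier-level disagreement visible. For example, tier 0 can report `approved` while tier 5 reports `recalled`, and neither answer overwrites the other.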

4.3 Consensus Lens Algorithm

use std::collections::HashMap;

fn resolve_consensus(
    candidates: Vec<&Assertion>,
    trust_ranks: &TrustRankStore,
) -> Option<Assertion> {
    // Group by object value. ObjectValue holds an f64, which implements neither
    // Eq nor Hash, so using it as a HashMap key requires a manual Eq + Hash impl
    // (e.g., hashing Number via f64::to_bits).
    let mut clusters: HashMap<ObjectValue, Vec<&Assertion>> = HashMap::new();
    for &assertion in &candidates {
        clusters.entry(assertion.object.clone())
            .or_default()
            .push(assertion);
    }

    // Calculate weighted support for each cluster
    let mut cluster_weights: Vec<(ObjectValue, f32)> = clusters
        .into_iter()
        .map(|(value, assertions)| {
            let weight: f32 = assertions.iter()
                .map(|a| {
                    let base_weight = a.source_class.authority_weight();
                    let trust_modifier = trust_ranks.get_average(&a.signatures);
                    let decay = compute_decay(a);
                    base_weight * trust_modifier * decay * a.confidence
                })
                .sum();
            (value, weight)
        })
        .collect();

    // Return the highest-weighted cluster's representative. total_cmp gives a
    // total order, so a NaN weight cannot panic the sort as partial_cmp().unwrap() would.
    cluster_weights.sort_by(|a, b| b.1.total_cmp(&a.1));
    cluster_weights.first().map(|(value, _)| {
        find_representative_assertion(&candidates, value)
    })
}
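
`resolve_consensus` depends on helpers not defined in this document (`TrustRankStore`, `compute_decay`, `find_representative_assertion`). The self-contained sketch below therefore reproduces only its weighting and ranking step. It assumes the trust modifier and decay factor are precomputed per candidate, and it keys clusters by a string rendering of the object value. `WeightedClaim` and `rank_clusters` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical flattened candidate with its weighting inputs precomputed.
pub struct WeightedClaim {
    pub object: String,        // string key sidesteps f64's lack of Eq/Hash
    pub authority_weight: f32, // from SourceClass::authority_weight()
    pub trust_modifier: f32,   // assumed: average TrustRank of the signers
    pub decay: f32,            // decay factor from Section 3
    pub confidence: f32,
}

impl WeightedClaim {
    /// Same product as in resolve_consensus: authority * trust * decay * confidence.
    pub fn support(&self) -> f32 {
        self.authority_weight * self.trust_modifier * self.decay * self.confidence
    }
}

/// Sums support per object value and ranks clusters, strongest first.
/// Ties are broken by object value so the ordering is deterministic.
pub fn rank_clusters(claims: &[WeightedClaim]) -> Vec<(String, f32)> {
    let mut clusters: HashMap<&str, f32> = HashMap::new();
    for c in claims {
        *clusters.entry(c.object.as_str()).or_insert(0.0) += c.support();
    }
    let mut ranked: Vec<(String, f32)> = clusters
        .into_iter()
        .map(|(value, weight)| (value.to_string(), weight))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

The test case shows why authority weighting matters. Three Anecdotal claims for X (3 × 0.1 × 0.9 = 0.27) lose to a single Regulatory claim for Y (1.0 × 0.9 = 0.9), even though X has more supporters.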

4.4 Skeptic Lens Algorithm

fn resolve_skeptic(
    candidates: Vec<&Assertion>,
    trust_ranks: &TrustRankStore,
) -> ConflictAnalysis {
    // Group by object value and compute weights
    let claims = compute_claims_with_weights(&candidates, trust_ranks);

    // Calculate conflict score using Shannon entropy
    let total_weight: f32 = claims.iter().map(|c| c.weight_share).sum();
    let entropy: f32 = claims.iter()
        .map(|c| {
            let p = c.weight_share / total_weight;
            if p > 0.0 { -p * p.ln() } else { 0.0 }
        })
        .sum();

    let max_entropy = (claims.len() as f32).ln();
    let conflict_score = if max_entropy > 0.0 {
        entropy / max_entropy
    } else {
        0.0
    };

    let status = match conflict_score {
        s if s < 0.1 => ResolutionStatus::Unanimous,
        s if s < 0.4 => ResolutionStatus::Agreed,
        _ => ResolutionStatus::Contested,
    };

    ConflictAnalysis {
        status,
        conflict_score,
        claims,
        candidates_count: candidates.len(),
    }
}
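
The conflict score above is Shannon entropy normalized by its maximum, ln(number of claims). The sketch below isolates that computation from `compute_claims_with_weights`, which is not defined here, so the thresholds can be checked on plain weight vectors. `conflict_score` and `status` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ResolutionStatus { Unanimous, Agreed, Contested }

/// Normalized Shannon entropy of the claim weights: 0.0 when one claim holds all
/// the weight, 1.0 when weight is spread evenly across all claims.
pub fn conflict_score(weights: &[f32]) -> f32 {
    let total: f32 = weights.iter().sum();
    let max_entropy = (weights.len() as f32).ln();
    if total <= 0.0 || max_entropy <= 0.0 {
        return 0.0; // no weight, or a single claim: nothing to disagree about
    }
    let entropy: f32 = weights
        .iter()
        .map(|&w| {
            let p = w / total;
            if p > 0.0 { -p * p.ln() } else { 0.0 }
        })
        .sum();
    entropy / max_entropy
}

/// Same thresholds as resolve_skeptic above.
pub fn status(score: f32) -> ResolutionStatus {
    if score < 0.1 {
        ResolutionStatus::Unanimous
    } else if score < 0.4 {
        ResolutionStatus::Agreed
    } else {
        ResolutionStatus::Contested
    }
}
```

These thresholds are strict. A 95/5 weight split scores about 0.29 (Agreed), while a 90/10 split already scores about 0.47 (Contested).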

5. The Ballot Box (High-Velocity Consensus)

To prevent lock contention on assertions, agents vote via a separate stream:

pub struct Vote {
    /// Hash of the assertion being voted on
    pub assertion_hash: Hash,

    /// Ed25519 public key of the voter
    pub agent_id: [u8; 32],

    /// Weight of the vote (0.0 = reject, 1.0 = full endorsement)
    pub weight: f32,

    /// Signature over the assertion_hash
    pub signature: [u8; 64],

    /// When the vote was cast
    pub timestamp: u64,

    /// Optional: URL where claim was observed (provenance witness)
    pub source_url: Option<String>,

    /// Optional: Context of observation
    pub observed_context: Option<Vec<u8>>,
}

Key Insight: Votes are append-only. An agent changes their vote by submitting a new one with a later timestamp. The lens engine uses the most recent vote from each agent.

Provenance Witness: The source_url field transforms votes from opinions into observations, enabling "how many people saw this claim on this page?" rather than just "how many agree?"
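
The "most recent vote per agent" rule can be sketched as a fold over the append-only stream. This sketch makes three assumptions: votes are keyed by (assertion, agent), a timestamp tie keeps the first vote seen, and a lens sums the surviving weights. `latest_votes` and `tally` are hypothetical, and signature verification is omitted.

```rust
use std::collections::HashMap;

/// Reduced Vote: signature and provenance fields omitted; u64 stands in for Hash.
#[derive(Clone, Debug)]
pub struct Vote {
    pub assertion_hash: u64,
    pub agent_id: [u8; 32],
    pub weight: f32,
    pub timestamp: u64,
}

/// Keeps each agent's most recent vote per assertion. On equal timestamps the
/// vote seen first is kept (tie-break assumption).
pub fn latest_votes(stream: &[Vote]) -> HashMap<(u64, [u8; 32]), &Vote> {
    let mut latest: HashMap<(u64, [u8; 32]), &Vote> = HashMap::new();
    for v in stream {
        let key = (v.assertion_hash, v.agent_id);
        let is_newer = latest.get(&key).map_or(true, |prev| v.timestamp > prev.timestamp);
        if is_newer {
            latest.insert(key, v);
        }
    }
    latest
}

/// Total current endorsement weight per assertion.
pub fn tally(stream: &[Vote]) -> HashMap<u64, f32> {
    let mut totals: HashMap<u64, f32> = HashMap::new();
    for v in latest_votes(stream).values() {
        *totals.entry(v.assertion_hash).or_insert(0.0) += v.weight;
    }
    totals
}
```

The reduction depends only on timestamps, so it yields the same tally regardless of the order in which votes arrive. That property suits a high-velocity stream.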


6. Trust Packs (Personalized Consensus)

Trust Packs are curated lists of trusted agents that filter consensus:

pub struct TrustPack {
    /// Content-addressed pack ID (BLAKE3 hash)
    pub id: PackId,

    /// Human-readable name (e.g., "Mayo_Clinic_Experts")
    pub name: String,

    /// Ed25519 public key of the pack maintainer
    pub maintainer: [u8; 32],

    /// Member agents as u32 indices. RoaringBitmap holds u32 values, so each
    /// 32-byte Ed25519 public key must first be mapped to an index
    pub agents: RoaringBitmap,

    /// Unix timestamp when pack was created
    pub created_at: u64,

    /// Unix timestamp of last modification
    pub updated_at: u64,

    /// Optional cryptographic signature of the pack contents
    pub signature: Option<[u8; 64]>,
}
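
A `RoaringBitmap` stores 32-bit integers rather than 32-byte public keys, so membership checks require a mapping from each Ed25519 key to a dense index. The minimal sketch below shows that mapping. It assumes a shared registry (hypothetical `AgentRegistry`) and uses std's `BTreeSet<u32>` as a stand-in for the bitmap. `PackMembers` and its two-argument `contains_agent` are illustrative and are not the `TrustPack` API.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical registry assigning each Ed25519 public key a dense u32 index.
#[derive(Default)]
pub struct AgentRegistry {
    index_of: HashMap<[u8; 32], u32>,
}

impl AgentRegistry {
    /// Returns the key's existing index, or assigns the next free one.
    pub fn register(&mut self, key: [u8; 32]) -> u32 {
        let next = self.index_of.len() as u32;
        *self.index_of.entry(key).or_insert(next)
    }

    pub fn index(&self, key: &[u8; 32]) -> Option<u32> {
        self.index_of.get(key).copied()
    }
}

/// Hypothetical pack membership; BTreeSet<u32> stands in for RoaringBitmap.
pub struct PackMembers {
    members: BTreeSet<u32>,
}

impl PackMembers {
    pub fn from_keys(registry: &mut AgentRegistry, keys: &[[u8; 32]]) -> Self {
        PackMembers { members: keys.iter().map(|k| registry.register(*k)).collect() }
    }

    /// Unregistered keys are never members.
    pub fn contains_agent(&self, registry: &AgentRegistry, key: &[u8; 32]) -> bool {
        registry.index(key).map_or(false, |i| self.members.contains(&i))
    }
}
```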

Query-Time Filtering:

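// Explicit 'a: the returned references borrow from `candidates`, not `trust_pack`
// (with two reference inputs, lifetime elision cannot infer this)
fn resolve_with_trust_pack<'a>(
    candidates: Vec<&'a Assertion>,
    trust_pack: &TrustPack,
) -> Vec<&'a Assertion> {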
    candidates.into_iter()
        .filter(|a| {
            a.signatures.iter()
                .any(|sig| trust_pack.contains_agent(&sig.agent_id))
        })
        .collect()
}

Use Case: Users subscribe to packs like "Skeptical Cardio Pack" to filter medical claims through vetted cardiologists, or "SEC Filings Only" to see only regulatory-class assertions.


7. Epoch Supersession

Epochs represent paradigm contexts. When knowledge paradigms shift, old epochs can be superseded:

pub struct Epoch {
    pub id: EpochId,
    pub name: String,                           // "Pre-2024", "Newtonian"
    pub supersedes: Option<EpochId>,            // What this replaces
    pub supersession_type: Option<SupersessionType>,
    pub start_timestamp: u64,
    pub end_timestamp: Option<u64>,
}

pub enum SupersessionType {
    Invalidation,  // Old epoch was factually wrong (e.g., "Earth is flat")
    Temporal,      // Old epoch was correct but outdated (e.g., "President is Obama")
    Refinement,    // Old epoch was a simplification (e.g., Newtonian → Relativity)
}

Cascade Behavior:

  • Invalidation: Assertions in superseded epoch marked Deprecated, downstream dependents flagged
  • Temporal: Assertions in superseded epoch excluded from default queries but available via as_of
  • Refinement: Both epochs valid; queries can specify which context
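
The three cascade behaviors can be expressed as a visibility rule that is evaluated per query. The minimal sketch below assumes a query is a default query, an `as_of` point-in-time query, or an explicit epoch context. It also assumes that invalidated epochs stay hidden even from `as_of` queries, although the text above only says they are marked Deprecated. `EpochInfo`, `QueryMode`, and `is_visible` are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SupersessionType { Invalidation, Temporal, Refinement }

/// Reduced Epoch: u32 stands in for EpochId; `superseded_as` records how a
/// later epoch superseded this one (None if it is still current).
pub struct EpochInfo {
    pub id: u32,
    pub start_timestamp: u64,
    pub end_timestamp: Option<u64>,
    pub superseded_as: Option<SupersessionType>,
}

/// Hypothetical query modes.
pub enum QueryMode {
    Current,      // default query
    AsOf(u64),    // point-in-time query
    InEpoch(u32), // explicit paradigm context
}

pub fn is_visible(epoch: &EpochInfo, mode: &QueryMode) -> bool {
    match (epoch.superseded_as, mode) {
        // Invalidation: factually wrong, hidden from every mode (assumed for AsOf/InEpoch).
        (Some(SupersessionType::Invalidation), _) => false,
        // Explicit context: only the requested epoch.
        (_, QueryMode::InEpoch(id)) => epoch.id == *id,
        // as_of: visible if the timestamp falls inside the epoch's lifetime.
        (_, QueryMode::AsOf(t)) => {
            epoch.start_timestamp <= *t && epoch.end_timestamp.map_or(true, |end| *t < end)
        }
        // Temporal: outdated, so excluded from default queries.
        (Some(SupersessionType::Temporal), QueryMode::Current) => false,
        // Refinement or not superseded: visible by default.
        (Some(SupersessionType::Refinement) | None, QueryMode::Current) => true,
    }
}
```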

8. Lifecycle Stages

Assertions progress through stages without mutation (new assertions are created):

pub enum LifecycleStage {
    Proposed,      // Initial submission, not for production use
    UnderReview,   // Gathering votes and feedback
    Approved,      // Accepted as current truth
    Deprecated,    // Was true, now superseded
    Rejected,      // Explicitly declined
}

Transition Rules:

  • Proposed → UnderReview: Automatic after initial submission
  • UnderReview → Approved: Vote threshold reached
  • UnderReview → Rejected: Rejection threshold reached
  • Approved → Deprecated: Superseding assertion approved or source retracted
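
The transition rules form a small state machine. The sketch below assumes vote tallies arrive as summed endorsement and rejection weights. It also assumes approval is checked before rejection when both thresholds are met. The document does not specify threshold values or an event type, so `Thresholds` and `Event` are hypothetical placeholders.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum LifecycleStage { Proposed, UnderReview, Approved, Deprecated, Rejected }

/// Hypothetical events that can trigger a transition.
#[derive(Clone, Copy, Debug)]
pub enum Event {
    Submitted,
    VoteTally { endorse: f32, reject: f32 },
    SupersedingApproved,
    SourceRetracted,
}

/// Hypothetical placeholder thresholds; the document does not give values.
pub struct Thresholds {
    pub approve: f32,
    pub reject: f32,
}

impl LifecycleStage {
    /// Next stage, or None if the event does not apply. Because assertions are
    /// immutable, a caller would record the result as a new assertion.
    pub fn next(self, event: Event, t: &Thresholds) -> Option<LifecycleStage> {
        match (self, event) {
            (Self::Proposed, Event::Submitted) => Some(Self::UnderReview),
            // Approval is checked before rejection if both thresholds are met (assumption).
            (Self::UnderReview, Event::VoteTally { endorse, .. }) if endorse >= t.approve => {
                Some(Self::Approved)
            }
            (Self::UnderReview, Event::VoteTally { reject, .. }) if reject >= t.reject => {
                Some(Self::Rejected)
            }
            (Self::Approved, Event::SupersedingApproved | Event::SourceRetracted) => {
                Some(Self::Deprecated)
            }
            _ => None,
        }
    }
}
```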

9. Materialized Views (O(1) Query Latency)

For common queries, pre-computed resolution ensures sub-millisecond response:

pub struct MaterializedView {
    /// The winning assertion from lens resolution
    pub winner: Assertion,

    /// Which lens produced this (e.g., "VoteAwareConsensus")
    pub lens_name: String,

    /// Confidence in the resolution (0.0 to 1.0)
    pub resolution_confidence: f32,

    /// How many candidates were considered
    pub candidates_count: usize,

    /// When this view was computed
    pub materialized_at: u64,
}

Storage Layout:

| Key Pattern | Value | Purpose |
|---|---|---|
| `H:{hash}` | Serialized `Assertion` | Primary assertion storage |
| `S:{subject}` | `Vec<Hash>` | Subject index |
| `SP:{subject}:{predicate}` | `Vec<Hash>` | Compound index |
| `MV:{subject}:{predicate}` | `MaterializedView` | Pre-computed winner |
| `V:{assertion_hash}:{vote_hash}` | `Vote` | Individual votes |
| `TR:{agent_id}` | `TrustRank` | Agent reputation |
| `TP:{pack_id}` | `TrustPack` | Curated agent lists |
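
MV hits and misses correspond to a read-through cache over this layout. The minimal sketch below uses an in-memory `BTreeMap`. String values stand in for serialized data, and a caller-supplied `resolve` closure stands in for a lens. `Kv`, `sp_key`, `mv_key`, `query`, and `invalidate` are hypothetical.

```rust
use std::collections::BTreeMap;

pub fn sp_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

pub fn mv_key(subject: &str, predicate: &str) -> String {
    format!("MV:{subject}:{predicate}")
}

/// Hypothetical toy key-value store; string values stand in for serialized data.
pub struct Kv {
    pub map: BTreeMap<String, String>,
    pub misses: u32,
}

impl Kv {
    /// Read-through lookup: serve the MV on a hit; on a miss, run `resolve` over the
    /// SP index entry and materialize the winner. Returns None if (s, p) is unknown.
    pub fn query(&mut self, s: &str, p: &str, resolve: impl Fn(&str) -> String) -> Option<String> {
        let mv = mv_key(s, p);
        if let Some(hit) = self.map.get(&mv) {
            return Some(hit.clone());
        }
        self.misses += 1;
        let candidates = self.map.get(&sp_key(s, p))?.clone();
        let winner = resolve(candidates.as_str());
        self.map.insert(mv, winner.clone());
        Some(winner)
    }

    /// A new assertion or vote for (s, p) must drop the cached winner.
    pub fn invalidate(&mut self, s: &str, p: &str) {
        self.map.remove(&mv_key(s, p));
    }
}
```

One design caveat applies. If subjects or predicates may contain `:`, plain string keys become ambiguous ("a:b" then "c" versus "a" then "b:c"), so a production encoding would escape, length-prefix, or hash the components.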

10. Query Audit Trail

Every query is logged for "why did you believe that?" debugging:

pub struct QueryAudit {
    pub query_id: QueryId,
    pub agent_id: Option<[u8; 32]>,
    pub timestamp: u64,
    pub params: QueryParams,
    pub result_hash: Option<Hash>,
    pub result_confidence: f32,
    pub contributing_assertions: Vec<ContributingAssertion>,
}

pub struct ContributingAssertion {
    pub assertion_hash: Hash,
    pub weight: f32,          // How much this influenced the result
    pub source_hash: Hash,
    pub lifecycle: LifecycleStage,
}

Use Case: When an AI agent makes a recommendation that later proves wrong, the audit trail shows exactly which assertions contributed and with what weights.
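
One way to populate `ContributingAssertion.weight` is to normalize each candidate's support into a share of the total. The audit then reads as "this assertion accounted for N% of the result". Both the normalization and the reduced `Contribution` type are illustrative assumptions.

```rust
/// Hypothetical reduced ContributingAssertion; u64 stands in for Hash.
#[derive(Debug)]
pub struct Contribution {
    pub assertion_hash: u64,
    pub weight: f32,
}

/// Hypothetical: convert raw per-assertion support into influence shares that
/// sum to 1.0 (negative support is clamped to 0), most influential first.
pub fn contributions(raw: &[(u64, f32)]) -> Vec<Contribution> {
    let total: f32 = raw.iter().map(|&(_, w)| w.max(0.0)).sum();
    let mut out: Vec<Contribution> = raw
        .iter()
        .map(|&(assertion_hash, w)| Contribution {
            assertion_hash,
            weight: if total > 0.0 { w.max(0.0) / total } else { 0.0 },
        })
        .collect();
    out.sort_by(|a, b| b.weight.total_cmp(&a.weight));
    out
}
```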


11. Invalidation Cascades

When upstream evidence is retracted, downstream decisions are flagged:

fn propagate_retraction(
    retracted_hash: Hash,
    storage: &mut Storage,
) -> Vec<Hash> {
    let mut affected = Vec::new();
    let mut queue = vec![retracted_hash];

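    while let Some(hash) = queue.pop() {
        // Find all assertions that cite this one as parent
        let dependents = storage.find_by_parent_hash(hash);

        for dependent in dependents {
            // Assertion has no stored id field; content_hash() is its BLAKE3 content hash
            let original_hash = dependent.content_hash();

            // Assertions are immutable, so record a Deprecated copy (which gets a new hash)
            let deprecated = dependent.with_lifecycle(LifecycleStage::Deprecated);
            storage.store_assertion(&deprecated);

            // Descendants cite the ORIGINAL hash as parent_hash, and query audits record it,
            // so track and traverse from the original hash rather than the copy's new one
            affected.push(original_hash);
            queue.push(original_hash);
        }
    }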

    // Notify consumers via query audit matching
    notify_affected_consumers(&affected, storage);

    affected
}
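
The traversal in `propagate_retraction` can be exercised in isolation. This sketch uses an in-memory parent-to-children index with u64 ids in place of BLAKE3 hashes, and it walks the index breadth-first. `Dag`, `from_edges`, and `affected_by` are hypothetical names. Storage writes and consumer notification are omitted.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical lineage index: parent hash -> hashes of assertions citing it as parent_hash.
pub struct Dag {
    children: HashMap<u64, Vec<u64>>,
}

impl Dag {
    /// Builds the index from (child, parent) pairs.
    pub fn from_edges(edges: &[(u64, u64)]) -> Self {
        let mut children: HashMap<u64, Vec<u64>> = HashMap::new();
        for &(child, parent) in edges {
            children.entry(parent).or_default().push(child);
        }
        Dag { children }
    }

    /// Transitive dependents of `retracted`, breadth-first. The `seen` set is
    /// defensive: with one parent_hash per assertion the lineage is a tree, but the
    /// set keeps the walk correct if multi-parent lineage is ever added.
    pub fn affected_by(&self, retracted: u64) -> Vec<u64> {
        let mut affected = Vec::new();
        let mut seen = HashSet::from([retracted]);
        let mut queue = VecDeque::from([retracted]);
        while let Some(hash) = queue.pop_front() {
            for &child in self.children.get(&hash).into_iter().flatten() {
                if seen.insert(child) {
                    affected.push(child);
                    queue.push_back(child);
                }
            }
        }
        affected
    }
}
```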

12. Performance Characteristics

12.1 Query Latency by Graph Size

| Assertions | p50 Latency (MV hit) | p99 Latency (MV miss) | Memory |
|---|---|---|---|
| 10,000 | 0.1 ms | 5 ms | 100 MB |
| 100,000 | 0.1 ms | 15 ms | 800 MB |
| 1,000,000 | 0.2 ms | 50 ms | 6 GB |
| 10,000,000 | 0.5 ms | 200 ms | 50 GB |

12.2 Write Throughput

| Operation | Throughput | Notes |
|---|---|---|
| Assertion ingestion | 50,000/sec | With signature verification |
| Vote ingestion | 200,000/sec | Append-only, minimal verification |
| MV materialization | 10,000/sec | Background async |

12.3 Space Efficiency

| Component | Size per Unit |
|---|---|
| Assertion (avg) | 500 bytes |
| Vote | 150 bytes |
| Index entry | 40 bytes |
| MV entry | 600 bytes |

13. Alternative Embodiments

13A. Distributed Deployment

The system may be deployed across multiple nodes with:

  • Merkle DAG Sync: Content-addressed assertions enable efficient diff-based replication
  • SWIM Gossip: Cluster membership via failure detection protocol
  • Sharded Storage: Subject-based partitioning across nodes
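
Subject-based partitioning can be sketched as hashing the subject to a shard number. This keeps every assertion about one subject on one node, together with its `S:`, `SP:`, and `MV:` entries. `shard_for` is a hypothetical name, and std's `DefaultHasher` is a placeholder for a hash that stays stable across nodes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical: maps a subject to one of `num_shards` partitions, so that all
/// assertions about the same subject land on the same node.
/// PLACEHOLDER: DefaultHasher is only guaranteed stable within one build; a real
/// deployment needs a hash function that is stable across nodes and versions.
pub fn shard_for(subject: &str, num_shards: u64) -> u64 {
    assert!(num_shards > 0, "need at least one shard");
    let mut hasher = DefaultHasher::new();
    subject.hash(&mut hasher);
    hasher.finish() % num_shards
}
```

Plain modulo placement moves most subjects whenever the shard count changes. Consistent hashing and rendezvous hashing are common alternatives that limit that movement.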

13B. Semantic Vector Search

The optional vector field enables semantic similarity queries:

  • Find assertions semantically similar to a query embedding
  • Cluster related assertions by embedding space proximity
  • Surface emerging signals via vector clustering
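
Here is a minimal sketch of the first capability. It assumes cosine similarity as the metric, which the document does not name, and a brute-force scan. At scale, an approximate nearest-neighbor index would replace the scan. `cosine` and `top_k` are hypothetical names.

```rust
/// Cosine similarity; returns 0.0 if either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embedding dimensions must match");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Hypothetical brute-force top-k over the optional `vector` fields; assertions
/// without an embedding are skipped. Returns (index, similarity), most similar first.
pub fn top_k(query: &[f32], vectors: &[Option<Vec<f32>>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = vectors
        .iter()
        .enumerate()
        .filter_map(|(i, v)| v.as_ref().map(|v| (i, cosine(query, v))))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```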

13C. Visual Provenance

The optional visual_hash (perceptual hash) enables:

  • Link assertions to screenshots of source documents
  • Detect duplicate visual evidence across assertions
  • Verify that cited sources contain the claimed content
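
Duplicate detection on perceptual hashes compares them by Hamming distance rather than equality, because visually similar images produce hashes that differ in only a few bits. This sketch assumes `PHash` is a 64-bit value and treats the distance threshold as a tunable parameter. The function names are hypothetical.

```rust
/// Hamming distance between two 64-bit perceptual hashes (PHash assumed to be u64).
pub fn hamming(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Hypothetical near-duplicate test; `max_distance` is a tunable assumption.
pub fn is_near_duplicate(a: u64, b: u64, max_distance: u32) -> bool {
    hamming(a, b) <= max_distance
}

/// All index pairs (i < j) whose visual hashes are near-duplicates. This is O(n^2);
/// a BK-tree or multi-index hashing would scale better.
pub fn duplicate_pairs(hashes: &[u64], max_distance: u32) -> Vec<(usize, usize)> {
    let mut pairs = Vec::new();
    for i in 0..hashes.len() {
        for j in (i + 1)..hashes.len() {
            if is_near_duplicate(hashes[i], hashes[j], max_distance) {
                pairs.push((i, j));
            }
        }
    }
    pairs
}
```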

13D. Real-Time Streaming

The system may expose:

  • WebSocket subscriptions for assertion/vote streams
  • Server-Sent Events for MV update notifications
  • Webhook callbacks for invalidation cascades

Claims

[See patent-disclosure.md for full claim listing]


Abstract

A database system and method for storing and resolving conflicting assertions in a probabilistic knowledge graph. The system stores signed assertions with source class authority weights, preserves contradictions without forced resolution, and applies configurable lens algorithms at query time to collapse the probabilistic assertion space into concrete answers. Source classes form a six-tier hierarchy with associated decay half-lives, enabling semantic decay where anecdotal evidence fades while regulatory evidence persists. Trust Packs enable personalized consensus filtering by restricting queries to assertions from trusted agents. The system maintains query audit trails for provenance debugging and propagates invalidation cascades when upstream evidence is retracted.


Revision History

| Date | Author | Changes |
|---|---|---|
| 2026-02-04 | Initial | Complete specification with data structures, algorithms, and performance |