jordan 116bad1de3 feat: Ingestor deadlock fix + blessed assertion tracking + patent docs

Key changes:
- Fix Ingestor background task to release lock per iteration, preventing
  deadlock when process_pending() needs the lock during shutdown
- Add blessed assertion predicate index and fetch_blessed_assertions()
  for policy export workflows in Aphoria
- Add patent documentation (markdown + Word exports) for probabilistic
  knowledge graph system
- Update community scripts for claim extraction pipeline

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-02-04 03:41:08 -07:00


Episteme Technical Specification for Patent Disclosure

Subject: System and Method for Storing and Resolving Conflicting Assertions in a Probabilistic Knowledge Graph
Date: 2026-02-04

Field of the Invention

The present invention relates generally to database systems and knowledge management, and more particularly to methods and systems for storing conflicting assertions with authority weighting and resolving them at query time using configurable lens algorithms.

Background of the Invention

Technical Problem

Database systems have evolved from flat files to relational tables to document stores to graph databases. Despite this evolution, a fundamental assumption persists: each attribute has one correct value at any given time.

This assumption creates critical limitations:

Forced Resolution at Write Time: When conflicting data arrives from multiple sources, the database forces a choice. The epistemic signal of disagreement is lost.
Authority Blindness: All data is structurally equal. A regulatory filing has the same weight as a social media post. Application logic must implement authority weighting, leading to inconsistent implementations.
Temporal Flatness: Data does not decay. Old claims persist with the same relevance as recent evidence. Manual expiration logic is error-prone.
Cascade Blindness: When upstream evidence is retracted, downstream conclusions remain unchanged. No structural mechanism propagates invalidation.
Consensus Opacity: Query results return a single answer, hiding the variance in underlying evidence. Users cannot see where sources agree or disagree.

Prior Art Limitations

Relational Databases (PostgreSQL, MySQL): Force single values per cell. Temporal tables add versioning complexity but do not model disagreement structurally.

Event Sourcing (Datomic, EventStore): Store events immutably but assume events are sequential transformations, not contradicting observations.

Blockchain Systems (Ethereum, Cosmos): Achieve consensus before write. Cannot store contradictions that persist indefinitely.

Knowledge Graphs (Neo4j, RDF Stores): Store triples but treat all triples equally. No source authority weighting or decay.

Probabilistic Databases (Academic): Handle uncertainty but lack source class hierarchies, cryptographic signatures, and production-grade implementation.

Summary of the Invention

The present invention provides a database system and method for storing and resolving conflicting assertions. In one embodiment, a system comprises:

A storage engine configured to store signed assertions with source class authority weights
An assertion index that preserves contradictions without forced resolution
A lens engine that resolves conflicts at query time using configurable strategies
A semantic decay module that adjusts assertion relevance based on source class half-life
A Trust Pack module that enables personalized consensus filtering
A query audit module that logs provenance for debugging

The system outputs query results that reflect the caller's chosen resolution strategy, enabling different users to receive different answers from the same underlying data.

Detailed Description of Preferred Embodiments

1. The Signed Assertion (Atomic Unit)

The fundamental data structure is the Signed Assertion, replacing the traditional database row or document:

pub struct Assertion {
    // ═══════════════════════════════════════════════════════════
    // 1. THE PROPOSITION (What is being claimed)
    // ═══════════════════════════════════════════════════════════

    /// The entity this assertion is about (e.g., "Semaglutide", "Tesla_Inc")
    pub subject: EntityId,

    /// The relationship or property (e.g., "has_side_effect", "annual_revenue")
    pub predicate: RelationId,

    /// The claimed value
    pub object: ObjectValue,

    // ═══════════════════════════════════════════════════════════
    // 2. THE LINEAGE (Why we believe it)
    // ═══════════════════════════════════════════════════════════

    /// If this modifies/forks another assertion, its hash
    pub parent_hash: Option<Hash>,

    /// Hash of the source evidence (PDF, URL, database export)
    pub source_hash: Hash,

    /// Authority tier of the source (enables decay rates)
    pub source_class: SourceClass,

    /// Optional structured metadata about the source
    pub source_metadata: Option<SourceMetadata>,

    /// Perceptual hash of a visual anchor (e.g., screenshot of table)
    pub visual_hash: Option<PHash>,

    /// Which paradigm/era this belongs to (for paradigm shifts)
    pub epoch: Option<EpochId>,

    /// Lifecycle stage (Proposed → Approved → Deprecated)
    pub lifecycle: LifecycleStage,

    // ═══════════════════════════════════════════════════════════
    // 3. META-COGNITION (Who said it, how confident)
    // ═══════════════════════════════════════════════════════════

    /// Cryptographic signatures from agents vouching for this
    pub signatures: Vec<SignatureEntry>,

    /// Subjective confidence score (0.0 to 1.0)
    pub confidence: f32,

    /// Unix timestamp when created
    pub timestamp: u64,

    /// Semantic embedding vector for similarity search
    pub vector: Option<Vec<f32>>,
}

ObjectValue Variants

pub enum ObjectValue {
    Text(String),           // "gastroparesis", "approved"
    Number(f64),            // 96.7, 0.85
    Boolean(bool),          // true, false
    Reference(EntityId),    // Points to another entity (graph edge)
}

SignatureEntry Structure

pub struct SignatureEntry {
    pub agent_id: [u8; 32],   // Ed25519 public key
    pub signature: [u8; 64],  // Ed25519 signature over assertion content
    pub timestamp: u64,       // When the agent signed
}

Key Innovation: The assertion is content-addressed. Its identifier is a BLAKE3 hash of its content, enabling deduplication and Merkle DAG formation.
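
The deduplication property can be illustrated with a minimal sketch. It assumes a length-prefixed canonical encoding of the proposition fields, which this document does not specify, and it substitutes a 64-bit FNV-1a hash for BLAKE3 so that the example needs no external crate. `Proposition`, `canonical_bytes`, and `content_id` are hypothetical names introduced only for illustration.

```rust
/// Simplified stand-in for the proposition part of an assertion (hypothetical).
pub struct Proposition {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// Length-prefixed encoding, so ("ab", "c") and ("a", "bc") yield different bytes.
/// The real canonical encoding is an assumption; the specification does not define it.
pub fn canonical_bytes(p: &Proposition) -> Vec<u8> {
    let mut out = Vec::new();
    for field in [&p.subject, &p.predicate, &p.object] {
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}

/// 64-bit FNV-1a, used here ONLY as a dependency-free stand-in for BLAKE3.
pub fn content_id(p: &Proposition) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    for byte in canonical_bytes(p) {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a prime
    }
    hash
}
```

Two assertions with byte-identical content map to the same identifier and can be stored once; a production system would use a cryptographic hash, since FNV-1a offers no collision resistance.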

2. Source Class Hierarchy

A core inventive step is the hierarchical classification of sources with associated authority weights and decay half-lives:

Tier	Class	Authority Weight (W_a)	Decay Half-Life	Example Sources
0	Regulatory	1.0	Never	FDA labels, SEC filings, WHO guidelines
1	Clinical	0.9	2 years	Peer-reviewed RCTs, Phase III trials
2	Observational	0.7	1 year	Real-world evidence, cohort studies
3	Expert	0.5	6 months	Physician guidelines, professional opinions
4	Community	0.2	3 months	Patient registries, curated forums
5	Anecdotal	0.1	30 days	Reddit posts, individual testimonials

pub enum SourceClass {
    Regulatory,    // Tier 0: Highest authority, never decays
    Clinical,      // Tier 1: Peer-reviewed research
    Observational, // Tier 2: Real-world evidence
    Expert,        // Tier 3: Professional opinions
    Community,     // Tier 4: Curated community knowledge
    Anecdotal,     // Tier 5: Individual reports, fast decay
}

impl SourceClass {
    pub fn tier(&self) -> u8 {
        match self {
            SourceClass::Regulatory => 0,
            SourceClass::Clinical => 1,
            SourceClass::Observational => 2,
            SourceClass::Expert => 3,
            SourceClass::Community => 4,
            SourceClass::Anecdotal => 5,
        }
    }

    pub fn authority_weight(&self) -> f32 {
        match self {
            SourceClass::Regulatory => 1.0,
            SourceClass::Clinical => 0.9,
            SourceClass::Observational => 0.7,
            SourceClass::Expert => 0.5,
            SourceClass::Community => 0.2,
            SourceClass::Anecdotal => 0.1,
        }
    }

    pub fn decay_half_life_days(&self) -> Option<u32> {
        match self {
            SourceClass::Regulatory => None,  // Never decays
            SourceClass::Clinical => Some(730),
            SourceClass::Observational => Some(365),
            SourceClass::Expert => Some(180),
            SourceClass::Community => Some(90),
            SourceClass::Anecdotal => Some(30),
        }
    }
}

Rationale: This hierarchy enables the system to mathematically distinguish between "this violates the law" (Tier 0 conflict) and "this contradicts a Reddit post" (Tier 5 conflict), automating triage that would otherwise require human judgment.

3. Semantic Decay Calculation

Assertion relevance decays based on source class half-life:

effective_confidence = original_confidence × decay_factor

decay_factor = exp(-ln(2) × elapsed_days / half_life_days)

For source classes with half_life_days = None (Regulatory), decay_factor = 1.0 always.

Example Calculation:

Assertion: Anecdotal (Tier 5), confidence = 0.8, age = 45 days
Half-life: 30 days
Decay factor: exp(-ln(2) × 45 / 30) = exp(-1.039) ≈ 0.354
Effective confidence: 0.8 × 0.354 ≈ 0.28

Example Calculation (Regulatory):

Assertion: Regulatory (Tier 0), confidence = 0.9, age = 3650 days (10 years)
Half-life: None (never decays)
Decay factor: 1.0
Effective confidence: 0.9 × 1.0 = 0.9
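
The two worked examples above can be reproduced with a short sketch. The function names are hypothetical, `f64` is used instead of the `f32` confidence field for readability, and a half-life of zero days is assumed never to occur.

```rust
/// Decay factor for an assertion of the given age.
/// `half_life_days = None` models a class that never decays (Regulatory).
/// Assumes the half-life, when present, is non-zero.
pub fn decay_factor(elapsed_days: f64, half_life_days: Option<u32>) -> f64 {
    match half_life_days {
        None => 1.0,
        Some(h) => (-std::f64::consts::LN_2 * elapsed_days / f64::from(h)).exp(),
    }
}

/// effective_confidence = original_confidence × decay_factor
pub fn effective_confidence(
    confidence: f64,
    elapsed_days: f64,
    half_life_days: Option<u32>,
) -> f64 {
    confidence * decay_factor(elapsed_days, half_life_days)
}
```

Calling `effective_confidence(0.8, 45.0, Some(30))` gives roughly 0.28, matching the Anecdotal example, and `effective_confidence(0.9, 3650.0, None)` returns 0.9 unchanged.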

4. Resolution Lenses

Lenses collapse the probabilistic assertion space into concrete query results. Multiple lens types serve different use cases:

4.1 Winner-Picking Lenses

Lens	Algorithm
Recency	Return assertion with most recent timestamp
Consensus	Return assertion whose object value has highest cluster density
Authority	Weight by signing agent's TrustRank reputation
Vote-Aware	Weight by votes from the Ballot Box stream
EpochAware	Filter out assertions from superseded epochs
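
As a rough illustration of the two simplest rows in this table, the sketch below implements a Recency pick and an EpochAware filter over a reduced candidate type. `Candidate`, `resolve_recency`, and `filter_superseded` are hypothetical; the treatment of ties and of candidates without an epoch is an assumption, since the table does not specify either.

```rust
use std::collections::HashSet;

/// Minimal stand-in carrying only the fields these two lenses read (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub value: String,
    pub timestamp: u64,
    pub epoch: Option<u32>,
}

/// Recency lens: the candidate with the latest timestamp, if any.
/// On equal timestamps the later element of the slice wins (an assumption).
pub fn resolve_recency(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by_key(|c| c.timestamp)
}

/// EpochAware lens: drop candidates whose epoch is in the superseded set.
/// Candidates without an epoch are kept (an assumption).
pub fn filter_superseded<'a>(
    candidates: &'a [Candidate],
    superseded: &HashSet<u32>,
) -> Vec<&'a Candidate> {
    candidates
        .iter()
        .filter(|c| c.epoch.map_or(true, |e| !superseded.contains(&e)))
        .collect()
}
```

The two can be composed: filter superseded epochs first, then apply the recency pick to what remains.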

4.2 Analysis Lenses

Lens	Algorithm
Skeptic	Return all competing claims with conflict score and weight shares
Layered	Per-source-class resolution (tier-by-tier visibility)
Constraints	Return must_use/forbidden assertions for a context

4.3 Consensus Lens Algorithm

use std::collections::HashMap;

// Assumes ObjectValue implements Clone + Eq + Hash. It carries an f64, so those
// impls cannot be derived directly (one option is to hash Number via f64::to_bits).
fn resolve_consensus(
    candidates: Vec<&Assertion>,
    trust_ranks: &TrustRankStore,
) -> Option<Assertion> {
    // Group by object value
    let mut clusters: HashMap<ObjectValue, Vec<&Assertion>> = HashMap::new();
    for &assertion in &candidates {
        clusters.entry(assertion.object.clone())
            .or_default()
            .push(assertion);
    }

    // Calculate weighted support for each cluster
    let mut cluster_weights: Vec<(ObjectValue, f32)> = clusters
        .into_iter()
        .map(|(value, assertions)| {
            // Each assertion contributes authority × trust × decay × confidence
            let weight: f32 = assertions.iter()
                .map(|a| {
                    let base_weight = a.source_class.authority_weight();
                    let trust_modifier = trust_ranks.get_average(&a.signatures);
                    let decay = compute_decay(a);
                    base_weight * trust_modifier * decay * a.confidence
                })
                .sum();
            (value, weight)
        })
        .collect();

    // Return highest-weighted cluster's representative.
    // total_cmp is a total order, so a NaN weight cannot panic the sort.
    cluster_weights.sort_by(|a, b| b.1.total_cmp(&a.1));
    cluster_weights.first().map(|(value, _)| {
        find_representative_assertion(&candidates, value)
    })
}

4.4 Skeptic Lens Algorithm

fn resolve_skeptic(
    candidates: Vec<&Assertion>,
    trust_ranks: &TrustRankStore,
) -> ConflictAnalysis {
    // Group by object value and compute weights
    let claims = compute_claims_with_weights(&candidates, trust_ranks);

    // Calculate conflict score using Shannon entropy.
    // Guard against a zero total so the division cannot produce NaN.
    let total_weight: f32 = claims.iter().map(|c| c.weight_share).sum();
    let entropy: f32 = if total_weight > 0.0 {
        claims.iter()
            .map(|c| {
                let p = c.weight_share / total_weight;
                if p > 0.0 { -p * p.ln() } else { 0.0 }
            })
            .sum()
    } else {
        0.0
    };

    // Normalize by the maximum entropy ln(n) so the score lies in [0, 1];
    // a single claim (n = 1) has max_entropy = 0 and scores 0.
    let max_entropy = (claims.len() as f32).ln();
    let conflict_score = if max_entropy > 0.0 {
        entropy / max_entropy
    } else {
        0.0
    };

    let status = match conflict_score {
        s if s < 0.1 => ResolutionStatus::Unanimous,
        s if s < 0.4 => ResolutionStatus::Agreed,
        _ => ResolutionStatus::Contested,
    };

    ConflictAnalysis {
        status,
        conflict_score,
        claims,
        candidates_count: candidates.len(),
    }
}
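
To make the thresholds concrete, the normalized-entropy step can be isolated into a small standalone function. This is a sketch only: `conflict_score` here is a hypothetical free function taking raw claim weights, and it uses `f64` rather than `f32`.

```rust
/// Normalized Shannon entropy of a set of claim weights, in [0, 1].
/// Returns 0.0 for fewer than two claims or a non-positive total weight.
pub fn conflict_score(weights: &[f64]) -> f64 {
    let total: f64 = weights.iter().sum();
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f64 = weights
        .iter()
        .map(|w| w / total)
        .filter(|p| *p > 0.0)
        .map(|p| -p * p.ln())
        .sum();
    // Divide by the maximum possible entropy for this many claims.
    entropy / (weights.len() as f64).ln()
}
```

With two claims, an even split scores 1.0, a 90/10 split scores about 0.47, and a 99/1 split scores about 0.08. Under the thresholds above, the 90/10 split is therefore still reported as Contested, and only a near-total majority is reported as Unanimous.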

5. The Ballot Box (High-Velocity Consensus)

To prevent lock contention on assertions, agents vote via a separate stream:

pub struct Vote {
    /// Hash of the assertion being voted on
    pub assertion_hash: Hash,

    /// Ed25519 public key of the voter
    pub agent_id: [u8; 32],

    /// Weight of the vote (0.0 = reject, 1.0 = full endorsement)
    pub weight: f32,

    /// Signature over the assertion_hash
    pub signature: [u8; 64],

    /// When the vote was cast
    pub timestamp: u64,

    /// Optional: URL where claim was observed (provenance witness)
    pub source_url: Option<String>,

    /// Optional: Context of observation
    pub observed_context: Option<Vec<u8>>,
}

Key Insight: Votes are append-only. An agent changes their vote by submitting a new one with a later timestamp. The lens engine uses the most recent vote from each agent.

Provenance Witness: The source_url field transforms votes from opinions into observations, enabling "how many people saw this claim on this page?" rather than just "how many agree?"
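
The "most recent vote per agent" rule can be sketched as a single pass over the stream. `VoteRecord` is a hypothetical reduced form of `Vote`, and resolving equal timestamps in favor of the later stream position is an assumption.

```rust
use std::collections::HashMap;

/// Reduced vote carrying only the fields the rule needs (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct VoteRecord {
    pub agent_id: [u8; 32],
    pub weight: f32,
    pub timestamp: u64,
}

/// Collapse an append-only vote stream to the most recent vote per agent.
/// On equal timestamps, the vote appearing later in the stream wins (an assumption).
pub fn latest_votes(stream: &[VoteRecord]) -> HashMap<[u8; 32], &VoteRecord> {
    let mut latest: HashMap<[u8; 32], &VoteRecord> = HashMap::new();
    for vote in stream {
        latest
            .entry(vote.agent_id)
            .and_modify(|current| {
                if vote.timestamp >= current.timestamp {
                    *current = vote;
                }
            })
            .or_insert(vote);
    }
    latest
}
```

Because earlier votes are never deleted, the same stream can also be replayed up to an earlier timestamp to reconstruct what each agent believed at that point.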

6. Trust Packs (Personalized Consensus)

Trust Packs are curated lists of trusted agents that filter consensus:

pub struct TrustPack {
    /// Content-addressed pack ID (BLAKE3 hash)
    pub id: PackId,

    /// Human-readable name (e.g., "Mayo_Clinic_Experts")
    pub name: String,

    /// Ed25519 public key of the pack maintainer
    pub maintainer: [u8; 32],

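    /// Agents in this pack, stored as a compressed bitmap. A RoaringBitmap holds
    /// u32 values, so each agent public key is presumed to map to an integer index.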
    pub agents: RoaringBitmap,

    /// Unix timestamp when pack was created
    pub created_at: u64,

    /// Unix timestamp of last modification
    pub updated_at: u64,

    /// Optional cryptographic signature of the pack contents
    pub signature: Option<[u8; 64]>,
}

Query-Time Filtering:

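// The explicit lifetime 'a ties the returned references to the candidates;
// with two reference parameters, lifetime elision alone would not compile.
fn resolve_with_trust_pack<'a>(
    candidates: Vec<&'a Assertion>,
    trust_pack: &TrustPack,
) -> Vec<&'a Assertion> {
    candidates.into_iter()
        // Keep an assertion if at least one of its signers is in the pack
        .filter(|a| {
            a.signatures.iter()
                .any(|sig| trust_pack.contains_agent(&sig.agent_id))
        })
        .collect()
}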

Use Case: Users subscribe to packs like "Skeptical Cardio Pack" to filter medical claims through vetted cardiologists, or "SEC Filings Only" to see only regulatory-class assertions.

7. Epoch Supersession

Epochs represent paradigm contexts. When knowledge paradigms shift, old epochs can be superseded:

pub struct Epoch {
    pub id: EpochId,
    pub name: String,                           // "Pre-2024", "Newtonian"
    pub supersedes: Option<EpochId>,            // What this replaces
    pub supersession_type: Option<SupersessionType>,
    pub start_timestamp: u64,
    pub end_timestamp: Option<u64>,
}

pub enum SupersessionType {
    Invalidation,  // Old epoch was factually wrong (e.g., "Earth is flat")
    Temporal,      // Old epoch was correct but outdated (e.g., "President is Obama")
    Refinement,    // Old epoch was a simplification (e.g., Newtonian → Relativity)
}

Cascade Behavior:

Invalidation: Assertions in superseded epoch marked Deprecated, downstream dependents flagged
Temporal: Assertions in superseded epoch excluded from default queries but available via as_of
Refinement: Both epochs valid; queries can specify which context
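
The three cascade behaviors can be written as a simple mapping from supersession type to the treatment of assertions in the superseded epoch. `EpochTreatment` and `treatment_for` are hypothetical names used only to restate the list above in code; the enum is redeclared so the sketch is self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SupersessionType {
    Invalidation,
    Temporal,
    Refinement,
}

/// How assertions in a superseded epoch are handled (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EpochTreatment {
    /// Mark assertions Deprecated and flag downstream dependents.
    DeprecateAndFlag,
    /// Exclude from default queries; still reachable via `as_of`.
    HideUnlessAsOf,
    /// Keep both epochs valid; the query names the context it wants.
    KeepBoth,
}

pub fn treatment_for(kind: SupersessionType) -> EpochTreatment {
    match kind {
        SupersessionType::Invalidation => EpochTreatment::DeprecateAndFlag,
        SupersessionType::Temporal => EpochTreatment::HideUnlessAsOf,
        SupersessionType::Refinement => EpochTreatment::KeepBoth,
    }
}
```

Keeping the mapping in one exhaustive `match` means the compiler flags any new supersession type that lacks a defined cascade behavior.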

8. Lifecycle Stages

Assertions are immutable: each stage change is recorded by creating a new assertion that carries the new stage, rather than by mutating the existing one:

pub enum LifecycleStage {
    Proposed,      // Initial submission, not for production use
    UnderReview,   // Gathering votes and feedback
    Approved,      // Accepted as current truth
    Deprecated,    // Was true, now superseded
    Rejected,      // Explicitly declined
}

Transition Rules:

Proposed → UnderReview: Automatic after initial submission
UnderReview → Approved: Vote threshold reached
UnderReview → Rejected: Rejection threshold reached
Approved → Deprecated: Superseding assertion approved or source retracted
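
The transition rules above amount to a small allow-list, which can be sketched as a predicate. `can_transition` is a hypothetical helper; it encodes only the four listed transitions and treats everything else as disallowed, which is an assumption about cases the list does not mention.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleStage {
    Proposed,
    UnderReview,
    Approved,
    Deprecated,
    Rejected,
}

/// True only for the four transitions listed in the specification.
pub fn can_transition(from: LifecycleStage, to: LifecycleStage) -> bool {
    use LifecycleStage::*;
    matches!(
        (from, to),
        (Proposed, UnderReview)
            | (UnderReview, Approved)
            | (UnderReview, Rejected)
            | (Approved, Deprecated)
    )
}
```

Under this reading, Rejected and Deprecated are terminal stages, and an assertion cannot skip review by moving directly from Proposed to Approved.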

9. Materialized Views (O(1) Query Latency)

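For common queries, the lens result is pre-computed and stored under a single key, so a hit costs one key lookup regardless of how many candidate assertions exist. Section 12.1 lists sub-millisecond p50 latency for such hits: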

pub struct MaterializedView {
    /// The winning assertion from lens resolution
    pub winner: Assertion,

    /// Which lens produced this (e.g., "VoteAwareConsensus")
    pub lens_name: String,

    /// Confidence in the resolution (0.0 to 1.0)
    pub resolution_confidence: f32,

    /// How many candidates were considered
    pub candidates_count: usize,

    /// When this view was computed
    pub materialized_at: u64,
}

Storage Layout:

Key Pattern	Value	Purpose
`H:{hash}`	Serialized Assertion	Primary assertion storage
`S:{subject}`	`Vec<Hash>`	Subject index
`SP:{subject}:{predicate}`	`Vec<Hash>`	Compound index
`MV:{subject}:{predicate}`	MaterializedView	Pre-computed winner
`V:{assertion_hash}:{vote_hash}`	Vote	Individual votes
`TR:{agent_id}`	TrustRank	Agent reputation
`TP:{pack_id}`	TrustPack	Curated agent lists
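
A minimal sketch of key construction for the patterns above follows. The helper names are hypothetical, hashes are assumed to arrive already encoded as strings (the encoding is not specified here), and subjects and predicates are assumed not to contain the `:` separator.

```rust
/// Primary assertion storage key: `H:{hash}`.
pub fn assertion_key(hash: &str) -> String {
    format!("H:{}", hash)
}

/// Subject index key: `S:{subject}`.
pub fn subject_key(subject: &str) -> String {
    format!("S:{}", subject)
}

/// Compound index key: `SP:{subject}:{predicate}`.
pub fn subject_predicate_key(subject: &str, predicate: &str) -> String {
    format!("SP:{}:{}", subject, predicate)
}

/// Materialized view key: `MV:{subject}:{predicate}`.
pub fn materialized_view_key(subject: &str, predicate: &str) -> String {
    format!("MV:{}:{}", subject, predicate)
}

/// Individual vote key: `V:{assertion_hash}:{vote_hash}`.
pub fn vote_key(assertion_hash: &str, vote_hash: &str) -> String {
    format!("V:{}:{}", assertion_hash, vote_hash)
}

/// Prefix shared by every vote on one assertion, usable for a prefix scan.
pub fn vote_prefix(assertion_hash: &str) -> String {
    format!("V:{}:", assertion_hash)
}
```

Placing the assertion hash first in the vote key keeps all votes for one assertion adjacent in an ordered key-value store, so they can be read with one prefix scan.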

10. Query Audit Trail

Every query is logged for "why did you believe that?" debugging:

pub struct QueryAudit {
    pub query_id: QueryId,
    pub agent_id: Option<[u8; 32]>,
    pub timestamp: u64,
    pub params: QueryParams,
    pub result_hash: Option<Hash>,
    pub result_confidence: f32,
    pub contributing_assertions: Vec<ContributingAssertion>,
}

pub struct ContributingAssertion {
    pub assertion_hash: Hash,
    pub weight: f32,          // How much this influenced the result
    pub source_hash: Hash,
    pub lifecycle: LifecycleStage,
}

Use Case: When an AI agent makes a recommendation that later proves wrong, the audit trail shows exactly which assertions contributed and with what weights.
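
One way the audit trail could support this use case is a scan that finds every logged query whose result drew on a retracted assertion. The sketch below is an assumption about how such matching might work, not the system's actual mechanism; `AuditRecord` and `affected_queries` are hypothetical, and hashes and query IDs are reduced to `u64` for brevity.

```rust
use std::collections::HashSet;

/// Reduced audit entry: a query id plus (assertion hash, weight) pairs (hypothetical).
pub struct AuditRecord {
    pub query_id: u64,
    pub contributing: Vec<(u64, f32)>,
}

/// For each audited query that used at least one retracted assertion, return
/// the query id and the total weight those retracted assertions carried.
pub fn affected_queries(audits: &[AuditRecord], retracted: &HashSet<u64>) -> Vec<(u64, f32)> {
    audits
        .iter()
        .filter_map(|audit| {
            let hits: Vec<f32> = audit
                .contributing
                .iter()
                .filter(|(hash, _)| retracted.contains(hash))
                .map(|(_, weight)| *weight)
                .collect();
            if hits.is_empty() {
                None
            } else {
                Some((audit.query_id, hits.iter().sum()))
            }
        })
        .collect()
}
```

The summed weight gives a rough sense of how much a given answer depended on the retracted evidence, which could be used to prioritize which consumers to notify first.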

11. Invalidation Cascades

When upstream evidence is retracted, downstream decisions are flagged:

fn propagate_retraction(
    retracted_hash: Hash,
    storage: &mut Storage,
) -> Vec<Hash> {
    let mut affected = Vec::new();
    let mut queue = vec![retracted_hash];

    while let Some(hash) = queue.pop() {
        // Find all assertions that cite this one as parent
        let dependents = storage.find_by_parent_hash(hash);

        for dependent in dependents {
            // Children cite the dependent's original hash, so remember it
            // before creating the Deprecated successor.
            let original_id = dependent.id;

            // Record the change as a new Deprecated assertion (assertions are never mutated)
            let updated = dependent.with_lifecycle(LifecycleStage::Deprecated);
            storage.store_assertion(&updated);
            affected.push(updated.id);

            // Continue the walk from the original hash; enqueueing the successor's
            // hash would miss children if the lifecycle stage is part of the hashed content.
            queue.push(original_id);
        }
    }

    // Notify consumers via query audit matching
    notify_affected_consumers(&affected, storage);

    affected
}

12. Performance Characteristics

12.1 Query Latency by Graph Size

Assertions	p50 Latency (MV hit)	p99 Latency (MV miss)	Memory
10,000	0.1ms	5ms	100MB
100,000	0.1ms	15ms	800MB
1,000,000	0.2ms	50ms	6GB
10,000,000	0.5ms	200ms	50GB

12.2 Write Throughput

Operation	Throughput	Notes
Assertion ingestion	50,000/sec	With signature verification
Vote ingestion	200,000/sec	Append-only, minimal verification
MV materialization	10,000/sec	Background async

12.3 Space Efficiency

Component	Size per Unit
Assertion (avg)	500 bytes
Vote	150 bytes
Index entry	40 bytes
MV entry	600 bytes
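
The per-unit sizes above support a back-of-envelope estimate of raw payload size. The sketch below only multiplies out the table; `estimated_bytes` is a hypothetical helper, and the number of index entries per assertion is left to the caller because the specification does not state it.

```rust
pub const ASSERTION_BYTES: u64 = 500;
pub const VOTE_BYTES: u64 = 150;
pub const INDEX_ENTRY_BYTES: u64 = 40;
pub const MV_ENTRY_BYTES: u64 = 600;

/// Raw payload bytes for the given unit counts, using the average sizes above.
pub fn estimated_bytes(assertions: u64, votes: u64, index_entries: u64, mv_entries: u64) -> u64 {
    assertions * ASSERTION_BYTES
        + votes * VOTE_BYTES
        + index_entries * INDEX_ENTRY_BYTES
        + mv_entries * MV_ENTRY_BYTES
}
```

For example, 1,000,000 assertions alone come to 500,000,000 bytes. This figure covers payload only and is not meant to reproduce the Memory column in Section 12.1, which presumably includes additional overhead.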

13. Alternative Embodiments

13A. Distributed Deployment

The system may be deployed across multiple nodes with:

Merkle DAG Sync: Content-addressed assertions enable efficient diff-based replication
SWIM Gossip: Cluster membership via failure detection protocol
Sharded Storage: Subject-based partitioning across nodes
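
Two of these mechanisms can be sketched in a few lines. The diff below is a flat set difference over assertion hashes, which is a simplification: an actual Merkle DAG sync would compare subtree roots and skip matching branches. The shard function uses 64-bit FNV-1a as an assumed stable hash so that every node derives the same shard. Both function names are hypothetical, and hashes are reduced to `u64`.

```rust
use std::collections::BTreeSet;

/// Hashes the remote node holds that the local node lacks, in ascending order.
/// A flat comparison; a real Merkle DAG sync would prune identical subtrees.
pub fn missing_locally(local: &BTreeSet<u64>, remote: &BTreeSet<u64>) -> Vec<u64> {
    remote.difference(local).copied().collect()
}

/// Subject-based partitioning: map a subject to one of `shard_count` shards.
/// Panics if `shard_count` is 0.
pub fn shard_for(subject: &str, shard_count: u64) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a offset basis
    for byte in subject.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV-1a prime
    }
    hash % shard_count
}
```

Because assertions are content-addressed, a hash present on both nodes implies identical content, so only the missing hashes need to be transferred.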

13B. Vector Similarity Search

The optional vector field enables semantic similarity queries:

Find assertions semantically similar to a query embedding
Cluster related assertions by embedding space proximity
Surface emerging signals via vector clustering
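
A common measure for such queries is cosine similarity between embedding vectors. The specification does not name the metric it uses, so the sketch below is an assumption; `cosine_similarity` is a hypothetical helper.

```rust
/// Cosine similarity between two embeddings.
/// Returns None for empty, mismatched-length, or zero-norm vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `Option` rather than a bare number keeps assertions without a usable `vector` from silently scoring as dissimilar.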

13C. Visual Provenance

The optional visual_hash (perceptual hash) makes it possible to:

Link assertions to screenshots of source documents
Detect duplicate visual evidence across assertions
Verify that cited sources contain the claimed content
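
Duplicate detection with perceptual hashes is typically done by counting differing bits. The sketch below assumes `PHash` is a 64-bit value, which the specification does not state, and the distance threshold is left to the caller; both function names are hypothetical.

```rust
/// Number of bits in which two 64-bit perceptual hashes differ.
pub fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Treat two images as near-duplicates when their hashes differ in at most
/// `max_distance` bits. The threshold is application-specific.
pub fn is_near_duplicate(a: u64, b: u64, max_distance: u32) -> bool {
    hamming_distance(a, b) <= max_distance
}
```

Unlike a cryptographic hash, a perceptual hash changes only slightly when an image is rescaled or recompressed, which is why a small bit distance is used instead of exact equality.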

13D. Real-Time Streaming

The system may expose:

WebSocket subscriptions for assertion/vote streams
Server-Sent Events for MV update notifications
Webhook callbacks for invalidation cascades

Claims

[See patent-disclosure.md for full claim listing]

Abstract

A database system and method for storing and resolving conflicting assertions in a probabilistic knowledge graph. The system stores signed assertions with source class authority weights, preserves contradictions without forced resolution, and applies configurable lens algorithms at query time to collapse probability into answers. Source classes form a six-tier hierarchy with associated decay half-lives, enabling semantic decay where anecdotal evidence fades while regulatory evidence persists. Trust Packs enable personalized consensus filtering by restricting queries to assertions from trusted agents. The system maintains query audit trails for provenance debugging and propagates invalidation cascades when upstream evidence is retracted.

Revision History

Date	Author	Changes
2026-02-04	Initial	Complete specification with data structures, algorithms, and performance
