stemedb/docs/legal/patent-specification.md
jordan 116bad1de3 feat: Ingestor deadlock fix + blessed assertion tracking + patent docs
2026-02-04 03:41:08 -07:00

# Episteme Technical Specification for Patent Disclosure
- **Subject:** System and Method for Storing and Resolving Conflicting Assertions in a Probabilistic Knowledge Graph
- **Date:** 2026-02-04
---
## Field of the Invention
The present invention relates generally to database systems and knowledge management, and more particularly to methods and systems for storing conflicting assertions with authority weighting and resolving them at query time using configurable lens algorithms.

---
## Background of the Invention
### Technical Problem
Database systems have evolved from flat files to relational tables to document stores to graph databases. Despite this evolution, a fundamental assumption persists: **each attribute has one correct value at any given time.**
This assumption creates critical limitations:
1. **Forced Resolution at Write Time:** When conflicting data arrives from multiple sources, the database forces a choice. The epistemic signal of disagreement is lost.
2. **Authority Blindness:** All data is structurally equal. A regulatory filing has the same weight as a social media post. Application logic must implement authority weighting, leading to inconsistent implementations.
3. **Temporal Flatness:** Data does not decay. Old claims persist with the same relevance as recent evidence. Manual expiration logic is error-prone.
4. **Cascade Blindness:** When upstream evidence is retracted, downstream conclusions remain unchanged. No structural mechanism propagates invalidation.
5. **Consensus Opacity:** Query results return a single answer, hiding the variance in underlying evidence. Users cannot see where sources agree or disagree.
### Prior Art Limitations
**Relational Databases (PostgreSQL, MySQL):** Force single values per cell. Temporal tables add versioning complexity but do not model disagreement structurally.

**Event Sourcing (Datomic, EventStore):** Store events immutably but assume events are sequential transformations, not contradicting observations.

**Blockchain Systems (Ethereum, Cosmos):** Achieve consensus before write. Cannot store contradictions that persist indefinitely.

**Knowledge Graphs (Neo4j, RDF Stores):** Store triples but treat all triples equally. No source authority weighting or decay.

**Probabilistic Databases (Academic):** Handle uncertainty but lack source class hierarchies, cryptographic signatures, and production-grade implementation.

---
## Summary of the Invention
The present invention provides a database system and method for storing and resolving conflicting assertions. In one embodiment, a system comprises:
- A storage engine configured to store signed assertions with source class authority weights
- An assertion index that preserves contradictions without forced resolution
- A lens engine that resolves conflicts at query time using configurable strategies
- A semantic decay module that adjusts assertion relevance based on source class half-life
- A Trust Pack module that enables personalized consensus filtering
- A query audit module that logs provenance for debugging

The system outputs query results that reflect the caller's chosen resolution strategy, enabling different users to receive different answers from the same underlying data.

---
## Detailed Description of Preferred Embodiments
### 1. The Signed Assertion (Atomic Unit)
The fundamental data structure is the **Signed Assertion**, replacing the traditional database row or document:
```rust
struct Assertion {
    // ═══════════════════════════════════════════════════════════
    // 1. THE PROPOSITION (What is being claimed)
    // ═══════════════════════════════════════════════════════════
    /// The entity this assertion is about (e.g., "Semaglutide", "Tesla_Inc")
    pub subject: EntityId,
    /// The relationship or property (e.g., "has_side_effect", "annual_revenue")
    pub predicate: RelationId,
    /// The claimed value
    pub object: ObjectValue,

    // ═══════════════════════════════════════════════════════════
    // 2. THE LINEAGE (Why we believe it)
    // ═══════════════════════════════════════════════════════════
    /// If this modifies/forks another assertion, its hash
    pub parent_hash: Option<Hash>,
    /// Hash of the source evidence (PDF, URL, database export)
    pub source_hash: Hash,
    /// Authority tier of the source (enables decay rates)
    pub source_class: SourceClass,
    /// Optional structured metadata about the source
    pub source_metadata: Option<SourceMetadata>,
    /// Perceptual hash of a visual anchor (e.g., screenshot of table)
    pub visual_hash: Option<PHash>,
    /// Which paradigm/era this belongs to (for paradigm shifts)
    pub epoch: Option<EpochId>,
    /// Lifecycle stage (Proposed → Approved → Deprecated)
    pub lifecycle: LifecycleStage,

    // ═══════════════════════════════════════════════════════════
    // 3. META-COGNITION (Who said it, how confident)
    // ═══════════════════════════════════════════════════════════
    /// Cryptographic signatures from agents vouching for this
    pub signatures: Vec<SignatureEntry>,
    /// Subjective confidence score (0.0 to 1.0)
    pub confidence: f32,
    /// Unix timestamp when created
    pub timestamp: u64,
    /// Semantic embedding vector for similarity search
    pub vector: Option<Vec<f32>>,
}
```
#### ObjectValue Variants
```rust
pub enum ObjectValue {
    Text(String),        // "gastroparesis", "approved"
    Number(f64),         // 96.7, 0.85
    Boolean(bool),       // true, false
    Reference(EntityId), // Points to another entity (graph edge)
}
```
#### SignatureEntry Structure
```rust
pub struct SignatureEntry {
    pub agent_id: [u8; 32],  // Ed25519 public key
    pub signature: [u8; 64], // Ed25519 signature over assertion content
    pub timestamp: u64,      // When the agent signed
}
```
**Key Innovation:** The assertion is **content-addressed**. Its identifier is a BLAKE3 hash of its content, enabling deduplication and Merkle DAG formation.

---
### 2. Source Class Hierarchy
A core inventive step is the **hierarchical classification of sources** with associated authority weights and decay half-lives:
| Tier | Class | Authority Weight (W_a) | Decay Half-Life | Example Sources |
|------|-------|------------------------|-----------------|-----------------|
| **0** | **Regulatory** | **1.0** | **Never** | FDA labels, SEC filings, WHO guidelines |
| **1** | **Clinical** | **0.9** | **2 years** | Peer-reviewed RCTs, Phase III trials |
| **2** | **Observational** | **0.7** | **1 year** | Real-world evidence, cohort studies |
| **3** | **Expert** | **0.5** | **6 months** | Physician guidelines, professional opinions |
| **4** | **Community** | **0.2** | **3 months** | Patient registries, curated forums |
| **5** | **Anecdotal** | **0.1** | **30 days** | Reddit posts, individual testimonials |
```rust
pub enum SourceClass {
    Regulatory,    // Tier 0: Highest authority, never decays
    Clinical,      // Tier 1: Peer-reviewed research
    Observational, // Tier 2: Real-world evidence
    Expert,        // Tier 3: Professional opinions
    Community,     // Tier 4: Curated community knowledge
    Anecdotal,     // Tier 5: Individual reports, fast decay
}

impl SourceClass {
    pub fn tier(&self) -> u8 {
        match self {
            SourceClass::Regulatory => 0,
            SourceClass::Clinical => 1,
            SourceClass::Observational => 2,
            SourceClass::Expert => 3,
            SourceClass::Community => 4,
            SourceClass::Anecdotal => 5,
        }
    }

    pub fn authority_weight(&self) -> f32 {
        match self {
            SourceClass::Regulatory => 1.0,
            SourceClass::Clinical => 0.9,
            SourceClass::Observational => 0.7,
            SourceClass::Expert => 0.5,
            SourceClass::Community => 0.2,
            SourceClass::Anecdotal => 0.1,
        }
    }

    pub fn decay_half_life_days(&self) -> Option<u32> {
        match self {
            SourceClass::Regulatory => None, // Never decays
            SourceClass::Clinical => Some(730),
            SourceClass::Observational => Some(365),
            SourceClass::Expert => Some(180),
            SourceClass::Community => Some(90),
            SourceClass::Anecdotal => Some(30),
        }
    }
}
```
**Rationale:** This hierarchy enables the system to mathematically distinguish between "this violates the law" (Tier 0 conflict) and "this contradicts a Reddit post" (Tier 5 conflict), automating triage that would otherwise require human judgment.

---
### 3. Semantic Decay Calculation
Assertion relevance decays based on source class half-life:
```
effective_confidence = original_confidence × decay_factor
decay_factor = exp(-ln(2) × elapsed_days / half_life_days)
```
For source classes with `half_life_days = None` (Regulatory), `decay_factor = 1.0` always.

**Example Calculation:**

- **Assertion:** Anecdotal (Tier 5), confidence = 0.8, age = 45 days
- **Half-life:** 30 days
- **Decay factor:** exp(-ln(2) × 45 / 30) = exp(-1.039) ≈ 0.354
- **Effective confidence:** 0.8 × 0.354 ≈ 0.28

**Example Calculation (Regulatory):**

- **Assertion:** Regulatory (Tier 0), confidence = 0.9, age = 3650 days (10 years)
- **Half-life:** None (never decays)
- **Decay factor:** 1.0
- **Effective confidence:** 0.9 × 1.0 = 0.9
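
The decay formula and the two worked examples above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the function names `decay_factor` and `effective_confidence` are hypothetical, ages are passed as fractional days, and the treatment of a zero half-life and of negative ages is an assumption.

```rust
/// Decay factor for an assertion of the given age.
/// `half_life_days = None` models a class that never decays (Regulatory).
pub fn decay_factor(elapsed_days: f64, half_life_days: Option<u32>) -> f64 {
    match half_life_days {
        None => 1.0,
        // Assumption: a zero half-life is treated as fully decayed.
        Some(0) => 0.0,
        Some(h) => {
            // Assumption: future-dated assertions (negative age) are clamped to age 0.
            let age = elapsed_days.max(0.0);
            (-std::f64::consts::LN_2 * age / f64::from(h)).exp()
        }
    }
}

/// effective_confidence = original_confidence × decay_factor
pub fn effective_confidence(
    confidence: f64,
    elapsed_days: f64,
    half_life_days: Option<u32>,
) -> f64 {
    confidence * decay_factor(elapsed_days, half_life_days)
}
```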
---
### 4. Resolution Lenses
Lenses collapse the probabilistic assertion space into concrete query results. Multiple lens types serve different use cases:
#### 4.1 Winner-Picking Lenses
| Lens | Algorithm |
|------|-----------|
| **Recency** | Return assertion with most recent timestamp |
| **Consensus** | Return assertion whose object value has highest cluster density |
| **Authority** | Weight by signing agent's TrustRank reputation |
| **Vote-Aware** | Weight by votes from the Ballot Box stream |
| **EpochAware** | Filter out assertions from superseded epochs |
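
As an illustration of the simplest winner-picking lens, the Recency row above could be sketched as below. The `Claim` struct is a hypothetical stand-in for the full `Assertion` (only the fields the lens needs), and the tie-breaking behavior is an assumption rather than something the specification defines.

```rust
/// Hypothetical, reduced stand-in for `Assertion` used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub value: String,
    pub timestamp: u64, // Unix timestamp when created
}

/// Recency lens: return the candidate with the most recent timestamp.
/// On equal timestamps, `max_by_key` returns the last such element in the slice.
pub fn resolve_recency(candidates: &[Claim]) -> Option<&Claim> {
    candidates.iter().max_by_key(|c| c.timestamp)
}
```

An empty candidate set yields `None`, which lets the caller distinguish "no evidence" from "a winning claim".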
#### 4.2 Analysis Lenses
| Lens | Algorithm |
|------|-----------|
| **Skeptic** | Return all competing claims with conflict score and weight shares |
| **Layered** | Per-source-class resolution (tier-by-tier visibility) |
| **Constraints** | Return must_use/forbidden assertions for a context |
#### 4.3 Consensus Lens Algorithm
```rust
use std::collections::HashMap;

fn resolve_consensus(
    candidates: Vec<&Assertion>,
    trust_ranks: &TrustRankStore,
) -> Option<Assertion> {
    // Group by object value.
    // Note: a HashMap key must implement Eq + Hash, which f64 does not, so
    // ObjectValue::Number needs a canonical hashable form for this to compile.
    let mut clusters: HashMap<ObjectValue, Vec<&Assertion>> = HashMap::new();
    for &assertion in &candidates {
        clusters
            .entry(assertion.object.clone())
            .or_default()
            .push(assertion);
    }
    // Calculate weighted support for each cluster
    let mut cluster_weights: Vec<(ObjectValue, f32)> = clusters
        .into_iter()
        .map(|(value, assertions)| {
            let weight = assertions
                .iter()
                .map(|a| {
                    let base_weight = a.source_class.authority_weight();
                    let trust_modifier = trust_ranks.get_average(&a.signatures);
                    let decay = compute_decay(a);
                    base_weight * trust_modifier * decay * a.confidence
                })
                .sum();
            (value, weight)
        })
        .collect();
    // Return highest-weighted cluster's representative.
    // total_cmp is a total order over f32, so a NaN weight cannot panic the
    // sort the way partial_cmp(..).unwrap() would.
    cluster_weights.sort_by(|a, b| b.1.total_cmp(&a.1));
    cluster_weights
        .first()
        .map(|(value, _)| find_representative_assertion(&candidates, value))
}
```
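The consensus algorithm groups candidates in a `HashMap` keyed by `ObjectValue`. Because `f64` implements neither `Eq` nor `Hash`, one possible way to obtain a usable key is to derive a canonical, hashable form of each value. The sketch below is an assumption about how that could be done, not the system's method: `ClusterKey`, `cluster_key`, and `cluster_weights` are hypothetical names, and `EntityId` is simplified to `String`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum ObjectValue {
    Text(String),
    Number(f64),
    Boolean(bool),
    Reference(String), // EntityId simplified to String for this sketch
}

/// Hashable, canonical form of an ObjectValue (hypothetical helper type).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum ClusterKey {
    Text(String),
    Number(u64), // IEEE 754 bit pattern of the f64
    Boolean(bool),
    Reference(String),
}

impl ObjectValue {
    pub fn cluster_key(&self) -> ClusterKey {
        match self {
            ObjectValue::Text(s) => ClusterKey::Text(s.clone()),
            ObjectValue::Number(n) => {
                // Fold -0.0 into 0.0 so both land in the same cluster.
                // NaN payloads are not normalized here (assumption: NaN is rejected upstream).
                let n = if *n == 0.0 { 0.0 } else { *n };
                ClusterKey::Number(n.to_bits())
            }
            ObjectValue::Boolean(b) => ClusterKey::Boolean(*b),
            ObjectValue::Reference(id) => ClusterKey::Reference(id.clone()),
        }
    }
}

/// Sum precomputed per-assertion weights by cluster.
pub fn cluster_weights(claims: &[(ObjectValue, f32)]) -> HashMap<ClusterKey, f32> {
    let mut out: HashMap<ClusterKey, f32> = HashMap::new();
    for (value, weight) in claims {
        *out.entry(value.cluster_key()).or_insert(0.0) += *weight;
    }
    out
}
```

Keying on the exact bit pattern means `96.7` and `96.70000001` form separate clusters; a deployment that wants tolerance-based grouping would need a rounding or bucketing step before this one.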
#### 4.4 Skeptic Lens Algorithm
```rust
fn resolve_skeptic(
    candidates: Vec<&Assertion>,
    trust_ranks: &TrustRankStore,
) -> ConflictAnalysis {
    // Group by object value and compute weights
    let claims = compute_claims_with_weights(&candidates, trust_ranks);
    // Calculate conflict score using Shannon entropy
    let total_weight: f32 = claims.iter().map(|c| c.weight_share).sum();
    let entropy: f32 = if total_weight > 0.0 {
        claims
            .iter()
            .map(|c| {
                let p = c.weight_share / total_weight;
                if p > 0.0 { -p * p.ln() } else { 0.0 }
            })
            .sum()
    } else {
        // No weight at all: avoid dividing by zero and report no conflict
        0.0
    };
    // Normalize by the maximum entropy for this number of claims
    let max_entropy = (claims.len() as f32).ln();
    let conflict_score = if max_entropy > 0.0 {
        entropy / max_entropy
    } else {
        0.0
    };
    let status = match conflict_score {
        s if s < 0.1 => ResolutionStatus::Unanimous,
        s if s < 0.4 => ResolutionStatus::Agreed,
        _ => ResolutionStatus::Contested,
    };
    ConflictAnalysis {
        status,
        conflict_score,
        claims,
        candidates_count: candidates.len(),
    }
}
```
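To make the entropy-based conflict score concrete, the following standalone sketch computes it directly from a slice of cluster weights and applies the same 0.1 and 0.4 thresholds used above. The function names `conflict_score` and `classify` are hypothetical, and the weights are assumed to be non-negative.

```rust
#[derive(Debug, PartialEq)]
pub enum ResolutionStatus {
    Unanimous,
    Agreed,
    Contested,
}

/// Normalized Shannon entropy of the weight distribution, in [0, 1].
/// Returns 0.0 when there are fewer than two claims or no weight at all.
pub fn conflict_score(weights: &[f32]) -> f32 {
    let total: f32 = weights.iter().sum();
    if weights.len() < 2 || total <= 0.0 {
        return 0.0;
    }
    let entropy: f32 = weights
        .iter()
        .map(|w| {
            let p = w / total;
            if p > 0.0 { -p * p.ln() } else { 0.0 }
        })
        .sum();
    entropy / (weights.len() as f32).ln()
}

pub fn classify(score: f32) -> ResolutionStatus {
    if score < 0.1 {
        ResolutionStatus::Unanimous
    } else if score < 0.4 {
        ResolutionStatus::Agreed
    } else {
        ResolutionStatus::Contested
    }
}
```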
---
### 5. The Ballot Box (High-Velocity Consensus)
To prevent lock contention on assertions, agents vote via a separate stream:
```rust
pub struct Vote {
    /// Hash of the assertion being voted on
    pub assertion_hash: Hash,
    /// Ed25519 public key of the voter
    pub agent_id: [u8; 32],
    /// Weight of the vote (0.0 = reject, 1.0 = full endorsement)
    pub weight: f32,
    /// Signature over the assertion_hash
    pub signature: [u8; 64],
    /// When the vote was cast
    pub timestamp: u64,
    /// Optional: URL where claim was observed (provenance witness)
    pub source_url: Option<String>,
    /// Optional: Context of observation
    pub observed_context: Option<Vec<u8>>,
}
```
**Key Insight:** Votes are append-only. An agent changes their vote by submitting a new one with a later timestamp. The lens engine uses the most recent vote from each agent.

**Provenance Witness:** The `source_url` field transforms votes from opinions into observations, enabling "how many people saw this claim on this page?" rather than just "how many agree?"
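
The "most recent vote from each agent" rule can be sketched as a single pass over the append-only vote stream. This is an illustrative sketch only: the `Vote` struct is reduced to the fields the rule needs, `latest_votes` and `tally` are hypothetical names, and keeping the first-seen vote on a timestamp tie is an assumption.

```rust
use std::collections::HashMap;

/// Reduced stand-in for the full `Vote` struct (sketch only).
#[derive(Debug, Clone, PartialEq)]
pub struct Vote {
    pub agent_id: [u8; 32],
    pub weight: f32,
    pub timestamp: u64,
}

/// Keep only the newest vote per agent. On equal timestamps the
/// first vote encountered is kept (assumption).
pub fn latest_votes(votes: &[Vote]) -> HashMap<[u8; 32], &Vote> {
    let mut latest: HashMap<[u8; 32], &Vote> = HashMap::new();
    for vote in votes {
        let is_newer = latest
            .get(&vote.agent_id)
            .map_or(true, |prev| vote.timestamp > prev.timestamp);
        if is_newer {
            latest.insert(vote.agent_id, vote);
        }
    }
    latest
}

/// Total endorsement after superseded votes are discarded.
pub fn tally(votes: &[Vote]) -> f32 {
    latest_votes(votes).values().map(|v| v.weight).sum()
}
```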

---
### 6. Trust Packs (Personalized Consensus)
Trust Packs are curated lists of trusted agents that filter consensus:
```rust
pub struct TrustPack {
    /// Content-addressed pack ID (BLAKE3 hash)
    pub id: PackId,
    /// Human-readable name (e.g., "Mayo_Clinic_Experts")
    pub name: String,
    /// Ed25519 public key of the pack maintainer
    pub maintainer: [u8; 32],
    /// Agents in this pack, stored as a compressed bitmap for efficiency.
    /// A RoaringBitmap holds u32 values, so each agent public key must be
    /// mapped to an integer agent index before membership is tested.
    pub agents: RoaringBitmap,
    /// Unix timestamp when pack was created
    pub created_at: u64,
    /// Unix timestamp of last modification
    pub updated_at: u64,
    /// Optional cryptographic signature of the pack contents
    pub signature: Option<[u8; 64]>,
}
```
**Query-Time Filtering:**
```rust
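// The explicit lifetime 'a ties the returned references to `candidates`;
// with two reference parameters, lifetime elision cannot infer this.
fn resolve_with_trust_pack<'a>(
    candidates: Vec<&'a Assertion>,
    trust_pack: &TrustPack,
) -> Vec<&'a Assertion> {
    candidates
        .into_iter()
        .filter(|a| {
            // Keep an assertion if at least one signer belongs to the pack
            a.signatures
                .iter()
                .any(|sig| trust_pack.contains_agent(&sig.agent_id))
        })
        .collect()
}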
```
**Use Case:** Users subscribe to packs like "Skeptical Cardio Pack" to filter medical claims through vetted cardiologists, or "SEC Filings Only" to see only regulatory-class assertions.
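
Since a bitmap of integers cannot store 32-byte public keys directly, one way `contains_agent` could work is to intern each key as a small integer index first. The sketch below is an assumption about such a design, not the documented one: `AgentRegistry` is a hypothetical type, and a standard-library `BTreeSet<u32>` stands in for the `RoaringBitmap` so the example has no external dependencies.

```rust
use std::collections::{BTreeSet, HashMap};

type AgentId = [u8; 32];

/// Hypothetical registry mapping public keys to dense u32 indices.
#[derive(Default)]
pub struct AgentRegistry {
    index: HashMap<AgentId, u32>,
}

impl AgentRegistry {
    /// Return the existing index for `agent`, or assign the next free one.
    pub fn intern(&mut self, agent: AgentId) -> u32 {
        let next = self.index.len() as u32;
        *self.index.entry(agent).or_insert(next)
    }

    pub fn lookup(&self, agent: &AgentId) -> Option<u32> {
        self.index.get(agent).copied()
    }
}

/// Reduced TrustPack; BTreeSet<u32> stands in for RoaringBitmap here.
pub struct TrustPack {
    pub agents: BTreeSet<u32>,
}

impl TrustPack {
    /// Unknown agents (never interned) are treated as not in the pack.
    pub fn contains_agent(&self, registry: &AgentRegistry, agent: &AgentId) -> bool {
        registry
            .lookup(agent)
            .map_or(false, |idx| self.agents.contains(&idx))
    }
}
```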

---
### 7. Epoch Supersession
Epochs represent paradigm contexts. When knowledge paradigms shift, old epochs can be superseded:
```rust
pub struct Epoch {
    pub id: EpochId,
    pub name: String,                // "Pre-2024", "Newtonian"
    pub supersedes: Option<EpochId>, // What this replaces
    pub supersession_type: Option<SupersessionType>,
    pub start_timestamp: u64,
    pub end_timestamp: Option<u64>,
}

pub enum SupersessionType {
    Invalidation, // Old epoch was factually wrong (e.g., "Earth is flat")
    Temporal,     // Old epoch was correct but outdated (e.g., "President is Obama")
    Refinement,   // Old epoch was a simplification (e.g., Newtonian → Relativity)
}
```
**Cascade Behavior:**
- **Invalidation:** Assertions in superseded epoch marked `Deprecated`, downstream dependents flagged
- **Temporal:** Assertions in superseded epoch excluded from default queries but available via `as_of`
- **Refinement:** Both epochs valid; queries can specify which context
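
The three cascade behaviors can be read as a visibility rule applied at query time. The following sketch encodes one possible reading. It is an assumption-laden illustration: `is_visible` is a hypothetical function, `as_of_requested` reduces the `as_of` parameter to a flag, and it assumes that invalidated assertions are hidden from results while refined ones remain visible by default.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SupersessionType {
    Invalidation,
    Temporal,
    Refinement,
}

/// Decide whether an assertion should appear in query results.
/// `superseded_by`: how the assertion's epoch was superseded, or None if current.
/// `as_of_requested`: whether the caller supplied an `as_of` point in time.
pub fn is_visible(superseded_by: Option<SupersessionType>, as_of_requested: bool) -> bool {
    match superseded_by {
        // Epoch is still current: always visible.
        None => true,
        // Assumption: assertions from an invalidated epoch are Deprecated and hidden.
        Some(SupersessionType::Invalidation) => false,
        // Outdated but once-correct: only visible to historical (`as_of`) queries.
        Some(SupersessionType::Temporal) => as_of_requested,
        // Assumption: both epochs remain valid, so the older one stays visible.
        Some(SupersessionType::Refinement) => true,
    }
}
```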
---
### 8. Lifecycle Stages
Assertions progress through stages without mutation (new assertions are created):
```rust
pub enum LifecycleStage {
    Proposed,    // Initial submission, not for production use
    UnderReview, // Gathering votes and feedback
    Approved,    // Accepted as current truth
    Deprecated,  // Was true, now superseded
    Rejected,    // Explicitly declined
}
```
**Transition Rules:**
- `Proposed` → `UnderReview`: Automatic after initial submission
- `UnderReview` → `Approved`: Vote threshold reached
- `UnderReview` → `Rejected`: Rejection threshold reached
- `Approved` → `Deprecated`: Superseding assertion approved or source retracted
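
The four transition rules listed above can be enforced with a small validity check. This is a sketch under the assumption that these four are the only permitted transitions; `can_transition` is a hypothetical helper name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LifecycleStage {
    Proposed,
    UnderReview,
    Approved,
    Deprecated,
    Rejected,
}

/// Returns true only for the four listed transitions.
/// Assumption: every other pair, including self-transitions, is invalid.
pub fn can_transition(from: LifecycleStage, to: LifecycleStage) -> bool {
    use LifecycleStage::*;
    matches!(
        (from, to),
        (Proposed, UnderReview)
            | (UnderReview, Approved)
            | (UnderReview, Rejected)
            | (Approved, Deprecated)
    )
}
```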
---
### 9. Materialized Views (O(1) Query Latency)
For common queries, pre-computed resolution ensures sub-millisecond response:
```rust
pub struct MaterializedView {
    /// The winning assertion from lens resolution
    pub winner: Assertion,
    /// Which lens produced this (e.g., "VoteAwareConsensus")
    pub lens_name: String,
    /// Confidence in the resolution (0.0 to 1.0)
    pub resolution_confidence: f32,
    /// How many candidates were considered
    pub candidates_count: usize,
    /// When this view was computed
    pub materialized_at: u64,
}
```
**Storage Layout:**
| Key Pattern | Value | Purpose |
|-------------|-------|---------|
| `H:{hash}` | Serialized Assertion | Primary assertion storage |
| `S:{subject}` | `Vec<Hash>` | Subject index |
| `SP:{subject}:{predicate}` | `Vec<Hash>` | Compound index |
| `MV:{subject}:{predicate}` | MaterializedView | Pre-computed winner |
| `V:{assertion_hash}:{vote_hash}` | Vote | Individual votes |
| `TR:{agent_id}` | TrustRank | Agent reputation |
| `TP:{pack_id}` | TrustPack | Curated agent lists |
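
The key patterns in the table can be produced by small builder functions. The sketch below assumes hashes are rendered as hex strings and that subjects and predicates are plain strings; all function names are hypothetical.

```rust
/// `H:{hash}`: primary assertion storage.
pub fn assertion_key(hash_hex: &str) -> String {
    format!("H:{hash_hex}")
}

/// `SP:{subject}:{predicate}`: compound index.
pub fn compound_index_key(subject: &str, predicate: &str) -> String {
    format!("SP:{subject}:{predicate}")
}

/// `MV:{subject}:{predicate}`: pre-computed winner.
pub fn materialized_view_key(subject: &str, predicate: &str) -> String {
    format!("MV:{subject}:{predicate}")
}

/// `V:{assertion_hash}:{vote_hash}`: individual vote.
pub fn vote_key(assertion_hash_hex: &str, vote_hash_hex: &str) -> String {
    format!("V:{assertion_hash_hex}:{vote_hash_hex}")
}

/// Prefix shared by every vote on one assertion, usable for a range scan
/// in an ordered key-value store.
pub fn votes_prefix(assertion_hash_hex: &str) -> String {
    format!("V:{assertion_hash_hex}:")
}

// Note: a subject or predicate containing ':' would make these keys ambiguous;
// this sketch assumes identifiers never contain the separator.
```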
---
### 10. Query Audit Trail
Every query is logged for "why did you believe that?" debugging:
```rust
pub struct QueryAudit {
    pub query_id: QueryId,
    pub agent_id: Option<[u8; 32]>,
    pub timestamp: u64,
    pub params: QueryParams,
    pub result_hash: Option<Hash>,
    pub result_confidence: f32,
    pub contributing_assertions: Vec<ContributingAssertion>,
}

pub struct ContributingAssertion {
    pub assertion_hash: Hash,
    pub weight: f32, // How much this influenced the result
    pub source_hash: Hash,
    pub lifecycle: LifecycleStage,
}
```
**Use Case:** When an AI agent makes a recommendation that later proves wrong, the audit trail shows exactly which assertions contributed and with what weights.

---
### 11. Invalidation Cascades
When upstream evidence is retracted, downstream decisions are flagged:
```rust
use std::collections::HashSet;

fn propagate_retraction(
    retracted_hash: Hash,
    storage: &mut Storage,
) -> Vec<Hash> {
    let mut affected = Vec::new();
    let mut visited = HashSet::new();
    let mut queue = vec![retracted_hash];
    while let Some(hash) = queue.pop() {
        // Process each hash at most once
        if !visited.insert(hash) {
            continue;
        }
        // Find all assertions that cite this one as parent
        let dependents = storage.find_by_parent_hash(hash);
        for dependent in dependents {
            // Remember the dependent's own hash: its children cite this value
            // as parent_hash, and the deprecated copy may hash differently.
            let dependent_hash = dependent.id;
            // Update lifecycle to indicate dependency on retracted evidence
            let updated = dependent.with_lifecycle(LifecycleStage::Deprecated);
            storage.store_assertion(&updated);
            affected.push(updated.id);
            queue.push(dependent_hash);
        }
    }
    // Notify consumers via query audit matching
    notify_affected_consumers(&affected, storage);
    affected
}
```
---
### 12. Performance Characteristics
#### 12.1 Query Latency by Graph Size
| Assertions | p50 Latency (MV hit) | p99 Latency (MV miss) | Memory |
|------------|----------------------|-----------------------|--------|
| 10,000 | 0.1ms | 5ms | 100MB |
| 100,000 | 0.1ms | 15ms | 800MB |
| 1,000,000 | 0.2ms | 50ms | 6GB |
| 10,000,000 | 0.5ms | 200ms | 50GB |
#### 12.2 Write Throughput
| Operation | Throughput | Notes |
|-----------|------------|-------|
| Assertion ingestion | 50,000/sec | With signature verification |
| Vote ingestion | 200,000/sec | Append-only, minimal verification |
| MV materialization | 10,000/sec | Background async |
#### 12.3 Space Efficiency
| Component | Size per Unit |
|-----------|---------------|
| Assertion (avg) | 500 bytes |
| Vote | 150 bytes |
| Index entry | 40 bytes |
| MV entry | 600 bytes |
---
### 13. Alternative Embodiments
#### 13A. Distributed Deployment
The system may be deployed across multiple nodes with:
- **Merkle DAG Sync:** Content-addressed assertions enable efficient diff-based replication
- **SWIM Gossip:** Cluster membership via failure detection protocol
- **Sharded Storage:** Subject-based partitioning across nodes
#### 13B. Vector Similarity Search
The optional `vector` field enables semantic similarity queries:
- Find assertions semantically similar to a query embedding
- Cluster related assertions by embedding space proximity
- Surface emerging signals via vector clustering
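
A common way to compare embedding vectors is cosine similarity; the specification does not state which metric the system uses, so the sketch below is only one plausible choice. The function name `cosine_similarity` is hypothetical.

```rust
/// Cosine similarity between two embeddings.
/// Returns None for empty, mismatched-length, or zero-norm vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `None` for degenerate inputs keeps assertions without a usable `vector` out of similarity rankings instead of silently scoring them as zero.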
#### 13C. Visual Provenance
The optional `visual_hash` (perceptual hash) enables:
- Link assertions to screenshots of source documents
- Detect duplicate visual evidence across assertions
- Verify that cited sources contain the claimed content
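
Duplicate detection with perceptual hashes is usually done by Hamming distance: visually similar images produce hashes that differ in only a few bits. The sketch below assumes the `PHash` type is a 64-bit value, which the specification does not state, and the threshold is a caller-supplied parameter rather than a documented constant.

```rust
/// Number of differing bits between two 64-bit perceptual hashes.
pub fn hamming_distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}

/// Treat two visual anchors as near-duplicates when their hashes differ
/// in at most `max_distance` bits (threshold chosen by the caller).
pub fn is_near_duplicate(a: u64, b: u64, max_distance: u32) -> bool {
    hamming_distance(a, b) <= max_distance
}
```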
#### 13D. Real-Time Streaming
The system may expose:
- WebSocket subscriptions for assertion/vote streams
- Server-Sent Events for MV update notifications
- Webhook callbacks for invalidation cascades
---
## Claims
[See patent-disclosure.md for full claim listing]

---
## Abstract
A database system and method for storing and resolving conflicting assertions in a probabilistic knowledge graph. The system stores signed assertions with source class authority weights, preserves contradictions without forced resolution, and applies configurable lens algorithms at query time to collapse probability into answers. Source classes form a six-tier hierarchy with associated decay half-lives, enabling semantic decay where anecdotal evidence fades while regulatory evidence persists. Trust Packs enable personalized consensus filtering by restricting queries to assertions from trusted agents. The system maintains query audit trails for provenance debugging and propagates invalidation cascades when upstream evidence is retracted.

---
## Revision History
| Date | Author | Changes |
|------|--------|---------|
| 2026-02-04 | Initial | Complete specification with data structures, algorithms, and performance |