Intellectual Property Disclosure: Episteme (StemeDB) Probabilistic Knowledge Database
- Date: 2026-02-04
- Subject: System and Method for Storing and Resolving Conflicting Assertions in a Probabilistic Knowledge Graph
Executive Summary
Episteme (internal codename: StemeDB) is a probabilistic knowledge graph database that stores signed assertions rather than deterministic facts. It introduces a novel database architecture that:
- Preserves contradictions without forcing resolution at write time
- Resolves conflicts at read time via configurable Lens algorithms
- Weights assertions by source authority using a hierarchical classification system
- Applies semantic decay where evidence freshness varies by source class
- Enables Trust Packs for personalized reality filtering
Current databases (relational, document, graph) fundamentally assume a single truth. When conflicting data arrives, they either force a choice (losing the disagreement) or require complex version-table schemes. This creates computational inefficiency and prevents structural modeling of epistemic uncertainty.
Episteme solves this by treating knowledge as a probabilistic marketplace where assertions compete and resolution strategies are applied at query time.
Technical Problem Addressed
The "Tower of Babel" Problem
When multiple autonomous agents observe the world and report conflicting information, traditional databases fail:
- Forced Resolution: Relational databases require a single value per cell. Conflicting observations must be merged or discarded at write time, losing the epistemic signal.
- Authority Blindness: All rows are equal. A regulatory filing has the same structural weight as a Reddit post. Authority weighting must be implemented in application logic.
- Temporal Rigidity: No native mechanism for semantic decay. Old anecdotal claims persist with the same weight as recent clinical evidence.
- Cascade Failure: When upstream evidence is retracted, downstream decisions that relied on it remain in the database without structural notification.
- Consensus Opacity: No mechanism to surface "where do sources agree and disagree?" Query results hide variance instead of exposing it.
Real-World Example: A patient researching Semaglutide side effects found conflicting information: her physician said "well-tolerated" while Reddit users flagged gastroparesis months before the FDA added the warning. Traditional databases offered no way to structurally weight these sources or surface the disagreement.
Technical Solution
A database system that:
- Stores immutable, signed assertions as the atomic unit (not rows or documents)
- Assigns source class authority weights based on a six-tier hierarchy
- Preserves contradicting assertions without forced resolution
- Applies resolution lenses at query time to collapse probability into answers
- Computes semantic decay based on source class half-life
- Supports Trust Packs for personalized consensus filtering
- Maintains query audit trails for "why did you believe that?" debugging
- Propagates invalidation cascades when upstream evidence is retracted
Use Cases
1. Multi-Agent Research Systems
AI agents investigating complex topics produce conflicting findings. Episteme stores all assertions, enabling consensus to emerge from disagreement rather than forcing premature resolution.
2. Regulatory Intelligence
SEC filings, FDA warnings, and NIST guidelines outweigh vendor documentation by structural design. The database mathematically distinguishes "this violates the law" from "this contradicts a blog post."
3. Medical Decision Support
Clinical trials, real-world evidence, and patient reports coexist with appropriate weighting. Patients and physicians see both the "official answer" and "emerging signals" from lower-tier sources.
4. Financial Analysis
Analyst estimates, earnings reports, and market rumors are stored with source provenance. Users filter by Trust Packs representing their preferred analysts.
Patentability Analysis
To be patentable, an invention must be (1) Statutory, (2) Novel, (3) Useful, and (4) Non-Obvious.
1. Statutory Subject Matter (Eligible Category)
Requirement: Must be a process, machine, manufacture, or composition of matter. Abstract ideas are not eligible unless applied practically.
Episteme Argument:
- The claims recite specific data structures: signed assertions with source class, decay half-life, cryptographic signatures
- The claims recite machine-specific operations: content-addressed storage, Merkle DAG traversal, lens-based resolution
- The operations cannot be performed mentally: a human cannot traverse thousands of assertions with authority weighting in sub-millisecond time
- Per Enfish v. Microsoft (Fed. Cir. 2016): Database architecture improvements are patent-eligible
2. Novelty (New)
Requirement: Must not be known, used, or published before.
Episteme Argument:
- Prior Art: Databases store facts. Event sourcing replays events. Blockchain achieves consensus before write.
- The Invention: Episteme stores contradicting assertions and resolves via configurable lenses at read time with authority weighting and semantic decay.
- Distinction: No existing database combines:
  - Signed assertions with source class hierarchy
  - Read-time resolution via lenses
  - Semantic decay by source tier
  - Contradiction coexistence without forced resolution
  - Trust Pack personalization
3. Utility (Useful)
Requirement: Must provide a specific, substantial, and credible benefit.
Episteme Argument:
- Demonstrated Benefit: Enables AI agent memory systems that preserve disagreement
- Structural Improvement: Source authority weighting is built into the data model, not application logic
- Industrial Application: Applicable to medical research, financial analysis, regulatory intelligence, and any domain with conflicting sources
4. Non-Obviousness (Inventive Step)
Requirement: Must not be a trivial combination of existing things.
Episteme Argument:
- It is not obvious to combine "append-only ledgers" (blockchain concept) with "read-time resolution" (MVCC concept) with "authority-weighted source hierarchies" (new concept)
- Database experts focus on consistency models; they do not focus on modeling epistemic uncertainty structurally
- The combination of signed assertions + source class decay + Trust Pack filtering requires domain expertise across cryptography, databases, and epistemology
Proposed Claims
Independent Claim 1: Core Data Model (System)
A database system for storing and resolving conflicting assertions, the system comprising:
(a) a storage engine comprising:
- a write-ahead log configured to persist assertions with fsync durability before acknowledgment,
- a content-addressed index wherein each assertion's identifier is computed as a cryptographic hash of the assertion's content using the BLAKE3 algorithm,
- a compound index keyed by subject-predicate pairs storing references to assertion identifiers,
- wherein each stored assertion comprises a proposition (subject identifier, predicate identifier, object value), a source class selected from a hierarchical classification with associated authority weight and decay half-life, at least one Ed25519 cryptographic signature binding the assertion to an agent identity, and a timestamp;
(b) an assertion index configured to store multiple assertions for the same subject-predicate pair without requiring conflict resolution at write time, wherein the index permits contradicting object values to coexist for the same subject-predicate pair;
(c) a lens engine configured to, at query time, apply a resolution lens to the plurality of stored assertions matching a query predicate, wherein the resolution lens collapses conflicting assertions into a query result based on at least one of: source class authority weight, temporal decay computed as an exponential function of elapsed time divided by source class half-life, cryptographic signature verification using Ed25519, or weighted consensus among assertions;
(d) wherein the system preserves all stored assertions regardless of conflicts in the append-only write-ahead log, enabling subsequent queries with different resolution lenses to produce different results from the same underlying data, and enabling time-travel queries by traversing the append-only log to reconstruct historical state.
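To illustrate elements (b) through (d), the following minimal Python sketch keeps every assertion for a subject-predicate pair and defers resolution to read time. It is an illustration under stated assumptions, not the Episteme implementation: the names `Assertion`, `AssertionStore`, `recency_lens`, and `authority_lens` are hypothetical, signatures and the write-ahead log are omitted, and the sample values are invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str
    authority: float  # source class authority weight
    timestamp: int    # illustrative integer clock


class AssertionStore:
    """Append-only store in which conflicting values for one key coexist."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def append(self, a: Assertion) -> None:
        # No conflict check at write time.
        self._by_key[(a.subject, a.predicate)].append(a)

    def query(self, subject, predicate, lens):
        # Resolution happens here, at read time, via the supplied lens.
        return lens(self._by_key.get((subject, predicate), []))


def recency_lens(candidates):
    return max(candidates, key=lambda a: a.timestamp, default=None)


def authority_lens(candidates):
    return max(candidates, key=lambda a: a.authority, default=None)


store = AssertionStore()
store.append(Assertion("drug_x", "causes", "no_gastroparesis", 0.9, 1_000))
store.append(Assertion("drug_x", "causes", "gastroparesis", 0.2, 2_000))

newest = store.query("drug_x", "causes", recency_lens)
most_authoritative = store.query("drug_x", "causes", authority_lens)
```

Both queries read the same two stored assertions; only the lens differs, so `newest` and `most_authoritative` resolve to different object values.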
Independent Claim 2: Semantic Decay (Method)
A computer-implemented method for time-weighted knowledge retrieval comprising:
(a) storing a plurality of assertions, each assertion associated with a source class, wherein each source class has an assigned decay half-life;
(b) receiving a query for assertions matching a subject-predicate pattern;
(c) for each matching assertion, computing a decay-adjusted confidence score by:
- determining an elapsed time since the assertion timestamp,
- retrieving the decay half-life for the assertion's source class,
- computing a decay factor as an exponential function of elapsed time divided by half-life,
- multiplying the assertion's original confidence by the decay factor;
(d) ranking or filtering query results based on decay-adjusted confidence scores;
wherein assertions from source classes with longer half-lives (regulatory, clinical) maintain relevance longer than assertions from source classes with shorter half-lives (community, anecdotal).
Independent Claim 3: Invalidation Cascades (Method)
A computer-implemented method for propagating evidence retraction through a knowledge graph, comprising:
(a) maintaining a dependency graph linking assertions to downstream assertions that cite or depend upon them via parent hash references;
(b) receiving a retraction event for a source assertion, the retraction event comprising at least one of: explicit retraction by the assertion's signer, expiration of the assertion's validity period, or supersession by a higher-authority assertion on the same subject-predicate pair;
(c) traversing the dependency graph to identify all downstream assertions that depend on the retracted assertion;
(d) for each downstream assertion, updating a lifecycle status to indicate the dependency on retracted evidence;
(e) notifying registered consumers who previously queried the retracted assertion or its downstream dependents via query audit trail matching.
Independent Claim 4: Trust Packs (System)
A system for personalized knowledge filtering comprising:
(a) a database storing a plurality of signed assertions from a plurality of agents, wherein each agent is identified by a unique public key;
(b) a trust pack registry storing trust pack definitions, each trust pack comprising:
- a unique pack identifier,
- a cryptographic signature from a pack maintainer,
- a compressed bitmap data structure (roaring bitmap) encoding the set of agent public keys representing trusted sources for a particular domain or perspective;
(c) a query engine configured to:
- receive a query specifying a trust pack identifier,
- load the trust pack from the registry into memory,
- for each candidate assertion matching the query, extract the signing agent identifiers,
- perform a bitmap intersection operation between the trust pack agent set and the assertion signer set,
- filter query results to include only assertions where the bitmap intersection yields a non-empty result;
(d) wherein different users querying the same subject-predicate pair with different trust packs receive different results reflecting their respective trust configurations, and wherein the bitmap intersection operation provides O(1) membership checking for efficient filtering at scale.
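The filtering step in element (c) can be sketched as follows. This is a simplified illustration: a built-in Python `set` stands in for the roaring bitmap, agents are represented by small integer identifiers rather than public keys, and the function name `filter_by_trust_pack` and the sample packs are hypothetical.

```python
def filter_by_trust_pack(assertions, trusted_agent_ids):
    """Keep assertions that have at least one signer in the trust pack.

    `assertions` is a list of (assertion_id, signer_ids) pairs.
    A Python set is used here in place of a roaring bitmap.
    """
    trusted = set(trusted_agent_ids)
    # A non-empty intersection means at least one trusted signer.
    return [aid for aid, signers in assertions if trusted & set(signers)]


candidates = [
    ("a1", {1, 2}),
    ("a2", {3}),
    ("a3", {2, 4}),
]
cardiology_pack = {2, 5}
community_pack = {3}

cardiology_view = filter_by_trust_pack(candidates, cardiology_pack)
community_view = filter_by_trust_pack(candidates, community_pack)
```

The two views are computed from the same candidate list, which mirrors element (d): the trust pack, not the stored data, determines what each user sees.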
Independent Claim 5: Skeptic Lens (System)
A system for surfacing epistemic conflict comprising:
(a) a database storing a plurality of assertions for a given subject-predicate pair, wherein different assertions may assert different object values;
(b) a conflict analysis engine configured to:
- group assertions by object value,
- compute an authority-weighted support score for each distinct object value,
- calculate a conflict score indicating the degree of disagreement among assertions using normalized entropy;
(c) an output module configured to return, for a query, a conflict analysis comprising:
- all distinct object values asserted,
- the authority-weighted support for each object value,
- the conflict score,
- representative assertion identifiers for each competing claim;
wherein the system exposes disagreement to the querying agent rather than hiding variance behind a single resolved answer.
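One possible reading of the conflict score in element (b) is sketched below. The claim specifies normalized entropy but not the normalizer; this sketch assumes Shannon entropy over the authority-weighted support shares, divided by the maximum entropy for the number of distinct values. The function name `conflict_analysis` is hypothetical.

```python
import math
from collections import defaultdict


def conflict_analysis(assertions):
    """assertions: list of (object_value, authority_weight) pairs.

    Returns (support per value, conflict score in [0, 1]).
    """
    support = defaultdict(float)
    for value, weight in assertions:
        support[value] += weight

    total = sum(support.values())
    if total == 0 or len(support) < 2:
        return dict(support), 0.0  # zero or one distinct value: no conflict

    entropy = 0.0
    for s in support.values():
        p = s / total
        if p > 0:
            entropy -= p * math.log2(p)
    # Normalize by the maximum entropy for this many distinct values.
    return dict(support), entropy / math.log2(len(support))


support, split_score = conflict_analysis([("yes", 0.5), ("no", 0.5)])
_, agreed_score = conflict_analysis([("yes", 0.9), ("yes", 0.2)])
```

Under this normalization an even split between two values scores 1.0 and unanimous agreement scores 0.0.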
Independent Claim 6: Query Audit Trail (Method)
A computer-implemented method for epistemic provenance tracking, comprising:
(a) receiving a query for assertions matching a subject-predicate pattern, the query specifying a resolution lens;
(b) resolving the query using the specified lens algorithm that collapses conflicting assertions into a result by computing authority-weighted scores for candidate assertions;
(c) for each candidate assertion considered during resolution, recording a contribution weight indicating how much the assertion influenced the final result;
(d) logging a query audit record to persistent storage, the query audit record comprising:
- a unique query identifier computed as a cryptographic hash,
- the querying agent's public key identifier,
- a timestamp,
- the query parameters including subject, predicate, and lens specification,
- a cryptographic hash of the resolution result,
- a list of contributing assertion identifiers with their respective contribution weights;
(e) wherein the query audit record enables subsequent debugging by identifying which assertions contributed to a decision and with what weights;
(f) supporting query replay by re-executing a historical query with current data and comparing the result hash to detect epistemic drift, wherein epistemic drift is defined as a change in resolution result caused by new assertions, votes, or retracted evidence.
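Steps (c) through (f) can be sketched roughly as follows. The sketch makes several assumptions not stated in the claim: SHA-256 over canonical JSON serves as the hash, contribution weights are each score's share of the total, and the function names and record layout are hypothetical.

```python
import hashlib
import json


def contribution_weights(scores):
    """scores: {assertion_id: authority-weighted score} -> normalized shares."""
    total = sum(scores.values())
    if total == 0:
        return {aid: 0.0 for aid in scores}
    return {aid: s / total for aid, s in scores.items()}


def digest(value):
    # SHA-256 over canonical JSON; the choice of hash is an assumption.
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_audit_record(agent_key, params, scores, result, timestamp):
    record = {
        "agent": agent_key,
        "timestamp": timestamp,
        "params": params,
        "result_hash": digest(result),
        "contributions": contribution_weights(scores),
    }
    record["query_id"] = digest(
        {k: record[k] for k in ("agent", "timestamp", "params")}
    )
    return record


def has_drifted(record, replayed_result):
    """Epistemic drift: the replayed result no longer matches the logged hash."""
    return digest(replayed_result) != record["result_hash"]


record = make_audit_record(
    "agent-pk",
    {"subject": "drug_x", "predicate": "causes", "lens": "authority"},
    {"a1": 0.9, "a2": 0.1},
    "no_gastroparesis",
    1_700_000_000,
)
```

Replaying the query later and passing the new result to `has_drifted` reports whether the answer has changed since the record was logged.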
Independent Claim 7: Content-Addressed Merkle DAG (System)
A database system for immutable knowledge storage comprising:
(a) a content-addressed storage engine wherein each assertion's unique identifier is computed as a BLAKE3 cryptographic hash of the assertion's serialized content, ensuring that identical assertions produce identical identifiers and enabling automatic deduplication;
(b) a parent hash field in each assertion that references zero or more predecessor assertions by their content-addressed identifiers, wherein the parent hash indicates that the current assertion modifies, supersedes, or depends upon the referenced predecessor;
(c) a directed acyclic graph (DAG) structure formed by the parent hash references, wherein:
- the graph is append-only with no mutations to existing nodes,
- each node (assertion) is immutable once stored,
- the graph preserves complete history of all assertions;
(d) a Merkle root computation module that computes a single hash representing the entire database state by traversing the DAG;
(e) whereby the content-addressed Merkle DAG enables:
- efficient diff detection between database states by comparing Merkle roots,
- distributed synchronization via Merkle proof exchange wherein only differing subtrees are transferred,
- immutable audit trail of assertion provenance by traversing parent hash references,
- cryptographic verification that no historical assertions have been tampered with.
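A minimal sketch of elements (a) through (c) follows. It assumes canonical JSON as the serialization and uses BLAKE2b from the Python standard library as a stand-in for BLAKE3, which requires a third-party package. The `Dag` class and its methods are hypothetical, and the Merkle root computation of element (d) is not shown.

```python
import hashlib
import json


def assertion_id(content: dict) -> str:
    """Content address: hash of the canonical serialization.

    BLAKE2b with a 32-byte digest stands in for BLAKE3 here.
    """
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=32).hexdigest()


class Dag:
    def __init__(self):
        self.nodes = {}

    def put(self, content: dict) -> str:
        # Parents must already exist, so references only point backwards.
        for parent in content.get("parents", []):
            if parent not in self.nodes:
                raise KeyError(f"unknown parent {parent}")
        aid = assertion_id(content)
        # Identical content hashes to the same id, so a re-insert is a no-op.
        self.nodes.setdefault(aid, content)
        return aid

    def history(self, aid: str):
        """Walk parent references back to the roots (provenance trail)."""
        seen, stack = [], [aid]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.append(current)
            stack.extend(self.nodes[current].get("parents", []))
        return seen


dag = Dag()
root = dag.put({"s": "drug_x", "p": "causes", "o": "nausea", "parents": []})
child = dag.put(
    {"s": "drug_x", "p": "causes", "o": "gastroparesis", "parents": [root]}
)
duplicate = dag.put({"s": "drug_x", "p": "causes", "o": "nausea", "parents": []})
```

Because an identifier depends on the parent identifiers inside the content, a node cannot reference itself or a descendant, which keeps the graph acyclic.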
Dependent Claims: Source Class Hierarchy (Claims 8-10)
Claim 8. The system of claim 1, wherein the source class hierarchy comprises exactly six tiers with the following specific values:
- Tier 0 (Regulatory): authority weight 1.0, decay half-life infinite (never decays),
- Tier 1 (Clinical): authority weight 0.9, decay half-life 730 days (2 years),
- Tier 2 (Observational): authority weight 0.7, decay half-life 365 days (1 year),
- Tier 3 (Expert): authority weight 0.5, decay half-life 180 days (6 months),
- Tier 4 (Community): authority weight 0.2, decay half-life 90 days (3 months),
- Tier 5 (Anecdotal): authority weight 0.1, decay half-life 30 days (1 month).
Claim 9. The system of claim 1, wherein the storage engine maintains a source class index keyed by source class tier, enabling queries that filter assertions by authority tier range.
Claim 10. The system of claim 1, wherein assertions from different source classes are stored in separate index partitions, and wherein the query engine performs partition pruning to optimize tier-specific queries.
Dependent Claims: Resolution Lenses (Claims 11-15)
Claim 11. The system of claim 1, wherein the lens engine supports a recency lens that returns the assertion with the most recent timestamp.
Claim 12. The system of claim 1, wherein the lens engine supports a consensus lens that groups assertions by object value, computes an authority-weighted support score for each cluster, and returns the representative assertion from the cluster with highest support.
Claim 13. The system of claim 1, wherein the lens engine supports an authority lens that weights assertions by the trust rank reputation of the signing agents, wherein trust rank is stored in a separate trust rank index.
Claim 14. The system of claim 1, wherein the lens engine supports a vote-aware lens that aggregates votes from a separate ballot box stream and weights assertions by vote totals, using only the most recent vote from each agent.
Claim 15. The system of claim 1, wherein the lens engine supports an epoch-aware lens that filters assertions based on paradigm context, excluding assertions tagged with epochs that have been superseded by more recent epochs.
Dependent Claims: Invalidation Cascade Implementation (Claim 16)
Claim 16. The method of claim 3, wherein traversing the dependency graph comprises breadth-first search (BFS) starting from the retracted assertion, wherein the BFS maintains a visited set so that no assertion is processed more than once, even when it is reachable through multiple dependency paths, and terminates when all reachable dependent assertions have been processed.
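The breadth-first traversal of Claim 16 can be sketched as follows. The dependency graph is represented as a plain dictionary from an assertion identifier to the identifiers that cite it; the function name `cascade` and the sample graph are hypothetical.

```python
from collections import deque


def cascade(dependents, retracted_id):
    """Return downstream assertion ids in breadth-first order.

    `dependents` maps an assertion id to the ids that cite it.
    """
    visited = {retracted_id}
    queue = deque([retracted_id])
    affected = []
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            # The visited set ensures each assertion is handled once,
            # even when it is reachable via several paths.
            if child not in visited:
                visited.add(child)
                affected.append(child)
                queue.append(child)
    return affected


# "d" depends on both "b" and "c", which both depend on "a".
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
affected = cascade(graph, "a")
```

Each identifier in `affected` would then have its lifecycle status updated as in step (d) of Claim 3.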
Dependent Claims: Ballot Box Pattern (Claims 17-20)
Claim 17. The system of claim 1, further comprising a ballot box module configured to:
- receive votes from agents on existing assertions, each vote comprising an assertion hash, agent identifier, weight (0.0 to 1.0), and Ed25519 cryptographic signature,
- store votes in an append-only vote log separate from the assertion store,
- periodically materialize aggregated vote counts into a consensus view queryable by the lens engine.
Claim 18. The system of claim 17, wherein the ballot box module enables high-velocity consensus by accepting votes at a rate exceeding 100,000 votes per second without write contention on assertion records.
Claim 19. The system of claim 17, wherein an agent may change their vote by submitting a new vote with a later timestamp, and the lens engine uses only the most recent vote from each agent when computing vote totals.
Claim 20. The system of claim 17, wherein votes include an optional source URL and observed context bytes, enabling provenance tracking of where claims were observed and transforming votes from opinions into cryptographic witnesses.
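The vote aggregation of Claims 17 and 19 can be sketched as follows. This illustration assumes that "most recent vote from each agent" is tracked per assertion, omits signatures and the append-only log, and uses a hypothetical function name `tally`.

```python
def tally(votes):
    """votes: iterable of (assertion_hash, agent_id, weight, timestamp).

    Only the latest vote per (assertion, agent) pair counts.
    """
    latest = {}
    for assertion_hash, agent_id, weight, ts in votes:
        key = (assertion_hash, agent_id)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, weight)

    totals = {}
    for (assertion_hash, _agent), (_ts, weight) in latest.items():
        totals[assertion_hash] = totals.get(assertion_hash, 0.0) + weight
    return totals


votes = [
    ("h1", "alice", 1.0, 10),
    ("h1", "bob", 0.5, 11),
    ("h1", "alice", 0.25, 12),  # alice changes her vote on h1
    ("h2", "bob", 1.0, 13),
]
totals = tally(votes)
```

Because votes are only appended and aggregated afterwards, a changed vote never rewrites an assertion record.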
Dependent Claims: Materialized Views (Claims 21-23)
Claim 21. The system of claim 1, further comprising a materializer configured to pre-compute resolution results for common subject-predicate pairs and store them in materialized view records keyed by MV:{subject}:{predicate} for O(1) query latency.
Claim 22. The system of claim 21, wherein materialized views are updated asynchronously by a background worker that monitors assertion and vote streams, and wherein the materializer processes updates in batches to amortize computation cost.
Claim 23. The system of claim 21, wherein materialized views include metadata comprising: the winning assertion hash, the lens name that produced the resolution, a resolution confidence score between 0.0 and 1.0, the count of candidates considered, and a timestamp of materialization.
Dependent Claims: Epoch Supersession (Claims 24-27)
Claim 24. The system of claim 1, further comprising an epoch registry storing epoch definitions, each epoch comprising: a unique epoch identifier (BLAKE3 hash), a human-readable name, a start timestamp, an optional end timestamp, and an optional reference to a superseded epoch by its identifier.
Claim 25. The system of claim 24, wherein epoch supersession types comprise:
- invalidation, indicating the old epoch was factually wrong and all assertions in it should be treated as deprecated,
- temporal, indicating the old epoch was correct at the time but is now outdated,
- refinement, indicating the old epoch was a simplification that has been superseded by a more accurate model.
Claim 26. The system of claim 24, wherein assertions tagged with a superseded epoch are excluded from default query results by the lens engine, but remain accessible via explicit as-of queries or historical queries.
Claim 27. The system of claim 24, wherein the lens engine supports as-of queries by accepting a timestamp parameter and returning the state of knowledge as it existed at that timestamp, computed by traversing the append-only assertion log and filtering to assertions with timestamps before the specified time.
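The default and as-of views of Claims 26 and 27 can be sketched as follows. This is a deliberate simplification: the as-of view filters on timestamp only and does not reconstruct which epochs were considered superseded at that time. The function name `visible_assertions` and the record layout are hypothetical.

```python
def visible_assertions(log, superseded_epochs, as_of=None):
    """log: list of dicts with 'id', 'timestamp', and 'epoch' keys."""
    if as_of is not None:
        # Historical view: everything recorded before the cutoff,
        # including assertions from epochs that were later superseded.
        return [a["id"] for a in log if a["timestamp"] < as_of]
    # Default view: hide assertions whose epoch has been superseded.
    return [a["id"] for a in log if a["epoch"] not in superseded_epochs]


log = [
    {"id": "a1", "timestamp": 100, "epoch": "e_old"},
    {"id": "a2", "timestamp": 200, "epoch": "e_new"},
    {"id": "a3", "timestamp": 300, "epoch": "e_new"},
]
current_view = visible_assertions(log, {"e_old"})
past_view = visible_assertions(log, {"e_old"}, as_of=250)
```

The superseded assertion `a1` is absent from the default view but remains reachable through the as-of query, since nothing is removed from the log.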
Dependent Claims: Query Audit Implementation (Claims 28-30)
Claim 28. The method of claim 6, wherein the query audit record is stored in an append-only audit log indexed by query identifier and by querying agent identifier, enabling efficient retrieval of all queries made by a specific agent.
Claim 29. The method of claim 6, wherein the contribution weight for each contributing assertion is computed as the assertion's authority-weighted score divided by the sum of all candidate authority-weighted scores.
Claim 30. The method of claim 6, further comprising an alert module configured to detect epistemic drift by periodically replaying historical queries and notifying agents when their prior query results would differ under current data.
Dependent Claims: Content-Addressing (Claims 31-33)
Claim 31. The system of claim 7, wherein the BLAKE3 cryptographic hash produces a 256-bit (32-byte) identifier, and wherein the system rejects any attempt to store an assertion with an identifier matching an existing assertion (deduplication).
Claim 32. The system of claim 7, wherein the Merkle root is computed incrementally using a streaming algorithm that processes assertions in order of arrival, enabling efficient root updates without recomputing the entire tree.
Claim 33. The system of claim 7, wherein Merkle proof exchange for distributed synchronization comprises: computing the local Merkle root, receiving a remote Merkle root, traversing the tree to identify differing subtrees, and requesting only assertions from differing subtrees.
Dependent Claims: Fallback Positions (Claims 34-38)
Claim 34. The system of claim 1, wherein the storage engine comprises a PostgreSQL database with assertions stored in a JSONB column, and wherein the compound index is implemented as a PostgreSQL GIN index on the subject and predicate fields.
Claim 35. The system of claim 1, wherein the system further comprises a Redis caching layer that stores materialized views with configurable time-to-live (TTL), enabling sub-millisecond query latency for frequently accessed subject-predicate pairs.
Claim 36. The system of claim 1, wherein the lens engine supports user-defined lens functions compiled to WebAssembly (WASM) and executed in a sandboxed runtime, enabling custom resolution strategies without modifying the core system.
Claim 37. The system of claim 1, wherein assertions include an optional vector embedding field comprising a fixed-length array of floating-point values, and wherein the system supports semantic similarity queries via approximate nearest neighbor (ANN) search on the embedding vectors.
Claim 38. The system of claim 2, wherein the decay formula is: effective_confidence = original_confidence × exp(-ln(2) × elapsed_days / half_life_days), and wherein for source classes with no decay (Regulatory, Tier 0), the decay factor is always 1.0.
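The decay formula of Claim 38 can be sketched as follows, using the tier values from Claim 8. The table name `SOURCE_CLASSES`, the function name `effective_confidence`, and the use of `None` to represent an infinite half-life are choices made for this illustration.

```python
import math

# (authority weight, half-life in days) per tier, as listed in Claim 8.
# None marks a tier that never decays.
SOURCE_CLASSES = {
    "regulatory": (1.0, None),
    "clinical": (0.9, 730),
    "observational": (0.7, 365),
    "expert": (0.5, 180),
    "community": (0.2, 90),
    "anecdotal": (0.1, 30),
}


def effective_confidence(original_confidence, source_class, elapsed_days):
    _weight, half_life = SOURCE_CLASSES[source_class]
    if half_life is None:
        return original_confidence  # decay factor is always 1.0
    factor = math.exp(-math.log(2) * elapsed_days / half_life)
    return original_confidence * factor


anecdotal_after_30 = effective_confidence(0.8, "anecdotal", 30)
regulatory_after_10y = effective_confidence(0.8, "regulatory", 3650)
```

After exactly one half-life the factor is 0.5, so an anecdotal assertion with original confidence 0.8 falls to about 0.4 after 30 days, while a regulatory assertion keeps its original value.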
Prior Art Concerns and Distinction Strategy
Search Summary
After a comprehensive search, no single reference or obvious combination was found that teaches the core invention: a database that stores conflicting assertions with source class authority weights and resolves them at query time via configurable lenses with semantic decay.
Category 1: Traditional Databases (Postgres, MySQL, MongoDB)
What They Teach:
- ACID transactions
- Single value per cell (relational) or document (NoSQL)
- Temporal tables (SQL:2011) for version history
What They Do NOT Teach:
- Multiple conflicting values for the same attribute without versioning
- Authority weighting by source class
- Query-time resolution strategies
- Semantic decay by source tier
Specification Language:
"Unlike traditional databases that require a single canonical value per attribute or maintain complex version tables, the present invention stores multiple conflicting assertions for the same subject-predicate pair and resolves them at query time using configurable lens strategies, fundamentally changing the database paradigm from 'store facts' to 'store evidence.'"
Category 2: Event Sourcing / CQRS (Datomic, EventStore)
What They Teach:
- Append-only event logs
- Read-time materialization
- Time-travel queries
What They Do NOT Teach:
- Events can contradict without resolution
- Authority weighting for events
- Semantic decay based on source class
- Trust Pack filtering
Critical Distinction from Martin Fowler's Event Sourcing Pattern:
Event sourcing as defined by Martin Fowler and implemented in systems like Datomic, EventStore, and Axon Framework stores sequential state transformations (events). Events describe changes that have occurred: "OrderPlaced", "PaymentReceived", "ItemShipped". These events form a non-contradicting sequence that is replayed to reconstruct current state.
In contrast, Episteme stores potentially contradicting observations (assertions). Multiple agents may observe the same subject-predicate pair and report different values: Agent A says "drug X causes side effect Y", Agent B says "drug X does not cause side effect Y". These assertions coexist indefinitely and may never resolve. Resolution happens at query time via lenses, not at write time via event ordering.
| Feature | Event Sourcing | Episteme |
|---|---|---|
| Data semantics | Events (state changes) | Assertions (observations) |
| Contradiction handling | Events don't contradict; each describes what happened | Assertions may contradict; multiple observations of the same fact |
| State reconstruction | Replay events in order | Apply lens to collapse probability |
| Authority weighting | No; all events are equal | Yes; source class hierarchy |
| Time travel | Replay subset of events | Query with as-of timestamp |
Specification Language:
"In contrast to event sourcing systems that replay events to reconstruct state, wherein events represent sequential transformations that do not contradict, the present invention treats assertions as potentially conflicting evidence that may never resolve, applying lens-based resolution strategies at query time to collapse probability into answers. Unlike events which describe 'what happened' in a non-contradicting sequence, assertions describe 'what is believed' and may directly contradict other assertions about the same subject."
Category 3: Blockchain / Distributed Ledgers
What They Teach:
- Signed transactions
- Immutable append-only storage
- Cryptographic verification
What They Do NOT Teach:
- Consensus achieved at read time, not write time
- Source class authority hierarchy
- Semantic decay
- Trust Pack personalization
Specification Language:
"Unlike blockchain systems that achieve distributed consensus before recording transactions, the present invention deliberately stores contradicting assertions without consensus and defers resolution to query time, enabling different users to apply different resolution strategies to the same underlying data."
Category 4: Knowledge Graphs (Neo4j, GraphDB)
What They Teach:
- Triple storage (subject-predicate-object)
- Graph traversal
- Semantic querying
What They Do NOT Teach:
- Contradicting triples coexist
- Authority weighting
- Cryptographic signatures on triples
- Read-time resolution lenses
Specification Language:
"Unlike knowledge graph databases that store triples as facts wherein conflicts are resolved at write time or by last-write-wins semantics, the present invention stores assertions as signed evidence with authority weighting, preserving contradictions and enabling lens-based resolution at query time."
Category 5: Probabilistic Databases (Academic) — CLOSEST PRIOR ART
Relevant Prior Art:
- Trio (Stanford, 2006-2009)
- MayBMS (Cornell, 2005-2010)
- MCDB (University of Florida and IBM Almaden, 2008)
What They Teach:
- Uncertainty representation in databases
- Probabilistic query processing
- Lineage tracking
What They Do NOT Teach:
- Source class authority hierarchy
- Cryptographic signatures on tuples
- Trust Pack personalization
- Semantic decay by source tier
- Production-grade implementation
Critical Distinctions from Trio and MayBMS:
Academic probabilistic databases like Trio (Stanford) and MayBMS (Cornell) model tuple-level uncertainty: "Is this tuple present in the database?" or "What is the probability that this row exists?" They use possible worlds semantics to represent multiple potential database states.
Episteme models assertion-level conflict with authority weighting: "Multiple sources make different claims about the same fact, and we weight them by source authority." The fundamental difference:
| Aspect | Trio / MayBMS | Episteme |
|---|---|---|
| Uncertainty type | Tuple existence uncertainty | Competing claims about facts |
| Weights represent | Probability of tuple existence | Authority of information source |
| Weight source | Statistical model | Source class hierarchy |
| Weight stability | Static probability | Decays based on source class half-life |
| Agent provenance | No agent binding | Cryptographic signatures from agents |
| Personalization | No | Trust Packs filter by trusted agents |
| Invalidation | No cascade mechanism | Dependency graph traversal |
| Implementation | Academic prototype | Production-grade with WAL, indexes |
Specific Trio Distinction: Trio represents uncertainty with (data, lineage, probability) triples. The probability is a static value representing belief that the data is correct. Episteme's assertions have authority weights derived from source class (structural, not statistical), and their effective confidence decays over time according to the source class half-life. A Trio tuple with probability 0.8 remains 0.8 forever; an Episteme Tier-5 (Anecdotal) assertion with the same initial confidence of 0.8 decays to 0.4 effective confidence after 30 days, which is one half-life for that tier.
Specific MayBMS Distinction: MayBMS uses U-relations (uncertain relations) with probability distributions over attribute values. It supports possible worlds queries but has no concept of source authority, agent signatures, or Trust Pack filtering. MayBMS could not answer "What do Mayo Clinic doctors believe?" because it has no agent identity model.
Specification Language:
"While academic probabilistic databases such as Trio (Stanford) and MayBMS (Cornell) model tuple-level uncertainty using possible worlds semantics, the present invention models assertion-level conflict with source authority weighting. Unlike Trio where probability represents statistical belief in tuple existence, Episteme's authority weights represent the structural credibility of the information source (regulatory vs. anecdotal) and decay over time based on source class half-life. Unlike MayBMS which has no agent identity model, Episteme binds assertions to agents via Ed25519 cryptographic signatures and enables Trust Pack filtering to answer queries like 'What do trusted experts in domain X believe?' These distinctions transform the system from an uncertainty model to an epistemics model."
Prior Art Gap Analysis
| Feature | Traditional DB | Event Sourcing | Blockchain | Knowledge Graph | Probabilistic DB | Episteme |
|---|---|---|---|---|---|---|
| Store contradictions | No | No | No | No | Yes | Yes |
| Source class hierarchy | No | No | No | No | No | Yes |
| Authority weighting | No | No | No | No | Partial | Yes |
| Semantic decay | No | No | No | No | No | Yes |
| Query-time resolution | No | Partial | No | No | Yes | Yes |
| Trust Pack filtering | No | No | No | No | No | Yes |
| Cryptographic signatures | No | No | Yes | No | No | Yes |
| Invalidation cascades | Manual | Manual | No | Manual | No | Yes |
§101 Prosecution Strategy
Primary Argument: Technical Improvement to Database Technology
Per Enfish, LLC v. Microsoft Corp., 822 F.3d 1327 (Fed. Cir. 2016), improvements to database architecture are patent-eligible. The claims should be framed as:
"The present invention improves database technology itself by providing a new data model that stores conflicting assertions structurally and resolves them at query time, rather than forcing resolution at write time as required by traditional databases. This is a fundamental change to how databases store and retrieve data, analogous to the self-referential table structure found eligible in Enfish."
Step 2A, Prong One: Not an Abstract Idea
The claims are not directed to an abstract idea. They recite a specific database architecture with:
- Specific data structures: Signed assertions with source class, decay half-life, Ed25519 cryptographic signatures, stored in a BLAKE3 content-addressed Merkle DAG
- Specific algorithms: Lens-based resolution using exponential decay formula, invalidation cascade via BFS traversal, Shannon entropy conflict scoring, roaring bitmap intersection for Trust Pack filtering
- Specific storage layout: Write-ahead log with fsync durability, compound indexes keyed by subject-predicate pairs, materialized view cache
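To make the Shannon entropy conflict score concrete, the sketch below computes entropy over the weight mass behind each competing value for one subject-predicate pair. It is an illustration under assumptions: the input shape (value, weight pairs) and the function name are hypothetical, and the specification may normalize or weight differently.

```python
import math
from collections import defaultdict


def conflict_entropy(weighted_values) -> float:
    """Shannon entropy, in bits, of the weight distribution over competing values.

    0.0 means all weight backs one value; higher scores mean more disagreement.
    """
    totals = defaultdict(float)
    for value, weight in weighted_values:
        if weight < 0:
            raise ValueError("weights must be non-negative")
        totals[value] += weight

    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0

    entropy = 0.0
    for mass in totals.values():
        if mass > 0:
            p = mass / grand_total
            entropy -= p * math.log2(p)
    return entropy


unanimous = conflict_entropy([("approved", 1.0), ("approved", 0.5)])
even_split = conflict_entropy([("approved", 1.0), ("rejected", 1.0)])
lopsided = conflict_entropy([("approved", 9.0), ("rejected", 1.0)])
```

An even two-way split scores one full bit, while a lopsided cluster scores lower, so the score can rank clusters by how contested they are.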
Cannot Be Performed Mentally: The claims recite operations that cannot practically be performed in the human mind:
- Traversing thousands of assertions with authority weighting in sub-millisecond time
- Computing Shannon entropy conflict scores across assertion clusters
- Propagating invalidation cascades through a dependency graph via BFS
- Applying exponential decay based on source class half-life across all candidates
- Performing roaring bitmap intersection for Trust Pack filtering
Cite: Enfish v. Microsoft (Fed. Cir. 2016): Database architecture improvements are patent-eligible.
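The invalidation cascade named in the list above can be sketched as a breadth-first traversal over a dependency graph. This is a minimal illustration, assuming the graph is available as a mapping from each assertion ID to the IDs that depend on it; the IDs and the function name are hypothetical.

```python
from collections import deque


def invalidation_cascade(dependents: dict, root: str) -> list:
    """Return `root` plus every assertion that transitively depends on it, in BFS order."""
    invalidated = [root]
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in seen:  # guard against diamonds and cycles
                seen.add(child)
                invalidated.append(child)
                queue.append(child)
    return invalidated


# Hypothetical graph: a2 and a3 cite a1, a4 cites both a2 and a3, a5 is unrelated.
dependents = {
    "a1": ["a2", "a3"],
    "a2": ["a4"],
    "a3": ["a4"],
    "a4": [],
    "a5": [],
}

affected = invalidation_cascade(dependents, "a1")
```

The `seen` set ensures each assertion is visited once even when several paths reach it, so the traversal is linear in the number of reachable nodes and edges.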
Step 2A, Prong Two: Practical Application
The claims integrate any alleged abstract idea into a practical application by providing a specific technical solution to a specific technical problem:
- Technical Problem: Traditional databases cannot structurally model epistemic uncertainty. They force a single value per attribute, losing the signal when sources disagree.
- Technical Solution: Authority-weighted assertions stored in a content-addressed Merkle DAG, resolved at query time via lens algorithms using specific formulas (exponential decay, Shannon entropy, bitmap intersection).
The improvement is to the database technology itself, not merely using a database to perform an abstract task.
Cite: Core Wireless Licensing S.A.R.L. v. LG Electronics, Inc., 880 F.3d 1356 (Fed. Cir. 2018): Claims providing specific technical improvements are not abstract.
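Query-time resolution can be illustrated with a small lens that scores each competing value by its decayed authority weight. The sketch below is one possible lens under stated assumptions: the authority weights, half-lives, field names, and the sum-then-max rule are hypothetical choices for illustration, not the specification's algorithm.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    value: str
    authority_weight: float  # derived from source class (assumed 0..1 scale)
    half_life_days: float    # assumed per-source-class half-life
    age_days: float


def authority_lens(candidates):
    """Sum decayed authority weight per value; return the winning value and all scores."""
    if not candidates:
        raise ValueError("no candidate assertions to resolve")
    scores = defaultdict(float)
    for a in candidates:
        decayed = a.authority_weight * 0.5 ** (a.age_days / a.half_life_days)
        scores[a.value] += decayed
    winner = max(scores, key=scores.get)
    return winner, dict(scores)


# Conflicting assertions about one subject-predicate pair, all retained in storage.
candidates = [
    Assertion("safe", 0.9, half_life_days=3650.0, age_days=365.0),  # regulatory, slow decay
    Assertion("unsafe", 0.8, half_life_days=30.0, age_days=60.0),   # anecdotal, two half-lives old
    Assertion("unsafe", 0.8, half_life_days=30.0, age_days=30.0),   # anecdotal, one half-life old
]

winner, scores = authority_lens(candidates)
```

The losing value keeps its score in the result, so a different lens or a later query can reach a different answer from the same stored assertions.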
Step 2B: Significantly More (Berkheimer Argument)
The ordered combination of elements is not well-understood, routine, or conventional:
- Combination 1: BLAKE3 content-addressed storage + Ed25519 signatures + source class hierarchy + semantic decay
- Combination 2: Append-only Merkle DAG + compound indexes + materialized views + lens resolution
- Combination 3: Roaring bitmap Trust Packs + ballot box voting + invalidation cascades + query audit
No prior art teaches these combinations. Under Berkheimer v. HP Inc., 881 F.3d 1360 (Fed. Cir. 2018), the conventional nature of claim elements is a factual question. The examiner must provide evidence that this specific combination is conventional, and no such evidence exists because:
- No production database uses source class hierarchies with decay half-lives
- No database combines Trust Pack bitmap filtering with lens-based resolution
- No system provides invalidation cascades through a signed assertion dependency graph
Evidentiary Support:
- Consider a Rule 132 declaration (37 C.F.R. § 1.132) from a person having ordinary skill in the art (PHOSITA) attesting to the technical improvement
- Specification benchmarks demonstrating sub-millisecond resolution latency
- Prior art search showing no combined teaching of the claimed features
Supporting Documents
| Document | Purpose |
|---|---|
| patent-specification.md | Technical detail: data structures, algorithms, benchmarks |
| patent-figures.md | Descriptions of required patent figures |
Revision History
| Date | Revision | Changes |
|---|---|---|
| 2026-02-04 | Initial | First draft with 5 independent claims and 25 dependent claims |
| 2026-02-04 | Rev 2 | Strengthened per counsel analysis: (1) Added technical implementation details to Claim 1 (WAL, BLAKE3, compound index); (2) Strengthened Claim 4 with roaring bitmap implementation; (3) Added Independent Claim 6 (Query Audit Trail) and Claim 7 (Content-Addressed Merkle DAG); (4) Added BFS traversal for invalidation cascades (Claim 16); (5) Added fallback position claims 34-38 (PostgreSQL, Redis, WASM, vector embeddings); (6) Expanded prior art distinctions for Trio/MayBMS and event sourcing; (7) Enhanced §101 strategy with specific technical arguments |