stemedb/future-vision.md

# Vision: Epistemic Logits (The Neuro-Symbolic Cortex)

> **Status:** Vision / L9 Roadmap
> **Target:** Solves "Intrinsic Hallucination"
> **Core Concept:** StemeDB is no longer just a database we query; it is a constraint layer applied to the model's probability distribution during inference.

---

## 1. The Problem: The "RAG Ceiling"

Current architectures (including our own Aphoria/ADK stack) rely on **Retrieval-Augmented Generation (RAG)**. This is a "Glass Box" system, but it is composed of two disconnected brains:

1.  **The Retriever (StemeDB):** Knows what is true, what is conflicted, and who said what.
2.  **The Generator (LLM):** Knows how to predict the next token based on statistical patterns.

In our current architecture, we paste the Truth (1) into the Context Window of the Generator (2) and *hope* the Generator attends to it.

**The Failure Mode:** The Generator can still ignore the context. It can hallucinate. It can state a high-conflict fact with absolute certainty ("X is true") instead of qualified uncertainty ("Some sources claim X").

**We cannot fix this by prompting. We must fix it by math.**

---

## 2. The Solution: Epistemic Logits

**Epistemic Logits** is a decoding strategy that modifies the probability distribution of the LLM's output layer in real-time, based on the `ConflictScore` and `TrustRank` of the concepts being generated.

We move StemeDB from the **Input Layer** (Prompt) to the **Activation Layer** (Logits).

### The Core Equation

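$$ P_{final}(token) = \frac{P_{model}(token) \times E(token \mid Subject, Predicate)}{\sum_{t} P_{model}(t) \times E(t \mid Subject, Predicate)} $$

Where $E$ is the **Epistemic Function**, a per-token weight in $[0, 1]$ computed from the epistemic state of the current (Subject, Predicate) pair. The denominator renormalizes the weighted values so that $P_{final}$ remains a probability distribution.
*   If `ConflictScore > 0.8` (High Disagreement) AND `Token` implies certainty ("is", "proven", "fact"), then $E \to 0$ (Penalty).
*   If `ConflictScore > 0.8` AND `Token` implies uncertainty ("reported", "alleged", "contested"), then $E \to 1$ (Boost: the token is not penalized, so it gains probability mass after renormalization).
*   If `SourceTier` is Low (Anecdotal) AND `Time` is old (Decayed), then $E \to 0$.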

**Result:** Once $E$ drives the probability of a certainty token to zero, the sampler cannot select it, so the model *physically cannot* state a contested claim as a fact. It effectively has a "physics engine" for Truth.
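
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the StemeDB implementation: the token list, the `eps` floor, and the function names are all assumptions made for the example. Scaling by $E$ leaves weights that no longer sum to one, so the sketch renormalizes afterwards.

```python
# Hypothetical word list; a real system would derive it from StemeDB.
CERTAINTY_TOKENS = {"is", "proven", "fact"}
CONFLICT_THRESHOLD = 0.8  # threshold taken from the rules above


def epistemic_weight(token: str, conflict_score: float, eps: float = 1e-6) -> float:
    """Return E for one token: near 0 to penalize, 1.0 to leave it untouched."""
    if conflict_score > CONFLICT_THRESHOLD and token in CERTAINTY_TOKENS:
        return eps
    return 1.0


def apply_epistemic_weights(probs: dict[str, float], conflict_score: float) -> dict[str, float]:
    """Scale each probability by E, then renormalize so the result sums to 1."""
    weighted = {tok: p * epistemic_weight(tok, conflict_score) for tok, p in probs.items()}
    total = sum(weighted.values())
    return {tok: w / total for tok, w in weighted.items()}


# Toy next-token distribution for a highly contested subject.
model_probs = {"is": 0.6, "reported": 0.3, "alleged": 0.1}
final_probs = apply_epistemic_weights(model_probs, conflict_score=0.9)
```

In this toy case the certainty token "is" falls to a negligible probability, and the mass it held is redistributed to "reported" and "alleged" in proportion to their original probabilities.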

---

## 3. Architecture: The Neuro-Symbolic Stack

```ascii
[ User Query ]
      │
      ▼
[ 1. Semantic Router ] ───► [ StemeDB (The Graph) ]
      │                          │
      │ (Context)                │ (Constraints & Scores)
      ▼                          ▼
[ 2. LLM Core ]          [ 3. Epistemic Decoder ]
(Transformer)            (Logit Processor)
      │                          │
      └──► [ Raw Logits ] ──────►│
                                 │ ◄── "Don't say 'proven' if Conflict > 0.8"
                                 │
                                 ▼
                         [ Final Token ]
```

### Component 1: The Lookahead Mapper
To constrain logits, we must know what the model is *about* to say. We implement a lightweight "Concept Probe" (a small BERT model or sparse autoencoder) that runs parallel to the main LLM.
*   **Input:** Current generation stream.
*   **Output:** The `StemeDB::SubjectID` the stream is discussing.

### Component 2: The Constraint Projector
Once the Subject is identified, StemeDB projects the **Epistemic State** of that subject into a set of forbidden/boosted tokens.
*   *State:* `Semaglutide::has_side_effect` -> Conflict: High.
*   *Constraint:* Ban absolute assertions. Boost attribution markers ("According to FDA...", "Patients report...").
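
As a rough sketch of the projection step, the snippet below maps an epistemic state to a pair of token sets. The `EpistemicState` and `TokenConstraints` types, the word lists, and the 0.8 threshold are hypothetical names and values chosen for illustration; the document does not specify how StemeDB represents them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EpistemicState:
    """Hypothetical view of what StemeDB knows about one (subject, predicate)."""
    subject: str
    predicate: str
    conflict_score: float  # 0.0 = full agreement, 1.0 = maximal disagreement


@dataclass
class TokenConstraints:
    banned: set = field(default_factory=set)
    boosted: set = field(default_factory=set)


# Illustrative vocabularies, assumed for this example only.
ABSOLUTE_ASSERTIONS = {"proven", "definitely", "always"}
ATTRIBUTION_MARKERS = {"According", "Patients", "reported"}


def project_constraints(state: EpistemicState, threshold: float = 0.8) -> TokenConstraints:
    """Ban absolute assertions and boost attribution markers when conflict is high."""
    if state.conflict_score > threshold:
        return TokenConstraints(banned=set(ABSOLUTE_ASSERTIONS),
                                boosted=set(ATTRIBUTION_MARKERS))
    return TokenConstraints()  # low conflict: no constraints


state = EpistemicState("Semaglutide", "has_side_effect", conflict_score=0.9)
constraints = project_constraints(state)
```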

### Component 3: The Reward Loop (RLHF on Reality)
We use the `VoteStore` not just for consensus, but to train a **Reward Model**.
*   **Data:** Millions of historical votes where Agents disagreed.
*   **Training:** Fine-tune the LLM to prefer outputs that align with the *weighted consensus* of the Graph.
*   **Outcome:** The model "intuitively" knows which sources are trustworthy (Tier 0/1) without needing RAG retrieval for every fact.

---

## 4. Implementation Roadmap (The Path to L9)

### Phase 1: Structured Decoding (The "Guardrails")
*Integrate StemeDB with grammar-constrained generation libraries (like `guidance` or `outlines`).*

*   **Mechanism:** Force the LLM to output a citation struct `{ claim: "...", source_id: "...", confidence: 0.0-1.0 }` for every assertion.
*   **Validation:** If the generated `source_id` does not exist in StemeDB, or if the `confidence` is inconsistent with the value derived from the `VoteStore`, reject the token stream and regenerate.
*   **Deliverable:** `crates/stemedb-guidance`: A Rust binding for grammar-constrained sampling backed by the KV store.
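
The validation step might look like the following sketch. The `Citation` fields mirror the struct described above; everything else is an assumption for illustration, including the in-memory dict standing in for the `VoteStore`, the `tolerance` parameter, and the idea that "match" means "within a tolerance".

```python
from dataclasses import dataclass


@dataclass
class Citation:
    claim: str
    source_id: str
    confidence: float  # expected to lie in [0.0, 1.0]


def validate_citation(citation: Citation,
                      known_confidence: dict[str, float],
                      tolerance: float = 0.1) -> bool:
    """Return True only if the source exists and the stated confidence is plausible.

    `known_confidence` is a hypothetical stand-in for a VoteStore lookup,
    mapping source_id to the confidence the store supports.
    """
    expected = known_confidence.get(citation.source_id)
    if expected is None:
        return False  # unknown source: reject and regenerate
    if not 0.0 <= citation.confidence <= 1.0:
        return False  # malformed confidence value
    return abs(citation.confidence - expected) <= tolerance


store = {"fda-001": 0.9}  # made-up source id and score
ok = validate_citation(Citation("X causes Y", "fda-001", 0.85), store)
unknown = validate_citation(Citation("X causes Y", "blog-42", 0.85), store)
overconfident = validate_citation(Citation("X causes Y", "fda-001", 0.3), store)
```

A caller would regenerate whenever this function returns `False`. Whether a mismatch should trigger a full regeneration or only a local repair is left open here.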

### Phase 2: DPO Pipeline (The "Training")
*Direct Preference Optimization using StemeDB history.*

*   **Mechanism:** Export the `VoteStore` history as `(Prompt, Chosen, Rejected)` tuples.
    *   *Chosen:* An assertion supported by Tier 0 (Regulatory) sources.
    *   *Rejected:* A conflicting assertion supported only by Tier 5 (Anecdotal) sources.
*   **Action:** Fine-tune a Llama-3 8B model on this dataset.
*   **Deliverable:** `crates/stemedb-rlhf`: A pipeline that turns WAL segments into HuggingFace datasets.
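
A minimal sketch of the export step is shown below. It assumes that assertions about the same subject from Tier 0 and Tier 5 sources conflict, and the `Assertion` type and prompt template are hypothetical. The `prompt` / `chosen` / `rejected` keys follow a naming convention common in preference-tuning tooling.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Assertion:
    subject: str
    text: str
    source_tier: int  # 0 = Regulatory ... 5 = Anecdotal


def build_preference_pairs(assertions: list[Assertion]) -> list[dict[str, str]]:
    """Pair every Tier 0 assertion with every Tier 5 assertion on the same subject."""
    by_subject: dict[str, list[Assertion]] = {}
    for a in assertions:
        by_subject.setdefault(a.subject, []).append(a)

    pairs = []
    for subject, group in by_subject.items():
        chosen = [a for a in group if a.source_tier == 0]
        rejected = [a for a in group if a.source_tier == 5]
        for c, r in product(chosen, rejected):
            pairs.append({
                "prompt": f"What is known about {subject}?",  # hypothetical template
                "chosen": c.text,
                "rejected": r.text,
            })
    return pairs


history = [
    Assertion("drug-x", "Regulatory filings list nausea as a side effect.", 0),
    Assertion("drug-x", "It has no side effects at all.", 5),
    Assertion("drug-y", "Approved for adults only.", 0),  # no Tier 5 counterpart
]
pairs = build_preference_pairs(history)
```

Subjects without both a Tier 0 and a Tier 5 assertion produce no pairs, which keeps the dataset limited to genuine disagreements.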

### Phase 3: The Logit Processor (The "Cortex")
*Real-time intervention.*

*   **Mechanism:** A custom sampler (integrated into `llama.cpp` or `vLLM`) that queries StemeDB's `MaterializedView` at every decoding step, within a sub-millisecond lookup budget.
*   **Optimization:** This requires the `HybridStore` to be memory-mapped into the inference engine's address space, so that lookups avoid network and IPC round trips.
*   **Deliverable:** `episteme-inference`: A standalone inference server that exposes an OpenAI-compatible API but enforces StemeDB truth constraints.
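
In logit space, the intervention amounts to adding a bias before the softmax: a banned token gets a logit of negative infinity, and a boosted token gets a positive offset. The sketch below is engine-agnostic plain Python and does not use the actual `llama.cpp` or `vLLM` sampler interfaces; the function names and the `boost` value are assumptions.

```python
import math


def process_logits(logits: dict[str, float],
                   banned: set[str],
                   boosted: set[str],
                   boost: float = 2.0) -> dict[str, float]:
    """Mask banned tokens with -inf and add a fixed bias to boosted tokens."""
    out = {}
    for tok, logit in logits.items():
        if tok in banned:
            out[tok] = -math.inf
        elif tok in boosted:
            out[tok] = logit + boost
        else:
            out[tok] = logit
    return out


def softmax(logits: dict[str, float]) -> dict[str, float]:
    """Numerically stable softmax; assumes at least one token is not banned."""
    peak = max(v for v in logits.values() if v != -math.inf)
    exps = {tok: math.exp(v - peak) for tok, v in logits.items()}  # exp(-inf) == 0.0
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


raw = {"proven": 3.0, "reported": 1.0, "the": 1.0}
adjusted = process_logits(raw, banned={"proven"}, boosted={"reported"})
probs = softmax(adjusted)
```

Masking with negative infinity gives a banned token exactly zero probability after the softmax. That is the hard-constraint behavior this phase aims for, as opposed to a soft penalty that merely makes the token unlikely.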

---

## 5. The Impact

When we achieve Epistemic Logits, we solve the **Liability Gap**.

Currently, enterprises cannot safely deploy autonomous agents for critical tasks (Medical, Legal, Finance) because they cannot guarantee the output.

With Epistemic Logits, we provide a mathematical guarantee: **"This system is incapable of stating a claim with higher confidence than the underlying evidence supports."**

This transforms AI from a creative writing tool into a **fiduciary instrument**.