# Episteme: A Database for Claims, Not Facts

Episteme stores **assertions** - claims about the world with provenance, confidence scores, and authority metadata. Unlike traditional databases that force you to pick "the right answer," Episteme holds all competing claims simultaneously and lets you resolve disagreements at query time.

**Think of it as "Git for Truth":** Just as Git lets developers work on different versions of code and merge them intelligently, Episteme lets AI agents and humans contribute different observations about the world and resolve conflicts based on context.

---

## The Core Problem: Databases Erase Disagreement

### Example: Semaglutide (Ozempic) and Gastroparesis

By early 2024, three sources with very different levels of authority were saying different things about semaglutide and gastroparesis:

| Source | Says | Authority |
|--------|------|-----------|
| FDA label | "Gastroparesis rare" | Tier 0: Regulatory |
| STEP 1 clinical trial | "No gastroparesis signal detected" | Tier 1: Clinical |
| Patient communities (Reddit, forums) | "Stomach paralysis, can't eat, hospitalized" (500+ reports) | Tier 4: Community |

**Traditional Database (Postgres):**
```sql
UPDATE drugs SET gastroparesis_risk = 'rare' WHERE name = 'semaglutide';
-- The clinical trial data is erased.
-- The patient reports never make it in.
-- No record of who said what or when.
```

**Result:** A doctor queries the database, sees "rare," and prescribes confidently. Suppose the FDA later strengthens the label after investigating the patient clusters. The evidence existed all along, but the database erased the disagreement that pointed to it.

**Episteme Approach:**
```bash
POST /v1/assert   # FDA position
POST /v1/assert   # Clinical trial findings
POST /v1/assert   # Patient reports

GET /v1/skeptic?subject=drugs/semaglutide&predicate=gastroparesis_risk
# → Returns conflict_score: 0.85, all three claims, authority weights
```

**Result:** The doctor sees the conflict *before* prescribing. The disagreement itself is the insight.

---

## What IS an Assertion?

An assertion is a **structured claim** with these components:

```json
{
  "subject": "drugs/semaglutide",           // WHAT you're talking about
  "predicate": "weight_loss_percentage",    // WHICH property
  "object": {"type": "Number", "value": 15.0}, // WHAT you claim
  "confidence": 0.95,                       // HOW sure you are (0.0-1.0)
  "source_class": "Clinical",               // WHY it's trustworthy
  "source_hash": "step1_trial_2022...",     // WHERE it came from
  "signatures": [...]                       // WHO vouches for it
}
```
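The same structure can be expressed as a typed record. This is a minimal Python sketch. It assumes the six source classes from the tier table below, ordered from tier 0 to tier 5. The `object` field is flattened into `object_type` and `value`, and the validation rules are illustrative assumptions, not the server's actual checks.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Source classes from the tier table, ordered tier 0 (Regulatory) to tier 5 (Anecdotal).
SOURCE_CLASSES = ("Regulatory", "Clinical", "Observational", "Expert", "Community", "Anecdotal")
# Python types accepted for each assertion object type (an assumed mapping).
OBJECT_TYPES = {"Number": (int, float), "Text": (str,), "Boolean": (bool,)}


@dataclass(frozen=True)
class Assertion:
    subject: str                      # WHAT: hierarchical path, e.g. "drugs/semaglutide"
    predicate: str                    # WHICH property
    object_type: str                  # "Number", "Text" or "Boolean"
    value: Union[float, str, bool]    # WHAT is claimed
    confidence: float                 # HOW sure, 0.0-1.0
    source_class: str                 # WHY it is trustworthy
    source_hash: str                  # WHERE it came from
    signatures: Tuple[str, ...] = ()  # WHO vouches for it

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0, 1], got {self.confidence}")
        if self.source_class not in SOURCE_CLASSES:
            raise ValueError(f"unknown source_class: {self.source_class!r}")
        if self.object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {self.object_type!r}")
        # bool is a subclass of int in Python, so True must not pass as a Number.
        if self.object_type == "Number" and isinstance(self.value, bool):
            raise TypeError("a Number value must not be a bool")
        if not isinstance(self.value, OBJECT_TYPES[self.object_type]):
            raise TypeError(f"{self.object_type} value has type {type(self.value).__name__}")

    @property
    def tier(self) -> int:
        """Authority tier: 0 (Regulatory) through 5 (Anecdotal)."""
        return SOURCE_CLASSES.index(self.source_class)


step1 = Assertion("drugs/semaglutide", "weight_loss_percentage", "Number", 15.0,
                  0.95, "Clinical", "step1_trial_2022...")
print(step1.tier)  # 1
```

Making the record frozen mirrors the idea that an assertion is never edited in place. New evidence arrives as a new assertion.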

### Assertions vs. Facts

| Concept | What It Means | Example |
|---------|---------------|---------|
| **Fact** | The "one true value" | `weight_loss = 15%` |
| **Assertion** | "Agent X claims Y based on source Z" | "STEP 1 trial (Clinical, conf=0.95) asserts weight_loss = 15%" |

**Key Insight:** Episteme doesn't store "semaglutide causes 15% weight loss" as a fact. It stores "the STEP 1 trial asserted 15% weight loss with confidence 0.95," stamped with the time that assertion was recorded. Later, the STEP 1 extension finds that patients regain two-thirds of the lost weight within a year of stopping. That finding becomes a *new assertion* that coexists with the original.

---

## Real-World Example: Modeling Semaglutide Knowledge

### 1. Efficacy Claims (Quantitative Evidence)

**Claim:** Semaglutide 2.4 mg produces roughly 15% mean weight loss over 68 weeks in adults with overweight or obesity (STEP 1).

```bash
# source_hash is the BLAKE3 hash of the STEP 1 trial PDF (truncated here).
# Shell comments cannot go inside the JSON body passed to -d.
curl -X POST http://localhost:18180/v1/assert \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "drugs/semaglutide/efficacy",
    "predicate": "weight_loss_percentage_mean",
    "object": {"type": "Number", "value": 14.9},
    "confidence": 0.95,
    "source_class": "Clinical",
    "source_hash": "7a3f8b2e...",
    "source_metadata": "{\"trial_name\": \"STEP 1\", \"n\": 1961, \"year\": 2021}",
    "signatures": [{
      "agent_id": "clinical_trials_analyzer_v2",
      "signature": "ed25519_sig...",
      "version": 2
    }]
  }'
```

**Why this structure matters:**
- `subject`: Hierarchical path (`drugs/semaglutide/efficacy`) enables domain queries
- `predicate`: Specific metric (`weight_loss_percentage_mean`) allows aggregation
- `object.type`: Typed value (`Number`) enables mathematical operations
- `source_metadata`: Structured context (trial details) for reproducibility
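Hierarchical subjects make domain queries a matter of path-prefix matching. Below is a minimal sketch of one plausible matching rule, written as a hypothetical `subject_matches` helper. It assumes that a plain subject matches itself and everything beneath it, and that a trailing `/*` matches descendants only. The server's real wildcard semantics may differ.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """Hypothetical matcher: is `subject` equal to, or beneath, `pattern`?

    A trailing "/*" means "any descendant" and excludes the exact subject.
    Matching compares whole path segments, so "drugs/sema" does not match
    "drugs/semaglutide".
    """
    descendants_only = pattern.endswith("/*")
    prefix = pattern[:-2] if descendants_only else pattern
    p_parts = prefix.strip("/").split("/")
    s_parts = subject.strip("/").split("/")
    if s_parts[: len(p_parts)] != p_parts:
        return False
    if descendants_only:
        return len(s_parts) > len(p_parts)
    return True


subjects = [
    "drugs/semaglutide/efficacy/obesity",
    "drugs/semaglutide/efficacy/diabetes",
    "drugs/semaglutide/safety/thyroid",
]
print([s for s in subjects if subject_matches("drugs/semaglutide/efficacy/*", s)])
```

Comparing segments instead of raw string prefixes avoids accidental matches between sibling names that share leading characters.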

### 2. The Rebound Effect (Conflicting Longitudinal Data)

**Claim:** Patients regain two-thirds of lost weight after discontinuation.

```bash
# source_hash: STEP 1 extension (truncated here).
# signatures elided; same shape as in the previous example.
curl -X POST http://localhost:18180/v1/assert \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "drugs/semaglutide/persistence",
    "predicate": "weight_regain_ratio_1yr",
    "object": {"type": "Number", "value": 0.67},
    "confidence": 0.9,
    "source_class": "Clinical",
    "source_hash": "4c9d2a1f...",
    "source_metadata": "{\"trial_name\": \"STEP 1 Extension\", \"followup_months\": 12, \"year\": 2022}",
    "signatures": [...]
  }'
```

**This complicates the initial "transformative weight loss" narrative.** Both assertions coexist, and neither overwrites the other. Query with `lens=Recency` to get the latest understanding, or with `lens=Consensus` to see whether most sources agree.

### 3. Safety Signals (Authority Tier Hierarchy)

**Three-tier safety picture:**

#### Tier 0: Regulatory (FDA Label - Never Fades)
```json
{
  "subject": "drugs/semaglutide/safety/thyroid",
  "predicate": "boxed_warning",
  "object": {"type": "Text", "value": "Medullary thyroid carcinoma risk in rodents"},
  "confidence": 1.0,
  "source_class": "Regulatory",
  "source_hash": "fda_label_2024..."
}
```

#### Tier 1: Clinical (2-Year Half-Life)
```json
{
  "subject": "drugs/semaglutide/safety/thyroid",
  "predicate": "thyroid_cancer_risk_humans",
  "object": {"type": "Text", "value": "no_consistent_association"},
  "confidence": 0.85,
  "source_class": "Clinical",
  "source_hash": "ema_review_2023..."
}
```

#### Tier 4: Community (3-Month Half-Life)
```json
{
  "subject": "drugs/semaglutide/adverse_events",
  "predicate": "gastroparesis_reports",
  "object": {"type": "Number", "value": 500},
  "confidence": 0.3,
  "source_class": "Community",
  "source_hash": "reddit_cluster_feb2024...",
  "source_metadata": "{\"platform\": \"reddit\", \"timeframe_days\": 90}"
}
```

**Query for conflict:**
```bash
curl "http://localhost:18180/v1/skeptic?subject=drugs/semaglutide/safety&predicate=*"
```

**Response:**
```json
{
  "conflict_score": 0.82,
  "conflicts": [
    {
      "subject": "drugs/semaglutide/safety/thyroid",
      "predicate": "cancer_risk",
      "claims": [
        {
          "value": "risk in rodents",
          "source_tier": "Regulatory",
          "confidence": 1.0,
          "authority_weight": 1.0
        },
        {
          "value": "no_consistent_association",
          "source_tier": "Clinical",
          "confidence": 0.85,
          "authority_weight": 0.8
        }
      ],
      "interpretation": "Regulatory warning based on animal models contradicts human epidemiology. Both valid - animal models guide precaution, human data shows absence of signal to date."
    }
  ]
}
```
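This document does not define how `conflict_score` is computed. The sketch below shows one plausible formulation, and it is an assumption throughout. Claims are grouped by `(subject, predicate)`, and each claim is weighted by `confidence × authority_weight`. The score is 0.0 when all weight backs one value and 1.0 when weight is split evenly across the distinct values.

```python
from collections import defaultdict


def conflict_scores(claims):
    """Hypothetical conflict score per (subject, predicate) group.

    Each claim is a dict with subject, predicate, value, confidence and
    authority_weight. Returns {(subject, predicate): score in [0, 1]}.
    """
    groups = defaultdict(lambda: defaultdict(float))
    for c in claims:
        key = (c["subject"], c["predicate"])
        groups[key][c["value"]] += c["confidence"] * c["authority_weight"]

    scores = {}
    for key, weight_by_value in groups.items():
        k = len(weight_by_value)
        total = sum(weight_by_value.values())
        if k < 2 or total == 0:
            scores[key] = 0.0  # a single value (or no weight) means no conflict
            continue
        top_share = max(weight_by_value.values()) / total
        # Normalise so an even split across k values scores exactly 1.0.
        scores[key] = (1 - top_share) / (1 - 1 / k)
    return scores


thyroid = [
    {"subject": "drugs/semaglutide/safety/thyroid", "predicate": "cancer_risk",
     "value": "risk in rodents", "confidence": 1.0, "authority_weight": 1.0},
    {"subject": "drugs/semaglutide/safety/thyroid", "predicate": "cancer_risk",
     "value": "no_consistent_association", "confidence": 0.85, "authority_weight": 0.8},
]
print(conflict_scores(thyroid))
```

For the two thyroid claims in the sample response, this sketch gives about 0.81. It is not claimed to reproduce the server's `conflict_score`. Because claims only compete within a `(subject, predicate)` group, claims filed under different subjects never conflict, which matches Pattern 1 below.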

---

## Understanding Source Authority Tiers

Source tiers control how long assertions stay relevant (decay curves) and how much weight they carry in consensus.

| Tier | Source Type | Examples | Decay Half-Life | Use When |
|------|-------------|----------|-----------------|----------|
| **0: Regulatory** | Government/standards bodies | FDA label, SEC filing, ISO standard, RFC | Never fades | Official regulatory guidance |
| **1: Clinical** | Peer-reviewed trials | Phase III RCT, Cochrane review, NEJM publication | 2 years | Gold-standard clinical evidence |
| **2: Observational** | Real-world studies | Cohort studies, registry data | 1 year | Population-level observational data |
| **3: Expert** | Domain expert opinions | Doctor recommendations, analyst reports | 6 months | Professional judgment |
| **4: Community** | Patient registries, forums | Patient registry, professional network | 3 months | Aggregated community data |
| **5: Anecdotal** | Individual reports | Reddit post, Twitter thread, single patient | 30 days | Individual anecdotes, signals |

### Decay Curve Visualization

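Confidence over time, shown as the fraction of initial confidence remaining under the formula below (1 month taken as 30 days):

| Tier | Half-Life | 0 mo | 3 mo | 6 mo | 12 mo |
|------|-----------|------|------|------|-------|
| **0: Regulatory** (FDA label) | Never fades | 1.00 | 1.00 | 1.00 | 1.00 |
| **1: Clinical** (trial) | 2 years | 1.00 | 0.92 | 0.84 | 0.71 |
| **2: Observational** | 1 year | 1.00 | 0.84 | 0.71 | 0.50 |
| **3: Expert** opinion | 6 months | 1.00 | 0.71 | 0.50 | 0.25 |
| **4: Community** | 3 months | 1.00 | 0.50 | 0.25 | 0.06 |
| **5: Anecdotal** (Reddit post) | 30 days | 1.00 | 0.125 | 0.02 | <0.01 |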

**Example:** An FDA label from 2019 carries the same authority today. A Reddit post from 3 months ago has passed through three 30-day half-lives and retains only 1/8 (12.5%) of its initial confidence.

### The Math
```
confidence(t) = initial_confidence × 0.5^(t / half_life)

Example: Clinical trial with initial confidence 0.95
  At 1 year:  0.95 × 0.5^(1/2)  = 0.67
  At 2 years: 0.95 × 0.5^(2/2)  = 0.475
  At 4 years: 0.95 × 0.5^(4/2)  = 0.24
```
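The decay rule is easy to check in code. This is a minimal sketch, assuming half-lives are measured in days (1 year = 365 days, 1 month = 30 days) and that "never fades" is modelled as an infinite half-life. The `HALF_LIFE_DAYS` table and `decayed_confidence` function are illustrative names, not part of the Episteme API.

```python
import math

# Half-lives from the tier table, in days. Regulatory never fades.
HALF_LIFE_DAYS = {
    "Regulatory": math.inf,
    "Clinical": 2 * 365,
    "Observational": 365,
    "Expert": 6 * 30,
    "Community": 3 * 30,
    "Anecdotal": 30,
}


def decayed_confidence(initial: float, source_class: str, age_days: float) -> float:
    """confidence(t) = initial_confidence * 0.5 ** (t / half_life)"""
    half_life = HALF_LIFE_DAYS[source_class]
    if math.isinf(half_life):
        return initial  # Tier 0 assertions keep their confidence indefinitely
    return initial * 0.5 ** (age_days / half_life)


# Reproduce the clinical-trial example above at 1, 2 and 4 years.
for years in (1, 2, 4):
    print(years, round(decayed_confidence(0.95, "Clinical", years * 365), 2))
```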

**Critical Rule:** A million Tier 5 (Anecdotal) posts cannot outvote a single Tier 0 (Regulatory) assertion. But they can signal "something is happening here" that deserves investigation - exactly what happened with semaglutide gastroparesis reports.
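One way to make this rule count-insensitive is to let the best tier present decide, and keep dissenting claims as signals instead of votes. The sketch below is a hypothetical resolver under that assumption. Within the winning tier it picks the highest-confidence claim.

```python
TIER = {"Regulatory": 0, "Clinical": 1, "Observational": 2,
        "Expert": 3, "Community": 4, "Anecdotal": 5}


def resolve_by_authority(claims):
    """Return (winning_value, signals).

    Hypothetical rule: the best (lowest-numbered) tier present decides, however
    many lower-tier claims disagree. Claims that disagree with the winner are
    not discarded; they come back as signals worth investigating.
    """
    if not claims:
        raise ValueError("no claims to resolve")
    best_tier = min(TIER[c["source_class"]] for c in claims)
    top = [c for c in claims if TIER[c["source_class"]] == best_tier]
    winner = max(top, key=lambda c: c["confidence"])["value"]
    signals = [c for c in claims if c["value"] != winner]
    return winner, signals


claims = [{"value": "rare", "source_class": "Regulatory", "confidence": 1.0}]
claims += [{"value": "common", "source_class": "Anecdotal", "confidence": 0.9}] * 10_000
winner, signals = resolve_by_authority(claims)
print(winner, len(signals))
```

Ten thousand anecdotal claims do not change the answer, but all of them remain visible as signals.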

---

## Time-Travel Queries: "What Did We Know Then?"

### The Use Case
*"I started semaglutide in June 2023. What was the known safety profile at that time?"*

**Query as of June 15, 2023:**
```bash
curl "http://localhost:18180/v1/query?subject=drugs/semaglutide/safety&as_of=2023-06-15T00:00:00Z"
```

**Response:**
```json
{
  "subject": "drugs/semaglutide/safety",
  "as_of": "2023-06-15T00:00:00Z",
  "assertions": [
    {
      "predicate": "thyroid_warning",
      "value": "Boxed warning: thyroid tumors in rodents",
      "source_tier": "Regulatory",
      "confidence": 1.0,
      "timestamp": "2021-06-04T00:00:00Z"  // FDA approval date
    },
    {
      "predicate": "gastroparesis_signal",
      "value": "No gastroparesis signal detected",
      "source_tier": "Clinical",
      "confidence": 0.9,
      "timestamp": "2022-11-20T00:00:00Z"  // STEP 1 publication
    }
    // Community cluster reports (asserted 2024-01-28) are NOT included: they postdate the as_of date
  ]
}
```

**Query as of February 1, 2024:**
```bash
curl "http://localhost:18180/v1/query?subject=drugs/semaglutide/safety&as_of=2024-02-01T00:00:00Z"
```

**The response now includes the community reports:**
```json
{
  "assertions": [
    // ... previous assertions ...
    {
      "predicate": "gastroparesis_reports",
      "value": 500,
      "source_tier": "Community",
      "confidence": 0.3,
      "timestamp": "2024-01-28T00:00:00Z"  // Patient cluster detected
    }
  ],
  "conflict_detected": true,
  "conflict_score": 0.85
}
```
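Conceptually, an `as_of` query is a timestamp filter over an append-only history. Below is a minimal sketch using the three assertions from the responses above. It assumes the `timestamp` field is what the cutoff compares against, and the `as_of` function name is illustrative.

```python
from datetime import datetime


def parse_ts(iso: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(iso.replace("Z", "+00:00"))


def as_of(assertions, cutoff: str):
    """Return the assertions whose timestamp is at or before `cutoff`.

    Nothing is deleted or overwritten; later assertions simply fall outside
    earlier cutoffs.
    """
    limit = parse_ts(cutoff)
    return [a for a in assertions if parse_ts(a["timestamp"]) <= limit]


history = [
    {"predicate": "thyroid_warning", "source_tier": "Regulatory", "timestamp": "2021-06-04T00:00:00Z"},
    {"predicate": "gastroparesis_signal", "source_tier": "Clinical", "timestamp": "2022-11-20T00:00:00Z"},
    {"predicate": "gastroparesis_reports", "source_tier": "Community", "timestamp": "2024-01-28T00:00:00Z"},
]

for cutoff in ("2023-06-15T00:00:00Z", "2024-02-01T00:00:00Z"):
    print(cutoff, [a["predicate"] for a in as_of(history, cutoff)])
```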

**Why This Matters:**
- **Liability protection:** "Based on available evidence at decision time, no gastroparesis signal was known."
- **Learning:** Track how medical understanding evolved month-by-month.
- **AI audit trails:** "Why did the AI recommend this? Here's exactly what it knew on that date."

---

## Practical Guidance: Writing Good Assertions

### ✅ Good Assertion: Specific, Sourced, Falsifiable

```json
{
  "subject": "drugs/semaglutide/metabolic_effects",
  "predicate": "lean_mass_loss_percentage",
  "object": {"type": "Number", "value": 25.0},
  "confidence": 0.9,
  "source_class": "Clinical",
  "source_hash": "step1_body_composition_analysis_hash",
  "source_metadata": "{\"measurement_method\": \"DEXA_scan\", \"study\": \"STEP_1\", \"note\": \"Of total weight lost, 25% is lean tissue. However, lean-to-fat ratio improves.\"}"
}
```

**Why it's good:**
- **Specific predicate:** Not just "causes_muscle_loss" but "lean_mass_loss_percentage"
- **Quantitative:** 25% is verifiable, not subjective
- **Sourced:** Points to exact trial with measurement method
- **Contextualized:** Metadata notes the nuance (ratio still improves)

### ❌ Bad Assertion: Vague, Unsourced, Opinion

```json
{
  "subject": "drugs/semaglutide",
  "predicate": "is_good",
  "object": {"type": "Boolean", "value": true},
  "confidence": 1.0,
  "source_class": "Anecdotal",
  "source_hash": "my_opinion"
}
```

**Why it's bad:**
- **Vague predicate:** "is_good" is meaningless - good for what?
- **Subjective value:** Boolean opinion, not measurable
- **No provenance:** "my_opinion" isn't a real source
- **No context:** Why is it good? Good for whom?

### Decision Tree: Choosing Predicates

```
Is it a property that changes over time?
  ├─ YES → Use timestamped assertions
  │   Examples: "weight_loss_percentage", "market_cap_usd"
  │
  └─ NO → Use static assertions
      Examples: "chemical_structure", "fda_approval_date"

Is it measurable/quantitative?
  ├─ YES → Use Number type
  │   Examples: 15.0 (percentage), 1000000 (dollars)
  │
  └─ NO → Use Text or Boolean
      Examples: "gastroparesis" (Text), true (Boolean)

Does it involve relationships?
  ├─ YES → Use hierarchical subjects
  │   Examples: "drugs/semaglutide/interactions/sglt2_inhibitors"
  │
  └─ NO → Use flat subjects
      Examples: "drugs/semaglutide"
```
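The type and subject branches of this tree can be captured in two small helpers. `object_for` and `child_subject` are hypothetical names used only for illustration. The only subtlety is checking `bool` before numbers, because `bool` is a subclass of `int` in Python.

```python
def object_for(value):
    """Wrap a Python value in the typed `object` field used by assertions."""
    # Check bool first: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return {"type": "Boolean", "value": value}
    if isinstance(value, (int, float)):
        return {"type": "Number", "value": float(value)}
    if isinstance(value, str):
        return {"type": "Text", "value": value}
    raise TypeError(f"no assertion object type for {type(value).__name__}")


def child_subject(parent: str, *segments: str) -> str:
    """Build a hierarchical subject such as drugs/semaglutide/interactions/sglt2_inhibitors."""
    return "/".join([parent.strip("/"), *(s.strip("/") for s in segments)])


print(object_for(15), object_for("gastroparesis"), object_for(True))
print(child_subject("drugs/semaglutide", "interactions", "sglt2_inhibitors"))
```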

---

## Common Patterns

### Pattern 1: Conflicting Clinical Evidence

**Scenario:** Two trials in different populations report different results.

```bash
# Trial 1: Weight loss in obesity cohort
POST /v1/assert
{
  "subject": "drugs/semaglutide/efficacy/obesity",
  "predicate": "weight_loss_percentage",
  "object": {"type": "Number", "value": 14.9},
  "source_class": "Clinical",
  "source_metadata": "{\"trial\": \"STEP_1\", \"population\": \"obesity_without_diabetes\"}"
}

# Trial 2: Weight loss in T2D cohort
POST /v1/assert
{
  "subject": "drugs/semaglutide/efficacy/diabetes",
  "predicate": "weight_loss_percentage",
  "object": {"type": "Number", "value": 9.6},
  "source_class": "Clinical",
  "source_metadata": "{\"trial\": \"STEP_2\", \"population\": \"type_2_diabetes\"}"
}

# Query both
GET /v1/query?subject=drugs/semaglutide/efficacy/*&predicate=weight_loss_percentage
→ Returns both, no conflict (different subjects)
```

### Pattern 2: Signal Escalation (Anecdotal → Investigation)

**The gastroparesis detection story:**

```bash
# Month 1: First anecdotal reports (Tier 5)
POST /v1/assert {"subject": "drugs/semaglutide/signals", "predicate": "gastroparesis_reports", "object": {"type": "Number", "value": 50}, "source_class": "Anecdotal"}

# Month 3: Cluster detected (Tier 4)
POST /v1/assert {"subject": "drugs/semaglutide/signals", "predicate": "gastroparesis_reports", "object": {"type": "Number", "value": 500}, "source_class": "Community"}

# Month 6: Retrospective study (Tier 2)
POST /v1/assert {"subject": "drugs/semaglutide/adverse_events", "predicate": "gastroparesis_signal", "object": {"type": "Text", "value": "statistically_significant"}, "source_class": "Observational"}

# Month 12: FDA investigation (Tier 0)
POST /v1/assert {"subject": "drugs/semaglutide/safety", "predicate": "gastroparesis_warning", "object": {"type": "Text", "value": "under_investigation"}, "source_class": "Regulatory"}
```

**Escalation policy:**
```json
{
  "policy_name": "drug_safety_escalation",
  "rules": [
    {
      "condition": "Community reports > 100 AND no Clinical signal",
      "action": "flag_for_investigation"
    },
    {
      "condition": "Observational signal AND Regulatory silence",
      "action": "notify_pharmacovigilance_team"
    }
  ]
}
```
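The policy conditions above are written as prose strings. Here is a hypothetical evaluation of the same two rules, replaying the Pattern 2 timeline. It assumes an assertion is "about" a topic when the topic appears in its predicate. It reads "Community reports" as the largest Community `Number` value, and "no Clinical signal" and "Regulatory silence" as the absence of any Clinical or Regulatory assertion on the topic.

```python
def evaluate_escalation(assertions, topic):
    """Hypothetical evaluation of the drug_safety_escalation rules for one topic."""
    by_class = {}
    for a in assertions:
        if topic in a["predicate"]:
            by_class.setdefault(a["source_class"], []).append(a)

    community_reports = max(
        (a["object"]["value"] for a in by_class.get("Community", [])
         if a["object"]["type"] == "Number"),
        default=0,
    )
    actions = []
    # Rule 1: Community reports > 100 AND no Clinical signal
    if community_reports > 100 and not by_class.get("Clinical"):
        actions.append("flag_for_investigation")
    # Rule 2: Observational signal AND Regulatory silence
    if by_class.get("Observational") and not by_class.get("Regulatory"):
        actions.append("notify_pharmacovigilance_team")
    return actions


def claim(predicate, obj_type, value, source_class):
    return {"predicate": predicate, "object": {"type": obj_type, "value": value},
            "source_class": source_class}


timeline = [
    claim("gastroparesis_reports", "Number", 50, "Anecdotal"),                    # month 1
    claim("gastroparesis_reports", "Number", 500, "Community"),                   # month 3
    claim("gastroparesis_signal", "Text", "statistically_significant", "Observational"),  # month 6
    claim("gastroparesis_warning", "Text", "under_investigation", "Regulatory"),  # month 12
]
for month, upto in ((1, 1), (3, 2), (6, 3), (12, 4)):
    print(month, evaluate_escalation(timeline[:upto], "gastroparesis"))
```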

### Pattern 3: Synergistic Drug Interactions

**Scenario:** Adding semaglutide to SGLT2 inhibitor therapy lowers HbA1c further than the SGLT2 inhibitor alone.

```bash
POST /v1/assert
{
  "subject": "drugs/semaglutide/interactions/sglt2_inhibitors",
  "predicate": "hba1c_reduction_synergy",
  "object": {"type": "Number", "value": -1.42},  // HbA1c change vs placebo, in percentage points
  "confidence": 0.95,
  "source_class": "Clinical",
  "source_metadata": "{\"trial\": \"SUSTAIN_9\", \"note\": \"Additional glucose lowering when added to SGLT2 inhibitor therapy, beyond the SGLT2 inhibitor alone\"}"
}
```

---

## API Workflow: From Raw Evidence to Queryable Knowledge

### Step 1: Ingest Evidence
```bash
# An AI agent reads a clinical trial PDF and extracts structured claims
POST /v1/assert (efficacy data)
POST /v1/assert (safety data)
POST /v1/assert (dosing data)
```

### Step 2: Query for Conflicts
```bash
GET /v1/skeptic?subject=drugs/semaglutide&predicate=*
→ Returns conflict_score for each predicate
```

### Step 3: Resolve with Lenses
```bash
# Regulatory perspective (trust FDA)
GET /v1/query?subject=drugs/semaglutide&lens=Authority&source_tier=Regulatory

# Recency perspective (latest data)
GET /v1/query?subject=drugs/semaglutide&lens=Recency

# Consensus perspective (majority vote)
GET /v1/query?subject=drugs/semaglutide&lens=Consensus
```
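Each lens is a different rule for picking from the same set of claims. The three functions below are a hypothetical client-side sketch, not the server's implementation. `authority_lens` filters by source class. `recency_lens` takes the newest claim, relying on uniformly formatted ISO-8601 UTC strings sorting chronologically. `consensus_lens` is a confidence-weighted vote, an assumption that goes beyond the plain "majority vote" described above.

```python
from collections import defaultdict


def authority_lens(claims, source_class):
    """Keep only claims from the requested source class (e.g. Regulatory)."""
    return [c for c in claims if c["source_class"] == source_class]


def recency_lens(claims):
    """Pick the most recent claim; ISO-8601 UTC strings in one format sort chronologically."""
    return max(claims, key=lambda c: c["timestamp"])


def consensus_lens(claims):
    """Pick the value with the most confidence-weighted support."""
    votes = defaultdict(float)
    for c in claims:
        votes[c["value"]] += c["confidence"]
    return max(votes, key=votes.get)


claims = [
    {"value": "rare", "source_class": "Regulatory", "confidence": 1.0, "timestamp": "2021-06-04T00:00:00Z"},
    {"value": "no_signal", "source_class": "Clinical", "confidence": 0.9, "timestamp": "2022-11-20T00:00:00Z"},
    {"value": "common", "source_class": "Community", "confidence": 0.3, "timestamp": "2024-01-20T00:00:00Z"},
    {"value": "common", "source_class": "Community", "confidence": 0.3, "timestamp": "2024-01-28T00:00:00Z"},
]
print(authority_lens(claims, "Regulatory"))
print(recency_lens(claims)["value"], consensus_lens(claims))
```

The same four claims give different answers under different lenses. That is the point: resolution happens at query time, and the stored claims never change.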

### Step 4: Audit Trail
```bash
# Who asserted what and when?
GET /v1/audit?subject=drugs/semaglutide/safety/gastroparesis
→ Returns full provenance chain with signatures
```

---

## Getting Started

### 1. Start the Server
```bash
cargo run --package stemedb-api
# Server runs on http://localhost:18180
```

### 2. Explore Interactive Docs
```
http://localhost:18180/swagger-ui
```

### 3. Check That the Server Is Healthy
```bash
curl http://localhost:18180/v1/health
# {"status":"healthy","version":"0.1.0"}
```
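For scripting read-only queries without an SDK, the Python standard library is enough. This is a minimal sketch, assuming the server address from step 1. The helper names `episteme_url` and `get_json` are hypothetical. Creating assertions also requires Ed25519 signatures, so use the Go SDK for writes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:18180"  # address printed by `cargo run --package stemedb-api`


def episteme_url(path: str, **params: str) -> str:
    """Build a read-endpoint URL, e.g. /v1/query with subject, lens or as_of."""
    # Keep "/", "*" and ":" readable in subjects, wildcards and timestamps.
    query = urlencode(params, safe="/*:")
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def get_json(path: str, **params: str):
    """GET an endpoint and decode its JSON body (requires a running server)."""
    with urlopen(episteme_url(path, **params), timeout=5) as resp:
        return json.load(resp)


print(episteme_url("/v1/query", subject="drugs/semaglutide/safety", as_of="2023-06-15T00:00:00Z"))
```

With the server running, `get_json("/v1/health")` should return the same health document as the `curl` call above.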

### 4. Create Your First Assertion
See the [Go SDK examples](https://github.com/.../sdk/go/examples/) for working code with Ed25519 signature generation.

---

## When NOT to Use Episteme

Episteme is designed for **knowledge with disagreement**. If you have:

- ❌ Simple key-value storage (use Redis)
- ❌ Transactional (OLTP) workloads (use Postgres)
- ❌ Single source of truth with no conflicts (use any SQL DB)
- ❌ Real-time analytics (use ClickHouse)

Use Episteme when:

- ✅ Multiple sources contradict each other
- ✅ Authority tiers matter (FDA > Reddit)
- ✅ Time-travel queries are critical
- ✅ AI agents need to see disagreement before acting
- ✅ Audit trails must show "what was known when"

---

## Next Steps

- **[Full API Reference](/swagger-ui)** - Interactive OpenAPI documentation
- **[Go SDK Guide](/docs/sdk/go-usage-guide)** - Build applications with the Go client
- **[Use Cases](/docs/app-concepts/)** - Consumer health, financial due diligence, AI agent debugging
- **[Data Structures](/docs/data-structures)** - Deep dive into assertions, epochs, and lenses