stemedb/crates/stemedb-api/docs/api-intro.md
jordan 58594bc7b9 feat: add feed endpoint, dashboard feed panel, and FindMyHealth app
2026-02-16 17:16:17 -07:00


Episteme: A Database for Claims, Not Facts

Episteme stores assertions - claims about the world with provenance, confidence scores, and authority metadata. Unlike traditional databases that force you to pick "the right answer," Episteme holds all competing claims simultaneously and lets you resolve disagreements at query time.

Think of it as "Git for Truth": Just as Git lets developers work on different versions of code and merge them intelligently, Episteme lets AI agents and humans contribute different observations about the world and resolve conflicts based on context.


The Core Problem: Databases Erase Disagreement

Example: Semaglutide (Ozempic) and Gastroparesis

In early 2024, three sources with very different levels of authority made conflicting claims about semaglutide's safety:

| Source | Says | Authority |
|--------|------|-----------|
| FDA label | "Gastroparesis rare" | Tier 0: Regulatory |
| STEP 1 clinical trial | "No gastroparesis signal detected" | Tier 1: Clinical |
| Patient communities (Reddit, forums) | "Stomach paralysis, can't eat, hospitalized" (500+ reports) | Tier 4: Community |

Traditional Database (Postgres):

UPDATE drugs SET gastroparesis_risk = 'rare' WHERE name = 'semaglutide';
-- The clinical trial data is erased.
-- The patient reports never make it in.
-- No record of who said what or when.

Result: A doctor queries the database, sees "rare," prescribes confidently. Six months later, the FDA updates the label to "boxed warning" after investigating patient clusters. The database had the pieces but erased the disagreement.

Episteme Approach:

POST /v1/assert - FDA position
POST /v1/assert - Clinical trial findings
POST /v1/assert - Patient reports

GET /v1/skeptic?subject=drugs/semaglutide&predicate=gastroparesis_risk
→ Returns conflict_score: 0.85, all three claims, authority weights

Result: The doctor sees the conflict before prescribing. The disagreement itself is the insight.


What IS an Assertion?

An assertion is a structured claim with these components:

{
  "subject": "drugs/semaglutide",           // WHAT you're talking about
  "predicate": "weight_loss_percentage",    // WHICH property
  "object": {"type": "Number", "value": 15.0}, // WHAT you claim
  "confidence": 0.95,                       // HOW sure you are (0.0-1.0)
  "source_class": "Clinical",               // WHY it's trustworthy
  "source_hash": "step1_trial_2022...",     // WHERE it came from
  "signatures": [...]                       // WHO vouches for it
}
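A minimal Python sketch of this shape, assuming only the field names shown above. The `Assertion` class and its range check are illustrative and are not taken from any Episteme SDK.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Assertion:
    """Illustrative container mirroring the JSON fields above (hypothetical type)."""
    subject: str
    predicate: str
    object: dict
    confidence: float
    source_class: str
    source_hash: str
    signatures: list = field(default_factory=list)

    def __post_init__(self):
        # The document describes confidence as a value in the range 0.0-1.0.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")

    def to_json(self) -> str:
        """Serialize to the JSON layout used in the example above."""
        return json.dumps(asdict(self))


claim = Assertion(
    subject="drugs/semaglutide",
    predicate="weight_loss_percentage",
    object={"type": "Number", "value": 15.0},
    confidence=0.95,
    source_class="Clinical",
    source_hash="step1_trial_2022...",  # elided in the original; kept as-is
)
print(claim.to_json())
```

Rejecting out-of-range confidence at construction time keeps malformed claims from ever reaching the request body.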

Assertions vs. Facts

| Concept | What It Means | Example |
|---------|---------------|---------|
| Fact | The "one true value" | weight_loss = 15% |
| Assertion | "Agent X claims Y based on source Z" | "STEP 1 trial (Clinical, conf=0.95) asserts weight_loss = 15%" |

Key Insight: Episteme doesn't store "semaglutide causes 15% weight loss" as a fact. It stores "the STEP 1 trial asserted 15% weight loss with 95% confidence on Nov 20, 2022." Later, when the STEP 1 extension finds that patients regain two-thirds of the weight, that finding becomes a new assertion that coexists with the original.


Real-World Example: Modeling Semaglutide Knowledge

1. Efficacy Claims (Quantitative Evidence)

Claim: Semaglutide induces 15-20% weight loss in most patients.

# source_hash is the BLAKE3 hash of the STEP 1 trial PDF
curl -X POST http://localhost:18180/v1/assert \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "drugs/semaglutide/efficacy",
    "predicate": "weight_loss_percentage_mean",
    "object": {"type": "Number", "value": 17.5},
    "confidence": 0.95,
    "source_class": "Clinical",
    "source_hash": "7a3f8b2e...",
    "source_metadata": "{\"trial_name\": \"STEP 1\", \"n\": 1961, \"year\": 2021}",
    "signatures": [{
      "agent_id": "clinical_trials_analyzer_v2",
      "signature": "ed25519_sig...",
      "version": 2
    }]
  }'

Why this structure matters:

  • subject: Hierarchical path (drugs/semaglutide/efficacy) enables domain queries
  • predicate: Specific metric (weight_loss_percentage_mean) allows aggregation
  • object.type: Typed value (Number) enables mathematical operations
  • source_metadata: Structured context (trial details) for reproducibility
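The request body above can also be assembled programmatically. The sketch below assumes the field names from the curl example. `build_assertion` is a hypothetical helper, and signatures are left out because signing is not described here. The point to notice is that `source_metadata` travels as a JSON string nested inside the JSON body, so it is encoded separately.

```python
import json


def build_assertion(subject, predicate, value, confidence, source_class,
                    source_hash, metadata):
    """Assemble a /v1/assert request body as a dict (hypothetical helper)."""
    return {
        "subject": subject,
        "predicate": predicate,
        "object": {"type": "Number", "value": value},
        "confidence": confidence,
        "source_class": source_class,
        "source_hash": source_hash,
        # source_metadata is a JSON *string*, so it is encoded on its own.
        "source_metadata": json.dumps(metadata),
    }


body = build_assertion(
    "drugs/semaglutide/efficacy", "weight_loss_percentage_mean", 17.5, 0.95,
    "Clinical", "7a3f8b2e...",  # hash elided in the original; kept as-is
    {"trial_name": "STEP 1", "n": 1961, "year": 2021},
)
print(json.dumps(body, indent=2))
```

Letting `json.dumps` produce the inner string avoids hand-escaping quotes, which is the error-prone part of the curl version.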

2. The Rebound Effect (Conflicting Longitudinal Data)

Claim: Patients regain two-thirds of lost weight after discontinuation.

# source_hash identifies the STEP 1 extension publication
curl -X POST http://localhost:18180/v1/assert \
  -H "Content-Type: application/json" \
  -d '{
    "subject": "drugs/semaglutide/persistence",
    "predicate": "weight_regain_ratio_1yr",
    "object": {"type": "Number", "value": 0.67},
    "confidence": 0.9,
    "source_class": "Clinical",
    "source_hash": "4c9d2a1f...",
    "source_metadata": "{\"trial_name\": \"STEP 1 Extension\", \"followup_months\": 12, \"year\": 2022}",
    "signatures": [...]
  }'

This contradicts the initial "transformative weight loss" narrative. Both assertions coexist. Query with lens=Recency to get the latest understanding, or lens=Consensus to see if most sources agree.

3. Safety Signals (Authority Tier Hierarchy)

Three-tier safety picture:

Tier 0: Regulatory (FDA Label - Never Fades)

{
  "subject": "drugs/semaglutide/safety/thyroid",
  "predicate": "boxed_warning",
  "object": {"type": "Text", "value": "Medullary thyroid carcinoma risk in rodents"},
  "confidence": 1.0,
  "source_class": "Regulatory",
  "source_hash": "fda_label_2024..."
}

Tier 1: Clinical (2-Year Half-Life)

{
  "subject": "drugs/semaglutide/safety/thyroid",
  "predicate": "thyroid_cancer_risk_humans",
  "object": {"type": "Text", "value": "no_consistent_association"},
  "confidence": 0.85,
  "source_class": "Clinical",
  "source_hash": "ema_review_2023..."
}

Tier 4: Community (3-Month Half-Life)

{
  "subject": "drugs/semaglutide/adverse_events",
  "predicate": "gastroparesis_reports",
  "object": {"type": "Number", "value": 500},
  "confidence": 0.3,
  "source_class": "Community",
  "source_hash": "reddit_cluster_feb2024...",
  "source_metadata": "{\"platform\": \"reddit\", \"timeframe_days\": 90}"
}

Query for conflict:

curl "http://localhost:18180/v1/skeptic?subject=drugs/semaglutide/safety&predicate=*"

Response:

{
  "conflict_score": 0.82,
  "conflicts": [
    {
      "subject": "drugs/semaglutide/safety/thyroid",
      "predicate": "cancer_risk",
      "claims": [
        {
          "value": "risk in rodents",
          "source_tier": "Regulatory",
          "confidence": 1.0,
          "authority_weight": 1.0
        },
        {
          "value": "no_consistent_association",
          "source_tier": "Clinical",
          "confidence": 0.85,
          "authority_weight": 0.8
        }
      ],
      "interpretation": "Regulatory warning based on animal models contradicts human epidemiology. Both valid - animal models guide precaution, human data shows absence of signal to date."
    }
  ]
}
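A client might read such a response as follows. This is a sketch that assumes the response layout shown above. `rank_claims` is a hypothetical helper, and sorting by `authority_weight` is just one reasonable way to present the claims, not something the API is known to require.

```python
import json

# Trimmed copy of the sample /v1/skeptic response shown above.
response_text = """
{
  "conflict_score": 0.82,
  "conflicts": [
    {
      "subject": "drugs/semaglutide/safety/thyroid",
      "predicate": "cancer_risk",
      "claims": [
        {"value": "no_consistent_association", "source_tier": "Clinical",
         "confidence": 0.85, "authority_weight": 0.8},
        {"value": "risk in rodents", "source_tier": "Regulatory",
         "confidence": 1.0, "authority_weight": 1.0}
      ]
    }
  ]
}
"""


def rank_claims(report):
    """Return (subject, predicate, claims) with claims sorted by authority_weight, highest first."""
    ranked = []
    for conflict in report["conflicts"]:
        claims = sorted(conflict["claims"],
                        key=lambda c: c["authority_weight"], reverse=True)
        ranked.append((conflict["subject"], conflict["predicate"], claims))
    return ranked


report = json.loads(response_text)
for subject, predicate, claims in rank_claims(report):
    print(f"{subject} / {predicate} (conflict_score={report['conflict_score']})")
    for c in claims:
        print(f"  {c['source_tier']:<10} weight={c['authority_weight']} value={c['value']}")
```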

Understanding Source Authority Tiers

Source tiers control how long assertions stay relevant (decay curves) and how much weight they carry in consensus.

| Tier | Source Type | Examples | Decay Half-Life | Use When |
|------|-------------|----------|-----------------|----------|
| 0: Regulatory | Government/standards bodies | FDA label, SEC filing, ISO standard, RFC | Never fades | Official regulatory guidance |
| 1: Clinical | Peer-reviewed trials | Phase III RCT, Cochrane review, NEJM publication | 2 years | Gold-standard clinical evidence |
| 2: Observational | Real-world studies | Cohort studies, registry data | 1 year | Population-level observational data |
| 3: Expert | Domain expert opinions | Doctor recommendations, analyst reports | 6 months | Professional judgment |
| 4: Community | Patient registries, forums | Patient registry, professional network | 3 months | Aggregated community data |
| 5: Anecdotal | Individual reports | Reddit post, Twitter thread, single patient | 30 days | Individual anecdotes, signals |

Decay Curve Visualization

Confidence Over Time
  1.0 ┤━━━━━━━━━━━━━━━━━━━━━━━━━━━  Tier 0: FDA label (permanent)
      │
  0.8 ┤─────────╲
      │          ╲─────────╲          Tier 1: Clinical trial
  0.6 ┤                     ╲──────   (2yr half-life)
      │         ╲                  ╲─
  0.4 ┤          ╲─────
      │     ╲          ╲─────        Tier 3: Expert opinion
  0.2 ┤ ╲    ╲────                   (6mo half-life)
      │  ╲╲       ╲────────────
  0.0 ┤───╲╲─────────────────────    Tier 5: Reddit post (30d half-life)
      └──┬──────┬──────┬──────┬──
        0mo    3mo    6mo   12mo

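Example: An FDA label from 2019 has the same authority today. A Reddit post from 3 months ago has passed through three 30-day half-lives and retains about one-eighth of its initial confidence, which makes it essentially noise.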

The Math

confidence(t) = initial_confidence × 0.5^(t / half_life)

Example: Clinical trial with initial confidence 0.95
  At 1 year:  0.95 × 0.5^(1/2)  = 0.67
  At 2 years: 0.95 × 0.5^(2/2)  = 0.475
  At 4 years: 0.95 × 0.5^(4/2)  = 0.24
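The formula can be checked with a few lines of Python. This is a sketch: the half-lives come from the tier table above, while `HALF_LIFE_YEARS` and `decayed_confidence` are illustrative names, and treating 30 days as 30/365 of a year is an assumption.

```python
# Half-lives in years, taken from the tier table; None means "never fades".
HALF_LIFE_YEARS = {
    "Regulatory": None,
    "Clinical": 2.0,
    "Observational": 1.0,
    "Expert": 0.5,
    "Community": 0.25,
    "Anecdotal": 30 / 365,  # 30 days, assuming a 365-day year
}


def decayed_confidence(initial, age_years, source_class):
    """Apply confidence(t) = initial * 0.5 ** (t / half_life)."""
    half_life = HALF_LIFE_YEARS[source_class]
    if half_life is None:
        return initial  # Tier 0 assertions do not decay
    return initial * 0.5 ** (age_years / half_life)


for years in (1, 2, 4):
    print(years, round(decayed_confidence(0.95, years, "Clinical"), 3))
```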

Critical Rule: A million Tier 5 (Anecdotal) posts cannot outvote a single Tier 0 (Regulatory) assertion. But they can signal "something is happening here" that deserves investigation - exactly what happened with semaglutide gastroparesis reports.
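One way to get this behavior is to compare tiers before counting anything. The sketch below assumes that approach. How Episteme actually enforces the rule is not described here, and the function names are hypothetical.

```python
from collections import Counter

# Tier numbers from the table above; lower means more authoritative.
TIER = {"Regulatory": 0, "Clinical": 1, "Observational": 2,
        "Expert": 3, "Community": 4, "Anecdotal": 5}


def resolve_by_authority(claims):
    """Keep only claims from the most authoritative tier present; volume is ignored."""
    best = min(TIER[c["source_class"]] for c in claims)
    return [c for c in claims if TIER[c["source_class"]] == best]


def volume_by_tier(claims):
    """Count claims per tier so a flood of low-tier reports stays visible as a signal."""
    return Counter(c["source_class"] for c in claims)


# Toy data: one regulatory claim against many anecdotal ones
# (10,000 rather than a million, to keep the example light).
claims = [{"source_class": "Regulatory", "value": "rare"}]
claims += [{"source_class": "Anecdotal", "value": "severe"} for _ in range(10_000)]

print(resolve_by_authority(claims))        # the single Regulatory claim
print(volume_by_tier(claims)["Anecdotal"])  # volume is still reported
```

Keeping resolution and volume as two separate functions mirrors the rule: the anecdotal posts never win the vote, but their count is still available to trigger an investigation.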


Time-Travel Queries: "What Did We Know Then?"

The Use Case

"I started semaglutide in June 2023. What was the known safety profile at that time?"

Query as of June 15, 2023:

curl "http://localhost:18180/v1/query?subject=drugs/semaglutide/safety&as_of=2023-06-15T00:00:00Z"

Response:

{
  "subject": "drugs/semaglutide/safety",
  "as_of": "2023-06-15T00:00:00Z",
  "assertions": [
    {
      "predicate": "thyroid_warning",
      "value": "Boxed warning: thyroid tumors in rodents",
      "source_tier": "Regulatory",
      "confidence": 1.0,
      "timestamp": "2021-06-04T00:00:00Z"  // FDA approval date
    },
    {
      "predicate": "gastroparesis_signal",
      "value": "No gastroparesis signal detected",
      "source_tier": "Clinical",
      "confidence": 0.9,
      "timestamp": "2022-11-20T00:00:00Z"  // STEP 1 publication
    }
    // Reddit cluster reports from Feb 2024 NOT included - they're after the as_of date
  ]
}

Query as of February 1, 2024:

curl "http://localhost:18180/v1/query?subject=drugs/semaglutide/safety&as_of=2024-02-01T00:00:00Z"

Response NOW includes:

{
  "assertions": [
    // ... previous assertions ...
    {
      "predicate": "gastroparesis_reports",
      "value": 500,
      "source_tier": "Community",
      "confidence": 0.3,
      "timestamp": "2024-01-28T00:00:00Z"  // Patient cluster detected
    }
  ],
  "conflict_detected": true,
  "conflict_score": 0.85
}

Why This Matters:

  • Liability protection: "Based on available evidence at decision time, no gastroparesis signal was known."
  • Learning: Track how medical understanding evolved month-by-month.
  • AI audit trails: "Why did the AI recommend this? Here's exactly what it knew on that date."
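Conceptually, an `as_of` query is a timestamp filter over assertions that are never deleted. The sketch below shows that idea in memory, using the timestamps from the two responses above. `visible_as_of` is a hypothetical name and says nothing about how the server implements it.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp; older Python versions reject a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def visible_as_of(assertions, cutoff):
    """Return assertions whose timestamp is at or before the cutoff."""
    limit = parse_ts(cutoff)
    return [a for a in assertions if parse_ts(a["timestamp"]) <= limit]


assertions = [
    {"predicate": "thyroid_warning", "timestamp": "2021-06-04T00:00:00Z"},
    {"predicate": "gastroparesis_signal", "timestamp": "2022-11-20T00:00:00Z"},
    {"predicate": "gastroparesis_reports", "timestamp": "2024-01-28T00:00:00Z"},
]

for cutoff in ("2023-06-15T00:00:00Z", "2024-02-01T00:00:00Z"):
    names = [a["predicate"] for a in visible_as_of(assertions, cutoff)]
    print(cutoff, names)
```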

Practical Guidance: Writing Good Assertions

Good Assertion: Specific, Sourced, Falsifiable

{
  "subject": "drugs/semaglutide/metabolic_effects",
  "predicate": "lean_mass_loss_percentage",
  "object": {"type": "Number", "value": 25.0},
  "confidence": 0.9,
  "source_class": "Clinical",
  "source_hash": "step1_body_composition_analysis_hash",
  "source_metadata": "{\"measurement_method\": \"DEXA_scan\", \"study\": \"STEP_1\", \"note\": \"Of total weight lost, 25% is lean tissue. However, lean-to-fat ratio improves.\"}"
}

Why it's good:

  • Specific predicate: Not just "causes_muscle_loss" but "lean_mass_loss_percentage"
  • Quantitative: 25% is verifiable, not subjective
  • Sourced: Points to exact trial with measurement method
  • Contextualized: Metadata notes the nuance (ratio still improves)

Bad Assertion: Vague, Unsourced, Opinion

{
  "subject": "drugs/semaglutide",
  "predicate": "is_good",
  "object": {"type": "Boolean", "value": true},
  "confidence": 1.0,
  "source_class": "Anecdotal",
  "source_hash": "my_opinion"
}

Why it's bad:

  • Vague predicate: "is_good" is meaningless - good for what?
  • Subjective value: Boolean opinion, not measurable
  • No provenance: "my_opinion" isn't a real source
  • No context: Why is it good? Good for whom?
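Some of these problems can be caught mechanically before an assertion is submitted. The linter below is a sketch. The denylist, the 0.9 threshold, and the `lint_assertion` name are all assumptions chosen for illustration, not rules enforced by Episteme.

```python
VAGUE_PREDICATES = {"is_good", "is_bad", "works"}  # hypothetical denylist


def lint_assertion(a):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    if a["predicate"] in VAGUE_PREDICATES:
        problems.append("vague predicate")
    if a["source_class"] == "Anecdotal" and a["confidence"] >= 0.9:
        problems.append("high confidence on an anecdotal source")
    if not a.get("source_metadata"):
        problems.append("no source_metadata for context")
    return problems


bad = {
    "subject": "drugs/semaglutide", "predicate": "is_good",
    "object": {"type": "Boolean", "value": True},
    "confidence": 1.0, "source_class": "Anecdotal", "source_hash": "my_opinion",
}
good = {
    "subject": "drugs/semaglutide/metabolic_effects",
    "predicate": "lean_mass_loss_percentage",
    "object": {"type": "Number", "value": 25.0},
    "confidence": 0.9, "source_class": "Clinical",
    "source_hash": "step1_body_composition_analysis_hash",
    "source_metadata": "{\"measurement_method\": \"DEXA_scan\"}",
}

print(lint_assertion(bad))
print(lint_assertion(good))
```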

Decision Tree: Choosing Predicates

Is it a property that changes over time?
  ├─ YES → Use timestamped assertions
  │   Examples: "weight_loss_percentage", "market_cap_usd"
  │
  └─ NO → Use static assertions
      Examples: "chemical_structure", "fda_approval_date"

Is it measurable/quantitative?
  ├─ YES → Use Number type
  │   Examples: 15.0 (percentage), 1000000 (dollars)
  │
  └─ NO → Use Text or Boolean
      Examples: "gastroparesis" (Text), true (Boolean)

Does it involve relationships?
  ├─ YES → Use hierarchical subjects
  │   Examples: "drugs/semaglutide/interactions/sglt2_inhibitors"
  │
  └─ NO → Use flat subjects
      Examples: "drugs/semaglutide"
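The middle branch of the tree (Number vs. Text or Boolean) maps onto a small helper. This is a sketch: `to_object` is a hypothetical function, and only the three type names that appear in this guide are assumed to exist.

```python
def to_object(value):
    """Wrap a Python value in the {"type", "value"} shape used by assertions."""
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "Boolean", "value": value}
    if isinstance(value, (int, float)):
        return {"type": "Number", "value": value}
    if isinstance(value, str):
        return {"type": "Text", "value": value}
    raise TypeError(f"unsupported value type: {type(value).__name__}")


print(to_object(15.0))
print(to_object("gastroparesis"))
print(to_object(True))
```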

Common Patterns

Pattern 1: Conflicting Clinical Evidence

Scenario: Two trials, different populations, contradictory results.

# Trial 1: Weight loss in obesity cohort
POST /v1/assert
{
  "subject": "drugs/semaglutide/efficacy/obesity",
  "predicate": "weight_loss_percentage",
  "object": {"type": "Number", "value": 17.5},
  "source_class": "Clinical",
  "source_metadata": "{\"trial\": \"STEP_1\", \"population\": \"obesity_without_diabetes\"}"
}

# Trial 2: Weight loss in T2D cohort
POST /v1/assert
{
  "subject": "drugs/semaglutide/efficacy/diabetes",
  "predicate": "weight_loss_percentage",
  "object": {"type": "Number", "value": 12.4},
  "source_class": "Clinical",
  "source_metadata": "{\"trial\": \"SUSTAIN_9\", \"population\": \"type_2_diabetes\"}"
}

# Query both
GET /v1/query?subject=drugs/semaglutide/efficacy/*&predicate=weight_loss_percentage
→ Returns both, no conflict (different subjects)
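The "no conflict" outcome follows from treating a conflict as a disagreement on the same subject and predicate. The in-memory sketch below works under that assumption. The server's actual wildcard semantics are not specified here, so `fnmatchcase` is only a stand-in, and both helper names are hypothetical.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def query(assertions, subject_pattern, predicate):
    """Select assertions whose subject matches a shell-style pattern."""
    return [a for a in assertions
            if fnmatchcase(a["subject"], subject_pattern)
            and a["predicate"] == predicate]


def conflicting_keys(assertions):
    """Return (subject, predicate) pairs that carry more than one distinct value."""
    values = defaultdict(set)
    for a in assertions:
        values[(a["subject"], a["predicate"])].add(a["value"])
    return [key for key, vals in values.items() if len(vals) > 1]


assertions = [
    {"subject": "drugs/semaglutide/efficacy/obesity",
     "predicate": "weight_loss_percentage", "value": 17.5},
    {"subject": "drugs/semaglutide/efficacy/diabetes",
     "predicate": "weight_loss_percentage", "value": 12.4},
]

hits = query(assertions, "drugs/semaglutide/efficacy/*", "weight_loss_percentage")
print(len(hits), conflicting_keys(hits))
```

Had both trials been filed under the same subject, the two values would share a key and `conflicting_keys` would report it, which is why the population belongs in the subject path.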

Pattern 2: Signal Escalation (Anecdotal → Investigation)

The gastroparesis detection story:

# Month 1: First anecdotal reports (Tier 5)
POST /v1/assert {"subject": "drugs/semaglutide/signals", "predicate": "gastroparesis_reports", "object": {"type": "Number", "value": 50}, "source_class": "Anecdotal"}

# Month 3: Cluster detected (Tier 4)
POST /v1/assert {"subject": "drugs/semaglutide/signals", "predicate": "gastroparesis_reports", "object": {"type": "Number", "value": 500}, "source_class": "Community"}

# Month 6: Retrospective study (Tier 2)
POST /v1/assert {"subject": "drugs/semaglutide/adverse_events", "predicate": "gastroparesis_signal", "object": {"type": "Text", "value": "statistically_significant"}, "source_class": "Observational"}

# Month 12: FDA investigation (Tier 0)
POST /v1/assert {"subject": "drugs/semaglutide/safety", "predicate": "gastroparesis_warning", "object": {"type": "Text", "value": "under_investigation"}, "source_class": "Regulatory"}

Escalation policy:

{
  "policy_name": "drug_safety_escalation",
  "rules": [
    {
      "condition": "Community reports > 100 AND no Clinical signal",
      "action": "flag_for_investigation"
    },
    {
      "condition": "Observational signal AND Regulatory silence",
      "action": "notify_pharmacovigilance_team"
    }
  ]
}
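The two rules could be evaluated along these lines. This sketch reads "no Clinical signal" and "Regulatory silence" as "no assertion of that source class is present", which is an assumption. The policy format above does not define its condition language, and `evaluate_policy` is a hypothetical name.

```python
def evaluate_policy(assertions):
    """Return the actions triggered by the drug_safety_escalation rules above."""
    classes = {a["source_class"] for a in assertions}
    # Largest report count asserted at the Community tier (0 if none).
    community_reports = max(
        (a["value"] for a in assertions if a["source_class"] == "Community"),
        default=0,
    )
    actions = []
    if community_reports > 100 and "Clinical" not in classes:
        actions.append("flag_for_investigation")
    if "Observational" in classes and "Regulatory" not in classes:
        actions.append("notify_pharmacovigilance_team")
    return actions


month_3 = [
    {"source_class": "Anecdotal", "value": 50},
    {"source_class": "Community", "value": 500},
]
month_6 = month_3 + [
    {"source_class": "Observational", "value": "statistically_significant"},
]

print(evaluate_policy(month_3))
print(evaluate_policy(month_6))
```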

Pattern 3: Synergistic Drug Interactions

Scenario: Semaglutide + SGLT2 inhibitors have additive benefits.

POST /v1/assert
{
  "subject": "drugs/semaglutide/interactions/sglt2_inhibitors",
  "predicate": "hba1c_reduction_synergy",
  "object": {"type": "Number", "value": -1.42},  // Percentage point reduction
  "confidence": 0.95,
  "source_class": "Clinical",
  "source_metadata": "{\"trial\": \"SUSTAIN_9\", \"note\": \"Additive glucose control beyond either drug alone\"}"
}

API Workflow: From Raw Evidence to Queryable Knowledge

Step 1: Ingest Evidence

# An AI agent reads a clinical trial PDF and extracts structured claims
POST /v1/assert (efficacy data)
POST /v1/assert (safety data)
POST /v1/assert (dosing data)

Step 2: Query for Conflicts

GET /v1/skeptic?subject=drugs/semaglutide&predicate=*
→ Returns conflict_score for each predicate

Step 3: Resolve with Lenses

# Regulatory perspective (trust FDA)
GET /v1/query?subject=drugs/semaglutide&lens=Authority&source_tier=Regulatory

# Recency perspective (latest data)
GET /v1/query?subject=drugs/semaglutide&lens=Recency

# Consensus perspective (majority vote)
GET /v1/query?subject=drugs/semaglutide&lens=Consensus
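The difference between the Recency and Consensus lenses can be shown with a toy in-memory version. The exact server-side semantics are not documented here, so the sketch assumes the simplest reading: "latest timestamp wins" and "most common value wins". The three claims are made-up illustration data.

```python
from collections import Counter


def recency_lens(claims):
    """Pick the value of the most recent claim (ISO timestamps in one format sort as strings)."""
    return max(claims, key=lambda c: c["timestamp"])["value"]


def consensus_lens(claims):
    """Pick the value asserted by the largest number of claims."""
    return Counter(c["value"] for c in claims).most_common(1)[0][0]


# Hypothetical claims about a single subject and predicate.
claims = [
    {"value": "rare", "timestamp": "2021-06-04T00:00:00Z"},
    {"value": "rare", "timestamp": "2022-11-20T00:00:00Z"},
    {"value": "under_investigation", "timestamp": "2024-01-28T00:00:00Z"},
]

print(recency_lens(claims))
print(consensus_lens(claims))
```

The same stored claims yield different answers under different lenses, which is the point of resolving at query time rather than at write time.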

Step 4: Audit Trail

# Who asserted what and when?
GET /v1/audit?subject=drugs/semaglutide/safety/gastroparesis
→ Returns full provenance chain with signatures

Getting Started

1. Start the Server

cargo run --package stemedb-api
# Server runs on http://localhost:18180

2. Explore Interactive Docs

http://localhost:18180/swagger-ui

3. Run Your First Query

curl http://localhost:18180/v1/health
# {"status":"healthy","version":"0.1.0"}

4. Create Your First Assertion

See the Go SDK examples for working code with Ed25519 signature generation.


When NOT to Use Episteme

Episteme is designed for knowledge with disagreement. It is the wrong tool if you need:

  • Simple key-value storage (use Redis)
  • Transactional OLTP (use Postgres)
  • A single source of truth with no conflicts (use any SQL DB)
  • Real-time analytics (use ClickHouse)

Use Episteme when:

  • Multiple sources contradict each other
  • Authority tiers matter (FDA > Reddit)
  • Time-travel queries are critical
  • AI agents need to see disagreement before acting
  • Audit trails must show "what was known when"

Next Steps