Application Layer Concepts

Audience: Teams building applications on top of Episteme.

Not covered here: Database internals (see architecture.md) or the implementation roadmap (see roadmap.md).


The Boundary Principle

Episteme is a database engine. It stores assertions, resolves conflicts via lenses, and serves queries. Everything else is the application layer.

Episteme Owns

| Domain | Responsibilities |
| --- | --- |
| Storage | WAL, KV store, indexes, content-addressing via BLAKE3 |
| Types | Assertion, Vote, Epoch, LifecycleStage, MaterializedView |
| Resolution | Lenses (Recency, Consensus, Authority, Skeptic, Layered, Constraints) |
| Query | Filtering, audit trail, time-travel (as_of), decay at resolution time |
| Integrity | Ed25519 signature verification, checksums, epoch supersession cascades |
| API | HTTP endpoints: POST /assert, POST /vote, POST /epoch, GET /query |

Application Layer Owns

| Domain | Responsibilities |
| --- | --- |
| Data Acquisition | Crawlers, scrapers, API consumers (PubMed, Reddit, FAERS, etc.) |
| Data Transformation | NLP extraction, claim identification, confidence assignment |
| Classification | Source-class tier assignment, study design classification |
| Enrichment | DOI resolution, journal metadata, engagement metrics |
| Intelligence | Cluster detection, anomaly detection, signal surfacing |
| Presentation | Dashboards, summaries, consumer UX, LLM-powered synthesis |
| Orchestration | Agent swarms, reviewers, curators |
| Client Libraries | SDKs that wrap the HTTP API (Go, Python, etc.) |
| Infrastructure | Key management, budget enforcement, job scheduling |

The Test

For any feature, ask: "Does this require changes to Episteme's storage, indexing, resolution, or API?"

  • If yes → Database feature. It goes in Episteme.
  • If no, it just calls the API → Application layer. It goes in your app.
  • If both → Split it. Episteme provides the primitive; your app provides the intelligence.

Examples

| Feature | DB Primitive | App Intelligence |
| --- | --- | --- |
| Source-class decay | Decay formula applied during lens resolution | Which tier a source belongs to |
| Cluster escalation | Accept/store/query escalation assertions | Detect clusters, compute thresholds |
| Disagreement dashboard | Skeptic Lens + conflict_score | UI rendering, LLM summaries |
| Time-travel | as_of query parameter | "What changed since you last looked" UX |
| Visual anchoring | Store + query visual_hash | Image hashing, screenshot comparison |
| Pharma data | Ingest via POST /assert | Crawl PubMed, extract claims, classify |

Core Application Components

These are the building blocks that most Episteme-powered applications need.

1. Ingestion Pipeline

Transforms raw source material into signed assertions.

[Raw Source] → [Crawler/API] → [NLP Extraction] → [Classification] → [Enrichment] → [Signing] → [POST /assert]

Responsibilities:

  • Fetch documents from source systems (PubMed API, Reddit API, web scraping)
  • Extract claims as subject/predicate/object triples
  • Assign confidence based on extraction quality
  • Classify source tier (0-6)
  • Enrich with structured metadata
  • Sign with agent key
  • Submit to Episteme

See: Consumer Health Ingestion
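The final two stages of the pipeline (signing and building the request body) can be sketched with Go's standard `crypto/ed25519` package. This is a minimal illustration under stated assumptions: the `Claim` and `SignedAssertion` types, their JSON field names, and the choice to sign the JSON serialization of the claim are all hypothetical. The actual `/assert` schema and the canonical bytes that Episteme verifies may differ, so check the API reference before relying on this shape.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Claim is a hypothetical subject/predicate/object triple produced by extraction.
type Claim struct {
	Subject    string  `json:"subject"`
	Predicate  string  `json:"predicate"`
	Object     string  `json:"object"`
	Confidence float32 `json:"confidence"`
	SourceTier int     `json:"source_tier"` // 0-6, assigned by the classifier
}

// SignedAssertion is a hypothetical envelope; the real request schema may differ.
type SignedAssertion struct {
	Claim     Claim  `json:"claim"`
	PublicKey string `json:"public_key"` // hex-encoded Ed25519 public key
	Signature string `json:"signature"`  // hex-encoded signature over the claim JSON
}

// NewAgentKey generates a fresh Ed25519 private key for an agent.
func NewAgentKey() (ed25519.PrivateKey, error) {
	_, priv, err := ed25519.GenerateKey(rand.Reader)
	return priv, err
}

// Sign serializes the claim to JSON and signs those bytes with the agent key.
func Sign(c Claim, priv ed25519.PrivateKey) (SignedAssertion, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return SignedAssertion{}, fmt.Errorf("marshal claim: %w", err)
	}
	pub := priv.Public().(ed25519.PublicKey)
	return SignedAssertion{
		Claim:     c,
		PublicKey: hex.EncodeToString(pub),
		Signature: hex.EncodeToString(ed25519.Sign(priv, payload)),
	}, nil
}

// Verify re-serializes the claim and checks the signature, as a receiver might.
func Verify(a SignedAssertion) bool {
	pub, err := hex.DecodeString(a.PublicKey)
	if err != nil || len(pub) != ed25519.PublicKeySize {
		return false
	}
	sig, err := hex.DecodeString(a.Signature)
	if err != nil {
		return false
	}
	payload, err := json.Marshal(a.Claim)
	if err != nil {
		return false
	}
	return ed25519.Verify(ed25519.PublicKey(pub), payload, sig)
}

func main() {
	priv, err := NewAgentKey()
	if err != nil {
		panic(err)
	}
	a, err := Sign(Claim{Subject: "drug-x", Predicate: "causes", Object: "nausea", Confidence: 0.7, SourceTier: 5}, priv)
	if err != nil {
		panic(err)
	}
	body, _ := json.MarshalIndent(a, "", "  ")
	fmt.Println(string(body)) // this JSON would be the body of POST /assert
	fmt.Println("signature valid:", Verify(a))
}
```

Signing happens last so that the signature covers the classified and enriched claim. Any later change to the claim invalidates it.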

2. Source-Class Classifier

Determines the authority tier of each source.

| Tier | Source Type | Examples |
| --- | --- | --- |
| 0 | Regulatory action | FDA label change, EMA withdrawal |
| 1 | Peer-reviewed RCT, meta-analysis | NEJM, Lancet, JAMA |
| 2 | Observational study, real-world evidence | Insurance claims, EHR studies |
| 3 | Pharmacovigilance | FAERS reports, EudraVigilance |
| 4 | Clinician anecdote, case report | Conference presentations, case series |
| 5 | Patient community | Reddit, forums, patient registries |
| 6 | Media, influencer, commercial | TikTok, news articles, pharma marketing |

Implementation: Rule-based classifier or ML model that maps source metadata (URL pattern, DOI prefix, platform) to tier.

Domain-specific: The tier map is not universal. A pharma deployment, a finance deployment, and a legal deployment each need their own.
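A rule-based classifier for the pharma tier map above might look like the following sketch. The `SourceMeta` type, the specific host and DOI-prefix rules, and the fallback to Tier 6 for unrecognized sources are illustrative assumptions, not part of Episteme. A real deployment would maintain a much larger rule set, or replace the lookup with a trained model.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// SourceMeta is a hypothetical bundle of metadata collected during ingestion.
type SourceMeta struct {
	URL string
	DOI string
}

// doiPrefixTiers maps DOI prefixes to tiers (illustrative entries only).
var doiPrefixTiers = map[string]int{
	"10.1056/": 1, // NEJM
	"10.1001/": 1, // JAMA Network
}

// hostTiers maps registered hosts to tiers (illustrative entries only).
var hostTiers = map[string]int{
	"fda.gov":    0,
	"nejm.org":   1,
	"reddit.com": 5,
	"tiktok.com": 6,
}

// defaultTier is an assumption: unknown sources get the least authoritative tier.
const defaultTier = 6

// ClassifyTier checks the DOI prefix first, then walks the URL host upward
// (www.accessdata.fda.gov -> accessdata.fda.gov -> fda.gov) looking for a rule.
func ClassifyTier(m SourceMeta) int {
	for prefix, tier := range doiPrefixTiers {
		if strings.HasPrefix(m.DOI, prefix) {
			return tier
		}
	}
	u, err := url.Parse(m.URL)
	if err != nil {
		return defaultTier
	}
	h := strings.ToLower(u.Hostname())
	for h != "" {
		if tier, ok := hostTiers[h]; ok {
			return tier
		}
		i := strings.IndexByte(h, '.')
		if i < 0 {
			break
		}
		h = h[i+1:] // drop the leftmost label and try the parent domain
	}
	return defaultTier
}

func main() {
	samples := []SourceMeta{
		{URL: "https://www.accessdata.fda.gov/labels"},
		{DOI: "10.1056/NEJMoa0000000"},
		{URL: "https://old.reddit.com/r/example"},
		{URL: "https://example.com/post"},
	}
	for _, s := range samples {
		fmt.Printf("%+v -> tier %d\n", s, ClassifyTier(s))
	}
}
```

Keeping the rules in plain maps makes the tier map easy to swap per vertical without touching the matching logic.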

3. Background Gardener

Monitors the knowledge graph for signals that warrant attention.

Responsibilities:

  • Query assertion counts by subject/predicate/tier
  • Detect unusual clustering (e.g., 1,847 Tier-5 assertions in 6 months)
  • Generate escalation assertions when thresholds are crossed
  • Trigger TrustRank decay on schedule
  • Update agent reputations based on outcomes

Episteme primitives it uses:

  • GET /query with aggregation
  • POST /assert for escalation assertions
  • POST /admin/decay-trust-ranks for scheduled decay
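The threshold logic at the heart of a gardener can be sketched as a pure function over query results. Everything named here is an assumption for illustration: the `Observation` type stands in for whatever shape your client decodes `GET /query` responses into, and the window and threshold values are arbitrary. Submitting the resulting escalations via `POST /assert` is left out.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Observation is a hypothetical, flattened view of one assertion from a query.
type Observation struct {
	Subject   string
	Predicate string
	Tier      int
	At        time.Time
}

// ClusterKey groups observations by subject, predicate, and tier.
type ClusterKey struct {
	Subject   string
	Predicate string
	Tier      int
}

// Escalation records a group whose count reached the threshold.
type Escalation struct {
	Key   ClusterKey
	Count int
}

// DetectClusters counts observations per key inside (now-window, now] and
// returns the groups with at least threshold members, largest first.
func DetectClusters(obs []Observation, now time.Time, window time.Duration, threshold int) []Escalation {
	cutoff := now.Add(-window)
	counts := make(map[ClusterKey]int)
	for _, o := range obs {
		if o.At.After(cutoff) && !o.At.After(now) {
			counts[ClusterKey{o.Subject, o.Predicate, o.Tier}]++
		}
	}
	var out []Escalation
	for k, n := range counts {
		if n >= threshold {
			out = append(out, Escalation{Key: k, Count: n})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Key.Subject < out[j].Key.Subject
	})
	return out
}

// refNow, sixMonths, and sampleObservations are fixed demo data.
var refNow = time.Date(2026, 2, 1, 0, 0, 0, 0, time.UTC)

const sixMonths = 182 * 24 * time.Hour

func sampleObservations() []Observation {
	var obs []Observation
	// Five recent Tier-5 reports about the same subject and predicate.
	for i := 0; i < 5; i++ {
		obs = append(obs, Observation{"drug-x", "causes-nausea", 5, refNow.AddDate(0, 0, -i)})
	}
	// One stale report outside the window and one unrelated Tier-1 assertion.
	obs = append(obs, Observation{"drug-x", "causes-nausea", 5, refNow.AddDate(-1, 0, 0)})
	obs = append(obs, Observation{"drug-y", "treats-migraine", 1, refNow.AddDate(0, 0, -3)})
	return obs
}

func main() {
	for _, e := range DetectClusters(sampleObservations(), refNow, sixMonths, 5) {
		fmt.Printf("escalate %s %s tier=%d count=%d\n", e.Key.Subject, e.Key.Predicate, e.Key.Tier, e.Count)
	}
}
```

Passing `now` in as a parameter, rather than calling `time.Now()` inside the function, keeps the detection logic deterministic and easy to test against historical data.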

4. Presentation Layer

Renders Episteme query results for end users.

Options:

  • Dashboard: React/Vue app that visualizes conflict scores, tier positions, timelines
  • API Gateway: REST/GraphQL layer that adds business logic before returning data
  • Chat Interface: LLM-powered conversational access to the knowledge graph
  • Reports: Scheduled exports (PDF, email digests)

Episteme primitives it uses:

  • GET /query with various lenses
  • Layered Consensus for per-tier breakdown
  • Skeptic Lens for disagreement surfacing
  • as_of for time-travel views
  • since for change tracking
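A presentation layer mostly varies the parameters it sends to `GET /query`. The sketch below builds such a URL with Go's `net/url` package. Only `as_of` and `since` are named in this document. The `subject`, `predicate`, and `lens` parameter names, the lowercase lens value, the RFC 3339 timestamp format, and the base URL are assumptions to verify against the API reference.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"time"
)

// QueryOptions is a hypothetical set of inputs for one query.
type QueryOptions struct {
	Subject   string
	Predicate string
	Lens      string
	AsOf      time.Time // zero value: query the current state
	Since     time.Time // zero value: no change-tracking filter
}

// BuildQueryURL appends /query and the encoded parameters to the base URL.
// Optional timestamps are sent only when they are set.
func BuildQueryURL(base string, o QueryOptions) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", fmt.Errorf("parse base URL: %w", err)
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/query"

	q := url.Values{}
	q.Set("subject", o.Subject)
	q.Set("predicate", o.Predicate)
	q.Set("lens", o.Lens)
	if !o.AsOf.IsZero() {
		q.Set("as_of", o.AsOf.UTC().Format(time.RFC3339))
	}
	if !o.Since.IsZero() {
		q.Set("since", o.Since.UTC().Format(time.RFC3339))
	}
	u.RawQuery = q.Encode() // Encode escapes values and sorts by key
	return u.String(), nil
}

// exampleAsOf is a fixed timestamp for the time-travel demo.
var exampleAsOf = time.Date(2025, 6, 1, 0, 0, 0, 0, time.UTC)

func main() {
	current, err := BuildQueryURL("http://localhost:8080", QueryOptions{
		Subject: "drug-x", Predicate: "causes", Lens: "skeptic",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(current)

	historical, err := BuildQueryURL("http://localhost:8080", QueryOptions{
		Subject: "drug-x", Predicate: "causes", Lens: "skeptic", AsOf: exampleAsOf,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(historical)
}
```

A dashboard could call the same builder twice, once without `as_of` and once with it, and diff the two responses to render a "what changed" view.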

5. Agent Integration

Connects AI agents to Episteme for reading and writing knowledge.

ADK-Go Tool Example:

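import (
    "context"
    "fmt"
)

// episteme, QueryParams, AssertionRequest, ObjectText, and formatForAgent are
// assumed to come from the SDK client and the surrounding application.

// QueryKnowledge is an agent tool that queries the knowledge graph,
// resolving conflicting assertions with the named lens.
func QueryKnowledge(ctx context.Context, subject, predicate, lens string) (string, error) {
    resp, err := episteme.Query(ctx, &QueryParams{
        Subject:   subject,
        Predicate: predicate,
        Lens:      lens,
    })
    if err != nil {
        return "", fmt.Errorf("query %q %q: %w", subject, predicate, err)
    }
    return formatForAgent(resp), nil
}

// AssertFact is an agent tool that asserts a new fact. The assertion is
// submitted in the "Proposed" lifecycle stage.
func AssertFact(ctx context.Context, subject, predicate, object string, confidence float32) error {
    return episteme.Assert(ctx, &AssertionRequest{
        Subject:    subject,
        Predicate:  predicate,
        Object:     ObjectText(object),
        Confidence: confidence,
        Lifecycle:  "Proposed",
    })
}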

See: ADK-Go Integration Guide


Vertical-Specific Guides

| Vertical | Guide | Key Components |
| --- | --- | --- |
| Consumer Health | consumer-health.md | Pharma crawlers, tier classification, disagreement dashboard |
| Financial Due Diligence | (planned) | SEC filings, analyst extraction, invalidation cascades |
| Agile Agent Team | (planned) | Constraints lens, lifecycle workflow, audit trail |

Quick Reference: What Goes Where

| If you need to... | Episteme provides... | You build... |
| --- | --- | --- |
| Store conflicting facts | Assertion type, append-only DAG | Nothing; just POST assertions |
| Resolve conflicts | Lenses (Recency, Consensus, Skeptic, Layered) | Lens selection logic |
| Query historical state | as_of parameter | Time-travel UI |
| Track changes | since parameter + MV changelog | Notification system |
| Weight by source authority | source_class field + source-aware decay | Tier classifier |
| Detect emerging signals | Skeptic Lens + conflict_score | Gardener (threshold logic) |
| Show per-tier consensus | Layered Consensus Lens | Dashboard UI |
| Extract claims from papers | Nothing (pre-assertion transform) | NLP pipeline |
| Sign assertions | Signature verification | Agent wallet / key management |
| Generate summaries | Structured query responses | LLM summarizer |

See Also