jordan ad07a75d0a feat: add source content to source registry, signed assertions, feed endpoint, dashboard enhancements
- Add `content: Option<String>` to SourceRecord with rkyv schema evolution
  (LegacySourceRecord compat deserializer for backward compatibility)
- Add MAX_SOURCE_CONTENT_LEN (1MB) limit with API validation
- Strip content from list responses, include in single-source GET
- Update Go SDK RegisterSourceRequest with Content field
- FCM pipeline extracts PDF text via pdftotext and passes to registration
- Dashboard impact panel fetches and displays source content with expand/collapse
- Add feed endpoint, dashboard feed panel, and signed assertion support
- Update data-structures.md, API docs, and storage docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-19 21:54:27 -07:00

stemedb-ontology

Domain Ontology Layer for Episteme. It defines how subjects are structured based on predicate type and domain, so that claims from different sources about the same thing map to the same subject and their conflicts collide correctly.

Module Overview

Module        Purpose
domain.rs     Domain, EntityType, PredicateSchema, SourceTier builders
subject.rs    SubjectBuilder for canonical subject construction
validator.rs  Validates assertions against domain rules
client.rs     HTTP client for the StemeDB API
dto/          Request/response DTOs for API communication
pharma/       Pharmaceutical domain (reference implementation)
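To illustrate what "validating against domain rules" can involve, here is a minimal, self-contained sketch. It is not the crate's `Validator`: the `Category` enum, the `SketchValidator` type, and the segment-count rule are hypothetical, modeled only on the subject patterns described in this README.

```rust
use std::collections::HashMap;

/// Hypothetical predicate categories mirroring the subject patterns in this README.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Efficacy,   // {Drug}:{Indication}
    Safety,     // {Drug}
    Mechanism,  // {Drug}:{Target}
    Comparison, // {Drug}:{Comparator}:{Indication}
}

impl Category {
    /// Number of ':'-separated segments a subject must have for this category.
    fn segments(self) -> usize {
        match self {
            Category::Safety => 1,
            Category::Efficacy | Category::Mechanism => 2,
            Category::Comparison => 3,
        }
    }
}

/// Hypothetical validator: checks that the predicate is known and that the
/// subject has the shape its category requires.
struct SketchValidator {
    predicates: HashMap<&'static str, Category>,
}

impl SketchValidator {
    fn validate(&self, predicate: &str, subject: &str) -> Result<(), String> {
        let category = self
            .predicates
            .get(predicate)
            .ok_or_else(|| format!("unknown predicate: {predicate}"))?;
        let parts: Vec<&str> = subject.split(':').collect();
        if parts.iter().any(|p| p.is_empty()) {
            return Err(format!("empty segment in subject: {subject}"));
        }
        if parts.len() != category.segments() {
            return Err(format!(
                "{predicate} expects {} segment(s), got {} in {subject}",
                category.segments(),
                parts.len()
            ));
        }
        Ok(())
    }
}

fn main() {
    let validator = SketchValidator {
        predicates: HashMap::from([
            ("hba1c_reduction_percent", Category::Efficacy),
            ("nausea_rate", Category::Safety),
        ]),
    };
    assert!(validator
        .validate("hba1c_reduction_percent", "Semaglutide:Type2Diabetes")
        .is_ok());
    // A safety predicate must not be scoped to an indication.
    assert!(validator.validate("nausea_rate", "Semaglutide:Type2Diabetes").is_err());
    assert!(validator.validate("unknown_predicate", "Semaglutide").is_err());
}
```

Checking shape per category is what keeps a safety claim from being filed under an indication-specific subject, where it would never collide with other safety reports.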

Quick Start

CLI Usage (steme-pharma)

# Build the CLI
cargo build --release -p stemedb-ontology

# Ingest FDA label data
./target/release/steme-pharma ingest semaglutide,tirzepatide

# Ingest with mock conflicts for testing
./target/release/steme-pharma ingest semaglutide --with-conflicts

# Query conflicts (Skeptic lens - default)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" hba1c_reduction_percent

# Query with source hierarchy (Layered Consensus)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" weight_loss_percent --mode layered

# Compare two drugs
./target/release/steme-pharma compare \
    "Semaglutide:Type2Diabetes" \
    "Tirzepatide:Type2Diabetes" \
    --predicate hba1c_reduction_percent

# Explore available predicates for a subject
./target/release/steme-pharma explore "Semaglutide:Type2Diabetes"

# Validate a subject/predicate combination
./target/release/steme-pharma validate "Semaglutide:Type2Diabetes" hba1c_reduction_percent

# JSON output (for scripting)
./target/release/steme-pharma --format json query "Semaglutide" nausea_rate

Programmatic Usage

use std::collections::HashMap;

use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use stemedb_ontology::client::StemeClient;
use stemedb_ontology::pharma::extractors::{FdaLabelExtractor, MedicalExtractor, SourceInput};
use stemedb_ontology::{pharma, SubjectBuilder, Validator};

// The client and extractors are async; this example assumes tokio as the runtime.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the pharma domain definition
    let domain = pharma::definition();

    // Build an efficacy subject ({Drug}:{Indication}) using the ontology
    let schema = domain.get_schema("efficacy").unwrap();
    let mut entities = HashMap::new();
    entities.insert("Drug".to_string(), "Semaglutide".to_string());
    entities.insert("Indication".to_string(), "Type2Diabetes".to_string());
    let subject = SubjectBuilder::build(schema, &entities, &domain).unwrap();
    assert_eq!(subject, "Semaglutide:Type2Diabetes");

    // Validate assertions against domain rules
    let validator = Validator::new(&domain);
    let result = validator.validate("hba1c_reduction_percent", &subject, 0.95);
    assert!(result.is_ok());

    // Extract claims from the FDA label and ingest them as signed assertions
    let client = StemeClient::new("http://localhost:18180");
    let extractor = FdaLabelExtractor::new();
    let signing_key = SigningKey::generate(&mut OsRng);
    let agent_id = signing_key.verifying_key().to_bytes();
    let hlc = uhlc::HLCBuilder::new().build();

    let claims = extractor.extract(&SourceInput::DrugName("semaglutide".into())).await?;
    for claim in claims {
        let assertion = claim.to_assertion(&signing_key, agent_id, &hlc);
        let hash = client.assert(&assertion).await?;
        println!("Ingested: {}", hash);
    }

    // Query for conflicts with the Skeptic lens
    let skeptic = client.skeptic("Semaglutide:Type2Diabetes", "hba1c_reduction_percent").await?;
    println!("Conflict score: {}", skeptic.conflict_score);

    Ok(())
}

Architecture

                  ┌─────────────────────────────────────┐
                  │           Domain Definition         │
                  │  (EntityTypes, Schemas, Hierarchy)  │
                  └──────────────┬──────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         v                       v                       v
┌─────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│  SubjectBuilder │   │    Validator     │   │  MedicalExtractor│
│                 │   │                  │   │    (trait)       │
│ Build canonical │   │ Validate against │   │ Extract claims   │
│ subject strings │   │ domain rules     │   │ from sources     │
└────────┬────────┘   └────────┬─────────┘   └────────┬─────────┘
         │                     │                      │
         └──────────────┬──────┴──────────────────────┘
                        │
                        v
              ┌───────────────────┐
              │    StemeClient    │
              │                   │
              │ Submit assertions │
              │ Query with lenses │
              └─────────┬─────────┘
                        │
                        v
              ┌───────────────────┐
              │    StemeDB API    │
              │  :18180/v1/*      │
              └───────────────────┘
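The flow in the diagram can be sketched end to end with stand-ins. Everything below is hypothetical: `Extractor` plays the role of the `MedicalExtractor` trait, and an in-memory `Store` stands in for `StemeClient` and the StemeDB API. The drug, indication, source names, and values are arbitrary placeholders, not real data. What it shows is the core idea: once claims from different sources are built into canonical subjects, they land on the same (subject, predicate) key, which is where disagreement becomes visible.

```rust
use std::collections::BTreeMap;

/// A claim as an extractor might emit it (hypothetical shape).
#[derive(Debug, Clone)]
struct Claim {
    drug: String,
    indication: String,
    predicate: String,
    value: f64,
    source: String,
}

/// Stand-in for the MedicalExtractor trait: turns some source into claims.
trait Extractor {
    fn extract(&self) -> Vec<Claim>;
}

/// A fixed-output extractor used for demonstration.
struct FixedExtractor(Vec<Claim>);

impl Extractor for FixedExtractor {
    fn extract(&self) -> Vec<Claim> {
        self.0.clone()
    }
}

/// Stand-in for StemeClient + StemeDB: (source, value) pairs grouped by (subject, predicate).
#[derive(Default)]
struct Store {
    by_key: BTreeMap<(String, String), Vec<(String, f64)>>,
}

impl Store {
    fn submit(&mut self, claim: &Claim) {
        // Canonical efficacy subject: {Drug}:{Indication}
        let subject = format!("{}:{}", claim.drug, claim.indication);
        self.by_key
            .entry((subject, claim.predicate.clone()))
            .or_default()
            .push((claim.source.clone(), claim.value));
    }

    /// Simplification: a key counts as conflicting if its values are not all equal.
    fn conflicting_keys(&self) -> Vec<&(String, String)> {
        self.by_key
            .iter()
            .filter(|(_, vals)| vals.windows(2).any(|w| w[0].1 != w[1].1))
            .map(|(key, _)| key)
            .collect()
    }
}

/// Placeholder claim for a hypothetical DrugA/IndicationB efficacy predicate.
fn claim(source: &str, value: f64) -> Claim {
    Claim {
        drug: "DrugA".into(),
        indication: "IndicationB".into(),
        predicate: "hba1c_reduction_percent".into(),
        value,
        source: source.into(),
    }
}

fn main() {
    let extractors: Vec<Box<dyn Extractor>> = vec![
        Box::new(FixedExtractor(vec![claim("source_a", 1.5)])),
        Box::new(FixedExtractor(vec![claim("source_b", 3.0)])),
    ];
    let mut store = Store::default();
    for ex in &extractors {
        for c in ex.extract() {
            store.submit(&c);
        }
    }
    // Both sources collide on the same key, so the disagreement is detected.
    println!("{:?}", store.conflicting_keys());
}
```

The real system scores conflicts through lenses such as Skeptic rather than a simple equality check; the sketch only isolates why canonical subjects matter.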

Subject Patterns

Each predicate category uses its own subject structure, so that claims about the same thing from different sources collide on the same subject:

Category    Pattern                             Example                                 Use Case
Efficacy    {Drug}:{Indication}                 Semaglutide:Type2Diabetes               Outcome measures for specific conditions
Safety      {Drug}                              Semaglutide                             Adverse events (apply across indications)
Mechanism   {Drug}:{Target}                     Semaglutide:GLP1R                       Pharmacology details
Comparison  {Drug}:{Comparator}:{Indication}    Semaglutide:Tirzepatide:Type2Diabetes   Head-to-head trials
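A minimal sketch of how a pattern such as `{Drug}:{Indication}` can be filled from named entities. The `fill_pattern` function is hypothetical and much simpler than the crate's `SubjectBuilder` (no entity-type checks or name normalization); it only shows how each category yields a different subject shape from the same set of entities.

```rust
use std::collections::HashMap;

/// Replace each `{Name}` placeholder in `pattern` with `entities["Name"]`.
/// Returns an error if a placeholder has no matching entity or is unclosed.
fn fill_pattern(pattern: &str, entities: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(start) = rest.find('{') {
        // Copy literal text (e.g. ':') before the placeholder.
        out.push_str(&rest[..start]);
        let end = rest[start..]
            .find('}')
            .map(|i| start + i)
            .ok_or_else(|| format!("unclosed placeholder in {pattern}"))?;
        let name = &rest[start + 1..end];
        let value = entities
            .get(name)
            .ok_or_else(|| format!("missing entity: {name}"))?;
        out.push_str(value);
        rest = &rest[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    let entities = HashMap::from([
        ("Drug", "Semaglutide"),
        ("Indication", "Type2Diabetes"),
        ("Target", "GLP1R"),
    ]);
    for pattern in ["{Drug}:{Indication}", "{Drug}", "{Drug}:{Target}"] {
        println!("{pattern} -> {:?}", fill_pattern(pattern, &entities));
    }
    // Comparison needs a Comparator entity, which is absent here.
    assert!(fill_pattern("{Drug}:{Comparator}:{Indication}", &entities).is_err());
}
```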

Source Hierarchy

Claims are weighted by source authority:

Tier  Source Class   Weight  Examples
0     Regulatory     1.0     FDA Labels, EMA Reports
1     Clinical       0.9     Phase III RCTs, Lancet, NEJM
2     Observational  0.7     Real-World Evidence, FAERS
3     Expert         0.5     Guidelines, ADA Standards
4     Community      0.3     PatientsLikeMe, Moderated Forums
5     Anecdotal      0.1     Reddit, Twitter, Blog Posts
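The tier table can be encoded directly. The sketch below maps each tier to the weight listed above and, as one illustrative use of those weights, computes a weighted mean of numeric claims. The `Tier` enum and `weighted_mean` are hypothetical; this README does not specify how the crate or the Layered Consensus lens actually combines weights, and the example values are arbitrary.

```rust
/// Source tiers from the hierarchy table, most to least authoritative.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Regulatory,    // 0
    Clinical,      // 1
    Observational, // 2
    Expert,        // 3
    Community,     // 4
    Anecdotal,     // 5
}

impl Tier {
    /// Authority weight for this tier, as listed in the table.
    fn weight(self) -> f64 {
        match self {
            Tier::Regulatory => 1.0,
            Tier::Clinical => 0.9,
            Tier::Observational => 0.7,
            Tier::Expert => 0.5,
            Tier::Community => 0.3,
            Tier::Anecdotal => 0.1,
        }
    }
}

/// Weighted mean of (tier, value) claims; None if there are no claims.
fn weighted_mean(claims: &[(Tier, f64)]) -> Option<f64> {
    let total_weight: f64 = claims.iter().map(|(t, _)| t.weight()).sum();
    if total_weight == 0.0 {
        return None;
    }
    let weighted_sum: f64 = claims.iter().map(|(t, v)| t.weight() * v).sum();
    Some(weighted_sum / total_weight)
}

fn main() {
    // Arbitrary illustrative values, not real data.
    let claims = [(Tier::Regulatory, 2.0), (Tier::Anecdotal, 10.0)];
    let mean = weighted_mean(&claims).unwrap();
    // (1.0 * 2.0 + 0.1 * 10.0) / (1.0 + 0.1) = 3.0 / 1.1
    println!("weighted mean = {mean:.3}");
}
```

With these weights, a single regulatory claim outweighs ten anecdotal ones, which is the intent of the hierarchy.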

Adding a New Domain

See the Adding a Domain guide for step-by-step instructions on implementing a new domain (e.g., cardiology or finance).

Testing

# Run all ontology tests
cargo test -p stemedb-ontology

# Run with output
cargo test -p stemedb-ontology -- --nocapture

# Consumer Health UAT
cargo test -p stemedb-ontology --test consumer_health_uat