stemedb-ontology

Domain ontology layer for Episteme. It defines how subject strings are structured for each predicate type and domain, so that assertions from different sources about the same thing resolve to the same subject and their conflicts can be detected.

Module Overview

| Module | Purpose |
|--------|---------|
| domain.rs | Domain, EntityType, PredicateSchema, SourceTier builders |
| subject.rs | SubjectBuilder for canonical subject construction |
| validator.rs | Validates assertions against domain rules |
| client.rs | HTTP client for StemeDB API |
| dto/ | Request/response DTOs for API communication |
| pharma/ | Pharmaceutical domain (reference implementation) |

Quick Start

CLI Usage (steme-pharma)

# Build the CLI
cargo build --release -p stemedb-ontology

# Ingest FDA label data
./target/release/steme-pharma ingest semaglutide,tirzepatide

# Ingest with mock conflicts for testing
./target/release/steme-pharma ingest semaglutide --with-conflicts

# Query conflicts (Skeptic lens - default)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" hba1c_reduction_percent

# Query with source hierarchy (Layered Consensus)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" weight_loss_percent --mode layered

# Compare two drugs
./target/release/steme-pharma compare \
    "Semaglutide:Type2Diabetes" \
    "Tirzepatide:Type2Diabetes" \
    --predicate hba1c_reduction_percent

# Explore available predicates for a subject
./target/release/steme-pharma explore "Semaglutide:Type2Diabetes"

# Validate a subject/predicate combination
./target/release/steme-pharma validate "Semaglutide:Type2Diabetes" hba1c_reduction_percent

# JSON output (for scripting)
./target/release/steme-pharma --format json query "Semaglutide" nausea_rate

Programmatic Usage

use stemedb_ontology::{pharma, SubjectBuilder, Validator};
use stemedb_ontology::client::StemeClient;
use stemedb_ontology::pharma::extractors::{FdaLabelExtractor, MedicalExtractor, SourceInput};
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;

// Load the pharma domain definition
let domain = pharma::definition();

// Build a subject using the ontology
let schema = domain.get_schema("efficacy").unwrap();
let mut entities = std::collections::HashMap::new();
entities.insert("Drug".to_string(), "Semaglutide".to_string());
entities.insert("Indication".to_string(), "Type2Diabetes".to_string());
let subject = SubjectBuilder::build(schema, &entities, &domain).unwrap();
assert_eq!(subject, "Semaglutide:Type2Diabetes");

// Validate assertions
let validator = Validator::new(&domain);
let result = validator.validate("hba1c_reduction_percent", &subject, 0.95);
assert!(result.is_ok());

// Extract and ingest claims.
// Note: the `.await?` calls below must run inside an async function that returns a `Result`.
let client = StemeClient::new("http://localhost:18180");
let extractor = FdaLabelExtractor::new();
let signing_key = SigningKey::generate(&mut OsRng);
let agent_id = signing_key.verifying_key().to_bytes();
let hlc = uhlc::HLCBuilder::new().build();

let claims = extractor.extract(&SourceInput::DrugName("semaglutide".into())).await?;
for claim in claims {
    let assertion = claim.to_assertion(&signing_key, agent_id, &hlc);
    let hash = client.assert(&assertion).await?;
    println!("Ingested: {}", hash);
}

// Query for conflicts
let skeptic = client.skeptic("Semaglutide:Type2Diabetes", "hba1c_reduction_percent").await?;
println!("Conflict score: {}", skeptic.conflict_score);

Architecture

                  ┌─────────────────────────────────────┐
                  │           Domain Definition         │
                  │  (EntityTypes, Schemas, Hierarchy)  │
                  └──────────────┬──────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         v                       v                       v
┌─────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│  SubjectBuilder │   │    Validator     │   │  MedicalExtractor│
│                 │   │                  │   │    (trait)       │
│ Build canonical │   │ Validate against │   │ Extract claims   │
│ subject strings │   │ domain rules     │   │ from sources     │
└────────┬────────┘   └────────┬─────────┘   └────────┬─────────┘
         │                     │                      │
         └──────────────┬──────┴──────────────────────┘
                        │
                        v
              ┌───────────────────┐
              │    StemeClient    │
              │                   │
              │ Submit assertions │
              │ Query with lenses │
              └─────────┬─────────┘
                        │
                        v
              ┌───────────────────┐
              │    StemeDB API    │
              │  :18180/v1/*      │
              └───────────────────┘
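
The data flow in the diagram can be sketched with plain traits. This is a minimal, std-only illustration and not the crate's API: `Claim`, `Extractor`, `Sink`, `ingest`, `StaticExtractor`, and `MemorySink` are hypothetical names, the extractor is synchronous for brevity, and validation is reduced to a check that the predicate is known to the domain.

```rust
/// A simplified claim; the crate's real claim type is not shown in this README.
#[derive(Debug, Clone, PartialEq)]
struct Claim {
    subject: String,
    predicate: String,
    value: f64,
}

/// Stand-in for the extractor role (hypothetical, synchronous for brevity).
trait Extractor {
    fn extract(&self, source: &str) -> Vec<Claim>;
}

/// Stand-in for the client role: accepts claims that passed validation.
trait Sink {
    fn submit(&mut self, claim: &Claim) -> Result<(), String>;
}

/// Extract claims, drop those whose predicate the domain does not define,
/// and submit the rest. Returns the number of claims submitted.
fn ingest<E: Extractor, S: Sink>(
    extractor: &E,
    known_predicates: &[&str],
    sink: &mut S,
    source: &str,
) -> Result<usize, String> {
    let mut submitted = 0;
    for claim in extractor.extract(source) {
        if !known_predicates.contains(&claim.predicate.as_str()) {
            continue; // fails validation: predicate not in the domain
        }
        sink.submit(&claim)?;
        submitted += 1;
    }
    Ok(submitted)
}

/// In-memory extractor that returns a fixed list of claims.
struct StaticExtractor(Vec<Claim>);

impl Extractor for StaticExtractor {
    fn extract(&self, _source: &str) -> Vec<Claim> {
        self.0.clone()
    }
}

/// In-memory sink that records every claim it receives.
#[derive(Default)]
struct MemorySink {
    submitted: Vec<Claim>,
}

impl Sink for MemorySink {
    fn submit(&mut self, claim: &Claim) -> Result<(), String> {
        self.submitted.push(claim.clone());
        Ok(())
    }
}
```

Keeping extraction, validation, and submission behind separate interfaces means each stage can be swapped for an in-memory stand-in in tests, as `StaticExtractor` and `MemorySink` do here.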

Subject Patterns

Each predicate category uses its own subject structure, so that claims about the same thing from different sources resolve to the same subject string and collide:

| Category | Pattern | Example | Use Case |
|----------|---------|---------|----------|
| Efficacy | {Drug}:{Indication} | Semaglutide:Type2Diabetes | Outcome measures for specific conditions |
| Safety | {Drug} | Semaglutide | Adverse events (apply across indications) |
| Mechanism | {Drug}:{Target} | Semaglutide:GLP1R | Pharmacology details |
| Comparison | {Drug}:{Comparator}:{Indication} | Semaglutide:Tirzepatide:Type2Diabetes | Head-to-head trials |
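
To make the patterns concrete, here is a small std-only sketch of how a pattern such as `{Drug}:{Indication}` could be filled from an entity map. `build_subject` is a hypothetical helper written for this illustration. It is not the crate's `SubjectBuilder`, whose internals are not shown here.

```rust
use std::collections::HashMap;

/// Fill a subject pattern such as "{Drug}:{Indication}" from an entity map.
/// Hypothetical helper for illustration; not the crate's `SubjectBuilder`.
fn build_subject(
    pattern: &str,
    entities: &HashMap<String, String>,
) -> Result<String, String> {
    let mut parts = Vec::new();
    for slot in pattern.split(':') {
        // Each slot is expected to look like "{EntityType}".
        let name = slot
            .strip_prefix('{')
            .and_then(|s| s.strip_suffix('}'))
            .ok_or_else(|| format!("malformed slot: {slot}"))?;
        // Every entity type named by the pattern must be supplied.
        let value = entities
            .get(name)
            .ok_or_else(|| format!("missing entity: {name}"))?;
        parts.push(value.as_str());
    }
    Ok(parts.join(":"))
}
```

With `Drug = Semaglutide` and `Indication = Type2Diabetes`, the efficacy pattern yields `Semaglutide:Type2Diabetes` and the safety pattern yields `Semaglutide`. Two sources reporting an adverse event therefore land on the same subject regardless of indication, while a pattern that names an entity type missing from the map returns an error.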

Source Hierarchy

Claims are weighted by source authority:

| Tier | Source Class | Weight | Examples |
|------|--------------|--------|----------|
| 0 | Regulatory | 1.0 | FDA Labels, EMA Reports |
| 1 | Clinical | 0.9 | Phase III RCTs, Lancet, NEJM |
| 2 | Observational | 0.7 | Real-World Evidence, FAERS |
| 3 | Expert | 0.5 | Guidelines, ADA Standards |
| 4 | Community | 0.3 | PatientsLikeMe, Moderated Forums |
| 5 | Anecdotal | 0.1 | Reddit, Twitter, Blog Posts |
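
One way to picture the weights is as coefficients in an authority-weighted mean. The sketch below is an assumption made for illustration: the enum mirrors the table, but this README does not specify how the query lenses combine tiers, and `weighted_mean` is a hypothetical function, not part of the crate.

```rust
/// Source tiers from the table above, most to least authoritative.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SourceTier {
    Regulatory,
    Clinical,
    Observational,
    Expert,
    Community,
    Anecdotal,
}

impl SourceTier {
    /// Weight assigned to each tier, copied from the table.
    fn weight(self) -> f64 {
        match self {
            SourceTier::Regulatory => 1.0,
            SourceTier::Clinical => 0.9,
            SourceTier::Observational => 0.7,
            SourceTier::Expert => 0.5,
            SourceTier::Community => 0.3,
            SourceTier::Anecdotal => 0.1,
        }
    }
}

/// Authority-weighted mean of numeric claims: sum(w_i * v_i) / sum(w_i).
/// Assumed aggregation rule; returns None when there are no claims.
fn weighted_mean(claims: &[(SourceTier, f64)]) -> Option<f64> {
    let total: f64 = claims.iter().map(|(tier, _)| tier.weight()).sum();
    if total == 0.0 {
        return None;
    }
    let sum: f64 = claims
        .iter()
        .map(|(tier, value)| tier.weight() * value)
        .sum();
    Some(sum / total)
}
```

Under this rule a regulatory claim counts ten times as much as an anecdotal one, so a single low-tier outlier shifts the aggregate only slightly.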

Adding a New Domain

See the Adding a Domain Guide for step-by-step instructions on implementing new domains (e.g., cardiology, finance).

Testing

# Run all ontology tests
cargo test -p stemedb-ontology

# Run with output
cargo test -p stemedb-ontology -- --nocapture

# Consumer Health UAT
cargo test -p stemedb-ontology --test consumer_health_uat