stemedb-ontology
Domain ontology layer for Episteme. It defines how subjects are structured based on predicate type and domain, so that claims from different sources about the same thing resolve to the same subject and their conflicts can be detected.
Module Overview
| Module | Purpose |
|---|---|
| domain.rs | Domain, EntityType, PredicateSchema, SourceTier builders |
| subject.rs | SubjectBuilder for canonical subject construction |
| validator.rs | Validates assertions against domain rules |
| client.rs | HTTP client for StemeDB API |
| dto/ | Request/response DTOs for API communication |
| pharma/ | Pharmaceutical domain (reference implementation) |
Quick Start
CLI Usage (steme-pharma)
# Build the CLI
cargo build --release -p stemedb-ontology
# Ingest FDA label data
./target/release/steme-pharma ingest semaglutide,tirzepatide
# Ingest with mock conflicts for testing
./target/release/steme-pharma ingest semaglutide --with-conflicts
# Query conflicts (Skeptic lens - default)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" hba1c_reduction_percent
# Query with source hierarchy (Layered Consensus)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" weight_loss_percent --mode layered
# Compare two drugs
./target/release/steme-pharma compare \
"Semaglutide:Type2Diabetes" \
"Tirzepatide:Type2Diabetes" \
--predicate hba1c_reduction_percent
# Explore available predicates for a subject
./target/release/steme-pharma explore "Semaglutide:Type2Diabetes"
# Validate a subject/predicate combination
./target/release/steme-pharma validate "Semaglutide:Type2Diabetes" hba1c_reduction_percent
# JSON output (for scripting)
./target/release/steme-pharma --format json query "Semaglutide" nausea_rate
Programmatic Usage
use stemedb_ontology::{pharma, SubjectBuilder, Validator};
use stemedb_ontology::client::StemeClient;
use stemedb_ontology::pharma::extractors::{FdaLabelExtractor, MedicalExtractor, SourceInput};
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
// Load the pharma domain definition
let domain = pharma::definition();
// Build a subject using the ontology
let schema = domain.get_schema("efficacy").unwrap();
let mut entities = std::collections::HashMap::new();
entities.insert("Drug".to_string(), "Semaglutide".to_string());
entities.insert("Indication".to_string(), "Type2Diabetes".to_string());
let subject = SubjectBuilder::build(schema, &entities, &domain).unwrap();
assert_eq!(subject, "Semaglutide:Type2Diabetes");
// Validate assertions
let validator = Validator::new(&domain);
let result = validator.validate("hba1c_reduction_percent", &subject, 0.95);
assert!(result.is_ok());
// Extract and ingest claims (the `.await?` calls below must run inside an async fn that returns a Result)
let client = StemeClient::new("http://localhost:18180");
let extractor = FdaLabelExtractor::new();
let signing_key = SigningKey::generate(&mut OsRng);
let agent_id = signing_key.verifying_key().to_bytes();
let hlc = uhlc::HLCBuilder::new().build();
let claims = extractor.extract(&SourceInput::DrugName("semaglutide".into())).await?;
for claim in claims {
    let assertion = claim.to_assertion(&signing_key, agent_id, &hlc);
    let hash = client.assert(&assertion).await?;
    println!("Ingested: {}", hash);
}
// Query for conflicts
let skeptic = client.skeptic("Semaglutide:Type2Diabetes", "hba1c_reduction_percent").await?;
println!("Conflict score: {}", skeptic.conflict_score);
Architecture
               ┌─────────────────────────────────────┐
               │          Domain Definition          │
               │  (EntityTypes, Schemas, Hierarchy)  │
               └──────────────────┬──────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          │                       │                       │
          v                       v                       v
┌───────────────────┐   ┌───────────────────┐   ┌───────────────────┐
│  SubjectBuilder   │   │     Validator     │   │ MedicalExtractor  │
│                   │   │                   │   │      (trait)      │
│ Build canonical   │   │ Validate against  │   │ Extract claims    │
│ subject strings   │   │ domain rules      │   │ from sources      │
└─────────┬─────────┘   └─────────┬─────────┘   └─────────┬─────────┘
          │                       │                       │
          └───────────────────────┼───────────────────────┘
                                  │
                                  v
                        ┌───────────────────┐
                        │    StemeClient    │
                        │                   │
                        │ Submit assertions │
                        │ Query with lenses │
                        └─────────┬─────────┘
                                  │
                                  v
                        ┌───────────────────┐
                        │    StemeDB API    │
                        │    :18180/v1/*    │
                        └───────────────────┘
Subject Patterns
Different predicate types use different subject structures, so that assertions about the same thing produce the same subject string and collide:
| Category | Pattern | Example | Use Case |
|---|---|---|---|
| Efficacy | {Drug}:{Indication} | Semaglutide:Type2Diabetes | Outcome measures for specific conditions |
| Safety | {Drug} | Semaglutide | Adverse events (apply across indications) |
| Mechanism | {Drug}:{Target} | Semaglutide:GLP1R | Pharmacology details |
| Comparison | {Drug}:{Comparator}:{Indication} | Semaglutide:Tirzepatide:Type2Diabetes | Head-to-head trials |
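To make the patterns concrete, here is a minimal sketch of how a pattern string could be filled from an entity map. The `render_subject` helper is hypothetical and written only for illustration. It is not the crate's `SubjectBuilder`, which may resolve patterns differently.

```rust
use std::collections::HashMap;

/// Fill a subject pattern such as "{Drug}:{Indication}" from an entity map.
/// Returns None if a placeholder has no matching entity or a brace is unclosed.
/// Hypothetical helper for illustration; not the crate's SubjectBuilder.
fn render_subject(pattern: &str, entities: &HashMap<&str, &str>) -> Option<String> {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(open) = rest.find('{') {
        // Copy literal text (for example the ':' separator) before the placeholder.
        out.push_str(&rest[..open]);
        // Locate the closing brace that ends this placeholder.
        let close = open + rest[open..].find('}')?;
        let name = &rest[open + 1..close];
        // Substitute the entity value, or bail out if it is missing.
        out.push_str(entities.get(name)?);
        rest = &rest[close + 1..];
    }
    // Append whatever literal text follows the last placeholder.
    out.push_str(rest);
    Some(out)
}
```

With `Drug = "Semaglutide"` and `Indication = "Type2Diabetes"` in the map, the Efficacy pattern yields `Semaglutide:Type2Diabetes` and the Safety pattern yields `Semaglutide`. Two sources reporting on the same drug and indication therefore end up on the same subject string.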
Source Hierarchy
Claims are weighted by source authority:
| Tier | Source Class | Weight | Examples |
|---|---|---|---|
| 0 | Regulatory | 1.0 | FDA Labels, EMA Reports |
| 1 | Clinical | 0.9 | Phase III RCTs, Lancet, NEJM |
| 2 | Observational | 0.7 | Real-World Evidence, FAERS |
| 3 | Expert | 0.5 | Guidelines, ADA Standards |
| 4 | Community | 0.3 | PatientsLikeMe, Moderated Forums |
| 5 | Anecdotal | 0.1 | Reddit, Twitter, Blog Posts |
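As an illustration of how these weights might be applied, the sketch below computes a weight-averaged value over conflicting claims. The `Tier` enum and `weighted_mean` function are assumptions made for this example. The crate's own `SourceTier` type and its Layered Consensus mode may work quite differently, for instance by preferring the highest tier outright.

```rust
/// Source tiers from the table above (hypothetical enum, not the crate's SourceTier).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Regulatory,
    Clinical,
    Observational,
    Expert,
    Community,
    Anecdotal,
}

impl Tier {
    /// Authority weight for this tier, taken from the table above.
    fn weight(self) -> f64 {
        match self {
            Tier::Regulatory => 1.0,
            Tier::Clinical => 0.9,
            Tier::Observational => 0.7,
            Tier::Expert => 0.5,
            Tier::Community => 0.3,
            Tier::Anecdotal => 0.1,
        }
    }
}

/// Weighted mean of the claimed values; None when there are no claims.
fn weighted_mean(claims: &[(Tier, f64)]) -> Option<f64> {
    let total: f64 = claims.iter().map(|(t, _)| t.weight()).sum();
    if total == 0.0 {
        return None;
    }
    let sum: f64 = claims.iter().map(|(t, v)| t.weight() * v).sum();
    Some(sum / total)
}
```

Under this scheme a regulatory claim of 1.5 and an anecdotal claim of 2.6 combine to roughly 1.6, so the low-authority source shifts the result only slightly.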
Adding a New Domain
See the Adding a Domain Guide for step-by-step instructions on implementing new domains (e.g., cardiology, finance).
Testing
# Run all ontology tests
cargo test -p stemedb-ontology
# Run with output
cargo test -p stemedb-ontology -- --nocapture
# Consumer Health UAT
cargo test -p stemedb-ontology --test consumer_health_uat