stemedb/crates/stemedb-ontology/README.md
jordan bbe6aedc40 feat: Aphoria security extractors + LLM evaluation architecture + ontology docs
2026-02-05 15:22:55 -07:00


# stemedb-ontology
The domain ontology layer for Episteme. It defines how subjects are structured for each predicate type and domain, so that assertions from different sources about the same thing resolve to the same subject and their conflicts can be detected.
## Module Overview
| Module | Purpose |
|--------|---------|
| `domain.rs` | Domain, EntityType, PredicateSchema, SourceTier builders |
| `subject.rs` | SubjectBuilder for canonical subject construction |
| `validator.rs` | Validates assertions against domain rules |
| `client.rs` | HTTP client for StemeDB API |
| `dto/` | Request/response DTOs for API communication |
| `pharma/` | Pharmaceutical domain (reference implementation) |
## Quick Start
### CLI Usage (steme-pharma)
```bash
# Build the CLI
cargo build --release -p stemedb-ontology
# Ingest FDA label data
./target/release/steme-pharma ingest semaglutide,tirzepatide
# Ingest with mock conflicts for testing
./target/release/steme-pharma ingest semaglutide --with-conflicts
# Query conflicts (Skeptic lens - default)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" hba1c_reduction_percent
# Query with source hierarchy (Layered Consensus)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" weight_loss_percent --mode layered
# Compare two drugs
./target/release/steme-pharma compare \
  "Semaglutide:Type2Diabetes" \
  "Tirzepatide:Type2Diabetes" \
  --predicate hba1c_reduction_percent
# Explore available predicates for a subject
./target/release/steme-pharma explore "Semaglutide:Type2Diabetes"
# Validate a subject/predicate combination
./target/release/steme-pharma validate "Semaglutide:Type2Diabetes" hba1c_reduction_percent
# JSON output (for scripting)
./target/release/steme-pharma --format json query "Semaglutide" nausea_rate
```
### Programmatic Usage
```rust
use stemedb_ontology::{pharma, SubjectBuilder, Validator};
use stemedb_ontology::client::StemeClient;
use stemedb_ontology::pharma::extractors::{FdaLabelExtractor, MedicalExtractor, SourceInput};
use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
// Load the pharma domain definition
let domain = pharma::definition();
// Build a subject using the ontology
let schema = domain.get_schema("efficacy").unwrap();
let mut entities = std::collections::HashMap::new();
entities.insert("Drug".to_string(), "Semaglutide".to_string());
entities.insert("Indication".to_string(), "Type2Diabetes".to_string());
let subject = SubjectBuilder::build(schema, &entities, &domain).unwrap();
assert_eq!(subject, "Semaglutide:Type2Diabetes");
// Validate assertions
let validator = Validator::new(&domain);
let result = validator.validate("hba1c_reduction_percent", &subject, 0.95);
assert!(result.is_ok());
// Extract and ingest claims.
// Note: the `.await?` calls below must run inside an async fn that returns a Result.
let client = StemeClient::new("http://localhost:18180");
let extractor = FdaLabelExtractor::new();
let signing_key = SigningKey::generate(&mut OsRng);
let agent_id = signing_key.verifying_key().to_bytes();
let hlc = uhlc::HLCBuilder::new().build();
let claims = extractor.extract(&SourceInput::DrugName("semaglutide".into())).await?;
for claim in claims {
    let assertion = claim.to_assertion(&signing_key, agent_id, &hlc);
    let hash = client.assert(&assertion).await?;
    println!("Ingested: {}", hash);
}
// Query for conflicts
let skeptic = client.skeptic("Semaglutide:Type2Diabetes", "hba1c_reduction_percent").await?;
println!("Conflict score: {}", skeptic.conflict_score);
```
## Architecture
```
                  ┌─────────────────────────────────────┐
                  │          Domain Definition          │
                  │  (EntityTypes, Schemas, Hierarchy)  │
                  └──────────────┬──────────────────────┘
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         v                       v                       v
┌─────────────────┐     ┌──────────────────┐    ┌──────────────────┐
│ SubjectBuilder  │     │    Validator     │    │ MedicalExtractor │
│                 │     │                  │    │     (trait)      │
│ Build canonical │     │ Validate against │    │ Extract claims   │
│ subject strings │     │ domain rules     │    │ from sources     │
└────────┬────────┘     └────────┬─────────┘    └────────┬─────────┘
         │                       │                       │
         └──────────────┬────────┴───────────────────────┘
                        v
              ┌───────────────────┐
              │    StemeClient    │
              │                   │
              │ Submit assertions │
              │ Query with lenses │
              └─────────┬─────────┘
                        v
              ┌───────────────────┐
              │    StemeDB API    │
              │    :18180/v1/*    │
              └───────────────────┘
```
## Subject Patterns
Each predicate category uses its own subject structure, so that assertions meant to be compared resolve to an identical subject string:
| Category | Pattern | Example | Use Case |
|----------|---------|---------|----------|
| Efficacy | `{Drug}:{Indication}` | `Semaglutide:Type2Diabetes` | Outcome measures for specific conditions |
| Safety | `{Drug}` | `Semaglutide` | Adverse events (apply across indications) |
| Mechanism | `{Drug}:{Target}` | `Semaglutide:GLP1R` | Pharmacology details |
| Comparison | `{Drug}:{Comparator}:{Indication}` | `Semaglutide:Tirzepatide:Type2Diabetes` | Head-to-head trials |
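
To make the patterns concrete, here is a minimal sketch of how a `{Name}` pattern could be expanded into a subject string. `render_subject` is a hypothetical helper written for this illustration; it is not the crate's `SubjectBuilder`, which may resolve patterns differently and also checks entity types against the domain.

```rust
use std::collections::HashMap;

/// Fill `{Name}` placeholders in a subject pattern from an entity map.
/// Hypothetical helper for illustration; not the crate's `SubjectBuilder`.
/// Returns `None` if a brace is unclosed or an entity is missing.
pub fn render_subject(pattern: &str, entities: &HashMap<&str, &str>) -> Option<String> {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(open) = rest.find('{') {
        // Copy literal text (such as the `:` separator) before the placeholder.
        out.push_str(&rest[..open]);
        // Locate the matching close brace and look up the entity name between them.
        let close = open + rest[open..].find('}')?;
        let name = &rest[open + 1..close];
        out.push_str(entities.get(name)?);
        rest = &rest[close + 1..];
    }
    // Copy whatever literal text follows the last placeholder.
    out.push_str(rest);
    Some(out)
}
```

With `Drug = "Semaglutide"` and `Indication = "Type2Diabetes"`, the efficacy pattern `{Drug}:{Indication}` renders to `Semaglutide:Type2Diabetes`, while the safety pattern `{Drug}` renders to `Semaglutide`. Because the safety pattern omits the indication, adverse-event assertions reported under different indications share one subject and can collide.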
## Source Hierarchy
Claims are weighted by source authority:
| Tier | Source Class | Weight | Examples |
|------|--------------|--------|----------|
| 0 | Regulatory | 1.0 | FDA Labels, EMA Reports |
| 1 | Clinical | 0.9 | Phase III RCTs, Lancet, NEJM |
| 2 | Observational | 0.7 | Real-World Evidence, FAERS |
| 3 | Expert | 0.5 | Guidelines, ADA Standards |
| 4 | Community | 0.3 | PatientsLikeMe, Moderated Forums |
| 5 | Anecdotal | 0.1 | Reddit, Twitter, Blog Posts |
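
As a rough sketch of how these weights could be applied, the snippet below maps each tier to its weight and computes a weighted mean over numeric claims. The `Tier` enum and `weighted_mean` function are assumptions made for this example; the crate's own `SourceTier` type and the aggregation performed by the query lenses may work differently.

```rust
/// Source tiers from the table above, ordered from most to least authoritative.
/// Hypothetical type for illustration; not the crate's `SourceTier`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Regulatory = 0,
    Clinical = 1,
    Observational = 2,
    Expert = 3,
    Community = 4,
    Anecdotal = 5,
}

impl Tier {
    /// Weight column of the source hierarchy table.
    pub fn weight(self) -> f64 {
        match self {
            Tier::Regulatory => 1.0,
            Tier::Clinical => 0.9,
            Tier::Observational => 0.7,
            Tier::Expert => 0.5,
            Tier::Community => 0.3,
            Tier::Anecdotal => 0.1,
        }
    }
}

/// Weighted mean of numeric claims; returns `None` for an empty slice.
/// Illustrative aggregation only (an assumption, not the documented lens logic).
pub fn weighted_mean(claims: &[(Tier, f64)]) -> Option<f64> {
    let total: f64 = claims.iter().map(|(t, _)| t.weight()).sum();
    if total == 0.0 {
        return None;
    }
    let sum: f64 = claims.iter().map(|(t, v)| t.weight() * v).sum();
    Some(sum / total)
}
```

Under this scheme a Regulatory claim counts ten times as much as an Anecdotal one, so a single FDA label value dominates a handful of forum posts that disagree with it.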
## Adding a New Domain
See [Adding a Domain Guide](../../docs/guides/adding-a-domain.md) for step-by-step instructions on implementing new domains (e.g., cardiology, finance).
## Testing
```bash
# Run all ontology tests
cargo test -p stemedb-ontology
# Run with output
cargo test -p stemedb-ontology -- --nocapture
# Consumer Health UAT
cargo test -p stemedb-ontology --test consumer_health_uat
```
## Related Documentation
- [What is Episteme](../../what-is-episteme.md)
- [Architecture](../../architecture.md)
- [Vision](../../vision.md)
- [Consumer Health Use Case](../../docs/app-concepts/consumer-health.md)