# stemedb-ontology

Domain ontology layer for Episteme. It defines how subjects are structured for each predicate type and domain, so that assertions from different sources about the same thing resolve to the same subject and their conflicts collide correctly.

## Module Overview

| Module | Purpose |
|--------|---------|
| `domain.rs` | `Domain`, `EntityType`, `PredicateSchema`, and `SourceTier` builders |
| `subject.rs` | `SubjectBuilder` for canonical subject construction |
| `validator.rs` | Validates assertions against domain rules |
| `client.rs` | HTTP client for the StemeDB API |
| `dto/` | Request/response DTOs for API communication |
| `pharma/` | Pharmaceutical domain (reference implementation) |

## Quick Start

### CLI Usage (steme-pharma)

```bash
# Build the CLI
cargo build --release -p stemedb-ontology

# Ingest FDA label data
./target/release/steme-pharma ingest semaglutide,tirzepatide

# Ingest with mock conflicts for testing
./target/release/steme-pharma ingest semaglutide --with-conflicts

# Query conflicts (Skeptic lens - default)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" hba1c_reduction_percent

# Query with source hierarchy (Layered Consensus)
./target/release/steme-pharma query "Semaglutide:Type2Diabetes" weight_loss_percent --mode layered

# Compare two drugs
./target/release/steme-pharma compare \
  "Semaglutide:Type2Diabetes" \
  "Tirzepatide:Type2Diabetes" \
  --predicate hba1c_reduction_percent

# Explore available predicates for a subject
./target/release/steme-pharma explore "Semaglutide:Type2Diabetes"

# Validate a subject/predicate combination
./target/release/steme-pharma validate "Semaglutide:Type2Diabetes" hba1c_reduction_percent

# JSON output (for scripting)
./target/release/steme-pharma --format json query "Semaglutide" nausea_rate
```

### Programmatic Usage

```rust
use std::collections::HashMap;

use ed25519_dalek::SigningKey;
use rand::rngs::OsRng;
use stemedb_ontology::client::StemeClient;
use stemedb_ontology::pharma::extractors::{FdaLabelExtractor, MedicalExtractor, SourceInput};
use stemedb_ontology::{pharma, SubjectBuilder, Validator};

// The `.await?` calls below need an async context that returns a `Result`.
// A Tokio runtime is assumed here; substitute the executor your application uses.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the pharma domain definition
    let domain = pharma::definition();

    // Build a subject using the ontology
    let schema = domain.get_schema("efficacy").unwrap();
    let mut entities = HashMap::new();
    entities.insert("Drug".to_string(), "Semaglutide".to_string());
    entities.insert("Indication".to_string(), "Type2Diabetes".to_string());
    let subject = SubjectBuilder::build(schema, &entities, &domain).unwrap();
    assert_eq!(subject, "Semaglutide:Type2Diabetes");

    // Validate assertions
    let validator = Validator::new(&domain);
    let result = validator.validate("hba1c_reduction_percent", &subject, 0.95);
    assert!(result.is_ok());

    // Extract and ingest claims
    let client = StemeClient::new("http://localhost:18180");
    let extractor = FdaLabelExtractor::new();
    let signing_key = SigningKey::generate(&mut OsRng);
    let agent_id = signing_key.verifying_key().to_bytes();
    let hlc = uhlc::HLCBuilder::new().build();

    let claims = extractor.extract(&SourceInput::DrugName("semaglutide".into())).await?;
    for claim in claims {
        // Sign each extracted claim and submit it as an assertion
        let assertion = claim.to_assertion(&signing_key, agent_id, &hlc);
        let hash = client.assert(&assertion).await?;
        println!("Ingested: {}", hash);
    }

    // Query for conflicts
    let skeptic = client.skeptic("Semaglutide:Type2Diabetes", "hba1c_reduction_percent").await?;
    println!("Conflict score: {}", skeptic.conflict_score);

    Ok(())
}
```

## Architecture

```
                  ┌─────────────────────────────────────┐
                  │          Domain Definition          │
                  │  (EntityTypes, Schemas, Hierarchy)  │
                  └──────────────┬──────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
         v                       v                       v
┌─────────────────┐   ┌──────────────────┐   ┌──────────────────┐
│ SubjectBuilder  │   │    Validator     │   │ MedicalExtractor │
│                 │   │                  │   │     (trait)      │
│ Build canonical │   │ Validate against │   │ Extract claims   │
│ subject strings │   │ domain rules     │   │ from sources     │
└────────┬────────┘   └────────┬─────────┘   └────────┬─────────┘
         │                     │                      │
         └──────────────┬──────┴──────────────────────┘
                        │
                        v
              ┌───────────────────┐
              │    StemeClient    │
              │                   │
              │ Submit assertions │
              │ Query with lenses │
              └─────────┬─────────┘
                        │
                        v
              ┌───────────────────┐
              │    StemeDB API    │
              │    :18180/v1/*    │
              └───────────────────┘
```

## Subject Patterns

Each predicate category uses its own subject structure, so that claims about the same thing produce identical subject strings and collide:

| Category | Pattern | Example | Use Case |
|----------|---------|---------|----------|
| Efficacy | `{Drug}:{Indication}` | `Semaglutide:Type2Diabetes` | Outcome measures for specific conditions |
| Safety | `{Drug}` | `Semaglutide` | Adverse events (apply across indications) |
| Mechanism | `{Drug}:{Target}` | `Semaglutide:GLP1R` | Pharmacology details |
| Comparison | `{Drug}:{Comparator}:{Indication}` | `Semaglutide:Tirzepatide:Type2Diabetes` | Head-to-head trials |

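
The patterns in the table can be read as simple template substitution. The following is a minimal sketch of that idea only, not the crate's `SubjectBuilder`: `render_subject` is a hypothetical helper introduced for illustration, and the entity values are the examples from the table.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of stemedb-ontology): fills each `{Name}`
/// placeholder in `pattern` with the matching entry from `entities`.
/// Returns `None` if a placeholder is unterminated or has no entry.
fn render_subject(pattern: &str, entities: &HashMap<&str, &str>) -> Option<String> {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder (e.g. the ':' separator).
        out.push_str(&rest[..open]);
        let close = open + rest[open..].find('}')?;
        let name = &rest[open + 1..close];
        out.push_str(entities.get(name)?);
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    Some(out)
}

fn main() {
    let mut entities = HashMap::new();
    entities.insert("Drug", "Semaglutide");
    entities.insert("Indication", "Type2Diabetes");

    // Efficacy claims are keyed by drug and indication.
    assert_eq!(
        render_subject("{Drug}:{Indication}", &entities).as_deref(),
        Some("Semaglutide:Type2Diabetes")
    );
    // Safety claims use the drug alone, so adverse-event reports made in the
    // context of different indications share one subject.
    assert_eq!(render_subject("{Drug}", &entities).as_deref(), Some("Semaglutide"));
    // A mechanism subject cannot be built without a Target entity.
    assert_eq!(render_subject("{Drug}:{Target}", &entities), None);
}
```

Because the subject string is derived from the pattern rather than typed by hand, two sources that supply the same entities for the same category end up with the same subject.
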
## Source Hierarchy

Claims are weighted by the authority of their source; lower tier numbers carry more weight:

| Tier | Source Class | Weight | Examples |
|------|--------------|--------|----------|
| 0 | Regulatory | 1.0 | FDA Labels, EMA Reports |
| 1 | Clinical | 0.9 | Phase III RCTs, Lancet, NEJM |
| 2 | Observational | 0.7 | Real-World Evidence, FAERS |
| 3 | Expert | 0.5 | Guidelines, ADA Standards |
| 4 | Community | 0.3 | PatientsLikeMe, Moderated Forums |
| 5 | Anecdotal | 0.1 | Reddit, Twitter, Blog Posts |

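
This README lists the tier weights but does not say how they are combined. As one possible reading, the sketch below computes a weighted mean of reported values. The `Tier` enum, `weighted_mean`, and the sample values are all assumptions made for illustration; they are not the crate's `SourceTier` type, the Layered Consensus algorithm, or real data.

```rust
/// Illustrative stand-in for the six tiers in the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tier {
    Regulatory,
    Clinical,
    Observational,
    Expert,
    Community,
    Anecdotal,
}

impl Tier {
    /// Weight taken from the Source Hierarchy table.
    fn weight(self) -> f64 {
        match self {
            Tier::Regulatory => 1.0,
            Tier::Clinical => 0.9,
            Tier::Observational => 0.7,
            Tier::Expert => 0.5,
            Tier::Community => 0.3,
            Tier::Anecdotal => 0.1,
        }
    }
}

/// Assumed aggregation: mean of the reported values, weighted by source tier.
/// Returns `None` for an empty slice.
fn weighted_mean(claims: &[(Tier, f64)]) -> Option<f64> {
    let total: f64 = claims.iter().map(|(tier, _)| tier.weight()).sum();
    if total == 0.0 {
        return None;
    }
    let sum: f64 = claims.iter().map(|(tier, value)| tier.weight() * value).sum();
    Some(sum / total)
}

fn main() {
    // Made-up values: a regulatory source reports 1.5, an anecdotal one 3.0.
    let claims = [(Tier::Regulatory, 1.5), (Tier::Anecdotal, 3.0)];
    let mean = weighted_mean(&claims).expect("claims is non-empty");

    // (1.0 * 1.5 + 0.1 * 3.0) / (1.0 + 0.1) = 1.8 / 1.1
    assert!((mean - 1.8 / 1.1).abs() < 1e-9);
    // The result stays much closer to the regulatory value than to the anecdotal one.
    assert!((mean - 1.5).abs() < (mean - 3.0).abs());
}
```

Under this reading, a single anecdotal report moves the aggregate only slightly away from the regulatory figure, which matches the intent of ranking sources by authority.
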
## Adding a New Domain

See [Adding a Domain Guide](../../docs/guides/adding-a-domain.md) for step-by-step instructions on implementing new domains (e.g., cardiology, finance).

## Testing

```bash
# Run all ontology tests
cargo test -p stemedb-ontology

# Run with output
cargo test -p stemedb-ontology -- --nocapture

# Consumer Health UAT
cargo test -p stemedb-ontology --test consumer_health_uat
```

## Related Documentation

- [What is Episteme](../../what-is-episteme.md)
- [Architecture](../../architecture.md)
- [Vision](../../vision.md)
- [Consumer Health Use Case](../../docs/app-concepts/consumer-health.md)