# Adding a New Domain to stemedb-ontology

This guide walks you through implementing a new domain (vertical) in the stemedb-ontology crate. By the end, you'll have a working domain with entity types, predicate schemas, a source hierarchy, and optional extractors.

**Time:** ~30 minutes

**Prerequisites:** Rust knowledge, familiarity with StemeDB concepts

## Overview

A domain in stemedb-ontology defines:

1. **Entity Types** - The kinds of things in your domain (e.g., Drug, Company, Asset)
2. **Predicate Schemas** - How subjects are built for different predicate categories
3. **Source Hierarchy** - How to weight different source authorities
4. **Extractors (optional)** - Code that extracts claims from external sources

## Step 1: Plan Your Domain Model

Before writing code, answer these questions.

### What entities exist in your domain?

| Entity | Description | Example Values |
|--------|-------------|----------------|
| ? | ? | ? |

**Pharma example:**

| Entity | Description | Example Values |
|--------|-------------|----------------|
| Drug | Pharmaceutical compound | Semaglutide, Tirzepatide |
| Indication | Medical condition | Type2Diabetes, Obesity |
| Target | Molecular target | GLP1R, GIPR |

### What predicates will you track?

Group predicates by category; each category determines the subject pattern:

| Category | Subject Pattern | Example Predicates |
|----------|-----------------|--------------------|
| ? | ? | ? |

**Pharma example:**

| Category | Subject Pattern | Example Predicates |
|----------|-----------------|--------------------|
| Efficacy | `{Drug}:{Indication}` | hba1c_reduction_percent, weight_loss_percent |
| Safety | `{Drug}` | nausea_rate, has_boxed_warning |
| Mechanism | `{Drug}:{Target}` | binding_affinity, mechanism_of_action |

### What sources will provide data?

Order them from most to least authoritative:

| Tier | Source Class | Examples | Weight |
|------|--------------|----------|--------|
| 0 | Regulatory | ? | 1.0 |
| 1 | Clinical | ? | 0.9 |
| ... | ... | ... | ... |

The Step 2 template fills in all six tiers, from `Regulatory` (weight 1.0) down to `Anecdotal` (weight 0.1).

## Step 2: Create Domain Module

Create the directory structure:

```
crates/stemedb-ontology/src/
  {domain}/
    mod.rs          # Re-exports
    definition.rs   # Domain::new() builder
```

### Template: `{domain}/mod.rs`

```rust
//! {Domain} domain ontology.
//!
//! This module defines the {domain} vertical with:
//! - Entity types (...)
//! - Predicate schemas (...)
//! - Source hierarchy (...)

pub mod definition;

pub use definition::definition;

// Re-export domain-specific types if any
// pub use definition::{...};
```

### Template: `{domain}/definition.rs`

```rust
//! Compiled-in {domain} domain definition.

use crate::domain::{
    DefaultLens, Domain, EntityType, NamingConvention, PredicateSchema, SourceTier,
};
use stemedb_core::types::SourceClass;

/// Build the {domain} domain definition.
pub fn definition() -> Domain {
    let mut domain = Domain::new(
        "{Domain}",
        "Description of what this domain covers",
    );

    // -------------------------------------------------------------------------
    // Entity Types
    // -------------------------------------------------------------------------

    // Primary entity (e.g., the main subject of claims)
    domain = domain.with_entity_type(
        "{PrimaryEntity}",
        EntityType::required("Description")
            .with_naming(NamingConvention::CamelCase)
            // Add aliases for common variations
            .with_alias("ALIAS", "Canonical"),
    );

    // Secondary entity (for compound subjects)
    domain = domain.with_entity_type(
        "{SecondaryEntity}",
        EntityType::required("Description")
            .with_naming(NamingConvention::CamelCase),
    );

    // -------------------------------------------------------------------------
    // Predicate Schemas
    // -------------------------------------------------------------------------

    // Category 1: Primary predicates (single-entity subject)
    domain = domain.with_predicate_schema(
        "category1",
        PredicateSchema::new(
            "Description of this predicate category",
            "{PrimaryEntity}",
        )
        .with_predicates(vec![
            "predicate_one",
            "predicate_two",
        ])
        .with_default_lens(DefaultLens::Recency),
    );

    // Category 2: Compound predicates (multi-entity subject)
    domain = domain.with_predicate_schema(
        "category2",
        PredicateSchema::new(
            "Description",
            "{PrimaryEntity}:{SecondaryEntity}",
        )
        .with_predicates(vec![
            "compound_predicate",
        ])
        .with_default_lens(DefaultLens::LayeredConsensus),
    );

    // -------------------------------------------------------------------------
    // Source Hierarchy
    // -------------------------------------------------------------------------

    domain = domain.with_source_hierarchy(vec![
        SourceTier::new(SourceClass::Regulatory, "Tier 0: Official Sources")
            .with_examples(vec!["Government agencies", "Standards bodies"])
            .with_weight(1.0),
        SourceTier::new(SourceClass::Clinical, "Tier 1: Primary Research")
            .with_examples(vec!["Peer-reviewed journals", "Research institutions"])
            .with_weight(0.9)
            .with_decay(730), // 2-year half-life (in days)
        SourceTier::new(SourceClass::Observational, "Tier 2: Secondary Analysis")
            .with_examples(vec!["Industry reports", "Analyst research"])
            .with_weight(0.7)
            .with_decay(365), // 1-year half-life
        SourceTier::new(SourceClass::Expert, "Tier 3: Expert Opinion")
            .with_examples(vec!["Industry experts", "Consultants"])
            .with_weight(0.5)
            .with_decay(180), // ~6-month half-life
        SourceTier::new(SourceClass::Community, "Tier 4: Community")
            .with_examples(vec!["Professional forums", "Curated discussions"])
            .with_weight(0.3)
            .with_decay(90), // ~3-month half-life
        SourceTier::new(SourceClass::Anecdotal, "Tier 5: Anecdotal")
            .with_examples(vec!["Social media", "Blog posts"])
            .with_weight(0.1)
            .with_decay(30), // ~1-month half-life
    ]);

    domain
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_definition_builds() {
        let domain = definition();
        assert_eq!(domain.name, "{Domain}");
        assert!(!domain.entity_types.is_empty());
        assert!(!domain.predicate_schemas.is_empty());
        assert!(!domain.source_hierarchy.is_empty());
    }

    #[test]
    fn test_entity_normalization() {
        let domain = definition();
        let entity = domain.get_entity_type("{PrimaryEntity}").expect("entity exists");

        // Test alias normalization
        assert_eq!(entity.normalize("ALIAS"), "Canonical");
        assert_eq!(entity.normalize("Canonical"), "Canonical");
    }

    #[test]
    fn test_predicate_schema_lookup() {
        let domain = definition();

        // Direct lookup
        let schema = domain.get_schema("category1").expect("schema exists");
        assert_eq!(schema.subject_pattern, "{PrimaryEntity}");

        // Lookup by predicate
        let schema = domain.schema_for_predicate("predicate_one").expect("found");
        assert!(schema.predicates.contains(&"predicate_one".to_string()));
    }
}
```

## Step 3: Implement Extractors (Optional)

If your domain has external data sources, implement the `MedicalExtractor` trait. The trait and its supporting types (`ExtractError`, `MedicalClaim`, `SourceInput`, `RetryConfig`) currently live in `crate::pharma::extractors`; the template below re-exports them.

### Directory Structure

```
crates/stemedb-ontology/src/
  {domain}/
    mod.rs
    definition.rs
    extractors/
      mod.rs
      {source}.rs
```

Declare the new module by adding `pub mod extractors;` to `{domain}/mod.rs`.

### Template: `{domain}/extractors/mod.rs`

```rust
//! Data extractors for {domain}.

mod {source};

pub use {source}::{Source}Extractor;

// Re-export the shared extractor trait and types from the pharma module
pub use crate::pharma::extractors::{
    ExtractError, MedicalClaim, MedicalExtractor, RetryConfig, SourceInput,
};
```

### Template: `{domain}/extractors/{source}.rs`

```rust
//! {Source} data extractor.

use super::{ExtractError, MedicalClaim, MedicalExtractor, SourceInput};
use async_trait::async_trait;
use stemedb_core::types::{ObjectValue, SourceClass};

/// Extractor for {Source} data.
pub struct {Source}Extractor {
    http_client: reqwest::Client,
    base_url: String,
}

impl {Source}Extractor {
    /// Create a new extractor.
    pub fn new() -> Self {
        Self {
            http_client: reqwest::Client::new(),
            base_url: "https://api.example.com".to_string(),
        }
    }
}

impl Default for {Source}Extractor {
    fn default() -> Self {
        Self::new()
    }
}

#[async_trait]
impl MedicalExtractor for {Source}Extractor {
    fn name(&self) -> &str {
        "{Source} Extractor"
    }

    fn source_class(&self) -> SourceClass {
        SourceClass::Regulatory // Adjust based on source authority
    }

    fn can_handle(&self, source: &SourceInput) -> bool {
        matches!(source, SourceInput::DrugName(_) | SourceInput::Url(_))
    }

    async fn extract(&self, source: &SourceInput) -> Result<Vec<MedicalClaim>, ExtractError> {
        let query = match source {
            SourceInput::DrugName(name) => name.clone(),
            SourceInput::Url(url) => url.clone(),
            _ => return Err(ExtractError::NotFound("Unsupported input type".into())),
        };

        // Fetch data from the source (query escaped with the `urlencoding` crate).
        // `?` assumes ExtractError implements From<reqwest::Error>.
        let url = format!("{}/search?q={}", self.base_url, urlencoding::encode(&query));
        let response = self.http_client.get(&url).send().await?;

        if !response.status().is_success() {
            return Err(ExtractError::ApiError(format!(
                "HTTP {}",
                response.status()
            )));
        }

        // Parse `response` and extract claims
        let mut claims = Vec::new();

        // Example claim
        claims.push(
            MedicalClaim::new(
                "Subject",
                "predicate_name",
                ObjectValue::Float(42.0),
            )
            .with_confidence(0.9)
            .with_source_url(&url)
            .with_source_section("Section Name")
            .with_quote("Supporting quote from source")
            .with_source_class(self.source_class()),
        );

        Ok(claims)
    }
}
```

The template uses the `async-trait`, `reqwest`, and `urlencoding` crates; make sure they are listed in the crate's `Cargo.toml`.

## Step 4: Create CLI Binary (Optional)

For user-facing domains, create a CLI tool.

### Template: `src/bin/steme_{domain}.rs`

```rust
//! CLI for {domain} domain operations.
use clap::Parser;
use stemedb_ontology::client::StemeClient;
use stemedb_ontology::{domain}::definition;

// Optional helper modules. Cargo treats every `src/bin/*.rs` file as its own
// binary, so keep these next to a `src/bin/steme_{domain}/main.rs` instead.
mod cli;
mod commands;

#[derive(Parser)]
#[command(name = "steme-{domain}")]
#[command(about = "{Domain} data operations for StemeDB")]
struct Cli {
    #[arg(long, default_value = "http://localhost:18180")]
    server: String,

    #[command(subcommand)]
    command: Commands,
}

#[derive(clap::Subcommand)]
enum Commands {
    /// Ingest data
    Ingest { /* args */ },
    /// Query data
    Query { /* args */ },
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let cli = Cli::parse();
    let client = StemeClient::new(&cli.server);

    match cli.command {
        Commands::Ingest { /* args */ } => {
            // Implementation
        }
        Commands::Query { /* args */ } => {
            // Implementation
        }
    }

    Ok(())
}
```

## Step 5: Testing Checklist

Before considering your domain complete:

- [ ] `cargo build -p stemedb-ontology` succeeds
- [ ] `definition()` returns a valid `Domain`
- [ ] All entity types have meaningful descriptions
- [ ] All predicate schemas have correct subject patterns
- [ ] Entity normalization works (aliases resolve correctly)
- [ ] `schema_for_predicate()` finds the right schema
- [ ] Source hierarchy has 6 tiers with decreasing weights
- [ ] `cargo test -p stemedb-ontology` passes (including extractor tests, if any)

Run the tests and lints:

```bash
cargo test -p stemedb-ontology
cargo clippy -p stemedb-ontology -- -D warnings
```

## Step 6: Integration

### Export from lib.rs

Edit `crates/stemedb-ontology/src/lib.rs`:

```rust
// Add your domain module
pub mod {domain};

// Re-export for convenience
pub use {domain}::definition as {domain}_domain;
```

### Update ai-lookup

Add an entry to `ai-lookup/index.md` under the Domain Ontology section.

### Update CLAUDE.md routing (if significant)

If your domain is frequently used, add a routing entry to the "Find Your Guide" table in `CLAUDE.md`.

## Complete Example: Cardiology Domain (Skeleton)

Here's a minimal working example for a cardiology domain (register it in `lib.rs` as described in Step 6):

```rust
// crates/stemedb-ontology/src/cardiology/mod.rs
//! Cardiology domain ontology.

pub mod definition;

pub use definition::definition;
```

```rust
// crates/stemedb-ontology/src/cardiology/definition.rs
use crate::domain::{DefaultLens, Domain, EntityType, NamingConvention, PredicateSchema, SourceTier};
use stemedb_core::types::SourceClass;

pub fn definition() -> Domain {
    let mut domain = Domain::new(
        "Cardiology",
        "Cardiovascular conditions, procedures, and outcomes",
    );

    // Entities
    domain = domain
        .with_entity_type(
            "Condition",
            EntityType::required("Cardiovascular condition")
                .with_naming(NamingConvention::CamelCase)
                .with_alias("MI", "MyocardialInfarction")
                .with_alias("CHF", "CongestiveHeartFailure")
                .with_alias("AF", "AtrialFibrillation"),
        )
        .with_entity_type(
            "Procedure",
            EntityType::required("Medical procedure")
                .with_naming(NamingConvention::CamelCase)
                .with_alias("CABG", "CoronaryArteryBypassGraft")
                .with_alias("PCI", "PercutaneousCoronaryIntervention"),
        )
        .with_entity_type(
            "Biomarker",
            EntityType::required("Diagnostic biomarker")
                .with_naming(NamingConvention::CamelCase),
        );

    // Schemas
    domain = domain
        .with_predicate_schema(
            "diagnosis",
            PredicateSchema::new("Diagnostic criteria", "{Condition}")
                .with_predicates(vec![
                    "diagnostic_criteria",
                    "staging_system",
                    "severity_classification",
                ])
                .with_default_lens(DefaultLens::Authority),
        )
        .with_predicate_schema(
            "outcome",
            PredicateSchema::new("Treatment outcomes", "{Condition}:{Procedure}")
                .with_predicates(vec![
                    "mortality_rate",
                    "complication_rate",
                    "readmission_rate",
                    "length_of_stay_days",
                ])
                .with_default_lens(DefaultLens::LayeredConsensus),
        )
        .with_predicate_schema(
            "biomarker",
            PredicateSchema::new("Biomarker thresholds", "{Biomarker}")
                .with_predicates(vec![
                    "normal_range",
                    "diagnostic_threshold",
                    "prognostic_value",
                ])
                .with_default_lens(DefaultLens::Consensus),
        );

    // Source hierarchy
    domain = domain.with_source_hierarchy(vec![
        SourceTier::new(SourceClass::Regulatory, "Tier 0: Guidelines")
            .with_examples(vec!["ACC/AHA Guidelines", "ESC Guidelines"])
            .with_weight(1.0),
        SourceTier::new(SourceClass::Clinical, "Tier 1: Clinical Trials")
            .with_examples(vec!["Landmark RCTs", "Meta-analyses"])
            .with_weight(0.9)
            .with_decay(730),
        SourceTier::new(SourceClass::Observational, "Tier 2: Registries")
            .with_examples(vec!["NCDR", "Get With The Guidelines"])
            .with_weight(0.7)
            .with_decay(365),
        SourceTier::new(SourceClass::Expert, "Tier 3: Expert Consensus")
            .with_examples(vec!["Consensus statements", "Textbooks"])
            .with_weight(0.5)
            .with_decay(180),
        SourceTier::new(SourceClass::Community, "Tier 4: Community")
            .with_examples(vec!["Medical forums", "CME discussions"])
            .with_weight(0.3)
            .with_decay(90),
        SourceTier::new(SourceClass::Anecdotal, "Tier 5: Anecdotal")
            .with_examples(vec!["Case reports", "Social media"])
            .with_weight(0.1)
            .with_decay(30),
    ]);

    domain
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_cardiology_domain() {
        let domain = definition();
        assert_eq!(domain.name, "Cardiology");

        // Check entity aliases
        let condition = domain.get_entity_type("Condition").unwrap();
        assert_eq!(condition.normalize("MI"), "MyocardialInfarction");

        // Check schema lookup
        let schema = domain.schema_for_predicate("mortality_rate").unwrap();
        assert_eq!(schema.subject_pattern, "{Condition}:{Procedure}");
    }
}
```

## Troubleshooting

### "Unknown predicate" errors

Your predicate isn't in any schema. Add it to the appropriate `with_predicates()` call.

### Subject collision issues

If claims that should conflict aren't conflicting, check that:

1. The subject pattern matches your intent
2. Entity values are being normalized consistently
3. The predicate is in the right schema category

### Extractor not finding data

1. Check that the API URL is correct
2. Verify that the query parameters match the API's expectations
3. Add debug logging to see raw responses

## Next Steps

- Run the Consumer Health UAT to see the pharma domain in action
- Read the [Lens documentation](../services/lens.md) to understand conflict resolution
- Check the [SDK guide](../../ai-lookup/services/sdk.md) for Go integration
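To make the "Subject collision issues" checklist in Troubleshooting more concrete, here is a small standalone sketch. It is not the crate's API: it assumes a subject string is formed by normalizing each entity value through its alias table and then substituting the result into the schema's `subject_pattern` (as the `{Condition}:{Procedure}` pattern suggests). The `Entity` struct and `build_subject` function are hypothetical names used only for illustration. The point it shows: `MI` and `MyocardialInfarction` collapse to the same subject (so their claims can conflict) only when the alias is registered.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an entity type's alias table.
/// The real `EntityType` in stemedb-ontology may store aliases differently.
struct Entity {
    aliases: HashMap<String, String>,
}

impl Entity {
    fn new(aliases: &[(&str, &str)]) -> Self {
        Self {
            aliases: aliases
                .iter()
                .map(|&(alias, canonical)| (alias.to_string(), canonical.to_string()))
                .collect(),
        }
    }

    /// Map an alias to its canonical form; unknown values pass through unchanged.
    fn normalize(&self, value: &str) -> String {
        self.aliases
            .get(value)
            .cloned()
            .unwrap_or_else(|| value.to_string())
    }
}

/// Hypothetical helper: fill a pattern such as "{Condition}:{Procedure}"
/// with normalized entity values.
fn build_subject(
    pattern: &str,
    entities: &HashMap<&str, Entity>,
    values: &[(&str, &str)],
) -> String {
    let mut subject = pattern.to_string();
    for &(entity_name, raw) in values {
        // Normalize if the entity type is known, otherwise use the raw value.
        let canonical = entities
            .get(entity_name)
            .map(|e| e.normalize(raw))
            .unwrap_or_else(|| raw.to_string());
        // "{{" and "}}" are escaped braces, so this builds "{Condition}" etc.
        subject = subject.replace(&format!("{{{entity_name}}}"), &canonical);
    }
    subject
}

fn main() {
    let mut entities = HashMap::new();
    entities.insert("Condition", Entity::new(&[("MI", "MyocardialInfarction")]));
    entities.insert(
        "Procedure",
        Entity::new(&[("PCI", "PercutaneousCoronaryIntervention")]),
    );

    let pattern = "{Condition}:{Procedure}";

    // Alias and canonical spelling collapse to one subject, so claims can conflict.
    let a = build_subject(pattern, &entities, &[("Condition", "MI"), ("Procedure", "PCI")]);
    let b = build_subject(
        pattern,
        &entities,
        &[("Condition", "MyocardialInfarction"), ("Procedure", "PCI")],
    );
    assert_eq!(a, b);
    println!("{a}");

    // "AF" has no alias registered in this sketch, so it never meets "AtrialFibrillation".
    let c = build_subject(pattern, &entities, &[("Condition", "AF"), ("Procedure", "PCI")]);
    let d = build_subject(
        pattern,
        &entities,
        &[("Condition", "AtrialFibrillation"), ("Procedure", "PCI")],
    );
    assert_ne!(c, d);
}
```

Running it prints `MyocardialInfarction:PercutaneousCoronaryIntervention`. If two claims you expect to conflict end up with different subjects like `c` and `d` above, a missing `.with_alias(...)` entry is a likely cause.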