diff --git a/ai-lookup/index.md b/ai-lookup/index.md index fce492c..3909fae 100644 --- a/ai-lookup/index.md +++ b/ai-lookup/index.md @@ -40,6 +40,13 @@ Token-efficient fact storage for StemeDB. Query these for quick context without | Phase 6 UAT | `features/phase6-uat.md` | High | 2026-02-02 | Distributed writes UAT results and fixes | | Aphoria Config | `features/aphoria-config.md` | High | 2026-02-04 | Configuration options including hosted mode | +## Domain Ontology + +| Topic | File | Confidence | Updated | Summary | +|-------|------|------------|---------|---------| +| Adding a Domain | `../docs/guides/adding-a-domain.md` | High | 2026-02-05 | Step-by-step guide for implementing new domains | +| Ontology Crate | `../crates/stemedb-ontology/README.md` | High | 2026-02-05 | Module overview, CLI usage, architecture | + ## Use Cases See [use-cases/README.md](../use-cases/README.md) for production scenarios with Postgres Test analysis. diff --git a/applications/aphoria/docs/architecture/README.md b/applications/aphoria/docs/architecture/README.md index 9e181e2..952c431 100644 --- a/applications/aphoria/docs/architecture/README.md +++ b/applications/aphoria/docs/architecture/README.md @@ -299,6 +299,22 @@ exclude = ["vendor://acme/internal/*"] - Edge case analysis - Real-world adoption path +### LLM Extraction Quality + +**Problem:** How do we ensure LLM prompts produce consistent, high-quality extraction results? + +5. **[LLM Prompt Evaluation - Vision](./llm-prompt-evaluation.md)** + - Problem statement and enterprise requirements + - Architecture overview and core components + - Fixture format design + - CI/CD integration patterns + +6. **[LLM Prompt Evaluation - Implementation](./llm-eval-implementation.md)** ← START HERE + - Actionable implementation spec + - Code snippets and file locations + - 6-phase implementation plan (11 days) + - Seed fixture list + --- ## Quick Reference @@ -307,14 +323,16 @@ exclude = ["vendor://acme/internal/*"] | If you need to... 
| Read this | |-------------------|-----------| -| Understand the problem | [Concept Matching Analysis](./concept-matching-analysis.md) | -| Implement the solution | [Policy Alias Implementation](./policy-alias-implementation.md) | +| Understand concept matching | [Concept Matching Analysis](./concept-matching-analysis.md) | +| Implement policy aliases | [Policy Alias Implementation](./policy-alias-implementation.md) | | Understand design philosophy | [Matching Philosophy](./matching-philosophy.md) | | Validate enterprise scenarios | [Enterprise Validation](./enterprise-validation.md) | +| Test/evaluate LLM prompts | [LLM Eval Implementation](./llm-eval-implementation.md) | | Add a new extractor | `src/extractors/mod.rs` | | Understand scan flow | `src/scan.rs` | | Modify conflict detection | `src/episteme/conflict.rs` | | Work with Trust Packs | `src/policy.rs`, `src/policy_ops.rs` | +| Work with LLM extraction | `src/llm/` | --- @@ -364,6 +382,29 @@ exclude = ["vendor://acme/internal/*"] - ✅ Works for RFC/OWASP corpus by design - ⚠️ Breaks for enterprise policies with different hierarchies (solved by AD-001) +### AD-004: LLM Prompt Evaluation System + +**Status:** Proposed (2026-02-05) + +**Context:** LLM prompts that drive claim extraction are code, but we don't treat them like code. No tests, no metrics, no regression detection. When prompts change, we don't know if quality improved or degraded. 
+ +**Decision:** Build a comprehensive prompt evaluation system with: +- Golden corpus of test fixtures with expected outcomes +- Observation logging for every extraction +- Metrics computation (precision, recall, F1, cost) +- Regression detection against baselines +- CI integration (smoke tests per-PR, full eval nightly) + +**Implementation:** See [LLM Prompt Evaluation Spec](./llm-prompt-evaluation.md) + +**Consequences:** +- ✅ Prompt changes are validated before deployment +- ✅ Regressions are caught automatically +- ✅ Quality is measurable over time +- ✅ Enterprise confidence in extraction reliability +- ⚠️ Requires maintaining golden corpus +- ⚠️ Live evaluation has token cost + --- ## Design Principles @@ -396,8 +437,11 @@ Community sharing is opt-in with anonymization enabled by default. - Declarative extractors (user-defined in TOML) - Hosted mode (team aggregation) - Community corpus (anonymous sharing) +- LLM-in-the-loop extraction (Gemini semantic claims) +- Pattern learning (LLM-extracted patterns remembered) ### In Progress +- **LLM Prompt Evaluation** - Testing, metrics, and regression detection for prompts ([Spec](./llm-prompt-evaluation.md)) - **Policy aliases** - Enterprise policy matching via glob patterns ([AD-001](./policy-alias-implementation.md)) ### Planned (Q1 2026) diff --git a/applications/aphoria/docs/architecture/llm-eval-implementation.md b/applications/aphoria/docs/architecture/llm-eval-implementation.md new file mode 100644 index 0000000..a8d516b --- /dev/null +++ b/applications/aphoria/docs/architecture/llm-eval-implementation.md @@ -0,0 +1,1356 @@ +# LLM Evaluation Implementation Spec + +> **Status:** Implementation Ready +> **Date:** 2026-02-05 +> **Scope:** Aphoria Phase 7.8 + +--- + +## What We Have + +The current LLM extraction pipeline (`src/llm/`): + +``` +src/llm/ +├── mod.rs # Module exports +├── client.rs # GeminiClient - HTTP client for API +├── extractor.rs # LlmExtractor - orchestration, budget, filtering +├── prompt.rs 
# build_system_prompt() with ontology +├── ontology.rs # OntologyVocabulary from authority assertions +├── cache.rs # LlmCache - BLAKE3 content hash caching +├── types.rs # LlmClaim, LlmClaimsResponse +└── prompts.rs # DEFAULT_SYSTEM_PROMPT, helpers +``` + +**Key characteristics:** +- Uses Gemini API (configured via `GEMINI_API_KEY`) +- Ontology-aware prompts constrain output to authority vocabulary +- Caches by `BLAKE3(prompt + content + model)` (prompt hash included) +- Token budget tracking (`max_tokens_per_scan`, `max_tokens_per_file`) +- Selective triggering (high-value files only) +- Temperature 0.1 for consistency +- Structured decoding via Gemini Response Schema + +--- + +## What We Need + +### 1. Observation Storage (SQLite) + +**Problem:** We can't see what the LLM returned or how claims were scored. JSON files are inefficient for querying. + +**Solution:** SQLite database with retention policies. + +**Location:** `~/.aphoria/eval/observations.db` + +```rust +// src/eval/db.rs + +use std::path::Path; + +use chrono::{Duration, Utc}; +use rusqlite::{params, Connection}; + +pub struct EvalDatabase { + conn: Connection, +} + +impl EvalDatabase { + pub fn open(path: &Path) -> Result<Self> { + let conn = Connection::open(path)?; + conn.execute_batch(r#" + CREATE TABLE IF NOT EXISTS observations ( + id TEXT PRIMARY KEY, + timestamp TEXT NOT NULL, + prompt_version TEXT NOT NULL, + prompt_hash TEXT NOT NULL, + model TEXT NOT NULL, + input_hash TEXT NOT NULL, + file_path TEXT NOT NULL, + language TEXT NOT NULL, + content_length INTEGER NOT NULL, + raw_response TEXT NOT NULL, + parsed_claims TEXT NOT NULL, -- JSON + final_claims TEXT NOT NULL, -- JSON + input_tokens INTEGER NOT NULL, + output_tokens INTEGER NOT NULL, + parse_success INTEGER NOT NULL, + parse_error TEXT, + cache_hit INTEGER NOT NULL, + latency_ms INTEGER NOT NULL + ); + CREATE INDEX IF NOT EXISTS idx_obs_timestamp ON observations(timestamp); + CREATE INDEX IF NOT EXISTS idx_obs_prompt_hash ON observations(prompt_hash); + "#)?; + 
Ok(Self { conn }) + } + + /// Enforce retention: keep the newest 1000 observations or the last 30 days, whichever is larger. + /// Returns the number of rows deleted. + pub fn enforce_retention(&self) -> Result<usize> { + let cutoff = Utc::now() - Duration::days(30); + let deleted = self.conn.execute( + "DELETE FROM observations + WHERE timestamp < ?1 + AND id NOT IN (SELECT id FROM observations ORDER BY timestamp DESC LIMIT 1000)", + params![cutoff.to_rfc3339()], + )?; + Ok(deleted) + } + + pub fn insert(&self, obs: &Observation) -> Result<()> { + self.conn.execute( + r#"INSERT INTO observations ( + id, timestamp, prompt_version, prompt_hash, model, input_hash, + file_path, language, content_length, raw_response, parsed_claims, + final_claims, input_tokens, output_tokens, parse_success, + parse_error, cache_hit, latency_ms + ) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17, ?18)"#, + params![ + obs.id.to_string(), + obs.timestamp.to_rfc3339(), + obs.prompt_version, + obs.prompt_hash, + obs.model, + obs.input_hash, + obs.file_path, + obs.language, + obs.content_length, + obs.raw_response, + serde_json::to_string(&obs.parsed_claims)?, + serde_json::to_string(&obs.final_claims)?, + obs.input_tokens, + obs.output_tokens, + obs.parse_success, + obs.parse_error, + obs.cache_hit, + obs.latency_ms, + ], + )?; + Ok(()) + } +} +``` + +**Observation struct:** + +```rust +// src/llm/observation.rs + +use chrono::{DateTime, Utc}; +use serde::{Deserialize, Serialize}; +use uuid::Uuid; + +/// A logged observation from an LLM extraction. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct Observation { + /// Unique ID for this observation. + pub id: Uuid, + + /// When this extraction occurred. + pub timestamp: DateTime<Utc>, + + /// Prompt version (from PROMPT_VERSION constant). + pub prompt_version: String, + + /// BLAKE3 hash of the prompt template. + pub prompt_hash: String, + + /// Model used (e.g., "gemini-2.0-flash"). + pub model: String, + + /// BLAKE3 hash of input content. 
+ pub input_hash: String, + + /// File path (relative to scan root). + pub file_path: String, + + /// Language detected. + pub language: String, + + /// Content length in bytes. + pub content_length: usize, + + /// Raw LLM response (JSON string). + pub raw_response: String, + + /// Parsed claims (after confidence filter, before ontology validation). + pub parsed_claims: Vec<ParsedClaim>, + + /// Final claims (after ontology validation). + pub final_claims: Vec<FinalClaim>, + + /// Token usage. + pub input_tokens: usize, + pub output_tokens: usize, + + /// Whether parsing succeeded. + pub parse_success: bool, + + /// Parse error if any. + pub parse_error: Option<String>, + + /// Cache status. + pub cache_hit: bool, + + /// Latency in milliseconds. + pub latency_ms: u64, +} + +/// A claim as parsed from LLM JSON (before validation). +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ParsedClaim { + pub subject: String, + pub predicate: String, + pub value: serde_json::Value, + pub confidence: f32, + pub line: usize, +} + +/// A claim after ontology validation. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct FinalClaim { + pub concept_path: String, + pub predicate: String, + pub value: serde_json::Value, + pub confidence: f32, + pub matched_ontology: bool, + pub fuzzy_matched: bool, +} +``` + +**Integration point:** Modify `LlmExtractor::extract()` to emit observations. + +--- + +### 2. Cache Key Includes Prompt Hash + +**Problem:** Cache doesn't invalidate when prompt changes. + +**Solution:** Include prompt hash in cache key. + +```rust +// src/llm/cache.rs + +impl LlmCache { + fn compute_key(content: &str, model: &str, prompt: &str) -> String { + let mut hasher = blake3::Hasher::new(); + hasher.update(content.as_bytes()); + hasher.update(model.as_bytes()); + hasher.update(prompt.as_bytes()); // NEW: prompt included + hasher.finalize().to_hex().to_string() + } +} +``` + +--- + +### 3. 
Bounded Concurrency + +**Problem:** Sequential execution is slow; unbounded parallelism hits rate limits. + +**Solution:** Tokio Semaphore with configurable concurrency. + +```rust +// src/eval/harness.rs + +use std::sync::Arc; +use tokio::sync::Semaphore; + +pub struct EvalHarness { + extractor: LlmExtractor, + semaphore: Arc, +} + +impl EvalHarness { + pub fn new(extractor: LlmExtractor, max_concurrent: usize) -> Self { + Self { + extractor, + semaphore: Arc::new(Semaphore::new(max_concurrent)), + } + } + + pub async fn run(&self, fixtures: Vec) -> EvalResult { + let handles: Vec<_> = fixtures + .into_iter() + .map(|fixture| { + let sem = self.semaphore.clone(); + let extractor = self.extractor.clone(); + tokio::spawn(async move { + let _permit = sem.acquire().await?; + Self::run_fixture(&extractor, fixture).await + }) + }) + .collect(); + + let results = futures::future::join_all(handles).await; + // aggregate... + } +} +``` + +**Default:** 5 concurrent (configurable via `eval.max_concurrent`) + +--- + +### 4. Rate Limit Resilience + +**Problem:** 429 errors cause evaluation failures. + +**Solution:** Exponential backoff with retries. + +```rust +// src/llm/client.rs + +impl GeminiClient { + async fn call_with_retry(&self, request: &Request) -> Result { + let mut delay = Duration::from_millis(500); + let max_retries = 5; + + for attempt in 0..max_retries { + match self.call(request).await { + Ok(response) => return Ok(response), + Err(e) if e.is_rate_limit() => { + if attempt == max_retries - 1 { + return Err(e); + } + tracing::warn!( + attempt, + delay_ms = delay.as_millis(), + "Rate limited, backing off" + ); + tokio::time::sleep(delay).await; + delay *= 2; + } + Err(e) => return Err(e), + } + } + unreachable!() + } +} +``` + +--- + +### 5. Fixture Format + +**Problem:** No standardized test cases to validate prompt changes. + +**Solution:** TOML fixtures with input, expected output, and rationale. 
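Each fixture pairs an input with claims that must be extracted (a recall check) and claims that must not be (a precision check). As a rough illustration of how one fixture could be scored, here is a minimal, std-only sketch. The `Claim` tuple type, the `Score` struct and `score_fixture` are hypothetical names used only for this example, values are simplified to strings, and counting only forbidden claims as false positives is an assumption rather than something this spec settles.

```rust
/// Simplified claim: (subject, predicate, value). Hypothetical type for
/// illustration; the spec's claims also carry JSON values and confidence.
type Claim = (&'static str, &'static str, &'static str);

/// Counts produced by scoring one fixture.
#[derive(Debug, PartialEq)]
struct Score {
    true_positives: usize,
    false_negatives: usize,
    false_positives: usize,
}

/// Score extracted claims against a fixture's two lists.
/// Assumption: a false positive is an extracted claim listed in `must_not_contain`.
fn score_fixture(extracted: &[Claim], must_contain: &[Claim], must_not_contain: &[Claim]) -> Score {
    // Required claims that were actually extracted.
    let true_positives = must_contain.iter().filter(|c| extracted.contains(*c)).count();
    // Forbidden claims that were extracted anyway.
    let false_positives = must_not_contain.iter().filter(|c| extracted.contains(*c)).count();
    Score {
        true_positives,
        false_negatives: must_contain.len() - true_positives,
        false_positives,
    }
}

fn main() {
    let must_contain = [("tls/cert_verification", "enabled", "false")];
    let must_not_contain = [("tls/cert_verification", "enabled", "true")];
    let extracted = [("tls/cert_verification", "enabled", "false")];

    let score = score_fixture(&extracted, &must_contain, &must_not_contain);
    println!("{score:?}");
}
```

Summing these per-fixture counts across the corpus gives the TP, FP and FN totals from which precision, recall and F1 are derived.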
+ +```toml +# tests/llm_fixtures/tls/disabled_verification.toml + +[metadata] +id = "tls-001" +name = "TLS verification disabled in Python requests" +category = "tls" +language = "python" +created = "2026-02-05" + +[input] +# The code to analyze +content = ''' +import requests + +def fetch_data(url): + # Disable SSL verification for internal services + response = requests.get(url, verify=False) + return response.json() +''' + +[expected] +# What the LLM MUST extract (recall test) +must_contain = [ + { + subject = "tls/cert_verification", + predicate = "enabled", + value = false, + rationale = "requests.get(verify=False) explicitly disables TLS verification" + }, +] + +# What the LLM MUST NOT extract (precision test) +must_not_contain = [ + { subject = "tls/cert_verification", predicate = "enabled", value = true }, +] + +[scoring] +# How important is this fixture? +weight = 1.0 +# Expected minimum confidence from LLM +min_confidence = 0.8 +``` + +**ExpectedClaim with rationale:** + +```rust +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ExpectedClaim { + pub subject: String, + pub predicate: String, + pub value: serde_json::Value, + /// Optional explanation for why this claim is expected (shown on failure) + #[serde(default)] + pub rationale: Option, +} +``` + +**Fixture categories:** + +``` +tests/llm_fixtures/ +├── manifest.toml # Index + baseline metrics +├── tls/ # TLS/SSL fixtures +│ ├── disabled_verification.toml +│ ├── deprecated_version.toml +│ └── pinning_bypass.toml +├── jwt/ # JWT fixtures +├── secrets/ # Hardcoded secrets fixtures +├── auth/ # Auth bypass fixtures +├── negative/ # Safe code (expect NO claims) +│ ├── safe_tls.toml +│ └── env_var_secrets.toml +└── edge/ # Edge cases + ├── empty_file.toml + └── huge_file.toml +``` + +**Manifest:** + +```toml +# tests/llm_fixtures/manifest.toml + +[corpus] +version = "1.0.0" +total_fixtures = 35 + +[baseline] +# Known-good metrics from last successful run +precision = 0.85 +recall = 0.78 +f1 = 
0.81 +prompt_version = "1.0.0" +model = "gemini-2.0-flash" +measured_at = "2026-02-05T10:30:00Z" +``` + +--- + +### 6. Evaluation Harness + +**Problem:** No way to run fixtures and compute metrics. + +**Solution:** Evaluation engine in `src/eval/`. + +```rust +// src/eval/mod.rs + +mod db; +mod fixture; +mod harness; +mod matcher; +mod metrics; +mod perturbation; +mod report; + +pub use db::EvalDatabase; +pub use fixture::{Fixture, FixtureLoader}; +pub use harness::{EvalConfig, EvalHarness, EvalResult}; +pub use metrics::{Metrics, CategoryMetrics}; +pub use perturbation::Perturbator; +pub use report::{Report, ReportFormat}; +``` + +**Core types:** + +```rust +// src/eval/harness.rs + +pub struct EvalConfig { + /// Path to fixtures directory. + pub fixtures_dir: PathBuf, + + /// Categories to run (None = all). + pub categories: Option>, + + /// Max fixtures to run (for smoke tests). + pub max_fixtures: Option, + + /// Evaluation mode. + pub mode: EvalMode, + + /// Baseline to compare against. + pub baseline: Option, + + /// Save observations to database. + pub save_observations: bool, + + /// Maximum concurrent LLM calls. + pub max_concurrent: usize, +} + +pub enum EvalMode { + /// Use real LLM API. + Live, + /// Use cached responses only (fails if not cached). + Cached, + /// Skip LLM, return empty claims (for testing harness). + Mock, + /// Perturbation testing for stability. + Robust, +} + +pub struct EvalResult { + pub run_id: Uuid, + pub started_at: DateTime, + pub completed_at: DateTime, + pub metrics: Metrics, + pub fixture_results: Vec, + pub baseline_comparison: Option, + pub verdict: EvalVerdict, + /// Stability score (only in Robust mode) + pub stability: Option, +} + +pub enum EvalVerdict { + Pass, + Regression { regressions: Vec }, + Error { message: String }, +} +``` + +**Metrics calculation:** + +```rust +// src/eval/metrics.rs + +pub struct Metrics { + /// True positives: expected claims that were extracted. 
+ pub true_positives: usize, + /// False positives: extracted claims that weren't expected. + pub false_positives: usize, + /// False negatives: expected claims that weren't extracted. + pub false_negatives: usize, + + /// Precision = TP / (TP + FP) + pub precision: f64, + /// Recall = TP / (TP + FN) + pub recall: f64, + /// F1 = 2 * (P * R) / (P + R) + pub f1: f64, + + /// Total fixtures. + pub total_fixtures: usize, + /// Fixtures that passed. + pub passed: usize, + /// Fixtures that failed. + pub failed: usize, + + /// Total tokens used. + pub total_tokens: u64, + /// Estimated cost (USD). + pub estimated_cost: f64, + + /// By category. + pub by_category: HashMap, +} + +impl Metrics { + pub fn compute(results: &[FixtureResult]) -> Self { + let mut tp = 0; + let mut fp = 0; + let mut fn_ = 0; + + for result in results { + tp += result.true_positives; + fp += result.false_positives; + fn_ += result.false_negatives; + } + + let precision = if tp + fp > 0 { tp as f64 / (tp + fp) as f64 } else { 0.0 }; + let recall = if tp + fn_ > 0 { tp as f64 / (tp + fn_) as f64 } else { 0.0 }; + let f1 = if precision + recall > 0.0 { + 2.0 * precision * recall / (precision + recall) + } else { + 0.0 + }; + + // ... rest of computation + } +} +``` + +--- + +### 7. Hybrid Type-Coercive Matching + +**Problem:** Strict type matching misses semantically equivalent values. + +**Solution:** Coerce strings to booleans/numbers when reasonable. + +```rust +// src/eval/matcher.rs + +pub struct ClaimMatcher { + /// Tolerance for confidence comparison. + pub confidence_tolerance: f32, +} + +impl ClaimMatcher { + /// Check if extracted claims satisfy must_contain requirements. 
+ pub fn check_must_contain( + &self, + extracted: &[ExtractedClaim], + expected: &[ExpectedClaim], + ) -> MatchResult { + let mut matched = vec![]; + let mut unmatched = vec![]; + + for exp in expected { + if let Some(claim) = self.find_matching_claim(extracted, exp) { + matched.push((exp.clone(), claim.clone())); + } else { + unmatched.push(exp.clone()); + } + } + + MatchResult { matched, unmatched } + } + + /// Check if any extracted claim matches (for must_not_contain). + pub fn check_must_not_contain( + &self, + extracted: &[ExtractedClaim], + forbidden: &[ExpectedClaim], + ) -> Vec<(ExpectedClaim, ExtractedClaim)> { + let mut violations = vec![]; + + for forbid in forbidden { + if let Some(claim) = self.find_matching_claim(extracted, forbid) { + violations.push((forbid.clone(), claim.clone())); + } + } + + violations + } + + fn find_matching_claim( + &self, + extracted: &[ExtractedClaim], + expected: &ExpectedClaim, + ) -> Option<&ExtractedClaim> { + extracted.iter().find(|claim| { + self.subject_matches(&claim.concept_path, &expected.subject) && + claim.predicate == expected.predicate && + self.value_matches(&claim.value, &expected.value) + }) + } + + fn subject_matches(&self, extracted: &str, expected: &str) -> bool { + // Allow matching on tail path (last 2 segments) + let ext_tail = extracted.split('/').rev().take(2).collect::<Vec<_>>(); + let exp_tail = expected.split('/').rev().take(2).collect::<Vec<_>>(); + ext_tail == exp_tail + } + + fn value_matches(&self, extracted: &ObjectValue, expected: &serde_json::Value) -> bool { + match (extracted, expected) { + // Direct matches + (ObjectValue::Boolean(e), serde_json::Value::Bool(x)) => e == x, + (ObjectValue::Number(e), serde_json::Value::Number(x)) => { + x.as_f64().map(|n| (e - n).abs() < 0.001).unwrap_or(false) + } + (ObjectValue::Text(e), serde_json::Value::String(x)) => e == x, + + // Coercion: string -> boolean + (ObjectValue::Boolean(e), serde_json::Value::String(s)) => { + self.coerce_to_bool(s).map(|b| *e == 
b).unwrap_or(false) + } + (ObjectValue::Text(e), serde_json::Value::Bool(x)) => { + self.coerce_to_bool(e).map(|b| b == *x).unwrap_or(false) + } + + // Coercion: string -> number + (ObjectValue::Number(e), serde_json::Value::String(s)) => { + s.parse::<f64>().map(|n| (e - n).abs() < 0.001).ok().unwrap_or(false) + } + + _ => false, + } + } + + fn coerce_to_bool(&self, s: &str) -> Option<bool> { + match s.to_lowercase().as_str() { + "true" | "yes" | "on" | "enabled" | "1" => Some(true), + "false" | "no" | "off" | "disabled" | "0" => Some(false), + _ => None, + } + } +} +``` + +--- + +### 8. Perturbation Testing Mode + +**Problem:** Need to verify LLM consistency across minor input variations. + +**Solution:** Perturbation mode that tests stability. + +```rust +// src/eval/perturbation.rs + +use crate::Language; + +pub struct Perturbator; + +impl Perturbator { + /// Generate perturbed variants of input content + pub fn perturb(content: &str, language: Language) -> Vec<String> { + let mut variants = vec![content.to_string()]; + variants.push(Self::add_trailing_whitespace(content)); + variants.push(Self::normalize_indentation(content)); + variants.push(Self::add_innocuous_comments(content, language)); + variants.push(Self::remove_comments(content, language)); + variants + } + + fn add_trailing_whitespace(content: &str) -> String { + content + .lines() + .map(|line| format!("{} ", line)) + .collect::<Vec<_>>() + .join("\n") + } + + fn normalize_indentation(content: &str) -> String { + // Convert tabs to spaces + content.replace('\t', " ") + } + + fn add_innocuous_comments(content: &str, language: Language) -> String { + let comment_prefix = match language { + Language::Python => "#", + Language::JavaScript | Language::TypeScript | Language::Rust | Language::Go => "//", + _ => "#", + }; + format!("{} Auto-generated file\n{}", comment_prefix, content) + } + + fn remove_comments(content: &str, language: Language) -> String { + // Simple single-line comment removal + let comment_prefix 
= match language { + Language::Python => "#", + Language::JavaScript | Language::TypeScript | Language::Rust | Language::Go => "//", + _ => "#", + }; + content + .lines() + .filter(|line| !line.trim().starts_with(comment_prefix)) + .collect::<Vec<_>>() + .join("\n") + } +} +``` + +**Stability metric:** % of perturbations producing identical claims. + +CLI: `aphoria eval run --mode robust` + +--- + +### 9. Structured Decoding (Gemini Response Schema) + +**Problem:** Free-form JSON parsing can fail. + +**Solution:** Use Gemini's `response_schema` for guaranteed JSON structure. + +```rust +// src/llm/client.rs + +impl GeminiClient { + fn build_request(&self, content: &str, prompt: &str) -> Request { + Request { + contents: vec![Content { + role: "user".to_string(), + parts: vec![Part::Text(content.to_string())], + }], + generation_config: GenerationConfig { + temperature: 0.1, + response_mime_type: "application/json".to_string(), + response_schema: Some(self.claims_schema()), + }, + } + } + + fn claims_schema(&self) -> Schema { + Schema { + type_: "object".to_string(), + properties: hashmap! { + "claims".to_string() => Schema { + type_: "array".to_string(), + items: Some(Box::new(Schema { + type_: "object".to_string(), + properties: hashmap! { + "subject".to_string() => Schema { type_: "string".to_string(), ..Default::default() }, + "predicate".to_string() => Schema { type_: "string".to_string(), ..Default::default() }, + "value".to_string() => Schema { type_: "any".to_string(), ..Default::default() }, + "confidence".to_string() => Schema { type_: "number".to_string(), ..Default::default() }, + "line".to_string() => Schema { type_: "integer".to_string(), ..Default::default() }, + }, + required: vec!["subject".to_string(), "predicate".to_string(), "value".to_string(), "confidence".to_string()], + ..Default::default() + })), + ..Default::default() + }, + }, + required: vec!["claims".to_string()], + ..Default::default() + } + } +} +``` + +**Benefit:** Eliminates JSON parse failures. + +--- + +### 10. 
Synthetic Corpus Generation + +**Problem:** Manual fixture creation is slow. + +**Solution:** Generate fixtures from real scans with human review. + +```bash +aphoria eval generate-corpus \ + --scan-path /path/to/codebase \ + --output-dir tests/llm_fixtures/synthetic \ + --sample-size 50 +``` + +```rust +// src/eval/corpus.rs + +pub struct CorpusGenerator { + extractor: LlmExtractor, +} + +impl CorpusGenerator { + pub async fn generate( + &self, + scan_path: &Path, + output_dir: &Path, + sample_size: usize, + ) -> Result<Vec<PathBuf>> { + let findings = self.extractor.scan(scan_path).await?; + let sample = self.stratified_sample(&findings, sample_size); + let mut fixtures = vec![]; + for finding in sample { + fixtures.push(self.create_fixture(&finding, output_dir)?); + } + Ok(fixtures) + } + + fn stratified_sample(&self, findings: &[Finding], size: usize) -> Vec<&Finding> { + // Sample proportionally from each category + let mut by_category: HashMap<&str, Vec<&Finding>> = HashMap::new(); + for f in findings { + by_category.entry(&f.category).or_default().push(f); + } + + let per_category = size / by_category.len().max(1); + let mut sample = vec![]; + for (_, items) in by_category { + sample.extend(items.iter().take(per_category)); + } + sample.truncate(size); + sample + } + + fn create_fixture(&self, finding: &Finding, output_dir: &Path) -> Result<PathBuf> { + let fixture = Fixture { + metadata: FixtureMetadata { + id: format!("auto-{}", Uuid::new_v4()), + name: finding.description.clone(), + category: finding.category.clone(), + language: finding.language.to_string(), + created: Utc::now().date_naive().to_string(), + }, + input: FixtureInput { + content: finding.code_snippet.clone(), + }, + expected: FixtureExpected { + must_contain: vec![ExpectedClaim { + subject: finding.subject.clone(), + predicate: finding.predicate.clone(), + value: finding.value.clone(), + rationale: Some("Auto-generated - requires human review".to_string()), + }], + must_not_contain: vec![], + }, + scoring: 
FixtureScoring { + weight: 1.0, + min_confidence: 0.7, + }, + }; + + let path = output_dir + .join(&finding.category) + .join(format!("{}.toml", fixture.metadata.id)); + std::fs::create_dir_all(path.parent().unwrap())?; + std::fs::write(&path, toml::to_string_pretty(&fixture)?)?; + Ok(path) + } +} +``` + +**Workflow:** Scan -> Human review -> Commit to corpus + +--- + +### 11. CLI Commands + +**Problem:** No way to run evaluations from command line. + +**Solution:** Add `aphoria eval` subcommand. + +```rust +// src/cli.rs additions + +#[derive(Subcommand)] +pub enum Commands { + // ... existing commands ... + + /// Evaluate LLM prompt effectiveness + Eval { + #[command(subcommand)] + command: EvalCommands, + }, +} + +#[derive(Subcommand)] +pub enum EvalCommands { + /// Run evaluation against fixtures + Run { + /// Path to fixtures directory + #[arg(long, default_value = "tests/llm_fixtures")] + fixtures: PathBuf, + + /// Categories to run (comma-separated) + #[arg(long)] + categories: Option, + + /// Maximum fixtures to run + #[arg(long)] + max_fixtures: Option, + + /// Evaluation mode: live, cached, mock, robust + #[arg(long, default_value = "cached")] + mode: String, + + /// Baseline file to compare against + #[arg(long)] + baseline: Option, + + /// Exit with code 1 if regression detected + #[arg(long)] + fail_on_regression: bool, + + /// Regression threshold (default: 0.05 = 5%) + #[arg(long, default_value = "0.05")] + threshold: f64, + + /// Save observation logs + #[arg(long)] + save_observations: bool, + + /// Output format: table, json, markdown + #[arg(long, default_value = "table")] + format: String, + }, + + /// Show current baseline metrics + Baseline { + /// Path to fixtures directory + #[arg(long, default_value = "tests/llm_fixtures")] + fixtures: PathBuf, + }, + + /// Update baseline from latest run + UpdateBaseline { + /// Run ID to use as new baseline + #[arg(long)] + run_id: Option, + + /// Path to fixtures directory + #[arg(long, default_value = 
"tests/llm_fixtures")] + fixtures: PathBuf, + + /// Required - prevents accidental baseline overwrites + #[arg(long, required = true)] + force: bool, + }, + + /// List fixtures + ListFixtures { + /// Path to fixtures directory + #[arg(long, default_value = "tests/llm_fixtures")] + fixtures: PathBuf, + + /// Filter by category + #[arg(long)] + category: Option, + }, + + /// Validate fixture format + ValidateFixtures { + /// Path to fixtures directory + #[arg(long, default_value = "tests/llm_fixtures")] + fixtures: PathBuf, + }, + + /// Generate fixtures from real scans + GenerateCorpus { + /// Path to codebase to scan + #[arg(long)] + scan_path: PathBuf, + + /// Output directory for generated fixtures + #[arg(long)] + output_dir: PathBuf, + + /// Number of fixtures to generate + #[arg(long, default_value = "50")] + sample_size: usize, + }, +} +``` + +**Usage examples:** + +```bash +# Run smoke test (cached responses, fast) +aphoria eval run --mode cached --max-fixtures 10 + +# Run full evaluation (live API calls) +aphoria eval run --mode live --save-observations + +# Run with baseline comparison +aphoria eval run --baseline tests/llm_fixtures/manifest.toml --fail-on-regression + +# Run perturbation testing +aphoria eval run --mode robust --max-fixtures 5 + +# Show current baseline +aphoria eval baseline + +# Update baseline (requires --force) +aphoria eval update-baseline --force + +# List fixtures +aphoria eval list-fixtures --category tls + +# Validate fixture format +aphoria eval validate-fixtures + +# Generate fixtures from real codebase +aphoria eval generate-corpus --scan-path ./my-project --output-dir ./test-fixtures +``` + +**Baseline safety:** Without `--force`, update-baseline shows: +``` +Current baseline: precision=0.85, recall=0.78, f1=0.81 (2026-02-05) +To update, re-run with --force +``` + +--- + +### 12. Report Output + +**Problem:** Need human-readable and machine-readable output. + +**Solution:** Multiple report formats. 
+ +**Table format (default):** + +``` +╭────────────────────────────────────────────────────────────────────╮ +│ LLM Prompt Evaluation Report │ +├────────────────────────────────────────────────────────────────────┤ +│ Run ID: abc123-def456 │ +│ Date: 2026-02-05 14:30:00 UTC │ +│ Prompt: v1.0.0 │ +│ Model: gemini-2.0-flash │ +╰────────────────────────────────────────────────────────────────────╯ + +Summary +╭──────────┬─────────┬──────────┬────────┬────────╮ +│ Metric │ Current │ Baseline │ Delta │ Status │ +├──────────┼─────────┼──────────┼────────┼────────┤ +│ Precision│ 0.87 │ 0.85 │ +0.02 │ ✓ │ +│ Recall │ 0.76 │ 0.78 │ -0.02 │ ⚠ │ +│ F1 │ 0.81 │ 0.81 │ +0.00 │ ✓ │ +╰──────────┴─────────┴──────────┴────────┴────────╯ + +Verdict: ⚠ REVIEW - Recall dropped by 2% + +Category Breakdown +╭──────────┬──────────┬────────┬────────╮ +│ Category │ Fixtures │ Passed │ Failed │ +├──────────┼──────────┼────────┼────────┤ +│ tls │ 12 │ 11 │ 1 │ +│ jwt │ 8 │ 6 │ 2 │ +│ secrets │ 15 │ 14 │ 1 │ +│ negative │ 10 │ 10 │ 0 │ +╰──────────┴──────────┴────────┴────────╯ + +Regressions (2) +- jwt-003: JWT algorithm none detection + Expected: jwt/algorithm = "none" + Rationale: alg:"none" bypasses signature verification entirely + Got: Not extracted + +- tls-007: TLS version in constants (IMPROVED) + Previously: Not extracted + Now: tls/min_version = "1.0" ✓ + +Cost: 125,430 tokens ($0.12) +``` + +**JSON format:** + +```json +{ + "run_id": "abc123-def456", + "timestamp": "2026-02-05T14:30:00Z", + "prompt_version": "1.0.0", + "model": "gemini-2.0-flash", + "metrics": { + "precision": 0.87, + "recall": 0.76, + "f1": 0.81, + "total_fixtures": 45, + "passed": 41, + "failed": 4 + }, + "baseline_comparison": { + "precision_delta": 0.02, + "recall_delta": -0.02, + "has_regression": true, + "regression_threshold": 0.05 + }, + "stability": 0.92, + "verdict": "review", + "fixture_results": [...] 
+} +``` + +--- + +## Implementation Plan + +### Phase 1: Core Infrastructure (2 days) + +| Task | File | Description | +|------|------|-------------| +| 1.1 | `src/eval/db.rs` | SQLite database with retention | +| 1.2 | `src/llm/cache.rs` | Update cache key to include prompt hash | +| 1.3 | `src/llm/client.rs` | Exponential backoff for 429s | + +**Acceptance:** Database stores observations, cache invalidates on prompt change, rate limits handled gracefully. + +### Phase 2: Fixture & Matching (2 days) + +| Task | File | Description | +|------|------|-------------| +| 2.1 | `src/eval/fixture.rs` | Define `Fixture`, `ExpectedClaim` with rationale | +| 2.2 | `src/eval/matcher.rs` | Hybrid type-coercive matching | +| 2.3 | `tests/llm_fixtures/` | Create 10 seed fixtures | +| 2.4 | `src/eval/fixture.rs` | Add `FixtureLoader` | + +**Acceptance:** Can load fixtures from TOML, matching handles type coercion. + +### Phase 3: Evaluation Harness (2 days) + +| Task | File | Description | +|------|------|-------------| +| 3.1 | `src/eval/harness.rs` | Bounded concurrency with Semaphore | +| 3.2 | `src/eval/metrics.rs` | Implement `Metrics::compute()` | +| 3.3 | `src/eval/harness.rs` | Baseline comparison | +| 3.4 | `src/eval/perturbation.rs` | Perturbation testing | + +**Acceptance:** Can run fixtures with bounded parallelism, compute precision/recall, measure stability. + +### Phase 4: Structured Decoding (1 day) + +| Task | File | Description | +|------|------|-------------| +| 4.1 | `src/llm/client.rs` | Gemini Response Schema integration | + +**Acceptance:** LLM always returns valid JSON, no parse failures. 
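Task 3.2 (`Metrics::compute()`) is not spelled out in this plan. A minimal sketch of what it might look like, assuming the harness has already reduced the matcher results to true-positive, false-positive, and false-negative counts. The `Counts` struct and the convention of returning 0.0 on a zero denominator are assumptions made for illustration, not part of the spec.

```rust
/// Raw match counts accumulated across all fixtures in a run.
/// (Hypothetical type; the plan only names `Metrics::compute()`.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Counts {
    pub true_positives: u32,
    pub false_positives: u32,
    pub false_negatives: u32,
}

/// Aggregate quality metrics for one evaluation run.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Metrics {
    pub precision: f64,
    pub recall: f64,
    pub f1: f64,
}

impl Metrics {
    /// Precision = TP / (TP + FP), recall = TP / (TP + FN),
    /// F1 = 2 * P * R / (P + R).
    /// Assumption: any ratio with a zero denominator is reported as 0.0.
    pub fn compute(counts: Counts) -> Metrics {
        let tp = f64::from(counts.true_positives);
        let fp = f64::from(counts.false_positives);
        let fn_ = f64::from(counts.false_negatives);

        // Guarded division so empty runs do not produce NaN.
        let ratio = |num: f64, den: f64| if den == 0.0 { 0.0 } else { num / den };

        let precision = ratio(tp, tp + fp);
        let recall = ratio(tp, tp + fn_);
        let f1 = ratio(2.0 * precision * recall, precision + recall);

        Metrics { precision, recall, f1 }
    }
}
```

With 8 true positives, 2 false positives, and 2 false negatives, all three metrics come out to 0.8, which is the kind of hand-checkable case the "within 0.01 of manual calculation" success criterion calls for.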
+ +### Phase 5: CLI & Corpus (2 days) + +| Task | File | Description | +|------|------|-------------| +| 5.1 | `src/cli.rs` | Add `EvalCommands` with `--force`, `--mode robust` | +| 5.2 | `src/handlers/eval.rs` | Implement all eval command handlers | +| 5.3 | `src/eval/corpus.rs` | Corpus generation from scans | + +**Acceptance:** `aphoria eval run` works end-to-end, corpus generation functional. + +### Phase 6: Reports & Polish (2 days) + +| Task | File | Description | +|------|------|-------------| +| 6.1 | `src/eval/report.rs` | Table/JSON formats with rationale in failures | +| 6.2 | `src/eval/report.rs` | Stability metrics display | +| 6.3 | `tests/llm_fixtures/` | Expand to 25+ fixtures | + +**Acceptance:** Reports show rationale on missed claims, stability metrics visible. + +**Total:** 11 days + +--- + +## Fixture Seed List + +Initial 10 fixtures to create: + +| ID | Category | Name | Tests | +|----|----------|------|-------| +| tls-001 | tls | Disabled verification (requests) | `verify=False` | +| tls-002 | tls | Deprecated TLS version | `min_version="TLSv1"` | +| jwt-001 | jwt | Algorithm none | `alg: "none"` | +| jwt-002 | jwt | Skip signature verification | `verify=False` | +| secrets-001 | secrets | Hardcoded API key | `API_KEY = "sk_..."` | +| secrets-002 | secrets | High entropy token | Shannon entropy > 4.5 | +| auth-001 | auth | Debug auth bypass | `X-Debug-Auth` header | +| negative-001 | negative | Safe TLS config | `verify=True` (no claims) | +| negative-002 | negative | Env var secrets | `os.getenv()` (no claims) | +| edge-001 | edge | Empty file | Empty content (no claims) | + +--- + +## Configuration + +Add to `aphoria.toml`: + +```toml +[eval] +# Save observations during scans +save_observations = false + +# SQLite database path +database_path = "~/.aphoria/eval/observations.db" + +# Default fixtures directory +fixtures_dir = "tests/llm_fixtures" + +# Regression threshold (5% = 0.05) +regression_threshold = 0.05 + +# Maximum concurrent LLM 
calls +max_concurrent = 5 + +# Retention: days to keep observations +retention_days = 30 + +# Retention: max observations to keep regardless of age +retention_max_count = 1000 + +# Rate limit: initial backoff delay (ms) +rate_limit_initial_delay_ms = 500 + +# Rate limit: max retries before failing +rate_limit_max_retries = 5 +``` + +--- + +## Success Criteria + +| Metric | Target | +|--------|--------| +| Can run `aphoria eval run` | Works | +| Baseline comparison | Detects 5% regression | +| Fixtures load correctly | 100% valid fixtures load | +| Metrics match manual calculation | Within 0.01 | +| Report is readable | Human-verified | +| Type coercion works | "true" matches true | +| Perturbation mode | Stability metric computed | +| Rate limit handling | Survives 429 burst | + +--- + +## File Structure After Implementation + +``` +applications/aphoria/ +├── src/ +│ ├── eval/ +│ │ ├── mod.rs +│ │ ├── db.rs # SQLite storage +│ │ ├── corpus.rs # Synthetic fixture generation +│ │ ├── fixture.rs # Fixture loading +│ │ ├── harness.rs # Evaluation engine +│ │ ├── matcher.rs # Claim matching (type-coercive) +│ │ ├── metrics.rs # Precision/recall +│ │ ├── perturbation.rs # Perturbation testing +│ │ └── report.rs # Output formatting +│ ├── llm/ +│ │ ├── observation.rs # Observation logging +│ │ └── ... +│ ├── handlers/ +│ │ ├── eval.rs # Eval command handlers +│ │ └── ... +│ └── ... 
+└── tests/ + └── llm_fixtures/ + ├── manifest.toml + ├── tls/ + ├── jwt/ + ├── secrets/ + ├── auth/ + ├── negative/ + └── edge/ +``` + +--- + +## Verification + +```bash +# Build +cargo build -p aphoria + +# Test +cargo test -p aphoria + +# SQLite retention check +sqlite3 ~/.aphoria/eval/observations.db "SELECT COUNT(*) FROM observations" + +# Bounded concurrency (watch logs) +RUST_LOG=debug aphoria eval run --mode live 2>&1 | grep "permit" + +# Perturbation mode +aphoria eval run --mode robust --max-fixtures 5 + +# Corpus generation +aphoria eval generate-corpus --scan-path ./test-project --output-dir ./test-fixtures +``` + +--- + +## Open Questions Resolved + +| Question | Decision | +|----------|----------| +| Baseline storage | In `manifest.toml` (simple, versioned with fixtures) | +| Observation storage | SQLite with 30-day/1000-count retention | +| Matching strictness | Tail-path + type-coercive matching | +| Mock vs Live in CI | Cached mode for PR, live for manual | +| Parallelism | Bounded (5 default) via Tokio Semaphore | +| Baseline safety | Requires `--force` flag | +| Structured output | Gemini Response Schema | + +--- + +*Ready for implementation.* diff --git a/applications/aphoria/docs/architecture/llm-prompt-evaluation.md b/applications/aphoria/docs/architecture/llm-prompt-evaluation.md new file mode 100644 index 0000000..32ab2f3 --- /dev/null +++ b/applications/aphoria/docs/architecture/llm-prompt-evaluation.md @@ -0,0 +1,1063 @@ +# LLM Prompt Evaluation System + +> **Status:** Proposed (2026-02-05) +> **Phase:** 7.8 (extends Phase 7.5 LLM-in-the-Loop Extraction) +> **Author:** Architecture Team + +--- + +## Problem Statement + +Aphoria's LLM-powered claim extraction (Phase 7.5) uses Gemini to extract security claims from high-value code files. 
The prompts that drive this extraction are effectively **code that we don't treat like code**: + +| Aspect | Traditional Code | Current Prompts | +| -------------------- | ---------------------- | ------------------------------------ | +| Version Control | Git commits | In files, but no semantic versioning | +| Testing | Unit/integration tests | None | +| Metrics | Coverage, performance | None | +| Regression Detection | CI failures | None | +| Quality Gates | Linting, review | None | + +**The result:** When we change a prompt, we have no systematic way to know if we made things better or worse. We're flying blind. + +### Enterprise Requirements + +For enterprise adoption, customers need assurance that: + +1. **Prompts produce consistent, high-quality results** - Not random outputs +2. **Changes are validated before deployment** - Regressions are caught +3. **Performance is measurable** - Precision, recall, cost are tracked +4. **The system improves over time** - With evidence, not hope + +--- + +## Goals + +### Primary Goals + +1. **Observability** - Understand prompt effectiveness through metrics and logging +2. **Testability** - Validate prompts against known scenarios with expected outcomes +3. **Repeatability** - Run evaluations consistently across environments +4. 
**Automation** - Scheduled jobs that detect regressions without human intervention + +### Non-Goals (Phase 7.8) + +- Real-time prompt optimization (future: Phase 9) +- A/B testing in production (future: Phase 9) +- Multi-model comparison (future) +- Prompt compression/optimization (future) + +--- + +## Architecture Overview + +``` +┌──────────────────────────────────────────────────────────────────────────────┐ +│ LLM Prompt Evaluation System │ +├──────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────┐ │ +│ │ Golden Corpus │ Test fixtures with expected outcomes │ +│ │ (fixtures/) │ - Code snippets │ +│ │ │ - Expected claims (must-contain, must-not-contain) │ +│ └────────┬────────┘ - Metadata (language, category, difficulty) │ +│ │ │ +│ ▼ │ +│ ┌─────────────────┐ │ +│ │ Evaluation │ Orchestrates test runs │ +│ │ Harness │ - Loads fixtures │ +│ │ │ - Invokes LLM Extractor │ +│ │ │ - Compares outputs │ +│ │ │ - Computes metrics │ +│ └────────┬────────┘ │ +│ │ │ +│ ├──────────────────────┐ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌─────────────────┐ ┌─────────────────┐ │ +│ │ LLM Extractor │ │ Observation │ │ +│ │ (instrumented) │ │ Log │ │ +│ │ │ │ │ │ +│ │ - Same code │ │ - Prompt ver │ │ +│ │ path as │ │ - Input hash │ │ +│ │ production │ │ - Output │ │ +│ │ │ │ - Latency │ │ +│ │ │ │ - Tokens │ │ +│ └─────────────────┘ │ - Model │ │ +│ │ - Timestamp │ │ +│ └────────┬────────┘ │ +│ │ │ +│ ▼ │ +│ ┌─────────────────────────────────────────────────────────────────────────┐ │ +│ │ Metrics & Reports │ │ +│ │ │ │ +│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ +│ │ │ Precision │ │ Recall │ │ F1 Score │ │ Cost │ │ │ +│ │ │ TP/(TP+FP) │ │ TP/(TP+FN) │ │ Harmonic │ │ Tokens/$ │ │ │ +│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │ +│ │ │ │ +│ │ ┌────────────────────────────────────────────────────────────────────┐ │ │ +│ │ │ Regression Report │ │ │ +│ │ │ - Comparison against baseline │ │ │ +│ │ 
│ - Per-fixture deltas │ │ │ +│ │ │ - Category breakdown │ │ │ +│ │ │ - Recommendations │ │ │ +│ │ └────────────────────────────────────────────────────────────────────┘ │ │ +│ └─────────────────────────────────────────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## Core Components + +### 1. Golden Corpus + +A curated set of test fixtures with known expected outcomes. + +#### Fixture Format + +```toml +# fixtures/tls/disabled_verification.toml + +[metadata] +id = "tls-001" +name = "TLS verification disabled in Python requests" +category = "tls" +subcategory = "certificate_verification" +language = "python" +difficulty = "easy" # easy | medium | hard +source = "hand-curated" # hand-curated | production-capture | synthetic +created = "2026-02-05" +updated = "2026-02-05" + +[input] +filename = "client.py" +content = ''' +import requests + +def fetch_data(url): + # Disable SSL verification for internal services + response = requests.get(url, verify=False) + return response.json() +''' + +[expected] +# Claims that MUST be extracted (recall) +must_contain = [ + { subject = "tls/cert_verification", predicate = "enabled", value = false }, +] + +# Claims that MUST NOT be extracted (precision) +must_not_contain = [ + { subject = "tls/cert_verification", predicate = "enabled", value = true }, +] + +# Optional: acceptable alternate formulations +acceptable_variants = [ + { subject = "ssl/verify", predicate = "enabled", value = false }, + { subject = "requests/ssl_verify", predicate = "value", value = false }, +] + +[scoring] +# How to score this fixture +weight = 1.0 # Importance multiplier +min_confidence = 0.7 # Expected minimum confidence +``` + +#### Corpus Organization + +``` +applications/aphoria/tests/llm_fixtures/ +├── README.md # Corpus documentation +├── manifest.toml # Index of all fixtures +├── tls/ +│ ├── disabled_verification.toml +│ ├── deprecated_version.toml +│ └── 
pinning_bypass.toml +├── jwt/ +│ ├── alg_none.toml +│ ├── skip_signature.toml +│ └── hardcoded_secret.toml +├── secrets/ +│ ├── api_key_in_code.toml +│ ├── password_hardcoded.toml +│ └── high_entropy_token.toml +├── auth/ +│ ├── bypass_pattern.toml +│ └── debug_header.toml +├── negative/ # Files that should NOT trigger claims +│ ├── safe_tls_config.toml +│ ├── proper_jwt_validation.toml +│ └── env_var_secrets.toml +└── edge_cases/ + ├── empty_file.toml + ├── binary_content.toml + ├── huge_file.toml + └── mixed_languages.toml +``` + +#### Manifest Structure + +```toml +# manifest.toml + +[corpus] +version = "1.0.0" +created = "2026-02-05" +description = "Golden corpus for LLM extraction evaluation" + +[categories] +tls = { fixtures = 12, description = "TLS/SSL configuration" } +jwt = { fixtures = 8, description = "JWT authentication" } +secrets = { fixtures = 15, description = "Hardcoded secrets" } +auth = { fixtures = 6, description = "Authentication bypass" } +negative = { fixtures = 10, description = "Safe code (no claims expected)" } +edge_cases = { fixtures = 5, description = "Boundary conditions" } + +[baseline] +# Current known-good metrics +precision = 0.85 +recall = 0.78 +f1 = 0.81 +total_fixtures = 56 +last_updated = "2026-02-05" +prompt_version = "1.0.0" +model = "gemini-2.0-flash" +``` + +--- + +### 2. Observation Log + +Every LLM extraction is logged with full context for replay and analysis. 
+ +#### Log Entry Schema + +```rust +/// A single observation from an LLM extraction +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ExtractionObservation { + /// Unique identifier for this observation + pub id: Uuid, + + /// When this extraction occurred + pub timestamp: DateTime<Utc>, + + /// Prompt version (semantic version) + pub prompt_version: String, + + /// Model identifier (e.g., "gemini-2.0-flash") + pub model: String, + + /// BLAKE3 hash of input content (for deduplication) + pub input_hash: String, + + /// Input metadata + pub input: ExtractionInput, + + /// Output from LLM + pub output: ExtractionOutput, + + /// Performance metrics + pub metrics: ExtractionMetrics, + + /// Evaluation context (if run during evaluation) + pub evaluation: Option<EvaluationContext>, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ExtractionInput { + /// Filename (may be anonymized) + pub filename: String, + + /// Language detected + pub language: String, + + /// Content length in bytes + pub content_length: usize, + + /// Content preview (first 500 chars, for debugging) + pub content_preview: Option<String>, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ExtractionOutput { + /// Raw LLM response + pub raw_response: String, + + /// Parsed claims (may be empty if parsing failed) + pub claims: Vec<ExtractedClaim>, + + /// Whether parsing succeeded + pub parse_success: bool, + + /// Parse error if any + pub parse_error: Option<String>, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct ExtractionMetrics { + /// Total latency (API call + processing) + pub latency_ms: u64, + + /// API latency only + pub api_latency_ms: u64, + + /// Input tokens + pub input_tokens: u32, + + /// Output tokens + pub output_tokens: u32, + + /// Total tokens + pub total_tokens: u32, + + /// Estimated cost (USD) + pub estimated_cost_usd: f64, + + /// Cache hit (if response was cached) + pub cache_hit: bool, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct EvaluationContext { 
+ /// Fixture ID if from golden corpus + pub fixture_id: Option, + + /// Evaluation run ID + pub run_id: Uuid, + + /// Whether this matched expected output + pub matched_expected: bool, + + /// Detailed match results + pub match_details: MatchDetails, +} +``` + +#### Log Storage + +``` +~/.aphoria/eval/ +├── observations/ +│ ├── 2026-02-05/ +│ │ ├── 001_tls-001_success.json +│ │ ├── 002_jwt-003_partial.json +│ │ └── ... +│ └── 2026-02-04/ +│ └── ... +├── runs/ +│ ├── run_abc123.json # Full evaluation run metadata +│ └── run_def456.json +└── baselines/ + ├── v1.0.0.json # Baseline for prompt v1.0.0 + └── latest.json # Symlink to current baseline +``` + +--- + +### 3. Evaluation Harness + +The core engine that runs evaluations. + +#### Public API + +```rust +/// Configuration for an evaluation run +#[derive(Debug, Clone)] +pub struct EvalConfig { + /// Path to fixtures directory + pub fixtures_dir: PathBuf, + + /// Which categories to evaluate (None = all) + pub categories: Option>, + + /// Maximum fixtures to run (for quick smoke tests) + pub max_fixtures: Option, + + /// Whether to use real LLM or mock + pub mode: EvalMode, + + /// Baseline to compare against + pub baseline: Option, + + /// Output directory for results + pub output_dir: PathBuf, + + /// Whether to save observations + pub save_observations: bool, + + /// Parallelism (concurrent LLM calls) + pub parallelism: usize, +} + +#[derive(Debug, Clone)] +pub enum EvalMode { + /// Use real LLM API (costs money, tests actual prompt) + Live { + model: String, + temperature: f32, + }, + /// Use cached responses (fast, deterministic, for CI) + Cached, + /// Use mock responses (for testing harness itself) + Mock, +} + +/// Result of an evaluation run +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct EvalResult { + /// Unique run identifier + pub run_id: Uuid, + + /// When the run started + pub started_at: DateTime, + + /// When the run completed + pub completed_at: DateTime, + + /// Configuration used + 
pub config: EvalConfigSummary, + + /// Aggregate metrics + pub metrics: AggregateMetrics, + + /// Per-fixture results + pub fixture_results: Vec, + + /// Comparison with baseline (if baseline provided) + pub baseline_comparison: Option, + + /// Overall verdict + pub verdict: EvalVerdict, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct AggregateMetrics { + /// Precision: TP / (TP + FP) + pub precision: f64, + + /// Recall: TP / (TP + FN) + pub recall: f64, + + /// F1 score: 2 * (P * R) / (P + R) + pub f1: f64, + + /// Total fixtures evaluated + pub total_fixtures: usize, + + /// Fixtures that passed + pub passed: usize, + + /// Fixtures that failed + pub failed: usize, + + /// Fixtures that errored (LLM call failed, parse failed, etc.) + pub errored: usize, + + /// Total cost (USD) + pub total_cost_usd: f64, + + /// Total tokens used + pub total_tokens: u64, + + /// Average latency (ms) + pub avg_latency_ms: f64, + + /// Per-category breakdown + pub by_category: HashMap, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub enum EvalVerdict { + /// All checks passed + Pass, + /// Some regressions detected + Regression { details: Vec }, + /// Evaluation failed (errors prevented completion) + Error { message: String }, +} +``` + +#### Claim Matching Logic + +```rust +/// How to match extracted claims against expected claims +pub struct ClaimMatcher { + /// Tolerance for confidence comparison + pub confidence_tolerance: f32, + + /// Whether to normalize concept paths before comparison + pub normalize_paths: bool, + + /// Predicate aliases (e.g., "enabled" == "active" == "on") + pub predicate_aliases: HashMap>, + + /// Value equivalences (e.g., true == "true" == "yes" == 1) + pub value_equivalences: Vec>, +} + +impl ClaimMatcher { + /// Check if extracted claims satisfy must_contain requirements + pub fn check_must_contain( + &self, + extracted: &[ExtractedClaim], + expected: &[ExpectedClaim], + ) -> MatchResult { + // For each expected claim: + 
 // 1. Find matching extracted claim (subject + predicate match) + // 2. Check value compatibility + // 3. Check confidence threshold + // Return: matched, unmatched, partial matches + todo!() + } + + /// Check if extracted claims violate must_not_contain requirements + pub fn check_must_not_contain( + &self, + extracted: &[ExtractedClaim], + forbidden: &[ExpectedClaim], + ) -> MatchResult { + // For each forbidden claim: + // 1. Check if any extracted claim matches + // 2. Flag violations + todo!() + } +} +``` + +--- + +### 4. Prompt Versioning + +Prompts are versioned to track changes and correlate with metrics. + +#### Version Schema + +```rust +/// Prompt version identifier +#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq, Hash)] +pub struct PromptVersion { + /// Semantic version (major.minor.patch) + pub version: String, + + /// BLAKE3 hash of prompt content + pub content_hash: String, + + /// When this version was created + pub created_at: DateTime<Utc>, + + /// Description of changes from previous version + pub changelog: Option<String>, +} + +impl PromptVersion { + /// Compute version from prompt content + pub fn from_prompt(prompt: &str, changelog: Option<String>) -> Self { + let content_hash = blake3::hash(prompt.as_bytes()).to_hex().to_string(); + // Version is computed or provided externally + Self { + version: "0.0.0".to_string(), // Placeholder + content_hash, + created_at: Utc::now(), + changelog, + } + } +} +``` + +#### Prompt File Structure + +````rust +// llm/prompt.rs + +/// Current prompt version +pub const PROMPT_VERSION: &str = "1.2.0"; + +/// Changelog for current version +pub const PROMPT_CHANGELOG: &str = "Improved JWT claim extraction accuracy"; + +/// The extraction prompt +pub const EXTRACTION_PROMPT: &str = r#" +You are a security analyst extracting implicit security claims from code. + +Given the following code file, identify any security-relevant configurations, +settings, or patterns. 
For each finding, output a JSON object with: + +- subject: The concept path (e.g., "tls/cert_verification") +- predicate: The aspect being claimed (e.g., "enabled") +- value: The value found (boolean, string, or number) +- confidence: Your confidence in this extraction (0.0 to 1.0) +- description: Brief explanation + +Focus on: +- TLS/SSL configuration +- Authentication settings +- Cryptographic choices +- Secret/credential handling +- Input validation +- Authorization patterns + +Output as a JSON array. If no security claims are found, output an empty array. + +Code: +```{language} +{content} +``` +"#; +```` + +--- + +### 5. Metrics & Reporting + +#### Metrics Computed + +| Metric | Formula | Purpose | +|--------|---------|---------| +| **Precision** | TP / (TP + FP) | Are we avoiding false positives? | +| **Recall** | TP / (TP + FN) | Are we finding all issues? | +| **F1 Score** | 2 * (P * R) / (P + R) | Balanced measure | +| **Confidence Calibration** | Correlation(confidence, correctness) | Are high-confidence claims actually correct? | +| **Parse Success Rate** | Successful parses / Total extractions | Is the prompt producing valid JSON? 
| +| **Cost per Fixture** | Total cost / Fixtures | Budget tracking | +| **Latency P50/P95** | Percentiles | Performance tracking | + +#### Regression Detection + +```rust +/// Compare current metrics against baseline +pub struct BaselineComparison { + /// Current metrics + pub current: AggregateMetrics, + + /// Baseline metrics + pub baseline: AggregateMetrics, + + /// Deltas + pub precision_delta: f64, + pub recall_delta: f64, + pub f1_delta: f64, + + /// Regression thresholds + pub regression_threshold: f64, // e.g., 0.05 = 5% drop + + /// Fixtures that regressed + pub regressed_fixtures: Vec, + + /// Fixtures that improved + pub improved_fixtures: Vec, +} + +impl BaselineComparison { + pub fn has_regression(&self) -> bool { + self.precision_delta < -self.regression_threshold || + self.recall_delta < -self.regression_threshold || + self.f1_delta < -self.regression_threshold + } +} +```` + +#### Report Format + +```markdown +# Prompt Evaluation Report + +**Run ID:** abc123 +**Date:** 2026-02-05 14:30:00 UTC +**Prompt Version:** 1.2.0 +**Model:** gemini-2.0-flash + +## Summary + +| Metric | Current | Baseline | Delta | Status | +| ------------- | ------- | -------- | ----- | ------ | +| Precision | 0.87 | 0.85 | +0.02 | ✅ | +| Recall | 0.76 | 0.78 | -0.02 | ⚠️ | +| F1 Score | 0.81 | 0.81 | +0.00 | ✅ | +| Parse Success | 98% | 97% | +1% | ✅ | + +**Verdict:** ⚠️ REVIEW - Recall dropped by 2% + +## Cost Analysis + +- Total tokens: 125,430 +- Estimated cost: $0.12 +- Cost per fixture: $0.002 + +## Regressions + +### jwt-003: JWT algorithm none detection + +- **Expected:** `jwt/algorithm = "none"` with confidence > 0.8 +- **Got:** Not extracted +- **Impact:** High (security-critical) + +## Improvements + +### tls-007: TLS version in constants + +- **Previously:** Not extracted +- **Now:** `tls/min_version = "1.0"` with confidence 0.85 +- **Impact:** Medium + +## Category Breakdown + +| Category | Fixtures | Passed | Failed | Precision | Recall | +| -------- | -------- 
| ------ | ------ | --------- | ------ | +| tls | 12 | 11 | 1 | 0.92 | 0.91 | +| jwt | 8 | 6 | 2 | 0.75 | 0.75 | +| secrets | 15 | 14 | 1 | 0.93 | 0.87 | +| auth | 6 | 6 | 0 | 1.00 | 0.83 | +| negative | 10 | 10 | 0 | 1.00 | N/A | +``` + +--- + +### 6. Jobs & Automation + +#### CI Job (Per-PR) + +```yaml +# .github/workflows/prompt-eval-smoke.yml +name: Prompt Evaluation (Smoke) + +on: + pull_request: + paths: + - "applications/aphoria/src/llm/**" + - "applications/aphoria/tests/llm_fixtures/**" + +jobs: + eval: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Run smoke test + env: + GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} + run: | + cargo run -p aphoria -- eval prompts \ + --mode cached \ + --max-fixtures 20 \ + --categories tls,jwt,secrets \ + --baseline tests/llm_fixtures/baselines/latest.json \ + --fail-on-regression + + - name: Upload report + uses: actions/upload-artifact@v4 + with: + name: eval-report + path: eval-report.md +``` + +**Characteristics:** + +- Runs on PR that touches prompt code or fixtures +- Uses cached responses (fast, deterministic) +- Limited to 20 fixtures (smoke test) +- Fails if regression detected + +#### Nightly Job (Full Evaluation) + +```yaml +# .github/workflows/prompt-eval-nightly.yml +name: Prompt Evaluation (Full) + +on: + schedule: + - cron: "0 3 * * *" # 3am UTC daily + workflow_dispatch: + +jobs: + eval: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Run full evaluation + env: + GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }} + run: | + cargo run -p aphoria -- eval prompts \ + --mode live \ + --model gemini-2.0-flash \ + --temperature 0 \ + --baseline tests/llm_fixtures/baselines/latest.json \ + --output-dir ./eval-results \ + --save-observations + + - name: Update baseline if improved + run: | + # If F1 improved by > 2%, update baseline + ./scripts/maybe-update-baseline.sh + + - name: Upload results + uses: actions/upload-artifact@v4 + with: + name: eval-results + 
path: eval-results/ + + - name: Post to Slack + if: failure() + uses: slackapi/slack-github-action@v1 + with: + payload: | + { + "text": "⚠️ Prompt evaluation regression detected", + "attachments": [...] + } +``` + +**Characteristics:** + +- Runs nightly at 3am UTC +- Uses live LLM API (real evaluation) +- Full corpus coverage +- Updates baseline if metrics improve significantly +- Alerts on regression + +#### On-Demand Job (Prompt Iteration) + +```bash +# For prompt development: compare two versions +aphoria eval prompts \ + --mode live \ + --prompt-file ./prompts/experimental.txt \ + --baseline ./baselines/current.json \ + --output-dir ./eval-comparison \ + --verbose + +# View comparison +cat ./eval-comparison/comparison.md +``` + +--- + +## CLI Interface + +``` +USAGE: + aphoria eval prompts [OPTIONS] + +OPTIONS: + --fixtures-dir Path to fixtures directory [default: tests/llm_fixtures] + --categories Categories to evaluate (comma-separated) + --max-fixtures Maximum fixtures to run + --mode Evaluation mode: live, cached, mock [default: cached] + --model Model to use (live mode only) [default: gemini-2.0-flash] + --temperature Temperature (live mode only) [default: 0] + --baseline Baseline to compare against + --output-dir Output directory for results [default: ./eval-results] + --save-observations Save observation logs + --fail-on-regression Exit with code 1 if regression detected + --regression-threshold Threshold for regression (default: 0.05 = 5%) + --verbose Verbose output + --json Output results as JSON + +SUBCOMMANDS: + aphoria eval prompts show-baseline Show current baseline metrics + aphoria eval prompts update-baseline Update baseline from latest run + aphoria eval prompts list-fixtures List available fixtures + aphoria eval prompts add-fixture Add a new fixture interactively + aphoria eval prompts validate-fixtures Validate fixture format +``` + +--- + +## Implementation Plan + +### Phase 7.8.1: Core Infrastructure (Week 1) + +| Task | Description | 
Effort | +| ----------------- | --------------------------------- | ------ | +| Fixture format | Define TOML schema, parser | 2d | +| Observation log | Schema, writer, reader | 1d | +| Claim matcher | Matching logic with fuzzy support | 2d | +| Prompt versioning | Version extraction, tracking | 1d | + +**Deliverable:** Can load fixtures, run extractions, compare outputs + +### Phase 7.8.2: Evaluation Harness (Week 2) + +| Task | Description | Effort | +| ------------------- | --------------------------- | ------ | +| Evaluation harness | Orchestration, parallelism | 2d | +| Metrics computation | Precision, recall, F1, cost | 1d | +| Baseline comparison | Regression detection | 1d | +| Report generation | Markdown, JSON output | 1d | + +**Deliverable:** Can run full evaluation and generate report + +### Phase 7.8.3: Golden Corpus (Week 2-3) + +| Task | Description | Effort | +| ---------------------- | -------------------------------- | ------ | +| Seed fixtures (20) | Hand-curated test cases | 2d | +| Negative fixtures (10) | Safe code that shouldn't trigger | 1d | +| Edge case fixtures (5) | Boundary conditions | 1d | +| Baseline establishment | Initial metrics snapshot | 1d | + +**Deliverable:** 35+ fixtures covering core categories + +### Phase 7.8.4: CI Integration (Week 3) + +| Task | Description | Effort | +| -------------------- | -------------------------------- | ------ | +| Smoke test workflow | Per-PR cached evaluation | 1d | +| Nightly workflow | Full live evaluation | 1d | +| Baseline auto-update | Script for improvement detection | 1d | +| Alerting | Slack/email on regression | 0.5d | + +**Deliverable:** Automated evaluation in CI + +### Phase 7.8.5: CLI & Documentation (Week 4) + +| Task | Description | Effort | +| ------------- | ------------------------------ | ------ | +| CLI commands | `eval prompts` subcommands | 2d | +| Documentation | Usage guide, fixture authoring | 1d | +| Skill update | `/aphoria-dev` skill update | 0.5d | + 
+**Deliverable:** Production-ready tooling + +--- + +## Open Questions + +### 1. Where do we store baseline metrics? + +**Options:** + +- **In repository** (`tests/llm_fixtures/baselines/`) - Simple, versioned with code +- **External artifact store** - Separates metrics from code +- **Database** - For historical tracking + +**Recommendation:** Start with repository, migrate to external store when history needed. + +### 2. How strict should matching be? + +**Options:** + +- **Exact match** - Same subject, predicate, value (brittle) +- **Structural match** - Same concept, fuzzy value (looser) +- **Semantic match** - Embeddings-based similarity (complex) + +**Recommendation:** Structural match with configurable fuzzy value matching. + +### 3. Mock vs Live in CI? + +**Options:** + +- **Always mock** - Fast, free, deterministic, tests harness not prompt +- **Always live** - Expensive, slow, tests actual prompt +- **Hybrid** - Mock for smoke, live for nightly + +**Recommendation:** Hybrid approach. Cached (deterministic) for PR, live for nightly. + +### 4. How do we handle model version changes? + +Gemini may update models, causing output drift even without prompt changes. + +**Options:** + +- Pin model version (if API supports) +- Track model version in baseline, re-baseline on model change +- Alert when model version changes + +**Recommendation:** Track model version, require manual baseline update on change. + +### 5. What's the corpus growth strategy? + +**Options:** + +- Hand-curate only (high quality, slow growth) +- Production capture with review (faster growth, needs tooling) +- Synthetic generation (fast, may not reflect reality) + +**Recommendation:** Start hand-curated, add production capture tooling in Phase 9. 
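The structural-match recommendation under question 2 leaves "fuzzy value matching" open. One way it might look, as a sketch only: the `Value` enum, the `values_match` function, and the accepted boolean spellings are all hypothetical, chosen to illustrate the idea rather than taken from the codebase. Numbers are deliberately not coerced to booleans here, so a version string such as "1.0" cannot match `true`; text comparison is assumed to be case-insensitive.

```rust
/// A claim value as it might appear in a fixture or in LLM output.
/// (Hypothetical enum for illustration; not a type from the codebase.)
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Bool(bool),
    Num(f64),
    Text(String),
}

/// Reduce a value to a canonical form before comparison.
/// Text is trimmed and lowercased, then read as a boolean word or a
/// finite number where possible; anything else stays text.
fn canonical(v: &Value) -> Value {
    match v {
        Value::Bool(b) => Value::Bool(*b),
        Value::Num(n) => Value::Num(*n),
        Value::Text(s) => {
            let t = s.trim().to_ascii_lowercase();
            if matches!(t.as_str(), "true" | "yes" | "on") {
                Value::Bool(true)
            } else if matches!(t.as_str(), "false" | "no" | "off") {
                Value::Bool(false)
            } else {
                match t.parse::<f64>() {
                    // Only finite numbers are coerced; "nan" and "inf" stay text.
                    Ok(n) if n.is_finite() => Value::Num(n),
                    _ => Value::Text(t),
                }
            }
        }
    }
}

/// Two values match if their canonical forms are equal.
pub fn values_match(expected: &Value, actual: &Value) -> bool {
    canonical(expected) == canonical(actual)
}
```

Under these assumptions a fixture that expects `value = false` would accept an extracted `"false"` or `"off"`, while an extracted `"1.0"` would only match the number 1.0.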
+ +--- + +## Success Metrics + +| Metric | Target | Measurement | +| --------------------------------------- | ------------- | ------------------------------------- | +| Regression detection rate | 100% | Simulated regressions caught | +| False positive rate (regression alerts) | < 5% | Manual review of alerts | +| Prompt iteration cycle time | < 30 min | Time from change to evaluation result | +| Corpus coverage | > 50 fixtures | Fixture count | +| CI job duration (smoke) | < 2 min | Workflow timing | +| CI job duration (nightly) | < 15 min | Workflow timing | + +--- + +## Related Documents + +- [LLM-in-the-Loop Extraction](../../roadmap.md#phase-75-llm-in-the-loop-extraction) - Phase 7.5 implementation +- [Pattern Learning Store](../../roadmap.md#phase-76-pattern-learning-store) - Phase 7.6 implementation +- [LLM Extractor Code](../../src/llm/) - Current implementation + +--- + +## Appendix: Example Fixture + +```toml +# fixtures/secrets/high_entropy_api_key.toml + +[metadata] +id = "secrets-005" +name = "High entropy API key in Python config" +category = "secrets" +subcategory = "api_keys" +language = "python" +difficulty = "medium" +source = "hand-curated" +created = "2026-02-05" +updated = "2026-02-05" +notes = """ +Tests detection of high-entropy strings that look like API keys. +The entropy of 'sk_live_abc123...' is > 4.0 which should trigger detection. 
+""" + +[input] +filename = "config.py" +content = ''' +import os + +# Configuration for payment processing +STRIPE_API_KEY = "sk_live_51ABC123DEF456GHI789JKL012MNO345PQR678STU901VWX234YZ" +STRIPE_WEBHOOK_SECRET = os.environ.get("STRIPE_WEBHOOK_SECRET") + +DATABASE_URL = os.environ.get("DATABASE_URL", "postgresql://localhost/app") +DEBUG = os.environ.get("DEBUG", "false").lower() == "true" +''' + +[expected] +must_contain = [ + { subject = "secrets/api_key", predicate = "hardcoded", value = true, min_confidence = 0.8 }, +] + +must_not_contain = [ + # STRIPE_WEBHOOK_SECRET uses env var, should NOT be flagged + { subject = "secrets/webhook_secret", predicate = "hardcoded", value = true }, + # DATABASE_URL uses env var with fallback, should NOT be flagged as hardcoded + { subject = "secrets/database_url", predicate = "hardcoded", value = true }, +] + +acceptable_variants = [ + { subject = "stripe/api_key", predicate = "exposed", value = true }, + { subject = "payment/secret", predicate = "hardcoded", value = true }, +] + +[scoring] +weight = 1.5 # Security-critical, weighted higher +min_confidence = 0.8 +``` + +--- + +_Last updated: 2026-02-05_ diff --git a/applications/aphoria/docs/architecture/scout-judge-extraction.md b/applications/aphoria/docs/architecture/scout-judge-extraction.md new file mode 100644 index 0000000..8d8a9a1 --- /dev/null +++ b/applications/aphoria/docs/architecture/scout-judge-extraction.md @@ -0,0 +1,203 @@ +# Scout & Judge: Hybrid Deterministic-Probabilistic Extraction Architecture + +> **Status:** Proposed (2026-02-05) +> **Phase:** 7.9 (Replaces monolithic LLM extraction) +> **Context:** Evolution of Phase 7.5 (LLM-in-the-Loop) + +--- + +## 1. Problem Statement + +The current LLM extraction pipeline ("Monolithic Mode") treats code files as unstructured text. It feeds entire files to the LLM to find security claims. + +**Issues with Monolithic Mode:** +1. 
**Cost:** 90% of a file is irrelevant to security (imports, UI logic, helpers), yet we pay for every token. +2. **Recall:** LLMs struggle to find "needles in haystacks" (long context window degradation). +3. **Hallucination:** Irrelevant code confuses the model, leading to false positives. +4. **Latency:** Processing large files is slow/blocking. + +## 2. The Solution: Scout & Judge Architecture + +We decouple the **discovery** of potential claims from the **analysis** of those claims. + +* **The Scout (Deterministic):** Uses Abstract Syntax Trees (AST) via `tree-sitter` to find *Regions of Interest* (ROIs) locally, with no LLM calls and no token cost. +* **The Judge (Probabilistic):** Uses the LLM to analyze *only* the specific ROI snippet to extract semantic meaning and confidence. + +### Architectural Diagram + +```mermaid +graph TD + File[Source File] -->|Input| Scout["AST Scout (Tree-sitter)"] + + subgraph "The Scout (Local/Fast)" + Scout -->|Parse| AST + AST -->|Query| Query[SCM Queries] + Query -->|Match| Candidate[Candidate Node] + Candidate -->|Expand| Snippet[Context Snippet] + end + + Snippet -->|Input| Judge["LLM Judge (Gemini/Claude)"] + + subgraph "The Judge (Remote/Smart)" + Judge -->|Prompt: Analyze this specific call| Claims[Structured Claims] + end + + Claims -->|Output| Aggregator[Claim Aggregator] +``` + +--- + +## 3. Component Details + +### 3.1 The Scout (Tree-sitter) + +The Scout's job is **High Recall**. It should find *anything* that *might* be relevant. It does not need to be precise. + +**Technology:** `tree-sitter` (Rust bindings) + +**Workflow:** +1. **Detect Language:** Identify file type (Python, Go, Rust, JS). +2. **Parse:** Generate AST. +3. **Query:** Run SCM (S-expression) queries to find patterns. + +**Example Query (Python TLS):** +```scm +(call + function: (attribute) @func + arguments: (argument_list + (keyword_argument + name: (identifier) @arg_name + value: (_) @value + ) + ) + (#match? @func "requests\.(get|post|put|delete)") + (#eq? 
@arg_name "verify") +) +``` + +**Context Expansion:** +The Scout doesn't just grab the line. It grabs the **Logical Context**: +* The function call itself. +* Variable definitions referenced in the call (simple static analysis). +* Surrounding 5 lines for comments. + +### 3.2 The Judge (LLM) + +The Judge's job is **High Precision**. It receives a focused prompt and determines if a claim exists. + +**Input Prompt:** +```text +You are a security analyst. +Analyze this code snippet for TLS verification settings. + +SNIPPET: +# Dev override +should_verify = False +requests.get(url, verify=should_verify) + +CONTEXT: +Variable `should_verify` is defined on line 2. + +TASK: +Does this snippet disable TLS verification? +Output JSON: { "subject": "tls/verification", "value": false, "confidence": 0.95 } +``` + +**Why this wins:** +* **Token Efficiency:** Input reduced from 2000 tokens (file) to ~100 tokens (snippet). +* **Accuracy:** Model has no distractions. +* **Speed:** Parallelizable per-snippet. + +--- + +## 4. Implementation Plan + +### Phase 1: Infrastructure (Dependencies) + +Add `tree-sitter` support to `Cargo.toml`. + +```toml +[dependencies] +tree-sitter = "0.20" +tree-sitter-python = "0.20" +tree-sitter-javascript = "0.20" +tree-sitter-go = "0.20" +tree-sitter-rust = "0.20" +``` + +### Phase 2: The Scout Engine (`src/scout/`) + +Create a new module `applications/aphoria/src/scout/`. + +* `mod.rs`: Public interface. +* `engine.rs`: Orchestrates parsing and querying. +* `queries/`: Directory containing `.scm` query files for each category/language. 
+ * `python/tls.scm` + * `go/sql_injection.scm` + +**Struct definition:** +```rust +pub struct CandidateSnippet { + pub file_path: String, + pub language: Language, + pub start_line: usize, + pub end_line: usize, + pub code: String, + pub context_variables: HashMap<String, String>, // Name -> Value/Definition + pub query_id: String, // Which query found this +} +``` + +### Phase 3: The Judge Engine (`src/llm/judge.rs`) + +Refactor `LlmExtractor` to support "Judge Mode". + +* Modify `extract()` to accept `CandidateSnippet` instead of full file content. +* Create specialized prompts for specific query IDs (e.g., if Scout found a TLS pattern, use the specialized "TLS Judge" prompt, not the generic one). + +### Phase 4: Integration + +Modify the main `scan` loop: + +1. **Regex Extractors** run first (unchanged). +2. **Scout** runs on all files (extremely fast). +3. **Deduplicate:** If Scout finds a region already handled by Regex, drop it. +4. **Judge:** Send remaining Candidates to LLM. + +--- + +## 5. Evaluation & Metrics + +The "Prompt Evaluation System" (Phase 7.8) adapts to this model: + +**1. Scout Evaluation (Deterministic):** +* **Metric:** Recall. "Did the Scout find the vulnerable line in `fixtures/tls/bad.py`?" +* **Test:** Unit tests using `tree-sitter` queries against code snippets. No LLM required. + +**2. Judge Evaluation (Probabilistic):** +* **Metric:** Precision/Accuracy. "Given the snippet, did the LLM classify it correctly?" +* **Fixture:** `tests/llm_fixtures` now contains *snippets* derived from the Golden Corpus files. + +**3. Cost Efficiency Metric:** +* Track `tokens_per_claim`. +* Goal: Reduce tokens/claim by >80% compared to Monolithic approach. + +## 6. Migration Strategy + +1. **Parallel Run:** Run Scout logic alongside Regex logic in "shadow mode" (logging only) to tune queries. +2. **Incremental Rollout:** Enable Scout & Judge for **one category** (e.g., TLS) while leaving others in Monolithic mode (if any) or Regex mode. +3. 
**Full Switch:** Deprecate "Monolithic Mode" prompts. + +--- + +## 7. Comparison Summary + +| Feature | Current (Monolithic) | Scout & Judge (Proposed) | +| :--- | :--- | :--- | +| **Trigger** | File name heuristic | AST Pattern Match | +| **Input** | Whole File | Relevant Snippet | +| **Context** | Noisy (imports, unrelated code) | Focused (local scope) | +| **Cost** | $$$ (Linear to file size) | ¢ (Linear to *relevant* code) | +| **Reliability** | Low (Lost in middle) | High (Forced focus) | +| **Maintenance** | Prompt Engineering | Query Engineering + Simple Prompts | + diff --git a/applications/aphoria/roadmap.md b/applications/aphoria/roadmap.md index 7dc2fbe..7fb3239 100644 --- a/applications/aphoria/roadmap.md +++ b/applications/aphoria/roadmap.md @@ -1142,7 +1142,7 @@ auto_promote = false # Require human approval (Phase 7.7) --- -## Phase 7.7: Pattern → Extractor Promotion ⬜ +## Phase 7.7: Pattern → Extractor Promotion ✅ > High-frequency learned patterns get promoted to declarative extractors. This closes the learning loop: patterns discovered by LLM become permanent, fast regex extractors. 
@@ -1162,15 +1162,17 @@ Human review (optional) → Approve/Reject Merge to project's .aphoria/extractors/ ``` -### 7.7.1 Promotion Pipeline ⬜ +### 7.7.1 Promotion Pipeline ✅ -| Task | Description | -|------|-------------| -| Candidate selection | Query patterns meeting threshold | -| Regex generation | LLM generates regex from examples | -| YAML generation | Convert to declarative extractor format | -| Validation | Test against all stored examples | -| Review queue | Present candidates for human approval | +| Task | Status | +|------|--------| +| `PromotionPipeline` | ✅ `promotion/pipeline.rs` — orchestrates full promotion flow | +| `RegexGenerator` | ✅ `promotion/regex_gen.rs` — Gemini LLM integration | +| `ExtractorValidator` | ✅ `promotion/validator.rs` — ReDoS detection, timing validation | +| `YamlWriter` | ✅ `promotion/writer.rs` — outputs to `.aphoria/extractors/learned/` | +| `InteractiveReviewer` | ✅ `promotion/review.rs` — CLI review workflow | +| `PromotionCandidate` | ✅ `promotion/types.rs` | +| `ValidationResult` | ✅ `promotion/types.rs` | ```rust pub struct PromotionPipeline { @@ -1216,13 +1218,13 @@ impl PromotionPipeline { } ``` -### 7.7.2 Regex Generation ⬜ +### 7.7.2 Regex Generation ✅ -| Task | Description | -|------|-------------| -| Multi-example prompt | Include all examples in generation prompt | -| Regex safety | Prevent catastrophic backtracking | -| Test coverage | Generate test cases alongside regex | +| Task | Status | +|------|--------| +| Multi-example prompt | ✅ Includes all examples in generation prompt | +| Regex safety | ✅ ReDoS detection prevents catastrophic backtracking | +| Test coverage | ✅ Validates against stored examples | ```rust async fn generate_regex(examples: &[String], claim: &ClaimTemplate) -> Result { @@ -1244,14 +1246,14 @@ async fn generate_regex(examples: &[String], claim: &ClaimTemplate) -> Result0.95 confidence (Phase 9) | +| Task | Status | +|------|--------| +| `aphoria extractors review` | ✅ CLI to review 
pending promotions | +| `aphoria extractors stats` | ✅ Show pattern store statistics | +| `aphoria extractors candidates` | ✅ List promotion candidates | +| `aphoria extractors promote` | ✅ Promote pattern to extractor | +| Approval workflow | ✅ Approve, reject, or skip via InteractiveReviewer | +| Rejection tracking | ⬜ Deferred to Phase 9 (rejection reason persistence) | +| Auto-approve mode | ⬜ Deferred to Phase 9 (>0.95 confidence auto-promote) | ```bash $ aphoria extractors review @@ -1320,9 +1325,9 @@ Pending promotions: 3 [a]pprove [r]eject [e]dit [s]kip [q]uit: _ ``` -### 7.7.5 Extractor Output ⬜ +### 7.7.5 Extractor Output ✅ -Promoted patterns become declarative extractors in `.aphoria/extractors/`: +Promoted patterns become declarative extractors in `.aphoria/extractors/learned/`: ```yaml # .aphoria/extractors/learned/tls_min_version_const.yaml @@ -1348,7 +1353,7 @@ metadata: confidence: 0.91 ``` -### 7.7.6 Configuration ⬜ +### 7.7.6 Configuration ✅ ```toml # aphoria.toml @@ -1357,14 +1362,13 @@ enabled = true # Enable promotion pipeline auto_promote = false # Require human approval output_dir = ".aphoria/extractors/learned" min_confidence = 0.8 # Minimum to consider +min_projects = 5 # Projects needed before promotion require_validation = true # Must pass validation suite - -[promotion.review] -notify = "slack://webhook/..." # Notify when candidates ready -batch_size = 10 # Max candidates per review session ``` -**Files:** `promotion/mod.rs`, `promotion/pipeline.rs`, `promotion/regex_gen.rs`, `promotion/validator.rs`, `promotion/review.rs` +**Files:** `promotion/mod.rs`, `promotion/pipeline.rs`, `promotion/regex_gen.rs`, `promotion/validator.rs`, `promotion/review.rs`, `promotion/writer.rs`, `promotion/types.rs`, `handlers/extractors.rs` + +**Tests:** 43 tests covering pipeline, validation, regex generation, and YAML output. 
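The `min_confidence` and `min_projects` settings above act as a gate on which learned patterns are considered for promotion. A minimal sketch of such a gate follows. `PromotionConfig`, `PatternStats`, and `is_candidate` are hypothetical names for illustration, not the types in `promotion/types.rs`, and treating both thresholds as inclusive is an assumption.

```rust
/// Hypothetical mirror of a few `[promotion]` keys from `aphoria.toml`.
pub struct PromotionConfig {
    pub enabled: bool,
    pub min_confidence: f32,
    pub min_projects: u32,
}

/// Hypothetical summary of one learned pattern.
pub struct PatternStats {
    pub confidence: f32,
    pub project_count: u32,
}

/// Returns true when a pattern clears both thresholds.
/// Assumption: thresholds are inclusive (>=), and a disabled pipeline yields no candidates.
pub fn is_candidate(cfg: &PromotionConfig, stats: &PatternStats) -> bool {
    cfg.enabled
        && stats.confidence >= cfg.min_confidence
        && stats.project_count >= cfg.min_projects
}
```

With the values shown in the configuration (`min_confidence = 0.8`, `min_projects = 5`), a pattern at confidence 0.91 seen in 5 projects would pass this gate, while the same pattern seen in only 4 projects would not.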
--- @@ -1549,14 +1553,14 @@ contribute_patterns = true # Share patterns to community | 7 | Declarative Extractors | Phase 6 | ✅ | | **7.5** | **LLM-in-the-Loop Extraction (Gemini)** | Phase 7 | ✅ | | **7.6** | **Pattern Learning Store** | Phase 7.5 | ✅ | -| **7.7** | **Pattern → Extractor Promotion** | Phase 7.6 | ⬜ | +| **7.7** | **Pattern → Extractor Promotion** | Phase 7.6 | ✅ | | 8 | Enterprise Extractors (MVP: 8.1, 8.6, 8.11) | Phase 7.5 | ✅ | | **9** | **Autonomous Extractor Generation** | Phase 8 | ⬜ | **Current state:** -- Phases 0-3, 4.5, 4A-4E, 5, 5.6, 6, 7, 7.5, 7.6, 8 (MVP) complete (clippy clean) +- Phases 0-3, 4.5, 4A-4E, 5, 5.6, 6, 7, 7.5, 7.6, 7.7, 8 (MVP) complete (clippy clean) - Full corpus: RFC, OWASP, Vendor sources -- 17 extractors including security (weak_crypto, command_injection, sql_injection, high_entropy_secrets, auth_bypass, insecure_cookies) +- 25 extractors including security (weak_crypto, command_injection, sql_injection, high_entropy_secrets, auth_bypass, insecure_cookies, path_traversal, unvalidated_redirects, weak_password, security_headers, insecure_deserialization, ssrf, orm_injection, xxe) - Trust Packs: signed policy bundles with import/export - Ephemeral mode: 40x faster for CI - Observation write-back: `--sync` records novel claims as Tier 4 project memory @@ -1567,10 +1571,11 @@ contribute_patterns = true # Share patterns to community - Community Corpus: Opt-in anonymous pattern sharing with privacy-preserving anonymization - Declarative Extractors: TOML-defined custom extractors without Rust code - LLM Extraction: Gemini-powered semantic claim extraction for high-value files -- Enterprise Extractors MVP: High-entropy secrets (Shannon entropy), auth bypass patterns, insecure cookie flags +- Enterprise Extractors: High-entropy secrets, auth bypass, insecure cookies, path traversal, unvalidated redirects, weak passwords, security headers, insecure deserialization, SSRF, ORM injection, XXE - Pattern Learning: LLM-extracted 
claims recorded for promotion to declarative extractors +- Pattern Promotion: CLI workflow to promote learned patterns to declarative extractors with Gemini regex generation and validation -**Next:** Phase 7.7 → 8 (full) → 9 (Self-Learning Extraction System) +**Next:** Phase 8 (full) → 9 (Self-Learning Extraction System) ### The Self-Learning Vision @@ -1581,11 +1586,11 @@ Phase 7.5: LLM-in-the-Loop (Gemini semantic extraction) ✅ COMPLETE ↓ Phase 7.6: Pattern Learning (remember what LLM finds) ✅ COMPLETE ↓ -Phase 7.7: Pattern Promotion (patterns → extractors) ⬜ NEXT +Phase 7.7: Pattern Promotion (patterns → extractors) ✅ COMPLETE ↓ Phase 8: Enterprise Extractors (generated + curated) ✅ MVP (8.1, 8.6, 8.11) ↓ -Phase 9: Autonomous Generation (fully self-improving) ⬜ +Phase 9: Autonomous Generation (fully self-improving) ⬜ NEXT ``` **The endgame:** Every PR teaches Aphoria. After a month, it knows your security patterns better than your team does. @@ -1744,34 +1749,51 @@ fn extract_config_claims(config: &ConfigValue, path: &[String]) -> Vec`, `` | +| defusedxml detection | ✅ Lower confidence when defusedxml is imported | +| Tests | ✅ 9+ tests covering all patterns | + +**Languages:** Python, JavaScript, TypeScript, Go Unsafe XML parsing: @@ -2004,9 +2115,21 @@ SAXParserFactory.newInstance() // Without secure processing --- -### 8.14 Weak Password Requirements ⬜ +### 8.14 Weak Password Requirements ✅ -**Impact:** MEDIUM | **Effort:** LOW +**Impact:** MEDIUM | **Effort:** LOW | **Status:** Complete + +| Task | Status | +|------|--------| +| `WeakPasswordExtractor` | ✅ `extractors/weak_password.rs` | +| Minimum length < 8 | ✅ `password_min_length: 6`, `minLength: 4` | +| Bcrypt cost < 10 | ✅ `bcrypt_cost = 8`, `hash_rounds = 5` | +| Simple length checks | ✅ `len(password) >= 6` in code | +| Complexity disabled | ✅ `require_special_chars: false`, `require_uppercase = false` | +| Number requirement disabled | ✅ `require_numbers: no`, `require_digit = 0` | +| Tests | ✅ 7+ 
tests covering all patterns | + +**Languages:** Python, JavaScript, TypeScript, Go, Rust, YAML, JSON, TOML Password validation that's too weak: @@ -2071,26 +2194,24 @@ async fn extract_with_llm(code: &str, file: &str) -> Vec { | **8.1** | High-entropy secrets | HIGH | MEDIUM | Catches real leaked secrets | ✅ | | **8.2** | Framework-specific | HIGH | HIGH | Spring/Django/Express coverage | ⬜ | | **8.3** | Config deep parsing | HIGH | MEDIUM | Nested YAML/JSON understanding | ⬜ | -| **8.4** | Semantic TLS | MEDIUM | MEDIUM | Catches const TLS_MIN = "1.0" | ⬜ | -| **8.5** | ORM SQL injection | MEDIUM | MEDIUM | SQLAlchemy, Django, Sequelize | ⬜ | +| **8.4** | Semantic TLS | MEDIUM | MEDIUM | Catches const TLS_MIN = "1.0" | ✅ | +| **8.5** | ORM SQL injection | MEDIUM | MEDIUM | SQLAlchemy, Django, Sequelize | ✅ | | **8.6** | Auth bypass | HIGH | MEDIUM | Backdoors, hardcoded creds | ✅ | -| **8.7** | Deserialization | HIGH | MEDIUM | pickle, Marshal, eval | ⬜ | -| **8.8** | Path traversal | MEDIUM | LOW | ../../../etc/passwd | ⬜ | -| **8.9** | SSRF | HIGH | MEDIUM | Internal network access | ⬜ | -| **8.10** | Security headers | MEDIUM | LOW | Missing helmet(), CSP | ⬜ | +| **8.7** | Deserialization | HIGH | MEDIUM | pickle, Marshal, eval | ✅ | +| **8.8** | Path traversal | MEDIUM | LOW | ../../../etc/passwd | ✅ | +| **8.9** | SSRF | HIGH | MEDIUM | Internal network access | ✅ | +| **8.10** | Security headers | MEDIUM | LOW | Missing helmet(), CSP | ✅ | | **8.11** | Cookie flags | MEDIUM | LOW | httpOnly, secure, sameSite | ✅ | -| **8.12** | Open redirects | MEDIUM | LOW | Phishing via redirect | ⬜ | -| **8.13** | XXE | HIGH | MEDIUM | XML entity injection | ⬜ | -| **8.14** | Weak passwords | MEDIUM | LOW | MIN_LENGTH = 4 | ⬜ | +| **8.12** | Open redirects | MEDIUM | LOW | Phishing via redirect | ✅ | +| **8.13** | XXE | HIGH | MEDIUM | XML entity injection | ✅ | +| **8.14** | Weak passwords | MEDIUM | LOW | MIN_LENGTH = 4 | ✅ | | **8.15** | LLM extraction | VERY HIGH | 
VERY HIGH | Semantic understanding | ✅ (Phase 7.5) | -**MVP Complete (8.1, 8.6, 8.11):** High-impact extractors for enterprise pilots. +**Phase 8 Complete (8.1, 8.4, 8.5-8.14):** All first-pass extractors implemented. 12 of 14 Phase 8 extractors complete. -**Recommended order for remaining extractors:** -1. **8.3** Config deep parsing (foundational for 8.2) -2. **8.2** Framework-specific (customer-driven) -3. **8.5** ORM SQL injection (common in enterprise apps) -4. **8.7** Deserialization (critical vulnerabilities) +**Remaining deferred extractors:** +1. **8.2** Framework-specific (HIGH effort - Spring, Django, Express, Rails) +2. **8.3** Config deep parsing (HIGH effort - YAML/JSON AST parsing) --- diff --git a/applications/aphoria/src/config/defaults.rs b/applications/aphoria/src/config/defaults.rs index 67ab7ae..464210f 100644 --- a/applications/aphoria/src/config/defaults.rs +++ b/applications/aphoria/src/config/defaults.rs @@ -40,10 +40,19 @@ impl Default for ExtractorConfig { "unreal_cpp".to_string(), "unreal_config".to_string(), "unreal_performance".to_string(), - // Phase 8: Enterprise extractors + // Phase 8: Enterprise extractors (first batch) "high_entropy_secrets".to_string(), "auth_bypass".to_string(), "insecure_cookies".to_string(), + // Phase 8: Enterprise extractors (second batch) + "path_traversal".to_string(), + "unvalidated_redirects".to_string(), + "weak_password".to_string(), + "security_headers".to_string(), + "insecure_deserialization".to_string(), + "ssrf".to_string(), + "orm_injection".to_string(), + "xxe".to_string(), ], disabled: vec![], timeout_config: TimeoutExtractorConfig::default(), diff --git a/applications/aphoria/src/extractors/insecure_deserialization.rs b/applications/aphoria/src/extractors/insecure_deserialization.rs new file mode 100644 index 0000000..fb74e57 --- /dev/null +++ b/applications/aphoria/src/extractors/insecure_deserialization.rs @@ -0,0 +1,386 @@ +//! Insecure deserialization vulnerability extractor. +//! +//! 
Detects patterns where untrusted data is deserialized using unsafe methods, +//! which can lead to remote code execution vulnerabilities. + +use regex::Regex; +use stemedb_core::types::ObjectValue; + +use super::traits::{build_claim, Extractor}; +use crate::types::{ExtractedClaim, Language}; + +/// Extractor for insecure deserialization vulnerabilities. +/// +/// Detects patterns indicating unsafe deserialization: +/// - Python pickle (critical - RCE) +/// - Python yaml.load without SafeLoader +/// - Python marshal +/// - Python eval/exec with user input +/// - JavaScript node-serialize +/// - Go gob without validation +/// - Java ObjectInputStream patterns +pub struct InsecureDeserializationExtractor { + // Python patterns (critical) + python_pickle: Regex, + python_yaml_unsafe: Regex, + python_marshal: Regex, + python_eval: Regex, + + // JavaScript patterns + js_serialize: Regex, + + // Go patterns + go_gob: Regex, + + // Java-style patterns (polyglot detection) + java_ois: Regex, +} + +impl Default for InsecureDeserializationExtractor { + fn default() -> Self { + Self::new() + } +} + +impl InsecureDeserializationExtractor { + /// Create a new insecure deserialization extractor with compiled regexes. + /// + /// # Panics + /// Panics if any regex pattern is invalid (programmer error). 
+ #[allow(clippy::expect_used)] + pub fn new() -> Self { + Self { + // Python: pickle (critical - allows arbitrary code execution) + python_pickle: Regex::new(r#"pickle\.(?:loads?|Unpickler)\s*\("#).expect("valid regex"), + + // Python: yaml.load without SafeLoader (yaml.safe_load is OK) + // Matches yaml.load( but not yaml.safe_load( + python_yaml_unsafe: Regex::new(r#"yaml\.(?:load|unsafe_load)\s*\([^)]*(?:\)|,[^S])"#) + .expect("valid regex"), + + // Python: marshal (unsafe, similar to pickle) + python_marshal: Regex::new(r#"marshal\.loads?\s*\("#).expect("valid regex"), + + // Python: eval/exec with user input + python_eval: Regex::new( + r#"(?:eval|exec)\s*\(\s*(?:request\.|params\.|input|user|data)"#, + ) + .expect("valid regex"), + + // JavaScript: node-serialize (known vulnerable) + js_serialize: Regex::new( + r#"(?:require\s*\(\s*["']node-serialize["']\)|\.unserialize\s*\()"#, + ) + .expect("valid regex"), + + // Go: gob decoder (can be unsafe with untrusted input) + go_gob: Regex::new(r#"gob\.(?:NewDecoder|Decode)\s*\("#).expect("valid regex"), + + // Java-style patterns (for polyglot detection in config files, etc.) 
+ java_ois: Regex::new(r#"ObjectInputStream|readObject\s*\(\)"#).expect("valid regex"), + } + } + + fn make_claim( + path_segments: &[String], + file: &str, + line: usize, + matched: &str, + method: &str, + confidence: f32, + description: &str, + ) -> ExtractedClaim { + build_claim( + path_segments, + &["serialization", "deserialization"], + "deserialize_method", + ObjectValue::Text(method.to_string()), + file, + line, + matched, + confidence, + description, + ) + } +} + +impl Extractor for InsecureDeserializationExtractor { + fn name(&self) -> &str { + "insecure_deserialization" + } + + fn languages(&self) -> &[Language] { + &[Language::Python, Language::JavaScript, Language::TypeScript, Language::Go] + } + + fn extract( + &self, + path_segments: &[String], + content: &str, + language: Language, + file: &str, + ) -> Vec<ExtractedClaim> { + let mut claims = Vec::new(); + + for (line_idx, line) in content.lines().enumerate() { + let line_num = line_idx + 1; + + match language { + Language::Python => { + // Pickle (critical - RCE) + if let Some(m) = self.python_pickle.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "pickle", + 0.95, + "pickle deserialization allows arbitrary code execution (CRITICAL)", + )); + } + + // yaml.load without SafeLoader + if let Some(m) = self.python_yaml_unsafe.find(line) { + // Check if SafeLoader is used on the same line + if !line.contains("SafeLoader") && !line.contains("safe_load") { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "yaml_unsafe", + 0.85, + "yaml.load without SafeLoader allows code execution", + )); + } + } + + // marshal + if let Some(m) = self.python_marshal.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "marshal", + 0.9, + "marshal deserialization is unsafe with untrusted data", + )); + } + + // eval/exec with user input + if let Some(m) = self.python_eval.find(line) { + 
claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "eval", + 0.95, + "eval/exec with user input allows arbitrary code execution (CRITICAL)", + )); + } + } + Language::JavaScript | Language::TypeScript => { + // node-serialize (known vulnerable) + if let Some(m) = self.js_serialize.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "node_serialize", + 0.95, + "node-serialize is vulnerable to remote code execution (CVE-2017-5941)", + )); + } + } + Language::Go => { + // gob decoder + if let Some(m) = self.go_gob.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "gob", + 0.75, // Lower confidence - gob is safer but can still be problematic + "gob deserialization with untrusted input may be unsafe", + )); + } + } + _ => {} + } + + // Check for Java patterns in any language (polyglot detection) + if let Some(m) = self.java_ois.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "java_ois", + 0.85, + "Java ObjectInputStream deserialization is vulnerable to RCE", + )); + } + } + + claims + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_python_pickle() { + let extractor = InsecureDeserializationExtractor::new(); + let content = r#" + data = pickle.loads(request.data) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "app.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("CRITICAL")); + assert!(claims[0].confidence >= 0.9); + } + + #[test] + fn test_python_pickle_load() { + let extractor = InsecureDeserializationExtractor::new(); + let content = r#" + with open('data.pkl', 'rb') as f: + obj = pickle.load(f) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "app.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_python_yaml_unsafe() { + let 
extractor = InsecureDeserializationExtractor::new(); + let content = r#" + data = yaml.load(file_content) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "config.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("SafeLoader")); + } + + #[test] + fn test_python_yaml_safe_no_detection() { + let extractor = InsecureDeserializationExtractor::new(); + // Safe: using SafeLoader + let content = r#" + data = yaml.load(file_content, Loader=yaml.SafeLoader) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "config.py"); + + assert!(claims.is_empty()); + } + + #[test] + fn test_python_marshal() { + let extractor = InsecureDeserializationExtractor::new(); + let content = r#" + obj = marshal.loads(data) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "app.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_python_eval() { + let extractor = InsecureDeserializationExtractor::new(); + let content = r#" + result = eval(request.data) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "app.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("CRITICAL")); + } + + #[test] + fn test_js_node_serialize_require() { + let extractor = InsecureDeserializationExtractor::new(); + let content = r#" + const s = require('node-serialize'); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "app.js"); + + assert_eq!(claims.len(), 1); // require pattern + } + + #[test] + fn test_js_unserialize() { + let extractor = InsecureDeserializationExtractor::new(); + let content = r#" + const obj = s.unserialize(data); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "app.js"); + + assert_eq!(claims.len(), 1); // unserialize pattern + } + + #[test] + fn test_go_gob() { + 
let extractor = InsecureDeserializationExtractor::new(); + let content = r#" + dec := gob.NewDecoder(reader) + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "handler.go"); + + assert_eq!(claims.len(), 1); // NewDecoder + } + + #[test] + fn test_go_gob_decode() { + let extractor = InsecureDeserializationExtractor::new(); + let content = r#" + err := gob.Decode(&data) + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "handler.go"); + + assert_eq!(claims.len(), 1); // Decode + } + + #[test] + fn test_java_ois_polyglot() { + let extractor = InsecureDeserializationExtractor::new(); + // Java pattern detected in any language + let content = r#" + ObjectInputStream ois = new ObjectInputStream(inputStream); + Object obj = ois.readObject(); + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "mixed.py"); + + assert_eq!(claims.len(), 2); // ObjectInputStream and readObject + } +} diff --git a/applications/aphoria/src/extractors/mod.rs b/applications/aphoria/src/extractors/mod.rs index fff547f..24ea6a9 100644 --- a/applications/aphoria/src/extractors/mod.rs +++ b/applications/aphoria/src/extractors/mod.rs @@ -18,6 +18,14 @@ //! - `high_entropy_secrets`: High-entropy strings likely to be leaked secrets //! - `auth_bypass`: Authentication bypass patterns (hardcoded creds, debug auth) //! - `insecure_cookies`: Cookies missing Secure/HttpOnly flags +//! - `path_traversal`: File operations with user-controlled paths +//! - `unvalidated_redirects`: HTTP redirects with user-controlled URLs +//! - `weak_password`: Weak password policy configurations +//! - `security_headers`: Missing or disabled security headers +//! - `insecure_deserialization`: Unsafe deserialization (pickle, yaml.load, etc.) +//! - `ssrf`: HTTP requests with user-controlled URLs +//! - `orm_injection`: ORM methods with string interpolation +//! 
- `xxe`: XML parsing without external entity protection //! //! # Declarative Extractors //! @@ -32,10 +40,15 @@ mod dep_versions; mod hardcoded_secrets; mod high_entropy; mod insecure_cookies; +mod insecure_deserialization; mod jwt_config; +mod orm_injection; +mod path_traversal; mod rate_limit; mod registry; +mod security_headers; mod sql_injection; +mod ssrf; mod timeout_config; mod tls_verify; mod tls_version; @@ -43,7 +56,10 @@ mod traits; mod unreal_config; mod unreal_cpp; mod unreal_performance; +mod unvalidated_redirects; mod weak_crypto; +mod weak_password; +mod xxe; pub use auth_bypass::AuthBypassExtractor; pub use command_injection::CommandInjectionExtractor; @@ -55,10 +71,15 @@ pub use dep_versions::DepVersionsExtractor; pub use hardcoded_secrets::HardcodedSecretsExtractor; pub use high_entropy::HighEntropySecretsExtractor; pub use insecure_cookies::InsecureCookiesExtractor; +pub use insecure_deserialization::InsecureDeserializationExtractor; pub use jwt_config::JwtConfigExtractor; +pub use orm_injection::OrmInjectionExtractor; +pub use path_traversal::PathTraversalExtractor; pub use rate_limit::{RateLimitExtractor, RateLimitThresholds}; pub use registry::ExtractorRegistry; +pub use security_headers::SecurityHeadersExtractor; pub use sql_injection::SqlInjectionExtractor; +pub use ssrf::SsrfExtractor; pub use timeout_config::{TimeoutConfigExtractor, TimeoutThresholds}; pub use tls_verify::TlsVerifyExtractor; pub use tls_version::TlsVersionExtractor; @@ -66,4 +87,7 @@ pub use traits::{build_claim, is_test_file, Extractor}; pub use unreal_config::UnrealConfigExtractor; pub use unreal_cpp::UnrealCppExtractor; pub use unreal_performance::UnrealPerformanceExtractor; +pub use unvalidated_redirects::UnvalidatedRedirectsExtractor; pub use weak_crypto::WeakCryptoExtractor; +pub use weak_password::WeakPasswordExtractor; +pub use xxe::XxeExtractor; diff --git a/applications/aphoria/src/extractors/orm_injection.rs 
b/applications/aphoria/src/extractors/orm_injection.rs new file mode 100644 index 0000000..5d0af03 --- /dev/null +++ b/applications/aphoria/src/extractors/orm_injection.rs @@ -0,0 +1,370 @@ +//! ORM SQL injection vulnerability extractor. +//! +//! Detects patterns where ORM methods are used with string interpolation +//! instead of proper parameterized queries, which can lead to SQL injection. + +use regex::Regex; +use stemedb_core::types::ObjectValue; + +use super::traits::{build_claim, Extractor}; +use crate::types::{ExtractedClaim, Language}; + +/// Extractor for ORM-specific SQL injection vulnerabilities. +/// +/// Detects patterns indicating unsafe query construction in popular ORMs: +/// - Django: .raw() and .extra() with string interpolation +/// - SQLAlchemy: text() and execute() with f-strings +/// - Sequelize: query() with template literals +/// - TypeORM: where() with template literals +/// - GORM: Raw() with fmt.Sprintf +/// - Prisma: $queryRawUnsafe with interpolation +pub struct OrmInjectionExtractor { + // Django patterns + django_raw: Regex, + django_extra: Regex, + + // SQLAlchemy patterns + sqlalchemy_text: Regex, + sqlalchemy_exec: Regex, + + // Sequelize patterns + sequelize_raw: Regex, + + // TypeORM patterns + typeorm_where: Regex, + + // GORM patterns + gorm_raw: Regex, + + // Prisma patterns + prisma_raw: Regex, +} + +impl Default for OrmInjectionExtractor { + fn default() -> Self { + Self::new() + } +} + +impl OrmInjectionExtractor { + /// Create a new ORM injection extractor with compiled regexes. + /// + /// # Panics + /// Panics if any regex pattern is invalid (programmer error). 
+ #[allow(clippy::expect_used)] + pub fn new() -> Self { + Self { + // Django: raw/extra with formatting + django_raw: Regex::new(r#"\.raw\s*\(\s*(?:f["']|["'][^"']*\{[^}]*\}["']\.format)"#) + .expect("valid regex"), + django_extra: Regex::new(r#"\.extra\s*\([^)]*where\s*=\s*\[.*(?:f["']|%)"#) + .expect("valid regex"), + + // SQLAlchemy: text with formatting + sqlalchemy_text: Regex::new(r#"text\s*\(\s*(?:f["']|["'][^"']*%|["'][^"']*\.format)"#) + .expect("valid regex"), + sqlalchemy_exec: Regex::new(r#"\.execute\s*\(\s*(?:f["']|["'][^"']*\{)"#) + .expect("valid regex"), + + // Sequelize: raw query interpolation + sequelize_raw: Regex::new(r#"sequelize\.query\s*\(\s*`[^`]*\$\{"#) + .expect("valid regex"), + + // TypeORM: where with interpolation + typeorm_where: Regex::new(r#"\.(?:where|andWhere|orWhere)\s*\(\s*`[^`]*\$\{"#) + .expect("valid regex"), + + // GORM: Raw with Sprintf + gorm_raw: Regex::new(r#"\.Raw\s*\(\s*(?:fmt\.Sprintf|"[^"]*"\s*\+)"#) + .expect("valid regex"), + + // Prisma: $queryRawUnsafe + prisma_raw: Regex::new(r#"\$queryRawUnsafe\s*\(\s*`[^`]*\$\{"#).expect("valid regex"), + } + } + + fn make_claim( + path_segments: &[String], + file: &str, + line: usize, + matched: &str, + orm: &str, + description: &str, + ) -> ExtractedClaim { + build_claim( + path_segments, + &["db", "orm", "query"], + "query_construction", + ObjectValue::Text(format!("interpolated_{}", orm)), + file, + line, + matched, + 0.9, + description, + ) + } +} + +impl Extractor for OrmInjectionExtractor { + fn name(&self) -> &str { + "orm_injection" + } + + fn languages(&self) -> &[Language] { + &[Language::Python, Language::JavaScript, Language::TypeScript, Language::Go] + } + + fn extract( + &self, + path_segments: &[String], + content: &str, + language: Language, + file: &str, + ) -> Vec<ExtractedClaim> { + let mut claims = Vec::new(); + + for (line_idx, line) in content.lines().enumerate() { + let line_num = line_idx + 1; + + match language { + Language::Python => { + // Django raw queries + if
let Some(m) = self.django_raw.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "django", + "Django .raw() with string interpolation (SQL injection risk)", + )); + } + // Django extra queries + if let Some(m) = self.django_extra.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "django", + "Django .extra() with interpolation (SQL injection risk)", + )); + } + // SQLAlchemy text + if let Some(m) = self.sqlalchemy_text.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "sqlalchemy", + "SQLAlchemy text() with interpolation (SQL injection risk)", + )); + } + // SQLAlchemy execute + if let Some(m) = self.sqlalchemy_exec.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "sqlalchemy", + "SQLAlchemy execute() with f-string (SQL injection risk)", + )); + } + } + Language::JavaScript | Language::TypeScript => { + // Sequelize raw query + if let Some(m) = self.sequelize_raw.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "sequelize", + "Sequelize raw query with template literal (SQL injection risk)", + )); + } + // TypeORM where + if let Some(m) = self.typeorm_where.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "typeorm", + "TypeORM where() with template literal (SQL injection risk)", + )); + } + // Prisma queryRawUnsafe + if let Some(m) = self.prisma_raw.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "prisma", + "Prisma $queryRawUnsafe with interpolation (SQL injection risk)", + )); + } + } + Language::Go => { + // GORM Raw + if let Some(m) = self.gorm_raw.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "gorm", + "GORM Raw() with fmt.Sprintf (SQL injection risk)", + )); + } + } + _ => {} 
+ } + } + + claims + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_django_raw_fstring() { + let extractor = OrmInjectionExtractor::new(); + let content = r#" + users = User.objects.raw(f"SELECT * FROM users WHERE name = '{name}'") + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "views.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("db/orm/query")); + } + + #[test] + fn test_django_raw_format() { + let extractor = OrmInjectionExtractor::new(); + let content = r#" + users = User.objects.raw("SELECT * FROM users WHERE id = {}".format(user_id)) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "views.py"); + + // This matches the django_raw pattern (has {} and .format) + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_sqlalchemy_text() { + let extractor = OrmInjectionExtractor::new(); + let content = r#" + result = session.execute(text(f"SELECT * FROM users WHERE id = {user_id}")) + "#; + + let claims = extractor.extract(&["python".to_string()], content, Language::Python, "db.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("SQLAlchemy")); + } + + #[test] + fn test_sqlalchemy_execute_fstring() { + let extractor = OrmInjectionExtractor::new(); + let content = r#" + conn.execute(f"UPDATE users SET name = '{name}' WHERE id = {id}") + "#; + + let claims = extractor.extract(&["python".to_string()], content, Language::Python, "db.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_sequelize_raw() { + let extractor = OrmInjectionExtractor::new(); + let content = r#" + const users = await sequelize.query(`SELECT * FROM users WHERE id = ${userId}`); + "#; + + let claims = extractor.extract(&["js".to_string()], content, Language::JavaScript, "db.js"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("Sequelize")); + } + + #[test] + fn test_typeorm_where() 
{ + let extractor = OrmInjectionExtractor::new(); + let content = r#" + const user = await userRepo.createQueryBuilder("user") + .where(`user.id = ${userId}`) + .getOne(); + "#; + + let claims = + extractor.extract(&["ts".to_string()], content, Language::TypeScript, "user.ts"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_gorm_raw() { + let extractor = OrmInjectionExtractor::new(); + let content = r#" + db.Raw(fmt.Sprintf("SELECT * FROM users WHERE id = %d", userID)).Scan(&user) + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "repo.go"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("GORM")); + } + + #[test] + fn test_prisma_query_raw_unsafe() { + let extractor = OrmInjectionExtractor::new(); + let content = r#" + const users = await prisma.$queryRawUnsafe(`SELECT * FROM users WHERE id = ${userId}`); + "#; + + let claims = extractor.extract(&["ts".to_string()], content, Language::TypeScript, "db.ts"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("Prisma")); + } + + #[test] + fn test_no_false_positives_parameterized() { + let extractor = OrmInjectionExtractor::new(); + // Safe: parameterized query - no f-string, no .format(), just a ? placeholder + let content = r#" + users = User.objects.raw("SELECT * FROM users WHERE id = ?", [user_id]) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "views.py"); + + assert!(claims.is_empty()); + } + + #[test] + fn test_no_false_positives_orm_filter() { + let extractor = OrmInjectionExtractor::new(); + // Safe: using ORM filter methods + let content = r#" + users = User.objects.filter(id=user_id) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "views.py"); + + assert!(claims.is_empty()); + } +} diff --git a/applications/aphoria/src/extractors/path_traversal.rs b/applications/aphoria/src/extractors/path_traversal.rs new file mode
100644 index 0000000..95d91af --- /dev/null +++ b/applications/aphoria/src/extractors/path_traversal.rs @@ -0,0 +1,348 @@ +//! Path traversal vulnerability extractor. +//! +//! Detects patterns where file system operations use user-controlled input +//! without proper validation, which can lead to directory traversal attacks. + +use regex::Regex; +use stemedb_core::types::ObjectValue; + +use super::traits::{build_claim, Extractor}; +use crate::types::{ExtractedClaim, Language}; + +/// Extractor for path traversal vulnerabilities. +/// +/// Detects patterns indicating unsafe file path handling: +/// - File operations with user-controlled input +/// - Path.join/filepath.Join with request parameters +/// - sendFile/res.download with user input +/// - Direct traversal literals (../ or %2e%2e) +pub struct PathTraversalExtractor { + // Python patterns + python_open_user: Regex, + python_path_join: Regex, + + // JavaScript/TypeScript patterns + js_fs_user: Regex, + js_path_join: Regex, + js_sendfile: Regex, + + // Go patterns + go_filepath: Regex, + + // Rust patterns + rust_path_user: Regex, + + // Universal patterns + traversal_literal: Regex, +} + +impl Default for PathTraversalExtractor { + fn default() -> Self { + Self::new() + } +} + +impl PathTraversalExtractor { + /// Create a new path traversal extractor with compiled regexes. + /// + /// # Panics + /// Panics if any regex pattern is invalid (programmer error). 
+ #[allow(clippy::expect_used)] + pub fn new() -> Self { + Self { + // Python: file ops with user input + python_open_user: Regex::new( + r#"(?:open|read|write)\s*\([^)]*(?:request\.|params\[|input|user)"#, + ) + .expect("valid regex"), + python_path_join: Regex::new( + r#"os\.path\.join\s*\([^)]*(?:request\.|params\[|input|user)"#, + ) + .expect("valid regex"), + + // JavaScript: fs/path with user input + js_fs_user: Regex::new( + r#"fs\.(?:readFile|writeFile|createReadStream|readFileSync|writeFileSync)\s*\([^)]*(?:req\.|params\.|query\.)"#, + ) + .expect("valid regex"), + js_path_join: Regex::new( + r#"path\.(?:join|resolve)\s*\([^)]*(?:req\.|params\.|query\.)"#, + ) + .expect("valid regex"), + js_sendfile: Regex::new(r#"res\.(?:sendFile|download)\s*\([^)]*(?:req\.|params\.)"#) + .expect("valid regex"), + + // Go: filepath with user input + go_filepath: Regex::new( + r#"(?:filepath\.Join|os\.Open|os\.ReadFile|ioutil\.ReadFile)\s*\([^)]*(?:r\.|req\.|c\.)"#, + ) + .expect("valid regex"), + + // Rust: path operations with user input + rust_path_user: Regex::new( + r#"(?:Path::new|PathBuf::from|std::fs::read|std::fs::write)\s*\([^)]*(?:request|params|query|user)"#, + ) + .expect("valid regex"), + + // Universal: direct traversal literals + traversal_literal: Regex::new(r#"\.\.[\\/]|%2e%2e|%2E%2E"#).expect("valid regex"), + } + } + + fn make_claim( + path_segments: &[String], + file: &str, + line: usize, + matched: &str, + category: &str, + description: &str, + ) -> ExtractedClaim { + build_claim( + path_segments, + &["filesystem", "path", category], + "user_controlled_path", + ObjectValue::Boolean(true), + file, + line, + matched, + 0.85, + description, + ) + } +} + +impl Extractor for PathTraversalExtractor { + fn name(&self) -> &str { + "path_traversal" + } + + fn languages(&self) -> &[Language] { + &[ + Language::Python, + Language::JavaScript, + Language::TypeScript, + Language::Go, + Language::Rust, + ] + } + + fn extract( + &self, + path_segments: &[String], + 
content: &str, + language: Language, + file: &str, + ) -> Vec<ExtractedClaim> { + let mut claims = Vec::new(); + + for (line_idx, line) in content.lines().enumerate() { + let line_num = line_idx + 1; + + // Check for traversal literals in any language + if let Some(m) = self.traversal_literal.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "traversal", + "Path contains directory traversal sequence (../)", + )); + } + + // Language-specific patterns + match language { + Language::Python => { + if let Some(m) = self.python_open_user.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "file_operation", + "File operation with user-controlled path (path traversal risk)", + )); + } + if let Some(m) = self.python_path_join.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "path_construction", + "os.path.join with user input (path traversal risk)", + )); + } + } + Language::JavaScript | Language::TypeScript => { + if let Some(m) = self.js_fs_user.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "file_operation", + "fs operation with user-controlled path (path traversal risk)", + )); + } + if let Some(m) = self.js_path_join.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "path_construction", + "path.join/resolve with user input (path traversal risk)", + )); + } + if let Some(m) = self.js_sendfile.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "file_serving", + "res.sendFile with user input (path traversal risk)", + )); + } + } + Language::Go => { + if let Some(m) = self.go_filepath.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "file_operation", + "Filepath operation with user input (path traversal risk)", + )); + } + } + Language::Rust => { + if let
Some(m) = self.rust_path_user.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "file_operation", + "Path operation with user input (path traversal risk)", + )); + } + } + _ => {} + } + } + + claims + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_python_open_user_input() { + let extractor = PathTraversalExtractor::new(); + let content = r#" + file = open(request.GET['filename'], 'r') + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "app.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("filesystem/path")); + } + + #[test] + fn test_python_path_join() { + let extractor = PathTraversalExtractor::new(); + let content = r#" + path = os.path.join(base_dir, request.args['file']) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "app.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_js_fs_read() { + let extractor = PathTraversalExtractor::new(); + let content = r#" + fs.readFileSync(req.query.file); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "app.js"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_js_sendfile() { + let extractor = PathTraversalExtractor::new(); + let content = r#" + res.sendFile(req.params.filename); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "app.js"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_go_filepath() { + let extractor = PathTraversalExtractor::new(); + let content = r#" + path := filepath.Join(baseDir, r.URL.Query().Get("file")) + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "main.go"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_traversal_literal() { + let extractor = PathTraversalExtractor::new(); + let content = r#" + path := "../../../etc/passwd" + "#; 
+ + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "main.go"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("traversal sequence")); + } + + #[test] + fn test_encoded_traversal() { + let extractor = PathTraversalExtractor::new(); + let content = r#" + url := "%2e%2e%2fconfig" + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "main.go"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_no_false_positives_safe_path() { + let extractor = PathTraversalExtractor::new(); + // Safe: no user input + let content = r#" + fs.readFileSync('./config.json'); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "app.js"); + + assert!(claims.is_empty()); + } +} diff --git a/applications/aphoria/src/extractors/registry.rs b/applications/aphoria/src/extractors/registry.rs index 762d0e8..91253ec 100644 --- a/applications/aphoria/src/extractors/registry.rs +++ b/applications/aphoria/src/extractors/registry.rs @@ -13,9 +13,14 @@ use super::dep_versions::DepVersionsExtractor; use super::hardcoded_secrets::HardcodedSecretsExtractor; use super::high_entropy::HighEntropySecretsExtractor; use super::insecure_cookies::InsecureCookiesExtractor; +use super::insecure_deserialization::InsecureDeserializationExtractor; use super::jwt_config::JwtConfigExtractor; +use super::orm_injection::OrmInjectionExtractor; +use super::path_traversal::PathTraversalExtractor; use super::rate_limit::RateLimitExtractor; +use super::security_headers::SecurityHeadersExtractor; use super::sql_injection::SqlInjectionExtractor; +use super::ssrf::SsrfExtractor; use super::timeout_config::{TimeoutConfigExtractor, TimeoutThresholds}; use super::tls_verify::TlsVerifyExtractor; use super::tls_version::TlsVersionExtractor; @@ -23,7 +28,10 @@ use super::traits::Extractor; use super::unreal_config::UnrealConfigExtractor; use super::unreal_cpp::UnrealCppExtractor; use 
super::unreal_performance::UnrealPerformanceExtractor; +use super::unvalidated_redirects::UnvalidatedRedirectsExtractor; use super::weak_crypto::WeakCryptoExtractor; +use super::weak_password::WeakPasswordExtractor; +use super::xxe::XxeExtractor; /// Registry of available extractors. pub struct ExtractorRegistry { @@ -116,6 +124,31 @@ impl ExtractorRegistry { if is_enabled("insecure_cookies") { extractors.push(Box::new(InsecureCookiesExtractor::new())); } + // Phase 8: Enterprise security extractors + if is_enabled("path_traversal") { + extractors.push(Box::new(PathTraversalExtractor::new())); + } + if is_enabled("unvalidated_redirects") { + extractors.push(Box::new(UnvalidatedRedirectsExtractor::new())); + } + if is_enabled("weak_password") { + extractors.push(Box::new(WeakPasswordExtractor::new())); + } + if is_enabled("security_headers") { + extractors.push(Box::new(SecurityHeadersExtractor::new())); + } + if is_enabled("insecure_deserialization") { + extractors.push(Box::new(InsecureDeserializationExtractor::new())); + } + if is_enabled("ssrf") { + extractors.push(Box::new(SsrfExtractor::new())); + } + if is_enabled("orm_injection") { + extractors.push(Box::new(OrmInjectionExtractor::new())); + } + if is_enabled("xxe") { + extractors.push(Box::new(XxeExtractor::new())); + } // Register declarative extractors from config // Declarative extractors are always enabled unless explicitly disabled. @@ -199,7 +232,7 @@ mod tests { use crate::extractors::declarative::{DeclarativeClaimDef, DeclarativeValue}; /// Number of built-in extractors (not counting declarative). - const BUILTIN_EXTRACTOR_COUNT: usize = 17; + const BUILTIN_EXTRACTOR_COUNT: usize = 25; #[test] fn test_registry_creation() { diff --git a/applications/aphoria/src/extractors/security_headers.rs b/applications/aphoria/src/extractors/security_headers.rs new file mode 100644 index 0000000..caec804 --- /dev/null +++ b/applications/aphoria/src/extractors/security_headers.rs @@ -0,0 +1,359 @@ +//! 
Missing security headers extractor. +//! +//! Detects patterns where security headers are explicitly disabled or +//! configured insecurely. + +use regex::Regex; +use stemedb_core::types::ObjectValue; + +use super::traits::{build_claim, Extractor}; +use crate::types::{ExtractedClaim, Language}; + +/// Extractor for missing or disabled security headers. +/// +/// Detects patterns indicating insecure header configurations: +/// - X-Frame-Options disabled or set to ALLOWALL +/// - X-Content-Type-Options disabled +/// - X-XSS-Protection disabled +/// - HSTS disabled +/// - Content-Security-Policy disabled +pub struct SecurityHeadersExtractor { + // Explicit header disabled + header_disabled: Regex, + + // Django missing secure settings + django_missing: Regex, + + // YAML headers disabled + yaml_disabled: Regex, + + // Frame options ALLOWALL + frame_allowall: Regex, + + // CSP disabled or unsafe + csp_unsafe: Regex, + + // HSTS disabled + hsts_disabled: Regex, +} + +impl Default for SecurityHeadersExtractor { + fn default() -> Self { + Self::new() + } +} + +impl SecurityHeadersExtractor { + /// Create a new security headers extractor with compiled regexes. + /// + /// # Panics + /// Panics if any regex pattern is invalid (programmer error). 
+ #[allow(clippy::expect_used)] + pub fn new() -> Self { + Self { + // Explicit header disabled in various formats + header_disabled: Regex::new( + r#"(?i)(?:X-Frame-Options|X-Content-Type-Options|X-XSS-Protection)\s*[:=]\s*["']?(?:none|disabled?|false|off)["']?"#, + ) + .expect("valid regex"), + + // Django missing secure settings + django_missing: Regex::new( + r#"(?i)SECURE_(?:BROWSER_XSS_FILTER|CONTENT_TYPE_NOSNIFF|HSTS_SECONDS|SSL_REDIRECT)\s*=\s*(?:False|0)"#, + ) + .expect("valid regex"), + + // YAML headers disabled + yaml_disabled: Regex::new( + r#"(?i)(?:x_frame_options|xss_protection|content_type_nosniff|hsts)\s*:\s*(?:false|no|disabled?|off)"#, + ) + .expect("valid regex"), + + // Frame options ALLOWALL (dangerous) + frame_allowall: Regex::new(r#"(?i)X-Frame-Options\s*[:=]\s*["']?ALLOWALL"#) + .expect("valid regex"), + + // CSP disabled or using unsafe-inline/unsafe-eval + csp_unsafe: Regex::new( + r#"(?i)(?:Content-Security-Policy|CSP)\s*[:=]\s*["']?(?:none|disabled?|.*unsafe-(?:inline|eval))"#, + ) + .expect("valid regex"), + + // HSTS disabled or set to 0 + hsts_disabled: Regex::new( + r#"(?i)(?:Strict-Transport-Security|HSTS|hsts_seconds)\s*[:=]\s*(?:["']?(?:none|disabled?|false|off)["']?|0)"#, + ) + .expect("valid regex"), + } + } + + fn make_claim( + path_segments: &[String], + file: &str, + line: usize, + matched: &str, + header: &str, + description: &str, + ) -> ExtractedClaim { + build_claim( + path_segments, + &["http", "security_headers", header], + "header_status", + ObjectValue::Text("disabled".to_string()), + file, + line, + matched, + 0.8, + description, + ) + } +} + +impl Extractor for SecurityHeadersExtractor { + fn name(&self) -> &str { + "security_headers" + } + + fn languages(&self) -> &[Language] { + &[ + Language::Python, + Language::JavaScript, + Language::TypeScript, + Language::Go, + Language::Yaml, + Language::Json, + Language::Toml, + ] + } + + fn extract( + &self, + path_segments: &[String], + content: &str, + _language: 
Language, + file: &str, + ) -> Vec<ExtractedClaim> { + let mut claims = Vec::new(); + + for (line_idx, line) in content.lines().enumerate() { + let line_num = line_idx + 1; + + // Check for explicitly disabled headers + if let Some(m) = self.header_disabled.find(line) { + let header = if line.to_lowercase().contains("frame") { + "x_frame_options" + } else if line.to_lowercase().contains("content-type") { + "x_content_type_options" + } else { + "x_xss_protection" + }; + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + header, + "Security header explicitly disabled", + )); + } + + // Check Django secure settings + if let Some(m) = self.django_missing.find(line) { + let header = if line.to_lowercase().contains("xss") { + "x_xss_protection" + } else if line.to_lowercase().contains("nosniff") { + "x_content_type_options" + } else if line.to_lowercase().contains("hsts") { + "hsts" + } else { + "ssl_redirect" + }; + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + header, + "Django security setting disabled", + )); + } + + // Check YAML disabled patterns + if let Some(m) = self.yaml_disabled.find(line) { + let header = if line.to_lowercase().contains("frame") { + "x_frame_options" + } else if line.to_lowercase().contains("xss") { + "x_xss_protection" + } else if line.to_lowercase().contains("nosniff") { + "x_content_type_options" + } else { + "hsts" + }; + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + header, + "Security header disabled in configuration", + )); + } + + // Check for dangerous X-Frame-Options ALLOWALL + if let Some(m) = self.frame_allowall.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "x_frame_options", + "X-Frame-Options set to ALLOWALL (clickjacking risk)", + )); + } + + // Check for CSP unsafe patterns + if let Some(m) = self.csp_unsafe.find(line) { + claims.push(Self::make_claim( + path_segments, + file, +
line_num, + m.as_str(), + "content_security_policy", + "Content-Security-Policy disabled or uses unsafe directives", + )); + } + + // Check for HSTS disabled + if let Some(m) = self.hsts_disabled.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "hsts", + "Strict-Transport-Security (HSTS) disabled", + )); + } + } + + claims + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_header_disabled() { + let extractor = SecurityHeadersExtractor::new(); + let content = r#" + X-Frame-Options: none + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Yaml, "nginx.conf"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("security_headers")); + } + + #[test] + fn test_django_missing() { + let extractor = SecurityHeadersExtractor::new(); + let content = r#" + SECURE_BROWSER_XSS_FILTER = False + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "settings.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("Django")); + } + + #[test] + fn test_yaml_disabled() { + let extractor = SecurityHeadersExtractor::new(); + let content = r#" + x_frame_options: false + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Yaml, "config.yaml"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_frame_allowall() { + let extractor = SecurityHeadersExtractor::new(); + let content = r#" + X-Frame-Options = "ALLOWALL" + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Toml, "config.toml"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("clickjacking")); + } + + #[test] + fn test_csp_unsafe() { + let extractor = SecurityHeadersExtractor::new(); + let content = r#" + Content-Security-Policy: script-src 'unsafe-inline' + "#; + + let claims = + extractor.extract(&["config".to_string()], content, 
Language::Yaml, "config.yaml"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("content_security_policy")); + } + + #[test] + fn test_hsts_disabled() { + let extractor = SecurityHeadersExtractor::new(); + let content = r#" + Strict-Transport-Security: none + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Yaml, "config.yaml"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("hsts")); + } + + #[test] + fn test_hsts_zero() { + let extractor = SecurityHeadersExtractor::new(); + let content = r#" + HSTS_SECONDS = 0 + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "settings.py"); + + // Should detect hsts_disabled pattern (HSTS = 0) + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_no_false_positives_enabled() { + let extractor = SecurityHeadersExtractor::new(); + // Safe: headers enabled + let content = r#" + X-Frame-Options: SAMEORIGIN + X-Content-Type-Options: nosniff + SECURE_BROWSER_XSS_FILTER = True + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Yaml, "config.yaml"); + + assert!(claims.is_empty()); + } +} diff --git a/applications/aphoria/src/extractors/ssrf.rs b/applications/aphoria/src/extractors/ssrf.rs new file mode 100644 index 0000000..738c7c6 --- /dev/null +++ b/applications/aphoria/src/extractors/ssrf.rs @@ -0,0 +1,393 @@ +//! Server-Side Request Forgery (SSRF) vulnerability extractor. +//! +//! Detects patterns where HTTP requests are made with user-controlled URLs, +//! which can allow attackers to access internal services or exfiltrate data. + +use regex::Regex; +use stemedb_core::types::ObjectValue; + +use super::traits::{build_claim, Extractor}; +use crate::types::{ExtractedClaim, Language}; + +/// Extractor for SSRF vulnerabilities. 
+/// +/// Detects patterns indicating unsafe URL handling in HTTP requests: +/// - HTTP clients (requests, fetch, axios, reqwest) with user-controlled URLs +/// - Webhook/callback URLs from user input +/// - Image/proxy URLs from request parameters +pub struct SsrfExtractor { + // Python patterns + python_requests: Regex, + python_urllib: Regex, + python_httpx: Regex, + + // JavaScript/TypeScript patterns + js_fetch: Regex, + js_axios: Regex, + js_got: Regex, + + // Go patterns + go_http: Regex, + + // Rust patterns + rust_reqwest: Regex, + + // Common sink patterns (all languages) + ssrf_sink: Regex, +} + +impl Default for SsrfExtractor { + fn default() -> Self { + Self::new() + } +} + +impl SsrfExtractor { + /// Create a new SSRF extractor with compiled regexes. + /// + /// # Panics + /// Panics if any regex pattern is invalid (programmer error). + #[allow(clippy::expect_used)] + pub fn new() -> Self { + Self { + // Python: requests with user URL + python_requests: Regex::new( + r#"requests\.(?:get|post|put|delete|head|patch|request)\s*\(\s*(?:url|uri|target|endpoint|request\.)"#, + ) + .expect("valid regex"), + python_urllib: Regex::new( + r#"urllib\.request\.(?:urlopen|Request)\s*\(\s*(?:url|uri|request\.)"#, + ) + .expect("valid regex"), + python_httpx: Regex::new( + r#"httpx\.(?:get|post|put|delete|AsyncClient)\s*\([^)]*(?:url|uri|request\.)"#, + ) + .expect("valid regex"), + + // JavaScript: fetch with user URL + js_fetch: Regex::new( + r#"fetch\s*\(\s*(?:url|uri|target|endpoint|req\.(?:query|params|body)\.)"#, + ) + .expect("valid regex"), + js_axios: Regex::new( + r#"axios\.(?:get|post|put|delete|request)\s*\(\s*(?:url|uri|target|endpoint)"#, + ) + .expect("valid regex"), + js_got: Regex::new( + r#"got\s*\(\s*(?:url|uri|target|endpoint)"#, + ) + .expect("valid regex"), + + // Go: http.Get with user URL + go_http: Regex::new( + r#"http\.(?:Get|Post|Head|PostForm|NewRequest)\s*\(\s*(?:url|uri|target|endpoint|r\.)"#, + ) + .expect("valid regex"), + + // Rust: 
reqwest with user URL + rust_reqwest: Regex::new( + r#"reqwest::(?:get|Client).*\(\s*(?:url|&url|format!|user)"#, + ) + .expect("valid regex"), + + // Common sink patterns - URLs that look user-controlled + ssrf_sink: Regex::new( + r#"(?i)(?:proxy_url|image_url|webhook_url|callback_url|redirect_url|target_url|remote_url|external_url)\s*[:=]\s*(?:request\.|params\.|req\.Query)"#, + ) + .expect("valid regex"), + } + } + + fn make_claim( + path_segments: &[String], + file: &str, + line: usize, + matched: &str, + category: &str, + description: &str, + ) -> ExtractedClaim { + build_claim( + path_segments, + &["network", "ssrf", category], + "request_url_source", + ObjectValue::Text("user_controlled".to_string()), + file, + line, + matched, + 0.85, + description, + ) + } +} + +impl Extractor for SsrfExtractor { + fn name(&self) -> &str { + "ssrf" + } + + fn languages(&self) -> &[Language] { + &[ + Language::Python, + Language::JavaScript, + Language::TypeScript, + Language::Go, + Language::Rust, + ] + } + + fn extract( + &self, + path_segments: &[String], + content: &str, + language: Language, + file: &str, + ) -> Vec<ExtractedClaim> { + let mut claims = Vec::new(); + + for (line_idx, line) in content.lines().enumerate() { + let line_num = line_idx + 1; + + // Check for common sink patterns (all languages) + if let Some(m) = self.ssrf_sink.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "url_sink", + "URL variable populated from user input (SSRF risk)", + )); + } + + // Language-specific patterns + match language { + Language::Python => { + if let Some(m) = self.python_requests.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "http_client", + "requests library with user-controlled URL (SSRF risk)", + )); + } + if let Some(m) = self.python_urllib.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "http_client", + "urllib with user-controlled 
URL (SSRF risk)", + )); + } + if let Some(m) = self.python_httpx.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "http_client", + "httpx with user-controlled URL (SSRF risk)", + )); + } + } + Language::JavaScript | Language::TypeScript => { + if let Some(m) = self.js_fetch.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "http_client", + "fetch with user-controlled URL (SSRF risk)", + )); + } + if let Some(m) = self.js_axios.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "http_client", + "axios with user-controlled URL (SSRF risk)", + )); + } + if let Some(m) = self.js_got.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "http_client", + "got with user-controlled URL (SSRF risk)", + )); + } + } + Language::Go => { + if let Some(m) = self.go_http.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "http_client", + "http.Get/Post with user-controlled URL (SSRF risk)", + )); + } + } + Language::Rust => { + if let Some(m) = self.rust_reqwest.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "http_client", + "reqwest with user-controlled URL (SSRF risk)", + )); + } + } + _ => {} + } + } + + claims + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_python_requests() { + let extractor = SsrfExtractor::new(); + let content = r#" + response = requests.get(url) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "api.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("network/ssrf")); + } + + #[test] + fn test_python_requests_user_input() { + let extractor = SsrfExtractor::new(); + let content = r#" + response = requests.post(request.json['webhook_url'], data=payload) + "#; + + let claims = + 
extractor.extract(&["python".to_string()], content, Language::Python, "webhook.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_python_urllib() { + let extractor = SsrfExtractor::new(); + let content = r#" + data = urllib.request.urlopen(url).read() + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "fetch.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_js_fetch() { + let extractor = SsrfExtractor::new(); + let content = r#" + const data = await fetch(url); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "api.js"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_js_axios() { + let extractor = SsrfExtractor::new(); + let content = r#" + const response = await axios.get(url); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "api.js"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_go_http() { + let extractor = SsrfExtractor::new(); + let content = r#" + resp, err := http.Get(url) + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "client.go"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_rust_reqwest() { + let extractor = SsrfExtractor::new(); + let content = r#" + let body = reqwest::get(url).await?.text().await?; + "#; + + let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "client.rs"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_ssrf_sink_pattern() { + let extractor = SsrfExtractor::new(); + let content = r#" + proxy_url = req.Query("proxy") + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "proxy.go"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("URL variable")); + } + + #[test] + fn test_webhook_url_sink() { + let extractor = SsrfExtractor::new(); + let content = r#" + webhook_url = request.json.get('callback') + "#; + 
+ let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "webhook.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_no_false_positives_hardcoded() { + let extractor = SsrfExtractor::new(); + // Safe: hardcoded URL + let content = r#" + response = requests.get("https://api.example.com/data") + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "api.py"); + + assert!(claims.is_empty()); + } +} diff --git a/applications/aphoria/src/extractors/tls_version.rs b/applications/aphoria/src/extractors/tls_version.rs index 492c0b1..f4e28d7 100644 --- a/applications/aphoria/src/extractors/tls_version.rs +++ b/applications/aphoria/src/extractors/tls_version.rs @@ -37,6 +37,18 @@ pub struct TlsVersionExtractor { // Config file patterns (YAML, TOML, JSON) config_min_version: Regex, + + // Semantic patterns (variable name suggests TLS + value is deprecated) + semantic_tls_version: Regex, + + // Environment variable patterns + env_tls_version: Regex, + + // Terraform HCL patterns + terraform_tls: Regex, + + // Kubernetes camelCase patterns (YAML) + k8s_tls: Regex, } impl Default for TlsVersionExtractor { @@ -89,6 +101,40 @@ impl TlsVersionExtractor { r#"(?i)(?:min_version|tls_min_version|minimum_tls_version|ssl_min_version)\s*[:=]\s*["']?(?:1\.[01]|TLSv?1(?:\.[01])?|SSL(?:v?3)?|TLS10|TLS11)["']?"#, ) .expect("valid regex"), + + // Semantic: variable name contains tls/ssl and min/version in any order + // Covers common patterns: TLS_MIN_VERSION, MIN_TLS_VERSION, SSL_VERSION, sslVersion, tlsMinVersion + // Pattern 1: (tls|ssl) followed by min/version (e.g., TLS_MIN_VERSION, ssl_version) + // Pattern 2: (min|version) followed by tls/ssl (e.g., MIN_TLS, VERSION_SSL) + // Pattern 3: camelCase (e.g., sslVersion, tlsMinVersion, minTlsVersion) + // Value must be terminated by quote, end of word, or end of line to avoid matching "TLS1_2" as "TLS1" + // Allow optional type annotations (e.g., Rust `: &str`) 
between name and value + semantic_tls_version: Regex::new( + r#"(?i)\b(?:(?:tls|ssl)[_A-Z]*(?:min(?:imum)?|version)|(?:min(?:imum)?|version)[_A-Z]*(?:tls|ssl)|(?:tls|ssl)[A-Z][a-z]*(?:Version|Min)|(?:min|version)[A-Z][a-z]*(?:Tls|Ssl))\w*(?:\s*:\s*[^=]+)?\s*=\s*["']?(1\.[01]|TLSv?1(?:\.[01])?|SSL(?:v?[23])?)(?:["'\s;]|$)"#, + ) + .expect("valid regex"), + + // Environment variables (NAME=value, with optional export) + // Catches: TLS_MIN_VERSION=1.0, export SSL_VERSION=TLSv1 + env_tls_version: Regex::new( + r#"(?i)^(?:export\s+)?(\w*(?:tls|ssl)\w*(?:_?(?:min|version))+\w*)\s*=\s*["']?(1\.[01]|TLSv?1(?:\.[01])?|SSL(?:v?[23])?)["']?\s*$"#, + ) + .expect("valid regex"), + + // Terraform HCL patterns + // Catches: min_tls_version = "TLS1_0", ssl_minimum_protocol_version = "TLSv1" + // The value must be a complete deprecated version - use word boundary or quote/end + terraform_tls: Regex::new( + r#"(?i)(?:min(?:imum)?_)?(?:tls|ssl)(?:_(?:protocol_)?)?version\s*=\s*["']?(?:TLS_?1_?[01]|TLSv1(?:\.[01])?|1\.[01]|SSL(?:v?[23])?)(?:["'\s]|$)"#, + ) + .expect("valid regex"), + + // Kubernetes camelCase patterns (YAML) + // Catches: minTLSVersion: VersionTLS10, tlsMinVersion: "1.0" + k8s_tls: Regex::new( + r#"(?i)(?:min)?(?:tls|ssl)(?:Min)?(?:Version|Protocol)\s*:\s*["']?(?:VersionTLS1[01]|VersionSSL30|TLSv?1(?:\.[01])?|1\.[01]|SSL(?:v?[23])?)["']?"#, + ) + .expect("valid regex"), } } @@ -141,6 +187,8 @@ impl Extractor for TlsVersionExtractor { Language::Yaml, Language::Toml, Language::Json, + Language::Terraform, + Language::Dotenv, ] } @@ -267,214 +315,56 @@ impl Extractor for TlsVersionExtractor { "deprecated", "Deprecated TLS version in configuration (RFC 8996)", )); + // Kubernetes camelCase patterns for YAML + if language == Language::Yaml { + claims.extend(self.check_pattern( + content, + &self.k8s_tls, + path_segments, + file, + "deprecated", + "Kubernetes minTLSVersion set to deprecated value (RFC 8996)", + )); + } + } + Language::Terraform => { + 
claims.extend(self.check_pattern( + content, + &self.terraform_tls, + path_segments, + file, + "deprecated", + "Terraform TLS configuration uses deprecated version (RFC 8996)", + )); + } + Language::Dotenv => { + claims.extend(self.check_pattern( + content, + &self.env_tls_version, + path_segments, + file, + "deprecated", + "Environment variable sets deprecated TLS version (RFC 8996)", + )); } _ => {} } + // Apply semantic pattern to all languages (cross-language) + // This catches patterns like: const TLS_MIN_VERSION = "1.0" + claims.extend(self.check_pattern( + content, + &self.semantic_tls_version, + path_segments, + file, + "deprecated", + "Semantic: Variable name suggests TLS version, value is deprecated (RFC 8996)", + )); + claims } } #[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_rust_tls10_detection() { - let extractor = TlsVersionExtractor::new(); - // Content with TLS10 on two lines - should find 2 matches - let content = r#" - use rustls::version::TLS10; - config.min_protocol_version = Some(TLS10); - "#; - - let claims = - extractor.extract(&["rust".to_string()], content, Language::Rust, "src/tls.rs"); - - // Both lines match TLS10 pattern - assert_eq!(claims.len(), 2); - assert!(claims.iter().all(|c| c.value == ObjectValue::Text("1.0".to_string()))); - } - - #[test] - fn test_rust_tls11_detection() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - let version = TLS1_1; - "#; - - let claims = - extractor.extract(&["rust".to_string()], content, Language::Rust, "src/tls.rs"); - - assert_eq!(claims.len(), 1); - assert_eq!(claims[0].value, ObjectValue::Text("1.1".to_string())); - } - - #[test] - fn test_go_version_tls10_detection() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - cfg := &tls.Config{ - MinVersion: tls.VersionTLS10, - } - "#; - - let claims = extractor.extract(&["go".to_string()], content, Language::Go, "server.go"); - - assert_eq!(claims.len(), 1); - assert_eq!(claims[0].value, 
ObjectValue::Text("1.0".to_string())); - } - - #[test] - fn test_go_version_tls11_detection() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - cfg := &tls.Config{ - MinVersion: tls.VersionTLS11, - } - "#; - - let claims = extractor.extract(&["go".to_string()], content, Language::Go, "server.go"); - - assert_eq!(claims.len(), 1); - assert_eq!(claims[0].value, ObjectValue::Text("1.1".to_string())); - } - - #[test] - fn test_python_tls_version_detection() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - import ssl - ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1) - ctx.minimum_version = ssl.TLSVersion.TLSv1_1 - "#; - - let claims = - extractor.extract(&["python".to_string()], content, Language::Python, "server.py"); - - // Should detect both TLSv1 and TLSv1_1 - assert_eq!(claims.len(), 2); - } - - #[test] - fn test_js_min_version_detection() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - const server = https.createServer({ - minVersion: 'TLSv1', - key: fs.readFileSync('key.pem') - }); - "#; - - let claims = - extractor.extract(&["js".to_string()], content, Language::JavaScript, "server.js"); - - assert_eq!(claims.len(), 1); - assert_eq!(claims[0].value, ObjectValue::Text("1.0".to_string())); - } - - #[test] - fn test_js_secure_protocol_detection() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - const options = { - secureProtocol: 'TLSv1_method' - }; - "#; - - let claims = - extractor.extract(&["js".to_string()], content, Language::JavaScript, "client.js"); - - assert_eq!(claims.len(), 1); - } - - #[test] - fn test_yaml_min_version_detection() { - let extractor = TlsVersionExtractor::new(); - let content = r#" -tls: - min_version: "1.0" - cert_file: /etc/certs/server.crt -"#; - - let claims = extractor.extract( - &["config".to_string()], - content, - Language::Yaml, - "config/production.yaml", - ); - - assert_eq!(claims.len(), 1); - assert_eq!(claims[0].value, 
ObjectValue::Text("deprecated".to_string())); - } - - #[test] - fn test_yaml_tls_min_version_detection() { - let extractor = TlsVersionExtractor::new(); - let content = r#" -server: - tls_min_version: TLSv1.1 -"#; - - let claims = - extractor.extract(&["config".to_string()], content, Language::Yaml, "config.yaml"); - - assert_eq!(claims.len(), 1); - } - - #[test] - fn test_no_false_positives_tls12() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - cfg := &tls.Config{ - MinVersion: tls.VersionTLS12, - } - "#; - - let claims = extractor.extract(&["go".to_string()], content, Language::Go, "server.go"); - - assert!(claims.is_empty()); - } - - #[test] - fn test_no_false_positives_tls13() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - use rustls::version::TLS13; - config.min_protocol_version = Some(TLS13); - "#; - - let claims = - extractor.extract(&["rust".to_string()], content, Language::Rust, "src/tls.rs"); - - assert!(claims.is_empty()); - } - - #[test] - fn test_no_false_positives_modern_config() { - let extractor = TlsVersionExtractor::new(); - let content = r#" -tls: - min_version: "1.2" - max_version: "1.3" -"#; - - let claims = - extractor.extract(&["config".to_string()], content, Language::Yaml, "config.yaml"); - - assert!(claims.is_empty()); - } - - #[test] - fn test_concept_path_structure() { - let extractor = TlsVersionExtractor::new(); - let content = r#" - cfg := &tls.Config{MinVersion: tls.VersionTLS10} - "#; - - let claims = extractor.extract(&["go".to_string()], content, Language::Go, "server.go"); - - assert_eq!(claims.len(), 1); - assert!(claims[0].concept_path.contains("tls/min_version")); - } -} +#[path = "tls_version_tests.rs"] +mod tests; diff --git a/applications/aphoria/src/extractors/tls_version_tests.rs b/applications/aphoria/src/extractors/tls_version_tests.rs new file mode 100644 index 0000000..4098a7b --- /dev/null +++ b/applications/aphoria/src/extractors/tls_version_tests.rs @@ -0,0 +1,362 @@ 
+//! Tests for TLS version extractor. + +use super::tls_version::TlsVersionExtractor; +use super::Extractor; +use crate::types::Language; +use stemedb_core::types::ObjectValue; + +#[test] +fn test_rust_tls10_detection() { + let extractor = TlsVersionExtractor::new(); + // Content with TLS10 on two lines - should find 2 matches + let content = r#" + use rustls::version::TLS10; + config.min_protocol_version = Some(TLS10); + "#; + + let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "src/tls.rs"); + + // Both lines match TLS10 pattern + assert_eq!(claims.len(), 2); + assert!(claims.iter().all(|c| c.value == ObjectValue::Text("1.0".to_string()))); +} + +#[test] +fn test_rust_tls11_detection() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + let version = TLS1_1; + "#; + + let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "src/tls.rs"); + + assert_eq!(claims.len(), 1); + assert_eq!(claims[0].value, ObjectValue::Text("1.1".to_string())); +} + +#[test] +fn test_go_version_tls10_detection() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + cfg := &tls.Config{ + MinVersion: tls.VersionTLS10, + } + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "server.go"); + + assert_eq!(claims.len(), 1); + assert_eq!(claims[0].value, ObjectValue::Text("1.0".to_string())); +} + +#[test] +fn test_go_version_tls11_detection() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + cfg := &tls.Config{ + MinVersion: tls.VersionTLS11, + } + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "server.go"); + + assert_eq!(claims.len(), 1); + assert_eq!(claims[0].value, ObjectValue::Text("1.1".to_string())); +} + +#[test] +fn test_python_tls_version_detection() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + import ssl + ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1) + ctx.minimum_version = 
ssl.TLSVersion.TLSv1_1 + "#; + + let claims = extractor.extract(&["python".to_string()], content, Language::Python, "server.py"); + + // Should detect both TLSv1 and TLSv1_1 + assert_eq!(claims.len(), 2); +} + +#[test] +fn test_js_min_version_detection() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + const server = https.createServer({ + minVersion: 'TLSv1', + key: fs.readFileSync('key.pem') + }); + "#; + + let claims = extractor.extract(&["js".to_string()], content, Language::JavaScript, "server.js"); + + assert_eq!(claims.len(), 1); + assert_eq!(claims[0].value, ObjectValue::Text("1.0".to_string())); +} + +#[test] +fn test_js_secure_protocol_detection() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + const options = { + secureProtocol: 'TLSv1_method' + }; + "#; + + let claims = extractor.extract(&["js".to_string()], content, Language::JavaScript, "client.js"); + + assert_eq!(claims.len(), 1); +} + +#[test] +fn test_yaml_min_version_detection() { + let extractor = TlsVersionExtractor::new(); + let content = r#" +tls: + min_version: "1.0" + cert_file: /etc/certs/server.crt +"#; + + let claims = extractor.extract( + &["config".to_string()], + content, + Language::Yaml, + "config/production.yaml", + ); + + assert_eq!(claims.len(), 1); + assert_eq!(claims[0].value, ObjectValue::Text("deprecated".to_string())); +} + +#[test] +fn test_yaml_tls_min_version_detection() { + let extractor = TlsVersionExtractor::new(); + let content = r#" +server: + tls_min_version: TLSv1.1 +"#; + + let claims = extractor.extract(&["config".to_string()], content, Language::Yaml, "config.yaml"); + + // May match both config pattern and semantic pattern + assert!(!claims.is_empty()); + assert!(claims.iter().any(|c| c.value == ObjectValue::Text("deprecated".to_string()))); +} + +#[test] +fn test_no_false_positives_tls12() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + cfg := &tls.Config{ + MinVersion: tls.VersionTLS12, + } + 
"#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "server.go"); + + assert!(claims.is_empty()); +} + +#[test] +fn test_no_false_positives_tls13() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + use rustls::version::TLS13; + config.min_protocol_version = Some(TLS13); + "#; + + let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "src/tls.rs"); + + assert!(claims.is_empty()); +} + +#[test] +fn test_no_false_positives_modern_config() { + let extractor = TlsVersionExtractor::new(); + let content = r#" +tls: + min_version: "1.2" + max_version: "1.3" +"#; + + let claims = extractor.extract(&["config".to_string()], content, Language::Yaml, "config.yaml"); + + assert!(claims.is_empty()); +} + +#[test] +fn test_concept_path_structure() { + let extractor = TlsVersionExtractor::new(); + let content = r#" + cfg := &tls.Config{MinVersion: tls.VersionTLS10} + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "server.go"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("tls/min_version")); +} + +// ===== Phase 8.4: Semantic TLS Version Detection Tests ===== + +#[test] +fn test_semantic_const_rust() { + let extractor = TlsVersionExtractor::new(); + let content = r#"const TLS_MIN_VERSION: &str = "1.0";"#; + + let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config.rs"); + + assert!(!claims.is_empty()); + assert!(claims.iter().any(|c| c.description.contains("Semantic"))); +} + +#[test] +fn test_semantic_let_js() { + let extractor = TlsVersionExtractor::new(); + let content = r#"let sslVersion = "TLSv1";"#; + + let claims = extractor.extract(&["js".to_string()], content, Language::JavaScript, "config.js"); + + assert!(!claims.is_empty()); + assert!(claims.iter().any(|c| c.description.contains("Semantic"))); +} + +#[test] +fn test_semantic_assignment_python() { + let extractor = TlsVersionExtractor::new(); + 
let content = r#"TLS_MINIMUM_VERSION = "1.1""#; + + let claims = extractor.extract(&["python".to_string()], content, Language::Python, "config.py"); + + assert!(!claims.is_empty()); + assert!(claims.iter().any(|c| c.description.contains("Semantic"))); +} + +#[test] +fn test_semantic_ssl_version() { + let extractor = TlsVersionExtractor::new(); + let content = r#"const SSL_VERSION = "SSLv3";"#; + + let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "src/ssl.rs"); + + assert!(!claims.is_empty()); +} + +#[test] +fn test_env_tls_version() { + let extractor = TlsVersionExtractor::new(); + let content = "TLS_MIN_VERSION=1.0"; + + let claims = extractor.extract(&["env".to_string()], content, Language::Dotenv, ".env"); + + assert!(!claims.is_empty()); + assert!(claims.iter().any(|c| c.description.contains("Environment variable"))); +} + +#[test] +fn test_env_export_ssl() { + let extractor = TlsVersionExtractor::new(); + let content = "export SSL_VERSION=TLSv1"; + + let claims = + extractor.extract(&["env".to_string()], content, Language::Dotenv, ".env.production"); + + assert!(!claims.is_empty()); +} + +#[test] +fn test_terraform_min_tls_version() { + let extractor = TlsVersionExtractor::new(); + let content = r#"min_tls_version = "TLS1_0""#; + + let claims = + extractor.extract(&["terraform".to_string()], content, Language::Terraform, "main.tf"); + + assert!(!claims.is_empty()); + assert!(claims.iter().any(|c| c.description.contains("Terraform"))); +} + +#[test] +fn test_terraform_ssl_version() { + let extractor = TlsVersionExtractor::new(); + let content = r#"ssl_minimum_protocol_version = "TLSv1""#; + + let claims = + extractor.extract(&["terraform".to_string()], content, Language::Terraform, "variables.tf"); + + assert!(!claims.is_empty()); +} + +#[test] +fn test_k8s_min_tls_version() { + let extractor = TlsVersionExtractor::new(); + let content = r#" +apiVersion: v1 +kind: Config +spec: + minTLSVersion: VersionTLS10 +"#; + + let claims = + 
extractor.extract(&["k8s".to_string()], content, Language::Yaml, "deployment.yaml"); + + assert!(!claims.is_empty()); + assert!(claims.iter().any(|c| c.description.contains("Kubernetes"))); +} + +#[test] +fn test_k8s_tls_min_version_camel() { + let extractor = TlsVersionExtractor::new(); + let content = r#"tlsMinVersion: "1.0""#; + + let claims = extractor.extract(&["k8s".to_string()], content, Language::Yaml, "ingress.yaml"); + + assert!(!claims.is_empty()); +} + +#[test] +fn test_no_false_positive_semantic_tls12() { + let extractor = TlsVersionExtractor::new(); + let content = r#"const TLS_MIN_VERSION = "1.2";"#; + + let claims = extractor.extract(&["rust".to_string()], content, Language::Rust, "src/config.rs"); + + // TLS 1.2 is safe, should not match semantic pattern + assert!(claims.is_empty()); +} + +#[test] +fn test_no_false_positive_env_tls13() { + let extractor = TlsVersionExtractor::new(); + let content = "TLS_VERSION=1.3"; + + let claims = extractor.extract(&["env".to_string()], content, Language::Dotenv, ".env"); + + // TLS 1.3 is safe, should not match + assert!(claims.is_empty()); +} + +#[test] +fn test_no_false_positive_terraform_tls12() { + let extractor = TlsVersionExtractor::new(); + let content = r#"min_tls_version = "TLS1_2""#; + + let claims = + extractor.extract(&["terraform".to_string()], content, Language::Terraform, "main.tf"); + + // TLS 1.2 is safe + assert!(claims.is_empty()); +} + +#[test] +fn test_no_false_positive_k8s_tls12() { + let extractor = TlsVersionExtractor::new(); + let content = r#"minTLSVersion: VersionTLS12"#; + + let claims = + extractor.extract(&["k8s".to_string()], content, Language::Yaml, "deployment.yaml"); + + // TLS 1.2 is safe + assert!(claims.is_empty()); +} diff --git a/applications/aphoria/src/extractors/unvalidated_redirects.rs b/applications/aphoria/src/extractors/unvalidated_redirects.rs new file mode 100644 index 0000000..7e864f8 --- /dev/null +++ 
b/applications/aphoria/src/extractors/unvalidated_redirects.rs @@ -0,0 +1,301 @@ +//! Unvalidated redirects vulnerability extractor. +//! +//! Detects patterns where HTTP redirects use user-controlled input +//! without proper validation, which can lead to open redirect attacks. + +use regex::Regex; +use stemedb_core::types::ObjectValue; + +use super::traits::{build_claim, Extractor}; +use crate::types::{ExtractedClaim, Language}; + +/// Extractor for unvalidated redirect vulnerabilities. +/// +/// Detects patterns indicating unsafe redirect handling: +/// - redirect() with user-controlled URLs +/// - window.location assignment with user input +/// - URL parameters named redirect/return/next/goto +pub struct UnvalidatedRedirectsExtractor { + // Python patterns + python_redirect: Regex, + python_flask_redirect: Regex, + + // JavaScript/TypeScript patterns + js_redirect: Regex, + js_location: Regex, + + // Go patterns + go_redirect: Regex, + + // Universal URL parameter patterns + url_param: Regex, +} + +impl Default for UnvalidatedRedirectsExtractor { + fn default() -> Self { + Self::new() + } +} + +impl UnvalidatedRedirectsExtractor { + /// Create a new unvalidated redirects extractor with compiled regexes. + /// + /// # Panics + /// Panics if any regex pattern is invalid (programmer error). 
+ #[allow(clippy::expect_used)] + pub fn new() -> Self { + Self { + // Python: redirect with user input + python_redirect: Regex::new( + r#"(?:redirect|HttpResponseRedirect)\s*\(\s*(?:request\.(?:GET|POST|args)|url|next|return_url)"#, + ) + .expect("valid regex"), + python_flask_redirect: Regex::new( + r#"redirect\s*\(\s*request\.(?:args|form)\.get\s*\("#, + ) + .expect("valid regex"), + + // JavaScript: redirect with user input + js_redirect: Regex::new( + r#"res\.redirect\s*\(\s*(?:req\.(?:query|params|body)\.|url|next)"#, + ) + .expect("valid regex"), + js_location: Regex::new( + r#"(?:window\.)?location(?:\.href)?\s*=\s*(?:url|params|query|req\.)"#, + ) + .expect("valid regex"), + + // Go: http.Redirect with user input + go_redirect: Regex::new( + r#"http\.Redirect\s*\([^,]*,\s*[^,]*,\s*(?:r\.|req\.|c\.)"#, + ) + .expect("valid regex"), + + // Universal: dangerous URL parameter patterns + url_param: Regex::new( + r#"(?i)(?:redirect|return|next|goto|url|continue)(?:_url|Uri|_to)?\s*[:=]\s*(?:req\.|request\.|params\.)"#, + ) + .expect("valid regex"), + } + } + + fn make_claim( + path_segments: &[String], + file: &str, + line: usize, + matched: &str, + category: &str, + description: &str, + ) -> ExtractedClaim { + build_claim( + path_segments, + &["http", "redirect", category], + "redirect_source", + ObjectValue::Text("user_controlled".to_string()), + file, + line, + matched, + 0.85, + description, + ) + } +} + +impl Extractor for UnvalidatedRedirectsExtractor { + fn name(&self) -> &str { + "unvalidated_redirects" + } + + fn languages(&self) -> &[Language] { + &[Language::Python, Language::JavaScript, Language::TypeScript, Language::Go] + } + + fn extract( + &self, + path_segments: &[String], + content: &str, + language: Language, + file: &str, + ) -> Vec<ExtractedClaim> { + let mut claims = Vec::new(); + + for (line_idx, line) in content.lines().enumerate() { + let line_num = line_idx + 1; + + // Check for dangerous URL parameter patterns (all languages) + if let Some(m) = 
self.url_param.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "url_parameter", + "Redirect URL from user-controlled parameter (open redirect risk)", + )); + } + + // Language-specific patterns + match language { + Language::Python => { + if let Some(m) = self.python_redirect.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "response", + "Redirect with user-controlled URL (open redirect risk)", + )); + } + if let Some(m) = self.python_flask_redirect.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "response", + "Flask redirect with request parameter (open redirect risk)", + )); + } + } + Language::JavaScript | Language::TypeScript => { + if let Some(m) = self.js_redirect.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "response", + "res.redirect with user input (open redirect risk)", + )); + } + if let Some(m) = self.js_location.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "client_side", + "window.location assignment with user input (open redirect risk)", + )); + } + } + Language::Go => { + if let Some(m) = self.go_redirect.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "response", + "http.Redirect with user input (open redirect risk)", + )); + } + } + _ => {} + } + } + + claims + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_python_redirect() { + let extractor = UnvalidatedRedirectsExtractor::new(); + let content = r#" + return redirect(request.GET['next']) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "views.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("http/redirect")); + } + + #[test] + fn test_python_flask_redirect() { + let extractor = 
UnvalidatedRedirectsExtractor::new(); + let content = r#" + return redirect(request.form.get('destination')) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "app.py"); + + // Should match the flask redirect pattern (redirect + request.form.get) + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_js_redirect() { + let extractor = UnvalidatedRedirectsExtractor::new(); + let content = r#" + res.redirect(req.query.next); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "app.js"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_js_location() { + let extractor = UnvalidatedRedirectsExtractor::new(); + let content = r#" + window.location = url; + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "app.js"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("window.location")); + } + + #[test] + fn test_go_redirect() { + let extractor = UnvalidatedRedirectsExtractor::new(); + let content = r#" + http.Redirect(w, r, r.URL.Query().Get("next"), http.StatusFound) + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "handler.go"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_url_param_pattern() { + let extractor = UnvalidatedRedirectsExtractor::new(); + let content = r#" + redirect_url = request.args.get("redirect") + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "handler.py"); + + // Should match the url_param pattern (redirect_url = request.) 
+ assert_eq!(claims.len(), 1); + } + + #[test] + fn test_no_false_positives_static_redirect() { + let extractor = UnvalidatedRedirectsExtractor::new(); + // Safe: static redirect URL + let content = r#" + res.redirect('/login'); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "app.js"); + + assert!(claims.is_empty()); + } +} diff --git a/applications/aphoria/src/extractors/weak_password.rs b/applications/aphoria/src/extractors/weak_password.rs new file mode 100644 index 0000000..c00a5c9 --- /dev/null +++ b/applications/aphoria/src/extractors/weak_password.rs @@ -0,0 +1,344 @@ +//! Weak password requirements extractor. +//! +//! Detects patterns where password policies are too weak, +//! such as minimum length < 8, bcrypt cost < 10, or missing complexity requirements. + +use regex::Regex; +use stemedb_core::types::ObjectValue; + +use super::traits::{build_claim, Extractor}; +use crate::types::{ExtractedClaim, Language}; + +/// Extractor for weak password requirement configurations. +/// +/// Detects patterns indicating insufficient password policies: +/// - Minimum password length < 8 characters +/// - Bcrypt cost/rounds < 10 +/// - Complexity requirements disabled +pub struct WeakPasswordExtractor { + // Minimum length too short (< 8) + min_length_weak: Regex, + min_length_config: Regex, + + // Bcrypt cost too low (< 10) + bcrypt_weak: Regex, + + // Simple length check in code + simple_check: Regex, + + // Special chars not required + no_special: Regex, + no_uppercase: Regex, + no_number: Regex, +} + +impl Default for WeakPasswordExtractor { + fn default() -> Self { + Self::new() + } +} + +impl WeakPasswordExtractor { + /// Create a new weak password extractor with compiled regexes. + /// + /// # Panics + /// Panics if any regex pattern is invalid (programmer error). 
+ #[allow(clippy::expect_used)] + pub fn new() -> Self { + Self { + // Minimum length too short (< 8) in various config formats + min_length_weak: Regex::new( + r#"(?i)(?:min[_-]?(?:password[_-]?)?length|password[_-]?min(?:[_-]?length)?)\s*[:=]\s*[1-7](?:\D|$)"#, + ) + .expect("valid regex"), + min_length_config: Regex::new( + r#"(?i)["']?(?:minLength|min_length|minimumLength)["']?\s*[:=]\s*[1-7](?:\D|$)"#, + ) + .expect("valid regex"), + + // Bcrypt cost too low (< 10) + bcrypt_weak: Regex::new( + r#"(?i)(?:bcrypt|hash|argon2?|scrypt).*(?:cost|rounds|iterations)\s*[:=]\s*[1-9](?:\D|$)"#, + ) + .expect("valid regex"), + + // Simple length check in code (checking for < 8) + simple_check: Regex::new( + r#"len\s*\(\s*password\s*\)\s*(?:>=?|>)\s*[1-7](?:\D|$)"#, + ) + .expect("valid regex"), + + // Complexity requirements disabled + no_special: Regex::new( + r#"(?i)require[_-]?(?:special|symbol)[_-]?(?:char)?s?\s*[:=]\s*(?:false|no|0)"#, + ) + .expect("valid regex"), + no_uppercase: Regex::new( + r#"(?i)require[_-]?(?:upper|uppercase)[_-]?(?:case)?\s*[:=]\s*(?:false|no|0)"#, + ) + .expect("valid regex"), + no_number: Regex::new( + r#"(?i)require[_-]?(?:number|digit)s?\s*[:=]\s*(?:false|no|0)"#, + ) + .expect("valid regex"), + } + } + + fn make_claim( + path_segments: &[String], + file: &str, + line: usize, + matched: &str, + category: &str, + value: ObjectValue, + description: &str, + ) -> ExtractedClaim { + build_claim( + path_segments, + &["auth", "password", "policy", category], + "requirement_strength", + value, + file, + line, + matched, + 0.9, + description, + ) + } +} + +impl Extractor for WeakPasswordExtractor { + fn name(&self) -> &str { + "weak_password" + } + + fn languages(&self) -> &[Language] { + &[ + Language::Python, + Language::JavaScript, + Language::TypeScript, + Language::Go, + Language::Rust, + Language::Yaml, + Language::Json, + Language::Toml, + ] + } + + fn extract( + &self, + path_segments: &[String], + content: &str, + _language: Language, + 
file: &str, + ) -> Vec<ExtractedClaim> { + let mut claims = Vec::new(); + + for (line_idx, line) in content.lines().enumerate() { + let line_num = line_idx + 1; + + // Check for weak minimum length + if let Some(m) = self.min_length_weak.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "min_length", + ObjectValue::Text("weak".to_string()), + "Minimum password length < 8 characters (should be at least 8)", + )); + } + if let Some(m) = self.min_length_config.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "min_length", + ObjectValue::Text("weak".to_string()), + "Minimum password length < 8 characters (should be at least 8)", + )); + } + + // Check for weak bcrypt cost + if let Some(m) = self.bcrypt_weak.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "hash_cost", + ObjectValue::Text("weak".to_string()), + "Password hashing cost/rounds < 10 (should be at least 10 for bcrypt)", + )); + } + + // Check for simple length validation + if let Some(m) = self.simple_check.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "validation", + ObjectValue::Text("weak".to_string()), + "Password validation allows < 8 characters", + )); + } + + // Check for disabled complexity requirements + if let Some(m) = self.no_special.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "complexity", + ObjectValue::Boolean(false), + "Special character requirement disabled", + )); + } + if let Some(m) = self.no_uppercase.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "complexity", + ObjectValue::Boolean(false), + "Uppercase requirement disabled", + )); + } + if let Some(m) = self.no_number.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "complexity", + 
ObjectValue::Boolean(false), + "Number/digit requirement disabled", + )); + } + } + + claims + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_weak_min_length_yaml() { + let extractor = WeakPasswordExtractor::new(); + let content = r#" + password_min: 6 + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Yaml, "auth.yaml"); + + // Should match min_length_weak pattern + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("auth/password/policy")); + } + + #[test] + fn test_weak_min_length_json() { + let extractor = WeakPasswordExtractor::new(); + let content = r#" + "minLength": 4 + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Json, "config.json"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_weak_bcrypt_cost() { + let extractor = WeakPasswordExtractor::new(); + let content = r#" + bcrypt_cost = 8 + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Toml, "config.toml"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("cost/rounds")); + } + + #[test] + fn test_simple_length_check() { + let extractor = WeakPasswordExtractor::new(); + let content = r#" + if len(password) >= 6: + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "auth.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_no_special_chars() { + let extractor = WeakPasswordExtractor::new(); + let content = r#" + require_special_chars: false + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Yaml, "auth.yaml"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("Special character")); + } + + #[test] + fn test_no_uppercase() { + let extractor = WeakPasswordExtractor::new(); + let content = r#" + require_uppercase = false + "#; + + let claims = + extractor.extract(&["config".to_string()], content, 
Language::Toml, "config.toml"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_no_false_positives_strong() { + let extractor = WeakPasswordExtractor::new(); + // Strong: min length >= 8 + let content = r#" + password_min_length: 12 + bcrypt_cost: 12 + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Yaml, "auth.yaml"); + + assert!(claims.is_empty()); + } + + #[test] + fn test_boundary_value_8() { + let extractor = WeakPasswordExtractor::new(); + // Boundary: exactly 8 should be OK + let content = r#" + password_min_length: 8 + "#; + + let claims = + extractor.extract(&["config".to_string()], content, Language::Yaml, "auth.yaml"); + + assert!(claims.is_empty()); + } +} diff --git a/applications/aphoria/src/extractors/xxe.rs b/applications/aphoria/src/extractors/xxe.rs new file mode 100644 index 0000000..d798bba --- /dev/null +++ b/applications/aphoria/src/extractors/xxe.rs @@ -0,0 +1,432 @@ +//! XML External Entity (XXE) vulnerability extractor. +//! +//! Detects patterns where XML parsers are used without disabling external entity +//! processing, which can lead to data exfiltration, SSRF, or denial of service. + +use regex::Regex; +use stemedb_core::types::ObjectValue; + +use super::traits::{build_claim, Extractor}; +use crate::types::{ExtractedClaim, Language}; + +/// Extractor for XXE vulnerabilities. 
+/// +/// Detects patterns indicating potentially unsafe XML parsing: +/// - Python: lxml, xml.etree, xml.dom.minidom, xml.sax +/// - JavaScript: xml2js, libxmljs +/// - Go: encoding/xml +/// - Java-style patterns (polyglot detection) +/// - DTD entity declarations +pub struct XxeExtractor { + // Python patterns + python_lxml: Regex, + python_etree: Regex, + python_minidom: Regex, + python_sax: Regex, + + // JavaScript patterns + js_xml2js: Regex, + js_libxmljs: Regex, + + // Go patterns + go_xml: Regex, + + // Java-style patterns + java_xxe: Regex, + + // DTD entity declaration + entity_decl: Regex, +} + +impl Default for XxeExtractor { + fn default() -> Self { + Self::new() + } +} + +impl XxeExtractor { + /// Create a new XXE extractor with compiled regexes. + /// + /// # Panics + /// Panics if any regex pattern is invalid (programmer error). + #[allow(clippy::expect_used)] + pub fn new() -> Self { + Self { + // Python: lxml/etree parse + python_lxml: Regex::new(r#"(?:etree|lxml)\.(?:parse|fromstring|XML)\s*\("#) + .expect("valid regex"), + + // Python: xml.etree.ElementTree + python_etree: Regex::new( + r#"(?:xml\.etree\.ElementTree|ET)\.(?:parse|fromstring|XMLParser)\s*\("#, + ) + .expect("valid regex"), + + // Python: xml.dom.minidom + python_minidom: Regex::new(r#"xml\.dom\.minidom\.(?:parse|parseString)\s*\("#) + .expect("valid regex"), + + // Python: xml.sax + python_sax: Regex::new(r#"xml\.sax\.(?:parse|parseString|make_parser)\s*\("#) + .expect("valid regex"), + + // JavaScript: xml2js + js_xml2js: Regex::new(r#"xml2js\.(?:parseString|Parser)\s*\("#).expect("valid regex"), + + // JavaScript: libxmljs + js_libxmljs: Regex::new(r#"libxmljs\.parseXml\s*\("#).expect("valid regex"), + + // Go: encoding/xml + go_xml: Regex::new(r#"xml\.(?:Unmarshal|NewDecoder)\s*\("#).expect("valid regex"), + + // Java-style patterns (polyglot detection in config files, etc.) 
+ java_xxe: Regex::new( + r#"(?:DocumentBuilder|SAXParser|XMLReader|TransformerFactory)(?:Factory)?\.new"#, + ) + .expect("valid regex"), + + // DTD entity declaration (dangerous in untrusted XML) + entity_decl: Regex::new(r#" ExtractedClaim { + build_claim( + path_segments, + &["xml", "parsing"], + "parser_config", + ObjectValue::Text(parser.to_string()), + file, + line, + matched, + confidence, + description, + ) + } +} + +impl Extractor for XxeExtractor { + fn name(&self) -> &str { + "xxe" + } + + fn languages(&self) -> &[Language] { + &[Language::Python, Language::JavaScript, Language::TypeScript, Language::Go] + } + + fn extract( + &self, + path_segments: &[String], + content: &str, + language: Language, + file: &str, + ) -> Vec<ExtractedClaim> { + let mut claims = Vec::new(); + + for (line_idx, line) in content.lines().enumerate() { + let line_num = line_idx + 1; + + // Check for DTD entity declarations (high risk in any context) + if let Some(m) = self.entity_decl.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "dtd_entity", + 0.95, + "DTD SYSTEM/PUBLIC entity declaration (XXE attack vector)", + )); + } + + match language { + Language::Python => { + // lxml/etree (can be safe with proper configuration) + if let Some(m) = self.python_lxml.find(line) { + // Lower confidence if defusedxml is imported or resolve_entities=False + let confidence = if content.contains("defusedxml") + || line.contains("resolve_entities=False") + { + 0.5 + } else { + 0.85 + }; + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "lxml", + confidence, + "lxml XML parsing may be vulnerable to XXE without proper config", + )); + } + + // xml.etree.ElementTree + if let Some(m) = self.python_etree.find(line) { + // Python 3.8+ has some protections, but external entities still a concern + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "elementtree", + 0.75, + "xml.etree.ElementTree may 
allow external entity expansion", + )); + } + + // xml.dom.minidom (vulnerable by default) + if let Some(m) = self.python_minidom.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "minidom", + 0.85, + "xml.dom.minidom is vulnerable to XXE attacks", + )); + } + + // xml.sax (needs feature flags to be safe) + if let Some(m) = self.python_sax.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "sax", + 0.85, + "xml.sax is vulnerable to XXE without feature_external_ges=False", + )); + } + } + Language::JavaScript | Language::TypeScript => { + // xml2js (generally safer, but can be misconfigured) + if let Some(m) = self.js_xml2js.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "xml2js", + 0.7, + "xml2js XML parsing - verify external entity settings", + )); + } + + // libxmljs (can be vulnerable) + if let Some(m) = self.js_libxmljs.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "libxmljs", + 0.85, + "libxmljs may be vulnerable to XXE attacks", + )); + } + } + Language::Go => { + // encoding/xml (safer by default, but DTD expansion can be issue) + if let Some(m) = self.go_xml.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "encoding_xml", + 0.65, + "Go xml package - generally safe but verify with untrusted input", + )); + } + } + _ => {} + } + + // Check for Java patterns (polyglot detection) + if let Some(m) = self.java_xxe.find(line) { + claims.push(Self::make_claim( + path_segments, + file, + line_num, + m.as_str(), + "java_parser", + 0.9, + "Java XML parser - requires feature flags to prevent XXE", + )); + } + } + + claims + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_python_lxml() { + let extractor = XxeExtractor::new(); + let content = r#" + doc = etree.parse(xml_file) + "#; + + let claims = + 
extractor.extract(&["python".to_string()], content, Language::Python, "parser.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].concept_path.contains("xml/parsing")); + } + + #[test] + fn test_python_lxml_with_defusedxml() { + let extractor = XxeExtractor::new(); + let content = r#" + import defusedxml.ElementTree as ET + doc = etree.parse(xml_file) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "parser.py"); + + // Should still detect but with lower confidence + assert_eq!(claims.len(), 1); + assert!(claims[0].confidence < 0.6); + } + + #[test] + fn test_python_elementtree() { + let extractor = XxeExtractor::new(); + let content = r#" + import xml.etree.ElementTree as ET + tree = ET.parse(source) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "xml.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_python_minidom() { + let extractor = XxeExtractor::new(); + let content = r#" + from xml.dom.minidom import parse + doc = xml.dom.minidom.parse(xml_string) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "parser.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("minidom")); + } + + #[test] + fn test_python_sax() { + let extractor = XxeExtractor::new(); + let content = r#" + xml.sax.parse(source, handler) + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "handler.py"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_js_xml2js() { + let extractor = XxeExtractor::new(); + let content = r#" + xml2js.parseString(xmlData, callback); + "#; + + let claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "parser.js"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_js_libxmljs() { + let extractor = XxeExtractor::new(); + let content = r#" + const doc = libxmljs.parseXml(xmlString); + "#; + + let 
claims = + extractor.extract(&["js".to_string()], content, Language::JavaScript, "parser.js"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_go_xml() { + let extractor = XxeExtractor::new(); + let content = r#" + err := xml.Unmarshal(data, &result) + "#; + + let claims = extractor.extract(&["go".to_string()], content, Language::Go, "parser.go"); + + assert_eq!(claims.len(), 1); + } + + #[test] + fn test_java_parser() { + let extractor = XxeExtractor::new(); + let content = r#" + DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "mixed.py"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].description.contains("Java")); + } + + #[test] + fn test_dtd_entity() { + let extractor = XxeExtractor::new(); + let content = r#" + + "#; + + // Use a non-test filename to avoid confidence reduction + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "parser.xml"); + + assert_eq!(claims.len(), 1); + assert!(claims[0].confidence >= 0.9); + assert!(claims[0].description.contains("XXE attack vector")); + } + + #[test] + fn test_dtd_public_entity() { + let extractor = XxeExtractor::new(); + let content = r#" + + "#; + + let claims = + extractor.extract(&["python".to_string()], content, Language::Python, "test.xml"); + + assert_eq!(claims.len(), 1); + } +} diff --git a/applications/aphoria/src/llm/prompts.rs b/applications/aphoria/src/llm/prompts.rs index c813c37..15fdeac 100644 --- a/applications/aphoria/src/llm/prompts.rs +++ b/applications/aphoria/src/llm/prompts.rs @@ -61,6 +61,7 @@ pub fn language_to_prefix(language: Language) -> &'static str { Language::GoMod => "gomod", Language::NpmManifest => "npm", Language::PythonManifest => "python", + Language::Terraform => "terraform", Language::Unknown => "unknown", } } @@ -84,6 +85,7 @@ pub fn language_to_name(language: Language) -> &'static str { Language::GoMod => "Go 
module", Language::NpmManifest => "NPM manifest", Language::PythonManifest => "Python manifest", + Language::Terraform => "Terraform", Language::Unknown => "Unknown", } } @@ -107,6 +109,7 @@ pub fn language_to_extension(language: Language) -> &'static str { Language::GoMod => "go", Language::NpmManifest => "json", Language::PythonManifest => "toml", + Language::Terraform => "hcl", Language::Unknown => "", } } diff --git a/applications/aphoria/src/promotion/regex_gen.rs b/applications/aphoria/src/promotion/regex_gen.rs index 9969a16..d71d3c0 100644 --- a/applications/aphoria/src/promotion/regex_gen.rs +++ b/applications/aphoria/src/promotion/regex_gen.rs @@ -244,6 +244,7 @@ fn language_to_string(lang: Language) -> String { Language::GoMod => "gomod", Language::NpmManifest => "npm", Language::PythonManifest => "pip", + Language::Terraform => "terraform", Language::Unknown => "unknown", } .to_string() diff --git a/applications/aphoria/src/types/language.rs b/applications/aphoria/src/types/language.rs index 50aa5de..203d838 100644 --- a/applications/aphoria/src/types/language.rs +++ b/applications/aphoria/src/types/language.rs @@ -34,6 +34,8 @@ pub enum Language { Dotenv, /// Docker files. Docker, + /// Terraform files. + Terraform, /// Cargo manifest. CargoManifest, /// Go module file. 
@@ -62,6 +64,7 @@ impl fmt::Display for Language { Language::Ini => "ini", Language::Dotenv => "dotenv", Language::Docker => "docker", + Language::Terraform => "terraform", Language::CargoManifest => "cargo", Language::GoMod => "gomod", Language::NpmManifest => "npm", @@ -90,6 +93,7 @@ impl FromStr for Language { "ini" => Ok(Language::Ini), "dotenv" | "env" => Ok(Language::Dotenv), "docker" | "dockerfile" => Ok(Language::Docker), + "terraform" | "tf" => Ok(Language::Terraform), "cargo" | "cargo.toml" => Ok(Language::CargoManifest), "gomod" | "go.mod" => Ok(Language::GoMod), "npm" | "package.json" => Ok(Language::NpmManifest), @@ -153,6 +157,7 @@ impl Language { "toml" => Language::Toml, "json" => Language::Json, "ini" => Language::Ini, + "tf" => Language::Terraform, _ => Language::Unknown, } } @@ -174,6 +179,8 @@ mod tests { assert_eq!(Language::from_path(Path::new("go.mod")), Language::GoMod); assert_eq!(Language::from_path(Path::new(".env.production")), Language::Dotenv); assert_eq!(Language::from_path(Path::new("Dockerfile")), Language::Docker); + assert_eq!(Language::from_path(Path::new("main.tf")), Language::Terraform); + assert_eq!(Language::from_path(Path::new("variables.tf")), Language::Terraform); } #[test] @@ -201,6 +208,8 @@ mod tests { assert_eq!(Language::from_str("gomod").unwrap(), Language::GoMod); assert_eq!(Language::from_str("npm").unwrap(), Language::NpmManifest); assert_eq!(Language::from_str("pip").unwrap(), Language::PythonManifest); + assert_eq!(Language::from_str("terraform").unwrap(), Language::Terraform); + assert_eq!(Language::from_str("tf").unwrap(), Language::Terraform); } #[test] diff --git a/applications/aphoria/src/types/mod.rs b/applications/aphoria/src/types/mod.rs index 3d3abc6..01a3887 100644 --- a/applications/aphoria/src/types/mod.rs +++ b/applications/aphoria/src/types/mod.rs @@ -24,7 +24,7 @@ pub use verdict::Verdict; /// # Example /// /// ``` -/// use aphoria::types::PredicateAliasSet; +/// use aphoria::PredicateAliasSet; 
/// /// let aliases = PredicateAliasSet::new("enabled", vec!["required", "mandatory"]); /// assert!(aliases.contains("enabled")); diff --git a/applications/aphoria/src/walker/path_mapper.rs b/applications/aphoria/src/walker/path_mapper.rs index cc17325..13ec0a0 100644 --- a/applications/aphoria/src/walker/path_mapper.rs +++ b/applications/aphoria/src/walker/path_mapper.rs @@ -37,7 +37,11 @@ impl PathMapper { Language::JavaScript | Language::NpmManifest => "javascript", Language::Cpp => "cpp", Language::Ini => "config", - Language::Yaml | Language::Toml | Language::Json | Language::Dotenv => "config", + Language::Yaml + | Language::Toml + | Language::Json + | Language::Dotenv + | Language::Terraform => "config", Language::Docker => "docker", Language::Unknown => "unknown", }; diff --git a/crates/stemedb-core/src/signing.rs b/crates/stemedb-core/src/signing.rs index 334ff8c..def8861 100644 --- a/crates/stemedb-core/src/signing.rs +++ b/crates/stemedb-core/src/signing.rs @@ -60,7 +60,13 @@ pub fn compute_content_hash_v2(assertion: &Assertion) -> [u8; 32] { } ObjectValue::Number(n) => { hasher.update(b"N:"); - hasher.update(&n.to_le_bytes()); + // Round to 10 decimal places for reproducibility across JSON round-trips. + // JSON serialization/deserialization can change the exact f64 bit pattern + // for numbers that aren't exactly representable in decimal. + // 10 decimal places is more than enough for real-world values while + // surviving the decimal→binary→decimal conversion in JSON parsing. + let s = format!("{:.10}", n); + hasher.update(s.as_bytes()); } ObjectValue::Boolean(b) => { hasher.update(b"B:"); @@ -79,8 +85,11 @@ pub fn compute_content_hash_v2(assertion: &Assertion) -> [u8; 32] { hasher.update(&(assertion.source_class as u8).to_le_bytes()); // Confidence and timestamp + // Use string format for confidence (f32) to survive JSON round-trips. + // f32 has ~7 significant decimal digits, so 6 decimal places is sufficient. 
hasher.update(b":"); - hasher.update(&assertion.confidence.to_le_bytes()); + let confidence_str = format!("{:.6}", assertion.confidence); + hasher.update(confidence_str.as_bytes()); hasher.update(b":"); hasher.update(&assertion.timestamp.to_le_bytes()); diff --git a/crates/stemedb-ingest/src/worker/processing.rs b/crates/stemedb-ingest/src/worker/processing.rs index d002f35..a9afe38 100644 --- a/crates/stemedb-ingest/src/worker/processing.rs +++ b/crates/stemedb-ingest/src/worker/processing.rs @@ -1,7 +1,5 @@ -//! Record processing methods for the IngestWorker. -//! -//! Contains methods for ingesting assertions, votes, and epochs, -//! including validation and signature verification. +//! Record processing methods for the IngestWorker. Ingests assertions, votes, +//! and epochs with validation and signature verification. use super::record_types::RECORD_HEADER_SIZE; use super::{IngestWorker, RecordType}; @@ -327,12 +325,20 @@ impl IngestWorker { // The hash covers: subject, predicate, object, source_hash, source_class, confidence, timestamp. 
let v2_content_hash: Option<[u8; 32]> = if assertion.signatures.iter().any(|s| s.version == 2) { + // Debug: show exact number format for comparison with signing + let object_str = match &assertion.object { + stemedb_core::types::ObjectValue::Number(n) => format!("Number({:.17})", n), + other => format!("{:?}", other), + }; + let confidence_str = format!("{:.17}", assertion.confidence); let hash = compute_content_hash_v2(assertion); debug!( subject = %assertion.subject, predicate = %assertion.predicate, + object = %object_str, + source_hash = %hex::encode(assertion.source_hash), source_class = ?assertion.source_class, - confidence = %assertion.confidence, + confidence = %confidence_str, timestamp = %assertion.timestamp, content_hash = %hex::encode(hash), "Computed v2 content hash for verification" diff --git a/crates/stemedb-ontology/README.md b/crates/stemedb-ontology/README.md new file mode 100644 index 0000000..9e2ddc0 --- /dev/null +++ b/crates/stemedb-ontology/README.md @@ -0,0 +1,177 @@ +# stemedb-ontology + +Domain Ontology Layer for Episteme - defines how subjects are structured based on predicate type and domain. Ensures conflicts collide correctly when different sources report on the same thing. 
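To make the collision idea concrete, here is a minimal sketch of how a `{Placeholder}` subject pattern could be filled from an entity map. `build_subject` is a hypothetical helper written for illustration only; it is not the crate's `SubjectBuilder` API, and the pattern syntax is assumed from the examples in this README.

```rust
use std::collections::HashMap;

/// Hypothetical helper (NOT the crate's `SubjectBuilder`): fills a
/// `{Placeholder}` pattern from an entity map. Returns `None` if a
/// placeholder has no matching entity or a brace is left unclosed.
fn build_subject(pattern: &str, entities: &HashMap<&str, &str>) -> Option<String> {
    let mut subject = String::new();
    let mut rest = pattern;
    while let Some(open) = rest.find('{') {
        // Copy literal text (such as the ':' separator) before the placeholder.
        subject.push_str(&rest[..open]);
        // Locate the closing brace relative to the start of `rest`.
        let close = open + rest[open..].find('}')?;
        let key = &rest[open + 1..close];
        // Substitute the entity value, or bail out if it is missing.
        subject.push_str(entities.get(key).copied()?);
        rest = &rest[close + 1..];
    }
    // Append whatever literal text follows the last placeholder.
    subject.push_str(rest);
    Some(subject)
}
```

Under this sketch, two sources that both report on Semaglutide for Type 2 diabetes produce the same string for the pattern `{Drug}:{Indication}`, so their claims land on the same subject and can be compared.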
+ +## Module Overview + +| Module | Purpose | +|--------|---------| +| `domain.rs` | Domain, EntityType, PredicateSchema, SourceTier builders | +| `subject.rs` | SubjectBuilder for canonical subject construction | +| `validator.rs` | Validates assertions against domain rules | +| `client.rs` | HTTP client for StemeDB API | +| `dto/` | Request/response DTOs for API communication | +| `pharma/` | Pharmaceutical domain (reference implementation) | + +## Quick Start + +### CLI Usage (steme-pharma) + +```bash +# Build the CLI +cargo build --release -p stemedb-ontology + +# Ingest FDA label data +./target/release/steme-pharma ingest semaglutide,tirzepatide + +# Ingest with mock conflicts for testing +./target/release/steme-pharma ingest semaglutide --with-conflicts + +# Query conflicts (Skeptic lens - default) +./target/release/steme-pharma query "Semaglutide:Type2Diabetes" hba1c_reduction_percent + +# Query with source hierarchy (Layered Consensus) +./target/release/steme-pharma query "Semaglutide:Type2Diabetes" weight_loss_percent --mode layered + +# Compare two drugs +./target/release/steme-pharma compare \ + "Semaglutide:Type2Diabetes" \ + "Tirzepatide:Type2Diabetes" \ + --predicate hba1c_reduction_percent + +# Explore available predicates for a subject +./target/release/steme-pharma explore "Semaglutide:Type2Diabetes" + +# Validate a subject/predicate combination +./target/release/steme-pharma validate "Semaglutide:Type2Diabetes" hba1c_reduction_percent + +# JSON output (for scripting) +./target/release/steme-pharma --format json query "Semaglutide" nausea_rate +``` + +### Programmatic Usage + +```rust +use stemedb_ontology::{pharma, SubjectBuilder, Validator}; +use stemedb_ontology::client::StemeClient; +use stemedb_ontology::pharma::extractors::{FdaLabelExtractor, MedicalExtractor, SourceInput}; +use ed25519_dalek::SigningKey; +use rand::rngs::OsRng; + +// Load the pharma domain definition +let domain = pharma::definition(); + +// Build a subject using the 
ontology +let schema = domain.get_schema("efficacy").unwrap(); +let mut entities = std::collections::HashMap::new(); +entities.insert("Drug".to_string(), "Semaglutide".to_string()); +entities.insert("Indication".to_string(), "Type2Diabetes".to_string()); +let subject = SubjectBuilder::build(schema, &entities, &domain).unwrap(); +assert_eq!(subject, "Semaglutide:Type2Diabetes"); + +// Validate assertions +let validator = Validator::new(&domain); +let result = validator.validate("hba1c_reduction_percent", &subject, 0.95); +assert!(result.is_ok()); + +// Extract and ingest claims +let client = StemeClient::new("http://localhost:18180"); +let extractor = FdaLabelExtractor::new(); +let signing_key = SigningKey::generate(&mut OsRng); +let agent_id = signing_key.verifying_key().to_bytes(); +let hlc = uhlc::HLCBuilder::new().build(); + +let claims = extractor.extract(&SourceInput::DrugName("semaglutide".into())).await?; +for claim in claims { + let assertion = claim.to_assertion(&signing_key, agent_id, &hlc); + let hash = client.assert(&assertion).await?; + println!("Ingested: {}", hash); +} + +// Query for conflicts +let skeptic = client.skeptic("Semaglutide:Type2Diabetes", "hba1c_reduction_percent").await?; +println!("Conflict score: {}", skeptic.conflict_score); +``` + +## Architecture + +``` + ┌─────────────────────────────────────┐ + │ Domain Definition │ + │ (EntityTypes, Schemas, Hierarchy) │ + └──────────────┬──────────────────────┘ + │ + ┌───────────────────────┼───────────────────────┐ + │ │ │ + v v v +┌─────────────────┐ ┌──────────────────┐ ┌──────────────────┐ +│ SubjectBuilder │ │ Validator │ │ MedicalExtractor│ +│ │ │ │ │ (trait) │ +│ Build canonical │ │ Validate against │ │ Extract claims │ +│ subject strings │ │ domain rules │ │ from sources │ +└────────┬────────┘ └────────┬─────────┘ └────────┬─────────┘ + │ │ │ + └──────────────┬──────┴──────────────────────┘ + │ + v + ┌───────────────────┐ + │ StemeClient │ + │ │ + │ Submit assertions │ + │ Query with 
lenses │ + └─────────┬─────────┘ + │ + v + ┌───────────────────┐ + │ StemeDB API │ + │ :18180/v1/* │ + └───────────────────┘ +``` + +## Subject Patterns + +Different predicate types use different subject structures to ensure proper collision: + +| Category | Pattern | Example | Use Case | +|----------|---------|---------|----------| +| Efficacy | `{Drug}:{Indication}` | `Semaglutide:Type2Diabetes` | Outcome measures for specific conditions | +| Safety | `{Drug}` | `Semaglutide` | Adverse events (apply across indications) | +| Mechanism | `{Drug}:{Target}` | `Semaglutide:GLP1R` | Pharmacology details | +| Comparison | `{Drug}:{Comparator}:{Indication}` | `Semaglutide:Tirzepatide:Type2Diabetes` | Head-to-head trials | + +## Source Hierarchy + +Claims are weighted by source authority: + +| Tier | Source Class | Weight | Examples | +|------|--------------|--------|----------| +| 0 | Regulatory | 1.0 | FDA Labels, EMA Reports | +| 1 | Clinical | 0.9 | Phase III RCTs, Lancet, NEJM | +| 2 | Observational | 0.7 | Real-World Evidence, FAERS | +| 3 | Expert | 0.5 | Guidelines, ADA Standards | +| 4 | Community | 0.3 | PatientsLikeMe, Moderated Forums | +| 5 | Anecdotal | 0.1 | Reddit, Twitter, Blog Posts | + +## Adding a New Domain + +See [Adding a Domain Guide](../../docs/guides/adding-a-domain.md) for step-by-step instructions on implementing new domains (e.g., cardiology, finance). 
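To make the subject patterns listed earlier in this README concrete, here is a minimal sketch of how a pattern such as `{Drug}:{Indication}` could be expanded into a canonical subject string. It is an illustration under stated assumptions, not the crate's `SubjectBuilder`: the function name `fill_pattern`, its signature, and its error strings are hypothetical, and it does no alias normalization.

```rust
use std::collections::HashMap;

/// Hypothetical helper (NOT part of stemedb-ontology): replaces every
/// `{Entity}` placeholder in `pattern` with the matching value from `entities`.
fn fill_pattern(pattern: &str, entities: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = pattern;

    while let Some(open) = rest.find('{') {
        // Copy literal text (such as the ':' separator) that precedes the placeholder.
        out.push_str(&rest[..open]);

        let after_open = &rest[open + 1..];
        let close = after_open
            .find('}')
            .ok_or_else(|| format!("unclosed placeholder in pattern `{}`", pattern))?;

        // Look up the entity named between the braces.
        let name = &after_open[..close];
        let value = entities
            .get(name)
            .ok_or_else(|| format!("missing entity `{}`", name))?;
        out.push_str(value);

        // Continue scanning after the closing brace.
        rest = &after_open[close + 1..];
    }

    // Append any trailing literal text.
    out.push_str(rest);
    Ok(out)
}
```

With `Drug = "Semaglutide"` and `Indication = "Type2Diabetes"`, the Efficacy pattern `{Drug}:{Indication}` expands to `Semaglutide:Type2Diabetes` and the Safety pattern `{Drug}` expands to `Semaglutide`, matching the examples in the Subject Patterns table. A pattern that names an entity missing from the map returns an error instead of a partial subject.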
+ +## Testing + +```bash +# Run all ontology tests +cargo test -p stemedb-ontology + +# Run with output +cargo test -p stemedb-ontology -- --nocapture + +# Consumer Health UAT +cargo test -p stemedb-ontology --test consumer_health_uat +``` + +## Related Documentation + +- [What is Episteme](../../what-is-episteme.md) +- [Architecture](../../architecture.md) +- [Vision](../../vision.md) +- [Consumer Health Use Case](../../docs/app-concepts/consumer-health.md) diff --git a/crates/stemedb-ontology/src/bin/steme_pharma/commands.rs b/crates/stemedb-ontology/src/bin/steme_pharma/commands.rs index 1fd18dd..fc0a88d 100644 --- a/crates/stemedb-ontology/src/bin/steme_pharma/commands.rs +++ b/crates/stemedb-ontology/src/bin/steme_pharma/commands.rs @@ -40,21 +40,26 @@ pub async fn run_ingest( // Extract and ingest FDA claims if !args.mock_only { let extractor = FdaLabelExtractor::new(); + let drugs: Vec<&str> = args.drugs.split(',').map(str::trim).collect(); + let total_drugs = drugs.len(); - for drug in args.drugs.split(',') { - let drug = drug.trim(); - println!("--- Ingesting {} (FDA) ---", drug); + for (drug_idx, drug) in drugs.iter().enumerate() { + println!("--- [{}/{}] Ingesting {} (FDA) ---", drug_idx + 1, total_drugs, drug); - match extractor.extract(&SourceInput::DrugName(drug.to_string())).await { + match extractor.extract(&SourceInput::DrugName((*drug).to_string())).await { Ok(claims) => { - info!(drug = drug, claims_count = claims.len(), "Extracted FDA claims"); - for claim in claims { + let claims_count = claims.len(); + info!(drug = drug, claims_count = claims_count, "Extracted FDA claims"); + + for (claim_idx, claim) in claims.iter().enumerate() { let assertion = claim.to_assertion(&signing_key, agent_id, &hlc); match client.assert(&assertion).await { Ok(hash) => { total_ingested += 1; println!( - " [Regulatory] {} = {} -> {}...", + " [{}/{}] [Regulatory] {} = {} -> {}...", + claim_idx + 1, + claims_count, claim.predicate, format_value(&claim.value), 
&hash[..16] @@ -63,7 +68,13 @@ pub async fn run_ingest( Err(e) => { total_errors += 1; warn!(error = %e, predicate = %claim.predicate, "Failed to ingest"); - println!(" [ERROR] {} -> {}", claim.predicate, e); + println!( + " [{}/{}] [ERROR] {} -> {}", + claim_idx + 1, + claims_count, + claim.predicate, + e + ); } } } diff --git a/crates/stemedb-ontology/src/client.rs b/crates/stemedb-ontology/src/client.rs index c8beb3a..2237c38 100644 --- a/crates/stemedb-ontology/src/client.rs +++ b/crates/stemedb-ontology/src/client.rs @@ -105,7 +105,13 @@ impl StemeClient { let response = self.http_client.post(&url).json(&request).send().await.map_err(|e| { if e.is_connect() { - ClientError::ServerUnavailable { url: url.clone(), message: e.to_string() } + ClientError::ServerUnavailable { + url: url.clone(), + message: format!( + "Connection failed: {}. Ensure StemeDB is running: cargo run -p stemedb-api", + e + ), + } } else { ClientError::Http(e) } @@ -170,7 +176,13 @@ impl StemeClient { let response = self.http_client.get(&url).send().await.map_err(|e| { if e.is_connect() { - ClientError::ServerUnavailable { url: url.clone(), message: e.to_string() } + ClientError::ServerUnavailable { + url: url.clone(), + message: format!( + "Connection failed: {}. Ensure StemeDB is running: cargo run -p stemedb-api", + e + ), + } } else { ClientError::Http(e) } @@ -223,7 +235,13 @@ impl StemeClient { let response = self.http_client.get(&url).send().await.map_err(|e| { if e.is_connect() { - ClientError::ServerUnavailable { url: url.clone(), message: e.to_string() } + ClientError::ServerUnavailable { + url: url.clone(), + message: format!( + "Connection failed: {}. 
Ensure StemeDB is running: cargo run -p stemedb-api", + e + ), + } } else { ClientError::Http(e) } @@ -286,7 +304,13 @@ impl StemeClient { let response = self.http_client.get(&url).send().await.map_err(|e| { if e.is_connect() { - ClientError::ServerUnavailable { url: url.clone(), message: e.to_string() } + ClientError::ServerUnavailable { + url: url.clone(), + message: format!( + "Connection failed: {}. Ensure StemeDB is running: cargo run -p stemedb-api", + e + ), + } } else { ClientError::Http(e) } @@ -329,7 +353,13 @@ impl StemeClient { let response = self.http_client.get(&url).send().await.map_err(|e| { if e.is_connect() { - ClientError::ServerUnavailable { url: url.clone(), message: e.to_string() } + ClientError::ServerUnavailable { + url: url.clone(), + message: format!( + "Connection failed: {}. Ensure StemeDB is running: cargo run -p stemedb-api", + e + ), + } } else { ClientError::Http(e) } diff --git a/crates/stemedb-wal/src/journal.rs b/crates/stemedb-wal/src/journal.rs index c07facd..7e5146b 100644 --- a/crates/stemedb-wal/src/journal.rs +++ b/crates/stemedb-wal/src/journal.rs @@ -92,6 +92,12 @@ impl Journal { })?; guard.write(&buf)?; + // Update the cached segment size to reflect the write. + // This ensures read() can use the cached size for bounds checking. + let new_file_size = + guard.file().metadata().map_err(|e| QuarantineError::io(guard.path(), e))?.len(); + self.segment_mgr.update_current_segment_size(new_file_size); + let offset = self.current_offset; self.current_offset += record.disk_size(); diff --git a/crates/stemedb-wal/src/segment.rs b/crates/stemedb-wal/src/segment.rs index fb85dc7..70e96ea 100644 --- a/crates/stemedb-wal/src/segment.rs +++ b/crates/stemedb-wal/src/segment.rs @@ -147,6 +147,17 @@ impl SegmentManager { current_segment_size >= self.max_segment_size } + /// Update the cached size of the current (latest) segment. + /// + /// Call this after appending data to keep the cached size in sync with + /// the actual file size. 
This ensures that `read()` operations can use + /// the cached size for bounds checking without a disk stat call. + pub fn update_current_segment_size(&mut self, new_size: u64) { + if let Some(segment) = self.segments.last_mut() { + segment.size = new_size; + } + } + /// Create a new segment with the given base offset. /// /// Writes a v2 FileHeader to the new file and adds it to the segment list. diff --git a/docs/guides/adding-a-domain.md b/docs/guides/adding-a-domain.md new file mode 100644 index 0000000..335bd9b --- /dev/null +++ b/docs/guides/adding-a-domain.md @@ -0,0 +1,590 @@ +# Adding a New Domain to stemedb-ontology + +This guide walks you through implementing a new domain (vertical) in the stemedb-ontology crate. By the end, you'll have a working domain with entity types, predicate schemas, and optional extractors. + +**Time:** ~30 minutes +**Prerequisites:** Rust knowledge, familiarity with StemeDB concepts + +## Overview + +A domain in stemedb-ontology defines: + +1. **Entity Types** - The kinds of things in your domain (e.g., Drug, Company, Asset) +2. **Predicate Schemas** - How subjects are built for different predicate categories +3. **Source Hierarchy** - How to weight different source authorities +4. **Extractors (optional)** - Code that extracts claims from external sources + +## Step 1: Plan Your Domain Model + +Before writing code, answer these questions: + +### What entities exist in your domain? + +| Entity | Description | Example Values | +|--------|-------------|----------------| +| ? | ? | ? | + +**Pharma example:** +| Entity | Description | Example Values | +|--------|-------------|----------------| +| Drug | Pharmaceutical compound | Semaglutide, Tirzepatide | +| Indication | Medical condition | Type2Diabetes, Obesity | +| Target | Molecular target | GLP1R, GIPR | + +### What predicates will you track? 
+ +Group predicates by category (determines subject pattern): + +| Category | Subject Pattern | Example Predicates | +|----------|-----------------|-------------------| +| ? | ? | ? | + +**Pharma example:** +| Category | Subject Pattern | Example Predicates | +|----------|-----------------|-------------------| +| Efficacy | `{Drug}:{Indication}` | hba1c_reduction_percent, weight_loss_percent | +| Safety | `{Drug}` | nausea_rate, has_boxed_warning | +| Mechanism | `{Drug}:{Target}` | binding_affinity, mechanism_of_action | + +### What sources will provide data? + +Order from most to least authoritative: + +| Tier | Source Class | Examples | Weight | +|------|--------------|----------|--------| +| 0 | Regulatory | ? | 1.0 | +| 1 | Clinical | ? | 0.9 | +| ... | ... | ... | ... | + +## Step 2: Create Domain Module + +Create the directory structure: + +``` +crates/stemedb-ontology/src/ + {domain}/ + mod.rs # Re-exports + definition.rs # Domain::new() builder +``` + +### Template: `{domain}/mod.rs` + +```rust +//! {Domain} domain ontology. +//! +//! This module defines the {domain} vertical with: +//! - Entity types (...) +//! - Predicate schemas (...) +//! - Source hierarchy (...) + +pub mod definition; + +pub use definition::definition; + +// Re-export domain-specific types if any +// pub use definition::{...}; +``` + +### Template: `{domain}/definition.rs` + +```rust +//! Compiled-in {domain} domain definition. + +use crate::domain::{ + DefaultLens, Domain, EntityType, NamingConvention, PredicateSchema, SourceTier, +}; +use stemedb_core::types::SourceClass; + +/// Build the {domain} domain definition. 
+pub fn definition() -> Domain { + let mut domain = Domain::new( + "{Domain}", + "Description of what this domain covers", + ); + + // ------------------------------------------------------------------------- + // Entity Types + // ------------------------------------------------------------------------- + + // Primary entity (e.g., the main subject of claims) + domain = domain.with_entity_type( + "{PrimaryEntity}", + EntityType::required("Description") + .with_naming(NamingConvention::CamelCase) + // Add aliases for common variations + .with_alias("ALIAS", "Canonical"), + ); + + // Secondary entity (for compound subjects) + domain = domain.with_entity_type( + "{SecondaryEntity}", + EntityType::required("Description") + .with_naming(NamingConvention::CamelCase), + ); + + // ------------------------------------------------------------------------- + // Predicate Schemas + // ------------------------------------------------------------------------- + + // Category 1: Primary predicates (single entity subject) + domain = domain.with_predicate_schema( + "category1", + PredicateSchema::new( + "Description of this predicate category", + "{PrimaryEntity}", + ) + .with_predicates(vec![ + "predicate_one", + "predicate_two", + ]) + .with_default_lens(DefaultLens::Recency), + ); + + // Category 2: Compound predicates (multi-entity subject) + domain = domain.with_predicate_schema( + "category2", + PredicateSchema::new( + "Description", + "{PrimaryEntity}:{SecondaryEntity}", + ) + .with_predicates(vec![ + "compound_predicate", + ]) + .with_default_lens(DefaultLens::LayeredConsensus), + ); + + // ------------------------------------------------------------------------- + // Source Hierarchy + // ------------------------------------------------------------------------- + + domain = domain.with_source_hierarchy(vec![ + SourceTier::new(SourceClass::Regulatory, "Tier 0: Official Sources") + .with_examples(vec!["Government agencies", "Standards bodies"]) + .with_weight(1.0), + 
SourceTier::new(SourceClass::Clinical, "Tier 1: Primary Research") + .with_examples(vec!["Peer-reviewed journals", "Research institutions"]) + .with_weight(0.9) + .with_decay(730), // 2 year half-life + SourceTier::new(SourceClass::Observational, "Tier 2: Secondary Analysis") + .with_examples(vec!["Industry reports", "Analyst research"]) + .with_weight(0.7) + .with_decay(365), + SourceTier::new(SourceClass::Expert, "Tier 3: Expert Opinion") + .with_examples(vec!["Industry experts", "Consultants"]) + .with_weight(0.5) + .with_decay(180), + SourceTier::new(SourceClass::Community, "Tier 4: Community") + .with_examples(vec!["Professional forums", "Curated discussions"]) + .with_weight(0.3) + .with_decay(90), + SourceTier::new(SourceClass::Anecdotal, "Tier 5: Anecdotal") + .with_examples(vec!["Social media", "Blog posts"]) + .with_weight(0.1) + .with_decay(30), + ]); + + domain +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_definition_builds() { + let domain = definition(); + assert_eq!(domain.name, "{Domain}"); + assert!(!domain.entity_types.is_empty()); + assert!(!domain.predicate_schemas.is_empty()); + assert!(!domain.source_hierarchy.is_empty()); + } + + #[test] + fn test_entity_normalization() { + let domain = definition(); + let entity = domain.get_entity_type("{PrimaryEntity}").expect("entity exists"); + + // Test alias normalization + assert_eq!(entity.normalize("ALIAS"), "Canonical"); + assert_eq!(entity.normalize("Canonical"), "Canonical"); + } + + #[test] + fn test_predicate_schema_lookup() { + let domain = definition(); + + // Direct lookup + let schema = domain.get_schema("category1").expect("schema exists"); + assert_eq!(schema.subject_pattern, "{PrimaryEntity}"); + + // Lookup by predicate + let schema = domain.schema_for_predicate("predicate_one").expect("found"); + assert!(schema.predicates.contains(&"predicate_one".to_string())); + } +} +``` + +## Step 3: Implement Extractors (Optional) + +If your domain has external data 
sources, implement the `MedicalExtractor` trait. + +### Directory Structure + +``` +crates/stemedb-ontology/src/ + {domain}/ + mod.rs + definition.rs + extractors/ + mod.rs + {source}.rs +``` + +### Template: `{domain}/extractors/mod.rs` + +```rust +//! Data extractors for {domain}. + +mod {source}; + +pub use {source}::{Source}Extractor; + +// Re-export shared extractor types from the pharma reference implementation +pub use crate::pharma::extractors::{ + ExtractError, MedicalClaim, MedicalExtractor, RetryConfig, SourceInput, +}; +``` + +### Template: `{domain}/extractors/{source}.rs` + +```rust +//! {Source} data extractor. + +use super::{ExtractError, MedicalClaim, MedicalExtractor, SourceInput}; +use async_trait::async_trait; +use stemedb_core::types::{ObjectValue, SourceClass}; + +/// Extractor for {Source} data. +pub struct {Source}Extractor { + http_client: reqwest::Client, + base_url: String, +} + +impl {Source}Extractor { + /// Create a new extractor. + pub fn new() -> Self { + Self { + http_client: reqwest::Client::new(), + base_url: "https://api.example.com".to_string(), + } + } +} + +impl Default for {Source}Extractor { + fn default() -> Self { + Self::new() + } +} + +#[async_trait] +impl MedicalExtractor for {Source}Extractor { + fn name(&self) -> &str { + "{Source} Extractor" + } + + fn source_class(&self) -> SourceClass { + SourceClass::Regulatory // Adjust based on source authority + } + + fn can_handle(&self, source: &SourceInput) -> bool { + matches!(source, SourceInput::DrugName(_) | SourceInput::Url(_)) + } + + async fn extract(&self, source: &SourceInput) -> Result<Vec<MedicalClaim>, ExtractError> { + let query = match source { + SourceInput::DrugName(name) => name.clone(), + SourceInput::Url(url) => url.clone(), + _ => return Err(ExtractError::NotFound("Unsupported input type".into())), + }; + + // Fetch data from source + let url = format!("{}/search?q={}", self.base_url, urlencoding::encode(&query)); + let response = self.http_client.get(&url).send().await?; + + if !response.status().is_success()
{ + return Err(ExtractError::ApiError(format!( + "HTTP {}", response.status() + ))); + } + + // Parse response and extract claims + let mut claims = Vec::new(); + + // Example claim + claims.push( + MedicalClaim::new( + "Subject", + "predicate_name", + ObjectValue::Float(42.0), + ) + .with_confidence(0.9) + .with_source_url(&url) + .with_source_section("Section Name") + .with_quote("Supporting quote from source") + .with_source_class(self.source_class()) + ); + + Ok(claims) + } +} +``` + +## Step 4: Create CLI Binary (Optional) + +For user-facing domains, create a CLI tool. + +### Template: `src/bin/steme_{domain}.rs` + +```rust +//! CLI for {domain} domain operations. + +use clap::Parser; +use stemedb_ontology::client::StemeClient; +use stemedb_ontology::{domain}::definition; + +mod cli; +mod commands; + +#[derive(Parser)] +#[command(name = "steme-{domain}")] +#[command(about = "{Domain} data operations for StemeDB")] +struct Cli { + #[arg(long, default_value = "http://localhost:18180")] + server: String, + + #[command(subcommand)] + command: Commands, +} + +#[derive(clap::Subcommand)] +enum Commands { + /// Ingest data + Ingest { /* args */ }, + /// Query data + Query { /* args */ }, +} + +#[tokio::main] +async fn main() -> Result<(), Box<dyn std::error::Error>> { + let cli = Cli::parse(); + let client = StemeClient::new(&cli.server); + + match cli.command { + Commands::Ingest { /* args */ } => { + // Implementation + } + Commands::Query { /* args */ } => { + // Implementation + } + } + + Ok(()) +} +``` + +## Step 5: Testing Checklist + +Before considering your domain complete: + +- [ ] `cargo build -p stemedb-ontology` succeeds +- [ ] `definition()` returns a valid Domain +- [ ] All entity types have meaningful descriptions +- [ ] All predicate schemas have correct subject patterns +- [ ] Entity normalization works (aliases resolve correctly) +- [ ] `schema_for_predicate()` finds the right schema +- [ ] Source hierarchy has 6 tiers with decreasing weights +- [ ] (If extractors) `cargo
test -p stemedb-ontology` passes + +Run the tests: + +```bash +cargo test -p stemedb-ontology +cargo clippy -p stemedb-ontology -- -D warnings +``` + +## Step 6: Integration + +### Export from lib.rs + +Edit `crates/stemedb-ontology/src/lib.rs`: + +```rust +// Add your domain module +pub mod {domain}; + +// Re-export for convenience +pub use {domain}::definition as {domain}_domain; +``` + +### Update ai-lookup + +Add entry to `ai-lookup/index.md` under Domain Ontology section. + +### Update CLAUDE.md routing (if significant) + +If your domain is frequently used, add a routing entry in the Find Your Guide table. + +## Complete Example: Cardiology Domain (Skeleton) + +Here's a minimal working example for a cardiology domain: + +```rust +// crates/stemedb-ontology/src/cardiology/mod.rs +//! Cardiology domain ontology. + +pub mod definition; +pub use definition::definition; +``` + +```rust +// crates/stemedb-ontology/src/cardiology/definition.rs +use crate::domain::{DefaultLens, Domain, EntityType, NamingConvention, PredicateSchema, SourceTier}; +use stemedb_core::types::SourceClass; + +pub fn definition() -> Domain { + let mut domain = Domain::new( + "Cardiology", + "Cardiovascular conditions, procedures, and outcomes", + ); + + // Entities + domain = domain + .with_entity_type( + "Condition", + EntityType::required("Cardiovascular condition") + .with_naming(NamingConvention::CamelCase) + .with_alias("MI", "MyocardialInfarction") + .with_alias("CHF", "CongestiveHeartFailure") + .with_alias("AF", "AtrialFibrillation"), + ) + .with_entity_type( + "Procedure", + EntityType::required("Medical procedure") + .with_naming(NamingConvention::CamelCase) + .with_alias("CABG", "CoronaryArteryBypassGraft") + .with_alias("PCI", "PercutaneousCoronaryIntervention"), + ) + .with_entity_type( + "Biomarker", + EntityType::required("Diagnostic biomarker") + .with_naming(NamingConvention::CamelCase), + ); + + // Schemas + domain = domain + .with_predicate_schema( + "diagnosis", + 
PredicateSchema::new("Diagnostic criteria", "{Condition}") + .with_predicates(vec![ + "diagnostic_criteria", + "staging_system", + "severity_classification", + ]) + .with_default_lens(DefaultLens::Authority), + ) + .with_predicate_schema( + "outcome", + PredicateSchema::new("Treatment outcomes", "{Condition}:{Procedure}") + .with_predicates(vec![ + "mortality_rate", + "complication_rate", + "readmission_rate", + "length_of_stay_days", + ]) + .with_default_lens(DefaultLens::LayeredConsensus), + ) + .with_predicate_schema( + "biomarker", + PredicateSchema::new("Biomarker thresholds", "{Biomarker}") + .with_predicates(vec![ + "normal_range", + "diagnostic_threshold", + "prognostic_value", + ]) + .with_default_lens(DefaultLens::Consensus), + ); + + // Source hierarchy + domain = domain.with_source_hierarchy(vec![ + SourceTier::new(SourceClass::Regulatory, "Tier 0: Guidelines") + .with_examples(vec!["ACC/AHA Guidelines", "ESC Guidelines"]) + .with_weight(1.0), + SourceTier::new(SourceClass::Clinical, "Tier 1: Clinical Trials") + .with_examples(vec!["Landmark RCTs", "Meta-analyses"]) + .with_weight(0.9) + .with_decay(730), + SourceTier::new(SourceClass::Observational, "Tier 2: Registries") + .with_examples(vec!["NCDR", "Get With The Guidelines"]) + .with_weight(0.7) + .with_decay(365), + SourceTier::new(SourceClass::Expert, "Tier 3: Expert Consensus") + .with_examples(vec!["Consensus statements", "Textbooks"]) + .with_weight(0.5) + .with_decay(180), + SourceTier::new(SourceClass::Community, "Tier 4: Community") + .with_examples(vec!["Medical forums", "CME discussions"]) + .with_weight(0.3) + .with_decay(90), + SourceTier::new(SourceClass::Anecdotal, "Tier 5: Anecdotal") + .with_examples(vec!["Case reports", "Social media"]) + .with_weight(0.1) + .with_decay(30), + ]); + + domain +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_cardiology_domain() { + let domain = definition(); + assert_eq!(domain.name, "Cardiology"); + + // Check entity aliases + let 
condition = domain.get_entity_type("Condition").unwrap(); + assert_eq!(condition.normalize("MI"), "MyocardialInfarction"); + + // Check schema lookup + let schema = domain.schema_for_predicate("mortality_rate").unwrap(); + assert_eq!(schema.subject_pattern, "{Condition}:{Procedure}"); + } +} +``` + +## Troubleshooting + +### "Unknown predicate" errors + +Your predicate isn't in any schema. Add it to the appropriate `with_predicates()` call. + +### Subject collision issues + +If claims that should conflict aren't conflicting, check that: +1. The subject pattern matches your intent +2. Entity values are being normalized consistently +3. The predicate is in the right schema category + +### Extractor not finding data + +1. Check the API URL is correct +2. Verify the query parameters match the API's expectations +3. Add debug logging to see raw responses + +## Next Steps + +- Run the Consumer Health UAT to see the pharma domain in action +- Read the [Lens documentation](../services/lens.md) to understand conflict resolution +- Check the [SDK guide](../../ai-lookup/services/sdk.md) for Go integration diff --git a/roadmap.md b/roadmap.md index f8d7c0c..dd61445 100644 --- a/roadmap.md +++ b/roadmap.md @@ -44,7 +44,7 @@ | **Time Travel Works** | `as_of=2024-01-01` returns historical snapshot | ✅ Infrastructure ready | | **Decay Works** | 6-month-old Reddit claim has lower effective confidence than fresh FDA | ✅ Infrastructure ready | | **UAT Passes** | Consumer Health scenarios documented and verified | ✅ Week 4 | -| **Self-Serve Demo** | CLI tool lets anyone explore without code | 🚧 Week 5 | +| **Self-Serve Demo** | CLI tool lets anyone explore without code | ✅ Week 5 | ### The Demo Script @@ -76,7 +76,7 @@ | **Week 2** ✅ | FDA extractor, claim-to-assertion signing | Ontology | Week 1 | | **Week 3** ✅ | Ingest FDA claims, mock conflicts, SkepticLens demo | Ontology | Week 2 | | **Week 4** ✅ | UAT scenarios documented and verified | Ontology | Week 3 | -| **Week 5** | 
`steme-pharma` CLI for self-serve exploration | Ontology | Week 3 | +| **Week 5** ✅ | `steme-pharma` CLI for self-serve exploration | Ontology | Week 3 | | **Week 6** | Polish, factor out reusable patterns, document | Ontology | Week 4-5 | ### What's NOT in MVP @@ -1346,11 +1346,10 @@ These are valuable but not required to prove the core value proposition: * [x] **Week 2**: FDA Extractor + Signing — `FdaLabelExtractor`, `MedicalClaim::to_assertion()`, exponential backoff. ✅ COMPLETE * [x] **Week 3**: StemeDB Integration — `StemeClient`, `pharma-ingest` CLI, mock conflict demo. ✅ COMPLETE * [x] **Week 4**: UAT Scenarios — Document acceptance criteria, validation tests. ✅ COMPLETE -* [ ] **Week 5**: CLI Tool — `steme-pharma` CLI for ingest/query/compare. +* [x] **Week 5**: CLI Tool — `steme-pharma` CLI for ingest/query/compare. ✅ COMPLETE * [ ] **Week 6**: Generalization — Factor out reusable patterns, document "Adding a Domain". ### Next Up -* **Week 5 MVP**: Full `steme-pharma` CLI with query, compare, and explore commands. * **Week 6 MVP**: Factor out reusable patterns, document "Adding a Domain" guide. * **Phase 8B-C** (deferred): Observability, geo-distribution — production concerns, not MVP blockers. @@ -1363,6 +1362,14 @@ These are valuable but not required to prove the core value proposition: * **Agent Wallet** (Key management sidecar) -> App layer. ### Recently Completed +* [x] **🎯 MVP Week 5**: `steme-pharma` CLI for self-serve exploration. + * Full CLI binary with 5 subcommands: `ingest`, `query`, `compare`, `explore`, `validate`. + * Query modes: `skeptic` (default), `layered` (per-tier), and lens-based (recency, consensus, etc.). + * Table and JSON output formats via `comfy-table`. + * Client extensions: `layered()`, `query()`, `list_predicates()` methods. + * Response DTOs: `LayeredResponse`, `QueryResponse`, `AssertionDto`, `TierResolutionDto`. + * Domain validation for known pharma predicates and subject patterns. 
+ * Modular design: cli.rs, commands.rs, helpers.rs, output.rs. * [x] **🎯 MVP Week 4**: UAT scenarios documented and verified. * Integration test suite: `crates/stemedb-ontology/tests/consumer_health_uat.rs` * 4 automated UAT scenarios with real Ed25519 signing @@ -1597,7 +1604,10 @@ INFRASTRUCTURE (Complete) VERTICAL INTEGRATION (In Progress) [stemedb-ontology Weeks 1-3] ✅ ───────────────────────┘ (Domain defs, FDA extractor, StemeClient) | v - [MVP Week 5: CLI Tool] [ ] + [MVP Week 5: CLI Tool] ✅ + | + v + [MVP Week 6: Polish & Docs] [ ] | v 🎯 CONSUMER HEALTH MVP diff --git a/scripts/demo-consumer-health.sh b/scripts/demo-consumer-health.sh new file mode 100755 index 0000000..1789810 --- /dev/null +++ b/scripts/demo-consumer-health.sh @@ -0,0 +1,239 @@ +#!/usr/bin/env bash +# Consumer Health Demo Script +# Demonstrates StemeDB + stemedb-ontology for the Consumer Health use case +# +# Prerequisites: +# - StemeDB API running: cargo run -p stemedb-api +# - steme-pharma built: cargo build --release -p stemedb-ontology +# +# Usage: +# ./scripts/demo-consumer-health.sh + +set -e + +# Colors +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +CYAN='\033[0;36m' +BOLD='\033[1m' +NC='\033[0m' # No Color + +# Configuration +STEMEDB_URL="${STEMEDB_URL:-http://localhost:18180}" +PHARMA_CLI="./target/release/steme-pharma" +PAUSE_SECONDS=2 + +# Helper functions +print_header() { + echo -e "\n${BOLD}${BLUE}════════════════════════════════════════════════════════════════${NC}" + echo -e "${BOLD}${BLUE} $1${NC}" + echo -e "${BOLD}${BLUE}════════════════════════════════════════════════════════════════${NC}\n" +} + +print_step() { + echo -e "${CYAN}▶ $1${NC}" +} + +print_success() { + echo -e "${GREEN}✓ $1${NC}" +} + +print_warning() { + echo -e "${YELLOW}⚠ $1${NC}" +} + +print_error() { + echo -e "${RED}✗ $1${NC}" +} + +wait_for_user() { + if [ -t 0 ]; then + echo -e "\n${YELLOW}Press Enter to continue...${NC}" + read -r + else + sleep $PAUSE_SECONDS + fi +} + +# 
Check prerequisites
+print_header "Consumer Health Demo - Prerequisites Check"
+
+# Check if steme-pharma exists
+if [ ! -f "$PHARMA_CLI" ]; then
+    print_warning "steme-pharma not found at $PHARMA_CLI"
+    print_step "Building steme-pharma..."
+    cargo build --release -p stemedb-ontology
+fi
+
+# Check if StemeDB is running
+print_step "Checking StemeDB connection..."
+if curl -s "${STEMEDB_URL}/v1/health" > /dev/null 2>&1; then
+    print_success "StemeDB is running at $STEMEDB_URL"
+else
+    print_error "StemeDB not reachable at $STEMEDB_URL"
+    echo -e "\nStart StemeDB with:"
+    echo -e " ${CYAN}cargo run -p stemedb-api${NC}"
+    exit 1
+fi
+
+# ============================================================================
+# STEP 1: Ingest FDA Data + Mock Conflicts
+# ============================================================================
+print_header "Step 1: Ingest FDA Data + Mock Conflicts"
+
+print_step "Ingesting FDA label data for Semaglutide and Tirzepatide..."
+print_step "Adding mock conflicts (simulating social media contradictions)..."
+echo ""
+
+$PHARMA_CLI --stemedb-url "$STEMEDB_URL" ingest "semaglutide,tirzepatide" --with-conflicts
+
+print_success "Data ingested with mock conflicts"
+wait_for_user
+
+# ============================================================================
+# STEP 2: Conflict Detection (Skeptic Lens)
+# ============================================================================
+print_header "Step 2: Conflict Detection (Skeptic Lens)"
+
+echo -e "${BOLD}Question:${NC} What do different sources say about Semaglutide's nausea rate?"
+echo -e "${BOLD}Lens:${NC} Skeptic (shows all claims, highlights disagreements)\n"
+
+print_step "Querying nausea_rate with Skeptic lens..."
+echo ""
+
+$PHARMA_CLI --stemedb-url "$STEMEDB_URL" query "Semaglutide" "nausea_rate" --mode skeptic
+
+echo -e "\n${YELLOW}Note:${NC} The conflict score indicates disagreement between sources."
+echo -e "FDA (Regulatory tier) reports clinical trial data."
+echo -e "Reddit (Anecdotal tier) reports user experiences."
+wait_for_user
+
+# ============================================================================
+# STEP 3: Source Hierarchy (Layered Consensus)
+# ============================================================================
+print_header "Step 3: Source Hierarchy (Layered Consensus)"
+
+echo -e "${BOLD}Question:${NC} What's the consensus on HbA1c reduction, broken down by source tier?"
+echo -e "${BOLD}Lens:${NC} LayeredConsensus (shows per-tier agreement)\n"
+
+print_step "Querying hba1c_reduction_percent with Layered Consensus..."
+echo ""
+
+$PHARMA_CLI --stemedb-url "$STEMEDB_URL" query "Semaglutide:Type2Diabetes" "hba1c_reduction_percent" --mode layered
+
+echo -e "\n${YELLOW}Note:${NC} Each tier shows its own consensus."
+echo -e "Higher tiers (Regulatory, Clinical) have more weight in final resolution."
+wait_for_user
+
+# ============================================================================
+# STEP 4: Drug Comparison
+# ============================================================================
+print_header "Step 4: Drug Comparison"
+
+echo -e "${BOLD}Question:${NC} How do Semaglutide and Tirzepatide compare on weight loss?"
+echo -e "${BOLD}Method:${NC} Side-by-side query of both subjects\n"
+
+print_step "Comparing weight_loss_percent..."
+echo ""
+
+$PHARMA_CLI --stemedb-url "$STEMEDB_URL" compare \
+    "Semaglutide:Type2Diabetes" \
+    "Tirzepatide:Type2Diabetes" \
+    --predicate "weight_loss_percent"
+
+echo -e "\n${YELLOW}Note:${NC} Both drugs' claims are shown with conflict scores."
+echo -e "A consumer can see both FDA data and community reports."
+wait_for_user
+
+# ============================================================================
+# STEP 5: Explore Available Data
+# ============================================================================
+print_header "Step 5: Explore Available Data"
+
+echo -e "${BOLD}Question:${NC} What predicates are available for Semaglutide?"
+echo -e "${BOLD}Method:${NC} List all predicates with assertions for this subject\n"
+
+print_step "Exploring Semaglutide predicates..."
+echo ""
+
+$PHARMA_CLI --stemedb-url "$STEMEDB_URL" explore "Semaglutide:Type2Diabetes"
+
+echo ""
+
+print_step "Exploring Semaglutide safety predicates..."
+echo ""
+
+$PHARMA_CLI --stemedb-url "$STEMEDB_URL" explore "Semaglutide"
+
+echo -e "\n${YELLOW}Note:${NC} Efficacy predicates use Drug:Indication subjects."
+echo -e "Safety predicates use Drug-only subjects (apply across indications)."
+wait_for_user
+
+# ============================================================================
+# STEP 6: JSON Output (for Integration)
+# ============================================================================
+print_header "Step 6: JSON Output (for Integration)"
+
+echo -e "${BOLD}Use Case:${NC} Programmatic access for AI agents or web apps"
+echo -e "${BOLD}Format:${NC} JSON output for parsing\n"
+
+print_step "Getting JSON response..."
+echo ""
+
+$PHARMA_CLI --stemedb-url "$STEMEDB_URL" --format json query "Semaglutide" "nausea_rate" | head -50
+
+echo -e "\n... (truncated for demo)"
+wait_for_user
+
+# ============================================================================
+# Summary
+# ============================================================================
+print_header "Demo Summary"
+
+echo -e "${GREEN}${BOLD}What We Demonstrated:${NC}\n"
+
+echo -e " 1. ${CYAN}Data Ingestion${NC}"
+echo -e " - FDA label extraction (Regulatory tier)"
+echo -e " - Mock social media conflicts (Anecdotal tier)"
+echo ""
+
+echo -e " 2. ${CYAN}Conflict Detection${NC}"
+echo -e " - Skeptic lens shows ALL claims"
+echo -e " - Conflict score quantifies disagreement"
+echo ""
+
+echo -e " 3. ${CYAN}Source Hierarchy${NC}"
+echo -e " - LayeredConsensus groups by authority tier"
+echo -e " - FDA data weighted higher than Reddit"
+echo ""
+
+echo -e " 4. ${CYAN}Drug Comparison${NC}"
+echo -e " - Side-by-side view of multiple subjects"
+echo -e " - Each drug's claims with provenance"
+echo ""
+
+echo -e " 5. ${CYAN}Data Exploration${NC}"
+echo -e " - Discover available predicates"
+echo -e " - Different subject patterns for efficacy vs safety"
+echo ""
+
+echo -e " 6. ${CYAN}API Integration${NC}"
+echo -e " - JSON output for programmatic access"
+echo -e " - Ready for AI agents and web apps"
+echo ""
+
+echo -e "${BOLD}Consumer Health Value Proposition:${NC}"
+echo -e " - ${GREEN}See all perspectives${NC}, not just the loudest"
+echo -e " - ${GREEN}Understand source authority${NC} (FDA vs. Reddit)"
+echo -e " - ${GREEN}Make informed decisions${NC} with conflict awareness"
+echo ""
+
+echo -e "${BOLD}Next Steps:${NC}"
+echo -e " - Run Consumer Health UAT: ${CYAN}cargo test -p stemedb-ontology --test consumer_health_uat${NC}"
+echo -e " - Read the guide: ${CYAN}docs/app-concepts/consumer-health.md${NC}"
+echo -e " - Try the Go SDK: ${CYAN}sdk/go/steme/${NC}"
+echo ""
+
+print_success "Demo complete!"
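
Review note on the prerequisites check: the script probes `${STEMEDB_URL}/v1/health` once and exits if the probe fails, which can trip when StemeDB was started a moment earlier and is still coming up. A retry wrapper is one possible way to soften that. The sketch below is not part of this PR; `wait_for_service` is a hypothetical helper name, and the attempt count and delay are illustrative values.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not in the PR): run a probe command repeatedly until it
# succeeds or the attempt budget is exhausted.
# Usage: wait_for_service <attempts> <delay_seconds> <command> [args...]
wait_for_service() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        # Probe output is discarded; only the exit status matters.
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        # No need to sleep after the final failed attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible use in the demo script (assumed values: 10 attempts, 1 s apart):
#   if wait_for_service 10 1 curl -sf "${STEMEDB_URL}/v1/health"; then
#       print_success "StemeDB is running at $STEMEDB_URL"
#   fi
```

The usage comment adds curl's `-f` (`--fail`) flag, which makes curl exit non-zero on HTTP error responses. With plain `-s`, as in the script, a server that answers the health endpoint with an error status would still be reported as running.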