# Programmatic Extractors: Option Semantics ## Overview This example demonstrates when and how to use **programmatic extractors** instead of declarative extractors. The problem: detecting when Rust `Option` configuration fields are set to `None` (unbounded) vs `Some(value)` (bounded). ## The Problem **Scenario**: HTTP client with configurable redirect and retry limits: ```rust pub struct ClientConfig { pub max_redirects: Option, // None = unbounded pub max_retries: Option, // None = unbounded } impl Default for ClientConfig { fn default() -> Self { Self { max_redirects: None, // ← VIOLATION: Allows infinite loops max_retries: None, // ← VIOLATION: Allows retry storms } } } ``` **Security/Safety Claims**: 1. Redirect limit MUST be configured (not unbounded) 2. Retry limit MUST be configured (not unbounded) ## Why Declarative Extractors Fail **Declarative extractors** (regex-based) have limitations: ```toml # ❌ This won't work reliably [[extractors.declarative]] name = "max_redirects_none" pattern = "max_redirects:\\s*None" predicate = "configured" value = false ``` **Problems**: 1. ❌ Can't distinguish struct field declarations from actual values in Default impl 2. ❌ Can't represent "unbounded" semantically for numeric comparison 3. ❌ Can't extract values from `Some(10)` vs `Some(100)` for threshold checks 4. ❌ Context-blind: doesn't know if a field is `Option` or not **Result**: ~50% detection rate on first dogfood attempt. ## The Programmatic Solution ### Two Extractors, Two Claims Strategy To fully validate Option bounded configuration, we need: 1. **OptionBoundsExtractor** - Detects `None` assignments (unbounded) 2. **OptionValueExtractor** - Extracts values from `Some(n)` (for threshold checks) ### Implementation 1: OptionBoundsExtractor **Purpose**: Detect when `Option` fields are set to `None`. ```rust pub struct OptionBoundsExtractor { /// Matches: pub field_name: Option field_pattern: Regex, /// Matches: field_name: None none_pattern: Regex, } impl Extractor for OptionBoundsExtractor { fn extract(&self, path_segments: &[String], content: &str, ...) -> Vec { // 1. Find all Option field declarations let option_fields = self.field_pattern .captures_iter(content) .map(|cap| cap[1].to_string()) .collect::>(); // 2. Find all None assignments let none_assignments = content.lines() .enumerate() .filter_map(|(idx, line)| { self.none_pattern.captures(line).map(|cap| { (cap[1].to_string(), idx + 1) }) }) .collect::>(); // 3. Match field names - if an Option field is set to None, it's unbounded for (field_name, line_num) in none_assignments { if option_fields.contains(&field_name) { observations.push(Observation { concept_path: format!("{}/{}", path, field_name), predicate: "configured", value: Boolean(false), // Not configured (unbounded) ... }); } } observations } } ``` **Key Logic**: - ✅ Only triggers when BOTH patterns match (field declaration + None assignment) - ✅ Context-aware: knows the field is `Option` - ✅ Creates semantic observation: `configured = false` ### Implementation 2: OptionValueExtractor **Purpose**: Extract actual values from `Some(n)` for threshold comparison. ```rust pub struct OptionValueExtractor { field_pattern: Regex, // pub field_name: Option some_pattern: Regex, // field_name: Some(value) } impl Extractor for OptionValueExtractor { fn extract(&self, ...) -> Vec { // 1. Find all Option fields let option_fields = self.field_pattern.captures_iter(content)...; // 2. Find all Some(value) assignments for (line_num, line) in content.lines().enumerate() { if let Some(cap) = self.some_pattern.captures(line) { let field_name = &cap[1]; let value = &cap[2]; if option_fields.contains(field_name) { observations.push(Observation { predicate: "max_value", value: Text(value.to_string()), // Extract for comparison ... }); } } } observations } } ``` ### Two-Claim Strategy For each bounded field, author **TWO claims**: **Claim 1: Must be configured** (OptionBoundsExtractor) ```toml [[claim]] id = "httpclient-max-redirects-configured" concept_path = "httpclient/max_redirects" predicate = "configured" value = true comparison = "equals" invariant = "Redirect limit MUST be configured (not unbounded)" consequence = "Unbounded redirects allow infinite loops, exhaust resources" ``` **Claim 2: Max value threshold** (OptionValueExtractor) ```toml [[claim]] id = "httpclient-max-redirects-threshold" concept_path = "httpclient/max_redirects" predicate = "max_value" value = 10.0 comparison = "equals" invariant = "Redirect limit MUST NOT exceed 10" consequence = "Excessive redirects waste bandwidth, delay responses" ``` ### Conflict Detection | Code | OptionBoundsExtractor | OptionValueExtractor | Result | |------|----------------------|---------------------|--------| | `max_redirects: None` | `configured = false` | *(no observation)* | **CONFLICT** with Claim 1 ✓ | | `max_redirects: Some(20)` | *(no observation)* | `max_value = "20"` | **CONFLICT** with Claim 2 ✓ | | `max_redirects: Some(5)` | *(no observation)* | `max_value = "5"` | **PASS** both claims ✓ | ## Results: Declarative vs Programmatic ### Task #1 (Declarative Only) - **Detection rate**: 71% (5/7 violations) - **Missed**: `max_redirects: None`, `max_retries: None` - **Reason**: Can't distinguish None in struct vs Default impl ### Task #3 (Hybrid: Declarative + Programmatic) - **Detection rate**: 100% (7/7 violations) - **Time**: ~7 hours (2 extractors + 4 claims + tests + docs) - **Reusability**: Template for any bounded Option field ## When to Use Programmatic Extractors ### Use Programmatic When: 1. **Context matters**: Need to understand surrounding code (e.g., "is this field Option?") 2. **Semantic understanding**: Need to represent "unbounded" or extract values for comparison 3. **Multi-pattern matching**: Need to correlate multiple patterns (declaration + assignment) 4. **Type-aware**: Need to know the field's type to interpret its value ### Use Declarative When: 1. **Simple patterns**: Static text matching (e.g., "hardcoded API key") 2. **No context needed**: Pattern is self-contained 3. **Rapid prototyping**: Quick validation before committing to programmatic 4. **90%+ accuracy**: Declarative achieves target detection rate ## Hybrid Strategy (Recommended) **Day 3 Workflow**: 1. **Start with declarative** (rapid prototyping, ~30 min) 2. **Measure detection rate** (run scan, check conflicts) 3. **If <70%**: Flag for refinement 4. **Day 5**: Create programmatic extractors for false negatives 5. **Re-scan**: Verify ≥90% detection 6. **Document**: Show before/after improvement **Example**: - Day 3: Declarative → 71% (5/7) - Day 5: Add programmatic → 100% (7/7) - Improvement: +29 percentage points ## Code Location **Extractors**: - `applications/aphoria/src/extractors/option_bounds.rs` - `applications/aphoria/src/extractors/option_value.rs` **Claims**: - `applications/aphoria/dogfood/httpclient/.aphoria/claims.toml` **Tests**: ```bash cargo test -p aphoria --lib extractors::option_bounds cargo test -p aphoria --lib extractors::option_value ``` ## Reusable Pattern This pattern works for any bounded Option configuration: | Field | Claim 1 (configured) | Claim 2 (threshold) | |-------|---------------------|---------------------| | `max_connections` | MUST be configured | ≤ 100 | | `max_lifetime` | MUST be configured | ≤ 3600s | | `pool_size` | MUST be configured | ≤ 50 | | `idle_timeout` | MUST be configured | ≤ 300s | **Extractor configuration**: ```rust fn verifiable_predicates(&self) -> Vec<(&str, &str)> { vec![ ("max_redirects", "configured"), // or "max_value" ("max_retries", "configured"), ("max_connections", "configured"), ("idle_timeout", "configured"), ] } ``` ## Enterprise Value This implementation demonstrates: 1. **Production quality**: Proper error handling, tests, documentation 2. **Reusability**: Template for any bounded configuration pattern 3. **Knowledge transfer**: Shows when/why to use programmatic extractors 4. **Flywheel completion**: Unblocks autonomous learning for Pilot 1 **Time investment**: 7 hours **Payoff**: Reusable for 10+ similar patterns across all dogfood exercises