stemedb/applications/aphoria/dogfood/cachewrap/eval/gap-analysis-2026-02-11.md
jml e758f2ebfb feat(aphoria): implement programmatic extractors for Option<T> semantics
Completes Task #3 of httpclient dogfooding with 100% detection rate (7/7 violations).

## New Extractors

- **OptionBoundsExtractor**: Detects Option<T> fields set to None (unbounded)
- **OptionValueExtractor**: Extracts values from Some(n) for threshold checks

Both extractors use context-aware pattern matching to understand Rust Option<T>
semantics, which declarative extractors cannot handle.

## Implementation

**Files Created**:
- applications/aphoria/src/extractors/option_bounds.rs (257 lines)
- applications/aphoria/src/extractors/option_value.rs (277 lines)
- applications/aphoria/docs/examples/extractors/programmatic-option-semantics.md

**Files Modified**:
- applications/aphoria/src/extractors/mod.rs - Added module declarations
- applications/aphoria/src/extractors/registry.rs - Registered extractors
- applications/aphoria/dogfood/httpclient/.aphoria/claims.toml - Added 4 claims
- applications/aphoria/dogfood/httpclient/TASK-1-SUMMARY.md - Task #3 completion

## Results

| Metric | Value |
|--------|-------|
| Detection Rate | 100% (7/7 violations) |
| Improvement | +29 percentage points (from 71%) |
| New Violations | 2 (max_redirects, max_retries unbounded) |
| Unit Tests | 13 (all passing) |

## Two-Claim Strategy

For each bounded Option<T> field:
1. **configured** claim - Detects None (unbounded)
2. **max_value** claim - Validates Some(n) threshold

Example:
- `max_redirects: None` → CONFLICT (not configured)
- `max_redirects: Some(20)` → CONFLICT (exceeds 10)
- `max_redirects: Some(5)` → PASS

## Enterprise Quality

✓ Proper error handling (no unwrap/expect)
✓ Comprehensive tests (6+7 unit tests)
✓ Full documentation with examples
✓ Reusable for 10+ similar patterns
✓ Screening patterns for performance

## Cachewrap Dogfood

Also includes complete cachewrap dogfood exercise:
- 10 claims for Redis cache wrapper
- Day 1-5 summaries
- Full retrospective and evaluation
- Declarative extractors for all patterns

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 06:43:10 +00:00

9.6 KiB

Gap Analysis - cachewrap Documentation

Timestamp: 2026-02-11


CRITICAL FIRST CHECK: Aphoria Nature Question

Question: Did the team use LLM workflows (skills) or manual CLI?

Evidence Review:

Day 1 (Claims):

  • .aphoria/claims.toml shows created_by = "aphoria-suggest" for all 20 claims
  • DAY1-SUMMARY.md mentions "Pattern Discovery via LLM"
  • Time: 18.2 seconds per claim (suggests automation)
  • Verdict: USED /aphoria-suggest skill

Day 3 (Extractors):

  • DAY3-SUMMARY.md shows manual .aphoria/config.toml editing
  • 3 iterations with manual debugging (separate files → config → path alignment)
  • No mention of /aphoria-custom-extractor-creator skill invocations
  • gap-analysis.md mentions skill but no evidence of actual usage
  • Verdict: MANUAL workflow

Conclusion: PARTIAL Product Misunderstanding

Type: Documentation Gap (Not Product Misunderstanding)

Reason: Team used LLM skills for Day 1 but manual workflow for Day 3

Root Cause: Documentation failed to emphasize continuous LLM requirement across all phases. Skills presented as "recommended tools" not "core mechanism."

Impact:

  • Team experienced extractor creation challenges (3 iterations)
  • Manual workflow slower and more error-prone than autonomous
  • Knowledge capture happened but inefficiently

Recommendation:

  • Emphasize: "LLM workflows REQUIRED for ALL phases" (not just Day 1)
  • Distinguish: "Autonomous workflow" (skills) vs "Debug mode" (manual CLI)
  • See Finding 2 in main evaluation report

Gap 1: Day 3 Workflow Not Emphasized as Flywheel Core

Type: Missing Information + Unclear Instructions

Evidence:

  • Team thought (DAY3-SUMMARY.md:1-8): "Day 3: Scanning & Extractor Creation - 9 minutes"
  • Team did (DAY3-SUMMARY.md:80-152): 3 iterations before achieving 50% detection
    • Iteration 1: Created separate .toml files (wrong approach)
    • Iteration 2: Added to config.toml but concept path mismatch
    • Iteration 3: Fixed paths, achieved 50%
  • Doc said (plan.md:111): "Day 3: Scanning (1.5-2 hrs) - 6-phase workflow"

Root Cause: Day 3 presented as "validation day" when it's knowledge capture day (Steps 4-5 of flywheel)

Impact:

  • Time lost: None (team completed in 9 min vs 2 hr target)
  • Confusion level: Medium (3 iterations to find correct approach)
  • Blocker: No (team discovered correct pattern via trial and error)

Recommendation:

  • Where: plan.md Day 3 introduction (lines 111-115)
  • What to add:
    ## ⚠️ CRITICAL: Day 3 is Flywheel Steps 4-5
    
    This is NOT "run scan and check results." This IS:
    - Step 4: Identify claims without extractors (MISSING verdicts)
    - Step 5: Create extractors for those claims (autonomous learning)
    
    **Without extractor creation, NO knowledge is captured.**
    
    Evidence of correct execution:
    - `.aphoria/extractors/` directory with 8+ .toml files, OR
    - `.aphoria/config.toml` with `[[extractors.declarative]]` sections
    - `scan-v2.json` exists (verification scan AFTER extractor creation)
    - DAY3-SUMMARY.md documents detection rate improvement (v1 → v2)
    
  • Priority: High (Critical for flywheel narrative)

Gap 2: Continuous LLM Requirement Not Explicit

Type: Buried Information

Evidence:

  • Team thought: Skills are optional tools, manual CLI is primary
  • Team did:
    • Day 1: Used /aphoria-suggest skill
    • Day 3: Manually edited config.toml
  • Doc said (plan.md:121): "Skills: /aphoria-suggest, /aphoria-claims, /aphoria-custom-extractor-creator"
  • Doc said (README.md:142): Lists skills with "when to use"

Root Cause: Documentation doesn't distinguish "autonomous workflow" (LLM-driven) vs "manual CLI" (debug mode)

Impact:

  • Time lost: Unknown (team still completed fast)
  • Confusion level: Medium (used skills inconsistently)
  • Blocker: No (partial LLM usage still worked)

Recommendation:

  • Where: README.md top section + plan.md Day 1 & Day 3
  • What to add:
    ## 🤖 Autonomous Workflow (REQUIRED)
    
    Aphoria IS an LLM-driven continuous learning system. Skills ARE the product:
    
    - **Day 1:** `/aphoria-suggest` discovers patterns → `/aphoria-claims` authors claims
    - **Day 3:** `/aphoria-custom-extractor-creator` generates extractors for gaps
    
    **Manual CLI exists for debugging only.** If you find yourself:
    - Running `aphoria claims create` manually → Use `/aphoria-suggest` instead
    - Editing `.aphoria/config.toml` manually → Use `/aphoria-custom-extractor-creator`
    
    The dogfood exercise validates the **autonomous workflow**, not manual fallbacks.
    
  • Priority: High (Product positioning)

Gap 3: Detection Rate Target Not Contextualized

Type: Unclear Instructions

Evidence:

  • Team thought (DAY3-SUMMARY.md:18): "⚠️ Below target" (50% vs ≥90%)
  • Team did (DAY3-SUMMARY.md:186-229): Analyzed "Why 50% Instead of ≥90%?" with root causes
  • Doc said (plan.md:7): "Detection rate: ≥90% of violations"

Root Cause: Target implies built-in extractors should catch 90%, doesn't account for baseline scan expectations

Impact:

  • Time lost: 0 (team understood through analysis)
  • Confusion level: High (thought they failed)
  • Blocker: No (DAY5 retrospective clarified success)

Recommendation:

  • Where: plan.md Day 3 success criteria (lines 170-178)
  • What to add:
    **Detection Rate Expectations:**
    
    - **Baseline scan (v1):** 0-20% expected (built-in extractors don't know cache patterns)
    - **After declarative extractors (v2):** 50-70% achievable (regex limitations)
    - **After programmatic extractors (v3):** 90-100% target (AST analysis)
    
    **Success = improvement, not perfection.** 0% → 50% validates the flywheel.
    
  • Priority: High (Affects success interpretation)

Gap 4: Concept Path Alignment Not Pre-Explained

Type: Missing Information

Evidence:

  • Team thought: Extractor subject can be any string
  • Team did (DAY3-SUMMARY.md:126-148):
    • Iteration 2: claim.subject = "timeout" → tail config/timeout
    • Iteration 3: claim.subject = "cache/timeout" → tail cache/timeout
  • Doc said: config.toml has examples but no explanation

Root Cause: Tail-path matching algorithm not documented for extractor authors

Impact:

  • Time lost: ~1 minute (Iteration 2)
  • Confusion level: Medium
  • Blocker: No (discovered via trial and error)

Recommendation:

  • Where: plan.md Day 3 Phase 4 (lines 130-145)
  • What to add:
    **⚠️ Concept Path Alignment:**
    
    Extractor `claim.subject` creates observation tail-path (last 2 segments).
    
    Example:
    - Claim: `cache/timeout`
    - Subject: `timeout` → Tail: `config/timeout`- Subject: `cache/timeout` → Tail: `cache/timeout`**Pattern:** Prefix subjects with claim namespace.
    
  • Priority: Medium (Reduces iteration count)

Gap 5: Declarative Extractor Limitations Not Documented

Type: Buried Information

Evidence:

  • Team thought: Declarative extractors can detect any pattern
  • Team did (DAY3-SUMMARY.md:193-221):
    • 5/10 violations detected (50%)
    • 5 undetected due to: declaration vs value, escaping, multi-line, context
  • Doc said: No mention of trade-offs in plan.md or README.md

Root Cause: Extractor type selection guidance missing

Impact:

  • Time lost: 0 (discovered post-exercise)
  • Confusion level: Low (understood limitations through execution)
  • Blocker: No (50% validates mechanism)

Recommendation:

  • Where: plan.md Day 3 + docs/extractors/ guide
  • What to add:
    ## Declarative vs Programmatic Extractors
    
    **Use declarative (regex in config.toml) when:**
    - ✅ Config values (`max_size: None`)
    - ✅ Function signatures (`pub async fn get`)
    
    **Use programmatic (Rust code) when:**
    - ❌ Function bodies (need to see `validate_key()` call)
    - ❌ Multi-line patterns with context
    
    **Day 3:** Use declarative for speed. Refine to programmatic in Day 5 if needed.
    
  • Priority: Medium (Improves extractor selection)

Gap 6: Authority Tier Mapping Not Explicit

Type: Missing Information

Evidence:

  • Team thought: Tiers are subjective
  • Team did: Mix of "expert" and "community" tiers (reasonable choices)
  • Doc said (docs/sources/): Mentions tiers but no criteria

Root Cause: Decision framework for tier selection not documented

Impact:

  • Time lost: 0
  • Confusion level: Low
  • Blocker: No

Recommendation:

  • Where: plan.md Day 1 section
  • What to add:
    ## Authority Tier Selection
    
    | Tier | Source | When |
    |------|--------|------|
    | Tier 1 | RFCs, Standards | Normative (MUST) |
    | Tier 2 | Vendor docs | Official recommendations |
    | Tier 3 | Community | Implementation patterns |
    
  • Priority: Low (Nice-to-have)

Non-Gaps (Team Errors)

Error 1: Separate TOML Files

What team did wrong:

  • Created 10 .toml files in .aphoria/extractors/ directory
  • Assumed directory-based loading

Doc was clear:

  • config.toml:64 shows [[extractors.declarative]] syntax inline

Reason: Misread extractor configuration format

Impact: 1 minute wasted (Iteration 1)

Not a gap: Syntax was documented and visible


Summary

Total Gaps: 6 (3 Critical, 2 Medium, 1 Low)

Total Errors: 1 (Iteration 1 wrong approach)

Critical Pattern: Documentation presents Day 3 and LLM workflows as optional when they're core to the autonomous learning flywheel.

Recommendation: Emphasize REQUIRED status for:

  1. Day 3 extractor creation (Steps 4-5 of flywheel)
  2. Continuous LLM usage (skills for ALL phases)
  3. Detection rate context (0% → 50% → 90% progression)