jml e758f2ebfb feat(aphoria): implement programmatic extractors for Option<T> semantics

Completes Task #3 of httpclient dogfooding with 100% detection rate (7/7 violations).

## New Extractors

- **OptionBoundsExtractor**: Detects Option<T> fields set to None (unbounded)
- **OptionValueExtractor**: Extracts values from Some(n) for threshold checks

Both extractors use context-aware pattern matching to understand Rust Option<T>
semantics, which declarative extractors cannot handle.
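
The commit does not show the extractor source, so the following is only a minimal sketch of the kind of line-level matching such a pair of extractors might perform. The names `OptionObservation` and `observe_option_field` are hypothetical, and the real extractors are described as context-aware, which this single-line version is not.

```rust
/// What one struct-literal line says about an `Option<T>` field.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum OptionObservation {
    /// `field: None` (no bound configured).
    Unbounded,
    /// `field: Some(n)` with a literal integer `n`.
    Bounded(u64),
}

/// Looks for `<field>: None` or `<field>: Some(<integer>)` on one source line.
/// Returns `None` when the line does not assign the field in either form.
fn observe_option_field(line: &str, field: &str) -> Option<OptionObservation> {
    // Cheap screening step: skip lines that never mention the field.
    let start = line.find(field)?;
    let rest = line[start + field.len()..].trim_start();
    let value = rest.strip_prefix(':')?.trim_start();

    if value.starts_with("None") {
        return Some(OptionObservation::Unbounded);
    }
    let inner = value.strip_prefix("Some(")?;
    let digits = &inner[..inner.find(')')?];
    // Allow Rust digit separators such as `1_000`.
    let n: u64 = digits.trim().replace('_', "").parse().ok()?;
    Some(OptionObservation::Bounded(n))
}
```

Calling `observe_option_field("max_redirects: Some(20),", "max_redirects")` yields `Some(OptionObservation::Bounded(20))`, while a line that assigns a computed value yields `None`. Every failure path goes through `?` rather than `unwrap`, in the spirit of the error-handling goal listed below.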

## Implementation

**Files Created**:
- applications/aphoria/src/extractors/option_bounds.rs (257 lines)
- applications/aphoria/src/extractors/option_value.rs (277 lines)
- applications/aphoria/docs/examples/extractors/programmatic-option-semantics.md

**Files Modified**:
- applications/aphoria/src/extractors/mod.rs - Added module declarations
- applications/aphoria/src/extractors/registry.rs - Registered extractors
- applications/aphoria/dogfood/httpclient/.aphoria/claims.toml - Added 4 claims
- applications/aphoria/dogfood/httpclient/TASK-1-SUMMARY.md - Task #3 completion

## Results

| Metric | Value |
|--------|-------|
| Detection Rate | 100% (7/7 violations) |
| Improvement | +29 percentage points (from 71%) |
| New Violations | 2 (max_redirects, max_retries unbounded) |
| Unit Tests | 13 (all passing) |

## Two-Claim Strategy

For each bounded Option<T> field:
1. **configured** claim - Detects None (unbounded)
2. **max_value** claim - Validates Some(n) threshold

Example:
- `max_redirects: None` → CONFLICT (not configured)
- `max_redirects: Some(20)` → CONFLICT (exceeds 10)
- `max_redirects: Some(5)` → PASS
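
The two-claim strategy can be sketched as a single `match`. This is an illustration only: `Verdict` and `check_bounded_option` are hypothetical names, and treating a value equal to the limit as passing is an assumption based on the wording "exceeds 10".

```rust
/// Outcome of applying both claims to one `Option<u32>` field.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    /// The `configured` claim failed: the field is `None`.
    NotConfigured,
    /// The `max_value` claim failed: `Some(n)` with `n` above the limit.
    ExceedsMax { value: u32, max: u32 },
}

fn check_bounded_option(field: Option<u32>, max: u32) -> Verdict {
    match field {
        // Claim 1 (configured): `None` means unbounded.
        None => Verdict::NotConfigured,
        // Claim 2 (max_value): assumed to fail only when strictly above the limit.
        Some(value) if value > max => Verdict::ExceedsMax { value, max },
        Some(_) => Verdict::Pass,
    }
}
```

With a limit of 10, `check_bounded_option(None, 10)` reports `NotConfigured`, `Some(20)` reports `ExceedsMax`, and `Some(5)` passes, mirroring the three example lines above.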

## Enterprise Quality

✓ Proper error handling (no unwrap/expect)
✓ Comprehensive tests (6+7 unit tests)
✓ Full documentation with examples
✓ Reusable for 10+ similar patterns
✓ Screening patterns for performance

## Cachewrap Dogfood

Also includes complete cachewrap dogfood exercise:
- 10 claims for Redis cache wrapper
- Day 1-5 summaries
- Full retrospective and evaluation
- Declarative extractors for all patterns

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-11 06:43:10 +00:00


Setup Evaluation: cachewrap Dogfood Project

Evaluation Date: 2026-02-11
Evaluator: Claude (Setup Review Agent)
Status: ⚠️ MOSTLY READY (2 gaps to fix before starting)

Executive Summary

The cachewrap dogfood project is 90% correctly set up with excellent structure, hypothesis, and documentation. However, it's missing critical Day 3 enhancements that were added to msgqueue after its Day 3 failure.

Must fix before Day 1:

Add manual fallback format to Day 3 Phase 4
Add debug workflow to Day 3 Phase 5

These fixes take ~10 minutes and prevent a Day 3 failure like the one msgqueue experienced.

Setup Checklist

✅ Correctly Set Up

Directory Structure (Perfect)

cachewrap/
├── README.md                    ✅ Excellent (hypothesis, metrics, status)
├── plan.md                      ⚠️ Good (needs Day 3 updates)
├── .aphoria/
│   ├── config.toml              ✅ Perfect (persistent mode, 3 corpus sources)
│   └── claims.toml              ✅ Ready (empty with instructions)
├── docs/
│   └── sources/                 ✅ Perfect (3 authority sources)
│       ├── redis-spec.md        ✅ Template with extraction guide
│       ├── aws-elasticache.md   ✅ Template ready
│       └── redis-rs-lib.md      ✅ Template ready
└── src/
    └── .gitkeep                 ✅ Placeholder with instructions

All expected directories and files present.

README.md Quality (⭐⭐⭐⭐⭐ Excellent)

✅ Hypothesis clearly stated:

"Connection patterns + resource limits + TTL semantics from 3 corpora (httpclient, dbpool, msgqueue) transfer to cache clients with 35-40% pattern reuse, demonstrating multi-domain flywheel strength"

✅ Target metrics defined:

Time savings: ≥60% vs manual
Pattern reuse: ≥35% (7/20 claims)
Detection rate: ≥90% (9/10 violations)
Naming errors: <2
Total time: 12-16 hours

✅ Difficulty calibrated: ★★★★☆ (harder than msgqueue ★★★☆☆)

✅ Corpus overlap explained:

httpclient: 4 patterns (timeout, TLS, retry, async)
dbpool: 2 patterns (max_connections, lifecycle)
msgqueue: 1 pattern (metrics)
New: 13 cache-specific patterns

✅ Violations categorized by type:

3 security (key injection, TLS disabled, plaintext credentials)
3 performance (missing TTL, unbounded size, sync blocking)
3 correctness (no eviction, timeout=0, no pooling)
1 observability (no metrics)

✅ Cross-cutting nature emphasized: Tests whether flywheel works across security + performance + correctness boundaries simultaneously.

This is gold-standard README quality.

.aphoria/config.toml (⭐⭐⭐⭐⭐ Perfect)

✅ Persistent mode enabled:

[episteme]
mode = "persistent"
corpus_db = "/home/jml/.aphoria/corpus-db"

✅ 3 corpus sources configured:

[corpus]
sources = ["httpclient", "dbpool", "msgqueue"]

✅ Corpus flags enabled:

include_rfc = true
include_owasp = true
include_vendor = true
use_community = true

✅ Inline markers enabled:

[extractors.inline_markers]
enabled = true
sync_to_pending = true

✅ Comments explain extractor expectations:

# Built-in extractors that may detect violations:
# - hardcoded_secrets: Detects violation 3
# - tls_config: Detects violation 2
# - timeout_config: May detect violation 8
#
# Custom extractors needed (created on Day 3):
# - key_validation: Violation 1
# - ttl_presence: Violation 4
# ...

This config is production-ready.

Authority Sources (⭐⭐⭐⭐☆ Very Good)

redis-spec.md (Tier 1):

✅ Template structure correct
✅ Extraction guide included
✅ Key claims identified (TTL, eviction, key validation, connection pooling)
✅ Placeholders for user to fill ("> User fills in: Fetch Redis command docs")

aws-elasticache.md (Tier 2):

✅ Template ready
✅ Best practices focus

redis-rs-lib.md (Tier 3):

✅ Template ready
✅ Community patterns focus

Minor improvement: Could pre-populate some quotes from well-known Redis docs, but templates are sufficient for dogfooding.

plan.md Day 1-2 (⭐⭐⭐⭐⭐ Excellent)

✅ Day 1 process clear:

Step 1: Discover reusable patterns (30 min)
Step 2: Draft new claims (30 min)
Step 3: Author all claims (30 min)
Step 4: Verify claims (10 min)

✅ Day 2 process detailed:

Files to create listed (config.rs, client.rs, error.rs, lib.rs)
Each violation mapped to file + line
Inline marker syntax shown
Test requirements specified (15+ tests)

✅ Violations are realistic:

Not contrived (e.g., key injection via user input directly to Redis)
Have clear consequences
Inline markers documented

Day 1-2 are production-ready.

⚠️ Gaps to Fix (Day 3)

Gap 1: Missing Manual Fallback Format (Day 3 Phase 4)

Problem: plan.md Day 3 Phase 4 only shows skill invocation:

/aphoria-custom-extractor-creator \
  --violation "cache SET without TTL" \
  --claim "cache-004"

It does not show what to do if the skill is unavailable.

From msgqueue evaluation: Teams need manual fallback with:

Complete declarative extractor TOML format
Emphasis that subject must EXACTLY match claim concept_path
Validation steps BEFORE scanning
Link to comprehensive reference doc

What's needed:

Add after Phase 4 skill invocations:

**If skill is unavailable:** You can manually create declarative extractors. Follow the format below:

**Manual Fallback (Declarative Extractor):**

Add to `.aphoria/config.toml` for EACH violation:

\```toml
[[extractors.declarative]]
name = "descriptive_name"
pattern = 'regex_pattern_matching_code'
languages = ["rust"]

[extractors.declarative.claim]
subject = "FULL_CLAIM_CONCEPT_PATH"  # ← Copy from claim's concept_path EXACTLY
predicate = "claim_predicate"
value = inverted_value  # false if claim expects true
confidence = 0.95
\```

**⚠️ CRITICAL:** `subject` must EXACTLY match your claim's `concept_path`.

**Example (TTL presence):**
\```toml
[[extractors.declarative]]
name = "ttl_presence_check"
pattern = '\bSET\b(?!.*\b(EX|PX)\b)'  # Lookahead placed right after SET; needs a regex engine with lookaround support
languages = ["rust"]

[extractors.declarative.claim]
subject = "cachewrap/cache/ttl"  # ← Matches claim concept_path exactly
predicate = "required"
value = false  # Observing "NOT required" (violation)
confidence = 0.95
\```

**Validation Before Scanning:**
\```bash
# 1. Check subject matches claim concept_path
grep "subject =" .aphoria/config.toml
grep "concept_path =" .aphoria/claims.toml
# Subjects should match concept_paths EXACTLY

# 2. Test regex pattern matches code (-P enables PCRE lookahead; requires GNU grep)
grep -rnP '\bSET\b(?!.*\b(EX|PX)\b)' src/
# Should find the violation line

# 3. Verify TOML syntax
cargo install taplo-cli
taplo fmt --check .aphoria/config.toml
\```

**See also:** `../../docs/extractors/declarative-extractors.md` for complete reference.

Why this matters: msgqueue Day 3 failed TWICE because:

First attempt: Skipped extractor creation entirely
Second attempt: Created extractors with wrong subject format (missing prefix)

Manual fallback with validation prevents both failures.
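
The TTL-presence check boils down to "a `SET` with no `EX` or `PX` later on the same line". A lookahead can express that, but engines without lookaround (Rust's `regex` crate is one) need another approach. The sketch below uses plain token scanning instead; `set_without_ttl` is a hypothetical helper, it is case-sensitive, and it only looks at a single line, so treat it as an illustration rather than how Aphoria matches patterns.

```rust
/// Returns true when `line` issues a Redis `SET` without an `EX` or `PX` option.
/// (Hypothetical helper: single-line, case-sensitive.)
fn set_without_ttl(line: &str) -> bool {
    // Split on anything that is not an identifier character, so `"SET"` and
    // `.arg("EX")` yield clean tokens while `SETEX` stays one token.
    let mut tokens = line
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .filter(|t| !t.is_empty());

    // Advance the iterator past the first standalone SET token.
    if !tokens.any(|t| t == "SET") {
        return false;
    }
    // Only tokens after SET remain; none of them may be EX or PX.
    !tokens.any(|t| t == "EX" || t == "PX")
}
```

Because `Iterator::any` stops at the first match, the second call only inspects tokens that follow `SET`, which is the same ordering constraint the lookahead expresses.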

Gap 2: Missing Debug Workflow (Day 3 Phase 5)

Problem: plan.md Day 3 Phase 5 shows expected result but doesn't explain what to do if detection rate is still 0%.

From msgqueue evaluation: After creating 7 extractors, the team still had 0% detection because the extractor subject fields didn't match the claim concept_path fields.

What's needed:

Add after Phase 5 scan commands:

**If detection rate is still 0% (extractors don't match claims):**

This means extractors ran but observations didn't align with claims. Debug:

\```bash
# Step 1: Verify observations were created
jq '.observations | length' scan-v2.json
# Expected: > 0 (if 0, patterns don't match code)

# Step 2: Compare observation paths vs claim paths
jq '.observations[].concept_path' scan-v2.json | sort -u
grep "concept_path =" .aphoria/claims.toml | sort -u
# Each observation path should end with the complete claim path

# Step 3: Check for tail-path mismatch
# Example mismatch:
# - Observation: cache/ttl (extractor subject too short)
# - Claim: cachewrap/cache/ttl (needs full path)
# - Fix: Update extractor subject = "cachewrap/cache/ttl"

# Step 4: Verify predicate alignment
jq '.observations[].predicate' scan-v2.json | sort -u
grep "predicate =" .aphoria/claims.toml | sort -u
# Must match exactly
\```

**Common Issue:** Extractor `subject` doesn't match claim `concept_path`.
**Fix:** Update extractor subject to use full path matching claim.

**Example Fix:**
\```toml
# Before (WRONG):
[extractors.declarative.claim]
subject = "cache/ttl"  # ❌ Missing "cachewrap/" prefix

# After (CORRECT):
[extractors.declarative.claim]
subject = "cachewrap/cache/ttl"  # ✅ Matches claim exactly
\```

Re-scan after fixing:
\```bash
aphoria scan --format json > scan-v3.json
# Should now show 9/10 conflicts
\```

Why this matters: Without debug workflow, teams spend hours in trial-and-error. With it, they can diagnose and fix alignment issues in 10 minutes.
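
The alignment check at the heart of this workflow can also be sketched in code. The example below classifies an extractor subject against the list of claim paths; `Alignment` and `diagnose_subject` are hypothetical names, and the inputs are plain strings rather than anything read from Aphoria's scan output.

```rust
/// How an extractor `subject` relates to the known claim paths.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Alignment {
    /// Subject equals a claim's concept_path.
    Exact,
    /// Subject is only the tail of a claim path; carries the full path to use instead.
    MissingPrefix(String),
    /// No claim path equals or ends with the subject.
    NoMatch,
}

fn diagnose_subject(subject: &str, claim_paths: &[&str]) -> Alignment {
    if claim_paths.contains(&subject) {
        return Alignment::Exact;
    }
    // Require a `/` before the tail so matches respect segment boundaries.
    let tail = format!("/{subject}");
    match claim_paths.iter().find(|p| p.ends_with(&tail)) {
        Some(full) => Alignment::MissingPrefix(full.to_string()),
        None => Alignment::NoMatch,
    }
}
```

For the mismatch described above, `diagnose_subject("cache/ttl", &["cachewrap/cache/ttl"])` returns `MissingPrefix("cachewrap/cache/ttl")`, which is exactly the value to copy into the extractor's `subject`.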

✅ Not Missing (But Expected)

These are intentionally empty (correct for pre-Day-1 state):

✅ No Cargo.toml - Created on Day 2 when implementing code
✅ No claims-template.sh - Optional (can use CLI directly)
✅ No src/*.rs files - Created on Day 2
✅ Empty claims.toml - Filled on Day 1 via /aphoria-claims
✅ No DAY1-SUMMARY.md - Created after completing Day 1

Comparison: cachewrap vs msgqueue Setup

| Aspect | msgqueue (before fixes) | cachewrap (current) | Status |
|--------|-------------------------|---------------------|--------|
| Directory structure | ✅ Complete | ✅ Complete | Equal |
| README quality | ✅ Excellent | ✅ Excellent | Equal |
| Config.toml | ✅ Perfect | ✅ Perfect | Equal |
| Authority sources | ✅ Complete | ✅ Complete | Equal |
| Day 1-2 plan | ✅ Detailed | ✅ Detailed | Equal |
| Day 3 manual fallback | ❌ Missing → caused failure | ❌ Missing | Needs fix |
| Day 3 debug workflow | ❌ Missing → caused failure | ❌ Missing | Needs fix |

cachewrap is in the same state msgqueue was in before its Day 3 failures.

Good news: We know exactly what to add (manual fallback + debug workflow) because msgqueue failures taught us.

Validation Against Dogfooding Standards

From `aphoria-dogfood` Skill Requirements:

✅ 1. Test Something New (Hypothesis Required):

Clear hypothesis: "3 corpora → 35-40% reuse in 4th domain"
Specific and measurable

✅ 2. Reuse Is the Magic (30%+ Corpus Overlap):

Expected: 35% (7/20 claims)
Justified by pattern analysis (4 from httpclient, 2 from dbpool, 1 from msgqueue)

✅ 3. Violations Must Be Intentional (7-10 with Consequences):

10 violations planned
Each has consequence
Each has inline marker syntax documented

✅ 4. Quantify Everything (Metrics Required):

Time savings: ≥60%
Pattern reuse: ≥35%
Detection rate: ≥90%
Naming errors: <2
Total time: 12-16 hours

✅ 5. Follow the 5-Day Arc:

Day 1: Claims (1-2 hrs)
Day 2: Implementation (3-4 hrs)
Day 3: Scanning (1.5-2 hrs)
Day 4: Remediation (3-4 hrs)
Day 5: Documentation (2-3 hrs)

All standards met except Day 3 manual fallback + debug workflow.

Difficulty Assessment

Rated: ★★★★☆ (4/5 stars)

Justification (from README):

Lower corpus overlap (35% vs msgqueue's 50%)
Cross-cutting violations (security + performance + correctness)
Stateful semantics (cache invalidation, TTL, consistency)
Subtle bugs (key injection, race conditions)

Time estimate: 12-16 hours (vs msgqueue's 8-10 hours)

Is this realistic?

Comparing to completed exercises:

httpclient: 8-10 hrs (baseline, 0% reuse) ✅ Realistic
msgqueue: 8-10 hrs (50% reuse) ✅ Realistic
cachewrap: 12-16 hrs (35% reuse, higher complexity) ✅ Realistic

Why longer despite corpus:

3 corpus sources = more discovery time (Day 1 takes longer)
13 new patterns (vs msgqueue's 11) = more authoring (Day 1)
10 violations (vs msgqueue's 8) = more implementation (Day 2)
Cross-cutting violations = more complex extractors (Day 3)

Difficulty rating is well-calibrated.

Domain Validation

Why Cache Client? (From README)

✅ Tests multi-domain transfer: Patterns from HTTP + DB + messaging → caching
✅ Tests cross-cutting concerns: Security + performance + correctness simultaneously
✅ Tests stateful semantics: TTL, eviction, consistency (harder than stateless HTTP)
✅ Tests corpus adaptability: 3 sources with 35% overlap

This is a valid progression:

httpclient: Baseline (no corpus)
dbpool: Single-source transfer (httpclient → dbpool)
msgqueue: Dual-source transfer (httpclient + dbpool → msgqueue)
cachewrap: Triple-source transfer (httpclient + dbpool + msgqueue → cache)

Each exercise increases complexity and validates a deeper aspect of the flywheel.

Corpus Overlap Analysis

Claimed Reuse (7/20 = 35%)

From httpclient (4 patterns):

timeout → cache timeout ✅ Valid (connection timeout)
tls/certificate_validation → cache TLS ✅ Valid (secure connection)
retry/max_attempts → cache retry ✅ Valid (operation retry)
async/runtime → cache async ✅ Valid (async I/O)

From dbpool (2 patterns):

max_connections → cache max connections ✅ Valid (connection pooling)
connection_lifecycle → cache connection lifecycle ✅ Valid (cleanup)

From msgqueue (1 pattern):

metrics/enabled → cache metrics ✅ Valid (observability)

Assessment: All 7 reuse claims are legitimate pattern transfers. Not forced.

New Patterns (13 cache-specific)

TTL and expiration (3) ✅ Cache-specific
Key validation and injection (2) ✅ Cache-specific
Eviction policies (2) ✅ Cache-specific
Serialization and compression (2) ✅ Cache-specific
Consistency and sharding (2) ✅ Cache-specific
Circuit breaker, stampede prevention (2) ✅ Cache-specific

Assessment: 13 new patterns are genuinely cache-specific, not variations of existing patterns.

35% reuse estimate is realistic.
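
The 35% figure follows directly from the counts above. As a quick arithmetic check, here is a small sketch; `reuse_percent` is a hypothetical helper and assumes at least one pattern in total.

```rust
/// Share of reused patterns, in percent, given per-corpus reuse counts
/// and the number of new patterns. (Hypothetical helper.)
fn reuse_percent(reused_per_corpus: &[(&str, u32)], new_patterns: u32) -> f64 {
    let reused: u32 = reused_per_corpus.iter().map(|(_, n)| n).sum();
    let total = reused + new_patterns;
    // Assumes total > 0; with no patterns at all the result would be NaN.
    100.0 * f64::from(reused) / f64::from(total)
}
```

With the counts from this section, `reuse_percent(&[("httpclient", 4), ("dbpool", 2), ("msgqueue", 1)], 13)` gives 7 reused out of 20 total, which is 35%.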

Recommendations

Immediate (Before Starting Day 1) - ~10 minutes

1. Add manual fallback to plan.md Day 3 Phase 4:

Copy format from dogfood/msgqueue/plan.md lines 303-341
Adapt example from msgqueue → cachewrap
Link to ../../docs/extractors/declarative-extractors.md

2. Add debug workflow to plan.md Day 3 Phase 5:

Copy format from dogfood/msgqueue/plan.md lines 342-385
Adapt commands for cachewrap (subject paths, predicates)

Impact: Prevents a Day 3 failure like the one msgqueue experienced (70 minutes wasted)

Optional (Before Starting) - ~30 minutes

3. Create claims-template.sh:

Batch script to create all 20 claims
Reduces Day 1 time from 1-2 hours → 45 minutes
See dogfood/httpclient/create-claims.sh for template

4. Pre-populate authority sources:

Add 2-3 actual quotes from Redis docs to redis-spec.md
Reduces Day 1 discovery time
But templates are sufficient - not critical

During Execution

5. Track detection rate pattern:

On Day 3, track:

Baseline scan: X/10 detected
After extractor creation: Y/10 detected
Expected: 0-2 → 9-10 (big improvement)

This validates the cross-domain flywheel hypothesis.

6. Compare to msgqueue metrics:

After Day 5, compare:

msgqueue: 50% reuse, 8-10 hours, 100% detection
cachewrap: 35% reuse, 12-16 hours, ≥90% detection

If cachewrap takes less than 60% more time than msgqueue despite 30% less reuse in relative terms (35% vs. 50%), the flywheel scales well.
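
The comparison is a pair of relative changes. A minimal sketch of the arithmetic follows; `relative_change_percent` is a hypothetical helper, and plugging in the ends of the estimated hour ranges is an assumption, since actual times are only known after Day 5.

```rust
/// Relative change from `baseline` to `value`, in percent.
/// Negative results mean a decrease. (Hypothetical helper; `baseline` must be non-zero.)
fn relative_change_percent(baseline: f64, value: f64) -> f64 {
    100.0 * (value - baseline) / baseline
}
```

For reuse, `relative_change_percent(50.0, 35.0)` is -30%. For time, the low ends of the estimates (8 → 12 hours) give +50% and the high ends (10 → 16 hours) give +60%, so the estimates sit right at the stated threshold and the measured hours decide the outcome.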

Final Verdict

Status: ⚠️ 90% Ready - Fix 2 Gaps

What's excellent:

⭐⭐⭐⭐⭐ README (hypothesis, metrics, difficulty)
⭐⭐⭐⭐⭐ Config (persistent mode, 3 corpus sources)
⭐⭐⭐⭐⭐ Day 1-2 plan (detailed, realistic)
⭐⭐⭐⭐☆ Authority sources (templates ready)
⭐⭐⭐⭐⭐ Domain choice (validates multi-domain transfer)

What needs fixing:

⚠️ Day 3 Phase 4: Add manual fallback format
⚠️ Day 3 Phase 5: Add debug workflow

Time to fix: ~10 minutes

After fixes: ✅ Ready to start Day 1

Comparison to Gold Standard (httpclient)

| Aspect | httpclient | cachewrap | Rating |
|--------|------------|-----------|--------|
| Directory structure | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| README hypothesis | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Config quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Authority sources | ⭐⭐⭐⭐⭐ (filled) | ⭐⭐⭐⭐☆ (templates) | Slightly lower |
| Day 1-2 plan | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Day 3 plan | ⭐⭐⭐⭐⭐ (complete) | ⭐⭐⭐☆☆ (missing 2 features) | Needs update |
| Day 4-5 plan | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |

Overall: cachewrap is httpclient-quality except for Day 3 gaps (which are easy to fix).

Action Items

For Setup Owner (Do Now)

Copy manual fallback format from msgqueue to cachewrap plan.md Phase 4
Copy debug workflow from msgqueue to cachewrap plan.md Phase 5
Review additions for cachewrap-specific terminology
Commit changes

Time: 10 minutes

For Day 1 Executor (When Starting)

Read plan.md completely before starting
Verify /aphoria-suggest skill available
Verify /aphoria-claims skill available
Have docs/extractors/declarative-extractors.md open for reference

For Day 3 Executor (Critical)

DO NOT skip Phase 4 (extractor creation) - This is the flywheel validation
Follow 6-phase workflow exactly (pre-flight → scan → gap → create → verify → document)
If 0% detection after Phase 5 → Use debug workflow immediately
Document detection rate improvement (v1 → v2)

Evaluation complete: 2026-02-11

Next step: Fix 2 Day 3 gaps, then ready to start Day 1.
