stemedb/applications/aphoria/dogfood/cachewrap/SETUP-EVALUATION.md
jml e758f2ebfb feat(aphoria): implement programmatic extractors for Option<T> semantics
Completes Task #3 of httpclient dogfooding with 100% detection rate (7/7 violations).

## New Extractors

- **OptionBoundsExtractor**: Detects Option<T> fields set to None (unbounded)
- **OptionValueExtractor**: Extracts values from Some(n) for threshold checks

Both extractors use context-aware pattern matching to understand Rust Option<T>
semantics, which declarative extractors cannot handle.

## Implementation

**Files Created**:
- applications/aphoria/src/extractors/option_bounds.rs (257 lines)
- applications/aphoria/src/extractors/option_value.rs (277 lines)
- applications/aphoria/docs/examples/extractors/programmatic-option-semantics.md

**Files Modified**:
- applications/aphoria/src/extractors/mod.rs - Added module declarations
- applications/aphoria/src/extractors/registry.rs - Registered extractors
- applications/aphoria/dogfood/httpclient/.aphoria/claims.toml - Added 4 claims
- applications/aphoria/dogfood/httpclient/TASK-1-SUMMARY.md - Task #3 completion

## Results

| Metric | Value |
|--------|-------|
| Detection Rate | 100% (7/7 violations) |
| Improvement | +29 percentage points (from 71%) |
| New Violations | 2 (max_redirects, max_retries unbounded) |
| Unit Tests | 13 (all passing) |

## Two-Claim Strategy

For each bounded Option<T> field:
1. **configured** claim - Detects None (unbounded)
2. **max_value** claim - Validates Some(n) threshold

Example:
- `max_redirects: None` → CONFLICT (not configured)
- `max_redirects: Some(20)` → CONFLICT (exceeds 10)
- `max_redirects: Some(5)` → PASS
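
The decision order behind the two-claim strategy can be sketched in plain Rust. This only models the verdict for a value that has already been extracted. It is not the OptionBoundsExtractor or OptionValueExtractor code. `check_bounded` and `Verdict` are hypothetical names, and treating a value equal to the limit as a pass is an assumption.

```rust
/// Result of evaluating one bounded `Option<u32>` setting against both claims.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// `configured` claim violated: the field is `None` (unbounded).
    NotConfigured,
    /// `max_value` claim violated: `Some(value)` is above `max`.
    ExceedsMax { value: u32, max: u32 },
    /// Both claims hold.
    Pass,
}

/// Hypothetical check: the `configured` claim is evaluated first, so a `None`
/// never reaches the threshold comparison.
pub fn check_bounded(field: Option<u32>, max: u32) -> Verdict {
    match field {
        None => Verdict::NotConfigured,
        Some(value) if value > max => Verdict::ExceedsMax { value, max },
        Some(_) => Verdict::Pass,
    }
}
```

The calls `check_bounded(None, 10)`, `check_bounded(Some(20), 10)`, and `check_bounded(Some(5), 10)` reproduce the three `max_redirects` outcomes listed above.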

## Enterprise Quality

✓ Proper error handling (no unwrap/expect)
✓ Comprehensive tests (6+7 unit tests)
✓ Full documentation with examples
✓ Reusable for 10+ similar patterns
✓ Screening patterns for performance

## Cachewrap Dogfood

Also includes complete cachewrap dogfood exercise:
- 10 claims for Redis cache wrapper
- Day 1-5 summaries
- Full retrospective and evaluation
- Declarative extractors for all patterns

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 06:43:10 +00:00


Setup Evaluation: cachewrap Dogfood Project

Evaluation Date: 2026-02-11
Evaluator: Claude (Setup Review Agent)
Status: ⚠️ MOSTLY READY (2 gaps to fix before starting)


Executive Summary

The cachewrap dogfood project is about 90% ready, with excellent structure, hypothesis, and documentation. However, it is missing the critical Day 3 enhancements that were added to msgqueue after its Day 3 failure.

Must fix before Day 1:

  1. Add manual fallback format to Day 3 Phase 4
  2. Add debug workflow to Day 3 Phase 5

These fixes take ~10 minutes and prevent a Day 3 failure like the one msgqueue experienced.


Setup Checklist

Correctly Set Up

Directory Structure (Perfect)

cachewrap/
├── README.md                    ✅ Excellent (hypothesis, metrics, status)
├── plan.md                      ⚠️ Good (needs Day 3 updates)
├── .aphoria/
│   ├── config.toml              ✅ Perfect (persistent mode, 3 corpus sources)
│   └── claims.toml              ✅ Ready (empty with instructions)
├── docs/
│   └── sources/                 ✅ Perfect (3 authority sources)
│       ├── redis-spec.md        ✅ Template with extraction guide
│       ├── aws-elasticache.md   ✅ Template ready
│       └── redis-rs-lib.md      ✅ Template ready
└── src/
    └── .gitkeep                 ✅ Placeholder with instructions

All expected directories and files present.


README.md Quality (Excellent)

Hypothesis clearly stated:

"Connection patterns + resource limits + TTL semantics from 3 corpora (httpclient, dbpool, msgqueue) transfer to cache clients with 35-40% pattern reuse, demonstrating multi-domain flywheel strength"

Target metrics defined:

  • Time savings: ≥60% vs manual
  • Pattern reuse: ≥35% (7/20 claims)
  • Detection rate: ≥90% (9/10 violations)
  • Naming errors: <2
  • Total time: 12-16 hours

Difficulty calibrated: ★★★★☆ (harder than msgqueue ★★★☆☆)

Corpus overlap explained:

  • httpclient: 4 patterns (timeout, TLS, retry, async)
  • dbpool: 2 patterns (max_connections, lifecycle)
  • msgqueue: 1 pattern (metrics)
  • New: 13 cache-specific patterns
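
The overlap counts above add up to the README's reuse target. Below is a small Rust sanity check. The figures are copied from the list, and `reuse_summary` is an illustrative helper, not aphoria code.

```rust
/// Reused patterns per source corpus, copied from the list above.
const REUSED: [(&str, u32); 3] = [("httpclient", 4), ("dbpool", 2), ("msgqueue", 1)];

/// New cache-specific patterns, also from the list above.
const NEW_PATTERNS: u32 = 13;

/// Returns (reused claims, total claims, reuse percentage rounded down).
pub fn reuse_summary() -> (u32, u32, u32) {
    let reused: u32 = REUSED.iter().map(|&(_, n)| n).sum();
    let total = reused + NEW_PATTERNS;
    (reused, total, reused * 100 / total)
}
```

It returns `(7, 20, 35)`: 7 reused out of 20 total claims, which matches the ≥35% (7/20) pattern-reuse target.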

Violations categorized by type:

  • 3 security (key injection, TLS disabled, plaintext credentials)
  • 3 performance (missing TTL, unbounded size, sync blocking)
  • 3 correctness (no eviction, timeout=0, no pooling)
  • 1 observability (no metrics)

Cross-cutting nature emphasized: Tests whether flywheel works across security + performance + correctness boundaries simultaneously.

This is gold-standard README quality.


.aphoria/config.toml (Perfect)

Persistent mode enabled:

[episteme]
mode = "persistent"
corpus_db = "/home/jml/.aphoria/corpus-db"

3 corpus sources configured:

[corpus]
sources = ["httpclient", "dbpool", "msgqueue"]

Corpus flags enabled:

include_rfc = true
include_owasp = true
include_vendor = true
use_community = true

Inline markers enabled:

[extractors.inline_markers]
enabled = true
sync_to_pending = true

Comments explain extractor expectations:

# Built-in extractors that may detect violations:
# - hardcoded_secrets: Detects violation 3
# - tls_config: Detects violation 2
# - timeout_config: May detect violation 8
#
# Custom extractors needed (created on Day 3):
# - key_validation: Violation 1
# - ttl_presence: Violation 4
# ...

This config is production-ready.


Authority Sources (Very Good)

redis-spec.md (Tier 1):

  • Template structure correct
  • Extraction guide included
  • Key claims identified (TTL, eviction, key validation, connection pooling)
  • Placeholders for user to fill ("> User fills in: Fetch Redis command docs")

aws-elasticache.md (Tier 2):

  • Template ready
  • Best practices focus

redis-rs-lib.md (Tier 3):

  • Template ready
  • Community patterns focus

Minor improvement: Could pre-populate some quotes from well-known Redis docs, but templates are sufficient for dogfooding.


plan.md Day 1-2 (Excellent)

Day 1 process clear:

  • Step 1: Discover reusable patterns (30 min)
  • Step 2: Draft new claims (30 min)
  • Step 3: Author all claims (30 min)
  • Step 4: Verify claims (10 min)

Day 2 process detailed:

  • Files to create listed (config.rs, client.rs, error.rs, lib.rs)
  • Each violation mapped to file + line
  • Inline marker syntax shown
  • Test requirements specified (15+ tests)

Violations are realistic:

  • Not contrived (e.g., key injection via user input directly to Redis)
  • Have clear consequences
  • Inline markers documented

Day 1-2 are production-ready.


⚠️ Gaps to Fix (Day 3)

Gap 1: Missing Manual Fallback Format (Day 3 Phase 4)

Problem: plan.md Day 3 Phase 4 only shows skill invocation:

/aphoria-custom-extractor-creator \
  --violation "cache SET without TTL" \
  --claim "cache-004"

It does not show what to do if the skill is unavailable.

From the msgqueue evaluation, teams need a manual fallback with:

  1. Complete declarative extractor TOML format
  2. Emphasis that subject must EXACTLY match claim concept_path
  3. Validation steps BEFORE scanning
  4. Link to comprehensive reference doc

What's needed:

Add after Phase 4 skill invocations:

**If skill is unavailable:** You can manually create declarative extractors. Follow the format below:

**Manual Fallback (Declarative Extractor):**

Add to `.aphoria/config.toml` for EACH violation:

\```toml
[[extractors.declarative]]
name = "descriptive_name"
pattern = 'regex_pattern_matching_code'
languages = ["rust"]

[extractors.declarative.claim]
subject = "FULL_CLAIM_CONCEPT_PATH"  # ← Copy from claim's concept_path EXACTLY
predicate = "claim_predicate"
value = false  # Inverted: false if the claim expects true (and vice versa)
confidence = 0.95
\```

**⚠️ CRITICAL:** `subject` must EXACTLY match your claim's `concept_path`.

**Example (TTL presence):**
\```toml
[[extractors.declarative]]
name = "ttl_presence_check"
# No lookahead: grep -E and Rust's regex crate do not support (?!...).
# Assumes redis-rs style calls: .set( writes without a TTL, .set_ex( sets one.
pattern = '\.set\('
languages = ["rust"]

[extractors.declarative.claim]
subject = "cachewrap/cache/ttl"  # ← Matches claim concept_path exactly
predicate = "required"
value = false  # Observing "NOT required" (violation)
confidence = 0.95
\```

**Validation Before Scanning:**
\```bash
# 1. Check subject matches claim concept_path
grep "subject =" .aphoria/config.toml
grep "concept_path =" .aphoria/claims.toml
# Subjects should match concept_paths EXACTLY

# 2. Test regex pattern matches code
grep -rE '\.set\(' src/
# Should find the violation line

# 3. Verify TOML syntax
cargo install taplo-cli
taplo fmt --check .aphoria/config.toml
\```

**See also:** `../../docs/extractors/declarative-extractors.md` for complete reference.

Why this matters: msgqueue Day 3 failed TWICE because:

  1. First attempt: Skipped extractor creation entirely
  2. Second attempt: Created extractors with wrong subject format (missing prefix)

Manual fallback with validation prevents both failures.
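
Lookahead patterns such as `SET.*(?!EX|PX)` cannot express "SET without TTL". `grep -E` and Rust's `regex` crate have no lookahead syntax. In engines that do support lookahead, `.*` can run to the end of the line, where `(?!EX|PX)` trivially succeeds, so every line containing `SET` matches. A lookaround-free alternative is to match the no-TTL call itself.

The sketch below assumes the client uses redis-rs-style `.set(` for writes without a TTL and `.set_ex(` for writes with one; this is an assumption about the Day 2 code. `writes_without_ttl` is a hypothetical helper that performs the same line-level check with plain string matching.

```rust
/// Hypothetical line-level check for "cache SET without TTL".
/// Assumes redis-rs style calls: `.set(` writes without a TTL and `.set_ex(`
/// writes with one. `.set_ex(` does not contain the substring `.set(`,
/// so no lookahead is needed to exclude it.
pub fn writes_without_ttl(source: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| line.contains(".set("))
        .map(|(idx, _)| idx + 1) // 1-based line numbers, like `grep -n`
        .collect()
}
```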


Gap 2: Missing Debug Workflow (Day 3 Phase 5)

Problem: plan.md Day 3 Phase 5 shows the expected result but doesn't explain what to do if the detection rate is still 0%.

From the msgqueue evaluation: after creating 7 extractors, the team still had 0% detection because the extractor subject fields didn't match the claims' concept_path fields.

What's needed:

Add after Phase 5 scan commands:

**If detection rate is still 0% (extractors don't match claims):**

This means extractors ran but observations didn't align with claims. Debug:

\```bash
# Step 1: Verify observations were created
jq '.observations | length' scan-v2.json
# Expected: > 0 (if 0, patterns don't match code)

# Step 2: Compare observation paths vs claim paths
jq '.observations[].concept_path' scan-v2.json | sort -u
grep "concept_path =" .aphoria/claims.toml | sort -u
# Each observation path should end with the FULL claim path (a shorter shared tail is not enough)

# Step 3: Check for tail-path mismatch
# Example mismatch:
# - Observation: cache/ttl (extractor subject too short)
# - Claim: cachewrap/cache/ttl (needs full path)
# - Fix: Update extractor subject = "cachewrap/cache/ttl"

# Step 4: Verify predicate alignment
jq '.observations[].predicate' scan-v2.json | sort -u
grep "predicate =" .aphoria/claims.toml | sort -u
# Must match exactly
\```

**Common Issue:** Extractor `subject` doesn't match claim `concept_path`.
**Fix:** Update extractor subject to use full path matching claim.

**Example Fix:**
\```toml
# Before (WRONG):
[extractors.declarative.claim]
subject = "cache/ttl"  # ❌ Missing "cachewrap/" prefix

# After (CORRECT):
[extractors.declarative.claim]
subject = "cachewrap/cache/ttl"  # ✅ Matches claim exactly
\```

Re-scan after fixing:
\```bash
aphoria scan --format json > scan-v3.json
# Should now show 9/10 conflicts
\```

Why this matters: Without debug workflow, teams spend hours in trial-and-error. With it, they can diagnose and fix alignment issues in 10 minutes.
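
Steps 2-3 can also be automated. The Rust sketch below assumes exact matching, as the `subject`/`concept_path` rule requires. It treats an observation that equals only the tail of a claim path as the missing-prefix case. `classify` and `PathMatch` are hypothetical names. The paths would come from the `jq` and `grep` output above rather than from aphoria itself.

```rust
use std::collections::BTreeSet;

/// How an observed concept path relates to the claim concept_paths.
#[derive(Debug, PartialEq)]
pub enum PathMatch {
    /// Equal to a claim's concept_path, so the observation can align with it.
    Exact,
    /// Equal only to the trailing segments of this claim path (for example
    /// `cache/ttl` vs `cachewrap/cache/ttl`): the extractor subject lacks a prefix.
    TailOnly(String),
    /// Neither equal to nor a tail of any claim path.
    NoMatch,
}

/// Hypothetical diagnostic for Steps 2-3: classify one observation path.
pub fn classify(observed: &str, claims: &BTreeSet<String>) -> PathMatch {
    if claims.contains(observed) {
        return PathMatch::Exact;
    }
    // Compare on a `/` boundary so `ttl` does not count as a tail of `cache/xttl`.
    let tail = format!("/{observed}");
    match claims.iter().find(|claim| claim.ends_with(&tail)) {
        Some(claim) => PathMatch::TailOnly(claim.clone()),
        None => PathMatch::NoMatch,
    }
}
```

Every `TailOnly` result corresponds to the "missing `cachewrap/` prefix" fix: set the extractor subject to the returned claim path.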


Not Missing (But Expected)

These are intentionally absent or empty (correct for the pre-Day-1 state):

  • No Cargo.toml - Created on Day 2 when implementing code
  • No claims-template.sh - Optional (can use CLI directly)
  • No src/*.rs files - Created on Day 2
  • Empty claims.toml - Filled on Day 1 via /aphoria-claims
  • No DAY1-SUMMARY.md - Created after completing Day 1

Comparison: cachewrap vs msgqueue Setup

| Aspect | msgqueue (before fixes) | cachewrap (current) | Status |
|--------|-------------------------|---------------------|--------|
| Directory structure | Complete | Complete | Equal |
| README quality | Excellent | Excellent | Equal |
| Config.toml | Perfect | Perfect | Equal |
| Authority sources | Complete | Complete | Equal |
| Day 1-2 plan | Detailed | Detailed | Equal |
| Day 3 manual fallback | Missing → caused failure | Missing | Needs fix |
| Day 3 debug workflow | Missing → caused failure | Missing | Needs fix |

cachewrap is in the same state msgqueue was in BEFORE its Day 3 failures.

Good news: We know exactly what to add (manual fallback + debug workflow) because msgqueue failures taught us.


Validation Against Dogfooding Standards

From aphoria-dogfood Skill Requirements:

1. Test Something New (Hypothesis Required):

  • Clear hypothesis: "3 corpora → 35-40% reuse in 4th domain"
  • Specific and measurable

2. Reuse Is the Magic (30%+ Corpus Overlap):

  • Expected: 35% (7/20 claims)
  • Justified by pattern analysis (4 from httpclient, 2 from dbpool, 1 from msgqueue)

3. Violations Must Be Intentional (7-10 with Consequences):

  • 10 violations planned
  • Each has consequence
  • Each has inline marker syntax documented

4. Quantify Everything (Metrics Required):

  • Time savings: ≥60%
  • Pattern reuse: ≥35%
  • Detection rate: ≥90%
  • Naming errors: <2
  • Total time: 12-16 hours

5. Follow the 5-Day Arc:

  • Day 1: Claims (1-2 hrs)
  • Day 2: Implementation (3-4 hrs)
  • Day 3: Scanning (1.5-2 hrs)
  • Day 4: Remediation (3-4 hrs)
  • Day 5: Documentation (2-3 hrs)
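
Summing the per-day ranges is a quick consistency check on the 12-16 hour total. The Rust sketch below uses hour figures copied from the list; `arc_total` is an illustrative helper. It returns 10.5-15 hours, so the itemized days come in below the stated 12-16 hours. Either the per-day estimates or the total needs adjusting, unless the gap is deliberate buffer; that is an assumption the README does not state.

```rust
/// (day, min hours, max hours), copied from the 5-day arc above.
const ARC: [(&str, f64, f64); 5] = [
    ("Day 1: Claims", 1.0, 2.0),
    ("Day 2: Implementation", 3.0, 4.0),
    ("Day 3: Scanning", 1.5, 2.0),
    ("Day 4: Remediation", 3.0, 4.0),
    ("Day 5: Documentation", 2.0, 3.0),
];

/// Sums the per-day ranges into an overall (min, max) estimate in hours.
/// All inputs are multiples of 0.5, so the f64 sums are exact.
pub fn arc_total() -> (f64, f64) {
    ARC.iter()
        .fold((0.0, 0.0), |(lo, hi), &(_, day_lo, day_hi)| (lo + day_lo, hi + day_hi))
}
```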

All standards met except Day 3 manual fallback + debug workflow.


Difficulty Assessment

Rated: ★★★★☆ (4/5 stars)

Justification (from README):

  • Lower corpus overlap (35% vs msgqueue's 50%)
  • Cross-cutting violations (security + performance + correctness)
  • Stateful semantics (cache invalidation, TTL, consistency)
  • Subtle bugs (key injection, race conditions)

Time estimate: 12-16 hours (vs msgqueue's 8-10 hours)

Is this realistic?

Comparing to completed exercises:

  • httpclient: 8-10 hrs (baseline, 0% reuse): Realistic
  • msgqueue: 8-10 hrs (50% reuse): Realistic
  • cachewrap: 12-16 hrs (35% reuse, higher complexity): Realistic

Why longer despite corpus reuse:

  • 3 corpus sources = more discovery time (Day 1 takes longer)
  • 13 new patterns (vs msgqueue's 11) = more authoring (Day 1)
  • 10 violations (vs msgqueue's 8) = more implementation (Day 2)
  • Cross-cutting violations = more complex extractors (Day 3)

Difficulty rating is well-calibrated.


Domain Validation

Why Cache Client? (From README)

  • Tests multi-domain transfer: patterns from HTTP + DB + messaging → caching
  • Tests cross-cutting concerns: security + performance + correctness simultaneously
  • Tests stateful semantics: TTL, eviction, consistency (harder than stateless HTTP)
  • Tests corpus adaptability: 3 sources with 35% overlap

This is a valid progression:

  1. httpclient: Baseline (no corpus)
  2. dbpool: Single-source transfer (httpclient → dbpool)
  3. msgqueue: Dual-source transfer (httpclient + dbpool → msgqueue)
  4. cachewrap: Triple-source transfer (httpclient + dbpool + msgqueue → cache)

Each exercise increases complexity and validates a deeper aspect of the flywheel.


Corpus Overlap Analysis

Claimed Reuse (7/20 = 35%)

From httpclient (4 patterns):

  • timeout → cache timeout: Valid (connection timeout)
  • tls/certificate_validation → cache TLS: Valid (secure connection)
  • retry/max_attempts → cache retry: Valid (operation retry)
  • async/runtime → cache async: Valid (async I/O)

From dbpool (2 patterns):

  • max_connections → cache max connections: Valid (connection pooling)
  • connection_lifecycle → cache connection lifecycle: Valid (cleanup)

From msgqueue (1 pattern):

  • metrics/enabled → cache metrics: Valid (observability)

Assessment: All 7 reuse claims are legitimate pattern transfers. Not forced.


New Patterns (13 cache-specific)

  • TTL and expiration (3): Cache-specific
  • Key validation and injection (2): Cache-specific
  • Eviction policies (2): Cache-specific
  • Serialization and compression (2): Cache-specific
  • Consistency and sharding (2): Cache-specific
  • Circuit breaker, stampede prevention (2): Cache-specific

Assessment: 13 new patterns are genuinely cache-specific, not variations of existing patterns.

35% reuse estimate is realistic.


Recommendations

Immediate (Before Starting Day 1) - ~10 minutes

1. Add manual fallback to plan.md Day 3 Phase 4:

  • Copy format from dogfood/msgqueue/plan.md lines 303-341
  • Adapt example from msgqueue → cachewrap
  • Link to ../../docs/extractors/declarative-extractors.md

2. Add debug workflow to plan.md Day 3 Phase 5:

  • Copy format from dogfood/msgqueue/plan.md lines 342-385
  • Adapt commands for cachewrap (subject paths, predicates)

Impact: Prevents a Day 3 failure like the one msgqueue experienced (70 minutes wasted)


Optional (Before Starting) - ~30 minutes

3. Create claims-template.sh:

  • Batch script to create all 20 claims
  • Reduces Day 1 time from 1-2 hours → 45 minutes
  • See dogfood/httpclient/create-claims.sh for template

4. Pre-populate authority sources:

  • Add 2-3 actual quotes from Redis docs to redis-spec.md
  • Reduces Day 1 discovery time
  • But templates are sufficient - not critical

During Execution

5. Track detection rate pattern:

On Day 3, track:

  • Baseline scan: X/10 detected
  • After extractor creation: Y/10 detected
  • Expected: 0-2 → 9-10 (big improvement)

This validates the cross-domain flywheel hypothesis.
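
Tracking the baseline and post-extractor detection rates is simple arithmetic against the README's ≥90% target. Below is a minimal Rust sketch, assuming the counts are read off the scan reports by hand. `detection_pct` and `meets_target` are hypothetical helpers, not part of aphoria.

```rust
/// Detection rate as a whole-number percentage (rounded down).
/// Returns `None` when no violations were planned, instead of dividing by zero.
pub fn detection_pct(detected: u32, planned: u32) -> Option<u32> {
    (detected * 100).checked_div(planned)
}

/// True when `detected / planned` meets the ≥90% target.
/// Cross-multiplying keeps the comparison in integers, so exactly 9/10 counts as 90%.
pub fn meets_target(detected: u32, planned: u32) -> bool {
    planned > 0 && detected * 100 >= planned * 90
}
```

With the expected 0-2 → 9-10 shift, `meets_target(2, 10)` is false on the baseline scan and `meets_target(9, 10)` is true afterward.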

6. Compare to msgqueue metrics:

After Day 5, compare:

  • msgqueue: 50% reuse, 8-10 hours, 100% detection
  • cachewrap: 35% reuse, 12-16 hours, ≥90% detection

If cachewrap takes <60% more time despite 30% less reuse, the flywheel scales well.
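
The "30% less reuse" in this check is a relative figure. 35% versus 50% is 15 percentage points, which is 30% of msgqueue's reuse. The Rust sketch below applies the same relative calculation to both metrics; `relative_change_pct` is a hypothetical helper. Pairing the estimate endpoints (8→12 hours and 10→16 hours) yields +50% and +60%. At its upper estimate, cachewrap would therefore land exactly on the 60% line rather than under it.

```rust
/// Relative change from `base` to `new` in whole percent (negative = decrease).
/// Returns `None` when `base` is zero.
pub fn relative_change_pct(base: u32, new: u32) -> Option<i64> {
    ((new as i64 - base as i64) * 100).checked_div(base as i64)
}
```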


Final Verdict

Status: ⚠️ 90% Ready - Fix 2 Gaps

What's excellent:

  • README (hypothesis, metrics, difficulty)
  • Config (persistent mode, 3 corpus sources)
  • Day 1-2 plan (detailed, realistic)
  • Authority sources (templates ready)
  • Domain choice (validates multi-domain transfer)

What needs fixing:

  • ⚠️ Day 3 Phase 4: Add manual fallback format
  • ⚠️ Day 3 Phase 5: Add debug workflow

Time to fix: ~10 minutes

After fixes: Ready to start Day 1


Comparison to Gold Standard (httpclient)

| Aspect | httpclient | cachewrap | Rating |
|--------|------------|-----------|--------|
| Directory structure | | | Equal |
| README hypothesis | | | Equal |
| Config quality | | | Equal |
| Authority sources | (filled) | (templates) | Slightly lower |
| Day 1-2 plan | | | Equal |
| Day 3 plan | (complete) | (missing 2 features) | Needs update |
| Day 4-5 plan | | | Equal |

Overall: cachewrap is httpclient-quality except for Day 3 gaps (which are easy to fix).


Action Items

For Setup Owner (Do Now)

  • Copy manual fallback format from msgqueue to cachewrap plan.md Phase 4
  • Copy debug workflow from msgqueue to cachewrap plan.md Phase 5
  • Review additions for cachewrap-specific terminology
  • Commit changes

Time: 10 minutes

For Day 1 Executor (When Starting)

  • Read plan.md completely before starting
  • Verify /aphoria-suggest skill available
  • Verify /aphoria-claims skill available
  • Have docs/extractors/declarative-extractors.md open for reference

For Day 3 Executor (Critical)

  • DO NOT skip Phase 4 (extractor creation) - This is the flywheel validation
  • Follow 6-phase workflow exactly (pre-flight → scan → gap → create → verify → document)
  • If 0% detection after Phase 5 → Use debug workflow immediately
  • Document detection rate improvement (v1 → v2)

Evaluation complete: 2026-02-11

Next step: Fix 2 Day 3 gaps, then ready to start Day 1.