stemedb/applications/aphoria/dogfood/cachewrap/SETUP-EVALUATION.md
jml e758f2ebfb feat(aphoria): implement programmatic extractors for Option<T> semantics
Completes Task #3 of httpclient dogfooding with 100% detection rate (7/7 violations).

## New Extractors

- **OptionBoundsExtractor**: Detects Option<T> fields set to None (unbounded)
- **OptionValueExtractor**: Extracts values from Some(n) for threshold checks

Both extractors use context-aware pattern matching to understand Rust Option<T>
semantics, which declarative extractors cannot handle.

## Implementation

**Files Created**:
- applications/aphoria/src/extractors/option_bounds.rs (257 lines)
- applications/aphoria/src/extractors/option_value.rs (277 lines)
- applications/aphoria/docs/examples/extractors/programmatic-option-semantics.md

**Files Modified**:
- applications/aphoria/src/extractors/mod.rs - Added module declarations
- applications/aphoria/src/extractors/registry.rs - Registered extractors
- applications/aphoria/dogfood/httpclient/.aphoria/claims.toml - Added 4 claims
- applications/aphoria/dogfood/httpclient/TASK-1-SUMMARY.md - Task #3 completion

## Results

| Metric | Value |
|--------|-------|
| Detection Rate | 100% (7/7 violations) |
| Improvement | +29 percentage points (from 71%) |
| New Violations | 2 (max_redirects, max_retries unbounded) |
| Unit Tests | 13 (all passing) |

## Two-Claim Strategy

For each bounded Option<T> field:
1. **configured** claim - Detects None (unbounded)
2. **max_value** claim - Validates Some(n) threshold

Example:
- `max_redirects: None` → CONFLICT (not configured)
- `max_redirects: Some(20)` → CONFLICT (exceeds 10)
- `max_redirects: Some(5)` → PASS

## Enterprise Quality

✓ Proper error handling (no unwrap/expect)
✓ Comprehensive tests (6+7 unit tests)
✓ Full documentation with examples
✓ Reusable for 10+ similar patterns
✓ Screening patterns for performance

## Cachewrap Dogfood

Also includes complete cachewrap dogfood exercise:
- 10 claims for Redis cache wrapper
- Day 1-5 summaries
- Full retrospective and evaluation
- Declarative extractors for all patterns

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 06:43:10 +00:00

# Setup Evaluation: cachewrap Dogfood Project
**Evaluation Date:** 2026-02-11
**Evaluator:** Claude (Setup Review Agent)
**Status:** ⚠️ **MOSTLY READY** (2 gaps to fix before starting)
---
## Executive Summary
The cachewrap dogfood project is **90% correctly set up** with excellent structure, hypothesis, and documentation. However, it's **missing critical Day 3 enhancements** that were added to msgqueue after its Day 3 failure.
**Must fix before Day 1:**
1. Add manual fallback format to Day 3 Phase 4
2. Add debug workflow to Day 3 Phase 5
**These fixes take ~10 minutes and prevent a Day 3 failure like msgqueue experienced.**
---
## Setup Checklist
### ✅ Correctly Set Up
#### Directory Structure (Perfect)
```
cachewrap/
├── README.md ✅ Excellent (hypothesis, metrics, status)
├── plan.md ⚠️ Good (needs Day 3 updates)
├── .aphoria/
│ ├── config.toml ✅ Perfect (persistent mode, 3 corpus sources)
│ └── claims.toml ✅ Ready (empty with instructions)
├── docs/
│ └── sources/ ✅ Perfect (3 authority sources)
│ ├── redis-spec.md ✅ Template with extraction guide
│ ├── aws-elasticache.md ✅ Template ready
│ └── redis-rs-lib.md ✅ Template ready
└── src/
└── .gitkeep ✅ Placeholder with instructions
```
**All expected directories and files present.**
---
#### README.md Quality (⭐⭐⭐⭐⭐ Excellent)
**Hypothesis clearly stated:**
> "Connection patterns + resource limits + TTL semantics from 3 corpora (httpclient, dbpool, msgqueue) transfer to cache clients with 35-40% pattern reuse, demonstrating multi-domain flywheel strength"
**Target metrics defined:**
- Time savings: ≥60% vs manual
- Pattern reuse: ≥35% (7/20 claims)
- Detection rate: ≥90% (9/10 violations)
- Naming errors: <2
- Total time: 12-16 hours
**Difficulty calibrated:** ★★★★☆ (harder than msgqueue ★★★☆☆)
**Corpus overlap explained:**
- httpclient: 4 patterns (timeout, TLS, retry, async)
- dbpool: 2 patterns (max_connections, lifecycle)
- msgqueue: 1 pattern (metrics)
- New: 13 cache-specific patterns
**Violations categorized by type:**
- 3 security (key injection, TLS disabled, plaintext credentials)
- 3 performance (missing TTL, unbounded size, sync blocking)
- 3 correctness (no eviction, timeout=0, no pooling)
- 1 observability (no metrics)
**Cross-cutting nature emphasized:**
Tests whether flywheel works across security + performance + correctness boundaries simultaneously.
**This is gold-standard README quality.**
---
#### .aphoria/config.toml (⭐⭐⭐⭐⭐ Perfect)
**Persistent mode enabled:**
```toml
[episteme]
mode = "persistent"
corpus_db = "/home/jml/.aphoria/corpus-db"
```
**3 corpus sources configured:**
```toml
[corpus]
sources = ["httpclient", "dbpool", "msgqueue"]
```
**Corpus flags enabled:**
```toml
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
```
**Inline markers enabled:**
```toml
[extractors.inline_markers]
enabled = true
sync_to_pending = true
```
**Comments explain extractor expectations:**
```toml
# Built-in extractors that may detect violations:
# - hardcoded_secrets: Detects violation 3
# - tls_config: Detects violation 2
# - timeout_config: May detect violation 8
#
# Custom extractors needed (created on Day 3):
# - key_validation: Violation 1
# - ttl_presence: Violation 4
# ...
```
**This config is production-ready.**
---
#### Authority Sources (⭐⭐⭐⭐☆ Very Good)
**redis-spec.md (Tier 1):**
- Template structure correct
- Extraction guide included
- Key claims identified (TTL, eviction, key validation, connection pooling)
- Placeholders for user to fill ("> **User fills in:** Fetch Redis command docs")
**aws-elasticache.md (Tier 2):**
- ✅ Template ready
- ✅ Best practices focus
**redis-rs-lib.md (Tier 3):**
- ✅ Template ready
- ✅ Community patterns focus
**Minor improvement:** Could pre-populate some quotes from well-known Redis docs, but templates are sufficient for dogfooding.
---
#### plan.md Day 1-2 (⭐⭐⭐⭐⭐ Excellent)
**Day 1 process clear:**
- Step 1: Discover reusable patterns (30 min)
- Step 2: Draft new claims (30 min)
- Step 3: Author all claims (30 min)
- Step 4: Verify claims (10 min)
**Day 2 process detailed:**
- Files to create listed (config.rs, client.rs, error.rs, lib.rs)
- Each violation mapped to file + line
- Inline marker syntax shown
- Test requirements specified (15+ tests)
**Violations are realistic:**
- Not contrived (e.g., key injection via user input directly to Redis)
- Have clear consequences
- Inline markers documented
**Day 1-2 are production-ready.**
---
### ⚠️ Gaps to Fix (Day 3)
#### Gap 1: Missing Manual Fallback Format (Day 3 Phase 4)
**Problem:** plan.md Day 3 Phase 4 only shows skill invocation:
```bash
/aphoria-custom-extractor-creator \
--violation "cache SET without TTL" \
--claim "cache-004"
```
**It doesn't show what to do if the skill is unavailable.**
**From msgqueue evaluation:** Teams need a manual fallback that includes:
1. Complete declarative extractor TOML format
2. Emphasis that `subject` must EXACTLY match claim `concept_path`
3. Validation steps BEFORE scanning
4. Link to comprehensive reference doc
**What's needed:**
Add after Phase 4 skill invocations:
```markdown
**If skill is unavailable:** You can manually create declarative extractors. Follow the format below:
**Manual Fallback (Declarative Extractor):**
Add to `.aphoria/config.toml` for EACH violation:
\```toml
[[extractors.declarative]]
name = "descriptive_name"
pattern = 'regex_pattern_matching_code'
languages = ["rust"]
[extractors.declarative.claim]
subject = "FULL_CLAIM_CONCEPT_PATH" # ← Copy from claim's concept_path EXACTLY
predicate = "claim_predicate"
value = inverted_value # false if claim expects true
confidence = 0.95
\```
**⚠️ CRITICAL:** `subject` must EXACTLY match your claim's `concept_path`.
**Example (TTL presence):**
\```toml
[[extractors.declarative]]
name = "ttl_presence_check"
pattern = '\bSET\b(?!.*\b(?:EX|PX)\b)' # SET with no EX/PX later on the line; needs a regex engine with lookahead support
languages = ["rust"]
[extractors.declarative.claim]
subject = "cachewrap/cache/ttl" # ← Matches claim concept_path exactly
predicate = "required"
value = false # Observing "NOT required" (violation)
confidence = 0.95
\```
**Validation Before Scanning:**
\```bash
# 1. Check subject matches claim concept_path
grep "subject =" .aphoria/config.toml
grep "concept_path =" .aphoria/claims.toml
# Subjects should match concept_paths EXACTLY
# 2. Test regex pattern matches code (-P: lookahead needs PCRE, which -E does not support)
grep -rP '\bSET\b(?!.*\b(?:EX|PX)\b)' src/
# Should find the violation line
# 3. Verify TOML syntax
cargo install taplo-cli
taplo fmt --check .aphoria/config.toml
\```
**See also:** `../../docs/extractors/declarative-extractors.md` for complete reference.
```
**Why this matters:** msgqueue Day 3 failed TWICE because:
1. First attempt: Skipped extractor creation entirely
2. Second attempt: Created extractors with wrong `subject` format (missing prefix)
Manual fallback with validation prevents both failures.
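
The subject check in the fallback's validation step can also be scripted. This is a minimal sketch, assuming (as the `grep` commands in that step do) that every extractor subject appears on a `subject = "..."` line and every claim path on a `concept_path = "..."` line. The function names `extract_values` and `check_subjects` and the sample file contents are hypothetical, and the real `claims.toml` layout may differ. It lists each subject that has no exactly matching `concept_path`, which is the mismatch behind msgqueue's second failed attempt. The demo at the end writes two small files to a temp directory, so it runs without a project checkout.

```bash
#!/usr/bin/env bash
# Sketch: list extractor subjects that have no exactly matching claim concept_path.
# Assumption: each value sits on its own `key = "value"` line, as the grep checks assume.

# Print the quoted value of every `<key> = "..."` line in a file, sorted and de-duplicated.
extract_values() {
  local key="$1" file="$2"
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$file" | sort -u
}

# Print every subject missing from the claims file; return 1 if any were printed.
check_subjects() {
  local config="$1" claims="$2" claim_paths missing
  claim_paths=$(extract_values concept_path "$claims")
  missing=$(extract_values subject "$config" | while IFS= read -r subject; do
    # -F fixed string, -x whole line, -q quiet: exact matches only
    printf '%s\n' "$claim_paths" | grep -Fqx -- "$subject" || printf '%s\n' "$subject"
  done)
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
}

# Demo with hypothetical file contents (the real claims.toml layout may differ).
tmp=$(mktemp -d)
cat > "$tmp/config.toml" <<'EOF'
[[extractors.declarative]]
name = "ttl_presence_check"
[extractors.declarative.claim]
subject = "cache/ttl"
EOF
cat > "$tmp/claims.toml" <<'EOF'
concept_path = "cachewrap/cache/ttl"
EOF
check_subjects "$tmp/config.toml" "$tmp/claims.toml" || echo "subjects above have no matching concept_path"
```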
---
#### Gap 2: Missing Debug Workflow (Day 3 Phase 5)
**Problem:** plan.md Day 3 Phase 5 shows expected result but doesn't explain **what to do if detection rate is still 0%**.
**From msgqueue evaluation:** After creating 7 extractors, the team still had 0% detection because the extractors' `subject` fields didn't match the claims' `concept_path` fields.
**What's needed:**
Add after Phase 5 scan commands:
```markdown
**If detection rate is still 0% (extractors don't match claims):**
This means extractors ran but observations didn't align with claims. Debug:
\```bash
# Step 1: Verify observations were created
jq '.observations | length' scan-v2.json
# Expected: > 0 (if 0, patterns don't match code)
# Step 2: Compare observation paths vs claim paths
jq '.observations[].concept_path' scan-v2.json | sort -u
grep "concept_path =" .aphoria/claims.toml | sort -u
# Observation paths should END with same tail as claim paths
# Step 3: Check for tail-path mismatch
# Example mismatch:
# - Observation: cache/ttl (extractor subject too short)
# - Claim: cachewrap/cache/ttl (needs full path)
# - Fix: Update extractor subject = "cachewrap/cache/ttl"
# Step 4: Verify predicate alignment
jq '.observations[].predicate' scan-v2.json | sort -u
grep "predicate =" .aphoria/claims.toml | sort -u
# Must match exactly
\```
**Common Issue:** Extractor `subject` doesn't match claim `concept_path`.
**Fix:** Update extractor subject to use full path matching claim.
**Example Fix:**
\```toml
# Before (WRONG):
[extractors.declarative.claim]
subject = "cache/ttl" # ❌ Missing "cachewrap/" prefix
# After (CORRECT):
[extractors.declarative.claim]
subject = "cachewrap/cache/ttl" # ✅ Matches claim exactly
\```
Re-scan after fixing:
\```bash
aphoria scan --format json > scan-v3.json
# Should now show 9/10 conflicts
\```
```
**Why this matters:** Without debug workflow, teams spend hours in trial-and-error. With it, they can diagnose and fix alignment issues in 10 minutes.
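
The path comparison in Step 3 of the debug workflow can be expressed as a small helper. This is a sketch only: `classify_path` is a hypothetical name, and the code mirrors the manual comparison described above, not how Aphoria itself matches paths. It separates an exact match from the "subject too short" case, where the claim path ends with the observation path on a `/` boundary, and from an unrelated path.

```bash
#!/usr/bin/env bash
# Sketch: classify one observation path against one claim concept_path.
classify_path() {
  local obs="$1" claim="$2"
  case "$claim" in
    "$obs")   echo "exact" ;;           # paths are identical
    */"$obs") echo "missing-prefix" ;;  # claim ends with "/<obs>": extractor subject is too short
    *)        echo "mismatch" ;;        # unrelated paths
  esac
}

claim="cachewrap/cache/ttl"
for obs in "cachewrap/cache/ttl" "cache/ttl" "cache/eviction"; do
  printf '%s: %s\n' "$obs" "$(classify_path "$obs" "$claim")"
done
# cachewrap/cache/ttl: exact
# cache/ttl: missing-prefix
# cache/eviction: mismatch
```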
---
### ✅ Not Missing (Expected Absences)
These files are intentionally absent or empty, which is correct for the pre-Day-1 state:
- **No Cargo.toml** - Created on Day 2 when implementing code
- **No claims-template.sh** - Optional (can use CLI directly)
- **No src/*.rs files** - Created on Day 2
- **Empty claims.toml** - Filled on Day 1 via `/aphoria-claims`
- **No DAY1-SUMMARY.md** - Created after completing Day 1
---
## Comparison: cachewrap vs msgqueue Setup
| Aspect | msgqueue (before fixes) | cachewrap (current) | Status |
|--------|-------------------------|---------------------|--------|
| **Directory structure** | ✅ Complete | ✅ Complete | Equal |
| **README quality** | ✅ Excellent | ✅ Excellent | Equal |
| **Config.toml** | ✅ Perfect | ✅ Perfect | Equal |
| **Authority sources** | ✅ Complete | ✅ Complete (templates) | Equal |
| **Day 1-2 plan** | ✅ Detailed | ✅ Detailed | Equal |
| **Day 3 manual fallback** | ❌ Missing → caused failure | ❌ **Missing** | **Needs fix** |
| **Day 3 debug workflow** | ❌ Missing → caused failure | ❌ **Missing** | **Needs fix** |
**cachewrap is in the same state msgqueue was in BEFORE its Day 3 failures.**
**Good news:** We know exactly what to add (manual fallback + debug workflow) because the msgqueue failures showed us.
---
## Validation Against Dogfooding Standards
### From `aphoria-dogfood` Skill Requirements:
**1. Test Something New (Hypothesis Required):**
- Clear hypothesis: "3 corpora → 35-40% reuse in 4th domain"
- Specific and measurable
**2. Reuse Is the Magic (30%+ Corpus Overlap):**
- Expected: 35% (7/20 claims)
- Justified by pattern analysis (4 from httpclient, 2 from dbpool, 1 from msgqueue)
**3. Violations Must Be Intentional (7-10 with Consequences):**
- 10 violations planned
- Each has consequence
- Each has inline marker syntax documented
**4. Quantify Everything (Metrics Required):**
- Time savings: ≥60%
- Pattern reuse: ≥35%
- Detection rate: ≥90%
- Naming errors: <2
- Total time: 12-16 hours
**5. Follow the 5-Day Arc:**
- Day 1: Claims (1-2 hrs)
- Day 2: Implementation (3-4 hrs)
- Day 3: Scanning (1.5-2 hrs)
- Day 4: Remediation (3-4 hrs)
- Day 5: Documentation (2-3 hrs)
**All standards met except Day 3 manual fallback + debug workflow.**
---
## Difficulty Assessment
**Rated:** ★★★★☆ (4/5 stars)
**Justification (from README):**
- Lower corpus overlap (35% vs msgqueue's 50%)
- Cross-cutting violations (security + performance + correctness)
- Stateful semantics (cache invalidation, TTL, consistency)
- Subtle bugs (key injection, race conditions)
**Time estimate:** 12-16 hours (vs msgqueue's 8-10 hours)
**Is this realistic?**
Comparing to completed exercises:
- httpclient: 8-10 hrs (baseline, 0% reuse) ✅ Realistic
- msgqueue: 8-10 hrs (50% reuse) ✅ Realistic
- cachewrap: 12-16 hrs (35% reuse, higher complexity) ✅ **Realistic**
**Why longer despite the larger corpus:**
- 3 corpus sources = more discovery time (Day 1 takes longer)
- 13 new patterns (vs msgqueue's 11) = more authoring (Day 1)
- 10 violations (vs msgqueue's 8) = more implementation (Day 2)
- Cross-cutting violations = more complex extractors (Day 3)
**Difficulty rating is well-calibrated.**
---
## Domain Validation
### Why Cache Client? (From README)
**Tests multi-domain transfer:** Patterns from HTTP + DB + messaging → caching
**Tests cross-cutting concerns:** Security + performance + correctness simultaneously
**Tests stateful semantics:** TTL, eviction, consistency (harder than stateless HTTP)
**Tests corpus adaptability:** 3 sources with 35% overlap
**This is a valid progression:**
1. httpclient: Baseline (no corpus)
2. dbpool: Single-source transfer (httpclient → dbpool)
3. msgqueue: Dual-source transfer (httpclient + dbpool → msgqueue)
4. **cachewrap: Triple-source transfer (httpclient + dbpool + msgqueue → cache)**
Each exercise increases complexity and validates a deeper aspect of the flywheel.
---
## Corpus Overlap Analysis
### Claimed Reuse (7/20 = 35%)
**From httpclient (4 patterns):**
- `timeout` → cache timeout ✅ Valid (connection timeout)
- `tls/certificate_validation` → cache TLS ✅ Valid (secure connection)
- `retry/max_attempts` → cache retry ✅ Valid (operation retry)
- `async/runtime` → cache async ✅ Valid (async I/O)
**From dbpool (2 patterns):**
- `max_connections` → cache max connections ✅ Valid (connection pooling)
- `connection_lifecycle` → cache connection lifecycle ✅ Valid (cleanup)
**From msgqueue (1 pattern):**
- `metrics/enabled` → cache metrics ✅ Valid (observability)
**Assessment:** All 7 reuse claims are **legitimate pattern transfers**. Not forced.
---
### New Patterns (13 cache-specific)
- TTL and expiration (3) ✅ Cache-specific
- Key validation and injection (2) ✅ Cache-specific
- Eviction policies (2) ✅ Cache-specific
- Serialization and compression (2) ✅ Cache-specific
- Consistency and sharding (2) ✅ Cache-specific
- Circuit breaker, stampede prevention (2) ✅ Cache-specific
**Assessment:** 13 new patterns are **genuinely cache-specific**, not variations of existing patterns.
**35% reuse estimate is realistic.**
---
## Recommendations
### Immediate (Before Starting Day 1) - ~10 minutes
**1. Add manual fallback to plan.md Day 3 Phase 4:**
- Copy format from `dogfood/msgqueue/plan.md` lines 303-341
- Adapt example from msgqueue → cachewrap
- Link to `../../docs/extractors/declarative-extractors.md`
**2. Add debug workflow to plan.md Day 3 Phase 5:**
- Copy format from `dogfood/msgqueue/plan.md` lines 342-385
- Adapt commands for cachewrap (subject paths, predicates)
**Impact:** Prevents Day 3 failure like msgqueue experienced (70 minutes wasted)
---
### Optional (Before Starting) - ~30 minutes
**3. Create `claims-template.sh`:**
- Batch script to create all 20 claims
- Reduces Day 1 time from 1-2 hours → 45 minutes
- See `dogfood/httpclient/create-claims.sh` for template
**4. Pre-populate authority sources:**
- Add 2-3 actual quotes from Redis docs to `redis-spec.md`
- Reduces Day 1 discovery time
- But templates are sufficient - not critical
---
### During Execution
**5. Track detection rate pattern:**
On Day 3, track:
- Baseline scan: X/10 detected
- After extractor creation: Y/10 detected
- Expected: 0-2 → 9-10 (big improvement)
This validates the **cross-domain flywheel hypothesis**.
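
The before/after tracking can be reduced to a few lines of shell. This is a sketch with hypothetical counts: `baseline=2` and `after=9` are picked from the expected ranges above and are not measured results, and the helper names `detection_rate` and `meets_target` are illustrative. The default threshold of 90 is the README's ≥90% detection target.

```bash
#!/usr/bin/env bash
# Integer percentage of violations detected (detected * 100 / total, rounded down).
detection_rate() {
  echo $(( $1 * 100 / $2 ))
}

# Succeeds if the rate meets the target percentage (default 90).
meets_target() {
  [ "$(detection_rate "$1" "$2")" -ge "${3:-90}" ]
}

total=10
baseline=2   # hypothetical: upper end of the expected 0-2 baseline
after=9      # hypothetical: lower end of the expected 9-10
echo "baseline: $(detection_rate "$baseline" "$total")%"        # baseline: 20%
echo "after extractors: $(detection_rate "$after" "$total")%"   # after extractors: 90%
if meets_target "$after" "$total"; then echo "target met"; else echo "target missed"; fi
```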
**6. Compare to msgqueue metrics:**
After Day 5, compare:
- msgqueue: 50% reuse, 8-10 hours, 100% detection
- cachewrap: 35% reuse, 12-16 hours, 90% detection
If cachewrap takes **<60% more time** despite **30% less reuse** (35% vs 50%, a relative drop), the flywheel scales well.
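
The two percentages in this comparison are relative changes and can be checked with a one-line helper. This is a sketch using the planned estimates, not measured results. Pairing the low ends (8 vs 12 hours) and the high ends (10 vs 16 hours) of the two time ranges is an assumption about how to compare them, and `pct_change` is a hypothetical helper name. Under that pairing the planned time increase is 50-60%, so the high end sits exactly at the 60% threshold rather than below it.

```bash
#!/usr/bin/env bash
# Relative change from old to new, in percent, with one decimal place.
pct_change() {
  LC_ALL=C awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

pct_change 8 12    # low ends of the time estimates:  50.0
pct_change 10 16   # high ends of the time estimates: 60.0
pct_change 50 35   # pattern reuse, 50% to 35%:       -30.0
```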
---
## Final Verdict
### Status: ⚠️ **90% Ready - Fix 2 Gaps**
**What's excellent:**
- ⭐⭐⭐⭐⭐ README (hypothesis, metrics, difficulty)
- ⭐⭐⭐⭐⭐ Config (persistent mode, 3 corpus sources)
- ⭐⭐⭐⭐⭐ Day 1-2 plan (detailed, realistic)
- ⭐⭐⭐⭐☆ Authority sources (templates ready)
- ⭐⭐⭐⭐⭐ Domain choice (validates multi-domain transfer)
**What needs fixing:**
- Day 3 Phase 4: Add manual fallback format
- Day 3 Phase 5: Add debug workflow
**Time to fix:** ~10 minutes
**After fixes:** Ready to start Day 1
---
## Comparison to Gold Standard (httpclient)
| Aspect | httpclient | cachewrap | Rating |
|--------|-----------|-----------|--------|
| Directory structure | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| README hypothesis | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Config quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Authority sources | ⭐⭐⭐⭐⭐ (filled) | ⭐⭐⭐⭐☆ (templates) | Slightly lower |
| Day 1-2 plan | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Day 3 plan | ⭐⭐⭐⭐⭐ (complete) | ⭐⭐⭐☆☆ (missing 2 features) | **Needs update** |
| Day 4-5 plan | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
**Overall:** cachewrap is **httpclient-quality** except for Day 3 gaps (which are easy to fix).
---
## Action Items
### For Setup Owner (Do Now)
- [ ] Copy manual fallback format from msgqueue to cachewrap plan.md Phase 4
- [ ] Copy debug workflow from msgqueue to cachewrap plan.md Phase 5
- [ ] Review additions for cachewrap-specific terminology
- [ ] Commit changes
**Time:** 10 minutes
### For Day 1 Executor (When Starting)
- [ ] Read `plan.md` completely before starting
- [ ] Verify `/aphoria-suggest` skill available
- [ ] Verify `/aphoria-claims` skill available
- [ ] Have `docs/extractors/declarative-extractors.md` open for reference
### For Day 3 Executor (Critical)
- [ ] **DO NOT skip Phase 4 (extractor creation)** - This is the flywheel validation
- [ ] Follow 6-phase workflow exactly (pre-flight → scan → gap → create → verify → document)
- [ ] If 0% detection after Phase 5 → Use debug workflow immediately
- [ ] Document detection rate improvement (v1 → v2)
---
**Evaluation complete:** 2026-02-11
**Next step:** Fix 2 Day 3 gaps, then **ready to start Day 1**.