stemedb/applications/aphoria/dogfood/cachewrap/SETUP-EVALUATION.md

# Setup Evaluation: cachewrap Dogfood Project

**Evaluation Date:** 2026-02-11
**Evaluator:** Claude (Setup Review Agent)
**Status:** ⚠️ **MOSTLY READY** (2 gaps to fix before starting)

---

## Executive Summary

The cachewrap dogfood project is **90% correctly set up** with excellent structure, hypothesis, and documentation. However, it's **missing critical Day 3 enhancements** that were added to msgqueue after its Day 3 failure.

**Must fix before Day 1:**
1. Add manual fallback format to Day 3 Phase 4
2. Add debug workflow to Day 3 Phase 5

**These fixes take ~10 minutes and prevent a Day 3 failure like the one msgqueue experienced.**

---

## Setup Checklist

### ✅ Correctly Set Up

#### Directory Structure (Perfect)
```
cachewrap/
├── README.md                    ✅ Excellent (hypothesis, metrics, status)
├── plan.md                      ⚠️ Good (needs Day 3 updates)
├── .aphoria/
│   ├── config.toml              ✅ Perfect (persistent mode, 3 corpus sources)
│   └── claims.toml              ✅ Ready (empty with instructions)
├── docs/
│   └── sources/                 ✅ Perfect (3 authority sources)
│       ├── redis-spec.md        ✅ Template with extraction guide
│       ├── aws-elasticache.md   ✅ Template ready
│       └── redis-rs-lib.md      ✅ Template ready
└── src/
    └── .gitkeep                 ✅ Placeholder with instructions
```

**All expected directories and files present.**

---

#### README.md Quality (⭐⭐⭐⭐⭐ Excellent)

✅ **Hypothesis clearly stated:**
> "Connection patterns + resource limits + TTL semantics from 3 corpora (httpclient, dbpool, msgqueue) transfer to cache clients with 35-40% pattern reuse, demonstrating multi-domain flywheel strength"

✅ **Target metrics defined:**
- Time savings: ≥60% vs manual
- Pattern reuse: ≥35% (7/20 claims)
- Detection rate: ≥90% (9/10 violations)
- Naming errors: <2
- Total time: 12-16 hours

✅ **Difficulty calibrated:** ★★★★☆ (harder than msgqueue ★★★☆☆)

✅ **Corpus overlap explained:**
- httpclient: 4 patterns (timeout, TLS, retry, async)
- dbpool: 2 patterns (max_connections, lifecycle)
- msgqueue: 1 pattern (metrics)
- New: 13 cache-specific patterns

✅ **Violations categorized by type:**
- 3 security (key injection, TLS disabled, plaintext credentials)
- 3 performance (missing TTL, unbounded size, sync blocking)
- 3 correctness (no eviction, timeout=0, no pooling)
- 1 observability (no metrics)

✅ **Cross-cutting nature emphasized:**
Tests whether flywheel works across security + performance + correctness boundaries simultaneously.

**This is gold-standard README quality.**

---

#### .aphoria/config.toml (⭐⭐⭐⭐⭐ Perfect)

✅ **Persistent mode enabled:**
```toml
[episteme]
mode = "persistent"
corpus_db = "/home/jml/.aphoria/corpus-db"
```

✅ **3 corpus sources configured:**
```toml
[corpus]
sources = ["httpclient", "dbpool", "msgqueue"]
```

✅ **Corpus flags enabled:**
```toml
include_rfc = true
include_owasp = true
include_vendor = true
use_community = true
```

✅ **Inline markers enabled:**
```toml
[extractors.inline_markers]
enabled = true
sync_to_pending = true
```

✅ **Comments explain extractor expectations:**
```toml
# Built-in extractors that may detect violations:
# - hardcoded_secrets: Detects violation 3
# - tls_config: Detects violation 2
# - timeout_config: May detect violation 8
#
# Custom extractors needed (created on Day 3):
# - key_validation: Violation 1
# - ttl_presence: Violation 4
# ...
```

**This config is production-ready.**

---

#### Authority Sources (⭐⭐⭐⭐☆ Very Good)

**redis-spec.md (Tier 1):**
- ✅ Template structure correct
- ✅ Extraction guide included
- ✅ Key claims identified (TTL, eviction, key validation, connection pooling)
- ✅ Placeholders for user to fill ("> **User fills in:** Fetch Redis command docs")

**aws-elasticache.md (Tier 2):**
- ✅ Template ready
- ✅ Best practices focus

**redis-rs-lib.md (Tier 3):**
- ✅ Template ready
- ✅ Community patterns focus

**Minor improvement:** Could pre-populate some quotes from well-known Redis docs, but templates are sufficient for dogfooding.

---

#### plan.md Day 1-2 (⭐⭐⭐⭐⭐ Excellent)

✅ **Day 1 process clear:**
- Step 1: Discover reusable patterns (30 min)
- Step 2: Draft new claims (30 min)
- Step 3: Author all claims (30 min)
- Step 4: Verify claims (10 min)

✅ **Day 2 process detailed:**
- Files to create listed (config.rs, client.rs, error.rs, lib.rs)
- Each violation mapped to file + line
- Inline marker syntax shown
- Test requirements specified (15+ tests)

✅ **Violations are realistic:**
- Not contrived (e.g., key injection via user input directly to Redis)
- Have clear consequences
- Inline markers documented

**Day 1-2 are production-ready.**

---

### ⚠️ Gaps to Fix (Day 3)

#### Gap 1: Missing Manual Fallback Format (Day 3 Phase 4)

**Problem:** plan.md Day 3 Phase 4 only shows skill invocation:

```bash
/aphoria-custom-extractor-creator \
  --violation "cache SET without TTL" \
  --claim "cache-004"
```

**It does not show what to do if the skill is unavailable.**

**From msgqueue evaluation:** Teams need manual fallback with:
1. Complete declarative extractor TOML format
2. Emphasis that `subject` must EXACTLY match claim `concept_path`
3. Validation steps BEFORE scanning
4. Link to comprehensive reference doc

**What's needed:**

Add after Phase 4 skill invocations:

```markdown
**If skill is unavailable:** You can manually create declarative extractors. Follow the format below:

**Manual Fallback (Declarative Extractor):**

Add to `.aphoria/config.toml` for EACH violation:

\```toml
[[extractors.declarative]]
name = "descriptive_name"
pattern = 'regex_pattern_matching_code'
languages = ["rust"]

[extractors.declarative.claim]
subject = "FULL_CLAIM_CONCEPT_PATH"  # ← Copy from claim's concept_path EXACTLY
predicate = "claim_predicate"
value = inverted_value  # false if claim expects true
confidence = 0.95
\```

**⚠️ CRITICAL:** `subject` must EXACTLY match your claim's `concept_path`.

**Example (TTL presence):**
\```toml
[[extractors.declarative]]
name = "ttl_presence_check"
pattern = 'SET(?!.*\b(?:EX|PX)\b)'  # Lookahead directly after SET; assumes the regex engine supports lookahead
languages = ["rust"]

[extractors.declarative.claim]
subject = "cachewrap/cache/ttl"  # ← Matches claim concept_path exactly
predicate = "required"
value = false  # Observing "NOT required" (violation)
confidence = 0.95
\```

**Validation Before Scanning:**
\```bash
# 1. Check subject matches claim concept_path
grep "subject =" .aphoria/config.toml
grep "concept_path =" .aphoria/claims.toml
# Subjects should match concept_paths EXACTLY

# 2. Test regex pattern matches code (-P is required: -E does not support lookahead)
grep -rP 'SET(?!.*\b(?:EX|PX)\b)' src/
# Should find the violation line

# 3. Verify TOML syntax
cargo install taplo-cli
taplo fmt --check .aphoria/config.toml
\```

**See also:** `../../docs/extractors/declarative-extractors.md` for complete reference.
```
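
One caveat on "missing TTL" patterns: a negative lookahead only rules out `EX`/`PX` when it sits directly after `SET` and scans the rest of the line. Placed after `.*`, it can always be satisfied by backtracking, so the pattern matches every `SET`. The sketch below demonstrates the difference with Python's `re` module. The sample lines are hypothetical, and whether aphoria's declarative extractors use a regex engine that supports lookahead is an assumption to verify (`grep -E`, for instance, does not support it, while `grep -P` does).

```python
import re

# Lookahead after ".*": backtracking always finds a position that is not
# followed by EX/PX (e.g. end of line), so this matches any line with SET.
LOOSE = re.compile(r"SET.*(?!EX|PX)")

# Lookahead right after SET, scanning the rest of the line for a TTL flag.
STRICT = re.compile(r"SET(?!.*\b(?:EX|PX)\b)")

# Hypothetical source lines, for illustration only.
with_ttl = 'conn.send("SET user:1 alice EX 60")'
without_ttl = 'conn.send("SET user:1 alice")'

for name, pattern in (("loose", LOOSE), ("strict", STRICT)):
    # Prints whether each pattern flags the TTL line and the no-TTL line.
    print(name, bool(pattern.search(with_ttl)), bool(pattern.search(without_ttl)))
```

Only the strict form distinguishes the two lines; the loose form flags both, which would report a violation on correct code.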

**Why this matters:** msgqueue Day 3 failed TWICE because:
1. First attempt: Skipped extractor creation entirely
2. Second attempt: Created extractors with wrong `subject` format (missing prefix)

Manual fallback with validation prevents both failures.

---

#### Gap 2: Missing Debug Workflow (Day 3 Phase 5)

**Problem:** plan.md Day 3 Phase 5 shows expected result but doesn't explain **what to do if detection rate is still 0%**.

**From msgqueue evaluation:** After creating 7 extractors, the team still had 0% detection because the extractors' `subject` fields didn't match the claims' `concept_path` fields.

**What's needed:**

Add after Phase 5 scan commands:

```markdown
**If detection rate is still 0% (extractors don't match claims):**

This means extractors ran but observations didn't align with claims. Debug:

\```bash
# Step 1: Verify observations were created
jq '.observations | length' scan-v2.json
# Expected: > 0 (if 0, patterns don't match code)

# Step 2: Compare observation paths vs claim paths
jq '.observations[].concept_path' scan-v2.json | sort -u
grep "concept_path =" .aphoria/claims.toml | sort -u
# Observation paths should END with same tail as claim paths

# Step 3: Check for tail-path mismatch
# Example mismatch:
# - Observation: cache/ttl (extractor subject too short)
# - Claim: cachewrap/cache/ttl (needs full path)
# - Fix: Update extractor subject = "cachewrap/cache/ttl"

# Step 4: Verify predicate alignment
jq '.observations[].predicate' scan-v2.json | sort -u
grep "predicate =" .aphoria/claims.toml | sort -u
# Must match exactly
\```

**Common Issue:** Extractor `subject` doesn't match claim `concept_path`.
**Fix:** Update extractor subject to use full path matching claim.

**Example Fix:**
\```toml
# Before (WRONG):
[extractors.declarative.claim]
subject = "cache/ttl"  # ❌ Missing "cachewrap/" prefix

# After (CORRECT):
[extractors.declarative.claim]
subject = "cachewrap/cache/ttl"  # ✅ Matches claim exactly
\```

Re-scan after fixing:
\```bash
aphoria scan --format json > scan-v3.json
# Should now show 9/10 conflicts
\```
```

**Why this matters:** Without a debug workflow, teams spend hours in trial-and-error. With it, they can diagnose and fix alignment issues in about 10 minutes.
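
The path comparison in Steps 2 and 3 of the debug workflow can be sketched in Python. This assumes the scan output has the shape implied by the `jq` commands (a top-level `observations` array whose items carry `concept_path`). `diagnose` is a hypothetical helper and the sample data is illustrative.

```python
import json


def diagnose(scan: dict, claim_paths: set[str]) -> list[str]:
    """Explain why observations fail to line up with claims."""
    observed = {o["concept_path"] for o in scan.get("observations", [])}
    if not observed:
        return ["no observations: extractor patterns did not match any code"]
    hints = []
    for path in sorted(observed - claim_paths):
        # A claim path ending in "/<observed path>" suggests a missing prefix.
        candidates = sorted(c for c in claim_paths if c.endswith("/" + path))
        if candidates:
            hints.append(f"{path}: subject too short, use {candidates[0]}")
        else:
            hints.append(f"{path}: no claim with this path")
    return hints


# Illustrative scan output reproducing the tail-path mismatch example.
scan = json.loads(
    '{"observations": [{"concept_path": "cache/ttl", "predicate": "required"}]}'
)
print(diagnose(scan, {"cachewrap/cache/ttl"}))
```

An empty result means every observed path already matches a claim exactly, so any remaining mismatch is more likely in the predicate (Step 4).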

---

### ✅ Absent by Design (Not Gaps)

These items are intentionally absent or empty, which is correct for the pre-Day-1 state:

- ✅ **No Cargo.toml** - Created on Day 2 when implementing code
- ✅ **No claims-template.sh** - Optional (can use CLI directly)
- ✅ **No src/*.rs files** - Created on Day 2
- ✅ **Empty claims.toml** - Filled on Day 1 via `/aphoria-claims`
- ✅ **No DAY1-SUMMARY.md** - Created after completing Day 1

---

## Comparison: cachewrap vs msgqueue Setup

| Aspect | msgqueue (before fixes) | cachewrap (current) | Status |
|--------|-------------------------|---------------------|--------|
| **Directory structure** | ✅ Complete | ✅ Complete | Equal |
| **README quality** | ✅ Excellent | ✅ Excellent | Equal |
| **Config.toml** | ✅ Perfect | ✅ Perfect | Equal |
| **Authority sources** | ✅ Complete | ✅ Templates ready | Equal |
| **Day 1-2 plan** | ✅ Detailed | ✅ Detailed | Equal |
| **Day 3 manual fallback** | ❌ Missing → caused failure | ❌ **Missing** | **Needs fix** |
| **Day 3 debug workflow** | ❌ Missing → caused failure | ❌ **Missing** | **Needs fix** |

**cachewrap is in the same state msgqueue was in BEFORE its Day 3 failures.**

**Good news:** We know exactly what to add (manual fallback + debug workflow) because the msgqueue failures showed us.

---

## Validation Against Dogfooding Standards

### From `aphoria-dogfood` Skill Requirements:

✅ **1. Test Something New (Hypothesis Required):**
- Clear hypothesis: "3 corpora → 35-40% reuse in 4th domain"
- Specific and measurable

✅ **2. Reuse Is the Magic (30%+ Corpus Overlap):**
- Expected: 35% (7/20 claims)
- Justified by pattern analysis (4 from httpclient, 2 from dbpool, 1 from msgqueue)

✅ **3. Violations Must Be Intentional (7-10 with Consequences):**
- 10 violations planned
- Each has consequence
- Each has inline marker syntax documented

✅ **4. Quantify Everything (Metrics Required):**
- Time savings: ≥60%
- Pattern reuse: ≥35%
- Detection rate: ≥90%
- Naming errors: <2
- Total time: 12-16 hours

✅ **5. Follow the 5-Day Arc:**
- Day 1: Claims (1-2 hrs)
- Day 2: Implementation (3-4 hrs)
- Day 3: Scanning (1.5-2 hrs)
- Day 4: Remediation (3-4 hrs)
- Day 5: Documentation (2-3 hrs)

**All standards met except Day 3 manual fallback + debug workflow.**

---

## Difficulty Assessment

**Rated:** ★★★★☆ (4/5 stars)

**Justification (from README):**
- Lower corpus overlap (35% vs msgqueue's 50%)
- Cross-cutting violations (security + performance + correctness)
- Stateful semantics (cache invalidation, TTL, consistency)
- Subtle bugs (key injection, race conditions)

**Time estimate:** 12-16 hours (vs msgqueue's 8-10 hours)

**Is this realistic?**

Comparing to completed exercises:
- httpclient: 8-10 hrs (baseline, 0% reuse) ✅ Realistic
- msgqueue: 8-10 hrs (50% reuse) ✅ Realistic
- cachewrap: 12-16 hrs (35% reuse, higher complexity) ✅ **Realistic**

**Why longer despite a larger corpus:**
- 3 corpus sources = more discovery time (Day 1 takes longer)
- 13 new patterns (vs msgqueue's 11) = more authoring (Day 1)
- 10 violations (vs msgqueue's 8) = more implementation (Day 2)
- Cross-cutting violations = more complex extractors (Day 3)

**Difficulty rating is well-calibrated.**

---

## Domain Validation

### Why Cache Client? (From README)

✅ **Tests multi-domain transfer:** Patterns from HTTP + DB + messaging → caching
✅ **Tests cross-cutting concerns:** Security + performance + correctness simultaneously
✅ **Tests stateful semantics:** TTL, eviction, consistency (harder than stateless HTTP)
✅ **Tests corpus adaptability:** 3 sources with 35% overlap

**This is a valid progression:**
1. httpclient: Baseline (no corpus)
2. dbpool: Single-source transfer (httpclient → dbpool)
3. msgqueue: Dual-source transfer (httpclient + dbpool → msgqueue)
4. **cachewrap: Triple-source transfer (httpclient + dbpool + msgqueue → cache)**

Each exercise increases complexity and validates a deeper aspect of the flywheel.

---

## Corpus Overlap Analysis

### Claimed Reuse (7/20 = 35%)

**From httpclient (4 patterns):**
- `timeout` → cache timeout ✅ Valid (connection timeout)
- `tls/certificate_validation` → cache TLS ✅ Valid (secure connection)
- `retry/max_attempts` → cache retry ✅ Valid (operation retry)
- `async/runtime` → cache async ✅ Valid (async I/O)

**From dbpool (2 patterns):**
- `max_connections` → cache max connections ✅ Valid (connection pooling)
- `connection_lifecycle` → cache connection lifecycle ✅ Valid (cleanup)

**From msgqueue (1 pattern):**
- `metrics/enabled` → cache metrics ✅ Valid (observability)

**Assessment:** All 7 reuse claims are **legitimate pattern transfers**. Not forced.

---

### New Patterns (13 cache-specific)

- TTL and expiration (3) ✅ Cache-specific
- Key validation and injection (2) ✅ Cache-specific
- Eviction policies (2) ✅ Cache-specific
- Serialization and compression (2) ✅ Cache-specific
- Consistency and sharding (2) ✅ Cache-specific
- Circuit breaker, stampede prevention (2) ✅ Cache-specific

**Assessment:** 13 new patterns are **genuinely cache-specific**, not variations of existing patterns.

**35% reuse estimate is realistic.**
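
The 35% figure follows directly from the counts listed in this section. A short arithmetic check, using only numbers stated in this evaluation (variable names are illustrative):

```python
# Reused patterns per source corpus, as listed under "Claimed Reuse".
reused = {"httpclient": 4, "dbpool": 2, "msgqueue": 1}

# New cache-specific patterns per group, as listed under "New Patterns".
new_patterns = {
    "TTL and expiration": 3,
    "Key validation and injection": 2,
    "Eviction policies": 2,
    "Serialization and compression": 2,
    "Consistency and sharding": 2,
    "Circuit breaker, stampede prevention": 2,
}

total_reused = sum(reused.values())
total_new = sum(new_patterns.values())
total = total_reused + total_new

# Integer comparison avoids floating-point edge cases at the 35% boundary.
meets_target = total_reused * 100 >= 35 * total

print(f"reused {total_reused}, new {total_new}, total {total}")
print(f"reuse rate {total_reused / total:.0%}, target met: {meets_target}")
```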

---

## Recommendations

### Immediate (Before Starting Day 1) - ~10 minutes

**1. Add manual fallback to plan.md Day 3 Phase 4:**
- Copy format from `dogfood/msgqueue/plan.md` lines 303-341
- Adapt example from msgqueue → cachewrap
- Link to `../../docs/extractors/declarative-extractors.md`

**2. Add debug workflow to plan.md Day 3 Phase 5:**
- Copy format from `dogfood/msgqueue/plan.md` lines 342-385
- Adapt commands for cachewrap (subject paths, predicates)

**Impact:** Prevents a Day 3 failure like the one msgqueue experienced (70 minutes wasted)

---

### Optional (Before Starting) - ~30 minutes

**3. Create `claims-template.sh`:**
- Batch script to create all 20 claims
- Reduces Day 1 time from 1-2 hours → 45 minutes
- See `dogfood/httpclient/create-claims.sh` for template

**4. Pre-populate authority sources:**
- Add 2-3 actual quotes from Redis docs to `redis-spec.md`
- Reduces Day 1 discovery time
- But templates are sufficient - not critical

---

### During Execution

**5. Track detection rate pattern:**

On Day 3, track:
- Baseline scan: X/10 detected
- After extractor creation: Y/10 detected
- Expected: 0-2 → 9-10 (big improvement)

This validates the **cross-domain flywheel hypothesis**.
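
A small helper can make the before/after comparison explicit. This is a sketch only: the counts passed in are placeholders for whatever the Day 3 scans actually report, and `detection_rate` and `meets_target` are hypothetical names.

```python
from fractions import Fraction

PLANNED_VIOLATIONS = 10      # violations planned for cachewrap
TARGET = Fraction(9, 10)     # detection-rate target (>= 90%)


def detection_rate(detected: int, planned: int = PLANNED_VIOLATIONS) -> Fraction:
    """Exact detection rate; rejects counts outside 0..planned."""
    if not 0 <= detected <= planned:
        raise ValueError(f"detected must be between 0 and {planned}")
    return Fraction(detected, planned)


def meets_target(detected: int) -> bool:
    return detection_rate(detected) >= TARGET


# Placeholder counts; replace with the real baseline and post-extractor results.
for label, detected in (("baseline", 2), ("after extractors", 9)):
    rate = detection_rate(detected)
    print(f"{label}: {detected}/{PLANNED_VIOLATIONS} = {float(rate):.0%}, "
          f"target met: {meets_target(detected)}")
```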

**6. Compare to msgqueue metrics:**

After Day 5, compare:
- msgqueue: 50% reuse, 8-10 hours, 100% detection
- cachewrap: 35% reuse, 12-16 hours, ≥90% detection

If cachewrap takes **<60% more time** (12-16 hours vs 8-10 hours) despite **30% less reuse in relative terms** (35% vs 50%, a drop of 15 percentage points), the flywheel scales well.
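
The thresholds in this comparison can be derived from the figures listed for the two projects. A quick arithmetic sketch follows; pairing the low ends and the high ends of the two time ranges is an assumption about how the comparison is meant.

```python
from fractions import Fraction

# Figures stated in this evaluation.
msgqueue_hours = (8, 10)
cachewrap_hours = (12, 16)
msgqueue_reuse = Fraction(50, 100)
cachewrap_reuse = Fraction(35, 100)

# Extra time for cachewrap, pairing low with low and high with high.
time_increase = [
    Fraction(c, m) - 1 for c, m in zip(cachewrap_hours, msgqueue_hours)
]

# Reuse drop expressed two ways: relative to msgqueue, and in percentage points.
reuse_drop_relative = (msgqueue_reuse - cachewrap_reuse) / msgqueue_reuse
reuse_drop_points = (msgqueue_reuse - cachewrap_reuse) * 100

print("time increase:", [f"{float(t):.0%}" for t in time_increase])
print(f"reuse drop: {float(reuse_drop_relative):.0%} relative, "
      f"{int(reuse_drop_points)} percentage points")
```

The relative drop and the percentage-point drop differ by a factor of two here, so it helps to state which one a threshold refers to.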

---

## Final Verdict

### Status: ⚠️ **90% Ready - Fix 2 Gaps**

**What's excellent:**
- ⭐⭐⭐⭐⭐ README (hypothesis, metrics, difficulty)
- ⭐⭐⭐⭐⭐ Config (persistent mode, 3 corpus sources)
- ⭐⭐⭐⭐⭐ Day 1-2 plan (detailed, realistic)
- ⭐⭐⭐⭐☆ Authority sources (templates ready)
- ⭐⭐⭐⭐⭐ Domain choice (validates multi-domain transfer)

**What needs fixing:**
- ⚠️ Day 3 Phase 4: Add manual fallback format
- ⚠️ Day 3 Phase 5: Add debug workflow

**Time to fix:** ~10 minutes

**After fixes:** ✅ Ready to start Day 1

---

## Comparison to Gold Standard (httpclient)

| Aspect | httpclient | cachewrap | Rating |
|--------|-----------|-----------|--------|
| Directory structure | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| README hypothesis | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Config quality | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Authority sources | ⭐⭐⭐⭐⭐ (filled) | ⭐⭐⭐⭐☆ (templates) | Slightly lower |
| Day 1-2 plan | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |
| Day 3 plan | ⭐⭐⭐⭐⭐ (complete) | ⭐⭐⭐☆☆ (missing 2 features) | **Needs update** |
| Day 4-5 plan | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Equal |

**Overall:** cachewrap is **httpclient-quality** except for Day 3 gaps (which are easy to fix).

---

## Action Items

### For Setup Owner (Do Now)

- [ ] Copy manual fallback format from msgqueue to cachewrap plan.md Phase 4
- [ ] Copy debug workflow from msgqueue to cachewrap plan.md Phase 5
- [ ] Review additions for cachewrap-specific terminology
- [ ] Commit changes

**Time:** 10 minutes

### For Day 1 Executor (When Starting)

- [ ] Read `plan.md` completely before starting
- [ ] Verify `/aphoria-suggest` skill available
- [ ] Verify `/aphoria-claims` skill available
- [ ] Have `docs/extractors/declarative-extractors.md` open for reference

### For Day 3 Executor (Critical)

- [ ] **DO NOT skip Phase 4 (extractor creation)** - This is the flywheel validation
- [ ] Follow 6-phase workflow exactly (pre-flight → scan → gap → create → verify → document)
- [ ] If 0% detection after Phase 5 → Use debug workflow immediately
- [ ] Document detection rate improvement (v1 → v2)

---

**Evaluation complete:** 2026-02-11

**Next step:** Fix 2 Day 3 gaps, then **ready to start Day 1**.