# Documentation Evaluation Report

**Project:** applications/aphoria/dogfood/cachewrap
**Evaluation Date:** 2026-02-11
**Documentation Evaluated:**
- `cachewrap/README.md`
- `cachewrap/plan.md`
- `cachewrap/.aphoria/config.toml`
- `cachewrap/docs/sources/*.md`

**Team Phase:** Complete (Days 1-5)

---

## Executive Summary

**Overall Assessment:** Team completed cachewrap dogfood exercise in 1.4 hours (91% faster than 12-16 hour target) with 35% pattern reuse and 50% detection rate. **Partial use of LLM workflows** - team used `/aphoria-suggest` skill for Day 1 claims but manual workflow for Day 3 extractor creation, indicating documentation failed to emphasize **continuous** skill usage throughout all phases.

**Gaps Found:** 3 critical, 2 medium, 1 low
- **Critical Blockers:** 2 (Day 3 6-phase workflow, continuous LLM requirement)
- **Documentation Clarity:** 1 (detection rate expectations)
- **Medium Priority:** 2 (concept path alignment, extractor limitations)
- **Low Priority:** 1 (authority tier guidance)

**Team Errors (Not Gaps):** 1 (Iteration 1 separate TOML files)

**Key Finding:** Documentation presents Day 3 extractor creation as optional/debugging step rather than **REQUIRED flywheel phase**. Team skipped manual extractor creation initially, attempted it after baseline scan, but documentation didn't explain this is **Steps 4-5 of the autonomous loop** that must happen EVERY commit.

---

## Critical Findings (High Priority)

### Finding 1: Day 3 Workflow Not Emphasized as Flywheel Core

**Impact:** Team treated Day 3 as "run scan and look at output" instead of "identify gaps → create extractors"

**Evidence:**
- DAY3-SUMMARY.md shows 3 iterations before achieving 50% detection
- First iteration created separate .toml files (wrong approach)
- Second iteration added to config.toml but concept path mismatch
- Third iteration fixed paths
- **Total Day 3 time: 9 minutes** (extremely fast, suggests confusion resolved quickly)

**Documentation Said:**
- plan.md:111 - "Day 3: Scanning (1.5-2 hrs) - 6-phase workflow"
- plan.md:119 - Lists 6 phases including "Extractor creation" as Phase 4
- plan.md:132 - "Use `/aphoria-custom-extractor-creator` for each gap"

**What Was Missing:**
- **No emphasis on "REQUIRED" status** - presented as optional debugging
- **No connection to flywheel Steps 4-5** - didn't explain this IS the knowledge compounding mechanism
- **No example of correct execution** - team had to discover via trial and error
- **No pre-flight validation script** - no way to verify correct approach before starting

**Root Cause:** Documentation treats Day 3 as "validation day" when it's actually "**knowledge capture day**" - the step where autonomous learning happens.

**Recommendation:**
- **Where:** plan.md Day 3 section (lines 111-180)
- **What to add:**
  ```markdown
  ## ⚠️ CRITICAL: Day 3 is Flywheel Steps 4-5

  This is NOT "run scan and check results." This IS:
  - Step 4: Identify claims without extractors (MISSING verdicts)
  - Step 5: Create extractors for those claims (autonomous learning)

  **Without extractor creation, NO knowledge is captured.**

  Evidence of correct execution:
  - `.aphoria/extractors/` directory with 8+ .toml files, OR
  - `.aphoria/config.toml` with `[[extractors.declarative]]` sections
  - `scan-v2.json` exists (verification scan AFTER extractor creation)
  - DAY3-SUMMARY.md documents detection rate improvement (v1 → v2)

  If ANY are missing, Day 3 was NOT completed correctly.
  ```

- **Priority:** **BLOCKER** (affects entire autonomous learning narrative)

---

### Finding 2: Continuous LLM Requirement Not Explicit

**Impact:** Team used `/aphoria-suggest` skill on Day 1 but manual workflow on Day 3, missing that LLM workflows are required for BOTH phases

**Evidence:**
- `.aphoria/claims.toml` shows `created_by = "aphoria-suggest"` for all 20 claims ✅
- DAY3-SUMMARY.md shows manual `config.toml` editing (3 iterations) ❌
- No evidence of `/aphoria-custom-extractor-creator` invocations in daily summaries

**Documentation Said:**
- plan.md:121 - "Skills:" section lists `/aphoria-suggest`, `/aphoria-claims`, `/aphoria-custom-extractor-creator`
- plan.md:132 - "Use `/aphoria-custom-extractor-creator` for each gap"
- README.md:142 - Lists skills with "when to use" for each day

**What Was Missing:**
- **No "autonomous workflow" vs "manual CLI" distinction** - both presented as equal options
- **No emphasis on LLM requirement for Day 3** - skill mentioned but not marked as required
- **No explanation that manual extractor creation is DEBUG MODE** - team thought it was the primary workflow

**Root Cause:** Documentation inherited from dbpool/msgqueue which predated full autonomous workflows. Doesn't reflect 2026-02-10 updates emphasizing LLM as **core mechanism, not optional feature**.

**Recommendation:**
- **Where:** README.md top section + plan.md Day 1 & Day 3 introductions
- **What to add:**
  ```markdown
  ## 🤖 Autonomous Workflow (REQUIRED)

  Aphoria IS an LLM-driven continuous learning system. Skills ARE the product:

  - **Day 1:** `/aphoria-suggest` discovers patterns from 3 corpora → `/aphoria-claims` authors claims
  - **Day 3:** `/aphoria-custom-extractor-creator` generates extractors for missed claims

  **Manual CLI exists for debugging only.** If you find yourself:
  - Running `aphoria claims create` manually → Use `/aphoria-suggest` instead
  - Editing `.aphoria/config.toml` manually → Use `/aphoria-custom-extractor-creator` instead

  The dogfood exercise validates the **autonomous workflow**, not manual fallbacks.
  ```

- **Priority:** **BLOCKER** (contradicts product vision)

---

### Finding 3: Detection Rate Target Not Contextualized

**Impact:** Team achieved 50% detection but docs said ≥90%, creating confusion about whether exercise succeeded

**Evidence:**
- DAY3-SUMMARY.md:18 - "**Detection Rate (v3)**: ≥90% | 50% | -40% | ⚠️ Below target"
- DAY3-SUMMARY.md:186-229 - Section "Why 50% Instead of ≥90%?" analyzing root causes
- DAY5-SUMMARY.md documents success despite 50% detection

**Documentation Said:**
- plan.md:7 - "**Target Metrics:** Detection rate: ≥90% of violations"
- README.md:153 - "| Detection rate | ≥90% | Cross-cutting violation detection |"

**What Was Missing:**
- **No context on "first dogfood in new domain" expectations** - 0-50% detection is EXPECTED when corpus doesn't exist
- **No distinction between "built-in extractor detection" vs "after custom extractor creation"** - targets imply built-ins should catch 90%
- **No explanation that 50% with declarative extractors validates mechanism** - team thought they failed

**Root Cause:** Target was written assuming programmatic extractors (can hit 90%), but cachewrap used declarative extractors (50% ceiling due to regex limitations).

**Recommendation:**
- **Where:** plan.md Day 3 success criteria (lines 170-178)
- **What to add:**
  ```markdown
  **Detection Rate Expectations:**

  - **Baseline scan (v1):** 0-20% expected (built-in extractors don't know cache patterns)
  - **After declarative extractors (v2):** 50-70% achievable (regex pattern matching)
  - **After programmatic extractors (v3):** 90-100% target (AST analysis)

  **For this exercise:** 50% detection with declarative extractors **VALIDATES** the flywheel:
  - 0% → 50% proves knowledge compounding works
  - 50% ceiling proves declarative limitations (expected)
  - Remaining 5 violations require programmatic extractors (Day 5 refinement)

  **Success = improvement, not perfection.** The goal is proving the mechanism, not 100% coverage.
  ```

- **Priority:** **CRITICAL** (affects success interpretation)

---

## Medium Priority Improvements

### Gap 4: Concept Path Alignment Not Pre-Explained

**Type:** Missing Information

**Evidence:**
- DAY3-SUMMARY.md:126-148 - Iteration 2 failed due to concept path mismatch
- Team discovered: `claim.subject = "timeout"` → observation tail `config/timeout` (wrong)
- Fix: `claim.subject = "cache/timeout"` → observation tail `cache/timeout` (correct)

**Documentation Said:**
- config.toml has examples but no explanation of tail-path matching

**Impact:**
- Time lost: ~1 minute (iteration 2)
- Confusion level: Medium
- Blocker: No (discovered and fixed quickly)

**Recommendation:**
- **Where:** plan.md Day 3 Phase 4 (Extractor Creation)
- **What:** Add concept path alignment explanation:
  ```markdown
  **⚠️ Concept Path Alignment:**

  Extractor `claim.subject` creates observation tail-path (last 2 segments).
  This tail MUST match claim `concept_path` tail.

  Example:
  - Claim: `cache/timeout`
  - Extractor subject: `timeout` → Observation: `.../config/timeout` → Tail: `config/timeout` ❌
  - Extractor subject: `cache/timeout` → Observation: `.../cache/timeout` → Tail: `cache/timeout` ✅

  **Pattern:** Always prefix extractor subjects with claim namespace.
  ```
- **Priority:** MEDIUM (affects iteration count)

---

### Gap 5: Declarative Extractor Limitations Not Documented

**Type:** Buried Information

**Evidence:**
- DAY3-SUMMARY.md:300-305 - "Declarative extractors work best for: simple value patterns, function signatures"
- DAY3-SUMMARY.md:193-221 - 5 violations undetected due to pattern matching limitations
- DAY4-SUMMARY.md:212-240 - False negative analysis explaining regex can't see function bodies

**Documentation Said:**
- No mention of declarative vs programmatic trade-offs in plan.md or README.md

**Impact:**
- Time lost: 0 (discovered post-exercise)
- Confusion level: Low (understood through execution)
- Blocker: No (50% detection still validates mechanism)

**Recommendation:**
- **Where:** plan.md Day 3 section + docs/extractors/ guide
- **What:** Add extractor type decision tree:
  ```markdown
  ## Declarative vs Programmatic Extractors

  **Use declarative (regex in config.toml) when:**
  - ✅ Detecting config values (`max_size: None`)
  - ✅ Detecting function signatures (`pub async fn get`)
  - ✅ Simple line-based patterns

  **Use programmatic (Rust extractor) when:**
  - ❌ Need to inspect function bodies (`validate_key()` call inside `get()`)
  - ❌ Multi-line patterns with context
  - ❌ AST analysis (type checking, scope)

  **For Day 3:** Use declarative for speed. Refine to programmatic in Day 5 if <90% needed.
  ```
- **Priority:** MEDIUM (improves extractor selection)

---

## Low Priority Polish

### Gap 6: Authority Tier Mapping Not Explicit

**Type:** Missing Information

**Evidence:**
- claims.toml shows mix of "expert" and "community" tiers
- No clear guidance on when to use which tier

**Documentation Said:**
- docs/sources/ templates mention tiers but no decision criteria

**Impact:**
- Time lost: 0 (team made reasonable choices)
- Confusion level: Low
- Blocker: No

**Recommendation:**
- **Where:** plan.md Day 1 section
- **What:** Add tier decision table:
  ```markdown
  ## Authority Tier Selection

  | Tier | Source Type | Examples | When to Use |
  |------|-------------|----------|-------------|
  | Tier 1 (Standards) | RFCs, W3C, IETF | Redis protocol spec | Normative requirements (MUST) |
  | Tier 2 (Vendor) | AWS, Redis Labs | ElastiCache guide | Official recommendations |
  | Tier 3 (Community) | Library docs, Stack Overflow | redis-rs patterns | Implementation patterns |
  ```
- **Priority:** LOW (nice-to-have clarity)

---

## Team Errors (For Reference)

### Error 1: Separate TOML Files for Extractors

**What team did wrong:**
- Created 10 separate `.toml` files in `.aphoria/extractors/` directory
- Assumed Aphoria loads extractors from separate files

**Doc was clear:**
- config.toml:64 - Shows `[[extractors.declarative]]` syntax
- Examples in config show inline declarative extractors

**Reason:**
- Misread extractor configuration format (assumed directory-based loading)

**Time lost:** 1 minute

**Not a documentation gap** - config.toml syntax was correct and visible

---

## Recommended Actions

### Immediate (Before Next Dogfood)

1. **Update plan.md Day 3 section** - Add "⚠️ CRITICAL: Day 3 is Flywheel Steps 4-5" callout box
2. **Update README.md header** - Add "🤖 Autonomous Workflow (REQUIRED)" section
3. **Update plan.md metrics** - Add detection rate context (0% → 50% → 90% progression)

### Short Term (This Week)

1. **Create pre-flight validation script** - `scripts/validate-day3-execution.sh` that checks for:
   - `.aphoria/extractors/*.toml` OR `[[extractors.declarative]]` in config.toml
   - `scan-v2.json` exists
   - DAY3-SUMMARY.md exists
2. **Add concept path alignment guide** - `docs/extractors/concept-path-matching.md`
3. **Document extractor type trade-offs** - `docs/extractors/declarative-vs-programmatic.md`

### Long Term (Next Month)

1. **Create "Common Mistakes" guide** - Consolidate msgqueue + cachewrap learnings
2. **Add Day 3 execution video** - Screen recording showing correct 6-phase workflow
3. **Refactor all dogfood plans** - Apply learnings to httpclient, dbpool, msgqueue docs

---

## Appendices

- [Progress Log](./progress-log-2026-02-11.md) - Team daily summaries
- [Implementation Review](./implementation-review-2026-02-11.md) - Code analysis
- [Gap Analysis](./gap-analysis-2026-02-11.md) - Detailed gap categorization

---

## Metrics Summary

| Metric | Value |
|--------|-------|
| **Total Time** | 1.4 hours (Days 1-4) |
| **vs Target** | 12-16 hours → 91% faster |
| **Pattern Reuse** | 35% (7/20 claims from 3 corpora) |
| **Detection Rate** | 50% (5/10 violations with declarative extractors) |
| **Violations Fixed** | 10/10 (100%) |
| **Tests Passing** | 10/10 (100%) |

**Hypothesis Validated:** Multi-domain flywheel works (corpus reuse + extractor creation)

**Caveats:** 50% detection below 90% target due to declarative extractor limitations (expected)

**Conclusion:** Exercise succeeded at validating autonomous learning mechanism. Documentation gaps are **workflow emphasis** not fundamental flaws.

---

**Evaluation Status:** ✅ COMPLETE

**Next Steps:** Implement immediate recommendations before next dogfood exercise

**Evaluator:** aphoria-doc-evaluator skill
**Evaluation Duration:** Phase 1-4 systematic observation