# Documentation Evaluation Report **Project:** applications/aphoria/dogfood/cachewrap **Evaluation Date:** 2026-02-11 **Documentation Evaluated:** - `cachewrap/README.md` - `cachewrap/plan.md` - `cachewrap/.aphoria/config.toml` - `cachewrap/docs/sources/*.md` **Team Phase:** Complete (Days 1-5) --- ## Executive Summary **Overall Assessment:** Team completed cachewrap dogfood exercise in 1.4 hours (91% faster than 12-16 hour target) with 35% pattern reuse and 50% detection rate. **Partial use of LLM workflows** - team used `/aphoria-suggest` skill for Day 1 claims but manual workflow for Day 3 extractor creation, indicating documentation failed to emphasize **continuous** skill usage throughout all phases. **Gaps Found:** 3 critical, 2 medium, 1 low - **Critical Blockers:** 2 (Day 3 6-phase workflow, continuous LLM requirement) - **Documentation Clarity:** 1 (detection rate expectations) - **Medium Priority:** 2 (concept path alignment, extractor limitations) - **Low Priority:** 1 (authority tier guidance) **Team Errors (Not Gaps):** 1 (Iteration 1 separate TOML files) **Key Finding:** Documentation presents Day 3 extractor creation as optional/debugging step rather than **REQUIRED flywheel phase**. Team skipped manual extractor creation initially, attempted it after baseline scan, but documentation didn't explain this is **Steps 4-5 of the autonomous loop** that must happen EVERY commit. --- ## Critical Findings (High Priority) ### Finding 1: Day 3 Workflow Not Emphasized as Flywheel Core **Impact:** Team treated Day 3 as "run scan and look at output" instead of "identify gaps → create extractors" **Evidence:** - DAY3-SUMMARY.md shows 3 iterations before achieving 50% detection - First iteration created separate .toml files (wrong approach) - Second iteration added to config.toml but concept path mismatch - Third iteration fixed paths - **Total Day 3 time: 9 minutes** (extremely fast, suggests confusion resolved quickly) **Documentation Said:** - plan.md:111 - "Day 3: Scanning (1.5-2 hrs) - 6-phase workflow" - plan.md:119 - Lists 6 phases including "Extractor creation" as Phase 4 - plan.md:132 - "Use `/aphoria-custom-extractor-creator` for each gap" **What Was Missing:** - **No emphasis on "REQUIRED" status** - presented as optional debugging - **No connection to flywheel Steps 4-5** - didn't explain this IS the knowledge compounding mechanism - **No example of correct execution** - team had to discover via trial and error - **No pre-flight validation script** - no way to verify correct approach before starting **Root Cause:** Documentation treats Day 3 as "validation day" when it's actually "**knowledge capture day**" - the step where autonomous learning happens. **Recommendation:** - **Where:** plan.md Day 3 section (lines 111-180) - **What to add:** ```markdown ## ⚠️ CRITICAL: Day 3 is Flywheel Steps 4-5 This is NOT "run scan and check results." This IS: - Step 4: Identify claims without extractors (MISSING verdicts) - Step 5: Create extractors for those claims (autonomous learning) **Without extractor creation, NO knowledge is captured.** Evidence of correct execution: - `.aphoria/extractors/` directory with 8+ .toml files, OR - `.aphoria/config.toml` with `[[extractors.declarative]]` sections - `scan-v2.json` exists (verification scan AFTER extractor creation) - DAY3-SUMMARY.md documents detection rate improvement (v1 → v2) If ANY are missing, Day 3 was NOT completed correctly. ``` - **Priority:** **BLOCKER** (affects entire autonomous learning narrative) --- ### Finding 2: Continuous LLM Requirement Not Explicit **Impact:** Team used `/aphoria-suggest` skill on Day 1 but manual workflow on Day 3, missing that LLM workflows are required for BOTH phases **Evidence:** - `.aphoria/claims.toml` shows `created_by = "aphoria-suggest"` for all 20 claims ✅ - DAY3-SUMMARY.md shows manual `config.toml` editing (3 iterations) ❌ - No evidence of `/aphoria-custom-extractor-creator` invocations in daily summaries **Documentation Said:** - plan.md:121 - "Skills:" section lists `/aphoria-suggest`, `/aphoria-claims`, `/aphoria-custom-extractor-creator` - plan.md:132 - "Use `/aphoria-custom-extractor-creator` for each gap" - README.md:142 - Lists skills with "when to use" for each day **What Was Missing:** - **No "autonomous workflow" vs "manual CLI" distinction** - both presented as equal options - **No emphasis on LLM requirement for Day 3** - skill mentioned but not marked as required - **No explanation that manual extractor creation is DEBUG MODE** - team thought it was the primary workflow **Root Cause:** Documentation inherited from dbpool/msgqueue which predated full autonomous workflows. Doesn't reflect 2026-02-10 updates emphasizing LLM as **core mechanism, not optional feature**. **Recommendation:** - **Where:** README.md top section + plan.md Day 1 & Day 3 introductions - **What to add:** ```markdown ## 🤖 Autonomous Workflow (REQUIRED) Aphoria IS an LLM-driven continuous learning system. Skills ARE the product: - **Day 1:** `/aphoria-suggest` discovers patterns from 3 corpora → `/aphoria-claims` authors claims - **Day 3:** `/aphoria-custom-extractor-creator` generates extractors for missed claims **Manual CLI exists for debugging only.** If you find yourself: - Running `aphoria claims create` manually → Use `/aphoria-suggest` instead - Editing `.aphoria/config.toml` manually → Use `/aphoria-custom-extractor-creator` instead The dogfood exercise validates the **autonomous workflow**, not manual fallbacks. ``` - **Priority:** **BLOCKER** (contradicts product vision) --- ### Finding 3: Detection Rate Target Not Contextualized **Impact:** Team achieved 50% detection but docs said ≥90%, creating confusion about whether exercise succeeded **Evidence:** - DAY3-SUMMARY.md:18 - "**Detection Rate (v3)**: ≥90% | 50% | -40% | ⚠️ Below target" - DAY3-SUMMARY.md:186-229 - Section "Why 50% Instead of ≥90%?" analyzing root causes - DAY5-SUMMARY.md documents success despite 50% detection **Documentation Said:** - plan.md:7 - "**Target Metrics:** Detection rate: ≥90% of violations" - README.md:153 - "| Detection rate | ≥90% | Cross-cutting violation detection |" **What Was Missing:** - **No context on "first dogfood in new domain" expectations** - 0-50% detection is EXPECTED when corpus doesn't exist - **No distinction between "built-in extractor detection" vs "after custom extractor creation"** - targets imply built-ins should catch 90% - **No explanation that 50% with declarative extractors validates mechanism** - team thought they failed **Root Cause:** Target was written assuming programmatic extractors (can hit 90%), but cachewrap used declarative extractors (50% ceiling due to regex limitations). **Recommendation:** - **Where:** plan.md Day 3 success criteria (lines 170-178) - **What to add:** ```markdown **Detection Rate Expectations:** - **Baseline scan (v1):** 0-20% expected (built-in extractors don't know cache patterns) - **After declarative extractors (v2):** 50-70% achievable (regex pattern matching) - **After programmatic extractors (v3):** 90-100% target (AST analysis) **For this exercise:** 50% detection with declarative extractors **VALIDATES** the flywheel: - 0% → 50% proves knowledge compounding works - 50% ceiling proves declarative limitations (expected) - Remaining 5 violations require programmatic extractors (Day 5 refinement) **Success = improvement, not perfection.** The goal is proving the mechanism, not 100% coverage. ``` - **Priority:** **CRITICAL** (affects success interpretation) --- ## Medium Priority Improvements ### Gap 4: Concept Path Alignment Not Pre-Explained **Type:** Missing Information **Evidence:** - DAY3-SUMMARY.md:126-148 - Iteration 2 failed due to concept path mismatch - Team discovered: `claim.subject = "timeout"` → observation tail `config/timeout` (wrong) - Fix: `claim.subject = "cache/timeout"` → observation tail `cache/timeout` (correct) **Documentation Said:** - config.toml has examples but no explanation of tail-path matching **Impact:** - Time lost: ~1 minute (iteration 2) - Confusion level: Medium - Blocker: No (discovered and fixed quickly) **Recommendation:** - **Where:** plan.md Day 3 Phase 4 (Extractor Creation) - **What:** Add concept path alignment explanation: ```markdown **⚠️ Concept Path Alignment:** Extractor `claim.subject` creates observation tail-path (last 2 segments). This tail MUST match claim `concept_path` tail. Example: - Claim: `cache/timeout` - Extractor subject: `timeout` → Observation: `.../config/timeout` → Tail: `config/timeout` ❌ - Extractor subject: `cache/timeout` → Observation: `.../cache/timeout` → Tail: `cache/timeout` ✅ **Pattern:** Always prefix extractor subjects with claim namespace. ``` - **Priority:** MEDIUM (affects iteration count) --- ### Gap 5: Declarative Extractor Limitations Not Documented **Type:** Buried Information **Evidence:** - DAY3-SUMMARY.md:300-305 - "Declarative extractors work best for: simple value patterns, function signatures" - DAY3-SUMMARY.md:193-221 - 5 violations undetected due to pattern matching limitations - DAY4-SUMMARY.md:212-240 - False negative analysis explaining regex can't see function bodies **Documentation Said:** - No mention of declarative vs programmatic trade-offs in plan.md or README.md **Impact:** - Time lost: 0 (discovered post-exercise) - Confusion level: Low (understood through execution) - Blocker: No (50% detection still validates mechanism) **Recommendation:** - **Where:** plan.md Day 3 section + docs/extractors/ guide - **What:** Add extractor type decision tree: ```markdown ## Declarative vs Programmatic Extractors **Use declarative (regex in config.toml) when:** - ✅ Detecting config values (`max_size: None`) - ✅ Detecting function signatures (`pub async fn get`) - ✅ Simple line-based patterns **Use programmatic (Rust extractor) when:** - ❌ Need to inspect function bodies (`validate_key()` call inside `get()`) - ❌ Multi-line patterns with context - ❌ AST analysis (type checking, scope) **For Day 3:** Use declarative for speed. Refine to programmatic in Day 5 if <90% needed. ``` - **Priority:** MEDIUM (improves extractor selection) --- ## Low Priority Polish ### Gap 6: Authority Tier Mapping Not Explicit **Type:** Missing Information **Evidence:** - claims.toml shows mix of "expert" and "community" tiers - No clear guidance on when to use which tier **Documentation Said:** - docs/sources/ templates mention tiers but no decision criteria **Impact:** - Time lost: 0 (team made reasonable choices) - Confusion level: Low - Blocker: No **Recommendation:** - **Where:** plan.md Day 1 section - **What:** Add tier decision table: ```markdown ## Authority Tier Selection | Tier | Source Type | Examples | When to Use | |------|-------------|----------|-------------| | Tier 1 (Standards) | RFCs, W3C, IETF | Redis protocol spec | Normative requirements (MUST) | | Tier 2 (Vendor) | AWS, Redis Labs | ElastiCache guide | Official recommendations | | Tier 3 (Community) | Library docs, Stack Overflow | redis-rs patterns | Implementation patterns | ``` - **Priority:** LOW (nice-to-have clarity) --- ## Team Errors (For Reference) ### Error 1: Separate TOML Files for Extractors **What team did wrong:** - Created 10 separate `.toml` files in `.aphoria/extractors/` directory - Assumed Aphoria loads extractors from separate files **Doc was clear:** - config.toml:64 - Shows `[[extractors.declarative]]` syntax - Examples in config show inline declarative extractors **Reason:** - Misread extractor configuration format (assumed directory-based loading) **Time lost:** 1 minute **Not a documentation gap** - config.toml syntax was correct and visible --- ## Recommended Actions ### Immediate (Before Next Dogfood) 1. **Update plan.md Day 3 section** - Add "⚠️ CRITICAL: Day 3 is Flywheel Steps 4-5" callout box 2. **Update README.md header** - Add "🤖 Autonomous Workflow (REQUIRED)" section 3. **Update plan.md metrics** - Add detection rate context (0% → 50% → 90% progression) ### Short Term (This Week) 1. **Create pre-flight validation script** - `scripts/validate-day3-execution.sh` that checks for: - `.aphoria/extractors/*.toml` OR `[[extractors.declarative]]` in config.toml - `scan-v2.json` exists - DAY3-SUMMARY.md exists 2. **Add concept path alignment guide** - `docs/extractors/concept-path-matching.md` 3. **Document extractor type trade-offs** - `docs/extractors/declarative-vs-programmatic.md` ### Long Term (Next Month) 1. **Create "Common Mistakes" guide** - Consolidate msgqueue + cachewrap learnings 2. **Add Day 3 execution video** - Screen recording showing correct 6-phase workflow 3. **Refactor all dogfood plans** - Apply learnings to httpclient, dbpool, msgqueue docs --- ## Appendices - [Progress Log](./progress-log-2026-02-11.md) - Team daily summaries - [Implementation Review](./implementation-review-2026-02-11.md) - Code analysis - [Gap Analysis](./gap-analysis-2026-02-11.md) - Detailed gap categorization --- ## Metrics Summary | Metric | Value | |--------|-------| | **Total Time** | 1.4 hours (Days 1-4) | | **vs Target** | 12-16 hours → 91% faster | | **Pattern Reuse** | 35% (7/20 claims from 3 corpora) | | **Detection Rate** | 50% (5/10 violations with declarative extractors) | | **Violations Fixed** | 10/10 (100%) | | **Tests Passing** | 10/10 (100%) | **Hypothesis Validated:** Multi-domain flywheel works (corpus reuse + extractor creation) **Caveats:** 50% detection below 90% target due to declarative extractor limitations (expected) **Conclusion:** Exercise succeeded at validating autonomous learning mechanism. Documentation gaps are **workflow emphasis** not fundamental flaws. --- **Evaluation Status:** ✅ COMPLETE **Next Steps:** Implement immediate recommendations before next dogfood exercise **Evaluator:** aphoria-doc-evaluator skill **Evaluation Duration:** Phase 1-4 systematic observation