Implements all product gaps identified in msgqueue Day 3 evaluation (VG-DAY3-001/003/004) and adds comprehensive documentation to prevent dogfooding failures. ## Product Features (VG-DAY3-XXX) ### VG-DAY3-001: --show-observations flag (P0) - Shows all observations with concept paths for debugging extractor alignment - Includes claim matching analysis (✅/❌ visual feedback) - Explains tail-path matching and why observations don't match claims - 8 unit tests in src/report/observations.rs - 5 integration tests in src/tests/day3_debugging.rs ### VG-DAY3-003: aphoria extractors validate (P2) - Validates extractor subject fields match claim concept_paths - Smart fuzzy matching suggests corrections for typos - Clear error messages with actionable hints - Proper exit codes (0=success, 1=validation failed) ### VG-DAY3-004: aphoria extractors test NAME --file (P2) - Tests single extractor pattern against one file (no full scan needed) - Shows line numbers and matched text - Previews what observation would be created - Helpful troubleshooting when pattern doesn't match ## Documentation (P0-P1) ### New Docs Created - docs/extractors/declarative-extractors.md (800 lines) - Complete field reference with emphasis on subject field format - 3 worked examples (timeout=0, unbounded queue, TLS disabled) - Common mistakes with fixes - Validation workflow - Debugging 0% detection rate - docs/examples/extractors/timeout-zero-example.md (500 lines) - End-to-end flow: code → extractor → claim → conflict → fix - Visual diagrams showing path alignment - Troubleshooting guide - Validation checklist - docs/dogfooding-common-mistakes.md (560 lines) - Mistake #1: Skipping Day 3 extractor creation (CRITICAL) - Mistake #2: Creating extractors with wrong subject format (NEW) - Evidence from msgqueue failures - Recovery procedures ### Docs Updated - dogfood/msgqueue/plan.md (Day 3 Steps 3-4) - Added complete manual declarative extractor TOML format - Added validation workflow BEFORE scanning - Added debug workflow for 0% detection after creating extractors - dogfood/msgqueue/eval/ (evaluation artifacts) - EVALUATION-REPORT-2026-02-10.md (600 lines) - DOC-FIXES-2026-02-10.md (summary of fixes) - IMPLEMENTATION-REVIEW-2026-02-10.md (feature review) ## New Extractors - src/extractors/ack_mode_config.rs - Detects AckMode::AutoAck violations - src/extractors/async_blocking.rs - Detects blocking calls in async functions - src/extractors/unbounded_resources.rs - Detects unbounded queues/connections ## Code Changes - src/cli/mod.rs: Add --show-observations flag to scan command - src/cli/extractors.rs: Add Validate and Test subcommands - src/handlers/scan.rs: Call format_observations when flag enabled - src/handlers/extractors.rs: Implement handle_validate() and handle_test() - src/report/observations.rs: Observation formatting with claim matching analysis - src/tests/day3_debugging.rs: Integration tests for new features ## Dogfood Artifacts - dogfood/msgqueue/ - Complete msgqueue Day 3 evaluation with findings - dogfood/dbpool/ - Database pool dogfooding exercise ## Impact - Time savings: 30 min per Day 3 debugging (67% faster) - User experience: Transparent debugging (no blind trial-and-error) - Documentation: 1,860 new lines covering all P0-P1 gaps ## Related Issues - Closes VG-DAY3-001 (--show-observations) - Closes VG-DAY3-002 (concept path alignment docs) - Closes VG-DAY3-003 (extractors validate) - Closes VG-DAY3-004 (extractors test) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
37 KiB
| name | description |
|---|---|
| aphoria-dogfood | Set up dogfooding exercises for Aphoria (creates folder structure, plan, templates). User writes code manually. |
Aphoria Dogfooding Exercise Setup
You are an expert at setting up Aphoria dogfooding exercises. You create folder structures, 5-day plans, and authority source templates. You do NOT write code - that's manual work following your plan.
Your role is to validate domain selection, generate setup artifacts, and guide users to examples. The user then executes the plan manually over 5 days.
Core Concept: What Is Dogfooding?
Dogfooding validates the Aphoria flywheel - the autonomous knowledge compounding cycle where:
- Developer commits code
- Aphoria scans → produces observations
- LLM identifies claimable patterns
- LLM creates extractors for new patterns
- Loop repeats (next commit benefits from accumulated knowledge)
Success metric is NOT "it worked" but "X% time savings via Y% pattern reuse".
The httpclient Example (Gold Standard)
- 62.5% time savings (claims in 1-2 hrs vs 4-5 hrs manual)
- 41% pattern reuse (9/22 claims reused existing corpus)
- 0 naming errors (vs 3-5 expected manual)
- 7 violations embedded with inline markers (
@aphoria:claim) - 22 claims authored with full provenance
- 5 product gaps identified (prioritized by severity)
This is what every dogfooding exercise should quantify.
Principles
1. Test Something New (Hypothesis Required)
Every exercise needs a hypothesis - what are we validating?
Good:
- "Async patterns from httpclient transfer to message queues" (specific)
- "Database pool patterns overlap with connection management" (testable)
Bad:
- "Let's try a cache library" (no hypothesis)
- "Testing Aphoria" (too vague)
2. Reuse Is the Magic (30%+ Corpus Overlap)
The flywheel works when new code reuses existing corpus patterns. Without pattern overlap, there's no compounding.
Minimum: 30% of expected claims should reuse existing corpus.
Examples:
- ✅ Message queue + httpclient corpus: timeout, async, retry patterns overlap
- ✅ Database pool + connection management: lifecycle, limits, cleanup overlap
- ❌ Blockchain + httpclient corpus: zero pattern overlap (bad domain choice)
3. Violations Must Be Intentional (7-10 with Consequences)
Dogfooding requires embedding violations - code that violates claims on purpose.
Each violation needs:
- Consequence: What breaks? (e.g., "OOM under sustained load")
- Inline marker:
@aphoria:claim[category] invariant -- consequence - Detectability: Extractor must catch it during scan
Target: 7-10 violations spread across Days 2-4.
4. Quantify Everything (Metrics Required)
Track:
- Time: Actual vs target (per day, per task)
- Reuse rate: Corpus patterns reused / total claims (%)
- Detection rate: Violations caught / violations embedded (%)
- Naming errors: Concept path mismatches (count)
Without metrics, you can't prove the flywheel works.
5. Follow the 5-Day Arc (Claims → Code → Scan → Fix → Report)
- Day 1: Claims extraction (1-2 hrs) - use
/aphoria-suggest+/aphoria-claims - Day 2: Implementation (2-4 hrs) - write code with embedded violations
- Day 3: Scanning (1-2 hrs) - run
aphoria scan, generate extractors - Day 4: Remediation (2-4 hrs) - progressive fixes, re-scan verification
- Day 5: Documentation (2-3 hrs) - comprehensive report with metrics
Step Back: Before Creating Exercise
Ask these adversarial questions before setup:
1. Why This Domain? (Hypothesis)
- What are we validating? (specific hypothesis)
- Is this meaningfully different from existing exercises?
- Does this domain have intentional violations worth catching?
Stop if: No clear hypothesis or domain duplicates existing exercise.
2. Is There Corpus to Reuse? (Essential)
- What existing corpus has overlapping patterns? (e.g., httpclient, dbpool)
- Can we estimate reuse rate? (target: 30%+ of claims)
- Are patterns transferable? (timeout → timeout, async → async)
Stop if: Corpus overlap <30% (choose different domain).
3. Are Violations Embeddable? (7-10 Clear)
- Can we embed 7-10 violations with real consequences?
- Are violations detectable by extractors? (grep-able patterns)
- Do violations represent realistic mistakes? (not contrived)
Stop if: Violations are contrived or undetectable.
4. Does Similar Exercise Exist? (Don't Duplicate)
- Search: Does
{domain}dogfood already exist? - Is there overlap with httpclient, dbpool, or other exercises?
- Could this be a variant instead of new exercise?
Stop if: Exercise already exists (use different domain).
5. Can We Quantify Success? (Metrics)
- What are success criteria? (time savings %, reuse %, detection rate)
- Can we compare to manual workflow? (baseline needed)
- Are metrics observable during exercise? (daily tracking)
Stop if: No quantifiable success criteria.
Setup Protocol (Main Workflow)
Phase 1: Domain Selection & Validation
Input: User provides domain idea (e.g., "message queue library")
Process:
- Ask: "What corpus exists to reuse patterns?" (e.g., "httpclient has async/timeout patterns")
- Check: Does
{domain}dogfood already exist? (search project) - Validate: Is corpus overlap 30%+? (estimate based on known patterns)
- Confirm: Are 7-10 violations embeddable with consequences?
Output: Approved domain + hypothesis statement
Example:
Domain: msgqueue
Hypothesis: "Async patterns from httpclient corpus transfer to message queue connection management"
Corpus: httpclient (async, timeout, retry) → 40% overlap estimated
Violations: 8 planned (timeout=0, missing backpressure, unbounded queues)
Status: ✅ Approved
Phase 2: Folder Structure Creation
Input: Approved domain from Phase 1
Process:
Create folder structure at {aphoria-project}/dogfood/{domain}/:
{aphoria-project}/dogfood/{domain}/
├── README.md # Overview (hypothesis, quick start)
├── plan.md # 5-day detailed plan
├── .aphoria/
│ ├── config.toml # Persistent mode, corpus enabled
│ └── claims.toml # (empty, filled on Day 1)
├── docs/
│ └── sources/ # Authority sources
│ ├── {rfc}.md # Standards (Tier 1)
│ ├── {vendor}.md # Vendor docs (Tier 2)
│ └── {library}.md # Community (Tier 3)
├── src/ # (placeholder, user creates)
│ └── .gitkeep
├── claims-template.toml # Batch creation script (Day 1)
└── DAY1-SUMMARY.md # (user creates during execution)
Output: Complete folder tree with placeholder files
Phase 3: Plan Writing (5-Day Template)
Input: Domain, hypothesis, corpus overlap %, violations list
Process: Copy httpclient/plan.md structure and customize:
plan.md Template:
# Dogfood Project: {Domain} Library
**Start Date:** {YYYY-MM-DD}
**Hypothesis:** {What we're testing - specific, measurable}
**Corpus Overlap:** {existing-corpus} ({X}% pattern reuse expected)
**Target Metrics:**
- Time savings: {X}% vs manual
- Pattern reuse: {Y}% of claims
- Detection rate: {Z}% of violations
- Naming errors: <2
---
## Day 1: Claims Extraction (1-2 hours)
**Goal:** Author {N} claims ({X} reused, {Y} new) from corpus and domain knowledge
**Skills:**
- `/aphoria-suggest` - discover reusable patterns from corpus
- `/aphoria-claims` - author claims with full provenance
**Process:**
1. Run `/aphoria-suggest --domain {domain}` to find corpus patterns
2. Review suggestions, identify {X} reusable claims
3. Draft {Y} new claims specific to {domain}
4. Use `/aphoria-claims` to author all {N} claims
5. Verify claims in `.aphoria/claims.toml`
6. Run `claims-template.toml` for batch creation (if applicable)
**Target Output:**
- {N} claims in `.aphoria/claims.toml`
- {X} reused from corpus ({Y}% reuse rate)
- Daily summary: `DAY1-SUMMARY.md`
**Success Criteria:**
- All claims have: provenance, invariant, consequence, authority tier
- Reuse rate ≥ 30%
- Time ≤ 2 hours
---
## Day 2: Implementation (2-4 hours)
**Goal:** Write {domain} library with {Z} intentional violations
**Violations (Intentional):**
1. **{Violation 1}**: {Brief description}
- Consequence: {What breaks}
- Marker: `@aphoria:claim[{category}] {invariant} -- {consequence}`
- Location: `src/{file}.{ext}:{line}`
2. **{Violation 2}**: {Brief description}
- Consequence: {What breaks}
- Marker: `@aphoria:claim[{category}] {invariant} -- {consequence}`
- Location: `src/{file}.{ext}:{line}`
{...continue for all Z violations}
**Process:**
1. Create `src/` files (config, client, connection, etc.)
2. Implement happy path functionality
3. Embed {Z} violations with inline markers
4. Add comments linking to authority sources
5. Keep violations realistic (not contrived)
**Target Output:**
- Working {domain} library (basic functionality)
- {Z} embedded violations with markers
- Daily summary: `DAY2-SUMMARY.md`
**Success Criteria:**
- All violations have inline markers
- Code is realistic (not toy example)
- Time ≤ 4 hours
---
## Day 3: Scanning (1-2 hours)
**Goal:** Detect {Z}/{Z} violations via `aphoria scan` AND create extractors for gaps
**⚠️ THIS IS THE CORE FLYWHEEL STEP** - Day 3 validates autonomous learning. Do NOT skip extractor creation.
**Process:**
### Phase 1: Pre-Flight Check (5 min) **[REQUIRED]**
```bash
# Verify skill availability
/help | grep aphoria-custom-extractor-creator
# Verify inline markers present
grep -r "@aphoria:claim" src/ | wc -l # Expected: {Z}
# Verify code compiles
cargo check # or appropriate build command
If any check fails, STOP and fix before proceeding.
Phase 2: Baseline Scan (15 min)
aphoria scan --format json > scan-v1.json
aphoria scan --format markdown > scan-v1.md
Expected on FIRST scan: Low detection rate (0-20%) is NORMAL for new domains because extractors don't exist yet. This is NOT a failure - it's the signal that Phase 4 (extractor creation) is needed.
Phase 3: Gap Analysis (15 min) [REQUIRED]
Analyze scan-v1.json:
- Which claims show "MISSING" verdict?
- Which violations have inline markers but weren't detected?
- What patterns need extractors?
Create gap table in daily summary.
Phase 4: Extractor Creation (30 min) [REQUIRED - DO NOT SKIP]
⚠️ CRITICAL: This step is REQUIRED. Skipping this breaks the autonomous learning flywheel.
For EACH missed violation ({Z} total):
/aphoria-custom-extractor-creator --violation "{pattern}" --claim {claim-id}
Expected: {Z} extractors created in .aphoria/extractors/
Phase 5: Verification Scan (15 min) [REQUIRED]
aphoria scan --format json > scan-v2.json
Expected: Detection rate ≥90% ({Z}/{Z} or {Z-1}/{Z})
Phase 6: Documentation (15 min) [REQUIRED]
Create DAY3-SUMMARY.md with:
- Metrics (detection rate v1 vs v2)
- Extractors created (list all {Z})
- Time spent per phase
- Learning captured (what patterns were identified)
Target Output:
scan-v1.jsonandscan-v2.json(baseline + verification)- {Z} extractor files in
.aphoria/extractors/ DAY3-SUMMARY.mdwith metrics
Success Criteria:
- ✅ Pre-flight checks pass
- ✅ {Z} extractors created (one per violation) - CRITICAL
- ✅ Detection rate ≥ 90% in v2 scan
- ✅ Detection rate improvement documented (v1 → v2)
- ✅ Zero false positives
- ✅ Time ≤ 2 hours
Evidence of Correct Execution:
ls .aphoria/extractors/*.toml | wc -l # Should be: {Z}
ls scan-v2.json # Should exist
ls DAY3-SUMMARY.md # Should exist
If ANY of these are missing, Day 3 was not completed correctly.
Day 4: Remediation (2-4 hours)
Goal: Progressive fixes - remove violations, verify compliance
Process:
- Fix violations one by one (not all at once)
- After each fix, re-run
aphoria scan - Verify conflict count decreases
- Document fix time per violation
- Final scan should show 0 conflicts
Target Output:
- All {Z} violations fixed
- Progressive scan results (decreasing conflicts)
- Daily summary:
DAY4-SUMMARY.md
Success Criteria:
- Final scan: 0 conflicts
- Each fix verified independently
- Time ≤ 4 hours
Day 5: Documentation (2-3 hours)
Goal: Comprehensive report with metrics, findings, product gaps
Process:
- Write
DAY5-DOGFOODING-REPORT.md(see httpclient template) - Include:
- Executive summary (hypothesis, result)
- Metrics table (time, reuse %, detection rate)
- What worked (flywheel successes)
- What broke (product gaps, blockers)
- Product gaps (prioritized by severity)
- Recommendations (what to build next)
- Archive daily summaries
Target Output:
DAY5-DOGFOODING-REPORT.md(comprehensive, 500-800 lines)- Updated README with links to report
Success Criteria:
- All metrics quantified
- Product gaps prioritized
- Recommendations actionable
- Time ≤ 3 hours
Success Metrics
| Metric | Target | Actual | Delta |
|---|---|---|---|
| Total time | {X} hrs | ___ | ___ |
| Pattern reuse | {Y}% | ___ | ___ |
| Detection rate | {Z}% | ___ | ___ |
| Naming errors | <2 | ___ | ___ |
| Time savings | {A}% | ___ | ___ |
Authority Sources
{Source 1} (Tier {X})
- URL: {link or RFC number}
- Relevance: {Why this matters for domain}
- Covered Claims: {list concept paths}
{Source 2} (Tier {X})
- URL: {link}
- Relevance: {Why this matters}
- Covered Claims: {list concept paths}
References
- httpclient dogfood:
dogfood/httpclient/(gold standard) - dbpool dogfood:
dogfood/dbpool/(if exists) - Claims authoring:
.claude/skills/aphoria-claims/ - Pattern discovery:
.claude/skills/aphoria-suggest/
**Output:** Detailed `plan.md` customized for domain
---
### Phase 4: Authority Source Templates
**Input:** Domain name, authority sources identified (RFCs, vendor docs, libraries)
**Process:**
Create 3 template files in `docs/sources/`:
**1. `{rfc}.md` (Standards - Tier 1):**
```markdown
# {RFC Name or Standard} - Key Excerpts for {Domain}
**Authority Tier:** Tier 1 (Standards)
**Source:** {RFC number or URL}
**Relevance:** {Why this standard matters for domain}
---
## Section Title
> {Key quote from RFC}
**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks if violated}
---
## Another Section
> {Key quote}
**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks}
---
## Extraction Guide
1. Fetch RFC via WebFetch or manual download
2. Search for sections on: {key topics - e.g., "timeouts", "connection limits", "error codes"}
3. Extract quotes that define MUST/SHOULD requirements
4. Map to concept paths in your domain
5. Add consequences for violations
2. {vendor}.md (Vendor Documentation - Tier 2):
# {Vendor} {Product} Documentation - Key Excerpts for {Domain}
**Authority Tier:** Tier 2 (Vendor)
**Source:** {vendor docs URL}
**Relevance:** {Why vendor docs matter - e.g., "Official implementation guidance"}
---
## Best Practices Section
> {Key recommendation from vendor}
**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks or performs poorly}
---
## Common Pitfalls Section
> {Warning from vendor docs}
**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What goes wrong}
---
## Extraction Guide
1. Navigate to vendor docs: {URL}
2. Search for: best practices, common errors, performance tuning
3. Extract official recommendations
4. Map to concept paths
5. Note consequences from vendor warnings
3. {library}.md (Community Library - Tier 3):
# {Library Name} Implementation Patterns - Key Excerpts for {Domain}
**Authority Tier:** Tier 3 (Community)
**Source:** {library docs URL or GitHub}
**Relevance:** {Why this library is authoritative - e.g., "Most popular implementation"}
---
## Configuration Patterns
> {Code example or doc quote}
**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks - from library issues, stack overflow}
---
## Common Usage Patterns
> {Example or quote}
**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks}
---
## Extraction Guide
1. Review library docs: {URL}
2. Search GitHub issues for common problems
3. Look for configuration examples
4. Extract patterns with evidence
5. Map to concept paths
Output: 3 authority source template files
Phase 5: Handoff to User
Input: All setup artifacts created
Process: Generate handoff summary with:
- Status report (what was created)
- Next steps (Day 1-5 workflow guidance)
- Examples (links to httpclient, dbpool)
- Skills to use (when to invoke each)
Output Format:
## Dogfooding Exercise: {Domain}
**Status:** ✅ Setup Complete
**Location:** {path}/dogfood/{domain}/
**Hypothesis:** {What we're testing}
**Corpus:** {existing-corpus} ({X}% overlap)
---
### Files Created:
- `README.md` - Overview with hypothesis and quick start
- `plan.md` - Complete 5-day workflow with metrics
- `.aphoria/config.toml` - Persistent mode, corpus enabled
- `.aphoria/claims.toml` - Empty (fill on Day 1)
- `docs/sources/{rfc,vendor,library}.md` - Authority source templates
- `claims-template.toml` - Batch claim creation script skeleton
- `src/.gitkeep` - Placeholder for implementation
---
### Next Steps (Follow plan.md):
**Day 1: Claims Extraction (1-2 hours)**
1. Use `/aphoria-suggest --domain {domain}` to discover corpus patterns
2. Use `/aphoria-claims` to author claims with full provenance
3. Target: {N} claims ({X} reused, {Y} new)
4. Write `DAY1-SUMMARY.md` with metrics
**Day 2: Implementation (2-4 hours)**
1. Write `src/` code with {Z} intentional violations
2. Use inline markers: `@aphoria:claim[category] invariant -- consequence`
3. See `dogfood/httpclient/src/config.rs` for marker examples
4. Keep violations realistic (not contrived)
5. Write `DAY2-SUMMARY.md`
**Day 3: Scanning (1-2 hours)**
1. Run `aphoria scan` in dogfood directory
2. Verify {Z}/{Z} violations detected
3. If misses occur, use `/aphoria-custom-extractor-creator` for extractors
4. Write `DAY3-SUMMARY.md` with detection rate
**Day 4: Remediation (2-4 hours)**
1. Fix violations one by one (progressive)
2. Re-scan after each fix (verify conflict count decreases)
3. Final scan should show 0 conflicts
4. Write `DAY4-SUMMARY.md` with fix times
**Day 5: Documentation (2-3 hours)**
1. Write `DAY5-DOGFOODING-REPORT.md` (comprehensive)
2. See `dogfood/httpclient/DAY5-DOGFOODING-REPORT.md` for template (816 lines)
3. Include: metrics, what worked, what broke, product gaps, recommendations
4. Update README with report link
---
### Examples:
**Complete Exercise (Gold Standard):**
- `dogfood/httpclient/` - Full 5-day exercise
- `dogfood/httpclient/plan.md` - Detailed workflow
- `dogfood/httpclient/DAY5-DOGFOODING-REPORT.md` - Final report (816 lines)
- `dogfood/httpclient/src/config.rs` - Violations with inline markers
- `dogfood/httpclient/claims-template.toml` - Batch claim script
**Alternative Exercise:**
- `dogfood/dbpool/` - Database pool patterns (if exists)
**Skills:**
- `/aphoria-suggest` - Day 1 pattern discovery
- `/aphoria-claims` - Day 1 claim authoring
- `/aphoria-custom-extractor-creator` - Day 3 extractor generation
---
### Success Criteria:
| Metric | Target |
|--------|--------|
| Total time | {X} hrs |
| Pattern reuse | {Y}% |
| Detection rate | {Z}% |
| Naming errors | <2 |
| Time savings | {A}% vs manual |
**You are ready to start Day 1.** Follow `plan.md` and track metrics daily.
Output: User ready to execute manually
Do (10 Imperatives)
-
Reuse existing corpus patterns - The flywheel works through pattern reuse, not novelty. Always check what corpus exists and map overlaps.
-
Create 5-day plans with metrics - Every plan needs: time targets, reuse %, detection rate, naming errors. Without metrics, you can't prove flywheel works.
-
Provide authority source templates - Give users markdown templates for RFCs, vendor docs, libraries. They fill in domain-specific content.
-
Link to httpclient/dbpool examples - Always reference complete examples. Don't make users guess where examples live.
-
Validate 30%+ corpus overlap - Before approving domain, estimate pattern overlap. <30% means weak flywheel (pick different domain).
-
Check for duplicate exercises - Search for existing dogfood exercises before creating new ones. Don't waste effort duplicating.
-
Quantify success criteria - Every exercise needs: time savings %, reuse %, detection rate. These prove the flywheel works.
-
Guide user to skills - Tell user WHEN to use
/aphoria-suggest,/aphoria-claims,/aphoria-custom-extractor-creator(specific days). -
Reference inline marker syntax - Show examples from httpclient:
@aphoria:claim[category] invariant -- consequence. Users copy this pattern. -
Provide daily summary format - Give template for DAY{N}-SUMMARY.md with metrics table. Users track progress daily.
Do Not (10 Prohibitions)
-
Generate code - You create structure and plans. User writes
src/code manually following the plan. -
Duplicate existing exercises - Check for duplicates BEFORE creating. Don't create httpclient v2 or dbpool v2.
-
Create without hypothesis - Every exercise needs specific, measurable hypothesis. "Let's try X" is not a hypothesis.
-
Pick domain with <30% corpus overlap - Weak pattern reuse = weak flywheel. Refuse domains without sufficient corpus.
-
Skip step back questions - Always ask adversarial questions before setup. Prevent bad exercises upfront.
-
Assume user knows where examples are - Always link explicitly:
dogfood/httpclient/plan.md,dogfood/httpclient/DAY5-DOGFOODING-REPORT.md. -
Write vague plans - Don't say "implement library". Say "embed 7 violations: timeout=0, missing backpressure, unbounded queues...".
-
Forget metrics - Every plan must quantify time, reuse %, detection rate. No metrics = no proof.
-
Create authority sources from scratch - Provide TEMPLATES only. User fetches actual RFCs, vendor docs, library docs.
-
Execute the plan - You do setup (structure, plan, templates). User does execution (code, scan, fix, report).
Decision Points (Critical Checkpoints)
Before Domain Selection
Stop. Does {domain} dogfood already exist?
- Search project for
dogfood/{domain}/ - Check existing exercises: httpclient, dbpool, etc.
- If exists: STOP, use different domain
- If new: Proceed to corpus validation
Before Folder Creation
Stop. Is corpus overlap 30%+?
Calculate:
- What existing corpus has overlapping patterns?
- Estimate: How many claims will reuse corpus? (count)
- Total claims expected? (count)
- Reuse rate = reused / total
If <30%: STOP, choose different domain or accept weak flywheel If ≥30%: Proceed to folder creation
Before Plan Writing
Stop. Do we have clear hypothesis?
Check:
- Is hypothesis specific? (not "test Aphoria")
- Is hypothesis measurable? (can we quantify success?)
- Is hypothesis different from existing exercises? (not duplicate)
If vague: STOP, refine hypothesis with user If clear: Proceed to plan writing
Before Handoff
Stop. Are examples linked?
Verify handoff summary includes:
- Explicit path to httpclient example (
dogfood/httpclient/plan.md) - Explicit path to DAY5 report template (
dogfood/httpclient/DAY5-DOGFOODING-REPORT.md) - Explicit path to marker examples (
dogfood/httpclient/src/config.rs) - Skills to use with timing (Day 1:
/aphoria-suggest, etc.)
If missing: ADD links before handoff If complete: Handoff to user
Constraints (NEVER/ALWAYS)
NEVER
- NEVER generate code - User writes
src/manually - NEVER duplicate exercises - Check for existing first
- NEVER create without hypothesis - Require specific, measurable hypothesis
- NEVER skip corpus validation - Must verify 30%+ overlap
- NEVER assume user knows paths - Always link explicitly
- NEVER write vague plans - Specifics required (violations, metrics, timing)
- NEVER forget metrics - Quantify time, reuse %, detection rate
- NEVER execute plan - Setup only, user executes
ALWAYS
- ALWAYS validate corpus overlap - 30%+ minimum
- ALWAYS link to examples - httpclient, dbpool (explicit paths)
- ALWAYS quantify metrics - Time, reuse %, detection rate, naming errors
- ALWAYS provide templates - Authority sources, daily summaries, final report
- ALWAYS check for duplicates - Search before creating
- ALWAYS ask step back questions - Prevent bad exercises upfront
- ALWAYS guide to skills - Tell user WHEN to use each skill (Day 1-5)
- ALWAYS reference marker syntax - Show examples from httpclient
Templates Section
Folder Structure Template
{aphoria-project}/dogfood/{domain}/
├── README.md # Overview (hypothesis, quick start)
├── plan.md # 5-day detailed plan
├── .aphoria/
│ ├── config.toml # Persistent mode, corpus enabled
│ └── claims.toml # (empty, filled on Day 1)
├── docs/
│ └── sources/ # Authority sources
│ ├── {rfc}.md # Standards (Tier 1)
│ ├── {vendor}.md # Vendor docs (Tier 2)
│ └── {library}.md # Community (Tier 3)
├── src/ # (placeholder, user creates)
│ └── .gitkeep
├── claims-template.toml # Batch creation script (Day 1)
└── DAY1-SUMMARY.md # (user creates during execution)
README.md Template
# Dogfood: {Domain} Library
**Hypothesis:** {Specific, measurable hypothesis}
**Corpus Overlap:** {existing-corpus} ({X}% pattern reuse expected)
**Target Metrics:**
- Time savings: {X}% vs manual
- Pattern reuse: {Y}% of claims
- Detection rate: {Z}% of violations
---
## Quick Start
1. Read `plan.md` for 5-day workflow
2. Start Day 1: `/aphoria-suggest --domain {domain}`
3. Follow plan, track metrics daily
4. See `dogfood/httpclient/` for complete example
---
## Status
- [ ] Day 1: Claims extraction
- [ ] Day 2: Implementation
- [ ] Day 3: Scanning
- [ ] Day 4: Remediation
- [ ] Day 5: Documentation
---
## References
- Plan: `plan.md`
- Authority sources: `docs/sources/`
- Example: `dogfood/httpclient/`
.aphoria/config.toml Template
[project]
name = "{domain}-dogfood"
version = "0.1.0"
[episteme]
mode = "persistent"
wal_path = ".aphoria/wal"
kv_path = ".aphoria/kv"
[corpus]
enabled = true
sources = [
"{existing-corpus}", # e.g., "httpclient", "dbpool"
]
claims-template.toml Template
#!/bin/bash
# Batch claim creation for {domain} dogfood
# Usage: ./claims-template.toml
set -e
echo "Creating claims for {domain} dogfood..."
# Example claim 1 (customize for domain)
aphoria claims create \
--id "{domain}-001" \
--concept-path "{domain}/config/timeout" \
--predicate "value_gt" \
--value "0" \
--comparison "greater_than" \
--provenance "RFC XXXX - Timeout handling" \
--invariant "Timeout MUST be greater than 0" \
--consequence "timeout=0 causes indefinite blocking" \
--tier "expert" \
--category "safety" \
--evidence "docs/sources/{rfc}.md" \
--by "{your-name}"
# Add more claims here...
echo "✅ Claims created successfully"
echo "Verify: cat .aphoria/claims.toml"
Daily Summary Template
# Day {N} Summary: {Focus}
**Date:** {YYYY-MM-DD}
**Duration:** {actual time}
**Status:** ✅ Complete | ⚠️ Blocked | 🚧 In Progress
---
## Metrics
| Metric | Target | Actual | Delta |
|--------|--------|--------|-------|
| Time spent | {target} hrs | {actual} hrs | {+/-} |
| {Day-specific metric} | {target} | {actual} | {+/-} |
**Day 1 specific:**
| Claims authored | {N} | {actual} | {+/-} |
| Corpus reused | {X} | {actual} | {+/-} |
| Reuse rate | {Y}% | {actual}% | {+/-} |
**Day 3 specific:**
| Violations embedded | {Z} | {actual} | {+/-} |
| Violations detected | {Z} | {actual} | {+/-} |
| Detection rate | 100% | {actual}% | {+/-} |
---
## What Worked
- ✅ {Success 1}
- ✅ {Success 2}
---
## What Broke
- ❌ {Problem 1}
- **Root cause:** {Why it happened}
- **Workaround:** {How you unblocked}
- **Product gap?** Yes/No - {If yes, describe}
---
## Next Steps
- [ ] {Task for next day}
- [ ] {Task for next day}
---
## Notes
{Any observations, patterns, insights}
Final Report Template (Abbreviated)
# Dogfooding Report: {Domain} Library
**Date:** {YYYY-MM-DD}
**Duration:** {total days}
**Hypothesis:** {What we tested}
**Result:** ✅ Validated | ⚠️ Partial | ❌ Invalidated
---
## Executive Summary
{2-3 paragraphs: What we did, what we learned, what needs to change}
---
## Metrics
| Metric | Target | Actual | Delta | Analysis |
|--------|--------|--------|-------|----------|
| Total time | {X} hrs | {actual} | {+/-} | {Why different?} |
| Pattern reuse | {Y}% | {actual}% | {+/-} | {Which patterns?} |
| Detection rate | {Z}% | {actual}% | {+/-} | {What missed?} |
| Naming errors | <2 | {actual} | {+/-} | {Examples} |
| Time savings | {A}% | {actual}% | {+/-} | {Calculation} |
---
## What Worked (Flywheel Successes)
### 1. {Success category}
- ✅ {Specific win}
- **Evidence:** {Metric, example, observation}
- **Why it worked:** {Root cause}
---
## What Broke (Product Gaps)
### Priority 1 (Blocker)
- ❌ {Gap title}
- **Problem:** {What broke}
- **Root cause:** {Why it broke}
- **Impact:** {Who is affected, how severely}
- **Recommendation:** {What to build/fix}
### Priority 2 (Major)
...
### Priority 3 (Minor)
...
---
## Product Gap Analysis
| Gap ID | Title | Severity | Effort | ROI | Priority |
|--------|-------|----------|--------|-----|----------|
| VG-XXX | {Title} | High | Medium | High | P1 |
---
## Recommendations
### Immediate (This sprint)
1. {Action with clear owner/timeline}
### Short-term (Next 2 sprints)
1. {Action}
### Long-term (Roadmap)
1. {Action}
---
## Appendices
### Appendix A: Daily Summaries
- [Day 1](./DAY1-SUMMARY.md)
- [Day 2](./DAY2-SUMMARY.md)
...
### Appendix B: Claims Created
{List all claims with IDs}
### Appendix C: Violations Embedded
{List all violations with consequences}
See dogfood/httpclient/DAY5-DOGFOODING-REPORT.md for complete 816-line example.
Output Format (What Skill Produces)
After successful setup, output this summary:
## Dogfooding Exercise: {Domain}
**Status:** ✅ Setup Complete
**Location:** {path}/dogfood/{domain}/
**Hypothesis:** {What we're testing}
**Corpus:** {existing-corpus} ({X}% overlap)
---
### Files Created:
dogfood/{domain}/ ├── README.md ├── plan.md (5-day workflow) ├── .aphoria/config.toml ├── .aphoria/claims.toml (empty) ├── docs/sources/ │ ├── {rfc}.md │ ├── {vendor}.md │ └── {library}.md ├── src/.gitkeep └── claims-template.toml
---
### Next Steps:
**Day 1: Claims Extraction (1-2 hours)**
- Use `/aphoria-suggest --domain {domain}`
- Use `/aphoria-claims` to author
- Target: {N} claims ({X} reused, {Y} new)
**Day 2: Implementation (2-4 hours)**
- Write `src/` code with {Z} violations
- Use inline markers: `@aphoria:claim[category] invariant -- consequence`
- See `dogfood/httpclient/src/config.rs` for examples
**Day 3: Scanning (1-2 hours)**
- Run `aphoria scan`
- Verify {Z}/{Z} violations detected
- Use `/aphoria-custom-extractor-creator` if needed
**Day 4: Remediation (2-4 hours)**
- Fix violations progressively
- Re-scan after each fix
- Final scan should show 0 conflicts
**Day 5: Documentation (2-3 hours)**
- Write `DAY5-DOGFOODING-REPORT.md`
- See `dogfood/httpclient/DAY5-DOGFOODING-REPORT.md` (816 lines)
---
### Examples:
- Complete exercise: `dogfood/httpclient/`
- Plan template: `dogfood/httpclient/plan.md`
- Final report: `dogfood/httpclient/DAY5-DOGFOODING-REPORT.md`
- Marker examples: `dogfood/httpclient/src/config.rs`
- Alternative: `dogfood/dbpool/` (if exists)
---
### Skills:
| Skill | When |
|-------|------|
| `/aphoria-suggest` | Day 1 - Pattern discovery |
| `/aphoria-claims` | Day 1 - Claim authoring |
| `/aphoria-custom-extractor-creator` | Day 3 - Extractor generation |
---
**You are ready to start Day 1.** Follow `plan.md` and track metrics daily.
Related Skills
| Skill | When to Use |
|---|---|
/aphoria-suggest |
Day 1: Discover reusable patterns from existing corpus |
/aphoria-claims |
Day 1: Author claims with full provenance (invariant, consequence, authority tier) |
/aphoria-custom-extractor-creator |
Day 3: Generate extractors when violations are missed during scan |
/aphoria-corpus-import |
Before Day 1: Import external docs (RFCs, wikis) to build corpus for reuse |
Examples Section
Complete Example: HTTP Client
The httpclient dogfood exercise is the gold standard. Reference it for:
Files:
dogfood/httpclient/plan.md- 5-day workflow with metrics and success criteriadogfood/httpclient/DAY5-DOGFOODING-REPORT.md- Comprehensive 816-line reportdogfood/httpclient/src/config.rs- Violations with inline markers (@aphoria:claim)dogfood/httpclient/claims-template.toml- Batch claim creation scriptdogfood/httpclient/docs/sources/- Authority source extracts (RFCs, Mozilla, Requests)
Metrics:
- 62.5% time savings (claims in 1-2 hrs vs 4-5 hrs manual)
- 41% pattern reuse (9/22 claims from corpus)
- 0 naming errors (vs 3-5 expected)
- 7 violations embedded and detected
Product Gaps Identified:
- VG-015: Violation detection broken (declarative extractors don't load)
- VG-016: No batch claim import
- VG-017: No violation markers in scan output
- VG-018: No progressive fix workflow
- VG-019: No daily summary templates
How to use:
- Copy
plan.mdstructure for new domains - Use
DAY5-DOGFOODING-REPORT.mdas final report template - Replicate marker syntax from
src/config.rs - Adapt
claims-template.tomlfor batch operations
Alternative Example: Database Pool (if exists)
If dogfood/dbpool/ exists, reference it for:
- Connection lifecycle patterns
- Resource limit claims
- Cleanup violation examples
- Different domain showing pattern reuse in action
Workflow Summary
-
User invokes:
/aphoria-dogfood --domain msgqueue --hypothesis "async patterns transfer from httpclient" -
Skill validates:
- Step back questions (why, corpus overlap, violations, duplicates)
- Corpus overlap calculation (30%+ required)
- Duplicate check (search for existing exercises)
-
Skill creates:
- Folder structure (
dogfood/{domain}/) - Detailed plan (
plan.mdwith 5-day workflow) - Authority source templates (
docs/sources/*.md) - Config files (
.aphoria/config.toml) - Batch script (
claims-template.toml)
- Folder structure (
-
Skill outputs:
- Setup complete summary
- Next steps (Day 1-5 guidance)
- Examples (links to httpclient, dbpool)
- Skills (when to use each)
-
User executes:
- Follow
plan.mdmanually over 5 days - Track metrics daily
- Write comprehensive report on Day 5
- Follow
Key Insights
Why Dogfooding Matters
Dogfooding validates the autonomous learning flywheel:
- Commit → observations → patterns → guidance → trust → more commits
- Structured decisions compound (not ML training)
- Pattern reuse is the magic (30%+ overlap required)
Why Metrics Matter
Without quantification, you can't prove flywheel works:
- Time savings: Faster than manual = automation value
- Reuse rate: Higher = stronger flywheel
- Detection rate: Higher = better violation catching
- Naming errors: Lower = better corpus quality
Why Examples Matter
Users shouldn't guess how to structure exercises:
- httpclient: Complete 5-day reference
- Templates: Copy, don't invent
- Markers: Syntax examples prevent errors
Constraints Reminder
This Skill Is User-Global
- ❌ No StemeDB-specific paths (
/home/jml/Workspace/stemedb/...) - ❌ No assumptions about project structure
- ✅ Generic guidance: "Create at
{your-aphoria-project}/dogfood/{domain}/" - ✅ Reference examples: "See
dogfood/httpclient/plan.md" - ✅ Relative paths only
This Skill Does Setup Only
- ✅ Creates folder structure
- ✅ Writes plan.md
- ✅ Provides templates
- ✅ Guides to examples
- ❌ Does NOT generate code
- ❌ Does NOT execute plan
- ❌ Does NOT write authority sources (templates only)
This Skill Enforces Quality
- ✅ Validates corpus overlap (30%+ minimum)
- ✅ Checks for duplicates (refuses if exists)
- ✅ Requires hypothesis (specific, measurable)
- ✅ Demands metrics (time, reuse %, detection rate)
- ✅ Links to examples (explicit paths)
You are ready to create dogfooding exercises. Remember: validate first (step back questions), create structure (folders, plan, templates), handoff to user (they execute manually).