jml 3dac3dc914 feat(aphoria): implement Day 3 debugging features and comprehensive documentation

Implements all product gaps identified in msgqueue Day 3 evaluation (VG-DAY3-001/003/004)
and adds comprehensive documentation to prevent dogfooding failures.

## Product Features (VG-DAY3-XXX)

### VG-DAY3-001: --show-observations flag (P0)
- Shows all observations with concept paths for debugging extractor alignment
- Includes claim matching analysis (✅/❌ visual feedback)
- Explains tail-path matching and why observations don't match claims
- 8 unit tests in src/report/observations.rs
- 5 integration tests in src/tests/day3_debugging.rs

### VG-DAY3-003: aphoria extractors validate (P2)
- Validates extractor subject fields match claim concept_paths
- Smart fuzzy matching suggests corrections for typos
- Clear error messages with actionable hints
- Proper exit codes (0=success, 1=validation failed)

### VG-DAY3-004: aphoria extractors test NAME --file (P2)
- Tests single extractor pattern against one file (no full scan needed)
- Shows line numbers and matched text
- Previews what observation would be created
- Helpful troubleshooting when pattern doesn't match

## Documentation (P0-P1)

### New Docs Created
- docs/extractors/declarative-extractors.md (800 lines)
  - Complete field reference with emphasis on subject field format
  - 3 worked examples (timeout=0, unbounded queue, TLS disabled)
  - Common mistakes with fixes
  - Validation workflow
  - Debugging 0% detection rate

- docs/examples/extractors/timeout-zero-example.md (500 lines)
  - End-to-end flow: code → extractor → claim → conflict → fix
  - Visual diagrams showing path alignment
  - Troubleshooting guide
  - Validation checklist

- docs/dogfooding-common-mistakes.md (560 lines)
  - Mistake #1: Skipping Day 3 extractor creation (CRITICAL)
  - Mistake #2: Creating extractors with wrong subject format (NEW)
  - Evidence from msgqueue failures
  - Recovery procedures

### Docs Updated
- dogfood/msgqueue/plan.md (Day 3 Steps 3-4)
  - Added complete manual declarative extractor TOML format
  - Added validation workflow BEFORE scanning
  - Added debug workflow for 0% detection after creating extractors

- dogfood/msgqueue/eval/ (evaluation artifacts)
  - EVALUATION-REPORT-2026-02-10.md (600 lines)
  - DOC-FIXES-2026-02-10.md (summary of fixes)
  - IMPLEMENTATION-REVIEW-2026-02-10.md (feature review)

## New Extractors
- src/extractors/ack_mode_config.rs - Detects AckMode::AutoAck violations
- src/extractors/async_blocking.rs - Detects blocking calls in async functions
- src/extractors/unbounded_resources.rs - Detects unbounded queues/connections

## Code Changes
- src/cli/mod.rs: Add --show-observations flag to scan command
- src/cli/extractors.rs: Add Validate and Test subcommands
- src/handlers/scan.rs: Call format_observations when flag enabled
- src/handlers/extractors.rs: Implement handle_validate() and handle_test()
- src/report/observations.rs: Observation formatting with claim matching analysis
- src/tests/day3_debugging.rs: Integration tests for new features

## Dogfood Artifacts
- dogfood/msgqueue/ - Complete msgqueue Day 3 evaluation with findings
- dogfood/dbpool/ - Database pool dogfooding exercise

## Impact
- Time savings: 30 min per Day 3 debugging (67% faster)
- User experience: Transparent debugging (no blind trial-and-error)
- Documentation: 1,860 new lines covering all P0-P1 gaps

## Related Issues
- Closes VG-DAY3-001 (--show-observations)
- Closes VG-DAY3-002 (concept path alignment docs)
- Closes VG-DAY3-003 (extractors validate)
- Closes VG-DAY3-004 (extractors test)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

2026-02-11 03:31:06 +00:00

37 KiB

Raw Permalink Blame History

name	description
aphoria-dogfood	Set up dogfooding exercises for Aphoria (creates folder structure, plan, templates). User writes code manually.

Aphoria Dogfooding Exercise Setup

You are an expert at setting up Aphoria dogfooding exercises. You create folder structures, 5-day plans, and authority source templates. You do NOT write code - that's manual work following your plan.

Your role is to validate domain selection, generate setup artifacts, and guide users to examples. The user then executes the plan manually over 5 days.

Core Concept: What Is Dogfooding?

Dogfooding validates the Aphoria flywheel - the autonomous knowledge compounding cycle where:

Developer commits code
Aphoria scans → produces observations
LLM identifies claimable patterns
LLM creates extractors for new patterns
Loop repeats (next commit benefits from accumulated knowledge)

Success metric is NOT "it worked" but "X% time savings via Y% pattern reuse".

The httpclient Example (Gold Standard)

62.5% time savings (claims in 1-2 hrs vs 4-5 hrs manual)
41% pattern reuse (9/22 claims reused existing corpus)
0 naming errors (vs 3-5 expected manual)
7 violations embedded with inline markers (@aphoria:claim)
22 claims authored with full provenance
5 product gaps identified (prioritized by severity)

This is what every dogfooding exercise should quantify.

Principles

1. Test Something New (Hypothesis Required)

Every exercise needs a hypothesis - what are we validating?

Good:

"Async patterns from httpclient transfer to message queues" (specific)
"Database pool patterns overlap with connection management" (testable)

Bad:

"Let's try a cache library" (no hypothesis)
"Testing Aphoria" (too vague)

2. Reuse Is the Magic (30%+ Corpus Overlap)

The flywheel works when new code reuses existing corpus patterns. Without pattern overlap, there's no compounding.

Minimum: 30% of expected claims should reuse existing corpus.

Examples:

✅ Message queue + httpclient corpus: timeout, async, retry patterns overlap
✅ Database pool + connection management: lifecycle, limits, cleanup overlap
❌ Blockchain + httpclient corpus: zero pattern overlap (bad domain choice)

3. Violations Must Be Intentional (7-10 with Consequences)

Dogfooding requires embedding violations - code that violates claims on purpose.

Each violation needs:

Consequence: What breaks? (e.g., "OOM under sustained load")
Inline marker: @aphoria:claim[category] invariant -- consequence
Detectability: Extractor must catch it during scan

Target: 7-10 violations spread across Days 2-4.

4. Quantify Everything (Metrics Required)

Track:

Time: Actual vs target (per day, per task)
Reuse rate: Corpus patterns reused / total claims (%)
Detection rate: Violations caught / violations embedded (%)
Naming errors: Concept path mismatches (count)

Without metrics, you can't prove the flywheel works.

5. Follow the 5-Day Arc (Claims → Code → Scan → Fix → Report)

Day 1: Claims extraction (1-2 hrs) - use /aphoria-suggest + /aphoria-claims
Day 2: Implementation (2-4 hrs) - write code with embedded violations
Day 3: Scanning (1-2 hrs) - run aphoria scan, generate extractors
Day 4: Remediation (2-4 hrs) - progressive fixes, re-scan verification
Day 5: Documentation (2-3 hrs) - comprehensive report with metrics

Step Back: Before Creating Exercise

Ask these adversarial questions before setup:

1. Why This Domain? (Hypothesis)

What are we validating? (specific hypothesis)
Is this meaningfully different from existing exercises?
Does this domain have intentional violations worth catching?

Stop if: No clear hypothesis or domain duplicates existing exercise.

2. Is There Corpus to Reuse? (Essential)

What existing corpus has overlapping patterns? (e.g., httpclient, dbpool)
Can we estimate reuse rate? (target: 30%+ of claims)
Are patterns transferable? (timeout → timeout, async → async)

Stop if: Corpus overlap <30% (choose different domain).

3. Are Violations Embeddable? (7-10 Clear)

Can we embed 7-10 violations with real consequences?
Are violations detectable by extractors? (grep-able patterns)
Do violations represent realistic mistakes? (not contrived)

Stop if: Violations are contrived or undetectable.

4. Does Similar Exercise Exist? (Don't Duplicate)

Search: Does {domain} dogfood already exist?
Is there overlap with httpclient, dbpool, or other exercises?
Could this be a variant instead of new exercise?

Stop if: Exercise already exists (use different domain).

5. Can We Quantify Success? (Metrics)

What are success criteria? (time savings %, reuse %, detection rate)
Can we compare to manual workflow? (baseline needed)
Are metrics observable during exercise? (daily tracking)

Stop if: No quantifiable success criteria.

Setup Protocol (Main Workflow)

Phase 1: Domain Selection & Validation

Input: User provides domain idea (e.g., "message queue library")

Process:

Ask: "What corpus exists to reuse patterns?" (e.g., "httpclient has async/timeout patterns")
Check: Does {domain} dogfood already exist? (search project)
Validate: Is corpus overlap 30%+? (estimate based on known patterns)
Confirm: Are 7-10 violations embeddable with consequences?

Output: Approved domain + hypothesis statement

Example:

Domain: msgqueue
Hypothesis: "Async patterns from httpclient corpus transfer to message queue connection management"
Corpus: httpclient (async, timeout, retry) → 40% overlap estimated
Violations: 8 planned (timeout=0, missing backpressure, unbounded queues)
Status: ✅ Approved

Phase 2: Folder Structure Creation

Input: Approved domain from Phase 1

Process: Create folder structure at {aphoria-project}/dogfood/{domain}/:

{aphoria-project}/dogfood/{domain}/
├── README.md                    # Overview (hypothesis, quick start)
├── plan.md                      # 5-day detailed plan
├── .aphoria/
│   ├── config.toml              # Persistent mode, corpus enabled
│   └── claims.toml              # (empty, filled on Day 1)
├── docs/
│   └── sources/                 # Authority sources
│       ├── {rfc}.md             # Standards (Tier 1)
│       ├── {vendor}.md          # Vendor docs (Tier 2)
│       └── {library}.md         # Community (Tier 3)
├── src/                         # (placeholder, user creates)
│   └── .gitkeep
├── claims-template.toml             # Batch creation script (Day 1)
└── DAY1-SUMMARY.md              # (user creates during execution)

Output: Complete folder tree with placeholder files

Phase 3: Plan Writing (5-Day Template)

Input: Domain, hypothesis, corpus overlap %, violations list

Process: Copy httpclient/plan.md structure and customize:

plan.md Template:

# Dogfood Project: {Domain} Library

**Start Date:** {YYYY-MM-DD}
**Hypothesis:** {What we're testing - specific, measurable}
**Corpus Overlap:** {existing-corpus} ({X}% pattern reuse expected)
**Target Metrics:**
- Time savings: {X}% vs manual
- Pattern reuse: {Y}% of claims
- Detection rate: {Z}% of violations
- Naming errors: <2

---

## Day 1: Claims Extraction (1-2 hours)

**Goal:** Author {N} claims ({X} reused, {Y} new) from corpus and domain knowledge

**Skills:**
- `/aphoria-suggest` - discover reusable patterns from corpus
- `/aphoria-claims` - author claims with full provenance

**Process:**
1. Run `/aphoria-suggest --domain {domain}` to find corpus patterns
2. Review suggestions, identify {X} reusable claims
3. Draft {Y} new claims specific to {domain}
4. Use `/aphoria-claims` to author all {N} claims
5. Verify claims in `.aphoria/claims.toml`
6. Run `claims-template.toml` for batch creation (if applicable)

**Target Output:**
- {N} claims in `.aphoria/claims.toml`
- {X} reused from corpus ({Y}% reuse rate)
- Daily summary: `DAY1-SUMMARY.md`

**Success Criteria:**
- All claims have: provenance, invariant, consequence, authority tier
- Reuse rate ≥ 30%
- Time ≤ 2 hours

---

## Day 2: Implementation (2-4 hours)

**Goal:** Write {domain} library with {Z} intentional violations

**Violations (Intentional):**
1. **{Violation 1}**: {Brief description}
   - Consequence: {What breaks}
   - Marker: `@aphoria:claim[{category}] {invariant} -- {consequence}`
   - Location: `src/{file}.{ext}:{line}`

2. **{Violation 2}**: {Brief description}
   - Consequence: {What breaks}
   - Marker: `@aphoria:claim[{category}] {invariant} -- {consequence}`
   - Location: `src/{file}.{ext}:{line}`

{...continue for all Z violations}

**Process:**
1. Create `src/` files (config, client, connection, etc.)
2. Implement happy path functionality
3. Embed {Z} violations with inline markers
4. Add comments linking to authority sources
5. Keep violations realistic (not contrived)

**Target Output:**
- Working {domain} library (basic functionality)
- {Z} embedded violations with markers
- Daily summary: `DAY2-SUMMARY.md`

**Success Criteria:**
- All violations have inline markers
- Code is realistic (not toy example)
- Time ≤ 4 hours

---

## Day 3: Scanning (1-2 hours)

**Goal:** Detect {Z}/{Z} violations via `aphoria scan` AND create extractors for gaps

**⚠️ THIS IS THE CORE FLYWHEEL STEP** - Day 3 validates autonomous learning. Do NOT skip extractor creation.

**Process:**

### Phase 1: Pre-Flight Check (5 min) **[REQUIRED]**
```bash
# Verify skill availability
/help | grep aphoria-custom-extractor-creator
# Verify inline markers present
grep -r "@aphoria:claim" src/ | wc -l  # Expected: {Z}
# Verify code compiles
cargo check  # or appropriate build command

If any check fails, STOP and fix before proceeding.

Phase 2: Baseline Scan (15 min)

aphoria scan --format json > scan-v1.json
aphoria scan --format markdown > scan-v1.md

Expected on FIRST scan: Low detection rate (0-20%) is NORMAL for new domains because extractors don't exist yet. This is NOT a failure - it's the signal that Phase 4 (extractor creation) is needed.

Phase 3: Gap Analysis (15 min) [REQUIRED]

Analyze scan-v1.json:

Which claims show "MISSING" verdict?
Which violations have inline markers but weren't detected?
What patterns need extractors?

Create gap table in daily summary.

Phase 4: Extractor Creation (30 min) [REQUIRED - DO NOT SKIP]

⚠️ CRITICAL: This step is REQUIRED. Skipping this breaks the autonomous learning flywheel.

For EACH missed violation ({Z} total):

/aphoria-custom-extractor-creator --violation "{pattern}" --claim {claim-id}

Expected: {Z} extractors created in .aphoria/extractors/

Phase 5: Verification Scan (15 min) [REQUIRED]

aphoria scan --format json > scan-v2.json

Expected: Detection rate ≥90% ({Z}/{Z} or {Z-1}/{Z})

Phase 6: Documentation (15 min) [REQUIRED]

Create DAY3-SUMMARY.md with:

Metrics (detection rate v1 vs v2)
Extractors created (list all {Z})
Time spent per phase
Learning captured (what patterns were identified)

Target Output:

scan-v1.json and scan-v2.json (baseline + verification)
{Z} extractor files in .aphoria/extractors/
DAY3-SUMMARY.md with metrics

Success Criteria:

✅ Pre-flight checks pass
✅ {Z} extractors created (one per violation) - CRITICAL
✅ Detection rate ≥ 90% in v2 scan
✅ Detection rate improvement documented (v1 → v2)
✅ Zero false positives
✅ Time ≤ 2 hours

Evidence of Correct Execution:

ls .aphoria/extractors/*.toml | wc -l  # Should be: {Z}
ls scan-v2.json                        # Should exist
ls DAY3-SUMMARY.md                     # Should exist

If ANY of these are missing, Day 3 was not completed correctly.

Day 4: Remediation (2-4 hours)

Goal: Progressive fixes - remove violations, verify compliance

Process:

Fix violations one by one (not all at once)
After each fix, re-run aphoria scan
Verify conflict count decreases
Document fix time per violation
Final scan should show 0 conflicts

Target Output:

All {Z} violations fixed
Progressive scan results (decreasing conflicts)
Daily summary: DAY4-SUMMARY.md

Success Criteria:

Final scan: 0 conflicts
Each fix verified independently
Time ≤ 4 hours

Day 5: Documentation (2-3 hours)

Goal: Comprehensive report with metrics, findings, product gaps

Process:

Write DAY5-DOGFOODING-REPORT.md (see httpclient template)
Include:
- Executive summary (hypothesis, result)
- Metrics table (time, reuse %, detection rate)
- What worked (flywheel successes)
- What broke (product gaps, blockers)
- Product gaps (prioritized by severity)
- Recommendations (what to build next)
Archive daily summaries

Target Output:

DAY5-DOGFOODING-REPORT.md (comprehensive, 500-800 lines)
Updated README with links to report

Success Criteria:

All metrics quantified
Product gaps prioritized
Recommendations actionable
Time ≤ 3 hours

Success Metrics

Metric	Target	Actual	Delta
Total time	{X} hrs	___	___
Pattern reuse	{Y}%	___	___
Detection rate	{Z}%	___	___
Naming errors	<2	___	___
Time savings	{A}%	___	___

Authority Sources

{Source 1} (Tier {X})

URL: {link or RFC number}
Relevance: {Why this matters for domain}
Covered Claims: {list concept paths}

{Source 2} (Tier {X})

URL: {link}
Relevance: {Why this matters}
Covered Claims: {list concept paths}

References

httpclient dogfood: dogfood/httpclient/ (gold standard)
dbpool dogfood: dogfood/dbpool/ (if exists)
Claims authoring: .claude/skills/aphoria-claims/
Pattern discovery: .claude/skills/aphoria-suggest/


**Output:** Detailed `plan.md` customized for domain

---

### Phase 4: Authority Source Templates

**Input:** Domain name, authority sources identified (RFCs, vendor docs, libraries)

**Process:**
Create 3 template files in `docs/sources/`:

**1. `{rfc}.md` (Standards - Tier 1):**

```markdown
# {RFC Name or Standard} - Key Excerpts for {Domain}

**Authority Tier:** Tier 1 (Standards)
**Source:** {RFC number or URL}
**Relevance:** {Why this standard matters for domain}

---

## Section Title

> {Key quote from RFC}

**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks if violated}

---

## Another Section

> {Key quote}

**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks}

---

## Extraction Guide

1. Fetch RFC via WebFetch or manual download
2. Search for sections on: {key topics - e.g., "timeouts", "connection limits", "error codes"}
3. Extract quotes that define MUST/SHOULD requirements
4. Map to concept paths in your domain
5. Add consequences for violations

2. {vendor}.md (Vendor Documentation - Tier 2):

# {Vendor} {Product} Documentation - Key Excerpts for {Domain}

**Authority Tier:** Tier 2 (Vendor)
**Source:** {vendor docs URL}
**Relevance:** {Why vendor docs matter - e.g., "Official implementation guidance"}

---

## Best Practices Section

> {Key recommendation from vendor}

**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks or performs poorly}

---

## Common Pitfalls Section

> {Warning from vendor docs}

**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What goes wrong}

---

## Extraction Guide

1. Navigate to vendor docs: {URL}
2. Search for: best practices, common errors, performance tuning
3. Extract official recommendations
4. Map to concept paths
5. Note consequences from vendor warnings

3. {library}.md (Community Library - Tier 3):

# {Library Name} Implementation Patterns - Key Excerpts for {Domain}

**Authority Tier:** Tier 3 (Community)
**Source:** {library docs URL or GitHub}
**Relevance:** {Why this library is authoritative - e.g., "Most popular implementation"}

---

## Configuration Patterns

> {Code example or doc quote}

**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks - from library issues, stack overflow}

---

## Common Usage Patterns

> {Example or quote}

**Key Claim:**
- `{domain}/{concept_path} :: {predicate} = {value}`
- **Consequence:** {What breaks}

---

## Extraction Guide

1. Review library docs: {URL}
2. Search GitHub issues for common problems
3. Look for configuration examples
4. Extract patterns with evidence
5. Map to concept paths

Output: 3 authority source template files

Phase 5: Handoff to User

Input: All setup artifacts created

Process: Generate handoff summary with:

Status report (what was created)
Next steps (Day 1-5 workflow guidance)
Examples (links to httpclient, dbpool)
Skills to use (when to invoke each)

Output Format:

## Dogfooding Exercise: {Domain}

**Status:** ✅ Setup Complete
**Location:** {path}/dogfood/{domain}/
**Hypothesis:** {What we're testing}
**Corpus:** {existing-corpus} ({X}% overlap)

---

### Files Created:

- `README.md` - Overview with hypothesis and quick start
- `plan.md` - Complete 5-day workflow with metrics
- `.aphoria/config.toml` - Persistent mode, corpus enabled
- `.aphoria/claims.toml` - Empty (fill on Day 1)
- `docs/sources/{rfc,vendor,library}.md` - Authority source templates
- `claims-template.toml` - Batch claim creation script skeleton
- `src/.gitkeep` - Placeholder for implementation

---

### Next Steps (Follow plan.md):

**Day 1: Claims Extraction (1-2 hours)**
1. Use `/aphoria-suggest --domain {domain}` to discover corpus patterns
2. Use `/aphoria-claims` to author claims with full provenance
3. Target: {N} claims ({X} reused, {Y} new)
4. Write `DAY1-SUMMARY.md` with metrics

**Day 2: Implementation (2-4 hours)**
1. Write `src/` code with {Z} intentional violations
2. Use inline markers: `@aphoria:claim[category] invariant -- consequence`
3. See `dogfood/httpclient/src/config.rs` for marker examples
4. Keep violations realistic (not contrived)
5. Write `DAY2-SUMMARY.md`

**Day 3: Scanning (1-2 hours)**
1. Run `aphoria scan` in dogfood directory
2. Verify {Z}/{Z} violations detected
3. If misses occur, use `/aphoria-custom-extractor-creator` for extractors
4. Write `DAY3-SUMMARY.md` with detection rate

**Day 4: Remediation (2-4 hours)**
1. Fix violations one by one (progressive)
2. Re-scan after each fix (verify conflict count decreases)
3. Final scan should show 0 conflicts
4. Write `DAY4-SUMMARY.md` with fix times

**Day 5: Documentation (2-3 hours)**
1. Write `DAY5-DOGFOODING-REPORT.md` (comprehensive)
2. See `dogfood/httpclient/DAY5-DOGFOODING-REPORT.md` for template (816 lines)
3. Include: metrics, what worked, what broke, product gaps, recommendations
4. Update README with report link

---

### Examples:

**Complete Exercise (Gold Standard):**
- `dogfood/httpclient/` - Full 5-day exercise
- `dogfood/httpclient/plan.md` - Detailed workflow
- `dogfood/httpclient/DAY5-DOGFOODING-REPORT.md` - Final report (816 lines)
- `dogfood/httpclient/src/config.rs` - Violations with inline markers
- `dogfood/httpclient/claims-template.toml` - Batch claim script

**Alternative Exercise:**
- `dogfood/dbpool/` - Database pool patterns (if exists)

**Skills:**
- `/aphoria-suggest` - Day 1 pattern discovery
- `/aphoria-claims` - Day 1 claim authoring
- `/aphoria-custom-extractor-creator` - Day 3 extractor generation

---

### Success Criteria:

| Metric | Target |
|--------|--------|
| Total time | {X} hrs |
| Pattern reuse | {Y}% |
| Detection rate | {Z}% |
| Naming errors | <2 |
| Time savings | {A}% vs manual |

**You are ready to start Day 1.** Follow `plan.md` and track metrics daily.

Output: User ready to execute manually

Do (10 Imperatives)

Reuse existing corpus patterns - The flywheel works through pattern reuse, not novelty. Always check what corpus exists and map overlaps.
Create 5-day plans with metrics - Every plan needs: time targets, reuse %, detection rate, naming errors. Without metrics, you can't prove flywheel works.
Provide authority source templates - Give users markdown templates for RFCs, vendor docs, libraries. They fill in domain-specific content.
Link to httpclient/dbpool examples - Always reference complete examples. Don't make users guess where examples live.
Validate 30%+ corpus overlap - Before approving domain, estimate pattern overlap. <30% means weak flywheel (pick different domain).
Check for duplicate exercises - Search for existing dogfood exercises before creating new ones. Don't waste effort duplicating.
Quantify success criteria - Every exercise needs: time savings %, reuse %, detection rate. These prove the flywheel works.
Guide user to skills - Tell user WHEN to use /aphoria-suggest, /aphoria-claims, /aphoria-custom-extractor-creator (specific days).
Reference inline marker syntax - Show examples from httpclient: @aphoria:claim[category] invariant -- consequence. Users copy this pattern.
Provide daily summary format - Give template for DAY{N}-SUMMARY.md with metrics table. Users track progress daily.

Do Not (10 Prohibitions)

Generate code - You create structure and plans. User writes src/ code manually following the plan.
Duplicate existing exercises - Check for duplicates BEFORE creating. Don't create httpclient v2 or dbpool v2.
Create without hypothesis - Every exercise needs specific, measurable hypothesis. "Let's try X" is not a hypothesis.
Pick domain with <30% corpus overlap - Weak pattern reuse = weak flywheel. Refuse domains without sufficient corpus.
Skip step back questions - Always ask adversarial questions before setup. Prevent bad exercises upfront.
Assume user knows where examples are - Always link explicitly: dogfood/httpclient/plan.md, dogfood/httpclient/DAY5-DOGFOODING-REPORT.md.
Write vague plans - Don't say "implement library". Say "embed 7 violations: timeout=0, missing backpressure, unbounded queues...".
Forget metrics - Every plan must quantify time, reuse %, detection rate. No metrics = no proof.
Create authority sources from scratch - Provide TEMPLATES only. User fetches actual RFCs, vendor docs, library docs.
Execute the plan - You do setup (structure, plan, templates). User does execution (code, scan, fix, report).

Decision Points (Critical Checkpoints)

Before Domain Selection

Stop. Does {domain} dogfood already exist?

Search project for dogfood/{domain}/
Check existing exercises: httpclient, dbpool, etc.
If exists: STOP, use different domain
If new: Proceed to corpus validation

Before Folder Creation

Stop. Is corpus overlap 30%+?

Calculate:

What existing corpus has overlapping patterns?
Estimate: How many claims will reuse corpus? (count)
Total claims expected? (count)
Reuse rate = reused / total

If <30%: STOP, choose different domain or accept weak flywheel If ≥30%: Proceed to folder creation

Before Plan Writing

Stop. Do we have clear hypothesis?

Check:

Is hypothesis specific? (not "test Aphoria")
Is hypothesis measurable? (can we quantify success?)
Is hypothesis different from existing exercises? (not duplicate)

If vague: STOP, refine hypothesis with user If clear: Proceed to plan writing

Before Handoff

Stop. Are examples linked?

Verify handoff summary includes:

Explicit path to httpclient example (dogfood/httpclient/plan.md)
Explicit path to DAY5 report template (dogfood/httpclient/DAY5-DOGFOODING-REPORT.md)
Explicit path to marker examples (dogfood/httpclient/src/config.rs)
Skills to use with timing (Day 1: /aphoria-suggest, etc.)

If missing: ADD links before handoff If complete: Handoff to user

Constraints (NEVER/ALWAYS)

NEVER

NEVER generate code - User writes src/ manually
NEVER duplicate exercises - Check for existing first
NEVER create without hypothesis - Require specific, measurable hypothesis
NEVER skip corpus validation - Must verify 30%+ overlap
NEVER assume user knows paths - Always link explicitly
NEVER write vague plans - Specifics required (violations, metrics, timing)
NEVER forget metrics - Quantify time, reuse %, detection rate
NEVER execute plan - Setup only, user executes

ALWAYS

ALWAYS validate corpus overlap - 30%+ minimum
ALWAYS link to examples - httpclient, dbpool (explicit paths)
ALWAYS quantify metrics - Time, reuse %, detection rate, naming errors
ALWAYS provide templates - Authority sources, daily summaries, final report
ALWAYS check for duplicates - Search before creating
ALWAYS ask step back questions - Prevent bad exercises upfront
ALWAYS guide to skills - Tell user WHEN to use each skill (Day 1-5)
ALWAYS reference marker syntax - Show examples from httpclient

Templates Section

Folder Structure Template

{aphoria-project}/dogfood/{domain}/
├── README.md                    # Overview (hypothesis, quick start)
├── plan.md                      # 5-day detailed plan
├── .aphoria/
│   ├── config.toml              # Persistent mode, corpus enabled
│   └── claims.toml              # (empty, filled on Day 1)
├── docs/
│   └── sources/                 # Authority sources
│       ├── {rfc}.md             # Standards (Tier 1)
│       ├── {vendor}.md          # Vendor docs (Tier 2)
│       └── {library}.md         # Community (Tier 3)
├── src/                         # (placeholder, user creates)
│   └── .gitkeep
├── claims-template.toml             # Batch creation script (Day 1)
└── DAY1-SUMMARY.md              # (user creates during execution)

README.md Template

# Dogfood: {Domain} Library

**Hypothesis:** {Specific, measurable hypothesis}

**Corpus Overlap:** {existing-corpus} ({X}% pattern reuse expected)

**Target Metrics:**
- Time savings: {X}% vs manual
- Pattern reuse: {Y}% of claims
- Detection rate: {Z}% of violations

---

## Quick Start

1. Read `plan.md` for 5-day workflow
2. Start Day 1: `/aphoria-suggest --domain {domain}`
3. Follow plan, track metrics daily
4. See `dogfood/httpclient/` for complete example

---

## Status

- [ ] Day 1: Claims extraction
- [ ] Day 2: Implementation
- [ ] Day 3: Scanning
- [ ] Day 4: Remediation
- [ ] Day 5: Documentation

---

## References

- Plan: `plan.md`
- Authority sources: `docs/sources/`
- Example: `dogfood/httpclient/`

.aphoria/config.toml Template

[project]
name = "{domain}-dogfood"
version = "0.1.0"

[episteme]
mode = "persistent"
wal_path = ".aphoria/wal"
kv_path = ".aphoria/kv"

[corpus]
enabled = true
sources = [
    "{existing-corpus}",  # e.g., "httpclient", "dbpool"
]

claims-template.toml Template

#!/bin/bash
# Batch claim creation for {domain} dogfood
# Usage: ./claims-template.toml

set -e

echo "Creating claims for {domain} dogfood..."

# Example claim 1 (customize for domain)
aphoria claims create \
  --id "{domain}-001" \
  --concept-path "{domain}/config/timeout" \
  --predicate "value_gt" \
  --value "0" \
  --comparison "greater_than" \
  --provenance "RFC XXXX - Timeout handling" \
  --invariant "Timeout MUST be greater than 0" \
  --consequence "timeout=0 causes indefinite blocking" \
  --tier "expert" \
  --category "safety" \
  --evidence "docs/sources/{rfc}.md" \
  --by "{your-name}"

# Add more claims here...

echo "✅ Claims created successfully"
echo "Verify: cat .aphoria/claims.toml"

Daily Summary Template

# Day {N} Summary: {Focus}

**Date:** {YYYY-MM-DD}
**Duration:** {actual time}
**Status:** ✅ Complete | ⚠️ Blocked | 🚧 In Progress

---

## Metrics

| Metric | Target | Actual | Delta |
|--------|--------|--------|-------|
| Time spent | {target} hrs | {actual} hrs | {+/-} |
| {Day-specific metric} | {target} | {actual} | {+/-} |

**Day 1 specific:**
| Claims authored | {N} | {actual} | {+/-} |
| Corpus reused | {X} | {actual} | {+/-} |
| Reuse rate | {Y}% | {actual}% | {+/-} |

**Day 3 specific:**
| Violations embedded | {Z} | {actual} | {+/-} |
| Violations detected | {Z} | {actual} | {+/-} |
| Detection rate | 100% | {actual}% | {+/-} |

---

## What Worked

- ✅ {Success 1}
- ✅ {Success 2}

---

## What Broke

- ❌ {Problem 1}
  - **Root cause:** {Why it happened}
  - **Workaround:** {How you unblocked}
  - **Product gap?** Yes/No - {If yes, describe}

---

## Next Steps

- [ ] {Task for next day}
- [ ] {Task for next day}

---

## Notes

{Any observations, patterns, insights}

Final Report Template (Abbreviated)

# Dogfooding Report: {Domain} Library

**Date:** {YYYY-MM-DD}
**Duration:** {total days}
**Hypothesis:** {What we tested}
**Result:** ✅ Validated | ⚠️ Partial | ❌ Invalidated

---

## Executive Summary

{2-3 paragraphs: What we did, what we learned, what needs to change}

---

## Metrics

| Metric | Target | Actual | Delta | Analysis |
|--------|--------|--------|-------|----------|
| Total time | {X} hrs | {actual} | {+/-} | {Why different?} |
| Pattern reuse | {Y}% | {actual}% | {+/-} | {Which patterns?} |
| Detection rate | {Z}% | {actual}% | {+/-} | {What missed?} |
| Naming errors | <2 | {actual} | {+/-} | {Examples} |
| Time savings | {A}% | {actual}% | {+/-} | {Calculation} |

---

## What Worked (Flywheel Successes)

### 1. {Success category}
- ✅ {Specific win}
- **Evidence:** {Metric, example, observation}
- **Why it worked:** {Root cause}

---

## What Broke (Product Gaps)

### Priority 1 (Blocker)
- ❌ {Gap title}
  - **Problem:** {What broke}
  - **Root cause:** {Why it broke}
  - **Impact:** {Who is affected, how severely}
  - **Recommendation:** {What to build/fix}

### Priority 2 (Major)
...

### Priority 3 (Minor)
...

---

## Product Gap Analysis

| Gap ID | Title | Severity | Effort | ROI | Priority |
|--------|-------|----------|--------|-----|----------|
| VG-XXX | {Title} | High | Medium | High | P1 |

---

## Recommendations

### Immediate (This sprint)
1. {Action with clear owner/timeline}

### Short-term (Next 2 sprints)
1. {Action}

### Long-term (Roadmap)
1. {Action}

---

## Appendices

### Appendix A: Daily Summaries
- [Day 1](./DAY1-SUMMARY.md)
- [Day 2](./DAY2-SUMMARY.md)
...

### Appendix B: Claims Created
{List all claims with IDs}

### Appendix C: Violations Embedded
{List all violations with consequences}

See dogfood/httpclient/DAY5-DOGFOODING-REPORT.md for complete 816-line example.

Output Format (What Skill Produces)

After successful setup, output this summary:

## Dogfooding Exercise: {Domain}

**Status:** ✅ Setup Complete
**Location:** {path}/dogfood/{domain}/
**Hypothesis:** {What we're testing}
**Corpus:** {existing-corpus} ({X}% overlap)

---

### Files Created:

dogfood/{domain}/ ├── README.md ├── plan.md (5-day workflow) ├── .aphoria/config.toml ├── .aphoria/claims.toml (empty) ├── docs/sources/ │ ├── {rfc}.md │ ├── {vendor}.md │ └── {library}.md ├── src/.gitkeep └── claims-template.toml


---

### Next Steps:

**Day 1: Claims Extraction (1-2 hours)**
- Use `/aphoria-suggest --domain {domain}`
- Use `/aphoria-claims` to author
- Target: {N} claims ({X} reused, {Y} new)

**Day 2: Implementation (2-4 hours)**
- Write `src/` code with {Z} violations
- Use inline markers: `@aphoria:claim[category] invariant -- consequence`
- See `dogfood/httpclient/src/config.rs` for examples

**Day 3: Scanning (1-2 hours)**
- Run `aphoria scan`
- Verify {Z}/{Z} violations detected
- Use `/aphoria-custom-extractor-creator` if needed

**Day 4: Remediation (2-4 hours)**
- Fix violations progressively
- Re-scan after each fix
- Final scan should show 0 conflicts

**Day 5: Documentation (2-3 hours)**
- Write `DAY5-DOGFOODING-REPORT.md`
- See `dogfood/httpclient/DAY5-DOGFOODING-REPORT.md` (816 lines)

---

### Examples:

- Complete exercise: `dogfood/httpclient/`
- Plan template: `dogfood/httpclient/plan.md`
- Final report: `dogfood/httpclient/DAY5-DOGFOODING-REPORT.md`
- Marker examples: `dogfood/httpclient/src/config.rs`
- Alternative: `dogfood/dbpool/` (if exists)

---

### Skills:

| Skill | When |
|-------|------|
| `/aphoria-suggest` | Day 1 - Pattern discovery |
| `/aphoria-claims` | Day 1 - Claim authoring |
| `/aphoria-custom-extractor-creator` | Day 3 - Extractor generation |

---

**You are ready to start Day 1.** Follow `plan.md` and track metrics daily.

Skill	When to Use
`/aphoria-suggest`	Day 1: Discover reusable patterns from existing corpus
`/aphoria-claims`	Day 1: Author claims with full provenance (invariant, consequence, authority tier)
`/aphoria-custom-extractor-creator`	Day 3: Generate extractors when violations are missed during scan
`/aphoria-corpus-import`	Before Day 1: Import external docs (RFCs, wikis) to build corpus for reuse

Examples Section

Complete Example: HTTP Client

The httpclient dogfood exercise is the gold standard. Reference it for:

Files:

dogfood/httpclient/plan.md - 5-day workflow with metrics and success criteria
dogfood/httpclient/DAY5-DOGFOODING-REPORT.md - Comprehensive 816-line report
dogfood/httpclient/src/config.rs - Violations with inline markers (@aphoria:claim)
dogfood/httpclient/claims-template.toml - Batch claim creation script
dogfood/httpclient/docs/sources/ - Authority source extracts (RFCs, Mozilla, Requests)

Metrics:

62.5% time savings (claims in 1-2 hrs vs 4-5 hrs manual)
41% pattern reuse (9/22 claims from corpus)
0 naming errors (vs 3-5 expected)
7 violations embedded and detected

Product Gaps Identified:

VG-015: Violation detection broken (declarative extractors don't load)
VG-016: No batch claim import
VG-017: No violation markers in scan output
VG-018: No progressive fix workflow
VG-019: No daily summary templates

How to use:

Copy plan.md structure for new domains
Use DAY5-DOGFOODING-REPORT.md as final report template
Replicate marker syntax from src/config.rs
Adapt claims-template.toml for batch operations

Alternative Example: Database Pool (if exists)

If dogfood/dbpool/ exists, reference it for:

Connection lifecycle patterns
Resource limit claims
Cleanup violation examples
Different domain showing pattern reuse in action

Workflow Summary

User invokes: /aphoria-dogfood --domain msgqueue --hypothesis "async patterns transfer from httpclient"
Skill validates:
- Step back questions (why, corpus overlap, violations, duplicates)
- Corpus overlap calculation (30%+ required)
- Duplicate check (search for existing exercises)
Skill creates:
- Folder structure (dogfood/{domain}/)
- Detailed plan (plan.md with 5-day workflow)
- Authority source templates (docs/sources/*.md)
- Config files (.aphoria/config.toml)
- Batch script (claims-template.toml)
Skill outputs:
- Setup complete summary
- Next steps (Day 1-5 guidance)
- Examples (links to httpclient, dbpool)
- Skills (when to use each)
User executes:
- Follow plan.md manually over 5 days
- Track metrics daily
- Write comprehensive report on Day 5

Key Insights

Why Dogfooding Matters

Dogfooding validates the autonomous learning flywheel:

Commit → observations → patterns → guidance → trust → more commits
Structured decisions compound (not ML training)
Pattern reuse is the magic (30%+ overlap required)

Why Metrics Matter

Without quantification, you can't prove flywheel works:

Time savings: Faster than manual = automation value
Reuse rate: Higher = stronger flywheel
Detection rate: Higher = better violation catching
Naming errors: Lower = better corpus quality

Why Examples Matter

Users shouldn't guess how to structure exercises:

httpclient: Complete 5-day reference
Templates: Copy, don't invent
Markers: Syntax examples prevent errors

Constraints Reminder

This Skill Is User-Global

❌ No StemeDB-specific paths (/home/jml/Workspace/stemedb/...)
❌ No assumptions about project structure
✅ Generic guidance: "Create at {your-aphoria-project}/dogfood/{domain}/"
✅ Reference examples: "See dogfood/httpclient/plan.md"
✅ Relative paths only

This Skill Does Setup Only

✅ Creates folder structure
✅ Writes plan.md
✅ Provides templates
✅ Guides to examples
❌ Does NOT generate code
❌ Does NOT execute plan
❌ Does NOT write authority sources (templates only)

This Skill Enforces Quality

✅ Validates corpus overlap (30%+ minimum)
✅ Checks for duplicates (refuses if exists)
✅ Requires hypothesis (specific, measurable)
✅ Demands metrics (time, reuse %, detection rate)
✅ Links to examples (explicit paths)

You are ready to create dogfooding exercises. Remember: validate first (step back questions), create structure (folders, plan, templates), handoff to user (they execute manually).

37 KiB Raw Permalink Blame History Unescape Escape

Aphoria Dogfooding Exercise Setup

Core Concept: What Is Dogfooding?

The httpclient Example (Gold Standard)

Principles

1. Test Something New (Hypothesis Required)

2. Reuse Is the Magic (30%+ Corpus Overlap)

3. Violations Must Be Intentional (7-10 with Consequences)

4. Quantify Everything (Metrics Required)

5. Follow the 5-Day Arc (Claims → Code → Scan → Fix → Report)

Step Back: Before Creating Exercise

1. Why This Domain? (Hypothesis)

2. Is There Corpus to Reuse? (Essential)

3. Are Violations Embeddable? (7-10 Clear)

4. Does Similar Exercise Exist? (Don't Duplicate)

5. Can We Quantify Success? (Metrics)

Setup Protocol (Main Workflow)

Phase 1: Domain Selection & Validation

Phase 2: Folder Structure Creation

Phase 3: Plan Writing (5-Day Template)

Phase 2: Baseline Scan (15 min)

Phase 3: Gap Analysis (15 min) [REQUIRED]

Phase 4: Extractor Creation (30 min) [REQUIRED - DO NOT SKIP]

Phase 5: Verification Scan (15 min) [REQUIRED]

Phase 6: Documentation (15 min) [REQUIRED]

Day 4: Remediation (2-4 hours)

Day 5: Documentation (2-3 hours)

Success Metrics

Authority Sources

{Source 1} (Tier {X})

{Source 2} (Tier {X})

References

Phase 5: Handoff to User

Do (10 Imperatives)

Do Not (10 Prohibitions)

Decision Points (Critical Checkpoints)

Before Domain Selection

Before Folder Creation

Before Plan Writing

Before Handoff

Constraints (NEVER/ALWAYS)

NEVER

ALWAYS

Templates Section

Folder Structure Template

README.md Template

.aphoria/config.toml Template

claims-template.toml Template

Daily Summary Template

Final Report Template (Abbreviated)

Output Format (What Skill Produces)

Related Skills

Examples Section

Complete Example: HTTP Client

Alternative Example: Database Pool (if exists)

Workflow Summary

Key Insights

Why Dogfooding Matters

Why Metrics Matter

Why Examples Matter

Constraints Reminder

This Skill Is User-Global

This Skill Does Setup Only

This Skill Enforces Quality

37 KiB

Raw Permalink Blame History