stemedb/applications/aphoria/dogfood/dbpool/eval-archive-2026-02-09/EVALUATION-REPORT-2026-02-09-run2.md

Documentation Evaluation Report - Run 2

Project: dogfood/dbpool
Evaluation Date: 2026-02-09
Documentation Evaluated:

  • CHECKLIST.md (Days 1-2)
  • plan.md
  • README.md
  • docs/claim-extraction-example.md

Team Phase: Completed Day 2 (Implementation)


Executive Summary

Overall Assessment: Team produced excellent Day 2 implementation but completely skipped Day 1, creating a critical blocker for Day 3.

Critical Finding: Documentation presents Days 1-5 as parallel reference sections rather than sequential prerequisites. Team executed Day 2 perfectly (7/7 files, 21/21 tests passing, all violations embedded) but created 0/27 corpus claims from Day 1.

Impact: Day 3 scanning cannot proceed (scan requires claims). Estimated 8-11 hours lost (4-5 hours on Day 2, plus 4-6 hours to backfill Day 1).

Gaps Found: 5 documentation gaps (2 critical)

  • Missing Information: 2 gaps
  • Unclear Instructions: 2 gaps
  • Buried Information: 1 gap

Team Errors (Not Gaps): 0

Critical Blockers: 1 (Day 1 skipped - prevents Day 3 scan)


Critical Findings (High Priority)

Finding 1: No Prerequisite Relationship Between Days

Type: Missing Information
Impact: BLOCKER - Team skipped Day 1, cannot proceed to Day 3

What Happened:

  • Team read CHECKLIST.md Day 1 section
  • Team understood Day 1 requirements (progress log shows "Ready to Build Claims")
  • Team proceeded directly to Day 2 implementation
  • Team created 0/27 corpus claims
  • Day 3 scan will return 0 violations (nothing to compare against)

Evidence:

Team execution:

# Day 1 requirement
$ curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor' | \
    jq '.items | map(select(.subject | startswith("dbpool"))) | length'
0

# Day 2 execution
$ ls src/
config.rs  connection.rs  error.rs  lib.rs  pool.rs

$ cargo test
test result: ok. 21 passed; 0 failed

Documentation does NOT say "Complete Day 1 before Day 2":

CHECKLIST.md:103:
## Day 1: Create 25-30 Corpus Claims

[...about 170 lines of Day 1 content...]

CHECKLIST.md:276:
## Day 2: Implementation - Information Needed

Root Cause:

Documentation structure implies days are sections of a reference document, not sequential workflow steps.

Location: CHECKLIST.md, immediately before the Day 2 heading at line 276

Fix Required:

Add explicit checkpoint between Day 1 and Day 2:

---

✅ **Day 1 Complete** when verification shows 25-30 claims in corpus

**⛔ CHECKPOINT: DO NOT PROCEED TO DAY 2 WITHOUT COMPLETING DAY 1**

Day 2 implementation requires corpus claims to exist for Day 3 scanning.
Without claims, scan will return 0 violations and the dogfood demo cannot proceed.

**Verify before continuing:**
```bash
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor' | \
  jq '.items | map(select(.subject | startswith("dbpool"))) | length'
# Must show: 25-30 (current: 0)
```

If verification fails, complete Day 1 checkboxes (27 claims) before proceeding.

---

## Day 2: Implementation - Information Needed

Priority: CRITICAL - Must fix before next team


Finding 2: No Automated Verification Between Days

Type: Missing Information
Impact: BLOCKER ENABLER - Nothing prevents sequence violation

What Happened:

  • Success criteria exist in Day 1 (CHECKLIST.md:105-110)
  • Team did not run verification command
  • Day 2 section does not require Day 1 verification
  • No automated check prevents Day 2 without Day 1

Evidence:

Documentation shows the success criteria command but doesn't require running it:

CHECKLIST.md:105:
**Success Criteria:**
```bash
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor' | \
  jq '.items | map(select(.subject | startswith("dbpool"))) | length'
# Expected output: 25-30
```

Team behavior:

  • Did not run this command before Day 2
  • Proceeded to Day 2 without verification
  • No automated check caught the violation

Root Cause:

Success criteria presented as "expected output" documentation, not as "you must run this" checkpoint.

Location: Need new script + CHECKLIST.md Day 2 prerequisite

Fix Required:

1. Create automated verifier:

File: scripts/verify-day1.sh

#!/bin/bash
# Verify Day 1 completion before proceeding to Day 2

set -e

echo "=== Day 1 Verification ==="
echo

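# Count corpus claims whose subject starts with "dbpool".
# Falls back to 0 if the server is unreachable or the response is not valid JSON.
CLAIMS_COUNT=$(curl -s 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor' 2>/dev/null | \
  jq '.items | map(select(.subject | startswith("dbpool"))) | length' 2>/dev/null || true)
CLAIMS_COUNT=${CLAIMS_COUNT:-0}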

if [ "$CLAIMS_COUNT" -ge 25 ] && [ "$CLAIMS_COUNT" -le 30 ]; then
    echo "✓ Day 1 complete: $CLAIMS_COUNT claims in corpus"
    exit 0
else
    echo "✗ Day 1 incomplete: $CLAIMS_COUNT claims (expected 25-30)"
    echo
    echo "Complete Day 1 before proceeding:"
    echo "  1. Read: cat docs/claim-extraction-example.md"
    echo "  2. Create: Follow CHECKLIST.md Day 1, Step 3 (27 checkbox items)"
    echo "  3. Verify: Run this script again"
    exit 1
fi

2. Add to Day 2 start:

CHECKLIST.md:276:

## Day 2: Implement Code with Intentional Violations

**Prerequisites:** Day 1 complete (25-30 claims in corpus)

- [ ] **Verify Day 1 completion**
  ```bash
  ./scripts/verify-day1.sh
  ```
  **⛔ Must pass before proceeding**

  Expected output:
  ```
  === Day 1 Verification ===

  ✓ Day 1 complete: 27 claims in corpus
  ```

  If verification fails, return to Day 1 and complete all 27 claim checkboxes.

Priority: CRITICAL - Prevents future sequence violations
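
The range check that verify-day1.sh performs can also be sketched as a standalone function, which makes it testable without a running server. This is a minimal sketch, not part of the proposed script: `claims_in_range` is a hypothetical name, and treating empty or non-numeric input as a failure is an assumption about how a curl or jq error should be handled.

```bash
#!/bin/bash
# Hypothetical helper: succeeds only when the claim count is an integer
# within [min, max]. The defaults mirror the 25-30 range used in this report.
claims_in_range() {
    local count="$1" min="${2:-25}" max="${3:-30}"
    # Reject empty or non-numeric input (e.g. when curl or jq produced nothing).
    case "$count" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$count" -ge "$min" ] && [ "$count" -le "$max" ]
}

# Demonstration with fixed inputs; no server is contacted.
for sample in 0 27 31 ""; do
    if claims_in_range "$sample"; then
        echo "count='$sample': Day 1 complete"
    else
        echo "count='$sample': Day 1 incomplete"
    fi
done
```

In verify-day1.sh the function would be called as `claims_in_range "$CLAIMS_COUNT"`, with the existing success and failure messages in the two branches.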


Medium Priority Improvements

Finding 3: Day 2 Heading Implies Reference, Not Action

Type: Unclear Instructions
Impact: Contributes to day sequence confusion

What Happened:

  • Day 1 heading: "Create 25-30 Corpus Claims" (action verb)
  • Day 2 heading: "Implementation - Information Needed" (passive tone)

Team may have interpreted Day 2 as reference material rather than sequential action.

Location: CHECKLIST.md:276

Fix:

Change heading and add structured metadata:

-## Day 2: Implementation - Information Needed
+## Day 2: Implement Code with Intentional Violations
+
+**Prerequisites:** Day 1 complete (25-30 claims in corpus)
+
+**Deliverable:** Working Rust library with 7 intentional violations
+
+**Success Criteria:**
+```bash
+cargo test
+# Expected: 21/21 tests pass (violations are semantic, not syntax)
+```
+
+**Estimated Time:** 4-5 hours
+
+---

Priority: MEDIUM (Improves clarity, prevents confusion)


Finding 4: README Day-by-Day Table Shows All Days Equally

Type: Unclear Instructions
Impact: First impression suggests parallel sections

What Happened:

README.md shows all days in table with equal visual weight. No indication of prerequisites or sequence.

Location: README.md:70-78

Fix:

Add prerequisites column:

| Day | Focus | Key Deliverable | Prerequisites | Time |
|-----|-------|-----------------|---------------|------|
| **Day 1** | Corpus Building | 25-30 claims created via CLI | *(start here)* | 4-6 hours |
| **Day 2** | Implementation | Working code with 7-8 intentional violations | Day 1 ✓ | 4-5 hours |
| **Day 3** | Scanning | Initial scan showing all violations | Day 2 ✓ | 2-3 hours |
| **Day 4** | Remediation | Progressive fixes with re-scans | Day 3 ✓ | 4-5 hours |
| **Day 5** | Documentation | Success story, demo materials | Day 4 ✓ | 3-4 hours |

**⚠️ IMPORTANT:** Days must be completed sequentially. Each day requires the previous day's deliverable.

**Verification checkpoints:**
- After Day 1: Run `./scripts/verify-day1.sh` (must show 25-30 claims)
- After Day 2: Run `cargo test` (must show 21/21 passing)
- After Day 3: Check scan results (must show 7-8 violations)

Priority: MEDIUM (First thing team sees, sets expectations)
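
`cargo test` usually prints one `test result:` line per test binary (unit, integration, doc tests), so the Day 2 checkpoint of 21 passing tests may need the per-line counts summed. A minimal sketch, assuming the standard summary format; `total_passed` is a hypothetical helper name, and the sample lines are illustrative, not captured output.

```bash
#!/bin/bash
# Sum the "N passed" counts from every "test result:" line on stdin.
total_passed() {
    awk '/^test result:/ {
             for (i = 2; i <= NF; i++)
                 if ($i == "passed;") sum += $(i - 1)
         }
         END { print sum + 0 }'
}

# Illustrative summary lines using the 13 unit / 3 integration / 5 doc
# test split that this report gives for the team's run.
sample_output='test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
test result: ok. 3 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out'

passed=$(printf '%s\n' "$sample_output" | total_passed)
if [ "$passed" -eq 21 ]; then
    echo "Day 2 checkpoint passed: $passed tests"
else
    echo "Day 2 checkpoint failed: $passed tests passed (expected 21)"
fi
```

Against a real run, the same function could be fed with `cargo test 2>&1 | total_passed`. Failed tests are not counted here, so a fuller check would also confirm that every summary line reports `ok`.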


Low Priority Polish

Finding 5: plan.md Status Table Lacks Prerequisite Column

Type: Buried Information
Impact: Visual parity - all days shown with equal status

Location: plan.md:88-96

Fix:

Add prerequisites column to status table:

| Phase | Status | Prerequisites | Completed | Notes |
|-------|--------|---------------|-----------|-------|
| Day 1: Preparation | 🔄 IN PROGRESS | None | 2026-02-09 | Corpus building |
| Day 2: Implementation | ⏳ PENDING | Day 1 ✓ | - | Requires claims in corpus |
| Day 3: First Scan | ⏳ PENDING | Day 2 ✓ | - | Requires code with violations |
| Day 4: Remediation | ⏳ PENDING | Day 3 ✓ | - | Requires scan results |
| Day 5: Documentation | ⏳ PENDING | Day 4 ✓ | - | Requires fixed code |

Priority: LOW (Status table is for tracking, not primary instructions)


Team Errors (For Reference)

NONE IDENTIFIED

Team behavior was systematic and logical given the documentation:

  • Read documentation thoroughly (progress log shows understanding)
  • Executed Day 2 perfectly (100% adherence to specifications)
  • Did not skip steps within Day 2 (all 7 files, all violations)
  • Comprehensive testing (21/21 tests passing)

This is NOT a team error - this is a documentation failure.

Documentation failed to communicate that Day 1 is a blocking prerequisite for Day 2.


What Team Did Right

Excellent Day 2 Implementation

Files Created: 7/7 (100%)

  • Cargo.toml (matches dependencies exactly)
  • src/lib.rs (clean module structure)
  • src/config.rs (5 violations perfectly embedded)
  • src/pool.rs (2 violations perfectly embedded)
  • src/connection.rs (clean placeholder)
  • src/error.rs (proper thiserror usage)
  • tests/basic.rs (3 integration tests)

Violations Embedded: 7/7 (100%)

  1. Unbounded max_connections (config.rs:25)
  2. Plaintext password (config.rs:73)
  3. Missing max_lifetime (config.rs:72)
  4. Excessive connection_timeout (config.rs:71)
  5. Zero min_connections (config.rs:70)
  6. No connection validation (pool.rs:78)
  7. No metrics exposed (pool.rs:24)

Tests Passing: 21/21 (100%)

  • Unit tests: 13/13
  • Integration tests: 3/3
  • Doc tests: 5/5

Code Quality: Excellent

  • Clean architecture
  • Proper async/await usage
  • Good error handling
  • Comprehensive inline documentation
  • Every violation documented with claim reference and consequence

Example of excellent violation documentation:

/// **VIOLATION 1**: Set to `None` (unbounded growth)
/// - Violates: `dbpool/max_connections` required claim
/// - Consequence: Pool grows without limit, exhausts database connections
pub max_connections: Option<usize>,


Recommendations

Immediate (Before Next Team)

Must implement to prevent repeat of this issue:

  1. Add checkpoint between Day 1 and Day 2 (Finding 1)

    • Location: CHECKLIST.md, immediately before the Day 2 heading at line 276
    • Add: "⛔ DO NOT PROCEED WITHOUT DAY 1 COMPLETE"
    • Estimated time: 5 minutes
  2. Create verify-day1.sh script (Finding 2)

    • Location: scripts/verify-day1.sh
    • Content: Check claims count 25-30, exit 1 if fails
    • Estimated time: 10 minutes
  3. Add Day 1 verification to Day 2 start (Finding 2)

    • Location: CHECKLIST.md:276
    • Add: Prerequisite checkbox requiring verify-day1.sh pass
    • Estimated time: 5 minutes

Total immediate work: ~20 minutes

Short Term (This Week)

Should implement for clarity:

  1. Update Day 2 heading (Finding 3)

    • Add: Prerequisites, deliverable, success criteria
    • Estimated time: 10 minutes
  2. Update README table (Finding 4)

    • Add: Prerequisites column
    • Add: Warning about sequential execution
    • Estimated time: 10 minutes

Total short-term work: ~20 minutes

Long Term (Next Month)

Nice to have for completeness:

  1. Update plan.md table (Finding 5)

    • Add: Prerequisites column
    • Estimated time: 5 minutes
  2. Create automated day sequencer

    • New script: scripts/check-day-sequence.sh
    • Checks: Day N complete before Day N+1 starts
    • Integration: Add to pre-flight validator
    • Estimated time: 30 minutes

Total long-term work: ~35 minutes
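
One way the proposed check-day-sequence.sh could work is sketched here. Everything beyond the "Day N complete before Day N+1 starts" rule is an assumption: the marker-file layout (`$STATUS_DIR/dayN.done`), the `.dogfood-status` directory, and the function names are all hypothetical. A real version would more likely call checks such as verify-day1.sh or `cargo test` instead of looking for files.

```bash
#!/bin/bash
# Hypothetical layout: a file "$STATUS_DIR/dayN.done" marks Day N as verified.
STATUS_DIR="${STATUS_DIR:-.dogfood-status}"

# Succeeds when the marker file for the given day exists.
day_complete() {
    [ -f "$STATUS_DIR/day$1.done" ]
}

# Print the first incomplete prerequisite of Day $1 and return 1,
# or print nothing and return 0 when Days 1..N-1 are all complete.
first_missing_prerequisite() {
    local target="$1" day=1
    while [ "$day" -lt "$target" ]; do
        if ! day_complete "$day"; then
            echo "$day"
            return 1
        fi
        day=$((day + 1))
    done
    return 0
}

# Demonstration in a throwaway directory: only Day 1 is marked complete.
STATUS_DIR=$(mktemp -d)
touch "$STATUS_DIR/day1.done"
for target in 2 3; do
    if missing=$(first_missing_prerequisite "$target"); then
        echo "Day $target may start"
    else
        echo "⛔ Day $target blocked: Day $missing is not complete"
    fi
done
rm -rf "$STATUS_DIR"
```

Reporting the first missing day, instead of only pass or fail, tells the team exactly where to resume.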


Recovery Path for Current Team

Team is currently blocked. They cannot proceed to Day 3 without Day 1 completion.

Step 1: Inform Team

⛔ CHECKPOINT FAILURE DETECTED

Your Day 2 implementation is excellent (7/7 files, 21/21 tests passing, all violations embedded).

However, Day 1 was not completed:
- Expected: 25-30 claims in corpus
- Actual: 0 claims

Day 3 scanning requires claims to exist. Without claims, scan will return 0 violations.

You must backfill Day 1 before proceeding.

Step 2: Verify Current State

# Confirm Day 1 incomplete
./scripts/verify-day1.sh
# Expected: ✗ Day 1 incomplete: 0 claims (expected 25-30)

# Confirm Day 2 complete
cargo test
# Expected: test result: ok. 21 passed

Step 3: Complete Day 1

# Follow CHECKLIST.md Day 1
# Create all 27 claims using aphoria corpus create CLI
# Estimated time: 4-6 hours

Step 4: Verify Day 1 Completion

./scripts/verify-day1.sh
# Expected: ✓ Day 1 complete: 27 claims in corpus

Step 5: Proceed to Day 3

# Now scanning will work
aphoria scan --format json > scan-results-v1.json
# Expected: 7-8 violations detected

Estimated recovery time: 4-6 hours


Lessons Learned

Documentation Principle Violated

Principle: "Explicit > Implicit"

What we did (wrong):

  • Implicitly suggested sequence through day numbers (1, 2, 3)
  • Implicitly suggested prerequisites through prose ("you'll need claims for scanning")
  • Assumed readers would infer Day 1 must complete before Day 2

What we should do (right):

  • Explicitly state "Complete Day 1 before Day 2" in bold/emoji
  • Explicitly check prerequisite completion with automated script
  • Explicitly block progression with "DO NOT PROCEED" checkpoint

Agent vs Human Documentation

New insight: Agent interpreters need more explicit sequencing than humans.

Human reasoning:

"Day 1 comes before Day 2, so I should probably do Day 1 first"

Agent reasoning:

"Both sections are documented. I was told to 'go through every step' so I'll execute the implementation steps in Day 2"

Implication: Documentation for agent workflows needs:

  • Explicit prerequisite statements, not implicit ordering
  • Automated verification checkpoints
  • Visual/textual blocking indicators (⛔, STOP, DO NOT PROCEED)
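
The first of these requirements can itself be checked mechanically. The sketch below flags any `## Day N` heading (N of 2 or more) whose section has no `**Prerequisites:**` line, the format proposed in Finding 3. `lint_prerequisites` is a hypothetical helper, and the assumption that the checklist uses `## Day N` headings comes from the CHECKLIST.md excerpts quoted in this report.

```bash
#!/bin/bash
# Print each "## Day N" heading (N >= 2) whose section lacks a
# "**Prerequisites:**" line; return non-zero if any are found.
lint_prerequisites() {
    awk '
        function report() {
            if (heading != "" && !found) { print heading; missing = 1 }
        }
        /^## / {
            report()
            heading = ""; found = 0
            # $3 is the day number with its trailing colon, e.g. "2:".
            if (($0 ~ /^## Day [0-9]+/) && ($3 + 0 >= 2)) heading = $0
            next
        }
        /^\*\*Prerequisites:\*\*/ { found = 1 }
        END { report(); exit missing }
    '
}

# Demonstration on the two headings quoted from CHECKLIST.md in Finding 1.
if lint_prerequisites <<'EOF'
## Day 1: Create 25-30 Corpus Claims
## Day 2: Implementation - Information Needed
EOF
then
    echo "all Day sections declare prerequisites"
else
    echo "the headings listed above need a Prerequisites line"
fi
```

Run as `lint_prerequisites < CHECKLIST.md`, a check like this could sit alongside the pre-flight validator so that a missing prerequisite statement is caught before a team starts.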

The "Information Needed" Anti-Pattern

Problem: Day 2 heading says "Implementation - Information Needed"

Team interpreted: Reference material to consult

Should have been: "Implement Code with Intentional Violations"

Learning: Use action verbs in headings, avoid passive/reference tone


Success Metrics (Post-Fix)

After implementing recommended fixes, next team should achieve:

Day 1 Completion:

  • 25-30 claims created
  • Verification command run successfully
  • Checkpoint passed before Day 2

Day 2 Execution:

  • Cannot proceed without Day 1 verified
  • Implementation matches current team's quality
  • Sequential workflow maintained

Day 3 Scanning:

  • Scan detects 7-8 violations
  • No confusion about why violations were detected
  • Demonstration premise intact

Time Saved: 4-6 hours (no backfill needed)
Blocker Prevention: 100% (automated verification prevents sequence violation)


Appendices

  • Progress Log: eval/progress-log-2026-02-09-run2.md
  • Implementation Review: eval/implementation-review-2026-02-09-run2.md
  • Gap Analysis: eval/gap-analysis-2026-02-09-run2.md

Evaluation Complete: 2026-02-09T23:30:00Z
Next Action: Implement immediate fixes (20 minutes) before notifying team of recovery path