stemedb/applications/aphoria/dogfood/dbpool/eval-archive-2026-02-09/EVALUATION-REPORT-2026-02-09.md

# Documentation Evaluation Report

**Project:** dogfood/dbpool
**Evaluation Date:** 2026-02-09
**Documentation Evaluated:**

- applications/aphoria/dogfood/dbpool/plan.md
- applications/aphoria/dogfood/dbpool/CHECKLIST.md
- applications/aphoria/dogfood/dbpool/docs/claim-extraction-example.md

**Team Phase:** Day 1 - Corpus Building
**Completion Status:** 10% (source docs fetched, 0/25-30 claims created)


---

## Executive Summary

**Critical Finding:** Team stopped after fetching source documents, believing Day 1 was complete. Zero claims created (expected 25-30). Day 1 is 10% complete, not 90%+ as team believed.

**Root Cause:** CHECKLIST.md structures Day 1 as "Information Needed" with checkboxes only for source document fetching. Actual deliverable (creating 25-30 claims via CLI) is presented as prose without checkboxes, causing team to interpret source fetching as completion.

**Documentation Gaps Found:** 5 (3 High priority, 2 Medium priority)

- Missing: Completion criteria, step numbers
- Unclear: "Information Needed" heading misleads
- Buried: Claim creation lacks checkboxes, example not integrated

**Team Errors:** 0 (team followed doc structure exactly)

**Impact:** Cold-start success rate revised from 85-90% to 10% based on observed completion.


---

## Critical Findings (High Priority)

### Finding 1: Unclear Day 1 Completion Criteria

**Impact:** Team believes Day 1 is complete when no claims exist (0 of 25-30 created)

**Location:** CHECKLIST.md:103-104

**Current State:**
```markdown
## Day 1: Corpus Building - Information Needed
```

**Problem:** No explicit success criteria stating "Day 1 complete when 25-30 claims verified in corpus"

**Fix:** Add completion criteria upfront

````markdown
## Day 1: Create 25-30 Corpus Claims

**Deliverable:** 25-30 claims created via CLI and verified in corpus database

**Success Criteria:**
```bash
curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor' | \
  jq '.items | map(select(.subject | startswith("dbpool"))) | length'
# Expected output: 25-30
```

**Estimated Time:** 4-6 hours
````


**Priority:** HIGH - This is a blocker. Team cannot proceed to Day 2 without claims.
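
The success criterion above prints a number that the team still has to interpret. As a minimal sketch, the count could be turned into an explicit pass/fail gate. `day1_gate`, `MIN_CLAIMS`, and `MAX_CLAIMS` are hypothetical names introduced here for illustration and are not part of aphoria. Treating counts above 30 as outside the target is also an assumption.

```bash
# Hypothetical helper (not part of aphoria): turn a claim count into a pass/fail gate.
MIN_CLAIMS=25   # lower bound of the Day 1 target
MAX_CLAIMS=30   # upper bound of the Day 1 target

day1_gate() {
  count="$1"
  # Anything other than a plain non-negative integer means the query failed.
  case "$count" in
    ''|*[!0-9]*)
      echo "Day 1 INCOMPLETE: could not read a claim count"
      return 2
      ;;
  esac
  if [ "$count" -ge "$MIN_CLAIMS" ] && [ "$count" -le "$MAX_CLAIMS" ]; then
    echo "Day 1 COMPLETE: $count claims in corpus"
    return 0
  fi
  echo "Day 1 INCOMPLETE: $count claims in corpus (expected $MIN_CLAIMS-$MAX_CLAIMS)"
  return 1
}

# Usage, reusing the verification command from this report:
#   day1_gate "$(curl -s 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor' | \
#     jq '.items | map(select(.subject | startswith("dbpool"))) | length')"
```

A non-zero exit status gives the checklist an unambiguous stop signal and leaves no count to interpret.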

---

### Finding 2: "Information Needed" Heading Misleads

**Impact:** Team interprets Day 1 as "gather prerequisites" not "execute work"

**Location:** CHECKLIST.md:103

**Current State:**
```markdown
## Day 1: Corpus Building - Information Needed
```

**Problem:** "Information Needed" implies passive preparation, not active execution

**Fix:** Change to action-oriented heading

```markdown
## Day 1: Create 25-30 Corpus Claims

**What you're doing:** Extract claims from authority sources and create them via CLI
**How long:** 4-6 hours
**Done when:** Verification command returns 25-30 claims
```

**Priority:** HIGH - Misleading heading causes fundamental misunderstanding of Day 1 scope


---

### Finding 3: Claim Creation Lacks Checkboxes

**Impact:** Team completes checkbox items (source docs) and stops, skipping prose section (claim creation)

**Location:** CHECKLIST.md:157-200

**Current State:**

- ✓ Source docs have checkboxes (lines 126-153)
- ✗ Claim creation is prose without checkboxes (lines 157-180)

**Problem:** Checkboxes signal "this is the task." Without checkboxes, team treats the section as reference info.

**Fix:** Convert claim creation to checkbox format

````markdown
### ✅ Create Corpus Claims (25-30 total)

- [ ] **Safety Claims (10 claims)**
  - [ ] `dbpool/max_connections` - required: true
  - [ ] `dbpool/min_connections` - min_value: 2
  - [ ] `dbpool/connection_timeout` - max_value: 30
  - [ ] `dbpool/idle_timeout` - required: true
  - [ ] `dbpool/idle_timeout/relationship` - must_be_less_than: server_wait_timeout
  - [ ] `dbpool/max_lifetime` - required: true
  - [ ] `dbpool/max_lifetime/default` - default_value: 1800
  - [ ] `dbpool/validation_timeout` - max_value: 3
  - [ ] `dbpool/leak_detection_threshold` - recommended: true
  - [ ] `dbpool/max_connections/upper_bound` - max_value: database_max - 10

- [ ] **Performance Claims (8 claims)**
  - [ ] `dbpool/max_connections/development` - default_value: 10
  - [ ] `dbpool/max_connections/production` - recommended_range: 50-100
  - [ ] `dbpool/checkout_timeout` - default_value: 5
  - [ ] `dbpool/validation/frequency` - required: on_checkout
  - [ ] `dbpool/connection_test_query` - recommended: SELECT 1
  - [ ] `dbpool/prefill` - recommended: true (production)
  - [ ] `dbpool/fair_queue` - default_value: true
  - [ ] `dbpool/metrics/enabled` - recommended: true

- [ ] **Security Claims (5 claims)**
  - [ ] `dbpool/connection_string/password` - must_not_be: plaintext
  - [ ] `dbpool/connection_string/source` - required: environment_variable
  - [ ] `dbpool/tls/enabled` - recommended: true (production)
  - [ ] `dbpool/tls/certificate_validation` - required: true
  - [ ] `dbpool/credentials/rotation` - recommended: true

- [ ] **Architecture Claims (4 claims)**
  - [ ] `dbpool/health_check/endpoint` - required: true
  - [ ] `dbpool/metrics/exposed` - required: pool_size,active,idle,waiting
  - [ ] `dbpool/error_handling/connection_failure` - must: return_error_not_panic
  - [ ] `dbpool/shutdown/graceful` - required: true

- [ ] **Verify all claims created**
  ```bash
  curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor' | \
    jq '.items | map(select(.subject | startswith("dbpool"))) | length'
  # Expected output: 25-30
  ```
````

**Priority:** HIGH - Critical workflow gap, team needs visual task list
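
Each category header in the proposed checklist declares a claim count, so the list can drift from its own headers as items are added or removed. A minimal sketch of a consistency check follows. `claims_per_category` is a hypothetical helper, and it assumes the layout shown above: a top-level checkbox with a bold label per category, then indented checkboxes whose text starts with a backticked concept path.

```bash
# Hypothetical helper: count claim items under each category header of a
# Markdown checklist. Assumes the checkbox layout proposed in this finding.
claims_per_category() {
  awk '
    # Top-level checkbox with a bold label starts a new category.
    /^- \[[ xX]\] \*\*/ {
      name = $0
      sub(/^- \[[ xX]\] \*\*/, "", name)
      sub(/\*\*.*$/, "", name)
      current = name
      order[++n] = name
      next
    }
    # Indented checkbox whose text starts with a backtick is one claim item.
    /^[ \t]+- \[[ xX]\] `/ {
      if (current != "") count[current]++
    }
    END {
      for (i = 1; i <= n; i++) printf "%s: %d\n", order[i], count[order[i]]
    }
  ' "$1"
}
```

Running `claims_per_category CHECKLIST.md` prints one line per category as `<label>: <counted items>`, so a header such as "Safety Claims (10 claims)" should be followed by `10`. A header with no claim items beneath it, such as the verification step, reports `0`.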

---

## Medium Priority Improvements

### Finding 4: Claim Extraction Example Not Integrated into Workflow

**Impact:** Team reads example but doesn't know to apply it immediately

**Location:** CHECKLIST.md:120 (after claim extraction example section)

**Current State:**
```markdown
**Time to read:** 15-20 minutes
**Key takeaway:** Claims are products with full context, not just grep results

---

### 📚 Authority Source Documents
```

**Problem:** No bridge from "read example" to "now use this to create your first 3 claims"

**Fix:** Add application step

```markdown
**Time to read:** 15-20 minutes
**Key takeaway:** Claims are products with full context, not just grep results

**Now apply this knowledge:** Create your first 3 claims following the same reasoning process shown in the example:

- [ ] **Practice Claim 1:** Extract from HikariCP "Small Pool Philosophy" paragraph (use same analysis structure from example)
- [ ] **Practice Claim 2:** Extract from PostgreSQL "300-500 connections optimal" empirical result
- [ ] **Practice Claim 3:** Extract from OWASP "plaintext passwords prohibited" requirement

Use the extraction template: identify claimable statement → reason about WHY it matters → write explanation with WHAT/WHY/CONSEQUENCE → submit via `aphoria corpus create`.

---

### 📚 Authority Source Documents
```

**Priority:** MEDIUM - Helps team transition from learning to doing


---

### Finding 5: No Step Numbers Within Days

**Impact:** Team unsure whether Day 1 is complete; no clear progress indicators

**Location:** CHECKLIST.md:103-200

**Current State:** Sections within days, but no numbered steps

**Problem:** Team self-identified as being on "Step 3", but the docs have no step numbers to confirm against

**Fix:** Add explicit step numbers with time estimates

````markdown
## Day 1: Create 25-30 Corpus Claims

**Total Time:** 4-6 hours

---

### Step 1: Read Claim Extraction Example (15-20 min)

- [ ] Read `docs/claim-extraction-example.md` completely
- [ ] Understand decision framework (what to extract vs skip)
- [ ] Note the WHAT/WHY/CONSEQUENCE structure for explanations

---

### Step 2: Fetch Authority Source Documents (30 min)

- [ ] **HikariCP Configuration Guide** → save to `docs/sources/hikaricp-config.md`
- [ ] **PostgreSQL Pooling Documentation** → save to `docs/sources/postgresql-pooling.md`
- [ ] **OWASP A07:2021** → save to `docs/sources/owasp-credentials.md`

---

### Step 3: Create Corpus Claims via CLI (3-4 hours)

[Insert checkbox list from Finding 3]

---

### Step 4: Verify Completion (2 min)

- [ ] Run verification command:
  ```bash
  curl 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor' | \
    jq '.items | map(select(.subject | startswith("dbpool"))) | length'
  ```
- [ ] Confirm output: 25-30

**Day 1 Complete** when verification shows 25-30 claims in corpus
````


**Priority:** MEDIUM - Improves clarity, not a blocker

---

## Team Actions (Not Gaps)

### Correct Action 1: Created `.aphoria/config.toml` Early

**What:** Team created scan configuration file (Day 3 deliverable) during Day 1

**Assessment:** NOT A GAP - Proactive preparation
- Config file is correct (ephemeral mode, proper thresholds)
- Creating early is helpful (ready for Day 3)
- Shows good project setup instincts

### Correct Action 2: Created `src/lib.rs` Placeholder

**What:** Team created minimal 5-line placeholder with intentional violation

**Assessment:** NOT A GAP - Forward-thinking preparation
- Minimal placeholder, not full implementation
- Contains correct intentional violation (`Option<usize>`)
- Doesn't block Day 1 completion

---

## Root Cause Analysis

**The Pattern:**

Day 1 documentation is structured as **"reference information"** not **"execution workflow"**:

| Element | Current State | Team Interpretation | Result |
|---------|---------------|---------------------|--------|
| Heading | "Information Needed" | "Gather prerequisites" | Stopped after fetching docs |
| Checkboxes | Only on source docs | "These are the tasks" | Completed 3/3, moved on |
| Claim creation | Prose, no checkboxes | "Reference for later" | Skipped entirely |
| Example | "Read this first" | "Background reading" | Read but didn't apply |
| Verification | Buried at bottom | "Optional check" | Never ran |

**Fix:** Restructure Day 1 as **execution checklist** with:
- Action-oriented heading ("Create 25-30 Claims")
- Checkboxes for every claim creation task
- Verification as completion gate
- Example integrated into workflow

---

## Recommended Actions

### Immediate (Before Next Team)

1. **Rewrite CHECKLIST.md Day 1 heading** (5 min)
   - Change "Day 1: Corpus Building - Information Needed"
   - To "Day 1: Create 25-30 Corpus Claims"
   - Add explicit success criteria

2. **Convert claim creation to checkboxes** (30 min)
   - Add 27 checkbox items (one per claim from plan.md)
   - Group by category (Safety, Performance, Security, Architecture)
   - Add verification checkbox at end

3. **Add "Now apply this" bridge** (10 min)
   - After claim extraction example
   - Before source documents section
   - Show 3 practice claims to create

**Total time: 45 minutes**

### Short Term (This Week)

4. **Add step numbers to Day 1** (15 min)
   - Step 1: Read example (15-20 min)
   - Step 2: Fetch sources (30 min)
   - Step 3: Create claims (3-4 hours)
   - Step 4: Verify (2 min)

5. **Review Days 2-5 for same pattern** (1 hour)
   - Check if other days have unclear completion criteria
   - Ensure all days have checkbox-driven workflows
   - Add verification steps where missing

**Total time: 1 hour 15 min**

### Long Term (Next Month)

6. **Create visual progress tracker** (4 hours)
   - Script that shows "Day X, Step Y, Z% complete"
   - Runs verification commands automatically
   - Shows clear "✓ Day N Complete" messages

7. **Add estimated time to every task** (2 hours)
   - Help teams gauge progress
   - Set realistic expectations
   - Make scheduling easier

**Total time: 6 hours**
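
As a starting point for the progress tracker in item 6, checkbox state in a checklist file could be summarized as sketched below. `checklist_progress` is a hypothetical helper, not an existing aphoria command. It counts every Markdown checkbox at any indent level, category headers included, and does not run the verification commands.

```bash
# Hypothetical sketch: report checkbox progress for a Markdown checklist.
# Counts "- [x]" (checked) and "- [ ]" (open) list items at any indent level.
checklist_progress() {
  file="$1"
  # grep -c prints 0 but exits non-zero when nothing matches, so ignore its status.
  checked=$(grep -c -E '^[[:space:]]*- \[[xX]\]' "$file" || true)
  open=$(grep -c -E '^[[:space:]]*- \[ \]' "$file" || true)
  # Fall back to 0 if grep produced no output (for example, unreadable file).
  checked=${checked:-0}
  open=${open:-0}
  total=$(( checked + open ))
  if [ "$total" -eq 0 ]; then
    echo "no checkboxes found in $file"
    return 1
  fi
  echo "$checked/$total tasks checked ($(( 100 * checked / total ))%)"
}
```

For example, `checklist_progress CHECKLIST.md` on a file with two checked and two open boxes prints `2/4 tasks checked (50%)`. Checked boxes only record what the team believes is done, so a fuller tracker would pair this with the corpus verification command.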

---

## Success Metrics

**Before Fix:**
- Team completion: 10% (3 source docs fetched, 0 claims created)
- Team belief: 90% (thought they were ready for Day 2)
- Gap: 80 percentage points between belief and reality

**After Fix (Predicted):**
- Team completion: 90%+ (will create 25-30 claims)
- Team belief: Accurate (verification command confirms)
- Gap: <5 percentage points (clear success criteria)

**Cold-Start Success Rate:**
- Original estimate: 85-90%
- Observed with current docs: 10%
- Predicted with fixes: 85-90% (achievable if fixes implemented)

---

## Appendices

### Appendix A: Evidence Chain

| Evidence | Location |
|----------|----------|
| Team progress log | `eval/progress-log-2026-02-09.md` |
| Implementation review | `eval/implementation-review-2026-02-09.md` |
| Gap analysis | `eval/gap-analysis-2026-02-09.md` |

### Appendix B: Verification Data

```bash
# Command run:
curl -s 'http://localhost:18180/v1/aphoria/corpus?sources[]=vendor&limit=100' | \
  jq '.items | map(select(.subject | startswith("dbpool"))) | length'

# Result: 0
# Expected: 25-30
# Gap: 100% (no claims created)
```

### Appendix C: Files Created by Team

| File | Status | Purpose |
|------|--------|---------|
| `docs/sources/hikaricp-config.md` | ✓ Complete | Authority source |
| `docs/sources/postgresql-pooling.md` | ✓ Complete | Authority source |
| `docs/sources/owasp-credentials.md` | ✓ Complete | Authority source |
| `.aphoria/config.toml` | ✓ Complete (early) | Scan config |
| `src/lib.rs` | ⚠️ Placeholder (early) | Minimal code |
| Corpus claims | ✗ Missing | 0/25-30 created |

---

## Handoff to aphoria-docs

Documentation gaps identified and analyzed. Ready for implementation.

**High priority fixes:**

1. Rewrite CHECKLIST.md:103 heading with completion criteria
2. Convert CHECKLIST.md:157-200 claim creation to checkboxes
3. Add "Now apply this" bridge at CHECKLIST.md:120

Use `/aphoria-docs` to implement these fixes.

---

**Report Complete:** 2026-02-09T21:26:00Z