Cookbook Tree System
Checkpoint-based cookbook execution with YAML tree definitions. Enables resumable, debuggable E2E test workflows.
Quick Reference
# Validate tree and show execution plan (safe preview)
./cookbooks/scripts/tree-runner.sh run landing-page --project-name my-test --dry-run
# Run a tree (creates checkpoint on each step)
./cookbooks/scripts/tree-runner.sh run landing-page --project-name my-test
# Run with auto-cleanup on exit
./cookbooks/scripts/tree-runner.sh run landing-page --project-name my-test --auto-teardown
# Resume from last checkpoint after failure
./cookbooks/scripts/tree-runner.sh resume landing-page
# Run only a specific step (debugging)
./cookbooks/scripts/tree-runner.sh only landing-page wait-pipeline
# Check status of a tree run
./cookbooks/scripts/tree-runner.sh status landing-page
# Teardown resources (runs tree's teardown section)
./cookbooks/scripts/tree-runner.sh teardown landing-page
# List all available trees
./cookbooks/scripts/tree-runner.sh list
# Clean checkpoint (discard state)
./cookbooks/scripts/tree-runner.sh clean landing-page
Global Flags
| Flag | Description |
|---|---|
| `--dry-run` | Validate tree and show execution plan without running |
| `--auto-teardown` | Run teardown steps on exit (success or failure) |
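The `--auto-teardown` behavior (teardown on exit, whether the run succeeded or failed) corresponds to a try/finally pattern. The Python sketch below is for illustration only: `run_steps` and `run_teardown` are hypothetical callables, not functions from `tree-runner.sh`, and the real script may handle exit differently.

```python
from typing import Callable


def run_tree(run_steps: Callable[[], None],
             run_teardown: Callable[[], None],
             auto_teardown: bool = False) -> bool:
    """Run the steps; if auto_teardown is set, always run teardown afterwards.

    Returns True when the steps finished without raising.
    """
    try:
        run_steps()
        return True
    except Exception as exc:  # a failed step surfaces as an exception here
        print(f"tree failed: {exc}")
        return False
    finally:
        # Executes on both the success and the failure path.
        if auto_teardown:
            run_teardown()
```

Without the flag, resources stay in place after a failure, which is what makes `resume` and `only` useful for debugging.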
Dependencies
Required tools (pre-flight checks verify these):
- `yq` - YAML parser (`brew install yq`)
- `jq` - JSON parser (`brew install jq`)
- `curl` - HTTP client (usually pre-installed)

Required environment variables:
- `RDEV_API_URL` - API endpoint (e.g., `https://rdev.masq-ops.orchard9.ai`)
- `RDEV_API_KEY` - API key for authentication

Optional:
- `API_TIMEOUT` - Seconds before API calls time out (default: 60)
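As a rough illustration of what a pre-flight check over these requirements could look like, here is a Python sketch. It is not the logic in `tree-runner.sh`; the function names `preflight` and `api_timeout` are hypothetical, and only the tool names, variable names, and the default of 60 come from the lists above.

```python
import os
import shutil

REQUIRED_TOOLS = ("yq", "jq", "curl")
REQUIRED_ENV = ("RDEV_API_URL", "RDEV_API_KEY")


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = [f"{tool} is required but not installed"
                for tool in REQUIRED_TOOLS if which(tool) is None]
    problems += [f"{name} environment variable is not set"
                 for name in REQUIRED_ENV if not env.get(name)]
    return problems


def api_timeout(env=None, default=60):
    """Read API_TIMEOUT in seconds, falling back to the documented default."""
    env = os.environ if env is None else env
    raw = env.get("API_TIMEOUT", "")
    return int(raw) if raw.isdigit() else default
```

Passing `env` and `which` as parameters keeps the check testable without touching the real environment.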
Tree YAML Format
Tree definitions live in cookbooks/trees/ and define workflow steps as a DAG.
name: landing-page
description: Deploy a landing page
version: 1

# Variables (can be overridden via --var-name)
vars:
  project_name: ""        # Required, no default
  template: "app-astro"   # Optional, has default

steps:
  create-project:
    description: Create the project skeleton
    action: api
    method: POST
    endpoint: /project
    body:
      name: "{{ .vars.project_name }}"
      description: "Landing page E2E test"
    outputs:
      - project_id: .data.name
      - domain: .data.domain

  add-component:
    description: Add landing page component
    depends_on: [create-project]
    action: api
    method: POST
    endpoint: "/projects/{{ .outputs.create-project.project_id }}/components"
    body:
      type: "{{ .vars.template }}"
      name: landing
      template: "{{ .vars.template }}"

  wait-pipeline:
    description: Wait for CI pipeline to complete
    depends_on: [add-component]
    action: wait_pipeline
    project_id: "{{ .outputs.create-project.project_id }}"
    on_error: continue    # Don't fail the whole tree

  verify-site:
    description: Verify site is accessible
    depends_on: [wait-pipeline]
    action: wait_site
    domain: "{{ .outputs.create-project.domain }}"
    project_id: "{{ .outputs.create-project.project_id }}"

# Teardown runs in reverse order on failure or explicit teardown
teardown:
  - description: Delete project
    action: api
    method: DELETE
    endpoint: "/project/{{ .outputs.create-project.project_id }}"
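Because steps form a DAG through `depends_on`, an execution order can be derived with a topological sort. The Python sketch below uses the standard-library `graphlib` module on the dependencies from the landing-page example; it only illustrates the idea and makes no claim about how `tree-runner.sh` orders steps internally.

```python
from graphlib import TopologicalSorter

# Each key is a step; each value lists the steps it depends on
# (copied from the depends_on fields in the landing-page example).
steps = {
    "create-project": [],
    "add-component": ["create-project"],
    "wait-pipeline": ["add-component"],
    "verify-site": ["wait-pipeline"],
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(steps).static_order())
print(order)
# ['create-project', 'add-component', 'wait-pipeline', 'verify-site']
```

This example is a simple chain, so only one order is valid. With fan-out dependencies several orders can be valid.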
Step Properties
| Property | Required | Description |
|---|---|---|
| `description` | No | Human-readable description |
| `action` | Yes | Action type: `api`, `wait_pipeline`, `wait_build`, `wait_site`, `diagnose`, `shell` |
| `depends_on` | No | Array of step names that must complete first |
| `on_error` | No | `fail` (default) or `continue` |
| `outputs` | No | Extract values from response (jq paths) |
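The property rules in the table can be expressed as a small validator. This is a hedged Python sketch over a step already parsed into a dict; `validate_step` is a hypothetical helper and the message wording is invented for illustration, not taken from the runner.

```python
ACTIONS = {"api", "wait_pipeline", "wait_build", "wait_site", "diagnose", "shell"}
ON_ERROR = {"fail", "continue"}


def validate_step(name, step):
    """Return a list of problems for one step definition (a plain dict)."""
    problems = []
    action = step.get("action")
    if action is None:
        problems.append(f"{name}: missing required property 'action'")
    elif action not in ACTIONS:
        problems.append(f"{name}: unknown action '{action}'")
    # on_error is optional and defaults to "fail"
    if step.get("on_error", "fail") not in ON_ERROR:
        problems.append(f"{name}: on_error must be 'fail' or 'continue'")
    if not isinstance(step.get("depends_on", []), list):
        problems.append(f"{name}: depends_on must be an array of step names")
    return problems
```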
Action Types
api
Make an authenticated API call.
action: api
method: POST # GET, POST, DELETE, PUT, PATCH
endpoint: /projects/{{ .project_id }}/components
body: # Optional, for POST/PUT/PATCH
  type: service
  name: api
wait_pipeline
Wait for a CI pipeline to complete.
action: wait_pipeline
project_id: "{{ .outputs.create-project.project_id }}"
max_attempts: 60 # Optional, default 60
poll_interval: 5 # Optional, default 5 seconds
wait_build
Wait for a build/agent task to complete. Replaces shell-based polling loops.
action: wait_build
build_id: "{{ .outputs.implement-feature.build_id }}"
max_attempts: 120 # Optional, default 120
poll_interval: 5 # Optional, default 5 seconds
wait_site
Wait for a site to be accessible.
action: wait_site
domain: "{{ .outputs.create-project.domain }}"
project_id: "{{ .outputs.create-project.project_id }}" # For diagnostics
max_attempts: 30
poll_interval: 5
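The three `wait_*` actions share the same `max_attempts` / `poll_interval` shape. A generic polling loop with those two parameters might look like the Python sketch below; `wait_until` and its `check` callback are hypothetical names, and the actual actions may differ in detail.

```python
import time


def wait_until(check, max_attempts=60, poll_interval=5, sleep=time.sleep):
    """Call check() until it returns True or attempts run out.

    Returns the 1-based attempt number that succeeded, or None on timeout.
    """
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            sleep(poll_interval)  # no need to sleep after the final attempt
    return None
```

Under this model the worst-case wait is roughly `max_attempts * poll_interval` seconds: about 5 minutes for the `wait_pipeline` defaults (60 × 5) and about 10 minutes for the `wait_build` defaults (120 × 5).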
diagnose
Run diagnostic checks.
action: diagnose
type: pipeline # or 'site'
project_id: "{{ .outputs.create-project.project_id }}"
domain: "{{ .outputs.create-project.domain }}" # For site diagnostics
shell
Run a shell command.
action: shell
command: "curl -s https://{{ .outputs.create-project.domain }}/api/health | jq ."
outputs:
- health_status: .status
Template Variables
Variables are expanded using Go template syntax (`{{ .path }}`):
- `.vars.<name>` - Variables from CLI flags or tree defaults
- `.outputs.<step>.<key>` - Outputs captured from previous steps
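To make the two lookup paths concrete, here is a minimal Python sketch that treats each placeholder as a dotted path into a context of `vars` and `outputs`. The `expand` function and the context layout are assumptions for illustration; the runner's actual expansion logic may differ.

```python
import re

# Matches {{ .a.b.c }} with optional spaces; hyphens are allowed so that
# step names such as create-project work as path segments.
PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z0-9_.-]+)\s*\}\}")


def expand(template, context):
    """Replace each {{ .a.b.c }} with context["a"]["b"]["c"]."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # KeyError signals an unknown variable or output
        return str(value)
    return PLACEHOLDER.sub(lookup, template)


context = {
    "vars": {"project_name": "my-test"},
    "outputs": {"create-project": {"project_id": "my-test"}},
}
print(expand("/projects/{{ .outputs.create-project.project_id }}/components", context))
# /projects/my-test/components
```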
Checkpoint Format
Checkpoints are stored in cookbooks/.checkpoints/ (gitignored) as JSON:
{
  "tree": "landing-page",
  "run_id": "landing-page-1706889600",
  "status": "partial",
  "vars": {
    "project_name": "test-landing"
  },
  "steps": {
    "create-project": {
      "status": "completed",
      "started_at": "2025-02-01T10:00:00Z",
      "completed_at": "2025-02-01T10:00:05Z",
      "output": {
        "project_id": "test-landing",
        "domain": "test-landing.threesix.ai"
      }
    },
    "wait-pipeline": {
      "status": "failed",
      "started_at": "2025-02-01T10:00:05Z",
      "completed_at": "2025-02-01T10:05:00Z",
      "error": "Pipeline #3 failed with status: failure"
    }
  },
  "last_completed_step": "create-project"
}
Checkpoint Status Values
- `pending` - Tree started but no steps completed
- `partial` - Some steps completed, some pending/failed
- `completed` - All steps completed successfully
- `failed` - A step failed with `on_error: fail`
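One way to read these four values is as a function of the per-step records. The Python sketch below encodes that reading; it is an interpretation of the definitions above rather than the logic in `checkpoint.sh`, and `tree_status` and its parameters are hypothetical.

```python
def tree_status(step_names, checkpoint_steps, on_error=None):
    """Derive the overall status from per-step records.

    step_names: all steps defined in the tree.
    checkpoint_steps: the "steps" object from the checkpoint file.
    on_error: maps step name to "fail" or "continue" (default "fail").
    """
    on_error = on_error or {}
    # Steps with no record in the checkpoint are treated as pending.
    statuses = {n: checkpoint_steps.get(n, {}).get("status", "pending")
                for n in step_names}
    if any(s == "failed" and on_error.get(n, "fail") == "fail"
           for n, s in statuses.items()):
        return "failed"
    done = sum(s == "completed" for s in statuses.values())
    if done == len(statuses):
        return "completed"
    return "partial" if done else "pending"
```

Applied to the checkpoint example, where `wait-pipeline` failed but is marked `on_error: continue` in the tree, this yields `partial`, matching the `status` field shown there.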
Creating a New Tree
- Create `cookbooks/trees/<name>.yaml`
- Define steps with dependencies
- Add teardown section
- Test with `tree-runner.sh run <name> --project-name test-$(date +%s)`
Best Practices
- Always include teardown - Clean up resources even if the tree fails
- Use descriptive step names - They appear in status output
- Set on_error: continue for non-critical steps - Pipeline failures shouldn't block site verification
- Capture outputs - Pass data between steps via outputs, not hardcoded values
- Use vars for inputs - Makes trees reusable with different parameters
Common Mistakes
1. YAML Indentation Errors
YAML requires consistent indentation using spaces only (no tabs). Steps must be indented under the `steps:` key:
# WRONG - tabs or inconsistent spacing
steps:
	create-project:   # Tab character - will fail
	  action: api
# CORRECT - 2-space indent
steps:
  create-project:
    action: api
2. Missing Output Dependencies
If you reference {{ .outputs.step-name.key }}, the referencing step must have step-name in its depends_on array. Validation will catch this:
# WRONG - references create-project but doesn't depend on it
wait-pipeline:
  action: wait_pipeline
  project_id: "{{ .outputs.create-project.project_id }}"
  # Missing: depends_on: [create-project]
# CORRECT
wait-pipeline:
  depends_on: [create-project]
  action: wait_pipeline
  project_id: "{{ .outputs.create-project.project_id }}"
Error message: wait-pipeline: references outputs from "create-project" but does not depend on it (directly or transitively)
Note: Transitive dependencies are valid. If A depends on B, and B depends on C, then A can use outputs from C.
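The rule, including the transitive case, can be checked by walking `depends_on` edges and comparing the result against every `.outputs.<step>` reference. The Python sketch below shows one way to do that. The `fields` key is a hypothetical simplification that holds a step's templated strings, and the real validator may be structured differently.

```python
import re

# Captures the step name in {{ .outputs.<step>.<key> }}
OUTPUT_REF = re.compile(r"\{\{\s*\.outputs\.([A-Za-z0-9_-]+)\.")


def ancestors(step, deps):
    """All steps reachable by following depends_on edges from `step`."""
    seen, stack = set(), list(deps.get(step, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, []))
    return seen


def check_output_refs(steps):
    """steps maps name -> {"depends_on": [...], "fields": [str, ...]}."""
    deps = {name: s.get("depends_on", []) for name, s in steps.items()}
    errors = []
    for name, step in steps.items():
        reachable = ancestors(name, deps)
        for text in step.get("fields", []):
            for ref in OUTPUT_REF.findall(text):
                if ref not in reachable:
                    errors.append(
                        f'{name}: references outputs from "{ref}" but does not '
                        "depend on it (directly or transitively)")
    return errors
```

In the landing-page tree, `wait-pipeline` depends only on `add-component`, yet it may use outputs from `create-project` because `add-component` depends on it.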
3. Template Escaping in Shell Commands
Shell commands with template variables need proper quoting to handle spaces and special characters:
# RISKY - unquoted expansion
action: shell
command: curl https://{{ .outputs.create-project.domain }}/api/health
# SAFER - quoted expansion
action: shell
command: 'curl "https://{{ .outputs.create-project.domain }}/api/health"'
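Double quotes stop word splitting, but the shell still expands `$` and backticks inside them. If a command string is assembled programmatically, single-quote style escaping is stricter. The Python sketch below uses the standard-library `shlex.quote` to show the idea; `build_health_command` is a hypothetical helper and not part of the runner.

```python
import shlex


def build_health_command(domain):
    """Build a curl command with the URL quoted as a single shell word."""
    url = f"https://{domain}/api/health"
    # shlex.quote leaves safe strings untouched and single-quotes the rest.
    return f"curl -s {shlex.quote(url)}"


print(build_health_command("my-test.example"))
# curl -s https://my-test.example/api/health
print(build_health_command("a b;id"))
# curl -s 'https://a b;id/api/health'
```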
4. Outputs Array Syntax
Outputs must be an array of single-key objects, not a flat object:
# WRONG - flat object
outputs:
  project_id: .data.name
  domain: .data.domain
# CORRECT - array of objects
outputs:
  - project_id: .data.name
  - domain: .data.domain
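To show why the array-of-single-key-objects shape matters, here is a Python sketch that walks such a list and applies each path to a decoded JSON response. It handles only plain dotted paths like `.data.name`, which is a deliberate simplification of jq, and `extract_outputs` is a hypothetical name.

```python
def extract_outputs(outputs_spec, response):
    """Apply each {name: path} entry to a decoded JSON response.

    Only plain dotted paths such as ".data.name" are handled here;
    real jq expressions are far more expressive.
    """
    result = {}
    for entry in outputs_spec:
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"each output must be a single-key object: {entry!r}")
        (name, path), = entry.items()
        value = response
        for key in path.strip(".").split("."):
            value = value[key]
        result[name] = value
    return result


response = {"data": {"name": "my-test", "domain": "my-test.example"}}
spec = [{"project_id": ".data.name"}, {"domain": ".data.domain"}]
print(extract_outputs(spec, response))
# {'project_id': 'my-test', 'domain': 'my-test.example'}
```

Passing the flat-object form to this sketch raises `ValueError`, because iterating a mapping yields its keys rather than single-key objects.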
5. Circular Dependencies
Dependencies form a DAG (directed acyclic graph). Cycles cause validation failures:
# WRONG - circular dependency
step-a:
  depends_on: [step-b]
step-b:
  depends_on: [step-a]   # Creates cycle!
# CORRECT - linear or fan-out dependencies
step-a:
  depends_on: []
step-b:
  depends_on: [step-a]
step-c:
  depends_on: [step-a]   # Fan-out OK
Error message: Dependency cycle detected
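Cycle detection of this kind can be sketched in Python with the standard-library `graphlib`, which raises `CycleError` when the graph is not a DAG. `find_cycle` is a hypothetical helper, and the runner's own check may be implemented differently.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(deps):
    """Return the steps forming a cycle, or None if the graph is a DAG."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as exc:
        return exc.args[1]  # list of nodes; first and last are the same
    return None
```

For the WRONG example, `find_cycle({"step-a": ["step-b"], "step-b": ["step-a"]})` returns a list containing both steps. For the fan-out example it returns `None`.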
6. Hardcoded Values Instead of Outputs
Avoid hardcoding values that should come from previous steps:
# WRONG - hardcoded project name
wait-pipeline:
  depends_on: [create-project]
  action: wait_pipeline
  project_id: "my-test-project"   # Should use output!
# CORRECT - use captured output
wait-pipeline:
  depends_on: [create-project]
  action: wait_pipeline
  project_id: "{{ .outputs.create-project.project_id }}"
Migrating from Script to Tree
Compare script steps to tree steps:
| Script Pattern | Tree Equivalent |
|---|---|
| `api_call POST /project "$json"` | `action: api`, `method: POST` |
| `wait_for_pipeline "$project"` | `action: wait_pipeline` |
| `wait_for_site "$domain" 30 5 "$project"` | `action: wait_site` |
| `diagnose_pipeline_failure "$project"` | `action: diagnose`, `type: pipeline` |
| `curl ... \| jq ...` | `action: shell`, `command: "..."` |
Troubleshooting
Pre-flight check failures
Pre-flight checks failed:
✗ RDEV_API_URL environment variable is not set
✗ RDEV_API_KEY environment variable is not set
Set the required environment variables before running trees.
Tree not found
Error: Tree 'foo' not found
Available trees: landing-page, composable-app, sdlc-flow
Check that cookbooks/trees/foo.yaml exists.
yq not found
Error: yq is required but not installed
Install with brew install yq.
Resume finds no checkpoint
No checkpoint found for tree 'landing-page'
Run tree-runner.sh run landing-page ... first.
Step failed but outputs missing
Error: Output 'project_id' not found in step 'create-project'
The step may have failed silently. Check the checkpoint file:
cat cookbooks/.checkpoints/landing-page.json | jq '.steps["create-project"]'
API timeout
curl: (28) Operation timed out
Increase timeout with API_TIMEOUT=120 ./tree-runner.sh run ...
Available Trees
Basic Trees
| Tree | Description |
|---|---|
| `landing-page` | Single-page landing site with Astro |
| `composable-app` | Multi-component monorepo with service + app |
| `sdlc-flow` | Feature lifecycle with SDLC orchestration |
Aeries Trees (Multi-Phase Game Development)
Multi-phase workflow demonstrating progressive complexity for an AI agent simulation game:
| Tree | Description | Infrastructure |
|---|---|---|
| `aeries-1-genesis` | Monolith: Core API + React app for agent creation | Postgres |
| `aeries-2-simulation` | Extraction: Simulation service via strangler pattern | - |
| `aeries-3-society` | Social layer: Spatial service + Redis pub/sub | Redis |
Running the Aeries sequence:
# Phase 1: Create the monolith
./tree-runner.sh run aeries-1-genesis --project-name aeries-test
# Phase 2: Extract simulation service (operates on existing project)
./tree-runner.sh run aeries-2-simulation --project-id aeries-test
# Phase 3: Add social layer
./tree-runner.sh run aeries-3-society --project-id aeries-test
These trees demonstrate:
- Multi-phase patterns - Later phases take `project_id`, not `project_name`
- Build polling - Shell-based waits for long-running SDLC builds
- Service extraction - Strangler pattern via `/extract-service` command
- No teardown in phases 2+ - Project lifecycle owned by Phase 1
Slackpath Trees (Reference Architectures)
Progressive complexity paths for building Slack-like platforms:
| Tree | Description | Infrastructure |
|---|---|---|
| `slackpath-1-authenticated-service` | Identity layer: User auth, JWT, protected routes | CockroachDB |
| `slackpath-2-async-worker-pipeline` | Background jobs: Producer/consumer with Redis | Redis |
| `slackpath-3-realtime-chat` | WebSockets: Pub/sub broadcasting | Redis |
| `slackpath-4-microservice-constellation` | Service mesh: Auth + Chat + Worker coordination | CockroachDB + Redis |
| `slackpath-5-full-lifecycle` | Full SDLC: All 10 phases with explicit artifact approvals | CockroachDB |
Running a slackpath:
./cookbooks/scripts/tree-runner.sh run slackpath-1-authenticated-service \
--project-name auth-test-$(date +%s)
These trees demonstrate:
- Infrastructure provisioning (`type: postgres`, `type: redis`)
- Automatic credential injection (`DATABASE_URL`, `REDIS_URL`)
- SDLC-driven implementation via `/implement-feature` prompts
- End-to-end verification scripts
Files
cookbooks/
├── .checkpoints/ # Checkpoint storage (gitignored)
│ └── landing-page.json
├── scripts/
│ ├── lib/
│ │ ├── checkpoint.sh # Checkpoint I/O
│ │ └── tree-parser.sh # YAML parsing
│ └── tree-runner.sh # Main executable
└── trees/
├── landing-page.yaml
├── composable-app.yaml
├── sdlc-flow.yaml
├── aeries-1-genesis.yaml # Multi-phase: monolith
├── aeries-2-simulation.yaml # Multi-phase: extraction
├── aeries-3-society.yaml # Multi-phase: social layer
├── slackpath-1-authenticated-service.yaml
├── slackpath-2-async-worker-pipeline.yaml
├── slackpath-3-realtime-chat.yaml
├── slackpath-4-microservice-constellation.yaml
└── slackpath-5-full-lifecycle.yaml
Related
- E2E Testing Strategy — When to run trees, philosophy, history tracking
- Composable Monorepo Templates — Template structure tested by trees