# Cookbook Tree System

Checkpoint-based cookbook execution with YAML tree definitions. Enables resumable, debuggable E2E test workflows.

## Quick Reference

```bash
# Validate tree and show execution plan (safe preview)
./cookbooks/scripts/tree-runner.sh run landing-page --project-name my-test --dry-run

# Run a tree (writes a checkpoint on each step)
./cookbooks/scripts/tree-runner.sh run landing-page --project-name my-test

# Run with auto-cleanup on exit
./cookbooks/scripts/tree-runner.sh run landing-page --project-name my-test --auto-teardown

# Resume from last checkpoint after failure
./cookbooks/scripts/tree-runner.sh resume landing-page

# Run only a specific step (debugging)
./cookbooks/scripts/tree-runner.sh only landing-page wait-pipeline

# Check status of a tree run
./cookbooks/scripts/tree-runner.sh status landing-page

# Tear down resources (runs the tree's teardown section)
./cookbooks/scripts/tree-runner.sh teardown landing-page

# List all available trees
./cookbooks/scripts/tree-runner.sh list

# Clean checkpoint (discard saved state)
./cookbooks/scripts/tree-runner.sh clean landing-page
```

### Global Flags

| Flag | Description |
|------|-------------|
| `--dry-run` | Validate tree and show execution plan without running |
| `--auto-teardown` | Run teardown steps on exit (success or failure) |

## Dependencies

Required tools (pre-flight checks verify these):

- `yq` - YAML parser (`brew install yq`)
- `jq` - JSON parser (`brew install jq`)
- `curl` - HTTP client (usually pre-installed)

Required environment variables:

- `RDEV_API_URL` - API endpoint (e.g., `https://rdev.masq-ops.orchard9.ai`)
- `RDEV_API_KEY` - API key for authentication

Optional:

- `API_TIMEOUT` - Seconds before API calls time out (default: 60)
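
A minimal sketch of what such a pre-flight check could look like, assuming bash 4+. `preflight_check` is a hypothetical helper, not the actual code in `tree-runner.sh`. It gathers every missing tool and variable before failing, so a single run reports all problems, in the same style as the pre-flight error listed under Troubleshooting.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not the real tree-runner.sh code).
# Collects every problem first, then reports them all and returns 1.
preflight_check() {
  local -a errors=()
  local tool var e

  # Required CLI tools must be on PATH.
  for tool in yq jq curl; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      errors+=("$tool is required but not installed")
    fi
  done

  # Required variables: unset and empty both count as missing.
  # ${!var} is bash indirect expansion (the value of the variable named by $var).
  for var in RDEV_API_URL RDEV_API_KEY; do
    if [ -z "${!var}" ]; then
      errors+=("$var environment variable is not set")
    fi
  done

  if [ "${#errors[@]}" -gt 0 ]; then
    echo "Pre-flight checks failed:" >&2
    for e in "${errors[@]}"; do
      echo "✗ $e" >&2
    done
    return 1
  fi
  return 0
}
```

A runner would typically call it first and stop on failure, e.g. `preflight_check || exit 1`.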

## Tree YAML Format

Tree definitions live in `cookbooks/trees/` and define workflow steps as a DAG.

```yaml
name: landing-page
description: Deploy a landing page
version: 1

# Variables (override with --<var-name> flags, e.g. --project-name sets project_name)
vars:
  project_name: ""        # Required, no default
  template: "app-astro"   # Optional, has default

steps:
  create-project:
    description: Create the project skeleton
    action: api
    method: POST
    endpoint: /project
    body:
      name: "{{ .vars.project_name }}"
      description: "Landing page E2E test"
    outputs:
      - project_id: .data.name
      - domain: .data.domain

  add-component:
    description: Add landing page component
    depends_on: [create-project]
    action: api
    method: POST
    endpoint: "/projects/{{ .outputs.create-project.project_id }}/components"
    body:
      type: "{{ .vars.template }}"
      name: landing
      template: "{{ .vars.template }}"

  wait-pipeline:
    description: Wait for CI pipeline to complete
    depends_on: [add-component]
    action: wait_pipeline
    project_id: "{{ .outputs.create-project.project_id }}"
    on_error: continue   # Don't fail the whole tree

  verify-site:
    description: Verify site is accessible
    depends_on: [wait-pipeline]
    action: wait_site
    domain: "{{ .outputs.create-project.domain }}"
    project_id: "{{ .outputs.create-project.project_id }}"

# Teardown runs in reverse order on failure or explicit teardown
teardown:
  - description: Delete project
    action: api
    method: DELETE
    endpoint: "/project/{{ .outputs.create-project.project_id }}"
```

### Step Properties

| Property | Required | Description |
|----------|----------|-------------|
| `description` | No | Human-readable description |
| `action` | Yes | Action type: `api`, `wait_pipeline`, `wait_build`, `wait_site`, `diagnose`, `shell` |
| `depends_on` | No | Array of step names that must complete first |
| `on_error` | No | `fail` (default) or `continue` |
| `outputs` | No | Values to extract from the API response or command output (jq paths) |

### Action Types

#### api

Make an authenticated API call.

```yaml
action: api
method: POST          # GET, POST, DELETE, PUT, PATCH
endpoint: "/projects/{{ .outputs.create-project.project_id }}/components"
body:                 # Optional, for POST/PUT/PATCH
  type: service
  name: api
```
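
For illustration, here is one way an `api` step could be turned into a `curl` invocation. This is a hypothetical sketch, not `tree-runner.sh`'s code. The `Authorization: Bearer` header and the use of `--max-time` for `API_TIMEOUT` are assumptions, and `build_api_args` and `api_call_sketch` are made-up names. Building the arguments in an array keeps each value, including the JSON body, a single argument without extra shell quoting.

```bash
#!/usr/bin/env bash
# Hypothetical translation of an `action: api` step into curl arguments.
# ASSUMPTIONS: Bearer-token auth header; API_TIMEOUT maps to --max-time.
# build_api_args METHOD ENDPOINT [JSON_BODY] -> fills the global CURL_ARGS array
build_api_args() {
  local method=$1 endpoint=$2 body=${3-}
  # ${RDEV_API_URL%/} drops one trailing slash so "/project" doesn't double it.
  # API_TIMEOUT falls back to the documented default of 60 seconds.
  CURL_ARGS=(
    --silent --show-error
    --max-time "${API_TIMEOUT:-60}"
    --request "$method"
    --header "Authorization: Bearer ${RDEV_API_KEY}"
    "${RDEV_API_URL%/}${endpoint}"
  )
  # Only steps with a body (POST/PUT/PATCH) send JSON.
  if [ -n "$body" ]; then
    CURL_ARGS+=(--header "Content-Type: application/json" --data "$body")
  fi
}

# api_call_sketch METHOD ENDPOINT [JSON_BODY] -> performs the request
api_call_sketch() {
  build_api_args "$@" || return
  curl "${CURL_ARGS[@]}"
}
```

Usage: `api_call_sketch POST /project '{"name":"my-test"}'`.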

#### wait_pipeline

Wait for a CI pipeline to complete.

```yaml
action: wait_pipeline
project_id: "{{ .outputs.create-project.project_id }}"
max_attempts: 60   # Optional, default 60
poll_interval: 5   # Optional, default 5 seconds
```

#### wait_build

Wait for a build/agent task to complete. Replaces shell-based polling loops.

```yaml
action: wait_build
build_id: "{{ .outputs.implement-feature.build_id }}"
max_attempts: 120  # Optional, default 120
poll_interval: 5   # Optional, default 5 seconds
```

#### wait_site

Wait for a site to be accessible.

```yaml
action: wait_site
domain: "{{ .outputs.create-project.domain }}"
project_id: "{{ .outputs.create-project.project_id }}"  # For diagnostics
max_attempts: 30
poll_interval: 5
```
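
All three wait actions share the same `max_attempts` × `poll_interval` pattern. With the documented defaults, `wait_pipeline` gives up after roughly 60 × 5 s = 5 minutes and `wait_build` after roughly 120 × 5 s = 10 minutes. The `wait_site` example above (30 × 5 s) allows about 2.5 minutes. The loop below is a hypothetical sketch of that pattern, not the runner's code. The name `poll_until` is made up, and the real loop may differ, for example in whether it sleeps after the last attempt.

```bash
#!/usr/bin/env bash
# Hypothetical polling helper mirroring max_attempts/poll_interval.
# poll_until MAX_ATTEMPTS POLL_INTERVAL CMD [ARGS...]
# Runs CMD until it succeeds or attempts run out. Sleeps only between
# attempts, so total sleep is at most (MAX_ATTEMPTS - 1) * POLL_INTERVAL.
poll_until() {
  local max_attempts=$1 poll_interval=$2
  shift 2
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final failed attempt.
    if (( attempt < max_attempts )); then
      sleep "$poll_interval"
    fi
  done
  echo "gave up after $max_attempts attempts" >&2
  return 1
}
```

A `wait_site`-style check could then be `poll_until 30 5 curl -fsS -o /dev/null "https://$domain"`. Here `-f` makes curl exit non-zero on HTTP error statuses, so a 404 or 502 counts as "not ready yet".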

#### diagnose

Run diagnostic checks for a pipeline or a site (selected by `type`).

```yaml
action: diagnose
type: pipeline   # or 'site'
project_id: "{{ .outputs.create-project.project_id }}"
domain: "{{ .outputs.create-project.domain }}"  # For site diagnostics
```

#### shell

Run a shell command.

```yaml
action: shell
command: 'curl -s "https://{{ .outputs.create-project.domain }}/api/health" | jq .'
outputs:
  - health_status: .status
```

### Template Variables

Variables are expanded using Go template-style syntax (`{{ .path }}`):

- `.vars.<name>` - Variables from CLI flags or tree defaults
- `.outputs.<step>.<key>` - Outputs captured from previous steps
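
Here is a rough sketch of how `{{ .path }}` placeholders could be resolved, assuming bash 4+ associative arrays. Step names such as `create-project` contain hyphens, which Go's `text/template` would not accept in a field chain, so this sketch does plain string substitution instead. `VARS`, `OUTPUTS` and `expand_template` are hypothetical names, not the runner's internals. The error for a missing output mirrors the message listed under Troubleshooting.

```bash
#!/usr/bin/env bash
# Hypothetical placeholder expansion (not the real tree-runner.sh code).
# VARS holds tree vars; OUTPUTS is keyed by "<step>.<key>".
declare -A VARS=()
declare -A OUTPUTS=()

# expand_template TEXT -> prints TEXT with every {{ .vars.X }} and
# {{ .outputs.STEP.KEY }} replaced; returns 1 on an unknown reference.
expand_template() {
  local text=$1 result="" match path key value
  local re='[{][{] *[.]([A-Za-z0-9_.-]+) *[}][}]'
  while [[ $text =~ $re ]]; do
    match=${BASH_REMATCH[0]}
    path=${BASH_REMATCH[1]}
    case $path in
      vars.*)
        key=${path#vars.}
        if [[ -z ${VARS[$key]+set} ]]; then
          echo "Error: Variable '$key' not set" >&2
          return 1
        fi
        value=${VARS[$key]}
        ;;
      outputs.*)
        key=${path#outputs.}
        if [[ -z ${OUTPUTS[$key]+set} ]]; then
          echo "Error: Output '${key#*.}' not found in step '${key%%.*}'" >&2
          return 1
        fi
        value=${OUTPUTS[$key]}
        ;;
      *)
        echo "Error: unknown template path '.$path'" >&2
        return 1
        ;;
    esac
    # Append the text before the placeholder plus its value, then keep
    # scanning after the placeholder (values are never re-expanded).
    result+=${text%%"$match"*}$value
    text=${text#*"$match"}
  done
  printf '%s\n' "$result$text"
}
```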

## Checkpoint Format

Checkpoints are stored as JSON in `cookbooks/.checkpoints/` (gitignored), one file per tree (e.g. `landing-page.json`):

```json
{
  "tree": "landing-page",
  "run_id": "landing-page-1706889600",
  "status": "partial",
  "vars": {
    "project_name": "test-landing"
  },
  "steps": {
    "create-project": {
      "status": "completed",
      "started_at": "2025-02-01T10:00:00Z",
      "completed_at": "2025-02-01T10:00:05Z",
      "output": {
        "project_id": "test-landing",
        "domain": "test-landing.threesix.ai"
      }
    },
    "wait-pipeline": {
      "status": "failed",
      "started_at": "2025-02-01T10:00:05Z",
      "completed_at": "2025-02-01T10:05:00Z",
      "error": "Pipeline #3 failed with status: failure"
    }
  },
  "last_completed_step": "create-project"
}
```

### Checkpoint Status Values

- `pending` - Tree started but no steps completed
- `partial` - Some steps completed, some pending/failed
- `completed` - All steps completed successfully
- `failed` - A step failed with `on_error: fail`
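
The rules above can be expressed as a small function. This is a hypothetical sketch, not the runner's code. It assumes each step is summarized as `STATUS:ON_ERROR`, and that a failed step with `on_error: continue` counts as "not completed" rather than failing the run. It checks the rules in the order `failed`, `completed`, `pending`, `partial`. Under that reading, the example checkpoint above (one completed step, one failed `continue` step) comes out as `partial`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the runner's code): derive a run-level status
# from per-step results. Each argument is "STATUS:ON_ERROR",
# e.g. "completed:fail" or "failed:continue".
run_status() {
  local total=0 completed=0 entry status on_error
  for entry in "$@"; do
    status=${entry%%:*}
    on_error=${entry#*:}
    total=$((total + 1))
    # A failure on a step with on_error: fail decides the run immediately.
    if [[ $status == failed && $on_error == fail ]]; then
      echo failed
      return 0
    fi
    if [[ $status == completed ]]; then
      completed=$((completed + 1))
    fi
  done
  if (( total > 0 && completed == total )); then
    echo completed
  elif (( completed == 0 )); then
    echo pending
  else
    echo partial
  fi
}
```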

## Creating a New Tree

1. Create `cookbooks/trees/<name>.yaml`
2. Define steps with dependencies
3. Add teardown section
4. Test with `tree-runner.sh run <name> --project-name test-$(date +%s)`
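
Steps 1-3 can be scripted. The helper below is a hypothetical convenience, not part of `tree-runner.sh`. It writes a skeleton modeled on the `landing-page` example (one `create-project` step plus a matching teardown) and refuses to overwrite an existing file. The optional second argument, a target directory, is an assumption added so the sketch can be tried outside the repo.

```bash
#!/usr/bin/env bash
# Hypothetical scaffolding helper (not part of tree-runner.sh).
# new_tree NAME [DIR] -> writes DIR/NAME.yaml (DIR defaults to cookbooks/trees)
new_tree() {
  local name=$1 dir=${2:-cookbooks/trees}
  local file="$dir/$name.yaml"

  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  mkdir -p "$dir" || return

  # Unquoted EOF so $name expands; the {{ ... }} placeholders contain no $
  # and are written literally for the runner to expand later.
  cat >"$file" <<EOF
name: $name
description: TODO describe this tree
version: 1

vars:
  project_name: ""  # Required, pass --project-name

steps:
  create-project:
    description: Create the project skeleton
    action: api
    method: POST
    endpoint: /project
    body:
      name: "{{ .vars.project_name }}"
    outputs:
      - project_id: .data.name

teardown:
  - description: Delete project
    action: api
    method: DELETE
    endpoint: "/project/{{ .outputs.create-project.project_id }}"
EOF
  echo "created $file"
}
```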

### Best Practices

- **Always include teardown** - Clean up resources even if the tree fails
- **Use descriptive step names** - They appear in status output
- **Set `on_error: continue` for non-critical steps** - Pipeline failures shouldn't block site verification
- **Capture outputs** - Pass data between steps via outputs, not hardcoded values
- **Use vars for inputs** - Makes trees reusable with different parameters

### Common Mistakes

#### 1. YAML Indentation Errors

YAML requires consistent indentation with **spaces only** (no tabs). Steps must be indented under `steps:`:

```yaml
# WRONG - tabs or inconsistent spacing
steps:
	create-project:   # Tab character - will fail
    action: api

# CORRECT - 2-space indent
steps:
  create-project:
    action: api
```
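
Tabs are invisible in most editors, so a quick scan before running a tree can save a confusing parser error. Below is a hypothetical helper (`check_no_tabs` is not part of the runner) that reports `file:line` for every tab. It flags tabs anywhere, even inside values, which is stricter than YAML itself requires.

```bash
#!/usr/bin/env bash
# Hypothetical lint helper (not part of the runner). YAML forbids tabs for
# indentation; this flags every tab so the problem is found before parsing.
# check_no_tabs FILE... -> prints "file:line: ..." per hit, returns 1 if any
check_no_tabs() {
  local file line_no rc=0
  for file in "$@"; do
    # $'\t' is a literal tab; grep -n prefixes each hit with "LINE:".
    while IFS=: read -r line_no _; do
      echo "$file:$line_no: tab character (use spaces)" >&2
      rc=1
    done < <(grep -n $'\t' "$file")
  done
  return "$rc"
}
```

Usage: `check_no_tabs cookbooks/trees/*.yaml`.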

#### 2. Missing Output Dependencies

If you reference `{{ .outputs.step-name.key }}`, the referencing step **must** depend on `step-name`, either directly in its `depends_on` array or transitively. Validation will catch this:

```yaml
# WRONG - references create-project but doesn't depend on it
wait-pipeline:
  action: wait_pipeline
  project_id: "{{ .outputs.create-project.project_id }}"
  # Missing: depends_on: [create-project]

# CORRECT
wait-pipeline:
  depends_on: [create-project]
  action: wait_pipeline
  project_id: "{{ .outputs.create-project.project_id }}"
```

**Error message:** `wait-pipeline: references outputs from "create-project" but does not depend on it (directly or transitively)`

**Note:** Transitive dependencies are valid. If A depends on B, and B depends on C, then A can use outputs from C.
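
The rule can be sketched as a reachability check over the `depends_on` graph. This is hypothetical code, not the runner's validator. It assumes the dependency lists are already loaded into a bash 4+ associative array (`DEPS`, mapping each step name to its space-separated `depends_on` list). It also assumes the graph has no cycles, since cycles are a separate validation error and this recursion would not terminate on one. `validate_output_refs` prints the error message shown above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "directly or transitively" rule (bash 4+).
# DEPS maps step name -> space-separated depends_on list.
# Assumes the graph is acyclic; on a cycle this recursion would not end.
declare -A DEPS=()

# depends_on_transitively STEP TARGET -> returns 0 if STEP reaches TARGET
depends_on_transitively() {
  local step=$1 target=$2 dep
  for dep in ${DEPS[$step]-}; do   # unquoted on purpose: split the list
    if [[ $dep == "$target" ]] || depends_on_transitively "$dep" "$target"; then
      return 0
    fi
  done
  return 1
}

# validate_output_refs STEP TEXT -> reports each .outputs.X reference in TEXT
# whose step X is not a (transitive) dependency of STEP; returns 1 if any.
validate_output_refs() {
  local step=$1 text=$2 ref rc=0
  local re='[.]outputs[.]([A-Za-z0-9_-]+)[.]'
  while [[ $text =~ $re ]]; do
    ref=${BASH_REMATCH[1]}
    # Continue scanning after this match.
    text=${text#*"${BASH_REMATCH[0]}"}
    if ! depends_on_transitively "$step" "$ref"; then
      echo "$step: references outputs from \"$ref\" but does not depend on it (directly or transitively)" >&2
      rc=1
    fi
  done
  return "$rc"
}
```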

#### 3. Template Escaping in Shell Commands

Shell commands with template variables need proper quoting to handle spaces and special characters:

```yaml
# RISKY - unquoted expansion
action: shell
command: curl https://{{ .outputs.create-project.domain }}/api/health

# SAFER - quoted expansion
action: shell
command: 'curl "https://{{ .outputs.create-project.domain }}/api/health"'
```

In the safer form, the single quotes are YAML quoting and the double quotes are passed through to the shell, so the expanded value stays a single argument even if it contains spaces.

#### 4. Outputs Array Syntax

Outputs must be an array of single-key objects, not a flat object:

```yaml
# WRONG - flat object
outputs:
  project_id: .data.name
  domain: .data.domain

# CORRECT - array of objects
outputs:
  - project_id: .data.name
  - domain: .data.domain
```

#### 5. Circular Dependencies

Dependencies form a DAG (directed acyclic graph). Cycles cause validation failures:

```yaml
# WRONG - circular dependency
step-a:
  depends_on: [step-b]
step-b:
  depends_on: [step-a]  # Creates cycle!

# CORRECT - linear or fan-out dependencies
step-a:
  depends_on: []
step-b:
  depends_on: [step-a]
step-c:
  depends_on: [step-a]  # Fan-out OK
```

**Error message:** `Dependency cycle detected`
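
Cycle detection is commonly done with Kahn's topological sort, which also yields a valid execution order. The function below sketches that technique and is not necessarily what the runner uses. `GRAPH` (mapping each step name to its space-separated `depends_on` list) and `topo_order` are hypothetical names. Step names are sorted so the output is deterministic. Any step that never reaches zero unmet dependencies sits on, or behind, a cycle.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of Kahn's algorithm (bash 4+), not the runner's code.
# GRAPH maps step name -> space-separated depends_on list.
declare -A GRAPH=()

# topo_order -> prints steps in a valid execution order, one per line,
# or prints "Dependency cycle detected" to stderr and returns 1.
topo_order() {
  local -A indegree=()
  local -a steps=() queue=() order=()
  local step dep other

  (( ${#GRAPH[@]} == 0 )) && return 0

  # Sorted names make the output deterministic.
  mapfile -t steps < <(printf '%s\n' "${!GRAPH[@]}" | sort)

  # indegree[s] = number of dependencies s is still waiting on.
  for step in "${steps[@]}"; do
    indegree[$step]=0
  done
  for step in "${steps[@]}"; do
    for dep in ${GRAPH[$step]}; do   # unquoted on purpose: split the list
      if [[ -z ${GRAPH[$dep]+set} ]]; then
        echo "$step: depends on unknown step \"$dep\"" >&2
        return 1
      fi
      indegree[$step]=$(( ${indegree[$step]} + 1 ))
    done
  done

  # Start with steps that have no dependencies.
  for step in "${steps[@]}"; do
    if (( ${indegree[$step]} == 0 )); then
      queue+=("$step")
    fi
  done

  while (( ${#queue[@]} > 0 )); do
    step=${queue[0]}
    queue=("${queue[@]:1}")
    order+=("$step")
    # Every step waiting on $step now has one fewer unmet dependency.
    for other in "${steps[@]}"; do
      for dep in ${GRAPH[$other]}; do
        if [[ $dep == "$step" ]]; then
          indegree[$other]=$(( ${indegree[$other]} - 1 ))
          if (( ${indegree[$other]} == 0 )); then
            queue+=("$other")
          fi
        fi
      done
    done
  done

  # Steps left with unmet dependencies sit on, or behind, a cycle.
  if (( ${#order[@]} != ${#steps[@]} )); then
    echo "Dependency cycle detected" >&2
    return 1
  fi
  printf '%s\n' "${order[@]}"
}
```

For the `landing-page` tree, this would print the only valid order: `create-project`, `add-component`, `wait-pipeline`, `verify-site`.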

#### 6. Hardcoded Values Instead of Outputs

Avoid hardcoding values that should come from previous steps:

```yaml
# WRONG - hardcoded project name
wait-pipeline:
  depends_on: [create-project]
  action: wait_pipeline
  project_id: "my-test-project"  # Should use output!

# CORRECT - use captured output
wait-pipeline:
  depends_on: [create-project]
  action: wait_pipeline
  project_id: "{{ .outputs.create-project.project_id }}"
```

## Migrating from Script to Tree

Compare script steps to tree steps:

| Script Pattern | Tree Equivalent |
|----------------|-----------------|
| `api_call POST /project "$json"` | `action: api`, `method: POST` |
| `wait_for_pipeline "$project"` | `action: wait_pipeline` |
| `wait_for_site "$domain" 30 5 "$project"` | `action: wait_site` |
| `diagnose_pipeline_failure "$project"` | `action: diagnose`, `type: pipeline` |
| `curl ... \| jq ...` | `action: shell`, `command: "..."` |

## Troubleshooting

### Pre-flight check failures

```
Pre-flight checks failed:
✗ RDEV_API_URL environment variable is not set
✗ RDEV_API_KEY environment variable is not set
```

Set the required environment variables before running trees.

### Tree not found

```
Error: Tree 'foo' not found
Available trees: landing-page, composable-app, sdlc-flow
```

Check that `cookbooks/trees/foo.yaml` exists.

### yq not found

```
Error: yq is required but not installed
```

Install with `brew install yq`.

### Resume finds no checkpoint

```
No checkpoint found for tree 'landing-page'
```

Run `tree-runner.sh run landing-page ...` first.

### Step failed but outputs missing

```
Error: Output 'project_id' not found in step 'create-project'
```

The step may have failed silently. Check the checkpoint file:

```bash
jq '.steps["create-project"]' cookbooks/.checkpoints/landing-page.json
```

### API timeout

```
curl: (28) Operation timed out
```

Increase the timeout with `API_TIMEOUT=120 ./cookbooks/scripts/tree-runner.sh run ...`

## Available Trees

### Basic Trees

| Tree | Description |
|------|-------------|
| `landing-page` | Single-page landing site with Astro |
| `composable-app` | Multi-component monorepo with service + app |
| `sdlc-flow` | Feature lifecycle with SDLC orchestration |

### Aeries Trees (Multi-Phase Game Development)

Multi-phase workflow demonstrating progressive complexity for an AI agent simulation game:

| Tree | Description | Infrastructure |
|------|-------------|----------------|
| `aeries-1-genesis` | Monolith: Core API + React app for agent creation | Postgres |
| `aeries-2-simulation` | Extraction: Simulation service via strangler pattern | - |
| `aeries-3-society` | Social layer: Spatial service + Redis pub/sub | Redis |

**Running the Aeries sequence:**

```bash
# Phase 1: Create the monolith
./cookbooks/scripts/tree-runner.sh run aeries-1-genesis --project-name aeries-test

# Phase 2: Extract simulation service (operates on existing project)
./cookbooks/scripts/tree-runner.sh run aeries-2-simulation --project-id aeries-test

# Phase 3: Add social layer
./cookbooks/scripts/tree-runner.sh run aeries-3-society --project-id aeries-test
```

These trees demonstrate:

- **Multi-phase patterns** - Later phases take `--project-id` (the project created in Phase 1) instead of `--project-name`
- **Build polling** - Shell-based waits for long-running SDLC builds
- **Service extraction** - Strangler pattern via `/extract-service` command
- **No teardown in phases 2+** - Project lifecycle owned by Phase 1

### Slackpath Trees (Reference Architectures)

Progressive complexity paths for building Slack-like platforms:

| Tree | Description | Infrastructure |
|------|-------------|----------------|
| `slackpath-1-authenticated-service` | Identity layer: User auth, JWT, protected routes | CockroachDB |
| `slackpath-2-async-worker-pipeline` | Background jobs: Producer/consumer with Redis | Redis |
| `slackpath-3-realtime-chat` | WebSockets: Pub/sub broadcasting | Redis |
| `slackpath-4-microservice-constellation` | Service mesh: Auth + Chat + Worker coordination | CockroachDB + Redis |

**Running a slackpath:**

```bash
./cookbooks/scripts/tree-runner.sh run slackpath-1-authenticated-service \
  --project-name auth-test-$(date +%s)
```

These trees demonstrate:

- Infrastructure provisioning (`type: postgres`, `type: redis`)
- Automatic credential injection (`DATABASE_URL`, `REDIS_URL`)
- SDLC-driven implementation via `/implement-feature` prompts
- End-to-end verification scripts

## Files

```
cookbooks/
├── .checkpoints/              # Checkpoint storage (gitignored)
│   └── landing-page.json
├── scripts/
│   ├── lib/
│   │   ├── checkpoint.sh      # Checkpoint I/O
│   │   └── tree-parser.sh     # YAML parsing
│   └── tree-runner.sh         # Main executable
└── trees/
    ├── landing-page.yaml
    ├── composable-app.yaml
    ├── sdlc-flow.yaml
    ├── aeries-1-genesis.yaml      # Multi-phase: monolith
    ├── aeries-2-simulation.yaml   # Multi-phase: extraction
    ├── aeries-3-society.yaml      # Multi-phase: social layer
    ├── slackpath-1-authenticated-service.yaml
    ├── slackpath-2-async-worker-pipeline.yaml
    ├── slackpath-3-realtime-chat.yaml
    └── slackpath-4-microservice-constellation.yaml
```

## Related

- [E2E Testing Strategy](./e2e-testing-strategy.md) - When to run trees, philosophy, history tracking
- [Composable Monorepo Templates](./composable-monorepo.md) - Template structure tested by trees