
StemeDB Operations Guide

Welcome to the StemeDB operations hub. This documentation covers deploying, monitoring, troubleshooting, and maintaining StemeDB in production environments.

| Need to... | Go to |
|---|---|
| Deploy to k3s (100 projects) | k3s Deploy Roadmap |
| Deploy for the first time | Single-Node Pilot Architecture |
| Troubleshoot an incident | Operational Runbooks |
| Scale to production | Three-Node Cluster Architecture |
| Size your deployment | Resource Sizing Guide |
| Configure networking | Network Requirements |
| Deploy with Docker Compose | Pilot with Monitoring |
| Set up reverse proxy | Nginx Config / Envoy Config |
| Validate pilot success | Pilot Success Criteria |

Operations Documentation

🚨 Runbooks

When things go wrong at 2am, these runbooks provide step-by-step incident response procedures:

Start here: Troubleshooting Flowchart - Decision tree from symptom to runbook


🏗️ Reference Architectures

Choose your deployment model based on scale, availability requirements, and operational maturity:

| Architecture | Target | Assertions | Queries/sec | RTO / RPO | Guide |
|---|---|---|---|---|---|
| Single-Node Pilot | PoC, friendly pilot | <10K | <100/sec | 2hr / 24hr | Guide |
| Three-Node Cluster | Production | <100K | <1K/sec | 5min / 1min | Guide |
| Enterprise (future) | Large-scale | >100K | >1K/sec | 1min / 0min | Roadmap (P6+) |
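
As a rough illustration of how the table above can be read, the sketch below picks the smallest architecture that fits a workload and, optionally, a recovery target. It is not part of StemeDB: the names `Tier`, `TIERS`, and `pick_tier` are hypothetical, and treating the table's bounds as exclusive is an assumption, since the table does not say which tier an exact boundary value such as 10K assertions belongs to.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    max_assertions: float  # upper bound from the table (assumed exclusive)
    max_qps: float         # upper bound from the table (assumed exclusive)
    rto_minutes: int       # recovery time objective listed in the table
    rpo_minutes: int       # recovery point objective listed in the table


# Values copied from the reference architecture table, ordered smallest first.
TIERS = [
    Tier("Single-Node Pilot", 10_000, 100, 120, 24 * 60),
    Tier("Three-Node Cluster", 100_000, 1_000, 5, 1),
    Tier("Enterprise (future)", float("inf"), float("inf"), 1, 0),
]


def pick_tier(
    assertions: int,
    qps: float,
    rto_minutes: Optional[int] = None,
    rpo_minutes: Optional[int] = None,
) -> Tier:
    """Return the first (smallest) tier that fits the load and recovery targets."""
    for tier in TIERS:
        fits_load = assertions < tier.max_assertions and qps < tier.max_qps
        fits_rto = rto_minutes is None or tier.rto_minutes <= rto_minutes
        fits_rpo = rpo_minutes is None or tier.rpo_minutes <= rpo_minutes
        if fits_load and fits_rto and fits_rpo:
            return tier
    raise ValueError("no architecture in the table meets these targets")


if __name__ == "__main__":
    # A small workload that still needs recovery within 10 minutes
    # is pushed past the pilot tier by its RTO alone.
    print(pick_tier(5_000, 50).name)
    print(pick_tier(5_000, 50, rto_minutes=10).name)
```

The point of the optional recovery arguments is that scale alone does not decide the tier: a workload well under the pilot limits may still need the three-node cluster if a 2-hour RTO or 24-hour RPO is unacceptable.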

Also see:


📦 Deployment Examples

Infrastructure-as-Code examples ready to customize for your environment:


Pilot Success Criteria

Before going to production, validate your pilot meets these criteria:

  • Pilot Success Criteria - Performance, functional, operational requirements
  • 5 Amazement Moments - Demo validation checklist
  • Acceptance Criteria - Must Pass / Should Pass / Nice to Have

Common Tasks

First-Time Deployment

  1. Review Single-Node Pilot Architecture
  2. Follow Resource Sizing Guide to choose hardware
  3. Deploy using Docker Compose example
  4. Configure reverse proxy (Nginx or Envoy)
  5. Validate against Pilot Success Criteria

Incident Response

  1. Identify symptom (error message, alert, user report)
  2. Check Troubleshooting Flowchart
  3. Follow the relevant runbook (see the Runbooks section above)
  4. Document resolution and add to runbook if new scenario

Scaling to Production

  1. Validate pilot success with Success Criteria
  2. Review Three-Node Cluster Architecture
  3. Plan migration (data backup, node provisioning, DNS changes)
  4. Execute deployment with rolling validation
  5. Set up monitoring (see Docker Compose example)

Prerequisites

Before using these operations guides, ensure you've completed:


Support

For questions or issues:

  • 📖 Documentation bugs: Report at GitHub Issues
  • 💬 Community support: [Discussion forum link TBD]
  • 🚨 Security issues: security@stemedb.io (or your org's security contact)

Contributing

Operations documentation is living documentation. If you:

  • Encounter an incident not covered by runbooks → Add it
  • Find an architecture pattern that works well → Document it
  • Discover a configuration improvement → Share the example

Submit pull requests to keep this guide current and valuable.


Last Updated: 2026-03-02