When to use this runbook: operating Docker hosts and Swarm clusters that Powernode manages via its DevOps API.
- Prerequisites
- When to use this
- Architecture
- Docker Host Management
- Swarm Cluster Operations
- Deployment Strategies
- Service Layer Reference
- API Endpoints
- Procedure — Adding a New Docker Host
- Procedure — Deploying to Swarm
- Procedure — Container Execution Lifecycle
- Verification
- Rollback
- Monitoring
- Security
- Troubleshooting
- Docker Engine on every managed host (TLS-enabled API recommended)
- Network reachability from the Powernode backend to each host's Docker API endpoint (default port 2376)
- For Swarm: at least one manager node; three (or another odd number) are recommended so the cluster keeps quorum when one manager fails
- `docker.hosts.manage` / `swarm.clusters.manage` permissions on the user invoking actions
- HashiCorp Vault reachable from the backend (for container secret provisioning)
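The network reachability prerequisite can be checked before registering a host. The following is a minimal Python sketch, not part of Powernode; the function name `docker_api_reachable` is hypothetical, and it only confirms that a TCP connection opens on the port. It does not validate TLS or the Docker API itself.

```python
import socket


def docker_api_reachable(host: str, port: int = 2376, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Run it from the machine that hosts the Powernode backend, since that is the network path that matters.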
- Onboarding a new Docker host into the managed fleet
- Setting up or expanding a Docker Swarm cluster
- Deploying or rolling back a stack
- Investigating a failed host sync or stuck container instance
flowchart TB
subgraph PN[Powernode API]
CTRL[Controllers]
SVC[Services]
CLIENT[Docker API Client]
CTRL --> SVC --> CLIENT
end
subgraph SWARM[Swarm Cluster]
A[Host A — Manager]
B[Host B — Worker]
C[Host C — Worker]
end
CLIENT -- TLS --> A
CLIENT -- TLS --> B
CLIENT -- TLS --> C
A <--> B
A <--> C
Powernode supports:
- Standalone Docker hosts — individual Docker daemon management
- Swarm clusters — multi-node cluster orchestration
- Hybrid deployments — mix of standalone and clustered hosts
Required fields:
- `name`: unique per account
- `api_endpoint`: Docker Engine API URL (e.g. `https://docker.example.com:2376`)
- `environment`: `staging`, `production`, `development`, or `custom`
TLS configuration:
- `tls_verify`: enable TLS verification
- Encrypted TLS credentials stored via `encrypted_tls_credentials`
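A client can check the required fields before calling the API. The sketch below is an illustration in Python and does not reproduce Powernode's own validation; `validate_host_params` and the accepted URL schemes are assumptions. Uniqueness of `name` per account can only be checked server-side.

```python
from urllib.parse import urlparse

# Environment values listed in the required fields above.
ALLOWED_ENVIRONMENTS = {"staging", "production", "development", "custom"}


def validate_host_params(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    if not str(params.get("name", "")).strip():
        errors.append("name is required")
    url = urlparse(str(params.get("api_endpoint", "")))
    # Assumption: http, https, and tcp are the schemes worth accepting here.
    if url.scheme not in ("http", "https", "tcp") or not url.hostname:
        errors.append("api_endpoint must be a URL with a host")
    if params.get("environment") not in ALLOWED_ENVIRONMENTS:
        errors.append("environment must be one of: " + ", ".join(sorted(ALLOWED_ENVIRONMENTS)))
    if params.get("tls_verify") and url.scheme == "http":
        errors.append("tls_verify is set but api_endpoint is plain http")
    return errors
```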
Hosts auto-sync on configurable intervals (30s–3600s):
- Container inventory
- Image inventory
- System info (Docker version, OS, architecture, resources)
- Event stream
Health monitoring:
- Consecutive failures tracked
- Auto-transitions to `error` status after 5 consecutive failures
- Manual recovery via `record_success!`
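The interval bounds and the failure threshold can be modelled as follows. This is a Python sketch of the behaviour described above, not the actual model code; the class `HostHealth` and the helper `clamp_sync_interval` are hypothetical, and `record_success` mirrors the `record_success!` method named in the text.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 5                   # consecutive failures before "error"
MIN_INTERVAL, MAX_INTERVAL = 30, 3600   # allowed sync interval in seconds


def clamp_sync_interval(seconds: int) -> int:
    """Force a requested sync interval into the 30s to 3600s range."""
    return max(MIN_INTERVAL, min(MAX_INTERVAL, seconds))


@dataclass
class HostHealth:
    status: str = "connected"
    consecutive_failures: int = 0

    def record_failure(self) -> None:
        """Count a failed sync and move to "error" once the threshold is reached."""
        self.consecutive_failures += 1
        if self.consecutive_failures >= FAILURE_THRESHOLD:
            self.status = "error"

    def record_success(self) -> None:
        """Reset the counter and return the host to "connected"."""
        self.consecutive_failures = 0
        self.status = "connected"
```

A single successful sync resets the counter, so only an unbroken run of five failures moves the host to `error`.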
| Status | Description |
|---|---|
| `pending` | Newly registered, not yet connected |
| `connected` | Active and syncing |
| `disconnected` | Connection lost, not syncing |
| `error` | Multiple consecutive failures |
| `maintenance` | Manually taken offline |
Swarm clusters are registered like Docker hosts, but the registered endpoint is that of a manager node.
Auto-sync capabilities:
- Node inventory and status
- Service definitions and replica counts
- Stack deployments
- Cluster events
Nodes (Devops::SwarmNode):
- Manager and worker node tracking
- Availability and status monitoring
- Resource capacity reporting
Services (Devops::SwarmService):
- Service definition management
- Replica scaling
- Update and rollback configuration
Stacks (Devops::SwarmStack):
- Docker Compose-based stack deployment
- Multi-service orchestration
- Stack-level health monitoring
Deployments (Devops::SwarmDeployment):
- Deployment history tracking
- Rollback support
- Blue / green and canary deployment strategies
Devops::DeploymentStrategies::BlueGreenStrategy
- Deploy new version alongside existing (blue → green)
- Run health checks on the green deployment
- Switch traffic from blue to green
- Keep blue available for instant rollback
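The blue / green steps above reduce to one decision: switch traffic only if the newly deployed side passes its health checks. The Python sketch below illustrates that decision; it is not the code of `BlueGreenStrategy`, and the function name and return shape are hypothetical.

```python
from typing import Callable


def blue_green_switch(active: str, healthy: Callable[[str], bool]) -> dict:
    """Decide which colour serves traffic after deploying to the idle one."""
    idle = "green" if active == "blue" else "blue"
    if healthy(idle):
        # The old colour stays deployed as standby for instant rollback.
        return {"live": idle, "standby": active, "switched": True}
    # Health checks failed: traffic never leaves the current colour.
    return {"live": active, "standby": idle, "switched": False}
```

Rolling back after a switch is the same call with the roles reversed, which is why the previous side is kept running.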
Devops::DeploymentStrategies::CanaryStrategy
- Deploy new version to a subset of nodes
- Monitor error rates and performance
- Gradually increase traffic to the new version
- Full rollout or automatic rollback on failures
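A canary rollout can be sketched as a loop over traffic percentages that aborts when the observed error rate crosses a limit. This Python example is illustrative only; the step values, the 1% default threshold, and the `run_canary` name are assumptions and are not taken from `CanaryStrategy`.

```python
from typing import Callable, Sequence


def run_canary(
    steps: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> dict:
    """Walk through traffic percentages, rolling back if errors exceed the limit."""
    for percent in steps:
        # error_rate_at stands in for whatever metric source is monitored.
        if error_rate_at(percent) > max_error_rate:
            return {"outcome": "rolled_back", "failed_at_percent": percent}
    return {"outcome": "promoted", "failed_at_percent": None}
```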
Low-level Docker Engine API communication with TLS support.
| Service | Operations |
|---|---|
| `ContainerManager` | create, start, stop, restart, remove, logs, exec |
| `HostManager` | register, connect, disconnect, sync, health check |
| `ImageManager` | pull, build, tag, push, remove, inspect |
| `NetworkManager` | create, remove, connect, disconnect, inspect |
| `VolumeManager` | create, remove, inspect, prune |
| `ServiceManager` | create, update, scale, remove, logs |
| `StackManager` | deploy, remove, list services, status |
| `SwarmManager` | init, join, leave, update, inspect |
| `NodeManager` | list, inspect, update, promote, demote |
| `SecretManager` | create, update, remove, inspect |
| `HealthMonitor` | host / container / cluster health |
Devops::ContainerOrchestrationService provides high-level container lifecycle management:
- Template-based container creation
- Resource quota enforcement via `QuotaService`
- Vault token provisioning for secrets
- Execution timeout management
- Cleanup and resource reclamation
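Quota enforcement amounts to comparing current usage plus the new request against a limit for each resource. The Python sketch below shows that comparison under stated assumptions: the resource keys (`memory_mb`, `cpu_millicores`) and the function `quota_violations` are hypothetical and do not describe the real `QuotaService` interface.

```python
def quota_violations(requested: dict, in_use: dict, limits: dict) -> list[str]:
    """Return the resources whose limit this request would exceed."""
    exceeded = []
    for resource, limit in limits.items():
        total = in_use.get(resource, 0) + requested.get(resource, 0)
        # A request that lands exactly on the limit is still allowed.
        if total > limit:
            exceeded.append(resource)
    return exceeded
```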
GET /api/v1/devops/docker/hosts
POST /api/v1/devops/docker/hosts
GET /api/v1/devops/docker/hosts/:id
PUT /api/v1/devops/docker/hosts/:id
DELETE /api/v1/devops/docker/hosts/:id
GET /api/v1/devops/docker/containers
GET /api/v1/devops/docker/images
GET /api/v1/devops/docker/networks
GET /api/v1/devops/docker/volumes
GET /api/v1/devops/docker/events
GET /api/v1/devops/docker/activities
GET /api/v1/devops/swarm/clusters
POST /api/v1/devops/swarm/clusters
GET /api/v1/devops/swarm/clusters/:id
PUT /api/v1/devops/swarm/clusters/:id
DELETE /api/v1/devops/swarm/clusters/:id
GET /api/v1/devops/swarm/nodes
GET /api/v1/devops/swarm/services
GET /api/v1/devops/swarm/stacks
GET /api/v1/devops/swarm/deployments
GET /api/v1/devops/swarm/events
GET /api/v1/devops/swarm/networks
GET /api/v1/devops/swarm/volumes
GET /api/v1/devops/swarm/secrets
GET /api/v1/devops/swarm/configs
- Register the host via API:
curl -X POST https://api.powernode.example.com/api/v1/devops/docker/hosts \
  -H "Authorization: Bearer <jwt>" \
  -H "Content-Type: application/json" \
  -d '{
    "host": {
      "name": "docker-prod-1",
      "api_endpoint": "https://10.0.0.10:2376",
      "environment": "production",
      "tls_verify": true,
      "tls_credentials": { /* ca / cert / key, encrypted on save */ }
    }
  }'
- The system verifies connectivity (status transitions to `connected`).
- Initial sync pulls container / image inventory.
- Auto-sync begins at the configured interval.
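The registration call can also be scripted. The following Python sketch builds the same POST request with the standard library and leaves sending it to the caller; the helper name `build_register_host_request` is hypothetical, and the payload shape is copied from the curl example rather than from an API schema.

```python
import json
import urllib.request


def build_register_host_request(base_url: str, jwt: str, host: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that registers a Docker host."""
    body = json.dumps({"host": host}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/devops/docker/hosts",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {jwt}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` sends it. Keeping construction separate makes the payload easy to inspect or log (minus the token) before anything reaches the API.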
- Register the Swarm cluster with the manager node endpoint.
- The system discovers nodes, services, and stacks.
- Deploy a stack via API or pipeline step.
- Monitor deployment via events and service status.
- Create a `ContainerInstance` from template or direct config.
- `pending` → Vault token provisioned → `provisioning`
- Container started on target host → `running`
- Resource usage tracked during execution
- On completion: output captured, Vault token revoked → `completed` / `failed`
- Linked A2A tasks updated with results.
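The lifecycle above is a small state machine. The Python sketch below encodes the transitions named in the steps; allowing `failed` from `pending` and `provisioning` is an assumption added for illustration, and `advance` is a hypothetical helper rather than a Powernode method.

```python
# Allowed next states for each container instance status.
TRANSITIONS = {
    "pending": {"provisioning", "failed"},      # "failed" here is an assumption
    "provisioning": {"running", "failed"},      # "failed" here is an assumption
    "running": {"completed", "failed"},
    "completed": set(),                         # terminal
    "failed": set(),                            # terminal
}


def advance(current: str, target: str) -> str:
    """Return the target state if the transition is legal, else raise ValueError."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target
```

An instance that stays in `provisioning` has not yet taken the `provisioning` to `running` edge, which is the case the troubleshooting table attributes to Vault token issuance.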
After each operation:
- `GET /api/v1/devops/docker/hosts/:id` → `status: connected`
- `GET /api/v1/devops/swarm/clusters/:id/health` → no failing nodes / quorum loss
- Container instance `status` is `running` and `container_id` populated
- No new `Devops::DockerEvent` records of severity `error` since the operation start
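The four checks can be combined into one pass / fail summary. This Python sketch works on already-fetched response bodies; the health fields `failing_nodes` and `quorum_lost` are assumed names, since the response schema is not documented here, and `verification_failures` is a hypothetical helper.

```python
def verification_failures(
    host: dict,
    cluster_health: dict,
    instance: dict,
    error_events_since_start: int,
) -> list[str]:
    """Return the checks that did not pass; an empty list means all passed."""
    failures = []
    if host.get("status") != "connected":
        failures.append("host status is not connected")
    # Assumed response fields: "failing_nodes" (list) and "quorum_lost" (bool).
    if cluster_health.get("failing_nodes") or cluster_health.get("quorum_lost"):
        failures.append("cluster reports failing nodes or quorum loss")
    if instance.get("status") != "running" or not instance.get("container_id"):
        failures.append("container instance is not running with a container_id")
    if error_events_since_start > 0:
        failures.append("new error-severity Docker events were recorded")
    return failures
```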
curl -X POST https://api.powernode.example.com/api/v1/devops/swarm/services/:id/rollback \
  -H "Authorization: Bearer <jwt>"
Or via Docker CLI directly on a manager:
docker service rollback <service-id>
Re-deploy the previous Compose file from `Devops::SwarmDeployment` history:
curl -X POST https://api.powernode.example.com/api/v1/devops/swarm/deployments/:previous_id/redeploy \
  -H "Authorization: Bearer <jwt>"
`HealthMonitor` performs:
- Docker daemon connectivity checks
- Container health status aggregation
- Resource utilisation monitoring
- Swarm cluster quorum verification
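Quorum verification rests on simple arithmetic: Swarm managers use Raft, which needs a strict majority of managers to be reachable. The Python sketch below shows that arithmetic only; the function names are hypothetical and it does not describe how `HealthMonitor` gathers its inputs.

```python
def quorum_size(managers: int) -> int:
    """Smallest number of managers that forms a strict majority."""
    return managers // 2 + 1


def has_quorum(total_managers: int, reachable_managers: int) -> bool:
    """True when enough managers are reachable to keep the cluster writable."""
    return reachable_managers >= quorum_size(total_managers)


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost before quorum is lost."""
    return (managers - 1) // 2
```

Three managers tolerate one failure and four still tolerate only one, which is why an odd manager count is the usual recommendation.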
- `Devops::DockerEvent`: container, image, network, volume events
- `Devops::DockerActivity`: user-initiated operations
- `Devops::SwarmEvent`: cluster-level events
Container instances track:
- `memory_used_mb` / `cpu_used_millicores`
- `storage_used_bytes`
- `network_bytes_in` / `network_bytes_out`
Hosts track:
- `container_count` / `image_count`
- `memory_bytes` / `cpu_count` / `storage_bytes`
- Docker version, OS type, architecture
All Docker API communication supports TLS with:
- Encrypted credential storage
- Certificate verification toggle
- Per-host TLS configuration
Container instances integrate with HashiCorp Vault:
- Token provisioning on container creation
- Automatic token revocation on completion
- `cleanup_vault_token!` for manual cleanup
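The provision-then-always-revoke pattern can be expressed as a context manager. This is a Python sketch of the pattern, not Powernode's Vault integration; `vault_token`, `provision`, and `revoke` are hypothetical names, and the callables stand in for real Vault calls.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def vault_token(provision: Callable[[], str], revoke: Callable[[str], None]) -> Iterator[str]:
    """Provision a token for the duration of a block and always revoke it afterwards."""
    token = provision()
    try:
        yield token
    finally:
        # Runs on success and on failure, so no token outlives its container.
        revoke(token)
```

Manual cleanup is still needed when the process itself dies before the `finally` block runs.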
Container instances record security violations:
- Violation details with detection timestamps
- `has_security_violations?` check
- Violations accessible via the instance details API
- Docker Swarm secrets via `SecretManager`
- `Devops::SecretReference` for secret tracking
- Encrypted credential storage for integrations
| Symptom | Likely cause | First action |
|---|---|---|
| Host stuck in `pending` | API endpoint unreachable | Verify network / firewall, then curl the endpoint manually |
| Repeated `error` status | Bad TLS credentials | Re-upload credentials; check daemon TLS config |
| Stack deploy fails with "service convergence" | Image pull failure on workers | Pre-pull image on every node; check registry auth |
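| Lost Swarm quorum | Fewer than a majority of managers reachable | On a surviving manager run `docker swarm init --force-new-cluster`, then re-add managers; restore the Swarm state from a backup of `/var/lib/docker/swarm` if needed |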
| Container `provisioning` → never `running` | Vault token issuance failure | Check Vault reachability / policy; run `cleanup_vault_token!` and retry |
- production-deployment.md — Initial platform deployment
- worker-operations.md — Sidekiq worker that drives Docker sync
- performance-tuning.md — Sizing and throughput guidance
docs/infrastructure/DOCKER_SWARM_OPERATIONS.md
Last verified: 2026-05-17