diff --git a/content/nomad/v1.11.x/content/docs/job-declare/strategy/singleton.mdx b/content/nomad/v1.11.x/content/docs/job-declare/strategy/singleton.mdx
new file mode 100644
index 0000000000..1cf25e5144
--- /dev/null
+++ b/content/nomad/v1.11.x/content/docs/job-declare/strategy/singleton.mdx
@@ -0,0 +1,301 @@
+---
+layout: docs
+page_title: Configure singleton deployments
+description: |-
+  Declare a job that guarantees only a single instance can run at a time, with
+  minimal downtime.
+---
+
+# Configure singleton deployments
+
+A singleton deployment is one where at most one instance of a given
+allocation runs on the cluster at a time. You might need this if the
+workload needs exclusive access to a remote resource such as a data store. Nomad
+does not support singleton deployments as a built-in feature. Your workloads
+continue to run even when the Nomad client agent has crashed, so ensuring
+there's at most one allocation for a given workload requires some cooperation
+from the job. This document describes how to implement singleton deployments.
+
+## Design Goals
+
+The configuration described here meets these primary design goals:
+
+- The design prevents a specific process within a task from running if there
+  is another instance of that task running anywhere else on the Nomad cluster.
+- Nomad should be able to recover from failure of the task or the node on which
+  the task is running with minimal downtime, where "recovery" means that Nomad
+  should stop the original task and schedule a replacement task.
+- Nomad should minimize false positive detection of failures to avoid
+  unnecessary downtime during the cutover.
+
+There's a tradeoff between recovery speed and false positives. The
+faster you make Nomad attempt to recover from failure, the more likely it is that a
+transient failure causes Nomad to schedule a replacement and a subsequent
+downtime.
+
+Note that it's not possible to design a perfectly zero-downtime singleton
+allocation in a distributed system. This design errs on the side of
+correctness: having zero or one allocation running rather than incorrectly
+having two allocations running.
+
+## Overview
+
+There are several options available for some details of the implementation, but
+all of them include the following:
+
+- You must have a distributed lock with a TTL that's refreshed from the
+  allocation. The process that sets and refreshes the lock must have its
+  lifecycle tied to the main task. It can be either in-process, in-task with
+  supervision, or run as a sidecar. If the allocation cannot obtain the lock,
+  then it must not start whatever process or operation you intend to be a
+  singleton. After a configurable window without obtaining the lock, the
+  allocation must fail.
+- You must set the [`group.disconnect.stop_on_client_after`][] field. This
+  forces a Nomad client that's disconnected from the server to stop the
+  singleton allocation, which in turn releases the lock or allows its TTL to
+  expire.
+
+Tune the lock TTL, the time it takes the allocation to give up, and the
+`stop_on_client_after` duration to reduce the maximum amount of downtime the
+application can have.
+
+The Nomad [Locks API][] can support the operations needed. In pseudo-code these
+operations are the following:
+
+- To acquire the lock, `PUT /v1/var/:path?lock-acquire`
+  - On success: start heartbeat every 1/2 TTL
+  - On conflict or failure: retry with backoff and timeout.
+  - Once out of attempts, exit the process with an error code.
+- To heartbeat, `PUT /v1/var/:path?lock-renew`
+  - On success: continue
+  - On conflict: exit the process with an error code
+  - On failure: retry with backoff up to TTL.
+  - If TTL expires, attempt to revoke the lock, then exit the process with an
+    error code.
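The acquire and heartbeat loops above can be sketched as a small state machine. This is a minimal illustration, not a Nomad client: the HTTP calls to the two Locks API endpoints are abstracted behind injected callables, and all function names and return values here are illustrative.

```python
import time

# Possible outcomes of a lock call, mirroring the pseudo-code above.
OK, CONFLICT, FAILURE = "ok", "conflict", "failure"


def acquire_with_retry(acquire, attempts=5, backoff=1.0, sleep=time.sleep):
    """Try to take the lock; retry with exponential backoff on conflict or failure.

    `acquire` is a callable standing in for PUT /v1/var/:path?lock-acquire.
    Returns True once the lock is held, False when attempts are exhausted
    (at which point the real process would exit with an error code).
    """
    delay = backoff
    for _ in range(attempts):
        if acquire() == OK:
            return True
        sleep(delay)
        delay *= 2
    return False


def heartbeat_until_lost(renew, ttl, interval=None,
                         sleep=time.sleep, clock=time.monotonic):
    """Renew the lock every `interval` (default TTL/2) until it's lost.

    `renew` stands in for PUT /v1/var/:path?lock-renew. On conflict the lock
    is held elsewhere, so stop immediately; on failure keep retrying until a
    full TTL has elapsed since the last successful renewal, then give up.
    Returns the reason the loop ended, at which point the real process would
    attempt to revoke the lock and exit with an error code.
    """
    interval = interval or ttl / 2
    last_renewed = clock()
    while True:
        sleep(interval)
        result = renew()
        if result == OK:
            last_renewed = clock()
        elif result == CONFLICT:
            return CONFLICT
        elif clock() - last_renewed >= ttl:
            return "ttl-expired"
```

In a real task you would wire `acquire` and `renew` to the Locks API, start the singleton process only after `acquire_with_retry` returns `True`, and terminate it when `heartbeat_until_lost` returns.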
+
+The allocation can safely use the Nomad [Task API][] socket to write to the
+locks API, rather than communicating with the server directly. This reduces load
+on the server and speeds up detection of failed client nodes because the
+disconnected client cannot forward the Task API requests to the leader.
+
+The [`nomad var lock`][] command implements this logic, so you can use it to shim
+the process being locked.
+
+### ACLs
+
+Allocations cannot write to Nomad variables by default. You must configure a
+[workload-associated ACL policy][] that allows write access in the
+[`namespace.variables`][] block. For example, the following ACL policy allows
+access to write a lock on the path `nomad/jobs/example/lock` in the `prod`
+namespace:
+
+```
+namespace "prod" {
+  variables {
+    path "nomad/jobs/example/lock" {
+      capabilities = ["write", "read", "list"]
+    }
+  }
+}
+```
+
+You set this policy on the job with `nomad acl policy apply -namespace prod -job
+example example-lock ./policy.hcl`.
+
+## Implementation
+
+### Use `nomad var lock`
+
+We recommend implementing the locking logic with `nomad var lock` as a shim in
+your task. This example jobspec assumes there's a Nomad binary in the container
+image.
+
+```hcl
+job "example" {
+  group "group" {
+
+    disconnect {
+      stop_on_client_after = "1m"
+    }
+
+    task "primary" {
+      driver = "docker"
+
+      config {
+        image   = "example/app:1"
+        command = "nomad"
+        args = [
+          "var", "lock", "nomad/jobs/example/lock", # lock
+          "busybox", "httpd",                       # application
+          "-vv", "-f", "-p", "8001", "-h", "/local" # application args
+        ]
+      }
+
+      identity {
+        env = true # make NOMAD_TOKEN available to the lock command
+      }
+    }
+  }
+}
+```
+
+If you don't want to ship a Nomad binary in the container image, make a
+read-only mount of the binary from a host volume. This only works in cases
+where the Nomad binary has been statically linked or you have glibc in the
+container image.
+
+
+```hcl
+job "example" {
+  group "group" {
+
+    disconnect {
+      stop_on_client_after = "1m"
+    }
+
+    volume "binaries" {
+      type      = "host"
+      source    = "binaries"
+      read_only = true
+    }
+
+    task "primary" {
+      driver = "docker"
+
+      config {
+        image   = "example/app:1"
+        command = "/opt/bin/nomad"
+        args = [
+          "var", "lock", "nomad/jobs/example/lock", # lock
+          "busybox", "httpd",                       # application
+          "-vv", "-f", "-p", "8001", "-h", "/local" # application args
+        ]
+      }
+
+      identity {
+        env = true # make NOMAD_TOKEN available to the lock command
+      }
+
+      volume_mount {
+        volume      = "binaries"
+        destination = "/opt/bin"
+      }
+    }
+  }
+}
+```
+
+### Sidecar lock
+
+If you cannot implement the lock logic in your application or with a shim such
+as `nomad var lock`, run the task you are locking as a sidecar of the locking
+task, which has [`task.leader=true`][] set.
+
+
+```hcl
+job "example" {
+  group "group" {
+
+    disconnect {
+      stop_on_client_after = "1m"
+    }
+
+    task "lock" {
+      driver = "raw_exec"
+      leader = true
+
+      config {
+        command = "/opt/lock-script.sh"
+      }
+
+      identity {
+        env = true # make NOMAD_TOKEN available to the lock command
+      }
+    }
+
+    task "application" {
+      driver = "docker"
+
+      lifecycle {
+        hook    = "poststart"
+        sidecar = true
+      }
+
+      config {
+        image    = "example/app:1"
+        pid_mode = "host" # let the lock task see and terminate the application's processes
+      }
+    }
+  }
+}
+```
+
+The locking task has the following requirements:
+
+- Must be in the same group as the task being locked.
+- Must be able to terminate the task being locked without the Nomad client being
+  up. For example, they share the same PID namespace, or the locking task is
+  privileged.
+- Must have a way of signaling the task being locked that it is safe to start.
+  For example, the locking task can write a sentinel file into the `/alloc`
+  directory, which the locked task tries to read on startup and blocks until it
+  exists.
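The sentinel-file handshake in that last requirement can be sketched in shell. This is a minimal illustration, not the contents of any real script: the sentinel filename and both function names are hypothetical, and `NOMAD_ALLOC_DIR` is the only Nomad-provided value assumed here.

```shell
#!/bin/sh
# Sketch of a sentinel-file handshake between a locking task and a locked task.
# The locking side touches the sentinel only after it actually holds the lock;
# the locked side blocks until the sentinel appears or a deadline passes.

# Hypothetical sentinel path in the shared alloc directory.
SENTINEL="${NOMAD_ALLOC_DIR:-/tmp}/lock-held"

# Locking task: call after the lock (e.g. via `nomad var lock`) is acquired.
announce_lock_held() {
  touch "$SENTINEL"
}

# Locked task: poll for the sentinel once per second, giving up after $1 seconds.
# Returns 0 when the sentinel exists, 1 on timeout.
wait_for_sentinel() {
  deadline=$(( $(date +%s) + $1 ))
  while [ ! -e "$SENTINEL" ]; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}
```

The locked task would run `wait_for_sentinel` at the top of its entrypoint and exit nonzero on timeout, so Nomad restarts it rather than letting it run unlocked.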
+
+If you cannot meet the third requirement, then you need to split the lock
+acquisition and lock heartbeat into separate tasks.
+
+
+```hcl
+job "example" {
+  group "group" {
+
+    disconnect {
+      stop_on_client_after = "1m"
+    }
+
+    task "acquire" {
+      driver = "raw_exec"
+
+      lifecycle {
+        hook    = "prestart"
+        sidecar = false
+      }
+
+      config {
+        command = "/opt/lock-acquire-script.sh"
+      }
+
+      identity {
+        env = true # make NOMAD_TOKEN available to the lock command
+      }
+    }
+
+    task "heartbeat" {
+      driver = "raw_exec"
+      leader = true
+
+      config {
+        command = "/opt/lock-heartbeat-script.sh"
+      }
+
+      identity {
+        env = true # make NOMAD_TOKEN available to the lock command
+      }
+    }
+
+    task "application" {
+      driver = "docker"
+
+      lifecycle {
+        hook    = "poststart"
+        sidecar = true
+      }
+
+      config {
+        image    = "example/app:1"
+        pid_mode = "host" # let the heartbeat task terminate the application's processes
+      }
+    }
+  }
+}
+```
+
+[`group.disconnect.stop_on_client_after`]: /nomad/docs/job-specification/disconnect#stop_on_client_after
+[Locks API]: /nomad/api-docs/variables/locks
+[Task API]: /nomad/api-docs/task-api
+[`nomad var lock`]: /nomad/commands/var/lock
+[workload-associated ACL policy]: /nomad/docs/concepts/workload-identity#workload-associated-acl-policies
+[`namespace.variables`]: /nomad/docs/other-specifications/acl-policy#variables
+[`task.leader=true`]: /nomad/docs/job-specification/task#leader
diff --git a/content/nomad/v1.11.x/data/docs-nav-data.json b/content/nomad/v1.11.x/data/docs-nav-data.json
index e2a2fdcb15..fa2d9528f2 100644
--- a/content/nomad/v1.11.x/data/docs-nav-data.json
+++ b/content/nomad/v1.11.x/data/docs-nav-data.json
@@ -697,6 +697,10 @@
         {
           "title": "Configure rolling",
           "path": "job-declare/strategy/rolling"
+        },
+        {
+          "title": "Configure singleton",
+          "path": "job-declare/strategy/singleton"
         }
       ]
     },