From 6690f070379dbb745bf26ea88426775f9f78e8f5 Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 06:50:51 +0100 Subject: [PATCH 1/9] feat: add initial autoscaling doc --- platform/private-locations/autoscaling.mdx | 53 ++++++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 platform/private-locations/autoscaling.mdx diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx new file mode 100644 index 00000000..21d7a207 --- /dev/null +++ b/platform/private-locations/autoscaling.mdx @@ -0,0 +1,53 @@ +--- +title: 'Autoscaling' +sidebarTitle: 'Autoscaling' +description: 'TO BE ADDED' +--- + +Checkly provides Prometheus metrics for a private location to enable automatic scaling and right-sizing of agents running against it. + + + +Setting up autoscaling requires Prometheus metrics to be ingested. + +For more information and setup guide go to [Exporting Metrics & Data via Prometheus V2](/integrations/observability/prometheus-v2/) + + + +## Metric-based autoscaling + +The method of scaling is based on Checkly's internal scaling where the autoscaling is set-up to match the total load going through the system. +This is achieved using the `checkly_private_location_check_runs` gauge metric which representes the best-effort* estimate of the check runs going through the private location. +* * Short-running checks are often excluded from the metric as their impact is negligible on the Private Location. + +## Example using KEDA + +While this can be achieved using any autoscaling component, the example below will be using [KEDA](https://keda.sh) as it is the most popular autoscaler for Kubernetes. + +```yaml +apiVersion: keda.sh/v1alpha1 +kind: ScaledObject +spec: + scaleTargetRef: + kind: Deployment # Optional. Default: Deployment + namespace: # Optional. + name: # Required. The name of the Checkly agent deployment. + minReplicaCount: 2 # Optional. Default: 0 + maxReplicaCount: 10 # Optional. 
Default: 100 + pollingInterval: 30 # Optional. Default: 30 seconds + cooldownPeriod: 300 # Optional. Default: 300 seconds + triggers: + - type: prometheus + metadata: + serverAddress: http://prometheus-k8s.monitoring.svc.cluster.local:9090 + metricName: checkly_private_location_check_runs + threshold: "2" + query: sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) +``` + +### Query Explanation: + +sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) +* `state` - the state at which the job is queued = check has been scheduled but yet to be picked up by a consumer +* `inflight` - the job is being executed by the agent + From 0be99fc7bcda812bddc2b153a1e89732090c501d Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 06:58:57 +0100 Subject: [PATCH 2/9] feat: AI-refined docs --- docs.json | 1 + platform/private-locations/autoscaling.mdx | 129 +++++++++++++++++---- 2 files changed, 107 insertions(+), 23 deletions(-) diff --git a/docs.json b/docs.json index eeede7a6..d17dd9ce 100644 --- a/docs.json +++ b/docs.json @@ -85,6 +85,7 @@ "platform/private-locations/kubernetes-deployment", "platform/private-locations/proxy-setup", "platform/private-locations/scaling-redundancy", + "platform/private-locations/autoscaling", "platform/private-locations/change-log" ] }, diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx index 21d7a207..56a0d57f 100644 --- a/platform/private-locations/autoscaling.mdx +++ b/platform/private-locations/autoscaling.mdx @@ -1,41 +1,49 @@ --- title: 'Autoscaling' -sidebarTitle: 'Autoscaling' -description: 'TO BE ADDED' +description: 'Autoscale Checkly Agents in a Private Location with KEDA based on queued and in-flight check runs.' --- -Checkly provides Prometheus metrics for a private location to enable automatic scaling and right-sizing of agents running against it. 
+Scale Checkly Agent pods automatically so capacity tracks live load on a Private Location. This page covers the KEDA-based recipe; for static capacity planning, see [Scaling and Redundancy](/platform/private-locations/scaling-redundancy). -Setting up autoscaling requires Prometheus metrics to be ingested. - -For more information and setup guide go to [Exporting Metrics & Data via Prometheus V2](/integrations/observability/prometheus-v2/) +- Prometheus V2 metrics are being ingested for your account. See [Exporting Metrics & Data via Prometheus V2](/integrations/observability/prometheus-v2/). +- Checkly Agents are deployed as a Kubernetes `Deployment`. See [Kubernetes Deployment](/platform/private-locations/kubernetes-deployment). +- [KEDA](https://keda.sh) is installed in the cluster. -## Metric-based autoscaling +## The signal + +Checkly exposes the `checkly_private_location_check_runs` gauge through the Prometheus V2 exporter. Filtered by `state` and a `private_location_slug_name`, it gives a near-real-time count of pending and currently-executing check runs in a single Private Location — the signal you drive replica count from. + +The relevant `state` values are: -The method of scaling is based on Checkly's internal scaling where the autoscaling is set-up to match the total load going through the system. -This is achieved using the `checkly_private_location_check_runs` gauge metric which representes the best-effort* estimate of the check runs going through the private location. -* * Short-running checks are often excluded from the metric as their impact is negligible on the Private Location. +- `queued` — the check run has been scheduled but not yet picked up by an agent. +- `inflight` — the check run is currently being executed by an agent. -## Example using KEDA +Summing both gives total live load and is what you want to scale on. 
-While this can be achieved using any autoscaling component, the example below will be using [KEDA](https://keda.sh) as it is the most popular autoscaler for Kubernetes. + +Short-running checks may be excluded from the gauge because their impact on Private Location capacity is negligible. + + +## KEDA `ScaledObject` ```yaml apiVersion: keda.sh/v1alpha1 kind: ScaledObject +metadata: + name: checkly-agent-autoscaler spec: scaleTargetRef: - kind: Deployment # Optional. Default: Deployment - namespace: # Optional. - name: # Required. The name of the Checkly agent deployment. - minReplicaCount: 2 # Optional. Default: 0 - maxReplicaCount: 10 # Optional. Default: 100 - pollingInterval: 30 # Optional. Default: 30 seconds - cooldownPeriod: 300 # Optional. Default: 300 seconds + kind: Deployment # Optional. Default: Deployment. + namespace: # Optional. + name: # Required. + minReplicaCount: 2 # Optional. Default: 0. + maxReplicaCount: 10 # Optional. Default: 100. + pollingInterval: 30 # Optional. Default: 30 (seconds). + cooldownPeriod: 300 # Optional. Default: 300. Scale-to-zero only; see below. triggers: - type: prometheus metadata: @@ -45,9 +53,84 @@ spec: query: sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) ``` -### Query Explanation: +The query is scoped to a single Private Location by `private_location_slug_name`, so create one `ScaledObject` per Private Location. + +For a Prometheus instance outside the cluster, add an [`authenticationRef`](https://keda.sh/docs/latest/scalers/prometheus/) pointing at a `TriggerAuthentication` resource with the appropriate credentials. + +## How desired replica count is computed + +KEDA publishes the PromQL result as an external metric and the HPA it creates applies: + +``` +desiredReplicas = ceil( PromQL_result / threshold ) +``` + +clamped to `[minReplicaCount, maxReplicaCount]`. `threshold` is a per-pod target value, not a trigger boundary. 
+ +With `threshold: "2"`, `minReplicaCount: 2`, `maxReplicaCount: 10`: + +| `sum(...)` result | `ceil(result / 2)` | After clamp | Resulting pods | +|---|---|---|---| +| 0 | 0 | 2 | 2 (idle floor) | +| 1 | 1 | 2 | 2 | +| 4 | 2 | 2 | 2 | +| 5 | 3 | 3 | 3 | +| 11 | 6 | 6 | 6 | +| 25 | 13 | 10 | 10 (capped) | + +## Tuning the bounds + +- **`threshold`** — lower values produce more pods per unit of work. A reasonable starting point is the per-agent `JOB_CONCURRENCY` configured on the Deployment; halve it to add headroom. +- **`minReplicaCount`** — the redundancy floor. Keep at `2` or higher so a single agent failure doesn't take the Private Location offline. See [Scaling and Redundancy](/platform/private-locations/scaling-redundancy). +- **`maxReplicaCount`** — must exceed `ceil(peak_load / threshold)`. If the cap is too low, queued check runs accumulate above it and are dropped after the 6-minute queue TTL. + +## Scale-down behavior and `cooldownPeriod` + +`cooldownPeriod` governs scale-to-zero only — the wait between the trigger going inactive and KEDA scaling the workload to `0`. With `minReplicaCount: 2`, scale-to-zero never happens, so this field is dormant. + +Scale-in between `maxReplicaCount` and `minReplicaCount` is controlled by the HPA's own scale-down stabilization window (default 300 seconds). To dampen scale-in more aggressively, override the HPA behavior on the `ScaledObject`: + +```yaml + advanced: + horizontalPodAutoscalerConfig: + behavior: + scaleDown: + stabilizationWindowSeconds: 600 + policies: + - type: Pods + value: 1 + periodSeconds: 60 +``` + +## Graceful termination + +In-flight checks on a terminating pod are rerun on another agent after a 300-second timeout. Set `terminationGracePeriodSeconds` above this on the agent pod spec so an evicted pod has room to drain before `SIGKILL`: + +```yaml +spec: + template: + spec: + terminationGracePeriodSeconds: 330 +``` + +## Verify + +1. 
Confirm KEDA created the HPA and is reading the metric: + + ```bash + kubectl get scaledobject,hpa -n + ``` + +2. Probe the signal directly: + + ``` + sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) + ``` + +3. Schedule a burst of checks against the Private Location and watch the replica count climb toward `maxReplicaCount`, then settle back to `minReplicaCount` after the scale-down stabilization window elapses. -sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) -* `state` - the state at which the job is queued = check has been scheduled but yet to be picked up by a consumer -* `inflight` - the job is being executed by the agent +## See also +- [Scaling and Redundancy](/platform/private-locations/scaling-redundancy) +- [Exporting Metrics & Data via Prometheus V2](/integrations/observability/prometheus-v2/) +- [Kubernetes Deployment](/platform/private-locations/kubernetes-deployment) From 355984adc5c5172cc548421a7a65e2147b547c53 Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 07:29:42 +0100 Subject: [PATCH 3/9] feat: apply feedback --- platform/private-locations/autoscaling.mdx | 86 ++++++++-------------- 1 file changed, 32 insertions(+), 54 deletions(-) diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx index 56a0d57f..64615c6b 100644 --- a/platform/private-locations/autoscaling.mdx +++ b/platform/private-locations/autoscaling.mdx @@ -3,27 +3,25 @@ title: 'Autoscaling' description: 'Autoscale Checkly Agents in a Private Location with KEDA based on queued and in-flight check runs.' --- -Scale Checkly Agent pods automatically so capacity tracks live load on a Private Location. This page covers the KEDA-based recipe; for static capacity planning, see [Scaling and Redundancy](/platform/private-locations/scaling-redundancy). +Scale Checkly Agent pods automatically in relation to live load. 
This page covers the KEDA-based recipe; for static capacity planning, see [Scaling and Redundancy](/platform/private-locations/scaling-redundancy). -- Prometheus V2 metrics are being ingested for your account. See [Exporting Metrics & Data via Prometheus V2](/integrations/observability/prometheus-v2/). -- Checkly Agents are deployed as a Kubernetes `Deployment`. See [Kubernetes Deployment](/platform/private-locations/kubernetes-deployment). +- Prometheus V2 metrics are being ingested for your account — the only source for this gauge. See [Exporting Metrics & Data via Prometheus V2](/integrations/observability/prometheus-v2/). +- Checkly Agents are deployed via the [Checkly agent Helm chart](https://github.com/checkly/helm-charts/tree/main/charts/agent) (or an equivalent `Deployment`). See [Kubernetes Deployment](/platform/private-locations/kubernetes-deployment). - [KEDA](https://keda.sh) is installed in the cluster. ## The signal -Checkly exposes the `checkly_private_location_check_runs` gauge through the Prometheus V2 exporter. Filtered by `state` and a `private_location_slug_name`, it gives a near-real-time count of pending and currently-executing check runs in a single Private Location — the signal you drive replica count from. +Checkly exposes the `checkly_private_location_check_runs` gauge through the Prometheus V2 exporter. Filtered by `state` and a `private_location_slug_name`, it provides the count of pending and currently-executing check runs in a single Private Location — the signal you drive replica count from. The relevant `state` values are: - `queued` — the check run has been scheduled but not yet picked up by an agent. - `inflight` — the check run is currently being executed by an agent. -Summing both gives total live load and is what you want to scale on. - Short-running checks may be excluded from the gauge because their impact on Private Location capacity is negligible. 
@@ -37,70 +35,50 @@ metadata: name: checkly-agent-autoscaler spec: scaleTargetRef: - kind: Deployment # Optional. Default: Deployment. - namespace: # Optional. - name: # Required. - minReplicaCount: 2 # Optional. Default: 0. - maxReplicaCount: 10 # Optional. Default: 100. - pollingInterval: 30 # Optional. Default: 30 (seconds). - cooldownPeriod: 300 # Optional. Default: 300. Scale-to-zero only; see below. + namespace: + name: + minReplicaCount: 2 + maxReplicaCount: 10 triggers: - type: prometheus metadata: serverAddress: http://prometheus-k8s.monitoring.svc.cluster.local:9090 metricName: checkly_private_location_check_runs - threshold: "2" - query: sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) + threshold: "1" # Match this to the agent's JOB_CONCURRENCY. + query: sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) ``` The query is scoped to a single Private Location by `private_location_slug_name`, so create one `ScaledObject` per Private Location. + +If you deploy agents with the [Checkly agent Helm chart](https://github.com/checkly/helm-charts/tree/main/charts/agent), template the `ScaledObject` alongside your chart values so the autoscaler ships with the deployment. + + For a Prometheus instance outside the cluster, add an [`authenticationRef`](https://keda.sh/docs/latest/scalers/prometheus/) pointing at a `TriggerAuthentication` resource with the appropriate credentials. -## How desired replica count is computed +## How many pods you'll get -KEDA publishes the PromQL result as an external metric and the HPA it creates applies: +KEDA queries Prometheus on its polling interval and turns the result into a target pod count. With `threshold: "1"`, that target is roughly the number of queued plus in-flight check runs — one pod per check. The pod count is then kept within `minReplicaCount` and `maxReplicaCount`. 
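The replica arithmetic described here can be sketched in a few lines of Python. This is an illustrative approximation only, assuming the `ceil(result / threshold)` rule followed by clamping to the configured bounds; the function name `desired_replicas` is hypothetical and not part of KEDA or Checkly. The real HPA also takes the current replica count, a tolerance, and any scaling policies into account, so treat the output as the steady-state target rather than the exact pod count at any moment.

```python
import math


def desired_replicas(metric_value: float, threshold: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate the steady-state pod count for a queued + in-flight total."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # One pod per `threshold` check runs, rounded up.
    raw = math.ceil(metric_value / threshold)
    # Keep the result within minReplicaCount and maxReplicaCount.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Same bounds as the ScaledObject in this page: threshold 1, min 2, max 10.
    for load in (0, 1, 3, 7, 20):
        pods = desired_replicas(load, threshold=1, min_replicas=2, max_replicas=10)
        print(f"{load} check runs -> {pods} pods")
```

Running the loop with a different `threshold` is a quick way to see how packing more checks per pod changes the target before editing the `ScaledObject`.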
-``` -desiredReplicas = ceil( PromQL_result / threshold ) -``` +For example, with `threshold: "1"`, `minReplicaCount: 2`, `maxReplicaCount: 10`: -clamped to `[minReplicaCount, maxReplicaCount]`. `threshold` is a per-pod target value, not a trigger boundary. - -With `threshold: "2"`, `minReplicaCount: 2`, `maxReplicaCount: 10`: - -| `sum(...)` result | `ceil(result / 2)` | After clamp | Resulting pods | -|---|---|---|---| -| 0 | 0 | 2 | 2 (idle floor) | -| 1 | 1 | 2 | 2 | -| 4 | 2 | 2 | 2 | -| 5 | 3 | 3 | 3 | -| 11 | 6 | 6 | 6 | -| 25 | 13 | 10 | 10 (capped) | +| Queued + in-flight check runs | Resulting pods | +|---|---| +| 0 | 2 (idle floor) | +| 1 | 2 | +| 3 | 3 | +| 7 | 7 | +| 20 | 10 (capped) | ## Tuning the bounds -- **`threshold`** — lower values produce more pods per unit of work. A reasonable starting point is the per-agent `JOB_CONCURRENCY` configured on the Deployment; halve it to add headroom. -- **`minReplicaCount`** — the redundancy floor. Keep at `2` or higher so a single agent failure doesn't take the Private Location offline. See [Scaling and Redundancy](/platform/private-locations/scaling-redundancy). -- **`maxReplicaCount`** — must exceed `ceil(peak_load / threshold)`. If the cap is too low, queued check runs accumulate above it and are dropped after the 6-minute queue TTL. - -## Scale-down behavior and `cooldownPeriod` +- **`threshold`** — set it to match the agent's `JOB_CONCURRENCY`. The default `JOB_CONCURRENCY` is `1`, so leave `threshold: "1"`. A higher value packs more checks per pod and can cause scheduling delays for long-running checks. +- **`minReplicaCount`** — keep at `2` or higher so a single agent failure doesn't take the Private Location offline. See [Scaling and Redundancy](/platform/private-locations/scaling-redundancy). +- **`maxReplicaCount`** — must exceed your expected peak queued + in-flight check runs. If the cap is too low, queued check runs accumulate above it and are dropped after the 6-minute queue TTL. 
-`cooldownPeriod` governs scale-to-zero only — the wait between the trigger going inactive and KEDA scaling the workload to `0`. With `minReplicaCount: 2`, scale-to-zero never happens, so this field is dormant. - -Scale-in between `maxReplicaCount` and `minReplicaCount` is controlled by the HPA's own scale-down stabilization window (default 300 seconds). To dampen scale-in more aggressively, override the HPA behavior on the `ScaledObject`: - -```yaml - advanced: - horizontalPodAutoscalerConfig: - behavior: - scaleDown: - stabilizationWindowSeconds: 600 - policies: - - type: Pods - value: 1 - periodSeconds: 60 -``` + +If you set `minReplicaCount: 0` to scale to zero when idle, [`cooldownPeriod`](https://keda.sh/docs/latest/reference/scaledobject-spec/#cooldownperiod) becomes important — it controls how long KEDA waits after the trigger goes inactive before scaling the deployment down to zero. + ## Graceful termination @@ -124,10 +102,10 @@ spec: 2. Probe the signal directly: ``` - sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) + sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) ``` -3. Schedule a burst of checks against the Private Location and watch the replica count climb toward `maxReplicaCount`, then settle back to `minReplicaCount` after the scale-down stabilization window elapses. +3. Schedule a burst of checks against the Private Location and watch the replica count climb toward `maxReplicaCount`, then settle back to `minReplicaCount` once the burst clears. 
## See also From 5a052c8577fd2ec38ce82197deae579365f5df79 Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 07:58:58 +0100 Subject: [PATCH 4/9] feat: add termination period recommendations --- platform/private-locations/autoscaling.mdx | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx index 64615c6b..5773ff2d 100644 --- a/platform/private-locations/autoscaling.mdx +++ b/platform/private-locations/autoscaling.mdx @@ -37,8 +37,11 @@ spec: scaleTargetRef: namespace: name: - minReplicaCount: 2 - maxReplicaCount: 10 + minReplicaCount: 2 # Optional. Default: 0. + maxReplicaCount: 10 # Optional. Default: 100. + pollingInterval: 30 # Optional. Default: 30 (seconds). + cooldownPeriod: 300 # Optional. Default: 300. Scale-to-zero only; see below. + maxReplicaCount: 10 triggers: - type: prometheus metadata: @@ -88,9 +91,18 @@ In-flight checks on a terminating pod are rerun on another agent after a 300-sec spec: template: spec: - terminationGracePeriodSeconds: 330 + terminationGracePeriodSeconds: 330 # Aligns with the longest-running checktype. ``` +Maximum runtime by check type: + +| Check type | Maximum runtime | +|------------------------|-----------------| +| API, TCP, DNS, ICMP | 30 seconds | +| Browser | 4 minutes | +| Multistep | 4 minutes | +| Playwright Check Suite | 60 minutes | + ## Verify 1. 
Confirm KEDA created the HPA and is reading the metric: From e5f4fd3d1d674a0bddc12efcc5d4c7601b7ba812 Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 08:01:52 +0100 Subject: [PATCH 5/9] feat: fix indentation + add advanced autoscaling config --- platform/private-locations/autoscaling.mdx | 28 +++++++++++++++++----- 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx index 5773ff2d..4a4bbb09 100644 --- a/platform/private-locations/autoscaling.mdx +++ b/platform/private-locations/autoscaling.mdx @@ -37,11 +37,27 @@ spec: scaleTargetRef: namespace: name: - minReplicaCount: 2 # Optional. Default: 0. - maxReplicaCount: 10 # Optional. Default: 100. - pollingInterval: 30 # Optional. Default: 30 (seconds). - cooldownPeriod: 300 # Optional. Default: 300. Scale-to-zero only; see below. - maxReplicaCount: 10 + minReplicaCount: 2 # Optional. Default: 0. + maxReplicaCount: 10 # Optional. Default: 100. + pollingInterval: 30 # Optional. Default: 30 (seconds). + cooldownPeriod: 300 # Optional. Default: 300. Scale-to-zero only; see below. + advanced: + horizontalPodAutoscalerConfig: + behavior: + scaleUp: + stabilizationWindowSeconds: 0 + selectPolicy: Max + policies: + - type: Pods + value: 1 + periodSeconds: 60 + scaleDown: + stabilizationWindowSeconds: 300 # 5m dampening + selectPolicy: Min + policies: + - type: Pods + value: 1 + periodSeconds: 60 triggers: - type: prometheus metadata: @@ -91,7 +107,7 @@ In-flight checks on a terminating pod are rerun on another agent after a 300-sec spec: template: spec: - terminationGracePeriodSeconds: 330 # Aligns with the longest-running checktype. + terminationGracePeriodSeconds: 330 # Set to your longest-running check type; up to 1800 for Playwright Check Suites. 
``` Maximum runtime by check type: From 60673698019f31b346a67b9ac52574502444dd06 Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 08:02:24 +0100 Subject: [PATCH 6/9] feat: update stabilisation window --- platform/private-locations/autoscaling.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx index 4a4bbb09..9a64ce77 100644 --- a/platform/private-locations/autoscaling.mdx +++ b/platform/private-locations/autoscaling.mdx @@ -45,7 +45,7 @@ spec: horizontalPodAutoscalerConfig: behavior: scaleUp: - stabilizationWindowSeconds: 0 + stabilizationWindowSeconds: 60 selectPolicy: Max policies: - type: Pods From 04b3bf9121115c01275a9067e154c40d338d629e Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 08:10:09 +0100 Subject: [PATCH 7/9] feat: fixes --- platform/private-locations/autoscaling.mdx | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx index 9a64ce77..3dd7d772 100644 --- a/platform/private-locations/autoscaling.mdx +++ b/platform/private-locations/autoscaling.mdx @@ -37,22 +37,17 @@ spec: scaleTargetRef: namespace: name: - minReplicaCount: 2 # Optional. Default: 0. - maxReplicaCount: 10 # Optional. Default: 100. - pollingInterval: 30 # Optional. Default: 30 (seconds). - cooldownPeriod: 300 # Optional. Default: 300. Scale-to-zero only; see below. 
+ minReplicaCount: 2 + maxReplicaCount: 10 advanced: horizontalPodAutoscalerConfig: behavior: scaleUp: - stabilizationWindowSeconds: 60 - selectPolicy: Max policies: - type: Pods value: 1 periodSeconds: 60 scaleDown: - stabilizationWindowSeconds: 300 # 5m dampening selectPolicy: Min policies: - type: Pods @@ -63,7 +58,7 @@ spec: metadata: serverAddress: http://prometheus-k8s.monitoring.svc.cluster.local:9090 metricName: checkly_private_location_check_runs - threshold: "1" # Match this to the agent's JOB_CONCURRENCY. + threshold: "1" # Match the agent's JOB_CONCURRENCY. query: sum(checkly_private_location_check_runs{state=~"queued|inflight", private_location_slug_name=""}) ``` From 79b5618ea19cd468eed3384adb6170e1eec6d47f Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 08:13:08 +0100 Subject: [PATCH 8/9] feat: added explicit explanation as to why short-running checks are excluded --- platform/private-locations/autoscaling.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx index 3dd7d772..0cf5f991 100644 --- a/platform/private-locations/autoscaling.mdx +++ b/platform/private-locations/autoscaling.mdx @@ -23,7 +23,7 @@ The relevant `state` values are: - `inflight` — the check run is currently being executed by an agent. -Short-running checks may be excluded from the gauge because their impact on Private Location capacity is negligible. +The gauge is aggregated on a ~1 minute interval, so checks that start and finish within that window may be excluded — their impact on Private Location capacity is negligible. 
## KEDA `ScaledObject` From 95683885e8ad3e7a4dc625ba89dfd79844b5c92f Mon Sep 17 00:00:00 2001 From: ejanusevicius Date: Fri, 29 May 2026 08:25:57 +0100 Subject: [PATCH 9/9] chore: comment around sane defaults --- platform/private-locations/autoscaling.mdx | 2 ++ 1 file changed, 2 insertions(+) diff --git a/platform/private-locations/autoscaling.mdx b/platform/private-locations/autoscaling.mdx index 0cf5f991..02ddd3cc 100644 --- a/platform/private-locations/autoscaling.mdx +++ b/platform/private-locations/autoscaling.mdx @@ -28,6 +28,8 @@ The gauge is aggregated on a ~1 minute interval, so checks that start and finish ## KEDA `ScaledObject` +The `ScaledObject` below provides sensible defaults — adjust the bounds and scaling behavior to match your check workload. + ```yaml apiVersion: keda.sh/v1alpha1 kind: ScaledObject