From 1bc5b63cd64a10a17ae536bc43560dac56f3eca3 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Fri, 12 Dec 2025 09:19:31 +0100
Subject: [PATCH 1/2] docs: Add usage guide on how to submit some tasks locally instead of k8s

---
 .../using-kubernetes-executors.adoc | 46 +++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc b/docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc
index 559bf594..e0514a99 100644
--- a/docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc
+++ b/docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc
@@ -31,6 +31,52 @@ spec:
 # ...
 ----
 
+== Task startup latency
+
+While it has many benefits to spawn a dedicated Pod for every task, this introduces some latency.
+The shorter the actual task runs, the higher the effect of this latency get's on the overall DAG runtime.
+
+If your tasks don't do computational expensive things (e.g. only submit a query to a Trino cluster), you can schedule them to run locally (on the scheduler) and not spawn a Pod to reduce the DAG runtime.
+
+To achieve this enable the `LocalExecutor` in your Airflow stacklet with
+
+[source,yaml]
+----
+spec:
+  webservers:
+    envOverrides: &envOverrides
+      # We default our tasks to KubernetesExecutor, however, tasks can opt in to using the LocalExecutor
+      # See https://docs.stackable.tech/home/stable/airflow/usage-guide/using-kubernetes-executors/
+      AIRFLOW__CORE__EXECUTOR: KubernetesExecutor,LocalExecutor
+  schedulers:
+    envOverrides: *envOverrides
+  kubernetesExecutors:
+    envOverrides: *envOverrides
+----
+
+Afterwards tasks can opt-in to the `LocalExecutor` using
+
+[source,python]
+----
+@task(executor="LocalExecutor")
+def hello_world():
+    print("hello world!")
+----
+
+As an alternative if *all* tasks of your DAG should run locally, you can also configure this on a DAG level (tasks can still explicitly use `KubernetesExecutor`):
+
+[source,python]
+----
+with DAG(
+    dag_id="hello_worlds",
+    default_args={"executor": "LocalExecutor"}, # Applies to all tasks in the Dag
+) as dag:
+----
+
+See the https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html#using-multiple-executors-concurrently[official Airflow documentation] for details.
+
+TIP: You might need to increase the scheduler resources, as it now runs more stuff.
+
 == Logging
 
 Kubernetes Executors and their respective Pods only live as long as the task they are executing.
From 48566acc208192a8e92dec78bb4360573653ccfe Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Fri, 12 Dec 2025 13:39:10 +0100
Subject: [PATCH 2/2] Apply suggestions from code review

Co-authored-by: Nick <10092581+NickLarsenNZ@users.noreply.github.com>
---
 .../airflow/pages/usage-guide/using-kubernetes-executors.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc b/docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc
index e0514a99..cdfa0ae5 100644
--- a/docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc
+++ b/docs/modules/airflow/pages/usage-guide/using-kubernetes-executors.adoc
@@ -33,10 +33,10 @@ spec:
 
 == Task startup latency
 
-While it has many benefits to spawn a dedicated Pod for every task, this introduces some latency.
+While there are many benefits to spawning a dedicated Pod for every task, this introduces some latency.
 The shorter the actual task runs, the higher the effect of this latency get's on the overall DAG runtime.
 
-If your tasks don't do computational expensive things (e.g. only submit a query to a Trino cluster), you can schedule them to run locally (on the scheduler) and not spawn a Pod to reduce the DAG runtime.
+If your tasks don't do computationally expensive things (e.g. only submit a query to a Trino cluster), you can schedule them to run locally (on the scheduler) and not spawn a Pod to reduce the DAG runtime.
 
 To achieve this enable the `LocalExecutor` in your Airflow stacklet with