[HWORKS-2810] Add Airflow 3.0.6 docs (upgrade + security + operator notes) #588
Merged
8 commits, all by jimdowling:

- `d81ac2a` [HWORKS-2810] Add Airflow 3.0.6 user + admin docs
- `34b8280` [HWORKS-2810] Note legacy-embed compatibility for existing DAGs
- `6856b44` [HWORKS-2810] Doc CI: restore stripped imports with noqa, normalize f…
- `c16ca29` [HWORKS-2810] Doc: sharpen the Variable-fallback isolation caveat
- `7c5a0c6` [HWORKS-2810] Address Copilot review on logicalclocks.github.io PR #588
- `bad7dfa` [HWORKS-2810] airflow.md: PEP-8 blank lines in code blocks
- `6ba459c` Merge remote-tracking branch 'upstream/main' into airflow3-upgrade
- `da61070` Merge remote-tracking branch 'upstream/main' into airflow3-upgrade

@@ -0,0 +1 @@
.claude/CLAUDE.md

@@ -0,0 +1,81 @@

# Operator Notes: Airflow 3 on Hopsworks

Administrative reference for cluster operators upgrading or installing Hopsworks with Airflow 3.

## What the chart deploys

The Airflow subchart now creates four Kubernetes objects in addition to what the v1 chart deployed:

1. `dag-processor` Deployment: runs `airflow dag-processor` and parses the DAGs listed in the manifest.
   It carries only validator (public) keys, no private keys.
2. `keys-bootstrap` pre-install Job: generates two RSA 4096 keypairs (api-server and scheduler) and writes them to the `hopsworks-airflow-keys` Secret.
   Idempotent; re-runs are no-ops.
3. `db-reset` pre-install Job: drops and recreates the Airflow metadata database before migration.
   Install-only: gated by `hopsworkslib.isInstall`, so it never re-fires on `helm upgrade` or an ArgoCD resync.
   Set `global._hopsworks.mode=install` on the first install only.
4. `hopsworks-airflow-keys` Secret: four PEM files (two private, two public).
   Each pod projects only the keys it needs.

The existing `webserver` (now the Airflow `api-server`) and `scheduler` Deployments keep their resource names; only the container command and environment changed.

## Resource matrix

| Component | Image | Command | Private keys mounted |
| --- | --- | --- | --- |
| `airflow-webserver` | `apache/airflow:3.0.6-python3.12` + Hopsworks layers | `airflow api-server --proxy-headers` | api-server-private |
| `airflow-scheduler` | same | `airflow scheduler` | scheduler-private |
| `airflow-dag-processor` | same | `airflow dag-processor` | none (validator-only) |

All three pods render `/opt/airflow/airflow.cfg` from `airflow.cfg.template` at container start; the runtime MySQL password is substituted into the template before the main process execs.
Liveness probes use a TCP check on the scheduler's serve_logs port (8793) and a read of `/proc/1/comm` for the dag-processor, because `pgrep` is unreliable under kernel hardening such as `kernel.yama.ptrace_scope >= 1` or `hidepid`.

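The two liveness checks can be sketched in Python as below. This is a minimal illustration rather than the chart's actual probe definitions: the helper names, the 2-second timeout, and the expected `comm` value `airflow` are assumptions.

```python
import socket
from pathlib import Path


def tcp_port_open(host: str = "127.0.0.1", port: int = 8793, timeout: float = 2.0) -> bool:
    """Scheduler check: pass if something accepts a TCP connection on the serve_logs port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def pid1_comm_matches(expected: str = "airflow", comm_path: str = "/proc/1/comm") -> bool:
    """dag-processor check: compare PID 1's command name instead of calling pgrep.

    The kernel truncates comm to 15 characters, so compare against the
    truncated expected name. The default "airflow" is an assumption.
    """
    try:
        comm = Path(comm_path).read_text().strip()
    except OSError:
        return False
    return comm == expected[:15]
```

A probe would call `tcp_port_open()` inside the scheduler container and `pid1_comm_matches()` inside the dag-processor container, treating `False` as a failure.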
## Key rotation

Re-run the `keys-bootstrap` Job with the `--rotate` flag (or delete the `hopsworks-airflow-keys` Secret and run `helm upgrade`), then do a rolling restart in this order:

```text
api-server → scheduler → dag-processor
```

Outstanding user cookies and outstanding Execution-API task tokens are invalidated; the proxy re-mints a token on 401.
Plan for about 30 seconds of UI unavailability during the api-server restart.

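A minimal sketch of the restart step, assuming the Deployments are named after the components in the resource matrix (`airflow-webserver` runs the api-server) and that you pass the right namespace; both are assumptions, so check `kubectl get deployments` first. Waiting on each rollout before starting the next one preserves the required order.

```python
import subprocess
from typing import List

# api-server -> scheduler -> dag-processor, as required after key rotation.
# Deployment names are assumed to match the resource-matrix component names.
RESTART_ORDER = ["airflow-webserver", "airflow-scheduler", "airflow-dag-processor"]


def rollout_commands(namespace: str) -> List[List[str]]:
    """Restart each Deployment, then wait for its rollout before moving on."""
    cmds: List[List[str]] = []
    for name in RESTART_ORDER:
        target = f"deployment/{name}"
        cmds.append(["kubectl", "-n", namespace, "rollout", "restart", target])
        cmds.append(["kubectl", "-n", namespace, "rollout", "status", target, "--timeout=300s"])
    return cmds


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it unless dry_run is set, stopping on the first failure."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

`run(rollout_commands("<namespace>"))` only prints the commands; pass `dry_run=False` to execute them.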
## Metadata DB on upgrade

The v3 chart drops and recreates the Airflow metadata database **on install only**, gated by `hopsworkslib.isInstall` (set `global._hopsworks.mode=install` on the first install).
There is no in-place 1.x → 3.x schema migration path; DAG-run history, ad-hoc Variables, ad-hoc Connections, and audit records from a 1.x deployment are not preserved by the cutover.
Schema changes after install go through the existing `migrate` job's Alembic migrations.

Customer DAG files in HopsFS at `Projects/<P>/Airflow/` are untouched.
Snapshot HopsFS before the upgrade if you need a rollback path that preserves DAG sources.

## Reverse proxy contract

`AirflowProxyServlet` in hopsworks-ee validates the Hopsworks JWT and forwards `Set-Cookie` from Airflow unchanged.
The proxy does not rewrite the cookie `Path=`; Airflow sets the cookie path from `[api] base_url` automatically.

Membership is **not** refreshed on every forwarded request.
The Airflow JWT carries `project_ids` / `project_roles` / `is_admin` at mint time and is stable for the cookie's TTL (1 hour by default).
Real-time membership changes are propagated by the Hopsworks backend pushing to `POST /auth/internal/invalidate`, which evicts the affected user's cached entry so the next login re-fetches the membership.
A 60-second safety-net TTL on the cache catches drift even without an explicit invalidation.
See [Airflow Security Model](../../user_guides/projects/airflow/security_model.md#token--cookie-behavior) for the full description.

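The caching behavior described above (explicit eviction plus a 60-second safety-net TTL) can be sketched as follows. The class and method names are hypothetical, not the auth manager's code, and `project_roles` is omitted for brevity. Evicting an entry only affects the next token mint; an Airflow JWT that was already issued keeps its claims until it expires.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Membership:
    project_ids: FrozenSet[int]
    is_admin: bool


class MembershipCache:
    """Per-user membership cache: explicit eviction plus a safety-net TTL (hypothetical sketch)."""

    def __init__(
        self,
        fetch: Callable[[str], Membership],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # stand-in for the call to the Hopsworks backend
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Membership]] = {}

    def get(self, user: str) -> Membership:
        """Return the cached membership; re-fetch on a miss or once the entry is older than the TTL."""
        now = self._clock()
        cached = self._entries.get(user)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        fresh = self._fetch(user)
        self._entries[user] = (now, fresh)
        return fresh

    def invalidate(self, user: str) -> None:
        """Drop one user's entry, as a push to POST /auth/internal/invalidate would."""
        self._entries.pop(user, None)
```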
## Metrics

The legacy `airflow-exporter` 1.3.0 does not support Airflow 3.
Metrics are now emitted via Airflow-native StatsD into a sidecar `statsd_exporter` Pod, which Prometheus scrapes on its `/metrics` endpoint.
The legacy `AllowMetricsSecurityManager` has been removed.

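For reference, StatsD metrics travel as plain-text UDP datagrams of the form `<name>:<value>|<type>`. The sketch below formats and sends one such line; the `airflow` prefix, the exporter address `127.0.0.1:9125`, and the helper names are assumptions for illustration. Airflow's own StatsD client does this for you once `[metrics] statsd_on` is enabled.

```python
import socket


def statsd_line(name: str, value: float, metric_type: str = "c", prefix: str = "airflow") -> bytes:
    """Format one metric in the StatsD line protocol: <prefix>.<name>:<value>|<type>."""
    if metric_type not in {"c", "g", "ms"}:  # counter, gauge, timer
        raise ValueError(f"unsupported StatsD type: {metric_type!r}")
    return f"{prefix}.{name}:{value}|{metric_type}".encode("ascii")


def send(line: bytes, host: str = "127.0.0.1", port: int = 9125) -> None:
    """Send one fire-and-forget UDP datagram, as StatsD clients do."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line, (host, port))
```

Because UDP is fire-and-forget, a missing exporter does not block Airflow; it only means the metric is dropped.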
## See also

- User-facing release notes: `user_guides/projects/airflow/airflow3_upgrade.md`
- Security model: `user_guides/projects/airflow/security_model.md`

@@ -0,0 +1,137 @@

# Airflow 3 in Hopsworks

Hopsworks now ships Apache Airflow 3.0.6 as its workflow engine.
Airflow 3 is a major release with breaking changes to the DAG authoring API; the old 1.10-era DAGs do not run on it.
This page covers what changed, what you need to do to your DAGs, and what the new per-project security model guarantees.

## Per-project DAG isolation

A non-admin Hopsworks user can only see, trigger, edit, pause, clear, or read logs of DAGs that belong to a Hopsworks project they are a member of.
The boundary is enforced on every authenticated request to the Airflow API server, every navigation in the Airflow UI, and every CLI call.

The **Audit Log** is visible to every authenticated user, but its rows are filtered server-side: non-admin users see only events whose `dag_id` belongs to one of their projects.
The Hopsworks platform admin (`HOPS_ADMIN`) sees the unfiltered log.

What this **does not** isolate in this release:

- **Execution-time data access.** DAG tasks run in one shared scheduler process (LocalExecutor).
  A task can in principle read any Airflow Variable, Connection, or XCom row, regardless of project.
  Treat Airflow Variables, Connections, and Pools as cluster-wide.
- **DAG parsing.** DAGs from all projects are parsed in one shared process.
  Module-level code in a DAG file runs with the dag-processor's privileges; treat it as cluster-wide too.

These gaps are tracked for a future release that switches to KubernetesExecutor plus per-team dag-processors.
Until then, do not put project-private secrets in Airflow Variables or Connections.
The one exception is the per-DAG Hopsworks API key (see [API key for operators](#api-key-for-operators-no-embed)), which the platform itself writes rather than users; it is still subject to the execution-time caveat above.

## What changed in the DAG API

You **must rewrite** your existing 1.10 DAGs for Airflow 3.
No automated rewrite tool ships with this release.
Concrete things to change:

| Old (Airflow 1.10) | New (Airflow 3.0.6) |
| --- | --- |
| `schedule_interval='@daily'` | `schedule='@daily'` |
| `provide_context=True` | implicit; remove the argument |
| `execution_date` in a template | `logical_date` |
| `from airflow.operators.python_operator import ...` | `from airflow.operators.python import ...` |
| `@apply_defaults` on custom operators | removed; declare `__init__` params normally |
| `SubDagOperator` | TaskGroups + Assets |
| `from airflow.models import BaseOperator` | `from airflow.sdk.bases.operator import BaseOperator` |
| Custom Hopsworks operators imported via plugins | Provider package `apache-airflow-providers-hopsworks` |
| Default `catchup_by_default = True` | Default `catchup=False`; set explicitly |

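Because no rewrite tool ships, a rough text scan can at least point at the lines to change. The sketch below is hypothetical and matches text only: it misses aliased imports and indirect references, and it flags a file that never mentions `catchup` at all.

```python
import re
from typing import List, Tuple

# (pattern, hint) pairs taken from the Airflow 1.10 -> 3.0.6 table above.
LEGACY_PATTERNS: List[Tuple[str, str]] = [
    (r"\bschedule_interval\s*=", "use schedule="),
    (r"\bprovide_context\s*=", "remove provide_context; context is implicit"),
    (r"\bexecution_date\b", "use logical_date"),
    (r"\bairflow\.operators\.python_operator\b", "import from airflow.operators.python"),
    (r"@apply_defaults\b", "remove @apply_defaults"),
    (r"\bSubDagOperator\b", "replace with TaskGroups + Assets"),
    (r"from\s+airflow\.models\s+import\s+BaseOperator\b", "import from airflow.sdk.bases.operator"),
]


def find_legacy_usages(source: str) -> List[Tuple[int, str]]:
    """Return (line number, hint) pairs; line 0 marks a file-level finding."""
    hits: List[Tuple[int, str]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, hint in LEGACY_PATTERNS:
            if re.search(pattern, line):
                hits.append((lineno, hint))
    if "catchup" not in source:
        hits.append((0, "set catchup explicitly"))
    return hits
```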
The Hopsworks-provided operators are now exposed via a standard provider:

```python
from hopsworks.airflow.operators import HopsworksLaunchOperator  # noqa: F401
from hopsworks.airflow.sensors import (  # noqa: F401
    HopsworksHdfsSensor,
    HopsworksJobSuccessSensor,
)
```

`HopsworksHdfsSensor` replaces the legacy plugin of the same name from the 1.x shim.
It polls `/hopsworks-api/api/project/<id>/dataset/<path>?action=stat` and accepts either `project_id` or `project_name`.

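What a single poke of such a sensor does can be sketched as below. This is not the provider's implementation: the helper names are hypothetical, only the `project_id` form is shown, and treating HTTP 200 as "path exists" and 404 as "not yet" is an assumption. The HTTP call is injected so the logic can be exercised without a cluster.

```python
from typing import Callable
from urllib.parse import quote


def stat_url(base_url: str, project_id: int, path: str) -> str:
    """Build the dataset stat URL; the dataset path is percent-encoded with '/' kept."""
    encoded = quote(path.lstrip("/"), safe="/")
    return f"{base_url.rstrip('/')}/hopsworks-api/api/project/{project_id}/dataset/{encoded}?action=stat"


def poke(get_status: Callable[[str], int], base_url: str, project_id: int, path: str) -> bool:
    """One sensor poke: True once the path exists, False while it does not."""
    status = get_status(stat_url(base_url, project_id, path))
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected HTTP status {status}")
```

In a DAG, use `HopsworksHdfsSensor` itself; the sketch only shows the request being polled.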
## API key for operators (no embed)

Hopsworks operators and sensors authenticate via the `HopsworksHook`, which resolves a credential in this order:

1. **Task token exchange.** The scheduler signs a per-task RS256 token; the hook POSTs it to `/api/auth/airflow-task-exchange/exchange` on Hopsworks and gets a project-scoped JWT back.
2. **Per-DAG Airflow Variable.** On every DAG compose, Hopsworks writes a per-DAG API key into an Airflow Variable named `hopsworks_api_key_<sha256(dag_id)[:16]>` (Fernet-encrypted at rest).
   The hook reads it at task runtime via `Variable.get(...)` and uses it as `Authorization: ApiKey <key>`.
3. **Airflow Connection** `hopsworks_default`: `conn.password` is read as a literal API key.
   Useful for out-of-cluster operators.
4. **`HOPSWORKS_API_KEY` env var**: manual override for power users.

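The lookup order can be sketched as a chain of fallbacks. The helper names and injected lookups are hypothetical stand-ins for the hook's internals; the Variable name follows the scheme above, assuming `sha256(dag_id)` means the lowercase hex digest of the UTF-8 `dag_id`. The `Bearer` header for tier 1 and the `ApiKey` header for tiers 3 and 4 are also assumptions.

```python
import hashlib
import os
from typing import Callable, Optional


def api_key_variable_name(dag_id: str) -> str:
    """hopsworks_api_key_<first 16 hex chars of sha256(dag_id)>."""
    return "hopsworks_api_key_" + hashlib.sha256(dag_id.encode("utf-8")).hexdigest()[:16]


def resolve_authorization(
    dag_id: str,
    exchange_task_token: Callable[[], Optional[str]],
    get_variable: Callable[[str], Optional[str]],
    get_connection_password: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return an Authorization header value from the first tier that yields a credential."""
    jwt = exchange_task_token()  # 1. per-task token exchanged for a project-scoped JWT
    if jwt:
        return f"Bearer {jwt}"
    key = get_variable(api_key_variable_name(dag_id))  # 2. per-DAG Airflow Variable
    if key:
        return f"ApiKey {key}"
    key = get_connection_password("hopsworks_default")  # 3. Airflow Connection
    if key:
        return f"ApiKey {key}"
    key = os.environ.get("HOPSWORKS_API_KEY")  # 4. environment override
    if key:
        return f"ApiKey {key}"
    return None
```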
The generated DAG file **never carries the API key**: the secret lives only in the Airflow Variables table (admin-only via `HopsworksAuthManager`), so the DAG `.py` is safe to inspect, version-control, or share.
Regenerate the DAG from the Hopsworks UI to rotate the key.

DAG files composed before this change still embed `os.environ.setdefault("HOPSWORKS_API_KEY", "<key>")` near the top of the file.
They continue to work because the env-var path is the fourth tier in the hook's fallback, but the secret is in the file.
Regenerate the DAG from the Hopsworks UI to drop the embed and switch to the Variable-fetch path.

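To find DAG files that still carry the legacy embed, you can scan a local copy of your project's `Airflow/` folder. A hypothetical sketch: how you copy the files out of HopsFS is up to you, and the pattern only matches the `setdefault` form shown above.

```python
import re
from pathlib import Path
from typing import List

# The legacy embed: os.environ.setdefault("HOPSWORKS_API_KEY", ...) with either quote style.
LEGACY_EMBED = re.compile(r"""os\.environ\.setdefault\(\s*["']HOPSWORKS_API_KEY["']""")


def dags_with_embedded_key(dag_dir: str) -> List[Path]:
    """Return the .py files under dag_dir that still embed the API key."""
    return sorted(
        path
        for path in Path(dag_dir).rglob("*.py")
        if LEGACY_EMBED.search(path.read_text(encoding="utf-8", errors="replace"))
    )
```

Each file returned is a candidate for regeneration from the Hopsworks UI.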
## DAG identity

The Airflow `dag_id` for a Hopsworks-composed DAG is now:

```text
p_<project_slug>_<project_id>__<dag_user_name>
```

For example, project `acme` (id `42`) with a DAG named `daily_ingest` becomes `p_acme_42__daily_ingest` in the Airflow UI.
The Hopsworks UI hides the prefix when displaying DAG names.

If you reference your DAGs by `dag_id` from external code (XCom pulls across DAGs, `TriggerDagRunOperator`, REST API integrations), update those references to the new format.

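External code that builds or splits these ids can use helpers like the hypothetical ones below. Parsing assumes the project slug itself never contains `_<digits>__`; how Hopsworks derives the slug from the project name is not covered here, so pass the slug exactly as it appears in the Airflow UI.

```python
import re
from typing import NamedTuple

# Non-greedy slug: stop at the first "_<digits>__" so DAG names may contain "__".
DAG_ID_RE = re.compile(r"^p_(?P<slug>.+?)_(?P<project_id>\d+)__(?P<name>.+)$")


class DagIdentity(NamedTuple):
    project_slug: str
    project_id: int
    dag_name: str


def to_dag_id(project_slug: str, project_id: int, dag_name: str) -> str:
    """Compose p_<project_slug>_<project_id>__<dag_user_name>."""
    return f"p_{project_slug}_{project_id}__{dag_name}"


def parse_dag_id(dag_id: str) -> DagIdentity:
    """Split a Hopsworks-composed dag_id back into its parts."""
    m = DAG_ID_RE.match(dag_id)
    if m is None:
        raise ValueError(f"not a Hopsworks-composed dag_id: {dag_id!r}")
    return DagIdentity(m["slug"], int(m["project_id"]), m["name"])
```

For example, a `TriggerDagRunOperator` in another DAG would target `to_dag_id("acme", 42, "daily_ingest")` rather than the bare `daily_ingest`.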
## REST authentication

The Airflow API server is reached through the standard Hopsworks reverse proxy at `https://<hopsworks>/hopsworks-api/airflow/`.
Browsers carry an HttpOnly `_token` cookie set by the auth manager; external clients use bearer tokens.

To obtain a bearer token from a Hopsworks JWT:

```bash
curl -X POST "https://<hopsworks>/hopsworks-api/airflow/auth/token" \
  -H "Content-Type: application/json" \
  -d '{"hopsworks_jwt": "<your-hopsworks-jwt>"}'
```

Response:

```json
{"access_token": "<airflow-jwt>", "token_type": "Bearer", "expires_in": 3600}
```

Send the returned `access_token` as `Authorization: Bearer <access_token>` on all `/api/v2/*` calls.

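The same exchange can be done from Python with only the standard library. A minimal sketch: the helper names are hypothetical, and the `/hopsworks-api/airflow/api/v2/...` path assumes the proxy prefix above is simply prepended to Airflow's `/api/v2/` routes.

```python
import json
import urllib.request
from typing import Any, Dict


def token_request(hopsworks: str, hopsworks_jwt: str) -> urllib.request.Request:
    """POST /hopsworks-api/airflow/auth/token, mirroring the curl call above."""
    return urllib.request.Request(
        f"https://{hopsworks}/hopsworks-api/airflow/auth/token",
        data=json.dumps({"hopsworks_jwt": hopsworks_jwt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def api_request(hopsworks: str, access_token: str, path: str = "dags") -> urllib.request.Request:
    """GET an Airflow /api/v2 endpoint through the proxy using the bearer token."""
    return urllib.request.Request(
        f"https://{hopsworks}/hopsworks-api/airflow/api/v2/{path}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def call(req: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For example, `call(token_request(host, jwt))["access_token"]` yields the bearer token that `api_request(host, token, "dags")` then uses; request a new token once `expires_in` (3600 seconds in the sample response) has elapsed.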
## Recent runs in the Hopsworks UI

The Hopsworks Airflow page lists each DAG with a **Last runs** column that renders the most recent runs as colored squares (green = success, red = failed, blue = running, yellow = queued / scheduled, gray = other).
Hover over any square to see the run id, state, and start time.
The data comes from a project-scoped Hopsworks endpoint that proxies to the auth manager and reads the same `dag_run` table the Airflow UI does, so the two views stay consistent.

Clicking anywhere on a DAG row opens the DAG in the Airflow UI in a new tab.
The pencil column at the end of the row opens the generated Python file in a Hopsworks editor without leaving Hopsworks.

## Metadata DB on upgrade

Upgrading from Airflow 1.10 to Airflow 3 drops and recreates the Airflow metadata DB.
Historical DAG-run records, task logs stored in the DB, and any ad-hoc Variables / Connections you had configured are not preserved.
Snapshot HopsFS `Projects/<P>/Airflow/` before upgrading so you can roll back DAG files; the DB itself is not recoverable through the chart.

## Task logs across pod restarts

Airflow 3 with the LocalExecutor writes task logs to the scheduler pod's local filesystem and serves them on port 8793 to the api-server pod.
The scheduler records the source endpoint (host + port) on the task instance row.
Hopsworks configures the scheduler with `[core] hostname_callable = airflow.utils.net.get_host_ip_address` so that this endpoint is the pod IP, which is routable across pods, rather than the pod's DNS hostname, which sibling pods cannot resolve.

Logs from runs that started before the scheduler pod was last restarted are unrecoverable, because the pod's filesystem is ephemeral.
Re-trigger the DAG to regenerate task logs if you need them.
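
If you reproduce this setup outside the Hopsworks chart, the same option can also be supplied through Airflow's standard `AIRFLOW__<SECTION>__<KEY>` environment-variable override. A small sketch; the helper name is hypothetical, and whether Hopsworks sets this option in the rendered `airflow.cfg` or through the environment is not stated on this page.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Name of the environment variable that overrides [section] key in airflow.cfg."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# The scheduler setting from this section, as a container environment entry.
SCHEDULER_ENV = {
    airflow_env_var("core", "hostname_callable"): "airflow.utils.net.get_host_ip_address",
}
```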