[HWORKS-2810] [Append] Update airflow docs #592
Merged
Three new pages and one rewrite, covering the v3 upgrade landing on the `airflow3-upgrade` branch across hopsworks-front, hopsworks-ee, docker-images, and hopsworks-helm.

- `user_guides/projects/airflow/airflow3_upgrade.md`: porting checklist from Airflow 1.10 → 3.0.6 (schedule_interval → schedule, the FAB-era operator imports → standard provider package, `dag_id` naming scheme), the new Hopsworks operators / sensors shipped in `apache-airflow-providers-hopsworks`, the per-DAG API key written into an Airflow Variable (so the DAG file never carries the secret), the Hopsworks UI's Last-runs column and row-click behavior, the metadata-DB-on-upgrade caveat, and the task-logs-across-pod-restarts caveat (the `hostname_callable = airflow.utils.net.get_host_ip_address` scheduler config is what makes serve_logs reachable cross-pod).
- `user_guides/projects/airflow/security_model.md`: surface-by-surface authorization matrix. Notes that Audit Log is intentionally visible to all users but row-filtered server-side to `dag_id IN (user's project dag_ids)` via the auth-manager's `do_orm_execute` hook, that Airflow 3.0.x does not expose a synchronous refresh_user hook so membership is stable for the cookie TTL, and that the per-DAG API key Variable is the one platform-managed exception to the "don't put project-private secrets in Airflow Variables" rule.
- `setup_installation/admin/airflow3.md`: operator notes. Pins to `apache/airflow:3.0.6-python3.12`. Calls out that `db-reset` is install-only (`hopsworkslib.isInstall` gate, set `global._hopsworks.mode=install` on first install). Documents the liveness-probe choice (TCP on 8793 for scheduler, `/proc/1/comm` read for dag-processor) because `pgrep` is unreliable under `yama.ptrace_scope >= 1` / `hidepid` hardening.
- `user_guides/projects/airflow/airflow.md`: rewrite the operator invocation examples for the v3 import paths, drop the obsolete `access_control` parameter (the new auth manager doesn't honor it; authz comes from the dag_id-to-project sidecar table), and link forward to the two new pages plus the upgrade page.

Nav entries added under `User Guides → Projects → Airflow` and under `Setup → Administration`.

Reviewed-by: OpenAI Codex (GPT-5 via codex-plugin-cc 0.1.0) <codex@openai.com>
Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
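The dag-processor liveness check mentioned in this commit (reading `/proc/1/comm` instead of calling `pgrep`) could look roughly like the sketch below. It is an illustration under assumptions, not the probe shipped in the chart: the function name `pid1_comm_matches` is hypothetical, and the process name to compare against depends on the image, so it is left as a parameter.

```python
from pathlib import Path


def pid1_comm_matches(expected: str, comm_path: str = "/proc/1/comm") -> bool:
    """Return True when the command name recorded for PID 1 equals `expected`.

    Reading the comm file of the container's own PID 1 does not require
    scanning other processes, which is what makes it usable where pgrep
    is restricted by ptrace or hidepid hardening.
    """
    try:
        # The kernel writes the command name followed by a newline.
        comm = Path(comm_path).read_text().strip()
    except OSError:
        # A missing or unreadable file counts as a failed probe.
        return False
    return comm == expected
```

A probe command would call this with the expected name and exit non-zero on `False`; the `comm_path` parameter exists only so the function can be exercised outside a container.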
Docs were silent on the migration path for DAGs composed before the no-embed change. The `HOPSWORKS_API_KEY` env-var line still works as the fourth tier of the hook's fallback, so existing DAG files keep running; users who want the key out of the file regenerate the DAG.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ences

`md-snakeoil` runs in CI and strips F401 unused imports inside Python code fences. Two doc blocks were demonstration imports of the new Hopsworks providers; restore them with `# noqa: F401` so snakeoil keeps them, and pre-apply the trailing-blank-line trim the CI runs after snakeoil so `git diff --exit-code` passes on the next run.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR review pointed out that the per-DAG `hopsworks_api_key_<sha256(dag_id)[:16]>` Airflow Variable is not actually a per-DAG security boundary at task runtime: `dag_id` is non-secret and the hash is reproducible, so any running DAG can `Variable.get` another DAG's hashed name. Make this explicit in the security model so users understand the fallback is a shared credential surface within the cluster, not an isolated per-DAG vault. Point at task-token-exchange as the path that actually stays per-task.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
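To make the reproducibility point concrete, here is a minimal sketch of how a name of the form `hopsworks_api_key_<sha256(dag_id)[:16]>` can be derived from nothing but the `dag_id`. The helper name `api_key_variable_name` is hypothetical, and the sketch assumes the hash is taken over the UTF-8 bytes of the `dag_id` and truncated from the hex digest; the commit does not spell out either detail.

```python
import hashlib


def api_key_variable_name(dag_id: str) -> str:
    """Derive the Variable name for a DAG from its (non-secret) dag_id.

    Assumption: sha256 over the UTF-8 encoded dag_id, first 16 hex characters.
    """
    digest = hashlib.sha256(dag_id.encode("utf-8")).hexdigest()
    return f"hopsworks_api_key_{digest[:16]}"


# Any code that knows another DAG's id can compute the same name,
# which is why the Variable is a shared surface rather than a vault.
name_seen_by_owner = api_key_variable_name("example_project_dag")
name_seen_by_other_dag = api_key_variable_name("example_project_dag")
```

Both calls return the same string, so the hashed name offers no isolation between DAGs running in the same cluster.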
…gicalclocks#588

https://hopsworks.atlassian.net/browse/HWORKS-2810

Reconcile the inconsistencies the Copilot review flagged and fix the snippets that lint with F821:

security_model.md:
- Reflowed the multi-sentence-on-one-line lines per the repo's one-sentence-per-line style (intro, "What is not isolated").
- The `POST /auth/token` reference now spells out the full proxied path (`POST /hopsworks-api/airflow/auth/token`) so it matches the upgrade page and operators don't have to guess what's under the `[api] base_url` prefix.

setup_installation/admin/airflow3.md:
- Reflowed the multi-line intro.
- Reverse-proxy section was contradictory with security_model.md: it claimed membership is refreshed "on every forwarded request" via an `X-Hopsworks-JWT` header. The auth manager doesn't refresh per-request in 3.0.x; it caches `project_ids` / `project_roles` for the cookie TTL (1 hour) and the Hopsworks backend evicts the cache via `POST /auth/internal/invalidate` on membership changes. Rewritten to match what the code does and cross-link to the security model page.

user_guides/projects/airflow/airflow.md:
- Three Python snippets reference `HopsworksLaunchOperator`, `HopsworksJobSuccessSensor`, and `HopsworksHdfsSensor` without importing them. `snakeoil` lints every fenced Python block via ruff and F821s on undefined names. Added the `from hopsworks.airflow.{operators,sensors} import ...` lines so the blocks are self-contained.
- The page described the runs column as "colored dots"; the v3 upgrade page (and the actual UI) renders **colored squares**. Aligned the wording and reused the same colour legend.

user_guides/projects/airflow/airflow3_upgrade.md:
- Reflowed the "must rewrite" paragraph that mixed three sentences on one line.

Reference-style heading-ID linking (Copilot suggestion) was left as a docs-team follow-up: mkdocs + mike already resolves the `.md` relative links correctly across versioned builds, and switching to autorefs heading IDs would touch every airflow page in lockstep without a corresponding repo-wide policy decision.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
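The caching behavior this commit describes (membership held for the cookie TTL, evicted early when the backend signals a change) can be illustrated with a small TTL cache. This is a sketch under assumptions, not the auth manager's code: the class `MembershipCache` and its method names are hypothetical, and only the one-hour TTL and the evict-on-change idea come from the commit message.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class MembershipCache:
    """Per-user project membership, kept until a TTL expires or it is evicted."""

    def __init__(
        self,
        ttl_seconds: float = 3600.0,  # 1 hour, matching the cookie TTL
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Set[int]]] = {}

    def put(self, user: str, project_ids: Iterable[int]) -> None:
        # Store the membership together with its expiry time.
        self._entries[user] = (self._clock() + self._ttl, set(project_ids))

    def get(self, user: str) -> Optional[Set[int]]:
        entry = self._entries.get(user)
        if entry is None:
            return None
        expires_at, project_ids = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped so the caller re-resolves membership.
            del self._entries[user]
            return None
        return project_ids

    def invalidate(self, user: str) -> None:
        # Stand-in for what an invalidate endpoint would trigger on a
        # membership change; removing an absent user is a no-op.
        self._entries.pop(user, None)
```

Between `put` and expiry, `get` returns the cached set without consulting the backend, which is why a membership change only takes effect immediately if `invalidate` is called.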
https://hopsworks.atlassian.net/browse/HWORKS-2810

Snakeoil's ruff rules expect two blank lines between top-level import statements and the next definition / statement (E302). The three operator + sensor examples on the Airflow user-guide page had only one blank line between the `from … import …` and the following call, so the CI snakeoil step rewrote them and then failed on the resulting `git diff --exit-code`. Add the second blank line in each of the three code blocks so the rendered docs match what snakeoil would produce.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…conciler, orphan cleanup, OpenShift

https://hopsworks.atlassian.net/browse/HWORKS-2810

Documents the Airflow-side changes that landed on the HWORKS-2810 branch after the initial doc commit on 2026-05-22:

user_guides/projects/airflow/airflow.md
* Row-actions section now mentions the trash icon and the full cleanup chain on delete (HopsFS file, per-DAG API key Variable, dag_project_index row, plus airflow.api.common.delete_dag.delete_dag for dag/dag_run/task_instance/xcom/log).
* Workflow-builder section notes that "@continuous" is rejected by both the UI form and the backend, with the OOM-loop rationale.

user_guides/projects/airflow/airflow3_upgrade.md
* Adds "@continuous" to the porting table so users who carry a 1.x continuous DAG over hit a clear "use cron or @once" line instead of debugging a scheduler crash-loop.

user_guides/projects/airflow/security_model.md
* New "Active-project scoping" subsection under "What is isolated". Documents the ?hopsworks_project=<id> bounce, the active_project_id JWT claim, what happens for multi-project members and admins, and the cookie-TTL handover when the user switches project in the Hopsworks UI.

setup_installation/admin/airflow3.md
* AirflowDagReconciler: 60s singleton EJB on the admin pod, what it reconciles, the transaction-attribute caveat, the logger name.
* Orphan-cleanup CronJob: gated on airflow.enabled (no separate flag), the full list of child tables it covers including the four added recently (asset_dag_run_queue.target_dag_id, task_outlet_asset_reference.dag_id, deadline.dag_id, deadline.dagrun_id), and the one-Pod-retained behavior.
* OpenShift compatibility: the chown :0 + chmod g=u pattern on /etc/airflow and launcher.sh, no runAsUser override needed.

markdownlint-cli2 clean on all four files. No new pages, no mkdocs.yml change, no Python code blocks added so snakeoil is not in scope.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
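The "@continuous" rejection described in this commit amounts to a guard on the schedule string before a DAG is generated. The sketch below shows one way such a guard could be written; it is hypothetical (the function `validate_schedule`, the constant name, and the error text are not taken from the Hopsworks code) and only mirrors the behavior the commit describes.

```python
# Schedules refused up front; "@continuous" is the one named in the commit.
REJECTED_SCHEDULES = {"@continuous"}


def validate_schedule(schedule: str) -> str:
    """Return the trimmed schedule, or raise if it is a rejected preset."""
    normalized = schedule.strip()
    if normalized.lower() in REJECTED_SCHEDULES:
        # Fail with a clear message rather than letting the scheduler
        # hit a crash-loop later.
        raise ValueError(
            f"schedule {normalized!r} is not supported; use cron or @once"
        )
    return normalized
```

Running the same check in both the form handler and the backend means a DAG carrying `@continuous` is refused no matter which path creates it.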
…io into airflow3-upgrade

# Conflicts:
# docs/setup_installation/admin/airflow3.md
# docs/user_guides/projects/airflow/airflow.md
# docs/user_guides/projects/airflow/airflow3_upgrade.md
# docs/user_guides/projects/airflow/security_model.md
ErmiasG approved these changes May 29, 2026
https://hopsworks.atlassian.net/browse/HWORKS-2810
Appends documentation for the recent Airflow changes that landed after
the initial Airflow 3.0.6 docs (#588). Net change over the default
branch is the new sections only.
- setup_installation/admin/airflow3.md: documents the DAG reconciler (`AirflowDagReconciler` singleton EJB), the `airflow-orphan-cleanup` CronJob, and the OpenShift arbitrary-UID + GID-0 image contract.
- user_guides/projects/airflow/airflow.md: documents the DAG delete action from the Hopsworks UI and the cleanup it performs.
- user_guides/projects/airflow/airflow3_upgrade.md: adds the `@continuous` schedule rejection to the migration table.
- user_guides/projects/airflow/security_model.md: documents active-project scoping when opening Airflow from a project's UI.
🤖 Generated with Claude Code