Merged
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -2,13 +2,18 @@

## [Unreleased]

### Added

- Add a flag to determine if database initialization steps should be executed ([#669]).

### Fixed

- Don't panic on invalid authorization config. Previously, a missing OPA ConfigMap would crash the operator ([#667]).
- Fix OPA authorization for Airflow 3. Airflow 3 needs to be configured via env variables, the operator now does this correctly ([#668]).

[#667]: https://github.com/stackabletech/airflow-operator/pull/667
[#668]: https://github.com/stackabletech/airflow-operator/pull/668
[#669]: https://github.com/stackabletech/airflow-operator/pull/669

## [25.7.0] - 2025-07-23

10 changes: 10 additions & 0 deletions deploy/helm/airflow-operator/crds/crds.yaml
@@ -591,6 +591,16 @@ spec:
- repo
type: object
type: array
databaseInitialization:
default:
enabled: true
description: Settings related to the database initialization routines (which are always executed by default).
properties:
enabled:
default: true
description: 'Whether to execute the database initialization routines (a combination of database initialization, upgrade and migration depending on the Airflow version). Defaults to true to be backwards-compatible. WARNING: setting this to false is *unsupported* as subsequent updates to the Airflow cluster may result in broken behaviour due to inconsistent metadata! Do not change the default unless you know what you are doing!'
type: boolean
type: object
exposeConfig:
default: false
description: for internal use only - not for production use.
26 changes: 26 additions & 0 deletions docs/modules/airflow/pages/usage-guide/db-init.adoc
@@ -0,0 +1,26 @@
= Database initialization
:description: Configure Airflow Database start-up.

By default, Airflow will run database initialization routines (checking and/or creating the metadata schema and creating an admin user) on start-up.
These are idempotent and can be run every time as the overhead is minimal.
However, if these steps should be skipped, a running Airflow cluster can be patched with a resource like this to deactivate the initialization:

[source,yaml]
----
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
name: airflow
spec:
clusterConfig:
databaseInitialization:
enabled: false # <1>
----
<1> Turn off the initialization routine by setting `databaseInitialization.enabled` to `false`

NOTE: The field `databaseInitialization.enabled` is `true` by default to be backwards-compatible.
A fresh Airflow cluster cannot be created with this field set to `false` as this results in missing metadata in the Airflow database.

WARNING: Setting `databaseInitialization.enabled` to `false` is an unsupported operation as subsequent updates to a running Airflow cluster can result in broken behaviour due to inconsistent metadata.
Only set `databaseInitialization.enabled` to `false` if you know what you are doing!
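
The defaulting described in the NOTE can be sketched in plain Rust. This is a minimal illustration, not the operator's code: the operator relies on serde defaults (see `DatabaseInitializationConfig` in `rust/operator-binary/src/crd/mod.rs`), whereas the hypothetical `RawDatabaseInitialization` type and `resolve_enabled` function below model an omitted key with `Option` so that the example needs only the standard library.

```rust
/// Hypothetical stand-in for the `databaseInitialization` block as written
/// in a manifest. `None` means the `enabled` key was omitted.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct RawDatabaseInitialization {
    enabled: Option<bool>,
}

/// Resolve the effective flag. Any omitted level falls back to `true`,
/// mirroring the backwards-compatible default described in the docs.
fn resolve_enabled(raw: Option<RawDatabaseInitialization>) -> bool {
    raw.and_then(|block| block.enabled).unwrap_or(true)
}

fn main() {
    // `databaseInitialization` omitted entirely.
    println!("omitted block:  {}", resolve_enabled(None));

    // `databaseInitialization: {}` without an `enabled` key.
    let empty = RawDatabaseInitialization::default();
    println!("empty block:    {}", resolve_enabled(Some(empty)));

    // `enabled: false` set explicitly (the unsupported opt-out).
    let disabled = RawDatabaseInitialization { enabled: Some(false) };
    println!("explicit false: {}", resolve_enabled(Some(disabled)));
}
```

Only an explicit `enabled: false` turns the routines off in this model; omitting the block or the key keeps them on.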
1 change: 1 addition & 0 deletions docs/modules/airflow/partials/nav.adoc
@@ -3,6 +3,7 @@
** xref:airflow:getting_started/first_steps.adoc[]
* xref:airflow:required-external-components.adoc[]
* xref:airflow:usage-guide/index.adoc[]
** xref:airflow:usage-guide/db-init.adoc[]
** xref:airflow:usage-guide/mounting-dags.adoc[]
** xref:airflow:usage-guide/applying-custom-resources.adoc[]
** xref:airflow:usage-guide/listenerclass.adoc[]
7 changes: 5 additions & 2 deletions rust/operator-binary/src/airflow_controller.rs
@@ -951,8 +951,11 @@ fn build_server_rolegroup_statefulset(
.context(GracefulShutdownSnafu)?;

let mut airflow_container_args = Vec::new();
- airflow_container_args
- .extend(airflow_role.get_commands(authentication_config, resolved_product_image));
+ airflow_container_args.extend(airflow_role.get_commands(
+ airflow,
+ authentication_config,
+ resolved_product_image,
+ ));

airflow_container
.image_from_product_image(resolved_product_image)
110 changes: 78 additions & 32 deletions rust/operator-binary/src/crd/mod.rs
@@ -251,6 +251,10 @@ pub mod versioned {
#[serde(default)]
pub load_examples: bool,

/// Settings related to the database initialization routines (which are always executed by default).
#[serde(default)]
pub database_initialization: DatabaseInitializationConfig,

/// Name of the Vector aggregator [discovery ConfigMap](DOCS_BASE_URL_PLACEHOLDER/concepts/service_discovery).
/// It must contain the key `ADDRESS` with the address of the Vector aggregator.
/// Follow the [logging tutorial](DOCS_BASE_URL_PLACEHOLDER/tutorials/logging-vector-aggregator)
@@ -268,7 +272,6 @@ pub mod versioned {
#[schemars(schema_with = "raw_object_list_schema")]
pub volume_mounts: Vec<VolumeMount>,
}

// TODO: move generic version to op-rs?
#[derive(Clone, Debug, Deserialize, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
@@ -282,6 +285,28 @@ pub mod versioned {
}
}

#[derive(Clone, Debug, Deserialize, JsonSchema, PartialEq, Serialize)]
#[serde(rename_all = "camelCase")]
pub struct DatabaseInitializationConfig {
/// Whether to execute the database initialization routines (a combination of database initialization, upgrade and migration depending on the Airflow version). Defaults to true to be backwards-compatible.
/// WARNING: setting this to false is *unsupported* as subsequent updates to the Airflow cluster may result in broken behaviour due to inconsistent metadata!
/// Do not change the default unless you know what you are doing!
#[serde(default = "default_db_init")]
pub enabled: bool,
}

impl Default for DatabaseInitializationConfig {
fn default() -> Self {
Self {
enabled: default_db_init(),
}
}
}

pub fn default_db_init() -> bool {
true
}

impl Default for v1alpha1::WebserverRoleConfig {
fn default() -> Self {
v1alpha1::WebserverRoleConfig {
@@ -547,6 +572,7 @@ impl AirflowRole {
/// if authentication is enabled.
pub fn get_commands(
&self,
airflow: &v1alpha1::AirflowCluster,
auth_config: &AirflowClientAuthenticationDetailsResolved,
resolved_product_image: &ResolvedProductImage,
) -> Vec<String> {
@@ -576,21 +602,30 @@ impl AirflowRole {
"airflow api-server &".to_string(),
]);
}
- AirflowRole::Scheduler => command.extend(vec![
- "airflow db migrate".to_string(),
- "airflow users create \
- --username \"$ADMIN_USERNAME\" \
- --firstname \"$ADMIN_FIRSTNAME\" \
- --lastname \"$ADMIN_LASTNAME\" \
- --email \"$ADMIN_EMAIL\" \
- --password \"$ADMIN_PASSWORD\" \
- --role \"Admin\""
- .to_string(),
- "prepare_signal_handlers".to_string(),
- container_debug_command(),
- "airflow dag-processor &".to_string(),
- "airflow scheduler &".to_string(),
- ]),
+ AirflowRole::Scheduler => {
+ if airflow.spec.cluster_config.database_initialization.enabled {
+ tracing::info!("Database initialization has been enabled.");
+ command.extend(vec![
+ "airflow db migrate".to_string(),
+ "airflow users create \
+ --username \"$ADMIN_USERNAME\" \
+ --firstname \"$ADMIN_FIRSTNAME\" \
+ --lastname \"$ADMIN_LASTNAME\" \
+ --email \"$ADMIN_EMAIL\" \
+ --password \"$ADMIN_PASSWORD\" \
+ --role \"Admin\""
+ .to_string(),
+ ]);
+ } else {
+ tracing::info!("Database initialization routines have been skipped!")
+ }
+ command.extend(vec![
+ "prepare_signal_handlers".to_string(),
+ container_debug_command(),
+ "airflow dag-processor &".to_string(),
+ "airflow scheduler &".to_string(),
+ ]);
+ }
AirflowRole::Worker => command.extend(vec![
"prepare_signal_handlers".to_string(),
container_debug_command(),
@@ -608,22 +643,31 @@ impl AirflowRole {
"airflow webserver &".to_string(),
]);
}
- AirflowRole::Scheduler => command.extend(vec![
- // Database initialization is limited to the scheduler, see https://github.com/stackabletech/airflow-operator/issues/259
- "airflow db init".to_string(),
- "airflow db upgrade".to_string(),
- "airflow users create \
- --username \"$ADMIN_USERNAME\" \
- --firstname \"$ADMIN_FIRSTNAME\" \
- --lastname \"$ADMIN_LASTNAME\" \
- --email \"$ADMIN_EMAIL\" \
- --password \"$ADMIN_PASSWORD\" \
- --role \"Admin\""
- .to_string(),
- "prepare_signal_handlers".to_string(),
- container_debug_command(),
- "airflow scheduler &".to_string(),
- ]),
+ AirflowRole::Scheduler => {
+ if airflow.spec.cluster_config.database_initialization.enabled {
+ tracing::info!("Database initialization has been enabled.");
+ command.extend(vec![
+ // Database initialization is limited to the scheduler, see https://github.com/stackabletech/airflow-operator/issues/259
+ "airflow db init".to_string(),
+ "airflow db upgrade".to_string(),
+ "airflow users create \
+ --username \"$ADMIN_USERNAME\" \
+ --firstname \"$ADMIN_FIRSTNAME\" \
+ --lastname \"$ADMIN_LASTNAME\" \
+ --email \"$ADMIN_EMAIL\" \
+ --password \"$ADMIN_PASSWORD\" \
+ --role \"Admin\""
+ .to_string(),
+ ]);
+ } else {
+ tracing::info!("Database initialization routines have been skipped!")
+ }
+ command.extend(vec![
+ "prepare_signal_handlers".to_string(),
+ container_debug_command(),
+ "airflow scheduler &".to_string(),
+ ]);
+ }
AirflowRole::Worker => command.extend(vec![
"prepare_signal_handlers".to_string(),
container_debug_command(),
@@ -981,5 +1025,7 @@ mod tests {
assert_eq!("KubernetesExecutor", cluster.spec.executor.to_string());
assert!(cluster.spec.cluster_config.load_examples);
assert!(cluster.spec.cluster_config.expose_config);
// defaults to true
assert!(cluster.spec.cluster_config.database_initialization.enabled);
}
}
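
The scheduler branch of `get_commands` can be condensed into a standalone sketch that shows the effect of the flag. The `AirflowMajor` enum, the `scheduler_commands` function and the `CREATE_ADMIN_USER` constant are hypothetical names introduced for illustration. Mapping the `db migrate` branch to Airflow 3 and the `db init`/`db upgrade` branch to Airflow 2 is an assumption based on the `api-server` and `webserver` commands in the surrounding code, and `container_debug_command()` is left out.

```rust
/// Hypothetical stand-in for the operator's Airflow version switch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AirflowMajor {
    Two,
    Three,
}

/// Admin user creation command, copied from the diff above.
const CREATE_ADMIN_USER: &str = "airflow users create \
    --username \"$ADMIN_USERNAME\" \
    --firstname \"$ADMIN_FIRSTNAME\" \
    --lastname \"$ADMIN_LASTNAME\" \
    --email \"$ADMIN_EMAIL\" \
    --password \"$ADMIN_PASSWORD\" \
    --role \"Admin\"";

/// Assemble the scheduler start-up commands (simplified sketch).
/// The database steps are prepended only when `db_init_enabled` is true;
/// the remaining commands are emitted either way.
fn scheduler_commands(version: AirflowMajor, db_init_enabled: bool) -> Vec<String> {
    let mut command = Vec::new();

    if db_init_enabled {
        match version {
            // Assumed Airflow 3 path: a single migrate step.
            AirflowMajor::Three => command.push("airflow db migrate".to_string()),
            // Assumed Airflow 2 path: init followed by upgrade.
            AirflowMajor::Two => command.extend(vec![
                "airflow db init".to_string(),
                "airflow db upgrade".to_string(),
            ]),
        }
        command.push(CREATE_ADMIN_USER.to_string());
    }

    command.push("prepare_signal_handlers".to_string());
    if version == AirflowMajor::Three {
        command.push("airflow dag-processor &".to_string());
    }
    command.push("airflow scheduler &".to_string());
    command
}

fn main() {
    for enabled in [true, false] {
        println!("databaseInitialization.enabled = {}", enabled);
        for line in scheduler_commands(AirflowMajor::Three, enabled) {
            println!("  {}", line);
        }
    }
}
```

Keeping the database steps in their own conditional block, as the change does, means the signal handler setup and the scheduler start are shared by both settings.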
8 changes: 8 additions & 0 deletions tests/templates/kuttl/cluster-operation/09-assert.yaml
@@ -0,0 +1,8 @@
---
# For this assert we expect the database operation to be logged
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 30
commands:
- script: |
kubectl -n $NAMESPACE logs airflow-scheduler-default-0 | grep "Database migrating done!"
Original file line number Diff line number Diff line change
@@ -25,6 +25,8 @@ spec:
vectorAggregatorConfigMapName: vector-aggregator-discovery
{% endif %}
credentialsSecret: test-airflow-credentials
databaseInitialization:
enabled: false
webservers:
roleConfig:
listenerClass: external-unstable
8 changes: 8 additions & 0 deletions tests/templates/kuttl/cluster-operation/31-assert.yaml
@@ -0,0 +1,8 @@
---
# For this step we expect the database operation to NOT be logged
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 30
commands:
- script: |
kubectl -n $NAMESPACE logs airflow-scheduler-default-0 | grep -q "Database migrating done!" && exit 1 || exit 0