Skip to content

Conversation

@martinpitt
Copy link
Contributor

@martinpitt martinpitt commented Jun 3, 2025

Add a knob to tests/tasks/install_and_check.yml to skip the role run, as we don't want to do this during validation.


This uses a new test, as e.g. tests/tests_default.yml are already just wrappers
around task includes, and modifying them would be messier and harder to
understand than a new test.

The other commits are identical to linux-system-roles/sudo#59 and linux-system-roles/.github#114 .

Summary by Sourcery

Add support for end-to-end validation of bootc container images via QEMU and update related CI, defaults, and test logic to accommodate bootc scenarios

New Features:

  • Add a bootc end-to-end Ansible test playbook to build and validate container images via QEMU
  • Introduce a __run_role flag to skip role execution during validation runs

Enhancements:

  • Disable PostgreSQL server tuning by default when using buildah container connection
  • Update install_and_check tasks to respect __run_role and skip tuning checks under bootc validation

CI:

  • Bump tox-lsr to v3.10.0 in all CI workflows
  • Add a Podman 5.x upgrade step in QEMU integration workflow for bootc-image-builder compatibility
  • Add a CI job to run bootc validation tests in QEMU

Documentation:

  • Clarify in README that server tuning is disabled during container builds

Tests:

  • Add tests/tests_bootc_e2e.yml for bootc validation
  • Adjust existing test tasks to include conditional logic for bootc validation flags

@martinpitt martinpitt requested review from richm and spetrosi as code owners June 3, 2025 08:47
@sourcery-ai
Copy link

sourcery-ai bot commented Jun 3, 2025

Reviewer's Guide

Integrates bootc end-to-end validation into CI by adding a dedicated test playbook, gating role execution with a new flag, disabling tuning in buildah containers, and updating GitHub workflows for tox-lsr 3.10.0 and Podman 5.x support.

Sequence Diagram for Bootc End-to-End Validation Test

sequenceDiagram
    participant CI as GitHub Actions
    participant Playbook as tests_bootc_e2e.yml
    participant Builder as bootc-image-builder
    participant TargetEnv as QEMU VM / Deployed Image

    CI->>Playbook: Trigger bootc E2E test
    Playbook->>Playbook: Prepare environment (e.g., Podman 5.x)
    Playbook->>Builder: Build bootc image
    Builder-->>Playbook: Image built
    Playbook->>TargetEnv: Deploy/Run image
    Playbook->>TargetEnv: Validate deployed image
    TargetEnv-->>Playbook: Validation results
    Playbook->>CI: Report test status (pass/fail)
Loading

Flow Diagram for postgresql_server_tuning Default Logic

graph TD
    subgraph "Setting postgresql_server_tuning Default"
        Start[Load defaults/main.yml] --> CheckConn{ansible_connection == 'buildah'?};
        CheckConn -- "Yes" --> SetFalse[postgresql_server_tuning = false];
        CheckConn -- "No" --> SetTrue[postgresql_server_tuning = true];
        SetFalse --> EndValue[Value Set];
        SetTrue --> EndValue[Value Set];
    end
Loading

File-Level Changes

Change Details Files
Introduce __run_role flag to optionally skip role execution in validation tasks
  • Add __run_role variable check around include_role in install_and_check.yml
  • Wrap postgresql role invocation with when: __run_role
d(true)
Disable server tuning by default in buildah container builds
  • Change postgresql_server_tuning default to '{{ false if ansible_connection == "buildah" else true }}'
  • Annotate rationale for disabling tuning in defaults/main.yml
defaults/main.yml
Document buildah-specific tuning behavior in README
  • Add note under configuration section explaining buildah connection disables tuning by default
README.md
Add new bootc end-to-end test playbook
  • Create tests/tests_bootc_e2e.yml to build bootc image and validate deployed image
  • Use include_role and include_tasks with __run_role and __bootc_validation flags in playbook
tests/tests_bootc_e2e.yml
Bump tox-lsr dependency to v3.10.0 in CI workflows
  • Update pip3 install pin from 3.9.0 to 3.10.0 in ansible-lint.yml
  • Update pip3 install pin from 3.9.0 to 3.10.0 in ansible-managed-var-comment.yml
  • Update pip3 install pin from 3.9.0 to 3.10.0 in ansible-test.yml
  • Update pip3 install pin from 3.9.0 to 3.10.0 in qemu-kvm-integration-tests.yml
.github/workflows/ansible-lint.yml
.github/workflows/ansible-managed-var-comment.yml
.github/workflows/ansible-test.yml
.github/workflows/qemu-kvm-integration-tests.yml
Enhance QEMU CI for Podman 5.x compatibility and bootc validation
  • Add apt source override and package pinning to install Podman 5.x in qemu-kvm-integration-tests.yml
  • Adjust tox invocation to skip tests::bootc-e2e tag
  • Introduce 'Run bootc validation tests in QEMU' step that iterates QCOW2 images and logs PASS/FAIL
.github/workflows/qemu-kvm-integration-tests.yml

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @martinpitt - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Review instructions: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@martinpitt martinpitt marked this pull request as draft June 3, 2025 08:55
@martinpitt martinpitt removed request for richm and spetrosi June 3, 2025 08:55
@martinpitt
Copy link
Contributor Author

That first version fails as expected, already noticed in https://issues.redhat.com/browse/RHEL-88396:

 postgresql.service - PostgreSQL database server
     Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Tue 2025-06-03 09:06:07 UTC; 23s ago
    Process: 876 ExecStartPre=/usr/libexec/postgresql-check-db-dir postgresql (code=exited, status=0/SUCCESS)
    Process: 910 ExecStart=/usr/bin/postmaster -D ${PGDATA} (code=exited, status=1/FAILURE)
   Main PID: 910 (code=exited, status=1/FAILURE)
        CPU: 31ms

Jun 03 09:06:07 localhost.localdomain systemd[1]: Starting PostgreSQL database server...
Jun 03 09:06:07 localhost.localdomain postmaster[910]: 2025-06-03 09:06:07.232 UTC [910] FATAL:  could not map anonymous shared memory: Cannot allocate memory
Jun 03 09:06:07 localhost.localdomain postmaster[910]: 2025-06-03 09:06:07.232 UTC [910] HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 4250107904 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
Jun 03 09:06:07 localhost.localdomain postmaster[910]: 2025-06-03 09:06:07.232 UTC [910] LOG:  database system is shut down
Jun 03 09:06:07 localhost.localdomain systemd[1]: postgresql.service: Main process exited, code=exited, status=1/FAILURE
Jun 03 09:06:07 localhost.localdomain systemd[1]: postgresql.service: Failed with result 'exit-code'.
Jun 03 09:06:07 localhost.localdomain systemd[1]: Failed to start PostgreSQL database server.

When running the role during a container build, the shared_buffers auto-detection is done based on the container RAM, which can be significantly higher than the VM's -- and either way it is completely unrelated. This needs to be disabled.

@martinpitt martinpitt force-pushed the qemu-bootc branch 3 times, most recently from 101b9fe to 272e0f4 Compare June 3, 2025 09:56
@martinpitt martinpitt marked this pull request as ready for review June 3, 2025 09:56
@martinpitt martinpitt requested a review from richm June 3, 2025 09:56
Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @martinpitt - I've reviewed your changes - here's some feedback:

  • Consolidate the repeated tox-lsr installation steps across workflows into a shared template or composite action to reduce duplication and maintenance overhead.
  • Define sensible defaults for the __run_role and __bootc_validation variables in role defaults or a common vars file so tests have predictable behavior and avoid false negatives.
  • Add a clear TODO or expiration note around the ‘plucky’ Podman pinning hack, so it’s removed once Ubuntu 26.04 LTS is in place and the workaround is no longer needed.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Review instructions: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

changed_when: true

- name: Validation of deployed image
when: ansible_connection | d("") != "buildah"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what conditions can ansible_connection be undefined? Because above you reference it without the | d() in postgresql_server_tuning: "{{ false if ansible_connection == 'buildah' else true }}" ?

Copy link
Contributor Author

@martinpitt martinpitt Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ansible_connection is undefined for QEMU runs, i.e. when not using an external connection plugin.

Without the | d() it fails like this:

TASK [Flush handlers] ******************************************************************************************************************
task path: /var/home/martin/upstream/lsr/postgresql/tests/tasks/install_and_check.yml:11
Tuesday 03 June 2025  18:48:02 +0200 (0:00:00.017)       0:00:00.082 ********** 
ERROR! The conditional check 'ansible_connection != "buildah"' failed. The error was: error while evaluating conditional (ansible_connection != "buildah"): 'ansible_connection' is undefined. 'ansible_connection' is undefined

The error appears to be in '/var/home/martin/upstream/lsr/postgresql/tests/tasks/install_and_check.yml': line 11, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

Apparently Ansible gets along with testing for equality == against an undefined value, but not with !=. However, that's confusing, and I applied the default everywhere now for consistency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ansible_connection is undefined for QEMU runs, i.e. when not using an external connection plugin.

That's really strange. I cannot reproduce this behavior. No matter what I try, I get "ansible_connection": "ssh" unless I explicitly specify another connection type.

Without the | d() it fails like this:

TASK [Flush handlers] ******************************************************************************************************************
task path: /var/home/martin/upstream/lsr/postgresql/tests/tasks/install_and_check.yml:11
Tuesday 03 June 2025  18:48:02 +0200 (0:00:00.017)       0:00:00.082 ********** 
ERROR! The conditional check 'ansible_connection != "buildah"' failed. The error was: error while evaluating conditional (ansible_connection != "buildah"): 'ansible_connection' is undefined. 'ansible_connection' is undefined

The error appears to be in '/var/home/martin/upstream/lsr/postgresql/tests/tasks/install_and_check.yml': line 11, column 7, but may
be elsewhere in the file depending on the exact syntax problem.

Can you provide your exact tox command line?

Apparently Ansible gets along with testing for equality == against an undefined value, but not with !=.

That is even stranger.

However, that's confusing, and I applied the default everywhere now for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prep (works):

LSR_CONTAINER_PROFILE=false LSR_CONTAINER_PRETTY=false tox -e container-ansible-core-2.16 -- --image-name centos-9-bootc tests/tests_bootc_e2e.yml

Validate:

tox -e qemu-ansible-core-2.16 -- --image-file `pwd`/tests/tmp/tests_bootc_e2e/qcow2/disk.qcow2 --log-level=debug -e __bootc_validation=true -- tests/tests_bootc_e2e.yml

Perhaps you forgot the -e __bootc_validation=true?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable reference is in a handler - code run in handlers is in a strange variable namespace - I have seen cases where vars defined in the playbook are not available in the handler - but this one doesn't make any sense - if I put a debug var: ansible_connection at the beginning of the playbook, it prints the value - I have no idea why it would be removed before the handler is run - I'll chalk this up to a bug in ansible that we have to workaround.

@martinpitt martinpitt force-pushed the qemu-bootc branch 2 times, most recently from 397646d to 7e74d99 Compare June 3, 2025 17:01
@martinpitt martinpitt requested a review from richm June 3, 2025 17:14
@martinpitt
Copy link
Contributor Author

The prep commits landed in the meantime in #127, so rebasing.

Cause: Detecting available RAM and basing the shared memory settings of
the server on that is futile and wrong during a container image build.
The available RAM during that build is independent of the actual
runtime, be that a bootc deployment or a system container.

Consequence: postgresql.service failed to start when the available RAM
during container build was higher than at deployment time. In the
opposite case, the shared memory tuning was too small and thus also
wrong.

Fix: Disable `postgresql_server_tuning` by default for container builds,
i.e. when being connected to a buildah instance.
Add a knob to tests/tasks/install_and_check.yml to skip the role run, as
we don't want to do this during validation.

Skip the memory tuning checks for bootc validation, as running the
role during a container build disables the automatic tuning.
@martinpitt martinpitt merged commit 18be022 into linux-system-roles:main Jun 4, 2025
18 checks passed
@martinpitt martinpitt deleted the qemu-bootc branch June 4, 2025 04:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants