bug: Podman driver fails on macOS with stale Podman machine — host-gateway unresolvable #1307

@arewm

Agent Diagnostic

Skills loaded: openshell-cli, create-github-issue.

Investigation method: Reproduced the failure with openshell sandbox create on macOS, traced it through the Podman driver, compared the Podman driver with the Docker and VM driver approaches, and validated gvproxy DNS behavior on both old and new VM images.

Reproduction confirmed on macOS (Apple Silicon) with Podman 5.8.2 client and a Podman machine created on Podman 5.5 (Fedora CoreOS 41, containers-common 0.63.1):

$ openshell sandbox create --no-keep --name repro-test
Error: create sandbox failed: podman API error (500):
  failed to create new hosts file: unable to replace "host-gateway" of
  host entry "host.containers.internal:host-gateway": host containers
  internal IP address is empty

Root cause: The Podman driver unconditionally uses the host-gateway magic keyword in container hostadd entries (crates/openshell-driver-podman/src/container.rs:508-511). Resolution of host-gateway was broken on Podman machines with containers-common < 0.64.0 and fixed upstream in containers/common#2464. However, Podman machines are stream-pinned — a machine created on Podman 5.5 stays on the machine-os:5.5 image stream and never receives the fix, even if the client is upgraded. The only upgrade path is podman machine reset && podman machine init, which destroys the VM.
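
For illustration, the version boundary described above could be expressed as a small check. This is a sketch only: `parse_version`, `host_gateway_resolvable`, and `HOST_GATEWAY_FIX` are hypothetical names, not existing driver code, and how the driver would learn the VM's containers-common version is left open.

```rust
/// First containers-common release that resolves `host-gateway` on Podman
/// machine, per the issue text (fix landed in containers/common#2464).
const HOST_GATEWAY_FIX: (u64, u64, u64) = (0, 64, 0);

/// Parse a plain "major.minor.patch" string such as "0.63.1".
/// Returns None for anything else (extra components, suffixes, non-numbers).
fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    let patch = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Some(true) if the given containers-common version includes the fix,
/// Some(false) if it predates it, None if the version string is unparseable.
fn host_gateway_resolvable(containers_common: &str) -> Option<bool> {
    // Tuples compare lexicographically, which matches version ordering here.
    parse_version(containers_common).map(|v| v >= HOST_GATEWAY_FIX)
}
```

With the two versions from this report, `host_gateway_resolvable("0.63.1")` yields `Some(false)` and `host_gateway_resolvable("0.67.0")` yields `Some(true)`.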

Key finding: gvproxy DNS resolution of host.containers.internal works on both old and new VMs — it has been available since Podman PR #11649 (late 2021). The hostadd injection of host.containers.internal via host-gateway is redundant on Podman machine; gvproxy already resolves it to 192.168.127.254.

Validated on old VM (containers-common 0.63.1):

| Test | Result |
| --- | --- |
| host-gateway in hostadd | Fails (500 error) |
| host.containers.internal via gvproxy DNS (no hostadd) | Works → 192.168.127.254 |
| host.openshell.internal via DNS (no hostadd) | Fails (gvproxy doesn't know it) |

Validated on new VM (containers-common 0.67.0):

| Test | Result |
| --- | --- |
| host-gateway in hostadd | Works → 192.168.127.254 |
| host.containers.internal via gvproxy DNS (no hostadd) | Works → 192.168.127.254 |
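
The DNS rows above could be re-checked from inside a container with a probe along these lines. This is a hypothetical sketch, not the method used to produce the results in this issue; `resolve_host` is an invented helper that relies only on the standard library resolver.

```rust
use std::net::{IpAddr, ToSocketAddrs};

/// Resolve `host` with the system resolver and return the first address,
/// or None if resolution fails. Port 0 is a placeholder; only the IP is used.
fn resolve_host(host: &str) -> Option<IpAddr> {
    (host, 0u16)
        .to_socket_addrs()
        .ok()?
        .next()
        .map(|addr| addr.ip())
}
```

Run inside a container on a Podman machine, `resolve_host("host.containers.internal")` would be expected to return the gvproxy address reported above, while `resolve_host("host.openshell.internal")` would return `None` unless a hosts entry was injected.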

Description

Actual behavior: openshell sandbox create fails with a Podman API 500 error on macOS when the Podman machine was created on Podman <= 5.5.

Expected behavior: Sandbox creation should succeed regardless of Podman machine age.

Reproduction Steps

  1. Have a macOS system (Apple Silicon) with Podman installed via Homebrew
  2. Have a Podman machine created on Podman <= 5.5 (or any machine with containers-common < 0.64.0)
  3. Start a Podman-backed gateway: mise run gateway
  4. Run openshell sandbox create
  5. Container creation fails with the 500 error

Note: a fresh machine (podman machine init on Podman >= 5.6) does not exhibit this issue.

Environment

  • OS: macOS (Apple Silicon, Apple HV virtualization)
  • Podman client: 5.8.2 (Homebrew)
  • Podman server (VM): 5.5.2
  • containers-common: 0.63.1 (pre-fix; needs >= 0.64.0)
  • Fedora CoreOS: 41.20250215.3.0 (9 months old at time of failure)
  • VM stream: machine-os:5.5 (pinned, does not cross-upgrade)

Logs

Error:   × status: Internal, message: "create sandbox failed: podman API error (500):
  │ failed to create new hosts file: unable to replace "host-gateway" of
  │ host entry "host.containers.internal:host-gateway": host containers
  │ internal IP address is empty"

Proposed Fix

Drop the dependency on host-gateway for host.containers.internal on Podman machine. gvproxy DNS already resolves host.containers.internal to 192.168.127.254 on both VM images tested (containers-common 0.63.1 and 0.67.0), so the hostadd injection is redundant there and fails outright on stale VMs.

For host.openshell.internal (the driver-neutral alias used by policies and e2e tests), gvproxy does not provide DNS resolution. Options:

  1. Inject host.openshell.internal with the explicit gvproxy IP (192.168.127.254 on macOS) instead of host-gateway
  2. Drop host.openshell.internal and standardize policies/tests on host.containers.internal
  3. Add better error detection — if host-gateway fails, surface an actionable error telling users to recreate their Podman machine
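
Option 1 might look roughly like the sketch below. Everything here is hypothetical: `PodmanHostRoute`, `host_entries`, and `GVPROXY_HOST_IP` are illustrative names rather than existing driver code, and keeping host-gateway for the non-machine case is an assumption about the current behavior.

```rust
use std::net::Ipv4Addr;

/// gvproxy's host address on Podman machine, as observed in this issue.
const GVPROXY_HOST_IP: Ipv4Addr = Ipv4Addr::new(192, 168, 127, 254);

/// How the sandbox container should reach the host (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PodmanHostRoute {
    /// Let Podman resolve the `host-gateway` keyword (assumed non-machine case).
    HostGateway,
    /// Podman machine: gvproxy DNS already serves host.containers.internal.
    GvproxyMachine,
}

/// Build the `name:address` hostadd entries for a container.
fn host_entries(route: PodmanHostRoute) -> Vec<String> {
    match route {
        PodmanHostRoute::HostGateway => vec![
            "host.containers.internal:host-gateway".to_string(),
            "host.openshell.internal:host-gateway".to_string(),
        ],
        // Skip host.containers.internal entirely and pin only the alias
        // that gvproxy does not know about to the explicit gvproxy IP.
        PodmanHostRoute::GvproxyMachine => vec![format!(
            "host.openshell.internal:{}",
            GVPROXY_HOST_IP
        )],
    }
}
```

On a Podman machine this yields a single entry, `host.openshell.internal:192.168.127.254`, so the container create request never contains the host-gateway keyword.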

Additionally, the error message from Podman is opaque. Regardless of the fix, the driver should catch this specific 500 and provide a clear diagnostic.
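
A minimal sketch of such a diagnostic, assuming the driver has the HTTP status and the error message body available as plain values. `diagnose_host_gateway_failure` is a hypothetical helper, and the substrings it matches are taken from the error text in this report, so they may need adjusting for other Podman versions.

```rust
/// Return an actionable hint if a Podman API error looks like the
/// stale-machine host-gateway failure described in this issue.
fn diagnose_host_gateway_failure(status: u16, message: &str) -> Option<String> {
    // Collapse line wrapping and repeated spaces before matching.
    let normalized = message.split_whitespace().collect::<Vec<_>>().join(" ");

    let matches = status == 500
        && normalized.contains("host-gateway")
        && normalized.contains("host containers internal IP address is empty");
    if !matches {
        return None;
    }

    Some(String::from(
        "Podman could not resolve \"host-gateway\". This usually means the \
         Podman machine predates containers-common 0.64.0. Recreate it with \
         `podman machine reset` followed by `podman machine init` \
         (this destroys the existing VM).",
    ))
}
```

Returning `None` for anything that does not match keeps unrelated 500 errors flowing through the existing error path unchanged.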

Context

  • The Docker driver handles the same VM-vs-native split via DockerGatewayRoute enum (HostGateway for Docker Desktop, Bridge for native Linux)
  • The VM driver hardcodes 192.168.127.254 (GVPROXY_HOST_LOOPBACK_IP) — the same gvproxy NAT IP
  • The bridge gateway IP (e.g., 10.89.1.1) does NOT reach the macOS host — it's an address inside the Linux VM. A fallback to the bridge gateway IP would silently break supervisor connectivity.
  • Our docs list "Podman 5.x" as the minimum but don't mention containers-common versions, Podman machine stream-pinning, or macOS-specific setup
