Skip to content

fix(container): use lstrip("/") in inode path join and try cgroup path first#950

Open
vlad-scherbich wants to merge 2 commits into
masterfrom
vlad/fix-cgroup-container-id-inode-path
Open

fix(container): use lstrip("/") in inode path join and try cgroup path first#950
vlad-scherbich wants to merge 2 commits into
masterfrom
vlad/fix-cgroup-container-id-inode-path

Conversation

@vlad-scherbich

Copy link
Copy Markdown

Summary

Two bugs in Cgroup container ID resolution in datadog/dogstatsd/container.py:

Bug 1 — _get_cgroup_from_inode: absolute cgroup node path drops mount prefix

os.path.join silently discards all preceding components when it encounters an
absolute path segment. The previous guard only handled the root case (!= "/"),
so any non-root absolute node path (e.g. "/system.slice/docker-deadbeef.scope")
caused the cgroup mount prefix to be lost:

# Before — wrong for any non-root absolute path:
os.path.join("/sys/fs/cgroup", "", "/system.slice/foo") == "/system.slice/foo"

# After — correct:
os.path.join("/sys/fs/cgroup", "", "system.slice/foo") == "/sys/fs/cgroup/system.slice/foo"

Fix: strip the leading slash with lstrip("/") before joining.

Bug 2 — __init__: cgroup path skipped when not in host cgroup namespace

The original logic used _is_host_cgroup_namespace() as a gate: if False, it went
straight to the inode fallback and never tried to parse the cgroup path. Containers
with a private cgroup namespace but a still-parseable cgroup v1 path (a valid
configuration) were silently routed to inode lookup even though a container ID was
available in the path.

Fix: always attempt _read_cgroup_path first; only fall back to _get_cgroup_from_inode
when the path yields None and the process is not in the host cgroup namespace.

Bonus: defensive try/except in _get_cgroup_from_inode

Wrap the inode lookup in a try/except so transient OS errors (e.g. missing cgroup
mount) surface as None rather than a propagating exception — consistent with the
defensive style already used in _read_cgroup_path.

Tests

  • All 6 pre-existing parametrized cases pass unchanged.
  • test_container_id_inode updated to expect two open calls (the new _read_cgroup_path call + the inode path call) and now asserts call_count == 2.
  • test_container_id_inode_absolute_node_path (new): verifies the lstrip fix by exercising a cgroup v2 entry with a non-root absolute node path.
  • test_container_id_cgroup_path_takes_priority_over_inode (new): verifies Bug 2 is fixed — cgroup path wins over inode when the process is not in the host namespace.

Context

These bugs were discovered while backporting cgroup v2 origin-detection support into
dd-trace-py, where container.py is
vendored. The fixes were validated there first and are contributed back upstream.

…h first

Two bugs in Cgroup container ID resolution:

1. _get_cgroup_from_inode used `!= "/"` to guard against the root case,
   but os.path.join silently drops all preceding components when it sees an
   absolute path segment.  Any non-root absolute node path (e.g.
   "/system.slice/docker-deadbeef.scope") caused the cgroup mount prefix
   to be discarded, so os.stat was called on the wrong path.
   Fix: strip the leading slash with lstrip("/") before joining.

2. __init__ skipped _read_cgroup_path entirely when the process was not in
   the host cgroup namespace.  Containers with a private cgroup namespace
   but a still-parseable cgroup v1 path (uncommon but valid) were
   silently routed to the inode fallback.
   Fix: always attempt _read_cgroup_path first; only fall back to the
   inode approach when the path yields nothing AND the process is not in
   the host cgroup namespace.

Also wrap _get_cgroup_from_inode in a try/except so transient OS errors
(e.g. a missing cgroup mount) surface as None rather than a propagating
exception, consistent with the defensive style already used in
_read_cgroup_path.
@vlad-scherbich vlad-scherbich requested review from a team as code owners June 2, 2026 19:45
@datadog-prod-us1-6

datadog-prod-us1-6 Bot commented Jun 2, 2026

Copy link
Copy Markdown

Pipelines

Fix all issues with BitsAI

⚠️ Warnings

🚦 3 Pipeline jobs failed

test | Python 2.7 (legacy) on Linux   View in Datadog   GitHub Actions

🔧 Fix in code (Fix with Cursor). NameError: global name 'PermissionError' is not defined in test_container.py:297.

test | Python pypy2.7 on Linux   View in Datadog   GitHub Actions

🔧 Fix in code (Fix with Cursor). NameError: global name 'PermissionError' is not defined in tests/unit/dogstatsd/test_container.py:297

Ensure labels | changelog   View in Datadog   GitHub Actions

🛟 This job is unlikely to succeed on retry. Please review your pipeline configuration. Missing required label 'changelog/*' for the changelog job.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: b17e906 | Docs | Datadog PR Page | Give us feedback!

…r, and inode fallback

- Parametrized: add K8s systemd slice/scope hierarchy (docker-{id}.scope leaf)
- Parametrized: non-containerized Linux case already covered
- _is_host_cgroup_namespace: add 4 unit tests (missing file, matching inode,
  non-matching inode, exception swallowed)
- _get_cgroup_from_inode: add test for cgroup v1 memory inode=0 falling
  through to the cgroup v2 (CGROUPV2_BASE_CONTROLLER="") entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant