feat(actor-template): per-container securityContext #73
Open
Davanum Srinivas (dims) wants to merge 4 commits into
Add an opt-in `securityContext.capabilities.add` field on
`ActorTemplate.spec.containers[]`. Empty templates produce the same
OCI bundle as before — the default sandbox set
(`CAP_AUDIT_WRITE`, `CAP_KILL`, `CAP_NET_BIND_SERVICE`) still applies
unconditionally; the `add` list extends it on top.
The field plumbs through `ateletpb` to atelet's OCI bundle builder:
    ActorTemplate.spec.containers[].securityContext.capabilities.add
      → ateletpb.Container.security_context.capabilities.add
      → resolveCapabilities() in cmd/atelet/oci.go
      → process.capabilities.{Bounding,Effective,Inheritable,Permitted}
`resolveCapabilities` normalises each entry to its `CAP_…` form so
templates may write either `NET_ADMIN` or `CAP_NET_ADMIN`, and
de-duplicates against the default set. The pause container always
uses the default set unmodified — it never carries the workload's
capabilities.
The motivating workload is NVIDIA OpenShell's `openshell-sandbox`
supervisor, which needs `CAP_NET_ADMIN`, `CAP_SETUID`, `CAP_SETGID`
to configure the actor's network and user namespaces before
launching the inner workload. A gVisor compatibility spike confirmed
runsc honours the OCI cap set exactly: granting `CAP_SETUID` and
`CAP_SETGID` unblocks `setresuid` inside the actor, while
`unshare(CLONE_NEWNET)` remains refused regardless of caps
(architectural refusal in the sentry, unrelated to capability bits).
Tests cover `resolveCapabilities` normalisation/dedup/blank-handling
and round-trip DeepCopy of `ContainerSecurityContext`.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Add `RunAsUser` and `RunAsGroup` (both `*int64`) to
`ContainerSecurityContext`, plumbed through `ateletpb.SecurityContext`
to atelet's OCI bundle builder. Unset fields preserve the existing
behaviour of `Process.User.{UID,GID} = 0` (root); a set value lands
directly in the OCI spec for that container.
The pause container always runs as root: `prepareOCIDirectory` is
called with `0, 0` from both the run and the restore paths. Only the
application container call sites read from
`ctr.GetSecurityContext().GetRunAs{User,Group}()`.
The proto fields are bare `int64` rather than `optional int64`. At the
proto boundary "unset" and "0" both mean root, and atelet's OCI bundle
builder collapses them into the same `Process.User` block, so the
extra nullability buys nothing on the wire. The CRD shape keeps
`*int64` so K8s users can express the usual "unset vs. explicit 0"
distinction in YAML even though the runtime ignores it.
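A minimal sketch of why the nullable CRD fields and the bare proto fields end up equivalent at runtime. The `wireSecurityContext` type and the `toWire` helper are hypothetical stand-ins for `ateletpb.SecurityContext` and whatever conversion the control API actually performs; only the `*int64` versus `int64` shapes come from the commit message.

```go
package main

import "fmt"

// ContainerSecurityContext shows only the two CRD-side fields discussed
// here; the real type carries more.
type ContainerSecurityContext struct {
	RunAsUser  *int64
	RunAsGroup *int64
}

// wireSecurityContext is a hypothetical stand-in for the proto message,
// where the fields are bare int64 and the zero value means root.
type wireSecurityContext struct {
	RunAsUser  int64
	RunAsGroup int64
}

// toWire collapses "unset" and "explicit 0" into the same wire value.
func toWire(sc *ContainerSecurityContext) wireSecurityContext {
	var w wireSecurityContext
	if sc == nil {
		return w
	}
	if sc.RunAsUser != nil {
		w.RunAsUser = *sc.RunAsUser
	}
	if sc.RunAsGroup != nil {
		w.RunAsGroup = *sc.RunAsGroup
	}
	return w
}

func main() {
	zero, uid := int64(0), int64(1000)
	fmt.Println(toWire(nil))                                                         // {0 0}
	fmt.Println(toWire(&ContainerSecurityContext{RunAsUser: &zero}))                 // {0 0}
	fmt.Println(toWire(&ContainerSecurityContext{RunAsUser: &uid, RunAsGroup: &uid})) // {1000 1000}
}
```

The first two calls produce identical wire values, which is the collapse the commit message relies on to justify dropping `optional` from the proto.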
This is the field that actually makes the actor *start* at a non-root
UID. `Capabilities.Add` alone (12b) only enables `setresuid` inside
the running process — useful for supervisors that drop privileges
mid-startup, but the entry point still runs as root until they do.
NVIDIA OpenShell's `prepare_filesystem` step requires `CAP_CHOWN` plus
this field together to chown the workload's `read_write` paths and
hand the namespace over to a non-root supervisor.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
CI's `./hack/verify/gofmt.sh` rejected the single-space alignment around the `nil,` / `0, 0,` comments at the pause-container call sites. gofmt wants the comment column aligned across the `nil,` and `0, 0,` lines (the longer prefix wins), which means two spaces after `nil,`.

No behaviour change; whitespace only.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
The `SecurityContext` field godoc on `Container` named a specific downstream consumer as the motivating workload. Substrate's public surface should describe the field in generic terms: any workload that sets up its own network or user namespaces (for example, a privileged supervisor that hands off to a less-privileged inner process) may need additional capabilities beyond the default sandbox set.

Also retitle the `TestContainerSecurityContextDeepCopy` fixture container from a workload-specific name to a generic `app` / `registry.example/app:test`.

Regenerated `manifests/ate-install/generated/ate.dev_actortemplates.yaml` to flush the godoc change through to the CRD description.

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Davanum Srinivas (dims) added a commit to dims/openshell-driver-substrate that referenced this pull request on May 24, 2026
…text)

The substrate-side PR #73 (per-container `securityContext` on `ActorTemplate.spec.containers[]` with both `capabilities.add` and `runAsUser` / `runAsGroup`) is the field that lets this driver's `synthesize_template` start emitting capability adds and a non-root supervisor start UID once it merges. Empty templates produce the same OCI bundle as before; opt-in per container.

Surface the PR in three places: the top-of-doc header in poc-intro (alongside #66 and #67), the §3 "Companion changes" component table, and the §9 "Where to next" item 8 that was previously an open TODO about capability plumbing.

Also tidy the embedded `~/notes/...` references in poc-intro: the local agent-substrate notes (kind-local-dev runbook, Shorewall recipe) moved from `~/notes/` to `~/notes/agent-substrate/` to mirror the existing `~/notes/openshell-on-substrate/` layout.

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
Add an opt-in `securityContext` block on `ActorTemplate.spec.containers[]`, plumbed through `ateletpb` to atelet's OCI bundle builder. Templates that omit it produce the same OCI bundle as before.

Two fields are exposed:

- `capabilities.add`: Linux capabilities to grant on top of the default sandbox set (`CAP_AUDIT_WRITE`, `CAP_KILL`, `CAP_NET_BIND_SERVICE`). Entries may be written with or without the `CAP_` prefix; case is normalised; duplicates collapse against the defaults.
- `runAsUser` / `runAsGroup`: the UID and GID to start the container process as. Unset preserves atelet's existing default of root.

The motivating workload is NVIDIA OpenShell's `openshell-sandbox` supervisor, which needs `CAP_NET_ADMIN`, `CAP_SETUID`, `CAP_SETGID` to configure the actor's network and user namespaces, and a non-root start UID for the supervisor process itself. Capabilities alone are not enough: the entry point still runs as root until something drops privileges.

Test plan:

- `go vet` clean on touched packages
- `resolveCapabilities`: defaults, prefix normalisation, case folding, dedup, blank-entry skip
- `ContainerSecurityContext` DeepCopy round-trip with pointer-isolation assertions for `Capabilities.Add` and `RunAsUser`
- `cmd/ateapi/internal/controlapi` workflow tests pass with the new copy block in resume + suspend
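For readers unfamiliar with what a pointer-isolation assertion checks, here is a rough, hand-written sketch. The real `DeepCopy` is presumably generated, and whether `Capabilities` is a pointer or a value field is not stated in this PR; both the struct layout and the method body below are assumptions.

```go
package main

import "fmt"

// Capabilities and ContainerSecurityContext are simplified, assumed
// shapes; only the field names come from the PR description.
type Capabilities struct {
	Add []string
}

type ContainerSecurityContext struct {
	Capabilities *Capabilities
	RunAsUser    *int64
	RunAsGroup   *int64
}

// DeepCopy is a hand-written stand-in for the generated method. Every
// pointer and slice gets fresh backing storage so the copy never aliases
// the original.
func (in *ContainerSecurityContext) DeepCopy() *ContainerSecurityContext {
	if in == nil {
		return nil
	}
	out := new(ContainerSecurityContext)
	if in.Capabilities != nil {
		out.Capabilities = &Capabilities{}
		if in.Capabilities.Add != nil {
			out.Capabilities.Add = make([]string, len(in.Capabilities.Add))
			copy(out.Capabilities.Add, in.Capabilities.Add)
		}
	}
	if in.RunAsUser != nil {
		v := *in.RunAsUser
		out.RunAsUser = &v
	}
	if in.RunAsGroup != nil {
		v := *in.RunAsGroup
		out.RunAsGroup = &v
	}
	return out
}

func main() {
	uid := int64(1000)
	orig := &ContainerSecurityContext{
		Capabilities: &Capabilities{Add: []string{"NET_ADMIN"}},
		RunAsUser:    &uid,
	}
	cp := orig.DeepCopy()

	// Mutate the copy; the original must be unaffected.
	cp.Capabilities.Add[0] = "SETUID"
	*cp.RunAsUser = 0

	fmt.Println(orig.Capabilities.Add[0], *orig.RunAsUser) // prints NET_ADMIN 1000
}
```

If the copy shared the slice or the `*int64` with the original, the final line would print the mutated values instead, which is the failure such a test is meant to catch.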