cleanup: use a cache for JWT/CA instead of reading from disk every time #17

Open
Yuval Kohavi (yuval-k) wants to merge 2 commits into agent-substrate:main from yuval-k:yuval-k/cache-jwt

@yuval-k

Small PR to stop re-reading the JWT keys and CAs from disk on every use.

For sessionIDJWTPoolFile and sessionIDCAPoolFile, check the file's modification time and, if it has not changed, use the cached value.

  • Tests pass
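
For readers who have not seen the diff, the idea can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `mtimeCache`, `newMtimeCache`, and `demoReads` are hypothetical names, and the sketch ignores the small window between `os.Stat` and `os.ReadFile` in which the file could change.

```go
package main

import (
	"os"
	"sync"
	"time"
)

// mtimeCache caches one file's contents and re-reads the file only when its
// modification time changes. Hypothetical type, not taken from the PR.
type mtimeCache struct {
	path string

	mu      sync.Mutex
	loaded  bool
	modTime time.Time
	data    []byte
	reads   int // how many times the file was actually read (for the demo)
}

func newMtimeCache(path string) *mtimeCache { return &mtimeCache{path: path} }

// Get returns the cached bytes, reloading from disk only if the file's mtime
// differs from the one recorded at the last load.
func (c *mtimeCache) Get() ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	info, err := os.Stat(c.path)
	if err != nil {
		return nil, err
	}
	if c.loaded && info.ModTime().Equal(c.modTime) {
		return c.data, nil // unchanged on disk: serve from memory
	}
	data, err := os.ReadFile(c.path)
	if err != nil {
		return nil, err
	}
	c.data, c.modTime, c.loaded = data, info.ModTime(), true
	c.reads++
	return c.data, nil
}

// demoReads exercises the cache against a temp file and reports how many
// disk reads happened across three Get calls.
func demoReads() (int, error) {
	f, err := os.CreateTemp("", "pool-*.pem")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	if _, err := f.WriteString("key-v1"); err != nil {
		f.Close()
		return 0, err
	}
	if err := f.Close(); err != nil {
		return 0, err
	}

	c := newMtimeCache(f.Name())
	for i := 0; i < 2; i++ { // the second call is served from memory
		if _, err := c.Get(); err != nil {
			return 0, err
		}
	}

	// Simulate a rotation: new contents plus an explicitly bumped mtime, so
	// the demo does not depend on filesystem timestamp resolution.
	if err := os.WriteFile(f.Name(), []byte("key-v2"), 0o600); err != nil {
		return 0, err
	}
	later := time.Now().Add(time.Hour)
	if err := os.Chtimes(f.Name(), later, later); err != nil {
		return 0, err
	}
	if _, err := c.Get(); err != nil {
		return 0, err
	}
	return c.reads, nil
}
```

Each `Get` still costs one `stat` call, but the read and any parsing built on top of it happen only after the file changes.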

@ahmedtd
Collaborator

This looks good, but I think we probably don't even need to check mtimes; it's probably sufficient to just re-check the keys for changes every 5 minutes or so. Root key rotations are not part of the normal operation of the system.
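
A time-based variant of that suggestion might look like the sketch below. The names (`ttlCache`, `demoTTLLoads`) and the injectable clock are assumptions made for illustration; nothing here comes from the repository.

```go
package main

import (
	"sync"
	"time"
)

// ttlCache reloads its value at most once per ttl, without consulting file
// modification times. Hypothetical type for illustration only.
type ttlCache struct {
	ttl  time.Duration
	load func() ([]byte, error) // e.g. a closure that reads the key pool file
	now  func() time.Time       // injectable clock; time.Now in real use

	mu        sync.Mutex
	loaded    bool
	fetchedAt time.Time
	data      []byte
}

// Get returns the cached value while it is younger than ttl and reloads it
// otherwise. A failed reload is returned to the caller as an error.
func (c *ttlCache) Get() ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	now := c.now()
	if c.loaded && now.Sub(c.fetchedAt) < c.ttl {
		return c.data, nil
	}
	data, err := c.load()
	if err != nil {
		return nil, err
	}
	c.data, c.fetchedAt, c.loaded = data, now, true
	return c.data, nil
}

// demoTTLLoads drives the cache with a fake clock and reports how many times
// the loader ran across three Get calls.
func demoTTLLoads() int {
	clock := time.Unix(0, 0)
	loads := 0
	c := &ttlCache{
		ttl:  5 * time.Minute,
		load: func() ([]byte, error) { loads++; return []byte("keys"), nil },
		now:  func() time.Time { return clock },
	}
	c.Get() // first call loads
	clock = clock.Add(time.Minute)
	c.Get() // one minute later: still cached
	clock = clock.Add(5 * time.Minute)
	c.Get() // six minutes after the load: reloads
	return loads
}
```

The trade-off is that a rotated key can go unnoticed for up to one TTL, which seems acceptable if rotations are rare. Whether to serve stale data when a reload fails is a separate policy choice that this sketch does not make.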

Shashank Barsin (shashankbarsin) added a commit to shashankbarsin/substrate that referenced this pull request May 24, 2026
…mplate

Implements decisions 1 and 2 of ADR-0010 (planning repo f7d7e6b).

Type changes:

  * WorkerPoolSpec.Backend (new): WorkerPoolBackend enum
    {gvisor, dragonball}, optional, kubebuilder default "gvisor".
    Lets the WorkerPool controller branch worker-Pod shaping
    decisions (device mounts, capabilities) without inferring from
    the AteomImage tag. Backward-compatible: existing WorkerPools
    parse as backend=gvisor.

  * ActorTemplateSpec.Runsc: required → optional pointer
    (*RunscConfig). Callers in cmd/ateapi/internal/controlapi/
    (workflow_resume.go, workflow_suspend.go) now nil-guard the
    field at the outer level.

  * ActorTemplateSpec.Dragonball (new): *DragonballConfig sibling to
    Runsc. Carries Kernel + Rootfs (ArtifactRef), VCPUs, MemoryMiB.

  * ArtifactRef (new): shared URL + SHA256Hash shape. Reusable for
    any fetch-and-verify artifact (kernels, rootfs, future backend
    blobs).

  * ActorTemplateSpec CEL XValidation: exactly-one-of runsc|dragonball
    enforced at the API server (rule: has(self.runsc) != has(self.dragonball)).
    Cross-resource validation (ActorTemplate.backend ↔
    WorkerPool.backend) deferred to an admission webhook follow-up.
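
As a rough picture of the shapes listed above, the following Go sketch mirrors the type changes and restates the CEL rule as a plain function. Field types such as `int32`, the empty `RunscConfig`, and the helpers `validateBackend` and `effectiveBackend` are assumptions; the real definitions live in pkg/api/v1alpha1 and carry kubebuilder markers and JSON tags that are omitted here.

```go
package main

import "errors"

// WorkerPoolBackend mirrors the enum described in the commit message.
type WorkerPoolBackend string

const (
	BackendGvisor     WorkerPoolBackend = "gvisor"
	BackendDragonball WorkerPoolBackend = "dragonball"
)

// ArtifactRef is the shared URL + SHA256Hash shape.
type ArtifactRef struct {
	URL        string
	SHA256Hash string
}

// RunscConfig's fields are not described in the commit message, so it is
// left empty in this sketch.
type RunscConfig struct{}

// DragonballConfig carries the fields named in the commit message; the
// integer widths are assumed.
type DragonballConfig struct {
	Kernel    ArtifactRef
	Rootfs    ArtifactRef
	VCPUs     int32
	MemoryMiB int32
}

// ActorTemplateSpec shows only the two sibling backend pointers.
type ActorTemplateSpec struct {
	Runsc      *RunscConfig
	Dragonball *DragonballConfig
}

// validateBackend restates the CEL rule has(self.runsc) != has(self.dragonball):
// exactly one of the two pointers must be set. Hypothetical helper; the API
// server enforces the real rule.
func validateBackend(spec ActorTemplateSpec) error {
	if (spec.Runsc != nil) == (spec.Dragonball != nil) {
		return errors.New("exactly one of runsc or dragonball must be set")
	}
	return nil
}

// effectiveBackend treats the empty value as gvisor, matching the described
// default for existing WorkerPools. Hypothetical helper.
func effectiveBackend(b WorkerPoolBackend) WorkerPoolBackend {
	if b == "" {
		return BackendGvisor
	}
	return b
}
```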

Generated outputs regenerated via `go generate ./pkg/api/v1alpha1/`:
zz_generated.deepcopy.go, manifests/ate-install/generated/
ate.dev_{actortemplates,workerpools}.yaml.

This is commit 1 of 3 for ADR-0010 implementation (issue agent-substrate#17 in
planning repo). Commits 2 (controller branching for /dev/kvm) and
3 (proto + dispatch) follow.

Tests: ./pkg/api/v1alpha1 passes. Pre-existing darwin-only netlink
build error in cmd/ateom-gvisor is unrelated and unchanged.
Shashank Barsin (shashankbarsin) added a commit to shashankbarsin/substrate that referenced this pull request May 24, 2026
Implements decision 3 of ADR-0010 (planning repo f7d7e6b).

createActorDeploymentSpec gains a WorkerPoolBackend parameter. For
backend=dragonball, the worker Pod template now also gets:

  * a /dev/kvm HostPath (HostPathCharDev) volume + volumeMount, and
  * an explicit NET_ADMIN capability on the ateom container.

privileged=true is preserved for both backends, so NET_ADMIN is
already granted; the explicit capability documents the dependency at
the spec level and keeps the deployment working if privileged is
later dropped (out of scope for this commit).

backend=gvisor (and the empty zero value, for backward-compat with
WorkerPools authored before ADR-0010) is unchanged.

Both callsites in workerpool_controller.go now pass wp.Spec.Backend.
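
The branching described above can be pictured with the simplified sketch below. The `volume`, `volumeMount`, `container`, and `podSpec` structs are stand-ins for the Kubernetes core/v1 types so the example stays dependency-free, and `shapeForBackend` and the volume name `dev-kvm` are hypothetical; the actual change is inside createActorDeploymentSpec.

```go
package main

type WorkerPoolBackend string

const (
	BackendGvisor     WorkerPoolBackend = "gvisor"
	BackendDragonball WorkerPoolBackend = "dragonball"
)

// Simplified stand-ins for the Kubernetes core/v1 types.
type volume struct {
	Name         string
	HostPath     string
	HostPathType string
}

type volumeMount struct {
	Name      string
	MountPath string
}

type container struct {
	Privileged      bool
	AddCapabilities []string
	VolumeMounts    []volumeMount
}

type podSpec struct {
	Volumes []volume
	Ateom   container
}

// shapeForBackend returns the pod spec adjusted for the given backend.
// gvisor and the empty zero value are returned unchanged; dragonball gains a
// /dev/kvm host-path volume, its mount, and an explicit NET_ADMIN capability.
func shapeForBackend(spec podSpec, backend WorkerPoolBackend) podSpec {
	if backend != BackendDragonball {
		return spec
	}
	// Copy the slices before appending so the caller's spec is not aliased.
	spec.Volumes = append(append([]volume(nil), spec.Volumes...),
		volume{Name: "dev-kvm", HostPath: "/dev/kvm", HostPathType: "CharDevice"})
	spec.Ateom.VolumeMounts = append(append([]volumeMount(nil), spec.Ateom.VolumeMounts...),
		volumeMount{Name: "dev-kvm", MountPath: "/dev/kvm"})
	spec.Ateom.AddCapabilities = append(append([]string(nil), spec.Ateom.AddCapabilities...),
		"NET_ADMIN")
	return spec
}
```

`"CharDevice"` is the string value of core/v1's `HostPathCharDev` constant. `Privileged` is left untouched for both backends, as in the commit message.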

This is commit 2 of 3 for ADR-0010 implementation (issue agent-substrate#17 in
planning repo). Commit 3 (proto bytes backend_payload + atelet
dispatch + ateom-dragonball decode) follows.
Shashank Barsin (shashankbarsin) added a commit to shashankbarsin/substrate that referenced this pull request May 24, 2026
Per ADR-0010 (planning repo), the ateom gRPC contract carries a typed
`bytes backend_payload` (tag 10) on RunWorkloadRequest,
CheckpointWorkloadRequest, and RestoreWorkloadRequest. Atelet
populates this with a serialized backend-specific payload; ateom
implementations decode according to their backend.

For the Dragonball backend, the payload shape is defined by the new
`internal/proto/dragonballpb/dragonball.proto`
(`DragonballBackendPayload{kernel_path, rootfs_path, vcpus, memory_mib}`).

The gVisor backend leaves backend_payload empty and continues to read
`runsc_path` as before — no behavior change for the existing path.

ateom-dragonball (Rust) builds its tonic stubs from
internal/proto/ateompb/ateom.proto via build.rs, so it picks up the
new field on next cargo build with no manual regen.

Refs: agent-substrate-planning ADR-0010, issue agent-substrate#17 (I-011).
Shashank Barsin (shashankbarsin) added a commit to shashankbarsin/substrate that referenced this pull request May 24, 2026
Implements the atelet half of ADR-0010's backend-agnostic dispatch.

ateletpb (regenerated):
- New `ArtifactRef{url, sha256_hash}` and `DragonballConfig{kernel, rootfs, vcpus, memory_mib}` messages.
- `DragonballConfig dragonball = 9` added to RunRequest, CheckpointRequest, RestoreRequest. The gVisor fields (`runsc`) are unchanged.

ateapi (workflow_resume, workflow_suspend):
- Build `dragonballCfg` from `state.ActorTemplate.Spec.Dragonball` when set; attach to all three outgoing requests. No behavior change for the gVisor path.

atelet:
- New `internal/ateompath` helpers `DragonballKernelPath` / `DragonballRootfsPath` (content-addressed under `StaticFilesDir`).
- New `cmd/atelet/dragonball.go` with:
  - `fetchArtifact`: generic `ArtifactRef` → cached local file with SHA-256 verify (mirrors `fetchRunsc`'s pattern; anonymous GCS for now).
  - `stageDragonballArtifacts`: parallel fetch of kernel + rootfs via errgroup.
  - `marshalDragonballPayload`: builds `dragonballpb.DragonballBackendPayload` bytes.
  - `runDragonball` / `checkpointDragonball` / `restoreDragonball`: backend-specific handlers that skip gVisor-only OCI prep, runsc fetch, and gVisor checkpoint file marshaling. They populate `ateompb.*.BackendPayload` and leave `RunscPath` empty.
- `Run` / `Checkpoint` / `Restore` in main.go gain a top-of-function `if req.GetDragonball() != nil { return s.{run,checkpoint,restore}Dragonball(...) }` early-return. No edits below that branch.
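
The fetch-and-verify step could be sketched as below, using only the standard library. This is not the commit's `fetchArtifact`: the name `fetchVerified`, the injected `open` function (standing in for the GCS download), the flat cache layout keyed by hash, and the assumption that `SHA256Hash` is lowercase hex are all illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

type ArtifactRef struct {
	URL        string
	SHA256Hash string // assumed lowercase hex
}

// fetchVerified returns a local path for ref. It downloads through open only
// when no cached copy exists, and publishes the file under its expected hash
// only after the SHA-256 of the downloaded bytes matches.
func fetchVerified(ref ArtifactRef, cacheDir string, open func(url string) (io.ReadCloser, error)) (string, error) {
	dst := filepath.Join(cacheDir, ref.SHA256Hash)
	if _, err := os.Stat(dst); err == nil {
		return dst, nil // cache hit: content-addressed, so no re-download
	}
	src, err := open(ref.URL)
	if err != nil {
		return "", err
	}
	defer src.Close()

	tmp, err := os.CreateTemp(cacheDir, "fetch-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name()) // harmless after a successful rename

	h := sha256.New()
	if _, err := io.Copy(io.MultiWriter(tmp, h), src); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != ref.SHA256Hash {
		return "", fmt.Errorf("sha256 mismatch for %s: got %s, want %s", ref.URL, got, ref.SHA256Hash)
	}
	if err := os.Rename(tmp.Name(), dst); err != nil {
		return "", err
	}
	return dst, nil
}

// demoFetch fetches the same artifact twice, then a ref with a wrong hash.
// It reports how many downloads the two good fetches needed and whether the
// bad ref was rejected.
func demoFetch() (int, bool, error) {
	dir, err := os.MkdirTemp("", "artifacts-")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(dir)

	body := "kernel-bytes"
	sum := sha256.Sum256([]byte(body))
	good := ArtifactRef{URL: "https://example.invalid/vmlinux", SHA256Hash: hex.EncodeToString(sum[:])}
	opens := 0
	open := func(string) (io.ReadCloser, error) {
		opens++
		return io.NopCloser(strings.NewReader(body)), nil
	}
	for i := 0; i < 2; i++ {
		if _, err := fetchVerified(good, dir, open); err != nil {
			return 0, false, err
		}
	}
	goodOpens := opens

	bad := ArtifactRef{URL: good.URL, SHA256Hash: strings.Repeat("0", 64)}
	_, badErr := fetchVerified(bad, dir, open)
	return goodOpens, badErr != nil, nil
}
```

Writing to a temp file in the cache directory and renaming it only after verification means a partially downloaded or corrupt file never appears under the hash-named path.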

Test fixup:
- `functional_test.go`: existing actor-template fixture updated to use `Runsc: &RunscConfig{...}` (ADR-0010 made the field a pointer to allow Dragonball as a sibling).

Refs: agent-substrate-planning ADR-0010, issue agent-substrate#17 (I-011).
Shashank Barsin (shashankbarsin) added a commit to shashankbarsin/substrate that referenced this pull request May 24, 2026
Closes the Rust half of ADR-0010's backend-agnostic dispatch.

- `build.rs` now also compiles `internal/proto/dragonballpb/dragonball.proto`
  alongside `ateom.proto`. tonic-build emits both packages.
- `service.rs` gains a `dragonball_proto` module (`tonic::include_proto!("dragonball")`)
  and two helpers:
  - `decode_payload(bytes)` — returns `None` for empty bytes (legacy /
    gVisor-style caller), `Some(DragonballBackendPayload)` otherwise.
    Bad bytes yield `Status::invalid_argument`.
  - `apply_payload(&mut BootConfig, &payload)` — non-zero fields on the
    payload override the existing config; zero/empty fields fall through
    to the CLI-provided defaults. This lets atelet send a partial payload
    without clobbering defaults.
- `run_workload` and `restore_workload` now decode the payload and apply
  it on top of `default_boot_config` before handing it to the VMM.
  `checkpoint_workload` does not consult the payload — the VM is already
  running and the kernel/rootfs paths are not needed at checkpoint time.
- `rootfs_path` is decoded but currently only logged with a `warn!`; the
  virtio-blk rootfs wiring is not yet present on the `BootConfig`/`Vmm`
  surface and will land in a follow-up.
- Test fixtures updated for the new `backend_payload` field on the three
  ateompb request structs.
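
The override rule for `apply_payload` is small enough to restate in a few lines. The real helper is Rust; the sketch below is a Go rendering purely for illustration, with hypothetical struct names and assumed `uint32` field widths.

```go
package main

// payload mirrors the four fields named for DragonballBackendPayload.
type payload struct {
	KernelPath string
	RootfsPath string
	VCPUs      uint32
	MemoryMiB  uint32
}

// bootConfig is a hypothetical stand-in for the Rust BootConfig; only the
// fields relevant to the override rule are shown.
type bootConfig struct {
	KernelPath string
	VCPUs      uint32
	MemoryMiB  uint32
}

// applyPayload overlays non-zero payload fields onto cfg. Zero or empty
// fields leave the existing (CLI-provided) defaults in place, and a nil
// payload, the result of decoding empty bytes, changes nothing.
func applyPayload(cfg *bootConfig, p *payload) {
	if p == nil {
		return
	}
	if p.KernelPath != "" {
		cfg.KernelPath = p.KernelPath
	}
	if p.VCPUs != 0 {
		cfg.VCPUs = p.VCPUs
	}
	if p.MemoryMiB != 0 {
		cfg.MemoryMiB = p.MemoryMiB
	}
	// RootfsPath is deliberately not applied: per the commit message it is
	// only logged until the virtio-blk wiring lands.
}
```

One consequence of this rule is that a caller cannot use the payload to set a field back to zero, since zero means "keep the default".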

Refs: agent-substrate-planning ADR-0010, issue agent-substrate#17 (I-011).
Local cargo build will be verified on substrate-poc-1 (no Rust toolchain on Mac).

2 participants