Skip to content

feat: add ECS and AgentCore container build/package/deploy support#9013

Open
singledigit wants to merge 24 commits into
aws:developfrom
singledigit:feat/ecs-agentcore-container-builds-v2
Open

feat: add ECS and AgentCore container build/package/deploy support#9013
singledigit wants to merge 24 commits into
aws:developfrom
singledigit:feat/ecs-agentcore-container-builds-v2

Conversation

@singledigit

Copy link
Copy Markdown

Which issue(s) does this change fix?

Fixes #8933

Why is this change necessary?

SAM CLI provides an excellent developer experience for Lambda Image functions (sam build && sam deploy), but users deploying containerized workloads to ECS (Fargate) or Bedrock AgentCore must manage their Docker build/push/deploy pipeline separately — even when these resources live in the same CloudFormation template. This creates a fragmented workflow requiring external tooling for an identical operation: build image → push to ECR → deploy.

How does it address the issue?

Extends the existing Lambda Image build pipeline to recognize AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources with a Metadata block containing Dockerfile and DockerContext. No new commands — sam build, sam package, sam deploy, and sam sync gain awareness of these resource types.

Template example:

Resources:
  MyAgent:
    Type: AWS::BedrockAgentCore::Runtime
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./agent
      Architecture: arm64
    Properties:
      AgentRuntimeArtifact:
        ContainerConfiguration:
          ContainerUri: placeholder

  MyTask:
    Type: AWS::ECS::TaskDefinition
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./app
      ContainerName: web
    Properties:
      ContainerDefinitions:
        - Name: web
          Image: placeholder

Key implementation details:

  • Reuses _build_lambda_image() — same Docker build logic, buildkit support included
  • --resolve-image-repos auto-creates ECR repos via companion stack
  • ContainerName metadata targets specific containers in multi-container TaskDefinitions
  • Architecture metadata sets --platform (e.g., arm64 for AgentCore)
  • ARTIFACT_TYPE = ZIP to pass the PackageType filter (these resources don't have PackageType)
  • No SAM Transform changes needed — uses native CloudFormation resource types

Design document: designs/container_image_builds_ecs_agentcore.md

What side effects does this change have?

  • sam build logs "Found N container service resource(s) to build" when applicable resources are present. No behavior change for templates without these resources.
  • --resolve-image-repos creates ECR repos for ECS/AgentCore in addition to Lambda Image functions.
  • _update_built_resource adds an optional metadata parameter (backward compatible, defaults to None).

Mandatory Checklist

PRs will only be reviewed after checklist is complete

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@singledigit singledigit requested a review from a team as a code owner May 15, 2026 07:18
@github-actions github-actions Bot added area/build sam build command pr/external stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. labels May 15, 2026

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..052400d
Files: 19
Comments: 3

if container_def.get("Name") == target_name:
container_def["Image"] = path
break
else:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] The else on line 429 binds to the if target_name: on line 424, not to the for loop on line 425. As a result, when ContainerName is set in Metadata but no container in ContainerDefinitions matches that name (e.g. typo, renamed container), no Image field is updated on any container — silently. The image is built and the ECR repo is created, but the post-build template the user inspects/deploys still has the placeholder. The fallback to the first container only fires when ContainerName is unset.

Either match the for/else semantics so a missing target falls back to the first container, or — better — surface a clear error so the user knows their ContainerName doesn't resolve. Suggested fix:

if target_name:
   for container_def in container_defs:
       if container_def.get("Name") == target_name:
           container_def["Image"] = path
           break
   else:
       raise ValueError(
           f"Metadata.ContainerName '{target_name}' does not match any "
           f"container in ContainerDefinitions"
       )
else:
   container_defs[0]["Image"] = path

The new tests in test_container_build_integration.py only cover the matching and no-name cases, so this regression isn't caught.

for i, cd in enumerate(container_defs):
if cd.get("Name") == target_name:
return i
return 0

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] _get_target_index returns 0 (the first container) when Metadata.ContainerName is set but doesn't match any container in ContainerDefinitions. This means sam package/sam deploy will silently rewrite the Image of the wrong container — typically a sidecar like public.ecr.aws/envoy:latest in your own integration test template — replacing it with the freshly pushed ECR URI. The user's intended container keeps its placeholder.

Combined with the matching bug in app_builder._update_built_resource, a typo in ContainerName can cause a silent, incorrect deployment rather than a build/package failure.

Treat the no-match case explicitly. For example:

def gettarget_index(self, container_defs):
   metadata = getattr(self, "resource_metadata", None) or {}
   target_name = metadata.get("ContainerName")
   if target_name:
       for i, cd in enumerate(container_defs):
           if cd.get("Name") == target_name:
               return i
       raise exceptions.ExportFailedError(
           resource_id=...,  # plumb through
           property_name=self.PROPERTY_NAME,
           property_value=target_name,
           ex=ValueError(
               f"Metadata.ContainerName '{target_name}' does not match any "
               f"container in ContainerDefinitions"
           ),
       )
   return 0


if family:
# Paginate through all clusters
cluster_paginator = ecs_client.get_paginator("list_clusters")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PERFORMANCE] _force_ecs_deployment paginates every cluster in the account and every service in every cluster, then calls describe_services in batches of 10 to find services whose taskDefinition family matches. In accounts with many ECS clusters/services this fan-out is O(clusters × services / 10) describe_services calls on every sam sync iteration — slow, rate-limit-prone, and requires ecs:ListClusters/ecs:ListServices across the entire account.

Since the user already supplied the task definition (and SAM owns the CloudFormation stack), the cluster/service references are typically discoverable from the template itself (e.g., sibling AWS::ECS::Service resources referencing this task definition via !Ref) or from the deployed stack resources. Consider scoping the search to services declared in or deployed by the current stack rather than scanning the whole account.

@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 052400d to 825128f Compare May 15, 2026 09:23

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..825128f
Files: 19
Comments: 3

Comment thread samcli/lib/build/app_builder.py Outdated
"Metadata.ContainerName '%s' does not match any container in ContainerDefinitions",
target_name,
)
container_defs[0]["Image"] = path

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When Metadata.ContainerName is set but does not match any container in ContainerDefinitions (typo, renamed container, copy/paste from another task definition), the for…else fall-through warns and then silently writes the built image to container_defs[0]["Image"]. In a multi-container task definition where index 0 is a sidecar (e.g. public.ecr.aws/envoy:latest, as in the integration test fixture in tests/integration/testdata/buildcmd/container_image/template.yaml), the sidecar's image reference is silently replaced with the freshly built ECR URI while the user's intended container retains placeholder. The resulting post-build template deploys a broken task definition that the user only discovers at deploy/runtime.

A LOG.warning is not sufficient — users routinely miss warnings and the failure mode is data corruption of the template. Either fail fast, or skip the update entirely so the user is forced to address the mismatch:

if resource_type == AWS_ECS_TASK_DEFINITION:
   container_defs = resource_properties.get("ContainerDefinitions", [])
   if not container_defs:
       return
   target_name = metadata.get("ContainerName") if metadata else None
   if target_name:
       for container_def in container_defs:
           if container_def.get("Name") == target_name:
               container_def["Image"] = path
               return
       raise DockerBuildFailed(
           f"Metadata.ContainerName '{target_name}' does not match any container "
           f"in ContainerDefinitions. Update Metadata.ContainerName or add a matching "
           f"container."
       )
   if len(container_defs) > 1:
       raise DockerBuildFailed(
           "ContainerDefinitions has multiple containers but Metadata.ContainerName "
           "is not set. Specify Metadata.ContainerName to disambiguate."
       )
   container_defs[0]["Image"] = path

This was raised in the previous review and has not been addressed or dismissed — re-raising.

"Metadata.ContainerName '%s' does not match any container in ContainerDefinitions, defaulting to index 0",
target_name,
)
return 0

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] _get_target_index has the exact same silent-fallback problem as app_builder._update_built_resource: when Metadata.ContainerName is set but no container matches, it logs a warning and returns 0. sam package / sam deploy then rewrite the Image of ContainerDefinitions[0] — which is typically a sidecar in multi-container task definitions — replacing it with the pushed ECR URI. The user's intended container keeps its placeholder/non-ECR value and the deploy proceeds with a broken task definition.

Two reasons to fix this regardless of whether app_builder already validates: (1) the package code path is exercised independently of build (e.g. sam package on an already-built template, or templates produced outside SAM), and (2) the delete path (line 748) uses the same _get_target_index, so sam delete can attempt to delete the wrong ECR artifact.

Raise an error rather than silently defaulting:

def gettarget_index(self, container_defs):
   metadata = getattr(self, "resource_metadata", None) or {}
   target_name = metadata.get("ContainerName")
   if target_name:
       for i, cd in enumerate(container_defs):
           if cd.get("Name") == target_name:
               return i
       raise exceptions.ExportFailedError(
           resource_id=getattr(self, "_resource_id", ""),
           property_name=self.PROPERTY_NAME,
           property_value=target_name,
           ex=ValueError(
               f"Metadata.ContainerName '{target_name}' does not match any "
               f"container in ContainerDefinitions"
           ),
       )
   if len(container_defs) > 1:
       raise exceptions.ExportFailedError(
           resource_id=getattr(self, "_resource_id", ""),
           property_name=self.PROPERTY_NAME,
           property_value=None,
           ex=ValueError(
               "ContainerDefinitions has multiple containers but "
               "Metadata.ContainerName is not set"
           ),
       )
   return 0

This was raised in the previous review and has not been addressed or dismissed — re-raising.


if family:
# Paginate through all clusters
cluster_paginator = ecs_client.get_paginator("list_clusters")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PERFORMANCE] _force_ecs_deployment performs an unbounded scan of the entire account on every sam sync iteration: it paginates list_clusters, then list_services for every cluster, then batches describe_services in chunks of 10 — only to filter results client-side by taskDefinition family. In an account with N clusters and M services per cluster this is O(N + N·M/10) API calls per task-definition sync, repeated on every file-watch iteration of sam sync --watch. This is slow, easy to throttle (describe_services has aggressive rate limits in shared accounts), and forces the deployer's IAM principal to hold ecs:ListClusters and ecs:ListServices across the whole account — privileges that are typically far broader than what's needed to deploy a single task definition.

The user already provides the stack name (self._deploy_context.stack_name). The supported design is to discover services in the same CloudFormation stack rather than scanning the account: enumerate AWS::ECS::Service resources from the stack (you already load self._stacks), resolve their physical IDs from self._physical_id_mapping, and call update_service directly — or describe StackResources to find services pointing at this task definition family. This is O(services-in-stack) and uses only the IAM permissions the user already needs to deploy the stack.

If a wider scan is required for some edge case, gate it behind an opt-in flag rather than making it the default path on every sync.

This was raised in the previous review and has not been addressed or dismissed — re-raising.

@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 825128f to 03fd9e6 Compare May 15, 2026 09:42

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..03fd9e6
Files: 19
Comments: 2

Comment thread samcli/lib/build/app_builder.py Outdated
"Metadata.ContainerName '%s' does not match any container in ContainerDefinitions",
target_name,
)
container_defs[0]["Image"] = path

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When Metadata.ContainerName is set but no container in ContainerDefinitions has that Name (typo, renamed container, copy/paste from another task definition), the for…else falls through, logs a warning, and silently writes the freshly built image to container_defs[0]["Image"]. In a multi-container task definition where index 0 is a sidecar, this overwrites the sidecar's image with the user's app image — exactly the layout used by the integration test fixture added in this PR (tests/integration/testdata/buildcmd/container_image/template.yaml):

ContainerDefinitions:
  - Name: sidecar
   Image: public.ecr.aws/envoy:latest
  - Name: web              # if user mistypes ContainerName: 'wbe', envoy gets overwritten
   Image: placeholder

The post-build template the user inspects/deploys still has the placeholder on the intended container, while the unrelated sidecar entry now points at their freshly pushed ECR image — a hard-to-diagnose deploy-time failure. A warning in the build log is easy to miss in CI output.

This was raised on the previous review and not addressed. Fail loudly when the explicit ContainerName doesn't match — raise or skip the update — rather than guessing index 0:

if target_name:
   for container_def in container_defs:
       if container_def.get("Name") == target_name:
           container_def["Image"] = path
           break
   else:
       raise InvalidResourceException(
           resource_name,
           f"Metadata.ContainerName '{target_name}' does not match any "
           f"container in ContainerDefinitions",
       )
else:
   container_defs[0]["Image"] = path

"Metadata.ContainerName '%s' does not match any container in ContainerDefinitions, defaulting to index 0",
target_name,
)
return 0

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] ECSTaskDefinitionImageResource._get_target_index has the same silent-fallback problem: when Metadata.ContainerName is set but doesn't match any Name in ContainerDefinitions, it logs a warning and returns 0. sam package / sam deploy then rewrite ContainerDefinitions[0]["Image"] — typically a sidecar in multi-container task definitions — replacing it with the pushed ECR URI. The user's intended container keeps its placeholder, and the sidecar starts running the wrong image at deploy time.

This is the package-side counterpart of the build-side bug above and compounds it: even if the build path is fixed, package will still corrupt the template here. It was raised on the previous review and not addressed.

Either return a sentinel and let export skip with a hard error, or raise directly. Returning index 0 on a typo is the most surprising possible behavior for a multi-container resource:

def gettarget_index(self, container_defs):
   metadata = getattr(self, "resource_metadata", None) or {}
   target_name = metadata.get("ContainerName")
   if target_name:
       for i, cd in enumerate(container_defs):
           if cd.get("Name") == target_name:
               return i
       raise exceptions.ExportFailedError(
           resource_id=...,
           property_name=self.PROPERTY_NAME,
           property_value=target_name,
           ex=ValueError(
               f"Metadata.ContainerName '{target_name}' does not match "
               f"any container in ContainerDefinitions"
           ),
       )
   return 0

@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 03fd9e6 to e0e5b2a Compare May 15, 2026 09:45

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..e0e5b2a
Files: 19
Comments: 1

container_defs = resource_dict.get("ContainerDefinitions", [])
if not container_defs:
return
target_idx = self._get_target_index(container_defs)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] ECSTaskDefinitionImageResource.delete() calls _get_target_index(container_defs), which reads self.resource_metadata via getattr(self, "resource_metadata", None) or {}. However, resource_metadata is only set on the exporter instance during the export flow (artifact_exporter.py:495) — the delete flow at artifact_exporter.py:522 instantiates the exporter and calls exporter.delete(...) without ever assigning resource_metadata.

As a result, during sam delete, _get_target_index always falls through to return 0, even when Metadata.ContainerName is set. In the multi-container TaskDefinition layout used by the integration test fixture in this PR (tests/integration/testdata/buildcmd/container_image/template.yaml) — where index 0 is a sidecar (public.ecr.aws/envoy:latest) and the user's container is web at index 1 — sam delete will inspect the sidecar's Image instead of the user's container. With a public image at index 0, the deletion silently no-ops (the user's pushed ECR image is leaked), and if index 0 happened to be a private ECR image, the wrong artifact would be deleted.

This is the same class of silent-fallback bug the export path was just fixed for. Either:

  • set exporter.resource_metadata = resource.get("Metadata") on the delete path in artifact_exporter.py before calling exporter.delete(...), mirroring the export path; or
  • have the delete code path resolve ContainerName from the resource_dict directly rather than relying on instance state set elsewhere.
# samcli/lib/package/artifact_exporter.py — delete() loop
exporter = exporter_class(self.uploaders, None)
exporter.resource_metadata = resource.get("Metadata")  # add this
exporter.delete(resource_id, resource_dict)

@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from e0e5b2a to a29380d Compare May 15, 2026 10:39
Extends SAM CLI to build, package, and deploy container images for
AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources
using the same Metadata convention as Lambda Image functions.

Fixes aws#8933
@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from a29380d to 2eeb2b3 Compare May 17, 2026 14:32
@reedham-aws

Copy link
Copy Markdown
Contributor

Hello, could you review the failures in the tests?

…lure

- Extract magic number 3 to _ECS_SERVICE_ARN_PARTS constant
- Switch test Dockerfile to public.ecr.aws alpine (multi-arch, no rate limits)
Windows CI runners use Windows containers and cannot pull Linux images.
This matches the pattern used by existing Lambda image build tests.

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..ec83c23
Files: 19
Comments: 1

for svc in svc_response.get("services", []):
svc_task_def = svc.get("taskDefinition", "")
# Check if this service references our task definition family
if physical_id in svc_task_def or (

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] The substring check physical_id in svc_task_def can produce false positives that trigger forceNewDeployment on the wrong ECS service. physical_id is the TaskDefinition ARN including revision (e.g. arn:aws:ecs:...:task-definition/my-app:5), and ARNs are prefix-substrings of higher-revision ARNs of the same family (arn:aws:ecs:...:task-definition/my-app:50). They are also substrings of family names that share a prefix (e.g. my-app vs my-app-staging). When that happens, sam sync issues forceNewDeployment against an unrelated service in the account.

The family-name comparison on the right side of the or already correctly identifies services that reference the same task-definition family, regardless of revision. Drop the substring check and rely on the family-name comparison alone:

svc_family = svc_task_def.rsplit("/", 1)[-1].split(":", 1)[0]
my_family = physical_id.rsplit("/", 1)[-1].split(":", 1)[0]
if svc_family and svc_family == my_family:
   ecs_client.update_service(
       cluster=cluster,
       service=service_name,
       forceNewDeployment=True,
   )

…nd add error case tests

- Remove substring check in _force_ecs_deployment that could match wrong
  services (e.g. my-app:5 matching my-app:50). Use family-name comparison only.
- Add unit tests for ContainerName mismatch error handling in both
  app_builder and packageable_resources.
- Clean up unused imports in test file.
@singledigit

Copy link
Copy Markdown
Author

Hi @reedham-aws — the test failures have been resolved:

  1. Lint (ruff PLR2004) — Extracted magic number to a named constant.
  2. Windows integ test — Added @skipIf for Windows CI (matches existing Lambda image test pattern — Windows runners can't pull Linux images).
  3. PR Reviewer bot findings — Fixed the substring false-positive in _force_ecs_deployment and added unit tests for the ContainerName mismatch error paths.

All 51 real checks are green now. The one remaining failure was a pyinstaller-linux infra flake (build artifact not produced), so I've retriggered CI. Ready for review when you get a chance!

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..ca5a310
Files: 19
Comments: 1


builder = ApplicationBuilder(
(
self._build_context.collect_build_resources(self._resource_identifier)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When sam sync falls into _build_from_scratch (i.e. _application_build_result is None), ApplicationBuilder is constructed with:

self._build_context.collect_build_resources(self._resource_identifier)
if hasattr(self._build_context, "collect_build_resources")
else self._build_context.get_resources_to_build()

BuildContext.collect_build_resources() only collects functions (via _collect_single_function_and_dependent_layers) and layers (via _collect_single_buildable_layer). When neither is found it raises ResourceNotFound("Unable to find a function or layer with name ''") (build_context.py:1265-1271). Since the _resource_identifier here is an AWS::ECS::TaskDefinition or AWS::BedrockAgentCore::Runtime logical ID — never a function or layer — this branch will always raise on first reach, breaking the from-scratch sync path for ECS/AgentCore resources.

The hasattr(...) guard is also dead code: BuildContext always defines collect_build_resources, so the else branch (get_resources_to_build()) is never taken.

Suggested fix: build an empty/synthetic ResourcesToBuildCollector (the actual container build is performed by builder.build_container_images([build_def]) on line 127, which doesn't depend on resources_to_build), or guard _build_from_scratch so it short-circuits when only container resources are involved.

collect_build_resources() only handles functions/layers and raises
ResourceNotFound for ECS/AgentCore resource IDs. Since
build_container_images() doesn't depend on resources_to_build,
pass an empty collector instead.
@bnusunny

bnusunny commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Thanks for this contribution — the design is generally well-shaped and the build/package paths look solid. Before we can merge, there are three correctness issues in the sync path and the multi-container handling that need to be addressed.

1. sam sync for ECS pushes a new image but redeploys the old TaskDefinition revision

File: samcli/lib/sync/flows/ecs_container_sync_flow.py:153-156

ECRUploader.upload builds tags via tag_translation (samcli/lib/package/image_utils.py:54), which produces {name}-{12-char-image-id}-{tag}. The 12-char segment is derived from the docker image id, so any source change yields a new ECR tag. The flow currently:

  1. Pushes the new image under the new tag.
  2. Discards the upload return value (compare ImageFunctionSyncFlow.sync at image_function_sync_flow.py:150, which captures image_uri = ecr_uploader.upload(...) and feeds it to update_function).
  3. Calls ecs_client.update_service(forceNewDeployment=True).

forceNewDeployment redeploys the same task-definition revision, which still references the previous image tag. The freshly-pushed tag is never consulted, but the logs say "Forced new deployment for service …" so the failure is silent.

To make sync work end-to-end, the flow needs to register a new TaskDefinition revision with the updated ContainerDefinitions[i].Image (using the URI returned from ecr_uploader.upload) and then update the service to that revision. RegisterTaskDefinition is not called anywhere in the codebase today — that step is missing.

2. AgentCore sync has no rollout step

File: samcli/lib/sync/flows/ecs_container_sync_flow.py (whole sync() path)

AWS::BedrockAgentCore::Runtime is wired to _create_ecs_container_flow in sync_flow_factory.py:374, so AgentCore resources land in ECSContainerSyncFlow.sync(). After the ECR push, the only update step is _force_ecs_deployment, which iterates physical_id_mapping looking for "service/" substrings. AgentCore Runtime physical IDs are ARNs of the form arn:aws:bedrock-agentcore:…:runtime/… — no service/ segment — so the loop body never fires for AgentCore. There is no bedrock-agentcore boto client constructed anywhere in samcli/, and no UpdateAgentRuntime call.

The design doc's success criterion 4 says sync "triggers redeployment for these resources," but the implementation doesn't honor that for AgentCore. Either:

  • Implement an AgentCore-specific update step (e.g., bedrock-agentcore.UpdateAgentRuntime against AgentRuntimeArtifact.ContainerConfiguration.ContainerUri), or
  • If AgentCore sync is out of scope for this PR, drop it from GENERATOR_MAPPING and document the limitation.

A single ECSContainerSyncFlow covering both resource types also feels like the wrong altitude — the runtime APIs are completely different. Consider splitting into ECSTaskDefinitionSyncFlow and AgentCoreRuntimeSyncFlow so each owns its own update path cleanly.

3. Multi-container TaskDefinitions silently mis-deploy when Metadata.ContainerName is unset

Files: samcli/lib/build/app_builder.py:420-435, samcli/lib/package/packageable_resources.py:693-744

When a TaskDefinition has more than one entry in ContainerDefinitions and no Metadata.ContainerName, the build path silently writes container_defs[0]["Image"] = path and the package path's _get_target_index returns 0. Sidecars and other containers keep whatever the user wrote (often literal local strings like "my-app:latest" that aren't valid image URIs). CloudFormation does not validate Image strings, so the deploy succeeds; ECS then rejects the task at launch with CannotPullContainerError, far away from the SAM CLI invocation.

The design doc's example template (lines 77–92) explicitly shows ContainerName: web for a multi-container TaskDef, so the intent is clear — but the code doesn't enforce it. Two reasonable options:

  • Strict (preferred for multi-container): if len(ContainerDefinitions) > 1 and ContainerName is missing, raise a clear DockerBuildFailed/InvalidTemplate error pointing the user at the metadata key.
  • Relaxed: keep the index-0 fallback but LOG.warning when there's more than one container, so the silent footgun is at least visible.

Either way, the docstring at packageable_resources.py:696 ("The image reference is at ContainerDefinitions[0].Image") should be updated — it bakes the silent assumption into the contract.


A few smaller things you'll likely want to address in the same revision (non-blocking but worth bundling):

  • _update_built_resource AgentCore branch uses chained setdefault on AgentRuntimeArtifact, which corrupts {"Ref": "Foo"} intrinsics or raises AttributeError on a non-dict value. The Lambda IMAGE branches (resource_properties["Code"] = {"ImageUri": path}) overwrite the whole property — please follow that pattern.
  • _force_ecs_deployment's "service/" in physical_id substring check matches App Runner / AppMesh / VPC Lattice ARNs and triggers spurious describe_services calls. Tighten to an ECS service ARN prefix check.
  • SyncCodeResources._accepted_resources was not extended; sam sync --code-resource AWS::ECS::TaskDefinition is rejected by click.Choice before any flow is constructed.
  • CONTAINER_BUILD_DEFINITIONS, _container_build_definitions, get_container_build_definitions, and put_container_build_definition in build_graph.py have zero callers and are not (de)serialized in _read/_write. Either wire them up to provide caching/incremental parity with Lambda, or drop them from this PR.
  • _get_resource_api_calls returns ApiCallTypes.UPDATE_FUNCTION_CODE for ECS/AgentCore. It works today only because of CFN logical-id uniqueness; please add an ECS-appropriate enum value.
  • _has_container_build_metadata and _update_built_resource should isinstance(metadata, dict) rather than if not metadata, matching the existing convention in sam_function_provider.py:825. Today a non-dict Metadata crashes with AttributeError.

Happy to re-review once these are addressed. Thanks again for the work — this is a feature lots of folks have been asking for.

bnusunny added 3 commits June 6, 2026 12:34
…tension functions (aws#9030)

* refactor(package): replace METADATA_EXPORT_LIST and GLOBAL_EXPORT_DICT with typed registries

Migrate two flat package exporter registries to typed dataclass form:

- METADATA_EXPORT_LIST (list of Resource subclasses) -> METADATA_EXPORTS
  (List[MetadataExportSpec]). Each spec captures both the metadata_type
  ('AWS::ServerlessRepo::Application') and the per-property exporter
  classes that handle LicenseUrl/ReadmeUrl. Template._export_metadata
  now dispatches via a metadata_type lookup instead of a per-class
  RESOURCE_TYPE filter.

- GLOBAL_EXPORT_DICT (dict keyed by Fn::Transform) -> GLOBAL_TRANSFORM_EXPORTS
  (List[GlobalTransformExportSpec]). Each spec carries a discriminator
  callable (e.g. _is_aws_include for matching AWS::Include) plus the
  handler. _export_global_artifacts now dispatches through the
  discriminator so future Fn::Transform variants can register without
  touching the walker.

The typed shape is the contract a follow-up commit will read from to
process AWS::Include before language-extension expansion.

* feat(package): merge SAR Metadata exports back into LE child templates

Extend merge_language_extensions_s3_uris with a registry-driven Metadata
pass that copies rewritten property values (LicenseUrl, ReadmeUrl) from
the exported template back into the original (Fn::ForEach-preserving)
template after LE expansion.

Without this pass, when a child stack uses Transform: AWS::LanguageExtensions
and declares AWS::ServerlessRepo::Application metadata, sam package
silently dropped the License/Readme S3 URLs from the merged output —
they were uploaded but never wired into the template the user deploys.

Implementation iterates METADATA_EXPORTS (the registry added in the prior
commit) so future metadata types pick up merge support without touching
the merge walker.

* fix(package): process AWS::Include before language-extension expansion (aws#9027)

Closes aws#9027.

Symptom: when a template uses both Transform: AWS::LanguageExtensions and
contains Fn::Transform: AWS::Include buried inside an LE function (e.g.
Fn::ToJsonString), sam package fails to rewrite the include's Location to
an S3 URL. CloudFormation then rejects the deploy with 'The location
parameter is not a valid S3 uri.'

Root cause: PackageContext._export ran expand_language_extensions BEFORE
the artifact exporter walked Fn::Transform nodes. LE functions like
Fn::ToJsonString json.dumps() their argument, collapsing the structural
Fn::Transform subtree into a JSON-string literal. By the time the
exporter ran, the include was no longer a structural dict node and was
invisible to the global-transform walker.

Fix: process AWS::Include (and any other GLOBAL_TRANSFORM_EXPORTS
handler) on the original template BEFORE LE expansion runs, mirroring
CloudFormation's own server-side transform ordering — CFN resolves
inline Fn::Transform macros before evaluating AWS::LanguageExtensions.

Implementation:

- Extract Template._export_global_artifacts to a module-level
  _export_global_artifacts_pass(template_dict, uploader, template_dir).
  The instance method becomes a one-line delegate so existing callers
  keep working.
- Call _export_global_artifacts_pass on the original template before
  expand_language_extensions in PackageContext._export (root flow) and
  in CloudFormationStackResource.do_export (nested-stack child flow).

Dynamic-Location AWS::Include inside Fn::ForEach (e.g.
Location: ./swagger-${Name}.yaml) is not supported by sam package: a
local file path with literal ${...} placeholders does not exist on disk,
so is_local_file fails and the existing InvalidLocalPathError fires —
which is the right user-facing failure. CloudFormation does not
substitute loop variables into Fn::Transform paths server-side either,
so the limitation matches CFN's actual capability.

* test(package): integration coverage for AWS::Include + SAR Metadata in LE templates

Three end-to-end tests that exercise the package_context._export-equivalent
flow on language-extension templates:

- test_le_template_with_top_level_aws_include_merges_location verifies
  AWS::Include in Outputs gets its Location rewritten to s3:// after
  the pre-LE pass.
- test_le_template_with_serverless_repo_metadata_merges_license_url
  verifies SAR LicenseUrl/ReadmeUrl in Metadata get merged back.
- TestPackageContextIssue9027 reproduces the user template from
  aws#9027 (Fn::ToJsonString over Fn::Transform: AWS::Include
  buried inside AWS::SSM::Parameter) and asserts the buried Location is
  rewritten to s3://. Locks down the regression.

* docs(cfn-lang-ext): document AWS::Include processing order vs language extensions

Add a section explaining that sam package processes Fn::Transform: AWS::Include
before language-extension expansion, mirroring CloudFormation's server-side
transform ordering. This means AWS::Include Location rewrites work correctly
even when the include lives buried inside language-extension functions like
Fn::ToJsonString or Fn::ForEach bodies.

* test(package): align LE tests with opt-in API from aws#9033

PR aws#9033 made AWS::LanguageExtensions local processing opt-in. Two API
changes broke three tests on this branch after the merge from develop:

- expand_language_extensions() now requires a keyword-only `enabled` arg
- PackageContext reads self._language_extensions_enabled, set in __init__

Pass enabled=True to expand_language_extensions in the two artifact-exporter
tests, and set _language_extensions_enabled on the PackageContext stub that
bypasses __init__ via __new__. All three tests exercise templates with
Transform: AWS::LanguageExtensions, so enabling LE is the intended behavior.

* test(package): hoist inline imports in TestPackageContextBuriedAWSInclude

Move os, Destination/Uploaders, and yaml_parse to module-level imports;
drop the redundant inline tempfile/PackageContext (already at module level).

Addresses inline-imports review comment on aws#9030.

* refactor(package): hoist InvalidSamDocumentException and expand_language_extensions imports

Move the two inline imports inside _export() to module scope. No
behavior change; this is preparation for the upcoming _export() split
which references these from a new branch method.

* refactor(package): add _export_without_language_extensions (off-path branch)

New private method that mirrors pre-1.160.0 sam-cli behavior: a tight
Template.export() pipeline with no LE machinery. Unused by _export()
until the dispatcher is cut over in a follow-up commit.

Also adds two structural-gate smoke tests that stay red until the
dispatcher is wired up — they assert that the off path never invokes
expand_language_extensions or the pre-LE _export_global_artifacts_pass.

* refactor(package): add _export_with_language_extensions (on-path branch)

New private method containing the existing aws#9027 ordering: pre-LE
include pass, LE expansion, Template.export(), merge, Mappings.
Unused by _export() until the dispatcher is cut over in the next commit.

* refactor(package): split _export() into LE / non-LE branches gated on flag

_export() becomes a thin dispatcher: read the template, branch on
self._language_extensions_enabled, dump. The two branch methods added
in the prior two commits now own the actual work.

Treats --language-extensions as a hard correctness boundary: when off,
no LE machinery is invoked at all (no expand_language_extensions, no
pre-LE _export_global_artifacts_pass, no merge, no Mappings).
Template.export() still handles AWS::Include for non-LE templates via
its own internal _export_global_artifacts pass — that path has worked
since long before the aws#9027 fix and is unchanged by this refactor.

Public surface (PackageContext._export(template_path, use_json) -> str)
is preserved; all existing _export tests pass unchanged.

* test(package): repoint expand_language_extensions patches to package_context use site

The hoist in 756c502 moved `expand_language_extensions` to a
module-level import in `package_context.py`. The use-site binding is
now captured at module load, so existing tests that patched the source
module `samcli.lib.cfn_language_extensions.sam_integration.expand_language_extensions`
no longer intercept calls from `_export()`.

Repoint two patches at the use-site
`samcli.commands.package.package_context.expand_language_extensions`:

- `test_phase_separation.py::test_package_context_export_calls_expand_language_extensions`
  was failing intermittently in CI depending on test-collection order.
- `test_package_context.py::test_off_path_does_not_invoke_expand_language_extensions`
  is retargeted for consistency now that the inline-import shadow is
  gone (post-Task-4 cutover); the dead `had_language_extensions=False`
  scaffolding can also go since the off path no longer calls expand.

Other source-module patches in `test_artifact_exporter.py` are correct
as-is — those tests cover `Template.export()` paths whose use site is
the artifact_exporter module.

* refactor(package): hoist do_export inline imports and repoint test patches

Hoist expand_language_extensions, IntrinsicsSymbolTable,
generate_and_apply_artifact_mappings, and merge_language_extensions_s3_uris
to module-level imports in artifact_exporter.py — preparation for the
LE / non-LE structural-gate split of CloudFormationStackResource.do_export.

Module-level binding requires existing tests that patched
samcli.lib.cfn_language_extensions.sam_integration.expand_language_extensions
(the source module) to repoint at the use-site
samcli.lib.package.artifact_exporter.expand_language_extensions, since
the use-site name is captured at module load and source-module patches
no longer intercept once the inline import is gone. Same lesson as
PR aws#9030 commit 45dea4b for package_context.

* refactor(package): add _do_export_without_language_extensions branch method

LE-off branch for CloudFormationStackResource: path-based Template
construction with no pre-LE pass, no parameter_values, no deepcopy.
Method is unused until the dispatcher rewrite — wiring lands in the
follow-up commit. Adds TestDoExportLanguageExtensionsStructuralGate
with two red structural-gate assertions that the dispatcher rewrite
will turn green.

* refactor(package): add _do_export_with_language_extensions branch method

Lifts the existing LE-expansion body from CloudFormationStackResource.do_export
into a dedicated method (resource_id, template_path, parent_dir,
abs_template_path, resource_dict) -> Dict. No behavior change yet — the
dispatcher in the follow-up commit will route LE-on calls into it.

Threads resource_dict so the branch can read nested-stack Parameters
without the dispatcher passing them separately. Returns the exported
template dict so the dispatcher owns the final yaml_dump + upload tail.

* refactor(package): split do_export into LE / non-LE branches gated on flag

CloudFormationStackResource.do_export is now a thin dispatcher that:
  1) does the early-exit guards (None / S3 URL / non-local file)
  2) routes to _do_export_with_language_extensions or
     _do_export_without_language_extensions based on
     self.language_extensions_enabled
  3) owns the final yaml_dump + upload + property mutation tail.

The off path is now structurally LE-free: no expand_language_extensions
call, no _export_global_artifacts_pass, no IntrinsicsSymbolTable
pseudo-param plumbing, no parent_parameter_values, no copy.deepcopy.
Template.export() runs its own internal _export_global_artifacts so
AWS::Include still resolves on the off path.

The structural-gate tests added in the previous commit are now green.
Existing LE tests that rely on language_extensions_enabled = True now
set the flag explicitly (where they previously depended on the LE
machinery running unconditionally as a passthrough).

* chore(package): apply make format to do_export refactor

Black collapsed multi-line calls and signatures whose single-line
form fits under the 120-char limit. No behavior change.

* refactor(package): add _build_child_parameter_values helper

Adds a CFN-parity helper that builds the parameter_values dict for a
nested child stack: pseudo-params (with parent pseudo-name overrides
honored) plus parent-rebound values via the nested-stack Parameters
property. Non-pseudo parent names are not copied; child Defaults are
read by the LE expander itself via parsed_template.parameters.

Helper is unused production code in this commit. Wired into
CloudFormationStackResource._do_export_with_language_extensions in
the next commit.

* fix(package): tighten parent_parameter_values scoping in LE path

CloudFormationStackResource._do_export_with_language_extensions used to
bulk-copy the parent stack's full parameter_values into the child, so a
parent's `Foo=parent_foo` could silently shadow an unrelated child
Parameter named Foo. This diverged from CloudFormation's nested-stack
contract (only parent-rebound names reach the child) and from the
non-LE path (which threads no parameters at all).

Replace the merge block with _build_child_parameter_values, which
returns pseudo-params (with parent pseudo-name overrides honored) plus
parent-rebound values from the nested-stack Parameters property. Child
template Defaults still resolve via the LE expander's parsed_template
fallback (resolvers/fn_ref.py:_resolve_parameter).

Adds end-to-end coverage:
- non-pseudo parent param does not leak into child's parameter_values
- child Default still resolves via resolver fallback after the change

* refactor(package): trim Task 2 redundant comments and verbose test docstrings

Code-review polish for the LE parent-param scoping fix:
- drop call-site comment block at _do_export_with_language_extensions
  that duplicated _build_child_parameter_values's docstring
- shorten the two new test docstrings in
  TestCloudFormationStackResourceChildExpansion to match neighbor
  style (no internal module paths)

* refactor(tests): hoist inline imports added in LE parent-param fix

Move imports introduced by the previous two commits from inside test
bodies up to the module-level import block:
- _build_child_parameter_values, IntrinsicsSymbolTable
- yaml_dump (samcli.yamlhelper), shutil, inspect (stdlib)

None of these symbols are @patch targets, so hoisting does not affect
any mock binding (use-site patches in artifact_exporter remain correct).

* refactor(tests): hoist remaining inline imports and extract LE fixtures

Address review feedback on PR aws#9030: inline imports should live at
module scope, and on-the-fly file writes for LE child templates should
use checked-in fixtures instead of tempfile.mkdtemp() + yaml_dump().

Hoist inline imports (LanguageExtensionResult, _resolve_nested_stack_parameters,
yaml_dump, shutil, the packageable_resources block) to the module-level
import block. None are @patch targets, so use-site patches remain correct.

Add tests/unit/lib/package/test_data/ following the precedent in
tests/unit/lib/intrinsic_resolver/test_data, with TEST_DATA_PATH
constant defined as Path(__file__).resolve().parent / "test_data".

Convert the LE parent-param test methods, the buried-AWS::Include test,
the expansion-error-handling pair, and the structural-gate pair to read
their child templates (and the export-events.json include target) from
the fixture tree. The on-the-fly tempdir + yaml_dump scaffolding is
removed entirely along with the corresponding _write_child /
_write_minimal_child helpers.

No production code change; all 110 unit tests in test_artifact_exporter
still pass.

* refactor(tests): extract TestPackageContextBuriedAWSInclude inline fixture

Replace the on-the-fly tempfile + open()+write() scaffolding in
TestPackageContextBuriedAWSInclude.setUp with a checked-in fixture under
tests/unit/commands/package/test_data/buried_aws_include/. Mirrors the
TEST_DATA_PATH pattern already in tests/unit/lib/package/test_data and
tests/unit/lib/intrinsic_resolver/test_data.

setUp now resolves template_path from TEST_DATA_PATH; the export-events
include target lives next to template.yaml so the relative Location in
the template still resolves at sam-package time.

No production code change; all 146 tests in test_package_context still
pass.
…8934)

Replace hand-rolled copytree implementations with
shutil.copytree(..., symlinks=True, dirs_exist_ok=True) in both
osutils.py and copy_terraform_built_artifacts.py.

The custom implementations were originally needed for Python 3.7
compatibility (dirs_exist_ok was added in 3.8), but SAM CLI now
requires Python 3.10+, making them unnecessary. Using the stdlib
version with symlinks=True correctly preserves both file and directory
symlinks without following them.

Changes:
- osutils.copytree: replaced ~40-line manual implementation with
  single shutil.copytree call; removed unused errno import; replaced
  os.remove try/except with Path.unlink(missing_ok=True) in remove()
- Terraform copy_terraform_built_artifacts.copytree: same replacement
- Updated unit tests to match new implementation
- Added integration-style tests verifying file and directory symlink
  preservation on disk
- Audited os.walk(followlinks=True) in package/utils.py — intentional
  and correct for zip packaging (zip files need dereferenced content)

This is a code quality / defense-in-depth improvement, not a security
fix. We do not believe this warrants a new CVE assignment.
bnusunny and others added 5 commits June 6, 2026 12:34
Extends SAM CLI to build, package, and deploy container images for
AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources
using the same Metadata convention as Lambda Image functions.

Fixes aws#8933
…sync

- sam sync now captures the ECR URI returned by ECRUploader.upload and
  registers a new TaskDefinition revision with the updated image, then
  updates matching ECS services to that revision (fixes silent no-op)
- Tighten _force_ecs_deployment → _update_services_to_task_definition to
  require both the arn:aws:ecs: prefix and :service/ segment, preventing
  false matches on App Runner / VPC Lattice / AppMesh ARNs
- Split AgentCore out of ECSContainerSyncFlow: GENERATOR_MAPPING now routes
  AWS::BedrockAgentCore::Runtime to _create_agentcore_flow which logs a
  clear 'not yet supported' warning and returns None (UpdateAgentRuntime
  not yet implemented)
- Add AWS::ECS::TaskDefinition to SyncCodeResources._accepted_resources so
  sam sync --code-resource AWS::ECS::TaskDefinition is accepted by Click
- Add ApiCallTypes.UPDATE_CONTAINER_IMAGE enum value; use it in
  ECSContainerSyncFlow._get_resource_api_calls instead of UPDATE_FUNCTION_CODE
- Fix multi-container TaskDefinition safety: raise DockerBuildFailed when
  >1 ContainerDefinitions and no Metadata.ContainerName (app_builder.py)
- Fix AgentCore _update_built_resource: replace setdefault chain (which
  corrupts intrinsics) with direct dict assignment
- Fix isinstance(metadata, dict) guards in _update_built_resource,
  _has_container_build_metadata, and ECSTaskDefinitionImageResource
- Update ECSTaskDefinitionImageResource docstring and add LOG.warning for
  multi-container fallback (packageable_resources.py)
- Remove dead BuildGraph.CONTAINER_BUILD_DEFINITIONS constant and
  get/put_container_build_definition methods (zero callers, not
  serialized in _read/_write)
- Add test_ecs_container_sync_flow.py with 17 unit tests covering the
  new sync path, ARN filtering, error handling, and tag propagation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..854065b
Files: 49
Comments: 1

property_name=self.PROPERTY_NAME,
property_value=target_name,
ex=ValueError(
f"Metadata.ContainerName '{target_name}' does not match any container in ContainerDefinitions"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] The silent fallback in _get_target_index is inconsistent with the build path and can cause sam package to upload to the wrong container. When Metadata.ContainerName is unset on a multi-container AWS::ECS::TaskDefinition, this code logs a warning and returns index 0:

if len(container_defs) > 1:
   LOG.warning(
       "TaskDefinition has multiple containers but Metadata.ContainerName is not set; "
       "packaging the first container. Add 'ContainerName: <name>' to the resource Metadata "
       "to avoid ambiguity."
   )
return 0

The corresponding build path in app_builder.py:_update_built_resource handles the exact same scenario by raising DockerBuildFailed:

elif len(container_defs) > 1:
   raise DockerBuildFailed(
       f"TaskDefinition has {len(container_defs)} containers but Metadata.ContainerName is not set. "
       f"Add 'ContainerName: <name>' to the resource Metadata to specify which container to build."
   )

When sam package is run independently of sam build (e.g., user pre-built their image, or running on a templated artifact pipeline), this fall-through still triggers the same silent-wrong-container failure mode the previous bot comments flagged for the ContainerName-mismatch case: the freshly pushed ECR URI is written into ContainerDefinitions[0]["Image"] — typically a sidecar in a multi-container task definition (e.g. public.ecr.aws/envoy:latest in your own integration test fixture). The user's intended container retains its placeholder, while the sidecar is silently overwritten.

The fix is to mirror the build path and raise ExportFailedError here instead of warning + returning 0. The existing raise exceptions.ExportFailedError(...) branch above (for the target_name-set-but-no-match case) is the right pattern to follow.

@singledigit

Copy link
Copy Markdown
Author

Hi @bnusunny — thanks for the thorough review. All findings have been addressed and pushed. Here's what changed:

Blocking issues

1. sam sync deploys old TaskDefinition revision
ECRUploader.upload's return value is now captured. After the ECR push, the flow calls ecs.describe_task_definition to fetch the current definition, swaps the image field on the target container, registers a new revision via ecs.register_task_definition, then calls ecs.update_service(taskDefinition=<new_arn>) — matching the pattern from ImageFunctionSyncFlow. Tags are preserved.

2. AgentCore sync has no rollout step
AWS::BedrockAgentCore::Runtime is now routed to a new _create_agentcore_flow that logs a clear warning ("sam sync is not yet supported for AWS::BedrockAgentCore::Runtime — use sam deploy") and returns None. I agree the right long-term shape is a separate AgentCoreRuntimeSyncFlow with an UpdateAgentRuntime call; I've left that as a follow-up rather than a half-finished implementation here.

3. Multi-container TaskDefinition silent mis-deploy
app_builder._update_built_resource now raises DockerBuildFailed when len(ContainerDefinitions) > 1 and Metadata.ContainerName is not set, with a message pointing the user at the metadata key. Went with the strict (preferred) option.

Non-blocking fixes

  • AgentCore setdefault chain — replaced with direct dict assignment (resource_properties["AgentRuntimeArtifact"] = {"ContainerConfiguration": {"ContainerUri": path}}) matching the Lambda IMAGE pattern.
  • _force_ecs_deployment ARN check — tightened to require both arn:aws:ecs: prefix and :service/ segment; App Runner / VPC Lattice / AppMesh ARNs are now correctly skipped.
  • SyncCodeResources._accepted_resourcesAWS::ECS::TaskDefinition added so sam sync --code-resource AWS::ECS::TaskDefinition is accepted by Click.
  • ApiCallTypes — added UPDATE_CONTAINER_IMAGE = "UpdateContainerImage"; ECSContainerSyncFlow._get_resource_api_calls now uses it instead of UPDATE_FUNCTION_CODE.
  • isinstance(metadata, dict) guards — applied in _update_built_resource, _has_container_build_metadata, and ECSTaskDefinitionImageResource._get_target_index.
  • Dead build_graph.py codeCONTAINER_BUILD_DEFINITIONS, get_container_build_definitions, and put_container_build_definition removed (zero callers, not serialized in _read/_write).

Added 17 unit tests for ECSContainerSyncFlow covering the new sync path, ARN filtering, tag propagation, named-container targeting, and error handling.

… TaskDef

Mirror the build path: when len(ContainerDefinitions) > 1 and
Metadata.ContainerName is unset, sam package now raises ExportFailedError
instead of silently packaging the first container. Prevents the
wrong-container overwrite in standalone sam package invocations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@singledigit

Copy link
Copy Markdown
Author

Good catch on the _get_target_index inconsistency — the warning+fallback was the wrong call given that the build path already raises. Fixed to raise ExportFailedError with the same message pattern, so sam package now fails fast with a clear error on multi-container TaskDefinitions when Metadata.ContainerName is unset. Pushed in 99822c6.

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..99822c6
Files: 16 (production code only — skipped tests, fixtures, and docs)
Comments: 2


Comments on lines outside the diff:

[samcli/lib/package/artifact_exporter.py:619] [BUG] -622

The new export loop sets exporter.resource_metadata twice on consecutive lines:

exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")          # first assignment
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")          # duplicate
exporter.export(full_path, resource_dict, self.template_dir)

This looks like a merge artifact from the recent fix that wired resource_metadata into the export path. Both assignments set the same value so it's not functionally broken, but it's confusing for readers and a clear sign the change wasn't reviewed in isolation. Remove one of them. Also note the equivalent block in Template.delete() only sets it once, which is the intended pattern.

updated = True
if container_name is not None:
break

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] -215

In _register_updated_task_definition, when _get_container_name() returns None (i.e. the user did not set Metadata.ContainerName), the loop overwrites the image field on every container in the live task definition:

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

The build path now rejects multi-container local templates without ContainerName, so the assumption seems to be "if container_name is None, there's only one container." But register_input["containerDefinitions"] comes from describe_task_definition against the live task definition, not the local template. Common drift scenarios produce a multi-container live TD with a single-container local template:

  • User deployed with one container, then added a sidecar (xray-daemon, envoy, datadog-agent) via console / aws ecs register-task-definition / a separate IaC tool.
  • User imported an existing multi-container TD into the SAM stack but only manages the primary container in the template.

In both cases, sam sync will silently rewrite every container's image to the freshly built ECR URI, replacing the sidecars. This is the same class of silent-corruption bug the author already fixed in _update_built_resource and _get_target_index — the sync path needs the same guard. Suggest raising (or at minimum logging a warning and skipping) when container_name is None and len(container_defs) > 1, matching the build/package error message pattern.

…lder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..79965b4
Files: 49
Comments: 2

container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When Metadata.ContainerName is unset and the live task definition has multiple containers, this loop silently overwrites the image on every container — including sidecars (e.g. envoy proxy, log router, X-Ray daemon). The build path raises DockerBuildFailed and the package path raises ExportFailedError for the exact same condition; the sync path is now the only execution path that doesn't fail-fast and instead corrupts unrelated containers.

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

Concrete failure scenario: a user runs sam sync on a task definition with [envoy-sidecar, app] and forgot to set ContainerName. The freshly built app image gets written to envoy-sidecar.image AND app.image, then registered as a new revision and rolled out. The sidecar is now broken. The unit tests in test_ecs_container_sync_flow.py only cover the single-container container_name=None case and the explicit container_name="app" case — they never exercise the multi-container + None combination, which is why this slipped through.

Fix: when container_name is None and len(container_defs) > 1, log an error and return None (consistent with the existing "ContainerName not found" early-return below) so a new revision is not registered. Mirroring the build/package error message keeps the three paths consistent.

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
if container_name is None and len(container_defs) > 1:
   LOG.warning(
       "%sTaskDefinition has %d containers but Metadata.ContainerName is not set; "
       "skipping registration. Add 'ContainerName: <name>' to the resource Metadata.",
       self.log_prefix, len(container_defs),
   )
   return None

updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once on line 621 and again on line 623, with the same value, around an unrelated language_extensions_enabled assignment in between:

exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")              # line 621
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")              # line 623 — duplicate
exporter.export(full_path, resource_dict, self.template_dir)

This is functionally a no-op (same value re-assigned) but is clearly an accidental merge/rebase artifact — the diff hunk shows both lines as added. Beyond cosmetics, leaving it in is a maintainability hazard: a future reader changing one site (e.g. wrapping in copy.deepcopy or threading a different metadata source) will likely miss the duplicate and introduce a real divergence. Delete one of the two lines.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..b3a7232
Files: 49
Comments: 2

register_input["tags"] = tags

try:
reg_response = ecs_client.register_task_definition(**register_input)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When Metadata.ContainerName is unset, _get_container_name() returns None and the loop in _register_updated_task_definition overwrites image on every container in the live task definition — including sidecars (envoy/X-Ray/log routers). This is the same condition that the build path (app_builder._update_built_resource) and the package path (packageable_resources._get_target_index, fixed in 99822c6) both fail-fast on with DockerBuildFailed/ExportFailedError. The sync path is now the only path that silently rewrites the wrong containers.

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

This was raised in a prior review and not explicitly dismissed by the author — the 99822c6 fix only covers packageable_resources._get_target_index. Match the build/package semantics: when container_name is None and len(container_defs) > 1, raise (or at minimum return None and abort the registration) with the same "Add 'ContainerName: ' to the resource Metadata…" message used elsewhere. The single-container case (len == 1) can keep the current behavior.

# Export code resources
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once on line 621, again on line 623, with the same value, separated only by an unrelated language_extensions_enabled assignment:

exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")          # line 621
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")          # line 623
exporter.export(full_path, resource_dict, self.template_dir)

Functionally redundant but it has been flagged in multiple previous review iterations without an explicit dismissal. It looks like a merge/rebase artifact and should be deduplicated to avoid confusion about whether one of them was meant to point at a different value (e.g. the intent of one of the two being moved before/after a different setter).

Annotate new_arn as Optional[str] to satisfy mypy's no-any-return check
on the dict.get() chain from boto3 response.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..9d09336
Files: 49
Comments: 2

exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once at line 621 and again at line 623, separated only by an unrelated language_extensions_enabled assignment:

exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")          # line 621
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")          # line 623 — duplicate
exporter.export(full_path, resource_dict, self.template_dir)

The second assignment is redundant — both lines compute and assign identical values to the same attribute. This was flagged in multiple prior reviews (9ffa5ed8..03fd9e68, 999d51fa..b3a72328, 999d51fa..854065ba) but has not been removed. It looks like a merge artifact from when the metadata-passing fix was applied. Drop one of them — keeping line 621 reads more naturally with the other exporter setup.

container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When Metadata.ContainerName is unset and the live task definition fetched via describe_task_definition has multiple containers, this loop silently overwrites image on every container — including sidecars (envoy, X-Ray daemon, log routers):

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

The build path (ApplicationBuilder._update_built_resource) raises DockerBuildFailed on this same condition, and the package path (ECSTaskDefinitionImageResource._get_target_index, fixed in 99822c6) raises ExportFailedError. The sync path is now the only execution path that fails open and corrupts the registered task definition.

This is reachable in practice: a user can run sam build && sam deploy with a single-container template, then later edit the live task definition (or another teammate does) to add a sidecar; on the next sam sync, the local template still has no ContainerName and the sync path will rewrite the sidecar image alongside the app image.

Fail-fast to match the other two paths — when container_name is None and len(container_defs) > 1, raise (or log an error and return None) with the same message pattern: "TaskDefinition has N containers but Metadata.ContainerName is not set...".

…iner strict error

- test_ecs_and_agentcore_use_same_flow_creator → assertNotEqual since
  AgentCore now routes to _create_agentcore_flow, not _create_ecs_container_flow
- test_ecs_task_definition_falls_back_to_first_without_container_name →
  now expects DockerBuildFailed for multi-container TaskDef without ContainerName
- Add @skipIf(IS_WINDOWS) to TestContainerImageBuild integ test —
  Linux container builds fail on Windows CI runners

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..5f39dd8
Files: 49
Comments: 2

exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] The line exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — first at line 621, then again at line 623, with the same right-hand side, separated only by an unrelated language_extensions_enabled assignment:

exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")          # line 621
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")          # line 623 — duplicate
exporter.export(full_path, resource_dict, self.template_dir)

The second assignment is dead code. This was flagged in multiple prior bot reviews (most recently against 9d09336) and was not addressed in 5f39dd8. Drop one of the two assignments — keeping the one above language_extensions_enabled (line 621) is consistent with the order used elsewhere.

"registeredAt",
"registeredBy",
}
register_input = {k: v for k, v in task_def.items() if k not in _READONLY_FIELDS}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] In _register_updated_task_definition, when Metadata.ContainerName is unset (so _get_container_name() returns None) and the live task definition has multiple containers, this loop silently overwrites the image field on every container — including sidecars (envoy, X-Ray daemon, log routers, OTel collectors):

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

When container_name is None, the conditional short-circuits to True for every container, the inner break is never hit, and every container's image is replaced with the freshly pushed URI.

The build path (app_builder._update_built_resource) and the package path (packageable_resources.ECSTaskDefinitionImageResource._get_target_index, both fixed earlier in this PR) raise on the exact same condition — multi-container TaskDefinition without Metadata.ContainerName. The sync path is now the only execution path that doesn't fail-fast, leaving inconsistent behavior between sam build/package/deploy and sam sync for the same template. This was flagged in multiple prior bot reviews and was not addressed.

Recommended fix: when container_name is None and len(container_defs) > 1, log an error and return None (skip the registration) — or, to match the build/package paths, raise so the user gets a clear "set Metadata.ContainerName" message. Either is fine; the current silent multi-overwrite is not.

…--resource

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..9c60e9e
Files: 50
Comments: 2

exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once at line 621, then again at line 623, separated only by an unrelated language_extensions_enabled assignment. The diff added both lines:

exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")          # line 621 (added)
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")          # line 623 (added — duplicate)
exporter.export(full_path, resource_dict, self.template_dir)

The right-hand side is identical, so functionally this is a dead store. It looks like an unresolved merge/rebase artifact. Remove the duplicate at line 623 (the one between language_extensions_enabled and exporter.export) so the assignment appears exactly once and stays grouped with the other exporter-attribute assignments.

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When Metadata.ContainerName is unset (so _get_container_name() returns None) and the live task definition fetched via describe_task_definition has multiple containers, this loop silently overwrites the image field on every container — including sidecars (envoy, X-Ray daemon, log routers, OTel collectors):

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

When container_name is None, the if container_name is not None: break never fires, so the loop runs to completion mutating every entry. The build path (app_builder._update_built_resource) and the package path (packageable_resources.ECSTaskDefinitionImageResource._get_target_index) were both updated in this PR to raise (DockerBuildFailed / ExportFailedError) for exactly this condition; the sync path is now the only execution path that silently corrupts the user's task definition instead of failing fast. Mirror the same fail-fast check here: if container_name is None and len(container_defs) > 1, raise (or log an error and return None like the existing "ContainerName not found" branch does) rather than fanning the new image URI out across every container. Consistency across build/package/sync was the explicit reason the same fallback was removed from the other two paths in this PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..b4d96e4
Files: 50
Comments: 2

exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] The line exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once at line 621, then again at line 623, with the same right-hand side, separated only by an unrelated language_extensions_enabled assignment. The diff added both lines:

exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")          # line 621 (added)
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")          # line 623 (added, duplicate)
exporter.export(full_path, resource_dict, self.template_dir)

The second assignment is dead code — same attribute, same value. This looks like a merge/rebase artifact. It was raised in multiple previous review comments and has not been addressed. Drop one of the two assignments (and ideally place the surviving one adjacent to the other exporter attribute setters for consistency with the delete() path at line ~648, which has only one).

container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] In _register_updated_task_definition, when Metadata.ContainerName is unset (so _get_container_name() returns None) and the live task definition fetched via describe_task_definition has multiple containers, this loop silently overwrites the image field on every container — including sidecars (envoy, X-Ray daemon, log routers, OTel collectors):

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

The build path (app_builder._update_built_resource) raises DockerBuildFailed for exactly this condition, and the package path (packageable_resources.ECSTaskDefinitionImageResource._get_target_index) was just hardened to raise ExportFailedError for the same condition (per the author's response in PR discussion / commit 99822c6). The sync path is now the only execution path that doesn't fail-fast on multi-container TaskDefinitions without ContainerName — and the consequence here is worse than the package path, because this code is mutating the live, deployed TaskDefinition fetched from ECS, then registering a new revision with every container's image clobbered. Multiple previous review comments raised this and it has not been explicitly addressed.

Mirror the build/package behavior: when container_name is None and len(container_defs) > 1, log/raise and return None (skip registration) instead of silently overwriting all containers. Single-container TaskDefinitions (the unambiguous case) can keep the index-0 fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@singledigit

Copy link
Copy Markdown
Author

@bnusunny All the issues from your review have been addressed and CI is fully green (the 2 remaining failures are infra flakes — network/tar errors on the runners before any tests ran, unrelated to our code). Ready for another look when you have a chance!

@aws-sam-tooling-bot aws-sam-tooling-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 999d51f..1a7d7f3
Files: 50
Comments: 2


Comments on lines outside the diff:

[samcli/lib/package/artifact_exporter.py:619] [BUG] -621

exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once at line 619 and again at line 621, with the same right-hand side, separated only by an unrelated language_extensions_enabled assignment:

exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")
exporter.export(full_path, resource_dict, self.template_dir)

The second assignment is dead code — both reads call resource.get("Metadata") on the same resource and the first assignment is never read between them. This was flagged in earlier review iterations and not addressed. Functionally harmless today, but it's a refactoring hazard: anyone who later reorders the setters or adds a property descriptor on resource_metadata will hit a confusing double-fire. Drop the second line (line 621).

tags = response.get("tags", [])

# Fields that are output-only and must be stripped before re-registering
_READONLY_FIELDS = {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] -194

When Metadata.ContainerName is unset (so _get_container_name() returns None) and the live task definition fetched via describe_task_definition has multiple containers, this loop silently overwrites the image field on every container — including sidecars (envoy, X-Ray daemon, log routers, OTel collectors):

container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
   if container_name is None or cd.get("name") == container_name:
       cd["image"] = image_uri
       updated = True
       if container_name is not None:
           break

This is the same condition that the build path raises DockerBuildFailed for (app_builder._update_built_resource, line 433) and that the package path raises ExportFailedError for (packageable_resources.ECSTaskDefinitionImageResource._get_target_index). Those two paths were tightened in this PR specifically to fail fast on this ambiguity; the sync path is now the only one that silently corrupts the live task definition.

A scenario that actually breaks: a TaskDefinition with [envoy-sidecar, app]. sam build correctly fails because Metadata.ContainerName is not set; the user adds ContainerName: app to fix it; on the subsequent sam sync, if the user (or CI) passes --code against an older committed template still missing ContainerName, the sidecar's image is silently rewritten to the user's app image. The new revision is registered and then update_service rolls it out — taking the workload down with no error from sam sync.

Match the build/package paths and raise on this condition (or, at minimum, fail-fast and skip registration with a clear error) when container_name is None and len(container_defs) > 1. The single-container case is the only one safe to default. The test suite already mocks single-container task definitions for the container_name=None paths (test_register_strips_readonly_fields, test_register_tags_included_when_present); please add coverage for the multi-container ambiguous case once it raises.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/build sam build command pr/external stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: Extend SAM container management to Bedrock AgentCore, ECS, and Fargate

4 participants