feat: add ECS and AgentCore container build/package/deploy support#9013
feat: add ECS and AgentCore container build/package/deploy support#9013singledigit wants to merge 24 commits into
Conversation
| if container_def.get("Name") == target_name: | ||
| container_def["Image"] = path | ||
| break | ||
| else: |
There was a problem hiding this comment.
[BUG] The else on line 429 binds to the if target_name: on line 424, not to the for loop on line 425. As a result, when ContainerName is set in Metadata but no container in ContainerDefinitions matches that name (e.g. typo, renamed container), no Image field is updated on any container — silently. The image is built and the ECR repo is created, but the post-build template the user inspects/deploys still has the placeholder. The fallback to the first container only fires when ContainerName is unset.
Either match the for/else semantics so a missing target falls back to the first container, or — better — surface a clear error so the user knows their ContainerName doesn't resolve. Suggested fix:
if target_name:
for container_def in container_defs:
if container_def.get("Name") == target_name:
container_def["Image"] = path
break
else:
raise ValueError(
f"Metadata.ContainerName '{target_name}' does not match any "
f"container in ContainerDefinitions"
)
else:
container_defs[0]["Image"] = pathThe new tests in test_container_build_integration.py only cover the matching and no-name cases, so this regression isn't caught.
| for i, cd in enumerate(container_defs): | ||
| if cd.get("Name") == target_name: | ||
| return i | ||
| return 0 |
There was a problem hiding this comment.
[BUG] _get_target_index returns 0 (the first container) when Metadata.ContainerName is set but doesn't match any container in ContainerDefinitions. This means sam package/sam deploy will silently rewrite the Image of the wrong container — typically a sidecar like public.ecr.aws/envoy:latest in your own integration test template — replacing it with the freshly pushed ECR URI. The user's intended container keeps its placeholder.
Combined with the matching bug in app_builder._update_built_resource, a typo in ContainerName can cause a silent, incorrect deployment rather than a build/package failure.
Treat the no-match case explicitly. For example:
def gettarget_index(self, container_defs):
metadata = getattr(self, "resource_metadata", None) or {}
target_name = metadata.get("ContainerName")
if target_name:
for i, cd in enumerate(container_defs):
if cd.get("Name") == target_name:
return i
raise exceptions.ExportFailedError(
resource_id=..., # plumb through
property_name=self.PROPERTY_NAME,
property_value=target_name,
ex=ValueError(
f"Metadata.ContainerName '{target_name}' does not match any "
f"container in ContainerDefinitions"
),
)
return 0|
|
||
| if family: | ||
| # Paginate through all clusters | ||
| cluster_paginator = ecs_client.get_paginator("list_clusters") |
There was a problem hiding this comment.
[PERFORMANCE] _force_ecs_deployment paginates every cluster in the account and every service in every cluster, then calls describe_services in batches of 10 to find services whose taskDefinition family matches. In accounts with many ECS clusters/services this fan-out is O(clusters × services / 10) describe_services calls on every sam sync iteration — slow, rate-limit-prone, and requires ecs:ListClusters/ecs:ListServices across the entire account.
Since the user already supplied the task definition (and SAM owns the CloudFormation stack), the cluster/service references are typically discoverable from the template itself (e.g., sibling AWS::ECS::Service resources referencing this task definition via !Ref) or from the deployed stack resources. Consider scoping the search to services declared in or deployed by the current stack rather than scanning the whole account.
052400d to
825128f
Compare
| "Metadata.ContainerName '%s' does not match any container in ContainerDefinitions", | ||
| target_name, | ||
| ) | ||
| container_defs[0]["Image"] = path |
There was a problem hiding this comment.
[BUG] When Metadata.ContainerName is set but does not match any container in ContainerDefinitions (typo, renamed container, copy/paste from another task definition), the for…else fall-through warns and then silently writes the built image to container_defs[0]["Image"]. In a multi-container task definition where index 0 is a sidecar (e.g. public.ecr.aws/envoy:latest, as in the integration test fixture in tests/integration/testdata/buildcmd/container_image/template.yaml), the sidecar's image reference is silently replaced with the freshly built ECR URI while the user's intended container retains placeholder. The resulting post-build template deploys a broken task definition that the user only discovers at deploy/runtime.
A LOG.warning is not sufficient — users routinely miss warnings and the failure mode is data corruption of the template. Either fail fast, or skip the update entirely so the user is forced to address the mismatch:
if resource_type == AWS_ECS_TASK_DEFINITION:
container_defs = resource_properties.get("ContainerDefinitions", [])
if not container_defs:
return
target_name = metadata.get("ContainerName") if metadata else None
if target_name:
for container_def in container_defs:
if container_def.get("Name") == target_name:
container_def["Image"] = path
return
raise DockerBuildFailed(
f"Metadata.ContainerName '{target_name}' does not match any container "
f"in ContainerDefinitions. Update Metadata.ContainerName or add a matching "
f"container."
)
if len(container_defs) > 1:
raise DockerBuildFailed(
"ContainerDefinitions has multiple containers but Metadata.ContainerName "
"is not set. Specify Metadata.ContainerName to disambiguate."
)
container_defs[0]["Image"] = pathThis was raised in the previous review and has not been addressed or dismissed — re-raising.
| "Metadata.ContainerName '%s' does not match any container in ContainerDefinitions, defaulting to index 0", | ||
| target_name, | ||
| ) | ||
| return 0 |
There was a problem hiding this comment.
[BUG] _get_target_index has the exact same silent-fallback problem as app_builder._update_built_resource: when Metadata.ContainerName is set but no container matches, it logs a warning and returns 0. sam package / sam deploy then rewrite the Image of ContainerDefinitions[0] — which is typically a sidecar in multi-container task definitions — replacing it with the pushed ECR URI. The user's intended container keeps its placeholder/non-ECR value and the deploy proceeds with a broken task definition.
Two reasons to fix this regardless of whether app_builder already validates: (1) the package code path is exercised independently of build (e.g. sam package on an already-built template, or templates produced outside SAM), and (2) the delete path (line 748) uses the same _get_target_index, so sam delete can attempt to delete the wrong ECR artifact.
Raise an error rather than silently defaulting:
def gettarget_index(self, container_defs):
metadata = getattr(self, "resource_metadata", None) or {}
target_name = metadata.get("ContainerName")
if target_name:
for i, cd in enumerate(container_defs):
if cd.get("Name") == target_name:
return i
raise exceptions.ExportFailedError(
resource_id=getattr(self, "_resource_id", ""),
property_name=self.PROPERTY_NAME,
property_value=target_name,
ex=ValueError(
f"Metadata.ContainerName '{target_name}' does not match any "
f"container in ContainerDefinitions"
),
)
if len(container_defs) > 1:
raise exceptions.ExportFailedError(
resource_id=getattr(self, "_resource_id", ""),
property_name=self.PROPERTY_NAME,
property_value=None,
ex=ValueError(
"ContainerDefinitions has multiple containers but "
"Metadata.ContainerName is not set"
),
)
return 0This was raised in the previous review and has not been addressed or dismissed — re-raising.
|
|
||
| if family: | ||
| # Paginate through all clusters | ||
| cluster_paginator = ecs_client.get_paginator("list_clusters") |
There was a problem hiding this comment.
[PERFORMANCE] _force_ecs_deployment performs an unbounded scan of the entire account on every sam sync iteration: it paginates list_clusters, then list_services for every cluster, then batches describe_services in chunks of 10 — only to filter results client-side by taskDefinition family. In an account with N clusters and M services per cluster this is O(N + N·M/10) API calls per task-definition sync, repeated on every file-watch iteration of sam sync --watch. This is slow, easy to throttle (describe_services has aggressive rate limits in shared accounts), and forces the deployer's IAM principal to hold ecs:ListClusters and ecs:ListServices across the whole account — privileges that are typically far broader than what's needed to deploy a single task definition.
The user already provides the stack name (self._deploy_context.stack_name). The supported design is to discover services in the same CloudFormation stack rather than scanning the account: enumerate AWS::ECS::Service resources from the stack (you already load self._stacks), resolve their physical IDs from self._physical_id_mapping, and call update_service directly — or describe StackResources to find services pointing at this task definition family. This is O(services-in-stack) and uses only the IAM permissions the user already needs to deploy the stack.
If a wider scan is required for some edge case, gate it behind an opt-in flag rather than making it the default path on every sync.
This was raised in the previous review and has not been addressed or dismissed — re-raising.
825128f to
03fd9e6
Compare
| "Metadata.ContainerName '%s' does not match any container in ContainerDefinitions", | ||
| target_name, | ||
| ) | ||
| container_defs[0]["Image"] = path |
There was a problem hiding this comment.
[BUG] When Metadata.ContainerName is set but no container in ContainerDefinitions has that Name (typo, renamed container, copy/paste from another task definition), the for…else falls through, logs a warning, and silently writes the freshly built image to container_defs[0]["Image"]. In a multi-container task definition where index 0 is a sidecar, this overwrites the sidecar's image with the user's app image — exactly the layout used by the integration test fixture added in this PR (tests/integration/testdata/buildcmd/container_image/template.yaml):
ContainerDefinitions:
- Name: sidecar
Image: public.ecr.aws/envoy:latest
- Name: web # if user mistypes ContainerName: 'wbe', envoy gets overwritten
Image: placeholderThe post-build template the user inspects/deploys still has the placeholder on the intended container, while the unrelated sidecar entry now points at their freshly pushed ECR image — a hard-to-diagnose deploy-time failure. A warning in the build log is easy to miss in CI output.
This was raised on the previous review and not addressed. Fail loudly when the explicit ContainerName doesn't match — raise or skip the update — rather than guessing index 0:
if target_name:
for container_def in container_defs:
if container_def.get("Name") == target_name:
container_def["Image"] = path
break
else:
raise InvalidResourceException(
resource_name,
f"Metadata.ContainerName '{target_name}' does not match any "
f"container in ContainerDefinitions",
)
else:
container_defs[0]["Image"] = path| "Metadata.ContainerName '%s' does not match any container in ContainerDefinitions, defaulting to index 0", | ||
| target_name, | ||
| ) | ||
| return 0 |
There was a problem hiding this comment.
[BUG] ECSTaskDefinitionImageResource._get_target_index has the same silent-fallback problem: when Metadata.ContainerName is set but doesn't match any Name in ContainerDefinitions, it logs a warning and returns 0. sam package / sam deploy then rewrite ContainerDefinitions[0]["Image"] — typically a sidecar in multi-container task definitions — replacing it with the pushed ECR URI. The user's intended container keeps its placeholder, and the sidecar starts running the wrong image at deploy time.
This is the package-side counterpart of the build-side bug above and compounds it: even if the build path is fixed, package will still corrupt the template here. It was raised on the previous review and not addressed.
Either return a sentinel and let export skip with a hard error, or raise directly. Returning index 0 on a typo is the most surprising possible behavior for a multi-container resource:
def gettarget_index(self, container_defs):
metadata = getattr(self, "resource_metadata", None) or {}
target_name = metadata.get("ContainerName")
if target_name:
for i, cd in enumerate(container_defs):
if cd.get("Name") == target_name:
return i
raise exceptions.ExportFailedError(
resource_id=...,
property_name=self.PROPERTY_NAME,
property_value=target_name,
ex=ValueError(
f"Metadata.ContainerName '{target_name}' does not match "
f"any container in ContainerDefinitions"
),
)
return 003fd9e6 to
e0e5b2a
Compare
| container_defs = resource_dict.get("ContainerDefinitions", []) | ||
| if not container_defs: | ||
| return | ||
| target_idx = self._get_target_index(container_defs) |
There was a problem hiding this comment.
[BUG] ECSTaskDefinitionImageResource.delete() calls _get_target_index(container_defs), which reads self.resource_metadata via getattr(self, "resource_metadata", None) or {}. However, resource_metadata is only set on the exporter instance during the export flow (artifact_exporter.py:495) — the delete flow at artifact_exporter.py:522 instantiates the exporter and calls exporter.delete(...) without ever assigning resource_metadata.
As a result, during sam delete, _get_target_index always falls through to return 0, even when Metadata.ContainerName is set. In the multi-container TaskDefinition layout used by the integration test fixture in this PR (tests/integration/testdata/buildcmd/container_image/template.yaml) — where index 0 is a sidecar (public.ecr.aws/envoy:latest) and the user's container is web at index 1 — sam delete will inspect the sidecar's Image instead of the user's container. With a public image at index 0, the deletion silently no-ops (the user's pushed ECR image is leaked), and if index 0 happened to be a private ECR image, the wrong artifact would be deleted.
This is the same class of silent-fallback bug the export path was just fixed for. Either:
- set
exporter.resource_metadata = resource.get("Metadata")on the delete path inartifact_exporter.pybefore callingexporter.delete(...), mirroring the export path; or - have the delete code path resolve
ContainerNamefrom theresource_dictdirectly rather than relying on instance state set elsewhere.
# samcli/lib/package/artifact_exporter.py — delete() loop
exporter = exporter_class(self.uploaders, None)
exporter.resource_metadata = resource.get("Metadata") # add this
exporter.delete(resource_id, resource_dict)e0e5b2a to
a29380d
Compare
Extends SAM CLI to build, package, and deploy container images for AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources using the same Metadata convention as Lambda Image functions. Fixes aws#8933
a29380d to
2eeb2b3
Compare
|
Hello, could you review the failures in the tests? |
…lure - Extract magic number 3 to _ECS_SERVICE_ARN_PARTS constant - Switch test Dockerfile to public.ecr.aws alpine (multi-arch, no rate limits)
Windows CI runners use Windows containers and cannot pull Linux images. This matches the pattern used by existing Lambda image build tests.
| for svc in svc_response.get("services", []): | ||
| svc_task_def = svc.get("taskDefinition", "") | ||
| # Check if this service references our task definition family | ||
| if physical_id in svc_task_def or ( |
There was a problem hiding this comment.
[BUG] The substring check physical_id in svc_task_def can produce false positives that trigger forceNewDeployment on the wrong ECS service. physical_id is the TaskDefinition ARN including revision (e.g. arn:aws:ecs:...:task-definition/my-app:5), and ARNs are prefix-substrings of higher-revision ARNs of the same family (arn:aws:ecs:...:task-definition/my-app:50). They are also substrings of family names that share a prefix (e.g. my-app vs my-app-staging). When that happens, sam sync issues forceNewDeployment against an unrelated service in the account.
The family-name comparison on the right side of the or already correctly identifies services that reference the same task-definition family, regardless of revision. Drop the substring check and rely on the family-name comparison alone:
svc_family = svc_task_def.rsplit("/", 1)[-1].split(":", 1)[0]
my_family = physical_id.rsplit("/", 1)[-1].split(":", 1)[0]
if svc_family and svc_family == my_family:
ecs_client.update_service(
cluster=cluster,
service=service_name,
forceNewDeployment=True,
)…nd add error case tests - Remove substring check in _force_ecs_deployment that could match wrong services (e.g. my-app:5 matching my-app:50). Use family-name comparison only. - Add unit tests for ContainerName mismatch error handling in both app_builder and packageable_resources. - Clean up unused imports in test file.
|
Hi @reedham-aws — the test failures have been resolved:
All 51 real checks are green now. The one remaining failure was a pyinstaller-linux infra flake (build artifact not produced), so I've retriggered CI. Ready for review when you get a chance! |
|
|
||
| builder = ApplicationBuilder( | ||
| ( | ||
| self._build_context.collect_build_resources(self._resource_identifier) |
There was a problem hiding this comment.
[BUG] When sam sync falls into _build_from_scratch (i.e. _application_build_result is None), ApplicationBuilder is constructed with:
self._build_context.collect_build_resources(self._resource_identifier)
if hasattr(self._build_context, "collect_build_resources")
else self._build_context.get_resources_to_build()BuildContext.collect_build_resources() only collects functions (via _collect_single_function_and_dependent_layers) and layers (via _collect_single_buildable_layer). When neither is found it raises ResourceNotFound("Unable to find a function or layer with name ''") (build_context.py:1265-1271). Since the _resource_identifier here is an AWS::ECS::TaskDefinition or AWS::BedrockAgentCore::Runtime logical ID — never a function or layer — this branch will always raise on first reach, breaking the from-scratch sync path for ECS/AgentCore resources.
The hasattr(...) guard is also dead code: BuildContext always defines collect_build_resources, so the else branch (get_resources_to_build()) is never taken.
Suggested fix: build an empty/synthetic ResourcesToBuildCollector (the actual container build is performed by builder.build_container_images([build_def]) on line 127, which doesn't depend on resources_to_build), or guard _build_from_scratch so it short-circuits when only container resources are involved.
collect_build_resources() only handles functions/layers and raises ResourceNotFound for ECS/AgentCore resource IDs. Since build_container_images() doesn't depend on resources_to_build, pass an empty collector instead.
|
Thanks for this contribution — the design is generally well-shaped and the build/package paths look solid. Before we can merge, there are three correctness issues in the sync path and the multi-container handling that need to be addressed. 1.
|
…tension functions (aws#9030) * refactor(package): replace METADATA_EXPORT_LIST and GLOBAL_EXPORT_DICT with typed registries Migrate two flat package exporter registries to typed dataclass form: - METADATA_EXPORT_LIST (list of Resource subclasses) -> METADATA_EXPORTS (List[MetadataExportSpec]). Each spec captures both the metadata_type ('AWS::ServerlessRepo::Application') and the per-property exporter classes that handle LicenseUrl/ReadmeUrl. Template._export_metadata now dispatches via a metadata_type lookup instead of a per-class RESOURCE_TYPE filter. - GLOBAL_EXPORT_DICT (dict keyed by Fn::Transform) -> GLOBAL_TRANSFORM_EXPORTS (List[GlobalTransformExportSpec]). Each spec carries a discriminator callable (e.g. _is_aws_include for matching AWS::Include) plus the handler. _export_global_artifacts now dispatches through the discriminator so future Fn::Transform variants can register without touching the walker. The typed shape is the contract a follow-up commit will read from to process AWS::Include before language-extension expansion. * feat(package): merge SAR Metadata exports back into LE child templates Extend merge_language_extensions_s3_uris with a registry-driven Metadata pass that copies rewritten property values (LicenseUrl, ReadmeUrl) from the exported template back into the original (Fn::ForEach-preserving) template after LE expansion. Without this pass, when a child stack uses Transform: AWS::LanguageExtensions and declares AWS::ServerlessRepo::Application metadata, sam package silently dropped the License/Readme S3 URLs from the merged output — they were uploaded but never wired into the template the user deploys. Implementation iterates METADATA_EXPORTS (the registry added in the prior commit) so future metadata types pick up merge support without touching the merge walker. * fix(package): process AWS::Include before language-extension expansion (aws#9027) Closes aws#9027. Symptom: when a template uses both Transform: AWS::LanguageExtensions and contains Fn::Transform: AWS::Include buried inside an LE function (e.g. Fn::ToJsonString), sam package fails to rewrite the include's Location to an S3 URL. CloudFormation then rejects the deploy with 'The location parameter is not a valid S3 uri.' Root cause: PackageContext._export ran expand_language_extensions BEFORE the artifact exporter walked Fn::Transform nodes. LE functions like Fn::ToJsonString json.dumps() their argument, collapsing the structural Fn::Transform subtree into a JSON-string literal. By the time the exporter ran, the include was no longer a structural dict node and was invisible to the global-transform walker. Fix: process AWS::Include (and any other GLOBAL_TRANSFORM_EXPORTS handler) on the original template BEFORE LE expansion runs, mirroring CloudFormation's own server-side transform ordering — CFN resolves inline Fn::Transform macros before evaluating AWS::LanguageExtensions. Implementation: - Extract Template._export_global_artifacts to a module-level _export_global_artifacts_pass(template_dict, uploader, template_dir). The instance method becomes a one-line delegate so existing callers keep working. - Call _export_global_artifacts_pass on the original template before expand_language_extensions in PackageContext._export (root flow) and in CloudFormationStackResource.do_export (nested-stack child flow). Dynamic-Location AWS::Include inside Fn::ForEach (e.g. Location: ./swagger-${Name}.yaml) is not supported by sam package: a local file path with literal ${...} placeholders does not exist on disk, so is_local_file fails and the existing InvalidLocalPathError fires — which is the right user-facing failure. CloudFormation does not substitute loop variables into Fn::Transform paths server-side either, so the limitation matches CFN's actual capability. * test(package): integration coverage for AWS::Include + SAR Metadata in LE templates Three end-to-end tests that exercise the package_context._export-equivalent flow on language-extension templates: - test_le_template_with_top_level_aws_include_merges_location verifies AWS::Include in Outputs gets its Location rewritten to s3:// after the pre-LE pass. - test_le_template_with_serverless_repo_metadata_merges_license_url verifies SAR LicenseUrl/ReadmeUrl in Metadata get merged back. - TestPackageContextIssue9027 reproduces the user template from aws#9027 (Fn::ToJsonString over Fn::Transform: AWS::Include buried inside AWS::SSM::Parameter) and asserts the buried Location is rewritten to s3://. Locks down the regression. * docs(cfn-lang-ext): document AWS::Include processing order vs language extensions Add a section explaining that sam package processes Fn::Transform: AWS::Include before language-extension expansion, mirroring CloudFormation's server-side transform ordering. This means AWS::Include Location rewrites work correctly even when the include lives buried inside language-extension functions like Fn::ToJsonString or Fn::ForEach bodies. * test(package): align LE tests with opt-in API from aws#9033 PR aws#9033 made AWS::LanguageExtensions local processing opt-in. Two API changes broke three tests on this branch after the merge from develop: - expand_language_extensions() now requires a keyword-only `enabled` arg - PackageContext reads self._language_extensions_enabled, set in __init__ Pass enabled=True to expand_language_extensions in the two artifact-exporter tests, and set _language_extensions_enabled on the PackageContext stub that bypasses __init__ via __new__. All three tests exercise templates with Transform: AWS::LanguageExtensions, so enabling LE is the intended behavior. * test(package): hoist inline imports in TestPackageContextBuriedAWSInclude Move os, Destination/Uploaders, and yaml_parse to module-level imports; drop the redundant inline tempfile/PackageContext (already at module level). Addresses inline-imports review comment on aws#9030. * refactor(package): hoist InvalidSamDocumentException and expand_language_extensions imports Move the two inline imports inside _export() to module scope. No behavior change; this is preparation for the upcoming _export() split which references these from a new branch method. * refactor(package): add _export_without_language_extensions (off-path branch) New private method that mirrors pre-1.160.0 sam-cli behavior: a tight Template.export() pipeline with no LE machinery. Unused by _export() until the dispatcher is cut over in a follow-up commit. Also adds two structural-gate smoke tests that stay red until the dispatcher is wired up — they assert that the off path never invokes expand_language_extensions or the pre-LE _export_global_artifacts_pass. * refactor(package): add _export_with_language_extensions (on-path branch) New private method containing the existing aws#9027 ordering: pre-LE include pass, LE expansion, Template.export(), merge, Mappings. Unused by _export() until the dispatcher is cut over in the next commit. * refactor(package): split _export() into LE / non-LE branches gated on flag _export() becomes a thin dispatcher: read the template, branch on self._language_extensions_enabled, dump. The two branch methods added in the prior two commits now own the actual work. Treats --language-extensions as a hard correctness boundary: when off, no LE machinery is invoked at all (no expand_language_extensions, no pre-LE _export_global_artifacts_pass, no merge, no Mappings). Template.export() still handles AWS::Include for non-LE templates via its own internal _export_global_artifacts pass — that path has worked since long before the aws#9027 fix and is unchanged by this refactor. Public surface (PackageContext._export(template_path, use_json) -> str) is preserved; all existing _export tests pass unchanged. * test(package): repoint expand_language_extensions patches to package_context use site The hoist in 756c502 moved `expand_language_extensions` to a module-level import in `package_context.py`. The use-site binding is now captured at module load, so existing tests that patched the source module `samcli.lib.cfn_language_extensions.sam_integration.expand_language_extensions` no longer intercept calls from `_export()`. Repoint two patches at the use-site `samcli.commands.package.package_context.expand_language_extensions`: - `test_phase_separation.py::test_package_context_export_calls_expand_language_extensions` was failing intermittently in CI depending on test-collection order. - `test_package_context.py::test_off_path_does_not_invoke_expand_language_extensions` is retargeted for consistency now that the inline-import shadow is gone (post-Task-4 cutover); the dead `had_language_extensions=False` scaffolding can also go since the off path no longer calls expand. Other source-module patches in `test_artifact_exporter.py` are correct as-is — those tests cover `Template.export()` paths whose use site is the artifact_exporter module. * refactor(package): hoist do_export inline imports and repoint test patches Hoist expand_language_extensions, IntrinsicsSymbolTable, generate_and_apply_artifact_mappings, and merge_language_extensions_s3_uris to module-level imports in artifact_exporter.py — preparation for the LE / non-LE structural-gate split of CloudFormationStackResource.do_export. Module-level binding requires existing tests that patched samcli.lib.cfn_language_extensions.sam_integration.expand_language_extensions (the source module) to repoint at the use-site samcli.lib.package.artifact_exporter.expand_language_extensions, since the use-site name is captured at module load and source-module patches no longer intercept once the inline import is gone. Same lesson as PR aws#9030 commit 45dea4b for package_context. * refactor(package): add _do_export_without_language_extensions branch method LE-off branch for CloudFormationStackResource: path-based Template construction with no pre-LE pass, no parameter_values, no deepcopy. Method is unused until the dispatcher rewrite — wiring lands in the follow-up commit. Adds TestDoExportLanguageExtensionsStructuralGate with two red structural-gate assertions that the dispatcher rewrite will turn green. * refactor(package): add _do_export_with_language_extensions branch method Lifts the existing LE-expansion body from CloudFormationStackResource.do_export into a dedicated method (resource_id, template_path, parent_dir, abs_template_path, resource_dict) -> Dict. No behavior change yet — the dispatcher in the follow-up commit will route LE-on calls into it. Threads resource_dict so the branch can read nested-stack Parameters without the dispatcher passing them separately. Returns the exported template dict so the dispatcher owns the final yaml_dump + upload tail. * refactor(package): split do_export into LE / non-LE branches gated on flag CloudFormationStackResource.do_export is now a thin dispatcher that: 1) does the early-exit guards (None / S3 URL / non-local file) 2) routes to _do_export_with_language_extensions or _do_export_without_language_extensions based on self.language_extensions_enabled 3) owns the final yaml_dump + upload + property mutation tail. The off path is now structurally LE-free: no expand_language_extensions call, no _export_global_artifacts_pass, no IntrinsicsSymbolTable pseudo-param plumbing, no parent_parameter_values, no copy.deepcopy. Template.export() runs its own internal _export_global_artifacts so AWS::Include still resolves on the off path. The structural-gate tests added in the previous commit are now green. Existing LE tests that rely on language_extensions_enabled = True now set the flag explicitly (where they previously depended on the LE machinery running unconditionally as a passthrough). * chore(package): apply make format to do_export refactor Black collapsed multi-line calls and signatures whose single-line form fits under the 120-char limit. No behavior change. * refactor(package): add _build_child_parameter_values helper Adds a CFN-parity helper that builds the parameter_values dict for a nested child stack: pseudo-params (with parent pseudo-name overrides honored) plus parent-rebound values via the nested-stack Parameters property. Non-pseudo parent names are not copied; child Defaults are read by the LE expander itself via parsed_template.parameters. Helper is unused production code in this commit. Wired into CloudFormationStackResource._do_export_with_language_extensions in the next commit. * fix(package): tighten parent_parameter_values scoping in LE path CloudFormationStackResource._do_export_with_language_extensions used to bulk-copy the parent stack's full parameter_values into the child, so a parent's `Foo=parent_foo` could silently shadow an unrelated child Parameter named Foo. This diverged from CloudFormation's nested-stack contract (only parent-rebound names reach the child) and from the non-LE path (which threads no parameters at all). Replace the merge block with _build_child_parameter_values, which returns pseudo-params (with parent pseudo-name overrides honored) plus parent-rebound values from the nested-stack Parameters property. Child template Defaults still resolve via the LE expander's parsed_template fallback (resolvers/fn_ref.py:_resolve_parameter). Adds end-to-end coverage: - non-pseudo parent param does not leak into child's parameter_values - child Default still resolves via resolver fallback after the change * refactor(package): trim Task 2 redundant comments and verbose test docstrings Code-review polish for the LE parent-param scoping fix: - drop call-site comment block at _do_export_with_language_extensions that duplicated _build_child_parameter_values's docstring - shorten the two new test docstrings in TestCloudFormationStackResourceChildExpansion to match neighbor style (no internal module paths) * refactor(tests): hoist inline imports added in LE parent-param fix Move imports introduced by the previous two commits from inside test bodies up to the module-level import block: - _build_child_parameter_values, IntrinsicsSymbolTable - yaml_dump (samcli.yamlhelper), shutil, inspect (stdlib) None of these symbols are @patch targets, so hoisting does not affect any mock binding (use-site patches in artifact_exporter remain correct). * refactor(tests): hoist remaining inline imports and extract LE fixtures Address review feedback on PR aws#9030: inline imports should live at module scope, and on-the-fly file writes for LE child templates should use checked-in fixtures instead of tempfile.mkdtemp() + yaml_dump(). Hoist inline imports (LanguageExtensionResult, _resolve_nested_stack_parameters, yaml_dump, shutil, the packageable_resources block) to the module-level import block. None are @patch targets, so use-site patches remain correct. Add tests/unit/lib/package/test_data/ following the precedent in tests/unit/lib/intrinsic_resolver/test_data, with TEST_DATA_PATH constant defined as Path(__file__).resolve().parent / "test_data". Convert the LE parent-param test methods, the buried-AWS::Include test, the expansion-error-handling pair, and the structural-gate pair to read their child templates (and the export-events.json include target) from the fixture tree. The on-the-fly tempdir + yaml_dump scaffolding is removed entirely along with the corresponding _write_child / _write_minimal_child helpers. No production code change; all 110 unit tests in test_artifact_exporter still pass. * refactor(tests): extract TestPackageContextBuriedAWSInclude inline fixture Replace the on-the-fly tempfile + open()+write() scaffolding in TestPackageContextBuriedAWSInclude.setUp with a checked-in fixture under tests/unit/commands/package/test_data/buried_aws_include/. Mirrors the TEST_DATA_PATH pattern already in tests/unit/lib/package/test_data and tests/unit/lib/intrinsic_resolver/test_data. setUp now resolves template_path from TEST_DATA_PATH; the export-events include target lives next to template.yaml so the relative Location in the template still resolves at sam-package time. No production code change; all 146 tests in test_package_context still pass.
…8934) Replace hand-rolled copytree implementations with shutil.copytree(..., symlinks=True, dirs_exist_ok=True) in both osutils.py and copy_terraform_built_artifacts.py. The custom implementations were originally needed for Python 3.7 compatibility (dirs_exist_ok was added in 3.8), but SAM CLI now requires Python 3.10+, making them unnecessary. Using the stdlib version with symlinks=True correctly preserves both file and directory symlinks without following them. Changes: - osutils.copytree: replaced ~40-line manual implementation with single shutil.copytree call; removed unused errno import; replaced os.remove try/except with Path.unlink(missing_ok=True) in remove() - Terraform copy_terraform_built_artifacts.copytree: same replacement - Updated unit tests to match new implementation - Added integration-style tests verifying file and directory symlink preservation on disk - Audited os.walk(followlinks=True) in package/utils.py — intentional and correct for zip packaging (zip files need dereferenced content) This is a code quality / defense-in-depth improvement, not a security fix. We do not believe this warrants a new CVE assignment.
Extends SAM CLI to build, package, and deploy container images for AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources using the same Metadata convention as Lambda Image functions. Fixes aws#8933
…sync - sam sync now captures the ECR URI returned by ECRUploader.upload and registers a new TaskDefinition revision with the updated image, then updates matching ECS services to that revision (fixes silent no-op) - Tighten _force_ecs_deployment → _update_services_to_task_definition to require both the arn:aws:ecs: prefix and :service/ segment, preventing false matches on App Runner / VPC Lattice / AppMesh ARNs - Split AgentCore out of ECSContainerSyncFlow: GENERATOR_MAPPING now routes AWS::BedrockAgentCore::Runtime to _create_agentcore_flow which logs a clear 'not yet supported' warning and returns None (UpdateAgentRuntime not yet implemented) - Add AWS::ECS::TaskDefinition to SyncCodeResources._accepted_resources so sam sync --code-resource AWS::ECS::TaskDefinition is accepted by Click - Add ApiCallTypes.UPDATE_CONTAINER_IMAGE enum value; use it in ECSContainerSyncFlow._get_resource_api_calls instead of UPDATE_FUNCTION_CODE - Fix multi-container TaskDefinition safety: raise DockerBuildFailed when >1 ContainerDefinitions and no Metadata.ContainerName (app_builder.py) - Fix AgentCore _update_built_resource: replace setdefault chain (which corrupts intrinsics) with direct dict assignment - Fix isinstance(metadata, dict) guards in _update_built_resource, _has_container_build_metadata, and ECSTaskDefinitionImageResource - Update ECSTaskDefinitionImageResource docstring and add LOG.warning for multi-container fallback (packageable_resources.py) - Remove dead BuildGraph.CONTAINER_BUILD_DEFINITIONS constant and get/put_container_build_definition methods (zero callers, not serialized in _read/_write) - Add test_ecs_container_sync_flow.py with 17 unit tests covering the new sync path, ARN filtering, error handling, and tag propagation Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| property_name=self.PROPERTY_NAME, | ||
| property_value=target_name, | ||
| ex=ValueError( | ||
| f"Metadata.ContainerName '{target_name}' does not match any container in ContainerDefinitions" |
There was a problem hiding this comment.
[BUG] The silent fallback in _get_target_index is inconsistent with the build path and can cause sam package to upload to the wrong container. When Metadata.ContainerName is unset on a multi-container AWS::ECS::TaskDefinition, this code logs a warning and returns index 0:
if len(container_defs) > 1:
LOG.warning(
"TaskDefinition has multiple containers but Metadata.ContainerName is not set; "
"packaging the first container. Add 'ContainerName: <name>' to the resource Metadata "
"to avoid ambiguity."
)
return 0The corresponding build path in app_builder.py:_update_built_resource handles the exact same scenario by raising DockerBuildFailed:
elif len(container_defs) > 1:
raise DockerBuildFailed(
f"TaskDefinition has {len(container_defs)} containers but Metadata.ContainerName is not set. "
f"Add 'ContainerName: <name>' to the resource Metadata to specify which container to build."
)When sam package is run independently of sam build (e.g., user pre-built their image, or running on a templated artifact pipeline), this fall-through still triggers the same silent-wrong-container failure mode the previous bot comments flagged for the ContainerName-mismatch case: the freshly pushed ECR URI is written into ContainerDefinitions[0]["Image"] — typically a sidecar in a multi-container task definition (e.g. public.ecr.aws/envoy:latest in your own integration test fixture). The user's intended container retains its placeholder, while the sidecar is silently overwritten.
The fix is to mirror the build path and raise ExportFailedError here instead of warning + returning 0. The existing raise exceptions.ExportFailedError(...) branch above (for the target_name-set-but-no-match case) is the right pattern to follow.
|
Hi @bnusunny — thanks for the thorough review. All findings have been addressed and pushed. Here's what changed: Blocking issues1. 2. AgentCore sync has no rollout step 3. Multi-container TaskDefinition silent mis-deploy Non-blocking fixes
Added 17 unit tests for |
… TaskDef Mirror the build path: when len(ContainerDefinitions) > 1 and Metadata.ContainerName is unset, sam package now raises ExportFailedError instead of silently packaging the first container. Prevents the wrong-container overwrite in standalone sam package invocations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Good catch on the |
There was a problem hiding this comment.
Code Review Results
Reviewed: 999d51f..99822c6
Files: 16 (production code only — skipped tests, fixtures, and docs)
Comments: 2
Comments on lines outside the diff:
[samcli/lib/package/artifact_exporter.py:619] [BUG] -622
The new export loop sets exporter.resource_metadata twice on consecutive lines:
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata") # first assignment
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata") # duplicate
exporter.export(full_path, resource_dict, self.template_dir)This looks like a merge artifact from the recent fix that wired resource_metadata into the export path. Both assignments set the same value so it's not functionally broken, but it's confusing for readers and a clear sign the change wasn't reviewed in isolation. Remove one of them. Also note the equivalent block in Template.delete() only sets it once, which is the intended pattern.
| updated = True | ||
| if container_name is not None: | ||
| break | ||
|
|
There was a problem hiding this comment.
[BUG] -215
In _register_updated_task_definition, when _get_container_name() returns None (i.e. the user did not set Metadata.ContainerName), the loop overwrites the image field on every container in the live task definition:
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
breakThe build path now rejects multi-container local templates without ContainerName, so the assumption seems to be "if container_name is None, there's only one container." But register_input["containerDefinitions"] comes from describe_task_definition against the live task definition, not the local template. Common drift scenarios produce a multi-container live TD with a single-container local template:
- User deployed with one container, then added a sidecar (xray-daemon, envoy, datadog-agent) via console / aws ecs register-task-definition / a separate IaC tool.
- User imported an existing multi-container TD into the SAM stack but only manages the primary container in the template.
In both cases, sam sync will silently rewrite every container's image to the freshly built ECR URI, replacing the sidecars. This is the same class of silent-corruption bug the author already fixed in _update_built_resource and _get_target_index — the sync path needs the same guard. Suggest raising (or at minimum logging a warning and skipping) when container_name is None and len(container_defs) > 1, matching the build/package error message pattern.
…lder Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| container_defs = register_input.get("containerDefinitions", []) | ||
| updated = False | ||
| for cd in container_defs: | ||
| if container_name is None or cd.get("name") == container_name: |
There was a problem hiding this comment.
[BUG] When Metadata.ContainerName is unset and the live task definition has multiple containers, this loop silently overwrites the image on every container — including sidecars (e.g. envoy proxy, log router, X-Ray daemon). The build path raises DockerBuildFailed and the package path raises ExportFailedError for the exact same condition; the sync path is now the only execution path that doesn't fail-fast and instead corrupts unrelated containers.
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
breakConcrete failure scenario: a user runs sam sync on a task definition with [envoy-sidecar, app] and forgot to set ContainerName. The freshly built app image gets written to envoy-sidecar.image AND app.image, then registered as a new revision and rolled out. The sidecar is now broken. The unit tests in test_ecs_container_sync_flow.py only cover the single-container container_name=None case and the explicit container_name="app" case — they never exercise the multi-container + None combination, which is why this slipped through.
Fix: when container_name is None and len(container_defs) > 1, log an error and return None (consistent with the existing "ContainerName not found" early-return below) so a new revision is not registered. Mirroring the build/package error message keeps the three paths consistent.
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
if container_name is None and len(container_defs) > 1:
LOG.warning(
"%sTaskDefinition has %d containers but Metadata.ContainerName is not set; "
"skipping registration. Add 'ContainerName: <name>' to the resource Metadata.",
self.log_prefix, len(container_defs),
)
return None
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
break| exporter.parent_parameter_values = self.parameter_values | ||
| exporter.resource_metadata = resource.get("Metadata") | ||
| exporter.language_extensions_enabled = self.language_extensions_enabled | ||
| exporter.resource_metadata = resource.get("Metadata") |
There was a problem hiding this comment.
[BUG] exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once on line 621 and again on line 623, with the same value, around an unrelated language_extensions_enabled assignment in between:
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata") # line 621
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata") # line 623 — duplicate
exporter.export(full_path, resource_dict, self.template_dir)This is functionally a no-op (same value re-assigned) but is clearly an accidental merge/rebase artifact — the diff hunk shows both lines as added. Beyond cosmetics, leaving it in is a maintainability hazard: a future reader changing one site (e.g. wrapping in copy.deepcopy or threading a different metadata source) will likely miss the duplicate and introduce a real divergence. Delete one of the two lines.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| register_input["tags"] = tags | ||
|
|
||
| try: | ||
| reg_response = ecs_client.register_task_definition(**register_input) |
There was a problem hiding this comment.
[BUG] When Metadata.ContainerName is unset, _get_container_name() returns None and the loop in _register_updated_task_definition overwrites image on every container in the live task definition — including sidecars (envoy/X-Ray/log routers). This is the same condition that the build path (app_builder._update_built_resource) and the package path (packageable_resources._get_target_index, fixed in 99822c6) both fail-fast on with DockerBuildFailed/ExportFailedError. The sync path is now the only path that silently rewrites the wrong containers.
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
breakThis was raised in a prior review and not explicitly dismissed by the author — the 99822c6 fix only covers packageable_resources._get_target_index. Match the build/package semantics: when container_name is None and len(container_defs) > 1, raise (or at minimum return None and abort the registration) with the same "Add 'ContainerName: ' to the resource Metadata…" message used elsewhere. The single-container case (len == 1) can keep the current behavior.
| # Export code resources | ||
| exporter = exporter_class(self.uploaders, self.code_signer, cache) | ||
| exporter.parent_parameter_values = self.parameter_values | ||
| exporter.resource_metadata = resource.get("Metadata") |
There was a problem hiding this comment.
[BUG] exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once on line 621, again on line 623, with the same value, separated only by an unrelated language_extensions_enabled assignment:
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata") # line 621
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata") # line 623
exporter.export(full_path, resource_dict, self.template_dir)Functionally redundant but it has been flagged in multiple previous review iterations without an explicit dismissal. It looks like a merge/rebase artifact and should be deduplicated to avoid confusion about whether one of them was meant to point at a different value (e.g. the intent of one of the two being moved before/after a different setter).
Annotate new_arn as Optional[str] to satisfy mypy's no-any-return check on the dict.get() chain from boto3 response. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| exporter.parent_parameter_values = self.parameter_values | ||
| exporter.resource_metadata = resource.get("Metadata") | ||
| exporter.language_extensions_enabled = self.language_extensions_enabled | ||
| exporter.resource_metadata = resource.get("Metadata") |
There was a problem hiding this comment.
[BUG] exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once at line 621 and again at line 623, separated only by an unrelated language_extensions_enabled assignment:
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata") # line 621
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata") # line 623 — duplicate
exporter.export(full_path, resource_dict, self.template_dir)The second assignment is redundant — both lines compute and assign identical values to the same attribute. This was flagged in multiple prior reviews (9ffa5ed8..03fd9e68, 999d51fa..b3a72328, 999d51fa..854065ba) but has not been removed. It looks like a merge artifact from when the metadata-passing fix was applied. Drop one of them — keeping line 621 reads more naturally with the other exporter setup.
| container_defs = register_input.get("containerDefinitions", []) | ||
| updated = False | ||
| for cd in container_defs: | ||
| if container_name is None or cd.get("name") == container_name: |
There was a problem hiding this comment.
[BUG] When Metadata.ContainerName is unset and the live task definition fetched via describe_task_definition has multiple containers, this loop silently overwrites image on every container — including sidecars (envoy, X-Ray daemon, log routers):
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
breakThe build path (ApplicationBuilder._update_built_resource) raises DockerBuildFailed on this same condition, and the package path (ECSTaskDefinitionImageResource._get_target_index, fixed in 99822c6) raises ExportFailedError. The sync path is now the only execution path that fails open and corrupts the registered task definition.
This is reachable in practice: a user can run sam build && sam deploy with a single-container template, then later edit the live task definition (or another teammate does) to add a sidecar; on the next sam sync, the local template still has no ContainerName and the sync path will rewrite the sidecar image alongside the app image.
Fail-fast to match the other two paths — when container_name is None and len(container_defs) > 1, raise (or log an error and return None) with the same message pattern: "TaskDefinition has N containers but Metadata.ContainerName is not set...".
…iner strict error - test_ecs_and_agentcore_use_same_flow_creator → assertNotEqual since AgentCore now routes to _create_agentcore_flow, not _create_ecs_container_flow - test_ecs_task_definition_falls_back_to_first_without_container_name → now expects DockerBuildFailed for multi-container TaskDef without ContainerName - Add @skipIf(IS_WINDOWS) to TestContainerImageBuild integ test — Linux container builds fail on Windows CI runners Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| exporter.parent_parameter_values = self.parameter_values | ||
| exporter.resource_metadata = resource.get("Metadata") | ||
| exporter.language_extensions_enabled = self.language_extensions_enabled | ||
| exporter.resource_metadata = resource.get("Metadata") |
There was a problem hiding this comment.
[BUG] The line exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — first at line 621, then again at line 623, with the same right-hand side, separated only by an unrelated language_extensions_enabled assignment:
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata") # line 621
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata") # line 623 — duplicate
exporter.export(full_path, resource_dict, self.template_dir)The second assignment is dead code. This was flagged in multiple prior bot reviews (most recently against 9d09336) and was not addressed in 5f39dd8. Drop one of the two assignments — keeping the one above language_extensions_enabled (line 621) is consistent with the order used elsewhere.
| "registeredAt", | ||
| "registeredBy", | ||
| } | ||
| register_input = {k: v for k, v in task_def.items() if k not in _READONLY_FIELDS} |
There was a problem hiding this comment.
[BUG] In _register_updated_task_definition, when Metadata.ContainerName is unset (so _get_container_name() returns None) and the live task definition has multiple containers, this loop silently overwrites the image field on every container — including sidecars (envoy, X-Ray daemon, log routers, OTel collectors):
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
breakWhen container_name is None, the conditional short-circuits to True for every container, the inner break is never hit, and every container's image is replaced with the freshly pushed URI.
The build path (app_builder._update_built_resource) and the package path (packageable_resources.ECSTaskDefinitionImageResource._get_target_index, both fixed earlier in this PR) raise on the exact same condition — multi-container TaskDefinition without Metadata.ContainerName. The sync path is now the only execution path that doesn't fail-fast, leaving inconsistent behavior between sam build/package/deploy and sam sync for the same template. This was flagged in multiple prior bot reviews and was not addressed.
Recommended fix: when container_name is None and len(container_defs) > 1, log an error and return None (skip the registration) — or, to match the build/package paths, raise so the user gets a clear "set Metadata.ContainerName" message. Either is fine; the current silent multi-overwrite is not.
…--resource Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| exporter.parent_parameter_values = self.parameter_values | ||
| exporter.resource_metadata = resource.get("Metadata") | ||
| exporter.language_extensions_enabled = self.language_extensions_enabled | ||
| exporter.resource_metadata = resource.get("Metadata") |
There was a problem hiding this comment.
[BUG] exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once at line 621, then again at line 623, separated only by an unrelated language_extensions_enabled assignment. The diff added both lines:
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata") # line 621 (added)
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata") # line 623 (added — duplicate)
exporter.export(full_path, resource_dict, self.template_dir)The right-hand side is identical, so functionally this is a dead store. It looks like an unresolved merge/rebase artifact. Remove the duplicate at line 623 (the one between language_extensions_enabled and exporter.export) so the assignment appears exactly once and stays grouped with the other exporter-attribute assignments.
| container_name = self._get_container_name() | ||
| container_defs = register_input.get("containerDefinitions", []) | ||
| updated = False | ||
| for cd in container_defs: |
There was a problem hiding this comment.
[BUG] When Metadata.ContainerName is unset (so _get_container_name() returns None) and the live task definition fetched via describe_task_definition has multiple containers, this loop silently overwrites the image field on every container — including sidecars (envoy, X-Ray daemon, log routers, OTel collectors):
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
breakWhen container_name is None, the if container_name is not None: break never fires, so the loop runs to completion mutating every entry. The build path (app_builder._update_built_resource) and the package path (packageable_resources.ECSTaskDefinitionImageResource._get_target_index) were both updated in this PR to raise (DockerBuildFailed / ExportFailedError) for exactly this condition; the sync path is now the only execution path that silently corrupts the user's task definition instead of failing fast. Mirror the same fail-fast check here: if container_name is None and len(container_defs) > 1, raise (or log an error and return None like the existing "ContainerName not found" branch does) rather than fanning the new image URI out across every container. Consistency across build/package/sync was the explicit reason the same fallback was removed from the other two paths in this PR.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| exporter.parent_parameter_values = self.parameter_values | ||
| exporter.resource_metadata = resource.get("Metadata") | ||
| exporter.language_extensions_enabled = self.language_extensions_enabled | ||
| exporter.resource_metadata = resource.get("Metadata") |
There was a problem hiding this comment.
[BUG] The line exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once at line 621, then again at line 623, with the same right-hand side, separated only by an unrelated language_extensions_enabled assignment. The diff added both lines:
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata") # line 621 (added)
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata") # line 623 (added, duplicate)
exporter.export(full_path, resource_dict, self.template_dir)The second assignment is dead code — same attribute, same value. This looks like a merge/rebase artifact. It was raised in multiple previous review comments and has not been addressed. Drop one of the two assignments (and ideally place the surviving one adjacent to the other exporter attribute setters for consistency with the delete() path at line ~648, which has only one).
| container_defs = register_input.get("containerDefinitions", []) | ||
| updated = False | ||
| for cd in container_defs: | ||
| if container_name is None or cd.get("name") == container_name: |
There was a problem hiding this comment.
[BUG] In _register_updated_task_definition, when Metadata.ContainerName is unset (so _get_container_name() returns None) and the live task definition fetched via describe_task_definition has multiple containers, this loop silently overwrites the image field on every container — including sidecars (envoy, X-Ray daemon, log routers, OTel collectors):
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
breakThe build path (app_builder._update_built_resource) raises DockerBuildFailed for exactly this condition, and the package path (packageable_resources.ECSTaskDefinitionImageResource._get_target_index) was just hardened to raise ExportFailedError for the same condition (per the author's response in PR discussion / commit 99822c6). The sync path is now the only execution path that doesn't fail-fast on multi-container TaskDefinitions without ContainerName — and the consequence here is worse than the package path, because this code is mutating the live, deployed TaskDefinition fetched from ECS, then registering a new revision with every container's image clobbered. Multiple previous review comments raised this and it has not been explicitly addressed.
Mirror the build/package behavior: when container_name is None and len(container_defs) > 1, log/raise and return None (skip registration) instead of silently overwriting all containers. Single-container TaskDefinitions (the unambiguous case) can keep the index-0 fallback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@bnusunny All the issues from your review have been addressed and CI is fully green (the 2 remaining failures are infra flakes — network/tar errors on the runners before any tests ran, unrelated to our code). Ready for another look when you have a chance! |
There was a problem hiding this comment.
Code Review Results
Reviewed: 999d51f..1a7d7f3
Files: 50
Comments: 2
Comments on lines outside the diff:
[samcli/lib/package/artifact_exporter.py:619] [BUG] -621
exporter.resource_metadata = resource.get("Metadata") is assigned twice on consecutive lines in Template.export() — once at line 619 and again at line 621, with the same right-hand side, separated only by an unrelated language_extensions_enabled assignment:
exporter = exporter_class(self.uploaders, self.code_signer, cache)
exporter.parent_parameter_values = self.parameter_values
exporter.resource_metadata = resource.get("Metadata")
exporter.language_extensions_enabled = self.language_extensions_enabled
exporter.resource_metadata = resource.get("Metadata")
exporter.export(full_path, resource_dict, self.template_dir)The second assignment is dead code — both reads call resource.get("Metadata") on the same resource and the first assignment is never read between them. This was flagged in earlier review iterations and not addressed. Functionally harmless today, but it's a refactoring hazard: anyone who later reorders the setters or adds a property descriptor on resource_metadata will hit a confusing double-fire. Drop the second line (line 621).
| tags = response.get("tags", []) | ||
|
|
||
| # Fields that are output-only and must be stripped before re-registering | ||
| _READONLY_FIELDS = { |
There was a problem hiding this comment.
[BUG] -194
When Metadata.ContainerName is unset (so _get_container_name() returns None) and the live task definition fetched via describe_task_definition has multiple containers, this loop silently overwrites the image field on every container — including sidecars (envoy, X-Ray daemon, log routers, OTel collectors):
container_name = self._get_container_name()
container_defs = register_input.get("containerDefinitions", [])
updated = False
for cd in container_defs:
if container_name is None or cd.get("name") == container_name:
cd["image"] = image_uri
updated = True
if container_name is not None:
breakThis is the same condition that the build path raises DockerBuildFailed for (app_builder._update_built_resource, line 433) and that the package path raises ExportFailedError for (packageable_resources.ECSTaskDefinitionImageResource._get_target_index). Those two paths were tightened in this PR specifically to fail fast on this ambiguity; the sync path is now the only one that silently corrupts the live task definition.
A scenario that actually breaks: a TaskDefinition with [envoy-sidecar, app]. sam build correctly fails because Metadata.ContainerName is not set; the user adds ContainerName: app to fix it; on the subsequent sam sync, if the user (or CI) passes --code against an older committed template still missing ContainerName, the sidecar's image is silently rewritten to the user's app image. The new revision is registered and then update_service rolls it out — taking the workload down with no error from sam sync.
Match the build/package paths and raise on this condition (or, at minimum, fail-fast and skip registration with a clear error) when container_name is None and len(container_defs) > 1. The single-container case is the only one safe to default. The test suite already mocks single-container task definitions for the container_name=None paths (test_register_strips_readonly_fields, test_register_tags_included_when_present); please add coverage for the multi-container ambiguous case once it raises.
Which issue(s) does this change fix?
Fixes #8933
Why is this change necessary?
SAM CLI provides an excellent developer experience for Lambda Image functions (
sam build && sam deploy), but users deploying containerized workloads to ECS (Fargate) or Bedrock AgentCore must manage their Docker build/push/deploy pipeline separately — even when these resources live in the same CloudFormation template. This creates a fragmented workflow requiring external tooling for an identical operation: build image → push to ECR → deploy.How does it address the issue?
Extends the existing Lambda Image build pipeline to recognize
AWS::ECS::TaskDefinitionandAWS::BedrockAgentCore::Runtimeresources with aMetadatablock containingDockerfileandDockerContext. No new commands —sam build,sam package,sam deploy, andsam syncgain awareness of these resource types.Template example:
Key implementation details:
_build_lambda_image()— same Docker build logic, buildkit support included--resolve-image-reposauto-creates ECR repos via companion stackContainerNamemetadata targets specific containers in multi-container TaskDefinitionsArchitecturemetadata sets--platform(e.g.,arm64for AgentCore)ARTIFACT_TYPE = ZIPto pass thePackageTypefilter (these resources don't havePackageType)Design document:
designs/container_image_builds_ecs_agentcore.mdWhat side effects does this change have?
sam buildlogs "Found N container service resource(s) to build" when applicable resources are present. No behavior change for templates without these resources.--resolve-image-reposcreates ECR repos for ECS/AgentCore in addition to Lambda Image functions._update_built_resourceadds an optionalmetadataparameter (backward compatible, defaults toNone).Mandatory Checklist
PRs will only be reviewed after checklist is complete
make prpassesmake update-reproducible-reqsif dependencies were changedBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.