Diagnose deploy workflow failure by surfacing cross-account ECR mismatch #346

Open
docwho2 wants to merge 1 commit into main from codex/investigate-last-workflow-error-cause

docwho2 commented Mar 16, 2026

Motivation

  • The scheduled Deploy CDK Stack run failed in the deploy (prod-us-east) job at the Promote Tested Image Into Production ECR step.
  • The likely cause is a cross-account ECR mismatch: the prod job authenticates with AWS_ACCOUNT_ID credentials, but the tested image URI points at a registry in a different account, so the ECR login or pull fails.
  • The workflow needed an early, clear diagnostic to distinguish account-mismatch issues from other Docker/ECR errors.

Description

  • Added extraction of SOURCE_ACCOUNT_ID from the tested image URI by parsing SOURCE_REGISTRY in the Promote Tested Image Into Production ECR step.
  • Added an explicit guard that compares SOURCE_ACCOUNT_ID to AWS_ACCOUNT_ID and fails early with a clear diagnostic message when they differ.
  • The change was applied to .github/workflows/deploy.yml so the workflow surfaces account-mismatch problems before it attempts docker login or docker pull.
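A minimal sketch of the extraction and guard described above, assuming the tested image reference uses the standard private ECR form <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>, so the account ID is the first DNS label of the registry host. The helper names ecr_account_id and require_same_account are hypothetical and not taken from deploy.yml; the actual step may parse SOURCE_REGISTRY differently.

```bash
# Hypothetical helper: print the 12-digit AWS account ID from an ECR registry
# host or full image URI, e.g.
#   123456789012.dkr.ecr.us-east-1.amazonaws.com/repo:tag -> 123456789012
# Returns 1 (and prints nothing) for anything that is not a private ECR
# reference on an amazonaws.com host (assumption: no China-region hosts).
ecr_account_id() {
  local ref="$1"
  local registry="${ref%%/*}"      # drop "/repo:tag", keep the registry host
  local account="${registry%%.*}"  # first DNS label of the host
  if [[ "$registry" == *.dkr.ecr.*.amazonaws.com && "$account" =~ ^[0-9]{12}$ ]]; then
    printf '%s\n' "$account"
  else
    return 1
  fi
}

# Hypothetical guard: succeed when both account IDs match; otherwise print a
# GitHub Actions ::error:: workflow command (shown as a run annotation) and
# return 1 so the calling step can fail before docker login/pull.
require_same_account() {
  local source_account="$1" target_account="$2"
  if [ "$source_account" != "$target_account" ]; then
    echo "::error::Tested image is in account ${source_account}, but this job is authenticated to ${target_account}"
    return 1
  fi
  return 0
}
```

Inside the step, the two helpers would be chained roughly as SOURCE_ACCOUNT_ID="$(ecr_account_id "$SOURCE_REGISTRY")" followed by require_same_account "$SOURCE_ACCOUNT_ID" "$AWS_ACCOUNT_ID" || exit 1. Parsing with shell parameter expansion avoids extra tools such as sed or jq on the runner.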

Testing

  • Queried the failing run's job details from the GitHub Actions API with curl, then parsed the jobs with a short Python script to confirm the failing step was Promote Tested Image Into Production ECR.
  • Verified the workflow modification and the presence of the new SOURCE_ACCOUNT_ID extraction and guard using git diff and by printing the updated lines with nl -ba .github/workflows/deploy.yml | sed -n '210,270p'.
  • All inspection commands completed successfully and show the new early-fail diagnostic in the workflow.

docwho2 left a comment

Looks good

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 23b6a3c6ac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

```bash
PROMOTED_IMAGE_TAG="promoted-${GITHUB_SHA}"
TARGET_IMAGE_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${{ vars.AWS_REGION || 'us-east-1' }}.amazonaws.com/${TARGET_REPOSITORY_NAME}:${PROMOTED_IMAGE_TAG}"

if [ "$SOURCE_ACCOUNT_ID" != "$AWS_ACCOUNT_ID" ]; then
  # Diagnostic message elided in this review excerpt; the guard fails the step.
  exit 1
fi
```

P1: Don't fail promotion when source and target accounts differ

This new guard unconditionally exits when SOURCE_ACCOUNT_ID and AWS_ACCOUNT_ID differ, but the repository’s documented deployment model explicitly supports stage-to-prod promotion across separate AWS accounts (see README.md lines 180-194 and 234-240, which require granting the prod role pull access to stage ECR). In that supported configuration, cross-account ECR pull is expected behavior, so this check converts a valid multi-account deployment into a guaranteed failure before docker pull runs.
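The review's suggestion could be sketched as a non-fatal variant of the guard: a cross-account promotion emits a ::warning:: annotation and the step continues, so a missing repository policy still surfaces at docker pull, while an opt-in strict mode keeps the early failure for single-account setups. This is an illustration under assumptions, not the repository's code; the function name check_promotion_accounts and its strict argument (which could be fed from a hypothetical workflow variable) are invented for the example.

```bash
# Hypothetical non-fatal guard for stage-to-prod promotion.
#   $1 = account ID parsed from the tested image URI
#   $2 = account ID the job is authenticated to (AWS_ACCOUNT_ID)
#   $3 = "true" to restore the strict early failure (hypothetical opt-in)
check_promotion_accounts() {
  local source_account="$1" target_account="$2" strict="${3:-false}"

  # Same account: nothing to report.
  if [ "$source_account" = "$target_account" ]; then
    return 0
  fi

  # Strict mode: fail early, as the PR's original guard does.
  if [ "$strict" = "true" ]; then
    echo "::error::Tested image is in account ${source_account}, but this job is authenticated to ${target_account}"
    return 1
  fi

  # Default: cross-account promotion is a supported model, so only warn and
  # let docker pull report a missing cross-account ECR permission.
  echo "::warning::Cross-account promotion (${source_account} -> ${target_account}): the prod role needs pull access to the source ECR repository."
  return 0
}
```

Keeping the mismatch as a warning preserves the diagnostic the PR set out to add without turning the multi-account deployment described in the README into a guaranteed failure.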
