Fix developer RBAC pre-flight gaps: auto-assign AI User, add role-write check, detect ABAC registries#7763
Fix developer RBAC pre-flight gaps: auto-assign AI User, add role-write check, detect ABAC registries#7763therealjohn wants to merge 3 commits intoAzure:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR tightens the azure.ai.agents extension’s developer RBAC preflight checks to prevent post-deploy 403s when the extension needs to create role assignments, and improves ACR RBAC validation by detecting ABAC-mode registries.
Changes:
- Auto-assignes Azure AI User to the developer when missing, and introduces an explicit roleAssignments/write capability check on the Foundry Project scope.
- Detects ABAC-mode ACR registries via ARM (RoleAssignmentMode) and switches the required role set accordingly.
- Updates shared role-assignment helper to support multiple principal types and adds/extends unit tests for new constants/role lists.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/extensions/azure.ai.agents/internal/project/developer_rbac_check.go | Adds developer auto-assign for Azure AI User, adds roleAssignments/write check, and detects ABAC ACR registries when checking ACR roles. |
| cli/azd/extensions/azure.ai.agents/internal/project/developer_rbac_check_test.go | Extends unit tests to cover new role constants and new “sufficient roles” lists. |
| cli/azd/extensions/azure.ai.agents/internal/project/agent_identity_rbac.go | Updates assignRoleToIdentity to accept an explicit PrincipalType and updates callers. |
| cli/azd/extensions/azure.ai.agents/internal/exterrors/codes.go | Adds a new structured error code for missing roleAssignments/write capability. |
| cli/azd/extensions/azure.ai.agents/go.mod | Upgrades armcontainerregistry dependency to access ABAC roleAssignmentMode property. |
| cli/azd/extensions/azure.ai.agents/go.sum | Adds checksums for the upgraded armcontainerregistry dependency. |
jongio
left a comment
There was a problem hiding this comment.
Four findings on the RBAC pre-flight changes. The core logic (3-check structure, auto-assign, ABAC branching) looks solid. CI note: cspell doesn't know "ABAC" - needs adding to cspell.yaml.
| github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/authorization/armauthorization/v3 v3.0.0-beta.2 | ||
| github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/cognitiveservices/armcognitiveservices/v2 v2.0.0 | ||
| github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerregistry/armcontainerregistry v1.2.0 | ||
| github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerregistry/armcontainerregistry v1.3.0-beta.3 |
There was a problem hiding this comment.
This is a beta SDK (v1.3.0-beta.3). Beta Azure SDKs can introduce breaking changes - if RoleAssignmentMode or AbacRepositoryPermissions gets renamed/restructured before GA, this breaks silently (compile will pass but the enum comparison won't match).
Is there a GA version that exposes this property? If not, worth a code comment noting the beta dependency so future readers know to check for a GA upgrade.
| return nil | ||
| } | ||
| // Always resolve registry info to accurately detect ABAC mode. | ||
| acrResourceID, isAbac, err := resolveACRInfo(ctx, cred, info.SubscriptionID, acrEndpoint) |
There was a problem hiding this comment.
The old code short-circuited via AZURE_CONTAINER_REGISTRY_RESOURCE_ID (set during init) to skip listing registries. This now always calls resolveACRInfo which pages through ALL registries in the subscription.
For subscriptions with many registries, this is noticeably slower. Consider: when the env var is available, parse the resource group + registry name from the resource ID and use RegistriesClient.Get() for a single targeted fetch (still gets RoleAssignmentMode for ABAC detection, but O(1) instead of O(n)).
| ctx, cred, principalID, roleAzureAIUser, | ||
| "Azure AI User → Foundry Project", info.ProjectScope, | ||
| armauthorization.PrincipalTypeUser, | ||
| ); assignErr != nil { |
There was a problem hiding this comment.
assignErr isn't included in the structured error. A 403, a network timeout, and a malformed scope all produce the same auto-assign failed message. Consider including it:
| ); assignErr != nil { | |
| if _, assignErr := assignRoleToIdentity( | |
| ctx, cred, principalID, roleAzureAIUser, | |
| "Azure AI User -> Foundry Project", info.ProjectScope, | |
| armauthorization.PrincipalTypeUser, | |
| ); assignErr != nil { | |
| return exterrors.Auth( | |
| exterrors.CodeDeveloperMissingAIUserRole, | |
| fmt.Sprintf( | |
| "your identity (%s) does not have the 'Azure AI User' role on the Foundry Project %s/%s "+ | |
| "and auto-assign failed: %s", | |
| userProfile.DisplayName, info.AccountName, info.ProjectName, assignErr, | |
| ), |
The underlying error helps users (and support) distinguish permissions issues from transient failures.
|
|
||
| fmt.Println(" ✓ Container Registry role on ACR") | ||
| if isAbac { | ||
| fmt.Println(" ✓ Container Registry Repository Writer role on ACR (ABAC mode)") |
There was a problem hiding this comment.
Nit: this hardcodes "Repository Writer" but the developer might have Owner or Repository Contributor. Consider matching the classic-mode pattern:
| fmt.Println(" ✓ Container Registry Repository Writer role on ACR (ABAC mode)") | |
| fmt.Println(" ✓ Container Registry role on ACR (ABAC mode)") |
Problem
A developer with only the Contributor role on a Foundry Project could pass the pre-flight RBAC check (CheckDeveloperRBAC) but then hit a 403 AuthorizationFailed error during postdeploy when the extension tried to
assign the Azure AI User role to agent service principals:
ERROR: agent identity RBAC setup failed: failed to assign Azure AI User role:
RESPONSE 403: AuthorizationFailed — does not have authorization to perform action
'Microsoft.Authorization/roleAssignments/write'
This happens because Contributor explicitly blocks Microsoft.Authorization/*/Write in its notActions. The pre-flight check verified the developer could use the project, but not whether they could write role
assignments. Two additional gaps existed: the extension didn't attempt to auto-assign Azure AI User when it was missing (unlike the VS Code extension), and the ACR role check didn't distinguish between classic and
ABAC-mode registries (where classic roles like AcrPush don't work).
Changes
Gap 1 — Auto-assign Azure AI User if missing (Check 1)
If the developer lacks Azure AI User on the Foundry Project, the pre-flight now attempts to assign it (using PrincipalTypeUser). This matches VS Code's ensureAzureAIUserRole pattern. Only fails with a structured error
if the auto-assign itself returns 403.
Gap 2 — Explicit roleAssignments/write check (new Check 2)
Added a check that the developer has at least one of: Owner, User Access Administrator, or Role Based Access Control Administrator on the Foundry Project scope. Contributor is intentionally excluded — it cannot write
role assignments. The error suggestion includes the az role assignment create command and an AZD_AGENT_SKIP_ROLE_ASSIGNMENTS=true opt-out for environments where role assignments are managed externally.
Gap 3 — ABAC registry detection (Check 3)
resolveACRResourceID → resolveACRInfo(ctx, cred, subscriptionID, loginServer) (resourceID, isAbac, error). Uses armcontainerregistry v1.3.0-beta.3 (upgraded from v1.2.0) which exposes
RegistryProperties.RoleAssignmentMode. When ABAC mode is detected, the check uses sufficientACRAbacRoles (Owner, Container Registry Repository Writer, Container Registry Repository Contributor) instead of the classic
role list. The AZURE_CONTAINER_REGISTRY_RESOURCE_ID env-var short-circuit was removed so ABAC mode is always detected via the ARM API.
Supporting changes