Skip to content

feat: Async Refresh for Regional Access Boundaries#1880

Open
vverman wants to merge 13 commits intogoogleapis:feat-tb-safrom
vverman:regional-access-boundary-update
Open

feat: Async Refresh for Regional Access Boundaries#1880
vverman wants to merge 13 commits intogoogleapis:feat-tb-safrom
vverman:regional-access-boundary-update

Conversation

@vverman
Copy link
Contributor

@vverman vverman commented Jan 25, 2026

Contains changes for the feature Regional Access Boundary (Previously Called Trust Boundaries).

The following are salient changes:

Calls to refresh RAB are now all async and in a separate thread.
Logic for refreshing RAB now exists in its own class for cleaner maintenance.
Self-signed jwts are within scope.
Changes to how we trigger RAB refresh and deal with refresh errors.

@vverman vverman requested review from a team January 25, 2026 21:35
@product-auto-label product-auto-label bot added the size: xl Pull request size is extra large. label Jan 25, 2026
@vverman vverman requested review from lqiu96 and nbayati January 25, 2026 21:49
@lqiu96 lqiu96 changed the base branch from feat-tb-sa to feat/agentic-identities-cloudrun February 3, 2026 19:16
@lqiu96 lqiu96 requested a review from a team as a code owner February 3, 2026 19:16
@vverman vverman changed the base branch from feat/agentic-identities-cloudrun to feat-tb-sa February 6, 2026 22:40
Comment on lines 225 to +243
@@ -193,33 +228,41 @@ static TrustBoundary refresh(
.setInitialIntervalMillis(OAuth2Utils.INITIAL_RETRY_INTERVAL_MILLIS)
.setRandomizationFactor(OAuth2Utils.RETRY_RANDOMIZATION_FACTOR)
.setMultiplier(OAuth2Utils.RETRY_MULTIPLIER)
.setMaxElapsedTimeMillis(maxRetryElapsedTimeMillis)
.build();

HttpUnsuccessfulResponseHandler unsuccessfulResponseHandler =
new HttpBackOffUnsuccessfulResponseHandler(backoff);
new HttpBackOffUnsuccessfulResponseHandler(backoff)
.setBackOffRequired(
response -> {
int statusCode = response.getStatusCode();
return statusCode == 500
|| statusCode == 502
|| statusCode == 503
|| statusCode == 504;
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq, since the server-side will still make the RAB call if we have failure (e.g. no header is attached), can we have this call not have any retries and max duration of like 1-2s?

The reason I'm thinking this way is that the async RAB call uses ForkJoinPool's commonPool. I think we should try to time bound the IO requests that go to that executor pool since that is a JDK executor pool that is shared with the customer's application.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, while requests will succeed without the RAB header, they are expected to encounter significant latency, hence we retry only on transient server errors which are expected to be short lived.

We'd like to ensure that there are as few requests as possible that go without the x-allowed-locations header.

@lqiu96 lqiu96 changed the title feat: Regional Access Boundary Update feat: Async Refresh for Regional Access Boundaries Feb 10, 2026
@lqiu96
Copy link
Member

lqiu96 commented Feb 10, 2026

Thanks for the clarification @vverman! Would it be possible to split the stable RAB + seeding logic into a separate PR?

@vverman vverman requested review from lqiu96 and nbayati February 24, 2026 22:30
public static final String X_ALLOWED_LOCATIONS_HEADER_KEY = "x-allowed-locations";
private static final long serialVersionUID = -2428522338274020302L;

static final String ENABLE_EXPERIMENT_ENV_VAR = "GOOGLE_AUTH_TRUST_BOUNDARY_ENABLE_EXPERIMENT";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the env var altogether since this is going to a feature branch and we don't need the feature gate for the release.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I will leave this comment open and make the change once the logic is good to go.

* Checks if the trust boundary feature is enabled based on an environment variable. The feature
* is enabled if the environment variable is set to "true" or "1" (case-insensitive). Any other
* value, or if the variable is unset, will result in the feature being disabled.
* Checks if the regional access boundary feature is enabled. The feature is enabled if the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my other comment, I think we can remove this method along with all env var related code, as they shouldn't be present in the final code being merged to main.

…now considers expiryTime as opposed to start time. Minor fixes.
…case of async token refresh) or instantiate a new thread.
Comment on lines +142 to +148
if (executor != null) {
executor.execute(refreshTask);
} else {
Thread refreshThread = new Thread(refreshTask, "RAB-refresh-thread");
refreshThread.setDaemon(true);
refreshThread.start();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment about this that explains our decisions for this? I think would make sense given the context and constraints we have but may not be super clear to someone reading without knowing our discussions.

e.g. RAB refresh is async and ensured to be at most once 6 hours... intend for this to not block the commonPool if we use completeableAsync, etc.

// Atomically check if a refresh is already running. If compareAndSet returns true,
// this thread "won the race" and is responsible for starting the background task.
// All other concurrent threads will return false and exit immediately.
if (refreshFuture.compareAndSet(null, future)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq, can you remind me again if we can just set tthe refresh future just be an atomic boolean? Is there a need for it to be a CompletableFuture?

clock.currentTimeMillis() + INITIAL_COOLDOWN_MILLIS, INITIAL_COOLDOWN_MILLIS);
} else {
long nextDuration = Math.min(current.durationMillis * 2, MAX_COOLDOWN_MILLIS);
next = new CooldownState(clock.currentTimeMillis() + nextDuration, nextDuration);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq, do we want the next cooldown duration to be from when the call was made or from when the previous cooldown was expired?

e.g.
Call 1 - Time: 0min, cooldown 15 min (error)
Call 2 - Time 30min, cooldown 30 min (error)

I believe this would set the cooldown time to be at 1h. Should be 1hr or 45 min?

Comment on lines +1151 to +1163
List<String> authHeaders = requestMetadata.get(AuthHttpConstants.AUTHORIZATION);
if (authHeaders != null && !authHeaders.isEmpty()) {
// Extract the token value to trigger a background Regional Access Boundary refresh.
String authHeader = authHeaders.get(0);
if (authHeader.startsWith(AuthHttpConstants.BEARER + " ")) {
String tokenValue = authHeader.substring((AuthHttpConstants.BEARER + " ").length());
// Use a null expiration as JWTs are short-lived anyway.
AccessToken wrappedToken = new AccessToken(tokenValue, null);
// Self-signed JWT do not go through the typical OAuth2Credentials.getRequestMetadata()
// flow.
// We explicitly trigger it here to ensure the RAB cache is populated/maintained.
refreshRegionalAccessBoundaryIfExpired(uri, wrappedToken, null);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can we move this logic to a helper? This would be specifcally for SSJWT

@vverman vverman requested a review from nbayati March 9, 2026 20:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: xl Pull request size is extra large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants