feat: add configurable token cache for customer flow#133
Conversation
prashantrakheja
left a comment
There was a problem hiding this comment.
changes look good, some minor feedback, in particular i would like to remove usage of app_tid from token caching because its an optional parameter and likely to be removed going forward..
| Returns: | ||
| System-scoped access token. | ||
| """ | ||
| cached = cache.get_system_token(app_tid) |
There was a problem hiding this comment.
i am not sure if app_tid would be needed for fetching the tokens, is there any other way to do this?
| logger.debug("Loaded %d tool(s) from %s", len(server_tools), dep.ord_id) | ||
| except Exception: | ||
| except Exception as exc: | ||
| unwrapped = _unwrap_exception_group(exc) |
There was a problem hiding this comment.
do u feel the exception block is unusually big? can we put this stuff inside a method?
| ) | ||
| continue | ||
| except Exception: | ||
| logger.exception( |
There was a problem hiding this comment.
i feel we should not swallow the exception here, if there is a failure we should fail-fast, what's ur take?
| self._config = config or ClientConfig() | ||
| self._token_cache = _TokenCache(self._config) | ||
|
|
||
| @record_metrics(Module.AGENTGATEWAY, Operation.AGENTGATEWAY_CLEAR_TOKEN_CACHE) |
There was a problem hiding this comment.
this operation is "internal" to the functioning of the sdk, do u feel it makes sense to capture metrics for this?
There was a problem hiding this comment.
I didn't see value exposing this method
| self._config = config or ClientConfig() | ||
| self._token_cache = _TokenCache(self._config) | ||
|
|
||
| @record_metrics(Module.AGENTGATEWAY, Operation.AGENTGATEWAY_CLEAR_TOKEN_CACHE) |
There was a problem hiding this comment.
I didn't see value exposing this method
Description
Add in-process token cache for customer agent flow. Previously every
list_mcp_tools()/call_mcp_tool()call fetched fresh IAS token via mTLS - unnecessary latency in agentic loops.Changes:
_token_cache.py(new):_TokenCache- TTL + LRU eviction for system tokens (key:app_tid) and user tokens (key:sha256(user_jwt+"|"+app_tid)[:16]). Expiry is fromexpires_in,id_tokenexp claim, or fallback TTL._customer.py:get_system_token_mtls/exchange_user_tokento consult/populate cache. 401 response from MCP server → invalidate stale token + retry once.agw_client.py:AgentGatewayClientowns_TokenCache. Exposesclear_token_cache()for forced refresh (revoked creds, tenant change).config.py: 4 newClientConfigfields added -token_expiry_buffer_seconds(60 s),max_system_token_cache_size(10 s),max_user_token_cache_size(10 s),fallback_token_ttl_seconds(300 s).LoB flow unaffected - delegates to BTP Destination Service.
Type of Change
Please check the relevant option:
How to Test
Describe how reviewers can test your changes:
Checklist
Before submitting your PR, please review and check the following:
Breaking Changes
None. Cache internal to
AgentGatewayClient- existingcreate_client()calls get caching automatically.ClientConfignew fields all have defaults.Additional Notes
Thread safety: GIL makes individual
OrderedDictops atomic, but check-then-set is not. Concurrent coroutines on same key may both miss and both fetch - redundant requests, not corruption. Acceptable for agentic loop use case.401 retry: Both
get_mcp_tools_customerandcall_mcp_tool_customerinvalidate + retry once on 401, handling server-side revocation before token expiry.