How to implement LLM response caching? #4286
Replies: 1 comment 1 reply
Response from ADK Answering Agent (experimental, answer may be inaccurate)

TLDR: To implement LLM response caching, use a plugin with the before_model_callback and after_model_callback hooks.

Hello! Thank you for your question. Here is a more detailed explanation of how you can implement LLM response caching based on ADK's design patterns. The recommended approach is to use a plugin with before_model_callback, which checks the cache and can return a cached LlmResponse to skip the model call, and after_model_callback, which stores new responses.
Here is a conceptual example of a caching plugin (module paths and callback signatures follow recent ADK releases; verify them against your installed version):

```python
import hashlib
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_request import LlmRequest
from google.adk.models.llm_response import LlmResponse
from google.adk.plugins.base_plugin import BasePlugin


class CachingPlugin(BasePlugin):

    def __init__(self, cache):
        super().__init__(name="caching_plugin")
        self._cache = cache  # Your caching mechanism (e.g., a dictionary, Redis client).

    def _get_cache_key(self, llm_request: LlmRequest) -> str:
        # Create a stable key from the request contents.
        # A real implementation might need to be more robust, e.g. by also
        # hashing the model name and generation config.
        prompt = "".join(
            part.text or ""
            for content in llm_request.contents or []
            for part in content.parts or []
        )
        return hashlib.md5(prompt.encode()).hexdigest()

    async def before_model_callback(
        self, *, callback_context: CallbackContext, llm_request: LlmRequest
    ) -> Optional[LlmResponse]:
        cache_key = self._get_cache_key(llm_request)
        if cache_key in self._cache:
            print(f"Cache HIT for key: {cache_key}")
            return self._cache[cache_key]  # Returning a response skips the LLM call.
        print(f"Cache MISS for key: {cache_key}")
        return None  # Proceed with the LLM call.

    async def after_model_callback(
        self, *, callback_context: CallbackContext, llm_response: LlmResponse
    ) -> Optional[LlmResponse]:
        # This part is a bit tricky since the LlmRequest is not available here,
        # so the cache key cannot simply be re-generated from the request.
        # A robust solution stores the key during before_model_callback under
        # an identifier that is unique to that specific LLM call, to avoid
        # race conditions (a sketch of this pattern follows below).
        pass  # See the documentation for more advanced patterns.
```

For more details, you can refer to the official documentation on callbacks and plugins [1].
I hope this helps you build an effective caching mechanism!

This response was generated by an automated assistant.

[1] https://google.github.io/adk-docs/callbacks/
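To make the "store the key under a unique identifier" pattern from the example above concrete, here is a minimal sketch that tracks the cache key of each in-flight call on the plugin itself and writes the response back in after_model_callback. It is a sketch, not an official pattern: the class name is illustrative, and it assumes CallbackContext exposes invocation_id and agent_name and that plugin callbacks use the async, keyword-only signatures shown; check both against your installed ADK version.

```python
import hashlib
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_request import LlmRequest
from google.adk.models.llm_response import LlmResponse
from google.adk.plugins.base_plugin import BasePlugin


class KeyTrackingCachingPlugin(BasePlugin):
    """Caches LLM responses, tracking the cache key of each in-flight call."""

    def __init__(self, cache):
        super().__init__(name="key_tracking_caching_plugin")
        self._cache = cache
        # Maps (invocation_id, agent_name) -> cache key of the call in flight.
        # Each ParallelAgent branch runs as a different sub-agent, so
        # concurrent branches never share the same pair.
        self._pending_keys: dict[tuple[str, str], str] = {}

    def _get_cache_key(self, llm_request: LlmRequest) -> str:
        text = "".join(
            part.text or ""
            for content in llm_request.contents or []
            for part in content.parts or []
        )
        return hashlib.md5(text.encode()).hexdigest()

    async def before_model_callback(
        self, *, callback_context: CallbackContext, llm_request: LlmRequest
    ) -> Optional[LlmResponse]:
        cache_key = self._get_cache_key(llm_request)
        if cache_key in self._cache:
            return self._cache[cache_key]  # Cache hit: skip the model call.
        # Remember which key belongs to this specific call so that
        # after_model_callback can find it without the LlmRequest.
        call_id = (callback_context.invocation_id, callback_context.agent_name)
        self._pending_keys[call_id] = cache_key
        return None  # Cache miss: let the model call proceed.

    async def after_model_callback(
        self, *, callback_context: CallbackContext, llm_response: LlmResponse
    ) -> Optional[LlmResponse]:
        call_id = (callback_context.invocation_id, callback_context.agent_name)
        cache_key = self._pending_keys.pop(call_id, None)
        if cache_key is not None:
            self._cache[cache_key] = llm_response
        return None  # Keep the original response.
```

Keying the pending entry by (invocation_id, agent_name) avoids the ParallelAgent race because each parallel branch runs as a distinct sub-agent; if a single agent could have several model calls in flight at once, a dedicated per-call identifier would be needed instead.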
I see in the documentation that LLM response caching should be implemented using callbacks or plugins.
I wanted to write the LlmResponse to a cache with the LlmRequest as the key in after_model_callback, but LlmRequest is not available in that function.
My next idea was to save the LlmRequest to the callback context state in before_model_callback, but the problem is that this creates a race condition with ParallelAgent: a different LLM call can overwrite the state variable before after_model_callback is called for the intended LLM call.
Is there any example of how LLM response caching should be implemented?
Thank you for your help.
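For a concrete starting point, here is a minimal end-to-end sketch of wiring such a plugin into a runner, assuming the KeyTrackingCachingPlugin sketch above is in scope. The Runner plugins parameter, the async create_session call, and the agent definition are assumptions to verify against your installed ADK version:

```python
import asyncio

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

# Illustrative agent; KeyTrackingCachingPlugin is the sketch shown earlier.
root_agent = LlmAgent(
    name="assistant",
    model="gemini-2.0-flash",
    instruction="Answer briefly.",
)

session_service = InMemorySessionService()
runner = Runner(
    agent=root_agent,
    app_name="caching_demo",
    session_service=session_service,
    plugins=[KeyTrackingCachingPlugin(cache={})],  # `plugins` parameter assumed; verify.
)


async def main() -> None:
    message = types.Content(role="user", parts=[types.Part(text="Hello!")])
    # Two fresh sessions sending the same message produce identical request
    # contents, so the second call can be served from the cache.
    for _ in range(2):
        session = await session_service.create_session(
            app_name="caching_demo", user_id="user_1"
        )
        async for event in runner.run_async(
            user_id="user_1", session_id=session.id, new_message=message
        ):
            if event.content:
                print(event.content)


asyncio.run(main())
```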