

@ppgranger

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

cc @GWeale

Description of the change:

Problem:
When deploying an agent with a custom BaseLlm implementation using adk deploy agent_engine, the deployment succeeds but querying the deployed agent fails with:
Agent Engine Error: Default method query not found. Available methods are:
['async_delete_session', 'get_session', 'delete_session', 'create_session',
'async_create_session', 'async_search_memory', 'async_get_session', 'list_sessions',
'async_list_sessions', 'async_add_session_to_memory']

The same agent works correctly:

  • In ADK Playground (adk web agents)
  • When deployed inline with agent_engines.create() where all code is in a single file

Root Cause:
cloudpickle serializes classes from importable modules by reference (module import path plus class name) instead of by value (the full class definition). When Agent Engine deserializes the agent at runtime, it cannot resolve the custom class because that import path does not exist in the deployed module structure.
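To make the root cause concrete, here is a minimal stdlib-only sketch. It uses the standard `pickle` module, which always pickles classes by reference, as a stand-in for cloudpickle's default behaviour for classes in importable modules. The module name `demo_custom_client` and the class `CustomLlm` are hypothetical. The payload records only the module path and class name, so loading fails as soon as that module cannot be imported, which is analogous to the Agent Engine runtime failing to resolve the agent's import path.

```python
import pickle
import sys
import types

# Hypothetical module standing in for clients/custom_client.py.
MODULE_NAME = "demo_custom_client"


def make_module(name: str) -> types.ModuleType:
    """Create an in-memory module that defines a hypothetical CustomLlm class."""
    module = types.ModuleType(name)
    exec(
        "class CustomLlm:\n"
        "    def __init__(self, model):\n"
        "        self.model = model\n",
        module.__dict__,
    )
    sys.modules[name] = module  # importable, as after a normal `import`
    return module


module = make_module(MODULE_NAME)
payload = pickle.dumps(module.CustomLlm("my-model"))

# The payload stores the class by reference: module path + class name.
print(MODULE_NAME.encode() in payload, b"CustomLlm" in payload)

# Simulate a runtime where that module path does not exist.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
except ModuleNotFoundError as exc:
    print("load failed:", exc)
```

`cloudpickle.register_pickle_by_value()` changes exactly this step for the registered modules: their classes are embedded in the payload instead of being looked up by import path at load time.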

Solution:
Use cloudpickle.register_pickle_by_value() in the generated Agent Engine app template to force value-based serialization for the agent module and all its submodules. This ensures custom BaseLlm classes are serialized with their full definition, making them available at Agent Engine runtime.

Additionally, add pre-deployment validation to catch issues early with clear error messages.

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

======================== 39 passed, 1 warning in 1.26s =========================

New tests added:

TestValidateAgentImport (10 tests):

  • test_skips_config_agents - Config agents skip validation
  • test_raises_on_missing_agent_module - Error when agent.py is missing
  • test_raises_on_missing_export - Error when root_agent/app export is missing
  • test_success_with_root_agent_export - Success with root_agent
  • test_success_with_app_export - Success with app
  • test_raises_on_import_error - Helpful message on ImportError
  • test_raises_on_basellm_import_error - Specific guidance for BaseLlm errors
  • test_raises_on_syntax_error - Error on syntax errors
  • test_cleans_up_sys_modules - Cleanup verification
  • test_restores_sys_path - sys.path restoration verification

TestValidateAgentObject (6 tests):

  • test_skips_app_export - Skips validation for 'app' exports
  • test_warns_on_non_baseagent - Warns for non-BaseAgent objects
  • test_skips_string_models - Skips validation when model is a string
  • test_validates_custom_basellm_serialization - Validates serializable custom BaseLlm
  • test_raises_on_non_serializable_custom_basellm - Raises on non-serializable custom BaseLlm
  • test_skips_builtin_models - Skips check for built-in ADK models

TestAgentEngineAppTemplate (5 tests):

  • test_template_includes_cloudpickle_imports - Template imports cloudpickle and sys
  • test_template_registers_agent_module_for_pickle_by_value - Registers agent module
  • test_template_registers_submodules_for_pickle_by_value - Registers submodules (clients/, tools/)
  • test_template_handles_non_registerable_modules - Handles non-registerable modules gracefully
  • test_template_skips_cloudpickle_for_config_agents - Config agents skip cloudpickle registration

TestCloudpickleSerializationFix (2 tests):

  • test_custom_basellm_in_submodule_can_be_serialized - Custom BaseLlm in submodule serializes correctly
  • test_agent_with_custom_basellm_can_be_serialized - Agent with custom BaseLlm serializes correctly
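The module-registration behaviour covered by TestAgentEngineAppTemplate (register the agent module, register its submodules, skip modules that cannot be registered) can be sketched without cloudpickle by injecting the registration function. `register_agent_modules` is a hypothetical helper, not an ADK name. In the generated template the injected callable would be `cloudpickle.register_pickle_by_value`, and failures are logged at debug level and skipped, mirroring the `except Exception as e:` / `_logger.debug()` pattern the author describes further down this thread.

```python
import logging
import types
from typing import Callable, Iterable, List, Tuple

_logger = logging.getLogger(__name__)


def register_agent_modules(
    agent_module_name: str,
    loaded_modules: Iterable[Tuple[str, object]],
    register: Callable[[types.ModuleType], None],
) -> List[str]:
    """Register the agent module and every module under its parent package.

    Hypothetical helper mirroring the template's prefix rule: for an agent
    module "agents.agent", anything named "agents.<...>" is registered.
    Returns the sorted names that were registered successfully.
    """
    prefix = agent_module_name.rsplit(".", 1)[0] + "."
    registered = []
    for name, module in loaded_modules:
        if not isinstance(module, types.ModuleType):
            continue  # sys.modules can hold None placeholders or non-modules
        if name != agent_module_name and not name.startswith(prefix):
            continue  # unrelated module (stdlib, site-packages, other packages)
        try:
            register(module)
            registered.append(name)
        except Exception as e:  # e.g. the registrar rejecting a module
            _logger.debug("Skipping %s for pickle-by-value: %s", name, e)
    return sorted(registered)


# Usage with fake modules and a fake registrar (no cloudpickle needed).
fake_modules = {
    name: types.ModuleType(name)
    for name in [
        "agents",
        "agents.agent",
        "agents.clients",
        "agents.clients.custom_client",
        "agents.broken",
        "agents_extra.tools",
        "json",
    ]
}


def fake_register(module: types.ModuleType) -> None:
    if module.__name__ == "agents.broken":
        raise ValueError("cannot register this module")


result = register_agent_modules("agents.agent", fake_modules.items(), fake_register)
print(result)
```

The trailing `"."` in the prefix keeps similarly named top-level packages such as `agents_extra` out of the selection; note that the parent package module itself (`agents`) does not match the prefix either. In the template the call would be roughly `register_agent_modules(_agent_module.__name__, list(sys.modules.items()), cloudpickle.register_pickle_by_value)`.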

Manual End-to-End (E2E) Tests:

Test setup with custom BaseLlm in submodule:
agents/
├── __init__.py
├── agent.py
├── clients/
│   ├── __init__.py
│   └── custom_client.py  # Custom BaseLlm implementation
└── requirements.txt

Scenario                      Before fix                     After fix
adk deploy agent_engine       ✅ Deploys                     ✅ Deploys
agent.stream_query()          ❌ "query method not found"    ✅ Works correctly
agent.async_stream_query()    ❌ "query method not found"    ✅ Works correctly

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Changes made:

  1. _AGENT_ENGINE_APP_TEMPLATE - Added cloudpickle registration:
    import sys

    import cloudpickle

    from . import agent as _agent_module

    # Serialize the agent module itself by value.
    cloudpickle.register_pickle_by_value(_agent_module)
    # Also register submodules of the agent's package (clients/, tools/, etc.)
    for name, module in list(sys.modules.items()):
      if module is not None and name.startswith(_agent_module.__name__.rsplit('.', 1)[0] + '.'):
        try:
          cloudpickle.register_pickle_by_value(module)
        except Exception:
          # Skip modules that cloudpickle refuses to register.
          pass
  2. _validate_agent_import() - Pre-deployment validation that checks:
    - Agent module exists and can be imported
    - Expected export (root_agent or app) is present
    - Provides specific guidance for BaseLlm-related import errors
  3. _validate_agent_object() - Deep validation that checks:
    - Custom BaseLlm implementations can be serialized by cloudpickle
    - Warns about module path structure for proper relative imports
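The model checks described in item 3 (and exercised by TestValidateAgentObject: skip string models, skip built-in models, raise on a non-serializable custom model) can be sketched as follows. This is a hypothetical illustration, not the PR's code: `check_model_serializable` and `ModelNotSerializableError` are invented names, stdlib `pickle` stands in for cloudpickle so the sketch runs without extra dependencies, and recognising built-in models by the `google.adk.` module prefix is an assumption.

```python
import pickle
import threading
import types
from typing import Any, Callable


class ModelNotSerializableError(ValueError):
    """Hypothetical error type raised by this sketch."""


# Assumption: built-in ADK model classes live under this package prefix.
BUILTIN_MODEL_PREFIX = "google.adk."


def check_model_serializable(
    agent: Any, dumps: Callable[[Any], bytes] = pickle.dumps
) -> str:
    """Return "skipped" or "ok", or raise if the agent's model cannot be dumped.

    `dumps` defaults to stdlib pickle so the sketch runs anywhere; the PR's
    check uses cloudpickle instead.
    """
    model = getattr(agent, "model", None)
    if model is None or isinstance(model, str):
        return "skipped"  # a plain model name needs no serialization check
    if type(model).__module__.startswith(BUILTIN_MODEL_PREFIX):
        return "skipped"  # built-in models are trusted
    try:
        dumps(model)
    except Exception as e:
        raise ModelNotSerializableError(
            f"Custom model {type(model).__qualname__} cannot be serialized: {e}. "
            "Avoid storing locks, sockets or live clients on the instance."
        ) from e
    return "ok"


# Usage: SimpleNamespace objects stand in for agents and custom models.
print(check_model_serializable(types.SimpleNamespace(model="some-model-name")))
print(check_model_serializable(
    types.SimpleNamespace(model=types.SimpleNamespace(endpoint="https://example.invalid"))))
try:
    check_model_serializable(
        types.SimpleNamespace(model=types.SimpleNamespace(lock=threading.Lock())))
except ModelNotSerializableError as err:
    print(err)
```

Injecting `dumps` keeps the check testable; passing `cloudpickle.dumps` instead would match the serializer Agent Engine relied on at the time.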

@gemini-code-assist
Contributor

Summary of Changes

Hello @ppgranger, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical deployment issue for agents utilizing custom BaseLlm implementations within the Agent Engine. Previously, such deployments would fail at runtime due to cloudpickle's default serialization behavior. The core change ensures that custom LLM classes are serialized by value, making them properly accessible in the deployed environment. Furthermore, the PR significantly enhances the developer experience by integrating robust pre-deployment validation, which proactively identifies and guides users through common configuration and serialization problems, preventing failed deployments and providing actionable feedback.

Highlights

  • Custom BaseLlm Serialization Fix: Implemented a fix to ensure custom BaseLlm implementations are correctly serialized by value using cloudpickle.register_pickle_by_value() within the Agent Engine app template. This resolves the 'query method not found' error encountered when deploying agents with custom LLM clients.
  • Pre-deployment Validation: Introduced new pre-deployment validation steps (_validate_agent_import and _validate_agent_object) to catch common issues early. This includes checking for agent module importability, presence of expected exports, and serializability of custom BaseLlm instances, providing clear error messages and guidance.
  • Enhanced Error Messaging: Improved error messages for import failures, especially those related to BaseLlm, offering specific advice on module structure, dependencies, and serialization requirements.
  • Comprehensive Unit Testing: Added extensive unit tests (23 new tests across four new test classes) to cover the new validation logic and confirm the cloudpickle serialization fix, ensuring robustness and preventing regressions.


Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces crucial pre-deployment validation for Agent Engine deployments, particularly addressing issues with custom BaseLlm implementations and cloudpickle serialization. The changes include modifying the Agent Engine app template to register modules for pickle-by-value serialization and adding robust validation functions (_validate_agent_import and _validate_agent_object) to catch common errors early. The accompanying unit tests are comprehensive and cover various edge cases, ensuring the reliability of the new validation logic and the serialization fix. Overall, these changes significantly improve the developer experience by providing clearer error messages and preventing runtime failures.

@ryanaiagent ryanaiagent self-assigned this Jan 22, 2026
@ryanaiagent ryanaiagent added agent engine [Component] This issue is related to Vertex AI Agent Engine and removed tools [Component] This issue is related to tools labels Jan 23, 2026
@ryanaiagent
Collaborator

Hi @ppgranger, thank you for your contribution! We appreciate you taking the time to submit this pull request. Could you fix the formatting errors so we can proceed with the review? You can use autoformat.sh.

@ppgranger
Author

Hi @ryanaiagent, Done! I've fixed the formatting errors and the logger pattern. Ready for review.

@ppgranger
Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces crucial fixes and validations for deploying agents with custom BaseLlm implementations to Agent Engine. The core problem of cloudpickle serializing by reference instead of by value is addressed by explicitly registering modules for pickle-by-value serialization within the generated app template. Additionally, comprehensive pre-deployment validation has been added to catch common issues early, providing clear and actionable error messages to the user. The extensive test suite thoroughly covers various scenarios, ensuring the robustness and correctness of the new logic. This is a well-executed and important change that significantly improves the developer experience for ADK users.

@ppgranger
Author

Fixed the failing test: the assertions were checking for "except Exception:" and "pass", but the template actually uses "except Exception as e:" with "_logger.debug()". Updated the test to match the implementation.

@ryanaiagent ryanaiagent added the needs review [Status] The PR/issue is awaiting review from the maintainer label Jan 26, 2026
@ryanaiagent
Collaborator

Hi @ppgranger, your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.

@ryanaiagent
Collaborator

Hi @GWeale, can you please review this?

Add pre-deployment validation to `adk deploy agent_engine` that catches
import errors before deployment. This provides clearer error messages
and prevents deployments that would fail at runtime.

The validation:
- Checks that agent.py exists and can be imported
- Verifies the expected export (root_agent or app) is present
- Provides specific guidance for BaseLlm-related import errors
- Properly cleans up sys.path and sys.modules after validation

Note: The cloudpickle.register_pickle_by_value() workaround has been
removed as Agent Engine now uses source-based deployment via
source_packages, which uploads the full source code directly.
This makes pickle-based serialization unnecessary for custom classes.

Fixes google#4208
@ppgranger ppgranger force-pushed the fix/agent-engine-custom-basellm-validation branch from 4d8f631 to 44ad42c Compare January 27, 2026 21:15
@ppgranger
Author

ppgranger commented Jan 27, 2026

Hi @yeesian @GWeale,

I've simplified this PR significantly after investigating ADK 1.23 and rebasing the code on it.

I removed the cloudpickle.register_pickle_by_value() workaround from the template. After reviewing the current main branch, I noticed that Agent Engine has switched to source-based deployment via source_packages, which uploads the full source code directly instead of relying on pickle serialization.

Still in this PR:

  • _validate_agent_import(): Pre-deployment validation that catches import errors (missing dependencies, syntax errors, missing exports) before deployment, providing clearer error messages instead of cryptic runtime failures
  • Specific guidance for BaseLlm-related import errors
  • Proper cleanup of sys.path and sys.modules after validation

This validation is still valuable because it fails fast with actionable error messages, rather than letting users discover issues only after deployment.
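The checks that remain in the PR (agent.py exists and imports, a `root_agent` or `app` export is present, BaseLlm-specific guidance, and cleanup of `sys.path` and `sys.modules`) can be sketched as below. This is a hypothetical, simplified illustration rather than the actual `_validate_agent_import()`: `validate_agent_import`, `AgentImportError` and the hint wording are invented for the example, and it assumes the agent directory is importable as a package named after the directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path


class AgentImportError(RuntimeError):
    """Hypothetical error type raised by this sketch."""


def validate_agent_import(agent_dir: str) -> str:
    """Import <agent_dir>/agent.py as <package>.agent and return the export found.

    Raises AgentImportError with an actionable message, and always restores
    sys.path and sys.modules so validation leaves no trace in the CLI process.
    """
    agent_path = Path(agent_dir).resolve()
    if not (agent_path / "agent.py").is_file():
        raise AgentImportError(f"No agent.py found in {agent_path}")

    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path.insert(0, str(agent_path.parent))
    importlib.invalidate_caches()  # pick up directories created after start-up
    try:
        try:
            module = importlib.import_module(f"{agent_path.name}.agent")
        except SyntaxError as e:
            raise AgentImportError(f"Syntax error in agent code: {e}") from e
        except ImportError as e:
            hint = "Add missing packages to requirements.txt."
            if "BaseLlm" in str(e):
                hint = ("Check that the module defining your custom BaseLlm is "
                        "inside the agent directory and its dependencies are "
                        "listed in requirements.txt.")
            raise AgentImportError(f"Could not import agent: {e}. {hint}") from e
        for export in ("root_agent", "app"):
            if hasattr(module, export):
                return export
        raise AgentImportError("agent.py must define `root_agent` or `app`.")
    finally:
        sys.path[:] = saved_path
        for name in set(sys.modules) - saved_modules:
            del sys.modules[name]


# Usage: validate a throwaway agent package created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_agent_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "agent.py").write_text("root_agent = object()\n")
    print(validate_agent_import(str(pkg)))  # name of the export that was found
    print("demo_agent_pkg" in sys.modules)  # whether the package is still loaded
```

Restoring state in a `finally` block is what lets the CLI run this check in-process without later imports picking up a stale copy of the user's agent package.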


Labels

agent engine [Component] This issue is related to Vertex AI Agent Engine needs review [Status] The PR/issue is awaiting review from the maintainer


Development

Successfully merging this pull request may close these issues.

adk deploy agent_engine with custom BaseLlm but query methods not registered - "Default method query not found"

3 participants