@ppgranger
Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

cc @GWeale

Description of the change:

Problem:
When an agent with a custom BaseLlm implementation is deployed using adk deploy agent_engine, the deployment succeeds, but querying the deployed agent fails with:
    Agent Engine Error: Default method query not found. Available methods are:
    ['async_delete_session', 'get_session', 'delete_session', 'create_session',
    'async_create_session', 'async_search_memory', 'async_get_session', 'list_sessions',
    'async_list_sessions', 'async_add_session_to_memory']

The same agent works correctly:

  • In ADK Playground (adk web agents)
  • When deployed inline with agent_engines.create() where all code is in a single file

Root Cause:
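cloudpickle serializes classes that live in importable modules by reference (module path plus class name) instead of by value (the full class definition). When Agent Engine deserializes the agent at runtime, it can't find the custom class because the import path doesn't match the deployed module structure. Classes defined in __main__ are pickled by value, which would explain why the single-file inline deployment works.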
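
The by-reference behaviour can be reproduced without Agent Engine. The sketch below uses the standard-library pickle module as a stand-in (cloudpickle applies the same by-reference rule to classes in importable modules); the module name adk_demo_clients and the CustomLlm class are made up for illustration.

```python
import pickle
import sys
import types

MODULE_NAME = "adk_demo_clients"  # hypothetical module name, illustration only

# Build a throwaway importable module that defines a class.
module = types.ModuleType(MODULE_NAME)
exec("class CustomLlm:\n    model = 'demo'\n", module.__dict__)
sys.modules[MODULE_NAME] = module

# By-reference pickling stores the module and class names, not the class body.
payload = pickle.dumps(module.CustomLlm())
stores_reference_only = MODULE_NAME.encode() in payload

# Simulate a runtime in which that import path does not exist.
del sys.modules[MODULE_NAME]
load_error = None
try:
    pickle.loads(payload)
except ImportError as exc:
    load_error = exc

print(stores_reference_only, type(load_error).__name__)
```

The payload is valid, but it can only be loaded where the same import path resolves to the same class.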

Solution:
Use cloudpickle.register_pickle_by_value() in the generated Agent Engine app template to force value-based serialization for the agent module and all its submodules. This ensures custom BaseLlm classes are serialized with their full definition, making them available at Agent Engine runtime.
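
As a rough illustration of the effect (not the generated template itself), the sketch below registers a throwaway module for pickle-by-value, serializes an instance, removes the module, and still loads the payload. It assumes cloudpickle 2.0 or later is installed; the module name and the CustomLlm class are hypothetical.

```python
import pickle
import sys
import types

try:
    import cloudpickle  # third-party; register_pickle_by_value needs cloudpickle >= 2.0
except ImportError:  # lets this sketch load where cloudpickle is absent
    cloudpickle = None

MODULE_NAME = "adk_demo_clients_by_value"  # hypothetical module name


def roundtrip_without_module():
    """Pickle an instance by value, drop its module, then load it again."""
    module = types.ModuleType(MODULE_NAME)
    exec(
        "class CustomLlm:\n"
        "    def describe(self):\n"
        "        return 'custom-llm'\n",
        module.__dict__,
    )
    sys.modules[MODULE_NAME] = module
    cloudpickle.register_pickle_by_value(module)
    try:
        payload = cloudpickle.dumps(module.CustomLlm())
    finally:
        cloudpickle.unregister_pickle_by_value(module)
        del sys.modules[MODULE_NAME]
    # The module is no longer importable, yet the payload carries the class.
    return pickle.loads(payload).describe()


if cloudpickle is not None:
    print(roundtrip_without_module())
```

Registration only affects how objects are dumped; the loading side needs cloudpickle installed but not the original module.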

Additionally, add pre-deployment validation to catch issues early with clear error messages.

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

======================== 39 passed, 1 warning in 1.26s =========================

New tests added:

TestValidateAgentImport (10 tests):

  • test_skips_config_agents - Config agents skip validation
  • test_raises_on_missing_agent_module - Error when agent.py is missing
  • test_raises_on_missing_export - Error when root_agent/app export is missing
  • test_success_with_root_agent_export - Success with root_agent
  • test_success_with_app_export - Success with app
  • test_raises_on_import_error - Helpful message on ImportError
  • test_raises_on_basellm_import_error - Specific guidance for BaseLlm errors
  • test_raises_on_syntax_error - Error on syntax errors
  • test_cleans_up_sys_modules - Cleanup verification
  • test_restores_sys_path - sys.path restoration verification

TestValidateAgentObject (6 tests):

  • test_skips_app_export - Skips validation for 'app' exports
  • test_warns_on_non_baseagent - Warns for non-BaseAgent objects
  • test_skips_string_models - Skips validation when model is a string
  • test_validates_custom_basellm_serialization - Validates serializable custom BaseLlm
  • test_raises_on_non_serializable_custom_basellm - Raises on non-serializable custom BaseLlm
  • test_skips_builtin_models - Skips check for built-in ADK models

TestAgentEngineAppTemplate (5 tests):

  • test_template_includes_cloudpickle_imports - Template imports cloudpickle and sys
  • test_template_registers_agent_module_for_pickle_by_value - Registers agent module
  • test_template_registers_submodules_for_pickle_by_value - Registers submodules (clients/, tools/)
  • test_template_handles_non_registerable_modules - Handles non-registerable modules gracefully
  • test_template_skips_cloudpickle_for_config_agents - Config agents skip cloudpickle registration

TestCloudpickleSerializationFix (2 tests):

  • test_custom_basellm_in_submodule_can_be_serialized - Custom BaseLlm in submodule serializes correctly
  • test_agent_with_custom_basellm_can_be_serialized - Agent with custom BaseLlm serializes correctly

Manual End-to-End (E2E) Tests:

Test setup with custom BaseLlm in submodule:
agents/
├── __init__.py
├── agent.py
├── clients/
│   ├── __init__.py
│   └── custom_client.py  # Custom BaseLlm implementation
└── requirements.txt

| Scenario | Before Fix | After Fix |
| --- | --- | --- |
| adk deploy agent_engine | ✅ Deploys | ✅ Deploys |
| agent.stream_query() | ❌ "query method not found" | ✅ Works correctly |
| agent.async_stream_query() | ❌ "query method not found" | ✅ Works correctly |

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Changes made:

  1. _AGENT_ENGINE_APP_TEMPLATE - Added cloudpickle registration:
    import sys

    import cloudpickle

    from . import agent as _agent_module

    # Pickle the agent module by value so its classes travel with the payload.
    cloudpickle.register_pickle_by_value(_agent_module)
    # Also register submodules (clients/, tools/, etc.) of the same package.
    _package_prefix = _agent_module.__name__.rsplit('.', 1)[0] + '.'
    for name, module in list(sys.modules.items()):
      if module is not None and name.startswith(_package_prefix):
        try:
          cloudpickle.register_pickle_by_value(module)
        except Exception:
          pass  # skip modules that cannot be registered
  2. _validate_agent_import() - Pre-deployment validation that checks:
    - Agent module exists and can be imported
    - Expected export (root_agent or app) is present
    - Provides specific guidance for BaseLlm-related import errors
  3. _validate_agent_object() - Deep validation that checks:
    - Custom BaseLlm implementations can be serialized by cloudpickle
    - Warns about module path structure for proper relative imports
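
To make the submodule selection in item 1 concrete, here is a small standalone sketch of the prefix test. The helper name modules_to_register and the sample module names are hypothetical; only the rsplit/startswith logic mirrors the template.

```python
def modules_to_register(agent_module_name, loaded_module_names):
    """Return loaded module names under the agent module's parent package."""
    package_prefix = agent_module_name.rsplit(".", 1)[0] + "."
    return sorted(
        name for name in loaded_module_names if name.startswith(package_prefix)
    )


# Sample names standing in for the keys of sys.modules.
loaded = [
    "agents",
    "agents.agent",
    "agents.clients",
    "agents.clients.custom_client",
    "agentsmith.core",
    "json",
]
selected = modules_to_register("agents.agent", loaded)
print(selected)
```

The trailing dot in the prefix keeps look-alike packages such as agentsmith out of the selection. It also means the bare package name (agents here) is not matched by the loop.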

Adds pre-deployment validation to `adk deploy agent_engine` to catch
import errors (especially with custom BaseLlm implementations) before
deployment. This provides clearer error messages and prevents deployments
that would fail at runtime with cryptic "query method not found" errors.

The validation:
- Checks that agent.py exists and can be imported
- Verifies the expected export (root_agent or app) is present
- Provides specific guidance for BaseLlm-related import errors
- Properly cleans up sys.path and sys.modules after validation
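
The checks listed above could look roughly like the following. This is a simplified sketch, not the PR's _validate_agent_import: the function name check_agent_import, its error types, and the demo package name are assumptions made for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def check_agent_import(agent_dir, exports=("root_agent", "app")):
    """Import <agent_dir>/agent.py and return the name of the export it defines."""
    agent_dir = Path(agent_dir).resolve()
    if not (agent_dir / "agent.py").is_file():
        raise FileNotFoundError(f"agent.py not found in {agent_dir}")

    package = agent_dir.name
    saved_path = list(sys.path)
    sys.path.insert(0, str(agent_dir.parent))
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(f"{package}.agent")
        for name in exports:
            if hasattr(module, name):
                return name
        raise AttributeError(
            f"{package}.agent must define one of: {', '.join(exports)}"
        )
    finally:
        # Undo side effects whether or not the import succeeded.
        # Note: this also evicts a same-named package imported earlier.
        sys.path[:] = saved_path
        for name in list(sys.modules):
            if name == package or name.startswith(package + "."):
                del sys.modules[name]


# Usage against a throwaway package (demo_agent_pkg is a made-up name).
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_agent_pkg"
    pkg.mkdir()
    (pkg / "agent.py").write_text("root_agent = object()\n")
    found = check_agent_import(pkg)

print(found)
```

Doing the cleanup in a finally block means a failing import leaves sys.path and sys.modules in the same state as a successful one.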

Fixes google#4208

Fixes the "query method not found" error when deploying agents with
custom BaseLlm implementations via `adk deploy agent_engine`.

Root cause: cloudpickle serializes imported classes by reference (import
path) instead of by value (full code). When Agent Engine deserializes
the agent at runtime, it can't find the custom class because the import
path doesn't match.

The fix:
- Uses cloudpickle.register_pickle_by_value() to force value-based
  serialization for the agent module and all its submodules
- This ensures custom BaseLlm classes are serialized with their full
  definition, making them available at Agent Engine runtime

Also includes pre-deployment validation:
- Checks that agent.py exists and can be imported
- Verifies the expected export (root_agent or app) is present
- Validates custom BaseLlm implementations can be serialized
- Provides specific guidance for BaseLlm-related errors
- Properly cleans up sys.path and sys.modules after validation
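
A serialization pre-check of this kind can be sketched as below. It is an illustration only: check_serializable is a hypothetical helper, it defaults to the standard-library pickle.dumps (cloudpickle.dumps could be passed instead), and the SimpleNamespace objects stand in for model instances.

```python
import pickle
import threading
from types import SimpleNamespace


def check_serializable(obj, dumps=pickle.dumps):
    """Raise ValueError with guidance if obj cannot be serialized."""
    try:
        dumps(obj)
    except Exception as exc:  # pickling failures surface as several types
        raise ValueError(
            f"{type(obj).__name__} cannot be serialized ({exc}). "
            "Create clients, locks, and connections lazily instead of "
            "storing them on the instance."
        ) from exc


# Stand-ins for model objects: one holds plain config, one holds a lock.
plain_model = SimpleNamespace(endpoint="https://example.invalid")
eager_model = SimpleNamespace(lock=threading.Lock())

check_serializable(plain_model)  # passes silently

message = None
try:
    check_serializable(eager_model)
except ValueError as exc:
    message = str(exc)

print(message)
```

Running the check before deployment turns a runtime failure into an immediate error that names the offending object.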

Fixes google#4208
@gemini-code-assist

Summary of Changes

Hello @ppgranger, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical deployment issue for agents utilizing custom BaseLlm implementations within the Agent Engine. Previously, such deployments would fail at runtime due to cloudpickle's default serialization behavior. The core change ensures that custom LLM classes are serialized by value, making them properly accessible in the deployed environment. Furthermore, the PR significantly enhances the developer experience by integrating robust pre-deployment validation, which proactively identifies and guides users through common configuration and serialization problems, preventing failed deployments and providing actionable feedback.

Highlights

  • Custom BaseLlm Serialization Fix: Implemented a fix to ensure custom BaseLlm implementations are correctly serialized by value using cloudpickle.register_pickle_by_value() within the Agent Engine app template. This resolves the 'query method not found' error encountered when deploying agents with custom LLM clients.
  • Pre-deployment Validation: Introduced new pre-deployment validation steps (_validate_agent_import and _validate_agent_object) to catch common issues early. This includes checking for agent module importability, presence of expected exports, and serializability of custom BaseLlm instances, providing clear error messages and guidance.
  • Enhanced Error Messaging: Improved error messages for import failures, especially those related to BaseLlm, offering specific advice on module structure, dependencies, and serialization requirements.
  • Comprehensive Unit Testing: Added extensive unit tests (23 new tests across four new test classes) to cover the new validation logic and confirm the cloudpickle serialization fix, ensuring robustness and preventing regressions.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces crucial pre-deployment validation for Agent Engine deployments, particularly addressing issues with custom BaseLlm implementations and cloudpickle serialization. The changes include modifying the Agent Engine app template to register modules for pickle-by-value serialization and adding robust validation functions (_validate_agent_import and _validate_agent_object) to catch common errors early. The accompanying unit tests are comprehensive and cover various edge cases, ensuring the reliability of the new validation logic and the serialization fix. Overall, these changes significantly improve the developer experience by providing clearer error messages and preventing runtime failures.

Address code review feedback:
- Add logger.debug for module registration failures in generated
  Agent Engine app template to aid debugging
- Add logger.debug when ADK imports fail in _validate_agent_object
  to clarify why validation is skipped

Labels

tools [Component] This issue is related to tools

Development

Successfully merging this pull request may close these issues.

adk deploy agent_engine with custom BaseLlm but query methods not registered - "Default method query not found"
