
Conversation

@lwangverizon
Contributor

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

  • N/A (No existing issue)

2. Or, if no issue exists, describe the change:

Problem:
Event compaction was running synchronously and blocking runner.run_async() exit, causing significant delays on the frontend. When compaction was enabled, the async generator would not complete until compaction finished, which could take several seconds because compaction involves:

  • LLM API calls for event summarization (maybe_summarize_events) - typically taking 1-3 seconds per compaction
  • Database writes (append_event) - adding additional latency

Impact:
Even though all agent events had already been yielded to the frontend, the generator would not complete until compaction finished. This meant:

  • Frontend had to wait for compaction to complete before receiving the completion signal
  • User-perceived latency increased by the compaction duration (often 1-3+ seconds)
  • Poor user experience, especially noticeable in interactive applications
  • Compaction, intended as a background maintenance task, was blocking user-facing responses

Additional Issue:
Under high concurrency scenarios, there was no mechanism to limit concurrent compaction tasks, which could lead to:

  • Resource exhaustion (too many concurrent LLM API calls hitting rate limits)
  • Database connection pool exhaustion
  • Unbounded background task accumulation
  • Potential service degradation under load

Solution:
This PR makes event compaction truly non-blocking, eliminating the delay before runner.run_async() exits and improving frontend responsiveness.

  1. Made compaction non-blocking: Changed compaction from a synchronous await to a background task scheduled with asyncio.create_task(). This allows:

    • The generator to complete immediately after yielding all events
    • Frontend to receive the completion signal without waiting for compaction
    • Compaction to run asynchronously in the background without blocking user-facing responses
    • Performance improvement: Eliminates 1-3+ second delays caused by LLM calls during compaction
  2. Added concurrency control: Introduced a configurable max_concurrent_compactions parameter (default: 10) to the Runner class that uses a semaphore to limit concurrent compaction tasks. This prevents:

    • Resource exhaustion under high concurrency
    • LLM API rate limit violations
    • Database connection pool exhaustion
    • Unbounded background task accumulation
  3. Improved error handling: Wrapped compaction in comprehensive error handling so failures:

    • Don't crash the runner
    • Are logged appropriately for debugging
    • Don't affect user responses
  4. Updated documentation: Updated docstrings to accurately reflect that compaction runs asynchronously and no longer blocks generator completion.
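As a rough sketch of steps 1-3, the pattern looks like the following. The identifiers (`_compaction_semaphore`, `_compact_in_background`, the event names) and the sleep standing in for the LLM call are illustrative, not the actual symbols in runners.py:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

# One semaphore for the whole process so the concurrency limit is global
# (illustrative; the real implementation attaches this to the Runner class).
_compaction_semaphore = asyncio.Semaphore(10)

# Hold strong references so background tasks are not garbage collected
# before they finish.
_background_tasks: set = set()


async def _compact_in_background() -> None:
    """Run the slow compaction work under the shared semaphore."""
    async with _compaction_semaphore:
        try:
            await asyncio.sleep(0.01)  # stand-in for LLM summarization + DB writes
        except Exception:
            # Log and swallow: a compaction failure must never crash the
            # runner or affect user-facing responses.
            logger.exception('Event compaction failed')


async def run_async():
    """Yield all events, then schedule compaction without awaiting it."""
    for event in ('event_1', 'event_2', 'final'):
        yield event
    # Fire-and-forget: the generator completes immediately after this line.
    task = asyncio.create_task(_compact_in_background())
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
```

The `add_done_callback(discard)` pairing is the standard way to keep a reference to a fire-and-forget task (so the event loop cannot garbage-collect it mid-flight) while still letting it be released once it completes.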

Key Improvement:
The solution transforms compaction from a blocking operation (that delayed frontend responses) into a truly asynchronous background task, significantly improving application responsiveness while maintaining all compaction functionality.

The change is backward compatible (the default behavior suits most scenarios) while providing fine-grained control for production environments with different resource constraints.

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Test Results:

tests/unittests/test_runners.py::TestRunnerCompaction::test_max_concurrent_compactions_default_value PASSED
tests/unittests/test_runners.py::TestRunnerCompaction::test_max_concurrent_compactions_custom_value PASSED
tests/unittests/test_runners.py::TestRunnerCompaction::test_max_concurrent_compactions_shared_across_instances PASSED
tests/unittests/test_runners.py::TestRunnerCompaction::test_max_concurrent_compactions_validation PASSED
tests/unittests/test_runners.py::TestRunnerCompaction::test_compaction_runs_in_background_non_blocking PASSED
tests/unittests/test_runners.py::TestRunnerCompaction::test_compaction_semaphore_limits_concurrency PASSED
tests/unittests/test_runners.py::TestRunnerCompaction::test_compaction_error_does_not_block_generator PASSED
tests/unittests/test_runners.py::TestRunnerCompaction::test_compaction_not_run_when_config_missing PASSED

======================== 8 passed, 6 warnings in 2.14s =========================

Test Coverage:

  • Configuration: Default values, custom values, validation, shared semaphore
  • Non-blocking behavior: Generator completes before compaction finishes
  • Concurrency control: Semaphore limits concurrent compactions
  • Error handling: Compaction errors don't block generator
  • Edge cases: Missing config, timing verification
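One way to exercise the non-blocking guarantee is a timing-based check along these lines. This is a self-contained stub, not the actual code in test_runners.py; `slow_compaction` stands in for the real compaction work:

```python
import asyncio
import time


async def slow_compaction(record: list) -> None:
    """Stand-in for compaction: slow relative to event streaming."""
    await asyncio.sleep(0.2)
    record.append('compaction_done')


async def run_async(record: list):
    """Stub runner: yields its events, schedules compaction, exits at once."""
    yield 'event'
    task = asyncio.create_task(slow_compaction(record))
    record.append(task)  # keep a reference so the task is not collected


async def check_non_blocking() -> float:
    record: list = []
    start = time.monotonic()
    async for _ in run_async(record):
        pass
    elapsed = time.monotonic() - start
    # The generator has finished, but compaction has not run yet.
    assert 'compaction_done' not in record
    await asyncio.sleep(0.3)  # give the background task time to complete
    assert 'compaction_done' in record
    return elapsed
```

If the runner were still awaiting compaction, `elapsed` would be at least the 0.2 s compaction time; with the background-task version it is effectively zero.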

Manual End-to-End (E2E) Tests:

Setup:

  1. Create an app with event compaction enabled (in-memory services shown here so the snippet is self-contained):
from google.adk import Agent
from google.adk.apps import App
from google.adk.apps.app import EventsCompactionConfig
from google.adk.artifacts import InMemoryArtifactService
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService

# In-memory services are sufficient for local testing.
session_service = InMemorySessionService()
artifact_service = InMemoryArtifactService()

app = App(
    name='test_app',
    root_agent=Agent(model='gemini-2.0-flash', name='test_agent'),
    events_compaction_config=EventsCompactionConfig(
        compaction_interval=2,
        overlap_size=1,
    ),
)

# Test with the default limit
runner = Runner(
    app=app,
    session_service=session_service,
    artifact_service=artifact_service,
)

# Test with a custom limit
runner_custom = Runner(
    app=app,
    session_service=session_service,
    artifact_service=artifact_service,
    max_concurrent_compactions=5,
)

Manual Testing Steps:

  1. Non-blocking behavior: Run multiple invocations and observe that the generator completes immediately (within milliseconds) while compaction runs in the background. Verify frontend receives completion signal without delay.

  2. Concurrency limiting: Under high load (multiple concurrent requests), verify that compaction tasks are limited by the semaphore. Monitor resource usage (LLM API calls, DB connections) to ensure they don't exceed limits.

  3. Error handling: Simulate compaction failures (e.g., network errors) and verify that:

    • Generator still completes successfully
    • Errors are logged but don't crash the runner
    • User responses are not affected
  4. Configuration validation: Test invalid max_concurrent_compactions values (0, -1) and verify ValueError is raised.
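Step 4 amounts to a check like the following (`validate_max_concurrent_compactions` is an illustrative name, not the real API; the described behavior is simply "positive integer or ValueError"):

```python
def validate_max_concurrent_compactions(value: int) -> int:
    """Reject non-positive or non-integer limits, as described in step 4."""
    if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
        raise ValueError(
            f'max_concurrent_compactions must be a positive integer, got {value!r}'
        )
    return value
```

With this, values such as 0 and -1 raise ValueError while any positive integer is accepted.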

Expected Results:

  • Generator completes immediately after yielding all events
  • Compaction runs in background without blocking
  • Frontend receives responses without delay
  • Concurrent compactions are limited by semaphore
  • Errors are handled gracefully

Sample Code:
See contributing/samples/compaction_config_example/ for complete examples demonstrating the new features.

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

Additional context

Files Changed:

  • src/google/adk/runners.py: Added non-blocking compaction, semaphore-based concurrency control, and max_concurrent_compactions parameter
  • tests/unittests/test_runners.py: Added comprehensive test suite (TestRunnerCompaction class with 8 tests)
  • contributing/samples/compaction_config_example/: Added sample code demonstrating the new features

Key Implementation Details:

  1. Background Task: Compaction runs via asyncio.create_task() with error handling
  2. Semaphore: Class-level semaphore shared across all Runner instances for global concurrency control
  3. Default Value: 10 concurrent compactions (reasonable default for most scenarios)
  4. Validation: Parameter must be positive integer (raises ValueError if invalid)
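A plausible shape for details 2-4 together, with a class-level semaphore created lazily so it is built while an event loop exists (assumed names; the actual runners.py implementation may differ):

```python
import asyncio
from typing import Optional


class Runner:
    """Sketch of the concurrency-control plumbing only."""

    # Class-level: shared by every Runner instance in the process, so the
    # compaction limit is global rather than per instance.
    _compaction_semaphore: Optional[asyncio.Semaphore] = None
    _compaction_limit: int = 10  # default of 10 concurrent compactions

    def __init__(self, max_concurrent_compactions: int = 10):
        # Validation: must be a positive integer.
        if not isinstance(max_concurrent_compactions, int) or max_concurrent_compactions <= 0:
            raise ValueError('max_concurrent_compactions must be a positive integer')
        type(self)._compaction_limit = max_concurrent_compactions

    @classmethod
    def _get_semaphore(cls) -> asyncio.Semaphore:
        # Lazy creation: built on first use, then reused by all instances.
        if cls._compaction_semaphore is None:
            cls._compaction_semaphore = asyncio.Semaphore(cls._compaction_limit)
        return cls._compaction_semaphore
```

Because the semaphore lives on the class rather than the instance, ten Runners in one process still share a single global limit.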

Performance Impact:

  • Positive: Frontend responsiveness significantly improved (generator completes immediately)
  • Positive: Resource usage controlled under high concurrency
  • Neutral: Compaction still happens, just asynchronously
  • Minimal: Semaphore overhead is negligible

Backward Compatibility:

  • Fully backward compatible: Default behavior works for existing code
  • Optional parameter: max_concurrent_compactions is optional with sensible default
  • No API changes: Only adds optional parameter, doesn't change existing behavior

Production Recommendations:

  • Default (10): Works well for most scenarios
  • Low-traffic: 5-10 concurrent compactions
  • High-traffic: 15-20 concurrent compactions (ensure your infrastructure can handle the load)
  • Resource-constrained: 2-5 concurrent compactions

@gemini-code-assist
Contributor

Summary of Changes

Hello @lwangverizon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance and stability of the application by transforming event compaction from a blocking operation into an asynchronous background process. Previously, compaction, which involves LLM API calls and database writes, caused noticeable delays for users. The changes introduce concurrency control to manage resource usage under high load and robust error handling to prevent service disruptions, ultimately leading to a more responsive and resilient user experience.

Highlights

  • Non-blocking Event Compaction: Event compaction now runs as an asynchronous background task using asyncio.create_task(), preventing delays in frontend responses and allowing the generator to complete immediately.
  • Concurrency Control: A new max_concurrent_compactions parameter (default: 10) and a shared asyncio.Semaphore have been introduced to limit concurrent compaction tasks, mitigating resource exhaustion and API rate limit issues.
  • Improved Error Handling: Compaction failures are now gracefully handled and logged, ensuring they do not crash the runner or affect user-facing responses.
  • Updated Documentation: Docstrings for the run and run_async methods have been updated to accurately reflect the non-blocking nature of event compaction.


@adk-bot adk-bot added the core [Component] This issue is related to the core interface and implementation label Jan 30, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is a great improvement, making event compaction non-blocking and adding concurrency controls. The changes significantly enhance performance and robustness by moving the synchronous compaction process to a background task, managed by a semaphore to prevent resource exhaustion. The implementation is well-thought-out, with good error handling and comprehensive tests. I've included a few suggestions to further improve robustness, such as ensuring background tasks are not prematurely garbage collected and enhancing thread safety. Overall, this is an excellent contribution.

@lwangverizon lwangverizon marked this pull request as ready for review January 30, 2026 20:01
