@jonathanmetzman
Collaborator

No description provided.

Added a mechanism to check current load (queued and scheduled jobs) in GCP Batch regions.
The service now avoids scheduling tasks in regions with a high number of pending jobs
(currently thresholded at 5). This is configurable via `queue_check_regions` in
`batch.yaml`.

This helps in distributing the load more evenly and avoiding delays when some
regions are heavily utilized.
Added a mechanism to check current load (queued and scheduled jobs) in GCP Batch regions.
The service now avoids scheduling tasks in regions with a high number of pending jobs
(currently thresholded at 50). This is configurable via `queue_check_regions` in
`batch.yaml`.

Includes comprehensive unit tests for the load-balancing logic and API interaction.
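
For illustration, the load check described in this commit message could look roughly like the sketch below. This is not the code from the PR: `pick_region`, the `get_load` callable, the `>=` comparison, and the fallback when every region is busy are all assumptions. Only the threshold of 50 and the `queue_check_regions` setting come from the description above.

```python
import random

MAX_QUEUE_SIZE = 50  # Threshold named in the commit message.


def pick_region(candidates, queue_check_regions, get_load,
                max_queue_size=MAX_QUEUE_SIZE):
  """Returns a region whose pending-job count is below the threshold.

  Hypothetical helper. `candidates` must be non-empty, and `get_load` is any
  callable mapping a region name to its number of queued/scheduled jobs.
  """
  eligible = []
  for region in candidates:
    # Only regions listed in queue_check_regions are load-checked.
    if region in queue_check_regions and get_load(region) >= max_queue_size:
      continue
    eligible.append(region)
  # Assumption: if every region is busy, fall back to all candidates rather
  # than refusing to schedule.
  return random.choice(eligible or list(candidates))
```

With loads of 80 and 3 for two checked regions, this sketch would always pick the second one. A region that is not listed in `queue_check_regions` is never queried and stays eligible.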
# Conflicts:
#	src/clusterfuzz/_internal/batch/service.py
#	src/clusterfuzz/_internal/tests/core/batch/batch_service_test.py
…e O(N^2) bottleneck and fix bug filing backlog."

This reverts commit ad01cdd.
@@ -1,8 +1,8 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0
Collaborator Author

This was done by Gemini, but I think I'm reverting it to the correct form.

Collaborator

@ViniciustCosta ViniciustCosta left a comment

LGTM with nits.

# See https://cloud.google.com/batch/quotas#job_limits
MAX_CONCURRENT_VMS_PER_JOB = 1000

MAX_QUEUE_SIZE = 50
Collaborator

Maybe this could be configurable per project?
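
One possible shape for that, as a sketch only: look the threshold up in the project's config and fall back to the module constant. The `max_queue_size` key and the `get_max_queue_size` helper are hypothetical, and `project_config` is assumed to be a plain dict parsed from `batch.yaml`.

```python
MAX_QUEUE_SIZE = 50  # Module-level default from this PR.


def get_max_queue_size(project_config):
  """Returns the per-project threshold, or the default if it is unset.

  Hypothetical helper: `project_config` is assumed to be a dict and
  `max_queue_size` is an invented key name.
  """
  value = project_config.get('max_queue_size')
  # Compare against None so that an explicit 0 is still honored.
  return MAX_QUEUE_SIZE if value is None else int(value)
```

Projects that do not set the key keep today's behavior, so the change would be backward compatible.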


if region in queue_check_regions:
  load = get_region_load(project, region)
  logs.info(f'Region {region} has {load} queued/scheduled jobs.')
Collaborator

nit: maybe return something like -1 when get_region_load() fails, and check for it before logging, to avoid the confusion of reporting that the region has 0 queued jobs.
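
A rough sketch of what I mean, not a drop-in patch: the `LOAD_UNKNOWN` name and both helpers are made up for illustration, and the standard `logging` module stands in for the `logs` module used in the diff.

```python
import logging

LOAD_UNKNOWN = -1  # Hypothetical sentinel for "could not query the region".


def describe_region_load(region, load):
  """Builds the log message, distinguishing a failed lookup from zero load."""
  if load == LOAD_UNKNOWN:
    return f'Could not determine load for region {region}.'
  return f'Region {region} has {load} queued/scheduled jobs.'


def log_region_load(region, load, logger=None):
  """Logs a warning on a failed lookup and an info message otherwise."""
  logger = logger or logging.getLogger(__name__)
  message = describe_region_load(region, load)
  if load == LOAD_UNKNOWN:
    logger.warning(message)
  else:
    logger.info(message)
  return message
```

Any caller that compares the load against the threshold would also need to check for the sentinel first, since -1 would otherwise look like an idle region.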
