Use private batch API to avoid infinite queueing #5140
base: master
Added a mechanism to check the current load (queued and scheduled jobs) in GCP Batch regions. The service now avoids scheduling tasks in regions with too many pending jobs (current threshold: 5). The regions to check are configured via `queue_check_regions` in `batch.yaml`. This spreads load more evenly and avoids delays when some regions are heavily utilized.
Added a mechanism to check the current load (queued and scheduled jobs) in GCP Batch regions. The service now avoids scheduling tasks in regions with too many pending jobs (current threshold: 50). The regions to check are configured via `queue_check_regions` in `batch.yaml`. Includes unit tests for the load-balancing logic and the API interaction.
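A minimal sketch of the region filtering described above, assuming the load of a region is a plain integer and that a region is skipped once its load reaches the threshold (whether the real check uses `>=` or `>` is not stated in the PR). `filter_regions_by_load` and its `get_load` callback are hypothetical names, not the actual ClusterFuzz functions; only `MAX_QUEUE_SIZE` and `queue_check_regions` come from the PR.

```python
from typing import Callable, List, Sequence

# Threshold value from the PR; the >= comparison below is an assumption.
MAX_QUEUE_SIZE = 50


def filter_regions_by_load(regions: Sequence[str],
                           queue_check_regions: Sequence[str],
                           get_load: Callable[[str], int],
                           max_queue_size: int = MAX_QUEUE_SIZE) -> List[str]:
  """Returns the regions that are still eligible for scheduling (hypothetical).

  Only regions listed in queue_check_regions are load-checked; all other
  regions are kept unconditionally and get_load is never called for them.
  """
  eligible = []
  for region in regions:
    if region in queue_check_regions and get_load(region) >= max_queue_size:
      continue  # Too many queued/scheduled jobs; skip this region.
    eligible.append(region)
  return eligible


if __name__ == '__main__':
  loads = {'us-central1': 80, 'europe-west1': 3}
  print(filter_regions_by_load(['us-central1', 'europe-west1', 'asia-east1'],
                               ['us-central1', 'europe-west1'],
                               loads.__getitem__))
  # → ['europe-west1', 'asia-east1']
```

Regions outside `queue_check_regions` are never queried, so the check can be rolled out to a few regions at a time.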
… bottleneck and fix bug filing backlog.
# Conflicts:
#   src/clusterfuzz/_internal/batch/service.py
#   src/clusterfuzz/_internal/tests/core/batch/batch_service_test.py
…e O(N^2) bottleneck and fix bug filing backlog." This reverts commit ad01cdd.
@@ -1,8 +1,8 @@
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0
This was done by Gemini, but I think I'm reverting it to the correct form.
ViniciustCosta left a comment
LGTM with nits.
# See https://cloud.google.com/batch/quotas#job_limits
MAX_CONCURRENT_VMS_PER_JOB = 1000

MAX_QUEUE_SIZE = 50
Maybe this could be a per-project config?
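One way the per-project suggestion could look, as a sketch only: the threshold is resolved from a per-project entry, then a global entry, then the current hard-coded default. The `projects` / `max_queue_size` keys are a hypothetical schema; the PR does not define where such a setting would live (for example, in `batch.yaml`).

```python
from typing import Any, Mapping

DEFAULT_MAX_QUEUE_SIZE = 50  # Current global value of MAX_QUEUE_SIZE.


def get_max_queue_size(project: str, batch_config: Mapping[str, Any]) -> int:
  """Resolves the queue-size threshold for a project (hypothetical schema).

  Lookup order: projects.<project>.max_queue_size, then a top-level
  max_queue_size, then DEFAULT_MAX_QUEUE_SIZE.
  """
  project_config = batch_config.get('projects', {}).get(project, {})
  if 'max_queue_size' in project_config:
    return int(project_config['max_queue_size'])
  return int(batch_config.get('max_queue_size', DEFAULT_MAX_QUEUE_SIZE))
```

Keeping the hard-coded value as the final fallback means existing deployments behave the same until a project opts in.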
if region in queue_check_regions:
  load = get_region_load(project, region)
  logs.info(f'Region {region} has {load} queued/scheduled jobs.')
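For context on what "load" means in the snippet above, here is a hedged sketch of the counting step: only jobs in the `QUEUED` or `SCHEDULED` state count, matching the PR description. The state names follow the Batch `JobStatus.State` enum, but the job records here are hypothetical dicts rather than real API objects, and `count_pending_jobs` is not the actual `get_region_load` implementation.

```python
from typing import Iterable, Mapping

# Per the PR description, only queued and scheduled jobs count as load.
PENDING_STATES = frozenset({'QUEUED', 'SCHEDULED'})


def count_pending_jobs(jobs: Iterable[Mapping[str, str]]) -> int:
  """Counts jobs whose 'state' is QUEUED or SCHEDULED (hypothetical records).

  Running and finished jobs are ignored, as are records with no state.
  """
  return sum(1 for job in jobs if job.get('state') in PENDING_STATES)
```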
nit: maybe return something like -1 when get_region_load() fails, and check for it before logging, so the log doesn't misleadingly say the region has 0 queued jobs.
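A sketch of the sentinel approach from this nit, under stated assumptions: `fetch_load` is a hypothetical stand-in for the real Batch API call, stdlib `logging` replaces ClusterFuzz's `logs` module, and treating an unknown load as "not overloaded" (fail open) is a design choice made here for illustration, not something the PR specifies.

```python
import logging
from typing import Callable

LOAD_UNKNOWN = -1  # Sentinel suggested in the review.


def get_region_load(project: str, region: str,
                    fetch_load: Callable[[str, str], int]) -> int:
  """Returns the queued/scheduled job count, or LOAD_UNKNOWN on failure."""
  try:
    return fetch_load(project, region)
  except Exception:  # pylint: disable=broad-except
    logging.exception('Failed to get load for region %s.', region)
    return LOAD_UNKNOWN


def region_is_overloaded(project: str, region: str,
                         fetch_load: Callable[[str, str], int],
                         max_queue_size: int = 50) -> bool:
  """Returns True if the region should be skipped (hypothetical helper)."""
  load = get_region_load(project, region, fetch_load)
  if load == LOAD_UNKNOWN:
    # Don't log "0 queued jobs" when the count is actually unknown.
    logging.warning('Load for region %s is unknown.', region)
    # Assumption: fail open so a transient API error doesn't block the region.
    return False
  logging.info('Region %s has %d queued/scheduled jobs.', region, load)
  return load >= max_queue_size
```

Failing closed (returning True on an unknown load) would be the safer choice if an unreachable API usually means the region itself is unhealthy.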
No description provided.