
[Swarming] Push preprocess tasks to swarming queue #5282

Open
IvanBM18 wants to merge 6 commits into master from feature/swarming/swarming_cron_job_update



IvanBM18 (Collaborator) commented May 18, 2026

Overview

This change adds support for scheduling tasks to the new Swarming backend. Because Swarming uses a different execution model, and so that backpressure can be accounted for later, it requires its own preprocess queue and a much lower default target size (5) to prevent unbounded task queuing.

By refactoring the cron scheduling logic, we can now simultaneously feed both the Swarming and Batch environments at their respective ideal rates.

Changes

  • Add a distinct SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT, set to 5, to match the Swarming backend's task capacity.
  • Define Swarming-specific queue mappings (SWARMING_QUEUES).
  • Refactor schedule_fuzz so that the schedulers are composed of multiple helper functions instead of relying on complex inheritance:
    • Rename them to Providers, since they no longer schedule tasks; they only find them.
  • Update ChromeFuzzTaskScheduler to schedule Swarming tasks independently, alongside the standard Batch tasks.
    • It now also looks for Android jobs.
  • Update schedule_fuzz_test.py to match the new scheduler class signatures.
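The provider refactor can be pictured with a small, self-contained sketch: providers only find candidate tasks, and a single loop feeds each queue up to its own target size. This is an illustration under stated assumptions, not the PR's code: `QueueConfig`, `tasks_needed`, `schedule_all`, `make_provider` and the queue names are hypothetical; only the two target sizes (10000 and 5) come from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: a provider returns up to n candidate task ids and never
# schedules anything itself. Not ClusterFuzz's real signature.
Provider = Callable[[int], List[str]]


@dataclass
class QueueConfig:
  name: str  # hypothetical queue name
  target_size: int  # desired number of queued preprocess tasks
  provider: Provider


def tasks_needed(target_size: int, current_size: int) -> int:
  """Number of tasks to add so the queue reaches its target size."""
  return max(0, target_size - current_size)


def schedule_all(queues: List[QueueConfig],
                 current_sizes: Dict[str, int]) -> Dict[str, List[str]]:
  """Feeds each queue independently, up to its own target size."""
  scheduled = {}
  for queue in queues:
    count = tasks_needed(queue.target_size, current_sizes.get(queue.name, 0))
    scheduled[queue.name] = queue.provider(count)
  return scheduled


def make_provider(prefix: str) -> Provider:
  """Toy provider that generates placeholder task ids for illustration."""
  return lambda n: [f'{prefix}-task-{i}' for i in range(n)]


queues = [
    QueueConfig('batch-preprocess', 10000, make_provider('batch')),
    QueueConfig('swarming-preprocess', 5, make_provider('swarming')),
]
result = schedule_all(queues, {
    'batch-preprocess': 9998,
    'swarming-preprocess': 3
})
print({name: len(tasks) for name, tasks in result.items()})
# {'batch-preprocess': 2, 'swarming-preprocess': 2}
```

Keeping the per-queue target size in plain data, rather than in a scheduler subclass, is one way to feed both backends at different rates without inheritance.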

TODO

  • src/clusterfuzz/_internal/base/feature_flags.py: Set the Swarming preprocess target-size value based on dev and stage metrics and tests.
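A minimal sketch of the "default plus feature-flag override" pattern this TODO refers to, assuming the flag values are available as a plain mapping. The flag name `swarming_preprocess_target_size` and the helper `swarming_target_size` are hypothetical; the PR does not show the real flag name or ClusterFuzz's flag-lookup API.

```python
from typing import Mapping, Optional

SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT = 5

# Hypothetical flag name; the real one in feature_flags.py is not shown here.
SWARMING_TARGET_SIZE_FLAG = 'swarming_preprocess_target_size'


def swarming_target_size(flags: Mapping[str, Optional[int]]) -> int:
  """Returns the flag value if it is set, otherwise the hard-coded default."""
  value = flags.get(SWARMING_TARGET_SIZE_FLAG)
  if value is None:
    return SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT
  return value


print(swarming_target_size({}))  # 5
print(swarming_target_size({SWARMING_TARGET_SIZE_FLAG: 8}))  # 8
```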

IvanBM18 requested a review from a team as a code owner May 18, 2026 05:28
IvanBM18 force-pushed the feature/swarming/swarming_cron_job_update branch from 4993e5d to 99bf603 May 18, 2026 05:31
from clusterfuzz._internal.metrics import logs

PREPROCESS_TARGET_SIZE_DEFAULT = 10000
SWARMING_PREPROCESS_TARGET_SIZE_DEFAULT = 5
IvanBM18 (Author), May 18, 2026

The Swarming pool has a hard limit of 25 Linux bots, each running one task at a time.

  • At an average of 1 hour per fuzzing task, those 25 bots finish roughly 4 to 5 tasks every 10 minutes (25 tasks per hour ÷ 6), which is the interval at which the cron job runs.
  • Because the 2,000 preprocess tworkers (in prod) drain the preprocess queue almost instantly, the target size acts more as an injection rate than as a buffer.

So injecting 5 tasks every 10 minutes matches the expected Swarming throughput and prevents an ever-growing backlog of stale tasks. This is only the default value: the real value is managed through a feature flag, which we will later tune based on metrics and on how Swarming handles this workload, so that we arrive at a more accurate value.
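The throughput estimate above can be checked with a few lines of arithmetic. This sketch assumes every bot runs exactly one task at a time and stays busy; the function name is hypothetical, and the inputs (25 bots, 60-minute tasks, 10-minute cron interval) come from the comment.

```python
import math


def tasks_per_cron_interval(num_bots: int, avg_task_minutes: float,
                            cron_interval_minutes: float) -> float:
  """Expected number of tasks the pool finishes per cron run.

  Assumes each bot runs one task at a time and is never idle.
  """
  return num_bots * cron_interval_minutes / avg_task_minutes


rate = tasks_per_cron_interval(
    num_bots=25, avg_task_minutes=60, cron_interval_minutes=10)
print(round(rate, 2), math.ceil(rate))  # 4.17 5
```

Rounding the ~4.17 tasks per interval up gives the default target size of 5.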


I think we should prevent the ever-growing task queue by using CountTasks with a fairly low limit on the utask_main queue size. That would let us aim for a bit more saturation here (e.g. 10 tasks, so that we never expect the queue to be empty). This works for now, though.
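The suggested backpressure can be sketched as a top-up rule: count what is already pending for utask_main, then inject only enough to reach the cap. This is a hypothetical illustration. `tasks_to_inject` is not a ClusterFuzz function, and the pending count is assumed to come from the CountTasks call mentioned above, which is not shown.

```python
def tasks_to_inject(pending_main_tasks: int, queue_limit: int) -> int:
  """Tops the utask_main queue up to queue_limit, never above it.

  pending_main_tasks is assumed to come from Swarming's CountTasks;
  that call is not shown here.
  """
  return max(0, queue_limit - pending_main_tasks)


for pending in (0, 7, 10, 12):
  print(pending, tasks_to_inject(pending, queue_limit=10))
# 0 10
# 7 3
# 10 0
# 12 0
```

Unlike a fixed injection rate, this keeps the queue near the cap regardless of how fast the bots drain it.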

IvanBM18 self-assigned this May 18, 2026
IvanBM18 added the swarming label (Changes related to the clusterfuzz-swarming integration) May 18, 2026
fernandofloresg left a comment

LGTM, just had one question.

Comment thread src/clusterfuzz/_internal/base/feature_flags.py
IvanBM18 changed the title from "Add support for Swarming preprocess queue and task scheduling" to "[Swarming] Push & pull preprocess tasks to swarming queue" May 20, 2026
letitz left a comment

One big comment about reworking the main logic here.

Comment thread src/clusterfuzz/_internal/base/feature_flags.py
PREPROCESS_QUEUE_SIZE_LIMIT = 'preprocess_queue_size_limit'

SWARMING_REMOTE_EXECUTION = 'swarming_remote_execution'
# TODO(ibarba): Set this value based off dev & stage metrics and tests.

TODO should reference a bug link instead.

IvanBM18 (Author), May 21, 2026

I agree it's best to link to a bug. I previously asked Gemini whether we can add go/ links or bug references in open-source code and reviews on GitHub, and it basically said:

Throughout the repos you maintain (READMEs, code comments, etc.): Don't put any Google internal info, including go/ and b/ links. If you're sure the referenced content can be public, move it to GitHub issues or README or Wiki

I'm aware that AI can hallucinate and I'm not an open-source expert, but as far as I know, that's why this repo is full of #TODO(metzman): comments.

So, your call: can we include bug links here?


@javanlacerda or @ViniciustCosta would know, I'm sure!

Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py Outdated
Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py Outdated

IvanBM18 changed the title from "[Swarming] Push & pull preprocess tasks to swarming queue" to "[Swarming] Push preprocess tasks to swarming queue" May 21, 2026
IvanBM18 requested a review from letitz May 22, 2026 08:39
Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py Outdated
letitz left a comment

Looking good, a bunch of small comments left to address.

Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py
Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py Outdated
Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py Outdated
Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py Outdated
Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py Outdated
Comment thread src/clusterfuzz/_internal/cron/schedule_fuzz.py Outdated
Comment thread src/clusterfuzz/_internal/swarming/__init__.py Outdated
"""Returns True if the job environment contains swarming env vars."""
return bool(
job_environment and
(utils.string_is_true(job_environment.get('IS_SWARMING_JOB')) or

Do we use IS_SWARMING_JOB anywhere? Can we remove it for simplicity? (This can be done in a follow-up.)


Yes, we do. When we define a new job, we don't necessarily need specific Swarming dimensions for it, so in those cases we have been using IS_SWARMING_JOB instead.
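A self-contained sketch of the two opt-in paths described here: a job counts as a Swarming job either when IS_SWARMING_JOB is true or when it defines Swarming dimensions. The diff excerpt in this thread is truncated before its second condition, so the dimensions variable name below (`SWARMING_DIMENSIONS`) is hypothetical, and `string_is_true` is a simplified stand-in whose accepted values may differ from the real `utils.string_is_true`.

```python
from typing import Mapping, Optional

# Hypothetical env var name for Swarming dimensions; the real condition is
# truncated in the diff excerpt.
SWARMING_DIMENSIONS_VAR = 'SWARMING_DIMENSIONS'


def string_is_true(value: Optional[str]) -> bool:
  """Simplified stand-in for utils.string_is_true."""
  return value is not None and value.strip().lower() in ('1', 'true', 'yes')


def is_swarming_job(job_environment: Optional[Mapping[str, str]]) -> bool:
  """True if the job opts in via IS_SWARMING_JOB or defines dimensions."""
  return bool(job_environment and
              (string_is_true(job_environment.get('IS_SWARMING_JOB')) or
               bool(job_environment.get(SWARMING_DIMENSIONS_VAR))))


print(is_swarming_job({'IS_SWARMING_JOB': 'True'}))  # True
print(is_swarming_job({}))  # False
```

The IS_SWARMING_JOB path lets a new job run on Swarming without declaring any dimensions, which is the case described in the reply.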

Comment thread src/clusterfuzz/_internal/tests/appengine/handlers/cron/schedule_fuzz_test.py Outdated
Comment thread src/clusterfuzz/_internal/tests/core/swarming/swarming_test.py
IvanBM18 requested a review from letitz May 22, 2026 15:59

Labels

swarming Changes related to the clusterfuzz-swarming integration


3 participants