new argument: drain_backlog_during_run (default: True) in file_list_pipeline #11
Draft
sfc-gh-kdintakurthi wants to merge 1 commit into CrunchyData:main from
Conversation
… one can control whether to drain the backlog in the current run or not
sfc-gh-mslot reviewed Apr 2, 2026
    max_batch_size int default 100,
    schedule text default '*/15 * * * *',
    execute_immediately bool default true,
    drain_backlog_during_run bool default true)
Collaborator
could we have something like "max_batches_per_run" and default to -1? (no limit)
Gives a bit more flexibility and I think "draining" and "backlog" are not explicitly defined concepts.
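The reviewer's alternative could be sketched as a change to the parameter list above (a hypothetical fragment; `max_batches_per_run` is the reviewer's suggested name, and the surrounding signature is assumed from the diff, not merged code):

    max_batch_size int default 100,
    schedule text default '*/15 * * * *',
    execute_immediately bool default true,
    max_batches_per_run int default -1)  -- -1 = no limit: drain the whole backlog

With an integer cap, `max_batches_per_run => 1` would reproduce the "don't drain" behavior, while intermediate values let a run do bounded work per transaction.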
Current behavior: when file_list_pipeline is scheduled to run, it goes through all files in the backlog (in both batch and non-batch mode) and drains them in a single transaction.
The new parameter drain_backlog_during_run defaults to true to retain this behavior.
If set to false, the run does not drain the backlog; remaining files are picked up in the next run. This keeps transaction sizes smaller and gives other pipelines a chance to do their processing.
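A usage sketch of the new argument (hypothetical: the exact function name, named-argument syntax, and the other parameters are assumed from the diff fragment in this PR, not from the merged API):

    -- Hypothetical invocation: skip the backlog drain each run,
    -- leaving remaining files for the next scheduled run.
    select file_list_pipeline(
        schedule => '*/15 * * * *',
        execute_immediately => true,
        drain_backlog_during_run => false);

Setting the flag to false trades end-to-end latency for smaller transactions: each scheduled run commits sooner, so concurrent pipelines are blocked for less time.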