
new argument: drain_backlog_during_run (default: True) in file_list_pipeline#11

Draft
sfc-gh-kdintakurthi wants to merge 1 commit into CrunchyData:main from sfc-gh-kdintakurthi:main

Conversation


@sfc-gh-kdintakurthi sfc-gh-kdintakurthi commented Apr 2, 2026

Current behavior: when file_list_pipeline is scheduled to run, it processes all pending files (in both batch and non-batch mode), draining the entire backlog in a single transaction.

The new parameter drain_backlog_during_run defaults to true, which retains this behavior.

If set to false, the pipeline does not drain the backlog; the remaining files are picked up in the next run. This keeps transaction sizes smaller and gives other pipelines a chance to do their processing.
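The behavior described above can be sketched as a simple run loop. This is an illustrative model only, not pg_incremental's actual implementation; the function and variable names here (run_pipeline, pending_files, the list-of-files stand-in for the real per-batch SQL work) are hypothetical.

```python
def run_pipeline(pending_files, max_batch_size, drain_backlog_during_run=True):
    """Sketch of one scheduled run.

    Returns (processed, remaining): the files handled in this run and the
    backlog left for the next scheduled run.
    """
    processed = []
    while pending_files:
        # Take the next batch of at most max_batch_size files.
        batch = pending_files[:max_batch_size]
        pending_files = pending_files[max_batch_size:]
        processed.extend(batch)  # stand-in for the real per-batch processing
        if not drain_backlog_during_run:
            # Stop after one batch; the rest of the backlog waits for the
            # next run, keeping this run's transaction small.
            break
    return processed, pending_files
```

For example, with 250 pending files and max_batch_size of 100, the default (draining) behavior processes all 250 in one run, while drain_backlog_during_run set to false processes 100 and leaves 150 for the next run.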

… one can control whether to drain the backlog in the current run or not
max_batch_size int default 100,
schedule text default '*/15 * * * *',
execute_immediately bool default true,
drain_backlog_during_run bool default true)
Collaborator


Could we have something like "max_batches_per_run", defaulting to -1 (no limit)?

That gives a bit more flexibility, and I think "draining" and "backlog" are not explicitly defined concepts.
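The reviewer's alternative could be sketched along the same lines. Again, this is a hypothetical illustration of the suggested semantics, not real pg_incremental code; run_pipeline_limited and its arguments are invented names. A boolean drain flag is just the two extreme cases of this parameter (1 and -1), which is why it is more flexible.

```python
def run_pipeline_limited(pending_files, max_batch_size, max_batches_per_run=-1):
    """Sketch of one scheduled run under a max_batches_per_run limit.

    max_batches_per_run = -1 means no limit: drain the whole backlog.
    Any positive value caps how many batches this run may process.
    """
    processed = []
    batches_done = 0
    while pending_files and (max_batches_per_run < 0
                             or batches_done < max_batches_per_run):
        batch = pending_files[:max_batch_size]
        pending_files = pending_files[max_batch_size:]
        processed.extend(batch)  # stand-in for the real per-batch processing
        batches_done += 1
    return processed, pending_files
```

With 250 pending files and max_batch_size of 100, a limit of 2 processes 200 files and leaves 50, while the default of -1 drains all 250.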

