Skip to content

CoalescePartitionsExec delays cancellation of child operators #22016

@Samyak2

Description

@Samyak2

Describe the bug

  • CoalescePartitionsExec tasks hold on to an Arc ref of the input plan (ref)
    • It doesn't actually need it after input.execute(..) except for printing the plan in debug logs
  • RepartitionExec has some state in the plan itself
    • One such state holds on to Arc refs of Arc<Vec<SpawnedTask<()>>> (ref)
    • Which means that the tasks are only cancelled once all Arc refs to the plan are dropped
    • So each layer of CoalescePartitionsExec-RepartitionExec delays cancellation of the query

To Reproduce

I have a reproducer here: Samyak2#1 (warning: mostly LLM-generated, but I have verified that it actually checks the correct thing)

Relevant parts of the output:

repartition_task_group=0 input_partition=0 kind=pull_from_input drop_elapsed_ms=68
repartition_task_group=1 input_partition=0 kind=pull_from_input drop_elapsed_ms=80
repartition_task_group=1 input_partition=1 kind=pull_from_input drop_elapsed_ms=85
output_partitions=32 input_rows_per_partition=1024000 all_repartition_operator_drop_elapsed_ms=80
all_repartition_task_drop_elapsed_ms=85
all_observed_drop_elapsed_ms=85

The cancellation is delayed by ~80ms due to CoalescePartitionsExec

Expected behavior

CoalescePartitionsExec should drop child plan early

Additional context

I have a fix for this. Will raise a PR soon.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No fields configured for Bug.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions