[SYSTEMDS-2651] Replace fixed-sleep federated worker startup with Poll #2468

Open
Baunsgaard wants to merge 1 commit into apache:main from Baunsgaard:FederatedWorkerReadyPolling

Conversation

Contributor

@Baunsgaard Baunsgaard commented May 15, 2026

WIP

This PR replaces the fixed Thread.sleep with poll-based startup of federated workers in tests. The change helps our test suites avoid timeouts and failures caused by inconsistent federated worker launch times.

@github-project-automation github-project-automation Bot moved this to In Progress in SystemDS PR Queue May 15, 2026
@Baunsgaard Baunsgaard changed the title from "[SYSTEMDS-2651][] Replace fixed-sleep federated worker startup with Poll" to "[SYSTEMDS-2651] Replace fixed-sleep federated worker startup with Poll" May 15, 2026
…rtup

Replace fixed Thread.sleep after each federated worker start with TCP
port polling that returns as soon as the worker accepts a connection.
Add bulk helpers that spawn N workers in parallel and wait once for the
slowest to become ready, instead of summing per-worker waits.
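
For readers unfamiliar with the technique, here is a minimal sketch of TCP readiness polling with a shared deadline across several ports. It is not the code in this PR: the class name `WorkerReadyPoll`, the methods `waitForPort` and `waitForAll`, and the 200 ms connect timeout and 50 ms retry interval are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper, not the SystemDS test utility touched by this PR.
public class WorkerReadyPoll {

    /** Returns true once host:port accepts a TCP connection, false if timeoutMs elapses first. */
    static boolean waitForPort(String host, int port, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            try (Socket s = new Socket()) {
                // Assumed per-attempt connect timeout of 200 ms.
                s.connect(new InetSocketAddress(host, port), 200);
                return true; // the listener accepted the connection
            } catch (IOException e) {
                Thread.sleep(50); // not listening yet; retry after a short pause
            }
        }
        return false;
    }

    /**
     * Waits for every port against one shared deadline. If the workers were
     * started in parallel, ports that are already open return immediately,
     * so the total wait is roughly that of the slowest worker.
     */
    static boolean waitForAll(String host, int[] ports, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        for (int port : ports) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0 || !waitForPort(host, port, remainingMs))
                return false;
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        // Two local listeners stand in for federated workers.
        try (ServerSocket a = new ServerSocket(0); ServerSocket b = new ServerSocket(0)) {
            int[] ports = {a.getLocalPort(), b.getLocalPort()};
            System.out.println(waitForAll("127.0.0.1", ports, 2000));
        }
    }
}
```

The shared deadline is what keeps the bulk wait from summing per-worker waits: each port only gets whatever time is left, rather than a fresh full timeout.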

Cuts the federated CI total by ~7 min (-5%) vs main, with the biggest
wins in setup-heavy suites such as transform+fedplanner (-66%) and
codegen (-25%).

Closes apache#2468.
@Baunsgaard Baunsgaard force-pushed the FederatedWorkerReadyPolling branch from 8804921 to 0c830d4 Compare May 18, 2026 16:01
Labels

None yet

Projects

Status: In Progress

1 participant