Skip to content

[improve][test] Speed up KeySharedSubscriptionTest from ~10min to ~3min#25445

Closed
merlimat wants to merge 1 commit intoapache:masterfrom
merlimat:improve/speed-up-key-shared-subscription-test
Closed

[improve][test] Speed up KeySharedSubscriptionTest from ~10min to ~3min#25445
merlimat wants to merge 1 commit intoapache:masterfrom
merlimat:improve/speed-up-key-shared-subscription-test

Conversation

@merlimat
Copy link
Copy Markdown
Contributor

Motivation

KeySharedSubscriptionTest was taking ~10 minutes to execute, mostly due to
unnecessary Thread.sleep calls, generous receive timeouts, and redundant
PulsarClient instance creation.

Modifications

Replace artificial delays with condition-based waits or remove them entirely:

  • Remove Thread.sleep(100) × 100 iterations in producer loop of
    testStickyKeyRangesRestartConsumers (~10s saved)
  • Remove Thread.sleep(3000/5000/2000) calls, replace with Awaitility
    conditions where needed (~25s saved)
  • Reduce ack timeout from 3s to 1s in createConsumer
  • Reduce TTL from 3s to 1s in testContinueDispatchMessagesWhenMessageTTL
  • Reduce deliverAfter from 10s to 1s in
    testContinueDispatchMessagesWhenMessageDelayed
  • Reduce receive() timeouts from 1-3s to 200-500ms for negative assertions
    and "no more messages" probing
  • Use shared pulsarClient in testMakingProgressWithSlowerConsumer instead
    of creating 10 separate PulsarClient instances (~20s saved per variant)
  • Call redeliverUnacknowledgedMessages() in receiveAndCheck for immediate
    redelivery instead of waiting for ack timeout

Verifying this change

All 105 tests in KeySharedSubscriptionTest pass (0 failures, 3 skipped).
Total execution time reduced from ~10 minutes to ~3 minutes.

Documentation

  • doc-not-needed

Matching PR in forked repository

No response

Copy link
Copy Markdown
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should split out very slow tests to a regression test suite that we only run in a scheduled job?
Some of the sleeps were required to trigger race conditions that reproduced real production issues. Without them, the tests won't catch regressions of these fixes in the future.

Comment on lines -1805 to -1807
// this sleep is to trigger a race condition that happens
// when there are some messages that cannot be dispatched while consuming
Thread.sleep(3000);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the sleeps were required to trigger race conditions that reproduced real production issues. Without them, the tests won't catch regressions of these fixes in the future. Another option is to move very slow tests to only run in a scheduled job on the master branch to catch such regressions. That approach might be useful for some cases.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not ideal.. though let me put the sleep back

remainingMessageValues.add(i);
}

Thread.sleep(2 * pauseTime);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were also needed to reproduce the issue.

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Mar 31, 2026
@merlimat merlimat force-pushed the improve/speed-up-key-shared-subscription-test branch from 0d098e6 to b9eb072 Compare March 31, 2026 17:52
Comment on lines +2284 to +2285
// Wait for messages to be dispatched and verify limit is respected
Awaitility.await().untilAsserted(checkLimit::run);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Awaitility here changes the original logic. The intention is to run the check several times and verify that the limit isn't exceeded. pauseTime is only 100ms so there shouldn't be a need to change the original logic.


Thread.sleep(pauseTime);
checkLimit.run();
Awaitility.await().untilAsserted(checkLimit::run);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Awaitility here changes the original logic. The intention is to run the check several times and verify that the limit isn't exceeded. pauseTime is only 100ms so there shouldn't be a need to change the original logic.

.subscriptionType(SubscriptionType.Key_Shared)
.messageListener((MessageListener<Integer>) (consumer1, msg) -> {
try {
Thread.sleep(random.nextInt(5));
Copy link
Copy Markdown
Member

@lhotari lhotari Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, In this particular case, the random delay triggered the problem. removing this will change the original intention. It's my bad that this hasn't been properly documented initially in test code. I'd assume that the PR has some description.
The impact to the test duration will be around 2000 ms due to this 0-4ms random delay for 1000 messages.

@merlimat
Copy link
Copy Markdown
Contributor Author

merlimat commented Apr 1, 2026

Closing for now

@merlimat merlimat closed this Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/test doc-not-needed Your PR changes do not impact docs ready-to-test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants