[improve][test] Speed up KeySharedSubscriptionTest from ~10min to ~3min#25445

Closed

merlimat wants to merge 1 commit intoapache:masterfrom

merlimat:improve/speed-up-key-shared-subscription-test

Contributor

merlimat commented Mar 31, 2026

Motivation

KeySharedSubscriptionTest was taking ~10 minutes to execute, mostly due to
unnecessary Thread.sleep calls, generous receive timeouts, and redundant
PulsarClient instance creation.

Modifications

Replace artificial delays with condition-based waits or remove them entirely:

Remove Thread.sleep(100) × 100 iterations in producer loop of
testStickyKeyRangesRestartConsumers (~10s saved)
Remove Thread.sleep(3000/5000/2000) calls, replace with Awaitility
conditions where needed (~25s saved)
Reduce ack timeout from 3s to 1s in createConsumer
Reduce TTL from 3s to 1s in testContinueDispatchMessagesWhenMessageTTL
Reduce deliverAfter from 10s to 1s in
testContinueDispatchMessagesWhenMessageDelayed
Reduce receive() timeouts from 1-3s to 200-500ms for negative assertions
and "no more messages" probing
Use shared pulsarClient in testMakingProgressWithSlowerConsumer instead
of creating 10 separate PulsarClient instances (~20s saved per variant)
Call redeliverUnacknowledgedMessages() in receiveAndCheck for immediate
redelivery instead of waiting for ack timeout

Verifying this change

All 105 tests in KeySharedSubscriptionTest pass (0 failures, 3 skipped).
Total execution time reduced from ~10 minutes to ~3 minutes.

Documentation

doc-not-needed

Matching PR in forked repository

No response

merlimat added area/test ready-to-test labels

lhotari reviewed

View reviewed changes

Member

lhotari left a comment

Perhaps we should split out very slow tests to a regression test suite that we only run in a scheduled job?
Some of the sleeps were required to trigger race conditions that reproduced real production issues. Without them, the tests won't catch regressions of these fixes in the future.

pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java Show resolved Hide resolved

pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java

Comment on lines -1805 to -1807

-                      // this sleep is to trigger a race condition that happens
-                      // when there are some messages that cannot be dispatched while consuming
-                      Thread.sleep(3000);

Member

lhotari Mar 31, 2026

Some of the sleeps were required to trigger race conditions that reproduced real production issues. Without them, the tests won't catch regressions of these fixes in the future. Another option is to move very slow tests to only run in a scheduled job on the master branch to catch such regressions. That approach might be useful for some cases.

Contributor Author

merlimat Mar 31, 2026

Yes, it's not ideal.. though let me put the sleep back

pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java

                           remainingMessageValues.add(i);
                       }
-                      Thread.sleep(2 * pauseTime);

Member

lhotari Mar 31, 2026

These were also needed to reproduce the issue.

github-actions bot added the doc-not-needed label


          [improve][test] Speed up KeySharedSubscriptionTest from ~10min to ~3min

b9eb072

merlimat force-pushed the improve/speed-up-key-shared-subscription-test branch from 0d098e6 to b9eb072 Compare

March 31, 2026 17:52

lhotari reviewed

View reviewed changes

pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java

Comment on lines +2284 to +2285

		// Wait for messages to be dispatched and verify limit is respected
		Awaitility.await().untilAsserted(checkLimit::run);

Member

lhotari Mar 31, 2026

Using Awaitility here changes the original logic. The intention is to run the check several times and verify that the limit isn't exceeded. pauseTime is only 100ms so there shouldn't be a need to change the original logic.

pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java

-                      Thread.sleep(pauseTime);
-                      checkLimit.run();
+                      Awaitility.await().untilAsserted(checkLimit::run);

Member

lhotari Mar 31, 2026

Using Awaitility here changes the original logic. The intention is to run the check several times and verify that the limit isn't exceeded. pauseTime is only 100ms so there shouldn't be a need to change the original logic.

pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java

                               .subscriptionType(SubscriptionType.Key_Shared)
                               .messageListener((MessageListener<Integer>) (consumer1, msg) -> {
-                                  try {
-                                      Thread.sleep(random.nextInt(5));

Member

lhotari Mar 31, 2026 •

edited

Loading

IIRC, In this particular case, the random delay triggered the problem. removing this will change the original intention. It's my bad that this hasn't been properly documented initially in test code. I'd assume that the PR has some description.
The impact to the test duration will be around 2000 ms due to this 0-4ms random delay for 1000 messages.

Contributor Author

merlimat commented Apr 1, 2026

Closing for now

merlimat closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/test doc-not-needed ready-to-test