[improve][test] Speed up KeySharedSubscriptionTest from ~10min to ~3min#25445
[improve][test] Speed up KeySharedSubscriptionTest from ~10min to ~3min#25445merlimat wants to merge 1 commit intoapache:masterfrom
Conversation
lhotari
left a comment
There was a problem hiding this comment.
Perhaps we should split out very slow tests to a regression test suite that we only run in a scheduled job?
Some of the sleeps were required to trigger race conditions that reproduced real production issues. Without them, the tests won't catch regressions of these fixes in the future.
pulsar-broker/src/test/java/org/apache/pulsar/client/api/KeySharedSubscriptionTest.java
Show resolved
Hide resolved
| // this sleep is to trigger a race condition that happens | ||
| // when there are some messages that cannot be dispatched while consuming | ||
| Thread.sleep(3000); |
There was a problem hiding this comment.
Some of the sleeps were required to trigger race conditions that reproduced real production issues. Without them, the tests won't catch regressions of these fixes in the future. Another option is to move very slow tests to only run in a scheduled job on the master branch to catch such regressions. That approach might be useful for some cases.
There was a problem hiding this comment.
Yes, it's not ideal.. though let me put the sleep back
| remainingMessageValues.add(i); | ||
| } | ||
|
|
||
| Thread.sleep(2 * pauseTime); |
There was a problem hiding this comment.
These were also needed to reproduce the issue.
0d098e6 to
b9eb072
Compare
| // Wait for messages to be dispatched and verify limit is respected | ||
| Awaitility.await().untilAsserted(checkLimit::run); |
There was a problem hiding this comment.
Using Awaitility here changes the original logic. The intention is to run the check several times and verify that the limit isn't exceeded. pauseTime is only 100ms so there shouldn't be a need to change the original logic.
|
|
||
| Thread.sleep(pauseTime); | ||
| checkLimit.run(); | ||
| Awaitility.await().untilAsserted(checkLimit::run); |
There was a problem hiding this comment.
Using Awaitility here changes the original logic. The intention is to run the check several times and verify that the limit isn't exceeded. pauseTime is only 100ms so there shouldn't be a need to change the original logic.
| .subscriptionType(SubscriptionType.Key_Shared) | ||
| .messageListener((MessageListener<Integer>) (consumer1, msg) -> { | ||
| try { | ||
| Thread.sleep(random.nextInt(5)); |
There was a problem hiding this comment.
IIRC, In this particular case, the random delay triggered the problem. removing this will change the original intention. It's my bad that this hasn't been properly documented initially in test code. I'd assume that the PR has some description.
The impact to the test duration will be around 2000 ms due to this 0-4ms random delay for 1000 messages.
|
Closing for now |
Motivation
KeySharedSubscriptionTestwas taking ~10 minutes to execute, mostly due tounnecessary
Thread.sleepcalls, generous receive timeouts, and redundantPulsarClientinstance creation.Modifications
Replace artificial delays with condition-based waits or remove them entirely:
Thread.sleep(100)× 100 iterations in producer loop oftestStickyKeyRangesRestartConsumers(~10s saved)Thread.sleep(3000/5000/2000)calls, replace with Awaitilityconditions where needed (~25s saved)
createConsumertestContinueDispatchMessagesWhenMessageTTLdeliverAfterfrom 10s to 1s intestContinueDispatchMessagesWhenMessageDelayedreceive()timeouts from 1-3s to 200-500ms for negative assertionsand "no more messages" probing
pulsarClientintestMakingProgressWithSlowerConsumerinsteadof creating 10 separate
PulsarClientinstances (~20s saved per variant)redeliverUnacknowledgedMessages()inreceiveAndCheckfor immediateredelivery instead of waiting for ack timeout
Verifying this change
All 105 tests in
KeySharedSubscriptionTestpass (0 failures, 3 skipped).Total execution time reduced from ~10 minutes to ~3 minutes.
Documentation
doc-not-neededMatching PR in forked repository
No response