Resolve race condition when starting AsyncSniffer #4912
+1
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When starting an
AsyncSniffer, one can provide astarted_callbackwhich is called when the sniffer starts sniffing. After this callback has been called it can reasonably be assumed that the sniffer is indeed running and hence can be stopped.We have existing automated tests using
AsyncSniffervia a wrapping context manager, for convenience of writing those tests: when entering the context the sniffer is started and when leaving the context again the sniffer is stopped. The context manager waits until the callback is called before finishing its__enter__, with the callback signalling aBarrierto inform the context manager about this.In particularly short tests, which almost immediately close the sniffer again, it turned out that this can cause the sniffer to hang or exceptions to be raised. This happens when the following events occurs, in this very order:
AsyncSnifferAsyncSnifferAsyncSnifferthread setsself.continue_snifftoTrueNote the thread switches: steps 1, 2 and 6 execute on the
AsyncSnifferthread, steps 3 through 5 on the test's primary thread.We can hit this race condition with fair reproducibility (tens of percents - no exact numbers known). Calling
started_callbackonly after settingself.continue_sniffmakes this entirely stable.I have not included a test for this, as it is impossible hit this race condition anywhere near reliably (we have exacerbating conditions in our particular setup).