Bug in SpecificDiscoveryStore breaks PubSub communication#294
Bug in SpecificDiscoveryStore breaks PubSub communication#294KonradBreitsprecherBkd wants to merge 7 commits intomainfrom
Conversation
Signed-off-by: Konrad Breitsprecher <Konrad.Breitsprecher@vector.com>
Signed-off-by: Konrad Breitsprecher <Konrad.Breitsprecher@vector.com>
Signed-off-by: Marius Börschig <Marius.Boerschig@vector.com>
Signed-off-by: Marius Börschig <Marius.Boerschig@vector.com>
Signed-off-by: Marius Börschig <Marius.Boerschig@vector.com>
a0df15e to
e4edc7f
Compare
fix data races Signed-off-by: Marius Börschig <Marius.Boerschig@vector.com>
reorder allReceived atomic bool Signed-off-by: Marius Börschig <Marius.Boerschig@vector.com>
| auto& not_label_nodes = keyNode.notLabelMap[l.key].nodes; | ||
|
|
||
| size_t relevantNodeCount = fit_nodes.size() + not_label_nodes.size(); | ||
| if (relevantNodeCount < matchCount) |
There was a problem hiding this comment.
So this just matches the last label to have any nodes in the cluster instead of the one with the least nodes?
There was a problem hiding this comment.
I think we need the matchCount condition and > 0
There was a problem hiding this comment.
I think we need the matchCount condition and > 0. The bug was that the outgreedylabel that was found here in the specific label setup (reproduced by the test) did not contain any handers (of the subscriber) and thus the follow up logic to finish the pubsub connection never happened.
Maybe the dict here should never have been populated to get into this situation, at least the >0 prevents the bug.
Also, for "symmetry reasons" there might be the same situation for mandatory Labels a few lines above.
To stir it up a little more, we might need at least one person who proudly says "I understand what's happening here" otherwise we have a black box algorithm in a central unit that was "introduced for performance reasons by a former colleague". If there is no xkcd we should make one...
There was a problem hiding this comment.
I think we need the matchCount condition and > 0. The bug was that the outgreedylabel that was found here in the specific label setup (reproduced by the test) did not contain any handers (of the subscriber) and thus the follow up logic to finish the pubsub connection never happened.
Maybe the dict here should never have been populated to get into this situation, at least the >0 prevents the bug.
Also, for "symmetry reasons" there might be the same situation for mandatory Labels a few lines above.
To stir it up a little more, we might need at least one person who proudly says "I understand what's happening here" otherwise we have a black box algorithm in a central unit that was "introduced for performance reasons by a former colleague". If there is no xkcd we should make one...
I fully agree - I think i'll try to make a clean room implementation of the label matching code next week.
This PR is no show stopper for the upcoming 5.0.3 release, though.
There was a problem hiding this comment.
That sounds great. You can use https://github.com/vectorgrp/sil-kit/blob/main/SilKit%2FIntegrationTests%2FFTest_PubSubPerf.cpp to check performance degradation. I think there's a commented test set for production that runs a while but pushes the pubsub code to it's limits..
There was a problem hiding this comment.
That sounds great. You can use https://github.com/vectorgrp/sil-kit/blob/main/SilKit%2FIntegrationTests%2FFTest_PubSubPerf.cpp to check performance degradation. I think there's a commented test set for production that runs a while but pushes the pubsub code to it's limits..
For a specific setup, the PubSub communication is failing. Minimal conditions to reproduce the bug are:
L1L1and the optional labelL2The bug can be traced back to the
SpecificDiscoveryStore. This is the lookup algorithm that is supposed to improve the matching of publishers and subscribers with labels.