Fix another race condition that causes lost metrics#137
Fix another race condition that causes lost metrics#137pwinckles wants to merge 3 commits intoperformancecopilot:mainfrom
Conversation
This fixes another race condition that causes metrics to be lost when they are updated while the writer is stopped and after the monitorable has been re-added to the writer. In this case, the update needs to set the initial value on `PcpValueInfo` so that the correct value is written when the writer is started.
Adds a cache to the `MetricNameMapper` and `MetricNameValidator`. This improves performance of mapping metric names because a regex no longer has to be executed over and over again. The validator cache helps because the metric name no longer has to be encoded over and over again. Neither of these caches expire entries, but this is not an issue because it's currently not possible to remove monitorables from Parfait.
|
Any thoughts on a path forward here? |
|
@pwinckles it conceptually makes sense to me - I'm guessing retired life is treating @tallpsmith very well and we wont hear from him, so please go ahead and merge when you are happy with your PRs (I don't have the java experience to give you deep and meaningful reviews). |
Currently a glorified kids Uber driver: ...and general house-wife. Not easy to find a chunk of time to sit and concentrate at the moment! :( |
|
@tallpsmith Is this still on your radar? |
|
ok yeah totally dropped the ball (down a big hole) on this, I'll get on it, thanks. |
tallpsmith
left a comment
There was a problem hiding this comment.
just a few comments/questions there
| import java.util.Map; | ||
| import java.util.concurrent.ConcurrentHashMap; | ||
|
|
||
| public class CachingMetricNameMapper implements MetricNameMapper { |
There was a problem hiding this comment.
This seems like a good idea, but just wondering if there's backstory/observation on why this would be good performance improvement?
I mean, every time I see a cache now, I think of the phrase "There's only 2 hard things in Computer Science: Naming, Cache Invalidation, and off-by-one errors".
Was there something in the wild that you spotted that prompted adding this cache layer in? (I honestly can't see a problem with adding it here, but would prefer to know what the driver might be).
There was a problem hiding this comment.
@pwinckles are you able to put some sage words to this one ? (I forgot I even wrote this that long ago, but still valid questions? At the very least some documentation here as to the reasons this is highly useful.
There was a problem hiding this comment.
@tallpsmith Yeah, I was seeing reworld issues. Some of the software I work on runs in a resource constrained environment, and I noticed that if you have thousands of monitorables, then re-adding them in PcpMonitorBridge.startMonitoring() can take a very long time. For example, on one system I was seeing it take >20 seconds to add 40k monitorables. This related to parsing the metric names repeatedly.
Because it's not currently possible for metric to be removed from parfait after its been added, there's no risk of the cache getting out of sync or leaking. It's just the normal tradeoff of memory vs cpu.
I'll add a comment to the class later today.
| return initialValue; | ||
| } | ||
|
|
||
| public void setInitialValue(Object initialValue) { |
There was a problem hiding this comment.
Can we make this package-private at least? I don't want this to be public and get abused, I still like the idea of aiming to make these immutable again so limiting exposure by keeping this inside the package will protect us in the long run?
There was a problem hiding this comment.
I can make this package-private. As commented above, the main risk in regards to swapping in a new PcpValueInfo instance with an updated initial value is that perMetricByteBuffers is keyed off of it. Let me play around with it today and see if I can reasonably get it working though.
There was a problem hiding this comment.
That said, while I too value immutability, I don't think making initialValue mutable here is that bad for two reasons.
initialValueis only ever used once when the metric is first writtenPcpValueInfois already mutable due to tracking theoffset.
As such, I think I would be inclined to taking the path of least resistance here, and just making initialValue mutable, with the setter package-private, if that is okay with you.
This fixes another race condition that causes metrics to be lost when they are updated while the writer is stopped and after the monitorable has been re-added to the writer. In this case, the update needs to set the initial value on
PcpValueInfoso that the correct value is written when the writer is started.Additionally, while I was working on this, I noticed that if you have thousands of monitorables, then re-adding them in
PcpMonitorBridge.startMonitoring()can take a very long time. For example, on one system I was seeing it take >20 seconds to add 40k monitorables. To help alleviate this problem, I added a cache to the metric name mapper and validator.Resolves #132