We recently noticed that a heavy JMESPath workload was triggering a large number of garbage collection runs. We are using jmespath.compile(), and we tracked this down to the jmespath.visitor.TreeInterpreter that is created on every call to ParsedResult.search() (jmespath/parser.py, line 508 at commit bbe7300):
interpreter = visitor.TreeInterpreter(options)
It appears that TreeInterpreter creates a reference cycle, so the cyclic GC is triggered frequently to clean up the cycles. As far as I can tell, the problem comes from the Visitor._method_cache (jmespath/visitor.py, lines 91 to 93 at commit bbe7300):
method = getattr(
    self, 'visit_%s' % node['type'], self.default_visit)
self._method_cache[node_type] = method
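...which stores references to methods that are bound to self in a member of self, so each interpreter refers back to itself (self -> _method_cache -> bound method -> self) and can only be freed by the cyclic garbage collector, not by reference counting.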
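To illustrate the cycle in isolation, here is a minimal sketch that does not use jmespath at all. CachingVisitor is a hypothetical stand-in that only mimics the caching pattern quoted above; it is not the library's code. It checks whether an instance is still alive after the last name referring to it is deleted, and again after an explicit gc.collect().

```python
import gc
import weakref


class CachingVisitor:
    """Hypothetical stand-in that mimics the caching pattern quoted above."""

    def __init__(self):
        self._method_cache = {}

    def visit(self, node):
        node_type = node['type']
        method = self._method_cache.get(node_type)
        if method is None:
            method = getattr(self, 'visit_%s' % node_type, self.default_visit)
            # The bound method references self, and self owns the cache:
            # self -> _method_cache -> method -> self.
            self._method_cache[node_type] = method
        return method(node)

    def visit_field(self, node):
        return node['value']

    def default_visit(self, node):
        raise NotImplementedError(node['type'])


def survives_refcounting():
    """Return (alive_after_del, alive_after_gc) for one visitor instance."""
    gc.disable()  # keep automatic collections out of the measurement
    try:
        visitor = CachingVisitor()
        visitor.visit({'type': 'field', 'value': 'bar'})
        ref = weakref.ref(visitor)
        del visitor
        alive_after_del = ref() is not None
        gc.collect()
        alive_after_gc = ref() is not None
    finally:
        gc.enable()
    return alive_after_del, alive_after_gc


print(survives_refcounting())  # on CPython: (True, False)
```

On CPython the instance outlives the del and only disappears once the cycle collector runs, which matches the behaviour we see with TreeInterpreter.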
Possible solution
We worked around the problem by monkey patching ParsedResult so that it (1) caches a default_interpreter for use when options=None, and (2) uses it in search(). If I understand correctly, we could go further and use a global TreeInterpreter for all ParsedResult instances. The TreeInterpreter seems to be stateless apart from self._method_cache and that implementation seems to be thread-safe (with only the risk of multiple lookups for the same method in a multithreaded case).
I'd be happy to contribute a PR for either version if this would be welcome.
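For reference, the shape of the workaround looks roughly like the sketch below. It uses hypothetical stand-in classes (Interpreter, Compiled, search_fresh, search_cached, garbage_after are all made-up names) rather than patching jmespath itself, so it only illustrates the idea: create the default interpreter once and reuse it whenever options is None.

```python
import gc


class Interpreter:
    """Hypothetical stand-in for an interpreter that caches bound methods."""

    def __init__(self, options=None):
        self.options = options
        self._method_cache = {}

    def visit(self, node, value):
        method = self._method_cache.get(node['type'])
        if method is None:
            method = getattr(self, 'visit_%s' % node['type'])
            self._method_cache[node['type']] = method
        return method(node, value)

    def visit_field(self, node, value):
        return value.get(node['value'])


class Compiled:
    """Hypothetical stand-in for a compiled expression."""

    def __init__(self, parsed):
        self.parsed = parsed
        self._default_interpreter = Interpreter()  # created once, reused

    def search_fresh(self, value, options=None):
        # Original shape: a new interpreter (and a new cycle) on every call.
        return Interpreter(options).visit(self.parsed, value)

    def search_cached(self, value, options=None):
        # Workaround shape: reuse the cached interpreter when options is None.
        if options is None:
            interpreter = self._default_interpreter
        else:
            interpreter = Interpreter(options)
        return interpreter.visit(self.parsed, value)


def garbage_after(search, calls=1000):
    """Count unreachable objects the cycle collector finds after the calls."""
    gc.collect()   # start from a clean slate
    gc.disable()   # let garbage accumulate during the loop
    try:
        for _ in range(calls):
            search({'foo': 'bar'})
        return gc.collect()
    finally:
        gc.enable()


pattern = Compiled({'type': 'field', 'value': 'foo'})
print("fresh interpreter per call:", garbage_after(pattern.search_fresh))
print("cached default interpreter:", garbage_after(pattern.search_cached))
```

The cached interpreter still contains a cycle, but only one per compiled expression (or one overall in the global variant) instead of one per search() call.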
How to reproduce
The following reproducer shows the problem:
import jmespath
import gc
gc.set_debug(gc.DEBUG_COLLECTABLE)
pattern = jmespath.compile("foo")
value = {"foo": "bar"}
for _ in range(1000000):
    pattern.search(value)
...where the output contains one million repetitions of something like:
gc: collectable <TreeInterpreter 0x10f634fa0>
gc: collectable <dict 0x10f63e780>
gc: collectable <Options 0x10f634520>
gc: collectable <Functions 0x10f6345b0>
gc: collectable <method 0x10f63ee80>
gc: collectable <dict 0x10f63eb00>