-
Notifications
You must be signed in to change notification settings - Fork 133
FIX: Update hash tables of affected profiler instances when rewriting code objects #351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #351 +/- ##
=======================================
Coverage 87.36% 87.36%
=======================================
Files 18 18
Lines 1630 1630
Branches 347 347
=======================================
Hits 1424 1424
Misses 151 151
Partials 55 55 Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
line_profiler/_line_profiler.pyx::LineProfiler.get_stats()
Updated/simplified implementation to aggregate profiling data
between code objects with duplicate hashes
line_profiler/_line_profiler.pyx::LineProfiler
_all_instances_by_bcodes
New private class attribute for keeping track of:
- Instances profiling the same bytecode
- Count for how the the bytecode should be padded
dupes_map
Now containing the up-to-date (instead of pre-padding) code objs
add_function()
Refactored implementation to:
- Pad non-Cython bytecodes according to
`._all_instances_by_bcodes` instead of the instance's
`.dupes_map`
- Use `.dupes_map` to identify profiler instances profiling the
same functions and update their hash tables
- Curate `.dupes_map` on said instances to stay up-to-date
line_profiler/_line_profiler.pyx::LineProfiler
add_function()
- Now identifying update targets based on function-object
identity instead of code-object identity (because different
function objects can share the same code object)
- Simplified implementation
_all_instances_by_bcodes
Superseded by other private class attributes:
- `._all_paddings`:
mapping from bytecodes to padding length
- `._all_instances_by_funcs`:
mapping from function identity to `WeakSet`s of instances
tests/test_line_profiler.py
test_multiple_profilers_identical_bytecode()
New test for applying multiple profilers to identical functions
in arbitrary order (one subtest currently failing)
line_profiler/_line_profiler.pyx::LineProfiler.add_function()
Now stripping padded code objects and re-padding when appropriate,
so that no two profiled code objects can end up with the same
bytecode
tests/test_line_profiler.py
test_multiple_profilers_identical_bytecode()
- Updated comments and docstring
- Added check against duplicate bytecodes
tests/test_line_profiler.py
get_prof_stats()
New helper function for formatting and retrieving
`LineProfiler.print_stats()` output
test_*_decorator()
test_profile_generated_code()
test_multiple_profilers_identical_bytecode()
Refactored to use `get_prof_stats()`
test_aggregate_profiling_data_between_code_versions()
New test checking that profiling data gathered before and after
a function's code object is rewritten are correctly aggregated
781c01e to
af098a2
Compare
Erotemic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For add_function we are assuming that the function actually hasn't been added before, so if we see the same bytes it must be a new instance of that function rather than the actual same function added more than once?
line_profiler/_line_profiler.pyx
Outdated
| base_co_code: bytes = co_code | ||
| # Figure out how much padding we need and strip the bytecode | ||
| # TODO: is there a way to do this faster? `.rstrip()` doesn't | ||
| # work (reliably) since `NOP_BYTES` should be 2-byte wide |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only alternative I can think of is a regex, but this logic is likely much faster (especially if it gets optimized into C). You could write it as a separate cdef function because it is fairly self-contained behavior. Maybe multibyte_rstrip?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the genned .cpp code:
- We grab
NOP_BYTESfrom the global scope, and do aPyObject_Length()to getnop_len. This may be optimized a bit by placinglen(NOP_BYTES)itself in the global namespace I guess; or maybe we can just hard-code to 2 seeing that we doNOP_BYTES: bytes = NOP_VALUE.to_bytes(2, byteorder=byteorder)
- In the loop, we again grab
NOP_BYTESfrom the global scope, then call__Pyx_ByBytesTailmatch()withbase_co_codeand that. Supposing that local variables are relatively cheap, we should maybe at the very least pre-fetchNOP_BYTESto prevent calling__Pyx_GetModuleGlobalName()in the loop. base_co_codeis truncated withPySequence_GetSlice(); this is a bit surprising to me since I kinda expected a specializedPyBytesC function.npad_codeis incremented with__Pyx_PyLong_AddObjC().
Since the code is rather simple and yet we want both base_co_code and npad_code, multibyte_rstrip() (which I supposed should be inline) will have to return a tuple which is to be unpacked in add_function(). Said unpacking may have its own performance implications, so I'm not really sure whether it should be refactored out, or stay inlined-inlined as-is.
But realistically speaking this is probably not a bottleneck either way, and if it was, refactoring it out as you suggested will give us more freedom to optimize it. Will play around with this a bit more and see what works best.
line_profiler/_line_profiler.pyx
Outdated
| npad = 0 | ||
| self._all_paddings[base_co_code] = max(npad, npad_code) + 1 | ||
| try: | ||
| neighbors = self._all_instances_by_funcs[func_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neighbors implies some sort of geometric proximity to me. I guess you are building a graph of some sort. But this is a mapping from each function id to the set of profiler objects that contain it. Maybe profilers is a more descriptive name? Or maybe call it profilers_to_update, as that looks like what these will be used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I actually had profilers_to_update (or was it instances~?) at some point. Will roll back for clarity.
Yes, because |
|
I think this looks good in general. I'll let you make any cleanups, and tell me when you are ready to merge. |
|
Thanks for the review! I've implemented the changes we discussed above, and the code is otherwise unchanged. Should be ready for the merge once CI passes. |
Closes #350. Includes #349, which by extension closes #348. New changes are from 7338799 onwards.
Code changes
line_profiler/_line_profiler.pyx::LineProfiler.add_function().dupes_mapitems for padded bytecodes._all_paddings,._all_instances_by_funcsNew private class attributes for use by
.add_function()Test changes
tests/test_line_profiler.pytest_{{class,static,bound,partial}method,partial,[cached_]property}_decorator(),test_profile_generated_code()Minor refactoring with a utility function for more readable output
test_multiple_profilers_identical_bytecode()New test checking that the order in which profilers are used to decorate functions shouldn't affect the output (up to overhead)
test_aggregate_profiling_data_between_code_versions()New test checking that existing profiling data should persist and be aggregated with new data after another profiler replaced the code object of a profiled function
Performance impact1
Test suite run 5 times with the command
pytest 'not (test_duplicate or test_aggregate or identical_bytecode)', i.e. excluding the tests added by this PR (and #349); impact on performance should be minimal.mainmulti-desync-fix(this PR)Footnotes
(Update 8 Jul) Outdated data from before ENH: more intuitive profiling-target selection #337, FIX: restore Cython compatibility + pivot to
sys.monitoring#352, etc. were merged. Since the changes are largely independent they are probably still indicative of the (non-)impact this has on performance. ↩