Skip to content

Conversation

@TTsangSC
Copy link
Collaborator

@TTsangSC TTsangSC commented Jun 10, 2025

Closes #350. Includes #349, which by extension closes #348. New changes are from 7338799 onwards.

Code changes

  • line_profiler/_line_profiler.pyx::LineProfiler
    • .add_function()
      • Now synchronizing the hash tables of instances profiling the same function object whenever it replaces its code object
      • No longer creating separate .dupes_map items for padded bytecodes
    • ._all_paddings, ._all_instances_by_funcs
      New private class attributes for use by .add_function()

Test changes

  • tests/test_line_profiler.py
    • test_{{class,static,bound,partial}method,partial,[cached_]property}_decorator(), test_profile_generated_code()
      Minor refactoring with a utility function for more readable output
    • test_multiple_profilers_identical_bytecode()
      New test checking that the order in which profilers are used to decorate functions shouldn't affect the output (up to overhead)
    • test_aggregate_profiling_data_between_code_versions()
      New test checking that existing profiling data should persist and be aggregated with new data after another profiler replaced the code object of a profiled function

Performance impact1

Test suite run 5 times with the command pytest 'not (test_duplicate or test_aggregate or identical_bytecode)', i.e. excluding the tests added by this PR (and #349); impact on performance should be minimal.

Setup\Code (mean, best (s)) main multi-desync-fix (this PR)
Python 3.8.20, pytest 8.3.5 7.605, 7.518 7.523, 7.426
Python 3.13.3, pytest 8.4.0 9.835, 9.589 9.777, 9.662

Footnotes

  1. (Update 8 Jul) Outdated data from before ENH: more intuitive profiling-target selection #337, FIX: restore Cython compatibility + pivot to sys.monitoring #352, etc. were merged. Since the changes are largely independent they are probably still indicative of the (non-)impact this has on performance.

@TTsangSC TTsangSC changed the title FIX: FIX: Update hash tables of affected profiler instances when rewriting code objects Jun 10, 2025
@codecov
Copy link

codecov bot commented Jun 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.36%. Comparing base (97ecd5c) to head (90cd65e).
Report is 19 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #351   +/-   ##
=======================================
  Coverage   87.36%   87.36%           
=======================================
  Files          18       18           
  Lines        1630     1630           
  Branches      347      347           
=======================================
  Hits         1424     1424           
  Misses        151      151           
  Partials       55       55           

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7ffc0f1...90cd65e. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

TTsangSC added 9 commits July 8, 2025 05:02
line_profiler/_line_profiler.pyx::LineProfiler.get_stats()
    Updated/simplified implementation to aggregate profiling data
    between code objects with duplicate hashes
line_profiler/_line_profiler.pyx::LineProfiler
    _all_instances_by_bcodes
        New private class attribute for keeping track of:
        - Instances profiling the same bytecode
        - Count for how the the bytecode should be padded
    dupes_map
        Now containing the up-to-date (instead of pre-padding) code objs
    add_function()
        Refactored implementation to:
        - Pad non-Cython bytecodes according to
          `._all_instances_by_bcodes` instead of the instance's
          `.dupes_map`
        - Use `.dupes_map` to identify profiler instances profiling the
          same functions and update their hash tables
        - Curate `.dupes_map` on said instances to stay up-to-date
line_profiler/_line_profiler.pyx::LineProfiler
    add_function()
        - Now identifying update targets based on function-object
          identity instead of code-object identity (because different
          function objects can share the same code object)
        - Simplified implementation
    _all_instances_by_bcodes
        Superseded by other private class attributes:
        - `._all_paddings`:
          mapping from bytecodes to padding length
        - `._all_instances_by_funcs`:
          mapping from function identity to `WeakSet`s of instances
tests/test_line_profiler.py
    test_multiple_profilers_identical_bytecode()
        New test for applying multiple profilers to identical functions
        in arbitrary order (one subtest currently failing)
line_profiler/_line_profiler.pyx::LineProfiler.add_function()
    Now stripping padded code objects and re-padding when appropriate,
    so that no two profiled code objects can end up with the same
    bytecode

tests/test_line_profiler.py
    test_multiple_profilers_identical_bytecode()
        - Updated comments and docstring
        - Added check against duplicate bytecodes
tests/test_line_profiler.py
    get_prof_stats()
        New helper function for formatting and retrieving
        `LineProfiler.print_stats()` output
    test_*_decorator()
    test_profile_generated_code()
    test_multiple_profilers_identical_bytecode()
        Refactored to use `get_prof_stats()`
    test_aggregate_profiling_data_between_code_versions()
        New test checking that profiling data gathered before and after
        a function's code object is rewritten are correctly aggregated
@TTsangSC TTsangSC force-pushed the multi-desync-fix branch from 781c01e to af098a2 Compare July 8, 2025 04:06
@TTsangSC
Copy link
Collaborator Author

TTsangSC commented Jul 8, 2025

Just finished rebasing. @Erotemic, maybe you'll find it more convenient to look at this PR (which contains #349) to resolve both bugs at once? (Or maybe not – up to you of course.) Cheers.

Copy link
Member

@Erotemic Erotemic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For add_function we are assuming that the function actually hasn't been added before, so if we see the same bytes it must be a new instance of that function rather than the actual same function added more than once?

base_co_code: bytes = co_code
# Figure out how much padding we need and strip the bytecode
# TODO: is there a way to do this faster? `.rstrip()` doesn't
# work (reliably) since `NOP_BYTES` should be 2-byte wide
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only alternative I can think of is a regex, but this logic is likely much faster (especially if it gets optimized into C). You could write it as a separate cdef function because it is fairly self-contained behavior. Maybe multibyte_rstrip?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the genned .cpp code:

  • We grab NOP_BYTES from the global scope, and do a PyObject_Length() to get nop_len. This may be optimized a bit by placing len(NOP_BYTES) itself in the global namespace I guess; or maybe we can just hard-code to 2 seeing that we do
    NOP_BYTES: bytes = NOP_VALUE.to_bytes(2, byteorder=byteorder)
  • In the loop, we again grab NOP_BYTES from the global scope, then call __Pyx_ByBytesTailmatch() with base_co_code and that. Supposing that local variables are relatively cheap, we should maybe at the very least pre-fetch NOP_BYTES to prevent calling __Pyx_GetModuleGlobalName() in the loop.
  • base_co_code is truncated with PySequence_GetSlice(); this is a bit surprising to me since I kinda expected a specialized PyBytes C function.
  • npad_code is incremented with __Pyx_PyLong_AddObjC().

Since the code is rather simple and yet we want both base_co_code and npad_code, multibyte_rstrip() (which I supposed should be inline) will have to return a tuple which is to be unpacked in add_function(). Said unpacking may have its own performance implications, so I'm not really sure whether it should be refactored out, or stay inlined-inlined as-is.

But realistically speaking this is probably not a bottleneck either way, and if it was, refactoring it out as you suggested will give us more freedom to optimize it. Will play around with this a bit more and see what works best.

npad = 0
self._all_paddings[base_co_code] = max(npad, npad_code) + 1
try:
neighbors = self._all_instances_by_funcs[func_id]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

neighbors implies some sort of geometric proximity to me. I guess you are building a graph of some sort. But this is a mapping from each function id to the set of profiler objects that contain it. Maybe profilers is a more descriptive name? Or maybe call it profilers_to_update, as that looks like what these will be used for?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I actually had profilers_to_update (or was it instances~?) at some point. Will roll back for clarity.

@TTsangSC
Copy link
Collaborator Author

TTsangSC commented Jul 9, 2025

we are assuming that the function actually hasn't been added before

Yes, because .add_callable() checks for function identity before passing a function to .add_function(). We already have a test ensuring that in tests/test_line_profiler.py::test_double_decoration() for the single-profiler case, but I guess we should maybe also add subtests to test_multiple_profilers_identical_bytecode() to ensure the check holds up (which I suppose it would) in a multi-profiler situation.

@Erotemic
Copy link
Member

Erotemic commented Jul 9, 2025

I think this looks good in general. I'll let you make any cleanups, and tell me when you are ready to merge.

@TTsangSC
Copy link
Collaborator Author

TTsangSC commented Jul 9, 2025

Thanks for the review! I've implemented the changes we discussed above, and the code is otherwise unchanged. Should be ready for the merge once CI passes.

@Erotemic Erotemic merged commit 2cfc693 into pyutils:main Jul 9, 2025
36 checks passed
@TTsangSC TTsangSC deleted the multi-desync-fix branch July 9, 2025 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: dupe-code resolution on multiple profilers results in loss of profiling data Timing entries not combined between duplicate code objects

2 participants