
_remote_debugging: returning the same samples over and over #149718

@maurycy

Description


Bug report

Bug description:

Diagnosis

Note the consecutive identical count here:

2026-05-12T10:27:01.788483000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main c6fd7de*?) % sudo ./python.exe -m profiling.sampling run -r 300khz --pstats -o /dev/null -m test

[...]

Captured 31,970,196 samples in 851.80 seconds
Sample rate: 37,532.68 samples/sec (consecutive identical: 30,295,039/31,457,858)
Error rate: 1.60
Warning: missed 251961921 samples from the expected total of 283932117 (88.74%)

That's with:

diff --git i/Lib/profiling/sampling/sample.py w/Lib/profiling/sampling/sample.py
index 5bbe2483581..41c6dec4f6d 100644
--- i/Lib/profiling/sampling/sample.py
+++ w/Lib/profiling/sampling/sample.py
@@ -109,6 +109,8 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
         last_sample_time = start_time
         realtime_update_interval = 1.0  # Update every second
         last_realtime_update = start_time
+        prev_stack = None
+        consecutive_identical = 0
         try:
             while duration_sec is None or running_time_sec < duration_sec:
                 # Check if live collector wants to stop
@@ -125,6 +127,9 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
                         stack_frames = self._get_stack_trace(
                             async_aware=async_aware
                         )
+                        if stack_frames == prev_stack:
+                            consecutive_identical += 1
+                        prev_stack = stack_frames
                         collector.collect(stack_frames)
                     except ProcessLookupError as e:
                         running_time_sec = current_time - start_time
@@ -178,7 +183,9 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
         if not is_live_mode:
             s = "" if num_samples == 1 else "s"
             print(f"Captured {num_samples:n} sample{s} in {fmt(running_time_sec, 2)} seconds")
-            print(f"Sample rate: {fmt(sample_rate, 2)} samples/sec")
+            comparable_samples = max(1, num_samples - errors - 1)
+            print(f"Sample rate: {fmt(sample_rate, 2)} samples/sec "
+                  f"(consecutive identical: {consecutive_identical:n}/{comparable_samples:n})")
             print(f"Error rate: {fmt(error_rate, 2)}")
 
             # Print unwinder stats if stats collection is enabled

Discussion

This hints at why RLE in the binary format is so efficient.
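As a minimal sketch of why a stream that is ~95% consecutive duplicates compresses so well under run-length encoding (the `rle_encode` helper below is hypothetical, not the actual binary-format writer):

```python
from itertools import groupby

def rle_encode(samples):
    """Collapse a stream of samples into (sample, run_length) pairs."""
    return [(sample, sum(1 for _ in run)) for sample, run in groupby(samples)]

# Toy stream mirroring the measurement above: mostly identical
# consecutive stacks, so 6 samples collapse into 2 runs.
stream = ["main;f;g"] * 5 + ["main;f;h"]
print(rle_encode(stream))  # [('main;f;g', 5), ('main;f;h', 1)]
```

With 30.3M identical samples out of 31.5M, the run count (and thus the encoded size) is roughly 25x smaller than the raw sample count.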

I propose a Python-level improvement in #149719, leveraging the fact that the timestamps_us parameter of collect is plural:

def collect(self, stack_frames, timestamps_us=None):
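A hedged sketch of the batching idea behind that signature: buffer runs of identical consecutive stacks and flush each run as a single collect() call carrying a list of timestamps. The `BatchingSampler` class and its `observe`/`flush` methods are hypothetical names for illustration; the real loop lives in Lib/profiling/sampling/sample.py and differs in detail:

```python
import time

class BatchingSampler:
    """Hypothetical sketch: coalesce identical consecutive stacks into
    one collect(stack_frames, timestamps_us) call per run."""

    def __init__(self, collector):
        self.collector = collector
        self.prev_stack = None
        self.pending_timestamps = []

    def observe(self, stack_frames):
        now_us = int(time.monotonic() * 1_000_000)
        if stack_frames == self.prev_stack:
            # Same stack as last time: just record another timestamp.
            self.pending_timestamps.append(now_us)
        else:
            self.flush()
            self.prev_stack = stack_frames
            self.pending_timestamps = [now_us]

    def flush(self):
        if self.prev_stack is not None:
            # One collect() call covers the whole run of identical stacks.
            self.collector.collect(self.prev_stack, self.pending_timestamps)
            self.prev_stack = None
            self.pending_timestamps = []
```

The key effect is that downstream collector work (aggregation, writing) scales with the number of runs rather than the number of samples, which is where the throughput gain comes from.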

This gives a roughly 3x throughput improvement for --pstats, for example:

2026-05-12T11:08:55.568015000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main c6fd7de*?) % sudo ./python.exe -m profiling.sampling run -r 300khz --pstats -o /dev/null -m test

[...]

463 tests OK.

Total duration: 14 min 46 sec
Total tests: run=49,788 failures=9 skipped=2,696
Total test files: run=494/505 failed=5 env_changed=1 skipped=25 resource_denied=11
Result: FAILURE
Captured 100,236,311 samples in 886.43 seconds
Sample rate: 113,079.14 samples/sec (consecutive identical: 98,417,466/99,721,944)
Error rate: 0.51
Warning: missed 195239075 samples from the expected total of 295475386 (66.08%)

At the C level, there would still be millions of unnecessary allocations, though.

I have not measured or explored this yet, but quick ideas go along the lines of returning immutable objects from get_stack_trace() and friends. Ideally there might be a common StackTrace type instead of a list. Another option would be to add a prev_stack_trace parameter (so we don't make the RemoteUnwinder stateful) and perhaps return something like same-as-before.
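A rough sketch of the same-as-before shape, purely to illustrate the API idea: the caller passes its previous stack back in and gets a shared sentinel when nothing changed. None of these names exist in CPython; `get_stack_trace_dedup` stands in for the real unwinder, and note that in this Python sketch the unwind still allocates, whereas a C implementation could compare cached frame pointers before materializing a new list at all:

```python
# Shared sentinel: "your previous object is still valid, reuse it."
SAME_AS_BEFORE = object()

def get_stack_trace_dedup(unwind, prev_stack_trace=None):
    """Hypothetical wrapper: return SAME_AS_BEFORE instead of a fresh
    (equal) stack trace when nothing changed since the last call."""
    stack = unwind()
    if prev_stack_trace is not None and stack == prev_stack_trace:
        # The caller keeps its previous (ideally immutable) object;
        # the freshly unwound list can be dropped immediately.
        return SAME_AS_BEFORE
    return stack
```

Keeping the previous stack on the caller's side preserves the current stateless RemoteUnwinder contract, at the cost of one extra parameter per call.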

Perhaps we could already think in terms of batching and dedup in the RemoteUnwinder layer, instead of Binary{Writer,Reader}? cc @LalitMaganti

My intuition is that this starts to resemble epoll(), but that's still very vague.

I'm not exactly sure about hinting that the user should decrease the sampling rate: there might still be new unique data among the sea of duplicates.

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS
