Bug report
Bug description:
Diagnosis
Note the consecutive identical sample count here:
2026-05-12T10:27:01.788483000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main c6fd7de*?) % sudo ./python.exe -m profiling.sampling run -r 300khz --pstats -o /dev/null -m test
[...]
Captured 31,970,196 samples in 851.80 seconds
Sample rate: 37,532.68 samples/sec (consecutive identical: 30,295,039/31,457,858)
Error rate: 1.60
Warning: missed 251961921 samples from the expected total of 283932117 (88.74%)
That's with:
diff --git i/Lib/profiling/sampling/sample.py w/Lib/profiling/sampling/sample.py
index 5bbe2483581..41c6dec4f6d 100644
--- i/Lib/profiling/sampling/sample.py
+++ w/Lib/profiling/sampling/sample.py
@@ -109,6 +109,8 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
last_sample_time = start_time
realtime_update_interval = 1.0 # Update every second
last_realtime_update = start_time
+ prev_stack = None
+ consecutive_identical = 0
try:
while duration_sec is None or running_time_sec < duration_sec:
# Check if live collector wants to stop
@@ -125,6 +127,9 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
stack_frames = self._get_stack_trace(
async_aware=async_aware
)
+ if stack_frames == prev_stack:
+ consecutive_identical += 1
+ prev_stack = stack_frames
collector.collect(stack_frames)
except ProcessLookupError as e:
running_time_sec = current_time - start_time
@@ -178,7 +183,9 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
if not is_live_mode:
s = "" if num_samples == 1 else "s"
print(f"Captured {num_samples:n} sample{s} in {fmt(running_time_sec, 2)} seconds")
- print(f"Sample rate: {fmt(sample_rate, 2)} samples/sec")
+ comparable_samples = max(1, num_samples - errors - 1)
+ print(f"Sample rate: {fmt(sample_rate, 2)} samples/sec "
+ f"(consecutive identical: {consecutive_identical:n}/{comparable_samples:n})")
print(f"Error rate: {fmt(error_rate, 2)}")
# Print unwinder stats if stats collection is enabled
Discussion
This hints at why RLE in the binary format is so efficient.
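For scale, a quick calculation with the numbers from the run above shows that roughly 96% of comparable samples repeat the previous stack:

```python
# Numbers reported by the instrumented run above:
# consecutive-identical samples vs. comparable samples.
identical = 30_295_039
comparable = 31_457_858
ratio = identical / comparable
print(f"{ratio:.1%} of samples repeat the previous stack")  # ~96.3%
```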
I propose a Python-level improvement in #149719, leveraging the fact that the timestamps_us parameter of collect() is plural:
cpython/Lib/profiling/sampling/collector.py, line 147 in b546cc1:

def collect(self, stack_frames, timestamps_us=None):
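A minimal sketch of the idea (hypothetical helper name, not the actual PR code): run-length batch consecutive identical stacks so the collector is invoked once per run, carrying all of that run's timestamps in one timestamps_us list.

```python
def batch_identical(samples):
    """Run-length batch an iterable of (timestamp_us, stack) pairs.

    Yields (stack, [timestamp_us, ...]) tuples, one per run of
    consecutive identical stacks, so a collector whose collect()
    accepts a plural timestamps_us is called once per run instead
    of once per sample.
    """
    prev_stack = None
    timestamps = []  # timestamps of the current run
    for ts, stack in samples:
        if timestamps and stack == prev_stack:
            timestamps.append(ts)
        else:
            if timestamps:
                yield prev_stack, timestamps
            prev_stack = stack
            timestamps = [ts]
    if timestamps:  # flush the final run
        yield prev_stack, timestamps
```

A run of N identical stacks thus costs one collect() call instead of N, which is where the throughput win for --pstats comes from.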
This gives a 3x throughput improvement for --pstats, for example:
2026-05-12T11:08:55.568015000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main c6fd7de*?) % sudo ./python.exe -m profiling.sampling run -r 300khz --pstats -o /dev/null -m test
[...]
463 tests OK.
Total duration: 14 min 46 sec
Total tests: run=49,788 failures=9 skipped=2,696
Total test files: run=494/505 failed=5 env_changed=1 skipped=25 resource_denied=11
Result: FAILURE
Captured 100,236,311 samples in 886.43 seconds
Sample rate: 113,079.14 samples/sec (consecutive identical: 98,417,466/99,721,944)
Error rate: 0.51
Warning: missed 195239075 samples from the expected total of 295475386 (66.08%)
At the C level, there would still be millions of unnecessary allocations, though.
I have not measured or explored this yet, but quick ideas go along the lines of returning immutable objects from get_stack_trace() and friends. Maybe the ideal would be a common StackTrace type instead of a list. Another option would be to accept a prev_stack_trace argument (so we don't make the RemoteUnwinder stateful) and perhaps return something like a same-as-before sentinel.
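A hedged sketch of the same-as-before idea. All names here (SAME_AS_BEFORE, the prev_stack_trace parameter) are hypothetical and not part of the actual RemoteUnwinder API:

```python
# Hypothetical sentinel returned when the unwound stack is unchanged,
# letting the caller skip allocating and collecting a fresh stack.
SAME_AS_BEFORE = object()

def get_stack_trace(raw_frames, prev_stack_trace=None):
    """Sketch only: return SAME_AS_BEFORE when the unwound stack equals
    the previous one; otherwise return an immutable tuple of frames."""
    stack = tuple(raw_frames)  # immutable, hashable, cheap to compare
    if prev_stack_trace is not None and stack == prev_stack_trace:
        return SAME_AS_BEFORE
    return stack
```

The identity check `result is SAME_AS_BEFORE` on the caller's side is a single pointer comparison, so the hot path allocates nothing new while stacks stay unchanged.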
Perhaps we could already think in terms of batching and deduplication in the RemoteUnwinder layer, instead of in Binary{Writer,Reader}? cc @LalitMaganti
My intuition is that it starts to resemble epoll(), but that's very vague.
I'm not sure about hinting to the user that they should decrease the sampling rate; there might still be new unique data among the sea of duplicates.
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS
Linked PRs