Bug report
Bug description:
Diagnosis
Note the consecutive identical sample count here:
2026-05-12T10:27:01.788483000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main c6fd7de*?) % sudo ./python.exe -m profiling.sampling run -r 300khz --pstats -o /dev/null -m test
[...]
Captured 31,970,196 samples in 851.80 seconds
Sample rate: 37,532.68 samples/sec (consecutive identical: 30,295,039/31,457,858)
Error rate: 1.60
Warning: missed 251961921 samples from the expected total of 283932117 (88.74%)
That's with:
diff --git i/Lib/profiling/sampling/sample.py w/Lib/profiling/sampling/sample.py
index 5bbe2483581..41c6dec4f6d 100644
--- i/Lib/profiling/sampling/sample.py
+++ w/Lib/profiling/sampling/sample.py
@@ -109,6 +109,8 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
last_sample_time = start_time
realtime_update_interval = 1.0 # Update every second
last_realtime_update = start_time
+ prev_stack = None
+ consecutive_identical = 0
try:
while duration_sec is None or running_time_sec < duration_sec:
# Check if live collector wants to stop
@@ -125,6 +127,9 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
stack_frames = self._get_stack_trace(
async_aware=async_aware
)
+ if stack_frames == prev_stack:
+ consecutive_identical += 1
+ prev_stack = stack_frames
collector.collect(stack_frames)
except ProcessLookupError as e:
running_time_sec = current_time - start_time
@@ -178,7 +183,9 @@ def sample(self, collector, duration_sec=None, *, async_aware=False):
if not is_live_mode:
s = "" if num_samples == 1 else "s"
print(f"Captured {num_samples:n} sample{s} in {fmt(running_time_sec, 2)} seconds")
- print(f"Sample rate: {fmt(sample_rate, 2)} samples/sec")
+ comparable_samples = max(1, num_samples - errors - 1)
+ print(f"Sample rate: {fmt(sample_rate, 2)} samples/sec "
+ f"(consecutive identical: {consecutive_identical:n}/{comparable_samples:n})")
print(f"Error rate: {fmt(error_rate, 2)}")
# Print unwinder stats if stats collection is enabled
Discussion
This hints at why RLE in the binary format is so efficient.
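For scale, a quick calculation with the numbers from the run above shows that roughly 96% of comparable samples repeat the previous stack:

```python
# Numbers reported by the instrumented run above:
# consecutive-identical samples vs. comparable samples.
identical = 30_295_039
comparable = 31_457_858
ratio = identical / comparable
print(f"{ratio:.1%} of samples repeat the previous stack")  # ~96.3%
```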
I propose a Python-level improvement in #149719, leveraging the fact that the timestamps_us parameter of collect() is plural:
cpython/Lib/profiling/sampling/collector.py, line 147 in b546cc1:

def collect(self, stack_frames, timestamps_us=None):
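A minimal sketch of the idea (hypothetical helper name, not the actual PR code): run-length batch consecutive identical stacks so the collector is invoked once per run, carrying all of that run's timestamps in one timestamps_us list.

```python
def batch_identical(samples):
    """Run-length batch an iterable of (timestamp_us, stack) pairs.

    Yields (stack, [timestamp_us, ...]) tuples, one per run of
    consecutive identical stacks, so a collector whose collect()
    accepts a plural timestamps_us is called once per run instead
    of once per sample.
    """
    prev_stack = None
    timestamps = []  # timestamps of the current run
    for ts, stack in samples:
        if timestamps and stack == prev_stack:
            timestamps.append(ts)
        else:
            if timestamps:
                yield prev_stack, timestamps
            prev_stack = stack
            timestamps = [ts]
    if timestamps:  # flush the final run
        yield prev_stack, timestamps
```

A run of N identical stacks thus costs one collect() call instead of N, which is where the throughput win for --pstats comes from.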
This gives a 3x throughput improvement for --pstats, for example:
2026-05-12T11:08:55.568015000+0200 maurycy@gimel /Users/maurycy/src/github.com/maurycy/cpython (main c6fd7de*?) % sudo ./python.exe -m profiling.sampling run -r 300khz --pstats -o /dev/null -m test
[...]
463 tests OK.
Total duration: 14 min 46 sec
Total tests: run=49,788 failures=9 skipped=2,696
Total test files: run=494/505 failed=5 env_changed=1 skipped=25 resource_denied=11
Result: FAILURE
Captured 100,236,311 samples in 886.43 seconds
Sample rate: 113,079.14 samples/sec (consecutive identical: 98,417,466/99,721,944)
Error rate: 0.51
Warning: missed 195239075 samples from the expected total of 295475386 (66.08%)
At the C level, there would still be millions of unnecessary allocations, though.
I have not measured or explored this yet, but quick ideas go along the lines of returning immutable objects from get_stack_trace() and friends. Maybe the ideal would be a common StackTrace type instead of a list. Another option would be to accept a prev_stack_trace argument (so we don't make the RemoteUnwinder stateful) and perhaps return something like a same-as-before sentinel.
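A hedged sketch of the same-as-before idea. All names here (SAME_AS_BEFORE, the prev_stack_trace parameter) are hypothetical and not part of the actual RemoteUnwinder API:

```python
# Hypothetical sentinel returned when the unwound stack is unchanged,
# letting the caller skip allocating and collecting a fresh stack.
SAME_AS_BEFORE = object()

def get_stack_trace(raw_frames, prev_stack_trace=None):
    """Sketch only: return SAME_AS_BEFORE when the unwound stack equals
    the previous one; otherwise return an immutable tuple of frames."""
    stack = tuple(raw_frames)  # immutable, hashable, cheap to compare
    if prev_stack_trace is not None and stack == prev_stack_trace:
        return SAME_AS_BEFORE
    return stack
```

The identity check `result is SAME_AS_BEFORE` on the caller's side is a single pointer comparison, so the hot path allocates nothing new while stacks stay unchanged.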
Perhaps we could already think in terms of batching and deduplication in the RemoteUnwinder layer, instead of in Binary{Writer,Reader}? cc @LalitMaganti
My intuition is that it starts to resemble epoll(), but that's very vague.
I'm not sure about hinting to the user that they should decrease the sampling rate; there might still be new unique data among the sea of duplicates.
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS
Linked PRs