Bug report
Bug description:
I'm investigating whether profiling.sampling can be made faster, and noticed a massive number of process_vm_writev syscalls:
2026-05-07T23:54:53.689606110+0000 maurycy@weiss /home/maurycy/cpython (main f5c7535*) # ./python.exe /tmp/spin.py & TGT=$!
sleep 0.5
strace -c -f -e trace=process_vm_writev,process_vm_readv,pwritev,pwritev2 \
./python.exe -m profiling.sampling attach -r 100khz -d 30 \
--pstats -o /tmp/out.pstats $TGT
kill $TGT; wait 2>/dev/null
[1] 135687
Captured 118240 samples in 30.00 seconds
Sample rate: 3941.33 samples/sec
Error rate: 0.00
Warning: missed 2881765 samples from the expected total of 3000005 (96.06%)
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ------------------
 75.87    0.916719           2    354732           process_vm_readv
 24.13    0.291560           2    118241           process_vm_writev
------ ----------- ----------- --------- --------- ------------------
100.00    1.208279           2    472973           total
[1] + terminated ./python.exe /tmp/spin.py
2026-05-07T23:58:28.503095895+0000 maurycy@weiss /home/maurycy/cpython (remote-debugging-last-profiled-frame 1a07c4a*) # cat /tmp/spin.py
def f():
    x = 0
    while True:
        x += 1
f()
(This is a Docker instance and I'm too tired to configure /sys/kernel/tracing properly, so I'm using strace instead of perf.)
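Given how the script looks, the sampled frame should be literally the same every time: f() loops forever without calling anything, so its frame never changes. Yet strace counts 118241 process_vm_writev calls for 118240 captured samples, i.e. one write per sample.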
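The strace counts line up with the profiler's own summary. As a quick cross-check, this small C sketch re-derives the per-sample syscall counts from the figures printed above; the constants are copied from the output, and the function names are hypothetical helpers for illustration only.

```c
#include <assert.h>

/* All figures are copied from the profiler and `strace -c` output above. */
static const long expected_samples = 3000005; /* "expected total" */
static const long missed_samples   = 2881765; /* "missed ... samples" */
static const long readv_calls      = 354732;  /* process_vm_readv calls */
static const long writev_calls     = 118241;  /* process_vm_writev calls */

/* Samples actually captured: expected minus missed. */
static long captured_samples(void)
{
    return expected_samples - missed_samples;
}

/* Average number of syscalls of one kind per captured sample. */
static double per_sample(long calls)
{
    long captured = captured_samples();
    assert(captured > 0); /* avoid dividing by zero */
    return (double)calls / (double)captured;
}

/* Share of expected samples that were missed, as a percentage. */
static double missed_percent(void)
{
    return 100.0 * (double)missed_samples / (double)expected_samples;
}
```

Captured samples work out to 3000005 - 2881765 = 118240, matching "Captured 118240 samples". That is exactly 3 readv calls per sample plus 12 extra, and 1 writev call per sample plus 1 extra, so every sample pays for one process_vm_writev.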
I believe this is the reason (cpython/Modules/_remote_debugging/threads.c, lines 453 to 459 in b142878):
// Update last_profiled_frame for next sample
uintptr_t lpf_addr =
    *current_tstate + (uintptr_t)unwinder->debug_offsets.thread_state.last_profiled_frame;
if (_Py_RemoteDebug_WriteRemoteMemory(&unwinder->handle, lpf_addr,
                                      sizeof(uintptr_t), &frame_addr) < 0) {
    PyErr_Clear(); // Non-fatal
}
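One way to avoid the redundant syscall is to remember, on the profiler side, which value was last written and to skip the write when the new frame address is identical. The following is a minimal self-contained sketch of that idea, not the code from the linked PRs: lpf_cache, update_last_profiled_frame and fake_remote_write are hypothetical names, and fake_remote_write only counts calls where the real code would go through _Py_RemoteDebug_WriteRemoteMemory. It assumes nothing in the target process modifies last_profiled_frame between samples; if the target can change that field itself, a profiler-side cache like this could go stale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for _Py_RemoteDebug_WriteRemoteMemory. In the real
   profiler each call is one process_vm_writev syscall; here we only count. */
static unsigned long remote_writes = 0;

static int fake_remote_write(uintptr_t remote_addr, uintptr_t value)
{
    (void)remote_addr;
    (void)value;
    remote_writes++;
    return 0;
}

/* Hypothetical per-thread cache of the last value written to the target. */
typedef struct {
    uintptr_t lpf_addr;     /* remote address of last_profiled_frame */
    uintptr_t last_written; /* frame address written on a previous sample */
    int valid;              /* 0 until the first successful write */
} lpf_cache;

/* Write frame_addr to lpf_addr only if it differs from the cached value.
   Assumes the target process never changes the field behind our back. */
static int update_last_profiled_frame(lpf_cache *cache, uintptr_t lpf_addr,
                                      uintptr_t frame_addr)
{
    assert(cache != NULL);
    if (cache->valid && cache->lpf_addr == lpf_addr
            && cache->last_written == frame_addr) {
        return 0; /* unchanged: no syscall needed */
    }
    if (fake_remote_write(lpf_addr, frame_addr) < 0) {
        cache->valid = 0; /* on failure, force a write on the next sample */
        return -1;
    }
    cache->lpf_addr = lpf_addr;
    cache->last_written = frame_addr;
    cache->valid = 1;
    return 0;
}
```

In this sketch, repeated samples of the same frame (as with spin.py) cost a single write in total, while a changed frame address or a different thread state address still triggers a write.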
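The last_profiled_frame field is written on every sample, even when its value hasn't changed, and each _Py_RemoteDebug_WriteRemoteMemory call results in a syscall (cpython/Python/remote_debug.h, line 1234 in 49918f5):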
written = process_vm_writev(handle->pid, local, 1, remote, 1, 0);
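For reference, writing one pointer-sized value into another process on Linux comes down to a single process_vm_writev call. This is a simplified sketch: write_remote_uintptr is a hypothetical helper, not CPython's _Py_RemoteDebug_WriteRemoteMemory, which is more involved. It only illustrates why each unconditional update shows up as one syscall in strace -c.

```c
#define _GNU_SOURCE /* process_vm_writev is a GNU/Linux extension */
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy one pointer-sized value into process `pid` at
   `remote_addr`. Each call is exactly one process_vm_writev syscall. */
static int write_remote_uintptr(pid_t pid, uintptr_t remote_addr,
                                uintptr_t value)
{
    assert(pid > 0);
    struct iovec local = { .iov_base = &value, .iov_len = sizeof(value) };
    struct iovec remote = { .iov_base = (void *)remote_addr,
                            .iov_len = sizeof(value) };
    ssize_t written = process_vm_writev(pid, &local, 1, &remote, 1, 0);
    /* Treat errors and short (partial) writes as failure. */
    return written == (ssize_t)sizeof(value) ? 0 : -1;
}
```

Writing to another process is subject to the same permission check as a ptrace attach, which is why attaching a profiler to a running target typically needs elevated privileges.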
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux
Linked PRs
last_profiled_frame if it's not changed #149522
last_profiled_frame if it's not changed (GH-149522) #149542