
Commit 62a459d

Add last_profiled_frame field to thread state for remote profilers
Remote profilers that sample call stacks from an external process must read the entire frame chain on every sample. For deep stacks this is expensive, since most of the stack is typically unchanged between samples. This commit adds a `last_profiled_frame` pointer to the thread state that remote profilers can use to implement a caching optimization.

When sampling, a profiler writes the current frame address to this field. The eval loop keeps the pointer valid by updating it to the parent frame in `_PyEval_FrameClearAndPop`. This creates a "high-water mark" that always points to a frame still on the stack, allowing profilers to skip re-reading unchanged portions of the stack.

The update in ceval.c is guarded by a NULL check. The field starts NULL, so when profiling is inactive the only added cost in `_PyEval_FrameClearAndPop` is one load and a well-predicted branch.
1 parent ea51e74 commit 62a459d

4 files changed: 29 additions, 0 deletions


Include/cpython/pystate.h

Lines changed: 2 additions & 0 deletions
@@ -135,6 +135,8 @@ struct _ts {
     /* Pointer to currently executing frame. */
     struct _PyInterpreterFrame *current_frame;
 
+    struct _PyInterpreterFrame *last_profiled_frame;
+
     Py_tracefunc c_profilefunc;
     Py_tracefunc c_tracefunc;
     PyObject *c_profileobj;

Include/internal/pycore_debug_offsets.h

Lines changed: 2 additions & 0 deletions
@@ -102,6 +102,7 @@ typedef struct _Py_DebugOffsets {
         uint64_t next;
         uint64_t interp;
         uint64_t current_frame;
+        uint64_t last_profiled_frame;
         uint64_t thread_id;
         uint64_t native_thread_id;
         uint64_t datastack_chunk;
@@ -272,6 +273,7 @@ typedef struct _Py_DebugOffsets {
         .next = offsetof(PyThreadState, next), \
         .interp = offsetof(PyThreadState, interp), \
         .current_frame = offsetof(PyThreadState, current_frame), \
+        .last_profiled_frame = offsetof(PyThreadState, last_profiled_frame), \
         .thread_id = offsetof(PyThreadState, thread_id), \
         .native_thread_id = offsetof(PyThreadState, native_thread_id), \
         .datastack_chunk = offsetof(PyThreadState, datastack_chunk), \
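The new offset entry is what lets an out-of-process profiler locate `last_profiled_frame` without hard-coding the layout of `PyThreadState`. The sketch below illustrates that read/write pattern under stated assumptions: `toy_thread_state` and `toy_thread_state_offsets` are hypothetical stand-ins for `PyThreadState` and the thread-state part of `_Py_DebugOffsets`, and a local byte buffer plays the role of memory copied from the target process (on Linux a profiler might move such bytes with `process_vm_readv` and `process_vm_writev`). It reads `current_frame` through its published offset and stores that address into `last_profiled_frame`, the write the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical target-side layout. The real PyThreadState has many more
   fields, and its layout differs between CPython versions and builds. Field
   values are addresses in the target process, so they are kept as uintptr_t. */
typedef struct {
    uintptr_t next;
    uintptr_t interp;
    uintptr_t current_frame;
    uintptr_t last_profiled_frame;
    uintptr_t thread_id;
} toy_thread_state;

/* Hypothetical mirror of the thread-state offsets in _Py_DebugOffsets: the
   target publishes them so the profiler does not hard-code the layout. */
typedef struct {
    uint64_t size;
    uint64_t current_frame;
    uint64_t last_profiled_frame;
} toy_thread_state_offsets;

static const toy_thread_state_offsets TOY_OFFSETS = {
    .size = sizeof(toy_thread_state),
    .current_frame = offsetof(toy_thread_state, current_frame),
    .last_profiled_frame = offsetof(toy_thread_state, last_profiled_frame),
};

/* Decode a pointer-sized field from a raw byte copy of the thread state.
   memcpy avoids alignment and aliasing problems with the byte buffer. */
static uintptr_t read_addr_field(const unsigned char *raw, uint64_t offset) {
    uintptr_t value;
    memcpy(&value, raw + offset, sizeof value);
    return value;
}

/* Encode a pointer-sized field into the raw buffer; a real profiler would
   then copy these bytes back into the target process. */
static void write_addr_field(unsigned char *raw, uint64_t offset, uintptr_t value) {
    memcpy(raw + offset, &value, sizeof value);
}
```

Decoding fields from a byte buffer with `memcpy`, rather than casting the buffer to a struct pointer, keeps the profiler independent of the target's exact struct definition: only the published offsets matter.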

InternalDocs/frames.md

Lines changed: 18 additions & 0 deletions
@@ -111,6 +111,24 @@ The shim frame points to a special code object containing the `INTERPRETER_EXIT`
 instruction which cleans up the shim frame and returns.
 
 
+### Remote Profiling Frame Cache
+
+The `last_profiled_frame` field in `PyThreadState` supports an optimization for
+remote profilers that sample call stacks from external processes. When a remote
+profiler reads the call stack, it writes the current frame address to this field.
+The eval loop then keeps this pointer valid by updating it to the parent frame
+whenever a frame returns (in `_PyEval_FrameClearAndPop`).
+
+This creates a "high-water mark" that always points to a frame still on the stack.
+On subsequent samples, the profiler can walk from `current_frame` until it reaches
+`last_profiled_frame`, knowing that frames from that point downward are unchanged
+and can be retrieved from a cache. This significantly reduces the amount of remote
+memory reads needed when call stacks are deep and stable at their base.
+
+The update in `_PyEval_FrameClearAndPop` is guarded: it only writes when
+`last_profiled_frame` is non-NULL, avoiding any overhead when profiling is inactive.
+
+
 ### The Instruction Pointer
 
 `_PyInterpreterFrame` has two fields which are used to maintain the instruction
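The profiler-side walk described in the `frames.md` text can be sketched as follows. This is a hypothetical, simplified model, not the algorithm of any particular profiler: `rframe` stands in for a remote `_PyInterpreterFrame`, `read_parent` simulates one remote memory read, and `stack_cache` holds the frame addresses captured by the previous sample, innermost first. The walk reads frames remotely until it reaches the marker; if the marker appears in the cache, the marker and everything below it are copied from the cache. If the marker is not in the cache, the sketch keeps reading remotely. The model caches addresses only and ignores complications such as a frame address being reused by a later, unrelated call.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 64

/* Hypothetical stand-in for a frame in the target process. A real profiler
   would fetch `previous` with a remote memory read. */
typedef struct rframe {
    const struct rframe *previous;
} rframe;

/* Hypothetical profiler-side cache: frame addresses captured by the previous
   sample, innermost frame first. */
typedef struct {
    const rframe *frames[MAX_DEPTH];
    size_t depth;
} stack_cache;

/* Number of simulated remote reads, so the saving can be observed. */
static size_t remote_reads;

/* Simulate one remote read that fetches a frame's parent pointer. */
static const rframe *read_parent(const rframe *f) {
    remote_reads++;
    return f->previous;
}

/* Return the index of f in the cache, or -1 if it is not there. */
static long cache_find(const stack_cache *cache, const rframe *f) {
    for (size_t i = 0; i < cache->depth; i++) {
        if (cache->frames[i] == f) {
            return (long)i;
        }
    }
    return -1;
}

/* Capture the stack that starts at `current` into `out`. Frames are read
   remotely until the walk reaches `marker`; if the marker is in `prev`, the
   marker and everything below it are copied from the cache instead.
   Returns the captured depth. `prev` and `out` must be different objects. */
static size_t sample_stack(const rframe *current, const rframe *marker,
                           const stack_cache *prev, stack_cache *out) {
    size_t n = 0;
    for (const rframe *f = current; f != NULL && n < MAX_DEPTH; f = read_parent(f)) {
        if (f == marker) {
            long hit = cache_find(prev, f);
            if (hit >= 0) {
                /* Cache hit: splice in the unchanged base of the stack. */
                for (size_t i = (size_t)hit; i < prev->depth && n < MAX_DEPTH; i++) {
                    out->frames[n++] = prev->frames[i];
                }
                break;
            }
            /* Cache miss: treat the marker as a hint and keep reading. */
        }
        out->frames[n++] = f;
    }
    out->depth = n;
    return n;
}
```

For example, after a full first sample of the three-frame stack `c -> b -> a` with the marker left at `c`, a second sample taken after the program calls two more functions reads only the two new frames remotely and copies `c`, `b` and `a` from the cache.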

Python/ceval.c

Lines changed: 7 additions & 0 deletions
@@ -2004,6 +2004,13 @@ clear_gen_frame(PyThreadState *tstate, _PyInterpreterFrame * frame)
 void
 _PyEval_FrameClearAndPop(PyThreadState *tstate, _PyInterpreterFrame * frame)
 {
+    // Update last_profiled_frame for remote profiler frame caching.
+    // By this point, tstate->current_frame is already set to the parent frame.
+    // The guarded check avoids writes when profiling is not active (predictable branch).
+    if (tstate->last_profiled_frame != NULL) {
+        tstate->last_profiled_frame = tstate->current_frame;
+    }
+
     if (frame->owner == FRAME_OWNED_BY_THREAD) {
         clear_thread_frame(tstate, frame);
     }
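The eval-loop side of the protocol can be modeled with a toy frame chain to check the "always on the stack" property. In this sketch, `toy_frame`, `toy_tstate`, `toy_push` and `toy_pop` are hypothetical stand-ins, not CPython APIs; `toy_pop` follows the ordering stated in the diff's comment, switching `current_frame` to the parent before the guarded update runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for _PyInterpreterFrame and PyThreadState, reduced
   to the fields this sketch needs. Not CPython's real definitions. */
typedef struct toy_frame {
    struct toy_frame *previous;   /* caller (parent) frame, NULL at the base */
} toy_frame;

typedef struct {
    toy_frame *current_frame;
    toy_frame *last_profiled_frame;   /* NULL until a profiler writes it */
} toy_tstate;

/* Make f the innermost frame. */
static void toy_push(toy_tstate *ts, toy_frame *f) {
    f->previous = ts->current_frame;
    ts->current_frame = f;
}

/* Pop the innermost frame. As the diff's comment states for
   _PyEval_FrameClearAndPop, current_frame already points at the parent
   when the guarded marker update runs. */
static void toy_pop(toy_tstate *ts) {
    assert(ts->current_frame != NULL);
    ts->current_frame = ts->current_frame->previous;
    if (ts->last_profiled_frame != NULL) {
        ts->last_profiled_frame = ts->current_frame;
    }
}

/* Return 1 if f is reachable from current_frame through parent links. */
static int toy_on_stack(const toy_tstate *ts, const toy_frame *f) {
    for (const toy_frame *p = ts->current_frame; p != NULL; p = p->previous) {
        if (p == f) {
            return 1;
        }
    }
    return 0;
}
```

With the marker still NULL, pushes and pops never write it. Once a simulated profiler sets the marker to the innermost frame, each return moves it to the new `current_frame`, so it keeps pointing at a live frame. Note that, as written, the update runs on every return while the marker is set, not only when the marked frame itself is popped: if the program calls two functions after a sample and the inner one returns, the marker lands on the outer of the two, a frame pushed after the last sample. In this model, a profiler should therefore treat the marker as a hint and check it against its cache before trusting the cached tail.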
