Commit 7c44f37

gh-138122: Extend binary profiling format with full source location and opcode (#143088)
Co-authored-by: Stan Ulbrych <stan@ulbrych.org>
1 parent 1e17ccd commit 7c44f37

6 files changed: +411 -104 lines changed

InternalDocs/profiling_binary_format.md

Lines changed: 69 additions & 17 deletions
@@ -272,33 +272,85 @@ byte.

 ## Frame Table

-The frame table stores deduplicated frame entries:
+The frame table stores deduplicated frame entries with full source position
+information and bytecode opcode:

 ```
-+----------------------+
-| filename_idx: varint |
-| funcname_idx: varint |
-| lineno: svarint      |
-+----------------------+ (repeated for each frame)
++----------------------------+
+| filename_idx: varint       |
+| funcname_idx: varint       |
+| lineno: svarint            |
+| end_lineno_delta: svarint  |
+| column: svarint            |
+| end_column_delta: svarint  |
+| opcode: u8                 |
++----------------------------+ (repeated for each frame)
 ```

-Each unique (filename, funcname, lineno) combination gets one entry. Two
-calls to the same function at different line numbers produce different
-frame entries; two calls at the same line number share one entry.
+### Field Definitions
+
+| Field | Type | Description |
+|------------------|---------------|----------------------------------------------------------|
+| filename_idx | varint | Index into string table for file name |
+| funcname_idx | varint | Index into string table for function name |
+| lineno | zigzag varint | Start line number (-1 for synthetic frames) |
+| end_lineno_delta | zigzag varint | Delta from lineno (end_lineno = lineno + delta) |
+| column | zigzag varint | Start column offset in UTF-8 bytes (-1 if not available) |
+| end_column_delta | zigzag varint | Delta from column (end_column = column + delta) |
+| opcode | u8 | Python bytecode opcode (0-254) or 255 for None |
+
+### Delta Encoding
+
+Position end values use delta encoding for efficiency:
+
+- `end_lineno = lineno + end_lineno_delta`
+- `end_column = column + end_column_delta`
+
+Typical values:
+- `end_lineno_delta`: Usually 0 (single-line expressions) → encodes to 1 byte
+- `end_column_delta`: Usually 5-20 (expression width) → encodes to 1 byte
+
+This saves ~1-2 bytes per frame compared to absolute encoding. When the base
+value (lineno or column) is -1 (not available), the delta is stored as 0 and
+the reconstructed value is -1.
+
+### Sentinel Values
+
+- `opcode = 255`: No opcode captured
+- `lineno = -1`: Synthetic frame (no source location)
+- `column = -1`: Column offset not available
+
+### Deduplication
+
+Each unique (filename, funcname, lineno, end_lineno, column, end_column,
+opcode) combination gets one entry. This enables instruction-level profiling
+where multiple bytecode instructions on the same line can be distinguished.

 Strings and frames are deduplicated separately because they have different
 cardinalities and reference patterns. A codebase might have hundreds of
 unique source files but thousands of unique functions. Many functions share
 the same filename, so storing the filename index in each frame entry (rather
 than the full string) provides an additional layer of deduplication. A frame
-entry is just three varints (typically 3-6 bytes) rather than two full
-strings plus a line number.
-
-Line numbers use signed varint (zigzag encoding) rather than unsigned to
-handle edge cases. Synthetic frames—generated frames that don't correspond
-directly to Python source code, such as C extension boundaries or internal
-interpreter frames—use line number 0 or -1 to indicate the absence of a
-source location. Zigzag encoding ensures these small negative values encode
+entry is typically 7-9 bytes rather than two full strings plus location data.
+
+### Size Analysis
+
+Typical frame size with delta encoding:
+- file_idx: 1-2 bytes
+- func_idx: 1-2 bytes
+- lineno: 1-2 bytes
+- end_lineno_delta: 1 byte (usually 0)
+- column: 1 byte (usually < 64)
+- end_column_delta: 1 byte (usually < 64)
+- opcode: 1 byte
+
+**Total: ~7-9 bytes per frame**
+
+Line numbers and columns use signed varint (zigzag encoding) to handle
+sentinel values efficiently. Synthetic frames—generated frames that don't
+correspond directly to Python source code, such as C extension boundaries or
+internal interpreter frames—use -1 to indicate the absence of a source
+location. Zigzag encoding ensures these small negative values encode
 efficiently (−1 becomes 1, which is one byte) rather than requiring the
 maximum varint length.

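Reviewer sketch, not part of the commit: the frame layout, delta rule, and sentinels described in the documentation diff above can be exercised with a few lines of Python. It assumes that `varint` means a LEB128-style encoding (7 payload bits per byte, least significant group first), which this hunk does not spell out. `encode_varint`, `zigzag`, and `encode_frame` are illustrative names, not functions from the patch.

```python
OPCODE_NONE = 255             # u8 sentinel, as defined in binary_io.h
LOCATION_NOT_AVAILABLE = -1   # zigzag sentinel, as defined in binary_io.h


def encode_varint(value: int) -> bytes:
    """Unsigned varint. Assumed layout: 7 bits per byte, low bits first."""
    if value < 0:
        raise ValueError("varint requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def zigzag(value: int) -> int:
    """Map signed to unsigned: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return (value << 1) if value >= 0 else ((-value << 1) - 1)


def encode_frame(filename_idx, funcname_idx, lineno, end_lineno,
                 column, end_column, opcode=None) -> bytes:
    """Serialize one frame entry in the field order shown in the layout box."""
    # When the base value is unavailable, the delta is stored as 0.
    end_lineno_delta = (
        0 if lineno == LOCATION_NOT_AVAILABLE else end_lineno - lineno
    )
    end_column_delta = (
        0 if column == LOCATION_NOT_AVAILABLE else end_column - column
    )
    return b"".join([
        encode_varint(filename_idx),
        encode_varint(funcname_idx),
        encode_varint(zigzag(lineno)),
        encode_varint(zigzag(end_lineno_delta)),
        encode_varint(zigzag(column)),
        encode_varint(zigzag(end_column_delta)),
        bytes([OPCODE_NONE if opcode is None else opcode]),
    ])


# A frame spanning lines 10-12, columns 4-20, with opcode 100.
print(encode_frame(0, 1, 10, 12, 4, 20, 100).hex(" "))
# -> 00 01 14 04 08 20 64

# A synthetic frame: no location, no opcode.
print(encode_frame(0, 1, -1, -1, -1, -1).hex(" "))
# -> 00 01 01 00 01 00 ff
```

Both examples come out at 7 bytes, the low end of the "~7-9 bytes per frame" estimate. Under the same assumption, a column of 64 or more needs a second byte because its zigzag value reaches 128.
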
Lib/profiling/sampling/cli.py

Lines changed: 1 addition & 1 deletion
@@ -715,7 +715,7 @@ def _validate_args(args, parser):
         )

     # Validate --opcodes is only used with compatible formats
-    opcodes_compatible_formats = ("live", "gecko", "flamegraph", "heatmap")
+    opcodes_compatible_formats = ("live", "gecko", "flamegraph", "heatmap", "binary")
     if getattr(args, 'opcodes', False) and args.format not in opcodes_compatible_formats:
         parser.error(
             f"--opcodes is only compatible with {', '.join('--' + f for f in opcodes_compatible_formats)}."

Lib/test/test_profiling/test_sampling_profiler/test_binary_format.py

Lines changed: 143 additions & 41 deletions
@@ -29,10 +29,17 @@
 )


-def make_frame(filename, lineno, funcname):
-    """Create a FrameInfo struct sequence."""
-    location = LocationInfo((lineno, lineno, -1, -1))
-    return FrameInfo((filename, location, funcname, None))
+def make_frame(filename, lineno, funcname, end_lineno=None, column=None,
+               end_column=None, opcode=None):
+    """Create a FrameInfo struct sequence with full location info and opcode."""
+    if end_lineno is None:
+        end_lineno = lineno
+    if column is None:
+        column = 0
+    if end_column is None:
+        end_column = 0
+    location = LocationInfo((lineno, end_lineno, column, end_column))
+    return FrameInfo((filename, location, funcname, opcode))


 def make_thread(thread_id, frames, status=0):
@@ -54,6 +61,36 @@ def extract_lineno(location):
     return location


+def extract_location(location):
+    """Extract full location info as dict from location tuple or None."""
+    if location is None:
+        return {"lineno": 0, "end_lineno": 0, "column": 0, "end_column": 0}
+    if isinstance(location, tuple) and len(location) >= 4:
+        return {
+            "lineno": location[0] if location[0] is not None else 0,
+            "end_lineno": location[1] if location[1] is not None else 0,
+            "column": location[2] if location[2] is not None else 0,
+            "end_column": location[3] if location[3] is not None else 0,
+        }
+    # Fallback for old-style location
+    lineno = location[0] if isinstance(location, tuple) else location
+    return {"lineno": lineno or 0, "end_lineno": lineno or 0, "column": 0, "end_column": 0}
+
+
+def frame_to_dict(frame):
+    """Convert a FrameInfo to a dict."""
+    loc = extract_location(frame.location)
+    return {
+        "filename": frame.filename,
+        "funcname": frame.funcname,
+        "lineno": loc["lineno"],
+        "end_lineno": loc["end_lineno"],
+        "column": loc["column"],
+        "end_column": loc["end_column"],
+        "opcode": frame.opcode,
+    }
+
+
 class RawCollector:
     """Collector that captures all raw data grouped by thread."""

@@ -68,15 +105,7 @@ def collect(self, stack_frames, timestamps_us):
         count = len(timestamps_us)
         for interp in stack_frames:
             for thread in interp.threads:
-                frames = []
-                for frame in thread.frame_info:
-                    frames.append(
-                        {
-                            "filename": frame.filename,
-                            "funcname": frame.funcname,
-                            "lineno": extract_lineno(frame.location),
-                        }
-                    )
+                frames = [frame_to_dict(f) for f in thread.frame_info]
                 key = (interp.interpreter_id, thread.thread_id)
                 sample = {"status": thread.status, "frames": frames}
                 for _ in range(count):
@@ -93,15 +122,7 @@ def samples_to_by_thread(samples):
     for sample in samples:
         for interp in sample:
             for thread in interp.threads:
-                frames = []
-                for frame in thread.frame_info:
-                    frames.append(
-                        {
-                            "filename": frame.filename,
-                            "funcname": frame.funcname,
-                            "lineno": extract_lineno(frame.location),
-                        }
-                    )
+                frames = [frame_to_dict(f) for f in thread.frame_info]
                 key = (interp.interpreter_id, thread.thread_id)
                 by_thread[key].append(
                     {
@@ -187,25 +208,15 @@ def assert_samples_equal(self, expected_samples, collector):
                 for j, (exp_frame, act_frame) in enumerate(
                     zip(exp["frames"], act["frames"])
                 ):
-                    self.assertEqual(
-                        exp_frame["filename"],
-                        act_frame["filename"],
-                        f"Thread ({interp_id}, {thread_id}), sample {i}, "
-                        f"frame {j}: filename mismatch",
-                    )
-                    self.assertEqual(
-                        exp_frame["funcname"],
-                        act_frame["funcname"],
-                        f"Thread ({interp_id}, {thread_id}), sample {i}, "
-                        f"frame {j}: funcname mismatch",
-                    )
-                    self.assertEqual(
-                        exp_frame["lineno"],
-                        act_frame["lineno"],
-                        f"Thread ({interp_id}, {thread_id}), sample {i}, "
-                        f"frame {j}: lineno mismatch "
-                        f"(expected {exp_frame['lineno']}, got {act_frame['lineno']})",
-                    )
+                    for field in ("filename", "funcname", "lineno", "end_lineno",
+                                  "column", "end_column", "opcode"):
+                        self.assertEqual(
+                            exp_frame[field],
+                            act_frame[field],
+                            f"Thread ({interp_id}, {thread_id}), sample {i}, "
+                            f"frame {j}: {field} mismatch "
+                            f"(expected {exp_frame[field]!r}, got {act_frame[field]!r})",
+                        )


 class TestBinaryRoundTrip(BinaryFormatTestBase):
@@ -484,6 +495,97 @@ def test_threads_interleaved_samples(self):
         self.assertEqual(count, 60)
         self.assert_samples_equal(samples, collector)

+    def test_full_location_roundtrip(self):
+        """Full source location (end_lineno, column, end_column) roundtrips."""
+        frames = [
+            make_frame("test.py", 10, "func1", end_lineno=12, column=4, end_column=20),
+            make_frame("test.py", 20, "func2", end_lineno=20, column=8, end_column=45),
+            make_frame("test.py", 30, "func3", end_lineno=35, column=0, end_column=100),
+        ]
+        samples = [[make_interpreter(0, [make_thread(1, frames)])]]
+        collector, count = self.roundtrip(samples)
+        self.assertEqual(count, 1)
+        self.assert_samples_equal(samples, collector)
+
+    def test_opcode_roundtrip(self):
+        """Opcode values roundtrip exactly."""
+        opcodes = [0, 1, 50, 100, 150, 200, 254]  # Valid Python opcodes
+        samples = []
+        for opcode in opcodes:
+            frame = make_frame("test.py", 10, "func", opcode=opcode)
+            samples.append([make_interpreter(0, [make_thread(1, [frame])])])
+        collector, count = self.roundtrip(samples)
+        self.assertEqual(count, len(opcodes))
+        self.assert_samples_equal(samples, collector)
+
+    def test_opcode_none_roundtrip(self):
+        """Opcode=None (sentinel 255) roundtrips as None."""
+        frame = make_frame("test.py", 10, "func", opcode=None)
+        samples = [[make_interpreter(0, [make_thread(1, [frame])])]]
+        collector, count = self.roundtrip(samples)
+        self.assertEqual(count, 1)
+        self.assert_samples_equal(samples, collector)
+
+    def test_mixed_location_and_opcode(self):
+        """Mixed full location and opcode data roundtrips."""
+        frames = [
+            make_frame("a.py", 10, "a", end_lineno=15, column=4, end_column=30, opcode=100),
+            make_frame("b.py", 20, "b", end_lineno=20, column=0, end_column=50, opcode=None),
+            make_frame("c.py", 30, "c", end_lineno=32, column=8, end_column=25, opcode=50),
+        ]
+        samples = [[make_interpreter(0, [make_thread(1, frames)])]]
+        collector, count = self.roundtrip(samples)
+        self.assertEqual(count, 1)
+        self.assert_samples_equal(samples, collector)
+
+    def test_delta_encoding_multiline(self):
+        """Multi-line spans (large end_lineno delta) roundtrip correctly."""
+        # This tests the delta encoding: end_lineno = lineno + delta
+        frames = [
+            make_frame("test.py", 1, "small", end_lineno=1, column=0, end_column=10),
+            make_frame("test.py", 100, "medium", end_lineno=110, column=0, end_column=50),
+            make_frame("test.py", 1000, "large", end_lineno=1500, column=0, end_column=200),
+        ]
+        samples = [[make_interpreter(0, [make_thread(1, frames)])]]
+        collector, count = self.roundtrip(samples)
+        self.assertEqual(count, 1)
+        self.assert_samples_equal(samples, collector)
+
+    def test_column_positions_preserved(self):
+        """Various column positions are preserved exactly."""
+        columns = [(0, 10), (4, 50), (8, 100), (100, 200)]
+        samples = []
+        for col, end_col in columns:
+            frame = make_frame("test.py", 10, "func", column=col, end_column=end_col)
+            samples.append([make_interpreter(0, [make_thread(1, [frame])])])
+        collector, count = self.roundtrip(samples)
+        self.assertEqual(count, len(columns))
+        self.assert_samples_equal(samples, collector)
+
+    def test_same_line_different_opcodes(self):
+        """Same line with different opcodes creates distinct frames."""
+        # This tests that opcode is part of the frame key
+        frames = [
+            make_frame("test.py", 10, "func", opcode=100),
+            make_frame("test.py", 10, "func", opcode=101),
+            make_frame("test.py", 10, "func", opcode=102),
+        ]
+        samples = [[make_interpreter(0, [make_thread(1, [f])]) for f in frames]]
+        collector, count = self.roundtrip(samples)
+        # Verify all three opcodes are preserved distinctly
+        self.assertEqual(count, 3)
+
+    def test_same_line_different_columns(self):
+        """Same line with different columns creates distinct frames."""
+        frames = [
+            make_frame("test.py", 10, "func", column=0, end_column=10),
+            make_frame("test.py", 10, "func", column=15, end_column=25),
+            make_frame("test.py", 10, "func", column=30, end_column=40),
+        ]
+        samples = [[make_interpreter(0, [make_thread(1, [f])]) for f in frames]]
+        collector, count = self.roundtrip(samples)
+        self.assertEqual(count, 3)
+

 class TestBinaryEdgeCases(BinaryFormatTestBase):
     """Tests for edge cases in binary format."""

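Reviewer sketch, not part of the commit: the two `test_same_line_*` cases depend on the frame key covering every field, so that frames differing only in opcode or column get separate table entries. The snippet below illustrates that interning behaviour in Python, using a plain dict as a stand-in for the C hash table. `FrameTable` and `intern` are hypothetical names chosen for illustration.

```python
class FrameTable:
    """Deduplicating frame table keyed on the full 7-field tuple."""

    def __init__(self):
        self._index = {}   # key tuple -> position in self.entries
        self.entries = []  # unique frame keys, in first-seen order

    def intern(self, filename_idx, funcname_idx, lineno, end_lineno,
               column, end_column, opcode):
        """Return the index for this frame, adding an entry if it is new."""
        key = (filename_idx, funcname_idx, lineno, end_lineno,
               column, end_column, opcode)
        idx = self._index.get(key)
        if idx is None:
            idx = len(self.entries)
            self._index[key] = idx
            self.entries.append(key)
        return idx


table = FrameTable()
# Same file, function, and line; only the opcode differs.
a = table.intern(0, 1, 10, 10, 0, 0, 100)
b = table.intern(0, 1, 10, 10, 0, 0, 101)
# An exact repeat of the first frame reuses its entry.
c = table.intern(0, 1, 10, 10, 0, 0, 100)
print(a, b, c, len(table.entries))
```

With the old three-field key (filename, funcname, lineno), `a` and `b` would have collapsed into one entry and the opcode distinction would be lost.
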
Modules/_remote_debugging/binary_io.h

Lines changed: 19 additions & 5 deletions
@@ -25,6 +25,10 @@ extern "C" {
 #define BINARY_FORMAT_MAGIC_SWAPPED 0x48434154 /* Byte-swapped magic for endianness detection */
 #define BINARY_FORMAT_VERSION 1

+/* Sentinel values for optional frame fields */
+#define OPCODE_NONE 255 /* No opcode captured (u8 sentinel) */
+#define LOCATION_NOT_AVAILABLE (-1) /* lineno/column not available (zigzag sentinel) */
+
 /* Conditional byte-swap macros for cross-endian file reading.
  * Uses Python's optimized byte-swap functions from pycore_bitutils.h */
 #define SWAP16_IF(swap, x) ((swap) ? _Py_bswap16(x) : (x))
@@ -172,18 +176,28 @@ typedef struct {
     size_t compressed_buffer_size;
 } ZstdCompressor;

-/* Frame entry - combines all frame data for better cache locality */
+/* Frame entry - combines all frame data for better cache locality.
+ * Stores full source position (line, end_line, column, end_column) and opcode.
+ * Delta values are computed during serialization for efficiency. */
 typedef struct {
     uint32_t filename_idx;
     uint32_t funcname_idx;
-    int32_t lineno;
+    int32_t lineno;     /* Start line number (-1 for synthetic frames) */
+    int32_t end_lineno; /* End line number (-1 if not available) */
+    int32_t column;     /* Start column in UTF-8 bytes (-1 if not available) */
+    int32_t end_column; /* End column in UTF-8 bytes (-1 if not available) */
+    uint8_t opcode;     /* Python opcode (0-254) or OPCODE_NONE (255) */
 } FrameEntry;

-/* Frame key for hash table lookup */
+/* Frame key for hash table lookup - includes all fields for proper deduplication */
 typedef struct {
     uint32_t filename_idx;
     uint32_t funcname_idx;
     int32_t lineno;
+    int32_t end_lineno;
+    int32_t column;
+    int32_t end_column;
+    uint8_t opcode;
 } FrameKey;

 /* Pending RLE sample - buffered for run-length encoding */
@@ -305,8 +319,8 @@ typedef struct {
     PyObject **strings;
     uint32_t strings_count;

-    /* Parsed frame table: packed as [filename_idx, funcname_idx, lineno] */
-    uint32_t *frame_data;
+    /* Parsed frame table: array of FrameEntry structures */
+    FrameEntry *frames;
     uint32_t frames_count;

     /* Sample data region */

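Reviewer sketch, not part of the commit: the reader now holds an array of `FrameEntry` records with absolute end positions, so parsing has to undo the delta encoding and map the opcode sentinel back to "no opcode". The Python below shows one way that decoding step could look. It assumes LEB128-style varints (7 payload bits per byte, low group first), and `Frame`, `read_varint`, `unzigzag`, `read_frame`, and `read_frame_table` are hypothetical names rather than the C reader's actual functions.

```python
from typing import NamedTuple, Optional

OPCODE_NONE = 255             # u8 sentinel from binary_io.h
LOCATION_NOT_AVAILABLE = -1   # zigzag sentinel from binary_io.h


class Frame(NamedTuple):
    filename_idx: int
    funcname_idx: int
    lineno: int
    end_lineno: int
    column: int
    end_column: int
    opcode: Optional[int]


def read_varint(buf: bytes, pos: int):
    """Decode an unsigned varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: last byte
            return result, pos
        shift += 7


def unzigzag(value: int) -> int:
    """Inverse of zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ..."""
    return (value >> 1) ^ -(value & 1)


def read_frame(buf: bytes, pos: int):
    """Decode one frame entry; return (Frame, next_pos)."""
    filename_idx, pos = read_varint(buf, pos)
    funcname_idx, pos = read_varint(buf, pos)
    raw, pos = read_varint(buf, pos)
    lineno = unzigzag(raw)
    raw, pos = read_varint(buf, pos)
    end_lineno_delta = unzigzag(raw)
    raw, pos = read_varint(buf, pos)
    column = unzigzag(raw)
    raw, pos = read_varint(buf, pos)
    end_column_delta = unzigzag(raw)
    opcode_byte = buf[pos]
    pos += 1

    # An unavailable base value reconstructs to an unavailable end value.
    end_lineno = (LOCATION_NOT_AVAILABLE if lineno == LOCATION_NOT_AVAILABLE
                  else lineno + end_lineno_delta)
    end_column = (LOCATION_NOT_AVAILABLE if column == LOCATION_NOT_AVAILABLE
                  else column + end_column_delta)
    opcode = None if opcode_byte == OPCODE_NONE else opcode_byte
    return Frame(filename_idx, funcname_idx, lineno, end_lineno,
                 column, end_column, opcode), pos


def read_frame_table(buf: bytes, count: int):
    """Decode `count` consecutive frame entries from buf."""
    frames, pos = [], 0
    for _ in range(count):
        frame, pos = read_frame(buf, pos)
        frames.append(frame)
    return frames


data = bytes([0x00, 0x01, 0x14, 0x04, 0x08, 0x20, 0x64,   # located frame
              0x00, 0x01, 0x01, 0x00, 0x01, 0x00, 0xFF])  # synthetic frame
for frame in read_frame_table(data, 2):
    print(frame)
```

Resolving deltas once at parse time mirrors the struct comment above ("Delta values are computed during serialization"): the in-memory records stay absolute, and only the on-disk form is delta-coded.
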