Commit a0a28c2

Merge branch 'main' into lazy
2 parents 5000033 + ef51a7c commit a0a28c2

80 files changed, +5793 -2119 lines changed

Doc/library/exceptions.rst

Lines changed: 6 additions & 0 deletions
@@ -978,6 +978,12 @@ their subgroups based on the types of the contained exceptions.
       raises a :exc:`TypeError` if any contained exception is not an
       :exc:`Exception` subclass.
 
+   .. impl-detail::
+
+      The ``excs`` parameter may be any sequence, but lists and tuples are
+      specifically processed more efficiently here. For optimal performance,
+      pass a tuple as ``excs``.
+
    .. attribute:: message
 
       The ``msg`` argument to the constructor. This is a read-only attribute.

Doc/whatsnew/3.15.rst

Lines changed: 76 additions & 95 deletions
@@ -68,7 +68,7 @@ Summary -- Release highlights
 * :pep:`810`: :ref:`Explicit lazy imports for faster startup times
   <whatsnew315-pep810>`
 * :pep:`799`: :ref:`A dedicated profiling package for organizing Python
-  profiling tools <whatsnew315-sampling-profiler>`
+  profiling tools <whatsnew315-profiling-package>`
 * :pep:`686`: :ref:`Python now uses UTF-8 as the default encoding
   <whatsnew315-utf8-default>`
 * :pep:`782`: :ref:`A new PyBytesWriter C API to create a Python bytes object
@@ -170,14 +170,32 @@ imports cannot be lazy either (``lazy from __future__ import ...`` raises
 .. seealso:: :pep:`810` for the full specification and rationale.
 
 (Contributed by Pablo Galindo Salgado and Dino Viehland in :gh:`142349`.)
+.. _whatsnew315-profiling-package:
+
+:pep:`799`: A dedicated profiling package
+-----------------------------------------
+
+A new :mod:`!profiling` module has been added to organize Python's built-in
+profiling tools under a single, coherent namespace. This module contains:
+
+* :mod:`!profiling.tracing`: deterministic function-call tracing (relocated from
+  :mod:`cProfile`).
+* :mod:`!profiling.sampling`: a new statistical sampling profiler (named Tachyon).
+
+The :mod:`cProfile` module remains as an alias for backwards compatibility.
+The :mod:`profile` module is deprecated and will be removed in Python 3.17.
+
+.. seealso:: :pep:`799` for further details.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`138122`.)
 
 
 .. _whatsnew315-sampling-profiler:
 
-:pep:`799`: High frequency statistical sampling profiler
---------------------------------------------------------
+Tachyon: High frequency statistical sampling profiler
+-----------------------------------------------------
 
-A new statistical sampling profiler has been added to the new :mod:`!profiling` module as
+A new statistical sampling profiler (Tachyon) has been added as
 :mod:`!profiling.sampling`. This profiler enables low-overhead performance analysis of
 running Python processes without requiring code modification or process restart.
 
@@ -186,101 +204,64 @@ every function call, the sampling profiler periodically captures stack traces fr
 running processes. This approach provides virtually zero overhead while achieving
 sampling rates of **up to 1,000,000 Hz**, making it the fastest sampling profiler
 available for Python (at the time of its contribution) and ideal for debugging
-performance issues in production environments.
+performance issues in production environments. This capability is particularly
+valuable for debugging performance issues in production systems where traditional
+profiling approaches would be too intrusive.
 
 Key features include:
 
 * **Zero-overhead profiling**: Attach to any running Python process without
-  affecting its performance
-* **No code modification required**: Profile existing applications without restart
-* **Real-time statistics**: Monitor sampling quality during data collection
-* **Multiple output formats**: Generate both detailed statistics and flamegraph data
-* **Thread-aware profiling**: Option to profile all threads or just the main thread
-
-Profile process 1234 for 10 seconds with default settings:
-
-.. code-block:: shell
-
-   python -m profiling.sampling 1234
-
-Profile with custom interval and duration, save to file:
-
-.. code-block:: shell
-
-   python -m profiling.sampling -i 50 -d 30 -o profile.stats 1234
-
-Generate collapsed stacks for flamegraph:
-
-.. code-block:: shell
-
-   python -m profiling.sampling --collapsed 1234
-
-Profile all threads and sort by total time:
-
-.. code-block:: shell
-
-   python -m profiling.sampling -a --sort-tottime 1234
-
-The profiler generates statistical estimates of where time is spent:
-
-.. code-block:: text
-
-   Real-time sampling stats: Mean: 100261.5Hz (9.97µs) Min: 86333.4Hz (11.58µs) Max: 118807.2Hz (8.42µs) Samples: 400001
-   Captured 498841 samples in 5.00 seconds
-   Sample rate: 99768.04 samples/sec
-   Error rate: 0.72%
-   Profile Stats:
-   nsamples sample% tottime (s) cumul% cumtime (s) filename:lineno(function)
-   43/418858 0.0 0.000 87.9 4.189 case.py:667(TestCase.run)
-   3293/418812 0.7 0.033 87.9 4.188 case.py:613(TestCase._callTestMethod)
-   158562/158562 33.3 1.586 33.3 1.586 test_compile.py:725(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-   129553/129553 27.2 1.296 27.2 1.296 ast.py:46(parse)
-   0/128129 0.0 0.000 26.9 1.281 test_ast.py:884(AST_Tests.test_ast_recursion_limit.<locals>.check_limit)
-   7/67446 0.0 0.000 14.2 0.674 test_compile.py:729(TestSpecifics.test_compiler_recursion_limit)
-   6/60380 0.0 0.000 12.7 0.604 test_ast.py:888(AST_Tests.test_ast_recursion_limit)
-   3/50020 0.0 0.000 10.5 0.500 test_compile.py:727(TestSpecifics.test_compiler_recursion_limit)
-   1/38011 0.0 0.000 8.0 0.380 test_ast.py:886(AST_Tests.test_ast_recursion_limit)
-   1/25076 0.0 0.000 5.3 0.251 test_compile.py:728(TestSpecifics.test_compiler_recursion_limit)
-   22361/22362 4.7 0.224 4.7 0.224 test_compile.py:1368(TestSpecifics.test_big_dict_literal)
-   4/18008 0.0 0.000 3.8 0.180 test_ast.py:889(AST_Tests.test_ast_recursion_limit)
-   11/17696 0.0 0.000 3.7 0.177 subprocess.py:1038(Popen.__init__)
-   16968/16968 3.6 0.170 3.6 0.170 subprocess.py:1900(Popen._execute_child)
-   2/16941 0.0 0.000 3.6 0.169 test_compile.py:730(TestSpecifics.test_compiler_recursion_limit)
-
-   Legend:
-     nsamples: Direct/Cumulative samples (direct executing / on call stack)
-     sample%: Percentage of total samples this function was directly executing
-     tottime: Estimated total time spent directly in this function
-     cumul%: Percentage of total samples when this function was on the call stack
-     cumtime: Estimated cumulative time (including time in called functions)
-     filename:lineno(function): Function location and name
-
-   Summary of Interesting Functions:
-
-   Functions with Highest Direct/Cumulative Ratio (Hot Spots):
-     1.000 direct/cumulative ratio, 33.3% direct samples: test_compile.py:(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-     1.000 direct/cumulative ratio, 27.2% direct samples: ast.py:(parse)
-     1.000 direct/cumulative ratio, 3.6% direct samples: subprocess.py:(Popen._execute_child)
-
-   Functions with Highest Call Frequency (Indirect Calls):
-     418815 indirect calls, 87.9% total stack presence: case.py:(TestCase.run)
-     415519 indirect calls, 87.9% total stack presence: case.py:(TestCase._callTestMethod)
-     159470 indirect calls, 33.5% total stack presence: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-
-   Functions with Highest Call Magnification (Cumulative/Direct):
-     12267.9x call magnification, 159470 indirect calls from 13 direct: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-     10581.7x call magnification, 116388 indirect calls from 11 direct: test_ast.py:(AST_Tests.test_ast_recursion_limit)
-     9740.9x call magnification, 418815 indirect calls from 43 direct: case.py:(TestCase.run)
-
-The profiler automatically identifies performance bottlenecks through statistical
-analysis, highlighting functions with high CPU usage and call frequency patterns.
-
-This capability is particularly valuable for debugging performance issues in
-production systems where traditional profiling approaches would be too intrusive.
-
-.. seealso:: :pep:`799` for further details.
-
-(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953`.)
+  affecting its performance. Ideal for production debugging where you can't afford
+  to restart or slow down your application.
+
+* **No code modification required**: Profile existing applications without restart.
+  Simply point the profiler at a running process by PID and start collecting data.
+
+* **Flexible target modes**:
+
+  * Profile running processes by PID (``attach``) - attach to already-running applications
+  * Run and profile scripts directly (``run``) - profile from the very start of execution
+  * Execute and profile modules (``run -m``) - profile packages run as ``python -m module``
+
+* **Multiple profiling modes**: Choose what to measure based on your performance investigation:
+
+  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
+    network waits, and blocking operations. Use this to understand where your program spends
+    calendar time, including when waiting for external resources.
+  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O waits
+    and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
+  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global Interpreter
+    Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.
+
+* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
+  essential for understanding multi-threaded application behavior.
+
+* **Multiple output formats**: Choose the visualization that best fits your workflow:
+
+  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows function-level
+    timing with direct and cumulative samples. Best for detailed analysis and integration with
+    existing Python profiling tools.
+  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
+    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
+    FlameGraph scripts or speedscope.
+  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
+    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
+    hierarchy where width represents time spent, making it easy to spot bottlenecks at a glance.
+  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
+    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
+    timeline-based analysis with features like stack charts, markers, and network activity.
+  * ``--heatmap``: Generates an interactive HTML heatmap visualization with line-level sample
+    counts. Creates a directory with per-file heatmaps showing exactly where time is spent
+    at the source code level.
+
+* **Live interactive mode**: Real-time TUI profiler with a top-like interface (``--live``).
+  Monitor performance as your application runs with interactive sorting and filtering.
+
+* **Async-aware profiling**: Profile async/await code with task-based stack reconstruction
+  (``--async-aware``). See which coroutines are consuming time, with options to show only
+  running tasks or all tasks including those waiting.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953` and :gh:`138122`.)
 
 
 .. _whatsnew315-improved-error-messages:
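
The idea behind statistical sampling and the collapsed-stack output described in this hunk can be sketched in a few lines. This is a toy, in-process illustration only: it samples a thread of its own process with ``sys._current_frames()``, whereas the profiler described above attaches to other processes, and none of the function names below (``sample_stacks``, ``to_collapsed``, ``busy_work``) come from ``profiling.sampling``.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id, duration=0.2, interval=0.001):
    """Periodically snapshot one thread's call stack and count identical stacks."""
    counts = collections.Counter()
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(thread_id)
        stack = []
        while frame is not None:
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        if stack:
            # Store root first, leaf last, as collapsed-stack tools expect.
            counts[tuple(reversed(stack))] += 1
        time.sleep(interval)
    return counts


def to_collapsed(counts):
    """Render one 'root;child;leaf COUNT' line per distinct stack."""
    return [f"{';'.join(stack)} {n}" for stack, n in sorted(counts.items())]


def busy_work(stop):
    # Hypothetical workload to give the sampler something to observe.
    while not stop.is_set():
        sum(i * i for i in range(1000))


stop = threading.Event()
worker = threading.Thread(target=busy_work, args=(stop,))
worker.start()
counts = sample_stacks(worker.ident)
stop.set()
worker.join()

collapsed = to_collapsed(counts)
print("\n".join(collapsed))
```

Each output line is a semicolon-joined stack followed by its sample count, which is the shape that flamegraph tools generally consume. Functions that appear in many samples are where the thread spent most of its time.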

Include/cpython/pyerrors.h

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ typedef struct {
     PyException_HEAD
     PyObject *msg;
     PyObject *excs;
+    PyObject *excs_str;
 } PyBaseExceptionGroupObject;
 
 typedef struct {

Include/cpython/pystate.h

Lines changed: 9 additions & 0 deletions
@@ -135,6 +135,15 @@ struct _ts {
     /* Pointer to currently executing frame. */
     struct _PyInterpreterFrame *current_frame;
 
+    /* Pointer to the base frame (bottommost sentinel frame).
+       Used by profilers to validate complete stack unwinding.
+       Points to the embedded base_frame in _PyThreadStateImpl.
+       The frame is embedded there rather than here because _PyInterpreterFrame
+       is defined in internal headers that cannot be exposed in the public API. */
+    struct _PyInterpreterFrame *base_frame;
+
+    struct _PyInterpreterFrame *last_profiled_frame;
+
     Py_tracefunc c_profilefunc;
     Py_tracefunc c_tracefunc;
     PyObject *c_profileobj;
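
The comment in this hunk says the base frame is a sentinel that lets profilers validate complete stack unwinding. A toy Python model of that check follows; the ``Frame`` class and ``unwind`` function are hypothetical and do not mirror the real ``_PyInterpreterFrame`` layout or any CPython API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Frame:
    # Hypothetical stand-in for an interpreter frame: a name plus a link
    # to the frame below it on the stack.
    name: str
    previous: Optional["Frame"] = None


def unwind(current, base_frame, max_depth=1000):
    """Walk towards the bottom of the stack.

    Returns (names, complete). ``complete`` is True only if the walk
    reached the sentinel base frame, i.e. the whole stack was seen.
    """
    names = []
    frame = current
    for _ in range(max_depth):
        if frame is None:
            return names, False  # chain ended before the sentinel
        if frame is base_frame:
            return names, True   # sentinel reached: unwinding is complete
        names.append(frame.name)
        frame = frame.previous
    return names, False          # depth limit hit: treat as incomplete


base = Frame("<base>")
main = Frame("main", previous=base)
leaf = Frame("leaf", previous=main)
complete_walk = unwind(leaf, base)

orphan = Frame("orphan")  # a chain that never reaches the sentinel
broken_walk = unwind(orphan, base)
```

A sampler that reads frames while the target keeps running can observe a half-updated chain; checking for the sentinel gives it a cheap way to discard such samples.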

Include/internal/pycore_debug_offsets.h

Lines changed: 4 additions & 0 deletions
@@ -102,6 +102,8 @@ typedef struct _Py_DebugOffsets {
         uint64_t next;
         uint64_t interp;
         uint64_t current_frame;
+        uint64_t base_frame;
+        uint64_t last_profiled_frame;
         uint64_t thread_id;
         uint64_t native_thread_id;
         uint64_t datastack_chunk;
@@ -272,6 +274,8 @@ typedef struct _Py_DebugOffsets {
         .next = offsetof(PyThreadState, next), \
         .interp = offsetof(PyThreadState, interp), \
         .current_frame = offsetof(PyThreadState, current_frame), \
+        .base_frame = offsetof(PyThreadState, base_frame), \
+        .last_profiled_frame = offsetof(PyThreadState, last_profiled_frame), \
         .thread_id = offsetof(PyThreadState, thread_id), \
         .native_thread_id = offsetof(PyThreadState, native_thread_id), \
         .datastack_chunk = offsetof(PyThreadState, datastack_chunk), \
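
The ``offsetof`` entries added here record where each field sits inside the thread state, presumably so that an out-of-process tool can pull the field out of raw memory. The sketch below shows that general technique with :mod:`ctypes`. ``ToyThreadState`` is a made-up four-field layout for illustration and is not the real ``PyThreadState``.

```python
import ctypes
import struct


class ToyThreadState(ctypes.Structure):
    # Hypothetical layout for illustration only; NOT the real PyThreadState.
    _fields_ = [
        ("next", ctypes.c_uint64),
        ("current_frame", ctypes.c_uint64),
        ("base_frame", ctypes.c_uint64),
        ("last_profiled_frame", ctypes.c_uint64),
    ]


# The ctypes equivalent of offsetof(ToyThreadState, <field>).
offsets = {
    name: getattr(ToyThreadState, name).offset
    for name, _ in ToyThreadState._fields_
}

# Pretend these bytes were copied out of another process's memory.
state = ToyThreadState(
    next=0,
    current_frame=0x1000,
    base_frame=0x2000,
    last_profiled_frame=0x1800,
)
raw = bytes(state)


def read_u64(buf, offset):
    """Read a native-endian 64-bit unsigned integer at ``offset``."""
    return struct.unpack_from("=Q", buf, offset)[0]


base_frame_ptr = read_u64(raw, offsets["base_frame"])
print(offsets, hex(base_frame_ptr))
```

Publishing offsets in a table means the reader does not need to hard-code a struct layout that may change between interpreter builds.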

Include/internal/pycore_global_objects_fini_generated.h

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

Include/internal/pycore_global_strings.h

Lines changed: 2 additions & 0 deletions
@@ -334,6 +334,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(c_parameter_type)
         STRUCT_FOR_ID(c_return)
         STRUCT_FOR_ID(c_stack)
+        STRUCT_FOR_ID(cache_frames)
         STRUCT_FOR_ID(cached_datetime_module)
         STRUCT_FOR_ID(cached_statements)
         STRUCT_FOR_ID(cadata)
@@ -778,6 +779,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(stacklevel)
         STRUCT_FOR_ID(start)
         STRUCT_FOR_ID(statement)
+        STRUCT_FOR_ID(stats)
         STRUCT_FOR_ID(status)
         STRUCT_FOR_ID(stderr)
         STRUCT_FOR_ID(stdin)

Include/internal/pycore_runtime_init_generated.h

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

Include/internal/pycore_tstate.h

Lines changed: 5 additions & 0 deletions
@@ -10,6 +10,7 @@ extern "C" {
 
 #include "pycore_brc.h"  // struct _brc_thread_state
 #include "pycore_freelist_state.h"  // struct _Py_freelists
+#include "pycore_interpframe_structs.h"  // _PyInterpreterFrame
 #include "pycore_mimalloc.h"  // struct _mimalloc_thread_state
 #include "pycore_qsbr.h"  // struct qsbr
 #include "pycore_uop.h"  // struct _PyUOpInstruction
@@ -61,6 +62,10 @@ typedef struct _PyThreadStateImpl {
     // semi-public fields are in PyThreadState.
     PyThreadState base;
 
+    // Embedded base frame - sentinel at the bottom of the frame stack.
+    // Used by profiling/sampling to detect incomplete stack traces.
+    _PyInterpreterFrame base_frame;
+
     // The reference count field is used to synchronize deallocation of the
     // thread state during runtime finalization.
     Py_ssize_t refcount;

Include/internal/pycore_unicodeobject_generated.h

Lines changed: 8 additions & 0 deletions
Some generated files are not rendered by default.
