@@ -68,7 +68,7 @@ Summary -- Release highlights
 * :pep:`810`: :ref:`Explicit lazy imports for faster startup times
   <whatsnew315-pep810>`
 * :pep:`799`: :ref:`A dedicated profiling package for organizing Python
-  profiling tools <whatsnew315-sampling-profiler>`
+  profiling tools <whatsnew315-profiling-package>`
 * :pep:`686`: :ref:`Python now uses UTF-8 as the default encoding
   <whatsnew315-utf8-default>`
 * :pep:`782`: :ref:`A new PyBytesWriter C API to create a Python bytes object
@@ -170,14 +170,32 @@ imports cannot be lazy either (``lazy from __future__ import ...`` raises
 .. seealso:: :pep:`810` for the full specification and rationale.

 (Contributed by Pablo Galindo Salgado and Dino Viehland in :gh:`142349`.)
+.. _whatsnew315-profiling-package:
+
+:pep:`799`: A dedicated profiling package
+-----------------------------------------
+
+A new :mod:`!profiling` module has been added to organize Python's built-in
+profiling tools under a single, coherent namespace. This module contains:
+
+* :mod:`!profiling.tracing`: deterministic function-call tracing (relocated from
+  :mod:`cProfile`).
+* :mod:`!profiling.sampling`: a new statistical sampling profiler (named Tachyon).
+
+The :mod:`cProfile` module remains as an alias for backwards compatibility.
+The :mod:`profile` module is deprecated and will be removed in Python 3.17.
+
+.. seealso:: :pep:`799` for further details.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`138122`.)
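The relocation keeps :mod:`cProfile` importable exactly as before, so existing
deterministic-tracing code continues to work unchanged. A minimal sketch using
today's :mod:`cProfile`/:mod:`pstats` API (the ``fib`` workload is just an
illustrative example, not part of the change):

```python
# Deterministic function-call tracing via cProfile, which PEP 799
# keeps as a backwards-compatible alias for the relocated tracer.
import cProfile
import io
import pstats


def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


prof = cProfile.Profile()
prof.enable()
fib(15)
prof.disable()

# Render the collected stats into a string instead of stdout.
buf = io.StringIO()
pstats.Stats(prof, stream=buf).sort_stats("cumulative").print_stats(3)
report = buf.getvalue()
print("fib" in report)  # the traced function shows up in the report
```

The same pattern is expected to apply under the new ``profiling.tracing`` name
once the package lands.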


 .. _whatsnew315-sampling-profiler:

-:pep:`799`: High frequency statistical sampling profiler
---------------------------------------------------------
+Tachyon: High frequency statistical sampling profiler
+-----------------------------------------------------

-A new statistical sampling profiler has been added to the new :mod:`!profiling` module as
+A new statistical sampling profiler (Tachyon) has been added as
 :mod:`!profiling.sampling`. This profiler enables low-overhead performance analysis of
 running Python processes without requiring code modification or process restart.

@@ -186,101 +204,64 @@ every function call, the sampling profiler periodically captures stack traces fr
 running processes. This approach provides virtually zero overhead while achieving
 sampling rates of **up to 1,000,000 Hz**, making it the fastest sampling profiler
 available for Python (at the time of its contribution) and ideal for debugging
-performance issues in production environments.
+performance issues in production environments, where traditional profiling
+approaches would be too intrusive.

 Key features include:

 * **Zero-overhead profiling**: Attach to any running Python process without
-  affecting its performance
-* **No code modification required**: Profile existing applications without restart
-* **Real-time statistics**: Monitor sampling quality during data collection
-* **Multiple output formats**: Generate both detailed statistics and flamegraph data
-* **Thread-aware profiling**: Option to profile all threads or just the main thread
-
-Profile process 1234 for 10 seconds with default settings:
-
-.. code-block:: shell
-
-   python -m profiling.sampling 1234
-
-Profile with custom interval and duration, save to file:
-
-.. code-block:: shell
-
-   python -m profiling.sampling -i 50 -d 30 -o profile.stats 1234
-
-Generate collapsed stacks for flamegraph:
-
-.. code-block:: shell
-
-   python -m profiling.sampling --collapsed 1234
-
-Profile all threads and sort by total time:
-
-.. code-block:: shell
-
-   python -m profiling.sampling -a --sort-tottime 1234
-
-The profiler generates statistical estimates of where time is spent:
-
-.. code-block:: text
-
-   Real-time sampling stats: Mean: 100261.5Hz (9.97µs) Min: 86333.4Hz (11.58µs) Max: 118807.2Hz (8.42µs) Samples: 400001
-   Captured 498841 samples in 5.00 seconds
-   Sample rate: 99768.04 samples/sec
-   Error rate: 0.72%
-   Profile Stats:
-        nsamples sample% tottime (s)  cumul% cumtime (s)  filename:lineno(function)
-       43/418858     0.0       0.000    87.9       4.189  case.py:667(TestCase.run)
-     3293/418812     0.7       0.033    87.9       4.188  case.py:613(TestCase._callTestMethod)
-   158562/158562    33.3       1.586    33.3       1.586  test_compile.py:725(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-   129553/129553    27.2       1.296    27.2       1.296  ast.py:46(parse)
-        0/128129     0.0       0.000    26.9       1.281  test_ast.py:884(AST_Tests.test_ast_recursion_limit.<locals>.check_limit)
-         7/67446     0.0       0.000    14.2       0.674  test_compile.py:729(TestSpecifics.test_compiler_recursion_limit)
-         6/60380     0.0       0.000    12.7       0.604  test_ast.py:888(AST_Tests.test_ast_recursion_limit)
-         3/50020     0.0       0.000    10.5       0.500  test_compile.py:727(TestSpecifics.test_compiler_recursion_limit)
-         1/38011     0.0       0.000     8.0       0.380  test_ast.py:886(AST_Tests.test_ast_recursion_limit)
-         1/25076     0.0       0.000     5.3       0.251  test_compile.py:728(TestSpecifics.test_compiler_recursion_limit)
-     22361/22362     4.7       0.224     4.7       0.224  test_compile.py:1368(TestSpecifics.test_big_dict_literal)
-         4/18008     0.0       0.000     3.8       0.180  test_ast.py:889(AST_Tests.test_ast_recursion_limit)
-        11/17696     0.0       0.000     3.7       0.177  subprocess.py:1038(Popen.__init__)
-     16968/16968     3.6       0.170     3.6       0.170  subprocess.py:1900(Popen._execute_child)
-         2/16941     0.0       0.000     3.6       0.169  test_compile.py:730(TestSpecifics.test_compiler_recursion_limit)
-
-   Legend:
-     nsamples: Direct/Cumulative samples (direct executing / on call stack)
-     sample%: Percentage of total samples this function was directly executing
-     tottime: Estimated total time spent directly in this function
-     cumul%: Percentage of total samples when this function was on the call stack
-     cumtime: Estimated cumulative time (including time in called functions)
-     filename:lineno(function): Function location and name
-
-   Summary of Interesting Functions:
-
-   Functions with Highest Direct/Cumulative Ratio (Hot Spots):
-     1.000 direct/cumulative ratio, 33.3% direct samples: test_compile.py:(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-     1.000 direct/cumulative ratio, 27.2% direct samples: ast.py:(parse)
-     1.000 direct/cumulative ratio, 3.6% direct samples: subprocess.py:(Popen._execute_child)
-
-   Functions with Highest Call Frequency (Indirect Calls):
-     418815 indirect calls, 87.9% total stack presence: case.py:(TestCase.run)
-     415519 indirect calls, 87.9% total stack presence: case.py:(TestCase._callTestMethod)
-     159470 indirect calls, 33.5% total stack presence: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-
-   Functions with Highest Call Magnification (Cumulative/Direct):
-     12267.9x call magnification, 159470 indirect calls from 13 direct: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-     10581.7x call magnification, 116388 indirect calls from 11 direct: test_ast.py:(AST_Tests.test_ast_recursion_limit)
-     9740.9x call magnification, 418815 indirect calls from 43 direct: case.py:(TestCase.run)
-
-The profiler automatically identifies performance bottlenecks through statistical
-analysis, highlighting functions with high CPU usage and call frequency patterns.
-
-This capability is particularly valuable for debugging performance issues in
-production systems where traditional profiling approaches would be too intrusive.
-
-.. seealso:: :pep:`799` for further details.
-
-(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953`.)
+  affecting its performance. Ideal for production debugging where you can't afford
+  to restart or slow down your application.
+
+* **No code modification required**: Profile existing applications without restart.
+  Simply point the profiler at a running process by PID and start collecting data.
+
+* **Flexible target modes**:
+
+  * Profile running processes by PID (``attach``) - attach to already-running applications
+  * Run and profile scripts directly (``run``) - profile from the very start of execution
+  * Execute and profile modules (``run -m``) - profile packages run as ``python -m module``
+
+* **Multiple profiling modes**: Choose what to measure based on your performance investigation:
+
+  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
+    network waits, and blocking operations. Use this to understand where your program spends
+    calendar time, including when waiting for external resources.
+  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O waits
+    and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
+  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global Interpreter
+    Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.
+
+* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
+  essential for understanding multi-threaded application behavior.
+
+* **Multiple output formats**: Choose the visualization that best fits your workflow:
+
+  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows function-level
+    timing with direct and cumulative samples. Best for detailed analysis and integration with
+    existing Python profiling tools.
+  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
+    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
+    FlameGraph scripts or speedscope.
+  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
+    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
+    hierarchy, where width represents time spent, making it easy to spot bottlenecks at a glance.
+  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
+    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
+    timeline-based analysis with features like stack charts, markers, and network activity.
+  * ``--heatmap``: Generates an interactive HTML heatmap visualization with line-level sample
+    counts. Creates a directory with per-file heatmaps showing exactly where time is spent
+    at the source code level.
+
+* **Live interactive mode**: Real-time TUI profiler with a top-like interface (``--live``).
+  Monitor performance as your application runs with interactive sorting and filtering.
+
+* **Async-aware profiling**: Profile async/await code with task-based stack reconstruction
+  (``--async-aware``). See which coroutines are consuming time, with options to show only
+  running tasks or all tasks, including those waiting.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953` and :gh:`138122`.)
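To make the ``--collapsed`` output format concrete: each emitted line is a
semicolon-separated call stack followed by a space and a sample count, which
flamegraph tools aggregate per frame. A small sketch of consuming that format
(the sample lines below are made-up data, not real Tachyon output):

```python
# Collapsed-stack format: "frame1;frame2;...;leaf <count>" per line.
# These stacks are hypothetical sample data for illustration only.
samples = [
    "main;handle_request;parse_json 120",
    "main;handle_request;query_db 340",
    "main;idle 40",
]

totals = {}
for line in samples:
    stack, count = line.rsplit(" ", 1)
    leaf = stack.split(";")[-1]  # innermost frame: where samples landed
    totals[leaf] = totals.get(leaf, 0) + int(count)

# The frame with the most direct samples is the hottest leaf.
hottest = max(totals, key=totals.get)
print(hottest, totals[hottest])  # -> query_db 340
```

Tools like Brendan Gregg's FlameGraph scripts or speedscope perform this same
per-frame aggregation to compute the width of each flamegraph box.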


 .. _whatsnew315-improved-error-messages: