Commit 50476ca

gh-137122: Improve the profiling section in the 3.15 what's new document
1 parent 46f11b3 commit 50476ca


Doc/whatsnew/3.15.rst

Lines changed: 111 additions & 13 deletions
@@ -96,35 +96,133 @@ performance issues in production environments.
 Key features include:
 
 * **Zero-overhead profiling**: Attach to any running Python process without
-  affecting its performance
-* **No code modification required**: Profile existing applications without restart
-* **Real-time statistics**: Monitor sampling quality during data collection
-* **Multiple output formats**: Generate both detailed statistics and flamegraph data
-* **Thread-aware profiling**: Option to profile all threads or just the main thread
+  affecting its performance. Ideal for production debugging where you can't afford
+  to restart or slow down your application.
 
-Profile process 1234 for 10 seconds with default settings:
+* **No code modification required**: Profile existing applications without restart.
+  Simply point the profiler at a running process by PID and start collecting data.
+
+* **Flexible target modes**:
+
+  * Profile running processes by PID (``-p``) - attach to already-running applications
+  * Run and profile scripts directly - profile from the very start of execution
+  * Execute and profile modules (``-m``) - profile packages run as ``python -m module``
+
+* **Multiple profiling modes**: Choose what to measure based on your performance investigation:
+
+  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
+    network waits, and blocking operations. Use this to understand where your program spends
+    calendar time, including when waiting for external resources.
+  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O waits
+    and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
+  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global Interpreter
+    Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.
+
+* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
+  essential for understanding multi-threaded application behavior.
+
+* **Multiple output formats**: Choose the visualization that best fits your workflow:
+
+  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows function-level
+    timing with direct and cumulative samples. Best for detailed analysis and integration with
+    existing Python profiling tools.
+  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
+    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
+    FlameGraph scripts or speedscope.
+  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
+    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
+    hierarchy where width represents time spent, making it easy to spot bottlenecks at a glance.
+  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
+    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
+    timeline-based analysis with features like stack charts, markers, and network activity.
+
+* **Advanced sorting options**: Sort by direct samples, total time, cumulative time,
+  sample percentage, cumulative percentage, or function name. Quickly identify hot spots
+  by sorting functions by where they appear most in stack traces.
+
+* **Flexible output control**: Limit results to top N functions (``-l``), customize sorting,
+  and disable summary sections for cleaner output suitable for automation.
+
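The ``--pstats`` output above is described as compatible with :mod:`pstats`, so the saved statistics should be open to post-processing from Python. A minimal sketch, assuming the statistics are written to a file with ``-o`` and that the file loads like a regular profile dump (the name ``sampling.pstats`` is only an illustration):

.. code-block:: python

   # Sketch only: assumes the --pstats output saved with -o can be loaded by
   # pstats.Stats; "sampling.pstats" is a placeholder file name.
   import pstats

   stats = pstats.Stats("sampling.pstats")
   stats.sort_stats("cumulative")  # order functions by cumulative cost
   stats.print_stats(20)           # print the 20 most expensive entries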
+**Basic usage examples:**
+
+Attach to a running process and get quick profiling stats:
+
+.. code-block:: shell
+
+   python -m profiling.sampling -p 1234
+
+Profile a script from the start of its execution:
+
+.. code-block:: shell
+
+   python -m profiling.sampling myscript.py arg1 arg2
+
+Profile a module (like profiling ``python -m http.server``):
+
+.. code-block:: shell
+
+   python -m profiling.sampling -m http.server
+
+**Understanding different profiling modes:**
+
+Investigate why your web server feels slow (includes I/O waits):
+
+.. code-block:: shell
+
+   python -m profiling.sampling --mode wall -p 1234
+
+Find CPU-intensive functions (excludes I/O and sleep time):
+
+.. code-block:: shell
+
+   python -m profiling.sampling --mode cpu -p 1234
+
+Debug GIL contention in multi-threaded applications:
 
 .. code-block:: shell
 
-   python -m profiling.sampling 1234
+   python -m profiling.sampling --mode gil -a -p 1234
+
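To see the difference between the modes concretely, a tiny workload script helps: the sleeping function should dominate under ``--mode wall`` but largely vanish under ``--mode cpu``, while the busy loop shows up in both. This script is purely illustrative (the file name ``demo_workload.py`` is made up):

.. code-block:: python

   # demo_workload.py -- illustrative workload for comparing profiling modes.
   import time

   def waits_on_io():
       # Blocks without burning CPU: visible under wall-clock sampling,
       # mostly absent under CPU sampling.
       time.sleep(1)

   def burns_cpu():
       # Pure computation: visible under both wall-clock and CPU sampling.
       total = 0
       for i in range(5_000_000):
           total += i * i
       return total

   if __name__ == "__main__":
       for _ in range(10):
           waits_on_io()
           burns_cpu()

Running it as ``python -m profiling.sampling --mode wall demo_workload.py`` versus ``--mode cpu`` should attribute most samples to ``waits_on_io`` and ``burns_cpu`` respectively.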
+**Visualization and output formats:**
+
+Generate an interactive flamegraph for visual analysis (opens in browser):
+
+.. code-block:: shell
+
+   python -m profiling.sampling --flamegraph -p 1234
+
+Upload to Firefox Profiler for timeline-based analysis:
+
+.. code-block:: shell
+
+   python -m profiling.sampling --gecko -o profile.json -p 1234
+   # Then upload profile.json to https://profiler.firefox.com
+
+Generate collapsed stacks for custom processing:
+
+.. code-block:: shell
+
+   python -m profiling.sampling --collapsed -o stacks.txt -p 1234
+
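Collapsed stacks are conventionally one semicolon-separated call stack per line followed by a sample count (the layout consumed by Brendan Gregg's FlameGraph scripts). Assuming ``stacks.txt`` above follows that convention, custom processing could look something like this sketch:

.. code-block:: python

   # Sketch only: assumes the conventional "frame;frame;frame count" layout.
   from collections import Counter

   leaf_samples = Counter()
   with open("stacks.txt") as f:
       for line in f:
           stack, _, count = line.rpartition(" ")
           if not stack or not count.strip().isdigit():
               continue  # skip anything that doesn't look like a stack line
           leaf = stack.split(";")[-1]   # innermost frame of this stack
           leaf_samples[leaf] += int(count)

   # Functions that appear most often at the top of a sampled stack.
   for func, samples in leaf_samples.most_common(10):
       print(f"{samples:8d}  {func}")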
+**Advanced usage:**
 
-Profile with custom interval and duration, save to file:
+Profile all threads with real-time sampling statistics:
 
 .. code-block:: shell
 
-   python -m profiling.sampling -i 50 -d 30 -o profile.stats 1234
+   python -m profiling.sampling -a --realtime-stats -p 1234
 
-Generate collapsed stacks for flamegraph:
+High-frequency sampling (1ms intervals) for 60 seconds:
 
 .. code-block:: shell
 
-   python -m profiling.sampling --collapsed 1234
+   python -m profiling.sampling -i 1000 -d 60 -p 1234
 
-Profile all threads and sort by total time:
+Show only the top 20 CPU-consuming functions:
 
 .. code-block:: shell
 
-   python -m profiling.sampling -a --sort-tottime 1234
+   python -m profiling.sampling --sort-tottime -l 20 -p 1234
 
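Since everything here is a command-line invocation, profiling runs are easy to script, for example from a monitoring job. A minimal sketch using only the flags shown above (the PID and output path are placeholders):

.. code-block:: python

   # Sketch only: drives the profiler CLI with flags shown in this section;
   # the PID and output file name are placeholders.
   import subprocess
   import sys

   pid = "1234"             # process to profile (placeholder)
   output = "profile.json"  # where to write the Gecko-format profile

   subprocess.run(
       [sys.executable, "-m", "profiling.sampling",
        "--gecko", "-d", "30", "-o", output, "-p", pid],
       check=True,
   )
   print(f"Wrote {output}; upload it at https://profiler.firefox.com")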
 The profiler generates statistical estimates of where time is spent:
 