diff --git a/Doc/library/exceptions.rst b/Doc/library/exceptions.rst
Lines changed: 6 additions & 0 deletions
diff --git a/Doc/whatsnew/3.15.rst b/Doc/whatsnew/3.15.rst
Lines changed: 76 additions & 95 deletions
diff --git a/Include/cpython/pyerrors.h b/Include/cpython/pyerrors.h
Lines changed: 1 addition & 0 deletions
diff --git a/Include/cpython/pystate.h b/Include/cpython/pystate.h
Lines changed: 9 additions & 0 deletions
diff --git a/Include/internal/pycore_debug_offsets.h b/Include/internal/pycore_debug_offsets.h
Lines changed: 4 additions & 0 deletions
diff --git a/Include/internal/pycore_global_objects_fini_generated.h b/Include/internal/pycore_global_objects_fini_generated.h
Lines changed: 2 additions & 0 deletions
diff --git a/Include/internal/pycore_global_strings.h b/Include/internal/pycore_global_strings.h
Lines changed: 2 additions & 0 deletions
diff --git a/Include/internal/pycore_runtime_init_generated.h b/Include/internal/pycore_runtime_init_generated.h
Lines changed: 2 additions & 0 deletions
diff --git a/Include/internal/pycore_tstate.h b/Include/internal/pycore_tstate.h
Lines changed: 5 additions & 0 deletions
diff --git a/Include/internal/pycore_unicodeobject_generated.h b/Include/internal/pycore_unicodeobject_generated.h
Lines changed: 8 additions & 0 deletions
@@ -978,6 +978,12 @@ their subgroups based on the types of the contained exceptions.
    raises a :exc:`TypeError` if any contained exception is not an
    :exc:`Exception` subclass.
 
+   .. impl-detail::
+
+      The ``excs`` parameter may be any sequence, but lists and tuples are
+      specifically processed more efficiently here. For optimal performance,
+      pass a tuple as ``excs``.
+
    .. attribute:: message
 
        The ``msg`` argument to the constructor. This is a read-only attribute.
 
@@ -68,7 +68,7 @@ Summary -- Release highlights
 * :pep:`810`: :ref:`Explicit lazy imports for faster startup times
   <whatsnew315-pep810>`
 * :pep:`799`: :ref:`A dedicated profiling package for organizing Python
-  profiling tools <whatsnew315-sampling-profiler>`
+  profiling tools <whatsnew315-profiling-package>`
 * :pep:`686`: :ref:`Python now uses UTF-8 as the default encoding
   <whatsnew315-utf8-default>`
 * :pep:`782`: :ref:`A new PyBytesWriter C API to create a Python bytes object
@@ -170,14 +170,32 @@ imports cannot be lazy either (``lazy from __future__ import ...`` raises
 .. seealso:: :pep:`810` for the full specification and rationale.
 
 (Contributed by Pablo Galindo Salgado and Dino Viehland in :gh:`142349`.)
+.. _whatsnew315-profiling-package:
+
+:pep:`799`: A dedicated profiling package
+-----------------------------------------
+
+A new :mod:`!profiling` module has been added to organize Python's built-in
+profiling tools under a single, coherent namespace. This module contains:
+
+* :mod:`!profiling.tracing`: deterministic function-call tracing (relocated from
+  :mod:`cProfile`).
+* :mod:`!profiling.sampling`: a new statistical sampling profiler (named Tachyon).
+
+The :mod:`cProfile` module remains as an alias for backwards compatibility.
+The :mod:`profile` module is deprecated and will be removed in Python 3.17.
+
+.. seealso:: :pep:`799` for further details.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`138122`.)
 
 
 .. _whatsnew315-sampling-profiler:
 
-:pep:`799`: High frequency statistical sampling profiler
---------------------------------------------------------
+Tachyon: High frequency statistical sampling profiler
+-----------------------------------------------------
 
-A new statistical sampling profiler has been added to the new :mod:`!profiling` module as
+A new statistical sampling profiler (Tachyon) has been added as
 :mod:`!profiling.sampling`. This profiler enables low-overhead performance analysis of
 running Python processes without requiring code modification or process restart.
 
@@ -186,101 +204,64 @@ every function call, the sampling profiler periodically captures stack traces fr
 running processes.  This approach provides virtually zero overhead while achieving
 sampling rates of **up to 1,000,000 Hz**, making it the fastest sampling profiler
 available for Python (at the time of its contribution) and ideal for debugging
-performance issues in production environments.
+performance issues in production environments. This capability is particularly
+valuable for debugging performance issues in production systems where traditional
+profiling approaches would be too intrusive.
 
 Key features include:
 
 * **Zero-overhead profiling**: Attach to any running Python process without
-  affecting its performance
-* **No code modification required**: Profile existing applications without restart
-* **Real-time statistics**: Monitor sampling quality during data collection
-* **Multiple output formats**: Generate both detailed statistics and flamegraph data
-* **Thread-aware profiling**: Option to profile all threads or just the main thread
-
-Profile process 1234 for 10 seconds with default settings:
-
-.. code-block:: shell
-
-  python -m profiling.sampling 1234
-
-Profile with custom interval and duration, save to file:
-
-.. code-block:: shell
-
-  python -m profiling.sampling -i 50 -d 30 -o profile.stats 1234
-
-Generate collapsed stacks for flamegraph:
-
-.. code-block:: shell
-
-  python -m profiling.sampling --collapsed 1234
-
-Profile all threads and sort by total time:
-
-.. code-block:: shell
-
-  python -m profiling.sampling -a --sort-tottime 1234
-
-The profiler generates statistical estimates of where time is spent:
-
-.. code-block:: text
-
-  Real-time sampling stats: Mean: 100261.5Hz (9.97µs) Min: 86333.4Hz (11.58µs) Max: 118807.2Hz (8.42µs) Samples: 400001
-  Captured 498841 samples in 5.00 seconds
-  Sample rate: 99768.04 samples/sec
-  Error rate: 0.72%
-  Profile Stats:
-        nsamples   sample%   tottime (s)    cumul%   cumtime (s)  filename:lineno(function)
-        43/418858       0.0         0.000      87.9         4.189  case.py:667(TestCase.run)
-      3293/418812       0.7         0.033      87.9         4.188  case.py:613(TestCase._callTestMethod)
-    158562/158562      33.3         1.586      33.3         1.586  test_compile.py:725(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-    129553/129553      27.2         1.296      27.2         1.296  ast.py:46(parse)
-        0/128129       0.0         0.000      26.9         1.281  test_ast.py:884(AST_Tests.test_ast_recursion_limit.<locals>.check_limit)
-          7/67446       0.0         0.000      14.2         0.674  test_compile.py:729(TestSpecifics.test_compiler_recursion_limit)
-          6/60380       0.0         0.000      12.7         0.604  test_ast.py:888(AST_Tests.test_ast_recursion_limit)
-          3/50020       0.0         0.000      10.5         0.500  test_compile.py:727(TestSpecifics.test_compiler_recursion_limit)
-          1/38011       0.0         0.000       8.0         0.380  test_ast.py:886(AST_Tests.test_ast_recursion_limit)
-          1/25076       0.0         0.000       5.3         0.251  test_compile.py:728(TestSpecifics.test_compiler_recursion_limit)
-      22361/22362       4.7         0.224       4.7         0.224  test_compile.py:1368(TestSpecifics.test_big_dict_literal)
-          4/18008       0.0         0.000       3.8         0.180  test_ast.py:889(AST_Tests.test_ast_recursion_limit)
-        11/17696       0.0         0.000       3.7         0.177  subprocess.py:1038(Popen.__init__)
-      16968/16968       3.6         0.170       3.6         0.170  subprocess.py:1900(Popen._execute_child)
-          2/16941       0.0         0.000       3.6         0.169  test_compile.py:730(TestSpecifics.test_compiler_recursion_limit)
-
-  Legend:
-    nsamples: Direct/Cumulative samples (direct executing / on call stack)
-    sample%: Percentage of total samples this function was directly executing
-    tottime: Estimated total time spent directly in this function
-    cumul%: Percentage of total samples when this function was on the call stack
-    cumtime: Estimated cumulative time (including time in called functions)
-    filename:lineno(function): Function location and name
-
-  Summary of Interesting Functions:
-
-  Functions with Highest Direct/Cumulative Ratio (Hot Spots):
-    1.000 direct/cumulative ratio, 33.3% direct samples: test_compile.py:(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-    1.000 direct/cumulative ratio, 27.2% direct samples: ast.py:(parse)
-    1.000 direct/cumulative ratio, 3.6% direct samples: subprocess.py:(Popen._execute_child)
-
-  Functions with Highest Call Frequency (Indirect Calls):
-    418815 indirect calls, 87.9% total stack presence: case.py:(TestCase.run)
-    415519 indirect calls, 87.9% total stack presence: case.py:(TestCase._callTestMethod)
-    159470 indirect calls, 33.5% total stack presence: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-
-  Functions with Highest Call Magnification (Cumulative/Direct):
-    12267.9x call magnification, 159470 indirect calls from 13 direct: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-    10581.7x call magnification, 116388 indirect calls from 11 direct: test_ast.py:(AST_Tests.test_ast_recursion_limit)
-    9740.9x call magnification, 418815 indirect calls from 43 direct: case.py:(TestCase.run)
-
-The profiler automatically identifies performance bottlenecks through statistical
-analysis, highlighting functions with high CPU usage and call frequency patterns.
-
-This capability is particularly valuable for debugging performance issues in
-production systems where traditional profiling approaches would be too intrusive.
-
-  .. seealso:: :pep:`799` for further details.
-
-(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953`.)
+  affecting its performance. Ideal for production debugging where you can't afford
+  to restart or slow down your application.
+
+* **No code modification required**: Profile existing applications without restart.
+  Simply point the profiler at a running process by PID and start collecting data.
+
+* **Flexible target modes**:
+
+  * Profile running processes by PID (``attach``) - attach to already-running applications
+  * Run and profile scripts directly (``run``) - profile from the very start of execution
+  * Execute and profile modules (``run -m``) - profile packages run as ``python -m module``
+
+* **Multiple profiling modes**: Choose what to measure based on your performance investigation:
+
+  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
+    network waits, and blocking operations. Use this to understand where your program spends
+    calendar time, including when waiting for external resources.
+  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O waits
+    and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
+  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global Interpreter
+    Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.
+
+* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
+  essential for understanding multi-threaded application behavior.
+
+* **Multiple output formats**: Choose the visualization that best fits your workflow:
+
+  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows function-level
+    timing with direct and cumulative samples. Best for detailed analysis and integration with
+    existing Python profiling tools.
+  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
+    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
+    FlameGraph scripts or speedscope.
+  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
+    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
+    hierarchy where width represents time spent, making it easy to spot bottlenecks at a glance.
+  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
+    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
+    timeline-based analysis with features like stack charts, markers, and network activity.
+  * ``--heatmap``: Generates an interactive HTML heatmap visualization with line-level sample
+    counts. Creates a directory with per-file heatmaps showing exactly where time is spent
+    at the source code level.
+
+* **Live interactive mode**: Real-time TUI profiler with a top-like interface (``--live``).
+  Monitor performance as your application runs with interactive sorting and filtering.
+
+* **Async-aware profiling**: Profile async/await code with task-based stack reconstruction
+  (``--async-aware``). See which coroutines are consuming time, with options to show only
+  running tasks or all tasks including those waiting.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953` and :gh:`138122`.)
 
 
 .. _whatsnew315-improved-error-messages:
 
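Editor's illustration, not part of the patch: the ``--collapsed`` entry above describes "one line per stack". The sketch below shows the general idea of folding stack samples into that shape, assuming the convention used by FlameGraph-style tools (frames joined by semicolons, followed by a sample count). The function name ``collapse_stacks`` and the frame names are hypothetical, and the profiler's real output may differ in detail.

```python
from collections import Counter


def collapse_stacks(samples):
    """Fold stack samples into 'root;...;leaf count' lines.

    Each sample is a list of frame names ordered from the outermost
    caller to the innermost callee (an assumption of this sketch).
    """
    counts = Counter(";".join(stack) for stack in samples)
    # Sort for deterministic output; one line per unique stack.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples: the same stack observed twice is counted twice.
samples = [
    ["main", "handle_request", "parse"],
    ["main", "handle_request", "parse"],
    ["main", "handle_request", "render"],
    ["main", "idle"],
]
lines = collapse_stacks(samples)
for line in lines:
    print(line)
```

Identical stacks collapse into a single line whose count grows with each sample, which is what lets a flamegraph tool turn counts into widths.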
@@ -18,6 +18,7 @@ typedef struct {
     PyException_HEAD
     PyObject *msg;
     PyObject *excs;
+    PyObject *excs_str;
 } PyBaseExceptionGroupObject;
 
 typedef struct {
 
@@ -135,6 +135,15 @@ struct _ts {
     /* Pointer to currently executing frame. */
     struct _PyInterpreterFrame *current_frame;
 
+    /* Pointer to the base frame (bottommost sentinel frame).
+       Used by profilers to validate complete stack unwinding.
+       Points to the embedded base_frame in _PyThreadStateImpl.
+       The frame is embedded there rather than here because _PyInterpreterFrame
+       is defined in internal headers that cannot be exposed in the public API. */
+    struct _PyInterpreterFrame *base_frame;
+
+    struct _PyInterpreterFrame *last_profiled_frame;
+
     Py_tracefunc c_profilefunc;
     Py_tracefunc c_tracefunc;
     PyObject *c_profileobj;
 
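Editor's illustration, not part of the patch: the comment on ``base_frame`` says profilers use the sentinel to validate complete stack unwinding. The toy model below sketches that idea in Python under stated assumptions: ``ToyFrame``, ``unwind`` and ``max_depth`` are hypothetical names, and the real profiler reads frames from another process's memory rather than walking Python objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyFrame:
    """Stand-in for an interpreter frame; ``previous`` links to the caller."""
    name: str
    previous: Optional["ToyFrame"] = None


def unwind(current, base, max_depth=1000):
    """Walk caller links from ``current``; report whether ``base`` was reached.

    Returns (frame_names, complete). ``complete`` is False when the chain
    ends, or exceeds ``max_depth``, before the sentinel frame is seen.
    """
    names = []
    frame = current
    for _ in range(max_depth):
        if frame is None:
            return names, False  # chain broke before the sentinel
        if frame is base:
            return names, True   # reached the bottom of the stack
        names.append(frame.name)
        frame = frame.previous
    return names, False


base = ToyFrame("<base>")
main = ToyFrame("main", base)
leaf = ToyFrame("f", main)
print(unwind(leaf, base))                # complete walk
print(unwind(ToyFrame("g", None), base))  # truncated walk
```

A sampler built this way can discard or flag samples whose walk never reaches the sentinel instead of reporting a partial stack as if it were whole.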
@@ -102,6 +102,8 @@ typedef struct _Py_DebugOffsets {
         uint64_t next;
         uint64_t interp;
         uint64_t current_frame;
+        uint64_t base_frame;
+        uint64_t last_profiled_frame;
         uint64_t thread_id;
         uint64_t native_thread_id;
         uint64_t datastack_chunk;
@@ -272,6 +274,8 @@ typedef struct _Py_DebugOffsets {
         .next = offsetof(PyThreadState, next), \
         .interp = offsetof(PyThreadState, interp), \
         .current_frame = offsetof(PyThreadState, current_frame), \
+        .base_frame = offsetof(PyThreadState, base_frame), \
+        .last_profiled_frame = offsetof(PyThreadState, last_profiled_frame), \
         .thread_id = offsetof(PyThreadState, thread_id), \
         .native_thread_id = offsetof(PyThreadState, native_thread_id), \
         .datastack_chunk = offsetof(PyThreadState, datastack_chunk), \
 
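Editor's illustration, not part of the patch: the hunks above add ``base_frame`` and ``last_profiled_frame`` to a table of ``offsetof`` values. The sketch below uses :mod:`ctypes` to show, on a toy structure, how such an offset table lets a reader pull a pointer field out of raw bytes. ``ToyThreadState``, ``OFFSETS`` and ``read_pointer`` are hypothetical, and the layout is not the real ``PyThreadState`` layout.

```python
import ctypes
import sys


class ToyThreadState(ctypes.Structure):
    """Toy layout with pointer-sized fields; NOT the real PyThreadState."""
    _fields_ = [
        ("next", ctypes.c_void_p),
        ("interp", ctypes.c_void_p),
        ("current_frame", ctypes.c_void_p),
        ("base_frame", ctypes.c_void_p),
        ("last_profiled_frame", ctypes.c_void_p),
    ]


# Analogue of the offsetof() table: field name -> byte offset.
OFFSETS = {
    name: getattr(ToyThreadState, name).offset
    for name, _ in ToyThreadState._fields_
}


def read_pointer(raw, offset):
    """Decode a pointer-sized integer from ``raw`` at ``offset``."""
    size = ctypes.sizeof(ctypes.c_void_p)
    return int.from_bytes(raw[offset:offset + size], sys.byteorder)


# Pretend these bytes were copied out of another process's memory.
state = ToyThreadState(base_frame=0x1000, last_profiled_frame=0x2000)
raw = bytes(state)
print(hex(read_pointer(raw, OFFSETS["base_frame"])))
```

Publishing offsets rather than hard-coding them is what allows an external tool to keep working when fields are added to the structure, as this patch does.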
@@ -334,6 +334,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(c_parameter_type)
         STRUCT_FOR_ID(c_return)
         STRUCT_FOR_ID(c_stack)
+        STRUCT_FOR_ID(cache_frames)
         STRUCT_FOR_ID(cached_datetime_module)
         STRUCT_FOR_ID(cached_statements)
         STRUCT_FOR_ID(cadata)
@@ -778,6 +779,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(stacklevel)
         STRUCT_FOR_ID(start)
         STRUCT_FOR_ID(statement)
+        STRUCT_FOR_ID(stats)
         STRUCT_FOR_ID(status)
         STRUCT_FOR_ID(stderr)
         STRUCT_FOR_ID(stdin)
 
@@ -10,6 +10,7 @@ extern "C" {
 
 #include "pycore_brc.h"             // struct _brc_thread_state
 #include "pycore_freelist_state.h"  // struct _Py_freelists
+#include "pycore_interpframe_structs.h"  // _PyInterpreterFrame
 #include "pycore_mimalloc.h"        // struct _mimalloc_thread_state
 #include "pycore_qsbr.h"            // struct qsbr
 #include "pycore_uop.h"             // struct _PyUOpInstruction
@@ -61,6 +62,10 @@ typedef struct _PyThreadStateImpl {
     // semi-public fields are in PyThreadState.
     PyThreadState base;
 
+    // Embedded base frame - sentinel at the bottom of the frame stack.
+    // Used by profiling/sampling to detect incomplete stack traces.
+    _PyInterpreterFrame base_frame;
+
     // The reference count field is used to synchronize deallocation of the
     // thread state during runtime finalization.
     Py_ssize_t refcount;