Merged
6 changes: 5 additions & 1 deletion Doc/c-api/memory.rst
Original file line number Diff line number Diff line change
@@ -677,7 +677,11 @@ The pymalloc allocator
Python has a *pymalloc* allocator optimized for small objects (smaller than or equal
to 512 bytes) with a short lifetime. It uses memory mappings called "arenas"
with a fixed size of either 256 KiB on 32-bit platforms or 1 MiB on 64-bit
platforms. It falls back to :c:func:`PyMem_RawMalloc` and
platforms. When Python is configured with :option:`--with-pymalloc-hugepages`,
the arena size on 64-bit platforms is increased to 2 MiB to match the huge page
size, and arena allocation will attempt to use huge pages (``MAP_HUGETLB`` on
Linux, ``MEM_LARGE_PAGES`` on Windows) with automatic fallback to regular pages.
It falls back to :c:func:`PyMem_RawMalloc` and
:c:func:`PyMem_RawRealloc` for allocations larger than 512 bytes.

*pymalloc* is the :ref:`default allocator <default-memory-allocators>` of the
15 changes: 15 additions & 0 deletions Doc/using/configure.rst
@@ -783,6 +783,21 @@ also be used to improve performance.

See also :envvar:`PYTHONMALLOC` environment variable.

.. option:: --with-pymalloc-hugepages

Enable huge page support for :ref:`pymalloc <pymalloc>` arenas (disabled by
default). When enabled, the arena size on 64-bit platforms is increased to
2 MiB and arena allocation uses ``MAP_HUGETLB`` (Linux) or
``MEM_LARGE_PAGES`` (Windows) with automatic fallback to regular pages.

The configure script checks that the platform supports ``MAP_HUGETLB``
and emits a warning if it is not available.

On Windows, use the ``--pymalloc-hugepages`` flag with ``build.bat`` or
set the ``UsePymallocHugepages`` MSBuild property.

.. versionadded:: 3.15
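
One way to check from Python whether an interpreter was configured with this option is to inspect the recorded configure arguments. This is a minimal sketch, assuming a POSIX build where `sysconfig.get_config_var("CONFIG_ARGS")` is populated (on Windows builds it is typically `None`). The helper name `built_with_pymalloc_hugepages` is hypothetical and not part of this change.

```python
import sysconfig


def built_with_pymalloc_hugepages() -> bool:
    """Return True if the recorded configure arguments include the option."""
    # CONFIG_ARGS may be missing (for example on Windows), so default to "".
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return "--with-pymalloc-hugepages" in config_args


print(built_with_pymalloc_hugepages())
```

A `True` result only says the option was passed to configure; whether arenas actually land on huge pages still depends on the runtime fallback described above.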

.. option:: --without-doc-strings

Disable static documentation strings to reduce the memory footprint (enabled
6 changes: 6 additions & 0 deletions Doc/whatsnew/3.15.rst
@@ -1477,6 +1477,12 @@ Build changes
modules that are missing or packaged separately.
(Contributed by Stan Ulbrych and Petr Viktorin in :gh:`139707`.)

* The new configure option :option:`--with-pymalloc-hugepages` enables huge
page support for :ref:`pymalloc <pymalloc>` arenas. When enabled, arena size
increases to 2 MiB and allocation uses ``MAP_HUGETLB`` (Linux) or
``MEM_LARGE_PAGES`` (Windows) with automatic fallback to regular pages.
On Windows, use ``build.bat --pymalloc-hugepages``.

* Annotating anonymous mmap usage is now supported if the Linux kernel supports
:manpage:`PR_SET_VMA_ANON_NAME <PR_SET_VMA(2const)>` (Linux 5.17 or newer).
Annotations are visible in ``/proc/<pid>/maps`` if the kernel supports the feature
22 changes: 19 additions & 3 deletions Include/internal/pycore_obmalloc.h
@@ -208,7 +208,11 @@ typedef unsigned int pymem_uint; /* assuming >= 16 bits */
* mappings to reduce heap fragmentation.
*/
#ifdef USE_LARGE_ARENAS
#define ARENA_BITS 20 /* 1 MiB */
# ifdef PYMALLOC_USE_HUGEPAGES
# define ARENA_BITS 21 /* 2 MiB */
# else
# define ARENA_BITS 20 /* 1 MiB */
# endif
#else
#define ARENA_BITS 18 /* 256 KiB */
#endif
@@ -469,7 +473,7 @@ nfp free pools in usable_arenas.
*/

/* How many arena_objects do we initially allocate?
* 16 = can allocate 16 arenas = 16 * ARENA_SIZE = 4MB before growing the
* 16 = can allocate 16 arenas = 16 * ARENA_SIZE before growing the
* `arenas` vector.
*/
#define INITIAL_ARENA_OBJECTS 16
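
The removed "= 4MB" figure only held for 256 KiB arenas, which is presumably why the comment now leaves the total symbolic. The arithmetic can be sketched as follows, using the `ARENA_BITS` values and `INITIAL_ARENA_OBJECTS` from the header; the helper `arena_size` and the labels are illustrative names only.

```python
KIB = 1024
MIB = 1024 * KIB
INITIAL_ARENA_OBJECTS = 16  # value taken from the header


def arena_size(arena_bits: int) -> int:
    """ARENA_SIZE is 1 << ARENA_BITS."""
    return 1 << arena_bits


configs = [("32-bit", 18), ("64-bit default", 20), ("64-bit huge pages", 21)]
for label, bits in configs:
    size = arena_size(bits)
    initial = INITIAL_ARENA_OBJECTS * size
    print(f"{label}: arena = {size // KIB} KiB, "
          f"initial capacity = {initial // MIB} MiB")
```

This gives 4 MiB, 16 MiB and 32 MiB of arena space before the `arenas` vector has to grow.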
@@ -512,14 +516,26 @@ struct _obmalloc_mgmt {

memory address bit allocation for keys

64-bit pointers, IGNORE_BITS=0 and 2^20 arena size:
ARENA_BITS is configurable: 20 (1 MiB) by default on 64-bit, or
21 (2 MiB) when PYMALLOC_USE_HUGEPAGES is enabled. All bit widths
below are derived from ARENA_BITS automatically.

64-bit pointers, IGNORE_BITS=0 and 2^20 arena size (default):
15 -> MAP_TOP_BITS
15 -> MAP_MID_BITS
14 -> MAP_BOT_BITS
20 -> ideal aligned arena
----
64

64-bit pointers, IGNORE_BITS=0 and 2^21 arena size (hugepages):
15 -> MAP_TOP_BITS
15 -> MAP_MID_BITS
13 -> MAP_BOT_BITS
21 -> ideal aligned arena
----
64
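
The two tables above can be reproduced with a short calculation. This is a sketch under an assumption: the formula below (interior levels get a third of the non-arena address bits, rounded up, and the bottom level takes the remainder) is chosen because it matches the numbers in this comment; it is not quoted from the part of the header shown in this diff. The function name `radix_bits` is hypothetical.

```python
def radix_bits(pointer_bits: int, ignore_bits: int, arena_bits: int) -> dict:
    """Split an address into top/mid/bot radix-tree keys plus arena offset."""
    address_bits = pointer_bits - ignore_bits
    # Assumed derivation: round the per-level share up for top and mid.
    interior_bits = (address_bits - arena_bits + 2) // 3
    bot_bits = address_bits - arena_bits - 2 * interior_bits
    return {
        "MAP_TOP_BITS": interior_bits,
        "MAP_MID_BITS": interior_bits,
        "MAP_BOT_BITS": bot_bits,
        "ARENA_BITS": arena_bits,
    }


for arena_bits in (20, 21):
    widths = radix_bits(64, 0, arena_bits)
    print(arena_bits, widths, "total:", sum(widths.values()))
```

Moving from 20 to 21 arena bits takes the extra bit from `MAP_BOT_BITS` (14 becomes 13), so the total stays at 64.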

64-bit pointers, IGNORE_BITS=16, and 2^20 arena size:
16 -> IGNORE_BITS
10 -> MAP_TOP_BITS
11 changes: 11 additions & 0 deletions Lib/test/test_binascii.py
@@ -202,6 +202,17 @@ def assertNonBase64Data(data, expected, ignorechars):
assertNonBase64Data(b'a\nb==', b'i', ignorechars=bytearray(b'\n'))
assertNonBase64Data(b'a\nb==', b'i', ignorechars=memoryview(b'\n'))

# Same cell in the cache: '\r' >> 3 == '\n' >> 3.
data = self.type2test(b'\r\n')
with self.assertRaises(binascii.Error):
binascii.a2b_base64(data, ignorechars=b'\r')
self.assertEqual(binascii.a2b_base64(data, ignorechars=b'\r\n'), b'')
# Same bit mask in the cache: '*' & 7 == '\n' & 7.
data = self.type2test(b'*\n')
with self.assertRaises(binascii.Error):
binascii.a2b_base64(data, ignorechars=b'*')
self.assertEqual(binascii.a2b_base64(data, ignorechars=b'*\n'), b'')

data = self.type2test(b'a\nb==')
with self.assertRaises(TypeError):
binascii.a2b_base64(data, ignorechars='')
1 change: 1 addition & 0 deletions Lib/test/test_capi/test_opt.py
@@ -3750,6 +3750,7 @@ def test_is_none(n):
res, ex = self._run_with_optimizer(test_is_none, TIER2_THRESHOLD)
self.assertEqual(res, True)
self.assertIsNotNone(ex)
uops = get_opnames(ex)

self.assertIn("_IS_OP", uops)
self.assertIn("_POP_TOP_NOP", uops)
@@ -0,0 +1 @@
Build Python with POSIX 2024, instead of POSIX 2008. Patch by Victor Stinner.
@@ -0,0 +1 @@
Add huge pages support for the pymalloc allocator. Patch by Pablo Galindo.
@@ -0,0 +1,3 @@
Speed up Base64 decoding of data containing ignored characters (both in
non-strict mode and with an explicit *ignorechars* argument).
It is now up to 2 times faster for multiline Base64 data.
@@ -0,0 +1,2 @@
Improve error messages for buffer overflow in :func:`fcntl.fcntl` and
:func:`fcntl.ioctl`.
42 changes: 31 additions & 11 deletions Modules/binascii.c
@@ -469,12 +469,23 @@ binascii_b2a_uu_impl(PyObject *module, Py_buffer *data, int backtick)
return PyBytesWriter_FinishWithPointer(writer, ascii_data);
}

typedef unsigned char ignorecache_t[32];

static int
ignorechar(unsigned char c, Py_buffer *ignorechars)
ignorechar(unsigned char c, const Py_buffer *ignorechars,
ignorecache_t ignorecache)
{
return (ignorechars->buf != NULL &&
memchr(ignorechars->buf, c, ignorechars->len));
if (ignorechars == NULL) {
return 0;
}
if (ignorecache[c >> 3] & (1 << (c & 7))) {
return 1;
}
if (memchr(ignorechars->buf, c, ignorechars->len)) {
ignorecache[c >> 3] |= 1 << (c & 7);
return 1;
}
return 0;
}
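
The new `ignorechar()` keeps a 256-bit bitmap (32 bytes, one bit per byte value) so that the linear `memchr()` scan runs at most once per ignored character. A pure-Python model of that idea is sketched below; `make_ignore_checker` is a hypothetical name, and `c in ignorechars` stands in for `memchr()`.

```python
def make_ignore_checker(ignorechars: bytes):
    """Return (is_ignored, cache) modelling the bitmap cache in ignorechar()."""
    cache = bytearray(32)  # 32 cells * 8 bits = one bit per byte value

    def is_ignored(c: int) -> bool:
        cell, mask = c >> 3, 1 << (c & 7)
        if cache[cell] & mask:       # cache hit: no scan needed
            return True
        if c in ignorechars:         # linear scan, standing in for memchr()
            cache[cell] |= mask      # remember the positive answer
            return True
        return False                 # negative answers are not cached

    return is_ignored, cache


is_ignored, cache = make_ignore_checker(b"\r")
print(is_ignored(ord("\r")), is_ignored(ord("\n")), cache[1])
```

`'\r'` (13) and `'\n'` (10) share cell 1 but use different bits, while `'*'` (42) and `'\n'` share a bit position in different cells; these are the two collisions the new tests in `test_binascii.py` probe.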

/*[clinic input]
@@ -508,6 +519,13 @@ binascii_a2b_base64_impl(PyObject *module, Py_buffer *data, int strict_mode,
if (strict_mode == -1) {
strict_mode = (ignorechars->buf != NULL);
}
if (!strict_mode || ignorechars->buf == NULL || ignorechars->len == 0) {
ignorechars = NULL;
}
ignorecache_t ignorecache;
if (ignorechars != NULL) {
memset(ignorecache, 0, sizeof(ignorecache));
}

/* Allocate the buffer */
Py_ssize_t bin_len = ((ascii_len+3)/4)*3; /* Upper bound, corrected later */
@@ -517,8 +535,7 @@ binascii_a2b_base64_impl(PyObject *module, Py_buffer *data, int strict_mode,
}
unsigned char *bin_data = PyBytesWriter_GetData(writer);

size_t i = 0; /* Current position in input */

fastpath:
/* Fast path: use optimized decoder for complete quads.
* This works for both strict and non-strict mode for valid input.
* The fast path stops at padding, invalid chars, or incomplete groups.
@@ -527,7 +544,8 @@ binascii_a2b_base64_impl(PyObject *module, Py_buffer *data, int strict_mode,
Py_ssize_t fast_chars = base64_decode_fast(ascii_data, (Py_ssize_t)ascii_len,
bin_data, table_a2b_base64);
if (fast_chars > 0) {
i = (size_t)fast_chars;
ascii_data += fast_chars;
ascii_len -= fast_chars;
bin_data += (fast_chars / 4) * 3;
}
}
@@ -536,8 +554,8 @@ binascii_a2b_base64_impl(PyObject *module, Py_buffer *data, int strict_mode,
int quad_pos = 0;
unsigned char leftchar = 0;
int pads = 0;
for (; i < ascii_len; i++) {
unsigned char this_ch = ascii_data[i];
for (; ascii_len; ascii_data++, ascii_len--) {
unsigned char this_ch = *ascii_data;

/* Check for pad sequences and ignore
** the invalid ones.
@@ -549,7 +567,7 @@ binascii_a2b_base64_impl(PyObject *module, Py_buffer *data, int strict_mode,
if (quad_pos == 0) {
state = get_binascii_state(module);
if (state) {
PyErr_SetString(state->Error, (i == 0)
PyErr_SetString(state->Error, (ascii_data == data->buf)
? "Leading padding not allowed"
: "Excess padding not allowed");
}
@@ -580,7 +598,7 @@ binascii_a2b_base64_impl(PyObject *module, Py_buffer *data, int strict_mode,

unsigned char v = table_a2b_base64[this_ch];
if (v >= 64) {
if (strict_mode && !ignorechar(this_ch, ignorechars)) {
if (strict_mode && !ignorechar(this_ch, ignorechars, ignorecache)) {
state = get_binascii_state(module);
if (state) {
PyErr_SetString(state->Error, "Only base64 data is allowed");
@@ -621,7 +639,9 @@ binascii_a2b_base64_impl(PyObject *module, Py_buffer *data, int strict_mode,
quad_pos = 0;
*bin_data++ = (leftchar << 6) | (v);
leftchar = 0;
break;
ascii_data++;
ascii_len--;
goto fastpath;
}
}
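
The `goto fastpath` change means the decoder no longer stays on the byte-by-byte path after the first ignored character: once the slow path completes a quad, control returns to the fast path for the next clean run. A simplified pure-Python model of that control flow is sketched below. It is not a port of `binascii.c`: it covers only a lenient (non-strict) mode, reports no errors, and all names are hypothetical.

```python
import base64
import string

ALPHABET = (string.ascii_uppercase + string.ascii_lowercase
            + string.digits + "+/").encode("ascii")
TABLE = {byte: value for value, byte in enumerate(ALPHABET)}
PAD = ord("=")


def decode_fast(data: bytes, pos: int, out: bytearray) -> int:
    """Decode whole quads from pos; stop at the first quad that is not clean."""
    while pos + 4 <= len(data):
        try:
            a, b, c, d = (TABLE[byte] for byte in data[pos:pos + 4])
        except KeyError:
            break
        out.extend(((a << 18) | (b << 12) | (c << 6) | d).to_bytes(3, "big"))
        pos += 4
    return pos


def decode_lenient(data: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(data):
        pos = decode_fast(data, pos, out)        # fast path
        acc, count = 0, 0
        while pos < len(data) and count < 4:     # slow path: one quad
            byte = data[pos]
            pos += 1
            if byte == PAD:                      # padding ends the data
                if count == 2:
                    out.append(acc >> 4)
                elif count == 3:
                    out.extend((acc >> 2).to_bytes(2, "big"))
                return bytes(out)
            value = TABLE.get(byte)
            if value is None:                    # ignored character
                continue
            acc = (acc << 6) | value
            count += 1
        if count == 4:
            out.extend(acc.to_bytes(3, "big"))   # then back to the fast path
    return bytes(out)


print(decode_lenient(base64.encodebytes(b"hello world")))  # b'hello world'
```

For multiline input such as the output of `base64.encodebytes()`, each newline costs one slow-path quad, after which the rest of the line is decoded by the fast path again.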

31 changes: 26 additions & 5 deletions Modules/fcntlmodule.c
@@ -111,7 +111,11 @@ fcntl_fcntl_impl(PyObject *module, int fd, int code, PyObject *arg)
return !async_err ? PyErr_SetFromErrno(PyExc_OSError) : NULL;
}
if (memcmp(buf + len, guard, GUARDSZ) != 0) {
PyErr_SetString(PyExc_SystemError, "buffer overflow");
PyErr_SetString(PyExc_SystemError,
"Possible stack corruption in fcntl() due to "
"buffer overflow. "
"Provide an argument of sufficient size as "
"determined by the operation.");
return NULL;
}
return PyBytes_FromStringAndSize(buf, len);
@@ -139,7 +143,11 @@ fcntl_fcntl_impl(PyObject *module, int fd, int code, PyObject *arg)
return NULL;
}
if (ptr[len] != '\0') {
PyErr_SetString(PyExc_SystemError, "buffer overflow");
PyErr_SetString(PyExc_SystemError,
"Memory corruption in fcntl() due to "
"buffer overflow. "
"Provide an argument of sufficient size as "
"determined by the operation.");
PyBytesWriter_Discard(writer);
return NULL;
}
@@ -264,7 +272,12 @@ fcntl_ioctl_impl(PyObject *module, int fd, unsigned long code, PyObject *arg,
}
PyBuffer_Release(&view);
if (ptr == buf && memcmp(buf + len, guard, GUARDSZ) != 0) {
PyErr_SetString(PyExc_SystemError, "buffer overflow");
PyErr_SetString(PyExc_SystemError,
"Possible stack corruption in ioctl() due to "
"buffer overflow. "
"Provide a writable buffer argument of "
"sufficient size as determined by "
"the operation.");
return NULL;
}
return PyLong_FromLong(ret);
@@ -293,7 +306,11 @@ fcntl_ioctl_impl(PyObject *module, int fd, unsigned long code, PyObject *arg,
return !async_err ? PyErr_SetFromErrno(PyExc_OSError) : NULL;
}
if (memcmp(buf + len, guard, GUARDSZ) != 0) {
PyErr_SetString(PyExc_SystemError, "buffer overflow");
PyErr_SetString(PyExc_SystemError,
"Possible stack corruption in ioctl() due to "
"buffer overflow. "
"Provide an argument of sufficient size as "
"determined by the operation.");
return NULL;
}
return PyBytes_FromStringAndSize(buf, len);
@@ -321,7 +338,11 @@ fcntl_ioctl_impl(PyObject *module, int fd, unsigned long code, PyObject *arg,
return NULL;
}
if (ptr[len] != '\0') {
PyErr_SetString(PyExc_SystemError, "buffer overflow");
PyErr_SetString(PyExc_SystemError,
"Memory corruption in ioctl() due to "
"buffer overflow. "
"Provide an argument of sufficient size as "
"determined by the operation.");
PyBytesWriter_Discard(writer);
return NULL;
}
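
The checks that raise these messages rely on guard bytes placed directly after the caller's data: if the operation wrote past the end of the argument, the guard no longer matches. A small Python model of that detection is sketched below; the guard value, the capacity, and the function names are all hypothetical, and the "operation" is an ordinary callable standing in for the system call.

```python
GUARD = b"\xde\xad\xbe\xef\xfe\xed\xfa\xce"  # hypothetical guard pattern


def call_with_guard(operation, arg: bytes, capacity: int = 1024) -> bytes:
    """Copy arg into a buffer followed by guard bytes, run operation, verify."""
    n = len(arg)
    if n > capacity:
        raise ValueError("argument too long")
    buf = bytearray(capacity + len(GUARD))
    buf[:n] = arg
    buf[n:n + len(GUARD)] = GUARD
    with memoryview(buf) as view:
        operation(view)                      # may modify the buffer in place
    if buf[n:n + len(GUARD)] != GUARD:       # like memcmp(buf + len, guard, ...)
        raise SystemError("buffer overflow: argument too small for operation")
    return bytes(buf[:n])


def well_behaved(view):
    view[:4] = b"\x01\x02\x03\x04"           # stays within the 8-byte argument


def overflowing(view):
    view[:12] = b"\xff" * 12                 # writes 4 bytes past the argument


print(call_with_guard(well_behaved, bytes(8)))
try:
    call_with_guard(overflowing, bytes(8))
except SystemError as exc:
    print("detected:", exc)
```

The check runs after the write has already happened, which is why the new messages speak of possible corruption and ask the caller to pass a sufficiently large argument.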
20 changes: 20 additions & 0 deletions Objects/obmalloc.c
@@ -496,10 +496,30 @@ void *
_PyMem_ArenaAlloc(void *Py_UNUSED(ctx), size_t size)
{
#ifdef MS_WINDOWS
# ifdef PYMALLOC_USE_HUGEPAGES
void *ptr = VirtualAlloc(NULL, size,
MEM_COMMIT | MEM_RESERVE | MEM_LARGE_PAGES,
PAGE_READWRITE);
if (ptr != NULL)
return ptr;
/* Fall back to regular pages */
# endif
return VirtualAlloc(NULL, size,
MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
#elif defined(ARENAS_USE_MMAP)
void *ptr;
# ifdef PYMALLOC_USE_HUGEPAGES
# ifdef MAP_HUGETLB
ptr = mmap(NULL, size, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0);
if (ptr != MAP_FAILED) {
assert(ptr != NULL);
(void)_PyAnnotateMemoryMap(ptr, size, "cpython:pymalloc:hugepage");
return ptr;
}
/* Fall back to regular pages */
# endif
# endif
ptr = mmap(NULL, size, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (ptr == MAP_FAILED)
Expand Down
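
The try-then-fall-back pattern in `_PyMem_ArenaAlloc` can be tried out from Python with the `mmap` module. This is a sketch under assumptions: the `MAP_HUGETLB` value `0x40000` is the one used on common Linux architectures such as x86-64 and is used here only if the `mmap` module does not expose the constant; `alloc_arena` is a hypothetical helper; on non-Linux platforms the sketch simply requests a regular anonymous mapping.

```python
import mmap
import sys

ARENA_SIZE = 2 * 1024 * 1024  # 2 MiB, matching ARENA_BITS == 21
# Assumption: fall back to the common Linux value if the constant is missing.
MAP_HUGETLB = getattr(mmap, "MAP_HUGETLB", 0x40000)


def alloc_arena(size: int = ARENA_SIZE):
    """Return (mapping, used_huge_pages), preferring huge pages on Linux."""
    if sys.platform.startswith("linux"):
        base_flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
        try:
            return mmap.mmap(-1, size, flags=base_flags | MAP_HUGETLB), True
        except (OSError, ValueError):
            pass  # no huge pages available: fall back to regular pages
        return mmap.mmap(-1, size, flags=base_flags), False
    return mmap.mmap(-1, size), False


arena, used_huge_pages = alloc_arena()
print("huge pages" if used_huge_pages else "regular pages", len(arena))
arena.close()
```

On a Linux system with no huge pages reserved, the first `mmap` call is expected to fail and the regular mapping is returned, mirroring the "automatic fallback" described in the documentation changes.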
3 changes: 3 additions & 0 deletions PCbuild/build.bat
@@ -42,6 +42,7 @@ echo. --experimental-jit-interpreter Enable the experimental Tier 2 interprete
echo. --pystats Enable PyStats collection.
echo. --tail-call-interp Enable tail-calling interpreter (requires LLVM 19 or higher).
echo. --enable-stackref-debug Enable stackref debugging mode.
echo. --pymalloc-hugepages Enable huge page support for pymalloc arenas.
echo.
echo.Available flags to avoid building certain modules.
echo.These flags have no effect if '-e' is not given:
@@ -100,6 +101,7 @@ if "%~1"=="--without-remote-debug" (set DisableRemoteDebug=true) & shift & goto
if "%~1"=="--pystats" (set PyStats=1) & shift & goto CheckOpts
if "%~1"=="--tail-call-interp" (set UseTailCallInterp=true) & shift & goto CheckOpts
if "%~1"=="--enable-stackref-debug" (set StackRefDebug=true) & shift & goto CheckOpts
if "%~1"=="--pymalloc-hugepages" (set UsePymallocHugepages=true) & shift & goto CheckOpts
rem These use the actual property names used by MSBuild. We could just let
rem them in through the environment, but we specify them on the command line
rem anyway for visibility so set defaults after this
@@ -205,6 +207,7 @@ echo on
/p:UseTailCallInterp=%UseTailCallInterp%^
/p:DisableRemoteDebug=%DisableRemoteDebug%^
/p:StackRefDebug=%StackRefDebug%^
/p:UsePymallocHugepages=%UsePymallocHugepages%^
%1 %2 %3 %4 %5 %6 %7 %8 %9

@echo off
3 changes: 2 additions & 1 deletion PCbuild/pyproject.props
@@ -50,11 +50,12 @@
<_PlatformPreprocessorDefinition Condition="$(Platform) == 'x64' and $(PlatformToolset) != 'ClangCL'">_M_X64;$(_PlatformPreprocessorDefinition)</_PlatformPreprocessorDefinition>
<_Py3NamePreprocessorDefinition>PY3_DLLNAME=L"$(Py3DllName)$(PyDebugExt)";</_Py3NamePreprocessorDefinition>
<_FreeThreadedPreprocessorDefinition Condition="$(DisableGil) == 'true'">Py_GIL_DISABLED=1;</_FreeThreadedPreprocessorDefinition>
<_PymallocHugepagesPreprocessorDefinition Condition="$(UsePymallocHugepages) == 'true'">PYMALLOC_USE_HUGEPAGES=1;</_PymallocHugepagesPreprocessorDefinition>
</PropertyGroup>
<ItemDefinitionGroup>
<ClCompile>
<AdditionalIncludeDirectories>$(PySourcePath)Include;$(PySourcePath)Include\internal;$(PySourcePath)Include\internal\mimalloc;$(PySourcePath)PC;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
<PreprocessorDefinitions>WIN32;$(_Py3NamePreprocessorDefinition)$(_PlatformPreprocessorDefinition)$(_DebugPreprocessorDefinition)$(_PyStatsPreprocessorDefinition)$(_PydPreprocessorDefinition)$(_FreeThreadedPreprocessorDefinition)%(PreprocessorDefinitions)</PreprocessorDefinitions>
<PreprocessorDefinitions>WIN32;$(_Py3NamePreprocessorDefinition)$(_PlatformPreprocessorDefinition)$(_DebugPreprocessorDefinition)$(_PyStatsPreprocessorDefinition)$(_PydPreprocessorDefinition)$(_FreeThreadedPreprocessorDefinition)$(_PymallocHugepagesPreprocessorDefinition)%(PreprocessorDefinitions)</PreprocessorDefinitions>
<PreprocessorDefinitions Condition="'$(SupportPGO)' and ($(Configuration) == 'PGInstrument' or $(Configuration) == 'PGUpdate')">_Py_USING_PGO=1;%(PreprocessorDefinitions)</PreprocessorDefinitions>

<Optimization>MaxSpeed</Optimization>
5 changes: 5 additions & 0 deletions PCbuild/readme.txt
@@ -359,6 +359,11 @@ Supported flags are:
* WITH_COMPUTED_GOTOS: build the interpreter using "computed gotos".
Currently only supported by clang-cl.

* UsePymallocHugepages: enable huge page support for pymalloc arenas.
When enabled, the arena size on 64-bit platforms is increased to 2 MiB
and arena allocation uses MEM_LARGE_PAGES with automatic fallback to
regular pages. Can also be enabled via the `--pymalloc-hugepages` flag of build.bat.


Static library
--------------
2 changes: 1 addition & 1 deletion Parser/tokenizer/helpers.c
@@ -65,7 +65,7 @@ _syntaxerror_range(struct tok_state *tok, const char *format,
int
_PyTokenizer_syntaxerror(struct tok_state *tok, const char *format, ...)
{
// This errors are cleaned on startup. Todo: Fix it.
// These errors are cleaned on startup. Todo: Fix it.
va_list vargs;
va_start(vargs, format);
int ret = _syntaxerror_range(tok, format, -1, -1, vargs);