MDEV-38671: Default to large innodb_buffer_pool_size_max (except on AIX) #4674

Open
dr-m wants to merge 1 commit into 10.11 from MDEV-38671

Conversation

@dr-m
Contributor

dr-m commented Feb 20, 2026

  • The Jira issue number for this PR is: MDEV-38671

Description

MDEV-38671: SET GLOBAL innodb_buffer_pool_size cannot be increased by default

#3826 introduced the parameter innodb_buffer_pool_size_max, which defaulted to the initially specified innodb_buffer_pool_size. This implies that by default, SET GLOBAL innodb_buffer_pool_size cannot increase beyond the initial buffer pool size.

It turns out that on any 64-bit platform other than IBM AIX (maybe also there, if @grooverdan can help us), we should be able to reserve a huge chunk of virtual address space upfront, without any overhead for allocating page table entries, by passing PROT_NONE to the initial mmap(2) call.

Hence, we can use the conservative default innodb_buffer_pool_size_max=8t (40 bits of address space out of typical 48 bits).

my_large_virtual_alloc(): If large page allocation fails, clear my_use_large_pages. Outside large page allocation (and IBM AIX), invoke mmap(2) with PROT_NONE so that no page table entries will have to be allocated.

HAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT: Remove, and always make the virtual memory inaccessible, so that page table entries can be deallocated in the operating system kernel.

buf_pool_t::create(): Skip the my_virtual_mem_commit() on AIX only.

buf_pool_t::page_guess(): Unconditionally check if the pointer is within the currently allocated buffer pool.

There is no functional change to the startup logic that adjusts innodb_buffer_pool_size_max to be at least innodb_buffer_pool_size rounded up to the extent size (2 MiB on 32-bit, 8 MiB on 64-bit).

Release Notes

On 64-bit systems other than IBM AIX, the minimum and default values of innodb_buffer_pool_size_max were changed to 8 MiB and 8 TiB, respectively. This means that by default, SET GLOBAL innodb_buffer_pool_size will be able to allocate up to 8 TiB of buffer pool, provided that the operating system supports that.

How can this PR be tested?

See #3826.

On a 64-bit system with 32 GiB of RAM and no swap, the following test (an unsuccessful attempt to allocate 64 GiB, followed by a successful attempt to allocate 16 GiB) passed:

--source include/have_innodb.inc
--error 5
set global innodb_buffer_pool_size=64*1073741824;
set global innodb_buffer_pool_size=16*1073741824;

A variant with only the last statement completed in 1½ seconds, with no difference between CMAKE_BUILD_TYPE=Debug WITH_ASAN=OFF and CMAKE_BUILD_TYPE=RelWithDebInfo WITH_ASAN=OFF.

For a 32-bit userspace running on a 64-bit Linux kernel with a CMAKE_BUILD_TYPE=RelWithDebInfo build, the following turned out to be a tight limit:

mysql-test/mtr --mysqld=--innodb-buffer-pool-size-max=2444m innodb.innodb_buffer_pool_shrink

This suggests that it might be practical to configure innodb_buffer_pool_size=2g on some 32-bit systems, but we are not going to do that by default. For a WITH_ASAN=ON build in the same environment, the limit would be lower.

Additionally, this was manually tested with WITH_MSAN=ON, which turned out to allocate and initialize shadow buffers for all of the PROT_NONE address space. This is why we will treat MemorySanitizer and Valgrind as if they were 32-bit environments.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@vaintroub
Member

On 32-bit, I think there should be no change. A preallocation would consume scarce virtual address space, or it could fail because shared libraries or the heap may be located at random addresses in the middle of the address space (e.g. due to ASLR).

my_large_virtual_alloc(): If large page allocation fails, clear
my_use_large_pages. Outside large page allocation (and IBM AIX),
invoke mmap(2) with PROT_NONE so that no page table entries will
have to be allocated. Also, fix the error reporting.

HAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT: Remove. We will always make
the virtual memory unaccessible, so that the operating system
can deallocate the page table entries.

buf_pool_t::create(): Skip the my_virtual_mem_commit() on AIX only.

buf_pool_t::page_guess(): Unconditionally check if the pointer is
within the currently allocated buffer pool.

innodb_buffer_pool_size_max: On 64-bit systems other than IBM AIX,
we will default to 8 TiB.

On 32-bit systems, we retain the default innodb_buffer_pool_size_max=0,
that is, by default, SET GLOBAL innodb_buffer_pool_size=... will be
unable to increase from the initial value.

There is no functional change to the startup logic that adjusts
innodb_buffer_pool_size_max to be at least innodb_buffer_pool_size
rounded up to the extent size (2 MiB on 32-bit, 8 MiB on 64-bit).
if (innodb_buffer_pool_size > buf_pool.size_in_bytes_max)
  buf_pool.size_in_bytes_max= ut_calc_align(innodb_buffer_pool_size,
                                            innodb_buffer_pool_extent_size);
Contributor Author

I didn’t test this with large pages yet. I believe that we need the condition my_use_large_pages here; otherwise, by default, we would attempt to allocate 8 TiB using large pages, which would fail.

Comment on lines -1377 to 1378
-#ifdef _WIN32
+#ifndef _AIX
 if (!my_virtual_mem_commit(memory, actual_size))
Contributor Author

I realized that this would likely introduce a regression on Linux or any other system that supports the MAP_POPULATE flag of mmap(2). Before this change, the initial buffer pool would have been allocated lazily, without committing the memory. That is, we previously allowed the buffer pool to be overcommitted initially, but not when SET GLOBAL innodb_buffer_pool_size was increasing the size.

I believe that we must introduce a Linux-specific Boolean parameter, innodb_buffer_pool_commit, OFF by default. This setting would be passed as a third parameter to my_virtual_mem_commit(), where it would specify whether the MAP_POPULATE flag is set. As far as I can tell, only Linux allows lazy, overcommitted anonymous mappings; every other operating system that I checked (Microsoft Windows, IBM AIX, FreeBSD, OpenBSD, NetBSD, DragonFly BSD, Solaris, Apple macOS) would essentially always commit the memory.
