MDEV-38671: Default to large innodb_buffer_pool_size_max (except on AIX) #4674
Conversation
On 32-bit, I think there should be no change: a preallocation would take scarce virtual address space, or can fail because shared libraries or the heap can be located at random addresses in the middle of the address space (e.g. due to ASLR).
my_large_virtual_alloc(): If large page allocation fails, clear my_use_large_pages. Outside large page allocation (and IBM AIX), invoke mmap(2) with PROT_NONE so that no page table entries will have to be allocated. Also, fix the error reporting.

HAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT: Remove. We will always make the virtual memory inaccessible, so that the operating system can deallocate the page table entries.

buf_pool_t::create(): Skip the my_virtual_mem_commit() on AIX only.

buf_pool_t::page_guess(): Unconditionally check whether the pointer is within the currently allocated buffer pool.

innodb_buffer_pool_size_max: On 64-bit systems other than IBM AIX, we will default to 8 TiB. On 32-bit systems, we retain the default innodb_buffer_pool_size_max=0, that is, by default, SET GLOBAL innodb_buffer_pool_size=... will be unable to increase from the initial value.

There is no functional change to the startup logic that adjusts innodb_buffer_pool_size_max to be at least innodb_buffer_pool_size rounded up to the extent size (2 MiB on 32-bit, 8 MiB on 64-bit).
```
if (innodb_buffer_pool_size > buf_pool.size_in_bytes_max)
  buf_pool.size_in_bytes_max= ut_calc_align(innodb_buffer_pool_size,
                                            innodb_buffer_pool_extent_size);
```
I didn't test this with large pages yet. I believe that we need the condition my_use_large_pages here; otherwise, by default, we would attempt to allocate 8 TiB using large pages, which would fail.
```
#ifdef _WIN32
#ifndef _AIX
  if (!my_virtual_mem_commit(memory, actual_size))
```
I realized that this would likely introduce a regression on Linux or any system that supports the MAP_POPULATE flag of mmap(2). Before this change, the initial buffer pool allocation would have been lazy, without committing the memory. Previously, we allowed the buffer pool to be overcommitted initially, but not when SET GLOBAL innodb_buffer_pool_size is increasing the size.
I believe that we must introduce a Linux specific Boolean parameter, innodb_buffer_pool_commit, and make it OFF by default. This setting would be passed as a third parameter to my_virtual_mem_commit(), and it would specify whether the MAP_POPULATE flag will be set. As far as I can tell, only Linux would allow lazy, overcommitted anonymous mappings; any other operating systems that I checked (Microsoft Windows, IBM AIX, FreeBSD, OpenBSD, NetBSD, Dragonfly BSD, Solaris, Apple macOS) would essentially always commit the memory.
Description
MDEV-38671: SET GLOBAL innodb_buffer_pool_size cannot be increased by default

#3826 introduced the parameter innodb_buffer_pool_size_max, which defaulted to the initially specified innodb_buffer_pool_size. This implies that by default, SET GLOBAL innodb_buffer_pool_size cannot increase beyond the initial buffer pool size. It turns out that on any 64-bit platform other than IBM AIX (maybe also there, if @grooverdan can help us) we should be able to reserve a huge chunk of virtual address space upfront, without any overhead for allocating page table entries, by passing PROT_NONE to the initial mmap(2) call. Hence, we can use the conservative default innodb_buffer_pool_size_max=8t (40 bits of address space out of a typical 48 bits).

my_large_virtual_alloc(): If large page allocation fails, clear my_use_large_pages. Outside large page allocation (and IBM AIX), invoke mmap(2) with PROT_NONE so that no page table entries will have to be allocated.

HAVE_UNACCESSIBLE_AFTER_MEM_DECOMMIT: Remove, and always make the virtual memory inaccessible, so that page table entries can be deallocated in the operating system kernel.

buf_pool_t::create(): Skip the my_virtual_mem_commit() on AIX only.

buf_pool_t::page_guess(): Unconditionally check whether the pointer is within the currently allocated buffer pool.

There is no functional change to the startup logic that adjusts innodb_buffer_pool_size_max to be at least innodb_buffer_pool_size rounded up to the extent size (2 MiB on 32-bit, 8 MiB on 64-bit).

Release Notes
On 64-bit systems other than IBM AIX, the minimum and default values of innodb_buffer_pool_size_max were changed to 8 MiB and 8 TiB, respectively. This means that by default, SET GLOBAL innodb_buffer_pool_size will be able to allocate up to 8 TiB of buffer pool, provided that the operating system supports that.

How can this PR be tested?
See #3826.
On a 64-bit system with 32 GiB of RAM and no swap, the following test (an unsuccessful attempt to allocate 64 GiB and a successful one to allocate 16 GiB) passed:
A variant with only the last statement completed in 1½ seconds, with no difference between CMAKE_BUILD_TYPE=Debug WITH_ASAN=OFF and CMAKE_BUILD_TYPE=RelWithDebInfo WITH_ASAN=OFF.

For a 32-bit userspace running on a 64-bit Linux kernel, in a CMAKE_BUILD_TYPE=RelWithDebInfo build the following turned out to be a tight limit:

This suggests that it might be practical to configure innodb_buffer_pool_size=2g on some 32-bit systems, but we are not going to do that by default. For a WITH_ASAN=ON build in the same environment, the limit would be lower.

Additionally, this was manually tested with WITH_MSAN=ON, which turns out to allocate and initialize shadow buffers for all the PROT_NONE address space. This is why we will treat MemorySanitizer and Valgrind as if they were 32-bit environments.

Basing the PR against the correct MariaDB version
main branch.

PR quality check