    MDEV-31953 madvise(..., MADV_FREE) is causing a performance regression · 23234835
    Marko Mäkelä authored
    buf_page_t::set_os_unused(): Remove the system call that had been added in
    commit 16c97187 and revised in
    commit c1fd082e for Microsoft Windows.
    
    buf_pool_t::garbage_collect(): A new function to collect any garbage
    from the InnoDB buffer pool that can be removed without writing any
    log or data files. This will also invoke madvise() for all of buf_pool.free.
    
    To trigger this the following MDEV is implemented:
    MDEV-24670 avoid OOM by linux kernel co-operative memory management
    
    To avoid frequent triggers that caused the MDEV-31953 regression, while
    still preserving the 10.11 functionality of non-greedy kernel memory
    usage, memory triggers are used.
    
    When the Linux kernel signals memory pressure (if the kernel supports
    this), trigger garbage collection of the InnoDB buffer pool.
    
    The hard-coded triggers fire when there is:
    * some memory pressure in 5 of the last 10 seconds
    * a full stall on memory pressure for 10ms in the last 2 seconds
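    As an illustration, the two thresholds above can be written as Linux PSI
    trigger lines of the form "<some|full> <stall us> <window us>", which a
    process writes to a memory pressure file and then polls. This is a minimal
    sketch, not the code from this commit: psi_trigger() is a hypothetical
    helper name, and it only formats the strings.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical helper (not from the MariaDB source): format one PSI
// trigger line as "<some|full> <stall in us> <window in us>".
std::string psi_trigger(const char *kind,
                        std::chrono::microseconds stall,
                        std::chrono::microseconds window)
{
  return std::string(kind) + ' ' + std::to_string(stall.count()) + ' ' +
         std::to_string(window.count());
}

// "some" pressure for 5 of the last 10 seconds.
inline std::string some_trigger()
{
  return psi_trigger("some", std::chrono::seconds(5),
                     std::chrono::seconds(10));
}

// A "full" stall of 10ms within the last 2 seconds.
inline std::string full_trigger()
{
  return psi_trigger("full", std::chrono::milliseconds(10),
                     std::chrono::seconds(2));
}
```

    Under these assumptions the two lines would be "some 5000000 10000000"
    and "full 10000 2000000".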
    
    The kernel will trigger only once in each of these time windows. To avoid
    MariaDB being in a constant state of memory garbage collection, garbage
    collection has been limited to once per minute.
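    The once-per-minute limit can be pictured as a simple time gate in front
    of the garbage collection call. The following is a sketch under stated
    assumptions, not the logic from this commit: gc_rate_limiter and
    should_run() are hypothetical names, and the caller passes in the
    current time so that the gate is easy to test.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical rate limiter (not the MariaDB implementation): allows an
// action at most once per min_interval.
class gc_rate_limiter
{
  using clock = std::chrono::steady_clock;
  clock::duration min_interval;
  clock::time_point last{};
  bool ran = false;

public:
  explicit gc_rate_limiter(clock::duration interval)
    : min_interval(interval) {}

  // Returns true if the action may run at time `now`, and records it.
  // Returns false if the previous run was less than min_interval ago.
  bool should_run(clock::time_point now)
  {
    if (ran && now - last < min_interval)
      return false;
    last = now;
    ran = true;
    return true;
  }
};
```

    A memory pressure handler would call should_run() with
    std::chrono::steady_clock::now() and skip the collection when it returns
    false.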
    
    For a small set of kernels in 2023 (6.5, 6.6), there was a limit requiring
    CAP_SYS_RESOURCE that was lifted[1] to support the use case of user
    memory pressure. It is not currently possible to set CAP_SYS_RESOURCE in
    a systemd service, as it is setting a capability inside a user namespace.
    
    Running under systemd v254+ requires the default MemoryPressureWatch=auto
    (or alternatively "on").
    
    Functionality was tested successfully on Fedora with a 6.4 kernel, running
    under a systemd service.
    
    Running in a container requires that (unmask=)/sys/fs/cgroup be writable
    by the mariadbd process.
    
    To aid testing, buf_pool_resize was a convenient point at which to
    trigger garbage collection.
    
    ref [1]: https://lore.kernel.org/all/CAMw=ZnQ56cm4Txgy5EhGYvR+Jt4s-KVgoA9_65HKWVMOXp7a9A@mail.gmail.com/T/#m3bd2a73c5ee49965cb73a830b1ccaa37ccf4e427
    
    Co-Author: Daniel Black (on memory pressure trigger)
    
    Reviewed by: Marko Mäkelä, Vladislav Vaintroub, Vladislav Lesin,
       Thirunarayanan Balathandayuthapani
    
    Tested by: Matthias Leich
mem_pressure.test 1.41 KB