MDEV-23369 False sharing in page_hash_latch::read_lock_wait()
MDEV-22871 refactored the InnoDB buf_pool.page_hash to use a simple rw-lock implementation that avoids a spinloop between non-contended read-lock requests, simply using std::atomic::fetch_add() for the lock acquisition. Alas, in a write-heavy stress test on a 56-core system with 1,000 concurrent client connections, the server would stop processing any transactions every now and then. The reason turned out to be false sharing. Attaching a debugger to the server during one such hang revealed that 22 of the 1,033 threads were polling in page_hash_latch::read_lock_wait() on the same object, which appeared to be in unlocked state (no readers or writers). All 22 requests were for accessing an undo log page, with a distinct page number. To eliminate such false sharing, we will make buf_pool.page_hash.array contain one page_hash_latch per CPU data cache line. On AMD64, this will pad the size of the array by 8/7, or almost 15%. For a 50GiB buffer pool of 16KiB pages, the buf_pool.page_hash.array would grow from 25MiB to 28.6MiB. On other instruction set architectures, the incurred memory overhead may be smaller. Thanks to Vladislav Vaintroub for noticing this anomaly.
Showing
Please register or sign in to comment