BUG#11747548: DETECT ORPHAN TEMP-POOL FILES, AND HANDLE GRACEFULLY
Analysis:
--------
Certain queries using intrinsic temporary tables may fail due to name clashes in the temporary table's file name when 'temp-pool' is enabled.

'temp-pool' tries to reduce the number of different file names used for temp tables by allocating them from a small pool, in order to avoid problems in the Linux kernel. It uses a three-part file name: <tmp_file_prefix>_<pid>_<temp_pool_slot_num>. The bit corresponding to temp_pool_slot_num is set in the bitmap maintained for the temp-pool when the slot is used for a file name, and is cleared for re-use after the temp table is deleted.

Under an error condition, 'create_tmp_table()' tries to clear the same bit twice, by calling 'free_tmp_table()' and then 'bitmap_lock_clear_bit()'. 'free_tmp_table()' already deletes the table/file and clears the bit by calling the same function, 'bitmap_lock_clear_bit()'.

The reported issue can be triggered in the following timing window when an error occurs while creating the temp table:
a) THD1: Due to an error, clears the temp-pool slot number it used by calling 'free_tmp_table()'.
b) THD2: Is in the process of creating a temp table using an unused slot number in the bitmap (the one THD1 just freed).
c) THD1: Clears the slot number now used by THD2 by calling 'bitmap_lock_clear_bit()' after the call to 'free_tmp_table()' completes.
d) THD3: Uses the slot number used by THD2, since THD1 has freed it. When it tries to create the temp file using that slot number, an error is reported because the file is currently in use by THD2.
[The error: Error 'Can't create/write to file '/tmp/#sql_277e_0.MYD' (Errcode: 17)']

Another issue which may occur in 5.6 and trunk: when opening the temporary table fails after its creation (due to ulimit or an OOM error), the file is not deleted. Further attempts to use the same slot number in the 'temp-pool' therefore fail.

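The timing window in steps a) to d) can be replayed sequentially with a toy bitmap. This is only an illustrative sketch, not the server code: 'temp_pool', 'pool_set_next()', 'pool_clear()' and 'replay_clashes()' are hypothetical names, and the three threads are modelled as ordinary sequential calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POOL_SLOTS 32

/* Hypothetical stand-in for the temp-pool bitmap: bit i set = slot i in use. */
typedef struct { uint32_t bits; } temp_pool;

/* Claim the lowest free slot; returns its number, or -1 if the pool is full. */
static int pool_set_next(temp_pool *p) {
    for (int i = 0; i < POOL_SLOTS; i++) {
        uint32_t mask = UINT32_C(1) << i;
        if (!(p->bits & mask)) {
            p->bits |= mask;
            return i;
        }
    }
    return -1;
}

/* Release a slot. Clearing an already-clear bit is silently accepted,
   which is what lets a double clear go unnoticed. */
static void pool_clear(temp_pool *p, int slot) {
    assert(slot >= 0 && slot < POOL_SLOTS);
    p->bits &= ~(UINT32_C(1) << slot);
}

/* Replays steps a) to d) in order. Returns true if THD3 is handed the
   slot that THD2 still holds. */
static bool replay_clashes(bool redundant_clear) {
    temp_pool pool = {0};
    int thd1 = pool_set_next(&pool);   /* THD1 claims slot 0, then hits an error */
    pool_clear(&pool, thd1);           /* a) free_tmp_table() clears the bit */
    int thd2 = pool_set_next(&pool);   /* b) THD2 claims the freed slot */
    if (redundant_clear)
        pool_clear(&pool, thd1);       /* c) second clear releases THD2's slot */
    int thd3 = pool_set_next(&pool);   /* d) THD3 asks for a slot */
    return thd3 == thd2;
}
```

In this model 'replay_clashes(true)' reports a clash, because the second clear hands THD2's slot to THD3, while 'replay_clashes(false)' does not.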
Fix:
---
a) Under the error condition, calling 'bitmap_lock_clear_bit()' to clear the bit is unnecessary, since 'free_tmp_table()' deletes the table/file and clears the bit. Hence the redundant call to 'bitmap_lock_clear_bit()' in 'create_tmp_table()' is removed. This closes the timing window in which the reported issue can be seen.
b) If opening the temporary table fails, the file is deleted, which allows the temp-pool slot number to be used for a subsequent temporary table creation.
c) If the attempt to create the temp table fails because the file already exists, the temp-pool slot for it is marked as used, to prevent the problem from re-appearing.
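Fixes b) and c) can be sketched with plain POSIX file calls. This is a hypothetical illustration under stated assumptions, not the server implementation: 'temp_pool', 'create_pooled_file()' and the 'fail_open' hook are invented for the example, and a plain file stands in for the temporary table.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define POOL_SLOTS 32

/* Hypothetical pool: bit i set = slot i in use, or blocked by an orphan file. */
typedef struct { uint32_t bits; } temp_pool;

/*
 * Claims the lowest free slot, creates "<prefix>_<pid>_<slot>" and reopens it.
 * Returns the slot number with an open descriptor in *fd_out, or -1 on failure.
 * fail_open is a hypothetical hook that simulates the open step failing.
 */
static int create_pooled_file(temp_pool *pool, const char *prefix, char *path,
                              size_t path_len, bool fail_open, int *fd_out) {
    int slot = 0;
    while (slot < POOL_SLOTS && (pool->bits & (UINT32_C(1) << slot)))
        slot++;
    if (slot == POOL_SLOTS)
        return -1;                       /* pool exhausted */
    uint32_t mask = UINT32_C(1) << slot;
    pool->bits |= mask;

    int n = snprintf(path, path_len, "%s_%lx_%d", prefix,
                     (unsigned long)getpid(), slot);
    assert(n > 0 && (size_t)n < path_len);

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0) {
        if (errno != EEXIST)
            pool->bits &= ~mask;         /* release the slot exactly once */
        /* EEXIST: orphan file; keep the bit set so the slot is skipped (fix c) */
        return -1;
    }
    close(fd);

    fd = fail_open ? -1 : open(path, O_RDWR);
    if (fd < 0) {
        unlink(path);                    /* fix b: do not leave an orphan behind */
        pool->bits &= ~mask;
        return -1;
    }
    *fd_out = fd;
    return slot;
}
```

With an orphan file already occupying slot 0, the first call fails but leaves bit 0 set, so the next call moves on to slot 1 instead of failing on the same name again.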