    Bug#13058122 - DML, LOCK/UNLOCK TABLES AND SELECT LEAD TO
    FOREVER MDL LOCK

    Praveenkumar Hulakund authored · d0766534
    
    Analysis:
    ----------
    When granting MDL locks to the requests in the wait queue,
    requests for high priority lock types are granted first,
    and requests for low priority lock types only after them.
    
    MDL priority matrix:
      +--------------+----+----+----+----+-----+------+
      | Lock has     |    |    |    |    |     |      |
      | priority     |    |    |    |    |     |      |
      | over --->    | S  | SR | SW | SU | SNW | SNRW |
      +--------------+----+----+----+----+-----+------+
      | X            | +  | +  | +  | +  | +   | +    |
      +--------------+----+----+----+----+-----+------+
      | SNRW         | -  | +  | +  | -  | -   | -    |
      +--------------+----+----+----+----+-----+------+
      | SNW          | -  | -  | +  | -  | -   | -    |
      +--------------+----+----+----+----+-----+------+
    
    Here '+' means the row's lock type has higher priority than
             the column's lock type.
         '-' means both lock types have the same priority.
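
    The matrix above can be written down as a small lookup table.
    This is only an illustrative sketch: the enum LockType, the
    table kHasPriority and the function has_priority_over are
    hypothetical names, not the identifiers used in mdl.cc, and
    the handling of X as the "other" lock is an assumption, since
    the matrix has no X column.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical enum for this sketch; not the identifiers used in mdl.h.
enum class LockType : std::size_t { S = 0, SR, SW, SU, SNW, SNRW, X };

// Rows: X, SNRW, SNW. Columns: S, SR, SW, SU, SNW, SNRW.
// true = '+' (row has priority over column), false = '-' (same priority).
static const bool kHasPriority[3][6] = {
    /* X    */ {true,  true,  true,  true,  true,  true},
    /* SNRW */ {false, true,  true,  false, false, false},
    /* SNW  */ {false, false, true,  false, false, false},
};

// Returns true if a waiting request of type `waiting` has priority
// over a request of type `other`, according to the matrix.
inline bool has_priority_over(LockType waiting, LockType other) {
    // Assumption: the matrix has no X column, so nothing outranks X here.
    if (other == LockType::X) return false;
    const std::size_t col = static_cast<std::size_t>(other);
    assert(col < 6);
    switch (waiting) {
        case LockType::X:    return kHasPriority[0][col];
        case LockType::SNRW: return kHasPriority[1][col];
        case LockType::SNW:  return kHasPriority[2][col];
        default:             return false;  // S/SR/SW/SU have no row
    }
}
```

    For example, has_priority_over(LockType::SNRW, LockType::SR)
    is true, while has_priority_over(LockType::SR, LockType::SNRW)
    is false.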
    
    Consider the scenario where:
       *. The lock wait queue has requests of type S/SR/SW/SU.
       *. Locks of high priority types X/SNRW/SNW are requested
          continuously.
    
    In this case the high priority lock requests (X/SNRW/SNW)
    are always considered first when granting locks. The low
    priority requests (S/SR/SW/SU) never get a chance and
    wait forever.
    
    In the scenario for which this bug was reported, the
    application executed many LOCK TABLES ... WRITE statements
    concurrently. These statements request an SNRW lock. There
    were also some connections trying to execute DML statements,
    which request an SR lock. Since an SNRW request has higher
    priority, and there were always SNRW requests waiting, the
    lock was always granted to one of them. The SR requests
    therefore waited forever, resulting in DML starvation.
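
    The starvation can be illustrated with a toy model. This is
    a sketch under simplifying assumptions, not MDL code: each
    "round" stands for one lock release followed by one grant,
    and the function name and its parameters are hypothetical.

```cpp
#include <cassert>

// Toy model: in each round one holder releases the lock and one waiter
// is granted. High priority stands for SNRW (LOCK TABLES ... WRITE),
// low priority for SR (DML).
// Returns the round in which the single low priority waiter is granted,
// or -1 if it is still waiting after `rounds` rounds.
inline int rounds_until_low_priority_granted(int rounds,
                                             bool continuous_arrivals) {
    assert(rounds >= 0);
    int high_waiting = 1;  // one SNRW request already queued
    for (int round = 1; round <= rounds; ++round) {
        if (high_waiting > 0) {
            --high_waiting;  // high priority waiters always go first
        } else {
            return round;    // low priority waiter finally granted
        }
        if (continuous_arrivals) {
            ++high_waiting;  // another SNRW request arrives
        }
    }
    return -1;
}
```

    With continuous_arrivals set to true the function returns -1
    for any number of rounds, which is the starvation described
    above. With it set to false the low priority waiter is
    granted in round 2.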
    
    How is this handled in 5.1?
    ---------------------------
    5.1 also has the low priority lock starvation issue.
    But in 5.1 thread locking, the system variable
    "max_write_lock_count" can be configured so that some
    pending read lock requests are granted: after
    "max_write_lock_count" write lock grants, all the low
    priority locks are granted.
    
    Why is this issue seen in 5.5/trunk?
    ------------------------------------
    In 5.5/trunk the "max_write_lock_count" system variable
    exists, but it is used only by thread locking, not by MDL.
    So "max_write_lock_count" has no effect on MDL locking,
    and starvation of metadata lock requests is possible even
    if max_write_lock_count is set.
    
    It looks like the customer was using "max_write_lock_count"
    in 5.1 and, after upgrading to 5.5, saw starvation because
    "max_write_lock_count" has no effect in MDL.
    
    Fix:
    ----------
    As a fix, support for max_write_lock_count is added to MDL.
    To maintain a write lock counter per MDL_lock object, a new
    member "m_hog_lock_count" is added to MDL_lock.
    
    The following logic is added to the function
    reschedule_waiters to increment the counter
    (reschedule_waiters is called while a thread is releasing
     the lock):
        - After granting a lock request from the wait queue,
          check whether any S/SR/SU/SW requests exist in the
          wait queue.
          - If yes, increment "m_hog_lock_count".
    
    The following logic is added to the same function to
    handle pending S/SU/SR/SW lock requests:
        
        - Before granting locks, check whether
          max_write_lock_count <= m_hog_lock_count.
        - If yes, try to grant the S/SR/SW/SU locks.
          (Since all of these have the same priority, all of them
           are granted together. But some grants may fail because
           of lock incompatibility.)
        - Reset m_hog_lock_count if there are no low priority
          lock requests left in the wait queue.
        - Return.
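
    The counter logic described above can be sketched as follows.
    This is a toy model under stated assumptions, not the code in
    mdl.cc: ToyMdlLock and releases_until_low_granted are
    hypothetical names, waiters are reduced to two counters, and
    every low priority grant is assumed to succeed (the real code
    may fail some grants because of lock incompatibility). Only
    m_hog_lock_count, max_write_lock_count and reschedule_waiters
    are names taken from the description.

```cpp
#include <cassert>

// Toy stand-in for MDL_lock; waiters are modelled as plain counters.
struct ToyMdlLock {
    unsigned long max_write_lock_count;  // stands in for the system variable
    unsigned long m_hog_lock_count = 0;  // high priority grants made while
                                         // low priority requests waited
    unsigned long high_waiting = 0;      // X/SNRW/SNW-like requests
    unsigned long low_waiting = 0;       // S/SR/SW/SU-like requests
    unsigned long high_granted = 0;
    unsigned long low_granted = 0;

    explicit ToyMdlLock(unsigned long max_count)
        : max_write_lock_count(max_count) {}

    // Assumption: all low priority grants succeed in this model.
    void grant_all_low() {
        low_granted += low_waiting;
        low_waiting = 0;
    }

    // Called when a holder releases the lock.
    void reschedule_waiters() {
        if (max_write_lock_count <= m_hog_lock_count) {
            grant_all_low();
            if (low_waiting == 0) m_hog_lock_count = 0;  // nothing left to protect
            return;
        }
        if (high_waiting > 0) {
            --high_waiting;
            ++high_granted;
            if (low_waiting > 0) ++m_hog_lock_count;  // low priority still waiting
        } else {
            grant_all_low();
        }
    }
};

// Driver: one low priority waiter, and a new high priority request
// after every release. Returns the number of releases before the low
// priority waiter is granted, or 0 if it is still waiting after `limit`.
inline unsigned long releases_until_low_granted(unsigned long max_count,
                                                unsigned long limit) {
    assert(limit > 0);
    ToyMdlLock lock(max_count);
    lock.high_waiting = 1;
    lock.low_waiting = 1;
    for (unsigned long n = 1; n <= limit; ++n) {
        lock.reschedule_waiters();
        if (lock.low_granted > 0) return n;
        ++lock.high_waiting;
    }
    return 0;
}
```

    In this model releases_until_low_granted(3, 100) returns 4:
    three high priority grants raise the counter to the limit,
    and the fourth release grants the low priority waiter. With
    a limit far above the number of releases, for example
    releases_until_low_granted(1000000, 100), the waiter is
    still starved and the function returns 0.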
    
    Note:
    --------------------------
    In the lock priority matrix explained above, X has priority
    over SNW and SNRW. But X locks are taken mostly for RENAME,
    TRUNCATE, CREATE ... operations, so in real world
    applications lock type X is unlikely to be requested
    continuously in a loop, compared to other lock request
    types. So lock requests of type SNW and SNRW are not
    starved, and we can grant all S/SR/SU/SW requests in one
    shot without considering SNW & SNRW lock request starvation.
    
    ALTER TABLE operations take an SU lock first and then
    upgrade it to SNW if required. S, SR, SW and SU all have the
    same lock priority, so when SU is granted, requests of types
    SR, SW and S are also granted in one shot. So requesting
    SU->SNW in a loop will not starve other low priority
    lock requests.
    
    But while there is a request for a lock of type SNRW, lock
    requests of lower priority types are not granted. If SNRW
    is requested continuously in a loop, all S, SR, SW and SU
    requests are starved.
    
    This patch addresses the latter scenario.
    When there are S/SR/SW/SU requests in the wait queue and
    the wait queue also has
        - continuous SNRW lock requests,
        - OR one or more X and continuous SNRW lock requests,
        - OR one SNW and continuous SNRW lock requests,
        - OR one SNW, one or more X and continuous SNRW lock
          requests,
    then the S/SR/SW/SU lock requests are starved.
    