Commit d0766534 authored by Praveenkumar Hulakund

Bug#13058122 - DML, LOCK/UNLOCK TABLES AND SELECT LEAD TO FOREVER MDL LOCK

Analysis:
----------
While granting MDL locks to the requests in the wait queue,
locks are granted first to the high priority lock types and
then to the low priority lock types.

MDL Priority Matrix,
  +-------------+----+---+---+---+----+-----+
  | Lock        |    |   |   |   |    |     |
  | has priority|    |   |   |   |    |     |
  | over --->   |  S | SR| SW| SU| SNW| SNRW|
  +-------------+----+---+---+---+----+-----+
  | X           |  + | + | + | + | +  | +   |
  +-------------+----+---+---+---+----+-----+
  | SNRW        |  - | + | + | - | -  | -   |
  +-------------+----+---+---+---+----+-----+
  | SNW         |  - | - | + | - | -  | -   |
  +-------------+----+---+---+---+----+-----+

Here '+' means the lock type in the row has higher priority,
and '-' means both lock types have the same priority.
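
To make the matrix concrete, here is a minimal self-contained
C++ sketch (illustrative only; toy names, with the bitmap values
transcribed from the matrix above, not the server source) of how
such priorities can be checked with bitmaps, in the spirit of
MDL's incompatible-waiting-types bitmaps:

    #include <cstdio>

    typedef unsigned short bitmap_t;
    #define MDL_BIT(A) static_cast<bitmap_t>(1U << (A))

    enum mdl_type { T_S, T_SR, T_SW, T_SU, T_SNW, T_SNRW, T_X, T_END };

    /*
      For each requested type, the set of *waiting* types that have
      priority over it: a direct encoding of the '+' cells above.
    */
    static const bitmap_t waiting_incompat[T_END]= {
      MDL_BIT(T_X),                                      /* S    */
      MDL_BIT(T_X) | MDL_BIT(T_SNRW),                    /* SR   */
      MDL_BIT(T_X) | MDL_BIT(T_SNRW) | MDL_BIT(T_SNW),   /* SW   */
      MDL_BIT(T_X),                                      /* SU   */
      MDL_BIT(T_X),                                      /* SNW  */
      MDL_BIT(T_X),                                      /* SNRW */
      0                                                  /* X    */
    };

    /* A request must wait if a higher priority type is already waiting. */
    static bool beaten_by_waiters(mdl_type type, bitmap_t waiting_bitmap)
    {
      return (waiting_bitmap & waiting_incompat[type]) != 0;
    }

    int main()
    {
      bitmap_t waiting= MDL_BIT(T_SNRW);  /* a LOCK TABLES ... WRITE waits */
      printf("SR blocked: %d\n", beaten_by_waiters(T_SR, waiting));  /* 1 */
      printf("SU blocked: %d\n", beaten_by_waiters(T_SU, waiting));  /* 0 */
      return 0;
    }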

Consider a scenario where,
   *. The lock wait queue has requests of type S/SR/SW/SU.
   *. And high priority locks (X/SNRW/SNW) are requested
      continuously.

In this case, the high priority lock requests (X/SNRW/SNW)
are always considered first when granting locks. The low
priority requests (S/SR/SW/SU) never get a chance and
wait forever.

In the scenario for which this bug was reported, the
application executed many LOCK TABLES ... WRITE statements
concurrently; these statements request an SNRW lock. There
were also some connections trying to execute DML statements,
which request an SR lock. Since SNRW requests have higher
priority (and there were always SNRW requests waiting), the
lock was always granted to them. The SR requests therefore
waited forever, resulting in DML starvation.

How is this handled in 5.1?
---------------------------
5.1 has the same low priority lock starvation issue.
But in 5.1 thread locking, the system variable
"max_write_lock_count" can be configured to grant
pending read lock requests: after "max_write_lock_count"
consecutive write lock grants, all the low priority
locks are granted (see the sketch below).
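
For illustration, a minimal self-contained sketch of that 5.1
fairness rule (toy types and names, not the actual
mysys/thr_lock.c source):

    #include <cstdio>

    static unsigned long max_write_lock_count= 2;  /* system variable */

    struct Table_lock_queue
    {
      unsigned long write_lock_count;  /* writes granted since last read batch */
      int pending_reads;               /* queued read lock requests */
    };

    /* Called after each write lock grant. */
    static void maybe_grant_pending_reads(Table_lock_queue *q)
    {
      if (++q->write_lock_count >= max_write_lock_count && q->pending_reads)
      {
        printf("granting %d pending read locks\n", q->pending_reads);
        q->pending_reads= 0;
        q->write_lock_count= 0;  /* restart the fairness window */
      }
    }

    int main()
    {
      Table_lock_queue q= {0, 3};
      maybe_grant_pending_reads(&q);  /* 1st write: reads keep waiting */
      maybe_grant_pending_reads(&q);  /* 2nd write: threshold hit, reads run */
      return 0;
    }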

Why is this issue seen in 5.5/trunk?
------------------------------------
In 5.5/trunk, the "max_write_lock_count" system variable
still exists, but only thread locking uses it; MDL does
not. So "max_write_lock_count" has no effect on MDL
locking, and starvation of metadata locks is possible
even when it is set.

It looks like the customer was using "max_write_lock_count"
in 5.1, and after upgrading to 5.5 the starvation appeared
because "max_write_lock_count" has no effect in MDL.

Fix:
----------
As a fix, support for max_write_lock_count is added to MDL.
To maintain a write lock counter per MDL_lock object, a new
member "m_hog_lock_count" is added to MDL_lock.

The following logic is added to reschedule_waiters() to
increment the counter (reschedule_waiters() is called when
a thread releases a lock):
    - After granting a lock request from the wait queue,
    - check whether any S/SR/SU/SW request exists in the wait queue;
      - if yes, increment "m_hog_lock_count".

The following logic is added to the same function to handle
pending S/SU/SR/SW locks (a condensed sketch follows this list):

    - Before granting locks,
    - check whether max_write_lock_count <= m_hog_lock_count;
    - if yes, try to grant the S/SR/SW/SU locks.
      (Since all of these have the same priority, they are
       granted together. Some grants may still fail because
       of grant incompatibility.)
    - Reset m_hog_lock_count if there are no low priority
      lock requests in the wait queue.
    - Return.
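
A condensed, self-contained sketch of that flow (toy types and
names; grant-compatibility checks are omitted; the real
implementation is in the reschedule_waiters() hunks below):

    #include <cstdio>
    #include <vector>

    typedef unsigned short bitmap_t;
    #define MDL_BIT(A) static_cast<bitmap_t>(1U << (A))

    enum mdl_type { T_S, T_SR, T_SW, T_SU, T_SNW, T_SNRW, T_X };

    static unsigned long max_write_lock_count= 2;
    static const bitmap_t hog_types=
        MDL_BIT(T_SNW) | MDL_BIT(T_SNRW) | MDL_BIT(T_X);

    struct Lock
    {
      std::vector<mdl_type> waiting;   /* FIFO wait queue */
      unsigned long m_hog_lock_count;

      bitmap_t waiting_bitmap() const
      {
        bitmap_t b= 0;
        for (mdl_type t : waiting)
          b|= MDL_BIT(t);
        return b;
      }

      void reschedule_waiters()
      {
        /* Give way to weak locks once the hog counter hits the limit. */
        bool skip_high_priority=
          m_hog_lock_count >= max_write_lock_count &&
          (waiting_bitmap() & ~hog_types) != 0;

        for (size_t i= 0; i < waiting.size(); )
        {
          if (skip_high_priority && (MDL_BIT(waiting[i]) & hog_types))
          {
            ++i;                       /* skip hog locks this round */
            continue;
          }
          if (MDL_BIT(waiting[i]) & hog_types)
            m_hog_lock_count++;        /* count successive hog grants */
          printf("granted lock type %d\n", (int) waiting[i]);
          waiting.erase(waiting.begin() + i);
        }

        /* Reset the counter once no weak waiters remain. */
        if ((waiting_bitmap() & ~hog_types) == 0)
          m_hog_lock_count= 0;
      }
    };

    int main()
    {
      Lock lock= { {T_SNRW, T_SR, T_SW}, 2 };  /* limit already reached */
      lock.reschedule_waiters();  /* grants SR and SW, SNRW keeps waiting */
      return 0;
    }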

Note:
--------------------------
In the lock priority matrix explained above, X has priority
over SNW and SNRW. However, X locks are taken mostly for
RENAME, TRUNCATE, CREATE ... operations, so in real world
applications lock type X is unlikely to be requested
continuously in a loop, compared to the other lock request
types. Hence lock requests of type SNW and SNRW are not
starved, and we can grant all S/SR/SU/SW in one shot without
considering SNW & SNRW lock request starvation.

ALTER TABLE operations take an SU lock first and then
upgrade it to SNW if required. S, SR, SW and SU all have the
same lock priority, so while granting SU, requests of types
SR, SW and S are also granted in one shot. Hence SU->SNW
requests in a loop will not starve the other low priority
lock requests.

But when there is a request for a lock of type SNRW, lock
requests of lower priority types are not granted. If SNRW
is requested continuously in a loop, then all S, SR, SW, SU
requests are starved.

This patch addresses the latter scenario: when S/SR/SW/SU
requests are in the wait queue and the queue also contains
    - continuous SNRW lock requests,
    - OR one or more X and continuous SNRW lock requests,
    - OR one SNW and continuous SNRW lock requests,
    - OR one SNW, one or more X and continuous SNRW lock
      requests,
then the S/SR/SW/SU lock requests are starved.
parent f79b8d6f
@@ -378,7 +378,8 @@ class MDL_lock
   bool has_pending_conflicting_lock(enum_mdl_type type);

-  bool can_grant_lock(enum_mdl_type type, MDL_context *requstor_ctx) const;
+  bool can_grant_lock(enum_mdl_type type, MDL_context *requstor_ctx,
+                      bool ignore_lock_priority) const;

   inline static MDL_lock *create(const MDL_key *key);
@@ -392,14 +393,24 @@ class MDL_lock
   virtual bool needs_notification(const MDL_ticket *ticket) const = 0;
   virtual void notify_conflicting_locks(MDL_context *ctx) = 0;

+  virtual bitmap_t hog_lock_types_bitmap() const = 0;
+
   /** List of granted tickets for this lock. */
   Ticket_list m_granted;
   /** Tickets for contexts waiting to acquire a lock. */
   Ticket_list m_waiting;

+  /**
+    Number of times high priority lock requests have been granted while
+    low priority lock requests were waiting.
+  */
+  ulong m_hog_lock_count;
+
 public:
   MDL_lock(const MDL_key *key_arg)
   : key(key_arg),
+    m_hog_lock_count(0),
     m_ref_usage(0),
     m_ref_release(0),
     m_is_destroyed(FALSE),
@@ -484,6 +495,15 @@ class MDL_scoped_lock : public MDL_lock
   }
   virtual void notify_conflicting_locks(MDL_context *ctx);

+  /*
+    In scoped locks, only IX lock request would starve because of X/S. But that
+    is practically very rare case. So just return 0 from this function.
+  */
+  virtual bitmap_t hog_lock_types_bitmap() const
+  {
+    return 0;
+  }
+
 private:
   static const bitmap_t m_granted_incompatible[MDL_TYPE_END];
   static const bitmap_t m_waiting_incompatible[MDL_TYPE_END];
@@ -536,6 +556,18 @@ class MDL_object_lock : public MDL_lock
   }
   virtual void notify_conflicting_locks(MDL_context *ctx);

+  /*
+    To prevent starvation, these lock types that are only granted
+    max_write_lock_count times in a row while other lock types are
+    waiting.
+  */
+  virtual bitmap_t hog_lock_types_bitmap() const
+  {
+    return (MDL_BIT(MDL_SHARED_NO_WRITE) |
+            MDL_BIT(MDL_SHARED_NO_READ_WRITE) |
+            MDL_BIT(MDL_EXCLUSIVE));
+  }
+
 private:
   static const bitmap_t m_granted_incompatible[MDL_TYPE_END];
   static const bitmap_t m_waiting_incompatible[MDL_TYPE_END];
@@ -1267,6 +1299,41 @@ void MDL_lock::reschedule_waiters()
 {
   MDL_lock::Ticket_iterator it(m_waiting);
   MDL_ticket *ticket;
+  bool skip_high_priority= false;
+  bitmap_t hog_lock_types= hog_lock_types_bitmap();
+
+  if (m_hog_lock_count >= max_write_lock_count)
+  {
+    /*
+      If number of successively granted high-prio, strong locks has exceeded
+      max_write_lock_count give a way to low-prio, weak locks to avoid their
+      starvation.
+    */
+    if ((m_waiting.bitmap() & ~hog_lock_types) != 0)
+    {
+      /*
+        Even though normally when m_hog_lock_count is non-0 there is
+        some pending low-prio lock, we still can encounter situation
+        when m_hog_lock_count is non-0 and there are no pending low-prio
+        locks. This, for example, can happen when a ticket for pending
+        low-prio lock was removed from waiters list due to timeout,
+        and reschedule_waiters() is called after that to update the
+        waiters queue. m_hog_lock_count will be reset to 0 at the
+        end of this call in such case.
+
+        Note that it is not an issue if we fail to wake up any pending
+        waiters for weak locks in the loop below. This would mean that
+        all of them are either killed, timed out or chosen as a victim
+        by deadlock resolver, but have not managed to remove ticket
+        from the waiters list yet. After tickets will be removed from
+        the waiters queue there will be another call to
+        reschedule_waiters() with pending bitmap updated to reflect new
+        state of waiters queue.
+      */
+      skip_high_priority= true;
+    }
+  }
+
   /*
     Find the first (and hence the oldest) waiting request which
@@ -1288,7 +1355,16 @@ void MDL_lock::reschedule_waiters()
   */
   while ((ticket= it++))
   {
-    if (can_grant_lock(ticket->get_type(), ticket->get_ctx()))
+    /*
+      Skip high-prio, strong locks if earlier we have decided to give way to
+      low-prio, weaker locks.
+    */
+    if (skip_high_priority &&
+        ((MDL_BIT(ticket->get_type()) & hog_lock_types) != 0))
+      continue;
+
+    if (can_grant_lock(ticket->get_type(), ticket->get_ctx(),
+                       skip_high_priority))
     {
       if (! ticket->get_ctx()->m_wait.set_status(MDL_wait::GRANTED))
       {
@@ -1302,6 +1378,13 @@ void MDL_lock::reschedule_waiters()
         */
         m_waiting.remove_ticket(ticket);
         m_granted.add_ticket(ticket);
+
+        /*
+          Increase counter of successively granted high-priority strong locks,
+          if we have granted one.
+        */
+        if ((MDL_BIT(ticket->get_type()) & hog_lock_types) != 0)
+          m_hog_lock_count++;
       }
       /*
         If we could not update the wait slot of the waiter,
@@ -1313,6 +1396,24 @@ void MDL_lock::reschedule_waiters()
       */
     }
   }
+
+  if ((m_waiting.bitmap() & ~hog_lock_types) == 0)
+  {
+    /*
+      Reset number of successively granted high-prio, strong locks
+      if there are no pending low-prio, weak locks.
+      This ensures:
+      - That m_hog_lock_count is correctly reset after strong lock
+        is released and weak locks are granted (or there are no
+        other lock requests).
+      - That situation when SNW lock is granted along with some SR
+        locks, but SW locks are still blocked are handled correctly.
+      - That m_hog_lock_count is zero in most cases when there are no pending
+        weak locks (see comment at the start of this method for example of
+        exception). This allows to save on checks at the start of this method.
+    */
+    m_hog_lock_count= 0;
+  }
 }
@@ -1467,8 +1568,9 @@ MDL_object_lock::m_waiting_incompatible[MDL_TYPE_END] =
   Check if request for the metadata lock can be satisfied given its
   current state.

-  @param type_arg       The requested lock type.
-  @param requestor_ctx  The MDL context of the requestor.
+  @param type_arg             The requested lock type.
+  @param requestor_ctx        The MDL context of the requestor.
+  @param ignore_lock_priority Ignore lock priority.

   @retval TRUE   Lock request can be satisfied
   @retval FALSE  There is some conflicting lock.
@@ -1480,19 +1582,21 @@ MDL_object_lock::m_waiting_incompatible[MDL_TYPE_END] =

 bool
 MDL_lock::can_grant_lock(enum_mdl_type type_arg,
-                         MDL_context *requestor_ctx) const
+                         MDL_context *requestor_ctx,
+                         bool ignore_lock_priority) const
 {
   bool can_grant= FALSE;
   bitmap_t waiting_incompat_map= incompatible_waiting_types_bitmap()[type_arg];
   bitmap_t granted_incompat_map= incompatible_granted_types_bitmap()[type_arg];
+
   /*
     New lock request can be satisfied iff:
     - There are no incompatible types of satisfied requests
       in other contexts
     - There are no waiting requests which have higher priority
-      than this request.
+      than this request when priority was not ignored.
   */
-  if (! (m_waiting.bitmap() & waiting_incompat_map))
+  if (ignore_lock_priority || !(m_waiting.bitmap() & waiting_incompat_map))
   {
     if (! (m_granted.bitmap() & granted_incompat_map))
       can_grant= TRUE;
@@ -1788,7 +1892,7 @@ MDL_context::try_acquire_lock_impl(MDL_request *mdl_request,
     ticket->m_lock= lock;

-    if (lock->can_grant_lock(mdl_request->type, this))
+    if (lock->can_grant_lock(mdl_request->type, this, false))
     {
       lock->m_granted.add_ticket(ticket);
@@ -859,4 +859,10 @@ extern mysql_mutex_t LOCK_open;
 extern ulong mdl_locks_cache_size;
 static const ulong MDL_LOCKS_CACHE_SIZE_DEFAULT = 1024;

+/*
+  Metadata locking subsystem tries not to grant more than
+  max_write_lock_count high-prio, strong locks successively,
+  to avoid starving out weak, low-prio locks.
+*/
+extern "C" ulong max_write_lock_count;
 #endif