Commit 4ee9f919 authored by Hidetoshi Seto's avatar Hidetoshi Seto Committed by Linus Torvalds

[PATCH] futex_wait hang fix

NPTL has 3 control counters (total/wake/woken).
so NPTL can know:
  "how many threads enter to wait"(total),
  "how many threads receive wake signal"(wake),
  and "how many threads exit waiting"(woken).

Abstraction of pthread_cond_wait and pthread_cond_signal are:

A01 pthread_cond_wait {
A02   timeout = 0;
A03   lock(counters);
A04     total++;
A05     val = get_from(futex);
A06   unlock(counters);
A07
A08   sys_futex(futex, FUTEX_WAIT, val, timeout);
A09
A10   lock(counters);
A11     woken++;
A12   unlock(counters);
A13 }

B01 pthread_cond_signal {
B02   lock(counters);
B03   if(total>wake) { /* if there is waiter */
B04     wake++;
B05     update_val(futex);
B06     sys_futex(futex, FUTEX_WAKE, 1);
B07   }
B08   unlock(counters);
B09 }

What we have to notice is:
    FUTEX_WAKE could be called before FUTEX_WAIT have called (at A07).
In such case, FUTEX_WAKE will fail if there is no thread in waitqueue.

However, since pthread_cond_signal do not only wake++ but also
update_val(futex), next FUTEX_WAIT will fail with -EWOULDBLOCK because the val
passed to WAIT is now not equal to updated val.  Therefore, as the result, it
seems that the WAKE wakes the WAIT.

===

The bug will appear if 2 pair of wait & wake called at (nearly)once:

   * Assume 4 threads, wait_A, wait_B, wake_X, and wake_Y
   * counters start from [total/wake/woken]=[0/0/0]
   * the val of futex starts from (0), update means inclement of the val.
   * there is no thread in waitqueue on the futex.

[simulation]

wait_A: calls pthread_cond_wait:
    total++, prepare to call FUTEX_WAIT with val=0.
    # status: [1/0/0] (0) queue={}(empty) #

wake_X: calls pthread_cond_signal:
    no one in waitqueue, just wake++ and update futex val.
    # status: [1/1/0] (1) queue={}(empty) #

wait_B: calls pthread_cond_wait:
    total++, prepare to call FUTEX_WAIT with val=1.
    # status: [2/1/0] (1) queue={}(empty) #

wait_A: calls FUTEX_WAIT with val=0:
    after queueing, compare val. 0!=1 ... this should be blocked...
    # status: [2/1/0] (1) queue={A} #

wait_B: calls FUTEX_WAIT with val=1:
    after queueing, compare val. 1==1 ... OK, let's schedule()...
    # status: [2/1/0] (1) queue={A,B} (B=sleeping) #

wake_Y: calls pthread_cond_signal:
    A is in waitqueue ... dequeue A, wake++ and update futex val.
    # status: [2/2/0] (2) queue={B} (B=sleeping) #

wait_A: end of FUTEX_WAIT with val=0:
    try to dequeue but already dequeued, return anyway.
    # status: [2/2/0] (2) queue={B} (B=sleeping) #

wait_A: end of pthread_cond_wait:
    woken++.
    # status: [2/2/1] (2) queue={B} (B=sleeping) #

This is bug:
   wait_A: wakeup
   wait_B: sleeping
   wake_X: wake A
   wake_Y: wake A again

if subsequent wake_Z try to wake B:

wake_Z: calls pthread_cond_signal:
    since total==wake, do nothing.
    # status: [2/2/1] (2) queue={B} (B=sleeping) #

If wait_C comes, B become to can be woken, but C...

This bug makes the waitqueue to trap some threads in it all time.

====

>  - According to man of futex:
>      "If the futex was not equal to the expected value, the operation
>       returns -EWOULDBLOCK."
>    but now, here is no description about the rare case:
>      "returns 0 if the futex was not equal to the expected value, but
>       the process was woken by a FUTEX_WAKE call."
>    this behavior on rare case causes the hang which I found.

So to avoid this problem, my patch shut up the window that you said:

 > The patch certainly looks sensible - I can see that without the patch,
 > there is a window in which this process is pointlessly queued up on the
 > futex and that in this window a wakeup attempt might do a bad thing.

=====

In short:
There is an un-documented behavior of futex_wait. This behavior misleads
NPTL to wake a thread doubly, as the result, causes an application hang.
Signed-off-by: default avatarHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
parent e8999f7a
...@@ -486,8 +486,6 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time) ...@@ -486,8 +486,6 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time)
if (unlikely(ret != 0)) if (unlikely(ret != 0))
goto out_release_sem; goto out_release_sem;
queue_me(&q, -1, NULL);
/* /*
* Access the page after the futex is queued. * Access the page after the futex is queued.
* We hold the mmap semaphore, so the mapping cannot have changed * We hold the mmap semaphore, so the mapping cannot have changed
...@@ -495,13 +493,15 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time) ...@@ -495,13 +493,15 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time)
*/ */
if (get_user(curval, (int __user *)uaddr) != 0) { if (get_user(curval, (int __user *)uaddr) != 0) {
ret = -EFAULT; ret = -EFAULT;
goto out_unqueue; goto out_release_sem;
} }
if (curval != val) { if (curval != val) {
ret = -EWOULDBLOCK; ret = -EWOULDBLOCK;
goto out_unqueue; goto out_release_sem;
} }
queue_me(&q, -1, NULL);
/* /*
* Now the futex is queued and we have checked the data, we * Now the futex is queued and we have checked the data, we
* don't want to hold mmap_sem while we sleep. * don't want to hold mmap_sem while we sleep.
...@@ -542,11 +542,10 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time) ...@@ -542,11 +542,10 @@ static int futex_wait(unsigned long uaddr, int val, unsigned long time)
WARN_ON(!signal_pending(current)); WARN_ON(!signal_pending(current));
return -EINTR; return -EINTR;
out_unqueue:
/* If we were woken (and unqueued), we succeeded, whatever. */ /* If we were woken (and unqueued), we succeeded, whatever. */
if (!unqueue_me(&q)) if (!unqueue_me(&q))
ret = 0; ret = 0;
out_release_sem: out_release_sem:
up_read(&current->mm->mmap_sem); up_read(&current->mm->mmap_sem);
return ret; return ret;
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment