1. 25 Feb, 2009 3 commits
    •
      generic-ipi: remove CSD_FLAG_WAIT · 6e275637
      Peter Zijlstra authored
      Oleg noticed that we don't strictly need CSD_FLAG_WAIT; rework
      the code so that we can use CSD_FLAG_LOCK for both purposes.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      generic-ipi: remove kmalloc() · 8969a5ed
      Peter Zijlstra authored
      Remove the use of kmalloc() from the smp_call_function_*()
      calls.
      
      Steven's generic-ipi patch (d7240b98: generic-ipi: use per cpu
      data for single cpu ipi calls) started the discussion on the use
      of kmalloc() in this code and fixed the
      smp_call_function_single(.wait=0) fallback case.
      
      In this patch we complete this by also providing means for the
      _many() call, which fully removes the need for kmalloc() in this
      code.
      
      The problem with the _many() call is that other cpus might still
      be observing our entry when we're done with it. The old code
      solved this by dynamically allocating data elements and
      RCU-freeing them.
      
      We solve it by using a single per-cpu entry which provides
      static storage and solves one half of the problem (avoiding
      referencing freed data).
      
      The other half, ensuring the queue iteration is still possible,
      is done by placing re-used entries at the head of the list. This
      means that if someone was still iterating that entry when it got
      moved, he will now re-visit the entries on the list he had
      already seen, but will not skip over entries, as would have
      happened had we placed the new entry at the end.
      
      Furthermore, visiting entries twice is not a problem, since we
      remove our cpu from the entry's cpumask once it's called.
      
      Many thanks to Oleg for his suggestions and him poking holes in
      my earlier attempts.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      generic IPI: simplify barriers and locking · 15d0d3b3
      Nick Piggin authored
      Simplify the barriers in generic remote function call interrupt
      code.
      
      Firstly, just unconditionally take the lock and check the list
      in the generic_call_function_single_interrupt IPI handler. As
      we've just taken an IPI here, the chances are fairly high that
      there will be work on the list for us, so do the locking
      unconditionally. This removes the tricky lockless list_empty
      check and dubious barriers. The change looks bigger than it is
      because it is just removing an outer loop.
      
      Secondly, clarify architecture-specific IPI locking rules.
      Generic code has no tools to impose any sane ordering on IPIs if
      they go outside normal cache coherency, ergo the arch code must
      make them appear to obey cache coherency as a "memory operation"
      to initiate an IPI, and a "memory operation" to receive one.
      This way at least they can be reasoned about in generic code,
      and smp_mb() used to provide ordering.
      
      The combination of these two changes means that explicit barriers
      can be taken out of queue handling for the single case -- shared
      data is explicitly locked, and IPI ordering must conform to
      that, so no barriers needed. An extra barrier is needed in the
      many handler, so as to ensure we load the list element after the
      IPI is received.
      
      Does any architecture actually *need* these barriers? For the
      initiator I could see it, but for the handler I would be
      surprised. So the other thing we could do for simplicity is to
      require, rather than just matching with cache coherency, a full
      barrier before generating an IPI and after receiving one. In
      which case, the smp_mb()s can go away. But just for now, we'll
      be on the safe side and use the barriers (they're in the slow
      case anyway).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linux-arch@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 24 Feb, 2009 4 commits
  3. 23 Feb, 2009 14 commits
  4. 22 Feb, 2009 19 commits
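
The head-insertion argument in "generic-ipi: remove kmalloc()" (8969a5ed) can be illustrated with a small userspace sketch. This is not the kernel code: `struct entry`, `struct queue`, `walk_after_reuse()` and the helpers are hypothetical names, the list is a plain singly linked list with no RCU, locking or barriers, and everything runs on one thread. The only property it borrows is that unlinking an entry leaves that entry's own `next` pointer alone, so a reader parked on it can keep walking until the entry is re-queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a queued call entry. */
struct entry {
    int id;
    struct entry *next;
};

struct queue {
    struct entry *head;
};

struct walk_result {
    int missed;     /* entries the reader never visited */
    int revisited;  /* entries the reader visited more than once */
};

static void push_head(struct queue *q, struct entry *e)
{
    e->next = q->head;
    q->head = e;
}

static void push_tail(struct queue *q, struct entry *e)
{
    struct entry **pp = &q->head;

    while (*pp)
        pp = &(*pp)->next;
    e->next = NULL;
    *pp = e;
}

/* Unlink e but leave e->next untouched, so a reader on e can continue. */
static void unlink_entry(struct queue *q, struct entry *e)
{
    struct entry **pp = &q->head;

    while (*pp && *pp != e)
        pp = &(*pp)->next;
    assert(*pp == e);
    *pp = e->next;
}

/*
 * Queue is A -> B -> C -> D. A reader has visited A and is parked on B when
 * B is re-used: unlinked and queued again, at the head or at the tail.
 * The reader then continues from B->next.
 */
struct walk_result walk_after_reuse(bool requeue_at_head)
{
    struct entry a = {0, NULL}, b = {1, NULL}, c = {2, NULL}, d = {3, NULL};
    struct queue q = { NULL };
    int visits[4] = { 1, 1, 0, 0 };   /* A and B already visited */
    struct walk_result r = { 0, 0 };
    struct entry *pos;
    int i;

    push_head(&q, &d);
    push_head(&q, &c);
    push_head(&q, &b);
    push_head(&q, &a);

    unlink_entry(&q, &b);
    if (requeue_at_head)
        push_head(&q, &b);            /* B -> A -> C -> D */
    else
        push_tail(&q, &b);            /* A -> C -> D -> B, B->next == NULL */

    for (pos = b.next; pos; pos = pos->next)
        visits[pos->id]++;

    for (i = 0; i < 4; i++) {
        if (visits[i] == 0)
            r.missed++;
        if (visits[i] > 1)
            r.revisited++;
    }
    return r;
}
```

With head insertion the reader walks A again and then reaches C and D, so nothing is missed and one entry is seen twice. With tail insertion the reader's `next` becomes NULL and C and D are skipped. In the commit, the repeat visit is harmless because each cpu clears itself from the entry's cpumask once the function has been called; the sketch does not model that part.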