1. 09 May, 2013 1 commit
  2. 08 May, 2013 37 commits
    • Asias He's avatar
      KVM: Fix kvm_irqfd_init initialization · 7dac16c3
      Asias He authored
      In commit a0f155e9 'KVM: Initialize irqfd from kvm_init()', when
      kvm_init() is called the second time (e.g kvm-amd.ko and kvm-intel.ko),
      kvm_arch_init() will fail with -EEXIST, then kvm_irqfd_exit() will be
      called on the error handling path. This way, the kvm_irqfd system will
      not be ready.
      
      This patch fix the following:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in: vhost_net
      CPU 6
      Pid: 4257, comm: qemu-system-x86 Not tainted 3.9.0-rc3+ #757 Dell Inc. OptiPlex 790/0V5HMK
      RIP: 0010:[<ffffffff81c0721e>]  [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
      RSP: 0018:ffff880221721cc8  EFLAGS: 00010046
      RAX: 0000000000000100 RBX: ffff88022dcc003f RCX: ffff880221734950
      RDX: ffff8802208f6ca8 RSI: 000000007fffffff RDI: 0000000000000000
      RBP: ffff880221721cc8 R08: 0000000000000002 R09: 0000000000000002
      R10: 00007f7fd01087e0 R11: 0000000000000246 R12: ffff8802208f6ca8
      R13: 0000000000000080 R14: ffff880223e2a900 R15: 0000000000000000
      FS:  00007f7fd38488e0(0000) GS:ffff88022dcc0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 000000022309f000 CR4: 00000000000427e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process qemu-system-x86 (pid: 4257, threadinfo ffff880221720000, task ffff880222bd5640)
      Stack:
       ffff880221721d08 ffffffff810ac5c5 ffff88022431dc00 0000000000000086
       0000000000000080 ffff880223e2a900 ffff8802208f6ca8 0000000000000000
       ffff880221721d48 ffffffff810ac8fe 0000000000000000 ffff880221734000
      Call Trace:
       [<ffffffff810ac5c5>] __queue_work+0x45/0x2d0
       [<ffffffff810ac8fe>] queue_work_on+0x8e/0xa0
       [<ffffffff810ac949>] queue_work+0x19/0x20
       [<ffffffff81009b6b>] irqfd_deactivate+0x4b/0x60
       [<ffffffff8100a69d>] kvm_irqfd+0x39d/0x580
       [<ffffffff81007a27>] kvm_vm_ioctl+0x207/0x5b0
       [<ffffffff810c9545>] ? update_curr+0xf5/0x180
       [<ffffffff811b66e8>] do_vfs_ioctl+0x98/0x550
       [<ffffffff810c1f5e>] ? finish_task_switch+0x4e/0xe0
       [<ffffffff81c054aa>] ? __schedule+0x2ea/0x710
       [<ffffffff811b6bf7>] sys_ioctl+0x57/0x90
       [<ffffffff8140ae9e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
       [<ffffffff81c0f602>] system_call_fastpath+0x16/0x1b
      Code: c1 ea 08 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b c9 c3 55 48 89 e5 66 66 66 66 90 b8 00 01 00 00 <f0> 66 0f c1 07 89 c2 66 c1 ea 08 38 c2 74 0c 0f 1f 00 f3 90 0f
      RIP  [<ffffffff81c0721e>] _raw_spin_lock+0xe/0x30
      RSP <ffff880221721cc8>
      CR2: 0000000000000000
      ---[ end trace 13fb1e4b6e5ab21f ]---
      Signed-off-by: default avatarAsias He <asias@redhat.com>
      Acked-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      7dac16c3
    • Marcelo Tosatti's avatar
      KVM: x86: fix maintenance of guest/host xcr0 state · 42bdf991
      Marcelo Tosatti authored
      Emulation of xcr0 writes zero guest_xcr0_loaded variable so that
      subsequent VM-entry reloads CPU's xcr0 with guests xcr0 value.
      
      However, this is incorrect because guest_xcr0_loaded variable is
      read to decide whether to reload hosts xcr0.
      
      In case the vcpu thread is scheduled out after the guest_xcr0_loaded = 0
      assignment, and scheduler decides to preload FPU:
      
      switch_to
      {
        __switch_to
          __math_state_restore
            restore_fpu_checking
              fpu_restore_checking
                if (use_xsave())
                    fpu_xrstor_checking
      		xrstor64 with CPU's xcr0 == guests xcr0
      
      Fix by properly restoring hosts xcr0 during emulation of xcr0 writes.
      Analyzed-by: default avatarUlrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: default avatarMarcelo Tosatti <mtosatti@redhat.com>
      Reviewed-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarGleb Natapov <gleb@redhat.com>
      42bdf991
    • Linus Torvalds's avatar
      Merge branch 'akpm' (incoming from Andrew) · 5af43c24
      Linus Torvalds authored
      Merge more incoming from Andrew Morton:
      
       - Various fixes which were stalled or which I picked up recently
      
       - A large rotorooting of the AIO code.  Allegedly to improve
         performance but I don't really have good performance numbers (I might
         have lost the email) and I can't raise Kent today.  I held this out
         of 3.9 and we could give it another cycle if it's all too late/scary.
      
      I ended up taking only the first two thirds of the AIO rotorooting.  I
      left the percpu parts and the batch completion for later.  - Linus
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (33 commits)
        aio: don't include aio.h in sched.h
        aio: kill ki_retry
        aio: kill ki_key
        aio: give shared kioctx fields their own cachelines
        aio: kill struct aio_ring_info
        aio: kill batch allocation
        aio: change reqs_active to include unreaped completions
        aio: use cancellation list lazily
        aio: use flush_dcache_page()
        aio: make aio_read_evt() more efficient, convert to hrtimers
        wait: add wait_event_hrtimeout()
        aio: refcounting cleanup
        aio: make aio_put_req() lockless
        aio: do fget() after aio_get_req()
        aio: dprintk() -> pr_debug()
        aio: move private stuff out of aio.h
        aio: add kiocb_cancel()
        aio: kill return value of aio_complete()
        char: add aio_{read,write} to /dev/{null,zero}
        aio: remove retry-based AIO
        ...
      5af43c24
    • Kent Overstreet's avatar
      aio: don't include aio.h in sched.h · a27bb332
      Kent Overstreet authored
      Faster kernel compiles by way of fewer unnecessary includes.
      
      [akpm@linux-foundation.org: fix fallout]
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a27bb332
    • Kent Overstreet's avatar
      aio: kill ki_retry · 41ef4eb8
      Kent Overstreet authored
      Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
      need this anymore - ki_retry was only called right after the kiocb was
      initialized.
      
      This also refactors and trims some duplicated code, as well as cleaning up
      the refcounting/error handling a bit.
      
      [akpm@linux-foundation.org: use fmode_t in aio_run_iocb()]
      [akpm@linux-foundation.org: fix file_start_write/file_end_write tests]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      41ef4eb8
    • Kent Overstreet's avatar
      aio: kill ki_key · 8a660890
      Kent Overstreet authored
      ki_key wasn't actually used for anything previously - it was always 0.
      Drop it to trim struct kiocb a bit.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8a660890
    • Kent Overstreet's avatar
      aio: give shared kioctx fields their own cachelines · 4e23bcae
      Kent Overstreet authored
      [akpm@linux-foundation.org: make reqs_active __cacheline_aligned_in_smp]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e23bcae
    • Kent Overstreet's avatar
      aio: kill struct aio_ring_info · 58c85dc2
      Kent Overstreet authored
      struct aio_ring_info was kind of odd, the only place it's used is where
      it's embedded in struct kioctx - there's no real need for it.
      
      The next patch rearranges struct kioctx and puts various things on their
      own cachelines - getting rid of struct aio_ring_info now makes that
      reordering a bit clearer.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      58c85dc2
    • Kent Overstreet's avatar
      aio: kill batch allocation · a1c8eae7
      Kent Overstreet authored
      Previously, allocating a kiocb required touching quite a few global
      (well, per kioctx) cachelines...  so batching up allocation to amortize
      those was worthwhile.  But we've gotten rid of some of those, and in
      another couple of patches kiocb allocation won't require writing to any
      shared cachelines, so that means we can just rip this code out.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a1c8eae7
    • Kent Overstreet's avatar
      aio: change reqs_active to include unreaped completions · 3e845ce0
      Kent Overstreet authored
      The aio code tries really hard to avoid having to deal with the
      completion ringbuffer overflowing.  To do that, it has to keep track of
      the number of outstanding kiocbs, and the number of completions
      currently in the ringbuffer - and it's got to check that every time we
      allocate a kiocb.  Ouch.
      
      But - we can improve this quite a bit if we just change reqs_active to
      mean "number of outstanding requests and unreaped completions" - that
      means kiocb allocation doesn't have to look at the ringbuffer, which is
      a fairly significant win.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3e845ce0
    • Kent Overstreet's avatar
      aio: use cancellation list lazily · 0460fef2
      Kent Overstreet authored
      Cancelling kiocbs requires adding them to a per kioctx linked list,
      which is one of the few things we need to take the kioctx lock for in
      the fast path.  But most kiocbs can't be cancelled - so if we just do
      this lazily, we can avoid quite a bit of locking overhead.
      
      While we're at it, instead of using a flag bit switch to using ki_cancel
      itself to indicate that a kiocb has been cancelled/completed.  This lets
      us get rid of ki_flags entirely.
      
      [akpm@linux-foundation.org: remove buggy BUG()]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0460fef2
    • Kent Overstreet's avatar
      aio: use flush_dcache_page() · 21b40200
      Kent Overstreet authored
      This wasn't causing problems before because it's not needed on x86, but
      it is needed on other architectures.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      21b40200
    • Kent Overstreet's avatar
      aio: make aio_read_evt() more efficient, convert to hrtimers · a31ad380
      Kent Overstreet authored
      Previously, aio_read_event() pulled a single completion off the
      ringbuffer at a time, locking and unlocking each time.  Change it to
      pull off as many events as it can at a time, and copy them directly to
      userspace.
      
      This also fixes a bug where if copying the event to userspace failed,
      we'd lose the event.
      
      Also convert it to wait_event_interruptible_hrtimeout(), which
      simplifies it quite a bit.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a31ad380
    • Kent Overstreet's avatar
      wait: add wait_event_hrtimeout() · 774a08b3
      Kent Overstreet authored
      Analagous to wait_event_timeout() and friends, this adds
      wait_event_hrtimeout() and wait_event_interruptible_hrtimeout().
      
      Note that unlike the versions that use regular timers, these don't
      return the amount of time remaining when they return - instead, they
      return 0 or -ETIME if they timed out.  because I was uncomfortable with
      the semantics of doing it the other way (that I could get it right,
      anyways).
      
      If the timer expires, there's no real guarantee that expire_time -
      current_time would be <= 0 - due to timer slack certainly, and I'm not
      sure I want to know the implications of the different clock bases in
      hrtimers.
      
      If the timer does expire and the code calculates that the time remaining
      is nonnegative, that could be even worse if the calling code then reuses
      that timeout.  Probably safer to just return 0 then, but I could imagine
      weird bugs or at least unintended behaviour arising from that too.
      
      I came to the conclusion that if other users end up actually needing the
      amount of time remaining, the sanest thing to do would be to create a
      version that uses absolute timeouts instead of relative.
      
      [akpm@linux-foundation.org: fix description of `timeout' arg]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      774a08b3
    • Kent Overstreet's avatar
      aio: refcounting cleanup · 36f55889
      Kent Overstreet authored
      The usage of ctx->dead was fubar - it makes no sense to explicitly check
      it all over the place, especially when we're already using RCU.
      
      Now, ctx->dead only indicates whether we've dropped the initial
      refcount. The new teardown sequence is:
      
        set ctx->dead
        hlist_del_rcu();
        synchronize_rcu();
      
      Now we know no system calls can take a new ref, and it's safe to drop
      the initial ref:
      
        put_ioctx();
      
      We also need to ensure there are no more outstanding kiocbs.  This was
      done incorrectly - it was being done in kill_ctx(), and before dropping
      the initial refcount.  At this point, other syscalls may still be
      submitting kiocbs!
      
      Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after
      kioctx->users has dropped to 0 and we know no more iocbs could be
      submitted.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      36f55889
    • Kent Overstreet's avatar
      aio: make aio_put_req() lockless · 11599eba
      Kent Overstreet authored
      Freeing a kiocb needed to touch the kioctx for three things:
      
       * Pull it off the reqs_active list
       * Decrementing reqs_active
       * Issuing a wakeup, if the kioctx was in the process of being freed.
      
      This patch moves these to aio_complete(), for a couple reasons:
      
       * aio_complete() already has to issue the wakeup, so if we drop the
         kioctx refcount before aio_complete does its wakeup we don't have to
         do it twice.
       * aio_complete currently has to take the kioctx lock, so it makes sense
         for it to pull the kiocb off the reqs_active list too.
       * A later patch is going to change reqs_active to include unreaped
         completions - this will mean allocating a kiocb doesn't have to look
         at the ringbuffer. So taking the decrement of reqs_active out of
         kiocb_free() is useful prep work for that patch.
      
      This doesn't really affect cancellation, since existing (usb) code that
      implements a cancel function still calls aio_complete() - we just have
      to make sure that aio_complete does the necessary teardown for cancelled
      kiocbs.
      
      It does affect code paths where we free kiocbs that were never
      submitted; they need to decrement reqs_active and pull the kiocb off the
      reqs_active list.  This occurs in two places: kiocb_batch_free(), which
      is going away in a later patch, and the error path in io_submit_one.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      11599eba
    • Kent Overstreet's avatar
      aio: do fget() after aio_get_req() · 1d98ebfc
      Kent Overstreet authored
      aio_get_req() will fail if we have the maximum number of requests
      outstanding, which depending on the application may not be uncommon.  So
      avoid doing an unnecessary fget().
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1d98ebfc
    • Kent Overstreet's avatar
      aio: dprintk() -> pr_debug() · caf4167a
      Kent Overstreet authored
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      caf4167a
    • Kent Overstreet's avatar
      aio: move private stuff out of aio.h · 4e179bca
      Kent Overstreet authored
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4e179bca
    • Kent Overstreet's avatar
      aio: add kiocb_cancel() · 906b973c
      Kent Overstreet authored
      Minor refactoring, to get rid of some duplicated code
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      906b973c
    • Kent Overstreet's avatar
      aio: kill return value of aio_complete() · 2d68449e
      Kent Overstreet authored
      Nothing used the return value, and it probably wasn't possible to use it
      safely for the locked versions (aio_complete(), aio_put_req()).  Just
      kill it.
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Acked-by: default avatarZach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2d68449e
    • Zach Brown's avatar
      char: add aio_{read,write} to /dev/{null,zero} · 162934de
      Zach Brown authored
      These are handy for measuring the cost of the aio infrastructure with
      operations that do very little and complete immediately.
      Signed-off-by: default avatarZach Brown <zab@redhat.com>
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      162934de
    • Zach Brown's avatar
      aio: remove retry-based AIO · 41003a7b
      Zach Brown authored
      This removes the retry-based AIO infrastructure now that nothing in tree
      is using it.
      
      We want to remove retry-based AIO because it is fundemantally unsafe.
      It retries IO submission from a kernel thread that has only assumed the
      mm of the submitting task.  All other task_struct references in the IO
      submission path will see the kernel thread, not the submitting task.
      This design flaw means that nothing of any meaningful complexity can use
      retry-based AIO.
      
      This removes all the code and data associated with the retry machinery.
      The most significant benefit of this is the removal of the locking
      around the unused run list in the submission path.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Signed-off-by: default avatarZach Brown <zab@redhat.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      41003a7b
    • Zach Brown's avatar
      gadget: remove only user of aio retry · a80bf61e
      Zach Brown authored
      This removes the only in-tree user of aio retry.  This will let us
      remove the retry code from the aio core.
      
      Removing retry is relatively easy as the USB gadget wasn't using it to
      retry IOs at all.  It always fully submitted the IO in the context of
      the initial io_submit() call.  It only used the AIO retry facility to
      get the submitter's mm context for copying the result of a read back to
      user space.  This is easy to implement with use_mm() and a work struct,
      much like kvm does with async_pf_execute() for get_user_pages().
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarZach Brown <zab@redhat.com>
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a80bf61e
    • Zach Brown's avatar
      aio: remove dead code from aio.h · 4b49bb8a
      Zach Brown authored
      Signed-off-by: default avatarZach Brown <zab@redhat.com>
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4b49bb8a
    • Zach Brown's avatar
      mm: remove old aio use_mm() comment · 697f4d68
      Zach Brown authored
      Bunch of performance improvements and cleanups Zach Brown and I have
      been working on.  The code should be pretty solid at this point, though
      it could of course use more review and testing.
      
      The results in my testing are pretty impressive, particularly when an
      ioctx is being shared between multiple threads.  In my crappy synthetic
      benchmark, with 4 threads submitting and one thread reaping completions,
      I saw overhead in the aio code go from ~50% (mostly ioctx lock
      contention) to low single digits.  Performance with ioctx per thread
      improved too, but I'd have to rerun those benchmarks.
      
      The reason I've been focused on performance when the ioctx is shared is
      that for a fair number of real world completions, userspace needs the
      completions aggregated somehow - in practice people just end up
      implementing this aggregation in userspace today, but if it's done right
      we can do it much more efficiently in the kernel.
      
      Performance wise, the end result of this patch series is that submitting
      a kiocb writes to _no_ shared cachelines - the penalty for sharing an
      ioctx is gone there.  There's still going to be some cacheline
      contention when we deliver the completions to the aio ringbuffer (at
      least if you have interrupts being delivered on multiple cores, which
      for high end stuff you do) but I have a couple more patches not in this
      series that implement coalescing for that (by taking advantage of
      interrupt coalescing).  With that, there's basically no bottlenecks or
      performance issues to speak of in the aio code.
      
      This patch:
      
      use_mm() is used in more places than just aio.  There's no need to mention
      callers when describing the function.
      Signed-off-by: default avatarZach Brown <zab@redhat.com>
      Signed-off-by: default avatarKent Overstreet <koverstreet@google.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      697f4d68
    • Andrew Morton's avatar
      mm/vmalloc.c: add vfree comment · c9fcee51
      Andrew Morton authored
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c9fcee51
    • Akinobu Mita's avatar
      remove unused random32() and srandom32() · 22ea9c07
      Akinobu Mita authored
      After finishing a naming transition, remove unused backward
      compatibility wrapper macros
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      22ea9c07
    • Andrew Morton's avatar
      drivers/infiniband/hw: rename random32() to prandom_u32() · 50bea5c0
      Andrew Morton authored
      Use preferable function name which implies using a pseudo-random number
      generator.
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      50bea5c0
    • Akinobu Mita's avatar
      drivers/net: rename random32() to prandom_u32() · e00adf39
      Akinobu Mita authored
      Use preferable function name which implies using a pseudo-random number
      generator.
      
      [akpm@linux-foundation.org: convert team_mode_random.c]
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Acked-by: default avatarThomas Sailer <t.sailer@alumni.ethz.ch>
      Acked-by: Bing Zhao <bzhao@marvell.com> [mwifiex]
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Chan <mchan@broadcom.com>
      Cc: Thomas Sailer <t.sailer@alumni.ethz.ch>
      Cc: Jean-Paul Roubelat <jpr@f6fbb.org>
      Cc: Bing Zhao <bzhao@marvell.com>
      Cc: Brett Rudley <brudley@broadcom.com>
      Cc: Arend van Spriel <arend@broadcom.com>
      Cc: "Franky (Zhenhui) Lin" <frankyl@broadcom.com>
      Cc: Hante Meuleman <meuleman@broadcom.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e00adf39
    • Naoya Horiguchi's avatar
      hugetlbfs: fix mmap failure in unaligned size request · af73e4d9
      Naoya Horiguchi authored
      The current kernel returns -EINVAL unless a given mmap length is
      "almost" hugepage aligned.  This is because in sys_mmap_pgoff() the
      given length is passed to vm_mmap_pgoff() as it is without being aligned
      with hugepage boundary.
      
      This is a regression introduced in commit 40716e29 ("hugetlbfs: fix
      alignment of huge page requests"), where alignment code is pushed into
      hugetlb_file_setup() and the variable len in caller side is not changed.
      
      To fix this, this patch partially reverts that commit, and adds
      alignment code in caller side.  And it also introduces hstate_sizelog()
      in order to get proper hstate to specified hugepage size.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
      
      [akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: <iceman_dvd@yahoo.com>
      Cc: Steven Truelove <steven.truelove@utoronto.ca>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      af73e4d9
    • Zhao Hongjiang's avatar
      parisc: remove the second argument of kmap_atomic() · 1ab4ce76
      Zhao Hongjiang authored
      kmap_atomic() requires only one argument now.
      Signed-off-by: default avatarZhao Hongjiang <zhaohongjiang@huawei.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1ab4ce76
    • Lucas Stach's avatar
      drivers/rtc/rtc-rs5c372.c: add R2221T/L variant to the driver · 550fcb8f
      Lucas Stach authored
      Register layout is the same, so just add the variant to the appropriate
      places.
      Signed-off-by: default avatarLucas Stach <l.stach@pengutronix.de>
      Signed-off-by: default avatarJan Luebbe <jlu@pengutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      550fcb8f
    • Andrew Morton's avatar
      include/linux/mm.h: complete the mm_walk definition · 0f157a5b
      Andrew Morton authored
      That nameless-function-arguments thing drives me batty.  Fix.
      
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0f157a5b
    • David Rientjes's avatar
      mm, memcg: add rss_huge stat to memory.stat · b070e65c
      David Rientjes authored
      This exports the amount of anonymous transparent hugepages for each
      memcg via the new "rss_huge" stat in memory.stat.  The units are in
      bytes.
      
      This is helpful to determine the hugepage utilization for individual
      jobs on the system in comparison to rss and opportunities where
      MADV_HUGEPAGE may be helpful.
      
      The amount of anonymous transparent hugepages is also included in "rss"
      for backwards compatibility.
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b070e65c
    • Jiang Liu's avatar
      mm/SPARC: use common help functions to free reserved pages · 70affe45
      Jiang Liu authored
      Use common help functions to free reserved pages.
      Signed-off-by: default avatarJiang Liu <jiang.liu@huawei.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarSam Ravnborg <sam@ravnborg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      70affe45
    • Linus Torvalds's avatar
      arm: fix mismerge of arch/arm/mach-omap2/timer.c · 9affd6be
      Linus Torvalds authored
      I badly screwed up the merge in commit 6fa52ed3 ("Merge tag
      'drivers-for-linus' of git://git.kernel.org/pub/.../arm-soc") by
      incorrectly taking the arch/arm/mach-omap2/* data fully from the merge
      target because the 'drivers-for-linus' branch seemed to be a proper
      superset of the duplicate ARM commits.
      
      That was bogus: commit ff931c82 ("ARM: OMAP: clocks: Delay clk inits
      atleast until slab is initialized") only existed in head, and the
      changes to arch/arm/mach-omap2/timer.c from that commit got list.
      
      Re-doing the merge more carefully, I do think this part was the only
      thing I screwed up.  Knock wood.
      Reported-by: default avatarTony Lindgren <tony@atomide.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Olof Johansson <olof@lixom.net>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9affd6be
  3. 07 May, 2013 2 commits
    • Davidlohr Bueso's avatar
      rwsem: check counter to avoid cmpxchg calls · 9607a85b
      Davidlohr Bueso authored
      This patch tries to reduce the amount of cmpxchg calls in the writer
      failed path by checking the counter value first before issuing the
      instruction.  If ->count is not set to RWSEM_WAITING_BIAS then there is
      no point wasting a cmpxchg call.
      
      Furthermore, Michel states "I suppose it helps due to the case where
      someone else steals the lock while we're trying to acquire
      sem->wait_lock."
      
      Two very different workloads and machines were used to see how this
      patch improves throughput: pgbench on a quad-core laptop and aim7 on a
      large 8 socket box with 80 cores.
      
      Some results comparing Michel's fast-path write lock stealing
      (tps-rwsem) on a quad-core laptop running pgbench:
      
        | db_size | clients  |  tps-rwsem     |   tps-patch  |
        +---------+----------+----------------+--------------+
        | 160 MB   |       1 |           6906 |         9153 | + 32.5
        | 160 MB   |       2 |          15931 |        22487 | + 41.1%
        | 160 MB   |       4 |          33021 |        32503 |
        | 160 MB   |       8 |          34626 |        34695 |
        | 160 MB   |      16 |          33098 |        34003 |
        | 160 MB   |      20 |          31343 |        31440 |
        | 160 MB   |      30 |          28961 |        28987 |
        | 160 MB   |      40 |          26902 |        26970 |
        | 160 MB   |      50 |          25760 |        25810 |
        ------------------------------------------------------
        | 1.6 GB   |       1 |           7729 |         7537 |
        | 1.6 GB   |       2 |          19009 |        23508 | + 23.7%
        | 1.6 GB   |       4 |          33185 |        32666 |
        | 1.6 GB   |       8 |          34550 |        34318 |
        | 1.6 GB   |      16 |          33079 |        32689 |
        | 1.6 GB   |      20 |          31494 |        31702 |
        | 1.6 GB   |      30 |          28535 |        28755 |
        | 1.6 GB   |      40 |          27054 |        27017 |
        | 1.6 GB   |      50 |          25591 |        25560 |
        ------------------------------------------------------
        | 7.6 GB   |       1 |           6224 |         7469 | + 20.0%
        | 7.6 GB   |       2 |          13611 |        12778 |
        | 7.6 GB   |       4 |          33108 |        32927 |
        | 7.6 GB   |       8 |          34712 |        34878 |
        | 7.6 GB   |      16 |          32895 |        33003 |
        | 7.6 GB   |      20 |          31689 |        31974 |
        | 7.6 GB   |      30 |          29003 |        28806 |
        | 7.6 GB   |      40 |          26683 |        26976 |
        | 7.6 GB   |      50 |          25925 |        25652 |
        ------------------------------------------------------
      
      For the aim7 worloads, they overall improved on top of Michel's
      patchset.  For full graphs on how the rwsem series plus this patch
      behaves on a large 8 socket machine against a vanilla kernel:
      
        http://stgolabs.net/rwsem-aim7-results.tar.gzSigned-off-by: default avatarDavidlohr Bueso <davidlohr.bueso@hp.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9607a85b
    • Anatol Pomozov's avatar
      kref: minor cleanup · 2d864e41
      Anatol Pomozov authored
       - make warning smp-safe
       - result of atomic _unless_zero functions should be checked by caller
         to avoid use-after-free error
       - trivial whitespace fix.
      
      Link: https://lkml.org/lkml/2013/4/12/391
      
      Tested: compile x86, boot machine and run xfstests
      Signed-off-by: default avatarAnatol Pomozov <anatol.pomozov@gmail.com>
      [ Removed line-break, changed to use WARN_ON_ONCE()  - Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2d864e41