1. 12 Jun, 2009 40 commits
    • lguest: avoid sending interrupts to Guest when no activity occurs. · 95c517c0
      Rusty Russell authored
      If we track how many buffers we've used, we can tell whether we really
      need to interrupt the Guest.  This happens as a side effect of
      spurious notifications.
      
      Spurious notifications happen because it can take a while before the
      Host thread wakes up and sets the VRING_USED_F_NO_NOTIFY flag, and
      meanwhile the Guest can send more notifications.
      
      A real fix would be to use wake counts, rather than a suppression
      flag, but the practical difference is generally in the noise: the
      interrupt is usually coalesced into a pending one anyway so we just
      save a system call which isn't clearly measurable.
      
      				Secs	Spurious IRQS
      1G TCP Guest->Host:		3.93	58
      1M normal pings:		100	72
      1M 1k pings (-l 120):		57	492904
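The bookkeeping described above can be sketched in plain C. This is a minimal illustration, not the Launcher's code: `struct vq_state`, `add_used()` and `flush_irq()` are hypothetical names, and incrementing `irqs_sent` stands in for whatever actually interrupts the Guest.

```c
#include <stdbool.h>

/* Hypothetical per-virtqueue state: buffers used since the last interrupt. */
struct vq_state {
    unsigned int pending_used;
    unsigned int irqs_sent;
};

/* Record that one buffer was handed back to the Guest. */
static void add_used(struct vq_state *vq)
{
    vq->pending_used++;
}

/* Interrupt only if something was used; returns true if an interrupt was sent. */
static bool flush_irq(struct vq_state *vq)
{
    if (vq->pending_used == 0)
        return false;       /* spurious wakeup: nothing to tell the Guest */
    vq->pending_used = 0;
    vq->irqs_sent++;        /* stand-in for the real interrupt injection */
    return true;
}
```

A spurious notification then costs a wakeup but no interrupt, because `flush_irq()` returns early when no buffer was consumed.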
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: implement deferred interrupts in example Launcher · 38bc2b8c
      Rusty Russell authored
      Rather than sending an interrupt on every buffer, we only send an interrupt
      when we're about to wait for the Guest to send us a new one.  The console
      input and network input still send interrupts manually, but the block device,
      network and console output queues can simply rely on this logic to send
      interrupts to the Guest at the right time.
      
      The patch is cluttered by moving trigger_irq() higher in the code.
      
      In practice, two factors make this optimization less interesting:
      (1) we often only get one input at a time, even for networking,
      (2) triggering an interrupt rapidly tends to get coalesced anyway.
      
      Before:				Secs	RxIRQS	TxIRQs
       1G TCP Guest->Host:		3.72	32784	32771
       1M normal pings:		99	1000004	995541
       100,000 1k pings (-l 120):	5	49510	49058
      
      After:
       1G TCP Guest->Host:		3.69	32809	32769
       1M normal pings:		99	1000004	996196
       100,000 1k pings (-l 120):	5	52435	52361
      
      (Note the interrupt count on 100k pings goes *up*: see next patch).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: remove obsolete LHREQ_BREAK call · 5dac051b
      Rusty Russell authored
      We no longer need an efficient mechanism to force the Guest back into
      host userspace, as each device is serviced without bothering the main
      Guest process (aka. the Launcher).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: have example Launcher service all devices in separate threads · 659a0e66
      Rusty Russell authored
      Currently lguest has three threads: the main Launcher thread, a Waker
      thread, and a thread for the block device (because synchronous block
      was simply too painful to bear).
      
      The Waker selects() on all the input file descriptors (eg. stdin, net
      devices, pipe to the block thread) and when one becomes readable it calls
      into the kernel to kick the Launcher thread out into userspace, which
      repeats the poll, services the device(s), and then tells the kernel to
      release the Waker before re-entering the kernel to run the Guest.
      
      Also, to make a slightly-decent network transmit routine, the Launcher
      would suppress further network interrupts while it set a timer: that
      signal handler would write to a pipe, which would rouse the Waker
      which would prod the Launcher out of the kernel to check the network
      device again.
      
      Now we can convert all our virtqueues to separate threads: each one has
      a separate eventfd for when the Guest pokes the device, and can trigger
      interrupts in the Guest directly.
      
      The linecount shows how much this simplifies, but to really bring it
      home, here's an strace analysis of single Guest->Host ping before:
      
      * Guest sends packet, notifies xmit vq, returns control to Launcher
      * Launcher clears notification flag on xmit ring
      * Launcher writes packet to TUN device
      	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
      * Launcher sets up interrupt for Guest (xmit ring is empty)
      	write(10, "\2\0\0\0\3\0\0\0", 8) = 0
      * Launcher sets up timer for interrupt mitigation
      	setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0
      * Launcher re-runs guest
      	pread64(10, 0xbfa5f4d4, 4, 0) ...
      * Waker notices reply packet in tun device (it was in select)
      	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4])
      * Waker kicks Launcher out of guest:
      	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
      * Launcher returns from running guest:
      	... = -1 EAGAIN (Resource temporarily unavailable)
      * Launcher looks at input fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
      * Launcher reads pong from tun device:
      	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108
      * Launcher injects guest notification:
      	write(10, "\2\0\0\0\2\0\0\0", 8) = 0
      * Launcher rechecks fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
      * Launcher clears Waker:
      	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
      * Launcher reruns Guest:
      	pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted)
      * Signal comes in, uses pipe to wake up Launcher:
      	--- SIGALRM (Alarm clock) @ 0 (0) ---
      	write(8, "\0", 1)       = 1
      	sigreturn()             = ? (mask now [])
      * Waker sees write on pipe:
      	select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6])
      * Waker kicks Launcher out of Guest:
      	pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
      * Launcher exits from kernel:
      	pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable)
      * Launcher looks to see what fd woke it:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
      * Launcher reads timeout fd, sets notification flag on xmit ring
      	read(6, "\0", 32)       = 1
      * Launcher rechecks fds:
      	select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
      * Launcher clears Waker:
      	pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
      * Launcher resumes Guest:
      	pread64(10, "\0p\0\4", 4, 0) ....
      
      strace analysis of single Guest->Host ping after:
      
      * Guest sends packet, notifies xmit vq, creates event on eventfd.
      * Network xmit thread wakes from read on eventfd:
      	read(7, "\1\0\0\0\0\0\0\0", 8)          = 8
      * Network xmit thread writes packet to TUN device
      	writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
      * Network recv thread wakes up from read on tunfd:
      	readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108
      * Network recv thread sets up interrupt for the Guest
      	write(6, "\2\0\0\0\2\0\0\0", 8) = 0
      * Network recv thread goes back to reading tunfd
      	13:39:42.460285 readv(4,  <unfinished ...>
      * Network xmit thread sets up interrupt for Guest (xmit ring is empty)
      	write(6, "\2\0\0\0\3\0\0\0", 8) = 0
      * Network xmit thread goes back to reading from eventfd
      	read(7, <unfinished ...>
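The eventfd handshake visible in the "after" trace can be sketched in user space as follows. This is only an illustration under assumptions: `kick()` and `wait_for_kick()` are hypothetical helper names, and in lguest the kick comes from the kernel on behalf of the Guest rather than from a `write()` in the same process.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Stand-in for the Guest poking a device: add one event to the eventfd. */
static int kick(int efd)
{
    uint64_t one = 1;

    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * What a per-virtqueue thread would block in: read() sleeps until the
 * counter is non-zero, then returns it and resets it to zero, so several
 * kicks may be coalesced into one wakeup. Returns 0 on error.
 */
static uint64_t wait_for_kick(int efd)
{
    uint64_t n = 0;

    if (read(efd, &n, sizeof(n)) != (ssize_t)sizeof(n))
        return 0;
    return n;
}
```

The 8-byte read in the trace (`read(7, "\1\0\0\0\0\0\0\0", 8) = 8`) is exactly this counter read, with a value of 1.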
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: use eventfds for device notification · df60aeef
      Rusty Russell authored
      Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
      an address: the main Launcher process returns with this address, and figures
      out what device to run.
      
      A far nicer model is to let processes bind an eventfd to an address: if we
      find one, we simply signal the eventfd.
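A user-space analogue of this model, as a rough sketch: a small table maps notification addresses to eventfds, and a notification either signals the bound eventfd or reports that nothing is bound so the caller can fall back to the old path. The names `bind_map`, `bind_eventfd()` and `notify()` are hypothetical; the real lookup lives in the kernel and uses the eventfd kernel API rather than `write()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_BINDINGS 8          /* illustrative limit */

struct binding {
    unsigned long addr;         /* address the Guest notifies on */
    int efd;                    /* eventfd to signal for that address */
};

struct bind_map {
    struct binding b[MAX_BINDINGS];
    size_t n;
};

/* Attach an eventfd to an address; returns 0, or -1 if the table is full. */
static int bind_eventfd(struct bind_map *m, unsigned long addr, int efd)
{
    if (m->n == MAX_BINDINGS)
        return -1;
    m->b[m->n].addr = addr;
    m->b[m->n].efd = efd;
    m->n++;
    return 0;
}

/* Returns 1 if an eventfd was signalled, 0 if no binding exists, -1 on error. */
static int notify(const struct bind_map *m, unsigned long addr)
{
    uint64_t one = 1;

    for (size_t i = 0; i < m->n; i++) {
        if (m->b[i].addr == addr)
            return write(m->b[i].efd, &one, sizeof(one)) ==
                   (ssize_t)sizeof(one) ? 1 : -1;
    }
    return 0;                   /* caller falls back to returning the address */
}
```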
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Davide Libenzi <davidel@xmailserver.org>
    • eventfd: export eventfd_signal and eventfd_fget for lguest · 5718607b
      Rusty Russell authored
      lguest wants to attach eventfds to guest notifications, and lguest is
      usually a module.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      To: Davide Libenzi <davidel@xmailserver.org>
    • lguest: allow any process to send interrupts · 9f155a9b
      Rusty Russell authored
      We currently only allow the Launcher process to send interrupts, but
      as we already send interrupts from the hrtimer, it's a simple matter of
      extracting that code into a common set_interrupt routine.
      
      As we switch to a thread per virtqueue, this avoids a bottleneck through the
      main Launcher process.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: PAE fixes · 92b4d8df
      Rusty Russell authored
      1) j wasn't initialized in setup_pagetables, so they weren't set up for me
         causing immediate guest crashes.
      
      2) gpte_addr should not re-read the pmd from the Guest.  Especially
         not BUG_ON() based on the value.  If we ever supported SMP guests,
         they could trigger that.  And the Launcher could also trigger it
         (tho currently root-only).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: PAE support · acdd0b62
      Matias Zabaljauregui authored
      This version requires that host and guest have the same PAE status.
      NX cap is not offered to the guest, yet.
      Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: Add support for kvm_hypercall4() · cefcad17
      Matias Zabaljauregui authored
      Add support for kvm_hypercall4(); PAE wants it.
      
      Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD · ebe0ba84
      Matias Zabaljauregui authored
      replace LHCALL_SET_PMD with LHCALL_SET_PGD hypercall name
      (That's really what it is, and the confusion gets worse with PAE support)
      Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Reported-by: Jeremy Fitzhardinge <jeremy@goop.org>
    • lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated · 90603d15
      Matias Zabaljauregui authored
      Some cleanups and replace direct assignment with native_set_* macros which properly handle 64-bit entries when PAE is activated
      Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: map switcher with executable page table entries · ed1dc778
      Matias Zabaljauregui authored
      Map switcher with executable page table entries.
      (This bug didn't matter before PAE and hence NX support -- RR)
      Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: fix writev returning short on console output · 7b5c806c
      Rusty Russell authored
      I've never seen it here, but I can't find anywhere that says writev
      will write everything.
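A common way to guard against short writes is to loop until every byte has gone out. The sketch below is one way to do that; `writev_all()` is a hypothetical helper name, not necessarily what the patch adds, and it modifies the caller's iovec array as it advances.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Write every byte described by iov[0..iovcnt), retrying after short
 * writes. The iovec array is modified in place.
 * Returns 0 on success, -1 on error (errno set by writev).
 */
static int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t done = writev(fd, iov, iovcnt);

        if (done < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;
        }
        /* Skip elements that were written completely. */
        while (iovcnt > 0 && (size_t)done >= iov->iov_len) {
            done -= iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Advance into a partially written element, if any. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + done;
            iov->iov_len -= done;
        }
    }
    return 0;
}
```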
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: clean up length-used value in example launcher · e606490c
      Rusty Russell authored
      The "len" field in the used ring for virtio indicates the number of
      bytes *written* to the buffer.  This means the guest doesn't have to
      zero the buffers in advance as it always knows the used length.
      
      Erroneously, the console and network example code puts the length
      *read* into that field.  The guest ignores it, but it's wrong.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition. · f086122b
      Matias Zabaljauregui authored
      If GDT_ENTRIES were ever > 256, this could become a problem.
      
      Signed-off-by: Matias Zabaljauregui <zabaljauregui at gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: beyond ARRAY_SIZE of cpu->arch.gdt · 81b79b01
      Roel Kluin authored
      Do not go beyond ARRAY_SIZE of cpu->arch.gdt
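The commit message does not say what the overrun looked like. Assuming it is the usual off-by-one in an index check, the general shape of the fix can be sketched like this. `struct demo_cpu`, `DEMO_GDT_ENTRIES` and `set_gdt_entry()` are hypothetical names for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define DEMO_GDT_ENTRIES 32     /* illustrative size, not the kernel's value */

struct demo_cpu {
    unsigned long long gdt[DEMO_GDT_ENTRIES];
};

/* Store a descriptor only if the index is inside the array. */
static bool set_gdt_entry(struct demo_cpu *cpu, size_t num,
                          unsigned long long desc)
{
    /* Using ">" here would wrongly accept the one-past-the-end index. */
    if (num >= ARRAY_SIZE(cpu->gdt))
        return false;
    cpu->gdt[num] = desc;
    return true;
}
```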
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: clean up example launcher compile flags. · 2644f17d
      Rusty Russell authored
      18 months ago 5bbf89fc changed to loading bzImages directly, and no
      longer manually ungzipping them, so we no longer need libz.
      
      Also, -m32 is useful for those on 64-bit platforms (and harmless on
      32-bit).
      Reported-by: Ron Minnich <rminnich@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: optimize by coding restore_flags and irq_enable in assembler. · 61f4bc83
      Rusty Russell authored
      The downside of the last patch which made restore_flags and irq_enable
      check interrupts is that they are now too big to be patched directly
      into the callsites, so the C versions are always used.
      
      But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all
      the registers.  In fact, we don't need any registers in the fast path,
      so we can do better than this if we actually code them in assembler.
      
      The results are in the noise, but since it's about the same amount of
      code, it's worth applying.
      
      1GB Guest->Host: input(suppressed),output(suppressed)
      Before:
      	Seconds: 0:16.53
      	Packets: 377268,753673
      	Interrupts: 22461,24297
      	Notifications: 1(5245),21303(732370)
      	Net IRQs triggered: 377023(245),42578(711095)
      
      After:
      	Seconds: 0:16.48
      	Packets: 377289,753673
      	Interrupts: 22281,24465
      	Notifications: 1(5245),21296(732377)
      	Net IRQs triggered: 377060(229),42564(711109)
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: improve interrupt handling, speed up stream networking · a32a8813
      Rusty Russell authored
      lguest never checked for pending interrupts when enabling interrupts, and
      things still worked.  However, it makes a significant difference to TCP
      performance, so it's time we fixed it by introducing a pending_irq flag
      and checking it on irq_restore and irq_enable.
      
      These two routines are now too big to patch into the 8/10 bytes
      patch space, so we drop that code.
      
      Note: The high latency on interrupt delivery had a very curious
      effect: once everything else was optimized, networking without GSO was
      faster than networking with GSO, since more interrupts were sent and
      hence a greater chance of one getting through to the Guest!
      
      Note2: (Almost) Closing the same loophole for iret doesn't have any
      measurable effect, so I'm leaving that patch for the moment.
      
      Before:
      	1GB tcpblast Guest->Host:		30.7 seconds
      	1GB tcpblast Guest->Host (no GSO):	76.0 seconds
      
      After:
      	1GB tcpblast Guest->Host:		6.8 seconds
      	1GB tcpblast Guest->Host (no GSO):	27.8 seconds
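The pending-flag idea can be modelled in a few lines of C. This is a toy model under assumptions, not the Guest's real code: `struct demo_guest` and both functions are hypothetical, and the `hypercalls` counter stands in for the Guest asking the Host to deliver whatever queued up while interrupts were off.

```c
#include <stdbool.h>

struct demo_guest {
    bool irq_enabled;
    bool irq_pending;           /* set by the Host while interrupts are off */
    unsigned int hypercalls;    /* times the Guest asked the Host to deliver */
};

/* Host side: an interrupt arrives while the Guest has interrupts disabled. */
static void host_post_irq(struct demo_guest *g)
{
    g->irq_pending = true;
}

/* Guest side: enabling interrupts also checks for anything that queued up. */
static void guest_irq_enable(struct demo_guest *g)
{
    g->irq_enabled = true;
    if (g->irq_pending) {
        g->irq_pending = false;
        g->hypercalls++;        /* stand-in for trapping to the Host */
    }
}
```

Without the check in `guest_irq_enable()`, a posted interrupt would wait until the next unrelated exit to the Host, which is the delivery latency the numbers above reflect.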
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: fix race in halt code · abd41f03
      Rusty Russell authored
      When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting
      that a timer or the Waker will wake_up_process() us.
      
      But we do it in a stupid way, leaving a classic missing wakeup race.
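For readers unfamiliar with the pattern, a missing-wakeup race occurs when the sleeper checks its condition, the waker fires in the gap, and the sleeper then goes to sleep anyway. The sketch below shows the race-free shape using a pthread mutex and condition variable as a user-space analogue; it is not the kernel code, and `struct halt_state` and both functions are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

struct halt_state {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    bool irq_pending;
};

/* Waker: publish the condition and wake the sleeper under the same lock. */
static void post_interrupt(struct halt_state *h)
{
    pthread_mutex_lock(&h->lock);
    h->irq_pending = true;
    pthread_cond_signal(&h->wake);
    pthread_mutex_unlock(&h->lock);
}

/* Sleeper: the check and the sleep are atomic with respect to the waker. */
static void halt_until_interrupt(struct halt_state *h)
{
    pthread_mutex_lock(&h->lock);
    while (!h->irq_pending)     /* already pending: never go to sleep */
        pthread_cond_wait(&h->wake, &h->lock);
    h->irq_pending = false;
    pthread_mutex_unlock(&h->lock);
}
```

The kernel equivalent is to mark the task as sleeping first and only then test the condition, so a wakeup that arrives in between cancels the sleep.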
      
      So split maybe_do_interrupt() into interrupt_pending() and
      try_deliver_interrupt(), and check interrupt_pending() and the
      "break_out" flag before calling schedule.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: remove invalid interrupt forcing logic. · ebf9a5a9
      Rusty Russell authored
      20887611 (lguest: notify on empty) introduced lguest support for the
      VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on interrupts all
      the time.
      
      Because we always process one buffer at a time, the inflight count is always 0
      when we call trigger_irq and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from
      the Guest.
      
      It should be looking to see if there are more buffers in the Guest's queue:
      if it's empty, then we force an interrupt.
      
      This makes little difference, since we usually have an empty queue; but
      that's the subject of another patch.
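The intended decision can be written as a small predicate. This is a sketch that assumes notify-on-empty has been negotiated; `should_interrupt()` and its parameters are hypothetical, with `last_avail_idx` standing for the Host's count of buffers it has already consumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1    /* flag value from the virtio ring ABI */

/*
 * Decide whether to interrupt the Guest after using a buffer.
 * avail_idx is the Guest's producer index; last_avail_idx is how far
 * the Host has consumed. Equal indices mean the Guest's queue is empty.
 */
static bool should_interrupt(uint16_t avail_flags, uint16_t avail_idx,
                             uint16_t last_avail_idx)
{
    bool queue_empty = (avail_idx == last_avail_idx);

    if (!(avail_flags & VRING_AVAIL_F_NO_INTERRUPT))
        return true;            /* Guest did not ask for suppression */
    return queue_empty;         /* suppression is overridden only when empty */
}
```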
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: fix lguest wake on guest clock tick, or fd activity · a6c372de
      Rusty Russell authored
      The Launcher could be inside the Guest on another CPU; wake_up_process
      will do nothing because it is "running".  kick_process will knock it
      back into our kernel in this case, otherwise we'll miss it until the
      next guest exit.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • sched: export kick_process · b43e3521
      Rusty Russell authored
      lguest needs kick_process: wake_up_process() does nothing if a process
      is running, which isn't sufficient (we need it in the kernel).
      
      And lguest support is usually modular.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
    • lguest: get more serious about wmb() in example Launcher code · f7027c63
      Rusty Russell authored
      Since the Launcher process runs the Guest, it doesn't have to be very
      serious about its barriers: the Guest isn't running while we are (Guest
      is UP).
      
      Before we change to use threads to service devices, we need to fix this.
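Once a device thread can run concurrently with the Guest, the order of stores matters: the used-ring entry must be visible before the index that publishes it. The sketch below shows that ordering with C11 atomics, where a release fence plays the role of `wmb()`. The ring layout and `publish_used()` are hypothetical simplifications, not the virtio structures.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8             /* illustrative ring size */

struct demo_used_ring {
    uint32_t len[RING_SIZE];    /* payload: bytes written per buffer */
    _Atomic uint16_t idx;       /* producer index read by the other side */
};

/* Fill in the entry first, then publish it by bumping the index. */
static void publish_used(struct demo_used_ring *r, uint32_t len)
{
    uint16_t i = atomic_load_explicit(&r->idx, memory_order_relaxed);

    r->len[i % RING_SIZE] = len;
    /* Write barrier: the entry must be visible before the new index. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->idx, (uint16_t)(i + 1), memory_order_relaxed);
}
```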
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: clean up lguest_init_IRQ · 1028375e
      Rusty Russell authored
      Copy from arch/x86/kernel/irqinit_32.c: we don't use the vectors beyond
      LGUEST_IRQS (if any), but we might as well set them all.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: cleanup passing of /dev/lguest fd around example launcher. · 56739c80
      Rusty Russell authored
      We hand the /dev/lguest fd everywhere; it's far neater to just make it
      a global (it already is, in fact, hidden in the waker_fds struct).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lguest: be paranoid about guest playing with device descriptors. · 713b15b3
      Rusty Russell authored
      We can't trust the values in the device descriptor table once the
      guest has booted, so keep local copies.  They could set them to
      strange values then cause us to segv (they're 8 bit values, so they
      can't make our pointers go too wild).
      
      This becomes more important with the following patches which read them.
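The defensive pattern is to snapshot the lengths once, while they are still trustworthy, and compute every later offset from the snapshot. The sketch below illustrates that under assumptions: the struct layouts, `DEMO_VQ_CONFIG_SIZE`, `device_snapshot()` and `config_offset()` are all hypothetical and do not reproduce the Launcher's actual descriptor format.

```c
#include <stdint.h>

#define DEMO_VQ_CONFIG_SIZE 8u  /* hypothetical bytes per virtqueue entry */

/* Guest-writable descriptor: every field is an untrusted 8-bit value. */
struct demo_desc {
    uint8_t type, num_vq, feature_len, config_len;
};

struct demo_device {
    struct demo_desc *shared;   /* lives in Guest memory */
    uint8_t num_vq;             /* trusted copies taken at setup time */
    uint8_t feature_len;
    uint8_t config_len;
};

/* Take local copies before the Guest gets a chance to scribble on them. */
static void device_snapshot(struct demo_device *dev, struct demo_desc *shared)
{
    dev->shared = shared;
    dev->num_vq = shared->num_vq;
    dev->feature_len = shared->feature_len;
    dev->config_len = shared->config_len;
}

/* Offset of the config area, computed only from the trusted copies. */
static unsigned int config_offset(const struct demo_device *dev)
{
    return (unsigned int)sizeof(struct demo_desc)
         + dev->num_vq * DEMO_VQ_CONFIG_SIZE
         + dev->feature_len * 2u;
}
```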
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • block: fix kernel-doc in recent block/ changes · 8ebf9756
      Randy Dunlap authored
      Fix kernel-doc warnings in recently changed block/ source code.
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 4b4f1d01
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (87 commits)
        nilfs2: get rid of bd_mount_sem use from nilfs
        nilfs2: correct exclusion control in nilfs_remount function
        nilfs2: simplify remaining sget() use
        nilfs2: get rid of sget use for checking if current mount is present
        nilfs2: get rid of sget use for acquiring nilfs object
        nilfs2: remove meaningless EBUSY case from nilfs_get_sb function
        remove the call to ->write_super in __sync_filesystem
        nilfs2: call nilfs2_write_super from nilfs2_sync_fs
        jffs2: call jffs2_write_super from jffs2_sync_fs
        ufs: add ->sync_fs
        sysv: add ->sync_fs
        hfsplus: add ->sync_fs
        hfs: add ->sync_fs
        fat: add ->sync_fs
        ext2: add ->sync_fs
        exofs: add ->sync_fs
        bfs: add ->sync_fs
        affs: add ->sync_fs
        sanitize ->fsync() for affs
        repair bfs_write_inode(), switch bfs to simple_fsync()
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 875287ca
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68knommu: remove unnecessary include of thread_info.h in entry.S
        m68knommu: enumerate INIT_THREAD fields properly
        headers_check fix: m68k, swab.h
        arch/m68knommu: Convert #ifdef DEBUG printk(KERN_DEBUG to pr_debug(
        m68knommu: remove obsolete reset code
        m68knommu: move CPU reset code for the 5272 ColdFire into its platform code
        m68knommu: move CPU reset code for the 528x ColdFire into its platform code
        m68knommu: move CPU reset code for the 527x ColdFire into its platform code
        m68knommu: move CPU reset code for the 523x ColdFire into its platform code
        m68knommu: move CPU reset code for the 520x ColdFire into its platform code
        m68knommu: add CPU reset code for the 532x ColdFire
        m68knommu: add CPU reset code for the 5249 ColdFire
        m68knommu: add CPU reset code for the 5206e ColdFire
        m68knommu: add CPU reset code for the 5206 ColdFire
        m68knommu: add CPU reset code for the 5407 ColdFire
        m68knommu: add CPU reset code for the 5307 ColdFire
        m68knommu: merge system reset for code ColdFire 523x family
        m68knommu: fix system reset for ColdFire 527x family
    • kvm: remove the duplicated cpumask_clear · aee74f3b
      Yinghai Lu authored
      zalloc_cpumask_var already cleared it.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: use zalloc_cpumask_var in arch_early_irq_init · 12274e96
      Yinghai Lu authored
      So we make sure MAXSMP gets a cleared cpumask
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • perfcounters: remove powerpc definitions of perf_counter_do_pending · e14112d1
      Stephen Rothwell authored
      Commit 925d519a ("perf_counter: unify and fix delayed counter wakeup")
      added global definitions.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: get rid of bd_mount_sem use from nilfs · aa7dfb89
      Ryusuke Konishi authored
      This will remove every bd_mount_sem use in nilfs.
      
      The intended exclusion control was replaced by the previous patch
      ("nilfs2: correct exclusion control in nilfs_remount function") for
      nilfs_remount(), and this patch replaces the remaining uses with a new
      mutex that it adds to the nilfs object.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nilfs2: correct exclusion control in nilfs_remount function · e59399d0
      Ryusuke Konishi authored
      nilfs_remount() changes mount state of a superblock instance.  Even
      though nilfs accesses other superblock instances during mount or
      remount, the mount state was not properly protected in
      nilfs_remount().
      
      Moreover, nilfs_remount() has a lock order reversal problem;
      nilfs_get_sb() holds:
      
        1. bdev->bd_mount_sem
        2. sb->s_umount  (sget acquires)
      
      and nilfs_remount() holds:
      
        1. sb->s_umount  (locked by the caller in vfs)
        2. bdev->bd_mount_sem
      
      To avoid these problems, this patch divides a semaphore protecting
      super block instances from nilfs->ns_sem, and applies it to the mount
      state protection in nilfs_remount().
      
      With this change, bd_mount_sem use is removed from nilfs_remount() and
      the lock order reversal will be resolved.  And the new rw-semaphore,
      nilfs->ns_super_sem will properly protect the mount state except the
      modification from nilfs_error function.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nilfs2: simplify remaining sget() use · 6dd47406
      Ryusuke Konishi authored
      This simplifies the test function passed on the remaining sget()
      callsite in nilfs.
      
      Instead of checking mount type (i.e. ro-mount/rw-mount/snapshot mount)
      in the test function passed to sget(), this patch first looks up the
      nilfs_sb_info struct which the given mount type matches, and then
      acquires the super block instance holding the nilfs_sb_info.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nilfs2: get rid of sget use for checking if current mount is present · 3f82ff55
      Ryusuke Konishi authored
      This stops using sget() for checking if an r/w-mount or an r/o-mount
      exists on the device.  This elimination uses a back pointer to the
      current mount added to nilfs object.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nilfs2: get rid of sget use for acquiring nilfs object · 33c8e57c
      Ryusuke Konishi authored
      This will change the way to obtain nilfs object in nilfs_get_sb()
      function.
      
      Previously, a preliminary sget() call was performed, and the nilfs
      object was acquired from a super block instance found by the sget()
      call.
      
      This patch instead introduces a new dedicated function,
      find_or_create_nilfs(); as the name implies, the function finds an
      existing nilfs object in a global list or creates a new one if no
      object is found on the device.
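The find-or-create pattern itself is simple and can be sketched as below. This is a generic illustration, not nilfs code: `struct demo_obj`, `demo_objects` and `find_or_create()` are hypothetical, devices are identified by a plain integer, and the locking that a real global list needs is omitted for brevity.

```c
#include <stdlib.h>

struct demo_obj {
    int dev_id;                 /* stand-in for the block device identity */
    int refcount;
    struct demo_obj *next;
};

/* Global list of live objects. A real implementation must lock around it. */
static struct demo_obj *demo_objects;

/* Return the existing object for dev_id, or allocate and register a new one. */
static struct demo_obj *find_or_create(int dev_id)
{
    struct demo_obj *o;

    for (o = demo_objects; o; o = o->next) {
        if (o->dev_id == dev_id) {
            o->refcount++;      /* found: hand out another reference */
            return o;
        }
    }
    o = calloc(1, sizeof(*o));
    if (!o)
        return NULL;
    o->dev_id = dev_id;
    o->refcount = 1;
    o->next = demo_objects;     /* not found: insert at the list head */
    demo_objects = o;
    return o;
}
```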
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nilfs2: remove meaningless EBUSY case from nilfs_get_sb function · 81fc20bd
      Ryusuke Konishi authored
      The following EBUSY case in nilfs_get_sb() is meaningless.  Indeed,
      this error code is never returned to the caller.
      
          if (!s->s_root) {
                ...
          } else if (!(s->s_flags & MS_RDONLY)) {
              err = -EBUSY;
          }
      
      This simply removes the else case.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>