1. 05 Oct, 2014 40 commits
    • Axel Lin's avatar
      spi: dw: Don't use devm_kzalloc in master->setup callback · 2ac3e493
      Axel Lin authored
      commit a97c883a upstream.
      
      device_add() expects that any memory allocated via devm_* API is only
      done in the device's probe function.
      
      Fix below boot warning:
      WARNING: CPU: 1 PID: 1 at drivers/base/dd.c:286 driver_probe_device+0x2b4/0x2f4()
      Modules linked in:
      CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.16.0-10474-g835c90b-dirty #160
      [<c0016364>] (unwind_backtrace) from [<c001251c>] (show_stack+0x20/0x24)
      [<c001251c>] (show_stack) from [<c04eaefc>] (dump_stack+0x7c/0x98)
      [<c04eaefc>] (dump_stack) from [<c0023d4c>] (warn_slowpath_common+0x78/0x9c)
      [<c0023d4c>] (warn_slowpath_common) from [<c0023d9c>] (warn_slowpath_null+0x2c/0x34)
      [<c0023d9c>] (warn_slowpath_null) from [<c0302c60>] (driver_probe_device+0x2b4/0x2f4)
      [<c0302c60>] (driver_probe_device) from [<c0302d90>] (__device_attach+0x50/0x54)
      [<c0302d90>] (__device_attach) from [<c0300e60>] (bus_for_each_drv+0x54/0x9c)
      [<c0300e60>] (bus_for_each_drv) from [<c0302958>] (device_attach+0x84/0x90)
      [<c0302958>] (device_attach) from [<c0301f10>] (bus_probe_device+0x94/0xb8)
      [<c0301f10>] (bus_probe_device) from [<c03000c0>] (device_add+0x434/0x4fc)
      [<c03000c0>] (device_add) from [<c0342dd4>] (spi_add_device+0x98/0x164)
      [<c0342dd4>] (spi_add_device) from [<c03444a4>] (spi_register_master+0x598/0x768)
      [<c03444a4>] (spi_register_master) from [<c03446b4>] (devm_spi_register_master+0x40/0x80)
      [<c03446b4>] (devm_spi_register_master) from [<c0346214>] (dw_spi_add_host+0x1a8/0x258)
      [<c0346214>] (dw_spi_add_host) from [<c0346920>] (dw_spi_mmio_probe+0x1d4/0x294)
      [<c0346920>] (dw_spi_mmio_probe) from [<c0304560>] (platform_drv_probe+0x3c/0x6c)
      [<c0304560>] (platform_drv_probe) from [<c0302a98>] (driver_probe_device+0xec/0x2f4)
      [<c0302a98>] (driver_probe_device) from [<c0302d3c>] (__driver_attach+0x9c/0xa0)
      [<c0302d3c>] (__driver_attach) from [<c0300f0c>] (bus_for_each_dev+0x64/0x98)
      [<c0300f0c>] (bus_for_each_dev) from [<c0302518>] (driver_attach+0x2c/0x30)
      [<c0302518>] (driver_attach) from [<c0302134>] (bus_add_driver+0xdc/0x1f4)
      [<c0302134>] (bus_add_driver) from [<c03035c8>] (driver_register+0x88/0x104)
      [<c03035c8>] (driver_register) from [<c030445c>] (__platform_driver_register+0x58/0x6c)
      [<c030445c>] (__platform_driver_register) from [<c0700f00>] (dw_spi_mmio_driver_init+0x18/0x20)
      [<c0700f00>] (dw_spi_mmio_driver_init) from [<c0008914>] (do_one_initcall+0x90/0x1d4)
      [<c0008914>] (do_one_initcall) from [<c06d7d90>] (kernel_init_freeable+0x178/0x248)
      [<c06d7d90>] (kernel_init_freeable) from [<c04e687c>] (kernel_init+0x18/0xfc)
      [<c04e687c>] (kernel_init) from [<c000ecd8>] (ret_from_fork+0x14/0x20)
      Reported-by: default avatarThor Thayer <tthayer@opensource.altera.com>
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ac3e493
    • Axel Lin's avatar
      spi: fsl: Don't use devm_kzalloc in master->setup callback · f9078a24
      Axel Lin authored
      commit d9f26748 upstream.
      
      device_add() expects that any memory allocated via devm_* API is only
      done in the device's probe function.
      
      Fix below boot warning:
      [    3.092348] WARNING: at drivers/base/dd.c:286
      [    3.096637] Modules linked in:
      [    3.099697] CPU: 0 PID: 25 Comm: kworker/u2:1 Tainted: G W 3.16.1-s3k-drv-999-svn5771_knld-999 #158
      [ 3.109610] Workqueue: deferwq deferred_probe_work_func
      [    3.114736] task: c787f020 ti: c790c000 task.ti: c790c000
      [    3.120062] NIP: c01df158 LR: c01df144 CTR: 00000000
      [    3.124983] REGS: c790db30 TRAP: 0700   Tainted: G        W (3.16.1-s3k-drv-999-svn5771_knld-999)
      [    3.134162] MSR: 00029032 <EE,ME,IR,DR,RI>  CR: 22002082 XER: 20000000
      [    3.140703]
      [    3.140703] GPR00: 00000001 c790dbe0 c787f020 00000044 00000054 00000308 c056da0e 20737069
      [    3.140703] GPR08: 33323736 000ebfe0 00000308 000ebfdf 22002082 00000000 c046c5a0 c046c608
      [    3.140703] GPR16: c046c614 c046c620 c046c62c c046c638 c046c648 c046c654 c046c68c c046c6c4
      [    3.140703] GPR24: 00000000 00000000 00000003 c0401aa0 c0596638 c059662c c054e7a8 c7996800
      [    3.170102] NIP [c01df158] driver_probe_device+0xf8/0x334
      [    3.175431] LR [c01df144] driver_probe_device+0xe4/0x334
      [    3.180633] Call Trace:
      [    3.183093] [c790dbe0] [c01df144] driver_probe_device+0xe4/0x334 (unreliable)
      [    3.190147] [c790dc10] [c01dd15c] bus_for_each_drv+0x7c/0xc0
      [    3.195741] [c790dc40] [c01df5fc] device_attach+0xcc/0xf8
      [    3.201076] [c790dc60] [c01dd6d4] bus_probe_device+0xb4/0xc4
      [    3.206666] [c790dc80] [c01db9f8] device_add+0x270/0x564
      [    3.211923] [c790dcc0] [c0219e84] spi_add_device+0xc0/0x190
      [    3.217427] [c790dce0] [c021a79c] spi_register_master+0x720/0x834
      [    3.223455] [c790dd40] [c021cb48] of_fsl_spi_probe+0x55c/0x614
      [    3.229234] [c790dda0] [c01e0d2c] platform_drv_probe+0x30/0x74
      [    3.234987] [c790ddb0] [c01df18c] driver_probe_device+0x12c/0x334
      [    3.241008] [c790dde0] [c01dd15c] bus_for_each_drv+0x7c/0xc0
      [    3.246602] [c790de10] [c01df5fc] device_attach+0xcc/0xf8
      [    3.251937] [c790de30] [c01dd6d4] bus_probe_device+0xb4/0xc4
      [    3.257536] [c790de50] [c01de9d8] deferred_probe_work_func+0x98/0xe0
      [    3.263816] [c790de70] [c00305b8] process_one_work+0x18c/0x440
      [    3.269577] [c790dea0] [c0030a00] worker_thread+0x194/0x67c
      [    3.275105] [c790def0] [c0039198] kthread+0xd0/0xe4
      [    3.279911] [c790df40] [c000c6d0] ret_from_kernel_thread+0x5c/0x64
      [    3.285970] Instruction dump:
      [    3.288900] 80de0000 419e01d0 3b7b0038 3c60c046 7f65db78 38635264 48211b99 813f00a0
      [    3.296559] 381f00a0 7d290278 3169ffff 7c0b4910 <0f000000> 93df0044 7fe3fb78 4bfffd4d
      Reported-by: default avatarleroy christophe <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarAxel Lin <axel.lin@ingics.com>
      Tested-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9078a24
    • Matan Barak's avatar
      IB/core: When marshaling uverbs path, clear unused fields · e60309c8
      Matan Barak authored
      commit a59c5850 upstream.
      
      When marsheling a user path to the kernel struct ib_sa_path, need
      to zero smac, dmac and set the vlan id to the "no vlan" value.
      
      Fixes: dd5f03be ("IB/core: Ethernet L2 attributes in verbs/cm structures")
      Reported-by: default avatarAleksey Senin <alekseys@mellanox.com>
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e60309c8
    • Moni Shoua's avatar
      IB/mlx4: Don't duplicate the default RoCE GID · a85dce19
      Moni Shoua authored
      commit f5c4834d upstream.
      
      When reading the IPv6 addresses from the net-device, make sure to
      avoid adding a duplicate entry to the GID table because of equality
      between the default GID we generate and the default IPv6 link-local
      address of the device.
      
      Fixes: acc4fccf ("IB/mlx4: Make sure GID index 0 is always occupied")
      Signed-off-by: default avatarMoni Shoua <monis@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a85dce19
    • Moni Shoua's avatar
      IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs() · 23636720
      Moni Shoua authored
      commit e381835c upstream.
      
      When Ethernet netdev is not present for a port (e.g. when the link
      layer type of the port is InfiniBand) it's possible to dereference a
      null pointer when we do netdevice scanning.
      
      To fix that, we move a section of code that needs to run only when
      netdev is present to a proper if () statement.
      
      Fixes: ad4885d2 ("IB/mlx4: Build the port IBoE GID table properly under bonding")
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarMoni Shoua <monis@mellanox.com>
      Signed-off-by: default avatarOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      23636720
    • Mike Marciniszyn's avatar
      IB/qib: Correct reference counting in debugfs qp_stats · f59c9cae
      Mike Marciniszyn authored
      commit 85cbb7c7 upstream.
      
      This particular reference count is not needed with the rcu protection,
      and the current code leaks a reference count, causing a hang in
      qib_qp_destroy().
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarRoland Dreier <roland@purestorage.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f59c9cae
    • Al Viro's avatar
      GFS2: fix d_splice_alias() misuses · 778c2935
      Al Viro authored
      commit cfb2f9d5 upstream.
      
      Callers of d_splice_alias(dentry, inode) don't need iput(), neither
      on success nor on failure.  Either the reference to inode is stored
      in a previously negative dentry, or it's dropped.  In either case
      inode reference the caller used to hold is consumed.
      
      __gfs2_lookup() does iput() in case when d_splice_alias() has failed.
      Double iput() if we ever hit that.  And gfs2_create_inode() ends up
      not only with double iput(), but with link count dropped to zero - on
      an inode it has just found in directory.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      778c2935
    • Amit Shah's avatar
      Revert "hwrng: virtio - ensure reads happen after successful probe" · 60ede3e7
      Amit Shah authored
      commit eeec6263 upstream.
      
      This reverts commit e052dbf5.
      
      Now that we use the virtio ->scan() function to register with the hwrng
      core, we will not get read requests till probe is successfully finished.
      
      So revert the workaround we had in place to refuse read requests while
      we were not yet setup completely.
      Signed-off-by: default avatarAmit Shah <amit.shah@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60ede3e7
    • Amit Shah's avatar
      virtio: rng: delay hwrng_register() till driver is ready · 5590b04b
      Amit Shah authored
      commit 5c062734 upstream.
      
      Instead of calling hwrng_register() in the probe routing, call it in the
      scan routine.  This ensures that when hwrng_register() is successful,
      and it requests a few random bytes to seed the kernel's pool at init,
      we're ready to service that request.
      
      This will also enable us to remove the workaround added previously to
      check whether probe was completed, and only then ask for data from the
      host.  The revert follows in the next commit.
      
      There's a slight behaviour change here on unsuccessful hwrng_register().
      Previously, when hwrng_register() failed, the probe() routine would
      fail, and the vqs would be torn down, and driver would be marked not
      initialized.  Now, the vqs will remain initialized, driver would be
      marked initialized as well, but won't be available in the list of RNGs
      available to hwrng core.  To fix the failures, the procedure remains the
      same, i.e. unload and re-load the module, and hope things succeed the
      next time around.
      Signed-off-by: default avatarAmit Shah <amit.shah@redhat.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      5590b04b
    • Richard Larocque's avatar
      alarmtimer: Lock k_itimer during timer callback · 6ba3934d
      Richard Larocque authored
      commit 474e941b upstream.
      
      Locks the k_itimer's it_lock member when handling the alarm timer's
      expiry callback.
      
      The regular posix timers defined in posix-timers.c have this lock held
      during timout processing because their callbacks are routed through
      posix_timer_fn().  The alarm timers follow a different path, so they
      ought to grab the lock somewhere else.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ba3934d
    • Richard Larocque's avatar
      alarmtimer: Do not signal SIGEV_NONE timers · f11d2259
      Richard Larocque authored
      commit 265b81d2 upstream.
      
      Avoids sending a signal to alarm timers created with sigev_notify set to
      SIGEV_NONE by checking for that special case in the timeout callback.
      
      The regular posix timers avoid sending signals to SIGEV_NONE timers by
      not scheduling any callbacks for them in the first place.  Although it
      would be possible to do something similar for alarm timers, it's simpler
      to handle this as a special case in the timeout.
      
      Prior to this patch, the alarm timer would ignore the sigev_notify value
      and try to deliver signals to the process anyway.  Even worse, the
      sanity check for the value of sigev_signo is skipped when SIGEV_NONE was
      specified, so the signal number could be bogus.  If sigev_signo was an
      unitialized value (as it often would be if SIGEV_NONE is used), then
      it's hard to predict which signal will be sent.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f11d2259
    • Richard Larocque's avatar
      alarmtimer: Return relative times in timer_gettime · bffe4248
      Richard Larocque authored
      commit e86fea76 upstream.
      
      Returns the time remaining for an alarm timer, rather than the time at
      which it is scheduled to expire.  If the timer has already expired or it
      is not currently scheduled, the it_value's members are set to zero.
      
      This new behavior matches that of the other posix-timers and the POSIX
      specifications.
      
      This is a change in user-visible behavior, and may break existing
      applications.  Hopefully, few users rely on the old incorrect behavior.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sharvil Nanavati <sharvil@google.com>
      Signed-off-by: default avatarRichard Larocque <rlarocque@google.com>
      [jstultz: minor style tweak]
      Signed-off-by: default avatarJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bffe4248
    • John David Anglin's avatar
      parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds · 79b9b729
      John David Anglin authored
      commit d26a7730 upstream.
      
      In spite of what the GCC manual says, the -mfast-indirect-calls has
      never been supported in the 64-bit parisc compiler. Indirect calls have
      always been done using function descriptors irrespective of the
      -mfast-indirect-calls option.
      
      Recently, it was noticed that a function descriptor was always requested
      when the -mfast-indirect-calls option was specified. This caused
      problems when the option was used in  application code and doesn't make
      any sense because the whole point of the option is to avoid using a
      function descriptor for indirect calls.
      
      Fixing this broke 64-bit kernel builds.
      
      I will fix GCC but for now we need the attached change. This results in
      the same kernel code as before.
      Signed-off-by: default avatarJohn David Anglin <dave.anglin@bell.net>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79b9b729
    • Guy Martin's avatar
      parisc: Implement new LWS CAS supporting 64 bit operations. · 43ed39e4
      Guy Martin authored
      commit 89206491 upstream.
      
      The current LWS cas only works correctly for 32bit. The new LWS allows
      for CAS operations of variable size.
      Signed-off-by: default avatarGuy Martin <gmsoft@tuxicoman.be>
      Signed-off-by: default avatarHelge Deller <deller@gmx.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      43ed39e4
    • Al Viro's avatar
      don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() · 8ed0282d
      Al Viro authored
      commit 7bd88377 upstream.
      
      return the value instead, and have path_init() do the assignment.  Broken by
      "vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
      which was Cc-stable with 2.6.38+ as destination.  This one should go where
      it went.
      
      To avoid dummy value returned in case when root is already set (it would do
      no harm, actually, since the only caller that doesn't ignore the return value
      is guaranteed to have nd->root *not* set, but it's more obvious that way),
      lift the check into callers.  And do the same to set_root(), to keep them
      in sync.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8ed0282d
    • Richard Genoud's avatar
      tty/serial: at91: BUG: disable interrupts when !UART_ENABLE_MS() · f60133a9
      Richard Genoud authored
      commit 35b675b9 upstream.
      
      In set_termios(), interrupts where not disabled if UART_ENABLE_MS() was
      false.
      
      Tested on at91sam9g35.
      Signed-off-by: default avatarRichard Genoud <richard.genoud@gmail.com>
      Reviewed-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Acked-by: default avatarNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f60133a9
    • Michael Ellerman's avatar
      powerpc: Add smp_mb()s to arch_spin_unlock_wait() · 4e43bbd4
      Michael Ellerman authored
      commit 78e05b14 upstream.
      
      Similar to the previous commit which described why we need to add a
      barrier to arch_spin_is_locked(), we have a similar problem with
      spin_unlock_wait().
      
      We need a barrier on entry to ensure any spinlock we have previously
      taken is visibly locked prior to the load of lock->slock.
      
      It's also not clear if spin_unlock_wait() is intended to have ACQUIRE
      semantics. For now be conservative and add a barrier on exit to give it
      ACQUIRE semantics.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e43bbd4
    • Michael Ellerman's avatar
      powerpc: Add smp_mb() to arch_spin_is_locked() · 2dd10ce8
      Michael Ellerman authored
      commit 51d7d520 upstream.
      
      The kernel defines the function spin_is_locked(), which can be used to
      check if a spinlock is currently locked.
      
      Using spin_is_locked() on a lock you don't hold is obviously racy. That
      is, even though you may observe that the lock is unlocked, it may become
      locked at any time.
      
      There is (at least) one exception to that, which is if two locks are
      used as a pair, and the holder of each checks the status of the other
      before doing any update.
      
      Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic
      value:
      
      The first CPU does:
      
      	spin_lock(*A)
      
      	if spin_is_locked(*B)
      		# nothing
      	else
      		smp_mb()
      		LOAD r = *COUNTER
      		r++
      		STORE *COUNTER = r
      
      	spin_unlock(*A)
      
      And the second CPU does:
      
      	spin_lock(*B)
      
      	if spin_is_locked(*A)
      		# nothing
      	else
      		smp_mb()
      		LOAD r = *COUNTER
      		r++
      		STORE *COUNTER = r
      
      	spin_unlock(*B)
      
      Although this is a strange locking construct, it should work.
      
      It seems to be understood, but not documented, that spin_is_locked() is
      not a memory barrier, so in the examples above and below the caller
      inserts its own memory barrier before acting on the result of
      spin_is_locked().
      
      For now we assume spin_is_locked() is implemented as below, and we break
      it out in our examples:
      
      	bool spin_is_locked(*LOCK) {
      		LOAD l = *LOCK
      		return l.locked
      	}
      
      Our intuition is that there should be no problem even if the two code
      sequences run simultaneously such as:
      
      	CPU 0			CPU 1
      	==================================================
      	spin_lock(*A)		spin_lock(*B)
      	LOAD b = *B		LOAD a = *A
      	if b.locked # true	if a.locked # true
      	# nothing		# nothing
      	spin_unlock(*A)		spin_unlock(*B)
      
      If one CPU gets the lock before the other then it will do the update and
      the other CPU will back off:
      
      	CPU 0			CPU 1
      	==================================================
      	spin_lock(*A)
      	LOAD b = *B
      				spin_lock(*B)
      	if b.locked # false	LOAD a = *A
      	else			if a.locked # true
      	smp_mb()		# nothing
      	LOAD r1 = *COUNTER	spin_unlock(*B)
      	r1++
      	STORE *COUNTER = r1
      	spin_unlock(*A)
      
      However in reality spin_lock() itself is not indivisible. On powerpc we
      implement it as a load-and-reserve and store-conditional.
      
      Ignoring the retry logic for the lost reservation case, it boils down to:
      	spin_lock(*LOCK) {
      		LOAD l = *LOCK
      		l.locked = true
      		STORE *LOCK = l
      		ACQUIRE_BARRIER
      	}
      
      The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as
      defined in memory-barriers.txt:
      
           This acts as a one-way permeable barrier.  It guarantees that all
           memory operations after the ACQUIRE operation will appear to happen
           after the ACQUIRE operation with respect to the other components of
           the system.
      
      On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is
      also know as "lightweight sync", or "sync 1".
      
      As described in Power ISA v2.07 section B.2.1.1, in this scenario the
      lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to
      act as the barrier, preventing any loads or stores in the locked region
      from occurring prior to the load of *LOCK.
      
      Whether this behaviour is in accordance with the definition of ACQUIRE
      semantics in memory-barriers.txt is open to discussion, we may switch to
      a different barrier in future.
      
      What this means in practice is that the following can occur:
      
      	CPU 0			CPU 1
      	==================================================
      	LOAD a = *A 		LOAD b = *B
      	a.locked = true		b.locked = true
      	LOAD b = *B		LOAD a = *A
      	STORE *A = a		STORE *B = b
      	if b.locked # false	if a.locked # false
      	else			else
      	smp_mb()		smp_mb()
      	LOAD r1 = *COUNTER	LOAD r2 = *COUNTER
      	r1++			r2++
      	STORE *COUNTER = r1
      				STORE *COUNTER = r2	# Lost update
      	spin_unlock(*A)		spin_unlock(*B)
      
      That is, the load of *B can occur prior to the store that makes *A
      visibly locked. And similarly for CPU 1. The result is both CPUs hold
      their lock and believe the other lock is unlocked.
      
      The easiest fix for this is to add a full memory barrier to the start of
      spin_is_locked(), so adding to our previous definition would give us:
      
      	bool spin_is_locked(*LOCK) {
      		smp_mb()
      		LOAD l = *LOCK
      		return l.locked
      	}
      
      The new barrier orders the store to the lock we are locking vs the load
      of the other lock:
      
      	CPU 0			CPU 1
      	==================================================
      	LOAD a = *A 		LOAD b = *B
      	a.locked = true		b.locked = true
      	STORE *A = a		STORE *B = b
      	smp_mb()		smp_mb()
      	LOAD b = *B		LOAD a = *A
      	if b.locked # true	if a.locked # true
      	# nothing		# nothing
      	spin_unlock(*A)		spin_unlock(*B)
      
      Although the above example is theoretical, there is code similar to this
      example in sem_lock() in ipc/sem.c. This commit in addition to the next
      commit appears to be a fix for crashes we are seeing in that code where
      we believe this race happens in practice.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2dd10ce8
    • Anton Blanchard's avatar
      powerpc/perf: Fix ABIv2 kernel backtraces · cc8dcb69
      Anton Blanchard authored
      commit 85101af1 upstream.
      
      ABIv2 kernels are failing to backtrace through the kernel. An example:
      
      39.30%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
                  |
                  --- find_get_entry
                     __GI___libc_read
      
      The problem is in valid_next_sp() where we check that the new stack
      pointer is at least STACK_FRAME_OVERHEAD below the previous one.
      
      ABIv1 has a minimum stack frame size of 112 bytes consisting of 48 bytes
      and 64 bytes of parameter save area. ABIv2 changes that to 32 bytes
      with no paramter save area.
      
      STACK_FRAME_OVERHEAD is in theory the minimum stack frame size,
      but we over 240 uses of it, some of which assume that it includes
      space for the parameter area.
      
      We need to work through all our stack defines and rationalise them
      but let's fix perf now by creating STACK_FRAME_MIN_SIZE and using
      in valid_next_sp(). This fixes the issue:
      
      30.64%  readseek2_proce  [kernel.kallsyms]    [k] find_get_entry
                  |
                  --- find_get_entry
                     pagecache_get_page
                     generic_file_read_iter
                     new_sync_read
                     vfs_read
                     sys_read
                     syscall_exit
                     __GI___libc_read
      Reported-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cc8dcb69
    • Johannes Stezenbach's avatar
      ath9k_htc: fix random decryption failure · 981b2611
      Johannes Stezenbach authored
      commit d21ccfd0 upstream.
      
      In v3.15 the driver stopped to accept network packets after successful
      authentification, which could be worked around by passing the
      nohwcrypt=1 module parameter.  This was not reproducible by
      everyone, and showed random behaviour in some tests.
      It was caused by an uninitialized variable introduced
      in 4ed1a8d4 ("ath9k_htc: use ath9k_cmn_rx_accept") and
      used in 341b29b9 ("ath9k_htc: use ath9k_cmn_rx_skb_postprocess").
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=78581
      Fixes: 341b29b9 ("ath9k_htc: use ath9k_cmn_rx_skb_postprocess")
      Signed-off-by: default avatarJohannes Stezenbach <js@sig21.net>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      981b2611
    • Arend van Spriel's avatar
      brcmfmac: handle IF event for P2P_DEVICE interface · 6b2a8bf7
      Arend van Spriel authored
      commit 87c47903 upstream.
      
      The firmware notifies about interface changes through the IF event
      which has a NO_IF flag that means host can ignore the event. This
      behaviour was introduced in the driver by:
      
        commit 2ee8382f
        Author: Arend van Spriel <arend@broadcom.com>
        Date:   Sat Aug 10 12:27:24 2013 +0200
      
            brcmfmac: ignore IF event if firmware indicates it
      
      It turns out that the IF event for the P2P_DEVICE also has this
      flag set, but the event should not be ignored in this scenario.
      The mentioned commit caused a regression in 3.12 kernel in creation
      of the P2P_DEVICE interface.
      Reviewed-by: default avatarHante Meuleman <meuleman@broadcom.com>
      Reviewed-by: default avatarFranky (Zhenhui) Lin <frankyl@broadcom.com>
      Reviewed-by: default avatarDaniel (Deognyoun) Kim <dekim@broadcom.com>
      Reviewed-by: default avatarPieter-Paul Giesberts <pieterpg@broadcom.com>
      Signed-off-by: default avatarArend van Spriel <arend@broadcom.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6b2a8bf7
    • Wanpeng Li's avatar
      sched: Fix unreleased llc_shared_mask bit during CPU hotplug · f99234e1
      Wanpeng Li authored
      commit 03bd4e1f upstream.
      
      The following bug can be triggered by hot adding and removing a large number of
      xen domain0's vcpus repeatedly:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
      	PGD 5a9d5067 PUD 13067 PMD 0
      	Oops: 0000 [#3] SMP
      	[...]
      	Call Trace:
      	load_balance
      	? _raw_spin_unlock_irqrestore
      	idle_balance
      	__schedule
      	schedule
      	schedule_timeout
      	? lock_timer_base
      	schedule_timeout_uninterruptible
      	msleep
      	lock_device_hotplug_sysfs
      	online_store
      	dev_attr_store
      	sysfs_write_file
      	vfs_write
      	SyS_write
      	system_call_fastpath
      
      Last level cache shared mask is built during CPU up and the
      build_sched_domain() routine takes advantage of it to setup
      the sched domain CPU topology.
      
      However, llc_shared_mask is not released during CPU disable,
      which leads to an invalid sched domainCPU topology.
      
      This patch fix it by releasing the llc_shared_mask correctly
      during CPU disable.
      
      Yasuaki also reported that this can happen on real hardware:
      
        https://lkml.org/lkml/2014/7/22/1018
      
      His case is here:
      
      	==
      	Here is an example on my system.
      	My system has 4 sockets and each socket has 15 cores and HT is
      	enabled. In this case, each core of sockes is numbered as
      	follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-44, 90-104
      	Socket#3 | 45-59, 105-119
      
      	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
      
      	It means that last level cache of Socket#2 is shared with
      	CPU#30-44 and 90-104.
      
      	When hot-removing socket#2 and #3, each core of sockets is
      	numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      
      	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
      	remains having 0x3fff80000001fffc0000000.
      
      	After that, when hot-adding socket#2 and #3, each core of
      	sockets is numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-59
      	Socket#3 | 90-119
      
      	Then llc_shared_mask of CPU#30 becomes
      	0x3fff8000fffffffc0000000. It means that last level cache of
      	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
      	the wrong value.
      Signed-off-by: default avatarWanpeng Li <wanpeng.li@linux.intel.com>
      Tested-by: default avatarLinn Crosetto <linn@hp.com>
      Reviewed-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarToshi Kani <toshi.kani@hp.com>
      Reviewed-by: default avatarYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f99234e1
    • Peter Feiner's avatar
      mm: softdirty: keep bit when zapping file pte · ffa16dcb
      Peter Feiner authored
      commit dbab31aa upstream.
      
      This fixes the same bug as b43790ee ("mm: softdirty: don't forget to
      save file map softdiry bit on unmap") and 9aed8614 ("mm/memory.c:
      don't forget to set softdirty on file mapped fault") where the return
      value of pte_*mksoft_dirty was being ignored.
      
      To be sure that no other pte/pmd "mk" function return values were being
      ignored, I annotated the functions in arch/x86/include/asm/pgtable.h
      with __must_check and rebuilt.
      
      The userspace effect of this bug is that the softdirty mark might be
      lost if a file mapped pte get zapped.
      Signed-off-by: default avatarPeter Feiner <pfeiner@google.com>
      Acked-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Jamie Liu <jamieliu@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffa16dcb
    • Fabian Frederick's avatar
      fs/cachefiles: add missing \n to kerror conversions · 152f5bed
      Fabian Frederick authored
      commit 6ff66ac7 upstream.
      
      Commit 0227d6ab ("fs/cachefiles: replace kerror by pr_err") didn't
      include newline featuring in original kerror definition
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Reported-by: default avatarDavid Howells <dhowells@redhat.com>
      Acked-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      152f5bed
    • David Rientjes's avatar
      mm, slab: initialize object alignment on cache creation · f5eae161
      David Rientjes authored
      commit d4a5fca5 upstream.
      
      Since commit 45906855 ("mm/sl[aou]b: Common alignment code"), the
      "ralign" automatic variable in __kmem_cache_create() may be used as
      uninitialized.
      
      The proper alignment defaults to BYTES_PER_WORD and can be overridden by
      SLAB_RED_ZONE or the alignment specified by the caller.
      
      This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Reported-by: default avatarAndrei Elovikov <a.elovikov@gmail.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f5eae161
    • Joseph Qi's avatar
      ocfs2/dlm: do not get resource spinlock if lockres is new · a5d969ea
      Joseph Qi authored
      commit 5760a97c upstream.
      
      There is a deadlock case which reported by Guozhonghua:
        https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html
      
      This case is caused by &res->spinlock and &dlm->master_lock
      misordering in different threads.
      
      It was introduced by commit 8d400b81 ("ocfs2/dlm: Clean up refmap
      helpers").  Since lockres is new, it doesn't not require the
      &res->spinlock.  So remove it.
      
      Fixes: 8d400b81 ("ocfs2/dlm: Clean up refmap helpers")
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Reviewed-by: default avatarjoyce.xue <xuejiufei@huawei.com>
      Reported-by: default avatarGuozhonghua <guozhonghua@h3c.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a5d969ea
    • Andreas Rohner's avatar
      nilfs2: fix data loss with mmap() · 1ba8582b
      Andreas Rohner authored
      commit 56d7acc7 upstream.
      
      This bug leads to reproducible silent data loss, despite the use of
      msync(), sync() and a clean unmount of the file system.  It is easily
      reproducible with the following script:
      
        ----------------[BEGIN SCRIPT]--------------------
        mkfs.nilfs2 -f /dev/sdb
        mount /dev/sdb /mnt
      
        dd if=/dev/zero bs=1M count=30 of=/mnt/testfile
      
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"
      
        /root/mmaptest/mmaptest /mnt/testfile 30 10 5
      
        sync
        CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
        umount /mnt
        mount /dev/sdb /mnt
        CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
        umount /mnt
      
        echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
        echo "AFTER MMAP:\t$CHECKSUM_AFTER"
        echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
        ----------------[END SCRIPT]--------------------
      
      The mmaptest tool looks something like this (very simplified, with
      error checking removed):
      
        ----------------[BEGIN mmaptest]--------------------
        data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, file_offset);
      
        for (i = 0; i < write_count; ++i) {
              memcpy(data + i * 4096, buf, sizeof(buf));
              msync(data, file_size - file_offset, MS_SYNC))
        }
        ----------------[END mmaptest]--------------------
      
      The output of the script looks something like this:
      
        BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
        AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
        AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
      
      So it is clear, that the changes done using mmap() do not survive a
      remount.  This can be reproduced a 100% of the time.  The problem was
      introduced in commit 136e8770 ("nilfs2: fix issue of
      nilfs_set_page_dirty() for page at EOF boundary").
      
      If the page was read with mpage_readpage() or mpage_readpages() for
      example, then it has no buffers attached to it.  In that case
      page_has_buffers(page) in nilfs_set_page_dirty() will be false.
      Therefore nilfs_set_file_dirty() is never called and the pages are never
      collected and never written to disk.
      
      This patch fixes the problem by also calling nilfs_set_file_dirty() if the
      page has no buffers attached to it.
      
      [akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
      Signed-off-by: default avatarAndreas Rohner <andreas.rohner@gmx.net>
      Tested-by: default avatarAndreas Rohner <andreas.rohner@gmx.net>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ba8582b
    • Andrey Vagin's avatar
      fs/notify: don't show f_handle if exportfs_encode_inode_fh failed · 312a707d
      Andrey Vagin authored
      commit 7e882481 upstream.
      
      Currently we handle only ENOSPC.  In case of other errors the file_handle
      variable isn't filled properly and we will show a part of stack.
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Acked-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      312a707d
    • Andrey Vagin's avatar
      fsnotify/fdinfo: use named constants instead of hardcoded values · 009adfc8
      Andrey Vagin authored
      commit 1fc98d11 upstream.
      
      MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
      bytes, so exportfs_encode_inode_fh can return an error.
      Signed-off-by: default avatarAndrey Vagin <avagin@openvz.org>
      Acked-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      009adfc8
    • Rasmus Villemoes's avatar
      kcmp: fix standard comparison bug · 67e478a6
      Rasmus Villemoes authored
      commit acbbe6fb upstream.
      
      The C operator <= defines a perfectly fine total ordering on the set of
      values representable in a long.  However, unlike its namesake in the
      integers, it is not translation invariant, meaning that we do not have
      "b <= c" iff "a+b <= a+c" for all a,b,c.
      
      This means that it is always wrong to try to boil down the relationship
      between two longs to a question about the sign of their difference,
      because the resulting relation [a LEQ b iff a-b <= 0] is neither
      anti-symmetric or transitive.  The former is due to -LONG_MIN==LONG_MIN
      (take any two a,b with a-b = LONG_MIN; then a LEQ b and b LEQ a, but a !=
      b).  The latter can either be seen observing that x LEQ x+1 for all x,
      implying x LEQ x+1 LEQ x+2 ...  LEQ x-1 LEQ x; or more directly with the
      simple example a=LONG_MIN, b=0, c=1, for which a-b < 0, b-c < 0, but a-c >
      0.
      
      Note that it makes absolutely no difference that a transmogrying bijection
      has been applied before the comparison is done.  In fact, had the
      obfuscation not been done, one could probably not observe the bug
      (assuming all values being compared always lie in one half of the address
      space, the mathematical value of a-b is always representable in a long).
      As it stands, one can easily obtain three file descriptors exhibiting the
      non-transitivity of kcmp().
      
      Side note 1: I can't see that ensuring the MSB of the multiplier is
      set serves any purpose other than obfuscating the obfuscating code.
      
      Side note 2:
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <assert.h>
      #include <sys/syscall.h>
      
      enum kcmp_type {
              KCMP_FILE,
              KCMP_VM,
              KCMP_FILES,
              KCMP_FS,
              KCMP_SIGHAND,
              KCMP_IO,
              KCMP_SYSVSEM,
              KCMP_TYPES,
      };
      pid_t pid;
      
      int kcmp(pid_t pid1, pid_t pid2, int type,
      	 unsigned long idx1, unsigned long idx2)
      {
      	return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
      }
      int cmp_fd(int fd1, int fd2)
      {
      	int c = kcmp(pid, pid, KCMP_FILE, fd1, fd2);
      	if (c < 0) {
      		perror("kcmp");
      		exit(1);
      	}
      	assert(0 <= c && c < 3);
      	return c;
      }
      int cmp_fdp(const void *a, const void *b)
      {
      	static const int normalize[] = {0, -1, 1};
      	return normalize[cmp_fd(*(int*)a, *(int*)b)];
      }
      #define MAX 100 /* This is plenty; I've seen it trigger for MAX==3 */
      int main(int argc, char *argv[])
      {
      	int r, s, count = 0;
      	int REL[3] = {0,0,0};
      	int fd[MAX];
      	pid = getpid();
      	while (count < MAX) {
      		r = open("/dev/null", O_RDONLY);
      		if (r < 0)
      			break;
      		fd[count++] = r;
      	}
      	printf("opened %d file descriptors\n", count);
      	for (r = 0; r < count; ++r) {
      		for (s = r+1; s < count; ++s) {
      			REL[cmp_fd(fd[r], fd[s])]++;
      		}
      	}
      	printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
      	qsort(fd, count, sizeof(fd[0]), cmp_fdp);
      	memset(REL, 0, sizeof(REL));
      
      	for (r = 0; r < count; ++r) {
      		for (s = r+1; s < count; ++s) {
      			REL[cmp_fd(fd[r], fd[s])]++;
      		}
      	}
      	printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]);
      	return (REL[0] + REL[2] != 0);
      }
      Signed-off-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67e478a6
    • Nicolas Iooss's avatar
      eventpoll: fix uninitialized variable in epoll_ctl · af3d8636
      Nicolas Iooss authored
      commit c680e41b upstream.
      
      When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
      not initialized but ep_take_care_of_epollwakeup reads its event field.
      When this unintialized field has EPOLLWAKEUP bit set, a capability check
      is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup.  This
      produces unexpected messages in the audit log, such as (on a system
      running SELinux):
      
          type=AVC msg=audit(1408212798.866:410): avc:  denied
          { block_suspend } for  pid=7754 comm="dbus-daemon" capability=36
          scontext=unconfined_u:unconfined_r:unconfined_t
          tcontext=unconfined_u:unconfined_r:unconfined_t
          tclass=capability2 permissive=1
      
          type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
          success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
          pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
          fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
          exe="/usr/bin/dbus-daemon"
          subj=unconfined_u:unconfined_r:unconfined_t key=(null)
      
      ("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")
      
      Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.
      
      Fixes: 4d7e30d9 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
      Signed-off-by: default avatarNicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arve Hjønnevåg <arve@android.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3d8636
    • Patrick Palka's avatar
      kernel/printk/printk.c: fix faulty logic in the case of recursive printk · 254e4530
      Patrick Palka authored
      commit 000a7d66 upstream.
      
      We shouldn't set text_len in the code path that detects printk recursion
      because text_len corresponds to the length of the string inside textbuf.
      A few lines down from the line
      
          text_len = strlen(recursion_msg);
      
      is the line
      
          text_len += vscnprintf(text + text_len, ...);
      
      So if printk detects recursion, it sets text_len to 29 (the length of
      recursion_msg) and logs an error.  Then the message supplied by the
      caller of printk is stored inside textbuf but offset by 29 bytes.  This
      means that the output of the recursive call to printk will contain 29
      bytes of garbage in front of it.
      
      This defect is caused by commit 458df9fd ("printk: remove separate
      printk_sched buffers and use printk buf instead") which turned the line
      
          text_len = vscnprintf(text, ...);
      
      into
      
          text_len += vscnprintf(text + text_len, ...);
      
      To fix this, this patch avoids setting text_len when logging the printk
      recursion error.  This patch also marks unlikely() the branch leading up
      to this code.
      
      Fixes: 458df9fd ("printk: remove separate printk_sched buffers and use printk buf instead")
      Signed-off-by: default avatarPatrick Palka <patrick@parcs.ath.cx>
      Reviewed-by: default avatarPetr Mladek <pmladek@suse.cz>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      254e4530
    • Johannes Berg's avatar
      Revert "mac80211: disable uAPSD if all ACs are under ACM" · d9ecad33
      Johannes Berg authored
      commit bb512ad0 upstream.
      
      This reverts commit 24aa11ab.
      
      That commit was wrong since it uses data that hasn't even been set
      up yet, but might be a hold-over from a previous connection.
      
      Additionally, it seems like a driver-specific workaround that
      shouldn't have been in mac80211 to start with.
      
      Fixes: 24aa11ab ("mac80211: disable uAPSD if all ACs are under ACM")
      Reviewed-by: default avatarLuciano Coelho <luciano.coelho@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9ecad33
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Update all ftrace_ops for a ftrace_hash_ops update · 240f589d
      Steven Rostedt (Red Hat) authored
      commit 84261912 upstream.
      
      When updating what an ftrace_ops traces, if it is registered (that is,
      actively tracing), and that ftrace_ops uses the shared global_ops
      local_hash, then we need to update all tracers that are active and
      also share the global_ops' ftrace_hash_ops.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      240f589d
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix function_profiler and function tracer together · 7f4ad283
      Steven Rostedt (Red Hat) authored
      commit 5f151b24 upstream.
      
      The latest rewrite of ftrace removed the separate ftrace_ops of
      the function tracer and the function graph tracer and had them
      share the same ftrace_ops. This simplified the accounting by removing
      the multiple layers of functions called, where the global_ops func
      would call a special list that would iterate over the other ops that
      were registered within it (like function and function graph), which
      itself was registered to the ftrace ops list of all functions
      currently active. If that sounds confusing, the code that implemented
      it was also confusing and its removal is a good thing.
      
      The problem with this change was that it assumed that the function
      and function graph tracer can never be used at the same time.
      This is mostly true, but there is an exception. That is when the
      function profiler uses the function graph tracer to profile.
      The function profiler can be activated the same time as the function
      tracer, and this breaks the assumption and the result is that ftrace
      will crash (it detects the error and shuts itself down, it does not
      cause a kernel oops).
      
      To solve this issue, a previous change allowed the hash tables
      for the functions traced by a ftrace_ops to be a pointer and let
      multiple ftrace_ops share the same hash. This allows the function
      and function_graph tracer to have separate ftrace_ops, but still
      share the hash, which is what is done.
      
      Now the function and function graph tracers have separate ftrace_ops
      again, and the function tracer can be run while the function_profile
      is active.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      7f4ad283
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Allow ftrace_ops to use the hashes from other ops · 6f6ad430
      Steven Rostedt (Red Hat) authored
      commit 33b7f99c upstream.
      
      Currently the top level debug file system function tracer shares its
      ftrace_ops with the function graph tracer. This was thought to be fine
      because the tracers are not used together, as one can only enable
      function or function_graph tracer in the current_tracer file.
      
      But that assumption proved to be incorrect. The function profiler
      can use the function graph tracer when function tracing is enabled.
      Since all function graph users uses the function tracing ftrace_ops
      this causes a conflict and when a user enables both function profiling
      as well as the function tracer it will crash ftrace and disable it.
      
      The quick solution so far is to move them as separate ftrace_ops like
      it was earlier. The problem though is to synchronize the functions that
      are traced because both function and function_graph tracer are limited
      by the selections made in the set_ftrace_filter and set_ftrace_notrace
      files.
      
      To handle this, a new structure is made called ftrace_ops_hash. This
      structure will now hold the filter_hash and notrace_hash, and the
      ftrace_ops will point to this structure. That will allow two ftrace_ops
      to share the same hashes.
      
      Since most ftrace_ops do not share the hashes, and to keep allocation
      simple, the ftrace_ops structure will include both a pointer to the
      ftrace_ops_hash called func_hash, as well as the structure itself,
      called local_hash. When the ops are registered, the func_hash pointer
      will be initialized to point to the local_hash within the ftrace_ops
      structure. Some of the ftrace internal ftrace_ops will be initialized
      statically. This will allow for the function and function_graph tracer
      to have separate ops but still share the same hash tables that determine
      what functions they trace.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      6f6ad430
    • Felipe Balbi's avatar
      usb: dwc3: fix TRB completion when multiple TRBs are started · 2b3a5805
      Felipe Balbi authored
      commit 0b93a4c8 upstream.
      
      After commit 2ec2a8be (usb: dwc3: gadget:
      always enable IOC on bulk/interrupt transfers)
      we created a situation where it was possible to
      hang a bulk/interrupt endpoint if we had more
      than one pending request in our queue and they
      were both started with a single Start Transfer
      command.
      
      The problems triggers because we had not enabled
      Transfer In Progress event for those endpoints
      and we were not able to process early giveback
      of requests completed without LST bit set.
      
      Fix the problem by finally enabling Xfer In Progress
      event for all endpoint types, except control.
      
      Fixes: 2ec2a8be (usb: dwc3: gadget: always
      	enable IOC on bulk/interrupt transfers)
      Reported-by: default avatarPratyush Anand <pratyush.anand@st.com>
      Signed-off-by: default avatarFelipe Balbi <balbi@ti.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      2b3a5805
    • Jens Axboe's avatar
      genhd: fix leftover might_sleep() in blk_free_devt() · e8b1e960
      Jens Axboe authored
      commit 46f341ff upstream.
      
      Commit 2da78092 changed the locking from a mutex to a spinlock,
      so we now longer sleep in this context. But there was a leftover
      might_sleep() in there, which now triggers since we do the final
      free from an RCU callback. Get rid of it.
      Reported-by: default avatarPontus Fuchs <pontus.fuchs@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8b1e960
    • Trond Myklebust's avatar
      lockdep: Revert lockdep check in raw_seqcount_begin() · 5083d713
      Trond Myklebust authored
      commit 22fdcf02 upstream.
      
      This commit reverts the addition of lockdep checking to raw_seqcount_begin
      for the following reasons:
      
       1) It violates the naming convention that raw_* functions should not
          do lockdep checks (a convention that is also followed by the other
          raw_*_seqcount_begin functions).
      
       2) raw_seqcount_begin does not spin, so it can only be part of an ABBA
          deadlock in very special circumstances (for instance if a lock
          is held across the entire raw_seqcount_begin()+read_seqcount_retry()
          loop while also being taken inside the write_seqcount protected area).
      
       3) It is causing false positives with some existing callers, and there
          is no non-lockdep alternative for those callers to use.
      
      None of the three existing callers (__d_lookup_rcu, netdev_get_name, and
      the NFS state code) appear to use the function in a manner that is ABBA
      deadlock prone.
      
      Fixes: 1ca7d67c: seqcount: Add lockdep functionality to seqcount/seqlock
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Waiman Long <Waiman.Long@hp.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/CAHQdGtRR6SvEhXiqWo24hoUh9AU9cL82Z8Z-d8-7u951F_d+5g@mail.gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5083d713
    • J. Bruce Fields's avatar
      lockd: fix rpcbind crash on lockd startup failure · f12342b1
      J. Bruce Fields authored
      commit 7c17705e upstream.
      
      Nikita Yuschenko reported that booting a kernel with init=/bin/sh and
      then nfs mounting without portmap or rpcbind running using a busybox
      mount resulted in:
      
        # mount -t nfs 10.30.130.21:/opt /mnt
        svc: failed to register lockdv1 RPC service (errno 111).
        lockd_up: makesock failed, error=-111
        Unable to handle kernel paging request for data at address 0x00000030
        Faulting instruction address: 0xc055e65c
        Oops: Kernel access of bad area, sig: 11 [#1]
        MPC85xx CDS
        Modules linked in:
        CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
        task: cf29cea0 ti: cf35c000 task.ti: cf35c000
        NIP: c055e65c LR: c0566490 CTR: c055e648
        REGS: cf35dad0 TRAP: 0300   Not tainted  (3.10.44.cge)
        MSR: 00029000 <CE,EE,ME>  CR: 22442488  XER: 20000000
        DEAR: 00000030, ESR: 00000000
      
        GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
        00000000
        GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
        10090ae8
        GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
        bfa46ef0
        GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
        cf0ded80
        NIP [c055e65c] call_start+0x14/0x34
        LR [c0566490] __rpc_execute+0x70/0x250
        Call Trace:
        [cf35db80] [00000080] 0x80 (unreliable)
        [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
        [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
        [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
        [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
        [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
        [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
        [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
        [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
        [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
        [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
        [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
        [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
        [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
        [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
        [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
        [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
        [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
        [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c
      
      The addition of svc_shutdown_net() resulted in two calls to
      svc_rpcb_cleanup(); the second is no longer necessary and crashes when
      it calls rpcb_register_call with clnt=NULL.
      Reported-by: default avatarNikita Yushchenko <nyushchenko@dev.rtsoft.ru>
      Fixes: 679b033d "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
      Acked-by: default avatarJeff Layton <jlayton@primarydata.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f12342b1