1. 22 Aug, 2016 40 commits
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE · 81bba9ff
      Paul Mackerras authored
      commit 93d17397 upstream.
      
      It turns out that if the guest does a H_CEDE while the CPU is in
      a transactional state, and the H_CEDE does a nap, and the nap
      loses the architected state of the CPU (which is is allowed to do),
      then we lose the checkpointed state of the virtual CPU.  In addition,
      the transactional-memory state recorded in the MSR gets reset back
      to non-transactional, and when we try to return to the guest, we take
      a TM bad thing type of program interrupt because we are trying to
      transition from non-transactional to transactional with a hrfid
      instruction, which is not permitted.
      
      The result of the program interrupt occurring at that point is that
      the host CPU will hang in an infinite loop with interrupts disabled.
      Thus this is a denial of service vulnerability in the host which can
      be triggered by any guest (and depending on the guest kernel, it can
      potentially triggered by unprivileged userspace in the guest).
      
      This vulnerability has been assigned the ID CVE-2016-5412.
      
      To fix this, we save the TM state before napping and restore it
      on exit from the nap, when handling a H_CEDE in real mode.  The
      case where H_CEDE exits to host virtual mode is already OK (as are
      other hcalls which exit to host virtual mode) because the exit
      path saves the TM state.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      81bba9ff
    • Paul Mackerras's avatar
      KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures · b16e5c4a
      Paul Mackerras authored
      commit f024ee09 upstream.
      
      This moves the transactional memory state save and restore sequences
      out of the guest entry/exit paths into separate procedures.  This is
      so that these sequences can be used in going into and out of nap
      in a subsequent patch.
      
      The only code changes here are (a) saving and restore LR on the
      stack, since these new procedures get called with a bl instruction,
      (b) explicitly saving r1 into the PACA instead of assuming that
      HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
      and redundant setting of MSR[TM] that should have been removed by
      commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
      support", 2013-09-24) but wasn't.
      Signed-off-by: default avatarPaul Mackerras <paulus@ozlabs.org>
      [bwh: Backported to 3.16: include dots in subroutine names]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      b16e5c4a
    • Kangjie Lu's avatar
      rds: fix an infoleak in rds_inc_info_copy · 5343d177
      Kangjie Lu authored
      commit 4116def2 upstream.
      
      The last field "flags" of object "minfo" is not initialized.
      Copying this object out may leak kernel stack data.
      Assign 0 to it to avoid leak.
      Signed-off-by: default avatarKangjie Lu <kjlu@gatech.edu>
      Acked-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5343d177
    • Kangjie Lu's avatar
      tipc: fix an infoleak in tipc_nl_compat_link_dump · 3d4997da
      Kangjie Lu authored
      commit 5d2be142 upstream.
      
      link_info.str is a char array of size 60. Memory after the NULL
      byte is not initialized. Sending the whole object out can cause
      a leak.
      Signed-off-by: default avatarKangjie Lu <kjlu@gatech.edu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [carnil: Backported to 3.16 (same as bwh did for 3.2): the unpadded strcpy() is
      in tipc_node_get_links() and no nlattr is involved, so use strncpy()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3d4997da
    • Kangjie Lu's avatar
      ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt · ece77e42
      Kangjie Lu authored
      commit e4ec8cc8 upstream.
      
      The stack object “r1” has a total size of 32 bytes. Its field
      “event” and “val” both contain 4 bytes padding. These 8 bytes
      padding bytes are sent to user without being initialized.
      Signed-off-by: default avatarKangjie Lu <kjlu@gatech.edu>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ece77e42
    • Kangjie Lu's avatar
      ALSA: timer: Fix leak in events via snd_timer_user_ccallback · d5b7dbe5
      Kangjie Lu authored
      commit 9a47e9cf upstream.
      
      The stack object “r1” has a total size of 32 bytes. Its field
      “event” and “val” both contain 4 bytes padding. These 8 bytes
      padding bytes are sent to user without being initialized.
      Signed-off-by: default avatarKangjie Lu <kjlu@gatech.edu>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d5b7dbe5
    • Kangjie Lu's avatar
      ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS · 84d86972
      Kangjie Lu authored
      commit cec8f96e upstream.
      
      The stack object “tread” has a total size of 32 bytes. Its field
      “event” and “val” both contain 4 bytes padding. These 8 bytes
      padding bytes are sent to user without being initialized.
      Signed-off-by: default avatarKangjie Lu <kjlu@gatech.edu>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      84d86972
    • Kangjie Lu's avatar
      USB: usbfs: fix potential infoleak in devio · 502c7a5b
      Kangjie Lu authored
      commit 681fef83 upstream.
      
      The stack object “ci” has a total size of 8 bytes. Its last 3 bytes
      are padding bytes which are not initialized and leaked to userland
      via “copy_to_user”.
      Signed-off-by: default avatarKangjie Lu <kjlu@gatech.edu>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      502c7a5b
    • Jann Horn's avatar
      proc: prevent stacking filesystems on top · a0b5c04d
      Jann Horn authored
      commit e54ad7f1 upstream.
      
      This prevents stacking filesystems (ecryptfs and overlayfs) from using
      procfs as lower filesystem.  There is too much magic going on inside
      procfs, and there is no good reason to stack stuff on top of procfs.
      
      (For example, procfs does access checks in VFS open handlers, and
      ecryptfs by design calls open handlers from a kernel thread that doesn't
      drop privileges or so.)
      Signed-off-by: default avatarJann Horn <jannh@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a0b5c04d
    • Miklos Szeredi's avatar
      fs: limit filesystem stacking depth · 54c202bb
      Miklos Szeredi authored
      commit 69c433ed upstream.
      
      Add a simple read-only counter to super_block that indicates how deep this
      is in the stack of filesystems.  Previously ecryptfs was the only stackable
      filesystem and it explicitly disallowed multiple layers of itself.
      
      Overlayfs, however, can be stacked recursively and also may be stacked
      on top of ecryptfs or vice versa.
      
      To limit the kernel stack usage we must limit the depth of the
      filesystem stack.  Initially the limit is set to 2.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@suse.cz>
      [bwh: Backported to 3.16: drop changes to overlayfs]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      54c202bb
    • Dan Carpenter's avatar
      ALSA: compress: fix an integer overflow check · bd5ab00a
      Dan Carpenter authored
      commit 6217e5ed upstream.
      
      I previously added an integer overflow check here but looking at it now,
      it's still buggy.
      
      The bug happens in snd_compr_allocate_buffer().  We multiply
      ".fragments" and ".fragment_size" and that doesn't overflow but then we
      save it in an unsigned int so it truncates the high bits away and we
      allocate a smaller than expected size.
      
      Fixes: b35cc822 ('ALSA: compress_core: integer overflow in snd_compr_allocate_buffer()')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      bd5ab00a
    • Hugh Dickins's avatar
      tmpfs: fix regression hang in fallocate undo · 193cd538
      Hugh Dickins authored
      commit 7f556567 upstream.
      
      The well-spotted fallocate undo fix is good in most cases, but not when
      fallocate failed on the very first page.  index 0 then passes lend -1
      to shmem_undo_range(), and that has two bad effects: (a) that it will
      undo every fallocation throughout the file, unrestricted by the current
      range; but more importantly (b) it can cause the undo to hang, because
      lend -1 is treated as truncation, which makes it keep on retrying until
      every page has gone, but those already fully instantiated will never go
      away.  Big thank you to xfstests generic/269 which demonstrates this.
      
      Fixes: b9b4bb26 ("tmpfs: don't undo fallocate past its last page")
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [bwh: Backported to 3.16: use PAGE_CACHE_SHIFT instead of PAGE_SHIFT]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      193cd538
    • Jeff Mahoney's avatar
      ecryptfs: don't allow mmap when the lower fs doesn't support it · d9ce899c
      Jeff Mahoney authored
      commit f0fe970d upstream.
      
      There are legitimate reasons to disallow mmap on certain files, notably
      in sysfs or procfs.  We shouldn't emulate mmap support on file systems
      that don't offer support natively.
      
      CVE-2016-1583
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
      Signed-off-by: default avatarTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      d9ce899c
    • Jan Beulich's avatar
      xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7 · 17a13e5d
      Jan Beulich authored
      commit 6f2d9d99 upstream.
      
      As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and
      CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
      both Intel and AMD systems. Doing any kind of hardware capability
      checks in the driver as a prerequisite was wrong anyway: With the
      hypervisor being in charge, all such checking should be done by it. If
      ACPI data gets uploaded despite some missing capability, the hypervisor
      is free to ignore part or all of that data.
      
      Ditch the entire check_prereq() function, and do the only valid check
      (xen_initial_domain()) in the caller in its place.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      17a13e5d
    • Jan Beulich's avatar
      xenbus: don't bail early from xenbus_dev_request_and_reply() · 6b25dcab
      Jan Beulich authored
      commit 7469be95 upstream.
      
      xenbus_dev_request_and_reply() needs to track whether a transaction is
      open.  For XS_TRANSACTION_START messages it calls transaction_start()
      and for XS_TRANSACTION_END messages it calls transaction_end().
      
      If sending an XS_TRANSACTION_START message fails or responds with an
      an error, the transaction is not open and transaction_end() must be
      called.
      
      If sending an XS_TRANSACTION_END message fails, the transaction is
      still open, but if an error response is returned the transaction is
      closed.
      
      Commit 027bd7e8 ("xen/xenbus: Avoid synchronous wait on XenBus
      stalling shutdown/restart") introduced a regression where failed
      XS_TRANSACTION_START messages were leaving the transaction open.  This
      can cause problems with suspend (and migration) as all transactions
      must be closed before suspending.
      
      It appears that the problematic change was added accidentally, so just
      remove it.
      Signed-off-by: default avatarJan Beulich <jbeulich@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6b25dcab
    • Ursula Braun's avatar
      qeth: delete napi struct when removing a qeth device · 4fa1ae66
      Ursula Braun authored
      commit 7831b4ff upstream.
      
      A qeth_card contains a napi_struct linked to the net_device during
      device probing. This struct must be deleted when removing the qeth
      device, otherwise Panic on oops can occur when qeth devices are
      repeatedly removed and added.
      
      Fixes: a1c3ed4c ("qeth: NAPI support for l2 and l3 discipline")
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      Tested-by: default avatarAlexander Klein <ALKL@de.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      4fa1ae66
    • Takashi Iwai's avatar
      ALSA: timer: Fix negative queue usage by racy accesses · cb2955d6
      Takashi Iwai authored
      commit 3fa6993f upstream.
      
      The user timer tu->qused counter may go to a negative value when
      multiple concurrent reads are performed since both the check and the
      decrement of tu->qused are done in two individual locked contexts.
      This results in bogus read outs, and the endless loop in the
      user-space side.
      
      The fix is to move the decrement of the tu->qused counter into the
      same spinlock context as the zero-check of the counter.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      cb2955d6
    • Florian Fainelli's avatar
      net: bcmsysport: Device stats are unsigned long · 54e5acf1
      Florian Fainelli authored
      commit 016eb551 upstream.
      
      On 64bits kernels, device stats are 64bits wide, not 32bits.
      
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      54e5acf1
    • Omar Sandoval's avatar
      block: fix use-after-free in sys_ioprio_get() · 60b67e25
      Omar Sandoval authored
      commit 8ba86821 upstream.
      
      get_task_ioprio() accesses the task->io_context without holding the task
      lock and thus can race with exit_io_context(), leading to a
      use-after-free. The reproducer below hits this within a few seconds on
      my 4-core QEMU VM:
      
      #define _GNU_SOURCE
      #include <assert.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/wait.h>
      
      int main(int argc, char **argv)
      {
      	pid_t pid, child;
      	long nproc, i;
      
      	/* ioprio_set(IOPRIO_WHO_PROCESS, 0, IOPRIO_PRIO_VALUE(IOPRIO_CLASS_IDLE, 0)); */
      	syscall(SYS_ioprio_set, 1, 0, 0x6000);
      
      	nproc = sysconf(_SC_NPROCESSORS_ONLN);
      
      	for (i = 0; i < nproc; i++) {
      		pid = fork();
      		assert(pid != -1);
      		if (pid == 0) {
      			for (;;) {
      				pid = fork();
      				assert(pid != -1);
      				if (pid == 0) {
      					_exit(0);
      				} else {
      					child = wait(NULL);
      					assert(child == pid);
      				}
      			}
      		}
      
      		pid = fork();
      		assert(pid != -1);
      		if (pid == 0) {
      			for (;;) {
      				/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
      				syscall(SYS_ioprio_get, 2, 0);
      			}
      		}
      	}
      
      	for (;;) {
      		/* ioprio_get(IOPRIO_WHO_PGRP, 0); */
      		syscall(SYS_ioprio_get, 2, 0);
      	}
      
      	return 0;
      }
      
      This gets us KASAN dumps like this:
      
      [   35.526914] ==================================================================
      [   35.530009] BUG: KASAN: out-of-bounds in get_task_ioprio+0x7b/0x90 at addr ffff880066f34e6c
      [   35.530009] Read of size 2 by task ioprio-gpf/363
      [   35.530009] =============================================================================
      [   35.530009] BUG blkdev_ioc (Not tainted): kasan: bad access detected
      [   35.530009] -----------------------------------------------------------------------------
      
      [   35.530009] Disabling lock debugging due to kernel taint
      [   35.530009] INFO: Allocated in create_task_io_context+0x2b/0x370 age=0 cpu=0 pid=360
      [   35.530009] 	___slab_alloc+0x55d/0x5a0
      [   35.530009] 	__slab_alloc.isra.20+0x2b/0x40
      [   35.530009] 	kmem_cache_alloc_node+0x84/0x200
      [   35.530009] 	create_task_io_context+0x2b/0x370
      [   35.530009] 	get_task_io_context+0x92/0xb0
      [   35.530009] 	copy_process.part.8+0x5029/0x5660
      [   35.530009] 	_do_fork+0x155/0x7e0
      [   35.530009] 	SyS_clone+0x19/0x20
      [   35.530009] 	do_syscall_64+0x195/0x3a0
      [   35.530009] 	return_from_SYSCALL_64+0x0/0x6a
      [   35.530009] INFO: Freed in put_io_context+0xe7/0x120 age=0 cpu=0 pid=1060
      [   35.530009] 	__slab_free+0x27b/0x3d0
      [   35.530009] 	kmem_cache_free+0x1fb/0x220
      [   35.530009] 	put_io_context+0xe7/0x120
      [   35.530009] 	put_io_context_active+0x238/0x380
      [   35.530009] 	exit_io_context+0x66/0x80
      [   35.530009] 	do_exit+0x158e/0x2b90
      [   35.530009] 	do_group_exit+0xe5/0x2b0
      [   35.530009] 	SyS_exit_group+0x1d/0x20
      [   35.530009] 	entry_SYSCALL_64_fastpath+0x1a/0xa4
      [   35.530009] INFO: Slab 0xffffea00019bcd00 objects=20 used=4 fp=0xffff880066f34ff0 flags=0x1fffe0000004080
      [   35.530009] INFO: Object 0xffff880066f34e58 @offset=3672 fp=0x0000000000000001
      [   35.530009] ==================================================================
      
      Fix it by grabbing the task lock while we poke at the io_context.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: default avatarOmar Sandoval <osandov@fb.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      60b67e25
    • Mohamad Haj Yahia's avatar
      net/mlx5: Add timeout handle to commands with callback · a2666756
      Mohamad Haj Yahia authored
      commit 65ee6708 upstream.
      
      The current implementation does not handle timeout in case of command
      with callback request, and this can lead to deadlock if the command
      doesn't get fw response.
      Add delayed callback timeout work before posting the command to fw.
      In case of real fw command completion we will cancel the delayed work.
      In case of fw command timeout the callback timeout handler will be
      called and it will simulate fw completion with timeout error.
      
      Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters')
      Signed-off-by: default avatarMohamad Haj Yahia <mohamad@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a2666756
    • Mohamad Haj Yahia's avatar
      net/mlx5: Fix potential deadlock in command mode change · c92dd270
      Mohamad Haj Yahia authored
      commit 9cba4ebc upstream.
      
      Call command completion handler in case of timeout when working in
      interrupts mode.
      Avoid flushing the commands workqueue after acquiring the semaphores to
      prevent a potential deadlock.
      
      Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters')
      Signed-off-by: default avatarMohamad Haj Yahia <mohamad@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: the calculation of ds is more complex]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      c92dd270
    • Eric Dumazet's avatar
      bonding: prevent out of bound accesses · 8b65deb4
      Eric Dumazet authored
      commit f87fda00 upstream.
      
      ether_addr_equal_64bits() requires some care about its arguments,
      namely that 8 bytes might be read, even if last 2 byte values are not
      used.
      
      KASan detected a violation with null_mac_addr and lacpdu_mcast_addr
      in bond_3ad.c
      
      Same problem with mac_bcast[] and mac_v6_allmcast[] in bond_alb.c :
      Although the 8-byte alignment was there, KASan would detect out
      of bound accesses.
      
      Fixes: 815117ad ("bonding: use ether_addr_equal_unaligned for bond addr compare")
      Fixes: bb54e589 ("bonding: Verify RX LACPDU has proper dest mac-addr")
      Fixes: 885a136c ("bonding: use compare_ether_addr_64bits() in ALB")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: default avatarDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16:
       - Adjust filename
       - Drop change to bond_params::ad_actor_system
       - Fix one more copy of null_mac_addr to use eth_zero_addr()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      8b65deb4
    • Borislav Petkov's avatar
      x86/amd_nb: Fix boot crash on non-AMD systems · a660bccb
      Borislav Petkov authored
      commit 1ead852d upstream.
      
      Fix boot crash that triggers if this driver is built into a kernel and
      run on non-AMD systems.
      
      AMD northbridges users call amd_cache_northbridges() and it returns
      a negative value to signal that we weren't able to cache/detect any
      northbridges on the system.
      
      At least, it should do so as all its callers expect it to do so. But it
      does return a negative value only when kmalloc() fails.
      
      Fix it to return -ENODEV if there are no NBs cached as otherwise, amd_nb
      users like amd64_edac, for example, which relies on it to know whether
      it should load or not, gets loaded on systems like Intel Xeons where it
      shouldn't.
      Reported-and-tested-by: default avatarTony Battersby <tonyb@cybernetics.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1466097230-5333-2-git-send-email-bp@alien8.de
      Link: https://lkml.kernel.org/r/5761BEB0.9000807@cybernetics.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      a660bccb
    • Rafael J. Wysocki's avatar
      x86/power/64: Fix kernel text mapping corruption during image restoration · e2c74129
      Rafael J. Wysocki authored
      commit 65c0554b upstream.
      
      Logan Gunthorpe reports that hibernation stopped working reliably for
      him after commit ab76f7b4 (x86/mm: Set NX on gap between __ex_table
      and rodata).
      
      That turns out to be a consequence of a long-standing issue with the
      64-bit image restoration code on x86, which is that the temporary
      page tables set up by it to avoid page tables corruption when the
      last bits of the image kernel's memory contents are copied into
      their original page frames re-use the boot kernel's text mapping,
      but that mapping may very well get corrupted just like any other
      part of the page tables.  Of course, if that happens, the final
      jump to the image kernel's entry point will go to nowhere.
      
      The exact reason why commit ab76f7b4 matters here is that it
      sometimes causes a PMD of a large page to be split into PTEs
      that are allocated dynamically and get corrupted during image
      restoration as described above.
      
      To fix that issue note that the code copying the last bits of the
      image kernel's memory contents to the page frames occupied by them
      previoulsy doesn't use the kernel text mapping, because it runs from
      a special page covered by the identity mapping set up for that code
      from scratch.  Hence, the kernel text mapping is only needed before
      that code starts to run and then it will only be used just for the
      final jump to the image kernel's entry point.
      
      Accordingly, the temporary page tables set up in swsusp_arch_resume()
      on x86-64 need to contain the kernel text mapping too.  That mapping
      is only going to be used for the final jump to the image kernel, so
      it only needs to cover the image kernel's entry point, because the
      first thing the image kernel does after getting control back is to
      switch over to its own original page tables.  Moreover, the virtual
      address of the image kernel's entry point in that mapping has to be
      the same as the one mapped by the image kernel's page tables.
      
      With that in mind, modify the x86-64's arch_hibernation_header_save()
      and arch_hibernation_header_restore() routines to pass the physical
      address of the image kernel's entry point (in addition to its virtual
      address) to the boot kernel (a small piece of assembly code involved
      in passing the entry point's virtual address to the image kernel is
      not necessary any more after that, so drop it).  Update RESTORE_MAGIC
      too to reflect the image header format change.
      
      Next, in set_up_temporary_mappings(), use the physical and virtual
      addresses of the image kernel's entry point passed in the image
      header to set up a minimum kernel text mapping (using memory pages
      that won't be overwritten by the image kernel's memory contents) that
      will map those addresses to each other as appropriate.
      
      This makes the concern about the possible corruption of the original
      boot kernel text mapping go away and if the the minimum kernel text
      mapping used for the final jump marks the image kernel's entry point
      memory as executable, the jump to it is guaraneed to succeed.
      
      Fixes: ab76f7b4 (x86/mm: Set NX on gap between __ex_table and rodata)
      Link: http://marc.info/?l=linux-pm&m=146372852823760&w=2Reported-by: default avatarLogan Gunthorpe <logang@deltatee.com>
      Reported-and-tested-by: default avatarBorislav Petkov <bp@suse.de>
      Tested-by: default avatarKees Cook <keescook@chromium.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      e2c74129
    • Takashi Iwai's avatar
      ALSA: au88x0: Fix calculation in vortex_wtdma_bufshift() · 7a826679
      Takashi Iwai authored
      commit 62db7152 upstream.
      
      vortex_wtdma_bufshift() function does calculate the page index
      wrongly, first masking then shift, which always results in zero.
      The proper computation is to first shift, then mask.
      Reported-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7a826679
    • Dan Carpenter's avatar
      qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag() · 1f53d345
      Dan Carpenter authored
      commit 5b4d10f5 upstream.
      
      There is a static checker warning here "warn: mask and shift to zero"
      and the code sets "ring" to zero every time.  From looking at how
      QLCNIC_FETCH_RING_ID() is used in qlcnic_83xx_process_rcv_ring() the
      qlcnic_83xx_hndl() should be removed.
      
      Fixes: 4be41e92 ('qlcnic: 83xx data path routines')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      1f53d345
    • Sven Eckelmann's avatar
      batman-adv: Clean up untagged vlan when destroying via rtnl-link · 3d88d87f
      Sven Eckelmann authored
      commit 420cb1b7 upstream.
      
      The untagged vlan object is only destroyed when the interface is removed
      via the legacy sysfs interface. But it also has to be destroyed when the
      standard rtnl-link interface is used.
      
      Fixes: 5d2c05b2 ("batman-adv: add per VLAN interface attribute framework")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Acked-by: default avatarAntonio Quartulli <a@unstable.cc>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: s/_put/_free_ref/]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      3d88d87f
    • Sven Eckelmann's avatar
      batman-adv: Fix ICMP RR ethernet access after skb_linearize · 6d3ce108
      Sven Eckelmann authored
      commit 3b55e442 upstream.
      
      The skb_linearize may reallocate the skb. This makes the calculated pointer
      for ethhdr invalid. But it the pointer is used later to fill in the RR
      field of the batadv_icmp_packet_rr packet.
      
      Instead re-evaluate eth_hdr after the skb_linearize+skb_cow to fix the
      pointer and avoid the invalid read.
      
      Fixes: da6b8c20 ("batman-adv: generalize batman-adv icmp packet handling")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      6d3ce108
    • Ben Hutchings's avatar
      batman-adv: Fix double-put of vlan object · 51e37ea1
      Ben Hutchings authored
      commit baceced9 upstream.
      
      Each batadv_tt_local_entry hold a single reference to a
      batadv_softif_vlan.  In case a new entry cannot be added to the hash
      table, the error path puts the reference, but the reference will also
      now be dropped by batadv_tt_local_entry_release().
      
      Fixes: a33d970d ("batman-adv: Fix reference counting of vlan object for tt_local_entry")
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: s/_put/_free_ref/]
      51e37ea1
    • Sven Eckelmann's avatar
      batman-adv: Fix use-after-free/double-free of tt_req_node · 55640775
      Sven Eckelmann authored
      commit 9c4604a2 upstream.
      
      The tt_req_node is added and removed from a list inside a spinlock. But the
      locking is sometimes removed even when the object is still referenced and
      will be used later via this reference. For example batadv_send_tt_request
      can create a new tt_req_node (including add to a list) and later
      re-acquires the lock to remove it from the list and to free it. But at this
      time another context could have already removed this tt_req_node from the
      list and freed it.
      
      CPU#0
      
          batadv_batman_skb_recv from net_device 0
          -> batadv_iv_ogm_receive
            -> batadv_iv_ogm_process
              -> batadv_iv_ogm_process_per_outif
                -> batadv_tvlv_ogm_receive
                  -> batadv_tvlv_ogm_receive
                    -> batadv_tvlv_containers_process
                      -> batadv_tvlv_call_handler
                        -> batadv_tt_tvlv_ogm_handler_v1
                          -> batadv_tt_update_orig
                            -> batadv_send_tt_request
                              -> batadv_tt_req_node_new
                                 spin_lock(...)
                                 allocates new tt_req_node and adds it to list
                                 spin_unlock(...)
                                 return tt_req_node
      
      CPU#1
      
          batadv_batman_skb_recv from net_device 1
          -> batadv_recv_unicast_tvlv
            -> batadv_tvlv_containers_process
              -> batadv_tvlv_call_handler
                -> batadv_tt_tvlv_unicast_handler_v1
                  -> batadv_handle_tt_response
                     spin_lock(...)
                     tt_req_node gets removed from list and is freed
                     spin_unlock(...)
      
      CPU#0
      
                            <- returned to batadv_send_tt_request
                               spin_lock(...)
                               tt_req_node gets removed from list and is freed
                               MEMORY CORRUPTION/SEGFAULT/...
                               spin_unlock(...)
      
      This can only be solved via reference counting to allow multiple contexts
      to handle the list manipulation while making sure that only the last
      context holding a reference will free the object.
      
      Fixes: a73105b8 ("batman-adv: improved client announcement mechanism")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Tested-by: default avatarMartin Weinelt <martin@darmstadt.freifunk.net>
      Tested-by: default avatarAmadeus Alfa <amadeus@chemnitz.freifunk.net>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16:
       - Adjust context
       - Use list_empty() instead of hlist_unhashed()]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      55640775
    • Simon Wunderlich's avatar
      batman-adv: replace WARN with rate limited output on non-existing VLAN · 43c704f5
      Simon Wunderlich authored
      commit 0b3dd7df upstream.
      
      If a VLAN tagged frame is received and the corresponding VLAN is not
      configured on the soft interface, it will splat a WARN on every packet
      received. This is a quite annoying behaviour for some scenarios, e.g. if
      bat0 is bridged with eth0, and there are arbitrary VLAN tagged frames
      from Ethernet coming in without having any VLAN configuration on bat0.
      
      The code should probably create vlan objects on the fly and
      transparently transport these VLAN-tagged Ethernet frames, but until
      this is done, at least the WARN splat should be replaced by a rate
      limited output.
      
      Fixes: 354136bc ("batman-adv: fix kernel crash due to missing NULL checks")
      Signed-off-by: default avatarSimon Wunderlich <sw@simonwunderlich.de>
      Signed-off-by: default avatarMarek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      43c704f5
    • Sven Eckelmann's avatar
      batman-adv: Fix memory leak on tt add with invalid vlan · cbc2dd21
      Sven Eckelmann authored
      commit fd7dec25 upstream.
      
      The object tt_local is allocated with kmalloc and not initialized when the
      function batadv_tt_local_add checks for the vlan. But this function can
      only cleanup the object when the (not yet initialized) reference counter of
      the object is 1. This is unlikely and thus the object would leak when the
      vlan could not be found.
      
      Instead the uninitialized object tt_local has to be freed manually and the
      pointer has to set to NULL to avoid calling the function which would try to
      decrement the reference counter of the not existing object.
      
      CID: 1316518
      Fixes: 354136bc ("batman-adv: fix kernel crash due to missing NULL checks")
      Signed-off-by: default avatarSven Eckelmann <sven@narfation.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      cbc2dd21
    • Florian Fainelli's avatar
      net: phy: Manage fixed PHY address space using IDA · 9a95f4c4
      Florian Fainelli authored
      commit 69fc58a5 upstream.
      
      If we have a system which uses fixed PHY devices and calls
      fixed_phy_register() then fixed_phy_unregister() we can exhaust the
      number of fixed PHYs available after a while, since we keep incrementing
      the variable phy_fixed_addr, but we never decrement it.
      
      This patch fixes that by converting the fixed PHY allocation to using
      IDA, which takes care of the allocation/dealloaction of the PHY
      addresses for us.
      
      Fixes: a7595121 ("net: phy: extend fixed driver with fixed_phy_register()")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16:
       - Adjust filename, context
       - fixed_phy_register() returns an integer, not a pointer/error]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9a95f4c4
    • Michael Neuling's avatar
      powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 · ceb3c227
      Michael Neuling authored
      commit 190ce869 upstream.
      
      Currently we have 2 segments that are bolted for the kernel linear
      mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel
      stacks. Anything accessed outside of these regions may need to be
      faulted in. (In practice machines with TM always have 1T segments)
      
      If a machine has < 2TB of memory we never fault on the kernel linear
      mapping as these two segments cover all physical memory. If a machine
      has > 2TB of memory, there may be structures outside of these two
      segments that need to be faulted in. This faulting can occur when
      running as a guest as the hypervisor may remove any SLB that's not
      bolted.
      
      When we treclaim and trecheckpoint we have a window where we need to
      run with the userspace GPRs. This means that we no longer have a valid
      stack pointer in r1. For this window we therefore clear MSR RI to
      indicate that any exceptions taken at this point won't be able to be
      handled. This means that we can't take segment misses in this RI=0
      window.
      
      In this RI=0 region, we currently access the thread_struct for the
      process being context switched to or from. This thread_struct access
      may cause a segment fault since it's not guaranteed to be covered by
      the two bolted segment entries described above.
      
      We've seen this with a crash when running as a guest with > 2TB of
      memory on PowerVM:
      
        Unrecoverable exception 4100 at c00000000004f138
        Oops: Unrecoverable exception, sig: 6 [#1]
        SMP NR_CPUS=2048 NUMA pSeries
        CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G                 X 4.4.13-46-default #1
        task: c000189001df4210 ti: c000189001d5c000 task.ti: c000189001d5c000
        NIP: c00000000004f138 LR: 0000000010003a24 CTR: 0000000010001b20
        REGS: c000189001d5f730 TRAP: 4100   Tainted: G                 X  (4.4.13-46-default)
        MSR: 8000000100001031 <SF,ME,IR,DR,LE>  CR: 24000048  XER: 00000000
        CFAR: c00000000004ed18 SOFTE: 0
        GPR00: ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288
        GPR04: 0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620
        GPR08: 000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0
        GPR12: 00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211
        GPR16: 0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110
        GPR20: 0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050
        GPR24: 0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30
        GPR28: 00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680
        NIP [c00000000004f138] restore_gprs+0xd0/0x16c
        LR [0000000010003a24] 0x10003a24
        Call Trace:
        [c000189001d5f9b0] [c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable)
        [c000189001d5fb90] [c00000000001583c] tm_recheckpoint+0x6c/0xa0
        [c000189001d5fbd0] [c000000000015c40] __switch_to+0x2c0/0x350
        [c000189001d5fc30] [c0000000007e647c] __schedule+0x32c/0x9c0
        [c000189001d5fcb0] [c0000000007e6b58] schedule+0x48/0xc0
        [c000189001d5fce0] [c0000000000deabc] worker_thread+0x22c/0x5b0
        [c000189001d5fd80] [c0000000000e7000] kthread+0x110/0x130
        [c000189001d5fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
        Instruction dump:
        7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6
        38a00006 78c60022 7cc62838 0b060000 <e8c701a0> 7ccff120 e8270078 e8a70098
        ---[ end trace 602126d0a1dedd54 ]---
      
      This fixes this by copying the required data from the thread_struct to
      the stack before we clear MSR RI. Then once we clear RI, we only access
      the stack, guaranteeing there's no segment miss.
      
      We also tighten the region over which we set RI=0 on the treclaim()
      path. This may have a slight performance impact since we're adding an
      mtmsr instruction.
      
      Fixes: 090b9284 ("powerpc/tm: Clear MSR RI in non-recoverable TM code")
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Reviewed-by: default avatarCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      ceb3c227
    • Brian King's avatar
      ipr: Clear interrupt on croc/crocodile when running with LSI · 5de24309
      Brian King authored
      commit 54e430bb upstream.
      
      If we fall back to using LSI on the Croc or Crocodile chip we need to
      clear the interrupt so we don't hang the system.
      Tested-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      5de24309
    • Trond Myklebust's avatar
      NFS: Fix another OPEN_DOWNGRADE bug · 9650f565
      Trond Myklebust authored
      commit e547f262 upstream.
      
      Olga Kornievskaia reports that the following test fails to trigger
      an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.
      
      	fd0 = open(foo, RDRW)   -- should be open on the wire for "both"
      	fd1 = open(foo, RDONLY)  -- should be open on the wire for "read"
      	close(fd0) -- should trigger an open_downgrade
      	read(fd1)
      	close(fd1)
      
      The issue is that we're missing a check for whether or not the current
      state transitioned from an O_RDWR state as opposed to having transitioned
      from a combination of O_RDONLY and O_WRONLY.
      Reported-by: default avatarOlga Kornievskaia <aglo@umich.edu>
      Fixes: cd9288ff ("NFSv4: Fix another bug in the close/open_downgrade code")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      9650f565
    • daniel's avatar
      Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address · 766b5ed1
      daniel authored
      commit 0888d5f3 upstream.
      
      The bridge is falsly dropping ipv6 mulitcast packets if there is:
       1. No ipv6 address assigned on the brigde.
       2. No external mld querier present.
       3. The internal querier enabled.
      
      When the bridge fails to build mld queries, because it has no
      ipv6 address, it slilently returns, but keeps the local querier enabled.
      This specific case causes confusing packet loss.
      
      Ipv6 multicast snooping can only work if:
       a) An external querier is present
       OR
       b) The bridge has an ipv6 address an is capable of sending own queries
      
      Otherwise it has to forward/flood the ipv6 multicast traffic,
      because snooping cannot work.
      
      This patch fixes the issue by adding a flag to the bridge struct that
      indicates that there is currently no ipv6 address assinged to the bridge
      and returns a false state for the local querier in
      __br_multicast_querier_exists().
      
      Special thanks to Linus Lüssing.
      
      Fixes: d1d81d4c ("bridge: check return value of ipv6_dev_get_saddr()")
      Signed-off-by: default avatarDaniel Danzberger <daniel@dd-wrt.com>
      Acked-by: default avatarLinus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      766b5ed1
    • Jouni Malinen's avatar
      mac80211: Fix mesh estab_plinks counting in STA removal case · 93917b05
      Jouni Malinen authored
      commit 126e7557 upstream.
      
      If a user space program (e.g., wpa_supplicant) deletes a STA entry that
      is currently in NL80211_PLINK_ESTAB state, the number of established
      plinks counter was not decremented and this could result in rejecting
      new plink establishment before really hitting the real maximum plink
      limit. For !user_mpm case, this decrementation is handled by
      mesh_plink_deactive().
      
      Fix this by decrementing estab_plinks on STA deletion
      (mesh_sta_cleanup() gets called from there) so that the counter has a
      correct value and the Beacon frame advertisement in Mesh Configuration
      element shows the proper value for capability to accept additional
      peers.
      Signed-off-by: default avatarJouni Malinen <j@w1.fi>
      Signed-off-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      [bwh: Backported to 3.16: plink_state field is in struct sta_info]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      93917b05
    • Florian Fainelli's avatar
      net: bgmac: Remove superflous netif_carrier_on() · f61c0cf7
      Florian Fainelli authored
      commit 3894396e upstream.
      
      bgmac_open() calls phy_start() to initialize the PHY state machine,
      which will set the interface's carrier state accordingly, no need to
      force that as this could be conflicting with the PHY state determined by
      PHYLIB.
      
      Fixes: dd4544f0 ("bgmac: driver for GBit MAC core on BCMA bus")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      f61c0cf7
    • Florian Fainelli's avatar
      net: bgmac: Start transmit queue in bgmac_open · 7d28b934
      Florian Fainelli authored
      commit c3897f2a upstream.
      
      The driver does not start the transmit queue in bgmac_open(). If the
      queue was stopped prior to closing then re-opening the interface, we
      would never be able to wake-up again.
      
      Fixes: dd4544f0 ("bgmac: driver for GBit MAC core on BCMA bus")
      Signed-off-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      [bwh: Backported to 3.16: adjust context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      7d28b934