1. 27 Apr, 2016 40 commits
    • Mel Gorman's avatar
      mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault · e9c23599
      Mel Gorman authored
      commit 2f84a899 upstream.
      
      SunDong reported the following on
      
        https://bugzilla.kernel.org/show_bug.cgi?id=103841
      
      	I think I find a linux bug, I have the test cases is constructed. I
      	can stable recurring problems in fedora22(4.0.4) kernel version,
      	arch for x86_64.  I construct transparent huge page, when the parent
      	and child process with MAP_SHARE, MAP_PRIVATE way to access the same
      	huge page area, it has the opportunity to lead to huge page copy on
      	write failure, and then it will munmap the child corresponding mmap
      	area, but then the child mmap area with VM_MAYSHARE attributes, child
      	process munmap this area can trigger VM_BUG_ON in set_vma_resv_flags
      	functions (vma - > vm_flags & VM_MAYSHARE).
      
      There were a number of problems with the report (e.g.  it's hugetlbfs that
      triggers this, not transparent huge pages) but it was fundamentally
      correct in that a VM_BUG_ON in set_vma_resv_flags() can be triggered that
      looks like this
      
      	 vma ffff8804651fd0d0 start 00007fc474e00000 end 00007fc475e00000
      	 next ffff8804651fd018 prev ffff8804651fd188 mm ffff88046b1b1800
      	 prot 8000000000000027 anon_vma           (null) vm_ops ffffffff8182a7a0
      	 pgoff 0 file ffff88106bdb9800 private_data           (null)
      	 flags: 0x84400fb(read|write|shared|mayread|maywrite|mayexec|mayshare|dontexpand|hugetlb)
      	 ------------
      	 kernel BUG at mm/hugetlb.c:462!
      	 SMP
      	 Modules linked in: xt_pkttype xt_LOG xt_limit [..]
      	 CPU: 38 PID: 26839 Comm: map Not tainted 4.0.4-default #1
      	 Hardware name: Dell Inc. PowerEdge R810/0TT6JF, BIOS 2.7.4 04/26/2012
      	 set_vma_resv_flags+0x2d/0x30
      
      The VM_BUG_ON is correct because private and shared mappings have
      different reservation accounting but the warning clearly shows that the
      VMA is shared.
      
      When a private COW fails to allocate a new page then only the process
      that created the VMA gets the page -- all the children unmap the page.
      If the children access that data in the future then they get killed.
      
      The problem is that the same file is mapped shared and private.  During
      the COW, the allocation fails, the VMAs are traversed to unmap the other
      private pages but a shared VMA is found and the bug is triggered.  This
      patch identifies such VMAs and skips them.
      Signed-off-by: default avatarMel Gorman <mgorman@techsingularity.net>
      Reported-by: default avatarSunDong <sund_sky@126.com>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: David Rientjes <rientjes@google.com>
      Reviewed-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      e9c23599
    • Ben Hutchings's avatar
      genirq: Fix race in register_irq_proc() · a03288de
      Ben Hutchings authored
      commit 95c2b175 upstream.
      
      Per-IRQ directories in procfs are created only when a handler is first
      added to the irqdesc, not when the irqdesc is created.  In the case of
      a shared IRQ, multiple tasks can race to create a directory.  This
      race condition seems to have been present forever, but is easier to
      hit with async probing.
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.ukSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a03288de
    • Thomas Gleixner's avatar
      x86/process: Add proper bound checks in 64bit get_wchan() · 12bdf057
      Thomas Gleixner authored
      commit eddd3826 upstream.
      
      Dmitry Vyukov reported the following using trinity and the memory
      error detector AddressSanitizer
      (https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel).
      
      [ 124.575597] ERROR: AddressSanitizer: heap-buffer-overflow on
      address ffff88002e280000
      [ 124.576801] ffff88002e280000 is located 131938492886538 bytes to
      the left of 28857600-byte region [ffffffff81282e0a, ffffffff82e0830a)
      [ 124.578633] Accessed by thread T10915:
      [ 124.579295] inlined in describe_heap_address
      ./arch/x86/mm/asan/report.c:164
      [ 124.579295] #0 ffffffff810dd277 in asan_report_error
      ./arch/x86/mm/asan/report.c:278
      [ 124.580137] #1 ffffffff810dc6a0 in asan_check_region
      ./arch/x86/mm/asan/asan.c:37
      [ 124.581050] #2 ffffffff810dd423 in __tsan_read8 ??:0
      [ 124.581893] #3 ffffffff8107c093 in get_wchan
      ./arch/x86/kernel/process_64.c:444
      
      The address checks in the 64bit implementation of get_wchan() are
      wrong in several ways:
      
       - The lower bound of the stack is not the start of the stack
         page. It's the start of the stack page plus sizeof (struct
         thread_info)
      
       - The upper bound must be:
      
             top_of_stack - TOP_OF_KERNEL_STACK_PADDING - 2 * sizeof(unsigned long).
      
         The 2 * sizeof(unsigned long) is required because the stack pointer
         points at the frame pointer. The layout on the stack is: ... IP FP
         ... IP FP. So we need to make sure that both IP and FP are in the
         bounds.
      
      Fix the bound checks and get rid of the mix of numeric constants, u64
      and unsigned long. Making all unsigned long allows us to use the same
      function for 32bit as well.
      
      Use READ_ONCE() when accessing the stack. This does not prevent a
      concurrent wakeup of the task and the stack changing, but at least it
      avoids TOCTOU.
      
      Also check task state at the end of the loop. Again that does not
      prevent concurrent changes, but it avoids walking for nothing.
      
      Add proper comments while at it.
      Reported-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Based-on-patch-from: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: default avatarBorislav Petkov <bp@alien8.de>
      Reviewed-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
      Link: http://lkml.kernel.org/r/20150930083302.694788319@linutronix.deSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      [lizf: Backported to 3.4:
       - s/READ_ONCE/ACCESS_ONCE
       - remove TOP_OF_KERNEL_STACK_PADDING]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      12bdf057
    • shengyong's avatar
      UBI: return ENOSPC if no enough space available · 127cf7cb
      shengyong authored
      commit 7c7feb2e upstream.
      
      UBI: attaching mtd1 to ubi0
      UBI: scanning is finished
      UBI error: init_volumes: not enough PEBs, required 706, available 686
      UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1)
      UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12 <= NOT ENOMEM
      UBI error: ubi_init: cannot attach mtd1
      
      If available PEBs are not enough when initializing volumes, return -ENOSPC
      directly. If available PEBs are not enough when initializing WL, return
      -ENOSPC instead of -ENOMEM.
      Signed-off-by: default avatarSheng Yong <shengyong1@huawei.com>
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Reviewed-by: default avatarDavid Gstir <david@sigma-star.at>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      127cf7cb
    • Richard Weinberger's avatar
      UBI: Validate data_size · 15acb368
      Richard Weinberger authored
      commit 281fda27 upstream.
      
      Make sure that data_size is less than LEB size.
      Otherwise a handcrafted UBI image is able to trigger
      an out of bounds memory access in ubi_compare_lebs().
      Signed-off-by: default avatarRichard Weinberger <richard@nod.at>
      Reviewed-by: default avatarDavid Gstir <david@sigma-star.at>
      [lizf: Backported to 3.4: use dbg_err() instead of ubi_err()];
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      15acb368
    • Malcolm Crossley's avatar
      x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map · 9ed559d3
      Malcolm Crossley authored
      commit 64c98e7f upstream.
      
      Sanitizing the e820 map may produce extra E820 entries which would result in
      the topmost E820 entries being removed. The removed entries would typically
      include the top E820 usable RAM region and thus result in the domain having
      signicantly less RAM available to it.
      
      Fix by allowing sanitize_e820_map to use the full size of the allocated E820
      array.
      Signed-off-by: default avatarMalcolm Crossley <malcolm.crossley@citrix.com>
      Reviewed-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      [lizf: Backported to 3.4: s/map/xen_e820_map]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      9ed559d3
    • Andreas Schwab's avatar
      m68k: Define asmlinkage_protect · d311156a
      Andreas Schwab authored
      commit 8474ba74 upstream.
      
      Make sure the compiler does not modify arguments of syscall functions.
      This can happen if the compiler generates a tailcall to another
      function.  For example, without asmlinkage_protect sys_openat is compiled
      into this function:
      
      sys_openat:
      	clr.l %d0
      	move.w 18(%sp),%d0
      	move.l %d0,16(%sp)
      	jbra do_sys_open
      
      Note how the fourth argument is modified in place, modifying the register
      %d4 that gets restored from this stack slot when the function returns to
      user-space.  The caller may expect the register to be unmodified across
      system calls.
      Signed-off-by: default avatarAndreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      d311156a
    • Felix Fietkau's avatar
      ath9k: declare required extra tx headroom · edb236d7
      Felix Fietkau authored
      commit 029cd037 upstream.
      
      ath9k inserts padding between the 802.11 header and the data area (to
      align it). Since it didn't declare this extra required headroom, this
      led to some nasty issues like randomly dropped packets in some setups.
      Signed-off-by: default avatarFelix Fietkau <nbd@openwrt.org>
      Signed-off-by: default avatarKalle Valo <kvalo@codeaurora.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      edb236d7
    • Joseph Qi's avatar
      ocfs2/dlm: fix deadlock when dispatch assert master · f5499bfc
      Joseph Qi authored
      commit 012572d4 upstream.
      
      The order of the following three spinlocks should be:
      dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock
      
      But dlm_dispatch_assert_master() is called while holding
      dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
      dlm_grab() which will take dlm_domain_lock.
      
      Once another thread (for example, dlm_query_join_handler) has already
      taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock
      happens.
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: "Junxiao Bi" <junxiao.bi@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      f5499bfc
    • Peter Seiderer's avatar
      cifs: use server timestamp for ntlmv2 authentication · 6fa2028d
      Peter Seiderer authored
      commit 98ce94c8 upstream.
      
      Linux cifs mount with ntlmssp against an Mac OS X (Yosemite
      10.10.5) share fails in case the clocks differ more than +/-2h:
      
      digest-service: digest-request: od failed with 2 proto=ntlmv2
      digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2
      
      Fix this by (re-)using the given server timestamp for the
      ntlmv2 authentication (as Windows 7 does).
      
      A related problem was also reported earlier by Namjae Jaen (see below):
      
      Windows machine has extended security feature which refuse to allow
      authentication when there is time difference between server time and
      client time when ntlmv2 negotiation is used. This problem is prevalent
      in embedded enviornment where system time is set to default 1970.
      
      Modern servers send the server timestamp in the TargetInfo Av_Pair
      structure in the challenge message [see MS-NLMP 2.2.2.1]
      In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must
      use the server provided timestamp if present OR current time if it is
      not
      Reported-by: default avatarNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: default avatarPeter Seiderer <ps.report@gmx.net>
      Signed-off-by: default avatarSteve French <smfrench@gmail.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      6fa2028d
    • Mathias Nyman's avatar
      xhci: change xhci 1.0 only restrictions to support xhci 1.1 · 276a6c94
      Mathias Nyman authored
      commit dca77945 upstream.
      
      Some changes between xhci 0.96 and xhci 1.0 specifications forced us to
      check the hci version in code, some of these checks were implemented as
      hci_version == 1.0, which will not work with new xhci 1.1 controllers.
      
      xhci 1.1 behaves similar to xhci 1.0 in these cases, so change these
      checks to hci_version >= 1.0
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      276a6c94
    • Roger Quadros's avatar
      usb: xhci: Clear XHCI_STATE_DYING on start · fa8600fa
      Roger Quadros authored
      commit e5bfeab0 upstream.
      
      For whatever reason if XHCI died in the previous instant
      then it will never recover on the next xhci_start unless we
      clear the DYING flag.
      Signed-off-by: default avatarRoger Quadros <rogerq@ti.com>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      fa8600fa
    • Mathias Nyman's avatar
      xhci: give command abortion one more chance before killing xhci · 63a2bddd
      Mathias Nyman authored
      commit a6809ffd upstream.
      
      We want to give the command abortion an additional try to stop
      the command ring before we completely hose xhci.
      Tested-by: default avatarVincent Pelletier <plr.vincent@gmail.com>
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [lizf: Backported to 3.4: call handshake() instead of xhci_handshake()]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      63a2bddd
    • Mathias Nyman's avatar
      usb: Use the USB_SS_MULT() macro to get the burst multiplier. · 40dba0fd
      Mathias Nyman authored
      commit ff30cbc8 upstream.
      
      Bits 1:0 of the bmAttributes are used for the burst multiplier.
      The rest of the bits used to be reserved (zero), but USB3.1 takes bit 7
      into use.
      
      Use the existing USB_SS_MULT() macro instead to make sure the mult value
      and hence max packet calculations are correct for USB3.1 devices.
      
      Note that burst multiplier in bmAttributes is zero based and that
      the USB_SS_MULT() macro adds one.
      Signed-off-by: default avatarMathias Nyman <mathias.nyman@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      40dba0fd
    • Paolo Bonzini's avatar
      KVM: x86: trap AMD MSRs for the TSeg base and mask · 7d148ce4
      Paolo Bonzini authored
      commit 3afb1121 upstream.
      
      These have roughly the same purpose as the SMRR, which we do not need
      to implement in KVM.  However, Linux accesses MSR_K8_TSEG_ADDR at
      boot, which causes problems when running a Xen dom0 under KVM.
      Just return 0, meaning that processor protection of SMRAM is not
      in effect.
      Reported-by: default avatarM A Young <m.a.young@durham.ac.uk>
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7d148ce4
    • Mark Brown's avatar
      regmap: debugfs: Don't bother actually printing when calculating max length · 42bffe1a
      Mark Brown authored
      commit 176fc2d5 upstream.
      
      The in kernel snprintf() will conveniently return the actual length of
      the printed string even if not given an output beffer at all so just do
      that rather than relying on the user to pass in a suitable buffer,
      ensuring that we don't need to worry if the buffer was truncated due to
      the size of the buffer passed in.
      Reported-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      42bffe1a
    • Mark Brown's avatar
      regmap: debugfs: Ensure we don't underflow when printing access masks · f4524d72
      Mark Brown authored
      commit b763ec17 upstream.
      
      If a read is attempted which is smaller than the line length then we may
      underflow the subtraction we're doing with the unsigned size_t type so
      move some of the calculation to be additions on the right hand side
      instead in order to avoid this.
      Reported-by: default avatarRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      f4524d72
    • Jeff Mahoney's avatar
      btrfs: skip waiting on ordered range for special files · 413e7340
      Jeff Mahoney authored
      commit a30e577c upstream.
      
      In btrfs_evict_inode, we properly truncate the page cache for evicted
      inodes but then we call btrfs_wait_ordered_range for every inode as well.
      It's the right thing to do for regular files but results in incorrect
      behavior for device inodes for block devices.
      
      filemap_fdatawrite_range gets called with inode->i_mapping which gets
      resolved to the block device inode before getting passed to
      wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi.  What happens
      next depends on whether there's an open file handle associated with the
      inode.  If there is, we write to the block device, which is unexpected
      behavior.  If there isn't, we through normally and inode->i_data is used.
      We can also end up racing against open/close which can result in crashes
      when i_mapping points to a block device inode that has been closed.
      
      Since there can't be any page cache associated with special file inodes,
      it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
      the problem.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911Tested-by: default avatarChristoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: default avatarJeff Mahoney <jeffm@suse.com>
      Reviewed-by: default avatarFilipe Manana <fdmanana@suse.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      413e7340
    • Guenter Roeck's avatar
      spi: Fix documentation of spi_alloc_master() · 557b53d8
      Guenter Roeck authored
      commit a394d635 upstream.
      
      Actually, spi_master_put() after spi_alloc_master() must _not_ be followed
      by kfree(). The memory is already freed with the call to spi_master_put()
      through spi_master_class, which registers a release function. Calling both
      spi_master_put() and kfree() results in often nasty (and delayed) crashes
      elsewhere in the kernel, often in the networking stack.
      
      This reverts commit eb4af0f5.
      
      Link to patch and concerns: https://lkml.org/lkml/2012/9/3/269
      or
      http://lkml.iu.edu/hypermail/linux/kernel/1209.0/00790.html
      
      Alexey Klimov: This revert becomes valid after
      94c69f76 when spi-imx.c
      has been fixed and there is no need to call kfree() so comment
      for spi_alloc_master() should be fixed.
      Signed-off-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarAlexey Klimov <alexey.klimov@linaro.org>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      557b53d8
    • Tan, Jui Nee's avatar
      spi: spi-pxa2xx: Check status register to determine if SSSR_TINT is disabled · 47d7e7e7
      Tan, Jui Nee authored
      commit 02bc933e upstream.
      
      On Intel Baytrail, there is case when interrupt handler get called, no SPI
      message is captured. The RX FIFO is indeed empty when RX timeout pending
      interrupt (SSSR_TINT) happens.
      
      Use the BIOS version where both HSUART and SPI are on the same IRQ. Both
      drivers are using IRQF_SHARED when calling the request_irq function. When
      running two separate and independent SPI and HSUART application that
      generate data traffic on both components, user will see messages like
      below on the console:
      
        pxa2xx-spi pxa2xx-spi.0: bad message state in interrupt handler
      
      This commit will fix this by first checking Receiver Time-out Interrupt,
      if it is disabled, ignore the request and return without servicing.
      Signed-off-by: default avatarTan, Jui Nee <jui.nee.tan@intel.com>
      Acked-by: default avatarJarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      47d7e7e7
    • Dan Carpenter's avatar
      drm: crtc: integer overflow in drm_property_create_blob() · 2f5b9f27
      Dan Carpenter authored
      commit 9ac0934b upstream.
      
      The size here comes from the user via the ioctl, it is a number between
      1-u32max so the addition here could overflow on 32 bit systems.
      
      Fixes: f453ba04 ('DRM: add mode setting support')
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: default avatarDaniel Stone <daniels@collabora.com>
      Signed-off-by: default avatarDave Airlie <airlied@gmail.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2f5b9f27
    • NeilBrown's avatar
      md/raid1: don't clear bitmap bit when bad-block-list write fails. · 6126604d
      NeilBrown authored
      commit bd8688a1 upstream.
      
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R1BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-and-tested-by: default avatarNate Dailey <nate.dailey@stratus.com>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Fixes: cd5ff9a1 ("md/raid1:  Handle write errors by updating badblock log.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      6126604d
    • NeilBrown's avatar
      md/raid1: ensure device failure recorded before write request returns. · f6b1d7cb
      NeilBrown authored
      commit 55ce74d4 upstream.
      
      When a write to one of the legs of a RAID1 fails, the failure is
      recorded in the metadata of the other leg(s) so that after a restart
      the data on the failed drive wont be trusted even if that drive seems
      to be working again  (maybe a cable was unplugged).
      
      Similarly when we record a bad-block in response to a write failure,
      we must not let the write complete until the bad-block update is safe.
      
      Currently there is no interlock between the write request completing
      and the metadata update.  So it is possible that the write will
      complete, the app will confirm success in some way, and then the
      machine will crash before the metadata update completes.
      
      This is an extremely small hole for a racy to fit in, but it is
      theoretically possible and so should be closed.
      
      So:
       - set MD_CHANGE_PENDING when requesting a metadata update for a
         failed device, so we can know with certainty when it completes
       - queue requests that experienced an error on a new queue which
         is only processed after the metadata update completes
       - call raid_end_bio_io() on bios in that queue when the time comes.
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      f6b1d7cb
    • NeilBrown's avatar
      md/raid10: don't clear bitmap bit when bad-block-list write fails. · 0570dab3
      NeilBrown authored
      commit c340702c upstream.
      
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 95af587e ("md/raid10: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R10BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-by: default avatarNate Dailey <nate.dailey@stratus.com>
      Fixes: bd870a16 ("md/raid10:  Handle write errors by updating badblock log.")
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      0570dab3
    • NeilBrown's avatar
      md/raid10: ensure device failure recorded before write request returns. · 43bf02ba
      NeilBrown authored
      commit 95af587e upstream.
      
      When a write to one of the legs of a RAID10 fails, the failure is
      recorded in the metadata of the other legs so that after a restart
      the data on the failed drive wont be trusted even if that drive seems
      to be working again (maybe a cable was unplugged).
      
      Currently there is no interlock between the write request completing
      and the metadata update.  So it is possible that the write will
      complete, the app will confirm success in some way, and then the
      machine will crash before the metadata update completes.
      
      This is an extremely small hole for a racy to fit in, but it is
      theoretically possible and so should be closed.
      
      So:
       - set MD_CHANGE_PENDING when requesting a metadata update for a
         failed device, so we can know with certainty when it completes
       - queue requests that experienced an error on a new queue which
         is only processed after the metadata update completes
       - call raid_end_bio_io() on bios in that queue when the time comes.
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      [lizf: Backported to 3.4: adjust context]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      43bf02ba
    • Vasant Hegde's avatar
      powerpc/rtas: Validate rtas.entry before calling enter_rtas() · 894f53c9
      Vasant Hegde authored
      commit 8832317f upstream.
      
      Currently we do not validate rtas.entry before calling enter_rtas(). This
      leads to a kernel oops when user space calls rtas system call on a powernv
      platform (see below). This patch adds code to validate rtas.entry before
      making enter_rtas() call.
      
        Oops: Exception in kernel mode, sig: 4 [#1]
        SMP NR_CPUS=1024 NUMA PowerNV
        task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
        NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
        REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
        MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
        CFAR: c000000000009c0c SOFTE: 0
        NIP [0000000000000000]           (null)
        LR [0000000000009c14] 0x9c14
        Call Trace:
        [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
        [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
        [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98
      
      Fixes: 55190f88 ("powerpc: Add skeleton PowerNV platform")
      Reported-by: default avatarNAGESWARA R. SASTRY <nasastry@in.ibm.com>
      Signed-off-by: default avatarVasant Hegde <hegdevasant@linux.vnet.ibm.com>
      [mpe: Reword change log, trim oops, and add stable + fixes]
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      894f53c9
    • Doron Tsur's avatar
      IB/cm: Fix rb-tree duplicate free and use-after-free · 7abd07f2
      Doron Tsur authored
      commit 0ca81a28 upstream.
      
      ib_send_cm_sidr_rep could sometimes erase the node from the sidr
      (depending on errors in the process). Since ib_send_cm_sidr_rep is
      called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
      could be either erased from the rb_tree twice or not erased at all.
      Fixing that by making sure it's erased only once before freeing
      cm_id_priv.
      
      Fixes: a977049d ('[PATCH] IB: Add the kernel CM implementation')
      Signed-off-by: default avatarDoron Tsur <doront@mellanox.com>
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      7abd07f2
    • Peter Zijlstra's avatar
      sched/core: Fix TASK_DEAD race in finish_task_switch() · a12321d3
      Peter Zijlstra authored
      commit 95913d97 upstream.
      
      So the problem this patch is trying to address is as follows:
      
              CPU0                            CPU1
      
              context_switch(A, B)
                                              ttwu(A)
                                                LOCK A->pi_lock
                                                A->on_cpu == 0
              finish_task_switch(A)
                prev_state = A->state  <-.
                WMB                      |
                A->on_cpu = 0;           |
                UNLOCK rq0->lock         |
                                         |    context_switch(C, A)
                                         `--  A->state = TASK_DEAD
                prev_state == TASK_DEAD
                  put_task_struct(A)
                                              context_switch(A, C)
                                              finish_task_switch(A)
                                                A->state == TASK_DEAD
                                                  put_task_struct(A)
      
      The argument being that the WMB will allow the load of A->state on CPU0
      to cross over and observe CPU1's store of A->state, which will then
      result in a double-drop and use-after-free.
      
      Now the comment states (and this was true once upon a long time ago)
      that we need to observe A->state while holding rq->lock because that
      will order us against the wakeup; however the wakeup will not in fact
      acquire (that) rq->lock; it takes A->pi_lock these days.
      
      We can obviously fix this by upgrading the WMB to an MB, but that is
      expensive, so we'd rather avoid that.
      
      The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
      which avoids the MB on some archs, but not important ones like ARM.
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: manfred@colorfullife.com
      Cc: will.deacon@arm.com
      Fixes: e4a52bcb ("sched: Remove rq->lock from the first half of ttwu()")
      Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.netSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      [lizf: Backported to 3.4: use smb_mb() instead of smp_store_release(), which
       is not defined in 3.4.y]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      a12321d3
    • Johannes Berg's avatar
      iwlwifi: dvm: fix D3 firmware PN programming · c2acc6aa
      Johannes Berg authored
      commit 5bd16687 upstream.
      
      The code to send the RX PN data (for each TID) to the firmware
      has a devastating bug: it overwrites the data for TID 0 with
      all the TID data, leaving the remaining TIDs zeroed. This will
      allow replays to actually be accepted by the firmware, which
      could allow waking up the system.
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarLuca Coelho <luciano.coelho@intel.com>
      [lizf: Backported to 3.4: adjust filename]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      c2acc6aa
    • NeilBrown's avatar
      md/raid0: apply base queue limits *before* disk_stack_limits · 55555bf1
      NeilBrown authored
      commit 66eefe5d upstream.
      
      Calling e.g. blk_queue_max_hw_sectors() after calls to
      disk_stack_limits() discards the settings determined by
      disk_stack_limits().
      So we need to make those calls first.
      
      Fixes: 199dc6ed ("md/raid0: update queue parameter in a safer location.")
      Reported-by: default avatarJes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.com>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      55555bf1
    • James Hogan's avatar
      MIPS: dma-default: Fix 32-bit fall back to GFP_DMA · 65cee714
      James Hogan authored
      commit 53960059 upstream.
      
      If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32
      zone, as is the case for some 32-bit kernels, then massage_gfp_flags()
      will cause DMA memory allocated for devices with a 32..63-bit
      coherent_dma_mask to fall back to using __GFP_DMA, even though there may
      only be 32-bits of physical address available anyway.
      
      Correct that case to compare against a mask the size of phys_addr_t
      instead of always using a 64-bit mask.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Fixes: a2e715a8 ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.")
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/9610/Signed-off-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      65cee714
    • Robert Jarzmik's avatar
      ASoC: fix broken pxa SoC support · f9c715ee
      Robert Jarzmik authored
      commit 3c8f7710 upstream.
      
      The previous fix of pxa library support, which was introduced to fix the
      library dependency, broke the previous SoC behavior, where a machine
      code binding pxa2xx-ac97 with a coded relied on :
       - sound/soc/pxa/pxa2xx-ac97.c
       - sound/soc/codecs/XXX.c
      
      For example, the mioa701_wm9713.c machine code is currently broken. The
      "select ARM" statement wrongly selects the soc/arm/pxa2xx-ac97 for
      compilation, as per an unfortunate fate SND_PXA2XX_AC97 is both declared
      in sound/arm/Kconfig and sound/soc/pxa/Kconfig.
      
      Fix this by ensuring that SND_PXA2XX_SOC correctly triggers the correct
      pxa2xx-ac97 compilation.
      
      Fixes: 846172df ("ASoC: fix SND_PXA2XX_LIB Kconfig warning")
      Signed-off-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      f9c715ee
    • Herbert Xu's avatar
      ipv6: Fix IPsec pre-encap fragmentation check · 244e0dbc
      Herbert Xu authored
      commit 93efac3f upstream.
      
      The IPv6 IPsec pre-encap path performs fragmentation for tunnel-mode
      packets.  That is, we perform fragmentation pre-encap rather than
      post-encap.
      
      A check was added later to ensure that proper MTU information is
      passed back for locally generated traffic.  Unfortunately this
      check was performed on all IPsec packets, including transport-mode
      packets.
      
      What's more, the check failed to take GSO into account.
      
      The end result is that transport-mode GSO packets get dropped at
      the check.
      
      This patch fixes it by moving the tunnel mode check forward as well
      as adding the GSO check.
      
      Fixes: dd767856 ("xfrm6: Don't call icmpv6_send on local error")
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarSteffen Klassert <steffen.klassert@secunet.com>
      [lizf: Backported to 3.4:
       - adjust context
       - s/ignore_df/local_df]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      244e0dbc
    • Peter Zijlstra's avatar
      module: Fix locking in symbol_put_addr() · d8776fff
      Peter Zijlstra authored
      commit 275d7d44 upstream.
      
      Poma (on the way to another bug) reported an assertion triggering:
      
        [<ffffffff81150529>] module_assert_mutex_or_preempt+0x49/0x90
        [<ffffffff81150822>] __module_address+0x32/0x150
        [<ffffffff81150956>] __module_text_address+0x16/0x70
        [<ffffffff81150f19>] symbol_put_addr+0x29/0x40
        [<ffffffffa04b77ad>] dvb_frontend_detach+0x7d/0x90 [dvb_core]
      
      Laura Abbott <labbott@redhat.com> produced a patch which lead us to
      inspect symbol_put_addr(). This function has a comment claiming it
      doesn't need to disable preemption around the module lookup
      because it holds a reference to the module it wants to find, which
      therefore cannot go away.
      
      This is wrong (and a false optimization too, preempt_disable() is really
      rather cheap, and I doubt any of this is on uber critical paths,
      otherwise it would've retained a pointer to the actual module anyway and
      avoided the second lookup).
      
      While its true that the module cannot go away while we hold a reference
      on it, the data structure we do the lookup in very much _CAN_ change
      while we do the lookup. Therefore fix the comment and add the
      required preempt_disable().
      Reported-by: default avatarpoma <pomidorabelisima@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Fixes: a6e6abd5 ("module: remove module_text_address()")
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      d8776fff
    • David Woodhouse's avatar
      x86/platform: Fix Geode LX timekeeping in the generic x86 build · 2bba66d6
      David Woodhouse authored
      commit 03da3ff1 upstream.
      
      In 2007, commit 07190a08 ("Mark TSC on GeodeLX reliable")
      bypassed verification of the TSC on Geode LX. However, this code
      (now in the check_system_tsc_reliable() function in
      arch/x86/kernel/tsc.c) was only present if CONFIG_MGEODE_LX was
      set.
      
      OpenWRT has recently started building its generic Geode target
      for Geode GX, not LX, to include support for additional
      platforms. This broke the timekeeping on LX-based devices,
      because the TSC wasn't marked as reliable:
      https://dev.openwrt.org/ticket/20531
      
      By adding a runtime check on is_geode_lx(), we can also include
      the fix if CONFIG_MGEODEGX1 or CONFIG_X86_GENERIC are set, thus
      fixing the problem.
      Signed-off-by: default avatarDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Andres Salomon <dilinger@queued.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marcelo Tosatti <marcelo@kvack.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1442409003.131189.87.camel@infradead.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      2bba66d6
    • Russell King's avatar
      ARM: fix Thumb2 signal handling when ARMv6 is enabled · 31a52644
      Russell King authored
      commit 9b55613f upstream.
      
      When a kernel is built covering ARMv6 to ARMv7, we omit to clear the
      IT state when entering a signal handler.  This can cause the first
      few instructions to be conditionally executed depending on the parent
      context.
      
      In any case, the original test for >= ARMv7 is broken - ARMv6 can have
      Thumb-2 support as well, and an ARMv6T2 specific build would omit this
      code too.
      
      Relax the test back to ARMv6 or greater.  This results in us always
      clearing the IT state bits in the PSR, even on CPUs where these bits
      are reserved.  However, they're reserved for the IT state, so this
      should cause no harm.
      
      Fixes: d71e1352 ("Clear the IT state when invoking a Thumb-2 signal handler")
      Acked-by: default avatarTony Lindgren <tony@atomide.com>
      Tested-by: default avatarH. Nikolaus Schaller <hns@goldelico.com>
      Tested-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      31a52644
    • T.J. Purtell's avatar
      ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode · b2e30785
      T.J. Purtell authored
      commit 6ecf830e upstream.
      
      The ARM architecture reference specifies that the IT state bits in the
      PSR must be all zeros in ARM mode or behavior is unspecified.  On the
      Qualcomm Snapdragon S4/Krait architecture CPUs the processor continues
      to consider the IT state bits while in ARM mode.  This makes it so
      that some instructions are skipped by the CPU.
      Signed-off-by: default avatarT.J. Purtell <tj@mobisocial.us>
      [rmk+kernel@arm.linux.org.uk: fixed whitespace formatting in patch]
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b2e30785
    • Arnaldo Carvalho de Melo's avatar
      perf header: Fixup reading of HEADER_NRCPUS feature · 98e57bab
      Arnaldo Carvalho de Melo authored
      commit caa47047 upstream.
      
      The original patch introducing this header wrote the number of CPUs available
      and online in one order and then swapped those values when reading, fix it.
      
      Before:
      
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 4
        # nrcpus avail : 4
        # echo 0 > /sys/devices/system/cpu/cpu2/online
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 4
        # nrcpus avail : 3
        # echo 0 > /sys/devices/system/cpu/cpu1/online
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 4
        # nrcpus avail : 2
      
      After the fix, bringing back the CPUs online:
      
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 2
        # nrcpus avail : 4
        # echo 1 > /sys/devices/system/cpu/cpu2/online
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 3
        # nrcpus avail : 4
        # echo 1 > /sys/devices/system/cpu/cpu1/online
        # perf record usleep 1
        # perf report --header-only | grep 'nrcpus \(online\|avail\)'
        # nrcpus online : 4
        # nrcpus avail : 4
      Acked-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: fbe96f29 ("perf tools: Make perf.data more self-descriptive (v8)")
      Link: http://lkml.kernel.org/r/20150911153323.GP23511@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      [lizf: Backported to 3.4: fix it by saving values in an array and then print
       it in reverse order]
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      98e57bab
    • Paul Mackerras's avatar
      powerpc/MSI: Fix race condition in tearing down MSI interrupts · b834fc16
      Paul Mackerras authored
      commit e297c939 upstream.
      
      This fixes a race which can result in the same virtual IRQ number
      being assigned to two different MSI interrupts.  The most visible
      consequence of that is usually a warning and stack trace from the
      sysfs code about an attempt to create a duplicate entry in sysfs.
      
      The race happens when one CPU (say CPU 0) is disposing of an MSI
      while another CPU (say CPU 1) is setting up an MSI.  CPU 0 calls
      (for example) pnv_teardown_msi_irqs(), which calls
      msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
      hardware IRQ number) is no longer in use.  Then, before CPU 0 gets
      to calling irq_dispose_mapping() to free up the virtal IRQ number,
      CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
      MSI, and gets the same hardware IRQ number that CPU 0 just freed.
      CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
      which sees that there is currently a mapping for that hardware IRQ
      number and returns the corresponding virtual IRQ number (which is
      the same virtual IRQ number that CPU 0 was using).  CPU 0 then
      calls irq_dispose_mapping() and frees that virtual IRQ number.
      Now, if another CPU comes along and calls irq_create_mapping(), it
      is likely to get the virtual IRQ number that was just freed,
      resulting in the same virtual IRQ number apparently being used for
      two different hardware interrupts.
      
      To fix this race, we just move the call to msi_bitmap_free_hwirqs()
      to after the call to irq_dispose_mapping().  Since virq_to_hw()
      doesn't work for the virtual IRQ number after irq_dispose_mapping()
      has been called, we need to call it before irq_dispose_mapping() and
      remember the result for the msi_bitmap_free_hwirqs() call.
      
      The pattern of calling msi_bitmap_free_hwirqs() before
      irq_dispose_mapping() appears in 5 places under arch/powerpc, and
      appears to have originated in commit 05af7bd2 ("[POWERPC] MPIC
      U3/U4 MSI backend") from 2007.
      
      Fixes: 05af7bd2 ("[POWERPC] MPIC U3/U4 MSI backend")
      Reported-by: default avatarAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      [bwh: Backported to 3.2:
       - powernv uses a private functions instead of msi_bitmap_free_hwirqs()
       - Adjust filename, context]
      Signed-off-by: default avatarBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      b834fc16
    • Ard Biesheuvel's avatar
      ARM: 8429/1: disable GCC SRA optimization · 59463bb2
      Ard Biesheuvel authored
      commit a077224f upstream.
      
      While working on the 32-bit ARM port of UEFI, I noticed a strange
      corruption in the kernel log. The following snprintf() statement
      (in drivers/firmware/efi/efi.c:efi_md_typeattr_format())
      
      	snprintf(pos, size, "|%3s|%2s|%2s|%2s|%3s|%2s|%2s|%2s|%2s]",
      
      was producing the following output in the log:
      
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |WB|WT|WC|UC]*
      	|    |   |   |   |    |WB|WT|WC|UC]
      	|RUN|   |   |   |    |   |   |   |UC]
      	|RUN|   |   |   |    |   |   |   |UC]
      
      As it turns out, this is caused by incorrect code being emitted for
      the string() function in lib/vsprintf.c. The following code
      
      	if (!(spec.flags & LEFT)) {
      		while (len < spec.field_width--) {
      			if (buf < end)
      				*buf = ' ';
      			++buf;
      		}
      	}
      	for (i = 0; i < len; ++i) {
      		if (buf < end)
      			*buf = *s;
      		++buf; ++s;
      	}
      	while (len < spec.field_width--) {
      		if (buf < end)
      			*buf = ' ';
      		++buf;
      	}
      
      when called with len == 0, triggers an issue in the GCC SRA optimization
      pass (Scalar Replacement of Aggregates), which handles promotion of signed
      struct members incorrectly. This is a known but as yet unresolved issue.
      (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932). In this particular
      case, it is causing the second while loop to be executed erroneously a
      single time, causing the additional space characters to be printed.
      
      So disable the optimization by passing -fno-ipa-sra.
      Acked-by: default avatarNicolas Pitre <nico@linaro.org>
      Signed-off-by: default avatarArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarZefan Li <lizefan@huawei.com>
      59463bb2