1. 09 Jun, 2014 13 commits
  2. 09 May, 2014 1 commit
    • Mel Gorman's avatar
      mm: use paravirt friendly ops for NUMA hinting ptes · dec1d8e2
      Mel Gorman authored
      commit 29c77870 upstream.
      
      David Vrabel identified a regression when using automatic NUMA balancing
      under Xen whereby page table entries were getting corrupted due to the
      use of native PTE operations.  Quoting him
      
      	Xen PV guest page tables require that their entries use machine
      	addresses if the preset bit (_PAGE_PRESENT) is set, and (for
      	successful migration) non-present PTEs must use pseudo-physical
      	addresses.  This is because on migration MFNs in present PTEs are
      	translated to PFNs (canonicalised) so they may be translated back
      	to the new MFN in the destination domain (uncanonicalised).
      
      	pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
      	set and clear the _PAGE_PRESENT bit using pte_set_flags(),
      	pte_clear_flags(), etc.
      
      	In a Xen PV guest, these functions must translate MFNs to PFNs
      	when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
      	_PAGE_PRESENT.
      
      His suggested fix converted p[te|md]_[set|clear]_flags to using
      paravirt-friendly ops but this is overkill.  He suggested an alternative
      of using p[te|md]_modify in the NUMA page table operations but this is
      does more work than necessary and would require looking up a VMA for
      protections.
      
      This patch modifies the NUMA page table operations to use paravirt
      friendly operations to set/clear the flags of interest.  Unfortunately
      this will take a performance hit when updating the PTEs on
      CONFIG_PARAVIRT but I do not see a way around it that does not break
      Xen.
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Acked-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Tested-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      BugLink: http://bugs.launchpad.net/bugs/1313450
      Cc: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      dec1d8e2
  3. 08 May, 2014 1 commit
  4. 07 May, 2014 1 commit
  5. 06 May, 2014 3 commits
    • Matthew Daley's avatar
      floppy: don't write kernel-only members to FDRAWCMD ioctl output · ac31585c
      Matthew Daley authored
      commit 2145e15e upstream.
      
      Do not leak kernel-only floppy_raw_cmd structure members to userspace.
      This includes the linked-list pointer and the pointer to the allocated
      DMA space.
      Signed-off-by: default avatarMatthew Daley <mattd@bugfuzz.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      References: CVE-2014-1738
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ac31585c
    • Matthew Daley's avatar
      floppy: ignore kernel-only members in FDRAWCMD ioctl input · 970b98f5
      Matthew Daley authored
      commit ef87dbe7 upstream.
      
      Always clear out these floppy_raw_cmd struct members after copying the
      entire structure from userspace so that the in-kernel version is always
      valid and never left in an interdeterminate state.
      Signed-off-by: default avatarMatthew Daley <mattd@bugfuzz.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      References: CVE-2014-1737
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      970b98f5
    • Peter Hurley's avatar
      n_tty: Fix n_tty_write crash when echoing in raw mode · 9a8bf49c
      Peter Hurley authored
      commit 4291086b upstream.
      
      The tty atomic_write_lock does not provide an exclusion guarantee for
      the tty driver if the termios settings are LECHO & !OPOST.  And since
      it is unexpected and not allowed to call TTY buffer helpers like
      tty_insert_flip_string concurrently, this may lead to crashes when
      concurrect writers call pty_write. In that case the following two
      writers:
      * the ECHOing from a workqueue and
      * pty_write from the process
      race and can overflow the corresponding TTY buffer like follows.
      
      If we look into tty_insert_flip_string_fixed_flag, there is:
        int space = __tty_buffer_request_room(port, goal, flags);
        struct tty_buffer *tb = port->buf.tail;
        ...
        memcpy(char_buf_ptr(tb, tb->used), chars, space);
        ...
        tb->used += space;
      
      so the race of the two can result in something like this:
                    A                                B
      __tty_buffer_request_room
                                        __tty_buffer_request_room
      memcpy(buf(tb->used), ...)
      tb->used += space;
                                        memcpy(buf(tb->used), ...) ->BOOM
      
      B's memcpy is past the tty_buffer due to the previous A's tb->used
      increment.
      
      Since the N_TTY line discipline input processing can output
      concurrently with a tty write, obtain the N_TTY ldisc output_lock to
      serialize echo output with normal tty writes.  This ensures the tty
      buffer helper tty_insert_flip_string is not called concurrently and
      everything is fine.
      
      Note that this is nicely reproducible by an ordinary user using
      forkpty and some setup around that (raw termios + ECHO). And it is
      present in kernels at least after commit
      d945cb9c (pty: Rework the pty layer to
      use the normal buffering logic) in 2.6.31-rc3.
      
      js: add more info to the commit log
      js: switch to bool
      js: lock unconditionally
      js: lock only the tty->ops->write call
      
      References: CVE-2014-0196
      Reported-and-tested-by: default avatarJiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9a8bf49c
  6. 01 May, 2014 21 commits
    • Mike Marciniszyn's avatar
      ib_srpt: Use correct ib_sg_dma primitives · ea594d5e
      Mike Marciniszyn authored
      commit b0768080 upstream.
      
      The code was incorrectly using sg_dma_address() and
      sg_dma_len() instead of ib_sg_dma_address() and
      ib_sg_dma_len().
      
      This prevents srpt from functioning with the
      Intel HCA and indeed will corrupt memory
      badly.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Tested-by: default avatarVinod Kumar <vinod.kumar@intel.com>
      Signed-off-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ea594d5e
    • Andy Grover's avatar
      target/tcm_fc: Fix use-after-free of ft_tpg · 52e7fdf3
      Andy Grover authored
      commit 2c42be2d upstream.
      
      ft_del_tpg checks tpg->tport is set before unlinking the tpg from the
      tport when the tpg is being removed. Set this pointer in ft_tport_create,
      or the unlinking won't happen in ft_del_tpg and tport->tpg will reference
      a deleted object.
      
      This patch sets tpg->tport in ft_tport_create, because that's what
      ft_del_tpg checks, and is the only way to get back to the tport to
      clear tport->tpg.
      
      The bug was occuring when:
      
      - lport created, tport (our per-lport, per-provider context) is
        allocated.
        tport->tpg = NULL
      - tpg created
      - a PRLI is received. ft_tport_create is called, tpg is found and
        tport->tpg is set
      - tpg removed. ft_tpg is freed in ft_del_tpg. Since tpg->tport was not
        set, tport->tpg is not cleared and points at freed memory
      - Future calls to ft_tport_create return tport via first conditional,
        instead of searching for new tpg by calling ft_lport_find_tpg.
        tport->tpg is still invalid, and will access freed memory.
      
      see https://bugzilla.redhat.com/show_bug.cgi?id=1071340Signed-off-by: default avatarAndy Grover <agrover@redhat.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      52e7fdf3
    • H. Peter Anvin's avatar
      x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels · b75a4dbf
      H. Peter Anvin authored
      commit b3b42ac2 upstream.
      
      The IRET instruction, when returning to a 16-bit segment, only
      restores the bottom 16 bits of the user space stack pointer.  We have
      a software workaround for that ("espfix") for the 32-bit kernel, but
      it relies on a nonzero stack segment base which is not available in
      32-bit mode.
      
      Since 16-bit support is somewhat crippled anyway on a 64-bit kernel
      (no V86 mode), and most (if not quite all) 64-bit processors support
      virtualization for the users who really need it, simply reject
      attempts at creating a 16-bit segment when running on top of a 64-bit
      kernel.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-kicdm89kzw9lldryb1br9od0@git.kernel.orgSigned-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      b75a4dbf
    • Rafał Miłecki's avatar
      b43: Fix machine check error due to improper access of B43_MMIO_PSM_PHY_HDR · 3d0b8927
      Rafał Miłecki authored
      commit 12cd43c6 upstream.
      
      Register B43_MMIO_PSM_PHY_HDR is 16 bit one, so accessing it with 32b
      functions isn't safe. On my machine it causes delayed (!) CPU exception:
      
      Disabling lock debugging due to kernel taint
      mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 4: b200000000070f0f
      mce: [Hardware Error]: TSC 164083803dc
      mce: [Hardware Error]: PROCESSOR 2:20fc2 TIME 1396650505 SOCKET 0 APIC 0 microcode 0
      mce: [Hardware Error]: Run the above through 'mcelog --ascii'
      mce: [Hardware Error]: Machine check: Processor context corrupt
      Kernel panic - not syncing: Fatal machine check on current CPU
      Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
      Signed-off-by: default avatarRafał Miłecki <zajec5@gmail.com>
      Acked-by: default avatarLarry Finger <Larry.Finger@lwfinger.net>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      3d0b8927
    • Hui Wang's avatar
      ALSA: hda - add headset mic detect quirk for a Dell laptop · 00ada678
      Hui Wang authored
      commit 137bcc33 upstream.
      
      When we plug a 3-ring headset on the Dell machine (VID: 0x10ec0283,
      SID: 0x10280667), the headset mic can't be detected, after apply this
      patch, the headset mic can work well.
      
      BugLink: https://bugs.launchpad.net/bugs/1297581
      Cc: David Henningsson <david.henningsson@canonical.com>
      Signed-off-by: default avatarHui Wang <hui.wang@canonical.com>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      00ada678
    • NeilBrown's avatar
      md/raid1: r1buf_pool_alloc: free allocate pages when subsequent allocation fails. · 37ab1401
      NeilBrown authored
      commit da1aab3d upstream.
      
      When performing a user-request check/repair (MD_RECOVERY_REQUEST is set)
      on a raid1, we allocate multiple bios each with their own set of pages.
      
      If the page allocations for one bio fails, we currently do *not* free
      the pages allocated for the previous bios, nor do we free the bio itself.
      
      This patch frees all the already-allocate pages, and makes sure that
      all the bios are freed as well.
      
      This bug can cause a memory leak which can ultimately OOM a machine.
      It was introduced in 3.10-rc1.
      
      Fixes: a0787606
      Cc: Kent Overstreet <koverstreet@google.com>
      Reported-by: default avatarRussell King - ARM Linux <linux@arm.linux.org.uk>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      37ab1401
    • Jens Axboe's avatar
      lib/percpu_counter.c: fix bad percpu counter state during suspend · 9064f8ee
      Jens Axboe authored
      commit e39435ce upstream.
      
      I got a bug report yesterday from Laszlo Ersek in which he states that
      his kvm instance fails to suspend.  Laszlo bisected it down to this
      commit 1cf7e9c6 ("virtio_blk: blk-mq support") where virtio-blk is
      converted to use the blk-mq infrastructure.
      
      After digging a bit, it became clear that the issue was with the queue
      drain.  blk-mq tracks queue usage in a percpu counter, which is
      incremented on request alloc and decremented when the request is freed.
      The initial hunt was for an inconsistency in blk-mq, but everything
      seemed fine.  In fact, the counter only returned crazy values when
      suspend was in progress.
      
      When a CPU is unplugged, the percpu counters merges that CPU state with
      the general state.  blk-mq takes care to register a hotcpu notifier with
      the appropriate priority, so we know it runs after the percpu counter
      notifier.  However, the percpu counter notifier only merges the state
      when the CPU is fully gone.  This leaves a state transition where the
      CPU going away is no longer in the online mask, yet it still holds
      private values.  This means that in this state, percpu_counter_sum()
      returns invalid results, and the suspend then hangs waiting for
      abs(dead-cpu-value) requests to complete which of course will never
      happen.
      
      Fix this by clearing the state earlier, so we never have a case where
      the CPU isn't in online mask but still holds private state.  This bug
      has been there since forever, I guess we don't have a lot of users where
      percpu counters needs to be reliable during the suspend cycle.
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Reported-by: default avatarLaszlo Ersek <lersek@redhat.com>
      Tested-by: default avatarLaszlo Ersek <lersek@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9064f8ee
    • Takashi Iwai's avatar
      ALSA: ice1712: Fix boundary checks in PCM pointer ops · e4430e8f
      Takashi Iwai authored
      commit 4f8e9400 upstream.
      
      PCM pointer callbacks in ice1712 driver check the buffer size boundary
      wrongly between bytes and frames.  This leads to PCM core warnings
      like:
         snd_pcm_update_hw_ptr0: 105 callbacks suppressed
         ALSA pcm_lib.c:352 BUG: pcmC3D0c:0, pos = 5461, buffer size = 5461, period size = 2730
      
      This patch fixes these checks to be placed after the proper unit
      conversions.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      e4430e8f
    • Russell King's avatar
      DRM: armada: fix corruption while loading cursors · ea2c40b5
      Russell King authored
      commit c39b0695 upstream.
      
      Loading cursors to the LCD controller's SRAM can be corrupted when the
      configured pixel clock is relatively slow.  This seems to be caused
      when we write back-to-back to the SRAM registers.
      
      There doesn't appear to be any status register we can read to check
      when an access has completed.
      
      Inserting a dummy read between the writes appears to fix the problem.
      Signed-off-by: default avatarRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarDave Airlie <airlied@redhat.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ea2c40b5
    • Liu Hua's avatar
      hung_task: check the value of "sysctl_hung_task_timeout_sec" · 6b59aed9
      Liu Hua authored
      commit 80df2847 upstream.
      
      As sysctl_hung_task_timeout_sec is unsigned long, when this value is
      larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
      watchdog will return immediately without sleep and with print :
      
        schedule_timeout: wrong timeout value ffffffffffffff83
      
      and then the funtion watchdog will call schedule_timeout_interruptible
      again and again.  The screen will be filled with
      
      	"schedule_timeout: wrong timeout value ffffffffffffff83"
      
      This patch does some check and correction in sysctl, to let the function
      schedule_timeout_interruptible allways get the valid parameter.
      Signed-off-by: default avatarLiu Hua <sdu.liu@huawei.com>
      Tested-by: default avatarSatoru Takeuchi <satoru.takeuchi@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      6b59aed9
    • Mizuma, Masayoshi's avatar
      mm: hugetlb: fix softlockup when a large number of hugepages are freed. · 9c046926
      Mizuma, Masayoshi authored
      commit 55f67141 upstream.
      
      When I decrease the value of nr_hugepage in procfs a lot, softlockup
      happens.  It is because there is no chance of context switch during this
      process.
      
      On the other hand, when I allocate a large number of hugepages, there is
      some chance of context switch.  Hence softlockup doesn't happen during
      this process.  So it's necessary to add the context switch in the
      freeing process as same as allocating process to avoid softlockup.
      
      When I freed 12 TB hugapages with kernel-2.6.32-358.el6, the freeing
      process occupied a CPU over 150 seconds and following softlockup message
      appeared twice or more.
      
      $ echo 6000000 > /proc/sys/vm/nr_hugepages
      $ cat /proc/sys/vm/nr_hugepages
      6000000
      $ grep ^Huge /proc/meminfo
      HugePages_Total:   6000000
      HugePages_Free:    6000000
      HugePages_Rsvd:        0
      HugePages_Surp:        0
      Hugepagesize:       2048 kB
      $ echo 0 > /proc/sys/vm/nr_hugepages
      
      BUG: soft lockup - CPU#16 stuck for 67s! [sh:12883] ...
      Pid: 12883, comm: sh Not tainted 2.6.32-358.el6.x86_64 #1
      Call Trace:
        free_pool_huge_page+0xb8/0xd0
        set_max_huge_pages+0x128/0x190
        hugetlb_sysctl_handler_common+0x113/0x140
        hugetlb_sysctl_handler+0x1e/0x20
        proc_sys_call_handler+0x97/0xd0
        proc_sys_write+0x14/0x20
        vfs_write+0xb8/0x1a0
        sys_write+0x51/0x90
        __audit_syscall_exit+0x265/0x290
        system_call_fastpath+0x16/0x1b
      
      I have not confirmed this problem with upstream kernels because I am not
      able to prepare the machine equipped with 12TB memory now.  However I
      confirmed that the amount of decreasing hugepages was directly
      proportional to the amount of required time.
      
      I measured required times on a smaller machine.  It showed 130-145
      hugepages decreased in a millisecond.
      
        Amount of decreasing     Required time      Decreasing rate
        hugepages                     (msec)         (pages/msec)
        ------------------------------------------------------------
        10,000 pages == 20GB         70 -  74          135-142
        30,000 pages == 60GB        208 - 229          131-144
      
      It means decrement of 6TB hugepages will trigger softlockup with the
      default threshold 20sec, in this decreasing rate.
      Signed-off-by: default avatarMasayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9c046926
    • Vlastimil Babka's avatar
      mm: try_to_unmap_cluster() should lock_page() before mlocking · 94a3e116
      Vlastimil Babka authored
      commit 57e68e9c upstream.
      
      A BUG_ON(!PageLocked) was triggered in mlock_vma_page() by Sasha Levin
      fuzzing with trinity.  The call site try_to_unmap_cluster() does not lock
      the pages other than its check_page parameter (which is already locked).
      
      The BUG_ON in mlock_vma_page() is not documented and its purpose is
      somewhat unclear, but apparently it serializes against page migration,
      which could otherwise fail to transfer the PG_mlocked flag.  This would
      not be fatal, as the page would be eventually encountered again, but
      NR_MLOCK accounting would become distorted nevertheless.  This patch adds
      a comment to the BUG_ON in mlock_vma_page() and munlock_vma_page() to that
      effect.
      
      The call site try_to_unmap_cluster() is fixed so that for page !=
      check_page, trylock_page() is attempted (to avoid possible deadlocks as we
      already have check_page locked) and mlock_vma_page() is performed only
      upon success.  If the page lock cannot be obtained, the page is left
      without PG_mlocked, which is again not a problem in the whole unevictable
      memory design.
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarBob Liu <bob.liu@oracle.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      94a3e116
    • Johannes Weiner's avatar
      mm: page_alloc: spill to remote nodes before waking kswapd · 7bd9fcd9
      Johannes Weiner authored
      commit 3a025760 upstream.
      
      On NUMA systems, a node may start thrashing cache or even swap anonymous
      pages while there are still free pages on remote nodes.
      
      This is a result of commits 81c0a2bb ("mm: page_alloc: fair zone
      allocator policy") and fff4068c ("mm: page_alloc: revert NUMA aspect
      of fair allocation policy").
      
      Before those changes, the allocator would first try all allowed zones,
      including those on remote nodes, before waking any kswapds.  But now,
      the allocator fastpath doubles as the fairness pass, which in turn can
      only consider the local node to prevent remote spilling based on
      exhausted fairness batches alone.  Remote nodes are only considered in
      the slowpath, after the kswapds are woken up.  But if remote nodes still
      have free memory, kswapd should not be woken to rebalance the local node
      or it may thrash cash or swap prematurely.
      
      Fix this by adding one more unfair pass over the zonelist that is
      allowed to spill to remote nodes after the local fairness pass fails but
      before entering the slowpath and waking the kswapds.
      
      This also gets rid of the GFP_THISNODE exemption from the fairness
      protocol because the unfair pass is no longer tied to kswapd, which
      GFP_THISNODE is not allowed to wake up.
      
      However, because remote spills can be more frequent now - we prefer them
      over local kswapd reclaim - the allocation batches on remote nodes could
      underflow more heavily.  When resetting the batches, use
      atomic_long_read() directly instead of zone_page_state() to calculate the
      delta as the latter filters negative counter values.
      Signed-off-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      7bd9fcd9
    • Martin Svec's avatar
      Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist · 9f66dd6c
      Martin Svec authored
      commit a1e1774c upstream.
      
      When compiled with CONFIG_DEBUG_SG set, uninitialized SGL leads
      to BUG() in compare_and_write_callback().
      Signed-off-by: default avatarMartin Svec <martin.svec@zoner.cz>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      9f66dd6c
    • Nicholas Bellinger's avatar
      iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug · bf879dc8
      Nicholas Bellinger authored
      commit d444edc6 upstream.
      
      This patch fixes a long-standing bug in iscsit_build_conn_drop_async_message()
      where during ERL=2 connection recovery, a bogus conn_p pointer could
      end up being used to send the ISCSI_OP_ASYNC_EVENT + DROPPING_CONNECTION
      notifying the initiator that cmd->logout_cid has failed.
      
      The bug was manifesting itself as an OOPs in iscsit_allocate_cmd() with
      a bogus conn_p pointer in iscsit_build_conn_drop_async_message().
      Reported-by: default avatarArshad Hussain <arshad.hussain@calsoftinc.com>
      Reported-by: default avatarsantosh kulkarni <santosh.kulkarni@calsoftinc.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      bf879dc8
    • Nicholas Bellinger's avatar
      iser-target: Add missing se_cmd put for WRITE_PENDING in tx_comp_err · 84a55168
      Nicholas Bellinger authored
      commit 03e7848a upstream.
      
      This patch fixes a bug where outstanding RDMA_READs with WRITE_PENDING
      status require an extra target_put_sess_cmd() in isert_put_cmd() code
      when called from isert_cq_tx_comp_err() + isert_cq_drain_comp_llist()
      context during session shutdown.
      
      The extra kref PUT is required so that transport_generic_free_cmd()
      invokes the last target_put_sess_cmd() -> target_release_cmd_kref(),
      which will complete(&se_cmd->cmd_wait_comp) the outstanding se_cmd
      descriptor with WRITE_PENDING status, and awake the completion in
      target_wait_for_sess_cmds() to invoke TFO->release_cmd().
      
      The bug was manifesting itself in target_wait_for_sess_cmds() where
      a se_cmd descriptor with WRITE_PENDING status would end up sleeping
      indefinately.
      Acked-by: default avatarSagi Grimberg <sagig@mellanox.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      [ kamal: backport to 3.13 (context) ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      84a55168
    • Michael Neuling's avatar
      powerpc/tm: Disable IRQ in tm_recheckpoint · 42d50463
      Michael Neuling authored
      commit e6b8fd02 upstream.
      
      We can't take an IRQ when we're about to do a trechkpt as our GPR state is set
      to user GPR values.
      
      We've hit this when running some IBM Java stress tests in the lab resulting in
      the following dump:
      
        cpu 0x3f: Vector: 700 (Program Check) at [c000000007eb3d40]
            pc: c000000000050074: restore_gprs+0xc0/0x148
            lr: 00000000b52a8184
            sp: ac57d360
           msr: 8000000100201030
          current = 0xc00000002c500000
          paca    = 0xc000000007dbfc00     softe: 0     irq_happened: 0x00
            pid   = 34535, comm = Pooled Thread #
        R00 = 00000000b52a8184   R16 = 00000000b3e48fda
        R01 = 00000000ac57d360   R17 = 00000000ade79bd8
        R02 = 00000000ac586930   R18 = 000000000fac9bcc
        R03 = 00000000ade60000   R19 = 00000000ac57f930
        R04 = 00000000f6624918   R20 = 00000000ade79be8
        R05 = 00000000f663f238   R21 = 00000000ac218a54
        R06 = 0000000000000002   R22 = 000000000f956280
        R07 = 0000000000000008   R23 = 000000000000007e
        R08 = 000000000000000a   R24 = 000000000000000c
        R09 = 00000000b6e69160   R25 = 00000000b424cf00
        R10 = 0000000000000181   R26 = 00000000f66256d4
        R11 = 000000000f365ec0   R27 = 00000000b6fdcdd0
        R12 = 00000000f66400f0   R28 = 0000000000000001
        R13 = 00000000ada71900   R29 = 00000000ade5a300
        R14 = 00000000ac2185a8   R30 = 00000000f663f238
        R15 = 0000000000000004   R31 = 00000000f6624918
        pc  = c000000000050074 restore_gprs+0xc0/0x148
        cfar= c00000000004fe28 dont_restore_vec+0x1c/0x1a4
        lr  = 00000000b52a8184
        msr = 8000000100201030   cr  = 24804888
        ctr = 0000000000000000   xer = 0000000000000000   trap =  700
      
      This moves tm_recheckpoint to a C function and moves the tm_restore_sprs into
      that function.  It then adds IRQ disabling over the trechkpt critical section.
      It also sets the TEXASR FS in the signals code to ensure this is never set now
      that we explictly write the TM sprs in tm_recheckpoint.
      Signed-off-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      42d50463
    • Takashi Iwai's avatar
      thinkpad_acpi: Fix inconsistent mute LED after resume · ca98f1e0
      Takashi Iwai authored
      commit 119f4498 upstream.
      
      The mute LED states have to be restored after resume.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70351Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMatthew Garrett <matthew.garrett@nebula.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      ca98f1e0
    • Joe Thornber's avatar
      dm cache: fix a lock-inversion · cb3706ff
      Joe Thornber authored
      commit 0596661f upstream.
      
      When suspending a cache the policy is walked and the individual policy
      hints written to the metadata via sync_metadata().  This led to this
      lock order:
      
            policy->lock
              cache_metadata->root_lock
      
      When loading the cache target the policy is populated while the metadata
      lock is held:
      
            cache_metadata->root_lock
               policy->lock
      
      Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
      ensuring the cache_metadata root_lock is held whilst all the hints are
      written, rather than being repeatedly locked while policy->lock is held
      (as was the case with each callout that policy_walk_mappings() made to
      the old save_hint() method).
      
      Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
      locking correctness") build option.  However, it is not clear how the
      LOCKDEP reported paths can lead to a deadlock since the two paths,
      suspending a target and loading a target, never occur at the same time.
      But that doesn't mean the same lock-inversion couldn't have occurred
      elsewhere.
      Reported-by: default avatarMarian Csontos <mcsontos@redhat.com>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      [ kamal: backport to 3.13: omitted cache_target version string change ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      cb3706ff
    • Giacomo Comes's avatar
      Skip intel_crt_init for Dell XPS 8700 · 8fff1d7a
      Giacomo Comes authored
      commit 10b6ee4a upstream.
      
      The Dell XPS 8700 has a onboard Display port and HDMI port and no VGA port.
      The call intel_crt_init freeze the machine, so skip such call.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73559
      Signed-off-by: Giacomo Comes <comes at naic.edu>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      8fff1d7a
    • Imre Deak's avatar
      drm/i915: move power domain init earlier during system resume · 670c7451
      Imre Deak authored
      commit 76c4b250 upstream.
      
      During resume the intel hda audio driver depends on the i915 driver
      reinitializing the audio power domain. Since the order of calling the
      i915 resume handler wrt. that of the audio driver is not guaranteed,
      move the power domain reinitialization step to the resume_early
      handler. This is guaranteed to run before the resume handler of any
      other driver.
      
      The power domain initialization in turn requires us to enable the i915
      pci device first, so move that part earlier too.
      
      Accordingly disabling of the i915 pci device should happen after the
      audio suspend handler ran. So move the disabling later from the i915
      resume handler to the resume_late handler.
      
      v2:
      - move intel_uncore_sanitize/early_sanitize earlier too, so they don't
        get reordered wrt. intel_power_domains_init_hw()
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76152Signed-off-by: default avatarImre Deak <imre.deak@intel.com>
      Reviewed-by: default avatarTakashi Iwai <tiwai@suse.de>
      [danvet: Add cc: stable and loud comments that this is just a hack.]
      [danvet: Fix "Should it be static?" sparse warning reported by Wu
      Fengguang's kbuilder.]
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      [ kamal: backport to 3.13: intel_power_domains_init_hw(dev) not (dev_priv) ]
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      670c7451