1. 16 Jul, 2014 40 commits
    • mm: vmscan: clear kswapd's special reclaim powers before exiting · e0592cd8
      Johannes Weiner authored
      commit 71abdc15 upstream.
      
      When kswapd exits, it can end up taking locks that were previously held
      by allocating tasks while they waited for reclaim.  Lockdep currently
      warns about this:
      
      On Wed, May 28, 2014 at 06:06:34PM +0800, Gu Zheng wrote:
      >  inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
      >  kswapd2/1151 [HC0[0]:SC0[0]:HE1:SE1] takes:
      >   (&sig->group_rwsem){+++++?}, at: exit_signals+0x24/0x130
      >  {RECLAIM_FS-ON-W} state was registered at:
      >     mark_held_locks+0xb9/0x140
      >     lockdep_trace_alloc+0x7a/0xe0
      >     kmem_cache_alloc_trace+0x37/0x240
      >     flex_array_alloc+0x99/0x1a0
      >     cgroup_attach_task+0x63/0x430
      >     attach_task_by_pid+0x210/0x280
      >     cgroup_procs_write+0x16/0x20
      >     cgroup_file_write+0x120/0x2c0
      >     vfs_write+0xc0/0x1f0
      >     SyS_write+0x4c/0xa0
      >     tracesys+0xdd/0xe2
      >  irq event stamp: 49
      >  hardirqs last  enabled at (49):  _raw_spin_unlock_irqrestore+0x36/0x70
      >  hardirqs last disabled at (48):  _raw_spin_lock_irqsave+0x2b/0xa0
      >  softirqs last  enabled at (0):  copy_process.part.24+0x627/0x15f0
      >  softirqs last disabled at (0):            (null)
      >
      >  other info that might help us debug this:
      >   Possible unsafe locking scenario:
      >
      >         CPU0
      >         ----
      >    lock(&sig->group_rwsem);
      >    <Interrupt>
      >      lock(&sig->group_rwsem);
      >
      >   *** DEADLOCK ***
      >
      >  no locks held by kswapd2/1151.
      >
      >  stack backtrace:
      >  CPU: 30 PID: 1151 Comm: kswapd2 Not tainted 3.10.39+ #4
      >  Call Trace:
      >    dump_stack+0x19/0x1b
      >    print_usage_bug+0x1f7/0x208
      >    mark_lock+0x21d/0x2a0
      >    __lock_acquire+0x52a/0xb60
      >    lock_acquire+0xa2/0x140
      >    down_read+0x51/0xa0
      >    exit_signals+0x24/0x130
      >    do_exit+0xb5/0xa50
      >    kthread+0xdb/0x100
      >    ret_from_fork+0x7c/0xb0
      
      This is because the kswapd thread is still marked as a reclaimer at the
      time of exit.  But because it is exiting, nobody is actually waiting on
      it to make reclaim progress anymore, and it's nothing but a regular
      thread at this point.  Be tidy and strip it of all its powers
      (PF_MEMALLOC, PF_SWAPWRITE, PF_KSWAPD, and the lockdep reclaim state)
      before returning from the thread function.
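
      A minimal sketch of that idea, close to (but not necessarily identical to)
      the upstream change in mm/vmscan.c: before kswapd() returns, drop the
      reclaim-related task flags and the lockdep reclaim state (tsk == current
      inside kswapd()).

        /* sketch: strip kswapd's reclaim powers before the thread exits */
        tsk->flags &= ~(PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD);
        current->reclaim_state = NULL;
        lockdep_clear_current_reclaim_state();

        return 0;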
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      e0592cd8
    • IB/umad: Fix use-after-free on close · a567a196
      Bart Van Assche authored
      commit 60e1751c upstream.
      
      Avoid that closing /dev/infiniband/umad<n> or /dev/infiniband/issm<n>
      triggers a use-after-free.  __fput() invokes f_op->release() before it
      invokes cdev_put().  Make sure that the ib_umad_device structure is
      freed by the cdev_put() call instead of f_op->release().  This avoids
      that changing the port mode from IB into Ethernet and back to IB
      followed by restarting opensmd triggers the following kernel oops:
      
          general protection fault: 0000 [#1] PREEMPT SMP
          RIP: 0010:[<ffffffff810cc65c>]  [<ffffffff810cc65c>] module_put+0x2c/0x170
          Call Trace:
           [<ffffffff81190f20>] cdev_put+0x20/0x30
           [<ffffffff8118e2ce>] __fput+0x1ae/0x1f0
           [<ffffffff8118e35e>] ____fput+0xe/0x10
           [<ffffffff810723bc>] task_work_run+0xac/0xe0
           [<ffffffff81002a9f>] do_notify_resume+0x9f/0xc0
           [<ffffffff814b8398>] int_signal+0x12/0x17
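
      A rough sketch of the approach described above (structure and names here are
      illustrative, not the exact driver code): anchor the lifetime of the umad
      device data on a kobject, make the cdevs children of it, and free the data
      only from the kobject release callback, i.e. after the final cdev_put().

        struct ib_umad_device {
                /* ... existing fields ... */
                struct kobject kobj;    /* lifetime anchor for the cdevs */
        };

        static void ib_umad_release_dev(struct kobject *kobj)
        {
                struct ib_umad_device *dev =
                        container_of(kobj, struct ib_umad_device, kobj);

                kfree(dev);     /* runs only after the last cdev reference is gone */
        }

        static struct kobj_type ib_umad_dev_ktype = {
                .release = ib_umad_release_dev,
        };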
      
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=75051
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      a567a196
    • powerpc/mm: Check paca psize is up to date for huge mappings · 9a128d0e
      Michael Ellerman authored
      commit 09567e7f upstream.
      
      We have a bug in our hugepage handling which exhibits as an infinite
      loop of hash faults. If the fault is being taken in the kernel it will
      typically trigger the softlockup detector, or the RCU stall detector.
      
      The bug is as follows:
      
       1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGE_TLB | MAP_ANONYMOUS ..)
       2. Slice code converts the slice psize to 16M.
       3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
          synchronises the mm->context with the paca->context. So the paca slice
          mask is updated to include the 16M slice.
       4. Either:
          * mmap() fails because there are no huge pages available.
          * mmap() succeeds and the mapping is then munmapped.
          In both cases the slice psize remains at 16M in both the paca & mm.
       5. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
       6. The slice psize is converted back to 64K. Because of the check on line 539
          of slice.c we DO NOT update the paca->context. The paca slice mask is now
          out of sync with the mm slice mask.
       7. User/kernel accesses 0xa0000000.
       8. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask**
          to create an SLB entry and inserts it in the SLB.
       9. With the 16M SLB entry in place the hardware does a hash lookup, no entry
          is found so a data access exception is generated.
      10. The data access handler calls do_page_fault() -> handle_mm_fault().
      11. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page().
      12. The hardware retries the access, there is still nothing in the hash table
          so once again a data access exception is generated.
      13. hash_page() calls into __hash_page_thp() and inserts a mapping in the
          hash. Although the THP mapping maps 16M the hashing is done using 64K
          as the segment page size.
      14. hash_page() returns immediately after calling __hash_page_thp(), skipping
          over the code at line 1125. Resulting in the mismatch between the
          paca->context and mm->context not being detected.
      15. The hardware retries the access, the hash it generates using the 16M
          SLB entry does NOT match the hash we inserted.
      16. We take another data access and go into __hash_page_thp().
      17. We see a valid entry in the hpte_slot_array and so we call updatepp()
          which succeeds.
      18. Goto 15.
      
      We could fix this in two ways. The first would be to remove or modify
      the check on line 539 of slice.c.
      
      The second option is to cause the check of paca psize in hash_page() on
      line 1125 to also be done for THP pages.
      
      We prefer the latter, because the check & update of the paca psize is
      not done until we know it's necessary. It's also done only on the
      current cpu, so we don't need to IPI all other cpus.
      
      Without further rearranging the code, the simplest fix is to pull out
      the code that checks paca psize and call it in two places. Firstly for
      THP/hugetlb, and secondly for other mappings as before.
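
      A sketch of that second option (helper shape only; treat the details as
      illustrative rather than the verbatim upstream diff): factor the paca psize
      check out of hash_page() into a helper that can be called both for
      THP/hugetlb faults and for ordinary mappings.

        static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
                                     int psize, bool user_region)
        {
                /* if the paca's cached slice info is stale, resync and rebolt the SLB */
                if (user_region && psize != get_paca_psize(ea)) {
                        get_paca()->context = mm->context;
                        slb_flush_and_rebolt();
                }
        }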
      
      Thanks to Dave Jones for trinity, which originally found this bug.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      9a128d0e
    • iscsi-target: Reject mutual authentication with reflected CHAP_C · 46f4bce6
      Nicholas Bellinger authored
      commit 1d2b60a5 upstream.
      
      This patch adds an explicit check in chap_server_compute_md5() to ensure
      the CHAP_C value received from the initiator during mutual authentication
      does not match the original CHAP_C provided by the target.
      
      This is in line with RFC-3720, section 8.2.1:
      
         Originators MUST NOT reuse the CHAP challenge sent by the Responder
         for the other direction of a bidirectional authentication.
         Responders MUST check for this condition and close the iSCSI TCP
         connection if it occurs.
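
      A minimal sketch of such a check (the variable names are hypothetical, not
      the exact ones used in chap_server_compute_md5()): compare the CHAP_C the
      initiator sent back against the challenge the target originally generated
      and fail the authentication on a match.

        /* reject a CHAP_C that merely reflects our own challenge */
        if (initiator_challenge_len == chap->challenge_len &&
            !memcmp(initiator_challenge, chap->challenge, chap->challenge_len)) {
                pr_err("initiator CHAP_C matches the originally sent CHAP_C,"
                       " failing mutual authentication\n");
                return -EINVAL;
        }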
      Reported-by: Tejas Vaykole <tejas.vaykole@calsoftinc.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      46f4bce6
    • rbd: use reference counts for image requests · 5a759539
      Alex Elder authored
      commit 0f2d5be7 upstream.
      
      Each image request contains a reference count, but to date it has
      not actually been used.  (I think this was just an oversight.) A
      recent report involving rbd failing an assertion shed light on why
      and where we need to use these reference counts.
      
      Every OSD request associated with an object request uses
      rbd_osd_req_callback() as its callback function.  That function will
      call a helper function (dependent on the type of OSD request) that
      will set the object request's "done" flag if appropriate.  If that
      "done" flag is set, the object request is
      passed to rbd_obj_request_complete().
      
      In rbd_obj_request_complete(), requests are processed in sequential
      order.  So if an object request completes before one of its
      predecessors in the image request, the completion is deferred.
      Otherwise, if it's a completing object's "turn" to be completed, it
      is passed to rbd_img_obj_end_request(), which records the result of
      the operation, accumulates transferred bytes, and so on.  Next, the
      successor to this request is checked and if it is marked "done",
      (deferred) completion processing is performed on that request, and
      so on.  If the last object request in an image request is completed,
      rbd_img_request_complete() is called, which (typically) destroys
      the image request.
      
      There is a race here, however.  The instant an object request is
      marked "done" it can be provided (by a thread handling completion of
      one of its predecessor operations) to rbd_img_obj_end_request(),
      which (for the last request) can then lead to the image request
      getting torn down.  And this can happen *before* that object has
      itself entered rbd_img_obj_end_request().  As a result, once it
      *does* enter that function, the image request (and even the object
      request itself) may have been freed and become invalid.
      
      All that's necessary to avoid this is to properly count references
      to the image requests.  We tear down an image request's object
      requests all at once--only when the entire image request has
      completed.  So there's no need for an image request to count
      references for its object requests.  However, we don't want an
      image request to go away until the last of its object requests
      has passed through rbd_img_obj_callback().  In other words,
      we don't want rbd_img_request_complete() to necessarily
      result in the image request being destroyed, because it may
      get called before we've finished processing on all of its
      object requests.
      
      So the fix is to add a reference to an image request for
      each of its object requests.  The reference can be viewed
      as representing an object request that has not yet finished
      its call to rbd_img_obj_callback().  That is emphasized by
      getting the reference right after assigning that as the image
      object's callback function.  The corresponding release of that
      reference is done at the end of rbd_img_obj_callback(), which
      every image object request passes through exactly once.
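
      A sketch of that scheme, assuming the already-present kref in struct
      rbd_img_request and a put helper along the lines of the existing
      rbd_img_request_put() (details illustrative):

        static void rbd_img_request_get(struct rbd_img_request *img_request)
        {
                kref_get(&img_request->kref);
        }

        /* when wiring up an object request: */
        obj_request->callback = rbd_img_obj_callback;
        rbd_img_request_get(img_request);       /* one reference per object request */

        /* ... and at the very end of rbd_img_obj_callback(): */
        rbd_img_request_put(img_request);       /* may finally destroy the image request */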
      Signed-off-by: Alex Elder <elder@linaro.org>
      Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      5a759539
    • ALSA: hda/realtek - Add support of ALC891 codec · edbaa560
      Kailang Yang authored
      commit b6c5fbad upstream.
      
      New codec support for ALC891.
      Signed-off-by: Kailang Yang <kailang@realtek.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      edbaa560
    • powerpc: 64bit sendfile is capped at 2GB · cb722eb4
      Anton Blanchard authored
      commit 5d73320a upstream.
      
      commit 8f9c0119 (compat: fs: Generic compat_sys_sendfile
      implementation) changed the PowerPC 64bit sendfile call from
      sys_sendfile64 to sys_sendfile.
      
      Unfortunately this broke sendfile of lengths greater than 2G because
      sys_sendfile caps at MAX_NON_LFS. Restore what we had previously which
      fixes the bug.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      cb722eb4
    • powerpc/serial: Use saner flags when creating legacy ports · 3070babc
      Benjamin Herrenschmidt authored
      commit c4cad90f upstream.
      
      We had a mix & match of flags used when creating legacy ports
      depending on where we found them in the device-tree. Among others
      we were missing UPF_SKIP_TEST for some kind of ISA ports which is
      a problem as quite a few UARTs out there don't support the loopback
      test (such as a lot of BMCs).
      
      Let's pick the set of flags used by the SoC code and generalize it
      which means autoconf, no loopback test, irq maybe shared and fixed
      port.
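
      In terms of the serial core's port flags, that generalized set looks
      roughly like this (a sketch, not the exact legacy_serial.c hunk):

        port->flags = UPF_BOOT_AUTOCONF /* autoconfigure the port type */
                    | UPF_SKIP_TEST     /* no loopback test (many BMC UARTs lack it) */
                    | UPF_SHARE_IRQ     /* the irq may be shared */
                    | UPF_FIXED_PORT;   /* fixed, non-relocatable port */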
      
      Sending to stable as the lack of UPF_SKIP_TEST is breaking
      serial on some machines, so I want this back into distros.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      3070babc
    • mm/memory-failure.c: support use of a dedicated thread to handle SIGBUS(BUS_MCEERR_AO) · 4a05b868
      Naoya Horiguchi authored
      commit 3ba08129 upstream.
      
      Currently the memory error handler handles action-optional errors in a
      deferred manner by default.  And if a recovery-aware application wants
      to handle them immediately, it can do so by setting the PF_MCE_EARLY flag.
      However, such a signal can be sent only to the main thread, so it's
      problematic if the application wants to have a dedicated thread to
      handle such signals.
      
      So this patch adds dedicated thread support to the memory error handler.  We
      have a PF_MCE_EARLY flag for each thread separately, so with this patch the
      AO signal is sent to the thread with the PF_MCE_EARLY flag set, not the main
      thread.  If you want to implement a dedicated thread, you call prctl()
      to set PF_MCE_EARLY on the thread.
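
      From userspace, a dedicated thread opts in through the existing PR_MCE_KILL
      prctl; a minimal sketch of such a thread (handle_sigbus() is an assumed
      application-provided SIGBUS handler, error handling omitted):

        #include <sys/prctl.h>
        #include <signal.h>
        #include <unistd.h>

        static void *mce_handler_thread(void *arg)
        {
                struct sigaction sa = {
                        .sa_sigaction = handle_sigbus,
                        .sa_flags = SA_SIGINFO,
                };

                sigaction(SIGBUS, &sa, NULL);
                /* mark only this thread as an early-kill target (sets PF_MCE_EARLY) */
                prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);

                for (;;)
                        pause();        /* wait for SIGBUS with si_code == BUS_MCEERR_AO */
                return NULL;
        }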
      
      Memory error handler collects processes to be killed, so this patch lets
      it check PF_MCE_EARLY flag on each thread in the collecting routines.
      
      No behavioral change for all non-early kill cases.
      
      Tony said:
      
      : The old behavior was crazy - someone with a multithreaded process might
      : well expect that if they call prctl(PF_MCE_EARLY) in just one thread, then
      : that thread would see the SIGBUS with si_code = BUS_MCEERR_AO - even if
      : that thread wasn't the main thread for the process.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: Kamil Iskra <iskra@mcs.anl.gov>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chen Gong <gong.chen@linux.jf.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      4a05b868
    • mm/memory-failure.c: don't let collect_procs() skip over processes for MF_ACTION_REQUIRED · 09a11b80
      Tony Luck authored
      commit 74614de1 upstream.
      
      When Linux sees an "action optional" machine check (where h/w has reported
      an error that is not in the current execution path) we generally do not
      want to signal a process, since most processes do not have a SIGBUS
      handler - we'd just prematurely terminate the process for a problem that
      they might never actually see.
      
      task_early_kill() decides whether to consider a process - and it checks
      whether this specific process has been marked for early signals with
      "prctl", or if the system administrator has requested early signals for
      all processes using /proc/sys/vm/memory_failure_early_kill.
      
      But for the MF_ACTION_REQUIRED case we must not defer.  The error is in the
      execution path of the current thread so we must send the SIGBUS
      immediately.
      
      Fix by passing a flag argument through collect_procs*() to
      task_early_kill() so it knows whether we can defer or must take action.
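
      A simplified sketch of what the flagged helper ends up looking like (the
      real function is a little more involved, especially after the follow-up
      patch below):

        static int task_early_kill(struct task_struct *tsk, int force_early)
        {
                if (!tsk->mm)
                        return 0;
                if (force_early)        /* MF_ACTION_REQUIRED: never defer */
                        return 1;
                if (tsk->flags & PF_MCE_PROCESS)
                        return !!(tsk->flags & PF_MCE_EARLY);
                return sysctl_memory_failure_early_kill;
        }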
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chen Gong <gong.chen@linux.jf.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      09a11b80
    • mm/memory-failure.c-failure: send right signal code to correct thread · 906d967d
      Tony Luck authored
      commit a70ffcac upstream.
      
      When a thread in a multi-threaded application hits a machine check because
      of an uncorrectable error in memory - we want to send the SIGBUS with
      si.si_code = BUS_MCEERR_AR to that thread.  Currently we fail to do that
      if the active thread is not the primary thread in the process.
      collect_procs() just finds primary threads and this test:
      
      	if ((flags & MF_ACTION_REQUIRED) && t == current) {
      
      will see that the thread we found isn't the current thread and so send a
      si.si_code = BUS_MCEERR_AO to the primary (and nothing to the active
      thread at this time).
      
      We can fix this by checking whether "current" shares the same mm with the
      process that collect_procs() said owned the page.  If so, we send the
      SIGBUS to current (with code BUS_MCEERR_AR).
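
      Conceptually the fix boils down to something like this when choosing the
      signal target (a simplified sketch, not the literal hunk):

        /* prefer the faulting thread itself when it shares the victim mm */
        if ((flags & MF_ACTION_REQUIRED) && current->mm == t->mm)
                t = current;    /* t now gets SIGBUS with si_code = BUS_MCEERR_AR */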
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: Otto Bruggeman <otto.g.bruggeman@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chen Gong <gong.chen@linux.jf.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      906d967d
    • mm: page_alloc: use word-based accesses for get/set pageblock bitmaps · 51a5edbf
      Mel Gorman authored
      commit e58469ba upstream.
      
      The test_bit operations in get/set pageblock flags are expensive.  This
      patch reads the bitmap on a word basis and uses shifts and masks to isolate
      the bits of interest.  Similarly, masks are used to set a local copy of the
      bitmap, and cmpxchg is then used to update the bitmap if no other changes
      have been made in parallel.
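
      The setter side of that scheme is essentially a compare-and-exchange loop
      over the containing word (a sketch of the technique, not the exact
      page_alloc.c code; bitidx/word_bitidx locate the pageblock's bits):

        unsigned long *bitmap, word, old_word, mask;

        bitmap = get_pageblock_bitmap(zone, pfn);
        mask   = MIGRATETYPE_MASK << bitidx;
        flags <<= bitidx;

        word = ACCESS_ONCE(bitmap[word_bitidx]);
        for (;;) {
                old_word = cmpxchg(&bitmap[word_bitidx], word,
                                   (word & ~mask) | flags);
                if (old_word == word)
                        break;          /* nobody raced with us, update done */
                word = old_word;        /* retry against the newer value */
        }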
      
      In a test running dd onto tmpfs the overhead of the pageblock-related
      functions went from 1.27% in profiles to 0.5%.
      
      In addition to the performance benefits, this patch closes races that are
      possible between:
      
      a) get_ and set_pageblock_migratetype(), where get_pageblock_migratetype()
         reads part of the bits before and other part of the bits after
         set_pageblock_migratetype() has updated them.
      
      b) set_pageblock_migratetype() and set_pageblock_skip(), where the non-atomic
         read-modify-update set bit operation in set_pageblock_skip() will cause
         lost updates to some bits changed in the set_pageblock_migratetype().
      
      Joonsoo Kim first reported the case a) via code inspection.  Vlastimil
      Babka's testing with a debug patch showed that either a) or b) occurs
      roughly once per mmtests' stress-highalloc benchmark (although not
      necessarily in the same pageblock).  Furthermore, during development of
      unrelated compaction patches, it was observed that with frequent calls to
      {start,undo}_isolate_page_range() the race occurs several thousand
      times and has resulted in NULL pointer dereferences in move_freepages()
      and free_one_page() in places where free_list[migratetype] is
      manipulated by e.g.  list_move().  Further debugging confirmed that
      migratetype had invalid value of 6, causing out of bounds access to the
      free_list array.
      
      That confirmed that the race exists, although it may be extremely rare,
      and currently only fatal where page isolation is performed due to
      memory hot remove.  Races on pageblocks being updated by
      set_pageblock_migratetype(), where both old and new migratetype are
      lower than MIGRATE_RESERVE, currently cannot result in an invalid value
      being observed, although theoretically they may still lead to
      unexpected creation or destruction of MIGRATE_RESERVE pageblocks.
      Furthermore, things could get suddenly worse when memory isolation is
      used more, or when new migratetypes are added.
      
      After this patch, the race has no longer been observed in testing.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [ kamal: backport to 3.13-stable: context ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      51a5edbf
    • memcg: do not hang on OOM when killed by userspace OOM access to memory reserves · 6d38ef23
      Michal Hocko authored
      commit d8dc595c upstream.
      
      Eric has reported that he can see task(s) stuck in memcg OOM handler
      regularly.  The only way out is to
      
      	echo 0 > $GROUP/memory.oom_control
      
      His usecase is:
      
      - Setup a hierarchy with memory and the freezer (disable kernel oom and
        have a process watch for oom).
      
      - In that memory cgroup add a process with one thread per cpu.
      
      - In one thread, slowly allocate once per second (I think it is 16M of ram)
        and mlock and dirty it (just to force the pages into ram and keep them
        there).
      
      - When oom is achieved loop:
        * attempt to freeze all of the tasks.
        * if frozen send every task SIGKILL, unfreeze, remove the directory in
          cgroupfs.
      
      Eric has then pinpointed the issue to be memcg specific.
      
      All tasks are sitting on the memcg_oom_waitq when memcg oom is disabled.
      Those that have received fatal signal will bypass the charge and should
      continue on their way out.  The tricky part is that the exit path might
      trigger a page fault (e.g.  exit_robust_list), thus the memcg charge,
      while its memcg is still under OOM because nobody has released any charges
      yet.
      
      Unlike with the in-kernel OOM handler, the exiting task doesn't get
      TIF_MEMDIE set, so it doesn't shortcut further charges of the killed task
      and falls into the memcg OOM again without any way out of it, as there are
      no fatal signals pending anymore.
      
      This patch fixes the issue by checking PF_EXITING early in
      mem_cgroup_try_charge and bypass the charge same as if it had fatal
      signal pending or TIF_MEMDIE set.
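
      A sketch of that check, placed next to the existing fatal-signal/TIF_MEMDIE
      bypass in the charge path (simplified):

        if (unlikely(test_thread_flag(TIF_MEMDIE) ||
                     fatal_signal_pending(current) ||
                     current->flags & PF_EXITING))
                goto bypass;    /* dying or exiting tasks skip the charge */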
      
      Normally exiting tasks (i.e. not killed) will bypass the charge now, but
      this should be OK as the task is leaving and will release memory, and
      increasing the memory pressure just to release it a moment later seems like
      a dubious waste of cycles.  Besides that, charges after exit_signals should
      be rare.
      
      I am bringing this patch again (rebased on the current mmotm tree). I
      hope we can move forward finally. If there is still an opposition then
      I would really appreciate a concurrent approach so that we can discuss
      alternatives.
      
      http://comments.gmane.org/gmane.linux.kernel.stable/77650 is a reference
      to the followup discussion when the patch has been dropped from the mmotm
      last time.
      Reported-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      [ kamal: backport to 3.13: whitespace ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      6d38ef23
    • mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL · 08f3f30e
      Mel Gorman authored
      commit 675becce upstream.
      
      throttle_direct_reclaim() is meant to trigger during swap-over-network
      during which the min watermark is treated as a pfmemalloc reserve.  It
      throttles on the first node in the zonelist, but this is flawed.
      
      The user-visible impact is that a process running on CPU whose local
      memory node has no ZONE_NORMAL will stall for prolonged periods of time,
      possibly indefinitely.  This is due to throttle_direct_reclaim thinking the
      pfmemalloc reserves are depleted when in fact they don't exist on that
      node.
      
      On a NUMA machine running a 32-bit kernel (I know) allocation requests
      from CPUs on node 1 would detect no pfmemalloc reserves and the process
      gets throttled.  This patch adjusts throttling of direct reclaim to
      throttle based on the first node in the zonelist that has a usable
      ZONE_NORMAL or lower zone.
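
      A sketch of the adjusted zonelist walk in throttle_direct_reclaim()
      (simplified; the real code also handles the case where no suitable zone is
      found):

        for_each_zone_zonelist_nodemask(zone, z, zonelist,
                                        gfp_zone(gfp_mask), nodemask) {
                if (zone_idx(zone) > ZONE_NORMAL)
                        continue;       /* skip nodes with only higher zones */

                /* throttle against the first node with a usable lower zone */
                pgdat = zone->zone_pgdat;
                if (pfmemalloc_watermark_ok(pgdat))
                        goto out;
                break;
        }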
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      08f3f30e
    • kthread: fix return value of kthread_create() upon SIGKILL. · f0fa26c5
      Tetsuo Handa authored
      commit 8fe6929c upstream.
      
      Commit 786235ee ("kthread: make kthread_create() killable") meant
      for allowing kthread_create() to abort as soon as killed by the
      OOM-killer.  But returning -ENOMEM is wrong if killed by SIGKILL from
      userspace.  Change kthread_create() to return -EINTR upon SIGKILL.
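
      In kthread_create_on_node() the change is essentially confined to the
      killable wait, roughly (a sketch):

        if (unlikely(wait_for_completion_killable(&done))) {
                /*
                 * SIGKILL arrived while waiting for kthreadd; report -EINTR
                 * rather than pretending the allocation failed with -ENOMEM.
                 */
                if (xchg(&create->done, NULL))
                        return ERR_PTR(-EINTR);
                wait_for_completion(&done);
        }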
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      f0fa26c5
    • hugetlb: restrict hugepage_migration_support() to x86_64 · 7a4e1a3a
      Naoya Horiguchi authored
      commit c177c81e upstream.
      
      Currently hugepage migration is available for all archs which support
      pmd-level hugepage, but testing is done only for x86_64 and there're
      bugs for other archs.  So to avoid breaking such archs, this patch
      limits the availability strictly to x86_64 until developers of other
      archs get interested in enabling this feature.
      
      Simply disabling hugepage migration on non-x86_64 archs is not enough to
      fix the reported problem where sys_move_pages() hits the BUG_ON() in
      follow_page(FOLL_GET), so let's fix this by checking if hugepage
      migration is supported in vma_migratable().
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      7a4e1a3a
    • mm: fix sleeping function warning from __put_anon_vma · 93e4ba12
      Hugh Dickins authored
      commit 7f39dda9 upstream.
      
      Trinity reports BUG:
      
        sleeping function called from invalid context at kernel/locking/rwsem.c:47
        in_atomic(): 0, irqs_disabled(): 0, pid: 5787, name: trinity-c27
      
      __might_sleep < down_write < __put_anon_vma < page_get_anon_vma <
      migrate_pages < compact_zone < compact_zone_order < try_to_compact_pages ..
      
      Right, since conversion to mutex then rwsem, we should not put_anon_vma()
      from inside an rcu_read_lock()ed section: fix the two places that did so.
      And add might_sleep() to anon_vma_free(), as suggested by Peter Zijlstra.
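
      The shape of the fix is simply to reorder the cleanup so the reference is
      dropped outside the RCU read-side section (a sketch), with the new
      might_sleep() in anon_vma_free() catching similar mistakes early:

        /* before: put_anon_vma() was called while still inside rcu_read_lock() */
        rcu_read_unlock();
        put_anon_vma(anon_vma);         /* may now sleep taking the rwsem */
        return NULL;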
      
      Fixes: 88c22088 ("mm: optimize page_lock_anon_vma() fast-path")
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      93e4ba12
    • ahci: Add Device ID for HighPoint RocketRaid 642L · 149f9a46
      Jérôme Carretero authored
      commit d2518365 upstream.
      
      This device normally comes with a proprietary driver, using a web GUI
      to configure RAID:
       http://www.highpoint-tech.com/USA_new/series_rr600-download.htm
      But thankfully it also works out of the box with the AHCI driver,
      being just a Marvell 88SE9235.
      
      Devices 640L, 644L, 644LS should also be supported but were not tested here.
      Signed-off-by: Jérôme Carretero <cJ-ko@zougloub.eu>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      149f9a46
    • drm/radeon: only apply hdmi bpc pll flags when encoder mode is hdmi · f8547ac3
      Alex Deucher authored
      commit 7d5ab300 upstream.
      
      May fix display issues with non-HDMI displays.
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      f8547ac3
    • drm/radeon/atom: fix dithering on certain panels · 8a8d34ac
      Alex Deucher authored
      commit 64252835 upstream.
      
      We need to specify the encoder mode as LVDS for eDP
      when using the Crtc_Source atom table in order to properly
      set up the FMT hardware.
      
      bug: https://bugs.freedesktop.org/show_bug.cgi?id=73911
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      8a8d34ac
    • drm/radeon/dp: fix lane/clock setup for dp 1.2 capable devices · 3d7204e7
      Alex Deucher authored
      commit 3b6d9fd2 upstream.
      
      Only DCE5+ asics support DP 1.2.
      
      Noticed by ArtForz on IRC.
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      3d7204e7
    • drm/radeon: fix typo in radeon_connector_is_dp12_capable() · e89cd293
      Alex Deucher authored
      commit af5d3653 upstream.
      
      We were checking the ext clock rather than the display clock.
      
      Noticed by ArtForz on IRC.
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      e89cd293
    • vgaswitcheroo: switch the mux to the igp on power down when runpm is enabled · 43c93b6e
      Alex Deucher authored
      commit f2bc5616 upstream.
      
      Avoids blank screens on muxed systems when runpm is active.
      
      bug: https://bugs.freedesktop.org/show_bug.cgi?id=75917
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      43c93b6e
    • Bluetooth: Fix L2CAP deadlock · 4ebf55bc
      Jukka Taimisto authored
      commit 8a96f3cd upstream.
      
      -[0x01 Introduction
      
      We have found a programming error causing a deadlock in the Bluetooth
      subsystem of the Linux kernel. The problem is caused by a missing
      release_sock() call when L2CAP connection creation fails due to a full
      accept queue.
      
      The issue can be reproduced with 3.15-rc5 kernel and is also present in
      earlier kernels.
      
      -[0x02 Details
      
      The problem occurs when multiple L2CAP connections are created to a PSM which
      has a listening socket (like SDP) and are left pending, for example, in
      configuration (the underlying ACL link is not disconnected between
      connections).
      
      When an L2CAP connection request is received and a listening socket is found,
      the l2cap_sock_new_connection_cb() function (net/bluetooth/l2cap_sock.c) is called.
      This function locks the 'parent' socket and then checks if the accept queue
      is full.
      
      1178         lock_sock(parent);
      1179
      1180         /* Check for backlog size */
      1181         if (sk_acceptq_is_full(parent)) {
      1182                 BT_DBG("backlog full %d", parent->sk_ack_backlog);
      1183                 return NULL;
      1184         }
      
      In case the accept queue is full, NULL is returned but the 'parent' socket
      is not released. Thus when the next L2CAP connection request is received, the
      code blocks on lock_sock() since the parent is still locked.
      
      Also note that for connections already established and waiting for
      configuration to complete a timeout will occur and l2cap_chan_timeout()
      (net/bluetooth/l2cap_core.c) will be called. All threads calling this
      function will also be blocked waiting for the channel mutex since the thread
      which is waiting on lock_sock() already holds the channel mutex.
      
      We were able to reproduce this by continuously sending L2CAP connection
      requests followed by disconnection requests containing an invalid CID. This
      left the created connections pending configuration.
      
      After the deadlock occurs it is impossible to kill bluetoothd, btmon will not
      get any more data etc. requiring reboot to recover.
      
      -[0x03 Fix
      
      Releasing the 'parent' socket when l2cap_sock_new_connection_cb() returns NULL
      seems to fix the issue.
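
      Applied to the snippet quoted above, the fix is a one-liner (sketch):

        /* Check for backlog size */
        if (sk_acceptq_is_full(parent)) {
                BT_DBG("backlog full %d", parent->sk_ack_backlog);
                release_sock(parent);   /* drop the lock before bailing out */
                return NULL;
        }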
      Signed-off-by: Jukka Taimisto <jtt@codenomicon.com>
      Reported-by: Tommi Mäkilä <tmakila@codenomicon.com>
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      4ebf55bc
    • UBIFS: Remove incorrect assertion in shrink_tnc() · ba2706aa
      hujianyang authored
      commit 72abc8f4 upstream.
      
      I hit the same assertion failure that Dolev Raviv reported in kernel v3.10.
      It looks like this:
      
      [ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297)
      [ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G           O 3.10.40 #1
      [ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24)
      [ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28)
      [ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs])
      [ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs])
      [ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8)
      [ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544)
      [ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398)
      [ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8)
      [ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238)
      [ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c)
      [ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188)
      [ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs])
      [ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs])
      [ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418)
      [ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530)
      [ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54)
      [ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184)
      [ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac)
      [ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0)
      [ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40)
      
      After analyzing the code, I found a condition that may cause this failure
      during correct operation. Thus, I think this assertion is wrong and should be
      removed.
      
      Suppose there are two clean znodes and one dirty znode in the TNC, so the
      per-filesystem atomic_t @clean_zn_cnt is (2). If a commit starts, the dirty
      znode is set to COW_ZNODE in get_znodes_to_commit() because of potential ops
      on this znode. We clear the COW bit and the DIRTY bit in write_index() without
      @tnc_mutex locked, and we do not increase @clean_zn_cnt at this point. As the
      comments in write_index() show, if another process holds @tnc_mutex and
      dirties this znode after we clean it, @clean_zn_cnt would be decreased to (1).
      We later increase @clean_zn_cnt to (2) with @tnc_mutex locked in
      free_obsolete_znodes() to keep it right.
      
      If shrink_tnc() runs between the decrease and the increase, it will release
      the other 2 clean znodes it holds, find that @clean_zn_cnt is less than zero
      (1 - 2 = -1), and then hit the assertion. Because free_obsolete_znodes() will
      soon correct @clean_zn_cnt and there is no harm to the fs in this case, I think
      this assertion could be removed.
      
      2 clean znodes and 1 dirty znode, @clean_zn_cnt == 2
      
      Thread A (commit)         Thread B (write or others)       Thread C (shrinker)
      ->write_index
         ->clear_bit(DIRTY_NODE)
         ->clear_bit(COW_ZNODE)
      
                  @clean_zn_cnt == 2
                                ->mutex_locked(&tnc_mutex)
                                ->dirty_cow_znode
                                    ->!ubifs_zn_cow(znode)
                                    ->!test_and_set_bit(DIRTY_NODE)
                                    ->atomic_dec(&clean_zn_cnt)
                                ->mutex_unlocked(&tnc_mutex)
      
                  @clean_zn_cnt == 1
                                                                 ->mutex_locked(&tnc_mutex)
                                                                 ->shrink_tnc
                                                                   ->destroy_tnc_subtree
                                                                   ->atomic_sub(&clean_zn_cnt, 2)
                                                                   ->ubifs_assert  <- hit
                                                                 ->mutex_unlocked(&tnc_mutex)
      
                  @clean_zn_cnt == -1
      ->mutex_lock(&tnc_mutex)
      ->free_obsolete_znodes
         ->atomic_inc(&clean_zn_cnt)
      ->mutex_unlock(&tnc_mutex)
      
                  @clean_zn_cnt == 0 (correct after shrink)
      Signed-off-by: hujianyang <hujianyang@huawei.com>
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      ba2706aa
    • ASoC: tlv320aci3x: Fix custom snd_soc_dapm_put_volsw_aic3x() function · f2d5bb37
      Peter Ujfalusi authored
      commit e6c111fa upstream.
      
      For some unknown reason the parameters for snd_soc_test_bits() were in the wrong
      order:
      It was:
      snd_soc_test_bits(codec, val, mask, reg); /* WRONG!!! */
      while it should be:
      snd_soc_test_bits(codec, reg, mask, val);
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
      Signed-off-by: Mark Brown <broonie@linaro.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      f2d5bb37
    • nfsd: getattr for FATTR4_WORD0_FILES_AVAIL needs the statfs buffer · 3127268e
      Christoph Hellwig authored
      commit 12337901 upstream.
      
      Note nobody's ever noticed because the typical client probably never
      requests FILES_AVAIL without also requesting something else on the list.
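
      The fix amounts to adding FATTR4_WORD0_FILES_AVAIL to the attribute mask
      that triggers the vfs_statfs() call in nfsd4_encode_fattr(), along these
      lines (a sketch):

        if ((bmval0 & (FATTR4_WORD0_FILES_AVAIL | FATTR4_WORD0_FILES_FREE |
                       FATTR4_WORD0_FILES_TOTAL)) ||
            (bmval1 & (FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE |
                       FATTR4_WORD1_SPACE_TOTAL))) {
                err = vfs_statfs(&path, &statfs);
                if (err)
                        goto out_nfserr;
        }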
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      3127268e
    • block: virtio_blk: don't hold spin lock during world switch · c1ce02b2
      Ming Lei authored
      commit e8edca6f upstream.
      
      Firstly, it isn't necessary to hold vblk->vq_lock
      when notifying the hypervisor about queued I/O.

      Secondly, virtqueue_notify() causes a world switch and
      may take a long time on some hypervisors (such as qemu-arm),
      so it isn't good to hold the lock and block other vCPUs.
      
      On arm64 quad core VM(qemu-kvm), the patch can increase I/O
      performance a lot with VIRTIO_RING_F_EVENT_IDX enabled:
      	- without the patch: 14K IOPS
      	- with the patch: 34K IOPS
      
      fio script:
      	[global]
      	direct=1
      	bsrange=4k-4k
      	timeout=10
      	numjobs=4
      	ioengine=libaio
      	iodepth=64
      
      	filename=/dev/vdc
      	group_reporting=1
      
      	[f1]
      	rw=randread
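
      The request path then takes the shape below: prepare the kick while holding
      the lock, but do the actual notify after dropping it (a sketch of the
      pattern, not the exact virtio_blk hunk):

        bool notify;
        unsigned long flags;

        spin_lock_irqsave(&vblk->vq_lock, flags);
        /* ... queue the request on the virtqueue ... */
        notify = virtqueue_kick_prepare(vblk->vq);
        spin_unlock_irqrestore(&vblk->vq_lock, flags);

        if (notify)
                virtqueue_notify(vblk->vq);     /* the world switch now happens lock-free */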
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      c1ce02b2
    • MIPS: KVM: Allocate at least 16KB for exception handlers · ed89c43b
      James Hogan authored
      commit 7006e2df upstream.
      
      Each MIPS KVM guest has its own copy of the KVM exception vector. This
      contains the TLB refill exception handler at offset 0x000, the general
      exception handler at offset 0x180, and interrupt exception handlers at
      offset 0x200 in case Cause_IV=1. A common handler is copied to offset
      0x2000 and offset 0x3000 is used for temporarily storing k1 during entry
      from guest.
      
      However the amount of memory allocated for this purpose is calculated as
      0x200 rounded up to the next page boundary, which is insufficient if 4KB
      pages are in use. This can lead to the common handler at offset 0x2000
      being overwritten and infinitely recursive exceptions on the next exit
      from the guest.
      
      Increase the minimum size from 0x200 to 0x4000 to cover the full use of
      the page.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      ed89c43b
    • RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp · e05f0518
      Yann Droneaud authored
      commit b6f04d3d upstream.
      
      The i386 ABI disagrees with most other ABIs regarding alignment of
      data types larger than 4 bytes: on most ABIs padding must be added at
      the end of the structure, while it is not required on i386.

      So for most ABIs, struct c4iw_create_cq_resp gets implicitly padded
      to be aligned on an 8-byte multiple, while for i386 such padding
      is not added.
      
      The tool pahole can be used to find such implicit padding:
      
        $ pahole --anon_include \
                 --nested_anon_include \
                 --recursive \
                 --class_name c4iw_create_cq_resp \
                 drivers/infiniband/hw/cxgb4/iw_cxgb4.o
      
      Then, structure layout can be compared between i386 and x86_64:
      
        +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt   2014-03-28 11:43:05.547432195 +0100
        --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100
        @@ -14,9 +13,8 @@ struct c4iw_create_cq_resp {
                __u32                      size;                 /*    28     4 */
                __u32                      qid_mask;             /*    32     4 */
      
        -       /* size: 36, cachelines: 1, members: 6 */
        -       /* last cacheline: 36 bytes */
        +       /* size: 40, cachelines: 1, members: 6 */
        +       /* padding: 4 */
        +       /* last cacheline: 40 bytes */
         };
      
      This ABI disagreement will make an x86_64 kernel try to write past the
      buffer provided by an i386 binary.
      
      When boundary checking is implemented, the x86_64 kernel will refuse
      to write past the i386 userspace-provided buffer and the uverbs will
      fail.
      
      If the structure is on a page boundary and the next page is not
      mapped, ib_copy_to_udata() will fail and the uverb will fail.
      
      This patch adds an explicit padding at the end of structure
      c4iw_create_cq_resp, and, like 92b0ca7c ("IB/mlx5: Fix stack info
      leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq()
      not write this padding field to userspace. This way, the x86_64 kernel
      will be able to write struct c4iw_create_cq_resp as expected by both
      unpatched and patched i386 libcxgb4.
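
      The resulting tail of the structure looks like this (only the fields visible
      in the pahole output above are shown; treat it as a sketch):

        struct c4iw_create_cq_resp {
                /* ... existing fields up to offset 28 ... */
                __u32 size;
                __u32 qid_mask;
                __u32 reserved; /* explicit padding: same 40-byte layout on i386 and x86_64 */
        };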
      
      Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com
      Fixes: cfdda9d7 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC")
      Fixes: e24a72a3 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()")
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Acked-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      e05f0518
    • IB/umad: Fix error handling · 98260bc9
      Bart Van Assche authored
      commit 8ec0a0e6 upstream.
      
      Avoid leaking a kref count in ib_umad_open() if port->ib_dev == NULL
      or if nonseekable_open() fails.
      
      In ib_umad_sm_open(), if nonseekable_open() fails, avoid leaking a kref
      count, leaving sm_sem held down, and leaving the IB_PORT_SM capability
      mask set.
      
      Since container_of() never returns NULL, remove the code that tests
      whether container_of() returns NULL.
      
      Moving the kref_get() call from the start of ib_umad_*open() to the
      end is safe since it is the responsibility of the caller of these
      functions to ensure that the cdev pointer remains valid until at least
      when these functions return.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>

      [ydroneaud@opteya.com: rework a bit to reduce the amount of code changed]
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>

      [ nonseekable_open() can't actually fail, but....  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      98260bc9
    • usb: qcserial: add additional Sierra Wireless QMI devices · 884e86fd
      Aleksander Morgado authored
      commit 0ce5fb58 upstream.
      
      A set of new VID/PIDs retrieved from the out-of-tree GobiNet/GobiSerial
      Sierra Wireless drivers.
      Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
      Link: http://marc.info/?l=linux-usb&m=140136310027293&w=2
      Signed-off-by: Johan Hovold <jhovold@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ Aleksander: backport to 3.13-stable ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      884e86fd
    • usb: qcserial: add Netgear AirCard 341U · f2e5e154
      Aleksander Morgado authored
      commit ff1fcd50 upstream.
      Signed-off-by: Aleksander Morgado <aleksander@aleksander.es>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ Aleksander: backport to 3.13-stable ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      f2e5e154
    • hv: use correct order when freeing monitor_pages · 9ea7ebad
      Radim Krčmář authored
      commit a100d88d upstream.
      
      We try to free two pages when only one has been allocated.
      Cleanup path is unlikely, so I haven't found any trace that would fit,
      but I hope that free_pages_prepare() does catch it.
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      Reviewed-by: Amos Kong <akong@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      9ea7ebad
    • mtd: pxa3xx_nand: make the driver work on big-endian systems · bec5a431
      Thomas Petazzoni authored
      commit b7e46062 upstream.
      
      The pxa3xx_nand driver currently uses __raw_writel() and __raw_readl()
      to access I/O registers. However, those functions do not do any
      endianness swapping, which means that they won't work when the CPU
      runs in big-endian but the I/O registers are little endian, which is
      the common situation for ARM systems running big endian.
      
      Since __raw_writel() and __raw_readl() do not include any memory
      barriers and the pxa3xx_nand driver can only be compiled for ARM
      platforms, the closest I/O accessor functions that do endianness
      swapping are writel_relaxed() and readl_relaxed().
      
      This patch has been verified to work on Armada XP GP: without the
      patch, the NAND is not detected when the kernel runs big endian while
      it is properly detected when the kernel runs little endian. With the
      patch applied, the NAND is properly detected in both situations
      (little and big endian).
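
      The change boils down to swapping the accessors used by the driver's small
      register helpers, roughly as follows (helper names assumed to match the
      driver; a sketch):

        static void nand_writel(struct pxa3xx_nand_info *info, int off, uint32_t val)
        {
                /* writel_relaxed() byte-swaps as needed on big-endian ARM;
                 * __raw_writel() did not, which broke BE kernels */
                writel_relaxed(val, info->mmio_base + off);
        }

        static unsigned int nand_readl(struct pxa3xx_nand_info *info, int off)
        {
                return readl_relaxed(info->mmio_base + off);
        }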
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      bec5a431
    • mac80211: don't check netdev state for debugfs read/write · 311f89c2
      Arik Nemtsov authored
      commit 923eaf36 upstream.
      
      Doing so will lead to an oops for a p2p-dev interface, since it has
      no netdev.
      Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      311f89c2
    • mac80211: fix a memory leak on sta rate selection table · 411ba7c3
      Felix Fietkau authored
      commit 53d04525 upstream.
      
      If the rate control algorithm uses a selection table, it
      is leaked when the station is destroyed - fix that.
      Signed-off-by: Felix Fietkau <nbd@openwrt.org>
      Reported-by: Christophe Prévotaux <cprevotaux@nltinc.com>
      Fixes: 0d528d85 ("mac80211: improve the rate control API")
      [add commit log entry, remove pointless NULL check]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      411ba7c3
    • s390/lowcore: reserve 96 bytes for IRB in lowcore · 85b70eb2
      Christian Borntraeger authored
      commit 993072ee upstream.
      
      The IRB might be 96 bytes if the extended-I/O-measurement facility is
      used. This feature is currently not used by Linux, but struct irb
      already has the emw defined. So let's make the irb in lowcore match the
      size of the internal data structure to be future proof.
      We also have to add a pad, to correctly align the paste.
      
      The bigger irb field also circumvents a bug in some QEMU versions that
      always write the emw field on test subchannel and therefore destroy the
      paste definitions of this CPU. Running under these QEMU versions broke
      some timing functions in the VDSO and all users of these functions,
      e.g. some JREs.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      85b70eb2
    • usb: usbtest: fix unlink write error with pattern 1 · de0f2ead
      Huang Rui authored
      commit e4d58f5d upstream.
      
      TEST 12 and TEST 24 unlink the URB write request N times.  When the
      host and gadget both initialize pattern 1 (mod 63) data series to
      transfer, the gadget side will complain about receiving unexpected
      data.  That is because on the host side usbtest doesn't fill the data
      buffer as mod 63; this patch fixes it.
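
      In other words, the host-side buffer fill for pattern 1 has to produce the
      same mod-63 byte series the gadget checks against, along these lines (the
      helper name is hypothetical; a sketch, not the exact usbtest.c code):

        static void fill_buf_mod63(struct urb *urb)
        {
                u8 *buf = urb->transfer_buffer;
                unsigned i;

                /* pattern 1: mod-63 series, matching what the zero gadget expects */
                for (i = 0; i < urb->transfer_buffer_length; i++)
                        buf[i] = (u8)(i % 63);
        }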
      
      [20285.488974] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
      [20285.489181] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Not Active
      [20285.489423] dwc3 dwc3.0.auto: ep1out-bulk: req ffff8800aa6cb480 dma aeb50800 length 512 last
      [20285.489727] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Start Transfer' params 00000000 a9eaf000 00000000
      [20285.490055] dwc3 dwc3.0.auto: Command Complete --> 0
      [20285.490281] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
      [20285.490492] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Active
      [20285.490713] dwc3 dwc3.0.auto: ep1out-bulk: endpoint busy
      [20285.490909] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Complete
      [20285.491117] dwc3 dwc3.0.auto: request ffff8800aa6cb480 from ep1out-bulk completed 512/512 ===> 0
      [20285.491431] zero gadget: bad OUT byte, buf[1] = 0
      [20285.491605] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Set Stall' params 00000000 00000000 00000000
      [20285.491915] dwc3 dwc3.0.auto: Command Complete --> 0
      [20285.492099] dwc3 dwc3.0.auto: queing request ffff8800aa6cb480 to ep1out-bulk length 512
      [20285.492387] dwc3 dwc3.0.auto: ep1out-bulk: Transfer Not Ready
      [20285.492595] dwc3 dwc3.0.auto: ep1out-bulk: reason Transfer Not Active
      [20285.492830] dwc3 dwc3.0.auto: ep1out-bulk: req ffff8800aa6cb480 dma aeb51000 length 512 last
      [20285.493135] dwc3 dwc3.0.auto: ep1out-bulk: cmd 'Start Transfer' params 00000000 a9eaf000 00000000
      [20285.493465] dwc3 dwc3.0.auto: Command Complete --> 0
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      de0f2ead
    • mei: me: read H_CSR after asserting reset · 986c4aa5
      Tomas Winkler authored
      commit c40765d9 upstream.
      
      According to the spec, the host should read H_CSR again after
      asserting reset (H_RST) to ensure that the reset was read by the
      firmware.
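
      In the reset path that amounts to a read-back right after writing H_RST,
      roughly as follows (helper names approximate; a sketch):

        hcsr |= H_RST | H_IG;
        mei_hcsr_set(dev, hcsr);

        /* read H_CSR back so the firmware is guaranteed to observe the reset */
        hcsr = mei_hcsr_read(dev);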
      Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ kamal: backport to 3.13-stable: also includes mei_me_hw_reset() change from
          33ec0826 mei: revamp mei reset state machine ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      986c4aa5