1. 09 Jun, 2012 5 commits
    • Meenakshi Venkataraman's avatar
      iwlwifi: update BT traffic load states correctly · e48fdd4d
      Meenakshi Venkataraman authored
      commit 882dde8e upstream.
      
      When BT traffic load changes from its
      previous state, a new LQ command needs to be
      sent down to the firmware. This needs to
      be done only once per change. The state
      variable that keeps track of this change is
      last_bt_traffic_load. However, it was not
      being updated when the change had been
      handled. Not updating this variable was
      causing a flood of advanced BT config
      commands to be sent to the firmware. Fix
      this.
      Signed-off-by: default avatarMeenakshi Venkataraman <meenakshi.venkataraman@intel.com>
      Signed-off-by: default avatarWey-Yi Guy <wey-yi.w.guy@intel.com>
      Signed-off-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarJohn W. Linville <linville@tuxdriver.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e48fdd4d
    • Andrea Arcangeli's avatar
      mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition · 2d363e95
      Andrea Arcangeli authored
      commit 26c19178 upstream.
      
      When holding the mmap_sem for reading, pmd_offset_map_lock should only
      run on a pmd_t that has been read atomically from the pmdp pointer,
      otherwise we may read only half of it leading to this crash.
      
      PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
       #0 [f06a9dd8] crash_kexec at c049b5ec
       #1 [f06a9e2c] oops_end at c083d1c2
       #2 [f06a9e40] no_context at c0433ded
       #3 [f06a9e64] bad_area_nosemaphore at c043401a
       #4 [f06a9e6c] __do_page_fault at c0434493
       #5 [f06a9eec] do_page_fault at c083eb45
       #6 [f06a9f04] error_code (via page_fault) at c083c5d5
          EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP:
          00000000
          DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
          CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
       #7 [f06a9f38] _spin_lock at c083bc14
       #8 [f06a9f44] sys_mincore at c0507b7d
       #9 [f06a9fb0] system_call at c083becd
                               start           len
          EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
          DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
          SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
          CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286
      
      This should be a longstanding bug affecting x86 32bit PAE without THP.
      Only archs with 64bit large pmd_t and 32bit unsigned long should be
      affected.
      
      With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
      would partly hide the bug when the pmd transition from none to stable,
      by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
      enabled a new set of problem arises by the fact could then transition
      freely in any of the none, pmd_trans_huge or pmd_trans_stable states.
      So making the barrier in pmd_none_or_trans_huge_or_clear_bad()
      unconditional isn't good idea and it would be a flakey solution.
      
      This should be fully fixed by introducing a pmd_read_atomic that reads
      the pmd in order with THP disabled, or by reading the pmd atomically
      with cmpxchg8b with THP enabled.
      
      Luckily this new race condition only triggers in the places that must
      already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
      is localized there but this bug is not related to THP.
      
      NOTE: this can trigger on x86 32bit systems with PAE enabled with more
      than 4G of ram, otherwise the high part of the pmd will never risk to be
      truncated because it would be zero at all times, in turn so hiding the
      SMP race.
      
      This bug was discovered and fully debugged by Ulrich, quote:
      
      ----
      [..]
      pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
      eax.
      
          496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
          *pmd)
          497 {
          498         /* depend on compiler for an atomic pmd read */
          499         pmd_t pmdval = *pmd;
      
                                      // edi = pmd pointer
      0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
      ...
                                      // edx = PTE page table high address
      0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
      ...
                                      // eax = PTE page table low address
      0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax
      
      [..]
      
      Please note that the PMD is not read atomically. These are two "mov"
      instructions where the high order bits of the PMD entry are fetched
      first. Hence, the above machine code is prone to the following race.
      
      -  The PMD entry {high|low} is 0x0000000000000000.
         The "mov" at 0xc0507a84 loads 0x00000000 into edx.
      
      -  A page fault (on another CPU) sneaks in between the two "mov"
         instructions and instantiates the PMD.
      
      -  The PMD entry {high|low} is now 0x00000003fda38067.
         The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
      ----
      Reported-by: default avatarUlrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d363e95
    • Michal Hocko's avatar
      mm: consider all swapped back pages in used-once logic · dce59c2f
      Michal Hocko authored
      commit e4898273 upstream.
      
      Commit 64574746 ("vmscan: detect mapped file pages used only once")
      made mapped pages have another round in inactive list because they might
      be just short lived and so we could consider them again next time.  This
      heuristic helps to reduce pressure on the active list with a streaming
      IO worklods.
      
      This patch fixes a regression introduced by this commit for heavy shmem
      based workloads because unlike Anon pages, which are excluded from this
      heuristic because they are usually long lived, shmem pages are handled
      as a regular page cache.
      
      This doesn't work quite well, unfortunately, if the workload is mostly
      backed by shmem (in memory database sitting on 80% of memory) with a
      streaming IO in the background (backup - up to 20% of memory).  Anon
      inactive list is full of (dirty) shmem pages when watermarks are hit.
      Shmem pages are kept in the inactive list (they are referenced) in the
      first round and it is hard to reclaim anything else so we reach lower
      scanning priorities very quickly which leads to an excessive swap out.
      
      Let's fix this by excluding all swap backed pages (they tend to be long
      lived wrt.  the regular page cache anyway) from used-once heuristic and
      rather activate them if they are referenced.
      
      The customer's workload is shmem backed database (80% of RAM) and they
      are measuring transactions/s with an IO in the background (20%).
      Transactions touch more or less random rows in the table.  The
      transaction rate fell by a factor of 3 (in the worst case) because of
      commit 64574746.  This patch restores the previous numbers.
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dce59c2f
    • Jun'ichi Nomura's avatar
      SCSI: Fix dm-multipath starvation when scsi host is busy · f4090d82
      Jun'ichi Nomura authored
      commit b7e94a16 upstream.
      
      block congestion control doesn't have any concept of fairness across
      multiple queues.  This means that if SCSI reports the host as busy in
      the queue congestion control it can result in an unfair starvation
      situation in dm-mp if there are multiple multipath devices on the same
      host.  For example:
      http://www.redhat.com/archives/dm-devel/2012-May/msg00123.html
      
      The fix for this is to report only the sdev busy state (and ignore the
      host busy state) in the block congestion control call back.
      The host is still congested, but the SCSI subsystem will sort out the
      congestion in a fair way because it knows the relation between the
      queues and the host.
      
      [jejb: fixed up trailing whitespace]
      Reported-by: default avatarBernd Schubert <bernd.schubert@itwm.fraunhofer.de>
      Tested-by: default avatarBernd Schubert <bernd.schubert@itwm.fraunhofer.de>
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f4090d82
    • James Bottomley's avatar
      SCSI: fix scsi_wait_scan · af9c3bad
      James Bottomley authored
      commit 1ff2f403 upstream.
      
      Commit  c7510859
      Author: Rafael J. Wysocki <rjw@sisk.pl>
      Date:   Sun Apr 12 20:06:56 2009 +0200
      
          PM/Hibernate: Wait for SCSI devices scan to complete during resume
      
      Broke the scsi_wait_scan module in 2.6.30.  Apparently debian still uses it so
      fix it and backport to stable before removing it in 3.6.
      
      The breakage is caused because the function template in
      include/scsi/scsi_scan.h is defined to be a nop unless SCSI is built in.
      That means that in the modular case (which is every distro), the
      scsi_wait_scan module does a simple async_synchronize_full() instead of
      waiting for scans.
      Signed-off-by: default avatarJames Bottomley <JBottomley@Parallels.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af9c3bad
  2. 01 Jun, 2012 35 commits