1. 22 Feb, 2014 28 commits
  2. 20 Feb, 2014 12 commits
    • Linux 3.10.31 · a43e02cf
      Greg Kroah-Hartman authored
      a43e02cf
    • mm: fix process accidentally killed by mce because of huge page migration · 6843d925
      Xishi Qiu authored
      Based on c8721bbb upstream, but only the
      bugfix portion pulled out.
      
      Hi Naoya or Greg,
      
      We found a bug in 3.10.x.
      The problem is that we accidentally have a hwpoisoned hugepage in the
      free hugepage list. It could happen in the following scenario:
      
              process A                           process B
      
        migrate_huge_page
        put_page (old hugepage)
          linked to free hugepage list
                                           hugetlb_fault
                                             hugetlb_no_page
                                               alloc_huge_page
                                                 dequeue_huge_page_vma
                                                   dequeue_huge_page_node
                                                     (steal hwpoisoned hugepage)
        set_page_hwpoison_huge_page
        dequeue_hwpoisoned_huge_page
          (fail to dequeue)
      
      I tested this bug: one process keeps allocating huge pages while I use
      the sysfs interface to soft offline a huge page, and then I received:
      "MCE: Killing UCP:2717 due to hardware memory corruption fault at 8200034"
      
      Upstream kernel is free from this bug because of these two commits:
      
      f15bdfa8
      mm/memory-failure.c: fix memory leak in successful soft offlining
      
      c8721bbb
      mm: memory-hotplug: enable memory hotplug to handle hugepage
      
      The first one, although nominally about a memory leak, moves
      unset_migratetype_isolate(), which is important to avoid the race.
      The latter is not a bug fix and is too big, so I rewrote a small one.
      
      The following patch fixes this bug (please apply f15bdfa8 first).
      Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6843d925
    • IB/qib: Convert qib_user_sdma_pin_pages() to use get_user_pages_fast() · 2d9258e4
      Jan Kara authored
      commit 603e7729 upstream.
      
      qib_user_sdma_queue_pkts() gets called with mmap_sem held for
      writing. Except for get_user_pages() deep down in
      qib_user_sdma_pin_pages(), we don't seem to need mmap_sem at all.
      Even more interestingly, qib_user_sdma_queue_pkts() (and also
      qib_user_sdma_coalesce(), called somewhat later) calls
      copy_from_user(), which can hit a page fault, and we then deadlock
      trying to take mmap_sem while handling that fault.
      
      So just make qib_user_sdma_pin_pages() use get_user_pages_fast() and
      leave mmap_sem locking to the mm.
      
      This deadlock has actually been observed in the wild when the node
      is under memory pressure.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      [Backported to 3.10: (Thanks to Ben Hutchings)
       - Adjust context
       - Adjust indentation and nr_pages argument in qib_user_sdma_pin_pages()]
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d9258e4
    • mm/memory-failure.c: fix memory leak in successful soft offlining · b0d4c0f8
      Naoya Horiguchi authored
      commit f15bdfa8 upstream.
      
      After a successful page migration by soft offlining, the source page is
      not properly freed and it's never reusable even if we unpoison it
      afterward.
      
      This is caused by a race between freeing the page and setting
      PG_hwpoison.  In successful soft offlining, the source page is put
      (and the refcount becomes 0) by putback_lru_page() in
      unmap_and_move(), where it's linked to a pagevec and the actual
      freeing back to the buddy allocator is delayed.  So if PG_hwpoison
      is set on the page before that freeing happens, the freeing does not
      function as expected (in such a case freeing aborts in the
      free_pages_prepare() check).
      
      This patch tries to make sure to free the source page before setting
      PG_hwpoison on it.  To avoid reallocating, the page keeps
      MIGRATE_ISOLATE until after setting PG_hwpoison.
      
      This patch also removes obsolete comments about "keeping an elevated
      refcount" because what they say is not true.  Unlike memory_failure(),
      soft_offline_page() uses no special page isolation code, and the
      soft-offlined pages have no elevated refcount.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0d4c0f8
    • pinctrl: protect pinctrl_list add · 1fc42b84
      Stanislaw Gruszka authored
      commit 7b320cb1 upstream.
      
      We have a few Fedora bug reports about list corruption in pinctrl,
      for example:
      https://bugzilla.redhat.com/show_bug.cgi?id=1051918
      
      Most likely the corruption happens due to the lack of protection of
      pinctrl_list when adding new nodes to it.  This patch corrects that.
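
      For illustration, this bug class can be reproduced entirely in
      userspace; the sketch below (an analogy, not the pinctrl code itself)
      has two threads appending to a shared singly linked list, with a
      pthread mutex playing the role the pinctrl lock plays in the fix.
      Without the lock/unlock calls, inserts can be lost and the final
      count comes up short:

        /* list_race.c: concurrent list insertion must be serialised.
         * Build with: gcc -O2 -pthread list_race.c -o list_race */
        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>

        struct node { int val; struct node *next; };

        static struct node *head;                 /* shared list */
        static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

        static void list_add_head(int val)
        {
                struct node *n = malloc(sizeof(*n));

                n->val = val;
                pthread_mutex_lock(&list_lock);   /* without this, two threads
                                                   * can read the same 'head'
                                                   * and one insert is lost */
                n->next = head;
                head = n;
                pthread_mutex_unlock(&list_lock);
        }

        static void *worker(void *arg)
        {
                for (int i = 0; i < 100000; i++)
                        list_add_head(i);
                return NULL;
        }

        int main(void)
        {
                pthread_t a, b;
                long count = 0;

                pthread_create(&a, NULL, worker, NULL);
                pthread_create(&b, NULL, worker, NULL);
                pthread_join(a, NULL);
                pthread_join(b, NULL);

                for (struct node *n = head; n; n = n->next)
                        count++;
                printf("nodes: %ld (expected 200000)\n", count);
                return 0;
        }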
      
      Fixes: 42fed7ba ("pinctrl: move subsystem mutex to pinctrl_dev struct")
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1fc42b84
    • pinctrl: vt8500: Change devicetree data parsing · c2b570f3
      Tony Prisk authored
      commit f17248ed upstream.
      
      Due to an assumption in the VT8500 pinctrl driver, the value passed
      from devicetree for 'wm,pull' was not explicitly translated before
      being passed to pinconf.
      
      Since v3.10, due to changes to 'enum pin_config_param', the values
      PIN_CONFIG_BIAS_PULL_(UP/DOWN) no longer map 1-to-1 onto the expected
      devicetree values.
      
      This patch adds a small translation between the devicetree values (0..2)
      and the enum pin_config_param equivalent values.
      Signed-off-by: Tony Prisk <linux@prisktech.co.nz>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2b570f3
    • x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y · f878b94f
      Peter Oberparleiter authored
      commit 6583327c upstream.
      
      Commit d61931d8, "x86: Add optimized popcnt variants", introduced the
      compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
      the options -fprofile-arcs and -O2, this flag causes gcc to generate
      broken constructor code. As a result, a 64-bit x86 kernel compiled
      with CONFIG_GCOV_PROFILE_ALL=y prints the message "gcov: could not
      create file" and runs into sporadic BUGs during boot.
      
      The gcc people indicate that these kinds of problems are endemic when
      using ad hoc calling conventions.  It is therefore best to treat any
      file compiled with ad hoc calling conventions as an isolated
      environment and avoid things like profiling or coverage analysis,
      since those subsystems assume "normal" calling conventions.
      
      This patch avoids the bug by excluding lib/hweight.o from coverage
      profiling.
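
      (For reference, kbuild's mechanism for this kind of exclusion is a
      per-object assignment in the directory's Makefile, along the lines of
      "GCOV_PROFILE_hweight.o := n", as described in the kernel's gcov
      documentation; presumably that is how this patch excludes the file.)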
      Reported-by: Meelis Roos <mroos@linux.ee>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f878b94f
    • mxl111sf: Fix compile when CONFIG_DVB_USB_MXL111SF is unset · f19148c7
      Dave Jones authored
      commit 13e1b87c upstream.
      
      Fix the following build error:
      
      drivers/media/usb/dvb-usb-v2/mxl111sf-tuner.h:72:9: error: expected ‘;’, ‘,’ or ‘)’ before ‘struct’
               struct mxl111sf_tuner_config *cfg)
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Signed-off-by: Michael Krufky <mkrufky@linuxtv.org>
      Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f19148c7
    • af9035: add ID [2040:f900] Hauppauge WinTV-MiniStick 2 · 728311f5
      Antti Palosaari authored
      commit f2e4c5e0 upstream.
      
      Add USB ID [2040:f900] for Hauppauge WinTV-MiniStick 2.
      The device is built upon the IT9135 chipset.
      Tested-by: Stefan Becker <schtefan@gmx.net>
      Signed-off-by: Antti Palosaari <crope@iki.fi>
      Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      728311f5
    • x86: mm: change tlb_flushall_shift for IvyBridge · 9fa5c526
      Mel Gorman authored
      commit f98b7a77 upstream.
      
      There was a large performance regression that was bisected to
      commit 611ae8e3 ("x86/tlb: enable tlb flush range support for
      x86").  This patch simply changes the default balance point
      between a local and global flush for IvyBridge.
      
      In the interest of allowing the tests to be reproduced, this
      patch was tested using mmtests 0.15 with the following
      configurations
      
      	configs/config-global-dhp__tlbflush-performance
      	configs/config-global-dhp__scheduler-performance
      	configs/config-global-dhp__network-performance
      
      Results are from two machines
      
      Ivybridge   4 threads:  Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
      Ivybridge   8 threads:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
      
      Page fault microbenchmark showed nothing interesting.
      
      Ebizzy was configured to run multiple iterations and threads.
      Thread counts ranged from 1 to NR_CPUS*2. For each thread count,
      it ran 100 iterations and each iteration lasted 10 seconds.
      
      Ivybridge 4 threads
                          3.13.0-rc7            3.13.0-rc7
                             vanilla           altshift-v3
      Mean   1     6395.44 (  0.00%)     6789.09 (  6.16%)
      Mean   2     7012.85 (  0.00%)     8052.16 ( 14.82%)
      Mean   3     6403.04 (  0.00%)     6973.74 (  8.91%)
      Mean   4     6135.32 (  0.00%)     6582.33 (  7.29%)
      Mean   5     6095.69 (  0.00%)     6526.68 (  7.07%)
      Mean   6     6114.33 (  0.00%)     6416.64 (  4.94%)
      Mean   7     6085.10 (  0.00%)     6448.51 (  5.97%)
      Mean   8     6120.62 (  0.00%)     6462.97 (  5.59%)
      
      Ivybridge 8 threads
                           3.13.0-rc7            3.13.0-rc7
                              vanilla           altshift-v3
      Mean   1      7336.65 (  0.00%)     7787.02 (  6.14%)
      Mean   2      8218.41 (  0.00%)     9484.13 ( 15.40%)
      Mean   3      7973.62 (  0.00%)     8922.01 ( 11.89%)
      Mean   4      7798.33 (  0.00%)     8567.03 (  9.86%)
      Mean   5      7158.72 (  0.00%)     8214.23 ( 14.74%)
      Mean   6      6852.27 (  0.00%)     7952.45 ( 16.06%)
      Mean   7      6774.65 (  0.00%)     7536.35 ( 11.24%)
      Mean   8      6510.50 (  0.00%)     6894.05 (  5.89%)
      Mean   12     6182.90 (  0.00%)     6661.29 (  7.74%)
      Mean   16     6100.09 (  0.00%)     6608.69 (  8.34%)
      
      Ebizzy hits the worst case scenario for TLB range flushing every
      time, and it shows, for these Ivybridge CPUs at least, that the
      default choice is a poor one.  The patch addresses the problem.
      
      Next was a tlbflush microbenchmark written by Alex Shi at
      http://marc.info/?l=linux-kernel&m=133727348217113 .  It
      measures access costs while the TLB is being flushed.  The
      expectation is that the benchmark would suffer if there were always
      full TLB flushes, and that it benefits from range flushing.
      
      There are 320 iterations of the test per thread count.  The
      number of entries is randomly selected with a min of 1 and max
      of 512.  To ensure a reasonably even spread of entries, the full
      range is broken up into 8 sections and a random number selected
      within that section.
      
      iteration 1, random number between 0-64
      iteration 2, random number between 64-128 etc
      
      This is still a very weak methodology.  When you do not know what
      typical ranges are, random is a reasonable choice, but it can easily
      be argued that the optimisation was for smaller ranges and that an
      even spread is not representative of any workload that matters.  To
      improve this, we'd need to know the probability
      distribution of TLB flush range sizes for a set of workloads
      that are considered "common", build a synthetic trace and feed
      that into this benchmark.  Even that is not perfect because it
      would not account for the time between flushes but there are
      limits of what can be reasonably done and still be doing
      something useful.  If a representative synthetic trace is
      provided then this benchmark could be revisited and the shift values retuned.
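
      As a concrete reading of the selection scheme described above, here
      is a small sketch (an assumption about the methodology, not code
      taken from the benchmark) of how each iteration's entry count could
      be drawn so that the 1..512 range is covered evenly in 8 sections:

        /* range_pick.c: split the 1..512 entry range into 8 sections and
         * give each section an even share of the 320 iterations. */
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
                const int iterations = 320, sections = 8, max_entries = 512;
                const int width = max_entries / sections;     /* 64 */

                srand(42);
                for (int i = 0; i < iterations; i++) {
                        int section = i % sections;             /* 0..7 */
                        int low = section * width;              /* 0, 64, ... 448 */
                        int entries = low + rand() % width + 1; /* 1..512 overall */

                        printf("iteration %3d: %3d entries\n", i + 1, entries);
                }
                return 0;
        }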
      
      Ivybridge 4 threads
                              3.13.0-rc7            3.13.0-rc7
                                 vanilla           altshift-v3
      Mean       1       10.50 (  0.00%)       10.50 (  0.03%)
      Mean       2       17.59 (  0.00%)       17.18 (  2.34%)
      Mean       3       22.98 (  0.00%)       21.74 (  5.41%)
      Mean       5       47.13 (  0.00%)       46.23 (  1.92%)
      Mean       8       43.30 (  0.00%)       42.56 (  1.72%)
      
      Ivybridge 8 threads
                               3.13.0-rc7            3.13.0-rc7
                                  vanilla           altshift-v3
      Mean       1         9.45 (  0.00%)        9.36 (  0.93%)
      Mean       2         9.37 (  0.00%)        9.70 ( -3.54%)
      Mean       3         9.36 (  0.00%)        9.29 (  0.70%)
      Mean       5        14.49 (  0.00%)       15.04 ( -3.75%)
      Mean       8        41.08 (  0.00%)       38.73 (  5.71%)
      Mean       13       32.04 (  0.00%)       31.24 (  2.49%)
      Mean       16       40.05 (  0.00%)       39.04 (  2.51%)
      
      For both CPUs, average access time is reduced which is good as
      this is the benchmark that was used to tune the shift values in
      the first place albeit it is now known *how* the benchmark was
      used.
      
      The scheduler benchmarks were somewhat inconclusive.  They showed
      gains and losses and make me reconsider how stable those benchmarks
      really are, or whether something else might have been interfering
      with the test results recently.
      
      Network benchmarks were inconclusive.  Almost all results were
      flat except for netperf-udp tests on the 4 thread machine.
      These results were unstable and showed large variations between
      reboots.  It is unknown if this is a recent problem, but I've
      noticed before that netperf-udp results tend to vary.
      
      Based on these results, changing the default for Ivybridge seems
      like a logical choice.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Tested-by: Davidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: Alex Shi <alex.shi@linaro.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fa5c526
    • mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq · 7e405afc
      KOSAKI Motohiro authored
      commit 227d53b3 upstream.
      
      Using spin_{un}lock_irq() is dangerous if the caller has disabled
      interrupts.  During aio buffer migration, we can end up with the
      following call stack:
      
      aio_migratepage  [disable interrupt]
        migrate_page_copy
          clear_page_dirty_for_io
            set_page_dirty
              __set_page_dirty_buffers
                __set_page_dirty
                  spin_lock_irq
      
      This means the current aio migration is deadlockable.
      spin_lock_irqsave() is a safer alternative and we should use it.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7e405afc
    • mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() · dce0b4fc
      KOSAKI Motohiro authored
      commit a85d9df1 upstream.
      
      During an aio stress test, we observed the following lockdep warning.
      This means AIO+numa_balancing is currently deadlockable.
      
      The problem is that aio_migratepage() disables interrupts, but
      __set_page_dirty_nobuffers() unintentionally enables them again.
      
      Generally, all helper functions should use spin_lock_irqsave()
      instead of spin_lock_irq() because they don't know their caller's
      context at all.
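
      The idiom is small enough to show in a toy module sketch (an
      illustration of the locking pattern, not code from this patch); the
      helper saves and restores the caller's interrupt state instead of
      unconditionally re-enabling interrupts on unlock:

        /* irqsave_demo.c: helper locking that preserves the caller's IRQ
         * state.  Toy kernel module, not the mm/ code touched here. */
        #include <linux/module.h>
        #include <linux/spinlock.h>

        static DEFINE_SPINLOCK(demo_lock);
        static int counter;

        /* A helper cannot know whether its caller already disabled
         * interrupts, so it must save and restore the state. */
        static void demo_helper(void)
        {
                unsigned long flags;

                spin_lock_irqsave(&demo_lock, flags);
                counter++;
                spin_unlock_irqrestore(&demo_lock, flags); /* state as before */
        }

        static int __init irqsave_demo_init(void)
        {
                unsigned long flags;

                demo_helper();          /* caller with IRQs enabled */

                local_irq_save(flags);  /* a caller that disabled IRQs is   */
                demo_helper();          /* not re-enabled behind its back   */
                local_irq_restore(flags);

                pr_info("irqsave_demo: counter=%d\n", counter);
                return 0;
        }

        static void __exit irqsave_demo_exit(void)
        {
        }

        module_init(irqsave_demo_init);
        module_exit(irqsave_demo_exit);
        MODULE_LICENSE("GPL");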
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&(&ctx->completion_lock)->rlock);
           <Interrupt>
             lock(&(&ctx->completion_lock)->rlock);
      
          *** DEADLOCK ***
      
            dump_stack+0x19/0x1b
            print_usage_bug+0x1f7/0x208
            mark_lock+0x21d/0x2a0
            mark_held_locks+0xb9/0x140
            trace_hardirqs_on_caller+0x105/0x1d0
            trace_hardirqs_on+0xd/0x10
            _raw_spin_unlock_irq+0x2c/0x50
            __set_page_dirty_nobuffers+0x8c/0xf0
            migrate_page_copy+0x434/0x540
            aio_migratepage+0xb1/0x140
            move_to_new_page+0x7d/0x230
            migrate_pages+0x5e5/0x700
            migrate_misplaced_page+0xbc/0xf0
            do_numa_page+0x102/0x190
            handle_pte_fault+0x241/0x970
            handle_mm_fault+0x265/0x370
            __do_page_fault+0x172/0x5a0
            do_page_fault+0x1a/0x70
            page_fault+0x28/0x30
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      dce0b4fc