1. 02 Mar, 2015 40 commits
    • Grazvydas Ignotas's avatar
      mm/memory.c: actually remap enough memory · 1c648766
      Grazvydas Ignotas authored
      commit 9cb12d7b upstream.
      
      For whatever reason, generic_access_phys() only remaps one page, but
      actually allows to access arbitrary size.  It's quite easy to trigger
      large reads, like printing out large structure with gdb, which leads to a
      crash.  Fix it by remapping correct size.
      
      Fixes: 28b2ee20 ("access_process_vm device memory infrastructure")
      Signed-off-by: default avatarGrazvydas Ignotas <notasas@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      1c648766
    • Joonsoo Kim's avatar
      mm/compaction: fix wrong order check in compact_finished() · 3ed567d6
      Joonsoo Kim authored
      commit 372549c2 upstream.
      
      What we want to check here is whether there is highorder freepage in buddy
      list of other migratetype in order to steal it without fragmentation.
      But, current code just checks cc->order which means allocation request
      order.  So, this is wrong.
      
      Without this fix, non-movable synchronous compaction below pageblock order
      would not stopped until compaction is complete, because migratetype of
      most pageblocks are movable and high order freepage made by compaction is
      usually on movable type buddy list.
      
      There is some report related to this bug. See below link.
      
        http://www.spinics.net/lists/linux-mm/msg81666.html
      
      Although the issued system still has load spike comes from compaction,
      this makes that system completely stable and responsive according to his
      report.
      
      stress-highalloc test in mmtests with non movable order 7 allocation
      doesn't show any notable difference in allocation success rate, but, it
      shows more compaction success rate.
      
      Compaction success rate (Compaction success * 100 / Compaction stalls, %)
      18.47 : 28.94
      
      Fixes: 1fb3f8ca ("mm: compaction: capture a suitable high-order page immediately when it is made available")
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      3ed567d6
    • Nicholas Bellinger's avatar
      target: Fix PR_APTPL_BUF_LEN buffer size limitation · 736bd6f2
      Nicholas Bellinger authored
      commit f161d4b4 upstream.
      
      This patch addresses the original PR_APTPL_BUF_LEN = 8k limitiation
      for write-out of PR APTPL metadata that Martin has recently been
      running into.
      
      It changes core_scsi3_update_and_write_aptpl() to use vzalloc'ed
      memory instead of kzalloc, and increases the default hardcoded
      length to 256k.
      
      It also adds logic in core_scsi3_update_and_write_aptpl() to double
      the original length upon core_scsi3_update_aptpl_buf() failure, and
      retries until the vzalloc'ed buffer is large enough to accommodate
      the outgoing APTPL metadata.
      Reported-by: default avatarMartin Svec <martin.svec@zoner.cz>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      736bd6f2
    • Nicholas Bellinger's avatar
      iscsi-target: Drop problematic active_ts_list usage · b2af1ac5
      Nicholas Bellinger authored
      commit 3fd7b60f upstream.
      
      This patch drops legacy active_ts_list usage within iscsi_target_tq.c
      code.  It was originally used to track the active thread sets during
      iscsi-target shutdown, and is no longer used by modern upstream code.
      
      Two people have reported list corruption using traditional iscsi-target
      and iser-target with the following backtrace, that appears to be related
      to iscsi_thread_set->ts_list being used across both active_ts_list and
      inactive_ts_list.
      
      [   60.782534] ------------[ cut here ]------------
      [   60.782543] WARNING: CPU: 0 PID: 9430 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
      [   60.782545] list_del corruption, ffff88045b00d180->next is LIST_POISON1 (dead000000100100)
      [   60.782546] Modules linked in: ib_srpt tcm_qla2xxx qla2xxx tcm_loop tcm_fc libfc scsi_transport_fc scsi_tgt ib_isert rdma_cm iw_cm ib_addr iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod configfs ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc autofs4 sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ib_ipoib ib_cm ib_uverbs ib_umad mlx4_en mlx4_ib ib_sa ib_mad ib_core mlx4_core dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan vhost tun kvm_intel kvm uinput iTCO_wdt iTCO_vendor_support microcode serio_raw pcspkr sb_edac edac_core sg i2c_i801 lpc_ich mfd_core mtip32xx igb i2c_algo_bit i2c_core ptp pps_core ioatdma dca wmi ext3(F) jbd(F) mbcache(F) sd_mod(F) crc_t10dif(F) crct10dif_common(F) ahci(F) libahci(F) isci(F) libsas(F) scsi_transport_sas(F) [last unloaded: speedstep_lib]
      [   60.782597] CPU: 0 PID: 9430 Comm: iscsi_ttx Tainted: GF 3.12.19+ #2
      [   60.782598] Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
      [   60.782599]  0000000000000035 ffff88044de31d08 ffffffff81553ae7 0000000000000035
      [   60.782602]  ffff88044de31d58 ffff88044de31d48 ffffffff8104d1cc 0000000000000002
      [   60.782605]  ffff88045b00d180 ffff88045b00d0c0 ffff88045b00d0c0 ffff88044de31e58
      [   60.782607] Call Trace:
      [   60.782611]  [<ffffffff81553ae7>] dump_stack+0x49/0x62
      [   60.782615]  [<ffffffff8104d1cc>] warn_slowpath_common+0x8c/0xc0
      [   60.782618]  [<ffffffff8104d2b6>] warn_slowpath_fmt+0x46/0x50
      [   60.782620]  [<ffffffff81280933>] __list_del_entry+0x63/0xd0
      [   60.782622]  [<ffffffff812809b1>] list_del+0x11/0x40
      [   60.782630]  [<ffffffffa06e7cf9>] iscsi_del_ts_from_active_list+0x29/0x50 [iscsi_target_mod]
      [   60.782635]  [<ffffffffa06e87b1>] iscsi_tx_thread_pre_handler+0xa1/0x180 [iscsi_target_mod]
      [   60.782642]  [<ffffffffa06fb9ae>] iscsi_target_tx_thread+0x4e/0x220 [iscsi_target_mod]
      [   60.782647]  [<ffffffffa06fb960>] ? iscsit_handle_snack+0x190/0x190 [iscsi_target_mod]
      [   60.782652]  [<ffffffffa06fb960>] ? iscsit_handle_snack+0x190/0x190 [iscsi_target_mod]
      [   60.782655]  [<ffffffff8106f99e>] kthread+0xce/0xe0
      [   60.782657]  [<ffffffff8106f8d0>] ? kthread_freezable_should_stop+0x70/0x70
      [   60.782660]  [<ffffffff8156026c>] ret_from_fork+0x7c/0xb0
      [   60.782662]  [<ffffffff8106f8d0>] ? kthread_freezable_should_stop+0x70/0x70
      [   60.782663] ---[ end trace 9662f4a661d33965 ]---
      
      Since this code is no longer used, go ahead and drop the problematic usage
      all-together.
      Reported-by: default avatarGavin Guo <gavin.guo@canonical.com>
      Reported-by: default avatarMoussa Ba <moussaba@micron.com>
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      b2af1ac5
    • Roman Gushchin's avatar
      mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() · 1c000bc0
      Roman Gushchin authored
      commit 8138a67a upstream.
      
      I noticed that "allowed" can easily overflow by falling below 0, because
      (total_vm / 32) can be larger than "allowed".  The problem occurs in
      OVERCOMMIT_NONE mode.
      
      In this case, a huge allocation can success and overcommit the system
      (despite OVERCOMMIT_NONE mode).  All subsequent allocations will fall
      (system-wide), so system become unusable.
      
      The problem was masked out by commit c9b1d098
      ("mm: limit growth of 3% hardcoded other user reserve"),
      but it's easy to reproduce it on older kernels:
      1) set overcommit_memory sysctl to 2
      2) mmap() large file multiple times (with VM_SHARED flag)
      3) try to malloc() large amount of memory
      
      It also can be reproduced on newer kernels, but miss-configured
      sysctl_user_reserve_kbytes is required.
      
      Fix this issue by switching to signed arithmetic here.
      Signed-off-by: default avatarRoman Gushchin <klamm@yandex-team.ru>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      1c000bc0
    • Roman Gushchin's avatar
      mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() · 2fc66e9f
      Roman Gushchin authored
      commit 5703b087 upstream.
      
      I noticed, that "allowed" can easily overflow by falling below 0,
      because (total_vm / 32) can be larger than "allowed".  The problem
      occurs in OVERCOMMIT_NONE mode.
      
      In this case, a huge allocation can success and overcommit the system
      (despite OVERCOMMIT_NONE mode).  All subsequent allocations will fall
      (system-wide), so system become unusable.
      
      The problem was masked out by commit c9b1d098
      ("mm: limit growth of 3% hardcoded other user reserve"),
      but it's easy to reproduce it on older kernels:
      1) set overcommit_memory sysctl to 2
      2) mmap() large file multiple times (with VM_SHARED flag)
      3) try to malloc() large amount of memory
      
      It also can be reproduced on newer kernels, but miss-configured
      sysctl_user_reserve_kbytes is required.
      
      Fix this issue by switching to signed arithmetic here.
      
      [akpm@linux-foundation.org: use min_t]
      Signed-off-by: default avatarRoman Gushchin <klamm@yandex-team.ru>
      Cc: Andrew Shewmaker <agshew@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      2fc66e9f
    • Vlastimil Babka's avatar
      mm: when stealing freepages, also take pages created by splitting buddy page · 6333e244
      Vlastimil Babka authored
      commit 99592d59 upstream.
      
      When studying page stealing, I noticed some weird looking decisions in
      try_to_steal_freepages().  The first I assume is a bug (Patch 1), the
      following two patches were driven by evaluation.
      
      Testing was done with stress-highalloc of mmtests, using the
      mm_page_alloc_extfrag tracepoint and postprocessing to get counts of how
      often page stealing occurs for individual migratetypes, and what
      migratetypes are used for fallbacks.  Arguably, the worst case of page
      stealing is when UNMOVABLE allocation steals from MOVABLE pageblock.
      RECLAIMABLE allocation stealing from MOVABLE allocation is also not ideal,
      so the goal is to minimize these two cases.
      
      The evaluation of v2 wasn't always clear win and Joonsoo questioned the
      results.  Here I used different baseline which includes RFC compaction
      improvements from [1].  I found that the compaction improvements reduce
      variability of stress-highalloc, so there's less noise in the data.
      
      First, let's look at stress-highalloc configured to do sync compaction,
      and how these patches reduce page stealing events during the test.  First
      column is after fresh reboot, other two are reiterations of test without
      reboot.  That was all accumulater over 5 re-iterations (so the benchmark
      was run 5x3 times with 5 fresh restarts).
      
      Baseline:
      
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        5-nothp-1       5-nothp-2       5-nothp-3
      Page alloc extfrag event                               10264225     8702233    10244125
      Extfrag fragmenting                                    10263271     8701552    10243473
      Extfrag fragmenting for unmovable                         13595       17616       15960
      Extfrag fragmenting unmovable placed with movable          7989       12193        8447
      Extfrag fragmenting for reclaimable                         658        1840        1817
      Extfrag fragmenting reclaimable placed with movable         558        1677        1679
      Extfrag fragmenting for movable                        10249018     8682096    10225696
      
      With Patch 1:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        6-nothp-1       6-nothp-2       6-nothp-3
      Page alloc extfrag event                               11834954     9877523     9774860
      Extfrag fragmenting                                    11833993     9876880     9774245
      Extfrag fragmenting for unmovable                          7342       16129       11712
      Extfrag fragmenting unmovable placed with movable          4191       10547        6270
      Extfrag fragmenting for reclaimable                         373        1130         923
      Extfrag fragmenting reclaimable placed with movable         302         906         738
      Extfrag fragmenting for movable                        11826278     9859621     9761610
      
      With Patch 2:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        7-nothp-1       7-nothp-2       7-nothp-3
      Page alloc extfrag event                                4725990     3668793     3807436
      Extfrag fragmenting                                     4725104     3668252     3806898
      Extfrag fragmenting for unmovable                          6678        7974        7281
      Extfrag fragmenting unmovable placed with movable          2051        3829        4017
      Extfrag fragmenting for reclaimable                         429        1208        1278
      Extfrag fragmenting reclaimable placed with movable         369         976        1034
      Extfrag fragmenting for movable                         4717997     3659070     3798339
      
      With Patch 3:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                        8-nothp-1       8-nothp-2       8-nothp-3
      Page alloc extfrag event                                5016183     4700142     3850633
      Extfrag fragmenting                                     5015325     4699613     3850072
      Extfrag fragmenting for unmovable                          1312        3154        3088
      Extfrag fragmenting unmovable placed with movable          1115        2777        2714
      Extfrag fragmenting for reclaimable                         437        1193        1097
      Extfrag fragmenting reclaimable placed with movable         330         969         879
      Extfrag fragmenting for movable                         5013576     4695266     3845887
      
      In v2 we've seen apparent regression with Patch 1 for unmovable events,
      this is now gone, suggesting it was indeed noise.  Here, each patch
      improves the situation for unmovable events.  Reclaimable is improved by
      patch 1 and then either the same modulo noise, or perhaps sligtly worse -
      a small price for unmovable improvements, IMHO.  The number of movable
      allocations falling back to other migratetypes is most noisy, but it's
      reduced to half at Patch 2 nevertheless.  These are least critical as
      compaction can move them around.
      
      If we look at success rates, the patches don't affect them, that didn't change.
      
      Baseline:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  5-nothp-1             5-nothp-2             5-nothp-3
      Success 1 Min         49.00 (  0.00%)       42.00 ( 14.29%)       41.00 ( 16.33%)
      Success 1 Mean        51.00 (  0.00%)       45.00 ( 11.76%)       42.60 ( 16.47%)
      Success 1 Max         55.00 (  0.00%)       51.00 (  7.27%)       46.00 ( 16.36%)
      Success 2 Min         53.00 (  0.00%)       47.00 ( 11.32%)       44.00 ( 16.98%)
      Success 2 Mean        59.60 (  0.00%)       50.80 ( 14.77%)       48.20 ( 19.13%)
      Success 2 Max         64.00 (  0.00%)       56.00 ( 12.50%)       52.00 ( 18.75%)
      Success 3 Min         84.00 (  0.00%)       82.00 (  2.38%)       78.00 (  7.14%)
      Success 3 Mean        85.60 (  0.00%)       82.80 (  3.27%)       79.40 (  7.24%)
      Success 3 Max         86.00 (  0.00%)       83.00 (  3.49%)       80.00 (  6.98%)
      
      Patch 1:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  6-nothp-1             6-nothp-2             6-nothp-3
      Success 1 Min         49.00 (  0.00%)       44.00 ( 10.20%)       44.00 ( 10.20%)
      Success 1 Mean        51.80 (  0.00%)       46.00 ( 11.20%)       45.80 ( 11.58%)
      Success 1 Max         54.00 (  0.00%)       49.00 (  9.26%)       49.00 (  9.26%)
      Success 2 Min         58.00 (  0.00%)       49.00 ( 15.52%)       48.00 ( 17.24%)
      Success 2 Mean        60.40 (  0.00%)       51.80 ( 14.24%)       50.80 ( 15.89%)
      Success 2 Max         63.00 (  0.00%)       54.00 ( 14.29%)       55.00 ( 12.70%)
      Success 3 Min         84.00 (  0.00%)       81.00 (  3.57%)       79.00 (  5.95%)
      Success 3 Mean        85.00 (  0.00%)       81.60 (  4.00%)       79.80 (  6.12%)
      Success 3 Max         86.00 (  0.00%)       82.00 (  4.65%)       82.00 (  4.65%)
      
      Patch 2:
      
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  7-nothp-1             7-nothp-2             7-nothp-3
      Success 1 Min         50.00 (  0.00%)       44.00 ( 12.00%)       39.00 ( 22.00%)
      Success 1 Mean        52.80 (  0.00%)       45.60 ( 13.64%)       42.40 ( 19.70%)
      Success 1 Max         55.00 (  0.00%)       46.00 ( 16.36%)       47.00 ( 14.55%)
      Success 2 Min         52.00 (  0.00%)       48.00 (  7.69%)       45.00 ( 13.46%)
      Success 2 Mean        53.40 (  0.00%)       49.80 (  6.74%)       48.80 (  8.61%)
      Success 2 Max         57.00 (  0.00%)       52.00 (  8.77%)       52.00 (  8.77%)
      Success 3 Min         84.00 (  0.00%)       81.00 (  3.57%)       79.00 (  5.95%)
      Success 3 Mean        85.00 (  0.00%)       82.40 (  3.06%)       79.60 (  6.35%)
      Success 3 Max         86.00 (  0.00%)       83.00 (  3.49%)       80.00 (  6.98%)
      
      Patch 3:
                                   3.19-rc4              3.19-rc4              3.19-rc4
                                  8-nothp-1             8-nothp-2             8-nothp-3
      Success 1 Min         46.00 (  0.00%)       44.00 (  4.35%)       42.00 (  8.70%)
      Success 1 Mean        50.20 (  0.00%)       45.60 (  9.16%)       44.00 ( 12.35%)
      Success 1 Max         52.00 (  0.00%)       47.00 (  9.62%)       47.00 (  9.62%)
      Success 2 Min         53.00 (  0.00%)       49.00 (  7.55%)       48.00 (  9.43%)
      Success 2 Mean        55.80 (  0.00%)       50.60 (  9.32%)       49.00 ( 12.19%)
      Success 2 Max         59.00 (  0.00%)       52.00 ( 11.86%)       51.00 ( 13.56%)
      Success 3 Min         84.00 (  0.00%)       80.00 (  4.76%)       79.00 (  5.95%)
      Success 3 Mean        85.40 (  0.00%)       81.60 (  4.45%)       80.40 (  5.85%)
      Success 3 Max         87.00 (  0.00%)       83.00 (  4.60%)       82.00 (  5.75%)
      
      While there's no improvement here, I consider reduced fragmentation events
      to be worth on its own.  Patch 2 also seems to reduce scanning for free
      pages, and migrations in compaction, suggesting it has somewhat less work
      to do:
      
      Patch 1:
      
      Compaction stalls                 4153        3959        3978
      Compaction success                1523        1441        1446
      Compaction failures               2630        2517        2531
      Page migrate success           4600827     4943120     5104348
      Page migrate failure             19763       16656       17806
      Compaction pages isolated      9597640    10305617    10653541
      Compaction migrate scanned    77828948    86533283    87137064
      Compaction free scanned      517758295   521312840   521462251
      Compaction cost                   5503        5932        6110
      
      Patch 2:
      
      Compaction stalls                 3800        3450        3518
      Compaction success                1421        1316        1317
      Compaction failures               2379        2134        2201
      Page migrate success           4160421     4502708     4752148
      Page migrate failure             19705       14340       14911
      Compaction pages isolated      8731983     9382374     9910043
      Compaction migrate scanned    98362797    96349194    98609686
      Compaction free scanned      496512560   469502017   480442545
      Compaction cost                   5173        5526        5811
      
      As with v2, /proc/pagetypeinfo appears unaffected with respect to numbers
      of unmovable and reclaimable pageblocks.
      
      Configuring the benchmark to allocate like THP page fault (i.e.  no sync
      compaction) gives much noisier results for iterations 2 and 3 after
      reboot.  This is not so surprising given how [1] offers lower improvements
      in this scenario due to less restarts after deferred compaction which
      would change compaction pivot.
      
      Baseline:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          5-thp-1         5-thp-2         5-thp-3
      Page alloc extfrag event                                8148965     6227815     6646741
      Extfrag fragmenting                                     8147872     6227130     6646117
      Extfrag fragmenting for unmovable                         10324       12942       15975
      Extfrag fragmenting unmovable placed with movable          5972        8495       10907
      Extfrag fragmenting for reclaimable                         601        1707        2210
      Extfrag fragmenting reclaimable placed with movable         520        1570        2000
      Extfrag fragmenting for movable                         8136947     6212481     6627932
      
      Patch 1:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          6-thp-1         6-thp-2         6-thp-3
      Page alloc extfrag event                                8345457     7574471     7020419
      Extfrag fragmenting                                     8343546     7573777     7019718
      Extfrag fragmenting for unmovable                         10256       18535       30716
      Extfrag fragmenting unmovable placed with movable          6893       11726       22181
      Extfrag fragmenting for reclaimable                         465        1208        1023
      Extfrag fragmenting reclaimable placed with movable         353         996         843
      Extfrag fragmenting for movable                         8332825     7554034     6987979
      
      Patch 2:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          7-thp-1         7-thp-2         7-thp-3
      Page alloc extfrag event                                3512847     3020756     2891625
      Extfrag fragmenting                                     3511940     3020185     2891059
      Extfrag fragmenting for unmovable                          9017        6892        6191
      Extfrag fragmenting unmovable placed with movable          1524        3053        2435
      Extfrag fragmenting for reclaimable                         445        1081        1160
      Extfrag fragmenting reclaimable placed with movable         375         918         986
      Extfrag fragmenting for movable                         3502478     3012212     2883708
      
      Patch 3:
                                                         3.19-rc4        3.19-rc4        3.19-rc4
                                                          8-thp-1         8-thp-2         8-thp-3
      Page alloc extfrag event                                3181699     3082881     2674164
      Extfrag fragmenting                                     3180812     3082303     2673611
      Extfrag fragmenting for unmovable                          1201        4031        4040
      Extfrag fragmenting unmovable placed with movable           974        3611        3645
      Extfrag fragmenting for reclaimable                         478        1165        1294
      Extfrag fragmenting reclaimable placed with movable         387         985        1030
      Extfrag fragmenting for movable                         3179133     3077107     2668277
      
      The improvements for first iteration are clear, the rest is much noisier
      and can appear like regression for Patch 1.  Anyway, patch 2 rectifies it.
      
      Allocation success rates are again unaffected so there's no point in
      making this e-mail any longer.
      
      [1] http://marc.info/?l=linux-mm&m=142166196321125&w=2
      
      This patch (of 3):
      
      When __rmqueue_fallback() is called to allocate a page of order X, it will
      find a page of order Y >= X of a fallback migratetype, which is different
      from the desired migratetype.  With the help of try_to_steal_freepages(),
      it may change the migratetype (to the desired one) also of:
      
      1) all currently free pages in the pageblock containing the fallback page
      2) the fallback pageblock itself
      3) buddy pages created by splitting the fallback page (when Y > X)
      
      These decisions take the order Y into account, as well as the desired
      migratetype, with the goal of preventing multiple fallback allocations
      that could e.g.  distribute UNMOVABLE allocations among multiple
      pageblocks.
      
      Originally, decision for 1) has implied the decision for 3).  Commit
      47118af0 ("mm: mmzone: MIGRATE_CMA migration type added") changed that
      (probably unintentionally) so that the buddy pages in case 3) are always
      changed to the desired migratetype, except for CMA pageblocks.
      
      Commit fef903ef ("mm/page_allo.c: restructure free-page stealing code
      and fix a bug") did some refactoring and added a comment that the case of
      3) is intended.  Commit 0cbef29a ("mm: __rmqueue_fallback() should
      respect pageblock type") removed the comment and tried to restore the
      original behavior where 1) implies 3), but due to the previous
      refactoring, the result is instead that only 2) implies 3) - and the
      conditions for 2) are less frequently met than conditions for 1).  This
      may increase fragmentation in situations where the code decides to steal
      all free pages from the pageblock (case 1)), but then gives back the buddy
      pages produced by splitting.
      
      This patch restores the original intended logic where 1) implies 3).
      During testing with stress-highalloc from mmtests, this has shown to
      decrease the number of events where UNMOVABLE and RECLAIMABLE allocations
      steal from MOVABLE pageblocks, which can lead to permanent fragmentation.
      In some cases it has increased the number of events when MOVABLE
      allocations steal from UNMOVABLE or RECLAIMABLE pageblocks, but these are
      fixable by sync compaction and thus less harmful.
      
      Note that evaluation has shown that the behavior introduced by
      47118af0 for buddy pages in case 3) is actually even better than the
      original logic, so the following patch will introduce it properly once
      again.  For stable backports of this patch it makes thus sense to only fix
      versions containing 0cbef29a.
      
      [iamjoonsoo.kim@lge.com: tracepoint fix]
      Signed-off-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      6333e244
    • Naoya Horiguchi's avatar
      mm/hugetlb: add migration entry check in __unmap_hugepage_range · dea8682d
      Naoya Horiguchi authored
      commit 9fbc1f63 upstream.
      
      If __unmap_hugepage_range() tries to unmap the address range over which
      hugepage migration is on the way, we get the wrong page because pte_page()
      doesn't work for migration entries.  This patch simply clears the pte for
      migration entries as we do for hwpoison entries.
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      dea8682d
    • Naoya Horiguchi's avatar
      mm/hugetlb: add migration/hwpoisoned entry check in hugetlb_change_protection · cefc5fb5
      Naoya Horiguchi authored
      commit a8bda28d upstream.
      
      There is a race condition between hugepage migration and
      change_protection(), where hugetlb_change_protection() doesn't care about
      migration entries and wrongly overwrites them.  That causes unexpected
      results like kernel crash.  HWPoison entries also can cause the same
      problem.
      
      This patch adds is_hugetlb_entry_(migration|hwpoisoned) check in this
      function to do proper actions.
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      cefc5fb5
    • Naoya Horiguchi's avatar
      mm/hugetlb: fix getting refcount 0 page in hugetlb_fault() · 78ecca66
      Naoya Horiguchi authored
      commit 0f792cf9 upstream.
      
      When running the test which causes the race as shown in the previous patch,
      we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault().
      
      This race happens when pte turns into migration entry just after the first
      check of is_hugetlb_entry_migration() in hugetlb_fault() passed with false.
      To fix this, we need to check pte_present() again after huge_ptep_get().
      
      This patch also reorders taking ptl and doing pte_page(), because
      pte_page() should be done in ptl.  Due to this reordering, we need use
      trylock_page() in page != pagecache_page case to respect locking order.
      
      Fixes: 66aebce7 ("hugetlb: fix race condition in hugetlb_fault()")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      78ecca66
    • Naoya Horiguchi's avatar
      mm/hugetlb: take page table lock in follow_huge_pmd() · 7b0f6722
      Naoya Horiguchi authored
      commit e66f17ff upstream.
      
      We have a race condition between move_pages() and freeing hugepages, where
      move_pages() calls follow_page(FOLL_GET) for hugepages internally and
      tries to get its refcount without preventing concurrent freeing.  This
      race crashes the kernel, so this patch fixes it by moving FOLL_GET code
      for hugepages into follow_huge_pmd() with taking the page table lock.
      
      This patch intentionally removes page==NULL check after pte_page.
      This is justified because pte_page() never returns NULL for any
      architectures or configurations.
      
      This patch changes the behavior of follow_huge_pmd() for tail pages and
      then tail pages can be pinned/returned.  So the caller must be changed to
      properly handle the returned tail pages.
      
      We could have a choice to add the similar locking to
      follow_huge_(addr|pud) for consistency, but it's not necessary because
      currently these functions don't support FOLL_GET flag, so let's leave it
      for future development.
      
      Here is the reproducer:
      
        $ cat movepages.c
        #include <stdio.h>
        #include <stdlib.h>
        #include <numaif.h>
      
        #define ADDR_INPUT      0x700000000000UL
        #define HPS             0x200000
        #define PS              0x1000
      
        int main(int argc, char *argv[]) {
                int i;
                int nr_hp = strtol(argv[1], NULL, 0);
                int nr_p  = nr_hp * HPS / PS;
                int ret;
                void **addrs;
                int *status;
                int *nodes;
                pid_t pid;
      
                pid = strtol(argv[2], NULL, 0);
                addrs  = malloc(sizeof(char *) * nr_p + 1);
                status = malloc(sizeof(char *) * nr_p + 1);
                nodes  = malloc(sizeof(char *) * nr_p + 1);
      
                while (1) {
                        for (i = 0; i < nr_p; i++) {
                                addrs[i] = (void *)ADDR_INPUT + i * PS;
                                nodes[i] = 1;
                                status[i] = 0;
                        }
                        ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                              MPOL_MF_MOVE_ALL);
                        if (ret == -1)
                                err("move_pages");
      
                        for (i = 0; i < nr_p; i++) {
                                addrs[i] = (void *)ADDR_INPUT + i * PS;
                                nodes[i] = 0;
                                status[i] = 0;
                        }
                        ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                              MPOL_MF_MOVE_ALL);
                        if (ret == -1)
                                err("move_pages");
                }
                return 0;
        }
      
        $ cat hugepage.c
        #include <stdio.h>
        #include <sys/mman.h>
        #include <string.h>
      
        #define ADDR_INPUT      0x700000000000UL
        #define HPS             0x200000
      
        int main(int argc, char *argv[]) {
                int nr_hp = strtol(argv[1], NULL, 0);
                char *p;
      
                while (1) {
                        p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                        if (p != (void *)ADDR_INPUT) {
                                perror("mmap");
                                break;
                        }
                        memset(p, 0, nr_hp * HPS);
                        munmap(p, nr_hp * HPS);
                }
        }
      
        $ sysctl vm.nr_hugepages=40
        $ ./hugepage 10 &
        $ ./movepages 10 $(pgrep -f hugepage)
      
      Fixes: e632a938 ("mm: migrate: add hugepage migration code to move_pages()")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reported-by: default avatarHugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      7b0f6722
    • Naoya Horiguchi's avatar
      mm/hugetlb: pmd_huge() returns true for non-present hugepage · 24bb9d18
      Naoya Horiguchi authored
      commit cbef8478 upstream.
      
      Migrating hugepages and hwpoisoned hugepages are considered as non-present
      hugepages, and they are referenced via migration entries and hwpoison
      entries in their page table slots.
      
      This behavior causes race condition because pmd_huge() doesn't tell
      non-huge pages from migrating/hwpoisoned hugepages.  follow_page_mask() is
      one example where the kernel would call follow_page_pte() for such
      hugepage while this function is supposed to handle only normal pages.
      
      To avoid this, this patch makes pmd_huge() return true when pmd_none() is
      true *and* pmd_present() is false.  We don't have to worry about mixing up
      non-present pmd entry with normal pmd (pointing to leaf level pte entry)
      because pmd_present() is true in normal pmd.
      
      The same race condition could happen in (x86-specific) gup_pmd_range(),
      where this patch simply adds pmd_present() check instead of pmd_huge().
      This is because gup_pmd_range() is fast path.  If we have non-present
      hugepage in this function, we will go into gup_huge_pmd(), then return 0
      at flag mask check, and finally fall back to the slow path.
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      24bb9d18
    • Mikulas Patocka's avatar
      cpufreq: speedstep-smi: enable interrupts when waiting · a3b6e7d4
      Mikulas Patocka authored
      commit d4d4eda2 upstream.
      
      On Dell Latitude C600 laptop with Pentium 3 850MHz processor, the
      speedstep-smi driver sometimes loads and sometimes doesn't load with
      "change to state X failed" message.
      
      The hardware sometimes refuses to change frequency and in this case, we
      need to retry later. I found out that we need to enable interrupts while
      waiting. When we enable interrupts, the hardware blockage that prevents
      frequency transition resolves and the transition is possible. With
      disabled interrupts, the blockage doesn't resolve (no matter how long do
      we wait). The exact reasons for this hardware behavior are unknown.
      
      This patch enables interrupts in the function speedstep_set_state that can
      be called with disabled interrupts. However, this function is called with
      disabled interrupts only from speedstep_get_freqs, so it shouldn't cause
      any problem.
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a3b6e7d4
    • Trond Myklebust's avatar
      NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args · 237b8a43
      Trond Myklebust authored
      commit d8ba1f97 upstream.
      
      If the call to decode_rc_list() fails due to a memory allocation error,
      then we need to truncate the array size to ensure that we only call
      kfree() on those pointer that were allocated.
      Reported-by: default avatarDavid Ramos <daramos@stanford.edu>
      Fixes: 4aece6a1 ("nfs41: cb_sequence xdr implementation")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      237b8a43
    • Alex Deucher's avatar
      drm/radeon: only enable kv/kb dpm interrupts once v3 · dbdfc09c
      Alex Deucher authored
      commit 410af8d7 upstream.
      
      Enable at init and disable on fini. Workaround for hardware problems.
      
      v2 (chk): extend commit message
      v3: add new function
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Christian König <christian.koenig@amd.com> (v2)
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      dbdfc09c
    • Christian König's avatar
      drm/radeon: workaround for CP HW bug on CIK · b7573839
      Christian König authored
      commit a9c73a0e upstream.
      
      Emit the EOP twice to avoid cache flushing problems.
      Signed-off-by: default avatarChristian König <christian.koenig@amd.com>
      Signed-off-by: default avatarAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      b7573839
    • Tony Battersby's avatar
      blk-mq: fix double-free in error path · c5caff73
      Tony Battersby authored
      commit 564e559f upstream.
      
      If the allocation of bt->bs fails, then bt->map can be freed twice, once
      in blk_mq_init_bitmap_tags() -> bt_alloc(), and once in
      blk_mq_init_bitmap_tags() -> bt_free().  Fix by setting the pointer to
      NULL after the first free.
      Signed-off-by: default avatarTony Battersby <tonyb@cybernetics.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      c5caff73
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Do not wake up a splice waiter when page is not full · de8548cf
      Steven Rostedt (Red Hat) authored
      commit 1e0d6714 upstream.
      
      When an application connects to the ring buffer via splice, it can only
      read full pages. Splice does not work with partial pages. If there is
      not enough data to fill a page, the splice command will either block
      or return -EAGAIN (if set to nonblock).
      
      Code was added where if the page is not full, to just sleep again.
      The problem is, it will get woken up again on the next event. That
      is, when something is written into the ring buffer, if there is a waiter
      it will wake it up. The waiter would then check the buffer, see that
      it still does not have enough data to fill a page and go back to sleep.
      To make matters worse, when the waiter goes back to sleep, it could
      cause another event, which would wake it back up again to see it
      doesn't have enough data and sleep again. This produces a tremendous
      overhead and fills the ring buffer with noise.
      
      For example, recording sched_switch on an idle system for 10 seconds
      produces 25,350,475 events!!!
      
      Create another wait queue for those waiters wanting full pages.
      When an event is written, it only wakes up waiters if there's a full
      page of data. It does not wake up the waiter if the page is not yet
      full.
      
      After this change, recording sched_switch on an idle system for 10
      seconds produces only 800 events. Getting rid of 25,349,675 useless
      events (99.9969% of events!!), is something to take seriously.
      
      Cc: Rabin Vincent <rabin@rab.in>
      Fixes: e30f53aa "tracing: Do not busy wait in buffer splice"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      de8548cf
    • Jan Kara's avatar
      fsnotify: fix handling of renames in audit · ac627c9e
      Jan Kara authored
      commit 6ee8e25f upstream.
      
      Commit e9fd702a ("audit: convert audit watches to use fsnotify
      instead of inotify") broke handling of renames in audit.  Audit code
      wants to update inode number of an inode corresponding to watched name
      in a directory.  When something gets renamed into a directory to a
      watched name, inotify previously passed moved inode to audit code
      however new fsnotify code passes directory inode where the change
      happened.  That confuses audit and it starts watching parent directory
      instead of a file in a directory.
      
      This can be observed for example by doing:
      
        cd /tmp
        touch foo bar
        auditctl -w /tmp/foo
        touch foo
        mv bar foo
        touch foo
      
      In audit log we see events like:
      
        type=CONFIG_CHANGE msg=audit(1423563584.155:90): auid=1000 ses=2 op="updated rules" path="/tmp/foo" key=(null) list=4 res=1
        ...
        type=PATH msg=audit(1423563584.155:91): item=2 name="bar" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
        type=PATH msg=audit(1423563584.155:91): item=3 name="foo" inode=1046842 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=DELETE
        type=PATH msg=audit(1423563584.155:91): item=4 name="foo" inode=1046884 dev=08:0 2 mode=0100644 ouid=0 ogid=0 rdev=00:00 nametype=CREATE
        ...
      
      and that's it - we see event for the first touch after creating the
      audit rule, we see events for rename but we don't see any event for the
      last touch.  However we start seeing events for unrelated stuff
      happening in /tmp.
      
      Fix the problem by passing moved inode as data in the FS_MOVED_FROM and
      FS_MOVED_TO events instead of the directory where the change happens.
      This doesn't introduce any new problems because noone besides
      audit_watch.c cares about the passed value:
      
        fs/notify/fanotify/fanotify.c cares only about FSNOTIFY_EVENT_PATH events.
        fs/notify/dnotify/dnotify.c doesn't care about passed 'data' value at all.
        fs/notify/inotify/inotify_fsnotify.c uses 'data' only for FSNOTIFY_EVENT_PATH.
        kernel/audit_tree.c doesn't care about passed 'data' at all.
        kernel/audit_watch.c expects moved inode as 'data'.
      
      Fixes: e9fd702a ("audit: convert audit watches to use fsnotify instead of inotify")
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Eric Paris <eparis@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ac627c9e
    • Vikram Mulukutla's avatar
      tracing: Fix unmapping loop in tracing_mark_write · 366475b9
      Vikram Mulukutla authored
      commit 7215853e upstream.
      
      Commit 6edb2a8a introduced
      an array map_pages that contains the addresses returned by
      kmap_atomic. However, when unmapping those pages, map_pages[0]
      is unmapped before map_pages[1], breaking the nesting requirement
      as specified in the documentation for kmap_atomic/kunmap_atomic.
      
      This was caught by the highmem debug code present in kunmap_atomic.
      Fix the loop to do the unmapping properly.
      
      Link: http://lkml.kernel.org/r/1418871056-6614-1-git-send-email-markivx@codeaurora.orgReviewed-by: default avatarStephen Boyd <sboyd@codeaurora.org>
      Reported-by: default avatarLime Yang <limey@codeaurora.org>
      Signed-off-by: default avatarVikram Mulukutla <markivx@codeaurora.org>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      366475b9
    • Konstantin Khlebnikov's avatar
      cfq-iosched: handle failure of cfq group allocation · 3b3c7a2a
      Konstantin Khlebnikov authored
      commit 69abaffe upstream.
      
      Cfq_lookup_create_cfqg() allocates struct blkcg_gq using GFP_ATOMIC.
      In cfq_find_alloc_queue() possible allocation failure is not handled.
      As a result kernel oopses on NULL pointer dereference when
      cfq_link_cfqq_cfqg() calls cfqg_get() for NULL pointer.
      
      Bug was introduced in v3.5 in commit cd1604fa ("blkcg: factor
      out blkio_group creation"). Prior to that commit cfq group lookup
      had returned pointer to root group as fallback.
      
      This patch handles this error using existing fallback oom_cfqq.
      Signed-off-by: default avatarKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Fixes: cd1604fa ("blkcg: factor out blkio_group creation")
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      3b3c7a2a
    • Shobhit Kumar's avatar
      drm/i915: Correct the IOSF Dev_FN field for IOSF transfers · 70fd9b70
      Shobhit Kumar authored
      commit d180d2bb upstream.
      
      As per the specififcation, the SB_DevFn is the PCI_DEVFN of the target
      device and not the source. So PCI_DEVFN(2,0) is not correct. Further the
      port ID should be enough to identify devices unless they are MFD. The
      SB_DevFn was intended to remove ambiguity in case of these MFD devices.
      
      For non MFD devices the recommendation for the target device IP was to
      ignore these fields, but not all of them followed the recommendation.
      Some like CCK ignore these fields and hence PCI_DEVFN(2, 0) works and so
      does PCI_DEVFN(0, 0) as it works for DPIO. The issue came to light because
      of GPIONC which was not getting programmed correctly with PCI_DEVFN(2, 0).
      It turned out that this did not follow the recommendation and expected 0
      in this field.
      
      In general the recommendation is to use SB_DevFn as PCI_DEVFN(0, 0) for
      all devices except target PCI devices.
      Signed-off-by: default avatarShobhit Kumar <shobhit.kumar@intel.com>
      Reviewed-by: default avatarVille Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      70fd9b70
    • David Hildenbrand's avatar
      KVM: s390: floating irqs: fix user triggerable endless loop · f823f3b3
      David Hildenbrand authored
      commit 8e2207cd upstream.
      
      If a vm with no VCPUs is created, the injection of a floating irq
      leads to an endless loop in the kernel.
      
      Let's skip the search for a destination VCPU for a floating irq if no
      VCPUs were created.
      Reviewed-by: default avatarDominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: default avatarCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: default avatarDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: default avatarChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      f823f3b3
    • Hans de Goede's avatar
      ACPI / video: Add disable_native_backlight quirk for Samsung 510R · 3be0cd3f
      Hans de Goede authored
      commit e77a1635 upstream.
      
      Backlight control through the native intel interface does not work properly
      on the Samsung 510R, where as using the acpi_video interface does work, add
      a quirk for this.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1186097Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      3be0cd3f
    • Hans de Goede's avatar
      ACPI / video: Add disable_native_backlight quirk for Samsung 730U3E/740U3E · 53d8495e
      Hans de Goede authored
      commit 3295d730 upstream.
      
      The Samsung 730U3E/740U3E has integrated ATI Radeon graphics, and backlight
      control does not work properly when using the native interfaces.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1094948Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      53d8495e
    • Hans de Goede's avatar
      ACPI / video: Add disable_native_backlight quirk for Dell XPS15 L521X · 7461b84a
      Hans de Goede authored
      commit 6a3ef10b upstream.
      
      The L521X variant of the Dell XPS15 has integrated nvidia graphics, and
      backlight control does not work properly when using the native interfaces.
      
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1163574Signed-off-by: default avatarHans de Goede <hdegoede@redhat.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      7461b84a
    • Aaron Lu's avatar
      ACPI / video: Add some Samsung models to disable_native_backlight list · 134dcb36
      Aaron Lu authored
      commit 7d0b9349 upstream.
      
      Several Samsung laptop models (SAMSUNG 870Z5E/880Z5E/680Z5E and
      SAMSUNG 370R4E/370R4V/370R5E/3570RE/370R5V) do not have a working
      native backlight control interface so restore their acpi_videoX
      interface.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=84221
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=84651
      For SAMSUNG 870Z5E/880Z5E/680Z5E:
      Reported-and-tested-by: default avatarBrent Saner <brent.saner@gmail.com>
      Reported-by: default avatarVitaliy Filippov <vitalif@yourcmc.ru>
      Reported-by: default avatarLaszlo KREKACS <laszlo.krekacs.list@gmail.com>
      For SAMSUNG 370R4E/370R4V/370R5E/3570RE/370R5V:
      Reported-by: default avatarVladimir Perepechin <vovochka13@gmail.com>
      Signed-off-by: default avatarAaron Lu <aaron.lu@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      134dcb36
    • Ross Lagerwall's avatar
      xen/manage: Fix USB interaction issues when resuming · be7a2a5d
      Ross Lagerwall authored
      commit 72978b2f upstream.
      
      Commit 61a734d3 ("xen/manage: Always freeze/thaw processes when
      suspend/resuming") ensured that userspace processes were always frozen
      before suspending to reduce interaction issues when resuming devices.
      However, freeze_processes() does not freeze kernel threads.  Freeze
      kernel threads as well to prevent deadlocks with the khubd thread when
      resuming devices.
      
      This is what native suspend and resume does.
      
      Example deadlock:
      [ 7279.648010]  [<ffffffff81446bde>] ? xen_poll_irq_timeout+0x3e/0x50
      [ 7279.648010]  [<ffffffff81448d60>] xen_poll_irq+0x10/0x20
      [ 7279.648010]  [<ffffffff81011723>] xen_lock_spinning+0xb3/0x120
      [ 7279.648010]  [<ffffffff810115d1>] __raw_callee_save_xen_lock_spinning+0x11/0x20
      [ 7279.648010]  [<ffffffff815620b6>] ? usb_control_msg+0xe6/0x120
      [ 7279.648010]  [<ffffffff81747e50>] ? _raw_spin_lock_irq+0x50/0x60
      [ 7279.648010]  [<ffffffff8174522c>] wait_for_completion+0xac/0x160
      [ 7279.648010]  [<ffffffff8109c520>] ? try_to_wake_up+0x2c0/0x2c0
      [ 7279.648010]  [<ffffffff814b60f2>] dpm_wait+0x32/0x40
      [ 7279.648010]  [<ffffffff814b6eb0>] device_resume+0x90/0x210
      [ 7279.648010]  [<ffffffff814b7d71>] dpm_resume+0x121/0x250
      [ 7279.648010]  [<ffffffff8144c570>] ? xenbus_dev_request_and_reply+0xc0/0xc0
      [ 7279.648010]  [<ffffffff814b80d5>] dpm_resume_end+0x15/0x30
      [ 7279.648010]  [<ffffffff81449fba>] do_suspend+0x10a/0x200
      [ 7279.648010]  [<ffffffff8144a2f0>] ? xen_pre_suspend+0x20/0x20
      [ 7279.648010]  [<ffffffff8144a1d0>] shutdown_handler+0x120/0x150
      [ 7279.648010]  [<ffffffff8144c60f>] xenwatch_thread+0x9f/0x160
      [ 7279.648010]  [<ffffffff810ac510>] ? finish_wait+0x80/0x80
      [ 7279.648010]  [<ffffffff8108d189>] kthread+0xc9/0xe0
      [ 7279.648010]  [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80
      [ 7279.648010]  [<ffffffff8175087c>] ret_from_fork+0x7c/0xb0
      [ 7279.648010]  [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80
      
      [ 7441.216287] INFO: task khubd:89 blocked for more than 120 seconds.
      [ 7441.219457]       Tainted: G            X 3.13.11-ckt12.kz #1
      [ 7441.222176] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [ 7441.225827] khubd           D ffff88003f433440     0    89      2 0x00000000
      [ 7441.229258]  ffff88003ceb9b98 0000000000000046 ffff88003ce83000 0000000000013440
      [ 7441.232959]  ffff88003ceb9fd8 0000000000013440 ffff88003cd13000 ffff88003ce83000
      [ 7441.236658]  0000000000000286 ffff88003d3e0000 ffff88003ceb9bd0 00000001001aa01e
      [ 7441.240415] Call Trace:
      [ 7441.241614]  [<ffffffff817442f9>] schedule+0x29/0x70
      [ 7441.243930]  [<ffffffff81743406>] schedule_timeout+0x166/0x2c0
      [ 7441.246681]  [<ffffffff81075b80>] ? call_timer_fn+0x110/0x110
      [ 7441.249339]  [<ffffffff8174357e>] schedule_timeout_uninterruptible+0x1e/0x20
      [ 7441.252644]  [<ffffffff81077710>] msleep+0x20/0x30
      [ 7441.254812]  [<ffffffff81555f00>] hub_port_reset+0xf0/0x580
      [ 7441.257400]  [<ffffffff81558465>] hub_port_init+0x75/0xb40
      [ 7441.259981]  [<ffffffff814bb3c9>] ? update_autosuspend+0x39/0x60
      [ 7441.262817]  [<ffffffff814bb4f0>] ? pm_runtime_set_autosuspend_delay+0x50/0xa0
      [ 7441.266212]  [<ffffffff8155a64a>] hub_thread+0x71a/0x1750
      [ 7441.268728]  [<ffffffff810ac510>] ? finish_wait+0x80/0x80
      [ 7441.271272]  [<ffffffff81559f30>] ? usb_port_resume+0x670/0x670
      [ 7441.274067]  [<ffffffff8108d189>] kthread+0xc9/0xe0
      [ 7441.276305]  [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80
      [ 7441.279131]  [<ffffffff8175087c>] ret_from_fork+0x7c/0xb0
      [ 7441.281659]  [<ffffffff8108d0c0>] ? flush_kthread_worker+0x80/0x80
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      be7a2a5d
    • Takashi Iwai's avatar
      ALSA: hda - Set up GPIO for Toshiba Satellite S50D · ad889794
      Takashi Iwai authored
      commit 4227de2a upstream.
      
      Toshiba Satellite S50D laptop with an IDT codec uses the GPIO4 (0x10)
      as the master EAPD.
      
      Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=915858Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ad889794
    • Takashi Iwai's avatar
      ALSA: hda - Add the pin fixup for HP Envy TS bass speaker · a4ab0d80
      Takashi Iwai authored
      commit 8695a003 upstream.
      
      NID 0x10 seems corresponding to the bass speaker.
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a4ab0d80
    • James Hogan's avatar
      KVM: MIPS: Don't leak FPU/DSP to guest · b4db76b6
      James Hogan authored
      commit f798217d upstream.
      
      The FPU and DSP are enabled via the CP0 Status CU1 and MX bits by
      kvm_mips_set_c0_status() on a guest exit, presumably in case there is
      active state that needs saving if pre-emption occurs. However neither of
      these bits are cleared again when returning to the guest.
      
      This effectively gives the guest access to the FPU/DSP hardware after
      the first guest exit even though it is not aware of its presence,
      allowing FP instructions in guest user code to intermittently actually
      execute instead of trapping into the guest OS for emulation. It will
      then read & manipulate the hardware FP registers which technically
      belong to the user process (e.g. QEMU), or are stale from another user
      process. It can also crash the guest OS by causing an FP exception, for
      which a guest exception handler won't have been registered.
      
      First lets save and disable the FPU (and MSA) state with lose_fpu(1)
      before entering the guest. This simplifies the problem, especially for
      when guest FPU/MSA support is added in the future, and prevents FR=1 FPU
      state being live when the FR bit gets cleared for the guest, which
      according to the architecture causes the contents of the FPU and vector
      registers to become UNPREDICTABLE.
      
      We can then safely remove the enabling of the FPU in
      kvm_mips_set_c0_status(), since there should never be any active FPU or
      MSA state to save at pre-emption, which should plug the FPU leak.
      
      DSP state is always live rather than being lazily restored, so for that
      it is simpler to just clear the MX bit again when re-entering the guest.
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      [ luis: backported to 3.16: files rename:
        - locore.S -> kvm_locore.S
        - mips.c -> kvm_mips.c ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      b4db76b6
    • Alexander Usyskin's avatar
      mei: me: release hw from reset only during the reset flow · f4d17408
      Alexander Usyskin authored
      commit 663b7ee9 upstream.
      
      We might enter the interrupt handler with hw_ready already set,
      but prior we actually started the reset flow.
      To soleve this we move the reset release from the interrupt handler
      to the HW start wait function which is part of the reset sequence.
      Signed-off-by: default avatarAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: default avatarTomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      [ luis: backported to 3.16: adjusted context ]
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      f4d17408
    • Alexander Usyskin's avatar
      mei: mask interrupt set bit on clean reset bit · f6106f83
      Alexander Usyskin authored
      commit 1ab1e79b upstream.
      
      We should mask interrupt set bit when writing back
      hcsr value in reset bit clean-up.
      
      This is refinement for
      mei: clean reset bit before reset
      commit b13a65efSigned-off-by: default avatarAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: default avatarTomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      f6106f83
    • Malcolm Priestley's avatar
      [media] lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb · a9c7abac
      Malcolm Priestley authored
      commit 15e1ce33 upstream.
      
      A quirk of some older firmwares that report endpoint pipe type as PIPE_BULK
      but the endpoint otheriwse functions as interrupt.
      
      Check if usb_endpoint_type is USB_ENDPOINT_XFER_BULK and set as usb_rcvbulkpipe.
      Signed-off-by: default avatarMalcolm Priestley <tvboxspy@gmail.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@osg.samsung.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a9c7abac
    • Peng Tao's avatar
      nfs41: .init_read and .init_write can be called with valid pg_lseg · 6184a5e0
      Peng Tao authored
      commit cb5d04bc upstream.
      
      With pgio refactoring in v3.15, .init_read and .init_write can be
      called with valid pgio->pg_lseg. file layout was fixed at that time
      by commit c6194271 (pnfs: filelayout: support non page aligned
      layouts). But the generic helper still needs to be fixed.
      Signed-off-by: default avatarPeng Tao <tao.peng@primarydata.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      6184a5e0
    • Matej Dubovy's avatar
      Bluetooth: btusb: Add support for Lite-On (04ca) Broadcom based, BCM43142 · 3d08b574
      Matej Dubovy authored
      commit 8f0c304c upstream.
      
      Please add support for sub BT chip on the combo card
      Broadcom 43142A0 (in Lenovo E145), 04ca:2007
      
      /sys/kernel/debug/usb/devices
      
      T:  Bus=05 Lev=01 Prnt=01 Port=01 Cnt=02 Dev#=  3 Spd=12   MxCh= 0
      D:  Ver= 2.00 Cls=ff(vend.) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=04ca ProdID=2007 Rev= 1.12
      S:  Manufacturer=Broadcom Corp
      S:  Product=BCM43142A0
      S:  SerialNumber=28E347EC73BD
      C:* #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=  0mA
      I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=81(I) Atr=03(Int.) MxPS=  16 Ivl=1ms
      E:  Ad=82(I) Atr=02(Bulk) MxPS=  64 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS=  64 Ivl=0ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   0 Ivl=1ms
      I:  If#= 1 Alt= 1 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=   9 Ivl=1ms
      I:  If#= 1 Alt= 2 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  17 Ivl=1ms
      I:  If#= 1 Alt= 3 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  25 Ivl=1ms
      I:  If#= 1 Alt= 4 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  33 Ivl=1ms
      I:  If#= 1 Alt= 5 #EPs= 2 Cls=ff(vend.) Sub=01 Prot=01 Driver=(none)
      E:  Ad=83(I) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      E:  Ad=03(O) Atr=01(Isoc) MxPS=  49 Ivl=1ms
      I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
      E:  Ad=84(I) Atr=02(Bulk) MxPS=  32 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS=  32 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 0 Cls=fe(app. ) Sub=01 Prot=01 Driver=(none)
      
      Firmware for 04ca:2007 can be extracted from the latest Lenovo E145
      Bluetooth driver for Windows (driver is however described as BCM20702
      but contains also firwmare for BCM43142).
      Search for BCM43142A0_001.001.011.0122.0153.hex within hex files, then
      it must be converted using hex2hcd utility. Rename file to
      BCM43142A0-04ca-2007.hcd, then move to /lib/firmware/brcm/.
      Signed-off-by: default avatarMatej Dubovy <matej.dubovy@gmail.com>
      Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      3d08b574
    • Viresh Kumar's avatar
      cpufreq: Set cpufreq_cpu_data to NULL before putting kobject · a71f2f60
      Viresh Kumar authored
      commit 6ffae8c0 upstream.
      
      In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs
      to be cleared before calling kobject_put(&policy->kobj) and under
      cpufreq_driver_lock. Otherwise, if someone else calls cpufreq_cpu_get()
      in parallel with it, they can obtain a non-NULL policy from that after
      kobject_put(&policy->kobj) was executed.
      
      Consider this case:
      
      Thread A				Thread B
      cpufreq_cpu_get()
        acquire cpufreq_driver_lock
        read-per-cpu cpufreq_cpu_data
      					kobject_put(&policy->kobj);
        kobject_get(&policy->kobj);
      					...
      					per_cpu(&cpufreq_cpu_data, cpu) = NULL
      
      And this will result in a warning like this one:
      
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
       kobject_get+0x41/0x50()
       Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
       lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
       ...
       Call Trace:
        [<ffffffff81661b14>] dump_stack+0x46/0x58
        [<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
        [<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff812e16d1>] kobject_get+0x41/0x50
        [<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
        [<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
        [<ffffffff810b8cb2>] ? up+0x32/0x50
        [<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
        [<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
        [<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
        [<ffffffff81360967>] ? acpi_has_method+0x25/0x40
        [<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
        [<ffffffff81089566>] ? move_linked_works+0x66/0x90
        [<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
        [<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
        [<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
        [<ffffffff8108c910>] process_one_work+0x160/0x410
        [<ffffffff8108d05b>] worker_thread+0x11b/0x520
        [<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
        [<ffffffff81092421>] kthread+0xe1/0x100
        [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
        [<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
        [<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
       ---[ end trace 89e66eb9795efdf7 ]---
      
      The actual code flow is as follows:
      
       Thread A: Workqueue: kacpi_notify
      
       acpi_processor_notify()
         acpi_processor_ppc_has_changed()
               cpufreq_update_policy()
                 cpufreq_cpu_get()
                   kobject_get()
      
       Thread B: xenbus_thread()
      
       xenbus_thread()
         msg->u.watch.handle->callback()
           handle_vcpu_hotplug_event()
             vcpu_hotplug()
               cpu_down()
                 __cpu_notify(CPU_POST_DEAD..)
                   cpufreq_cpu_callback()
                     __cpufreq_remove_dev_finish()
                       cpufreq_policy_put_kobj()
                         kobject_put()
      
      cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data
      under cpufreq_driver_lock, and once it gets a valid policy it expects it
      to not be freed until cpufreq_cpu_put() is called.
      
      But the race happens when another thread puts the kobject first and updates
      cpufreq_cpu_data before or later. And so the first thread gets a valid policy
      structure and before it does kobject_get() on it, the second one has already
      done kobject_put().
      
      Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
      too under locks.
      Reported-by: default avatarEthan Zhao <ethan.zhao@oracle.com>
      Reported-by: default avatarSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      a71f2f60
    • Stefan Agner's avatar
      serial: fsl_lpuart: avoid new transfer while DMA is running · 27e53e7c
      Stefan Agner authored
      commit 5f1437f6 upstream.
      
      When the UART is in DMA receive mode (RDMAS set) and one character
      just arrived while another interrupt is handled (e.g. TX), the RDRF
      (receiver data register full flag) is set due to the water level of
      1. But since the DMA will take care of this character, there is no
      need to handle it by calling lpuart_prepare_rx. Handling it leads to
      adding the RX timeout timer twice:
      
      [   74.336698] Kernel BUG at 80053070 [verbose debug info unavailable]
      [   74.342999] Internal error: Oops - BUG: 0 [#1] ARM0:00.00 khungtaskd
      [   74.347817] Modules linked in:    0 S  0.0  0.0   0:00.00 writeback
      [   74.350926] CPU: 0 PID: 0 Comm: swapper Not tainted 3.19.0-rc3-00001-g39d78e2 #1788
      [   74.358617] Hardware name: Freescale Vybrid VF610 (Device Tree)t
      [   74.364563] task: 807a7678 ti: 8079c000 task.ti: 8079c000 kblockd
      [   74.370002] PC is at add_timer+0x24/0x28.0  0.0   0:00.09 kworker/u2:1
      [   74.373960] LR is at lpuart_int+0x15c/0x3d8
      [   74.378171] pc : [<80053070>]    lr : [<802e0d88>]    psr: a0010193
      [   74.378171] sp : 8079de10  ip : 8079de20  fp : 8079de1c
      [   74.389694] r10: 807d44c0  r9 : 8688c300  r8 : 00000013
      [   74.394943] r7 : 20010193  r6 : 00000000  r5 : 000000a0  r4 : 86997210
      [   74.401498] r3 : ffffa7da  r2 : 80817868  r1 : 86997210  r0 : 86997344
      [   74.408052] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
      [   74.415489] Control: 10c5387d  Table: 8611c059  DAC: 00000015
      [   74.421265] Process swapper (pid: 0, stack limit = 0x8079c230)
      ...
      
      Solve this by only execute the receiver path (lpuart_prepare_rx) if
      the DMA receive mode (RDMAS) is not set. Also, make sure the flag is
      cleared on initialization, in case it has been left set.
      
      This can be best reproduced using UART as a serial console, then
      running top while dd'ing data into the terminal.
      Signed-off-by: default avatarStefan Agner <stefan@agner.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      27e53e7c
    • Stefan Agner's avatar
      serial: fsl_lpuart: delete timer on shutdown · ac64d693
      Stefan Agner authored
      commit 4a8588a1 upstream.
      
      If the serial port gets closed while a RX transfer is in progress,
      the timer might fire after the serial port shutdown finished. This
      leads in a NULL pointer dereference:
      
      [    7.508324] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [    7.516590] pgd = 86348000
      [    7.519445] [00000000] *pgd=86179831, *pte=00000000, *ppte=00000000
      [    7.526145] Internal error: Oops: 17 [#1] ARM
      [    7.530611] Modules linked in:
      [    7.533876] CPU: 0 PID: 123 Comm: systemd Not tainted 3.19.0-rc3-00004-g5b11ea7 #1778
      [    7.541827] Hardware name: Freescale Vybrid VF610 (Device Tree)
      [    7.547862] task: 861c3400 ti: 86ac8000 task.ti: 86ac8000
      [    7.553392] PC is at lpuart_timer_func+0x24/0xf8
      [    7.558127] LR is at lpuart_timer_func+0x20/0xf8
      [    7.562857] pc : [<802df99c>]    lr : [<802df998>]    psr: 600b0113
      [    7.562857] sp : 86ac9b90  ip : 86ac9b90  fp : 86ac9bbc
      [    7.574467] r10: 80817180  r9 : 80817b98  r8 : 80817998
      [    7.579803] r7 : 807acee0  r6 : 86989000  r5 : 00000100  r4 : 86997210
      [    7.586444] r3 : 86ac8000  r2 : 86ac9bc0  r1 : 86997210  r0 : 00000000
      [    7.593085] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      [    7.600341] Control: 10c5387d  Table: 86348059  DAC: 00000015
      [    7.606203] Process systemd (pid: 123, stack limit = 0x86ac8230)
      
      Setup the timer on UART startup which allows to delete the timer
      unconditionally on shutdown. This also saves the initialization
      on each transfer.
      Signed-off-by: default avatarStefan Agner <stefan@agner.ch>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      ac64d693
    • Peter Hurley's avatar
      tty: Prevent untrappable signals from malicious program · 0d8aa186
      Peter Hurley authored
      commit 37480a05 upstream.
      
      Commit 26df6d13 ("tty: Add EXTPROC support for LINEMODE")
      allows a process which has opened a pty master to send _any_ signal
      to the process group of the pty slave. Although potentially
      exploitable by a malicious program running a setuid program on
      a pty slave, it's unknown if this exploit currently exists.
      
      Limit to signals actually used.
      
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Howard Chu <hyc@symas.com>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: default avatarPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarLuis Henriques <luis.henriques@canonical.com>
      0d8aa186