1. 31 Jan, 2019 18 commits
    • Eugeniy Paltsev's avatar
      ARCv2: lib: memeset: fix doing prefetchw outside of buffer · 95907c66
      Eugeniy Paltsev authored
      commit e6a72b7d upstream.
      
      ARCv2 optimized memset uses PREFETCHW instruction for prefetching the
      next cache line but doesn't ensure that the line is not past the end of
      the buffer. PRETECHW changes the line ownership and marks it dirty,
      which can cause issues in SMP config when next line was already owned by
      other core. Fix the issue by avoiding the PREFETCHW
      
      Some more details:
      
      The current code has 3 logical loops (ignroing the unaligned part)
        (a) Big loop for doing aligned 64 bytes per iteration with PREALLOC
        (b) Loop for 32 x 2 bytes with PREFETCHW
        (c) any left over bytes
      
      loop (a) was already eliding the last 64 bytes, so PREALLOC was
      safe. The fix was removing PREFETCW from (b).
      
      Another potential issue (applicable to configs with 32 or 128 byte L1
      cache line) is that PREALLOC assumes 64 byte cache line and may not do
      the right thing specially for 32b. While it would be easy to adapt,
      there are no known configs with those lie sizes, so for now, just
      compile out PREALLOC in such cases.
      Signed-off-by: default avatarEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
      Cc: stable@vger.kernel.org #4.4+
      Signed-off-by: default avatarVineet Gupta <vgupta@synopsys.com>
      [vgupta: rewrote changelog, used asm .macro vs. "C" macro]
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95907c66
    • Anthony Wong's avatar
      ALSA: hda - Add mute LED support for HP ProBook 470 G5 · 752c738a
      Anthony Wong authored
      commit 69939038 upstream.
      
      Support speaker and mic mute LEDs on HP ProBook 470 G5.
      
      BugLink: https://bugs.launchpad.net/bugs/1811254Signed-off-by: default avatarAnthony Wong <anthony.wong@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      752c738a
    • Gustavo A. R. Silva's avatar
      ASoC: rt5514-spi: Fix potential NULL pointer dereference · 585a4feb
      Gustavo A. R. Silva authored
      commit 060d0bf4 upstream.
      
      There is a potential NULL pointer dereference in case devm_kzalloc()
      fails and returns NULL.
      
      Fix this by adding a NULL check on rt5514_dsp.
      
      This issue was detected with the help of Coccinelle.
      
      Fixes: 6eebf35b ("ASoC: rt5514: add rt5514 SPI driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      585a4feb
    • Kangjie Lu's avatar
      ASoC: atom: fix a missing check of snd_pcm_lib_malloc_pages · 27657a6a
      Kangjie Lu authored
      commit 44fabd8c upstream.
      
      snd_pcm_lib_malloc_pages() may fail, so let's check its status and
      return its error code upstream.
      Signed-off-by: default avatarKangjie Lu <kjlu@umn.edu>
      Acked-by: default avatarPierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Signed-off-by: default avatarMark Brown <broonie@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      27657a6a
    • Charles Yeh's avatar
      USB: serial: pl2303: add new PID to support PL2303TB · d1b8cba6
      Charles Yeh authored
      commit 4dcf9ddc upstream.
      
      Add new PID to support PL2303TB (TYPE_HX)
      Signed-off-by: default avatarCharles Yeh <charlesyeh522@gmail.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1b8cba6
    • Max Schulze's avatar
      USB: serial: simple: add Motorola Tetra TPG2200 device id · d9046ae6
      Max Schulze authored
      commit b81c2c33 upstream.
      
      Add new Motorola Tetra device id for Motorola Solutions TETRA PEI device
      
      T:  Bus=02 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=0cad ProdID=9016 Rev=24.16
      S:  Manufacturer=Motorola Solutions, Inc.
      S:  Product=TETRA PEI interface
      C:  #Ifs= 2 Cfg#= 1 Atr=80 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=usb_serial_simple
      I:  If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=usb_serial_simple
      Signed-off-by: default avatarMax Schulze <max.schulze@posteo.de>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9046ae6
    • Tomas Winkler's avatar
      mei: me: add denverton innovation engine device IDs · db818691
      Tomas Winkler authored
      commit f7ee8ead upstream.
      
      Add the Denverton innovation engine (IE) device ids.
      The IE is an ME-like device which provides HW security
      offloading.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: default avatarAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db818691
    • Vijay Viswanath's avatar
      mmc: Kconfig: Enable CONFIG_MMC_SDHCI_IO_ACCESSORS · c4616619
      Vijay Viswanath authored
      commit 99d570da upstream.
      
      Enable CONFIG_MMC_SDHCI_IO_ACCESSORS so that SDHC controller specific
      register read and write APIs, if registered, can be used.
      Signed-off-by: default avatarVijay Viswanath <vviswana@codeaurora.org>
      Acked-by: default avatarAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Cc: Koen Vandeputte <koen.vandeputte@ncentric.com>
      Cc: Loic Poulain <loic.poulain@linaro.org>
      Signed-off-by: default avatarGeorgi Djakov <georgi.djakov@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4616619
    • Paolo Abeni's avatar
      ipfrag: really prevent allocation on netns exit · 01c85b4b
      Paolo Abeni authored
      [ Upstream commit f6f2a4a2 ]
      
      Setting the low threshold to 0 has no effect on frags allocation,
      we need to clear high_thresh instead.
      
      The code was pre-existent to commit 648700f7 ("inet: frags:
      use rhashtables for reassembly units"), but before the above,
      such assignment had a different role: prevent concurrent eviction
      from the worker and the netns cleanup helper.
      
      Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units")
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01c85b4b
    • Willem de Bruijn's avatar
      tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state · ab668871
      Willem de Bruijn authored
      [ Upstream commit 13d7f463 ]
      
      TCP transmission with MSG_ZEROCOPY fails if the peer closes its end of
      the connection and so transitions this socket to CLOSE_WAIT state.
      
      Transmission in close wait state is acceptable. Other similar tests in
      the stack (e.g., in FastOpen) accept both states. Relax this test, too.
      
      Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg276886.html
      Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg227390.html
      Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
      Reported-by: default avatarMarek Majkowski <marek@cloudflare.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      CC: Yuchung Cheng <ycheng@google.com>
      CC: Neal Cardwell <ncardwell@google.com>
      CC: Soheil Hassas Yeganeh <soheil@google.com>
      CC: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ab668871
    • Ido Schimmel's avatar
      net: ipv4: Fix memory leak in network namespace dismantle · 0781b0f9
      Ido Schimmel authored
      [ Upstream commit f97f4dd8 ]
      
      IPv4 routing tables are flushed in two cases:
      
      1. In response to events in the netdev and inetaddr notification chains
      2. When a network namespace is being dismantled
      
      In both cases only routes associated with a dead nexthop group are
      flushed. However, a nexthop group will only be marked as dead in case it
      is populated with actual nexthops using a nexthop device. This is not
      the case when the route in question is an error route (e.g.,
      'blackhole', 'unreachable').
      
      Therefore, when a network namespace is being dismantled such routes are
      not flushed and leaked [1].
      
      To reproduce:
      # ip netns add blue
      # ip -n blue route add unreachable 192.0.2.0/24
      # ip netns del blue
      
      Fix this by not skipping error routes that are not marked with
      RTNH_F_DEAD when flushing the routing tables.
      
      To prevent the flushing of such routes in case #1, add a parameter to
      fib_table_flush() that indicates if the table is flushed as part of
      namespace dismantle or not.
      
      Note that this problem does not exist in IPv6 since error routes are
      associated with the loopback device.
      
      [1]
      unreferenced object 0xffff888066650338 (size 56):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff  ..........ba....
          e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00  ...d............
        backtrace:
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      unreferenced object 0xffff888061621c88 (size 48):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
          6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff  kkkkkkkk..&_....
        backtrace:
          [<00000000733609e3>] fib_table_insert+0x978/0x1500
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      
      Fixes: 8cced9ef ("[NETNS]: Enable routing configuration in non-initial namespace.")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0781b0f9
    • Jason Wang's avatar
      vhost: log dirty page correctly · 1981e4c9
      Jason Wang authored
      [ Upstream commit cc5e7107 ]
      
      Vhost dirty page logging API is designed to sync through GPA. But we
      try to log GIOVA when device IOTLB is enabled. This is wrong and may
      lead to missing data after migration.
      
      To solve this issue, when logging with device IOTLB enabled, we will:
      
      1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
         get HVA, for writable descriptor, get HVA through iovec. For used
         ring update, translate its GIOVA to HVA
      2) traverse the GPA->HVA mapping to get the possible GPA and log
         through GPA. Pay attention this reverse mapping is not guaranteed
         to be unique, so we should log each possible GPA in this case.
      
      This fix the failure of scp to guest during migration. In -next, we
      will probably support passing GIOVA->GPA instead of GIOVA->HVA.
      
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Reported-by: default avatarJintack Lim <jintack@cs.columbia.edu>
      Cc: Jintack Lim <jintack@cs.columbia.edu>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1981e4c9
    • Ross Lagerwall's avatar
      openvswitch: Avoid OOB read when parsing flow nlattrs · 520126ca
      Ross Lagerwall authored
      [ Upstream commit 04a4af33 ]
      
      For nested and variable attributes, the expected length of an attribute
      is not known and marked by a negative number.  This results in an OOB
      read when the expected length is later used to check if the attribute is
      all zeros. Fix this by using the actual length of the attribute rather
      than the expected length.
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Acked-by: default avatarPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      520126ca
    • Cong Wang's avatar
      net_sched: refetch skb protocol for each filter · 6da1dfff
      Cong Wang authored
      [ Upstream commit cd0c4e70 ]
      
      Martin reported a set of filters don't work after changing
      from reclassify to continue. Looking into the code, it
      looks like skb protocol is not always fetched for each
      iteration of the filters. But, as demonstrated by Martin,
      TC actions could modify skb->protocol, for example act_vlan,
      this means we have to refetch skb protocol in each iteration,
      rather than using the one we fetch in the beginning of the loop.
      
      This bug is _not_ introduced by commit 3b3ae880
      ("net: sched: consolidate tc_classify{,_compat}"), technically,
      if act_vlan is the only action that modifies skb protocol, then
      it is commit c7e2b968 ("sched: introduce vlan action") which
      introduced this bug.
      Reported-by: default avatarMartin Olsson <martin.olsson+netdev@sentorsecurity.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6da1dfff
    • Thomas Petazzoni's avatar
      net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling · 6bd069b5
      Thomas Petazzoni authored
      [ Upstream commit e40e2a2e ]
      
      The current code in __mdiobus_register() doesn't properly handle
      failures returned by the devm_gpiod_get_optional() call: it returns
      immediately, without unregistering the device that was added by the
      call to device_register() earlier in the function.
      
      This leaves a stale device, which then causes a NULL pointer
      dereference in the code that handles deferred probing:
      
      [    1.489982] Unable to handle kernel NULL pointer dereference at virtual address 00000074
      [    1.498110] pgd = (ptrval)
      [    1.500838] [00000074] *pgd=00000000
      [    1.504432] Internal error: Oops: 17 [#1] SMP ARM
      [    1.509133] Modules linked in:
      [    1.512192] CPU: 1 PID: 51 Comm: kworker/1:3 Not tainted 4.20.0-00039-g3b73a4cc8b3e-dirty #99
      [    1.520708] Hardware name: Xilinx Zynq Platform
      [    1.525261] Workqueue: events deferred_probe_work_func
      [    1.530403] PC is at klist_next+0x10/0xfc
      [    1.534403] LR is at device_for_each_child+0x40/0x94
      [    1.539361] pc : [<c0683fbc>]    lr : [<c0455d90>]    psr: 200e0013
      [    1.545628] sp : ceeefe68  ip : 00000001  fp : ffffe000
      [    1.550863] r10: 00000000  r9 : c0c66790  r8 : 00000000
      [    1.556079] r7 : c0457d44  r6 : 00000000  r5 : ceeefe8c  r4 : cfa2ec78
      [    1.562604] r3 : 00000064  r2 : c0457d44  r1 : ceeefe8c  r0 : 00000064
      [    1.569129] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [    1.576263] Control: 18c5387d  Table: 0ed7804a  DAC: 00000051
      [    1.582013] Process kworker/1:3 (pid: 51, stack limit = 0x(ptrval))
      [    1.588280] Stack: (0xceeefe68 to 0xceef0000)
      [    1.592630] fe60:                   cfa2ec78 c0c03c08 00000000 c0457d44 00000000 c0c66790
      [    1.600814] fe80: 00000000 c0455d90 ceeefeac 00000064 00000000 0d7a542e cee9d494 cfa2ec78
      [    1.608998] fea0: cfa2ec78 00000000 c0457d44 c0457d7c cee9d494 c0c03c08 00000000 c0455dac
      [    1.617182] fec0: cf98ba44 cf926a00 cee9d494 0d7a542e 00000000 cf935a10 cf935a10 cf935a10
      [    1.625366] fee0: c0c4e9b8 c0457d7c c0c4e80c 00000001 cf935a10 c0457df4 cf935a10 c0c4e99c
      [    1.633550] ff00: c0c4e99c c045a27c c0c4e9c4 ced63f80 cfde8a80 cfdebc00 00000000 c013893c
      [    1.641734] ff20: cfde8a80 cfde8a80 c07bd354 ced63f80 ced63f94 cfde8a80 00000008 c0c02d00
      [    1.649936] ff40: cfde8a98 cfde8a80 ffffe000 c0139a30 ffffe000 c0c6624a c07bd354 00000000
      [    1.658120] ff60: ffffe000 cee9e780 ceebfe00 00000000 ceeee000 ced63f80 c0139788 cf8cdea4
      [    1.666304] ff80: cee9e79c c013e598 00000001 ceebfe00 c013e44c 00000000 00000000 00000000
      [    1.674488] ffa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
      [    1.682671] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    1.690855] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
      [    1.699058] [<c0683fbc>] (klist_next) from [<c0455d90>] (device_for_each_child+0x40/0x94)
      [    1.707241] [<c0455d90>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
      [    1.716476] [<c0457d7c>] (device_reorder_to_tail) from [<c0455dac>] (device_for_each_child+0x5c/0x94)
      [    1.725692] [<c0455dac>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
      [    1.734927] [<c0457d7c>] (device_reorder_to_tail) from [<c0457df4>] (device_pm_move_to_tail+0x28/0x40)
      [    1.744235] [<c0457df4>] (device_pm_move_to_tail) from [<c045a27c>] (deferred_probe_work_func+0x58/0x8c)
      [    1.753746] [<c045a27c>] (deferred_probe_work_func) from [<c013893c>] (process_one_work+0x210/0x4fc)
      [    1.762888] [<c013893c>] (process_one_work) from [<c0139a30>] (worker_thread+0x2a8/0x5c0)
      [    1.771072] [<c0139a30>] (worker_thread) from [<c013e598>] (kthread+0x14c/0x154)
      [    1.778482] [<c013e598>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
      [    1.785689] Exception stack(0xceeeffb0 to 0xceeefff8)
      [    1.790739] ffa0:                                     00000000 00000000 00000000 00000000
      [    1.798923] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    1.807107] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
      [    1.813724] Code: e92d47f0 e1a05000 e8900048 e1a00003 (e5937010)
      [    1.819844] ---[ end trace 3c2c0c8b65399ec9 ]---
      
      The actual error that we had from devm_gpiod_get_optional() was
      -EPROBE_DEFER, due to the GPIO being provided by a driver that is
      probed later than the Ethernet controller driver.
      
      To fix this, we simply add the missing device_del() invocation in the
      error path.
      
      Fixes: 69226896 ("mdio_bus: Issue GPIO RESET to PHYs")
      Signed-off-by: default avatarThomas Petazzoni <thomas.petazzoni@bootlin.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bd069b5
    • Ross Lagerwall's avatar
      net: Fix usage of pskb_trim_rcsum · 66a011d1
      Ross Lagerwall authored
      [ Upstream commit 6c57f045 ]
      
      In certain cases, pskb_trim_rcsum() may change skb pointers.
      Reinitialize header pointers afterwards to avoid potential
      use-after-frees. Add a note in the documentation of
      pskb_trim_rcsum(). Found by KASAN.
      Signed-off-by: default avatarRoss Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      66a011d1
    • Yunjian Wang's avatar
      net: bridge: Fix ethernet header pointer before check skb forwardable · 1ae7a7cb
      Yunjian Wang authored
      [ Upstream commit 28c1382f ]
      
      The skb header should be set to ethernet header before using
      is_skb_forwardable. Because the ethernet header length has been
      considered in is_skb_forwardable(including dev->hard_header_len
      length).
      
      To reproduce the issue:
      1, add 2 ports on linux bridge br using following commands:
      $ brctl addbr br
      $ brctl addif br eth0
      $ brctl addif br eth1
      2, the MTU of eth0 and eth1 is 1500
      3, send a packet(Data 1480, UDP 8, IP 20, Ethernet 14, VLAN 4)
      from eth0 to eth1
      
      So the expect result is packet larger than 1500 cannot pass through
      eth0 and eth1. But currently, the packet passes through success, it
      means eth1's MTU limit doesn't take effect.
      
      Fixes: f6367b46 ("bridge: use is_skb_forwardable in forward path")
      Cc: bridge@lists.linux-foundation.org
      Cc: Nkolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: default avatarYunjian Wang <wangyunjian@huawei.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ae7a7cb
    • Lendacky, Thomas's avatar
      amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs · 99afa192
      Lendacky, Thomas authored
      [ Upstream commit 5ab3121b ]
      
      The XGBE hardware has support for performing MDIO operations using an
      MDIO command request. The driver mistakenly uses the mdio port address
      as the MDIO command request device address instead of the MDIO command
      request port address. Additionally, the driver does not properly check
      for and create a clause 45 MDIO command.
      
      Check the supplied MDIO register to determine if the request is a clause
      45 operation (MII_ADDR_C45). For a clause 45 operation, extract the device
      address and register number from the supplied MDIO register and use them
      to set the MDIO command request device address and register number fields.
      For a clause 22 operation, the MDIO request device address is set to zero
      and the MDIO command request register number is set to the supplied MDIO
      register. In either case, the supplied MDIO port address is used as the
      MDIO command request port address.
      
      Fixes: 732f2ab7 ("amd-xgbe: Add support for MDIO attached PHYs")
      Signed-off-by: default avatarTom Lendacky <thomas.lendacky@amd.com>
      Tested-by: default avatarShyam Sundar S K <Shyam-sundar.S-k@amd.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99afa192
  2. 26 Jan, 2019 22 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.14.96 · e6608e1f
      Greg Kroah-Hartman authored
      e6608e1f
    • Corey Minyard's avatar
      ipmi:ssif: Fix handling of multi-part return messages · 7c307d32
      Corey Minyard authored
      commit 7d6380cd upstream.
      
      The block number was not being compared right, it was off by one
      when checking the response.
      
      Some statistics wouldn't be incremented properly in some cases.
      
      Check to see if that middle-part messages always have 31 bytes of
      data.
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Cc: stable@vger.kernel.org # 4.4
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c307d32
    • Marc Zyngier's avatar
      PCI: dwc: Move interrupt acking into the proper callback · 413cb66b
      Marc Zyngier authored
      commit 3f7bb2ec upstream.
      
      The write to the status register is really an ACK for the HW,
      and should be treated as such by the driver. Let's move it to the
      irq_ack() callback, which will prevent people from moving it around
      in order to paper over other bugs.
      
      Fixes: 8c934095 ("PCI: dwc: Clear MSI interrupt status after it is handled,
      not before")
      Fixes: 7c5925af ("PCI: dwc: Move MSI IRQs allocation to IRQ domains
      hierarchical API")
      Link: https://lore.kernel.org/linux-pci/20181113225734.8026-1-marc.zyngier@arm.com/Reported-by: default avatarTrent Piepho <tpiepho@impinj.com>
      Tested-by: default avatarNiklas Cassel <niklas.cassel@linaro.org>
      Tested-by: default avatarGustavo Pimentel <gustavo.pimentel@synopsys.com>
      Tested-by: default avatarStanimir Varbanov <svarbanov@mm-sol.com>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      [lorenzo.pieralisi@arm.com: updated commit log]
      Signed-off-by: default avatarLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      413cb66b
    • Zhenyu Wang's avatar
      drm/i915/gvt: Fix mmap range check · e89ec9b9
      Zhenyu Wang authored
      commit 51b00d85 upstream.
      
      This is to fix missed mmap range check on vGPU bar2 region
      and only allow to map vGPU allocated GMADDR range, which means
      user space should support sparse mmap to get proper offset for
      mmap vGPU aperture. And this takes care of actual pgoff in mmap
      request as original code always does from beginning of vGPU
      aperture.
      
      Fixes: 659643f7 ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT")
      Cc: "Monroy, Rodrigo Axel" <rodrigo.axel.monroy@intel.com>
      Cc: "Orrala Contreras, Alfredo" <alfredo.orrala.contreras@intel.com>
      Cc: stable@vger.kernel.org # v4.10+
      Reviewed-by: default avatarHang Yuan <hang.yuan@intel.com>
      Signed-off-by: default avatarZhenyu Wang <zhenyuw@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e89ec9b9
    • Steve French's avatar
      cifs: allow disabling insecure dialects in the config · bfbd7e95
      Steve French authored
      commit 7420451f upstream.
      
      allow disabling cifs (SMB1 ie vers=1.0) and vers=2.0 in the
      config for the build of cifs.ko if want to always prevent mounting
      with these less secure dialects.
      Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
      Reviewed-by: default avatarAurelien Aptel <aaptel@suse.com>
      Reviewed-by: default avatarJeremy Allison <jra@samba.org>
      Cc: Alakesh Haloi <alakeshh@amazon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfbd7e95
    • Corey Minyard's avatar
      ipmi:pci: Blacklist a Realtek "IPMI" device · de616eb2
      Corey Minyard authored
      commit bc48fa1b upstream.
      
      Realtek has some sort of "Virtual" IPMI device on the PCI bus as a
      KCS controller, but whatever it is, it's not one.  Ignore it if seen.
      
      [ Commit 13d0b35c (ipmi_si: Move PCI setup to another file) from Linux
        4.15-rc1 has not been back ported, so the PCI code is still in
        `drivers/char/ipmi/ipmi_si_intf.c`, requiring to apply the commit
        manually.
      
        This fixes a 100 s boot delay on the HP EliteDesk 705 G4 MT with Linux
        4.14.94. ]
      Reported-by: default avatarChris Chiu <chiu@endlessm.com>
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Tested-by: default avatarDaniel Drake <drake@endlessm.com>
      Signed-off-by: default avatarPaul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de616eb2
    • Scott Mayhew's avatar
      nfs: fix a deadlock in nfs client initialization · 53818c76
      Scott Mayhew authored
      commit c156618e upstream.
      
      The following deadlock can occur between a process waiting for a client
      to initialize in while walking the client list during nfsv4 server trunking
      detection and another process waiting for the nfs_clid_init_mutex so it
      can initialize that client:
      
      Process 1                               Process 2
      ---------                               ---------
      spin_lock(&nn->nfs_client_lock);
      list_add_tail(&CLIENTA->cl_share_link,
              &nn->nfs_client_list);
      spin_unlock(&nn->nfs_client_lock);
                                              spin_lock(&nn->nfs_client_lock);
                                              list_add_tail(&CLIENTB->cl_share_link,
                                                      &nn->nfs_client_list);
                                              spin_unlock(&nn->nfs_client_lock);
                                              mutex_lock(&nfs_clid_init_mutex);
                                              nfs41_walk_client_list(clp, result, cred);
                                              nfs_wait_client_init_complete(CLIENTA);
      (waiting for nfs_clid_init_mutex)
      
      Make sure nfs_match_client() only evaluates clients that have completed
      initialization in order to prevent that deadlock.
      
      This patch also fixes v4.0 trunking behavior by not marking the client
      NFS_CS_READY until the clientid has been confirmed.
      Signed-off-by: default avatarScott Mayhew <smayhew@redhat.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: default avatarQian Lu <luqia@amazon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      53818c76
    • Michal Hocko's avatar
      mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps · 696ce77b
      Michal Hocko authored
      [ Upstream commit 7550c607 ]
      
      Patch series "THP eligibility reporting via proc".
      
      This series of three patches aims at making THP eligibility reporting much
      more robust and long term sustainable.  The trigger for the change is a
      regression report [2] and the long follow up discussion.  In short the
      specific application didn't have good API to query whether a particular
      mapping can be backed by THP so it has used VMA flags to workaround that.
      These flags represent a deep internal state of VMAs and as such they
      should be used by userspace with a great deal of caution.
      
      A similar has happened for [3] when users complained that VM_MIXEDMAP is
      no longer set on DAX mappings.  Again a lack of a proper API led to an
      abuse.
      
      The first patch in the series tries to emphasise that that the semantic of
      flags might change and any application consuming those should be really
      careful.
      
      The remaining two patches provide a more suitable interface to address [2]
      and provide a consistent API to query the THP status both for each VMA and
      process wide as well.  [1]
      
      http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2]
      http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
      [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
      
      This patch (of 3):
      
      Even though vma flags exported via /proc/<pid>/smaps are explicitly
      documented to be not guaranteed for future compatibility the warning
      doesn't go far enough because it doesn't mention semantic changes to those
      flags.  And they are important as well because these flags are a deep
      implementation internal to the MM code and the semantic might change at
      any time.
      
      Let's consider two recent examples:
      http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
      : commit e1fb4a08 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
      : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
      : mean time certain customer of ours started poking into /proc/<pid>/smaps
      : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA
      : flags, the application just fails to start complaining that DAX support is
      : missing in the kernel.
      
      http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
      : Commit 18600332 ("mm: make PR_SET_THP_DISABLE immediately active")
      : introduced a regression in that userspace cannot always determine the set
      : of vmas where thp is ineligible.
      : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
      : to determine if a vma is eligible to be backed by hugepages.
      : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
      : be disabled and emit "nh" as a flag for the corresponding vmas as part of
      : /proc/pid/smaps.  After the commit, thp is disabled by means of an mm
      : flag and "nh" is not emitted.
      : This causes smaps parsing libraries to assume a vma is eligible for thp
      : and ends up puzzling the user on why its memory is not backed by thp.
      
      In both cases userspace was relying on a semantic of a specific VMA flag.
      The primary reason why that happened is a lack of a proper interface.
      While this has been worked on and it will be fixed properly, it seems that
      our wording could see some refinement and be more vocal about semantic
      aspect of these flags as well.
      
      Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.orgSigned-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJan Kara <jack@suse.cz>
      Acked-by: default avatarDan Williams <dan.j.williams@intel.com>
      Acked-by: default avatarDavid Rientjes <rientjes@google.com>
      Acked-by: default avatarMike Rapoport <rppt@linux.ibm.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Oppenheimer <bepvte@gmail.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      696ce77b
    • Aaron Lu's avatar
      mm/swap: use nr_node_ids for avail_lists in swap_info_struct · 4fb12a08
      Aaron Lu authored
      [ Upstream commit 66f71da9 ]
      
      Since a2468cc9 ("swap: choose swap device according to numa node"),
      avail_lists field of swap_info_struct is changed to an array with
      MAX_NUMNODES elements.  This made swap_info_struct size increased to 40KiB
      and needs an order-4 page to hold it.
      
      This is not optimal in that:
      1 Most systems have way less than MAX_NUMNODES(1024) nodes so it
        is a waste of memory;
      2 It could cause swapon failure if the swap device is swapped on
        after system has been running for a while, due to no order-4
        page is available as pointed out by Vasily Averin.
      
      Solve the above two issues by using nr_node_ids(which is the actual
      possible node number the running system has) for avail_lists instead of
      MAX_NUMNODES.
      
      nr_node_ids is unknown at compile time so can't be directly used when
      declaring this array.  What I did here is to declare avail_lists as zero
      element array and allocate space for it when allocating space for
      swap_info_struct.  The reason why keep using array but not pointer is
      plist_for_each_entry needs the field to be part of the struct, so pointer
      will not work.
      
      This patch is on top of Vasily Averin's fix commit.  I think the use of
      kvzalloc for swap_info_struct is still needed in case nr_node_ids is
      really big on some systems.
      
      Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.comSigned-off-by: default avatarAaron Lu <aaron.lu@intel.com>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      4fb12a08
    • Brian Foster's avatar
      mm/page-writeback.c: don't break integrity writeback on ->writepage() error · 694c20fe
      Brian Foster authored
      [ Upstream commit 3fa750dc ]
      
      write_cache_pages() is used in both background and integrity writeback
      scenarios by various filesystems.  Background writeback is mostly
      concerned with cleaning a certain number of dirty pages based on various
      mm heuristics.  It may not write the full set of dirty pages or wait for
      I/O to complete.  Integrity writeback is responsible for persisting a set
      of dirty pages before the writeback job completes.  For example, an
      fsync() call must perform integrity writeback to ensure data is on disk
      before the call returns.
      
      write_cache_pages() unconditionally breaks out of its processing loop in
      the event of a ->writepage() error.  This is fine for background
      writeback, which had no strict requirements and will eventually come
      around again.  This can cause problems for integrity writeback on
      filesystems that might need to clean up state associated with failed page
      writeouts.  For example, XFS performs internal delayed allocation
      accounting before returning a ->writepage() error, where applicable.  If
      the current writeback happens to be associated with an unmount and
      write_cache_pages() completes the writeback prematurely due to error, the
      filesystem is unmounted in an inconsistent state if dirty+delalloc pages
      still exist.
      
      To handle this problem, update write_cache_pages() to always process the
      full set of pages for integrity writeback regardless of ->writepage()
      errors.  Save the first encountered error and return it to the caller once
      complete.  This facilitates XFS (or any other fs that expects integrity
      writeback to process the entire set of dirty pages) to clean up its
      internal state completely in the event of persistent mapping errors.
      Background writeback continues to exit on the first error encountered.
      
      [akpm@linux-foundation.org: fix typo in comment]
      Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.comSigned-off-by: default avatarBrian Foster <bfoster@redhat.com>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      694c20fe
    • Junxiao Bi's avatar
      ocfs2: fix panic due to unrecovered local alloc · f976e59e
      Junxiao Bi authored
      [ Upstream commit 532e1e54 ]
      
      mount.ocfs2 ignore the inconsistent error that journal is clean but
      local alloc is unrecovered.  After mount, local alloc not empty, then
      reserver cluster didn't alloc a new local alloc window, reserveration
      map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the
      following panic.
      
      This issue was reported at
      
        https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html
      
      and was advised to fixed during mount.  But this is a very unusual
      inconsistent state, usually journal dirty flag should be cleared at the
      last stage of umount until every other things go right.  We may need do
      further debug to check that.  Any way to avoid possible futher
      corruption, mount should be abort and fsck should be run.
      
        (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered!
        found = 6518, set = 6518, taken = 8192, off = 15912372
        ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode.
        o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes
        ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode.
        o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device
        o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777
        o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
        o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
        ------------[ cut here ]------------
        kernel BUG at fs/ocfs2/reservations.c:507!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod
        CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2
        Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018
        task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000
        RIP: 0010:[<ffffffffa05e96a8>]  [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
        Call Trace:
          ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2]
          ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2]
          __ocfs2_claim_clusters+0x178/0x360 [ocfs2]
          ocfs2_claim_clusters+0x1f/0x30 [ocfs2]
          ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2]
          ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2]
          ocfs2_write_begin+0x13e/0x230 [ocfs2]
          generic_perform_write+0xbf/0x1c0
          __generic_file_write_iter+0x19c/0x1d0
          ocfs2_file_write_iter+0x589/0x1360 [ocfs2]
          __vfs_write+0xb8/0x110
          vfs_write+0xa9/0x1b0
          SyS_write+0x46/0xb0
          system_call_fastpath+0x18/0xd7
        Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85
        RIP   __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
         RSP <ffff8800ea4db668>
        ---[ end trace 566f07529f2edf3c ]---
        Kernel panic - not syncing: Fatal exception
        Kernel Offset: disabled
      
      Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.comSigned-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Acked-by: default avatarJoseph Qi <jiangqi903@gmail.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      f976e59e
    • Qian Cai's avatar
      scsi: megaraid: fix out-of-bound array accesses · c010d494
      Qian Cai authored
      [ Upstream commit c7a082e4 ]
      
      UBSAN reported those with MegaRAID SAS-3 3108,
      
      [   77.467308] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
      [   77.475402] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
      [   77.481677] CPU: 16 PID: 333 Comm: kworker/16:1 Not tainted 4.20.0-rc5+ #1
      [   77.488556] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
      [   77.495791] Workqueue: events work_for_cpu_fn
      [   77.500154] Call trace:
      [   77.502610]  dump_backtrace+0x0/0x2c8
      [   77.506279]  show_stack+0x24/0x30
      [   77.509604]  dump_stack+0x118/0x19c
      [   77.513098]  ubsan_epilogue+0x14/0x60
      [   77.516765]  __ubsan_handle_out_of_bounds+0xfc/0x13c
      [   77.521767]  mr_update_load_balance_params+0x150/0x158 [megaraid_sas]
      [   77.528230]  MR_ValidateMapInfo+0x2cc/0x10d0 [megaraid_sas]
      [   77.533825]  megasas_get_map_info+0x244/0x2f0 [megaraid_sas]
      [   77.539505]  megasas_init_adapter_fusion+0x9b0/0xf48 [megaraid_sas]
      [   77.545794]  megasas_init_fw+0x1ab4/0x3518 [megaraid_sas]
      [   77.551212]  megasas_probe_one+0x2c4/0xbe0 [megaraid_sas]
      [   77.556614]  local_pci_probe+0x7c/0xf0
      [   77.560365]  work_for_cpu_fn+0x34/0x50
      [   77.564118]  process_one_work+0x61c/0xf08
      [   77.568129]  worker_thread+0x534/0xa70
      [   77.571882]  kthread+0x1c8/0x1d0
      [   77.575114]  ret_from_fork+0x10/0x1c
      
      [   89.240332] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
      [   89.248426] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
      [   89.254700] CPU: 16 PID: 95 Comm: kworker/u130:0 Not tainted 4.20.0-rc5+ #1
      [   89.261665] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
      [   89.268903] Workqueue: events_unbound async_run_entry_fn
      [   89.274222] Call trace:
      [   89.276680]  dump_backtrace+0x0/0x2c8
      [   89.280348]  show_stack+0x24/0x30
      [   89.283671]  dump_stack+0x118/0x19c
      [   89.287167]  ubsan_epilogue+0x14/0x60
      [   89.290835]  __ubsan_handle_out_of_bounds+0xfc/0x13c
      [   89.295828]  MR_LdRaidGet+0x50/0x58 [megaraid_sas]
      [   89.300638]  megasas_build_io_fusion+0xbb8/0xd90 [megaraid_sas]
      [   89.306576]  megasas_build_and_issue_cmd_fusion+0x138/0x460 [megaraid_sas]
      [   89.313468]  megasas_queue_command+0x398/0x3d0 [megaraid_sas]
      [   89.319222]  scsi_dispatch_cmd+0x1dc/0x8a8
      [   89.323321]  scsi_request_fn+0x8e8/0xdd0
      [   89.327249]  __blk_run_queue+0xc4/0x158
      [   89.331090]  blk_execute_rq_nowait+0xf4/0x158
      [   89.335449]  blk_execute_rq+0xdc/0x158
      [   89.339202]  __scsi_execute+0x130/0x258
      [   89.343041]  scsi_probe_and_add_lun+0x2fc/0x1488
      [   89.347661]  __scsi_scan_target+0x1cc/0x8c8
      [   89.351848]  scsi_scan_channel.part.3+0x8c/0xc0
      [   89.356382]  scsi_scan_host_selected+0x130/0x1f0
      [   89.361002]  do_scsi_scan_host+0xd8/0xf0
      [   89.364927]  do_scan_async+0x9c/0x320
      [   89.368594]  async_run_entry_fn+0x138/0x420
      [   89.372780]  process_one_work+0x61c/0xf08
      [   89.376793]  worker_thread+0x13c/0xa70
      [   89.380546]  kthread+0x1c8/0x1d0
      [   89.383778]  ret_from_fork+0x10/0x1c
      
      This is because when populating Driver Map using firmware raid map, all
      non-existing VDs set their ldTgtIdToLd to 0xff, so it can be skipped later.
      
      From drivers/scsi/megaraid/megaraid_sas_base.c ,
      memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS);
      
      From drivers/scsi/megaraid/megaraid_sas_fp.c ,
      /* For non existing VDs, iterate to next VD*/
      if (ld >= (MAX_LOGICAL_DRIVES_EXT - 1))
      	continue;
      
      However, there are a few places that failed to skip those non-existing VDs
      due to off-by-one errors. Then, those 0xff leaked into MR_LdRaidGet(0xff,
      map) and triggered the out-of-bound accesses.
      
      Fixes: 51087a86 ("megaraid_sas : Extended VD support")
      Signed-off-by: default avatarQian Cai <cai@lca.pw>
      Acked-by: default avatarSumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      c010d494
    • Yanjiang Jin's avatar
      scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown() · 008b4a70
      Yanjiang Jin authored
      [ Upstream commit e57b2945 ]
      
      We must free all irqs during shutdown, else kexec's 2nd kernel would hang
      in pqi_wait_for_completion_io() as below:
      
      Call trace:
      
       pqi_wait_for_completion_io
       pqi_submit_raid_request_synchronous.constprop.78+0x23c/0x310 [smartpqi]
       pqi_configure_events+0xec/0x1f8 [smartpqi]
       pqi_ctrl_init+0x814/0xca0 [smartpqi]
       pqi_pci_probe+0x400/0x46c [smartpqi]
       local_pci_probe+0x48/0xb0
       pci_device_probe+0x14c/0x1b0
       really_probe+0x218/0x3fc
       driver_probe_device+0x70/0x140
       __driver_attach+0x11c/0x134
       bus_for_each_dev+0x70/0xc8
       driver_attach+0x30/0x38
       bus_add_driver+0x1f0/0x294
       driver_register+0x74/0x12c
       __pci_register_driver+0x64/0x70
       pqi_init+0xd0/0x10000 [smartpqi]
       do_one_initcall+0x60/0x1d8
       do_init_module+0x64/0x1f8
       load_module+0x10ec/0x1350
       __se_sys_finit_module+0xd4/0x100
       __arm64_sys_finit_module+0x28/0x34
       el0_svc_handler+0x104/0x160
       el0_svc+0x8/0xc
      
      This happens only in the following combinations:
      
      1. smartpqi is built as module, not built-in;
      2. We have a disk connected to smartpqi card;
      3. Both kexec's 1st and 2nd kernels use this disk as Rootfs' mount point.
      Signed-off-by: default avatarYanjiang Jin <yanjiang.jin@hxt-semitech.com>
      Acked-by: default avatarDon Brace <don.brace@microsemi.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      008b4a70
    • Kevin Barnett's avatar
      scsi: smartpqi: correct lun reset issues · ceecd0c6
      Kevin Barnett authored
      [ Upstream commit 2ba55c98 ]
      
      Problem:
      The Linux kernel takes a logical volume offline after a LUN reset.  This is
      generally accompanied by this message in the dmesg output:
      
      Device offlined - not ready after error recovery
      
      Root Cause:
      The root cause is a "quirk" in the timeout handling in the Linux SCSI
      layer. The Linux kernel places a 30-second timeout on most media access
      commands (reads and writes) that it send to device drivers.  When a media
      access command times out, the Linux kernel goes into error recovery mode
      for the LUN that was the target of the command that timed out. Every
      command that timed out is kept on a list inside of the Linux kernel to be
      retried later. The kernel attempts to recover the command(s) that timed out
      by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and
      TEST UNIT READY commands are successful, the kernel retries the command(s)
      that timed out.
      
      Each SCSI command issued by the kernel has a result field associated with
      it. This field indicates the final result of the command (success or
      error). When a command times out, the kernel places a value in this result
      field indicating that the command timed out.
      
      The "quirk" is that after the LUN reset and TEST UNIT READY commands are
      completed, the kernel checks each command on the timed-out command list
      before retrying it. If the result field is still "timed out", the kernel
      treats that command as not having been successfully recovered for a
      retry. If the number of commands that are in this state are greater than
      two, the kernel takes the LUN offline.
      
      Fix:
      When our RAIDStack receives a LUN reset, it simply waits until all
      outstanding commands complete. Generally, all of these outstanding commands
      complete successfully. Therefore, the fix in the smartpqi driver is to
      always set the command result field to indicate success when a request
      completes successfully. This normally isn’t necessary because the result
      field is always initialized to success when the command is submitted to the
      driver. So when the command completes successfully, the result field is
      left untouched. But in this case, the kernel changes the result field
      behind the driver’s back and then expects the field to be changed by the
      driver as the commands that timed-out complete.
      Reviewed-by: default avatarDave Carroll <david.carroll@microsemi.com>
      Reviewed-by: default avatarScott Teel <scott.teel@microsemi.com>
      Signed-off-by: default avatarKevin Barnett <kevin.barnett@microsemi.com>
      Signed-off-by: default avatarDon Brace <don.brace@microsemi.com>
      Signed-off-by: default avatarMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      ceecd0c6
    • Parvi Kaustubhi's avatar
      IB/usnic: Fix potential deadlock · 9d3e1c09
      Parvi Kaustubhi authored
      [ Upstream commit 8036e90f ]
      
      Acquiring the rtnl lock while holding usdev_lock could result in a
      deadlock.
      
      For example:
      
      usnic_ib_query_port()
      | mutex_lock(&us_ibdev->usdev_lock)
       | ib_get_eth_speed()
        | rtnl_lock()
      
      rtnl_lock()
      | usnic_ib_netdevice_event()
       | mutex_lock(&us_ibdev->usdev_lock)
      
      This commit moves the usdev_lock acquisition after the rtnl lock has been
      released.
      
      This is safe to do because usdev_lock is not protecting anything being
      accessed in ib_get_eth_speed(). Hence, the correct order of holding locks
      (rtnl -> usdev_lock) is not violated.
      Signed-off-by: default avatarParvi Kaustubhi <pkaustub@cisco.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      9d3e1c09
    • Daniel Vetter's avatar
      sysfs: Disable lockdep for driver bind/unbind files · d9c25a6f
      Daniel Vetter authored
      [ Upstream commit 4f4b3743 ]
      
      This is the much more correct fix for my earlier attempt at:
      
      https://lkml.org/lkml/2018/12/10/118
      
      Short recap:
      
      - There's not actually a locking issue, it's just lockdep being a bit
        too eager to complain about a possible deadlock.
      
      - Contrary to what I claimed the real problem is recursion on
        kn->count. Greg pointed me at sysfs_break_active_protection(), used
        by the scsi subsystem to allow a sysfs file to unbind itself. That
        would be a real deadlock, which isn't what's happening here. Also,
        breaking the active protection means we'd need to manually handle
        all the lifetime fun.
      
      - With Rafael we discussed the task_work approach, which kinda works,
        but has two downsides: It's a functional change for a lockdep
        annotation issue, and it won't work for the bind file (which needs
        to get the errno from the driver load function back to userspace).
      
      - Greg also asked why this never showed up: To hit this you need to
        unregister a 2nd driver from the unload code of your first driver. I
        guess only gpus do that. The bug has always been there, but only
        with a recent patch series did we add more locks so that lockdep
        built a chain from unbinding the snd-hda driver to the
        acpi_video_unregister call.
      
      Full lockdep splat:
      
      [12301.898799] ============================================
      [12301.898805] WARNING: possible recursive locking detected
      [12301.898811] 4.20.0-rc7+ #84 Not tainted
      [12301.898815] --------------------------------------------
      [12301.898821] bash/5297 is trying to acquire lock:
      [12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80
      [12301.898841] but task is already holding lock:
      [12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
      [12301.898856] other info that might help us debug this:
      [12301.898862]  Possible unsafe locking scenario:
      [12301.898867]        CPU0
      [12301.898870]        ----
      [12301.898874]   lock(kn->count#39);
      [12301.898879]   lock(kn->count#39);
      [12301.898883] *** DEADLOCK ***
      [12301.898891]  May be due to missing lock nesting notation
      [12301.898899] 5 locks held by bash/5297:
      [12301.898903]  #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0
      [12301.898915]  #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190
      [12301.898925]  #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190
      [12301.898936]  #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240
      [12301.898950]  #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40
      [12301.898960] stack backtrace:
      [12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84
      [12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011
      [12301.898982] Call Trace:
      [12301.898989]  dump_stack+0x67/0x9b
      [12301.898997]  __lock_acquire+0x6ad/0x1410
      [12301.899003]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899010]  ? find_held_lock+0x2d/0x90
      [12301.899017]  ? mutex_spin_on_owner+0xe4/0x150
      [12301.899023]  ? find_held_lock+0x2d/0x90
      [12301.899030]  ? lock_acquire+0x90/0x180
      [12301.899036]  lock_acquire+0x90/0x180
      [12301.899042]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899049]  __kernfs_remove+0x296/0x310
      [12301.899055]  ? kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899060]  ? kernfs_name_hash+0xd/0x80
      [12301.899066]  ? kernfs_find_ns+0x6c/0x100
      [12301.899073]  kernfs_remove_by_name_ns+0x3b/0x80
      [12301.899080]  bus_remove_driver+0x92/0xa0
      [12301.899085]  acpi_video_unregister+0x24/0x40
      [12301.899127]  i915_driver_unload+0x42/0x130 [i915]
      [12301.899160]  i915_pci_remove+0x19/0x30 [i915]
      [12301.899169]  pci_device_remove+0x36/0xb0
      [12301.899176]  device_release_driver_internal+0x185/0x240
      [12301.899183]  unbind_store+0xaf/0x180
      [12301.899189]  kernfs_fop_write+0x104/0x190
      [12301.899195]  __vfs_write+0x31/0x180
      [12301.899203]  ? rcu_read_lock_sched_held+0x6f/0x80
      [12301.899209]  ? rcu_sync_lockdep_assert+0x29/0x50
      [12301.899216]  ? __sb_start_write+0x13c/0x1a0
      [12301.899221]  ? vfs_write+0x17f/0x1b0
      [12301.899227]  vfs_write+0xb9/0x1b0
      [12301.899233]  ksys_write+0x50/0xc0
      [12301.899239]  do_syscall_64+0x4b/0x180
      [12301.899247]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [12301.899253] RIP: 0033:0x7f452ac7f7a4
      [12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83
      [12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4
      [12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001
      [12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730
      [12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d
      [12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d
      
      Looking around I've noticed that usb and i2c already handle similar
      recursion problems, where a sysfs file can unbind the same type of
      sysfs somewhere else in the hierarchy. Relevant commits are:
      
      commit 356c05d5
      Author: Alan Stern <stern@rowland.harvard.edu>
      Date:   Mon May 14 13:30:03 2012 -0400
      
          sysfs: get rid of some lockdep false positives
      
      commit e9b526fe
      Author: Alexander Sverdlin <alexander.sverdlin@nsn.com>
      Date:   Fri May 17 14:56:35 2013 +0200
      
          i2c: suppress lockdep warning on delete_device
      
      Implement the same trick for driver bind/unbind.
      
      v2: Put the macro into bus.c (Greg).
      Reviewed-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Ramalingam C <ramalingam.c@intel.com>
      Cc: Arend van Spriel <aspriel@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Bartosz Golaszewski <brgl@bgdev.pl>
      Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com>
      Cc: Vivek Gautam <vivek.gautam@codeaurora.org>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      d9c25a6f
    • Takashi Sakamoto's avatar
      ALSA: bebob: fix model-id of unit for Apogee Ensemble · 7afd8e3b
      Takashi Sakamoto authored
      [ Upstream commit 644b2e97 ]
      
      This commit fixes hard-coded model-id for an unit of Apogee Ensemble with
      a correct value. This unit uses DM1500 ASIC produced ArchWave AG (formerly
      known as BridgeCo AG).
      
      I note that this model supports three modes in the number of data channels
      in tx/rx streams; 8 ch pairs, 10 ch pairs, 18 ch pairs. The mode is
      switched by Vendor-dependent AV/C command, like:
      
      $ cd linux-firewire-utils
      $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0600000000 (8ch pairs)
      $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0601000000 (10ch pairs)
      $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0602000000 (18ch pairs)
      
      When switching between different mode, the unit disappears from IEEE 1394
      bus, then appears on the bus with different combination of stream formats.
      In a mode of 18 ch pairs, available sampling rate is up to 96.0 kHz, else
      up to 192.0 kHz.
      
      $ ./hinawa-config-rom-printer /dev/fw1
      { 'bus-info': { 'adj': False,
                      'bmc': True,
                      'chip_ID': 21474898341,
                      'cmc': True,
                      'cyc_clk_acc': 100,
                      'generation': 2,
                      'imc': True,
                      'isc': True,
                      'link_spd': 2,
                      'max_ROM': 1,
                      'max_rec': 512,
                      'name': '1394',
                      'node_vendor_ID': 987,
                      'pmc': False},
        'root-directory': [ ['HARDWARE_VERSION', 19],
                            [ 'NODE_CAPABILITIES',
                              { 'addressing': {'64': True, 'fix': True, 'prv': False},
                                'misc': {'int': False, 'ms': False, 'spt': True},
                                'state': { 'atn': False,
                                           'ded': False,
                                           'drq': True,
                                           'elo': False,
                                           'init': False,
                                           'lst': True,
                                           'off': False},
                                'testing': {'bas': False, 'ext': False}}],
                            ['VENDOR', 987],
                            ['DESCRIPTOR', 'Apogee Electronics'],
                            ['MODEL', 126702],
                            ['DESCRIPTOR', 'Ensemble'],
                            ['VERSION', 5297],
                            [ 'UNIT',
                              [ ['SPECIFIER_ID', 41005],
                                ['VERSION', 65537],
                                ['MODEL', 126702],
                                ['DESCRIPTOR', 'Ensemble']]],
                            [ 'DEPENDENT_INFO',
                              [ ['SPECIFIER_ID', 2037],
                                ['VERSION', 1],
                                [(58, 'IMMEDIATE'), 16777159],
                                [(59, 'IMMEDIATE'), 1048576],
                                [(60, 'IMMEDIATE'), 16777159],
                                [(61, 'IMMEDIATE'), 6291456]]]]}
      Signed-off-by: default avatarTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      7afd8e3b
    • Yangtao Li's avatar
      clocksource/drivers/integrator-ap: Add missing of_node_put() · 16df3ffc
      Yangtao Li authored
      [ Upstream commit 5eb73c83 ]
      
      The function of_find_node_by_path() acquires a reference to the node
      returned by it and that reference needs to be dropped by its caller.
      
      integrator_ap_timer_init_of() doesn't do that.  The pri_node and the
      sec_node are used as an identifier to compare against the current
      node, so we can directly drop the refcount after getting the node from
      the path as it is not used as pointer.
      
      By dropping the refcount right after getting it, a single variable is
      needed instead of two.
      
      Fix this by use a single variable and drop the refcount right after
      of_find_node_by_path().
      Signed-off-by: default avatarYangtao Li <tiny.windzz@gmail.com>
      Signed-off-by: default avatarDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      16df3ffc
    • Javier Barrio's avatar
      quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls. · 8c3e1bcc
      Javier Barrio authored
      [ Upstream commit 41c4f85c ]
      
      Commit 1fa5efe3 (ext4: Use generic helpers for quotaon
      and quotaoff) made possible to call quotactl(Q_XQUOTAON/OFF) on ext4 filesystems
      with sysfile quota support. This leads to calling dquot_enable/disable without s_umount
      held in excl. mode, because quotactl_cmd_onoff checks only for Q_QUOTAON/OFF.
      
      The following WARN_ON_ONCE triggers (in this case for dquot_enable, ext4, latest Linus' tree):
      
      [  117.807056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: quota,prjquota
      
      [...]
      
      [  155.036847] WARNING: CPU: 0 PID: 2343 at fs/quota/dquot.c:2469 dquot_enable+0x34/0xb9
      [  155.036851] Modules linked in: quota_v2 quota_tree ipv6 af_packet joydev mousedev psmouse serio_raw pcspkr i2c_piix4 intel_agp intel_gtt e1000 ttm drm_kms_helper drm agpgart fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core input_leds kvm_intel kvm irqbypass qemu_fw_cfg floppy evdev parport_pc parport button crc32c_generic dm_mod ata_generic pata_acpi ata_piix libata loop ext4 crc16 mbcache jbd2 usb_storage usbcore sd_mod scsi_mod
      [  155.036901] CPU: 0 PID: 2343 Comm: qctl Not tainted 4.20.0-rc6-00025-gf5d58277 #9
      [  155.036903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [  155.036911] RIP: 0010:dquot_enable+0x34/0xb9
      [  155.036915] Code: 41 56 41 55 41 54 55 53 4c 8b 6f 28 74 02 0f 0b 4d 8d 7d 70 49 89 fc 89 cb 41 89 d6 89 f5 4c 89 ff e8 23 09 ea ff 85 c0 74 0a <0f> 0b 4c 89 ff e8 8b 09 ea ff 85 db 74 6a 41 8b b5 f8 00 00 00 0f
      [  155.036918] RSP: 0018:ffffb09b00493e08 EFLAGS: 00010202
      [  155.036922] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 0000000000000008
      [  155.036924] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9781b67cd870
      [  155.036926] RBP: 0000000000000002 R08: 0000000000000000 R09: 61c8864680b583eb
      [  155.036929] R10: ffffb09b00493e48 R11: ffffffffff7ce7d4 R12: ffff9781b7ee8d78
      [  155.036932] R13: ffff9781b67cd800 R14: 0000000000000004 R15: ffff9781b67cd870
      [  155.036936] FS:  00007fd813250b88(0000) GS:ffff9781ba000000(0000) knlGS:0000000000000000
      [  155.036939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  155.036942] CR2: 00007fd812ff61d6 CR3: 000000007c882000 CR4: 00000000000006b0
      [  155.036951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  155.036953] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  155.036955] Call Trace:
      [  155.037004]  dquot_quota_enable+0x8b/0xd0
      [  155.037011]  kernel_quotactl+0x628/0x74e
      [  155.037027]  ? do_mprotect_pkey+0x2a6/0x2cd
      [  155.037034]  __x64_sys_quotactl+0x1a/0x1d
      [  155.037041]  do_syscall_64+0x55/0xe4
      [  155.037078]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  155.037105] RIP: 0033:0x7fd812fe1198
      [  155.037109] Code: 02 77 0d 48 89 c1 48 c1 e9 3f 75 04 48 8b 04 24 48 83 c4 50 5b c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 ff b8 b3 00 00 00 0f 05 <48> 89 c7 e8 c1 eb ff ff 5a c3 48 63 ff b8 bb 00 00 00 0f 05 48 89
      [  155.037112] RSP: 002b:00007ffe8cd7b050 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3
      [  155.037116] RAX: ffffffffffffffda RBX: 00007ffe8cd7b148 RCX: 00007fd812fe1198
      [  155.037119] RDX: 0000000000000000 RSI: 00007ffe8cd7cea9 RDI: 0000000000580102
      [  155.037121] RBP: 00007ffe8cd7b0f0 R08: 000055fc8eba8a9d R09: 0000000000000000
      [  155.037124] R10: 00007ffe8cd7b074 R11: 0000000000000206 R12: 00007ffe8cd7b168
      [  155.037126] R13: 000055fc8eba8897 R14: 0000000000000000 R15: 0000000000000000
      [  155.037131] ---[ end trace 210f864257175c51 ]---
      
      and then the syscall proceeds without s_umount locking.
      
      This patch locks the superblock ->s_umount sem. in exclusive mode for all Q_XQUOTAON/OFF
      quotactls too in addition to Q_QUOTAON/OFF.
      
      AFAICT, other than ext4, only xfs and ocfs2 are affected by this change.
      The VFS will now call in xfs_quota_* functions with s_umount held, which wasn't the case
      before. This looks good to me but I can not say for sure. Ext4 and ocfs2 where already
      beeing called with s_umount exclusive via quota_quotaon/off which is basically the same.
      Signed-off-by: default avatarJavier Barrio <javier.barrio.mart@gmail.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8c3e1bcc
    • Nikos Tsironis's avatar
      dm snapshot: Fix excessive memory usage and workqueue stalls · 8189ee47
      Nikos Tsironis authored
      [ Upstream commit 721b1d98 ]
      
      kcopyd has no upper limit to the number of jobs one can allocate and
      issue. Under certain workloads this can lead to excessive memory usage
      and workqueue stalls. For example, when creating multiple dm-snapshot
      targets with a 4K chunk size and then writing to the origin through the
      page cache. Syncing the page cache causes a large number of BIOs to be
      issued to the dm-snapshot origin target, which itself issues an even
      larger (because of the BIO splitting taking place) number of kcopyd
      jobs.
      
      Running the following test, from the device mapper test suite [1],
      
        dmtest run --suite snapshot -n many_snapshots_of_same_volume_N
      
      , with 8 active snapshots, results in the kcopyd job slab cache growing
      to 10G. Depending on the available system RAM this can lead to the OOM
      killer killing user processes:
      
      [463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP),
                    nodemask=(null), order=1, oom_score_adj=0
      [463.492894] kthreadd cpuset=/ mems_allowed=0
      [463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3
      [463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [463.492952] Call Trace:
      [463.492964]  dump_stack+0x7d/0xbb
      [463.492973]  dump_header+0x6b/0x2fc
      [463.492987]  ? lockdep_hardirqs_on+0xee/0x190
      [463.493012]  oom_kill_process+0x302/0x370
      [463.493021]  out_of_memory+0x113/0x560
      [463.493030]  __alloc_pages_slowpath+0xf40/0x1020
      [463.493055]  __alloc_pages_nodemask+0x348/0x3c0
      [463.493067]  cache_grow_begin+0x81/0x8b0
      [463.493072]  ? cache_grow_begin+0x874/0x8b0
      [463.493078]  fallback_alloc+0x1e4/0x280
      [463.493092]  kmem_cache_alloc_node+0xd6/0x370
      [463.493098]  ? copy_process.part.31+0x1c5/0x20d0
      [463.493105]  copy_process.part.31+0x1c5/0x20d0
      [463.493115]  ? __lock_acquire+0x3cc/0x1550
      [463.493121]  ? __switch_to_asm+0x34/0x70
      [463.493129]  ? kthread_create_worker_on_cpu+0x70/0x70
      [463.493135]  ? finish_task_switch+0x90/0x280
      [463.493165]  _do_fork+0xe0/0x6d0
      [463.493191]  ? kthreadd+0x19f/0x220
      [463.493233]  kernel_thread+0x25/0x30
      [463.493235]  kthreadd+0x1bf/0x220
      [463.493242]  ? kthread_create_on_cpu+0x90/0x90
      [463.493248]  ret_from_fork+0x3a/0x50
      [463.493279] Mem-Info:
      [463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0
      [463.493285]  active_file:80216 inactive_file:80107 isolated_file:435
      [463.493285]  unevictable:0 dirty:51266 writeback:109372 unstable:0
      [463.493285]  slab_reclaimable:31191 slab_unreclaimable:3483521
      [463.493285]  mapped:526 shmem:4903 pagetables:1759 bounce:0
      [463.493285]  free:33623 free_pcp:2392 free_cma:0
      ...
      [463.493489] Unreclaimable slab info:
      [463.493513] Name                      Used          Total
      [463.493522] bio-6                   1028KB       1028KB
      [463.493525] bio-5                   1028KB       1028KB
      [463.493528] dm_snap_pending_exception     236783KB     243789KB
      [463.493531] dm_exception              41KB         42KB
      [463.493534] bio-4                   1216KB       1216KB
      [463.493537] bio-3                 439396KB     439396KB
      [463.493539] kcopyd_job           6973427KB    6973427KB
      ...
      [463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child
      [463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB
      [463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
      
      Moreover, issuing a large number of kcopyd jobs results in kcopyd
      hogging the CPU, while processing them. As a result, processing of work
      items, queued for execution on the same CPU as the currently running
      kcopyd thread, is stalled for long periods of time, hurting performance.
      Running the aforementioned test we get, in dmesg, messages like the
      following:
      
      [67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s!
      [67501.195586] Showing busy workqueues and worker pools:
      [67501.195591] workqueue events: flags=0x0
      [67501.195597]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195611]     pending: cache_reap
      [67501.195641] workqueue mm_percpu_wq: flags=0x8
      [67501.195645]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195656]     pending: vmstat_update
      [67501.195682] workqueue kblockd: flags=0x18
      [67501.195687]   pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256
      [67501.195698]     pending: blk_timeout_work
      [67501.195753] workqueue kcopyd: flags=0x8
      [67501.195757]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195768]     pending: do_work [dm_mod]
      [67501.195802] workqueue kcopyd: flags=0x8
      [67501.195806]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195817]     pending: do_work [dm_mod]
      [67501.195834] workqueue kcopyd: flags=0x8
      [67501.195838]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195848]     pending: do_work [dm_mod]
      [67501.195881] workqueue kcopyd: flags=0x8
      [67501.195885]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
      [67501.195896]     pending: do_work [dm_mod]
      [67501.195920] workqueue kcopyd: flags=0x8
      [67501.195924]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256
      [67501.195935]     in-flight: 67:do_work [dm_mod]
      [67501.195945]     pending: do_work [dm_mod]
      [67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765
      
      The root cause for these issues is the way dm-snapshot uses kcopyd. In
      particular, the lack of an explicit or implicit limit to the maximum
      number of in-flight COW jobs. The merging path is not affected because
      it implicitly limits the in-flight kcopyd jobs to one.
      
      Fix these issues by using a semaphore to limit the maximum number of
      in-flight kcopyd jobs. We grab the semaphore before allocating a new
      kcopyd job in start_copy() and start_full_bio() and release it after the
      job finishes in copy_callback().
      
      The initial semaphore value is configurable through a module parameter,
      to allow fine tuning the maximum number of in-flight COW jobs. Setting
      this parameter to zero initializes the semaphore to INT_MAX.
      
      A default value of 2048 maximum in-flight kcopyd jobs was chosen. This
      value was decided experimentally as a trade-off between memory
      consumption, stalling the kernel's workqueues and maintaining a high
      enough throughput.
      
      Re-running the aforementioned test:
      
        * Workqueue stalls are eliminated
        * kcopyd's job slab cache uses a maximum of 130MB
        * The time taken by the test to write to the snapshot-origin target is
          reduced from 05m20.48s to 03m26.38s
      
      [1] https://github.com/jthornber/device-mapper-test-suiteSigned-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: default avatarIlias Tsitsimpis <iliastsi@arrikto.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8189ee47
    • Arnaldo Carvalho de Melo's avatar
      tools lib subcmd: Don't add the kernel sources to the include path · 55f603dc
      Arnaldo Carvalho de Melo authored
      [ Upstream commit ece98049 ]
      
      At some point we decided not to directly include kernel sources files
      when building tools/perf/, but when tools/lib/subcmd/ was forked from
      tools/perf it somehow ended up adding it via these two lines in its
      Makefile:
      
        CFLAGS += -I$(srctree)/include/uapi
        CFLAGS += -I$(srctree)/include
      
      As $(srctree) points to the kernel sources.
      
      Removing those lines and keeping just:
      
        CFLAGS += -I$(srctree)/tools/include/
      
      Is enough to build tools/perf and tools/objtool.
      
      This fixes the build when building from the sources in environments such
      as the Android NDK crossbuilding from a fedora:26 system:
      
        subcmd-util.h:11:15: error: expected ',' or ';' before 'void'
         static inline void report(const char *prefix, const char *err, va_list params)
                       ^
        In file included from /git/perf/include/uapi/linux/stddef.h:2:0,
                         from /git/perf/include/uapi/linux/posix_types.h:5,
                         from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h:36,
                         from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/unistd.h:33,
                         from run-command.c:2:
        subcmd-util.h:18:17: error: '__no_instrument_function__' attribute applies only to functions
      
      The /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h
      file that includes linux/posix_types.h ends up getting the one in the kernel
      sources causing the breakage. Fix it.
      
      Test built tools/objtool/ too.
      Reported-by: default avatarJiri Olsa <jolsa@kernel.org>
      Tested-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Fixes: 4b6ab94e ("perf subcmd: Create subcmd library")
      Link: https://lkml.kernel.org/n/tip-5lhaoecrj12t0bqwvpiu14sm@git.kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      55f603dc
    • Nikos Tsironis's avatar
      dm kcopyd: Fix bug causing workqueue stalls · 8561f28c
      Nikos Tsironis authored
      [ Upstream commit d7e6b8df ]
      
      When using kcopyd to run callbacks through dm_kcopyd_do_callback() or
      submitting copy jobs with a source size of 0, the jobs are pushed
      directly to the complete_jobs list, which could be under processing by
      the kcopyd thread. As a result, the kcopyd thread can continue running
      completed jobs indefinitely, without releasing the CPU, as long as
      someone keeps submitting new completed jobs through the aforementioned
      paths. Processing of work items, queued for execution on the same CPU as
      the currently running kcopyd thread, is thus stalled for excessive
      amounts of time, hurting performance.
      
      Running the following test, from the device mapper test suite [1],
      
        dmtest run --suite snapshot -n parallel_io_to_many_snaps_N
      
      , with 8 active snapshots, we get, in dmesg, messages like the
      following:
      
      [68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s!
      [68899.949282] Showing busy workqueues and worker pools:
      [68899.949288] workqueue events: flags=0x0
      [68899.949295]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
      [68899.949306]     pending: vmstat_shepherd, cache_reap
      [68899.949331] workqueue mm_percpu_wq: flags=0x8
      [68899.949337]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949345]     pending: vmstat_update
      [68899.949387] workqueue dm_bufio_cache: flags=0x8
      [68899.949392]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
      [68899.949400]     pending: work_fn [dm_bufio]
      [68899.949423] workqueue kcopyd: flags=0x8
      [68899.949429]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949437]     pending: do_work [dm_mod]
      [68899.949452] workqueue kcopyd: flags=0x8
      [68899.949458]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
      [68899.949466]     in-flight: 13:do_work [dm_mod]
      [68899.949474]     pending: do_work [dm_mod]
      [68899.949487] workqueue kcopyd: flags=0x8
      [68899.949493]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949501]     pending: do_work [dm_mod]
      [68899.949515] workqueue kcopyd: flags=0x8
      [68899.949521]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949529]     pending: do_work [dm_mod]
      [68899.949541] workqueue kcopyd: flags=0x8
      [68899.949547]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
      [68899.949555]     pending: do_work [dm_mod]
      [68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084
      
      Fix this by splitting the complete_jobs list into two parts: A user
      facing part, named callback_jobs, and one used internally by kcopyd,
      retaining the name complete_jobs. dm_kcopyd_do_callback() and
      dispatch_job() now push their jobs to the callback_jobs list, which is
      spliced to the complete_jobs list once, every time the kcopyd thread
      wakes up. This prevents kcopyd from hogging the CPU indefinitely and
      causing workqueue stalls.
      
      Re-running the aforementioned test:
      
        * Workqueue stalls are eliminated
        * The maximum writing time among all targets is reduced from 09m37.10s
          to 06m04.85s and the total run time of the test is reduced from
          10m43.591s to 7m19.199s
      
      [1] https://github.com/jthornber/device-mapper-test-suiteSigned-off-by: default avatarNikos Tsironis <ntsironis@arrikto.com>
      Signed-off-by: default avatarIlias Tsitsimpis <iliastsi@arrikto.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarSasha Levin <sashal@kernel.org>
      8561f28c