1. 31 Jan, 2019 22 commits
    • mei: me: add denverton innovation engine device IDs · f8982204
      Tomas Winkler authored
      commit f7ee8ead upstream.
      
      Add the Denverton innovation engine (IE) device ids.
      The IE is an ME-like device which provides HW security
      offloading.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mei: me: mark LBG devices as having dma support · adfda26b
      Alexander Usyskin authored
      commit 173436ba upstream.
      
      The LBG server platform sports DMA support.
      
      Cc: <stable@vger.kernel.org> #v5.0+
      Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • tcp: allow MSG_ZEROCOPY transmission also in CLOSE_WAIT state · 2cade15d
      Willem de Bruijn authored
      [ Upstream commit 13d7f463 ]
      
      TCP transmission with MSG_ZEROCOPY fails if the peer closes its end of
      the connection and so transitions this socket to CLOSE_WAIT state.
      
      Transmission in close wait state is acceptable. Other similar tests in
      the stack (e.g., in FastOpen) accept both states. Relax this test, too.
      
      Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg276886.html
      Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg227390.html
      Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
      Reported-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      CC: Yuchung Cheng <ycheng@google.com>
      CC: Neal Cardwell <ncardwell@google.com>
      CC: Soheil Hassas Yeganeh <soheil@google.com>
      CC: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
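
      The relaxed state test described in this commit can be sketched in
      userspace as a bitmask check. This is an illustration, not the kernel
      code: the state values are assumed to mirror the kernel's TCP state
      numbering, and zerocopy_state_ok() and the DEMO_ names are hypothetical.

```c
#include <stdbool.h>

/* Subset of TCP states. The values are assumed to match the kernel's
 * numbering (ESTABLISHED = 1, CLOSE_WAIT = 8, LISTEN = 10). */
enum demo_tcp_state {
    DEMO_TCP_ESTABLISHED = 1,
    DEMO_TCP_CLOSE_WAIT  = 8,
    DEMO_TCP_LISTEN      = 10,
};

/* One bit per state, so a set of acceptable states is a single mask. */
#define DEMO_TCPF(state) (1u << (state))

/* Hypothetical helper: true if a zerocopy send may proceed in this state. */
bool zerocopy_state_ok(enum demo_tcp_state state)
{
    const unsigned int allowed = DEMO_TCPF(DEMO_TCP_ESTABLISHED) |
                                 DEMO_TCPF(DEMO_TCP_CLOSE_WAIT);

    return (DEMO_TCPF(state) & allowed) != 0;
}
```

      Accepting a mask of states rather than comparing against a single
      state is what lets both ESTABLISHED and CLOSE_WAIT pass the same test.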
    • ip6_gre: update version related info when changing link · 6c4d069a
      Hangbin Liu authored
      [ Upstream commit 80b3671e ]
      
      We forgot to update the ip6erspan version related info when changing link,
      which causes setting a new hwid to fail.

      Reported-by: Jianlin Shi <jishi@redhat.com>
      Fixes: 94d7d8f2 ("ip6_gre: add erspan v2 support")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: phy: marvell: Fix deadlock from wrong locking · c9fe9d19
      Andrew Lunn authored
      [ Upstream commit e0a7328f ]
      
      m88e1318_set_wol() takes the lock as part of phy_select_page(). Don't
      take the lock again with phy_read(), use the unlocked __phy_read().
      
      Fixes: 424ca4c5 ("net: phy: marvell: fix paged access races")
      Reported-by: Åke Rehnman <ake.rehnman@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
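
      The locked versus unlocked accessor pattern behind this fix can be
      sketched with a toy bus whose non-recursive lock is modelled as a flag.
      All demo_ names are hypothetical stand-ins; they are not the phylib
      functions, and a failed lock attempt here stands in for a real deadlock.

```c
#include <stdbool.h>

/* Toy MDIO bus with a non-recursive lock, modelled as a flag. */
struct demo_bus {
    bool locked;
    int regs[32];
};

/* Returns false where a real non-recursive mutex would deadlock. */
bool demo_bus_lock(struct demo_bus *bus)
{
    if (bus->locked)
        return false;
    bus->locked = true;
    return true;
}

void demo_bus_unlock(struct demo_bus *bus)
{
    bus->locked = false;
}

/* Unlocked accessor: the caller must already hold the bus lock. */
int demo_read_unlocked(struct demo_bus *bus, int reg)
{
    return bus->regs[reg];
}

/* Locked accessor: takes the lock itself; -1 marks the would-be deadlock. */
int demo_read_locked(struct demo_bus *bus, int reg)
{
    int val;

    if (!demo_bus_lock(bus))
        return -1;
    val = demo_read_unlocked(bus, reg);
    demo_bus_unlock(bus);
    return val;
}

/* Paged sequence: the lock is taken once, so only unlocked reads may follow. */
int demo_paged_read(struct demo_bus *bus, int reg)
{
    int val;

    if (!demo_bus_lock(bus))
        return -1;
    val = demo_read_unlocked(bus, reg);   /* not demo_read_locked() */
    demo_bus_unlock(bus);
    return val;
}
```

      Calling demo_read_locked() while the lock is already held returns -1
      in this model, which is the situation the commit removes.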
    • erspan: build the header with the right proto according to erspan_ver · 552cd931
      Xin Long authored
      [ Upstream commit 20704bd1 ]
      
      As said in draft-foschiano-erspan-03#section4:
      
         Different frame variants known as "ERSPAN Types" can be
         distinguished based on the GRE "Protocol Type" field value: Type I
         and II's value is 0x88BE while Type III's is 0x22EB [ETYPES].
      
      So set it properly in erspan_xmit() according to erspan_ver. While at
      it, also remove the unused parameter 'proto' in erspan_fb_xmit().
      
      Fixes: 94d7d8f2 ("ip6_gre: add erspan v2 support")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
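
      A minimal sketch of choosing the GRE protocol value from the ERSPAN
      version, using the two values quoted from the draft. The helper name is
      hypothetical, and the mapping of erspan_ver 1 to Type II and erspan_ver 2
      to Type III is an assumption of this sketch.

```c
#include <stdint.h>

#define DEMO_PROTO_ERSPAN_I_II 0x88BE  /* Type I and II, per the draft */
#define DEMO_PROTO_ERSPAN_III  0x22EB  /* Type III, per the draft */

/* Hypothetical helper. Assumes erspan_ver 1 means Type II and
 * erspan_ver 2 means Type III; returns 0 for an unknown version. */
uint16_t demo_erspan_gre_proto(int erspan_ver)
{
    switch (erspan_ver) {
    case 1:
        return DEMO_PROTO_ERSPAN_I_II;
    case 2:
        return DEMO_PROTO_ERSPAN_III;
    default:
        return 0;
    }
}
```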
    • ip6_gre: fix tunnel list corruption for x-netns · 0449da6f
      Olivier Matz authored
      [ Upstream commit ab5098fa ]
      
      In changelink ops, the ip6gre_net pointer is retrieved from
      dev_net(dev), which is wrong in case of x-netns. Thus, the tunnel is not
      unlinked from its current list and is relinked into another net
      namespace. This corrupts the tunnel lists and can later trigger a kernel
      oops.
      
      Fix this by retrieving the netns from device private area.
      
      Fixes: c8632fc3 ("net: ip6_gre: Split up ip6gre_changelink()")
      Cc: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • udp: with udp_segment release on error path · e3fa624e
      Willem de Bruijn authored
      [ Upstream commit 0f149c9f ]
      
      Failure of __ip_append_data() triggers udp_flush_pending_frames(), but
      these tests happen later. The skb must be freed directly.
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Reported-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/sched: cls_flower: allocate mask dynamically in fl_change() · 84bf7430
      Ivan Vecera authored
      [ Upstream commit 2cddd201 ]
      
      Recent changes (especially 05cd271f ("cls_flower: Support multiple
      masks per priority")) in the fl_flow_mask structure grow it and its
      current size e.g. on x86_64 with defconfig is 760 bytes and more than
      1024 bytes with some debug options enabled. Prior to the mentioned commit
      its size was 176 bytes (using defconfig on x86_64).
      With regard to this fact it's reasonable to allocate this structure
      dynamically in fl_change() to reduce its stack size.
      
      v2:
      - use kzalloc() instead of kcalloc()
      
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Paul Blakey <paulb@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mlxsw: pci: Ring CQ's doorbell before RDQ's · bdafc159
      Ido Schimmel authored
      When a packet should be trapped to the CPU the device consumes a WQE
      (work queue element) from an RDQ (receive descriptor queue) and copies
      the packet to the address specified in the WQE. The device then tries to
      post a CQE (completion queue element) that contains various metadata
      (e.g., ingress port) about the packet to a CQ (completion queue).
      
      In case the device managed to consume a WQE, but did not manage to post
      the corresponding CQE, it will get stuck. This unlikely situation can be
      triggered due to the scheme the driver is currently using to process
      CQEs.
      
      The driver will consume up to 512 CQEs at a time and after processing
      each corresponding WQE it will ring the RDQ's doorbell, letting the
      device know that a new WQE was posted for it to consume. Only after
      processing all the CQEs (up to 512), the driver will ring the CQ's
      doorbell, letting the device know that new ones can be posted.
      
      Fix this by having the driver ring the CQ's doorbell for every processed
      CQE, but before ringing the RDQ's doorbell. This guarantees that
      whenever we post a new WQE, there is a corresponding CQE available. Copy
      the currently processed CQE to prevent the device from overwriting it
      with a new CQE after ringing the doorbell.
      
      Note that the driver still arms the CQ only after processing all the
      pending CQEs, so that interrupts for this CQ will only be delivered
      after the driver finished its processing.
      
      Before commit 8404f6f2 ("mlxsw: pci: Allow to use CQEs of version 1
      and version 2") the issue was virtually impossible to trigger since the
      number of CQEs was twice the number of WQEs and the number of CQEs
      processed at a time was equal to the number of available WQEs.
      
      Fixes: 8404f6f2 ("mlxsw: pci: Allow to use CQEs of version 1 and version 2")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Semion Lisyansky <semionl@mellanox.com>
      Tested-by: Semion Lisyansky <semionl@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
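
      The per-CQE ordering described above (copy the CQE, ring the CQ's
      doorbell, then ring the RDQ's doorbell, and arm the CQ only at the end)
      can be sketched as a small simulation. The structures and demo_ names
      are hypothetical and record events in a log instead of touching
      hardware.

```c
#include <stddef.h>

#define DEMO_LOG_MAX 32

struct demo_cqe {
    int ingress_port;
};

struct demo_queue {
    struct demo_cqe cq[8];   /* completion entries written by the device */
    size_t pending;          /* number of valid entries in cq[] */
    char log[DEMO_LOG_MAX];  /* order of events, for inspection */
    size_t log_len;
    int last_port;           /* metadata of the last handled packet */
};

static void demo_log(struct demo_queue *q, char event)
{
    if (q->log_len < DEMO_LOG_MAX - 1) {
        q->log[q->log_len++] = event;
        q->log[q->log_len] = '\0';
    }
}

void demo_process_completions(struct demo_queue *q)
{
    for (size_t i = 0; i < q->pending; i++) {
        /* Copy first: after the CQ doorbell the slot may be overwritten. */
        struct demo_cqe copy = q->cq[i];

        demo_log(q, 'C');                  /* ring the CQ's doorbell */
        q->last_port = copy.ingress_port;  /* handle the packet via the copy */
        demo_log(q, 'R');                  /* ring the RDQ's doorbell */
    }
    q->pending = 0;
    demo_log(q, 'A');                      /* arm the CQ once, at the end */
}
```

      With two pending CQEs the recorded order is C, R, C, R, A: every RDQ
      doorbell is preceded by a CQ doorbell, so a free CQE exists whenever a
      new WQE is posted.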
    • mlxsw: spectrum_fid: Update dummy FID index · c82f4684
      Nir Dotan authored
      [ Upstream commit a11dcd64 ]
      
      When using a tc flower action of egress mirred redirect, the driver adds
      an implicit FID setting action. This implicit action sets a dummy FID to
      the packet and is used as part of a design for trapping unmatched flows
      in OVS.  While this implicit FID setting action is supposed to be a NOP
      when a redirect action is added, in Spectrum-2 the FID record is
      consulted as the dummy FID index is an 802.1D FID index and the packet
      is dropped instead of being redirected.
      
      Set the dummy FID index value to be within 802.1Q range. This satisfies
      both Spectrum-1 which ignores the FID and Spectrum-2 which identifies it
      as an 802.1Q FID and will then follow the redirect action.
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Signed-off-by: Nir Dotan <nird@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: ipv4: Fix memory leak in network namespace dismantle · adbf7e58
      Ido Schimmel authored
      [ Upstream commit f97f4dd8 ]
      
      IPv4 routing tables are flushed in two cases:
      
      1. In response to events in the netdev and inetaddr notification chains
      2. When a network namespace is being dismantled
      
      In both cases only routes associated with a dead nexthop group are
      flushed. However, a nexthop group will only be marked as dead in case it
      is populated with actual nexthops using a nexthop device. This is not
      the case when the route in question is an error route (e.g.,
      'blackhole', 'unreachable').
      
      Therefore, when a network namespace is being dismantled such routes are
      not flushed and leaked [1].
      
      To reproduce:
      # ip netns add blue
      # ip -n blue route add unreachable 192.0.2.0/24
      # ip netns del blue
      
      Fix this by not skipping error routes that are not marked with
      RTNH_F_DEAD when flushing the routing tables.
      
      To prevent the flushing of such routes in case #1, add a parameter to
      fib_table_flush() that indicates if the table is flushed as part of
      namespace dismantle or not.
      
      Note that this problem does not exist in IPv6 since error routes are
      associated with the loopback device.
      
      [1]
      unreferenced object 0xffff888066650338 (size 56):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff  ..........ba....
          e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00  ...d............
        backtrace:
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      unreferenced object 0xffff888061621c88 (size 48):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
          6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff  kkkkkkkk..&_....
        backtrace:
          [<00000000733609e3>] fib_table_insert+0x978/0x1500
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      
      Fixes: 8cced9ef ("[NETNS]: Enable routing configuration in non-initial namespace.")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
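
      The flush decision described in this commit can be reduced to a small
      predicate. This is a simplified sketch rather than the logic of
      fib_table_flush() itself: struct demo_route and demo_should_flush() are
      hypothetical, and the flush_all flag stands in for the new parameter
      that marks a namespace dismantle.

```c
#include <stdbool.h>

struct demo_route {
    bool nexthop_dead;  /* stands in for RTNH_F_DEAD on the nexthop group */
    bool is_error;      /* error route such as 'blackhole' or 'unreachable' */
};

/* flush_all is true only when the network namespace is being dismantled. */
bool demo_should_flush(const struct demo_route *rt, bool flush_all)
{
    /* Error routes are never marked dead, so they go only on dismantle. */
    if (rt->is_error)
        return flush_all;

    /* Ordinary routes are flushed when their nexthop group is dead. */
    return rt->nexthop_dead;
}
```

      An 'unreachable' route therefore survives a notifier-driven flush but
      is released when its namespace goes away.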
    • mlxsw: pci: Increase PCI SW reset timeout · bc4e2300
      Nir Dotan authored
      [ Upstream commit d2f372ba ]
      
      Spectrum-2 PHY layer introduces a calibration period which is a part of the
      Spectrum-2 firmware boot process. Hence increase the SW timeout waiting for
      the firmware to come out of boot. This does not increase system boot time
      in cases where the firmware PHY calibration process is done quickly.
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Signed-off-by: Nir Dotan <nird@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • vhost: log dirty page correctly · 1688e75c
      Jason Wang authored
      [ Upstream commit cc5e7107 ]
      
      Vhost dirty page logging API is designed to sync through GPA. But we
      try to log GIOVA when device IOTLB is enabled. This is wrong and may
      lead to missing data after migration.
      
      To solve this issue, when logging with device IOTLB enabled, we will:
      
      1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
         get HVA, for writable descriptor, get HVA through iovec. For used
         ring update, translate its GIOVA to HVA
      2) traverse the GPA->HVA mapping to get the possible GPA and log
         through GPA. Note that this reverse mapping is not guaranteed
         to be unique, so we should log each possible GPA in this case.
      
      This fixes the failure of scp to guest during migration. In -next, we
      will probably support passing GIOVA->GPA instead of GIOVA->HVA.
      
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Reported-by: Jintack Lim <jintack@cs.columbia.edu>
      Cc: Jintack Lim <jintack@cs.columbia.edu>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • openvswitch: Avoid OOB read when parsing flow nlattrs · 3d997bf0
      Ross Lagerwall authored
      [ Upstream commit 04a4af33 ]
      
      For nested and variable attributes, the expected length of an attribute
      is not known and marked by a negative number.  This results in an OOB
      read when the expected length is later used to check if the attribute is
      all zeros. Fix this by using the actual length of the attribute rather
      than the expected length.
      
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
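
      A small sketch of why the all-zeros check has to use the actual
      attribute length. struct demo_attr and demo_attr_is_all_zero() are
      hypothetical and are not the netlink API; the negative expected_len
      marks a nested or variable-length attribute, as described above.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_attr {
    int expected_len;           /* negative: nested or variable length */
    size_t actual_len;          /* length carried by the attribute itself */
    const unsigned char *data;
};

/* Walks only the bytes the attribute really has. Using expected_len here
 * would turn a negative marker into a huge unsigned length. */
bool demo_attr_is_all_zero(const struct demo_attr *attr)
{
    for (size_t i = 0; i < attr->actual_len; i++) {
        if (attr->data[i] != 0)
            return false;
    }
    return true;
}
```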
    • net_sched: refetch skb protocol for each filter · 916c27c8
      Cong Wang authored
      [ Upstream commit cd0c4e70 ]
      
      Martin reported that a set of filters doesn't work after changing
      from reclassify to continue. Looking into the code, it
      looks like skb protocol is not always fetched for each
      iteration of the filters. But, as demonstrated by Martin,
      TC actions such as act_vlan can modify skb->protocol.
      This means we have to refetch skb protocol in each iteration,
      rather than using the one we fetch at the beginning of the loop.
      
      This bug is _not_ introduced by commit 3b3ae880
      ("net: sched: consolidate tc_classify{,_compat}"), technically,
      if act_vlan is the only action that modifies skb protocol, then
      it is commit c7e2b968 ("sched: introduce vlan action") which
      introduced this bug.
      
      Reported-by: Martin Olsson <martin.olsson+netdev@sentorsecurity.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
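
      The effect of refetching the protocol in every iteration can be shown
      with a toy classifier. This is a sketch under simplified assumptions:
      the filter structure, the "rewrite then continue" action and all demo_
      names are hypothetical, and only the two protocol values are real
      EtherTypes.

```c
#include <stddef.h>

#define DEMO_PROTO_8021Q 0x8100u  /* VLAN-tagged frame */
#define DEMO_PROTO_IP    0x0800u  /* IPv4 */
#define DEMO_NO_MATCH    (-1)

struct demo_skb {
    unsigned int protocol;
};

struct demo_filter {
    unsigned int protocol;    /* protocol this filter matches on */
    unsigned int rewrite_to;  /* non-zero: action rewrites it, then continues */
    int classid;              /* result when the filter matches and stops */
};

int demo_classify(struct demo_skb *skb, const struct demo_filter *f, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Refetched on every iteration: an earlier action may have
         * changed skb->protocol. */
        unsigned int protocol = skb->protocol;

        if (f[i].protocol != protocol)
            continue;
        if (f[i].rewrite_to != 0) {
            skb->protocol = f[i].rewrite_to;  /* e.g. a VLAN pop */
            continue;                         /* "continue" verdict */
        }
        return f[i].classid;
    }
    return DEMO_NO_MATCH;
}
```

      If the protocol were read once before the loop, the second filter in
      a "pop VLAN, then match IPv4" chain would never match.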
    • net/sched: act_tunnel_key: fix memory leak in case of action replace · 02239e79
      Davide Caratti authored
      [ Upstream commit 9174c3df ]
      
      Running the following TDC test cases:
      
       7afc - Replace tunnel_key set action with all parameters
       364d - Replace tunnel_key set action with all parameters and cookie
      
      it's possible to trigger kmemleak warnings like:
      
        unreferenced object 0xffff94797127ab40 (size 192):
        comm "tc", pid 3248, jiffies 4300565293 (age 1006.862s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 c0 93 f9 8a ff ff ff ff  ................
          41 84 ee 89 ff ff ff ff 00 00 00 00 00 00 00 00  A...............
        backtrace:
          [<000000001e85b61c>] tunnel_key_init+0x31d/0x820 [act_tunnel_key]
          [<000000007f3f6ee7>] tcf_action_init_1+0x384/0x4c0
          [<00000000e89e3ded>] tcf_action_init+0x12b/0x1a0
          [<00000000c1c8c0f8>] tcf_action_add+0x73/0x170
          [<0000000095a9fc28>] tc_ctl_action+0x122/0x160
          [<000000004bebeac5>] rtnetlink_rcv_msg+0x263/0x2d0
          [<000000009fd862dd>] netlink_rcv_skb+0x4a/0x110
          [<00000000b55199e7>] netlink_unicast+0x1a0/0x250
          [<000000004996cd21>] netlink_sendmsg+0x2c1/0x3c0
          [<000000004d6a94b4>] sock_sendmsg+0x36/0x40
          [<000000005d9f0208>] ___sys_sendmsg+0x280/0x2f0
          [<00000000dec19023>] __sys_sendmsg+0x5e/0xa0
          [<000000004b82ac81>] do_syscall_64+0x5b/0x180
          [<00000000a0f1209a>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
          [<000000002926b2ab>] 0xffffffffffffffff
      
      When the tunnel_key action is replaced, the kernel forgets to release the
      dst metadata. Ensure it is released by tunnel_key_init(), the same way
      it's done in tunnel_key_release().
      
      Fixes: d0f6dd8a ("net/sched: Introduce act_tunnel_key")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
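
      The leak on replace comes down to dropping the old object before
      installing the new one. A userspace sketch with a live-object counter
      follows; the demo_ types and functions are hypothetical and only mimic
      the init/replace/release life cycle.

```c
#include <stdlib.h>

struct demo_metadata {
    int tunnel_id;
};

struct demo_action {
    struct demo_metadata *md;
};

int demo_live_metadata;  /* metadata objects currently allocated */

static struct demo_metadata *demo_md_alloc(int tunnel_id)
{
    struct demo_metadata *md = calloc(1, sizeof(*md));

    if (md) {
        md->tunnel_id = tunnel_id;
        demo_live_metadata++;
    }
    return md;
}

static void demo_md_release(struct demo_metadata *md)
{
    if (md) {
        free(md);
        demo_live_metadata--;
    }
}

/* Used both for the first setup and for a later replace. */
int demo_action_init(struct demo_action *act, int tunnel_id)
{
    struct demo_metadata *new_md = demo_md_alloc(tunnel_id);

    if (!new_md)
        return -1;
    demo_md_release(act->md);  /* without this line a replace leaks */
    act->md = new_md;
    return 0;
}

void demo_action_release(struct demo_action *act)
{
    demo_md_release(act->md);
    act->md = NULL;
}
```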
    • net: phy: mdio_bus: add missing device_del() in mdiobus_register() error handling · 3e4cd067
      Thomas Petazzoni authored
      [ Upstream commit e40e2a2e ]
      
      The current code in __mdiobus_register() doesn't properly handle
      failures returned by the devm_gpiod_get_optional() call: it returns
      immediately, without unregistering the device that was added by the
      call to device_register() earlier in the function.
      
      This leaves a stale device, which then causes a NULL pointer
      dereference in the code that handles deferred probing:
      
      [    1.489982] Unable to handle kernel NULL pointer dereference at virtual address 00000074
      [    1.498110] pgd = (ptrval)
      [    1.500838] [00000074] *pgd=00000000
      [    1.504432] Internal error: Oops: 17 [#1] SMP ARM
      [    1.509133] Modules linked in:
      [    1.512192] CPU: 1 PID: 51 Comm: kworker/1:3 Not tainted 4.20.0-00039-g3b73a4cc8b3e-dirty #99
      [    1.520708] Hardware name: Xilinx Zynq Platform
      [    1.525261] Workqueue: events deferred_probe_work_func
      [    1.530403] PC is at klist_next+0x10/0xfc
      [    1.534403] LR is at device_for_each_child+0x40/0x94
      [    1.539361] pc : [<c0683fbc>]    lr : [<c0455d90>]    psr: 200e0013
      [    1.545628] sp : ceeefe68  ip : 00000001  fp : ffffe000
      [    1.550863] r10: 00000000  r9 : c0c66790  r8 : 00000000
      [    1.556079] r7 : c0457d44  r6 : 00000000  r5 : ceeefe8c  r4 : cfa2ec78
      [    1.562604] r3 : 00000064  r2 : c0457d44  r1 : ceeefe8c  r0 : 00000064
      [    1.569129] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [    1.576263] Control: 18c5387d  Table: 0ed7804a  DAC: 00000051
      [    1.582013] Process kworker/1:3 (pid: 51, stack limit = 0x(ptrval))
      [    1.588280] Stack: (0xceeefe68 to 0xceef0000)
      [    1.592630] fe60:                   cfa2ec78 c0c03c08 00000000 c0457d44 00000000 c0c66790
      [    1.600814] fe80: 00000000 c0455d90 ceeefeac 00000064 00000000 0d7a542e cee9d494 cfa2ec78
      [    1.608998] fea0: cfa2ec78 00000000 c0457d44 c0457d7c cee9d494 c0c03c08 00000000 c0455dac
      [    1.617182] fec0: cf98ba44 cf926a00 cee9d494 0d7a542e 00000000 cf935a10 cf935a10 cf935a10
      [    1.625366] fee0: c0c4e9b8 c0457d7c c0c4e80c 00000001 cf935a10 c0457df4 cf935a10 c0c4e99c
      [    1.633550] ff00: c0c4e99c c045a27c c0c4e9c4 ced63f80 cfde8a80 cfdebc00 00000000 c013893c
      [    1.641734] ff20: cfde8a80 cfde8a80 c07bd354 ced63f80 ced63f94 cfde8a80 00000008 c0c02d00
      [    1.649936] ff40: cfde8a98 cfde8a80 ffffe000 c0139a30 ffffe000 c0c6624a c07bd354 00000000
      [    1.658120] ff60: ffffe000 cee9e780 ceebfe00 00000000 ceeee000 ced63f80 c0139788 cf8cdea4
      [    1.666304] ff80: cee9e79c c013e598 00000001 ceebfe00 c013e44c 00000000 00000000 00000000
      [    1.674488] ffa0: 00000000 00000000 00000000 c01010e8 00000000 00000000 00000000 00000000
      [    1.682671] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    1.690855] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
      [    1.699058] [<c0683fbc>] (klist_next) from [<c0455d90>] (device_for_each_child+0x40/0x94)
      [    1.707241] [<c0455d90>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
      [    1.716476] [<c0457d7c>] (device_reorder_to_tail) from [<c0455dac>] (device_for_each_child+0x5c/0x94)
      [    1.725692] [<c0455dac>] (device_for_each_child) from [<c0457d7c>] (device_reorder_to_tail+0x38/0x88)
      [    1.734927] [<c0457d7c>] (device_reorder_to_tail) from [<c0457df4>] (device_pm_move_to_tail+0x28/0x40)
      [    1.744235] [<c0457df4>] (device_pm_move_to_tail) from [<c045a27c>] (deferred_probe_work_func+0x58/0x8c)
      [    1.753746] [<c045a27c>] (deferred_probe_work_func) from [<c013893c>] (process_one_work+0x210/0x4fc)
      [    1.762888] [<c013893c>] (process_one_work) from [<c0139a30>] (worker_thread+0x2a8/0x5c0)
      [    1.771072] [<c0139a30>] (worker_thread) from [<c013e598>] (kthread+0x14c/0x154)
      [    1.778482] [<c013e598>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
      [    1.785689] Exception stack(0xceeeffb0 to 0xceeefff8)
      [    1.790739] ffa0:                                     00000000 00000000 00000000 00000000
      [    1.798923] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [    1.807107] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
      [    1.813724] Code: e92d47f0 e1a05000 e8900048 e1a00003 (e5937010)
      [    1.819844] ---[ end trace 3c2c0c8b65399ec9 ]---
      
      The actual error that we had from devm_gpiod_get_optional() was
      -EPROBE_DEFER, due to the GPIO being provided by a driver that is
      probed later than the Ethernet controller driver.
      
      To fix this, we simply add the missing device_del() invocation in the
      error path.
      
      Fixes: 69226896 ("mdio_bus: Issue GPIO RESET to PHYs")
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: phy: marvell: Errata for mv88e6390 internal PHYs · 1a864e38
      Andrew Lunn authored
      [ Upstream commit 8cbcdc1a ]
      
      The VOD can be out of spec, unless some magic value is poked into an
      undocumented register in an undocumented page.
      
      Fixes: e4cf8a38 ("net: phy: Marvell: Add mv88e6390 internal PHY")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net: Fix usage of pskb_trim_rcsum · 40f2f080
      Ross Lagerwall authored
      [ Upstream commit 6c57f045 ]
      
      In certain cases, pskb_trim_rcsum() may change skb pointers.
      Reinitialize header pointers afterwards to avoid potential
      use-after-frees. Add a note in the documentation of
      pskb_trim_rcsum(). Found by KASAN.
      
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
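
      The hazard of keeping a header pointer across a call that may move the
      data can be sketched in userspace. Here demo_trim() always moves the
      data to a fresh allocation to make the problem visible; it is a
      hypothetical stand-in and does not model pskb_trim_rcsum() beyond that
      one property.

```c
#include <stdlib.h>
#include <string.h>

struct demo_buf {
    unsigned char *data;
    size_t len;
};

struct demo_hdr {
    unsigned char version;
    unsigned char flags;
};

/* Trims the buffer and, in this sketch, always relocates its data. */
int demo_trim(struct demo_buf *buf, size_t new_len)
{
    unsigned char *copy;

    if (new_len == 0 || new_len > buf->len)
        return -1;
    copy = malloc(new_len);
    if (!copy)
        return -1;
    memcpy(copy, buf->data, new_len);
    free(buf->data);  /* any pointer into the old data is now stale */
    buf->data = copy;
    buf->len = new_len;
    return 0;
}

struct demo_hdr *demo_get_hdr(struct demo_buf *buf)
{
    return (struct demo_hdr *)buf->data;
}

int demo_trim_and_get_version(struct demo_buf *buf, size_t new_len)
{
    struct demo_hdr *hdr;

    if (new_len < sizeof(*hdr))
        return -1;
    if (demo_trim(buf, new_len) != 0)
        return -1;
    hdr = demo_get_hdr(buf);  /* derived after the trim, never before */
    return hdr->version;
}
```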
    • net: bridge: Fix ethernet header pointer before check skb forwardable · e287968a
      Yunjian Wang authored
      [ Upstream commit 28c1382f ]
      
      The skb header should be set to the ethernet header before calling
      is_skb_forwardable(), because is_skb_forwardable() already accounts
      for the ethernet header length (it includes dev->hard_header_len).
      
      To reproduce the issue:
      1. Add 2 ports to the Linux bridge br using the following commands:
      $ brctl addbr br
      $ brctl addif br eth0
      $ brctl addif br eth1
      2. The MTU of eth0 and eth1 is 1500.
      3. Send a packet (Data 1480, UDP 8, IP 20, Ethernet 14, VLAN 4)
      from eth0 to eth1.
      
      The expected result is that a packet larger than 1500 cannot pass from
      eth0 to eth1. But currently the packet passes through successfully,
      which means eth1's MTU limit doesn't take effect.
      
      Fixes: f6367b46 ("bridge: use is_skb_forwardable in forward path")
      Cc: bridge@lists.linux-foundation.org
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
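
      The arithmetic behind the reproduction can be worked through with a
      small sketch. It assumes the length test has roughly the form "skb
      length <= MTU + hard header length + one VLAN tag"; that form, the
      helper name and the DEMO_ constants are assumptions of this example
      and are not taken from the commit.

```c
#include <stdbool.h>

#define DEMO_ETH_HLEN  14u
#define DEMO_VLAN_HLEN 4u

/* Assumed shape of the test: the limit already includes the link-layer
 * header, so skb_len must be counted from the ethernet header too. */
bool demo_is_forwardable(unsigned int mtu, unsigned int hard_header_len,
                         unsigned int skb_len)
{
    return skb_len <= mtu + hard_header_len + DEMO_VLAN_HLEN;
}
```

      The full frame is 1480 + 8 + 20 + 14 + 4 = 1526 bytes and the limit is
      1500 + 14 + 4 = 1518, so the frame should be refused. If the length is
      measured from past the ethernet header, it shrinks to 1512 and slips
      under the limit, which matches the behaviour reported above.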
    • amd-xgbe: Fix mdio access for non-zero ports and clause 45 PHYs · 779a5077
      Lendacky, Thomas authored
      [ Upstream commit 5ab3121b ]
      
      The XGBE hardware has support for performing MDIO operations using an
      MDIO command request. The driver mistakenly uses the mdio port address
      as the MDIO command request device address instead of the MDIO command
      request port address. Additionally, the driver does not properly check
      for and create a clause 45 MDIO command.
      
      Check the supplied MDIO register to determine if the request is a clause
      45 operation (MII_ADDR_C45). For a clause 45 operation, extract the device
      address and register number from the supplied MDIO register and use them
      to set the MDIO command request device address and register number fields.
      For a clause 22 operation, the MDIO request device address is set to zero
      and the MDIO command request register number is set to the supplied MDIO
      register. In either case, the supplied MDIO port address is used as the
      MDIO command request port address.
      
      Fixes: 732f2ab7 ("amd-xgbe: Add support for MDIO attached PHYs")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Tested-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
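
      The clause 22 versus clause 45 handling described above can be
      sketched as a decode step. The bit layout used here (a flag in bit 30,
      the device address in bits 16 to 20, the register number in the low 16
      bits) is an assumption of this sketch, and struct demo_mdio_cmd is a
      hypothetical stand-in for the hardware command request.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MII_ADDR_C45 (1u << 30)  /* assumed clause 45 flag bit */

struct demo_mdio_cmd {
    bool c45;            /* true for a clause 45 operation */
    unsigned int port;   /* command request port address */
    unsigned int devad;  /* command request device address */
    unsigned int reg;    /* command request register number */
};

struct demo_mdio_cmd demo_build_cmd(unsigned int port_addr, uint32_t mdio_reg)
{
    /* In either case the supplied port address is the request's port. */
    struct demo_mdio_cmd cmd = { .port = port_addr };

    if (mdio_reg & DEMO_MII_ADDR_C45) {
        cmd.c45 = true;
        cmd.devad = (mdio_reg >> 16) & 0x1f;  /* assumed field position */
        cmd.reg = mdio_reg & 0xffff;
    } else {
        cmd.devad = 0;       /* clause 22: device address is zero */
        cmd.reg = mdio_reg;  /* register used as supplied */
    }
    return cmd;
}
```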
  2. 26 Jan, 2019 18 commits
    • Linux 4.19.18 · 34ae6572
      Greg Kroah-Hartman authored
    • ipmi: Don't initialize anything in the core until something uses it · b40aec33
      Corey Minyard authored
      commit 913a89f0 upstream.
      
      The IPMI driver was recently modified to use SRCU, but it turns out
      this uses a chunk of percpu memory, even if IPMI is never used.
      
      So modify things to only initialize on the first use.  There was already
      code to sort of handle this for handling init races, so piggyback
      on top of that, and simplify it in the process.
      
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Reported-by: Tejun Heo <tj@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org # 4.18
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b40aec33
    • Corey Minyard's avatar
      ipmi:ssif: Fix handling of multi-part return messages · 031a94ff
      Corey Minyard authored
      commit 7d6380cd upstream.
      
      The block number was not being compared correctly; it was off by one
      when checking the response.
      
      Some statistics wouldn't be incremented properly in some cases.
      
      Check to see that middle-part messages always have 31 bytes of
      data.
      
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Cc: stable@vger.kernel.org # 4.4
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      031a94ff
    • Fred Klassen's avatar
      ipmi: Prevent use-after-free in deliver_response · 821a003d
      Fred Klassen authored
      commit 479d6b39 upstream.
      
      Some IPMI modules (e.g. ibmpex_msg_handler()) will have ipmi_usr_hdlr
      handlers that call ipmi_free_recv_msg() directly. This will essentially
      kfree(msg), leading to use-after-free.
      
      This does not happen in the ipmi_devintf module, which will queue the
      message and run ipmi_free_recv_msg() later.
      
      BUG: KASAN: use-after-free in deliver_response+0x12f/0x1b0
      Read of size 8 at addr ffff888a7bf20018 by task ksoftirqd/3/27
      CPU: 3 PID: 27 Comm: ksoftirqd/3 Tainted: G           O      4.19.11-amd64-ani99-debug #12.0.1.601133+pv
      Hardware name: AppNeta r1000/X11SPW-TF, BIOS 2.1a-AP 09/17/2018
      Call Trace:
      dump_stack+0x92/0xeb
      print_address_description+0x73/0x290
      kasan_report+0x258/0x380
      deliver_response+0x12f/0x1b0
      ? ipmi_free_recv_msg+0x50/0x50
      deliver_local_response+0xe/0x50
      handle_one_recv_msg+0x37a/0x21d0
      handle_new_recv_msgs+0x1ce/0x440
      ...
      
      Allocated by task 9885:
      kasan_kmalloc+0xa0/0xd0
      kmem_cache_alloc_trace+0x116/0x290
      ipmi_alloc_recv_msg+0x28/0x70
      i_ipmi_request+0xb4a/0x1640
      ipmi_request_settime+0x1b8/0x1e0
      ...
      
      Freed by task 27:
      __kasan_slab_free+0x12e/0x180
      kfree+0xe9/0x280
      deliver_response+0x122/0x1b0
      deliver_local_response+0xe/0x50
      handle_one_recv_msg+0x37a/0x21d0
      handle_new_recv_msgs+0x1ce/0x440
      tasklet_action_common.isra.19+0xc4/0x250
      __do_softirq+0x11f/0x51f
      
      Fixes: e86ee2d4 ("ipmi: Rework locking and shutdown for hot remove")
      Cc: stable@vger.kernel.org # 4.18
      Signed-off-by: Fred Klassen <fklassen@appneta.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      821a003d
    • Gustavo A. R. Silva's avatar
      ipmi: msghandler: Fix potential Spectre v1 vulnerabilities · 753abe2a
      Gustavo A. R. Silva authored
      commit a7102c74 upstream.
      
      channel and addr->channel are indirectly controlled by user-space,
      hence leading to a potential exploitation of the Spectre variant 1
      vulnerability.
      
      These issues were detected with the help of Smatch:
      
      drivers/char/ipmi/ipmi_msghandler.c:1381 ipmi_set_my_address() warn: potential spectre issue 'user->intf->addrinfo' [w] (local cap)
      drivers/char/ipmi/ipmi_msghandler.c:1401 ipmi_get_my_address() warn: potential spectre issue 'user->intf->addrinfo' [r] (local cap)
      drivers/char/ipmi/ipmi_msghandler.c:1421 ipmi_set_my_LUN() warn: potential spectre issue 'user->intf->addrinfo' [w] (local cap)
      drivers/char/ipmi/ipmi_msghandler.c:1441 ipmi_get_my_LUN() warn: potential spectre issue 'user->intf->addrinfo' [r] (local cap)
      drivers/char/ipmi/ipmi_msghandler.c:2260 check_addr() warn: potential spectre issue 'intf->addrinfo' [r] (local cap)
      
      Fix this by sanitizing channel and addr->channel before using them to
      index user->intf->addrinfo and intf->addrinfo, correspondingly.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
      
      [1] https://lore.kernel.org/lkml/20180423164740.GY17484@dhcp22.suse.cz/
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      753abe2a
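The sanitizing step in the Spectre v1 commit above (bounds check, then clamp the index before the first dependent load) can be sketched in userspace C. This is only an illustration of the masking idea: `index_mask_nospec()`, `lookup_address()` and `ADDRINFO_CHANNELS` are hypothetical names, the mask relies on an arithmetic right shift of a negative `long` (implementation-defined in ISO C, though common compilers behave this way), and an optimizing compiler gives no speculation guarantee for plain C like this, which is why the kernel uses its own helper.

```c
#include <assert.h>
#include <limits.h>

#define ADDRINFO_CHANNELS 16UL  /* hypothetical table size for this sketch */

/*
 * Branch-free mask: all ones when index < size, zero otherwise.
 * Assumes size is at most LONG_MAX and that >> on a negative long
 * is an arithmetic shift.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/* Returns 0 and stores the table entry in *out, or -1 if channel is out of range. */
static int lookup_address(const unsigned char *table, unsigned long channel,
                          unsigned char *out)
{
    if (channel >= ADDRINFO_CHANNELS)
        return -1;

    /* Clamp before indexing so a mispredicted bounds check cannot index past the table. */
    channel &= index_mask_nospec(channel, ADDRINFO_CHANNELS);
    *out = table[channel];
    return 0;
}
```

The mask is applied on the first load only, matching the policy the commit message cites: kill the speculation at the first access rather than reasoning about dependent loads.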
    • Yang Yingliang's avatar
      ipmi: fix use-after-free of user->release_barrier.rda · 1c393ca1
      Yang Yingliang authored
      commit 77f82696 upstream.
      
      When we do the following test, we get an oops in the ipmi_msghandler driver:
      while((1))
      do
      	service ipmievd restart & service ipmievd restart
      done
      
      ---------------------------------------------------------------
      [  294.230186] Unable to handle kernel paging request at virtual address 0000803fea6ea008
      [  294.230188] Mem abort info:
      [  294.230190]   ESR = 0x96000004
      [  294.230191]   Exception class = DABT (current EL), IL = 32 bits
      [  294.230193]   SET = 0, FnV = 0
      [  294.230194]   EA = 0, S1PTW = 0
      [  294.230195] Data abort info:
      [  294.230196]   ISV = 0, ISS = 0x00000004
      [  294.230197]   CM = 0, WnR = 0
      [  294.230199] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000a1c1b75a
      [  294.230201] [0000803fea6ea008] pgd=0000000000000000
      [  294.230204] Internal error: Oops: 96000004 [#1] SMP
      [  294.235211] Modules linked in: nls_utf8 isofs rpcrdma ib_iser ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_umad rdma_cm ib_cm iw_cm dm_mirror dm_region_hash dm_log dm_mod aes_ce_blk crypto_simd cryptd aes_ce_cipher ghash_ce sha2_ce ses sha256_arm64 sha1_ce hibmc_drm hisi_sas_v2_hw enclosure sg hisi_sas_main sbsa_gwdt ip_tables mlx5_ib ib_uverbs marvell ib_core mlx5_core ixgbe ipmi_si mdio hns_dsaf ipmi_devintf ipmi_msghandler hns_enet_drv hns_mdio
      [  294.277745] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted 5.0.0-rc2+ #113
      [  294.285511] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.37 11/21/2017
      [  294.292835] pstate: 80000005 (Nzcv daif -PAN -UAO)
      [  294.297695] pc : __srcu_read_lock+0x38/0x58
      [  294.301940] lr : acquire_ipmi_user+0x2c/0x70 [ipmi_msghandler]
      [  294.307853] sp : ffff00001001bc80
      [  294.311208] x29: ffff00001001bc80 x28: ffff0000117e5000
      [  294.316594] x27: 0000000000000000 x26: dead000000000100
      [  294.321980] x25: dead000000000200 x24: ffff803f6bd06800
      [  294.327366] x23: 0000000000000000 x22: 0000000000000000
      [  294.332752] x21: ffff00001001bd04 x20: ffff80df33d19018
      [  294.338137] x19: ffff80df33d19018 x18: 0000000000000000
      [  294.343523] x17: 0000000000000000 x16: 0000000000000000
      [  294.348908] x15: 0000000000000000 x14: 0000000000000002
      [  294.354293] x13: 0000000000000000 x12: 0000000000000000
      [  294.359679] x11: 0000000000000000 x10: 0000000000100000
      [  294.365065] x9 : 0000000000000000 x8 : 0000000000000004
      [  294.370451] x7 : 0000000000000000 x6 : ffff80df34558678
      [  294.375836] x5 : 000000000000000c x4 : 0000000000000000
      [  294.381221] x3 : 0000000000000001 x2 : 0000803fea6ea000
      [  294.386607] x1 : 0000803fea6ea008 x0 : 0000000000000001
      [  294.391994] Process swapper/3 (pid: 0, stack limit = 0x0000000083087293)
      [  294.398791] Call trace:
      [  294.401266]  __srcu_read_lock+0x38/0x58
      [  294.405154]  acquire_ipmi_user+0x2c/0x70 [ipmi_msghandler]
      [  294.410716]  deliver_response+0x80/0xf8 [ipmi_msghandler]
      [  294.416189]  deliver_local_response+0x28/0x68 [ipmi_msghandler]
      [  294.422193]  handle_one_recv_msg+0x158/0xcf8 [ipmi_msghandler]
      [  294.432050]  handle_new_recv_msgs+0xc0/0x210 [ipmi_msghandler]
      [  294.441984]  smi_recv_tasklet+0x8c/0x158 [ipmi_msghandler]
      [  294.451618]  tasklet_action_common.isra.5+0x88/0x138
      [  294.460661]  tasklet_action+0x2c/0x38
      [  294.468191]  __do_softirq+0x120/0x2f8
      [  294.475561]  irq_exit+0x134/0x140
      [  294.482445]  __handle_domain_irq+0x6c/0xc0
      [  294.489954]  gic_handle_irq+0xb8/0x178
      [  294.497037]  el1_irq+0xb0/0x140
      [  294.503381]  arch_cpu_idle+0x34/0x1a8
      [  294.510096]  do_idle+0x1d4/0x290
      [  294.516322]  cpu_startup_entry+0x28/0x30
      [  294.523230]  secondary_start_kernel+0x184/0x1d0
      [  294.530657] Code: d538d082 d2800023 8b010c81 8b020021 (c85f7c25)
      [  294.539746] ---[ end trace 8a7a880dee570b29 ]---
      [  294.547341] Kernel panic - not syncing: Fatal exception in interrupt
      [  294.556837] SMP: stopping secondary CPUs
      [  294.563996] Kernel Offset: disabled
      [  294.570515] CPU features: 0x002,21006008
      [  294.577638] Memory Limit: none
      [  294.587178] Starting crashdump kernel...
      [  294.594314] Bye!
      
      user->release_barrier.rda is freed in ipmi_destroy_user() while the
      refcount is still nonzero, so when acquire_ipmi_user() then uses
      user->release_barrier.rda in __srcu_read_lock(), it causes an oops.
      Fix this by calling cleanup_srcu_struct() when the refcount is zero.
      
      Fixes: e86ee2d4 ("ipmi: Rework locking and shutdown for hot remove")
      Cc: stable@vger.kernel.org # 4.18
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      1c393ca1
    • Johan Hedberg's avatar
      Bluetooth: Fix unnecessary error message for HCI request completion · 7557895b
      Johan Hedberg authored
      commit 1629db9c upstream.
      
      In case a command which completes in Command Status was sent using the
      hci_cmd_send-family of APIs there would be a misleading error in the
      hci_get_cmd_complete function, since the code would be trying to fetch
      the Command Complete parameters when there are none.
      
      Avoid the misleading error and silently bail out from the function in
      case the received event is a command status.
      
      Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
      Acked-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
      Tested-by: Adam Ford <aford173@gmail.com> #4.19.16
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7557895b
    • Avraham Stern's avatar
      iwlwifi: mvm: Send LQ command as async when necessary · d9bcbcb7
      Avraham Stern authored
      commit 3baf7528 upstream.
      
      The parameter that indicated whether the LQ command should be sent
      as sync or async was removed, causing the LQ command to be sent as
      sync from interrupt context (e.g. from the RX path). This resulted
      in a kernel warning: "scheduling while atomic" and failing to send
      the LQ command, which ultimately leads to a queue hang.
      
      Fix it by adding back the required parameter to send the command as
      sync only when it is allowed.
      
      Fixes: d94c5a82 ("iwlwifi: mvm: open BA session only when sta is authorized")
      Signed-off-by: Avraham Stern <avraham.stern@intel.com>
      Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9bcbcb7
    • Michal Hocko's avatar
      mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps · 0d73e773
      Michal Hocko authored
      [ Upstream commit 7550c607 ]
      
      Patch series "THP eligibility reporting via proc".
      
      This series of three patches aims at making THP eligibility reporting much
      more robust and long term sustainable.  The trigger for the change is a
      regression report [2] and the long follow-up discussion.  In short, the
      specific application didn't have a good API to query whether a particular
      mapping can be backed by THP, so it used VMA flags to work around that.
      These flags represent a deep internal state of VMAs and as such they
      should be used by userspace with a great deal of caution.
      
      A similar thing happened for [3] when users complained that VM_MIXEDMAP is
      no longer set on DAX mappings.  Again a lack of a proper API led to an
      abuse.
      
      The first patch in the series tries to emphasise that the semantics of
      flags might change and any application consuming those should be really
      careful.
      
      The remaining two patches provide a more suitable interface to address [2]
      and provide a consistent API to query the THP status both for each VMA and
      process wide as well.
      
      [1] http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
      [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
      
      This patch (of 3):
      
      Even though vma flags exported via /proc/<pid>/smaps are explicitly
      documented to be not guaranteed for future compatibility the warning
      doesn't go far enough because it doesn't mention semantic changes to those
      flags.  And they are important as well because these flags are a deep
      implementation internal to the MM code and the semantic might change at
      any time.
      
      Let's consider two recent examples:
      http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz
      : commit e1fb4a08 "dax: remove VM_MIXEDMAP for fsdax and device dax" has
      : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the
      : mean time certain customer of ours started poking into /proc/<pid>/smaps
      : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA
      : flags, the application just fails to start complaining that DAX support is
      : missing in the kernel.
      
      http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp.google.com
      : Commit 18600332 ("mm: make PR_SET_THP_DISABLE immediately active")
      : introduced a regression in that userspace cannot always determine the set
      : of vmas where thp is ineligible.
      : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps
      : to determine if a vma is eligible to be backed by hugepages.
      : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to
      : be disabled and emit "nh" as a flag for the corresponding vmas as part of
      : /proc/pid/smaps.  After the commit, thp is disabled by means of an mm
      : flag and "nh" is not emitted.
      : This causes smaps parsing libraries to assume a vma is eligible for thp
      : and ends up puzzling the user on why its memory is not backed by thp.
      
      In both cases userspace was relying on a semantic of a specific VMA flag.
      The primary reason why that happened is a lack of a proper interface.
      While this has been worked on and it will be fixed properly, it seems that
      our wording could see some refinement and be more vocal about semantic
      aspect of these flags as well.
      
      Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Oppenheimer <bepvte@gmail.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0d73e773
    • Peter Xu's avatar
      userfaultfd: clear flag if remap event not enabled · 2011eb74
      Peter Xu authored
      [ Upstream commit 3cfd22be ]
      
      When the process being tracked does mremap() without
      UFFD_FEATURE_EVENT_REMAP on the corresponding tracking uffd file handle,
      we should not generate the remap event, and at the same time we should
      clear all the uffd flags on the new VMA.  Without this patch, we can still
      have the VM_UFFD_MISSING|VM_UFFD_WP flags on the new VMA even though the
      fault handling process does not even know of the existence of the VMA.
      
      Link: http://lkml.kernel.org/r/20181211053409.20317-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Pravin Shedge <pravin.shedge4linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2011eb74
    • Aaron Lu's avatar
      mm/swap: use nr_node_ids for avail_lists in swap_info_struct · b0cd52e6
      Aaron Lu authored
      [ Upstream commit 66f71da9 ]
      
      Since a2468cc9 ("swap: choose swap device according to numa node"),
      avail_lists field of swap_info_struct is changed to an array with
      MAX_NUMNODES elements.  This increased the size of swap_info_struct to
      40KiB, which needs an order-4 page to hold it.
      
      This is not optimal in that:
      1 Most systems have way fewer than MAX_NUMNODES (1024) nodes, so it
        is a waste of memory;
      2 It could cause swapon failure if the swap device is swapped on
        after the system has been running for a while, because no order-4
        page is available, as pointed out by Vasily Averin.
      
      Solve the above two issues by using nr_node_ids (which is the actual
      possible node number the running system has) for avail_lists instead of
      MAX_NUMNODES.
      
      nr_node_ids is unknown at compile time so it can't be used directly when
      declaring this array.  What I did here is to declare avail_lists as a
      zero-element array and allocate space for it when allocating space for
      swap_info_struct.  The reason for keeping an array rather than a pointer
      is that plist_for_each_entry needs the field to be part of the struct, so
      a pointer will not work.
      
      This patch is on top of Vasily Averin's fix commit.  I think the use of
      kvzalloc for swap_info_struct is still needed in case nr_node_ids is
      really big on some systems.
      
      Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.com
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vasily Averin <vvs@virtuozzo.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      b0cd52e6
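The allocation pattern described in the mm/swap commit above (a trailing array sized at run time and allocated together with its containing struct) can be sketched in userspace C. This is an illustration under assumptions, not the kernel code: `struct swap_info_sketch`, `struct list_stub` and `alloc_swap_info_sketch()` are hypothetical names, `calloc()` stands in for the kernel allocator, and a C99 flexible array member is used where the commit message speaks of a zero-element array.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-node list node embedded in the struct. */
struct list_stub {
    struct list_stub *prev;
    struct list_stub *next;
};

struct swap_info_sketch {
    unsigned int nr_nodes;           /* number of entries in avail_lists */
    struct list_stub avail_lists[];  /* sized at allocation time, not compile time */
};

/* Allocate the struct plus one list entry per possible node; NULL on failure. */
static struct swap_info_sketch *alloc_swap_info_sketch(unsigned int nr_node_ids)
{
    size_t size = sizeof(struct swap_info_sketch) +
                  (size_t)nr_node_ids * sizeof(struct list_stub);
    struct swap_info_sketch *p = calloc(1, size);
    unsigned int i;

    if (!p)
        return NULL;

    p->nr_nodes = nr_node_ids;
    /* Initialize each entry as an empty list that points at itself. */
    for (i = 0; i < nr_node_ids; i++) {
        p->avail_lists[i].prev = &p->avail_lists[i];
        p->avail_lists[i].next = &p->avail_lists[i];
    }
    return p;
}
```

Because the array lives inside the struct, code that walks from a list entry back to its container keeps working, which is the reason the commit message gives for not switching to a separately allocated pointer.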
    • Brian Foster's avatar
      mm/page-writeback.c: don't break integrity writeback on ->writepage() error · dc15e3fd
      Brian Foster authored
      [ Upstream commit 3fa750dc ]
      
      write_cache_pages() is used in both background and integrity writeback
      scenarios by various filesystems.  Background writeback is mostly
      concerned with cleaning a certain number of dirty pages based on various
      mm heuristics.  It may not write the full set of dirty pages or wait for
      I/O to complete.  Integrity writeback is responsible for persisting a set
      of dirty pages before the writeback job completes.  For example, an
      fsync() call must perform integrity writeback to ensure data is on disk
      before the call returns.
      
      write_cache_pages() unconditionally breaks out of its processing loop in
      the event of a ->writepage() error.  This is fine for background
      writeback, which has no strict requirements and will eventually come
      around again.  This can cause problems for integrity writeback on
      filesystems that might need to clean up state associated with failed page
      writeouts.  For example, XFS performs internal delayed allocation
      accounting before returning a ->writepage() error, where applicable.  If
      the current writeback happens to be associated with an unmount and
      write_cache_pages() completes the writeback prematurely due to error, the
      filesystem is unmounted in an inconsistent state if dirty+delalloc pages
      still exist.
      
      To handle this problem, update write_cache_pages() to always process the
      full set of pages for integrity writeback regardless of ->writepage()
      errors.  Save the first encountered error and return it to the caller once
      complete.  This facilitates XFS (or any other fs that expects integrity
      writeback to process the entire set of dirty pages) to clean up its
      internal state completely in the event of persistent mapping errors.
      Background writeback continues to exit on the first error encountered.
      
      [akpm@linux-foundation.org: fix typo in comment]
      Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.com
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dc15e3fd
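The error-handling policy in the write_cache_pages() commit above (integrity writeback processes every page and reports the first error, background writeback stops at the first error) can be sketched as a small userspace loop. All names here are hypothetical: `write_pages_sketch()`, `enum sync_mode_sketch`, and the `fail_on_odd()` callback exist only for this illustration and do not model real page or filesystem state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical writeback modes for this sketch. */
enum sync_mode_sketch {
    SKETCH_BACKGROUND,  /* best effort: may stop early */
    SKETCH_INTEGRITY    /* must attempt every page */
};

typedef int (*writepage_fn)(int page, void *data);

/*
 * Call writepage() on each page. Returns 0 or the first error seen, and
 * stores in *processed how many pages were actually attempted.
 */
static int write_pages_sketch(const int *pages, size_t n,
                              enum sync_mode_sketch mode,
                              writepage_fn writepage, void *data,
                              size_t *processed)
{
    int ret = 0;
    size_t i;

    *processed = 0;
    for (i = 0; i < n; i++) {
        int err = writepage(pages[i], data);

        (*processed)++;
        if (!err)
            continue;
        /* Background writeback exits on the first error encountered. */
        if (mode == SKETCH_BACKGROUND)
            return err;
        /* Integrity writeback keeps going but remembers only the first error. */
        if (!ret)
            ret = err;
    }
    return ret;
}

/* Demo callback: odd-numbered pages fail with an error derived from the page number. */
static int fail_on_odd(int page, void *data)
{
    (void)data;
    return (page % 2) ? -page : 0;
}
```

With pages 2, 3, 4, 5 and this callback, the integrity mode attempts all four pages and reports the error from page 3, while the background mode stops after page 3.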
    • Junxiao Bi's avatar
      ocfs2: fix panic due to unrecovered local alloc · 5a404f39
      Junxiao Bi authored
      [ Upstream commit 532e1e54 ]
      
      mount.ocfs2 ignores the inconsistency in which the journal is clean but
      the local alloc is unrecovered.  After mount, the local alloc is not
      empty, so cluster reservation does not allocate a new local alloc window
      and the reservation map is empty (ocfs2_reservation_map.m_bitmap_len = 0),
      which triggers the following panic.
      
      This issue was reported at
      
        https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html
      
      and the advice was to fix it during mount.  But this is a very unusual
      inconsistent state; usually the journal dirty flag should be cleared only
      at the last stage of umount, after everything else has gone right.  We may
      need to do further debugging to check that.  In any case, to avoid possible
      further corruption, the mount should be aborted and fsck should be run.
      
        (mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered!
        found = 6518, set = 6518, taken = 8192, off = 15912372
        ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode.
        o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes
        ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode.
        o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device
        o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777
        o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
        o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes
        ------------[ cut here ]------------
        kernel BUG at fs/ocfs2/reservations.c:507!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod
        CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2
        Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018
        task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000
        RIP: 0010:[<ffffffffa05e96a8>]  [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
        Call Trace:
          ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2]
          ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2]
          __ocfs2_claim_clusters+0x178/0x360 [ocfs2]
          ocfs2_claim_clusters+0x1f/0x30 [ocfs2]
          ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2]
          ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2]
          ocfs2_write_begin+0x13e/0x230 [ocfs2]
          generic_perform_write+0xbf/0x1c0
          __generic_file_write_iter+0x19c/0x1d0
          ocfs2_file_write_iter+0x589/0x1360 [ocfs2]
          __vfs_write+0xb8/0x110
          vfs_write+0xa9/0x1b0
          SyS_write+0x46/0xb0
          system_call_fastpath+0x18/0xd7
        Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85
        RIP   __ocfs2_resv_find_window+0x498/0x760 [ocfs2]
         RSP <ffff8800ea4db668>
        ---[ end trace 566f07529f2edf3c ]---
        Kernel panic - not syncing: Fatal exception
        Kernel Offset: disabled
      
      Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Acked-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5a404f39
    • Eric Sandeen's avatar
      iomap: don't search past page end in iomap_is_partially_uptodate · c9dcb871
      Eric Sandeen authored
      [ Upstream commit 3cc31fa6 ]
      
      iomap_is_partially_uptodate() is intended to check whether blocks within
      the selected range of a not-uptodate page are uptodate; if the range we
      care about is up to date, it's an optimization.
      
      However, the iomap implementation continues to check all blocks up to
      from+count, which is beyond the page, and can even be well beyond the
      iop->uptodate bitmap.
      
      I think the worst that will happen is that we may eventually find a zero
      bit and return "not partially uptodate" when it would have otherwise
      returned true, and skip the optimization.  Still, it's clearly an invalid
      memory access that must be fixed.
      
      So: fix this by limiting the search to within the page as is done in the
      non-iomap variant, block_is_partially_uptodate().
      
      Zorro noticed this when KASAN went off for 512 byte blocks on a 64k
      page system:
      
       BUG: KASAN: slab-out-of-bounds in iomap_is_partially_uptodate+0x1a0/0x1e0
       Read of size 8 at addr ffff800120c3a318 by task fsstress/22337
      
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Eric Sandeen <sandeen@sandeen.net>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c9dcb871
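The range limiting described in the iomap commit above (never look at blocks past the end of the page) comes down to clamping the length before computing the last block index. The following is a userspace sketch of that arithmetic only: `blocks_in_page()`, `struct block_range` and `SKETCH_PAGE_SIZE` are hypothetical names, the 64k page size matches the report in the commit message, and `from` is assumed to be an offset within the page.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 65536UL  /* 64k page, as in the KASAN report */

/* Inclusive range of block indexes within one page. */
struct block_range {
    unsigned long first;
    unsigned long last;
};

/*
 * Compute which blocks of a page the byte range [from, from + count) touches,
 * clamped so the range never extends past the end of the page.
 * blkbits is log2 of the block size (9 for 512-byte blocks).
 */
static struct block_range blocks_in_page(unsigned long from, unsigned long count,
                                         unsigned int blkbits)
{
    struct block_range r;
    unsigned long len;

    /* Preconditions of this sketch: offset lies inside the page, range is non-empty. */
    assert(from < SKETCH_PAGE_SIZE);
    assert(count > 0);

    /* Limit the range to one page. */
    len = SKETCH_PAGE_SIZE - from;
    if (count < len)
        len = count;

    r.first = from >> blkbits;
    r.last = (from + len - 1) >> blkbits;
    return r;
}
```

For example, with 512-byte blocks a 4096-byte request starting at offset 65024 would reach block 134 without the clamp, while a 64k page only has blocks 0 through 127; with the clamp the range ends at block 127.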
    • Qian Cai's avatar
      scsi: megaraid: fix out-of-bound array accesses · 00886ceb
      Qian Cai authored
      [ Upstream commit c7a082e4 ]
      
      UBSAN reported those with MegaRAID SAS-3 3108,
      
      [   77.467308] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
      [   77.475402] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
      [   77.481677] CPU: 16 PID: 333 Comm: kworker/16:1 Not tainted 4.20.0-rc5+ #1
      [   77.488556] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
      [   77.495791] Workqueue: events work_for_cpu_fn
      [   77.500154] Call trace:
      [   77.502610]  dump_backtrace+0x0/0x2c8
      [   77.506279]  show_stack+0x24/0x30
      [   77.509604]  dump_stack+0x118/0x19c
      [   77.513098]  ubsan_epilogue+0x14/0x60
      [   77.516765]  __ubsan_handle_out_of_bounds+0xfc/0x13c
      [   77.521767]  mr_update_load_balance_params+0x150/0x158 [megaraid_sas]
      [   77.528230]  MR_ValidateMapInfo+0x2cc/0x10d0 [megaraid_sas]
      [   77.533825]  megasas_get_map_info+0x244/0x2f0 [megaraid_sas]
      [   77.539505]  megasas_init_adapter_fusion+0x9b0/0xf48 [megaraid_sas]
      [   77.545794]  megasas_init_fw+0x1ab4/0x3518 [megaraid_sas]
      [   77.551212]  megasas_probe_one+0x2c4/0xbe0 [megaraid_sas]
      [   77.556614]  local_pci_probe+0x7c/0xf0
      [   77.560365]  work_for_cpu_fn+0x34/0x50
      [   77.564118]  process_one_work+0x61c/0xf08
      [   77.568129]  worker_thread+0x534/0xa70
      [   77.571882]  kthread+0x1c8/0x1d0
      [   77.575114]  ret_from_fork+0x10/0x1c
      
      [   89.240332] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32
      [   89.248426] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]'
      [   89.254700] CPU: 16 PID: 95 Comm: kworker/u130:0 Not tainted 4.20.0-rc5+ #1
      [   89.261665] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018
      [   89.268903] Workqueue: events_unbound async_run_entry_fn
      [   89.274222] Call trace:
      [   89.276680]  dump_backtrace+0x0/0x2c8
      [   89.280348]  show_stack+0x24/0x30
      [   89.283671]  dump_stack+0x118/0x19c
      [   89.287167]  ubsan_epilogue+0x14/0x60
      [   89.290835]  __ubsan_handle_out_of_bounds+0xfc/0x13c
      [   89.295828]  MR_LdRaidGet+0x50/0x58 [megaraid_sas]
      [   89.300638]  megasas_build_io_fusion+0xbb8/0xd90 [megaraid_sas]
      [   89.306576]  megasas_build_and_issue_cmd_fusion+0x138/0x460 [megaraid_sas]
      [   89.313468]  megasas_queue_command+0x398/0x3d0 [megaraid_sas]
      [   89.319222]  scsi_dispatch_cmd+0x1dc/0x8a8
      [   89.323321]  scsi_request_fn+0x8e8/0xdd0
      [   89.327249]  __blk_run_queue+0xc4/0x158
      [   89.331090]  blk_execute_rq_nowait+0xf4/0x158
      [   89.335449]  blk_execute_rq+0xdc/0x158
      [   89.339202]  __scsi_execute+0x130/0x258
      [   89.343041]  scsi_probe_and_add_lun+0x2fc/0x1488
      [   89.347661]  __scsi_scan_target+0x1cc/0x8c8
      [   89.351848]  scsi_scan_channel.part.3+0x8c/0xc0
      [   89.356382]  scsi_scan_host_selected+0x130/0x1f0
      [   89.361002]  do_scsi_scan_host+0xd8/0xf0
      [   89.364927]  do_scan_async+0x9c/0x320
      [   89.368594]  async_run_entry_fn+0x138/0x420
      [   89.372780]  process_one_work+0x61c/0xf08
      [   89.376793]  worker_thread+0x13c/0xa70
      [   89.380546]  kthread+0x1c8/0x1d0
      [   89.383778]  ret_from_fork+0x10/0x1c
      
      This happens because, when the Driver Map is populated from the firmware
      RAID map, the ldTgtIdToLd entry of every non-existing VD is set to 0xff so
      that the VD can be skipped later.
      
      From drivers/scsi/megaraid/megaraid_sas_base.c ,
      memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS);
      
      From drivers/scsi/megaraid/megaraid_sas_fp.c ,
      /* For non existing VDs, iterate to next VD*/
      if (ld >= (MAX_LOGICAL_DRIVES_EXT - 1))
      	continue;
      
      However, a few places failed to skip those non-existing VDs due to
      off-by-one errors. The 0xff value then leaked into MR_LdRaidGet(0xff, map)
      and triggered the out-of-bounds accesses.
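
The off-by-one described above can be illustrated with a minimal, standalone sketch. It is not the driver code: the `SKETCH_*` constants and both helper functions are hypothetical, the value 256 is assumed for illustration, and the "buggy" comparison is only one plausible shape of such an error.

```c
#include <stdbool.h>

/* Illustrative values, assumed for this sketch rather than taken from the driver headers. */
#define SKETCH_MAX_LD_EXT 256   /* stands in for MAX_LOGICAL_DRIVES_EXT */
#define SKETCH_LD_ABSENT  0xff  /* marker written by the memset() quoted above */

/* One plausible off-by-one: 0xff (255) is not >= 256, so the marker passes as a valid LD. */
static bool sketch_ld_is_absent_buggy(unsigned int ld)
{
	return ld >= SKETCH_MAX_LD_EXT;
}

/* Same shape as the check quoted above: 0xff (255) >= 255, so the marker is skipped. */
static bool sketch_ld_is_absent_fixed(unsigned int ld)
{
	return ld >= (SKETCH_MAX_LD_EXT - 1);
}
```

With the buggy comparison, `SKETCH_LD_ABSENT` is treated as a real logical drive and would be used as an array index; with the fixed comparison it is skipped, while small indices such as 0 or 1 are still accepted.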
      
      Fixes: 51087a86 ("megaraid_sas : Extended VD support")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      00886ceb
    • Yanjiang Jin's avatar
      scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown() · d640fb10
      Yanjiang Jin authored
      [ Upstream commit e57b2945 ]
      
      We must free all IRQs during shutdown; otherwise the second kernel started
      by kexec hangs in pqi_wait_for_completion_io(), as shown below:
      
      Call trace:
      
       pqi_wait_for_completion_io
       pqi_submit_raid_request_synchronous.constprop.78+0x23c/0x310 [smartpqi]
       pqi_configure_events+0xec/0x1f8 [smartpqi]
       pqi_ctrl_init+0x814/0xca0 [smartpqi]
       pqi_pci_probe+0x400/0x46c [smartpqi]
       local_pci_probe+0x48/0xb0
       pci_device_probe+0x14c/0x1b0
       really_probe+0x218/0x3fc
       driver_probe_device+0x70/0x140
       __driver_attach+0x11c/0x134
       bus_for_each_dev+0x70/0xc8
       driver_attach+0x30/0x38
       bus_add_driver+0x1f0/0x294
       driver_register+0x74/0x12c
       __pci_register_driver+0x64/0x70
       pqi_init+0xd0/0x10000 [smartpqi]
       do_one_initcall+0x60/0x1d8
       do_init_module+0x64/0x1f8
       load_module+0x10ec/0x1350
       __se_sys_finit_module+0xd4/0x100
       __arm64_sys_finit_module+0x28/0x34
       el0_svc_handler+0x104/0x160
       el0_svc+0x8/0xc
      
      This happens only when the following conditions are combined:

      1. smartpqi is built as a module, not built-in;
      2. A disk is connected to the smartpqi card;
      3. Both the first and the second kexec kernels use this disk as the rootfs
         mount point.

      Signed-off-by: Yanjiang Jin <yanjiang.jin@hxt-semitech.com>
      Acked-by: Don Brace <don.brace@microsemi.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      d640fb10
    • Zhi Chen's avatar
      ath10k: fix peer stats null pointer dereference · dd619b90
      Zhi Chen authored
      [ Upstream commit 2d3b5585 ]
      
      On SMP systems there was a race condition in which an ath10k_peer had
      been created but its member sta was still null. The following shows how
      an ath10k_peer is created and how sta is accessed in the peer statistics
      path.
      
          1. Peer creation:
              ath10k_peer_create()
                  =>ath10k_wmi_peer_create()
                      =>ath10k_wait_for_peer_created()
                      ...
      
              # another kernel path, RX from firmware
              ath10k_htt_t2h_msg_handler()
              =>ath10k_peer_map_event()
                      =>wake_up()
                      # ar->peer_map[id] = peer //add peer to map
      
              #wake up original path from waiting
                      ...
                      # peer->sta = sta //sta assignment
      
          2.  RX path of statistics
              ath10k_htt_t2h_msg_handler()
                  =>ath10k_update_per_peer_tx_stats()
                      =>ath10k_htt_fetch_peer_stats()
                      # peer->sta //sta accessing
      
      Any access to peer->sta after the peer was added to peer_map but before
      sta was assigned could cause a null pointer dereference. Because these two
      steps are asynchronous, no proper lock can protect them, so both peer and
      sta need to be checked before access.
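
The check described above can be sketched as follows. This is a simplified, single-threaded model, not the ath10k code: `struct sketch_peer`, `struct sketch_sta`, and `sketch_update_peer_tx_stats()` are hypothetical names, and the locking the real driver needs around the peer lookup is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the station and peer objects. */
struct sketch_sta {
	unsigned long tx_packets;
};

struct sketch_peer {
	struct sketch_sta *sta;   /* may still be NULL right after the peer is mapped */
};

/*
 * Update per-peer TX statistics only when both the peer and its sta exist.
 * Returns true if the statistics were updated, false if the update was skipped.
 */
static bool sketch_update_peer_tx_stats(struct sketch_peer *peer, unsigned long pkts)
{
	if (!peer || !peer->sta)
		return false;   /* peer not found, or sta not assigned yet */

	peer->sta->tx_packets += pkts;
	return true;
}
```

A statistics event that arrives in the window between the peer being added to the map and sta being assigned is simply dropped in this model, which trades a lost sample for avoiding the dereference.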
      
      Tested: QCA9984 with firmware ver 10.4-3.9.0.1-00005
      Signed-off-by: Zhi Chen <zhichen@codeaurora.org>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      dd619b90
    • Kevin Barnett's avatar
      scsi: smartpqi: correct lun reset issues · ca8ad9bc
      Kevin Barnett authored
      [ Upstream commit 2ba55c98 ]
      
      Problem:
      The Linux kernel takes a logical volume offline after a LUN reset.  This is
      generally accompanied by this message in the dmesg output:
      
      Device offlined - not ready after error recovery
      
      Root Cause:
      The root cause is a "quirk" in the timeout handling in the Linux SCSI
      layer. The Linux kernel places a 30-second timeout on most media access
      commands (reads and writes) that it sends to device drivers.  When a media
      access command times out, the Linux kernel goes into error recovery mode
      for the LUN that was the target of the command that timed out. Every
      command that timed out is kept on a list inside of the Linux kernel to be
      retried later. The kernel attempts to recover the command(s) that timed out
      by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and
      TEST UNIT READY commands are successful, the kernel retries the command(s)
      that timed out.
      
      Each SCSI command issued by the kernel has a result field associated with
      it. This field indicates the final result of the command (success or
      error). When a command times out, the kernel places a value in this result
      field indicating that the command timed out.
      
      The "quirk" is that after the LUN reset and TEST UNIT READY commands are
      completed, the kernel checks each command on the timed-out command list
      before retrying it. If the result field is still "timed out", the kernel
      treats that command as not having been successfully recovered for a
      retry. If more than two commands are in this state, the kernel takes the
      LUN offline.
      
      Fix:
      When our RAIDStack receives a LUN reset, it simply waits until all
      outstanding commands complete. Generally, all of these outstanding commands
      complete successfully. Therefore, the fix in the smartpqi driver is to
      always set the command result field to indicate success when a request
      completes successfully. This normally isn’t necessary because the result
      field is always initialized to success when the command is submitted to the
      driver. So when the command completes successfully, the result field is
      left untouched. But in this case, the kernel changes the result field
      behind the driver’s back and then expects the field to be changed by the
      driver as the commands that timed out complete.
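
The behaviour described above can be modelled with a small standalone sketch. It is not the smartpqi code: `struct sketch_cmd`, the `SKETCH_RESULT_*` values, and all three functions are hypothetical and exist only to show why a result field that is left untouched on success keeps a stale "timed out" value.

```c
#include <stdbool.h>

/* Illustrative result codes, assumed for this sketch. */
#define SKETCH_RESULT_OK        0
#define SKETCH_RESULT_ERROR     1
#define SKETCH_RESULT_TIMED_OUT 2

struct sketch_cmd {
	int result;   /* initialised to SKETCH_RESULT_OK at submission */
};

/* Models the kernel overwriting the result field when the command times out. */
static void sketch_mark_timed_out(struct sketch_cmd *cmd)
{
	cmd->result = SKETCH_RESULT_TIMED_OUT;
}

/* Old behaviour: on success the field is left untouched, so a stale value survives. */
static void sketch_complete_old(struct sketch_cmd *cmd, bool ok)
{
	if (!ok)
		cmd->result = SKETCH_RESULT_ERROR;
}

/* Fixed behaviour: the field is always written, including on success. */
static void sketch_complete_fixed(struct sketch_cmd *cmd, bool ok)
{
	cmd->result = ok ? SKETCH_RESULT_OK : SKETCH_RESULT_ERROR;
}
```

In this model, a command that times out and later completes successfully still reads "timed out" after `sketch_complete_old()`, but reads "success" after `sketch_complete_fixed()`.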

      Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
      Reviewed-by: Scott Teel <scott.teel@microsemi.com>
      Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com>
      Signed-off-by: Don Brace <don.brace@microsemi.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ca8ad9bc