1. 22 Jun, 2019 30 commits
    • Willem de Bruijn's avatar
      net: correct udp zerocopy refcnt also when zerocopy only on append · 0ab34cf2
      Willem de Bruijn authored
      [ Upstream commit 522924b5 ]
      
      The below patch fixes an incorrect zerocopy refcnt increment when
      appending with MSG_MORE to an existing zerocopy udp skb.
      
        send(.., MSG_ZEROCOPY | MSG_MORE);	// refcnt 1
        send(.., MSG_ZEROCOPY | MSG_MORE);	// refcnt still 1 (bar frags)
      
      But it missed that zerocopy need not be passed at the first send. The
      right test whether the uarg is newly allocated and thus has extra
      refcnt 1 is not !skb, but !skb_zcopy.
      
        send(.., MSG_MORE);			// <no uarg>
        send(.., MSG_ZEROCOPY);		// refcnt 1
      
      Fixes: 100f6d8e ("net: correct zerocopy refcnt with udp MSG_MORE")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ab34cf2
    • Eli Britstein's avatar
      net/mlx5e: Support tagged tunnel over bond · 284bc559
      Eli Britstein authored
      Stacked devices like bond interface may have a VLAN device on top of
      them. Detect lag state correctly under this condition, and return the
      correct routed net device, according to it the encap header is built.
      
      Fixes: e32ee6c7 ("net/mlx5e: Support tunnel encap over tagged Ethernet")
      Signed-off-by: default avatarEli Britstein <elibr@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      284bc559
    • Petr Machata's avatar
      mlxsw: spectrum_buffers: Reduce pool size on Spectrum-2 · 1bb4eb4e
      Petr Machata authored
      Due to an issue on Spectrum-2, in front-panel ports split four ways, 2 out
      of 32 port buffers cannot be used. To work around this, the next FW release
      will mark them as unused, and will report correspondingly lower total
      shared buffer size. mlxsw will pick up the new value through a query to
      cap_total_buffer_size resource. However the initial size for shared buffer
      pool 0 is hard-coded and therefore needs to be updated.
      
      Thus reduce the pool size by 2.7 MiB (which corresponds to 2/32 of the
      total size of 42 MiB), and round down to the whole number of cells.
      
      Fixes: fe099bf6 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
      Signed-off-by: default avatarPetr Machata <petrm@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bb4eb4e
    • Raed Salem's avatar
      net/mlx5e: Fix source port matching in fdb peer flow rule · b43417db
      Raed Salem authored
      The cited commit changed the initialization placement of the eswitch
      attributes so it is done prior to parse tc actions function call,
      including among others the in_rep and in_mdev fields which are mistakenly
      reassigned inside the parse actions function.
      
      This breaks the source port matching criteria of the peer redirect rule.
      
      Fix by removing the now redundant reassignment of the already initialized
      fields.
      
      Fixes: 988ab9c7 ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper")
      Signed-off-by: default avatarRaed Salem <raeds@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b43417db
    • Jiri Pirko's avatar
      mlxsw: spectrum_flower: Fix TOS matching · acec2a39
      Jiri Pirko authored
      The TOS value was not extracted correctly. Fix it.
      
      Fixes: 87996f91 ("mlxsw: spectrum_flower: Add support for ip tos")
      Reported-by: default avatarAlexander Petrovskiy <alexpe@mellanox.com>
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      acec2a39
    • Chris Mi's avatar
      net/mlx5e: Add ndo_set_feature for uplink representor · dd3690f2
      Chris Mi authored
      After we have a dedicated uplink representor, the new netdev ops
      doesn't support ndo_set_feature. Because of that, we can't change
      some features, eg. rxvlan. Now add it back.
      
      In this patch, I also do a cleanup for the features flag handling,
      eg. remove duplicate NETIF_F_HW_TC flag setting.
      
      Fixes: aec002f6 ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode")
      Signed-off-by: default avatarChris Mi <chrism@mellanox.com>
      Reviewed-by: default avatarRoi Dayan <roid@mellanox.com>
      Reviewed-by: default avatarVlad Buslov <vladbu@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd3690f2
    • Ido Schimmel's avatar
      mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes dead · 01bf40c9
      Ido Schimmel authored
      The driver tries to periodically refresh neighbours that are used to
      reach nexthops. This is done by periodically calling neigh_event_send().
      
      However, if the neighbour becomes dead, there is nothing we can do to
      return it to a connected state and the above function call is basically
      a NOP.
      
      This results in the nexthop never being written to the device's
      adjacency table and therefore never used to forward packets.
      
      Fix this by dropping our reference from the dead neighbour and
      associating the nexthop with a new neigbhour which we will try to
      refresh.
      
      Fixes: a7ff87ac ("mlxsw: spectrum_router: Implement next-hop routing")
      Signed-off-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Reported-by: default avatarAlex Veber <alexve@mellanox.com>
      Tested-by: default avatarAlex Veber <alexve@mellanox.com>
      Acked-by: default avatarJiri Pirko <jiri@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      01bf40c9
    • Edward Srouji's avatar
      net/mlx5: Update pci error handler entries and command translation · 37afcc01
      Edward Srouji authored
      Add missing entries for create/destroy UCTX and UMEM commands.
      This could get us wrong "unknown FW command" error in flows
      where we unbind the device or reset the driver.
      
      Also the translation of these commands from opcodes to string
      was missing.
      
      Fixes: 6e3722ba ("IB/mlx5: Use the correct commands for UMEM and UCTX allocation")
      Signed-off-by: default avatarEdward Srouji <edwards@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      37afcc01
    • Maxime Chevallier's avatar
      net: ethtool: Allow matching on vlan DEI bit · a21291c7
      Maxime Chevallier authored
      [ Upstream commit f0d2ca15 ]
      
      Using ethtool, users can specify a classification action matching on the
      full vlan tag, which includes the DEI bit (also previously called CFI).
      
      However, when converting the ethool_flow_spec to a flow_rule, we use
      dissector keys to represent the matching patterns.
      
      Since the vlan dissector key doesn't include the DEI bit, this
      information was silently discarded when translating the ethtool
      flow spec in to a flow_rule.
      
      This commit adds the DEI bit into the vlan dissector key, and allows
      propagating the information to the driver when parsing the ethtool flow
      spec.
      
      Fixes: eca4205f ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
      Reported-by: default avatarMichał Mirosław <mirq-linux@rere.qmqm.pl>
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a21291c7
    • Robert Hancock's avatar
      net: dsa: microchip: Don't try to read stats for unused ports · 02b6efb8
      Robert Hancock authored
      [ Upstream commit 6bb9e376 ]
      
      If some of the switch ports were not listed in the device tree, due to
      being unused, the ksz_mib_read_work function ended up accessing a NULL
      dp->slave pointer and causing an oops. Skip checking statistics for any
      unused ports.
      
      Fixes: 7c6ff470 ("net: dsa: microchip: add MIB counter reading support")
      Signed-off-by: default avatarRobert Hancock <hancock@sedsystems.ca>
      Reviewed-by: default avatarVivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      02b6efb8
    • Maxime Chevallier's avatar
      net: mvpp2: prs: Use the correct helpers when removing all VID filters · f61bd3c7
      Maxime Chevallier authored
      [ Upstream commit 6b7a3430 ]
      
      When removing all VID filters, the mvpp2_prs_vid_entry_remove would be
      called with the TCAM id incorrectly used as a VID, causing the wrong
      TCAM entries to be invalidated.
      
      Fix this by directly invalidating entries in the VID range.
      
      Fixes: 56beda3d ("net: mvpp2: Add hardware offloading for VLAN filtering")
      Suggested-by: default avatarYuri Chipchev <yuric@marvell.com>
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f61bd3c7
    • Maxime Chevallier's avatar
      net: mvpp2: prs: Fix parser range for VID filtering · 60c653e2
      Maxime Chevallier authored
      [ Upstream commit 46b0090a ]
      
      VID filtering is implemented in the Header Parser, with one range of 11
      vids being assigned for each no-loopback port.
      
      Make sure we use the per-port range when looking for existing entries in
      the Parser.
      
      Since we used a global range instead of a per-port one, this causes VIDs
      to be removed from the whitelist from all ports of the same PPv2
      instance.
      
      Fixes: 56beda3d ("net: mvpp2: Add hardware offloading for VLAN filtering")
      Suggested-by: default avatarYuri Chipchev <yuric@marvell.com>
      Signed-off-by: default avatarMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60c653e2
    • Stefano Brivio's avatar
      geneve: Don't assume linear buffers in error handler · af1c05e6
      Stefano Brivio authored
      [ Upstream commit eccc73a6 ]
      
      In commit a0796644 ("geneve: ICMP error lookup handler") I wrongly
      assumed buffers from icmp_socket_deliver() would be linear. This is not
      the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear
      data.
      
      Eric fixed this same issue for fou and fou6 in commits 26fc181e
      ("fou, fou6: do not assume linear skbs") and 5355ed63 ("fou, fou6:
      avoid uninit-value in gue_err() and gue6_err()").
      
      Use pskb_may_pull() instead of checking skb->len, and take into account
      the fact we later access the GENEVE header with udp_hdr(), so we also
      need to sum skb_transport_header() here.
      Reported-by: default avatarGuillaume Nault <gnault@redhat.com>
      Fixes: a0796644 ("geneve: ICMP error lookup handler")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af1c05e6
    • Stefano Brivio's avatar
      vxlan: Don't assume linear buffers in error handler · 0cb7d324
      Stefano Brivio authored
      [ Upstream commit 8399a693 ]
      
      In commit c3a43b9f ("vxlan: ICMP error lookup handler") I wrongly
      assumed buffers from icmp_socket_deliver() would be linear. This is not
      the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear
      data.
      
      Eric fixed this same issue for fou and fou6 in commits 26fc181e
      ("fou, fou6: do not assume linear skbs") and 5355ed63 ("fou, fou6:
      avoid uninit-value in gue_err() and gue6_err()").
      
      Use pskb_may_pull() instead of checking skb->len, and take into account
      the fact we later access the VXLAN header with udp_hdr(), so we also
      need to sum skb_transport_header() here.
      Reported-by: default avatarGuillaume Nault <gnault@redhat.com>
      Fixes: c3a43b9f ("vxlan: ICMP error lookup handler")
      Signed-off-by: default avatarStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0cb7d324
    • Alaa Hleihel's avatar
      net/mlx5: Avoid reloading already removed devices · 251c58f8
      Alaa Hleihel authored
      Prior to reloading a device we must first verify that it was not already
      removed. Otherwise, the attempt to remove the device will do nothing, and
      in that case we will end up proceeding with adding an new device that no
      one was expecting to remove, leaving behind used resources such as EQs that
      causes a failure to destroy comp EQs and syndrome (0x30f433).
      
      Fix that by making sure that we try to remove and add a device (based on a
      protocol) only if the device is already added.
      
      Fixes: c5447c70 ("net/mlx5: E-Switch, Reload IB interface when switching devlink modes")
      Signed-off-by: default avatarAlaa Hleihel <alaa@mellanox.com>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      251c58f8
    • Stephen Barber's avatar
      vsock/virtio: set SOCK_DONE on peer shutdown · 8882d7e9
      Stephen Barber authored
      [ Upstream commit 42f5cda5 ]
      
      Set the SOCK_DONE flag to match the TCP_CLOSING state when a peer has
      shut down and there is nothing left to read.
      
      This fixes the following bug:
      1) Peer sends SHUTDOWN(RDWR).
      2) Socket enters TCP_CLOSING but SOCK_DONE is not set.
      3) read() returns -ENOTCONN until close() is called, then returns 0.
      Signed-off-by: default avatarStephen Barber <smbarber@chromium.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8882d7e9
    • Xin Long's avatar
      tipc: purge deferredq list for each grp member in tipc_group_delete · 72788b85
      Xin Long authored
      [ Upstream commit 5cf02612 ]
      
      Syzbot reported a memleak caused by grp members' deferredq list not
      purged when the grp is be deleted.
      
      The issue occurs when more(msg_grp_bc_seqno(hdr), m->bc_rcv_nxt) in
      tipc_group_filter_msg() and the skb will stay in deferredq.
      
      So fix it by calling __skb_queue_purge for each member's deferredq
      in tipc_group_delete() when a tipc sk leaves the grp.
      
      Fixes: b87a5ea3 ("tipc: guarantee group unicast doesn't bypass group broadcast")
      Reported-by: syzbot+78fbe679c8ca8d264a8d@syzkaller.appspotmail.com
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarYing Xue <ying.xue@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72788b85
    • John Paul Adrian Glaubitz's avatar
      sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg · aecf98cd
      John Paul Adrian Glaubitz authored
      [ Upstream commit 07a6d63e ]
      
      In d5a2aa24, the name in struct console sunhv_console was changed from "ttyS"
      to "ttyHV" while the name in struct uart_ops sunhv_pops remained unchanged.
      
      This results in the hypervisor console device to be listed as "ttyHV0" under
      /proc/consoles while the device node is still named "ttyS0":
      
      root@osaka:~# cat /proc/consoles
      ttyHV0               -W- (EC p  )    4:64
      tty0                 -WU (E     )    4:1
      root@osaka:~# readlink /sys/dev/char/4:64
      ../../devices/root/f02836f0/f0285690/tty/ttyS0
      root@osaka:~#
      
      This means that any userland code which tries to determine the name of the
      device file of the hypervisor console device can not rely on the information
      provided by /proc/consoles. In particular, booting current versions of debian-
      installer inside a SPARC LDOM will fail with the installer unable to determine
      the console device.
      
      After renaming the device in struct uart_ops sunhv_pops to "ttyHV" as well,
      the inconsistency is fixed and it is possible again to determine the name
      of the device file of the hypervisor console device by reading the contents
      of /proc/console:
      
      root@osaka:~# cat /proc/consoles
      ttyHV0               -W- (EC p  )    4:64
      tty0                 -WU (E     )    4:1
      root@osaka:~# readlink /sys/dev/char/4:64
      ../../devices/root/f02836f0/f0285690/tty/ttyHV0
      root@osaka:~#
      
      With this change, debian-installer works correctly when installing inside
      a SPARC LDOM.
      Signed-off-by: default avatarJohn Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aecf98cd
    • Neil Horman's avatar
      sctp: Free cookie before we memdup a new one · 9cb8eddf
      Neil Horman authored
      [ Upstream commit ce950f10 ]
      
      Based on comments from Xin, even after fixes for our recent syzbot
      report of cookie memory leaks, its possible to get a resend of an INIT
      chunk which would lead to us leaking cookie memory.
      
      To ensure that we don't leak cookie memory, free any previously
      allocated cookie first.
      
      Change notes
      v1->v2
      update subsystem tag in subject (davem)
      repeat kfree check for peer_random and peer_hmacs (xin)
      
      v2->v3
      net->sctp
      also free peer_chunks
      
      v3->v4
      fix subject tags
      
      v4->v5
      remove cut line
      Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
      CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      CC: Xin Long <lucien.xin@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9cb8eddf
    • Young Xiao's avatar
      nfc: Ensure presence of required attributes in the deactivate_target handler · 4c85012b
      Young Xiao authored
      [ Upstream commit 385097a3 ]
      
      Check that the NFC_ATTR_TARGET_INDEX attributes (in addition to
      NFC_ATTR_DEVICE_INDEX) are provided by the netlink client prior to
      accessing them. This prevents potential unhandled NULL pointer dereference
      exceptions which can be triggered by malicious user-mode programs,
      if they omit one or both of these attributes.
      Signed-off-by: default avatarYoung Xiao <92siuyang@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4c85012b
    • John Fastabend's avatar
      net: tls, correctly account for copied bytes with multiple sk_msgs · 20d3d270
      John Fastabend authored
      [ Upstream commit 648ee6ce ]
      
      tls_sw_do_sendpage needs to return the total number of bytes sent
      regardless of how many sk_msgs are allocated. Unfortunately, copied
      (the value we return up the stack) is zero'd before each new sk_msg
      is allocated so we only return the copied size of the last sk_msg used.
      
      The caller (splice, etc.) of sendpage will then believe only part
      of its data was sent and send the missing chunks again. However,
      because the data actually was sent the receiver will get multiple
      copies of the same data.
      
      To reproduce this do multiple sendfile calls with a length close to
      the max record size. This will in turn call splice/sendpage, sendpage
      may use multiple sk_msg in this case and then returns the incorrect
      number of bytes. This will cause splice to resend creating duplicate
      data on the receiver. Andre created a C program that can easily
      generate this case so we will push a similar selftest for this to
      bpf-next shortly.
      
      The fix is to _not_ zero the copied field so that the total sent
      bytes is returned.
      Reported-by: default avatarSteinar H. Gunderson <steinar+kernel@gunderson.no>
      Reported-by: default avatarAndre Tomt <andre@tomt.net>
      Tested-by: default avatarAndre Tomt <andre@tomt.net>
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      20d3d270
    • Taehee Yoo's avatar
      net: openvswitch: do not free vport if register_netdevice() is failed. · 39ea88e3
      Taehee Yoo authored
      [ Upstream commit 309b6697 ]
      
      In order to create an internal vport, internal_dev_create() is used and
      that calls register_netdevice() internally.
      If register_netdevice() fails, it calls dev->priv_destructor() to free
      private data of netdev. actually, a private data of this is a vport.
      
      Hence internal_dev_create() should not free and use a vport after failure
      of register_netdevice().
      
      Test command
          ovs-dpctl add-dp bonding_masters
      
      Splat looks like:
      [ 1035.667767] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [ 1035.675958] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 1035.676916] CPU: 1 PID: 1028 Comm: ovs-vswitchd Tainted: G    B             5.2.0-rc3+ #240
      [ 1035.676916] RIP: 0010:internal_dev_create+0x2e5/0x4e0 [openvswitch]
      [ 1035.676916] Code: 48 c1 ea 03 80 3c 02 00 0f 85 9f 01 00 00 4c 8b 23 48 b8 00 00 00 00 00 fc ff df 49 8d bc 24 60 05 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 86 01 00 00 49 8b bc 24 60 05 00 00 e8 e4 68 f4
      [ 1035.713720] RSP: 0018:ffff88810dcb7578 EFLAGS: 00010206
      [ 1035.713720] RAX: dffffc0000000000 RBX: ffff88810d13fe08 RCX: ffffffff84297704
      [ 1035.713720] RDX: 00000000000000ac RSI: 0000000000000000 RDI: 0000000000000560
      [ 1035.713720] RBP: 00000000ffffffef R08: fffffbfff0d3b881 R09: fffffbfff0d3b881
      [ 1035.713720] R10: 0000000000000001 R11: fffffbfff0d3b880 R12: 0000000000000000
      [ 1035.768776] R13: 0000607ee460b900 R14: ffff88810dcb7690 R15: ffff88810dcb7698
      [ 1035.777709] FS:  00007f02095fc980(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000
      [ 1035.777709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1035.777709] CR2: 00007ffdf01d2f28 CR3: 0000000108258000 CR4: 00000000001006e0
      [ 1035.777709] Call Trace:
      [ 1035.777709]  ovs_vport_add+0x267/0x4f0 [openvswitch]
      [ 1035.777709]  new_vport+0x15/0x1e0 [openvswitch]
      [ 1035.777709]  ovs_vport_cmd_new+0x567/0xd10 [openvswitch]
      [ 1035.777709]  ? ovs_dp_cmd_dump+0x490/0x490 [openvswitch]
      [ 1035.777709]  ? __kmalloc+0x131/0x2e0
      [ 1035.777709]  ? genl_family_rcv_msg+0xa54/0x1030
      [ 1035.777709]  genl_family_rcv_msg+0x63a/0x1030
      [ 1035.777709]  ? genl_unregister_family+0x630/0x630
      [ 1035.841681]  ? debug_show_all_locks+0x2d0/0x2d0
      [ ... ]
      
      Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      Signed-off-by: default avatarTaehee Yoo <ap420073@gmail.com>
      Reviewed-by: default avatarGreg Rose <gvrose8192@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39ea88e3
    • Linus Walleij's avatar
      net: dsa: rtl8366: Fix up VLAN filtering · 22da034c
      Linus Walleij authored
      [ Upstream commit 760c80b7 ]
      
      We get this regression when using RTL8366RB as part of a bridge
      with OpenWrt:
      
      WARNING: CPU: 0 PID: 1347 at net/switchdev/switchdev.c:291
      	 switchdev_port_attr_set_now+0x80/0xa4
      lan0: Commit of attribute (id=7) failed.
      (...)
      realtek-smi switch lan0: failed to initialize vlan filtering on this port
      
      This is because it is trying to disable VLAN filtering
      on VLAN0, as we have forgot to add 1 to the port number
      to get the right VLAN in rtl8366_vlan_filtering(): when
      we initialize the VLAN we associate VLAN1 with port 0,
      VLAN2 with port 1 etc, so we need to add 1 to the port
      offset.
      
      Fixes: d8652956 ("net: dsa: realtek-smi: Add Realtek SMI driver")
      Signed-off-by: default avatarLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      22da034c
    • Eric Dumazet's avatar
      neigh: fix use-after-free read in pneigh_get_next · d97ce9c7
      Eric Dumazet authored
      [ Upstream commit f3e92cb8 ]
      
      Nine years ago, I added RCU handling to neighbours, not pneighbours.
      (pneigh are not commonly used)
      
      Unfortunately I missed that /proc dump operations would use a
      common entry and exit point : neigh_seq_start() and neigh_seq_stop()
      
      We need to read_lock(tbl->lock) or risk use-after-free while
      iterating the pneigh structures.
      
      We might later convert pneigh to RCU and revert this patch.
      
      sysbot reported :
      
      BUG: KASAN: use-after-free in pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158
      Read of size 8 at addr ffff888097f2a700 by task syz-executor.0/9825
      
      CPU: 1 PID: 9825 Comm: syz-executor.0 Not tainted 5.2.0-rc4+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
       pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158
       neigh_seq_next+0xdb/0x210 net/core/neighbour.c:3240
       seq_read+0x9cf/0x1110 fs/seq_file.c:258
       proc_reg_read+0x1fc/0x2c0 fs/proc/inode.c:221
       do_loop_readv_writev fs/read_write.c:714 [inline]
       do_loop_readv_writev fs/read_write.c:701 [inline]
       do_iter_read+0x4a4/0x660 fs/read_write.c:935
       vfs_readv+0xf0/0x160 fs/read_write.c:997
       kernel_readv fs/splice.c:359 [inline]
       default_file_splice_read+0x475/0x890 fs/splice.c:414
       do_splice_to+0x127/0x180 fs/splice.c:877
       splice_direct_to_actor+0x2d2/0x970 fs/splice.c:954
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1063
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4592c9
      Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f4aab51dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00000000004592c9
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
      RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000080000000 R11: 0000000000000246 R12: 00007f4aab51e6d4
      R13: 00000000004c689d R14: 00000000004db828 R15: 00000000ffffffff
      
      Allocated by task 9827:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_kmalloc mm/kasan/common.c:489 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
       kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
       __do_kmalloc mm/slab.c:3660 [inline]
       __kmalloc+0x15c/0x740 mm/slab.c:3669
       kmalloc include/linux/slab.h:552 [inline]
       pneigh_lookup+0x19c/0x4a0 net/core/neighbour.c:731
       arp_req_set_public net/ipv4/arp.c:1010 [inline]
       arp_req_set+0x613/0x720 net/ipv4/arp.c:1026
       arp_ioctl+0x652/0x7f0 net/ipv4/arp.c:1226
       inet_ioctl+0x2a0/0x340 net/ipv4/af_inet.c:926
       sock_do_ioctl+0xd8/0x2f0 net/socket.c:1043
       sock_ioctl+0x3ed/0x780 net/socket.c:1194
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:509 [inline]
       do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696
       ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
       __do_sys_ioctl fs/ioctl.c:720 [inline]
       __se_sys_ioctl fs/ioctl.c:718 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 9824:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
       __cache_free mm/slab.c:3432 [inline]
       kfree+0xcf/0x220 mm/slab.c:3755
       pneigh_ifdown_and_unlock net/core/neighbour.c:812 [inline]
       __neigh_ifdown+0x236/0x2f0 net/core/neighbour.c:356
       neigh_ifdown+0x20/0x30 net/core/neighbour.c:372
       arp_ifdown+0x1d/0x21 net/ipv4/arp.c:1274
       inetdev_destroy net/ipv4/devinet.c:319 [inline]
       inetdev_event+0xa14/0x11f0 net/ipv4/devinet.c:1544
       notifier_call_chain+0xc2/0x230 kernel/notifier.c:95
       __raw_notifier_call_chain kernel/notifier.c:396 [inline]
       raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:403
       call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1749
       call_netdevice_notifiers_extack net/core/dev.c:1761 [inline]
       call_netdevice_notifiers net/core/dev.c:1775 [inline]
       rollback_registered_many+0x9b9/0xfc0 net/core/dev.c:8178
       rollback_registered+0x109/0x1d0 net/core/dev.c:8220
       unregister_netdevice_queue net/core/dev.c:9267 [inline]
       unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9260
       unregister_netdevice include/linux/netdevice.h:2631 [inline]
       __tun_detach+0xd8a/0x1040 drivers/net/tun.c:724
       tun_detach drivers/net/tun.c:741 [inline]
       tun_chr_close+0xe0/0x180 drivers/net/tun.c:3451
       __fput+0x2ff/0x890 fs/file_table.c:280
       ____fput+0x16/0x20 fs/file_table.c:313
       task_work_run+0x145/0x1c0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:185 [inline]
       exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168
       prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline]
       syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
       do_syscall_64+0x58e/0x680 arch/x86/entry/common.c:304
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff888097f2a700
       which belongs to the cache kmalloc-64 of size 64
      The buggy address is located 0 bytes inside of
       64-byte region [ffff888097f2a700, ffff888097f2a740)
      The buggy address belongs to the page:
      page:ffffea00025fca80 refcount:1 mapcount:0 mapping:ffff8880aa400340 index:0x0
      flags: 0x1fffc0000000200(slab)
      raw: 01fffc0000000200 ffffea000250d548 ffffea00025726c8 ffff8880aa400340
      raw: 0000000000000000 ffff888097f2a000 0000000100000020 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888097f2a600: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
       ffff888097f2a680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      >ffff888097f2a700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                         ^
       ffff888097f2a780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff888097f2a800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      
      Fixes: 767e97e1 ("neigh: RCU conversion of struct neighbour")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d97ce9c7
    • Jeremy Sowden's avatar
      lapb: fixed leak of control-blocks. · da930ec9
      Jeremy Sowden authored
      [ Upstream commit 6be8e297 ]
      
      lapb_register calls lapb_create_cb, which initializes the control-
      block's ref-count to one, and __lapb_insert_cb, which increments it when
      adding the new block to the list of blocks.
      
      lapb_unregister calls __lapb_remove_cb, which decrements the ref-count
      when removing control-block from the list of blocks, and calls lapb_put
      itself to decrement the ref-count before returning.
      
      However, lapb_unregister also calls __lapb_devtostruct to look up the
      right control-block for the given net_device, and __lapb_devtostruct
      also bumps the ref-count, which means that when lapb_unregister returns
      the ref-count is still 1 and the control-block is leaked.
      
      Call lapb_put after __lapb_devtostruct to fix leak.
      
      Reported-by: syzbot+afb980676c836b4a0afa@syzkaller.appspotmail.com
      Signed-off-by: default avatarJeremy Sowden <jeremy@azazel.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da930ec9
    • Eric Dumazet's avatar
      ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero · a7f938c9
      Eric Dumazet authored
      [ Upstream commit 65a3c497 ]
      
      Before taking a refcount, make sure the object is not already
      scheduled for deletion.
      
      Same fix is needed in ipv6_flowlabel_opt()
      
      Fixes: 18367681 ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a7f938c9
    • Haiyang Zhang's avatar
      hv_netvsc: Set probe mode to sync · 3514df79
      Haiyang Zhang authored
      [ Upstream commit 9a33629b ]
      
      For better consistency of synthetic NIC names, we set the probe mode to
      PROBE_FORCE_SYNCHRONOUS. So the names can be aligned with the vmbus
      channel offer sequence.
      
      Fixes: af0a5646 ("use the new async probing feature for the hyperv drivers")
      Signed-off-by: default avatarHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3514df79
    • Ivan Vecera's avatar
      be2net: Fix number of Rx queues used for flow hashing · 1a976c6c
      Ivan Vecera authored
      [ Upstream commit 718f4a25 ]
      
      Number of Rx queues used for flow hashing returned by the driver is
      incorrect and this bug prevents user to use the last Rx queue in
      indirection table.
      
      Let's say we have a NIC with 6 combined queues:
      
      [root@sm-03 ~]# ethtool -l enp4s0f0
      Channel parameters for enp4s0f0:
      Pre-set maximums:
      RX:             5
      TX:             5
      Other:          0
      Combined:       6
      Current hardware settings:
      RX:             0
      TX:             0
      Other:          0
      Combined:       6
      
      Default indirection table maps all (6) queues equally but the driver
      reports only 5 rings available.
      
      [root@sm-03 ~]# ethtool -x enp4s0f0
      RX flow hash indirection table for enp4s0f0 with 5 RX ring(s):
          0:      0     1     2     3     4     5     0     1
          8:      2     3     4     5     0     1     2     3
         16:      4     5     0     1     2     3     4     5
         24:      0     1     2     3     4     5     0     1
      ...
      
      Now change indirection table somehow:
      
      [root@sm-03 ~]# ethtool -X enp4s0f0 weight 1 1
      [root@sm-03 ~]# ethtool -x enp4s0f0
      RX flow hash indirection table for enp4s0f0 with 6 RX ring(s):
          0:      0     0     0     0     0     0     0     0
      ...
         64:      1     1     1     1     1     1     1     1
      ...
      
      Now it is not possible to change mapping back to equal (default) state:
      
      [root@sm-03 ~]# ethtool -X enp4s0f0 equal 6
      Cannot set RX flow hash configuration: Invalid argument
      
      Fixes: 594ad54a ("be2net: Add support for setting and getting rx flow hash options")
      Reported-by: default avatarTianhao <tizhao@redhat.com>
      Signed-off-by: default avatarIvan Vecera <ivecera@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a976c6c
    • Eric Dumazet's avatar
      ax25: fix inconsistent lock state in ax25_destroy_timer · 5f9c0c15
      Eric Dumazet authored
      [ Upstream commit d4d5d8e8 ]
      
      Before thread in process context uses bh_lock_sock()
      we must disable bh.
      
      sysbot reported :
      
      WARNING: inconsistent lock state
      5.2.0-rc3+ #32 Not tainted
      
      inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      blkid/26581 [HC0[0]:SC1[1]:HE1:SE0] takes:
      00000000e0da85ee (slock-AF_AX25){+.?.}, at: spin_lock include/linux/spinlock.h:338 [inline]
      00000000e0da85ee (slock-AF_AX25){+.?.}, at: ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275
      {SOFTIRQ-ON-W} state was registered at:
        lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303
        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
        _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
        spin_lock include/linux/spinlock.h:338 [inline]
        ax25_rt_autobind+0x3ca/0x720 net/ax25/ax25_route.c:429
        ax25_connect.cold+0x30/0xa4 net/ax25/af_ax25.c:1221
        __sys_connect+0x264/0x330 net/socket.c:1834
        __do_sys_connect net/socket.c:1845 [inline]
        __se_sys_connect net/socket.c:1842 [inline]
        __x64_sys_connect+0x73/0xb0 net/socket.c:1842
        do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      irq event stamp: 2272
      hardirqs last  enabled at (2272): [<ffffffff810065f3>] trace_hardirqs_on_thunk+0x1a/0x1c
      hardirqs last disabled at (2271): [<ffffffff8100660f>] trace_hardirqs_off_thunk+0x1a/0x1c
      softirqs last  enabled at (1522): [<ffffffff87400654>] __do_softirq+0x654/0x94c kernel/softirq.c:320
      softirqs last disabled at (2267): [<ffffffff81449010>] invoke_softirq kernel/softirq.c:374 [inline]
      softirqs last disabled at (2267): [<ffffffff81449010>] irq_exit+0x180/0x1d0 kernel/softirq.c:414
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(slock-AF_AX25);
        <Interrupt>
          lock(slock-AF_AX25);
      
       *** DEADLOCK ***
      
      1 lock held by blkid/26581:
       #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: lockdep_copy_map include/linux/lockdep.h:175 [inline]
       #0: 0000000010fd154d ((&ax25->dtimer)){+.-.}, at: call_timer_fn+0xe0/0x720 kernel/time/timer.c:1312
      
      stack backtrace:
      CPU: 1 PID: 26581 Comm: blkid Not tainted 5.2.0-rc3+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_usage_bug.cold+0x393/0x4a2 kernel/locking/lockdep.c:2935
       valid_state kernel/locking/lockdep.c:2948 [inline]
       mark_lock_irq kernel/locking/lockdep.c:3138 [inline]
       mark_lock+0xd46/0x1370 kernel/locking/lockdep.c:3513
       mark_irqflags kernel/locking/lockdep.c:3391 [inline]
       __lock_acquire+0x159f/0x5490 kernel/locking/lockdep.c:3745
       lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4303
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x2f/0x40 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:338 [inline]
       ax25_destroy_timer+0x53/0xc0 net/ax25/af_ax25.c:275
       call_timer_fn+0x193/0x720 kernel/time/timer.c:1322
       expire_timers kernel/time/timer.c:1366 [inline]
       __run_timers kernel/time/timer.c:1685 [inline]
       __run_timers kernel/time/timer.c:1653 [inline]
       run_timer_softirq+0x66f/0x1740 kernel/time/timer.c:1698
       __do_softirq+0x25c/0x94c kernel/softirq.c:293
       invoke_softirq kernel/softirq.c:374 [inline]
       irq_exit+0x180/0x1d0 kernel/softirq.c:414
       exiting_irq arch/x86/include/asm/apic.h:536 [inline]
       smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
       </IRQ>
      RIP: 0033:0x7f858d5c3232
      Code: 8b 61 08 48 8b 84 24 d8 00 00 00 4c 89 44 24 28 48 8b ac 24 d0 00 00 00 4c 8b b4 24 e8 00 00 00 48 89 7c 24 68 48 89 4c 24 78 <48> 89 44 24 58 8b 84 24 e0 00 00 00 89 84 24 84 00 00 00 8b 84 24
      RSP: 002b:00007ffcaf0cf5c0 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      RAX: 00007f858d7d27a8 RBX: 00007f858d7d8820 RCX: 00007f858d3940d8
      RDX: 00007ffcaf0cf798 RSI: 00000000f5e616f3 RDI: 00007f858d394fee
      RBP: 0000000000000000 R08: 00007ffcaf0cf780 R09: 00007f858d7db480
      R10: 0000000000000000 R11: 0000000009691a75 R12: 0000000000000005
      R13: 00000000f5e616f3 R14: 0000000000000000 R15: 00007ffcaf0cf798
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f9c0c15
    • Florian Westphal's avatar
      netfilter: nat: fix udp checksum corruption · f51ec54d
      Florian Westphal authored
      commit 6bac76db upstream.
      
      Due to copy&paste error nf_nat_mangle_udp_packet passes IPPROTO_TCP,
      resulting in incorrect udp checksum when payload had to be mangled.
      
      Fixes: dac3fe72 ("netfilter: nat: remove csum_recalc hook")
      Reported-by: default avatarMarc Haber <mh+netdev@zugschlus.de>
      Tested-by: default avatarMarc Haber <mh+netdev@zugschlus.de>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f51ec54d
  2. 19 Jun, 2019 10 commits
    • Greg Kroah-Hartman's avatar
      Linux 5.1.12 · 5752b504
      Greg Kroah-Hartman authored
      5752b504
    • Nicholas Piggin's avatar
      powerpc/64s: Fix THP PMD collapse serialisation · ee6688c1
      Nicholas Piggin authored
      commit 33258a1d upstream.
      
      Commit 1b2443a5 ("powerpc/book3s64: Avoid multiple endian
      conversion in pte helpers") changed the actual bitwise tests in
      pte_access_permitted by using pte_write() and pte_present() helpers
      rather than raw bitwise testing _PAGE_WRITE and _PAGE_PRESENT bits.
      
      The pte_present() change now returns true for PTEs which are
      !_PAGE_PRESENT and _PAGE_INVALID, which is the combination used by
      pmdp_invalidate() to synchronize access from lock-free lookups.
      pte_access_permitted() is used by pmd_access_permitted(), so allowing
      GUP lock free access to proceed with such PTEs breaks this
      synchronisation.
      
      This bug has been observed on a host using the hash page table MMU,
      with random crashes and corruption in guests, usually together with
      bad PMD messages in the host.
      
      Fix this by adding an explicit check in pmd_access_permitted(), and
      documenting the condition explicitly.
      
      The pte_write() change should be okay, and would prevent GUP from
      falling back to the slow path when encountering savedwrite PTEs, which
      matches what x86 (that does not implement savedwrite) does.
      
      Fixes: 1b2443a5 ("powerpc/book3s64: Avoid multiple endian conversion in pte helpers")
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: default avatarNicholas Piggin <npiggin@gmail.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee6688c1
    • Christophe Leroy's avatar
      powerpc: Fix kexec failure on book3s/32 · c3871b75
      Christophe Leroy authored
      commit 6c284228 upstream.
      
      In the old days, _PAGE_EXEC didn't exist on 6xx aka book3s/32.
      Therefore, allthough __mapin_ram_chunk() was already mapping kernel
      text with PAGE_KERNEL_TEXT and the rest with PAGE_KERNEL, the entire
      memory was executable. Part of the memory (first 512kbytes) was
      mapped with BATs instead of page table, but it was also entirely
      mapped as executable.
      
      In commit 385e89d5 ("powerpc/mm: add exec protection on
      powerpc 603"), we started adding exec protection to some 6xx, namely
      the 603, for pages mapped via pagetables.
      
      Then, in commit 63b2bc61 ("powerpc/mm/32s: Use BATs for
      STRICT_KERNEL_RWX"), the exec protection was extended to BAT mapped
      memory, so that really only the kernel text could be executed.
      
      The problem here is that kexec is based on copying some code into
      upper part of memory then executing it from there in order to install
      a fresh new kernel at its definitive location.
      
      However, the code is position independant and first part of it is
      just there to deactivate the MMU and jump to the second part. So it
      is possible to run this first part inplace instead of running the
      copy. Once the MMU is off, there is no protection anymore and the
      second part of the code will just run as before.
      Reported-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Fixes: 63b2bc61 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX")
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: default avatarChristophe Leroy <christophe.leroy@c-s.fr>
      Tested-by: default avatarAaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c3871b75
    • Jani Nikula's avatar
      drm: add fallback override/firmware EDID modes workaround · f14229b4
      Jani Nikula authored
      commit 48eaeb76 upstream.
      
      We've moved the override and firmware EDID (simply "override EDID" from
      now on) handling to the low level drm_do_get_edid() function in order to
      transparently use the override throughout the stack. The idea is that
      you get the override EDID via the ->get_modes() hook.
      
      Unfortunately, there are scenarios where the DDC probe in drm_get_edid()
      called via ->get_modes() fails, although the preceding ->detect()
      succeeds.
      
      In the case reported by Paul Wise, the ->detect() hook,
      intel_crt_detect(), relies on hotplug detect, bypassing the DDC. In the
      case reported by Ilpo Järvinen, there is no ->detect() hook, which is
      interpreted as connected. The subsequent DDC probe reached via
      ->get_modes() fails, and we don't even look at the override EDID,
      resulting in no modes being added.
      
      Because drm_get_edid() is used via ->detect() all over the place, we
      can't trivially remove the DDC probe, as it leads to override EDID
      effectively meaning connector forcing. The goal is that connector
      forcing and override EDID remain orthogonal.
      
      Generally, the underlying problem here is the conflation of ->detect()
      and ->get_modes() via drm_get_edid(). The former should just detect, and
      the latter should just get the modes, typically via reading the EDID. As
      long as drm_get_edid() is used in ->detect(), it needs to retain the DDC
      probe. Or such users need to have a separate DDC probe step first.
      
      The EDID caching between ->detect() and ->get_modes() done by some
      drivers is a further complication that prevents us from making
      drm_do_get_edid() adapt to the two cases.
      
      Work around the regression by falling back to a separate attempt at
      getting the override EDID at drm_helper_probe_single_connector_modes()
      level. With a working DDC and override EDID, it'll never be called; the
      override EDID will come via ->get_modes(). There will still be a failing
      DDC probe attempt in the cases that require the fallback.
      
      v2:
      - Call drm_connector_update_edid_property (Paul)
      - Update commit message about EDID caching (Daniel)
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107583Reported-by: default avatarPaul Wise <pabs3@bonedaddy.net>
      Cc: Paul Wise <pabs3@bonedaddy.net>
      References: http://mid.mail-archive.com/alpine.DEB.2.20.1905262211270.24390@whs-18.cs.helsinki.fiReported-by: default avatarIlpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
      Cc: Ilpo Järvinen <ilpo.jarvinen@cs.helsinki.fi>
      Suggested-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      References: 15f080f0 ("drm/edid: respect connector force for drm_get_edid ddc probe")
      Fixes: 53fd40a9 ("drm: handle override and firmware EDID at drm_do_get_edid() level")
      Cc: <stable@vger.kernel.org> # v4.15+ 56a2b7f2 drm/edid: abstract override/firmware EDID retrieval
      Cc: <stable@vger.kernel.org> # v4.15+
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Harish Chegondi <harish.chegondi@intel.com>
      Tested-by: default avatarPaul Wise <pabs3@bonedaddy.net>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190610093054.28445-1-jani.nikula@intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f14229b4
    • Jani Nikula's avatar
      drm/edid: abstract override/firmware EDID retrieval · dafaad67
      Jani Nikula authored
      commit 56a2b7f2 upstream.
      
      Abstract the debugfs override and the firmware EDID retrieval
      function. We'll be needing it in the follow-up. No functional changes.
      
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Harish Chegondi <harish.chegondi@intel.com>
      Reviewed-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Tested-by: default avatarTested-by: Paul Wise <pabs3@bonedaddy.net>
      Signed-off-by: default avatarJani Nikula <jani.nikula@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190607110513.12072-1-jani.nikula@intel.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dafaad67
    • Prarit Bhargava's avatar
      x86/resctrl: Prevent NULL pointer dereference when local MBM is disabled · 99287843
      Prarit Bhargava authored
      commit c7563e62 upstream.
      
      Booting with kernel parameter "rdt=cmt,mbmtotal,memlocal,l3cat,mba" and
      executing "mount -t resctrl resctrl -o mba_MBps /sys/fs/resctrl" results in
      a NULL pointer dereference on systems which do not have local MBM support
      enabled..
      
      BUG: kernel NULL pointer dereference, address: 0000000000000020
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      CPU: 0 PID: 722 Comm: kworker/0:3 Not tainted 5.2.0-0.rc3.git0.1.el7_UNSUPPORTED.x86_64 #2
      Workqueue: events mbm_handle_overflow
      RIP: 0010:mbm_handle_overflow+0x150/0x2b0
      
      Only enter the bandwith update loop if the system has local MBM enabled.
      
      Fixes: de73f38f ("x86/intel_rdt/mba_sc: Feedback loop to dynamically update mem bandwidth")
      Signed-off-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Reinette Chatre <reinette.chatre@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190610171544.13474-1-prarit@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99287843
    • Baoquan He's avatar
      x86/mm/KASLR: Compute the size of the vmemmap section properly · cb1d9463
      Baoquan He authored
      commit 00e5a2bb upstream.
      
      The size of the vmemmap section is hardcoded to 1 TB to support the
      maximum amount of system RAM in 4-level paging mode - 64 TB.
      
      However, 1 TB is not enough for vmemmap in 5-level paging mode. Assuming
      the size of struct page is 64 Bytes, to support 4 PB system RAM in 5-level,
      64 TB of vmemmap area is needed:
      
        4 * 1000^5 PB / 4096 bytes page size * 64 bytes per page struct / 1000^4 TB = 62.5 TB.
      
      This hardcoding may cause vmemmap to corrupt the following
      cpu_entry_area section, if KASLR puts vmemmap very close to it and the
      actual vmemmap size is bigger than 1 TB.
      
      So calculate the actual size of the vmemmap region needed and then align
      it up to 1 TB boundary.
      
      In 4-level paging mode it is always 1 TB. In 5-level it's adjusted on
      demand. The current code reserves 0.5 PB for vmemmap on 5-level. With
      this change, the space can be saved and thus used to increase entropy
      for the randomization.
      
       [ bp: Spell out how the 64 TB needed for vmemmap is computed and massage commit
         message. ]
      
      Fixes: eedb92ab ("x86/mm: Make virtual memory layout dynamic for CONFIG_X86_5LEVEL=y")
      Signed-off-by: default avatarBaoquan He <bhe@redhat.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarKees Cook <keescook@chromium.org>
      Acked-by: default avatarKirill A. Shutemov <kirill@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: kirill.shutemov@linux.intel.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190523025744.3756-1-bhe@redhat.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb1d9463
    • Andrey Ryabinin's avatar
      x86/kasan: Fix boot with 5-level paging and KASAN · 3fd63c38
      Andrey Ryabinin authored
      commit f3176ec9 upstream.
      
      Since commit d52888aa ("x86/mm: Move LDT remap out of KASLR region on
      5-level paging") kernel doesn't boot with KASAN on 5-level paging machines.
      The bug is actually in early_p4d_offset() and introduced by commit
      12a8cc7f ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
      
      early_p4d_offset() tries to convert pgd_val(*pgd) value to a physical
      address. This doesn't make sense because pgd_val() already contains the
      physical address.
      
      It did work prior to commit d52888aa because the result of
      "__pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK" was the same as "pgd_val(*pgd)
      & PTE_PFN_MASK". __pa_nodebug() just set some high bits which were masked
      out by applying PTE_PFN_MASK.
      
      After the change of the PAGE_OFFSET offset in commit d52888aa
      __pa_nodebug(pgd_val(*pgd)) started to return a value with more high bits
      set and PTE_PFN_MASK wasn't enough to mask out all of them. So it returns a
      wrong not even canonical address and crashes on the attempt to dereference
      it.
      
      Switch back to pgd_val() & PTE_PFN_MASK to cure the issue.
      
      Fixes: 12a8cc7f ("x86/kasan: Use the same shadow offset for 4- and 5-level paging")
      Reported-by: default avatarKirill A. Shutemov <kirill@shutemov.name>
      Signed-off-by: default avatarAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: kasan-dev@googlegroups.com
      Cc: stable@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20190614143149.2227-1-aryabinin@virtuozzo.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3fd63c38
    • Borislav Petkov's avatar
      x86/microcode, cpuhotplug: Add a microcode loader CPU hotplug callback · 7ce303bd
      Borislav Petkov authored
      commit 78f4e932 upstream.
      
      Adric Blake reported the following warning during suspend-resume:
      
        Enabling non-boot CPUs ...
        x86: Booting SMP configuration:
        smpboot: Booting Node 0 Processor 1 APIC 0x2
        unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0000000000000000) \
         at rIP: 0xffffffff8d267924 (native_write_msr+0x4/0x20)
        Call Trace:
         intel_set_tfa
         intel_pmu_cpu_starting
         ? x86_pmu_dead_cpu
         x86_pmu_starting_cpu
         cpuhp_invoke_callback
         ? _raw_spin_lock_irqsave
         notify_cpu_starting
         start_secondary
         secondary_startup_64
        microcode: sig=0x806ea, pf=0x80, revision=0x96
        microcode: updated to revision 0xb4, date = 2019-04-01
        CPU1 is up
      
      The MSR in question is MSR_TFA_RTM_FORCE_ABORT and that MSR is emulated
      by microcode. The log above shows that the microcode loader callback
      happens after the PMU restoration, leading to the conjecture that
      because the microcode hasn't been updated yet, that MSR is not present
      yet, leading to the #GP.
      
      Add a microcode loader-specific hotplug vector which comes before
      the PERF vectors and thus executes earlier and makes sure the MSR is
      present.
      
      Fixes: 400816f6 ("perf/x86/intel: Implement support for TSX Force Abort")
      Reported-by: default avatarAdric Blake <promarbler14@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Reviewed-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: x86@kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=203637Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ce303bd
    • Borislav Petkov's avatar
      RAS/CEC: Fix binary search function · 4a163bcf
      Borislav Petkov authored
      commit f3c74b38 upstream.
      
      Switch to using Donald Knuth's binary search algorithm (The Art of
      Computer Programming, vol. 3, section 6.2.1). This should've been done
      from the very beginning but the author must've been smoking something
      very potent at the time.
      
      The problem with the current one was that it would return the wrong
      element index in certain situations:
      
        https://lkml.kernel.org/r/CAM_iQpVd02zkVJ846cj-Fg1yUNuz6tY5q1Vpj4LrXmE06dPYYg@mail.gmail.com
      
      and the noodling code after the loop was fishy at best.
      
      So switch to using Knuth's binary search. The final result is much
      cleaner and straightforward.
      
      Fixes: 011d8261 ("RAS: Add a Corrected Errors Collector")
      Reported-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a163bcf