1. 30 Apr, 2020 15 commits
    • Merge tag 'mlx5-fixes-2020-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 81d6bc44
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2020-04-29
      
      This series introduces some fixes to the mlx5 driver.
      
      Please pull and let me know if there is any problem.
      
      v2:
       - Dropped the kTLS patch; Tariq needs to check whether it is fixable in the stack
      
      For -stable v4.12
       ('net/mlx5: Fix forced completion access non initialized command entry')
       ('net/mlx5: Fix command entry leak in Internal Error State')
      
      For -stable v5.4
       ('net/mlx5: DR, On creation set CQ's arm_db member to right value')
      
      For -stable v5.6
       ('net/mlx5e: Fix q counters on uplink representors')
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix uninitialized value access · ac2b47fb
      Paolo Abeni authored
      tcp_v{4,6}_syn_recv_sock() sets 'own_req' only when returning
      a non-NULL 'child', so check 'own_req' only if a child is
      available, to avoid a (harmless) UBSAN splat.
      
      v1 -> v2:
       - reference the correct hash
      
      Fixes: 4c8941de ("mptcp: avoid flipping mp_capable field in syn_recv_sock()")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
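A minimal userspace sketch of the pattern this fix relies on: an out-parameter that is written only on the success path must be read only after checking the returned pointer. The names `syn_recv_sock()` and `struct child_sock` are hypothetical stand-ins, not the kernel's `tcp_v4_syn_recv_sock()` signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a newly created child socket. */
struct child_sock {
	int id;
};

static struct child_sock child_storage = { .id = 1 };

/*
 * Hypothetical analogue of tcp_v{4,6}_syn_recv_sock(): *own_req is
 * written only when a child is returned; on failure it is left untouched.
 */
static struct child_sock *syn_recv_sock(bool ok, bool *own_req)
{
	if (!ok)
		return NULL;	/* *own_req deliberately not written */
	*own_req = true;
	return &child_storage;
}

/* True only if a child exists and the request was ours. */
static bool child_is_owned(bool ok)
{
	bool own_req;	/* intentionally uninitialized, as in the original bug */
	struct child_sock *child = syn_recv_sock(ok, &own_req);

	/* Short-circuit: own_req is read only when child != NULL. */
	return child && own_req;
}
```

Ordering the operands as `child && own_req` is what keeps the indeterminate value from ever being read.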
    • Merge branch 'mptcp-fix-incoming-options-parsing' · 8c755953
      David S. Miller authored
      Paolo Abeni says:
      
      ====================
      mptcp: fix incoming options parsing
      
      This series addresses a serious issue in MPTCP option parsing.
      
      This is bigger than the usual -net change, but I was unable to find a
      working, sane, smaller fix.
      
      The core change is in patch 2/5, which moves MPTCP option parsing
      from the TCP code into the existing MPTCP hooks and clears the
      MPTCP option state on each processed packet.
      
      Patch 1/5 is a required prerequisite, and patches 3-5 are smaller,
      related fixes.
      
      v1 -> v2:
       - cleaned-up patch 1/5
       - rebased on top of current -net
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: initialize the data_fin field for mpc packets · a77895db
      Paolo Abeni authored
      When parsing MPC+data packets we set the dss field, so
      we must also initialize data_fin; otherwise it may hold a
      stray value.
      
      Fixes: 9a19371b ("mptcp: fix data_fin handing in RX path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
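A minimal sketch of the invariant behind this fix, assuming a hypothetical, trimmed-down option struct (not the kernel's `mptcp_options_received`): once a parser sets `dss`, every field that consumers read under `dss` must be written on that same path. The sketch assumes the MPC+data packet carries no DATA_FIN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the per-packet option state. */
struct rx_opts {
	bool dss;
	bool data_fin;
};

/*
 * Hypothetical MP_CAPABLE+data handler. Setting 'dss' tells later code
 * that the DSS-related fields are valid, so data_fin is written here too.
 * Assumption for this sketch: the packet signals no DATA_FIN.
 */
static void parse_mpc_data(struct rx_opts *opts)
{
	opts->dss = true;
	opts->data_fin = false;	/* without this, a stale value could leak through */
}
```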
    • mptcp: fix 'use_ack' option access. · 5a91e32b
      Paolo Abeni authored
      The 'use_ack' RX option field is initialized only for DSS
      packets, so we must access it only if 'dss' is set too;
      otherwise the subflow ends up in a bad state, leading to
      RFC violations.
      
      Fixes: d22f4988 ("mptcp: process MP_CAPABLE data option")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
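A minimal sketch of the guarded access, using a hypothetical option struct rather than the kernel one: `use_ack` is consulted only behind the `dss` check, so a stale bit left over from an earlier DSS packet is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the per-packet option state. */
struct rx_opts {
	bool dss;	/* set when the packet carried a DSS option */
	bool use_ack;	/* meaningful only when dss is set */
};

/* Read use_ack only when dss says it was initialized for this packet. */
static bool packet_uses_ack(const struct rx_opts *opts)
{
	return opts->dss && opts->use_ack;
}
```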
    • mptcp: avoid a WARN on bad input. · d6085fe1
      Paolo Abeni authored
      Syzkaller has found a way to trigger the WARN_ON_ONCE condition
      in check_fully_established().
      
      The root cause is a legitimate fallback-to-TCP scenario, so replace
      the WARN with a plain message emitted under a stricter condition.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: move option parsing into mptcp_incoming_options() · cfde141e
      Paolo Abeni authored
      The mptcp_options_received structure carries several per-packet
      flags (mp_capable, mp_join, etc.). Such fields must be cleared
      on each packet, even on dropped packets or packets not carrying
      any MPTCP options, but the current MPTCP code clears them only
      on TCP option reset.
      
      In several races and corner cases we end up with stray bits in
      the incoming options, leading to WARN_ON splats, e.g.:
      
      [  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
      [  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
      [  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
      [  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
      [  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
      [  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
      [  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
      [  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
      [  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
      [  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
      [  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
      [  171.232586] Call Trace:
      [  171.233109]  <IRQ>
      [  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
      [  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
      [  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
      [  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
      [  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
      [  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
      [  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
      [  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
      [  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
      [  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
      [  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
      [  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
      [  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
      [  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
      [  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
      [  171.282358]  </IRQ>
      
      We could address the issue by explicitly clearing the relevant
      fields in several places: tcp_parse_option(),
      tcp_fast_parse_options(), and possibly others.
      
      Instead, we move the MPTCP option parsing into the already existing
      MPTCP ingress hook, so that the fields need to be cleared in a
      single place.
      
      This allows us to drop an MPTCP hook from the TCP code and to
      remove the quite large mptcp_options_received from the tcp_sock
      struct. On the flip side, MPTCP sockets will traverse the option
      space twice (in tcp_parse_option() and in
      mptcp_incoming_options()). That looks acceptable: we already do
      that for SYN and third-ACK packets, plain TCP sockets will benefit
      from the change, and even MPTCP sockets will get better code
      locality, with fewer jumps between TCP and MPTCP code.
      
      v1 -> v2:
       - rebased on current '-net' tree
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
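A minimal sketch of the "clear once, then parse" approach described above, assuming a simplified TCP option walker. The option kinds (EOL = 0, NOP = 1, MPTCP = 30) and the MPTCP subtypes carried in the high nibble of the third byte follow the MPTCP RFCs; `struct rx_opts` and `incoming_options()` are hypothetical, trimmed-down stand-ins for `mptcp_options_received` and `mptcp_incoming_options()`, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCPOPT_EOL   0
#define TCPOPT_NOP   1
#define TCPOPT_MPTCP 30	/* TCP option kind assigned to MPTCP */

/* MPTCP option subtypes, carried in the high nibble of byte 2. */
enum { MPTCPOPT_MP_CAPABLE = 0, MPTCPOPT_MP_JOIN = 1, MPTCPOPT_DSS = 2 };

/* Hypothetical, trimmed-down per-packet flags. */
struct rx_opts {
	bool mp_capable;
	bool mp_join;
	bool dss;
};

static void incoming_options(const uint8_t *opt, size_t len, struct rx_opts *rx)
{
	/* Single clearing point: nothing from the previous packet survives. */
	memset(rx, 0, sizeof(*rx));

	while (len > 0) {
		uint8_t kind = opt[0];
		uint8_t olen;

		if (kind == TCPOPT_EOL)
			break;
		if (kind == TCPOPT_NOP) {
			opt++;
			len--;
			continue;
		}
		if (len < 2)
			break;		/* truncated option header */
		olen = opt[1];
		if (olen < 2 || olen > len)
			break;		/* malformed length: stop parsing */
		if (kind == TCPOPT_MPTCP && olen >= 3) {
			switch (opt[2] >> 4) {
			case MPTCPOPT_MP_CAPABLE:
				rx->mp_capable = true;
				break;
			case MPTCPOPT_MP_JOIN:
				rx->mp_join = true;
				break;
			case MPTCPOPT_DSS:
				rx->dss = true;
				break;
			}
		}
		opt += olen;
		len -= olen;
	}
}
```

Because the reset happens before the first early `break`, even a packet with no options, or with a malformed option list, leaves the struct in a clean state.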
    • mptcp: consolidate synack processing. · 263e1201
      Paolo Abeni authored
      Currently the MPTCP code uses 2 hooks to process syn-ack
      packets, mptcp_rcv_synsent() and the sk_rx_dst_set()
      callback.
      
      We can drop the first, moving the relevant code into the
      latter, reducing the hooking into the TCP code. This is
      also needed by the next patch.
      
      v1 -> v2:
       - use local tcp sock ptr instead of casting the sk variable
         several times - DaveM
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Fix q counters on uplink representors · 67b38de6
      Roi Dayan authored
      The q counters must be allocated before init_rx, which needs them
      when creating the rq.
      
      Fixes: 8520fa57 ("net/mlx5e: Create q counters on uplink representors")
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix command entry leak in Internal Error State · cece6f43
      Moshe Shemesh authored
      Processing commands in cmd_work_handler() while already in Internal
      Error State leaks the command entry, since the handler forces
      completion without ringing the doorbell. Forced completion doesn't
      release the entry and the completion event will never arrive, so the
      entry must be released there.
      
      Fixes: 73dd3a48 ("net/mlx5: Avoid using pending command interface slots")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
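A minimal single-threaded sketch of the leak and its fix, modelling the command interface as a hypothetical fixed pool of slots (`alloc_slot()`, `free_slot()` and `submit_cmd()` are illustrative names, not mlx5 functions). In the normal path a completion event would free the slot later; in the error path no doorbell is rung, so the forced-completion path must free the slot itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NUM_SLOTS 4

/* Hypothetical command-slot pool standing in for the command interface. */
static bool slot_busy[NUM_SLOTS];

static int alloc_slot(void)
{
	for (int i = 0; i < NUM_SLOTS; i++) {
		if (!slot_busy[i]) {
			slot_busy[i] = true;
			return i;
		}
	}
	return -1;
}

static void free_slot(int idx)
{
	slot_busy[idx] = false;
}

static int submit_cmd(bool internal_error)
{
	int idx = alloc_slot();

	if (idx < 0)
		return -EBUSY;		/* every slot leaked or in flight */
	if (internal_error) {
		/* Forced completion: no event will ever come, release here. */
		free_slot(idx);
		return -EIO;
	}
	/* Normal path: doorbell rung; the completion handler frees the slot. */
	return idx;
}
```

Without the `free_slot()` call in the error branch, the pool would be exhausted after `NUM_SLOTS` failed submissions and every later call would return `-EBUSY`.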
    • net/mlx5: Fix forced completion access non initialized command entry · f3cb3ceb
      Moshe Shemesh authored
      mlx5_cmd_flush() triggers forced completions for all valid command
      entries. Since it can be triggered by an asynchronous event such as
      fast teardown, it can happen at any stage of a command, including
      command initialization, and can therefore complete an uninitialized
      command entry.
      
      Setting MLX5_CMD_ENT_STATE_PENDING_COMP only after the command entry
      is initialized ensures that forced completion is handled only for
      initialized command entries.
      
      Fixes: 73dd3a48 ("net/mlx5: Avoid using pending command interface slots")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
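A minimal sketch of "publish the flag only after initialization", using a hypothetical `struct cmd_entry` whose `pending_comp` field plays the role of MLX5_CMD_ENT_STATE_PENDING_COMP. This single-threaded model ignores the atomicity and memory ordering that a concurrent implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command entry; pending_comp mirrors PENDING_COMP. */
struct cmd_entry {
	bool initialized;
	bool pending_comp;
	int completions;
};

static void entry_init(struct cmd_entry *ent)
{
	ent->initialized = true;
	ent->completions = 0;
	/* Only now advertise the entry as eligible for forced completion. */
	ent->pending_comp = true;
}

/* Hypothetical flush: force-completes only entries that carry the flag. */
static void cmd_flush(struct cmd_entry *ents, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!ents[i].pending_comp)
			continue;	/* still being set up: leave it alone */
		assert(ents[i].initialized);
		ents[i].completions++;
		ents[i].pending_comp = false;
	}
}
```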
    • net/mlx5: DR, On creation set CQ's arm_db member to right value · 8075411d
      Erez Shitrit authored
      In polling mode, set the arm_db member to a value that prevents CQ
      event recovery by the HW; otherwise we might get an event without a
      completion function. In addition, an empty completion function was
      added to protect against unexpected events.
      
      Fixes: 297ccceb ("net/mlx5: DR, Expose an internal API to issue RDMA operations")
      Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Alex Vesker <valex@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-switch, Fix mutex init order · f8d1edda
      Parav Pandit authored
      In the cited patch, the mutex is initialized after it is used, and
      the call trace below is observed. Fix the order so that the mutex is
      initialized early enough, and follow the mirror sequence during
      cleanup.
      
      kernel: DEBUG_LOCKS_WARN_ON(lock->magic != lock)
      kernel: WARNING: CPU: 5 PID: 45916 at kernel/locking/mutex.c:938 __mutex_lock+0x7d6/0x8a0
      kernel: Call Trace:
      kernel: ? esw_vport_tbl_get+0x3b/0x250 [mlx5_core]
      kernel: ? mark_held_locks+0x55/0x70
      kernel: ? __slab_free+0x274/0x400
      kernel: ? lockdep_hardirqs_on+0x140/0x1d0
      kernel: esw_vport_tbl_get+0x3b/0x250 [mlx5_core]
      kernel: ? mlx5_esw_chains_create_fdb_prio+0xa57/0xc20 [mlx5_core]
      kernel: mlx5_esw_vport_tbl_get+0x88/0xf0 [mlx5_core]
      kernel: mlx5_esw_chains_create+0x2f3/0x3e0 [mlx5_core]
      kernel: esw_create_offloads_fdb_tables+0x11d/0x580 [mlx5_core]
      kernel: esw_offloads_enable+0x26d/0x540 [mlx5_core]
      kernel: mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core]
      kernel: mlx5_devlink_eswitch_mode_set+0x1af/0x320 [mlx5_core]
      kernel: devlink_nl_cmd_eswitch_set_doit+0x41/0xb0
      
      Fixes: 96e32687 ("net/mlx5e: Eswitch, Use per vport tables for mirroring")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Eli Cohen <eli@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
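A minimal sketch of the init/cleanup ordering, assuming a hypothetical debug lock whose `magic` field points to itself once initialized, which mimics the `DEBUG_LOCKS_WARN_ON(lock->magic != lock)` check in the trace above. `struct esw`, `create_tables()`, `esw_init()` and `esw_cleanup()` are illustrative names, not the mlx5 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug lock: magic points to itself once initialized. */
struct dbg_lock {
	const struct dbg_lock *magic;
	bool held;
};

static void dbg_lock_init(struct dbg_lock *l)
{
	l->magic = l;
	l->held = false;
}

static void dbg_lock_destroy(struct dbg_lock *l)
{
	l->magic = NULL;
}

/* Returns false where the kernel would hit DEBUG_LOCKS_WARN_ON. */
static bool dbg_lock(struct dbg_lock *l)
{
	if (l->magic != l)
		return false;
	l->held = true;
	return true;
}

static void dbg_unlock(struct dbg_lock *l)
{
	l->held = false;
}

struct esw {
	struct dbg_lock vport_tbl_lock;
	bool tables_created;
};

/* Table creation takes the lock, so the lock must already be initialized. */
static bool create_tables(struct esw *esw)
{
	if (!dbg_lock(&esw->vport_tbl_lock))
		return false;
	esw->tables_created = true;
	dbg_unlock(&esw->vport_tbl_lock);
	return true;
}

static bool esw_init(struct esw *esw)
{
	dbg_lock_init(&esw->vport_tbl_lock);	/* first: later steps depend on it */
	if (!create_tables(esw)) {
		dbg_lock_destroy(&esw->vport_tbl_lock);
		return false;
	}
	return true;
}

/* Mirror of esw_init(): tear down the tables first, the lock last. */
static void esw_cleanup(struct esw *esw)
{
	esw->tables_created = false;
	dbg_lock_destroy(&esw->vport_tbl_lock);
}
```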
    • net/mlx5: E-switch, Fix printing wrong error value · e9864539
      Parav Pandit authored
      When mlx5_modify_header_alloc() fails, the current error log prints
      0 instead of the returned error value.
      
      Fix it by printing the correct error value returned by
      mlx5_modify_header_alloc().
      
      Fixes: 6724e66b ("net/mlx5: E-Switch, Get reg_c1 value on miss")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
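A common way to end up logging 0 is printing a variable that has not yet been assigned the failure code. The sketch below shows the safe ordering: capture the error first, then log that same variable. `modify_header_alloc()` here is a hypothetical helper returning 0 or a negative errno, unlike the real mlx5 function, and the message text is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical allocator: returns 0 on success or -errno on failure. */
static int modify_header_alloc(int fail_with)
{
	return fail_with ? -fail_with : 0;
}

static int setup_mod_header(int fail_with, char *log, size_t log_len)
{
	int err = modify_header_alloc(fail_with);	/* capture the error first */

	log[0] = '\0';
	if (err)	/* ...then report that same value, never a stale 0 */
		snprintf(log, log_len, "modify header alloc failed, err %d", err);
	return err;
}
```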
    • net/mlx5: E-switch, Fix error unwinding flow for steering init failure · 79949985
      Parav Pandit authored
      Error unwinding is done incorrectly in the cited commit.
      When steering init fails, there is no need to perform steering cleanup.
      When a vport error occurs, error cleanup should mirror the setup
      routine, i.e. perform steering cleanup before metadata cleanup.
      
      This avoids the call trace below, caused by accessing uninitialized
      objects that were skipped because steering_init() failed.
      
      Call trace:
      mlx5_cmd_modify_header_alloc:805:(pid 21128): too many modify header actions 1, max supported 0
      E-Switch: Failed to create restore mod header
      
      BUG: kernel NULL pointer dereference, address: 00000000000000d0
      [  677.263079]  mlx5_destroy_flow_group+0x13/0x80 [mlx5_core]
      [  677.268921]  esw_offloads_steering_cleanup+0x51/0xf0 [mlx5_core]
      [  677.275281]  esw_offloads_enable+0x1a5/0x800 [mlx5_core]
      [  677.280949]  mlx5_eswitch_enable_locked+0x155/0x860 [mlx5_core]
      [  677.287227]  mlx5_devlink_eswitch_mode_set+0x1af/0x320
      [  677.293741]  devlink_nl_cmd_eswitch_set_doit+0x41/0xb0
      [  677.299217]  genl_rcv_msg+0x1eb/0x430
      
      Fixes: 7983a675 ("net/mlx5: E-Switch, Enable chains only if regs loopback is enabled")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
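The unwinding rule above maps onto the usual kernel goto-label idiom. In this minimal sketch, setup runs metadata, then steering, then vports, and each failure label undoes only what already succeeded, in reverse order. All function names are hypothetical; `trace` records which steps ran so the ordering can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace of setup (+) and cleanup (-) steps. */
static char trace[64];

static void note(const char *s)
{
	strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

static int metadata_init(void) { note("M+"); return 0; }
static void metadata_cleanup(void) { note("M-"); }
static int steering_init(bool fail) { if (fail) return -1; note("S+"); return 0; }
static void steering_cleanup(void) { note("S-"); }
static int vport_enable(bool fail) { return fail ? -1 : 0; }

static int offloads_enable(bool steering_fails, bool vport_fails)
{
	int err;

	err = metadata_init();
	if (err)
		return err;
	err = steering_init(steering_fails);
	if (err)
		goto err_metadata;	/* steering never came up: skip its cleanup */
	err = vport_enable(vport_fails);
	if (err)
		goto err_steering;
	return 0;

err_steering:
	steering_cleanup();	/* mirror of setup: steering before metadata */
err_metadata:
	metadata_cleanup();
	return err;
}
```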
  2. 29 Apr, 2020 6 commits
  3. 28 Apr, 2020 2 commits
    • net/x25: Fix null-ptr-deref in x25_disconnect · 8999dc89
      YueHaibing authored
      We should check for NULL before calling x25_neigh_put() in
      x25_disconnect(); otherwise it may cause a null-ptr-deref like this:
      
       #include <sys/socket.h>
       #include <unistd.h>
       #include <linux/x25.h>
      
       int main(void) {
          int sck_x25;
      
          /* Create an AF_X25 socket and close it without ever connecting. */
          sck_x25 = socket(AF_X25, SOCK_SEQPACKET, 0);
          close(sck_x25);
          return 0;
       }
      
      BUG: kernel NULL pointer dereference, address: 00000000000000d8
      CPU: 0 PID: 4817 Comm: t2 Not tainted 5.7.0-rc3+ #159
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-
      RIP: 0010:x25_disconnect+0x91/0xe0
      Call Trace:
       x25_release+0x18a/0x1b0
       __sock_release+0x3d/0xc0
       sock_close+0x13/0x20
       __fput+0x107/0x270
       ____fput+0x9/0x10
       task_work_run+0x6d/0xb0
       exit_to_usermode_loop+0x102/0x110
       do_syscall_64+0x23c/0x260
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      Reported-by: syzbot+6db548b615e5aeefdce2@syzkaller.appspotmail.com
      Fixes: 4becb7ee ("net/x25: Fix x25_neigh refcnt leak when x25 disconnect")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
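A minimal sketch of the guard, assuming a hypothetical refcounted neighbour and socket (`struct neigh`, `struct sock_x25`, `neigh_put()` and `disconnect()` are illustrative, not the x25 code). A socket that never connected has no neighbour, so the put must be skipped; clearing the pointer afterwards also makes a repeated disconnect harmless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted neighbour, standing in for struct x25_neigh. */
struct neigh {
	int refcnt;
};

struct sock_x25 {
	struct neigh *neighbour;	/* NULL for a socket that never connected */
};

static void neigh_put(struct neigh *nb)
{
	nb->refcnt--;	/* dereferences nb: must never be called with NULL */
}

/* Drop the neighbour reference only if one was taken, then clear it. */
static void disconnect(struct sock_x25 *x25)
{
	if (x25->neighbour) {
		neigh_put(x25->neighbour);
		x25->neighbour = NULL;
	}
}
```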
    • net/ena: Fix build warning in ena_xdp_set() · caec6619
      Gavin Shan authored
      This fixes the following build warning in ena_xdp_set(), which is
      observed on aarch64 with a 64KB page size.
      
         In file included from ./include/net/inet_sock.h:19,
            from ./include/net/ip.h:27,
            from drivers/net/ethernet/amazon/ena/ena_netdev.c:46:
         drivers/net/ethernet/amazon/ena/ena_netdev.c: In function ‘ena_xdp_set’:
         drivers/net/ethernet/amazon/ena/ena_netdev.c:557:6: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘int’ [-Wformat=]
         "Failed to set xdp program, the current MTU (%d) is larger than the maximum allowed MTU (%lu) while xdp is on",
      
      Signed-off-by: Gavin Shan <gshan@redhat.com>
      Acked-by: Shay Agroskin <shayagr@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
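One generic way to resolve a -Wformat mismatch like this is to make the argument's type match the conversion specifier, for example with an explicit cast. The alternative is to change the specifier. This sketch does not necessarily reflect how this commit fixed the warning; `MAX_XDP_MTU` and its value, and `format_mtu_error()`, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit that happens to be a plain int on this build. */
#define MAX_XDP_MTU 1500

static int format_mtu_error(char *buf, size_t len, int mtu)
{
	/*
	 * %lu requires an unsigned long argument; the cast keeps the call
	 * well-formed whatever integer type MAX_XDP_MTU has on a given build.
	 */
	return snprintf(buf, len,
			"the current MTU (%d) is larger than the maximum allowed MTU (%lu)",
			mtu, (unsigned long)MAX_XDP_MTU);
}
```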
  4. 27 Apr, 2020 17 commits