1. 06 Oct, 2021 6 commits
  2. 05 Oct, 2021 9 commits
    • netlink: annotate data races around nlk->bound · 7707a4d0
      Eric Dumazet authored
      While existing code is correct, KCSAN is reporting
      a data-race in netlink_insert / netlink_sendmsg [1]
      
      It is correct to read nlk->bound without a lock, as netlink_autobind()
      will acquire all needed locks.
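The reasoning above is the usual pattern of a lockless check backed by a locked re-check. The sketch below is a user-space model under stated assumptions: C11 atomics stand in for the kernel's READ_ONCE()/WRITE_ONCE()-style annotations, an atomic_flag spinlock stands in for the locks netlink_autobind() takes, and every name (model_sock, model_autobind, next_portid) is hypothetical rather than taken from af_netlink.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a netlink socket; not kernel code. */
struct model_sock {
	unsigned int portid;  /* meaningful only once bound is true */
	atomic_bool bound;    /* read locklessly, written under table_lock */
};

/* Stand-in for the locks netlink_autobind() acquires. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static unsigned int next_portid = 100; /* hypothetical allocator, guarded by table_lock */

static void table_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void table_lock_release(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Slow path: take the lock, then re-check before binding. */
static void model_autobind(struct model_sock *s)
{
	table_lock_acquire();
	if (!atomic_load_explicit(&s->bound, memory_order_relaxed)) {
		s->portid = next_portid++;
		/* Publish portid before bound becomes visible as true. */
		atomic_store_explicit(&s->bound, true, memory_order_release);
	}
	table_lock_release();
}

/* Fast path: an intentionally racy but annotated read of bound. */
static unsigned int model_sendmsg_portid(struct model_sock *s)
{
	if (!atomic_load_explicit(&s->bound, memory_order_acquire))
		model_autobind(s);
	assert(atomic_load_explicit(&s->bound, memory_order_relaxed));
	return s->portid;
}
```

A stale `false` on the fast path is harmless because the slow path re-checks under the lock; the annotation only documents that the race is intended.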
      
      [1]
      BUG: KCSAN: data-race in netlink_insert / netlink_sendmsg
      
      write to 0xffff8881031c8b30 of 1 bytes by task 18752 on cpu 0:
       netlink_insert+0x5cc/0x7f0 net/netlink/af_netlink.c:597
       netlink_autobind+0xa9/0x150 net/netlink/af_netlink.c:842
       netlink_sendmsg+0x479/0x7c0 net/netlink/af_netlink.c:1892
       sock_sendmsg_nosec net/socket.c:703 [inline]
       sock_sendmsg net/socket.c:723 [inline]
       ____sys_sendmsg+0x360/0x4d0 net/socket.c:2392
       ___sys_sendmsg net/socket.c:2446 [inline]
       __sys_sendmsg+0x1ed/0x270 net/socket.c:2475
       __do_sys_sendmsg net/socket.c:2484 [inline]
       __se_sys_sendmsg net/socket.c:2482 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2482
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff8881031c8b30 of 1 bytes by task 18751 on cpu 1:
       netlink_sendmsg+0x270/0x7c0 net/netlink/af_netlink.c:1891
       sock_sendmsg_nosec net/socket.c:703 [inline]
       sock_sendmsg net/socket.c:723 [inline]
       __sys_sendto+0x2a8/0x370 net/socket.c:2019
       __do_sys_sendto net/socket.c:2031 [inline]
       __se_sys_sendto net/socket.c:2027 [inline]
       __x64_sys_sendto+0x74/0x90 net/socket.c:2027
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 18751 Comm: syz-executor.0 Not tainted 5.14.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: da314c99 ("netlink: Replace rhash_portid with bound")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pcs: xpcs: fix incorrect CL37 AN sequence · e3cf002d
      Wong Vee Khee authored
      According to the Synopsys DesignWare Cores Ethernet PCS databook,
      Clause 37 auto-negotiation must be disabled by programming bit 12
      (AN_ENABLE) to 0, if it is already enabled, before programming the
      various fields of the VR_MII_AN_CTRL register.
      
      After all of this programming is done, Clause 37 auto-negotiation
      must then be re-enabled by programming bit 12 (AN_ENABLE) to 1.
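The required ordering can be sketched against an in-memory register model. This is an assumption-laden illustration, not the xpcs driver: the register names mii_ctrl and vr_mii_an_ctrl, the write log, and the field values are hypothetical, and real accesses would go through MDIO reads and writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AN_ENABLE (1u << 12) /* bit 12, as named in the databook text */

/* Hypothetical register model; the real driver goes through MDIO. */
struct model_pcs {
	uint16_t mii_ctrl;        /* control register holding AN_ENABLE */
	uint16_t vr_mii_an_ctrl;  /* AN configuration fields */
	uint16_t ctrl_log[4];     /* values written to mii_ctrl, in order */
	size_t ctrl_writes;
};

static void write_mii_ctrl(struct model_pcs *p, uint16_t val)
{
	if (p->ctrl_writes < 4)
		p->ctrl_log[p->ctrl_writes] = val;
	p->ctrl_writes++;
	p->mii_ctrl = val;
}

/* Disable AN if it is on, program VR_MII_AN_CTRL, then re-enable AN. */
static void model_config_cl37(struct model_pcs *p, uint16_t an_fields)
{
	assert(p != NULL);
	if (p->mii_ctrl & AN_ENABLE)
		write_mii_ctrl(p, (uint16_t)(p->mii_ctrl & ~AN_ENABLE));

	p->vr_mii_an_ctrl = an_fields;

	write_mii_ctrl(p, (uint16_t)(p->mii_ctrl | AN_ENABLE));
}
```

When AN starts out disabled, only the final enabling write is issued.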
      
      Fixes: b97b5331 ("net: pcs: add C37 SGMII AN support for intel mGbE controller")
      Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sfp: Fix typo in state machine debug string · 25a9da66
      Sean Anderson authored
      The string should be "tx_disable" to match the state enum.
      
      Fixes: 4005a7cb ("net: phy: sftp: print debug message with text, not numbers")
      Signed-off-by: Sean Anderson <sean.anderson@seco.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: sch_taprio: properly cancel timer from taprio_destroy() · a56d447f
      Eric Dumazet authored
      There is a comment in qdisc_create() about us not calling ops->reset()
      in some cases.
      
      err_out4:
      	/*
      	 * Any broken qdiscs that would require a ops->reset() here?
      	 * The qdisc was never in action so it shouldn't be necessary.
      	 */
      
      As taprio sets a timer before actually receiving a packet, we need
      to cancel it from ops->destroy, just in case ops->reset has not
      been called.
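A minimal model of why destroy() must cancel the timer itself, assuming a boolean stands in for the hrtimer state and that cancelling an idle timer is a no-op (as hrtimer_cancel() is in the kernel). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an hrtimer; only tracks whether it is armed. */
struct model_timer {
	bool active;
};

static void model_timer_start(struct model_timer *t) { t->active = true; }

/* Safe to call on a timer that is not armed. */
static void model_timer_cancel(struct model_timer *t) { t->active = false; }

struct model_taprio {
	struct model_timer advance_timer;
};

/* The timer is armed at setup time, before any packet arrives. */
static void model_taprio_init(struct model_taprio *q)
{
	model_timer_start(&q->advance_timer);
}

static void model_taprio_reset(struct model_taprio *q)
{
	model_timer_cancel(&q->advance_timer);
}

/* destroy() cancels as well, because reset() may never have run. */
static void model_taprio_destroy(struct model_taprio *q)
{
	model_timer_cancel(&q->advance_timer);
	assert(!q->advance_timer.active);
}
```

Because cancellation is idempotent, the extra cancel in destroy() is harmless on the normal path where reset() already ran.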
      
      syzbot reported:
      
      ODEBUG: free active (active state 0) object type: hrtimer hint: advance_sched+0x0/0x9a0 arch/x86/include/asm/atomic64_64.h:22
      WARNING: CPU: 0 PID: 8441 at lib/debugobjects.c:505 debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Modules linked in:
      CPU: 0 PID: 8441 Comm: syz-executor813 Not tainted 5.14.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:505
      Code: ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 af 00 00 00 48 8b 14 dd e0 d3 e3 89 4c 89 ee 48 c7 c7 e0 c7 e3 89 e8 5b 86 11 05 <0f> 0b 83 05 85 03 92 09 01 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e c3
      RSP: 0018:ffffc9000130f330 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
      RDX: ffff88802baeb880 RSI: ffffffff815d87b5 RDI: fffff52000261e58
      RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff815d25ee R11: 0000000000000000 R12: ffffffff898dd020
      R13: ffffffff89e3ce20 R14: ffffffff81653630 R15: dffffc0000000000
      FS:  0000000000f0d300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffb64b3e000 CR3: 0000000036557000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __debug_check_no_obj_freed lib/debugobjects.c:987 [inline]
       debug_check_no_obj_freed+0x301/0x420 lib/debugobjects.c:1018
       slab_free_hook mm/slub.c:1603 [inline]
       slab_free_freelist_hook+0x171/0x240 mm/slub.c:1653
       slab_free mm/slub.c:3213 [inline]
       kfree+0xe4/0x540 mm/slub.c:4267
       qdisc_create+0xbcf/0x1320 net/sched/sch_api.c:1299
       tc_modify_qdisc+0x4c8/0x1a60 net/sched/sch_api.c:1663
       rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2403
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2457
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      
      Fixes: 44d4775c ("net/sched: sch_taprio: reset child qdiscs before freeing them")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Davide Caratti <dcaratti@redhat.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Acked-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-fixes' · 64506cb9
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      net: bridge: br_get_linkxstats_size() fixes
      
      This patch series attempts to fix the following syzbot report.
      
      WARNING: CPU: 1 PID: 21425 at net/core/rtnetlink.c:5388 rtnl_stats_get+0x80f/0x8c0 net/core/rtnetlink.c:5388
      Modules linked in:
      CPU: 1 PID: 21425 Comm: syz-executor394 Not tainted 5.13.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:rtnl_stats_get+0x80f/0x8c0 net/core/rtnetlink.c:5388
      Code: e9 9c fc ff ff 4c 89 e7 89 0c 24 e8 ab 8b a8 fa 8b 0c 24 e9 bc fc ff ff 4c 89 e7 e8 9b 8b a8 fa e9 df fe ff ff e8 61 85 63 fa <0f> 0b e9 f7 fc ff ff 41 be ea ff ff ff e9 f9 fc ff ff 41 be 97 ff
      RSP: 0018:ffffc9000cf77688 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 000000000000012c RCX: 0000000000000000
      RDX: ffff8880211754c0 RSI: ffffffff8711571f RDI: 0000000000000003
      RBP: ffff8880175aa780 R08: 00000000ffffffa6 R09: ffff88823bd5c04f
      R10: ffffffff87115413 R11: 0000000000000001 R12: ffff8880175aab74
      R13: ffff8880175aab40 R14: 00000000ffffffa6 R15: 0000000000000006
      FS:  0000000001ff9300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000005cfd58 CR3: 000000002cd43000 CR4: 00000000001506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5562
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:654 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:674
       ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
       ___sys_sendmsg+0xf3/0x170 net/socket.c:2404
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
       do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4440d9
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: fix under estimation in br_get_linkxstats_size() · 0854a051
      Eric Dumazet authored
      Commit de179966 ("net: bridge: add STP xstats")
      added an additional nla_reserve_64bit() in br_fill_linkxstats(),
      but forgot to update br_get_linkxstats_size() accordingly.
      
      This can trigger the following in rtnl_stats_get()
      
      	WARN_ON(err == -EMSGSIZE);
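The invariant being broken is that the size estimate needs one term for every attribute the fill function reserves. A sketch under assumptions: NLA_ALIGN() and NLA_HDRLEN follow the netlink 4-byte alignment rules, while the buffer model and the attribute payload sizes are hypothetical and not the real bridge structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN 4 /* NLA_ALIGN(sizeof(struct nlattr)) */

static size_t total_size(size_t payload) { return NLA_ALIGN(NLA_HDRLEN + payload); }

/* Hypothetical message buffer whose capacity comes from the estimate. */
struct model_msg { size_t cap, used; };

static int model_reserve(struct model_msg *m, size_t payload)
{
	size_t need = total_size(payload);

	if (m->cap - m->used < need)
		return -EMSGSIZE;
	m->used += need;
	return 0;
}

/* Hypothetical payload sizes of the attributes the fill step reserves. */
static const size_t attr_payloads[] = { 40, 24, 56 };
#define N_ATTRS (sizeof(attr_payloads) / sizeof(attr_payloads[0]))

/* Size estimate; passing n_terms < N_ATTRS models a forgotten term. */
static size_t model_get_size(size_t n_terms)
{
	size_t sz = 0;

	assert(n_terms <= N_ATTRS);
	for (size_t i = 0; i < n_terms; i++)
		sz += total_size(attr_payloads[i]);
	return sz;
}

static int model_fill(struct model_msg *m)
{
	for (size_t i = 0; i < N_ATTRS; i++) {
		int err = model_reserve(m, attr_payloads[i]);
		if (err)
			return err;
	}
	return 0;
}
```

Sizing the buffer from an estimate that omits one reserve makes the last reservation fail with -EMSGSIZE, which is the condition the WARN_ON() catches.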
      
      Fixes: de179966 ("net: bridge: add STP xstats")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: use nla_total_size_64bit() in br_get_linkxstats_size() · dbe0b880
      Eric Dumazet authored
      bridge_fill_linkxstats() is using nla_reserve_64bit().
      
      We must use nla_total_size_64bit() instead of nla_total_size()
      for corresponding data structure.
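A worked version of the size arithmetic, mirroring the shape of the kernel's nla_total_size() and nla_total_size_64bit() helpers. One assumption: the kernel decides the padding at compile time via CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS, whereas this sketch takes it as a runtime flag so both cases can be checked.

```c
#include <assert.h>
#include <stdbool.h>

#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN 4 /* NLA_ALIGN(sizeof(struct nlattr)) */

static int nla_attr_size(int payload) { return NLA_HDRLEN + payload; }

static int nla_total_size(int payload) { return NLA_ALIGN(nla_attr_size(payload)); }

/*
 * Without efficient unaligned access, a 64-bit reservation may first emit
 * a zero-length pad attribute so the payload is 8-byte aligned, so the
 * size estimate must budget for that extra header.
 */
static int nla_total_size_64bit(int payload, bool efficient_unaligned)
{
	int sz = NLA_ALIGN(nla_attr_size(payload));

	if (!efficient_unaligned)
		sz += NLA_ALIGN(nla_attr_size(0));
	return sz;
}
```

With a 16-byte payload, nla_total_size() budgets 20 bytes, but a 64-bit reservation may consume 24 when it has to emit the 4-byte pad attribute.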
      
      Fixes: 1080ab95 ("net: bridge: add support for IGMP/MLD stats and export them via netlink")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Nikolay Aleksandrov <nikolay@nvidia.com>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8152: avoid to resubmit rx immediately · baf33d7a
      Hayes Wang authored
      If the disconnect event arrives very late after the device is
      unplugged, the driver keeps resubmitting the RX bulk transfer
      immediately each time the callback completes with -EPROTO.
      Eventually, a soft lockup occurs.
      
      This patch avoids resubmitting RX immediately. Instead, it uses a
      workqueue to schedule the RX NAPI, and NAPI then resubmits the RX
      transfer. This gives the disconnect event an opportunity to stop the
      submissions before a soft lockup occurs.
      
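The control-flow change can be modelled in a few lines. This is a hypothetical sketch, not r8152 code: a flag stands in for queued work, a single deferred step collapses the workqueue and the RX NAPI poll, and the disconnect handler simply sets `unplugged`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state; not the r8152 data structures. */
struct model_dev {
	bool unplugged;     /* set by the disconnect handler */
	bool work_pending;  /* stands in for a queued work item */
	int submissions;    /* number of RX resubmissions */
};

static void model_submit_rx(struct model_dev *d)
{
	if (!d->unplugged)
		d->submissions++;
}

/* Completion callback: on -EPROTO, defer instead of resubmitting inline. */
static void model_rx_callback(struct model_dev *d, int status)
{
	if (status == -EPROTO) {
		d->work_pending = true;
		return;
	}
	model_submit_rx(d);
}

/* Deferred step: stands in for work scheduling NAPI, which resubmits. */
static void model_run_work(struct model_dev *d)
{
	assert(d->work_pending);
	d->work_pending = false;
	model_submit_rx(d);
}

static void model_disconnect(struct model_dev *d) { d->unplugged = true; }
```

The deferral opens a window in which a late disconnect can be processed before the next submission, instead of the callback looping back into itself.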
      Reported-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
      Tested-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • etherdevice: use __dev_addr_set() · 3f6cffb8
      Jakub Kicinski authored
      Andrew points out that eth_hw_addr_set() replaces memcpy() calls,
      so we can't use ether_addr_copy(), which assumes both arguments
      are 2-byte aligned.
      
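The alignment point can be illustrated with a memcpy()-based copy from an odd address. The helper name is hypothetical; it only shows that memcpy() accepts any alignment, whereas a copy built on 16- or 32-bit loads and stores, which is what the 2-byte alignment assumption above implies, would not be safe on such an address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* same value as the kernel's ETH_ALEN */

/*
 * Hypothetical memcpy()-based address setter. memcpy() has no alignment
 * requirement, so either pointer may sit at an odd address.
 */
static void model_addr_set(uint8_t *dst, const uint8_t *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, ETH_ALEN);
}
```

Copying from `raw + 1`, an odd offset inside a byte buffer, works with this helper.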
      Reported-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Oct, 2021 1 commit
  4. 02 Oct, 2021 4 commits
  5. 01 Oct, 2021 9 commits
  6. 30 Sep, 2021 11 commits
    • Merge tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 4de593fb
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from mac80211, netfilter and bpf.
      
        Current release - regressions:
      
         - bpf, cgroup: assign cgroup in cgroup_sk_alloc when called from
           interrupt
      
         - mdio: revert mechanical patches which broke handling of optional
           resources
      
         - dev_addr_list: prevent address duplication
      
        Previous releases - regressions:
      
         - sctp: break out if skb_header_pointer returns NULL in sctp_rcv_ootb
           (NULL deref)
      
         - Revert "mac80211: do not use low data rates for data frames with no
           ack flag", fixing broadcast transmissions
      
         - mac80211: fix use-after-free in CCMP/GCMP RX
      
         - netfilter: include zone id in tuple hash again, minimize collisions
      
         - netfilter: nf_tables: unlink table before deleting it (race -> UAF)
      
         - netfilter: log: work around missing softdep backend module
      
         - mptcp: don't return sockets in foreign netns
      
         - sched: flower: protect fl_walk() with rcu (race -> UAF)
      
         - ixgbe: fix NULL pointer dereference in ixgbe_xdp_setup
      
         - smsc95xx: fix stalled rx after link change
      
         - enetc: fix the incorrect clearing of IF_MODE bits
      
         - ipv4: fix rtnexthop len when RTA_FLOW is present
      
         - dsa: mv88e6xxx: 6161: use correct MAX MTU config method for this
           SKU
      
         - e100: fix length calculation & buffer overrun in ethtool::get_regs
      
        Previous releases - always broken:
      
         - mac80211: fix using stale frag_tail skb pointer in A-MSDU tx
      
         - mac80211: drop frames from invalid MAC address in ad-hoc mode
      
         - af_unix: fix races in sk_peer_pid and sk_peer_cred accesses (race
           -> UAF)
      
         - bpf, x86: Fix bpf mapping of atomic fetch implementation
      
         - bpf: handle return value of BPF_PROG_TYPE_STRUCT_OPS prog
      
         - netfilter: ip6_tables: zero-initialize fragment offset
      
         - mhi: fix error path in mhi_net_newlink
      
         - af_unix: return errno instead of NULL in unix_create1() when over
           the fs.file-max limit
      
        Misc:
      
         - bpf: exempt CAP_BPF from checks against bpf_jit_limit
      
         - netfilter: conntrack: make max chain length random, prevent
           guessing buckets by attackers
      
         - netfilter: nf_nat_masquerade: make async masq_inet6_event handling
           generic, defer conntrack walk to work queue (prevent hogging RTNL
           lock)"
      
      * tag 'net-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
        af_unix: fix races in sk_peer_pid and sk_peer_cred accesses
        net: stmmac: fix EEE init issue when paired with EEE capable PHYs
        net: dev_addr_list: handle first address in __hw_addr_add_ex
        net: sched: flower: protect fl_walk() with rcu
        net: introduce and use lock_sock_fast_nested()
        net: phy: bcm7xxx: Fixed indirect MMD operations
        net: hns3: disable firmware compatible features when uninstall PF
        net: hns3: fix always enable rx vlan filter problem after selftest
        net: hns3: PF enable promisc for VF when mac table is overflow
        net: hns3: fix show wrong state when add existing uc mac address
        net: hns3: fix mixed flag HCLGE_FLAG_MQPRIO_ENABLE and HCLGE_FLAG_DCB_ENABLE
        net: hns3: don't rollback when destroy mqprio fail
        net: hns3: remove tc enable checking
        net: hns3: do not allow call hns3_nic_net_open repeatedly
        ixgbe: Fix NULL pointer dereference in ixgbe_xdp_setup
        net: bridge: mcast: Associate the seqcount with its protecting lock.
        net: mdio-ipq4019: Fix the error for an optional regs resource
        net: hns3: fix hclge_dbg_dump_tm_pg() stack usage
        net: mdio: mscc-miim: Fix the mdio controller
        af_unix: Return errno instead of NULL in unix_create1().
        ...
    • net/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode · 3bf1742f
      Aya Levin authored
      TX-port-TS hijacks the PTP traffic to a specific HW TX-queue. This
      conflicts with MQPRIO in channel mode, which specifies explicitly which
      TC accepts the packet. This patch mutually excludes the above
      configuration.
      
      Fixes: ec60c458 ("net/mlx5e: Support MQPRIO channel mode")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix the presented RQ index in PTP stats · dd1979cf
      Lama Kayal authored
      The PTP-RQ counter name format contains the PTP-RQ identifier, which
      is mistakenly not passed to sprintf(). This leads to unexpected
      garbage values in the counter names instead. This patch fixes it.
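Every conversion in a printf-style format needs a matching argument. Omitting one is undefined behavior, and on common ABIs the conversion tends to pick up whatever value is left in the argument slot, which is consistent with the random indices above; compilers with -Wformat usually warn about it. A hypothetical formatter, assuming the name pattern visible in the ethtool output:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical formatter: the RQ index is passed for %d, the counter for %s. */
static int model_ptp_rq_stat_name(char *buf, size_t len, int rq_ix,
				  const char *counter)
{
	assert(buf != NULL && counter != NULL);
	return snprintf(buf, len, "ptp_rq%d_%s", rq_ix, counter);
}
```

With index 0 and counter "packets", this produces "ptp_rq0_packets", matching the corrected output below.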
      
      Before applying the patch:
      ethtool -S eth3 | grep ptp_rq
           ptp_rq15_packets: 0
           ptp_rq8_bytes: 0
           ptp_rq6_csum_complete: 0
           ptp_rq14_csum_complete_tail: 0
           ptp_rq3_csum_complete_tail_slow : 0
           ptp_rq9_csum_unnecessary: 0
           ptp_rq1_csum_unnecessary_inner: 0
           ptp_rq7_csum_none: 0
           ptp_rq10_xdp_drop: 0
           ptp_rq9_xdp_redirect: 0
           ptp_rq13_lro_packets: 0
           ptp_rq12_lro_bytes: 0
           ptp_rq10_ecn_mark: 0
           ptp_rq9_removed_vlan_packets: 0
           ptp_rq5_wqe_err: 0
           ptp_rq8_mpwqe_filler_cqes: 0
           ptp_rq2_mpwqe_filler_strides: 0
           ptp_rq5_oversize_pkts_sw_drop: 0
           ptp_rq6_buff_alloc_err: 0
           ptp_rq15_cqe_compress_blks: 0
           ptp_rq2_cqe_compress_pkts: 0
           ptp_rq2_cache_reuse: 0
           ptp_rq12_cache_full: 0
           ptp_rq11_cache_empty: 256
           ptp_rq12_cache_busy: 0
           ptp_rq11_cache_waive: 0
           ptp_rq12_congst_umr: 0
           ptp_rq11_arfs_err: 0
           ptp_rq9_recover: 0
      
      After applying the patch:
      ethtool -S eth3 | grep ptp_rq
           ptp_rq0_packets: 0
           ptp_rq0_bytes: 0
           ptp_rq0_csum_complete: 0
           ptp_rq0_csum_complete_tail: 0
           ptp_rq0_csum_complete_tail_slow : 0
           ptp_rq0_csum_unnecessary: 0
           ptp_rq0_csum_unnecessary_inner: 0
           ptp_rq0_csum_none: 0
           ptp_rq0_xdp_drop: 0
           ptp_rq0_xdp_redirect: 0
           ptp_rq0_lro_packets: 0
           ptp_rq0_lro_bytes: 0
           ptp_rq0_ecn_mark: 0
           ptp_rq0_removed_vlan_packets: 0
           ptp_rq0_wqe_err: 0
           ptp_rq0_mpwqe_filler_cqes: 0
           ptp_rq0_mpwqe_filler_strides: 0
           ptp_rq0_oversize_pkts_sw_drop: 0
           ptp_rq0_buff_alloc_err: 0
           ptp_rq0_cqe_compress_blks: 0
           ptp_rq0_cqe_compress_pkts: 0
           ptp_rq0_cache_reuse: 0
           ptp_rq0_cache_full: 0
           ptp_rq0_cache_empty: 256
           ptp_rq0_cache_busy: 0
           ptp_rq0_cache_waive: 0
           ptp_rq0_congst_umr: 0
           ptp_rq0_arfs_err: 0
           ptp_rq0_recover: 0
      
      Fixes: a28359e9 ("net/mlx5e: Add PTP-RX statistics")
      Signed-off-by: Lama Kayal <lkayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix setting number of EQs of SFs · f88c4876
      Shay Drory authored
      When setting the number of completion EQs of the SF, consider the
      number of online CPUs. Without this consideration, when fewer than 8
      CPUs are online, 8 completion EQs are still allocated, some of them
      unnecessarily.
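The sizing rule reduces to a clamp. A sketch assuming the limit of 8 from the text and an online-CPU count supplied by the caller (num_online_cpus() in the kernel); the helper and macro names are hypothetical.

```c
#include <assert.h>

#define MODEL_SF_MAX_COMP_EQS 8 /* the upper bound mentioned in the text */

/* Hypothetical helper: never allocate more completion EQs than online CPUs. */
static int model_sf_num_comp_eqs(int online_cpus)
{
	assert(online_cpus > 0);
	return online_cpus < MODEL_SF_MAX_COMP_EQS ? online_cpus
						   : MODEL_SF_MAX_COMP_EQS;
}
```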
      
      Fixes: c36326d3 ("net/mlx5: Round-Robin EQs over IRQs")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix length of irq_index in chars · ac8b7d50
      Shay Drory authored
      The maximum irq_index can be 2047, which means irq_name must reserve
      4 characters for the irq_index. Hence, increase the reserved length to 4.
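The arithmetic behind the fix: 2047 has four decimal digits, so the buffer needs four characters for the index plus the terminating NUL. The sketch below assumes a hypothetical "mlx5_comp" prefix and uses snprintf()'s return value to detect truncation; all MODEL_* names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical layout: prefix, up to 4 index digits, and a NUL. */
#define MODEL_IRQ_PREFIX "mlx5_comp"
#define MODEL_IRQ_INDEX_CHARS 4
#define MODEL_IRQ_NAME_LEN (sizeof(MODEL_IRQ_PREFIX) - 1 + MODEL_IRQ_INDEX_CHARS + 1)

static int count_digits(unsigned int v)
{
	int n = 1;

	while (v >= 10) {
		v /= 10;
		n++;
	}
	return n;
}

/* Returns 0 on success, -1 if the name would be truncated. */
static int model_irq_name(char buf[MODEL_IRQ_NAME_LEN], unsigned int irq_index)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, MODEL_IRQ_NAME_LEN, MODEL_IRQ_PREFIX "%u", irq_index);
	return (n < 0 || (size_t)n >= MODEL_IRQ_NAME_LEN) ? -1 : 0;
}
```

Index 2047 fits exactly, while a five-digit index would be reported as truncated.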
      
      Fixes: 3af26495 ("net/mlx5: Enlarge interrupt field in CREATE_EQ")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Avoid generating event after PPS out in Real time mode · 99b9a678
      Aya Levin authored
      When in Real-time mode, the HW clock is synced with the PTP daemon.
      Hence, the driver should not re-calibrate the next pulse (via the
      MTPPSE repetitive events mechanism).
      
      This patch arms repetitive events only in free-running mode.
      
      Fixes: 432119de ("net/mlx5: Add cyc2time HW translation mode support")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Force round second at 1PPS out start time · 64728294
      Aya Levin authored
      Allow configuring the 1PPS start time only with a timestamp that
      represents a round second. Prior to this patch, the driver allowed
      setting a non-round-second start time, which the device does not
      support. Avoid unexpected behavior by restricting the start-time
      configuration to a round second.
      
      Fixes: 4272f9b8 ("net/mlx5e: Change 1PPS out scheme")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-Switch, Fix double allocation of acl flow counter · a586775f
      Moshe Shemesh authored
      The flow counter is allocated in the eswitch legacy ACL setting
      functions without checking whether it was already allocated by a
      previous setting. Add a check to avoid such double allocation.
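The fix follows the allocate-once pattern: check the slot before allocating. A minimal model with hypothetical names, using calloc() in place of the device flow-counter allocation and a global counter to observe how many allocations happen:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_counter { int id; };

/* Hypothetical ACL state holding an optional flow counter. */
struct model_acl {
	struct model_counter *counter;
};

static int allocations; /* counts calls to model_counter_alloc() */

static struct model_counter *model_counter_alloc(void)
{
	allocations++;
	return calloc(1, sizeof(struct model_counter));
}

/* Allocate only if a previous setting did not already do so. */
static int model_acl_setup(struct model_acl *acl)
{
	assert(acl != NULL);
	if (!acl->counter) {
		acl->counter = model_counter_alloc();
		if (!acl->counter)
			return -ENOMEM;
	}
	return 0;
}

static void model_acl_cleanup(struct model_acl *acl)
{
	free(acl->counter);
	acl->counter = NULL;
}
```

Calling the setup twice reuses the first counter instead of leaking it.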
      
      Fixes: 07bab950 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
      Fixes: ea651a86 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Improve MQPRIO resiliency · 7dbc849b
      Tariq Toukan authored
      * Add netdev->tc_to_txq rollback in case of failure in
        mlx5e_update_netdev_queues().
      * Fix broken transition between the two modes:
        MQPRIO DCB mode with tc==8, and MQPRIO channel mode.
      * Disable MQPRIO channel mode if re-attaching with a different number
        of channels.
      * Improve code sharing.
      
      Fixes: ec60c458 ("net/mlx5e: Support MQPRIO channel mode")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Keep the value for maximum number of channels in-sync · 9d758d4a
      Tariq Toukan authored
      The value for maximum number of channels is first calculated based
      on the netdev's profile and current function resources (specifically,
      number of MSIX vectors, which depends among other things on the number
      of online cores in the system).
      This value is then used to calculate the netdev's number of rxqs/txqs.
      Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
      is constant and we must not exceed it.
      
      To achieve this, keep the maximum number of channels in sync upon any
      netdevice re-attach.
      
      Use mlx5e_get_max_num_channels() for calculating the number of netdev's
      rxqs/txqs. After the netdev is created, use mlx5e_calc_max_nch() (which
      considers core device resources, profile, and netdev) to init or
      update priv->max_nch.
      
      Before this patch, the value of priv->max_nch might get out of sync,
      mistakenly allowing accesses to out-of-bounds objects, which would
      crash the system.
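The invariant is that max_nch never exceeds the queue count fixed at netdev allocation, whatever the resources look like on re-attach. A hypothetical sketch of that clamp; the real mlx5e_calc_max_nch() also takes the profile into account, which is omitted here.

```c
#include <assert.h>

/* Hypothetical model: the queue count is fixed when the netdev is allocated. */
struct model_priv {
	unsigned int netdev_num_rxqs; /* fixed at creation */
	unsigned int max_nch;         /* must never exceed netdev_num_rxqs */
};

static unsigned int min_u(unsigned int a, unsigned int b) { return a < b ? a : b; }

/* Recompute on every (re-)attach from current resources, clamped to the netdev. */
static void model_update_max_nch(struct model_priv *priv, unsigned int resource_max)
{
	priv->max_nch = min_u(resource_max, priv->netdev_num_rxqs);
	assert(priv->max_nch <= priv->netdev_num_rxqs);
}
```

If resources grow after creation (for example, more online cores), the clamp keeps max_nch at the netdev's queue count instead of allowing out-of-bounds channel indices.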
      
      Track the number of channel stats structures in use in a separate
      field, as they persist across suspend/resume operations. All the
      collected stats of every channel index that ever existed should be
      preserved. They are reset only when struct mlx5e_priv is, in
      mlx5e_priv_cleanup(), which is part of the profile-changing flow.
      
      There is no point anymore in blocking a profile change due to max_nch
      mismatch in mlx5e_netdev_change_profile(). Remove the limitation.
      
      Fixes: a1f240f1 ("net/mlx5e: Adjust to max number of channles when re-attaching")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: IPSEC RX, enable checksum complete · f9a10440
      Raed Salem authored
      Currently, in the RX data path, IPsec crypto-offloaded packets use the
      csum_none flag, so the checksum is handled by the stack. This naturally
      has some performance/CPU utilization impact on such flows. Since Nvidia
      NICs starting from ConnectX6DX provide the checksum complete value out
      of the box for such flows as well, there is no sense in taking the
      csum_none path. Furthermore, the stack (xfrm) has a method to handle
      checksum complete corrections for such flows, i.e. IPsec trailer
      removal and the consequent checksum value adjustment.
      
      Because of the above, and because ConnectX6DX is the first HW that
      supports IPsec crypto offload, it is safe to report checksum complete
      for IPsec offloaded traffic.
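CHECKSUM_COMPLETE carries a one's-complement sum over the packet, so removing a trailer only requires subtracting that trailer's partial sum rather than recomputing from scratch. A user-space model of that correction, assuming big-endian 16-bit words and a trailer that starts at an even offset. The kernel has its own helpers (csum_partial(), csum_sub()); these model_* functions are not them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of big-endian 16-bit words (odd tail padded with 0). */
static uint32_t model_csum_partial(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	assert(p != NULL);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	return sum;
}

/* Fold carries back into the low 16 bits (end-around carry). */
static uint16_t model_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* a - b in one's-complement arithmetic: add the complement of b. */
static uint16_t model_csum_sub(uint16_t a, uint16_t b)
{
	return model_csum_fold((uint32_t)a + (uint16_t)~b);
}
```

Subtracting the trailer's sum from the full-packet sum yields the sum over the bytes that remain, which is the adjustment the stack can make when the IPsec trailer is stripped.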
      
      Fixes: b2ac7541 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
      Signed-off-by: Raed Salem <raeds@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>