1. 26 Jul, 2023 28 commits
    • net/mlx5: Unregister devlink params in case interface is down · 53d737df
      Shay Drory authored
      Currently, in case an interface is down, mlx5 driver doesn't
      unregister its devlink params, which leads to this WARN[1].
      Fix it by unregistering devlink params in that case as well.
      
      [1]
      [  295.244769 ] WARNING: CPU: 15 PID: 1 at net/core/devlink.c:9042 devlink_free+0x174/0x1fc
      [  295.488379 ] CPU: 15 PID: 1 Comm: shutdown Tainted: G S         OE 5.15.0-1017.19.3.g0677e61-bluefield #g0677e61
      [  295.509330 ] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS 4.2.0.12761 Jun  6 2023
      [  295.543096 ] pc : devlink_free+0x174/0x1fc
      [  295.551104 ] lr : mlx5_devlink_free+0x18/0x2c [mlx5_core]
      [  295.561816 ] sp : ffff80000809b850
      [  295.711155 ] Call trace:
      [  295.716030 ]  devlink_free+0x174/0x1fc
      [  295.723346 ]  mlx5_devlink_free+0x18/0x2c [mlx5_core]
      [  295.733351 ]  mlx5_sf_dev_remove+0x98/0xb0 [mlx5_core]
      [  295.743534 ]  auxiliary_bus_remove+0x2c/0x50
      [  295.751893 ]  __device_release_driver+0x19c/0x280
      [  295.761120 ]  device_release_driver+0x34/0x50
      [  295.769649 ]  bus_remove_device+0xdc/0x170
      [  295.777656 ]  device_del+0x17c/0x3a4
      [  295.784620 ]  mlx5_sf_dev_remove+0x28/0xf0 [mlx5_core]
      [  295.794800 ]  mlx5_sf_dev_table_destroy+0x98/0x110 [mlx5_core]
      [  295.806375 ]  mlx5_unload+0x34/0xd0 [mlx5_core]
      [  295.815339 ]  mlx5_unload_one+0x70/0xe4 [mlx5_core]
      [  295.824998 ]  shutdown+0xb0/0xd8 [mlx5_core]
      [  295.833439 ]  pci_device_shutdown+0x3c/0xa0
      [  295.841651 ]  device_shutdown+0x170/0x340
      [  295.849486 ]  __do_sys_reboot+0x1f4/0x2a0
      [  295.857322 ]  __arm64_sys_reboot+0x2c/0x40
      [  295.865329 ]  invoke_syscall+0x78/0x100
      [  295.872817 ]  el0_svc_common.constprop.0+0x54/0x184
      [  295.882392 ]  do_el0_svc+0x30/0xac
      [  295.889008 ]  el0_svc+0x48/0x160
      [  295.895278 ]  el0t_64_sync_handler+0xa4/0x130
      [  295.903807 ]  el0t_64_sync+0x1a4/0x1a8
      [  295.911120 ] ---[ end trace 4f1d2381d00d9dce  ]---
      
      Fixes: fe578cbb ("net/mlx5: Move devlink registration before mlx5_load")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix peer domain namespace setting · 62752c0b
      Shay Drory authored
      The offending patch is based on the assumption that for PFs,
      mlx5_get_dev_index() is the same as vhca_id. However, this assumption
      is wrong in case of DPU (ECPF).
      Fix it by using vhca_id directly, and switch the array of peers to
      xarray.
      
      Fixes: 6d5b7321 ("net/mlx5: DR, handle more than one peer domain")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: fs_chains: Fix ft prio if ignore_flow_level is not supported · 61eab651
      Chris Mi authored
      The cited commit sets ft prio to fs_base_prio. But if
      ignore_flow_level is not supported, ft prio must be set based on
      tc filter prio. Otherwise, all the ft prios on the same chain are
      identical, which is invalid if ignore_flow_level is not supported.
      
      Fix it by setting ft prio based on tc filter prio and setting
      fs_base_prio to 0 for fdb.
      
      Fixes: 8e80e564 ("net/mlx5: fs_chains: Refactor to detach chains from tc usage")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: kTLS, Fix protection domain in use syndrome when devlink reload · 3e4cf1dd
      Jianbo Liu authored
      There are DEK objects cached in DEK pool after kTLS is used, and they
      are freed only in mlx5e_ktls_cleanup().
      
      mlx5e_destroy_mdev_resources() is called in mlx5e_suspend() to
      free mdev resources, including protection domain (PD). However, PD is
      still referenced by the cached DEK objects in this case, because
      profile->cleanup() (and therefore mlx5e_ktls_cleanup()) is called
      after mlx5e_suspend() during devlink reload. So the following FW
      syndrome is generated:
      
       mlx5_cmd_out_err:803:(pid 12948): DEALLOC_PD(0x801) op_mod(0x0) failed,
          status bad resource state(0x9), syndrome (0xef0c8a), err(-22)
      
      To avoid this syndrome, move DEK pool destruction to
      mlx5e_ktls_cleanup_tx(), which is called by profile->cleanup_tx(). And
      move pool creation to mlx5e_ktls_init_tx() for symmetry.
      
      Fixes: f741db1a ("net/mlx5e: kTLS, Improve connection rate by using fast update encryption key")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Bridge, set debugfs access right to root-only · eb02b93a
      Vlad Buslov authored
      As suggested during code review, set the access rights for the bridge
      'fdb' debugfs file to root-only.
      
      Fixes: 791eb782 ("net/mlx5: Bridge, expose FDB state via debugfs")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/netdev/20230619120515.5045132a@kernel.org/
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: xsk: Fix crash on regular rq reactivation · 39646d9b
      Dragos Tatulea authored
      When the regular rq is reactivated after the XSK socket is closed,
      it could be reading stale cqes, which eventually corrupts the rq.
      This leads to no more traffic being received on the regular rq and a
      crash on the next close or deactivation of the rq.

      Kal Cutter Conley reported this issue as a crash on the release
      path when the xdpsock sample program is stopped (killed) and restarted
      in sequence while traffic is running.

      This patch flushes all cqes during the rq flush. The cqe flushing
      is done in the reset state of the rq. The mlx5e_rq_to_ready code is
      moved into the flush function to allow for this.
      
      Fixes: 082a9edf ("net/mlx5e: xsk: Flush RQ on XSK activation to save memory")
      Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
      Closes: https://lore.kernel.org/xdp-newbies/CAHApi-nUAs4TeFWUDV915CZJo07XVg2Vp63-no7UDfj6wur9nQ@mail.gmail.com
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: xsk: Fix invalid buffer access for legacy rq · e0f52298
      Dragos Tatulea authored
      The below crash can be encountered when using xdpsock in rx mode for
      legacy rq: the buffer gets released in the XDP_REDIRECT path, and then
      once again in the driver. This fix sets the flag to avoid releasing on
      the driver side.
      
      XSK handling of buffers for legacy rq was relying on the caller to set
      the skip release flag. But the referenced fix started using fragment
      counts for pages instead of the skip flag.
      
      Crash log:
       general protection fault, probably for non-canonical address 0xffff8881217e3a: 0000 [#1] SMP
       CPU: 0 PID: 14 Comm: ksoftirqd/0 Not tainted 6.5.0-rc1+ #31
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       RIP: 0010:bpf_prog_03b13f331978c78c+0xf/0x28
       Code:  ...
       RSP: 0018:ffff88810082fc98 EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff888138404901 RCX: c0ffffc900027cbc
       RDX: ffffffffa000b514 RSI: 00ffff8881217e32 RDI: ffff888138404901
       RBP: ffff88810082fc98 R08: 0000000000091100 R09: 0000000000000006
       R10: 0000000000000800 R11: 0000000000000800 R12: ffffc9000027a000
       R13: ffff8881217e2dc0 R14: ffff8881217e2910 R15: ffff8881217e2f00
       FS:  0000000000000000(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000564cb2e2cde0 CR3: 000000010e603004 CR4: 0000000000370eb0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <TASK>
        ? die_addr+0x32/0x80
        ? exc_general_protection+0x192/0x390
        ? asm_exc_general_protection+0x22/0x30
        ? 0xffffffffa000b514
        ? bpf_prog_03b13f331978c78c+0xf/0x28
        mlx5e_xdp_handle+0x48/0x670 [mlx5_core]
        ? dev_gro_receive+0x3b5/0x6e0
        mlx5e_xsk_skb_from_cqe_linear+0x6e/0x90 [mlx5_core]
        mlx5e_handle_rx_cqe+0x55/0x100 [mlx5_core]
        mlx5e_poll_rx_cq+0x87/0x6e0 [mlx5_core]
        mlx5e_napi_poll+0x45e/0x6b0 [mlx5_core]
        __napi_poll+0x25/0x1a0
        net_rx_action+0x28a/0x300
        __do_softirq+0xcd/0x279
        ? sort_range+0x20/0x20
        run_ksoftirqd+0x1a/0x20
        smpboot_thread_fn+0xa2/0x130
        kthread+0xc9/0xf0
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
       Modules linked in: mlx5_ib mlx5_core rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter overlay zram zsmalloc fuse [last unloaded: mlx5_core]
       ---[ end trace 0000000000000000 ]---
      
      Fixes: 7abd955a ("net/mlx5e: RX, Fix page_pool page fragment tracking for XDP")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Move representor neigh cleanup to profile cleanup_tx · d03b6e6f
      Jianbo Liu authored
      For IP tunnel encapsulation in ECMP (Equal-Cost Multipath) mode, as
      the flow is duplicated to the peer eswitch, the related neighbour
      information on the peer uplink representor is created as well.
      
      In the cited commit, eswitch devcom unpair is moved to the uplink unload
      API, specifically the profile->cleanup_tx. If there is an encap rule
      offloaded in ECMP mode, when one eswitch does unpair (because of
      unloading the driver, for instance), and the peer rule from the peer
      eswitch is going to be deleted, a use-after-free error is triggered
      while accessing neigh info, as it is already cleaned up in uplink's
      profile->disable, which is before its profile->cleanup_tx.

      To fix this issue, move the neigh cleanup to the profile's cleanup_tx
      callback, after mlx5e_cleanup_uplink_rep_tx is called. The neigh
      init is moved to init_tx for symmetry.
      
      [ 2453.376299] BUG: KASAN: slab-use-after-free in mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.379125] Read of size 4 at addr ffff888127af9008 by task modprobe/2496
      
      [ 2453.381542] CPU: 7 PID: 2496 Comm: modprobe Tainted: G    B              6.4.0-rc7+ #15
      [ 2453.383386] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [ 2453.384335] Call Trace:
      [ 2453.384625]  <TASK>
      [ 2453.384891]  dump_stack_lvl+0x33/0x50
      [ 2453.385285]  print_report+0xc2/0x610
      [ 2453.385667]  ? __virt_addr_valid+0xb1/0x130
      [ 2453.386091]  ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.386757]  kasan_report+0xae/0xe0
      [ 2453.387123]  ? mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.387798]  mlx5e_rep_neigh_entry_release+0x109/0x3a0 [mlx5_core]
      [ 2453.388465]  mlx5e_rep_encap_entry_detach+0xa6/0xe0 [mlx5_core]
      [ 2453.389111]  mlx5e_encap_dealloc+0xa7/0x100 [mlx5_core]
      [ 2453.389706]  mlx5e_tc_tun_encap_dests_unset+0x61/0xb0 [mlx5_core]
      [ 2453.390361]  mlx5_free_flow_attr_actions+0x11e/0x340 [mlx5_core]
      [ 2453.391015]  ? complete_all+0x43/0xd0
      [ 2453.391398]  ? free_flow_post_acts+0x38/0x120 [mlx5_core]
      [ 2453.392004]  mlx5e_tc_del_fdb_flow+0x4ae/0x690 [mlx5_core]
      [ 2453.392618]  mlx5e_tc_del_fdb_peers_flow+0x308/0x370 [mlx5_core]
      [ 2453.393276]  mlx5e_tc_clean_fdb_peer_flows+0xf5/0x140 [mlx5_core]
      [ 2453.393925]  mlx5_esw_offloads_unpair+0x86/0x540 [mlx5_core]
      [ 2453.394546]  ? mlx5_esw_offloads_set_ns_peer.isra.0+0x180/0x180 [mlx5_core]
      [ 2453.395268]  ? down_write+0xaa/0x100
      [ 2453.395652]  mlx5_esw_offloads_devcom_event+0x203/0x530 [mlx5_core]
      [ 2453.396317]  mlx5_devcom_send_event+0xbb/0x190 [mlx5_core]
      [ 2453.396917]  mlx5_esw_offloads_devcom_cleanup+0xb0/0xd0 [mlx5_core]
      [ 2453.397582]  mlx5e_tc_esw_cleanup+0x42/0x120 [mlx5_core]
      [ 2453.398182]  mlx5e_rep_tc_cleanup+0x15/0x30 [mlx5_core]
      [ 2453.398768]  mlx5e_cleanup_rep_tx+0x6c/0x80 [mlx5_core]
      [ 2453.399367]  mlx5e_detach_netdev+0xee/0x120 [mlx5_core]
      [ 2453.399957]  mlx5e_netdev_change_profile+0x84/0x170 [mlx5_core]
      [ 2453.400598]  mlx5e_vport_rep_unload+0xe0/0xf0 [mlx5_core]
      [ 2453.403781]  mlx5_eswitch_unregister_vport_reps+0x15e/0x190 [mlx5_core]
      [ 2453.404479]  ? mlx5_eswitch_register_vport_reps+0x200/0x200 [mlx5_core]
      [ 2453.405170]  ? up_write+0x39/0x60
      [ 2453.405529]  ? kernfs_remove_by_name_ns+0xb7/0xe0
      [ 2453.405985]  auxiliary_bus_remove+0x2e/0x40
      [ 2453.406405]  device_release_driver_internal+0x243/0x2d0
      [ 2453.406900]  ? kobject_put+0x42/0x2d0
      [ 2453.407284]  bus_remove_device+0x128/0x1d0
      [ 2453.407687]  device_del+0x240/0x550
      [ 2453.408053]  ? waiting_for_supplier_show+0xe0/0xe0
      [ 2453.408511]  ? kobject_put+0xfa/0x2d0
      [ 2453.408889]  ? __kmem_cache_free+0x14d/0x280
      [ 2453.409310]  mlx5_rescan_drivers_locked.part.0+0xcd/0x2b0 [mlx5_core]
      [ 2453.409973]  mlx5_unregister_device+0x40/0x50 [mlx5_core]
      [ 2453.410561]  mlx5_uninit_one+0x3d/0x110 [mlx5_core]
      [ 2453.411111]  remove_one+0x89/0x130 [mlx5_core]
      [ 2453.411628]  pci_device_remove+0x59/0xf0
      [ 2453.412026]  device_release_driver_internal+0x243/0x2d0
      [ 2453.412511]  ? parse_option_str+0x14/0x90
      [ 2453.412915]  driver_detach+0x7b/0xf0
      [ 2453.413289]  bus_remove_driver+0xb5/0x160
      [ 2453.413685]  pci_unregister_driver+0x3f/0xf0
      [ 2453.414104]  mlx5_cleanup+0xc/0x20 [mlx5_core]
      
      Fixes: 2be5bd42 ("net/mlx5: Handle pairing of E-switch via uplink un/load APIs")
      Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix crash moving to switchdev mode when ntuple offload is set · 3ec43c1b
      Amir Tzin authored
      Moving to switchdev mode with ntuple offload on causes the kernel to
      crash since fs->arfs is freed during nic profile cleanup flow.
      
      Ntuple offload is not supported in switchdev mode and it is already
      unset by mlx5 fix feature ndo in switchdev mode. Verify fs->arfs is
      valid before disabling it.
      
      trace:
      [] RIP: 0010:_raw_spin_lock_bh+0x17/0x30
      [] arfs_del_rules+0x44/0x1a0 [mlx5_core]
      [] mlx5e_arfs_disable+0xe/0x20 [mlx5_core]
      [] mlx5e_handle_feature+0x3d/0xb0 [mlx5_core]
      [] ? __rtnl_unlock+0x25/0x50
      [] mlx5e_set_features+0xfe/0x160 [mlx5_core]
      [] __netdev_update_features+0x278/0xa50
      [] ? netdev_run_todo+0x5e/0x2a0
      [] netdev_update_features+0x22/0x70
      [] ? _cond_resched+0x15/0x30
      [] mlx5e_attach_netdev+0x12a/0x1e0 [mlx5_core]
      [] mlx5e_netdev_attach_profile+0xa1/0xc0 [mlx5_core]
      [] mlx5e_netdev_change_profile+0x77/0xe0 [mlx5_core]
      [] mlx5e_vport_rep_load+0x1ed/0x290 [mlx5_core]
      [] mlx5_esw_offloads_rep_load+0x88/0xd0 [mlx5_core]
      [] esw_offloads_load_rep.part.38+0x31/0x50 [mlx5_core]
      [] esw_offloads_enable+0x6c5/0x710 [mlx5_core]
      [] mlx5_eswitch_enable_locked+0x1bb/0x290 [mlx5_core]
      [] mlx5_devlink_eswitch_mode_set+0x14f/0x320 [mlx5_core]
      [] devlink_nl_cmd_eswitch_set_doit+0x94/0x120
      [] genl_family_rcv_msg_doit.isra.17+0x113/0x150
      [] genl_family_rcv_msg+0xb7/0x170
      [] ? devlink_nl_cmd_port_split_doit+0x100/0x100
      [] genl_rcv_msg+0x47/0xa0
      [] ? genl_family_rcv_msg+0x170/0x170
      [] netlink_rcv_skb+0x4c/0x130
      [] genl_rcv+0x24/0x40
      [] netlink_unicast+0x19a/0x230
      [] netlink_sendmsg+0x204/0x3d0
      [] sock_sendmsg+0x50/0x60
      
      Fixes: 90b22b9b ("net/mlx5e: Disable Rx ntuple offload for uplink representor")
      Signed-off-by: Amir Tzin <amirtz@nvidia.com>
      Reviewed-by: Aya Levin <ayal@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Don't hold encap tbl lock if there is no encap action · 93a33193
      Chris Mi authored
      The cited commit holds encap tbl lock unconditionally when setting
      up dests. But it may cause the following deadlock:
      
       PID: 1063722  TASK: ffffa062ca5d0000  CPU: 13   COMMAND: "handler8"
        #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
        #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
        #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
        #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
        #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
        #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
        #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
        #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
        #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
        #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
       #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
       #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
       #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
       #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
       #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
       #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
       #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0
      
      [1031218.028143]  wait_for_completion+0x24/0x30
      [1031218.028589]  mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
      [1031218.029256]  mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
      [1031218.029885]  process_one_work+0x24e/0x510
      
      There is actually no need to hold the encap tbl lock if there is no
      encap action. Fix it by checking whether an encap action exists before
      taking the encap tbl lock.
      
      Fixes: 37c3b9fa ("net/mlx5e: Prevent encap offload when neigh update is running")
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Honor user input for migratable port fn attr · 0507f2c8
      Shay Drory authored
      Currently, whenever a user sets the migratable port fn attr, the
      driver always turns the migratable capability on.
      Fix it by honoring the user input.
      
      Fixes: e5b9642a ("net/mlx5: E-Switch, Implement devlink port function cmds to control migratable")
      Signed-off-by: Shay Drory <shayd@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: fix return value check in mlx5e_ipsec_remove_trailer() · e5bcb756
      Yuanjun Gong authored
      mlx5e_ipsec_remove_trailer() should return an error code if function
      pskb_trim() returns an unexpected value.
      
      Fixes: 2ac9cfe7 ("net/mlx5e: IPSec, Add Innova IPSec offload TX data path")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: fix potential memory leak in mlx5e_init_rep_rx · c6cf0b60
      Zhengchao Shao authored
      The memory pointed to by the priv->rx_res pointer is not freed in the error
      path of mlx5e_init_rep_rx, which can lead to a memory leak. Fix by freeing
      the memory in the error path, thereby making the error path identical to
      mlx5e_cleanup_rep_rx().
      
      Fixes: af8bbf73 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, fix memory leak in mlx5dr_cmd_create_reformat_ctx · 5dd77585
      Zhengchao Shao authored
      When mlx5_cmd_exec fails in mlx5dr_cmd_create_reformat_ctx, the memory
      pointed to by 'in' is not released, which causes a memory leak. Move the
      memory release after mlx5_cmd_exec.
      
      Fixes: 1d918647 ("net/mlx5: DR, Add direct rule command utilities")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: fix double free in macsec_fs_tx_create_crypto_table_groups · aeb66017
      Zhengchao Shao authored
      In function macsec_fs_tx_create_crypto_table_groups(), when the ft->g
      memory is successfully allocated but the 'in' memory fails to be
      allocated, the memory pointed to by ft->g is released once. And in function
      macsec_fs_tx_create(), macsec_fs_tx_destroy() is called to release the
      memory pointed to by ft->g again. This causes a double free.
      
      Fixes: e467b283 ("net/mlx5e: Add MACsec TX steering rules")
      Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Merge branch 'tools-ynl-gen-fix-parse-multi-attr-enum-attribute' · fa29d467
      Jakub Kicinski authored
      Arkadiusz Kubalewski says:
      
      ====================
      tools: ynl-gen: fix parse multi-attr enum attribute
      
      Fix the issues with parsing enums in ynl.py script.
      ====================
      
      Link: https://lore.kernel.org/r/20230725101642.267248-1-arkadiusz.kubalewski@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tools: ynl-gen: fix parse multi-attr enum attribute · df15c15e
      Arkadiusz Kubalewski authored
      When an attribute is of enum type and marked as multi-attr, the netlink
      response is not parsed and decoding fails with this stack trace:
      Traceback (most recent call last):
        File "/net-next/tools/net/ynl/./test.py", line 520, in <module>
          main()
        File "/net-next/tools/net/ynl/./test.py", line 488, in main
          dplls=dplls_get(282574471561216)
        File "/net-next/tools/net/ynl/./test.py", line 48, in dplls_get
          reply=act(args)
        File "/net-next/tools/net/ynl/./test.py", line 41, in act
          reply = ynl.dump(args.dump, attrs)
        File "/net-next/tools/net/ynl/lib/ynl.py", line 598, in dump
          return self._op(method, vals, dump=True)
        File "/net-next/tools/net/ynl/lib/ynl.py", line 584, in _op
          rsp_msg = self._decode(gm.raw_attrs, op.attr_set.name)
        File "/net-next/tools/net/ynl/lib/ynl.py", line 451, in _decode
          self._decode_enum(rsp, attr_spec)
        File "/net-next/tools/net/ynl/lib/ynl.py", line 408, in _decode_enum
          value = enum.entries_by_val[raw].name
      TypeError: unhashable type: 'list'
      error: 1
      
      Redesign _decode_enum(..) to take an enum int value and translate
      it to either a bitmask or enum name as expected.
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/r/20230725101642.267248-3-arkadiusz.kubalewski@intel.com
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
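The failure and the per-value decoding described in the multi-attr enum commit above can be illustrated with a minimal standalone sketch. It does not reproduce the ynl.py code: `decode_enum`, `decode_attr`, and the `MODES` and `CAPS` tables are hypothetical names, and plain dictionaries stand in for the spec's enum objects. The sketch assumes that a multi-attr attribute arrives as a list of integers, so each element is decoded on its own.

```python
def decode_enum(raw, names_by_val, as_flags=False):
    """Translate one integer into an enum name, or into a set of flag names."""
    if as_flags:
        names = set()
        bit = 0
        # Walk the bitmask; each set bit position selects one entry.
        while raw:
            if raw & 1:
                names.add(names_by_val[bit])
            raw >>= 1
            bit += 1
        return names
    return names_by_val[raw]


def decode_attr(raw, names_by_val, multi_attr=False, as_flags=False):
    """Decode a single value, or every element of a multi-attr list."""
    if multi_attr:
        return [decode_enum(v, names_by_val, as_flags) for v in raw]
    return decode_enum(raw, names_by_val, as_flags)


# Hypothetical tables, not taken from any real netlink spec.
MODES = {1: "manual", 2: "automatic", 3: "holdover"}
CAPS = {0: "priority", 1: "direction", 2: "state"}

# Indexing the table with the whole list fails the way the traceback shows.
try:
    MODES[[1, 3]]
    list_lookup_error = None
except TypeError as exc:
    list_lookup_error = str(exc)

print(list_lookup_error)
print(decode_attr([1, 3], MODES, multi_attr=True))
print(sorted(decode_attr(0b101, CAPS, as_flags=True)))
```

Keeping the decoder scalar-only and looping in the caller is one possible design; it means the same function serves both plain and multi-attr attributes.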
    • tools: ynl-gen: fix enum index in _decode_enum(..) · d7ddf5f4
      Arkadiusz Kubalewski authored
      Remove the wrong index adjustment, which is a leftover from adding
      support for sparse enums.
      The enum.entries_by_val lookup shall not subtract the start-value, as
      it is indexed with the real enum value.
      
      Fixes: c311aaa7 ("tools: ynl: fix enum-as-flags in the generic CLI")
      Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
      Link: https://lore.kernel.org/r/20230725101642.267248-2-arkadiusz.kubalewski@intel.com
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
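The index adjustment removed by the commit above can be shown with a small sketch of why subtracting the start value breaks a table that is keyed by real enum values. All names here (`build_entries_by_val`, `lookup_with_offset`, `lookup_by_value`, `STATES`) are hypothetical and chosen for illustration; a plain dictionary stands in for the real enum table.

```python
def build_entries_by_val(names, start_value=0):
    """Key each entry by its real enum value, beginning at start_value."""
    return {start_value + i: name for i, name in enumerate(names)}


def lookup_with_offset(raw, entries_by_val, start_value):
    """Leftover behaviour: subtract the start value before the lookup."""
    return entries_by_val.get(raw - start_value)


def lookup_by_value(raw, entries_by_val):
    """Index the table with the value exactly as received."""
    return entries_by_val[raw]


# Hypothetical enum whose first entry has the value 5.
STATES = build_entries_by_val(["idle", "busy", "error"], start_value=5)

# With the offset, the lookup asks for key 1, which is not in the table.
print(lookup_with_offset(6, STATES, 5))
print(lookup_by_value(6, STATES))
```

The offset only makes sense for a zero-based list of entries; once the table is keyed by value, the received integer is already the right key.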
    • igc: Fix Kernel Panic during ndo_tx_timeout callback · d4a7ce64
      Muhammad Husaini Zulkifli authored
      The Xeon validation group has been carrying out some loaded tests
      with various HW configurations, and they have seen some transmit
      queue timeouts happening during the test. This causes the
      reset adapter function to be called by igc_tx_timeout().
      Similar race conditions may arise when the interface is being brought
      down and up in igc_reinit_locked(), an interrupt is generated, and
      igc_clean_tx_irq() is called to complete the TX.

      When the igc_tx_timeout() function is invoked, this patch turns
      off all TX ring HW queues during the igc_down() process. TX ring HW
      queues are activated again during the igc_configure_tx_ring() process
      when performing the igc_up() procedure later.

      This patch also moves the existing igc_disable_tx_ring_hw() to avoid
      using a forward declaration.
      
      Kernel trace:
      [ 7678.747813] ------------[ cut here ]------------
      [ 7678.757914] NETDEV WATCHDOG: enp1s0 (igc): transmit queue 2 timed out
      [ 7678.770117] WARNING: CPU: 0 PID: 13 at net/sched/sch_generic.c:525 dev_watchdog+0x1ae/0x1f0
      [ 7678.784459] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat
      nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO)
      cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO) svfs_pci_hotplug(PO)
      vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO) svheartbeat(PO) ioapic(PO)
      sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO) smbus(PO) spiflash_cdf(PO) arden(PO)
      dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO) pch(PO) sviotargets(PO) svbdf(PO) svmem(PO)
      svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO) svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO)
      fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O) ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO)
      regsupport(O) libnvdimm nls_cp437 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel
      snd_intel_dspcfg snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
      [ 7678.784496]  input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm fuse backlight
      configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic pegasus mmc_block usbhid
      mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa scsi_transport_sas e1000e e1000 e100 ax88179_178a
      usbnet xhci_pci sd_mod xhci_hcd t10_pi crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore
      crct10dif_generic ptp crct10dif_common usb_common pps_core
      [ 7679.200403] RIP: 0010:dev_watchdog+0x1ae/0x1f0
      [ 7679.210201] Code: 28 e9 53 ff ff ff 4c 89 e7 c6 05 06 42 b9 00 01 e8 17 d1 fb ff 44 89 e9 4c
      89 e6 48 c7 c7 40 ad fb 81 48 89 c2 e8 52 62 82 ff <0f> 0b e9 72 ff ff ff 65 8b 05 80 7d 7c 7e
      89 c0 48 0f a3 05 0a c1
      [ 7679.245438] RSP: 0018:ffa00000001f7d90 EFLAGS: 00010282
      [ 7679.256021] RAX: 0000000000000000 RBX: ff11000109938440 RCX: 0000000000000000
      [ 7679.268710] RDX: ff11000361e26cd8 RSI: ff11000361e1b880 RDI: ff11000361e1b880
      [ 7679.281314] RBP: ffa00000001f7da8 R08: ff1100035f8fffe8 R09: 0000000000027ffb
      [ 7679.293840] R10: 0000000000001f0a R11: ff1100035f840000 R12: ff11000109938000
      [ 7679.306276] R13: 0000000000000002 R14: dead000000000122 R15: ffa00000001f7e18
      [ 7679.318648] FS:  0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
      [ 7679.332064] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 7679.342757] CR2: 00007ffff7fca168 CR3: 000000013b08a006 CR4: 0000000000471ef8
      [ 7679.354984] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 7679.367207] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
      [ 7679.379370] PKRU: 55555554
      [ 7679.386446] Call Trace:
      [ 7679.393152]  <TASK>
      [ 7679.399363]  ? __pfx_dev_watchdog+0x10/0x10
      [ 7679.407870]  call_timer_fn+0x31/0x110
      [ 7679.415698]  expire_timers+0xb2/0x120
      [ 7679.423403]  run_timer_softirq+0x179/0x1e0
      [ 7679.431532]  ? __schedule+0x2b1/0x820
      [ 7679.439078]  __do_softirq+0xd1/0x295
      [ 7679.446426]  ? __pfx_smpboot_thread_fn+0x10/0x10
      [ 7679.454867]  run_ksoftirqd+0x22/0x30
      [ 7679.462058]  smpboot_thread_fn+0xb7/0x160
      [ 7679.469670]  kthread+0xcd/0xf0
      [ 7679.476097]  ? __pfx_kthread+0x10/0x10
      [ 7679.483211]  ret_from_fork+0x29/0x50
      [ 7679.490047]  </TASK>
      [ 7679.495204] ---[ end trace 0000000000000000 ]---
      [ 7679.503179] igc 0000:01:00.0 enp1s0: Register Dump
      [ 7679.511230] igc 0000:01:00.0 enp1s0: Register Name   Value
      [ 7679.519892] igc 0000:01:00.0 enp1s0: CTRL            181c0641
      [ 7679.528782] igc 0000:01:00.0 enp1s0: STATUS          40280683
      [ 7679.537551] igc 0000:01:00.0 enp1s0: CTRL_EXT        10000040
      [ 7679.546284] igc 0000:01:00.0 enp1s0: MDIC            180a3800
      [ 7679.554942] igc 0000:01:00.0 enp1s0: ICR             00000081
      [ 7679.563503] igc 0000:01:00.0 enp1s0: RCTL            04408022
      [ 7679.571963] igc 0000:01:00.0 enp1s0: RDLEN[0-3]      00001000 00001000 00001000 00001000
      [ 7679.583075] igc 0000:01:00.0 enp1s0: RDH[0-3]        00000068 000000b6 0000000f 00000031
      [ 7679.594162] igc 0000:01:00.0 enp1s0: RDT[0-3]        00000066 000000b2 0000000e 00000030
      [ 7679.605174] igc 0000:01:00.0 enp1s0: RXDCTL[0-3]     02040808 02040808 02040808 02040808
      [ 7679.616196] igc 0000:01:00.0 enp1s0: RDBAL[0-3]      1bb7c000 1bb7f000 1bb82000 0ef33000
      [ 7679.627242] igc 0000:01:00.0 enp1s0: RDBAH[0-3]      00000001 00000001 00000001 00000001
      [ 7679.638256] igc 0000:01:00.0 enp1s0: TCTL            a503f0fa
      [ 7679.646607] igc 0000:01:00.0 enp1s0: TDBAL[0-3]      2ba4a000 1bb6f000 1bb74000 1bb79000
      [ 7679.657609] igc 0000:01:00.0 enp1s0: TDBAH[0-3]      00000001 00000001 00000001 00000001
      [ 7679.668551] igc 0000:01:00.0 enp1s0: TDLEN[0-3]      00001000 00001000 00001000 00001000
      [ 7679.679470] igc 0000:01:00.0 enp1s0: TDH[0-3]        000000a7 0000002d 000000bf 000000d9
      [ 7679.690406] igc 0000:01:00.0 enp1s0: TDT[0-3]        000000a7 0000002d 000000bf 000000d9
      [ 7679.701264] igc 0000:01:00.0 enp1s0: TXDCTL[0-3]     02100108 02100108 02100108 02100108
      [ 7679.712123] igc 0000:01:00.0 enp1s0: Reset adapter
      [ 7683.085967] igc 0000:01:00.0 enp1s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
      [ 8086.945561] ------------[ cut here ]------------
      Entering kdb (current=0xffffffff8220b200, pid 0) on processor 0
      Oops: (null) due to oops @ 0xffffffff81573888
      RIP: 0010:dql_completed+0x148/0x160
      Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db 0f 95
      c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89
      c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc
      RSP: 0018:ffa0000000003e00 EFLAGS: 00010287
      RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000
      RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480
      RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28
      R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9
      R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048
      FS:  0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <IRQ>
       ? igc_poll+0x1a9/0x14d0 [igc]
       __napi_poll+0x2e/0x1b0
       net_rx_action+0x126/0x250
       __do_softirq+0xd1/0x295
       irq_exit_rcu+0xc5/0xf0
       common_interrupt+0x86/0xa0
       </IRQ>
       <TASK>
       asm_common_interrupt+0x27/0x40
      RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
      Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8 1b
      de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf
      4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d
      RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
      RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
      RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
      RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
      R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
      R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
       cpuidle_enter+0x2e/0x50
       call_cpuidle+0x23/0x40
       do_idle+0x1be/0x220
       cpu_startup_entry+0x20/0x30
       rest_init+0xb5/0xc0
       arch_call_rest_init+0xe/0x30
       start_kernel+0x448/0x760
       x86_64_start_kernel+0x109/0x150
       secondary_startup_64_no_verify+0xe0/0xeb
       </TASK>
      more>
      [0]kdb>
      
      [0]kdb>
      [0]kdb> go
      Catastrophic error detected
      kdb_continue_catastrophic=0, type go a second time if you really want to
      continue
      [0]kdb> go
      Catastrophic error detected
      kdb_continue_catastrophic=0, attempting to continue
      [ 8086.955689] refcount_t: underflow; use-after-free.
      [ 8086.955697] WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0xc2/0x110
      [ 8086.955706] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype nft_compat
      nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO) rktpm(PO)
      cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO)
      svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO)
      svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO)
      smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO)
      pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO)
      svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O)
      ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437
      snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
      snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
      [ 8086.955751]  input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm
      fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic
      pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa
      scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi
      crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common
      usb_common pps_core
      [ 8086.955784] RIP: 0010:refcount_warn_saturate+0xc2/0x110
      [ 8086.955788] Code: 01 e8 82 e7 b4 ff 0f 0b 5d c3 cc cc cc cc 80 3d 68 c6 eb 00 00 75 81
      48 c7 c7 a0 87 f6 81 c6 05 58 c6 eb 00 01 e8 5e e7 b4 ff <0f> 0b 5d c3 cc cc cc cc 80 3d
      42 c6 eb 00 00 0f 85 59 ff ff ff 48
      [ 8086.955790] RSP: 0018:ffa0000000003da0 EFLAGS: 00010286
      [ 8086.955793] RAX: 0000000000000000 RBX: ff1100011da40ee0 RCX: ff11000361e1b888
      [ 8086.955794] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ff11000361e1b880
      [ 8086.955795] RBP: ffa0000000003da0 R08: 80000000ffff9f45 R09: ffa0000000003d28
      [ 8086.955796] R10: ff1100035f840000 R11: 0000000000000028 R12: ff11000319ff8000
      [ 8086.955797] R13: ff1100011bb79d60 R14: 00000000ffffffd6 R15: ff1100037039cb00
      [ 8086.955798] FS:  0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
      [ 8086.955800] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 8086.955801] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
      [ 8086.955803] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 8086.955803] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
      [ 8086.955804] PKRU: 55555554
      [ 8086.955805] Call Trace:
      [ 8086.955806]  <IRQ>
      [ 8086.955808]  tcp_wfree+0x112/0x130
      [ 8086.955814]  skb_release_head_state+0x24/0xa0
      [ 8086.955818]  napi_consume_skb+0x9c/0x160
      [ 8086.955821]  igc_poll+0x5d8/0x14d0 [igc]
      [ 8086.955835]  __napi_poll+0x2e/0x1b0
      [ 8086.955839]  net_rx_action+0x126/0x250
      [ 8086.955843]  __do_softirq+0xd1/0x295
      [ 8086.955846]  irq_exit_rcu+0xc5/0xf0
      [ 8086.955851]  common_interrupt+0x86/0xa0
      [ 8086.955857]  </IRQ>
      [ 8086.955857]  <TASK>
      [ 8086.955858]  asm_common_interrupt+0x27/0x40
      [ 8086.955862] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
      [ 8086.955866] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00 31 ff e8
      1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00 49 63 cf 4c 2b 75
      c8 48 8d 04 49 48 89 ca 48 8d
      [ 8086.955867] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
      [ 8086.955869] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
      [ 8086.955870] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
      [ 8086.955871] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
      [ 8086.955872] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
      [ 8086.955873] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
      [ 8086.955875]  cpuidle_enter+0x2e/0x50
      [ 8086.955880]  call_cpuidle+0x23/0x40
      [ 8086.955884]  do_idle+0x1be/0x220
      [ 8086.955887]  cpu_startup_entry+0x20/0x30
      [ 8086.955889]  rest_init+0xb5/0xc0
      [ 8086.955892]  arch_call_rest_init+0xe/0x30
      [ 8086.955895]  start_kernel+0x448/0x760
      [ 8086.955898]  x86_64_start_kernel+0x109/0x150
      [ 8086.955900]  secondary_startup_64_no_verify+0xe0/0xeb
      [ 8086.955904]  </TASK>
      [ 8086.955904] ---[ end trace 0000000000000000 ]---
      [ 8086.955912] ------------[ cut here ]------------
      [ 8086.955913] kernel BUG at lib/dynamic_queue_limits.c:27!
      [ 8086.955918] invalid opcode: 0000 [#1] SMP
      [ 8086.955922] RIP: 0010:dql_completed+0x148/0x160
      [ 8086.955925] Code: c9 00 48 89 57 58 e9 46 ff ff ff 45 85 e4 41 0f 95 c4 41 39 db
      0f 95 c1 41 84 cc 74 05 45 85 ed 78 0a 44 89 c1 e9 27 ff ff ff <0f> 0b 01 f6 44 89
      c1 29 f1 0f 48 ca eb 8c cc cc cc cc cc cc cc cc
      [ 8086.955927] RSP: 0018:ffa0000000003e00 EFLAGS: 00010287
      [ 8086.955928] RAX: 000000000000006c RBX: ffa0000003eb0f78 RCX: ff11000109938000
      [ 8086.955929] RDX: 0000000000000003 RSI: 0000000000000160 RDI: ff110001002e9480
      [ 8086.955930] RBP: ffa0000000003ed8 R08: ff110001002e93c0 R09: ffa0000000003d28
      [ 8086.955931] R10: 0000000000007cc0 R11: 0000000000007c54 R12: 00000000ffffffd9
      [ 8086.955932] R13: ff1100037039cb00 R14: 00000000ffffffd9 R15: ff1100037039c048
      [ 8086.955933] FS:  0000000000000000(0000) GS:ff11000361e00000(0000) knlGS:0000000000000000
      [ 8086.955934] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 8086.955935] CR2: 00007ffff7fca168 CR3: 000000013b08a003 CR4: 0000000000471ef8
      [ 8086.955936] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 8086.955937] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
      [ 8086.955938] PKRU: 55555554
      [ 8086.955939] Call Trace:
      [ 8086.955939]  <IRQ>
      [ 8086.955940]  ? igc_poll+0x1a9/0x14d0 [igc]
      [ 8086.955949]  __napi_poll+0x2e/0x1b0
      [ 8086.955952]  net_rx_action+0x126/0x250
      [ 8086.955956]  __do_softirq+0xd1/0x295
      [ 8086.955958]  irq_exit_rcu+0xc5/0xf0
      [ 8086.955961]  common_interrupt+0x86/0xa0
      [ 8086.955964]  </IRQ>
      [ 8086.955965]  <TASK>
      [ 8086.955965]  asm_common_interrupt+0x27/0x40
      [ 8086.955968] RIP: 0010:cpuidle_enter_state+0xd3/0x3e0
      [ 8086.955971] Code: 73 f1 ff ff 49 89 c6 8b 05 e2 ca a7 00 85 c0 0f 8f b3 02 00 00
      31 ff e8 1b de 75 ff 80 7d d7 00 0f 85 cd 01 00 00 fb 45 85 ff <0f> 88 fd 00 00 00
      49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89 ca 48 8d
      [ 8086.955972] RSP: 0018:ffffffff82203df0 EFLAGS: 00000202
      [ 8086.955973] RAX: ff11000361e2a200 RBX: 0000000000000002 RCX: 000000000000001f
      [ 8086.955974] RDX: 0000000000000000 RSI: 000000003cf3cf3d RDI: 0000000000000000
      [ 8086.955974] RBP: ffffffff82203e28 R08: 0000075ae38471c8 R09: 0000000000000018
      [ 8086.955975] R10: 000000000000031a R11: ffffffff8238dca0 R12: ffd1ffffff200000
      [ 8086.955976] R13: ffffffff8238dca0 R14: 0000075ae38471c8 R15: 0000000000000002
      [ 8086.955978]  cpuidle_enter+0x2e/0x50
      [ 8086.955981]  call_cpuidle+0x23/0x40
      [ 8086.955984]  do_idle+0x1be/0x220
      [ 8086.955985]  cpu_startup_entry+0x20/0x30
      [ 8086.955987]  rest_init+0xb5/0xc0
      [ 8086.955990]  arch_call_rest_init+0xe/0x30
      [ 8086.955992]  start_kernel+0x448/0x760
      [ 8086.955994]  x86_64_start_kernel+0x109/0x150
      [ 8086.955996]  secondary_startup_64_no_verify+0xe0/0xeb
      [ 8086.955998]  </TASK>
      [ 8086.955999] Modules linked in: xt_conntrack nft_chain_nat xt_MASQUERADE xt_addrtype
      nft_compat nf_tables nfnetlink br_netfilter bridge stp llc overlay dm_mod emrcha(PO) emriio(PO)
      rktpm(PO) cegbuf_mod(PO) patch_update(PO) se(PO) sgx_tgts(PO) mktme(PO) keylocker(PO) svtdx(PO)
      svfs_pci_hotplug(PO) vtd_mod(PO) davemem(PO) svmabort(PO) svindexio(PO) usbx2(PO) ehci_sched(PO)
      svheartbeat(PO) ioapic(PO) sv8259(PO) svintr(PO) lt(PO) pcierootport(PO) enginefw_mod(PO) ata(PO)
      smbus(PO) spiflash_cdf(PO) arden(PO) dsa_iax(PO) oobmsm_punit(PO) cpm(PO) svkdb(PO) ebg_pch(PO)
      pch(PO) sviotargets(PO) svbdf(PO) svmem(PO) svbios(PO) dram(PO) svtsc(PO) targets(PO) superio(PO)
      svkernel(PO) cswitch(PO) mcf(PO) pentiumIII_mod(PO) fs_svfs(PO) mdevdefdb(PO) svfs_os_services(O)
      ixgbe mdio mdio_devres libphy emeraldrapids_svdefs(PO) regsupport(O) libnvdimm nls_cp437
      snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg
      snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core snd_pcm snd_timer isst_if_mbox_pci
      [ 8086.956029]  input_leds isst_if_mmio sg snd isst_if_common soundcore wmi button sad9(O) drm
      fuse backlight configfs efivarfs ip_tables x_tables vmd sdhci led_class rtl8150 r8152 hid_generic
      pegasus mmc_block usbhid mmc_core hid megaraid_sas ixgb igb i2c_algo_bit ice i40e hpsa
      scsi_transport_sas e1000e e1000 e100 ax88179_178a usbnet xhci_pci sd_mod xhci_hcd t10_pi
      crc32c_intel crc64_rocksoft igc crc64 crc_t10dif usbcore crct10dif_generic ptp crct10dif_common
      usb_common pps_core
      [16762.543675] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.593 msecs
      [16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.595 msecs
      [16762.543673] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.495 msecs
      [16762.543679] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs
      [16762.543678] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.598 msecs
      [16762.543690] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.605 msecs
      [16762.543684] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.599 msecs
      [16762.543693] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 8675587.613 msecs
      [16762.543784] ---[ end trace 0000000000000000 ]---
      [16762.849099] RIP: 0010:dql_completed+0x148/0x160
      PANIC: Fatal exception in interrupt
      
      Fixes: 9b275176 ("igc: Add ndo_tx_timeout support")
      Tested-by: Alejandra Victoria Alcaraz <alejandra.victoria.alcaraz@intel.com>
      Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
      Acked-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Naama Meir <naamax.meir@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4a7ce64
    • Christian Marangi's avatar
      net: dsa: qca8k: fix mdb add/del case with 0 VID · dfd739f1
      Christian Marangi authored
      The qca8k switch doesn't support using 0 as a VID and requires a default
      VID to always be set. The MDB add/del functions don't currently handle
      this and are currently not setting the default VID.
      
      Fix this by correctly handling this corner case and internally using the
      default VID for the VID 0 case.
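
A minimal sketch of the VID 0 handling described above. `DEMO_DEFAULT_VID` (assumed here to be 1) and `demo_effective_vid()` are hypothetical names used only for illustration; they are not the driver's identifiers.

```c
#include <stdint.h>

/* Assumption: the switch's default VID is 1. */
#define DEMO_DEFAULT_VID 1

/* Map the unsupported VID 0 to the default VID; pass other VIDs through. */
static uint16_t demo_effective_vid(uint16_t vid)
{
	return vid ? vid : DEMO_DEFAULT_VID;
}
```

Both the add and the del path would apply the same mapping before touching the table, so that an entry added with VID 0 can later be found and removed with VID 0.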
      
      Fixes: ba8f870d ("net: dsa: qca8k: add support for mdb_add/del")
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dfd739f1
    • Christian Marangi's avatar
      net: dsa: qca8k: fix broken search_and_del · ae70dcb9
      Christian Marangi authored
      On deleting an MDB entry for a port, fdb_search_and_del is used.
      An FDB entry can't be modified, so it needs to be deleted and re-added
      with the new portmap (with the requested port removed).
      
      We use the SEARCH operator to look up the entry to edit by VID and MAC
      address, and then check the aging value to see if an entry was actually found.
      
      Currently the code suffers from a bug where the searched FDB entry is
      never read back with the found values (if found), resulting in the code
      always returning -EINVAL as aging was always 0.
      
      Fix this by correctly reading the FDB entry after it was searched.
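
The search, read back, check, delete, re-add sequence described above can be sketched in user space as follows. This is only an illustration under stated assumptions: the in-memory table, the `demo_regs` result buffer, and every `demo_*` name are hypothetical stand-ins for the switch hardware and are not the driver's code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TABLE_SIZE 4

struct demo_fdb {
	uint16_t vid;
	uint8_t mac[6];
	uint8_t port_mask;
	uint8_t aging;		/* 0 means "no entry" */
};

static struct demo_fdb demo_table[DEMO_TABLE_SIZE];	/* stand-in for the switch table */
static struct demo_fdb demo_regs;			/* stand-in for the result registers */

static struct demo_fdb *demo_lookup(uint16_t vid, const uint8_t *mac)
{
	for (int i = 0; i < DEMO_TABLE_SIZE; i++) {
		struct demo_fdb *e = &demo_table[i];

		if (e->aging && e->vid == vid && !memcmp(e->mac, mac, 6))
			return e;
	}
	return NULL;
}

/* SEARCH: latch the matching entry into the result registers (zeros if none). */
static void demo_search(uint16_t vid, const uint8_t *mac)
{
	struct demo_fdb *e = demo_lookup(vid, mac);

	memset(&demo_regs, 0, sizeof(demo_regs));
	if (e)
		demo_regs = *e;
}

/* Read the result registers back into the caller's copy. */
static void demo_read(struct demo_fdb *out)
{
	*out = demo_regs;
}

static int demo_search_and_del(uint16_t vid, const uint8_t *mac, uint8_t port_mask)
{
	struct demo_fdb fdb = { 0 };
	struct demo_fdb *slot;

	demo_search(vid, mac);
	demo_read(&fdb);	/* without this read, fdb.aging stays 0 */
	if (!fdb.aging)
		return -EINVAL;	/* nothing found */

	slot = demo_lookup(vid, mac);
	memset(slot, 0, sizeof(*slot));		/* delete the old entry */
	fdb.port_mask &= (uint8_t)~port_mask;	/* drop the requested port */
	if (fdb.port_mask)
		*slot = fdb;			/* re-add with the new portmap */
	return 0;
}
```

If the `demo_read()` call is removed, `fdb.aging` keeps its initial value of 0 and the function returns -EINVAL for every request, which mirrors the symptom described in the text.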
      
      Fixes: ba8f870d ("net: dsa: qca8k: add support for mdb_add/del")
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae70dcb9
    • Christian Marangi's avatar
      net: dsa: qca8k: fix search_and_insert wrong handling of new rule · 80248d41
      Christian Marangi authored
      On inserting an MDB entry, fdb_search_and_insert is used to add a port to
      the qca8k target entry in the FDB.
      
      An FDB entry can't be modified, so it needs to be removed and inserted again
      with the new values.
      
      To detect if an entry already exists, the SEARCH operation is used and we
      check the aging of the entry. If the aging is not 0, the entry exists and
      we proceed to delete it.
      
      The current code has 2 main problems:
      - The condition to check if the FDB entry exists is wrong and should be
        the opposite.
      - When an FDB entry doesn't exist, aging was never actually set to the
        STATIC value, resulting in allocating an invalid entry.
      
      Fix both problems by adding aging support to the function, calling the
      function with STATIC as aging by default, and finally by correcting the
      condition to check if the entry actually exists.
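
The two fixes listed above can be sketched with a single-slot stand-in for the table. All `demo_*` names and the `DEMO_AGING_STATIC` value are assumptions made for illustration, not the driver's definitions.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: any non-zero value marks a valid entry; 0xf stands for STATIC. */
#define DEMO_AGING_STATIC 0xf

struct demo_fdb {
	uint8_t port_mask;
	uint8_t aging;		/* 0 means "no entry" */
};

static struct demo_fdb demo_slot;	/* single-entry stand-in for the FDB */

static void demo_search_and_insert(uint8_t port_mask, uint8_t aging)
{
	struct demo_fdb fdb = demo_slot;	/* SEARCH plus read back */

	if (fdb.aging) {
		/* Entry exists: it must be deleted before being re-inserted. */
		memset(&demo_slot, 0, sizeof(demo_slot));
	} else {
		/* No entry yet: start from the requested aging value. */
		fdb.aging = aging;
	}

	fdb.port_mask |= port_mask;	/* add the new port to the portmap */
	demo_slot = fdb;		/* (re-)insert the entry */
}
```

With the inverted condition, a missing entry would take the delete branch and be inserted with aging 0, which is the invalid entry the text describes.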
      
      Fixes: ba8f870d ("net: dsa: qca8k: add support for mdb_add/del")
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80248d41
    • Christian Marangi's avatar
      net: dsa: qca8k: enable use_single_write for qca8xxx · 2c39dd02
      Christian Marangi authored
      The qca8xxx switch supports 2 ways to write reg values: a slow way using
      mdio and a fast way that sends specially crafted mgmt packets to
      read/write regs.
      
      The fast way can support up to 32 bytes of data, as eth packets are used
      to send/receive.
      
      This works correctly for almost the entire regmap of the switch, but
      some kernel selftests for dsa drivers uncovered a funny and interesting
      hw defect/limitation.
      
      For some specific regs, a bulk write won't work and will result in writing
      only part of the requested regs, leaving half of the data unwritten. This was
      especially hard to track down due to the total strangeness of
      the problem and the specific regs where it occurs.
      
      This occurs in the regs of the ATU table, where multiple regs
      need to be written to compose the entire entry.
      It was discovered that with a bulk write of 12 bytes on
      QCA8K_REG_ATU_DATA0, only QCA8K_REG_ATU_DATA0 and QCA8K_REG_ATU_DATA2
      were written, but QCA8K_REG_ATU_DATA1 was always zero.
      Tcpdump was used to make sure the specially crafted packet was correct,
      and this was confirmed.
      
      The problem was hard to track because the missing QCA8K_REG_ATU_DATA1
      still resulted in a plausible entry, as the first bytes of the mac
      address are set in QCA8K_REG_ATU_DATA0 and the entry type is set in
      QCA8K_REG_ATU_DATA2.
      
      Funnily enough, writing QCA8K_REG_ATU_DATA1 results in the same problem,
      with QCA8K_REG_ATU_DATA2 empty and QCA8K_REG_ATU_DATA1 and
      QCA8K_REG_ATU_FUNC correctly written.
      A speculation on the problem might be that there is some kind of
      indirection internally when accessing these regs and they can't be
      accessed all together, due to the fact that it's really a table mapped
      somewhere in the switch SRAM.
      
      Even more funny is the fact that every other reg was tested with all
      kinds of combinations and they are not affected by this problem. The read
      operation was also tested and always worked, so it's not affected by this
      problem.
      
      The problem is not present if we limit writes to a single reg at a time.
      
      To handle this hardware defect, enable use_single_write so that the bulk
      api correctly splits the write into multiple separate operations,
      effectively reverting to a non-bulk write.
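
A rough user-space sketch of what splitting a bulk write into single-register writes amounts to. `use_single_write` is a field of `struct regmap_config`; everything below (`demo_regs`, `demo_reg_write()`, `demo_bulk_write()`, the 4-byte stride and the register count) is a hypothetical stand-in and not the regmap core's implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_REG_STRIDE 4	/* assumption: 32-bit registers, 4 bytes apart */
#define DEMO_NUM_REGS   8

static uint32_t demo_regs[DEMO_NUM_REGS];	/* stand-in for the switch registers */
static unsigned int demo_write_ops;		/* counts individual write operations */

/* One write operation touching exactly one register. */
static void demo_reg_write(size_t index, uint32_t val)
{
	demo_regs[index] = val;
	demo_write_ops++;
}

/* "Bulk" write that is always issued as a series of single-register writes. */
static int demo_bulk_write(uint32_t reg, const uint32_t *vals, size_t count)
{
	size_t first = reg / DEMO_REG_STRIDE;

	if (reg % DEMO_REG_STRIDE || count > DEMO_NUM_REGS ||
	    first > DEMO_NUM_REGS - count)
		return -1;

	for (size_t i = 0; i < count; i++)
		demo_reg_write(first + i, vals[i]);

	return 0;
}
```

A 12-byte write starting at the first register therefore becomes three separate operations, one per register, which is the access pattern the text reports as working on the ATU regs.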
      
      Cc: Mark Brown <broonie@kernel.org>
      Fixes: c766e077 ("net: dsa: qca8k: convert to regmap read/write API")
      Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2c39dd02
    • Alex Elder's avatar
      net: ipa: only reset hashed tables when supported · e11ec2b8
      Alex Elder authored
      Last year, the code that manages GSI channel transactions switched
      from using spinlock-protected linked lists to using indexes into the
      ring buffer used for a channel.  Recently, Google reported seeing
      transaction reference count underflows occasionally during shutdown.
      
      Doug Anderson found a way to reproduce the issue reliably, and
      bisected the issue to the commit that eliminated the linked lists
      and the lock.  The root cause was ultimately determined to be
      related to unused transactions being committed as part of the modem
      shutdown cleanup activity.  Unused transactions are not normally
      expected (except in error cases).
      
      The modem uses some ranges of IPA-resident memory, and whenever it
      shuts down we zero those ranges.  In ipa_filter_reset_table() a
      transaction is allocated to zero modem filter table entries.  If
      hashing is not supported, hashed table memory should not be zeroed.
      But currently nothing prevents that, and the result is an unused
      transaction.  Something similar occurs when we zero routing table
      entries for the modem.
      
      By preventing any attempt to clear hashed tables when hashing is not
      supported, the reference count underflow is avoided in this case.
      
      Note that there likely remains an issue with properly freeing unused
      transactions (if they occur due to errors).  This patch addresses
      only the underflows that Google originally reported.
      
      Cc: <stable@vger.kernel.org> # 6.1.x
      Fixes: d338ae28 ("net: ipa: kill all other transaction lists")
      Tested-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Alex Elder <elder@linaro.org>
      Link: https://lore.kernel.org/r/20230724224055.1688854-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      e11ec2b8
    • Jakub Kicinski's avatar
      Merge branch 'net-fix-error-warning-by-fstrict-flex-arrays-3' · a49441c9
      Jakub Kicinski authored
      Kuniyuki Iwashima says:
      
      ====================
      net: Fix error/warning by -fstrict-flex-arrays=3.
      
      df8fc4e9 ("kbuild: Enable -fstrict-flex-arrays=3") started applying
      strict rules for standard string functions (strlen(), memcpy(), etc.) if
      CONFIG_FORTIFY_SOURCE=y.
      
      This series fixes two false positives caught by syzkaller.
      
      v2: https://lore.kernel.org/netdev/20230720004410.87588-1-kuniyu@amazon.com/
      v1: https://lore.kernel.org/netdev/20230719185322.44255-1-kuniyu@amazon.com/
      ====================
      
      Link: https://lore.kernel.org/r/20230724213425.22920-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a49441c9
    • Kuniyuki Iwashima's avatar
      af_packet: Fix warning of fortified memcpy() in packet_getname(). · a0ade840
      Kuniyuki Iwashima authored
      syzkaller found a warning in packet_getname() [0], where we try to
      copy 16 bytes to sockaddr_ll.sll_addr[8].
      
      Some devices (ip6gre, vti6, ip6tnl) have 16-byte addresses expressed
      by struct in6_addr.  Also, Infiniband has 32 bytes as MAX_ADDR_LEN.
      
      The write seems to overflow, but it actually does not, since we use struct
      sockaddr_storage defined in __sys_getsockname() and its size is 128
      (_K_SS_MAXSIZE) bytes.  Thus, we have sufficient room after sll_addr[]
      as __data[].
      
      To avoid the warning, let's add a flex array member union-ed with
      sll_addr.
      
      Another option would be to use strncpy() and limit the copied length
      to sizeof(sll_addr), but it will return the partial address and break
      an application that passes sockaddr_storage to getsockname().
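
The size argument above can be checked with a small user-space sketch. `struct demo_sockaddr_ll` only mirrors the field layout of `struct sockaddr_ll` for illustration, and `DEMO_SS_MAXSIZE` stands for the 128-byte `_K_SS_MAXSIZE` mentioned in the text; all `demo_*` names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_SS_MAXSIZE 128	/* mirrors _K_SS_MAXSIZE from the text */

/* Illustrative copy of the sockaddr_ll field layout. */
struct demo_sockaddr_ll {
	unsigned short sll_family;
	unsigned short sll_protocol;
	int            sll_ifindex;
	unsigned short sll_hatype;
	unsigned char  sll_pkttype;
	unsigned char  sll_halen;
	unsigned char  sll_addr[8];
};

/* Does an address of addr_len bytes, placed at sll_addr, fit in the storage? */
static int demo_addr_fits(size_t addr_len)
{
	return addr_len <= DEMO_SS_MAXSIZE -
			   offsetof(struct demo_sockaddr_ll, sll_addr);
}

/* Copy the address into a storage-sized buffer; returns bytes used or -1. */
static int demo_getname(unsigned char storage[DEMO_SS_MAXSIZE],
			const unsigned char *addr, size_t addr_len)
{
	size_t off = offsetof(struct demo_sockaddr_ll, sll_addr);

	if (!demo_addr_fits(addr_len))
		return -1;

	memcpy(storage + off, addr, addr_len);
	return (int)(off + addr_len);
}
```

Both the 16-byte and the 32-byte cases from the text fit inside the 128-byte storage, so the copy is safe even though it extends past the 8 bytes declared for `sll_addr`.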
      
      [0]:
      memcpy: detected field-spanning write (size 16) of single field "sll->sll_addr" at net/packet/af_packet.c:3604 (size 8)
      WARNING: CPU: 0 PID: 255 at net/packet/af_packet.c:3604 packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604
      Modules linked in:
      CPU: 0 PID: 255 Comm: syz-executor750 Not tainted 6.5.0-rc1-00330-g60cc1f7d #4
      Hardware name: linux,dummy-virt (DT)
      pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604
      lr : packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604
      sp : ffff800089887bc0
      x29: ffff800089887bc0 x28: ffff000010f80f80 x27: 0000000000000003
      x26: dfff800000000000 x25: ffff700011310f80 x24: ffff800087d55000
      x23: dfff800000000000 x22: ffff800089887c2c x21: 0000000000000010
      x20: ffff00000de08310 x19: ffff800089887c20 x18: ffff800086ab1630
      x17: 20646c6569662065 x16: 6c676e697320666f x15: 0000000000000001
      x14: 1fffe0000d56d7ca x13: 0000000000000000 x12: 0000000000000000
      x11: 0000000000000000 x10: 0000000000000000 x9 : 3e60944c3da92b00
      x8 : 3e60944c3da92b00 x7 : 0000000000000001 x6 : 0000000000000001
      x5 : ffff8000898874f8 x4 : ffff800086ac99e0 x3 : ffff8000803f8808
      x2 : 0000000000000001 x1 : 0000000100000000 x0 : 0000000000000000
      Call trace:
       packet_getname+0x25c/0x3a0 net/packet/af_packet.c:3604
       __sys_getsockname+0x168/0x24c net/socket.c:2042
       __do_sys_getsockname net/socket.c:2057 [inline]
       __se_sys_getsockname net/socket.c:2054 [inline]
       __arm64_sys_getsockname+0x7c/0x94 net/socket.c:2054
       __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
       invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
       el0_svc_common+0x134/0x240 arch/arm64/kernel/syscall.c:139
       do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:188
       el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:647
       el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
       el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
      
      Fixes: df8fc4e9 ("kbuild: Enable -fstrict-flex-arrays=3")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Suggested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230724213425.22920-3-kuniyu@amazon.com
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a0ade840
    • Kuniyuki Iwashima's avatar
      af_unix: Fix fortify_panic() in unix_bind_bsd(). · 06d4c8a8
      Kuniyuki Iwashima authored
      syzkaller found a bug in unix_bind_bsd() [0].  We can reproduce it
      by bind()ing a socket on a path with length 108.
      
      108 is the size of sun_path of struct sockaddr_un and is the maximum
      valid length for the pathname socket.  When calling bind(), we use
      struct sockaddr_storage as the actual buffer size, so terminating
      sun_path[108] with null is legitimate as done in unix_mkname_bsd().
      
      However, strlen(sunaddr) for such a case causes fortify_panic() if
      CONFIG_FORTIFY_SOURCE=y.  __fortify_strlen() has no idea about the
      actual buffer size and sees the string as unterminated.
      
      Let's use strnlen() to allow sun_path to be unterminated at 107.
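
A minimal sketch of the bounded length check. To stay within ISO C, the helper below reimplements the `strnlen()` behaviour with `memchr()`; `demo_bounded_len()`, `demo_sun_path_len()` and `DEMO_UNIX_PATH_MAX` are hypothetical names, with 108 taken from the text.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_UNIX_PATH_MAX 108	/* size of sun_path, per the text */

/* Same result as strnlen(path, max): never looks past max bytes. */
static size_t demo_bounded_len(const char *path, size_t max)
{
	const char *end = memchr(path, '\0', max);

	return end ? (size_t)(end - path) : max;
}

/* Length of a pathname stored in a sun_path-sized field. */
static size_t demo_sun_path_len(const char *sun_path)
{
	return demo_bounded_len(sun_path, DEMO_UNIX_PATH_MAX);
}
```

For a path that fills all 108 bytes with no terminator inside the field, the bounded version stops at the field boundary and reports 108 instead of reading past it.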
      
      [0]:
      detected buffer overflow in __fortify_strlen
      kernel BUG at lib/string_helpers.c:1031!
      Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 0 PID: 255 Comm: syz-executor296 Not tainted 6.5.0-rc1-00330-g60cc1f7d #4
      Hardware name: linux,dummy-virt (DT)
      pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : fortify_panic+0x1c/0x20 lib/string_helpers.c:1030
      lr : fortify_panic+0x1c/0x20 lib/string_helpers.c:1030
      sp : ffff800089817af0
      x29: ffff800089817af0 x28: ffff800089817b40 x27: 1ffff00011302f68
      x26: 000000000000006e x25: 0000000000000012 x24: ffff800087e60140
      x23: dfff800000000000 x22: ffff800089817c20 x21: ffff800089817c8e
      x20: 000000000000006c x19: ffff00000c323900 x18: ffff800086ab1630
      x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000001
      x14: 1ffff00011302eb8 x13: 0000000000000000 x12: 0000000000000000
      x11: 0000000000000000 x10: 0000000000000000 x9 : 64a26b65474d2a00
      x8 : 64a26b65474d2a00 x7 : 0000000000000001 x6 : 0000000000000001
      x5 : ffff800089817438 x4 : ffff800086ac99e0 x3 : ffff800080f19e8c
      x2 : 0000000000000001 x1 : 0000000100000000 x0 : 000000000000002c
      Call trace:
       fortify_panic+0x1c/0x20 lib/string_helpers.c:1030
       _Z16__fortify_strlenPKcU25pass_dynamic_object_size1 include/linux/fortify-string.h:217 [inline]
       unix_bind_bsd net/unix/af_unix.c:1212 [inline]
       unix_bind+0xba8/0xc58 net/unix/af_unix.c:1326
       __sys_bind+0x1ac/0x248 net/socket.c:1792
       __do_sys_bind net/socket.c:1803 [inline]
       __se_sys_bind net/socket.c:1801 [inline]
       __arm64_sys_bind+0x7c/0x94 net/socket.c:1801
       __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
       invoke_syscall+0x98/0x2c0 arch/arm64/kernel/syscall.c:52
       el0_svc_common+0x134/0x240 arch/arm64/kernel/syscall.c:139
       do_el0_svc+0x64/0x198 arch/arm64/kernel/syscall.c:188
       el0_svc+0x2c/0x7c arch/arm64/kernel/entry-common.c:647
       el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:665
       el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
      Code: aa0003e1 d0000e80 91030000 97ffc91a (d4210000)
      
      Fixes: df8fc4e9 ("kbuild: Enable -fstrict-flex-arrays=3")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Suggested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230724213425.22920-2-kuniyu@amazon.com
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • macvlan: add forgotten nla_policy for IFLA_MACVLAN_BC_CUTOFF · 55cef78c
      Lin Ma authored
      The previous commit 954d1fa1 ("macvlan: Add netlink attribute for
      broadcast cutoff") added an attribute named IFLA_MACVLAN_BC_CUTOFF
      to allow broadcast cutoff.
      
      However, it forgot to describe the nla_policy in macvlan_policy
      (drivers/net/macvlan.c). Hence, this supposedly NLA_S32 (4-byte) integer
      can be faked as empty (0 bytes) by a malicious user, which could lead
      to an OOB read in the heap, just like CVE-2023-3773.
      
      To fix it, this commit just completes the nla_policy description for
      IFLA_MACVLAN_BC_CUTOFF. This enforces the length check and avoids the
      potential OOB read.
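
The effect of a missing policy entry can be modeled in userspace. The sketch below is not the kernel's netlink code: `attr_policy`, `attr_len_ok`, the `TYPE_*` kinds, and the `ATTR_*` IDs are hypothetical stand-ins. It assumes only what the message states, namely that an attribute declared as a 4-byte signed integer is rejected when its payload is too short, while an attribute with no declared type gets no length check at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute kinds; TYPE_UNSPEC models a missing policy entry. */
enum attr_type { TYPE_UNSPEC = 0, TYPE_S32 };

/* Hypothetical attribute IDs standing in for the IFLA_MACVLAN_* family. */
enum { ATTR_MODE = 1, ATTR_BC_CUTOFF = 2, ATTR_MAX = 2 };

struct attr_policy { enum attr_type type; };

/* Return true when a payload of `len` bytes is acceptable for attribute `id`. */
static bool attr_len_ok(const struct attr_policy *policy, int id, size_t len)
{
	assert(policy != NULL && id >= 0 && id <= ATTR_MAX);

	switch (policy[id].type) {
	case TYPE_S32:
		return len >= sizeof(int32_t); /* too-short payloads are rejected */
	case TYPE_UNSPEC:
	default:
		return true; /* no policy entry: any length, even 0, passes */
	}
}

/* Policy table with the entry forgotten, and with the entry filled in. */
static const struct attr_policy policy_missing[ATTR_MAX + 1] = {
	[ATTR_MODE] = { TYPE_S32 },
};
static const struct attr_policy policy_fixed[ATTR_MAX + 1] = {
	[ATTR_MODE] = { TYPE_S32 },
	[ATTR_BC_CUTOFF] = { TYPE_S32 },
};
```

With `policy_missing`, a zero-length `ATTR_BC_CUTOFF` passes the check, so a later 4-byte read of its payload would run past the attribute. With `policy_fixed`, the same attribute is rejected before any read happens.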
      
      Fixes: 954d1fa1 ("macvlan: Add netlink attribute for broadcast cutoff")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230723080205.3715164-1-linma@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 25 Jul, 2023 8 commits
  3. 24 Jul, 2023 4 commits
    • tcp: Reduce chance of collisions in inet6_hashfn(). · d11b0df7
      Stewart Smith authored
      For both IPv4 and IPv6, incoming TCP connections are tracked in a hash
      table with a hash over the source & destination addresses and ports.
      However, the IPv6 hash is insufficient and can lead to a high rate of
      collisions.
      
      The IPv6 hash used an XOR to fit everything into the 96 bits for the
      fast jenkins hash, meaning it is possible for an external entity to
      ensure the hash collides, thus falling back to a linear search in the
      bucket, which is slow.
      
      We take the approach of hashing the full length of the IPv6 address in
      __ipv6_addr_jhash() so that all users can benefit from a more secure
      version.
      
      While this may look like it adds overhead, the reality of modern CPUs
      means that this is unmeasurable in real world scenarios.
      
      In simulating with llvm-mca, the increase in cycles for the hashing
      code was ~16 cycles on Skylake (from a base of ~155), and an extra ~9
      on Nehalem (base of ~173).
      
      In commit dd6d2910 ("netfilter: conntrack: switch to siphash")
      netfilter switched from a jenkins hash to a siphash, but even the faster
      hsiphash is a more significant overhead (~20-30%) in some preliminary
      testing.  So, in this patch, we keep to the more conservative approach to
      ensure we don't add much overhead per SYN.
      
      In testing, this results in a consistently even spread across the
      connection buckets.  In both testing and real-world scenarios, we have
      not found any measurable performance impact.
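
The difference between folding and full-length hashing can be illustrated with a small userspace sketch. This is not the kernel code: `mix_words` is a simple FNV-1a-style stand-in for the jenkins hash, the function names are hypothetical, and the choice of which two address words get XORed together is an assumption made for illustration. It shows only the general point from the message: once words are XORed before hashing, an attacker can pick distinct addresses that feed identical input to the hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in mixer (FNV-1a style over 32-bit words); NOT the kernel's jhash. */
static uint32_t mix_words(const uint32_t *w, size_t n, uint32_t seed)
{
	uint32_t h = seed;

	assert(w != NULL);
	for (size_t i = 0; i < n; i++)
		h = (h ^ w[i]) * 16777619u;
	return h;
}

/* Folding variant: XOR two address words so only 96 bits reach the mixer. */
static uint32_t addr_hash_folded(const uint32_t addr[4], uint32_t seed)
{
	uint32_t folded[3] = { addr[0] ^ addr[1], addr[2], addr[3] };

	return mix_words(folded, 3, seed);
}

/* Full-length variant: all 128 address bits feed the mixer. */
static uint32_t addr_hash_full(const uint32_t addr[4], uint32_t seed)
{
	return mix_words(addr, 4, seed);
}
```

For example, the addresses `{1, 2, 3, 4}` and `{2, 1, 3, 4}` always collide under `addr_hash_folded()` regardless of the seed, because the XOR of the first two words is the same. Under `addr_hash_full()` the two addresses are hashed as distinct inputs.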
      
      Fixes: 08dcdbf6 ("ipv6: use a stronger hash for tcp")
      Signed-off-by: Stewart Smith <trawets@amazon.com>
      Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230721222410.17914-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: fec: avoid tx queue timeout when XDP is enabled · bb7a0156
      Wei Fang authored
      In the FEC driver's XDP implementation, the XDP path shares the
      transmit queues with the kernel network stack, so a tx timeout
      event can occur when XDP uses a tx queue almost exclusively. This
      event causes a reset of the FEC hardware.
      To avoid the timeout in this case, we use the txq_trans_cond_update()
      interface to update txq->trans_start to jiffies so that the watchdog
      won't generate a transmit timeout warning.
      
      Fixes: 6d6b39f1 ("net: fec: add initial XDP support")
      Signed-off-by: Wei Fang <wei.fang@nxp.com>
      Link: https://lore.kernel.org/r/20230721083559.2857312-1-wei.fang@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address · 69172f0b
      Maciej Żenczykowski authored
      Currently on 6.4 net/main:
      
        # ip link add dummy1 type dummy
        # echo 1 > /proc/sys/net/ipv6/conf/dummy1/use_tempaddr
        # ip link set dummy1 up
        # ip -6 addr add 2000::1/64 mngtmpaddr dev dummy1
        # ip -6 addr show dev dummy1
      
        11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            inet6 2000::44f3:581c:8ca:3983/64 scope global temporary dynamic
               valid_lft 604800sec preferred_lft 86172sec
            inet6 2000::1/64 scope global mngtmpaddr
               valid_lft forever preferred_lft forever
            inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link
               valid_lft forever preferred_lft forever
      
        # ip -6 addr del 2000::44f3:581c:8ca:3983/64 dev dummy1
      
        (can wait a few seconds if you want to, the above delete isn't [directly] the problem)
      
        # ip -6 addr show dev dummy1
      
        11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            inet6 2000::1/64 scope global mngtmpaddr
               valid_lft forever preferred_lft forever
            inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link
               valid_lft forever preferred_lft forever
      
        # ip -6 addr del 2000::1/64 mngtmpaddr dev dummy1
        # ip -6 addr show dev dummy1
      
        11: dummy1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            inet6 2000::81c9:56b7:f51a:b98f/64 scope global temporary dynamic
               valid_lft 604797sec preferred_lft 86169sec
            inet6 fe80::e8a8:a6ff:fed5:56d4/64 scope link
               valid_lft forever preferred_lft forever
      
      This patch prevents this new 'global temporary dynamic' address from being
      created by the deletion of the related (same subnet prefix) 'mngtmpaddr'
      (which is triggered by there already being no temporary addresses).
      
      Cc: Jiri Pirko <jiri@resnulli.us>
      Fixes: 53bd6749 ("ipv6 addrconf: introduce IFA_F_MANAGETEMPADDR to tell kernel to manage temporary addresses")
      Reported-by: Xiao Ma <xiaom@google.com>
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20230720160022.1887942-1-maze@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ethernet: atheros: fix return value check in atl1e_tso_csum() · 69a184f7
      Yuanjun Gong authored
      atl1e_tso_csum() should check the return value of pskb_trim()
      and return an error code if pskb_trim() returns an unexpected
      value.
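
The pattern being described can be sketched in userspace as follows. The names `model_buf`, `model_trim`, and `model_prepare_tx` are hypothetical and stand in for the socket buffer, the trim helper, and the caller. The sketch assumes only that the trim helper returns 0 on success and a negative error code on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical packet buffer; a stand-in for the kernel's sk_buff. */
struct model_buf {
	size_t len;     /* current packet length in bytes */
	int trim_fails; /* non-zero makes model_trim() fail, for demonstration */
};

/* Hypothetical trim helper: 0 on success, negative errno on failure. */
static int model_trim(struct model_buf *b, size_t new_len)
{
	assert(b != NULL);
	if (b->trim_fails)
		return -ENOMEM;
	if (new_len < b->len)
		b->len = new_len;
	return 0;
}

/* Caller that propagates a trim failure instead of ignoring it. */
static int model_prepare_tx(struct model_buf *b, size_t pkt_len)
{
	int err;

	assert(b != NULL);
	if (b->len > pkt_len) {
		err = model_trim(b, pkt_len);
		if (err)
			return err; /* stop before using a buffer of unexpected length */
	}
	return 0;
}
```

Without the `if (err)` check, the caller would carry on with a buffer that is still longer than `pkt_len`, which is the situation the fix guards against.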
      
      Fixes: a6a53252 ("atl1e: Atheros L1E Gigabit Ethernet driver")
      Signed-off-by: Yuanjun Gong <ruc_gongyuanjun@163.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230720144219.39285-1-ruc_gongyuanjun@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>