22 Aug, 2022 15 commits
    • net/mlx5: Unlock on error in mlx5_sriov_enable() · 35419025
      Dan Carpenter authored
      Unlock before returning if mlx5_device_enable_sriov() fails.
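      A minimal sketch of the fix pattern under the devlink instance lock; the
      surrounding code is illustrative, not the actual patch:

          devl_lock(devlink);
          err = mlx5_device_enable_sriov(dev, num_vfs);
          if (err) {
                  devl_unlock(devlink);   /* previously missing: lock leaked on error */
                  return err;
          }
          /* ... rest of the enable flow ... */
          devl_unlock(devlink);
          return 0;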
      
      Fixes: 84a433a4 ("net/mlx5: Lock mlx5 devlink reload callbacks")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix use after free in mlx5e_fs_init() · 21234e3a
      Dan Carpenter authored
      Call mlx5e_fs_vlan_free(fs) before kvfree(fs).
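      A sketch of the corrected error-path ordering (the label name is an
      assumption for illustration):

          err_free_fs:
                  mlx5e_fs_vlan_free(fs);   /* tears down state reachable through 'fs' */
                  kvfree(fs);               /* only now is it safe to free 'fs' itself */
                  return NULL;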
      
      Fixes: af8bbf73 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: kTLS, Use _safe() iterator in mlx5e_tls_priv_tx_list_cleanup() · 6514210b
      Dan Carpenter authored
      Use the list_for_each_entry_safe() macro to prevent dereferencing "obj"
      after it has been freed.
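      A minimal sketch of the safe-iteration pattern; struct and member names
      are assumptions for illustration:

          struct mlx5e_ktls_offload_context_tx *obj, *n;

          list_for_each_entry_safe(obj, n, &priv_tx_list, list_node) {
                  list_del(&obj->list_node);
                  kfree(obj);     /* safe: the next node was saved in 'n' beforehand */
          }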
      
      Fixes: c4dfe704 ("net/mlx5e: kTLS, Recycle objects of device-offloaded TLS TX connections")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: unlock on error path in esw_vfs_changed_event_handler() · b868c8fe
      Dan Carpenter authored
      Unlock before returning on this error path.
      
      Fixes: f1bc646c ("net/mlx5: Use devl_ API in mlx5_esw_offloads_devlink_port_register")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix wrong tc flag used when set hw-tc-offload off · 550f9643
      Maor Dickman authored
      The cited commit reintroduced the ability to set hw-tc-offload in
      switchdev mode by reusing the NIC mode calls without modifying them to
      support both modes. This can cause an illegal memory access when
      trying to turn hw-tc-offload off.
      
      Fix this by using the right TC_FLAG when checking if tc rules
      are installed while disabling hw-tc-offload.
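      A hedged sketch of the shape of the fix; mlx5e_tc_num_filters(),
      mlx5e_is_uplink_rep() and the MLX5_TC_FLAG() values exist in the driver,
      but the surrounding logic here is illustrative only:

          unsigned long flags = mlx5e_is_uplink_rep(priv) ?
                  MLX5_TC_FLAG(ESW_OFFLOAD) : MLX5_TC_FLAG(NIC_OFFLOAD);

          /* refuse to turn hw-tc-offload off while rules are installed */
          if (!enable && mlx5e_tc_num_filters(priv, flags))
                  return -EINVAL;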
      
      Fixes: d3cbd425 ("net/mlx5e: Add ndo_set_feature for uplink representor")
      Signed-off-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Add missing policer validation · f7a4e867
      Roi Dayan authored
      Policer validation is missing when offloading a police action through
      the tc action API. Add it.
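      A hedged sketch of the kind of check involved; the fields follow struct
      flow_action_entry, but the exact validation rules here are illustrative:

          static int police_validate(const struct flow_action_entry *act,
                                     struct netlink_ext_ack *extack)
          {
                  if (act->police.exceed.act_id != FLOW_ACTION_DROP) {
                          NL_SET_ERR_MSG_MOD(extack,
                                             "Offload not supported when exceed action is not drop");
                          return -EOPNOTSUPP;
                  }
                  return 0;
          }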
      
      Fixes: 7d1a5ce4 ("net/mlx5e: TC, Support tc action api for police")
      Signed-off-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix wrong application of the LRO state · 7b3707fc
      Aya Levin authored
      The driver caches the packet merge type in the mlx5e_params instance,
      which must be in perfect sync with the netdev_features bit. Prior to
      this patch, under certain conditions (*), the LRO state was set in
      mlx5e_params while the netdev_features bit was off, causing LRO to be
      applied on the RQs (HW level).
      
      (*) This can happen only on profile init (mlx5e_build_nic_params()),
      when the RQ expects non-linear SKBs and the PCI bus is fast enough in
      comparison to the link width.
      
      Solution: remove the setting of the packet merge type from
      mlx5e_build_nic_params(), as the netdev features are not updated
      there.
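      A hedged sketch of how the two states stay in sync; the enum values
      exist in the driver, but the placement is illustrative:

          /* derive the cached merge type from the feature bit so the
           * two can never diverge */
          params->packet_merge.type = (netdev->features & NETIF_F_LRO) ?
                  MLX5E_PACKET_MERGE_LRO : MLX5E_PACKET_MERGE_NONE;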
      
      Fixes: 619a8f2a ("net/mlx5e: Use linear SKB in Striding RQ")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Avoid false positive lockdep warning by adding lock_class_key · d59b73a6
      Moshe Shemesh authored
      Add a lock_class_key per mlx5 device to avoid a false positive
      "possible circular locking dependency" warning by lockdep on flows
      which lock more than one mlx5 device, such as adding an SF.
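      A sketch of the per-device key pattern using the standard lockdep API;
      the lock_key field name is an assumption:

          /* at device init: give this device's mutex its own lock class */
          lockdep_register_key(&dev->lock_key);
          mutex_init(&dev->intf_state_mutex);
          lockdep_set_class(&dev->intf_state_mutex, &dev->lock_key);

          /* at device teardown */
          mutex_destroy(&dev->intf_state_mutex);
          lockdep_unregister_key(&dev->lock_key);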
      
      kernel log:
       ======================================================
       WARNING: possible circular locking dependency detected
       5.19.0-rc8+ #2 Not tainted
       ------------------------------------------------------
       kworker/u20:0/8 is trying to acquire lock:
       ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]
      
       but task is already holding lock:
       ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}:
              down_write+0x90/0x150
              blocking_notifier_chain_register+0x53/0xa0
              mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
              mlx5_init_one+0x261/0x490 [mlx5_core]
              probe_one+0x430/0x680 [mlx5_core]
              local_pci_probe+0xd6/0x170
              work_for_cpu_fn+0x4e/0xa0
              process_one_work+0x7c2/0x1340
              worker_thread+0x6f6/0xec0
              kthread+0x28f/0x330
              ret_from_fork+0x1f/0x30
      
       -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
              __lock_acquire+0x2fc7/0x6720
              lock_acquire+0x1c1/0x550
              __mutex_lock+0x12c/0x14b0
              mlx5_init_one+0x2e/0x490 [mlx5_core]
              mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
              auxiliary_bus_probe+0x9d/0xe0
              really_probe+0x1e0/0xaa0
              __driver_probe_device+0x219/0x480
              driver_probe_device+0x49/0x130
              __device_attach_driver+0x1b8/0x280
              bus_for_each_drv+0x123/0x1a0
              __device_attach+0x1a3/0x460
              bus_probe_device+0x1a2/0x260
              device_add+0x9b1/0x1b40
              __auxiliary_device_add+0x88/0xc0
              mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
              blocking_notifier_call_chain+0xd5/0x130
              mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
              process_one_work+0x7c2/0x1340
              worker_thread+0x59d/0xec0
              kthread+0x28f/0x330
              ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&(&notifier->n_head)->rwsem);
                                      lock(&dev->intf_state_mutex);
                                      lock(&(&notifier->n_head)->rwsem);
         lock(&dev->intf_state_mutex);
      
        *** DEADLOCK ***
      
       4 locks held by kworker/u20:0/8:
        #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
        #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
        #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
        #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460
      
       stack backtrace:
       CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
       Call Trace:
        <TASK>
        dump_stack_lvl+0x57/0x7d
        check_noncircular+0x278/0x300
        ? print_circular_bug+0x460/0x460
        ? lock_chain_count+0x20/0x20
        ? register_lock_class+0x1880/0x1880
        __lock_acquire+0x2fc7/0x6720
        ? register_lock_class+0x1880/0x1880
        ? register_lock_class+0x1880/0x1880
        lock_acquire+0x1c1/0x550
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? lockdep_hardirqs_on_prepare+0x400/0x400
        __mutex_lock+0x12c/0x14b0
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? _raw_read_unlock+0x1f/0x30
        ? mutex_lock_io_nested+0x1320/0x1320
        ? __ioremap_caller.constprop.0+0x306/0x490
        ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
        ? iounmap+0x160/0x160
        mlx5_init_one+0x2e/0x490 [mlx5_core]
        mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
        ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
        auxiliary_bus_probe+0x9d/0xe0
        really_probe+0x1e0/0xaa0
        __driver_probe_device+0x219/0x480
        ? auxiliary_match_id+0xe9/0x140
        driver_probe_device+0x49/0x130
        __device_attach_driver+0x1b8/0x280
        ? driver_allows_async_probing+0x140/0x140
        bus_for_each_drv+0x123/0x1a0
        ? bus_for_each_dev+0x1a0/0x1a0
        ? lockdep_hardirqs_on_prepare+0x286/0x400
        ? trace_hardirqs_on+0x2d/0x100
        __device_attach+0x1a3/0x460
        ? device_driver_attach+0x1e0/0x1e0
        ? kobject_uevent_env+0x22d/0xf10
        bus_probe_device+0x1a2/0x260
        device_add+0x9b1/0x1b40
        ? dev_set_name+0xab/0xe0
        ? __fw_devlink_link_to_suppliers+0x260/0x260
        ? memset+0x20/0x40
        ? lockdep_init_map_type+0x21a/0x7d0
        __auxiliary_device_add+0x88/0xc0
        ? auxiliary_device_init+0x86/0xa0
        mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
        blocking_notifier_call_chain+0xd5/0x130
        mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
        ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
        ? lock_downgrade+0x6e0/0x6e0
        ? lockdep_hardirqs_on_prepare+0x286/0x400
        process_one_work+0x7c2/0x1340
        ? lockdep_hardirqs_on_prepare+0x400/0x400
        ? pwq_dec_nr_in_flight+0x230/0x230
        ? rwlock_bug.part.0+0x90/0x90
        worker_thread+0x59d/0xec0
        ? process_one_work+0x1340/0x1340
        kthread+0x28f/0x330
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
      
      Fixes: 6a327321 ("net/mlx5: SF, Port function state change support")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix cmd error logging for manage pages cmd · 090f3e4f
      Roy Novich authored
      When the driver unloads, give/reclaim_pages may fail because the PF
      driver is in its teardown flow; the current code then leads to the
      following kernel log print: 'failed reclaiming pages: err 0'.
      
      Fix it to get the same behavior as before the cited commits, by
      calling mlx5_cmd_check before handling the error state.
      mlx5_cmd_check verifies whether the returned error is an actual error
      that needs to be handled by the driver, and returns an appropriate
      value.
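      A hedged sketch of the call ordering described above; the logging line
      and placement are illustrative:

          err = mlx5_cmd_do(dev, in, inlen, out, outlen);
          /* map teardown-time pseudo-failures to success or a meaningful
           * errno before the error is logged or acted upon */
          err = mlx5_cmd_check(dev, err, in, out);
          if (err)
                  mlx5_core_warn(dev, "failed reclaiming pages: err %d\n", err);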
      
      Fixes: 8d564292 ("net/mlx5: Remove redundant error on reclaim pages")
      Fixes: 4dac2f10 ("net/mlx5: Remove redundant notify fail on give pages")
      Signed-off-by: Roy Novich <royno@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Disable irq when locking lag_lock · 8e93f294
      Vlad Buslov authored
      The lag_lock is taken from both process and softirq contexts, which
      results in a lockdep warning [0] about a potential deadlock. However,
      just disabling softirqs by using the *_bh spinlock API is not enough,
      since that would still trigger a warning in contexts where the lock is
      obtained with hard irqs disabled. To fix the issue, save the current
      irq state, disable irqs before obtaining the lock, and re-enable irqs
      from the saved state after releasing it.
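      A minimal sketch of the conversion described above:

          unsigned long flags;

          spin_lock_irqsave(&lag_lock, flags);    /* was: spin_lock(&lag_lock) */
          /* ... access shared lag state ... */
          spin_unlock_irqrestore(&lag_lock, flags);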
      
      [0]:
      
      [Sun Aug  7 13:12:29 2022] ================================
      [Sun Aug  7 13:12:29 2022] WARNING: inconsistent lock state
      [Sun Aug  7 13:12:29 2022] 5.19.0_for_upstream_debug_2022_08_04_16_06 #1 Not tainted
      [Sun Aug  7 13:12:29 2022] --------------------------------
      [Sun Aug  7 13:12:29 2022] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [Sun Aug  7 13:12:29 2022] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [Sun Aug  7 13:12:29 2022] ffffffffa06dc0d8 (lag_lock){+.?.}-{2:2}, at: mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
      [Sun Aug  7 13:12:29 2022] {SOFTIRQ-ON-W} state was registered at:
      [Sun Aug  7 13:12:29 2022]   lock_acquire+0x1c1/0x550
      [Sun Aug  7 13:12:29 2022]   _raw_spin_lock+0x2c/0x40
      [Sun Aug  7 13:12:29 2022]   mlx5_lag_add_netdev+0x13b/0x480 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   mlx5e_nic_enable+0x114/0x470 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   mlx5e_attach_netdev+0x30e/0x6a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   mlx5e_resume+0x105/0x160 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   mlx5e_probe+0xac3/0x14f0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   auxiliary_bus_probe+0x9d/0xe0
      [Sun Aug  7 13:12:29 2022]   really_probe+0x1e0/0xaa0
      [Sun Aug  7 13:12:29 2022]   __driver_probe_device+0x219/0x480
      [Sun Aug  7 13:12:29 2022]   driver_probe_device+0x49/0x130
      [Sun Aug  7 13:12:29 2022]   __driver_attach+0x1e4/0x4d0
      [Sun Aug  7 13:12:29 2022]   bus_for_each_dev+0x11e/0x1a0
      [Sun Aug  7 13:12:29 2022]   bus_add_driver+0x3f4/0x5a0
      [Sun Aug  7 13:12:29 2022]   driver_register+0x20f/0x390
      [Sun Aug  7 13:12:29 2022]   __auxiliary_driver_register+0x14e/0x260
      [Sun Aug  7 13:12:29 2022]   mlx5e_init+0x38/0x90 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]   vhost_iotlb_itree_augment_rotate+0xcb/0x180 [vhost_iotlb]
      [Sun Aug  7 13:12:29 2022]   do_one_initcall+0xc4/0x400
      [Sun Aug  7 13:12:29 2022]   do_init_module+0x18a/0x620
      [Sun Aug  7 13:12:29 2022]   load_module+0x563a/0x7040
      [Sun Aug  7 13:12:29 2022]   __do_sys_finit_module+0x122/0x1d0
      [Sun Aug  7 13:12:29 2022]   do_syscall_64+0x3d/0x90
      [Sun Aug  7 13:12:29 2022]   entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [Sun Aug  7 13:12:29 2022] irq event stamp: 3596508
      [Sun Aug  7 13:12:29 2022] hardirqs last  enabled at (3596508): [<ffffffff813687c2>] __local_bh_enable_ip+0xa2/0x100
      [Sun Aug  7 13:12:29 2022] hardirqs last disabled at (3596507): [<ffffffff813687da>] __local_bh_enable_ip+0xba/0x100
      [Sun Aug  7 13:12:29 2022] softirqs last  enabled at (3596488): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
      [Sun Aug  7 13:12:29 2022] softirqs last disabled at (3596495): [<ffffffff81368a2a>] irq_exit_rcu+0x11a/0x170
      [Sun Aug  7 13:12:29 2022]
                                 other info that might help us debug this:
      [Sun Aug  7 13:12:29 2022]  Possible unsafe locking scenario:
      
      [Sun Aug  7 13:12:29 2022]        CPU0
      [Sun Aug  7 13:12:29 2022]        ----
      [Sun Aug  7 13:12:29 2022]   lock(lag_lock);
      [Sun Aug  7 13:12:29 2022]   <Interrupt>
      [Sun Aug  7 13:12:29 2022]     lock(lag_lock);
      [Sun Aug  7 13:12:29 2022]
                                  *** DEADLOCK ***
      
      [Sun Aug  7 13:12:29 2022] 4 locks held by swapper/0/0:
      [Sun Aug  7 13:12:29 2022]  #0: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: mlx5e_napi_poll+0x43/0x20a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  #1: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb_list_internal+0x2d7/0xd60
      [Sun Aug  7 13:12:29 2022]  #2: ffff888144a18b58 (&br->hash_lock){+.-.}-{2:2}, at: br_fdb_update+0x301/0x570
      [Sun Aug  7 13:12:29 2022]  #3: ffffffff84643260 (rcu_read_lock){....}-{1:2}, at: atomic_notifier_call_chain+0x5/0x1d0
      [Sun Aug  7 13:12:29 2022]
                                 stack backtrace:
      [Sun Aug  7 13:12:29 2022] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0_for_upstream_debug_2022_08_04_16_06 #1
      [Sun Aug  7 13:12:29 2022] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [Sun Aug  7 13:12:29 2022] Call Trace:
      [Sun Aug  7 13:12:29 2022]  <IRQ>
      [Sun Aug  7 13:12:29 2022]  dump_stack_lvl+0x57/0x7d
      [Sun Aug  7 13:12:29 2022]  mark_lock.part.0.cold+0x5f/0x92
      [Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? unwind_next_frame+0x1c4/0x1b50
      [Sun Aug  7 13:12:29 2022]  ? secondary_startup_64_no_verify+0xcd/0xdb
      [Sun Aug  7 13:12:29 2022]  ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? stack_access_ok+0x1d0/0x1d0
      [Sun Aug  7 13:12:29 2022]  ? start_kernel+0x3a7/0x3c5
      [Sun Aug  7 13:12:29 2022]  __lock_acquire+0x1260/0x6720
      [Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
      [Sun Aug  7 13:12:29 2022]  ? mark_lock.part.0+0xed/0x3060
      [Sun Aug  7 13:12:29 2022]  ? stack_trace_save+0x91/0xc0
      [Sun Aug  7 13:12:29 2022]  lock_acquire+0x1c1/0x550
      [Sun Aug  7 13:12:29 2022]  ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? lockdep_hardirqs_on_prepare+0x400/0x400
      [Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
      [Sun Aug  7 13:12:29 2022]  _raw_spin_lock+0x2c/0x40
      [Sun Aug  7 13:12:29 2022]  ? mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  mlx5_lag_is_shared_fdb+0x1f/0x120 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  mlx5_esw_bridge_rep_vport_num_vhca_id_get+0x1a0/0x600 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? mlx5_esw_bridge_update_work+0x90/0x90 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? lock_acquire+0x1c1/0x550
      [Sun Aug  7 13:12:29 2022]  mlx5_esw_bridge_switchdev_event+0x185/0x8f0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? mlx5_esw_bridge_port_obj_attr_set+0x3e0/0x3e0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  atomic_notifier_call_chain+0xd7/0x1d0
      [Sun Aug  7 13:12:29 2022]  br_switchdev_fdb_notify+0xea/0x100
      [Sun Aug  7 13:12:29 2022]  ? br_switchdev_set_port_flag+0x310/0x310
      [Sun Aug  7 13:12:29 2022]  fdb_notify+0x11b/0x150
      [Sun Aug  7 13:12:29 2022]  br_fdb_update+0x34c/0x570
      [Sun Aug  7 13:12:29 2022]  ? lock_chain_count+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? br_fdb_add_local+0x50/0x50
      [Sun Aug  7 13:12:29 2022]  ? br_allowed_ingress+0x5f/0x1070
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  br_handle_frame_finish+0x786/0x18e0
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
      [Sun Aug  7 13:12:29 2022]  ? sctp_inet_bind_verify+0x4d/0x190
      [Sun Aug  7 13:12:29 2022]  ? xlog_unpack_data+0x2e0/0x310
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  br_nf_hook_thresh+0x227/0x380 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? setup_pre_routing+0x460/0x460 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? br_nf_pre_routing_ipv6+0x48b/0x69c [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  br_nf_pre_routing_finish_ipv6+0x5c2/0xbf0 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  br_nf_pre_routing_ipv6+0x4c6/0x69c [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_validate_ipv6+0x9e0/0x9e0 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_nf_forward_arp+0xb70/0xb70 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  ? br_nf_pre_routing+0xacf/0x1160 [br_netfilter]
      [Sun Aug  7 13:12:29 2022]  br_handle_frame+0x8a9/0x1270
      [Sun Aug  7 13:12:29 2022]  ? br_handle_frame_finish+0x18e0/0x18e0
      [Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
      [Sun Aug  7 13:12:29 2022]  ? br_handle_local_finish+0x20/0x20
      [Sun Aug  7 13:12:29 2022]  ? bond_handle_frame+0xf9/0xac0 [bonding]
      [Sun Aug  7 13:12:29 2022]  ? br_handle_frame_finish+0x18e0/0x18e0
      [Sun Aug  7 13:12:29 2022]  __netif_receive_skb_core+0x7c0/0x2c70
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  ? generic_xdp_tx+0x5b0/0x5b0
      [Sun Aug  7 13:12:29 2022]  ? __lock_acquire+0xd6f/0x6720
      [Sun Aug  7 13:12:29 2022]  ? register_lock_class+0x1880/0x1880
      [Sun Aug  7 13:12:29 2022]  ? check_chain_key+0x24a/0x580
      [Sun Aug  7 13:12:29 2022]  __netif_receive_skb_list_core+0x2d7/0x8a0
      [Sun Aug  7 13:12:29 2022]  ? lock_acquire+0x1c1/0x550
      [Sun Aug  7 13:12:29 2022]  ? process_backlog+0x960/0x960
      [Sun Aug  7 13:12:29 2022]  ? lockdep_hardirqs_on_prepare+0x129/0x400
      [Sun Aug  7 13:12:29 2022]  ? kvm_clock_get_cycles+0x14/0x20
      [Sun Aug  7 13:12:29 2022]  netif_receive_skb_list_internal+0x5f4/0xd60
      [Sun Aug  7 13:12:29 2022]  ? do_xdp_generic+0x150/0x150
      [Sun Aug  7 13:12:29 2022]  ? mlx5e_poll_rx_cq+0xf6b/0x2960 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? mlx5e_poll_ico_cq+0x3d/0x1590 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  napi_complete_done+0x188/0x710
      [Sun Aug  7 13:12:29 2022]  mlx5e_napi_poll+0x4e9/0x20a0 [mlx5_core]
      [Sun Aug  7 13:12:29 2022]  ? __queue_work+0x53c/0xeb0
      [Sun Aug  7 13:12:29 2022]  __napi_poll+0x9f/0x540
      [Sun Aug  7 13:12:29 2022]  net_rx_action+0x420/0xb70
      [Sun Aug  7 13:12:29 2022]  ? napi_threaded_poll+0x470/0x470
      [Sun Aug  7 13:12:29 2022]  ? __common_interrupt+0x79/0x1a0
      [Sun Aug  7 13:12:29 2022]  __do_softirq+0x271/0x92c
      [Sun Aug  7 13:12:29 2022]  irq_exit_rcu+0x11a/0x170
      [Sun Aug  7 13:12:29 2022]  common_interrupt+0x7d/0xa0
      [Sun Aug  7 13:12:29 2022]  </IRQ>
      [Sun Aug  7 13:12:29 2022]  <TASK>
      [Sun Aug  7 13:12:29 2022]  asm_common_interrupt+0x22/0x40
      [Sun Aug  7 13:12:29 2022] RIP: 0010:default_idle+0x42/0x60
      [Sun Aug  7 13:12:29 2022] Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 6b f1 22 02 85 c0 7e 07 0f 00 2d 80 3b 4a 00 fb f4 <c3> 48 c7 c7 e0 07 7e 85 e8 21 bd 40 fe eb de 66 66 2e 0f 1f 84 00
      [Sun Aug  7 13:12:29 2022] RSP: 0018:ffffffff84407e18 EFLAGS: 00000242
      [Sun Aug  7 13:12:29 2022] RAX: 0000000000000001 RBX: ffffffff84ec4a68 RCX: 1ffffffff0afc0fc
      [Sun Aug  7 13:12:29 2022] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835b1fac
      [Sun Aug  7 13:12:29 2022] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff8884d2c44ac3
      [Sun Aug  7 13:12:29 2022] R10: ffffed109a588958 R11: 00000000ffffffff R12: 0000000000000000
      [Sun Aug  7 13:12:29 2022] R13: ffffffff84efac20 R14: 0000000000000000 R15: dffffc0000000000
      [Sun Aug  7 13:12:29 2022]  ? default_idle_call+0xcc/0x460
      [Sun Aug  7 13:12:29 2022]  default_idle_call+0xec/0x460
      [Sun Aug  7 13:12:29 2022]  do_idle+0x394/0x450
      [Sun Aug  7 13:12:29 2022]  ? arch_cpu_idle_exit+0x40/0x40
      [Sun Aug  7 13:12:29 2022]  cpu_startup_entry+0x19/0x20
      [Sun Aug  7 13:12:29 2022]  rest_init+0x156/0x250
      [Sun Aug  7 13:12:29 2022]  arch_call_rest_init+0xf/0x15
      [Sun Aug  7 13:12:29 2022]  start_kernel+0x3a7/0x3c5
      [Sun Aug  7 13:12:29 2022]  secondary_startup_64_no_verify+0xcd/0xdb
      [Sun Aug  7 13:12:29 2022]  </TASK>
      
      Fixes: ff9b7521 ("net/mlx5: Bridge, support LAG")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Eswitch, Fix forwarding decision to uplink · 942fca7e
      Eli Cohen authored
      Make sure to modify the rule for uplink forwarding only for the case
      where the destination vport number is MLX5_VPORT_UPLINK.
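      A hedged sketch of the added guard; MLX5_VPORT_UPLINK and the
      destination's vport.num field exist in the driver, while
      esw_setup_uplink_dest() is a hypothetical helper standing in for the
      actual rule modification:

          if (dest->vport.num == MLX5_VPORT_UPLINK)
                  esw_setup_uplink_dest(esw, dest);   /* hypothetical helper */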
      
      Fixes: 94db3317 ("net/mlx5: Support multiport eswitch mode")
      Signed-off-by: Eli Cohen <elic@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: LAG, fix logic over MLX5_LAG_FLAG_NDEVS_READY · a6e675a6
      Eli Cohen authored
      Only set MLX5_LAG_FLAG_NDEVS_READY if both netdevices are registered.
      Doing so guarantees that both ldev->pf[MLX5_LAG_P0].dev and
      ldev->pf[MLX5_LAG_P1].dev hold valid pointers when
      MLX5_LAG_FLAG_NDEVS_READY is set.
      
      The core issue is an asymmetry between setting MLX5_LAG_FLAG_NDEVS_READY
      and clearing it. Setting it is wrongly keyed on both
      ldev->pf[MLX5_LAG_P0].dev and ldev->pf[MLX5_LAG_P1].dev being set,
      while clearing it is correctly keyed on either ldev->pf[i].netdev
      being cleared.
      
      Consider the following scenario:
      1. PF0 loads and sets ldev->pf[MLX5_LAG_P0].dev to a valid pointer.
      2. PF1 loads and sets both ldev->pf[MLX5_LAG_P1].dev and
         ldev->pf[MLX5_LAG_P1].netdev to valid pointers. This results in
         MLX5_LAG_FLAG_NDEVS_READY being set.
      3. PF0 is unloaded before setting ldev->pf[MLX5_LAG_P0].netdev, so
         MLX5_LAG_FLAG_NDEVS_READY remains set.
      
      Further execution of mlx5_do_bond() then results in a null pointer
      dereference when calling mlx5_lag_is_multipath().
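      A hedged sketch of the corrected condition; the flag-storage field name
      is an assumption:

          /* key readiness on the netdev pointers, matching how the
           * flag is cleared, rather than on the .dev pointers */
          if (ldev->pf[MLX5_LAG_P0].netdev &&
              ldev->pf[MLX5_LAG_P1].netdev)
                  set_bit(MLX5_LAG_FLAG_NDEVS_READY, &ldev->state_flags);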
      
      This patch fixes the following call trace actually encountered:
      
      [ 1293.475195] BUG: kernel NULL pointer dereference, address: 00000000000009a8
      [ 1293.478756] #PF: supervisor read access in kernel mode
      [ 1293.481320] #PF: error_code(0x0000) - not-present page
      [ 1293.483686] PGD 0 P4D 0
      [ 1293.484434] Oops: 0000 [#1] SMP PTI
      [ 1293.485377] CPU: 1 PID: 23690 Comm: kworker/u16:2 Not tainted 5.18.0-rc5_for_upstream_min_debug_2022_05_05_10_13 #1
      [ 1293.488039] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [ 1293.490836] Workqueue: mlx5_lag mlx5_do_bond_work [mlx5_core]
      [ 1293.492448] RIP: 0010:mlx5_lag_is_multipath+0x5/0x50 [mlx5_core]
      [ 1293.494044] Code: e8 70 40 ff e0 48 8b 14 24 48 83 05 5c 1a 1b 00 01 e9 19 ff ff ff 48 83 05 47 1a 1b 00 01 eb d7 0f 1f 44 00 00 0f 1f 44 00 00 <48> 8b 87 a8 09 00 00 48 85 c0 74 26 48 83 05 a7 1b 1b 00 01 41 b8
      [ 1293.498673] RSP: 0018:ffff88811b2fbe40 EFLAGS: 00010202
      [ 1293.500152] RAX: ffff88818a94e1c0 RBX: ffff888165eca6c0 RCX: 0000000000000000
      [ 1293.501841] RDX: 0000000000000001 RSI: ffff88818a94e1c0 RDI: 0000000000000000
      [ 1293.503585] RBP: 0000000000000000 R08: ffff888119886740 R09: ffff888165eca73c
      [ 1293.505286] R10: 0000000000000018 R11: 0000000000000018 R12: ffff88818a94e1c0
      [ 1293.506979] R13: ffff888112729800 R14: 0000000000000000 R15: ffff888112729858
      [ 1293.508753] FS:  0000000000000000(0000) GS:ffff88852cc40000(0000) knlGS:0000000000000000
      [ 1293.510782] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1293.512265] CR2: 00000000000009a8 CR3: 00000001032d4002 CR4: 0000000000370ea0
      [ 1293.514001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1293.515806] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Fixes: 8a66e458 ("net/mlx5: Change ownership model for lag")
      Signed-off-by: Eli Cohen <elic@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Properly disable vlan strip on non-UL reps · f37044fd
      Vlad Buslov authored
      When querying mlx5 non-uplink representor capabilities with ethtool,
      rx-vlan-offload is marked as "off [fixed]". However, it is actually
      always enabled, because mlx5e_params->vlan_strip_disable is 0 by
      default when the struct mlx5e_params instance is initialized. Fix the
      issue by explicitly setting vlan_strip_disable to 'true' for
      non-uplink representors.
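      The fix itself is a one-liner; a sketch of its shape, with the
      placement in the representor params init assumed:

          /* reps report rx-vlan-offload as "off [fixed]", so make the
           * cached param match the advertised capability */
          params->vlan_strip_disable = true;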
      
      Fixes: cb67b832 ("net/mlx5e: Introduce SRIOV VF representors")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • ice: xsk: use Rx ring's XDP ring when picking NAPI context · 9ead7e74
      Maciej Fijalkowski authored
      The ice driver allocates per-CPU XDP queues so that the redirect path
      can safely use smp_processor_id() as an index into the array. At the
      same time, though, XDP rings are used to pick the NAPI context in
      which to call napi_schedule() or set NAPIF_STATE_MISSED. When the user
      reduces the queue count, say to 8, while num_possible_cpus() of the
      underlying platform is 44, queue vectors and their correlated NAPI
      contexts will carry several XDP queues each.
      
      This in turn can result in broken behavior where the NAPI context of
      interest will never be scheduled and the AF_XDP socket will not
      process any traffic.
      
      To fix this, change the way XDP rings are assigned to Rx rings and use
      this information later on when setting the ice_tx_ring::xsk_pool
      pointer. For each Rx ring, grab the associated queue vector and walk
      through the Tx ring's linked list. Once we stumble upon the XDP ring
      in it, assign this ring to ice_rx_ring::xdp_ring.
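      A hedged sketch of that walk; ice_for_each_tx_ring() and
      ice_ring_is_xdp() exist in the driver, but the placement here is
      illustrative:

          struct ice_tx_ring *tx_ring;

          /* walk the queue vector's Tx ring list and remember the
           * XDP ring that serves this Rx ring */
          ice_for_each_tx_ring(tx_ring, rx_ring->q_vector->tx)
                  if (ice_ring_is_xdp(tx_ring)) {
                          rx_ring->xdp_ring = tx_ring;
                          break;
                  }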
      
      The previous [0] approach to fixing this issue covered only the txonly
      scenario because of the described grouping of XDP rings across queue
      vectors: relying on the Rx ring meant that a NAPI context could still
      be scheduled for a queue vector whose XDP ring had no associated XSK
      pool.
      
      [0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: xsk: prohibit usage of non-balanced queue id · 5a42f112
      Maciej Fijalkowski authored
      Fix the following scenario:
      1. ethtool -L $IFACE rx 8 tx 96
      2. xdpsock -q 10 -t -z
      
      The above refers to a case where the user would like to attach an XSK
      socket in txonly mode at a queue id that does not have a corresponding
      Rx queue. At the moment, ice's XSK logic is tightly bound to acting on
      a "queue pair", i.e. both the Tx and Rx queues at a given queue id are
      disabled/enabled together and both of them get an XSK pool assigned,
      which is broken for the presented queue configuration. This results in
      the splat included at the bottom, which is basically an OOB access to
      the Rx ring array.
      
      To fix this, allow using the ids only within the scope of the
      "combined" queues reported by ethtool. The logic should eventually be
      rewritten to allow such configurations, but that would amount to a
      complete rewrite of the control path, so go with this temporary fix.
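      A hedged sketch of the temporary guard; the vsi field names follow the
      driver's conventions, but the placement and message are illustrative:

          /* qid must fall within the "combined" range, i.e. have both
           * a Tx and an Rx queue behind it */
          if (qid >= vsi->num_rxq || qid >= vsi->num_txq) {
                  netdev_err(vsi->netdev,
                             "Please use queue id in scope of combined queues count\n");
                  return -EINVAL;
          }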
      
      [420160.558008] BUG: kernel NULL pointer dereference, address: 0000000000000082
      [420160.566359] #PF: supervisor read access in kernel mode
      [420160.572657] #PF: error_code(0x0000) - not-present page
      [420160.579002] PGD 0 P4D 0
      [420160.582756] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [420160.588396] CPU: 10 PID: 21232 Comm: xdpsock Tainted: G           OE     5.19.0-rc7+ #10
      [420160.597893] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [420160.609894] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
      [420160.616968] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
      [420160.639421] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
      [420160.646650] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
      [420160.655893] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
      [420160.665166] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
      [420160.674493] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
      [420160.683833] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
      [420160.693211] FS:  00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
      [420160.703645] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [420160.711783] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
      [420160.721399] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [420160.731045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [420160.740707] PKRU: 55555554
      [420160.745960] Call Trace:
      [420160.750962]  <TASK>
      [420160.755597]  ? kmalloc_large_node+0x79/0x90
      [420160.762703]  ? __kmalloc_node+0x3f5/0x4b0
      [420160.769341]  xp_assign_dev+0xfd/0x210
      [420160.775661]  ? shmem_file_read_iter+0x29a/0x420
      [420160.782896]  xsk_bind+0x152/0x490
      [420160.788943]  __sys_bind+0xd0/0x100
      [420160.795097]  ? exit_to_user_mode_prepare+0x20/0x120
      [420160.802801]  __x64_sys_bind+0x16/0x20
      [420160.809298]  do_syscall_64+0x38/0x90
      [420160.815741]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [420160.823731] RIP: 0033:0x7fa86a0dd2fb
      [420160.830264] Code: c3 66 0f 1f 44 00 00 48 8b 15 69 8b 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 f3 0f 1e fa b8 31 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 8b 0c 00 f7 d8 64 89 01 48
      [420160.855410] RSP: 002b:00007ffc1146f618 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
      [420160.866366] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa86a0dd2fb
      [420160.876957] RDX: 0000000000000010 RSI: 00007ffc1146f680 RDI: 0000000000000003
      [420160.887604] RBP: 000055d7113a0520 R08: 00007fa868fb8000 R09: 0000000080000000
      [420160.898293] R10: 0000000000008001 R11: 0000000000000246 R12: 000055d7113a04e0
      [420160.909038] R13: 000055d7113a0320 R14: 000000000000000a R15: 0000000000000000
      [420160.919817]  </TASK>
      [420160.925659] Modules linked in: ice(OE) af_packet binfmt_misc nls_iso8859_1 ipmi_ssif intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mei_me coretemp ioatdma mei ipmi_si wmi ipmi_msghandler acpi_pad acpi_power_meter ip_tables x_tables autofs4 ixgbe i40e crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ahci mdio dca libahci lpc_ich [last unloaded: ice]
      [420160.977576] CR2: 0000000000000082
      [420160.985037] ---[ end trace 0000000000000000 ]---
      [420161.097724] RIP: 0010:ice_xsk_pool_setup+0x44/0x7d0 [ice]
      [420161.107341] Code: f3 48 83 ec 40 48 8b 4f 20 48 8b 3f 65 48 8b 04 25 28 00 00 00 48 89 44 24 38 31 c0 48 8d 04 ed 00 00 00 00 48 01 c1 48 8b 11 <0f> b7 92 82 00 00 00 48 85 d2 0f 84 2d 75 00 00 48 8d 72 ff 48 85
      [420161.134741] RSP: 0018:ffffc9002d2afd48 EFLAGS: 00010282
      [420161.144274] RAX: 0000000000000050 RBX: ffff88811d8bdd00 RCX: ffff888112c14ff8
      [420161.155690] RDX: 0000000000000000 RSI: ffff88811d8bdd00 RDI: ffff888109861000
      [420161.168088] RBP: 000000000000000a R08: 000000000000000a R09: 0000000000000000
      [420161.179295] R10: 000000000000889f R11: 0000000000000000 R12: 000000000000000a
      [420161.190420] R13: 000000000000000a R14: 0000000000000000 R15: ffff888117611828
      [420161.201505] FS:  00007fa869fc1f80(0000) GS:ffff8897e0880000(0000) knlGS:0000000000000000
      [420161.213628] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [420161.223413] CR2: 0000000000000082 CR3: 00000001d076c001 CR4: 00000000007706e0
      [420161.234653] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [420161.245893] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [420161.257052] PKRU: 55555554
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>