1. 16 Mar, 2023 2 commits
    • igb: Enable SR-IOV after reinit · 50f30349
      Akihiko Odaki authored
      Enabling SR-IOV causes the virtual functions to make requests to the
      PF via the mailbox. Notably, an E1000_VF_RESET request is sent during
      the initialization of the VF. However, unless the reinit is done, the
      VMMB interrupt, which delivers mailbox interrupts from the VF to the PF,
      stays masked and such requests are silently ignored.
      
      Enable SR-IOV at the very end of the procedure to configure the device
      for SR-IOV so that the PF is configured properly for SR-IOV when a VF is
      activated.
      
      Fixes: fa44f2f1 ("igb: Enable SR-IOV configuration via PCI sysfs interface")
      Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
      Tested-by: Marek Szlosek <marek.szlosek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • igb: revert rtnl_lock() that causes deadlock · 65f69851
      Lin Ma authored
      The commit 6faee3d4 ("igb: Add lock to avoid data race") adds
      rtnl_lock to eliminate a false data race shown below
      
       (FREE from device detaching)      |   (USE from netdev core)
      igb_remove                         |  igb_ndo_get_vf_config
       igb_disable_sriov                 |  vf >= adapter->vfs_allocated_count?
        kfree(adapter->vf_data)          |
        adapter->vfs_allocated_count = 0 |
                                         |    memcpy(... adapter->vf_data[vf]
      
      The above race will never happen and the extra rtnl_lock causes deadlock
      below
      
      [  141.420169]  <TASK>
      [  141.420672]  __schedule+0x2dd/0x840
      [  141.421427]  schedule+0x50/0xc0
      [  141.422041]  schedule_preempt_disabled+0x11/0x20
      [  141.422678]  __mutex_lock.isra.13+0x431/0x6b0
      [  141.423324]  unregister_netdev+0xe/0x20
      [  141.423578]  igbvf_remove+0x45/0xe0 [igbvf]
      [  141.423791]  pci_device_remove+0x36/0xb0
      [  141.423990]  device_release_driver_internal+0xc1/0x160
      [  141.424270]  pci_stop_bus_device+0x6d/0x90
      [  141.424507]  pci_stop_and_remove_bus_device+0xe/0x20
      [  141.424789]  pci_iov_remove_virtfn+0xba/0x120
      [  141.425452]  sriov_disable+0x2f/0xf0
      [  141.425679]  igb_disable_sriov+0x4e/0x100 [igb]
      [  141.426353]  igb_remove+0xa0/0x130 [igb]
      [  141.426599]  pci_device_remove+0x36/0xb0
      [  141.426796]  device_release_driver_internal+0xc1/0x160
      [  141.427060]  driver_detach+0x44/0x90
      [  141.427253]  bus_remove_driver+0x55/0xe0
      [  141.427477]  pci_unregister_driver+0x2a/0xa0
      [  141.428296]  __x64_sys_delete_module+0x141/0x2b0
      [  141.429126]  ? mntput_no_expire+0x4a/0x240
      [  141.429363]  ? syscall_trace_enter.isra.19+0x126/0x1a0
      [  141.429653]  do_syscall_64+0x5b/0x80
      [  141.429847]  ? exit_to_user_mode_prepare+0x14d/0x1c0
      [  141.430109]  ? syscall_exit_to_user_mode+0x12/0x30
      [  141.430849]  ? do_syscall_64+0x67/0x80
      [  141.431083]  ? syscall_exit_to_user_mode_prepare+0x183/0x1b0
      [  141.431770]  ? syscall_exit_to_user_mode+0x12/0x30
      [  141.432482]  ? do_syscall_64+0x67/0x80
      [  141.432714]  ? exc_page_fault+0x64/0x140
      [  141.432911]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      Since the igb_disable_sriov() will call pci_disable_sriov() before
      releasing any resources, the netdev core will synchronize the cleanup to
      avoid any races. This patch removes the useless rtnl_(un)lock to guarantee
      correctness.
      
      CC: stable@vger.kernel.org
      Fixes: 6faee3d4 ("igb: Add lock to avoid data race")
      Reported-by: Corinna Vinschen <vinschen@redhat.com>
      Link: https://lore.kernel.org/intel-wired-lan/ZAcJvkEPqWeJHO2r@calimero.vinschen.de/
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Tested-by: Corinna Vinschen <vinschen@redhat.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  2. 08 Mar, 2023 3 commits
  3. 07 Mar, 2023 9 commits
  4. 06 Mar, 2023 15 commits
    • net: tls: fix device-offloaded sendpage straddling records · e539a105
      Jakub Kicinski authored
      Adrien reports that incorrect data is transmitted when a single
      page straddles multiple records. We would transmit the same
      data in all iterations of the loop.
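
As an illustration only, the symptom described above can be modelled in userspace C with plain byte buffers. This is a sketch under stated assumptions, not the kernel code: the function name, the `advance` switch and the record size are all hypothetical. It shows that if the source offset is not moved forward after each record, every record carries the start of the page again.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model, not kernel code: copy src[0..len) into out as a
 * sequence of records of at most rec_max bytes each.
 * With advance == 0 the source offset is never moved forward, so every
 * record re-reads the start of the page (the reported symptom).
 * Returns the total number of bytes written to out.
 */
size_t split_into_records(const char *src, size_t len, size_t rec_max,
                          int advance, char *out)
{
    size_t offset = 0;   /* read position inside the source page */
    size_t written = 0;  /* bytes emitted so far */

    if (rec_max == 0)
        return 0;

    while (written < len) {
        size_t remaining = len - written;
        size_t copy = remaining < rec_max ? remaining : rec_max;

        memcpy(out + written, src + offset, copy);
        written += copy;
        if (advance)
            offset += copy;  /* the step the buggy variant skips */
    }
    return written;
}
```

With an 8-byte source "abcdefgh" and 3-byte records, the advancing variant reproduces the input, while the non-advancing variant yields "abcabcab".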
      Reported-by: Adrien Moulin <amoulin@corp.free.fr>
      Link: https://lore.kernel.org/all/61481278.42813558.1677845235112.JavaMail.zimbra@corp.free.fr
      Fixes: c1318b39 ("tls: Add opt-in zerocopy mode of sendfile()")
      Tested-by: Adrien Moulin <amoulin@corp.free.fr>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: Maxim Mikityanskiy <maxtram95@gmail.com>
      Link: https://lore.kernel.org/r/20230304192610.3818098-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethernet: mtk_eth_soc: fix RX data corruption issue · 193250ac
      Daniel Golle authored
      Fix data corruption issue with SerDes connected PHYs operating at 1.25
      Gbps speed where we could previously observe about 30% packet loss while
      the bad packet counter was increasing.
      
      As almost all boards with MediaTek MT7622 or MT7986 use either the MT7531
      switch IC operating at 3.125Gbps SerDes rate or single-port PHYs using
      rate-adaptation to 2500Base-X mode, this issue only got exposed now when
      we started trying to use SFP modules operating with 1.25 Gbps with the
      BananaPi R3 board.
      
      The fix is to set bit 12 which disables the RX FIFO clear function when
      setting up MAC MCR, MediaTek SDK did the same change stating:
      "If without this patch, kernel might receive invalid packets that are
      corrupted by GMAC."[1]
      
      [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
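
A minimal sketch of the register change described above, assuming the MAC MCR value is handled as a 32-bit word. The macro and function names are illustrative assumptions; only the bit position (12) comes from the commit text.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 12 per the commit text; the macro name is an assumption. */
#define SKETCH_MAC_MCR_RX_FIFO_CLR_DIS (UINT32_C(1) << 12)

/* Return the MCR value with the RX FIFO clear function disabled. */
uint32_t mcr_disable_rx_fifo_clear(uint32_t mcr)
{
    return mcr | SKETCH_MAC_MCR_RX_FIFO_CLR_DIS;
}
```

All other bits of the value are left untouched, so the helper can be applied to whatever MCR configuration is being assembled.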
      
      Fixes: 42c03844 ("net-next: mediatek: add support for MediaTek MT7622 SoC")
      Tested-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/138da2735f92c8b6f8578ec2e5a794ee515b665f.1677937317.git.daniel@makrotopia.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: smsc: fix link up detection in forced irq mode · 58aac3a2
      Heiner Kallweit authored
      Currently link up can't be detected in forced mode if polling
      isn't used. The only link-up interrupt source we have is aneg
      complete, which isn't applicable in forced mode. Therefore we
      have to use energy-on as the link-up indicator.
      
      Fixes: 73654945 ("net: phy: smsc: skip ENERGYON interrupt if disabled")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fix resolving VAR after DATASEC' · 32dfc59e
      Martin KaFai Lau authored
      Lorenz Bauer says:
      
      ====================
      
      See the first patch for a detailed explanation.
      
      v2:
      - Move RESOLVE_TBD assignment out of the loop (Martin)
      ====================
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • selftests/bpf: check that modifier resolves after pointer · dfdd608c
      Lorenz Bauer authored
      Add a regression test that ensures that a VAR pointing at a
      modifier which follows a PTR (or STRUCT or ARRAY) is resolved
      correctly by the datasec validator.
      Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/r/20230306112138.155352-3-lmb@isovalent.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR · 9b459804
      Lorenz Bauer authored
      btf_datasec_resolve contains a bug that causes the following BTF
      to fail loading:
      
          [1] DATASEC a size=2 vlen=2
              type_id=4 offset=0 size=1
              type_id=7 offset=1 size=1
          [2] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
          [3] PTR (anon) type_id=2
          [4] VAR a type_id=3 linkage=0
          [5] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
          [6] TYPEDEF td type_id=5
          [7] VAR b type_id=6 linkage=0
      
      This error message is printed during btf_check_all_types:
      
          [1] DATASEC a size=2 vlen=2
              type_id=7 offset=1 size=1 Invalid type
      
      By tracing btf_*_resolve we can pinpoint the problem:
      
          btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_TBD) = 0
              btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_TBD) = 0
                  btf_ptr_resolve(depth: 3, type_id: 3, mode: RESOLVE_PTR) = 0
              btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_PTR) = 0
          btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_PTR) = -22
      
      The last invocation of btf_datasec_resolve should invoke btf_var_resolve
      by means of env_stack_push, instead it returns EINVAL. The reason is that
      env_stack_push is never executed for the second VAR.
      
          if (!env_type_is_resolve_sink(env, var_type) &&
              !env_type_is_resolved(env, var_type_id)) {
              env_stack_set_next_member(env, i + 1);
              return env_stack_push(env, var_type, var_type_id);
          }
      
      env_type_is_resolve_sink() changes its behaviour based on resolve_mode.
      For RESOLVE_PTR, we can simplify the if condition to the following:
      
          (btf_type_is_modifier() || btf_type_is_ptr) && !env_type_is_resolved()
      
      Since we're dealing with a VAR the clause evaluates to false. This is
      not sufficient to trigger the bug however. The log output and EINVAL
      are only generated if btf_type_id_size() fails.
      
          if (!btf_type_id_size(btf, &type_id, &type_size)) {
              btf_verifier_log_vsi(env, v->t, vsi, "Invalid type");
              return -EINVAL;
          }
      
      Most types are sized, so for example a VAR referring to an INT is not a
      problem. The bug is only triggered if a VAR points at a modifier. Since
      we skipped btf_var_resolve that modifier was also never resolved, which
      means that btf_resolved_type_id returns 0 aka VOID for the modifier.
      This in turn causes btf_type_id_size to return NULL, triggering EINVAL.
      
      To summarise, the following conditions are necessary:
      
      - VAR pointing at PTR, STRUCT, UNION or ARRAY
      - Followed by a VAR pointing at TYPEDEF, VOLATILE, CONST, RESTRICT or
        TYPE_TAG
      
      The fix is to reset resolve_mode to RESOLVE_TBD before attempting to
      resolve a VAR from a DATASEC.
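
The decision described above can be sketched as a small userspace model. This is an assumption-laden illustration, not the verifier code: the enum and function names are hypothetical, the RESOLVE_PTR branch follows the simplified condition quoted in the message, and the RESOLVE_TBD branch assumes that anything other than a plain INT still needs resolving.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of modes and kinds for illustration. */
enum sketch_mode { SKETCH_RESOLVE_TBD, SKETCH_RESOLVE_PTR };
enum sketch_kind { SKETCH_INT, SKETCH_PTR, SKETCH_TYPEDEF, SKETCH_VAR };

/*
 * Model of "should this member be pushed on the resolve stack?".
 * In PTR mode only modifiers and pointers qualify, so an unresolved
 * VAR is skipped; in TBD mode (assumed here) a VAR is pushed.
 */
bool sketch_should_push(enum sketch_mode mode, enum sketch_kind kind,
                        bool already_resolved)
{
    if (already_resolved)
        return false;
    if (mode == SKETCH_RESOLVE_PTR)
        return kind == SKETCH_TYPEDEF || kind == SKETCH_PTR;
    /* SKETCH_RESOLVE_TBD: assumption, only a plain INT is a sink. */
    return kind != SKETCH_INT;
}
```

In this model the second VAR is skipped while the mode is still the leftover PTR mode, and it is pushed once the mode has been reset to TBD, which is the effect the fix aims for.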
      
      Fixes: 1dc92851 ("bpf: kernel side support for BTF Var and DataSec")
      Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/r/20230306112138.155352-2-lmb@isovalent.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES · 294635a8
      Alexander Lobakin authored
      &xdp_buff and &xdp_frame are bound in a way that
      
      xdp_buff->data_hard_start == xdp_frame
      
      It's always the case and e.g. xdp_convert_buff_to_frame() relies on
      this.
      IOW, the following:
      
      	for (u32 i = 0; i < 0xdead; i++) {
      		xdpf = xdp_convert_buff_to_frame(&xdp);
      		xdp_convert_frame_to_buff(xdpf, &xdp);
      	}
      
      shouldn't ever modify @xdpf's contents or the pointer itself.
      However, "live packet" code wrongly treats &xdp_frame as part of its
      context placed *before* the data_hard_start. With such flow,
      data_hard_start is sizeof(*xdpf) off to the right and no longer points
      to the XDP frame.
      
      Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
      places and praying that there are no more miscalcs left somewhere in the
      code, unionize ::frm with ::data in a flex array, so that both start
      pointing to the actual data_hard_start and the XDP frame actually becomes
      a part of it, i.e. a part of the headroom, not the context.
      A nice side effect is that the maximum frame size for this mode gets
      increased by 40 bytes, as xdp_buff::frame_sz includes everything from
      data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
      info.
      Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
      hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
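
The layout change can be illustrated with a small userspace sketch. All struct and field names below are stand-ins chosen for the example, and a fixed-size array replaces the flex array so that the code builds as plain C11; it only demonstrates the offset relationship, not the real structures.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the real types; sizes are illustrative only. */
struct sketch_frame { void *data; unsigned int len; };
struct sketch_ctx { void *rxq; void *txq; };

/* Old layout: the frame sits in the context, before the data area. */
struct sketch_head_old {
    struct sketch_ctx ctx;
    struct sketch_frame frm;
    unsigned char data[64];
};

/* New layout: the frame overlays the start of the data area. */
struct sketch_head_new {
    struct sketch_ctx ctx;
    union {
        struct sketch_frame frame;
        unsigned char data[64];
    };
};

/* Distance from the frame to the start of data in each layout. */
size_t sketch_gap_old(void)
{
    return offsetof(struct sketch_head_old, data) -
           offsetof(struct sketch_head_old, frm);
}

size_t sketch_gap_new(void)
{
    return offsetof(struct sketch_head_new, data) -
           offsetof(struct sketch_head_new, frame);
}
```

In the old layout the data area starts one frame size after the frame, whereas in the unionized layout both start at the same address, matching the `data_hard_start == xdp_frame` invariant stated at the top of the message.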
      
      Minor: align `&head->data` with how `head->frm` is assigned for
      consistency.
      Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
      clarity.
      
      (was found while testing XDP traffic generator on ice, which calls
       xdp_convert_frame_to_buff() for each XDP frame)
      
      Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
      Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    • bpf, doc: Link to submitting-patches.rst for general patch submission info · b7abcd9c
      Bagas Sanjaya authored
      The link for patch submission information in general refers to index
      page for "Working with the kernel development community" section of
      kernel docs, whereas the link should have been
      Documentation/process/submitting-patches.rst instead.
      
      Fix it by replacing the index target with the appropriate doc.
      
      Fixes: 54222838 ("bpf, doc: convert bpf_devel_QA.rst to use RST formatting")
      Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230228074523.11493-3-bagasdotme@gmail.com
    • bpf, doc: Do not link to docs.kernel.org for kselftest link · 32db18d6
      Bagas Sanjaya authored
      The question on how to run BPF selftests has a reference link to the kernel
      selftest documentation (Documentation/dev-tools/kselftest.rst). However,
      it uses an external link to the documentation at kernel.org/docs (aka
      docs.kernel.org), which requires Internet access.
      
      Fix this and replace the link with internal linking, by using :doc: directive
      while keeping the anchor text.
      
      Fixes: b7a27c3a ("bpf, doc: howto use/run the BPF selftests")
      Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230228074523.11493-2-bagasdotme@gmail.com
    • netfilter: tproxy: fix deadlock due to missing BH disable · 4a024267
      Florian Westphal authored
      The xtables packet traverser performs an unconditional local_bh_disable(),
      but the nf_tables evaluation loop does not.
      
      Functions that are called from either xtables or nftables must assume
      that they can be called in process context.
      
      inet_twsk_deschedule_put() assumes that no softirq interrupt can occur.
      If tproxy is used from nf_tables, it's possible that we'll deadlock
      trying to acquire a lock already held in process context.
      
      Add a small helper that takes care of this and use it.
      
      Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/
      Fixes: 4ed8eb65 ("netfilter: nf_tables: Add native tproxy support")
      Reported-and-tested-by: Major Dávid <major.david@balasys.hu>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: revert to dumping mark regardless of event type · 9f7dd42f
      Ivan Delalande authored
      It seems that change was unintentional, we have userspace code that
      needs the mark while listening for events like REPLY, DESTROY, etc.
      Also include 0-marks in requested dumps, as they were before that fix.
      
      Fixes: 1feeae07 ("netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark")
      Signed-off-by: Ivan Delalande <colona@arista.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • bnxt_en: Fix the double free during device removal · 89b59a84
      Selvin Xavier authored
      The following warning is reported by KASAN during driver unload:
      
      ==================================================================
      BUG: KASAN: double-free in bnxt_remove_one+0x103/0x200 [bnxt_en]
      Free of addr ffff88814e8dd4c0 by task rmmod/17469
      CPU: 47 PID: 17469 Comm: rmmod Kdump: loaded Tainted: G S                 6.2.0-rc7+ #2
      Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.3.10 08/15/2019
      Call Trace:
       <TASK>
       dump_stack_lvl+0x33/0x46
       print_report+0x17b/0x4b3
       ? __call_rcu_common.constprop.79+0x27e/0x8c0
       ? __pfx_free_object_rcu+0x10/0x10
       ? __virt_addr_valid+0xe3/0x160
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       kasan_report_invalid_free+0x64/0xd0
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       __kasan_slab_free+0x179/0x1c0
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       __kmem_cache_free+0x194/0x350
       bnxt_remove_one+0x103/0x200 [bnxt_en]
       pci_device_remove+0x62/0x110
       device_release_driver_internal+0xf6/0x1c0
       driver_detach+0x76/0xe0
       bus_remove_driver+0x89/0x160
       pci_unregister_driver+0x26/0x110
       ? strncpy_from_user+0x188/0x1c0
       bnxt_exit+0xc/0x24 [bnxt_en]
       __x64_sys_delete_module+0x21f/0x390
       ? __pfx___x64_sys_delete_module+0x10/0x10
       ? __pfx_mem_cgroup_handle_over_high+0x10/0x10
       ? _raw_spin_lock+0x87/0xe0
       ? __pfx__raw_spin_lock+0x10/0x10
       ? __audit_syscall_entry+0x185/0x210
       ? ktime_get_coarse_real_ts64+0x51/0x80
       ? syscall_trace_enter.isra.18+0x126/0x1a0
       do_syscall_64+0x37/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7effcb6fd71b
      Code: 73 01 c3 48 8b 0d 6d 17 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 17 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffeada270b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 00005623660e0750 RCX: 00007effcb6fd71b
      RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005623660e07b8
      RBP: 0000000000000000 R08: 00007ffeada26031 R09: 0000000000000000
      R10: 00007effcb771280 R11: 0000000000000206 R12: 00007ffeada272e0
      R13: 00007ffeada28bc4 R14: 00005623660e02a0 R15: 00005623660e0750
       </TASK>
      
      Auxiliary device structures are freed in bnxt_aux_dev_release. So avoid
      calling kfree from bnxt_remove_one.
      
      Also, set bp->edev to NULL before freeing the auxiliary private structure.
      
      Fixes: d80d88b0 ("bnxt_en: Add auxiliary driver support")
      Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Avoid order-5 memory allocation for TPA data · accd7e23
      Michael Chan authored
      The driver needs to keep track of all the possible concurrent TPA (GRO/LRO)
      completions on the aggregation ring.  On P5 chips, the maximum number
      of concurrent TPA is 256 and the amount of memory we allocate is order-5
      on systems using 4K pages.  Memory allocation failure has been reported:
      
      NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
      CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1
      Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022
      Call Trace:
       dump_stack+0x57/0x6e
       warn_alloc.cold.120+0x7b/0xdd
       ? _cond_resched+0x15/0x30
       ? __alloc_pages_direct_compact+0x15f/0x170
       __alloc_pages_slowpath.constprop.108+0xc58/0xc70
       __alloc_pages_nodemask+0x2d0/0x300
       kmalloc_order+0x24/0xe0
       kmalloc_order_trace+0x19/0x80
       bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en]
       ? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en]
       __bnxt_open_nic+0x12e/0x780 [bnxt_en]
       bnxt_open+0x10b/0x240 [bnxt_en]
       __dev_open+0xe9/0x180
       __dev_change_flags+0x1af/0x220
       dev_change_flags+0x21/0x60
       do_setlink+0x35c/0x1100
      
      Instead of allocating this big chunk of memory and dividing it up for the
      concurrent TPA instances, allocate each small chunk separately for each
      TPA instance.  This will reduce it to order-0 allocations.
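
The arithmetic behind the allocation orders can be sketched as follows. The helper name is hypothetical, and the 512-byte per-instance size used in the usage note is an assumption chosen so that 256 instances add up to an order-5 request with 4K pages; the commit text does not state the per-instance size.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= size, i.e. the
 * power-of-two number of contiguous pages a request of that size needs.
 * Hypothetical helper for illustration.
 */
unsigned int sketch_alloc_order(size_t size, size_t page_size)
{
    size_t pages;
    unsigned int order = 0;

    if (size == 0 || page_size == 0)
        return 0;

    pages = (size + page_size - 1) / page_size;  /* round up to pages */
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Under the stated assumption, one allocation covering 256 instances of 512 bytes needs 32 contiguous 4K pages (order 5), while each 512-byte instance on its own fits in a single page (order 0).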
      
      Fixes: 79632e9b ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.")
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylib: get rid of unnecessary locking · f4b47a2e
      Russell King (Oracle) authored
      The locking in phy_probe() and phy_remove() does very little to prevent
      any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA
      warnings. Remove it.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.2.0-dirty #1108 Tainted: G        W   E
      ------------------------------------------------------
      ip/415 is trying to acquire lock:
      ffff5c268f81ef50 (&dev->lock){+.+.}-{3:3}, at: phy_attach_direct+0x17c/0x3a0 [libphy]
      
      but task is already holding lock:
      ffffaef6496cb518 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x154/0x560
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (rtnl_mutex){+.+.}-{3:3}:
             __lock_acquire+0x35c/0x6c0
             lock_acquire.part.0+0xcc/0x220
             lock_acquire+0x68/0x84
             __mutex_lock+0x8c/0x414
             mutex_lock_nested+0x34/0x40
             rtnl_lock+0x24/0x30
             sfp_bus_add_upstream+0x34/0x150
             phy_sfp_probe+0x4c/0x94 [libphy]
             mv3310_probe+0x148/0x184 [marvell10g]
             phy_probe+0x8c/0x200 [libphy]
             call_driver_probe+0xbc/0x15c
             really_probe+0xc0/0x320
             __driver_probe_device+0x84/0x120
             driver_probe_device+0x44/0x120
             __device_attach_driver+0xc4/0x160
             bus_for_each_drv+0x80/0xe0
             __device_attach+0xb0/0x1f0
             device_initial_probe+0x1c/0x2c
             bus_probe_device+0xa4/0xb0
             device_add+0x360/0x53c
             phy_device_register+0x60/0xa4 [libphy]
             fwnode_mdiobus_phy_device_register+0xc0/0x190 [fwnode_mdio]
             fwnode_mdiobus_register_phy+0x160/0xd80 [fwnode_mdio]
             of_mdiobus_register+0x140/0x340 [of_mdio]
             orion_mdio_probe+0x298/0x3c0 [mvmdio]
             platform_probe+0x70/0xe0
             call_driver_probe+0x34/0x15c
             really_probe+0xc0/0x320
             __driver_probe_device+0x84/0x120
             driver_probe_device+0x44/0x120
             __driver_attach+0x104/0x210
             bus_for_each_dev+0x78/0xdc
             driver_attach+0x2c/0x3c
             bus_add_driver+0x184/0x240
             driver_register+0x80/0x13c
             __platform_driver_register+0x30/0x3c
             xt_compat_calc_jump+0x28/0xa4 [x_tables]
             do_one_initcall+0x50/0x1b0
             do_init_module+0x50/0x1fc
             load_module+0x684/0x744
             __do_sys_finit_module+0xc4/0x140
             __arm64_sys_finit_module+0x28/0x34
             invoke_syscall+0x50/0x120
             el0_svc_common.constprop.0+0x6c/0x1b0
             do_el0_svc+0x34/0x44
             el0_svc+0x48/0xf0
             el0t_64_sync_handler+0xb8/0xc0
             el0t_64_sync+0x1a0/0x1a4
      
      -> #0 (&dev->lock){+.+.}-{3:3}:
             check_prev_add+0xb4/0xc80
             validate_chain+0x414/0x47c
             __lock_acquire+0x35c/0x6c0
             lock_acquire.part.0+0xcc/0x220
             lock_acquire+0x68/0x84
             __mutex_lock+0x8c/0x414
             mutex_lock_nested+0x34/0x40
             phy_attach_direct+0x17c/0x3a0 [libphy]
             phylink_fwnode_phy_connect.part.0+0x70/0xe4 [phylink]
             phylink_fwnode_phy_connect+0x48/0x60 [phylink]
             mvpp2_open+0xec/0x2e0 [mvpp2]
             __dev_open+0x104/0x214
             __dev_change_flags+0x1d4/0x254
             dev_change_flags+0x2c/0x7c
             do_setlink+0x254/0xa50
             __rtnl_newlink+0x430/0x514
             rtnl_newlink+0x58/0x8c
             rtnetlink_rcv_msg+0x17c/0x560
             netlink_rcv_skb+0x64/0x150
             rtnetlink_rcv+0x20/0x30
             netlink_unicast+0x1d4/0x2b4
             netlink_sendmsg+0x1a4/0x400
             ____sys_sendmsg+0x228/0x290
             ___sys_sendmsg+0x88/0xec
             __sys_sendmsg+0x70/0xd0
             __arm64_sys_sendmsg+0x2c/0x40
             invoke_syscall+0x50/0x120
             el0_svc_common.constprop.0+0x6c/0x1b0
             do_el0_svc+0x34/0x44
             el0_svc+0x48/0xf0
             el0t_64_sync_handler+0xb8/0xc0
             el0t_64_sync+0x1a0/0x1a4
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock(&dev->lock);
                                     lock(rtnl_mutex);
        lock(&dev->lock);
      
       *** DEADLOCK ***
      
      Fixes: 298e54fa ("net: phy: add core phylib sfp support")
      Reported-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: add to set device wake up flag when stmmac init phy · a9334b70
      Rongguang Wei authored
      When the MAC does not support PMT, the driver checks the PHY's WoL
      capability and sets the device wakeup capability in stmmac_init_phy().
      When WoL is enabled through ethtool, the driver sets the device wakeup
      flag, and device_may_wakeup() then returns true.

      But if the PHY's WoL capability is enabled directly by some other means,
      such as the BIOS, the driver does not know about it and does not set the
      device wakeup flag. phy_suspend may then fail like this:
      
      [   32.409063] PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x50 returns -16
      [   32.409065] PM: Device stmmac-1:00 failed to suspend: error -16
      [   32.409067] PM: Some devices failed to suspend, or early wake event detected
      
      Setting the device wakeup enable flag according to the result of the
      PHY's get_wol function fixes the error in this scenario.
      
      v2: add a Fixes tag.
      
      Fixes: 1d8e5b0f ("net: stmmac: Support WOL with phy")
      Signed-off-by: Rongguang Wei <weirongguang@kylinos.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 03 Mar, 2023 11 commits
    • bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() · d900f3d2
      Liu Jian authored
      When the buffer length of the recvmsg system call is 0, we get the
      following soft lockup:
      
      watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [a.out:6149]
      CPU: 3 PID: 6149 Comm: a.out Kdump: loaded Not tainted 6.2.0+ #30
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:remove_wait_queue+0xb/0xc0
      Code: 5e 41 5f c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 48 89 fd 53 48 89 f3 4c 8d 6b 18 4c 8d 73 20
      RSP: 0018:ffff88811b5978b8 EFLAGS: 00000246
      RAX: 0000000000000000 RBX: ffff88811a7d3780 RCX: ffffffffb7a4d768
      RDX: dffffc0000000000 RSI: ffff88811b597908 RDI: ffff888115408040
      RBP: 1ffff110236b2f1b R08: 0000000000000000 R09: ffff88811a7d37e7
      R10: ffffed10234fa6fc R11: 0000000000000001 R12: ffff88811179b800
      R13: 0000000000000001 R14: ffff88811a7d38a8 R15: ffff88811a7d37e0
      FS:  00007f6fb5398740(0000) GS:ffff888237180000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 000000010b6ba002 CR4: 0000000000370ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       tcp_msg_wait_data+0x279/0x2f0
       tcp_bpf_recvmsg_parser+0x3c6/0x490
       inet_recvmsg+0x280/0x290
       sock_recvmsg+0xfc/0x120
       ____sys_recvmsg+0x160/0x3d0
       ___sys_recvmsg+0xf0/0x180
       __sys_recvmsg+0xea/0x1a0
       do_syscall_64+0x3f/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      The logic in tcp_bpf_recvmsg_parser is as follows:
      
      msg_bytes_ready:
      	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
      	if (!copied) {
      		wait data;
      		goto msg_bytes_ready;
      	}
      
      In this case, "copied" is always 0, so the loop never terminates.
      
      According to the Linux system call man page, 0 should be returned in this
      case. Therefore, in tcp_bpf_recvmsg_parser(), return directly if the
      length is 0. Also fix several other functions that have the same problem.
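
A minimal userspace sketch of the retry pattern described above, not the kernel code. fake_recvmsg() is a hypothetical stand-in for sk_msg_recvmsg(), and the guard and max_spins parameters exist only so the sketch can show both behaviours and still terminate. With a requested length of 0 nothing can ever be copied, so the loop keeps retrying unless the zero-length case returns early.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sk_msg_recvmsg(): copies up to len bytes
 * from a queued message and returns the number of bytes copied. */
static size_t fake_recvmsg(const char *queued, size_t queued_len,
			   char *buf, size_t len)
{
	size_t n = queued_len < len ? queued_len : len;

	memcpy(buf, queued, n);
	return n;
}

/* Retry loop modelled on the msg_bytes_ready logic.
 * guard != 0 enables the early return for len == 0.
 * max_spins bounds the loop so the sketch always terminates; -1 is
 * returned where the unguarded loop would otherwise spin forever. */
static long recv_with_guard(const char *queued, size_t queued_len,
			    char *buf, size_t len, int guard, int max_spins)
{
	size_t copied;
	int spins = 0;

	if (guard && len == 0)
		return 0;	/* nothing requested: report 0 bytes */

	for (;;) {
		copied = fake_recvmsg(queued, queued_len, buf, len);
		if (copied)
			return (long)copied;
		if (++spins >= max_spins)
			return -1;	/* stands in for the soft lockup */
		/* the kernel would "wait data" here and then retry */
	}
}
```

Calling recv_with_guard() with len 0 and guard 0 hits the spin bound, while the same call with guard 1 returns 0 at once.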
      
      Fixes: 1f5be6b3 ("udp: Implement udp_bpf_recvmsg() for sockmap")
      Fixes: 9825d866 ("af_unix: Implement unix_dgram_bpf_recvmsg()")
      Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20230303080946.1146638-1-liujian56@huawei.com
      d900f3d2
    • David S. Miller's avatar
      Merge branch 'nfp-ipsec-csum' · 52812526
      David S. Miller authored
      Simon Horman says:
      
      ====================
      nfp: fix incorrect IPsec checksum handling
      
      This short series resolves two problems with IPsec checksum handling
      in the nfp driver.
      
      * Patch 1/3, 2/3: Correct setting of checksum flags.
        One patch for each of the nfd3 and nfdk datapaths.
      
      * Patch 3/3: Correct configuration of NETIF_F_CSUM_MASK
        so that the stack does not unnecessarily calculate csums for
        IPsec offload packets.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      52812526
    • Huanhuan Wang's avatar
      nfp: fix esp-tx-csum-offload doesn't take effect · 1cf78d4c
      Huanhuan Wang authored
      When esp-tx-csum-offload is set to on, the protocol stack shouldn't
      calculate the IPsec offload packet's csum, but it does, because the
      callback `.ndo_features_check` incorrectly masks the NETIF_F_CSUM_MASK
      bits.
      
      Fixes: 57f273ad ("nfp: add framework to support ipsec offloading")
      Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1cf78d4c
    • Huanhuan Wang's avatar
      nfp: fix incorrectly set csum flag for nfdk path · 8b46168c
      Huanhuan Wang authored
      The csum flags of IPsec packets are set repeatedly. Therefore, setting
      the csum flags needs to distinguish IPsec packets from non-IPsec packets.
      
      As the ipv6 header does not have a csum field, the l3-csum flag does
      not need to be set in the ipv6 case.
      
      Fixes: 436396f2 ("nfp: support IPsec offloading for NFP3800")
      Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b46168c
    • Huanhuan Wang's avatar
      nfp: fix incorrectly set csum flag for nfd3 path · 3e04419c
      Huanhuan Wang authored
      The csum flags of IPsec packets are set repeatedly. Therefore, setting
      the csum flags needs to distinguish IPsec packets from non-IPsec packets.
      
      As the ipv6 header does not have a csum field, the l3-csum flag does
      not need to be set in the ipv6 case.
      
      The l4-csum flag covers both the tcp csum flag and the udp csum flag.
      The two must not be set at the same time for one packet; set the
      l4-csum flag according to whether the transport layer is tcp or udp.
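
The flag selection rules stated above can be sketched as follows. This is an illustration only: the TX_CSUM_* names, the l4_proto enum and ipsec_tx_csum_flags() are hypothetical and are not the nfp driver's descriptor flags or functions.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define TX_CSUM_L3	(1u << 0)	/* IPv4 header checksum */
#define TX_CSUM_TCP	(1u << 1)	/* TCP checksum */
#define TX_CSUM_UDP	(1u << 2)	/* UDP checksum */

enum l4_proto { L4_TCP, L4_UDP, L4_OTHER };

/* Pick checksum flags for one packet:
 * - l3-csum only for IPv4, since the IPv6 header has no csum field;
 * - at most one l4-csum flag, chosen by the transport protocol. */
static uint32_t ipsec_tx_csum_flags(int ip_version, enum l4_proto l4)
{
	uint32_t flags = 0;

	if (ip_version == 4)
		flags |= TX_CSUM_L3;

	if (l4 == L4_TCP)
		flags |= TX_CSUM_TCP;
	else if (l4 == L4_UDP)
		flags |= TX_CSUM_UDP;

	return flags;
}
```

An IPv6 UDP packet therefore gets only TX_CSUM_UDP, and no packet ever carries both l4 flags.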
      
      Fixes: 57f273ad ("nfp: add framework to support ipsec offloading")
      Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com>
      Reviewed-by: Louis Peens <louis.peens@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e04419c
    • Petr Oros's avatar
      ice: copy last block omitted in ice_get_module_eeprom() · 84cba184
      Petr Oros authored
      ice_get_module_eeprom() has been broken since commit e9c9692c ("ice:
      Reimplement module reads used by ethtool"). After that refactor,
      ice_get_module_eeprom() reads the eeprom in blocks of size 8,
      but the condition that should protect against a buffer overflow
      ignores the last block, so the last block always contains zeros.
      
      The bug was uncovered by ethtool upstream commit 9538f384b535
      ("netlink: eeprom: Defer page requests to individual parsers").
      After this commit, ethtool reads a block with length = 1
      to read the SFF-8024 identifier value.
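
A userspace sketch of how a bounds check can drop the last block, assuming the check has the form `i + BLOCK_SIZE < len`; the driver's actual code is not shown in this message. read_block(), copy_buggy() and copy_fixed() are hypothetical names. The first 12 simulated bytes are the values the patched driver reports at offset 0x90 in this commit message; the last 4 are filler for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8	/* block size named in the commit message */

/* Simulated module contents starting at offset 0x90. */
static const uint8_t module_bytes[16] = {
	0x00, 0x00, 0x01, 0xa0, 0x4d, 0x65, 0x6c, 0x6c,
	0x61, 0x6e, 0x6f, 0x78, 0x00, 0x00, 0x00, 0x00,	/* last 4: filler */
};

/* Hypothetical stand-in for the device read: always fills a whole block. */
static void read_block(size_t offset, uint8_t value[BLOCK_SIZE])
{
	memcpy(value, module_bytes + offset, BLOCK_SIZE);
}

/* Strict '<' check: a block that ends at or past len is never copied,
 * so the tail of the output stays zeroed. */
static void copy_buggy(uint8_t *data, size_t len)
{
	uint8_t value[BLOCK_SIZE];
	size_t i;

	assert(len <= sizeof(module_bytes));
	memset(data, 0, len);
	for (i = 0; i < len; i += BLOCK_SIZE) {
		read_block(i, value);
		if (i + BLOCK_SIZE < len)
			memcpy(data + i, value, BLOCK_SIZE);
	}
}

/* Clamp the copy to the bytes that remain, so the last block is kept
 * and the output buffer is still never overrun. */
static void copy_fixed(uint8_t *data, size_t len)
{
	uint8_t value[BLOCK_SIZE];
	size_t i, n;

	assert(len <= sizeof(module_bytes));
	memset(data, 0, len);
	for (i = 0; i < len; i += BLOCK_SIZE) {
		read_block(i, value);
		n = len - i < BLOCK_SIZE ? len - i : BLOCK_SIZE;
		memcpy(data + i, value, n);
	}
}
```

With len 8 the strict check copies nothing, and with len 12 it copies only the first 8 bytes, which matches the unpatched output in this message.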
      
      unpatched driver:
      $ ethtool -m enp65s0f0np0 offset 0x90 length 8
      Offset          Values
      ------          ------
      0x0090:         00 00 00 00 00 00 00 00
      $ ethtool -m enp65s0f0np0 offset 0x90 length 12
      Offset          Values
      ------          ------
      0x0090:         00 00 01 a0 4d 65 6c 6c 00 00 00 00
      $
      
      $ ethtool -m enp65s0f0np0
      Offset          Values
      ------          ------
      0x0000:         11 06 06 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0010:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0020:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0030:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0040:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      0x0060:         00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 00
      0x0070:         00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      patched driver:
      $ ethtool -m enp65s0f0np0 offset 0x90 length 8
      Offset          Values
      ------          ------
      0x0090:         00 00 01 a0 4d 65 6c 6c
      $ ethtool -m enp65s0f0np0 offset 0x90 length 12
      Offset          Values
      ------          ------
      0x0090:         00 00 01 a0 4d 65 6c 6c 61 6e 6f 78
      $ ethtool -m enp65s0f0np0
          Identifier                                : 0x11 (QSFP28)
          Extended identifier                       : 0x00
          Extended identifier description           : 1.5W max. Power consumption
          Extended identifier description           : No CDR in TX, No CDR in RX
          Extended identifier description           : High Power Class (> 3.5 W) not enabled
          Connector                                 : 0x23 (No separable connector)
          Transceiver codes                         : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00
          Transceiver type                          : 40G Ethernet: 40G Base-CR4
          Transceiver type                          : 25G Ethernet: 25G Base-CR CA-N
          Encoding                                  : 0x05 (64B/66B)
          BR, Nominal                               : 25500Mbps
          Rate identifier                           : 0x00
          Length (SMF,km)                           : 0km
          Length (OM3 50um)                         : 0m
          Length (OM2 50um)                         : 0m
          Length (OM1 62.5um)                       : 0m
          Length (Copper or Active cable)           : 1m
          Transmitter technology                    : 0xa0 (Copper cable unequalized)
          Attenuation at 2.5GHz                     : 4db
          Attenuation at 5.0GHz                     : 5db
          Attenuation at 7.0GHz                     : 7db
          Attenuation at 12.9GHz                    : 10db
          ........
          ....
      
      Fixes: e9c9692c ("ice: Reimplement module reads used by ethtool")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84cba184
    • David S. Miller's avatar
      Merge branch 'net-tools-ynl-fixes' · 8f632a0a
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      tools: ynl: fix subset use and change default value for attrs/ops
      
      Fix a problem in subsetting, which will become apparent when
      the devlink family comes after the merge window. Even though none
      of the existing families need this, we don't want someone to
      get "inspired" by the current, incorrect code when using specs
      in other languages.
      
      Change the default value for the first attr/op. This is a slight
      behavior change, so it needs to go in now. The diffstat of the last
      patch should serve as the clearest justification there.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f632a0a
    • Jakub Kicinski's avatar
      netlink: specs: update for codegen enumerating from 1 · bcec7171
      Jakub Kicinski authored
      Now that the codegen rules have been changed, we can update
      the specs to reflect the new default.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bcec7171
    • Jakub Kicinski's avatar
      tools: ynl: use 1 as the default for first entry in attrs/ops · ad4fafcd
      Jakub Kicinski authored
      Pretty much all families either use value: 1 for, or reserve as
      unspec, the first entry in an attribute set and the first operation.
      Make this the default. Update the documentation (the doc for
      values of operations just refers back to the doc for attrs,
      so only the attrs doc is updated).
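
The effect of the new default can be sketched in C, although the ynl tooling itself is not written in C. Two assumptions are made here: an entry without an explicit value takes the previous value plus 1, and struct spec_entry and assign_values() are hypothetical names. Only the rule that the first entry defaults to 1 comes from this commit message.

```c
#include <stddef.h>

/* Hypothetical model of one attribute or operation in a spec. */
struct spec_entry {
	const char *name;
	int has_value;	/* non-zero if the spec gives an explicit value */
	int value;
};

/* Assign values in order. An explicit value wins; otherwise the entry
 * takes the next free value. The counter starts at 1, so the first
 * entry defaults to 1 and 0 is left free for "unspec". */
static void assign_values(struct spec_entry *e, size_t n)
{
	int next = 1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!e[i].has_value)
			e[i].value = next;
		next = e[i].value + 1;
	}
}
```

For four entries where only the third has an explicit value of 10, this yields 1, 2, 10 and 11.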
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ad4fafcd
    • Jakub Kicinski's avatar
      tools: ynl: fully inherit attrs in subsets · 7cf93538
      Jakub Kicinski authored
      To avoid having to repeat the entire definition of an attribute
      (including the value), use the Attr object from the original set.
      In fact, this is already the documented expectation.
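
One way to picture the inheritance is a subset that holds references to the parent set's attributes instead of copies. The sketch below is in C for illustration, not the ynl code; struct attr, find_attr() and build_subset() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute definition in the original (parent) set. */
struct attr {
	const char *name;
	int value;
	const char *type;
};

/* Look an attribute up by name in the parent set. */
static const struct attr *find_attr(const struct attr *set, size_t n,
				    const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(set[i].name, name) == 0)
			return &set[i];
	return NULL;
}

/* Build a subset from a list of names. Each subset slot points at the
 * parent's attr, so value and type are inherited and never repeated.
 * Unknown names are skipped. Returns the number of slots filled. */
static size_t build_subset(const struct attr *parent, size_t n_parent,
			   const char *const *names, size_t n_names,
			   const struct attr **out)
{
	size_t i, found = 0;

	for (i = 0; i < n_names; i++) {
		const struct attr *a = find_attr(parent, n_parent, names[i]);

		if (a)
			out[found++] = a;
	}
	return found;
}
```

Because the subset only stores pointers, a change to an attribute in the parent set is visible through every subset that names it.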
      
      Fixes: be5bea1c ("net: add basic C code generators for Netlink")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7cf93538
    • Jakub Kicinski's avatar
      Merge tag 'ieee802154-for-net-2023-03-02' of... · ad93bab6
      Jakub Kicinski authored
      Merge tag 'ieee802154-for-net-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
      
      Stefan Schmidt says:
      
      ====================
      ieee802154 for net 2023-03-02
      
      Two small fixes this time.
      
      Alexander Aring fixed a potential negative array access in the ca8210
      driver.
      
      Miquel Raynal fixed a crash that could have been triggered through
      the extended netlink API for 802154. This only came in this merge window.
      Found by syzkaller.
      
      * tag 'ieee802154-for-net-2023-03-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
        ieee802154: Prevent user from crashing the host
        ca8210: fix mac_len negative array access
      ====================
      
      Link: https://lore.kernel.org/r/20230302153032.1312755-1-stefan@datenfreihafen.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ad93bab6