1. 09 Mar, 2023 8 commits
    • Paolo Abeni's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 67eeadf2
      Paolo Abeni authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2023-03-07 (ice)
      
      This series contains updates to ice driver only.
      
      Dave removes masking from pfcena field as it was incorrectly preventing
      valid traffic classes from being enabled.
      
      Michal resolves various smatch issues such as not propagating error
      codes and returning 0 explicitly.
      
      Arnd Bergmann resolves gcc-9 warning for integer overflow.
      
      * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
        ethernet: ice: avoid gcc-9 integer overflow warning
        ice: don't ignore return codes in VSI related code
        ice: Fix DSCP PFC TLV creation
      ====================
      
      Link: https://lore.kernel.org/r/20230307220714.3997294-1-anthony.l.nguyen@intel.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      67eeadf2
    • Jakub Kicinski's avatar
      Merge branch 'tools-ynl-fix-enum-as-flags-in-the-generic-cli' · 2d8cb0bf
      Jakub Kicinski authored
      Jakub Kicinski says:
      
      ====================
      tools: ynl: fix enum-as-flags in the generic CLI
      
      The CLI needs to use proper classes when looking at Enum definitions
      rather than interpreting the YAML spec ad-hoc, because we have more
      than on format of the definition supported.
      ====================
      
      Link: https://lore.kernel.org/r/20230308003923.445268-1-kuba@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2d8cb0bf
    • Jakub Kicinski's avatar
      tools: ynl: fix enum-as-flags in the generic CLI · c311aaa7
      Jakub Kicinski authored
      Lorenzo points out that the generic CLI is broken for the netdev
      family. When I added the support for documentation of enums
      (and sparse enums) the client script was not updated.
      It expects the values in enum to be a list of names,
      now it can also be a dict (YAML object).
      Reported-by: default avatarLorenzo Bianconi <lorenzo@kernel.org>
      Fixes: e4b48ed4 ("tools: ynl: add a completely generic client")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c311aaa7
    • Jakub Kicinski's avatar
      tools: ynl: move the enum classes to shared code · 6517a60b
      Jakub Kicinski authored
      Move bulk of the EnumSet and EnumEntry code to shared
      code for reuse by cli.
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6517a60b
    • Thadeu Lima de Souza Cascardo's avatar
      net: avoid double iput when sock_alloc_file fails · 649c15c7
      Thadeu Lima de Souza Cascardo authored
      When sock_alloc_file fails to allocate a file, it will call sock_release.
      __sys_socket_file should then not call sock_release again, otherwise there
      will be a double free.
      
      [   89.319884] ------------[ cut here ]------------
      [   89.320286] kernel BUG at fs/inode.c:1764!
      [   89.320656] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [   89.321051] CPU: 7 PID: 125 Comm: iou-sqp-124 Not tainted 6.2.0+ #361
      [   89.321535] RIP: 0010:iput+0x1ff/0x240
      [   89.321808] Code: d1 83 e1 03 48 83 f9 02 75 09 48 81 fa 00 10 00 00 77 05 83 e2 01 75 1f 4c 89 ef e8 fb d2 ba 00 e9 80 fe ff ff c3 cc cc cc cc <0f> 0b 0f 0b e9 d0 fe ff ff 0f 0b eb 8d 49 8d b4 24 08 01 00 00 48
      [   89.322760] RSP: 0018:ffffbdd60068bd50 EFLAGS: 00010202
      [   89.323036] RAX: 0000000000000000 RBX: ffff9d7ad3cacac0 RCX: 0000000000001107
      [   89.323412] RDX: 000000000003af00 RSI: 0000000000000000 RDI: ffff9d7ad3cacb40
      [   89.323785] RBP: ffffbdd60068bd68 R08: ffffffffffffffff R09: ffffffffab606438
      [   89.324157] R10: ffffffffacb3dfa0 R11: 6465686361657256 R12: ffff9d7ad3cacb40
      [   89.324529] R13: 0000000080000001 R14: 0000000080000001 R15: 0000000000000002
      [   89.324904] FS:  00007f7b28516740(0000) GS:ffff9d7aeb1c0000(0000) knlGS:0000000000000000
      [   89.325328] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   89.325629] CR2: 00007f0af52e96c0 CR3: 0000000002a02006 CR4: 0000000000770ee0
      [   89.326004] PKRU: 55555554
      [   89.326161] Call Trace:
      [   89.326298]  <TASK>
      [   89.326419]  __sock_release+0xb5/0xc0
      [   89.326632]  __sys_socket_file+0xb2/0xd0
      [   89.326844]  io_socket+0x88/0x100
      [   89.327039]  ? io_issue_sqe+0x6a/0x430
      [   89.327258]  io_issue_sqe+0x67/0x430
      [   89.327450]  io_submit_sqes+0x1fe/0x670
      [   89.327661]  io_sq_thread+0x2e6/0x530
      [   89.327859]  ? __pfx_autoremove_wake_function+0x10/0x10
      [   89.328145]  ? __pfx_io_sq_thread+0x10/0x10
      [   89.328367]  ret_from_fork+0x29/0x50
      [   89.328576] RIP: 0033:0x0
      [   89.328732] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
      [   89.329073] RSP: 002b:0000000000000000 EFLAGS: 00000202 ORIG_RAX: 00000000000001a9
      [   89.329477] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f7b28637a3d
      [   89.329845] RDX: 00007fff4e4318a8 RSI: 00007fff4e4318b0 RDI: 0000000000000400
      [   89.330216] RBP: 00007fff4e431830 R08: 00007fff4e431711 R09: 00007fff4e4318b0
      [   89.330584] R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff4e441b38
      [   89.330950] R13: 0000563835e3e725 R14: 0000563835e40d10 R15: 00007f7b28784040
      [   89.331318]  </TASK>
      [   89.331441] Modules linked in:
      [   89.331617] ---[ end trace 0000000000000000 ]---
      
      Fixes: da214a47 ("net: add __sys_socket_file()")
      Signed-off-by: default avatarThadeu Lima de Souza Cascardo <cascardo@canonical.com>
      Reviewed-by: default avatarJens Axboe <axboe@kernel.dk>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230307173707.468744-1-cascardo@canonical.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      649c15c7
    • Eric Dumazet's avatar
      af_unix: fix struct pid leaks in OOB support · 2aab4b96
      Eric Dumazet authored
      syzbot reported struct pid leak [1].
      
      Issue is that queue_oob() calls maybe_add_creds() which potentially
      holds a reference on a pid.
      
      But skb->destructor is not set (either directly or by calling
      unix_scm_to_skb())
      
      This means that subsequent kfree_skb() or consume_skb() would leak
      this reference.
      
      In this fix, I chose to fully support scm even for the OOB message.
      
      [1]
      BUG: memory leak
      unreferenced object 0xffff8881053e7f80 (size 128):
      comm "syz-executor242", pid 5066, jiffies 4294946079 (age 13.220s)
      hex dump (first 32 bytes):
      01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
      backtrace:
      [<ffffffff812ae26a>] alloc_pid+0x6a/0x560 kernel/pid.c:180
      [<ffffffff812718df>] copy_process+0x169f/0x26c0 kernel/fork.c:2285
      [<ffffffff81272b37>] kernel_clone+0xf7/0x610 kernel/fork.c:2684
      [<ffffffff812730cc>] __do_sys_clone+0x7c/0xb0 kernel/fork.c:2825
      [<ffffffff849ad699>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      [<ffffffff849ad699>] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
      [<ffffffff84a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: 314001f0 ("af_unix: Add OOB support")
      Reported-by: syzbot+7699d9e5635c10253a27@syzkaller.appspotmail.com
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Rao Shoaib <rao.shoaib@oracle.com>
      Reviewed-by: default avatarKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230307164530.771896-1-edumazet@google.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2aab4b96
    • Jakub Kicinski's avatar
      eth: fealnx: bring back this old driver · 8f148208
      Jakub Kicinski authored
      This reverts commit d5e2d038.
      
      We have a report of this chip being used on a
      
        SURECOM EP-320X-S 100/10M Ethernet PCI Adapter
      
      which could still have been purchased in some parts
      of the world 3 years ago.
      
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217151
      Fixes: d5e2d038 ("eth: fealnx: delete the driver for Myson MTD-800")
      Link: https://lore.kernel.org/r/20230307171930.4008454-1-kuba@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8f148208
    • Vladimir Oltean's avatar
      net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC · c8b8a3c6
      Vladimir Oltean authored
      The MT7530 switch from the MT7621 SoC has 2 ports which can be set up as
      internal: port 5 and 6. Arınç reports that the GMAC1 attached to port 5
      receives corrupted frames, unless port 6 (attached to GMAC0) has been
      brought up by the driver. This is true regardless of whether port 5 is
      used as a user port or as a CPU port (carrying DSA tags).
      
      Offline debugging (blind for me) which began in the linked thread showed
      experimentally that the configuration done by the driver for port 6
      contains a step which is needed by port 5 as well - the write to
      CORE_GSWPLL_GRP2 (note that I've no idea as to what it does, apart from
      the comment "Set core clock into 500Mhz"). Prints put by Arınç show that
      the reset value of CORE_GSWPLL_GRP2 is RG_GSWPLL_POSDIV_500M(1) |
      RG_GSWPLL_FBKDIV_500M(40) (0x128), both on the MCM MT7530 from the
      MT7621 SoC, as well as on the standalone MT7530 from MT7623NI Bananapi
      BPI-R2. Apparently, port 5 on the standalone MT7530 can work under both
      values of the register, while on the MT7621 SoC it cannot.
      
      The call path that triggers the register write is:
      
      mt753x_phylink_mac_config() for port 6
      -> mt753x_pad_setup()
         -> mt7530_pad_clk_setup()
      
      so this fully explains the behavior noticed by Arınç, that bringing port
      6 up is necessary.
      
      The simplest fix for the problem is to extract the register writes which
      are needed for both port 5 and 6 into a common mt7530_pll_setup()
      function, which is called at mt7530_setup() time, immediately after
      switch reset. We can argue that this mirrors the code layout introduced
      in mt7531_setup() by commit 42bc4faf ("net: mt7531: only do PLL once
      after the reset"), in that the PLL setup has the exact same positioning,
      and further work to consolidate the separate setup() functions is not
      hindered.
      
      Testing confirms that:
      
      - the slight reordering of writes to MT7530_P6ECR and to
        CORE_GSWPLL_GRP1 / CORE_GSWPLL_GRP2 introduced by this change does not
        appear to cause problems for the operation of port 6 on MT7621 and on
        MT7623 (where port 5 also always worked)
      
      - packets sent through port 5 are not corrupted anymore, regardless of
        whether port 6 is enabled by phylink or not (or even present in the
        device tree)
      
      My algorithm for determining the Fixes: tag is as follows. Testing shows
      that some logic from mt7530_pad_clk_setup() is needed even for port 5.
      Prior to commit ca366d6c ("net: dsa: mt7530: Convert to PHYLINK
      API"), a call did exist for all phy_is_pseudo_fixed_link() ports - so
      port 5 included. That commit replaced it with a temporary "Port 5 is not
      supported!" comment, and the following commit 38f790a8 ("net: dsa:
      mt7530: Add support for port 5") replaced that comment with a
      configuration procedure in mt7530_setup_port5() which was insufficient
      for port 5 to work. I'm laying the blame on the patch that claimed
      support for port 5, although one would have also needed the change from
      commit c3b8e079 ("net: dsa: mt7530: setup core clock even in TRGMII
      mode") for the write to be performed completely independently from port
      6's configuration.
      
      Thanks go to Arınç for describing the problem, for debugging and for
      testing.
      Reported-by: default avatarArınç ÜNAL <arinc.unal@arinc9.com>
      Link: https://lore.kernel.org/netdev/f297c2c4-6e7c-57ac-2394-f6025d309b9d@arinc9.com/
      Fixes: 38f790a8 ("net: dsa: mt7530: Add support for port 5")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarArınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230307155411.868573-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      c8b8a3c6
  2. 08 Mar, 2023 3 commits
  3. 07 Mar, 2023 12 commits
  4. 06 Mar, 2023 15 commits
    • Jakub Kicinski's avatar
      net: tls: fix device-offloaded sendpage straddling records · e539a105
      Jakub Kicinski authored
      Adrien reports that incorrect data is transmitted when a single
      page straddles multiple records. We would transmit the same
      data in all iterations of the loop.
      Reported-by: default avatarAdrien Moulin <amoulin@corp.free.fr>
      Link: https://lore.kernel.org/all/61481278.42813558.1677845235112.JavaMail.zimbra@corp.free.fr
      Fixes: c1318b39 ("tls: Add opt-in zerocopy mode of sendfile()")
      Tested-by: default avatarAdrien Moulin <amoulin@corp.free.fr>
      Reviewed-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Acked-by: default avatarMaxim Mikityanskiy <maxtram95@gmail.com>
      Link: https://lore.kernel.org/r/20230304192610.3818098-1-kuba@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e539a105
    • Daniel Golle's avatar
      net: ethernet: mtk_eth_soc: fix RX data corruption issue · 193250ac
      Daniel Golle authored
      Fix data corruption issue with SerDes connected PHYs operating at 1.25
      Gbps speed where we could previously observe about 30% packet loss while
      the bad packet counter was increasing.
      
      As almost all boards with MediaTek MT7622 or MT7986 use either the MT7531
      switch IC operating at 3.125Gbps SerDes rate or single-port PHYs using
      rate-adaptation to 2500Base-X mode, this issue only got exposed now when
      we started trying to use SFP modules operating with 1.25 Gbps with the
      BananaPi R3 board.
      
      The fix is to set bit 12 which disables the RX FIFO clear function when
      setting up MAC MCR, MediaTek SDK did the same change stating:
      "If without this patch, kernel might receive invalid packets that are
      corrupted by GMAC."[1]
      
      [1]: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/d8a2975939a12686c4a95c40db21efdc3f821f63
      
      Fixes: 42c03844 ("net-next: mediatek: add support for MediaTek MT7622 SoC")
      Tested-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Reviewed-by: default avatarVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/138da2735f92c8b6f8578ec2e5a794ee515b665f.1677937317.git.daniel@makrotopia.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      193250ac
    • Heiner Kallweit's avatar
      net: phy: smsc: fix link up detection in forced irq mode · 58aac3a2
      Heiner Kallweit authored
      Currently link up can't be detected in forced mode if polling
      isn't used. Only link up interrupt source we have is aneg
      complete which isn't applicable in forced mode. Therefore we
      have to use energy-on as link up indicator.
      
      Fixes: 73654945 ("net: phy: smsc: skip ENERGYON interrupt if disabled")
      Signed-off-by: default avatarHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      58aac3a2
    • Martin KaFai Lau's avatar
      Merge branch 'fix resolving VAR after DATASEC' · 32dfc59e
      Martin KaFai Lau authored
      Lorenz Bauer says:
      
      ====================
      
      See the first patch for a detailed explanation.
      
      v2:
      - Move RESOLVE_TBD assignment out of the loop (Martin)
      ====================
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      32dfc59e
    • Lorenz Bauer's avatar
      selftests/bpf: check that modifier resolves after pointer · dfdd608c
      Lorenz Bauer authored
      Add a regression test that ensures that a VAR pointing at a
      modifier which follows a PTR (or STRUCT or ARRAY) is resolved
      correctly by the datasec validator.
      Signed-off-by: default avatarLorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/r/20230306112138.155352-3-lmb@isovalent.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      dfdd608c
    • Lorenz Bauer's avatar
      btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR · 9b459804
      Lorenz Bauer authored
      btf_datasec_resolve contains a bug that causes the following BTF
      to fail loading:
      
          [1] DATASEC a size=2 vlen=2
              type_id=4 offset=0 size=1
              type_id=7 offset=1 size=1
          [2] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
          [3] PTR (anon) type_id=2
          [4] VAR a type_id=3 linkage=0
          [5] INT (anon) size=1 bits_offset=0 nr_bits=8 encoding=(none)
          [6] TYPEDEF td type_id=5
          [7] VAR b type_id=6 linkage=0
      
      This error message is printed during btf_check_all_types:
      
          [1] DATASEC a size=2 vlen=2
              type_id=7 offset=1 size=1 Invalid type
      
      By tracing btf_*_resolve we can pinpoint the problem:
      
          btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_TBD) = 0
              btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_TBD) = 0
                  btf_ptr_resolve(depth: 3, type_id: 3, mode: RESOLVE_PTR) = 0
              btf_var_resolve(depth: 2, type_id: 4, mode: RESOLVE_PTR) = 0
          btf_datasec_resolve(depth: 1, type_id: 1, mode: RESOLVE_PTR) = -22
      
      The last invocation of btf_datasec_resolve should invoke btf_var_resolve
      by means of env_stack_push, instead it returns EINVAL. The reason is that
      env_stack_push is never executed for the second VAR.
      
          if (!env_type_is_resolve_sink(env, var_type) &&
              !env_type_is_resolved(env, var_type_id)) {
              env_stack_set_next_member(env, i + 1);
              return env_stack_push(env, var_type, var_type_id);
          }
      
      env_type_is_resolve_sink() changes its behaviour based on resolve_mode.
      For RESOLVE_PTR, we can simplify the if condition to the following:
      
          (btf_type_is_modifier() || btf_type_is_ptr) && !env_type_is_resolved()
      
      Since we're dealing with a VAR the clause evaluates to false. This is
      not sufficient to trigger the bug however. The log output and EINVAL
      are only generated if btf_type_id_size() fails.
      
          if (!btf_type_id_size(btf, &type_id, &type_size)) {
              btf_verifier_log_vsi(env, v->t, vsi, "Invalid type");
              return -EINVAL;
          }
      
      Most types are sized, so for example a VAR referring to an INT is not a
      problem. The bug is only triggered if a VAR points at a modifier. Since
      we skipped btf_var_resolve that modifier was also never resolved, which
      means that btf_resolved_type_id returns 0 aka VOID for the modifier.
      This in turn causes btf_type_id_size to return NULL, triggering EINVAL.
      
      To summarise, the following conditions are necessary:
      
      - VAR pointing at PTR, STRUCT, UNION or ARRAY
      - Followed by a VAR pointing at TYPEDEF, VOLATILE, CONST, RESTRICT or
        TYPE_TAG
      
      The fix is to reset resolve_mode to RESOLVE_TBD before attempting to
      resolve a VAR from a DATASEC.
      
      Fixes: 1dc92851 ("bpf: kernel side support for BTF Var and DataSec")
      Signed-off-by: default avatarLorenz Bauer <lmb@isovalent.com>
      Link: https://lore.kernel.org/r/20230306112138.155352-2-lmb@isovalent.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      9b459804
    • Alexander Lobakin's avatar
      bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES · 294635a8
      Alexander Lobakin authored
      &xdp_buff and &xdp_frame are bound in a way that
      
      xdp_buff->data_hard_start == xdp_frame
      
      It's always the case and e.g. xdp_convert_buff_to_frame() relies on
      this.
      IOW, the following:
      
      	for (u32 i = 0; i < 0xdead; i++) {
      		xdpf = xdp_convert_buff_to_frame(&xdp);
      		xdp_convert_frame_to_buff(xdpf, &xdp);
      	}
      
      shouldn't ever modify @xdpf's contents or the pointer itself.
      However, "live packet" code wrongly treats &xdp_frame as part of its
      context placed *before* the data_hard_start. With such flow,
      data_hard_start is sizeof(*xdpf) off to the right and no longer points
      to the XDP frame.
      
      Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
      places and praying that there are no more miscalcs left somewhere in the
      code, unionize ::frm with ::data in a flex array, so that both starts
      pointing to the actual data_hard_start and the XDP frame actually starts
      being a part of it, i.e. a part of the headroom, not the context.
      A nice side effect is that the maximum frame size for this mode gets
      increased by 40 bytes, as xdp_buff::frame_sz includes everything from
      data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
      info.
      Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
      hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
      
      Minor: align `&head->data` with how `head->frm` is assigned for
      consistency.
      Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
      clarity.
      
      (was found while testing XDP traffic generator on ice, which calls
       xdp_convert_frame_to_buff() for each XDP frame)
      
      Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
      Acked-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: default avatarAlexander Lobakin <aleksander.lobakin@intel.com>
      Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      294635a8
    • Bagas Sanjaya's avatar
      bpf, doc: Link to submitting-patches.rst for general patch submission info · b7abcd9c
      Bagas Sanjaya authored
      The link for patch submission information in general refers to index
      page for "Working with the kernel development community" section of
      kernel docs, whereas the link should have been
      Documentation/process/submitting-patches.rst instead.
      
      Fix it by replacing the index target with the appropriate doc.
      
      Fixes: 54222838 ("bpf, doc: convert bpf_devel_QA.rst to use RST formatting")
      Signed-off-by: default avatarBagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230228074523.11493-3-bagasdotme@gmail.com
      b7abcd9c
    • Bagas Sanjaya's avatar
      bpf, doc: Do not link to docs.kernel.org for kselftest link · 32db18d6
      Bagas Sanjaya authored
      The question on how to run BPF selftests have a reference link to kernel
      selftest documentation (Documentation/dev-tools/kselftest.rst). However,
      it uses external link to the documentation at kernel.org/docs (aka
      docs.kernel.org) instead, which requires Internet access.
      
      Fix this and replace the link with internal linking, by using :doc: directive
      while keeping the anchor text.
      
      Fixes: b7a27c3a ("bpf, doc: howto use/run the BPF selftests")
      Signed-off-by: default avatarBagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20230228074523.11493-2-bagasdotme@gmail.com
      32db18d6
    • Florian Westphal's avatar
      netfilter: tproxy: fix deadlock due to missing BH disable · 4a024267
      Florian Westphal authored
      The xtables packet traverser performs an unconditional local_bh_disable(),
      but the nf_tables evaluation loop does not.
      
      Functions that are called from either xtables or nftables must assume
      that they can be called in process context.
      
      inet_twsk_deschedule_put() assumes that no softirq interrupt can occur.
      If tproxy is used from nf_tables its possible that we'll deadlock
      trying to aquire a lock already held in process context.
      
      Add a small helper that takes care of this and use it.
      
      Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/
      Fixes: 4ed8eb65 ("netfilter: nf_tables: Add native tproxy support")
      Reported-and-tested-by: default avatarMajor Dávid <major.david@balasys.hu>
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      4a024267
    • Ivan Delalande's avatar
      netfilter: ctnetlink: revert to dumping mark regardless of event type · 9f7dd42f
      Ivan Delalande authored
      It seems that change was unintentional, we have userspace code that
      needs the mark while listening for events like REPLY, DESTROY, etc.
      Also include 0-marks in requested dumps, as they were before that fix.
      
      Fixes: 1feeae07 ("netfilter: ctnetlink: fix compilation warning after data race fixes in ct mark")
      Signed-off-by: default avatarIvan Delalande <colona@arista.com>
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      9f7dd42f
    • Selvin Xavier's avatar
      bnxt_en: Fix the double free during device removal · 89b59a84
      Selvin Xavier authored
      Following warning reported by KASAN during driver unload
      
      ==================================================================
      BUG: KASAN: double-free in bnxt_remove_one+0x103/0x200 [bnxt_en]
      Free of addr ffff88814e8dd4c0 by task rmmod/17469
      CPU: 47 PID: 17469 Comm: rmmod Kdump: loaded Tainted: G S                 6.2.0-rc7+ #2
      Hardware name: Dell Inc. PowerEdge R740/01YM03, BIOS 2.3.10 08/15/2019
      Call Trace:
       <TASK>
       dump_stack_lvl+0x33/0x46
       print_report+0x17b/0x4b3
       ? __call_rcu_common.constprop.79+0x27e/0x8c0
       ? __pfx_free_object_rcu+0x10/0x10
       ? __virt_addr_valid+0xe3/0x160
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       kasan_report_invalid_free+0x64/0xd0
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       __kasan_slab_free+0x179/0x1c0
       ? bnxt_remove_one+0x103/0x200 [bnxt_en]
       __kmem_cache_free+0x194/0x350
       bnxt_remove_one+0x103/0x200 [bnxt_en]
       pci_device_remove+0x62/0x110
       device_release_driver_internal+0xf6/0x1c0
       driver_detach+0x76/0xe0
       bus_remove_driver+0x89/0x160
       pci_unregister_driver+0x26/0x110
       ? strncpy_from_user+0x188/0x1c0
       bnxt_exit+0xc/0x24 [bnxt_en]
       __x64_sys_delete_module+0x21f/0x390
       ? __pfx___x64_sys_delete_module+0x10/0x10
       ? __pfx_mem_cgroup_handle_over_high+0x10/0x10
       ? _raw_spin_lock+0x87/0xe0
       ? __pfx__raw_spin_lock+0x10/0x10
       ? __audit_syscall_entry+0x185/0x210
       ? ktime_get_coarse_real_ts64+0x51/0x80
       ? syscall_trace_enter.isra.18+0x126/0x1a0
       do_syscall_64+0x37/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7effcb6fd71b
      Code: 73 01 c3 48 8b 0d 6d 17 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 17 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffeada270b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 00005623660e0750 RCX: 00007effcb6fd71b
      RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005623660e07b8
      RBP: 0000000000000000 R08: 00007ffeada26031 R09: 0000000000000000
      R10: 00007effcb771280 R11: 0000000000000206 R12: 00007ffeada272e0
      R13: 00007ffeada28bc4 R14: 00005623660e02a0 R15: 00005623660e0750
       </TASK>
      
      Auxiliary device structures are freed in bnxt_aux_dev_release. So avoid
      calling kfree from bnxt_remove_one.
      
      Also, set bp->edev to NULL before freeing the auxilary private structure.
      
      Fixes: d80d88b0 ("bnxt_en: Add auxiliary driver support")
      Reviewed-by: default avatarAjit Khaparde <ajit.khaparde@broadcom.com>
      Reviewed-by: default avatarAndy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: default avatarSelvin Xavier <selvin.xavier@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89b59a84
    • Michael Chan's avatar
      bnxt_en: Avoid order-5 memory allocation for TPA data · accd7e23
      Michael Chan authored
      The driver needs to keep track of all the possible concurrent TPA (GRO/LRO)
      completions on the aggregation ring.  On P5 chips, the maximum number
      of concurrent TPA is 256 and the amount of memory we allocate is order-5
      on systems using 4K pages.  Memory allocation failure has been reported:
      
      NetworkManager: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0-1
      CPU: 15 PID: 2995 Comm: NetworkManager Kdump: loaded Not tainted 5.10.156 #1
      Hardware name: Dell Inc. PowerEdge R660/0M1CC5, BIOS 0.2.25 08/12/2022
      Call Trace:
       dump_stack+0x57/0x6e
       warn_alloc.cold.120+0x7b/0xdd
       ? _cond_resched+0x15/0x30
       ? __alloc_pages_direct_compact+0x15f/0x170
       __alloc_pages_slowpath.constprop.108+0xc58/0xc70
       __alloc_pages_nodemask+0x2d0/0x300
       kmalloc_order+0x24/0xe0
       kmalloc_order_trace+0x19/0x80
       bnxt_alloc_mem+0x1150/0x15c0 [bnxt_en]
       ? bnxt_get_func_stat_ctxs+0x13/0x60 [bnxt_en]
       __bnxt_open_nic+0x12e/0x780 [bnxt_en]
       bnxt_open+0x10b/0x240 [bnxt_en]
       __dev_open+0xe9/0x180
       __dev_change_flags+0x1af/0x220
       dev_change_flags+0x21/0x60
       do_setlink+0x35c/0x1100
      
      Instead of allocating this big chunk of memory and dividing it up for the
      concurrent TPA instances, allocate each small chunk separately for each
      TPA instance.  This will reduce it to order-0 allocations.
      
      Fixes: 79632e9b ("bnxt_en: Expand bnxt_tpa_info struct to support 57500 chips.")
      Reviewed-by: default avatarSomnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: default avatarDamodharam Ammepalli <damodharam.ammepalli@broadcom.com>
      Reviewed-by: default avatarPavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      accd7e23
    • Russell King (Oracle)'s avatar
      net: phylib: get rid of unnecessary locking · f4b47a2e
      Russell King (Oracle) authored
      The locking in phy_probe() and phy_remove() does very little to prevent
      any races with e.g. phy_attach_direct(), but instead causes lockdep ABBA
      warnings. Remove it.
      
      ======================================================
      WARNING: possible circular locking dependency detected
      6.2.0-dirty #1108 Tainted: G        W   E
      ------------------------------------------------------
      ip/415 is trying to acquire lock:
      ffff5c268f81ef50 (&dev->lock){+.+.}-{3:3}, at: phy_attach_direct+0x17c/0x3a0 [libphy]
      
      but task is already holding lock:
      ffffaef6496cb518 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x154/0x560
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (rtnl_mutex){+.+.}-{3:3}:
             __lock_acquire+0x35c/0x6c0
             lock_acquire.part.0+0xcc/0x220
             lock_acquire+0x68/0x84
             __mutex_lock+0x8c/0x414
             mutex_lock_nested+0x34/0x40
             rtnl_lock+0x24/0x30
             sfp_bus_add_upstream+0x34/0x150
             phy_sfp_probe+0x4c/0x94 [libphy]
             mv3310_probe+0x148/0x184 [marvell10g]
             phy_probe+0x8c/0x200 [libphy]
             call_driver_probe+0xbc/0x15c
             really_probe+0xc0/0x320
             __driver_probe_device+0x84/0x120
             driver_probe_device+0x44/0x120
             __device_attach_driver+0xc4/0x160
             bus_for_each_drv+0x80/0xe0
             __device_attach+0xb0/0x1f0
             device_initial_probe+0x1c/0x2c
             bus_probe_device+0xa4/0xb0
             device_add+0x360/0x53c
             phy_device_register+0x60/0xa4 [libphy]
             fwnode_mdiobus_phy_device_register+0xc0/0x190 [fwnode_mdio]
             fwnode_mdiobus_register_phy+0x160/0xd80 [fwnode_mdio]
             of_mdiobus_register+0x140/0x340 [of_mdio]
             orion_mdio_probe+0x298/0x3c0 [mvmdio]
             platform_probe+0x70/0xe0
             call_driver_probe+0x34/0x15c
             really_probe+0xc0/0x320
             __driver_probe_device+0x84/0x120
             driver_probe_device+0x44/0x120
             __driver_attach+0x104/0x210
             bus_for_each_dev+0x78/0xdc
             driver_attach+0x2c/0x3c
             bus_add_driver+0x184/0x240
             driver_register+0x80/0x13c
             __platform_driver_register+0x30/0x3c
             xt_compat_calc_jump+0x28/0xa4 [x_tables]
             do_one_initcall+0x50/0x1b0
             do_init_module+0x50/0x1fc
             load_module+0x684/0x744
             __do_sys_finit_module+0xc4/0x140
             __arm64_sys_finit_module+0x28/0x34
             invoke_syscall+0x50/0x120
             el0_svc_common.constprop.0+0x6c/0x1b0
             do_el0_svc+0x34/0x44
             el0_svc+0x48/0xf0
             el0t_64_sync_handler+0xb8/0xc0
             el0t_64_sync+0x1a0/0x1a4
      
      -> #0 (&dev->lock){+.+.}-{3:3}:
             check_prev_add+0xb4/0xc80
             validate_chain+0x414/0x47c
             __lock_acquire+0x35c/0x6c0
             lock_acquire.part.0+0xcc/0x220
             lock_acquire+0x68/0x84
             __mutex_lock+0x8c/0x414
             mutex_lock_nested+0x34/0x40
             phy_attach_direct+0x17c/0x3a0 [libphy]
             phylink_fwnode_phy_connect.part.0+0x70/0xe4 [phylink]
             phylink_fwnode_phy_connect+0x48/0x60 [phylink]
             mvpp2_open+0xec/0x2e0 [mvpp2]
             __dev_open+0x104/0x214
             __dev_change_flags+0x1d4/0x254
             dev_change_flags+0x2c/0x7c
             do_setlink+0x254/0xa50
             __rtnl_newlink+0x430/0x514
             rtnl_newlink+0x58/0x8c
             rtnetlink_rcv_msg+0x17c/0x560
             netlink_rcv_skb+0x64/0x150
             rtnetlink_rcv+0x20/0x30
             netlink_unicast+0x1d4/0x2b4
             netlink_sendmsg+0x1a4/0x400
             ____sys_sendmsg+0x228/0x290
             ___sys_sendmsg+0x88/0xec
             __sys_sendmsg+0x70/0xd0
             __arm64_sys_sendmsg+0x2c/0x40
             invoke_syscall+0x50/0x120
             el0_svc_common.constprop.0+0x6c/0x1b0
             do_el0_svc+0x34/0x44
             el0_svc+0x48/0xf0
             el0t_64_sync_handler+0xb8/0xc0
             el0t_64_sync+0x1a0/0x1a4
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock(&dev->lock);
                                     lock(rtnl_mutex);
        lock(&dev->lock);
      
       *** DEADLOCK ***
      
      Fixes: 298e54fa ("net: phy: add core phylib sfp support")
      Reported-by: default avatarMarc Zyngier <maz@kernel.org>
      Signed-off-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f4b47a2e
    • Rongguang Wei's avatar
      net: stmmac: add to set device wake up flag when stmmac init phy · a9334b70
      Rongguang Wei authored
      When MAC is not support PMT, driver will check PHY's WoL capability
      and set device wakeup capability in stmmac_init_phy(). We can enable
      the WoL through ethtool, the driver would enable the device wake up
      flag. Now the device_may_wakeup() return true.
      
      But if there is a way which enable the PHY's WoL capability derectly,
      like in BIOS. The driver would not know the enable thing and would not
      set the device wake up flag. The phy_suspend may failed like this:
      
      [   32.409063] PM: dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x50 returns -16
      [   32.409065] PM: Device stmmac-1:00 failed to suspend: error -16
      [   32.409067] PM: Some devices failed to suspend, or early wake event detected
      
      Add to set the device wakeup enable flag according to the get_wol
      function result in PHY can fix the error in this scene.
      
      v2: add a Fixes tag.
      
      Fixes: 1d8e5b0f ("net: stmmac: Support WOL with phy")
      Signed-off-by: default avatarRongguang Wei <weirongguang@kylinos.cn>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a9334b70
  5. 03 Mar, 2023 2 commits
    • Liu Jian's avatar
      bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() · d900f3d2
      Liu Jian authored
      When the buffer length of the recvmsg system call is 0, we got the
      flollowing soft lockup problem:
      
      watchdog: BUG: soft lockup - CPU#3 stuck for 27s! [a.out:6149]
      CPU: 3 PID: 6149 Comm: a.out Kdump: loaded Not tainted 6.2.0+ #30
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:remove_wait_queue+0xb/0xc0
      Code: 5e 41 5f c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 <41> 56 41 55 41 54 55 48 89 fd 53 48 89 f3 4c 8d 6b 18 4c 8d 73 20
      RSP: 0018:ffff88811b5978b8 EFLAGS: 00000246
      RAX: 0000000000000000 RBX: ffff88811a7d3780 RCX: ffffffffb7a4d768
      RDX: dffffc0000000000 RSI: ffff88811b597908 RDI: ffff888115408040
      RBP: 1ffff110236b2f1b R08: 0000000000000000 R09: ffff88811a7d37e7
      R10: ffffed10234fa6fc R11: 0000000000000001 R12: ffff88811179b800
      R13: 0000000000000001 R14: ffff88811a7d38a8 R15: ffff88811a7d37e0
      FS:  00007f6fb5398740(0000) GS:ffff888237180000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 000000010b6ba002 CR4: 0000000000370ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       tcp_msg_wait_data+0x279/0x2f0
       tcp_bpf_recvmsg_parser+0x3c6/0x490
       inet_recvmsg+0x280/0x290
       sock_recvmsg+0xfc/0x120
       ____sys_recvmsg+0x160/0x3d0
       ___sys_recvmsg+0xf0/0x180
       __sys_recvmsg+0xea/0x1a0
       do_syscall_64+0x3f/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      
      The logic in tcp_bpf_recvmsg_parser is as follows:
      
      msg_bytes_ready:
      	copied = sk_msg_recvmsg(sk, psock, msg, len, flags);
      	if (!copied) {
      		wait data;
      		goto msg_bytes_ready;
      	}
      
      In this case, "copied" always is 0, the infinite loop occurs.
      
      According to the Linux system call man page, 0 should be returned in this
      case. Therefore, in tcp_bpf_recvmsg_parser(), if the length is 0, directly
      return. Also modify several other functions with the same problem.
      
      Fixes: 1f5be6b3 ("udp: Implement udp_bpf_recvmsg() for sockmap")
      Fixes: 9825d866 ("af_unix: Implement unix_dgram_bpf_recvmsg()")
      Fixes: c5d2177a ("bpf, sockmap: Fix race in ingress receive verdict with redirect to self")
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: default avatarLiu Jian <liujian56@huawei.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Cc: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20230303080946.1146638-1-liujian56@huawei.com
      d900f3d2
    • David S. Miller's avatar
      Merge branch 'nfp-ipsec-csum' · 52812526
      David S. Miller authored
      Simon Horman says:
      
      ====================
      nfp: fix incorrect IPsec checksum handling
      
      this short series resolves two problems with IPsec checksum handling
      in the nfp driver.
      
      * PATCH 1/3, 2/3: Correct setting of checksum flags.
        One patch for each of the nfd3 and nfdk datapaths.
      
      * Patch 3/3: Correct configuration of NETIF_F_CSUM_MASK
        so that the stack does not unecessarily calculate csums for
        IPsec offload packets.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      52812526