1. 10 Oct, 2024 9 commits
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: do not remove closing subflows · db0a37b7
      Matthieu Baerts (NGI0) authored
      In a previous fix, the in-kernel path-manager has been modified not to
      retrigger the removal of a subflow if it was already closed, e.g. when
      the initial subflow is removed, but kept in the subflows list.
      
      To be complete, this fix should also skip the subflows that are in any
      closing state: mptcp_close_ssk() will initiate the closure, but the
      switch to the TCP_CLOSE state depends on the other peer.
      
      Fixes: 58e1b66b ("mptcp: pm: do not remove already closed subflows")
      Cc: stable@vger.kernel.org
      Suggested-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-4-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      db0a37b7
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: fallback when MPTCP opts are dropped after 1st data · 119d51e2
      Matthieu Baerts (NGI0) authored
      As reported by Christoph [1], before this patch, an MPTCP connection was
      wrongly reset when a host received a first data packet with MPTCP
      options after the 3wHS, but got the next ones without.
      
      According to the MPTCP v1 specs [2], a fallback should happen in this
      case, because the host didn't receive a DATA_ACK from the other peer,
      nor receive data for more than the initial window which implies a
      DATA_ACK being received by the other peer.
      
      The patch here re-uses the same logic as the one used in other places:
      by looking at allow_infinite_fallback, which is disabled at the creation
      of an additional subflow. It's not looking at the first DATA_ACK (or
      implying one received from the other side) as suggested by the RFC, but
      it is in continuation with what was already done, which is safer, and it
      fixes the reported issue. The next step, looking at this first DATA_ACK,
      is tracked in [4].
      
      This patch has been validated using the following Packetdrill script:
      
         0 socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
        +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
        +0 bind(3, ..., ...) = 0
        +0 listen(3, 1) = 0
      
        // 3WHS is OK
        +0.0 < S  0:0(0)       win 65535  <mss 1460, sackOK, nop, nop, nop, wscale 6, mpcapable v1 flags[flag_h] nokey>
        +0.0 > S. 0:0(0) ack 1            <mss 1460, nop, nop, sackOK, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey]>
        +0.1 <  . 1:1(0) ack 1 win 2048                                              <mpcapable v1 flags[flag_h] key[ckey=2, skey]>
        +0 accept(3, ..., ...) = 4
      
        // Data from the client with valid MPTCP options (no DATA_ACK: normal)
        +0.1 < P. 1:501(500) ack 1 win 2048 <mpcapable v1 flags[flag_h] key[skey, ckey] mpcdatalen 500, nop, nop>
        // From here, the MPTCP options will be dropped by a middlebox
        +0.0 >  . 1:1(0)     ack 501        <dss dack8=501 dll=0 nocs>
      
        +0.1 read(4, ..., 500) = 500
        +0   write(4, ..., 100) = 100
      
        // The server replies with data, still thinking MPTCP is being used
        +0.0 > P. 1:101(100)   ack 501          <dss dack8=501 dsn8=1 ssn=1 dll=100 nocs, nop, nop>
        // But the client already did a fallback to TCP, because the two previous packets have been received without MPTCP options
        +0.1 <  . 501:501(0)   ack 101 win 2048
      
        +0.0 < P. 501:601(100) ack 101 win 2048
        // The server should fallback to TCP, not reset: it didn't get a DATA_ACK, nor data for more than the initial window
        +0.0 >  . 101:101(0)   ack 601
      
      Note that this script requires Packetdrill with MPTCP support, see [3].
      
      Fixes: dea2b1ea ("mptcp: do not reset MP_CAPABLE subflow on mapping errors")
      Cc: stable@vger.kernel.org
      Reported-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/518 [1]
      Link: https://datatracker.ietf.org/doc/html/rfc8684#name-fallback [2]
      Link: https://github.com/multipath-tcp/packetdrill [3]
      Link: https://github.com/multipath-tcp/mptcp_net-next/issues/519 [4]
      Reviewed-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-3-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      119d51e2
    • Paolo Abeni's avatar
      tcp: fix mptcp DSS corruption due to large pmtu xmit · 4dabcdf5
      Paolo Abeni authored
      Syzkaller was able to trigger a DSS corruption:
      
        TCP: request_sock_subflow_v4: Possible SYN flooding on port [::]:20002. Sending cookies.
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 5227 at net/mptcp/protocol.c:695 __mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
        Modules linked in:
        CPU: 0 UID: 0 PID: 5227 Comm: syz-executor350 Not tainted 6.11.0-syzkaller-08829-gaf9c191a #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
        RIP: 0010:__mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695
        Code: 0f b6 dc 31 ff 89 de e8 b5 dd ea f5 89 d8 48 81 c4 50 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 98 da ea f5 90 <0f> 0b 90 e9 47 ff ff ff e8 8a da ea f5 90 0f 0b 90 e9 99 e0 ff ff
        RSP: 0018:ffffc90000006db8 EFLAGS: 00010246
        RAX: ffffffff8ba9df18 RBX: 00000000000055f0 RCX: ffff888030023c00
        RDX: 0000000000000100 RSI: 00000000000081e5 RDI: 00000000000055f0
        RBP: 1ffff110062bf1ae R08: ffffffff8ba9cf12 R09: 1ffff110062bf1b8
        R10: dffffc0000000000 R11: ffffed10062bf1b9 R12: 0000000000000000
        R13: dffffc0000000000 R14: 00000000700cec61 R15: 00000000000081e5
        FS:  000055556679c380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000020287000 CR3: 0000000077892000 CR4: 00000000003506f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <IRQ>
         move_skbs_to_msk net/mptcp/protocol.c:811 [inline]
         mptcp_data_ready+0x29c/0xa90 net/mptcp/protocol.c:854
         subflow_data_ready+0x34a/0x920 net/mptcp/subflow.c:1490
         tcp_data_queue+0x20fd/0x76c0 net/ipv4/tcp_input.c:5283
         tcp_rcv_established+0xfba/0x2020 net/ipv4/tcp_input.c:6237
         tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
         tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2350
         ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205
         ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233
         NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
         NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314
         __netif_receive_skb_one_core net/core/dev.c:5662 [inline]
         __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775
         process_backlog+0x662/0x15b0 net/core/dev.c:6107
         __napi_poll+0xcb/0x490 net/core/dev.c:6771
         napi_poll net/core/dev.c:6840 [inline]
         net_rx_action+0x89b/0x1240 net/core/dev.c:6962
         handle_softirqs+0x2c5/0x980 kernel/softirq.c:554
         do_softirq+0x11b/0x1e0 kernel/softirq.c:455
         </IRQ>
         <TASK>
         __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
         local_bh_enable include/linux/bottom_half.h:33 [inline]
         rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
         __dev_queue_xmit+0x1764/0x3e80 net/core/dev.c:4451
         dev_queue_xmit include/linux/netdevice.h:3094 [inline]
         neigh_hh_output include/net/neighbour.h:526 [inline]
         neigh_output include/net/neighbour.h:540 [inline]
         ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
         ip_local_out net/ipv4/ip_output.c:130 [inline]
         __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:536
         __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466
         tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline]
         tcp_mtu_probe net/ipv4/tcp_output.c:2547 [inline]
         tcp_write_xmit+0x641d/0x6bf0 net/ipv4/tcp_output.c:2752
         __tcp_push_pending_frames+0x9b/0x360 net/ipv4/tcp_output.c:3015
         tcp_push_pending_frames include/net/tcp.h:2107 [inline]
         tcp_data_snd_check net/ipv4/tcp_input.c:5714 [inline]
         tcp_rcv_established+0x1026/0x2020 net/ipv4/tcp_input.c:6239
         tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915
         sk_backlog_rcv include/net/sock.h:1113 [inline]
         __release_sock+0x214/0x350 net/core/sock.c:3072
         release_sock+0x61/0x1f0 net/core/sock.c:3626
         mptcp_push_release net/mptcp/protocol.c:1486 [inline]
         __mptcp_push_pending+0x6b5/0x9f0 net/mptcp/protocol.c:1625
         mptcp_sendmsg+0x10bb/0x1b10 net/mptcp/protocol.c:1903
         sock_sendmsg_nosec net/socket.c:730 [inline]
         __sock_sendmsg+0x1a6/0x270 net/socket.c:745
         ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2603
         ___sys_sendmsg net/socket.c:2657 [inline]
         __sys_sendmsg+0x2aa/0x390 net/socket.c:2686
         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
         do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
         entry_SYSCALL_64_after_hwframe+0x77/0x7f
        RIP: 0033:0x7fb06e9317f9
        Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
        RSP: 002b:00007ffe2cfd4f98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
        RAX: ffffffffffffffda RBX: 00007fb06e97f468 RCX: 00007fb06e9317f9
        RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000005
        RBP: 00007fb06e97f446 R08: 0000555500000000 R09: 0000555500000000
        R10: 0000555500000000 R11: 0000000000000246 R12: 00007fb06e97f406
        R13: 0000000000000001 R14: 00007ffe2cfd4fe0 R15: 0000000000000003
         </TASK>
      
      Additionally syzkaller provided a nice reproducer. The repro enables
      pmtu on the loopback device, leading to tcp_mtu_probe() generating
      very large probe packets.
      
      tcp_can_coalesce_send_queue_head() currently does not check for
      mptcp-level invariants, and allowed the creation of cross-DSS probes,
      leading to the mentioned corruption.
      
      Address the issue teaching tcp_can_coalesce_send_queue_head() about
      mptcp using the tcp_skb_can_collapse(), also reducing the code
      duplication.
      
      Fixes: 85712484 ("tcp: coalesce/collapse must respect MPTCP extensions")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+d1bff73460e33101f0e7@syzkaller.appspotmail.com
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/513Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Acked-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-2-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      4dabcdf5
    • Paolo Abeni's avatar
      mptcp: handle consistently DSS corruption · e32d262c
      Paolo Abeni authored
      Bugged peer implementation can send corrupted DSS options, consistently
      hitting a few warning in the data path. Use DEBUG_NET assertions, to
      avoid the splat on some builds and handle consistently the error, dumping
      related MIBs and performing fallback and/or reset according to the
      subflow type.
      
      Fixes: 6771bfd9 ("mptcp: update mptcp ack sequence from work queue")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Signed-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-1-c6fb8e93e551@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      e32d262c
    • Breno Leitao's avatar
      net: netconsole: fix wrong warning · d94785bb
      Breno Leitao authored
      A warning is triggered when there is insufficient space in the buffer
      for userdata. However, this is not an issue since userdata will be sent
      in the next iteration.
      
      Current warning message:
      
          ------------[ cut here ]------------
           WARNING: CPU: 13 PID: 3013042 at drivers/net/netconsole.c:1122 write_ext_msg+0x3b6/0x3d0
            ? write_ext_msg+0x3b6/0x3d0
            console_flush_all+0x1e9/0x330
      
      The code incorrectly issues a warning when this_chunk is zero, which is
      a valid scenario. The warning should only be triggered when this_chunk
      is negative.
      
      Fixes: 1ec9daf9 ("net: netconsole: append userdata to fragmented netconsole messages")
      Signed-off-by: default avatarBreno Leitao <leitao@debian.org>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241008094325.896208-1-leitao@debian.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      d94785bb
    • Vladimir Oltean's avatar
      net: dsa: refuse cross-chip mirroring operations · 8c924369
      Vladimir Oltean authored
      In case of a tc mirred action from one switch to another, the behavior
      is not correct. We simply tell the source switch driver to program a
      mirroring entry towards mirror->to_local_port = to_dp->index, but it is
      not even guaranteed that the to_dp belongs to the same switch as dp.
      
      For proper cross-chip support, we would need to go through the
      cross-chip notifier layer in switch.c, program the entry on cascade
      ports, and introduce new, explicit API for cross-chip mirroring, given
      that intermediary switches should have introspection into the DSA tags
      passed through the cascade port (and not just program a port mirror on
      the entire cascade port). None of that exists today.
      
      Reject what is not implemented so that user space is not misled into
      thinking it works.
      
      Fixes: f50f2127 ("net: dsa: Add plumbing for port mirroring")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Link: https://patch.msgid.link/20241008094320.3340980-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8c924369
    • Wei Fang's avatar
      net: fec: don't save PTP state if PTP is unsupported · 6be06307
      Wei Fang authored
      Some platforms (such as i.MX25 and i.MX27) do not support PTP, so on
      these platforms fec_ptp_init() is not called and the related members
      in fep are not initialized. However, fec_ptp_save_state() is called
      unconditionally, which causes the kernel to panic. Therefore, add a
      condition so that fec_ptp_save_state() is not called if PTP is not
      supported.
      
      Fixes: a1477dc8 ("net: fec: Restart PPS after link state change")
      Reported-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Closes: https://lore.kernel.org/lkml/353e41fe-6bb4-4ee9-9980-2da2a9c1c508@roeck-us.net/Signed-off-by: default avatarWei Fang <wei.fang@nxp.com>
      Reviewed-by: default avatarCsókás, Bence <csokas.bence@prolan.hu>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Tested-by: default avatarGuenter Roeck <linux@roeck-us.net>
      Link: https://patch.msgid.link/20241008061153.1977930-1-wei.fang@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      6be06307
    • Rosen Penev's avatar
      net: ibm: emac: mal: add dcr_unmap to _remove · 080ddc22
      Rosen Penev authored
      It's done in probe so it should be undone here.
      
      Fixes: 1d3bb996 ("Device tree aware EMAC driver")
      Signed-off-by: default avatarRosen Penev <rosenp@gmail.com>
      Reviewed-by: default avatarBreno Leitao <leitao@debian.org>
      Link: https://patch.msgid.link/20241008233050.9422-1-rosenp@gmail.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      080ddc22
    • Jacky Chou's avatar
      net: ftgmac100: fixed not check status from fixed phy · 70a0da8c
      Jacky Chou authored
      Add error handling from calling fixed_phy_register.
      It may return some error, therefore, need to check the status.
      
      And fixed_phy_register needs to bind a device node for mdio.
      Add the mac device node for fixed_phy_register function.
      This is a reference to this function, of_phy_register_fixed_link().
      
      Fixes: e24a6c87 ("net: ftgmac100: Get link speed and duplex for NC-SI")
      Signed-off-by: default avatarJacky Chou <jacky_chou@aspeedtech.com>
      Link: https://patch.msgid.link/20241007032435.787892-1-jacky_chou@aspeedtech.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      70a0da8c
  2. 09 Oct, 2024 6 commits
  3. 08 Oct, 2024 18 commits
  4. 07 Oct, 2024 4 commits
  5. 04 Oct, 2024 3 commits