1. 20 Jan, 2021 20 commits
  2. 19 Jan, 2021 17 commits
    • Tariq Toukan's avatar
      net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled · a3eb4e9d
      Tariq Toukan authored
      With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be
      logically done when RXCSUM offload is off.
      
      Fixes: 14136564 ("net: Add TLS RX offload feature")
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Reviewed-by: default avatarBoris Pismenny <borisp@nvidia.com>
      Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a3eb4e9d
    • Jakub Kicinski's avatar
      Merge branch 'ipv4-ensure-ecn-bits-don-t-influence-source-address-validation' · 2565ff4e
      Jakub Kicinski authored
      Guillaume Nault says:
      
      ====================
      ipv4: Ensure ECN bits don't influence source address validation
      
      Functions that end up calling fib_table_lookup() should clear the ECN
      bits from the TOS, otherwise ECT(0) and ECT(1) packets can be treated
      differently.
      
      Most functions already clear the ECN bits, but there are a few cases
      where this is not done. This series only fixes the ones related to
      source address validation.
      ====================
      
      Link: https://lore.kernel.org/r/cover.1610790904.git.gnault@redhat.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2565ff4e
    • Guillaume Nault's avatar
      netfilter: rpfilter: mask ecn bits before fib lookup · 2e5a6266
      Guillaume Nault authored
      RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
      treats Not-ECT or ECT(1) packets in a different way than those with
      ECT(0) or CE.
      
      Reproducer:
      
        Create two netns, connected with a veth:
        $ ip netns add ns0
        $ ip netns add ns1
        $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
        $ ip -netns ns0 link set dev veth01 up
        $ ip -netns ns1 link set dev veth10 up
        $ ip -netns ns0 address add 192.0.2.10/32 dev veth01
        $ ip -netns ns1 address add 192.0.2.11/32 dev veth10
      
        Add a route to ns1 in ns0:
        $ ip -netns ns0 route add 192.0.2.11/32 dev veth01
      
        In ns1, only packets with TOS 4 can be routed to ns0:
        $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10
      
        Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
        is 4:
        $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
          ... 0% packet loss ...
      
        Now use iptable's rpfilter module in ns1:
        $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP
      
        Not-ECT and ECT(1) packets still pass:
        $ ip netns exec ns0 ping -Q 4 192.0.2.11   # TOS 4, Not-ECT
          ... 0% packet loss ...
        $ ip netns exec ns0 ping -Q 5 192.0.2.11   # TOS 4, ECT(1)
          ... 0% packet loss ...
      
        But ECT(0) and ECN packets are dropped:
        $ ip netns exec ns0 ping -Q 6 192.0.2.11   # TOS 4, ECT(0)
          ... 100% packet loss ...
        $ ip netns exec ns0 ping -Q 7 192.0.2.11   # TOS 4, CE
          ... 100% packet loss ...
      
      After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.
      
      Fixes: 8f97339d ("netfilter: add ipv4 reverse path filter match")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      2e5a6266
    • Guillaume Nault's avatar
      udp: mask TOS bits in udp_v4_early_demux() · 8d2b51b0
      Guillaume Nault authored
      udp_v4_early_demux() is the only function that calls
      ip_mc_validate_source() with a TOS that hasn't been masked with
      IPTOS_RT_MASK.
      
      This results in different behaviours for incoming multicast UDPv4
      packets, depending on if ip_mc_validate_source() is called from the
      early-demux path (udp_v4_early_demux) or from the regular input path
      (ip_route_input_noref).
      
      ECN would normally not be used with UDP multicast packets, so the
      practical consequences should be limited on that side. However,
      IPTOS_RT_MASK is used to also masks the TOS' high order bits, to align
      with the non-early-demux path behaviour.
      
      Reproducer:
      
        Setup two netns, connected with veth:
        $ ip netns add ns0
        $ ip netns add ns1
        $ ip -netns ns0 link set dev lo up
        $ ip -netns ns1 link set dev lo up
        $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
        $ ip -netns ns0 link set dev veth01 up
        $ ip -netns ns1 link set dev veth10 up
        $ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01
        $ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10
      
        In ns0, add route to multicast address 224.0.2.0/24 using source
        address 198.51.100.10:
        $ ip -netns ns0 address add 198.51.100.10/32 dev lo
        $ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10
      
        In ns1, define route to 198.51.100.10, only for packets with TOS 4:
        $ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10
      
        Also activate rp_filter in ns1, so that incoming packets not matching
        the above route get dropped:
        $ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1
      
        Now try to receive packets on 224.0.2.11:
        $ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof -
      
        In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is,
        tos 6 for socat):
        $ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
      
        The "test0" message is properly received by socat in ns1, because
        early-demux has no cached dst to use, so source address validation
        is done by ip_route_input_mc(), which receives a TOS that has the
        ECN bits masked.
      
        Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0):
        $ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
      
        The "test1" message isn't received by socat in ns1, because, now,
        early-demux has a cached dst to use and calls ip_mc_validate_source()
        immediately, without masking the ECN bits.
      
      Fixes: bc044e8d ("udp: perform source validation for mcast early demux")
      Signed-off-by: default avatarGuillaume Nault <gnault@redhat.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8d2b51b0
    • Maxim Mikityanskiy's avatar
      xsk: Clear pool even for inactive queues · b425e24a
      Maxim Mikityanskiy authored
      The number of queues can change by other means, rather than ethtool. For
      example, attaching an mqprio qdisc with num_tc > 1 leads to creating
      multiple sets of TX queues, which may be then destroyed when mqprio is
      deleted. If an AF_XDP socket is created while mqprio is active,
      dev->_tx[queue_id].pool will be filled, but then real_num_tx_queues may
      decrease with deletion of mqprio, which will mean that the pool won't be
      NULLed, and a further increase of the number of TX queues may expose a
      dangling pointer.
      
      To avoid any potential misbehavior, this commit clears pool for RX and
      TX queues, regardless of real_num_*_queues, still taking into
      consideration num_*_queues to avoid overflows.
      
      Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
      Fixes: a41b4f3c ("xsk: simplify xdp_clear_umem_at_qid implementation")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
      Link: https://lore.kernel.org/bpf/20210118160333.333439-1-maximmi@mellanox.com
      b425e24a
    • Linus Torvalds's avatar
      Merge tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block · 45dfb8a5
      Linus Torvalds authored
      Pull task_work fix from Jens Axboe:
       "The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
        task_work run we had in get_signal().
      
        This caused a regression for some setups, since we're relying on eg
        ____fput() being run to close and release, for example, a pipe and
        wake the other end.
      
        For 5.11, I prefer the simple solution of just reinstating the
        unconditional run, even if it conceptually doesn't make much sense -
        if you need that kind of guarantee, you should be using TWA_SIGNAL
        instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would
        ensure that other potential gotchas/assumptions for task_work don't
        regress for 5.11.
      
        We're looking into further simplifying the task_work notifications for
        5.12 which would resolve that too"
      
      * tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
        task_work: unconditionally run task_work from get_signal()
      45dfb8a5
    • Mircea Cirjaliu's avatar
      bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback · 301a33d5
      Mircea Cirjaliu authored
      I assume this was obtained by copy/paste. Point it to bpf_map_peek_elem()
      instead of bpf_map_pop_elem(). In practice it may have been less likely
      hit when under JIT given shielded via 84430d42 ("bpf, verifier: avoid
      retpoline for map push/pop/peek operation").
      
      Fixes: f1a2e44a ("bpf: add queue and stack maps")
      Signed-off-by: default avatarMircea Cirjaliu <mcirjaliu@bitdefender.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Cc: Mauricio Vasquez <mauriciovasquezbernal@gmail.com>
      Link: https://lore.kernel.org/bpf/AM7PR02MB6082663DFDCCE8DA7A6DD6B1BBA30@AM7PR02MB6082.eurprd02.prod.outlook.com
      301a33d5
    • Linus Torvalds's avatar
      Merge tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · f419f031
      Linus Torvalds authored
      Pull nfsd fixes from Chuck Lever:
      
       - Avoid exposing parent of root directory in NFSv3 READDIRPLUS results
      
       - Fix a tracepoint change that went in the initial 5.11 merge
      
      * tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        SUNRPC: Move the svc_xdr_recvfrom tracepoint again
        nfsd4: readdirplus shouldn't return parent of export
      f419f031
    • Linus Torvalds's avatar
      Merge tag 'hyperv-fixes-signed-20210119' of... · 28df8580
      Linus Torvalds authored
      Merge tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
      
      Pull hyperv fix from Wei Liu:
       "One patch from Dexuan to fix clockevent initialization"
      
      * tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
        x86/hyperv: Initialize clockevents after LAPIC is initialized
      28df8580
    • Jakub Kicinski's avatar
      Merge branch 'sh_eth-fix-reboot-crash' · f7b9820d
      Jakub Kicinski authored
      Geert Uytterhoeven says:
      
      ====================
      sh_eth: Fix reboot crash
      
      This patch fixes a regression v5.11-rc1, where rebooting while a sh_eth
      device is not opened will cause a crash.
      
      Changes compared to v1:
        - Export mdiobb_{read,write}(),
        - Call mdiobb_{read,write}() now they are exported,
        - Use mii_bus.parent to avoid bb_info.dev copy,
        - Drop RFC state.
      
      Alternatively, mdio-bitbang could provide Runtime PM-aware wrappers
      itself, and use them either manually (through a new parameter to
      alloc_mdio_bitbang(), or a new alloc_mdio_bitbang_*() function), or
      automatically (e.g. if pm_runtime_enabled() returns true).  Note that
      the latter requires a "struct device *" parameter to operate on.
      Currently there are only two drivers that call alloc_mdio_bitbang() and
      use Runtime PM: the Renesas sh_eth and ravb drivers.  This series fixes
      the former, while the latter is not affected (it keeps the device
      powered all the time between driver probe and driver unbind, and
      changing that seems to be non-trivial).
      ====================
      
      Link: https://lore.kernel.org/r/20210118150656.796584-1-geert+renesas@glider.beSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      f7b9820d
    • Geert Uytterhoeven's avatar
      sh_eth: Make PHY access aware of Runtime PM to fix reboot crash · 02cae02a
      Geert Uytterhoeven authored
      Wolfram reports that his R-Car H2-based Lager board can no longer be
      rebooted in v5.11-rc1, as it crashes with an imprecise external abort.
      The issue can be reproduced on other boards (e.g. Koelsch with R-Car
      M2-W) too, if CONFIG_IP_PNP is disabled, and the Ethernet interface is
      down at reboot time:
      
          Unhandled fault: imprecise external abort (0x1406) at 0x00000000
          pgd = (ptrval)
          [00000000] *pgd=422b6835, *pte=00000000, *ppte=00000000
          Internal error: : 1406 [#1] ARM
          Modules linked in:
          CPU: 0 PID: 1105 Comm: init Tainted: G        W         5.10.0-rc1-00402-ge2f016cf #1048
          Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
          PC is at sh_mdio_ctrl+0x44/0x60
          LR is at sh_mmd_ctrl+0x20/0x24
          ...
          Backtrace:
          [<c0451f30>] (sh_mdio_ctrl) from [<c0451fd4>] (sh_mmd_ctrl+0x20/0x24)
           r7:0000001f r6:00000020 r5:00000002 r4:c22a1dc4
          [<c0451fb4>] (sh_mmd_ctrl) from [<c044fc18>] (mdiobb_cmd+0x38/0xa8)
          [<c044fbe0>] (mdiobb_cmd) from [<c044feb8>] (mdiobb_read+0x58/0xdc)
           r9:c229f844 r8:c0c329dc r7:c221e000 r6:00000001 r5:c22a1dc4 r4:00000001
          [<c044fe60>] (mdiobb_read) from [<c044c854>] (__mdiobus_read+0x74/0xe0)
           r7:0000001f r6:00000001 r5:c221e000 r4:c221e000
          [<c044c7e0>] (__mdiobus_read) from [<c044c9d8>] (mdiobus_read+0x40/0x54)
           r7:0000001f r6:00000001 r5:c221e000 r4:c221e458
          [<c044c998>] (mdiobus_read) from [<c044d678>] (phy_read+0x1c/0x20)
           r7:ffffe000 r6:c221e470 r5:00000200 r4:c229f800
          [<c044d65c>] (phy_read) from [<c044d94c>] (kszphy_config_intr+0x44/0x80)
          [<c044d908>] (kszphy_config_intr) from [<c044694c>] (phy_disable_interrupts+0x44/0x50)
           r5:c229f800 r4:c229f800
          [<c0446908>] (phy_disable_interrupts) from [<c0449370>] (phy_shutdown+0x18/0x1c)
           r5:c229f800 r4:c229f804
          [<c0449358>] (phy_shutdown) from [<c040066c>] (device_shutdown+0x168/0x1f8)
          [<c0400504>] (device_shutdown) from [<c013de44>] (kernel_restart_prepare+0x3c/0x48)
           r9:c22d2000 r8:c0100264 r7:c0b0d034 r6:00000000 r5:4321fedc r4:00000000
          [<c013de08>] (kernel_restart_prepare) from [<c013dee0>] (kernel_restart+0x1c/0x60)
          [<c013dec4>] (kernel_restart) from [<c013e1d8>] (__do_sys_reboot+0x168/0x208)
           r5:4321fedc r4:01234567
          [<c013e070>] (__do_sys_reboot) from [<c013e2e8>] (sys_reboot+0x18/0x1c)
           r7:00000058 r6:00000000 r5:00000000 r4:00000000
          [<c013e2d0>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
      
      As of commit e2f016cf ("net: phy: add a shutdown procedure"),
      system reboot calls phy_disable_interrupts() during shutdown.  As this
      happens unconditionally, the PHY registers may be accessed while the
      device is suspended, causing undefined behavior, which may crash the
      system.
      
      Fix this by wrapping the PHY bitbang accessors in the sh_eth driver by
      wrappers that take care of Runtime PM, to resume the device when needed.
      Reported-by: default avatarWolfram Sang <wsa+renesas@sang-engineering.com>
      Suggested-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Tested-by: default avatarWolfram Sang <wsa+renesas@sang-engineering.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: default avatarAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      02cae02a
    • Geert Uytterhoeven's avatar
      mdio-bitbang: Export mdiobb_{read,write}() · 8eed01b5
      Geert Uytterhoeven authored
      Export mdiobb_read() and mdiobb_write(), so Ethernet controller drivers
      can call them from their MDIO read/write wrappers.
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Tested-by: default avatarWolfram Sang <wsa+renesas@sang-engineering.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      8eed01b5
    • Oleksandr Mazur's avatar
      net: core: devlink: use right genl user_ptr when handling port param get/set · 7e238de8
      Oleksandr Mazur authored
      Fix incorrect user_ptr dereferencing when handling port param get/set:
      
          idx [0] stores the 'struct devlink' pointer;
          idx [1] stores the 'struct devlink_port' pointer;
      
      Fixes: 637989b5 ("devlink: Always use user_ptr[0] for devlink and simplify post_doit")
      CC: Parav Pandit <parav@mellanox.com>
      Signed-off-by: default avatarOleksandr Mazur <oleksandr.mazur@plvision.eu>
      Signed-off-by: default avatarVadym Kochan <vadym.kochan@plvision.eu>
      Link: https://lore.kernel.org/r/20210119085333.16833-1-vadym.kochan@plvision.euSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7e238de8
    • Enke Chen's avatar
      tcp: fix TCP_USER_TIMEOUT with zero window · 9d9b1ee0
      Enke Chen authored
      The TCP session does not terminate with TCP_USER_TIMEOUT when data
      remain untransmitted due to zero window.
      
      The number of unanswered zero-window probes (tcp_probes_out) is
      reset to zero with incoming acks irrespective of the window size,
      as described in tcp_probe_timer():
      
          RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
          as long as the receiver continues to respond probes. We support
          this by default and reset icsk_probes_out with incoming ACKs.
      
      This counter, however, is the wrong one to be used in calculating the
      duration that the window remains closed and data remain untransmitted.
      Thanks to Jonathan Maxwell <jmaxwell37@gmail.com> for diagnosing the
      actual issue.
      
      In this patch a new timestamp is introduced for the socket in order to
      track the elapsed time for the zero-window probes that have not been
      answered with any non-zero window ack.
      
      Fixes: 9721e709 ("tcp: simplify window probe aborting on USER_TIMEOUT")
      Reported-by: default avatarWilliam McCall <william.mccall@gmail.com>
      Co-developed-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarEnke Chen <enchen@paloaltonetworks.com>
      Reviewed-by: default avatarYuchung Cheng <ycheng@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20210115223058.GA39267@localhost.localdomainSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9d9b1ee0
    • Jakub Kicinski's avatar
      Merge branch 'ipv6-fixes-for-the-multicast-routes' · b889c7c8
      Jakub Kicinski authored
      Matteo Croce says:
      
      ====================
      ipv6: fixes for the multicast routes
      
      Fix two wrong flags in the IPv6 multicast routes created
      by the autoconf code.
      ====================
      
      Link: https://lore.kernel.org/r/20210115184209.78611-1-mcroce@linux.microsoft.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      b889c7c8
    • Matteo Croce's avatar
      ipv6: set multicast flag on the multicast route · ceed9038
      Matteo Croce authored
      The multicast route ff00::/8 is created with type RTN_UNICAST:
      
        $ ip -6 -d route
        unicast ::1 dev lo proto kernel scope global metric 256 pref medium
        unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
        unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium
      
      Set the type to RTN_MULTICAST which is more appropriate.
      
      Fixes: e8478e80 ("net/ipv6: Save route type in rt6_info")
      Signed-off-by: default avatarMatteo Croce <mcroce@microsoft.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      ceed9038
    • Matteo Croce's avatar
      ipv6: create multicast route with RTPROT_KERNEL · a826b043
      Matteo Croce authored
      The ff00::/8 multicast route is created without specifying the fc_protocol
      field, so the default RTPROT_BOOT value is used:
      
        $ ip -6 -d route
        unicast ::1 dev lo proto kernel scope global metric 256 pref medium
        unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
        unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium
      
      As the documentation says, this value identifies routes installed during
      boot, but the route is created when interface is set up.
      Change the value to RTPROT_KERNEL which is a better value.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: default avatarMatteo Croce <mcroce@microsoft.com>
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      a826b043
  3. 18 Jan, 2021 3 commits