1. 23 Dec, 2021 5 commits
    • r8152: fix the force speed doesn't work for RTL8156 · 45bf944e
      Hayes Wang authored
      It needs to set mdio force mode. Otherwise, link off always occurs when
      setting force speed.
      
      Fixes: 195aae32 ("r8152: support new chips")
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: bridge: fix ioctl old_deviceless bridge argument · d95a5620
      Remi Pommarel authored
      Commit 561d8352 ("bridge: use ndo_siocdevprivate") changed the
      source and destination arguments of copy_{to,from}_user in bridge's
      old_deviceless() from args[1] to uarg breaking SIOC{G,S}IFBR ioctls.
      
      Commit cbd7ad29 ("net: bridge: fix ioctl old_deviceless bridge
      argument") fixed only BRCTL_{ADD,DEL}_BRIDGES commands leaving
      BRCTL_GET_BRIDGES one untouched.
      
      This patch fixes BRCTL_GET_BRIDGES as well and has been tested with busybox's
      brctl.
      
      Example of broken brctl:
      $ brctl show
      bridge name     bridge id               STP enabled     interfaces
      brctl: can't get bridge name for index 0: No such device or address
      
      Example of fixed brctl:
      $ brctl show
      bridge name     bridge id               STP enabled     interfaces
      br0             8000.000000000000       no
      
      Fixes: 561d8352 ("bridge: use ndo_siocdevprivate")
      Signed-off-by: Remi Pommarel <repk@triplefau.lt>
      Reviewed-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Link: https://lore.kernel.org/all/20211223153139.7661-2-repk@triplefau.lt/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: stmmac: ptp: fix potentially overflowing expression · eccffcf4
      Xiaoliang Yang authored
      Convert the u32 variable to type u64 in a context where an expression of
      type u64 is required, to avoid potential overflow.
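As an illustration of the overflow class described above, here is a minimal user-space sketch. The helper names and the seconds-to-nanoseconds conversion are assumptions chosen for the example, not code from the stmmac driver, and the sketch assumes a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: both operands are 32 bits wide, so the product
 * is computed in 32 bits and wraps before it is widened to 64 bits. */
static uint64_t toy_sec_to_ns_wrapping(uint32_t sec)
{
    return sec * 1000000000u;
}

/* Hypothetical helper: casting one operand first makes the whole
 * multiplication happen in 64 bits, so nothing is lost. */
static uint64_t toy_sec_to_ns_widened(uint32_t sec)
{
    return (uint64_t)sec * 1000000000u;
}
```

The only difference between the two helpers is the cast, which is the kind of one-token change the commit message describes.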
      
      Fixes: e9e37200 ("net: stmmac: ptp: update tas basetime after ptp adjust")
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Link: https://lore.kernel.org/r/20211223073928.37371-1-xiaoliang.yang_1@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: tag_ocelot: use traffic class to map priority on injected header · ae2778a6
      Xiaoliang Yang authored
      For Ocelot switches, CPU-injected frames carry an injection header
      that specifies the QoS class of the packet and the DSA tag. Currently the
      QoS class is taken from the SKB priority. If a priority to traffic class
      mapping is configured on the netdevice (with mqprio, for example), it
      is not considered for CPU-injected headers. This patch aligns the QoS
      class with the priority to traffic class mapping if one exists.
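The selection rule described above can be sketched in user space as follows. The `toy_netdev` structure and `toy_qos_class()` are hypothetical stand-ins invented for this example; they are not the kernel's net_device or the tagger's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PRIO_SLOTS 16 /* assumed table size for this sketch */

/* Hypothetical stand-in for the bits of a netdevice that matter here. */
struct toy_netdev {
    uint8_t num_tc;                      /* 0 means no mapping configured */
    uint8_t prio_tc_map[TOY_PRIO_SLOTS]; /* priority -> traffic class */
};

/* Use the configured mapping when there is one; otherwise fall back to
 * the raw packet priority, which is the behaviour before the patch. */
static uint8_t toy_qos_class(const struct toy_netdev *dev, uint32_t priority)
{
    if (dev->num_tc)
        return dev->prio_tc_map[priority % TOY_PRIO_SLOTS];
    return (uint8_t)priority;
}
```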
      
      Fixes: 8dce89aa ("net: dsa: ocelot: add tagger for Ocelot/Felix switches")
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: Marouen Ghodhbane <marouen.ghodhbane@nxp.com>
      Link: https://lore.kernel.org/r/20211223072211.33130-1-xiaoliang.yang_1@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • veth: ensure skb entering GRO are not cloned. · 9695b7de
      Paolo Abeni authored
      After commit d3256efd ("veth: allow enabling NAPI even without XDP"),
      if GRO is enabled on a veth device and TSO is disabled on the peer
      device, TCP skbs will go through the NAPI callback. If there is no XDP
      program attached, the veth code does not perform any share check, and
      shared/cloned skbs could enter the GRO engine.
      
      Ignat reported a BUG triggered later on due to the above condition:
      
      [   53.970529][    C1] kernel BUG at net/core/skbuff.c:3574!
      [   53.981755][    C1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      [   53.982634][    C1] CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc5+ #25
      [   53.982634][    C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      [   53.982634][    C1] RIP: 0010:skb_shift+0x13ef/0x23b0
      [   53.982634][    C1] Code: ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0
      7f 08 84 c0 0f 85 41 0c 00 00 41 80 7f 02 00 4d 8d b5 d0 00 00 00 0f
      85 74 f5 ff ff <0f> 0b 4d 8d 77 20 be 04 00 00 00 4c 89 44 24 78 4c 89
      f7 4c 89 8c
      [   53.982634][    C1] RSP: 0018:ffff8881008f7008 EFLAGS: 00010246
      [   53.982634][    C1] RAX: 0000000000000000 RBX: ffff8881180b4c80 RCX: 0000000000000000
      [   53.982634][    C1] RDX: 0000000000000002 RSI: ffff8881180b4d3c RDI: ffff88810bc9cac2
      [   53.982634][    C1] RBP: ffff8881008f70b8 R08: ffff8881180b4cf4 R09: ffff8881180b4cf0
      [   53.982634][    C1] R10: ffffed1022999e5c R11: 0000000000000002 R12: 0000000000000590
      [   53.982634][    C1] R13: ffff88810f940c80 R14: ffff88810f940d50 R15: ffff88810bc9cac0
      [   53.982634][    C1] FS:  0000000000000000(0000) GS:ffff888235880000(0000) knlGS:0000000000000000
      [   53.982634][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   53.982634][    C1] CR2: 00007ff5f9b86680 CR3: 0000000108ce8004 CR4: 0000000000170ee0
      [   53.982634][    C1] Call Trace:
      [   53.982634][    C1]  <TASK>
      [   53.982634][    C1]  tcp_sacktag_walk+0xaba/0x18e0
      [   53.982634][    C1]  tcp_sacktag_write_queue+0xe7b/0x3460
      [   53.982634][    C1]  tcp_ack+0x2666/0x54b0
      [   53.982634][    C1]  tcp_rcv_established+0x4d9/0x20f0
      [   53.982634][    C1]  tcp_v4_do_rcv+0x551/0x810
      [   53.982634][    C1]  tcp_v4_rcv+0x22ed/0x2ed0
      [   53.982634][    C1]  ip_protocol_deliver_rcu+0x96/0xaf0
      [   53.982634][    C1]  ip_local_deliver_finish+0x1e0/0x2f0
      [   53.982634][    C1]  ip_sublist_rcv_finish+0x211/0x440
      [   53.982634][    C1]  ip_list_rcv_finish.constprop.0+0x424/0x660
      [   53.982634][    C1]  ip_list_rcv+0x2c8/0x410
      [   53.982634][    C1]  __netif_receive_skb_list_core+0x65c/0x910
      [   53.982634][    C1]  netif_receive_skb_list_internal+0x5f9/0xcb0
      [   53.982634][    C1]  napi_complete_done+0x188/0x6e0
      [   53.982634][    C1]  gro_cell_poll+0x10c/0x1d0
      [   53.982634][    C1]  __napi_poll+0xa1/0x530
      [   53.982634][    C1]  net_rx_action+0x567/0x1270
      [   53.982634][    C1]  __do_softirq+0x28a/0x9ba
      [   53.982634][    C1]  run_ksoftirqd+0x32/0x60
      [   53.982634][    C1]  smpboot_thread_fn+0x559/0x8c0
      [   53.982634][    C1]  kthread+0x3b9/0x490
      [   53.982634][    C1]  ret_from_fork+0x22/0x30
      [   53.982634][    C1]  </TASK>
      
      Address the issue by skipping the GRO stage for shared or cloned skbs.
      To reduce the chance of OoO, try to unclone the skbs before giving up.
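The decision described in the two sentences above can be modelled with a small user-space sketch. `toy_skb`, `toy_unclone()` and `toy_pick_rx_path()` are hypothetical names for this example only; the real driver works on struct sk_buff and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_rx_path { TOY_RX_GRO, TOY_RX_PLAIN };

/* Hypothetical model of the skb properties relevant to the check. */
struct toy_skb {
    int users;          /* more than 1 means the skb is shared */
    bool cloned;        /* data is shared with a clone */
    bool unclone_fails; /* simulates an allocation failure */
};

/* Try to get a private copy of the data; 0 on success, -1 on failure. */
static int toy_unclone(struct toy_skb *skb)
{
    if (!skb->cloned)
        return 0;
    if (skb->unclone_fails)
        return -1;
    skb->cloned = false;
    return 0;
}

/* Shared skbs, and cloned skbs that cannot be uncloned, skip GRO. */
static enum toy_rx_path toy_pick_rx_path(struct toy_skb *skb)
{
    if (skb->users > 1 || toy_unclone(skb) != 0)
        return TOY_RX_PLAIN;
    return TOY_RX_GRO;
}
```

Attempting the unclone before giving up keeps most packets on the GRO path, which is how the sketch reflects the stated goal of reducing out-of-order delivery.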
      
      v1 -> v2:
       - avoid skb_copy and fall back to netif_receive_skb (Eric)
      
      Reported-by: Ignat Korchagin <ignat@cloudflare.com>
      Fixes: d3256efd ("veth: allow enabling NAPI even without XDP")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Tested-by: Ignat Korchagin <ignat@cloudflare.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/b5f61c5602aab01bac8d711d8d1bfab0a4817db7.1640197544.git.pabeni@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 22 Dec, 2021 8 commits
  3. 21 Dec, 2021 6 commits
    • igb: fix deadlock caused by taking RTNL in RPM resume path · ac8c58f5
      Heiner Kallweit authored
      Recent net core changes caused an issue with a few Intel drivers
      (reportedly igb), where taking RTNL in the RPM resume path results in a
      deadlock. See [0] for a bug report. I don't think the core changes
      are wrong, but taking RTNL in the RPM resume path isn't needed.
      The Intel drivers are the only ones doing this. See [1] for a
      discussion of the issue. This patch changes the RPM resume path
      so that it does not take RTNL.
      
      [0] https://bugzilla.kernel.org/show_bug.cgi?id=215129
      [1] https://lore.kernel.org/netdev/20211125074949.5f897431@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/t/
      
      Fixes: bd869245 ("net: core: try to runtime-resume detached device in __dev_open")
      Fixes: f32a2137 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops")
      Tested-by: Martin Stolpe <martin.stolpe@gmail.com>
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20211220201844.2714498-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • gve: Correct order of processing device options · 1f06f7d9
      Jeroen de Borst authored
      The legacy raw addressing device option was processed before the
      new RDA queue format option.  This caused the supported features mask,
      which is provided only on the RDA queue format option, not to be set.
      
      This disabled jumbo-frame support when using raw addressing.
      
      Fixes: 255489f5 ("gve: Add a jumbo-frame device option")
      Signed-off-by: Jeroen de Borst <jeroendb@google.com>
      Link: https://lore.kernel.org/r/20211220192746.2900594-1-jeroendb@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: skip virtio_net_hdr_set_proto if protocol already set · 1ed1d592
      Willem de Bruijn authored
      virtio_net_hdr_set_proto infers skb->protocol from the virtio_net_hdr
      gso_type, to avoid packets getting dropped for lack of a proto type.
      
      Its protocol choice is a guess, especially in the case of UFO, where
      the single VIRTIO_NET_HDR_GSO_UDP label covers both UFOv4 and UFOv6.
      
      Skip this best effort if the field is already initialized, whether
      explicitly from userspace or implicitly based on an earlier call to
      dev_parse_header_protocol (which is more robust, but was introduced
      after this patch).
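A minimal user-space sketch of the "skip if already set" rule follows. Everything prefixed `toy_` or `TOY_` is an assumption made for this example, including the numeric GSO type values; only the two EtherType values (0x0800 for IPv4, 0x86DD for IPv6) are standard. Byte-order handling is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative GSO type labels for this sketch. */
enum { TOY_GSO_TCPV4 = 1, TOY_GSO_UDP = 3, TOY_GSO_TCPV6 = 4 };

#define TOY_ETH_P_IP   0x0800
#define TOY_ETH_P_IPV6 0x86DD

struct toy_pkt {
    uint16_t protocol; /* 0 means not yet initialized */
};

/* Guess a protocol from the GSO type, but only when none is set.
 * Returns 0 on success and -1 for an unknown GSO type. */
static int toy_set_proto(struct toy_pkt *pkt, uint8_t gso_type)
{
    if (pkt->protocol)
        return 0; /* already set: keep it, do not guess */

    switch (gso_type) {
    case TOY_GSO_TCPV4:
    case TOY_GSO_UDP: /* a guess: the UDP label also covers IPv6 */
        pkt->protocol = TOY_ETH_P_IP;
        return 0;
    case TOY_GSO_TCPV6:
        pkt->protocol = TOY_ETH_P_IPV6;
        return 0;
    default:
        return -1;
    }
}
```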
      
      Fixes: 9d2f67e4 ("net/packet: fix packet drop as of virtio gso")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211220145027.2784293-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: accept UFOv6 packages in virtio_net_hdr_to_skb · 7e5cced9
      Willem de Bruijn authored
      Skb with skb->protocol 0 at the time of virtio_net_hdr_to_skb may have
      a protocol inferred from virtio_net_hdr with virtio_net_hdr_set_proto.
      
      Unlike TCP, UDP does not have separate types for IPv4 and IPv6. Type
      VIRTIO_NET_HDR_GSO_UDP is guessed to be IPv4/UDP. As of the below
      commit, UFOv6 packets are dropped due to not matching the protocol as
      obtained from dev_parse_header_protocol.
      
      Invert the test to take that L2 protocol field as starting point and
      pass both UFOv4 and UFOv6 for VIRTIO_NET_HDR_GSO_UDP.
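The inverted test can be sketched as a predicate that starts from the L2 protocol and asks whether the GSO type is compatible with it. The `toy_` names and the numeric GSO type values are assumptions for this example; the EtherType values are the standard ones, and byte order is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative GSO type labels for this sketch. */
enum { TOY_GSO_TCPV4 = 1, TOY_GSO_UDP = 3, TOY_GSO_TCPV6 = 4 };

#define TOY_ETH_P_IP   0x0800
#define TOY_ETH_P_IPV6 0x86DD

/* Does the GSO type agree with the protocol taken from the L2 header?
 * The single UDP label is accepted for both IPv4 and IPv6. */
static bool toy_match_proto(uint16_t protocol, uint8_t gso_type)
{
    switch (gso_type) {
    case TOY_GSO_TCPV4:
        return protocol == TOY_ETH_P_IP;
    case TOY_GSO_TCPV6:
        return protocol == TOY_ETH_P_IPV6;
    case TOY_GSO_UDP:
        return protocol == TOY_ETH_P_IP || protocol == TOY_ETH_P_IPV6;
    default:
        return false;
    }
}
```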
      
      Fixes: 924a9bc3 ("net: check if protocol extracted by virtio_net_hdr_set_proto is correct")
      Link: https://lore.kernel.org/netdev/CABcq3pG9GRCYqFDBAJ48H1vpnnX=41u+MhQnayF1ztLH4WX0Fw@mail.gmail.com/
      Reported-by: Andrew Melnichenko <andrew@daynix.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211220144901.2784030-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • docs: networking: replace skb_hwtstamp_tx with skb_tstamp_tx · a9725e1d
      Willem de Bruijn authored
      Tiny doc fix. The hardware transmit function was called skb_tstamp_tx
      from its introduction in commit ac45f602 ("net: infrastructure for
      hardware time stamping") in the same series as this documentation.
      
      Fixes: cb9eff09 ("net: new user space API for time stamping of incoming and outgoing packets")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20211220144608.2783526-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • inet: fully convert sk->sk_rx_dst to RCU rules · 8f905c0e
      Eric Dumazet authored
      syzbot reported various issues around early demux,
      one of which is included in this changelog [1].
      
      sk->sk_rx_dst is using RCU protection without clearly
      documenting it.
      
      The following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
      do not follow standard RCU rules:
      
      [a]    dst_release(dst);
      [b]    sk->sk_rx_dst = NULL;
      
      They look wrong because a delete operation of RCU protected
      pointer is supposed to clear the pointer before
      the call_rcu()/synchronize_rcu() guarding actual memory freeing.
      
      In some cases indeed, dst could be freed before [b] is done.
      
      We could cheat by clearing sk_rx_dst before calling
      dst_release(), but this seems the right time to stick
      to standard RCU annotations and debugging facilities.
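The ordering rule discussed above (unpublish the pointer first, release the object afterwards) can be illustrated with a user-space sketch built on C11 atomics. This is only an analogy under stated assumptions: `toy_sock`, `toy_dst` and `toy_unpublish_rx_dst()` are hypothetical, and the sketch does not model RCU grace periods or the kernel's RCU accessors.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct toy_dst {
    int obsolete;
};

/* Hypothetical socket holding a published, shared pointer. */
struct toy_sock {
    _Atomic(struct toy_dst *) rx_dst;
};

/* Step 1: clear the published pointer and hand back the old value.
 * Step 2 (left to the caller): release the returned object, which in
 * the kernel would be deferred until readers are done.
 * New readers can no longer reach the object once step 1 completes. */
static struct toy_dst *toy_unpublish_rx_dst(struct toy_sock *sk)
{
    return atomic_exchange(&sk->rx_dst, (struct toy_dst *)NULL);
}
```

Releasing before clearing, as in sequence [a] then [b], leaves a window in which a reader can still load a pointer to an object that is already on its way to being freed.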
      
      [1]
      BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
      BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
      Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
      
      CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
       dst_check include/net/dst.h:470 [inline]
       tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
       ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
      RIP: 0033:0x7f5e972bfd57
      Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
      RSP: 002b:00007fff8a413210 EFLAGS: 00000283
      RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
      RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
      RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
      R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
      R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
       </TASK>
      
      Allocated by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
       ip_route_input_rcu net/ipv4/route.c:2470 [inline]
       ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
       ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Freed by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:235 [inline]
       slab_free_hook mm/slub.c:1723 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
       slab_free mm/slub.c:3513 [inline]
       kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
       dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
       rcu_do_batch kernel/rcu/tree.c:2506 [inline]
       rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:2985 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
       dst_release net/core/dst.c:177 [inline]
       dst_release+0x79/0xe0 net/core/dst.c:167
       tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2768
       release_sock+0x54/0x1b0 net/core/sock.c:3300
       tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
       inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x289/0x3c0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write+0x429/0x660 fs/read_write.c:503
       vfs_write+0x7cd/0xae0 fs/read_write.c:590
       ksys_write+0x1ee/0x250 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88807f1cb700
       which belongs to the cache ip_dst_cache of size 176
      The buggy address is located 58 bytes inside of
       176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
      The buggy address belongs to the page:
      page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
       prep_new_page mm/page_alloc.c:2418 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
       alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
       alloc_slab_page mm/slub.c:1793 [inline]
       allocate_slab mm/slub.c:1930 [inline]
       new_slab+0x32d/0x4a0 mm/slub.c:1993
       ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
       slab_alloc_node mm/slub.c:3200 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       __mkroute_output net/ipv4/route.c:2564 [inline]
       ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
       ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
       __ip_route_output_key include/net/route.h:126 [inline]
       ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
       ip_route_output_key include/net/route.h:142 [inline]
       geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
       geneve_xmit_skb drivers/net/geneve.c:899 [inline]
       geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
       __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
       netdev_start_xmit include/linux/netdevice.h:5008 [inline]
       xmit_one net/core/dev.c:3590 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
       __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1338 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
       free_unref_page_prepare mm/page_alloc.c:3309 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3388
       qlink_free mm/kasan/quarantine.c:146 [inline]
       qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
       __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
       __alloc_skb+0x215/0x340 net/core/skbuff.c:414
       alloc_skb include/linux/skbuff.h:1126 [inline]
       alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
       sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
       mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
       add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
       add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
       mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
       mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
       mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
      
      Memory state around the buggy address:
       ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
       ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
       ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 20 Dec, 2021 3 commits
  5. 18 Dec, 2021 13 commits
    • qlcnic: potential dereference null pointer of rx_queue->page_ring · 60ec7fcf
      Jiasheng Jiang authored
      The return value of kcalloc() needs to be checked to avoid
      dereferencing a NULL pointer if the allocation fails.
      Therefore, it might be better to change the return type of
      qlcnic_sriov_alloc_vlans() so that it returns -ENOMEM when the
      allocation fails and 0 otherwise.
      Also, qlcnic_sriov_set_guest_vlan_mode() and __qlcnic_pci_sriov_enable()
      should deal with the return value of qlcnic_sriov_alloc_vlans().
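The pattern described above (check the allocation, report the failure through the return value, and let callers act on it) looks roughly like this in user space. `toy_vf` and `toy_alloc_vlans()` are hypothetical names, and `calloc()` stands in for the kernel allocator; the zero-length guard is an addition made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-VF state holding a VLAN table. */
struct toy_vf {
    unsigned short *vlans;
    size_t num_vlans;
};

/* Returns 0 on success, -EINVAL for an empty request and -ENOMEM when
 * the allocation fails, instead of returning void and leaving a NULL
 * pointer behind for later code to dereference. */
static int toy_alloc_vlans(struct toy_vf *vf, size_t num_vlans)
{
    if (num_vlans == 0)
        return -EINVAL;

    vf->vlans = calloc(num_vlans, sizeof(*vf->vlans));
    if (!vf->vlans)
        return -ENOMEM;

    vf->num_vlans = num_vlans;
    return 0;
}
```

A caller would then test the result and propagate a nonzero value up the call chain before touching `vf->vlans`.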
      
      Fixes: 154d0c81 ("qlcnic: VLAN enhancement for 84XX adapters")
      Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ax25: NPD bug when detaching AX25 device · 1ade48d0
      Lin Ma authored
      The existing cleanup routine is not well synchronized
      with the syscall routines. When a device is detaching, the race below can
      occur.
      
      static int ax25_sendmsg(...) {
        ...
        lock_sock()
        ax25 = sk_to_ax25(sk);
        if (ax25->ax25_dev == NULL) // CHECK
        ...
        ax25_queue_xmit(skb, ax25->ax25_dev->dev); // USE
        ...
      }
      
      static void ax25_kill_by_device(...) {
        ...
        if (s->ax25_dev == ax25_dev) {
          s->ax25_dev = NULL;
          ...
      }
      
      Other syscall functions like ax25_getsockopt, ax25_getname, and
      ax25_info_show also suffer from similar races. To fix them, this patch
      introduces lock_sock() into ax25_kill_by_device to guarantee
      that the nullify action in the cleanup routine cannot proceed while another
      socket request is pending.
      
      Signed-off-by: Hanjie Wu <nagi@zju.edu.cn>
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hamradio: improve the incomplete fix to avoid NPD · b2f37aea
      Lin Ma authored
      The previous commit 3e0588c2 ("hamradio: defer ax25 kfree after
      unregister_netdev") reordered the kfree operations and the unregister_netdev
      operation to prevent a UAF.
      
      This commit improves the previous one by also deferring the nullify of
      the ax->tty pointer. Otherwise, a NULL pointer dereference bug occurs.
      Part of the stack trace is shown below.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000538
      RIP: 0010:ax_xmit+0x1f9/0x400
      ...
      Call Trace:
       dev_hard_start_xmit+0xec/0x320
       sch_direct_xmit+0xea/0x240
       __qdisc_run+0x166/0x5c0
       __dev_queue_xmit+0x2c7/0xaf0
       ax25_std_establish_data_link+0x59/0x60
       ax25_connect+0x3a0/0x500
       ? security_socket_connect+0x2b/0x40
       __sys_connect+0x96/0xc0
       ? __hrtimer_init+0xc0/0xc0
       ? common_nsleep+0x2e/0x50
       ? switch_fpu_return+0x139/0x1a0
       __x64_sys_connect+0x11/0x20
       do_syscall_64+0x33/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The crash point is shown below:
      
      static void ax_encaps(...) {
        ...
        set_bit(TTY_DO_WRITE_WAKEUP, &ax->tty->flags); // ax->tty = NULL!
        ...
      }
      
      By placing the nullify action after unregister_netdev, the ax->tty
      pointer is not set to NULL until the net_device framework layer is
      fully synchronized.
      
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · aa3cc8a9
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-12-17
      
      Maciej Fijalkowski says:
      
      It seems that previous [0] Rx fix was not enough and there are still
      issues with AF_XDP Rx ZC support in ice driver. Elza reported that for
      multiple XSK sockets configured on a single netdev, some of them were
      becoming dead after a while. We have spotted more things that needed to
      be addressed this time. More information can be found in the individual
      commit messages.
      
      It also carries Alexandr's patch that was sent previously which was
      overlapping with this set.
      
      [0]: https://lore.kernel.org/bpf/20211129231746.2767739-1-anthony.l.nguyen@intel.com/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: avoid double free in tun_free_netdev · 158b515f
      George Kennedy authored
      Avoid double free in tun_free_netdev() by moving the
      dev->tstats and tun->security allocs to a new ndo_init routine
      (tun_net_init()) that will be called by register_netdevice().
      ndo_init is paired with the destructor (tun_free_netdev()),
      so if there's an error in register_netdevice() the destructor
      will handle the frees.
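The ownership rule in the paragraph above can be modelled in user space: when the init hook allocates and the registration routine runs the destructor on failure, the caller must not free again. All `toy_` names are hypothetical and the error code chosen for the late failure is arbitrary; this is a simplified model, not the netdevice registration code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct toy_dev {
    int *stats;           /* allocated by the init hook */
    int destructor_calls; /* bookkeeping for the example */
};

/* Init hook: owns the allocation. */
static int toy_init(struct toy_dev *dev)
{
    dev->stats = calloc(1, sizeof(*dev->stats));
    return dev->stats ? 0 : -ENOMEM;
}

/* Destructor: the single place that frees what the init hook set up. */
static void toy_destructor(struct toy_dev *dev)
{
    free(dev->stats);
    dev->stats = NULL;
    dev->destructor_calls++;
}

/* Registration: on a failure after init, the destructor runs here, so
 * the caller only looks at the error code and frees nothing itself. */
static int toy_register(struct toy_dev *dev, int fail_late)
{
    int err = toy_init(dev);

    if (err)
        return err;
    if (fail_late) {
        toy_destructor(dev);
        return -EBUSY;
    }
    return 0;
}
```

With allocation and release paired this way, each resource has exactly one release path, which is what rules out the double free.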
      
      BUG: KASAN: double-free or invalid-free in selinux_tun_dev_free_security+0x1a/0x20 security/selinux/hooks.c:5605
      
      CPU: 0 PID: 25750 Comm: syz-executor416 Not tainted 5.16.0-rc2-syzk #1
      Hardware name: Red Hat KVM, BIOS
      Call Trace:
      <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x89/0xb5 lib/dump_stack.c:106
      print_address_description.constprop.9+0x28/0x160 mm/kasan/report.c:247
      kasan_report_invalid_free+0x55/0x80 mm/kasan/report.c:372
      ____kasan_slab_free mm/kasan/common.c:346 [inline]
      __kasan_slab_free+0x107/0x120 mm/kasan/common.c:374
      kasan_slab_free include/linux/kasan.h:235 [inline]
      slab_free_hook mm/slub.c:1723 [inline]
      slab_free_freelist_hook mm/slub.c:1749 [inline]
      slab_free mm/slub.c:3513 [inline]
      kfree+0xac/0x2d0 mm/slub.c:4561
      selinux_tun_dev_free_security+0x1a/0x20 security/selinux/hooks.c:5605
      security_tun_dev_free_security+0x4f/0x90 security/security.c:2342
      tun_free_netdev+0xe6/0x150 drivers/net/tun.c:2215
      netdev_run_todo+0x4df/0x840 net/core/dev.c:10627
      rtnl_unlock+0x13/0x20 net/core/rtnetlink.c:112
      __tun_chr_ioctl+0x80c/0x2870 drivers/net/tun.c:3302
      tun_chr_ioctl+0x2f/0x40 drivers/net/tun.c:3311
      vfs_ioctl fs/ioctl.c:51 [inline]
      __do_sys_ioctl fs/ioctl.c:874 [inline]
      __se_sys_ioctl fs/ioctl.c:860 [inline]
      __x64_sys_ioctl+0x19d/0x220 fs/ioctl.c:860
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x3a/0x80 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x44/0xae
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: George Kennedy <george.kennedy@oracle.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/1639679132-19884-1-git-send-email-george.kennedy@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      158b515f
    • Yevhen Orlov's avatar
      net: marvell: prestera: fix incorrect structure access · 2efc2256
      Yevhen Orlov authored
      The line
      	upper = info->upper_dev;
      reads the upper_dev field, which is valid only for particular events
      (e.g. event == NETDEV_CHANGEUPPER). For any other event, ptr does not
      point to a netdev_notifier_changeupper_info, so this line causes an
      invalid memory access.
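
A minimal userspace sketch of the pattern behind this fix, assuming a notifier that passes an event code plus an untyped pointer. The event codes, `base_info`, `changeupper_info` and `upper_for_event()` are hypothetical and are not the prestera code; the point is only that the pointer is reinterpreted as the larger structure after the event code has been checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event codes standing in for the netdev notifier events. */
enum { FAKE_EV_REGISTER = 1, FAKE_EV_CHANGEUPPER = 2 };

/* Common header that every event payload starts with. */
struct base_info {
    int dev_id;
};

/* Larger payload, delivered only with FAKE_EV_CHANGEUPPER. */
struct changeupper_info {
    struct base_info info;
    const char *upper_dev;
};

/*
 * Returns the upper device name for a change-upper event, or NULL for any
 * other event. The cast happens only after the event check, so a smaller
 * payload is never read past its end.
 */
static const char *upper_for_event(int event, void *ptr)
{
    struct changeupper_info *info;

    assert(ptr != NULL);
    if (event != FAKE_EV_CHANGEUPPER)
        return NULL;

    info = ptr;
    return info->upper_dev;
}
```

Reading `upper_dev` before the check would touch memory beyond a `base_info` object on the caller's stack, which is the kind of out-of-bounds read KASAN reports in the log below.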
      
      The KASAN logs are as follows:
      
      [   30.123165] BUG: KASAN: stack-out-of-bounds in prestera_netdev_port_event.constprop.0+0x68/0x538 [prestera]
      [   30.133336] Read of size 8 at addr ffff80000cf772b0 by task udevd/778
      [   30.139866]
      [   30.141398] CPU: 0 PID: 778 Comm: udevd Not tainted 5.16.0-rc3 #6
      [   30.147588] Hardware name: DNI AmazonGo1 A7040 board (DT)
      [   30.153056] Call trace:
      [   30.155547]  dump_backtrace+0x0/0x2c0
      [   30.159320]  show_stack+0x18/0x30
      [   30.162729]  dump_stack_lvl+0x68/0x84
      [   30.166491]  print_address_description.constprop.0+0x74/0x2b8
      [   30.172346]  kasan_report+0x1e8/0x250
      [   30.176102]  __asan_load8+0x98/0xe0
      [   30.179682]  prestera_netdev_port_event.constprop.0+0x68/0x538 [prestera]
      [   30.186847]  prestera_netdev_event_handler+0x1b4/0x1c0 [prestera]
      [   30.193313]  raw_notifier_call_chain+0x74/0xa0
      [   30.197860]  call_netdevice_notifiers_info+0x68/0xc0
      [   30.202924]  register_netdevice+0x3cc/0x760
      [   30.207190]  register_netdev+0x24/0x50
      [   30.211015]  prestera_device_register+0x8a0/0xba0 [prestera]
      
      Fixes: 3d5048cc ("net: marvell: prestera: move netdev topology validation to prestera_main")
      Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20211216171714.11341-1-yevhen.orlov@plvision.eu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      2efc2256
    • Yevhen Orlov's avatar
      net: marvell: prestera: fix incorrect return of port_find · 8b681bd7
      Yevhen Orlov authored
      When the list contains some ports but not the requested one, we
      return the last iterator state instead of NULL as expected.
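
The bug class can be illustrated with a plain singly linked list in userspace C. This is an analogy under stated assumptions, not the driver code: `struct port`, `port_find_buggy()` and `port_find()` are hypothetical, and the kernel's list iteration macros behave differently in detail, but the failure mode is the same: the loop cursor is returned even when nothing matched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port entry in a NULL-terminated singly linked list. */
struct port {
    unsigned int id;
    struct port *next;
};

/* Buggy variant: returns the loop cursor, which is stale on a miss. */
static struct port *port_find_buggy(struct port *head, unsigned int id)
{
    struct port *cursor = NULL;
    struct port *it;

    for (it = head; it; it = it->next) {
        cursor = it;
        if (it->id == id)
            break;
    }
    /* On a miss, cursor still points at the last port visited. */
    return cursor;
}

/* Fixed variant: a match is returned from inside the loop, a miss is NULL. */
static struct port *port_find(struct port *head, unsigned int id)
{
    struct port *it;

    for (it = head; it; it = it->next) {
        if (it->id == id)
            return it;
    }
    return NULL;
}
```

Callers that test the result against NULL work correctly only with the second variant; with the first they silently operate on an unrelated port.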
      
      Fixes: 501ef306 ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
      Signed-off-by: Yevhen Orlov <yevhen.orlov@plvision.eu>
      Link: https://lore.kernel.org/r/20211216170736.8851-1-yevhen.orlov@plvision.eu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      8b681bd7
    • Hoang Le's avatar
      Revert "tipc: use consistent GFP flags" · f845fe58
      Hoang Le authored
      This reverts commit 86c3a3e9.
      
      The tipc_aead_init() function can be called from an interrupt routine.
      This allocation might sleep with the GFP_KERNEL flag, hence the
      following BUG is reported.
      
      [   17.657509] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:230
      [   17.660916] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/3
      [   17.664093] preempt_count: 302, expected: 0
      [   17.665619] RCU nest depth: 2, expected: 0
      [   17.667163] Preemption disabled at:
      [   17.667165] [<0000000000000000>] 0x0
      [   17.669753] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Tainted: G        W         5.16.0-rc4+ #1
      [   17.673006] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
      [   17.675540] Call Trace:
      [   17.676285]  <IRQ>
      [   17.676913]  dump_stack_lvl+0x34/0x44
      [   17.678033]  __might_resched.cold+0xd6/0x10f
      [   17.679311]  kmem_cache_alloc_trace+0x14d/0x220
      [   17.680663]  tipc_crypto_start+0x4a/0x2b0 [tipc]
      [   17.682146]  ? kmem_cache_alloc_trace+0xd3/0x220
      [   17.683545]  tipc_node_create+0x2f0/0x790 [tipc]
      [   17.684956]  tipc_node_check_dest+0x72/0x680 [tipc]
      [   17.686706]  ? ___cache_free+0x31/0x350
      [   17.688008]  ? skb_release_data+0x128/0x140
      [   17.689431]  tipc_disc_rcv+0x479/0x510 [tipc]
      [   17.690904]  tipc_rcv+0x71c/0x730 [tipc]
      [   17.692219]  ? __netif_receive_skb_core+0xb7/0xf60
      [   17.693856]  tipc_l2_rcv_msg+0x5e/0x90 [tipc]
      [   17.695333]  __netif_receive_skb_list_core+0x20b/0x260
      [   17.697072]  netif_receive_skb_list_internal+0x1bf/0x2e0
      [   17.698870]  ? dev_gro_receive+0x4c2/0x680
      [   17.700255]  napi_complete_done+0x6f/0x180
      [   17.701657]  virtnet_poll+0x29c/0x42e [virtio_net]
      [   17.703262]  __napi_poll+0x2c/0x170
      [   17.704429]  net_rx_action+0x22f/0x280
      [   17.705706]  __do_softirq+0xfd/0x30a
      [   17.706921]  common_interrupt+0xa4/0xc0
      [   17.708206]  </IRQ>
      [   17.708922]  <TASK>
      [   17.709651]  asm_common_interrupt+0x1e/0x40
      [   17.711078] RIP: 0010:default_idle+0x18/0x20
      
      Fixes: 86c3a3e9 ("tipc: use consistent GFP flags")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Link: https://lore.kernel.org/r/20211217030059.5947-1-hoang.h.le@dektech.com.au
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f845fe58
    • Aleksander Jan Bajkowski's avatar
      net: lantiq_xrx200: increase buffer reservation · 1488fc20
      Aleksander Jan Bajkowski authored
      If the user sets a lower mtu on the CPU port than on the switch,
      then DMA inserts a few more bytes into the buffer than expected.
      In the worst case, it may exceed the size of the buffer. The
      experiments showed that the buffer should be a multiple of the
      burst length value. This patch rounds the length of the rx buffer
      upwards and fixes this bug. The reservation of FCS space in the
      buffer has been removed as PMAC strips the FCS.
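
The rounding step can be sketched as follows. This is an illustration only: `rx_buf_len()` is a hypothetical helper, and the burst size is passed as a parameter because the actual burst length used by the driver is not stated in this message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a receive buffer length up to the next multiple of the DMA burst
 * size (in bytes), so that a burst that writes a few bytes past the end of
 * the frame still lands inside the buffer.
 */
static size_t rx_buf_len(size_t frame_len, size_t burst_bytes)
{
    assert(burst_bytes != 0);
    return (frame_len + burst_bytes - 1) / burst_bytes * burst_bytes;
}
```

For example, with an assumed 32-byte burst, a 1518-byte frame needs a 1536-byte buffer, while a length that is already a multiple of the burst size is left unchanged.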
      
      Fixes: 998ac358 ("net: lantiq: add support for jumbo frames")
      Reported-by: Thomas Nixon <tom@tomn.co.uk>
      Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1488fc20
    • Jakub Kicinski's avatar
      Merge branch 'net-sched-fix-ct-zone-matching-for-invalid-conntrack-state' · 14193d57
      Jakub Kicinski authored
      Paul Blakey says:
      
      ====================
      net/sched: Fix ct zone matching for invalid conntrack state
      
      Currently, when a packet is marked as invalid by conntrack_in in act_ct,
      post_ct will be set, and connection info (nf_conn) will be removed
      from the skb. Later openvswitch and flower matching will parse this
      as ct_state=+trk+inv. But because the connection info is missing,
      there is also no zone info to match against even though the packet
      is tracked.
      
      This series fixes that, by passing the last executed zone by act_ct.
      The zone info is passed along from act_ct to the ct flow dissector
      (used by flower to extract zone info) and to ovs, the same way as post_ct
      is passed, via qdisc layer skb cb to dissector, and via skb extension
      to OVS.
      
      Adding any more data to the qdisc skb cb would leave no room for the
      BPF skb cb to extend it and stay under the skb->cb size, so this series
      moves the tc related info from the qdisc skb cb to a tc specific cb
      that also extends it.
      ====================
      
      Link: https://lore.kernel.org/r/20211214172435.24207-1-paulb@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      14193d57
    • Paul Blakey's avatar
      net: openvswitch: Fix matching zone id for invalid conns arriving from tc · 635d448a
      Paul Blakey authored
      Zone id is not restored if we passed ct and ct rejected the connection,
      as there is no ct info on the skb.
      
      Save the zone from tc skb cb to tc skb extension and pass it on to
      ovs, use that info to restore the zone id for invalid connections.
      
      Fixes: d29334c1 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      635d448a
    • Paul Blakey's avatar
      net/sched: flow_dissector: Fix matching on zone id for invalid conns · 38495958
      Paul Blakey authored
      If ct rejects a flow, it removes the conntrack info from the skb.
      act_ct sets the post_ct variable so the dissector will see this case
      as an +tracked +invalid state, but the zone id is lost with the
      conntrack info.
      
      To restore the zone id on such cases, set the last executed zone,
      via the tc control block, when passing ct, and read it back in the
      dissector if there is no ct info on the skb (invalid connection).
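
The save-and-restore flow described above can be modelled in a few lines of userspace C. All names here (`fake_conn`, `fake_tc_cb`, `fake_skb`, `act_ct_mark_invalid()`, `dissect_zone()`) are hypothetical simplifications, not the kernel structures or functions; the sketch only shows the fallback order: use the connection's zone when conntrack info is present, otherwise use the zone recorded in the control block by the last ct action.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for conntrack info attached to a packet. */
struct fake_conn {
    uint16_t zone;
};

/* Hypothetical stand-in for the tc control block. */
struct fake_tc_cb {
    bool post_ct;   /* packet has passed through ct */
    uint16_t zone;  /* last zone executed by the ct action */
};

/* Hypothetical stand-in for the packet. */
struct fake_skb {
    struct fake_conn *ct;  /* NULL once ct has rejected the flow */
    struct fake_tc_cb cb;
};

/* Models ct rejecting a flow: conn info is dropped, the zone is kept. */
static void act_ct_mark_invalid(struct fake_skb *skb, uint16_t zone)
{
    assert(skb != NULL);
    skb->ct = NULL;
    skb->cb.post_ct = true;
    skb->cb.zone = zone;
}

/* Models the dissector: returns true and fills *zone if a zone is known. */
static bool dissect_zone(const struct fake_skb *skb, uint16_t *zone)
{
    if (skb->ct) {
        *zone = skb->ct->zone;
        return true;
    }
    if (skb->cb.post_ct) {
        /* Invalid connection: fall back to the saved zone. */
        *zone = skb->cb.zone;
        return true;
    }
    return false;
}
```

Without the saved zone, the second branch would have nothing to report and a rule matching on both the invalid state and a zone could never hit.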
      
      Fixes: 7baf2429 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support")
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      38495958
    • Paul Blakey's avatar
      net/sched: Extend qdisc control block with tc control block · ec624fe7
      Paul Blakey authored
      BPF layer extends the qdisc control block via struct bpf_skb_data_end
      and because of that there is no more room to add variables to the
      qdisc layer control block without going over the skb->cb size.
      
      Extend the qdisc control block with a tc control block,
      and move all tc related variables to there as a pre-step for
      extending the tc control block with additional members.
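
The layout idea can be sketched with embedded structs and compile-time size checks. The field names and sizes below are hypothetical and do not reproduce the kernel definitions; `FAKE_SKB_CB_SIZE` is an assumed budget standing in for the size of skb->cb. The sketch shows the base block placed first inside the extended block, so both views start at the same address, and a static assertion that the extended block still fits the budget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size budget standing in for the size of skb->cb. */
#define FAKE_SKB_CB_SIZE 48

/* Hypothetical base (qdisc layer) control block. */
struct fake_qdisc_cb {
    uint32_t pkt_len;
    unsigned char data[20];
};

/* Hypothetical tc control block that extends the base block. */
struct fake_tc_cb {
    struct fake_qdisc_cb qdisc_cb;  /* must stay the first member */
    uint16_t mru;
    uint16_t zone;
    uint8_t post_ct;
};

/* The base block must sit at offset 0 so both views share one address. */
_Static_assert(offsetof(struct fake_tc_cb, qdisc_cb) == 0,
               "base control block must be the first member");

/* The extended block must still fit in the per-packet scratch area. */
_Static_assert(sizeof(struct fake_tc_cb) <= FAKE_SKB_CB_SIZE,
               "tc control block exceeds the cb size budget");

/* Bytes still available for further members within the budget. */
static size_t fake_tc_cb_room_left(void)
{
    return FAKE_SKB_CB_SIZE - sizeof(struct fake_tc_cb);
}
```

Keeping the tc members in their own wrapper means a later addition fails the build through the static assertion instead of silently overlapping another layer's data.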
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      ec624fe7
  6. 17 Dec, 2021 5 commits