1. 02 Sep, 2022 7 commits
    • ice: use bitmap_free instead of devm_kfree · 59ac3255
      Michal Swiatkowski authored
      pf->avail_txqs was allocated using bitmap_zalloc, so bitmap_free should
      be used to free this memory.
      
      Fixes: 78b5713a ("ice: Alloc queue management bitmaps and arrays dynamically")
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix DMA mappings leak · 7e753eb6
      Przemyslaw Patynowski authored
      Fix a leak that occurs when the user changes ring parameters.
      During reallocation of RX buffers, new DMA mappings are created for
      those buffers. New buffers with different RX ring count should
      substitute older ones, but those buffers were freed in ice_vsi_cfg_rxq
      and reallocated again with ice_alloc_rx_buf. kfree on rx_buf caused
      leak of already mapped DMA.
      Reallocate ZC with xdp_buf struct, when BPF program loads. Reallocate
      back to rx_buf, when BPF program unloads.
      If BPF program is loaded/unloaded and XSK pools are created, reallocate
      RX queues accordingly in XDP_SETUP_XSK_POOL handler.
      
      Steps for reproduction:
      while :
      do
      	for ((i=0; i<=8160; i=i+32))
      	do
      		ethtool -G enp130s0f0 rx $i tx $i
      		sleep 0.5
      		ethtool -g enp130s0f0
      	done
      done
      
      Fixes: 617f3e1b ("ice: xsk: allocate separate memory for XDP SW ring")
      Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
      Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
      Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • Merge tag 'rxrpc-fixes-20220901' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · e7506d34
      David S. Miller authored
      David Howells says:
      
      ====================
      rxrpc fixes
      Here are some fixes for AF_RXRPC:
      
       (1) Fix the handling of ICMP/ICMP6 packets.  This is a problem due to
           rxrpc being switched to acting as a UDP tunnel, thereby allowing it to
           steal the packets before they go through the UDP Rx queue.  UDP
           tunnels can't get ICMP/ICMP6 packets, however.  This patch adds an
           additional encap hook so that they can.
      
       (2) Fix the encryption routines in rxkad to handle packets that have more
           than three parts correctly.  The problem is that ->nr_frags doesn't
           count the initial fragment, so the sglist ends up too short.
      
       (3) Fix a problem with destruction of the local endpoint potentially
           getting repeated.
      
       (4) Fix the calculation of the time at which to resend.
           jiffies_to_usecs() gives microseconds, not nanoseconds.
      
       (5) Fix AFS to work out when callback promises and locks expire based on
           the time an op was issued rather than the time the first reply packet
           arrives.  We don't know how long the server took between calculating
           the expiry interval and transmitting the reply.
      
       (6) Given (5), rxrpc_get_reply_time() is no longer used, so remove it.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: TX zerocopy should not sense pfmemalloc status · 32614006
      Eric Dumazet authored
      We got a recent syzbot report [1] showing a possible misuse
      of pfmemalloc page status in TCP zerocopy paths.
      
      Indeed, for pages coming from user space or other layers,
      using page_is_pfmemalloc() is moot, and possibly could give
      false positives.
      
      There have been attempts to make page_is_pfmemalloc() more robust,
      but not using it in the first place in this context is probably better,
      removing cpu cycles.
      
      Note to stable teams:
      
      You need to backport 84ce071e ("net: introduce
      __skb_fill_page_desc_noacc") as a prereq.
      
      Race is more probable after commit c07aea3e
      ("mm: add a signature in struct page") because page_is_pfmemalloc()
      is now using low order bit from page->lru.next, which can change
      more often than page->index.
      
      Low order bit should never be set for lru.next (when used as an anchor
      in LRU list), so KCSAN report is mostly a false positive.
      
      Backporting to older kernel versions does not seem necessary.
      
      [1]
      BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag
      
      write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0:
      __list_add include/linux/list.h:73 [inline]
      list_add include/linux/list.h:88 [inline]
      lruvec_add_folio include/linux/mm_inline.h:105 [inline]
      lru_add_fn+0x440/0x520 mm/swap.c:228
      folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246
      folio_batch_add_and_move mm/swap.c:263 [inline]
      folio_add_lru+0xf1/0x140 mm/swap.c:490
      filemap_add_folio+0xf8/0x150 mm/filemap.c:948
      __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981
      pagecache_get_page+0x26/0x190 mm/folio-compat.c:104
      grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116
      ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988
      generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738
      ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270
      ext4_file_write_iter+0x2e3/0x1210
      call_write_iter include/linux/fs.h:2187 [inline]
      new_sync_write fs/read_write.c:491 [inline]
      vfs_write+0x468/0x760 fs/read_write.c:578
      ksys_write+0xe8/0x1a0 fs/read_write.c:631
      __do_sys_write fs/read_write.c:643 [inline]
      __se_sys_write fs/read_write.c:640 [inline]
      __x64_sys_write+0x3e/0x50 fs/read_write.c:640
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1:
      page_is_pfmemalloc include/linux/mm.h:1740 [inline]
      __skb_fill_page_desc include/linux/skbuff.h:2422 [inline]
      skb_fill_page_desc include/linux/skbuff.h:2443 [inline]
      tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018
      do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075
      tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline]
      tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150
      inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833
      kernel_sendpage+0x184/0x300 net/socket.c:3561
      sock_sendpage+0x5a/0x70 net/socket.c:1054
      pipe_to_sendpage+0x128/0x160 fs/splice.c:361
      splice_from_pipe_feed fs/splice.c:415 [inline]
      __splice_from_pipe+0x222/0x4d0 fs/splice.c:559
      splice_from_pipe fs/splice.c:594 [inline]
      generic_splice_sendpage+0x89/0xc0 fs/splice.c:743
      do_splice_from fs/splice.c:764 [inline]
      direct_splice_actor+0x80/0xa0 fs/splice.c:931
      splice_direct_to_actor+0x305/0x620 fs/splice.c:886
      do_splice_direct+0xfb/0x180 fs/splice.c:974
      do_sendfile+0x3bf/0x910 fs/read_write.c:1249
      __do_sys_sendfile64 fs/read_write.c:1317 [inline]
      __se_sys_sendfile64 fs/read_write.c:1303 [inline]
      __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      value changed: 0x0000000000000000 -> 0xffffea0004a1d288
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022
      
      Fixes: c07aea3e ("mm: add a signature in struct page")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix shift wrapping bug in map_get() · e2b224ab
      Dan Carpenter authored
      There is a shift wrapping bug in this code, so anything above
      31 will return false.
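
The bug class can be illustrated outside the kernel. The sketch below is a userspace model, not the tipc source: `map_get_narrow` is a hypothetical stand-in for a mask that is only 32 bits wide (a literal `1 << i` with `i >= 32` is undefined behaviour in C, so it is deliberately not executed here), while `map_get_wide` builds the mask from a 64-bit constant.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the buggy pattern (hypothetical helper): the mask is only
 * 32 bits wide, so bits 32..63 of the map can never be selected. */
static int map_get_narrow(uint64_t up_map, unsigned int i)
{
	uint32_t mask = (i < 32) ? ((uint32_t)1 << i) : 0;

	return (int)((up_map & mask) >> (i & 31));
}

/* Fixed pattern: build the mask from a 64-bit constant. */
static int map_get_wide(uint64_t up_map, unsigned int i)
{
	assert(i < 64);
	return (int)((up_map & (UINT64_C(1) << i)) >> i);
}
```

With bit 40 set in the map, the narrow model reports the bit as clear while the wide version reports it as set.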
      
      Fixes: 35c55c98 ("tipc: add neighbor monitoring framework")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_sfb: Don't assume the skb is still around after enqueueing to child · 9efd2329
      Toke Høiland-Jørgensen authored
      The sch_sfb enqueue() routine assumes the skb is still alive after it has
      been enqueued into a child qdisc, using the data in the skb cb field in the
      increment_qlen() routine after enqueue. However, the skb may in fact have
      been freed, causing a use-after-free in this case. In particular, this
      happens if sch_cake is used as a child of sfb, and the GSO splitting mode
      of CAKE is enabled (in which case the skb will be split into segments and
      the original skb freed).
      
      Fix this by copying the sfb cb data to the stack before enqueueing the skb,
      and using this stack copy in increment_qlen() instead of the skb pointer
      itself.
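
The copy-before-handover pattern described above can be sketched in userspace. Everything in this block is hypothetical (`struct pkt`, `struct pkt_cb`, `parent_enqueue`, the bucket counter); it only models the idea that per-packet data needed after the handover is copied to the stack first, and is not the sch_sfb code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an skb and its control block. */
struct pkt_cb { unsigned int bucket; };
struct pkt { struct pkt_cb cb; };

/* A child queue that consumes (frees) the packet, as a child may do. */
static int child_enqueue_and_free(struct pkt *p)
{
	free(p);
	return 0; /* 0 means success in this sketch */
}

/* Snapshot the control block on the stack, then hand the packet over.
 * After child() returns, only the stack copy is used; p is never read.
 * The caller must ensure cb.bucket is a valid index into qlen. */
static int parent_enqueue(struct pkt *p, int (*child)(struct pkt *),
			  unsigned int *qlen)
{
	struct pkt_cb cb;
	int ret;

	assert(p != NULL && child != NULL && qlen != NULL);
	cb = p->cb;        /* copy before ownership moves to the child */
	ret = child(p);
	if (ret == 0)
		qlen[cb.bucket]++;
	return ret;
}
```

Because the counter update reads `cb` rather than `p->cb`, it stays valid even when the child frees the packet.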
      
      Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18231
      Fixes: e13e02a3 ("net_sched: SFB flow scheduler")
      Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: phy: meson-gxl: improve link-up behavior" · 7fdc7766
      Heiner Kallweit authored
      This reverts commit 2c87c6f9.
      Meanwhile it turned out that the following commit is the proper
      workaround for the issue that 2c87c6f9 tries to address.
      a3a57bf0 ("net: stmmac: work around sporadic tx issue on link-up")
      It's not clear why the to-be-reverted commit helped for one user;
      for others it didn't make a difference.
      
      Fixes: 2c87c6f9 ("net: phy: meson-gxl: improve link-up behavior")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Link: https://lore.kernel.org/r/8deeeddc-6b71-129b-1918-495a12dc11e3@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 01 Sep, 2022 17 commits
    • Merge tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 42e66b1c
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from bluetooth, bpf and wireless.
      
        Current release - regressions:
      
         - bpf:
            - fix wrong last sg check in sk_msg_recvmsg()
            - fix kernel BUG in purge_effective_progs()
      
         - mac80211:
            - fix possible leak in ieee80211_tx_control_port()
            - potential NULL dereference in ieee80211_tx_control_port()
      
        Current release - new code bugs:
      
         - nfp: fix the access to management firmware hanging
      
        Previous releases - regressions:
      
         - ip: fix triggering of 'icmp redirect'
      
         - sched: tbf: don't call qdisc_put() while holding tree lock
      
         - bpf: fix corrupted packets for XDP_SHARED_UMEM
      
         - bluetooth: hci_sync: fix suspend performance regression
      
         - micrel: fix probe failure
      
        Previous releases - always broken:
      
         - tcp: make global challenge ack rate limitation per net-ns and
           default disabled
      
         - tg3: fix potential hang-up on system reboot
      
         - mac802154: fix reception for no-daddr packets
      
        Misc:
      
         - r8152: add PID for the lenovo onelink+ dock"
      
      * tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
        net/smc: Remove redundant refcount increase
        Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"
        tcp: make global challenge ack rate limitation per net-ns and default disabled
        tcp: annotate data-race around challenge_timestamp
        net: dsa: hellcreek: Print warning only once
        ip: fix triggering of 'icmp redirect'
        sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb
        selftests: net: sort .gitignore file
        Documentation: networking: correct possessive "its"
        kcm: fix strp_init() order and cleanup
        mlxbf_gige: compute MDIO period based on i1clk
        ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler
        net: lan966x: improve error handle in lan966x_fdma_rx_get_frame()
        nfp: fix the access to management firmware hanging
        net: phy: micrel: Make the GPIO to be non-exclusive
        net: virtio_net: fix notification coalescing comments
        net/sched: fix netdevice reference leaks in attach_default_qdiscs()
        net: sched: tbf: don't call qdisc_put() while holding tree lock
        net: Use u64_stats_fetch_begin_irq() for stats fetch.
        net: dsa: xrs700x: Use irqsave variant for u64 stats update
        ...
    • Merge tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab · d330076e
      Linus Torvalds authored
      Pull slab fix from Vlastimil Babka:
      
       - A fix from Waiman Long to avoid a theoretical deadlock reported by
         lockdep.
      
      * tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
        mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock
    • Merge tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 2880e1a1
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "Just a handful of changes this time. The only major change is the
        regression fix about the x86 WC-page buffer allocation.
      
        The rest are trivial data-race fixes for ALSA sequencer core, the
        possible out-of-bounds access fixes in the new ALSA control hash code,
        and a few device-specific workarounds and fixes"
      
      * tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Add quirk for LH Labs Geek Out HD Audio 1V5
        ALSA: hda/realtek: Add speaker AMP init for Samsung laptops with ALC298
        ALSA: control: Re-order bounds checking in get_ctl_id_hash()
        ALSA: control: Fix an out-of-bounds bug in get_ctl_id_hash()
        ALSA: hda: intel-nhlt: Correct the handling of fmt_config flexible array
        ALSA: seq: Fix data-race at module auto-loading
        ALSA: seq: oss: Fix data-race for max_midi_devs access
        ALSA: memalloc: Revive x86-specific WC page allocations again
    • rxrpc: Remove rxrpc_get_reply_time() which is no longer used · 21457f4a
      David Howells authored
      Remove rxrpc_get_reply_time() as that is no longer used now that the call
      issue time is used instead of the reply time.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Use the operation issue time instead of the reply time for callbacks · 7903192c
      David Howells authored
      rxrpc and kafs between them try to use the receive timestamp on the first
      data packet (ie. the one with sequence number 1) as a base from which to
      calculate the time at which callback promise and lock expiration occurs.
      
      However, we don't know how long it took for the server to send us the reply
      from it having completed the basic part of the operation - it might then,
      for instance, have to send a bunch of callback breaks, depending on the
      particular operation.
      
      Fix this by using the time at which the operation is issued on the client
      as a base instead.  That should never be longer than the server's idea of
      the expiry time.
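
The effect of changing the base time can be shown with a small userspace sketch. The helper names and the use of plain seconds are assumptions for illustration only; this is not the kafs code. Since the client issues the operation before any reply can arrive, an expiry computed from the issue time is never later than one computed from the reply time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; all values are seconds on the client's clock. */

/* Expiry based on when the first reply packet arrived (old approach). */
static int64_t expiry_from_reply(int64_t reply_time, int64_t interval)
{
	assert(interval >= 0);
	return reply_time + interval;
}

/* Expiry based on when the client issued the operation (new approach). */
static int64_t expiry_from_issue(int64_t issue_time, int64_t interval)
{
	assert(interval >= 0);
	return issue_time + interval;
}
```

For an operation issued at t=100 with a reply at t=103 and a 60 second interval, the issue-based expiry is 160 while the reply-based one is 163, so the client stops trusting the promise slightly earlier.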
      
      Fixes: 78107055 ("afs: Fix calculation of callback expiry time")
      Fixes: 2070a3e4 ("rxrpc: Allow the reply time to be obtained on a client call")
      Suggested-by: Jeffrey E Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix calc of resend age · 214a9dc7
      David Howells authored
      Fix the calculation of the resend age to add a microsecond value as
      microseconds, not nanoseconds.
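
A unit mix-up of this kind can be sketched in userspace as follows. The helper name and the choice of a nanosecond timestamp are assumptions for illustration; the point is only that a microsecond quantity must be scaled by 1000 before it is combined with a nanosecond timestamp.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC INT64_C(1000)

/* Hypothetical helper: given the current time in nanoseconds and a
 * resend delay in microseconds, return the oldest send time (in ns)
 * that does not yet need a resend. */
static int64_t oldest_allowed_ns(int64_t now_ns, int64_t resend_delay_us)
{
	assert(resend_delay_us >= 0);
	return now_ns - resend_delay_us * NSEC_PER_USEC;
}
```

Treating the 2000 us delay in `oldest_allowed_ns(5000000, 2000)` as nanoseconds would give 4998000 instead of the intended 3000000, so the computed age would be off by a factor of 1000.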
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix local destruction being repeated · d3d86303
      David Howells authored
      If the local processor work item for the rxrpc local endpoint gets requeued
      by an event (such as an incoming packet) between it getting scheduled for
      destruction and the UDP socket being closed, the rxrpc_local_destroyer()
      function can get run twice.  The second time it can hang because it can end
      up waiting for cleanup events that will never happen.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix an insufficiently large sglist in rxkad_verify_packet_2() · 0d40f728
      David Howells authored
      rxkad_verify_packet_2() has a small stack-allocated sglist of 4 elements,
      but if that isn't sufficient for the number of fragments in the socket
      buffer, we try to allocate an sglist large enough to hold all the
      fragments.
      
      However, for large packets with a lot of fragments, this isn't sufficient
      and we need at least one additional fragment.
      
      The problem manifests as skb_to_sgvec() returning -EMSGSIZE and this then
      getting returned to userspace.  Most of the time, this isn't a problem as
      rxrpc sets a limit of 5692, big enough for 4 jumbo subpackets to be glued
      together; occasionally, however, the server will ignore the reported limit
      and give a packet that's a lot bigger - say 19852 bytes with ->nr_frags
      being 7.  skb_to_sgvec() then tries to return a "zeroth" fragment that
      seems to occur before the fragments counted by ->nr_frags and we hit the
      end of the sglist too early.
      
      Note that __skb_to_sgvec() also has an skb_walk_frags() loop that is
      recursive up to 24 deep.  I'm not sure if I need to take account of that
      too - or if there's an easy way of counting those frags too.
      
      Fix this by counting an extra frag and allocating a larger sglist based on
      that.
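
The sizing rule described above can be sketched as a pair of userspace helpers. The names are hypothetical and the sketch ignores fragment lists (the skb_walk_frags() case the message leaves open); it only shows why a count based on ->nr_frags alone is one entry short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SG_ENTRIES 4 /* size of the small on-stack list in the text */

/* Hypothetical helper: entries needed for a buffer with nr_frags page
 * fragments, plus one for the initial part that nr_frags does not count. */
static size_t sg_entries_needed(size_t nr_frags)
{
	assert(nr_frags < SIZE_MAX); /* the +1 below must not overflow */
	return nr_frags + 1;
}

/* Nonzero when the on-stack list is too small and a larger list
 * has to be allocated. */
static int needs_heap_sglist(size_t nr_frags)
{
	return sg_entries_needed(nr_frags) > STACK_SG_ENTRIES;
}
```

For the example in the message, ->nr_frags of 7 needs 8 entries under this rule, which exceeds the 4-entry stack list.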
      
      Fixes: d0d5c0cd ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: linux-afs@lists.infradead.org
    • rxrpc: Fix ICMP/ICMP6 error handling · ac56a0b4
      David Howells authored
      Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing
      it to siphon off UDP packets early in the handling of received UDP packets
      thereby avoiding the packet going through the UDP receive queue, it doesn't
      get ICMP packets through the UDP ->sk_error_report() callback.  In fact, it
      doesn't appear that there's any usable option for getting hold of ICMP
      packets.
      
      Fix this by adding a new UDP encap hook to distribute error messages for
      UDP tunnels.  If the hook is set, then the tunnel driver will be able to
      see ICMP packets.  The hook provides the offset into the packet of the UDP
      header of the original packet that caused the notification.
      
      An alternative would be to call the ->error_handler() hook - but that
      requires that the skbuff be cloned (as ip_icmp_error() or ipv6_icmp_error()
      do), which isn't really necessary or desirable in rxrpc's case as we want
      to parse them there and then, not queue them.
      
      Changes
      =======
      ver #3)
       - Fixed an uninitialised variable.
      
      ver #2)
       - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals.
      
      Fixes: 5271953c ("rxrpc: Use the UDP encap_rcv hook")
      Signed-off-by: David Howells <dhowells@redhat.com>
    • mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding... · 0495e337
      Waiman Long authored
      mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock
      
      A circular locking problem is reported by lockdep due to the following
      circular locking dependency.
      
        +--> cpu_hotplug_lock --> slab_mutex --> kn->active --+
        |                                                     |
        +-----------------------------------------------------+
      
      The forward cpu_hotplug_lock ==> slab_mutex ==> kn->active dependency
      happens in
      
        kmem_cache_destroy():	cpus_read_lock(); mutex_lock(&slab_mutex);
        ==> sysfs_slab_unlink()
            ==> kobject_del()
                ==> kernfs_remove()
      	      ==> __kernfs_remove()
      	          ==> kernfs_drain(): rwsem_acquire(&kn->dep_map, ...);
      
      The backward kn->active ==> cpu_hotplug_lock dependency happens in
      
        kernfs_fop_write_iter(): kernfs_get_active();
        ==> slab_attr_store()
            ==> cpu_partial_store()
                ==> flush_all(): cpus_read_lock()
      
      One way to break this circular locking chain is to avoid holding
      cpu_hotplug_lock and slab_mutex while deleting the kobject in
      sysfs_slab_unlink() which should be equivalent to doing a write_lock
      and write_unlock pair of the kn->active virtual lock.
      
      Since the kobject structures are not protected by slab_mutex or the
      cpu_hotplug_lock, we can certainly release those locks before doing
      the delete operation.
      
      Move sysfs_slab_unlink() and sysfs_slab_release() to the newly
      created kmem_cache_release() and call it outside the slab_mutex &
      cpu_hotplug_lock critical sections. There will be a slight delay
      in the deletion of sysfs files if kmem_cache_release() is called
      indirectly from a work function.
      
      Fixes: 5a836bf6 ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
      Acked-by: David Rientjes <rientjes@google.com>
      Link: https://lore.kernel.org/all/YwOImVd+nRUsSAga@hyeyoo/
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
    • net/smc: Remove redundant refcount increase · a8424a9b
      Yacan Liu authored
      For passive connections, the refcount increment has been done in
      smc_clcsock_accept()-->smc_sock_alloc().
      
      Fixes: 3b2dec26 ("net/smc: restructure client and server code in af_smc")
      Signed-off-by: Yacan Liu <liuyacan@corp.netease.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20220830152314.838736-1-liuyacan@corp.netease.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb" · 0b4f688d
      Jakub Kicinski authored
      This reverts commit 90fabae8.
      
      Patch was applied hastily, revert and let the v2 be reviewed.
      
      Fixes: 90fabae8 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb")
      Link: https://lore.kernel.org/all/87wnao2ha3.fsf@toke.dk/
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'tcp-tcp-challenge-ack-fixes' · a3daac63
      Jakub Kicinski authored
      Eric Dumazet says:
      
      ====================
      tcp: tcp challenge ack fixes
      
      syzbot found a typical data-race addressed in the first patch.
      
      While we are at it, second patch makes the global rate limit
      per net-ns and disabled by default.
      ====================
      
      Link: https://lore.kernel.org/r/20220830185656.268523-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: make global challenge ack rate limitation per net-ns and default disabled · 79e3602c
      Eric Dumazet authored
      Because per host rate limiting has been proven problematic (side channel
      attacks can be based on it), per host rate limiting of challenge acks ideally
      should be per netns and turned off by default.
      
      This is a long-overdue follow-up to the following commits:
      
      083ae308 ("tcp: enable per-socket rate limiting of all 'challenge acks'")
      f2b2c582 ("tcp: mitigate ACK loops for connections as tcp_sock")
      75ff39cc ("tcp: make challenge acks less predictable")
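
A per-namespace limit that is off by default can be modelled in userspace as below. The struct, its field names, and the use of `INT_MAX` as the "disabled" marker are assumptions of this sketch rather than a description of the kernel's data structures; the one-second window is likewise illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-namespace state; field names are illustrative. */
struct ns_challenge_state {
	int limit;       /* INT_MAX means "limit disabled" in this sketch */
	uint32_t window; /* the one-second window currently being counted */
	uint32_t count;  /* challenge ACKs sent in that window */
};

static void ns_init(struct ns_challenge_state *ns)
{
	assert(ns != NULL);
	ns->limit = INT_MAX; /* disabled by default */
	ns->window = 0;
	ns->count = 0;
}

/* Returns true if a challenge ACK may be sent at time now_sec. */
static bool challenge_ack_allowed(struct ns_challenge_state *ns,
				  uint32_t now_sec)
{
	assert(ns != NULL && ns->limit >= 0);
	if (ns->limit == INT_MAX)
		return true;             /* no namespace-wide limiting */
	if (ns->window != now_sec) { /* new window: reset the budget */
		ns->window = now_sec;
		ns->count = 0;
	}
	if (ns->count >= (uint32_t)ns->limit)
		return false;
	ns->count++;
	return true;
}
```

Each namespace carries its own counter, so exhausting the budget in one namespace does not change what another namespace observes.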
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jason Baron <jbaron@akamai.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • tcp: annotate data-race around challenge_timestamp · 8c705212
      Eric Dumazet authored
      challenge_timestamp can be read and written by concurrent threads.
      
      This was expected, but we need to annotate the race to avoid potential issues.
      
      Following patch moves challenge_timestamp and challenge_count
      to per-netns storage to provide better isolation.
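
In userspace C, an intentionally racy but tolerated access to a shared timestamp can be expressed with relaxed C11 atomics. This is only an analogue chosen for illustration, not the kernel's annotation mechanism, and the names below are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Shared timestamp that several threads may read and write. */
static _Atomic uint32_t challenge_ts;

/* Marked read: a single, untorn load with no ordering guarantees. */
static uint32_t ts_read(void)
{
	return atomic_load_explicit(&challenge_ts, memory_order_relaxed);
}

/* Marked write: a single, untorn store with no ordering guarantees. */
static void ts_write(uint32_t now)
{
	atomic_store_explicit(&challenge_ts, now, memory_order_relaxed);
}

/* Update the timestamp if the second has changed. Returns 1 if updated.
 * Two threads may both update; that is acceptable in this sketch. */
static int ts_refresh(uint32_t now)
{
	if (ts_read() == now)
		return 0;
	ts_write(now);
	return 1;
}
```

The marked accesses document that concurrent use is expected and keep the compiler from tearing or duplicating the load and store.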
      
      Fixes: 354e4aa3 ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: dsa: hellcreek: Print warning only once · 52267ce2
      Kurt Kanzenbach authored
      In case the source port cannot be decoded, print the warning only once. This
      still brings attention to the user and does not spam the logs at the same time.
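
A warn-once pattern can be sketched in userspace as follows. `warn_once` and its flag argument are hypothetical and the helper is not thread-safe; it only illustrates suppressing repeats of the same warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print msg to stderr the first time it is called
 * with a given flag, then stay silent. Returns true if it printed. */
static bool warn_once(bool *warned, const char *msg)
{
	assert(warned != NULL && msg != NULL);
	if (*warned)
		return false;
	*warned = true;
	fprintf(stderr, "warning: %s\n", msg);
	return true;
}
```

A caller would keep one `static bool` flag per warning site and pass its address on every call, so the first failure is logged and later ones are dropped.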
      Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
      Link: https://lore.kernel.org/r/20220830163448.8921-1-kurt@linutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ip: fix triggering of 'icmp redirect' · eb55dc09
      Nicolas Dichtel authored
      __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
      My understanding is that fib_validate_source() is used to know if the src
      address and the gateway address are on the same link. For that,
      fib_validate_source() returns 1 (same link) or 0 (not the same network).
      __mkroute_input() is the only user of these positive values, all other
      callers only look if the returned value is negative.
      
      Since the patch below, fib_validate_source() no longer returns 1 when
      both addresses are on the same network, because the route lookup returns
      RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
      Let's adapt the test to return 1 again when both addresses are on the same
      link.
      
      CC: stable@vger.kernel.org
      Fixes: 747c1430 ("ip: fix dflt addr selection for connected nexthop")
      Reported-by: kernel test robot <yujie.liu@intel.com>
      Reported-by: Heng Qi <hengqi@linux.alibaba.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20220829100121.3821-1-nicolas.dichtel@6wind.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 31 Aug, 2022 16 commits