1. 09 Dec, 2021 1 commit
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 6efcdadc
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      bpf 2021-12-08
      
      We've added 12 non-merge commits during the last 22 day(s) which contain
      a total of 29 files changed, 659 insertions(+), 80 deletions(-).
      
      The main changes are:
      
      1) Fix an off-by-two error in packet range markings and also add a batch of
         new tests for coverage of these corner cases, from Maxim Mikityanskiy.
      
      2) Fix a compilation issue on MIPS JIT for R10000 CPUs, from Johan Almbladh.
      
      3) Fix two functional regressions and a build warning related to BTF kfunc
         for modules, from Kumar Kartikeya Dwivedi.
      
      4) Fix outdated code and docs regarding BPF's migrate_disable() use on non-
         PREEMPT_RT kernels, from Sebastian Andrzej Siewior.
      
      5) Add missing includes in order to be able to detangle cgroup vs bpf header
         dependencies, from Jakub Kicinski.
      
      6) Fix regression in BPF sockmap tests caused by missing detachment of progs
         from sockets when they are removed from the map, from John Fastabend.
      
      7) Fix a "no previous prototype" warning in x86 JIT caused by BPF
         dispatcher, from Björn Töpel.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf: Add selftests to cover packet access corner cases
        bpf: Fix the off-by-two error in range markings
        treewide: Add missing includes masked by cgroup -> bpf dependency
        tools/resolve_btfids: Skip unresolved symbol warning for empty BTF sets
        bpf: Fix bpf_check_mod_kfunc_call for built-in modules
        bpf: Make CONFIG_DEBUG_INFO_BTF depend upon CONFIG_BPF_SYSCALL
        mips, bpf: Fix reference to non-existing Kconfig symbol
        bpf: Make sure bpf_disable_instrumentation() is safe vs preemption.
        Documentation/locking/locktypes: Update migrate_disable() bits.
        bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap
        bpf, sockmap: Attach map progs to psock early for feature probes
        bpf, x86: Fix "no previous prototype" warning
      ====================
      
      Link: https://lore.kernel.org/r/20211208155125.11826-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 08 Dec, 2021 12 commits
  3. 07 Dec, 2021 13 commits
  4. 06 Dec, 2021 4 commits
  5. 04 Dec, 2021 2 commits
    • net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero · 2be6d4d1
      Lee Jones authored
      Currently, due to the sequential use of min_t() and clamp_t() macros,
      in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is not set, the logic
      sets tx_max to 0.  This is then used to allocate the data area of the
      SKB requested later in cdc_ncm_fill_tx_frame().
      
      This does not cause an issue presently because when memory is
      allocated during initialisation phase of SKB creation, more memory
      (512b) is allocated than is required for the SKB headers alone (320b),
      leaving some space (512b - 320b = 192b) for CDC data (172b).
      
      However, if more elements (for example 3 x u64 = [24b]) were added to
      one of the SKB header structs, say 'struct skb_shared_info',
      increasing its original size (320b [320b aligned]) to something larger
      (344b [384b aligned]), then suddenly the CDC data (172b) no longer
      fits in the spare SKB data area (512b - 384b = 128b).
      
      Consequently the SKB bounds checking fails and panics:
      
        skbuff: skb_over_panic: text:ffffffff830a5b5f len:184 put:172   \
           head:ffff888119227c00 data:ffff888119227c00 tail:0xb8 end:0x80 dev:<NULL>
      
        ------------[ cut here ]------------
        kernel BUG at net/core/skbuff.c:110!
        RIP: 0010:skb_panic+0x14f/0x160 net/core/skbuff.c:106
        <snip>
        Call Trace:
         <IRQ>
         skb_over_panic+0x2c/0x30 net/core/skbuff.c:115
         skb_put+0x205/0x210 net/core/skbuff.c:1877
         skb_put_zero include/linux/skbuff.h:2270 [inline]
         cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1116 [inline]
         cdc_ncm_fill_tx_frame+0x127f/0x3d50 drivers/net/usb/cdc_ncm.c:1293
         cdc_ncm_tx_fixup+0x98/0xf0 drivers/net/usb/cdc_ncm.c:1514
      
      By overriding the max value with the default CDC_NCM_NTB_MAX_SIZE_TX
      when not offered through the system provided params, we ensure enough
      data space is allocated to handle the CDC data, meaning no crash will
      occur.
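
      The effect of the min_t()/clamp_t() sequence can be sketched in
      user space. This is a simplified model, not the driver code: the
      helper names, the min_size parameter and the value chosen for
      NTB_MAX_SIZE_TX are assumptions made for illustration.

```c
#include <stdint.h>

/* Stand-in for CDC_NCM_NTB_MAX_SIZE_TX; the value is an assumption. */
#define NTB_MAX_SIZE_TX 32768u

/* Hypothetical helpers mirroring the behaviour of min_t()/clamp_t() for u32. */
uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Before the fix: an unset (zero) device limit collapses the result to 0. */
uint32_t check_tx_max_before(uint32_t new_tx, uint32_t min_size,
                             uint32_t dev_out_max)
{
    uint32_t max = min_u32(NTB_MAX_SIZE_TX, dev_out_max);
    uint32_t min = min_u32(min_size, max);

    return clamp_u32(new_tx, min, max);
}

/* After the fix: fall back to the default when the device offers no limit. */
uint32_t check_tx_max_after(uint32_t new_tx, uint32_t min_size,
                            uint32_t dev_out_max)
{
    uint32_t max = min_u32(NTB_MAX_SIZE_TX, dev_out_max);
    uint32_t min;

    if (max == 0)
        max = NTB_MAX_SIZE_TX; /* device limit not set */
    min = min_u32(min_size, max);

    return clamp_u32(new_tx, min, max);
}
```

      With dev_out_max set to 0, the first variant returns 0 for any
      requested size, while the second keeps the requested size within
      the default limit.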
      
      Cc: Oliver Neukum <oliver@neukum.org>
      Fixes: 289507d3 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning")
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
      Reviewed-by: Bjørn Mork <bjorn@mork.no>
      Link: https://lore.kernel.org/r/20211202143437.1411410-1-lee.jones@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • qede: validate non LSO skb length · 8e227b19
      Manish Chopra authored
      Although it is unlikely that the stack would transmit a non-LSO skb
      with length > MTU, in some environments such occurrences have
      resulted in firmware asserts because the packet length was greater
      than the maximum supported by the device (~9700B).
      
      This patch adds the safeguard for such odd cases to avoid firmware
      asserts.
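
      A minimal sketch of such a safeguard, assuming a simplified packet
      descriptor. The struct, the function name and the exact limit are
      hypothetical; the text above only gives the limit as roughly 9700
      bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit; the commit message only says "~9700B". */
#define MAX_NON_LSO_PKT_LEN 9700u

/* Hypothetical stand-in for the fields of an skb that matter here. */
struct pkt_model {
    size_t len;   /* total packet length in bytes */
    bool is_lso;  /* true if the packet will be segmented by the device */
};

/* Returns true if the packet should be dropped instead of being sent. */
bool should_drop_before_xmit(const struct pkt_model *pkt)
{
    return !pkt->is_lso && pkt->len > MAX_NON_LSO_PKT_LEN;
}
```

      LSO packets are exempt from the check because the device splits
      them into smaller segments before they reach the wire.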
      
      v2: Added a "Fixes" tag pointing at one of the initial driver commits
          that enabled TX traffic (this was probably a day-one issue that
          was only recently discovered in a customer environment)
      
      Fixes: a2ec6172 ("qede: Add support for link")
      Signed-off-by: Manish Chopra <manishc@marvell.com>
      Signed-off-by: Alok Prasad <palok@marvell.com>
      Signed-off-by: Prabhakar Kushwaha <pkushwaha@marvell.com>
      Signed-off-by: Ariel Elior <aelior@marvell.com>
      Link: https://lore.kernel.org/r/20211203174413.13090-1-manishc@marvell.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 03 Dec, 2021 8 commits
    • bpf: Fix the off-by-two error in range markings · 2fa7d94a
      Maxim Mikityanskiy authored
      The first commit cited below attempts to fix the off-by-one error that
      appeared in some comparisons with an open range. Due to this error,
      arithmetically equivalent pieces of code could get different verdicts
      from the verifier, for example (pseudocode):
      
        // 1. Passes the verifier:
        if (data + 8 > data_end)
            return early
        read *(u64 *)data, i.e. [data; data+7]
      
        // 2. Rejected by the verifier (should still pass):
        if (data + 7 >= data_end)
            return early
        read *(u64 *)data, i.e. [data; data+7]
      
      The attempted fix, however, shifts the range by one in the wrong
      direction, so the bug not only remains, but code like the following
      also starts failing in the verifier:
      
        // 3. Rejected by the verifier, but the check is stricter than in #1.
        if (data + 8 >= data_end)
            return early
        read *(u64 *)data, i.e. [data; data+7]
      
      The change performed by that fix converted an off-by-one bug into an
      off-by-two. The second commit cited below added BPF selftests
      written to ensure that code chunks like #3 are rejected; however,
      they should be accepted.
      
      This commit fixes the off-by-two error by adjusting new_range in the
      right direction and fixes the tests by changing the range into the
      one that should actually fail.
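
      The three behaviours described above can be modelled in a few
      lines. This is a simplified illustration of the arithmetic only,
      not the verifier code; the enum, the function names and the
      reduction of the check to an (offset, open/closed) pair are
      assumptions.

```c
#include <stdbool.h>

/* Hypothetical labels for the three states of the range computation. */
enum range_variant {
    RANGE_ORIGINAL,  /* open ranges treated like closed ones (off-by-one) */
    RANGE_FIRST_FIX, /* adjusted by one in the wrong direction (off-by-two) */
    RANGE_THIS_FIX   /* adjusted by one in the right direction */
};

/*
 * Number of bytes starting at `data` that the model considers safe to
 * read after the early-return branch was not taken.
 *   right_open == false: the check was `data + off >  data_end`
 *   right_open == true:  the check was `data + off >= data_end`
 */
int provable_range(int off, bool right_open, enum range_variant v)
{
    int new_range = off;

    if (right_open) {
        if (v == RANGE_FIRST_FIX)
            new_range--;
        else if (v == RANGE_THIS_FIX)
            new_range++;
    }
    return new_range;
}

/* A read of `size` bytes at `data` is accepted if it fits in the range. */
bool access_allowed(int off, bool right_open, int size, enum range_variant v)
{
    return size <= provable_range(off, right_open, v);
}
```

      In this model, example #2 (off = 7, open range, 8-byte read) is
      rejected by the first two variants and accepted by the last one,
      and example #3 (off = 8, open range) is rejected only by the
      attempted fix.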
      
      Fixes: fb2a311a ("bpf: fix off by one for range markings with L{T, E} patterns")
      Fixes: b37242c7 ("bpf: add test cases to bpf selftests to cover all access tests")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20211130181607.593149-1-maximmi@nvidia.com
    • treewide: Add missing includes masked by cgroup -> bpf dependency · 8581fd40
      Jakub Kicinski authored
      cgroup.h (therefore swap.h, therefore half of the universe)
      includes bpf.h which in turn includes module.h and slab.h.
      Since we're about to get rid of that dependency we need
      to clean things up.
      
      v2: drop the cpu.h include from cacheinfo.h, it's not necessary
      and it makes riscv sensitive to ordering of include files.

      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Krzysztof Wilczyński <kw@linux.com>
      Acked-by: Peter Chen <peter.chen@kernel.org>
      Acked-by: SeongJae Park <sj@kernel.org>
      Acked-by: Jani Nikula <jani.nikula@intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Link: https://lore.kernel.org/all/20211120035253.72074-1-kuba@kernel.org/  # v1
      Link: https://lore.kernel.org/all/20211120165528.197359-1-kuba@kernel.org/ # cacheinfo discussion
      Link: https://lore.kernel.org/bpf/20211202203400.1208663-1-kuba@kernel.org
    • net: altera: set a couple error code in probe() · badd7857
      Dan Carpenter authored
      There are two error paths which accidentally return success instead of
      a negative error code.
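
      The bug pattern can be illustrated with a reduced probe function.
      This is a generic sketch rather than the Altera driver code: the
      function names, the resource_ok flag and the choice of -ENODEV are
      assumptions.

```c
#include <errno.h>
#include <stdbool.h>

/* Buggy shape: the error path is taken while ret still holds 0. */
int probe_buggy(bool resource_ok)
{
    int ret = 0;

    if (!resource_ok)
        goto err; /* ret was never set, so the caller sees success */
    return 0;
err:
    /* cleanup would happen here */
    return ret;
}

/* Fixed shape: assign a negative error code before jumping to cleanup. */
int probe_fixed(bool resource_ok)
{
    int ret = 0;

    if (!resource_ok) {
        ret = -ENODEV; /* arbitrary error code chosen for the sketch */
        goto err;
    }
    return 0;
err:
    /* cleanup would happen here */
    return ret;
}
```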
      
      Fixes: bbd2190c ("Altera TSE: Add main and header file for Altera Ethernet Driver")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcm4908: Handle dma_set_coherent_mask error codes · 128f6ec9
      Jiasheng Jiang authored
      dma_set_coherent_mask() does not always return 0. Check its return
      value to catch the case where DMA does not support the mask.
      
      Fixes: 9d61d138 ("net: broadcom: rename BCM4908 driver & update DT binding")
      Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: net/fcnal-test.sh: add exit code · 0f8a3b48
      Li Zhijian authored
      Previously, the selftest framework always treated the script as *ok*
      even when some of its tests had actually failed, because the script
      always returned 0.

      It now supports PASS/FAIL/SKIP exit codes.
      
      CC: Philip Li <philip.li@intel.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Li Zhijian <zhijianx.li@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: make tx_rebalance_counter an atomic · dac8e00f
      Eric Dumazet authored
      KCSAN reported a data-race [1] around tx_rebalance_counter
      which can be accessed from different contexts, without
      the protection of a lock/mutex.
      
      [1]
      BUG: KCSAN: data-race in bond_alb_init_slave / bond_alb_monitor
      
      write to 0xffff888157e8ca24 of 4 bytes by task 7075 on cpu 0:
       bond_alb_init_slave+0x713/0x860 drivers/net/bonding/bond_alb.c:1613
       bond_enslave+0xd94/0x3010 drivers/net/bonding/bond_main.c:1949
       do_set_master net/core/rtnetlink.c:2521 [inline]
       __rtnl_newlink net/core/rtnetlink.c:3475 [inline]
       rtnl_newlink+0x1298/0x13b0 net/core/rtnetlink.c:3506
       rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5571
       netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2491
       rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5589
       netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
       netlink_unicast+0x5fc/0x6c0 net/netlink/af_netlink.c:1345
       netlink_sendmsg+0x6e1/0x7d0 net/netlink/af_netlink.c:1916
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg net/socket.c:724 [inline]
       ____sys_sendmsg+0x39a/0x510 net/socket.c:2409
       ___sys_sendmsg net/socket.c:2463 [inline]
       __sys_sendmsg+0x195/0x230 net/socket.c:2492
       __do_sys_sendmsg net/socket.c:2501 [inline]
       __se_sys_sendmsg net/socket.c:2499 [inline]
       __x64_sys_sendmsg+0x42/0x50 net/socket.c:2499
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      read to 0xffff888157e8ca24 of 4 bytes by task 1082 on cpu 1:
       bond_alb_monitor+0x8f/0xc00 drivers/net/bonding/bond_alb.c:1511
       process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
       worker_thread+0x616/0xa70 kernel/workqueue.c:2445
       kthread+0x2c7/0x2e0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      value changed: 0x00000001 -> 0x00000064
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 1082 Comm: kworker/u4:3 Not tainted 5.16.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: bond1 bond_alb_monitor
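
      The idea behind the fix can be sketched in user space with C11
      atomics standing in for the kernel's atomic_t. The struct, the
      function names and the threshold value are hypothetical; only the
      counter name comes from the report above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical number of monitor ticks between rebalances. */
#define REBALANCE_TICKS 10

struct alb_info_model {
    atomic_int tx_rebalance_counter;
};

/* Writer context: request a rebalance on the next monitor tick. */
void alb_init_slave_model(struct alb_info_model *info)
{
    atomic_store(&info->tx_rebalance_counter, REBALANCE_TICKS);
}

/*
 * Reader/updater context (periodic work): returns true when a rebalance
 * is due. The increment and the read happen as one atomic operation.
 */
bool alb_monitor_tick_model(struct alb_info_model *info)
{
    int ticks = atomic_fetch_add(&info->tx_rebalance_counter, 1) + 1;

    if (ticks >= REBALANCE_TICKS) {
        atomic_store(&info->tx_rebalance_counter, 0);
        return true;
    }
    return false;
}
```

      Because every access goes through an atomic operation, a
      concurrent store from one context and increment from another no
      longer constitute a data race.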
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix another uninit-value (sk_rx_queue_mapping) · 03cfda4f
      Eric Dumazet authored
      KMSAN is still not happy [1].
      
      I missed that passive connections do not inherit their
      sk_rx_queue_mapping values from the request socket;
      instead, tcp_child_process() calls
      sk_mark_napi_id(child, skb).

      We have many sk_mark_napi_id() callers, so I am providing
      a new helper that forces the setting of sk_rx_queue_mapping
      and sk_napi_id.

      Note that we had no KMSAN report for sk_napi_id because
      passive connections got a copy of this field from the listener.
      sk_rx_queue_mapping, on the other hand, is inside the
      sk_dontcopy_begin/sk_dontcopy_end section, so sk_clone_lock()
      leaves this field uninitialized.
      
      We might remove dead code populating req->sk_rx_queue_mapping
      in the future.
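
      The difference between an update-if-changed helper and a helper
      that forces the setting can be sketched as follows. The struct and
      function names are hypothetical simplifications, not the kernel
      definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two socket fields discussed above. */
struct sock_model {
    uint16_t rx_queue_mapping;
    unsigned int napi_id;
};

/*
 * Update-if-changed: avoids dirtying the fields, but reads them first,
 * which is only valid once they have been initialised.
 */
void mark_napi_id_update(struct sock_model *sk, unsigned int napi_id,
                         uint16_t queue)
{
    if (sk->napi_id != napi_id)
        sk->napi_id = napi_id;
    if (sk->rx_queue_mapping != queue)
        sk->rx_queue_mapping = queue;
}

/*
 * Unconditional set: never reads the old values, so it is safe to call
 * on a freshly cloned socket whose fields were left uninitialised.
 */
void mark_napi_id_set(struct sock_model *sk, unsigned int napi_id,
                      uint16_t queue)
{
    sk->napi_id = napi_id;
    sk->rx_queue_mapping = queue;
}
```

      In this model, the first call on a new child socket would use the
      unconditional variant, and later calls could use the
      update-if-changed variant.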
      
      [1]
      
      BUG: KMSAN: uninit-value in __sk_rx_queue_set include/net/sock.h:1924 [inline]
      BUG: KMSAN: uninit-value in sk_rx_queue_update include/net/sock.h:1938 [inline]
      BUG: KMSAN: uninit-value in sk_mark_napi_id include/net/busy_poll.h:136 [inline]
      BUG: KMSAN: uninit-value in tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       __sk_rx_queue_set include/net/sock.h:1924 [inline]
       sk_rx_queue_update include/net/sock.h:1938 [inline]
       sk_mark_napi_id include/net/busy_poll.h:136 [inline]
       tcp_child_process+0xb42/0x1050 net/ipv4/tcp_minisocks.c:833
       tcp_v4_rcv+0x3d83/0x4ed0 net/ipv4/tcp_ipv4.c:2066
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
       run_ksoftirqd+0x33/0x50 kernel/softirq.c:920
       smpboot_thread_fn+0x616/0xbf0 kernel/smpboot.c:164
       kthread+0x721/0x850 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30
      
      Uninit was created at:
       __alloc_pages+0xbc7/0x10a0 mm/page_alloc.c:5409
       alloc_pages+0x8a5/0xb80
       alloc_slab_page mm/slub.c:1810 [inline]
       allocate_slab+0x287/0x1c20 mm/slub.c:1947
       new_slab mm/slub.c:2010 [inline]
       ___slab_alloc+0xbdf/0x1e90 mm/slub.c:3039
       __slab_alloc mm/slub.c:3126 [inline]
       slab_alloc_node mm/slub.c:3217 [inline]
       slab_alloc mm/slub.c:3259 [inline]
       kmem_cache_alloc+0xbb3/0x11c0 mm/slub.c:3264
       sk_prot_alloc+0xeb/0x570 net/core/sock.c:1914
       sk_clone_lock+0xd6/0x1940 net/core/sock.c:2118
       inet_csk_clone_lock+0x8d/0x6a0 net/ipv4/inet_connection_sock.c:956
       tcp_create_openreq_child+0xb1/0x1ef0 net/ipv4/tcp_minisocks.c:453
       tcp_v4_syn_recv_sock+0x268/0x2710 net/ipv4/tcp_ipv4.c:1563
       tcp_check_req+0x207c/0x2a30 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x36f5/0x4ed0 net/ipv4/tcp_ipv4.c:2047
       ip_protocol_deliver_rcu+0x760/0x10b0 net/ipv4/ip_input.c:204
       ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       ip_local_deliver+0x584/0x8c0 net/ipv4/ip_input.c:252
       dst_input include/net/dst.h:460 [inline]
       ip_sublist_rcv_finish net/ipv4/ip_input.c:551 [inline]
       ip_list_rcv_finish net/ipv4/ip_input.c:601 [inline]
       ip_sublist_rcv+0x11fd/0x1520 net/ipv4/ip_input.c:609
       ip_list_rcv+0x95f/0x9a0 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5505 [inline]
       __netif_receive_skb_list_core+0xe34/0x1240 net/core/dev.c:5553
       __netif_receive_skb_list+0x7fc/0x960 net/core/dev.c:5605
       netif_receive_skb_list_internal+0x868/0xde0 net/core/dev.c:5696
       gro_normal_list net/core/dev.c:5850 [inline]
       napi_complete_done+0x579/0xdd0 net/core/dev.c:6587
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0x17b6/0x2350 drivers/net/virtio_net.c:1557
       __napi_poll+0x14e/0xbc0 net/core/dev.c:7020
       napi_poll net/core/dev.c:7087 [inline]
       net_rx_action+0x824/0x1880 net/core/dev.c:7174
       __do_softirq+0x1fe/0x7eb kernel/softirq.c:558
      
      Fixes: 342159ee ("net: avoid dirtying sk->sk_rx_queue_mapping")
      Fixes: a37a0ee4 ("net: avoid uninit-value from tcp_conn_request")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Tested-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: use #ifdef CONFIG_SOCK_RX_QUEUE_MAPPING consistently · a9418924
      Eric Dumazet authored
      Since commit 4e1beecc ("net/sock: Add kernel config
      SOCK_RX_QUEUE_MAPPING"),
      sk_rx_queue_mapping access is guarded by CONFIG_SOCK_RX_QUEUE_MAPPING.
      
      Fixes: 54b92e84 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Tariq Toukan <tariqt@nvidia.com>
      Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>