1. 16 Dec, 2021 21 commits
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 0c3e2474
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2021-12-16
      
      We've added 15 non-merge commits during the last 7 day(s) which contain
      a total of 12 files changed, 434 insertions(+), 30 deletions(-).
      
      The main changes are:
      
      1) Fix incorrect verifier state pruning behavior for <8B register spill/fill,
         from Paul Chaignon.
      
      2) Fix x86-64 JIT's extable handling for fentry/fexit when return pointer
         is an ERR_PTR(), from Alexei Starovoitov.
      
      3) Fix 3 different possibilities that BPF verifier missed where unprivileged
         could leak kernel addresses, from Daniel Borkmann.
      
      4) Fix xsk's poll behavior under need_wakeup flag, from Magnus Karlsson.
      
      5) Fix an oob-write in test_verifier due to a missed MAX_NR_MAPS bump,
         from Kumar Kartikeya Dwivedi.
      
      6) Fix a race in test_btf_skc_cls_ingress selftest, from Martin KaFai Lau.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        bpf, selftests: Fix racing issue in btf_skc_cls_ingress test
        selftest/bpf: Add a test that reads various addresses.
        bpf: Fix extable address check.
        bpf: Fix extable fixup offset.
        bpf, selftests: Add test case trying to taint map value pointer
        bpf: Make 32->64 bounds propagation slightly more robust
        bpf: Fix signed bounds propagation after mov32
        bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer
        bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg
        bpf, selftests: Add test case for atomic fetch on spilled pointer
        bpf: Fix kernel address leakage in atomic fetch
        selftests/bpf: Fix OOB write in test_verifier
        xsk: Do not sleep in poll() when need_wakeup set
        selftests/bpf: Tests for state pruning with u32 spill/fill
        bpf: Fix incorrect state pruning for <8B spill/fill
      ====================
      
      Link: https://lore.kernel.org/r/20211216210005.13815-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bpf, selftests: Fix racing issue in btf_skc_cls_ingress test · c2fcbf81
      Martin KaFai Lau authored
      The libbpf CI reported occasional failure in btf_skc_cls_ingress:
      
        test_syncookie:FAIL:Unexpected syncookie states gen_cookie:80326634 recv_cookie:0
        bpf prog error at line 97
      
      "error at line 97" means the bpf prog cannot find the listening socket
      when the final ack is received.  It then skipped processing
      the syncookie in the final ack which then led to "recv_cookie:0".
      
      The problem is the userspace program did not do accept() and went
      ahead to close(listen_fd) before the kernel (and the bpf prog) had
      a chance to process the final ack.
      
      The fix is to add accept() call so that the userspace will wait for
      the kernel to finish processing the final ack first before close()-ing
      everything.
      
      Fixes: 9a856cae ("bpf: selftest: Add test_btf_skc_cls_ingress")
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20211216191630.466151-1-kafai@fb.com
    • selftest/bpf: Add a test that reads various addresses. · 7edc3fcb
      Alexei Starovoitov authored
      Add a function to bpf_testmod that returns invalid kernel and user addresses.
      Then attach an fexit program to that function that tries to read
      memory through these addresses.
      
      This logic checks that bpf_probe_read_kernel and BPF_PROBE_MEM logic is sane.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Fix extable address check. · 588a25e9
      Alexei Starovoitov authored
      The verifier checks that PTR_TO_BTF_ID pointer is either valid or NULL,
      but it cannot distinguish IS_ERR pointer from valid one.
      
      When offset is added to IS_ERR pointer it may become small positive
      value which is a user address that is not handled by extable logic
      and has to be checked for at the runtime.
      
      Tighten BPF_PROBE_MEM pointer check code to prevent this case.
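
As a rough illustration of the wraparound described above, here is a userspace sketch (not the JIT or verifier code). The helper names `err_ptr_value` and `add_offset` and the sample values are assumptions made up for this example. An ERR_PTR()-style value sits at the very top of the 64-bit range, so adding a modest field offset wraps past zero into a small positive address:

```c
#include <stdint.h>

/* An ERR_PTR()-style value stores a small negative errno in a
 * pointer-sized integer, so it lands at the top of the address space. */
static uint64_t err_ptr_value(int64_t neg_errno)
{
    return (uint64_t)neg_errno;
}

/* Adding a field offset is modular 64-bit arithmetic and can wrap
 * past zero into the low part of the address space. */
static uint64_t add_offset(uint64_t base, uint64_t off)
{
    return base + off;
}
```

With an error value of -2 and an offset of 16, the result is 14, which looks like a low user-space address rather than a kernel one.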
      
      Fixes: 4c5de127 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
      Reported-by: Lorenzo Fontana <lorenzo.fontana@elastic.co>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Fix extable fixup offset. · 433956e9
      Alexei Starovoitov authored
      The value prog - start_of_ldx is the offset from the start of the faulting
      ldx to the location right after it. It is used to adjust pt_regs->ip so that
      execution jumps over the faulting instruction and continues. With the old
      temp value, the fixup would have landed at the wrong offset, causing a crash.
      
      Fixes: 4c5de127 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf, selftests: Add test case trying to taint map value pointer · b1a7288d
      Daniel Borkmann authored
      Add a test case which tries to taint map value pointer arithmetic into an
      unknown scalar with subsequent export through the map.
      
      Before fix:
      
        # ./test_verifier 1186
        #1186/u map access: trying to leak tained dst reg FAIL
        Unexpected success to load!
        verification time 24 usec
        stack depth 8
        processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
        #1186/p map access: trying to leak tained dst reg FAIL
        Unexpected success to load!
        verification time 8 usec
        stack depth 8
        processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
        Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
      
      After fix:
      
        # ./test_verifier 1186
        #1186/u map access: trying to leak tained dst reg OK
        #1186/p map access: trying to leak tained dst reg OK
        Summary: 2 PASSED, 0 SKIPPED, 0 FAILED
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Make 32->64 bounds propagation slightly more robust · e572ff80
      Daniel Borkmann authored
      Make the bounds propagation in __reg_assign_32_into_64() slightly more
      robust and readable by aligning it similarly as we did back in the
      __reg_combine_64_into_32() counterpart. Meaning, only propagate or
      pessimize them as a smin/smax pair.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix signed bounds propagation after mov32 · 3cf2b61e
      Daniel Borkmann authored
      For the case where both s32_{min,max}_value bounds are positive, the
      __reg_assign_32_into_64() directly propagates them to their 64 bit
      counterparts, otherwise it pessimises them into [0,u32_max] universe and
      tries to refine them later on by learning through the tnum as per comment
      in mentioned function. However, that does not always happen, for example,
      in mov32 operation we call zext_32_to_64(dst_reg) which invokes the
      __reg_assign_32_into_64() as is without subsequent bounds update as
      elsewhere thus no refinement based on tnum takes place.
      
      Thus, not calling into the __update_reg_bounds() / __reg_deduce_bounds() /
      __reg_bound_offset() triplet as we do, for example, in case of ALU ops via
      adjust_scalar_min_max_vals(), will lead to more pessimistic bounds when
      dumping the full register state:
      
      Before fix:
      
        0: (b4) w0 = -1
        1: R0_w=invP4294967295
           (id=0,imm=ffffffff,
            smin_value=4294967295,smax_value=4294967295,
            umin_value=4294967295,umax_value=4294967295,
            var_off=(0xffffffff; 0x0),
            s32_min_value=-1,s32_max_value=-1,
            u32_min_value=-1,u32_max_value=-1)
      
        1: (bc) w0 = w0
        2: R0_w=invP4294967295
           (id=0,imm=ffffffff,
            smin_value=0,smax_value=4294967295,
            umin_value=4294967295,umax_value=4294967295,
            var_off=(0xffffffff; 0x0),
            s32_min_value=-1,s32_max_value=-1,
            u32_min_value=-1,u32_max_value=-1)
      
      Technically, the smin_value=0 and smax_value=4294967295 bounds are not
      incorrect, but given the register is still a constant, they break assumptions
      about const scalars that smin_value == smax_value and umin_value == umax_value.
      
      After fix:
      
        0: (b4) w0 = -1
        1: R0_w=invP4294967295
           (id=0,imm=ffffffff,
            smin_value=4294967295,smax_value=4294967295,
            umin_value=4294967295,umax_value=4294967295,
            var_off=(0xffffffff; 0x0),
            s32_min_value=-1,s32_max_value=-1,
            u32_min_value=-1,u32_max_value=-1)
      
        1: (bc) w0 = w0
        2: R0_w=invP4294967295
           (id=0,imm=ffffffff,
            smin_value=4294967295,smax_value=4294967295,
            umin_value=4294967295,umax_value=4294967295,
            var_off=(0xffffffff; 0x0),
            s32_min_value=-1,s32_max_value=-1,
            u32_min_value=-1,u32_max_value=-1)
      
      Without the smin_value == smax_value and umin_value == umax_value invariant
      being intact for const scalars, it is possible to leak out kernel pointers
      from unprivileged user space if the latter is enabled. For example, when such
      registers are involved in pointer arithmetic, then adjust_ptr_min_max_vals()
      will taint the destination register into an unknown scalar, and the latter
      can be exported and stored e.g. into a BPF map value.
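
The before/after register dumps can be reproduced with a small userspace model. This is a simplified sketch, not the verifier's code. `struct bounds`, `assign_32_into_64`, and `refine_signed_from_unsigned` are hypothetical names, and the refinement step is only a stand-in for the __update_reg_bounds() / __reg_deduce_bounds() / __reg_bound_offset() triplet named above:

```c
#include <stdbool.h>
#include <stdint.h>

struct bounds {
    int64_t smin, smax;        /* signed 64-bit bounds */
    uint64_t umin, umax;       /* unsigned 64-bit bounds */
    int32_t s32_min, s32_max;  /* signed 32-bit bounds */
};

/* Model of the propagation described above: the signed 32-bit bounds are
 * copied only when both are non-negative, otherwise the signed 64-bit
 * bounds are pessimised to [0, U32_MAX]. */
static void assign_32_into_64(struct bounds *b)
{
    if (b->s32_min >= 0 && b->s32_max >= 0) {
        b->smin = b->s32_min;
        b->smax = b->s32_max;
    } else {
        b->smin = 0;
        b->smax = UINT32_MAX;
    }
}

/* Stand-in refinement: when the unsigned range fits in the non-negative
 * signed range, tighten the signed bounds to it. */
static void refine_signed_from_unsigned(struct bounds *b)
{
    if (b->umax <= (uint64_t)INT64_MAX) {
        if ((int64_t)b->umin > b->smin)
            b->smin = (int64_t)b->umin;
        if ((int64_t)b->umax < b->smax)
            b->smax = (int64_t)b->umax;
    }
}

/* The invariant expected of a const scalar. */
static bool const_invariant_holds(const struct bounds *b)
{
    return b->smin == b->smax && b->umin == b->umax;
}
```

Starting from the state of `w0 = -1`, calling only `assign_32_into_64()` yields smin=0 and smax=4294967295, which breaks the invariant. Running the refinement afterwards restores smin == smax == 4294967295.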
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Reported-by: Kuee K1r0a <liulin063@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • sit: do not call ipip6_dev_free() from sit_init_net() · e28587cc
      Eric Dumazet authored
      ipip6_dev_free is sit dev->priv_destructor, already called
      by register_netdevice() if something goes wrong.
      
      Alternative would be to make ipip6_dev_free() robust against
      multiple invocations, but other drivers do not implement this
      strategy.
      
      syzbot reported:
      
      dst_release underflow
      WARNING: CPU: 0 PID: 5059 at net/core/dst.c:173 dst_release+0xd8/0xe0 net/core/dst.c:173
      Modules linked in:
      CPU: 1 PID: 5059 Comm: syz-executor.4 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:dst_release+0xd8/0xe0 net/core/dst.c:173
      Code: 4c 89 f2 89 d9 31 c0 5b 41 5e 5d e9 da d5 44 f9 e8 1d 90 5f f9 c6 05 87 48 c6 05 01 48 c7 c7 80 44 99 8b 31 c0 e8 e8 67 29 f9 <0f> 0b eb 85 0f 1f 40 00 53 48 89 fb e8 f7 8f 5f f9 48 83 c3 a8 48
      RSP: 0018:ffffc9000aa5faa0 EFLAGS: 00010246
      RAX: d6894a925dd15a00 RBX: 00000000ffffffff RCX: 0000000000040000
      RDX: ffffc90005e19000 RSI: 000000000003ffff RDI: 0000000000040000
      RBP: 0000000000000000 R08: ffffffff816a1f42 R09: ffffed1017344f2c
      R10: ffffed1017344f2c R11: 0000000000000000 R12: 0000607f462b1358
      R13: 1ffffffff1bfd305 R14: ffffe8ffffcb1358 R15: dffffc0000000000
      FS:  00007f66c71a2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f88aaed5058 CR3: 0000000023e0f000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       dst_cache_destroy+0x107/0x1e0 net/core/dst_cache.c:160
       ipip6_dev_free net/ipv6/sit.c:1414 [inline]
       sit_init_net+0x229/0x550 net/ipv6/sit.c:1936
       ops_init+0x313/0x430 net/core/net_namespace.c:140
       setup_net+0x35b/0x9d0 net/core/net_namespace.c:326
       copy_net_ns+0x359/0x5c0 net/core/net_namespace.c:470
       create_new_namespaces+0x4ce/0xa00 kernel/nsproxy.c:110
       unshare_nsproxy_namespaces+0x11e/0x180 kernel/nsproxy.c:226
       ksys_unshare+0x57d/0xb50 kernel/fork.c:3075
       __do_sys_unshare kernel/fork.c:3146 [inline]
       __se_sys_unshare kernel/fork.c:3144 [inline]
       __x64_sys_unshare+0x34/0x40 kernel/fork.c:3144
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f66c882ce99
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f66c71a2168 EFLAGS: 00000246 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 00007f66c893ff60 RCX: 00007f66c882ce99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000048040200
      RBP: 00007f66c8886ff1 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007fff6634832f R14: 00007f66c71a2300 R15: 0000000000022000
       </TASK>
      
      Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Link: https://lore.kernel.org/r/20211216111741.1387540-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: systemport: Add global locking for descriptor lifecycle · 8b8e6e78
      Florian Fainelli authored
      The descriptor list is a shared resource across all of the transmit queues, and
      the locking mechanism used today only protects concurrency across a given
      transmit queue between the transmit and reclaiming. This creates an opportunity
      for the SYSTEMPORT hardware to work on corrupted descriptors if we have
      multiple producers at once which is the case when using multiple transmit
      queues.
      
      This was particularly noticeable when using multiple flows/transmit queues and
      it showed up in interesting ways in that UDP packets would get a correct UDP
      header checksum being calculated over an incorrect packet length. Similarly TCP
      packets would get an equally correct checksum computed by the hardware over an
      incorrect packet length.
      
      The SYSTEMPORT hardware maintains an internal descriptor list that it re-arranges
      when the driver produces a new descriptor anytime it writes to the
      WRITE_PORT_{HI,LO} registers, there is however some delay in the hardware to
      re-organize its descriptors and it is possible that concurrent TX queues
      eventually break this internal allocation scheme to the point where the
      length/status part of the descriptor gets used for an incorrect data buffer.
      
      The fix is to impose a global serialization for all TX queues in the short
      section where we are writing to the WRITE_PORT_{HI,LO} registers which solves
      the corruption even with multiple concurrent TX queues being used.
      
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20211215202450.4086240-1-f.fainelli@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/smc: Prevent smc_release() from long blocking · 5c15b312
      D. Wythe authored
      In an nginx/wrk benchmark, the client hangs with high probability in a
      case like the following (the client takes several minutes to exit):
      
      server: smc_run nginx
      
      client: smc_run wrk -c 10000 -t 1 http://server
      
      Client hangs with the following backtrace:
      
      0 [ffffa7ce80f3bbf8] __schedule at ffffffff9f9e0d5f
      1 [ffffa7ce80f3bc88] schedule at ffffffff9f9e10e6
      2 [ffffa7ce80f3bca0] schedule_timeout at ffffffff9f9e3f3c
      3 [ffffa7ce80f3bd20] wait_for_common at ffffffff9f9e19de
      4 [ffffa7ce80f3bd80] __flush_work at ffffffff9f0fe013
      5 [ffffa7ce80f3bdf0] smc_release at ffffffffc0697d24 [smc]
      6 [ffffa7ce80f3be20] __sock_release at ffffffff9f802e2d
      7 [ffffa7ce80f3be40] sock_close at ffffffff9f802eb1
      8 [ffffa7ce80f3be48] __fput at ffffffff9f334f93
      9 [ffffa7ce80f3be78] task_work_run at ffffffff9f101ff5
      10 [ffffa7ce80f3bea0] do_exit at ffffffff9f0e5012
      11 [ffffa7ce80f3bf10] do_group_exit at ffffffff9f0e592a
      12 [ffffa7ce80f3bf38] __x64_sys_exit_group at ffffffff9f0e5994
      13 [ffffa7ce80f3bf40] do_syscall_64 at ffffffff9f9d4373
      14 [ffffa7ce80f3bf50] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c
      
      This issue is due to flush_work(), which is used to wait for
      smc_connect_work() to finish in smc_release(). When many
      smc_connect_work() items are pending, or all executing work items are
      hanging, smc_release() has to block until a worker becomes free, which
      is equivalent to waiting for another smc_connect_work() to finish.
      
      To fix this, there are two changes:
      
      1. For an idle smc_connect_work(), cancel it from the workqueue; for an
         executing smc_connect_work(), wait for it to finish. For that
         purpose, replace flush_work() with cancel_work_sync().
      
      2. Since smc_connect() holds a reference for passive closing, release
         the reference if smc_connect_work() has been cancelled.
      
      Fixes: 24ac3a08 ("net/smc: rebuild nonblocking connect")
      Reported-by: Tony Lu <tonylu@linux.alibaba.com>
      Tested-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
      Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: Fix double 0x prefix print in SKB dump · 8a03ef67
      Gal Pressman authored
      When printing netdev features, %pNF already takes care of the 0x prefix,
      so remove the explicit one.
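
A userspace analogy may help picture the doubled prefix. %pNF is kernel-only, so this sketch assumes the standard `%#llx` conversion as a stand-in, since it also emits `0x` by itself. The function names are hypothetical:

```c
#include <stddef.h>
#include <stdio.h>

/* Explicit "0x" in front of a conversion that already prints one. */
static int format_features_buggy(char *buf, size_t len,
                                 unsigned long long features)
{
    return snprintf(buf, len, "0x%#llx", features);
}

/* Let the conversion supply the prefix by itself. */
static int format_features_fixed(char *buf, size_t len,
                                 unsigned long long features)
{
    return snprintf(buf, len, "%#llx", features);
}
```

For a non-zero value such as 0x4000, the first variant produces `0x0x4000` and the second produces `0x4000`.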
      
      Fixes: 6413139d ("skbuff: increase verbosity when dumping skb data")
      Signed-off-by: Gal Pressman <gal@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: fix rx_drops stat for small pkts · 053c9e18
      Wenliang Wang authored
      We found that the rx drops stat for small packets does not increment when
      build_skb() fails, which is inconsistent with the rx drops stat of the other modes.
      Signed-off-by: Wenliang Wang <wangwenliang.1995@bytedance.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dsa: mv88e6xxx: fix debug print for SPEED_UNFORCED · e08cdf63
      Andrey Eremeev authored
      Debug print uses invalid check to detect if speed is unforced:
      (speed != SPEED_UNFORCED) should be used instead of (!speed).
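
The difference between the two checks can be sketched as follows. The sentinel value -2 and both helper names are assumptions for illustration only. The point is that a non-zero sentinel is never caught by a plain `!speed` test:

```c
#include <stdbool.h>

#define SPEED_UNFORCED (-2) /* assumed sentinel value, illustration only */

/* Treats only speed == 0 as "unforced", so the sentinel is missed. */
static bool speed_is_unforced_buggy(int speed)
{
    return !speed;
}

/* Compares against the sentinel explicitly. */
static bool speed_is_unforced_fixed(int speed)
{
    return speed == SPEED_UNFORCED;
}
```

With the first helper, the debug print would report the sentinel as a forced speed. The second helper reports it as unforced.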
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      Signed-off-by: Andrey Eremeev <Axtone4all@yandex.ru>
      Fixes: 96a2b40c ("net: dsa: mv88e6xxx: add port's MAC speed setter")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc_ef100: potential dereference of null pointer · 407ecd1b
      Jiasheng Jiang authored
      The return value of kmalloc() needs to be checked, to avoid using a NULL
      pointer in efx_nic_update_stats() if the allocation fails.
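
The pattern is the usual check-before-use. Below is a minimal userspace sketch, with malloc() standing in for kmalloc(). `update_stats` is a hypothetical function for this example, not the driver's:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Copies `count` counters through a temporary buffer. Returns 0 on
 * success and -1 when the allocation fails, leaving `out` untouched. */
static int update_stats(const uint64_t *hw, uint64_t *out, size_t count)
{
    uint64_t *tmp = malloc(count * sizeof(*tmp));
    size_t i;

    if (!tmp)   /* bail out before the buffer is ever dereferenced */
        return -1;

    for (i = 0; i < count; i++)
        tmp[i] = hw[i];
    for (i = 0; i < count; i++)
        out[i] = tmp[i];

    free(tmp);
    return 0;
}
```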
      
      Fixes: b593b6f1 ("sfc_ef100: statistics gathering")
      Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: dwmac-rk: fix oob read in rk_gmac_setup · 0546b224
      John Keeping authored
      KASAN reports an out-of-bounds read in rk_gmac_setup on the line:
      
      	while (ops->regs[i]) {
      
      This happens for most platforms since the regs flexible array member is
      empty, so the memory after the ops structure is being read here.  It
      seems that mostly this happens to contain zero anyway, so we get lucky
      and everything still works.
      
      To avoid adding redundant data to nearly all the ops structures, add a
      new flag to indicate whether the regs field is valid and avoid this loop
      when it is not.
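
A sketch of the flag-guarded loop, assuming a simplified structure. `struct gmac_ops`, `count_regs`, and the register values are placeholders rather than the driver's definitions. Statically initialising a flexible array member relies on a GNU C extension:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct gmac_ops {
    const char *name;
    bool regs_valid;   /* set only when regs[] is actually populated */
    uint32_t regs[];   /* zero-terminated when present */
};

/* Walk regs[] only when the flag says the array exists. */
static size_t count_regs(const struct gmac_ops *ops)
{
    size_t i = 0;

    if (!ops->regs_valid)
        return 0;
    while (ops->regs[i])
        i++;
    return i;
}

/* No regs[] storage at all: reading regs[0] here would be out of bounds. */
static const struct gmac_ops no_regs_ops = {
    .name = "no-regs",
    .regs_valid = false,
};

/* Populated, zero-terminated array (placeholder values). */
static const struct gmac_ops with_regs_ops = {
    .name = "with-regs",
    .regs_valid = true,
    .regs = { 0x1000, 0x2000, 0 },
};
```

The design trade-off matches the commit text: one boolean per ops structure is cheaper than adding a terminating zero entry to every structure that has no registers.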
      
      Fixes: 3bb3d6b1 ("net: stmmac: Add RK3566/RK3568 SoC support")
      Signed-off-by: John Keeping <john@metanate.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 6209dd77
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-12-15
      
      This series contains updates to igb, igbvf, igc and ixgbe drivers.
      
      Karen moves checks for invalid VF MAC filters to occur earlier for
      igb.
      
      Letu Ren fixes a double free issue in igbvf probe.
      
      Sasha fixes incorrect min value being used when calculating for max for
      igc.
      
      Robert Schlabbach adds documentation on enabling NBASE-T support for
      ixgbe.
      
      Cyril Novikov adds missing initialization of MDIO bus speed for ixgbe.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: usb: lan78xx: add Allied Telesis AT29M2-AF · ef8a0f6e
      Greg Jesionowski authored
      This adds the vendor and product IDs for the AT29M2-AF which is a
      lan7801-based device.
      Signed-off-by: Greg Jesionowski <jesionowskigreg@gmail.com>
      Link: https://lore.kernel.org/r/20211214221027.305784-1-jesionowskigreg@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net/packet: rx_owner_map depends on pg_vec · ec6af094
      Willem de Bruijn authored
      Packet sockets may switch ring versions. Avoid misinterpreting state
      between versions, whose fields share a union. rx_owner_map is only
      allocated with a packet ring (pg_vec) and both are swapped together.
      If pg_vec is NULL, meaning no packet ring was allocated, then neither
      was rx_owner_map. And the field may be old state from a tpacket_v3.
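
The dependency can be pictured with a heavily simplified structure. `struct ring`, its field layout, and `ring_owner_map` are hypothetical and only mirror the relationship described above. The owner map shares storage with another version's state, so it is trusted only when a ring (pg_vec) exists:

```c
#include <stddef.h>

struct ring {
    void **pg_vec;                     /* NULL when no ring is allocated */
    union {
        unsigned long *rx_owner_map;   /* meaningful only alongside pg_vec */
        void *other_version_state;     /* stand-in for another version's field */
    };
};

/* Without a ring, the union may still hold state written through the
 * other member, so report "no owner map" instead of reinterpreting it. */
static unsigned long *ring_owner_map(const struct ring *rb)
{
    return rb->pg_vec ? rb->rx_owner_map : NULL;
}
```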
      
      Fixes: 61fad681 ("net/packet: tpacket_rcv: avoid a producer race condition")
      Reported-by: Syzbot <syzbot+1ac0994a0a0c55151121@syzkaller.appspotmail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211215143937.106178-1-willemdebruijn.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netdevsim: Zero-initialize memory for new map's value in function nsim_bpf_map_alloc · 48122177
      Haimin Zhang authored
      Zero-initialize memory for new map's value in function nsim_bpf_map_alloc
      since it may cause a potential kernel information leak issue, as follows:
      1. nsim_bpf_map_alloc calls nsim_map_alloc_elem to allocate elements for
      a new map.
      2. nsim_map_alloc_elem uses kmalloc to allocate map's value, but doesn't
      zero it.
      3. A user application can use the bpf() syscall command BPF_MAP_LOOKUP_ELEM
      to get a specific element's information in the map.
      4. The kernel function map_lookup_elem will call bpf_map_copy_value to get
      the information allocated at step-2, then use copy_to_user to copy to the
      user buffer.
      This can only leak information for an array map.
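
The steps above come down to handing out memory that was never cleared. As a userspace analogy, assume calloc() plays the role of a zeroing kernel allocator. `alloc_map_value` and `all_zero` are hypothetical helpers for this sketch:

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocate a map value that is guaranteed to start out zeroed, so a
 * lookup performed before any update cannot return stale heap bytes. */
static unsigned char *alloc_map_value(size_t value_size)
{
    return calloc(1, value_size);
}

/* Returns 1 when every byte in the buffer is zero. */
static int all_zero(const unsigned char *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i])
            return 0;
    return 1;
}
```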
      
      Fixes: 395cacb5 ("netdevsim: bpf: support fake map offload")
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Haimin Zhang <tcs.kernel@gmail.com>
      Link: https://lore.kernel.org/r/20211215111530.72103-1-tcs.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • dpaa2-eth: fix ethtool statistics · 972ce7e3
      Ioana Ciornei authored
      Unfortunately, with the blamed commit I also added a side effect in the
      ethtool stats shown. Because I added two more fields in the per channel
      structure without verifying if its size is used in any way, part of the
      ethtool statistics were off by 2.
      Fix this by relying on a fixed value kept in a macro instead of the
      size of the structure.
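
The off-by-2 effect can be sketched as follows. The structure, its field names, and `NUM_CH_STATS` are hypothetical stand-ins. Deriving the counter count from sizeof silently grows when internal fields are appended, while a fixed macro does not:

```c
#include <stddef.h>
#include <stdint.h>

struct ch_stats {
    uint64_t dequeue;          /* exported counter */
    uint64_t frames;           /* exported counter */
    uint64_t cdan;             /* exported counter */
    uint64_t internal_bytes;   /* added later, internal bookkeeping only */
    uint64_t internal_frames;  /* added later, internal bookkeeping only */
};

#define NUM_CH_STATS 3         /* fixed number of exported counters */

/* Count derived from the structure size: includes the internal fields. */
static size_t stats_count_from_sizeof(void)
{
    return sizeof(struct ch_stats) / sizeof(uint64_t);
}

/* Count taken from the macro: unaffected by new internal fields. */
static size_t stats_count_from_macro(void)
{
    return NUM_CH_STATS;
}
```

In this sketch the sizeof-based count is 5 while the macro-based count stays at 3, the same kind of shift by two that the message describes.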
      
      Fixes: fc398bec ("net: dpaa2: add adaptive interrupt coalescing")
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Link: https://lore.kernel.org/r/20211215105831.290070-1-ioana.ciornei@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 15 Dec, 2021 16 commits
    • ixgbe: set X550 MDIO speed before talking to PHY · bf0a3750
      Cyril Novikov authored
      The MDIO bus speed must be initialized before talking to the PHY the first
      time in order to avoid talking to it using a speed that the PHY doesn't
      support.
      
      This fixes HW initialization error -17 (IXGBE_ERR_PHY_ADDR_INVALID) on
      Denverton CPUs (a.k.a. the Atom C3000 family) on ports with a 10Gb network
      plugged in. On those devices, HLREG0[MDCSPD] resets to 1, which combined
      with the 10Gb network results in a 24MHz MDIO speed, which is apparently
      too fast for the connected PHY. PHY register reads over MDIO bus return
      garbage, leading to initialization failure.
      
      Reproduced with Linux kernel 4.19 and 5.15-rc7. Can be reproduced using
      the following setup:
      
      * Use an Atom C3000 family system with at least one X552 LAN on the SoC
      * Disable PXE or other BIOS network initialization if possible
        (the interface must not be initialized before Linux boots)
      * Connect a live 10Gb Ethernet cable to an X550 port
      * Power cycle (not reset, doesn't always work) the system and boot Linux
      * Observe: ixgbe interfaces w/ 10GbE cables plugged in fail with error -17
      
      Fixes: e84db727 ("ixgbe: Introduce function to control MDIO speed")
      Signed-off-by: Cyril Novikov <cnovikov@lynx.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ixgbe: Document how to enable NBASE-T support · 271225fd
      Robert Schlabbach authored
      Commit a296d665 ("ixgbe: Add ethtool support to enable 2.5 and 5.0
      Gbps support") introduced suppression of the advertisement of NBASE-T
      speeds by default, according to Todd Fujinaka to accommodate customers
      with network switches which could not cope with advertised NBASE-T
      speeds, as posted in the E1000-devel mailing list:
      
      https://sourceforge.net/p/e1000/mailman/message/37106269/
      
      However, the suppression was not documented at all, nor was how to
      enable NBASE-T support.
      
      Properly document the NBASE-T suppression and how to enable NBASE-T
      support.
      
      Fixes: a296d665 ("ixgbe: Add ethtool support to enable 2.5 and 5.0 Gbps support")
      Reported-by: Robert Schlabbach <robert_s@gmx.net>
      Signed-off-by: Robert Schlabbach <robert_s@gmx.net>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • igc: Fix typo in i225 LTR functions · 0182d1f3
      Sasha Neftin authored
      The LTR maximum value was incorrectly written using the scale from
      the LTR minimum value. This would cause incorrect values to be sent,
      in cases where the initial calculation led to different min/max scales.
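
A minimal sketch of the mix-up, assuming a register layout with a value field and a scale field. The mask, shift, and function names are illustrative assumptions, not the driver's definitions:

```c
#include <stdint.h>

#define LTR_VALUE_MASK  0x3ffu  /* assumed: 10-bit latency value */
#define LTR_SCALE_SHIFT 10      /* assumed: scale field sits above it */

/* Pack a latency value together with its scale. */
static uint32_t ltr_encode(uint32_t value, uint32_t scale)
{
    return (value & LTR_VALUE_MASK) | (scale << LTR_SCALE_SHIFT);
}

/* Pairs the maximum value with the scale computed for the minimum. */
static uint32_t ltr_max_buggy(uint32_t ltr_max, uint32_t scale_min,
                              uint32_t scale_max)
{
    (void)scale_max;
    return ltr_encode(ltr_max, scale_min);
}

/* Pairs the maximum value with its own scale. */
static uint32_t ltr_max_fixed(uint32_t ltr_max, uint32_t scale_min,
                              uint32_t scale_max)
{
    (void)scale_min;
    return ltr_encode(ltr_max, scale_max);
}
```

The two functions agree whenever both scales are equal, which is why the problem only shows up when the calculation yields different min/max scales.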
      
      Fixes: 707abf06 ("igc: Add initial LTR support")
      Suggested-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
      Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
      Tested-by: Nechama Kraus <nechamax.kraus@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • igbvf: fix double free in `igbvf_probe` · b6d335a6
      Letu Ren authored
      In `igbvf_probe`, if register_netdev() fails, the program will go to
      label err_hw_init, and then to label err_ioremap. In free_netdev() which
      is just below label err_ioremap, there is `list_for_each_entry_safe` and
      `netif_napi_del` which aims to delete all entries in `dev->napi_list`.
      The program has added an entry `adapter->rx_ring->napi` which is added by
      `netif_napi_add` in igbvf_alloc_queues(). However, adapter->rx_ring has
      been freed below label err_hw_init. So this is a UAF.
      
      To patch the problem, we can follow igbvf_remove() and delete the entry
      before `adapter->rx_ring` is freed.
      
      The KASAN logs are as follows:
      
      [   35.126075] BUG: KASAN: use-after-free in free_netdev+0x1fd/0x450
      [   35.127170] Read of size 8 at addr ffff88810126d990 by task modprobe/366
      [   35.128360]
      [   35.128643] CPU: 1 PID: 366 Comm: modprobe Not tainted 5.15.0-rc2+ #14
      [   35.129789] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
      [   35.131749] Call Trace:
      [   35.132199]  dump_stack_lvl+0x59/0x7b
      [   35.132865]  print_address_description+0x7c/0x3b0
      [   35.133707]  ? free_netdev+0x1fd/0x450
      [   35.134378]  __kasan_report+0x160/0x1c0
      [   35.135063]  ? free_netdev+0x1fd/0x450
      [   35.135738]  kasan_report+0x4b/0x70
      [   35.136367]  free_netdev+0x1fd/0x450
      [   35.137006]  igbvf_probe+0x121d/0x1a10 [igbvf]
      [   35.137808]  ? igbvf_vlan_rx_add_vid+0x100/0x100 [igbvf]
      [   35.138751]  local_pci_probe+0x13c/0x1f0
      [   35.139461]  pci_device_probe+0x37e/0x6c0
      [   35.165526]
      [   35.165806] Allocated by task 366:
      [   35.166414]  ____kasan_kmalloc+0xc4/0xf0
      [   35.167117]  foo_kmem_cache_alloc_trace+0x3c/0x50 [igbvf]
      [   35.168078]  igbvf_probe+0x9c5/0x1a10 [igbvf]
      [   35.168866]  local_pci_probe+0x13c/0x1f0
      [   35.169565]  pci_device_probe+0x37e/0x6c0
      [   35.179713]
      [   35.179993] Freed by task 366:
      [   35.180539]  kasan_set_track+0x4c/0x80
      [   35.181211]  kasan_set_free_info+0x1f/0x40
      [   35.181942]  ____kasan_slab_free+0x103/0x140
      [   35.182703]  kfree+0xe3/0x250
      [   35.183239]  igbvf_probe+0x1173/0x1a10 [igbvf]
      [   35.184040]  local_pci_probe+0x13c/0x1f0
      
      Fixes: d4e0fe01 ("igbvf: add new driver to support 82576 virtual functions")
      Reported-by: Zheyu Ma <zheyuma97@gmail.com>
      Signed-off-by: Letu Ren <fantasquex@gmail.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      b6d335a6
    • Karen Sornek's avatar
      igb: Fix removal of unicast MAC filters of VFs · 584af821
      Karen Sornek authored
      Move the VF MAC filter condition check before clearing or adding
      a MAC filter for the VF, to prevent a potential blackout caused
      by removing a necessary, working VF MAC filter.
      
      Fixes: 1b8b062a ("igb: add VF trust infrastructure")
      Signed-off-by: Karen Sornek <karen.sornek@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      584af821
    • David S. Miller's avatar
      Merge tag 'wireless-drivers-2021-12-15' of... · 1d1c950f
      David S. Miller authored
      Merge tag 'wireless-drivers-2021-12-15' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for v5.16
      
      Second set of fixes for v5.16, hopefully also the last one. I changed
      my email in MAINTAINERS, one crash fix in iwlwifi and some build
      problems fixed.
      
      iwlwifi
      
      * fix crash caused by a warning
      
      * fix LED linking problem
      
      brcmsmac
      
      * rework LED dependencies for being consistent with other drivers
      
      mt76
      
      * mt7921: fix build regression
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1d1c950f
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue · 7c8089f9
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-12-14
      
      This series contains updates to ice driver only.
      
      Karol corrects division that was causing incorrect calculations and
      adds a check to ensure stale timestamps are not being used.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c8089f9
    • Daniel Borkmann's avatar
      bpf, selftests: Update test case for atomic cmpxchg on r0 with pointer · e523102c
      Daniel Borkmann authored
      Fix up unprivileged test case results for 'Dest pointer in r0' verifier tests
      given they now need to reject R0 containing a pointer value, and add a couple
      of new related ones with 32bit cmpxchg as well.
      
        root@foo:~/bpf/tools/testing/selftests/bpf# ./test_verifier
        #0/u invalid and of negative number OK
        #0/p invalid and of negative number OK
        [...]
        #1268/p XDP pkt read, pkt_meta' <= pkt_data, bad access 1 OK
        #1269/p XDP pkt read, pkt_meta' <= pkt_data, bad access 2 OK
        #1270/p XDP pkt read, pkt_data <= pkt_meta', good access OK
        #1271/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
        #1272/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
        Summary: 1900 PASSED, 0 SKIPPED, 0 FAILED
      Acked-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      e523102c
    • Daniel Borkmann's avatar
      bpf: Fix kernel address leakage in atomic cmpxchg's r0 aux reg · a82fe085
      Daniel Borkmann authored
      The implementation of BPF_CMPXCHG on a high level has the following parameters:
      
        .-[old-val]                                          .-[new-val]
        BPF_R0 = cmpxchg{32,64}(DST_REG + insn->off, BPF_R0, SRC_REG)
                                `-[mem-loc]          `-[old-val]
      
      Given a BPF insn can only have two registers (dst, src), the R0 is fixed and
      used as an auxiliary register for input (old value) as well as output (returning
      old value from memory location). While the verifier performs a number of safety
      checks, it fails to reject unprivileged programs where R0 contains a pointer as
      old value.
      
      Through brute-forcing it takes about 16 seconds on my machine to leak a kernel pointer
      with BPF_CMPXCHG. The PoC is basically probing for kernel addresses by storing the
      guessed address into the map slot as a scalar, and using the map value pointer as
      R0 while SRC_REG has a canary value to detect a matching address.
      
      Fix it by checking R0 for pointers, and reject if that's the case for unprivileged
      programs.
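
The data flow of the instruction, and why a pointer in R0 acts as a comparison oracle, can be modelled in plain C. This is a hedged sketch: `model_cmpxchg64`, `model_probe`, and the canary constant are hypothetical names for illustration, the model is not atomic, and it is not the kernel's implementation.

```c
#include <stdint.h>

/*
 * Hypothetical model of the cmpxchg data flow: r0 carries the expected
 * old value in, and the value found in memory comes back out.
 */
static uint64_t model_cmpxchg64(uint64_t *mem, uint64_t r0, uint64_t src)
{
    uint64_t old = *mem;

    if (old == r0)
        *mem = src; /* swap only when memory matched the expected value */
    return old;
}

/*
 * One probing step: the guess sits in memory as a scalar, the hidden value
 * plays the role of R0, and a canary in the source register reveals a match.
 */
static int model_probe(uint64_t hidden_r0, uint64_t guess)
{
    const uint64_t canary = 0xdeadbeefULL; /* arbitrary marker value */
    uint64_t slot = guess;

    model_cmpxchg64(&slot, hidden_r0, canary);
    return slot == canary;
}
```

Each call answers only "equal or not", which is why the leak described above needs brute force, and why rejecting pointers in R0 for unprivileged programs removes the oracle.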
      
      Fixes: 5ffa2550 ("bpf: Add instructions for atomic_[cmp]xchg")
      Reported-by: Ryota Shiga (Flatt Security)
      Acked-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a82fe085
    • Daniel Borkmann's avatar
      bpf, selftests: Add test case for atomic fetch on spilled pointer · 180486b4
      Daniel Borkmann authored
      Test whether unprivileged would be able to leak the spilled pointer either
      by exporting the returned value from the atomic{32,64} operation or by reading
      and exporting the value from the stack after the atomic operation took place.
      
      Note that for unprivileged, the below atomic cmpxchg test case named "Dest
      pointer in r0 - succeed" is failing. The reason is that in the dst memory
      location (r10 -8) there is the spilled register r10:
      
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        0: (bf) r0 = r10
        1: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0
        1: (7b) *(u64 *)(r10 -8) = r0
        2: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=fp
        2: (b7) r1 = 0
        3: R0_w=fp0 R1_w=invP0 R10=fp0 fp-8_w=fp
        3: (db) r0 = atomic64_cmpxchg((u64 *)(r10 -8), r0, r1)
        4: R0_w=fp0 R1_w=invP0 R10=fp0 fp-8_w=mmmmmmmm
        4: (79) r1 = *(u64 *)(r0 -8)
        5: R0_w=fp0 R1_w=invP(id=0) R10=fp0 fp-8_w=mmmmmmmm
        5: (b7) r0 = 0
        6: R0_w=invP0 R1_w=invP(id=0) R10=fp0 fp-8_w=mmmmmmmm
        6: (95) exit
      
      However, allowing this case for unprivileged is a bit useless given an
      update with a new pointer will fail anyway:
      
        0: R1=ctx(id=0,off=0,imm=0) R10=fp0
        0: (bf) r0 = r10
        1: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0
        1: (7b) *(u64 *)(r10 -8) = r0
        2: R0_w=fp0 R1=ctx(id=0,off=0,imm=0) R10=fp0 fp-8_w=fp
        2: (db) r0 = atomic64_cmpxchg((u64 *)(r10 -8), r0, r10)
        R10 leaks addr into mem
      Acked-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      180486b4
    • Daniel Borkmann's avatar
      bpf: Fix kernel address leakage in atomic fetch · 7d3baf0a
      Daniel Borkmann authored
      The change in commit 37086bfd ("bpf: Propagate stack bounds to registers
      in atomics w/ BPF_FETCH") around check_mem_access() handling is buggy since
      this would allow for unprivileged users to leak kernel pointers. For example,
      an atomic fetch/and with -1 on a stack destination which holds a spilled
      pointer will migrate the spilled register type into a scalar, which can then
      be exported out of the program (since scalar != pointer) by dumping it into
      a map value.
      
      The original implementation of XADD was preventing this situation by using
      a double call to check_mem_access(), one with BPF_READ and a subsequent one
      with BPF_WRITE, in both cases passing -1 as a placeholder value instead of
      a register as per XADD semantics, since it didn't contain a value fetch. The
      BPF_READ also included a check in check_stack_read_fixed_off() which rejects
      the program if the stack slot holds a pointer value (__is_pointer_value())
      and dst_regno < 0.
      The latter is to distinguish whether we're dealing with a regular stack spill/
      fill or some arithmetical operation which is disallowed on non-scalars, see
      also 6e7e63cb ("bpf: Forbid XADD on spilled pointers for unprivileged
      users") for more context on check_mem_access() and its handling of placeholder
      value -1.
      
      One minimally intrusive option to fix the leak is for the BPF_FETCH case to
      initially check the BPF_READ case via check_mem_access() with -1 as register,
      followed by the actual load case with non-negative load_reg to propagate
      stack bounds to registers.
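
The two-pass idea can be illustrated with a small model. It is a sketch under stated assumptions: the `model_*` functions, the `slot_kind` enum, and the error constant are hypothetical simplifications, and the real verifier checks carry far more state than shown here.

```c
#include <stdbool.h>

/* Hypothetical view of what a stack slot holds. */
enum slot_kind { SLOT_SCALAR, SLOT_SPILLED_PTR };

#define MODEL_EACCES 13 /* stand-in for a permission error code */

/*
 * Model of the read check: a negative register number means "arithmetic on
 * the slot, not a plain fill", which is refused for unprivileged programs
 * when the slot holds a spilled pointer.
 */
static int model_check_read(enum slot_kind slot, int value_regno, bool unpriv)
{
    if (unpriv && slot == SLOT_SPILLED_PTR && value_regno < 0)
        return -MODEL_EACCES;
    return 0;
}

/* Fetch case: probe with -1 first, then repeat with the real load register. */
static int model_check_atomic_fetch(enum slot_kind slot, int load_reg, bool unpriv)
{
    int err = model_check_read(slot, -1, unpriv);

    if (err)
        return err;
    return model_check_read(slot, load_reg, unpriv);
}
```

In this model the first pass is what stops a spilled pointer from being turned into a scalar, while the second pass is where bounds would be propagated to the load register.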
      
      Fixes: 37086bfd ("bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH")
      Reported-by: <n4ke4mry@gmail.com>
      Acked-by: Brendan Jackman <jackmanb@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      7d3baf0a
    • Jakub Kicinski's avatar
      Merge branch 'mptcp-fixes-for-ulp-a-deadlock-and-netlink-docs' · 500f3720
      Jakub Kicinski authored
      Mat Martineau says:
      
      ====================
      mptcp: Fixes for ULP, a deadlock, and netlink docs
      
      Two of the MPTCP fixes in this set are related to the TCP_ULP socket
      option with MPTCP sockets operating in "fallback" mode (the connection
      has reverted to regular TCP). The other issues are an observed deadlock
      and missing parameter documentation in the MPTCP netlink API.
      
      Patch 1 marks TCP_ULP as unsupported earlier in MPTCP setsockopt code,
      so the fallback code path in the MPTCP layer does not pass the TCP_ULP
      option down to the subflow TCP socket.
      
      Patch 2 makes sure a TCP fallback socket returned to userspace by
      accept()ing on a MPTCP listening socket does not allow use of the
      "mptcp" TCP_ULP type. That ULP is intended only for use by in-kernel
      MPTCP subflows.
      
      Patch 3 fixes the possible deadlock when sending data and there are
      socket option changes to sync to the subflows.
      
      Patch 4 makes sure all MPTCP netlink event parameters are documented
      in the MPTCP uapi header.
      ====================
      
      Link: https://lore.kernel.org/r/20211214231604.211016-1-mathew.j.martineau@linux.intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      500f3720
    • Matthieu Baerts's avatar
      mptcp: add missing documented NL params · 6813b192
      Matthieu Baerts authored
      'loc_id' and 'rem_id' are set in all events linked to subflows but those
      were missing in the events description in the comments.
      
      Fixes: b911c97c ("mptcp: add netlink event support")
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      6813b192
    • Maxim Galaganov's avatar
      mptcp: fix deadlock in __mptcp_push_pending() · 3d79e375
      Maxim Galaganov authored
      __mptcp_push_pending() may call mptcp_flush_join_list() with the subflow
      socket lock held. If such a call hits mptcp_sockopt_sync_all(), then
      __mptcp_sockopt_sync() could subsequently try to lock the subflow
      socket for itself, causing a deadlock.
      
      sysrq: Show Blocked State
      task:ss-server       state:D stack:    0 pid:  938 ppid:     1 flags:0x00000000
      Call Trace:
       <TASK>
       __schedule+0x2d6/0x10c0
       ? __mod_memcg_state+0x4d/0x70
       ? csum_partial+0xd/0x20
       ? _raw_spin_lock_irqsave+0x26/0x50
       schedule+0x4e/0xc0
       __lock_sock+0x69/0x90
       ? do_wait_intr_irq+0xa0/0xa0
       __lock_sock_fast+0x35/0x50
       mptcp_sockopt_sync_all+0x38/0xc0
       __mptcp_push_pending+0x105/0x200
       mptcp_sendmsg+0x466/0x490
       sock_sendmsg+0x57/0x60
       __sys_sendto+0xf0/0x160
       ? do_wait_intr_irq+0xa0/0xa0
       ? fpregs_restore_userregs+0x12/0xd0
       __x64_sys_sendto+0x20/0x30
       do_syscall_64+0x38/0x90
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f9ba546c2d0
      RSP: 002b:00007ffdc3b762d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007f9ba56c8060 RCX: 00007f9ba546c2d0
      RDX: 000000000000077a RSI: 0000000000e5e180 RDI: 0000000000000234
      RBP: 0000000000cc57f0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ba56c8060
      R13: 0000000000b6ba60 R14: 0000000000cc7840 R15: 41d8685b1d7901b8
       </TASK>
      
      Fix the issue by using __mptcp_flush_join_list() instead of plain
      mptcp_flush_join_list() inside __mptcp_push_pending(), as suggested by
      Florian. The sockopt sync will be deferred to the workqueue.
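
The difference between syncing inline and deferring the sync can be modelled with a non-recursive lock flag. This is only an illustration: the `model_*` names and the struct are hypothetical, and a `false` return stands in for the point where a real lock would block forever.

```c
#include <stdbool.h>

/* Hypothetical model: one non-recursive lock plus a deferred-work flag. */
struct subflow_model {
    bool locked;        /* subflow socket lock is currently held */
    bool sync_deferred; /* a sockopt sync has been queued for later */
    bool synced;        /* the sync has actually run */
};

/* The sync needs the lock itself; false marks where a real lock would hang. */
static bool model_sockopt_sync(struct subflow_model *s)
{
    if (s->locked)
        return false;
    s->locked = true;
    s->synced = true;
    s->locked = false;
    return true;
}

/* Problematic shape: the sync is attempted while the caller holds the lock. */
static bool model_push_pending_inline(struct subflow_model *s)
{
    bool ok;

    s->locked = true;
    ok = model_sockopt_sync(s);
    s->locked = false;
    return ok;
}

/* Deferred shape: only record that a sync is needed, then release the lock. */
static bool model_push_pending_deferred(struct subflow_model *s)
{
    s->locked = true;
    s->sync_deferred = true;
    s->locked = false;
    return true;
}

/* Worker that runs later, outside the caller's locked section. */
static bool model_worker(struct subflow_model *s)
{
    if (!s->sync_deferred)
        return true;
    s->sync_deferred = false;
    return model_sockopt_sync(s);
}
```

The deferred variant trades immediacy for safety: the sync happens slightly later, but never while the lock it needs is already held by the same task.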
      
      Fixes: 1b3e7ede ("mptcp: setsockopt: handle SO_KEEPALIVE and SO_PRIORITY")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/244
      Suggested-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Maxim Galaganov <max@internet.ru>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3d79e375
    • Florian Westphal's avatar
      mptcp: clear 'kern' flag from fallback sockets · d6692b3b
      Florian Westphal authored
      The mptcp ULP extension relies on sk->sk_sock_kern being set correctly:
      it prevents setsockopt(fd, IPPROTO_TCP, TCP_ULP, "mptcp", 6); from
      working for plain tcp sockets (any userspace-exposed socket).
      
      But in case of fallback, accept() can return a plain tcp sk.
      In such a case, sk is still tagged as 'kernel' and setsockopt will work.
      
      This will crash the kernel, because the subflow extension has a NULL
      ctx->conn mptcp socket:
      
      BUG: KASAN: null-ptr-deref in subflow_data_ready+0x181/0x2b0
      Call Trace:
       tcp_data_ready+0xf8/0x370
       [..]
      
      Fixes: cf7da0d6 ("mptcp: Create SUBFLOW socket for incoming connections")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d6692b3b
    • Florian Westphal's avatar
      mptcp: remove tcp ulp setsockopt support · 404cd9a2
      Florian Westphal authored
      TCP_ULP setsockopt cannot be used for mptcp because it's already
      used internally to plumb subflow (tcp) sockets to the mptcp layer.
      
      syzbot managed to trigger a crash for mptcp connections that are
      in fallback mode:
      
      KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
      CPU: 1 PID: 1083 Comm: syz-executor.3 Not tainted 5.16.0-rc2-syzkaller #0
      RIP: 0010:tls_build_proto net/tls/tls_main.c:776 [inline]
      [..]
       __tcp_set_ulp net/ipv4/tcp_ulp.c:139 [inline]
       tcp_set_ulp+0x428/0x4c0 net/ipv4/tcp_ulp.c:160
       do_tcp_setsockopt+0x455/0x37c0 net/ipv4/tcp.c:3391
       mptcp_setsockopt+0x1b47/0x2400 net/mptcp/sockopt.c:638
      
      Remove support for TCP_ULP setsockopt.
      
      Fixes: d9e4c129 ("mptcp: only admit explicitly supported sockopt")
      Reported-by: syzbot+1fd9b69cde42967d1add@syzkaller.appspotmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      404cd9a2
  3. 14 Dec, 2021 3 commits