1. 01 Jul, 2022 4 commits
    • bpf, selftests: Add verifier test case for jmp32's jeq/jne · a49b8ce7
      Daniel Borkmann authored
      Add a test case to trigger the verifier's incorrect conclusion in the
      case of jmp32's jeq/jne. Also here, make use of dead code elimination,
      so that we can see the verifier bailing out on unfixed kernels.
      
      Before:
      
        # ./test_verifier 724
        #724/p jeq32/jne32: bounds checking FAIL
        Failed to load prog 'Permission denied'!
        R4 !read_ok
        verification time 8 usec
        stack depth 0
        processed 8 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0
        Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
      
      After:
      
        # ./test_verifier 724
        #724/p jeq32/jne32: bounds checking OK
        Summary: 1 PASSED, 0 SKIPPED, 0 FAILED
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220701124727.11153-4-daniel@iogearbox.net
    • bpf, selftests: Add verifier test case for imm=0,umin=0,umax=1 scalar · 73c4936f
      Daniel Borkmann authored
      Add a test case to trigger the constant scalar issue which leaves the
      register in scalar(imm=0,umin=0,umax=1,var_off=(0x0; 0x0)) state. Make
      use of dead code elimination, so that we can see the verifier bailing
      out on unfixed kernels. For the condition, we use jle given it checks
      on umax bound.
      
      Before:
      
        # ./test_verifier 743
        #743/p jump & dead code elimination FAIL
        Failed to load prog 'Permission denied'!
        R4 !read_ok
        verification time 11 usec
        stack depth 0
        processed 13 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
        Summary: 0 PASSED, 0 SKIPPED, 1 FAILED
      
      After:
      
        # ./test_verifier 743
        #743/p jump & dead code elimination OK
        Summary: 1 PASSED, 0 SKIPPED, 0 FAILED
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220701124727.11153-3-daniel@iogearbox.net
    • bpf: Fix insufficient bounds propagation from adjust_scalar_min_max_vals · 3844d153
      Daniel Borkmann authored
      Kuee reported a corner case where the tnum becomes constant after the call
      to __reg_bound_offset(), but the register's bounds are not, that is, its
      min bounds are still not equal to the register's max bounds.
      
      This in turn allows leaking pointers by turning a pointer register as is
      into an unknown scalar via adjust_ptr_min_max_vals().
      
      Before:
      
        func#0 @0
        0: R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
        0: (b7) r0 = 1                        ; R0_w=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0))
        1: (b7) r3 = 0                        ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0))
        2: (87) r3 = -r3                      ; R3_w=scalar()
        3: (87) r3 = -r3                      ; R3_w=scalar()
        4: (47) r3 |= 32767                   ; R3_w=scalar(smin=-9223372036854743041,umin=32767,var_off=(0x7fff; 0xffffffffffff8000),s32_min=-2147450881)
        5: (75) if r3 s>= 0x0 goto pc+1       ; R3_w=scalar(umin=9223372036854808575,var_off=(0x8000000000007fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767)
        6: (95) exit
      
        from 5 to 7: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
        7: (d5) if r3 s<= 0x8000 goto pc+1    ; R3=scalar(umin=32769,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767)
        8: (95) exit
      
        from 7 to 9: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=32768,var_off=(0x7fff; 0x8000)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
        9: (07) r3 += -32767                  ; R3_w=scalar(imm=0,umax=1,var_off=(0x0; 0x0))  <--- [*]
        10: (95) exit
      
      What can be seen here is that R3=scalar(umin=32767,umax=32768,var_off=(0x7fff;
      0x8000)) after the operation R3 += -32767 results in a 'malformed' constant, that
      is, R3_w=scalar(imm=0,umax=1,var_off=(0x0; 0x0)). Intersecting with var_off has
      not been done at that point via __update_reg_bounds(), which would have improved
      the umax to be equal to umin.
      
      Refactor the tnum <> min/max bounds information flow into a reg_bounds_sync()
      helper and use it consistently everywhere. After the fix, bounds have been
      corrected to R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0)) and thus the register
      is regarded as a 'proper' constant scalar of 0.
      
      After:
      
        func#0 @0
        0: R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
        0: (b7) r0 = 1                        ; R0_w=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0))
        1: (b7) r3 = 0                        ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0))
        2: (87) r3 = -r3                      ; R3_w=scalar()
        3: (87) r3 = -r3                      ; R3_w=scalar()
        4: (47) r3 |= 32767                   ; R3_w=scalar(smin=-9223372036854743041,umin=32767,var_off=(0x7fff; 0xffffffffffff8000),s32_min=-2147450881)
        5: (75) if r3 s>= 0x0 goto pc+1       ; R3_w=scalar(umin=9223372036854808575,var_off=(0x8000000000007fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767)
        6: (95) exit
      
        from 5 to 7: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
        7: (d5) if r3 s<= 0x8000 goto pc+1    ; R3=scalar(umin=32769,umax=9223372036854775807,var_off=(0x7fff; 0x7fffffffffff8000),s32_min=-2147450881,u32_min=32767)
        8: (95) exit
      
        from 7 to 9: R0=scalar(imm=1,umin=1,umax=1,var_off=(0x1; 0x0)) R1=ctx(off=0,imm=0,umax=0,var_off=(0x0; 0x0)) R3=scalar(umin=32767,umax=32768,var_off=(0x7fff; 0x8000)) R10=fp(off=0,imm=0,umax=0,var_off=(0x0; 0x0))
        9: (07) r3 += -32767                  ; R3_w=scalar(imm=0,umax=0,var_off=(0x0; 0x0))  <--- [*]
        10: (95) exit
      
      Fixes: b03c9f9f ("bpf/verifier: track signed and unsigned min/max values")
      Reported-by: Kuee K1r0a <liulin063@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220701124727.11153-2-daniel@iogearbox.net
    • bpf: Fix incorrect verifier simulation around jmp32's jeq/jne · a12ca627
      Daniel Borkmann authored
      Kuee reported a quirk in the jmp32's jeq/jne simulation, namely that the
      register value does not match expectations for the fall-through path. For
      example:
      
      Before fix:
      
        0: R1=ctx(off=0,imm=0) R10=fp0
        0: (b7) r2 = 0                        ; R2_w=P0
        1: (b7) r6 = 563                      ; R6_w=P563
        2: (87) r2 = -r2                      ; R2_w=Pscalar()
        3: (87) r2 = -r2                      ; R2_w=Pscalar()
        4: (4c) w2 |= w6                      ; R2_w=Pscalar(umin=563,umax=4294967295,var_off=(0x233; 0xfffffdcc),s32_min=-2147483085) R6_w=P563
        5: (56) if w2 != 0x8 goto pc+1        ; R2_w=P571  <--- [*]
        6: (95) exit
        R0 !read_ok
      
      After fix:
      
        0: R1=ctx(off=0,imm=0) R10=fp0
        0: (b7) r2 = 0                        ; R2_w=P0
        1: (b7) r6 = 563                      ; R6_w=P563
        2: (87) r2 = -r2                      ; R2_w=Pscalar()
        3: (87) r2 = -r2                      ; R2_w=Pscalar()
        4: (4c) w2 |= w6                      ; R2_w=Pscalar(umin=563,umax=4294967295,var_off=(0x233; 0xfffffdcc),s32_min=-2147483085) R6_w=P563
        5: (56) if w2 != 0x8 goto pc+1        ; R2_w=P8  <--- [*]
        6: (95) exit
        R0 !read_ok
      
      As can be seen on line 5 for the branch fall-through path in R2 [*], given
      that the condition w2 != 0x8 is false, the verifier should conclude that
      r2 = 8, as the upper 32 bits are known to be zero. However, the verifier
      incorrectly concludes that r2 = 571, which is far off.
      
      The problem is that it only marks {false,true}_reg as known in the switch
      for the JE/NE case, but at the end of the function, it uses
      {false,true}_{64,32}off to update {false,true}_reg->var_off, and they still
      hold the prior value of {false,true}_reg->var_off from before it got marked
      as known. The subsequent __reg_combine_32_into_64() then propagates this old
      var_off and derives new bounds. Combining the min/max bounds on
      {false,true}_reg from setting the register to a known constant with a
      {false,true}_reg->var_off based on the old information then yields wrong
      register data.
      
      Fix it by detangling the BPF_JEQ/BPF_JNE cases and updating relevant
      {false,true}_{64,32}off tnums along with the register marking to known
      constant.
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Reported-by: Kuee K1r0a <liulin063@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220701124727.11153-1-daniel@iogearbox.net
  2. 28 Jun, 2022 2 commits
  3. 24 Jun, 2022 1 commit
  4. 18 Jun, 2022 5 commits
    • net/sched: sch_netem: Fix arithmetic in netem_dump() for 32-bit platforms · a2b1a5d4
      Peilin Ye authored
      As reported by Yuming, currently tc always shows a latency of UINT_MAX
      for netem Qdiscs on 32-bit platforms:
      
          $ tc qdisc add dev dummy0 root netem latency 100ms
          $ tc qdisc show dev dummy0
          qdisc netem 8001: root refcnt 2 limit 1000 delay 275s  275s
                                                     ^^^^^^^^^^^^^^^^
      
      Let us take a closer look at netem_dump():
      
              qopt.latency = min_t(psched_tdiff_t, PSCHED_NS2TICKS(q->latency),
                                   UINT_MAX);
      
      qopt.latency is __u32, psched_tdiff_t is signed long,
      (psched_tdiff_t)(UINT_MAX) is negative for 32-bit platforms, so
      qopt.latency is always UINT_MAX.
      
      Fix it by using psched_time_t (u64) instead.
      
      Note: confusingly, users have two ways to specify 'latency':
      
        1. normally, via '__u32 latency' in struct tc_netem_qopt;
        2. via the TCA_NETEM_LATENCY64 attribute, which is s64.
      
      For the second case, theoretically 'latency' could be negative.  This
      patch ignores that corner case, since it is broken (i.e. assigning a
      negative s64 to __u32) anyways, and should be handled separately.
      
      Thanks Ted Lin for the analysis [1].
      
      [1] https://github.com/raspberrypi/linux/issues/3512
      
      Reported-by: Yuming Chen <chenyuming.junnan@bytedance.com>
      Fixes: 112f9cb6 ("netem: convert to qdisc_watchdog_schedule_ns")
      Reviewed-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Link: https://lore.kernel.org/r/20220616234336.2443-1-yepeilin.cs@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ethtool: Fix get module eeprom fallback · a3bb7b63
      Ivan Vecera authored
      Function fallback_set_params() checks if the module type returned
      by a driver is ETH_MODULE_SFF_8079 and in this case it assumes
      that the buffer returns the concatenated content of pages A0h and A2h.
      The check is wrong because the correct type is ETH_MODULE_SFF_8472.
      
      Fixes: 96d971e3 ("ethtool: Add fallback to get_module_eeprom from netlink command")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Link: https://lore.kernel.org/r/20220616160856.3623273-1-ivecera@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bonding: ARP monitor spams NETDEV_NOTIFY_PEERS notifiers · 7a9214f3
      Jay Vosburgh authored
      The bonding ARP monitor fails to decrement send_peer_notif, the
      number of peer notifications (gratuitous ARP or ND) to be sent. This
      results in a continuous series of notifications.
      
      Correct this by decrementing the counter for each notification.
      
      Reported-by: Jonathan Toppins <jtoppins@redhat.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Fixes: b0929915 ("bonding: Fix RTNL: assertion failed at net/core/rtnetlink.c for ab arp monitor")
      Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
      Tested-by: Jonathan Toppins <jtoppins@redhat.com>
      Reviewed-by: Jonathan Toppins <jtoppins@redhat.com>
      Link: https://lore.kernel.org/r/9400.1655407960@famine
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • igb: fix a use-after-free issue in igb_clean_tx_ring · 3f6a57ee
      Lorenzo Bianconi authored
      Fix the following use-after-free bug in the igb_clean_tx_ring routine when
      the NIC is running in XDP mode. The issue can be triggered by redirecting
      traffic into the igb NIC and then closing the device while the traffic
      is flowing.
      
      [   73.322719] CPU: 1 PID: 487 Comm: xdp_redirect Not tainted 5.18.3-apu2 #9
      [   73.330639] Hardware name: PC Engines APU2/APU2, BIOS 4.0.7 02/28/2017
      [   73.337434] RIP: 0010:refcount_warn_saturate+0xa7/0xf0
      [   73.362283] RSP: 0018:ffffc9000081f798 EFLAGS: 00010282
      [   73.367761] RAX: 0000000000000000 RBX: ffffc90000420f80 RCX: 0000000000000000
      [   73.375200] RDX: ffff88811ad22d00 RSI: ffff88811ad171e0 RDI: ffff88811ad171e0
      [   73.382590] RBP: 0000000000000900 R08: ffffffff82298f28 R09: 0000000000000058
      [   73.390008] R10: 0000000000000219 R11: ffffffff82280f40 R12: 0000000000000090
      [   73.397356] R13: ffff888102343a40 R14: ffff88810359e0e4 R15: 0000000000000000
      [   73.404806] FS:  00007ff38d31d740(0000) GS:ffff88811ad00000(0000) knlGS:0000000000000000
      [   73.413129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   73.419096] CR2: 000055cff35f13f8 CR3: 0000000106391000 CR4: 00000000000406e0
      [   73.426565] Call Trace:
      [   73.429087]  <TASK>
      [   73.431314]  igb_clean_tx_ring+0x43/0x140 [igb]
      [   73.436002]  igb_down+0x1d7/0x220 [igb]
      [   73.439974]  __igb_close+0x3c/0x120 [igb]
      [   73.444118]  igb_xdp+0x10c/0x150 [igb]
      [   73.447983]  ? igb_pci_sriov_configure+0x70/0x70 [igb]
      [   73.453362]  dev_xdp_install+0xda/0x110
      [   73.457371]  dev_xdp_attach+0x1da/0x550
      [   73.461369]  do_setlink+0xfd0/0x10f0
      [   73.465166]  ? __nla_validate_parse+0x89/0xc70
      [   73.469714]  rtnl_setlink+0x11a/0x1e0
      [   73.473547]  rtnetlink_rcv_msg+0x145/0x3d0
      [   73.477709]  ? rtnl_calcit.isra.0+0x130/0x130
      [   73.482258]  netlink_rcv_skb+0x8d/0x110
      [   73.486229]  netlink_unicast+0x230/0x340
      [   73.490317]  netlink_sendmsg+0x215/0x470
      [   73.494395]  __sys_sendto+0x179/0x190
      [   73.498268]  ? move_addr_to_user+0x37/0x70
      [   73.502547]  ? __sys_getsockname+0x84/0xe0
      [   73.506853]  ? netlink_setsockopt+0x1c1/0x4a0
      [   73.511349]  ? __sys_setsockopt+0xc8/0x1d0
      [   73.515636]  __x64_sys_sendto+0x20/0x30
      [   73.519603]  do_syscall_64+0x3b/0x80
      [   73.523399]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   73.528712] RIP: 0033:0x7ff38d41f20c
      [   73.551866] RSP: 002b:00007fff3b945a68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   73.559640] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff38d41f20c
      [   73.567066] RDX: 0000000000000034 RSI: 00007fff3b945b30 RDI: 0000000000000003
      [   73.574457] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
      [   73.581852] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff3b945ab0
      [   73.589179] R13: 0000000000000000 R14: 0000000000000003 R15: 00007fff3b945b30
      [   73.596545]  </TASK>
      [   73.598842] ---[ end trace 0000000000000000 ]---
      
      Fixes: 9cbc948b ("igb: add XDP support")
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/r/e5c01d549dc37bff18e46aeabd6fb28a7bcf84be.1655388571.git.lorenzo@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 582573f1
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2022-06-17
      
      We've added 12 non-merge commits during the last 4 day(s) which contain
      a total of 14 files changed, 305 insertions(+), 107 deletions(-).
      
      The main changes are:
      
      1) Fix x86 JIT tailcall count offset on BPF-2-BPF call, from Jakub Sitnicki.
      
      2) Fix a kprobe_multi link bug which misplaces BPF cookies, from Jiri Olsa.
      
      3) Fix an infinite loop when processing a module's BTF, from Kumar Kartikeya Dwivedi.
      
      4) Fix getting a rethook only in RCU available context, from Masami Hiramatsu.
      
      5) Fix request socket refcount leak in sk lookup helpers, from Jon Maxwell.
      
      6) Fix xsk xmit behavior which wrongly adds skb to already full cq, from Ciara Loftus.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        rethook: Reject getting a rethook if RCU is not watching
        fprobe, samples: Add use_trace option and show hit/missed counter
        bpf, docs: Update some of the JIT/maintenance entries
        selftest/bpf: Fix kprobe_multi bench test
        bpf: Force cookies array to follow symbols sorting
        ftrace: Keep address offset in ftrace_lookup_symbols
        selftests/bpf: Shuffle cookies symbols in kprobe multi test
        selftests/bpf: Test tail call counting with bpf2bpf and data on stack
        bpf, x86: Fix tail call count offset calculation on bpf2bpf call
        bpf: Limit maximum modifier chain length in btf_check_type_tags
        bpf: Fix request_sock leak in sk lookup helpers
        xsk: Fix generic transmit when completion queue reservation fails
      ====================
      
      Link: https://lore.kernel.org/r/20220617202119.2421-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 17 Jun, 2022 14 commits
    • rethook: Reject getting a rethook if RCU is not watching · c0f3bb40
      Masami Hiramatsu (Google) authored
      Since the rethook_recycle() will involve the call_rcu() for reclaiming
      the rethook_instance, the rethook must be set up at the RCU available
      context (non idle). This rethook_recycle() in the rethook trampoline
      handler is inevitable, thus the RCU available check must be done before
      setting the rethook trampoline.
      
      This adds a rcu_is_watching() check in the rethook_try_get() so that
      it will return NULL if it is called when !rcu_is_watching().
      
      Fixes: 54ecbe6f ("rethook: Add a generic return hook")
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/165461827269.280167.7379263615545598958.stgit@devnote2
    • fprobe, samples: Add use_trace option and show hit/missed counter · c88dbbcd
      Masami Hiramatsu (Google) authored
      Add use_trace option to use trace_printk() instead of pr_info()
      so that the handler doesn't involve the RCU operations.
      And show the hit and missed counter so that the user can check
      how many times the probe handler hit and missed.
      
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/bpf/165461826247.280167.11939123218334322352.stgit@devnote2
    • bpf, docs: Update some of the JIT/maintenance entries · 63ce81d1
      Daniel Borkmann authored
      Various minor updates around some of the BPF-related entries:
      
      JITs for ARM32/NFP/SPARC/X86-32 haven't seen updates in quite a while, thus
      for now, mark them as 'Odd Fixes' until they become more actively developed.
      
      JITs for POWERPC/S390 are in good shape and receive active development and
      review, thus bump to 'Supported' similar as we have with X86-64/ARM64.
      
      JITs for MIPS/RISC-V are in similar good shape as the ones mentioned above,
      but looked after mostly in spare time, thus leave for now in 'Maintained' state.
      
      Add Michael to PPC JIT given he's picking up the patches there, so it better
      reflects today's state.
      
      Also, I haven't done much reviewing around BPF sockmap/kTLS after John and I
      did the big rework back in the days to integrate sockmap with kTLS.
      
      These days, most of this is taken care by John, Jakub {Sitnicki,Kicinski} and
      others in the community, so remove myself from these two.
      
      Lastly, move all BPF-related entries into one place, that is, move the sockmap
      one over near rest of BPF.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/f9b8a63a0b48dc764bd4c50f87632889f5813f69.1655494758.git.daniel@iogearbox.net
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • ipv4: ping: fix bind address validity check · b4a028c4
      Riccardo Paolo Bestetti authored
      Commit 8ff978b8 ("ipv4/raw: support binding to nonlocal addresses")
      introduced a helper function to fold duplicated validity checks of bind
      addresses into inet_addr_valid_or_nonlocal(). However, this caused an
      unintended regression in ping_check_bind_addr(), which previously would
      reject binding to multicast and broadcast addresses, but now these are
      both incorrectly allowed as reported in [1].
      
      This patch restores the original check. A simple reordering is done to
      improve readability and make it evident that multicast and broadcast
      addresses should not be allowed. Also, add an early exit for INADDR_ANY
      which replaces lost behavior added by commit 0ce779a9 ("net: Avoid
      unnecessary inet_addr_type() call when addr is INADDR_ANY").
      
      Furthermore, this patch introduces regression selftests to catch these
      specific cases.
      
      [1] https://lore.kernel.org/netdev/CANP3RGdkAcDyAZoT1h8Gtuu0saq+eOrrTiWbxnOs+5zn+cpyKg@mail.gmail.com/
      
      Fixes: 8ff978b8 ("ipv4/raw: support binding to nonlocal addresses")
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Reported-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Carlos Llamas <cmllamas@google.com>
      Signed-off-by: Riccardo Paolo Bestetti <pbl@bestov.io>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hamradio: 6pack: fix array-index-out-of-bounds in decode_std_command() · 2b04495e
      Xu Jia authored
      Hulk Robot reports an incorrect sp->rx_count_cooked value in
      decode_std_command(). This is likely caused by the earlier subtraction from
      sp->rx_count_cooked: it seems that sp->rx_count_cooked is changed to 0 in
      between, which bypasses the previous check.
      
      The situation is shown below:
      
               (Thread 1)			|  (Thread 2)
      decode_std_command()		| resync_tnc()
      ...					|
      if (rest == 2)			|
      	sp->rx_count_cooked -= 2;	|
      else if (rest == 3)			| ...
      					| sp->rx_count_cooked = 0;
      	sp->rx_count_cooked -= 1;	|
      for (i = 0; i < sp->rx_count_cooked; i++) // report error
      	checksum += sp->cooked_buf[i];
      
      sp->rx_count_cooked is a shared variable but is not protected by a lock.
      The same applies to sp->rx_count. This patch adds a lock to fix the bug.
      
      The fail log is shown below:
      =======================================================================
      UBSAN: array-index-out-of-bounds in drivers/net/hamradio/6pack.c:925:31
      index 400 is out of range for type 'unsigned char [400]'
      CPU: 3 PID: 7433 Comm: kworker/u10:1 Not tainted 5.18.0-rc5-00163-g4b97bac0 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Workqueue: events_unbound flush_to_ldisc
      Call Trace:
       <TASK>
       dump_stack_lvl+0xcd/0x134
       ubsan_epilogue+0xb/0x50
       __ubsan_handle_out_of_bounds.cold+0x62/0x6c
       sixpack_receive_buf+0xfda/0x1330
       tty_ldisc_receive_buf+0x13e/0x180
       tty_port_default_receive_buf+0x6d/0xa0
       flush_to_ldisc+0x213/0x3f0
       process_one_work+0x98f/0x1620
       worker_thread+0x665/0x1080
       kthread+0x2e9/0x3a0
       ret_from_fork+0x1f/0x30
       ...
      
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Xu Jia <xujia39@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix use-after-free Read in tipc_named_reinit · 911600bf
      Hoang Le authored
      syzbot found the following issue on:
      ==================================================================
      BUG: KASAN: use-after-free in tipc_named_reinit+0x94f/0x9b0
      net/tipc/name_distr.c:413
      Read of size 8 at addr ffff88805299a000 by task kworker/1:9/23764
      
      CPU: 1 PID: 23764 Comm: kworker/1:9 Not tainted
      5.18.0-rc4-syzkaller-00878-g17d49e6e #0
      Hardware name: Google Compute Engine/Google Compute Engine,
      BIOS Google 01/01/2011
      Workqueue: events tipc_net_finalize_work
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x495
      mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       tipc_named_reinit+0x94f/0x9b0 net/tipc/name_distr.c:413
       tipc_net_finalize+0x234/0x3d0 net/tipc/net.c:138
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
       </TASK>
      [...]
      ==================================================================
      
      In commit d966ddcc ("tipc: fix a deadlock when flushing scheduled work"),
      cancel_work_sync() was used only to make sure that any executing or
      pending tipc_net_finalize_work() on any CPU has completed before the
      tipc namespace is destroyed through tipc_exit_net(). But this call does
      not guarantee that the work is the last one queued. So, the destroyed
      instance may be accessed by work that is enqueued later.
      
      To fix this completely, re-order the call to cancel_work_sync() so that
      tipc_net_finalize_work() is the last work queued and is guaranteed to
      have completed once cancel_work_sync() returns.
      
      Reported-by: syzbot+47af19f3307fc9c5c82e@syzkaller.appspotmail.com
      Fixes: d966ddcc ("tipc: fix a deadlock when flushing scheduled work")
      Acked-by: Jon Maloy <jmaloy@redhat.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • veth: Add updating of trans_start · e66e257a
      Jay Vosburgh authored
      Since commit 21a75f09 ("bonding: Fix ARP monitor validation"),
      the bonding ARP / ND link monitors depend on the trans_start time to
      determine link availability.  NETIF_F_LLTX drivers must update trans_start
      directly, which veth does not do.  This prevents use of the ARP or ND link
      monitors with veth interfaces in a bond.
      
      Resolve this by having veth_xmit update the trans_start time.
      
      Reported-by: Jonathan Toppins <jtoppins@redhat.com>
      Tested-by: Jonathan Toppins <jtoppins@redhat.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Fixes: 21a75f09 ("bonding: Fix ARP monitor validation")
      Link: https://lore.kernel.org/netdev/b2fd4147-8f50-bebd-963a-1a3e8d1d9715@redhat.com/
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix data-race in dev_isalive() · cc26c266
      Eric Dumazet authored
      dev_isalive() is called under RTNL or dev_base_lock protection.
      
      This means that changes to dev->reg_state should be done with both locks held.
      
      syzbot reported:
      
      BUG: KCSAN: data-race in register_netdevice / type_show
      
      write to 0xffff888144ecf518 of 1 bytes by task 20886 on cpu 0:
      register_netdevice+0xb9f/0xdf0 net/core/dev.c:10050
      lapbeth_new_device drivers/net/wan/lapbether.c:414 [inline]
      lapbeth_device_event+0x4a0/0x6c0 drivers/net/wan/lapbether.c:456
      notifier_call_chain kernel/notifier.c:87 [inline]
      raw_notifier_call_chain+0x53/0xb0 kernel/notifier.c:455
      __dev_notify_flags+0x1d6/0x3a0
      dev_change_flags+0xa2/0xc0 net/core/dev.c:8607
      do_setlink+0x778/0x2230 net/core/rtnetlink.c:2780
      __rtnl_newlink net/core/rtnetlink.c:3546 [inline]
      rtnl_newlink+0x114c/0x16a0 net/core/rtnetlink.c:3593
      rtnetlink_rcv_msg+0x811/0x8c0 net/core/rtnetlink.c:6089
      netlink_rcv_skb+0x13e/0x240 net/netlink/af_netlink.c:2501
      rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:6107
      netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
      netlink_unicast+0x58a/0x660 net/netlink/af_netlink.c:1345
      netlink_sendmsg+0x661/0x750 net/netlink/af_netlink.c:1921
      sock_sendmsg_nosec net/socket.c:714 [inline]
      sock_sendmsg net/socket.c:734 [inline]
      __sys_sendto+0x21e/0x2c0 net/socket.c:2119
      __do_sys_sendto net/socket.c:2131 [inline]
      __se_sys_sendto net/socket.c:2127 [inline]
      __x64_sys_sendto+0x74/0x90 net/socket.c:2127
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      read to 0xffff888144ecf518 of 1 bytes by task 20423 on cpu 1:
      dev_isalive net/core/net-sysfs.c:38 [inline]
      netdev_show net/core/net-sysfs.c:50 [inline]
      type_show+0x24/0x90 net/core/net-sysfs.c:112
      dev_attr_show+0x35/0x90 drivers/base/core.c:2095
      sysfs_kf_seq_show+0x175/0x240 fs/sysfs/file.c:59
      kernfs_seq_show+0x75/0x80 fs/kernfs/file.c:162
      seq_read_iter+0x2c3/0x8e0 fs/seq_file.c:230
      kernfs_fop_read_iter+0xd1/0x2f0 fs/kernfs/file.c:235
      call_read_iter include/linux/fs.h:2052 [inline]
      new_sync_read fs/read_write.c:401 [inline]
      vfs_read+0x5a5/0x6a0 fs/read_write.c:482
      ksys_read+0xe8/0x1a0 fs/read_write.c:620
      __do_sys_read fs/read_write.c:630 [inline]
      __se_sys_read fs/read_write.c:628 [inline]
      __x64_sys_read+0x3e/0x50 fs/read_write.c:628
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      value changed: 0x00 -> 0x01
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 20423 Comm: udevd Tainted: G W 5.19.0-rc2-syzkaller-dirty #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cc26c266
    • Claudiu Manoil's avatar
      phy: aquantia: Fix AN when higher speeds than 1G are not advertised · 9b7fd167
      Claudiu Manoil authored
      Even when the eth port is restricted to speeds no higher than 1G, and
      the eth driver therefore asks the phy (via phylink) to advertise at most
      1000BASE-T support, the aquantia phy device still advertises 2.5G and
      5G speeds.
      Clear these advertising defaults when requested.
      
      Cc: Ondrej Spacek <ondrej.spacek@nxp.com>
      Fixes: 09c4c57f ("net: phy: aquantia: add support for auto-negotiation configuration")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Link: https://lore.kernel.org/r/20220610084037.7625-1-claudiu.manoil@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9b7fd167
    • Alexei Starovoitov's avatar
      Merge branch 'bpf: Fix cookie values for kprobe multi' · a4a8b2ee
      Alexei Starovoitov authored
      Jiri Olsa says:
      
      ====================
      
      hi,
      there's a bug in the kprobe_multi link that misplaces cookies when
      using symbols to attach. The reason is that we sort symbols by name
      but not the adjacent cookie values. The current test did not find it
      because bpf_fentry_test* are already sorted by name.
      
      v3 changes:
        - fixed kprobe_multi bench test to filter out invalid entries
          from available_filter_functions
      
      v2 changes:
        - rebased on top of bpf/master
        - checking if cookies are defined later in swap function [Andrii]
        - added acks
      
      thanks,
      jirka
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a4a8b2ee
    • Jiri Olsa's avatar
      selftest/bpf: Fix kprobe_multi bench test · 73006702
      Jiri Olsa authored
      With [1], the available_filter_functions file contains records
      starting with __ftrace_invalid_address___ that mark disabled
      entries.
      
      We need to filter them out so that the bench test passes only
      resolvable symbols to the kernel.
      
      [1] commit b39181f7 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
      
      Fixes: b39181f7 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20220615112118.497303-5-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      73006702
    • Jiri Olsa's avatar
      bpf: Force cookies array to follow symbols sorting · eb5fb032
      Jiri Olsa authored
      When the user specifies symbols and cookies for the kprobe_multi link
      interface, it's very likely the cookies will be misplaced and
      returned to the wrong functions (via the get_attach_cookie helper).
      
      The reason is that to resolve the provided functions we sort
      them before passing them to ftrace_lookup_symbols, but we do
      not apply the same sort to the cookie values.
      
      Fix this by using the sort_r function with a custom swap callback
      that swaps the cookie values as well.
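
The idea of sorting the symbols while keeping each cookie attached to its symbol can be sketched in plain userspace C. This is only an illustration under stated assumptions: `sort_syms_with_cookies` is a hypothetical name, and an insertion sort stands in for the kernel's sort_r with a custom swap callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel code): sort syms[] by name and
 * apply every swap to cookies[] as well, so that cookies[i] still
 * belongs to syms[i] afterwards. cookies may be NULL, in which case
 * only the symbols are reordered.
 */
void sort_syms_with_cookies(const char **syms, uint64_t *cookies, size_t cnt)
{
	assert(syms != NULL || cnt == 0);

	for (size_t i = 1; i < cnt; i++) {
		for (size_t j = i; j > 0 && strcmp(syms[j - 1], syms[j]) > 0; j--) {
			/* Swap the two symbol pointers. */
			const char *s = syms[j];
			syms[j] = syms[j - 1];
			syms[j - 1] = s;

			/* Mirror the swap on the cookies, if any were given. */
			if (cookies) {
				uint64_t c = cookies[j];
				cookies[j] = cookies[j - 1];
				cookies[j - 1] = c;
			}
		}
	}
}
```

Sorting only `syms` and leaving `cookies` untouched reproduces the mismatch described above whenever the input is not already in name order.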
      
      Fixes: 0236fec5 ("bpf: Resolve symbols with ftrace_lookup_symbols for kprobe multi link")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20220615112118.497303-4-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      eb5fb032
    • Jiri Olsa's avatar
      ftrace: Keep address offset in ftrace_lookup_symbols · eb1b2985
      Jiri Olsa authored
      We want to store the resolved address at the same index as
      the symbol string, because that is what the user code
      (the bpf kprobe link) assumes.
      
      Also make sure we don't store duplicates that might be
      present in kallsyms.
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Fixes: bed0d9a5 ("ftrace: Add ftrace_lookup_symbols function")
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20220615112118.497303-3-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      eb1b2985
    • Jiri Olsa's avatar
      selftests/bpf: Shuffle cookies symbols in kprobe multi test · ad884853
      Jiri Olsa authored
      There's a kernel bug that causes cookies to be misplaced and
      the reason we did not catch this with this test is that we
      provide bpf_fentry_test* functions already sorted by name.
      
      Shuffling function bpf_fentry_test2 deeper in the list and
      keeping the current cookie values as before will trigger
      the bug.
      
      The kernel fix is coming in the following changes.
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Link: https://lore.kernel.org/r/20220615112118.497303-2-jolsa@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      ad884853
  6. 16 Jun, 2022 5 commits
  7. 15 Jun, 2022 9 commits
    • Linus Torvalds's avatar
      Merge tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 30306f61
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - Correctly handle vm_map areas in hardened usercopy (Matthew Wilcox)
      
       - Adjust CFI RCU usage to avoid boot splats with cpuidle (Sami Tolvanen)
      
      * tag 'hardening-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        usercopy: Make usercopy resilient against ridiculously large copies
        usercopy: Cast pointer to an integer once
        usercopy: Handle vm_map_ram() areas
        cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle
      30306f61
    • Linus Torvalds's avatar
      Merge tag 'tpmdd-next-v5.19-rc3' of... · afe9eb14
      Linus Torvalds authored
      Merge tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
      
      Pull tpm fixes from Jarkko Sakkinen:
       "Two fixes for this merge window"
      
      * tag 'tpmdd-next-v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
        certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build
        certs/blacklist_hashes.c: fix const confusion in certs blacklist
      afe9eb14
    • Masahiro Yamada's avatar
      certs: fix and refactor CONFIG_SYSTEM_BLACKLIST_HASH_LIST build · 27b5b22d
      Masahiro Yamada authored
      Commit addf4663 ("certs: Check that builtin blacklist hashes are
      valid") was applied 8 months after the submission.
      
      In the meantime, the base code had been removed by commit b8c96a6b
      ("certs: simplify $(srctree)/ handling and remove config_filename
      macro").
      
      Fix the Makefile.
      
      Create a local copy of $(CONFIG_SYSTEM_BLACKLIST_HASH_LIST). It is
      included from certs/blacklist_hashes.c and also works as a timestamp.
      
      Send error messages from check-blacklist-hashes.awk to stderr instead
      of stdout.
      
      Fixes: addf4663 ("certs: Check that builtin blacklist hashes are valid")
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      27b5b22d
    • Masahiro Yamada's avatar
      certs/blacklist_hashes.c: fix const confusion in certs blacklist · 6a1c3767
      Masahiro Yamada authored
      This file fails to compile as follows:
      
        CC      certs/blacklist_hashes.o
      certs/blacklist_hashes.c:4:1: error: ignoring attribute ‘section (".init.data")’ because it conflicts with previous ‘section (".init.rodata")’ [-Werror=attributes]
          4 | const char __initdata *const blacklist_hashes[] = {
            | ^~~~~
      In file included from certs/blacklist_hashes.c:2:
      certs/blacklist.h:5:38: note: previous declaration here
          5 | extern const char __initconst *const blacklist_hashes[];
            |                                      ^~~~~~~~~~~~~~~~
      
      Apply the same fix as commit 2be04df5 ("certs/blacklist_nohashes.c:
      fix const confusion in certs blacklist").
      
      Fixes: 734114f8 ("KEYS: Add a system blacklist keyring")
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Reviewed-by: Mickaël Salaün <mic@linux.microsoft.com>
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      6a1c3767
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Limit maximum modifier chain length in btf_check_type_tags · d1a374a1
      Kumar Kartikeya Dwivedi authored
      When processing the BTF of a module built for an older kernel, we might
      sometimes find that some type points to itself, forming a loop. If such
      a type is a modifier, the while loop in btf_check_type_tags that follows
      the modifier chain will never terminate.
      
      Fix this by defining a maximum chain length and bailing out if we spin
      any longer than that.
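
A bounded chain walk of this kind might look like the following userspace sketch. It is not the kernel's btf_check_type_tags code: `struct toy_type`, `check_modifier_chain`, and the limit of 32 are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit for this sketch; not taken from the kernel. */
#define MAX_CHAIN_LEN 32

/* Toy stand-in for a BTF type: a modifier refers to another type. */
struct toy_type {
	bool is_modifier; /* wrapper such as const/volatile/typedef */
	size_t next;      /* index of the type this one refers to */
};

/*
 * Follow the modifier chain starting at `start`. Returns 0 when the
 * chain reaches a non-modifier type within MAX_CHAIN_LEN steps, and
 * -1 when the chain is too long (possibly a cycle) or leaves the table.
 */
int check_modifier_chain(const struct toy_type *types, size_t nr_types,
			 size_t start)
{
	size_t id = start;
	int steps = 0;

	assert(types != NULL);

	while (id < nr_types && types[id].is_modifier) {
		if (++steps > MAX_CHAIN_LEN)
			return -1; /* bail out instead of spinning forever */
		id = types[id].next;
	}
	return id < nr_types ? 0 : -1;
}
```

A type whose `next` points back at itself makes the unbounded version of this loop spin forever; with the step counter the walk gives up after a fixed number of hops.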
      
      Fixes: eb596b09 ("bpf: Ensure type tags precede modifiers in BTF")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20220615042151.2266537-1-memxor@gmail.com
      d1a374a1
    • Linus Torvalds's avatar
      Merge tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux · 979086f5
      Linus Torvalds authored
      Pull vfs idmapping fix from Christian Brauner:
       "This fixes an issue where we fail to change the group of a file when
        the caller owns the file and is a member of the group to change to.
      
        This is only relevant on idmapped mounts.
      
        There's a detailed description in the commit message and regression
        tests have been added to xfstests"
      
      * tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
        fs: account for group membership
      979086f5
    • Jon Maxwell's avatar
      bpf: Fix request_sock leak in sk lookup helpers · 3046a827
      Jon Maxwell authored
      A customer reported a request_socket leak in a Calico cloud environment. We
      found that a BPF program was doing a socket lookup, which takes a refcnt on
      the socket, and that it was finding the request_socket but returning the
      parent LISTEN socket via sk_to_full_sk() without first decrementing the
      child request socket's refcnt, resulting in a request_sock slab object leak.
      This patch retains the existing behaviour of returning full socks to the
      caller, but it also decrements the child request_socket's refcnt, if one is
      present, before doing so to prevent the leak.
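
The reference-count imbalance can be modelled with a toy userspace sketch. All names here (`struct toy_sock`, `toy_lookup`, `toy_to_full_sk`, `toy_lookup_full`) are hypothetical stand-ins, and the model deliberately ignores the RCU flag validation on the listen socket that the real patch performs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy socket: a request (child) sock points at its parent listener. */
struct toy_sock {
	int refcnt;
	struct toy_sock *listener; /* NULL for a full socket */
};

/* Stand-in for the lookup: takes a reference on whatever it finds. */
struct toy_sock *toy_lookup(struct toy_sock *found)
{
	assert(found != NULL);
	found->refcnt++;
	return found;
}

/* Stand-in for sk_to_full_sk(): map a request sock to its listener. */
struct toy_sock *toy_to_full_sk(struct toy_sock *sk)
{
	return sk->listener ? sk->listener : sk;
}

/*
 * Return a full socket to the caller. If the lookup found a request
 * sock, drop the reference taken on it before handing back the
 * listener; omitting the decrement is the leak in this model.
 */
struct toy_sock *toy_lookup_full(struct toy_sock *found)
{
	struct toy_sock *sk = toy_lookup(found);
	struct toy_sock *full = toy_to_full_sk(sk);

	if (full != sk) {
		sk->refcnt--; /* release the child request sock */
		sk = full;
	}
	return sk;
}
```

In this model a full socket keeps the reference taken by the lookup, which the caller later releases, while a request sock ends up with the same count it started with.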
      
      Thanks to Curtis Taylor for all the help in diagnosing and testing this. And
      thanks to Antoine Tenart for the reproducer and patch input.
      
      v2 of this patch refactors the code as per Daniel Borkmann's suggestions to
      validate RCU flags on the listen socket so that it balances with
      bpf_sk_release(), and updates comments as per Martin KaFai Lau's suggestion.
      One small change to Daniel's suggestion: put "sk = sk2" under
      "if (sk2 != sk)" to avoid an extra instruction.
      
      Fixes: f7355a6c ("bpf: Check sk_fullsock() before returning from bpf_sk_lookup()")
      Fixes: edbf8c01 ("bpf: add skc_lookup_tcp helper")
      Co-developed-by: Antoine Tenart <atenart@kernel.org>
      Signed-off-by: Antoine Tenart <atenart@kernel.org>
      Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Curtis Taylor <cutaylor-pub@yahoo.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/56d6f898-bde0-bb25-3427-12a330b29fb8@iogearbox.net
      Link: https://lore.kernel.org/bpf/20220615011540.813025-1-jmaxwell37@gmail.com
      3046a827
    • Duoming Zhou's avatar
      net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg · 219b51a6
      Duoming Zhou authored
      The skb_recv_datagram() in ax25_recvmsg() will hold lock_sock
      and block until it receives a packet from the remote. If the client
      doesn't connect to the server and calls read() directly, it will never
      receive any packets. As a result, a deadlock occurs.
      
      The fail log caused by deadlock is shown below:
      
      [  369.606973] INFO: task ax25_deadlock:157 blocked for more than 245 seconds.
      [  369.608919] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  369.613058] Call Trace:
      [  369.613315]  <TASK>
      [  369.614072]  __schedule+0x2f9/0xb20
      [  369.615029]  schedule+0x49/0xb0
      [  369.615734]  __lock_sock+0x92/0x100
      [  369.616763]  ? destroy_sched_domains_rcu+0x20/0x20
      [  369.617941]  lock_sock_nested+0x6e/0x70
      [  369.618809]  ax25_bind+0xaa/0x210
      [  369.619736]  __sys_bind+0xca/0xf0
      [  369.620039]  ? do_futex+0xae/0x1b0
      [  369.620387]  ? __x64_sys_futex+0x7c/0x1c0
      [  369.620601]  ? fpregs_assert_state_consistent+0x19/0x40
      [  369.620613]  __x64_sys_bind+0x11/0x20
      [  369.621791]  do_syscall_64+0x3b/0x90
      [  369.622423]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      [  369.623319] RIP: 0033:0x7f43c8aa8af7
      [  369.624301] RSP: 002b:00007f43c8197ef8 EFLAGS: 00000246 ORIG_RAX: 0000000000000031
      [  369.625756] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f43c8aa8af7
      [  369.626724] RDX: 0000000000000010 RSI: 000055768e2021d0 RDI: 0000000000000005
      [  369.628569] RBP: 00007f43c8197f00 R08: 0000000000000011 R09: 00007f43c8198700
      [  369.630208] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff845e6afe
      [  369.632240] R13: 00007fff845e6aff R14: 00007f43c8197fc0 R15: 00007f43c8198700
      
      This patch replaces skb_recv_datagram() with an open-coded variant of it
      that releases the socket lock before the __skb_wait_for_more_packets() call
      and re-acquires it afterwards, so that other functions that need the
      socket lock can run.
      
      What's more, the socket lock is released only when recvmsg() would
      block, which should produce nicer overall behavior.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Suggested-by: Thomas Osterried <thomas@osterried.de>
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reported-by: Thomas Habets <thomas@habets.se>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      219b51a6
    • Jose Alonso's avatar
      net: usb: ax88179_178a needs FLAG_SEND_ZLP · 36a15e1c
      Jose Alonso authored
      The extra byte inserted by usbnet.c when
       (length % dev->maxpacket == 0) causes problems for the device.
      
      This patch sets FLAG_SEND_ZLP to avoid this.
      
      Tested with: 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
      
      Problems observed:
      ======================================================================
      1) Using ssh/sshfs. The remote sshd daemon can abort with the message:
         "message authentication code incorrect"
         This happens because the tcp message sent is corrupted during the
         USB "Bulk out". The device calculates the tcp checksum and sends a
         valid tcp message to the remote sshd. Then the encryption detects
         the error and aborts.
      2) NETDEV WATCHDOG: ... (ax88179_178a): transmit queue 0 timed out
      3) Normal operation stops without any log message.
         The "Bulk in" continues receiving packets normally.
         The host sends "Bulk out" and the device responds with -ECONNRESET.
         (The tx_complete code in usbnet.c ignores -ECONNRESET)
      Under normal conditions these errors take days to appear; under
      intense usage they take hours.
      
      A test with ping gives packet loss, showing that something is wrong:
      ping -4 -s 462 {destination}	# 462 = 512 - 42 - 8
      Not all packets fail.
      My guess is that the device tries to find another packet starting
      at the extra byte and will fail or not depending on the next
      bytes (old buffer content).
      ======================================================================
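
The padding decision described above can be sketched as a small model. It is a simplification, not the usbnet.c code: `struct tx_plan` and `plan_bulk_out` are hypothetical names, and the bulk-out packet size of 512 bytes is an assumption suggested by the "462 = 512 - 42 - 8" arithmetic in the ping test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of how one frame is sent on bulk-out. */
struct tx_plan {
	size_t transfer_len;  /* bytes placed on the bulk-out endpoint */
	bool zero_len_packet; /* end the transfer with a zero-length packet */
};

/*
 * Model of the choice made when the frame length is an exact multiple
 * of the endpoint packet size: without the send-ZLP flag one extra pad
 * byte is appended; with it the length is left alone and a zero-length
 * packet terminates the transfer.
 */
struct tx_plan plan_bulk_out(size_t length, size_t maxpacket, bool send_zlp)
{
	struct tx_plan plan = { length, false };

	assert(maxpacket > 0);

	if (length % maxpacket == 0) {
		if (send_zlp)
			plan.zero_len_packet = true;
		else
			plan.transfer_len = length + 1; /* the extra byte */
	}
	return plan;
}
```

With these assumptions, the ping payload of 462 bytes plus the 42 and 8 bytes of overhead gives a 512-byte transfer, which is exactly the case where the pad byte would otherwise be appended.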
      Signed-off-by: Jose Alonso <joalonsof@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      36a15e1c