1. 07 Dec, 2020 1 commit
    • bpf: Avoid overflows involving hash elem_size · e1868b9e
      Eric Dumazet authored
      Use of bpf_map_charge_init() was making sure hash tables would not use more
      than 4GB of memory.
      
      Since the implicit check disappeared, we have to be more careful
      about overflows, to support big hash tables.
      
      syzbot triggers a panic using:
      
      bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_LRU_HASH, key_size=16384, value_size=8,
                           max_entries=262200, map_flags=0, inner_map_fd=-1, map_name="",
                           map_ifindex=0, btf_fd=-1, btf_key_type_id=0, btf_value_type_id=0,
                           btf_vmlinux_value_type_id=0}, 64) = ...
      
      BUG: KASAN: vmalloc-out-of-bounds in bpf_percpu_lru_populate kernel/bpf/bpf_lru_list.c:594 [inline]
      BUG: KASAN: vmalloc-out-of-bounds in bpf_lru_populate+0x4ef/0x5e0 kernel/bpf/bpf_lru_list.c:611
      Write of size 2 at addr ffffc90017e4a020 by task syz-executor.5/19786
      
      CPU: 0 PID: 19786 Comm: syz-executor.5 Not tainted 5.10.0-rc3-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0x5/0x4c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
       bpf_percpu_lru_populate kernel/bpf/bpf_lru_list.c:594 [inline]
       bpf_lru_populate+0x4ef/0x5e0 kernel/bpf/bpf_lru_list.c:611
       prealloc_init kernel/bpf/hashtab.c:319 [inline]
       htab_map_alloc+0xf6e/0x1230 kernel/bpf/hashtab.c:507
       find_and_alloc_map kernel/bpf/syscall.c:123 [inline]
       map_create kernel/bpf/syscall.c:829 [inline]
       __do_sys_bpf+0xa81/0x5170 kernel/bpf/syscall.c:4336
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45deb9
      Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fd93fbc0c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 0000000000001a40 RCX: 000000000045deb9
      RDX: 0000000000000040 RSI: 0000000020000280 RDI: 0000000000000000
      RBP: 000000000119bf60 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000119bf2c
      R13: 00007ffc08a7be8f R14: 00007fd93fbc19c0 R15: 000000000119bf2c
      
      Fixes: 755e5d55 ("bpf: Eliminate rlimit-based memory accounting for hashtab maps")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/bpf/20201207182821.3940306-1-eric.dumazet@gmail.com
      e1868b9e
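The arithmetic behind the report can be checked by hand. The following is a minimal sketch, not the kernel code: it assumes the per-element size is the key and value sizes each rounded up to 8 bytes, ignores the element header (which would only make the product larger), and uses hypothetical helper names. With the syzbot parameters, the byte count exceeds 2^32, so a product computed in 32 bits wraps to a much smaller value than the table actually needs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round n up to a multiple of 8. */
static uint32_t round_up8(uint32_t n)
{
    return (n + 7u) & ~7u;
}

/* Assumption: element size without any header, key and value padded to 8. */
static uint32_t elem_size_no_header(uint32_t key_size, uint32_t value_size)
{
    return round_up8(key_size) + round_up8(value_size);
}

/* Product computed in 32 bits: wraps modulo 2^32. */
static uint32_t total_bytes_u32(uint32_t elem_size, uint32_t max_entries)
{
    return elem_size * max_entries;
}

/* Product widened to 64 bits before multiplying: no wrap. */
static uint64_t total_bytes_u64(uint32_t elem_size, uint32_t max_entries)
{
    return (uint64_t)elem_size * max_entries;
}

/* Returns 1 if the 32-bit product would wrap, 0 otherwise. */
static int overflows_u32(uint32_t elem_size, uint32_t max_entries)
{
    return total_bytes_u64(elem_size, max_entries) > UINT32_MAX;
}

/* Sanity check on the parameters quoted in the syzbot report. */
static void check_syzbot_case(void)
{
    uint32_t es = elem_size_no_header(16384, 8);
    assert(overflows_u32(es, 262200));
}
```

With key_size=16384, value_size=8 and max_entries=262200, the widened product is 4297982400 bytes, while the 32-bit product wraps to 3015104. An allocation sized from the wrapped value would be far too small for the elements later written into it.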
  2. 04 Dec, 2020 33 commits
  3. 03 Dec, 2020 6 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 55fd59b0
      Jakub Kicinski authored
      Conflicts:
      	drivers/net/ethernet/ibm/ibmvnic.c
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      55fd59b0
    • Merge tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · bbe2ba04
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.10-rc7, including fixes from bpf, netfilter,
        wireless drivers, wireless mesh and can.
      
        Current release - regressions:
      
         - mt76: usb: fix crash on device removal
      
        Current release - always broken:
      
         - xsk: Fix umem cleanup from wrong context in socket destruct
      
        Previous release - regressions:
      
         - net: ip6_gre: set dev->hard_header_len when using header_ops
      
         - ipv4: Fix TOS mask in inet_rtm_getroute()
      
         - net, xsk: Avoid taking multiple skbuff references
      
        Previous release - always broken:
      
         - net/x25: prevent a couple of overflows
      
         - netfilter: ipset: prevent uninit-value in hash_ip6_add
      
         - geneve: pull IP header before ECN decapsulation
      
         - mpls: ensure LSE is pullable in TC and openvswitch paths
      
         - vxlan: respect needed_headroom of lower device
      
         - batman-adv: Consider fragmentation for needed packet headroom
      
         - can: drivers: don't count arbitration loss as an error
      
         - netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal
      
         - inet_ecn: Fix endianness of checksum update when setting ECT(1)
      
         - ibmvnic: fix various corner cases around reset handling
      
         - net/mlx5: fix rejecting unsupported Connect-X6DX SW steering
      
         - net/mlx5: Enforce HW TX csum offload with kTLS"
      
      * tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
        net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering
        net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS
        net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled
        net/mlx5: Fix wrong address reclaim when command interface is down
        net/sched: act_mpls: ensure LSE is pullable before reading it
        net: openvswitch: ensure LSE is pullable before reading it
        net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl
        net: mvpp2: Fix error return code in mvpp2_open()
        chelsio/chtls: fix a double free in chtls_setkey()
        rtw88: debug: Fix uninitialized memory in debugfs code
        vxlan: fix error return code in __vxlan_dev_create()
        net: pasemi: fix error return code in pasemi_mac_open()
        cxgb3: fix error return code in t3_sge_alloc_qset()
        net/x25: prevent a couple of overflows
        dpaa_eth: copy timestamp fields to new skb in A-050385 workaround
        net: ip6_gre: set dev->hard_header_len when using header_ops
        mt76: usb: fix crash on device removal
        iwlwifi: pcie: add some missing entries for AX210
        iwlwifi: pcie: invert values of NO_160 device config entries
        iwlwifi: pcie: add one missing entry for AX210
        ...
      bbe2ba04
    • Merge branch 'mptcp-reject-invalid-mp_join-requests-right-away' · a4390e96
      Jakub Kicinski authored
      Florian Westphal says:
      
      ====================
      mptcp: reject invalid mp_join requests right away
      
      At the moment MPTCP can detect an invalid join request (invalid token,
      max number of subflows reached, and so on) right away but cannot reject
      the connection until the 3WHS has completed.
      Instead the connection will complete and the subflow is reset afterwards.
      
      To send the reset, most information is already available, but we don't have
      a good spot where the reset could be sent:
      
      1. The ->init_req callback is too early and also doesn't allow returning an
         error that could be used to inform the TCP stack that the SYN should be
         dropped.
      
      2. The ->route_req callback lacks the skb needed to send a reset.
      
      3. The ->send_synack callback is the best fit from the available hooks,
         but it's called after the request socket has been inserted into the queue
         already. This means we'd have to remove it again right away.
      
      From a technical point of view, the second hook would be best:
       1. It's before insertion into the listener queue.
       2. If it returns NULL TCP will drop the packet for us.
      
      Problem is that we'd have to pass the skb to the function just for MPTCP.
      
      Paolo suggested to merge init_req and route_req callbacks instead:
      This makes all info available to MPTCP -- a return value of NULL drops the
      packet and MPTCP can send the reset if needed.
      
      Because 'route_req' has a 'const struct sock *', this means either removal
      of const qualifier, or a bit of code churn to pass 'const' in security land.
      
      This does the latter; I did not find any spots that need write access to struct
      sock.
      
      To recap, the two alternatives are:
      1. Solve it entirely in MPTCP: use the ->send_synack callback to
         unlink the request socket from the listener & drop it.
      2. Avoid 'security' churn by removing the const qualifier.
      ====================
      
      Link: https://lore.kernel.org/r/20201130153631.21872-1-fw@strlen.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      a4390e96
    • mptcp: emit tcp reset when a join request fails · 3ecfbe3e
      Florian Westphal authored
      RFC 8684 says:
       If the token is unknown or the host wants to refuse subflow establishment
       (for example, due to a limit on the number of subflows it will permit),
       the receiver will send back a reset (RST) signal, analogous to an unknown
       port in TCP, containing an MP_TCPRST option (Section 3.6) with an
       "MPTCP specific error" reason code.
      
      mptcp-next doesn't support MP_TCPRST yet; this can be added in another
      change.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3ecfbe3e
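The rule quoted from RFC 8684 amounts to a small decision: a join is refused with a reset when the token is unknown or when the connection already carries its permitted number of subflows. A rough sketch follows, assuming a flat table of connections; every type and function name here is hypothetical and none of it is taken from the MPTCP implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum join_verdict { JOIN_ACCEPT, JOIN_RESET };

/* Hypothetical model of an MPTCP connection, not a kernel structure. */
struct conn_model {
    uint32_t token;
    unsigned subflows;
    unsigned max_subflows;
};

/* Linear lookup by token; returns NULL when the token is unknown. */
static const struct conn_model *lookup_token(const struct conn_model *tbl,
                                             size_t n, uint32_t token)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].token == token)
            return &tbl[i];
    return NULL;
}

/* Decide whether a join request should be answered with a reset. */
static enum join_verdict check_join(const struct conn_model *tbl, size_t n,
                                    uint32_t token)
{
    const struct conn_model *conn = lookup_token(tbl, n, token);

    if (conn == NULL)
        return JOIN_RESET;               /* unknown token */
    if (conn->subflows >= conn->max_subflows)
        return JOIN_RESET;               /* subflow limit reached */
    return JOIN_ACCEPT;
}
```

For a table holding one connection with room for another subflow and one that is full, `check_join` accepts a join on the first and returns `JOIN_RESET` for the second and for any token not in the table.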
    • tcp: merge 'init_req' and 'route_req' functions · 7ea851d1
      Florian Westphal authored
      The Multipath-TCP standard (RFC 8684) says that an MPTCP host should send
      a TCP reset if the token in a MP_JOIN request is unknown.
      
      At this time we don't do this: the 3WHS completes and the 'new subflow'
      is reset afterwards.  There are two ways to allow MPTCP to send the
      reset.
      
      1. override 'send_synack' callback and emit the rst from there.
         The drawback is that the request socket gets inserted into the
         listener's queue just to get removed again right away.
      
      2. Send the reset from the 'route_req' function instead.
         This avoids the 'add&remove request socket', but route_req lacks the
         skb that is required to send the TCP reset.
      
      Instead of just adding the skb to that function for MPTCP's sake alone,
      Paolo suggested to merge init_req and route_req functions.
      
      This saves one indirection in the SYN processing path and provides the skb
      to the merged function at the same time.
      
      'send reset on unknown mptcp join token' is added in next patch.
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7ea851d1
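The shape of the merged callback described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the structures are stand-ins rather than kernel types, the names are hypothetical, and "sending a reset" is reduced to setting a flag. The point it shows is the ordering: one callback sees the skb, does the request initialization and the routing step, and a NULL return makes the caller drop the SYN before the request is ever queued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins, not kernel structures. */
struct skb_model      { uint32_t join_token; };
struct req_model      { uint32_t token; int reset_sent; };
struct dst_model      { int id; };
struct listener_model { uint32_t known_token; int queued; };

/* Merged callback: initialize the request from the skb and return a
 * route, or NULL to ask the caller to drop the SYN. */
typedef struct dst_model *(*route_req_fn)(const struct listener_model *l,
                                          const struct skb_model *skb,
                                          struct req_model *req);

static struct dst_model the_route = { 1 };

static struct dst_model *join_route_req(const struct listener_model *l,
                                        const struct skb_model *skb,
                                        struct req_model *req)
{
    assert(l != NULL && skb != NULL && req != NULL);

    req->token = skb->join_token;   /* work formerly done by init_req */
    req->reset_sent = 0;

    if (req->token != l->known_token) {
        req->reset_sent = 1;        /* stands in for emitting a TCP RST */
        return NULL;
    }
    return &the_route;              /* work formerly done by route_req */
}

/* Caller: returns 1 if the request was queued, 0 if it was dropped
 * before queue insertion. */
static int conn_request(struct listener_model *l, const struct skb_model *skb,
                        struct req_model *req, route_req_fn route_req)
{
    if (route_req(l, skb, req) == NULL)
        return 0;
    l->queued++;
    return 1;
}
```

In this model a request with an unknown token never increments `queued`, which mirrors the stated goal of avoiding the add-and-remove of a request socket.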
    • security: add const qualifier to struct sock in various places · 41dd9596
      Florian Westphal authored
      A followup change to tcp_request_sock_op would have to drop the 'const'
      qualifier from the 'route_req' function as the
      'security_inet_conn_request' call is moved there - and that function
      expects a 'struct sock *'.
      
      However, it turns out it's also possible to add a const qualifier to
      security_inet_conn_request instead.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: James Morris <jamorris@linux.microsoft.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      41dd9596
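The trade-off in this change is ordinary C const-correctness. As a small sketch with hypothetical stand-in types and function names: when a hook only reads from the socket, declaring its parameter as a pointer to const lets a caller that itself holds a `const` pointer invoke it without a cast, so the caller's signature does not have to be loosened.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a socket; not the kernel's struct sock. */
struct sock_model {
    int family;
    unsigned secid;
};

/* Hypothetical security hook: it only reads from the socket, so the
 * parameter can be a pointer to const. */
static int security_conn_request_model(const struct sock_model *sk,
                                       unsigned *req_secid)
{
    assert(sk != NULL && req_secid != NULL);
    *req_secid = sk->secid;
    return 0;
}

/* Hypothetical caller that holds a const pointer. Because the hook also
 * takes a const pointer, no cast or signature change is needed here. */
static int route_req_model(const struct sock_model *sk, unsigned *req_secid)
{
    return security_conn_request_model(sk, req_secid);
}
```

Had the hook taken a plain `struct sock_model *`, the call inside `route_req_model` would discard the qualifier and draw a compiler diagnostic, which is the situation the commit avoids by adding const on the security side.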