1. 25 Jun, 2019 1 commit
    • ipvs: fix tinfo memory leak in start_sync_thread · 5db7c8b9
      Julian Anastasov authored
      syzkaller reports a memory leak in start_sync_thread [1]
      
      As Eric points out, kthread may start and stop before the
      threadfn function is called, so there is no chance for the
      data (tinfo in our case) to be released in the thread.
      
      Fix this by releasing tinfo in the controlling code instead.
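
The ownership rule described above can be illustrated with a small user-space model. This is only a sketch, not the kernel code: `struct thread_info`, `struct worker`, `run_and_stop()` and the `live_objects` counter are hypothetical names, and the "thread" is modelled as a plain function call that may or may not happen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-thread data (tinfo in the commit). */
struct thread_info {
	int id;
};

/* Hypothetical model of a worker that may be stopped before it ever runs. */
struct worker {
	struct thread_info *data;
	bool ran;	/* true only if the thread function was entered */
};

int live_objects;	/* allocation counter used to detect leaks */

struct thread_info *info_alloc(int id)
{
	struct thread_info *ti = malloc(sizeof(*ti));

	if (!ti)
		return NULL;
	ti->id = id;
	live_objects++;
	return ti;
}

void info_free(struct thread_info *ti)
{
	if (ti) {
		free(ti);
		live_objects--;
	}
}

/* Thread body: uses the data but does NOT free it. */
void worker_fn(struct worker *w)
{
	w->ran = true;
}

/*
 * Controller: create the worker, optionally let it run, then stop it.
 * The data is released here, so it is freed whether or not worker_fn() ran.
 * Returns 1 if the thread function ran, 0 if it did not, -1 on allocation failure.
 */
int run_and_stop(int id, bool let_it_run)
{
	struct worker w = { .data = info_alloc(id), .ran = false };

	if (!w.data)
		return -1;
	if (let_it_run)
		worker_fn(&w);
	info_free(w.data);	/* released by the controlling code */
	return w.ran ? 1 : 0;
}
```

Calling `run_and_stop(2, false)` models the problematic case: the thread function never executes, yet `live_objects` returns to zero because the controller owns the free.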
      
      [1]
      BUG: memory leak
      unreferenced object 0xffff8881206bf700 (size 32):
       comm "syz-executor761", pid 7268, jiffies 4294943441 (age 20.470s)
       hex dump (first 32 bytes):
         00 40 7c 09 81 88 ff ff 80 45 b8 21 81 88 ff ff  .@|......E.!....
         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
       backtrace:
         [<0000000057619e23>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
         [<0000000057619e23>] slab_post_alloc_hook mm/slab.h:439 [inline]
         [<0000000057619e23>] slab_alloc mm/slab.c:3326 [inline]
         [<0000000057619e23>] kmem_cache_alloc_trace+0x13d/0x280 mm/slab.c:3553
         [<0000000086ce5479>] kmalloc include/linux/slab.h:547 [inline]
         [<0000000086ce5479>] start_sync_thread+0x5d2/0xe10 net/netfilter/ipvs/ip_vs_sync.c:1862
         [<000000001a9229cc>] do_ip_vs_set_ctl+0x4c5/0x780 net/netfilter/ipvs/ip_vs_ctl.c:2402
         [<00000000ece457c8>] nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
         [<00000000ece457c8>] nf_setsockopt+0x4c/0x80 net/netfilter/nf_sockopt.c:115
         [<00000000942f62d4>] ip_setsockopt net/ipv4/ip_sockglue.c:1258 [inline]
         [<00000000942f62d4>] ip_setsockopt+0x9b/0xb0 net/ipv4/ip_sockglue.c:1238
         [<00000000a56a8ffd>] udp_setsockopt+0x4e/0x90 net/ipv4/udp.c:2616
         [<00000000fa895401>] sock_common_setsockopt+0x38/0x50 net/core/sock.c:3130
         [<0000000095eef4cf>] __sys_setsockopt+0x98/0x120 net/socket.c:2078
         [<000000009747cf88>] __do_sys_setsockopt net/socket.c:2089 [inline]
         [<000000009747cf88>] __se_sys_setsockopt net/socket.c:2086 [inline]
         [<000000009747cf88>] __x64_sys_setsockopt+0x26/0x30 net/socket.c:2086
         [<00000000ded8ba80>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:301
         [<00000000893b4ac8>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: syzbot+7e2e50c8adfccd2e5041@syzkaller.appspotmail.com
      Suggested-by: Eric Biggers <ebiggers@kernel.org>
      Fixes: 998e7a76 ("ipvs: Use kthread_run() instead of doing a double-fork via kernel_thread()")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 21 Jun, 2019 1 commit
    • ipvs: defer hook registration to avoid leaks · cf47a0b8
      Julian Anastasov authored
      syzkaller reports a memory leak when registering hooks [1]
      
      As we moved the nf_unregister_net_hooks() call into
      __ip_vs_dev_cleanup(), defer the nf_register_net_hooks()
      call, so that hooks are allocated and freed from the same
      pernet_operations (ipvs_core_dev_ops).
      
      [1]
      BUG: memory leak
      unreferenced object 0xffff88810acd8a80 (size 96):
       comm "syz-executor073", pid 7254, jiffies 4294950560 (age 22.250s)
       hex dump (first 32 bytes):
         02 00 00 00 00 00 00 00 50 8b bb 82 ff ff ff ff  ........P.......
         00 00 00 00 00 00 00 00 00 77 bb 82 ff ff ff ff  .........w......
       backtrace:
         [<0000000013db61f1>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
         [<0000000013db61f1>] slab_post_alloc_hook mm/slab.h:439 [inline]
         [<0000000013db61f1>] slab_alloc_node mm/slab.c:3269 [inline]
         [<0000000013db61f1>] kmem_cache_alloc_node_trace+0x15b/0x2a0 mm/slab.c:3597
         [<000000001a27307d>] __do_kmalloc_node mm/slab.c:3619 [inline]
         [<000000001a27307d>] __kmalloc_node+0x38/0x50 mm/slab.c:3627
         [<0000000025054add>] kmalloc_node include/linux/slab.h:590 [inline]
         [<0000000025054add>] kvmalloc_node+0x4a/0xd0 mm/util.c:431
         [<0000000050d1bc00>] kvmalloc include/linux/mm.h:637 [inline]
         [<0000000050d1bc00>] kvzalloc include/linux/mm.h:645 [inline]
         [<0000000050d1bc00>] allocate_hook_entries_size+0x3b/0x60 net/netfilter/core.c:61
         [<00000000e8abe142>] nf_hook_entries_grow+0xae/0x270 net/netfilter/core.c:128
         [<000000004b94797c>] __nf_register_net_hook+0x9a/0x170 net/netfilter/core.c:337
         [<00000000d1545cbc>] nf_register_net_hook+0x34/0xc0 net/netfilter/core.c:464
         [<00000000876c9b55>] nf_register_net_hooks+0x53/0xc0 net/netfilter/core.c:480
         [<000000002ea868e0>] __ip_vs_init+0xe8/0x170 net/netfilter/ipvs/ip_vs_core.c:2280
         [<000000002eb2d451>] ops_init+0x4c/0x140 net/core/net_namespace.c:130
         [<000000000284ec48>] setup_net+0xde/0x230 net/core/net_namespace.c:316
         [<00000000a70600fa>] copy_net_ns+0xf0/0x1e0 net/core/net_namespace.c:439
         [<00000000ff26c15e>] create_new_namespaces+0x141/0x2a0 kernel/nsproxy.c:107
         [<00000000b103dc79>] copy_namespaces+0xa1/0xe0 kernel/nsproxy.c:165
         [<000000007cc008a2>] copy_process.part.0+0x11fd/0x2150 kernel/fork.c:2035
         [<00000000c344af7c>] copy_process kernel/fork.c:1800 [inline]
         [<00000000c344af7c>] _do_fork+0x121/0x4f0 kernel/fork.c:2369
      
      Reported-by: syzbot+722da59ccb264bc19910@syzkaller.appspotmail.com
      Fixes: 719c7d56 ("ipvs: Fix use-after-free in ip_vs_in")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  3. 19 Jun, 2019 20 commits
    • inet: clear num_timeout reqsk_alloc() · 85f9aa75
      Eric Dumazet authored
      KMSAN caught uninit-value in tcp_create_openreq_child() [1].
      This is caused by a recent change, combined with the fact
      that TCP cleared num_timeout, num_retrans and sk fields only
      when a request socket was about to be queued.
      
      Under syncookie mode, a temporary request socket is used,
      and req->num_timeout could contain garbage.
      
      Let's clear these three fields sooner; there is really no
      point trying to defer this and risk other bugs.
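
The idea of clearing the fields at allocation time, rather than when the object is later queued, might look like the following sketch. `struct req` and `req_alloc()` are hypothetical simplifications and not the kernel's `struct request_sock` or `reqsk_alloc()`; only the three field names come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified request object; not the kernel structure. */
struct req {
	void *sk;
	unsigned char num_retrans;
	unsigned char num_timeout;
	unsigned int other_state;	/* stands in for fields the caller fills in later */
};

/*
 * Allocate a request and clear the three fields immediately, so that a
 * temporary request that is never queued does not carry garbage in them.
 */
struct req *req_alloc(void)
{
	struct req *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->sk = NULL;
	r->num_retrans = 0;
	r->num_timeout = 0;
	return r;
}
```

With this shape, every consumer of a freshly allocated request sees defined values in the three fields, regardless of which code path it takes afterwards.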
      
      [1]
      
      BUG: KMSAN: uninit-value in tcp_create_openreq_child+0x157f/0x1cc0 net/ipv4/tcp_minisocks.c:526
      CPU: 1 PID: 13357 Comm: syz-executor591 Not tainted 5.2.0-rc4+ #3
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x191/0x1f0 lib/dump_stack.c:113
       kmsan_report+0x162/0x2d0 mm/kmsan/kmsan.c:611
       __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:304
       tcp_create_openreq_child+0x157f/0x1cc0 net/ipv4/tcp_minisocks.c:526
       tcp_v6_syn_recv_sock+0x761/0x2d80 net/ipv6/tcp_ipv6.c:1152
       tcp_get_cookie_sock+0x16e/0x6b0 net/ipv4/syncookies.c:209
       cookie_v6_check+0x27e0/0x29a0 net/ipv6/syncookies.c:252
       tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1039 [inline]
       tcp_v6_do_rcv+0xf1c/0x1ce0 net/ipv6/tcp_ipv6.c:1344
       tcp_v6_rcv+0x60b7/0x6a30 net/ipv6/tcp_ipv6.c:1554
       ip6_protocol_deliver_rcu+0x1433/0x22f0 net/ipv6/ip6_input.c:397
       ip6_input_finish net/ipv6/ip6_input.c:438 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_input+0x2af/0x340 net/ipv6/ip6_input.c:447
       dst_input include/net/dst.h:439 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:272
       __netif_receive_skb_one_core net/core/dev.c:4981 [inline]
       __netif_receive_skb net/core/dev.c:5095 [inline]
       process_backlog+0x721/0x1410 net/core/dev.c:5906
       napi_poll net/core/dev.c:6329 [inline]
       net_rx_action+0x738/0x1940 net/core/dev.c:6395
       __do_softirq+0x4ad/0x858 kernel/softirq.c:293
       do_softirq_own_stack+0x49/0x80 arch/x86/entry/entry_64.S:1052
       </IRQ>
       do_softirq kernel/softirq.c:338 [inline]
       __local_bh_enable_ip+0x199/0x1e0 kernel/softirq.c:190
       local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
       rcu_read_unlock_bh include/linux/rcupdate.h:682 [inline]
       ip6_finish_output2+0x213f/0x2670 net/ipv6/ip6_output.c:117
       ip6_finish_output+0xae4/0xbc0 net/ipv6/ip6_output.c:150
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0x5d3/0x720 net/ipv6/ip6_output.c:167
       dst_output include/net/dst.h:433 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0x1f53/0x2650 net/ipv6/ip6_output.c:271
       inet6_csk_xmit+0x3df/0x4f0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x4076/0x5b40 net/ipv4/tcp_output.c:1156
       tcp_transmit_skb net/ipv4/tcp_output.c:1172 [inline]
       tcp_write_xmit+0x39a9/0xa730 net/ipv4/tcp_output.c:2397
       __tcp_push_pending_frames+0x124/0x4e0 net/ipv4/tcp_output.c:2573
       tcp_send_fin+0xd43/0x1540 net/ipv4/tcp_output.c:3118
       tcp_close+0x16ba/0x1860 net/ipv4/tcp.c:2403
       inet_release+0x1f7/0x270 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:470
       __sock_release net/socket.c:601 [inline]
       sock_close+0x156/0x490 net/socket.c:1273
       __fput+0x4c9/0xba0 fs/file_table.c:280
       ____fput+0x37/0x40 fs/file_table.c:313
       task_work_run+0x22e/0x2a0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:185 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:168 [inline]
       prepare_exit_to_usermode+0x39d/0x4d0 arch/x86/entry/common.c:199
       syscall_return_slowpath+0x90/0x5c0 arch/x86/entry/common.c:279
       do_syscall_64+0xe2/0xf0 arch/x86/entry/common.c:305
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x401d50
      Code: 01 f0 ff ff 0f 83 40 0d 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d dd 8d 2d 00 00 75 14 b8 03 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 14 0d 00 00 c3 48 83 ec 08 e8 7a 02 00 00
      RSP: 002b:00007fff1cf58cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000401d50
      RDX: 000000000000001c RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00000000004a9050 R08: 0000000020000040 R09: 000000000000001c
      R10: 0000000020004004 R11: 0000000000000246 R12: 0000000000402ef0
      R13: 0000000000402f80 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:201 [inline]
       kmsan_internal_poison_shadow+0x53/0xa0 mm/kmsan/kmsan.c:160
       kmsan_kmalloc+0xa4/0x130 mm/kmsan/kmsan_hooks.c:177
       kmem_cache_alloc+0x534/0xb00 mm/slub.c:2781
       reqsk_alloc include/net/request_sock.h:84 [inline]
       inet_reqsk_alloc+0xa8/0x600 net/ipv4/tcp_input.c:6384
       cookie_v6_check+0xadb/0x29a0 net/ipv6/syncookies.c:173
       tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1039 [inline]
       tcp_v6_do_rcv+0xf1c/0x1ce0 net/ipv6/tcp_ipv6.c:1344
       tcp_v6_rcv+0x60b7/0x6a30 net/ipv6/tcp_ipv6.c:1554
       ip6_protocol_deliver_rcu+0x1433/0x22f0 net/ipv6/ip6_input.c:397
       ip6_input_finish net/ipv6/ip6_input.c:438 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_input+0x2af/0x340 net/ipv6/ip6_input.c:447
       dst_input include/net/dst.h:439 [inline]
       ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ipv6_rcv+0x683/0x710 net/ipv6/ip6_input.c:272
       __netif_receive_skb_one_core net/core/dev.c:4981 [inline]
       __netif_receive_skb net/core/dev.c:5095 [inline]
       process_backlog+0x721/0x1410 net/core/dev.c:5906
       napi_poll net/core/dev.c:6329 [inline]
       net_rx_action+0x738/0x1940 net/core/dev.c:6395
       __do_softirq+0x4ad/0x858 kernel/softirq.c:293
       do_softirq_own_stack+0x49/0x80 arch/x86/entry/entry_64.S:1052
       do_softirq kernel/softirq.c:338 [inline]
       __local_bh_enable_ip+0x199/0x1e0 kernel/softirq.c:190
       local_bh_enable+0x36/0x40 include/linux/bottom_half.h:32
       rcu_read_unlock_bh include/linux/rcupdate.h:682 [inline]
       ip6_finish_output2+0x213f/0x2670 net/ipv6/ip6_output.c:117
       ip6_finish_output+0xae4/0xbc0 net/ipv6/ip6_output.c:150
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0x5d3/0x720 net/ipv6/ip6_output.c:167
       dst_output include/net/dst.h:433 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ip6_xmit+0x1f53/0x2650 net/ipv6/ip6_output.c:271
       inet6_csk_xmit+0x3df/0x4f0 net/ipv6/inet6_connection_sock.c:135
       __tcp_transmit_skb+0x4076/0x5b40 net/ipv4/tcp_output.c:1156
       tcp_transmit_skb net/ipv4/tcp_output.c:1172 [inline]
       tcp_write_xmit+0x39a9/0xa730 net/ipv4/tcp_output.c:2397
       __tcp_push_pending_frames+0x124/0x4e0 net/ipv4/tcp_output.c:2573
       tcp_send_fin+0xd43/0x1540 net/ipv4/tcp_output.c:3118
       tcp_close+0x16ba/0x1860 net/ipv4/tcp.c:2403
       inet_release+0x1f7/0x270 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:470
       __sock_release net/socket.c:601 [inline]
       sock_close+0x156/0x490 net/socket.c:1273
       __fput+0x4c9/0xba0 fs/file_table.c:280
       ____fput+0x37/0x40 fs/file_table.c:313
       task_work_run+0x22e/0x2a0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:185 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:168 [inline]
       prepare_exit_to_usermode+0x39d/0x4d0 arch/x86/entry/common.c:199
       syscall_return_slowpath+0x90/0x5c0 arch/x86/entry/common.c:279
       do_syscall_64+0xe2/0xf0 arch/x86/entry/common.c:305
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Fixes: 336c39a0 ("tcp: undo init congestion window on false SYNACK timeout")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: debugfs: Add pmap to fs dump · 8110a7a7
      Nathan Huckleberry authored
      There was an unused variable 'mvpp2_dbgfs_prs_pmap_fops'.
      Added a usage consistent with other fops to dump pmap
      to userspace.
      
      Cc: clang-built-linux@googlegroups.com
      Link: https://github.com/ClangBuiltLinux/linux/issues/529
      Signed-off-by: Nathan Huckleberry <nhuck@google.com>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Default fib6_type to RTN_UNICAST when not set · c7036d97
      David Ahern authored
      A user reported that routes are getting installed with type 0 (RTN_UNSPEC)
      where before the routes were RTN_UNICAST. One example is from accel-ppp
      which apparently still uses the ioctl interface and does not set
      rtmsg_type. Another is the netlink interface where ipv6 does not require
      rtm_type to be set (v4 does). Prior to the commit in the Fixes tag the
      ipv6 stack converted type 0 to RTN_UNICAST, so restore that behavior.
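
The restored behavior amounts to a simple defaulting rule, sketched below. The enum and `normalize_route_type()` are hypothetical local names; the values 0 and 1 mirror `RTN_UNSPEC` and `RTN_UNICAST`, and 2 mirrors `RTN_LOCAL`, as defined in the Linux UAPI header `linux/rtnetlink.h`.

```c
#include <assert.h>

/* Local mirror of the route type values; names are hypothetical. */
enum route_type {
	ROUTE_TYPE_UNSPEC = 0,	/* mirrors RTN_UNSPEC */
	ROUTE_TYPE_UNICAST = 1,	/* mirrors RTN_UNICAST */
	ROUTE_TYPE_LOCAL = 2	/* mirrors RTN_LOCAL */
};

/* Treat an unset route type as unicast; leave every other type untouched. */
unsigned char normalize_route_type(unsigned char type)
{
	return type == ROUTE_TYPE_UNSPEC ? ROUTE_TYPE_UNICAST : type;
}
```

A caller that never sets the type, such as the ioctl user mentioned above, would then end up with a unicast route instead of type 0.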
      
      Fixes: e8478e80 ("net/ipv6: Save route type in rt6_info")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: Fix inconsistent indenting · bf6de231
      Krzysztof Kozlowski authored
      Fix wrong indentation of goto return.
      Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'af_iucv-fixes' · 99838e60
      David S. Miller authored
      Julian Wiedmann says:
      
      ====================
      net/af_iucv: fixes 2019-06-18
      
      I spent a few cycles on transmit problems for af_iucv over regular
      netdevices - please apply the following fixes to -net.
      
      The first patch allows for skb allocations outside of GFP_DMA, while the
      second patch respects that drivers might use skb_cow_head() and/or want
      additional dev->needed_headroom.
      Patch 3 is for a separate issue, where we didn't set up some of the
      netdevice-specific infrastructure when running as a z/VM guest.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/af_iucv: always register net_device notifier · 06996c1d
      Julian Wiedmann authored
      Even when running as VM guest (i.e. pr_iucv != NULL), af_iucv can still
      open HiperTransport-based connections. For robust operation these
      connections require the af_iucv_netdev_notifier, so register it
      unconditionally.
      
      Also handle any error that register_netdevice_notifier() returns.
      
      Fixes: 9fbd87d4 ("af_iucv: handle netdev events")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/af_iucv: build proper skbs for HiperTransport · 238965b7
      Julian Wiedmann authored
      The HiperSockets-based transport path in af_iucv is still too closely
      entangled with qeth.
      With commit a647a025 ("s390/qeth: speed-up L3 IQD xmit"), the
      relevant xmit code in qeth has begun to use skb_cow_head(). So to avoid
      unnecessary skb head expansions, af_iucv must learn to
      1) respect dev->needed_headroom when allocating skbs, and
      2) drop the header reference before cloning the skb.
      
      While at it, also stop hard-coding the LL-header creation stage and just
      use the appropriate helper.
      
      Fixes: a647a025 ("s390/qeth: speed-up L3 IQD xmit")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/af_iucv: remove GFP_DMA restriction for HiperTransport · fdbf6326
      Julian Wiedmann authored
      af_iucv sockets over z/VM IUCV require that their skbs are allocated
      in DMA memory. This restriction doesn't apply to connections over
      HiperSockets. So only set this limit for z/VM IUCV sockets, thereby
      increasing the likelihood that the large (and linear!) allocations for
      HiperTransport messages succeed.
      
      Fixes: 3881ac44 ("af_iucv: add HiperSockets transport")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: fix shift of FID bits in mv88e6185_g1_vtu_loadpurge() · 48620e34
      Rasmus Villemoes authored
      The comment is correct, but the code ends up moving the bits four
      places too far, into the VTUOp field.
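
The effect of shifting four places too far can be shown with plain bit arithmetic. The layout used here is an assumption made for illustration, not taken from the commit text: the upper FID nibble (bits 7:4) is assumed to belong in bits 11:8 of a 16-bit operation word, with an operation field above it. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: FID bits 7:4 belong in bits 11:8 of the operation word. */
uint16_t fid_high_bits_to_op(uint8_t fid)
{
	return (uint16_t)((fid & 0xf0) << 4);
}

/* Shifting by 8 instead moves the same bits to 15:12, four places too far. */
uint16_t fid_high_bits_shifted_too_far(uint8_t fid)
{
	return (uint16_t)((fid & 0xf0) << 8);
}
```

Under that assumed layout, a FID of 0xf0 yields 0x0f00 with the intended shift and 0xf000 with the excessive one, which is where it would collide with a field located in the top nibble.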
      
      Fixes: 11ea809f (net: dsa: mv88e6xxx: support 256 databases)
      Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · d470e720
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Module autoload for masquerade and redirection does not work.
      
      2) Leak in unqueued packets in nf_ct_frag6_queue(). Ignore duplicated
         fragments, pretend they are placed into the queue. Patches from
         Guillaume Nault.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hvsock: fix epollout hang from race condition · cb359b60
      Sunil Muthuswamy authored
      Currently, hvsock can enter into a state where epoll_wait on EPOLLOUT will
      not return even when the hvsock socket is writable, under some race
      condition. This can happen under the following sequence:
      - fd = socket(hvsocket)
      - fd_out = dup(fd)
      - fd_in = dup(fd)
      - start a writer thread that writes data to fd_out with a combination of
        epoll_wait(fd_out, EPOLLOUT) and
      - start a reader thread that reads data from fd_in with a combination of
        epoll_wait(fd_in, EPOLLIN)
      - On the host, there are two threads that are reading/writing data to the
        hvsocket
      
      stack:
      hvs_stream_has_space
      hvs_notify_poll_out
      vsock_poll
      sock_poll
      ep_poll
      
      Race condition:
      check for epollout from ep_poll():
      	assume no writable space in the socket
      	hvs_stream_has_space() returns 0
      check for epollin from ep_poll():
      	assume socket has some free space < HVS_PKT_LEN(HVS_SEND_BUF_SIZE)
      	hvs_stream_has_space() will clear the channel pending send size
      	host will not notify the guest because the pending send size has
      		been cleared and so the hvsocket will never mark the
      		socket writable
      
      Now, the EPOLLOUT will never return even if the socket write buffer is
      empty.
      
      The fix is to set the pending size to the default size and never change it.
      This way the host will always notify the guest whenever the writable space
      is bigger than the pending size. The host is already optimized to *only*
      notify the guest when the pending size threshold boundary is crossed and
      not every time.
      
      This change also reduces the CPU usage somewhat since hvs_stream_has_space()
      is in the hot path of send:
      vsock_stream_sendmsg()->hvs_stream_has_space()
      Earlier, hvs_stream_has_space() was setting/clearing the pending size on
      every call.
      Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
      Reviewed-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/udp_gso: Allow TX timestamp with UDP GSO · 76e21533
      Fred Klassen authored
      Fixes an issue where TX Timestamps are not arriving on the error queue
      when UDP_SEGMENT CMSG type is combined with CMSG type SO_TIMESTAMPING.
      This can be illustrated with an updated udpgso_bench_tx program which
      includes the '-T' option to test for this condition. It also introduces
      the '-P' option which will call poll() before reading the error queue.
      
          ./udpgso_bench_tx -4ucTPv -S 1472 -l2 -D 172.16.120.18
          poll timeout
          udp tx:      0 MB/s        1 calls/s      1 msg/s
      
      The "poll timeout" message above indicates that TX timestamp never
      arrived.
      
      This patch preserves tx_flags for the first UDP GSO segment. Only the
      first segment is timestamped, even though in some cases there may be
      benefit in timestamping both the first and last segment.
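
As a rough illustration of "preserve the timestamp flags on the first segment only", consider the sketch below. It is not the kernel implementation: `TX_TSTAMP_MASK` is a hypothetical mask standing in for whichever tx_flags bits request a timestamp, and segments are modelled as a plain array of flag bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mask for the timestamp-related bits of the flags byte. */
#define TX_TSTAMP_MASK 0x03u

/*
 * Copy only the timestamp-related bits of the original packet's flags
 * onto the first segment; all other segments are left unchanged.
 */
void keep_tstamp_on_first_segment(unsigned char gso_flags,
				  unsigned char *seg_flags, size_t nsegs)
{
	assert(seg_flags != NULL || nsegs == 0);
	if (nsegs == 0)
		return;
	seg_flags[0] |= (unsigned char)(gso_flags & TX_TSTAMP_MASK);
}
```

Masking before copying keeps unrelated flag bits of the original packet from leaking into the segment, and only one segment ends up requesting a timestamp.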
      
      Factors in deciding on first segment timestamp only:
      
      - Timestamping both first and last segments is not feasible. Hardware
      can only have one outstanding TS request at a time.
      
      - Timestamping last segment may underreport network latency of the
      previous segments. Even though the doorbell is suppressed, the ring
      producer counter has been incremented.
      
      - Timestamping the first segment has the upside in that it reports
      timestamps from the application's view, e.g. RTT.
      
      - Timestamping the first segment has the downside that it may
      underreport tx host network latency. It appears that we have to pick
      one or the other. And possibly follow-up with a config flag to choose
      behavior.
      
      v2: Remove tests as noted by Willem de Bruijn <willemb@google.com>
          Moving tests from net to net-next
      
      v3: Update only relevant tx_flag bits as per
          Willem de Bruijn <willemb@google.com>
      
      v4: Update comments and commit message as per
          Willem de Bruijn <willemb@google.com>
      
      Fixes: ee80d1eb ("udp: add udp gso")
      Signed-off-by: Fred Klassen <fklassen@appneta.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-netem-fix-issues-with-corrupting-GSO-frames' · e11e1007
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      net: netem: fix issues with corrupting GSO frames
      
      Corrupting GSO frames currently leads to crashes, due to skb use
      after free.  These stem from the skb list handling - the segmented
      skbs come back on a list, and this list is not properly unlinked
      before enqueuing the segments.  Turns out this condition is made
      very likely to occur because of another bug - in backlog accounting.
      Segments are counted twice, which means qdisc's limit gets reached
      leading to drops and making the use after free very likely to happen.
      
      The bugs are fixed in order in which they were added to the tree.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netem: fix use after free and double free with packet corruption · 3e14c383
      Jakub Kicinski authored
      Brendan reports that the use of netem's packet corruption capability
      leads to strange crashes.  This seems to be caused by
      commit d66280b1 ("net: netem: use a list in addition to rbtree")
      which uses skb->next pointer to construct a fast-path queue of
      in-order skbs.
      
      Packet corruption code has to invoke skb_gso_segment() in case
      of skbs in need of GSO.  skb_gso_segment() returns a list of
      skbs.  If next pointers of the skbs on that list do not get cleared
      fast path list may point to freed skbs or skbs which are also on
      the RB tree.
      
      Let's say skb gets segmented into 3 frames:
      
      A -> B -> C
      
      A gets hooked to the t_head t_tail list by tfifo_enqueue(), but its
      next pointer didn't get cleared so we have:
      
      h t
      |/
      A -> B -> C
      
      Now if B and C also get enqueued successfully all is fine, because
      tfifo_enqueue() will overwrite the list in order.  IOW:
      
      Enqueue B:
      
      h    t
      |    |
      A -> B    C
      
      Enqueue C:
      
      h         t
      |         |
      A -> B -> C
      
      But if B and C get reordered we may end up with:
      
      h t            RB tree
      |/                |
      A -> B -> C       B
                         \
                          C
      
      Or if they get dropped just:
      
      h t
      |/
      A -> B -> C
      
      where B and C are already freed.
      
      To reproduce, either the limit has to be set low to cause freeing of
      segs, or reorders have to happen (due to delay jitter).
      
      Note that we only have to mark the first segment as not on the
      list, "finish_segs" handling of other frags already does that.
      
      Another caveat is that qdisc_drop_all() still has to free all
      segments correctly in case of drop of first segment, therefore
      we re-link segs before calling it.
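
The stale-pointer problem walked through above can be reproduced with a minimal singly linked FIFO. This is a sketch under simplifying assumptions: `struct pkt`, `struct fifo` and the helper names are hypothetical stand-ins, and there is no RB tree or freeing, only the list linkage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the list linkage matters here. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Hypothetical fast-path FIFO with head and tail pointers. */
struct fifo {
	struct pkt *head;
	struct pkt *tail;
};

/* Append one packet; as in the scenario above, p->next is not touched. */
void fifo_enqueue(struct fifo *q, struct pkt *p)
{
	if (!q->head)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
}

/* Unlink a segment from the segment list before it is queued on its own. */
void pkt_mark_not_on_list(struct pkt *p)
{
	p->next = NULL;
}

/* Count packets reachable from the FIFO head. */
int fifo_len(const struct fifo *q)
{
	int n = 0;

	for (const struct pkt *p = q->head; p; p = p->next)
		n++;
	return n;
}
```

If a segment list A -> B -> C is built and only A is enqueued, `fifo_len()` reports 3 because B and C remain reachable through A's stale next pointer. Calling `pkt_mark_not_on_list()` on A first makes it report 1, which is the state the queue should be in when B and C are dropped or reordered.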
      
      v2:
       - re-link before drop, v1 was leaking non-first segs if limit
         was hit at the first seg
       - better commit message which led to discovering the above :)
      Reported-by: Brendan Galloway <brendan.galloway@netronome.com>
      Fixes: d66280b1 ("net: netem: use a list in addition to rbtree")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: netem: fix backlog accounting for corrupted GSO frames · 177b8007
      Jakub Kicinski authored
      When a GSO frame has to be corrupted, netem uses skb_gso_segment()
      to produce the list of frames, and re-enqueues the segments one
      by one.  The backlog length has to be adjusted to account for
      the new frames.
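
One way to picture the adjustment is as a delta relative to what was already counted. The sketch below assumes that, before segmentation, the original frame was accounted as one packet of `prev_len` bytes; that assumption, `struct backlog_delta` and `segment_backlog_delta()` are illustrative and not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how much the backlog counters must change. */
struct backlog_delta {
	int packets;
	int bytes;
};

/*
 * Assumption: the unsegmented frame was counted as 1 packet of prev_len
 * bytes. After nb segments are enqueued, the counters must grow by the
 * extra packets and by the difference in total bytes.
 */
struct backlog_delta segment_backlog_delta(int prev_len, const int *seg_len,
					   size_t nb)
{
	struct backlog_delta d = { 0, 0 };
	int len = 0;

	assert(nb > 0);
	for (size_t i = 0; i < nb; i++)
		len += seg_len[i];
	d.packets = (int)nb - 1;
	d.bytes = len - prev_len;
	return d;
}
```

For example, a 3400-byte frame split into segments of 1500, 1500 and 500 bytes adds 2 packets and 100 bytes, the extra bytes coming from headers repeated in each segment.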
      
      The current calculation is incorrect, leading to wrong backlog
      lengths in the parent qdisc (both bytes and packets), and
      incorrect packet backlog count in netem itself.
      
      Parent backlog goes negative, netem's packet backlog counts
      all non-first segments twice (thus remaining non-zero even
      after qdisc is emptied).
      
      Move the variables used to count the adjustment into local
      scope to make 100% sure they aren't used at any stage in
      backports.
      
      Fixes: 6071bd1a ("netem: Segment GSO packets on enqueue")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: lio_core: fix potential sign-extension overflow on large shift · 94762740
      Colin Ian King authored
      Left shifting the signed int value 1 by 31 bits has undefined behaviour
      and the shift amount oq_no can be as much as 63.  Fix this by using
      BIT_ULL(oq_no) instead.
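
The difference between shifting a signed int and building a 64-bit mask can be sketched in user space as follows. `bit_u64()` is a hypothetical local helper playing the role that `BIT_ULL()` plays in the kernel; it is not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build a single-bit mask in a 64-bit unsigned type. Starting from an
 * unsigned 64-bit one keeps every shift amount from 0 to 63 well defined,
 * whereas shifting the int constant 1 by 31 or more is undefined behaviour
 * when int is 32 bits wide.
 */
uint64_t bit_u64(unsigned int n)
{
	assert(n < 64);
	return UINT64_C(1) << n;
}
```

With this helper, bit 31 and bit 63 both produce the expected positive masks instead of relying on a signed shift.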
      
      Addresses-Coverity: ("Bad shift operation")
      Fixes: f21fb3ed ("Add support of Cavium Liquidio ethernet adapters")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-fix-quite-a-few-dst_cache-crashes-reported-by-syzbot' · 55458d2f
      David S. Miller authored
      Xin Long says:
      
      ====================
      net: fix quite a few dst_cache crashes reported by syzbot
      
      There are two kinds of crashes reported many times by syzbot with no
      reproducer. Call Traces are like:
      
           BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190
           net/ipv4/route.c:1556
             rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556
             __mkroute_output net/ipv4/route.c:2332 [inline]
             ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564
             ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393
             __ip_route_output_key include/net/route.h:125 [inline]
             ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651
             ip_route_output_key include/net/route.h:135 [inline]
           ...
      
         or:
      
           kasan: GPF could be caused by NULL-ptr deref or user memory access
           RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168
             <IRQ>
             rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline]
             free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217
             __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
             rcu_do_batch kernel/rcu/tree.c:2437 [inline]
             invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline]
             rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697
           ...
      
      They were caused by the fib_nh_common percpu member 'nhc_pcpu_rth_output'
      being overwritten by an out-of-bounds access to another percpu variable,
      'dev->tstats', in the tipc udp media xmit path when counting packets on a
      non-tunnel device.
      
      The fix is to make udp tunnel work with no tunnel device: Patches 1/3 and
      2/3 allow skipping the packet count on tstats when the tunnel dev is NULL,
      and Patch 3/3 then passes a NULL tunnel dev in tipc_udp_tunnel().
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55458d2f
    • Xin Long's avatar
      tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb · c3bcde02
      Xin Long authored
      udp_tunnel(6)_xmit_skb() called by tipc_udp_xmit() expects a tunnel device
      to count packets on dev->tstats, a percpu variable. However, TIPC uses
      udp tunnel with no tunnel device and passes the lower dev, such as a veth
      device, which only initializes dev->lstats (a percpu variable) on creation.
      
      Later, iptunnel_xmit_stats() called by ip(6)tunnel_xmit() treats the dev as
      a tunnel device and uses dev->tstats instead of dev->lstats. Each tstats
      pointer points to a bigger struct than lstats does, so when
      tstats->tx_bytes is increased, members of other percpu variables could be
      overwritten.
      
      syzbot has reported quite a few crashes due to fib_nh_common percpu member
      'nhc_pcpu_rth_output' overwritten, call traces are like:
      
        BUG: KASAN: slab-out-of-bounds in rt_cache_valid+0x158/0x190
        net/ipv4/route.c:1556
          rt_cache_valid+0x158/0x190 net/ipv4/route.c:1556
          __mkroute_output net/ipv4/route.c:2332 [inline]
          ip_route_output_key_hash_rcu+0x819/0x2d50 net/ipv4/route.c:2564
          ip_route_output_key_hash+0x1ef/0x360 net/ipv4/route.c:2393
          __ip_route_output_key include/net/route.h:125 [inline]
          ip_route_output_flow+0x28/0xc0 net/ipv4/route.c:2651
          ip_route_output_key include/net/route.h:135 [inline]
        ...
      
      or:
      
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        RIP: 0010:dst_dev_put+0x24/0x290 net/core/dst.c:168
          <IRQ>
          rt_fibinfo_free_cpus net/ipv4/fib_semantics.c:200 [inline]
          free_fib_info_rcu+0x2e1/0x490 net/ipv4/fib_semantics.c:217
          __rcu_reclaim kernel/rcu/rcu.h:240 [inline]
          rcu_do_batch kernel/rcu/tree.c:2437 [inline]
          invoke_rcu_callbacks kernel/rcu/tree.c:2716 [inline]
          rcu_process_callbacks+0x100a/0x1ac0 kernel/rcu/tree.c:2697
        ...
      
      The issue has existed since the tunnel stats update was moved to
      iptunnel_xmit() by commit 039f5062 ("ip_tunnel: Move stats update to
      iptunnel_xmit()"). Fix it by passing a NULL tunnel dev to
      udp_tunnel(6)_xmit_skb() so that packets are not counted on dev->tstats.
      
      Reported-by: syzbot+9d4c12bfd45a58738d0a@syzkaller.appspotmail.com
      Reported-by: syzbot+a9e23ea2aa21044c2798@syzkaller.appspotmail.com
      Reported-by: syzbot+c4c4b2bb358bb936ad7e@syzkaller.appspotmail.com
      Reported-by: syzbot+0290d2290a607e035ba1@syzkaller.appspotmail.com
      Reported-by: syzbot+a43d8d4e7e8a7a9e149e@syzkaller.appspotmail.com
      Reported-by: syzbot+a47c5f4c6c00fc1ed16e@syzkaller.appspotmail.com
      Fixes: 039f5062 ("ip_tunnel: Move stats update to iptunnel_xmit()")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3bcde02
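The size mismatch described in the tipc commit above can be sketched as follows. The two structs are simplified stand-ins for the kernel's per-CPU lstats and tstats layouts (the sync field is omitted), and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the smaller per-CPU stats a veth-like device
 * allocates (assumption: two 64-bit counters). */
struct lstats_sketch {
	uint64_t packets;
	uint64_t bytes;
};

/* Simplified stand-in for the larger per-CPU tunnel stats. */
struct tstats_sketch {
	uint64_t rx_packets;
	uint64_t rx_bytes;
	uint64_t tx_packets;
	uint64_t tx_bytes;
};

/* Returns 1 if a write to tx_bytes through a tstats-typed pointer would
 * land past the end of an lstats-sized allocation. */
int tx_bytes_is_out_of_bounds(void)
{
	return offsetof(struct tstats_sketch, tx_bytes) >=
	       sizeof(struct lstats_sketch);
}
```

Under these assumed layouts the `tx_bytes` field starts beyond the smaller allocation, which is why updating it can corrupt whatever percpu data happens to follow.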
    • Xin Long's avatar
      ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL · 6f6a8622
      Xin Long authored
      A similar fix to Patch "ip_tunnel: allow not to count pkts on tstats by
      setting skb's dev to NULL" is also needed by ip6_tunnel.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f6a8622
    • Xin Long's avatar
      ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL · 5684abf7
      Xin Long authored
      iptunnel_xmit() works as a common function, also used by a udp tunnel
      which doesn't have to have a tunnel device, like how TIPC works with
      udp media.
      
      In these cases, we should allow not to count pkts on dev's tstats, so
      that udp tunnel can work with no tunnel device safely.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5684abf7
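The idea in the ip_tunnel commit above, skipping the stats update when no tunnel device is supplied, might look roughly like this. It is a userspace sketch with hypothetical types and names, not the kernel's actual iptunnel_xmit() code.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a tunnel device's tx counters. */
struct tunnel_dev_sketch {
	unsigned long tx_packets;
	unsigned long tx_bytes;
};

/* Count a transmitted packet only when a tunnel device was passed in.
 * A caller without a tunnel device passes NULL and nothing is touched. */
void xmit_stats_sketch(struct tunnel_dev_sketch *dev, int pkt_len)
{
	if (!dev)
		return;		/* no tunnel device: skip counting */

	if (pkt_len > 0) {
		dev->tx_packets++;
		dev->tx_bytes += (unsigned long)pkt_len;
	}
}
```

A caller such as a UDP media layer with no tunnel device would call `xmit_stats_sketch(NULL, len)`, while a real tunnel would pass its own device.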
  4. 18 Jun, 2019 4 commits
    • Fei Li's avatar
      tun: wake up waitqueues after IFF_UP is set · 72b319dc
      Fei Li authored
      Currently, after tap0 is set link up, the tun code wakes up the waiting
      tx/rx queues in tun_net_open() when .ndo_open() is called; however, the
      IFF_UP flag has not been set yet. If there is already a wait queue, it
      fails to transmit when the IFF_UP flag is checked in tun_sendmsg().
      Then the saving vhost_poll_start() adds the wq to wqh until it is
      woken up again. This works if the IFF_UP flag is already set by the
      time tun_chr_poll() checks it, but not if the flag is still unset at
      that time. Sadly the latter case is a fatal error, as the wq will
      never be woken up again unless the link is later set up manually on
      purpose.
      
      Fix this by moving the wakeup into the NETDEV_UP event notification
      handling, which makes sure IFF_UP has been set before any waiting
      queue is woken up.
      Signed-off-by: Fei Li <lifei.shirley@bytedance.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72b319dc
    • JingYi Hou's avatar
      net: remove duplicate fetch in sock_getsockopt · d0bae4a0
      JingYi Hou authored
      In sock_getsockopt(), 'optlen' is fetched the first time from userspace.
      'len < 0' is then checked. Then in condition 'SO_MEMINFO', 'optlen' is
      fetched the second time from userspace.
      
      If userspace changes it between the two fetches, this may cause security
      problems or unexpected behavior, and there is no reason to fetch it a
      second time.
      
      To fix this, remove the second fetch.
      Signed-off-by: JingYi Hou <houjingyi647@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d0bae4a0
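The single-fetch pattern that the sock_getsockopt commit above moves to can be sketched in userspace like this. The volatile pointer stands in for user-controlled memory, and the function name and return convention are hypothetical.

```c
/* Read the user-controlled length exactly once, then use only the local
 * copy for validation and clamping. Reading *user_optlen a second time
 * would let a concurrent writer bypass the checks made on the first read. */
int clamp_len_single_fetch(const volatile int *user_optlen, int needed)
{
	int len = *user_optlen;	/* the only read of the shared value */

	if (len < 0)
		return -1;	/* reject negative lengths */
	if (len > needed)
		len = needed;	/* never copy more than is available */
	return len;
}
```

Because every later decision is based on `len`, a change to the original location after the first read has no effect on the result.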
    • Tuong Lien's avatar
      tipc: fix issues with early FAILOVER_MSG from peer · d0f84d08
      Tuong Lien authored
      It appears that a FAILOVER_MSG can come from peer even when the failure
      link is resetting (i.e. just after the 'node_write_unlock()'...). This
      means the failover procedure on the node has not been started yet.
      The situation is as follows:
      
               node1                                node2
        linkb          linka                  linka        linkb
          |              |                      |            |
          |              |                      x failure    |
          |              |                  RESETTING        |
          |              |                      |            |
          |              x failure            RESET          |
          |          RESETTING             FAILINGOVER       |
          |              |   (FAILOVER_MSG)     |            |
          |<-------------------------------------------------|
          | *FAILINGOVER |                      |            |
          |              | (dummy FAILOVER_MSG) |            |
          |------------------------------------------------->|
          |            RESET                    |            | FAILOVER_END
          |         FAILINGOVER               RESET          |
          .              .                      .            .
          .              .                      .            .
          .              .                      .            .
      
      Once this happens, the link failover procedure will be triggered
      wrongly on the receiving node since the node isn't in FAILINGOVER state
      but then another link failover will be carried out.
      The consequences are:
      
      1) A peer might get stuck in FAILINGOVER state because the 'sync_point'
      was set, reset and set incorrectly, so the criteria to end the failover
      would not be met and it could keep waiting for a message that has
      already been received.
      
      2) The early FAILOVER_MSG(s) could be queued in the link failover
      deferdq but would be purged or not pulled out because the 'drop_point'
      was not set correctly.
      
      3) The early FAILOVER_MSG(s) could be dropped too.
      
      4) The dummy FAILOVER_MSG could make the peer leave FAILINGOVER state
      shortly, but later on it would be restarted.
      
      The same situation can also happen when the link is in PEER_RESET state
      and a FAILOVER_MSG arrives.
      
      The commit resolves the issues by forcing the link down immediately, so
      the failover procedure will be started normally (which is the same as
      when receiving a FAILOVER_MSG and the link is in up state).
      
      Also, the function "tipc_node_link_failover()" is toughened to prevent
      such a situation from happening.
      Acked-by: Jon Maloy <jon.maloy@ericsson.se>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d0f84d08
    • Mauro S. M. Rodrigues's avatar
      bnx2x: Check if transceiver implements DDM before access · cf18cecc
      Mauro S. M. Rodrigues authored
      Some transceivers may comply with SFF-8472 even though they do not
      implement the Digital Diagnostic Monitoring (DDM) interface described in
      the spec. The existence of such area is specified by the 6th bit of byte
      92, set to 1 if implemented.
      
      Currently, without checking this bit, bnx2x fails when trying to read the
      SFP module's EEPROM, with the following message:
      
      ethtool -m enP5p1s0f1
      Cannot get Module EEPROM data: Input/output error
      
      This is because it fails to read the additional 256 bytes in which the
      DDM data is assumed to exist.
      
      This issue was noticed using a Mellanox Passive DAC PN 01FT738. The EEPROM
      data was confirmed by Mellanox as correct and similar to other Passive
      DACs from other manufacturers.
      Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
      Acked-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf18cecc
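The DDM check described in the bnx2x commit above could be sketched as below. This assumes "6th bit of byte 92" means bit 6 (mask 0x40) of the first EEPROM page; the macro and function names are hypothetical and not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: byte 92 of the base EEPROM page holds the diagnostic
 * monitoring type, and bit 6 flags that DDM is implemented. */
#define SFP_DIAG_TYPE_OFFSET	92
#define SFP_DDM_IMPLEMENTED	0x40

/* Returns true when the module advertises a DDM area, i.e. when reading
 * the additional 256 bytes is expected to succeed. */
bool sfp_has_ddm(const uint8_t *base_page)
{
	return (base_page[SFP_DIAG_TYPE_OFFSET] & SFP_DDM_IMPLEMENTED) != 0;
}
```

A reader of the module EEPROM would consult this flag first and return only the base page when it is clear, rather than failing the whole request.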
  5. 17 Jun, 2019 14 commits
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 29f785ff
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "MS_MOVE regression fix + breakage in fsmount(2) (also introduced in
        this cycle, along with fsmount(2) itself).
      
        I'm still digging through the piles of mail, so there might be more
        fixes to follow, but these two are obvious and self-contained, so
        there's no point delaying those..."
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/namespace: fix unprivileged mount propagation
        vfs: fsmount: add missing mntget()
      29f785ff
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · da0f3820
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Lots of bug fixes here:
      
         1) Out of bounds access in __bpf_skc_lookup, from Lorenz Bauer.
      
         2) Fix rate reporting in cfg80211_calculate_bitrate_he(), from John
            Crispin.
      
         3) Use after free in psock backlog workqueue, from John Fastabend.
      
         4) Fix source port matching in fdb peer flow rule of mlx5, from Raed
            Salem.
      
         5) Use atomic_inc_not_zero() in fl6_sock_lookup(), from Eric Dumazet.
      
         6) Network header needs to be set for packet redirect in nfp, from
            John Hurley.
      
         7) Fix udp zerocopy refcnt, from Willem de Bruijn.
      
         8) Don't assume linear buffers in vxlan and geneve error handlers,
            from Stefano Brivio.
      
         9) Fix TOS matching in mlxsw, from Jiri Pirko.
      
        10) More SCTP cookie memory leak fixes, from Neil Horman.
      
        11) Fix VLAN filtering in rtl8366, from Linus Walleij.
      
        12) Various TCP SACK payload size and fragmentation memory limit fixes
            from Eric Dumazet.
      
        13) Use after free in pneigh_get_next(), also from Eric Dumazet.
      
        14) LAPB control block leak fix from Jeremy Sowden"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (145 commits)
        lapb: fixed leak of control-blocks.
        tipc: purge deferredq list for each grp member in tipc_group_delete
        ax25: fix inconsistent lock state in ax25_destroy_timer
        neigh: fix use-after-free read in pneigh_get_next
        tcp: fix compile error if !CONFIG_SYSCTL
        hv_sock: Suppress bogus "may be used uninitialized" warnings
        be2net: Fix number of Rx queues used for flow hashing
        net: handle 802.1P vlan 0 packets properly
        tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()
        tcp: add tcp_min_snd_mss sysctl
        tcp: tcp_fragment() should apply sane memory limits
        tcp: limit payload size of sacked skbs
        Revert "net: phylink: set the autoneg state in phylink_phy_change"
        bpf: fix nested bpf tracepoints with per-cpu data
        bpf: Fix out of bounds memory access in bpf_sk_storage
        vsock/virtio: set SOCK_DONE on peer shutdown
        net: dsa: rtl8366: Fix up VLAN filtering
        net: phylink: set the autoneg state in phylink_phy_change
        net: add high_order_alloc_disable sysctl/static key
        tcp: add tcp_tx_skb_cache sysctl
        ...
      da0f3820
    • Christian Brauner's avatar
      fs/namespace: fix unprivileged mount propagation · d728cf79
      Christian Brauner authored
      When propagating mounts across mount namespaces owned by different user
      namespaces it is not possible anymore to move or umount the mount in the
      less privileged mount namespace.
      
      Here is a reproducer:
      
        sudo mount -t tmpfs tmpfs /mnt
        sudo mount --make-rshared /mnt
      
        # create unprivileged user + mount namespace and preserve propagation
        unshare -U -m --map-root --propagation=unchanged
      
        # now change back to the original mount namespace in another terminal:
        sudo mkdir /mnt/aaa
        sudo mount -t tmpfs tmpfs /mnt/aaa
      
        # now in the unprivileged user + mount namespace
        mount --move /mnt/aaa /opt
      
      Unfortunately, this is a pretty big deal for userspace since this is
      e.g. used to inject mounts into running unprivileged containers.
      So this regression really needs to go away rather quickly.
      
      The problem is that a recent change falsely locked the root of the newly
      added mounts by setting MNT_LOCKED. Fix this by only locking the mounts
      on copy_mnt_ns() and not when adding a new mount.
      
      Fixes: 3bd045cc ("separate copying and locking mount tree on cross-userns copies")
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Tested-by: Christian Brauner <christian@brauner.io>
      Acked-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Christian Brauner <christian@brauner.io>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d728cf79
    • Eric Biggers's avatar
      vfs: fsmount: add missing mntget() · 1b0b9cc8
      Eric Biggers authored
      sys_fsmount() needs to take a reference to the new mount when adding it
      to the anonymous mount namespace.  Otherwise the filesystem can be
      unmounted while it's still in use, as found by syzkaller.
      Reported-by: Mark Rutland <mark.rutland@arm.com>
      Reported-by: syzbot+99de05d099a170867f22@syzkaller.appspotmail.com
      Reported-by: syzbot+7008b8b8ba7df475fdc8@syzkaller.appspotmail.com
      Fixes: 93766fbd ("vfs: syscall: Add fsmount() to create a mount for a superblock")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      1b0b9cc8
    • David S. Miller's avatar
      Merge branch 'tcp-fixes' · 4fddbf8a
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: make sack processing more robust
      
      Jonathan Looney brought to our attention multiple problems
      in TCP stack at the sender side.
      
      SACK processing can be abused by malicious peers to either
      cause overflows, or increase of memory usage.
      
      First two patches fix the immediate problems.
      
      Since the malicious peers abuse senders by advertising a very
      small MSS in their SYN or SYNACK packet, the last two
      patches add a new sysctl so that admins can choose a higher
      limit for MSS clamping.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4fddbf8a
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-v5.2/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · eb7c825b
      Linus Torvalds authored
      Pull RISC-V fixes from Paul Walmsley:
       "This contains fixes, defconfig, and DT data changes for the v5.2-rc
        series.
      
        The fixes are relatively straightforward:
      
         - Addition of a TLB fence in the vmalloc_fault path, so the CPU
           doesn't enter an infinite page fault loop
      
         - Readdition of the pm_power_off export, so device drivers that
           reassign it can now be built as modules
      
         - A udelay() fix for RV32, fixing a miscomputation of the delay time
      
         - Removal of deprecated smp_mb__*() barriers
      
        This also adds initial DT data infrastructure for arch/riscv, along
        with initial data for the SiFive FU540-C000 SoC and the corresponding
        HiFive Unleashed board.
      
        We also update the RV64 defconfig to include some core drivers for the
        FU540 in the build"
      
      * tag 'riscv-for-v5.2/fixes-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: remove unused barrier defines
        riscv: mm: synchronize MMU after pte change
        riscv: dts: add initial board data for the SiFive HiFive Unleashed
        riscv: dts: add initial support for the SiFive FU540-C000 SoC
        dt-bindings: riscv: convert cpu binding to json-schema
        dt-bindings: riscv: sifive: add YAML documentation for the SiFive FU540
        arch: riscv: add support for building DTB files from DT source data
        riscv: Fix udelay in RV32.
        riscv: export pm_power_off again
        RISC-V: defconfig: enable clocks, serial console
      eb7c825b
    • Rolf Eike Beer's avatar
      riscv: remove unused barrier defines · 259931fd
      Rolf Eike Beer authored
      They were introduced in commit fab957c1 ("RISC-V: Atomic and
      Locking Code") long after commit 2e39465a ("locking: Remove
      deprecated smp_mb__() barriers") removed the remnants of all previous
      instances from the tree.
      Signed-off-by: Rolf Eike Beer <eb@emlix.com>
      [paul.walmsley@sifive.com: stripped spurious mbox header from patch
       description; fixed commit references in patch header]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      259931fd
    • ShihPo Hung's avatar
      riscv: mm: synchronize MMU after pte change · bf587caa
      ShihPo Hung authored
      Because RISC-V compliant implementations can cache invalid entries
      in TLB, an SFENCE.VMA is necessary after changes to the page table.
      This patch adds an SFENCE.VMA for the vmalloc_fault path.
      Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com>
      [paul.walmsley@sifive.com: reversed tab->whitespace conversion,
       wrapped comment lines]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: linux-riscv@lists.infradead.org
      Cc: stable@vger.kernel.org
      bf587caa
    • Paul Walmsley's avatar
      riscv: dts: add initial board data for the SiFive HiFive Unleashed · c35f1b87
      Paul Walmsley authored
      Add initial board data for the SiFive HiFive Unleashed A00.
      
      Currently the data populated in this DT file describes the board
      DRAM configuration and the external clock sources that supply the
      PRCI.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Tested-by: Loys Ollivier <lollivier@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Antony Pavlov <antonynpavlov@gmail.com>
      Cc: devicetree@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      c35f1b87
    • Paul Walmsley's avatar
      riscv: dts: add initial support for the SiFive FU540-C000 SoC · 72296bde
      Paul Walmsley authored
      Add initial support for the SiFive FU540-C000 SoC.  This is a 28nm SoC
      based around the SiFive U54-MC core complex and a TileLink
      interconnect.
      
      This file is expected to grow as more device drivers are added to the
      kernel.
      
      This patch includes a fix to the QSPI memory map due to a
      documentation bug, found by ShihPo Hung <shihpo.hung@sifive.com>, adds
      entries for the I2C controller, and merges all DT changes that
      formerly were made dynamically by the riscv-pk BBL proxy kernel.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Tested-by: Loys Ollivier <lollivier@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: ShihPo Hung <shihpo.hung@sifive.com>
      Cc: devicetree@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      72296bde
    • Paul Walmsley's avatar
      dt-bindings: riscv: convert cpu binding to json-schema · 4fd669a8
      Paul Walmsley authored
      At Rob's request, we're starting to migrate our DT binding
      documentation to json-schema YAML format.  Start by converting our cpu
      binding documentation.  While doing so, document more properties and
      nodes.  This includes adding binding documentation support for the E51
      and U54 CPU cores ("harts") that are present on this SoC.  These cores
      are described in:
      
          https://static.dev.sifive.com/FU540-C000-v1.0.pdf
      
      This cpus.yaml file is intended to be a starting point and to
      evolve over time.  It passes dt-doc-validate as of the yaml-bindings
      commit 4c79d42e9216.
      
      This patch was originally based on the ARM json-schema binding
      documentation as added by commit 672951cb ("dt-bindings: arm: Convert
      cpu binding to json-schema").
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: devicetree@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      4fd669a8
    • Paul Walmsley's avatar
      dt-bindings: riscv: sifive: add YAML documentation for the SiFive FU540 · c7af5598
      Paul Walmsley authored
      Add YAML DT binding documentation for the SiFive FU540 SoC.  This
      SoC is documented at:
      
          https://static.dev.sifive.com/FU540-C000-v1.0.pdf
      
      Passes dt-doc-validate, as of yaml-bindings commit 4c79d42e9216.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: devicetree@vger.kernel.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-kernel@vger.kernel.org
      c7af5598
    • Paul Walmsley's avatar
      arch: riscv: add support for building DTB files from DT source data · 8d4e048d
      Paul Walmsley authored
      Similar to ARM64, add support for building DTB files from DT source
      data for RISC-V boards.
      
      This patch starts with the infrastructure needed for SiFive boards.
      Boards from other vendors would add support here in a similar form.
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Paul Walmsley <paul@pwsan.com>
      Tested-by: Loys Ollivier <lollivier@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      8d4e048d
    • Jeremy Sowden's avatar
      lapb: fixed leak of control-blocks. · 6be8e297
      Jeremy Sowden authored
      lapb_register calls lapb_create_cb, which initializes the control-
      block's ref-count to one, and __lapb_insert_cb, which increments it when
      adding the new block to the list of blocks.
      
      lapb_unregister calls __lapb_remove_cb, which decrements the ref-count
      when removing control-block from the list of blocks, and calls lapb_put
      itself to decrement the ref-count before returning.
      
      However, lapb_unregister also calls __lapb_devtostruct to look up the
      right control-block for the given net_device, and __lapb_devtostruct
      also bumps the ref-count, which means that when lapb_unregister returns
      the ref-count is still 1 and the control-block is leaked.
      
      Call lapb_put after __lapb_devtostruct to fix leak.
      
      Reported-by: syzbot+afb980676c836b4a0afa@syzkaller.appspotmail.com
      Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6be8e297
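The reference counting walked through in the lapb commit above can be traced with a small userspace model. The struct and function names are hypothetical; plain integer arithmetic stands in for the kernel's refcount helpers, and the flag only selects between the behaviour before and after the fix.

```c
/* Hypothetical model of a control block's reference count. */
struct lapb_cb_sketch {
	int refcnt;
};

static void cb_hold(struct lapb_cb_sketch *cb) { cb->refcnt++; }
static void cb_put(struct lapb_cb_sketch *cb)  { cb->refcnt--; }

/* Register: creation sets the count to one, list insertion takes another. */
void sketch_register(struct lapb_cb_sketch *cb)
{
	cb->refcnt = 1;
	cb_hold(cb);
}

/* Unregister: returns the count left afterwards (0 means freed). */
int sketch_unregister(struct lapb_cb_sketch *cb, int put_after_lookup)
{
	cb_hold(cb);			/* lookup by device bumps the count */
	if (put_after_lookup)
		cb_put(cb);		/* the fix: drop the lookup reference */
	cb_put(cb);			/* removal from the list */
	cb_put(cb);			/* final put before returning */
	return cb->refcnt;
}
```

Without the extra put the model ends at a count of 1, matching the leak described in the commit message; with it the count reaches 0.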