1. 10 Mar, 2017 8 commits
    • net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells authored
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
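
The class-level reasoning above can be modelled as a cycle search over "taken before" edges. This is a minimal sketch, not lockdep's actual algorithm: `find_cycle` is a hypothetical helper, and the `k-sk_lock-AF_INET` label for a kernel-internal socket class is used here purely for illustration.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle in the 'taken before' graph as a list of lock classes, or None."""
    graph = defaultdict(list)
    for before, after in edges:
        graph[before].append(after)

    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Back edge: the path from the first occurrence of node closes a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in graph[node]:
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in list(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# One class per address family: the three orderings from the theory form a loop.
class_edges = [
    ("mmap_sem", "sk_lock-AF_RXRPC"),         # (1) page fault into AFS
    ("sk_lock-AF_RXRPC", "sk_lock-AF_INET"),  # (2) rxrpc_bind() -> inet_bind()
    ("sk_lock-AF_INET", "mmap_sem"),          # (3) TCP read faults on a user buffer
]

# Separate class for the kernel-internal UDP socket in (2): the loop is gone.
split_edges = [
    ("mmap_sem", "sk_lock-AF_RXRPC"),
    ("sk_lock-AF_RXRPC", "k-sk_lock-AF_INET"),
    ("sk_lock-AF_INET", "mmap_sem"),
]

print(find_cycle(class_edges))
print(find_cycle(split_edges))
```

With a single AF_INET class the search reports a loop through all three classes; once the kernel-internal socket gets a class of its own, no loop remains.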
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set is
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_tcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
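
The three steps above can be sketched as a toy model. It is only an illustration under stated assumptions: the `Sock` class, the two key tables and their labels, and the Python functions `sk_alloc`, `sk_clone` and `accept` are hypothetical stand-ins that borrow the kernel names, not the kernel code itself.

```python
from dataclasses import dataclass

# Two parallel tables of lock-class labels, one per socket origin (labels are illustrative).
USER_KEYS = {"AF_INET": "sk_lock-AF_INET", "AF_RXRPC": "sk_lock-AF_RXRPC"}
KERN_KEYS = {"AF_INET": "k-sk_lock-AF_INET", "AF_RXRPC": "k-sk_lock-AF_RXRPC"}


@dataclass
class Sock:
    family: str
    kern_sock: bool  # plays the role of sk_kern_sock

    @property
    def lock_class(self):
        # Pick the key set according to who created the socket.
        table = KERN_KEYS if self.kern_sock else USER_KEYS
        return table[self.family]


def sk_alloc(family, kern):
    """Record the kern flag at allocation time so later lock setup can consult it."""
    return Sock(family=family, kern_sock=bool(kern))


def sk_clone(parent):
    """A cloned child inherits the parent's kern setting."""
    return Sock(family=parent.family, kern_sock=parent.kern_sock)


def accept(listener, kern):
    """An accept path that allocates a new socket passes its own kern flag along."""
    return sk_alloc(listener.family, kern)


user_sock = sk_alloc("AF_INET", kern=False)
kernel_sock = sk_alloc("AF_INET", kern=True)
print(user_sock.lock_class, kernel_sock.lock_class)
print(sk_clone(kernel_sock).lock_class)
```

Keeping the flag on the socket itself means every later step that sets up or copies the lock can choose the right key set without being told again who the caller was.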
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnxt_en-misc-small-fixes' · 81dca07b
      David S. Miller authored
      Michael Chan says:
      
      ====================
      bnxt_en: Misc. small fixes.
      
      Fixes include moving the initial function reset, notifying the RDMA driver
      during tx timeout, setting dcbx_cap properly depending on whether the
      firmware agent is running or not, and an autoneg related improvement.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Ignore 0 value in autoneg supported speed from firmware. · 520ad89a
      Michael Chan authored
      In some situations, the firmware will return 0 for autoneg supported
      speed.  This may happen if the firmware detects no SFP module, for
      example.  The driver should ignore this so that we don't end up with
      an invalid autoneg setting with nothing advertised.  When an SFP module
      is inserted, we'll get the updated settings from firmware at that time.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Check if firmware LLDP agent is running. · bc39f885
      Michael Chan authored
      Set DCB_CAP_DCBX_HOST capability flag only if the firmware LLDP agent
      is not running.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Call bnxt_ulp_stop() during tx timeout. · b386cd36
      Michael Chan authored
      If we call bnxt_reset_task() due to tx timeout, we should call
      bnxt_ulp_stop() to inform the RDMA driver about the error and the
      impending reset.
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Perform function reset earlier during probe. · 3c2217a6
      Michael Chan authored
      The firmware call to do function reset is done too late.  It is causing
      the rings that have been reserved to be freed.  In NPAR mode, this bug
      is causing us to run out of rings.
      
      Fixes: 391be5c2 ("bnxt_en: Implement new scheme to reserve tx rings.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: remove copyright printing · 6cbac982
      LABBE Corentin authored
      Printing copyright does not give any useful information on the boot
      process.
      Furthermore, the email address printed is obsolete since
      commit ba57b6f2 ("MAINTAINERS: fix bouncing tun/tap entries").
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: initialize msg.msg_flags in recvfrom · 9f138fa6
      Alexander Potapenko authored
      KMSAN reports a use of uninitialized memory in put_cmsg() because
      msg.msg_flags in recvfrom hasn't been initialized properly.
      The flag values don't affect the result on this path, but it's still a
      good idea to initialize them explicitly.
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Mar, 2017 27 commits
  3. 07 Mar, 2017 5 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 8474c8ca
      David S. Miller authored
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2017-03-06
      
      1) Fix lockdep splat on xfrm policy subsystem initialization.
         From Florian Westphal.
      
      2) When using socket policies on IPv4-mapped IPv6 addresses,
         we access the flow information of the wrong address family,
         which leads to an out-of-bounds access. Fix this by using
         the family we get with the dst_entry, as we do for the
         standard policy lookup.
      
      3) vti6 can report a PMTU below IPV6_MIN_MTU. Fix this by
         adding a check for that before sending an ICMPV6_PKT_TOOBIG
         message.
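
The check described in item 3 can be sketched as follows. This is a sketch only: `pmtu_to_report` is a hypothetical helper, and raising the value to the minimum is an assumption about what the check does with a too-small PMTU. The constant 1280 is the minimum link MTU that IPv6 requires.

```python
IPV6_MIN_MTU = 1280  # minimum link MTU required by IPv6


def pmtu_to_report(mtu):
    """Return the MTU to carry in a Packet Too Big message.

    Assumption for this sketch: values below the IPv6 minimum are
    raised to the minimum rather than reported as-is.
    """
    return max(mtu, IPV6_MIN_MTU)


print(pmtu_to_report(1200))
print(pmtu_to_report(1400))
```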
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: reorder icmpv6_init() and ip6_mr_init() · 15e66807
      WANG Cong authored
      Andrey reported the following kernel crash:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88001f311700 task.stack: ffff88001f6e8000
      RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618
      RSP: 0018:ffff88001f6ef418 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 1ffff10003edde8c RCX: ffffc900043ee000
      RDX: 0000000000000004 RSI: ffffffff83e3b3f8 RDI: 0000000000000020
      RBP: ffff88001f6ef508 R08: fffffbfff0dcc5d8 R09: 0000000000000000
      R10: ffffffff86e62ec0 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88001f6ef4e0 R15: ffff8800380a0040
      FS:  00007f7a52cec700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000061c500 CR3: 000000001f1ae000 CR4: 00000000000006f0
      DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
       sock_release+0x8d/0x1e0 net/socket.c:597
       __sock_create+0x39d/0x880 net/socket.c:1226
       sock_create_kern+0x3f/0x50 net/socket.c:1243
       inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526
       icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954
       ops_init+0x10a/0x550 net/core/net_namespace.c:115
       setup_net+0x261/0x660 net/core/net_namespace.c:291
       copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396
      9pnet_virtio: no channels available for device ./file1
       create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
       SYSC_unshare kernel/fork.c:2281 [inline]
       SyS_unshare+0x64e/0x1000 kernel/fork.c:2231
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This is because net->ipv6.mr6_tables is not initialized at that point:
      ip6mr_rules_init() has not been called yet, so when the error path
      iterates the list, we trigger this oops. Fix this by reordering
      ip6mr_rules_init() before icmpv6_sk_init().
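
The ordering problem can be illustrated with a toy model. The Python functions below are simplified stand-ins that only borrow the kernel names, and the `Net` class and `setup_net` driver are hypothetical. The model always takes the failure path in `icmpv6_sk_init` to show what the error path touches.

```python
class Net:
    def __init__(self):
        self.mr6_tables = None  # not initialised until ip6mr_rules_init() runs


def ip6mr_rules_init(net):
    net.mr6_tables = []


def ip6mr_sk_done(net):
    # Walks every table; this breaks if the list was never set up.
    for _table in net.mr6_tables:
        pass


def icmpv6_sk_init(net):
    # Model a failed control-socket creation: releasing the socket on the
    # error path ends up in ip6mr_sk_done().
    ip6mr_sk_done(net)


def setup_net(init_ops):
    net = Net()
    for op in init_ops:
        op(net)
    return net


try:
    setup_net([icmpv6_sk_init, ip6mr_rules_init])  # old order
except TypeError as err:
    print("old order fails:", err)

setup_net([ip6mr_rules_init, icmpv6_sk_init])      # reordered
print("reordered init completes")
```

In Python the uninitialised list shows up as a `TypeError`; in the kernel the equivalent access is the general protection fault in the trace above.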
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dccp: fix use-after-free in dccp_feat_activate_values · 62f8f4d9
      Eric Dumazet authored
      Dmitry reported crashes in the DCCP stack [1].
      
      The problem here is that when I got rid of the listener spinlock, I missed the
      fact that DCCP stores a complex state in struct dccp_request_sock,
      while TCP does not.
      
      Since multiple cpus could access it at the same time, we need to add
      protection.
      
      [1]
      BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
      net/dccp/feat.c:1541 at addr ffff88003713be68
      Read of size 8 by task syz-executor2/8457
      CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
       print_address_description mm/kasan/report.c:200 [inline]
       kasan_report_error mm/kasan/report.c:289 [inline]
       kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
       kasan_report mm/kasan/report.c:332 [inline]
       __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
       dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
       dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
       dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
       dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
       dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
       do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
       </IRQ>
       do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
       do_softirq kernel/softirq.c:176 [inline]
       __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
       local_bh_enable include/linux/bottom_half.h:31 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
       ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
       ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
       ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
       inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
       dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
       dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
       dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
       dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       SYSC_sendto+0x660/0x810 net/socket.c:1687
       SyS_sendto+0x40/0x50 net/socket.c:1655
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x4458b9
      RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9
      RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017
      RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020
      R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8
      R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700
      Object at ffff88003713be50, in cache kmalloc-64 size: 64
      Allocated:
      PID = 8446
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
       kmalloc include/linux/slab.h:490 [inline]
       dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
       dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
       __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
       dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
       dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
       dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
       dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
       dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
       dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
       dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
       sk_backlog_rcv include/net/sock.h:893 [inline]
       __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
       dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
      Freed:
      PID = 15
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514 [inline]
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1355 [inline]
       slab_free_freelist_hook mm/slub.c:1377 [inline]
       slab_free mm/slub.c:2954 [inline]
       kfree+0xe8/0x2b0 mm/slub.c:3874
       dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418
       dccp_feat_entry_destructor net/dccp/feat.c:416 [inline]
       dccp_feat_list_pop net/dccp/feat.c:541 [inline]
       dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543
       dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
       dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
       dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
       dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
      Memory state around the buggy address:
       ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
                                                                ^
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Allocate number of rx/tx buffers agreed on by firmware · 068d9f90
      Thomas Falcon authored
      The number of TX/RX buffers that the vNIC driver currently allocates
      is different from the number agreed upon in negotiation with firmware.
      Correct that by allocating the requested number of buffers confirmed
      by firmware.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Fix overflowing firmware/hardware TX queue · 142c0ac4
      Thomas Falcon authored
      Use a counter to track the number of outstanding transmissions sent
      that have not received completions. If the counter reaches the maximum
      number of queue entries, stop transmissions on that queue. As we receive
      more completions from firmware, wake the queue once the counter reaches
      an acceptable level.
      
      This patch prevents the hardware/firmware TX queue from filling up and
      generating errors.  Since incorporating this fix, internal testing
      has reported that these firmware errors have stopped.
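
The counter scheme described above can be sketched as a small model. The `TxQueue` class is hypothetical, and the wake threshold of half the queue entries is an assumption made for this sketch: the commit message only says the queue is woken at "an acceptable level".

```python
class TxQueue:
    """Toy model of a TX queue that tracks transmissions awaiting completion."""

    def __init__(self, max_entries, wake_level=None):
        self.max_entries = max_entries
        # Assumed default: wake once at most half the entries are outstanding.
        self.wake_level = max_entries // 2 if wake_level is None else wake_level
        self.outstanding = 0
        self.stopped = False

    def transmit(self):
        """Send one frame; return False if the queue is currently stopped."""
        if self.stopped:
            return False
        self.outstanding += 1
        if self.outstanding >= self.max_entries:
            self.stopped = True  # queue is full, stop further transmissions
        return True

    def complete(self):
        """Handle one completion reported by firmware."""
        if self.outstanding == 0:
            return
        self.outstanding -= 1
        if self.stopped and self.outstanding <= self.wake_level:
            self.stopped = False  # enough room again, wake the queue


q = TxQueue(max_entries=4)
sent = [q.transmit() for _ in range(5)]
print(sent, q.stopped)
q.complete()
q.complete()
print(q.outstanding, q.stopped)
```

Waking below the full mark rather than at the first completion avoids toggling the queue state on every packet when the queue hovers near its limit.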
      Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>