1. 08 Nov, 2019 40 commits
    • Xin Long's avatar
      sctp: add pf_expose per netns and sock and asoc · aef587be
      Xin Long authored
      As said in rfc7829, section 3, point 12:
      
        The SCTP stack SHOULD expose the PF state of its destination
        addresses to the ULP as well as provide the means to notify the
        ULP of state transitions of its destination addresses from
        active to PF, and vice versa.  However, it is recommended that
        an SCTP stack implementing SCTP-PF also allows for the ULP to be
        kept ignorant of the PF state of its destinations and the
        associated state transitions, thus allowing for retention of the
        simpler state transition model of [RFC4960] in the ULP.
      
      So the stack should not only be able to expose the PF state to the
      ULP, but also be able to hide sctp-pf from the ULP.
      
      This patch adds pf_expose per netns, sock and asoc. In the next patch,
      sctp_assoc_control_transport() will set ulp_notify to false if
      asoc->expose is not 'enabled'.
      
      It also allows a user to change pf_expose per netns by sysctl, and
      pf_expose per sock and asoc will be initialized with it.
      
      Note that pf_expose also works for SCTP_GET_PEER_ADDR_INFO sockopt,
      to not allow a user to query the state of a sctp-pf peer address
      when pf_expose is 'disabled', as said in section 7.3.
      
      v1->v2:
        - Fix a build warning noticed by Nathan Chancellor.
      v2->v3:
        - set pf_expose to UNUSED by default to keep compatible with old
          applications.
      v3->v4:
        - add a new entry for pf_expose on ip-sysctl.txt, as Marcelo suggested.
        - change this patch to 1/5, and move sctp_assoc_control_transport
          change into 2/5, as Marcelo suggested.
        - use SCTP_PF_EXPOSE_UNSET instead of SCTP_PF_EXPOSE_UNUSED, and
          set SCTP_PF_EXPOSE_UNSET to 0 in enum, as Marcelo suggested.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aef587be
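The inheritance described in "sctp: add pf_expose per netns and sock and asoc" can be sketched in user-space C as follows. This is only an illustration: the struct and function names (`demo_netns`, `demo_sock`, `demo_asoc`, `demo_should_notify_ulp`) and the enum spelling are hypothetical, and copying the value from sock to asoc is an assumption about how "initialized with it" is chained.

```c
/* Tri-state policy. UNSET is 0, as in the v3->v4 note, so that
 * zero-initialised objects keep the behaviour old applications expect. */
enum demo_pf_expose {
	DEMO_PF_EXPOSE_UNSET = 0,
	DEMO_PF_EXPOSE_DISABLE,
	DEMO_PF_EXPOSE_ENABLE,
};

/* Hypothetical stand-ins for the netns, sock and asoc objects. */
struct demo_netns { enum demo_pf_expose pf_expose; };
struct demo_sock  { enum demo_pf_expose pf_expose; };
struct demo_asoc  { enum demo_pf_expose pf_expose; };

/* A new socket starts from the per-netns (sysctl) value. */
void demo_sock_init(struct demo_sock *sk, const struct demo_netns *net)
{
	sk->pf_expose = net->pf_expose;
}

/* A new association starts from its socket's value (assumption). */
void demo_asoc_init(struct demo_asoc *asoc, const struct demo_sock *sk)
{
	asoc->pf_expose = sk->pf_expose;
}

/* PF transitions are reported to the ULP only when explicitly enabled. */
int demo_should_notify_ulp(const struct demo_asoc *asoc)
{
	return asoc->pf_expose == DEMO_PF_EXPOSE_ENABLE;
}
```

Changing the netns value afterwards does not touch existing sockets or associations in this sketch; only objects initialized later pick it up.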
    • Jiri Pirko's avatar
      devlink: disallow reload operation during device cleanup · a0c76345
      Jiri Pirko authored
      There is a race between driver code that does setup/cleanup of a device
      and the devlink reload operation, which in some drivers works with the
      same code. A use-after-free can easily be triggered by running:
      
      while true; do
              echo 10 > /sys/bus/netdevsim/new_device
              devlink dev reload netdevsim/netdevsim10 &
              echo 10 > /sys/bus/netdevsim/del_device
      done
      
      Fix this by enabling reload only after setup of device is complete and
      disabling it at the beginning of the cleanup process.
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Fixes: 2d8dc5bb ("devlink: Add support for reload")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a0c76345
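The fix in "devlink: disallow reload operation during device cleanup" amounts to gating reload on a flag that is set last in setup and cleared first in cleanup. Below is a minimal user-space sketch of that idea, assuming a pthread mutex in place of the kernel's locking; `demo_dev` and the `demo_*` functions are hypothetical names, not the devlink API.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct demo_dev {
	pthread_mutex_t lock;
	bool reload_enabled;   /* false until setup has fully completed */
	int reload_count;      /* stands in for the real reload work */
};

void demo_dev_init(struct demo_dev *d)
{
	pthread_mutex_init(&d->lock, NULL);
	d->reload_enabled = false;
	d->reload_count = 0;
}

/* Called as the last step of device setup. */
void demo_reload_enable(struct demo_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->reload_enabled = true;
	pthread_mutex_unlock(&d->lock);
}

/* Called as the first step of device cleanup. Taking the lock also
 * waits for a reload that is already running. */
void demo_reload_disable(struct demo_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->reload_enabled = false;
	pthread_mutex_unlock(&d->lock);
}

/* Reload is refused outside the enabled window. */
int demo_reload(struct demo_dev *d)
{
	int err = 0;

	pthread_mutex_lock(&d->lock);
	if (!d->reload_enabled)
		err = -EOPNOTSUPP;
	else
		d->reload_count++;
	pthread_mutex_unlock(&d->lock);
	return err;
}
```

Because the check and the reload work happen under the same lock, cleanup cannot start tearing down state while a reload is in progress.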
    • Jiri Pirko's avatar
      selftest: net: add alternative names test · f95e6c9c
      Jiri Pirko authored
      Add a simple test for recently added netdevice alternative names.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f95e6c9c
    • Eric Dumazet's avatar
      packet: fix data-race in fanout_flow_is_huge() · b756ad92
      Eric Dumazet authored
      KCSAN reported the following data-race [1]
      
      Adding a couple of READ_ONCE()/WRITE_ONCE() should silence it.
      
      Since the report hinted about multiple cpus using the history
      concurrently, I added a test avoiding writing on it if the
      victim slot already contains the desired value.
      
      [1]
      
      BUG: KCSAN: data-race in fanout_demux_rollover / fanout_demux_rollover
      
      read to 0xffff8880b01786cc of 4 bytes by task 18921 on cpu 1:
       fanout_flow_is_huge net/packet/af_packet.c:1303 [inline]
       fanout_demux_rollover+0x33e/0x3f0 net/packet/af_packet.c:1353
       packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
       deliver_skb net/core/dev.c:1888 [inline]
       dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
       xmit_one net/core/dev.c:3195 [inline]
       dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
       __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      write to 0xffff8880b01786cc of 4 bytes by task 18922 on cpu 0:
       fanout_flow_is_huge net/packet/af_packet.c:1306 [inline]
       fanout_demux_rollover+0x3a4/0x3f0 net/packet/af_packet.c:1353
       packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
       deliver_skb net/core/dev.c:1888 [inline]
       dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
       xmit_one net/core/dev.c:3195 [inline]
       dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
       __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 18922 Comm: syz-executor.3 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 3b3a5b0a ("packet: rollover huge flows before small flows")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b756ad92
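The approach described in "packet: fix data-race in fanout_flow_is_huge()" (annotated accesses plus skipping the store when the victim slot already holds the value) can be sketched in user space as follows. The `READ_ONCE`/`WRITE_ONCE` macros here are simplified volatile-cast approximations of the kernel ones, and `DEMO_HISTORY_LEN`, `demo_flow_is_huge` and the caller-supplied `victim` index are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel macros: a volatile access asks
 * the compiler for exactly one load or store of the object. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

#define DEMO_HISTORY_LEN 16   /* hypothetical history size */

/* Returns true when rxhash fills more than half of the history.
 * 'victim' selects the slot to overwrite (random in a real caller). */
bool demo_flow_is_huge(uint32_t *history, uint32_t rxhash, unsigned int victim)
{
	unsigned int i, count = 0;

	for (i = 0; i < DEMO_HISTORY_LEN; i++)
		if (READ_ONCE(history[i]) == rxhash)
			count++;

	victim %= DEMO_HISTORY_LEN;

	/* Avoid dirtying the cache line when the slot is already right. */
	if (READ_ONCE(history[victim]) != rxhash)
		WRITE_ONCE(history[victim], rxhash);

	return count > (DEMO_HISTORY_LEN >> 1);
}
```

The conditional store matters when several CPUs share the history: repeated identical writes would otherwise keep bouncing the cache line between them.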
    • David S. Miller's avatar
      Merge branch 'TIPC-Encryption' · 1c8f11d0
      David S. Miller authored
      Tuong Lien says:
      
      ====================
      TIPC Encryption
      
      This series provides TIPC encryption feature, kernel part. There will be
      another one in the 'iproute2/tipc' for user space to set key.
      
      v2: add select crypto 'aes(gcm)' for TIPC_CRYPTO in Kconfig
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c8f11d0
    • Tuong Lien's avatar
      tipc: add support for AEAD key setting via netlink · e1f32190
      Tuong Lien authored
      This commit adds two netlink commands to TIPC in order for user to be
      able to set or remove AEAD keys:
      - TIPC_NL_KEY_SET
      - TIPC_NL_KEY_FLUSH
      
      When the 'KEY_SET' is given along with the key data, the key will be
      initiated and attached to TIPC crypto. On the other hand, the
      'KEY_FLUSH' command will remove all existing keys if any.
      Acked-by: Ying Xue <ying.xue@windreiver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e1f32190
    • Tuong Lien's avatar
      tipc: introduce TIPC encryption & authentication · fc1b6d6d
      Tuong Lien authored
      This commit offers an option to encrypt and authenticate all messaging,
      including the neighbor discovery messages. The currently most advanced
      algorithm supported is the AEAD AES-GCM (like IPSec or TLS). All
      encryption/decryption is done at the bearer layer, just before leaving
      or after entering TIPC.
      
      Supported features:
      - Encryption & authentication of all TIPC messages (header + data);
      - Two symmetric-key modes: Cluster and Per-node;
      - Automatic key switching;
      - Key-expired revoking (sequence number wrapped);
      - Lock-free encryption/decryption (RCU);
      - Asynchronous crypto, Intel AES-NI supported;
      - Multiple cipher transforms;
      - Logs & statistics;
      
      Two key modes:
      - Cluster key mode: One single key is used for both TX & RX in all
      nodes in the cluster.
      - Per-node key mode: Each node in the cluster has one specific TX key.
      For RX, a node requires its peers' TX key to be able to decrypt the
      messages from those peers.
      
      Key setting from user-space is performed via netlink by a user program
      (e.g. the iproute2 'tipc' tool).
      
      Internal key state machine:
      
                                       Attach    Align(RX)
                                           +-+   +-+
                                           | V   | V
              +---------+      Attach     +---------+
              |  IDLE   |---------------->| PENDING |(user = 0)
              +---------+                 +---------+
                 A   A                   Switch|  A
                 |   |                         |  |
                 |   | Free(switch/revoked)    |  |
           (Free)|   +----------------------+  |  |Timeout
                 |              (TX)        |  |  |(RX)
                 |                          |  |  |
                 |                          |  v  |
              +---------+      Switch     +---------+
              | PASSIVE |<----------------| ACTIVE  |
              +---------+       (RX)      +---------+
              (user = 1)                  (user >= 1)
      
      The number of TFMs is 10 by default and can be changed via the procfs
      'net/tipc/max_tfms'. For simplicity, this file is currently also used
      to print the crypto statistics at runtime:
      
      echo 0xfff1 > /proc/sys/net/tipc/max_tfms
      
      The patch defines a new TIPC version (v7) for the encryption message
      (backward compatibility is kept as well). The message is basically
      encapsulated as follows:
      
         +----------------------------------------------------------+
         | TIPCv7 encryption  | Original TIPCv2    | Authentication |
         | header             | packet (encrypted) | Tag            |
         +----------------------------------------------------------+
      
      Compared with no encryption, the throughput is about 40% for small
      messages and about 9% for large messages. With hardware crypto
      support, i.e. the Intel AES-NI CPU instructions, the throughput
      increases up to about 85% for small messages and about 55% for large
      messages.
      
      By default, the new feature is inactive (i.e. no encryption) until user
      sets a key for TIPC. There is however also a new option - "TIPC_CRYPTO"
      in the kernel configuration to enable/disable the new code when needed.
      
      MAINTAINERS | add two new files 'crypto.h' & 'crypto.c' in tipc
      Acked-by: Ying Xue <ying.xue@windreiver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fc1b6d6d
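One possible reading of the internal key state machine drawn in "tipc: introduce TIPC encryption & authentication" is the transition function below. It is a sketch only: the enum and function names are hypothetical, the TX/RX distinction noted on the arrows is reduced to comments, and the arrow interpretation (in particular which state each "Free" arrow leaves from) is an assumption based on the ASCII diagram.

```c
enum demo_key_state { KEY_IDLE, KEY_PENDING, KEY_ACTIVE, KEY_PASSIVE };
enum demo_key_event { EV_ATTACH, EV_ALIGN, EV_SWITCH, EV_TIMEOUT, EV_FREE };

/* Returns the next state; events that do not apply leave it unchanged. */
enum demo_key_state demo_key_next(enum demo_key_state s, enum demo_key_event e)
{
	switch (s) {
	case KEY_IDLE:
		return e == EV_ATTACH ? KEY_PENDING : s;
	case KEY_PENDING:
		/* Attach and Align(RX) loop back to PENDING. */
		return e == EV_SWITCH ? KEY_ACTIVE : s;
	case KEY_ACTIVE:
		if (e == EV_SWITCH)
			return KEY_PASSIVE;   /* drawn for RX keys */
		if (e == EV_TIMEOUT)
			return KEY_PENDING;   /* drawn for RX keys */
		if (e == EV_FREE)
			return KEY_IDLE;      /* drawn for TX keys */
		return s;
	case KEY_PASSIVE:
		return e == EV_FREE ? KEY_IDLE : s;
	}
	return s;
}
```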
    • Tuong Lien's avatar
      tipc: add new AEAD key structure for user API · 134bdac3
      Tuong Lien authored
      The new structure 'tipc_aead_key' is added to the 'tipc.h' for user to
      be able to transfer a key to TIPC in kernel. Netlink will be used for
      this purpose in the later commits.
      Acked-by: Ying Xue <ying.xue@windreiver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      134bdac3
    • Tuong Lien's avatar
      tipc: enable creating a "preliminary" node · 4cbf8ac2
      Tuong Lien authored
      When the user sets an RX key for a peer that does not yet exist on the
      own node, a new node entry is needed to which the RX key will be
      attached. However, the peer node address (& capabilities) is unknown
      at that moment and only the node-ID is provided, so this commit allows
      the creation of a node with only that data, which we call “preliminary”.
      
      A preliminary node is not the object of the “tipc_node_find()” but the
      “tipc_node_find_by_id()”. Once the first message i.e. LINK_CONFIG comes
      from that peer, and is successfully decrypted by the own node, the
      actual peer node data will be properly updated and the node will
      function as usual.
      
      In addition, the node timer always starts when a node object is created
      so if a preliminary node is not used, it will be cleaned up.
      
      The later encryption functions will also use the node timer and be able
      to create a preliminary node automatically when needed.
      Acked-by: Ying Xue <ying.xue@windreiver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4cbf8ac2
    • Tuong Lien's avatar
      tipc: add reference counter to bearer · 2a7ee696
      Tuong Lien authored
      To support the asynchronous crypto operations in the later commits, we
      add a 'refcnt' to the bearer object, in addition to the current RCU
      mechanism for the bearer pointer.
      
      So, a bearer can be held via 'tipc_bearer_hold()' without being freed
      even if the bearer or interface is disabled in the meantime. If that
      happens, the bearer will be released when the crypto operation is
      completed and 'tipc_bearer_put()' is called.
      Acked-by: Ying Xue <ying.xue@windreiver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a7ee696
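The hold/put lifetime described in "tipc: add reference counter to bearer" can be illustrated with a small user-space sketch using C11 atomics. The `demo_bearer` type and `demo_*` functions are hypothetical, and the kernel code is likely to differ in detail (for example in how it handles a count that has already reached zero).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_bearer {
	atomic_int refcnt;
	bool enabled;
};

/* The creator owns the first reference. */
struct demo_bearer *demo_bearer_create(void)
{
	struct demo_bearer *b = calloc(1, sizeof(*b));

	if (!b)
		return NULL;
	atomic_init(&b->refcnt, 1);
	b->enabled = true;
	return b;
}

/* Taken before an asynchronous operation is started. */
void demo_bearer_hold(struct demo_bearer *b)
{
	atomic_fetch_add(&b->refcnt, 1);
}

/* Drops one reference; frees the object and returns true on the last. */
bool demo_bearer_put(struct demo_bearer *b)
{
	if (atomic_fetch_sub(&b->refcnt, 1) == 1) {
		free(b);
		return true;
	}
	return false;
}
```

If the bearer is disabled while an operation is in flight, the disable path only drops its own reference; the memory goes away when the completion handler calls the final put.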
    • David S. Miller's avatar
      Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · f1ff4e80
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      100GbE Intel Wired LAN Driver Updates 2019-11-08
      
      Another series that contains updates to the ice driver only.
      
      Anirudh cleans up kernel config ifdef wrappers by moving the code that
      DCB needs to disable and enable the PF VSI for configuration.  He also
      implements ice_vsi_type_str() to convert a VSI type enum value to its
      string equivalent, to help identify VSI types from module print
      statements.
      
      Usha and Tarun add support for setting the maximum per-queue bit rate
      for transmit queues.
      
      Dave implements dcb_nl set functions and supporting software DCB
      functions to support the callbacks defined in the dcbnl_rtnl_ops
      structure.
      
      Henry adds a check to ensure we are not resetting the device when trying
      to configure it, and to return -EBUSY during a reset.
      
      Usha fixes a call trace caused by the receive/transmit descriptor size
      change request via ethtool when DCB is configured by using the number of
      enabled queues and not the total number of allocated queues.
      
      Paul cleans up and refactors the software LLDP configuration to handle
      when firmware DCBX is disabled.
      
      Akeem adds checks to ensure the VF or PF is not disabled before honoring
      mailbox messages to configure the VF.
      
      Brett corrects the check to make sure the vector_id passed down from
      iavf is less than the max allowed interrupts per VF.  Updates a flag bit
      to align with the current specification.
      
      Bruce updates a switch statement to use the correct status of the
      Download Package AQ command.  Does some housekeeping by cleaning up a
      conditional check that is not needed.
      
      Mitch shortens up the delay for SQ responses to resolve issues with VF
      resets failing.
      
      Jake cleans up the code to reduce namespace pollution and simplifies
      ice_debug_cq(): since it always uses the same mask, there is no need to
      pass it in.  He also improves debugging by adding the command opcode to
      the debug messages that print an error code.
      
      v2: fixed reverse christmas tree issue in patch 3 of the series.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f1ff4e80
    • Eric Dumazet's avatar
      net: icmp: fix data-race in icmp_global_allow() · bbab7ef2
      Eric Dumazet authored
      This code reads two global variables without protection
      of a lock. We need READ_ONCE()/WRITE_ONCE() pairs to
      avoid load/store-tearing and better document the intent.
      
      KCSAN reported :
      BUG: KCSAN: data-race in icmp_global_allow / icmp_global_allow
      
      read to 0xffffffff861a8014 of 4 bytes by task 11201 on cpu 0:
       icmp_global_allow+0x36/0x1b0 net/ipv4/icmp.c:254
       icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
       icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
       icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
       icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
       dst_link_failure include/net/dst.h:419 [inline]
       vti_xmit net/ipv4/ip_vti.c:243 [inline]
       vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
       __netdev_start_xmit include/linux/netdevice.h:4420 [inline]
       netdev_start_xmit include/linux/netdevice.h:4434 [inline]
       xmit_one net/core/dev.c:3280 [inline]
       dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
       __dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
      
      write to 0xffffffff861a8014 of 4 bytes by task 11183 on cpu 1:
       icmp_global_allow+0x174/0x1b0 net/ipv4/icmp.c:272
       icmpv6_global_allow net/ipv6/icmp.c:184 [inline]
       icmpv6_global_allow net/ipv6/icmp.c:179 [inline]
       icmp6_send+0x493/0x1140 net/ipv6/icmp.c:514
       icmpv6_send+0x71/0xb0 net/ipv6/ip6_icmp.c:43
       ip6_link_failure+0x43/0x180 net/ipv6/route.c:2640
       dst_link_failure include/net/dst.h:419 [inline]
       vti_xmit net/ipv4/ip_vti.c:243 [inline]
       vti_tunnel_xmit+0x27f/0xa50 net/ipv4/ip_vti.c:279
       __netdev_start_xmit include/linux/netdevice.h:4420 [inline]
       netdev_start_xmit include/linux/netdevice.h:4434 [inline]
       xmit_one net/core/dev.c:3280 [inline]
       dev_hard_start_xmit+0xef/0x430 net/core/dev.c:3296
       __dev_queue_xmit+0x14c9/0x1b60 net/core/dev.c:3873
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3906
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a6/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 11183 Comm: syz-executor.2 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 4cdf507d ("icmp: add a global rate limitation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bbab7ef2
    • Eric Dumazet's avatar
      net/sched: annotate lockless accesses to qdisc->empty · 90b2be27
      Eric Dumazet authored
      KCSAN reported the following race [1]
      
      BUG: KCSAN: data-race in __dev_queue_xmit / net_tx_action
      
      read to 0xffff8880ba403508 of 1 bytes by task 21814 on cpu 1:
       __dev_xmit_skb net/core/dev.c:3389 [inline]
       __dev_queue_xmit+0x9db/0x1b40 net/core/dev.c:3761
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_hh_output include/net/neighbour.h:500 [inline]
       neigh_output include/net/neighbour.h:509 [inline]
       ip6_finish_output2+0x873/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      write to 0xffff8880ba403508 of 1 bytes by interrupt on cpu 0:
       qdisc_run_begin include/net/sch_generic.h:160 [inline]
       qdisc_run include/net/pkt_sched.h:120 [inline]
       net_tx_action+0x2b1/0x6c0 net/core/dev.c:4551
       __do_softirq+0x115/0x33f kernel/softirq.c:292
       do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
       do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337
       do_softirq kernel/softirq.c:329 [inline]
       __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189
       local_bh_enable include/linux/bottom_half.h:32 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:688 [inline]
       ip6_finish_output2+0x7bb/0xec0 net/ipv6/ip6_output.c:117
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 21817 Comm: syz-executor.2 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: d518d2ed ("net/sched: fix race between deactivation and dequeue for NOLOCK qdisc")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      90b2be27
    • Jacob Keller's avatar
      ice: print opcode when printing controlq errors · fb0254b2
      Jacob Keller authored
      To help aid in debugging, display the command opcode in debug messages
      that print an error code. This makes it easier to see what command
      failed if only ICE_DBG_AQ_MSG is enabled.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      fb0254b2
    • Jacob Keller's avatar
      ice: use more accurate ICE_DBG mask types · faa01721
      Jacob Keller authored
      ice_debug_cq is passed a mask which is always ICE_DBG_AQ_CMD. Modify this
      function, removing the mask parameter entirely, and directly use the more
      appropriate ICE_DBG_AQ_DESC and ICE_DBG_AQ_DESC_BUF.
      
      The function is only called from ice_controlq.c, and has no
      other callers outside of that file. Move it and mark it static to avoid
      namespace pollution.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      faa01721
    • Anirudh Venkataramanan's avatar
      ice: Introduce and use ice_vsi_type_str · 964674f1
      Anirudh Venkataramanan authored
      ice_vsi_type_str converts an ice_vsi_type enum value to its string
      equivalent. This is expected to help easily identify VSI types from
      module print statements.
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      964674f1
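An enum-to-string helper of the kind described in "ice: Introduce and use ice_vsi_type_str" typically looks like the sketch below. The enum members and the `demo_vsi_type_str` name are hypothetical placeholders, not the driver's actual definitions.

```c
/* Hypothetical VSI types, used for illustration only. */
enum demo_vsi_type {
	DEMO_VSI_PF,
	DEMO_VSI_VF,
	DEMO_VSI_LB,
};

/* Maps an enum value to a printable name for log messages. */
const char *demo_vsi_type_str(enum demo_vsi_type type)
{
	switch (type) {
	case DEMO_VSI_PF:
		return "DEMO_VSI_PF";
	case DEMO_VSI_VF:
		return "DEMO_VSI_VF";
	case DEMO_VSI_LB:
		return "DEMO_VSI_LB";
	default:
		return "unknown";
	}
}
```

Returning a fixed fallback string for unexpected values keeps print statements safe even if a new type is added without updating the helper.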
    • Bruce Allan's avatar
      ice: remove unnecessary conditional check · 87a2e498
      Bruce Allan authored
      There is no reason to do this conditional check before the assignment so
      simply remove it.
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      87a2e498
    • Brett Creeley's avatar
      ice: Update enum ice_flg64_bits to current specification · 893869d5
      Brett Creeley authored
      Currently the VLAN ice_flg64_bits are off by 1. Fix this by
      setting the ICE_FLG_EVLAN_x8100 flag to 14, which also updates
      ICE_FLG_EVLAN_x9100 to 15 and ICE_FLG_VLAN_x8100 to 16.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      893869d5
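The renumbering in "ice: Update enum ice_flg64_bits to current specification" relies on C assigning consecutive values after an explicit initializer. The fragment below shows only that mechanism; the enum tag `demo_flg64_bits` is hypothetical and the real enum contains other members that are omitted here.

```c
/* Only the three VLAN flags named in the commit message are shown. */
enum demo_flg64_bits {
	ICE_FLG_EVLAN_x8100 = 14,   /* explicit value set by the fix */
	ICE_FLG_EVLAN_x9100,        /* follows implicitly: 15 */
	ICE_FLG_VLAN_x8100,         /* follows implicitly: 16 */
};

/* Compile-time checks that the implicit values match the message. */
_Static_assert(ICE_FLG_EVLAN_x9100 == 15, "EVLAN_x9100 must be bit 15");
_Static_assert(ICE_FLG_VLAN_x8100 == 16, "VLAN_x8100 must be bit 16");
```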
    • Mitch Williams's avatar
      ice: delay less · 88bb432a
      Mitch Williams authored
      Shorten the delay for SQ responses, but increase the number of loops.
      Max delay time is unchanged, but some operations complete much more
      quickly.
      
      In the process, add a new define to make the delay count and delay time
      more explicit. Add comments to make things more explicit.
      
      This fixes a problem with VF resets failing with many VFs.
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      88bb432a
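The trade-off in "ice: delay less" (shorter delay, more loops, same maximum wait) can be made concrete with a small model. All numbers used with it are made up for illustration, since the commit message does not state the actual delay or loop count, and `demo_observed_latency_us` is a hypothetical helper.

```c
/* Models a poll loop that sleeps step_us before each check.
 * Returns the time at which a response that becomes ready at ready_us
 * is first observed, or -1 if the poll budget runs out first. */
long demo_observed_latency_us(unsigned int step_us, unsigned int max_polls,
			      unsigned int ready_us)
{
	unsigned int i;

	for (i = 1; i <= max_polls; i++) {
		unsigned int t = i * step_us;

		if (t >= ready_us)
			return (long)t;
	}
	return -1;
}
```

With a hypothetical budget of 150 ms, polling 150 times every 1000 us sees a response that is ready after 30 us only at 1000 us, while polling 15000 times every 10 us sees it at 30 us; both give up at the same point.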
    • Bruce Allan's avatar
      ice: use pkg_dwnld_status instead of sq_last_status · e000248e
      Bruce Allan authored
      Since the return value from the Download Package AQ command is stored in
      hw->pkg_dwnld_status, use that instead of sq_last_status since that may
      have the return value from some other AQ command leading to unexpected
      results.
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      e000248e
    • Brett Creeley's avatar
      ice: Change max MSI-x vector_id check in cfg_irq_map · b791cdd5
      Brett Creeley authored
      Currently we check to make sure the vector_id passed down from iavf
      is less than or equal to pf->hw.func_caps.common_caps.num_msix_vectors.
      This is incorrect because the vector_id is always 0-based and never
      greater than or equal to the ICE_MAX_INTR_PER_VF. Fix this by checking
      to make sure the vector_id is less than the max allowed interrupts per
      VF (ICE_MAX_INTR_PER_VF).
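      A minimal sketch of the two checks, for comparison. The function names are hypothetical and the value given to ICE_MAX_INTR_PER_VF here is an assumption for illustration only; the commit message does not state it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value, for illustration only. */
#define ICE_MAX_INTR_PER_VF 65

/* Old-style check: inclusive bound against a device-wide vector count. */
bool vector_id_ok_old(uint16_t vector_id, uint16_t num_msix_vectors)
{
    return vector_id <= num_msix_vectors;
}

/* New-style check: 0-based id must be strictly below the per-VF limit. */
bool vector_id_ok_new(uint16_t vector_id)
{
    return vector_id < ICE_MAX_INTR_PER_VF;
}
```

      Because vector_id is 0-based, the valid ids are 0 through ICE_MAX_INTR_PER_VF - 1, so the strict comparison rejects an id equal to the limit, which the old inclusive check against a larger device-wide count would accept.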
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      b791cdd5
    • Akeem G Abodunrin's avatar
      ice: Check if VF is disabled for Opcode and other operations · ec4f5a43
      Akeem G Abodunrin authored
      This patch adds code to check whether the PF or VF is disabled before
      honoring a mailbox message to configure the VF. If it is disabled and the
      opcode is for resetting the VF, the PF driver simply tells the VF that
      all is set. In addition, if a reset is ongoing and the admin intends to
      configure the VF on the host, we can poll the VF enabling bit to make
      sure it is ready before continuing. If the VF is not in the active state
      after ~250 milliseconds, we bail out with an invalid error.
      Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      ec4f5a43
    • Paul Greenwalt's avatar
      ice: configure software LLDP in ice_init_pf_dcb · 241c8cf0
      Paul Greenwalt authored
      Move software LLDP configuration when FW DCBX is disabled to
      ice_init_pf_dcb, since that is where the FW DCBX state is determined.
      Remove this software LLDP configuration from ice_vsi_setup and
      ice_set_priv_flags. Software configuration includes redirecting Rx LLDP
      packets up the stack, when FW DCBX is not running.
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      241c8cf0
    • Usha Ketineni's avatar
      ice: Fix to change Rx/Tx ring descriptor size via ethtool with DCBx · c0a3665f
      Usha Ketineni authored
      This patch fixes the call trace caused by the kernel when the Rx/Tx
      descriptor size change request is initiated via ethtool when DCB is
      configured. ice_set_ringparam() should use vsi->num_txq instead of
      vsi->alloc_txq as it represents the queues that are enabled in the
      driver when DCB is enabled/disabled. Otherwise, queue index being
      used can go out of range.
      
      For example, when vsi->alloc_txq has 104 queues and with 3 TCs enabled
      via DCB, each TC gets 34 queues, vsi->num_txq will be 102 and only 102
      queues will be enabled.
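      The arithmetic in this example can be checked with a small sketch. The even-split rule below is an assumption that merely reproduces the numbers in the commit message, and the function names are hypothetical.

```c
/* Queues given to each traffic class under an assumed even split.
 * num_tc must be non-zero. */
unsigned int txq_per_tc(unsigned int alloc_txq, unsigned int num_tc)
{
    return alloc_txq / num_tc;
}

/* Queues actually enabled: any remainder of the split is left unused. */
unsigned int enabled_txq(unsigned int alloc_txq, unsigned int num_tc)
{
    return txq_per_tc(alloc_txq, num_tc) * num_tc;
}
```

      With 104 allocated queues and 3 TCs this gives 34 queues per TC and 102 enabled queues, so a loop bounded by the allocated count would also touch indices 102 and 103, which are not enabled.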
      Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      c0a3665f
    • Henry Tieman's avatar
      ice: avoid setting features during reset · 5f8cc355
      Henry Tieman authored
      Certain subsystems behave very badly when called during reset (core
      dump). This patch returns -EBUSY when reconfiguring some subsystems
      during reset. With this patch some ethtool functions will not core
      dump during reset.
      Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      5f8cc355
    • Dave Ertman's avatar
      ice: Implement DCBNL support · b94b013e
      Dave Ertman authored
      Implement interface layer for the DCBNL subsystem. These are the functions
      to support the callbacks defined in the dcbnl_rtnl_ops struct. These
      callbacks are going to be used to interface with the DCB settings of the
      device. This includes the dcb_nl set functions and the supporting SW DCB
      functions.
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      b94b013e
    • Usha Ketineni's avatar
      ice: Add NDO callback to set the maximum per-queue bitrate · 1ddef455
      Usha Ketineni authored
      Allow for rate limiting Tx queues. Bitrate is set in
      Mbps (megabits per second).
      
      Mbps max-rate is set for the queue via sysfs:
      /sys/class/net/<iface>/queues/tx-<queue>/tx_maxrate
      ex: echo 100 >/sys/class/net/ens7/queues/tx-0/tx_maxrate
          echo 200 >/sys/class/net/ens7/queues/tx-1/tx_maxrate
      Note: A value of zero for tx_maxrate means disabled,
      default is disabled.
      Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com>
      Co-developed-by: Tarun Singh <tarun.k.singh@intel.com>
      Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      1ddef455
    • Anirudh Venkataramanan's avatar
      ice: Use ice_ena_vsi and ice_dis_vsi in DCB configuration flow · 9d614b64
      Anirudh Venkataramanan authored
      DCB configuration flow needs to disable and enable only the PF (main)
      VSI, so use ice_ena_vsi and ice_dis_vsi. To avoid the use of ifdef to
      control the staticness of these functions, move them to ice_lib.c.
      
      Also replace the allocation and copy of old_cfg with kmemdup() in
      ice_pf_dcb_cfg().
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      9d614b64
    • Rahul Lakkireddy's avatar
      cxgb4: fix 64-bit division on i386 · 97c20ea8
      Rahul Lakkireddy authored
      Fix the following build error on the i386 architecture:
      
      ERROR: "__udivdi3" [drivers/net/ethernet/chelsio/cxgb4/cxgb4.ko] undefined!
      
      Fixes: 0e395b3c ("cxgb4: add FLOWC based QoS offload")
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      97c20ea8
    • David S. Miller's avatar
      Merge tag 'mac80211-next-for-net-next-2019-11-08' of... · 5bd2ce6a
      David S. Miller authored
      Merge tag 'mac80211-next-for-net-next-2019-11-08' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      Some relatively small changes:
       * typo fixes in docs
       * APIs for station separation using VLAN tags rather
         than separate wifi netdevs
       * some preparations for upcoming features (802.3 offload
         and airtime queue limits (AQL))
       * stack reduction in ieee80211_assoc_success()
       * use DEFINE_DEBUGFS_ATTRIBUTE in hwsim
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5bd2ce6a
    • YueHaibing's avatar
      cxgb4: Use match_string() helper to simplify the code · c8119fa8
      YueHaibing authored
      match_string() returns the array index of a matching string.
      Use it instead of the open-coded implementation.
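      For readers unfamiliar with the helper, here is a userspace sketch of the behaviour described above. `match_string_sketch` is a hypothetical stand-in written for illustration, not the kernel implementation; it assumes the lookup stops at a NULL entry and returns -EINVAL when nothing matches.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of `string` in `array`, scanning at most `n` entries
 * and stopping early at a NULL entry. Returns -EINVAL if not found.
 */
int match_string_sketch(const char *const *array, size_t n, const char *string)
{
    for (size_t i = 0; i < n; i++) {
        if (!array[i])
            break;
        if (strcmp(array[i], string) == 0)
            return (int)i;
    }
    return -EINVAL;
}
```

      A caller replaces its own loop over a name table with a single call and checks for a negative return value.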
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8119fa8
    • Christophe Roullier's avatar
      net: ethernet: stmmac: Add support for syscfg clock · caee3174
      Christophe Roullier authored
      Add optional support for the syscfg clock in dwmac-stm32.c.
      The syscfg clock is now activated automatically when syscfg
      registers are used.
      Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      caee3174
    • Gurumoorthi Gnanasambandhan's avatar
      cfg80211: VLAN offload support for set_key and set_sta_vlan · 14f34e36
      Gurumoorthi Gnanasambandhan authored
      This provides an alternative mechanism for AP VLAN support where a
      single netdev is used with VLAN tagged frames instead of separate
      netdevs for each VLAN without tagged frames from the WLAN driver.
      
      By setting NL80211_EXT_FEATURE_VLAN_OFFLOAD flag the driver indicates
      support for a single netdev with VLAN tagged frames. Separate
      VLAN-specific netdevs can be added using RTM_NEWLINK/IFLA_VLAN_ID
      similarly to Ethernet. NL80211_CMD_NEW_KEY (for group keys),
      NL80211_CMD_NEW_STATION, and NL80211_CMD_SET_STATION will optionally
      specify vlan_id using NL80211_ATTR_VLAN_ID.
      Signed-off-by: Gurumoorthi Gnanasambandhan <gguru@codeaurora.org>
      Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
      Link: https://lore.kernel.org/r/20191031214640.5012-1-jouni@codeaurora.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      14f34e36
    • Toke Høiland-Jørgensen's avatar
      mac80211: Shrink the size of ack_frame_id to make room for tx_time_est · 6912daed
      Toke Høiland-Jørgensen authored
      To implement airtime queue limiting, we need to keep a running account of
      the estimated airtime of all skbs queued into the device. To do this
      correctly, we need to store the airtime estimate into the skb so we can
      decrease the outstanding balance when the skb is freed. This means that the
      time estimate must be stored somewhere that will survive for the lifetime
      of the skb.
      
      To get this, decrease the size of the ack_frame_id field to 6 bits, and
      lower the size of the ID space accordingly. This leaves 10 bits for use for
      tx_time_est, which is enough to store a maximum of 4096 us, if we shift the
      values so they become units of 4us.
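      The bit budget can be illustrated with a small sketch. The struct and helper names are hypothetical; only the 6-bit and 10-bit widths and the 4 us unit come from the commit message. Clamping oversized estimates to the largest storable value is an assumption made here for the illustration.

```c
#include <stdint.h>

#define TX_TIME_EST_BITS  10
#define TX_TIME_EST_MAX   ((1u << TX_TIME_EST_BITS) - 1) /* 1023 units */
#define TX_TIME_EST_SHIFT 2                              /* 1 unit = 4 us */

/* Hypothetical 16-bit layout: 6 bits of ack id plus 10 bits of estimate. */
struct tx_info_sketch {
    unsigned int ack_frame_id : 6;
    unsigned int tx_time_est  : TX_TIME_EST_BITS;
};

/* Store an airtime estimate given in microseconds, in units of 4 us. */
void set_tx_time_est(struct tx_info_sketch *info, uint32_t est_us)
{
    uint32_t units = est_us >> TX_TIME_EST_SHIFT;

    if (units > TX_TIME_EST_MAX)
        units = TX_TIME_EST_MAX; /* assumed: clamp rather than wrap */
    info->tx_time_est = units;
}

/* Read the stored estimate back in microseconds. */
uint32_t get_tx_time_est(const struct tx_info_sketch *info)
{
    return (uint32_t)info->tx_time_est << TX_TIME_EST_SHIFT;
}
```

      Ten bits hold 1024 distinct values, so with a 4 us unit the range covers roughly 4096 us; in this sketch the largest value that can be read back is 1023 * 4 = 4092 us, and estimates are rounded down to a multiple of 4 us.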
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/157182474063.150713.16132669599100802716.stgit@toke.dk
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      6912daed
    • Johannes Berg's avatar
      mac80211: don't re-parse elems in ieee80211_assoc_success() · f61d7884
      Johannes Berg authored
      We've already parsed the same data in the caller, so we can
      pass it. The only thing is that we might fill in more details
      in ieee80211_assoc_success(), but that doesn't bother the
      caller, so it's fine to do even when we share the parsed data.
      
      This reduces the stack space usage of the call stack here;
      Arnd reported it had grown above the 1024 byte warning limit.
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Link: https://lore.kernel.org/r/20191028125240.cb7661671bd2.I757c8752bf4f2f35e54f5e0a2c0a9cd9216c3d8b@changeid
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      f61d7884
    • John Crispin's avatar
      mac80211: move store skb ack code to its own function · 5d8983c8
      John Crispin authored
      This patch moves the code handling SKBTX_WIFI_STATUS inside the TX path
      into an extra function. This allows us to reuse it inside the 802.11 encap
      offloading datapath.
      Signed-off-by: John Crispin <john@phrozen.org>
      Link: https://lore.kernel.org/r/20191029091304.7330-2-john@phrozen.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      5d8983c8
    • zhong jiang's avatar
      mac80211_hwsim: use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs fops · 7d13cf1e
      zhong jiang authored
      It is clearer to use DEFINE_DEBUGFS_ATTRIBUTE to define debugfs file
      operations rather than DEFINE_SIMPLE_ATTRIBUTE.
      
      It is detected with the help of coccinelle.
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Link: https://lore.kernel.org/r/1572404462-45462-1-git-send-email-zhongjiang@huawei.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      7d13cf1e
    • Hoang Le's avatar
      tipc: eliminate checking netns if node established · d408bef4
      Hoang Le authored
      Currently, we scan over all network namespaces at each received
      discovery message in order to check if the sending peer might be
      present in a host-local namespace.
      
      This is unnecessary since we can assume that a peer will not change its
      location during an established session.
      
      We now improve the condition for this testing so that we don't perform
      any redundant scans.
      
      Fixes: f73b1281 ("tipc: improve throughput between nodes in netns")
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d408bef4
    • Eric Dumazet's avatar
      net: add a READ_ONCE() in skb_peek_tail() · f8cc62ca
      Eric Dumazet authored
      skb_peek_tail() can be used without protection of a lock,
      as spotted by KCSAN [1]
      
      In order to avoid load-tearing, add a READ_ONCE()
      
      Note that the corresponding WRITE_ONCE() are already there.
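      A minimal userspace sketch of the idea, assuming a volatile-qualified load as a stand-in for the kernel's READ_ONCE(). The `node` type and the function names are hypothetical and only mimic a circular doubly linked queue head. Reading the pointer once into a local keeps the comparison and the returned value consistent with each other even if another thread updates the list in between.

```c
#include <stddef.h>

/* Hypothetical circular list node; an empty list points at its own head. */
struct node {
    struct node *next;
    struct node *prev;
};

/* Load a pointer exactly once; the volatile access discourages the
 * compiler from re-reading or splitting the load. */
static inline struct node *read_once_ptr(struct node *const *p)
{
    return *(struct node *const volatile *)p;
}

/* Return the last element, or NULL if the list is empty. */
struct node *peek_tail(struct node *head)
{
    struct node *tail = read_once_ptr(&head->prev);

    return tail == head ? NULL : tail;
}
```

      Without the single load, a compiler would be free to read head->prev once for the comparison and again for the return value, and the two reads could observe different pointers.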
      
      [1]
      BUG: KCSAN: data-race in sk_wait_data / skb_queue_tail
      
      read to 0xffff8880b36a4118 of 8 bytes by task 20426 on cpu 1:
       skb_peek_tail include/linux/skbuff.h:1784 [inline]
       sk_wait_data+0x15b/0x250 net/core/sock.c:2477
       kcm_wait_data+0x112/0x1f0 net/kcm/kcmsock.c:1103
       kcm_recvmsg+0xac/0x320 net/kcm/kcmsock.c:1130
       sock_recvmsg_nosec net/socket.c:871 [inline]
       sock_recvmsg net/socket.c:889 [inline]
       sock_recvmsg+0x92/0xb0 net/socket.c:885
       ___sys_recvmsg+0x1a0/0x3e0 net/socket.c:2480
       do_recvmmsg+0x19a/0x5c0 net/socket.c:2601
       __sys_recvmmsg+0x1ef/0x200 net/socket.c:2680
       __do_sys_recvmmsg net/socket.c:2703 [inline]
       __se_sys_recvmmsg net/socket.c:2696 [inline]
       __x64_sys_recvmmsg+0x89/0xb0 net/socket.c:2696
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      write to 0xffff8880b36a4118 of 8 bytes by task 451 on cpu 0:
       __skb_insert include/linux/skbuff.h:1852 [inline]
       __skb_queue_before include/linux/skbuff.h:1958 [inline]
       __skb_queue_tail include/linux/skbuff.h:1991 [inline]
       skb_queue_tail+0x7e/0xc0 net/core/skbuff.c:3145
       kcm_queue_rcv_skb+0x202/0x310 net/kcm/kcmsock.c:206
       kcm_rcv_strparser+0x74/0x4b0 net/kcm/kcmsock.c:370
       __strp_recv+0x348/0xf50 net/strparser/strparser.c:309
       strp_recv+0x84/0xa0 net/strparser/strparser.c:343
       tcp_read_sock+0x174/0x5c0 net/ipv4/tcp.c:1639
       strp_read_sock+0xd4/0x140 net/strparser/strparser.c:366
       do_strp_work net/strparser/strparser.c:414 [inline]
       strp_work+0x9a/0xe0 net/strparser/strparser.c:423
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 451 Comm: kworker/u4:3 Not tainted 5.4.0-rc3+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: kstrp strp_work
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8cc62ca
    • Eric Dumazet's avatar
      net: add annotations on hh->hh_len lockless accesses · c305c6ae
      Eric Dumazet authored
      KCSAN reported a data-race [1]
      
      While we can use READ_ONCE() on the read sides,
      we need to make sure hh->hh_len is written last.
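      The "write the length last" rule can be sketched with C11 atomics. This is an illustration under stated assumptions, not the kernel code: the struct, the buffer size, and the function names are hypothetical, and the reader here uses an acquire load, which is stronger than a plain READ_ONCE().

```c
#include <stdatomic.h>
#include <string.h>

#define HH_DATA_MAX 32 /* hypothetical buffer size */

struct hh_cache_sketch {
    _Atomic unsigned int hh_len; /* 0 means "not initialised yet" */
    unsigned char hh_data[HH_DATA_MAX];
};

/* Writer: fill in the header bytes first, then publish the length last. */
void hh_publish(struct hh_cache_sketch *hh, const unsigned char *hdr,
                unsigned int len)
{
    if (len == 0 || len > HH_DATA_MAX)
        return;
    memcpy(hh->hh_data, hdr, len);
    atomic_store_explicit(&hh->hh_len, len, memory_order_release);
}

/* Reader: a non-zero length guarantees the header bytes are visible.
 * Copies the header into out and returns its length, or 0 if unset. */
unsigned int hh_try_read(struct hh_cache_sketch *hh, unsigned char *out)
{
    unsigned int len = atomic_load_explicit(&hh->hh_len,
                                            memory_order_acquire);

    if (len)
        memcpy(out, hh->hh_data, len);
    return len;
}
```

      If the length were stored before the data, a lockless reader could see a non-zero length and copy header bytes that have not been written yet.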
      
      [1]
      
      BUG: KCSAN: data-race in eth_header_cache / neigh_resolve_output
      
      write to 0xffff8880b9dedcb8 of 4 bytes by task 29760 on cpu 0:
       eth_header_cache+0xa9/0xd0 net/ethernet/eth.c:247
       neigh_hh_init net/core/neighbour.c:1463 [inline]
       neigh_resolve_output net/core/neighbour.c:1480 [inline]
       neigh_resolve_output+0x415/0x470 net/core/neighbour.c:1470
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
       ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
       rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      read to 0xffff8880b9dedcb8 of 4 bytes by task 29572 on cpu 1:
       neigh_resolve_output net/core/neighbour.c:1479 [inline]
       neigh_resolve_output+0x113/0x470 net/core/neighbour.c:1470
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       NF_HOOK include/linux/netfilter.h:305 [inline]
       ndisc_send_skb+0x459/0x5f0 net/ipv6/ndisc.c:505
       ndisc_send_ns+0x207/0x430 net/ipv6/ndisc.c:647
       rt6_probe_deferred+0x98/0xf0 net/ipv6/route.c:615
       process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
       worker_thread+0xa0/0x800 kernel/workqueue.c:2415
       kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 1 PID: 29572 Comm: kworker/1:4 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: events rt6_probe_deferred
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c305c6ae