1. 17 Feb, 2020 9 commits
    • tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy. · 33946518
      Arjun Roy authored
      This patchset is intended to reduce the number of extra system calls
      imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
      this patchset has demonstrated a system call reduction of about 30%
      when coupled with userspace changes.
      
      For applications using epoll, returning sk_err along with the result
      of tcp receive zerocopy could remove the need for a recvmsg() call
      that returns -EAGAIN after a spurious wakeup.
      
      Consider a multi-threaded application using epoll. A thread may awaken
      with EPOLLIN while another thread is already reading. The
      spuriously-awoken thread does not necessarily know that the other
      thread 'won'; if there is no data, it may instead have been woken up
      by a pending error. A zerocopy read receiving 0 bytes would thus need
      to be followed up by recvmsg() to be sure.
      
      Instead, we return sk_err directly with zerocopy, so the application
      can avoid this extra system call.
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
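      The decision a spuriously woken epoll thread faces can be sketched as a
      small userspace model. This is not real socket code: the enum, the
      function name and its parameters are hypothetical, and a real
      application would take the byte count and the error from the result of
      its zerocopy receive call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for what the woken thread does next. */
enum next_step { CONSUME_DATA, HANDLE_ERROR, BACK_TO_EPOLL, CALL_RECVMSG };

/*
 * Model of the decision after a zerocopy receive returned `length` bytes.
 * `err_reported` says whether sk_err comes back with the result (true once
 * this patch is applied); `err` is that error, 0 if none.
 */
static enum next_step after_zerocopy(unsigned length, int err, bool err_reported)
{
    if (length > 0)
        return CONSUME_DATA;
    if (!err_reported)
        /* 0 bytes is ambiguous: lost the race, or a pending error?
         * Only an extra recvmsg() can tell. */
        return CALL_RECVMSG;
    if (err != 0)
        return HANDLE_ERROR;
    /* 0 bytes and no error: another thread already read the data. */
    return BACK_TO_EPOLL;
}
```

      With the error reported alongside the result, the "0 bytes" case is no
      longer ambiguous, which is exactly the extra system call the commit
      removes.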
    • tcp-zerocopy: Return inq along with tcp receive zerocopy. · c8856c05
      Arjun Roy authored
      This patchset is intended to reduce the number of extra system calls
      imposed by TCP receive zerocopy. For ping-pong RPC style workloads,
      this patchset has demonstrated a system call reduction of about 30%
      when coupled with userspace changes.
      
      For applications using edge-triggered epoll, returning inq along with
      the result of tcp receive zerocopy could remove the need for a
      recvmsg() call that returns -EAGAIN after a successful zerocopy.
      Since we would normally need to perform such a recvmsg() call after
      every successful small RPC read via TCP receive zerocopy, returning
      inq can cut the number of system calls roughly in half.
      Signed-off-by: Arjun Roy <arjunroy@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
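      The "roughly half" arithmetic can be checked with a toy model that
      counts system calls per edge-triggered wakeup. It assumes every byte is
      consumed by zerocopy in chunks of at most `chunk` bytes, which is a
      simplification; the function and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Count system calls an edge-triggered reader spends draining `queued`
 * bytes when each zerocopy receive consumes at most `chunk` bytes.
 */
static unsigned syscalls_to_drain(unsigned queued, unsigned chunk, bool have_inq)
{
    unsigned calls = 0;

    assert(chunk > 0);
    while (queued > 0) {
        queued -= queued < chunk ? queued : chunk;
        calls++;  /* one zerocopy receive call */
        /* With inq, the result reports the remaining `queued`, so the
         * reader sees 0 here and knows the socket is drained. */
    }
    /* Without inq the reader cannot know the queue is empty and must
     * probe with a recvmsg() that fails with -EAGAIN. */
    return have_inq ? calls : calls + 1;
}
```

      For a small RPC that fits in one zerocopy call, the count drops from 2
      to 1; larger transfers save one call per wakeup.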
    • Merge branch 'Enhance-virtio-vsock-connection-semantics' · 8c8da5b8
      David S. Miller authored
      Sebastien Boeuf says:
      
      ====================
      Enhance virtio-vsock connection semantics
      
      This series improves the semantics of how the virtio-vsock server
      accepts connections from the client. Whenever the server receives a
      connection request while it is bound to the socket but not yet
      listening, it will answer with an RST packet. The point is to ensure
      each request from the client is answered promptly, so that the client
      can decide whether or not to retry.
      
      Along with the fix, the series includes a new test to ensure the
      behavior is consistent across all hypervisor drivers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: testing: vsock: Test when server is bound but not listening · 9de9f7d1
      Sebastien Boeuf authored
      Whenever the server side of vsock is bound to the socket but not yet
      listening, we expect the behavior seen by the client to be identical to
      what happens when the server has not even started.
      
      This new test runs the server side so that it binds to the socket
      without ever listening on it. The client side will try to connect and
      should receive an ECONNRESET error.

      This new test provides a way to validate the previous patch, which
      makes sure the server side always answers with an RST packet when the
      client requests a new connection.
      Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: virtio_vsock: Enhance connection semantics · df12eb6d
      Sebastien Boeuf authored
      Whenever the vsock backend on the host sends a packet through the RX
      queue, it expects an answer on the TX queue. Unfortunately, there is one
      case where the host side will hang waiting for the answer and might
      never recover if no timeout mechanism is implemented.
      
      This issue happens when the guest side binds to the socket, which
      inserts a new bound socket into the list of already bound sockets.
      At this point, we expect the guest to also start listening, which will
      move sk_state from TCP_CLOSE to TCP_LISTEN. The problem occurs if the
      host side queued an RX packet and triggered an interrupt right between
      the end of the binding process and the beginning of the listening
      process. In this specific case, the function processing the packet,
      virtio_transport_recv_pkt(), will find a bound socket and therefore hit
      the switch statement on sk_state, but because the state has not yet
      changed to TCP_LISTEN, the code takes the default case. This default
      case only frees the buffer, while it should also respond to the host
      side by sending a packet on its TX queue.
      
      To fix this chain of events, whenever the default case is entered we
      must send back a packet containing the operation VIRTIO_VSOCK_OP_RST,
      because at this stage we know the host side is waiting for an answer.
      
      One could argue that a proper timeout mechanism on the host side would
      be enough to keep the backend from hanging. But the point of this patch
      is to ensure that the normal use case gets a prompt response when it
      comes to establishing the connection.
      Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
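      The change to the switch statement can be illustrated with a simplified
      userspace model. The state and reply names below are hypothetical
      stand-ins, not the kernel's definitions, and everything except the
      connection-request path is left out. Per the test commit above, a
      client that receives this RST sees ECONNRESET.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the socket states involved. */
enum model_sk_state { MODEL_TCP_CLOSE, MODEL_TCP_LISTEN };
/* What the guest sends back on its TX queue. */
enum model_reply { REPLY_NONE, REPLY_RESPONSE, REPLY_RST };

/*
 * Connection request arriving on a socket that is already bound.
 * `with_fix` selects the behaviour after this patch.
 */
static enum model_reply recv_request(enum model_sk_state state, bool with_fix)
{
    switch (state) {
    case MODEL_TCP_LISTEN:
        return REPLY_RESPONSE;  /* normal accept path */
    default:
        /* Bound but listen() not reached yet: state is still TCP_CLOSE.
         * Before the fix the packet was only freed and the host kept
         * waiting; now an RST (VIRTIO_VSOCK_OP_RST) is sent back. */
        return with_fix ? REPLY_RST : REPLY_NONE;
    }
}
```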
    • Merge tag 'mac80211-next-for-net-next-2020-02-14' of... · ddb535a6
      David S. Miller authored
      Merge tag 'mac80211-next-for-net-next-2020-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      A few big new things:
       * 802.11 frame encapsulation offload support
       * more HE (802.11ax) support, including some for 6 GHz band
       * powersave in hwsim, for better testing
      
      Of course as usual there are various cleanups and small fixes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: x25: convert to list_for_each_entry_safe() · 1e5946f5
      chenqiwu authored
      Use list_for_each_entry_safe() instead of list_for_each_safe()
      to simplify the code.
      Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
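      To show what the conversion buys, here is a self-contained sketch with
      minimal userspace re-implementations of the kernel's intrusive list
      helpers (simplified, without poisoning or debug checks). With
      list_for_each_safe() the loop variable is a struct list_head pointer
      and each body needs its own list_entry() call; list_for_each_entry_safe()
      hands out the containing entry directly while still caching the next
      node, so the current entry may be freed. The `struct item` type and
      prune_odd() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Iterate over entries, caching the next one so that pos may be freed. */
#define list_for_each_entry_safe(pos, n, head, member)                \
    for (pos = list_entry((head)->next, __typeof__(*pos), member),    \
         n = list_entry(pos->member.next, __typeof__(*pos), member);  \
         &pos->member != (head);                                      \
         pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *new, struct list_head *head)
{
    new->prev = head->prev;
    new->next = head;
    head->prev->next = new;
    head->prev = new;
}

static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Hypothetical entry type standing in for an x25 table entry. */
struct item { int id; struct list_head node; };

/* Free every entry whose id is odd; return how many remain. */
static int prune_odd(struct list_head *head)
{
    struct item *it, *tmp;
    int left = 0;

    list_for_each_entry_safe(it, tmp, head, node) {
        if (it->id % 2) {
            list_del(&it->node);
            free(it);
        } else {
            left++;
        }
    }
    return left;
}
```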
    • lib: objagg: Replace zero-length arrays with flexible-array member · 1f4c51de
      Gustavo A. R. Silva authored
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kinds of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
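      A flexible array member is typically allocated together with its header
      in a single block sized for the header plus the trailing elements. The
      sketch below uses a hypothetical `struct int_bag`; in kernel code the
      struct_size() helper from <linux/overflow.h> is normally used for this
      size computation because it also checks for overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical type in the style of the structures converted here. */
struct int_bag {
    size_t count;
    int values[];   /* flexible array member: must be the last member */
};

/* Allocate the header and n trailing elements in one block. */
static struct int_bag *int_bag_new(size_t n)
{
    struct int_bag *bag = malloc(sizeof(*bag) + n * sizeof(bag->values[0]));

    if (!bag)
        return NULL;
    bag->count = n;
    for (size_t i = 0; i < n; i++)
        bag->values[i] = (int)(i * i);
    return bag;
}
```

      Moving `values` above `count` makes the compiler reject the
      declaration, which is the safety net the commit message describes.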
    • ptp_qoriq: drop the code of alarm · d71151a3
      Yangbo Lu authored
      The alarm function has not been supported by the PTP clock driver.
      The recommended solution, PHC + phc2sys + nanosleep, provides the
      best performance, so drop the alarm code from the ptp_qoriq driver.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 14 Feb, 2020 27 commits
  3. 13 Feb, 2020 4 commits
    • Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · b19e8c68
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
       "Summary below, but it's all reasonably straightforward. There are some
        more fixes on the horizon, but nothing disastrous yet.
      
        Summary:
      
         - Fix build when KASLR is enabled but CONFIG_ARCH_RANDOM is not set
      
         - Fix context-switching of SSBS state on systems that implement it
      
         - Fix spinlock compiler warning introduced during the merge window
      
         - Fix incorrect header inclusion (linux/clk-provider.h)
      
         - Use SYSCTL_{ZERO,ONE} instead of rolling our own static variables
      
         - Don't scream if optional SMMUv3 PMU irq is missing
      
         - Remove some unused function prototypes"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
        arm64: Fix CONFIG_ARCH_RANDOM=n build
        perf/smmuv3: Use platform_get_irq_optional() for wired interrupt
        arm64/spinlock: fix a -Wunused-function warning
        arm64: ssbs: Fix context-switch when SSBS is present on all CPUs
        arm64: use shared sysctl constants
        arm64: Drop do_el0_ia_bp_hardening() & do_sp_pc_abort() declarations
    • Merge tag 'gpio-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 1d40890a
      Linus Torvalds authored
      Pull GPIO fixes from Linus Walleij:
      
       - Revert two patches to gpio_do_set_config() and implement the proper
         solution that works, also drop an unnecessary call in set_config()
      
       - Fix up the lockdep class for hierarchical IRQ domains.
      
       - Remove some bridge code for line directions.
      
       - Fix a register access bug in the Xilinx driver.
      
      * tag 'gpio-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: sifive: fix static checker warning
        spmi: pmic-arb: Set lockdep class for hierarchical irq domains
        gpio: xilinx: Fix bug where the wrong GPIO register is written to
        gpiolib: remove unnecessary argument from set_config call
        gpio: bd71828: Remove unneeded defines for GPIO_LINE_DIRECTION_IN/OUT
        MAINTAINERS: Sort entries in database for GPIO
        gpiolib: fix gpio_do_set_config()
        Revert "gpiolib: remove set but not used variable 'config'"
        Revert "gpiolib: Remove duplicated function gpio_do_set_config()"
    • Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer' · 803381f9
      David S. Miller authored
      Jason A. Donenfeld says:
      
      ====================
      icmp: account for NAT when sending icmps from ndo layer
      
      The ICMP routines use the source address for two reasons:
      
      1. Rate-limiting ICMP transmissions based on source address, so
         that one source address cannot provoke a flood of replies. If
         the source address is wrong, the rate limiting will be
         incorrectly applied.
      
      2. Choosing the interface and hence new source address of the
         generated ICMP packet. If the original packet source address
         is wrong, ICMP replies will be sent from the wrong source
         address, resulting in either a misdelivery, infoleak, or just
         general network admin confusion.
      
      Most of the time, the icmp_send and icmpv6_send routines can just reach
      down into the skb's IP header to determine the saddr. However, if
      icmp_send or icmpv6_send is being called from a network device driver --
      there are a few in the tree -- then it's possible that by the time
      icmp_send or icmpv6_send looks at the packet, the packet's source
      address has already been transformed by SNAT or MASQUERADE or some other
      transformation that CONNTRACK knows about. In this case, the packet's
      source address is most certainly the *wrong* source address to be used
      for the purpose of ICMP replies.
      
      Rather, the source address we want to use for ICMP replies is the
      original one, from before the transformation occurred.
      
      Fortunately, it's very easy to just ask CONNTRACK if it knows about this
      packet, and if so, how to fix it up. The saddr is the only field in the
      header we need to fix up, for the purposes of the subsequent processing
      in the icmp_send and icmpv6_send functions, so we do the lookup very
      early on, so that the rest of the ICMP machinery can progress as usual.
      
      Changes v3->v4:
      - Add back the skb_shared checking, since the previous assumption isn't
        actually true [Eric]. This implies dropping the additional patches v3 had
        for removing skb_share_check from various drivers. We can revisit that
        general set of ideas later, but that's probably better suited as a net-next
        patchset rather than this stable one which is geared at fixing bugs. So,
        this implements things in the safe conservative way.
      
      Changes v2->v3:
      - Add selftest to ensure this actually does what we want and never regresses.
      - Check the size of the skb header before operating on it.
      - Use skb_ensure_writable to ensure we can modify the cloned skb [Florian].
      - Conditionalize this on IPS_SRC_NAT so we don't do anything unnecessarily
        [Florian].
      - It turns out that since we're calling these from the xmit path,
        skb_share_check isn't required, so remove that [Florian]. This simplifies the
        code a bit too. **The supposition here is that skbs passed to ndo_start_xmit
        are _never_ shared. If this is not correct NOW IS THE TIME TO PIPE UP, for
        doom awaits us later.**
      - While investigating the shared skb business, several drivers appeared to be
        calling it incorrectly in the xmit path, so this series also removes those
        unnecessary calls, based on the supposition mentioned in the previous point.
      
      Changes v1->v2:
      - icmpv6 takes subtly different types than icmpv4, like u32 instead of be32,
        u8 instead of int.
      - Since we're technically writing to the skb, we need to make sure it's not
        a shared one [Dave, 2017].
      - Restore the original skb data after icmp_send returns. All current users
        are freeing the packet right after, so it doesn't matter, but future users
        might not.
      - Remove superfluous route lookup in sunvnet [Dave].
      - Use NF_NAT instead of NF_CONNTRACK for condition [Florian].
      - Include this cover letter [Dave].
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
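      The approach in the cover letter (look up the original source address
      first, send the ICMP, then restore the header) can be sketched with a
      toy model. The packet and conntrack structures and all function names
      below are hypothetical; the real helpers operate on skbs and also deal
      with shared and non-writable buffers (skb_shared checks,
      skb_ensure_writable), which this model leaves out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the IP header and the conntrack answer. */
struct pkt { uint32_t saddr; uint32_t daddr; };
struct ct_info { bool src_nat; uint32_t orig_saddr; };

/* Records the source address the ICMP code saw, for illustration. */
static uint32_t last_icmp_saddr;
static void icmp_send_model(const struct pkt *p) { last_icmp_saddr = p->saddr; }

/* ct == NULL means conntrack knows nothing about this packet. */
static void icmp_ndo_send_model(struct pkt *p, const struct ct_info *ct)
{
    uint32_t saved;

    if (!ct || !ct->src_nat) {    /* not source-NATed: header is right */
        icmp_send_model(p);
        return;
    }
    saved = p->saddr;
    p->saddr = ct->orig_saddr;    /* pre-SNAT source address */
    icmp_send_model(p);
    p->saddr = saved;             /* restore for any later user */
}
```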
    • xfrm: interface: use icmp_ndo_send helper · 45942ba8
      Jason A. Donenfeld authored
      Because xfrmi is calling icmp from network device context, it should use
      the ndo helper so that the rate limiting applies correctly.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>