1. 09 Nov, 2016 37 commits
  2. 08 Nov, 2016 3 commits
    • Soheil Hassas Yeganeh's avatar
      sock: do not set sk_err in sock_dequeue_err_skb · f5f99309
      Soheil Hassas Yeganeh authored
      Do not set sk_err when dequeuing errors from the error queue.
      Doing so results in:
      a) Bugs: By overwriting existing sk_err values, it possibly
         hides legitimate errors. It is also incorrect when local
         errors are queued with ip_local_error. That happens in the
         context of a system call, which already returns the error
         code.
      b) Inconsistent behavior: When there are pending errors on
         the error queue, sk_err is sometimes 0 (e.g., for
         the first timestamp on the error queue) and sometimes
         set to an error code (after dequeuing the first
         timestamp).
      c) Suboptimality: Setting sk_err to ENOMSG on simple
         TX timestamps can abort parallel reads and writes.
      
      Removing this line doesn't break userspace. This is because
      userspace code cannot rely on sk_err for detecting whether
      there is something on the error queue. Except for ICMP messages
      received for UDP and RAW, sk_err is not set at enqueue time,
      and as a result sk_err can be 0 while there are plenty of
      errors on the error queue.
      
      For ICMP packets in UDP and RAW, sk_err is set when they are
      enqueued on the error queue, but that does not result in aborting
      reads and writes. For such cases, sk_err is only readable via
      getsockopt(SO_ERROR) which will reset the value of sk_err on
      its own. More importantly, prior to this patch,
      recvmsg(MSG_ERRQUEUE) has a race on setting sk_err (i.e.,
      sk_err is set by sock_dequeue_err_skb without atomic ops or
      locks) which can store 0 in sk_err even when we have ICMP
      messages pending. Removing this line from sock_dequeue_err_skb
      eliminates that race.
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f5f99309
    • David S. Miller's avatar
      Merge branch 'IFF_NO_QUEUE-semantics' · 5f7f7502
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
      
      This patchset is a cleanup for IFF_NO_QUEUE devices.  It will
      hopefully help userspace get a more consistent behavior when attaching
      qdisc to such virtual devices.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5f7f7502
    • Jesper Dangaard Brouer's avatar
      qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device · 84c46dd8
      Jesper Dangaard Brouer authored
      It is a clear misconfiguration to attach a qdisc to a device with
      tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
      htb, plug and sfb) inherit/copy this value as their queue length.
      
      Why should the kernel catch such a misconfiguration?  Because prior to
      introducing the IFF_NO_QUEUE device flag, userspace found a loophole
      in the qdisc config system that allowed them to achieve the equivalent
      of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
      a device.  The loophole on older kernels is setting tx_queue_len=0,
      *prior* to device qdisc init (the config time is significant, simply
      setting tx_queue_len=0 doesn't trigger the loophole).
      
      This loophole is currently used by Docker[1] to get better performance
      and scalability out of the veth device.  The Docker developers were
      warned[1] that they needed to adjust the tx_queue_len if ever
      attaching a qdisc.  The OpenShift project didn't remember this warning
      and attached a qdisc, this were caught and fixed in[2].
      
      [1] https://github.com/docker/libcontainer/pull/193
      [2] https://github.com/openshift/origin/pull/11126
      
      Instead of fixing every userspace program that used this loophole, and
      forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
      catch the misconfiguration on the kernel side.
      Signed-off-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      84c46dd8