1. 25 Apr, 2017 3 commits
    • Willem de Bruijn's avatar
      virtio-net: transmit napi · b92f1e67
      Willem de Bruijn authored
      Convert virtio-net to a standard napi tx completion path. This enables
      better TCP pacing using TCP small queues and increases single stream
      throughput.
      
      The virtio-net driver currently cleans tx descriptors on transmission
      of new packets in ndo_start_xmit. Latency depends on new traffic, so
      is unbounded. To avoid deadlock when a socket reaches its snd limit,
      packets are orphaned on tranmission. This breaks socket backpressure,
      including TSQ.
      
      Napi increases the number of interrupts generated compared to the
      current model, which keeps interrupts disabled as long as the ring
      has enough free descriptors. Keep tx napi optional and disabled for
      now. Follow-on patches will reduce the interrupt cost.
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b92f1e67
    • Willem de Bruijn's avatar
      virtio-net: napi helper functions · e4e8452a
      Willem de Bruijn authored
      Prepare virtio-net for tx napi by converting existing napi code to
      use helper functions. This also deduplicates some logic.
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e4e8452a
    • David S. Miller's avatar
      sparc64: Improve 64-bit constant loading in eBPF JIT. · 14933dc8
      David S. Miller authored
      Doing a full 64-bit decomposition is really stupid especially for
      simple values like 0 and -1.
      
      But if we are going to optimize this, go all the way and try for all 2
      and 3 instruction sequences not requiring a temporary register as
      well.
      
      First we do the easy cases where it's a zero or sign extended 32-bit
      number (sethi+or, sethi+xor, respectively).
      
      Then we try to find a range of set bits we can load simply then shift
      up into place, in various ways.
      
      Then we try negating the constant and see if we can do a simple
      sequence using that with a xor at the end.  (f.e. the range of set
      bits can't be loaded simply, but for the negated value it can)
      
      The final optimized strategy involves 4 instructions sequences not
      needing a temporary register.
      
      Otherwise we sadly fully decompose using a temp..
      
      Example, from ALU64_XOR_K: 0x0000ffffffff0000 ^ 0x0 = 0x0000ffffffff0000:
      
      0000000000000000 <foo>:
         0:   9d e3 bf 50     save  %sp, -176, %sp
         4:   01 00 00 00     nop
         8:   90 10 00 18     mov  %i0, %o0
         c:   13 3f ff ff     sethi  %hi(0xfffffc00), %o1
        10:   92 12 63 ff     or  %o1, 0x3ff, %o1     ! ffffffff <foo+0xffffffff>
        14:   93 2a 70 10     sllx  %o1, 0x10, %o1
        18:   15 3f ff ff     sethi  %hi(0xfffffc00), %o2
        1c:   94 12 a3 ff     or  %o2, 0x3ff, %o2     ! ffffffff <foo+0xffffffff>
        20:   95 2a b0 10     sllx  %o2, 0x10, %o2
        24:   92 1a 60 00     xor  %o1, 0, %o1
        28:   12 e2 40 8a     cxbe  %o1, %o2, 38 <foo+0x38>
        2c:   9a 10 20 02     mov  2, %o5
        30:   10 60 00 03     b,pn   %xcc, 3c <foo+0x3c>
        34:   01 00 00 00     nop
        38:   9a 10 20 01     mov  1, %o5     ! 1 <foo+0x1>
        3c:   81 c7 e0 08     ret
        40:   91 eb 40 00     restore  %o5, %g0, %o0
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      14933dc8
  2. 24 Apr, 2017 37 commits