1. 23 Dec, 2016 22 commits
    • sctp: fix recovering from 0 win with small data chunks · 1636098c
      Marcelo Ricardo Leitner authored
      Currently, if SCTP closes the receive window under window pressure, mostly
      caused by excessive skb overhead relative to payload, it closes the
      window abruptly and saves the delta in rwnd_press. It then starts
      recovering rwnd as the application consumes chunks, but rwnd_press is
      only recovered once rwnd reaches the value of rwnd_press, mostly to
      prevent silly window syndrome.
      
      This is very inefficient with small data chunks: with those, rwnd never
      gets back to that value, so it never recovers from the pressure. As a
      result, we do not issue window updates when recovering from a 0 window
      and instead rely on a sender retransmit to notice it.
      
      The fix is to remove that threshold, as no value is good enough: it
      depends on the (average) chunk sizes being used.
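      
      The threshold removal can be illustrated with a small user-space model.
      This is only a sketch: struct rwnd_model, both function names and the
      per-call cap of one pathmtu are assumptions made for illustration, not
      the kernel code itself.

```c
#include <assert.h>

/* Simplified user-space model of the receive-window state. The field
 * names follow the commit message; everything else is assumed. */
struct rwnd_model {
	unsigned int rwnd;       /* currently advertised receive window */
	unsigned int rwnd_press; /* bytes withheld due to window pressure */
	unsigned int pathmtu;    /* assumed cap on pressure released per call */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Old behaviour: pressure is only released once rwnd has grown back to
 * at least rwnd_press. */
static void rwnd_increase_old(struct rwnd_model *m, unsigned int len)
{
	m->rwnd += len;
	if (m->rwnd_press && m->rwnd >= m->rwnd_press) {
		unsigned int change = min_u(m->pathmtu, m->rwnd_press);

		m->rwnd += change;
		m->rwnd_press -= change;
	}
}

/* New behaviour: no threshold, so every consumed chunk also releases
 * part of the withheld window. */
static void rwnd_increase_new(struct rwnd_model *m, unsigned int len)
{
	m->rwnd += len;
	if (m->rwnd_press) {
		unsigned int change = min_u(m->pathmtu, m->rwnd_press);

		m->rwnd += change;
		m->rwnd_press -= change;
	}
}
```

      With 1-byte chunks, the old variant leaves rwnd_press untouched until
      rwnd has climbed all the way back, while the new one starts releasing
      it on the first consumed chunk.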
      
      Tested with netperf -t SCTP_STREAM -- -m 1, triggering a 0 window by
      sending SIGSTOP to netserver, sleeping 1.2 s, and sending SIGCONT.
      Rate limited to 845 kbps for visibility. Capture taken on the netserver side.
      
      Previously:
      01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [
      01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS
      01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
      01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS
      01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS
      01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
      02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
        (window update missed)
      04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
      04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g
      04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
      04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS
      04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g
      
      After:
      02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153]
      02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S
      02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
      02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S
      02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S
      02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
      03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508]
      03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017]
      03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526]
      03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035]
      03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544]
      03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053]
      03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562]
        (following data transmission triggered by window updates above)
      03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
      03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000]
      03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
      03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S
      03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894]
      03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S
      03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: do not loose window information if in rwnd_over · 58b94d88
      Marcelo Ricardo Leitner authored
      We may receive a packet that is larger than the current window. The
      first such packet increases rwnd_over. Then, if we receive another
      data chunk (especially as SCTP allows one data chunk in flight even
      during a 0 window), rwnd_over is overwritten instead of added to.
      
      In the long run, this could cause the window to grow bigger than its
      initial size, as rwnd_over would be charged only for the last received
      data chunk while the code tries to open the window for all packets that
      were received and had their value in rwnd_over overwritten. This, in
      turn, can worsen the payload/buffer ratio and cause rwnd_press to kick
      in more often.
      
      The fix is to sum it too, same as is done for rwnd_press, so that if we
      receive 3 chunks after closing the window, we still have to release that
      same amount before re-opening it.
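      
      A minimal sketch of the difference between overwriting and summing,
      again as a user-space model. struct over_model and rwnd_decrease() are
      hypothetical names; the accumulate flag exists only to show both
      behaviours side by side.

```c
#include <assert.h>

/* Hypothetical model: only the two fields relevant to the fix. */
struct over_model {
	unsigned int rwnd;      /* currently advertised receive window */
	unsigned int rwnd_over; /* bytes accepted beyond the window */
};

/* Charge an incoming chunk of len bytes against the window.
 * accumulate == 0 models the old behaviour (overwrite),
 * accumulate != 0 models the fix (sum). */
static void rwnd_decrease(struct over_model *m, unsigned int len,
			  int accumulate)
{
	if (m->rwnd >= len) {
		m->rwnd -= len;
		return;
	}

	if (accumulate)
		m->rwnd_over += len - m->rwnd;
	else
		m->rwnd_over = len - m->rwnd;
	m->rwnd = 0;
}
```

      Feeding three 100-byte chunks into a closed window leaves rwnd_over at
      100 with the overwrite and at 300 with the sum, which is the amount
      that then has to be released before the window re-opens.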
      
      Log snippet from sctp_test exhibiting the issue:
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      [  146.209232] sctp: sctp_assoc_rwnd_decrease:
      association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
      [  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
      rwnd decreased by 1 to (0, 1, 114221)
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'virtio-net-xdp-fixes' · e57cbe48
      David S. Miller authored
      Jason Wang says:
      
      ====================
      several fixups for virtio-net XDP
      
      Merry Xmas and a Happy New year to all:
      
      This series tries to fix several issues for virtio-net XDP, which
      can be grouped into several parts:
      
      - fix several issues during XDP linearizing
      - allow csumed packets to work for XDP_PASS
      - make EWMA rxbuf size estimation work for XDP
      - forbid XDP when GUEST_UFO is supported
      - remove big packet XDP support
      - add XDP support for small buffers
      
      Please see individual patches for details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: XDP support for small buffers · bb91accf
      Jason Wang authored
      Commit f600b690 ("virtio_net: Add XDP support") leaves the case of
      small receive buffers untouched. This confuses users who want to
      set XDP but use small buffers. Rather than forbidding XDP in small
      buffer mode, let's make it work. XDP can then only work on skb->data,
      since virtio-net creates skbs during refill; this is suboptimal and
      could be optimized in the future.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: remove big packet XDP codes · c47a43d3
      Jason Wang authored
      Now that we in fact don't allow XDP for big packets, remove the code for it.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support · 92502fe8
      Jason Wang authored
      When VIRTIO_NET_F_GUEST_UFO is negotiated, the host could still send a
      UFO packet that exceeds a single page, which cannot be handled
      correctly by XDP. So this patch forbids setting XDP when GUEST_UFO is
      supported. While at it, forbid XDP for ECN (which comes only from GRO)
      too, to prevent user misconfiguration.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: make rx buf size estimation works for XDP · 5c33474d
      Jason Wang authored
      We don't update the EWMA rx buf size in the XDP case. This leads to
      underestimation of the rx buf size, which causes the host to produce
      more than one buffer per packet. This greatly increases the likelihood
      of XDP page linearization.
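      
      To illustrate why a skipped update keeps the estimate low, here is a
      sketch of an integer EWMA. struct ewma_len, ewma_len_add() and the
      weight of 64 are assumptions for illustration; this is not the
      kernel's EWMA helper.

```c
#include <assert.h>

/* Hypothetical integer EWMA: the average moves towards each new sample
 * by 1/weight of the difference. */
struct ewma_len {
	unsigned long avg;
	unsigned long weight; /* larger weight means slower adaptation */
};

static void ewma_len_add(struct ewma_len *e, unsigned long sample)
{
	assert(e->weight > 0);

	if (e->avg == 0)
		e->avg = sample; /* seed with the first sample */
	else if (sample >= e->avg)
		e->avg += (sample - e->avg) / e->weight;
	else
		e->avg -= (e->avg - sample) / e->weight;
}
```

      If large packets never reach ewma_len_add(), avg stays at whatever
      small value it last had, so buffers sized from it remain too small for
      those packets.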
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: unbreak csumed packets for XDP_PASS · b00f70b0
      Jason Wang authored
      We drop csumed packets when running XDP on them. This breaks
      XDP_PASS when GUEST_CSUM is supported. Fix this by allowing the csum
      flag to be set. With this patch, simple TCP works for XDP_PASS.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: correctly handle XDP_PASS for linearized packets · 1830f893
      Jason Wang authored
      When XDP_PASS is returned for linearized packets, we try to get
      new buffers from the virtqueue and build skbs from them. This is wrong;
      we should create skbs based on the existing buffers instead. Fix this by
      creating the skb based on xdp_page.
      
      With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: fix page miscount during XDP linearizing · 56a86f84
      Jason Wang authored
      We don't put the page during linearizing, which causes a leak when
      transmitting through XDP_TX or when the packet exceeds PAGE_SIZE. Fix
      this by putting the page accordingly. Also decrease the number of
      buffers during linearizing to make sure the caller can free buffers
      correctly when the packet exceeds PAGE_SIZE. With this patch, we won't
      get OOM after linearizing a huge number of packets.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: correctly xmit linearized page on XDP_TX · 275be061
      Jason Wang authored
      After we linearize a page, we should xmit this page instead of the page
      of the first buffer, which may lead to unexpected results. With this
      patch, we can see the correct packet during XDP_TX.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: remove the warning before XDP linearizing · 73b62bd0
      Jason Wang authored
      We use EWMA to estimate the size of the rx buffer. When the rx buffer
      size is underestimated, it's usual to have a packet with more than one
      buffer. Since this is not a bug, remove the warning and correct
      the comment before XDP linearizing.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-router-fixes' · d3a51d6c
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Router fixes
      
      Ido says:
      
      First two patches ensure we remove from the device's table neighbours
      that are considered to be dead by the neighbour core.
      
      The last patch removes nexthop groups from the device when they are no
      longer valid.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Correctly remove nexthop groups · 58312125
      Ido Schimmel authored
      At the end of the nexthop initialization process we determine whether
      the nexthop should be offloaded or not based on the NUD state of the
      neighbour representing it. After all the nexthops were initialized we
      refresh the nexthop group and potentially offload it to the device, in
      case some of the nexthops were resolved.
      
      Make the destruction of a nexthop group symmetric with its creation by
      marking all nexthops as invalid and then refreshing the nexthop group to
      make sure it is removed from the device's tables.
      
      Fixes: b2157149 ("mlxsw: spectrum_router: Add the nexthop neigh activity update")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_router: Don't reflect dead neighs · 93a87e5e
      Ido Schimmel authored
      When a neighbour is considered to be dead, we should remove it from the
      device's table regardless of its NUD state.
      
      Without this patch, after setting a port to be administratively down we
      get the following errors when we periodically try to update the kernel
      about neighbours activity:
      
      [  461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find
      matching neighbour for IP=192.168.100.2
      
      Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: Send netevent after marking neigh as dead · 53f800e3
      Ido Schimmel authored
      neigh_cleanup_and_release() is always called after marking a neighbour
      as dead, but it only notifies user space and not in-kernel listeners of
      the netevent notification chain.
      
      This can cause multiple problems. In my specific use case, it causes the
      listener (a switch driver capable of L3 offloads) to believe a neighbour
      entry is still valid, and is thus erroneously kept in the device's
      table.
      
      Fix that by sending a netevent after marking the neighbour as dead.
      
      Fixes: a6bf9e93 ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: handle -EFAULT from skb_copy_bits · a98f9175
      Dave Jones authored
      By setting certain socket options on ipv6 raw sockets, we can confuse the
      length calculation in rawv6_push_pending_frames triggering a BUG_ON.
      
      RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
      RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
      RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
      RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
      RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
      R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
      R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
      
      Call Trace:
       [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
       [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
       [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
       [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
       [<ffffffff816da27e>] SyS_sendto+0xe/0x10
       [<ffffffff81002910>] do_syscall_64+0x50/0xa0
       [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.
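      
      The error-handling pattern can be sketched in user space as follows.
      copy_bits() and push_frame() are hypothetical stand-ins, not the
      kernel functions; the point is only that a failed bounds-checked copy
      is returned to the caller instead of being treated as a fatal bug.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bounds-checked copy out of a packet. */
static int copy_bits(const unsigned char *src, size_t src_len,
		     size_t offset, void *to, size_t len)
{
	if (offset > src_len || len > src_len - offset)
		return -EFAULT;
	memcpy(to, src + offset, len);
	return 0;
}

/* Hypothetical caller: reads a 2-byte checksum field at csum_offset. */
static int push_frame(const unsigned char *pkt, size_t pkt_len,
		      size_t csum_offset)
{
	unsigned short csum;
	int err;

	err = copy_bits(pkt, pkt_len, csum_offset, &csum, sizeof(csum));
	if (err < 0)
		goto out; /* take the failure path instead of crashing */

	(void)csum; /* checksum processing would go here */
	err = 0;
out:
	return err;
}
```

      An offset that user space pushed past the end of the packet then
      results in -EFAULT from the send path rather than a crash.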
      
      Reproducer:
      
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      
      #define LEN 504
      
      int main(int argc, char* argv[])
      {
      	int fd;
      	int zero = 0;
      	char buf[LEN];
      
      	memset(buf, 0, LEN);
      
      	fd = socket(AF_INET6, SOCK_RAW, 7);
      
      	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
      	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
      
      	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
      }
      Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets · 39b2dd76
      Willem de Bruijn authored
      Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
      the packet. For sockets that have transport headers pulled, transport
      offset can be negative. Use signed comparison to avoid overflow.
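      
      The overflow can be reproduced with plain integers. In this sketch the
      function names and the 4-byte port range are assumptions for
      illustration; the explicit cast in the first function spells out the
      conversion that C applies implicitly when an int is compared with an
      unsigned int.

```c
#include <assert.h>

/* Unsigned comparison: a negative offset wraps to a huge value, so the
 * range check wrongly fails. Returns 1 if the ports are in range. */
static int ports_in_range_unsigned(int transport_offset, unsigned int len)
{
	return !((unsigned int)(transport_offset + 4) > len);
}

/* Signed comparison: a negative offset stays negative and passes. */
static int ports_in_range_signed(int transport_offset, unsigned int len)
{
	return !(transport_offset + 4 > (int)len);
}
```

      With an offset of -8 and a length of 100, the unsigned form rejects
      the packet while the signed form accepts it; both still reject an
      offset that really lies past the end.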
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cls_flower-act_tunnel_key-fixes' · 9aa340a5
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      net/sched fixes for cls_flower and act_tunnel_key
      
      This small series contains a fix to the matching flags support
      in flower and to the tunnel key action MD prep for IPv6.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: cls_flower: Mandate mask when matching on flags · d9724772
      Or Gerlitz authored
      When matching on flags, we should require the user to provide the
      mask and avoid using an all-ones mask. Not doing so causes matching
      on flags provided without a mask to hit on the value being unset for
      all flags, which may not be what the user wanted to happen.
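      
      A sketch of masked matching shows the effect of an implicit all-ones
      mask. FLAG_A, FLAG_B and flags_match() are hypothetical and only
      illustrate the general (value & mask) comparison.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_A 0x1u /* hypothetical flag bits, for illustration only */
#define FLAG_B 0x2u

/* A packet matches when it agrees with the key on every masked bit. */
static int flags_match(uint32_t pkt_flags, uint32_t key, uint32_t mask)
{
	return (pkt_flags & mask) == (key & mask);
}
```

      A key of FLAG_A with an all-ones mask also demands that FLAG_B be
      clear, so a packet carrying both flags does not match; with an
      explicit mask of FLAG_A it does.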
      
      Fixes: faa3ffce ('net/sched: cls_flower: Add support for matching on flags')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reported-by: Paul Blakey <paulb@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6 · dc594ecd
      Or Gerlitz authored
      The UDP dst port was passed to the helper function which sets the
      IPv6 IP tunnel meta-data in the wrong parameter order; fix that.
      
      Fixes: 75bfbca0 ('net/sched: act_tunnel_key: Add UDP dst port option')
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: CSR clock configuration fix · 567be786
      jpinto authored
      When testing stmmac with my QoS reference design I found a problem in the
      CSR clock configuration that was preventing PHY discovery, since
      every read operation returned 0x0000ffff. This patch fixes the issue.
      Signed-off-by: Joao Pinto <jpinto@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 22 Dec, 2016 2 commits
  3. 21 Dec, 2016 7 commits
    • net: fddi: skfp: use %p format specifier for addresses rather than %x · 551cde19
      Colin Ian King authored
      Trivial fix: Addresses should be printed using the %p format specifier
      rather than using %x.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add a missing barrier in tcp_tasklet_func() · 0a9648f1
      Eric Dumazet authored
      Madalin reported crashes happening in tcp_tasklet_func() on powerpc64.
      
      Before the TSQ_QUEUED bit is cleared, we must ensure the changes done
      by list_del(&tp->tsq_node); are committed to memory, otherwise
      corruption might happen, as another CPU could catch the TSQ_QUEUED
      clearance too soon.
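      
      The ordering requirement can be sketched with C11 atomics. This is a
      user-space analogy, not the kernel primitives: QUEUED_BIT, struct flow
      and flow_dequeue() are hypothetical, and release ordering on the
      atomic update stands in for the barrier placed before clearing the bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define QUEUED_BIT 0x1u /* stands in for TSQ_QUEUED; value is arbitrary */

struct node {
	struct node *prev, *next;
};

struct flow {
	struct node tsq_node;
	atomic_uint flags;
};

/* Unlink a node from a circular doubly linked list. */
static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

static void flow_dequeue(struct flow *f)
{
	node_del(&f->tsq_node);
	/* Release ordering: the list writes above are visible before any
	 * thread that acquires this flag can observe the bit as cleared. */
	atomic_fetch_and_explicit(&f->flags, ~QUEUED_BIT,
				  memory_order_release);
}
```

      A thread that sees the bit cleared (with an acquire load) and queues
      the flow again is then guaranteed to work on a fully unlinked node.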
      
      We can notice that old kernels were immune to this bug, because
      TSQ_QUEUED was cleared after a bh_lock_sock(sk)/bh_unlock_sock(sk)
      section, but they could have missed a kick to write additional bytes,
      when NIC interrupts for a given flow are spread to multiple cpus.
      
      Affected TCP flows would need an incoming ACK or RTO timer to add more
      packets to the pipe. So overall situation should be better now.
      
      Fixes: b223feb9 ("tcp: tsq: add shortcut in tcp_tasklet_func()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Madalin Bucur <madalin.bucur@nxp.com>
      Tested-by: Madalin Bucur <madalin.bucur@nxp.com>
      Tested-by: Xing Lei <xing.lei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: fix dma unmapping of TX buffers for fragments · 8354491c
      Thomas Petazzoni authored
      Since commit 71ce391d ("net: mvpp2: enable proper per-CPU TX
      buffers unmapping"), we are not correctly DMA unmapping TX buffers for
      fragments.
      
      Indeed, the mvpp2_txq_inc_put() function only stores in the
      txq_cpu->tx_buffs[] array the physical address of the buffer to be
      DMA-unmapped when skb != NULL. In addition, when DMA-unmapping, we use
      skb_headlen(skb) to get the size to be unmapped. Both of these work fine
      for TX descriptors that are associated directly with an SKB, but not for
      the ones that are used for fragments, with a NULL pointer as skb:
      
       - We have a NULL physical address when calling DMA unmap
       - skb_headlen(skb) crashes because skb is NULL
      
      This causes random crashes when fragments are used.
      
      To solve this problem, we need to:
      
       - Store the physical address of the buffer to be unmapped
         unconditionally, regardless of whether it is tied to a SKB or not.
      
       - Store the length of the buffer to be unmapped, which requires a new
         field.
      
      Instead of adding a third array to store the length of the buffer to be
      unmapped, and as suggested by David Miller, this commit refactors the
      tx_buffs[] and tx_skb[] arrays of 'struct mvpp2_txq_pcpu' into a
      separate structure 'mvpp2_txq_pcpu_buf', to which a 'size' field is
      added. Therefore, instead of having three arrays to allocate/free, we
      have a single one, which also improves data locality, reducing the
      impact on the CPU cache.
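      
      The layout change can be sketched as a user-space model. The struct
      and function names below are hypothetical simplifications (a void
      pointer stands in for the skb and a 64-bit integer for the DMA
      address); only the idea of one array of per-descriptor records comes
      from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One record per TX descriptor, replacing parallel arrays. */
struct txq_buf {
	void *skb;     /* NULL for descriptors that carry a fragment */
	uint64_t phys; /* address to unmap, stored unconditionally */
	size_t size;   /* length to unmap */
};

struct txq_pcpu {
	struct txq_buf *buffs; /* single allocation for all records */
	int size;
	int put_index;
};

static int txq_init(struct txq_pcpu *q, int size)
{
	q->buffs = calloc((size_t)size, sizeof(*q->buffs));
	if (!q->buffs)
		return -1;
	q->size = size;
	q->put_index = 0;
	return 0;
}

/* Record a descriptor; address and size are kept even when skb is NULL. */
static void txq_inc_put(struct txq_pcpu *q, void *skb, uint64_t phys,
			size_t size)
{
	struct txq_buf *b = &q->buffs[q->put_index];

	b->skb = skb;
	b->phys = phys;
	b->size = size;
	if (++q->put_index == q->size)
		q->put_index = 0; /* wrap around the ring */
}

static void txq_free(struct txq_pcpu *q)
{
	free(q->buffs);
	q->buffs = NULL;
}
```

      Because address and size live next to the skb pointer, the unmap path
      no longer needs the skb to learn how much to unmap.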
      
      Fixes: 71ce391d ("net: mvpp2: enable proper per-CPU TX buffers unmapping")
      Reported-by: Raphael G <raphael.glon@corp.ovh.com>
      Cc: Raphael G <raphael.glon@corp.ovh.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: stmmac: dwmac-rk: make clk enablement first in powerup · f217bfde
      Heiko Stübner authored
      Right now the dwmac-rk tries to set up the GRF-specific speed and link
      options before enabling clocks, PHYs etc. On previous SoCs this works
      because the GRF is supplied on the whole by one clock.
      
      On the rk3399, however, the GRF (General Register Files) clock supply
      has been split into multiple clocks, and while there is no specific
      grf-gmac clock like for other sub-blocks, it seems the mac-specific
      portions are actually supplied by the general mac clock.
      
      This results in hangs on rk3399 boards if the driver is built as a module.
      When built in, the problem doesn't surface, as the clocks
      are still on at the stage before clock_disable_unused.
      
      To solve this, simply move the clock enablement to the first position
      in the powerup callback. It is also a good idea in general to
      enable clocks before everything else.
      
      Tested on rk3288, rk3368 and rk3399; the dwmac still works on all of them.
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • be2net: Increase skb headroom size to 256 bytes · 76b15923
      Kalesh A P authored
      The driver currently allocates 128 bytes of skb headroom.
      This was found to be insufficient with some configurations
      like Geneve tunnels, which resulted in skb head reallocations.
      
      Increase the headroom to 256 bytes to fix this.
      Signed-off-by: Kalesh A P <kalesh-anakkur.purayil@broadcom.com>
      Signed-off-by: Suresh Reddy <suresh.reddy@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtlwifi: Fix kernel oops introduced with commit e4965614 · 22b68b93
      Larry Finger authored
      With commit e4965614 ("rtlwifi: Use dev_kfree_skb_irq instead of
      kfree_skb"), the method used to free an skb was changed because the
      kfree_skb() was inside a spinlock. What was forgotten is that kfree_skb()
      guards against a NULL value for the argument. Routine dev_kfree_skb_irq()
      does not, and a test is needed to prevent kernel panics.
      
      Fixes: e4965614 ("rtlwifi: Use dev_kfree_skb_irq instead of kfree_skb")
      Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Stable <stable@vger.kernel.org> # 4.9+
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
    • ath9k: do not return early to fix rcu unlocking · d1f1c0e2
      Tobias Klausmann authored
      Starting with commit d94a461d ("ath9k: use ieee80211_tx_status_noskb
      where possible"), the driver uses rcu_read_lock() && rcu_read_unlock(), yet
      on returning early in ath_tx_edma_tasklet() the unlock is missing, leading
      to stalls and suspicious RCU usage:
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       4.9.0-rc8 #11 Not tainted
       -------------------------------
       kernel/rcu/tree.c:705 Illegal idle entry in RCU read-side critical section.!
      
       other info that might help us debug this:
      
       RCU used illegally from idle CPU!
       rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       1 lock held by swapper/7/0:
       #0:  (rcu_read_lock){......}, at: [<ffffffffa06ed110>] ath_tx_edma_tasklet+0x0/0x450 [ath9k]
      
       stack backtrace:
       CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.0-rc8 #11
       Hardware name: Acer Aspire V3-571G/VA50_HC_CR, BIOS V2.21 12/16/2013
        ffff88025efc3f38 ffffffff8132b1e5 ffff88017ede4540 0000000000000001
        ffff88025efc3f68 ffffffff810a25f7 ffff88025efcee60 ffff88017edebdd8
        ffff88025eeb5400 0000000000000091 ffff88025efc3f88 ffffffff810c3cd4
       Call Trace:
        <IRQ>
        [<ffffffff8132b1e5>] dump_stack+0x68/0x93
        [<ffffffff810a25f7>] lockdep_rcu_suspicious+0xd7/0x110
        [<ffffffff810c3cd4>] rcu_eqs_enter_common.constprop.85+0x154/0x200
        [<ffffffff810c5a54>] rcu_irq_exit+0x44/0xa0
        [<ffffffff81058631>] irq_exit+0x61/0xd0
        [<ffffffff81018d25>] do_IRQ+0x65/0x110
        [<ffffffff81672189>] common_interrupt+0x89/0x89
        <EOI>
        [<ffffffff814ffe11>] ? cpuidle_enter_state+0x151/0x200
        [<ffffffff814ffee2>] cpuidle_enter+0x12/0x20
        [<ffffffff8109a6ae>] call_cpuidle+0x1e/0x40
        [<ffffffff8109a8f6>] cpu_startup_entry+0x146/0x220
        [<ffffffff810336f8>] start_secondary+0x148/0x170
      Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
      Fixes: d94a461d ("ath9k: use ieee80211_tx_status_noskb where possible")
      Cc: <stable@vger.kernel.org> # v4.9
      Acked-by: Felix Fietkau <nbd@nbd.name>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
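The bug class described in the ath9k commit above (an early return that skips the matching unlock) can be illustrated with a small userspace sketch. This is not the driver code: `fake_read_lock()`, `fake_read_unlock()`, `process_buggy()` and `process_fixed()` are hypothetical stand-ins, and the "lock" is only a nesting counter so that the imbalance is observable.

```c
/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(): they only
 * track nesting depth so that an unbalanced exit can be observed. */
int read_depth;

void fake_read_lock(void)   { read_depth++; }
void fake_read_unlock(void) { read_depth--; }

/* Buggy shape: the early return leaves the read-side section open. */
void process_buggy(const int *status, int n)
{
	fake_read_lock();
	for (int i = 0; i < n; i++) {
		if (status[i] < 0)
			return;		/* unlock below is never reached */
	}
	fake_read_unlock();
}

/* Fixed shape: leave the loop with break so the unlock always runs. */
void process_fixed(const int *status, int n)
{
	fake_read_lock();
	for (int i = 0; i < n; i++) {
		if (status[i] < 0)
			break;		/* falls through to the unlock */
	}
	fake_read_unlock();
}
```

With a negative entry in `status`, `process_buggy()` leaves `read_depth` non-zero while `process_fixed()` restores it to zero, which is the balance the lockdep splat above complains about.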
  4. 20 Dec, 2016 9 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ba6d973f
      Linus Torvalds authored
      Pull networking fixes and cleanups from David Miller:
      
       1) Use rb_entry() instead of hardcoded container_of(), from Geliang
          Tang.
      
       2) Use correct memory barriers in stmmac driver, from Pavel Machek.
      
       3) Fix assoc bind address handling in SCTP, from Xin Long.
      
       4) Make the length check for UFO handling consistent between
          __ip_append_data() and ip_finish_output(), from Zheng Li.
      
       5) HSI driver compatible strings were busted for hix5hd2, from Dongpo
          Li.
      
       6) Handle devm_ioremap() errors properly in cavium driver, from Arvind
          Yadav.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (22 commits)
        RDS: use rb_entry()
        net_sched: sch_netem: use rb_entry()
        net_sched: sch_fq: use rb_entry()
        net/mlx5: use rb_entry()
        ethernet: sfc: Add Kconfig entry for vendor Solarflare
        sctp: not copying duplicate addrs to the assoc's bind address list
        sctp: reduce indent level in sctp_copy_local_addr_list
        ARM: dts: hix5hd2: don't change the existing compatible string
        net: hix5hd2_gmac: fix compatible strings name
        openvswitch: Add a missing break statement.
        net: netcp: ethss: fix 10gbe host port tx pri map configuration
        net: netcp: ethss: fix errors in ethtool ops
        fsl/fman: enable compilation on ARM64
        fsl/fman: A007273 only applies to PPC SoCs
        powerpc: fsl/fman: remove fsl,fman from of_device_ids[]
        fsl/fman: fix 1G support for QSGMII interfaces
        dt: bindings: net: use boolean dt properties for eee broken modes
        net: phy: use boolean dt properties for eee broken modes
        net: phy: fix sign type error in genphy_config_eee_advert
        ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
        ...
    • Merge branch 'akpm' (patches from Andrew) · 3eb86259
      Linus Torvalds authored
      Merge final set of updates from Andrew Morton:
      
       - a series to make IMA play better across kexec
      
       - a handful of random fixes
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
        ratelimit: fix WARN_ON_RATELIMIT return value
        kcov: make kcov work properly with KASLR enabled
        arm64: setup: introduce kaslr_offset()
        mm: fadvise: avoid expensive remote LRU cache draining after FADV_DONTNEED
        ima: platform-independent hash value
        ima: define a canonical binary_runtime_measurements list format
        ima: support restoring multiple template formats
        ima: store the builtin/custom template definitions in a list
        ima: on soft reboot, save the measurement list
        powerpc: ima: send the kexec buffer to the next kernel
        ima: maintain memory size needed for serializing the measurement list
        ima: permit duplicate measurement list entries
        ima: on soft reboot, restore the measurement list
        powerpc: ima: get the kexec buffer passed by the previous kernel
    • Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration · f95adbc1
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
      
       - new features (poll and SRAM usage) added to the mailbox-test driver
      
       - major update of Broadcom's PDC controller driver
      
       - minor fix for auto-loading test and STI driver modules
      
      * 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: mailbox-test: allow reserved areas in SRAM
        mailbox: mailbox-test: add support for fasync/poll
        mailbox: bcm-pdc: Remove unnecessary void* casts
        mailbox: bcm-pdc: Simplify interrupt handler logic
        mailbox: bcm-pdc: Performance improvements
        mailbox: bcm-pdc: Don't use iowrite32 to write DMA descriptors
        mailbox: bcm-pdc: Convert from threaded IRQ to tasklet
        mailbox: bcm-pdc: Try to improve branch prediction
        mailbox: bcm-pdc: streamline rx code
        mailbox: bcm-pdc: Convert from interrupts to poll for tx done
        mailbox: bcm-pdc: PDC driver leaves debugfs files after removal
        mailbox: bcm-pdc: Changes so mbox client can be removed / re-inserted
        mailbox: bcm-pdc: Use octal permissions rather than symbolic
        mailbox: sti: Fix module autoload for OF registration
        mailbox: mailbox-test: Fix module autoload
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 74f65bbf
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang.
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: mux: mlxcpld: fix i2c mux selection caching
        i2c: designware: fix wrong Tx/Rx FIFO for ACPI
        i2c: xgene: Fix missing code of DTB support
        i2c: mux: pca954x: fix i2c mux selection caching
        i2c: octeon: thunderx: Limit register access retries
    • Merge tag 'doc-4.10-3' of git://git.lwn.net/linux · 1351522b
      Linus Torvalds authored
      Pull documentation fix from Jonathan Corbet:
       "A single fix for the build system.
      
        It would appear that the docutils developers, in their wisdom, broke
        the API in the 0.13 release. This fix detects the breakage and allows
        the docs to be built with both the old and new versions"
      
      * tag 'doc-4.10-3' of git://git.lwn.net/linux:
        docs: sphinx-extensions: make rstFlatTable work with docutils 0.13
    • Merge tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze · d5379e5e
      Linus Torvalds authored
      Pull arch/microblaze updates from Michal Simek:
      
       - wire-up new syscalls
      
       - add new codes and fpga families
      
       - fix a return value
      
      * tag 'microblaze-4.10-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
        microblaze: Add new fpga families
        microblaze: Add missing release version code v9.6 and v10
        microblaze: Add missing syscalls
        microblaze: Fix return value from xilinx_timer_init
    • Merge tag 'xtensa-20161219' of git://github.com/jcmvbkbc/linux-xtensa · ec92b88c
      Linus Torvalds authored
      Pull Xtensa updates from Max Filippov:
      
       - enable HAVE_DMA_CONTIGUOUS, configure shared DMA pool reservation in
         kc705 DTS
      
       - update xtensa DMA-related Documentation/features entries
      
       - clean up arch/xtensa/kernel/setup.c: move S32C1I self-test out of it,
         remove unused declarations, fix screen_info definition
      
      * tag 'xtensa-20161219' of git://github.com/jcmvbkbc/linux-xtensa:
        xtensa: update DMA-related Documentation/features entries
        xtensa: configure shared DMA pool reservation in kc705 DTS
        xtensa: enable HAVE_DMA_CONTIGUOUS
        xtensa: move S32C1I self-test to a separate file
        xtensa: fix screen_info, clean up unused declarations in setup.c
    • RDS: use rb_entry() · a763f78c
      Geliang Tang authored
      To make the code clearer, use rb_entry() instead of container_of() to
      deal with rbtree.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
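For readers unfamiliar with the idiom, rb_entry() in the kernel's rbtree header is a thin wrapper around container_of(), so the change is about readability rather than behavior. The sketch below shows the relationship in userspace C. The `struct rb_node` here is a simplified stand-in (the real one has different fields), and `struct flow` and `flow_id_from_node()` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct rb_node. */
struct rb_node {
	struct rb_node *rb_left;
	struct rb_node *rb_right;
};

/* Recover a pointer to the enclosing structure from a member pointer. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry() names the intent: "the object that owns this tree node". */
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

/* Hypothetical structure embedding an rb_node. */
struct flow {
	int id;
	struct rb_node node;
};

/* Map a tree node back to its owning flow and return the flow's id. */
int flow_id_from_node(struct rb_node *n)
{
	struct flow *f = rb_entry(n, struct flow, node);

	return f->id;
}
```

Both macros yield the same pointer; using rb_entry() simply makes it clear at the call site that the member is an rbtree node.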
    • net_sched: sch_netem: use rb_entry() · 7f7cd56c
      Geliang Tang authored
      To make the code clearer, use rb_entry() instead of container_of() to
      deal with rbtree.
      Signed-off-by: Geliang Tang <geliangtang@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>