1. 05 Oct, 2015 40 commits
    • ebpf: include perf_event only where really needed · 0cdf5640
      Daniel Borkmann authored
      Commit ea317b26 ("bpf: Add new bpf map type to store the pointer
      to struct perf_event") added perf_event.h to the main eBPF header, so
      it gets included for all users. perf_event.h is actually only needed
      from array map side, so let's sanitize this a bit.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: net: support BPF_ALU | BPF_MOD instructions in the BPF JIT. · 4560cdff
      Nicolas Schichan authored
      For ARMv7 with UDIV instruction support, generate an UDIV instruction
      followed by an MLS instruction.
      
      For other ARM variants, generate code calling a C wrapper similar to
      the jit_udiv() function used for BPF_ALU | BPF_DIV instructions.
      
      Some performance numbers reported by the test_bpf module (the duration
      per filter run is reported in nanoseconds, between "jited:<x>" and
      "PASS"):
      
      ARMv7 QEMU nojit:	test_bpf: #3 DIV_MOD_KX jited:0 2196 PASS
      ARMv7 QEMU jit:		test_bpf: #3 DIV_MOD_KX jited:1 104 PASS
      ARMv5 QEMU nojit:	test_bpf: #3 DIV_MOD_KX jited:0 2176 PASS
      ARMv5 QEMU jit:		test_bpf: #3 DIV_MOD_KX jited:1 1104 PASS
      ARMv5 kirkwood nojit:	test_bpf: #3 DIV_MOD_KX jited:0 1103 PASS
      ARMv5 kirkwood jit:	test_bpf: #3 DIV_MOD_KX jited:1 311 PASS
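
The ARMv7 path described above relies on the identity a % b = a - (a / b) * b: UDIV produces the quotient and MLS (multiply and subtract) produces the remainder. The following is a minimal user-space model of that sequence, not the JIT code itself. The helper names `udiv32`, `mls32` and `mod_via_udiv_mls` are hypothetical, and the divisor is assumed to be non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling UDIV: unsigned 32-bit division. */
static uint32_t udiv32(uint32_t a, uint32_t b)
{
    assert(b != 0); /* assumption: the caller has excluded a zero divisor */
    return a / b;
}

/* Hypothetical helper modelling MLS rd, rn, rm, ra: rd = ra - rn * rm. */
static uint32_t mls32(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return ra - rn * rm;
}

/* a % b computed as a - (a / b) * b, i.e. one UDIV followed by one MLS. */
static uint32_t mod_via_udiv_mls(uint32_t a, uint32_t b)
{
    uint32_t q = udiv32(a, b);

    return mls32(q, b, a);
}
```

For example, `mod_via_udiv_mls(10, 3)` divides to get 3 and then evaluates 10 - 3 * 3.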
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'asix-rx-mem-handling' · df7b6015
      David S. Miller authored
      Mark Craske says:
      
      ====================
      Improve ASIX RX memory allocation error handling
      
      The ASIX RX handler algorithm is weak on error handling.
      There is a design flaw in the ASIX RX handler algorithm because the
      implementation for handling RX Ethernet frames for the DUB-E100 C1 can
      have Ethernet frames spanning multiple URBs. This means that payload data
      from more than 1 URB is sometimes needed to fill the socket buffer with a
      complete Ethernet frame. When the URB with the start of an Ethernet frame
      is received then an attempt is made to allocate a socket buffer. If the
      memory allocation fails then the algorithm sets the buffer pointer member
      to NULL and the function exits (no crash yet). Subsequently, the RX handler
      is called again to process the next URB which assumes there is a socket
      buffer available and the kernel crashes when there is no buffer.
      
      This patchset implements an improvement to the RX handling algorithm to
      avoid a crash when no memory is available for the socket buffer.
      
      The patchset will apply cleanly to the net-next master branch but the
      created kernel has not been tested. The driver was tested on ARM kernels
      v3.8 and v3.14 for a commercial product.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • asix: Continue processing URB if no RX netdev buffer · 6a570814
      Dean Jenkins authored
      Avoid a loss of synchronisation of the Ethernet Data header 32-bit
      word due to a failure to get a netdev socket buffer.
      
      The ASIX RX handling algorithm returned 0 upon a failure to get
      an allocation of a netdev socket buffer. This causes the URB
      processing to stop which potentially causes a loss of synchronisation
      with the Ethernet Data header 32-bit word. Therefore, subsequent
      processing of URBs may be rejected due to a loss of synchronisation.
      This may cause additional good Ethernet frames to be discarded
      along with outputting of synchronisation error messages.
      
      Implement a solution which checks whether a netdev socket buffer
      has been allocated before trying to copy the Ethernet frame into
      the netdev socket buffer. But continue to process the URB so that
      synchronisation is maintained. Therefore, only a single Ethernet
      frame is discarded when no netdev socket buffer is available.
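
The idea above can be sketched in user-space C: when no buffer is available, the frame is skipped but the offset still advances, so the next header is found where it is expected. This is a simplified model under stated assumptions, not the driver code. The record format (a 16-bit little-endian length followed by the payload), the `fail_mask` parameter that simulates allocation failures, and the names `parse_urb` and `rx_stats` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rx_stats {
    unsigned delivered; /* frames copied into a buffer */
    unsigned dropped;   /* frames skipped because no buffer was available */
};

/*
 * Walk one URB made of [16-bit little-endian length][payload] records.
 * Bit i of fail_mask simulates an allocation failure for frame i
 * (only the first 32 frames can be made to fail).
 */
static void parse_urb(const uint8_t *buf, size_t len, uint32_t fail_mask,
                      struct rx_stats *st)
{
    uint8_t frame[64]; /* stands in for the netdev socket buffer */
    size_t off = 0;
    unsigned idx = 0;

    assert(buf != NULL && st != NULL);
    while (len - off >= 2) {
        size_t flen = buf[off] | ((size_t)buf[off + 1] << 8);
        int have_buf;

        off += 2;
        if (flen > len - off)
            break; /* frame continues in the next URB (not modelled here) */

        /* No buffer if the simulated allocation fails or the frame is too big. */
        have_buf = flen <= sizeof(frame) &&
                   !(idx < 32 && ((fail_mask >> idx) & 1));
        if (have_buf) {
            memcpy(frame, buf + off, flen);
            st->delivered++;
        } else {
            st->dropped++; /* lose this one frame only */
        }
        off += flen; /* always advance so the next header stays in sync */
        idx++;
    }
}
```

With three frames in one URB and a simulated failure on the second, the first and third frames are still delivered.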
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • asix: On RX avoid creating bad Ethernet frames · 3f30b158
      Dean Jenkins authored
      When RX Ethernet frames span multiple URB socket buffers,
      the data stream may suffer a discontinuity which will cause
      the current Ethernet frame in the netdev socket buffer
      to be incomplete. This frame needs to be discarded instead
      of appending unrelated data from the current URB socket buffer
      to the Ethernet frame in the netdev socket buffer. This avoids
      creating a corrupted Ethernet frame in the netdev socket buffer.
      
      A discontinuity can occur when the previous URB socket buffer
      held an incomplete Ethernet frame due to truncation or a
      URB socket buffer containing the end of the Ethernet frame
      was missing.
      
      Therefore, add a sanity test for when an Ethernet frame
      spans multiple URB socket buffers to check that the remaining
      bytes of the currently received Ethernet frame point to
      a good Data header 32-bit word of the next Ethernet
      frame. Upon error, reset the remaining bytes variable to
      zero and discard the current netdev socket buffer.
      Assume that the Data header is located at the start of
      the current socket buffer and attempt to process the next
      Ethernet frame from there. This avoids unnecessarily
      discarding a good URB socket buffer that contains a new
      Ethernet frame.
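
A rough user-space sketch of the sanity test described above follows. It assumes a header layout that the commit message does not spell out: an 11-bit length in the low half of the 32-bit word and its bitwise complement in the high half, stored little-endian. That layout, the omission of any padding between frames, and the names `header_ok`, `read_le32` and `resync_on_discontinuity` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: bits 0-10 hold the length, bits 16-26 its complement. */
static bool header_ok(uint32_t header)
{
    uint32_t size = header & 0x7ff;

    return size == ((~header >> 16) & 0x7ff);
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Called at the start of a new URB buffer. *remaining is the number of
 * bytes of the previous frame still expected. Returns true if the
 * partially built frame must be discarded; in that case *remaining is
 * reset to 0 so that parsing restarts at the beginning of this buffer.
 */
static bool resync_on_discontinuity(const uint8_t *buf, size_t len,
                                    size_t *remaining)
{
    assert(buf != NULL && remaining != NULL);
    if (*remaining == 0 || *remaining + 4 > len)
        return false; /* nothing pending, or next header not in this buffer */
    if (header_ok(read_le32(buf + *remaining)))
        return false; /* the stream is continuous */
    *remaining = 0;   /* discontinuity: drop the partial frame */
    return true;
}
```

When the check fails, only the incomplete frame is lost; the current buffer is then parsed from offset zero.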
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • asix: Simplify asix_rx_fixup_internal() netdev alloc · 9a5ccd8e
      Dean Jenkins authored
      The code is checking that the Ethernet frame will fit into a
      netdev allocated socket buffer within the constraints of MTU size,
      Ethernet header length plus VLAN header length.
      
      The original code was checking rx->remaining each loop of the while
      loop that processes multiple Ethernet frames per URB and/or Ethernet
      frames that span across URBs. rx->remaining decreases per while loop
      so there is no point in potentially checking multiple times that the
      Ethernet frame (remaining part) will fit into the netdev socket buffer.
      
      The modification checks that the size of the Ethernet frame will fit
      the netdev socket buffer before allocating the netdev socket buffer.
      This avoids grabbing memory and then deciding that the Ethernet frame
      is too big and then freeing the memory.
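
The ordering change can be illustrated with a small user-space sketch: the size test runs first, so an oversized frame never causes an allocation. This is not the driver code. `rx_alloc_checked` and `alloc_calls` are hypothetical names, and `malloc` stands in for the netdev socket buffer allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ETH_HLEN  14 /* Ethernet header length in bytes */
#define VLAN_HLEN 4  /* 802.1Q VLAN tag length in bytes */

static unsigned alloc_calls; /* counts real allocations, for illustration */

/* Return a buffer for the frame, or NULL without allocating if it cannot fit. */
static void *rx_alloc_checked(size_t frame_size, size_t mtu)
{
    assert(mtu <= SIZE_MAX - ETH_HLEN - VLAN_HLEN); /* avoid overflow below */
    if (frame_size > mtu + ETH_HLEN + VLAN_HLEN)
        return NULL; /* rejected before any memory is taken */
    alloc_calls++;
    return malloc(frame_size ? frame_size : 1);
}
```

With an MTU of 1500 the largest accepted frame in this sketch is 1518 bytes; a 1519-byte frame is rejected and `alloc_calls` does not change.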
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • asix: Tidy-up 32-bit header word synchronisation · 3bfc69ab
      Dean Jenkins authored
      Tidy-up the Data header 32-bit word synchronisation logic in
      asix_rx_fixup_internal() by removing redundant logic tests.
      
      The code is looking at the following cases of the Data header
      32-bit word that is present before each Ethernet frame:
      
      a) all 32 bits of the Data header word are in the URB socket buffer
      b) first 16 bits of the Data header word are at the end of the URB
         socket buffer
      c) last 16 bits of the Data header word are at the start of the URB
         socket buffer, e.g. split_head = true
      
      Note that the lifetime of rx->split_head exists outside of the
      function call and is accessed per processing of each URB. Therefore,
      split_head being true acts on the next URB to be processed.
      
      To check for b), test whether the offset is 16 bits (2 bytes) from the
      end of the buffer; if so, set split_head to true.
      To check for c), split_head must be true because the first 16 bits
      have already been found.
      Case a) is the else branch of the check for c).
      
      Note that the || logic of the old code included the state
      (skb->len - offset == sizeof(u16) && rx->split_head) which is not
      possible because the split_head cannot be true whilst checking for b).
      This is because the split_head indicates that the first 16 bits have
      been found and that is not possible whilst checking for the first 16
      bits. Therefore simplify the logic.
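
The three cases can be modelled in user-space C as below. This is a sketch under assumptions, not the code of asix_rx_fixup_internal(): the header is assumed to be little-endian, the stream is assumed to be 16-bit aligned (so 2 or at least 4 bytes are always left), and the names `rx_state`, `read_le16` and `read_header` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rx_state {
    bool split_head; /* persists between URBs */
    uint32_t header; /* Data header word being assembled */
};

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Consume header bytes at buf[offset]. Returns the number of bytes
 * consumed and sets *complete when all 32 bits are available.
 */
static size_t read_header(struct rx_state *rx, const uint8_t *buf, size_t len,
                          size_t offset, bool *complete)
{
    assert(offset <= len && len - offset >= 2);
    if (rx->split_head) {
        /* c) last 16 bits are at the start of this buffer */
        rx->header |= (uint32_t)read_le16(buf + offset) << 16;
        rx->split_head = false;
        *complete = true;
        return 2;
    }
    if (len - offset == 2) {
        /* b) only the first 16 bits fit at the end of this buffer */
        rx->header = read_le16(buf + offset);
        rx->split_head = true;
        *complete = false;
        return 2;
    }
    /* a) all 32 bits are in this buffer */
    assert(len - offset >= 4);
    rx->header = read_le16(buf + offset) |
                 ((uint32_t)read_le16(buf + offset + 2) << 16);
    *complete = true;
    return 4;
}
```

Because `split_head` lives in `rx_state`, case b) on one buffer is always followed by case c) on the next call, which is why the two conditions cannot both hold at once.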
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • asix: Rename remaining and size for clarity · 7b0378f5
      Dean Jenkins authored
      The Data header synchronisation is easier to understand
      if the variables "remaining" and "size" are renamed.
      
      Therefore, the lifetime of the "remaining" variable exists
      outside of asix_rx_fixup_internal() and is used to indicate
      any remaining pending bytes of the Ethernet frame that need
      to be obtained from the next socket buffer. This allows an
      Ethernet frame to span across multiple socket buffers.
      
      "size" is now local to asix_rx_fixup_internal() and contains
      the size read from the Data header 32-bit word.
      
      Add "copy_length" to hold the number of the Ethernet frame
      bytes (maybe a part of a full frame) that are to be copied
      out of the socket buffer.
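
How "remaining" and "copy_length" relate can be shown with a short user-space sketch. It only models the bookkeeping, assuming no padding between frames. The names `rx_state` and `take_from_buffer` are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

struct rx_state {
    size_t remaining; /* bytes of the current frame still to arrive;
                         persists across socket buffers */
};

/*
 * Work out how many frame bytes can be copied out of the current socket
 * buffer, starting at offset, and update the pending count.
 */
static size_t take_from_buffer(struct rx_state *rx, size_t buf_len,
                               size_t offset)
{
    size_t avail, copy_length;

    assert(offset <= buf_len);
    avail = buf_len - offset;
    /* Copy what is left of the frame, capped by what this buffer holds. */
    copy_length = rx->remaining < avail ? rx->remaining : avail;
    rx->remaining -= copy_length;
    return copy_length;
}
```

A 100-byte frame that starts 4 bytes into a 64-byte buffer yields 60 bytes from the first buffer and the last 40 from the next one.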
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, seccomp: prepare for upcoming criu support · bab18991
      Daniel Borkmann authored
      The current ongoing effort to dump existing cBPF seccomp filters back
      to user space requires holding the pre-transformed instructions like
      we do in case of socket filters from sk_attach_filter() side, so they
      can be reloaded in original form at a later point in time by utilities
      such as criu.
      
      To prepare for this, simply extend the bpf_prog_create_from_user()
      API to hold a flag that tells whether we should store the original
      or not. Also, fanout filters could make use of that in future for
      things like diag. While fanout filters already use bpf_prog_destroy(),
      move seccomp over to them as well to handle original programs when
      present.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tycho Andersen <tycho.andersen@canonical.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vrf: fix a kernel warning · 0a15afd2
      WANG Cong authored
      This fixes:
      
       tried to remove device ip6gre0 from (null)
       ------------[ cut here ]------------
       kernel BUG at net/core/dev.c:5219!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
       CPU: 3 PID: 161 Comm: kworker/u8:2 Not tainted 4.3.0-rc2+ #1142
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       Workqueue: netns cleanup_net
       task: ffff8800d784a9c0 ti: ffff8800d74a4000 task.ti: ffff8800d74a4000
       RIP: 0010:[<ffffffff817f0797>]  [<ffffffff817f0797>] __netdev_adjacent_dev_remove+0x40/0xec
       RSP: 0018:ffff8800d74a7a98  EFLAGS: 00010282
       RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000
       RDX: ffff88011adcf701 RSI: ffff88011adccbf8 RDI: ffff88011adccbf8
       RBP: ffff8800d74a7ab8 R08: 0000000000000001 R09: 0000000000000000
       R10: ffffffff81d190ff R11: 00000000ffffffff R12: ffff8800d599e7c0
       R13: 0000000000000000 R14: ffff8800d599e890 R15: ffffffff82385e00
       FS:  0000000000000000(0000) GS:ffff88011ac00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00007ffd6f003000 CR3: 000000000220c000 CR4: 00000000000006e0
       Stack:
        0000000000000000 ffff8800d599e7c0 0000000000000b00 ffff8800d599e8a0
        ffff8800d74a7ad8 ffffffff817f0861 0000000000000000 ffff8800d599e7c0
        ffff8800d74a7af8 ffffffff817f088f 0000000000000000 ffff8800d599e7c0
       Call Trace:
        [<ffffffff817f0861>] __netdev_adjacent_dev_unlink+0x1e/0x35
        [<ffffffff817f088f>] __netdev_adjacent_dev_unlink_neighbour+0x17/0x41
        [<ffffffff817f56e6>] netdev_upper_dev_unlink+0x6c/0x13d
        [<ffffffff81674a3d>] vrf_del_slave+0x26/0x7d
        [<ffffffff81674ac3>] vrf_device_event+0x2f/0x34
        [<ffffffff81098c40>] notifier_call_chain+0x75/0x9c
        [<ffffffff81098fa2>] raw_notifier_call_chain+0x14/0x16
        [<ffffffff817ee129>] call_netdevice_notifiers_info+0x52/0x59
        [<ffffffff817f179d>] call_netdevice_notifiers+0x13/0x15
        [<ffffffff817f6f18>] rollback_registered_many+0x14f/0x24f
        [<ffffffff817f70f2>] unregister_netdevice_many+0x19/0x64
        [<ffffffff819a2455>] ip6gre_exit_net+0x163/0x177
        [<ffffffff817eb019>] ops_exit_list+0x44/0x55
        [<ffffffff817ebcb7>] cleanup_net+0x193/0x226
        [<ffffffff81091e1c>] process_one_work+0x26c/0x4d8
        [<ffffffff81091d20>] ? process_one_work+0x170/0x4d8
        [<ffffffff81092296>] worker_thread+0x1df/0x2c2
        [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f
        [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f
        [<ffffffff81097a20>] kthread+0xd4/0xdc
        [<ffffffff810bc523>] ? trace_hardirqs_on_caller+0x17d/0x199
        [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83
        [<ffffffff81a5240f>] ret_from_fork+0x3f/0x70
        [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83
      
      Fixes: 93a7e7e8 ("net: Remove the now unused vrf_ptr")
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kbuild test robot's avatar
      9886ce2b
    • net: Microchip encx24j600 driver · 04fbfce7
      Jon Ringle authored
      This ethernet driver supports the Microchip enc424j600/624j600 Ethernet
      controller over a SPI bus interface. This driver makes use of the regmap API to
      optimize access to registers by caching registers where possible.
      
      Datasheet:
      http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdf
      Signed-off-by: Jon Ringle <jringle@gridpoint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • regmap: Allow installing custom reg_update_bits function · 7741c373
      Jon Ringle authored
      This commit allows installing a custom reg_update_bits function for cases where
      the hardware provides a mechanism to set or clear register bits without a
      read/modify/write cycle. Such is the case with the Microchip ENCX24J600.
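
The difference between the two approaches can be sketched with a fake device in user-space C. This does not use the regmap API. `fake_dev`, `update_bits_rmw` and `update_bits_setclr` are hypothetical names, and the "bit set" and "bit clear" operations are modelled as plain writes that need no prior read.

```c
#include <assert.h>
#include <stdint.h>

struct fake_dev {
    uint16_t reg[4];
    unsigned reads;  /* bus reads issued */
    unsigned writes; /* bus writes issued */
};

/* Generic path: read the register, modify it, write it back. */
static void update_bits_rmw(struct fake_dev *d, unsigned r,
                            uint16_t mask, uint16_t val)
{
    uint16_t old;

    assert(r < 4);
    old = d->reg[r];
    d->reads++;
    d->reg[r] = (uint16_t)((old & ~mask) | (val & mask));
    d->writes++;
}

/* Custom path: hardware bit-set / bit-clear commands, no read needed. */
static void update_bits_setclr(struct fake_dev *d, unsigned r,
                               uint16_t mask, uint16_t val)
{
    uint16_t set = (uint16_t)(val & mask);
    uint16_t clr = (uint16_t)(~val & mask);

    assert(r < 4);
    if (set) {
        d->reg[r] |= set; /* models a "set bits" command */
        d->writes++;
    }
    if (clr) {
        d->reg[r] &= (uint16_t)~clr; /* models a "clear bits" command */
        d->writes++;
    }
}
```

Both functions leave the register in the same state; the second one never reads it, which is the saving a custom reg_update_bits hook is meant to allow.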
      Signed-off-by: Jon Ringle <jringle@gridpoint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: do hang reset only in case of tx timeout · 937317c7
      Govindarajulu Varadarajan authored
      The current code invokes hang reset in case of error interrupt. We should
      hang reset only in case of tx timeout. This is because of the way hang reset
      is implemented in firmware. Hang reset takes more firmware resources than
      soft reset. Adaptor does not generate error interrupt in case of tx
      timeout.
      
      Hang reset only in case of tx timeout, in .ndo_tx_timeout. Do soft reset
      otherwise. Introduce deferred work, enic_tx_hang_reset, to do hang reset.
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: handle spurious error interrupt · cc809237
      Govindarajulu Varadarajan authored
      Some of the enic adaptors are known to generate spurious interrupts. When
      an error interrupt is generated, the driver just resets the device. This patch
      resets the device only when an error has actually occurred.
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cxgb4-next' · 2905f5bb
      David S. Miller authored
      Hariprasad Shenai says:
      
      ====================
      cxgb4: Trivial fixes for cxgb4
      
      Fixes the following issues:
      Don't read non-existent T4/T5/T6 adapter registers for ethtool dump.
      For T4, don't read mailbox control registers. Adds a new devlog facility and
      reports the correct link speed for unsupported ones.
      
      This patch series has been created against net-next tree and includes
      patches on cxgb4 driver.
      
      We have included all the maintainers of respective drivers. Kindly review
      the change and let us know in case of any review comments.
      ====================
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Report correct link speed for unsupported ones · 85412255
      Hariprasad Shenai authored
      When we get garbage from the firmware with weird Port Speeds,
      etc. we should emit a warning regarding unsupported speeds rather than
      use the bogus default of "10Mbps" which isn't even an option in the
      firmware Port Information message
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Adds a new Device Log Facility FW_DEVLOG_FACILITY_CF · da4976e1
      Hariprasad Shenai authored
      The firmware team added a new Device Log Facility FW_DEVLOG_FACILITY_CF,
      but the driver has been decoding Device Log messages with that Facility as
      "(NULL)". Fix this.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: For T4, don't read the Firmware Mailbox Control register · b3695540
      Hariprasad Shenai authored
      T4 doesn't have the Shadow copy of the register which we can read without
      side effect. So don't read mbox control register for T4 adapter
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Update T4/T5/T6 register ranges · 8119c018
      Hariprasad Shenai authored
      Update T4/T5/T6 adapter register ranges so that it doesn't read non
      existent registers when dumped using ethtool
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next · 40e10680
      David S. Miller authored
      Eric W. Biederman says:
      
      ====================
      net: Pass net through ip fragmentation
      
      This is the next installment of my work to pass struct net through the
      output path so the code does not need to guess how to figure out which
      network namespace it is in, and ultimately routes can have output
      devices in another network namespace.
      
      This round focuses on passing net through ip fragmentation which we seem
      to call from about everywhere.  That is the main ip output paths, the
      bridge netfilter code, and openvswitch.  This has to happen at once
      across the tree as function pointers are involved.
      
      First some prep work is done, then ipv4 and ipv6 are converted and then
      temporary helper functions are removed.
      ====================
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rds-perf' · 7e2832f1
      David S. Miller authored
      Sowmini Varadhan says:
      
      ====================
      RDS: RDS-TCP perf enhancements
      
      A 3-part patchset that (a) improves current RDS-TCP perf
      by 2X-3X and (b) refactors earlier robustness code for
      better observability/scaling.
      
      Patch 1 is an enhancement of earlier robustness fixes
      that had used separate sockets for client and server endpoints to
      resolve race conditions. It is possible to have an equivalent
      solution that does not use 2 sockets. The benefit of a
      single socket solution is that it results in more predictable
      and observable behavior for the underlying TCP pipe of an
      RDS connection
      
      Patches 2 and 3 are simple, straightforward perf bug fixes
      that align the RDS TCP socket with other parts of the kernel stack.
      
      v2: fix kbuild-test-robot warnings, comments from  Sergei Shtylov
          and Santosh Shilimkar.
      ====================
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit · 76b29ef1
      Sowmini Varadhan authored
      For the same reasons as commit 2f533844 ("tcp: allow splice() to
      build full TSO packets") and commit 35f9c09f ("tcp: tcp_sendpages()
      should call tcp_push() once"), rds_tcp_xmit may have multiple pages to
      send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to
      tcp_sendpage()
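
The hinting described above amounts to marking every page except the last one. A minimal user-space sketch follows. The `EX_` constants are local stand-ins for the kernel flags rather than their authoritative definitions, and `page_flags` is a hypothetical helper, not a function in rds_tcp_xmit.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel's MSG_MORE and MSG_SENDPAGE_NOTLAST. */
#define EX_MSG_MORE             0x8000
#define EX_MSG_SENDPAGE_NOTLAST 0x20000

/*
 * Flags for page number i out of n pages in one message: every page
 * except the last tells TCP that more data follows, so it can keep
 * building a full-sized packet instead of pushing after each page.
 */
static int page_flags(size_t i, size_t n)
{
    assert(n > 0 && i < n);
    if (i + 1 < n)
        return EX_MSG_MORE | EX_MSG_SENDPAGE_NOTLAST;
    return 0; /* last page: let TCP push the data out */
}
```

For a three-page message the first two pages carry both hints and the third carries none.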
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune · 1edd6a14
      Sowmini Varadhan authored
      Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K)
      clobbers efficient use of TSO because it inflates the size_goal
      that is computed in tcp_sendmsg/tcp_sendpage and skews packet
      latency, and the default values for these parameters actually
      result in significantly better performance.
      
      With this patch, request-response tests using rds-stress with a packet
      size of 100K and 16 threads (test parameters -q 100000 -a 256 -t16 -d16)
      between a single pair of IP addresses achieve a throughput of
      6-8 Gbps. Without this patch, throughput maxes at 2-3 Gbps under
      equivalent conditions on these platforms.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: Use a single TCP socket for both send and receive. · 3b20fc38
      Sowmini Varadhan authored
      Commit f711a6ae ("net/rds: RDS-TCP: Always create a new rds_sock
      for an incoming connection.") modified rds-tcp so that an incoming SYN
      would ignore an existing "client" TCP connection which had the local
      port set to the transient port.  The motivation for ignoring the existing
      "client" connection in f711a6ae was to avoid race conditions and an
      endless duel of reconnect attempts triggered by a restart/abort of one
      of the nodes in the TCP connection.
      
      However, having separate sockets for active and passive sides
      is avoidable, and the simpler model of a single TCP socket for
      both send and receives of all RDS connections associated with
      that tcp socket makes for easier observability. We avoid the race
      conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
      if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
      The c_outgoing bit is initialized in __rds_conn_create().
      
      A side-effect of re-using the client rds_connection for an incoming
      SYN is the potential of encountering duelling SYNs, i.e., we
      have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
      SYN. The logic to arbitrate this criss-crossing SYN exchange in
      rds_tcp_accept_one() has been modified to emulate the BGP state
      machine: the smaller IP address should back off from the connection attempt.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'xgbe-next' · 393159e9
      David S. Miller authored
      Tom Lendacky says:
      
      ====================
      amd-xgbe: AMD XGBE driver updates 2015-09-30
      
      The following patches are included in this driver update series:
      
      - Remove unneeded semi-colon
      - Follow the DT/ACPI precedence used by the device_ APIs
      - Add ethtool support for getting and setting the msglevel
      - Add ethtool error and debug messages
      - Simplify the hardware FIFO assignment calculations
      - Add receive buffer unavailable statistic
      - Use the device workqueue instead of the system workqueue
      - Remove the use of a link state bit
      
      This patch series is based on net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Remove the XGBE_LINK state bit · 50789845
      Lendacky, Thomas authored
      The XGBE_LINK bit is used just to determine whether to call the
      netif_carrier_on/off functions. Rather than define and use this bit,
      just call the functions. The netif_carrier_ok function can be used in
      place of checking the XGBE_LINK bit in the future.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Use device workqueue instead of system workqueue · afb43e8a
      Lendacky, Thomas authored
      The driver creates, flushes and destroys a device workqueue but queues
      work to the system workqueue. Switch from using the system workqueue to
      the device workqueue.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Add receive buffer unavailable statistic · 72c9ac4e
      Lendacky, Thomas authored
      Add a statistic that tracks how many times an interrupt is generated for
      a receive buffer not being available to the hardware which prevents the
      hardware from being able to DMA the received data.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Simplify calculation and setting of queue fifos · 9c439e4b
      Lendacky, Thomas authored
      The Tx and Rx fifo sizes can be calculated rather
      than hardcoded in a switch statement. Additionally, the per-queue fifo
      sizes can be calculated rather than hardcoded using if/else if statements
      that can possibly underutilize the available fifo area.
      
      Change the code to calculate the fifo sizes and the per-queue fifo sizes
      to simplify the code and make best use of the available fifo.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Add ethtool error and debug messages · e5dd8b81
      Lendacky, Thomas authored
      Add error and dynamic debug messages to various ethtool functions in
      the driver while also removing the DBGPR debug print calls. Also, change
      the message level for some error messages from alert to err.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Add ethtool support for setting the msglevel · 349fb2d7
      Lendacky, Thomas authored
      Provide the ethtool functions to support getting and setting the
      msglevel for the driver.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Use proper DT / ACPI precedence checking · 47f2e6c2
      Lendacky, Thomas authored
      Device tree presence takes precedence over ACPI in the device_* APIs.
      The amd-xgbe driver should follow the same precedence. Update the check
      on whether to use DT / ACPI to follow this.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Remove an unneeded semicolon on a switch statement · 3947d78a
      Lendacky, Thomas authored
      Remove an unneeded semicolon at the end of a switch statement block.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: restore fastopen operations · ac8cfc7b
      Eric Dumazet authored
      I accidentally cleared fastopenq.max_qlen in reqsk_queue_alloc()
      while max_qlen can be set before listen() is called,
      using TCP_FASTOPEN socket option for example.
      
      Fixes: 0536fcc0 ("tcp: prepare fastopen code for upcoming listener changes")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ac8cfc7b
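The fastopen regression described above can be illustrated with a minimal userspace sketch. The struct and function names below are hypothetical stand-ins, not the kernel's definitions; the only point is that an allocation helper which resets a field the caller may have configured earlier silently discards that setting.

```c
/* Hypothetical stand-in for the fastopen queue; not the kernel structure. */
struct fastopen_queue_sketch {
    int qlen;      /* current queue length, owned by the allocator */
    int max_qlen;  /* limit that may be configured before listen() */
};

/* Buggy variant: resets the limit, discarding an earlier setsockopt-style value. */
static void queue_alloc_buggy(struct fastopen_queue_sketch *q)
{
    q->qlen = 0;
    q->max_qlen = 0;
}

/* Fixed variant: initializes only the state the allocator owns. */
static void queue_alloc_fixed(struct fastopen_queue_sketch *q)
{
    q->qlen = 0;
    /* max_qlen is deliberately left untouched */
}
```

With `max_qlen` set to 16 beforehand, the buggy variant leaves it at 0 while the fixed variant preserves 16.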
    • David S. Miller's avatar
      Merge branch 'net-y2038' · 77946de5
      David S. Miller authored
      Arnd Bergmann says:
      
      ====================
      net: assorted y2038 changes
      
      This is a set of changes for network drivers and core code to
      get rid of the use of time_t and derived data structures.
      
      I have a longer set of patches that enables me to build kernels
      with the time_t definition removed completely as a help to find
      y2038 overflow issues. This is the subset for networking that
      contains all code that has a reasonable way of fixing at the
      moment and that is either commonly used (in one of the defconfigs)
      or that blocks building a whole subsystem.
      
      Most of the patches in this series should be noncontroversial,
      but the last two that I marked [RFC] are a bit tricky and
      need input from people that are more familiar with the code than
      I am. All 12 patches are independent of one another and can
      be applied in any order, so feel free to pick all that look
      good.
      
      Patches that are not included here are:
      
       - disabling less common device drivers that I don't have a fix
         for yet, this includes
      	drivers/net/ethernet/brocade/bna/bfa_ioc.c
      	drivers/net/ethernet/qlogic/netxen/netxen_nic_hw.c
      	drivers/net/ethernet/tile/tilegx.c
      	drivers/net/hamradio/baycom_ser_fdx.c
      	drivers/net/wireless/ath/ath10k/core.h
      	drivers/net/wireless/ath/ath9k/
      	drivers/net/wireless/atmel.c
      	drivers/net/wireless/prism54/isl_38xx.c
      	drivers/net/wireless/rt2x00/rt2x00debug.c
      	drivers/net/wireless/rtlwifi/
      	drivers/net/wireless/ti/wlcore/
      	drivers/staging/ozwpan/
      	net/atm/mpoa_caches.c
      	net/atm/mpoa_proc.c
      	net/dccp/probe.c
      	net/ipv4/tcp_probe.c
      	net/netfilter/nfnetlink_queue_core.c
      	net/netfilter/xt_time.c
      	net/openvswitch/flow.c
      	net/sctp/probe.c
      	net/sunrpc/auth_gss/
      	net/sunrpc/svcauth_unix.c
      	net/vmw_vsock/af_vsock.c
         We'll get there eventually, or we can add a dependency to ensure
         they are not built on 32-bit kernels that need to survive
         beyond 2038. Most of these should be really easy to fix.
      
       - recvmmsg/sendmmsg system calls: patches have been sent out
         as part of the syscall series, need a little more work and
         review
      
       - SIOCGSTAMP/SIOCGSTAMPNS ioctl calls: tricky, need to discuss
         with some folks at kernel summit
      
       - SO_RCVTIMEO/SO_SNDTIMEO/SO_TIMESTAMP/SO_TIMESTAMPNS socket
         opt: similar and related to the ioctl
      
       - mmapped packet socket: need to create v4 of the API, nontrivial
      
       - pktgen: sends 32-bit timestamps over network, need to find out
         if using unsigned stamps is good enough
      
       - af_rxrpc: similar to pktgen, uses 32-bit times for deadlines
      
       - ppp ioctl: patch is being worked on, nontrivial but doable
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77946de5
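The pktgen item in the merge message above asks whether unsigned 32-bit stamps are good enough. A small sketch of the arithmetic behind that question follows. The helper names are hypothetical and this is not pktgen code; it only shows how the same 32-bit wire value decodes under a signed and an unsigned interpretation.

```c
#include <stdint.h>

/* Signed interpretation: values at or above 2^31 wrap to negative
 * seconds, that is, to dates before 1970. */
static int64_t decode_signed_sec(uint32_t wire)
{
    if (wire >= 0x80000000u)
        return (int64_t)wire - 4294967296LL;
    return (int64_t)wire;
}

/* Unsigned interpretation: the full 32-bit range counts forward from
 * 1970, which moves the overflow from 2038 to 2106. */
static int64_t decode_unsigned_sec(uint32_t wire)
{
    return (int64_t)wire;
}
```

The first second after the 2038 rollover, `0x80000000`, decodes to 2147483648 when read as unsigned and to -2147483648 when read as signed. The unsigned reading gives up the ability to represent times before 1970.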
    • Arnd Bergmann's avatar
      net: sctp: avoid incorrect time_t use · 3ef0a25b
      Arnd Bergmann authored
      We want to avoid using time_t in the kernel because of the y2038
      overflow problem. The use in sctp is not for storing seconds at
      all, but instead uses microseconds and is passed as 32-bit
      on all machines.
      
      This patch changes the type to u32, which better fits the use.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: linux-sctp@vger.kernel.org
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ef0a25b
    • Arnd Bergmann's avatar
      ipv6: use ktime_t for internal timestamps · 3dd7669f
      Arnd Bergmann authored
      The ipv6 mip6 implementation is one of only a few users of the
      skb_get_timestamp() function in the kernel, which is both unsafe
      on 32-bit architectures because of the 2038 overflow, and slightly
      less efficient than the skb_get_ktime() based approach.
      
      This converts the function call and the mip6_report_rate_limiter
      structure that stores the time stamp, eliminating all uses of
      timeval in the ipv6 code.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3dd7669f
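As a rough illustration of the ipv6 change above, the sketch below stores a single 64-bit nanosecond stamp in a rate limiter instead of a seconds/microseconds pair. The type and function names are hypothetical userspace stand-ins (`ns_stamp` plays the role of `ktime_t`), and the allow/deny rule is a simplified assumption rather than the kernel's exact logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ktime_t: signed 64-bit nanoseconds. */
typedef int64_t ns_stamp;

/* Simplified limiter state; the real structure may track more fields. */
struct report_limiter_sketch {
    ns_stamp stamp;  /* time stamp of the last reported packet */
    int iif;         /* incoming interface of the last report */
};

/* Build a stamp from seconds and nanoseconds without a 32-bit intermediate. */
static ns_stamp stamp_from_sec(int64_t sec, int32_t nsec)
{
    return sec * 1000000000LL + nsec;
}

/* Allow a report only when it differs from the previous one (assumed rule). */
static bool report_allowed(struct report_limiter_sketch *rl,
                           ns_stamp stamp, int iif)
{
    if (rl->stamp == stamp && rl->iif == iif)
        return false;
    rl->stamp = stamp;
    rl->iif = iif;
    return true;
}
```

Because the stamp is one 64-bit integer, comparing two stamps is a single equality test, and second counts past 2^31 are stored without truncation.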
    • Arnd Bergmann's avatar
      nfnetlink: use y2038 safe timestamp · f6389ecb
      Arnd Bergmann authored
      The __build_packet_message function fills a nfulnl_msg_packet_timestamp
      structure that uses 64-bit seconds and is therefore y2038 safe, but
      it uses an intermediate 'struct timespec' which is not.
      
      This trivially changes the code to use 'struct timespec64' instead,
      to correct the result on 32-bit architectures.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: netfilter-devel@vger.kernel.org
      Cc: coreteam@netfilter.org
      Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f6389ecb
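The nfnetlink problem above, a 64-bit output field fed through a narrower intermediate, can be reproduced in a short userspace sketch. All names here are hypothetical: `ts32_sketch` models a `struct timespec` on a 32-bit architecture, `ts64_sketch` models `struct timespec64`, and `wire_stamp_sketch` is a simplified stand-in for the message structure (byte order is ignored).

```c
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
struct ts32_sketch { int32_t tv_sec; long tv_nsec; };  /* 32-bit seconds */
struct ts64_sketch { int64_t tv_sec; long tv_nsec; };  /* 64-bit seconds */
struct wire_stamp_sketch { uint64_t sec; uint64_t usec; };

/* Old shape: the 32-bit intermediate narrows the seconds before they
 * reach the 64-bit output field, so the wide field cannot help. */
static struct wire_stamp_sketch fill_via_ts32(int64_t now_sec, long now_nsec)
{
    struct ts32_sketch ts;
    struct wire_stamp_sketch out;

    ts.tv_sec = (int32_t)now_sec;  /* loses information past 2038 */
    ts.tv_nsec = now_nsec;
    out.sec = (uint64_t)(int64_t)ts.tv_sec;
    out.usec = (uint64_t)(ts.tv_nsec / 1000);
    return out;
}

/* New shape: the intermediate is as wide as the output field. */
static struct wire_stamp_sketch fill_via_ts64(int64_t now_sec, long now_nsec)
{
    struct ts64_sketch ts;
    struct wire_stamp_sketch out;

    ts.tv_sec = now_sec;
    ts.tv_nsec = now_nsec;
    out.sec = (uint64_t)ts.tv_sec;
    out.usec = (uint64_t)(ts.tv_nsec / 1000);
    return out;
}
```

For second counts that fit in 32 bits both helpers agree; for 2147483648 (the first second past the signed 32-bit limit) only the 64-bit path returns the value that was passed in.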
    • Arnd Bergmann's avatar
      atm: remove 'struct zatm_t_hist' · 70ba07b6
      Arnd Bergmann authored
      The zatm_t_hist structure is not used anywhere in the kernel, but is
      exported to user space. As we are trying to eliminate uses of time_t
      in the kernel for y2038 compatibility, the current definition triggers
      checking tools because it contains 'struct timeval'.
      
      As pointed out by Chas Williams, the only user of this structure was
      the ZATM_GETHIST ioctl command that has been removed a long time ago,
      and we can remove the structure as well without breaking any user
      space.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Chas Williams <3chas3@gmail.com>
      Cc: linux-atm-general@lists.sourceforge.net
      Signed-off-by: David S. Miller <davem@davemloft.net>
      70ba07b6