1. 07 Oct, 2015 6 commits
  2. 06 Oct, 2015 4 commits
  3. 05 Oct, 2015 30 commits
    • ebpf: include perf_event only where really needed · 0cdf5640
      Daniel Borkmann authored
      Commit ea317b26 ("bpf: Add new bpf map type to store the pointer
      to struct perf_event") added perf_event.h to the main eBPF header, so
      it gets included for all users. perf_event.h is actually only needed
      from the array map side, so let's sanitize this a bit.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Kaixu Xia <xiakaixu@huawei.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: net: support BPF_ALU | BPF_MOD instructions in the BPF JIT. · 4560cdff
      Nicolas Schichan authored
      For ARMv7 with UDIV instruction support, generate an UDIV instruction
      followed by an MLS instruction.
      
      For other ARM variants, generate code calling a C wrapper similar to
      the jit_udiv() function used for BPF_ALU | BPF_DIV instructions.
      
      Some performance numbers reported by the test_bpf module (the duration
      per filter run is reported in nanoseconds, between "jited:<x>" and
      "PASS"):
      
      ARMv7 QEMU nojit:	test_bpf: #3 DIV_MOD_KX jited:0 2196 PASS
      ARMv7 QEMU jit:		test_bpf: #3 DIV_MOD_KX jited:1 104 PASS
      ARMv5 QEMU nojit:	test_bpf: #3 DIV_MOD_KX jited:0 2176 PASS
      ARMv5 QEMU jit:		test_bpf: #3 DIV_MOD_KX jited:1 1104 PASS
      ARMv5 kirkwood nojit:	test_bpf: #3 DIV_MOD_KX jited:0 1103 PASS
      ARMv5 kirkwood jit:	test_bpf: #3 DIV_MOD_KX jited:1 311 PASS
      Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
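The UDIV-then-MLS sequence described in the commit above computes a remainder as "dividend minus quotient times divisor". A minimal C sketch of that arithmetic follows; the function name is hypothetical and the sketch does not reproduce the JIT's code generation or its handling of a zero divisor.

```c
#include <stdint.h>

/* Unsigned remainder built from a divide and a multiply-subtract,
 * mirroring a UDIV followed by an MLS: q = a / b; r = a - q * b.
 * Assumption: the caller guarantees b != 0. */
static uint32_t mod_via_udiv_mls(uint32_t a, uint32_t b)
{
	uint32_t q = a / b;	/* UDIV: quotient */

	return a - q * b;	/* MLS: a - (q * b) */
}
```

The result matches the C `%` operator for unsigned operands, which is the behaviour a BPF_MOD instruction needs.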
    • Merge branch 'asix-rx-mem-handling' · df7b6015
      David S. Miller authored
      Mark Craske says:
      
      ====================
      Improve ASIX RX memory allocation error handling
      
      The ASIX RX handler algorithm is weak on error handling.
      There is a design flaw in the ASIX RX handler algorithm because the
      implementation for handling RX Ethernet frames for the DUB-E100 C1 can
      have Ethernet frames spanning multiple URBs. This means that payload data
      from more than 1 URB is sometimes needed to fill the socket buffer with a
      complete Ethernet frame. When the URB with the start of an Ethernet frame
      is received then an attempt is made to allocate a socket buffer. If the
      memory allocation fails then the algorithm sets the buffer pointer member
      to NULL and the function exits (no crash yet). Subsequently, the RX handler
      is called again to process the next URB which assumes there is a socket
      buffer available and the kernel crashes when there is no buffer.
      
      This patchset implements an improvement to the RX handling algorithm to
      avoid a crash when no memory is available for the socket buffer.
      
      The patchset will apply cleanly to the net-next master branch but the
      created kernel has not been tested. The driver was tested on ARM kernels
      v3.8 and v3.14 for a commercial product.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • asix: Continue processing URB if no RX netdev buffer · 6a570814
      Dean Jenkins authored
      Avoid a loss of synchronisation of the Ethernet Data header 32-bit
      word due to a failure to get a netdev socket buffer.
      
      The ASIX RX handling algorithm returned 0 upon a failure to get
      an allocation of a netdev socket buffer. This causes the URB
      processing to stop which potentially causes a loss of synchronisation
      with the Ethernet Data header 32-bit word. Therefore, subsequent
      processing of URBs may be rejected due to a loss of synchronisation.
      This may cause additional good Ethernet frames to be discarded
      along with outputting of synchronisation error messages.
      
      Implement a solution which checks whether a netdev socket buffer
      has been allocated before trying to copy the Ethernet frame into
      the netdev socket buffer. But continue to process the URB so that
      synchronisation is maintained. Therefore, only a single Ethernet
      frame is discarded when no netdev socket buffer is available.
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
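The idea in the commit above, skipping the copy but still advancing through the URB when no buffer is available, can be sketched in plain C. This is an illustration only: `struct rx_state` and `consume_frame_bytes()` are hypothetical names, not the driver's own, and a raw byte buffer stands in for the netdev socket buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive state kept between URBs. */
struct rx_state {
	uint8_t *frame;		/* destination buffer, NULL if allocation failed */
	size_t filled;		/* bytes already copied into frame */
	size_t remaining;	/* bytes of the current frame still expected */
};

/* Consume up to rx->remaining bytes of frame data from buf and return the
 * number of bytes consumed. The count advances even when there is no
 * destination buffer, so the caller stays aligned with the next header. */
static size_t consume_frame_bytes(struct rx_state *rx,
				  const uint8_t *buf, size_t len)
{
	size_t n = len < rx->remaining ? len : rx->remaining;

	if (rx->frame) {
		memcpy(rx->frame + rx->filled, buf, n);
		rx->filled += n;
	}
	rx->remaining -= n;
	return n;
}
```

With a NULL `frame`, the bytes are dropped but `remaining` still reaches zero, so only that one frame is lost.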
    • asix: On RX avoid creating bad Ethernet frames · 3f30b158
      Dean Jenkins authored
      When RX Ethernet frames span multiple URB socket buffers,
      the data stream may suffer a discontinuity which will cause
      the current Ethernet frame in the netdev socket buffer
      to be incomplete. This frame needs to be discarded instead
      of appending unrelated data from the current URB socket buffer
      to the Ethernet frame in the netdev socket buffer. This avoids
      creating a corrupted Ethernet frame in the netdev socket buffer.
      
      A discontinuity can occur when the previous URB socket buffer
      held an incomplete Ethernet frame due to truncation or a
      URB socket buffer containing the end of the Ethernet frame
      was missing.
      
      Therefore, add a sanity test for when an Ethernet frame
      spans multiple URB socket buffers to check that the remaining
      bytes of the currently received Ethernet frame point to
      a good Data header 32-bit word of the next Ethernet
      frame. Upon error, reset the remaining bytes variable to
      zero and discard the current netdev socket buffer.
      Assume that the Data header is located at the start of
      the current socket buffer and attempt to process the next
      Ethernet frame from there. This avoids unnecessarily
      discarding a good URB socket buffer that contains a new
      Ethernet frame.
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
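The sanity test in the commit above depends on being able to tell a good Data header word from arbitrary payload. The commit message does not spell out the header layout, so the sketch below assumes one for illustration: the low 11 bits carry the frame size and bits 16 to 26 carry its bitwise complement. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout (not stated in the commit message): bits 0-10 hold the
 * frame size and bits 16-26 hold the bitwise complement of that size. */
static bool data_header_is_valid(uint32_t header)
{
	uint32_t size = header & 0x7ff;
	uint32_t check = (~header >> 16) & 0x7ff;

	return size == check;
}
```

Under that assumption, a word found where a header is expected can be rejected cheaply, which is what lets the receive path discard a partial frame and try to resynchronise from the start of the current buffer.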
    • asix: Simplify asix_rx_fixup_internal() netdev alloc · 9a5ccd8e
      Dean Jenkins authored
      The code is checking that the Ethernet frame will fit into a
      netdev allocated socket buffer within the constraints of MTU size,
      Ethernet header length plus VLAN header length.
      
      The original code was checking rx->remaining each loop of the while
      loop that processes multiple Ethernet frames per URB and/or Ethernet
      frames that span across URBs. rx->remaining decreases per while loop
      so there is no point in potentially checking multiple times that the
      Ethernet frame (remaining part) will fit into the netdev socket buffer.
      
      The modification checks that the size of the Ethernet frame will fit
      the netdev socket buffer before allocating the netdev socket buffer.
      This avoids grabbing memory and then deciding that the Ethernet frame
      is too big and then freeing the memory.
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
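The "validate the size first, then allocate" ordering described in the commit above might look like the following user-space sketch. `alloc_frame_buffer()` is a hypothetical helper and `malloc()` stands in for the netdev allocation; the header lengths are the usual 14-byte Ethernet header and 4-byte VLAN tag.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_ETH_HLEN  14	/* Ethernet header length in bytes */
#define SKETCH_VLAN_HLEN  4	/* 802.1Q tag length in bytes */

/* Return a buffer for a frame of `size` bytes, or NULL if the frame is
 * too large for the given MTU or the allocation fails. The size is checked
 * first, so an oversized frame never triggers an allocation. */
static void *alloc_frame_buffer(size_t size, size_t mtu)
{
	if (size > mtu + SKETCH_ETH_HLEN + SKETCH_VLAN_HLEN)
		return NULL;
	return malloc(size);
}
```

Because the check happens once per frame, before any memory is taken, there is nothing to free on the reject path.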
    • asix: Tidy-up 32-bit header word synchronisation · 3bfc69ab
      Dean Jenkins authored
      Tidy-up the Data header 32-bit word synchronisation logic in
      asix_rx_fixup_internal() by removing redundant logic tests.
      
      The code is looking at the following cases of the Data header
      32-bit word that is present before each Ethernet frame:
      
      a) all 32 bits of the Data header word are in the URB socket buffer
      b) first 16 bits of the Data header word are at the end of the URB
         socket buffer
      c) last 16 bits of the Data header word are at the start of the URB
         socket buffer, i.e. split_head = true
      
      Note that the lifetime of rx->split_head exists outside of the
      function call and is accessed per processing of each URB. Therefore,
      split_head being true acts on the next URB to be processed.
      
      To check for b), the offset will be 16 bits (2 bytes) from the end of
      the buffer; then indicate that split_head is true.
      To check for c), split_head must be true because the first 16 bits
      have been found.
      Case a) is the else branch of the check for c).
      
      Note that the || logic of the old code included the state
      (skb->len - offset == sizeof(u16) && rx->split_head) which is not
      possible because the split_head cannot be true whilst checking for b).
      This is because the split_head indicates that the first 16 bits have
      been found and that is not possible whilst checking for the first 16
      bits. Therefore simplify the logic.
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
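The three header cases from the commit above can be written as a small classifier, checked in the order the message gives: b), then c), then a) as the fallback. This is a sketch with hypothetical names, not the driver code; `len` and `offset` are byte counts for the current buffer.

```c
#include <stdbool.h>
#include <stddef.h>

enum header_case {
	HEADER_WHOLE,		/* a) all 32 bits are in this buffer */
	HEADER_FIRST_HALF,	/* b) only the first 16 bits are at the end */
	HEADER_SECOND_HALF,	/* c) the last 16 bits are at the start */
};

/* Classify where the Data header sits, given the bytes left in the buffer
 * (len - offset) and whether the previous buffer ended with half a header. */
static enum header_case classify_header(size_t len, size_t offset,
					bool split_head)
{
	if (len - offset == 2)
		return HEADER_FIRST_HALF;	/* caller then sets split_head */
	if (split_head)
		return HEADER_SECOND_HALF;	/* caller then clears split_head */
	return HEADER_WHOLE;
}
```

No combined test of "two bytes left" and `split_head` is needed, which is the simplification the commit message argues for.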
    • asix: Rename remaining and size for clarity · 7b0378f5
      Dean Jenkins authored
      The Data header synchronisation is easier to understand
      if the variables "remaining" and "size" are renamed.
      
      Therefore, the lifetime of the "remaining" variable exists
      outside of asix_rx_fixup_internal() and is used to indicate
      any remaining pending bytes of the Ethernet frame that need
      to be obtained from the next socket buffer. This allows an
      Ethernet frame to span across multiple socket buffers.
      
      "size" is now local to asix_rx_fixup_internal() and contains
      the size read from the Data header 32-bit word.
      
      Add "copy_length" to hold the number of the Ethernet frame
      bytes (maybe a part of a full frame) that are to be copied
      out of the socket buffer.
      Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
      Signed-off-by: Mark Craske <Mark_Craske@mentor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf, seccomp: prepare for upcoming criu support · bab18991
      Daniel Borkmann authored
      The current ongoing effort to dump existing cBPF seccomp filters back
      to user space requires to hold the pre-transformed instructions like
      we do in case of socket filters from sk_attach_filter() side, so they
      can be reloaded in original form at a later point in time by utilities
      such as criu.
      
      To prepare for this, simply extend the bpf_prog_create_from_user()
      API to hold a flag that tells whether we should store the original
      or not. Also, fanout filters could make use of that in future for
      things like diag. While fanout filters already use bpf_prog_destroy(),
      move seccomp over to them as well to handle original programs when
      present.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tycho Andersen <tycho.andersen@canonical.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vrf: fix a kernel warning · 0a15afd2
      WANG Cong authored
      This fixes:
      
       tried to remove device ip6gre0 from (null)
       ------------[ cut here ]------------
       kernel BUG at net/core/dev.c:5219!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
       CPU: 3 PID: 161 Comm: kworker/u8:2 Not tainted 4.3.0-rc2+ #1142
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       Workqueue: netns cleanup_net
       task: ffff8800d784a9c0 ti: ffff8800d74a4000 task.ti: ffff8800d74a4000
       RIP: 0010:[<ffffffff817f0797>]  [<ffffffff817f0797>] __netdev_adjacent_dev_remove+0x40/0xec
       RSP: 0018:ffff8800d74a7a98  EFLAGS: 00010282
       RAX: 000000000000002a RBX: 0000000000000000 RCX: 0000000000000000
       RDX: ffff88011adcf701 RSI: ffff88011adccbf8 RDI: ffff88011adccbf8
       RBP: ffff8800d74a7ab8 R08: 0000000000000001 R09: 0000000000000000
       R10: ffffffff81d190ff R11: 00000000ffffffff R12: ffff8800d599e7c0
       R13: 0000000000000000 R14: ffff8800d599e890 R15: ffffffff82385e00
       FS:  0000000000000000(0000) GS:ffff88011ac00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00007ffd6f003000 CR3: 000000000220c000 CR4: 00000000000006e0
       Stack:
        0000000000000000 ffff8800d599e7c0 0000000000000b00 ffff8800d599e8a0
        ffff8800d74a7ad8 ffffffff817f0861 0000000000000000 ffff8800d599e7c0
        ffff8800d74a7af8 ffffffff817f088f 0000000000000000 ffff8800d599e7c0
       Call Trace:
        [<ffffffff817f0861>] __netdev_adjacent_dev_unlink+0x1e/0x35
        [<ffffffff817f088f>] __netdev_adjacent_dev_unlink_neighbour+0x17/0x41
        [<ffffffff817f56e6>] netdev_upper_dev_unlink+0x6c/0x13d
        [<ffffffff81674a3d>] vrf_del_slave+0x26/0x7d
        [<ffffffff81674ac3>] vrf_device_event+0x2f/0x34
        [<ffffffff81098c40>] notifier_call_chain+0x75/0x9c
        [<ffffffff81098fa2>] raw_notifier_call_chain+0x14/0x16
        [<ffffffff817ee129>] call_netdevice_notifiers_info+0x52/0x59
        [<ffffffff817f179d>] call_netdevice_notifiers+0x13/0x15
        [<ffffffff817f6f18>] rollback_registered_many+0x14f/0x24f
        [<ffffffff817f70f2>] unregister_netdevice_many+0x19/0x64
        [<ffffffff819a2455>] ip6gre_exit_net+0x163/0x177
        [<ffffffff817eb019>] ops_exit_list+0x44/0x55
        [<ffffffff817ebcb7>] cleanup_net+0x193/0x226
        [<ffffffff81091e1c>] process_one_work+0x26c/0x4d8
        [<ffffffff81091d20>] ? process_one_work+0x170/0x4d8
        [<ffffffff81092296>] worker_thread+0x1df/0x2c2
        [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f
        [<ffffffff810920b7>] ? process_scheduled_works+0x2f/0x2f
        [<ffffffff81097a20>] kthread+0xd4/0xdc
        [<ffffffff810bc523>] ? trace_hardirqs_on_caller+0x17d/0x199
        [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83
        [<ffffffff81a5240f>] ret_from_fork+0x3f/0x70
        [<ffffffff8109794c>] ? __kthread_parkme+0x83/0x83
      
      Fixes: 93a7e7e8 ("net: Remove the now unused vrf_ptr")
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kbuild test robot's avatar
      9886ce2b
    • net: Microchip encx24j600 driver · 04fbfce7
      Jon Ringle authored
      This ethernet driver supports the Microchip enc424j600/624j600 Ethernet
      controller over a SPI bus interface. This driver makes use of the regmap API to
      optimize access to registers by caching registers where possible.
      
      Datasheet:
      http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdf
      Signed-off-by: Jon Ringle <jringle@gridpoint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • regmap: Allow installing custom reg_update_bits function · 7741c373
      Jon Ringle authored
      This commit allows installing a custom reg_update_bits function for cases where
      the hardware provides a mechanism to set or clear register bits without a
      read/modify/write cycle. Such is the case with the Microchip ENCX24J600.
      Signed-off-by: Jon Ringle <jringle@gridpoint.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
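To illustrate why a device with dedicated set and clear operations does not need a read/modify/write cycle, here is a self-contained C sketch. It models a made-up device and does not use the regmap API; `struct fake_dev`, `dev_write_set()`, `dev_write_clr()` and `update_bits_no_rmw()` are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical device with separate "bit set" and "bit clear" write
 * operations for a register: writing 1s sets or clears those bits. */
struct fake_dev {
	uint16_t reg;
};

static void dev_write_set(struct fake_dev *d, uint16_t bits)
{
	d->reg |= bits;
}

static void dev_write_clr(struct fake_dev *d, uint16_t bits)
{
	d->reg &= (uint16_t)~bits;
}

/* Make the bits selected by mask equal to val, using one set write and
 * one clear write. The driver never reads the register back. */
static void update_bits_no_rmw(struct fake_dev *d, uint16_t mask, uint16_t val)
{
	dev_write_set(d, (uint16_t)(val & mask));
	dev_write_clr(d, (uint16_t)(~val & mask));
}
```

Bits outside `mask` are untouched because neither write names them.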
    • enic: do hang reset only in case of tx timeout · 937317c7
      Govindarajulu Varadarajan authored
      The current code invokes hang reset in case of error interrupt. We should
      hang reset only in case of tx timeout. This is because of the way hang reset
      is implemented in firmware. Hang reset takes more firmware resources than
      soft reset. The adaptor does not generate an error interrupt in case of tx
      timeout.
      
      Hang reset only in case of tx timeout, in .ndo_tx_timeout. Do soft reset
      otherwise. Introduce deferred work, enic_tx_hang_reset, to do hang reset.
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: handle spurious error interrupt · cc809237
      Govindarajulu Varadarajan authored
      Some of the enic adaptors are known to generate spurious interrupts. When
      an error interrupt is generated, the driver just resets the device. This
      patch resets the device only when an error has occurred.
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'cxgb4-next' · 2905f5bb
      David S. Miller authored
      Hariprasad Shenai says:
      
      ====================
      cxgb4: Trivial fixes for cxgb4
      
      Fixes the following issues:
      Don't read non-existent T4/T5/T6 adapter registers for ethtool dump.
      For T4, don't read mailbox control registers. Adds a new devlog facility
      and reports the correct link speed for unsupported ones.
      
      This patch series has been created against net-next tree and includes
      patches on cxgb4 driver.
      
      We have included all the maintainers of respective drivers. Kindly review
      the change and let us know in case of any review comments.
      ====================
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Report correct link speed for unsupported ones · 85412255
      Hariprasad Shenai authored
      When we get garbage from the firmware with weird Port Speeds,
      etc. we should emit a warning regarding unsupported speeds rather than
      use the bogus default of "10Mbps" which isn't even an option in the
      firmware Port Information message
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Adds a new Device Log Facility FW_DEVLOG_FACILITY_CF · da4976e1
      Hariprasad Shenai authored
      The firmware team added a new Device Log Facility FW_DEVLOG_FACILITY_CF,
      but the driver has been decoding Device Log messages with that Facility as
      "(NULL)", fixing it.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: For T4, don't read the Firmware Mailbox Control register · b3695540
      Hariprasad Shenai authored
      T4 doesn't have the Shadow copy of the register which we can read without
      side effect. So don't read mbox control register for T4 adapter
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4 : Update T4/T5/T6 register ranges · 8119c018
      Hariprasad Shenai authored
      Update T4/T5/T6 adapter register ranges so that non-existent registers
      are not read when dumped using ethtool
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next · 40e10680
      David S. Miller authored
      Eric W. Biederman says:
      
      ====================
      net: Pass net through ip fragmentation
      
      This is the next installment of my work to pass struct net through the
      output path so the code does not need to guess how to figure out which
      network namespace it is in, and ultimately routes can have output
      devices in another network namespace.
      
      This round focuses on passing net through ip fragmentation which we seem
      to call from about everywhere.  That is the main ip output paths, the
      bridge netfilter code, and openvswitch.  This has to happen at once
      across the tree as function pointers are involved.
      
      First some prep work is done, then ipv4 and ipv6 are converted and then
      temporary helper functions are removed.
      ====================
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rds-perf' · 7e2832f1
      David S. Miller authored
      Sowmini Varadhan says:
      
      ====================
      RDS: RDS-TCP perf enhancements
      
      A 3-part patchset that (a) improves current RDS-TCP perf
      by 2X-3X and (b) refactors earlier robustness code for
      better observability/scaling.
      
      Patch 1 is an enhancement of earlier robustness fixes
      that had used separate sockets for client and server endpoints to
      resolve race conditions. It is possible to have an equivalent
      solution that does not use 2 sockets. The benefit of a
      single socket solution is that it results in more predictable
      and observable behavior for the underlying TCP pipe of an
      RDS connection
      
      Patches 2 and 3 are simple, straightforward perf bug fixes
      that align the RDS TCP socket with other parts of the kernel stack.
      
      v2: fix kbuild-test-robot warnings, comments from  Sergei Shtylov
          and Santosh Shilimkar.
      ====================
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS-TCP: Set up MSG_MORE and MSG_SENDPAGE_NOTLAST as appropriate in rds_tcp_xmit · 76b29ef1
      Sowmini Varadhan authored
      For the same reasons as commit 2f533844 ("tcp: allow splice() to
      build full TSO packets") and commit 35f9c09f ("tcp: tcp_sendpages()
      should call tcp_push() once"), rds_tcp_xmit may have multiple pages to
      send, so use the MSG_MORE and MSG_SENDPAGE_NOTLAST as hints to
      tcp_sendpage()
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
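The hinting scheme in the commit above amounts to "flag every page except the last". A minimal sketch of that selection follows. The flag values are placeholders defined only for this example (the real constants come from the kernel headers), and `sendpage_flags()` is a hypothetical helper, not a function in rds_tcp_xmit.

```c
#include <stddef.h>

/* Placeholder flag values for illustration only; the real MSG_MORE and
 * MSG_SENDPAGE_NOTLAST constants are defined by the kernel. */
#define SKETCH_MSG_MORE			0x1
#define SKETCH_MSG_SENDPAGE_NOTLAST	0x2

/* Flags for page `idx` out of `count` pages: every page except the last
 * hints that more data follows, so the push can wait for the final page. */
static int sendpage_flags(size_t idx, size_t count)
{
	if (idx + 1 < count)
		return SKETCH_MSG_MORE | SKETCH_MSG_SENDPAGE_NOTLAST;
	return 0;
}
```

A single-page message gets no hint at all, so it is pushed immediately.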
    • RDS-TCP: Do not bloat sndbuf/rcvbuf in rds_tcp_tune · 1edd6a14
      Sowmini Varadhan authored
      Using the value of RDS_TCP_DEFAULT_BUFSIZE (128K)
      clobbers efficient use of TSO because it inflates the size_goal
      that is computed in tcp_sendmsg/tcp_sendpage and skews packet
      latency, and the default values for these parameters actually
      result in significantly better performance.
      
      Request-response tests using rds-stress with a packet size of
      100K with 16 threads (test parameters -q 100000 -a 256 -t16 -d16)
      between a single pair of IP addresses achieve a throughput of
      6-8 Gbps. Without this patch, throughput maxes out at 2-3 Gbps under
      equivalent conditions on these platforms.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • RDS: Use a single TCP socket for both send and receive. · 3b20fc38
      Sowmini Varadhan authored
      Commit f711a6ae ("net/rds: RDS-TCP: Always create a new rds_sock
      for an incoming connection.") modified rds-tcp so that an incoming SYN
      would ignore an existing "client" TCP connection which had the local
      port set to the transient port.  The motivation for ignoring the existing
      "client" connection in f711a6ae was to avoid race conditions and an
      endless duel of reconnect attempts triggered by a restart/abort of one
      of the nodes in the TCP connection.
      
      However, having separate sockets for active and passive sides
      is avoidable, and the simpler model of a single TCP socket for
      both send and receives of all RDS connections associated with
      that tcp socket makes for easier observability. We avoid the race
      conditions from f711a6ae by attempting reconnects in rds_conn_shutdown
      if, and only if, the (new) c_outgoing bit is set for RDS_TRANS_TCP.
      The c_outgoing bit is initialized in __rds_conn_create().
      
      A side-effect of re-using the client rds_connection for an incoming
      SYN is the potential of encountering duelling SYNs, i.e., we
      have an outgoing RDS_CONN_CONNECTING socket when we get the incoming
      SYN. The logic to arbitrate this criss-crossing SYN exchange in
      rds_tcp_accept_one() has been modified to emulate the BGP state
      machine: the smaller IP address should back off from the connection attempt.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
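The tie-break for duelling SYNs described above, where the smaller IP address backs off, can be sketched as a single comparison. The helper name is hypothetical and the sketch covers only the decision, not what rds_tcp_accept_one() does with it; IPv4 addresses in network byte order are assumed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Return true if the local end should abandon its own outgoing attempt.
 * Addresses are IPv4 in network byte order and are compared numerically
 * after conversion to host byte order. */
static bool local_should_back_off(uint32_t local_be, uint32_t peer_be)
{
	return ntohl(local_be) < ntohl(peer_be);
}
```

Both ends evaluate the same comparison with the arguments swapped, so exactly one of them backs off.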
    • Merge branch 'xgbe-next' · 393159e9
      David S. Miller authored
      Tom Lendacky says:
      
      ====================
      amd-xgbe: AMD XGBE driver updates 2015-09-30
      
      The following patches are included in this driver update series:
      
      - Remove unneeded semi-colon
      - Follow the DT/ACPI precedence used by the device_ APIs
      - Add ethtool support for getting and setting the msglevel
      - Add ethtool support error and debug messages
      - Simplify the hardware FIFO assignment calculations
      - Add receive buffer unavailable statistic
      - Use the device workqueue instead of the system workqueue
      - Remove the use of a link state bit
      
      This patch series is based on net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Remove the XGBE_LINK state bit · 50789845
      Lendacky, Thomas authored
      The XGBE_LINK bit is used just to determine whether to call the
      netif_carrier_on/off functions. Rather than define and use this bit,
      just call the functions. The netif_carrier_ok function can be used in
      place of checking the XGBE_LINK bit in the future.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Use device workqueue instead of system workqueue · afb43e8a
      Lendacky, Thomas authored
      The driver creates, flushes and destroys a device workqueue but queues
      work to the system workqueue. Switch from using the system workqueue to
      the device workqueue.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Add receive buffer unavailable statistic · 72c9ac4e
      Lendacky, Thomas authored
      Add a statistic that tracks how many times an interrupt is generated for
      a receive buffer not being available to the hardware which prevents the
      hardware from being able to DMA the received data.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Simplify calculation and setting of queue fifos · 9c439e4b
      Lendacky, Thomas authored
      The Tx and Rx fifo sizes can be calculated rather
      than hardcoded in a switch statement. Additionally, the per-queue fifo
      sizes can be calculated rather than hardcoded using if/else if statements
      that can possibly underutilize the available fifo area.
      
      Change the code to calculate the fifo sizes and the per-queue fifo sizes
      to simplify the code and make best use of the available fifo.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
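As a rough illustration of calculating per-queue fifo sizes instead of hardcoding them, the sketch below splits a fifo evenly among queues in fixed-size blocks. The commit message does not give the actual formula, so the 256-byte block size, the even split and the function name are all assumptions made for this example.

```c
#define SKETCH_FIFO_UNIT 256u	/* assumed allocation granularity in bytes */

/* Split fifo_bytes evenly among queue_count queues, rounding each share
 * down to a whole number of SKETCH_FIFO_UNIT blocks. Returns the number
 * of bytes per queue, or 0 if there are no queues. */
static unsigned int per_queue_fifo(unsigned int fifo_bytes,
				   unsigned int queue_count)
{
	unsigned int units = fifo_bytes / SKETCH_FIFO_UNIT;

	if (queue_count == 0)
		return 0;
	return (units / queue_count) * SKETCH_FIFO_UNIT;
}
```

A computed split adapts to any queue count, whereas a fixed table of cases can leave part of the fifo unused for counts it did not anticipate.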