1. 28 Mar, 2017 7 commits
    • Merge branch 'bond-link-status-fixes' · 95ed0edd
      David S. Miller authored
      Mahesh Bandewar says:
      
      ====================
      link-status fixes for mii-monitoring
      
      The mii monitoring is divided into two phases - inspect and commit. The
      inspect phase technically should not make any changes to the state and
      should defer them to the commit phase. However, we detected link-state
      inconsistencies on several machines and discovered that they result from
      inconsistent updates to link states and the assumption that you *always*
      get the rtnl-mutex. In reality, when trylock() fails to acquire the
      rtnl-mutex, the commit phase is postponed until the next mii-mon run. In
      that next round, because of the state change already performed in the
      previous inspect run, no change is detected and the commit phase is
      skipped. This results in an inconsistent state until the next link event
      happens (if it ever happens).
      
      During the commit phase, it is assumed that the speed and duplex fetch
      always succeeds, but that is not always the case. However, the slave
      state is marked UP irrespective of the result of the speed / duplex fetch.
      If the fetch returns insane values for either of these two fields, then
      keeping the internal link state UP is not going to provide fruitful
      results either.
      
      Please see the individual patches for more details.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95ed0edd
    • bonding: correctly update link status during mii-commit phase · b5bf0f5b
      Mahesh Bandewar authored
      bond_miimon_commit() marks the link UP after attempting to get the speed
      and duplex settings for the link. There is a possibility that
      bond_update_speed_duplex() could fail. This is another place that
      could result in an inconsistent bonding link state.
      
      With this patch the link is marked UP, and processed further, only if
      the retrieved speed and duplex values are sane.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b5bf0f5b
    • bonding: make speed, duplex setting consistent with link state · c4adfc82
      Mahesh Bandewar authored
      bond_update_speed_duplex() retrieves speed and duplex settings.
      Retrieving these values can fail, but the caller has to assume the
      call always succeeds. This leads to inconsistent slave link settings.
      If these (speed, duplex) values cannot be retrieved, then keeping the
      link UP causes problems.
      
      The updated bond_update_speed_duplex() returns 0 on success if it
      retrieves sane values for speed and duplex. On failure it returns 1
      and marks the link down.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c4adfc82
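The return convention described in this commit, together with the commit-phase check from b5bf0f5b listed above, can be sketched as a small user-space model. This is a minimal sketch, not the kernel code: every `toy_` name, the constants, and the exact test for "sane" values (non-zero known speed, half or full duplex) are assumptions made for illustration.

```c
#include <stdint.h>

/* Illustrative constants and types; not the kernel's definitions. */
#define TOY_SPEED_UNKNOWN  UINT32_MAX
#define TOY_DUPLEX_HALF    0
#define TOY_DUPLEX_FULL    1
#define TOY_DUPLEX_UNKNOWN 0xff

enum toy_link { TOY_LINK_DOWN, TOY_LINK_UP };

struct toy_slave {
    uint32_t speed;
    uint8_t duplex;
    enum toy_link link;
};

/*
 * Returns 0 when sane speed/duplex values were stored, 1 otherwise.
 * query_rc, speed and duplex stand in for the result of the driver query.
 */
static int toy_update_speed_duplex(struct toy_slave *s, int query_rc,
                                   uint32_t speed, uint8_t duplex)
{
    /* Start from "unknown" so a failed query never leaves stale values. */
    s->speed = TOY_SPEED_UNKNOWN;
    s->duplex = TOY_DUPLEX_UNKNOWN;

    if (query_rc < 0 || speed == 0 || speed == TOY_SPEED_UNKNOWN ||
        (duplex != TOY_DUPLEX_HALF && duplex != TOY_DUPLEX_FULL)) {
        s->link = TOY_LINK_DOWN;   /* failure also marks the link down */
        return 1;
    }

    s->speed = speed;
    s->duplex = duplex;
    return 0;
}

/* Caller in the style of the commit phase: UP only after a sane fetch. */
static void toy_commit_up(struct toy_slave *s, int query_rc,
                          uint32_t speed, uint8_t duplex)
{
    if (toy_update_speed_duplex(s, query_rc, speed, duplex) == 0)
        s->link = TOY_LINK_UP;
}
```

The point of the non-void return is that the caller can now tie the link state to the outcome of the fetch instead of assuming success.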
    • bonding: improve link-status update in mii-monitoring · de77ecd4
      Mahesh Bandewar authored
      The primary issue is that the mii-inspect phase updates the link state
      and expects the changes to be committed during the mii-commit phase. If
      it fails to acquire the rtnl-mutex after the inspect phase, the commit
      phase (bond_miimon_commit) doesn't get to run. This partially updated
      state stays and makes the internal state inconsistent.
      
      e.g. setup bond0 => slaves: eth1, eth2
      eth1 goes DOWN -> UP
         mii_monitor()
      	mii-inspect()
      	    bond_set_slave_link_state(eth1, UP, DontNotify)
      	rtnl_trylock() <- fails!
      
      Next mii-monitor round
      eth1: No change
         mii_monitor()
      	mii-inspect()
      	    eth1->link == current-status (ethtool_ops->get_link)
      	    no-change-detected
      
      End result:
          eth1:
            Link = BOND_LINK_UP
            Speed = 0xfffff  [SpeedUnknown]
            Duplex = 0xff    [DuplexUnknown]
      
      This doesn't always happen but for some unlucky machines in a large set
      of machines it creates problems.
      
      The fix for this is to avoid making changes during inspect phase and
      postpone them until acquiring the rtnl-mutex / invoking commit phase.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de77ecd4
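The failure sequence in this commit message (link comes up, trylock fails, next round sees no change) can be replayed with a toy model. This is a sketch under stated assumptions: `toy_slave`, `inspect_old`, `inspect_new`, `commit` and `monitor_round` are hypothetical stand-ins, and `lock_ok` plays the role of the `rtnl_trylock()` result; none of it is the bonding driver's code.

```c
#include <stdbool.h>

/* Toy model of one bonding slave; not the kernel's struct slave. */
enum link_state { LINK_DOWN, LINK_UP };

struct toy_slave {
    enum link_state link;      /* state the bond acts on */
    enum link_state proposed;  /* state suggested by the inspect phase */
    int commits;               /* how many times the commit phase ran */
};

/* Old behaviour: inspect writes the live state directly. */
static bool inspect_old(struct toy_slave *s, enum link_state carrier)
{
    if (s->link == carrier)
        return false;
    s->link = carrier;
    s->proposed = carrier;
    return true;
}

/* Fixed behaviour: inspect only records a proposal. */
static bool inspect_new(struct toy_slave *s, enum link_state carrier)
{
    if (s->link == carrier)
        return false;
    s->proposed = carrier;
    return true;
}

static void commit(struct toy_slave *s)
{
    s->link = s->proposed;
    s->commits++;
}

/* One monitor round; lock_ok stands in for the rtnl_trylock() result. */
static void monitor_round(struct toy_slave *s, enum link_state carrier,
                          bool lock_ok, bool fixed)
{
    bool changed = fixed ? inspect_new(s, carrier) : inspect_old(s, carrier);

    if (changed && lock_ok)
        commit(s);
}

/* Two rounds: the link comes up, trylock fails first and succeeds second. */
static int commits_after_failed_trylock(bool fixed)
{
    struct toy_slave s = { LINK_DOWN, LINK_DOWN, 0 };

    monitor_round(&s, LINK_UP, false, fixed);
    monitor_round(&s, LINK_UP, true, fixed);
    return s.commits;
}
```

With the old inspect the second round sees the already-updated state and never commits; with the fixed inspect the live state is untouched after the failed trylock, so the change is detected again and committed once.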
    • bonding: split bond_set_slave_link_state into two parts · f307668b
      Mahesh Bandewar authored
      Split the function into two phases, (a) propose and (b) commit, without
      changing the semantics of the original API.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f307668b
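One way to picture the split is a setter that becomes a thin wrapper around two smaller steps, so existing callers keep the same behaviour. The names below (`toy_propose_link_state`, `toy_commit_link_state`, `toy_set_link_state`) and the notification counter are hypothetical; the log does not show the actual helper names or fields.

```c
enum toy_link { TOY_LINK_DOWN, TOY_LINK_UP };

struct toy_slave {
    enum toy_link link;           /* committed state */
    enum toy_link link_proposed;  /* pending state, not yet visible */
    int notifications;            /* stand-in for notifying listeners */
};

/* Phase (a): remember the wanted state without touching the live one. */
static void toy_propose_link_state(struct toy_slave *s, enum toy_link state)
{
    s->link_proposed = state;
}

/* Phase (b): make the proposal live, optionally notifying. */
static void toy_commit_link_state(struct toy_slave *s, int notify)
{
    s->link = s->link_proposed;
    if (notify)
        s->notifications++;
}

/* Original one-shot API, now expressed as propose followed by commit. */
static void toy_set_link_state(struct toy_slave *s, enum toy_link state,
                               int notify)
{
    toy_propose_link_state(s, state);
    toy_commit_link_state(s, notify);
}
```

Callers that need the two steps separated, such as an inspect phase that may not be able to commit yet, can call the halves individually.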
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 205ed44e
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-03-27
      
      This series contains updates to i40e and i40evf only.
      
      Alex updates the driver code so that we can do bulk updates of the page
      reference count instead of just incrementing it by one reference at a
      time.  Fixed an issue where we were not resetting skb back to NULL when
      we have freed it.  Cleaned up the i40e_process_skb_fields() to align with
      other Intel drivers.  Removed FCoE code, since it is not supported in any
      of the Fortville/Fortpark hardware, so there is not much point in carrying
      the code around, especially if it is broken and untested.
      
      Harshitha fixes a bug in the driver where the calculation of the RSS size
      was not taking into account the number of traffic classes enabled.
      
      Robert fixes a potential race condition during VF reset, eliminating
      IOMMU DMAR faults caused by VF hardware when the OS initiates a VF
      reset and we modify the VF's settings before the reset has finished.
      
      Bimmy removes a delay that is no longer needed, since it was only needed
      for preproduction hardware.
      
      Colin King fixes a null pointer dereference, where VSI was being
      dereferenced before the VSI NULL check.
      
      Jake fixes an issue with the recent addition of the "client code" to the
      driver, where we attempt to use an uninitialized variable, so correctly
      initialize the params variable by calling i40e_client_get_params().
      
      v2: dropped patch 5 of the original series from Carolyn since we need
          more documentation and the reasoning for the added delay, so Carolyn
          is taking the time to update the patch before we re-submit it for
          kernel inclusion.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      205ed44e
  2. 27 Mar, 2017 20 commits
  3. 26 Mar, 2017 10 commits
  4. 25 Mar, 2017 3 commits
    • Merge branch 'epoll-busypoll' · 2239cc63
      David S. Miller authored
      Alexander Duyck says:
      
      ====================
      Add busy poll support for epoll
      
      This patch set adds support for using busy polling with epoll. The main
      idea behind this is that we record the NAPI ID for the last event that is
      moved onto the ready list for the epoll context and then when we no longer
      have any events on the ready list we begin polling with that ID. If the
      busy polling does not yield any events then we will reset the NAPI ID to 0
      and wait until a new event is added to the ready list with a valid NAPI ID
      before we will resume busy polling.
      
      Most of the changes in this set authored by me are meant to be cleanup or
      fixes for various things. For example, I am trying to make it so that we
      don't perform hash look-ups for the NAPI instance when we are only working
      with sender_cpu and the like.
      
      At the heart of this set are the last 3 patches, which enable epoll support
      and add support for obtaining the NAPI ID of a given socket. With these it
      becomes possible for an application to make use of epoll and get optimal
      busy poll utilization by stacking multiple sockets with the same NAPI ID on
      the same epoll context.
      
      v1: The first version of this series only allowed epoll to busy poll if all
          of the sockets with a NAPI ID shared the same NAPI ID. I feel we were
          too strict with this requirement, so I changed the behavior for v2.
      v2: The second version was pretty much a full rewrite of the first set. The
          main changes consisted of pulling apart several patches to better
          address the need to clean up a few items and to make the code easier to
          review. In the set however I went a bit overboard and was trying to fix
          an issue that would only occur with 500+ years of uptime, and in the
          process limited the range for busy_poll/busy_read unnecessarily.
      v3: Split off the code for limiting busy_poll and busy_read into a separate
          patch for net.
          Updated patch that changed busy loop time tracking so that it uses
          "local_clock() >> 10" as we originally did.
          Tweaked "Change return type.." patch by moving declaration of "work"
          inside the loop where it was accessed and always reset to 0.
          Added "Acked-by" for patches that received acks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2239cc63
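The NAPI ID bookkeeping described in the cover letter (record the ID of the last ready event, poll with it once the ready list is empty, reset it to 0 when polling finds nothing) can be modelled as a small state machine. This is only a sketch of that description: `toy_epoll`, `toy_event_ready` and `toy_wait_empty` are hypothetical names, and the number of events a busy poll finds is passed in as a parameter because no real device is involved.

```c
#include <stdbool.h>

/* Toy epoll context; not the kernel's struct eventpoll. */
struct toy_epoll {
    unsigned int napi_id;  /* 0 means "no ID to busy poll on" */
    int ready;             /* number of events on the ready list */
};

/* Called when a socket event is moved onto the ready list. */
static void toy_event_ready(struct toy_epoll *ep, unsigned int sock_napi_id)
{
    ep->ready++;
    if (sock_napi_id != 0)       /* only valid IDs are recorded */
        ep->napi_id = sock_napi_id;
}

/*
 * Called when a waiter looks for events. Returns true if a busy poll
 * was attempted. polled_events is how many events that poll produced.
 */
static bool toy_wait_empty(struct toy_epoll *ep, int polled_events)
{
    if (ep->ready > 0 || ep->napi_id == 0)
        return false;            /* events pending, or nothing to poll on */

    if (polled_events == 0)
        ep->napi_id = 0;         /* fruitless poll: stop until a new ID arrives */
    else
        ep->ready += polled_events;
    return true;
}
```

After a fruitless poll the context falls back to normal waiting until `toy_event_ready()` records a valid ID again.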
    • net: Introduce SO_INCOMING_NAPI_ID · 6d433902
      Sridhar Samudrala authored
      This socket option returns the NAPI ID associated with the queue on which
      the last frame is received. This information can be used by the apps to
      split the incoming flows among the threads based on the Rx queue on which
      they are received.
      
      If the NAPI ID actually represents a sender_cpu then the value is ignored
      and 0 is returned.
      Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d433902
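From user space the option is read with `getsockopt()` at the `SOL_SOCKET` level. The helper below is a minimal sketch: `query_incoming_napi_id` is a hypothetical name, and the fallback value 56 is an assumption that matches the generic socket header but may differ on some architectures, so prefer the definition from the system headers when it exists.

```c
#include <sys/socket.h>

/* Assumption: 56 is the asm-generic value; some architectures differ. */
#ifndef SO_INCOMING_NAPI_ID
#define SO_INCOMING_NAPI_ID 56
#endif

/*
 * Stores the NAPI ID reported for fd in *napi_id and returns 0,
 * or returns -1 (errno set by getsockopt) on failure.
 * An ID of 0 means no usable NAPI ID is recorded for the socket.
 */
static int query_incoming_napi_id(int fd, unsigned int *napi_id)
{
    socklen_t len = sizeof(*napi_id);

    *napi_id = 0;
    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, napi_id, &len) < 0)
        return -1;
    return 0;
}
```

A server would typically call this on a connected socket after the first data has arrived, then hand the socket to the thread that serves that Rx queue. Kernels without this option are expected to fail the call, which the helper reports as -1.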
    • epoll: Add busy poll support to epoll with socket fds. · bf3b9f63
      Sridhar Samudrala authored
      This patch adds busy poll support to epoll. The implementation is meant to
      be opportunistic in that it will take the NAPI ID from the last socket
      that is added to the ready list that contains a valid NAPI ID and it will
      use that for busy polling until the ready list goes empty.  Once the ready
      list goes empty the NAPI ID is reset and busy polling is disabled until a
      new socket is added to the ready list.
      
      In addition when we insert a new socket into the epoll we record the NAPI
      ID and assume we are going to receive events on it.  If that doesn't occur
      it will be evicted as the active NAPI ID and we will resume normal
      behavior.
      
      An application can use SO_INCOMING_CPU or SO_ATTACH_REUSEPORT_CBPF/EBPF socket
      options to spread the incoming connections to specific worker threads
      based on the incoming queue. This enables epoll for each worker thread
      to have only sockets that receive packets from a single queue. So when an
      application calls epoll_wait() and there are no events available to report,
      busy polling is done on the associated queue to pull the packets.
      Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf3b9f63
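To keep each worker's epoll instance limited to sockets from a single queue, an application needs some mapping from NAPI ID to worker. The table below is one possible sketch of that bookkeeping, not something taken from the patch: `toy_worker_table`, `toy_pick_worker` and the fixed size of four workers are all assumptions. The NAPI ID itself would come from the SO_INCOMING_NAPI_ID option introduced in 6d433902.

```c
#define TOY_MAX_WORKERS 4

/* One slot per worker thread; each worker owns one epoll instance. */
struct toy_worker_table {
    unsigned int napi_id[TOY_MAX_WORKERS];  /* 0 means the slot is free */
};

/*
 * Returns the index of the worker that should own a socket with this
 * NAPI ID. Sockets sharing an ID always map to the same worker.
 * Returns -1 if the ID is 0 (unknown) or every slot is taken.
 */
static int toy_pick_worker(struct toy_worker_table *t, unsigned int napi_id)
{
    int free_slot = -1;

    if (napi_id == 0)
        return -1;

    for (int i = 0; i < TOY_MAX_WORKERS; i++) {
        if (t->napi_id[i] == napi_id)
            return i;                       /* queue already has a worker */
        if (t->napi_id[i] == 0 && free_slot < 0)
            free_slot = i;                  /* remember first free slot */
    }

    if (free_slot >= 0)
        t->napi_id[free_slot] = napi_id;    /* claim it for this queue */
    return free_slot;
}
```

The chosen worker would then register the socket on its own epoll instance with `epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev)`, so that an `epoll_wait()` with no pending events busy polls a single queue.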