1. 06 Apr, 2017 9 commits
    • fm10k: do not enqueue mailbox when host not ready · 7d4fe0d1
      Ngai-Mint Kwan authored
      Interfaces will reset whenever the TX mailbox FIFO becomes full. This
      occurs more frequently when the IES API application is not running to
      process and clear the messages in the FIFO, and it can lead to an
      infinite reset loop: if the interface is trying to synchronize a large
      number of unicast and multicast entries with the IES API application,
      the TX mailbox FIFO fills up and the interface resets. Once the
      interface exits reset, it tries to synchronize the unicast and
      multicast entries again, and the cycle repeats. Other actions, such as
      multiple multicast mode changes or up/down transitions, will also fill
      the TX mailbox FIFO and induce a reset. To correct these situations,
      check that the interface's "host_ready" flag is set before enqueuing
      any messages to the TX mailbox FIFO; this check is performed via a
      helper function. This issue mainly affects the PF, so the VF is
      exempt.
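
      A minimal sketch of the gating idea (helper and field names follow the
      patch description; the surrounding code is simplified and partly
      hypothetical):

        /* only enqueue mailbox messages once the switch host is ready */
        static bool fm10k_host_mbx_ready(struct fm10k_intfc *interface)
        {
                return interface->host_ready;
        }

        /* e.g. before requesting a multicast (xcast) mode update */
        if (fm10k_host_mbx_ready(interface))
                hw->mac.ops.update_xcast_mode(hw, interface->glort,
                                              xcast_mode);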
      Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: disable receive queue when configuring ring · 16b1889f
      Ngai-Mint Kwan authored
      Write to the RXQCTL register to disable the receive queue when
      configuring the RX ring.
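
      A hedged sketch of that write (register and helper names as used
      elsewhere in the fm10k driver; exact context differs):

        /* clear the enable bit in RXQCTL before touching the ring */
        u32 rxqctl = fm10k_read_reg(hw, FM10K_RXQCTL(reg_idx));

        rxqctl &= ~FM10K_RXQCTL_ENABLE;
        fm10k_write_reg(hw, FM10K_RXQCTL(reg_idx), rxqctl);
        fm10k_write_flush(hw);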
      Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: update function header comment for fm10k_get_stats64 · 02957703
      Jacob Keller authored
      Re-word the comment to avoid stating that we return a value for this
      void function. Additionally, there is no need to mention older kernels,
      since this is the upstream kernel.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: allow service task to reschedule itself · b4fd8ffc
      Jacob Keller authored
      If some code path executes fm10k_service_event_schedule(), it is
      guaranteed that we only queue the service task once, since we use the
      __FM10K_SERVICE_SCHED flag. Unfortunately, this has the side effect
      that if a service request occurs while we are currently running the
      watchdog, we may fail to notice the request and ignore it until the
      next time a request occurs.
      
      This can cause problems with PF/VF mailbox communication and other
      service event tasks. To avoid this, introduce a FM10K_SERVICE_REQUEST
      bit. When we successfully schedule the service task (and set the
      _SCHED bit), we clear this bit; if we are currently unable to schedule
      the service event, we instead set the new SERVICE_REQUEST bit.
      
      Finally, after the service event completes, we will re-schedule if the
      request bit has been set.
      
      This should ensure that we do not miss any service event schedules,
      since we will re-schedule once the currently running task finishes. As
      a result, for each request, the service task is guaranteed to run in
      full at least once after the request came in.
      
      This will avoid timing issues that can occur with service event
      scheduling. We do pay a cost in re-running many tasks, but all of the
      service event tasks either use flags to avoid duplicate work or are
      tolerant of being run multiple times.
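
      The resulting logic is roughly the following (a sketch based on this
      description, not the verbatim patch):

        static void fm10k_service_event_schedule(struct fm10k_intfc *interface)
        {
                if (!test_bit(__FM10K_SERVICE_DISABLE, interface->state) &&
                    !test_and_set_bit(__FM10K_SERVICE_SCHED, interface->state)) {
                        clear_bit(__FM10K_SERVICE_REQUEST, interface->state);
                        queue_work(fm10k_workqueue, &interface->service_task);
                } else {
                        set_bit(__FM10K_SERVICE_REQUEST, interface->state);
                }
        }

        static void fm10k_service_event_complete(struct fm10k_intfc *interface)
        {
                clear_bit(__FM10K_SERVICE_SCHED, interface->state);

                /* re-run if a request arrived while the task was running */
                if (test_bit(__FM10K_SERVICE_REQUEST, interface->state))
                        fm10k_service_event_schedule(interface);
        }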
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: future-proof state bitmaps using DECLARE_BITMAP · 46929557
      Jacob Keller authored
      This ensures that future programmers do not have to remember to
      re-size the bitmaps when adding new values. Although this is unlikely
      for this driver, it may happen, and it's best to prevent it from ever
      becoming an issue.
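
      For example (a sketch; the enum entries stand in for the driver's
      actual state bits):

        /* the final enum entry sizes the bitmap, so new bits "just work" */
        enum fm10k_state_t {
                __FM10K_RESETTING,
                __FM10K_DOWN,
                __FM10K_SERVICE_SCHED,
                /* add new state bits above this line */
                __FM10K_STATE_SIZE__,
        };

        struct fm10k_intfc {
                /* ... */
                DECLARE_BITMAP(state, __FM10K_STATE_SIZE__);
        };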
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: use a BITMAP for flags to avoid race conditions · 3ee7b3a3
      Jacob Keller authored
      Replace bitwise operators and #defines with a BITMAP and enumeration
      values. This is similar to how we handle the "state" values as well.
      
      This has two distinct advantages over the old method. First, we ensure
      correctness of operations which are currently problematic due to race
      conditions. Suppose that two kernel threads are running, such as the
      watchdog and an ethtool ioctl, and both modify flags. We'll say that the
      watchdog is CPU A, and the ethtool ioctl is CPU B.
      
      CPU A sets FLAG_1, which can be seen as
        CPU A read FLAGS
        CPU A write FLAGS | FLAG_1
      
      CPU B sets FLAG_2, which can be seen as
        CPU B read FLAGS
        CPU B write FLAGS | FLAG_2
      
      However, "|=" and "&=" operators are not actually atomic. So this could
      be ordered like the following:
      
        CPU A read FLAGS -> variable
        CPU B read FLAGS -> variable
        CPU A write FLAGS (variable | FLAG_1)
        CPU B write FLAGS (variable | FLAG_2)
      
      Notice how the second write, from CPU B, could actually undo the write
      from CPU A, because the |= operation is not guaranteed to be atomic.
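
      With a bitmap and atomic bit operations, that lost update cannot
      happen. For instance (flag name illustrative):

        /* racy: a plain read-modify-write can drop a concurrent update */
        interface->flags |= FM10K_FLAG_RESET_REQUESTED;

        /* atomic: cannot drop another CPU's concurrent set_bit() */
        set_bit(FM10K_FLAG_RESET_REQUESTED, interface->flags);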
      
      In practice the race windows for most flag writes are incredibly
      narrow, so it is not easy to isolate issues. However, the more flags
      we have, the more likely they are to cause problems. Additionally, if
      such a problem were to arise, it would be incredibly difficult to
      track down.
      
      Second, there is an additional advantage beyond code correctness. We can
      now automatically size the BITMAP if more flags were added, so that we
      do not need to remember that flags is u32 and thus if we added too many
      flags we would over-run the variable. This is not a likely occurrence
      for fm10k driver, but this patch can serve as an example for other
      drivers which have many more flags.
      
      This particular change does take a bit of care when converting some of
      the idioms previously used with the #defines for flags, specifically
      the FM10K_FLAG_RSS_FIELD_IPV[46]_UDP flags. That conversion was
      actually quite problematic, because those flags were stored
      separately, which made them particularly prone to the re-ordering
      issue described above.
      
      It is really difficult to test whether the atomics make a difference
      in practical scenarios, but we can at least ensure that basic
      functionality remains the same. This patch has a lot of code coverage,
      but most of it is relatively simple.
      
      While we are modifying these files, update their copyright year.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • fm10k: correctly check if interface is removed · 540fca35
      Phil Turnbull authored
      FM10K_REMOVED expects a hardware address, not a 'struct fm10k_hw'.
      
      Fixes: 5cb8db4a ("fm10k: Add support for VF")
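
      The fix is essentially a one-liner of this shape (a sketch; file and
      surrounding context omitted):

        -       if (FM10K_REMOVED(hw))
        +       if (FM10K_REMOVED(hw->hw_addr))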
      Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
      Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • bonding: attempt to better support longer hw addresses · faeeb317
      Jarod Wilson authored
      People are using bonding over Infiniband IPoIB connections, and who
      knows what else. Infiniband has a hardware address length of 20 octets
      (INFINIBAND_ALEN), while the network core defines a MAX_ADDR_LEN of
      32. Various places in the bonding code are currently hard-wired to 6
      octets (ETH_ALEN), such as the 3ad code, which I've left untouched
      here. Besides, only alb is currently possible on Infiniband links
      anyway, due to commit 1533e773, so the alb code is where most of the
      changes are.
      
      One major component of this change is the addition of a
      bond_hw_addr_copy function that takes a length argument, instead of
      using ether_addr_copy everywhere hardware addresses need to be copied.
      The other major component is converting the bonding code from struct
      sockaddr to struct sockaddr_storage for address storage, as the former
      has an address storage space of only 14 octets, while the latter has
      128 minus a few, which is necessary to support bonding over devices
      with hardware addresses of up to MAX_ADDR_LEN octets. Additionally,
      this probably fixes up some memory corruption issues with the current
      code, where it's possible to write an Infiniband hardware address into
      a sockaddr declared on the stack.
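
      A sketch of the helper and a typical caller (field names per the
      kernel's struct __kernel_sockaddr_storage; callers simplified):

        static inline void bond_hw_addr_copy(u8 *dst, const u8 *src,
                                             unsigned int len)
        {
                memcpy(dst, src, len);
        }

        /* e.g. saving a slave's address, whatever its length */
        struct sockaddr_storage ss;

        bond_hw_addr_copy(ss.__data, slave_dev->dev_addr,
                          slave_dev->addr_len);
        ss.ss_family = slave_dev->type;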
      
      Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet
      hardware address now:
      
      $ cat /proc/net/bonding/bond0
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      
      Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
      Primary Slave: mlx4_ib0 (primary_reselect always)
      Currently Active Slave: mlx4_ib0
      MII Status: up
      MII Polling Interval (ms): 100
      Up Delay (ms): 100
      Down Delay (ms): 100
      
      Slave Interface: mlx4_ib0
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr:
      80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01
      Slave queue ID: 0
      
      Slave Interface: mlx4_ib1
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr:
      80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02
      Slave queue ID: 0
      
      Also tested with a standard 1Gbps NIC bonding setup (with a mix of
      e1000 and e1000e cards), running LNST's bonding tests.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sfc: don't insert mc_list on low-latency firmware if it's too long · 148cbab6
      Edward Cree authored
      If the mc_list is longer than 256 addresses, we enter mc_promisc mode.
      If we're in mc_promisc mode and the firmware doesn't support cascaded
      multicast, normally we also insert our mc_list, to prevent stealing by
      another VI. However, if the mc_list was too long, this isn't really
      helpful - the MC groups that didn't fit in the list can still get
      stolen, and having only some of them stealable will probably cause
      more confusing behaviour than having them all stealable. Since
      inserting 256 multicast filters takes a long time and can lead to MCDI
      state machine timeouts, just skip the mc_list insert in this overflow
      condition.
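
      In pseudo-form, the resulting policy looks something like this (names
      approximate, not the verbatim patch):

        /* mc_promisc without cascaded multicast support in firmware */
        if (table->mc_promisc && !nic_data->workaround_26807) {
                /* only pin down the mc_list if it actually fit */
                if (!table->mc_overflow)
                        efx_ef10_filter_insert_addr_list(efx, vlan,
                                                         true, true);
        }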
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 05 Apr, 2017 31 commits