1. 26 Jan, 2015 22 commits
  2. 25 Jan, 2015 18 commits
    • net: ipv6: Add sysctl entry to disable MTU updates from RA · c2943f14
      Harout Hedeshian authored
      The kernel forcefully applies MTU values received in router
      advertisements, provided the new MTU is less than the current one. This
      behavior is undesirable when user space is managing the MTU. Instead,
      a sysctl flag 'accept_ra_mtu' is introduced so that user space
      can control whether or not RA-provided MTU updates should be applied. The
      default behavior is unchanged; user space must explicitly set this flag
      to 0 for RA MTUs to be ignored.
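      The decision described above can be modelled as a small pure function. This is a hedged sketch, not the kernel's ndisc code. The function name `ra_mtu_to_apply` and its parameters are hypothetical. The 1280-byte lower bound (the IPv6 minimum link MTU) is an added assumption that the message does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* IPv6 minimum link MTU in bytes (assumed floor for RA-supplied MTUs). */
#define IPV6_MIN_LINK_MTU 1280u

/*
 * Hypothetical model: returns the MTU the interface should use after an
 * RA carrying ra_mtu is processed. accept_ra_mtu mirrors the new sysctl;
 * true keeps the old (default) behaviour.
 */
static uint32_t ra_mtu_to_apply(bool accept_ra_mtu, uint32_t current_mtu,
                                uint32_t ra_mtu)
{
    if (!accept_ra_mtu)
        return current_mtu;      /* user space manages the MTU */
    if (ra_mtu < IPV6_MIN_LINK_MTU)
        return current_mtu;      /* invalid value: ignore it */
    if (ra_mtu < current_mtu)
        return ra_mtu;           /* an RA may only lower the MTU */
    return current_mtu;
}
```

      On a running system the flag itself is a per-interface sysctl, e.g. `sysctl -w net.ipv6.conf.eth0.accept_ra_mtu=0`, where eth0 is a placeholder interface name.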
      Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fib_trie_next' · 46a93af2
      David S. Miller authored
      Alexander Duyck says:
      
      ====================
      Fixes and improvements for recent fib_trie updates
      
      While performing testing and prepping the next round of patches I found a
      few minor issues and improvements that could be made.
      
      These changes should help to reduce the overall code size and improve the
      performance slightly, as I noticed a 20ns or so improvement in my worst-case
      testing, which will likely only result in a 1ns difference with a
      standard-sized trie.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Various clean-ups for handling slen · 64c62723
      Alexander Duyck authored
      While doing further work on the fib_trie I noted a few items.
      
      First, I was using calls that were far more complicated than they needed to
      be for determining when to push/pull the suffix length.  I have updated the
      code to reflect the simpler logic.
      
      The second issue is that I realised we weren't necessarily handling the
      case of a leaf_info struct surviving a flush.  I have updated the logic so
      that now we will call pull_suffix in the event of having a leaf info value
      left in the leaf after flushing it.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Move fib_find_alias to file where it is used · 02525368
      Alexander Duyck authored
      The function fib_find_alias is only accessed by functions in fib_trie.c, so
      it makes sense to relocate it and declare it static so that the
      compiler can apply the optimizations available to a local
      function.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Use empty_children instead of counting empty nodes in stats collection · 30cfe7c9
      Alexander Duyck authored
      It doesn't make much sense to count the NULL pointers ourselves when
      empty_children already holds the number of NULL pointers stored
      in the tnode.  As such, save ourselves the cycles and just use
      empty_children.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Add collapse() and should_collapse() to resize · 95f60ea3
      Alexander Duyck authored
      This patch really does two things.
      
      First, it pulls the logic for determining whether we should collapse one
      node out of the tree, and the actual code doing the collapse, into a
      separate pair of functions.  This helps to make the changes to these areas
      more readable.
      
      Second, it encodes the upper 32 bits of the empty_children value onto the
      full_children value in the case of bits == KEYLENGTH.  By doing this we are
      able to handle the case of a 32-bit node where empty_children would appear
      to be 0 when it was actually 1ul << 32.
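      Here is a minimal model of the overflow and of the encoding described above. The struct and helpers are hypothetical, and the arithmetic is a sketch rather than the fib_trie source. The model assumes that a node with bits == KEYLENGTH can only have leaves as children. Its real full_children count is then always zero, which leaves the field free to carry bit 32 of the empty count.

```c
#include <stdbool.h>
#include <stdint.h>

#define KEYLENGTH 32   /* IPv4 key width in bits */

/* Hypothetical cut-down tnode: only the counters needed here. */
struct tnode_counts {
    unsigned int bits;         /* node has 1ULL << bits child slots */
    uint32_t empty_children;   /* low 32 bits of the empty-slot count */
    uint32_t full_children;    /* for bits == KEYLENGTH: holds bit 32 */
};

/* Increment empty_children, carrying into full_children on wrap. */
static void empty_child_inc(struct tnode_counts *tn)
{
    if (++tn->empty_children == 0)
        ++tn->full_children;
}

/* Decrement empty_children, borrowing from full_children on wrap. */
static void empty_child_dec(struct tnode_counts *tn)
{
    if (tn->empty_children-- == 0)
        --tn->full_children;
}

/* Collapse when fewer than two child slots are in use. */
static bool should_collapse(const struct tnode_counts *tn)
{
    uint64_t used = (UINT64_C(1) << tn->bits) - tn->empty_children;

    /* For a 32-bit node, a non-zero full_children means 2^32 more empties. */
    if (tn->bits == KEYLENGTH && tn->full_children)
        used -= UINT64_C(1) << 32;
    return used < 2;
}
```

      Without the carry, an all-empty 32-bit node would report empty_children == 0, so `used` would come out as 2^32 and the node would never be collapsed.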
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Fall back to slen update on inflate/halve failure · a80e89d4
      Alexander Duyck authored
      This change corrects an issue where, if inflate or halve failed, we were
      exiting the resize function without at least updating the slen for the
      node.  To correct this I have moved the update of max_size into the while
      loop so that it is only decremented on a successful call to either inflate
      or halve.
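      The control flow of the fix can be sketched as below. Everything here is hypothetical and exists only for illustration: the node type, the RESIZE_BUDGET value and the simulated allocation failure. In this model a successful inflate is assumed to recompute slen itself. Suppose max_size were decremented before inflate was known to have succeeded. A failed first attempt would then look like completed work, and the slen update would be skipped. Counting only successes lets the failure fall through to that update.

```c
#include <stdbool.h>

#define RESIZE_BUDGET 10   /* hypothetical cap on inflate steps */

/* Hypothetical stand-in for a trie node. */
struct node {
    int inflates_wanted;   /* how many more inflates the node wants */
    int allocs_left;       /* successful allocations before failure */
    bool slen_updated;
};

static bool should_inflate(const struct node *n)
{
    return n->inflates_wanted > 0;
}

/* Returns false to simulate an allocation failure inside inflate. */
static bool inflate(struct node *n)
{
    if (n->allocs_left <= 0)
        return false;
    n->allocs_left--;
    n->inflates_wanted--;
    return true;
}

static void update_slen(struct node *n)
{
    n->slen_updated = true;
}

/* Returns the number of inflates that actually succeeded. */
static int resize(struct node *n)
{
    int max_size = RESIZE_BUDGET;

    while (should_inflate(n) && max_size) {
        if (!inflate(n))
            break;         /* failure: leave the loop, don't return */
        max_size--;        /* count only work that really happened */
    }

    /* Nothing was resized (e.g. inflate failed), so at least fix slen. */
    if (max_size == RESIZE_BUDGET)
        update_slen(n);

    return RESIZE_BUDGET - max_size;
}
```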
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Fix RCU bug and merge similar bits of inflate/halve · 69fa57b1
      Alexander Duyck authored
      This patch addresses two issues.
      
      The first issue is that I believe I had the RCU freeing sequence
      slightly out of order.  As a result we could run into trouble if a caller
      went into a child of a child of the new node, then backtracked into the
      to-be-freed parent, and then attempted to access a child of a child that
      may have been consumed in a resize of one of the new node's children.  To
      resolve this I have moved the resize to after the oldtnode has been freed.
      The only side effect of this is that we will now be calling resize on more
      nodes in the case of inflate, because we don't have a good way to test
      whether a full_tnode on the new node was there before or after the
      allocation.  This should have minimal impact, however, since the node
      should already be correctly sized, so the only cost is the call to
      should_inflate on that node, which is only a couple of cycles.
      
      The second issue is that inflate and halve were essentially doing
      the same thing after the new node was added to the trie, replacing the old
      one.  As such it wasn't really necessary to keep the code in both functions,
      so I have split it out into two other functions, called replace and
      update_children.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib_trie: Use index & (~0ul << n->bits) instead of index >> n->bits · b3832117
      Alexander Duyck authored
      While doing performance testing and analysis of the recent changes, I found
      that by shifting the index I had created an unnecessary dependency.
      
      I have updated the code so that we instead shift a mask by bits and then
      just test against that, which should save us about 2 CPU cycles since we
      can generate the mask while the key and pos are being processed.
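      The two range tests are equivalent for any shift smaller than the word width. They differ only in what the CPU has to wait for. This sketch checks the equivalence; the function names are hypothetical and a 64-bit index is assumed. The saving of about 2 cycles is the author's figure from the message above, and the code does not demonstrate it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Both helpers ask whether index has any bit set at or above position
 * `bits`, i.e. whether it falls outside the node's 2^bits child slots.
 * Valid for bits < 64; shifting a 64-bit value by 64 is undefined.
 */
static bool out_of_range_shift(uint64_t index, unsigned int bits)
{
    /* The shift cannot start until index itself is known. */
    return (index >> bits) != 0;
}

static bool out_of_range_mask(uint64_t index, unsigned int bits)
{
    /* The mask depends only on bits, so it can be built early. */
    uint64_t mask = ~UINT64_C(0) << bits;

    return (index & mask) != 0;
}
```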
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx4-next' · bc579ae5
      David S. Miller authored
      Or Gerlitz says:
      
      ====================
      mlx4: Fix and enhance the device reset flow
      
      This series from Yishai Hadas fixes the device reset flow and adds SRIOV support.
      
      Reset flows are required whenever a device experiences errors, is unresponsive,
      or is not in a deterministic state. In such cases, the driver is expected to
      reset the HW and continue operation. When SRIOV is enabled, these requirements
      apply to both PF and VF devices.
      
      Currently, the mlx4 reset flow doesn't work properly: when a fatal error is
      detected on the FW internal buffer, the chip is not reset and stays in its
      bad state. Some cases that should be treated as fatal, such as non-responsive
      FW or errors on closing commands, are not handled today.
      
      The AER mechanism should also be fixed:
      - It should use mlx4_load_one instead of __mlx4_init_one, which is used
        upon HCA probing.
      - It must be synchronized with the concurrent catas flow: mark the device
        as being in an error state, reset the chip, etc.
      - Port types should be restored to the values they had before the error
        occurred.
      
      In addition, the SRIOV use case isn't supported.
      
      In the above cases, when the device state becomes fatal, we must act as follows:
      1) Reset the chip and mark the HW device state as in fatal error.
      2) Wake up any pending commands, preventing new ones from coming in.
      3) Restart the software stack.
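      The three steps above can be sketched as a tiny single-threaded state machine. The flag names, struct and functions are hypothetical. Locking, FW interaction and the actual chip reset are all omitted.

```c
#include <stdbool.h>

/* Hypothetical device-state flags (names are illustrative only). */
enum {
    DEV_STATE_UP             = 1 << 0,
    DEV_STATE_INTERNAL_ERROR = 1 << 1,
};

struct dev_model {
    unsigned int state;
    int pending_cmds;      /* commands waiting for FW completion */
    int chip_resets;
    int stack_restarts;
};

/* Steps 1 and 2: reset the chip, mark the error, release waiters. */
static void enter_error_state(struct dev_model *d)
{
    if (d->state & DEV_STATE_INTERNAL_ERROR)
        return;                            /* already handled once */
    d->chip_resets++;                      /* 1) reset the chip ... */
    d->state |= DEV_STATE_INTERNAL_ERROR;  /*    ... and mark fatal error */
    d->pending_cmds = 0;                   /* 2) wake all pending commands */
}

/* New commands are refused while the device is in the error state. */
static bool post_cmd(struct dev_model *d)
{
    if (d->state & DEV_STATE_INTERNAL_ERROR)
        return false;
    d->pending_cmds++;
    return true;
}

/* Step 3: restart the software stack, then clear the error. */
static void restart_stack(struct dev_model *d)
{
    if (!(d->state & DEV_STATE_INTERNAL_ERROR))
        return;
    d->stack_restarts++;
    d->state = DEV_STATE_UP;
}
```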
      
      We also address the SRIOV mode as follows: if the PF detects a fatal error,
      it notifies the VFs, and then the PF and the VFs are restarted asynchronously.
      However, if only a VF encountered a fatal error or was forced to reset, it
      resets its own VF state and then restarts its software stack.
      
      changes from V0:
      
      No need to call pci_disable_device upon permanent PCI error. This will
      be done as part of mlx4_remove_one which is called later once we
      return PCI_ERS_RESULT_DISCONNECT from the PCI error handler.
      
      The initial toggle value should use only the T bit and not the whole byte
      value. Not doing so sometimes broke SRIOV, because the VF saw the junk
      value as a non-ready comm channel.
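      A little bit arithmetic shows why the whole byte is wrong. This minimal sketch assumes the T bit is the most significant bit of a 32-bit comm-channel word that is already in CPU byte order. That layout, the macro and the function names are all assumptions for illustration.

```c
#include <stdint.h>

/* Assumed layout: bit 31 is the toggle (T) bit; the other bits of the
 * top byte carry unrelated data. */
#define COMM_TOGGLE_SHIFT 31

/* Buggy variant: takes the whole top byte as the initial toggle. */
static uint32_t initial_toggle_byte(uint32_t reg)
{
    return reg >> 24;
}

/* Fixed variant: keeps only the T bit, so the value is always 0 or 1. */
static uint32_t initial_toggle_bit(uint32_t reg)
{
    return reg >> COMM_TOGGLE_SHIFT;
}

/* The channel looks ready when the expected toggle matches T. */
static int toggle_matches(uint32_t reg, uint32_t expected_toggle)
{
    return (reg >> COMM_TOGGLE_SHIFT) == expected_toggle;
}
```

      With any other bit set in the top byte, the byte-sized value can never equal the 0/1 toggle read later, so the channel never looks ready.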
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Reset flow activation upon SRIOV fatal command cases · 0cd93027
      Yishai Hadas authored
      When SRIOV commands are executed over the comm channel and get
      a fatal error (e.g. timeout, closing command failure), the VF enters
      an error state and the reset flow is activated.
      
      To recognize whether the failure was on a closing command, the
      opcode of the given VHCR command is used. Once the device has entered
      an error state, we prevent redundant error messages from being printed.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Enable device recovery flow with SRIOV · 55ad3592
      Yishai Hadas authored
      In SRIOV, both the PF and the VF may attempt device recovery whenever they
      assume that the device is not functioning.  When the PF driver resets the
      device, the VF should detect this and attempt to reinitialize itself.
      
      The VF must be able to reset itself under all circumstances, even
      if the PF is not responsive.
      
      The VF shall reset itself in the following cases:
      
      1. Commands are not processed within a reasonable time over the communication channel.
      This takes into account the device state and returns the correct code based on
      the command, as is done in native mode; this is implemented in the next patch.
      
      2. The VF driver receives an internal error event reported by the PF on the
      communication channel. This occurs when the PF driver resets the device or
      when the VF is out of sync with the PF.
      
      Add a 'VF reset' capability, which allows the VF to reinitialize itself even when the
      PF is not responsive.
      
      As the PF and VF may run their reset flows simultaneously, the following cases
      are handled:
      - Prevent freeing VF resources upon FLR when the PF is in its unloading stage.
      - Prevent the PF from receiving VF commands before it has finished initializing its resources.
      - Upon VF startup, check that the comm channel is online before sending
        commands to the PF, rather than timing out.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Handle AER flow properly · 2ba5fbd6
      Yishai Hadas authored
      Fix the AER callbacks to work properly. This includes:
      - Refactoring AER to be aligned with the reset flow support.
      - Syncing with the concurrent catas flow.
      
      In addition, fix the shutdown PCI callback to sync with the
      concurrent catas flow.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Manage interface state for Reset flow cases · c69453e2
      Yishai Hadas authored
      We need to manage the interface state to synchronize the reset flow with
      other related cases, such as remove_one. This has to be done to prevent certain
      races. For example, if the software stack is down as a result of an unload call,
      remove_one should skip the unload phase.
      
      This patch implements the remove_one case; handling AER and other cases comes next.
      
      The interface can be up or down. Upon remove_one, the state will include an extra
      bit indicating that the device is being cleaned up, forcing other tasks to finish
      before the final cleanup.
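      Two bits are enough to model the interface-state rules described above. This is a hypothetical single-threaded sketch: the flag names and functions are illustrative, and real code would need a lock around these checks.

```c
#include <stdbool.h>

/* Hypothetical interface-state bits modelling the description above. */
enum {
    IF_STATE_UP       = 1 << 0,   /* software stack is loaded */
    IF_STATE_DELETION = 1 << 1,   /* remove_one has started */
};

struct intf_model {
    unsigned int state;
    int unload_calls;
};

static void unload(struct intf_model *i)
{
    i->unload_calls++;
    i->state &= ~(unsigned int)IF_STATE_UP;
}

/* Reset-flow restart: must not reload a device that is being removed. */
static bool try_restart(struct intf_model *i)
{
    if (i->state & IF_STATE_DELETION)
        return false;
    if (i->state & IF_STATE_UP)
        unload(i);
    i->state |= IF_STATE_UP;          /* reload the stack */
    return true;
}

static void remove_one(struct intf_model *i)
{
    i->state |= IF_STATE_DELETION;    /* block further restarts */
    if (i->state & IF_STATE_UP)       /* skip unload if already down */
        unload(i);
}
```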
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Activate reset flow upon fatal command cases · f5aef5aa
      Yishai Hadas authored
      We activate the reset flow upon fatal command errors, when the device enters an
      erroneous state and must be reset.
      
      The following cases are assumed to be fatal: a FW command timeout, an error from FW
      on closing commands, and PCI being offline when posting or pending a command.
      
      In those cases we place the device into an error state: the chip is reset, and pending
      commands are awakened and completed immediately. Subsequent commands will
      return immediately.
      
      The return code in the above cases will depend on the command. Commands which
      free and close resources will return success (because the chip was reset, so
      callers may safely free their kernel resources). Other commands will return -EIO.
      
      Since the device's state was marked as error, the catas poller will
      detect this and restart the device's software stack (as is done when a FW
      internal error is directly detected). The device state is protected by a
      persistent mutex that lives in its mlx4_dev, so the hcr_mutex is no longer
      needed and is removed.
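      The per-command return-code policy can be sketched as follows. The enum is illustrative only: its members borrow the style of mlx4 command names but carry no real opcode values. Which commands count as closing is also an assumption here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command set; values are not real mlx4 opcodes. */
enum cmd_op {
    OP_QUERY_PORT,
    OP_SW2HW_CQ,    /* create: hands a CQ to hardware */
    OP_HW2SW_CQ,    /* destroy: takes a CQ back from hardware */
    OP_CLOSE_PORT,
    OP_CLOSE_HCA,
};

/* Commands that free or close resources (assumed list). */
static bool is_closing_cmd(enum cmd_op op)
{
    switch (op) {
    case OP_HW2SW_CQ:
    case OP_CLOSE_PORT:
    case OP_CLOSE_HCA:
        return true;
    default:
        return false;
    }
}

/*
 * Return code for a command issued while the device is in error state.
 * The chip has been reset, so teardown commands report success and let
 * callers free their kernel resources; anything else gets -EIO.
 */
static int internal_err_ret_value(enum cmd_op op)
{
    return is_closing_cmd(op) ? 0 : -EIO;
}
```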
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Enhance the catas flow to support device reset · f6bc11e4
      Yishai Hadas authored
      This includes:
      
      - resetting the chip when a fatal error is detected (the current code
        does not do this).
      
      - exposing the ability to enter the error state from outside the catas code
        by calling its functionality (e.g. FW command timeout, AER error).
      
      - managing a persistent device state. This is needed to sync between
        reset flow cases.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Refactor the catas flow to work per device · ad9a0bf0
      Yishai Hadas authored
      Use a WQ per device instead of a single global WQ; this allows
      independent reset handling per device, even when SRIOV is used.
      
      This is a preparatory patch for supporting chip reset
      in both native and SRIOV modes.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Set device configuration data to be persistent across reset · dd0eefe3
      Yishai Hadas authored
      When an HCA enters an internal error state, this is detected by the driver.
      The driver should then reset the HCA and restart the software stack.
      
      Keep port information and some SRIOV configuration in a persistent area
      so that it remains valid across a reset.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>