1. 14 May, 2019 4 commits
  2. 07 May, 2019 7 commits
    • Leon Romanovsky's avatar
      RDMA/ipoib: Allow user space differentiate between valid dev_port · b79656ed
      Leon Romanovsky authored
      Systemd triggers the following warning during IPoIB device load:
      
       mlx5_core 0000:00:0c.0 ib0: "systemd-udevd" wants to know my dev_id.
              Should it look at dev_port instead?
              See Documentation/ABI/testing/sysfs-class-net for more info.
      
      This is caused due to user space attempt to differentiate old systems
      without dev_port and new systems with dev_port. In case dev_port will be
      zero, the systemd will try to read dev_id instead.
      
      There is no need to print a warning in such case, because it is valid
      situation and it is needed to ensure systemd compatibility with old
      kernels.
      
      Link: https://github.com/systemd/systemd/blob/master/src/udev/udev-builtin-net_id.c#L358
      Cc: <stable@vger.kernel.org> # 4.19
      Fixes: f6350da4 ("IB/ipoib: Log sysfs 'dev_id' accesses from userspace")
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      b79656ed
    • Dennis Dalessandro's avatar
      IB/core, ipoib: Do not overreact to SM LID change event · ba7d8117
      Dennis Dalessandro authored
      When IPoIB receives an SM LID change event, it reacts by flushing its
      path record cache and rejoining multicast groups. This is the same
      behavior it performs when it receives a reregistration event. This
      behavior is unnecessary as an SM may have database backup or
      synchronization mechanisms which permit the SM location or LID to change
      without loss of multicast membership and without impact to path records.
      
      Both opensm and the OPA FM issue reregistration events if a new SM is
      started (or restarted with a new config) or an SM event occurs which
      results in loss of multicast membership records by the SM (such as
      opensm failover) or the SM encounters new nodes with Active ports (such
      as after joining 2 fabrics by connecting switches via ISLs). Hence this
      event can be depended on as the trigger for IPoIB cache and multicast
      flushing.
      
      It appears that some drivers, such as qib, and hfi1 issue the
      IB_EVENT_SM_CHANGE but other drivers such as mlx4 and mlx5 do not.
      Empirical testing on Mellanox EDR using ibv_asyncwatch has confirmed
      that Mellanox EDR HCAs do not generate SM change events and that opensm
      does generate reregistration.
      
      An SM LID change event is generated by the mentioned drivers to reflect
      that sm_lid and/or sm_sl in the local port info has changed. The intent
      of this event is to permit applications and ULPs which have a local copy
      of this information (or an address handle using it) to update their
      information.
      
      The intent is that the reregistration event (caused by the SM via a bit
      in Set(PortInfo)) be used to inform nodes that they need to rejoin
      multicast groups, resubscribe for notices and potentially update path
      records.
      
      When an SM migrates or fails over, a SM LID change event can occur. In
      response IPoIB discards path records and multicast membership and loses
      connectivity until these records are restored via SA requests. In very
      large fabrics, it may take minutes for the SM to be ready and for the SA
      responses to be supplied.  This can result in undesirable and
      unnecessary IPoIB connectivity impacts. It also can result in an
      unnecessary storm of SA queries from all nodes in a cluster potentially
      followed by yet another storm if the SM issues the reregistration
      request.
      
      The fact the Mellanox HCAs do not even generate this event, is further
      evidence that on modern IB fabrics there will be no ill side effects
      from the proposed changes below to reduce the reaction by 3 kernel
      components to this event. So these changes should be benign for Mellanox
      IB fabrics and will benefit OPA fabrics while also making ib_core and
      ULP behavor "correct" as intended by the IBTA spec and kernel RDMA event
      APIs.
      
      Address these issues by removing IB_EVENT_SM_CHANGE handling from ipoib.
      IPoIB does not locally store sm_lid nor sm_sl, so it does not need to do
      anything on SM LID change. IPoIB makes use of other ib_core components
      to issue SA requests for it and those components correctly track SM LID
      and SM LID changes.
      
      Also in ib_core multicast handling,  remove the test for
      IB_EVENT_SM_CHANGE. This code is moving all multicast groups to the
      error state, which will trigger rejoins. This code is used by IPoIB as
      well as the connection manager and other clients of multicast groups.
      This kernel module centralizes group membership status and joins since a
      node can only join a given group once but multiple ULPs or applications
      may want to join the same group. It makes use of the sa_query.c
      component in ib_core, which correctly trackes SM LID and SL. This
      component does not track SM LID nor SL itself and hence need not react
      to their changes.
      
      Similarly in the ib_core cache code remove the handling for the
      IB_EVENT_SM_CHANGE.  In this function. The ib_cache_update function
      which is ultimately called is updating local copies of the pkey table,
      gid table and lmc. It does not update nor retain sm_lid nor sm_sl. As
      such it does not need to be called on an SM LID change. It technically
      also does not need to be called on a reregistration. The LID_CHANGE,
      PKEY_CHANGE, GID_CHANGE and port state change events (PORT_ERR,
      PORT_ACTICE) should be sufficient triggers.
      
      It is worth noting that the alternative of simply having the hfi1 and
      qib drivers not generate the SM LID change event was explored. While
      this would duplicate what Mellanox drivers do now, it is not the correct
      behavior and removes the ability for an SM to migrate without requiring
      reregistration. Since both opensm and OPA SM have mechanisms to backup
      or synchronize registration information, it is desirable to let them
      perform SM migrations (with LID or SL changes) without requiring
      reregistration when they deem it appropriate.
      Suggested-by: default avatarTodd Rimmer <todd.rimmer@intel.com>
      Tested-by: default avatarMichael Brooks <michael.brooks@intel.com>
      Reviewed-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: default avatarTodd Rimmer <todd.rimmer@intel.com>
      Signed-off-by: default avatarDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      ba7d8117
    • Leon Romanovsky's avatar
      RDMA/device: Don't fire uevent before device is fully initialized · e7a5b4aa
      Leon Romanovsky authored
      When the refcount is 0 the device is invisible to netlink. However in the
      patch below the refcount = 1 was moved to after the device_add().  This
      creates a race where userspace can issue a netlink query after the
      device_add() event and not see the device as visible.
      
      Ensure that no uevent is fired before device is fully registered.
      
      Fixes: d79af724 ("RDMA/device: Expose ib_device_try_get(()")
      Signed-off-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      e7a5b4aa
    • Gal Pressman's avatar
      lib/scatterlist: Remove leftover from sg_page_iter comment · d2c4ada1
      Gal Pressman authored
      Commit d901b276 ("lib/scatterlist: Provide a DMA page iterator") added
      the sg DMA iterator but a leftover remained in the sg_page_iter
      documentation as you cannot get the page dma address (only the page
      itself), fix it.
      Signed-off-by: default avatarGal Pressman <galpress@amazon.com>
      Reviewed-by: default avatarMukesh Ojha <mojha@codeaurora.org>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      d2c4ada1
    • Gal Pressman's avatar
      RDMA/efa: Add driver to Kconfig/Makefile · f23afd75
      Gal Pressman authored
      Add EFA Makefile and Kconfig.
      Signed-off-by: default avatarGal Pressman <galpress@amazon.com>
      Reviewed-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      f23afd75
    • Gal Pressman's avatar
      RDMA/efa: Add the efa module · b7f5e880
      Gal Pressman authored
      Add the main EFA module file which takes care of device
      probe/initialization/registration/etc.
      Signed-off-by: default avatarGal Pressman <galpress@amazon.com>
      Reviewed-by: default avatarShiraz Saleem <shiraz.saleem@intel.com>
      Reviewed-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      b7f5e880
    • Gal Pressman's avatar
      RDMA/efa: Add EFA verbs implementation · 40909f66
      Gal Pressman authored
      Add a file that implements the EFA verbs.
      Signed-off-by: default avatarGal Pressman <galpress@amazon.com>
      Reviewed-by: default avatarShiraz Saleem <shiraz.saleem@intel.com>
      Reviewed-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarJason Gunthorpe <jgg@mellanox.com>
      40909f66
  3. 06 May, 2019 20 commits
  4. 03 May, 2019 9 commits