1. 25 May, 2016 5 commits
    • Mark Bloch's avatar
      IB/IPoIB: Allow setting the device address · 492a7e67
      Mark Bloch authored
      In IB networks, and specifically in IPoIB/rdmacm traffic, the device
      address of an IPoIB interface is used as a means to exchange information
      between nodes needed for communication.
      
      Currently an IPoIB interface will always be created with a device
      address based on its node GUID without a way to change that.
      
      This change adds the ability to set the device address of an IPoIB
      interface by value. We use the set mac address ndo to do that.
      
      The flow should be broken down to two:
      1) The GID value is already in the GID table,
         in this case the interface will be able to set carrier up.
      
      2) The GID value is not yet in the GID table,
         in this case the interface won't try to join the multicast group
         and will wait (listen on GID_CHANGE event) until the GID is inserted.
      
      In order to track those changes, we add a new flag:
      * IPOIB_FLAG_DEV_ADDR_SET.
      
      When set, it means the dev_addr is a based on a value in the gid
      table. this bit will be cleared upon a dev_addr change triggered
      by the user and set after validation.
      
      Per IB spec the port GUID can't change if the module is loaded.
      port GUID is the basis for GID at index 0 which is the basis for
      the default device address of a ipoib interface.
      
      The issue is that there are devices that don't follow the spec,
      they change the port GUID while HCA is powered on, so in order
      not to break userspace applications. We need to check if the
      user wanted to control the device address and we assume that
      if he sets the device address back to be based on GID index 0,
      he no longer wishs to control it.
      
      In order to track this, we add an additional flag:
      * IPOIB_FLAG_DEV_ADDR_CTRL
      
      When setting the device address, there is no validation of the upper
      twelve bytes of the device address (flags, qpn, subnet prefix) as those
      bytes are not under the control of the user.
      Signed-off-by: default avatarMark Bloch <markb@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      492a7e67
    • Erez Shitrit's avatar
      IB/ipoib: Support SendOnlyFullMember MCG for SendOnly join · 3b561130
      Erez Shitrit authored
      Check (via an SA query) if the SM supports the new option for SendOnly
      multicast joins.
      If the SM supports that option it will use the new join state to create
      such multicast group.
      If SendOnlyFullMember is supported, we wouldn't use faked FullMember state
      join for SendOnly MCG, use the correct state if supported.
      
      This check is performed at every invocation of mcast_restart task, to be
      sure that the driver stays in sync with the current state of the SM.
      Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      3b561130
    • Erez Shitrit's avatar
      IB/core: Support new type of join-state for multicast · cd6e9b7e
      Erez Shitrit authored
      There are four types for MCG, FullMember, NonMember, SendOnlyNonMember,
      and the new added type: SendOnlyFullMember.
      Add support for the new SendOnlyFullMember join state.
      
      The new type allows host to send join request as sendonly, it will cause the
      group to be created but without getting packets from this multicast back to the
      host.
      Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: default avatarChristoph Lameter <cl@linux.com>
      Reviewed-by: default avatarIra Weiny <ira.weiny@intel.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      cd6e9b7e
    • Erez Shitrit's avatar
      IB/SA Agent: Add support for SA agent get ClassPortInfo · 628e6f75
      Erez Shitrit authored
      New SA query function to return the ClassPortInfo struct from the SA.
      If the SM supports FullMemberSendOnly mode for MCG's, it sets a
      capability bit in the capability_mask2 field of the response.
      Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      628e6f75
    • Erez Shitrit's avatar
      IB/core: Introduce capabilitymask2 field in ClassPortInfo mad · 507f6afa
      Erez Shitrit authored
      Change struct ib_class_port_info to conform to IB Spec 1.3
      That in order to get specific capability mask from ClassPortInfo mad.
      
      >From the IB Spec, ClassPortInfo section:
              "CapabilityMask2 Bits 0-26: Additional class-specific capabilities...
               RespTimeValue the rest 5 bits"
      
      The new struct now has one field for capabilitymask2 (previously was the
      reserved field) and the resp_time field.
      
      And it fixes up qib and srpt, use of the field repurposed to be used as
      capabilitymask2:
      IB/qib: Change pma_get_classportinfo
      IB/srpt: Adjust the use of ib_class_port_info
      Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@mellanox.com>
      Reviewed-by: default avatarHal Rosenstock <hal@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      507f6afa
  2. 18 May, 2016 4 commits
    • Matan Barak's avatar
      IB/mlx5: Fire the CQ completion handler from tasklet · c16d2750
      Matan Barak authored
      Previously, mlx5_ib_cq_comp was executed from interrupt context.
      Under heavy load, this could cause the CPU core to be in an interrupt
      context too long.
      Instead of executing the handler from the interrupt context we
      execute it from a much friendly tasklet context.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      c16d2750
    • Matan Barak's avatar
      net/mlx5_core: Use tasklet for user-space CQ completion events · 94c6825e
      Matan Barak authored
      Previously, we've fired all our completion callbacks straight from
      our ISR.
      
      Some of those callbacks were lightweight (for example, mlx5 Ethernet
      napi callbacks), but some of them did more work (for example,
      the user-space RDMA stack uverbs' completion handler). Besides that,
      doing more than the minimal work in ISR is generally considered wrong,
      it could even lead to a hard lockup of the system. Since when a lot
      of completion events are generated by the hardware, the loop over
      those events could be so long, that we'll get into a hard lockup by
      the system watchdog.
      
      In order to avoid that, add a new way of invoking completion events
      callbacks. In the interrupt itself, we add the CQs which receive
      completion event to a per-EQ list and schedule a tasklet. In the
      tasklet context we loop over all the CQs in the list and invoke the
      user callback.
      Signed-off-by: default avatarMatan Barak <matanb@mellanox.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      94c6825e
    • Christoph Lameter's avatar
      IB/core: Do not require CAP_NET_ADMIN for packet sniffing · e3b6d8cf
      Christoph Lameter authored
      In the Ethernet/TCP world, CAP_NET_RAW is sufficient to allow a program
      to listen to all incoming packets on a specific interface, and the
      higher CAP_NET_ADMIN is required to set the interface into promiscuous
      mode.  We want to emulate that same basic division of privilege in the
      RDMA stack, so when dealing with Raw Ethernet QPs, allow apps with
      CAP_NET_RAW to listen to all incoming flows (and direct them as they see
      fit in their own listen stream).  Do not require CAP_NET_ADMIN just to
      listen to traffic already incoming.  Reserve CAP_NET_ADMIN if we attempt
      to set promiscuous mode.
      Signed-off-by: default avatarChristoph Lameter <cl@linux.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      e3b6d8cf
    • shamir rabinovitch's avatar
      IB/mlx4: Fix unaligned access in send_reply_to_slave · 04ef0f1a
      shamir rabinovitch authored
      The problem is that the function 'send_reply_to_slave' gets the
      'req_sa_mad' as a pointer whose address is only aliged to 4 bytes
      but is 8 bytes in size.  This can result in unaligned access faults
      on certain architectures.
      
      Sowmini Varadhan pointed to this reply from Dave Miller that say
      that memcpy should not be used to solve alignment issues:
      https://lkml.org/lkml/2015/10/21/352
      
      Optimization of memcpy to 'ldx' instruction can only happen if the
      compiler knows that the size of the data we are copying is 8 bytes
      and it assumes it is aligned to 8 bytes. If the compiler know the
      type is not aligned to 8 it must not optimize the 8 byte copy.
      Defining the data type as aligned to 4 forces the compiler to treat
      all accesses as though they aren't aligned and avoids the 'ldx'
      optimization.
      
      Full credit for the idea goes to Jason Gunthorpe
      <jgunthorpe@obsidianresearch.com>.
      Signed-off-by: default avatarShamir Rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      04ef0f1a
  3. 13 May, 2016 31 commits