  1. 27 Jan, 2012 1 commit
    • RDMA/ucma: Discard all events for new connections until accepted · 9ced69ca
      Sean Hefty authored
      After reporting a new connection request to user space, the rdma_ucm
      will discard subsequent events until the user has associated a user
      space identifier with the kernel cm_id.  This is needed to avoid
      reporting a reject/disconnect event to the user for a request that
      they may not have processed.
      
      The user space identifier is set once the user tries to accept the
      connection request.  However, the following race exists in ucma_accept():
      
      	ctx->uid = cmd.uid;
      	<events may be reported now>
      	ret = rdma_accept(ctx->cm_id, ...);
      
      Once ctx->uid has been set, new events may be reported to the user.
      While the above mentioned race is avoided, there is an issue that the
      user _may_ receive a reject/disconnect event if rdma_accept() fails,
      depending on when the event is processed.  To simplify the use of
      rdma_accept(), discard all events unless rdma_accept() succeeds.
      
      This problem was discovered based on questions from Roland Dreier
      <roland@purestorage.com>.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
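The ordering described in this commit can be sketched as a small user-space model. This is a sketch under stated assumptions, not the kernel code: `model_ctx`, `model_accept`, and `model_event_visible` are hypothetical names, and `accept_ret` stands in for the return value of rdma_accept().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-connection context kept by rdma_ucm. */
struct model_ctx {
    uint64_t uid;   /* user-space identifier; 0 means "not accepted yet" */
};

/* Events are reported only once a user-space identifier has been published. */
bool model_event_visible(const struct model_ctx *ctx)
{
    return ctx->uid != 0;
}

/*
 * Model of the accept path. The identifier is published only when the
 * accept step succeeded, so a failed accept leaves the context in the
 * "discard events" state.
 */
int model_accept(struct model_ctx *ctx, uint64_t uid, int accept_ret)
{
    assert(uid != 0);   /* 0 is reserved for "no identifier" in this model */
    if (accept_ret == 0)
        ctx->uid = uid;
    return accept_ret;
}
```

In this model a failed accept never makes later reject/disconnect events visible, which is the simplification the commit message asks for.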
  2. 31 Oct, 2011 1 commit
  3. 13 Oct, 2011 1 commit
  4. 06 Oct, 2011 1 commit
  5. 25 May, 2011 1 commit
    • RDMA/cma: Pass QP type into rdma_create_id() · b26f9b99
      Sean Hefty authored
      The RDMA CM currently infers the QP type from the port space selected
      by the user.  In the future (e.g., with RDMA_PS_IB or XRC), there may not
      be a 1-1 correspondence between port space and QP type.  For netlink
      export of RDMA CM state, we want to export the QP type to userspace,
      so it is cleaner to explicitly associate a QP type to an ID.
      
      Modify rdma_create_id() to allow the user to specify the QP type, and
      use it to make our selections of datagram versus connected mode.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
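A minimal sketch of the idea of storing the QP type on the ID instead of inferring it from the port space. All enum, struct, and function names here are hypothetical and chosen for illustration; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enums for this sketch; not the kernel's definitions. */
enum model_port_space { MODEL_PS_TCP, MODEL_PS_UDP, MODEL_PS_IB };
enum model_qp_type { MODEL_QPT_RC, MODEL_QPT_UD };

struct model_cm_id {
    enum model_port_space ps;
    enum model_qp_type qp_type;   /* stored explicitly at creation time */
};

/* The caller names both the port space and the QP type. */
struct model_cm_id model_create_id(enum model_port_space ps,
                                   enum model_qp_type qp_type)
{
    struct model_cm_id id = { .ps = ps, .qp_type = qp_type };
    return id;
}

/* Datagram versus connected mode is decided from the QP type alone. */
bool model_is_datagram(const struct model_cm_id *id)
{
    assert(id != NULL);
    return id->qp_type == MODEL_QPT_UD;
}
```

With this shape, one port space can carry IDs of different QP types, which is the case the commit message anticipates.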
  6. 23 May, 2011 1 commit
  7. 10 May, 2011 1 commit
    • RDMA/cma: Add an ID_REUSEADDR option · a9bb7912
      Hefty, Sean authored
      Lustre requires that clients bind to a privileged port number before
      connecting to a remote server.  On larger clusters (typically more
      than about 1000 nodes), the number of privileged ports is exhausted,
      resulting in Lustre being unusable.
      
      To handle this, we add support for reusable addresses to the rdma_cm.
      This mimics the behavior of the socket option SO_REUSEADDR.  A user
      may set an rdma_cm_id to reuse an address before calling
      rdma_bind_addr() (explicitly or implicitly).  If set, other
      rdma_cm_ids may be bound to the same address, provided that they all
      have reuse enabled, and there are no active listens.
      
      If rdma_listen() is called on an rdma_cm_id that has reuse enabled, it
      will only succeed if there are no other ids bound to that same
      address.  The reuse option is exported to user space.  The behavior of
      the kernel reuse implementation was verified against that given by
      sockets.
      
      This patch is derived from a patch by Ira Weiny <weiny2@llnl.gov>.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
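The bind and listen rules stated in this commit can be sketched as a small table model. This is an illustration only: the `model_*` names are hypothetical, a port number stands in for the full address, and the kernel's actual data structures are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_IDS 8

/* Hypothetical record of one bound identifier. */
struct model_id {
    int  port;       /* bound port; stands in for the full address */
    bool reuse;      /* reuse option set before binding */
    bool listening;  /* true after a successful listen */
};

struct model_table {
    struct model_id ids[MODEL_MAX_IDS];
    size_t count;
};

/* Bind succeeds when the port is free, or when the newcomer and every
 * existing holder have reuse enabled and none of the holders is listening.
 * Returns the index of the new entry, or -1 on failure. */
int model_bind(struct model_table *t, int port, bool reuse)
{
    if (t->count == MODEL_MAX_IDS)
        return -1;
    for (size_t i = 0; i < t->count; i++) {
        const struct model_id *cur = &t->ids[i];
        if (cur->port != port)
            continue;
        if (!reuse || !cur->reuse || cur->listening)
            return -1;
    }
    t->ids[t->count] = (struct model_id){ .port = port, .reuse = reuse };
    return (int)t->count++;
}

/* Listen succeeds only when no other identifier is bound to the same port. */
int model_listen(struct model_table *t, int idx)
{
    assert(idx >= 0 && (size_t)idx < t->count);
    for (size_t i = 0; i < t->count; i++)
        if ((int)i != idx && t->ids[i].port == t->ids[idx].port)
            return -1;
    t->ids[idx].listening = true;
    return 0;
}
```

Two reuse-enabled binds to the same port succeed in this model, but neither can then listen, and a listening ID blocks any further bind to its port.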
  8. 29 Jan, 2011 1 commit
  9. 25 Oct, 2010 1 commit
    • IB/core: Add VLAN support for IBoE · af7bd463
      Eli Cohen authored
      Add 802.1q VLAN support to IBoE. The VLAN tag is encoded within the
      GID derived from a link local address in the following way:
      
          GID[11] and GID[12] contain the VLAN ID when the GID contains a VLAN.
      
      The 3-bit user priority field of the packets is identical to the 3
      bits of the SL.
      
      For rdma_cm apps, the SL field is generated from the TOS field by
      shifting it right by 5 bits, effectively taking the 3 most significant
      bits of the TOS field.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
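The encoding described in this commit can be sketched as follows. The helper names are hypothetical. Two details are assumptions not stated in the commit text: that byte 11 holds the high bits of the VLAN ID and byte 12 the low bits, and that a value of 0x1000 or more in those two bytes means the GID carries no VLAN.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NO_VLAN 0xffff

/* Store a 12-bit VLAN ID in bytes 11 and 12 of a 16-byte GID.
 * Assumption: byte 11 holds the high bits and byte 12 the low bits. */
void model_gid_set_vlan(uint8_t gid[16], uint16_t vid)
{
    assert(vid < 0x1000);
    gid[11] = (uint8_t)(vid >> 8);
    gid[12] = (uint8_t)(vid & 0xff);
}

/* Read the VLAN ID back. Assumption: values of 0x1000 or more in these
 * bytes mean the GID does not contain a VLAN. */
uint16_t model_gid_get_vlan(const uint8_t gid[16])
{
    uint16_t vid = (uint16_t)((gid[11] << 8) | gid[12]);
    return vid < 0x1000 ? vid : MODEL_NO_VLAN;
}

/* Derive the 3-bit SL from the 3 most significant bits of the TOS byte. */
uint8_t model_tos_to_sl(uint8_t tos)
{
    return (uint8_t)(tos >> 5);
}
```

For example, a TOS of 0xe0 maps to SL 7 in this sketch, while any TOS below 0x20 maps to SL 0.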
  10. 23 Oct, 2010 1 commit
    • RDMA/ucma: Allow tuning the max listen backlog · 97cb7e40
      Steve Wise authored
      For iWARP connections, the connect request is carried in a TCP payload
      on an already established TCP connection.  So if the ucma's backlog is
      full, the connection request is transmitted and acked at the TCP level
      by the time the connect request gets dropped in the ucma.  The end
      result is the connection gets rejected by the iWARP provider.
      Further, a 32 node 256NP OpenMPI job will generate > 128 connect
      requests on some ranks.
      
      This patch increases the default max backlog to 1024, and adds a
      sysctl variable so the backlog can be adjusted at run time.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
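A minimal sketch of how a tunable maximum might be applied to a requested backlog. The commit text only states that the default maximum becomes 1024 and that it is adjustable at run time; the clamping rule and the names `model_max_backlog` and `model_effective_backlog` are assumptions made for illustration.

```c
#include <assert.h>

/* Default taken from the commit text; a plain variable stands in for the
 * sysctl so the limit can be changed at run time in this model. */
int model_max_backlog = 1024;

/* Assumed rule: use the requested backlog when it is positive and below
 * the limit; otherwise fall back to the limit. */
int model_effective_backlog(int requested)
{
    assert(model_max_backlog > 0);
    return (requested > 0 && requested < model_max_backlog)
        ? requested : model_max_backlog;
}
```

Lowering `model_max_backlog` in this sketch immediately caps later requests, which is the kind of run-time adjustment the commit describes.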
  11. 13 Oct, 2010 1 commit
  12. 21 Apr, 2010 1 commit
    • IB: Explicitly rule out llseek to avoid BKL in default_llseek() · bc1db9af
      Roland Dreier authored
      Several RDMA user-access drivers have file_operations structures with
      no .llseek method set.  None of the drivers actually do anything with
      f_pos, so this means llseek is essentially a NOP, instead of returning
      an error as leaving other file_operations methods unimplemented would
      do.  This is mostly harmless, except that a NULL .llseek means that
      default_llseek() is used, and this function grabs the BKL, which we
      would like to avoid.
      
      Since llseek does nothing useful on these files, we would like it to
      return an error to userspace instead of silently grabbing the BKL and
      succeeding.  For nearly all of the file types, we take the
      belt-and-suspenders approach of setting the .llseek method to
      no_llseek and also calling nonseekable_open(); the exception is the
      uverbs_event files, which are created with anon_inode_getfile(), which
      already sets f_mode the same way as nonseekable_open() would.
      
      This work is motivated by Arnd Bergmann's bkl-removal tree.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  13. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  14. 19 Nov, 2009 1 commit
    • RDMA/cm: fix loopback address support · 6f8372b6
      Sean Hefty authored
      The RDMA CM is intended to support the use of a loopback address
      when establishing a connection; however, the behavior of the CM
      when loopback addresses are used is confusing and does not always
      work, depending on whether loopback was specified by the server,
      the client, or both.
      
      The defined behavior of rdma_bind_addr is to associate an RDMA
      device with an rdma_cm_id, as long as the user specified a non-zero
      address (i.e., they weren't just trying to reserve a port).
      Currently, if the loopback address is passed to rdma_bind_addr,
      no device is associated with the rdma_cm_id.  Fix this.
      
      If a loopback address is specified by the client as the destination
      address for a connection, it will fail to establish a connection.
      This is true even if the server is listening across all addresses or
      on the loopback address itself.  The issue is that the server tries
      to translate the IP address carried in the REQ message to a local
      net_device address, which fails.  The translation is not needed in
      this case, since the REQ carries the actual HW address that should
      be used.
      
      Finally, clean up loopback support to be more transport neutral.
      Replace separate calls to get/set the sgid and dgid from the
      device address with a single call that behaves correctly depending
      on the format of the device address.  Also support both IPv4 and
      IPv6 address formats.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      
      [ Fixed RDS build by s/ib_addr_get/rdma_addr_get/  - Roland ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
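Handling loopback for both IPv4 and IPv6 address formats can be illustrated with a user-space check. This is a sketch, not the kernel's implementation: `model_is_loopback` is a hypothetical name, and treating the whole 127.0.0.0/8 range as IPv4 loopback is an assumption of the sketch.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>

/* Report whether a socket address is a loopback address, for either
 * address family. Other families are reported as not loopback. */
bool model_is_loopback(const struct sockaddr *sa)
{
    assert(sa != NULL);
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *in4 = (const struct sockaddr_in *)sa;
        /* Assumption: any address in 127.0.0.0/8 counts as loopback. */
        return (ntohl(in4->sin_addr.s_addr) >> 24) == 127;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *in6 = (const struct sockaddr_in6 *)sa;
        return IN6_IS_ADDR_LOOPBACK(&in6->sin6_addr);
    }
    return false;
}
```

A single entry point that dispatches on the address family mirrors the commit's goal of one call that behaves correctly for each address format.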
  15. 16 Nov, 2009 1 commit
    • RDMA/ucma: Add option to manually set IB path · a7ca1f00
      Sean Hefty authored
      Export rdma_set_ib_paths to user space to allow applications to
      manually set the IB path used for connections.  This allows
      alternative ways for a user space application or library to obtain
      path record information, including retrieving path information
      from cached data, avoiding direct interaction with the IB SA.
      The IB SA is a single, centralized entity that can limit scaling
      on large clusters running MPI applications.
      
      Future changes to the rdma cm can expand on this framework to
      support the full range of features allowed by the IB CM, such as
      separate forward and reverse paths and APM.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  16. 11 Oct, 2009 1 commit
  17. 10 Oct, 2008 1 commit
  18. 04 Aug, 2008 1 commit
    • RDMA/cma: Remove padding arrays by using struct sockaddr_storage · 3f446754
      Roland Dreier authored
      There are a few places where the RDMA CM code handles IPv6 by doing
      
      	struct sockaddr		addr;
      	u8			pad[sizeof(struct sockaddr_in6) -
      				    sizeof(struct sockaddr)];
      
      This is fragile and ugly; handle this in a better way with just
      
      	struct sockaddr_storage	addr;
      
      [ Also roll in patch from Aleksey Senin <alekseys@voltaire.com> to
        switch to struct sockaddr_storage and get rid of padding arrays in
        struct rdma_addr. ]
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
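A user-space sketch of why struct sockaddr_storage removes the need for a padding array: it is defined to be large enough and suitably aligned for any supported socket address. The `model_addr` type and `model_store_addr` helper are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* One field that can hold an IPv4 or an IPv6 socket address. */
struct model_addr {
    struct sockaddr_storage addr;
};

/* Copy an IPv4 or IPv6 address into the storage. Returns the stored
 * address family, or -1 for an unsupported family. */
int model_store_addr(struct model_addr *dst, const struct sockaddr *src)
{
    size_t len;

    if (src->sa_family == AF_INET)
        len = sizeof(struct sockaddr_in);
    else if (src->sa_family == AF_INET6)
        len = sizeof(struct sockaddr_in6);
    else
        return -1;

    assert(len <= sizeof(dst->addr));   /* storage always fits either type */
    memset(&dst->addr, 0, sizeof(dst->addr));
    memcpy(&dst->addr, src, len);
    return dst->addr.ss_family;
}
```

No size arithmetic between address types is needed here, which is the fragility the padding-array approach had.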
  19. 25 Jul, 2008 1 commit
  20. 20 Jun, 2008 1 commit
  21. 17 Apr, 2008 1 commit
  22. 25 Jan, 2008 1 commit
    • RDMA/cma: add support for rdma_migrate_id() · 88314e4d
      Sean Hefty authored
      This is based on user feedback from Doug Ledford at Red Hat:
      
      Events that occur on an rdma_cm_id are reported to userspace through an
      event channel.  Connection request events are reported on the event
      channel associated with the listen.  When the connection is accepted, a
      new rdma_cm_id is created and automatically uses the listen event
      channel.  This is suboptimal where the user only wants listen events on
      that channel.
      
      Additionally, it may be desirable to have events related to connection
      establishment use a different event channel than those related to
      already established connections.
      
      Allow the user to migrate an rdma_cm_id between event channels. All
      pending events associated with the rdma_cm_id are moved to the new event
      channel.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
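Moving pending events between event channels can be sketched with two simple queues. This is a model under stated assumptions, not the rdma_cm implementation: the `model_*` types and functions are hypothetical, and an integer stands in for the rdma_cm_id.

```c
#include <assert.h>
#include <stddef.h>

struct model_event {
    int id;                      /* identifier the event belongs to */
    struct model_event *next;
};

struct model_channel {
    struct model_event *head;
    struct model_event **tail;   /* address of the last next pointer */
};

void model_channel_init(struct model_channel *ch)
{
    ch->head = NULL;
    ch->tail = &ch->head;
}

void model_channel_push(struct model_channel *ch, struct model_event *ev)
{
    ev->next = NULL;
    *ch->tail = ev;
    ch->tail = &ev->next;
}

/* Move every pending event for id from one channel to another, keeping
 * the relative order of the moved events. Returns the number moved. */
int model_migrate(struct model_channel *from, struct model_channel *to, int id)
{
    int moved = 0;
    struct model_event **link = &from->head;

    assert(from != to);
    while (*link != NULL) {
        struct model_event *ev = *link;
        if (ev->id == id) {
            *link = ev->next;            /* unlink from the old channel */
            model_channel_push(to, ev);
            moved++;
        } else {
            link = &ev->next;
        }
    }
    from->tail = link;                   /* link now addresses the final NULL */
    return moved;
}
```

Events that belong to other identifiers stay on the original channel, so a listen channel in this model keeps only the events it should see.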
  23. 10 Oct, 2007 1 commit
  24. 24 Apr, 2007 1 commit
  25. 06 Mar, 2007 1 commit
  26. 16 Feb, 2007 1 commit
    • RDMA/cma: Add multicast communication support · c8f6a362
      Sean Hefty authored
      Extend rdma_cm to support multicast communication.  Multicast support
      is added to the existing RDMA_PS_UDP port space, as well as a new
      RDMA_PS_IPOIB port space.  The latter port space allows joining the
      multicast groups used by IPoIB, which enables offloading IPoIB traffic
      to a separate QP.  The port space determines the signature used in the
      MGID when joining the group.  The newly added RDMA_PS_IPOIB also
      allows for unicast operations, similar to RDMA_PS_UDP.
      
      Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
      since we can no longer assume that the qkey is constant.  This requires
      saving the Q_Key to use when attaching to a device, so that it is
      available when creating the QP.  The Q_Key information is exported to
      the user through the existing rdma_init_qp_attr() interface.
      
      Multicast support is also exported to userspace through the rdma_ucm.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
  27. 12 Feb, 2007 1 commit
  28. 08 Jan, 2007 2 commits
    • RDMA/ucma: Don't report events with invalid user context · 0cefcf0b
      Sean Hefty authored
      There's a problem with how rdma cm events are reported to userspace
      that can lead to application crashes.
      
      When a new connection request arrives, a context for the connection is
      allocated in the kernel.  The connection event is then reported to
      userspace.  The userspace library retrieves the event and allocates
      its own context for the connection.  The userspace context is
      associated with the kernel's context when accepting.  This allows the
      kernel to return the userspace context with later events.
      
      A problem occurs if a second event for the same connection occurs
      before the user has had a chance to call accept.  The userspace
      context has not yet been set, which causes the librdmacm to crash.
      (This has been seen when the app takes too long to call accept,
      resulting in the remote side timing out and rejecting the connection)
      
      Fix this by ignoring events for new connections until userspace has
      set their context.  This can only happen if an error occurs on a new
      connection before the user accepts it.  This is okay, since the accept
      will just fail later.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
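The filtering rule in this commit can be sketched as a predicate on incoming events. The event types, struct, and function below are hypothetical names for illustration; a zero `uid` is assumed to mean that userspace has not yet set its context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event types for this sketch. */
enum model_event_type {
    MODEL_CONNECT_REQUEST,
    MODEL_REJECTED,
    MODEL_DISCONNECTED
};

struct model_conn {
    uint64_t uid;   /* userspace context; 0 until userspace sets it */
};

/* Returns true when the event should be queued for userspace. A new
 * connection request is always reported; any other event is ignored
 * until the userspace context has been set. */
bool model_should_report(const struct model_conn *conn,
                         enum model_event_type type)
{
    assert(conn != NULL);
    if (type == MODEL_CONNECT_REQUEST)
        return true;
    return conn->uid != 0;
}
```

A reject that arrives before the application accepts is dropped in this model, so the library never sees an event without a context attached.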
    • RDMA/ucma: Fix struct ucma_event leak when backlog is full · 30a5ec98
      Sean Hefty authored
      We discard new connection requests while the listen backlog is full,
      but leak a struct ucma_event in the process.  Free the structure in
      this case.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
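The shape of this fix can be sketched in user space: an event structure is allocated first, and the drop path must free it before returning. The names are hypothetical, and `model_live_events` is a counter added only so the sketch can show that nothing is leaked.

```c
#include <assert.h>
#include <stdlib.h>

struct model_uevent {
    int type;
};

struct model_listener {
    int backlog;   /* remaining slots for unaccepted connection requests */
};

int model_live_events;   /* allocations that have not been freed yet */

/* Returns 0 and hands the event to the caller when a slot is available.
 * Returns -1 when the request is dropped or memory is exhausted; on the
 * drop path the freshly allocated event is freed before returning. */
int model_handle_request(struct model_listener *l, struct model_uevent **out)
{
    struct model_uevent *ev = calloc(1, sizeof(*ev));

    assert(out != NULL);
    if (ev == NULL)
        return -1;
    model_live_events++;

    if (l->backlog == 0) {
        free(ev);              /* without this the event would leak */
        model_live_events--;
        return -1;
    }
    l->backlog--;
    *out = ev;                 /* ownership passes to the caller */
    return 0;
}
```

In this model the caller frees events it receives; dropped requests leave the live count unchanged.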
  29. 12 Dec, 2006 1 commit