1. 02 Jan, 2022 6 commits
    • net/smc: Introduce TCP ULP support · d7cd421d
      Tony Lu authored
      This implements a TCP ULP for SMC, which lets applications replace TCP
      with the SMC protocol in place. We use it to implement transparent
      replacement.
      
      When setsockopt() is called with the TCP_ULP option, the original TCP
      socket is replaced with an SMC socket and the TCP socket is reused as
      the clcsock, without any overhead.
      
      To replace TCP sockets with SMC, there are two approaches:
      
      - Use the setsockopt() syscall with the TCP_ULP option. If it returns an
        error, the socket falls back to TCP.

      - Use a BPF program attached at BPF_CGROUP_INET_SOCK_CREATE or similar
        hooks to replace sockets transparently. BPF hooks socket creation,
        bind and other points, so users can inject their BPF logic without
        modifying their applications and choose which connections should be
        replaced with SMC by calling setsockopt() in the BPF program, based
        on rules such as TCP tuples, PID, cgroup, etc.

        BPF doesn't support calling setsockopt() with TCP_ULP yet; I will send
        those patches after this one is accepted.
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
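The setsockopt() approach from the commit message above can be sketched from user space as follows. This is a minimal sketch, not code from the patch: the helper names try_attach_ulp() and try_enable_smc() are hypothetical, the ULP name "smc" is an assumption about how the module registers itself, and the fallback value 31 for TCP_ULP is only used when the libc headers do not define the constant.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_ULP
#define TCP_ULP 31 /* value from linux/tcp.h, for older libc headers */
#endif

/*
 * Hypothetical helper: ask the kernel to attach the ULP called `name`
 * to a TCP socket. Returns 0 on success or a negative errno value.
 */
static int try_attach_ulp(int fd, const char *name)
{
    assert(name != NULL);
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, name,
                   (socklen_t)strlen(name)) < 0)
        return -errno;
    return 0;
}

/* Assumption: the ULP introduced by this commit registers as "smc". */
static int try_enable_smc(int fd)
{
    return try_attach_ulp(fd, "smc");
}
```

A caller would presumably invoke try_enable_smc(fd) on a freshly created, unconnected TCP socket and treat a negative return value as non-fatal, continuing with plain TCP, which matches the fallback behaviour the commit message describes.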
    • Merge branch 'smc-RDMA-net-namespace' · ab6dd952
      David S. Miller authored
      Tony Lu says:
      
      ====================
      RDMA device net namespace support for SMC
      
      This patch set introduces net namespace support for linkgroups.
      
      Patch 1 is the main approach to implement net ns support.

      Patches 2 - 4 are additional modifications that report the netns to us.
      Also, I will submit the smc-tools changes to GitHub later.
      
      Currently, smc doesn't support net namespace isolation. The ibdevs
      registered to smc are shared by all linkgroups and connections. When
      running applications in different net namespaces, such as in a container
      environment, applications should only use the ibdevs that belong to the
      same net namespace.
      
      This adds a new field, net, to the smc linkgroup struct. During first
      contact, it looks for a linkgroup with the same net namespace; if none
      is found, it creates one and initializes the net field with the net
      namespace of the first link's ibdev. When searching for rdma devices,
      it also checks the net namespaces of the sk's net device and of the
      ibdev. After a net namespace is destroyed, the net device and ibdev
      move to the root net namespace, so its linkgroups no longer match and
      wait to be freed.
      
      If rdma net namespace exclusive mode is not enabled, it behaves as
      before.
      
      Steps to enable and test net namespaces:
      
      1. enable RDMA device net namespace exclusive support
      	rdma system set netns exclusive # default is shared
      
      2. create a new net namespace, then move the devices into it and initialize them
      	ip netns add test1
      	rdma dev set mlx5_1 netns test1
      	ip link set dev eth2 netns test1
      	ip netns exec test1 ip link set eth2 up
      	ip netns exec test1 ip addr add ${HOST_IP}/26 dev eth2
      
      3. set up server and client, connect N <-> M
      	ip netns exec test1 smc_run sockperf server --tcp # server
      	ip netns exec test1 smc_run sockperf pp --tcp -i ${SERVER_IP} # client
      
      4. each netns gets its own isolated linkgroups (2 * 2 mesh)
        - server
      LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
      00000100 SERV     SINGLE      0       0
      00000200 SERV     SINGLE      0       0
      00000300 SERV     SINGLE      0       0
      00000400 SERV     SINGLE      0       0
      
        - client
      LG-ID    LG-Role  LG-Type  VLAN  #Conns  PNET-ID
      00000100 CLNT     SINGLE      0       0
      00000200 CLNT     SINGLE      0       0
      00000300 CLNT     SINGLE      0       0
      00000400 CLNT     SINGLE      0       0
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
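The device-selection rule described in the cover letter above (in exclusive mode a socket may only use an ibdev from its own net namespace, while in shared mode behaviour is unchanged) can be modelled in a few lines. This is a simplified user-space sketch under stated assumptions: struct net_ns, struct rdma_dev and dev_usable() are hypothetical stand-ins, not the kernel's types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net. */
struct net_ns {
    unsigned long cookie;
};

/* Hypothetical stand-in for an RDMA device registered to smc. */
struct rdma_dev {
    const char *name;
    const struct net_ns *net; /* namespace the device currently lives in */
};

/*
 * exclusive == false models the default "shared" mode, in which every
 * namespace may use every device. In exclusive mode the device and the
 * socket must be in the same namespace.
 */
static bool dev_usable(const struct rdma_dev *dev,
                       const struct net_ns *sk_net, bool exclusive)
{
    assert(dev != NULL && sk_net != NULL);
    return !exclusive || dev->net == sk_net;
}
```

With a device moved into test1, as in step 2, a socket in the root namespace would be rejected in exclusive mode but accepted in shared mode under this model.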
    • net/smc: Add net namespace for tracepoints · a838f508
      Tony Lu authored
      This prints the net namespace ID, which helps us distinguish different
      net namespaces when using tracepoints.
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Print net namespace in log · de2fea7b
      Tony Lu authored
      This adds the net namespace ID to the kernel log; net_cookie is unique
      across the whole system. It is useful in container environments.
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Add netlink net namespace support · 79d39fc5
      Tony Lu authored
      This adds the net namespace ID to the linkgroup diag, which helps us
      distinguish different namespaces; net_cookie is unique across the whole
      system.
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Introduce net namespace support for linkgroup · 0237a3a6
      Tony Lu authored
      Currently, rdma devices support exclusive net namespace isolation, but
      linkgroups are not aware of, and do not support, the ibdev net
      namespace. Applications in containers don't want to share the nics if
      rdma exclusive mode is enabled. Every net namespace should have its own
      linkgroups.
      
      This patch introduces a new field, net, for the linkgroup, which stands
      for the ibdev net namespace of the linkgroup. The net in the linkgroup
      is initialized with the net namespace of the link's ibdev. The net of a
      linkgroup is compared with that of the sock or ibdev before the
      linkgroup is chosen; if none matches, a new one is created in the
      current net namespace. If rdma net namespace exclusive mode is not
      enabled, it behaves as before.
      Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
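The lookup rule from the commit message above (reuse a linkgroup only if its net matches, otherwise create a new one) can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code: struct net_ns, struct link_group, lgr_matches() and lgr_find() are hypothetical names, and the other criteria the kernel applies when choosing a linkgroup are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net. */
struct net_ns {
    unsigned long cookie;
};

/* Hypothetical, heavily reduced model of an SMC linkgroup. */
struct link_group {
    unsigned int id;
    const struct net_ns *net; /* netns of the link's ibdev */
};

/* A linkgroup may be reused only from within the same net namespace. */
static bool lgr_matches(const struct link_group *lgr,
                        const struct net_ns *net)
{
    return lgr->net == net;
}

/*
 * Return the first linkgroup whose net matches, or NULL to tell the
 * caller that a new linkgroup must be created in this namespace.
 */
static struct link_group *lgr_find(struct link_group *lgrs, size_t n,
                                   const struct net_ns *net)
{
    assert(lgrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (lgr_matches(&lgrs[i], net))
            return &lgrs[i];
    }
    return NULL;
}
```

In this model a NULL result corresponds to the "create new one in current net namespace" branch, with the new linkgroup's net taken from the link's ibdev.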
  2. 31 Dec, 2021 34 commits