1. 21 Oct, 2010 1 commit
    • ipvs: fix CHECKSUM_PARTIAL for TCP, UDP · 5bc9068e
      Julian Anastasov authored
      	Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP;
      UDP is not tested because it needs a network card with HW CSUM support.
      This may fix the problem where IPVS cannot be used in virtual boxes.
      The problem appears with DNAT to a local address when the local stack
      sends the reply in CHECKSUM_PARTIAL mode.
      
      	Fix tcp_dnat_handler and udp_dnat_handler to pass
      vaddr and daddr in the right order (old IP, then new IP) when calling
      tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL).
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
  2. 19 Oct, 2010 3 commits
    • Fixed race condition at ip_vs.ko module init. · d86bef73
      Eduardo Blanco authored
      Lists were initialized after the module was registered. Multiple ipvsadm
      processes running at module load triggered a race condition that resulted
      in a null pointer dereference in do_ip_vs_get_ctl(). As a result,
      __ip_vs_mutex was left locked, preventing all further ipvsadm commands.
      Signed-off-by: Eduardo J. Blanco <ejblanco@google.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: IPv6 tunnel mode · 714f095f
      Hans Schillstrom authored
      IPv6 encapsulation uses a bad source address for the tunnel,
      i.e. the VIP is used both as the tunnel's local address and as the
      destination address of the encapsulated packet.
      Decapsulation will not accept this.
      
      Example
      LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
         (eth0 2003::1:0:1/96)
      RS  (ethX 2003::1:0:5/96)
      
      tcpdump
      2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40)  2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0
      
      In the Linux IPv6 implementation, a tunnel with an anycast address
      cannot receive packets (I have not tried to interpret RFC 2473).
      To be able to receive, the tunnel must have:
       - Local address set to a multicast or a unicast address.
       - Remote address set to a unicast address.
       - Loopback and link-local addresses are not allowed.
      
      This requires setting up a tunnel in the Real Server with the
      LVS as the remote address; the VIP address cannot be used here since
      it is used inside the tunnel.
      
      Solution
      Use outgoing interface IPv6 address (match against the destination).
      i.e. use ip6_route_output() to look up the route cache and
      then use ipv6_dev_get_saddr(...) to set the source address of the
      encapsulated packet.
      
      Additionally, cache the results in new destination
      fields, dst_cookie and dst_saddr, and properly check the
      dst returned from ip6_route_output. The xfrm_lookup
      call is now made only for the tunneling method, where the
      source address is a local one.
      Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: ctnetlink: add expectation deletion events · ebbf41df
      Pablo Neira Ayuso authored
      This patch allows listening for events that report
      destroyed expectations.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  3. 18 Oct, 2010 2 commits
  4. 13 Oct, 2010 6 commits
  5. 04 Oct, 2010 17 commits
  6. 28 Sep, 2010 1 commit
    • netfilter: ctnetlink: add support for user-space expectation helpers · bc01befd
      Pablo Neira Ayuso authored
      This patch adds the basic infrastructure to support user-space
      expectation helpers via ctnetlink and the netfilter queuing
      infrastructure NFQUEUE. Basically, this patch:
      
      * adds the NF_CT_EXPECT_USERSPACE flag to identify expectations
        created from user-space. I have also added a sanity check in
        __nf_ct_expect_check() to prevent kernel-space helpers
        from creating an expectation if the master conntrack has no
        helper assigned.
      * adds some branches to check if the master conntrack helper
        exists; otherwise we skip the code that refers to the kernel-space
        helper, such as the local expectation list and the expectation
        policy.
      * allows setting the timeout for user-space expectations with
        no helper assigned.
      * adds a list of expectations created from user-space that depends
        on ctnetlink (if this module is removed, they are deleted).
      * includes USERSPACE in the /proc output for expectations
        that have been created by a user-space helper.
      
      This patch also modifies ctnetlink to skip including the helper
      name in the Netlink messages if no kernel-space helper is set
      (since user-space expectations have no kernel-space helper
      assigned).
      
      You can access an example user-space FTP conntrack helper at:
      http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-userspace-POC.tar.bz
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  7. 22 Sep, 2010 3 commits
  8. 21 Sep, 2010 4 commits
    • ipvs: changes related to service usecnt · 26c15cfd
      Julian Anastasov authored
      	Change the usage of svc usecnt during command execution:
      
      - we check if svc is registered but we do not need to hold usecnt
      reference while under __ip_vs_mutex, only the packet handling needs
      it during scheduling
      
      - change __ip_vs_service_get to __ip_vs_service_find and
      __ip_vs_svc_fwm_get to __ip_vs_svc_fwm_find because the caller
      now increases svc->usecnt
      
      - put common code that calls update_service in __ip_vs_update_dest
      
      - put common code in ip_vs_unlink_service() and use it to unregister
      the service
      
      - add comment that svc should not be accessed after ip_vs_del_service
      anymore
      
      - all IP_VS_WAIT_WHILE calls are now unified: usecnt > 0
      
      - Properly log the app ports
      
      	As a result, some problems are fixed:
      
      - possible use-after-free of svc in ip_vs_genl_set_cmd after
      ip_vs_del_service, because our usecnt reference does not guarantee that
      svc is not freed on refcnt==0, e.g. when no dests are moved to trash
      
      - possible usecnt leak in do_ip_vs_set_ctl after ip_vs_del_service
      when the service is not freed immediately, for example when some
      destinations are moved into trash and svc->refcnt remains above 0.
      It is harmless because svc is not in the hash anymore.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: save the hash of the tuple in the original direction for later use · 99f07e91
      Changli Gao authored
      Since we don't change the tuple in the original direction, we can save it
      in ct->tuplehash[IP_CT_DIR_REPLY].hnode.pprev for __nf_conntrack_confirm()
      use.
      
      __hash_conntrack() is split into two steps: hash_conntrack_raw() is used
      to get the raw hash, and __hash_bucket() is used to get the bucket id.
      
      In the SYN-flood case, early_drop() no longer needs to recompute the hash.
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • ipvs: make rerouting optional with snat_reroute · 8a803040
      Julian Anastasov authored
      	Add new sysctl flag "snat_reroute". Recent kernels use
      ip_route_me_harder() to route LVS-NAT responses properly by
      VIP when there are multiple paths to the client. Setups
      that do not have alternative default routes can skip this
      routing lookup by setting snat_reroute=0.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • ipvs: netfilter connection tracking changes · f4bc17cd
      Julian Anastasov authored
      	Add more code to IPVS to work with Netfilter connection
      tracking and fix some problems.
      
      - Allow IPVS to be compiled without connection tracking, as in
      2.6.35 and before. This avoids keeping conntracks for all
      IPVS connections, which costs memory. ip_vs_ftp still
      depends on connection tracking and NAT as implemented for 2.6.36.
      
      - Add sysctl var "conntrack" to enable connection tracking for
      all IPVS connections. On loaded IPVS directors this requires
      tuning the nf_conntrack_max limit.
      
      - Add IP_VS_CONN_F_NFCT connection flag to request the connection
      to use connection tracking. This allows user space to provide this
      flag, for example, in dest->conn_flags. This can be useful to
      request connection tracking per real server instead of forcing it
      for all connections with the "conntrack" sysctl. This flag is
      set currently only by ip_vs_ftp and of course by "conntrack" sysctl.
      
      - Add ip_vs_nfct.c file to hold all connection tracking code,
      so that the main code does not depend on netfilter conntrack
      support.
      
      - Restore the ip_vs_post_routing handler as in 2.6.35 and use
      skb->ipvs_property=1 to allow IPVS to work without connection
      tracking
      
      Connection tracking:
      
      - most of the code is already in 2.6.36-rc
      
      - alter conntrack reply tuple for LVS-NAT connections when first packet
      from client is forwarded and conntrack state is NEW or RELATED.
      Additionally, alter reply for RELATED connections from real server,
      again for packet in original direction.
      
      - add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering
      reply) for LVS-TUN early because we want to call nf_reset. It is
      needed because we add IPIP header and the original conntrack
      should be preserved, not destroyed. The transmitted IPIP packets
      can reuse same conntrack, so we do not set skb->ipvs_property.
      
      - try to destroy conntrack when the IPVS connection is destroyed.
      It is not fatal if conntrack disappears before that, it depends
      on the used timers.
      
      Fix long-standing problems:
      
      - add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  9. 17 Sep, 2010 1 commit
  10. 16 Sep, 2010 2 commits