1. 21 Oct, 2010 6 commits
    • ipvs: fix CHECKSUM_PARTIAL for TUN method · 4256f1aa
      Julian Anastasov authored
      The recent change in IP_VS_XMIT_TUNNEL to set
      CHECKSUM_NONE is not correct. After adding the IPIP header,
      skb->csum becomes invalid, but the CHECKSUM_PARTIAL
      case must still be supported. So use skb_forward_csum(), which is
      the most suitable for us, to allow local clients to send IPIP
      to a remote real server.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: stop ICMP from FORWARD to local · 489fdeda
      Julian Anastasov authored
      Delivering ICMP locally from the FORWARD hook is not supported.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: do not schedule conns from real servers · 190ecd27
      Julian Anastasov authored
      This patch is needed to avoid scheduling packets
      from a local real server when we add ip_vs_in
      to the LOCAL_OUT hook to support local clients.

      Currently, when ip_vs_in cannot find an existing
      connection, it tries to create a new one by calling ip_vs_schedule.

      Until now, the only indication from ip_vs_schedule was whether
      the connection was scheduled to a real server. If no real server
      is available, we try to use the bypass forwarding method
      or to send an ICMP error. But in some cases we do not want to use
      the bypass feature. So, add the flag 'ignored' to indicate that
      the scheduler ignores this packet.

      Make sure we do not create new connections from replies.
      We can hit this problem for persistent services and a local real
      server when ip_vs_in is added to the LOCAL_OUT hook to handle
      local clients.

      Also, make sure ip_vs_schedule ignores SYN packets
      for Active FTP DATA from a local real server. The FTP DATA
      connection should be created on SYN+ACK from the client to assign
      the correct connection daddr.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
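The 'ignored' flag described in this commit gives the caller three outcomes to distinguish instead of two. The following is a minimal user-space sketch of that idea, not the kernel code: the types `struct pkt` and `struct conn`, the functions `schedule()` and `handle()`, and the verdict names are all hypothetical. In particular, letting an ignored packet pass untouched is an assumption made for illustration; the commit message only states that the bypass feature must not be used in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet description. */
struct pkt {
    bool from_local_real_server; /* reply traffic seen in LOCAL_OUT */
    bool real_server_available;  /* a destination could be selected */
};

struct conn { int id; };

enum verdict {
    VERDICT_FORWARD,        /* a connection was scheduled */
    VERDICT_BYPASS_OR_ICMP, /* scheduling failed: bypass or ICMP error */
    VERDICT_PASS_UNTOUCHED  /* scheduler ignored the packet (assumed action) */
};

static struct conn the_conn = { 1 };

/*
 * Returns a connection or NULL. When NULL is returned, *ignored tells the
 * caller whether the scheduler deliberately skipped the packet (1) or
 * simply found no real server (0).
 */
static struct conn *schedule(const struct pkt *p, int *ignored)
{
    *ignored = 1;
    if (p->from_local_real_server)
        return NULL; /* never create new connections from replies */

    *ignored = 0;
    if (!p->real_server_available)
        return NULL; /* genuine scheduling failure */

    return &the_conn;
}

/* Caller: only fall back to bypass/ICMP when the packet was not ignored. */
static enum verdict handle(const struct pkt *p)
{
    int ignored;
    struct conn *c = schedule(p, &ignored);

    if (c)
        return VERDICT_FORWARD;
    if (ignored)
        return VERDICT_PASS_UNTOUCHED;
    return VERDICT_BYPASS_OR_ICMP;
}
```

Using an out-parameter keeps the return value a plain connection pointer, so existing callers that only test for NULL need just one extra check.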
    • ipvs: switch to notrack mode · cf356d69
      Julian Anastasov authored
      Change the semantics of skb->ipvs_property. This is preparation
      for supporting ip_vs_out processing in LOCAL_OUT. ipvs_property=1
      will be used to avoid expensive lookups for traffic sent by
      transmitters. Now, when conntrack support is not used, we call
      the ip_vs_notrack method to avoid problems in the OUTPUT and
      POST_ROUTING hooks instead of exiting POST_ROUTING as before.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: optimize checksums for apps · 8b27b10f
      Julian Anastasov authored
      Avoid full checksum calculation for apps that can report
      whether the csum was broken by payload mangling. For now only
      ip_vs_ftp mangles the payload and it updates the csum, so the full
      recalculation is avoided for all packets.

      Add CHECKSUM_UNNECESSARY for snat_handler (TCP and UDP).
      It is needed to support SNAT from a local address for the case
      when the csum is fully recalculated.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: fix CHECKSUM_PARTIAL for TCP, UDP · 5bc9068e
      Julian Anastasov authored
      Fix CHECKSUM_PARTIAL handling. Tested for IPv4 TCP; UDP was
      not tested because it needs a network card with HW CSUM support.
      This may fix the problem where IPVS cannot be used in virtual boxes.
      The problem appears with DNAT to a local address when the local stack
      sends the reply in CHECKSUM_PARTIAL mode.

      Fix tcp_dnat_handler and udp_dnat_handler to provide
      vaddr and daddr in the right order (old and new IP) when calling
      tcp_partial_csum_update/udp_partial_csum_update (CHECKSUM_PARTIAL).
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
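The old/new argument order mentioned in this commit matters because an incremental checksum update subtracts the old value and adds the new one. The sketch below shows this in user space with the RFC 1624 update on a finished one's-complement checksum. It is an illustration only and not the kernel's tcp_partial_csum_update: the helper names `csum_fold32`, `csum_full` and `csum_replace_addr` are hypothetical, and the CHECKSUM_PARTIAL representation used by the kernel is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's-complement sum. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over an array of 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold32(sum);
}

/*
 * Incremental update for a replaced 32-bit address (RFC 1624, eq. 3):
 *   HC' = ~(~HC + ~m + m')
 * The old address is subtracted (added complemented), the new one added,
 * so swapping the two arguments yields a wrong checksum.
 */
static uint16_t csum_replace_addr(uint16_t check, uint32_t old_addr,
                                  uint32_t new_addr)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~(old_addr >> 16);
    sum += (uint16_t)~(old_addr & 0xffff);
    sum += new_addr >> 16;
    sum += new_addr & 0xffff;
    return (uint16_t)~csum_fold32(sum);
}
```

Replacing 192.168.0.1 with 10.0.0.5 through `csum_replace_addr(check, old, new)` gives the same result as recomputing the checksum from scratch, while calling it with the two addresses swapped does not.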
  2. 19 Oct, 2010 3 commits
    • Fixed race condition at ip_vs.ko module init. · d86bef73
      Eduardo Blanco authored
      Lists were initialized after the module was registered.  Multiple ipvsadm
      processes at module load triggered a race condition that resulted in a null
      pointer dereference in do_ip_vs_get_ctl(). As a result, __ip_vs_mutex
      was left locked preventing all further ipvsadm commands.
      Signed-off-by: Eduardo J. Blanco <ejblanco@google.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • ipvs: IPv6 tunnel mode · 714f095f
      Hans Schillstrom authored
      IPv6 encapsulation uses a bad source address for the tunnel,
      i.e. the VIP will be used as the tunnel's local address and as the
      destination address of the encapsulated packet.
      Decapsulation will not accept this.
      
      Example
      LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
         (eth0 2003::1:0:1/96)
      RS  (ethX 2003::1:0:5/96)
      
      tcpdump
      2003::2:0:100 > 2003::1:0:5: IP6 (hlim 63, next-header TCP (6) payload length: 40)  2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312 (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val 1904932 ecr 0,nop,wscale 3], length 0
      
      In the Linux IPv6 implementation you can't have a tunnel with an any cast
      address receiving packets (I have not tried to interpret RFC 2473).
      To have receive capabilities the tunnel must have:
       - Local address set to a multicast or a unicast address.
       - Remote address set to a unicast address.
       - Loopback and link-local addresses are not allowed.

      This forces us to set up a tunnel on the Real Server with the
      LVS as the remote address. You can't use the VIP address here, since
      it is used inside the tunnel.
      
      Solution
      Use the outgoing interface's IPv6 address (matched against the
      destination), i.e. use ip6_route_output() to look up the route cache
      and then use ipv6_dev_get_saddr(...) to set the source address of the
      encapsulated packet.
      
      Additionally, cache the results in new destination
      fields: dst_cookie and dst_saddr and properly check the
      returned dst from ip6_route_output. We now add xfrm_lookup
      call only for the tunneling method where the source address
      is a local one.
      Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: ctnetlink: add expectation deletion events · ebbf41df
      Pablo Neira Ayuso authored
      This patch allows listening to events that report
      destroyed expectations.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  3. 18 Oct, 2010 2 commits
  4. 13 Oct, 2010 6 commits
  5. 04 Oct, 2010 17 commits
  6. 28 Sep, 2010 1 commit
    • netfilter: ctnetlink: add support for user-space expectation helpers · bc01befd
      Pablo Neira Ayuso authored
      This patch adds the basic infrastructure to support user-space
      expectation helpers via ctnetlink and the netfilter queuing
      infrastructure NFQUEUE. Basically, this patch:
      
      * adds the NF_CT_EXPECT_USERSPACE flag to identify expectations
        created from user-space. I have also added a sanity check in
        __nf_ct_expect_check() to prevent kernel-space helpers from
        creating an expectation if the master conntrack has no
        helper assigned.
      * adds some branches to check whether the master conntrack helper
        exists; otherwise we skip the code that refers to the kernel-space
        helper, such as the local expectation list and the expectation
        policy.
      * allows setting the timeout for user-space expectations with
        no helper assigned.
      * adds a list of expectations created from user-space that depends
        on ctnetlink (if this module is removed, they are deleted).
      * includes USERSPACE in the /proc output for expectations
        that have been created by a user-space helper.
      
      This patch also modifies ctnetlink to skip including the helper
      name in the Netlink messages if no kernel-space helper is set
      (since no user-space expectation has a kernel-space helper
      assigned).
      
      You can access an example user-space FTP conntrack helper at:
      http://people.netfilter.org/pablo/userspace-conntrack-helpers/nf-ftp-helper-userspace-POC.tar.bz
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  7. 22 Sep, 2010 3 commits
  8. 21 Sep, 2010 2 commits
    • ipvs: changes related to service usecnt · 26c15cfd
      Julian Anastasov authored
      	Change the usage of svc usecnt during command execution:
      
      - we check if svc is registered but we do not need to hold usecnt
      reference while under __ip_vs_mutex, only the packet handling needs
      it during scheduling
      
      - change __ip_vs_service_get to __ip_vs_service_find and
      __ip_vs_svc_fwm_get to __ip_vs_svc_fwm_find because now caller
      will increase svc->usecnt
      
      - put common code that calls update_service in __ip_vs_update_dest
      
      - put common code in ip_vs_unlink_service() and use it to unregister
      the service
      
      - add comment that svc should not be accessed after ip_vs_del_service
      anymore
      
      - all IP_VS_WAIT_WHILE calls are now unified: usecnt > 0
      
      - Properly log the app ports
      
      As a result, some problems are fixed:

      - possible use-after-free of svc in ip_vs_genl_set_cmd after
      ip_vs_del_service, because our usecnt reference does not guarantee that
      svc is not freed on refcnt==0, e.g. when no dests are moved to trash

      - possible usecnt leak in do_ip_vs_set_ctl after ip_vs_del_service
      when the service is not freed immediately, for example, when some
      destinations are moved into trash and svc->refcnt remains above 0.
      It is harmless because svc is not in the hash anymore.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
    • netfilter: save the hash of the tuple in the original direction for later use · 99f07e91
      Changli Gao authored
      Since we don't change the tuple in the original direction, we can save
      its hash in ct->tuplehash[IP_CT_DIR_REPLY].hnode.pprev for later use by
      __nf_conntrack_confirm().

      __hash_conntrack() is split into two steps: hash_conntrack_raw() is used
      to get the raw hash, and __hash_bucket() is used to get the bucket id.

      In the SYN-flood case, early_drop() doesn't need to recompute the hash.
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
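The two-step split described in this commit can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: `struct tuple`, `mix()`, `hash_raw()` and `hash_bucket()` are hypothetical names, the mixing function is a simple placeholder rather than the hash the kernel uses, and the multiply-and-shift reduction is one common way to map a 32-bit hash onto a table size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified connection tuple. */
struct tuple {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Placeholder multiplicative mixing step (not the kernel's hash). */
static uint32_t mix(uint32_t h, uint32_t v)
{
    h ^= v;
    return h * 16777619u;
}

/*
 * Step 1: the raw hash depends only on the tuple and a seed, so it can be
 * computed once and stored alongside the connection.
 */
static uint32_t hash_raw(const struct tuple *t, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;

    h = mix(h, t->saddr);
    h = mix(h, t->daddr);
    h = mix(h, ((uint32_t)t->sport << 16) | t->dport);
    h = mix(h, t->proto);
    return h;
}

/*
 * Step 2: map the raw hash to a bucket index in [0, size). Only this cheap
 * step has to be repeated when the saved raw hash is reused.
 */
static uint32_t hash_bucket(uint32_t raw, uint32_t size)
{
    assert(size > 0);
    return (uint32_t)(((uint64_t)raw * size) >> 32);
}
```

Keeping the raw hash separate from the bucket index means the expensive part is computed once per tuple, and a later lookup or confirmation step only needs the cheap reduction.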