    vrf: Revert "Reset skb conntrack connection..." · 55161e67
    Eugene Crosser authored
    This reverts commit 09e856d5.
    
    When an interface is enslaved in a VRF, the prerouting conntrack hook is
    called twice: once in the context of the original input interface, and
    once in the context of the VRF interface. If no special precautions are
    taken, this leads to the creation of two conntrack entries instead of
    one, and breaks SNAT.
    
    The commit above was intended to avoid the creation of extra conntrack
    entries when the input interface is enslaved in a VRF. It did so by
    resetting the conntrack-related data associated with the skb when it
    enters the VRF context.
    
    However, it breaks netfilter operation. Imagine a use case where the
    conntrack zone must be assigned based on the original input interface
    rather than the VRF interface (which would make the original interfaces
    indistinguishable). One could create netfilter rules similar to these:
    
            chain rawprerouting {
                    type filter hook prerouting priority raw;
                    iif realiface1 ct zone set 1 return
                    iif realiface2 ct zone set 2 return
            }
    
    This works before the mentioned commit, but not after: zone assignment
    is "forgotten", and any subsequent NAT or filtering that is dependent
    on the conntrack zone does not work.
    
    Here is a reproducer script that demonstrates the difference in behaviour.
    
    ==========
    #!/bin/sh
    
    # This script demonstrates unexpected change of nftables behaviour
    # caused by commit 09e856d5 "vrf: Reset skb conntrack
    # connection on VRF rcv"
    # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
    #
    # Before the commit, it was possible to assign conntrack zone to a
    # packet (or mark it for `notracking`) in the prerouting chain, raw
    # priority, based on the `iif` (interface from which the packet
    # arrived).
    # After the change, if the interface is enslaved in a VRF, such
    # assignment is lost. Instead, assignment based on the `iif` matching
    # the VRF master interface is honored. Thus it is impossible to
    # distinguish packets based on the original interface.
    #
    # This script demonstrates this change of behaviour: conntrack zone 1
    # or 2 is assigned depending on the match with the original interface
    # or the vrf master interface. It can be observed that conntrack entry
    # appears in different zone in the kernel versions before and after
    # the commit.
    
    IPIN=172.30.30.1
    IPOUT=172.30.30.2
    PFXL=30
    
    ip li sh vein >/dev/null 2>&1 && ip li del vein
    ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
    nft list table testct >/dev/null 2>&1 && nft delete table testct
    
    ip li add vein type veth peer veout
    ip li add tvrf type vrf table 9876
    ip li set veout master tvrf
    ip li set vein up
    ip li set veout up
    ip li set tvrf up
    /sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
    /sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
    ip addr add $IPIN/$PFXL dev vein
    ip addr add $IPOUT/$PFXL dev veout
    
    nft -f - <<__END__
    table testct {
    	chain rawpre {
    		type filter hook prerouting priority raw;
    		iif { veout, tvrf } meta nftrace set 1
    		iif veout ct zone set 1 return
    		iif tvrf ct zone set 2 return
    		notrack
    	}
    	chain rawout {
    		type filter hook output priority raw;
    		notrack
    	}
    }
    __END__
    
    uname -rv
    conntrack -F
    ping -W 1 -c 1 -I vein $IPOUT
    conntrack -L
    Signed-off-by: Eugene Crosser <crosser@average.org>
    Acked-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>