    netfilter: Introduce egress hook · 8537f786
    Lukas Wunner authored
    Commit e687ad60 ("netfilter: add netfilter ingress hook after
    handle_ing() under unique static key") introduced the ability to
    classify packets on ingress.
    
    Allow the same on egress.  Position the hook immediately before a packet
    is handed to tc and then sent out on an interface, thereby mirroring the
    ingress order.  This order allows marking packets in the netfilter
    egress hook and subsequently using the mark in tc.  Another benefit of
    this order is consistency with a lot of existing documentation which
    says that egress tc is performed after netfilter hooks.
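
The ordering described above (netfilter egress first, then tc) can be sketched as follows. This is a minimal illustration, not part of the commit: the mark value 1, the `fw` classifier rule, and the `run` dry-run helper are assumptions chosen for the example, while the table and chain names reuse those from the test commands later in this message. By default the script only prints the commands; setting APPLY=1 as root would execute them.

```shell
#!/bin/sh
# Hypothetical helper: print each command unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Step 1: mark matching packets in the netfilter egress hook.
run nft add table netdev t
run nft add chain netdev t co '{ type filter hook egress device lo priority 0 ; }'
run nft add rule netdev t co ip daddr 4.3.2.1/32 meta mark set 1

# Step 2: tc egress runs afterwards and can match on that mark
# (fw classifier, handle == mark value; the drop action is illustrative).
run tc qdisc add dev lo clsact
run tc filter add dev lo egress protocol ip prio 1 handle 1 fw action drop
```

Because the netfilter hook sits before tc, the mark set in step 1 is already present on the packet when the tc filter in step 2 is evaluated.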
    
    Egress hooks already exist for the most common protocols, such as
    NF_INET_LOCAL_OUT or NF_ARP_OUT, and those are to be preferred because
    they are executed earlier during packet processing.  However, for more
    exotic protocols, there is currently no provision to apply netfilter on
    egress.  A common workaround is to enslave the interface to a bridge and
    use ebtables, or to resort to tc.  But when the ingress hook was
    introduced, consensus was that users should be given the choice to use
    netfilter or tc, whichever tool suits their needs best:
    https://lore.kernel.org/netdev/20150430153317.GA3230@salvia/
    This hook is also useful for NAT46/NAT64, tunneling and filtering of
    locally generated af_packet traffic such as dhclient.
    
    There have also been occasional user requests for a netfilter egress
    hook in the past, e.g.:
    https://www.spinics.net/lists/netfilter/msg50038.html
    
    Performance measurements with pktgen surprisingly show a speedup rather
    than a slowdown with this commit:
    
    * Without this commit:
      Result: OK: 34240933(c34238375+d2558) usec, 100000000 (60byte,0frags)
      2920481pps 1401Mb/sec (1401830880bps) errors: 0
    
    * With this commit:
      Result: OK: 33997299(c33994193+d3106) usec, 100000000 (60byte,0frags)
      2941410pps 1411Mb/sec (1411876800bps) errors: 0
    
    * Without this commit + tc egress:
      Result: OK: 39022386(c39019547+d2839) usec, 100000000 (60byte,0frags)
      2562631pps 1230Mb/sec (1230062880bps) errors: 0
    
    * With this commit + tc egress:
      Result: OK: 37604447(c37601877+d2570) usec, 100000000 (60byte,0frags)
      2659259pps 1276Mb/sec (1276444320bps) errors: 0
    
    * With this commit + nft egress:
      Result: OK: 41436689(c41434088+d2600) usec, 100000000 (60byte,0frags)
      2413320pps 1158Mb/sec (1158393600bps) errors: 0
    
    Tested on a bare-metal Core i7-3615QM.  Each measurement was performed
    three times to verify that the numbers are stable.
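
As a cross-check of the figures above, the pps and bps values follow directly from the packet count, elapsed microseconds and packet size that pktgen reports. The sketch below is an illustration only; the function names `pps`, `bps` and `pct_change` are hypothetical helpers, not part of pktgen, and 64-bit shell arithmetic is assumed.

```shell
#!/bin/sh
# pps: packets per second from packet count ($1) and elapsed usec ($2),
# truncated to an integer (assumes 64-bit shell arithmetic).
pps() { echo $(( $1 * 1000000 / $2 )); }

# bps: bits per second from pps ($1) and packet size in bytes ($2).
bps() { echo $(( $1 * $2 * 8 )); }

# pct_change: relative change from $1 to $2 in percent, two decimals.
pct_change() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (b - a) * 100 / a }'
}

pps 100000000 34240933        # without this commit: 2920481
bps 2920481 60                # 1401830880
pct_change 2920481 2941410    # speedup with this commit, no hooks in use
pct_change 2562631 2659259    # speedup with this commit, tc egress in use
```

With the numbers listed above, this works out to a speedup of roughly 0.7% without any egress classifier and roughly 3.8% with tc egress.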
    
    Commands to perform a measurement:
    modprobe pktgen
    echo "add_device lo@3" > /proc/net/pktgen/kpktgend_3
    samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i 'lo@3' -n 100000000
    
    Commands for testing tc egress:
    tc qdisc add dev lo clsact
    tc filter add dev lo egress protocol ip prio 1 u32 match ip dst 4.3.2.1/32
    
    Commands for testing nft egress:
    nft add table netdev t
    nft add chain netdev t co \{ type filter hook egress device lo priority 0 \; \}
    nft add rule netdev t co ip daddr 4.3.2.1/32 drop
    
    All testing was performed on the loopback interface to avoid distorting
    measurements by the packet handling in the low-level Ethernet driver.
    
    Signed-off-by: Lukas Wunner <lukas@wunner.de>
    Cc: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
dev.c 262 KB