• Shmulik Ladkani's avatar
    net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow... · b8247f09
    Shmulik Ladkani authored
    net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs
    
    Given:
     - tap0 and vxlan0 are bridged
     - vxlan0 stacked on eth0, eth0 having small mtu (e.g. 1400)
    
    Assume GSO skbs arriving from tap0 having a gso_size as determined by
    user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500).
    
    After encapsulation these skbs have skb_gso_network_seglen that exceed
    eth0's ip_skb_dst_mtu.
    
    These skbs are accidentally passed to ip_finish_output2 AS IS.
    Alas, each final segment (segmented either by validate_xmit_skb or by
    hardware UFO) would be larger than eth0 mtu.
    As a result, those above-mtu segments get dropped on certain networks.
    
    This behavior is not aligned with the NON-GSO case:
    Assume a non-gso 1500-sized IP packet arrives from tap0. After
    encapsulation, the vxlan datagram is fragmented normally at the
    ip_finish_output-->ip_fragment code path.
    
    The expected behavior for the GSO case would be segmenting the
    "gso-oversized" skb first, then fragmenting each segment according to
    dst mtu, and finally passing the resulting fragments to ip_finish_output2.
    
    'ip_finish_output_gso' already supports this "Slowpath" behavior,
    according to the IPSKB_FRAG_SEGS flag, which is only set during ipv4
    forwarding (not set in the bridged case).
    
    In order to support the bridged case, we'll mark skbs arriving from an
    ingress interface that get udp-encaspulated as "allowed to be fragmented",
    causing their network_seglen to be validated by 'ip_finish_output_gso'
    (and fragment if needed).
    
    Note the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the
    gso and non-gso cases), which serves users wishing to forbid
    fragmentation at the udp tunnel endpoint.
    
    Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Cc: Florian Westphal <fw@strlen.de>
    Signed-off-by: default avatarShmulik Ladkani <shmulik.ladkani@gmail.com>
    Acked-by: default avatarHannes Frederic Sowa <hannes@stressinduktion.org>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    b8247f09
ip_tunnel_core.c 12.2 KB