• Talal Ahmad's avatar
    net: avoid double accounting for pure zerocopy skbs · 9b65b17d
    Talal Ahmad authored
    Track skbs containing only zerocopy data and avoid charging them to
    kernel memory to correctly account the memory utilization for
    msg_zerocopy. All of the data in such skbs is held in user pages which
    are already accounted to user. Before this change, they are charged
    again in kernel in __zerocopy_sg_from_iter. The charging in kernel is
    excessive because data is not being copied into skb frags. This
    excessive charging can lead to kernel going into memory pressure
    state which impacts all sockets in the system adversely. Mark pure
    zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
    charge/uncharge for data in such skbs.
    
    Initially, an skb is marked pure zerocopy when it is empty and in
    zerocopy path. skb can then change from a pure zerocopy skb to mixed
    data skb (zerocopy and copy data) if it is at tail of write queue and
    there is room available in it and non-zerocopy data is being sent in
    the next sendmsg call. At this time sk_mem_charge is done for the pure
    zerocopied data and the pure zerocopy flag is unmarked. We found that
    this happens very rarely on workloads that pass MSG_ZEROCOPY.
    
    A pure zerocopy skb can later be coalesced into normal skb if they are
    next to each other in queue but this patch prevents coalescing from
    happening. This avoids complexity of charging when skb downgrades from
    pure zerocopy to mixed. This is also rare.
    
    In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge
    for SKB_TRUESIZE(skb_end_offset(skb)) is done for sk_mem_charge in
    tcp_skb_entail for an skb without data.
    
    Testing with the msg_zerocopy.c benchmark between two hosts(100G nics)
    with zerocopy showed that before this patch the 'sock' variable in
    memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
    sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
    change it is 0. This is due to no charge to sk_forward_alloc for
    zerocopy data and shows memory utilization for kernel is lowered.
    
    With this commit we don't see the warning we saw in previous commit
    which resulted in commit 84882cf7.
    Signed-off-by: default avatarTalal Ahmad <talalahmad@google.com>
    Acked-by: default avatarArjun Roy <arjunroy@google.com>
    Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
    Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
    Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
    9b65b17d
tcp_output.c 118 KB