    packet: Enhance AF_PACKET implementation to not require high order contiguous... · 0e3125c7
    Neil Horman authored
    packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)
    
    Version 4 of this patch.
    
    Change notes:
    1) Removed the extra memset. I didn't realize kcalloc adds __GFP_ZERO the way kzalloc does :)
    
    Summary:
    It was recently shown to me that systems under high load were driven very deep
    into swap when tcpdump was run. This happens because the AF_PACKET protocol has
    a ring buffer socket option (PACKET_RX_RING) that lets the user space
    application specify how many entries an AF_PACKET socket's ring will have and
    how large each entry will be. tcpdump's default appears to be a ring of 32
    entries of 64 KB each, which implies 32 order-5 allocations. That's difficult
    under good circumstances, and horrid under memory pressure.
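    To see where the order comes from: assuming 4 KiB pages, an allocation of n
    bytes needs order k, the smallest k with 4096 << k >= n. The hypothetical helper
    page_order() below is a user-space sketch of that arithmetic (the kernel's own
    helper is get_order()). Exactly 64 KiB works out to order 4, but a block even
    one byte larger rounds up to order 5, i.e. 32 physically contiguous pages per
    ring entry.

```c
#include <stddef.h>

/* Smallest order such that (page_size << order) >= size.
 * Hypothetical user-space helper illustrating the idea behind get_order();
 * page_size is assumed to be a power of two (e.g. 4096). */
static unsigned page_order(size_t size, size_t page_size)
{
    unsigned order = 0;

    /* Double the block (one order step) until it covers the request. */
    while ((page_size << order) < size)
        order++;
    return order;
}
```

    For example, page_order(65536, 4096) returns 4 and page_order(65537, 4096)
    returns 5.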
    
    I thought it would be good to make that a bit more usable. I was going to do a
    simple conversion of the ring buffer from contiguous pages to iovecs, but
    unfortunately the metadata that AF_PACKET places in these buffers can easily
    span a page boundary. Since these buffers get mapped into user space, and the
    data layout doesn't easily allow the padding between frames to be changed to
    avoid that, a simple iovec change would break user space ABI consistency.
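    A minimal sketch of how the metadata can straddle pages, assuming frames are
    packed back to back so frame i starts at i * frame_size within a block. The
    function frame_header_crosses_page() and the 2000-byte frame and 128-byte
    header sizes used with it are illustrative assumptions, not the real tpacket
    header layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the hdr_len-byte header at the start of frame `index`
 * straddles a page boundary. Assumes frames are packed back to back from
 * offset 0 of the block, so frame `index` begins at index * frame_size.
 * Hypothetical helper; hdr_len must be > 0. */
static bool frame_header_crosses_page(size_t index, size_t frame_size,
                                      size_t hdr_len, size_t page_size)
{
    size_t start = index * frame_size;     /* first byte of the header */
    size_t last = start + hdr_len - 1;     /* last byte of the header */

    /* Different page indices mean the header spans two pages. */
    return start / page_size != last / page_size;
}
```

    With 4 KiB pages, frames 0 and 1 keep their headers within one page, but
    frame 2 starts at offset 4000, so a 128-byte header runs past the boundary
    at 4096.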
    
    So instead, I've added a three-tiered mechanism to the af_packet set_ring
    socket option. It attempts to allocate memory in the following order:
    
    1) Using __get_free_pages with __GFP_NORETRY set, so as to fail quickly without
    digging into swap

    2) Using vmalloc

    3) Using __get_free_pages with __GFP_NORETRY clear, causing us to try as hard as
    needed to get the memory
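    The fallback order can be modelled in user space as below. This is a hedged
    sketch, not the patch's code: struct mem_state, try_pages(), try_vmalloc() and
    alloc_block() are hypothetical stand-ins, and the booleans only approximate
    what a __GFP_NORETRY attempt, vmalloc, and a full-effort attempt can each
    obtain.

```c
#include <stdbool.h>

/* Which tier satisfied the request. All names here are hypothetical. */
enum alloc_tier {
    TIER_NONE,          /* every attempt failed */
    TIER_PAGES_NORETRY, /* 1) __get_free_pages with __GFP_NORETRY */
    TIER_VMALLOC,       /* 2) vmalloc */
    TIER_PAGES_RETRY,   /* 3) __get_free_pages without __GFP_NORETRY */
};

/* Toy model of memory state, standing in for the real allocators. */
struct mem_state {
    bool contig_free; /* a high-order block can be found cheaply */
    bool vmalloc_ok;  /* enough scattered order-0 pages exist for vmalloc */
    bool reclaim_ok;  /* hard reclaim (retries, swapping) could free a block */
};

/* Models __get_free_pages: with noretry set, give up before hard reclaim. */
static bool try_pages(const struct mem_state *m, bool noretry)
{
    return m->contig_free || (!noretry && m->reclaim_ok);
}

/* Models vmalloc: only virtually contiguous memory is required. */
static bool try_vmalloc(const struct mem_state *m)
{
    return m->vmalloc_ok;
}

/* Try the three tiers in order and report which one succeeded. */
static enum alloc_tier alloc_block(const struct mem_state *m)
{
    if (try_pages(m, true))
        return TIER_PAGES_NORETRY;
    if (try_vmalloc(m))
        return TIER_VMALLOC;
    if (try_pages(m, false))
        return TIER_PAGES_RETRY;
    return TIER_NONE;
}
```

    The ordering puts vmalloc ahead of the full-effort attempt because vmalloc can
    assemble the buffer from scattered single pages, so it is less likely to push
    the system into swap; only when that also fails does the model fall back to
    reclaiming hard for a contiguous block.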
    
    The effect is that we don't disturb the system as much when it's under load,
    while still being able to run tcpdump effectively.
    
    Tested successfully by me.
    Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
    Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>