    mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls · 0708a0af
    Daniel Borkmann authored
    syzkaller was recently triggering an oversized kvmalloc() warning via
    xdp_umem_create().
    
    The triggered warning was added back in 7661809d ("mm: don't allow
    oversized kvmalloc() calls"). The rationale for the warning for huge
    kvmalloc sizes was as a reaction to a security bug where the size was
    more than UINT_MAX but not everything was prepared to handle unsigned
    long sizes.
    
    Anyway, the AF_XDP related call trace from this syzkaller report was:
    
      kvmalloc include/linux/mm.h:806 [inline]
      kvmalloc_array include/linux/mm.h:824 [inline]
      kvcalloc include/linux/mm.h:829 [inline]
      xdp_umem_pin_pages net/xdp/xdp_umem.c:102 [inline]
      xdp_umem_reg net/xdp/xdp_umem.c:219 [inline]
      xdp_umem_create+0x6a5/0xf00 net/xdp/xdp_umem.c:252
      xsk_setsockopt+0x604/0x790 net/xdp/xsk.c:1068
      __sys_setsockopt+0x1fd/0x4e0 net/socket.c:2176
      __do_sys_setsockopt net/socket.c:2187 [inline]
      __se_sys_setsockopt net/socket.c:2184 [inline]
      __x64_sys_setsockopt+0xb5/0x150 net/socket.c:2184
      do_syscall_x64 arch/x86/entry/common.c:50 [inline]
      do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
      entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    Björn mentioned that requests for >2GB allocation can still be valid:
    
      The structure that is being allocated is the page-pinning accounting.
      AF_XDP has an internal limit of U32_MAX pages, which is *a lot*, but
      still fewer than what memcg allows (PAGE_COUNTER_MAX is a LONG_MAX/
      PAGE_SIZE on 64 bit systems). [...]
    
      I could just change from U32_MAX to INT_MAX, but as I stated earlier
      that has a hacky feeling to it. [...] From my perspective, the code
      isn't broken, with the memcg limits in consideration. [...]
    
    Linus says:
    
      [...] Pretty much every time this has come up, the kernel warning has
      shown that yes, the code was broken and there really wasn't a reason
      for doing allocations that big.
    
      Of course, some people would be perfectly fine with the allocation
      failing, they just don't want the warning. I didn't want __GFP_NOWARN
      to shut it up originally because I wanted people to see all those
      cases, but these days I think we can just say "yeah, people can shut
      it up explicitly by saying 'go ahead and fail this allocation, don't
      warn about it'".
    
      So enough time has passed that by now I'd certainly be ok with [it].
    
    Thus allow call-sites to silence such userspace triggered splats if the
    allocation requests have __GFP_NOWARN. For xdp_umem_pin_pages()'s call
    to kvcalloc() this is already the case, so nothing else needed there.
    
    Fixes: 7661809d ("mm: don't allow oversized kvmalloc() calls")
    Reported-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Tested-by: syzbot+11421fbbff99b989670e@syzkaller.appspotmail.com
    Cc: Björn Töpel <bjorn@kernel.org>
    Cc: Magnus Karlsson <magnus.karlsson@intel.com>
    Cc: Willy Tarreau <w@1wt.eu>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Andrii Nakryiko <andrii@kernel.org>
    Cc: Jakub Kicinski <kuba@kernel.org>
    Cc: David S. Miller <davem@davemloft.net>
    Link: https://lore.kernel.org/bpf/CAJ+HfNhyfsT5cS_U9EC213ducHs9k9zNxX9+abqC0kTrPbQ0gg@mail.gmail.com
    Link: https://lore.kernel.org/bpf/20211201202905.b9892171e3f5b9a60f9da251@linux-foundation.org
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
util.c 27.3 KB
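
The behaviour described in the commit message can be sketched as a small userspace model. This is not the kernel code: model_kvmalloc(), MODEL_GFP_NOWARN and warn_count are hypothetical names invented for illustration, malloc() stands in for the real allocator, and the model counts every warning rather than warning only once. It shows only the decision the message describes: a request larger than INT_MAX always fails, and the warning is skipped when the caller passed the no-warn flag.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for __GFP_NOWARN; the bit value is illustrative only. */
#define MODEL_GFP_NOWARN 0x1u

/* Number of warnings the model has emitted so far. */
static int warn_count;

/*
 * Userspace model of the oversized-allocation check (assumed shape,
 * not the kernel implementation).
 */
static void *model_kvmalloc(size_t size, unsigned int flags)
{
	if (size > (size_t)INT_MAX) {
		/* Oversized requests always fail; only the warning is optional. */
		if (!(flags & MODEL_GFP_NOWARN)) {
			warn_count++;
			fprintf(stderr, "WARNING: oversized allocation (%zu bytes)\n",
				size);
		}
		return NULL;
	}

	/* Ordinary sizes fall through to the regular allocator. */
	return malloc(size);
}
```

In this model, a caller that passes MODEL_GFP_NOWARN with an oversized request gets NULL back silently, which mirrors the situation the message describes for xdp_umem_pin_pages(): the allocation is allowed to fail, and a userspace-triggered request no longer produces a splat.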