1. 13 Jul, 2019 1 commit
  2. 12 Jul, 2019 3 commits
    • Max Kellermann's avatar
      Revert "NFS: readdirplus optimization by cache mechanism" (memleak) · db531db9
      Max Kellermann authored
      This reverts commit be4c2d47.
      
      That commit caused a severe memory leak in nfs_readdir_make_qstr().
      
      When listing a directory with more than 100 files (this is how many
      struct nfs_cache_array_entry elements fit in one 4kB page), all
      allocated file name strings past those 100 leak.
      
      The root of the leakage is that those string pointers are managed in
      pages which are never linked into the page cache.
      
      fs/nfs/dir.c puts pages into the page cache by calling
      read_cache_page(); the callback function nfs_readdir_filler() will
      then fill the given page struct which was passed to it, which is
      already linked in the page cache (by do_read_cache_page() calling
      add_to_page_cache_lru()).
      
      Commit be4c2d47 added another (local) array of allocated pages, to
      be filled with more data, instead of discarding excess items received
      from the NFS server.  Those additional pages can be used by the next
      nfs_readdir_filler() call (from within the same nfs_readdir() call).
      
      The leak happens when some of those additional pages are never used
      (copied to the page cache using copy_highpage()).  The pages will be
      freed by nfs_readdir_free_pages(), but their contents will not.  The
      commit did not invoke nfs_readdir_clear_array() (and doing so would
      have been dangerous, because it did not track which of those pages
      were already copied to the page cache, risking double free bugs).
      
      How to reproduce the leak:
      
      - Use a kernel with CONFIG_SLUB_DEBUG_ON.
      
      - Create a directory on a NFS mount with more than 100 files with
        names long enough to use the "kmalloc-32" slab (so we can easily
        look up the allocation counts):
      
        for i in `seq 110`; do touch ${i}_0123456789abcdef; done
      
      - Drop all caches:
      
        echo 3 >/proc/sys/vm/drop_caches
      
      - Check the allocation counter:
      
        grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
        30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1
      
      - Request a directory listing and check the allocation counters again:
      
        ls
        [...]
        grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
        30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1
      
      There are now 120 new allocations.
      
      - Drop all caches and check the counters again:
      
        echo 3 >/proc/sys/vm/drop_caches
        grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
        30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1
      
      110 allocations are gone, but 10 have leaked and will never be freed.
      
      Unhelpfully, those allocations are explicitly excluded from KMEMLEAK,
      that's why my initial attempts with KMEMLEAK were not successful:
      
      	/*
      	 * Avoid a kmemleak false positive. The pointer to the name is stored
      	 * in a page cache page which kmemleak does not scan.
      	 */
      	kmemleak_not_leak(string->name);
      
      It would be possible to solve this bug without reverting the whole
      commit:
      
      - keep track of which pages were not used, and call
        nfs_readdir_clear_array() on them, or
      - manually link those pages into the page cache
      
      But for now I have decided to just revert the commit, because the real
      fix would require complex considerations, risking more dangerous
      (crash) bugs, which may seem unsuitable for the stable branches.
      Signed-off-by: default avatarMax Kellermann <mk@cm4all.com>
      Cc: stable@vger.kernel.org # v5.1+
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      db531db9
    • Trond Myklebust's avatar
      SUNRPC: Fix transport accounting when caller specifies an rpc_xprt · a101b043
      Trond Myklebust authored
      Ensure that we do the required accounting for the round robin queue
      when the caller to rpc_init_task() has passed in a transport to be
      used.
      Reported-by: default avatarOlga Kornievskaia <aglo@umich.edu>
      Reported-by: default avatarNeil Brown <neilb@suse.com>
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      a101b043
    • Trond Myklebust's avatar
      Merge tag 'nfs-rdma-for-5.3-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · 347543e6
      Trond Myklebust authored
      NFSoRDMA client updates for 5.3
      
      New features:
      - Add a way to place MRs back on the free list
      - Reduce context switching
      - Add new trace events
      
      Bugfixes and cleanups:
      - Fix a BUG when tracing is enabled with NFSv4.1
      - Fix a use-after-free in rpcrdma_post_recvs
      - Replace use of xdr_stream_pos in rpcrdma_marshal_req
      - Fix occasional transport deadlock
      - Fix show_nfs_errors macros, other tracing improvements
      - Remove RPCRDMA_REQ_F_PENDING and fr_state
      - Various simplifications and refactors
      347543e6
  3. 09 Jul, 2019 17 commits
  4. 06 Jul, 2019 19 commits