1. 11 Nov, 2015 9 commits
    • Randy Dunlap's avatar
      fs: fix inode.c kernel-doc warning · 034ae4ba
      Randy Dunlap authored
      Fix kernel-doc warning in fs/inode.c:
      
      ..//fs/inode.c:1606: warning: No description found for parameter 'inode'
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      034ae4ba
    • Eric Biggers's avatar
      fs/pipe.c: return error code rather than 0 in pipe_write() · 6ae08069
      Eric Biggers authored
      pipe_write() would return 0 if it failed to merge the beginning of the
      data to write with the last, partially filled pipe buffer.  It should
      return an error code instead.  Userspace programs could be confused by
      write() returning 0 when called with a nonzero 'count'.
      
      The EFAULT error case was a regression from f0d1bec9 ("new helper:
      copy_page_from_iter()"), while the ops->confirm() error case was a much
      older bug.
      
      Test program:
      
      	#include <assert.h>
      	#include <errno.h>
      	#include <unistd.h>
      
      	int main(void)
      	{
      		int fd[2];
      		char data[1] = {0};
      
      		assert(0 == pipe(fd));
      		assert(1 == write(fd[1], data, 1));
      
      		/* prior to this patch, write() returned 0 here  */
      		assert(-1 == write(fd[1], NULL, 1));
      		assert(errno == EFAULT);
      	}
      
      Cc: stable@vger.kernel.org # at least v3.15+
      Signed-off-by: default avatarEric Biggers <ebiggers3@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      6ae08069
    • Eric Biggers's avatar
      fs/pipe.c: preserve alloc_file() error code · e9bb1f9b
      Eric Biggers authored
      If sys_pipe() was unable to allocate a 'struct file', it always failed
      with ENFILE, which means "The number of simultaneously open files in the
      system would exceed a system-imposed limit." However, alloc_file()
      actually returns an ERR_PTR value and might fail with other error codes.
      Currently, in addition to ENFILE, it can fail with ENOMEM, potentially
      when there are few open files in the system.  Update sys_pipe() to
      preserve this error code.
      
      In a prior submission of a similar patch (1) some concern was raised
      about introducing a new error code for sys_pipe().  However, for most
      system calls, programs cannot assume that new error codes will never be
      introduced.  In addition, ENOMEM was, in fact, already a possible error
      code for sys_pipe(), in the case where the file descriptor table could
      not be expanded due to insufficient memory.
      
      	(1) http://comments.gmane.org/gmane.linux.kernel/1357942Signed-off-by: default avatarEric Biggers <ebiggers3@gmail.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      e9bb1f9b
    • Maciej W. Rozycki's avatar
      binfmt_elf: Don't clobber passed executable's file header · b582ef5c
      Maciej W. Rozycki authored
      Do not clobber the buffer space passed from `search_binary_handler' and
      originally preloaded by `prepare_binprm' with the executable's file
      header by overwriting it with its interpreter's file header.  Instead
      keep the buffer space intact and directly use the data structure locally
      allocated for the interpreter's file header, fixing a bug introduced in
      2.1.14 with loadable module support (linux-mips.org commit beb11695
      [Import of Linux/MIPS 2.1.14], predating kernel.org repo's history).
      Adjust the amount of data read from the interpreter's file accordingly.
      
      This was not an issue before loadable module support, because back then
      `load_elf_binary' was executed only once for a given ELF executable,
      whether the function succeeded or failed.
      
      With loadable module support supported and enabled, upon a failure of
      `load_elf_binary' -- which may for example be caused by architecture
      code rejecting an executable due to a missing hardware feature requested
      in the file header -- a module load is attempted and then the function
      reexecuted by `search_binary_handler'.  With the executable's file
      header replaced with its interpreter's file header the executable can
      then be erroneously accepted in this subsequent attempt.
      
      Cc: stable@vger.kernel.org # all the way back
      Signed-off-by: default avatarMaciej W. Rozycki <macro@imgtec.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b582ef5c
    • David Howells's avatar
      FS-Cache: Handle a write to the page immediately beyond the EOF marker · 102f4d90
      David Howells authored
      Handle a write being requested to the page immediately beyond the EOF
      marker on a cache object.  Currently this gets an assertion failure in
      CacheFiles because the EOF marker is used there to encode information about
      a partial page at the EOF - which could lead to an unknown blank spot in
      the file if we extend the file over it.
      
      The problem is actually in fscache where we check the index of the page
      being written against store_limit.  store_limit is set to the number of
      pages that we're allowed to store by fscache_set_store_limit() - which
      means it's one more than the index of the last page we're allowed to store.
      The problem is that we permit writing to a page with an index _equal_ to
      the store limit - when we should reject that case.
      
      Whilst we're at it, change the triggered assertion in CacheFiles to just
      return -ENOBUFS instead.
      
      The assertion failure looks something like this:
      
      CacheFiles: Assertion failed
      1000 < 7b1 is false
      ------------[ cut here ]------------
      kernel BUG at fs/cachefiles/rdwr.c:962!
      ...
      RIP: 0010:[<ffffffffa02c9e83>]  [<ffffffffa02c9e83>] cachefiles_write_page+0x273/0x2d0 [cachefiles]
      
      Cc: stable@vger.kernel.org # v2.6.31+; earlier - that + backport of a17754fb (at least)
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      102f4d90
    • NeilBrown's avatar
      cachefiles: perform test on s_blocksize when opening cache file. · 95201a40
      NeilBrown authored
      cachefiles requires that s_blocksize in the cache is not greater than
      PAGE_SIZE, and performs the check every time a block is accessed.
      
      Move the test to the place where the file is "opened", where other
      file-validity tests are performed.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      95201a40
    • Kinglong Mee's avatar
      FS-Cache: Don't override netfs's primary_index if registering failed · b130ed59
      Kinglong Mee authored
      Only override netfs->primary_index when registering success.
      
      Cc: stable@vger.kernel.org # v2.6.30+
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      b130ed59
    • Kinglong Mee's avatar
      FS-Cache: Increase reference of parent after registering, netfs success · 86108c2e
      Kinglong Mee authored
      If netfs exist, fscache should not increase the reference of parent's
      usage and n_children, otherwise, never be decreased.
      
      v2: thanks David's suggest,
       move increasing reference of parent if success
       use kmem_cache_free() freeing primary_index directly
      
      v3: don't move "netfs->primary_index->parent = &fscache_fsdef_index;"
      
      Cc: stable@vger.kernel.org # v2.6.30+
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      86108c2e
    • Daniel Borkmann's avatar
      debugfs: fix refcount imbalance in start_creating · 0ee9608c
      Daniel Borkmann authored
      In debugfs' start_creating(), we pin the file system to safely access
      its root. When we failed to create a file, we unpin the file system via
      failed_creating() to release the mount count and eventually the reference
      of the vfsmount.
      
      However, when we run into an error during lookup_one_len() when still
      in start_creating(), we only release the parent's mutex but not so the
      reference on the mount. Looks like it was done in the past, but after
      splitting portions of __create_file() into start_creating() and
      end_creating() via 190afd81 ("debugfs: split the beginning and the
      end of __create_file() off"), this seemed missed. Noticed during code
      review.
      
      Fixes: 190afd81 ("debugfs: split the beginning and the end of __create_file() off")
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      0ee9608c
  2. 08 Nov, 2015 1 commit
  3. 07 Nov, 2015 30 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew) · ad804a0b
      Linus Torvalds authored
      Merge second patch-bomb from Andrew Morton:
      
       - most of the rest of MM
      
       - procfs
      
       - lib/ updates
      
       - printk updates
      
       - bitops infrastructure tweaks
      
       - checkpatch updates
      
       - nilfs2 update
      
       - signals
      
       - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
         dma-debug, dma-mapping, ...
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
        ipc,msg: drop dst nil validation in copy_msg
        include/linux/zutil.h: fix usage example of zlib_adler32()
        panic: release stale console lock to always get the logbuf printed out
        dma-debug: check nents in dma_sync_sg*
        dma-mapping: tidy up dma_parms default handling
        pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
        kexec: use file name as the output message prefix
        fs, seqfile: always allow oom killer
        seq_file: reuse string_escape_str()
        fs/seq_file: use seq_* helpers in seq_hex_dump()
        coredump: change zap_threads() and zap_process() to use for_each_thread()
        coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
        signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
        signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
        signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
        signals: kill block_all_signals() and unblock_all_signals()
        nilfs2: fix gcc uninitialized-variable warnings in powerpc build
        nilfs2: fix gcc unused-but-set-variable warnings
        MAINTAINERS: nilfs2: add header file for tracing
        nilfs2: add tracepoints for analyzing reading and writing metadata files
        ...
      ad804a0b
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · ab9f2faf
      Linus Torvalds authored
      Pull rdma updates from Doug Ledford:
       "This is my initial round of 4.4 merge window patches.  There are a few
        other things I wish to get in for 4.4 that aren't in this pull, as
        this represents what has gone through merge/build/run testing and not
        what is the last few items for which testing is not yet complete.
      
         - "Checksum offload support in user space" enablement
         - Misc cxgb4 fixes, add T6 support
         - Misc usnic fixes
         - 32 bit build warning fixes
         - Misc ocrdma fixes
         - Multicast loopback prevention extension
         - Extend the GID cache to store and return attributes of GIDs
         - Misc iSER updates
         - iSER clustering update
         - Network NameSpace support for rdma CM
         - Work Request cleanup series
         - New Memory Registration API"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
        IB/core, cma: Make __attribute_const__ declarations sparse-friendly
        IB/core: Remove old fast registration API
        IB/ipath: Remove fast registration from the code
        IB/hfi1: Remove fast registration from the code
        RDMA/nes: Remove old FRWR API
        IB/qib: Remove old FRWR API
        iw_cxgb4: Remove old FRWR API
        RDMA/cxgb3: Remove old FRWR API
        RDMA/ocrdma: Remove old FRWR API
        IB/mlx4: Remove old FRWR API support
        IB/mlx5: Remove old FRWR API support
        IB/srp: Dont allocate a page vector when using fast_reg
        IB/srp: Remove srp_finish_mapping
        IB/srp: Convert to new registration API
        IB/srp: Split srp_map_sg
        RDS/IW: Convert to new memory registration API
        svcrdma: Port to new memory registration API
        xprtrdma: Port to new memory registration API
        iser-target: Port to new memory registration API
        IB/iser: Port to new fast registration API
        ...
      ab9f2faf
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial · 75021d28
      Linus Torvalds authored
      Pull trivial updates from Jiri Kosina:
       "Trivial stuff from trivial tree that can be trivially summed up as:
      
         - treewide drop of spurious unlikely() before IS_ERR() from Viresh
           Kumar
      
         - cosmetic fixes (that don't really affect basic functionality of the
           driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek
      
         - various comment / printk fixes and updates all over the place"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
        bcache: Really show state of work pending bit
        hwmon: applesmc: fix comment typos
        Kconfig: remove comment about scsi_wait_scan module
        class_find_device: fix reference to argument "match"
        debugfs: document that debugfs_remove*() accepts NULL and error values
        net: Drop unlikely before IS_ERR(_OR_NULL)
        mm: Drop unlikely before IS_ERR(_OR_NULL)
        fs: Drop unlikely before IS_ERR(_OR_NULL)
        drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
        drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
        UBI: Update comments to reflect UBI_METAONLY flag
        pktcdvd: drop null test before destroy functions
      75021d28
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 6f1da317
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
       "Highlights:
      
         - Intel Skylake Win8 precision touchpads support fixes/improvements
           from Mika Westerberg
      
         - Lenovo Yoga 2 quirk from Ritesh Raj Sarraf
      
         - potential uninitialized buffer access fix in HID core from Richard
           Purdie
      
         - Wacom Intuos and Wacom Cintiq 2 support improvements from Jason
           Gerecke and Ping Cheng
      
         - initiation of sysfs deprecation process for most of the roccat
           drivers, from the roccat support maintiner Stefan Achatz
      
         - quite a few device ID / quirk additions and small fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (30 commits)
        HID: logitech: Add support for G29
        HID: logitech: Simplify wheel detection scheme
        HID: wacom: Call 'wacom_query_tablet_data' only after 'hid_hw_start'
        HID: wacom: Fix ABS_MISC reporting for Cintiq Companion 2
        HID: wacom: Remove useless conditions from 'wacom_query_tablet_data'
        HID: wacom: fix Intuos wireless report id issue
        HID: fix some indenting issues
        HID: wacom: Expect 'touch_max' touches if HID_DG_CONTACTCOUNT not present
        HID: wacom: Tie cached HID_DG_CONTACTCOUNT indices to report ID
        HID: roccat: Fixed resubmit: Deprecating most Roccat sysfs attributes
        HID: wacom: Report full pressure range for Intuos, Cintiq 13HD Touch
        HID: wacom: Add support for Cintiq Companion 2
        HID: multitouch: Fetch feature reports on demand for Win8 devices
        HID: sensor-hub: Add quirk for Lenovo Yoga 2 with ITE Chips
        HID: usbhid: Fix for the WiiU adapter from Mayflash
        HID: corsair: boolify struct k90_led.removed
        HID: corsair: Add Corsair Vengeance K90 driver
        HID: hid-input: allow input_configured callback return errors
        HID: multitouch: Add suffix for HID_DG_TOUCHPAD
        HID: i2c-hid: Fill in physical device providing HID functionality
        ...
      6f1da317
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · 99aaa9c6
      Linus Torvalds authored
      Pull livepatching fix from Jiri Kosina:
       "A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
        (as in such case it's possible for module struct to share a page with
        executable text, which is currently not being handled with grace) from
        Josh Poimboeuf"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
      99aaa9c6
    • Davidlohr Bueso's avatar
      ipc,msg: drop dst nil validation in copy_msg · 5f2a2d5d
      Davidlohr Bueso authored
      d0edd852 ("ipc: convert invalid scenarios to use WARN_ON") relaxed the
      nil dst parameter check, originally being a full BUG_ON.  However, this
      check seems quite unnecessary when the only purpose is for
      ceckpoint/restore (MSG_COPY flag):
      
      o The copy variable is set initially to nil, apparently as a way of
        ensuring that prepare_copy is previously called.  Which is in fact done,
        unconditionally at the beginning of do_msgrcv.
      
      o There is no concurrency with 'copy' (stack allocated in do_msgrcv).
      
      Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should
      always handled by IS_ERR() family.  Therefore remove this check altogether
      as it can never occur with the current users.
      Signed-off-by: default avatarDavidlohr Bueso <dbueso@suse.de>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5f2a2d5d
    • Anish Bhatt's avatar
      include/linux/zutil.h: fix usage example of zlib_adler32() · cb7ae262
      Anish Bhatt authored
      alder32 was renamed to zlib_adler32 since before 2.6.11.
      Signed-off-by: default avatarAnish Bhatt <anish@chelsio.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cb7ae262
    • Vitaly Kuznetsov's avatar
      panic: release stale console lock to always get the logbuf printed out · 08d78658
      Vitaly Kuznetsov authored
      In some cases we may end up killing the CPU holding the console lock
      while still having valuable data in logbuf. E.g. I'm observing the
      following:
      
      - A crash is happening on one CPU and console_unlock() is being called on
        some other.
      
      - console_unlock() tries to print out the buffer before releasing the lock
        and on slow console it takes time.
      
      - in the meanwhile crashing CPU does lots of printk()-s with valuable data
        (which go to the logbuf) and sends IPIs to all other CPUs.
      
      - console_unlock() finishes printing previous chunk and enables interrupts
        before trying to print out the rest, the CPU catches the IPI and never
        releases console lock.
      
      This is not the only possible case: in VT/fb subsystems we have many other
      console_lock()/console_unlock() users.  Non-masked interrupts (or
      receiving NMI in case of extreme slowness) will have the same result.
      Getting the whole console buffer printed out on crash should be top
      priority.
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: default avatarVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      08d78658
    • Robin Murphy's avatar
      dma-debug: check nents in dma_sync_sg* · 7f830642
      Robin Murphy authored
      Like dma_unmap_sg, dma_sync_sg* should be called with the original number
      of entries passed to dma_map_sg, so do the same check in the sync path as
      we do in the unmap path.
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Sakari Ailus <sakari.ailus@iki.fi>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7f830642
    • Robin Murphy's avatar
      dma-mapping: tidy up dma_parms default handling · 002edb6f
      Robin Murphy authored
      Many DMA controllers and other devices set max_segment_size to
      indicate their scatter-gather capability, but have no interest in
      segment_boundary_mask. However, the existence of a dma_parms structure
      precludes the use of any default value, leaving them as zeros (assuming
      a properly kzalloc'ed structure). If a well-behaved IOMMU (or SWIOTLB)
      then tries to respect this by ensuring a mapped segment does not cross
      a zero-byte boundary, hilarity ensues.
      
      Since zero is a nonsensical value for either parameter, treat it as an
      indicator for "default", as might be expected. In the process, clean up
      a bit by replacing the bare constants with slightly more meaningful
      macros and removing the superfluous "else" statements.
      
      [akpm@linux-foundation.org: dma-mapping.h needs sizes.h for SZ_64K]
      Signed-off-by: default avatarRobin Murphy <robin.murphy@arm.com>
      Reviewed-by: default avatarSumit Semwal <sumit.semwal@linaro.org>
      Acked-by: default avatarMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Sakari Ailus <sakari.ailus@iki.fi>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      002edb6f
    • Ben Segall's avatar
      pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode · 8639b461
      Ben Segall authored
      setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of
      the current pid namespace.  This is in contrast to both the other modes of
      setpriority and the example of kill(-1).  Fix this.  getpriority and
      ioprio have the same failure mode, fix them too.
      
      Eric said:
      
      : After some more thinking about it this patch sounds justifiable.
      :
      : My goal with namespaces is not to build perfect isolation mechanisms
      : as that can get into ill defined territory, but to build well defined
      : mechanisms.  And to handle the corner cases so you can use only
      : a single namespace with well defined results.
      :
      : In this case you have found the two interfaces I am aware of that
      : identify processes by uid instead of by pid.  Which quite frankly is
      : weird.  Unfortunately the weird unexpected cases are hard to handle
      : in the usual way.
      :
      : I was hoping for a little more information.  Changes like this one we
      : have to be careful of because someone might be depending on the current
      : behavior.  I don't think they are and I do think this make sense as part
      : of the pid namespace.
      Signed-off-by: default avatarBen Segall <bsegall@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ambrose Feinstein <ambrose@google.com>
      Acked-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8639b461
    • Minfei Huang's avatar
      kexec: use file name as the output message prefix · de90a6bc
      Minfei Huang authored
      kexec output message misses the prefix "kexec", when Dave Young split the
      kexec code.  Now, we use file name as the output message prefix.
      
      Currently, the format of output message:
      [  140.290795] SYSC_kexec_load: hello, world
      [  140.291534] kexec: sanity_check_segment_list: hello, world
      
      Ideally, the format of output message:
      [   30.791503] kexec: SYSC_kexec_load, Hello, world
      [   79.182752] kexec_core: sanity_check_segment_list, Hello, world
      
      Remove the custom prefix "kexec" in output message.
      Signed-off-by: default avatarMinfei Huang <mnfhuang@gmail.com>
      Acked-by: default avatarDave Young <dyoung@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      de90a6bc
    • Greg Thelen's avatar
      fs, seqfile: always allow oom killer · 0f930902
      Greg Thelen authored
      Since 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill
      processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or
      smaller allocations; but larger allocations can use the oom killer via
      vmalloc().  Thus reads of small files can return ENOMEM, but larger files
      use the oom killer to avoid ENOMEM.
      
      The effect of this bug is that reads from /proc and other virtual
      filesystems can return ENOMEM instead of the preferred behavior - oom
      killing something (possibly the calling process).  I don't know of anyone
      except Google who has noticed the issue.
      
      I suspect the fix is more needed in smaller systems where there isn't any
      reclaimable memory.  But these seem like the kinds of systems which
      probably don't use the oom killer for production situations.
      
      Memory overcommit requires use of the oom killer to select a victim
      regardless of file size.
      
      Enable oom killer for small seq_buf_alloc() allocations.
      
      Fixes: 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill processes")
      Signed-off-by: default avatarDavid Rientjes <rientjes@google.com>
      Signed-off-by: default avatarGreg Thelen <gthelen@google.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0f930902
    • Andy Shevchenko's avatar
      seq_file: reuse string_escape_str() · 25c6bb76
      Andy Shevchenko authored
      strint_escape_str() escapes input string by given criteria.  In case of
      seq_escape() the criteria is to convert some characters to their octal
      representation.
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      25c6bb76
    • Andy Shevchenko's avatar
      fs/seq_file: use seq_* helpers in seq_hex_dump() · 8b91a318
      Andy Shevchenko authored
      This improves code readability.
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8b91a318
    • Oleg Nesterov's avatar
      coredump: change zap_threads() and zap_process() to use for_each_thread() · d61ba589
      Oleg Nesterov authored
      Change zap_threads() paths to use for_each_thread() rather than
      while_each_thread().
      
      While at it, change zap_threads() to avoid the nested if's to make the
      code more readable and lessen the indentation.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kyle Walker <kwalker@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Stanislav Kozina <skozina@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d61ba589
    • Oleg Nesterov's avatar
      coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP · 5fa534c9
      Oleg Nesterov authored
      task_will_free_mem() is wrong in many ways, and in particular the
      SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the
      coredumping without SIGNAL_GROUP_COREDUMP bit set.
      
      change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if
      other CLONE_VM processes can't react to SIGKILL.  Fortunately, at least
      oom-kill case if fine; it kills all tasks sharing the same mm, so it
      should also kill the process which actually dumps the core.
      
      The change in prepare_signal() is not strictly necessary, it just ensures
      that the patch does not bring another subtle behavioural change.  But it
      reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kyle Walker <kwalker@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Stanislav Kozina <skozina@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5fa534c9
    • Oleg Nesterov's avatar
      signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) · 9317bb96
      Oleg Nesterov authored
      jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason,
      SIGCONT will wake a stopped task up even if it is ignored.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9317bb96
    • Oleg Nesterov's avatar
      signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() · 9a13049e
      Oleg Nesterov authored
      jffs2_garbage_collect_thread() can race with SIGCONT and sleep in
      TASK_STOPPED state after it was already sent. Add the new helper,
      kernel_signal_stop(), which does this correctly.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a13049e
    • Oleg Nesterov's avatar
      signal: turn dequeue_signal_lock() into kernel_dequeue_signal() · be0e6f29
      Oleg Nesterov authored
      1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
         matches another "for kthreads only" kernel_sigaction() helper.
      
      2. Remove the "tsk" and "mask" arguments, they are always current
         and current->blocked. And it is simply wrong if tsk != current.
      
      3. We could also remove the 3rd "siginfo_t *info" arg but it looks
         potentially useful. However we can simplify the callers if we
         change kernel_dequeue_signal() to accept info => NULL.
      
      4. Remove _irqsave, it is never called from atomic context.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      be0e6f29
    • Oleg Nesterov's avatar
      signals: kill block_all_signals() and unblock_all_signals() · 2e01fabe
      Oleg Nesterov authored
      It is hardly possible to enumerate all problems with block_all_signals()
      and unblock_all_signals().  Just for example,
      
      1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is
         multithreaded. Another thread can dequeue the signal and force the
         group stop.
      
      2. Even is the caller is single-threaded, it will "stop" anyway. It
         will not sleep, but it will spin in kernel space until SIGCONT or
         SIGKILL.
      
      And a lot more. In short, this interface doesn't work at all, at least
      the last 10+ years.
      
      Daniel said:
      
        Yeah the only times I played around with the DRM_LOCK stuff was when
        old drivers accidentally deadlocked - my impression is that the entire
        DRM_LOCK thing was never really tested properly ;-) Hence I'm all for
        purging where this leaks out of the drm subsystem.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Acked-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: default avatarDave Airlie <airlied@redhat.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2e01fabe
    • Ryusuke Konishi's avatar
      nilfs2: fix gcc uninitialized-variable warnings in powerpc build · 4f05028f
      Ryusuke Konishi authored
      Some false positive warnings are reported for powerpc build.
      
      The following warnings are reported in
       http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/
      
         CC      fs/nilfs2/super.o
       fs/nilfs2/super.c: In function 'nilfs_resize_fs':
       fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here
         CC      fs/nilfs2/recovery.o
       fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs':
       fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here
       fs/nilfs2/recovery.c: In function 'nilfs_search_super_root':
       fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
      
      Another similar warning is reported in
       http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/
      
         CC      fs/nilfs2/btree.o
       fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert':
       include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here
      
      This cleans out these warnings by forcing the variables to be initialized.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: default avatarGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4f05028f
    • Ryusuke Konishi's avatar
      nilfs2: fix gcc unused-but-set-variable warnings · 09ef29e0
      Ryusuke Konishi authored
      Fix the following build warnings:
      
       $ make W=1
       [...]
         CC [M]  fs/nilfs2/btree.o
       fs/nilfs2/btree.c: In function 'nilfs_btree_split':
       fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable]
         __u64 newptr;
               ^
       fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable]
         __u64 newkey;
               ^
         CC [M]  fs/nilfs2/dat.o
       fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end':
       fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable]
         __u64 start;
               ^
         CC [M]  fs/nilfs2/segment.o
       fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush':
       fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
         int err;
             ^
         CC [M]  fs/nilfs2/sufile.o
       fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc':
       fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable]
         unsigned long nsegments, ncleansegs, nsus, cnt;
                                  ^
         CC [M]  fs/nilfs2/alloc.o
       fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry':
       fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable]
         unsigned long n, entries_per_group, groups_per_desc_block;
                                             ^
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      09ef29e0
    • Ryusuke Konishi's avatar
      MAINTAINERS: nilfs2: add header file for tracing · c35c7ac5
      Ryusuke Konishi authored
      This adds header file "include/trace/events/nilfs2.h" to maintainer-ship
      of nilfs2 so that updates to the nilfs2 header file go to the mailing list
      of nilfs2.
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c35c7ac5
    • Hitoshi Mitake's avatar
      nilfs2: add tracepoints for analyzing reading and writing metadata files · a9cd207c
      Hitoshi Mitake authored
      This patch adds tracepoints for analyzing requests of reading and writing
      metadata files.  The tracepoints cover every in-place mdt files (cpfile,
      sufile, and datfile).
      
      Example of tracing mdt_insert_new_block():
                    cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
                    cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
                    cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253
      Signed-off-by: default avatarHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a9cd207c
    • Hitoshi Mitake's avatar
      nilfs2: add tracepoints for analyzing sufile manipulation · 83eec5e6
      Hitoshi Mitake authored
      This patch adds tracepoints which would be useful for analyzing segment
      usage from a perspective of high level sufile manipulation (check, alloc,
      free).  sufile is an important in-place updated metadata file, so
      analyzing the behavior would be useful for performance turning.
      
      example of usage (a case of allocation):
      
      $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
      Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
              segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
              segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3
      Signed-off-by: default avatarHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benixon Dhas <benixon.dhas@wdc.com>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      83eec5e6
    • Hitoshi Mitake's avatar
      nilfs2: add a tracepoint for transaction events · 44fda114
      Hitoshi Mitake authored
      This patch adds a tracepoint for transaction events of nilfs.  With the
      tracepoint, these events can be tracked: begin, abort, commit, trylock,
      lock, and unlock.  Basically, these events have corresponding functions
      e.g.  begin event corresponds nilfs_transaction_begin().  The unlock event
      is an exception.  It corresponds to the iteration in
      nilfs_transaction_lock().
      
      Only one tracepoint is introcued: nilfs2_transaction_transition.  The
      above events are distinguished with newly introduced enum.  With this
      tracepoint, we can analyse a critical section of segment constructoin.
      
      Sample output by tpoint of perf-tools:
                    cp-4457  [000] ...1    63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
              segctord-4371  [001] ...1    68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
              segctord-4371  [001] ...1    68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
              segctord-4371  [001] ...1    68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
              segctord-4371  [001] ...1    68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
              segctord-4371  [001] ...1    68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
              segctord-4371  [001] ...1   132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
      
      This patch also does trivial cleaning of comma usage in collection stage
      transition event for consistent coding style.
      Signed-off-by: default avatarHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      44fda114
    • Hitoshi Mitake's avatar
      nilfs2: add a tracepoint for tracking stage transition of segment construction · 58497703
      Hitoshi Mitake authored
      This patch adds a tracepoint for tracking stage transition of block
      collection in segment construction.  With the tracepoint, we can analysis
      the behavior of segment construction in depth.  It would be useful for
      bottleneck detection and debugging, etc.
      
      The tracepoint is created with the standard trace API of linux (like ext3,
      ext4, f2fs and btrfs).  So we can analysis with existing tools easily.  Of
      course, more detailed analysis will be possible if we can create nilfs
      specific analysis tools.
      
      Below is an example of event dump with Brendan Gregg's perf-tools
      (https://github.com/brendangregg/perf-tools).  Time consumption between
      each stage can be obtained.
      
      $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
      Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
              segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
              segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
              segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
              segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
              segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
              segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
              segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE
      
      For capturing transition correctly, this patch adds wrappers for the
      member scnt of nilfs_cstage.  With this change, every transition of the
      stage can produce trace event in a correct manner.
      Signed-off-by: default avatarHitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      58497703
    • Ryusuke Konishi's avatar
      nilfs2: free unused dat file blocks during garbage collection · d0c14a9e
      Ryusuke Konishi authored
      As a nilfs2 volume ages, the amount of available disk space decreases
      little by little due to bloat of DAT (disk address translation) metadata
      file.  Even if we delete all files in a file system and free their block
      addresses from the DAT file through a garbage collection, empty DAT blocks
      are not freed.
      
      This fixes the issue by extending the deallocator of block addresses so
      that empty data blocks and empty bitmap blocks of DAT are deleted.
      
      The following comparison shows the effect of this patch.  Each shows disk
      amount information of a nilfs2 volume that we cleaned out by deleting all
      files and running gc after having filled 90% of its capacity.
      
      Before:
      Filesystem     1K-blocks     Used Available Use% Mounted on
      /dev/sda1      500105212  3022844 472072192   1% /test
      
      After:
      Filesystem     1K-blocks     Used Available Use% Mounted on
      /dev/sda1      500105212    16380 475078656   1% /test
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d0c14a9e
    • Ryusuke Konishi's avatar
      nilfs2: add helper functions to delete blocks from dat file · da019954
      Ryusuke Konishi authored
      This adds delete functions for data blocks of metadata files using bitmap
      based allocator.  nilfs_palloc_delete_entry_block() deletes an entry block
      (e.g.  block storing dat entries), and nilfs_palloc_delete_bitmap_block()
      deletes a bitmap block, respectively.
      
      These helpers are intended to be used in the successive change on
      deallocator of block addresses ("nilfs2: free unused dat file blocks
      during garbage collection").
      Signed-off-by: default avatarRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      da019954