1. 08 Nov, 2015 1 commit
  2. 07 Nov, 2015 39 commits
    • Merge branch 'akpm' (patches from Andrew) · ad804a0b
      Linus Torvalds authored
      Merge second patch-bomb from Andrew Morton:
      
       - most of the rest of MM
      
       - procfs
      
       - lib/ updates
      
       - printk updates
      
       - bitops infrastructure tweaks
      
       - checkpatch updates
      
       - nilfs2 update
      
       - signals
      
       - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
         dma-debug, dma-mapping, ...
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
        ipc,msg: drop dst nil validation in copy_msg
        include/linux/zutil.h: fix usage example of zlib_adler32()
        panic: release stale console lock to always get the logbuf printed out
        dma-debug: check nents in dma_sync_sg*
        dma-mapping: tidy up dma_parms default handling
        pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
        kexec: use file name as the output message prefix
        fs, seqfile: always allow oom killer
        seq_file: reuse string_escape_str()
        fs/seq_file: use seq_* helpers in seq_hex_dump()
        coredump: change zap_threads() and zap_process() to use for_each_thread()
        coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
        signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
        signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
        signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
        signals: kill block_all_signals() and unblock_all_signals()
        nilfs2: fix gcc uninitialized-variable warnings in powerpc build
        nilfs2: fix gcc unused-but-set-variable warnings
        MAINTAINERS: nilfs2: add header file for tracing
        nilfs2: add tracepoints for analyzing reading and writing metadata files
        ...
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · ab9f2faf
      Linus Torvalds authored
      Pull rdma updates from Doug Ledford:
       "This is my initial round of 4.4 merge window patches.  There are a few
        other things I wish to get in for 4.4 that aren't in this pull, as
        this represents what has gone through merge/build/run testing and not
        what is the last few items for which testing is not yet complete.
      
         - "Checksum offload support in user space" enablement
         - Misc cxgb4 fixes, add T6 support
         - Misc usnic fixes
         - 32 bit build warning fixes
         - Misc ocrdma fixes
         - Multicast loopback prevention extension
         - Extend the GID cache to store and return attributes of GIDs
         - Misc iSER updates
         - iSER clustering update
         - Network NameSpace support for rdma CM
         - Work Request cleanup series
         - New Memory Registration API"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
        IB/core, cma: Make __attribute_const__ declarations sparse-friendly
        IB/core: Remove old fast registration API
        IB/ipath: Remove fast registration from the code
        IB/hfi1: Remove fast registration from the code
        RDMA/nes: Remove old FRWR API
        IB/qib: Remove old FRWR API
        iw_cxgb4: Remove old FRWR API
        RDMA/cxgb3: Remove old FRWR API
        RDMA/ocrdma: Remove old FRWR API
        IB/mlx4: Remove old FRWR API support
        IB/mlx5: Remove old FRWR API support
        IB/srp: Dont allocate a page vector when using fast_reg
        IB/srp: Remove srp_finish_mapping
        IB/srp: Convert to new registration API
        IB/srp: Split srp_map_sg
        RDS/IW: Convert to new memory registration API
        svcrdma: Port to new memory registration API
        xprtrdma: Port to new memory registration API
        iser-target: Port to new memory registration API
        IB/iser: Port to new fast registration API
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial · 75021d28
      Linus Torvalds authored
      Pull trivial updates from Jiri Kosina:
       "Trivial stuff from trivial tree that can be trivially summed up as:
      
         - treewide drop of spurious unlikely() before IS_ERR() from Viresh
           Kumar
      
         - cosmetic fixes (that don't really affect basic functionality of the
           driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek
      
         - various comment / printk fixes and updates all over the place"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
        bcache: Really show state of work pending bit
        hwmon: applesmc: fix comment typos
        Kconfig: remove comment about scsi_wait_scan module
        class_find_device: fix reference to argument "match"
        debugfs: document that debugfs_remove*() accepts NULL and error values
        net: Drop unlikely before IS_ERR(_OR_NULL)
        mm: Drop unlikely before IS_ERR(_OR_NULL)
        fs: Drop unlikely before IS_ERR(_OR_NULL)
        drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
        drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
        UBI: Update comments to reflect UBI_METAONLY flag
        pktcdvd: drop null test before destroy functions
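
      Why the unlikely() wrappers are spurious can be shown with a small
      userspace model.  This is only a sketch: it assumes the err.h helpers
      of that era, where IS_ERR() already expands to an unlikely() test, and
      it relies on the GCC/Clang __builtin_expect() builtin.

```c
#include <stddef.h>

#define MAX_ERRNO 4095

/* Branch hint, as in the kernel; requires GCC or Clang. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Error codes live in the last MAX_ERRNO values of the address space. */
#define IS_ERR_VALUE(x) unlikely((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Decode the errno value carried by an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* The hint is already inside IS_ERR(), so unlikely(IS_ERR(p)) adds nothing. */
static inline int IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}
```

      With this model, "if (IS_ERR(p))" and "if (unlikely(IS_ERR(p)))" carry
      the same branch hint, which is the redundancy the series removes.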
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 6f1da317
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
       "Highlights:
      
         - Intel Skylake Win8 precision touchpads support fixes/improvements
           from Mika Westerberg
      
         - Lenovo Yoga 2 quirk from Ritesh Raj Sarraf
      
         - potential uninitialized buffer access fix in HID core from Richard
           Purdie
      
         - Wacom Intuos and Wacom Cintiq 2 support improvements from Jason
           Gerecke and Ping Cheng
      
         - initiation of sysfs deprecation process for most of the roccat
           drivers, from the roccat support maintainer Stefan Achatz
      
         - quite a few device ID / quirk additions and small fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (30 commits)
        HID: logitech: Add support for G29
        HID: logitech: Simplify wheel detection scheme
        HID: wacom: Call 'wacom_query_tablet_data' only after 'hid_hw_start'
        HID: wacom: Fix ABS_MISC reporting for Cintiq Companion 2
        HID: wacom: Remove useless conditions from 'wacom_query_tablet_data'
        HID: wacom: fix Intuos wireless report id issue
        HID: fix some indenting issues
        HID: wacom: Expect 'touch_max' touches if HID_DG_CONTACTCOUNT not present
        HID: wacom: Tie cached HID_DG_CONTACTCOUNT indices to report ID
        HID: roccat: Fixed resubmit: Deprecating most Roccat sysfs attributes
        HID: wacom: Report full pressure range for Intuos, Cintiq 13HD Touch
        HID: wacom: Add support for Cintiq Companion 2
        HID: multitouch: Fetch feature reports on demand for Win8 devices
        HID: sensor-hub: Add quirk for Lenovo Yoga 2 with ITE Chips
        HID: usbhid: Fix for the WiiU adapter from Mayflash
        HID: corsair: boolify struct k90_led.removed
        HID: corsair: Add Corsair Vengeance K90 driver
        HID: hid-input: allow input_configured callback return errors
        HID: multitouch: Add suffix for HID_DG_TOUCHPAD
        HID: i2c-hid: Fill in physical device providing HID functionality
        ...
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching · 99aaa9c6
      Linus Torvalds authored
      Pull livepatching fix from Jiri Kosina:
       "A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
        (as in such case it's possible for module struct to share a page with
        executable text, which is currently not being handled with grace) from
        Josh Poimboeuf"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
        livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
    • ipc,msg: drop dst nil validation in copy_msg · 5f2a2d5d
      Davidlohr Bueso authored
      d0edd852 ("ipc: convert invalid scenarios to use WARN_ON") relaxed the
      nil dst parameter check, originally being a full BUG_ON.  However, this
      check seems quite unnecessary when the only purpose is for
      checkpoint/restore (MSG_COPY flag):
      
      o The copy variable is set initially to nil, apparently as a way of
        ensuring that prepare_copy is previously called.  Which is in fact done,
        unconditionally at the beginning of do_msgrcv.
      
      o There is no concurrency with 'copy' (stack allocated in do_msgrcv).
      
      Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should
      always be handled by the IS_ERR() family.  Therefore remove this check altogether
      as it can never occur with the current users.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/zutil.h: fix usage example of zlib_adler32() · cb7ae262
      Anish Bhatt authored
      adler32 has been named zlib_adler32 since before 2.6.11.
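
      For reference, the usage pattern that such an example documents
      (seed the checksum, then feed it one chunk at a time) can be sketched
      in userspace.  adler32_update() and adler32_chunked() below are
      hypothetical stand-ins written for illustration, not the kernel's
      zlib_adler32(); they only follow the same calling convention.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u	/* largest prime below 2^16 */

/*
 * Hypothetical stand-in for an adler32-style update function.
 * Passing buf == NULL returns the required initial value (1).
 */
static uint32_t adler32_update(uint32_t adler, const unsigned char *buf,
			       size_t len)
{
	uint32_t a = adler & 0xffffu;
	uint32_t b = (adler >> 16) & 0xffffu;
	size_t i;

	if (buf == NULL)
		return 1u;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % ADLER_MOD;
		b = (b + a) % ADLER_MOD;
	}
	return (b << 16) | a;
}

/* Checksum a buffer chunk by chunk, carrying the running value along. */
static uint32_t adler32_chunked(const unsigned char *buf, size_t len,
				size_t chunk)
{
	uint32_t adler = adler32_update(0, NULL, 0);
	size_t off = 0;

	if (chunk == 0)
		chunk = 1;	/* avoid an endless loop on a zero chunk size */

	while (off < len) {
		size_t n = (len - off < chunk) ? len - off : chunk;

		adler = adler32_update(adler, buf + off, n);
		off += n;
	}
	return adler;
}
```

      The result does not depend on the chunk size, which is what makes the
      incremental calling convention usable for streamed data.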
      Signed-off-by: Anish Bhatt <anish@chelsio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • panic: release stale console lock to always get the logbuf printed out · 08d78658
      Vitaly Kuznetsov authored
      In some cases we may end up killing the CPU holding the console lock
      while still having valuable data in logbuf. E.g. I'm observing the
      following:
      
      - A crash is happening on one CPU and console_unlock() is being called on
        some other.
      
      - console_unlock() tries to print out the buffer before releasing the lock
        and on slow console it takes time.
      
      - in the meanwhile crashing CPU does lots of printk()-s with valuable data
        (which go to the logbuf) and sends IPIs to all other CPUs.
      
      - console_unlock() finishes printing previous chunk and enables interrupts
        before trying to print out the rest, the CPU catches the IPI and never
        releases console lock.
      
      This is not the only possible case: in VT/fb subsystems we have many other
      console_lock()/console_unlock() users.  Non-masked interrupts (or
      receiving NMI in case of extreme slowness) will have the same result.
      Getting the whole console buffer printed out on crash should be top
      priority.
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-debug: check nents in dma_sync_sg* · 7f830642
      Robin Murphy authored
      Like dma_unmap_sg, dma_sync_sg* should be called with the original number
      of entries passed to dma_map_sg, so do the same check in the sync path as
      we do in the unmap path.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Sakari Ailus <sakari.ailus@iki.fi>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-mapping: tidy up dma_parms default handling · 002edb6f
      Robin Murphy authored
      Many DMA controllers and other devices set max_segment_size to
      indicate their scatter-gather capability, but have no interest in
      segment_boundary_mask. However, the existence of a dma_parms structure
      precludes the use of any default value, leaving them as zeros (assuming
      a properly kzalloc'ed structure). If a well-behaved IOMMU (or SWIOTLB)
      then tries to respect this by ensuring a mapped segment does not cross
      a zero-byte boundary, hilarity ensues.
      
      Since zero is a nonsensical value for either parameter, treat it as an
      indicator for "default", as might be expected. In the process, clean up
      a bit by replacing the bare constants with slightly more meaningful
      macros and removing the superfluous "else" statements.
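
      The "zero means default" rule can be sketched as a userspace model.
      The struct and helper names below are hypothetical, and the 32-bit
      boundary default is an assumption made for illustration; only the
      SZ_64K segment-size default is hinted at by this commit message.

```c
#include <stddef.h>

#define SZ_64K			0x00010000u
/* Assumed default boundary mask, standing in for DMA_BIT_MASK(32). */
#define DEFAULT_SEG_BOUNDARY	0xffffffffUL

/* Hypothetical model of the two per-device DMA parameters. */
struct dma_parms_model {
	unsigned int max_segment_size;
	unsigned long segment_boundary_mask;
};

/* A missing structure or a zero field both select the default. */
static unsigned int model_max_seg_size(const struct dma_parms_model *p)
{
	if (p && p->max_segment_size)
		return p->max_segment_size;
	return SZ_64K;
}

static unsigned long model_seg_boundary(const struct dma_parms_model *p)
{
	if (p && p->segment_boundary_mask)
		return p->segment_boundary_mask;
	return DEFAULT_SEG_BOUNDARY;
}
```

      A device that fills in only max_segment_size then still gets a sane
      boundary mask instead of a zero-byte boundary.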
      
      [akpm@linux-foundation.org: dma-mapping.h needs sizes.h for SZ_64K]
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Reviewed-by: Sumit Semwal <sumit.semwal@linaro.org>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Sakari Ailus <sakari.ailus@iki.fi>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode · 8639b461
      Ben Segall authored
      setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of
      the current pid namespace.  This is in contrast to both the other modes of
      setpriority and the example of kill(-1).  Fix this.  getpriority and
      ioprio have the same failure mode, fix them too.
      
      Eric said:
      
      : After some more thinking about it this patch sounds justifiable.
      :
      : My goal with namespaces is not to build perfect isolation mechanisms
      : as that can get into ill defined territory, but to build well defined
      : mechanisms.  And to handle the corner cases so you can use only
      : a single namespace with well defined results.
      :
      : In this case you have found the two interfaces I am aware of that
      : identify processes by uid instead of by pid.  Which quite frankly is
      : weird.  Unfortunately the weird unexpected cases are hard to handle
      : in the usual way.
      :
      : I was hoping for a little more information.  Changes like this one we
      : have to be careful of because someone might be depending on the current
      : behavior.  I don't think they are and I do think this make sense as part
      : of the pid namespace.
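
      For readers unfamiliar with the interface, the PRIO_USER mode selects
      processes by uid rather than by pid.  A minimal userspace sketch of
      the read side follows; user_priority() is a hypothetical wrapper
      name, and the example shows only the call in question, not the
      namespace fix itself.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Query the priority for the caller's own user.  With PRIO_USER,
 * who == 0 means the real UID of the calling process.
 * Returns 0 and stores the value in *prio, or -1 on error.
 */
static int user_priority(int *prio)
{
	int value;

	/* -1 is a legitimate priority, so errno must be cleared first. */
	errno = 0;
	value = getpriority(PRIO_USER, 0);
	if (value == -1 && errno != 0)
		return -1;

	*prio = value;
	return 0;
}
```

      Because the selection is by uid, the set of matching tasks is not
      naturally bounded by a pid namespace, which is the case the patch
      addresses.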
      Signed-off-by: Ben Segall <bsegall@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ambrose Feinstein <ambrose@google.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: use file name as the output message prefix · de90a6bc
      Minfei Huang authored
      kexec output messages have been missing the "kexec" prefix since Dave Young
      split the kexec code.  Use the file name as the output message prefix instead.
      
      Currently, the format of output message:
      [  140.290795] SYSC_kexec_load: hello, world
      [  140.291534] kexec: sanity_check_segment_list: hello, world
      
      Ideally, the format of output message:
      [   30.791503] kexec: SYSC_kexec_load, Hello, world
      [   79.182752] kexec_core: sanity_check_segment_list, Hello, world
      
      Remove the custom prefix "kexec" in output message.
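
      A file-name prefix of this kind is usually obtained with the pr_fmt()
      idiom.  The userspace sketch below assumes that idiom is what is
      meant here; pr_info_to() is a hypothetical stand-in that formats into
      a buffer instead of the kernel log, and KBUILD_MODNAME is defined by
      hand because no kernel build system is involved.

```c
#include <stdio.h>

/* In a kernel build this is supplied per object file by kbuild. */
#define KBUILD_MODNAME "kexec_core"

/* Prepend the module (file) name to every format string. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/*
 * Hypothetical stand-in for pr_info(): same prefixing, but the result
 * goes to a caller-supplied buffer.  ##__VA_ARGS__ is a GNU extension.
 */
#define pr_info_to(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)
```

      Every message emitted from the file then carries the same prefix
      without each call site spelling it out.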
      Signed-off-by: Minfei Huang <mnfhuang@gmail.com>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs, seqfile: always allow oom killer · 0f930902
      Greg Thelen authored
      Since 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill
      processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or
      smaller allocations; but larger allocations can use the oom killer via
      vmalloc().  Thus reads of small files can return ENOMEM, but larger files
      use the oom killer to avoid ENOMEM.
      
      The effect of this bug is that reads from /proc and other virtual
      filesystems can return ENOMEM instead of the preferred behavior - oom
      killing something (possibly the calling process).  I don't know of anyone
      except Google who has noticed the issue.
      
      I suspect the fix is more needed in smaller systems where there isn't any
      reclaimable memory.  But these seem like the kinds of systems which
      probably don't use the oom killer for production situations.
      
      Memory overcommit requires use of the oom killer to select a victim
      regardless of file size.
      
      Enable oom killer for small seq_buf_alloc() allocations.
      
      Fixes: 5cec38ac ("fs, seq_file: fallback to vmalloc instead of oom kill processes")
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • seq_file: reuse string_escape_str() · 25c6bb76
      Andy Shevchenko authored
      string_escape_str() escapes an input string according to given criteria.
      In the case of seq_escape() the criterion is to convert some characters to
      their octal representation.
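
      The octal criterion can be illustrated in userspace.  escape_octal()
      below is a hypothetical helper written for this note, not the
      kernel's string_escape_str(); it replaces each character found in
      'esc' with a backslash followed by three octal digits.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst, replacing every character listed in esc with
 * "\ooo".  dstsz includes room for the terminating NUL.
 * Returns the output length, or -1 if dst is too small.
 */
static int escape_octal(char *dst, size_t dstsz, const char *src,
			const char *esc)
{
	size_t out = 0;

	if (dstsz == 0)
		return -1;

	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (strchr(esc, c)) {
			/* Needs four characters plus the final NUL. */
			if (out + 4 >= dstsz)
				return -1;
			snprintf(dst + out, 5, "\\%03o", (unsigned int)c);
			out += 4;
		} else {
			if (out + 1 >= dstsz)
				return -1;
			dst[out++] = (char)c;
		}
	}
	dst[out] = '\0';
	return (int)out;
}
```

      For example, escaping space and newline in "a b\n" yields the ten
      characters a\040b\012.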
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/seq_file: use seq_* helpers in seq_hex_dump() · 8b91a318
      Andy Shevchenko authored
      This improves code readability.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: change zap_threads() and zap_process() to use for_each_thread() · d61ba589
      Oleg Nesterov authored
      Change zap_threads() paths to use for_each_thread() rather than
      while_each_thread().
      
      While at it, change zap_threads() to avoid the nested if's to make the
      code more readable and lessen the indentation.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kyle Walker <kwalker@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Stanislav Kozina <skozina@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP · 5fa534c9
      Oleg Nesterov authored
      task_will_free_mem() is wrong in many ways, and in particular the
      SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the
      coredumping without SIGNAL_GROUP_COREDUMP bit set.
      
      Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if
      other CLONE_VM processes can't react to SIGKILL.  Fortunately, at least
      the oom-kill case is fine; it kills all tasks sharing the same mm, so it
      should also kill the process which actually dumps the core.
      
      The change in prepare_signal() is not strictly necessary, it just ensures
      that the patch does not bring another subtle behavioural change.  But it
      reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Kyle Walker <kwalker@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Stanislav Kozina <skozina@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) · 9317bb96
      Oleg Nesterov authored
      jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason:
      SIGCONT will wake a stopped task up even if it is ignored.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() · 9a13049e
      Oleg Nesterov authored
      jffs2_garbage_collect_thread() can race with SIGCONT and sleep in
      TASK_STOPPED state after it was already sent. Add the new helper,
      kernel_signal_stop(), which does this correctly.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signal: turn dequeue_signal_lock() into kernel_dequeue_signal() · be0e6f29
      Oleg Nesterov authored
      1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
         matches another "for kthreads only" kernel_sigaction() helper.
      
      2. Remove the "tsk" and "mask" arguments, they are always current
         and current->blocked. And it is simply wrong if tsk != current.
      
      3. We could also remove the 3rd "siginfo_t *info" arg but it looks
         potentially useful. However we can simplify the callers if we
         change kernel_dequeue_signal() to accept info => NULL.
      
      4. Remove _irqsave, it is never called from atomic context.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: kill block_all_signals() and unblock_all_signals() · 2e01fabe
      Oleg Nesterov authored
      It is hardly possible to enumerate all problems with block_all_signals()
      and unblock_all_signals().  Just for example,
      
      1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is
         multithreaded. Another thread can dequeue the signal and force the
         group stop.
      
      2. Even if the caller is single-threaded, it will "stop" anyway. It
         will not sleep, but it will spin in kernel space until SIGCONT or
         SIGKILL.
      
      And a lot more. In short, this interface hasn't worked at all for at
      least the last 10+ years.
      
      Daniel said:
      
        Yeah the only times I played around with the DRM_LOCK stuff was when
        old drivers accidentally deadlocked - my impression is that the entire
        DRM_LOCK thing was never really tested properly ;-) Hence I'm all for
        purging where this leaks out of the drm subsystem.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: Dave Airlie <airlied@redhat.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: fix gcc uninitialized-variable warnings in powerpc build · 4f05028f
      Ryusuke Konishi authored
      Some false positive warnings are reported for powerpc build.
      
      The following warnings are reported in
       http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/
      
         CC      fs/nilfs2/super.o
       fs/nilfs2/super.c: In function 'nilfs_resize_fs':
       fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here
         CC      fs/nilfs2/recovery.o
       fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs':
       fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here
       fs/nilfs2/recovery.c: In function 'nilfs_search_super_root':
       fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
      
      Another similar warning is reported in
       http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/
      
         CC      fs/nilfs2/btree.o
       fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert':
       include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized]
       fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here
      
      This cleans out these warnings by forcing the variables to be initialized.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: fix gcc unused-but-set-variable warnings · 09ef29e0
      Ryusuke Konishi authored
      Fix the following build warnings:
      
       $ make W=1
       [...]
         CC [M]  fs/nilfs2/btree.o
       fs/nilfs2/btree.c: In function 'nilfs_btree_split':
       fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable]
         __u64 newptr;
               ^
       fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable]
         __u64 newkey;
               ^
         CC [M]  fs/nilfs2/dat.o
       fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end':
       fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable]
         __u64 start;
               ^
         CC [M]  fs/nilfs2/segment.o
       fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush':
       fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
         int err;
             ^
         CC [M]  fs/nilfs2/sufile.o
       fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc':
       fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable]
         unsigned long nsegments, ncleansegs, nsus, cnt;
                                  ^
         CC [M]  fs/nilfs2/alloc.o
       fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry':
       fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable]
         unsigned long n, entries_per_group, groups_per_desc_block;
                                             ^
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: nilfs2: add header file for tracing · c35c7ac5
      Ryusuke Konishi authored
      This adds header file "include/trace/events/nilfs2.h" to maintainer-ship
      of nilfs2 so that updates to the nilfs2 header file go to the mailing list
      of nilfs2.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add tracepoints for analyzing reading and writing metadata files · a9cd207c
      Hitoshi Mitake authored
      This patch adds tracepoints for analyzing requests of reading and writing
      metadata files.  The tracepoints cover all in-place mdt files (cpfile,
      sufile, and datfile).
      
      Example of tracing mdt_insert_new_block():
                    cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
                    cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
                    cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nilfs2: add tracepoints for analyzing sufile manipulation · 83eec5e6
      Hitoshi Mitake authored
      This patch adds tracepoints that are useful for analyzing segment usage
      from the perspective of high-level sufile manipulation (check, alloc,
      free).  sufile is an important in-place updated metadata file, so
      analyzing its behavior is useful for performance tuning.
      
      example of usage (a case of allocation):
      
      $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
      Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
              segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
              segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benixon Dhas <benixon.dhas@wdc.com>
      Cc: TK Kato <TK.Kato@wdc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83eec5e6
    • Hitoshi Mitake's avatar
      nilfs2: add a tracepoint for transaction events · 44fda114
      Hitoshi Mitake authored
      This patch adds a tracepoint for transaction events of nilfs.  With the
      tracepoint, these events can be tracked: begin, abort, commit, trylock,
      lock, and unlock.  Basically, each event has a corresponding function,
      e.g. the begin event corresponds to nilfs_transaction_begin().  The
      unlock event is an exception: it corresponds to the iteration in
      nilfs_transaction_lock().
      
      Only one tracepoint is introduced: nilfs2_transaction_transition.  The
      above events are distinguished by a newly introduced enum.  With this
      tracepoint, we can analyze a critical section of segment construction.
      
      Sample output by tpoint of perf-tools:
                    cp-4457  [000] ...1    63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
                    cp-4457  [000] ...1    63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
              segctord-4371  [001] ...1    68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
              segctord-4371  [001] ...1    68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
              segctord-4371  [001] ...1    68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
              segctord-4371  [001] ...1    68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
              segctord-4371  [001] ...1    68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
              segctord-4371  [001] ...1   132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
      
      This patch also does trivial cleaning of comma usage in collection stage
      transition event for consistent coding style.
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44fda114
    • Hitoshi Mitake's avatar
      nilfs2: add a tracepoint for tracking stage transition of segment construction · 58497703
      Hitoshi Mitake authored
      This patch adds a tracepoint for tracking stage transitions of block
      collection in segment construction.  With the tracepoint, we can analyze
      the behavior of segment construction in depth.  It would be useful for
      bottleneck detection, debugging, etc.
      
      The tracepoint is created with the standard trace API of Linux (like ext3,
      ext4, f2fs and btrfs), so we can analyze it easily with existing tools.  Of
      course, more detailed analysis will be possible if we create nilfs
      specific analysis tools.
      
      Below is an example of event dump with Brendan Gregg's perf-tools
      (https://github.com/brendangregg/perf-tools).  Time consumption between
      each stage can be obtained.
      
      $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
      Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
              segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
              segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
              segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
              segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
              segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
              segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
              segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
              segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE
      
      For capturing transition correctly, this patch adds wrappers for the
      member scnt of nilfs_cstage.  With this change, every transition of the
      stage can produce trace event in a correct manner.
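
      The wrapper idea can be sketched in userspace C as follows.  This is
      only an illustration under stated assumptions: the names cstage_set(),
      cstage_inc(), cstage_get() and cstage_trace_count() are hypothetical,
      and a counter stands in for the real tracepoint call.  The stage names
      are the ones visible in the trace output above.

```c
/* Stage names taken from the sample trace output; the numeric values are
 * an assumption of this sketch. */
enum stage {
    ST_INIT, ST_GC, ST_FILE, ST_IFILE, ST_CPFILE,
    ST_SUFILE, ST_DAT, ST_SR, ST_DONE
};

/* Modeled on the description of nilfs_cstage: scnt holds the stage. */
struct cstage {
    enum stage scnt;
};

/* Stand-ins for the tracepoint: remember what was "traced". */
static unsigned long trace_events;
static enum stage last_traced;

static void trace_stage_transition(const struct cstage *cs)
{
    trace_events++;
    last_traced = cs->scnt;
}

/* Every write to scnt goes through a wrapper, so no transition can
 * bypass the trace hook. */
void cstage_set(struct cstage *cs, enum stage next)
{
    cs->scnt = next;
    trace_stage_transition(cs);
}

/* Advance to the next stage; does nothing once ST_DONE is reached. */
void cstage_inc(struct cstage *cs)
{
    if (cs->scnt < ST_DONE)
        cstage_set(cs, (enum stage)(cs->scnt + 1));
}

enum stage cstage_get(const struct cstage *cs)
{
    return cs->scnt;
}

unsigned long cstage_trace_count(void)
{
    return trace_events;
}
```

      Routing every assignment through cstage_set() is what makes the event
      stream complete: a direct write to scnt would change the stage without
      leaving a trace record.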
      Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      58497703
    • Ryusuke Konishi's avatar
      nilfs2: free unused dat file blocks during garbage collection · d0c14a9e
      Ryusuke Konishi authored
      As a nilfs2 volume ages, the amount of available disk space decreases
      little by little due to bloat of DAT (disk address translation) metadata
      file.  Even if we delete all files in a file system and free their block
      addresses from the DAT file through a garbage collection, empty DAT blocks
      are not freed.
      
      This fixes the issue by extending the deallocator of block addresses so
      that empty data blocks and empty bitmap blocks of DAT are deleted.
      
      The following comparison shows the effect of this patch.  Each shows disk
      amount information of a nilfs2 volume that we cleaned out by deleting all
      files and running gc after having filled 90% of its capacity.
      
      Before:
      Filesystem     1K-blocks     Used Available Use% Mounted on
      /dev/sda1      500105212  3022844 472072192   1% /test
      
      After:
      Filesystem     1K-blocks     Used Available Use% Mounted on
      /dev/sda1      500105212    16380 475078656   1% /test
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0c14a9e
    • Ryusuke Konishi's avatar
      nilfs2: add helper functions to delete blocks from dat file · da019954
      Ryusuke Konishi authored
      This adds delete functions for data blocks of metadata files using bitmap
      based allocator.  nilfs_palloc_delete_entry_block() deletes an entry block
      (e.g.  block storing dat entries), and nilfs_palloc_delete_bitmap_block()
      deletes a bitmap block, respectively.
      
      These helpers are intended to be used in the subsequent change to the
      deallocator of block addresses ("nilfs2: free unused dat file blocks
      during garbage collection").
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da019954
    • Ryusuke Konishi's avatar
      nilfs2: get rid of nilfs_palloc_group_is_in() · b2258094
      Ryusuke Konishi authored
      This unfolds the nilfs_palloc_group_is_in() helper function into
      nilfs_palloc_freev() to simplify a range check and an index
      calculation repeatedly performed in a loop of that function.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b2258094
    • Ryusuke Konishi's avatar
      nilfs2: refactor nilfs_palloc_find_available_slot() · 18c41b37
      Ryusuke Konishi authored
      The current implementation of nilfs_palloc_find_available_slot() function
      is overkill.  The underlying bit search routine is well optimized, so this
      uses it more simply in nilfs_palloc_find_available_slot().
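
      A minimal userspace sketch of the simplified search, assuming the slot
      lookup amounts to "find the first clear bit at or after the target,
      then wrap around to the start".  next_zero_bit() and
      find_available_slot() are hypothetical names for this illustration;
      the real code relies on the kernel's optimized bit search and takes a
      spinlock, both of which are omitted here.

```c
#include <limits.h>
#include <stddef.h>

/* Return the index of the first clear bit in [start, end), or end if
 * every bit in that range is set. */
static size_t next_zero_bit(const unsigned char *bitmap, size_t end,
                            size_t start)
{
    for (size_t i = start; i < end; i++)
        if (!(bitmap[i / CHAR_BIT] & (1u << (i % CHAR_BIT))))
            return i;
    return end;
}

/* Claim the first free slot at or after target, wrapping around to bit 0
 * if the tail of the bitmap is full.  Returns the slot number, or -1 if
 * no slot is free or target is out of range. */
long find_available_slot(unsigned char *bitmap, size_t nbits, size_t target)
{
    size_t pos;

    if (target >= nbits)
        return -1;

    pos = next_zero_bit(bitmap, nbits, target);
    if (pos == nbits) {
        /* Nothing free in [target, nbits): search [0, target). */
        pos = next_zero_bit(bitmap, target, 0);
        if (pos == target)
            return -1;
    }

    /* Mark the slot as allocated. */
    bitmap[pos / CHAR_BIT] |= (unsigned char)(1u << (pos % CHAR_BIT));
    return (long)pos;
}
```

      Two calls to one search routine replace any hand-rolled word-by-word
      scanning, which is the simplification the commit message describes.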
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      18c41b37
    • Ryusuke Konishi's avatar
      nilfs2: do not call nilfs_mdt_bgl_lock() needlessly · 4e9e63a6
      Ryusuke Konishi authored
      In the bitmap based allocator implementation, nilfs_mdt_bgl_lock() helper
      is frequently used to get a spinlock protecting a target block group.
      This reduces its usage and simplifies arguments of some related functions
      by directly passing a pointer to the spinlock.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4e9e63a6
    • Ryusuke Konishi's avatar
      nilfs2: use nilfs_warning() in allocator implementation · b7bed712
      Ryusuke Konishi authored
      This uses nilfs_warning() to replace "printk(KERN_WARNING ...);" in the
      bitmap based allocator implementation of nilfs2.  The warning messages are
      modified to include the device name and the inode number in each message.
      This makes it clear which metadata file of which device has output
      warnings such as "entry number xxxx already freed".
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b7bed712
    • Julia Lawall's avatar
      nilfs2: drop null test before destroy functions · da80a39f
      Julia Lawall authored
      Remove unneeded NULL test.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@ expression x; @@
      -if (x != NULL)
        \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
      // </smpl>
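
      The effect of the semantic patch can be illustrated with a userspace
      analogy.  The sketch assumes, as the commit implies for the kernel
      destroy functions, that the destroy routine itself tolerates NULL.
      struct pool, pool_create(), pool_destroy() and live_pools are
      hypothetical stand-ins, not kernel APIs.

```c
#include <stdlib.h>

/* Hypothetical resource type standing in for a kmem cache or mempool. */
struct pool {
    void *storage;
};

/* Number of pools currently alive, to make the effect observable. */
int live_pools;

struct pool *pool_create(size_t size)
{
    struct pool *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    p->storage = malloc(size);
    if (!p->storage) {
        free(p);
        return NULL;
    }
    live_pools++;
    return p;
}

/* Like free(), this accepts NULL and does nothing in that case. */
void pool_destroy(struct pool *p)
{
    if (!p)
        return;
    free(p->storage);
    free(p);
    live_pools--;
}

/* Before: the caller repeats a check the callee already performs. */
void teardown_before(struct pool *p)
{
    if (p != NULL)
        pool_destroy(p);
}

/* After: same behavior, one line shorter. */
void teardown_after(struct pool *p)
{
    pool_destroy(p);
}
```

      Both teardown variants behave identically for NULL and non-NULL
      arguments, which is why the guard can be dropped mechanically.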
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da80a39f
    • Joe Perches's avatar
      checkpatch: improve the unnecessary initialisers tests · 6d32f7a3
      Joe Perches authored
      Global and static variables don't need to be initialized to 0.
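
      As a short illustration of that rule: standard C zero-initializes
      objects with static storage duration, so the explicit initializers
      shown in the comment below are redundant.  The variable and function
      names are made up for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* No initializers needed: static storage is zero-initialized. */
static int retry_count;          /* starts as 0 */
static bool feature_enabled;     /* starts as false */
static const char *last_error;   /* starts as NULL */

/*
 * The style being discouraged would read:
 *
 *     static int retry_count = 0;
 *     static bool feature_enabled = false;
 *     static const char *last_error = NULL;
 */

int get_retry_count(void)
{
    return retry_count;
}

bool is_feature_enabled(void)
{
    return feature_enabled;
}

const char *get_last_error(void)
{
    return last_error;
}
```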
      
      There is already a test for this but the output message doesn't
      mention booleans initialized to false.
      
      Improve the output message and the test by adding various forms
      with possible specific integer types and possible multiple zeros.
      
      Miscellanea:
      
      o Use a variable to hold the possible 0 test
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Shailendra Verma <shailendra.v@samsung.com>
      Tested-by: Shailendra Verma <shailendra.v@samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6d32f7a3
    • Joe Perches's avatar
      checkpatch: improve tests for fixes:, long lines and stack dumps in commit log · 369c8dd3
      Joe Perches authored
      Including BUG and stack dumps in commit logs makes checkpatch produce some
      false positive warning messages.
      
      checkpatch has multiple types of false positives:
      
      o Commit message lines > 75 chars
      o Stack dump addresses are mistaken for git commit IDs
      o Link: and Fixes: lines are allowed to be > 75 chars.
      o Fixes: style doesn't require ("<commit_description>")
        parentheses and double quotes like other uses of
        git commit ID and description.
      
      Fix these.
      
      Miscellanea:
      
      o Move the test for checking $commit_log_possible_stack_dump
        above the test for a long line commit message
      o Add test for hex address surrounded by square or angle brackets
      Signed-off-by: Joe Perches <joe@perches.com>
      Reported-by: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      369c8dd3
    • Andy Shevchenko's avatar
      lib/hexdump.c: truncate output in case of overflow · 9f029f54
      Andy Shevchenko authored
      There is a classic off-by-one error when we try to place, for example,
      1+1 bytes as hex in a buffer of size 6.  The expected result is
      truncated output, but in reality we get 6 bytes filled, followed by a
      terminating NUL.
      
      Change the logic how we fill the output in case of byte dumping into
      limited space.  This will follow the snprintf() behaviour by truncating
      output even on half bytes.
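
      The snprintf()-style truncation can be sketched in userspace as
      follows.  This is not the kernel's hex_dump_to_buffer(); the function
      name hex_bytes_truncated() and its output layout ("xx xx xx") are
      assumptions made for the illustration.

```c
#include <stddef.h>

/*
 * Write src[0..len) as space-separated hex pairs into dst.
 *
 * At most dstsize - 1 characters are stored and the result is always
 * NUL-terminated when dstsize > 0, so output may stop in the middle of
 * a byte.  Like snprintf(), the return value is the length the complete
 * output would have had.
 */
size_t hex_bytes_truncated(const unsigned char *src, size_t len,
                           char *dst, size_t dstsize)
{
    static const char digits[] = "0123456789abcdef";
    size_t pos = 0;   /* length of the untruncated output so far */

    for (size_t i = 0; i < len; i++) {
        char chunk[3];
        size_t n = 0;

        if (i > 0)
            chunk[n++] = ' ';
        chunk[n++] = digits[src[i] >> 4];
        chunk[n++] = digits[src[i] & 0x0f];

        /* Store only what fits, but keep counting. */
        for (size_t j = 0; j < n; j++, pos++)
            if (dstsize > 0 && pos < dstsize - 1)
                dst[pos] = chunk[j];
    }

    if (dstsize > 0)
        dst[pos < dstsize - 1 ? pos : dstsize - 1] = '\0';
    return pos;
}
```

      A caller detects truncation the same way as with snprintf(): the
      return value is greater than or equal to the buffer size.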
      
      Fixes: 114fc1af (hexdump: make it return number of bytes placed in buffer)
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
      Tested-by: Aaro Koskinen <aaro.koskinen@nokia.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f029f54
    • Cody P Schafer's avatar
      rbtree: clarify documentation of rbtree_postorder_for_each_entry_safe() · 8de1ee7e
      Cody P Schafer authored
      I noticed that commit a20135ff ("writeback: don't drain
      bdi_writeback_congested on bdi destruction") added a usage of
      rbtree_postorder_for_each_entry_safe() in mm/backing-dev.c which appears
      to try to rb_erase() elements from an rbtree while iterating over it using
      rbtree_postorder_for_each_entry_safe().
      
      Doing this will cause random nodes to be missed by the iteration because
      rb_erase() may rebalance the tree, changing the ordering that we're trying
      to iterate over.
      
      The previous documentation for rbtree_postorder_for_each_entry_safe()
      wasn't clear that this isn't allowed.  It was taken from the docs for
      list_for_each_entry_safe(), where erasing isn't a problem because
      list_del() does not reorder the list.
      
      Explicitly warn developers about this potential pitfall.
      
      Note that I haven't fixed the actual issue that (it appears) the commit
      referenced above introduced (not familiar enough with that code).
      
      In general (and in this case), the patterns to follow are:
       - switch to rb_first() + rb_erase(), don't use
         rbtree_postorder_for_each_entry_safe().
       - keep the postorder iteration and don't rb_erase() at all. Instead
         just clear the fields of rb_node & cgwb_congested_tree as required by
         other users of those structures.
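
      The second pattern can be illustrated with a userspace analogy.  The
      sketch below uses a plain, unbalanced binary search tree rather than
      the kernel rbtree API, and the names tree_insert() and
      tree_destroy_postorder() are hypothetical.  It shows why postorder
      traversal suits teardown: children are visited before their parent,
      and nothing is relinked or rebalanced while the walk is in progress.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Insert key and return the (possibly new) root.  If allocation fails,
 * the tree is left unchanged; an empty tree then stays NULL. */
struct node *tree_insert(struct node *root, int key)
{
    if (!root) {
        struct node *n = malloc(sizeof(*n));

        if (!n)
            return NULL;
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }

    if (key < root->key) {
        struct node *child = tree_insert(root->left, key);

        if (child)
            root->left = child;
    } else {
        struct node *child = tree_insert(root->right, key);

        if (child)
            root->right = child;
    }
    return root;
}

/* Free the whole tree in postorder and return the number of nodes freed.
 * No node is unlinked from its parent: the tree shape stays fixed until
 * each node itself is released, so no node can be skipped. */
size_t tree_destroy_postorder(struct node *n)
{
    size_t freed;

    if (!n)
        return 0;
    freed = tree_destroy_postorder(n->left);
    freed += tree_destroy_postorder(n->right);
    free(n);
    return freed + 1;
}
```

      An erase operation that rebalances would change which nodes are
      children of which during the walk, which is the hazard the commit
      message describes for rb_erase().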
      
      [akpm@linux-foundation.org: tweak comments]
      Signed-off-by: Cody P Schafer <dev@codyps.com>
      Cc: John de la Garza <john@jjdev.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8de1ee7e