1. 27 May, 2016 16 commits
    • MAINTAINERS: add kexec_core.c and kexec_file.c · 10540a69
      Minfei Huang authored
      The commits below split kexec.c into kexec.c, kexec_file.c and
      kexec_core.c.
      
      commit a43cac0d ("kexec: split kexec_file syscall code to kexec_file.c")
      commit 2965faa5 ("kexec: split kexec_load syscall from kexec core code")
      
      Both kexec_file.c and kexec_core.c still belong to the kexec component.
      So that the get_maintainer.pl script reports the correct mailing lists,
      add these files to MAINTAINERS.
      
      Link: http://lkml.kernel.org/r/1464189735-59113-1-git-send-email-mnghuan@gmail.com
      Signed-off-by: Minfei Huang <mnghuan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: oom: do not reap task if there are live threads in threadgroup · edd9f723
      Vladimir Davydov authored
      If the current process is exiting, we don't invoke the OOM killer;
      instead we give it access to memory reserves and try to reap its mm in
      case nobody else is going to use it.  The code performing this check has
      a mistake: it ignores every process in the same thread group, whether or
      not that process is exiting (see try_oom_reaper()).  Fix it.
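
      The difference between the two checks can be sketched with a toy
      model.  This is not kernel code: the Task fields and both function
      names are hypothetical, and the "fixed" variant is only one reading
      of the description above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pid: int
    tgid: int      # thread group id
    mm: int        # id of the address space the task uses
    exiting: bool  # True if the task is already on its way out


def can_reap_old(victim, tasks):
    """Behaviour described as buggy: same-thread-group tasks are ignored."""
    for t in tasks:
        if t is victim or t.mm != victim.mm:
            continue
        if t.tgid == victim.tgid:
            continue  # skipped without ever looking at t.exiting
        if not t.exiting:
            return False
    return True


def can_reap_fixed(victim, tasks):
    """Assumed fix: any live user of the mm blocks reaping."""
    for t in tasks:
        if t is victim or t.mm != victim.mm:
            continue
        if not t.exiting:
            return False
    return True
```

      With an exiting victim and one live task in the same thread group,
      the old check still allows reaping while the stricter one refuses.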
      
      Link: http://lkml.kernel.org/r/1464087628-7318-1-git-send-email-vdavydov@virtuozzo.com
      Fixes: 3ef22dff ("oom, oom_reaper: try to reap tasks which skip regular OOM killer path")
      Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • direct-io: fix direct write stale data exposure from concurrent buffered read · 9ecd10b7
      Eryu Guan authored
      Currently, direct writes inside i_size on a DIO_SKIP_HOLES filesystem are
      not allowed to allocate blocks (get_more_blocks() sets 'create' to 0
      before calling the get_block() callback).  If the file is sparse, direct
      writes fall back to buffered writes to avoid stale data exposure from a
      concurrent buffered read.  But two cases that can result in stale data
      exposure are not correctly detected.
      
      1. The detection for "writing inside i_size" is not sufficient:
         writes can wrongly be treated as "extending writes".  For example,
         consider a direct write of 1 FSB (file system block) to a 1 FSB
         sparse file on ext2/3/4, starting from offset 0.  This write is
         inside i_size, but 'create' is non-zero because 'block_in_file' and
         '(i_size_read(inode) >> blkbits)' are both zero.
      
      2. Direct writes starting from or beyond i_size (not inside i_size)
         could also trigger block allocation and expose stale data.  For
         example, consider a sparse file with i_size of 2k, and a write to
         offset 2k or 3k into the file, with a filesystem block size of 4k.
         (Thanks to Jeff Moyer for pointing this case out in his review.)
      
      The first problem can be demonstrated by running the ltp-aiodio test
      ADSP045 many times.  When testing on extN filesystems, I occasionally see
      test failures in which a buffered read returns non-zero (stale) data.
      
      ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1
      
      dio_sparse    0  TINFO  :  Dirtying free blocks
      dio_sparse    0  TINFO  :  Starting I/O tests
      non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa
      non-zero read at offset 0
      dio_sparse    0  TINFO  :  Killing childrens(s)
      dio_sparse    1  TFAIL  :  dio_sparse.c:191: 1 children(s) exited abnormally
      
      The second problem can also be reproduced easily by a hacked dio_sparse
      program, which accepts an option to specify the write offset.
      
      What we should really do is to disable block allocation for writes that
      could result in filling holes inside i_size.
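
      Both cases can be checked with a small arithmetic model.  This is a
      sketch, not the patch itself: the function names are hypothetical,
      the "old" check follows the comparison quoted in case 1, and the
      stricter check is an assumed way of refusing allocation in any
      filesystem block that i_size touches.  The 2k file size is taken
      from the ADSP045 parameters, and 512-byte I/O granularity
      (blkbits 9) is assumed for the unaligned offsets in case 2.

```python
def old_check_allows_allocation(offset, i_size, io_blkbits):
    """Check as described in the text: compare I/O block numbers."""
    block_in_file = offset >> io_blkbits
    inside_i_size = block_in_file < (i_size >> io_blkbits)
    return not inside_i_size


def stricter_check_allows_allocation(offset, i_size, fs_blkbits):
    """Assumed fix: refuse allocation in any FS block that i_size touches."""
    if i_size == 0:
        return True  # empty file: there are no holes inside i_size to fill
    fs_startblk = offset >> fs_blkbits
    last_blk_inside_i_size = (i_size - 1) >> fs_blkbits
    return fs_startblk > last_blk_inside_i_size


FS_BLKBITS = 12  # 4k filesystem block
I_SIZE = 2048    # sparse file with i_size of 2k

# Case 1: write at offset 0, I/O done in 4k units.
case1_old = old_check_allows_allocation(0, I_SIZE, FS_BLKBITS)
case1_new = stricter_check_allows_allocation(0, I_SIZE, FS_BLKBITS)

# Case 2: write at offset 2k, I/O done in 512-byte units (assumption).
case2_old = old_check_allows_allocation(2048, I_SIZE, 9)
case2_new = stricter_check_allows_allocation(2048, I_SIZE, FS_BLKBITS)
```

      In this model the old check allows allocation in both cases, while
      the stricter check refuses it; a write starting at offset 4k, in
      the next filesystem block, is still allowed to allocate.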
      
      Link: http://lkml.kernel.org/r/1463156728-13357-1-git-send-email-guaneryu@gmail.com
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: bump up o2cb network protocol version · 38b52efd
      Junxiao Bi authored
      Two new messages are added to support negotiating the hb timeout.  Stop
      nodes that speak an old protocol version from mounting, as they would
      cause the negotiation to fail.
      
      Link: http://lkml.kernel.org/r/1464231615-27939-1-git-send-email-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: o2hb: fix hb hung time · 6633ca57
      Junxiao Bi authored
      hr_last_timeout_start should be set to the last time at which the
      heartbeat (hb) was still OK.  When an hb write times out, the hung time
      is then (jiffies - hr_last_timeout_start).

      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Gang He <ghe@suse.com>
      Cc: rwxybh <rwxybh@126.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: o2hb: don't negotiate if last hb fail · 88dbe98d
      Junxiao Bi authored
      Sometimes an I/O error is returned when storage is down for a while.
      For example, an iSCSI device is taken offline when its session times
      out, which makes all I/O return -EIO.  In this case nodes should not
      negotiate the timeout but should fence themselves.  So let nodes fence
      themselves when o2hb_do_disk_heartbeat() returns an error; this is the
      same behavior as o2hb without the negotiate timer.

      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Gang He <ghe@suse.com>
      Cc: rwxybh <rwxybh@126.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: o2hb: add some user/debug log · 1bd12902
      Junxiao Bi authored
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Gang He <ghe@suse.com>
      Cc: rwxybh <rwxybh@126.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: o2hb: add NEGOTIATE_APPROVE message · e76f8237
      Junxiao Bi authored
      This message is used to re-queue the write timeout timer and the
      negotiate timer when all nodes suffer a write hang to storage, so that
      nodes do not fence themselves when storage is down.

      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Gang He <ghe@suse.com>
      Cc: rwxybh <rwxybh@126.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: o2hb: add NEGO_TIMEOUT message · 34069b88
      Junxiao Bi authored
      This message is sent to the master node when a non-master node's
      negotiate timer expires.  The master node records these nodes in a
      bitmap, which it uses to decide whether to re-queue the write timeout
      timer.

      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Gang He <ghe@suse.com>
      Cc: rwxybh <rwxybh@126.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: o2hb: add negotiate timer · e0cbb798
      Junxiao Bi authored
      This series of patches fixes the issue that, when storage is down, all
      nodes fence themselves due to write timeout.
      
      With this patch set, all nodes keep going until storage is back online,
      unless one of the following happens, in which case all nodes fence
      themselves as before:
      
      1. an I/O error is returned
      2. the network between nodes is down
      3. nodes panic
      
      This patch (of 6):
      
      When storage is down, all nodes fence themselves due to write timeout.
      The negotiate timer is designed to avoid this: with it, a node waits
      until storage is up again.
      
      The negotiate timer works as follows:
      
      1. The timer expires before the write timeout timer; its timeout is
         currently half of the write timeout.  It is re-queued along with the
         write timeout timer.  If it expires, the node sends a NEGO_TIMEOUT
         message to the master node (the node with the lowest node number).
         This message does nothing but mark a bit in a bitmap on the master
         node that records which nodes are negotiating the timeout.
      
      2. If storage is down, nodes send this message to the master node.
         When the master node finds that its bitmap includes all online
         nodes, it sends a NEGO_APPROVE message to all nodes one by one; this
         message re-queues the write timeout timer and the negotiate timer.
         Any node that doesn't receive this message, or that hits an issue
         while handling it, will be fenced.  If storage comes back up at any
         time, o2hb_thread will run and re-queue all the timers, so nothing
         is affected by these two steps.
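
      The two steps above can be sketched as a toy model of the master
      node's bookkeeping.  This is not the o2hb implementation: the class
      and function names are hypothetical, and clearing the bitmap once
      approval is sent is an assumption the text does not state.

```python
class NegotiationMaster:
    """Toy model of the master node's bitmap (not kernel code)."""

    def __init__(self, live_nodes):
        self.live_nodes = set(live_nodes)
        self.nego_bitmap = 0  # bit n set => node n reported a timeout

    def on_nego_timeout(self, node):
        """Record a timeout report; return the nodes to approve, if any."""
        self.nego_bitmap |= 1 << node
        if all((self.nego_bitmap >> n) & 1 for n in self.live_nodes):
            self.nego_bitmap = 0  # assumption: start a fresh round
            return sorted(self.live_nodes)  # approve nodes one by one
        return []


def negotiate_timeout_ms(write_timeout_ms):
    """The negotiate timer fires at half of the write timeout."""
    return write_timeout_ms // 2


def master_node(live_nodes):
    """The master is the node with the lowest node number."""
    return min(live_nodes)
```

      With three online nodes, the first two timeout reports return an
      empty list, and the third returns all three nodes for approval.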

      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Gang He <ghe@suse.com>
      Cc: rwxybh <rwxybh@126.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · dc03c0f9
      Linus Torvalds authored
      Pull misc kbuild updates from Michal Marek:
       "This is the non-critical part of kbuild:
      
         - Coccinelle fixes, one semantic patch less in this round [Vaishali
           Thakkar, Wolfram Sang, Kees Cook]
      
         - rpm-pkg support for (open)SUSE's update-bootloader [Jiří Kosina]
      
         - rpm-pkg restored support for $RPMOPTS [Srinivas Pandruvada]
      
         - deb-pkg fixes for the linux-headers package [Bjørn Mork, Azriel
           Samson]"
      
      * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        coccicheck: Fix missing 0 index in kill loop
        scripts/package/Makefile: rpmbuild add support of RPMOPTS
        builddeb: fix missing headers in linux-headers package
        builddeb: include objtool binary in headers package
        kbuild/mkspec: support 'update-bootloader'-based systems
        scripts: coccinelle: remove check to move constants to right
        Coccinelle: setup_timer: Add space in front of parentheses
    • Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · f429d355
      Linus Torvalds authored
      Pull kconfig update from Michal Marek:
      
       - fix for behavior of tristate choice items and fix for documentation
         of existing kconfig behavior [Dirk Gouders]
      
       - more helpful "unexpected data" kconfig warning [Paul Bolle]
      
      * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kconfig/symbol.c: handle choice_values that depend on 'm' symbols
        kconfig-language: elaborate on the type of a choice
        kconfig-language: fix comment on dependency-generated menu structures.
        kconfig: add unexpected data itself to warning
    • Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · 5b26fc88
      Linus Torvalds authored
      Pull kbuild updates from Michal Marek:
      
       - new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and
         unexports symbols which are not used in the current config [Nicolas
         Pitre]
      
       - several kbuild rule cleanups [Masahiro Yamada]
      
       - warning option adjustments for gcov etc [Arnd Bergmann]
      
       - a few more small fixes
      
      * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
        kbuild: move -Wunused-const-variable to W=1 warning level
        kbuild: fix if_change and friends to consider argument order
        kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
        kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
        gcov: disable -Wmaybe-uninitialized warning
        gcov: disable tree-loop-im to reduce stack usage
        gcov: disable for COMPILE_TEST
        Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
        Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
        kbuild: forbid kernel directory to contain spaces and colons
        kbuild: adjust ksym_dep_filter for some cmd_* renames
        kbuild: Fix dependencies for final vmlinux link
        kbuild: better abstract vmlinux sequential prerequisites
        kbuild: fix call to adjust_autoksyms.sh when output directory specified
        kbuild: Get rid of KBUILD_STR
        kbuild: rename cmd_as_s_S to cmd_cpp_s_S
        kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
        kbuild: drop redundant "PHONY += FORCE"
        kbuild: delete unnecessary "@:"
        kbuild: mark help target as PHONY
        ...
    • Merge branch 'akpm' (patches from Andrew) · e12fab28
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "10 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        drivers/pinctrl/intel/pinctrl-baytrail.c: fix build with gcc-4.4
        update "mm/zsmalloc: don't fail if can't create debugfs info"
        dma-debug: avoid spinlock recursion when disabling dma-debug
        mm: oom_reaper: remove some bloat
        memcg: fix mem_cgroup_out_of_memory() return value.
        ocfs2: fix improper handling of return errno
        mm: slub: remove unused virt_to_obj()
        mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
        mm: make CONFIG_DEFERRED_STRUCT_PAGE_INIT depends on !FLATMEM explicitly
        seqlock: fix raw_read_seqcount_latch()
    • Merge tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 478a1469
      Linus Torvalds authored
      Pull DAX locking updates from Ross Zwisler:
       "Filesystem DAX locking for 4.7
      
         - We use a bit in an exceptional radix tree entry as a lock bit and
           use it similarly to how page lock is used for normal faults.  This
           fixes races between hole instantiation and read faults of the same
           index.
      
         - Filesystem DAX PMD faults are disabled, and will be re-enabled when
           PMD locking is implemented"
      
      * tag 'dax-locking-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        dax: Remove i_mmap_lock protection
        dax: Use radix tree entry lock to protect cow faults
        dax: New fault locking
        dax: Allow DAX code to replace exceptional entries
        dax: Define DAX lock bit for radix tree exceptional entry
        dax: Make huge page handling depend of CONFIG_BROKEN
        dax: Fix condition for filling of PMD holes
    • Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 315227f6
      Linus Torvalds authored
      Pull misc DAX updates from Vishal Verma:
       "DAX error handling for 4.7
      
         - Until now, dax has been disabled if media errors were found on any
           device.  This enables the use of DAX in the presence of these
           errors by making all sector-aligned zeroing go through the driver.
      
         - The driver (already) has the ability to clear errors on writes that
           are sent through the block layer using 'DSMs' defined in ACPI 6.1.
      
        Other misc changes:
      
         - When mounting DAX filesystems, check to make sure the partition is
           page aligned.  This is a requirement for DAX, and previously, we
           allowed such unaligned mounts to succeed, but subsequent
           reads/writes would fail.
      
         - Misc/cleanup fixes from Jan that remove unused code from DAX
           related to zeroing, writeback, and some size checks"
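
      The mount-time alignment requirement can be illustrated with a short
      sketch.  It is not the kernel's check: the function name is
      hypothetical, the 4096-byte page and 512-byte sector sizes are
      assumptions, and only the partition start offset is tested.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
SECTOR_SIZE = 512  # assumed sector size in bytes


def partition_start_is_page_aligned(start_sector):
    """Return True if the partition begins on a page boundary."""
    return (start_sector * SECTOR_SIZE) % PAGE_SIZE == 0


aligned = partition_start_is_page_aligned(2048)  # starts at 1 MiB
unaligned = partition_start_is_page_aligned(63)  # legacy CHS-style start
```

      Under these assumptions a partition starting at sector 2048 passes,
      while one starting at sector 63 would be rejected at mount time
      rather than failing later on reads and writes.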
      
      * tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        dax: fix a comment in dax_zero_page_range and dax_truncate_page
        dax: for truncate/hole-punch, do zeroing through the driver if possible
        dax: export a low-level __dax_zero_page_range helper
        dax: use sb_issue_zerout instead of calling dax_clear_sectors
        dax: enable dax in the presence of known media errors (badblocks)
        dax: fallback from pmd to pte on error
        block: Update blkdev_dax_capable() for consistency
        xfs: Add alignment check for DAX mount
        ext2: Add alignment check for DAX mount
        ext4: Add alignment check for DAX mount
        block: Add bdev_dax_supported() for dax mount checks
        block: Add vfs_msg() interface
        dax: Remove redundant inode size checks
        dax: Remove pointless writeback from dax_do_io()
        dax: Remove zeroing from dax_io()
        dax: Remove dead zeroing code from fault handlers
        ext2: Avoid DAX zeroing to corrupt data
        ext2: Fix block zeroing in ext2_get_blocks() for DAX
        dax: Remove complete_unwritten argument
        DAX: move RADIX_DAX_ definitions to dax.c
  2. 26 May, 2016 24 commits