1. 16 May, 2016 6 commits
    • Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1c19b68a
      Linus Torvalds authored
      Pull locking changes from Ingo Molnar:
       "The main changes in this cycle were:
      
         - pvqspinlock statistics fixes (Davidlohr Bueso)
      
         - flip atomic_fetch_or() arguments (Peter Zijlstra)
      
         - locktorture simplification (Paul E.  McKenney)
      
         - documentation updates (SeongJae Park, David Howells, Davidlohr
           Bueso, Paul E McKenney, Peter Zijlstra, Will Deacon)
      
         - various fixes"
      
      * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/atomics: Flip atomic_fetch_or() arguments
        locking/pvqspinlock: Robustify init_qspinlock_stat()
        locking/pvqspinlock: Avoid double resetting of stats
        lcoking/locktorture: Simplify the torture_runnable computation
        locking/Documentation: Clarify that ACQUIRE applies to loads, RELEASE applies to stores
        locking/Documentation: State purpose of memory-barriers.txt
        locking/Documentation: Add disclaimer
        locking/Documentation/lockdep: Fix spelling mistakes
        locking/lockdep: Deinline register_lock_class(), save 2328 bytes
        locking/locktorture: Fix NULL pointer dereference for cleanup paths
        locking/locktorture: Fix deboosting NULL pointer dereference
        locking/Documentation: Mention smp_cond_acquire()
        locking/Documentation: Insert white spaces consistently
        locking/Documentation: Fix formatting inconsistencies
        locking/Documentation: Add missed subsection in TOC
        locking/Documentation: Fix missed s/lock/acquire renames
        locking/Documentation: Clarify relationship of barrier() to control dependencies
    • Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 49817c33
      Linus Torvalds authored
      Pull EFI updates from Ingo Molnar:
       "The main changes in this cycle were:
      
         - Drop the unused EFI_SYSTEM_TABLES efi.flags bit and ensure the
           ARM/arm64 EFI System Table mapping is read-only (Ard Biesheuvel)
      
         - Add a comment to explain that one of the code paths in the x86/pat
           code is only executed for EFI boot (Matt Fleming)
      
         - Improve Secure Boot status checks on arm64 and handle unexpected
           errors (Linn Crosetto)
      
         - Remove the global EFI memory map variable 'memmap' as the same
           information is already available in efi::memmap (Matt Fleming)
      
         - Add EFI Memory Attribute table support for ARM/arm64 (Ard
           Biesheuvel)
      
         - Add EFI GOP framebuffer support for ARM/arm64 (Ard Biesheuvel)
      
         - Add EFI Bootloader Control driver for storing reboot(2) data in EFI
           variables for consumption by bootloaders (Jeremy Compostella)
      
         - Add Core EFI capsule support (Matt Fleming)
      
         - Add EFI capsule char driver (Kweh, Hock Leong)
      
         - Unify EFI memory map code for ARM and arm64 (Ard Biesheuvel)
      
         - Add generic EFI support for detecting when firmware corrupts CPU
           status register bits (like IRQ flags) when performing EFI runtime
           service calls (Mark Rutland)
      
        ... and other misc cleanups"
      
      * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
        efivarfs: Make efivarfs_file_ioctl() static
        efi: Merge boolean flag arguments
        efi/capsule: Move 'capsule' to the stack in efi_capsule_supported()
        efibc: Fix excessive stack footprint warning
        efi/capsule: Make efi_capsule_pending() lockless
        efi: Remove unnecessary (and buggy) .memmap initialization from the Xen EFI driver
        efi/runtime-wrappers: Remove ARCH_EFI_IRQ_FLAGS_MASK #ifdef
        x86/efi: Enable runtime call flag checking
        arm/efi: Enable runtime call flag checking
        arm64/efi: Enable runtime call flag checking
        efi/runtime-wrappers: Detect firmware IRQ flag corruption
        efi/runtime-wrappers: Remove redundant #ifdefs
        x86/efi: Move to generic {__,}efi_call_virt()
        arm/efi: Move to generic {__,}efi_call_virt()
        arm64/efi: Move to generic {__,}efi_call_virt()
        efi/runtime-wrappers: Add {__,}efi_call_virt() templates
        efi/arm-init: Reserve rather than unmap the memory map for ARM as well
        efi: Add misc char driver interface to update EFI firmware
        x86/efi: Force EFI reboot to process pending capsules
        efi: Add 'capsule' update support
        ...
    • Merge branch 'core-signals-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 230e51f2
      Linus Torvalds authored
      Pull core signal updates from Ingo Molnar:
       "These updates from Stas Sergeev and Andy Lutomirski improve the
        sigaltstack interface by extending its ABI with the SS_AUTODISARM
        feature, which makes it possible to use swapcontext() in a sighandler
        that works on sigaltstack.  Without this flag, the subsequent signal
        will corrupt the state of the switched-away sighandler.
      
        The inspiration is more robust dosemu signal handling"
      
      * 'core-signals-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        signals/sigaltstack: Change SS_AUTODISARM to (1U << 31)
        signals/sigaltstack: Report current flag bits in sigaltstack()
        selftests/sigaltstack: Fix the sigaltstack test on old kernels
        signals/sigaltstack: If SS_AUTODISARM, bypass on_sig_stack()
        selftests/sigaltstack: Add new testcase for sigaltstack(SS_ONSTACK|SS_AUTODISARM)
        signals/sigaltstack: Implement SS_AUTODISARM flag
        signals/sigaltstack: Prepare to add new SS_xxx flags
        signals/sigaltstack, x86/signals: Unify the x86 sigaltstack check with other architectures
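A minimal userspace sketch of the SS_AUTODISARM behaviour described in the pull message above. It assumes a Linux kernel that already carries these patches; the helper names (on_usr1, altstack_is_disarmed_in_handler) and the 64 KiB stack size are made up for illustration, and the fallback #define only reuses the (1U << 31) value quoted in the commit list.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>

#ifndef SS_AUTODISARM
#define SS_AUTODISARM (1U << 31)	/* value quoted in the commit list above */
#endif

/* Flags reported by sigaltstack() while the handler runs (-1: not run). */
static volatile sig_atomic_t flags_in_handler = -1;

/* Runs on the alternate stack and records the current altstack flags. */
static void on_usr1(int signo)
{
	stack_t cur;

	(void)signo;
	if (sigaltstack(NULL, &cur) == 0)
		flags_in_handler = cur.ss_flags;
}

/*
 * Returns 1 if the alternate stack was reported as disabled while the
 * handler ran, 0 if it was not, and -1 if setup failed (for example on
 * a kernel that rejects SS_AUTODISARM).
 */
static int altstack_is_disarmed_in_handler(void)
{
	const size_t size = 64 * 1024;	/* arbitrary size for this sketch */
	struct sigaction sa;
	stack_t ss, off;
	int result;

	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = malloc(size);
	if (!ss.ss_sp)
		return -1;
	ss.ss_size = size;
	ss.ss_flags = SS_ONSTACK | SS_AUTODISARM;
	if (sigaltstack(&ss, NULL) != 0) {
		free(ss.ss_sp);
		return -1;
	}

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_usr1;
	sa.sa_flags = SA_ONSTACK;	/* deliver SIGUSR1 on the altstack */
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	raise(SIGUSR1);			/* handler runs before raise() returns */
	result = (flags_in_handler == SS_DISABLE);

	/* Back on the normal stack: switch the altstack off and release it. */
	memset(&off, 0, sizeof(off));
	off.ss_flags = SS_DISABLE;
	sigaltstack(&off, NULL);
	free(ss.ss_sp);
	return result;
}
```

Because the stack is disarmed for the duration of the handler, a handler that switches away with swapcontext() no longer leaves state on a stack that the next signal would overwrite.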
    • Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a3871bd4
      Linus Torvalds authored
      Pull RCU updates from Ingo Molnar:
       "The main changes are:
      
         - Documentation updates, including fixes to the design-level
           requirements documentation and a fixed version of the design-level
           data-structure documentation.  These fixes include removing
           cartoons and getting rid of the html/htmlx duplication.
      
         - Further improvements to the new-age expedited grace periods.
      
         - Miscellaneous fixes.
      
         - Torture-test changes, including a new rcuperf module for measuring
           RCU grace-period performance and scalability, which is useful for
           the expedited-grace-period changes"
      
      * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
        rcutorture: Add boot-time adjustment of leaf fanout
        rcutorture: Add irqs-disabled test for call_rcu()
        rcutorture: Dump trace buffer upon shutdown
        rcutorture: Don't rebuild identical kernel
        rcutorture: Add OS-jitter capability
        documentation: Add documentation for RCU's major data structures
        rcutorture: Convert test duration to seconds early
        torture: Kill qemu, not parent process
        torture: Clarify refusal to run more than one torture test
        rcutorture: Consider FROZEN hotplug notifier transitions
        rcutorture: Remove redundant initialization to zero
        rcuperf: Do not wake up shutdown wait queue if "shutdown" is false.
        rcutorture: Add largish-system rcuperf scenario
        rcutorture: Avoid RCU CPU stall warning and RT throttling
        rcutorture: Add rcuperf holdoff boot parameter to reduce interference
        rcutorture: Make scripts analyze rcuperf trace data, if present
        rcutorture: Make rcuperf collect expedited event-trace data
        rcutorture: Print measure of batching efficiency
        rcutorture: Set rcuperf writer kthreads to real-time priority
        rcutorture: Bind rcuperf reader/writer kthreads to CPUs
        ...
    • Merge branch 'core-lib-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0052af44
      Linus Torvalds authored
      Pull core/lib update from Ingo Molnar:
       "This contains a single commit that removes an unused facility that the
        scheduler used to make use of"
      
      * 'core-lib-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lib/proportions: Remove unused code
    • namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS · 0fed3ac8
      George Spelvin authored
      The hash mixing between adding the next 64 bits of name
      was just a bit weak.
      
      Replaced with a still very fast but slightly more effective
      mixing function.
      Signed-off-by: George Spelvin <linux@horizon.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
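The commit message does not show the new mixing function, so the following is only a sketch of the general idea: fold a name into the hash one 64-bit word at a time and mix between additions. The xorshift-style mix_word() and the hash_name() helper are assumptions for illustration, not the code from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical mixing step: three xorshift rounds.  Each round is
 * invertible, so distinct inputs always give distinct outputs.
 */
static uint64_t mix_word(uint64_t h)
{
	h ^= h << 13;
	h ^= h >> 7;
	h ^= h << 17;
	return h;
}

/* Hash len bytes by adding 64 bits at a time and mixing in between. */
static uint64_t hash_name(const char *name, size_t len)
{
	uint64_t hash = 0;

	while (len >= sizeof(uint64_t)) {
		uint64_t word;

		memcpy(&word, name, sizeof(word));	/* unaligned-safe load */
		hash = mix_word(hash + word);
		name += sizeof(word);
		len -= sizeof(word);
	}
	if (len) {
		uint64_t tail = 0;

		memcpy(&tail, name, len);		/* zero-padded last word */
		hash += tail;
	}
	return hash;
}
```

Without a mixing step, plain addition would give the same hash for any reordering of the 8-byte words of a name; mixing after each addition makes the word order matter.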
  2. 15 May, 2016 2 commits
  3. 14 May, 2016 11 commits
  4. 13 May, 2016 17 commits
  5. 12 May, 2016 4 commits
    • mm: thp: calculate the mapcount correctly for THP pages during WP faults · 6d0a07ed
      Andrea Arcangeli authored
      This provides full accuracy for the mapcount calculation in
      write-protect faults, so page pinning is not broken by false-positive
      copy-on-writes.
      
      total_mapcount() isn't the right calculation for reuse_swap_page(),
      so this introduces page_trans_huge_mapcount(), which is effectively
      the fully accurate return value for page_mapcount() when dealing with
      Transparent Hugepages.  Because of its higher runtime cost,
      page_trans_huge_mapcount() is used only during COW faults, where it
      is strictly needed.
      
      This also provides, at practically zero cost, the total_mapcount
      information needed to know whether we can still relocate the page
      anon_vma to the local vma.  If page_trans_huge_mapcount() returns 1 we
      can reuse the page no matter whether a pte or a pmd_trans_huge
      triggered the fault, but we can only relocate the page anon_vma to
      the local vma->anon_vma if we're sure it's only this "vma" mapping the
      whole THP physical range.
      
      Kirill A. Shutemov discovered the problem with moving the page
      anon_vma to the local vma->anon_vma in a previous version of this
      patch and another problem in the way page_move_anon_rmap() was called.
      
      Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
      previous version, because reuse_swap_page must be a macro to call
      page_trans_huge_mapcount from swap.h, so this uses a macro again
      instead of an inline function. With this change at least it's a less
      dangerous usage than it was before, because "page" is used only once
      now, while with the previous code reuse_swap_page(page++) would have
      called page_mapcount on page+1 and it would have increased page twice
      instead of just once.
      
      Dean Luick noticed an uninitialized variable that could result in a
      rmap inefficiency for the non-THP case in a previous version.
      
      Mike Marciniszyn said:
      
      : Our RDMA tests are seeing an issue with memory locking that bisects to
      : commit 61f5d698 ("mm: re-enable THP")
      :
      : The test program registers two rather large MRs (512M) and RDMA
      : writes data to a passive peer using the first and RDMA reads it back
      : into the second MR and compares that data.  The sizes are chosen randomly
      : between 0 and 1024 bytes.
      :
      : The test will get through a few (<= 4 iterations) and then gets a
      : compare error.
      :
      : Tracing indicates the kernel logical addresses associated with the individual
      : pages at registration ARE correct , the data in the "RDMA read response only"
      : packets ARE correct.
      :
      : The "corruption" occurs when the packet crosse two pages that are not physically
      : contiguous.   The second page reads back as zero in the program.
      :
      : It looks like the user VA at the point of the compare error no longer points to
      : the same physical address as was registered.
      :
      : This patch totally resolves the issue!
      
      Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
      Reviewed-by: Dean Luick <dean.luick@intel.com>
      Tested-by: Alex Williamson <alex.williamson@redhat.com>
      Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Tested-by: Josh Collier <josh.d.collier@intel.com>
      Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
      Cc: <stable@vger.kernel.org>	[4.5]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
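The remark in the message above about reuse_swap_page(page++) is the usual hazard of a macro that names its argument more than once. The log does not show the old macro body, so the shapes below are an assumed illustration with made-up names (REUSE_TWICE, REUSE_ONCE, struct toy_page), not the kernel code.

```c
#include <stdbool.h>

struct toy_page {
	bool compound;
	int mapcount;
};

static bool toy_is_compound(const struct toy_page *p) { return p->compound; }
static int toy_mapcount(const struct toy_page *p) { return p->mapcount; }

/* Assumed old shape: the argument appears twice, so "p++" runs twice. */
#define REUSE_TWICE(p)	(!toy_is_compound(p) && toy_mapcount(p) == 1)

static bool toy_reuse_check(const struct toy_page *p)
{
	return !toy_is_compound(p) && toy_mapcount(p) == 1;
}

/* Assumed new shape: the argument appears once and goes to a function. */
#define REUSE_ONCE(p)	toy_reuse_check(p)

/* How far the cursor moved after one check through the two-use macro. */
static int advance_with_twice_macro(void)
{
	struct toy_page pages[3] = { { false, 1 }, { false, 1 }, { false, 1 } };
	struct toy_page *cur = pages;

	/* Tests pages[0] for "compound", then reads mapcount of pages[1]. */
	(void)REUSE_TWICE(cur++);
	return (int)(cur - pages);
}

/* How far the cursor moved after one check through the one-use macro. */
static int advance_with_once_macro(void)
{
	struct toy_page pages[3] = { { false, 1 }, { false, 1 }, { false, 1 } };
	struct toy_page *cur = pages;

	(void)REUSE_ONCE(cur++);
	return (int)(cur - pages);
}
```

With the two-use shape the cursor ends up two elements ahead and the second test looks at the wrong page; with the one-use shape it advances by one, which is why the message calls the new macro "less dangerous".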
    • ksm: fix conflict between mmput and scan_get_next_rmap_item · 7496fea9
      Zhou Chengming authored
      This fixes a concurrency issue in the KSM function scan_get_next_rmap_item().
      
      task A (ksmd):				|task B (the mm's task):
      					|
      mm = slot->mm;				|
      down_read(&mm->mmap_sem);		|
      					|
      ...					|
      					|
      spin_lock(&ksm_mmlist_lock);		|
      					|
      ksm_scan.mm_slot go to the next slot;	|
      					|
      spin_unlock(&ksm_mmlist_lock);		|
      					|mmput() ->
      					|	ksm_exit():
      					|
      					|spin_lock(&ksm_mmlist_lock);
      					|if (mm_slot && ksm_scan.mm_slot != mm_slot) {
      					|	if (!mm_slot->rmap_list) {
      					|		easy_to_free = 1;
      					|		...
      					|
      					|if (easy_to_free) {
      					|	mmdrop(mm);
      					|	...
      					|
      					|So this mm_struct may be freed in the mmput().
      					|
      up_read(&mm->mmap_sem);			|
      
      As shown above, the ksmd thread may access an mm_struct that has
      already been freed to the kmem_cache.  If a fork then gets this
      mm_struct from the kmem_cache, the ksmd thread's later call to
      up_read(&mm->mmap_sem) will cause mmap_sem.count to become -1.
      
      As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has
      the same SMP race condition, so fix it too.  My previous fix in
      scan_get_next_rmap_item would have introduced a different SMP race
      condition, so just invert the up_read/spin_unlock order as Andrea
      Arcangeli said.
      
      Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com
      Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
      Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
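The interleaving in the diagram above can be replayed deterministically in a single thread. The toy model below is only a sketch: struct toy_mm, struct toy_scan and every function name are invented, the "lock" is a flag, and "freeing" just sets a marker so that the late access can be detected safely.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_mm {
	int readers;	/* stands in for mmap_sem */
	bool freed;	/* set where the real code would free the mm */
};

struct toy_scan {
	struct toy_mm *cursor;	/* stands in for ksm_scan.mm_slot */
	bool list_locked;	/* stands in for ksm_mmlist_lock */
};

/*
 * Stand-in for the exit path: it may free mm only when it can take the
 * list lock and the scan cursor no longer points at mm.  The real code
 * would spin on the lock; here the attempt simply does nothing.
 */
static void toy_exit(struct toy_scan *s, struct toy_mm *mm)
{
	if (s->list_locked || s->cursor == mm)
		return;
	mm->freed = true;
}

/* Drop the read lock; report whether mm had already been freed. */
static bool toy_up_read(struct toy_mm *mm)
{
	if (mm->freed)
		return true;
	mm->readers--;
	return false;
}

/* Old order: spin_unlock, then up_read.  Returns true on use-after-free. */
static bool replay_old_order(void)
{
	struct toy_mm mm = { 0, false };
	struct toy_scan s = { &mm, false };

	mm.readers++;			/* down_read */
	s.list_locked = true;		/* spin_lock */
	s.cursor = NULL;		/* cursor moves to the next slot */
	s.list_locked = false;		/* spin_unlock */
	toy_exit(&s, &mm);		/* task B runs in the window */
	return toy_up_read(&mm);	/* touches a freed mm */
}

/* New order: up_read, then spin_unlock.  Returns true on use-after-free. */
static bool replay_new_order(void)
{
	struct toy_mm mm = { 0, false };
	struct toy_scan s = { &mm, false };
	bool used_after_free;

	mm.readers++;			/* down_read */
	s.list_locked = true;		/* spin_lock */
	s.cursor = NULL;		/* cursor moves to the next slot */
	toy_exit(&s, &mm);		/* task B cannot get the list lock yet */
	used_after_free = toy_up_read(&mm);
	s.list_locked = false;		/* spin_unlock */
	toy_exit(&s, &mm);		/* harmless now: scanner is done with mm */
	return used_after_free;
}
```

In the new order the last access to the mm happens while the list lock is still held, so the exit path cannot free it in between.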
    • ocfs2: fix posix_acl_create deadlock · c25a1e06
      Junxiao Bi authored
      Commit 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      refactored code to use posix_acl_create.  The problem with this function
      is that it is not mindful of the cluster wide inode lock making it
      unsuitable for use with ocfs2 inode creation with ACLs.  For example,
      when used in ocfs2_mknod, this function can cause deadlock as follows.
      The parent dir inode lock is taken when calling posix_acl_create ->
      get_acl -> ocfs2_iop_get_acl which takes the inode lock again.  This can
      cause deadlock if there is a blocked remote lock request waiting for the
      lock to be downconverted.  The same deadlock happens in ocfs2_reflink.
      The fix is to revert to using ocfs2_init_acl.
      
      Fixes: 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang · 5ee0fbd5
      Junxiao Bi authored
      Commit 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      introduced this issue.  ocfs2_setattr called by chmod command holds
      cluster wide inode lock when calling posix_acl_chmod.  This latter
      function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl.  These
      two are also called directly from vfs layer for getfacl/setfacl commands
      and therefore acquire the cluster wide inode lock.  If a remote
      conversion request comes after the first inode lock in ocfs2_setattr,
      OCFS2_LOCK_BLOCKED will be set.  This causes the second inode lock
      call, from ocfs2_iop_get_acl(), to block indefinitely.
      
      The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which
      does not call back into the filesystem.  Therefore, we restore
      ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that
      instead.
      
      Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
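The hang described in the two ocfs2 messages above comes from taking the same cluster lock twice on one call path while a remote request is pending. The toy model below only mirrors that description: struct toy_cluster_lock and all function names are invented, and the real lock state machine is considerably more involved.

```c
#include <stdbool.h>

struct toy_cluster_lock {
	int holders;	/* local holders of the inode lock */
	bool blocked;	/* stands in for OCFS2_LOCK_BLOCKED */
};

/* Returns true if granted, false if the caller would have to wait. */
static bool toy_lock(struct toy_cluster_lock *l)
{
	if (l->blocked)
		return false;	/* new requests wait for the downconvert */
	l->holders++;
	return true;
}

static void toy_unlock(struct toy_cluster_lock *l)
{
	l->holders--;
	if (l->holders == 0)
		l->blocked = false;	/* downconvert can now proceed */
}

/* A remote node asks for the lock while it is held locally. */
static void toy_remote_request(struct toy_cluster_lock *l)
{
	if (l->holders > 0)
		l->blocked = true;
}

/*
 * Replays outer lock, optional remote request, then nested lock.
 * Returns true if the nested request can never be granted.
 */
static bool nested_lock_deadlocks(bool remote_request_arrives)
{
	struct toy_cluster_lock l = { 0, false };

	toy_lock(&l);				/* outer lock, e.g. in setattr */
	if (remote_request_arrives)
		toy_remote_request(&l);

	if (toy_lock(&l)) {			/* nested lock, e.g. in get_acl */
		toy_unlock(&l);
		toy_unlock(&l);
		return false;
	}
	/*
	 * The nested request waits for "blocked" to clear, which needs
	 * holders == 0, which needs the outer unlock, which only happens
	 * after the nested call returns.
	 */
	return l.blocked && l.holders > 0;
}
```

Without the remote request the nested acquisition succeeds, which is why the problem only shows up under cluster contention; both fixes avoid the nested call rather than changing the lock itself.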