1. 01 Jun, 2016 3 commits
    • Catalin Marinas's avatar
      arm64: Implement ptep_set_access_flags() for hardware AF/DBM · 433dcefb
      Catalin Marinas authored
      commit 66dbd6e6 upstream.
      
      When hardware updates of the access and dirty states are enabled, the
      default ptep_set_access_flags() implementation based on calling
      set_pte_at() directly is potentially racy. This triggers the "racy dirty
      state clearing" warning in set_pte_at() because an existing writable PTE
      is overridden with a clean entry.
      
      There are two main scenarios for this situation:
      
      1. The CPU getting an access fault does not support hardware updates of
         the access/dirty flags. However, a different agent in the system
         (e.g. SMMU) can do this, therefore overriding a writable entry with a
         clean one could potentially lose the automatically updated dirty
         status
      
      2. A more complex situation is possible when all CPUs support hardware
         AF/DBM:
      
         a) Initial state: shareable + writable vma and pte_none(pte)
         b) Read fault taken by two threads of the same process on different
            CPUs
         c) CPU0 takes the mmap_sem and proceeds to handling the fault. It
            eventually reaches do_set_pte() which sets a writable + clean pte.
            CPU0 releases the mmap_sem
         d) CPU1 acquires the mmap_sem and proceeds to handle_pte_fault(). The
            pte entry it reads is present, writable and clean and it continues
            to pte_mkyoung()
         e) CPU1 calls ptep_set_access_flags()
      
         If between (d) and (e) the hardware (another CPU) updates the dirty
         state (clears PTE_RDONLY), CPU1 will override the PTR_RDONLY bit
         marking the entry clean again.
      
      This patch implements an arm64-specific ptep_set_access_flags() function
      to perform an atomic update of the PTE flags.
      
      Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Reported-by: default avatarMing Lei <tom.leiming@gmail.com>
      Tested-by: default avatarJulien Grall <julien.grall@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      [will: reworded comment]
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      433dcefb
    • Catalin Marinas's avatar
      arm64: Ensure pmd_present() returns false after pmd_mknotpresent() · 790cf8cf
      Catalin Marinas authored
      commit 5bb1cc0f upstream.
      
      Currently, pmd_present() only checks for a non-zero value, returning
      true even after pmd_mknotpresent() (which only clears the type bits).
      This patch converts pmd_present() to using pte_present(), similar to the
      other pmd_*() checks. As a side effect, it will return true for
      PROT_NONE mappings, though they are not yet used by the kernel with
      transparent huge pages.
      
      For consistency, also change pmd_mknotpresent() to only clear the
      PMD_SECT_VALID bit, even though the PMD_TABLE_BIT is already 0 for block
      mappings (no functional change). The unused PMD_SECT_PROT_NONE
      definition is removed as transparent huge pages use the pte page prot
      values.
      
      Fixes: 9c7e535f ("arm64: mm: Route pmd thp functions through pte equivalents")
      Reviewed-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      790cf8cf
    • Catalin Marinas's avatar
      arm64: Fix typo in the pmdp_huge_get_and_clear() definition · 8f2bc2a8
      Catalin Marinas authored
      commit 911f56ee upstream.
      
      With hardware AF/DBM support, pmd modifications (transparent huge pages)
      should be performed atomically using load/store exclusive. The initial
      patches defined the get-and-clear function and __HAVE_ARCH_* macro
      without the "huge" word, leaving the pmdp_huge_get_and_clear() to the
      default, non-atomic implementation.
      
      Fixes: 2f4b829c ("arm64: Add support for hardware updates of the access and dirty pte bits")
      Reviewed-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8f2bc2a8
  2. 15 May, 2016 2 commits
  3. 14 May, 2016 11 commits
  4. 13 May, 2016 17 commits
  5. 12 May, 2016 7 commits
    • Andrea Arcangeli's avatar
      mm: thp: calculate the mapcount correctly for THP pages during WP faults · 6d0a07ed
      Andrea Arcangeli authored
      This will provide fully accuracy to the mapcount calculation in the
      write protect faults, so page pinning will not get broken by false
      positive copy-on-writes.
      
      total_mapcount() isn't the right calculation needed in
      reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
      that is effectively the full accurate return value for page_mapcount()
      if dealing with Transparent Hugepages, however we only use the
      page_trans_huge_mapcount() during COW faults where it strictly needed,
      due to its higher runtime cost.
      
      This also provide at practical zero cost the total_mapcount
      information which is needed to know if we can still relocate the page
      anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
      can reuse the page no matter if it's a pte or a pmd_trans_huge
      triggering the fault, but we can only relocate the page anon_vma to
      the local vma->anon_vma if we're sure it's only this "vma" mapping the
      whole THP physical range.
      
      Kirill A. Shutemov discovered the problem with moving the page
      anon_vma to the local vma->anon_vma in a previous version of this
      patch and another problem in the way page_move_anon_rmap() was called.
      
      Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
      previous version, because reuse_swap_page must be a macro to call
      page_trans_huge_mapcount from swap.h, so this uses a macro again
      instead of an inline function. With this change at least it's a less
      dangerous usage than it was before, because "page" is used only once
      now, while with the previous code reuse_swap_page(page++) would have
      called page_mapcount on page+1 and it would have increased page twice
      instead of just once.
      
      Dean Luick noticed an uninitialized variable that could result in a
      rmap inefficiency for the non-THP case in a previous version.
      
      Mike Marciniszyn said:
      
      : Our RDMA tests are seeing an issue with memory locking that bisects to
      : commit 61f5d698 ("mm: re-enable THP")
      :
      : The test program registers two rather large MRs (512M) and RDMA
      : writes data to a passive peer using the first and RDMA reads it back
      : into the second MR and compares that data.  The sizes are chosen randomly
      : between 0 and 1024 bytes.
      :
      : The test will get through a few (<= 4 iterations) and then gets a
      : compare error.
      :
      : Tracing indicates the kernel logical addresses associated with the individual
      : pages at registration ARE correct , the data in the "RDMA read response only"
      : packets ARE correct.
      :
      : The "corruption" occurs when the packet crosse two pages that are not physically
      : contiguous.   The second page reads back as zero in the program.
      :
      : It looks like the user VA at the point of the compare error no longer points to
      : the same physical address as was registered.
      :
      : This patch totally resolves the issue!
      
      Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.comSigned-off-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: default avatar"Kirill A. Shutemov" <kirill@shutemov.name>
      Reviewed-by: default avatarDean Luick <dean.luick@intel.com>
      Tested-by: default avatarAlex Williamson <alex.williamson@redhat.com>
      Tested-by: default avatarMike Marciniszyn <mike.marciniszyn@intel.com>
      Tested-by: default avatarJosh Collier <josh.d.collier@intel.com>
      Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
      Cc: <stable@vger.kernel.org>	[4.5]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6d0a07ed
    • Zhou Chengming's avatar
      ksm: fix conflict between mmput and scan_get_next_rmap_item · 7496fea9
      Zhou Chengming authored
      A concurrency issue about KSM in the function scan_get_next_rmap_item.
      
      task A (ksmd):				|task B (the mm's task):
      					|
      mm = slot->mm;				|
      down_read(&mm->mmap_sem);		|
      					|
      ...					|
      					|
      spin_lock(&ksm_mmlist_lock);		|
      					|
      ksm_scan.mm_slot go to the next slot;	|
      					|
      spin_unlock(&ksm_mmlist_lock);		|
      					|mmput() ->
      					|	ksm_exit():
      					|
      					|spin_lock(&ksm_mmlist_lock);
      					|if (mm_slot && ksm_scan.mm_slot != mm_slot) {
      					|	if (!mm_slot->rmap_list) {
      					|		easy_to_free = 1;
      					|		...
      					|
      					|if (easy_to_free) {
      					|	mmdrop(mm);
      					|	...
      					|
      					|So this mm_struct may be freed in the mmput().
      					|
      up_read(&mm->mmap_sem);			|
      
      As we can see above, the ksmd thread may access a mm_struct that already
      been freed to the kmem_cache.  Suppose a fork will get this mm_struct from
      the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will
      cause mmap_sem.count to become -1.
      
      As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has
      the same SMP race condition, so fix it too.  My prev fix in function
      scan_get_next_rmap_item will introduce a different SMP race condition, so
      just invert the up_read/spin_unlock order as Andrea Arcangeli said.
      
      Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.comSigned-off-by: default avatarZhou Chengming <zhouchengming1@huawei.com>
      Suggested-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: default avatarAndrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7496fea9
    • Junxiao Bi's avatar
      ocfs2: fix posix_acl_create deadlock · c25a1e06
      Junxiao Bi authored
      Commit 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      refactored code to use posix_acl_create.  The problem with this function
      is that it is not mindful of the cluster wide inode lock making it
      unsuitable for use with ocfs2 inode creation with ACLs.  For example,
      when used in ocfs2_mknod, this function can cause deadlock as follows.
      The parent dir inode lock is taken when calling posix_acl_create ->
      get_acl -> ocfs2_iop_get_acl which takes the inode lock again.  This can
      cause deadlock if there is a blocked remote lock request waiting for the
      lock to be downconverted.  And same deadlock happened in ocfs2_reflink.
      This fix is to revert back using ocfs2_init_acl.
      
      Fixes: 702e5bc6 ("ocfs2: use generic posix ACL infrastructure")
      Signed-off-by: default avatarTariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      c25a1e06
    • Junxiao Bi's avatar
      ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang · 5ee0fbd5
      Junxiao Bi authored
      Commit 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      introduced this issue.  ocfs2_setattr called by chmod command holds
      cluster wide inode lock when calling posix_acl_chmod.  This latter
      function in turn calls ocfs2_iop_get_acl and ocfs2_iop_set_acl.  These
      two are also called directly from vfs layer for getfacl/setfacl commands
      and therefore acquire the cluster wide inode lock.  If a remote
      conversion request comes after the first inode lock in ocfs2_setattr,
      OCFS2_LOCK_BLOCKED will be set.  And this will cause the second call to
      inode lock from the ocfs2_iop_get_acl() to block indefinetly.
      
      The deleted version of ocfs2_acl_chmod() calls __posix_acl_chmod() which
      does not call back into the filesystem.  Therefore, we restore
      ocfs2_acl_chmod(), modify it slightly for locking as needed, and use that
      instead.
      
      Fixes: 743b5f14 ("ocfs2: take inode lock in ocfs2_iop_set/get_acl()")
      Signed-off-by: default avatarTariq Saeed <tariq.x.saeed@oracle.com>
      Signed-off-by: default avatarJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5ee0fbd5
    • Arnd Bergmann's avatar
      net: mvneta: bm: fix dependencies again · 2073dbad
      Arnd Bergmann authored
      I tried to fix this before, but my previous fix was incomplete
      and we can still get the same link error in randconfig builds
      because of the way that Kconfig treats the
      
      	default y if MVNETA=y && MVNETA_BM_ENABLE
      
      line that does not actually trigger when MVNETA_BM_ENABLE=m,
      unlike I intended.
      Changing the line to use MVNETA_BM_ENABLE!=n however has
      the desired effect and hopefully makes all configurations
      work as expected.
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Fixes: 019ded3a ("net: mvneta: bm: clarify dependencies")
      Acked-by: default avatarGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2073dbad
    • Linus Torvalds's avatar
      Merge tag 'keys-fixes-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 02c9c0e9
      Linus Torvalds authored
      Pull keyring fix from David Howells:
       "Fix ASN.1 indefinite length object parsing"
      
      * tag 'keys-fixes-20160512' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        KEYS: Fix ASN.1 indefinite length object parsing
      02c9c0e9
    • Linus Torvalds's avatar
      Merge tag 'sound-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · e5ad8b6d
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "This is a pretty boring pull request as you wish: including a few
        small and trivial HD-audio and USB-audio quirks and a couple of small
        regression fixes in HD-audio"
      
      * tag 'sound-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: usb-audio: Yet another Phoneix Audio device quirk
        ALSA: hda - Fix regression on ATI HDMI audio
        ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
        ALSA: hda - Fix broken reconfig
        ALSA: hda - Fix white noise on Asus UX501VW headset
        ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
      e5ad8b6d