1. 22 Nov, 2013 6 commits
    • mm: hugetlbfs: fix hugetlbfs optimization · 27c73ae7
      Andrea Arcangeli authored
      Commit 7cb2ef56 ("mm: fix aio performance regression for database
      caused by THP") can cause dereference of a dangling pointer if
      split_huge_page runs during PageHuge() if there are updates to the
      tail_page->private field.
      
      Also it is repeating compound_head twice for hugetlbfs and it is running
      compound_head+compound_trans_head for THP when a single one is needed in
      both cases.
      
      The new code within the PageSlab() check doesn't need to verify that the
      THP page size is never bigger than the smallest hugetlbfs page size, to
      avoid memory corruption.
      
      A longstanding theoretical race condition was found while fixing the
      above (see the change right after the skip_unlock label, that is
      relevant for the compound_lock path too).
      
      By re-establishing the _mapcount tail refcounting for all compound
      pages, this also fixes the below problem:
      
        echo 0 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
      
        BUG: Bad page state in process bash  pfn:59a01
        page:ffffea000139b038 count:0 mapcount:10 mapping:          (null) index:0x0
        page flags: 0x1c00000000008000(tail)
        Modules linked in:
        CPU: 6 PID: 2018 Comm: bash Not tainted 3.12.0+ #25
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        Call Trace:
          dump_stack+0x55/0x76
          bad_page+0xd5/0x130
          free_pages_prepare+0x213/0x280
          __free_pages+0x36/0x80
          update_and_free_page+0xc1/0xd0
          free_pool_huge_page+0xc2/0xe0
          set_max_huge_pages.part.58+0x14c/0x220
          nr_hugepages_store_common.isra.60+0xd0/0xf0
          nr_hugepages_store+0x13/0x20
          kobj_attr_store+0xf/0x20
          sysfs_write_file+0x189/0x1e0
          vfs_write+0xc5/0x1f0
          SyS_write+0x55/0xb0
          system_call_fastpath+0x16/0x1b
      Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly · 044c8d4b
      Yuanhan Liu authored
      Remove CONFIG_USE_GENERIC_SMP_HELPERS left by commit 0a06ff06
      ("kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS").
      Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc,shm: fix shm_file deletion races · a399b29d
      Greg Thelen authored
      When IPC_RMID races with other shm operations there's potential for
      use-after-free of the shm object's associated file (shm_file).
      
      Here's the race before this patch:
      
        TASK 1                     TASK 2
        ------                     ------
        shm_rmid()
          ipc_lock_object()
                                   shmctl()
                                   shp = shm_obtain_object_check()
      
          shm_destroy()
            shm_unlock()
            fput(shp->shm_file)
                                   ipc_lock_object()
                                   shmem_lock(shp->shm_file)
                                   <OOPS>
      
      The oops is caused because shm_destroy() calls fput() after dropping the
      ipc_lock.  fput() clears the file's f_inode, f_path.dentry, and
      f_path.mnt, which causes various NULL pointer references in task 2.  I
      reliably see the oops in task 2 if with shmlock, shmu
      
      This patch fixes the races by:
      1) set shm_file=NULL in shm_destroy() while holding ipc_object_lock().
      2) modify at risk operations to check shm_file while holding
         ipc_object_lock().
      
      Example workloads, which each trigger oops...
      
      Workload 1:
        while true; do
          id=$(shmget 1 4096)
          shm_rmid $id &
          shmlock $id &
          wait
        done
      
        The oops stack shows accessing NULL f_inode due to racing fput:
          _raw_spin_lock
          shmem_lock
          SyS_shmctl
      
      Workload 2:
        while true; do
          id=$(shmget 1 4096)
          shmat $id 4096 &
          shm_rmid $id &
          wait
        done
      
        The oops stack is similar to workload 1 due to NULL f_inode:
          touch_atime
          shmem_mmap
          shm_mmap
          mmap_region
          do_mmap_pgoff
          do_shmat
          SyS_shmat
      
      Workload 3:
        while true; do
          id=$(shmget 1 4096)
          shmlock $id
          shm_rmid $id &
          shmunlock $id &
          wait
        done
      
        The oops stack shows the second fput tripping on a NULL f_inode.  The
        first fput() completed via shm_destroy(), but a racing thread did
        a get_file() and queued this fput():
          locks_remove_flock
          __fput
          ____fput
          task_work_run
          do_notify_resume
          int_signal
      
      Fixes: c2c737a0 ("ipc,shm: shorten critical region for shmat")
      Fixes: 2caacaa8 ("ipc,shm: shorten critical region for shmctl")
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>  # 3.10.17+ 3.11.6+
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: give transparent hugepage code a separate copy_page · 30b0a105
      Dave Hansen authored
      Right now, the migration code in migrate_page_copy() uses copy_huge_page()
      for hugetlbfs and thp pages:
      
             if (PageHuge(page) || PageTransHuge(page))
                      copy_huge_page(newpage, page);
      
      So, yay for code reuse.  But:
      
        void copy_huge_page(struct page *dst, struct page *src)
        {
              struct hstate *h = page_hstate(src);
      
      and a non-hugetlbfs page has no page_hstate().  This works 99% of the
      time because page_hstate() determines the hstate from the page order
      alone.  Since the page order of a THP page matches the default hugetlbfs
      page order, it works.
      
      But, if you change the default huge page size on the boot command-line
      (say default_hugepagesz=1G), then we might not even *have* a 2MB hstate
      so page_hstate() returns null and copy_huge_page() oopses pretty fast
      since copy_huge_page() dereferences the hstate:
      
        void copy_huge_page(struct page *dst, struct page *src)
        {
              struct hstate *h = page_hstate(src);
              if (unlikely(pages_per_huge_page(h) > MAX_ORDER_NR_PAGES)) {
        ...
      
      Mel noticed that the migration code is really the only user of these
      functions.  This moves all the copy code over to migrate.c and makes
      copy_huge_page() work for THP by checking for it explicitly.
      
      I believe the bug was introduced in commit b32967ff ("mm: numa: Add
      THP migration for the NUMA working set scanning fault case")
      
      [akpm@linux-foundation.org: fix coding-style and comment text, per Naoya Horiguchi]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • checkpatch: fix "Use of uninitialized value" warnings · c11230f4
      Joe Perches authored
      checkpatch is currently confused about some complex macros and references
      undefined variables $stat and $cond.

      Make sure these are defined before using them.
      Signed-off-by: Joe Perches <joe@perches.com>
      Reported-by: Gerhard Sittig <gsi@denx.de>
      Acked-by: Andy Whitcroft <apw@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • configfs: fix race between dentry put and lookup · 76ae281f
      Junxiao Bi authored
      A race window exists in configfs.  It starts when a dentry is UNHASHED
      and ends before configfs_d_iput() is called.  If a lookup happens in
      this window, a new dentry is allocated because the original dentry is
      UNHASHED, and configfs_attach_attr() then updates sd->s_dentry to
      point to the new dentry.  When configfs_d_iput() later runs,
      BUG_ON(sd->s_dentry != dentry) triggers and the system panics.
      
      sys_open:                     sys_close:
       ...                           fput
                                      dput
                                       dentry_kill
                                        __d_drop <--- dentry unhashed here,
                                                 but sd->s_dentry still points
                                                 to this dentry.
      
       lookup_real
        configfs_lookup
         configfs_attach_attr---> update sd->s_dentry
                                  to newly allocated dentry here.
      
                                         d_kill
                                           configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
                                                           triggered here.
      
      To fix it, change configfs_d_iput() so that it does not update
      sd->s_dentry when sd->s_count > 2, which means that another dentry,
      besides the one being put, is using the sd.  Use configfs_dirent_lock
      in configfs_attach_attr() to synchronize with configfs_d_iput().
      
      With the following steps, you can reproduce the bug.
      
      1. Enable ocfs2.  This mounts configfs at /sys/kernel/config and
         populates it with configuration entries.

      2. Run the following script.
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 20 Nov, 2013 34 commits
    • Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · 527d1511
      Linus Torvalds authored
      Pull powerpc LE updates from Ben Herrenschmidt:
       "With my previous pull request I mentioned some remaining Little Endian
        patches, notably support for our new ABI, which I was sitting on
        making sure it was all finalized.
      
        The toolchain folks confirmed it now, the new ABI is stable and merged
        with gcc, so we are all good.  Oh and we actually missed the actual
        Kconfig switch for LE so here it is, along with a couple more bug
        fixes.
      
        I have more fixes but not related to LE so I'll send them as a
        separate pull request tomorrow, let's get this one out of the way.
      
        Note that this supports running user space binaries using the new ABI,
        but the kernel itself still needs to be built with the old one.  We'll
        bring fixes for that after -rc1.
      
        Here's Anton's log that goes with this series:
      
           This patch series adds support for the new ABI, LPAR support for
           H_SET_MODE and finally adds a kconfig option and defconfig.
      
           ABIv2 support was recently committed to binutils and gcc, and should
           be merged into glibc soon.  There are a number of very nice
           improvements including the removal of function descriptors.  Rusty's
           kernel patches allow binaries of either ABI to work, easing the
           transition"
      
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Wrong DWARF CFI in the kernel vdso for little-endian / ELFv2
        powerpc: Add pseries_le_defconfig
        powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.
        powerpc: Don't use ELFv2 ABI to build the kernel
        powerpc: ELF2 binaries signal handling
        powerpc: ELF2 binaries launched directly.
        powerpc: Set eflags correctly for ELF ABIv2 core dumps.
        powerpc: Add TIF_ELF2ABI flag.
        pseries: Add H_SET_MODE to change exception endianness
        powerpc/pseries: Fix endian issues in pseries EEH code
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha · d5bdaf4f
      Linus Torvalds authored
      Pull alpha updates from Matt Turner:
       "It contains a few fixes and some work from Richard to make alpha
        emulation under QEMU much more usable"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
        alpha: Prevent a NULL ptr dereference in csum_partial_copy.
        alpha: perf: fix out-of-bounds array access triggered from raw event
        alpha: Use qemu+cserve provided high-res clock and alarm.
        alpha: Switch to GENERIC_CLOCKEVENTS
        alpha: Enable the rpcc clocksource for single processor
        alpha: Reorganize rtc handling
        alpha: Primitive support for CPU power down.
        alpha: Allow HZ to be configured
        alpha: Notice if we're being run under QEMU
        alpha: Eliminate compiler warning from memset macro
    • Merge branch 'parisc-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · dc6ec87d
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       - revert an access_ok() patch which broke 32bit userspace on 64bit
         kernels
       - avoid a gcc miscompilation in two internal pa_memcpy() functions by
         not inlining those
       - do not export the definition of SOCK_NONBLOCK via uapi header (fixes
         build of audit package)
       - depending on the fault type we now correctly report either SIGBUS or
         SIGSEGV
       - a small fix to not compare a size_t variable for < 0
      
      * 'parisc-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: size_t is unsigned, so comparison size < 0 doesn't make sense.
        parisc: improve SIGBUS/SIGSEGV error reporting
        parisc: break out SOCK_NONBLOCK define to own asm header file
        parisc: do not inline pa_memcpy() internal functions
        Revert "parisc: implement full version of access_ok()"
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 · 8a60ba0a
      Linus Torvalds authored
      Pull AVR32 updates from Hans-Christian Egtvedt.

      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
        avr32: uapi: be sure of "_UAPI" prefix for all guard macros
        avr32: add kprobe_ctlblk memory struct
        avr32: fix out-of-range jump in large kernels
        avr32: setup crt for early panic()
    • Merge tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next · af2e2f32
      Linus Torvalds authored
      Pull squashfs updates from Phillip Lougher:
       "These patches optionally improve the multi-threading performance of
        Squashfs by adding parallel decompression, and direct decompression
        into the page cache, eliminating an intermediate buffer (removing
        memcpy overhead and lock contention)"
      
      * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
        Squashfs: Check stream is not NULL in decompressor_multi.c
        Squashfs: Directly decompress into the page cache for file data
        Squashfs: Restructure squashfs_readpage()
        Squashfs: Generalise paging handling in the decompressors
        Squashfs: add multi-threaded decompression using percpu variable
        squashfs: Enhance parallel I/O
        Squashfs: Refactor decompressor interface and code
    • Revert "mm: create a separate slab for page->ptl allocation" · 8b2e9b71
      Linus Torvalds authored
      This reverts commit ea1e7ed3.
      
      Al points out that while the commit *does* actually create a separate
      slab for the page->ptl allocation, that slab is never actually used, and
      the code continues to use kmalloc/kfree.
      
      Damien Wyart points out that the original patch did have the conversion
      to use kmem_cache_alloc/free, so it got lost somewhere on its way to me.
      
      Revert the half-arsed attempt that didn't do anything.  If we really do
      want the special slab (remember: this is all relevant just for debug
      builds, so it's not necessarily all that critical) we might as well redo
      the patch fully.
      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill A Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · b5898cd0
      Linus Torvalds authored
      Pull vfs bits and pieces from Al Viro:
       "Assorted bits that got missed in the first pull request + fixes for a
        couple of coredump regressions"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fold try_to_ascend() into the sole remaining caller
        dcache.c: get rid of pointless macros
        take read_seqbegin_or_lock() and friends to seqlock.h
        consolidate simple ->d_delete() instances
        gfs2: endianness misannotations
        dump_emit(): use __kernel_write(), not vfs_write()
        dump_align(): fix the dumb braino
    • Wrong page freed on preallocate_pmds() failure exit · 2a46eed5
      Al Viro authored
      Note that pmds[i] is simply uninitialized at that point...
      
      Granted, it's very hard to hit (you need split page locks *and*
      kmalloc(sizeof(spinlock_t), GFP_KERNEL) failing), but the code is
      obviously bogus.
      
      Introduced by commit 09ef4939 ("x86: add missed
      pgtable_pmd_page_ctor/dtor calls for preallocated pmds")
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc: Wrong DWARF CFI in the kernel vdso for little-endian / ELFv2 · 28027082
      Ulrich Weigand authored
      I've finally tracked down why my CR signal-unwind test case still
      fails on little-endian.  The problem turned out to be that the kernel
      installs a signal trampoline in the vDSO, and provides a DWARF CFI
      record for that trampoline.  This CFI describes the save location
      for CR:
      
        rsave (70, 38*RSIZE + (RSIZE - CRSIZE))
      
      which is correct for big-endian, but points to the wrong word on
      little-endian.   This is wrong no matter which ABI.
      
      In addition, for the ELFv2 ABI, we should not only provide a CFI
      record for register 70 (cr2), but for all CR fields separately.
      Strictly speaking, I guess this would mean providing two separate
      vDSO images, one for ELFv1 processes and one for ELFv2 processes (or
      maybe playing some tricks with conditional DWARF expressions).
      However, having CFI records for the other CR fields in ELFv1 is not
      actually wrong, they just will be ignored.   So it seems the simplest
      fix would be just to always provide CFI for all the fields.
      Signed-off-by: Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • Anton Blanchard's avatar
      f53e462e
    • powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option. · 7c105b63
      Anton Blanchard authored
      With the little endian support merged, we can add the
      CONFIG_CPU_LITTLE_ENDIAN kernel config option.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Don't use ELFv2 ABI to build the kernel · b2ca8c89
      Alistair Popple authored
      The kernel doesn't build correctly using the ELFv2 ABI.  This patch
      ensures that the ELFv1 ABI is used when building a kernel with an
      ELFv2 enabled compiler.
      Signed-off-by: Alistair Popple <alistair@popple.id.au>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: ELF2 binaries signal handling · d606b92a
      Rusty Russell authored
      For the ELFv2 ABI, the handler is the entry point, not a function descriptor.
      We also need to set up r12, and fortunately the fast_exception_return
      exit path restores r12 for us so nothing else is required.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: ELF2 binaries launched directly. · 94af3abf
      Rusty Russell authored
      No function descriptor, but we set r12 up and set TIF_RESTOREALL as it
      normally isn't restored on return from syscall.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Set eflags correctly for ELF ABIv2 core dumps. · 918d0355
      Rusty Russell authored
      We leave it at zero (though it could be 1) for old tasks.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Add TIF_ELF2ABI flag. · 373c76d6
      Rusty Russell authored
      Little endian ppc64 is getting an exciting new ABI.  This is reflected
      by the bottom two bits of e_flags in the ELF header:
      
      	0 == legacy binaries (v1 ABI)
      	1 == binaries using the old ABI (compiled with a new toolchain)
      	2 == binaries using the new ABI.
      
      We store this in a thread flag, because we need to set it in core
      dumps and for signal delivery.  Our chief concern is that it doesn't
      use function descriptors.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • pseries: Add H_SET_MODE to change exception endianness · e844b1ee
      Anton Blanchard authored
      On little endian builds call H_SET_MODE so exceptions have the
      correct endianness. We need to reset the endian during kexec
      so do that in the MMU hashtable clear callback.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • Anton Blanchard's avatar
    • Merge tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 82023bb7
      Linus Torvalds authored
      Pull more ACPI and power management updates from Rafael Wysocki:
      
       - ACPI-based device hotplug fixes for issues introduced recently and a
         fix for an older error code path bug in the ACPI PCI host bridge
         driver
      
       - Fix for recently broken OMAP cpufreq build from Viresh Kumar
      
       - Fix for a recent hibernation regression related to s2disk
      
       - Fix for a locking-related regression in the ACPI EC driver from
         Puneet Kumar
      
       - System suspend error code path fix related to runtime PM and runtime
         PM documentation update from Ulf Hansson
      
       - cpufreq's conservative governor fix from Xiaoguang Chen
      
       - New processor IDs for intel_idle and turbostat and removal of an
         obsolete Kconfig option from Len Brown
      
       - New device IDs for the ACPI LPSS (Low-Power Subsystem) driver and
         ACPI-based PCI hotplug (ACPIPHP) cleanup from Mika Westerberg
      
       - Removal of several ACPI video DMI blacklist entries that are not
         necessary any more from Aaron Lu
      
       - Rework of the ACPI companion representation in struct device and code
         cleanup related to that change from Rafael J Wysocki, Lan Tianyu and
         Jarkko Nikula
      
       - Fixes for assigning names to ACPI-enumerated I2C and SPI devices from
         Jarkko Nikula
      
      * tag 'pm+acpi-2-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (24 commits)
        PCI / hotplug / ACPI: Drop unused acpiphp_debug declaration
        ACPI / scan: Set flags.match_driver in acpi_bus_scan_fixed()
        ACPI / PCI root: Clear driver_data before failing enumeration
        ACPI / hotplug: Fix PCI host bridge hot removal
        ACPI / hotplug: Fix acpi_bus_get_device() return value check
        cpufreq: governor: Remove fossil comment in the cpufreq_governor_dbs()
        ACPI / video: clean up DMI table for initial black screen problem
        ACPI / EC: Ensure lock is acquired before accessing ec struct members
        PM / Hibernate: Do not crash kernel in free_basic_memory_bitmaps()
        ACPI / AC: Remove struct acpi_device pointer from struct acpi_ac
        spi: Use stable dev_name for ACPI enumerated SPI slaves
        i2c: Use stable dev_name for ACPI enumerated I2C slaves
        ACPI: Provide acpi_dev_name accessor for struct acpi_device device name
        ACPI / bind: Use (put|get)_device() on ACPI device objects too
        ACPI: Eliminate the DEVICE_ACPI_HANDLE() macro
        ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node
        cpufreq: OMAP: Fix compilation error 'r & ret undeclared'
        PM / Runtime: Fix error path for prepare
        PM / Runtime: Update documentation around probe|remove|suspend
        cpufreq: conservative: set requested_freq to policy max when it is over policy max
        ...
    • Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma · e6d69a60
      Linus Torvalds authored
      Pull slave-dmaengine changes from Vinod Koul:
       "This brings for slave dmaengine:
      
         - Change dma notification flag to DMA_COMPLETE from DMA_SUCCESS as
           dmaengine can only transfer and not verify validity of dma
           transfers
      
         - Bunch of fixes across drivers:
      
            - cppi41 driver fixes from Daniel
      
            - 8 channel freescale dma engine support and updated bindings from
              Hongbo
      
            - mxs-dma fixes and cleanup by Markus
      
         - DMAengine updates from Dan:
      
            - Bartlomiej and Dan finalized a rework of the dma address unmap
              implementation.
      
            - In the course of testing 1/ a collection of enhancements to
              dmatest fell out.  Notably basic performance statistics, and
              fixed / enhanced test control through new module parameters
              'run', 'wait', 'noverify', and 'verbose'.  Thanks to Andriy and
              Linus [Walleij] for their review.
      
            - Testing the raid related corner cases of 1/ triggered bugs in
              the recently added 16-source operation support in the ioatdma
              driver.
      
            - Some minor fixes / cleanups to mv_xor and ioatdma"
      
      * 'next' of git://git.infradead.org/users/vkoul/slave-dma: (99 commits)
        dma: mv_xor: Fix mis-usage of mmio 'base' and 'high_base' registers
        dma: mv_xor: Remove unneeded NULL address check
        ioat: fix ioat3_irq_reinit
        ioat: kill msix_single_vector support
        raid6test: add new corner case for ioatdma driver
        ioatdma: clean up sed pool kmem_cache
        ioatdma: fix selection of 16 vs 8 source path
        ioatdma: fix sed pool selection
        ioatdma: Fix bug in selftest after removal of DMA_MEMSET.
        dmatest: verbose mode
        dmatest: convert to dmaengine_unmap_data
        dmatest: add a 'wait' parameter
        dmatest: add basic performance metrics
        dmatest: add support for skipping verification and random data setup
        dmatest: use pseudo random numbers
        dmatest: support xor-only, or pq-only channels in tests
        dmatest: restore ability to start test at module load and init
        dmatest: cleanup redundant "dmatest: " prefixes
        dmatest: replace stored results mechanism, with uniform messages
        Revert "dmatest: append verify result to results"
        ...
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 5a1efc6e
      Linus Torvalds authored
      Pull block IO fixes from Jens Axboe:
       "Normally I'd defer my initial for-linus pull request until after the
        merge window, but a race was uncovered in the virtio-blk conversion to
        blk-mq that could cause hangs.  So here's a small collection of fixes
        for you to pull:
      
         - The fix for the virtio-blk IO hang reported by Dave Chinner, from
           Shaohua and myself.
      
         - Add the Insert blktrace event for blk-mq.  This makes 'btt' happy
           when it is doing its state transition analysis.
      
         - Ensure that blk-mq has disk/partition stats enabled by default,
           instead of making it opt-in.
      
         - A fix for __bio_add_page() and large sector counts"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq: add blktrace insert event trace
        virtio-blk: virtqueue_kick() must be ordered with other virtqueue operations
        blk-mq: ensure that we set REQ_IO_STAT so diskstats work
        bio: fix argument of __bio_add_page() for max_sectors > 0xffff
      5a1efc6e
    • Linus Torvalds's avatar
      Merge tag 'md/3.13' of git://neil.brown.name/md · 6d6e352c
      Linus Torvalds authored
      Pull md update from Neil Brown:
       "Mostly optimisations and obscure bug fixes.
         - raid5 gets less lock contention
         - raid1 gets less contention between normal-io and resync-io during
           resync"
      
      * tag 'md/3.13' of git://neil.brown.name/md:
        md/raid5: Use conf->device_lock protect changing of multi-thread resources.
        md/raid5: Before freeing old multi-thread worker, it should flush them.
        md/raid5: For stripe with R5_ReadNoMerge, we replace REQ_FLUSH with REQ_NOMERGE.
        UAPI: include <asm/byteorder.h> in linux/raid/md_p.h
        raid1: Rewrite the implementation of iobarrier.
        raid1: Add some macros to make code clearly.
        raid1: Replace raise_barrier/lower_barrier with freeze_array/unfreeze_array when reconfiguring the array.
        raid1: Add a field array_frozen to indicate whether raid in freeze state.
        md: Convert use of typedef ctl_table to struct ctl_table
        md/raid5: avoid deadlock when raid5 array has unack badblocks during md_stop_writes.
        md: use MD_RECOVERY_INTR instead of kthread_should_stop in resync thread.
        md: fix some places where mddev_lock return value is not checked.
        raid5: Retry R5_ReadNoMerge flag when hit a read error.
        raid5: relieve lock contention in get_active_stripe()
        raid5: relieve lock contention in get_active_stripe()
        wait: add wait_event_cmd()
        md/raid5.c: add proper locking to error path of raid5_start_reshape.
        md: fix calculation of stacking limits on level change.
        raid5: Use slow_path to release stripe when mddev->thread is null
      6d6e352c
    • Chen Gang's avatar
      avr32: uapi: be sure of "_UAPI" prefix for all guard macros · e7f2c8c1
      Chen Gang authored
      For all uapi headers, the guard macro needs to use the "_UAPI" prefix
      (which will be stripped by "scripts/headers_install.sh").
      
      Also remove redundant files (bitsperlong.h, errno.h, fcntl.h, ioctl.h,
      ioctls.h, ipcbuf.h, kvm_para.h, mman.h, poll.h, resource.h, siginfo.h,
      statfs.h, and unistd.h) which are already in Kbuild.
      
      Also be sure that every "#endif" has only one empty line above it, and that
      each file has a guard macro.
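The effect of stripping the guard prefix can be illustrated with a small Python sketch. The regular expression is only modeled on what the header install step does and is not the kernel script itself; the header name `__ASM_AVR32_EXAMPLE_H` and the function `strip_uapi_guard` are hypothetical.

```python
import re

# Matches "#ifndef", "#define" or "#endif /*" followed by a guard
# macro that starts with the _UAPI prefix.
_GUARD = re.compile(r"#(ifndef|define|endif[ \t]*/\*)[ \t]*_UAPI")


def strip_uapi_guard(line):
    """Drop the _UAPI prefix from a guard line; other lines pass through."""
    return _GUARD.sub(r"#\1 ", line)


# Hypothetical in-tree uapi header guard (illustrative name only).
header = [
    "#ifndef _UAPI__ASM_AVR32_EXAMPLE_H",
    "#define _UAPI__ASM_AVR32_EXAMPLE_H",
    "#endif /* _UAPI__ASM_AVR32_EXAMPLE_H */",
]

# What the installed (userspace-visible) header would contain.
installed = [strip_uapi_guard(line) for line in header]
```

A guard without the "_UAPI" prefix passes through unchanged, so the in-tree and installed copies would end up with the same guard name.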
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Hans-Christian Egtvedt <hegtvedt@cisco.com>
      e7f2c8c1
    • Eirik Aanonsen's avatar
      avr32: add kprobe_ctlblk memory struct · dbc0d691
      Eirik Aanonsen authored
      This re-enables kprobes on AVR32 architecture.
      Signed-off-by: Eirik Aanonsen <eaa@wprmedical.com>
      Signed-off-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      dbc0d691
    • Andreas Bießmann's avatar
      avr32: fix out-of-range jump in large kernels · d617b338
      Andreas Bießmann authored
      This patch fixes the following error (for big kernels):
      
      ---8<---
      arch/avr32/boot/u-boot/head.o: In function `no_tag_table':
      (.init.text+0x44): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o
      arch/avr32/kernel/built-in.o: In function `bad_return':
      (.ex.text+0x236): relocation truncated to fit: R_AVR32_22H_PCREL against symbol `panic' defined in .text.unlikely section in kernel/built-in.o
      --->8---
      
      It comes up when the kernel grows and 'panic()' ends up too far away to fit in
      the +/- 2MiB range. That limit comes from the 21-bit displacement of the
      'br{cond4}' mnemonic, which is one of the two ways to do jumps (rjmp has just a
      10-bit displacement and therefore a much smaller range). This was already
      noted in 8d29b7b9.
      One way to solve this is to add local storage for the symbol address
      and just load the $pc with that value.
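The range figures can be checked with a short Python sketch. It assumes the displacement is a signed count of 2-byte halfwords; that is consistent with the "22H" in the relocation name and with the +/- 2MiB figure, but it is an assumption here, not something the commit states. The helper names are hypothetical.

```python
MIB = 1 << 20


def pcrel_range_bytes(disp_bits, unit_bytes=2):
    """Return (lowest, highest) reachable byte offset for a signed
    displacement field of disp_bits bits, counted in unit_bytes units.
    The 2-byte (halfword) unit is an assumption, see the text above."""
    lowest = -(1 << (disp_bits - 1)) * unit_bytes
    highest = ((1 << (disp_bits - 1)) - 1) * unit_bytes
    return lowest, highest


def fits(offset, disp_bits):
    """True if a PC-relative byte offset is encodable in the field."""
    lowest, highest = pcrel_range_bytes(disp_bits)
    return lowest <= offset <= highest and offset % 2 == 0


br_range = pcrel_range_bytes(21)    # 'br{cond4}': roughly +/- 2 MiB
rjmp_range = pcrel_range_bytes(10)  # 'rjmp': roughly +/- 1 KiB

# A target 3 MiB away cannot be reached by either branch form, which is
# the situation that produces "relocation truncated to fit".
too_far = 3 * MIB
```

Loading the target address from local storage into the program counter avoids the displacement field entirely, so the distance to `panic()` no longer matters.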
      Signed-off-by: Andreas Bießmann <andreas@biessmann.de>
      Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: stable@vger.kernel.org
      d617b338
    • Andreas Bießmann's avatar
      avr32: setup crt for early panic() · 7a2a74f4
      Andreas Bießmann authored
      Previously the CRT was (fully) set up only in kernel_entry. The bss was
      cleared earlier, in _start, but still not before the jump to panic() in the
      no_tag_table case.
      
      This patch fixes this up to have a fully working CRT when branching to panic()
      in no_tag_table.
      Signed-off-by: Andreas Bießmann <andreas@biessmann.de>
      Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: stable@vger.kernel.org
      7a2a74f4
    • Phillip Lougher's avatar
      Squashfs: Check stream is not NULL in decompressor_multi.c · ed4f381e
      Phillip Lougher authored
      Fix static checker complaint that stream is not checked in
      squashfs_decompressor_destroy().
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      ed4f381e
    • Phillip Lougher's avatar
      Squashfs: Directly decompress into the page cache for file data · 0d455c12
      Phillip Lougher authored
      This introduces an implementation of squashfs_readpage_block()
      that directly decompresses into the page cache.
      
      This uses the previously added page handler abstraction to push
      down the necessary kmap_atomic/kunmap_atomic operations on the
      page cache buffers into the decompressors.  This enables
      direct copying into the page cache without using the slow
      kmap/kunmap calls.
      
      The code detects when multiple threads are racing in
      squashfs_readpage() to decompress the same block, and avoids
      this regression by falling back to using an intermediate
      buffer.
      
      This patch enhances the performance of Squashfs significantly
      when multiple processes are accessing the filesystem simultaneously
      because it not only reduces memcopying, but it more importantly
      eliminates the lock contention on the intermediate buffer.
      
      Using single-thread decompression.
      
              dd if=file1 of=/dev/null bs=4096 &
              dd if=file2 of=/dev/null bs=4096 &
              dd if=file3 of=/dev/null bs=4096 &
              dd if=file4 of=/dev/null bs=4096
      
      Before:
      
      629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s
      
      After:
      
      629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s
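As a quick cross-check of the figures above, the throughput and speedup can be recomputed from the byte count and the elapsed times that dd reports. This is only arithmetic on the numbers quoted in the commit message; dd counts a MB as 10^6 bytes.

```python
SIZE_BYTES = 629_145_600   # bytes copied, from the dd output above
BEFORE_SECONDS = 45.8046   # elapsed time before the patch
AFTER_SECONDS = 9.29414    # elapsed time after the patch


def throughput_mb_s(nbytes, seconds):
    """Throughput in MB/s, using dd's convention of 1 MB = 10**6 bytes."""
    return nbytes / seconds / 1e6


before = throughput_mb_s(SIZE_BYTES, BEFORE_SECONDS)
after = throughput_mb_s(SIZE_BYTES, AFTER_SECONDS)

# Same amount of data in both runs, so the speedup is the time ratio.
speedup = BEFORE_SECONDS / AFTER_SECONDS
```

Rounded to one decimal place, `before` and `after` match the 13.7 MB/s and 67.7 MB/s that dd printed, and the speedup comes out a little under 5x.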
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      0d455c12
    • Phillip Lougher's avatar
      Squashfs: Restructure squashfs_readpage() · 5f55dbc0
      Phillip Lougher authored
      Restructure squashfs_readpage() splitting it into separate
      functions for datablocks, fragments and sparse blocks.
      
      Move the memcpying (from squashfs cache entry) implementation of
      squashfs_readpage_block into file_cache.c
      
      This allows different implementations to be supported.
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      5f55dbc0
    • Phillip Lougher's avatar
      Squashfs: Generalise paging handling in the decompressors · 846b730e
      Phillip Lougher authored
      Further generalise the decompressors by adding a page handler
      abstraction.  This adds helpers to allow the decompressors
      to access and process the output buffers in an
      implementation-independent manner.
      
      This allows different types of output buffer to be passed
      to the decompressors, with the implementation specific
      aspects handled at decompression time, but without the
      knowledge being held in the decompressor wrapper code.
      
      This will allow the decompressors to handle Squashfs
      cache buffers, and page cache pages.
      
      This patch adds the abstraction and an implementation for
      the caches.
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      846b730e
    • Phillip Lougher's avatar
      Squashfs: add multi-threaded decompression using percpu variable · d208383d
      Phillip Lougher authored
      Add a multi-threaded decompression implementation which uses
      percpu variables.
      
      Using percpu variables has advantages and disadvantages over
      implementations which do not use percpu variables.
      
      Advantages:
        * the nature of percpu variables ensures decompression is
          load-balanced across the multiple cores.
        * simplicity.
      
      Disadvantages: it limits decompression to one thread per core.
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      d208383d
    • Minchan Kim's avatar
      squashfs: Enhance parallel I/O · cd59c2ec
      Minchan Kim authored
      Squashfs currently uses only one stream buffer for decompression,
      which hurts parallel read performance. This patch adds support for
      multiple decompressors to improve parallel I/O performance.
      
      Test: dd reads of four 1G files on a KVM machine with 2 CPUs and 4G of memory.
      
      dd if=test/test1.dat of=/dev/null &
      dd if=test/test2.dat of=/dev/null &
      dd if=test/test3.dat of=/dev/null &
      dd if=test/test4.dat of=/dev/null &
      
      old : 1m39s -> new : 9s
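The idea of serving parallel readers from several decompressor streams instead of one can be sketched in user-space Python. This is a loose illustration and not the kernel implementation: the `StreamPool` class, the fixed pool size, and the use of zlib are all assumptions made for the example.

```python
import queue
import threading
import zlib


class StreamPool:
    """Fixed pool of decompressor slots; callers block when all are busy."""

    def __init__(self, nstreams):
        self._free = queue.Queue()
        for slot in range(nstreams):
            # A slot id stands in for a per-stream workspace buffer.
            self._free.put(slot)

    def decompress(self, data):
        slot = self._free.get()  # waits until some stream is free
        try:
            return zlib.decompress(data)
        finally:
            self._free.put(slot)  # hand the stream back to the pool


pool = StreamPool(nstreams=2)
blocks = [zlib.compress(bytes([i]) * 4096) for i in range(4)]
results = [None] * len(blocks)


def reader(i):
    # Each reader thread decompresses its own block through the pool.
    results[i] = pool.decompress(blocks[i])


threads = [threading.Thread(target=reader, args=(i,)) for i in range(len(blocks))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With `nstreams=1` every reader would queue behind a single stream, which corresponds to the one-stream-buffer situation the patch describes.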
      
      * From v1
        * Change comp_strm with decomp_strm - Phillip
        * Change/add comments - Phillip
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      cd59c2ec
    • Phillip Lougher's avatar
      Squashfs: Refactor decompressor interface and code · 9508c6b9
      Phillip Lougher authored
      The decompressor interface and code were written from
      the point of view of single-threaded operation.  In doing
      so it mixed a lot of single-threaded implementation specific
      aspects into the decompressor code and elsewhere which makes it
      difficult to seamlessly support multiple different decompressor
      implementations.
      
      This patch does the following:
      
      1.  It removes compressor_options parsing from the decompressor
          init() function.  This allows the decompressor init() function
          to be dynamically called to instantiate multiple decompressors,
          without the compressor options needing to be read and parsed each
          time.
      
      2.  It moves threading and all sleeping operations out of the
          decompressors.  In doing so, it makes the decompressors
          non-blocking wrappers which only deal with interfacing with
          the decompressor implementation.
      
      3.  It splits decompressor.[ch] into decompressor generic functions
          in decompressor.[ch], and moves the single threaded
          decompressor implementation into decompressor_single.c.
      
      The result of this patch is that Squashfs should now be able to
      support multiple decompressors by adding new decompressor_xxx.c
      files with specialised implementations of the functions in
      decompressor_single.c
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      9508c6b9
    • Jens Axboe's avatar
      blk-mq: add blktrace insert event trace · 01b983c9
      Jens Axboe authored
      We need it to make 'btt' from blktrace happy, otherwise
      we are missing one state transition.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      01b983c9