1. 25 Jun, 2014 6 commits
    • Linus Torvalds's avatar
      Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · d91d66e8
      Linus Torvalds authored
      Pull powerpc fixes and cleanups from Ben Herrenschmidt:
       "Here are a handful or two of powerpc fixes and simple/trivial
        cleanups.  A bunch of them fix ftrace with the new ABI v2 in Little
        Endian, the rest is a scattering of fairly simple things"
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc: Don't skip ePAPR spin-table CPUs
        powerpc/module: Fix TOC symbol CRC
        powerpc/powernv: Remove OPAL v1 takeover
        powerpc/kmemleak: Do not scan the DART table
        selftests/powerpc: Use the test harness for the TM DSCR test
        powerpc/cell: cbe_thermal.c: Cleaning up a variable is of the wrong type
        powerpc/kprobes: Fix jprobes on ABI v2 (LE)
        powerpc/ftrace: Use pr_fmt() to namespace error messages
        powerpc/ftrace: Fix nop of modules on 64bit LE (ABIv2)
        powerpc/ftrace: Fix inverted check of create_branch()
        powerpc/ftrace: Fix typo in mask of opcode
        powerpc: Add ppc_global_function_entry()
        powerpc/macintosh/smu.c: Fix closing brace followed by if
        powerpc: Remove __arch_swab*
        powerpc: Remove ancient DEBUG_SIG code
        powerpc/kerenl: Enable EEH for IO accessors
      d91d66e8
    • Linus Torvalds's avatar
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 07f4695c
      Linus Torvalds authored
      Pull vhost cleanups from Michael S Tsirkin:
       "Two cleanup patches removing code duplication that got introduced by
        changes in rc1.  Not fixing crashes, but I'd rather not carry the
        duplicate code until the next merge window"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vhost-scsi: don't open-code kvfree
        vhost-net: don't open-code kvfree
      07f4695c
    • Linus Torvalds's avatar
      Merge tag 'trace-fixes-v3.16-rc1-v2' of... · b8e46d22
      Linus Torvalds authored
      Merge tag 'trace-fixes-v3.16-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull tracing cleanups and fixes from Steven Rostedt:
       "This includes three patches from Oleg Nesterov.  The first is a fix to
        a race condition that happens between enabling/disabling syscall
        tracepoints and new process creations (the check to go into the ptrace
        path for a process can be set when it shouldn't, or not set when it
        should).  Not a major bug but one that should be fixed and even
        applied to stable.
      
        The other two patches are cleanup/fixes that are not that critical,
        but for an -rc1 release would be nice to have.  They both deal with
        syscall tracepoints.
      
        It also includes a patch to introduce a new macro for the
        TRACE_EVENT() format called __field_struct().  Originally, __field()
        was used to record any variable into a trace event, but with the
        addition of setting the "is signed" attribute, the check causes
        anything but a primitive variable to fail to compile.  That is,
        structs and unions can't be used as they once were.  When the "is
        signed" check was introduce there were only primitive variables being
        recorded.  But that will change soon and it was reported that
        __field() causes build failures.
      
        To solve the __field() issue, __field_struct() is introduced to allow
        trace_events to be able to record complex types too"
      
      * tag 'trace-fixes-v3.16-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Add __field_struct macro for TRACE_EVENT()
        tracing: syscall_regfunc() should not skip kernel threads
        tracing: Change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()
        tracing: Fix syscall_*regfunc() vs copy_process() race
      b8e46d22
    • Scott Wood's avatar
      powerpc: Don't skip ePAPR spin-table CPUs · 6663a4fa
      Scott Wood authored
      Commit 59a53afe "powerpc: Don't setup
      CPUs with bad status" broke ePAPR SMP booting.  ePAPR says that CPUs
      that aren't presently running shall have status of disabled, with
      enable-method being used to determine whether the CPU can be enabled.
      
      Fix by checking for spin-table, which is currently the only supported
      enable-method.
      Signed-off-by: default avatarScott Wood <scottwood@freescale.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Emil Medve <Emilian.Medve@Freescale.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6663a4fa
    • Laurent Dufour's avatar
      powerpc/module: Fix TOC symbol CRC · c2cbcf53
      Laurent Dufour authored
      The commit 71ec7c55 introduced the magic symbol ".TOC." for ELFv2 ABI.
      This symbol is built manually and has no CRC value computed. A zero value
      is put in the CRC section to avoid modpost complaining about a missing CRC.
      Unfortunately, this breaks the kernel module loading when the kernel is
      relocated (kdump case for instance) because of the relocation applied to
      the kcrctab values.
      
      This patch compute a CRC value for the TOC symbol which will match the one
      compute by the kernel when it is relocated - aka '0 - relocate_start' done in
      maybe_relocated called by check_version (module.c).
      Signed-off-by: default avatarLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Anton Blanchard <anton@samba.org>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c2cbcf53
    • Michael Ellerman's avatar
      powerpc/powernv: Remove OPAL v1 takeover · e2500be2
      Michael Ellerman authored
      In commit 27f44888 "Add OPAL takeover from PowerVM" we added support
      for "takeover" on OPAL v1 machines.
      
      This was a mode of operation where we would boot under pHyp, and query
      for the presence of OPAL. If detected we would then do a special
      sequence to take over the machine, and the kernel would end up running
      in hypervisor mode.
      
      OPAL v1 was never a supported product, and was never shipped outside
      IBM. As far as we know no one is still using it.
      
      Newer versions of OPAL do not use the takeover mechanism. Although the
      query for OPAL should be harmless on machines with newer OPAL, we have
      seen a machine where it causes a crash in Open Firmware.
      
      The code in early_init_devtree() to copy boot_command_line into cmd_line
      was added in commit 817c21ad "Get kernel command line accross OPAL
      takeover", and AFAIK is only used by takeover, so should also be
      removed.
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e2500be2
  2. 24 Jun, 2014 18 commits
  3. 23 Jun, 2014 16 commits
    • Linus Torvalds's avatar
      Merge branch 'akpm' (patches from Andrew Morton) · 04b5da4a
      Linus Torvalds authored
      Merge fixes from Andrew Morton:
       "The nmi patch and watchdog patch aren't actually fixes - they're
        features which needed a few last-minutes touchups.
      
        Otherwise, a rather large batch of fixes - ocfs2 review takes a while
        and I got distracted and missed last week's batch"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
        ocfs2/dlm: do not purge lockres that is queued for assert master
        ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
        ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod
        ocfs2: fix a tiny race when running dirop_fileop_racer
        ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list()
        ocfs2: refcount: take rw_lock in ocfs2_reflink
        ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously"
        ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn
        ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename()
        mm: fix crashes from mbind() merging vmas
        checkpatch: reduce false positives when checking void function return statements
        ia64: arch/ia64/include/uapi/asm/fcntl.h needs personality.h
        DMA, CMA: fix possible memory leak
        slab: fix oops when reading /proc/slab_allocators
        shmem: fix faulting into a hole while it's punched
        mm: let mm_find_pmd fix buggy race with THP fault
        mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()
        kernel/watchdog.c: print traces for all cpus on lockup detection
        nmi: provide the option to issue an NMI back trace to every cpu but current
        Documentation/accounting/getdelays.c: add missing null-terminate after strncpy call
        ...
      04b5da4a
    • Xue jiufei's avatar
      ocfs2/dlm: do not purge lockres that is queued for assert master · ac4fef4d
      Xue jiufei authored
      When workqueue is delayed, it may occur that a lockres is purged while it
      is still queued for master assert.  it may trigger BUG() as follows.
      
      N1                                         N2
      dlm_get_lockres()
      ->dlm_do_master_requery
                                        is the master of lockres,
                                        so queue assert_master work
      
                                        dlm_thread() start running
                                        and purge the lockres
      
                                        dlm_assert_master_worker()
                                        send assert master message
                                        to other nodes
      receiving the assert_master
      message, set master to N2
      
      dlmlock_remote() send create_lock message to N2, but receive DLM_IVLOCKID,
      if it is RECOVERY lockres, it triggers the BUG().
      
      Another BUG() is triggered when N3 become the new master and send
      assert_master to N1, N1 will trigger the BUG() because owner doesn't
      match.  So we should not purge lockres when it is queued for assert
      master.
      Signed-off-by: default avatarjoyce.xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ac4fef4d
    • jiangyiwen's avatar
      ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount · b9aaac5a
      jiangyiwen authored
      The following case may lead to endless loop during umount.
      
      node A         node B               node C       node D
      umount volume,
      migrate lockres1
      to B
                                                       want to lock lockres1,
                                                       send
                                                       MASTER_REQUEST_MSG
                                                       to C
                                          init block mle
                     send
                     MIGRATE_REQUEST_MSG
                     to C
                                          find a block
                                          mle, and then
                                          return
                                          DLM_MIGRATE_RESPONSE_MASTERY_REF
                                          to B
                     set C in refmap
                                          umount successfully
                     try to umount, endless
                     loop occurs when migrate
                     lockres1 since C is in
                     refmap
      
      So we can fix this endless loop case by only returning
      DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
      MIGRATE_REQUEST_MSG.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarjiangyiwen <jiangyiwen@huawei.com>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Xue jiufei <xuejiufei@huawei.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b9aaac5a
    • jiangyiwen's avatar
      ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod · 595297a8
      jiangyiwen authored
      When the call to ocfs2_add_entry() failed in ocfs2_symlink() and
      ocfs2_mknod(), iput() will not be called during dput(dentry) because no
      d_instantiate(), and this will lead to umount hung.
      Signed-off-by: default avatarjiangyiwen <jiangyiwen@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      595297a8
    • Yiwen Jiang's avatar
      ocfs2: fix a tiny race when running dirop_fileop_racer · f7a14f32
      Yiwen Jiang authored
      When running dirop_fileop_racer we found a dead lock case.
      
      2 nodes, say Node A and Node B, mount the same ocfs2 volume.  Create
      /race/16/1 in the filesystem, and let the inode number of dir 16 is less
      than the inode number of dir race.
      
      Node A                            Node B
      mv /race/16/1 /race/
                                        right after Node A has got the
                                        EX mode of /race/16/, and tries to
                                        get EX mode of /race
                                        ls /race/16/
      
      In this case, Node A has got the EX mode of /race/16/, and wants to get EX
      mode of /race/.  Node B has got the PR mode of /race/, and wants to get
      the PR mode of /race/16/.  Since EX and PR are mutually exclusive, dead
      lock happens.
      
      This patch fixes this case by locking in ancestor order before trying
      inode number order.
      Signed-off-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Signed-off-by: default avatarJoseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f7a14f32
    • Xue jiufei's avatar
      ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list() · a270c6d3
      Xue jiufei authored
      When a lockres in purge list but is still in use, it should be moved to
      the tail of purge list.  dlm_thread will continue to check next lockres in
      purge list.  However, code list_move_tail(&dlm->purge_list,
      &lockres->purge) will do *no* movements, so dlm_thread will purge the same
      lockres in this loop again and again.  If it is in use for a long time,
      other lockres will not be processed.
      Signed-off-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
      Signed-off-by: default avatarjoyce.xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a270c6d3
    • Wengang Wang's avatar
      ocfs2: refcount: take rw_lock in ocfs2_reflink · 8a8ad1c2
      Wengang Wang authored
      This patch tries to fix this crash:
      
       #5 [ffff88003c1cd690] do_invalid_op at ffffffff810166d5
       #6 [ffff88003c1cd730] invalid_op at ffffffff8159b2de
          [exception RIP: ocfs2_direct_IO_get_blocks+359]
          RIP: ffffffffa05dfa27  RSP: ffff88003c1cd7e8  RFLAGS: 00010202
          RAX: 0000000000000000  RBX: ffff88003c1cdaa8  RCX: 0000000000000000
          RDX: 000000000000000c  RSI: ffff880027a95000  RDI: ffff88003c79b540
          RBP: ffff88003c1cd858   R8: 0000000000000000   R9: ffffffff815f6ba0
          R10: 00000000000001c9  R11: 00000000000001c9  R12: ffff88002d271500
          R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000001000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ffff88003c1cd860] do_direct_IO at ffffffff811cd31b
       #8 [ffff88003c1cd950] direct_IO_iovec at ffffffff811cde9c
       #9 [ffff88003c1cd9b0] do_blockdev_direct_IO at ffffffff811ce764
      #10 [ffff88003c1cdb80] __blockdev_direct_IO at ffffffff811ce7cc
      #11 [ffff88003c1cdbb0] ocfs2_direct_IO at ffffffffa05df756 [ocfs2]
      #12 [ffff88003c1cdbe0] generic_file_direct_write_iter at ffffffff8112f935
      #13 [ffff88003c1cdc40] ocfs2_file_write_iter at ffffffffa0600ccc [ocfs2]
      #14 [ffff88003c1cdd50] do_aio_write at ffffffff8119126c
      #15 [ffff88003c1cddc0] aio_rw_vect_retry at ffffffff811d9bb4
      #16 [ffff88003c1cddf0] aio_run_iocb at ffffffff811db880
      #17 [ffff88003c1cde30] io_submit_one at ffffffff811dc238
      #18 [ffff88003c1cde80] do_io_submit at ffffffff811dc437
      #19 [ffff88003c1cdf70] sys_io_submit at ffffffff811dc530
      #20 [ffff88003c1cdf80] system_call_fastpath at ffffffff8159a159
      
      It crashes at
              BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
      in ocfs2_direct_IO_get_blocks.
      
      ocfs2_direct_IO_get_blocks is expecting the OCFS2_EXT_REFCOUNTED be removed in
      ocfs2_prepare_inode_for_write() if it was there. But no cluster lock is taken
      during the time before (or inside) ocfs2_prepare_inode_for_write() and after
      ocfs2_direct_IO_get_blocks().
      
      It can happen in this case:
      
      Node A(which crashes)				Node B
      ------------------------                 ---------------------------
      ocfs2_file_aio_write
        ocfs2_prepare_inode_for_write
          ocfs2_inode_lock
          ...
          ocfs2_inode_unlock
        #no refcount found
      ....					ocfs2_reflink
                                                ocfs2_inode_lock
                                                ...
                                                ocfs2_inode_unlock
                                                #now, refcount flag set on extent
      
                                              ...
                                              flush change to disk
      
      ocfs2_direct_IO_get_blocks
        ocfs2_get_clusters
          #extent map miss
          #buffer_head miss
          read extents from disk
        found refcount flag on extent
        crash..
      
      Fix:
      Take rw_lock in ocfs2_reflink path
      Signed-off-by: default avatarWengang Wang <wen.gang.wang@oracle.com>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8a8ad1c2
    • Xue jiufei's avatar
      ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously" · b253bfd8
      Xue jiufei authored
      75f82eaa ("ocfs2: fix NULL pointer dereference when dismount and
      ocfs2rec simultaneously") may cause umount hang while shutting down
      truncate log.
      
      The situation is as followes:
      ocfs2_dismout_volume
      -> ocfs2_recovery_exit
        -> free osb->recovery_map
      -> ocfs2_truncate_shutdown
        -> lock global bitmap inode
          -> ocfs2_wait_for_recovery
                -> check whether osb->recovery_map->rm_used is zero
      
      Because osb->recovery_map is already freed, rm_used can be any other
      values, so it may yield umount hang.
      Signed-off-by: default avatarjoyce.xue <xuejiufei@huawei.com>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b253bfd8
    • Tariq Saeed's avatar
      ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and... · 27bf6305
      Tariq Saeed authored
      ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn
      
      Orabug: 18639535
      
      Two node cluster and both nodes hold a lock at PR level and both want to
      convert to EX at the same time.  Master node 1 has sent BAST and then
      closes the connection due to idletime out.  Node 0 receives BAST, sends
      unlock req with cancel flag but gets error -ENOTCONN.  The problem is
      this error is ignored in dlm_send_remote_unlock_request() on the
      **incorrect** assumption that the master is dead.  See NOTE in comment
      why it returns DLM_NORMAL.  Upon getting DLM_NORMAL, node 0 proceeds to
      sends convert (without cancel flg) which fails with -ENOTCONN.  waits 5
      sec and resends.
      
      This time gets DLM_IVLOCKID from the master since lock not found in
      grant, it had been moved to converting queue in response to conv PR->EX
      req.  No way out.
      
      Node 1 (master)				Node 0
      ==============				======
      
        lock mode PR				PR
      
        convert PR -> EX
        mv grant -> convert and que BAST
        ...
                           <-------- convert PR -> EX
        convert que looks like this: ((node 1, PR -> EX) (node 0, PR -> EX))
        ...
                              BAST (want PR -> NL)
                           ------------------>
        ...
        idle timout, conn closed
                                      ...
                                      In response to BAST,
                                      sends unlock with cancel convert flag
                                      gets -ENOTCONN. Ignores and
                                      sends remote convert request
                                      gets -ENOTCONN, waits 5 Sec, retries
        ...
        reconnects
                         <----------------- convert req goes through on next try
        does not find lock on grant que
                         status DLM_IVLOCKID
                         ------------------>
        ...
      
      No way out.  Fix is to keep retrying unlock with cancel flag until it
      succeeds or the master dies.
      Signed-off-by: default avatarTariq Saeed <tariq.x.saeed@oracle.com>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      27bf6305
    • alex chen's avatar
      ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename() · 5fb1beb0
      alex chen authored
      There are two files a and b in dir /mnt/ocfs2.
      
          node A                           node B
      
        mv a b
        In ocfs2_rename(), after calling
        ocfs2_orphan_add(), the inode of
        file b will be added into orphan
        dir.
      
        If ocfs2_update_entry() fails,
        ocfs2_rename return error and mv
        operation fails. But file b still
        exists in the parent dir.
      
        ocfs2_queue_orphan_scan
         -> ocfs2_queue_recovery_completion
         -> ocfs2_complete_recovery
         -> ocfs2_recover_orphans
        The inode of the file b will be
        put with iput().
      
        ocfs2_evict_inode
         -> ocfs2_delete_inode
         -> ocfs2_wipe_inode
         -> ocfs2_remove_inode
        OCFS2_VALID_FL in the inode
        i_flags will be cleared.
      
                                         The file b still can be accessed
                                         on node B.
                                         ls /mnt/ocfs2
                                         When first read the file b with
                                         ocfs2_read_inode_block(). It will
                                         validate the inode using
                                         ocfs2_validate_inode_block().
                                         Because OCFS2_VALID_FL not set in
                                         the inode i_flags, so the file
                                         system will be readonly.
      
      So we should add inode into orphan dir after updating entry in
      ocfs2_rename().
      Signed-off-by: default avataralex.chen <alex.chen@huawei.com>
      Reviewed-by: default avatarMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5fb1beb0
    • Hugh Dickins's avatar
      mm: fix crashes from mbind() merging vmas · d05f0cdc
      Hugh Dickins authored
      In v2.6.34 commit 9d8cebd4 ("mm: fix mbind vma merge problem")
      introduced vma merging to mbind(), but it should have also changed the
      convention of passing start vma from queue_pages_range() (formerly
      check_range()) to new_vma_page(): vma merging may have already freed
      that structure, resulting in BUG at mm/mempolicy.c:1738 and probably
      worse crashes.
      
      Fixes: 9d8cebd4 ("mm: fix mbind vma merge problem")
      Reported-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Tested-by: default avatarNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarChristoph Lameter <cl@linux.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: <stable@vger.kernel.org>	[2.6.34+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      d05f0cdc
    • Joe Perches's avatar
      checkpatch: reduce false positives when checking void function return statements · b43ae21b
      Joe Perches authored
      The previous patch had a few too many false positives on styles that
      should be acceptable.
      Signed-off-by: default avatarJoe Perches <joe@perches.com>
      Tested-by: default avatarAnish Bhatt <anish@chelsio.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b43ae21b
    • Andrew Morton's avatar
      ia64: arch/ia64/include/uapi/asm/fcntl.h needs personality.h · f9af420f
      Andrew Morton authored
      fs/notify/fanotify/fanotify_user.c: In function 'SYSC_fanotify_init':
      fs/notify/fanotify/fanotify_user.c:726: error: implicit declaration of function 'personality'
      fs/notify/fanotify/fanotify_user.c:726: error: 'PER_LINUX32' undeclared (first use in this function)
      fs/notify/fanotify/fanotify_user.c:726: error: (Each undeclared identifier is reported only once
      fs/notify/fanotify/fanotify_user.c:726: error: for each function it appears in.)
      Reported-by: default avatarWu Fengguang <fengguang.wu@intel.com>
      Cc: Will Woods <wwoods@redhat.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>	[3.15.x]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f9af420f
    • Joonsoo Kim's avatar
      DMA, CMA: fix possible memory leak · fe8eea4f
      Joonsoo Kim authored
      We should free memory for bitmap when we find zone mismatch, otherwise
      this memory will leak.
      
      Additionally, I copy code comment from PPC KVM's CMA code to inform why
      we need to check zone mis-match.
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: default avatarZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Reviewed-by: default avatarMichal Nazarewicz <mina86@mina86.com>
      Reviewed-by: default avatarAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fe8eea4f
    • Joonsoo Kim's avatar
      slab: fix oops when reading /proc/slab_allocators · 03787301
      Joonsoo Kim authored
      Commit b1cb0982 ("change the management method of free objects of
      the slab") introduced a bug on slab leak detector
      ('/proc/slab_allocators').  This detector works like as following
      decription.
      
       1. traverse all objects on all the slabs.
       2. determine whether it is active or not.
       3. if active, print who allocate this object.
      
      but that commit changed the way how to manage free objects, so the logic
      determining whether it is active or not is also changed.  In before, we
      regard object in cpu caches as inactive one, but, with this commit, we
      mistakenly regard object in cpu caches as active one.
      
      This intoduces kernel oops if DEBUG_PAGEALLOC is enabled.  If
      DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who
      corrupt free memory in the slab.  It unmaps page table mapping if object
      is free and map it if object is active.  When slab leak detector check
      object in cpu caches, it mistakenly think this object active so try to
      access object memory to retrieve caller of allocation.  At this point,
      page table mapping to this object doesn't exist, so oops occurs.
      
      Following is oops message reported from Dave.
      
      It blew up when something tried to read /proc/slab_allocators
      (Just cat it, and you should see the oops below)
      
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in:
        [snip...]
        CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131
        task: ffff8801aa46e890 ti: ffff880076924000 task.ti: ffff880076924000
        RIP: 0010:[<ffffffffaa1a8f4a>]  [<ffffffffaa1a8f4a>] handle_slab+0x8a/0x180
        RSP: 0018:ffff880076925de0  EFLAGS: 00010002
        RAX: 0000000000001000 RBX: 0000000000000000 RCX: 000000005ce85ce7
        RDX: ffffea00079be100 RSI: 0000000000001000 RDI: ffff880107458000
        RBP: ffff880076925e18 R08: 0000000000000001 R09: 0000000000000000
        R10: 0000000000000000 R11: 000000000000000f R12: ffff8801e6f84000
        R13: ffffea00079be100 R14: ffff880107458000 R15: ffff88022bb8d2c0
        FS:  00007fb769e45740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: ffff8801e6f84ff8 CR3: 00000000a22db000 CR4: 00000000001407e0
        DR0: 0000000002695000 DR1: 0000000002695000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602
        Call Trace:
          leaks_show+0xce/0x240
          seq_read+0x28e/0x490
          proc_reg_read+0x3d/0x80
          vfs_read+0x9b/0x160
          SyS_read+0x58/0xb0
          tracesys+0xd4/0xd9
        Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46
        RIP   handle_slab+0x8a/0x180
      
      To fix the problem, I introduce an object status buffer on each slab.
      With this, we can track object status precisely, so slab leak detector
      would not access active object and no kernel oops would occur.  Memory
      overhead caused by this fix is only imposed to CONFIG_DEBUG_SLAB_LEAK
      which is mainly used for debugging, so memory overhead isn't big
      problem.
      Signed-off-by: default avatarJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Reported-by: default avatarTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: default avatarVladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      03787301
    • Hugh Dickins's avatar
      shmem: fix faulting into a hole while it's punched · f00cdc6d
      Hugh Dickins authored
      Trinity finds that mmap access to a hole while it's punched from shmem
      can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
      from completing, until the reader chooses to stop; with the puncher's
      hold on i_mutex locking out all other writers until it can complete.
      
      It appears that the tmpfs fault path is too light in comparison with its
      hole-punching path, lacking an i_data_sem to obstruct it; but we don't
      want to slow down the common case.
      
      Extend shmem_fallocate()'s existing range notification mechanism, so
      shmem_fault() can refrain from faulting pages into the hole while it's
      punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
      faulting when not).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Tested-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f00cdc6d