1. 03 Apr, 2014 33 commits
    • ocfs2: add dlm_recover_callback_support in sysfs · 765aabbb
      Goldwyn Rodrigues authored
      This is part of the nocontrold feature, which was incorporated some time
      ago.
      
      This is required for backward compatibility of the tools, specifically
      for the scenario where tools with recovery callback support are used
      with a kernel that does not use the recovery callbacks (older kernel +
      newer tools).  The tools look for this file to determine whether the
      kernel supports DLM recovery callbacks.
      
      For kernels which support recovery callbacks but lack this patch,
      ocfs2 will continue to use the older API and will still be able to
      mount the filesystem.
      
      [akpm@linux-foundation.org: simplify]
      [sfr@canb.auug.org.au: VERIFY_OCTAL_PERMISSIONS fix up]
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: dlm: fix recovery hung · ded2cf71
      Junxiao Bi authored
      There is a race window in dlm_do_recovery() between dlm_remaster_locks()
      and dlm_reset_recovery() when the recovery master has nearly finished
      the recovery process for a dead node.  After the master sends the
      FINALIZE_RECO message in dlm_remaster_locks(), another node may become
      the recovery master for another dead node and then send the BEGIN_RECO
      message to all the nodes, including the old master.  In the old master's
      handler for this message, dlm_begin_reco_handler(), dlm->reco.dead_node
      and dlm->reco.new_master will be set to the second dead node and the new
      master; then, in dlm_reset_recovery(), these two variables will be reset
      to their default values.  As a result, the new recovery master cannot
      finish the recovery process and hangs, and eventually the whole cluster
      hangs in recovery.
      
      old recovery master:                                 new recovery master:
      dlm_remaster_locks()
                                                        become recovery master for
                                                        another dead node.
                                                        dlm_send_begin_reco_message()
      dlm_begin_reco_handler()
      {
       if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
        return -EAGAIN;
       }
       dlm_set_reco_master(dlm, br->node_idx);
       dlm_set_reco_dead_node(dlm, br->dead_node);
      }
      dlm_reset_recovery()
      {
       dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
       dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
      }
                                                        will hang in dlm_remaster_locks() for
                                                        request dlm locks info
      
      Before sending the FINALIZE_RECO message, the recovery master should set
      DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery is
      done.  This closes the race window, as BEGIN_RECO messages will not be
      handled before the DLM_RECO_STATE_FINALIZE flag is cleared.
      
      A similar race may happen between the new recovery master and a normal
      node which is in dlm_finalize_reco_handler(); fix that as well.
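      
      The flag check described above can be illustrated with a minimal
      userspace sketch.  This is not the actual patch: the demo_* names and
      the constant values are hypothetical stand-ins, and the real code
      runs under the dlm spinlock, which is omitted here.

```c
#include <errno.h>

/* Hypothetical stand-ins for O2NM_INVALID_NODE_NUM and DLM_RECO_STATE_FINALIZE. */
#define DEMO_INVALID_NODE        255
#define DEMO_RECO_STATE_FINALIZE 0x2u

struct demo_reco {
	unsigned int state;
	int new_master;
	int dead_node;
};

/* BEGIN_RECO handler: refuse the message while this node is finalizing. */
static int demo_begin_reco_handler(struct demo_reco *r, int master, int dead)
{
	if (r->state & DEMO_RECO_STATE_FINALIZE)
		return -EAGAIN;	/* the sender is expected to retry later */
	r->new_master = master;
	r->dead_node = dead;
	return 0;
}

/* Old master: set the flag before FINALIZE_RECO would be sent. */
static void demo_finalize_begin(struct demo_reco *r)
{
	r->state |= DEMO_RECO_STATE_FINALIZE;
}

/* Old master: reset the recovery state, then reopen the window. */
static void demo_finalize_end(struct demo_reco *r)
{
	r->dead_node = DEMO_INVALID_NODE;
	r->new_master = DEMO_INVALID_NODE;
	r->state &= ~DEMO_RECO_STATE_FINALIZE;
}
```

      While the flag is set, a BEGIN_RECO for a second dead node is
      rejected, so the later reset cannot wipe out values written by the
      new master.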
      
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: dlm: fix lock migration crash · 34aa8dac
      Junxiao Bi authored
      This issue was introduced by commit 800deef3 ("ocfs2: use
      list_for_each_entry where benefical") in 2007, which replaced
      list_for_each with list_for_each_entry.  The variable "lock" will point
      to invalid data if the "tmpq" list is empty, and a panic will be
      triggered as a result.  Sunil advised reverting it, but the old version
      was also not right.  At the end of the outer for loop, that
      list_for_each_entry will also set "lock" to invalid data; then, in the
      next iteration, if the "tmpq" list is empty, "lock" will be stale,
      invalid data and cause the panic.  So revert to list_for_each and reset
      "lock" to NULL to fix this issue.
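      
      The difference between the two iteration styles can be shown with a
      small userspace sketch.  It is an illustration only, not the patch
      itself: struct demo_lock and the demo_* helpers are hypothetical, and
      the list type is a simplified stand-in for the kernel's list_head.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical lock type; only the embedded list node matters here. */
struct demo_lock {
	int node;
	struct list_head list;
};

/* Map a list node back to its containing demo_lock. */
#define demo_entry(ptr) \
	((struct demo_lock *)((char *)(ptr) - offsetof(struct demo_lock, list)))

static void demo_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void demo_list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/*
 * Entry-style iteration: when the loop ends (or never runs because the
 * list is empty), the cursor is computed from the list head itself, so it
 * is non-NULL but does not point at a real demo_lock.
 */
static struct demo_lock *demo_cursor_after_entry_loop(struct list_head *head)
{
	struct demo_lock *lock;

	for (lock = demo_entry(head->next); &lock->list != head;
	     lock = demo_entry(lock->list.next))
		;
	return lock;
}

/*
 * Node-style iteration with the result starting out as NULL: the caller
 * sees NULL unless a matching lock was really found.
 */
static struct demo_lock *demo_find_lock(struct list_head *head, int node)
{
	struct demo_lock *lock = NULL;
	struct list_head *iter;

	for (iter = head->next; iter != head; iter = iter->next) {
		struct demo_lock *cur = demo_entry(iter);

		if (cur->node == node) {
			lock = cur;
			break;
		}
	}
	return lock;
}
```

      With an empty queue, demo_cursor_after_entry_loop() hands back a
      non-NULL pointer derived from the head, which is why an
      "if (lock) break;" test cannot detect the empty case, while
      demo_find_lock() returns NULL.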
      
      Another concern is that this seems impossible, because the "tmpq"
      list should not be empty.  Let me describe how it can happen.
      
      old lock resource owner (node 1):                                 migration target (node 2):
      imagine there's a lockres with an EX lock from node 2 in the
      granted list and an NR lock from node x with convert_type
      EX in the converting list.
      dlm_empty_lockres() {
       dlm_pick_migration_target() {
         pick node 2 as target as its lock is the first one
         in granted list.
       }
       dlm_migrate_lockres() {
         dlm_mark_lockres_migrating() {
           res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
           wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
      	 //after the above code, we can not dirty lockres any more,
           // so dlm_thread shuffle list will not run
                                                                         downconvert lock from EX to NR
                                                                         upconvert lock from NR to EX
      <<< migration may schedule out here, then
      <<< node 2 sends a down-convert request to convert the type from EX to
      <<< NR, then sends an up-convert request to convert the type from NR to
      <<< EX.  At this time, the lockres granted list is empty and there are
      <<< two locks in the converting list: node x's up-convert lock followed
      <<< by node 2's up-convert lock.
      
      	 // will set lockres RES_MIGRATING flag, the following
      	 // lock/unlock can not run
           dlm_lockres_release_ast(dlm, res);
         }
      
         dlm_send_one_lockres()
                                                                       dlm_process_recovery_data()
                                                                         for (i=0; i<mres->num_locks; i++)
                                                                           if (ml->node == dlm->node_num)
                                                                             for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                                              list_for_each_entry(lock, tmpq, list)
                                                                              if (lock) break; <<< lock is invalid as grant list is empty.
                                                                             }
                                                                             if (lock->ml.node != ml->node)
                                                                               BUG() >>> crash here
       }
      
      I observed the above lock status in a vmcore from an internal bug.
      
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file · 2931cdcb
      Darrick J. Wong authored
      Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
      transaction to complete.  This isn't terribly efficient, since sync_file
      really only needs to wait for the last transaction involving that inode
      to complete, and this doesn't require i_mutex.
      
      Therefore, implement the necessary bits to track the newest tid
      associated with an inode, and teach sync_file to wait for that instead
      of waiting for everything in the journal to commit.  Furthermore, only
      issue the flush request to the drive if jbd2 hasn't already done so.
      
      This also eliminates the deadlock between ocfs2_file_aio_write() and
      ocfs2_sync_file().  aio_write takes i_mutex then calls
      ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
      However, if that dio completion involves calling fsync, then we can get
      into trouble when some ocfs2_sync_file tries to take i_mutex.
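      
      The idea of waiting only for the inode's newest transaction can be
      sketched in userspace as below.  This is a simplified illustration
      under stated assumptions: the demo_* names are hypothetical, and the
      wraparound-safe comparison is written in the spirit of jbd2's
      transaction ID helpers rather than copied from them.

```c
#include <stdbool.h>

typedef unsigned int demo_tid_t;

/* Wraparound-safe "x is at or after y" for transaction IDs. */
static bool demo_tid_geq(demo_tid_t x, demo_tid_t y)
{
	int difference = (int)(x - y);

	return difference >= 0;
}

/* Hypothetical per-inode record of the newest transaction touching it. */
struct demo_inode {
	demo_tid_t sync_tid;
};

/* Called whenever the inode is modified inside the running transaction. */
static void demo_inode_touch(struct demo_inode *inode, demo_tid_t running_tid)
{
	inode->sync_tid = running_tid;
}

/*
 * fsync only has to wait if the inode's newest transaction has not been
 * committed yet; unrelated, newer transactions do not matter.
 */
static bool demo_fsync_must_wait(const struct demo_inode *inode,
				 demo_tid_t last_committed)
{
	return !demo_tid_geq(last_committed, inode->sync_tid);
}
```

      Because the decision depends only on two transaction IDs, no inode
      mutex is needed to make it, which is what removes the lock ordering
      problem with the aio write path.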
      
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: remove unused variable uuid_net_key in ocfs2_initialize_super · a75fe48c
      joyce.xue authored
      Variable uuid_net_key in ocfs2_initialize_super() is not used.  Clean it
      up.
      
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: change ip_unaligned_aio to of type mutex from atomit_t · c18ceab0
      Wengang Wang authored
      There is a problem that waitqueue_active() may check stale data and thus
      miss a wakeup of threads waiting on ip_unaligned_aio.
      
      The only valid values of ip_unaligned_aio are 0 and 1, so we can change
      it to be of type mutex, which avoids the above problem.  Another benefit
      is that a mutex, which works as a FIFO, is fairer than wake_up_all().
      
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix null pointer dereference when access dlm_state before launching dlm thread · 181a9a04
      Zongxun Wang authored
      When mounting an ocfs2 volume, the kernel first creates the file
      /sys/kernel/debug/o2dlm/<uuid>/dlm_state and then launches the dlm
      thread.  So the following sequence will cause a null pointer dereference:
      dlm_debug_init -> access file dlm_state, which calls dlm_state_print ->
      dlm_launch_thread
      
      Moving dlm_debug_init after dlm_launch_thread and
      dlm_launch_recovery_thread fixes this issue.
      
      Signed-off-by: Zongxun Wang <wangzongxun@huawei.com>
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sh: sh7757: switch RSPI clock to dev ID match · ba6e8b8f
      Geert Uytterhoeven authored
      Switch the RSPI MSTP clock on SH7757 from a con ID match to a dev ID
      match, so we can start looking it up using clk_get() with a NULL ID.
      
      Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
      Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch/sh/boards/board-sh7757lcr.c: fixup SDHI register size · f0767e89
      Kuninori Morimoto authored
      The sh7757lcr SDHI register size is 0x100.
      
      Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sh: don't pass saved userspace state to exception handlers · a3c19514
      Bobby Bingham authored
      The compiler is permitted to generate code which overwrites the
      parameters to a function.  If those parameters include the only saved
      copy we have of userspace's registers, we're in trouble.
      
      Signed-off-by: Bobby Bingham <koorogi@koorogi.info>
      Cc: Paul Mundt <paul.mundt@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sh: remove unused do_fpu_error · 7caf62de
      Bobby Bingham authored
      This does not appear to have been used since commit 74d99a5e ("sh:
      SH-2A FPU support") in 2007.
      
      Signed-off-by: Bobby Bingham <koorogi@koorogi.info>
      Cc: Paul Mundt <paul.mundt@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sh: push extra copy of r0-r2 for syscall parameters · abafe5d9
      Bobby Bingham authored
      When invoking syscall handlers on sh32, the saved userspace registers
      are at the top of the stack.  This seems to have been intentional, as it
      is an easy way to pass r0, r1, ...  to the handler as parameters 5, 6,
      ...
      
      It causes problems, however, because the compiler is allowed to generate
      code for a function which clobbers that function's own parameters.  For
      example, gcc generates the following code for clone:
      
          <SyS_clone>:
              mov.l   8c020714 <SyS_clone+0xc>,r1  ! 8c020540 <do_fork>
              mov.l   r7,@r15
              mov     r6,r7
              jmp     @r1
              mov     #0,r6
              nop
              .word 0x0540
              .word 0x8c02
      
      The `mov.l r7,@r15` clobbers the saved value of r0 passed from
      userspace.  For most system calls, this might not be a problem, because
      we'll be overwriting r0 with the return value anyway.  But in the case
      of clone, copy_thread will need the original value of r0 if the
      CLONE_SETTLS flag was specified.
      
      The first patch in this series fixes this issue for system calls by
      pushing an extra copy of r0-r2 onto the stack before invoking the
      handler.  We discard this copy before restoring the userspace registers,
      so it is not a problem if they are clobbered.
      
      Exception handlers also receive the userspace register values in a
      similar manner, and may hit the same problem.  The second patch removes
      the do_fpu_error handler, which looks susceptible to this problem and
      which, as far as I can tell, has not been used in some time.  The third
      patch addresses other exception handlers.
      
      This patch (of 3):
      
      The userspace registers are stored at the top of the stack when the
      syscall handler is invoked, which allows r0-r2 to act as parameters 5-7.
      Parameters passed on the stack may be clobbered by the syscall handler.
      The solution is to push an extra copy of the registers which might be
      used as syscall parameters to the stack, so that the authoritative set
      of saved register values does not get clobbered.
      
      A few system call handlers are also updated to get the userspace
      registers using current_pt_regs() instead of from the stack.
      
      Signed-off-by: Bobby Bingham <koorogi@koorogi.info>
      Cc: Paul Mundt <paul.mundt@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • score: remove unused CPU_SCORE7 Kconfig parameter · d0df04f7
      Michael Opdenacker authored
      This removes the CPU_SCORE7 Kconfig parameter, which is no longer used
      anywhere in the source code and Makefiles.
      
      Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • genksyms: fix typeof() handling · dc533240
      Jan Beulich authored
      Recent increased use of typeof() throughout the tree resulted in a
      number of symbols (25 in a typical distro config of ours) not getting a
      proper CRC calculated for them anymore, due to the parser in genksyms
      not coping with several of these uses (interestingly in the majority of
      [if not all] cases the problem is due to the use of typeof() in code
      preceding a certain export, not in the declaration/definition of the
      exported function/object itself; I wasn't able to find a way to address
      this more general parser shortcoming).
      
      The use of parameter_declaration is a little more relaxed than would be
      ideal (permitting not just a bare type specification, but also one with
      identifier), but since the same code is being passed through an actual
      compiler, there's no apparent risk of allowing through any broken code.
      
      Otoh using parameter_declaration instead of the ad hoc
      "decl_specifier_seq '*'" / "decl_specifier_seq" pair allows all types to
      be handled rather than just plain ones and pointers to plain ones.
      
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fanotify: move unrelated handling from copy_event_to_user() · d507816b
      Jan Kara authored
      Move code moving event structure to access_list from copy_event_to_user()
      to fanotify_read() where it is more logical (so that we can immediately
      see in the main loop that we either move the event to a different list
      or free it).  Also move special error handling for permission events
      from copy_event_to_user() to the main loop to have it in one place with
      error handling for normal events.  This makes copy_event_to_user()
      really only copy the event to user without any side effects.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fanotify: reorganize loop in fanotify_read() · d8aaab4f
      Jan Kara authored
      Swap the error / "read ok" branches in the main loop of fanotify_read().
      We will grow the "read ok" part in the next patch and this makes the
      indentation easier.  Also it is more common to have error conditions
      inside an 'if' instead of the fast path.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fanotify: convert access_mutex to spinlock · 9573f793
      Jan Kara authored
      access_mutex is used only to guard operations on access_list.  There's
      no need for sleeping within this lock so just make a spinlock out of it.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fanotify: use fanotify event structure for permission response processing · f083441b
      Jan Kara authored
      Currently, fanotify creates a new structure to track the fact that a
      permission event has been reported to userspace and someone is waiting
      for a response to it.  As event structures are now completely in the
      hands of each notification framework, we can use the event structure for
      this tracking instead of allocating a new structure.
      
      Since this makes the event structures for normal events and permission
      events even more different and the structures have different lifetime
      rules, we split them into two separate structures (where permission
      event structure contains the structure for a normal event).  This makes
      normal events 8 bytes smaller and the code a tad bit cleaner.
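      
      The "permission event contains a normal event" layout can be
      illustrated with a short userspace sketch.  The struct and function
      names below are hypothetical and the fields are reduced to a minimum;
      the sketch only shows the embedding technique, not fanotify's real
      definitions.

```c
#include <stddef.h>

/* Hypothetical normal event: carries only what every event needs. */
struct demo_event {
	unsigned int mask;
};

/*
 * Hypothetical permission event: embeds the normal event and adds the
 * response tracking that only permission events require.
 */
struct demo_perm_event {
	struct demo_event fae;
	int response;
};

/* Recover the enclosing permission event from its embedded normal event. */
static struct demo_perm_event *demo_to_perm_event(struct demo_event *event)
{
	return (struct demo_perm_event *)
		((char *)event - offsetof(struct demo_perm_event, fae));
}
```

      Code that handles both kinds can pass around a pointer to the
      embedded normal event, and only the permission path converts back
      to the larger structure, so normal events do not pay for the extra
      fields.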
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fanotify: remove useless bypass_perm check · 3298cf37
      Jan Kara authored
      The prepare_for_access_response() function checks whether
      group->fanotify_data.bypass_perm is set.  However this test can never be
      true because prepare_for_access_response() is called only from
      fanotify_read() which means fanotify group is alive with an active fd
      while bypass_perm is set from fanotify_release() when all file
      descriptors pointing to the group are closed and the group is going
      away.
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/freevxfs/vxfs_lookup.c: update function comment · ddae82d8
      Fabian Frederick authored
      nameidata was replaced by flags in commit 00cd8dd3 ("stop passing
      nameidata to ->lookup()").
      
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/cifs/cifsfs.c: add __init to cifs_init_inodecache() · 9ee108b2
      Fabian Frederick authored
      cifs_init_inodecache is only called by __init init_cifs.
      
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmemleak: change some global variables to int · 8910ae89
      Li Zefan authored
      They don't have to be atomic_t, because they are simple boolean toggles.
      
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmemleak: remove redundant code · 5f3bf19a
      Li Zefan authored
      Remove kmemleak_padding() and kmemleak_release().
      
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmemleak: allow freeing internal objects after kmemleak was disabled · c89da70c
      Li Zefan authored
      Currently if kmemleak is disabled, the kmemleak objects can never be
      freed, no matter if it's disabled by a user or due to fatal errors.
      
      Those objects can be a big waste of memory.
      
           OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
        1200264 1197433  99%    0.30K  46164       26    369312K kmemleak_object
      
      With this patch, after kmemleak was disabled you can reclaim memory
      with:
      
      	# echo clear > /sys/kernel/debug/kmemleak
      
      Also inform users about this with a printk.
      
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmemleak: free internal objects only if there're no leaks to be reported · dc9b3f42
      Li Zefan authored
      Currently, if you stop the kmemleak thread before disabling kmemleak,
      kmemleak objects will be freed and so you won't be able to check
      previously reported leaks.
      
      With this patch, kmemleak objects won't be freed if there're leaks that
      can be reported.
      
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: ensure locality of task_struct allocations · 81c98869
      Nishanth Aravamudan authored
      In the presence of memoryless nodes, numa_node_id() will return the
      current CPU's NUMA node, but that may not be where we expect to allocate
      memory from.  Instead, we should rely on the fallback code in the
      memory allocator itself, by using NUMA_NO_NODE.  Also, when calling
      kthread_create_on_node(), use the nearest node with memory to the cpu in
      question, rather than the node it is running on.
      
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Reviewed-by: Christoph Lameter <cl@linux.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bdi: avoid oops on device removal · 5acda9d1
      Jan Kara authored
      After commit 839a8e86 ("writeback: replace custom worker pool
      implementation with unbound workqueue"), when a device is removed while
      we are writing to it, we crash in bdi_writeback_workfn() ->
      set_worker_desc() because bdi->dev is NULL.
      
      This can happen because even though bdi_unregister() cancels all pending
      flushing work, nothing really prevents new ones from being queued from
      balance_dirty_pages() or other places.
      
      Fix the problem by clearing BDI_registered bit in bdi_unregister() and
      checking it before scheduling of any flushing work.
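      
      A minimal userspace sketch of the "check the registered bit before
      queueing" pattern follows.  It is only an illustration: the demo_*
      names and the flag value are hypothetical, and the real code uses
      atomic bit operations and a workqueue rather than a plain counter.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the BDI_registered state bit. */
#define DEMO_BDI_REGISTERED 0x1u

struct demo_bdi {
	unsigned int state;
	int queued_works;	/* models flushing work sitting in the queue */
};

/* Queue flushing work only while the device is still registered. */
static bool demo_bdi_queue_work(struct demo_bdi *bdi)
{
	if (!(bdi->state & DEMO_BDI_REGISTERED))
		return false;	/* device is going away: do not queue */
	bdi->queued_works++;
	return true;
}

/* Unregister: clear the bit first, then cancel whatever was pending. */
static void demo_bdi_unregister(struct demo_bdi *bdi)
{
	bdi->state &= ~DEMO_BDI_REGISTERED;
	bdi->queued_works = 0;
}
```

      Clearing the bit before cancelling means any later attempt to queue
      work is refused instead of racing with the teardown.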
      
      Fixes: 839a8e86
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Derek Basehore <dbasehore@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • backing_dev: fix hung task on sync · 6ca738d6
      Derek Basehore authored
      bdi_wakeup_thread_delayed() used the mod_delayed_work() function to
      schedule work to writeback dirty inodes.  The problem with this is that
      it can delay work that is scheduled for immediate execution, such as the
      work from sync_inodes_sb().  This can happen since mod_delayed_work()
      can now steal work from a work_queue.  This fixes the problem by using
      queue_delayed_work() instead.  This is a regression caused by commit
      839a8e86 ("writeback: replace custom worker pool implementation with
      unbound workqueue").
      
      The reason that this causes a problem is that laptop-mode will change
      the delay, dirty_writeback_centisecs, to 60000 (10 minutes) by default.
      In the case that bdi_wakeup_thread_delayed() races with
      sync_inodes_sb(), sync will be stopped for 10 minutes and trigger a hung
      task.  Even if dirty_writeback_centisecs is not long enough to cause a
      hung task, we still don't want to delay sync for that long.
      
      We fix the problem by using queue_delayed_work() when we want to
      schedule writeback sometime in the future.  This function doesn't change
      the timer if it is already armed.
      
      For the same reason, we also change bdi_writeback_workfn() to
      immediately queue the work again in the case that the work_list is not
      empty.  The same problem can happen if the sync work is run on the
      rescue worker.
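      
      The behavioral difference between the two calls can be modelled with
      a small userspace sketch.  The demo_* functions are hypothetical
      models of the semantics described above (one re-arms the timer
      unconditionally, the other leaves an armed timer alone); they are
      not the kernel implementations, and time is in abstract ticks.

```c
#include <stdbool.h>

/* Simplified model of a delayed work item with a single timer. */
struct demo_dwork {
	bool pending;
	unsigned long expires;
};

/*
 * Model of mod_delayed_work(): always (re)arms the timer, so an item
 * that was due immediately can be pushed far into the future.
 * Returns true if the item was already pending.
 */
static bool demo_mod_delayed_work(struct demo_dwork *w, unsigned long now,
				  unsigned long delay)
{
	bool was_pending = w->pending;

	w->pending = true;
	w->expires = now + delay;
	return was_pending;
}

/*
 * Model of queue_delayed_work(): does nothing if the item is already
 * pending, so an earlier expiry is preserved.
 * Returns true if the item was newly queued.
 */
static bool demo_queue_delayed_work(struct demo_dwork *w, unsigned long now,
				    unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}
```

      In this model, work queued for immediate execution by a sync keeps
      its expiry when a later periodic wakeup uses the queue variant, but
      is postponed by the full delay when the mod variant is used.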
      
      [jack@suse.cz: update changelog, add comment, use bdi_wakeup_thread_delayed()]
      Signed-off-by: default avatarDerek Basehore <dbasehore@chromium.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zento.linux.org.uk>
      Reviewed-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Derek Basehore <dbasehore@chromium.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Benson Leung <bleung@chromium.org>
      Cc: Sonny Rao <sonnyrao@chromium.org>
      Cc: Luigi Semenzato <semenzato@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6ca738d6
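      The difference between the two workqueue calls described in this
      commit can be sketched with a toy userspace model. This is not kernel
      code: the struct and the toy_* functions are hypothetical stand-ins
      that only mimic the documented behaviour (queue_delayed_work() leaves
      an already pending item untouched, mod_delayed_work() re-arms its
      timer), and time is measured in abstract ticks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a delayed work item: a pending flag and an expiry time. */
struct toy_delayed_work {
    bool pending;           /* true while the item is queued or its timer is armed */
    unsigned long expires;  /* absolute time, in ticks, at which it should run */
};

/*
 * Models queue_delayed_work(): if the item is already pending, leave its
 * timer alone and return false; otherwise arm it for now + delay.
 */
bool toy_queue_delayed_work(struct toy_delayed_work *w,
                            unsigned long now, unsigned long delay)
{
    if (w->pending)
        return false;
    w->pending = true;
    w->expires = now + delay;
    return true;
}

/*
 * Models mod_delayed_work(): always (re)arm the timer to now + delay,
 * even if the item was already pending. Returns whether it was pending.
 */
bool toy_mod_delayed_work(struct toy_delayed_work *w,
                          unsigned long now, unsigned long delay)
{
    bool was_pending = w->pending;

    w->pending = true;
    w->expires = now + delay;
    return was_pending;
}

/* Walks through the race from the changelog using the toy model. */
void toy_delayed_work_example(void)
{
    struct toy_delayed_work w = { false, 0 };

    /* sync queues the work for immediate execution at tick 100. */
    assert(toy_queue_delayed_work(&w, 100, 0));

    /* A delayed wakeup that uses the queue variant does not postpone it. */
    assert(!toy_queue_delayed_work(&w, 100, 60000));
    assert(w.expires == 100);

    /* The mod variant pushes the already pending work far into the future. */
    assert(toy_mod_delayed_work(&w, 100, 60000));
    assert(w.expires == 60100);
}
```

      In this model the last step is the regression: work that sync wanted
      to run immediately ends up waiting for the full writeback delay.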
    • Matt Fleming's avatar
      sh: fix format string bug in stack tracer · a0c32761
      Matt Fleming authored
      Kees reported the following error:
      
         arch/sh/kernel/dumpstack.c: In function 'print_trace_address':
         arch/sh/kernel/dumpstack.c:118:2: error: format not a string literal and no format arguments [-Werror=format-security]
      
      Use the "%s" format so that it's impossible to interpret 'data' as a
      format string.
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Reported-by: Kees Cook <keescook@chromium.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0c32761
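      The "%s" pattern named in this commit can be illustrated in userspace
      C. The helper format_label() below is hypothetical and uses snprintf()
      in place of the kernel's printk(); it is a sketch of the general
      technique, not the code from arch/sh/kernel/dumpstack.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copies `data` into `out` as plain text.
 * The literal "%s" is the format string; `data` is only ever an
 * argument, so any '%' characters inside it are never interpreted.
 */
int format_label(char *out, size_t size, const char *data)
{
    return snprintf(out, size, "%s", data);
}

/* Shows that conversion specifiers inside the data are copied verbatim. */
void format_label_example(void)
{
    char buf[32];
    int n = format_label(buf, sizeof(buf), "100%d done");

    assert(n == 10);
    assert(strcmp(buf, "100%d done") == 0);
}
```

      Passing the data directly as the format argument would instead make
      the "%d" consume a nonexistent argument, which is the class of bug
      that -Werror=format-security rejects at build time.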
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 59ecc260
      Linus Torvalds authored
      Pull crypto updates from Herbert Xu:
       "Here is the crypto update for 3.15:
         - Added 3DES driver for OMAP4/AM43xx
         - Added AVX2 acceleration for SHA
         - Added hash-only AEAD algorithms in caam
         - Removed tegra driver as it is not functioning and the hardware is
           too slow
         - Allow blkcipher walks over AEAD (needed for ARM)
         - Fixed unprotected FPU/SSE access in ghash-clmulni-intel
         - Fixed highmem crash in omap-sham
         - Add (zero entropy) randomness when initialising hardware RNGs
         - Fixed unaligned ahash completion functions
         - Added soft module dependency for crc32c for initrds that use crc32c"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (60 commits)
        crypto: ghash-clmulni-intel - use C implementation for setkey()
        crypto: x86/sha1 - reduce size of the AVX2 asm implementation
        crypto: x86/sha1 - fix stack alignment of AVX2 variant
        crypto: x86/sha1 - re-enable the AVX variant
        crypto: sha - SHA1 transform x86_64 AVX2
        crypto: crypto_wq - Fix late crypto work queue initialization
        crypto: caam - add missing key_dma unmap
        crypto: caam - add support for aead null encryption
        crypto: testmgr - add aead null encryption test vectors
        crypto: export NULL algorithms defines
        crypto: caam - remove error propagation handling
        crypto: hash - Simplify the ahash_finup implementation
        crypto: hash - Pull out the functions to save/restore request
        crypto: hash - Fix the pointer voodoo in unaligned ahash
        crypto: caam - Fix first parameter to caam_init_rng
        crypto: omap-sham - Map SG pages if they are HIGHMEM before accessing
        crypto: caam - Dynamic memory allocation for caam_rng_ctx object
        crypto: allow blkcipher walks over AEAD data
        crypto: remove direct blkcipher_walk dependency on transform
        hwrng: add randomness to system from rng sources
        ...
      59ecc260
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · bea80318
      Linus Torvalds authored
      Pull security subsystem updates from James Morris:
       "Apart from reordering the SELinux mmap code to ensure DAC is called
        before MAC, these are minor maintenance updates"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
        selinux: correctly label /proc inodes in use before the policy is loaded
        selinux: put the mmap() DAC controls before the MAC controls
        selinux: fix the output of ./scripts/get_maintainer.pl for SELinux
        evm: enable key retention service automatically
        ima: skip memory allocation for empty files
        evm: EVM does not use MD5
        ima: return d_name.name if d_path fails
        integrity: fix checkpatch errors
        ima: fix erroneous removal of security.ima xattr
        security: integrity: Use a more current logging style
        MAINTAINERS: email updates and other misc. changes
        ima: reduce memory usage when a template containing the n field is used
        ima: restore the original behavior for sending data with ima template
        Integrity: Pass commname via get_task_comm()
        fs: move i_readcount
        ima: use static const char array definitions
        security: have cap_dentry_init_security return error
        ima: new helper: file_inode(file)
        kernel: Mark function as static in kernel/seccomp.c
        capability: Use current logging styles
        ...
      bea80318
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · cd6362be
      Linus Torvalds authored
      Pull networking updates from David Miller:
       "Here is my initial pull request for the networking subsystem during
        this merge window:
      
         1) Support for ESN in AH (RFC 4302) from Fan Du.
      
         2) Add full kernel doc for ethtool command structures, from Ben
            Hutchings.
      
         3) Add BCM7xxx PHY driver, from Florian Fainelli.
      
         4) Export computed TCP rate information in netlink socket dumps, from
            Eric Dumazet.
      
         5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
            Dichtel.
      
         6) Convert many drivers to pci_enable_msix_range(), from Alexander
            Gordeev.
      
         7) Record SKB timestamps more efficiently, from Eric Dumazet.
      
         8) Switch to microsecond resolution for TCP round trip times, also
            from Eric Dumazet.
      
         9) Clean up and fix 6lowpan fragmentation handling by making use of
            the existing inet_frag api for its implementation.
      
        10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.
      
        11) Auto size SKB lengths when composing netlink messages based upon
            past message sizes used, from Eric Dumazet.
      
        12) qdisc dumps can take a long time, add a cond_resched(), from Eric
            Dumazet.
      
        13) Sanitize netpoll core and drivers wrt.  SKB handling semantics.
            Get rid of never-used-in-tree netpoll RX handling.  From Eric W
            Biederman.
      
        14) Support inter-address-family and namespace changing in VTI tunnel
            driver(s).  From Steffen Klassert.
      
        15) Add Altera TSE driver, from Vince Bridgers.
      
        16) Optimizing csum_replace2() so that it doesn't adjust the checksum
            by checksumming the entire header, from Eric Dumazet.
      
        17) Expand BPF internal implementation for faster interpreting, more
            direct translations into JIT'd code, and much cleaner uses of BPF
            filtering in non-socket contexts.  From Daniel Borkmann and Alexei
            Starovoitov"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
        netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
        net: Add a test to see if a skb is freeable in irq context
        qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
        net: ptp: move PTP classifier in its own file
        net: sxgbe: make "core_ops" static
        net: sxgbe: fix logical vs bitwise operation
        net: sxgbe: sxgbe_mdio_register() frees the bus
        Call efx_set_channels() before efx->type->dimension_resources()
        xen-netback: disable rogue vif in kthread context
        net/mlx4: Set proper build dependancy with vxlan
        be2net: fix build dependency on VxLAN
        mac802154: make csma/cca parameters per-wpan
        mac802154: allow only one WPAN to be up at any given time
        net: filter: minor: fix kdoc in __sk_run_filter
        netlink: don't compare the nul-termination in nla_strcmp
        can: c_can: Avoid led toggling for every packet.
        can: c_can: Simplify TX interrupt cleanup
        can: c_can: Store dlc private
        can: c_can: Reduce register access
        can: c_can: Make the code readable
        ...
      cd6362be
  2. 02 Apr, 2014 7 commits
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · 0f1b1e6d
      Linus Torvalds authored
      Pull HID updates from Jiri Kosina:
       - substantial cleanup of the generic and transport layers, in the
         direction of an ultimate goal of making struct hid_device completely
         transport independent, by Benjamin Tissoires
       - cp2112 driver from David Barksdale
       - a lot of fixes and new hardware support (Dualshock 4) to hid-sony
         driver, by Frank Praznik
       - support for Win 8.1 multitouch protocol by Andrew Duggan
       - other smaller fixes / device ID additions
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (75 commits)
        HID: sony: fix force feedback mismerge
        HID: sony: Set the quriks flag for Bluetooth controllers
        HID: sony: Fix Sixaxis cable state detection
        HID: uhid: Add UHID_CREATE2 + UHID_INPUT2
        HID: hyperv: fix _raw_request() prototype
        HID: hyperv: Implement a stub raw_request() entry point
        HID: hid-sensor-hub: fix sleeping function called from invalid context
        HID: multitouch: add support for Win 8.1 multitouch touchpads
        HID: remove hid_output_raw_report transport implementations
        HID: sony: do not rely on hid_output_raw_report
        HID: cp2112: remove the last hid_output_raw_report() call
        HID: cp2112: remove various hid_out_raw_report calls
        HID: multitouch: add support of other generic collections in hid-mt
        HID: multitouch: remove pen special handling
        HID: multitouch: remove registered devices with default behavior
        HID: hidp: Add a comment that some devices depend on the current behavior of uniq
        HID: sony: Prevent duplicate controller connections.
        HID: sony: Perform a boundry check on the sixaxis battery level index.
        HID: sony: Fix work queue issues
        HID: sony: Fix multi-line comment styling
        ...
      0f1b1e6d
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial · 159d8133
      Linus Torvalds authored
      Pull trivial tree updates from Jiri Kosina:
       "Usual rocket science -- mostly documentation and comment updates"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
        sparse: fix comment
        doc: fix double words
        isdn: capi: fix "CAPI_VERSION" comment
        doc: DocBook: Fix typos in xml and template file
        Bluetooth: add module name for btwilink
        driver core: unexport static function create_syslog_header
        mmc: core: typo fix in printk specifier
        ARM: spear: clean up editing mistake
        net-sysfs: fix comment typo 'CONFIG_SYFS'
        doc: Insert MODULE_ in module-signing macros
        Documentation: update URL to hfsplus Technote 1150
        gpio: update path to documentation
        ixgbe: Fix format string in ixgbe_fcoe.
        Kconfig: Remove useless "default N" lines
        user_namespace.c: Remove duplicated word in comment
        CREDITS: fix formatting
        treewide: Fix typo in Documentation/DocBook
        mm: Fix warning on make htmldocs caused by slab.c
        ata: ata-samsung_cf: cleanup in header file
        idr: remove unused prototype of idr_free()
      159d8133
    • Linus Torvalds's avatar
      Merge branch 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 05bf58ca
      Linus Torvalds authored
      Pull sched/idle changes from Ingo Molnar:
       "More idle code reorganization, to prepare for more integration.
      
        (Sent separately because it depended on pending timer work, which is
        now upstream)"
      
      * 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/idle: Add more comments to the code
        sched/idle: Move idle conditions in cpuidle_idle main function
        sched/idle: Reorganize the idle loop
        cpuidle/idle: Move the cpuidle_idle_call function to idle.c
        idle/cpuidle: Split cpuidle_idle_call main function into smaller functions
      05bf58ca
    • Oleg Nesterov's avatar
      pid_namespace: pidns_get() should check task_active_pid_ns() != NULL · d2308225
      Oleg Nesterov authored
      pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
      go away, but task_active_pid_ns(task) is NULL if release_task(task)
      was already called. Alternatively we could change get_pid_ns(ns) to
      check ns != NULL, but it seems that other callers are fine.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2308225
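      The NULL check described in this commit follows a common pattern: the
      caller that may see a cleared pointer tests it before taking a
      reference. The sketch below is a userspace model with hypothetical
      toy_* types, assuming only what the changelog states (the task
      outlives the call, but its namespace pointer may already be NULL).

```c
#include <assert.h>
#include <stddef.h>

/* Toy namespace with a plain reference counter. */
struct toy_ns {
    int refcount;
};

/* Toy task: active_ns is set to NULL once the task has been released. */
struct toy_task {
    struct toy_ns *active_ns;
};

/*
 * Models the checked getter: take a reference only if the namespace
 * pointer is still set, and return NULL otherwise so the caller can
 * report failure instead of dereferencing a NULL pointer.
 */
struct toy_ns *toy_ns_get(struct toy_task *task)
{
    struct toy_ns *ns = task->active_ns;

    if (ns)
        ns->refcount++;
    return ns;
}

/* Exercises both the live and the already released case. */
void toy_ns_get_example(void)
{
    struct toy_ns ns = { 1 };
    struct toy_task live = { &ns };
    struct toy_task released = { NULL };

    assert(toy_ns_get(&live) == &ns);
    assert(ns.refcount == 2);
    assert(toy_ns_get(&released) == NULL);
    assert(ns.refcount == 2);
}
```

      Placing the check in the one caller that can observe NULL, rather than
      in the shared reference helper, mirrors the trade-off mentioned in the
      changelog: other callers are unaffected.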
    • Linus Torvalds's avatar
      Merge tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 7cbb39d4
      Linus Torvalds authored
      Pull kvm updates from Paolo Bonzini:
       "PPC and ARM do not have much going on this time.  Most of the cool
        stuff, instead, is in s390 and (after a few releases) x86.
      
        ARM has some caching fixes and PPC has transactional memory support in
        guests.  MIPS has some fixes, with more probably coming in 3.16 as
        QEMU will soon get support for MIPS KVM.
      
        For x86 there are optimizations for debug registers, which trigger on
        some Windows games, and other important fixes for Windows guests.  We
        now expose to the guest Broadwell instruction set extensions and also
        Intel MPX.  There's also a fix/workaround for OS X guests, nested
        virtualization features (preemption timer), and a couple kvmclock
        refinements.
      
        For s390, the main news is asynchronous page faults, together with
        improvements to IRQs (floating irqs and adapter irqs) that speed up
        virtio devices"
      
      * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
        KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
        KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
        KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
        KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
        KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
        KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
        KVM: PPC: Book3S HV: Add transactional memory support
        KVM: Specify byte order for KVM_EXIT_MMIO
        KVM: vmx: fix MPX detection
        KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
        KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
        KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
        KVM: s390: clear local interrupts at cpu initial reset
        KVM: s390: Fix possible memory leak in SIGP functions
        KVM: s390: fix calculation of idle_mask array size
        KVM: s390: randomize sca address
        KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
        KVM: Bump KVM_MAX_IRQ_ROUTES for s390
        KVM: s390: irq routing for adapter interrupts.
        KVM: s390: adapter interrupt sources
        ...
      7cbb39d4
    • Linus Torvalds's avatar
      Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · 64056a94
      Linus Torvalds authored
      Pull virtio updates from Rusty Russell:
       "Nothing exciting: virtio-blk users might see a bit of a boost from the
        doubling of the default queue length though"
      
      * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
        virtio-blk: base queue-depth on virtqueue ringsize or module param
        Revert a02bbb1c: MAINTAINERS: add virtio-dev ML for virtio
        virtio: fail adding buffer on broken queues.
        virtio-rng: don't crash if virtqueue is broken.
        virtio_balloon: don't crash if virtqueue is broken.
        virtio_blk: don't crash, report error if virtqueue is broken.
        virtio_net: don't crash if virtqueue is broken.
        virtio_balloon: don't softlockup on huge balloon changes.
        virtio: Use pci_enable_msix_exact() instead of pci_enable_msix()
        MAINTAINERS: virtio-dev is subscribers only
        tools/virtio: add a missing )
        tools/virtio: fix missing kmemleak_ignore symbol
        tools/virtio: update internal copies of headers
      64056a94
    • Linus Torvalds's avatar
      Merge branch 'for-3.15' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping · 7474043e
      Linus Torvalds authored
      Pull DMA-mapping updates from Marek Szyprowski:
       "This contains an extension for more efficient handling of IO address
        space in the dma-mapping subsystem for the ARM architecture"
      
      * 'for-3.15' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
        arm: dma-mapping: remove order parameter from arm_iommu_create_mapping()
        arm: dma-mapping: Add support to extend DMA IOMMU mappings
      7474043e