1. 24 Jan, 2013 16 commits
    • Tejun Heo's avatar
      workqueue: rename nr_running variables · e6e380ed
      Tejun Heo authored
      Rename per-cpu and unbound nr_running variables such that they match
      the pool variables.
      
      This patch doesn't introduce any functional changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      e6e380ed
    • Tejun Heo's avatar
      workqueue: remove global_cwq · a60dc39c
      Tejun Heo authored
      global_cwq is now nothing but a container for per-cpu standard
      worker_pools.  Declare the worker pools directly as
      cpu/unbound_std_worker_pools[] and remove global_cwq.
      
      * ____cacheline_aligned_in_smp moved from global_cwq to worker_pool.
        This probably would have made sense even before this change as we
        want each pool to be aligned.
      
      * get_gcwq() is replaced with std_worker_pools() which returns the
        pointer to the standard pool array for a given CPU.
      
      * __alloc_workqueue_key() updated to use get_std_worker_pool() instead
        of open-coding pool determination.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      
      v2: Joonsoo pointed out that it'd better to align struct worker_pool
          rather than the array so that every pool is aligned.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      a60dc39c
    • Tejun Heo's avatar
      workqueue: remove worker_pool->gcwq · 4e8f0a60
      Tejun Heo authored
      The only remaining user of pool->gcwq is std_worker_pool_pri().
      Reimplement it using get_gcwq() and remove worker_pool->gcwq.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      4e8f0a60
    • Tejun Heo's avatar
      workqueue: replace for_each_worker_pool() with for_each_std_worker_pool() · 38db41d9
      Tejun Heo authored
      for_each_std_worker_pool() takes @cpu instead of @gcwq.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      38db41d9
    • Tejun Heo's avatar
      workqueue: make freezing/thawing per-pool · a1056305
      Tejun Heo authored
      Instead of holding locks from both pools and then processing the pools
      together, make freezing/thwaing per-pool - grab locks of one pool,
      process it, release it and then proceed to the next pool.
      
      While this patch changes processing order across pools, order within
      each pool remains the same.  As each pool is independent, this
      shouldn't break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      a1056305
    • Tejun Heo's avatar
      workqueue: make hotplug processing per-pool · 94cf58bb
      Tejun Heo authored
      Instead of holding locks from both pools and then processing the pools
      together, make hotplug processing per-pool - grab locks of one pool,
      process it, release it and then proceed to the next pool.
      
      rebind_workers() is updated to take and process @pool instead of @gcwq
      which results in a lot of de-indentation.  gcwq_claim_assoc_and_lock()
      and its counterpart are replaced with in-line per-pool locking.
      
      While this patch changes processing order across pools, order within
      each pool remains the same.  As each pool is independent, this
      shouldn't break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      94cf58bb
    • Tejun Heo's avatar
      workqueue: move global_cwq->lock to worker_pool · d565ed63
      Tejun Heo authored
      Move gcwq->lock to pool->lock.  The conversion is mostly
      straight-forward.  Things worth noting are
      
      * In many places, this removes the need to use gcwq completely.  pool
        is used directly instead.  get_std_worker_pool() is added to help
        some of these conversions.  This also leaves get_work_gcwq() without
        any user.  Removed.
      
      * In hotplug and freezer paths, the pools belonging to a CPU are often
        processed together.  This patch makes those paths hold locks of all
        pools, with highpri lock nested inside, to keep the conversion
        straight-forward.  These nested lockings will be removed by
        following patches.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      d565ed63
    • Tejun Heo's avatar
      workqueue: move global_cwq->cpu to worker_pool · ec22ca5e
      Tejun Heo authored
      Move gcwq->cpu to pool->cpu.  This introduces a couple places where
      gcwq->pools[0].cpu is used.  These will soon go away as gcwq is
      further reduced.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      ec22ca5e
    • Tejun Heo's avatar
      workqueue: move busy_hash from global_cwq to worker_pool · c9e7cf27
      Tejun Heo authored
      There's no functional necessity for the two pools on the same CPU to
      share the busy hash table.  It's also likely to be a bottleneck when
      implementing pools with user-specified attributes.
      
      This patch makes busy_hash per-pool.  The conversion is mostly
      straight-forward.  Changes worth noting are,
      
      * Large block of changes in rebind_workers() is moving the block
        inside for_each_worker_pool() as now there are separate hash tables
        for each pool.  This changes the order of operations but doesn't
        break anything.
      
      * Thre for_each_worker_pool() loops in gcwq_unbind_fn() are combined
        into one.  This again changes the order of operaitons but doesn't
        break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      c9e7cf27
    • Tejun Heo's avatar
      workqueue: record pool ID instead of CPU in work->data when off-queue · 7c3eed5c
      Tejun Heo authored
      Currently, when a work item is off-queue, work->data records the CPU
      it was last on, which is used to locate the last executing instance
      for non-reentrance, flushing, etc.
      
      We're in the process of removing global_cwq and making worker_pool the
      top level abstraction.  This patch makes work->data point to the pool
      it was last associated with instead of CPU.
      
      After the previous WORK_OFFQ_POOL_CPU and worker_poo->id additions,
      the conversion is fairly straight-forward.  WORK_OFFQ constants and
      functions are modified to record and read back pool ID instead.
      worker_pool_by_id() is added to allow looking up pool from ID.
      get_work_pool() replaces get_work_gcwq(), which is reimplemented using
      get_work_pool().  get_work_pool_id() replaces work_cpu().
      
      This patch shouldn't introduce any observable behavior changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      7c3eed5c
    • Tejun Heo's avatar
      workqueue: add worker_pool->id · 9daf9e67
      Tejun Heo authored
      Add worker_pool->id which is allocated from worker_pool_idr.  This
      will be used to record the last associated worker_pool in work->data.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      9daf9e67
    • Tejun Heo's avatar
      workqueue: introduce WORK_OFFQ_CPU_NONE · 715b06b8
      Tejun Heo authored
      Currently, when a work item is off queue, high bits of its data
      encodes the last CPU it was on.  This is scheduled to be changed to
      pool ID, which will make it impossible to use WORK_CPU_NONE to
      indicate no association.
      
      This patch limits the number of bits which are used for off-queue cpu
      number to 31 (so that the max fits in an int) and uses the highest
      possible value - WORK_OFFQ_CPU_NONE - to indicate no association.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      715b06b8
    • Tejun Heo's avatar
      workqueue: make GCWQ_FREEZING a pool flag · 35b6bb63
      Tejun Heo authored
      Make GCWQ_FREEZING a pool flag POOL_FREEZING.  This patch doesn't
      change locking - FREEZING on both pools of a CPU are set or clear
      together while holding gcwq->lock.  It shouldn't cause any functional
      difference.
      
      This leaves gcwq->flags w/o any flags.  Removed.
      
      While at it, convert BUG_ON()s in freeze_workqueue_begin() and
      thaw_workqueues() to WARN_ON_ONCE().
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      35b6bb63
    • Tejun Heo's avatar
      workqueue: make GCWQ_DISASSOCIATED a pool flag · 24647570
      Tejun Heo authored
      Make GCWQ_DISASSOCIATED a pool flag POOL_DISASSOCIATED.  This patch
      doesn't change locking - DISASSOCIATED on both pools of a CPU are set
      or clear together while holding gcwq->lock.  It shouldn't cause any
      functional difference.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      24647570
    • Tejun Heo's avatar
      workqueue: use std_ prefix for the standard per-cpu pools · e34cdddb
      Tejun Heo authored
      There are currently two worker pools per cpu (including the unbound
      cpu) and they are the only pools in use.  New class of pools are
      scheduled to be added and some pool related APIs will be added
      inbetween.  Call the existing pools the standard pools and prefix them
      with std_.  Do this early so that new APIs can use std_ prefix from
      the beginning.
      
      This patch doesn't introduce any functional difference.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      e34cdddb
    • Tejun Heo's avatar
      workqueue: unexport work_cpu() · e2905b29
      Tejun Heo authored
      This function no longer has any external users.  Unexport it.  It will
      be removed later on.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reviewed-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      e2905b29
  2. 18 Jan, 2013 4 commits
    • Tejun Heo's avatar
      workqueue: implement current_is_async() · 84b233ad
      Tejun Heo authored
      This function queries whether %current is an async worker executing an
      async item.  This will be used to implement warning on synchronous
      request_module() from async workers.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      84b233ad
    • Tejun Heo's avatar
      workqueue: move struct worker definition to workqueue_internal.h · 2eaebdb3
      Tejun Heo authored
      This will be used to implement an inline function to query whether
      %current is a workqueue worker and, if so, allow determining which
      work item it's executing.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      2eaebdb3
    • Tejun Heo's avatar
      workqueue: rename kernel/workqueue_sched.h to kernel/workqueue_internal.h · ea138446
      Tejun Heo authored
      Workqueue wants to expose more interface internal to kernel/.  Instead
      of adding a new header file, repurpose kernel/workqueue_sched.h.
      Rename it to workqueue_internal.h and add include protector.
      
      This patch doesn't introduce any functional changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      ea138446
    • Tejun Heo's avatar
      workqueue: set PF_WQ_WORKER on rescuers · 111c225a
      Tejun Heo authored
      PF_WQ_WORKER is used to tell scheduler that the task is a workqueue
      worker and needs wq_worker_sleeping/waking_up() invoked on it for
      concurrency management.  As rescuers never participate in concurrency
      management, PF_WQ_WORKER wasn't set on them.
      
      There's a need for an interface which can query whether %current is
      executing a work item and if so which.  Such interface requires a way
      to identify all tasks which may execute work items and PF_WQ_WORKER
      will be used for that.  As all normal workers always have PF_WQ_WORKER
      set, we only need to add it to rescuers.
      
      As rescuers start with WORKER_PREP but never clear it, it's always
      NOT_RUNNING and there's no need to worry about it interfering with
      concurrency management even if PF_WQ_WORKER is set; however, unlike
      normal workers, rescuers currently don't have its worker struct as
      kthread_data().  It uses the associated workqueue_struct instead.
      This is problematic as wq_worker_sleeping/waking_up() expect struct
      worker at kthread_data().
      
      This patch adds worker->rescue_wq and start rescuer kthreads with
      worker struct as kthread_data and sets PF_WQ_WORKER on rescuers.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      111c225a
  3. 19 Dec, 2012 1 commit
    • Tejun Heo's avatar
      workqueue: fix find_worker_executing_work() brekage from hashtable conversion · 023f27d3
      Tejun Heo authored
      42f8570f ("workqueue: use new hashtable implementation") incorrectly
      made busy workers hashed by the pointer value of worker instead of
      work.  This broke find_worker_executing_work() which in turn broke a
      lot of fundamental operations of workqueue - non-reentrancy and
      flushing among others.  The flush malfunction triggered warning in
      disk event code in Fengguang's automated test.
      
       write_dev_root_ (3265) used greatest stack depth: 2704 bytes left
       ------------[ cut here ]------------
       WARNING: at /c/kernel-tests/src/stable/block/genhd.c:1574 disk_clear_events+0x\
      cf/0x108()
       Hardware name: Bochs
       Modules linked in:
       Pid: 3328, comm: ata_id Not tainted 3.7.0-01930-gbff6343 #1167
       Call Trace:
        [<ffffffff810997c4>] warn_slowpath_common+0x83/0x9c
        [<ffffffff810997f7>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff816aea77>] disk_clear_events+0xcf/0x108
        [<ffffffff811bd8be>] check_disk_change+0x27/0x59
        [<ffffffff822e48e2>] cdrom_open+0x49/0x68b
        [<ffffffff81ab0291>] idecd_open+0x88/0xb7
        [<ffffffff811be58f>] __blkdev_get+0x102/0x3ec
        [<ffffffff811bea08>] blkdev_get+0x18f/0x30f
        [<ffffffff811bebfd>] blkdev_open+0x75/0x80
        [<ffffffff8118f510>] do_dentry_open+0x1ea/0x295
        [<ffffffff8118f5f0>] finish_open+0x35/0x41
        [<ffffffff8119c720>] do_last+0x878/0xa25
        [<ffffffff8119c993>] path_openat+0xc6/0x333
        [<ffffffff8119cf37>] do_filp_open+0x38/0x86
        [<ffffffff81190170>] do_sys_open+0x6c/0xf9
        [<ffffffff8119021e>] sys_open+0x21/0x23
        [<ffffffff82c1c3d9>] system_call_fastpath+0x16/0x1b
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarFengguang Wu <fengguang.wu@intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      023f27d3
  4. 18 Dec, 2012 19 commits
    • Tejun Heo's avatar
      workqueue: consider work function when searching for busy work items · a2c1c57b
      Tejun Heo authored
      To avoid executing the same work item concurrenlty, workqueue hashes
      currently busy workers according to their current work items and looks
      up the the table when it wants to execute a new work item.  If there
      already is a worker which is executing the new work item, the new item
      is queued to the found worker so that it gets executed only after the
      current execution finishes.
      
      Unfortunately, a work item may be freed while being executed and thus
      recycled for different purposes.  If it gets recycled for a different
      work item and queued while the previous execution is still in
      progress, workqueue may make the new work item wait for the old one
      although the two aren't really related in any way.
      
      In extreme cases, this false dependency may lead to deadlock although
      it's extremely unlikely given that there aren't too many self-freeing
      work item users and they usually don't wait for other work items.
      
      To alleviate the problem, record the current work function in each
      busy worker and match it together with the work item address in
      find_worker_executing_work().  While this isn't complete, it ensures
      that unrelated work items don't interact with each other and in the
      very unlikely case where a twisted wq user triggers it, it's always
      onto itself making the culprit easy to spot.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarAndrey Isakov <andy51@gmx.ru>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51701
      Cc: stable@vger.kernel.org
      a2c1c57b
    • Sasha Levin's avatar
      workqueue: use new hashtable implementation · 42f8570f
      Sasha Levin authored
      Switch workqueues to use the new hashtable implementation. This reduces the
      amount of generic unrelated code in the workqueues.
      
      This patch depends on d9b482c8 ("hashtable: introduce a small and naive
      hashtable") which was merged in v3.6.
      Acked-by: default avatarTejun Heo <tj@kernel.org>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      42f8570f
    • Linus Torvalds's avatar
      Merge branch 'akpm' (Andrew's patch-bomb) · 848b8141
      Linus Torvalds authored
      Merge misc patches from Andrew Morton:
       "Incoming:
      
         - lots of misc stuff
      
         - backlight tree updates
      
         - lib/ updates
      
         - Oleg's percpu-rwsem changes
      
         - checkpatch
      
         - rtc
      
         - aoe
      
         - more checkpoint/restart support
      
        I still have a pile of MM stuff pending - Pekka should be merging
        later today after which that is good to go.  A number of other things
        are twiddling thumbs awaiting maintainer merges."
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (180 commits)
        scatterlist: don't BUG when we can trivially return a proper error.
        docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output
        fs, fanotify: add @mflags field to fanotify output
        docs: add documentation about /proc/<pid>/fdinfo/<fd> output
        fs, notify: add procfs fdinfo helper
        fs, exportfs: add exportfs_encode_inode_fh() helper
        fs, exportfs: escape nil dereference if no s_export_op present
        fs, epoll: add procfs fdinfo helper
        fs, eventfd: add procfs fdinfo helper
        procfs: add ability to plug in auxiliary fdinfo providers
        tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test
        breakpoint selftests: print failure status instead of cause make error
        kcmp selftests: print fail status instead of cause make error
        kcmp selftests: make run_tests fix
        mem-hotplug selftests: print failure status instead of cause make error
        cpu-hotplug selftests: print failure status instead of cause make error
        mqueue selftests: print failure status instead of cause make error
        vm selftests: print failure status instead of cause make error
        ubifs: use prandom_bytes
        mtd: nandsim: use prandom_bytes
        ...
      848b8141
    • Eric W. Biederman's avatar
      efi: Fix the build with user namespaces enabled. · 99295618
      Eric W. Biederman authored
      When compiling efivars.c the build fails with:
      
         CC      drivers/firmware/efivars.o
        drivers/firmware/efivars.c: In function ‘efivarfs_get_inode’:
        drivers/firmware/efivars.c:886:31: error: incompatible types when assigning to type ‘kgid_t’ from type ‘int’
        make[2]: *** [drivers/firmware/efivars.o] Error 1
        make[1]: *** [drivers/firmware/efivars.o] Error 2
      
      Fix the build error by removing the duplicate initialization of i_uid and
      i_gid inode_init_always has already initialized them to 0.
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      99295618
    • Stephen Rothwell's avatar
      mm,numa: fix update_mmu_cache_pmd call · ce4a9cc5
      Stephen Rothwell authored
      This build error is currently hidden by the fact that the x86
      implementation of 'update_mmu_cache_pmd()' is a macro that doesn't use
      its last argument, but commit b32967ff ("mm: numa: Add THP migration
      for the NUMA working set scanning fault case") introduced a call with
      the wrong third argument.
      
      In the akpm tree, it causes this build error:
      
        mm/migrate.c: In function 'migrate_misplaced_transhuge_page_put':
        mm/migrate.c:1666:2: error: incompatible type for argument 3 of 'update_mmu_cache_pmd'
        arch/x86/include/asm/pgtable.h:792:20: note: expected 'struct pmd_t *' but argument is of type 'pmd_t'
      
      Fix it.
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ce4a9cc5
    • Nick Bowler's avatar
      scatterlist: don't BUG when we can trivially return a proper error. · 6fd59a83
      Nick Bowler authored
      There is absolutely no reason to crash the kernel when we have a
      perfectly good return value already available to use for conveying
      failure status.
      
      Let's return an error code instead of crashing the kernel: that sounds
      like a much better plan.
      
      [akpm@linux-foundation.org: s/E2BIG/EINVAL/]
      Signed-off-by: default avatarNick Bowler <nbowler@elliptictech.com>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6fd59a83
    • Cyrill Gorcunov's avatar
      docs: update documentation about /proc/<pid>/fdinfo/<fd> fanotify output · e71ec593
      Cyrill Gorcunov authored
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e71ec593
    • Cyrill Gorcunov's avatar
      fs, fanotify: add @mflags field to fanotify output · e6dbcafb
      Cyrill Gorcunov authored
      The kernel keeps FAN_MARK_IGNORED_SURV_MODIFY bit separately from
      fsnotify_mark::mask|ignored_mask thus put it in @mflags (mark flags)
      field so the user-space reader will be able to detect if such bit were
      used on mark creation procedure.
      
       | pos:	0
       | flags:	04002
       | fanotify flags:10 event-flags:0
       | fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
       | fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e6dbcafb
    • Cyrill Gorcunov's avatar
      docs: add documentation about /proc/<pid>/fdinfo/<fd> output · f1d8c162
      Cyrill Gorcunov authored
      [akpm@linux-foundation.org: tweak documentation]
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f1d8c162
    • Cyrill Gorcunov's avatar
      fs, notify: add procfs fdinfo helper · be77196b
      Cyrill Gorcunov authored
      This allow us to print out fsnotify details such as watchee inode, device,
      mask and optionally a file handle.
      
      For inotify objects if kernel compiled with exportfs support the output
      will be
      
       | pos:	0
       | flags:	02000000
       | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
       | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:11a1000020542153
       | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:49b1060023552153
      
      If kernel compiled without exportfs support, the file handle
      won't be provided but inode and device only.
      
       | pos:	0
       | flags:	02000000
       | inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0
       | inotify wd:2 ino:a111 sdev:800013 mask:800afce ignored_mask:0
       | inotify wd:1 ino:6b149 sdev:800013 mask:800afce ignored_mask:0
      
      For fanotify the output is like
      
       | pos:	0
       | flags:	04002
       | fanotify flags:10 event-flags:0
       | fanotify mnt_id:12 mask:3b ignored_mask:0
       | fanotify ino:50205 sdev:800013 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:05020500fb1d47e7
      
      To minimize impact on general fsnotify code the new functionality
      is gathered in fs/notify/fdinfo.c file.
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      be77196b
    • Cyrill Gorcunov's avatar
      fs, exportfs: add exportfs_encode_inode_fh() helper · 711c7bf9
      Cyrill Gorcunov authored
      We will need this helper in the next patch to provide a file handle for
      inotify marks in /proc/pid/fdinfo output.
      
      The patch is rather providing the way to use inodes directly when dentry
      is not available (like in case of inotify system).
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      711c7bf9
    • Cyrill Gorcunov's avatar
      fs, exportfs: escape nil dereference if no s_export_op present · ab49bdec
      Cyrill Gorcunov authored
      This routine will be used to generate a file handle in fdinfo output for
      inotify subsystem, where if no s_export_op present the general
      export_encode_fh should be used.  Thus add a test if s_export_op present
      inside exportfs_encode_fh itself.
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ab49bdec
    • Cyrill Gorcunov's avatar
      fs, epoll: add procfs fdinfo helper · 138d22b5
      Cyrill Gorcunov authored
      This allows us to print out eventpoll target file descriptor, events and
      data, the /proc/pid/fdinfo/fd consists of
      
       | pos:	0
       | flags:	02
       | tfd:        5 events:       1d data: ffffffffffffffff enabled: 1
      
      [avagin@: fix for unitialized ret variable]
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      138d22b5
    • Cyrill Gorcunov's avatar
      fs, eventfd: add procfs fdinfo helper · cbac5542
      Cyrill Gorcunov authored
      This allows us to print out raw counter value.  The /proc/pid/fdinfo/fd
      output is
      
       | pos:	0
       | flags:	04002
       | eventfd-count:               5a
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      cbac5542
    • Cyrill Gorcunov's avatar
      procfs: add ability to plug in auxiliary fdinfo providers · 55985dd7
      Cyrill Gorcunov authored
      This patch brings ability to print out auxiliary data associated with
      file in procfs interface /proc/pid/fdinfo/fd.
      
      In particular further patches make eventfd, evenpoll, signalfd and
      fsnotify to print additional information complete enough to restore
      these objects after checkpoint.
      
      To simplify the code we add show_fdinfo callback inside struct
      file_operations (as Al and Pavel are proposing).
      Signed-off-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: default avatarPavel Emelyanov <xemul@parallels.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      55985dd7
    • Dave Jones's avatar
      tools/testing/selftests/kcmp/kcmp_test.c: print reason for failure in kcmp_test · 2bf1cbf1
      Dave Jones authored
      I was curious why sys_kcmp wasn't working, which led me to the testcase.
      It turned out I hadn't enabled CHECKPOINT_RESTORE in the kernel I was
      testing.  Add a decoding of errno to the testcase to make that obvious.
      Signed-off-by: default avatarDave Jones <davej@redhat.com>
      Acked-by: default avatarCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      2bf1cbf1
    • Dave Young's avatar
      breakpoint selftests: print failure status instead of cause make error · 5a55f8bb
      Dave Young authored
      In case breakpoint test exit non zero value it will cause make error.
      Better way is just print the test failure status.
      Signed-off-by: default avatarDave Young <dyoung@redhat.com>
      Reviewed-by: default avatarPekka Enberg <penberg@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      5a55f8bb
    • Dave Young's avatar
      kcmp selftests: print fail status instead of cause make error · ed8ad10c
      Dave Young authored
      In case kcmp_test exit non zero value it will cause make error.
      Better way is just print the test failure status.
      Signed-off-by: default avatarDave Young <dyoung@redhat.com>
      Reviewed-by: default avatarPekka Enberg <penberg@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ed8ad10c
    • Dave Young's avatar
      kcmp selftests: make run_tests fix · 63d23367
      Dave Young authored
      make run_tests need the target is run_tests instead of run-tests
      Also gcc output should be kcmp_test. Fix these two issues.
      Signed-off-by: default avatarDave Young <dyoung@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      63d23367