1. 01 Apr, 2012 25 commits
    • blkcg: export conf/stat helpers to prepare for reorganization · 829fdb50
      Tejun Heo authored
      conf/stat handling is about to be moved to policy implementation from
      blkcg core.  Export conf/stat helpers from blkcg core so that
      blk-throttle and cfq-iosched can use them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • blkcg: simplify blkg_conf_prep() · 726fa694
      Tejun Heo authored
      blkg_conf_prep() implements "MAJ:MIN VAL" parsing manually, which is
      unnecessary.  Just use sscanf("%u:%u %llu").  This might not reject
      some malformed input (extra input at the end) but we don't care.
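
      As a rough illustration of the parsing described above, the
      following userspace sketch reads a "MAJ:MIN VAL" string with a single
      sscanf() call.  The struct and function names are hypothetical and
      not taken from the kernel source; only the format string comes from
      the commit message.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one parsed "MAJ:MIN VAL" line. */
struct conf_input {
	unsigned int major;
	unsigned int minor;
	unsigned long long val;
};

/*
 * Parse "MAJ:MIN VAL".  Returns 0 when all three fields converted and
 * -1 otherwise.  Anything following VAL is silently ignored, which is
 * the leniency the commit message accepts.
 */
int parse_conf(const char *input, struct conf_input *out)
{
	assert(input && out);

	if (sscanf(input, "%u:%u %llu",
		   &out->major, &out->minor, &out->val) != 3)
		return -1;
	return 0;
}
```

      With this sketch, "8:16 1048576" parses successfully, "8:16" is
      rejected, and "8:16 100 junk" is accepted with the trailing text
      dropped.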
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • blkcg: restructure blkio_group configuration setting · 3a8b31d3
      Tejun Heo authored
      As part of userland interface restructuring, this patch updates
      per-blkio_group configuration setting.  Instead of funneling
      everything through a master function which has hard-coded cases for
      each config file it may handle, the common part is factored into
      blkg_conf_prep() and blkg_conf_finish() and different configuration
      setters are implemented using the helpers.
      
      While this doesn't result in immediate LOC reduction, this enables
      further cleanups and more modular implementation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • blkcg: restructure configuration printing · c4682aec
      Tejun Heo authored
      Similarly to the previous stat restructuring, this patch restructures
      conf printing code such that,
      
      * Conf printing uses the same helpers as stat.
      
      * Printing function doesn't require hardcoded switching on the config
        being printed.  Note that this isn't complete yet for throttle
        confs.  The next patch will convert setting for these confs and will
        complete the transition.
      
      * Printing uses read_seq_string callback (other methods will be phased
        out).
      
      Note that blkio_group_conf.iops[2] is changed to u64 so that they can
      be manipulated with the same functions.  This is transitional and will
      go away later.
      
      After this patch, per-device configurations - weight, bps and iops -
      use __blkg_prfill_u64() for printing which uses white space as
      delimiter instead of tab.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • blkcg: drop blkiocg_file_write_u64() · 627f29f4
      Tejun Heo authored
      blkiocg_file_write_u64() has a single switch case.  Drop
      blkiocg_file_write_u64(), rename blkio_weight_write() to
      blkcg_set_weight() and use it directly for .write_u64 callback.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • blkcg: restructure statistics printing · d3d32e69
      Tejun Heo authored
      blkcg stats handling is a mess.  None of the stats has much to do with
      blkcg core but they are all implemented in blkcg core.  Code sharing
      is achieved by mixing common code with hard-coded cases for each stat
      counter.
      
      This patch restructures statistics printing such that
      
      * Common logic exists as helper functions and specific print functions
        use the helpers to implement specific cases.
      
      * Printing functions serving multiple counters don't require hardcoded
        switching on specific counters.
      
      * Printing uses read_seq_string callback (other methods will be phased
        out).
      
      This change enables further cleanups and relocating stats code to the
      policy implementation it belongs to.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • blkcg: introduce blkg_stat and blkg_rwstat · edcb0722
      Tejun Heo authored
      blkcg uses u64_stats_sync to avoid reading wrong u64 statistic values
      on 32bit archs and some stat counters have subtypes to distinguish
      read/writes and sync/async IOs.  The stat code paths are confusing and
      involve a lot of going back and forth between blkcg core and specific
      policy implementations, and synchronization and subtype handling are
      open coded in blkcg core.
      
      This patch introduces struct blkg_stat and blkg_rwstat which, with
      accompanying operations, encapsulate stat updating and accessing with
      proper synchronization.
      
      blkg_stat is a simple u64 counter with 64bit read-access protection.
      blkg_rwstat is the one with rw and [a]sync subcounters and takes @rw
      flags to distinguish IO subtypes (%REQ_WRITE and %REQ_SYNC) and
      replaces stat_sub_type indexed arrays.
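
      The rw-flag indexing described above might look roughly like the
      userspace sketch below.  All names and flag values are hypothetical
      stand-ins (RW_WRITE and RW_SYNC play the role of %REQ_WRITE and
      %REQ_SYNC), and the u64_stats_sync protection is deliberately left
      out, so this only shows how one update feeds two subcounters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subcounter indices, modelled on the READ/WRITE/SYNC/ASYNC split. */
enum rwstat_idx {
	RWSTAT_READ,
	RWSTAT_WRITE,
	RWSTAT_SYNC,
	RWSTAT_ASYNC,
	RWSTAT_NR,
};

/* Hypothetical flag bits standing in for REQ_WRITE and REQ_SYNC. */
#define RW_WRITE (1u << 0)
#define RW_SYNC  (1u << 1)

struct rwstat {
	uint64_t cnt[RWSTAT_NR];
};

/* Add @val to the direction subcounter and to the sync/async subcounter. */
void rwstat_add(struct rwstat *s, unsigned int rw, uint64_t val)
{
	assert(s);

	s->cnt[(rw & RW_WRITE) ? RWSTAT_WRITE : RWSTAT_READ] += val;
	s->cnt[(rw & RW_SYNC) ? RWSTAT_SYNC : RWSTAT_ASYNC] += val;
}

/* The total is derived from the direction pair, so it is never stored. */
uint64_t rwstat_total(const struct rwstat *s)
{
	assert(s);

	return s->cnt[RWSTAT_READ] + s->cnt[RWSTAT_WRITE];
}
```

      Because every update lands in exactly one direction counter and one
      sync counter, either pair sums to the same total.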
      
      All counters in blkio_group_stats and blkio_group_stats_cpu are
      replaced with either blkg_stat or blkg_rwstat along with all users.
      
      This does add one u64_stats_sync per counter and increase stats_sync
      operations but they're empty/noops on 64bit archs and blkcg doesn't
      have too many counters, especially with DEBUG_BLK_CGROUP off.
      
      While the currently resulting code isn't necessarily simpler at the
      moment, this will enable further clean up of blkcg stats code.
      
      - BLKIO_STAT_{READ|WRITE|SYNC|ASYNC|TOTAL} renamed to
        BLKG_RWSTAT_{READ|WRITE|SYNC|ASYNC|TOTAL}.
      
      - blkg_stat_add() replaces blkio_add_stat() and
        blkio_check_and_dec_stat().  Note that BUG_ON() on underflow in the
        latter function no longer exists.  It's *way* better to have
        underflowed stat counters than oopsing.
      
      - blkio_group_stats->dequeue is now a proper u64 stat counter instead
        of ulong.
      
      - reset_stats() updated to clear each stat counter individually and
        BLKG_STATS_DEBUG_CLEAR_{START|SIZE} are removed.
      
      - Some functions reconstruct rw flags from direction and sync
        booleans.  This will be removed by future patches.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • blkcg: BLKIO_STAT_CPU_SECTORS doesn't have subcounters · 2aa4a152
      Tejun Heo authored
      BLKIO_STAT_CPU_SECTORS doesn't need read/write/sync/async subcounters
      and is counted by blkio_group_stats_cpu->sectors; however, it still
      holds a member in blkio_group_stats_cpu->stat_arr_cpu.
      
      Rearrange stat_type_cpu and define BLKIO_STAT_CPU_ARR_NR and use it
      for stat_arr_cpu[] size so that only SERVICE_BYTES and SERVICED have
      subcounters.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • blkcg: remove unused @pol and @plid parameters · aaec55a0
      Tejun Heo authored
      @pol to blkg_to_pdata() and @plid to blkg_lookup_create() are no
      longer necessary.  Drop them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • Merge branch 'for-3.5' of ../cgroup into block/for-3.5/core-merged · 959d851c
      Tejun Heo authored
      cgroup/for-3.5 contains the following changes which blk-cgroup needs
      to proceed with the on-going cleanup.
      
      * Dynamic addition and removal of cftypes to make config/stat file
        handling modular for policies.
      
      * cgroup removal update to not wait for css references to drain to fix
        blkcg removal hang caused by cfq caching cfqgs.
      
      Pull in cgroup/for-3.5 into block/for-3.5/core.  This causes the
      following conflicts in block/blk-cgroup.c.
      
      * 761b3ef5 "cgroup: remove cgroup_subsys argument from callbacks"
        conflicts with blkiocg_pre_destroy() addition and blkiocg_attach()
        removal.  Resolved by removing @subsys from all subsys methods.
      
      * 676f7c8f "cgroup: relocate cftype and cgroup_subsys definitions in
        controllers" conflicts with ->pre_destroy() and ->attach() updates
        and removal of modular config.  Resolved by dropping forward
        declarations of the methods and applying updates to the relocated
        blkio_subsys.
      
      * 4baf6e33 "cgroup: convert all non-memcg controllers to the new
        cftype interface" builds upon the previous item.  Resolved by adding
        ->base_cftypes to the relocated blkio_subsys.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: make css->refcnt clearing on cgroup removal optional · 48ddbe19
      Tejun Heo authored
      Currently, cgroup removal tries to drain all css references.  If there
      are active css references, the removal logic waits and retries
      ->pre_destroy() until either all refs drop to zero or removal is
      cancelled.
      
      This semantics is unusual and adds non-trivial complexity to cgroup
      core and IMHO is fundamentally misguided in that it couples internal
      implementation details (references to internal data structure) with
      externally visible operation (rmdir).  To userland, this is a behavior
      peculiarity which is unnecessary and difficult to anticipate (css refs are
      otherwise invisible from userland), and, to policy implementations,
      this is an unnecessary restriction (e.g. blkcg wants to hold css refs
      for caching purposes but can't as that becomes visible as rmdir hang).
      
      Unfortunately, memcg currently depends on ->pre_destroy() retrials and
      cgroup removal vetoing and can't be immediately switched to the new
      behavior.  This patch introduces the new behavior of not waiting for
      css refs to drain and maintains the old behavior for subsystems which
      have __DEPRECATED_clear_css_refs set.
      
      Once memcg is updated, we can drop the code paths for the old
      behavior as proposed in the following patch.  Note that the following
      patch is incorrect in that dput work item is in cgroup and may lose
      some of dputs when multiple css's are released back-to-back, and
      __css_put() triggers check_for_release() when refcnt reaches 0 instead
      of 1; however, it shows what part can be removed.
      
        http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
      
      Note that, in not-too-distant future, cgroup core will start emitting
      warning messages for subsys which require the old behavior, so please
      get moving.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    • cgroup: use negative bias on css->refcnt to block css_tryget() · 28b4c27b
      Tejun Heo authored
      When a cgroup is about to be removed, cgroup_clear_css_refs() is
      called to check and ensure that there are no active css references.
      
      This is currently achieved by dropping the refcnt to zero iff it has
      only the base ref.  If all css refs could be dropped to zero, ref
      clearing is successful and CSS_REMOVED is set on all css.  If not, the
      base ref is restored.  While css ref is zero w/o CSS_REMOVED set, any
      css_tryget() attempt on it busy loops so that they are atomic
      w.r.t. the whole css ref clearing.
      
      This does work but dropping and re-instating the base ref is somewhat
      hairy and makes it difficult to add more logic to the put path as
      there are two of them - the regular css_put() and the reversible base
      ref clearing.
      
      This patch updates css ref clearing such that blocking new
      css_tryget() and putting the base ref are separate operations.
      CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
      css_tryget() busy loops while refcnt is negative.  After all css refs
      are deactivated, if they were all one, ref clearing succeeded and
      CSS_REMOVED is set and the base ref is put using the regular
      css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
      and the original positive values are restored.
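
      The negative-bias scheme can be sketched in userspace with C11
      atomics as below.  This is an illustration under stated assumptions,
      not the kernel implementation: the names are hypothetical, only a
      single counter is handled, CSS_REMOVED is not modelled, and
      ref_tryget() returns false on a deactivated counter rather than busy
      looping, so that the sketch can be exercised from one thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for CSS_DEACT_BIAS, which the text defines as INT_MIN. */
#define DEACT_BIAS INT_MIN

struct ref {
	atomic_int refcnt;	/* starts at 1: the base reference */
};

/* Return the unbiased, non-negative count whether or not the bias is applied. */
int ref_count(struct ref *r)
{
	int v = atomic_load(&r->refcnt);

	return v >= 0 ? v : v - DEACT_BIAS;
}

/*
 * Take a reference unless the counter is zero or deactivated (negative).
 * Simplification: fails immediately instead of busy looping.
 */
bool ref_tryget(struct ref *r)
{
	int v = atomic_load(&r->refcnt);

	while (v > 0) {
		/* On failure, v is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->refcnt, &v, v + 1))
			return true;
	}
	return false;
}

/* Drop one reference; works the same on biased and unbiased counters. */
void ref_put(struct ref *r)
{
	atomic_fetch_sub(&r->refcnt, 1);
}

/*
 * Block new trygets by adding the bias, then succeed only if the base
 * reference is the sole one left.  On failure the bias is subtracted
 * again, which restores the original positive value.
 */
bool ref_try_clear(struct ref *r)
{
	assert(atomic_load(&r->refcnt) > 0);

	atomic_fetch_add(&r->refcnt, DEACT_BIAS);
	if (ref_count(r) == 1)
		return true;	/* stays deactivated; caller puts the base ref */
	atomic_fetch_sub(&r->refcnt, DEACT_BIAS);
	return false;
}
```

      Keeping deactivation separate from the final put means there is
      only one put path, which is the property the commit message is
      after.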
      
      css_refcnt() accessor which always returns the unbiased positive
      reference counts is added and used to simplify refcnt usages.  While
      at it, relocate and reformat comments in cgroup_has_css_refs().
      
      This separates css->refcnt deactivation and putting the base ref,
      which enables the next patch to make ref clearing optional.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: implement cgroup_rm_cftypes() · 79578621
      Tejun Heo authored
      Implement cgroup_rm_cftypes() which removes an array of cftypes from a
      subsystem.  It can be called whether the target subsys is attached or
      not.  cgroup core will remove the specified file from all existing
      cgroups.
      
      This will be used to improve sub-subsys modularity and will be helpful
      for unified hierarchy.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: introduce struct cfent · 05ef1d7c
      Tejun Heo authored
      This patch adds cfent (cgroup file entry) which is the association
      between a cgroup and a file.  This is in-cgroup representation of
      files under a cgroup directory.  This simplifies walking
      cgroup files and thus cgroup_clear_directory(), which is now
      implemented in two parts - cgroup_rm_file() and a loop around it.
      
      cgroup_rm_file() will be used to implement cftype removal and cfent is
      scheduled to serve cgroup specific per-file data (e.g. for sysfs-like
      "sever" semantics).
      
      v2: - cfe was freed from cgroup_rm_file() which led to use-after-free
            if the file had openers at the time of removal.  Moved to
            cgroup_diput().
      
          - cgroup_clear_directory() triggered WARN_ON_ONCE() if d_subdirs
            wasn't empty after removing all files.  This triggered
            spuriously if some files were open during directory clearing.
            Removed.
      
      v3: - In cgroup_diput(), WARN_ONCE(!list_empty(&cfe->node)) could be
            spuriously triggered for root cgroups because they don't go
            through cgroup_clear_directory() on unmount.  Don't trigger WARN
            for root cgroups.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Glauber Costa <glommer@parallels.com>
    • cgroup: relocate __d_cgrp() and __d_cft() · f6ea9372
      Tejun Heo authored
      Move the two macros upwards as they'll be used earlier in the file.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: remove cgroup_add_file[s]() · db0416b6
      Tejun Heo authored
      No controller is using cgroup_add_file[s]().  Unexport them, and
      convert cgroup_add_files() to handle NULL entry terminated array
      instead of taking count explicitly and continue creation on failure
      for internal use.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: convert memcg controller to the new cftype interface · 6bc10349
      Tejun Heo authored
      Convert memcg to use the new cftype based interface.  kmem support
      abuses ->populate() for mem_cgroup_sockets_init() so it can't be
      removed at the moment.
      
      tcp_memcontrol is updated so that tcp_files[] is registered via a
      __initcall.  This change also allows removing the forward declaration
      of tcp_files[].  Removed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Greg Thelen <gthelen@google.com>
    • memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP · af36f906
      Tejun Heo authored
      Instead of conditioning creation of memsw files on do_swap_account,
      always create the files if compiled-in and fail read/write attempts
      with -EOPNOTSUPP if !do_swap_account.
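
      A minimal sketch of that behavior, assuming a hypothetical counter
      struct and read handler: the file always exists, and the handler
      itself refuses access when swap accounting is off.  Only the
      do_swap_account name and the -EOPNOTSUPP return come from the text
      above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the do_swap_account switch named in the text. */
bool do_swap_account;

/* Hypothetical backing store for one memsw file. */
struct memsw_counter {
	uint64_t usage;
};

/*
 * Read handler: stores the value in *val and returns 0, or returns
 * -EOPNOTSUPP without touching *val when swap accounting is disabled.
 */
int memsw_read(const struct memsw_counter *c, uint64_t *val)
{
	assert(c && val);

	if (!do_swap_account)
		return -EOPNOTSUPP;
	*val = c->usage;
	return 0;
}
```

      Moving the check from file creation into the handler is what lets
      the file list become a static table.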
      
      This is suggested by KAMEZAWA to simplify memcg file creation so that
      it can use cgroup->subsys_cftypes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: convert all non-memcg controllers to the new cftype interface · 4baf6e33
      Tejun Heo authored
      Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
      net_cls and device controllers to use the new cftype based interface.
      Termination entry is added to cftype arrays and populate callbacks are
      replaced with cgroup_subsys->base_cftypes initializations.
      
      This is a functionally identical transformation.  There shouldn't be any
      visible behavior change.
      
      memcg is rather special and will be converted separately.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <paul@paulmenage.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Vivek Goyal <vgoyal@redhat.com>
    • cgroup: relocate cftype and cgroup_subsys definitions in controllers · 676f7c8f
      Tejun Heo authored
      blk-cgroup, netprio_cgroup, cls_cgroup and tcp_memcontrol
      unnecessarily define cftype array and cgroup_subsys structures at the
      top of the file, which is unconventional and necessitates forward
      declaration of methods.
      
      This patch relocates those below the definitions of the methods and
      removes the forward declarations.  Note that forward declaration of
      tcp_files[] is added in tcp_memcontrol.c for tcp_init_cgroup().  This
      will be removed soon by another patch.
      
      This patch doesn't introduce any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: merge cft_release_agent cftype array into the base files array · 6e6ff25b
      Tejun Heo authored
      Now that cftype can express whether a file should only be on root,
      cft_release_agent can be merged into the base files cftypes array.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: implement cgroup_add_cftypes() and friends · 8e3f6541
      Tejun Heo authored
      Currently, cgroup directories are populated by subsys->populate()
      callback explicitly creating files on each cgroup creation.  This
      level of flexibility isn't needed or desirable.  It provides largely
      unused flexibility which calls for abuses while severely limiting what
      the core layer can do through the lack of structure and conventions.
      
      Per each cgroup file type, the only distinction that cgroup users are
      making is whether a cgroup is root or not, which can easily be
      expressed with flags.
      
      This patch introduces cgroup_add_cftypes().  These deal with cftypes
      instead of individual files - controllers indicate that certain types
      of files exist for a certain subsystem.  Newly added CFTYPE_*_ON_ROOT
      flags indicate whether a cftype should be excluded or created only on
      the root cgroup.
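
      To make the flag-based distinction concrete, here is a small
      userspace sketch.  The struct, the FT_* flag names and values, and
      the NULL-name terminator are assumptions chosen for illustration;
      they follow the CFTYPE_*_ON_ROOT idea from the text but are not the
      kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags modelled on the CFTYPE_*_ON_ROOT pair. */
#define FT_ONLY_ON_ROOT (1u << 0)
#define FT_NOT_ON_ROOT  (1u << 1)

/* Hypothetical file type entry; a NULL name terminates an array. */
struct file_type {
	const char *name;
	unsigned int flags;
};

/* Decide whether a file of this type belongs in the given cgroup. */
bool file_type_applies(const struct file_type *ft, bool is_root)
{
	assert(ft);

	if ((ft->flags & FT_ONLY_ON_ROOT) && !is_root)
		return false;
	if ((ft->flags & FT_NOT_ON_ROOT) && is_root)
		return false;
	return true;
}

/* Walk a terminated array and count the files a cgroup would receive. */
size_t count_files(const struct file_type *fts, bool is_root)
{
	size_t n = 0;

	for (const struct file_type *ft = fts; ft->name != NULL; ft++)
		if (file_type_applies(ft, is_root))
			n++;
	return n;
}
```

      Since the core walks the table itself, it can apply the same table
      to cgroups that already exist, which a per-cgroup populate callback
      could not offer.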
      
      cgroup_add_cftypes() can be called any time whether the target
      subsystem is currently attached or not.  cgroup core will create files
      on the existing cgroups as necessary.
      
      Also, cgroup_subsys->base_cftypes is added to ease registration of the
      base files for the subsystem.  If non-NULL on subsys init, the cftypes
      pointed to by ->base_cftypes are automatically registered on subsys
      init / load.
      
      Further patches will convert the existing users and remove the file
      based interface.  Note that this interface allows dynamic addition of
      files to an active controller.  This will be used for sub-controller
      modularity and unified hierarchy in the longer term.
      
      This patch implements the new mechanism but doesn't apply it to any
      user.
      
      v2: replaced DECLARE_CGROUP_CFTYPES[_COND]() with
          cgroup_subsys->base_cftypes, which works better for cgroup_subsys
          which is loaded as module.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: build list of all cgroups under a given cgroupfs_root · b0ca5a84
      Tejun Heo authored
      Build a list of all cgroups anchored at cgroupfs_root->allcg_list and
      going through cgroup->allcg_node.  The list is protected by
      cgroup_mutex and will be used to improve cgroup file handling.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir() · ff4c8d50
      Tejun Heo authored
      cgroup_populate_dir() currently clears all files and then repopulates
      the directory; however, the clearing part is only useful when it's
      called from cgroup_remount().  Relocate the invocation to
      cgroup_remount().
      
      This is to prepare for further cgroup file handling updates.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
    • cgroup: deprecate remount option changes · 8b5a5a9d
      Tejun Heo authored
      This patch marks the following features for deprecation.
      
      * Rebinding subsys by remount: Never reached useful state - only works
        on empty hierarchies.
      
      * release_agent update by remount: release_agent itself will be
        replaced with conventional fsnotify notification.
      
      v2: Lennart pointed out that "name=" is necessary for mounts w/o any
          controller attached.  Drop "name=" deprecation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
  2. 31 Mar, 2012 15 commits
    • Linux 3.4-rc1 · dd775ae2
      Linus Torvalds authored
    • Merge branch 's3-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/amit/virtio-console · b7ffff4b
      Linus Torvalds authored
      Pull virtio S3 support patches from Amit Shah:
       "Turns out S3 is not different from S4 for virtio devices: the device
        is assumed to be reset, so the host and guest state are to be assumed
        to be out of sync upon resume.  We handle the S4 case with exactly the
        same scenario, so just point the suspend/resume routines to the
        freeze/restore ones.
      
        Once that is done, we also use the PM API's macro to initialise the
        sleep functions.
      
        A couple of cleanups are included: there's no need for special thaw
        processing in the balloon driver, so that's addressed in patches 1 and
        2.
      
        Testing: both S3 and S4 support have been tested using these patches
        using a similar method used earlier during S4 patch development: a
        guest is started with virtio-blk as the only disk, a virtio network
        card, a virtio-serial port and a virtio balloon device.  Ping from
        guest to host, dd /dev/zero to a file on the disk, and IO from the
        host on the virtio-serial port, all at once, while exercising S4 and
        S3 (separately) were tested.  They all continue to work fine after
        resume.  virtio balloon values too were tested by inflating and
        deflating the balloon."
      
      Pulling from Amit, since Rusty is off getting married (and presumably
      shaving people).
      
      * 's3-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/amit/virtio-console:
        virtio-pci: switch to PM ops macro to initialise PM functions
        virtio-pci: S3 support
        virtio-pci: drop restore_common()
        virtio: drop thaw PM operation
        virtio: balloon: Allow stats update after restore from S4
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 8bb1f229
      Linus Torvalds authored
      Pull second try at vfs part d#2 from Al Viro:
       "Miklos' first series (with do_lookup() rewrite split into edible
        chunks) + assorted bits and pieces.
      
        The 'untangling of do_lookup()' series is a splitup of what used to
        be a monolithic patch from Miklos, so this series is basically "how do
        I convince myself that his patch is correct (or find a hole in it)".
        No holes found and I like the resulting cleanup, so in it went..."
      
      Changes from try 1: Fix a boot problem with selinux, and commit messages
      prettied up a bit.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (24 commits)
        vfs: fix out-of-date dentry_unhash() comment
        vfs: split __lookup_hash
        untangling do_lookup() - take __lookup_hash()-calling case out of line.
        untangling do_lookup() - switch to calling __lookup_hash()
        untangling do_lookup() - merge d_alloc_and_lookup() callers
        untangling do_lookup() - merge failure exits in !dentry case
        untangling do_lookup() - massage !dentry case towards __lookup_hash()
        untangling do_lookup() - get rid of need_reval in !dentry case
        untangling do_lookup() - eliminate a loop.
        untangling do_lookup() - expand the area under ->i_mutex
        untangling do_lookup() - isolate !dentry stuff from the rest of it.
        vfs: move MAY_EXEC check from __lookup_hash()
        vfs: don't revalidate just looked up dentry
        vfs: fix d_need_lookup/d_revalidate order in do_lookup
        ext3: move headers to fs/ext3/
        migrate ext2_fs.h guts to fs/ext2/ext2.h
        new helper: ext2_image_size()
        get rid of pointless includes of ext2_fs.h
        ext2: No longer export ext2_fs.h to user space
        mtdchar: kill persistently held vfsmount
        ...
    • Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f22e08a7
      Linus Torvalds authored
      Pull scheduler fixes from Ingo Molnar.
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched: Fix incorrect usage of for_each_cpu_mask() in select_fallback_rq()
        sched: Fix __schedule_bug() output when called from an interrupt
        sched/arch: Introduce the finish_arch_post_lock_switch() scheduler callback
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f187e9fd
      Linus Torvalds authored
      Pull perf updates and fixes from Ingo Molnar:
       "It's mostly fixes, but there's also two late items:
      
         - preliminary GTK GUI support for perf report
         - PMU raw event format descriptors in sysfs, to be parsed by tooling
      
        The raw event format in sysfs is a new ABI.  For example for the 'CPU'
        PMU we have:
      
          aldebaran:~> ll /sys/bus/event_source/devices/cpu/format/*
          -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/any
          -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/cmask
          -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/edge
          -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/event
          -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/inv
          -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/offcore_rsp
          -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/pc
          -r--r--r--. 1 root root 4096 Mar 31 10:29 /sys/bus/event_source/devices/cpu/format/umask
      
        those lists of fields contain a specific format:
      
          aldebaran:~> cat /sys/bus/event_source/devices/cpu/format/offcore_rsp
          config1:0-63
      
        So, those who wish to specify raw events can now use the following
        event format:
      
          -e cpu/cmask=1,event=2,umask=3
      
        Most people will not want to specify any events (let alone raw
        events), they'll just use whatever default event the tools use.
      
        But for more obscure PMU events that have no cross-architecture
        generic events the above syntax is more usable and a bit more
        structured than specifying hex numbers."
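
      For tooling authors, a format descriptor such as "config1:0-63"
      could be decoded along the lines of the sketch below.  It assumes
      the descriptor always has the "name:lo-hi" shape shown in the
      example above; the struct and function names are hypothetical and
      this is not the parser used by perf.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoded form of one sysfs format descriptor. */
struct fmt_field {
	char config[16];	/* target config word, e.g. "config1" */
	unsigned int lo;	/* lowest bit of the field */
	unsigned int hi;	/* highest bit of the field */
};

/* Parse "name:lo-hi".  Returns 0 on success, -1 on malformed input. */
int parse_format(const char *spec, struct fmt_field *out)
{
	assert(spec && out);

	if (sscanf(spec, "%15[^:]:%u-%u",
		   out->config, &out->lo, &out->hi) != 3)
		return -1;
	if (out->lo > out->hi || out->hi > 63)
		return -1;
	return 0;
}

/* Build the bit mask the field occupies within its 64-bit config word. */
uint64_t field_mask(const struct fmt_field *f)
{
	unsigned int width;
	uint64_t m;

	assert(f && f->lo <= f->hi && f->hi <= 63);

	width = f->hi - f->lo + 1;
	/* A shift by 64 would be undefined, so handle the full width apart. */
	m = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
	return m << f->lo;
}
```

      A tool would then shift a user-supplied value by lo and mask it
      before merging it into the matching config word.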
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
        perf tools: Remove auto-generated bison/flex files
        perf annotate: Fix off by one symbol hist size allocation and hit accounting
        perf tools: Add missing ref-cycles event back to event parser
        perf annotate: addr2line wants addresses in same format as objdump
        perf probe: Finder fails to resolve function name to address
        tracing: Fix ent_size in trace output
        perf symbols: Handle NULL dso in dso__name_len
        perf symbols: Do not include libgen.h
        perf tools: Fix bug in raw sample parsing
        perf tools: Fix display of first level of callchains
        perf tools: Switch module.h into export.h
        perf: Move mmap page data_head offset assertion out of header
        perf: Fix mmap_page capabilities and docs
        perf diff: Fix to work with new hists design
        perf tools: Fix modifier to be applied on correct events
        perf tools: Fix various casting issues for 32 bits
        perf tools: Simplify event_read_id exit path
        tracing: Fix ftrace stack trace entries
        tracing: Move the tracing_on/off() declarations into CONFIG_TRACING
        perf report: Add a simple GTK2-based 'perf report' browser
        ...
    • Merge tag 'parisc-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6 · adb3b1f3
      Linus Torvalds authored
      Pull PARISC misc updates from James Bottomley:
       "This is a couple of minor updates (fixing lws futex locking and
        removing some obsolete cpu_*_map calls)."
      
      * tag 'parisc-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
        [PARISC] remove references to cpu_*_map.
        [PARISC] futex: Use same lock set as lws calls
      adb3b1f3
    • Linus Torvalds's avatar
      Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6 · a75ee6ec
      Linus Torvalds authored
      Pull SCSI updates from James Bottomley:
       "This is primarily another round of driver updates (lpfc, bfa, fcoe,
        ipr) plus a new ufshcd driver.  There shouldn't be anything
        controversial in here (The final deletion of scsi proc_ops which
        caused some build breakage has been held over until the next merge
        window to give us more time to stabilise it).
      
        I'm afraid, with me moving continents at exactly the wrong time,
        anything submitted after the merge window opened has been held over to
        the next merge window."
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (63 commits)
        [SCSI] ipr: Driver version 2.5.3
        [SCSI] ipr: Increase alignment boundary of command blocks
        [SCSI] ipr: Increase max concurrent oustanding commands
        [SCSI] ipr: Remove unnecessary memory barriers
        [SCSI] ipr: Remove unnecessary interrupt clearing on new adapters
        [SCSI] ipr: Fix target id allocation re-use problem
        [SCSI] atp870u, mpt2sas, qla4xxx use pci_dev->revision
        [SCSI] fcoe: Drop the rtnl_mutex before calling fcoe_ctlr_link_up
        [SCSI] bfa: Update the driver version to 3.0.23.0
        [SCSI] bfa: BSG and User interface fixes.
        [SCSI] bfa: Fix to avoid vport delete hang on request queue full scenario.
        [SCSI] bfa: Move service parameter programming logic into firmware.
        [SCSI] bfa: Revised Fabric Assigned Address(FAA) feature implementation.
        [SCSI] bfa: Flash controller IOC pll init fixes.
        [SCSI] bfa: Serialize the IOC hw semaphore unlock logic.
        [SCSI] bfa: Modify ISR to process pending completions
        [SCSI] bfa: Add fc host issue lip support
        [SCSI] mpt2sas: remove extraneous sas_log_info messages
        [SCSI] libfc: fcoe_transport_create fails in single-CPU environment
        [SCSI] fcoe: reduce contention for fcoe_rx_list lock [v2]
        ...
      a75ee6ec
    • J. Bruce Fields's avatar
      vfs: fix out-of-date dentry_unhash() comment · c0d02594
      J. Bruce Fields authored
      64252c75 "vfs: remove dget() from dentry_unhash()" changed the
      implementation but not the comment.
      
      Cc: Sage Weil <sage@newdream.net>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c0d02594
    • Miklos Szeredi's avatar
      vfs: split __lookup_hash · bad61189
      Miklos Szeredi authored
      Split __lookup_hash into two component functions:
      
       lookup_dcache - tries cached lookup, returns whether real lookup is needed
       lookup_real - calls i_op->lookup
      
      This eliminates code duplication between d_alloc_and_lookup() and
      d_inode_lookup().
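
The shape of that split can be sketched in user-space C as a cached step that reports whether a real lookup is still needed, followed by the slow step. Everything here is a toy model: `struct toy_dentry`, the fixed-size cache, and the `toy_*` functions are hypothetical stand-ins and do not reproduce the kernel's signatures, locking, or allocation behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a dentry; not the kernel's struct dentry. */
struct toy_dentry {
	const char *name;	/* NULL marks a free slot */
	bool valid;		/* false: cached entry is stale */
};

#define TOY_CACHE_SIZE 4

static struct toy_dentry toy_cache[TOY_CACHE_SIZE] = {
	{ "etc", true },
	{ "tmp", false },
};

static int toy_real_lookups;	/* counts slow-path calls */

/*
 * Cached step: return a usable entry, or NULL with *need_lookup set
 * when the caller has to fall back to the real lookup.
 */
static struct toy_dentry *toy_lookup_dcache(const char *name, bool *need_lookup)
{
	size_t i;

	*need_lookup = false;
	for (i = 0; i < TOY_CACHE_SIZE; i++) {
		struct toy_dentry *d = &toy_cache[i];

		if (d->name && strcmp(d->name, name) == 0) {
			if (d->valid)
				return d;
			d->name = NULL;	/* drop the stale entry */
			break;
		}
	}
	*need_lookup = true;
	return NULL;
}

/* Slow step: stands in for the filesystem's own lookup method. */
static struct toy_dentry *toy_lookup_real(const char *name)
{
	size_t i;

	toy_real_lookups++;
	for (i = 0; i < TOY_CACHE_SIZE; i++) {
		if (!toy_cache[i].name) {
			toy_cache[i].name = name;
			toy_cache[i].valid = true;
			return &toy_cache[i];
		}
	}
	return NULL;	/* toy cache is full */
}

/* The combined helper is now just the two steps in sequence. */
static struct toy_dentry *toy_lookup_hash(const char *name)
{
	bool need_lookup;
	struct toy_dentry *d;

	assert(name != NULL);
	d = toy_lookup_dcache(name, &need_lookup);
	if (!need_lookup)
		return d;
	return toy_lookup_real(name);
}
```

With the two steps separated, a caller that only needs the slow path can call the second function directly instead of duplicating it, which is the kind of duplication the commit describes removing.
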
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bad61189
    • Al Viro's avatar
      untangling do_lookup() - switch to calling __lookup_hash() · a3255546
      Al Viro authored
      now we have __lookup_hash() open-coded if !dentry case;
      just call the damn thing instead...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a3255546
    • Al Viro's avatar
      a6ecdfcf
    • Al Viro's avatar
      ec335e91
    • Al Viro's avatar
      untangling do_lookup() - massage !dentry case towards __lookup_hash() · d774a058
      Al Viro authored
      Reorder if-else cases for starters...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d774a058
    • Al Viro's avatar
      untangling do_lookup() - get rid of need_reval in !dentry case · 08b0ab7c
      Al Viro authored
      Everything arriving into if (!dentry) will have need_reval = 1.
      Indeed, the only way to get there with need_reval reset to 0 would
      be via
      	if (unlikely(d_need_lookup(dentry)))
      		goto unlazy;
      	if (unlikely(dentry->d_flags & DCACHE_OP_REVALIDATE)) {
      		status = d_revalidate(dentry, nd);
      		if (unlikely(status <= 0)) {
      			if (status != -ECHILD)
      				need_reval = 0;
      			goto unlazy;
      ...
      unlazy:
      	/* no assignments to dentry */
      	if (dentry && unlikely(d_need_lookup(dentry))) {
      		dput(dentry);
      		dentry = NULL;
      	}
      and if d_need_lookup() had already been false the first time around, it
      will remain false on the second call as well.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      08b0ab7c