1. 25 Apr, 2016 4 commits
  2. 18 Apr, 2016 1 commit
    • Rafael J. Wysocki's avatar
      cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set · c9d9c929
      Rafael J. Wysocki authored
      Since governor operations are generally skipped if cpufreq_suspended
      is set, cpufreq_start_governor() should do nothing in that case.
      
      That function is called in the cpufreq_online() path, and may also
      be called from cpufreq_offline() in some cases, which are invoked
      by the nonboot CPUs disabing/enabling code during system suspend
      to RAM and resume.  That happens when all devices have been
      suspended, so if the cpufreq driver relies on things like I2C to
      get the current frequency, it may not be ready to do that then.
      
      To prevent problems from happening for this reason, make
      cpufreq_update_current_freq(), which is the only function invoked
      by cpufreq_start_governor() that doesn't check cpufreq_suspended
      already, return 0 upfront if cpufreq_suspended is set.
      
      Fixes: 3bbf8fe3 (cpufreq: Always update current frequency before startig governor)
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      c9d9c929
  3. 10 Apr, 2016 1 commit
    • Rafael J. Wysocki's avatar
      intel_pstate: Avoid getting stuck in high P-states when idle · ffb81056
      Rafael J. Wysocki authored
      Jörg Otte reports that commit a4675fbc (cpufreq: intel_pstate:
      Replace timers with utilization update callbacks) caused the CPUs in
      his Haswell-based system to stay in the very high frequency region
      even if the system is completely idle.
      
      That turns out to be an existing problem in the intel_pstate driver's
      P-state selection algorithm for Core processors.  Namely, all
      decisions made by that algorithm are based on the average frequency
      of the CPU between sampling events and on the P-state requested on
      the last invocation, so it may get stuck at a very hight frequency
      even if the utilization of the CPU is very low (in fact, it may get
      stuck in a inadequate P-state regardless of the CPU utilization).
      The only way to kick it out of that limbo is a sufficiently long idle
      period (3 times longer than the prescribed sampling interval), but if
      that doesn't happen often enough (eg. due to a timing change like
      after the above commit), the P-state of the CPU may be inadequate
      pretty much all the time.
      
      To address the most egregious manifestations of that issue, reset the
      core_busy value used to determine the next P-state to request if the
      utilization of the CPU, determined with the help of the MPERF
      feedback register and the TSC, is below 1%.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771Reported-and-tested-by: default avatarJörg Otte <jrg.otte@gmail.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      ffb81056
  4. 08 Apr, 2016 17 commits
  5. 05 Apr, 2016 3 commits
  6. 01 Apr, 2016 9 commits
    • Rafael J. Wysocki's avatar
      cpufreq: schedutil: New governor based on scheduler utilization data · 9bdcb44e
      Rafael J. Wysocki authored
      Add a new cpufreq scaling governor, called "schedutil", that uses
      scheduler-provided CPU utilization information as input for making
      its decisions.
      
      Doing that is possible after commit 34e2c555 (cpufreq: Add
      mechanism for registering utilization update callbacks) that
      introduced cpufreq_update_util() called by the scheduler on
      utilization changes (from CFS) and RT/DL task status updates.
      In particular, CPU frequency scaling decisions may be based on
      the the utilization data passed to cpufreq_update_util() by CFS.
      
      The new governor is relatively simple.
      
      The frequency selection formula used by it depends on whether or not
      the utilization is frequency-invariant.  In the frequency-invariant
      case the new CPU frequency is given by
      
      	next_freq = 1.25 * max_freq * util / max
      
      where util and max are the last two arguments of cpufreq_update_util().
      In turn, if util is not frequency-invariant, the maximum frequency in
      the above formula is replaced with the current frequency of the CPU:
      
      	next_freq = 1.25 * curr_freq * util / max
      
      The coefficient 1.25 corresponds to the frequency tipping point at
      (util / max) = 0.8.
      
      All of the computations are carried out in the utilization update
      handlers provided by the new governor.  One of those handlers is
      used for cpufreq policies shared between multiple CPUs and the other
      one is for policies with one CPU only (and therefore it doesn't need
      to use any extra synchronization means).
      
      The governor supports fast frequency switching if that is supported
      by the cpufreq driver in use and possible for the given policy.
      In the fast switching case, all operations of the governor take
      place in its utilization update handlers.  If fast switching cannot
      be used, the frequency switch operations are carried out with the
      help of a work item which only calls __cpufreq_driver_target()
      (under a mutex) to trigger a frequency update (to a value already
      computed beforehand in one of the utilization update handlers).
      
      Currently, the governor treats all of the RT and DL tasks as
      "unknown utilization" and sets the frequency to the allowed
      maximum when updated from the RT or DL sched classes.  That
      heavy-handed approach should be replaced with something more
      subtle and specifically targeted at RT and DL tasks.
      
      The governor shares some tunables management code with the
      "ondemand" and "conservative" governors and uses some common
      definitions from cpufreq_governor.h, but apart from that it
      is stand-alone.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      9bdcb44e
    • Rafael J. Wysocki's avatar
      cpufreq: Support for fast frequency switching · b7898fda
      Rafael J. Wysocki authored
      Modify the ACPI cpufreq driver to provide a method for switching
      CPU frequencies from interrupt context and update the cpufreq core
      to support that method if available.
      
      Introduce a new cpufreq driver callback, ->fast_switch, to be
      invoked for frequency switching from interrupt context by (future)
      governors supporting that feature via (new) helper function
      cpufreq_driver_fast_switch().
      
      Add two new policy flags, fast_switch_possible, to be set by the
      cpufreq driver if fast frequency switching can be used for the
      given policy and fast_switch_enabled, to be set by the governor
      if it is going to use fast frequency switching for the given
      policy.  Also add a helper for setting the latter.
      
      Since fast frequency switching is inherently incompatible with
      cpufreq transition notifiers, make it possible to set the
      fast_switch_enabled only if there are no transition notifiers
      already registered and make the registration of new transition
      notifiers fail if fast_switch_enabled is set for at least one
      policy.
      
      Implement the ->fast_switch callback in the ACPI cpufreq driver
      and make it set fast_switch_possible during policy initialization
      as appropriate.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      b7898fda
    • Rafael J. Wysocki's avatar
      cpufreq: Move governor symbols to cpufreq.h · 379480d8
      Rafael J. Wysocki authored
      Move definitions of symbols related to transition latency and
      sampling rate to include/linux/cpufreq.h so they can be used by
      (future) goverernors located outside of drivers/cpufreq/.
      
      No functional changes.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      379480d8
    • Rafael J. Wysocki's avatar
      cpufreq: Move governor attribute set headers to cpufreq.h · 66893b6a
      Rafael J. Wysocki authored
      Move definitions and function headers related to struct gov_attr_set
      to include/linux/cpufreq.h so they can be used by (future) goverernors
      located outside of drivers/cpufreq/.
      
      No functional changes.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      66893b6a
    • Rafael J. Wysocki's avatar
      cpufreq: governor: Move abstract gov_attr_set code to seperate file · 2d0c58ad
      Rafael J. Wysocki authored
      Move abstract code related to struct gov_attr_set to a separate (new)
      file so it can be shared with (future) goverernors that won't share
      more code with "ondemand" and "conservative".
      
      No intentional functional changes.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      2d0c58ad
    • Rafael J. Wysocki's avatar
      cpufreq: governor: New data type for management part of dbs_data · 0dd3c1d6
      Rafael J. Wysocki authored
      In addition to fields representing governor tunables, struct dbs_data
      contains some fields needed for the management of objects of that
      type.  As it turns out, that part of struct dbs_data may be shared
      with (future) governors that won't use the common code used by
      "ondemand" and "conservative", so move it to a separate struct type
      and modify the code using struct dbs_data to follow.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      0dd3c1d6
    • Rafael J. Wysocki's avatar
      cpufreq: sched: Helpers to add and remove update_util hooks · 0bed612b
      Rafael J. Wysocki authored
      Replace the single helper for adding and removing cpufreq utilization
      update hooks, cpufreq_set_update_util_data(), with a pair of helpers,
      cpufreq_add_update_util_hook() and cpufreq_remove_update_util_hook(),
      and modify the users of cpufreq_set_update_util_data() accordingly.
      
      With the new helpers, the code using them doesn't need to worry
      about the internals of struct update_util_data and in particular
      it doesn't need to worry about populating the func field in it
      properly upfront.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      0bed612b
    • Rafael J. Wysocki's avatar
      Merge back intel_pstate fixes for v4.6. · 9fa64d64
      Rafael J. Wysocki authored
      * pm-cpufreq:
        intel_pstate: Avoid extra invocation of intel_pstate_sample()
        intel_pstate: Do not set utilization update hook too early
      9fa64d64
    • Rafael J. Wysocki's avatar
      intel_pstate: Avoid extra invocation of intel_pstate_sample() · febce40f
      Rafael J. Wysocki authored
      The initialization of intel_pstate for a given CPU involves populating
      the fields of its struct cpudata that represent the previous sample,
      but currently that is done in a problematic way.
      
      Namely, intel_pstate_init_cpu() makes an extra call to
      intel_pstate_sample() so it reads the current register values that
      will be used to populate the "previous sample" record during the
      next invocation of intel_pstate_sample().  However, after commit
      a4675fbc (cpufreq: intel_pstate: Replace timers with utilization
      update callbacks) that doesn't work for last_sample_time, because
      the time value is passed to intel_pstate_sample() as an argument now.
      Passing 0 to it from intel_pstate_init_cpu() is problematic, because
      that causes cpu->last_sample_time == 0 to be visible in
      get_target_pstate_use_performance() (and hence the extra
      cpu->last_sample_time > 0 check in there) and effectively allows
      the first invocation of intel_pstate_sample() from
      intel_pstate_update_util() to happen immediately after the
      initialization which may lead to a significant "turn on"
      effect in the governor algorithm.
      
      To mitigate that issue, rework the initialization to avoid the
      extra intel_pstate_sample() call from intel_pstate_init_cpu().
      Instead, make intel_pstate_sample() return false if it has been
      called with cpu->sample.time equal to zero, which will make
      intel_pstate_update_util() skip the sample in that case, and
      reset cpu->sample.time from intel_pstate_set_update_util_hook()
      to make the algorithm start properly every time the hook is set.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      febce40f
  7. 31 Mar, 2016 1 commit
    • Rafael J. Wysocki's avatar
      intel_pstate: Do not set utilization update hook too early · bb6ab52f
      Rafael J. Wysocki authored
      The utilization update hook in the intel_pstate driver is set too
      early, as it only should be set after the policy has been fully
      initialized by the core.  That may cause intel_pstate_update_util()
      to use incorrect data and put the CPUs into incorrect P-states as
      a result.
      
      To prevent that from happening, make intel_pstate_set_policy() set
      the utilization update hook instead of intel_pstate_init_cpu() so
      intel_pstate_update_util() only runs when all things have been
      initialized as appropriate.
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      bb6ab52f
  8. 26 Mar, 2016 4 commits
    • Linus Torvalds's avatar
      Linux 4.6-rc1 · f55532a0
      Linus Torvalds authored
      f55532a0
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · d5a38f6e
      Linus Torvalds authored
      Pull Ceph updates from Sage Weil:
       "There is quite a bit here, including some overdue refactoring and
        cleanup on the mon_client and osd_client code from Ilya, scattered
        writeback support for CephFS and a pile of bug fixes from Zheng, and a
        few random cleanups and fixes from others"
      
      [ I already decided not to pull this because of it having been rebased
        recently, but ended up changing my mind after all.  Next time I'll
        really hold people to it.  Oh well.   - Linus ]
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
        libceph: use KMEM_CACHE macro
        ceph: use kmem_cache_zalloc
        rbd: use KMEM_CACHE macro
        ceph: use lookup request to revalidate dentry
        ceph: kill ceph_get_dentry_parent_inode()
        ceph: fix security xattr deadlock
        ceph: don't request vxattrs from MDS
        ceph: fix mounting same fs multiple times
        ceph: remove unnecessary NULL check
        ceph: avoid updating directory inode's i_size accidentally
        ceph: fix race during filling readdir cache
        libceph: use sizeof_footer() more
        ceph: kill ceph_empty_snapc
        ceph: fix a wrong comparison
        ceph: replace CURRENT_TIME by current_fs_time()
        ceph: scattered page writeback
        libceph: add helper that duplicates last extent operation
        libceph: enable large, variable-sized OSD requests
        libceph: osdc->req_mempool should be backed by a slab pool
        libceph: make r_request msg_size calculation clearer
        ...
      d5a38f6e
    • Linus Torvalds's avatar
      Merge tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 698f415c
      Linus Torvalds authored
      Pull orangefs filesystem from Mike Marshall.
      
      This finally merges the long-pending orangefs filesystem, which has been
      much cleaned up with input from Al Viro over the last six months.  From
      the documentation file:
      
       "OrangeFS is an LGPL userspace scale-out parallel storage system.  It
        is ideal for large storage problems faced by HPC, BigData, Streaming
        Video, Genomics, Bioinformatics.
      
        Orangefs, originally called PVFS, was first developed in 1993 by Walt
        Ligon and Eric Blumer as a parallel file system for Parallel Virtual
        Machine (PVM) as part of a NASA grant to study the I/O patterns of
        parallel programs.
      
        Orangefs features include:
      
          - Distributes file data among multiple file servers
          - Supports simultaneous access by multiple clients
          - Stores file data and metadata on servers using local file system
            and access methods
          - Userspace implementation is easy to install and maintain
          - Direct MPI support
          - Stateless"
      
      see Documentation/filesystems/orangefs.txt for more in-depth details.
      
      * tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits)
        orangefs: fix orangefs_superblock locking
        orangefs: fix do_readv_writev() handling of error halfway through
        orangefs: have ->kill_sb() evict the VFS side of things first
        orangefs: sanitize ->llseek()
        orangefs-bufmap.h: trim unused junk
        orangefs: saner calling conventions for getting a slot
        orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
        orangefs: get rid of readdir_handle_s
        ornagefs: ensure that truncate has an up to date inode size
        orangefs: move code which sets i_link to orangefs_inode_getattr
        orangefs: remove needless wrapper around GFP_KERNEL
        orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
        orangefs: refactor inode type or link_target change detection
        orangefs: use new getattr for revalidate and remove old getattr
        orangefs: use new getattr in inode getattr and permission
        orangefs: use new orangefs_inode_getattr to get size in write and llseek
        orangefs: use new orangefs_inode_getattr to create new inodes
        orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
        orangefs: remove inode->i_lock wrapper
        orangefs: put register_chrdev immediately before register_filesystem
        ...
      698f415c
    • Linus Torvalds's avatar
      Merge tag 'ntb-4.6' of git://github.com/jonmason/ntb · b4cec5f6
      Linus Torvalds authored
      Pull NTB bug fixes from Jon Mason:
       "NTB bug fixes for tasklet from spinning forever, link errors,
        translation window setup, NULL ptr dereference, and ntb-perf errors.
      
        Also, a modification to the driver API that makes _addr functions
        optional"
      
      * tag 'ntb-4.6' of git://github.com/jonmason/ntb:
        NTB: Remove _addr functions from ntb_hw_amd
        NTB: Make _addr functions optional in the API
        NTB: Fix incorrect clean up routine in ntb_perf
        NTB: Fix incorrect return check in ntb_perf
        ntb: fix possible NULL dereference
        ntb: add missing setup of translation window
        ntb: stop link work when we do not have memory
        ntb: stop tasklet from spinning forever during shutdown.
        ntb: perf test: fix address space confusion
      b4cec5f6