1. 04 May, 2016 1 commit
    • intel_pstate: Fix intel_pstate_get() · 6d45b719
      Rafael J. Wysocki authored
      After commit 8fa520af "intel_pstate: Remove freq calculation from
      intel_pstate_calc_busy()" intel_pstate_get() calls get_avg_frequency()
      to compute the average frequency, which is problematic for two reasons.
      
      First, intel_pstate_get() may be invoked before the driver reads the
      CPU feedback registers for the first time and if that happens,
      get_avg_frequency() will attempt to divide by zero.
      
      Second, the get_avg_frequency() call in intel_pstate_get() is racy
      with respect to intel_pstate_sample() and it may end up returning
      completely meaningless values for this reason.
      
      Moreover, after commit 7349ec04 "intel_pstate: Move
      intel_pstate_calc_busy() into get_target_pstate_use_performance()"
      sample.core_pct_busy is never computed on Atom, but it is used in
      intel_pstate_adjust_busy_pstate() in that case too.
      
      To address those problems, notice that if sample.core_pct_busy
      were used in the average frequency computation carried out by
      get_avg_frequency(), both the divide-by-zero problem and the
      race with respect to intel_pstate_sample() would be avoided.
      
      Accordingly, move the invocation of intel_pstate_calc_busy() from
      get_target_pstate_use_performance() to intel_pstate_update_util(),
      which also will take care of the uninitialized sample.core_pct_busy
      on Atom, and modify get_avg_frequency() to use sample.core_pct_busy
      as per the above.
      Reported-by: kernel test robot <ying.huang@linux.intel.com>
      Link: http://marc.info/?l=linux-kernel&m=146226437623173&w=4
      Fixes: 8fa520af "intel_pstate: Remove freq calculation from intel_pstate_calc_busy()"
      Fixes: 7349ec04 "intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()"
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      6d45b719
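The idea in the commit above can be sketched as follows. This is a minimal illustration, not the driver's code: the `*_sketch` names and struct layouts are hypothetical, and plain integer arithmetic stands in for the driver's fixed-point helpers. It assumes core_pct_busy is a percentage of the maximum frequency and stays 0 until the first sample is taken, so the computation needs no division by a sampled counter delta.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's per-CPU data. */
struct sample_sketch {
    int32_t core_pct_busy;   /* percent of max frequency; 0 before the first sample */
};

struct cpu_sketch {
    struct sample_sketch sample;
    int max_pstate_physical; /* highest P-state number */
    int scaling;             /* kHz per P-state step */
};

/*
 * Average frequency in kHz derived only from core_pct_busy.
 * The only division is by the constant 100, so an unsampled CPU
 * yields 0 instead of triggering a divide by zero.
 */
static int get_avg_frequency_sketch(const struct cpu_sketch *cpu)
{
    int64_t max_khz = (int64_t)cpu->max_pstate_physical * cpu->scaling;

    return (int)(cpu->sample.core_pct_busy * max_khz / 100);
}
```

Because the function reads a single value that is already computed, it also does not depend on several counter fields being mutually consistent while a new sample is being taken.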
  2. 02 May, 2016 1 commit
  3. 28 Apr, 2016 1 commit
    • cpufreq: st: enable selective initialization based on the platform · 2482bc31
      Sudeep Holla authored
      The sti-cpufreq driver unconditionally registers the cpufreq-dt driver,
      which causes issues on a multi-platform build. For example, on the Vexpress
      TC2 platform, we get the following error on boot:
      
      cpu cpu0: OPP-v2 not supported
      cpu cpu0: Not doing voltage scaling
      cpu: dev_pm_opp_of_cpumask_add_table: couldn't find opp table
      	for cpu:0, -19
      cpu cpu0: dev_pm_opp_get_max_volt_latency: Invalid regulator (-6)
      ...
      arm_big_little: bL_cpufreq_register: Failed registering platform driver:
      		vexpress-spc, err: -17
      
      The actual driver fails to initialise as cpufreq-dt is probed
      successfully, which is incorrect. This issue can happen to any platform
      not using cpufreq-dt in a multi-platform build.
      
      This patch adds a check to do selective initialization of the driver.
      
      Fixes: ab0ea257 (cpufreq: st: Provide runtime initialised driver for ST's platforms)
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Lee Jones <lee.jones@linaro.org>
      Cc: 4.5+ <stable@vger.kernel.org> # 4.5+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      2482bc31
  4. 25 Apr, 2016 2 commits
  5. 18 Apr, 2016 1 commit
    • cpufreq: Abort cpufreq_update_current_freq() for cpufreq_suspended set · c9d9c929
      Rafael J. Wysocki authored
      Since governor operations are generally skipped if cpufreq_suspended
      is set, cpufreq_start_governor() should do nothing in that case.
      
      That function is called in the cpufreq_online() path, and may also
      be called from cpufreq_offline() in some cases, which are invoked
      by the nonboot CPUs disabling/enabling code during system suspend
      to RAM and resume.  That happens when all devices have been
      suspended, so if the cpufreq driver relies on things like I2C to
      get the current frequency, it may not be ready to do that then.
      
      To prevent problems from happening for this reason, make
      cpufreq_update_current_freq(), which is the only function invoked
      by cpufreq_start_governor() that doesn't check cpufreq_suspended
      already, return 0 upfront if cpufreq_suspended is set.
      
      Fixes: 3bbf8fe3 (cpufreq: Always update current frequency before startig governor)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      c9d9c929
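The early return described in the commit above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `*_sketch` names, the simplified policy struct, and the fake hardware read that stands in for a driver callback which might need a bus such as I2C.

```c
#include <stdbool.h>

/* Stand-in for the core's cpufreq_suspended flag. */
static bool cpufreq_suspended_sketch;

/* Counts how often the (hypothetical) hardware was touched. */
static int fake_reads_sketch;

/* Hypothetical driver callback that would talk to hardware. */
static unsigned int fake_hw_read_sketch(void)
{
    fake_reads_sketch++;
    return 1200000; /* kHz */
}

struct policy_sketch {
    unsigned int cur;             /* last known frequency in kHz */
    unsigned int (*get)(void);    /* driver's "read current frequency" hook */
};

/* Returns the refreshed frequency, or 0 if nothing was done. */
static unsigned int update_current_freq_sketch(struct policy_sketch *policy)
{
    unsigned int new_freq;

    /* Bail out up front: devices may already be suspended. */
    if (cpufreq_suspended_sketch)
        return 0;

    new_freq = policy->get();
    if (!new_freq)
        return 0;

    policy->cur = new_freq;
    return new_freq;
}
```

The point of checking the flag first is that the driver callback is never reached while the flag is set, so the cached frequency is left untouched during suspend and resume.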
  6. 10 Apr, 2016 1 commit
    • intel_pstate: Avoid getting stuck in high P-states when idle · ffb81056
      Rafael J. Wysocki authored
      Jörg Otte reports that commit a4675fbc (cpufreq: intel_pstate:
      Replace timers with utilization update callbacks) caused the CPUs in
      his Haswell-based system to stay in the very high frequency region
      even if the system is completely idle.
      
      That turns out to be an existing problem in the intel_pstate driver's
      P-state selection algorithm for Core processors.  Namely, all
      decisions made by that algorithm are based on the average frequency
      of the CPU between sampling events and on the P-state requested on
      the last invocation, so it may get stuck at a very high frequency
      even if the utilization of the CPU is very low (in fact, it may get
      stuck in an inadequate P-state regardless of the CPU utilization).
      The only way to kick it out of that limbo is a sufficiently long idle
      period (3 times longer than the prescribed sampling interval), but if
      that doesn't happen often enough (e.g. due to a timing change like
      after the above commit), the P-state of the CPU may be inadequate
      pretty much all the time.
      
      To address the most egregious manifestations of that issue, reset the
      core_busy value used to determine the next P-state to request if the
      utilization of the CPU, determined with the help of the MPERF
      feedback register and the TSC, is below 1%.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=115771
      Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ffb81056
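A rough sketch of the 1% utilization check described in the commit above follows. The function name and parameters are hypothetical, and integer arithmetic replaces the driver's fixed-point math. It assumes utilization is taken as the ratio of the MPERF delta to the TSC delta over the sampling window.

```c
#include <stdint.h>

/*
 * Returns the core_busy value to feed into P-state selection.
 * If the CPU spent less than 1% of the window in C0 (MPERF delta
 * below 1% of the TSC delta), core_busy is reset to 0 so that the
 * algorithm cannot stay stuck at a high P-state while idle.
 * Assumes the deltas are small enough that 100 * mperf_delta
 * does not overflow 64 bits.
 */
static int32_t adjust_core_busy_sketch(int32_t core_busy,
                                       uint64_t mperf_delta,
                                       uint64_t tsc_delta)
{
    if (tsc_delta == 0)
        return core_busy; /* no usable window, leave the value alone */

    if (100 * mperf_delta < tsc_delta)
        return 0;

    return core_busy;
}
```

For example, with a TSC delta of 1000 and an MPERF delta of 5, utilization is 0.5% and the value is reset; at exactly 1% it is left unchanged.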
  7. 05 Apr, 2016 3 commits
  8. 01 Apr, 2016 1 commit
    • intel_pstate: Avoid extra invocation of intel_pstate_sample() · febce40f
      Rafael J. Wysocki authored
      The initialization of intel_pstate for a given CPU involves populating
      the fields of its struct cpudata that represent the previous sample,
      but currently that is done in a problematic way.
      
      Namely, intel_pstate_init_cpu() makes an extra call to
      intel_pstate_sample() so it reads the current register values that
      will be used to populate the "previous sample" record during the
      next invocation of intel_pstate_sample().  However, after commit
      a4675fbc (cpufreq: intel_pstate: Replace timers with utilization
      update callbacks) that doesn't work for last_sample_time, because
      the time value is passed to intel_pstate_sample() as an argument now.
      Passing 0 to it from intel_pstate_init_cpu() is problematic, because
      that causes cpu->last_sample_time == 0 to be visible in
      get_target_pstate_use_performance() (and hence the extra
      cpu->last_sample_time > 0 check in there) and effectively allows
      the first invocation of intel_pstate_sample() from
      intel_pstate_update_util() to happen immediately after the
      initialization which may lead to a significant "turn on"
      effect in the governor algorithm.
      
      To mitigate that issue, rework the initialization to avoid the
      extra intel_pstate_sample() call from intel_pstate_init_cpu().
      Instead, make intel_pstate_sample() return false if it has been
      called with cpu->sample.time equal to zero, which will make
      intel_pstate_update_util() skip the sample in that case, and
      reset cpu->sample.time from intel_pstate_set_update_util_hook()
      to make the algorithm start properly every time the hook is set.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      febce40f
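The "skip the first sample" behavior described in the commit above might look roughly like this. The struct and function names are hypothetical and only one counter (MPERF) is modeled. The sketch assumes the sample time is reset to zero whenever the update hook is set, and that the caller ignores a sample for which the function returns false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced version of the per-CPU sampling state. */
struct sample_state_sketch {
    uint64_t time;             /* time of the most recent sample, 0 after reset */
    uint64_t last_sample_time; /* time of the sample before that */
    uint64_t prev_mperf;       /* raw counter value at the previous sample */
    uint64_t mperf_delta;      /* counter progress between the two samples */
};

/*
 * Records a new sample. Returns false if there was no valid previous
 * sample (time was zero), in which case the delta is meaningless and
 * the caller should skip P-state adjustment for this round.
 */
static bool take_sample_sketch(struct sample_state_sketch *s,
                               uint64_t now, uint64_t mperf)
{
    s->last_sample_time = s->time;
    s->time = now;
    s->mperf_delta = mperf - s->prev_mperf;
    s->prev_mperf = mperf;

    return s->last_sample_time != 0;
}
```

With this shape, the first call after a reset only populates the "previous sample" fields, and the second call is the first one whose delta is used.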
  9. 31 Mar, 2016 1 commit
    • intel_pstate: Do not set utilization update hook too early · bb6ab52f
      Rafael J. Wysocki authored
      The utilization update hook in the intel_pstate driver is set too
      early, as it only should be set after the policy has been fully
      initialized by the core.  That may cause intel_pstate_update_util()
      to use incorrect data and put the CPUs into incorrect P-states as
      a result.
      
      To prevent that from happening, make intel_pstate_set_policy() set
      the utilization update hook instead of intel_pstate_init_cpu() so
      intel_pstate_update_util() only runs when all things have been
      initialized as appropriate.
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      bb6ab52f
  10. 22 Mar, 2016 7 commits
  11. 19 Mar, 2016 2 commits
    • cpufreq: acpi-cpufreq: Clean up hot plug notifier callback · ed72662a
      Richard Cochran authored
      This driver has two issues.  First, it tries to fiddle with the hot
      plugged CPU's MSR on the UP_PREPARE event, at a time when the CPU is
      not yet online.  Second, the driver sets the "boost-disable" bit for a
      CPU when going down, but does not clear the bit again if the CPU comes
      up again due to DOWN_FAILED.
      
      This patch fixes the issues by changing the driver to react to the
      ONLINE/DOWN_FAILED events instead of UP_PREPARE.  As an added benefit,
      the driver also becomes symmetric with respect to the hot plug
      mechanism.
      Signed-off-by: Richard Cochran <rcochran@linutronix.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      ed72662a
    • intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts · fdfdb2b1
      Rafael J. Wysocki authored
      After commit a4675fbc (cpufreq: intel_pstate: Replace timers with
      utilization update callbacks) wrmsrl_on_cpu() cannot be called in the
      intel_pstate_adjust_busy_pstate() path as that is executed with
      disabled interrupts.  However, atom_set_pstate() called from there
      via intel_pstate_set_pstate() uses wrmsrl_on_cpu() to update the
      IA32_PERF_CTL MSR which triggers the WARN_ON_ONCE() in
      smp_call_function_single().
      
      The reason why wrmsrl_on_cpu() is used by atom_set_pstate() is
      because intel_pstate_set_pstate() calling it is also invoked during
      the initialization and cleanup of the driver and in those cases it is
      not guaranteed to be run on the CPU that is being updated.  However,
      in the case when intel_pstate_set_pstate() is called by
      intel_pstate_adjust_busy_pstate(), wrmsrl() can be used to update
      the register safely.  Moreover, intel_pstate_set_pstate() already
      contains code that only is executed if the function is called by
      intel_pstate_adjust_busy_pstate() and there is a special argument
      passed to it because of that.
      
      To fix the problem at hand, rearrange the code taking the above
      observations into account.
      
      First, replace the ->set() callback in struct pstate_funcs with a
      ->get_val() one that will return the value to be written to the
      IA32_PERF_CTL MSR without updating the register.
      
      Second, split intel_pstate_set_pstate() into two functions,
      intel_pstate_update_pstate() to be called by
      intel_pstate_adjust_busy_pstate() that will contain all of the
      intel_pstate_set_pstate() code which only needs to be executed in
      that case and will use wrmsrl() to update the MSR (after obtaining
      the value to write to it from the ->get_val() callback), and
      intel_pstate_set_min_pstate() to be invoked during the
      initialization and cleanup that will set the P-state to the
      minimum one and will update the MSR using wrmsrl_on_cpu().
      
      Finally, move the code shared between intel_pstate_update_pstate()
      and intel_pstate_set_min_pstate() to a new static inline function
      intel_pstate_record_pstate() and make them both call it.
      
      Of course, that unifies the handling of the IA32_PERF_CTL MSR writes
      between Atom and Core.
      
      Fixes: a4675fbc (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
      Reported-and-tested-by: Josh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      fdfdb2b1
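The split described in the commit above can be modeled in user space as follows. This is only a sketch under stated assumptions: the `fake_*` functions stand in for wrmsrl() and wrmsrl_on_cpu(), an array stands in for the per-CPU IA32_PERF_CTL register, all `*_sketch` names are hypothetical, and the value encoding (P-state shifted left by 8 bits) is a simplification of what a Core-style ->get_val() callback would return.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 4

/* Stand-in for each CPU's IA32_PERF_CTL register. */
static uint64_t fake_perf_ctl[NR_CPUS_SKETCH];
/* CPU the code is "running on" in this model. */
static int fake_current_cpu;

/* Local write: only valid for the CPU we are running on. */
static void fake_wrmsrl(uint64_t val)
{
    fake_perf_ctl[fake_current_cpu] = val;
}

/* Cross-CPU write: may target any CPU, needs interrupts enabled in the kernel. */
static void fake_wrmsrl_on_cpu(int cpu, uint64_t val)
{
    fake_perf_ctl[cpu] = val;
}

struct cpudata_sketch {
    int cpu;
    int current_pstate;
    int min_pstate;
    uint64_t (*get_val)(int pstate); /* computes the register value, writes nothing */
};

static uint64_t core_get_val_sketch(int pstate)
{
    return (uint64_t)pstate << 8;
}

/* Bookkeeping shared by both paths. */
static inline void record_pstate_sketch(struct cpudata_sketch *c, int pstate)
{
    c->current_pstate = pstate;
}

/* Hot path: runs on the target CPU, so a local write is enough. */
static void update_pstate_sketch(struct cpudata_sketch *c, int pstate)
{
    if (pstate == c->current_pstate)
        return;

    record_pstate_sketch(c, pstate);
    fake_wrmsrl(c->get_val(pstate));
}

/* Init/cleanup path: may run on another CPU, so use the cross-CPU write. */
static void set_min_pstate_sketch(struct cpudata_sketch *c)
{
    record_pstate_sketch(c, c->min_pstate);
    fake_wrmsrl_on_cpu(c->cpu, c->get_val(c->min_pstate));
}
```

Separating "compute the value" from "write the register" is what lets the two callers choose different write primitives while sharing the same per-model encoding.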
  12. 18 Mar, 2016 1 commit
  13. 10 Mar, 2016 7 commits
  14. 09 Mar, 2016 11 commits