1. 15 Jun, 2015 4 commits
    • Shailendra Verma's avatar
    • cpufreq: governor: Serialize governor callbacks · 732b6d61
      Viresh Kumar authored
      Several people have reported races in the cpufreq core around governors
      (only ondemand and conservative).
      
      There are at least two race scenarios present in governor code:
       (a) Concurrent access/updates of governor internal structures.
      
       It is possible that fields such as 'dbs_data->usage_count' are
       accessed simultaneously for different policies using the same governor
       structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). Because
       of this, we can dereference bad pointers.
      
       For example, consider a system with two CPUs with separate 'struct
       cpufreq_policy' instances, where CPU0 uses the ondemand governor and
       CPU1 uses powersave. CPU0 switches to powersave while CPU1 switches to
       ondemand:
      	CPU0				CPU1
      
      	store*				store*
      
      	cpufreq_governor_exit()		cpufreq_governor_init()
      					dbs_data = cdata->gdbs_data;
      
      	if (!--dbs_data->usage_count)
      		kfree(dbs_data);
      
      					dbs_data->usage_count++;
      					*Bad pointer dereference*
      
       There are other races possible between EXIT and START/STOP/LIMIT as
       well. It's really complicated.
      
       (b) Switching governor state in bad sequence:
      
       For example, trying to switch a governor to START state when the
       governor is in EXIT state. There are some checks present in
       __cpufreq_governor(), but they aren't sufficient as they compare events
       against 'policy->governor_enabled', whereas we need to take the
       governor's state into account, which can be used by multiple policies.
      
      These two issues need to be solved separately and the responsibility
      should be properly divided between cpufreq and governor core.
      
      The first problem is more about the governor core, as it needs to
      protect its structures properly. The second problem should be fixed
      in the cpufreq core instead of the governor, as it's all about the
      sequence of events.
      
      This patch is trying to solve only the first problem.
      
      There are two types of data we need to protect,
      - 'struct common_dbs_data': No matter what, there is going to be a
        single copy of this per governor.
      - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
        will have per-policy copy of this data, otherwise a single copy.
      
      Because of such complexities, the mutex present in 'struct dbs_data' is
      insufficient to solve our problem. For example we need to protect
      fetching of 'dbs_data' from different structures at the beginning of
      cpufreq_governor_dbs(), to make sure it isn't currently being updated.
      
      This can be fixed if we can guarantee serialization of event parsing
      code for an individual governor. This is best solved with a mutex per
      governor, and the placeholder for that is 'struct common_dbs_data'.
      
      And so this patch moves the mutex from 'struct dbs_data' to 'struct
      common_dbs_data' and takes it at the beginning and drops it at the end
      of cpufreq_governor_dbs().
      
      Tested with and without the following configuration options:
      
      CONFIG_LOCKDEP_SUPPORT=y
      CONFIG_DEBUG_RT_MUTEXES=y
      CONFIG_DEBUG_PI_LIST=y
      CONFIG_DEBUG_SPINLOCK=y
      CONFIG_DEBUG_MUTEXES=y
      CONFIG_DEBUG_LOCK_ALLOC=y
      CONFIG_PROVE_LOCKING=y
      CONFIG_LOCKDEP=y
      CONFIG_DEBUG_ATOMIC_SLEEP=y
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
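The locking change described in this commit can be sketched in user space as follows. This is only an illustration under stated assumptions, not the patch itself: a pthread mutex stands in for the kernel mutex, the two structures are reduced to the fields needed here, and `governor_event()`, `GOV_INIT` and `GOV_EXIT` are hypothetical names. The point it shows is that the lock is taken before 'dbs_data' is fetched, so an EXIT that frees the shared data cannot interleave with an INIT that is about to use it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Simplified stand-in for 'struct dbs_data' (illustrative fields only). */
struct dbs_data {
	int usage_count;
};

/* Simplified stand-in for 'struct common_dbs_data': one per governor. */
struct common_dbs_data {
	pthread_mutex_t mutex;      /* per-governor lock, models the moved mutex */
	struct dbs_data *gdbs_data; /* shared copy when the data is not per-policy */
};

enum gov_event { GOV_INIT, GOV_EXIT }; /* hypothetical event names */

/*
 * Models cpufreq_governor_dbs(): the lock is held from before 'dbs_data'
 * is fetched until the event has been handled completely.
 * Returns 0 on success, -1 on failure.
 */
int governor_event(struct common_dbs_data *cdata, enum gov_event event)
{
	int ret = 0;

	pthread_mutex_lock(&cdata->mutex);
	struct dbs_data *dbs_data = cdata->gdbs_data;

	switch (event) {
	case GOV_INIT:
		if (!dbs_data) {
			dbs_data = calloc(1, sizeof(*dbs_data));
			if (!dbs_data) {
				ret = -1;
				break;
			}
			cdata->gdbs_data = dbs_data;
		}
		dbs_data->usage_count++;
		break;
	case GOV_EXIT:
		if (!dbs_data) {
			ret = -1;
			break;
		}
		assert(dbs_data->usage_count > 0);
		if (!--dbs_data->usage_count) {
			free(dbs_data);
			cdata->gdbs_data = NULL; /* leave no stale pointer behind */
		}
		break;
	}

	pthread_mutex_unlock(&cdata->mutex);
	return ret;
}
```

A caller would initialize the structure once, for example with `PTHREAD_MUTEX_INITIALIZER` and a NULL `gdbs_data`, and then route every event for that governor through `governor_event()`.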
    • cpufreq: governor: split cpufreq_governor_dbs() · 714a2d9c
      Viresh Kumar authored
      cpufreq_governor_dbs() is hardly readable; it is just too big and
      complicated. Let's make it more readable by splitting out event-specific
      routines.
      
      The order of statements is changed in a few places, but that shouldn't
      bring any functional change.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
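The shape of such a split can be sketched as a small dispatcher that only routes each event to its own routine. This is a user-space illustration, not the kernel code: the `gov_state` structure, the `EV_*` constants and the handler names are all hypothetical, and each handler is reduced to a single observable state change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative event set; the log mentions INIT, EXIT, START, STOP and LIMIT. */
enum gov_event { EV_INIT, EV_EXIT, EV_START, EV_STOP, EV_LIMITS };

/* Minimal per-policy state, just enough to observe each handler. */
struct gov_state {
	bool initialized;
	bool running;
	int limit_updates;
};

/* One short routine per event instead of one long function body. */
static int gov_init(struct gov_state *s)   { s->initialized = true;  return 0; }
static int gov_exit(struct gov_state *s)   { s->initialized = false; return 0; }
static int gov_start(struct gov_state *s)  { s->running = true;      return 0; }
static int gov_stop(struct gov_state *s)   { s->running = false;     return 0; }
static int gov_limits(struct gov_state *s) { s->limit_updates++;     return 0; }

/* The dispatcher stays small: it validates its input and routes the event. */
int governor_dbs(struct gov_state *s, int event)
{
	assert(s != NULL);

	switch (event) {
	case EV_INIT:
		return gov_init(s);
	case EV_EXIT:
		return gov_exit(s);
	case EV_START:
		return gov_start(s);
	case EV_STOP:
		return gov_stop(s);
	case EV_LIMITS:
		return gov_limits(s);
	default:
		return -1; /* unknown event */
	}
}
```

Keeping the dispatcher this thin also gives a single obvious place to take and drop a lock around all event handling.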
    • cpufreq: governor: register notifier from cs_init() · 8e0484d2
      Viresh Kumar authored
      Notifiers are required only for the conservative governor, and the
      common governor code is unnecessarily polluted with them. Handle them
      from cs_init/exit() instead of cpufreq_governor_dbs().
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 10 Jun, 2015 7 commits
    • cpufreq: Remove cpufreq_update_policy() · 37829029
      Viresh Kumar authored
      cpufreq_update_policy() was kept as a separate routine earlier as it was
      handling migration of sysfs directories, which isn't the case anymore.
      It is only updating policy->cpu now and is called by a single caller.
      
      The WARN_ON() isn't really required anymore, as we are just updating the
      cpu now, not moving the sysfs directories.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: Restart governor as soon as possible · 9591becb
      Viresh Kumar authored
      __cpufreq_remove_dev_finish() is doing two things today:
      - Restarts the governor if some CPUs from the concerned policy are
        still online.
      - Frees the policy if all CPUs are offline.
      
      The first task of restarting the governor can be moved to
      __cpufreq_remove_dev_prepare() to restart the governor early. There is
      no race between _prepare() and _finish() as they would be handling
      completely different cases. _finish() will only be required if we are
      going to free the policy and that has nothing to do with restarting the
      governor.
      Original-by: Saravana Kannan <skannan@codeaurora.org>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
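The division of work between the two routines, as this commit message describes it, can be modeled roughly as below. This is a toy user-space sketch under stated assumptions: the `policy` structure, the bitmask of online CPUs and the function names `remove_dev_prepare()` and `remove_dev_finish()` are hypothetical stand-ins, and stopping or starting the governor is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy policy: bit n of online_mask set means CPU n is online. */
struct policy {
	unsigned int online_mask;
	bool governor_running;
	bool freed;
};

/* Models __cpufreq_remove_dev_prepare(): drop the CPU, restart early. */
void remove_dev_prepare(struct policy *p, unsigned int cpu)
{
	assert(cpu < 32);

	p->governor_running = false;        /* governor stopped for the update */
	p->online_mask &= ~(1u << cpu);
	if (p->online_mask)
		p->governor_running = true; /* restart as soon as possible */
}

/* Models __cpufreq_remove_dev_finish(): only acts when no CPU is left. */
void remove_dev_finish(struct policy *p)
{
	if (p->online_mask)
		return; /* governor already restarted, nothing to do */
	p->freed = true;
}
```

The two cases are disjoint: either some CPU remains and the restart already happened in the prepare step, or none remains and only the free path runs.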
    • cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free() · 3654c5cc
      Viresh Kumar authored
      cpufreq_policy_put_kobj() is actually part of freeing the policy and can
      be called from cpufreq_policy_free() directly instead of a separate
      call.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: Initialize policy->kobj while allocating policy · 2fc3384d
      Viresh Kumar authored
      policy->kobj is required to be initialized once in the lifetime of a
      policy. Currently we are initializing it from __cpufreq_add_dev(), and
      that doesn't look to be the best place for doing so, as we have to do
      this only in special cases (like: !recover_policy).
      
      We can initialize it from a more obvious place, cpufreq_policy_alloc(),
      and that will make the code look cleaner, especially the error handling
      part.
      
      The error handling part of __cpufreq_add_dev() was doing almost the same
      thing whether recover_policy is true or false. Fix that as well by
      always calling cpufreq_policy_put_kobj() with an additional parameter to
      skip the notification part of it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: Stop migrating sysfs files on hotplug · 87549141
      Viresh Kumar authored
      When we hot-unplug a cpu, we remove its sysfs cpufreq directory, and if
      the outgoing cpu was the owner of policy->kobj, we migrate the sysfs
      directory to another online cpu.
      
      This brings a few disadvantages:
      - Code Complexity
      - Slower hotplug/suspend/resume
      - sysfs file permissions are reset after all policy->cpus are offlined
      - CPUFreq stats history lost after all policy->cpus are offlined
      - Special management of sysfs stuff during suspend/resume
      
      To overcome these, this patch modifies the way sysfs directories are
      managed:
      - Select sysfs kobjects owner while initializing policy and don't change
        it during hotplugs. Track it with kobj_cpu created earlier.
      
      - Create symlinks for all related CPUs (can be offline) instead of
        affected CPUs on policy initialization and remove them only when the
        policy is freed.
      
      - Free policy structure only on the removal of cpufreq-driver and not
        during hotplug/suspend/resume, detected by checking 'struct
        subsys_interface *' (Valid only when called from
        subsys_interface_unregister() while unregistering driver).
      
      Apart from this, special care is taken to handle physical hotplug of
      CPUs, as we wouldn't remove sysfs links or remove policies on logical
      hotplugs. Physical hotplug happens in the following sequence.
      
      Hot removal:
      - CPU is offlined first, ~ 'echo 0 >
        /sys/devices/system/cpu/cpuX/online'
      - Then its device is removed along with all sysfs files, and the cpufreq
        core is notified with the cpufreq_remove_dev() callback from
        subsys-interface.
      
      Hot addition:
      - First the device along with its sysfs files is added, and the cpufreq
        core is notified with the cpufreq_add_dev() callback from
        subsys-interface.
      - CPU is onlined, ~ 'echo 1 > /sys/devices/system/cpu/cpuX/online'
      
      We call the same routines with both hotplug and subsys callbacks, and we
      sense physical hotplug with cpu_offline() check in subsys callback. We
      can handle most of the stuff with regular hotplug callback paths and
      add/remove cpufreq sysfs links or free policy from subsys callbacks.
      Original-by: Saravana Kannan <skannan@codeaurora.org>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
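The ownership rules listed in this commit can be pictured with a small user-space model. Everything in it is an assumption made for illustration: CPU sets are plain bitmasks, `links` tracks which related CPUs have a symlink (assumed here to be every related CPU except the owner), and the boolean `sif` models whether a non-NULL 'struct subsys_interface *' was passed. It is a sketch of the described behavior, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define CPU_BIT(cpu) (1u << (cpu))

struct policy {
	unsigned int related;  /* all CPUs sharing the policy, online or not */
	unsigned int online;   /* CPUs currently online */
	unsigned int links;    /* CPUs that have a cpufreq symlink */
	unsigned int kobj_cpu; /* owner of the sysfs directory, fixed at init */
	bool freed;
};

/* The owner is chosen once; links are created for all related CPUs. */
void policy_init(struct policy *p, unsigned int cpu, unsigned int related)
{
	assert(related & CPU_BIT(cpu));

	p->related = related;
	p->online = CPU_BIT(cpu);
	p->kobj_cpu = cpu;
	p->links = related & ~CPU_BIT(cpu);
	p->freed = false;
}

/* Logical hot-add: only the online mask changes. */
void hotplug_online(struct policy *p, unsigned int cpu)
{
	p->online |= CPU_BIT(cpu) & p->related;
}

/*
 * Shared removal path. A CPU that is already offline can only arrive here
 * from the subsys callback, which means physical removal: drop its link.
 * Otherwise the CPU goes offline, and the policy is freed only when the
 * driver is being unregistered (sif is true).
 */
void remove_dev(struct policy *p, unsigned int cpu, bool sif)
{
	if (!(p->online & CPU_BIT(cpu))) {
		p->links &= ~CPU_BIT(cpu);
		return;
	}

	p->online &= ~CPU_BIT(cpu);
	if (sif) {
		p->links = 0;
		p->freed = true;
	}
}
```

In this model a logical offline of the owner leaves `kobj_cpu` and `links` untouched, which is the property the commit is after: nothing in sysfs moves during ordinary hotplug.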
    • cpufreq: Don't allow updating inactive policies from sysfs · 11e584cf
      Viresh Kumar authored
      Later commits would change the way policies are managed today. Policies
      wouldn't be freed on cpu hotplug (currently they aren't freed only for
      suspend), and while the CPU is offline, the sysfs cpufreq files would
      still be present.
      
      A user may accidentally try to update the sysfs files in the following
      directory: '/sys/devices/system/cpu/cpuX/cpufreq/'. That would result
      in undefined behavior, as the policy wouldn't be active then.
      
      Apart from updating the store() routine, we also update __cpufreq_get(),
      which can call cpufreq_out_of_sync(). The latter routine tries to update
      policy->cur and starts notifying the kernel about it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Saravana Kannan <skannan@codeaurora.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
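A guard of the kind this commit describes might look like the following user-space sketch. The names `policy_inactive()`, `store_max_freq()` and `get_cur_freq()` are hypothetical, a policy is treated as inactive when its CPU mask is empty, and both the `-ENODEV` return value and the 0 returned for an inactive read are assumptions made for the example rather than values taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy policy: 'cpus' is a bitmask of online CPUs managed by the policy. */
struct policy {
	unsigned int cpus;
	unsigned int cur;      /* current frequency in kHz */
	unsigned int max_freq; /* user-settable limit in kHz */
};

/* Assumption: a policy with no online CPUs counts as inactive. */
static bool policy_inactive(const struct policy *p)
{
	return p->cpus == 0;
}

/* Models a sysfs store() handler: refuse writes to an inactive policy. */
int store_max_freq(struct policy *p, unsigned int khz)
{
	assert(p != NULL);

	if (policy_inactive(p))
		return -ENODEV; /* assumed error code, for illustration only */

	p->max_freq = khz;
	return 0;
}

/* Models the read path: report 0 instead of touching stale state. */
unsigned int get_cur_freq(const struct policy *p)
{
	if (policy_inactive(p))
		return 0;
	return p->cur;
}
```

With this check in place, a write that arrives while every CPU of the policy is offline leaves the stored limit unchanged.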
    • intel_pstate: Force setting target pstate when required · 6c1e4591
      Doug Smythies authored
      During initialization and exit it is possible that the target pstate
      might not actually be set. As a result, the driver and the processor
      can get out of sync and, under some conditions, the driver might never
      send the processor the proper target pstate.
      
      This patch adds a bypass or do_checks flag to the call to
      intel_pstate_set_pstate(). With bypass, both the clamp checks and the
      "do not send if it is the same as last time" check are skipped. With
      do_checks, as before, the current policy clamp checks are applied and
      nothing is sent if the new target is the same as the old one.
      Signed-off-by: Doug Smythies <dsmythies@telus.net>
      Reported-by: Marien Zwart <marien.zwart@gmail.com>
      Reported-by: Alex Lochmann <alexander.lochmann@tu-dortmund.de>
      Reported-by: Piotr Kołaczkowski <pkolaczk@gmail.com>
      Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
      Tested-by: Marien Zwart <marien.zwart@gmail.com>
      Tested-by: Doug Smythies <dsmythies@telus.net>
      [ rjw: Dropped pointless symbol definitions, rebased ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
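The two modes of the flag can be illustrated with a short user-space model. The `cpu_state` structure, the `writes` counter that stands in for the actual write to the processor, and the function name `set_pstate()` are all hypothetical; only the behavior of the two modes follows the description in this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU state; 'writes' counts simulated writes to the processor. */
struct cpu_state {
	int min_pstate;
	int max_pstate;
	int current_pstate;
	int writes;
};

static int clamp_int(int v, int lo, int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * do_checks == true:  clamp to the policy limits and skip the write if the
 *                     target equals the last value sent.
 * do_checks == false: bypass both checks and always send the target.
 */
void set_pstate(struct cpu_state *cpu, int pstate, bool do_checks)
{
	assert(cpu->min_pstate <= cpu->max_pstate);

	if (do_checks) {
		pstate = clamp_int(pstate, cpu->min_pstate, cpu->max_pstate);
		if (pstate == cpu->current_pstate)
			return;
	}

	cpu->current_pstate = pstate;
	cpu->writes++; /* stands in for programming the hardware */
}
```

The bypass mode matters when the driver's cached value may not match the hardware, for instance during initialization or exit: an unchanged target is still sent, so the two sides are brought back in step.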
  3. 09 Jun, 2015 1 commit
  4. 22 May, 2015 2 commits
  5. 15 May, 2015 6 commits
  6. 14 May, 2015 1 commit
  7. 12 May, 2015 1 commit
  8. 07 May, 2015 5 commits
  9. 05 May, 2015 1 commit
  10. 04 May, 2015 8 commits
  11. 03 May, 2015 4 commits