1. 16 Jun, 2015 2 commits
    • cpufreq: dt: allow driver to boot automatically · 07949bf9
      Felipe Balbi authored
      By adding the missing MODULE_ALIAS(), cpufreq-dt
      can be autoloaded by udev/systemd.
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      Acked-by: Nishanth Menon <nm@ti.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • intel_pstate: Fix overflow in busy_scaled due to long delay · 7180dddf
      Prarit Bhargava authored
      The kernel may delay interrupts for a long time which can result in timers
      being delayed. If this occurs the intel_pstate driver will crash with a
      divide by zero error:
      
      divide error: 0000 [#1] SMP
      Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul
       crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
      CPU: 113 PID: 0 Comm: swapper/113 Tainted: G        W   --------------   3.10.0-229.1.2.el7.x86_64 #1
      Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014
      task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000
      RIP: 0010:[<ffffffff814a9279>]  [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
      RSP: 0018:ffff883fff4e3db8  EFLAGS: 00010206
      RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d
      RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001
      R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5
      R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246
      FS:  0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071
       ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd
       ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100
      Call Trace:
       <IRQ>
      
       [<ffffffff81099dc1>] ? run_posix_cpu_timers+0x51/0x840
       [<ffffffff810b69dd>] ? trigger_load_balance+0x5d/0x200
       [<ffffffff814a9100>] ? pid_param_set+0x130/0x130
       [<ffffffff8107df56>] call_timer_fn+0x36/0x110
       [<ffffffff814a9100>] ? pid_param_set+0x130/0x130
       [<ffffffff8107fdcf>] run_timer_softirq+0x21f/0x320
       [<ffffffff81077b2f>] __do_softirq+0xef/0x280
       [<ffffffff816156dc>] call_softirq+0x1c/0x30
       [<ffffffff81015d95>] do_softirq+0x65/0xa0
       [<ffffffff81077ec5>] irq_exit+0x115/0x120
       [<ffffffff81616355>] smp_apic_timer_interrupt+0x45/0x60
       [<ffffffff81614a1d>] apic_timer_interrupt+0x6d/0x80
       <EOI>
      
       [<ffffffff814a9c32>] ? cpuidle_enter_state+0x52/0xc0
       [<ffffffff814a9c28>] ? cpuidle_enter_state+0x48/0xc0
       [<ffffffff814a9d65>] cpuidle_idle_call+0xc5/0x200
       [<ffffffff8101d14e>] arch_cpu_idle+0xe/0x30
       [<ffffffff810c67c1>] cpu_startup_entry+0xf1/0x290
       [<ffffffff8104228a>] start_secondary+0x1ba/0x230
      Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
      RIP  [<ffffffff814a9279>] intel_pstate_timer_func+0x179/0x3d0
       RSP <ffff883fff4e3db8>
      
      The kernel values for cpudata for CPU 113 were:
      
      struct cpudata {
        cpu = 113,
        timer = {
          entry = {
            next = 0x0,
            prev = 0xdead000000200200
          },
          expires = 8357799745,
          base = 0xffff883fe84ec001,
          function = 0xffffffff814a9100 <intel_pstate_timer_func>,
          data = 18446612406765768960,
      <snip>
          i_gain = 0,
          d_gain = 0,
          deadband = 0,
          last_err = 22489
        },
        last_sample_time = {
          tv64 = 4063132438017305
        },
        prev_aperf = 287326796397463,
        prev_mperf = 251427432090198,
        sample = {
          core_pct_busy = 23081,
          aperf = 2937407,
          mperf = 3257884,
          freq = 2524484,
          time = {
            tv64 = 4063149215234118
          }
        }
      }
      
      which results in the time between samples = sample.time - last_sample_time
      = 4063149215234118 - 4063132438017305 = 16777216813 ns, which is 16.777 seconds.
      
      The duration between reads of the APERF and MPERF registers overflowed an
      s32-sized integer in intel_pstate_get_scaled_busy()'s call to div_fp().  The
      result is that int_tofp(duration_us) == 0, and the kernel attempts to divide by 0.
      
      While the kernel shouldn't be delaying for a long time, it can and does
      happen and the intel_pstate driver should not panic in this situation.  This
      patch changes the div_fp() function to use div64_s64() to allow for "long"
      division.  This will avoid the overflow condition on long delays.
      
      [v2]: use div64_s64() in div_fp()
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
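The overflow described in this commit can be reproduced in userspace. The following is a minimal sketch, not the driver code: it assumes the driver's fixed-point format uses 8 fractional bits (`FRAC_BITS` and `int_tofp` follow the driver's naming, but the value 8 is not stated in the commit message), and `narrow_to_s32`, `divisor_truncates_to_zero`, and `div_fp64` are hypothetical helper names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 8 /* assumed number of fractional bits */

/* Convert an integer to fixed point, computed in 64 bits. */
#define int_tofp(X) ((int64_t)(X) * (1 << FRAC_BITS))

/* Models passing a 64-bit value through an s32 parameter (hypothetical helper). */
static int32_t narrow_to_s32(int64_t v)
{
    return (int32_t)(uint32_t)v;
}

/* Returns 1 when a non-zero duration becomes a zero divisor after narrowing. */
static int divisor_truncates_to_zero(int64_t duration_us)
{
    return duration_us != 0 && narrow_to_s32(int_tofp(duration_us)) == 0;
}

/* 64-bit fixed-point division, the shape a div64_s64()-based div_fp() takes. */
static int64_t div_fp64(int64_t x, int64_t y)
{
    assert(y != 0);
    return (x * (1 << FRAC_BITS)) / y;
}
```

With the sample gap from the dump, 16777216813 ns is 16777216 us, which is exactly 2^24; shifting by 8 bits gives 2^32, whose low 32 bits are all zero. Keeping the operands in 64 bits leaves the divisor non-zero.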
  2. 15 Jun, 2015 6 commits
    • cpufreq: qoriq: optimize the CPU frequency switching time · 8a95c144
      Tang Yuantian authored
      Each time the CPU switches its frequency, the clock nodes in the
      DTS are walked to find the proper clock source. This is
      very time-consuming: it takes 500+ us on T4240, for example.
      In addition, the switching time varies from clock to clock.
      To optimize this, each input clock of the CPU is cached, so that
      it can be picked up instantly when needed.
      
      For each CPU, each input clock is stored as a pointer, which
      takes 4 or 8 bytes of memory, and there are normally only several
      input clocks per CPU, so the cache does not take much memory.
      Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
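The caching idea in this commit can be sketched as a per-CPU table of parent-clock pointers that is filled once and then indexed directly on every frequency switch. This is a userspace illustration only: `struct clk` here is a stand-in for the kernel type, and `cpu_clk_cache`, `cache_parents`, `pick_parent`, and `MAX_PARENTS` are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct clk (illustration only). */
struct clk {
    const char *name;
    unsigned long rate_hz;
};

#define MAX_PARENTS 4 /* assumed upper bound on input clocks per CPU */

/* One pointer per input clock: 4 or 8 bytes each, as the commit notes. */
struct cpu_clk_cache {
    const struct clk *pclk[MAX_PARENTS];
    size_t nr;
};

/* Done once at init time, replacing the per-switch walk of the clock nodes. */
static void cache_parents(struct cpu_clk_cache *c,
                          const struct clk *parents, size_t n)
{
    assert(c != NULL && n <= MAX_PARENTS);
    for (size_t i = 0; i < n; i++)
        c->pclk[i] = &parents[i];
    c->nr = n;
}

/* Constant-time lookup on each frequency switch; NULL if out of range. */
static const struct clk *pick_parent(const struct cpu_clk_cache *c, size_t index)
{
    return index < c->nr ? c->pclk[index] : NULL;
}
```

The trade-off is the one the commit message states: a few pointers of memory per CPU in exchange for a lookup cost that no longer depends on which clock is selected.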
    • Shailendra Verma authored
    • Shailendra Verma authored
    • cpufreq: governor: Serialize governor callbacks · 732b6d61
      Viresh Kumar authored
      There are several races reported in cpufreq core around governors (only
      ondemand and conservative) by different people.
      
      There are at least two race scenarios present in governor code:
       (a) Concurrent access/updates of governor internal structures.
      
       It is possible that fields such as 'dbs_data->usage_count', etc.  are
       accessed simultaneously for different policies using same governor
       structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And
       because of this we can dereference bad pointers.
      
       For example consider a system with two CPUs with separate 'struct
       cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave.
       CPU0 switching to powersave and CPU1 to ondemand:
      	CPU0				CPU1
      
      	store*				store*
      
      	cpufreq_governor_exit()		cpufreq_governor_init()
      					dbs_data = cdata->gdbs_data;
      
      	if (!--dbs_data->usage_count)
      		kfree(dbs_data);
      
      					dbs_data->usage_count++;
      					*Bad pointer dereference*
      
       There are other races possible between EXIT and START/STOP/LIMIT as
       well. It's really complicated.
      
       (b) Switching governor state in bad sequence:
      
       For example, trying to switch a governor to START state when the
       governor is in EXIT state. There are some checks present in
       __cpufreq_governor() but they aren't sufficient, as they compare events
       against 'policy->governor_enabled', whereas we need to take the governor's
       state into account, which can be used by multiple policies.
      
      These two issues need to be solved separately and the responsibility
      should be properly divided between cpufreq and governor core.
      
      The first problem is more about the governor core, as it needs to
      protect its structures properly. And the second problem should be fixed
      in cpufreq core instead of governor, as it's all about the sequence of
      events.
      
      This patch is trying to solve only the first problem.
      
      There are two types of data we need to protect,
      - 'struct common_dbs_data': No matter what, there is going to be a
        single copy of this per governor.
      - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
        will have per-policy copy of this data, otherwise a single copy.
      
      Because of such complexities, the mutex present in 'struct dbs_data' is
      insufficient to solve our problem. For example we need to protect
      fetching of 'dbs_data' from different structures at the beginning of
      cpufreq_governor_dbs(), to make sure it isn't currently being updated.
      
      This can be fixed if we can guarantee serialization of event parsing
      code for an individual governor. This is best solved with a mutex per
      governor, and the placeholder for that is 'struct common_dbs_data'.
      
      And so this patch moves the mutex from 'struct dbs_data' to 'struct
      common_dbs_data' and takes it at the beginning and drops it at the end
      of cpufreq_governor_dbs().
      
      Tested with and without following configuration options:
      
      CONFIG_LOCKDEP_SUPPORT=y
      CONFIG_DEBUG_RT_MUTEXES=y
      CONFIG_DEBUG_PI_LIST=y
      CONFIG_DEBUG_SPINLOCK=y
      CONFIG_DEBUG_MUTEXES=y
      CONFIG_DEBUG_LOCK_ALLOC=y
      CONFIG_PROVE_LOCKING=y
      CONFIG_LOCKDEP=y
      CONFIG_DEBUG_ATOMIC_SLEEP=y
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
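The locking scheme described in this commit, one mutex per governor taken at the start of event handling and dropped at the end, can be sketched in userspace with a pthread mutex. This is an illustration under assumptions, not the kernel code: the struct names mirror those in the commit message, while `governor_event` and the `GOV_INIT`/`GOV_EXIT` enum are hypothetical, and only the INIT and EXIT events are modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct dbs_data {
    int usage_count;
};

/* One instance per governor; the mutex lives here, not in dbs_data. */
struct common_dbs_data {
    pthread_mutex_t mutex;
    struct dbs_data *gdbs_data; /* shared copy when not per-policy */
};

enum gov_event { GOV_INIT, GOV_EXIT }; /* hypothetical event names */

/* Every event is handled with the per-governor mutex held throughout. */
static int governor_event(struct common_dbs_data *cdata, enum gov_event ev)
{
    int ret = 0;

    assert(cdata != NULL);
    pthread_mutex_lock(&cdata->mutex);
    switch (ev) {
    case GOV_INIT:
        if (!cdata->gdbs_data) {
            cdata->gdbs_data = calloc(1, sizeof(*cdata->gdbs_data));
            if (!cdata->gdbs_data) {
                ret = -1;
                break;
            }
        }
        cdata->gdbs_data->usage_count++;
        break;
    case GOV_EXIT:
        if (!cdata->gdbs_data) {
            ret = -1; /* EXIT without a matching INIT */
            break;
        }
        if (!--cdata->gdbs_data->usage_count) {
            free(cdata->gdbs_data);
            cdata->gdbs_data = NULL; /* no stale pointer for a later INIT */
        }
        break;
    }
    pthread_mutex_unlock(&cdata->mutex);
    return ret;
}
```

Because fetching `gdbs_data`, updating `usage_count`, and freeing all happen under the same lock, the interleaving shown in scenario (a) cannot occur: an INIT either sees the still-valid pointer before EXIT runs or sees NULL after it.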
    • cpufreq: governor: split cpufreq_governor_dbs() · 714a2d9c
      Viresh Kumar authored
      cpufreq_governor_dbs() is hardly readable; it is just too big and
      complicated. Let's make it more readable by splitting out event-specific
      routines.
      
      The order of statements is changed in a few places, but that shouldn't bring
      any functional change.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: governor: register notifier from cs_init() · 8e0484d2
      Viresh Kumar authored
      Notifiers are required only for the conservative governor, and the common
      governor code is unnecessarily polluted with them. Handle them from
      cs_init/exit() instead of cpufreq_governor_dbs().
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  3. 10 Jun, 2015 7 commits
    • cpufreq: Remove cpufreq_update_policy() · 37829029
      Viresh Kumar authored
      cpufreq_update_policy() was kept as a separate routine earlier as it was
      handling migration of sysfs directories, which isn't the case anymore.
      It is only updating policy->cpu now and is called by a single caller.
      
      The WARN_ON() isn't really required anymore, as we are just updating the
      cpu now, not moving the sysfs directories.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: Restart governor as soon as possible · 9591becb
      Viresh Kumar authored
      __cpufreq_remove_dev_finish() is doing two things today:
      - Restarts the governor if some CPUs from concerned policy are still
        online.
      - Frees the policy if all CPUs are offline.
      
      The first task of restarting the governor can be moved to
      __cpufreq_remove_dev_prepare() to restart the governor early. There is
      no race between _prepare() and _finish() as they would be handling
      completely different cases. _finish() will only be required if we are
      going to free the policy and that has nothing to do with restarting the
      governor.
      Original-by: Saravana Kannan <skannan@codeaurora.org>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free() · 3654c5cc
      Viresh Kumar authored
      cpufreq_policy_put_kobj() is actually part of freeing the policy and can
      be called from cpufreq_policy_free() directly instead of a separate
      call.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: Initialize policy->kobj while allocating policy · 2fc3384d
      Viresh Kumar authored
      policy->kobj is required to be initialized once in the lifetime of a
      policy.  Currently we are initializing it from __cpufreq_add_dev() and
      that doesn't look to be the best place for doing so as we have to do
      this on special cases (like: !recover_policy).
      
      We can initialize it from a more obvious place, cpufreq_policy_alloc(),
      and that will make the code look cleaner, especially the error handling part.
      
      The error handling part of __cpufreq_add_dev() was doing almost the same
      thing while recover_policy is true or false. Fix that as well by always
      calling cpufreq_policy_put_kobj() with an additional parameter to skip
      notification part of it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: Stop migrating sysfs files on hotplug · 87549141
      Viresh Kumar authored
      When we hot-unplug a cpu, we remove its sysfs cpufreq directory and if
      the outgoing cpu was the owner of policy->kobj earlier then we migrate
      the sysfs directory to under another online cpu.
      
      This brings a few disadvantages:
      - Code Complexity
      - Slower hotplug/suspend/resume
      - sysfs file permissions are reset after all policy->cpus are offlined
      - CPUFreq stats history lost after all policy->cpus are offlined
      - Special management of sysfs stuff during suspend/resume
      
      To overcome these, this patch modifies the way sysfs directories are
      managed:
      - Select sysfs kobjects owner while initializing policy and don't change
        it during hotplugs. Track it with kobj_cpu created earlier.
      
      - Create symlinks for all related CPUs (can be offline) instead of
        affected CPUs on policy initialization and remove them only when the
        policy is freed.
      
      - Free policy structure only on the removal of cpufreq-driver and not
        during hotplug/suspend/resume, detected by checking 'struct
        subsys_interface *' (Valid only when called from
        subsys_interface_unregister() while unregistering driver).
      
      Apart from this, special care is taken to handle physical hotplug of CPUs,
      as we wouldn't remove sysfs links or remove policies on logical
      hotplugs. Physical hotplug happens in the following sequence.
      
      Hot removal:
      - CPU is offlined first, ~ 'echo 0 >
        /sys/devices/system/cpu/cpuX/online'
      - Then its device is removed along with all sysfs files, and the cpufreq core
        is notified with the cpufreq_remove_dev() callback from subsys-interface.
      
      Hot addition:
      - First the device along with its sysfs files is added, and the cpufreq core
        is notified with the cpufreq_add_dev() callback from subsys-interface.
      - CPU is onlined, ~ 'echo 1 > /sys/devices/system/cpu/cpuX/online'
      
      We call the same routines with both hotplug and subsys callbacks, and we
      sense physical hotplug with cpu_offline() check in subsys callback. We
      can handle most of the stuff with regular hotplug callback paths and
      add/remove cpufreq sysfs links or free policy from subsys callbacks.
      Original-by: Saravana Kannan <skannan@codeaurora.org>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: Don't allow updating inactive policies from sysfs · 11e584cf
      Viresh Kumar authored
      Later commits would change the way policies are managed today. Policies
      wouldn't be freed on cpu hotplug (currently they aren't freed only for
      suspend), and while the CPU is offline, the sysfs cpufreq files would
      still be present.
      
      A user may accidentally try to update the sysfs files in the following
      directory: '/sys/devices/system/cpu/cpuX/cpufreq/'. And that would
      result in undefined behavior as the policy wouldn't be active then.
      
      Apart from updating the store() routine, we also update __cpufreq_get(),
      which can call cpufreq_out_of_sync(). The latter routine tries to update
      policy->cur and starts notifying the kernel about it.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Acked-by: Saravana Kannan <skannan@codeaurora.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • intel_pstate: Force setting target pstate when required · 6c1e4591
      Doug Smythies authored
      During initialization and exit it is possible that the target pstate
      might not actually be set. Furthermore, the result can be that the
      driver and the processor are out of sync and, under some conditions,
      the driver might never send the processor the proper target pstate.
      
      This patch adds a bypass or do_checks flag to the call to
      intel_pstate_set_pstate(). With bypass, the clamp checks and the
      "do not send if unchanged since last time" check are both skipped. With
      do_checks, the current policy clamp checks are applied as before,
      and the new target is not sent if it is the same as the old one.
      Signed-off-by: Doug Smythies <dsmythies@telus.net>
      Reported-by: Marien Zwart <marien.zwart@gmail.com>
      Reported-by: Alex Lochmann <alexander.lochmann@tu-dortmund.de>
      Reported-by: Piotr Kołaczkowski <pkolaczk@gmail.com>
      Reported-by: Clemens Eisserer <linuxhippy@gmail.com>
      Tested-by: Marien Zwart <marien.zwart@gmail.com>
      Tested-by: Doug Smythies <dsmythies@telus.net>
      [ rjw: Dropped pointless symbol definitions, rebased ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  4. 09 Jun, 2015 1 commit
  5. 22 May, 2015 2 commits
  6. 15 May, 2015 6 commits
  7. 14 May, 2015 1 commit
  8. 12 May, 2015 1 commit
  9. 07 May, 2015 5 commits
  10. 05 May, 2015 1 commit
  11. 04 May, 2015 8 commits