1. 24 Jun, 2019 19 commits
    • Patrick Bellasi's avatar
      sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks · 982d9cdc
      Patrick Bellasi authored
      Each time a frequency update is required via schedutil, a frequency is
      selected to (possibly) satisfy the utilization reported by each
      scheduling class and irqs. However, when utilization clamping is in use,
      the frequency selection should consider userspace utilization clamping
      hints.  This will allow, for example, to:
      
       - boost tasks which are directly affecting the user experience
         by running them at least at a minimum "requested" frequency
      
       - cap low priority tasks not directly affecting the user experience
         by running them only up to a maximum "allowed" frequency
      
      These constraints are meant to support a per-task based tuning of the
      frequency selection thus supporting a fine grained definition of
      performance boosting vs energy saving strategies in kernel space.
      
      Add support to clamp the utilization of RUNNABLE FAIR and RT tasks
      within the boundaries defined by their aggregated utilization clamp
      constraints.
      
      Do that by considering the max(min_util, max_util) to give boosted tasks
      the performance they need even when they happen to be co-scheduled with
      other capped tasks.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-10-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      982d9cdc
    • Patrick Bellasi's avatar
      sched/uclamp: Set default clamps for RT tasks · 1a00d999
      Patrick Bellasi authored
      By default FAIR tasks start without clamps, i.e. neither boosted nor
      capped, and they run at the best frequency matching their utilization
      demand.  This default behavior does not fit RT tasks which instead are
      expected to run at the maximum available frequency, if not otherwise
      required by explicitly capping them.
      
      Enforce the correct behavior for RT tasks by setting util_min to max
      whenever:
      
       1. the task is switched to the RT class and it does not already have a
          user-defined clamp value assigned.
      
       2. an RT task is forked from a parent with RESET_ON_FORK set.
      
      NOTE: utilization clamp values are cross scheduling class attributes and
      thus they are never changed/reset once a value has been explicitly
      defined from user-space.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-9-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1a00d999
    • Patrick Bellasi's avatar
      sched/uclamp: Reset uclamp values on RESET_ON_FORK · a87498ac
      Patrick Bellasi authored
      A forked tasks gets the same clamp values of its parent however, when
      the RESET_ON_FORK flag is set on parent, e.g. via:
      
         sys_sched_setattr()
            sched_setattr()
               __sched_setscheduler(attr::SCHED_FLAG_RESET_ON_FORK)
      
      the new forked task is expected to start with all attributes reset to
      default values.
      
      Do that for utilization clamp values too by checking the reset request
      from the existing uclamp_fork() call which already provides the required
      initialization for other uclamp related bits.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-8-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a87498ac
    • Patrick Bellasi's avatar
      sched/uclamp: Extend sched_setattr() to support utilization clamping · a509a7cd
      Patrick Bellasi authored
      The SCHED_DEADLINE scheduling class provides an advanced and formal
      model to define tasks requirements that can translate into proper
      decisions for both task placements and frequencies selections. Other
      classes have a more simplified model based on the POSIX concept of
      priorities.
      
      Such a simple priority based model however does not allow to exploit
      most advanced features of the Linux scheduler like, for example, driving
      frequencies selection via the schedutil cpufreq governor. However, also
      for non SCHED_DEADLINE tasks, it's still interesting to define tasks
      properties to support scheduler decisions.
      
      Utilization clamping exposes to user-space a new set of per-task
      attributes the scheduler can use as hints about the expected/required
      utilization for a task. This allows to implement a "proactive" per-task
      frequency control policy, a more advanced policy than the current one
      based just on "passive" measured task utilization. For example, it's
      possible to boost interactive tasks (e.g. to get better performance) or
      cap background tasks (e.g. to be more energy/thermal efficient).
      
      Introduce a new API to set utilization clamping values for a specified
      task by extending sched_setattr(), a syscall which already allows to
      define task specific properties for different scheduling classes. A new
      pair of attributes allows to specify a minimum and maximum utilization
      the scheduler can consider for a task.
      
      Do that by validating the required clamp values before and then applying
      the required changes using _the_ same pattern already in use for
      __setscheduler(). This ensures that the task is re-enqueued with the new
      clamp values.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-7-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a509a7cd
    • Patrick Bellasi's avatar
      sched/core: Allow sched_setattr() to use the current policy · 1d6362fa
      Patrick Bellasi authored
      The sched_setattr() syscall mandates that a policy is always specified.
      This requires to always know which policy a task will have when
      attributes are configured and this makes it impossible to add more
      generic task attributes valid across different scheduling policies.
      Reading the policy before setting generic tasks attributes is racy since
      we cannot be sure it is not changed concurrently.
      
      Introduce the required support to change generic task attributes without
      affecting the current task policy. This is done by adding an attribute flag
      (SCHED_FLAG_KEEP_POLICY) to enforce the usage of the current policy.
      
      Add support for the SETPARAM_POLICY policy, which is already used by the
      sched_setparam() POSIX syscall, to the sched_setattr() non-POSIX
      syscall.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-6-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      1d6362fa
    • Patrick Bellasi's avatar
      sched/uclamp: Add system default clamps · e8f14172
      Patrick Bellasi authored
      Tasks without a user-defined clamp value are considered not clamped
      and by default their utilization can have any value in the
      [0..SCHED_CAPACITY_SCALE] range.
      
      Tasks with a user-defined clamp value are allowed to request any value
      in that range, and the required clamp is unconditionally enforced.
      However, a "System Management Software" could be interested in limiting
      the range of clamp values allowed for all tasks.
      
      Add a privileged interface to define a system default configuration via:
      
        /proc/sys/kernel/sched_uclamp_util_{min,max}
      
      which works as an unconditional clamp range restriction for all tasks.
      
      With the default configuration, the full SCHED_CAPACITY_SCALE range of
      values is allowed for each clamp index. Otherwise, the task-specific
      clamp is capped by the corresponding system default value.
      
      Do that by tracking, for each task, the "effective" clamp value and
      bucket the task has been refcounted in at enqueue time. This
      allows to lazy aggregate "requested" and "system default" values at
      enqueue time and simplifies refcounting updates at dequeue time.
      
      The cached bucket ids are used to avoid (relatively) more expensive
      integer divisions every time a task is enqueued.
      
      An active flag is used to report when the "effective" value is valid and
      thus the task is actually refcounted in the corresponding rq's bucket.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-5-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e8f14172
    • Patrick Bellasi's avatar
      sched/uclamp: Enforce last task's UCLAMP_MAX · e496187d
      Patrick Bellasi authored
      When a task sleeps it removes its max utilization clamp from its CPU.
      However, the blocked utilization on that CPU can be higher than the max
      clamp value enforced while the task was running. This allows undesired
      CPU frequency increases while a CPU is idle, for example, when another
      CPU on the same frequency domain triggers a frequency update, since
      schedutil can now see the full not clamped blocked utilization of the
      idle CPU.
      
      Fix this by using:
      
        uclamp_rq_dec_id(p, rq, UCLAMP_MAX)
          uclamp_rq_max_value(rq, UCLAMP_MAX, clamp_value)
      
      to detect when a CPU has no more RUNNABLE clamped tasks and to flag this
      condition.
      
      Don't track any minimum utilization clamps since an idle CPU never
      requires a minimum frequency. The decay of the blocked utilization is
      good enough to reduce the CPU frequency.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-4-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      e496187d
    • Patrick Bellasi's avatar
      sched/uclamp: Add bucket local max tracking · 60daf9c1
      Patrick Bellasi authored
      Because of bucketization, different task-specific clamp values are
      tracked in the same bucket.  For example, with 20% bucket size and
      assuming to have:
      
        Task1: util_min=25%
        Task2: util_min=35%
      
      both tasks will be refcounted in the [20..39]% bucket and always boosted
      only up to 20% thus implementing a simple floor aggregation normally
      used in histograms.
      
      In systems with only few and well-defined clamp values, it would be
      useful to track the exact clamp value required by a task whenever
      possible. For example, if a system requires only 23% and 47% boost
      values then it's possible to track the exact boost required by each
      task using only 3 buckets of ~33% size each.
      
      Introduce a mechanism to max aggregate the requested clamp values of
      RUNNABLE tasks in the same bucket. Keep it simple by resetting the
      bucket value to its base value only when a bucket becomes inactive.
      Allow a limited and controlled overboosting margin for tasks recounted
      in the same bucket.
      
      In systems where the boost values are not known in advance, it is still
      possible to control the maximum acceptable overboosting margin by tuning
      the number of clamp groups. For example, 20 groups ensure a 5% maximum
      overboost.
      
      Remove the rq bucket initialization code since a correct bucket value
      is now computed when a task is refcounted into a CPU's rq.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-3-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      60daf9c1
    • Patrick Bellasi's avatar
      sched/uclamp: Add CPU's clamp buckets refcounting · 69842cba
      Patrick Bellasi authored
      Utilization clamping allows to clamp the CPU's utilization within a
      [util_min, util_max] range, depending on the set of RUNNABLE tasks on
      that CPU. Each task references two "clamp buckets" defining its minimum
      and maximum (util_{min,max}) utilization "clamp values". A CPU's clamp
      bucket is active if there is at least one RUNNABLE tasks enqueued on
      that CPU and refcounting that bucket.
      
      When a task is {en,de}queued {on,from} a rq, the set of active clamp
      buckets on that CPU can change. If the set of active clamp buckets
      changes for a CPU a new "aggregated" clamp value is computed for that
      CPU. This is because each clamp bucket enforces a different utilization
      clamp value.
      
      Clamp values are always MAX aggregated for both util_min and util_max.
      This ensures that no task can affect the performance of other
      co-scheduled tasks which are more boosted (i.e. with higher util_min
      clamp) or less capped (i.e. with higher util_max clamp).
      
      A task has:
         task_struct::uclamp[clamp_id]::bucket_id
      to track the "bucket index" of the CPU's clamp bucket it refcounts while
      enqueued, for each clamp index (clamp_id).
      
      A runqueue has:
         rq::uclamp[clamp_id]::bucket[bucket_id].tasks
      to track how many RUNNABLE tasks on that CPU refcount each
      clamp bucket (bucket_id) of a clamp index (clamp_id).
      It also has a:
         rq::uclamp[clamp_id]::bucket[bucket_id].value
      to track the clamp value of each clamp bucket (bucket_id) of a clamp
      index (clamp_id).
      
      The rq::uclamp::bucket[clamp_id][] array is scanned every time it's
      needed to find a new MAX aggregated clamp value for a clamp_id. This
      operation is required only when it's dequeued the last task of a clamp
      bucket tracking the current MAX aggregated clamp value. In this case,
      the CPU is either entering IDLE or going to schedule a less boosted or
      more clamped task.
      The expected number of different clamp values configured at build time
      is small enough to fit the full unordered array into a single cache
      line, for configurations of up to 7 buckets.
      
      Add to struct rq the basic data structures required to refcount the
      number of RUNNABLE tasks for each clamp bucket. Add also the max
      aggregation required to update the rq's clamp value at each
      enqueue/dequeue event.
      
      Use a simple linear mapping of clamp values into clamp buckets.
      Pre-compute and cache bucket_id to avoid integer divisions at
      enqueue/dequeue time.
      Signed-off-by: default avatarPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alessio Balsini <balsini@android.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: https://lkml.kernel.org/r/20190621084217.8167-2-patrick.bellasi@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      69842cba
    • Dietmar Eggemann's avatar
      sched/fair: Rename weighted_cpuload() to cpu_runnable_load() · a3df0679
      Dietmar Eggemann authored
      The term 'weighted' is not needed since there is no 'unweighted' load.
      Instead use the term 'runnable' to distinguish 'runnable' load
      (avg.runnable_load_avg) used in load balance from load (avg.load_avg)
      which is the sum of 'runnable' and 'blocked' load.
      Signed-off-by: default avatarDietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valentin Schneider <valentin.schneider@arm.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lkml.kernel.org/r/57f27a7f-2775-d832-e965-0f4d51bb1954@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a3df0679
    • Qais Yousef's avatar
      sched/debug: Export the newly added tracepoints · a056a5be
      Qais Yousef authored
      So that external modules can hook into them and extract the info they
      need. Since these new tracepoints have no events associated with them
      exporting these tracepoints make them useful for external modules to
      perform testing and debugging. There's no other way otherwise to access
      them.
      
      BPF doesn't have infrastructure to access these bare tracepoints either.
      Signed-off-by: default avatarQais Yousef <qais.yousef@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Uwe Kleine-Konig <u.kleine-koenig@pengutronix.de>
      Link: https://lkml.kernel.org/r/20190604111459.2862-7-qais.yousef@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      a056a5be
    • Qais Yousef's avatar
      sched/debug: Add sched_overutilized tracepoint · f9f240f9
      Qais Yousef authored
      The new tracepoint allows us to track the changes in overutilized
      status.
      
      Overutilized status is associated with EAS. It indicates that the system
      is in high performance state. EAS is disabled when the system is in this
      state since there's not much energy savings while high performance tasks
      are pushing the system to the limit and it's better to default to the
      spreading behavior of the scheduler.
      
      This tracepoint helps understanding and debugging the conditions under
      which this happens.
      Signed-off-by: default avatarQais Yousef <qais.yousef@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Uwe Kleine-Konig <u.kleine-koenig@pengutronix.de>
      Link: https://lkml.kernel.org/r/20190604111459.2862-6-qais.yousef@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f9f240f9
    • Qais Yousef's avatar
      sched/debug: Add new tracepoint to track PELT at se level · 8de6242c
      Qais Yousef authored
      The new tracepoint allows tracking PELT signals at sched_entity level.
      Which is supported in CFS tasks and taskgroups only.
      Signed-off-by: default avatarQais Yousef <qais.yousef@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Uwe Kleine-Konig <u.kleine-koenig@pengutronix.de>
      Link: https://lkml.kernel.org/r/20190604111459.2862-5-qais.yousef@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8de6242c
    • Qais Yousef's avatar
      sched/debug: Add new tracepoints to track PELT at rq level · ba19f51f
      Qais Yousef authored
      The new tracepoints allow tracking PELT signals at rq level for all
      scheduling classes + irq.
      Signed-off-by: default avatarQais Yousef <qais.yousef@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Uwe Kleine-Konig <u.kleine-koenig@pengutronix.de>
      Link: https://lkml.kernel.org/r/20190604111459.2862-4-qais.yousef@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      ba19f51f
    • Qais Yousef's avatar
      sched/debug: Add a new sched_trace_*() helper functions · 3c93a0c0
      Qais Yousef authored
      The new functions allow modules to access internal data structures of
      unexported struct cfs_rq and struct rq to extract important information
      from the tracepoints to be introduced in later patches.
      
      While at it fix alphabetical order of struct declarations in sched.h
      Signed-off-by: default avatarQais Yousef <qais.yousef@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Uwe Kleine-Konig <u.kleine-koenig@pengutronix.de>
      Link: https://lkml.kernel.org/r/20190604111459.2862-3-qais.yousef@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3c93a0c0
    • Qais Yousef's avatar
      sched/autogroup: Make autogroup_path() always available · 9ba5090a
      Qais Yousef authored
      Remove the #ifdef CONFIG_SCHED_DEBUG.
      
      Some of the tracepoints to be introduced in later patches need to access
      this function. Hence make it always available since the tracepoints are
      not protected by CONFIG_SCHED_DEBUG.
      Signed-off-by: default avatarQais Yousef <qais.yousef@arm.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Uwe Kleine-Konig <u.kleine-koenig@pengutronix.de>
      Link: https://lkml.kernel.org/r/20190604111459.2862-2-qais.yousef@arm.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9ba5090a
    • Pavel Begunkov's avatar
      sched/wait: Deduplicate code with do-while · 016190a4
      Pavel Begunkov authored
      Statements in the loop's body and before it are identical.
      Use do-while to not repeat it.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/43ffea6ee2152b90dedf962eac851609e4197218.1560256112.git.asml.silence@gmail.comSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      016190a4
    • Vincent Guittot's avatar
      sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity() · 8ec59c0f
      Vincent Guittot authored
      The 'struct sched_domain *sd' parameter to arch_scale_cpu_capacity() is
      unused since commit:
      
        765d0af1 ("sched/topology: Remove the ::smt_gain field from 'struct sched_domain'")
      
      Remove it.
      Signed-off-by: default avatarVincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: default avatarPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: default avatarViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: default avatarValentin Schneider <valentin.schneider@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: gregkh@linuxfoundation.org
      Cc: linux@armlinux.org.uk
      Cc: quentin.perret@arm.com
      Cc: rafael@kernel.org
      Link: https://lkml.kernel.org/r/1560783617-5827-1-git-send-email-vincent.guittot@linaro.orgSigned-off-by: default avatarIngo Molnar <mingo@kernel.org>
      8ec59c0f
    • Ingo Molnar's avatar
      d2abae71
  2. 22 Jun, 2019 9 commits
    • Linus Torvalds's avatar
      Linux 5.2-rc6 · 4b972a01
      Linus Torvalds authored
      4b972a01
    • Linus Torvalds's avatar
      Merge tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 6698a71a
      Linus Torvalds authored
      Pull iommu fix from Joerg Roedel:
       "Revert a commit from the previous pile of fixes which causes new
        lockdep splats. It is better to revert it for now and work on a better
        and more well tested fix"
      
      * tag 'iommu-fix-v5.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
      6698a71a
    • Peter Xu's avatar
      Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock" · 0aafc8ae
      Peter Xu authored
      This reverts commit 7560cc3c.
      
      With 5.2.0-rc5 I can easily trigger this with lockdep and iommu=pt:
      
          ======================================================
          WARNING: possible circular locking dependency detected
          5.2.0-rc5 #78 Not tainted
          ------------------------------------------------------
          swapper/0/1 is trying to acquire lock:
          00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
          but task is already holding lock:
          00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
          which lock already depends on the new lock.
          the existing dependency chain (in reverse order) is:
          -> #1 (device_domain_lock){....}:
                 _raw_spin_lock_irqsave+0x3c/0x50
                 dmar_insert_one_dev_info+0xbb/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
          -> #0 (&(&iommu->lock)->rlock){+.+.}:
                 lock_acquire+0x9e/0x170
                 _raw_spin_lock+0x25/0x30
                 domain_context_mapping_one+0xa5/0x4e0
                 pci_for_each_dma_alias+0x30/0x140
                 dmar_insert_one_dev_info+0x3b2/0x510
                 domain_add_dev_info+0x50/0x90
                 dev_prepare_static_identity_mapping+0x30/0x68
                 intel_iommu_init+0xddd/0x1422
                 pci_iommu_init+0x16/0x3f
                 do_one_initcall+0x5d/0x2b4
                 kernel_init_freeable+0x218/0x2c1
                 kernel_init+0xa/0x100
                 ret_from_fork+0x3a/0x50
      
          other info that might help us debug this:
           Possible unsafe locking scenario:
                 CPU0                    CPU1
                 ----                    ----
            lock(device_domain_lock);
                                         lock(&(&iommu->lock)->rlock);
                                         lock(device_domain_lock);
            lock(&(&iommu->lock)->rlock);
      
           *** DEADLOCK ***
          2 locks held by swapper/0/1:
           #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
           #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
      
          stack backtrace:
          CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
          Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
          Call Trace:
           dump_stack+0x85/0xc0
           print_circular_bug.cold.57+0x15c/0x195
           __lock_acquire+0x152a/0x1710
           lock_acquire+0x9e/0x170
           ? domain_context_mapping_one+0xa5/0x4e0
           _raw_spin_lock+0x25/0x30
           ? domain_context_mapping_one+0xa5/0x4e0
           domain_context_mapping_one+0xa5/0x4e0
           ? domain_context_mapping_one+0x4e0/0x4e0
           pci_for_each_dma_alias+0x30/0x140
           dmar_insert_one_dev_info+0x3b2/0x510
           domain_add_dev_info+0x50/0x90
           dev_prepare_static_identity_mapping+0x30/0x68
           intel_iommu_init+0xddd/0x1422
           ? printk+0x58/0x6f
           ? lockdep_hardirqs_on+0xf0/0x180
           ? do_early_param+0x8e/0x8e
           ? e820__memblock_setup+0x63/0x63
           pci_iommu_init+0x16/0x3f
           do_one_initcall+0x5d/0x2b4
           ? do_early_param+0x8e/0x8e
           ? rcu_read_lock_sched_held+0x55/0x60
           ? do_early_param+0x8e/0x8e
           kernel_init_freeable+0x218/0x2c1
           ? rest_init+0x230/0x230
           kernel_init+0xa/0x100
           ret_from_fork+0x3a/0x50
      
      domain_context_mapping_one() is taking device_domain_lock first then
      iommu lock, while dmar_insert_one_dev_info() is doing the reverse.
      
      That should be introduced by commit:
      
      7560cc3c ("iommu/vt-d: Fix lock inversion between iommu->lock and
                    device_domain_lock", 2019-05-27)
      
      So far I still cannot figure out how the previous deadlock was
      triggered (I cannot find iommu lock taken before calling of
      iommu_flush_dev_iotlb()), however I'm pretty sure that that change
      should be incomplete at least because it does not fix all the places
      so we're still taking the locks in different orders, while reverting
      that commit is very clean to me so far that we should always take
      device_domain_lock first then the iommu lock.
      
      We can continue to try to find the real culprit mentioned in
      7560cc3c, but for now I think we should revert it to fix current
      breakage.
      
      CC: Joerg Roedel <joro@8bytes.org>
      CC: Lu Baolu <baolu.lu@linux.intel.com>
      CC: dave.jiang@intel.com
      Signed-off-by: default avatarPeter Xu <peterx@redhat.com>
      Tested-by: default avatarChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      0aafc8ae
    • Linus Torvalds's avatar
      Merge tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b253d5f3
      Linus Torvalds authored
      Pull PCI fix from Bjorn Helgaas:
       "If an IOMMU is present, ignore the P2PDMA whitelist we added for v5.2
        because we don't yet know how to support P2PDMA in that case (Logan
        Gunthorpe)"
      
      * tag 'pci-v5.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
      b253d5f3
    • Linus Torvalds's avatar
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f4102766
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Three driver fixes (and one version number update): a suspend hang in
        ufs, a qla hard lock on module removal and a qedi panic during
        discovery"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: qla2xxx: Fix hardlockup in abort command during driver remove
        scsi: ufs: Avoid runtime suspend possibly being blocked forever
        scsi: qedi: update driver version to 8.37.0.20
        scsi: qedi: Check targetname while finding boot target information
      f4102766
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · a8282bf0
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "This is a frustratingly large batch at rc5. Some of these were sent
        earlier but were missed by me due to being distracted by other things,
        and some took a while to track down due to needing manual bisection on
        old hardware. But still we clearly need to improve our testing of KVM,
        and of 32-bit, so that we catch these earlier.
      
        Summary: seven fixes, all for bugs introduced this cycle.
      
         - The commit to add KASAN support broke booting on 32-bit SMP
           machines, due to a refactoring that moved some setup out of the
           secondary CPU path.
      
         - A fix for another 32-bit SMP bug introduced by the fast syscall
           entry implementation for 32-bit BOOKE. And a build fix for the same
           commit.
      
         - Our change to allow the DAWR to be force enabled on Power9
           introduced a bug in KVM, where we clobber r3 leading to a host
           crash.
      
         - The same commit also exposed a previously unreachable bug in the
           nested KVM handling of DAWR, which could lead to an oops in a
           nested host.
      
         - One of the DMA reworks broke the b43legacy WiFi driver on some
           people's powermacs, fix it by enabling a 30-bit ZONE_DMA on 32-bit.
      
         - A fix for TLB flushing in KVM introduced a new bug, as it neglected
           to also flush the ERAT, this could lead to memory corruption in the
           guest.
      
        Thanks to: Aaro Koskinen, Christoph Hellwig, Christophe Leroy, Larry
        Finger, Michael Neuling, Suraj Jitindar Singh"
      
      * tag 'powerpc-5.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
        powerpc: enable a 30-bit ZONE_DMA for 32-bit pmac
        KVM: PPC: Book3S HV: Only write DAWR[X] when handling h_set_dawr in real mode
        KVM: PPC: Book3S HV: Fix r3 corruption in h_set_dabr()
        powerpc/32: fix build failure on book3e with KVM
        powerpc/booke: fix fast syscall entry on SMP
        powerpc/32s: fix initial setup of segment registers on secondary CPU
      a8282bf0
    • Marcel Holtmann's avatar
      Bluetooth: Fix regression with minimum encryption key size alignment · 693cd8ce
      Marcel Holtmann authored
      When trying to align the minimum encryption key size requirement for
      Bluetooth connections, it turns out doing this in a central location in
      the HCI connection handling code is not possible.
      
      Original Bluetooth version up to 2.0 used a security model where the
      L2CAP service would enforce authentication and encryption.  Starting
      with Bluetooth 2.1 and Secure Simple Pairing that model has changed into
      that the connection initiator is responsible for providing an encrypted
      ACL link before any L2CAP communication can happen.
      
      Now connecting Bluetooth 2.1 or later devices with Bluetooth 2.0 and
      before devices are causing a regression.  The encryption key size check
      needs to be moved out of the HCI connection handling into the L2CAP
      channel setup.
      
      To achieve this, the current check inside hci_conn_security() has been
      moved into l2cap_check_enc_key_size() helper function and then called
      from four decisions point inside L2CAP to cover all combinations of
      Secure Simple Pairing enabled devices and device using legacy pairing
      and legacy service security model.
      
      Fixes: d5bb334a ("Bluetooth: Align minimum encryption key size for LE and BR/EDR connections")
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203643Signed-off-by: default avatarMarcel Holtmann <marcel@holtmann.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      693cd8ce
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c356dc4b
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix leak of unqueued fragments in ipv6 nf_defrag, from Guillaume
          Nault.
      
       2) Don't access the DDM interface unless the transceiver implements it
          in bnx2x, from Mauro S. M. Rodrigues.
      
       3) Don't double fetch 'len' from userspace in sock_getsockopt(), from
          JingYi Hou.
      
       4) Sign extension overflow in lio_core, from Colin Ian King.
      
       5) Various netem bug fixes wrt. corrupted packets from Jakub Kicinski.
      
       6) Fix epollout hang in hvsock, from Sunil Muthuswamy.
      
       7) Fix regression in default fib6_type, from David Ahern.
      
       8) Handle memory limits in tcp_fragment more appropriately, from Eric
          Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
        tcp: refine memory limit test in tcp_fragment()
        inet: clear num_timeout reqsk_alloc()
        net: mvpp2: debugfs: Add pmap to fs dump
        ipv6: Default fib6_type to RTN_UNICAST when not set
        net: hns3: Fix inconsistent indenting
        net/af_iucv: always register net_device notifier
        net/af_iucv: build proper skbs for HiperTransport
        net/af_iucv: remove GFP_DMA restriction for HiperTransport
        net: dsa: mv88e6xxx: fix shift of FID bits in mv88e6185_g1_vtu_loadpurge()
        hvsock: fix epollout hang from race condition
        net/udp_gso: Allow TX timestamp with UDP GSO
        net: netem: fix use after free and double free with packet corruption
        net: netem: fix backlog accounting for corrupted GSO frames
        net: lio_core: fix potential sign-extension overflow on large shift
        tipc: pass tunnel dev as NULL to udp_tunnel(6)_xmit_skb
        ip6_tunnel: allow not to count pkts on tstats by passing dev as NULL
        ip_tunnel: allow not to count pkts on tstats by setting skb's dev to NULL
        tun: wake up waitqueues after IFF_UP is set
        net: remove duplicate fetch in sock_getsockopt
        tipc: fix issues with early FAILOVER_MSG from peer
        ...
      c356dc4b
    • Eric Dumazet's avatar
      tcp: refine memory limit test in tcp_fragment() · b6653b36
      Eric Dumazet authored
      tcp_fragment() might be called for skbs in the write queue.
      
      Memory limits might have been exceeded because tcp_sendmsg() only
      checks limits at full skb (64KB) boundaries.
      
      Therefore, we need to make sure tcp_fragment() wont punish applications
      that might have setup very low SO_SNDBUF values.
      
      Fixes: f070ef2a ("tcp: tcp_fragment() should apply sane memory limits")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Tested-by: default avatarChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b6653b36
  3. 21 Jun, 2019 12 commits
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 121bddf3
      Linus Torvalds authored
      Pull rdma fixes from Doug Ledford:
       "This is probably our last -rc pull request. We don't have anything
        else outstanding at the moment anyway, and with the summer months on
        us and people taking trips, I expect the next weeks leading up to the
        merge window to be pretty calm and sedate.
      
        This has two simple, no brainer fixes for the EFA driver.
      
        Then it has ten not quite so simple fixes for the hfi1 driver. The
        problem with them is that they aren't simply one liner typo fixes.
        They're still fixes, but they're more complex issues like livelock
        under heavy load where the answer was to change work queue usage and
        spinlock usage to resolve the problem, or issues with orphaned
        requests during certain types of failures like link down which
        required some more complex work to fix too. They all look like
        legitimate fixes to me, they just aren't small like I wish they were.
      
        Summary:
      
         - 2 minor EFA fixes
      
         - 10 hfi1 fixes related to scaling issues"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
        RDMA/efa: Handle mmap insertions overflow
        RDMA/efa: Fix success return value in case of error
        IB/hfi1: Handle port down properly in pio
        IB/hfi1: Handle wakeup of orphaned QPs for pio
        IB/hfi1: Wakeup QPs orphaned on wait list after flush
        IB/hfi1: Use aborts to trigger RC throttling
        IB/hfi1: Create inline to get extended headers
        IB/hfi1: Silence txreq allocation warnings
        IB/hfi1: Avoid hardlockup with flushlist_lock
        IB/hfi1: Correct tid qp rcd to match verbs context
        IB/hfi1: Close PSM sdma_progress sleep window
        IB/hfi1: Validate fault injection opcode user input
      121bddf3
    • Linus Torvalds's avatar
      Merge tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs · c036f7da
      Linus Torvalds authored
      Pull more NFS client fixes from Anna Schumaker:
       "These are mostly refcounting issues that people have found recently.
        The revert fixes a suspend recovery performance issue.
      
         - SUNRPC: Fix a credential refcount leak
      
         - Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
      
         - SUNRPC: Fix xps refcount imbalance on the error path
      
         - NFS4: Only set creation opendata if O_CREAT"
      
      * tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        SUNRPC: Fix a credential refcount leak
        Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
        net :sunrpc :clnt :Fix xps refcount imbalance on the error path
        NFS4: Only set creation opendata if O_CREAT
      c036f7da
    • Andy Lutomirski's avatar
      x86/vdso: Prevent segfaults due to hoisted vclock reads · ff17bbe0
      Andy Lutomirski authored
      GCC 5.5.0 sometimes cleverly hoists reads of the pvclock and/or hvclock
      pages before the vclock mode checks.  This creates a path through
      vclock_gettime() in which no vclock is enabled at all (due to disabled
      TSC on old CPUs, for example) but the pvclock or hvclock page
      nevertheless read.  This will segfault on bare metal.
      
      This fixes commit 459e3a21 ("gcc-9: properly declare the
      {pv,hv}clock_page storage") in the sense that, before that commit, GCC
      didn't seem to generate the offending code.  There was nothing wrong
      with that commit per se, and -stable maintainers should backport this to
      all supported kernels regardless of whether the offending commit was
      present, since the same crash could just as easily be triggered by the
      phase of the moon.
      
      On GCC 9.1.1, this doesn't seem to affect the generated code at all, so
      I'm not too concerned about performance regressions from this fix.
      
      Cc: stable@vger.kernel.org
      Cc: x86@kernel.org
      Cc: Borislav Petkov <bp@alien8.de>
      Reported-by: default avatarDuncan Roe <duncan_roe@optusnet.com.au>
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ff17bbe0
    • Trond Myklebust's avatar
      SUNRPC: Fix a credential refcount leak · 19d55046
      Trond Myklebust authored
      All callers of __rpc_clone_client() pass in a value for args->cred,
      meaning that the credential gets assigned and referenced in
      the call to rpc_new_client().
      Reported-by: default avatarIdo Schimmel <idosch@idosch.org>
      Fixes: 79caa5fa ("SUNRPC: Cache cred of process creating the rpc_client")
      Signed-off-by: default avatarTrond Myklebust <trond.myklebust@hammerspace.com>
      Tested-by: default avatarIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      19d55046
    • Anna Schumaker's avatar
      Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE" · 502980e8
      Anna Schumaker authored
      Jon Hunter reports:
        "I have been noticing intermittent failures with a system suspend test on
         some of our machines that have a NFS mounted root file-system. Bisecting
         this issue points to your commit 43123581 ("SUNRPC: Declare RPC
         timers as TIMER_DEFERRABLE") and reverting this on top of v5.2-rc3 does
         appear to resolve the problem.
      
         The cause of the suspend failure appears to be a long delay observed
         sometimes when resuming from suspend, and this is causing our test to
         timeout."
      
      This reverts commit 43123581.
      Reported-by: default avatarJon Hunter <jonathanh@nvidia.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      502980e8
    • Lin Yi's avatar
      net :sunrpc :clnt :Fix xps refcount imbalance on the error path · b9622614
      Lin Yi authored
      rpc_clnt_add_xprt take a reference to struct rpc_xprt_switch, but forget
      to release it before return, may lead to a memory leak.
      Signed-off-by: default avatarLin Yi <teroincn@163.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      b9622614
    • Benjamin Coddington's avatar
      NFS4: Only set creation opendata if O_CREAT · 90910519
      Benjamin Coddington authored
      We can end up in nfs4_opendata_alloc during task exit, in which case
      current->fs has already been cleaned up.  This leads to a crash in
      current_umask().
      
      Fix this by only setting creation opendata if we are actually doing an open
      with O_CREAT.  We can drop the check for NULL nfs4_open_createattrs, since
      O_CREAT will never be set for the recovery path.
      Suggested-by: default avatarTrond Myklebust <trondmy@hammerspace.com>
      Signed-off-by: default avatarBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: default avatarAnna Schumaker <Anna.Schumaker@Netapp.com>
      90910519
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · a4c33bbb
      Linus Torvalds authored
      Pull ARM fix from Russell King:
       "Just one ARM fix this time around for Jason Donenfeld, fixing a
        problem with the VDSO generation on big endian"
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: 8867/1: vdso: pass --be8 to linker if necessary
      a4c33bbb
    • Linus Torvalds's avatar
      Merge tag 'drm-fixes-2019-06-21' of git://anongit.freedesktop.org/drm/drm · 0728f6c3
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Just catching up on the week since back from holidays, everything
        seems quite sane.
      
        core:
         - copy_to_user fix for really legacy codepaths.
      
        vmwgfx:
         - two dma fixes
         - one virt hw interaction fix
      
        i915:
         - modesetting fix
         - gvt fix
      
        panfrost:
         - BO unmapping fix
      
        imx:
         - image converter fixes"
      
      * tag 'drm-fixes-2019-06-21' of git://anongit.freedesktop.org/drm/drm:
        drm/i915: Don't clobber M/N values during fastset check
        drm: return -EFAULT if copy_to_user() fails
        drm/panfrost: Make sure a BO is only unmapped when appropriate
        drm/i915/gvt: ignore unexpected pvinfo write
        gpu: ipu-v3: image-convert: Fix image downsize coefficients
        gpu: ipu-v3: image-convert: Fix input bytesperline for packed formats
        gpu: ipu-v3: image-convert: Fix input bytesperline width/height align
        drm/vmwgfx: fix a warning due to missing dma_parms
        drm/vmwgfx: Honor the sg list segment size limitation
        drm/vmwgfx: Use the backdoor port if the HB port is not available
      0728f6c3
    • Linus Torvalds's avatar
      Merge tag 'staging-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · db54615e
      Linus Torvalds authored
      Pull staging/IIO/counter fixes from Greg KH:
       "Here are some small driver bugfixes for some staging/iio/counter
        drivers.
      
        Staging and IIO have been lumped together for a while, as those
        subsystems cross the areas a log, and counter is used by IIO, so
        that's why they are all in one pull request here.
      
        These are small fixes for reported issues in some iio drivers, the
        erofs filesystem, and a build issue for counter code.
      
        All have been in linux-next with no reported issues"
      
      * tag 'staging-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        staging: erofs: add requirements field in superblock
        counter/ftm-quaddec: Add missing dependencies in Kconfig
        staging: iio: adt7316: Fix build errors when GPIOLIB is not set
        iio: temperature: mlx90632 Relax the compatibility check
        iio: imu: st_lsm6dsx: fix PM support for st_lsm6dsx i2c controller
        staging:iio:ad7150: fix threshold mode config bit
      db54615e
    • Linus Torvalds's avatar
      Merge tag 'char-misc-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · b7b8a44f
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here are a number of small driver fixes for 5.2-rc6
      
        Nothing major, just fixes for reported issues:
         - soundwire fixes
         - thunderbolt fixes
         - MAINTAINERS update for fpga maintainer change
         - binder bugfix
         - habanalabs 64bit pointer fix
         - documentation updates
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'char-misc-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        habanalabs: use u64_to_user_ptr() for reading user pointers
        doc: fix documentation about UIO_MEM_LOGICAL using
        MAINTAINERS / Documentation: Thorsten Scherer is the successor of Gavin Schenk
        docs: fb: Add TER16x32 to the available font names
        MAINTAINERS: fpga: hand off maintainership to Moritz
        thunderbolt: Implement CIO reset correctly for Titan Ridge
        binder: fix possible UAF when freeing buffer
        thunderbolt: Make sure device runtime resume completes before taking domain lock
        soundwire: intel: set dai min and max channels correctly
        soundwire: stream: fix bad unlock balance
        soundwire: stream: fix out of boundary access on port properties
      b7b8a44f
    • Linus Torvalds's avatar
      Merge tag 'usb-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · cf242421
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are four small USB fixes for 5.2-rc6.
      
        They include two xhci bugfixes, a chipidea fix, and a small dwc2 fix.
        Nothing major, just nice things to get resolved for reported issues.
      
        All have been in linux-next with no reported issues"
      
      * tag 'usb-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        xhci: detect USB 3.2 capable host controllers correctly
        usb: xhci: Don't try to recover an endpoint if port is in error state.
        usb: dwc2: Use generic PHY width in params setup
        usb: chipidea: udc: workaround for endpoint conflict issue
      cf242421