1. 29 Sep, 2023 1 commit
    • sched/deadline: Make dl_rq->pushable_dl_tasks update drive dl_rq->overloaded · 5fe77659
      Valentin Schneider authored
      dl_rq->dl_nr_migratory is increased whenever a DL entity is enqueued and it has
      nr_cpus_allowed > 1. Unlike the pushable_dl_tasks tree, dl_rq->dl_nr_migratory
      includes a dl_rq's current task. This means a dl_rq can have a migratable
      current, N non-migratable queued tasks, and be flagged as overloaded and have
      its CPU set in the dlo_mask, despite having an empty pushable_tasks tree.
      
      Make a dl_rq's overload logic be driven by {enqueue,dequeue}_pushable_dl_task();
      in other words, only flag a DL RQ as overloaded if it has at least one
      runnable-but-not-current migratable task.
      
       o push_dl_task() is unaffected, as it is a no-op if there are no pushable
         tasks.
      
       o pull_dl_task() now no longer scans runqueues whose sole migratable task is
         their current one, which it can't do anything about anyway.
         It may also now pull tasks to a DL RQ with dl_nr_running > 1 if only its
         current task is migratable.
      
      Since dl_rq->dl_nr_migratory becomes unused, remove it.
      
      RT had the exact same mechanism (rt_rq->rt_nr_migratory) which was dropped
      in favour of relying on rt_rq->pushable_tasks, see:
      
        612f769e ("sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask")
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Juri Lelli <juri.lelli@redhat.com>
      Link: https://lore.kernel.org/r/20230928150251.463109-1-vschneid@redhat.com
  2. 25 Sep, 2023 1 commit
    • sched/rt: Make rt_rq->pushable_tasks updates drive rto_mask · 612f769e
      Valentin Schneider authored
      Sebastian noted that the rto_push_work IRQ work can be queued for a CPU
      that has an empty pushable_tasks list, which means nothing useful will be
      done in the IPI other than queue the work for the next CPU on the rto_mask.
      
      rto_push_irq_work_func() only operates on tasks in the pushable_tasks list,
      but the conditions for that irq_work to be queued (and for a CPU to be
      added to the rto_mask) rely on rt_rq->nr_migratory instead.
      
      nr_migratory is increased whenever an RT task entity is enqueued and it has
      nr_cpus_allowed > 1. Unlike the pushable_tasks list, nr_migratory includes an
      rt_rq's current task. This means an rt_rq can have a migratable current, N
      non-migratable queued tasks, and be flagged as overloaded / have its CPU
      set in the rto_mask, despite having an empty pushable_tasks list.
      
      Make an rt_rq's overload logic be driven by {enqueue,dequeue}_pushable_task().
      Since rt_rq->{rt_nr_migratory,rt_nr_total} become unused, remove them.
      
      Note that the case where the current task is pushed away to make way for a
      migration-disabled task remains unchanged: the migration-disabled task has
      to be in the pushable_tasks list in the first place, which means it has
      nr_cpus_allowed > 1.
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: https://lore.kernel.org/r/20230811112044.3302588-1-vschneid@redhat.com
  3. 24 Sep, 2023 3 commits
  4. 22 Sep, 2023 2 commits
  5. 21 Sep, 2023 7 commits
  6. 19 Sep, 2023 2 commits
  7. 18 Sep, 2023 4 commits
    • sched/headers: Remove duplicated includes in kernel/sched/sched.h · 7ad0354d
      GUO Zihua authored
      Remove the duplicated includes of linux/cgroup.h and linux/psi.h. Both
      headers are included regardless of the config, and both are protected by
      include guards (#ifndef), so there is no point in including them again.
      Signed-off-by: GUO Zihua <guozihua@huawei.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20230818015633.18370-1-guozihua@huawei.com
    • sched/fair: Ratelimit update to tg->load_avg · 1528c661
      Aaron Lu authored
      When using sysbench to benchmark Postgres in a single Docker instance
      with sysbench's nr_threads set to nr_cpu, it is observed that
      update_cfs_group() and update_load_avg() at times show noticeable overhead
      on a 2-socket/112-core/224-CPU Intel Sapphire Rapids (SPR) system:
      
          13.75%    13.74%  [kernel.vmlinux]           [k] update_cfs_group
          10.63%    10.04%  [kernel.vmlinux]           [k] update_load_avg
      
      Annotation shows the cycles are mostly spent on accessing tg->load_avg,
      with update_load_avg() being the write side and update_cfs_group() being
      the read side. tg->load_avg is per task group, and when different tasks
      of the same task group running on different CPUs frequently access it,
      it can become heavily contended.
      
      E.g. when running postgres_sysbench on a 2-socket/112-core/224-CPU Intel
      Sapphire Rapids system, during a 5s window the wakeup count is 14 million
      and the migration count is 11 million; with each migration, the task's load
      is transferred from the source cfs_rq to the target cfs_rq, and each change
      involves an update to tg->load_avg. Since the workload can trigger that many
      wakeups and migrations, the accesses (both read and write) to tg->load_avg
      can be unbounded. As a result, the two mentioned functions showed noticeable
      overhead. With netperf/nr_client=nr_cpu/UDP_RR, the problem is worse:
      during a 5s window, the wakeup count is 21 million and the migration count is
      14 million; update_cfs_group() costs ~25% and update_load_avg() costs ~16%.
      
      Reduce the overhead by limiting updates to tg->load_avg to at most once
      per ms. The update frequency is a tradeoff between tracking accuracy and
      overhead; 1ms is chosen because the PELT window is roughly 1ms and it
      delivered good results for the tests I've done. After this change,
      the cost of accessing tg->load_avg is greatly reduced and performance
      improved. Detailed test results below.
      
        ==============================
        postgres_sysbench on SPR:
        25%
        base:   42382±19.8%
        patch:  50174±9.5%  (noise)
      
        50%
        base:   67626±1.3%
        patch:  67365±3.1%  (noise)
      
        75%
        base:   100216±1.2%
        patch:  112470±0.1% +12.2%
      
        100%
        base:    93671±0.4%
        patch:  113563±0.2% +21.2%
      
        ==============================
        hackbench on ICL:
        group=1
        base:    114912±5.2%
        patch:   117857±2.5%  (noise)
      
        group=4
        base:    359902±1.6%
        patch:   361685±2.7%  (noise)
      
        group=8
        base:    461070±0.8%
        patch:   491713±0.3% +6.6%
      
        group=16
        base:    309032±5.0%
        patch:   378337±1.3% +22.4%
      
        =============================
        hackbench on SPR:
        group=1
        base:    100768±2.9%
        patch:   103134±2.9%  (noise)
      
        group=4
        base:    413830±12.5%
        patch:   378660±16.6% (noise)
      
        group=8
        base:    436124±0.6%
        patch:   490787±3.2% +12.5%
      
        group=16
        base:    457730±3.2%
        patch:   680452±1.3% +48.8%
      
        ============================
        netperf/udp_rr on ICL
        25%
        base:    114413±0.1%
        patch:   115111±0.0% +0.6%
      
        50%
        base:    86803±0.5%
        patch:   86611±0.0%  (noise)
      
        75%
        base:    35959±5.3%
        patch:   49801±0.6% +38.5%
      
        100%
        base:    61951±6.4%
        patch:   70224±0.8% +13.4%
      
        ===========================
        netperf/udp_rr on SPR
        25%
        base:   104954±1.3%
        patch:  107312±2.8%  (noise)
      
        50%
        base:    55394±4.6%
        patch:   54940±7.4%  (noise)
      
        75%
        base:    13779±3.1%
        patch:   36105±1.1% +162%
      
        100%
        base:     9703±3.7%
        patch:   28011±0.2% +189%
      
        ==============================================
        netperf/tcp_stream on ICL (all in noise range)
        25%
        base:    43092±0.1%
        patch:   42891±0.5%
      
        50%
        base:    19278±14.9%
        patch:   22369±7.2%
      
        75%
        base:    16822±3.0%
        patch:   17086±2.3%
      
        100%
        base:    18216±0.6%
        patch:   18078±2.9%
      
        ===============================================
        netperf/tcp_stream on SPR (all in noise range)
        25%
        base:    34491±0.3%
        patch:   34886±0.5%
      
        50%
        base:    19278±14.9%
        patch:   22369±7.2%
      
        75%
        base:    16822±3.0%
        patch:   17086±2.3%
      
        100%
        base:    18216±0.6%
        patch:   18078±2.9%
      Reported-by: Nitin Tekchandani <nitin.tekchandani@intel.com>
      Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: David Vernet <void@manifault.com>
      Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Tested-by: Swapnil Sapkal <Swapnil.Sapkal@amd.com>
      Link: https://lkml.kernel.org/r/20230912065808.2530-2-aaron.lu@intel.com
    • freezer,sched: Use saved_state to reduce some spurious wakeups · 8f0eed4a
      Elliot Berman authored
      After commit f5d39b02 ("freezer,sched: Rewrite core freezer logic"),
      tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN are
      always woken up on the thaw path. Prior to that commit, tasks could ask
      the freezer to consider them "frozen enough" via freezer_do_not_count(). The
      commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which
      allows the freezer to immediately mark the task as TASK_FROZEN without
      waking it up. This is efficient for the suspend path, but on the
      thaw path the task is always woken up, even if it didn't need to
      wake up and simply goes back to its TASK_(UN)INTERRUPTIBLE state. Although
      these tasks are capable of handling the wakeup, we can observe a
      power/perf impact from the extra wakeup.
      
      On Android we observed many tasks waiting in the TASK_FREEZABLE state
      (particularly due to many of them being binder clients). We observed
      nearly 4x the number of tasks and a corresponding linear increase in
      latency and power consumption when thawing the system. The latency
      increased from ~15ms to ~50ms.
      
      Avoid the spurious wakeups by saving the state of TASK_FREEZABLE tasks.
      If the task was running before entering TASK_FROZEN state
      (__refrigerator()) or if the task received a wake up for the saved
      state, then the task is woken on thaw. saved_state from PREEMPT_RT locks
      can be re-used because freezer would not stomp on the rtlock wait flow:
      TASK_RTLOCK_WAIT isn't considered freezable.
      Reported-by: Prakash Viswalingam <quic_prakashv@quicinc.com>
      Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/core: Remove ifdeffery for saved_state · fbaa6a18
      Elliot Berman authored
      In preparation for freezer to also use saved_state, remove the
      CONFIG_PREEMPT_RT compilation guard around saved_state.
      
      On the arm64 platform I tested, which did not have CONFIG_PREEMPT_RT,
      applying this patch caused no statistically significant deviation.
      
      Test methodology:
      
      perf bench sched message -g 40 -l 40
      Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 15 Sep, 2023 9 commits
  9. 13 Sep, 2023 8 commits
  10. 10 Sep, 2023 3 commits
    • Linux 6.6-rc1 · 0bb80ecc
      Linus Torvalds authored
    • Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm · 1548b060
      Linus Torvalds authored
      Pull drm ci scripts from Dave Airlie:
       "This is a bunch of ci integration for the freedesktop gitlab instance
        where we currently do upstream userspace testing on diverse sets of
        GPU hardware. From my perspective I think it's an experiment worth
        going with and seeing how the benefits/noise playout keeping these
        files useful.
      
        Ideally I'd like to get this so we can do pre-merge testing on PRs
        eventually.
      
        Below is some info from danvet on why we've ended up making the
        decision and how we can roll it back if we decide it was a bad plan.
      
        Why in upstream?
      
         - like documentation, testcases, tools CI integration is one of these
           things where you can waste endless amounts of time if you
           accidentally have a version that doesn't match your source code
      
         - but also like the above, there's a balance, this is the initial cut
           of what we think makes sense to keep in sync vs out-of-tree,
           probably needs adjustment
      
         - gitlab supports out-of-repo gitlab integration and that's what's
           been used for the kernel in drm, but it results in per-driver
           fragmentation and lots of duplicated effort. the simple act of
           smashing an arbitrary winner into a topic branch already started
           surfacing patches on dri-devel and sparking good cross driver team
           discussions
      
        Why gitlab?
      
         - it's not any more shit than any of the other CI
      
         - drm userspace uses it extensively for everything in userspace, we
           have a lot of people and experience with this, including
           integration of hw testing labs
      
         - media userspace like gstreamer is also on gitlab.fd.o, and there's
           discussion to extend this to the media subsystem in some fashion
      
        Can this be shared?
      
         - there's definitely a pile of code that could move to scripts/ if
           other subsystem adopt ci integration in upstream kernel git. other
           bits are more drm/gpu specific like the igt-gpu-tests/tools
           integration
      
         - docker images can be run locally or in other CI runners
      
        Will we regret this?
      
         - it's all in one directory, intentionally, for easy deletion
      
         - probably 1-2 years in upstream to see whether this is worth it or a
           Big Mistake. that's roughly what it took to _really_ roll out solid
           CI in the bigger userspace projects we have on gitlab.fd.o like
           mesa3d"
      
      * tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm:
        drm: ci: docs: fix build warning - add missing escape
        drm: Add initial ci/ subdirectory
    • Merge tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e56b2b60
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "Fix preemption delays in the SGX code, remove unnecessarily
        UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
        make the x86 SMP init code a bit more conservative to fix kexec()
        lockups"
      
      * tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
        x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
        x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
        x86/smp: Don't send INIT to non-present and non-booted CPUs