  1. 19 Apr, 2023 7 commits
    • x86: improve on the non-rep 'clear_user' function · 8c9b6a88
      Linus Torvalds authored
      The old version was oddly written to have the repeat count in multiple
      registers.  So instead of taking advantage of %rax being zero, it had
      some sub-counts in it.  All just for a "single word clearing" loop,
      which isn't even efficient to begin with.
      
      So get rid of those games, and just keep all the state in the same
      registers we got it in (and that we should return things in).  That not
      only makes this act much more like 'rep stos' (which this function is
      replacing), but makes it much easier to actually do the obvious loop
      unrolling.
      
      Also rename the function from the now nonsensical 'clear_user_original'
      to what it now clearly is: 'rep_stos_alternative'.
      
      End result: if we don't have a fast 'rep stosb', at least we can have a
      fast fallback for it.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
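The idea in the commit above (one byte count, one destination pointer, a value that stays zero, and an unrolled main loop) can be sketched in plain C. This is a hypothetical user-space model, not the kernel's assembly: the function name `rep_stos_alternative_model` is invented for illustration, and because nothing can fault here the returned "bytes left uncleared" count is always 0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of a 'rep stos'-like clearing fallback.
 * State is kept in exactly two variables (pointer and remaining count),
 * mirroring the "keep all the state in the same registers" idea.
 * Returns the number of bytes NOT cleared (always 0 in this model).
 */
static size_t rep_stos_alternative_model(void *dst, size_t count)
{
    unsigned char *p = dst;
    const uint64_t zero = 0;

    /* Main loop: eight 8-byte stores (64 bytes) per iteration.
     * The fixed-trip inner loop is one a compiler can unroll. */
    while (count >= 64) {
        for (int i = 0; i < 8; i++)
            memcpy(p + 8 * i, &zero, sizeof(zero));
        p += 64;
        count -= 64;
    }

    /* Remaining whole 8-byte words. */
    while (count >= 8) {
        memcpy(p, &zero, sizeof(zero));
        p += 8;
        count -= 8;
    }

    /* Trailing bytes. */
    while (count > 0) {
        *p++ = 0;
        count--;
    }

    return count;
}
```

Using `memcpy` for the 8-byte stores keeps the model free of alignment assumptions; the real routine additionally has to handle faults on user addresses, which this sketch ignores.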
    • x86: inline the 'rep movs' in user copies for the FSRM case · 577e6a7f
      Linus Torvalds authored
      This does the same thing for the user copies as commit 0db7058e
      ("x86/clear_user: Make it faster") did for clear_user().  In other
      words, it inlines the "rep movs" case when X86_FEATURE_FSRM is set,
      avoiding the function call entirely.
      
      In order to do that, it makes the calling convention for the out-of-line
      case ("copy_user_generic_unrolled") match the 'rep movs' calling
      convention, although it does also end up clobbering a number of
      additional registers.
      
      Also, to simplify code sharing in the low-level assembly with the
      __copy_user_nocache() function (that uses the normal C calling
      convention), we end up with a kind of mixed return value for the
      low-level asm code: it will return the result in both %rcx (to work as
      an alternative for the 'rep movs' case), _and_ in %rax (for the nocache
      case).
      
      We could avoid this by wrapping __copy_user_nocache() callers in an
      inline asm, but since the cost is just an extra register copy, it's
      probably not worth it.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
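A rough user-space sketch of the "inline 'rep movs' when FSRM is set, otherwise call out of line" pattern follows. Everything named here is an assumption made for illustration: `has_fsrm_hint` stands in for X86_FEATURE_FSRM, `copy_unrolled_model` for the out-of-line routine, and the runtime `if` replaces the kernel's boot-time alternatives patching. Both paths report the number of bytes not copied, which is what is left in the count register after 'rep movs'.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag standing in for X86_FEATURE_FSRM. */
static bool has_fsrm_hint;

/* Hypothetical out-of-line fallback: word copies, then a byte tail.
 * Returns bytes NOT copied (always 0 here, since nothing can fault). */
static size_t copy_unrolled_model(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= 8) {
        uint64_t w;
        memcpy(&w, s, sizeof(w));
        memcpy(d, &w, sizeof(w));
        d += 8;
        s += 8;
        len -= 8;
    }
    while (len > 0) {
        *d++ = *s++;
        len--;
    }
    return len;
}

/* Inline fast path with the same "remaining count" return convention. */
static inline size_t copy_model(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    if (has_fsrm_hint) {
        /* 'rep movsb' counts down in %rcx; 0 is left on completion. */
        __asm__ __volatile__("rep movsb"
                             : "+c"(len), "+D"(dst), "+S"(src)
                             :
                             : "memory");
        return len;
    }
#endif
    return copy_unrolled_model(dst, src, len);
}
```

Because both paths hand back the remaining count in the same way, a caller does not need to know which one ran, which is the point of matching the calling conventions.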
    • x86: move stac/clac from user copy routines into callers · 3639a535
      Linus Torvalds authored
      This is preparatory work for inlining the 'rep movs' case, but also a
      cleanup.  The __copy_user_nocache() function was mis-used by the rdma
      code to do uncached kernel copies that don't actually want user copies
      at all, and as a result doesn't want the stac/clac either.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: don't use REP_GOOD or ERMS for user memory clearing · d2c95f9d
      Linus Torvalds authored
      The modern target to use is FSRS (Fast Short REP STOS), and the other
      cases should only be used for bigger areas (ie mainly things like page
      clearing).
      
      Note! This changes the conditional for the inlining from FSRM ("fast
      short rep movs") to FSRS ("fast short rep stos").
      
      We'll have a separate fixup for AMD microarchitectures that have a good
      'rep stosb' yet do not set the new Intel-specific FSRS bit (because FSRM
      was there first).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
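For readers unfamiliar with the feature bits named in these commits, here is a small sketch of how ERMS, FSRM and FSRS are reported through CPUID leaf 7. The macro and function names are hypothetical and this is not how the kernel itself tracks features; the bit positions are the ones documented for CPUID (ERMS: leaf 7 subleaf 0, EBX bit 9; FSRM: leaf 7 subleaf 0, EDX bit 4; FSRS: leaf 7 subleaf 1, EAX bit 11). The probing helpers assume GCC or Clang on x86 and rely on `__get_cpuid_count` from `<cpuid.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the CPUID leaf 7 feature bits. */
#define CPUID7_0_EBX_ERMS (1u << 9)   /* Enhanced REP MOVSB/STOSB */
#define CPUID7_0_EDX_FSRM (1u << 4)   /* Fast Short REP MOVS      */
#define CPUID7_1_EAX_FSRS (1u << 11)  /* Fast Short REP STOS      */

/* Pure helper: is the feature bit set in the given register value? */
static inline bool reg_has(uint32_t reg, uint32_t mask)
{
    return (reg & mask) != 0;
}

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>

/* Probe the running CPU for FSRM (leaf 7, subleaf 0, EDX). */
static inline bool cpu_has_fsrm(void)
{
    unsigned int eax, ebx, ecx, edx;

    return __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx) &&
           reg_has(edx, CPUID7_0_EDX_FSRM);
}

/* Probe the running CPU for FSRS (leaf 7, subleaf 1, EAX). */
static inline bool cpu_has_fsrs(void)
{
    unsigned int eax, ebx, ecx, edx;

    return __get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx) &&
           reg_has(eax, CPUID7_1_EAX_FSRS);
}
#endif
```

FSRM and FSRS are separate bits in separate registers, so a CPU can report one without the other, which is the situation the commit message describes for some AMD parts.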
    • x86: don't use REP_GOOD or ERMS for user memory copies · adfcf423
      Linus Torvalds authored
      The modern target to use is FSRM (Fast Short REP MOVS), and the other
      cases should only be used for bigger areas (ie mainly things like page
      clearing).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: don't use REP_GOOD or ERMS for small memory clearing · 20f3337d
      Linus Torvalds authored
      The modern target to use is FSRS (Fast Short REP STOS), and the other
      cases should only be used for bigger areas (ie mainly things like page
      clearing).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: don't use REP_GOOD or ERMS for small memory copies · 68674f94
      Linus Torvalds authored
      The modern target to use is FSRM (Fast Short REP MOVS), and the other
      cases should only be used for bigger areas (ie mainly things like page
      copying and clearing).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 16 Apr, 2023 12 commits
  3. 15 Apr, 2023 6 commits
  4. 14 Apr, 2023 14 commits
  5. 13 Apr, 2023 1 commit
    • Merge tag 'cgroup-for-6.3-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 44149752
      Linus Torvalds authored
      Pull cgroup fixes from Tejun Heo:
       "This is a relatively big pull request this late in the cycle but the
        major contributor is the cpuset bug which is rather significant:
      
         - Fix several cpuset bugs including one where it wasn't applying the
           target cgroup when tasks are created with CLONE_INTO_CGROUP
      
        With a few smaller fixes:
      
         - Fix inversed locking order in cgroup1 freezer implementation
      
         - Fix garbage cpu.stat::core_sched.forceidle_usec reporting in the
           root cgroup"
      
      * tag 'cgroup-for-6.3-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup/cpuset: Make cpuset_attach_task() skip subpartitions CPUs for top_cpuset
        cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
        cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
        cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
        cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex
        cgroup/cpuset: Fix partition root's cpuset.cpus update bug
        cgroup: fix display of forceidle time at root