1. 08 Aug, 2024 3 commits
    • sched_ext: Improve logging around enable/disable · 344576fa
      Tejun Heo authored
      sched_ext currently doesn't generate messages when the BPF scheduler is
      enabled and disabled unless there are errors. It is useful to have a paper
      trail. Improve logging around enable/disable:
      
      - Generate info messages on enable and non-error disable.
      
      - Update error exit message formatting so that it's consistent with the
        non-error message. Also, prefix ei->msg with the BPF scheduler's name to
        make it clear where the message is coming from.
      
      - Shorten scx_exit_reason() strings for SCX_EXIT_UNREG* for brevity and
        consistency.
      
      v2: Use pr_*() instead of KERN_* consistently. (David)
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Phil Auld <pauld@redhat.com>
      Reviewed-by: Phil Auld <pauld@redhat.com>
      Acked-by: David Vernet <void@manifault.com>
    • sched_ext: Make scx_rq_online() also test cpu_active() in addition to SCX_RQ_ONLINE · 991ef53a
      Tejun Heo authored
      scx_rq_online() currently only tests SCX_RQ_ONLINE. This isn't fully correct.
      For example, consume_dispatch_q() uses task_run_on_remote_rq(), which tests
      scx_rq_online() to see whether the current rq can run the task and, if so,
      calls consume_remote_task() to migrate the task to @rq. While the test
      itself is done with @rq locked, consume_remote_task() can temporarily
      unlock @rq, and nothing prevents @rq from going offline (clearing
      SCX_RQ_ONLINE) before the migration takes place.
      
      To address the issue, add cpu_active() test to scx_rq_online(). There is a
      synchronize_rcu() between cpu_active() being cleared and the rq going
      offline, so if an on-going scheduling operation sees cpu_active(), the
      associated rq is guaranteed to not go offline until the scheduling operation
      is complete.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 60c27fb5 ("sched_ext: Implement sched_ext_ops.cpu_online/offline()")
      Acked-by: David Vernet <void@manifault.com>
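
The change to scx_rq_online() described above can be modelled in userspace roughly as follows. This is a minimal sketch, not the kernel code: `struct rq_model`, `cpu_active_model[]`, `rq_online_old()` and `rq_online_new()` are hypothetical stand-ins, and the flag value is chosen only for illustration.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-ins for kernel state (assumptions). */
#define SCX_RQ_ONLINE	(1u << 0)	/* flag value chosen for illustration */
#define NR_CPUS_MODEL	4

struct rq_model {
	int cpu;			/* CPU this runqueue belongs to */
	unsigned int scx_flags;		/* models rq->scx.flags */
};

/* Models cpu_active(): cleared early in the CPU offline sequence. */
static bool cpu_active_model[NR_CPUS_MODEL];

/* Before the fix: only the runqueue's own flag is consulted. */
static bool rq_online_old(const struct rq_model *rq)
{
	return (rq->scx_flags & SCX_RQ_ONLINE) != 0;
}

/* After the fix: the CPU must also still be marked active. */
static bool rq_online_new(const struct rq_model *rq)
{
	return (rq->scx_flags & SCX_RQ_ONLINE) != 0 &&
	       cpu_active_model[rq->cpu];
}
```

In this model, a CPU that has cleared its active bit but whose runqueue flag is still set is accepted by the old test and rejected by the new one, which is the window the commit message describes.
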
    • sched_ext: Fix unsafe list iteration in process_ddsp_deferred_locals() · 72763ea3
      Tejun Heo authored
      process_ddsp_deferred_locals() executes deferred direct dispatches to the
      local DSQs of remote CPUs. It iterates the tasks on
      rq->scx.ddsp_deferred_locals list, removing and calling
      dispatch_to_local_dsq() on each. However, the list is protected by the rq
      lock that can be dropped by dispatch_to_local_dsq() temporarily, so the list
      can be modified during the iteration, which can lead to oopses and other
      failures.
      
      Fix it by popping from the head of the list instead of iterating the list.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 5b26f7b9 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches")
      Acked-by: David Vernet <void@manifault.com>
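
The "pop from the head" pattern can be illustrated with a small userspace list. This is a sketch under stated assumptions rather than the kernel implementation: `struct node`, the list helpers, `dispatch()` and `drain()` are hypothetical, and the concurrent modification that a dropped rq lock would allow is simulated inside `dispatch()`.

```c
#include <stddef.h>

/* Minimal circular doubly linked list; a stand-in for the kernel's list_head. */
struct node {
	struct node *prev, *next;
	int id;
	int dispatched;
};

static void list_init(struct node *n)
{
	n->prev = n->next = n;
}

static void list_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Detach and return the first entry, or NULL when the list is empty. */
static struct node *list_pop_head(struct node *head)
{
	struct node *n = head->next;

	if (n == head)
		return NULL;
	list_del_init(n);
	return n;
}

/*
 * Hypothetical dispatch step. While handling entry 1 it removes whatever
 * is now first on the list, simulating another context modifying the list
 * while the lock is dropped.
 */
static void dispatch(struct node *head, struct node *n)
{
	n->dispatched = 1;
	if (n->id == 1 && head->next != head)
		list_del_init(head->next);
}

/*
 * Drain by re-reading the head on every pass. No iterator state (such as
 * a cached "next" pointer) survives across dispatch(), so entries that
 * disappear in the meantime are simply never visited.
 */
static int drain(struct node *head)
{
	struct node *n;
	int handled = 0;

	while ((n = list_pop_head(head)) != NULL) {
		dispatch(head, n);
		handled++;
	}
	return handled;
}
```

An iterator that cached the next entry before calling `dispatch()` would, in this scenario, continue from an entry that is no longer on the list.
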
  2. 06 Aug, 2024 6 commits
  3. 04 Aug, 2024 1 commit
    • Merge branch 'sched/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-6.12 · 0df340ce
      Tejun Heo authored
      Pull tip/sched/core to resolve the following four conflicts. While 2-4 are
      simple context conflicts, 1 is a bit subtle and easy to resolve incorrectly.
      
      1. 2c8d046d ("sched: Add normal_policy()")
         vs.
         faa42d29 ("sched/fair: Make SCHED_IDLE entity be preempted in strict hierarchy")
      
      The former converts the direct test on p->policy to use the helper
      normal_policy(). The latter moves the p->policy test to a different
      location. Resolve by converting the test on p->policy in the new location to
      use normal_policy().
      
      2. a7a9fc54 ("sched_ext: Add boilerplate for extensible scheduler class")
         vs.
         a110a81c ("sched/deadline: Deferrable dl server")
      
      Both add calls to put_prev_task_idle() and set_next_task_idle(). Simple
      context conflict. Resolve by taking changes from both.
      
      3. a7a9fc54 ("sched_ext: Add boilerplate for extensible scheduler class")
         vs.
         c2459100 ("sched/core: Add clearing of ->dl_server in put_prev_task_balance()")
      
      The former changes for_each_class() iteration to use for_each_active_class().
      The latter moves away the adjacent dl_server handling code. Simple context
      conflict. Resolve by taking changes from both.
      
      4. 60c27fb5 ("sched_ext: Implement sched_ext_ops.cpu_online/offline()")
         vs.
         31b164e2 ("sched/smt: Introduce sched_smt_present_inc/dec() helper")
         2f027354 ("sched/core: Introduce sched_set_rq_on/offline() helper")
      
      The former adds scx_rq_deactivate() call. The latter two change code around
      it. Simple context conflict. Resolve by taking changes from both.
      Signed-off-by: Tejun Heo <tj@kernel.org>
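
One plausible reason conflict 1 is easy to resolve incorrectly is that keeping the open-coded p->policy comparison in its new location still compiles. The sketch below illustrates the difference in userspace. It assumes, as an illustration only, that the helper also accepts the sched_ext policy; the `POLICY_*` constants and both functions are hypothetical models, not the kernel's definitions.

```c
#include <stdbool.h>

/* Model policy numbers (assumed for illustration, not taken from the source). */
#define POLICY_NORMAL	0
#define POLICY_IDLE	5
#define POLICY_EXT	7

/* Open-coded test, as it would look if the moved code were taken verbatim. */
static bool is_normal_open_coded(int policy)
{
	return policy == POLICY_NORMAL;
}

/*
 * Model of a normal_policy()-style helper. Assumption: the helper treats
 * the sched_ext policy like the normal policy.
 */
static bool normal_policy_model(int policy)
{
	return policy == POLICY_NORMAL || policy == POLICY_EXT;
}
```

Under that assumption the two tests agree for every policy except the sched_ext one, so a resolution that keeps the open-coded form would build cleanly and differ only for sched_ext tasks.
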
  4. 02 Aug, 2024 1 commit
  5. 01 Aug, 2024 1 commit
  6. 31 Jul, 2024 2 commits
    • scx/selftests: Verify we can call create_dsq from prog_run · 958b1891
      David Vernet authored
      We already have some testcases verifying that we can call
      BPF_PROG_TYPE_SYSCALL progs and invoke scx_bpf_exit(). Let's extend that to
      also call scx_bpf_create_dsq() so we get coverage for that as well.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • scx: Allow calling sleepable kfuncs from BPF_PROG_TYPE_SYSCALL · 298dec19
      David Vernet authored
      We currently only allow calling sleepable scx kfuncs (i.e.
      scx_bpf_create_dsq()) from BPF_PROG_TYPE_STRUCT_OPS progs. The idea here
      was that we'd never have to call scx_bpf_create_dsq() outside of a
      sched_ext struct_ops callback, but that might not actually be true. For
      example, a scheduler could do something like the following:
      
      1. Open and load (not yet attach) a scheduler skel
      
      2. Synchronously call into a BPF_PROG_TYPE_SYSCALL prog from user space.
         For example, to initialize an LLC domain, or some other global,
         read-only state.
      
      3. Attach the skel, which actually enables the scheduler
      
      The advantage of doing this is that it can preclude having to do pretty
      ugly boilerplate like initializing a read-only, statically sized array of
      u64[]'s which the kernel consumes literally once at init time to then
      create struct bpf_cpumask objects which are actually queried at runtime.
      
      Doing the above is already possible given that we can invoke core BPF
      kfuncs, such as bpf_cpumask_create(), from BPF_PROG_TYPE_SYSCALL progs. We
      already allow many scx kfuncs to be called from BPF_PROG_TYPE_SYSCALL progs
      (e.g. scx_bpf_kick_cpu()). Let's allow the sleepable kfuncs as well.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  7. 30 Jul, 2024 1 commit
  8. 29 Jul, 2024 20 commits
  9. 28 Jul, 2024 5 commits
    • Linux 6.11-rc1 · 8400291e
      Linus Torvalds authored
    • Merge tag 'kbuild-fixes-v6.11' of... · a0c04bd5
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix RPM package build error caused by an incorrect locale setup
      
       - Mark modules.weakdep as ghost in RPM package
      
       - Fix the odd combination of -S and -c in stack protector scripts,
         which is an error with the latest Clang
      
      * tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: Fix '-S -c' in x86 stack protector scripts
        kbuild: rpm-pkg: ghost modules.weakdep file
        kbuild: rpm-pkg: Fix C locale setup
    • minmax: simplify and clarify min_t()/max_t() implementation · 017fa3e8
      Linus Torvalds authored
      This simplifies the min_t() and max_t() macros by no longer making them
      work in the context of a C constant expression.
      
      That means that you can no longer use them for static initializers or
      for array sizes in type definitions, but there were only a couple of
      such uses, and all of them were converted (famous last words) to use
      MIN_T/MAX_T instead.
      
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
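
The distinction between the two macro families can be sketched in userspace as follows. These are simplified stand-ins, not the kernel's definitions: the lower-case macro is modelled with a GNU statement expression (GCC/Clang), and `HDR_LEN`, `BUF_LEN`, `scratch`, `limit` and `clamp_len()` are hypothetical names used only for the example.

```c
#include <stddef.h>

/* Simplified upper-case forms: plain conditional expressions, so they
 * remain C constant expressions when their arguments are constants. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Simplified lower-case form: a GNU statement expression with temporaries.
 * It is not a constant expression, so it only works inside functions. */
#define min_t(type, a, b) \
	({ type _a = (type)(a); type _b = (type)(b); _a < _b ? _a : _b; })

#define HDR_LEN	24	/* hypothetical sizes for illustration */
#define BUF_LEN	16

/* An array size at file scope needs a constant expression: MIN_T works. */
static char scratch[MIN_T(unsigned int, HDR_LEN, BUF_LEN)];

/* So does a static initializer. */
static const unsigned int limit = MAX_T(unsigned int, HDR_LEN, BUF_LEN);

/* Ordinary runtime use is where the lower-case form belongs. */
static unsigned int clamp_len(unsigned int len)
{
	return min_t(unsigned int, len, sizeof(scratch));
}
```

Replacing `MIN_T` with this `min_t` in the declaration of `scratch` would fail to compile, which mirrors the kind of use the commit says had to be converted.
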
    • minmax: add a few more MIN_T/MAX_T users · 4477b39c
      Linus Torvalds authored
      Commit 3a7e02c0 ("minmax: avoid overly complicated constant
      expressions in VM code") added the simpler MIN_T/MAX_T macros in order
      to avoid some excessive expansion from the rather complicated regular
      min/max macros.
      
      The complexity of those macros stems from two issues:
      
       (a) trying to use them in situations that require a C constant
           expression (in static initializers and for array sizes)
      
       (b) the type sanity checking
      
      and MIN_T/MAX_T avoids both of these issues.
      
      Now, in the whole (long) discussion about all this, it was pointed out
      that the whole type sanity checking is entirely unnecessary for
      min_t/max_t which get a fixed type that the comparison is done in.
      
      But that still leaves min_t/max_t unnecessarily complicated due to
      worries about the C constant expression case.
      
      However, it turns out that there really aren't very many cases that use
      min_t/max_t for this, and we can just force-convert those.
      
      This does exactly that.
      
      Which in turn will then allow for much simpler implementations of
      min_t()/max_t().  All the usual "macros in all upper case will evaluate
      the arguments multiple times" rules apply.
      
      We should do all the same things for the regular min/max() vs MIN/MAX()
      cases, but that has the added complexity of various drivers defining
      their own local versions of MIN/MAX, so that needs another level of
      fixes first.
      
      Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
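
The "evaluates the arguments multiple times" rule mentioned above can be demonstrated with a counter. As before, this is a hedged sketch: both macros are simplified models rather than the kernel's definitions, and `next_value()`, `calls_with_upper()` and `calls_with_lower()` are hypothetical helpers.

```c
/* Simplified models of the two forms (assumptions, not kernel code). */
#define MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))
#define max_t(type, a, b) \
	({ type _a = (type)(a); type _b = (type)(b); _a > _b ? _a : _b; })

static int calls;

/* An argument with a side effect: counts how often it is evaluated. */
static int next_value(void)
{
	calls++;
	return 10;
}

/* Upper-case form: the winning argument is evaluated a second time. */
static int calls_with_upper(void)
{
	calls = 0;
	(void)MAX_T(int, next_value(), 5);
	return calls;
}

/* Lower-case form: each argument is evaluated exactly once. */
static int calls_with_lower(void)
{
	calls = 0;
	(void)max_t(int, next_value(), 5);
	return calls;
}
```

This is why the upper-case forms are best reserved for constant arguments, where repeated evaluation has no observable effect.
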
    • Merge tag 'ubifs-for-linus-6.11-rc1-take2' of... · 7e2d0ba7
      Linus Torvalds authored
      Merge tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
      
      Pull UBI and UBIFS updates from Richard Weinberger:
      
       - Many fixes for power-cut issues by Zhihao Cheng
      
       - Another ubiblock error path fix
      
       - ubiblock section mismatch fix
      
       - Misc fixes all over the place
      
      * tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
        ubi: Fix ubi_init() ubiblock_exit() section mismatch
        ubifs: add check for crypto_shash_tfm_digest
        ubifs: Fix inconsistent inode size when powercut happens during appendant writing
        ubi: block: fix null-pointer-dereference in ubiblock_create()
        ubifs: fix kernel-doc warnings
        ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarity
        mtd: ubi: Restore missing cleanup on ubi_init() failure path
        ubifs: dbg_orphan_check: Fix missed key type checking
        ubifs: Fix unattached inode when powercut happens in creating
        ubifs: Fix space leak when powercut happens in linking tmpfile
        ubifs: Move ui->data initialization after initializing security
        ubifs: Fix adding orphan entry twice for the same inode
        ubifs: Remove insert_dead_orphan from replaying orphan process
        Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path"
        ubifs: Don't add xattr inode into orphan area
        ubifs: Fix unattached xattr inode if powercut happens after deleting
        mtd: ubi: avoid expensive do_div() on 32-bit machines
        mtd: ubi: make ubi_class constant
        ubi: eba: properly rollback inside self_check_eba