1. 20 May, 2022 1 commit
  2. 11 May, 2022 1 commit
  3. 22 Apr, 2022 4 commits
  4. 19 Apr, 2022 1 commit
  5. 11 Apr, 2022 1 commit
  6. 07 Apr, 2022 5 commits
  7. 04 Apr, 2022 1 commit
  8. 31 Mar, 2022 1 commit
  9. 29 Mar, 2022 1 commit
  10. 25 Mar, 2022 1 commit
  11. 22 Mar, 2022 4 commits
  12. 18 Mar, 2022 1 commit
  13. 15 Mar, 2022 4 commits
  14. 04 Mar, 2022 1 commit
      signal, x86: Delay calling signals in atomic on RT enabled kernels · bf9ad37d
      Oleg Nesterov authored
      
      On x86_64 we must disable preemption before we enable interrupts
      for stack faults, int3 and debugging, because the current task is using
      a per CPU debug stack defined by the IST. If we schedule out, another task
      can come in and use the same stack and cause the stack to be corrupted
      and crash the kernel on return.
      
      When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
      one of these is the spin lock used in signal handling.
      
      Some of the debug code (int3) causes do_trap() to send a signal.
      This function takes a spinlock_t lock that has been converted to a
      sleeping lock. If this happens, the stack-corruption issues described
      above become possible.
      
      Instead of sending the signal right away, on PREEMPT_RT and x86 the
      signal information is stored in the task's task_struct and
      TIF_NOTIFY_RESUME is set. Then, on exit from the trap, the signal
      resume code sends the signal once preemption is enabled again.
      
      [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
        ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
      [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
      [ tglx: Use a config option ]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de
  15. 02 Mar, 2022 1 commit
  16. 28 Feb, 2022 1 commit
      x86/Kconfig: move and modify CONFIG_I8K · a7a6f65a
      Mateusz Jończyk authored
      
      In Kconfig, inside the "Processor type and features" menu, there is
      the CONFIG_I8K option: "Dell i8k legacy laptop support". This is
      very confusing - enabling CONFIG_I8K is not required for the kernel to
      support old Dell laptops. This option is specific to the dell-smm-hwmon
      driver, which mostly exports some hardware monitoring information and
      allows the user to change fan speed.
      
      This option is misplaced, so move CONFIG_I8K to drivers/hwmon/Kconfig,
      where it belongs.
      
      Also, modify the dependency order - change
              select SENSORS_DELL_SMM
      to
              depends on SENSORS_DELL_SMM
      as it is just a configuration option of dell-smm-hwmon. This includes
      changing the option type from tristate to bool. It was tristate because
      it could select CONFIG_SENSORS_DELL_SMM=m.
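      The resulting option shape, roughly (a sketch of the moved entry, not
      the verbatim patch; the help text here is illustrative):

```
config I8K
	bool "Legacy /proc/i8k interface of Dell laptops"
	depends on SENSORS_DELL_SMM
	help
	  Enables the legacy /proc/i8k interface exported by the
	  dell-smm-hwmon driver. Note: bool, not tristate, since it is
	  now just a configuration option of dell-smm-hwmon.
```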
      
      When running "make oldconfig" on configurations with
      CONFIG_SENSORS_DELL_SMM enabled, this change will result in an
      additional question (which could be printed several times during
      bisecting). I think that tidying up the configuration is worth it,
      though.
      
      Next patch tweaks the description of CONFIG_I8K.
      Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
      Cc: Pali Rohár <pali@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jean Delvare <jdelvare@suse.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Mark Gross <markgross@kernel.org>
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: Borislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/20220212125654.357408-1-mat.jonczyk@o2.pl
      
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
  17. 26 Feb, 2022 1 commit
      usercopy: Check valid lifetime via stack depth · 2792d84e
      Kees Cook authored
      One of the things that CONFIG_HARDENED_USERCOPY sanity-checks is whether
      an object that is about to be copied to/from userspace is overlapping
      the stack at all. If it is, it performs a number of inexpensive
      bounds checks. One of the finer-grained checks is whether an object
      crosses stack frames within the stack region. Doing this on x86 with
      CONFIG_FRAME_POINTER was cheap/easy. Doing it with ORC was deemed too
      heavy, and was left out (a while ago), leaving the coarser whole-stack
      check.
      
      The LKDTM tests USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM
      try to exercise these cross-frame cases to validate the defense is
      working. They have been failing ever since ORC was added (which was
      expected). While Muhammad was investigating various LKDTM failures[1],
      he asked me for additional details on them, and I realized that when
      exact stack frame boundary checking is not available (i.e. everything
      except x86 with FRAME_POINTER), it could check if a stack object is at
      least "current depth valid", in the sense that any object within the
      stack region but not between start-of-stack and current_stack_pointer
      should be considered unavailable (i.e. its lifetime is from a call no
      longer present on the stack).
      
      Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures
      have actually implemented the common global register alias.
      
      Additionally report usercopy bounds checking failures with an offset
      from current_stack_pointer, which may assist with diagnosing failures.
      
      The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests
      (once slightly adjusted in a separate patch) pass again with this fixed.
      
      [1] https://github.com/kernelci/kernelci-project/issues/84
      
      
      
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm@kvack.org
      Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      ---
      v1: https://lore.kernel.org/lkml/20220216201449.2087956-1-keescook@chromium.org
      v2: https://lore.kernel.org/lkml/20220224060342.1855457-1-keescook@chromium.org
      v3: https://lore.kernel.org/lkml/20220225173345.3358109-1-keescook@chromium.org
      v4: - improve commit log (akpm)
  18. 19 Feb, 2022 1 commit
      sched/preempt: Add PREEMPT_DYNAMIC using static keys · 99cf983c
      Mark Rutland authored
      
      Where an architecture selects HAVE_STATIC_CALL but not
      HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
      which will either branch to a callee or return to the caller.
      
      On such architectures, a number of constraints can conspire to make
      those trampolines more complicated and potentially less useful than we'd
      like. For example:
      
      * Hardware and software control flow integrity schemes can require the
        addition of "landing pad" instructions (e.g. `BTI` for arm64), which
        will also be present at the "real" callee.
      
      * Limited branch ranges can require that trampolines generate or load an
        address into a register and perform an indirect branch (or at least
        have a slow path that does so). This loses some of the benefits of
        having a direct branch.
      
      * Interaction with SW CFI schemes can be complicated and fragile, e.g.
        requiring that we can recognise idiomatic codegen and remove
        indirections, at least until clang provides more helpful mechanisms
        for dealing with this.
      
      For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
      really only need to enable/disable specific preemption functions. We can
      achieve the same effect without a number of the pain points above by
      using static keys to fold early returns into the preemption functions
      themselves rather than in an out-of-line trampoline, effectively
      inlining the trampoline into the start of the function.
      
      For arm64, this results in good code generation. For example, the
      dynamic_cond_resched() wrapper looks as follows when enabled. When
      disabled, the first `B` is replaced with a `NOP`, resulting in an early
      return.
      
      | <dynamic_cond_resched>:
      |        bti     c
      |        b       <dynamic_cond_resched+0x10>     // or `nop`
      |        mov     w0, #0x0
      |        ret
      |        mrs     x0, sp_el0
      |        ldr     x0, [x0, #8]
      |        cbnz    x0, <dynamic_cond_resched+0x8>
      |        paciasp
      |        stp     x29, x30, [sp, #-16]!
      |        mov     x29, sp
      |        bl      <preempt_schedule_common>
      |        mov     w0, #0x1
      |        ldp     x29, x30, [sp], #16
      |        autiasp
      |        ret
      
      ... compared to the regular form of the function:
      
      | <__cond_resched>:
      |        bti     c
      |        mrs     x0, sp_el0
      |        ldr     x1, [x0, #8]
      |        cbz     x1, <__cond_resched+0x18>
      |        mov     w0, #0x0
      |        ret
      |        paciasp
      |        stp     x29, x30, [sp, #-16]!
      |        mov     x29, sp
      |        bl      <preempt_schedule_common>
      |        mov     w0, #0x1
      |        ldp     x29, x30, [sp], #16
      |        autiasp
      |        ret
      
      Any architecture which implements static keys should be able to use this
      to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
      calls. Since this is likely to have greater overhead than (inlined)
      static calls, PREEMPT_DYNAMIC is only defaulted to enabled when
      HAVE_PREEMPT_DYNAMIC_CALL is selected.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com
  19. 08 Feb, 2022 1 commit
  20. 28 Jan, 2022 1 commit
      ftrace: Have architectures opt-in for mcount build time sorting · 4ed308c4
      Steven Rostedt (Google) authored
      First S390 complained that the sorting of the mcount sections at build
      time caused the kernel to crash on their architecture. Now PowerPC is
      complaining about it too. And also ARM64 appears to be having issues.
      
      It may be necessary to also update the relocation table for the values
      in the mcount table. Not only do we have to sort the table, but also
      update the relocations that may be applied to the items in the table.
      
      If the system is not relocatable, then it is fine to sort, but if it is,
      some architectures may have issues (although x86 does not as it shifts all
      addresses the same).
      
      Add a HAVE_BUILDTIME_MCOUNT_SORT that an architecture can set to say it is
      safe to do the sorting at build time.
      
      Also make the build-time sorting in the sorttable code in scripts/
      depend on CONFIG_BUILDTIME_MCOUNT_SORT.
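      The opt-in shape being described looks roughly like this (a sketch;
      option placement and the exact depends line may differ from the
      final patch):

```
config HAVE_BUILDTIME_MCOUNT_SORT
	bool
	help
	  An architecture selects this if it is safe to sort the
	  mcount_loc section at build time.

config BUILDTIME_MCOUNT_SORT
	bool
	default y
	depends on HAVE_BUILDTIME_MCOUNT_SORT && DYNAMIC_FTRACE
```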
      
      Link: https://lore.kernel.org/all/944D10DA-8200-4BA9-8D0A-3BED9AA99F82@linux.ibm.com/
      Link: https://lkml.kernel.org...
  21. 23 Jan, 2022 1 commit
  22. 20 Jan, 2022 2 commits
      kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR · bece04b5
      Marco Elver authored
      Until recent versions of GCC and Clang, it was not possible to disable
      KCOV instrumentation via a function attribute.  The relevant function
      attribute was introduced in 540540d0 ("kcov: add
      __no_sanitize_coverage to fix noinstr for all architectures").
      
      x86 was the first architecture to want a working noinstr, and at the
      time no compiler support for the attribute existed yet.  Therefore,
      commit 0f1441b4 ("objtool: Fix noinstr vs KCOV") introduced the
      ability to NOP __sanitizer_cov_*() calls in .noinstr.text.
      
      However, this doesn't work for other architectures like arm64 and s390
      that want a working noinstr per ARCH_WANTS_NO_INSTR.
      
      At the time of 0f1441b4, we didn't yet have ARCH_WANTS_NO_INSTR,
      but now we can move the Kconfig dependency checks to the generic KCOV
      option.  KCOV will be available if:
      
      	- architecture does not care about noinstr, OR
      	- we have objtool support (like on x86), OR
      	- GCC is 12.0 or newer, OR
      	- Clang is 13.0 or newer.
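      Expressed as a Kconfig dependency, the rules above come out roughly
      as follows (a sketch; the surrounding KCOV option text is elided):

```
config KCOV
	bool "Code coverage for fuzzing"
	depends on ARCH_HAS_KCOV
	# KCOV is usable if noinstr doesn't matter, objtool can NOP the
	# calls, or the compiler supports __no_sanitize_coverage:
	depends on !ARCH_WANTS_NO_INSTR || HAVE_NOINSTR_HACK || \
		   GCC_VERSION >= 120000 || CLANG_VERSION >= 130000
```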
      
      Link: https://lkml.kernel.org/r/20211201152604.3984495-1-elver@google.com
      
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      mm: percpu: generalize percpu related config · 7ecd19cf
      Kefeng Wang authored
      Patch series "mm: percpu: Cleanup percpu first chunk function".
      
      When supporting the page mapping percpu first chunk allocator on
      arm64, we found there is a lot of duplicated code in the percpu
      embed/page first chunk allocators.  This patchset aims to clean them
      up and should introduce no functional change.
      
      The current support status of 'embed' and 'page' across
      architectures is shown below:
      
      	embed: NEED_PER_CPU_EMBED_FIRST_CHUNK
      	page:  NEED_PER_CPU_PAGE_FIRST_CHUNK
      
      		embed	page
      	------------------------
      	arm64	  Y	 Y
      	mips	  Y	 N
      	powerpc	  Y	 Y
      	riscv	  Y	 N
      	sparc	  Y	 Y
      	x86	  Y	 Y
      	------------------------
      
      There are two interfaces about percpu first chunk allocator,
      
       extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                      size_t atom_size,
                                      pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
      -                               pcpu_fc_alloc_fn_t alloc_fn,
      -                               pcpu_fc_free_fn_t free_fn);
      +                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
      
       extern int __init pcpu_page_first_chunk(size_t reserved_size,
      -                               pcpu_fc_alloc_fn_t alloc_fn,
      -                               pcpu_fc_free_fn_t free_fn,
      -                               pcpu_fc_populate_pte_fn_t populate_pte_fn);
      +                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
      
      The pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t callbacks are killed;
      generic pcpu_fc_alloc() and pcpu_fc_free() functions are provided
      instead and are called from pcpu_embed/page_first_chunk().
      
      1) For pcpu_embed_first_chunk(), a pcpu_fc_cpu_to_node_fn_t needs to
         be provided when the arch supports NUMA.
      
      2) For pcpu_page_first_chunk(), the pcpu_fc_populate_pte_fn_t is
         killed too; a generic pcpu_populate_pte() marked '__weak' is
         provided.  If an arch (like x86) needs a different function to
         populate the PTEs, it should provide its own implementation.
      
      [1] https://github.com/kevin78/linux.git percpu-cleanup
      
      This patch (of 4):
      
      The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
      NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs have
      duplicate definitions on the platforms that subscribe to them.
      
      Move them into mm, drop these redundant definitions, and instead
      just select them on the applicable platforms.
      
      Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
      
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 15 Jan, 2022 2 commits
  24. 11 Dec, 2021 1 commit
  25. 09 Dec, 2021 1 commit
      x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node · 50468e43
      Jarkko Sakkinen authored
      
      == Problem ==
      
      The amount of SGX memory on a system is determined by the BIOS and it
      varies wildly between systems.  It can be as small as dozens of MB's
      and as large as many GB's on servers.  Just like how applications need
      to know how much regular RAM is available, enclave builders need to
      know how much SGX memory an enclave can consume.
      
      == Solution ==
      
      Introduce a new sysfs file:
      
      	/sys/devices/system/node/nodeX/x86/sgx_total_bytes
      
      to enumerate the amount of SGX memory available in each NUMA node.
      This serves the same function for SGX as /proc/meminfo or
      /sys/devices/system/node/nodeX/meminfo does for normal RAM.
      
      'sgx_total_bytes' is needed today to help drive the SGX selftests.
      SGX-specific swap code is exercised by creating overcommitted enclaves
      which are larger than the physical SGX memory on the system.  They
      currently use a CPUID-based approach which can diverge from the actual
      amount of SGX memory available.  'sgx_total_bytes' ensures that the
      selftests can work efficiently and do not attempt stupid things like
      creating a 100,000 MB enclave on a system with 128 MB of SGX memory.
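      With the new attribute in place, the per-node sizes can be summed
      from user space; a minimal sketch (the sysfs path comes from this
      patch; on kernels or machines without SGX the glob matches nothing
      and the total stays 0):

```shell
#!/bin/sh
# Sum SGX EPC memory across NUMA nodes via the new sysfs attribute.
total=0
for f in /sys/devices/system/node/node*/x86/sgx_total_bytes; do
    [ -r "$f" ] || continue          # glob unmatched or unreadable: skip
    total=$((total + $(cat "$f")))
done
echo "SGX EPC total: ${total} bytes"
```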
      
      == Implementation Details ==
      
      Introduce a CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
      arch-specific attribute group, and add an attribute for the amount of
      SGX memory in bytes to each NUMA node.
      
      == ABI Design Discussion ==
      
      As opposed to the per-node ABI, a single, global ABI was considered.
      However, this would prevent enclaves from being able to size
      themselves so that they fit on a single NUMA node.  Essentially, a
      single value would rule out NUMA optimizations for enclaves.
      
      Create a new "x86/" directory inside each "nodeX/" sysfs directory.
      'sgx_total_bytes' is expected to be the first of at least a few
      sgx-specific files to be placed in the new directory.  Just scanning
      /proc/meminfo, these are the no-brainers that we have for RAM, but we
      need for SGX:
      
      	MemTotal:       xxxx kB // sgx_total_bytes (implemented here)
      	MemFree:        yyyy kB // sgx_free_bytes
      	SwapTotal:      zzzz kB // sgx_swapped_bytes
      
      So, at *least* three.  I think we will eventually end up needing
      something more along the lines of a dozen.  A new directory (as
      opposed to the nodeX/ "root" directory) avoids cluttering the root
      with several "sgx_*" files.
      
      Place the new file in a new "nodeX/x86/" directory because SGX is
      highly x86-specific.  It is very unlikely that any other architecture
      (or even non-Intel x86 vendor) will ever implement SGX.  Using "sgx/"
      as opposed to "x86/" was also considered.  But, there is a real chance
      this can get used for other arch-specific purposes.
      
      [ dhansen: rewrite changelog ]
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org