1. 28 Mar, 2023 1 commit
• atomics: Provide rcuref - scalable reference counting · ee1ee6db (Thomas Gleixner)
      
      atomic_t based reference counting, including refcount_t, uses
      atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is
implemented with an atomic_try_cmpxchg() loop. High contention of the
      reference count leads to retry loops and scales badly. There is nothing to
      improve on this implementation as the semantics have to be preserved.
      
      Provide rcuref as a scalable alternative solution which is suitable for RCU
      managed objects. Similar to refcount_t it comes with overflow and underflow
      detection and mitigation.
      
      rcuref treats the underlying atomic_t as an unsigned integer and partitions
      this space into zones:
      
  0x00000000 - 0x7FFFFFFF  valid zone (1 .. (INT_MAX + 1) references)
  0x80000000 - 0xBFFFFFFF  saturation zone
  0xC0000000 - 0xFFFFFFFE  dead zone
  0xFFFFFFFF               no reference
      
      rcuref_get() unconditionally increments the reference count with
      atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the
      reference count with atomic_add_negative_release().
      
      This unconditional increment avoids the inc_not_zero() problem, but
      requires a more complex implementation on the put() side when the count
      drops from 0 to -1.
      
When this transition is detected, rcuref_put() attempts to mark the
reference count dead by setting it to the midpoint of the dead zone with a
single atomic_cmpxchg_release() operation. This operation can fail due to a
concurrent rcuref_get() elevating the reference count from -1 to 0 again.
      
If the unconditional increment in rcuref_get() hits a reference count which
is marked dead (or saturated), it will detect it after the fact and bring
the reference count back to the midpoint of the respective zone. The zones
provide enough tolerance that it is practically impossible to escape from
a zone.
      
The racy implementation of rcuref_put() requires protecting rcuref_put()
against a grace period ending, in order to prevent a subtle use-after-free.
As RCU is the only mechanism which allows protecting against that, it
is not possible to fully replace the atomic_inc_not_zero() based
implementation of refcount_t with this scheme.
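
A minimal usage sketch for an RCU-managed object (struct obj and lookup()
here are hypothetical; as noted above, the final put must be protected
against a concurrent grace period):

	struct obj {
		rcuref_t ref;
		struct rcu_head rcu;
	};

	/* Acquire: only valid inside an RCU read side critical section. */
	rcu_read_lock();
	o = lookup(key);
	if (o && !rcuref_get(&o->ref))
		o = NULL;	/* Count is dead; object is on its way out */
	rcu_read_unlock();

	/* Release: rcuref_put() returns true for the last reference. */
	if (rcuref_put(&o->ref))
		kfree_rcu(o, rcu);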
      
The final drop is slightly more expensive than the atomic_dec_return()
counterpart, but that's not the case which this is optimized for. The
optimization is for the high-frequency get()/put() pairs and their
scalability.
      
      The performance of an uncontended rcuref_get()/put() pair where the put()
      is not dropping the last reference is still on par with the plain atomic
      operations, while at the same time providing overflow and underflow
      detection and mitigation.
      
      The performance of rcuref compared to plain atomic_inc_not_zero() and
      atomic_dec_return() based reference counting under contention:
      
 -  Micro benchmark: All CPUs running an increment/decrement loop on an
          elevated reference count, which means the 0 to -1 transition never
          happens.
      
    The performance gain depends on microarchitecture and the number of
    CPUs and has been observed in the range of 1.3X to 4.7X.
      
 -  Conversion of dst_entry::__refcnt to rcuref and testing with the
          localhost memtier/memcached benchmark. That benchmark shows the
          reference count contention prominently.
      
    The performance gain depends on microarchitecture and the number of
    CPUs and has been observed in the range of 1.1X to 2.6X over the
    previous fix for the false sharing issue in struct
    dst_entry::__refcnt.
      
          When memtier is run over a real 1Gb network connection, there is a
          small gain on top of the false sharing fix. The two changes combined
          result in a 2%-5% total gain for that networked test.
Reported-by: Wangyang Guo <wangyang.guo@intel.com>
Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20230323102800.158429195@linutronix.de
2. 19 Mar, 2023 1 commit
• dyndbg: cleanup dynamic usage in ib_srp.c · 7ce93729 (Jason Baron)
Currently, in dynamic_debug.h we only provide
DEFINE_DYNAMIC_DEBUG_METADATA() and DYNAMIC_DEBUG_BRANCH()
definitions if CONFIG_DYNAMIC_DEBUG_CORE is enabled. Thus, drivers
such as infiniband srp (see: drivers/infiniband/ulp/srp/ib_srp.c)
must provide their own definitions for !CONFIG_DYNAMIC_DEBUG_CORE.

Thus, let's move this !CONFIG_DYNAMIC_DEBUG_CORE case into
dynamic_debug.h. However, the dynamic debug interfaces should really
only be defined if CONFIG_DYNAMIC_DEBUG is set, or if
CONFIG_DYNAMIC_DEBUG_CORE is set along with DYNAMIC_DEBUG_MODULE (see:
Documentation/admin-guide/dynamic-debug-howto.rst). Thus, the
undefined case becomes: !(CONFIG_DYNAMIC_DEBUG ||
(CONFIG_DYNAMIC_DEBUG_CORE && DYNAMIC_DEBUG_MODULE)).
With those changes in place, we can remove the !CONFIG_DYNAMIC_DEBUG_CORE
case from ib_srp.c.
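
In outline, the header guard then takes this shape (a sketch; the real
definitions and the no-op stubs are elided):

	#if defined(CONFIG_DYNAMIC_DEBUG) || \
	    (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
	/* real DEFINE_DYNAMIC_DEBUG_METADATA() / DYNAMIC_DEBUG_BRANCH() */
	#else
	/* no-op fallbacks, so drivers such as ib_srp.c need no local copies */
	#endif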
      
This change was prompted by a build breakage in ib_srp.c stemming
from the inclusion of dynamic_debug.h unconditionally in module.h, due
to commit 7deabd67 ("dyndbg: use the module notifier callbacks").
In that case, if we have CONFIG_DYNAMIC_DEBUG_CORE=y and
CONFIG_DYNAMIC_DEBUG=n then the definitions for
DEFINE_DYNAMIC_DEBUG_METADATA() and DYNAMIC_DEBUG_BRANCH() are defined
once in ib_srp.c and then again in dynamic_debug.h. This had been
working prior to the above referenced commit because dynamic_debug.h
was only pulled into ib_srp.c conditionally via printk.h if
CONFIG_DYNAMIC_DEBUG was set.
      
Also, the exported functions in lib/dynamic_debug.c itself may
not have a prototype if CONFIG_DYNAMIC_DEBUG=n and
CONFIG_DYNAMIC_DEBUG_CORE=y. This would trigger the -Wmissing-prototypes
warning.
      
The exported functions are behind (include/linux/dynamic_debug.h):

#if defined(CONFIG_DYNAMIC_DEBUG) || \
    (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
      
Thus, by adding -DDYNAMIC_DEBUG_MODULE to the lib/Makefile we
can ensure that the exported functions have a prototype in all cases,
since lib/dynamic_debug.c is built whenever
CONFIG_DYNAMIC_DEBUG_CORE=y.
      
Fixes: 7deabd67 ("dyndbg: use the module notifier callbacks")
Reported-by: kernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202303071444.sIbZTDCy-lkp@intel.com/
      
Signed-off-by: Jason Baron <jbaron@akamai.com>
[mcgrof: adjust commit log, and remove urldefense from URL]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
3. 08 Feb, 2023 3 commits
• arm64: Support Clang UBSAN trap codes for better reporting · 25b84002 (Kees Cook)
      
When building with CONFIG_UBSAN_TRAP=y on arm64, Clang encodes the UBSAN
check (handler) type in the ESR. Extract this and report these traps as
coming from the specific UBSAN check that tripped.
      
      Before:
      
        Internal error: BRK handler: 00000000f20003e8 [#1] PREEMPT SMP
      
      After:
      
        Internal error: UBSAN: shift out of bounds: 00000000f2005514 [#1] PREEMPT SMP
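
In outline, the decode path looks like this (a sketch; the exact BRK
immediate and mask values are assumptions based on the 0x55xx encoding
visible above):

	/* arch/arm64: the BRK immediate carries the UBSAN check type. */
	#define UBSAN_BRK_IMM	0x5500
	#define UBSAN_BRK_MASK	0x00ff

	static int ubsan_handler(struct pt_regs *regs, unsigned long esr)
	{
		die(report_ubsan_failure(regs, esr & UBSAN_BRK_MASK), regs, esr);
		return DBG_HOOK_HANDLED;
	}
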
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mukesh Ojha <quic_mojha@quicinc.com>
Reviewed-by: Fangrui Song <maskray@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: John Stultz <jstultz@google.com>
      Cc: Yongqin Liu <yongqin.liu@linaro.org>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Yury Norov <yury.norov@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: llvm@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
• lib/hashtable_test.c: add test for the hashtable structure · 789538c6 (Rae Moar)
      
      Add a KUnit test for the kernel hashtable implementation in
      include/linux/hashtable.h.
      
Note that this version does not yet test the RCU-alternative versions
of the functions.
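
A sketch of the shape such a test case takes (the entry layout and names
are illustrative, not necessarily those used in lib/hashtable_test.c):

	struct hashtable_test_entry {
		int key;
		struct hlist_node node;
	};

	static void hashtable_test_hash_add(struct kunit *test)
	{
		DEFINE_HASHTABLE(hash, 3);	/* 2^3 = 8 buckets */
		struct hashtable_test_entry a = { .key = 1 };

		KUNIT_EXPECT_TRUE(test, hash_empty(hash));
		hash_add(hash, &a.node, a.key);
		KUNIT_EXPECT_FALSE(test, hash_empty(hash));
	}
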
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
• kunit: Add "hooks" to call into KUnit when it's built as a module · 7170b7ed (David Gow)
      KUnit has several macros and functions intended for use from non-test
      code. These hooks, currently the kunit_get_current_test() and
      kunit_fail_current_test() macros, didn't work when CONFIG_KUNIT=m.
      
      In order to support this case, the required functions and static data
      need to be available unconditionally, even when KUnit itself is not
      built-in. The new 'hooks.c' file is therefore always included, and has
      both the static key required for kunit_get_current_test(), and a table
      of function pointers in struct kunit_hooks_table. This is filled in with
      the real implementations by kunit_install_hooks(), which is kept in
      hooks-impl.h and called when the kunit module is loaded.
      
This can be extended for future features which require similar
      "hook" behaviour, such as static stubs, by simply adding new entries to
      the struct, and the appropriate code to set them.
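
The pattern, in outline (member and function names here are an
illustrative sketch of the mechanism described above):

	/* Always built-in, even with CONFIG_KUNIT=m: */
	struct kunit_hooks_table {
		void (*fail_current_test)(const char *file, int line,
					  const char *fmt, ...);
	};
	extern struct kunit_hooks_table kunit_hooks;	/* lives in hooks.c */

	/* Called from the module init path (hooks-impl.h): */
	static inline void kunit_install_hooks(void)
	{
		kunit_hooks.fail_current_test = __kunit_fail_current_test_impl;
	}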
      
      Fixed white-space errors during commit:
      Shuah Khan <skhan@linuxfoundation.org>
      
Resolved merge conflicts with commit db105c37 ("kunit: Export
kunit_running()"); this patch supersedes it:
Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David Gow <davidgow@google.com>
Reviewed-by: Rae Moar <rmoar@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
4. 23 Nov, 2022 1 commit
• kunit/fortify: Validate __alloc_size attribute results · 9124a264 (Kees Cook)
      
      Validate the effect of the __alloc_size attribute on allocators. If the
      compiler doesn't support __builtin_dynamic_object_size(), skip the
      associated tests.
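
A simplified sketch of what such a check looks like (the real tests take
extra care to keep the allocation size out of the compiler's view as a
constant):

	static void alloc_size_kmalloc_dynamic_sketch(struct kunit *test)
	{
		volatile size_t want = 128;	/* defeat constant folding */
		size_t size = want;
		void *p = kmalloc(size, GFP_KERNEL);

		KUNIT_ASSERT_NOT_ERR_OR_NULL(test, p);
		/* With __alloc_size() and a bdos-capable compiler: */
		KUNIT_EXPECT_EQ(test, __builtin_dynamic_object_size(p, 1), size);
		kfree(p);
	}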
      
      (For GCC, just remove the "--make_options" line below...)
      
$ ./tools/testing/kunit/kunit.py run --arch x86_64 \
        --kconfig_add CONFIG_FORTIFY_SOURCE=y \
        --make_options LLVM=1 \
        fortify
      ...
      [15:16:30] ================== fortify (10 subtests) ===================
      [15:16:30] [PASSED] known_sizes_test
      [15:16:30] [PASSED] control_flow_split_test
      [15:16:30] [PASSED] alloc_size_kmalloc_const_test
      [15:16:30] [PASSED] alloc_size_kmalloc_dynamic_test
      [15:16:30] [PASSED] alloc_size_vmalloc_const_test
      [15:16:30] [PASSED] alloc_size_vmalloc_dynamic_test
      [15:16:30] [PASSED] alloc_size_kvmalloc_const_test
      [15:16:30] [PASSED] alloc_size_kvmalloc_dynamic_test
      [15:16:30] [PASSED] alloc_size_devm_kmalloc_const_test
      [15:16:30] [PASSED] alloc_size_devm_kmalloc_dynamic_test
      [15:16:30] ===================== [PASSED] fortify =====================
      [15:16:30] ============================================================
      [15:16:30] Testing complete. Ran 10 tests: passed: 10
      [15:16:31] Elapsed time: 8.348s total, 0.002s configuring, 6.923s building, 1.075s running
      
For GCC prior to version 12, the dynamic tests will be skipped:
      
      [15:18:59] ================== fortify (10 subtests) ===================
      [15:18:59] [PASSED] known_sizes_test
      [15:18:59] [PASSED] control_flow_split_test
      [15:18:59] [PASSED] alloc_size_kmalloc_const_test
      [15:18:59] [SKIPPED] alloc_size_kmalloc_dynamic_test
      [15:18:59] [PASSED] alloc_size_vmalloc_const_test
      [15:18:59] [SKIPPED] alloc_size_vmalloc_dynamic_test
      [15:18:59] [PASSED] alloc_size_kvmalloc_const_test
      [15:18:59] [SKIPPED] alloc_size_kvmalloc_dynamic_test
      [15:18:59] [PASSED] alloc_size_devm_kmalloc_const_test
      [15:18:59] [SKIPPED] alloc_size_devm_kmalloc_dynamic_test
      [15:18:59] ===================== [PASSED] fortify =====================
      [15:18:59] ============================================================
      [15:18:59] Testing complete. Ran 10 tests: passed: 6, skipped: 4
      [15:18:59] Elapsed time: 11.965s total, 0.002s configuring, 10.540s building, 1.068s running
      
      Cc: David Gow <davidgow@google.com>
      Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
5. 02 Nov, 2022 1 commit
• overflow: Introduce overflows_type() and castable_to_type() · 4b21d25b (Kees Cook)
      Implement a robust overflows_type() macro to test if a variable or
      constant value would overflow another variable or type. This can be
      used as a constant expression for static_assert() (which requires a
      constant expression[1][2]) when used on constant values. This must be
      constructed manually, since __builtin_add_overflow() does not produce
      a constant expression[3].
      
      Additionally adds castable_to_type(), similar to __same_type(), but for
      checking if a constant value would overflow if cast to a given type.
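
Usage, as a sketch (the nbytes variable is hypothetical):

	/* Constant expressions, usable in static_assert(): */
	static_assert(overflows_type(256, u8));		/* does not fit */
	static_assert(!overflows_type(255, u8));	/* fits */
	static_assert(castable_to_type(255, u8));
	static_assert(!castable_to_type(-1, u8));	/* sign would be lost */

	/* Run-time, on a variable: */
	if (overflows_type(nbytes, int))
		return -EOVERFLOW;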
      
      Add unit tests for overflows_type(), __same_type(), and castable_to_type()
      to the existing KUnit "overflow" test:
      
      [16:03:33] ================== overflow (21 subtests) ==================
      ...
      [16:03:33] [PASSED] overflows_type_test
      [16:03:33] [PASSED] same_type_test
      [16:03:33] [PASSED] castable_to_type_test
      [16:03:33] ==================== [PASSED] overflow =====================
      [16:03:33] ============================================================
      [16:03:33] Testing complete. Ran 21 tests: passed: 21
      [16:03:33] Elapsed time: 24.022s total, 0.002s configuring, 22.598s building, 0.767s running
      
      [1] https://en.cppreference.com/w/c/language/_Static_assert
      [2] C11 standard (ISO/IEC 9899:2011): 6.7.10 Static assertions
[3] https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
    6.56 Built-in Functions to Perform Arithmetic with Overflow Checking
    Built-in Function: bool __builtin_add_overflow (type1 a, type2 b, type3 *res)
      
      Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Tom Rix <trix@redhat.com>
      Cc: Daniel Latypov <dlatypov@google.com>
      Cc: Vitor Massaru Iha <vitor@massaru.org>
      Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: linux-hardening@vger.kernel.org
      Cc: llvm@lists.linux.dev
Co-developed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20221024201125.1416422-1-gwan-gyeong.mun@intel.com
6. 03 Oct, 2022 2 commits
• kmsan: disable instrumentation of unsupported common kernel code · 79dbd006 (Alexander Potapenko)
The EFI stub cannot be linked with the KMSAN runtime, so we disable
instrumentation for it.
      
      Instrumenting kcov, stackdepot or lockdep leads to infinite recursion
      caused by instrumentation hooks calling instrumented code again.
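
The opt-outs are plain kbuild variables; a sketch of the pattern (file
and directory choices as described above):

	# lib/Makefile: exempt a single object from KMSAN instrumentation
	KMSAN_SANITIZE_stackdepot.o := n

	# drivers/firmware/efi/libstub/Makefile: exempt the whole directory
	KMSAN_SANITIZE := n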
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-13-glider@google.com
      
Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
• kasan: move tests to mm/kasan/ · f7e01ab8 (Andrey Konovalov)
      Move KASAN tests to mm/kasan/ to keep the test code alongside the
      implementation.
      
      Link: https://lkml.kernel.org/r/676398f0aeecd47d2f8e3369ea0e95563f641a36.1662416260.git.andreyknvl@google.com
      
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
7. 27 Sep, 2022 1 commit
• Maple Tree: add new data structure · 54a611b6 (Liam R. Howlett)
      Patch series "Introducing the Maple Tree"
      
      The maple tree is an RCU-safe range based B-tree designed to use modern
      processor cache efficiently.  There are a number of places in the kernel
      that a non-overlapping range-based tree would be beneficial, especially
      one with a simple interface.  If you use an rbtree with other data
      structures to improve performance or an interval tree to track
      non-overlapping ranges, then this is for you.
      
      The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
      nodes.  With the increased branching factor, it is significantly shorter
      than the rbtree so it has fewer cache misses.  The removal of the linked
      list between subsequent entries also reduces the cache misses and the need
      to pull in the previous and next VMA during many tree alterations.
      
      The first user that is covered in this patch set is the vm_area_struct,
      where three data structures are replaced by the maple tree: the augmented
      rbtree, the vma cache, and the linked list of VMAs in the mm_struct.  The
      long term goal is to reduce or remove the mmap_lock contention.
      
      The plan is to get to the point where we use the maple tree in RCU mode.
      Readers will not block for writers.  A single write operation will be
      allowed at a time.  A reader re-walks if stale data is encountered.  VMAs
      would be RCU enabled and this mode would be entered once multiple tasks
      are using the mm_struct.
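
A minimal sketch of the external interface (the indices and the
xa_mk_value() payload are arbitrary):

	DEFINE_MTREE(mt);
	void *entry = xa_mk_value(0xa);

	mtree_store_range(&mt, 0x1000, 0x1fff, entry, GFP_KERNEL);
	WARN_ON(mtree_load(&mt, 0x1234) != entry);	/* any index in range */
	mtree_erase(&mt, 0x1000);			/* drops the whole range */
	mtree_destroy(&mt);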
      
Davidlohr said
      
      : Yes I like the maple tree, and at this stage I don't think we can ask for
      : more from this series wrt the MM - albeit there seems to still be some
      : folks reporting breakage.  Fundamentally I see Liam's work to (re)move
      : complexity out of the MM (not to say that the actual maple tree is not
: complex) by consolidating the three complementary data structures very
      : much worth it considering performance does not take a hit.  This was very
      : much a turn off with the range locking approach, which worst case scenario
      : incurred in prohibitive overhead.  Also as Liam and Matthew have
      : mentioned, RCU opens up a lot of nice performance opportunities, and in
      : addition academia[1] has shown outstanding scalability of address spaces
      : with the foundation of replacing the locked rbtree with RCU aware trees.
      
Similar work has been discovered in the academic press
      
      	https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf
      
      Sheer coincidence.  We designed our tree with the intention of solving the
      hardest problem first.  Upon settling on a b-tree variant and a rough
      outline, we researched ranged based b-trees and RCU b-trees and did find
      that article.  So it was nice to find reassurances that we were on the
      right path, but our design choice of using ranges made that paper unusable
      for us.
      
      This patch (of 70):
      
There are additional BUG_ON() calls within the tree, most of which are in
debug code; these will be replaced with WARN_ON() calls in the future. The
remaining BUG_ON() calls will also be reduced in number at a later date.
These exist to catch things such as out-of-range accesses which would
crash anyway.
      
      Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com
      Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com
      
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Sven Schnelle <svens@linux.ibm.com>
Tested-by: Yu Zhao <yuzhao@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
8. 15 Aug, 2022 1 commit
• lib/cpumask: add inline cpumask_next_wrap() for UP · 2248ccd8 (Sander Vanheule)
      
      In the uniprocessor case, cpumask_next_wrap() can be simplified, as the
      number of valid argument combinations is limited:
          - 'start' can only be 0
          - 'n' can only be -1 or 0
      
      The only valid CPU that can then be returned, if any, will be the first
      one set in the provided 'mask'.
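
In outline, the UP inline then reduces to (a sketch consistent with the
argument analysis above):

	static inline unsigned int cpumask_next_wrap(int n, const struct cpumask *mask,
						     int start, bool wrap)
	{
		/* Once wrapped (n >= 0), the single CPU was already seen. */
		if (wrap && n >= 0)
			return nr_cpumask_bits;

		return cpumask_first(mask);
	}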
      
      For NR_CPUS == 1, include/linux/cpumask.h now provides an inline
      definition of cpumask_next_wrap(), which will conflict with the one
      provided by lib/cpumask.c.  Make building of lib/cpumask.o again depend
      on CONFIG_SMP=y (i.e. NR_CPUS > 1) to avoid the re-definition.
Suggested-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
9. 01 Aug, 2022 1 commit
• lib/nodemask: inline next_node_in() and node_random() · 36d4b36b (Yury Norov)
      
The functions are pretty thin wrappers around the find_bit engine, and
keeping them in a C file prevents the compiler from applying the
small_const_nbits() optimization, which must take place for all systems
with MAX_NUMNODES less than BITS_PER_LONG (the default is 16 for me).
      
Moving them to the header file doesn't blow up the kernel size:
      add/remove: 1/2 grow/shrink: 9/5 up/down: 968/-88 (880)
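
In outline, the moved wrapper becomes a header inline (a sketch
consistent with the description; __next_node() and __first_node() are
the existing nodemask.h helpers):

	static inline unsigned int __next_node_in(int node, const nodemask_t *srcp)
	{
		unsigned int ret = __next_node(node, srcp);

		if (ret == MAX_NUMNODES)	/* wrap around to the first node */
			ret = __first_node(srcp);
		return ret;
	}
	#define next_node_in(n, src) __next_node_in((n), &(src))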
      
      CC: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      CC: Paul Mackerras <paulus@samba.org>
      CC: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      CC: Stephen Rothwell <sfr@canb.auug.org.au>
      CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Yury Norov <yury.norov@gmail.com>
10. 15 Jun, 2022 1 commit
• lib: Add register read/write tracing support · d593d64f (Prasad Sodagudi)
      
Generic MMIO read/write accessors, i.e. __raw_{read,write}{b,l,w,q}, are
typically used to read from/write to memory mapped registers and can
cause hangs or undefined behaviour in the following cases:
      
      * If the access to the register space is unclocked, for example: if
        there is an access to multimedia(MM) block registers without MM
        clocks.
      
      * If the register space is protected and not set to be accessible from
        non-secure world, for example: only EL3 (EL: Exception level) access
        is allowed and any EL2/EL1 access is forbidden.
      
* If an xPU (memory/register protection unit) is controlling access to
  certain memory/register space for specific clients.
      
      and more...
      
Such cases usually result in instant reboots, SErrors, or NoC/interconnect
hangs, and tracing these register accesses can be very helpful for
debugging such issues during initial development stages and also in
later stages.
      
So use ftrace trace events to log such MMIO register accesses, which
provides a rich feature set such as early enablement of trace events,
filtering capability, dumping ftrace logs on the console, and many more.
      
      Sample output:
      
      rwmmio_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
      rwmmio_post_write: __qcom_geni_serial_console_write+0x160/0x1e0 width=32 val=0xa0d5d addr=0xfffffbfffdbff700
      rwmmio_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 addr=0xfffffbfffdbff610
      rwmmio_post_read: qcom_geni_serial_poll_bit+0x94/0x138 width=32 val=0x0 addr=0xfffffbfffdbff610
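
In outline, the asm-generic accessors bracket the raw access with trace
hooks (a sketch; the exact hook signature is an assumption):

	static inline void writel(u32 value, volatile void __iomem *addr)
	{
		log_write_mmio(value, 32, addr, _THIS_IP_);
		__raw_writel((u32 __force)__cpu_to_le32(value), addr);
		log_post_write_mmio(value, 32, addr, _THIS_IP_);
	}
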
Co-developed-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Sai Prakash Ranjan <quic_saipraka@quicinc.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
11. 12 Jun, 2022 1 commit
• crypto: memneq - move into lib/ · abfed87e (Jason A. Donenfeld)
      
      This is used by code that doesn't need CONFIG_CRYPTO, so move this into
      lib/ with a Kconfig option so that it can be selected by whatever needs
      it.
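
For context, __crypto_memneq() is a constant-time inequality test; in
spirit it does the following (a simplified sketch, not the tuned
word-at-a-time implementation):

	static unsigned long memneq_sketch(const unsigned char *a,
					   const unsigned char *b, size_t n)
	{
		unsigned long neq = 0;

		while (n--)
			neq |= *a++ ^ *b++;	/* no early exit: no timing leak */
		return neq;			/* zero iff the buffers match */
	}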
      
      This fixes a linker error Zheng pointed out when
      CRYPTO_MANAGER_DISABLE_TESTS!=y and CRYPTO=m:
      
        lib/crypto/curve25519-selftest.o: In function `curve25519_selftest':
        curve25519-selftest.c:(.init.text+0x60): undefined reference to `__crypto_memneq'
        curve25519-selftest.c:(.init.text+0xec): undefined reference to `__crypto_memneq'
        curve25519-selftest.c:(.init.text+0x114): undefined reference to `__crypto_memneq'
        curve25519-selftest.c:(.init.text+0x154): undefined reference to `__crypto_memneq'
Reported-by: Zheng Bin <zhengbin13@huawei.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: stable@vger.kernel.org
Fixes: aa127963 ("crypto: lib/curve25519 - re-add selftests")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
12. 10 Jun, 2022 1 commit
• crypto: memneq - move into lib/ · 920b0442 (Jason A. Donenfeld)
      
      This is used by code that doesn't need CONFIG_CRYPTO, so move this into
      lib/ with a Kconfig option so that it can be selected by whatever needs
      it.
      
      This fixes a linker error Zheng pointed out when
      CRYPTO_MANAGER_DISABLE_TESTS!=y and CRYPTO=m:
      
        lib/crypto/curve25519-selftest.o: In function `curve25519_selftest':
        curve25519-selftest.c:(.init.text+0x60): undefined reference to `__crypto_memneq'
        curve25519-selftest.c:(.init.text+0xec): undefined reference to `__crypto_memneq'
        curve25519-selftest.c:(.init.text+0x114): undefined reference to `__crypto_memneq'
        curve25519-selftest.c:(.init.text+0x154): undefined reference to `__crypto_memneq'
Reported-by: Zheng Bin <zhengbin13@huawei.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: stable@vger.kernel.org
Fixes: aa127963 ("crypto: lib/curve25519 - re-add selftests")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
13. 21 Mar, 2022 1 commit
• lib: stackinit: Convert to KUnit · 02788ebc (Kees Cook)
      
      Convert stackinit unit tests to KUnit, for better integration
      into the kernel self test framework. Includes a rename of
      test_stackinit.c to stackinit_kunit.c, and CONFIG_TEST_STACKINIT to
      CONFIG_STACKINIT_KUNIT_TEST.
      
      Adjust expected test results based on which stack initialization method
      was chosen:
      
       $ CMD="./tools/testing/kunit/kunit.py run stackinit --raw_output \
              --arch=x86_64 --kconfig_add"
      
       $ $CMD | grep stackinit:
       # stackinit: pass:36 fail:0 skip:29 total:65
      
       $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_USER=y | grep stackinit:
       # stackinit: pass:37 fail:0 skip:28 total:65
      
       $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF=y | grep stackinit:
       # stackinit: pass:55 fail:0 skip:10 total:65
      
       $ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y | grep stackinit:
       # stackinit: pass:62 fail:0 skip:3 total:65
      
       $ $CMD CONFIG_INIT_STACK_ALL_PATTERN=y --make_option LLVM=1 | grep stackinit:
       # stackinit: pass:60 fail:0 skip:5 total:65
      
       $ $CMD CONFIG_INIT_STACK_ALL_ZERO=y --make_option LLVM=1 | grep stackinit:
       # stackinit: pass:60 fail:0 skip:5 total:65
      
      Temporarily remove the userspace-build mode, which will be restored in a
      later patch.
      
      Expand the size of the pre-case switch variable so it doesn't get
      accidentally cleared.
      
      Cc: David Gow <davidgow@google.com>
      Cc: Daniel Latypov <dlatypov@google.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
      ---
      v1: https://lore.kernel.org/lkml/20220224055145.1853657-1-keescook@chromium.org
      v2:
       - split "userspace KUnit stub" into separate header and patch (Daniel)
       - Improve commit log and comments (David)
       - Provide mapping of expected XFAIL tests to CONFIGs (David)
14. 14 Feb, 2022 1 commit
• fortify: Detect struct member overflows in memcpy() at compile-time · f68f2ff9 (Kees Cook)
      memcpy() is dead; long live memcpy()
      
      tl;dr: In order to eliminate a large class of common buffer overflow
      flaws that continue to persist in the kernel, have memcpy() (under
      CONFIG_FORTIFY_SOURCE) perform bounds checking of the destination struct
      member when they have a known size. This would have caught all of the
      memcpy()-related buffer write overflow flaws identified in at least the
      last three years.
      
      Background and analysis:
      
      While stack-based buffer overflow flaws are largely mitigated by stack
      canaries (and similar) features, heap-based buffer overflow flaws continue
      to regularly appear in the kernel. Many classes of heap buffer overflows
      are mitigated by FORTIFY_SOURCE when using the strcpy() family of
      functions, but a significant number remain exposed through the memcpy()
      family of functions.
      
      At its core, FORTIFY_SOURCE uses the compiler's __builtin_object_size()
      internal[0] to determine the available size at a target address based on
      the compile-time known structure layout details. It operates in two
      modes: outer bounds (0) and inner bounds (1). In mode 0, the size of the
      enclosing structure is used. In mode 1, the size of the specific field
      is used. For example:
      
      	struct object {
      		u16 scalar1;	/* 2 bytes */
      		char array[6];	/* 6 bytes */
      		u64 scalar2;	/* 8 bytes */
      		u32 scalar3;	/* 4 bytes */
      		u32 scalar4;	/* 4 bytes */
      	} instance;
      
      __builtin_object_size(instance.array, 0) == 22, since the remaining size
      of the enclosing structure starting from "array" is 22 bytes (6 + 8 +
      4 + 4).
      
      __builtin_object_size(instance.array, 1) == 6, since the remaining size
      of the specific field "array" is 6 bytes.
      
      The initial implementation of FORTIFY_SOURCE used mode 0 because there
      were many cases of both strcpy() and memcpy() functions being used to
      write (or read) across multiple fields in a structure. For example,
      it would catch this, which is writing 2 bytes beyond the end of
      "instance":
      
      	memcpy(&instance.array, data, 25);
      
      While this didn't protect against overwriting adjacent fields in a given
      structure, it would at least stop overflows from reaching beyond the
      end of the structure into neighboring memory, and provided a meaningful
      mitigation of a subset of buffer overflow flaws. However, many desirable
      targets remain within the enclosing structure (for example function
      pointers).
      
      As it happened, there were very few cases of strcpy() family functions
      intentionally writing beyond the end of a string buffer. Once all known
      cases were removed from the kernel, the strcpy() family was tightened[1]
      to use mode 1, providing greater mitigation coverage.
      
      What remains is switching memcpy() to mode 1 as well, but making the
      switch is much more difficult because of how frustrating it can be to
      find existing "normal" uses of memcpy() that expect to write (or read)
      across multiple fields. The root cause of the problem is that the C
      language lacks a common pattern to indicate the intent of an author's
      use of memcpy(), and is further complicated by the available compile-time
      and run-time mitigation behaviors.
      
The FORTIFY_SOURCE mitigation comes in two halves: the compile-time half,
when both the buffer size _and_ the length of the copy are known, and the
run-time half, when only the buffer size is known. If neither size is
known, there is no bounds checking possible. At compile-time when the
      compiler sees that a length will always exceed a known buffer size,
      a warning can be deterministically emitted. For the run-time half,
      the length is tested against the known size of the buffer, and the
      overflowing operation is detected. (The performance overhead for these
      tests is virtually zero.)
      
      It is relatively easy to find compile-time false-positives since a warning
      is always generated. Fixing the false positives, however, can be very
      time-consuming as there are hundreds of instances. While it's possible
      some over-read conditions could lead to kernel memory exposures, the bulk
      of the risk comes from the run-time flaws where the length of a write
      may end up being attacker-controlled and lead to an overflow.
      
      Many of the compile-time false-positives take a form similar to this:
      
      	memcpy(&instance.scalar2, data, sizeof(instance.scalar2) +
      					sizeof(instance.scalar3));
      
      and the run-time ones are similar, but lack a constant expression for the
      size of the copy:
      
      	memcpy(instance.array, data, length);
      
      The former is meant to cover multiple fields (though its style has been
      frowned upon more recently), but has been technically legal. Both lack
      any expressivity in the C language about the author's _intent_ in a way
      that a compiler can check when the length isn't known at compile time.
      A comment doesn't work well because what's needed is something a compiler
      can directly reason about. Is a given memcpy() call expected to overflow
      into neighbors? Is it not? By using the new struct_group() macro, this
      intent can be much more easily encoded.
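
For example, reusing the struct from above, a deliberate cross-field copy
can be expressed with a named group (the group name "burst" is
illustrative):

	struct object {
		u16 scalar1;		/* 2 bytes */
		struct_group(burst,	/* named group of the remaining fields */
			char array[6];
			u64 scalar2;
			u32 scalar3;
			u32 scalar4;
		);
	} instance;

	/* Intent is explicit and the bound is the size of the whole group: */
	memcpy(&instance.burst, data, sizeof(instance.burst));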
      
      It is not as easy to find the run-time false-positives since the code path
      to exercise a seemingly out-of-bounds condition that is actually expected
      may not be trivially reachable. Tightening the restrictions to block an
      operation for a false positive will either potentially create a greater
      flaw (if a copy is truncated by the mitigation), or destabilize the kernel
      (e.g. with a BUG()), making things completely useless for the end user.
      
      As a result, tightening the memcpy() restriction (when there is a
      reasonable level of uncertainty of the number of false positives), needs
      to first WARN() with no truncation. (Though any sufficiently paranoid
      end-user can always opt to set the panic_on_warn=1 sysctl.) Once enough
      development time has passed, the mitigation can be further intensified.
      (Note that this patch is only the compile-time checking step, which is
      a prerequisite to doing run-time checking, which will come in future
      patches.)
      
      Given the potential frustrations of weeding out all the false positives
      when tightening the run-time checks, it is reasonable to wonder if these
      changes would actually add meaningful protection. Looking at just the
      last three years, there are 23 identified flaws with a CVE that mention
      "buffer overflow", and 11 are memcpy()-related buffer overflows.
      
      (For the remaining 12: 7 are array index overflows that would be
      mitigated by systems built with CONFIG_UBSAN_BOUNDS=y: CVE-2019-0145,
      CVE-2019-14835, CVE-2019-14896, CVE-2019-14897, CVE-2019-14901,
      CVE-2019-17666, CVE-2021-28952. 2 are miscalculated allocation
      sizes which could be mitigated with memory tagging: CVE-2019-16746,
      CVE-2019-2181. 1 is an iovec buffer bug maybe mitigated by memory tagging:
      CVE-2020-10742. 1 is a type confusion bug mitigated by stack canaries:
      CVE-2020-10942. 1 is a string handling logic bug with no mitigation I'm
      aware of: CVE-2021-28972.)
      
      At my last count on an x86_64 allmodconfig build, there are 35,294
      calls to memcpy(). With callers instrumented to report all places
      where the buffer size is known but the length remains unknown (i.e. a
      run-time bounds check is added), we can count how many new run-time
      bounds checks are added when the destination and source arguments of
      memcpy() are changed to use "mode 1" bounds checking: 1,276. This means
      for the future run-time checking, there is a worst-case upper bounds
      of 3.6% false positives to fix. In addition, there were around 150 new
      compile-time warnings to evaluate and fix (which have now been fixed).
      
      With this instrumentation it's also possible to compare the places where
      the known 11 memcpy() flaw overflows manifested against the resulting
      list of potential new run-time bounds checks, as a measure of potential
      efficacy of the tightened mitigation. Much to my surprise, horror, and
      delight, all 11 flaws would have been detected by the newly added run-time
      bounds checks, making this a distinctly clear mitigation improvement: 100%
      coverage for known memcpy() flaws, with a possible 2 orders of magnitude
      gain in coverage over existing but undiscovered run-time dynamic length
      flaws (i.e. 1265 newly covered sites in addition to the 11 known), against
      only <4% of all memcpy() callers maybe gaining a false positive run-time
      check, with only about 150 new compile-time instances needing evaluation.
      
      Specifically these would have been mitigated:
      CVE-2020-24490 https://git.kernel.org/linus/a2ec905d1e160a33b2e210e45ad30445ef26ce0e
      CVE-2020-12654 https://git.kernel.org/linus/3a9b153c5591548612c3955c9600a98150c81875
      CVE-2020-12653 https://git.kernel.org/linus/b70261a288ea4d2f4ac7cd04be08a9f0f2de4f4d
      CVE-2019-14895 https://git.kernel.org/linus/3d94a4a8373bf5f45cf5f939e88b8354dbf2311b
      CVE-2019-14816 https://git.kernel.org/linus/7caac62ed598a196d6ddf8d9c121e12e082cac3a
      CVE-2019-14815 https://git.kernel.org/linus/7caac62ed598a196d6ddf8d9c121e12e082cac3a
      CVE-2019-14814 https://git.kernel.org/linus/7caac62ed598a196d6ddf8d9c121e12e082cac3a
      CVE-2019-10126 https://git.kernel.org/linus/69ae4f6aac1578575126319d3f55550e7e440449
      CVE-2019-9500  https://git.kernel.org/linus/1b5e2423164b3670e8bc9174e4762d297990deff
      no-CVE-yet     https://git.kernel.org/linus/130f634da1af649205f4a3dd86cbe5c126b57914
      no-CVE-yet     https://git.kernel.org/linus/d10a87a3535cce2b890897914f5d0d83df669c63
      
      To accelerate the review of potential run-time false positives, it's
      also worth noting that it is possible to partially automate checking
      by examining the memcpy() buffer argument to check for the destination
      struct member having a neighboring array member. It is reasonable to
      expect that the vast majority of run-time false positives would look like
      the already evaluated and fixed compile-time false positives, where the
      most common pattern is neighboring arrays. (And, FWIW, many of the
      compile-time fixes were actual bugs, so it is reasonable to assume we'll
      have similar cases of actual bugs getting fixed for run-time checks.)
      
      Implementation:
      
      Tighten the memcpy() destination buffer size checking to use the actual
      ("mode 1") target buffer size as the bounds check instead of their
      enclosing structure's ("mode 0") size. Use a common inline for memcpy()
      (and memmove() in a following patch), since all the tests are the
      same. All new cross-field memcpy() uses must use the struct_group() macro
      or similar to target a specific range of fields, so that FORTIFY_SOURCE
      can reason about the size and safety of the copy.
      
      For now, cross-member "mode 1" _read_ detection at compile-time will be
      limited to W=1 builds, since it is, unfortunately, very common. As the
      priority is solving write overflows, read overflows will be part of a
      future phase (and can be fixed in parallel, for anyone wanting to look
      at W=1 build output).
      
      For run-time, the "mode 0" size checking and mitigation is left unchanged,
      with "mode 1" to be added in stages. In this patch, no new run-time
      checks are added. Future patches will first bounds-check writes,
      and only perform a WARN() for now. This way any missed run-time false
      positives can be flushed out over the coming several development cycles,
      but system builders who have tested their workloads to be WARN()-free
      can enable the panic_on_warn=1 sysctl to immediately gain a mitigation
      against this class of buffer overflows. Once that is under way, run-time
      bounds-checking of reads can be similarly enabled.
      
      Related classes of flaws that will remain unmitigated:
      
      - memcpy() with flexible array structures, as the compiler does not
        currently have visibility into the size of the trailing flexible
        array. These can be fixed in the future by refactoring such cases
        to use a new set of flexible array structure helpers to perform the
        common serialization/deserialization code patterns doing allocation
        and/or copying.
      
      - memcpy() with raw pointers (e.g. void *, char *, etc), or otherwise
        having their buffer size unknown at compile time, have no good
        mitigation beyond memory tagging (and even that would only protect
        against inter-object overflow, not intra-object neighboring field
        overflows), or refactoring. Some kind of "fat pointer" solution is
        likely needed to gain proper size-of-buffer awareness. (e.g. see
        struct membuf)
      
      - type confusion where a higher level type's allocation size does
        not match the resulting cast type eventually passed to a deeper
        memcpy() call where the compiler cannot see the true type. In
        theory, greater static analysis could catch these, and the use
        of -Warray-bounds will help find some of these.
      
      [0] https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
      [1] https://git.kernel.org/linus/6a39e62abbafd1d58d1722f40c7d26ef379c6a2f
      
Signed-off-by: Kees Cook <keescook@chromium.org>