1. 09 Nov, 2018 1 commit
    • locking/atomics: Fix out-of-tree build · bdf37b4d
      Borislav Petkov authored
      Building a kernel out of tree with:
      
        make O=/tmp/b oldconfig
        cd /tmp/b
        make
      
      gives this error:
      
          CALL    /mnt/kernel/kernel/linux/scripts/atomic/check-atomics.sh
        /bin/bash: scripts/atomic/check-atomics.sh: No such file or directory
        make[3]: *** [/mnt/kernel/kernel/linux/./Kbuild:86: old-atomics] Error 127
        make[3]: *** Waiting for unfinished jobs....
      
      Make the command use the proper build prerequisite which is the absolute
      path to the script.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: catalin.marinas@arm.com
      Cc: dvyukov@google.com
      Cc: glider@google.com
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linuxdrivers@attotech.com
      Fixes: 8d325880 ("locking/atomics: Check generated headers are up-to-date")
      Link: http://lkml.kernel.org/r/20181108194128.13368-1-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bdf37b4d
  2. 01 Nov, 2018 7 commits
    • locking/atomics: Fix scripts/atomic/ script permissions · 4d8e5cd2
      Ingo Molnar authored
      Mark all these scripts executable.
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: boqun.feng@gmail.com
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4d8e5cd2
    • arm64, locking/atomics: Use instrumented atomics · c0df1081
      Mark Rutland authored
      Now that the generic atomic headers provide instrumented wrappers of all
      the atomics implemented by arm64, let's migrate arm64 over to these.
      
      The additional instrumentation will help to find bugs (e.g. when fuzzing
      with Syzkaller).
      
      Mostly this change involves adding an arch_ prefix to a number of
      function names and macro definitions. When LSE atomics are used, the
      out-of-line LL/SC atomics will be named __ll_sc_arch_atomic_${OP}.
      
      Adding the arch_ prefix requires some whitespace fixups to keep things
      aligned. Some other unusual whitespace is fixed up at the same time
      (e.g. in the cmpxchg wrappers).
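      
      As an illustration, the resulting shape looks roughly like the sketch
      below (simplified and hypothetical, not the literal arm64 code or the
      literal generated wrapper):
      
        /* architecture-provided implementation, now arch_-prefixed: */
        static inline void arch_atomic_add(int i, atomic_t *v)
        {
                /* LSE or out-of-line LL/SC sequence lives here */
        }
        
        /* generated instrumented wrapper used by common code: */
        static inline void atomic_add(int i, atomic_t *v)
        {
                kasan_check_write(v, sizeof(*v));
                arch_atomic_add(i, v);
        }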
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: boqun.feng@gmail.com
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-7-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c0df1081
    • locking/atomics: Check generated headers are up-to-date · 8d325880
      Mark Rutland authored
      Now that all the generated atomic headers are in place, it would be good
      to ensure that:
      
      a) the headers are up-to-date when the scripts change.
      
      b) developers don't directly modify the generated headers.
      
      To ensure both of these properties, let's add a Kbuild step to check
      that the generated headers are up-to-date.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-6-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8d325880
    • locking/atomics: Switch to generated instrumentation · aa525d06
      Mark Rutland authored
      As a step towards ensuring the atomic* APIs are consistent, let's switch
      to wrappers generated by gen-atomic-instrumented.sh, using the same table
      used to generate the fallbacks and atomic-long wrappers.
      
      These are checked in rather than generated with Kbuild, since:
      
      * This allows inspection of the atomics with git grep and ctags on a
        pristine tree, which Linus strongly prefers being able to do.
      
      * The fallbacks are not affected by machine details or configuration
        options, so it is not necessary to regenerate them to take these into
        account.
      
      * These are included by files required *very* early in the build process
        (e.g. for generating bounds.h), and we'd rather not complicate the
        top-level Kbuild file with dependencies.
      
      Generating the atomic headers means that the instrumented wrappers will
      remain in sync with the rest of the atomic APIs, and we gain all the
      ordering variants of each atomic without having to manually expand
      them all.
      
      The KASAN checks are automatically generated based on the function
      parameters defined in atomics.tbl. Note that try_cmpxchg() now correctly
      treats 'old' as a parameter that may be written to, and not only read,
      as the hand-written instrumentation assumed.
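      
      For instance, the generated try_cmpxchg() wrapper looks conceptually
      like this (simplified sketch, not the exact generated code):
      
        static __always_inline bool
        atomic_try_cmpxchg(atomic_t *v, int *old, int new)
        {
                kasan_check_write(v, sizeof(*v));
                kasan_check_write(old, sizeof(*old)); /* 'old' may be written */
                return arch_atomic_try_cmpxchg(v, old, new);
        }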
      
      Other than the change to try_cmpxchg(), existing code should not be
      affected by this patch. The patch introduces instrumentation for all
      optional atomics (and ordering variants), along with the ifdeffery this
      requires, enabling other architectures to make use of the instrumented
      atomics.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: linuxdrivers@attotech.com
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Link: http://lkml.kernel.org/r/20180904104830.2975-5-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aa525d06
    • locking/atomics: Switch to generated atomic-long · b5d47ef9
      Mark Rutland authored
      As a step towards ensuring the atomic* APIs are consistent, let's switch
      to wrappers generated by gen-atomic-long.sh, using the same table that
      gen-atomic-fallback.sh uses to fill in gaps in the atomic_* and
      atomic64_* APIs.
      
      These are checked in rather than generated with Kbuild, since:
      
      * This allows inspection of the atomics with git grep and ctags on a
        pristine tree, which Linus strongly prefers being able to do.
      
      * The fallbacks are not affected by machine details or configuration
        options, so it is not necessary to regenerate them to take these into
        account.
      
      * These are included by files required *very* early in the build process
        (e.g. for generating bounds.h), and we'd rather not complicate the
        top-level Kbuild file with dependencies.
      
      Other than *_INIT() and *_cond_read_acquire(), all API functions are
      implemented as static inline C functions, ensuring consistent type
      promotion and/or truncation without requiring explicit casts to be
      applied to parameters or return values.
      
      Since we typedef atomic_long_t to either atomic_t or atomic64_t, we know
      these types are equivalent, and don't require explicit casts between
      them. However, as the try_cmpxchg*() functions take a pointer for the
      'old' parameter, which may be an int or s64, an explicit cast is
      generated for this.
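      
      For illustration, on a 64-bit configuration the generated wrappers look
      roughly like this (simplified sketch, not the exact generated code):
      
        static inline long
        atomic_long_fetch_add(long i, atomic_long_t *v)
        {
                return atomic64_fetch_add(i, v);
        }
        
        static inline bool
        atomic_long_try_cmpxchg(atomic_long_t *v, long *old, long new)
        {
                /* the one explicit cast: 'old' is an s64 * underneath */
                return atomic64_try_cmpxchg(v, (s64 *)old, new);
        }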
      
      There should be no functional change as a result of this patch (i.e.
      existing code should not be affected). However, this introduces a number
      of functions into the atomic_long_* API, bringing it into line with the
      atomic_* and atomic64_* APIs.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-4-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b5d47ef9
    • locking/atomics: Switch to generated fallbacks · 9fa45070
      Mark Rutland authored
      As a step to ensuring the atomic* APIs are consistent, switch to fallbacks
      generated by gen-atomic-fallback.sh.
      
      These are checked in rather than generated with Kbuild, since:
      
      * This allows inspection of the atomics with git grep and ctags on a
        pristine tree, which Linus strongly prefers being able to do.
      
      * The fallbacks are not affected by machine details or configuration
        options, so it is not necessary to regenerate them to take these into
        account.
      
      * These are included by files required *very* early in the build process
        (e.g. for generating bounds.h), and we'd rather not complicate the
        top-level Kbuild file with dependencies.
      
      The new fallback header should be equivalent to the old fallbacks in
      <linux/atomic.h>, but:
      
      * It is formatted a little differently due to scripting ensuring things
        are more regular than they used to be.
      
      * Fallbacks are now expanded in-place as static inline functions rather
        than macros.
      
      * The prototypes for fallbacks are arranged consistently, with the return
        type on a separate line to try to keep to a sensible line length (see
        the sketch below).
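      
      As a sketch (simplified, not the exact generated code), an acquire
      fallback built from the relaxed variant now reads roughly:
      
        #ifndef atomic_fetch_add_acquire
        static inline int
        atomic_fetch_add_acquire(int i, atomic_t *v)
        {
                int ret = atomic_fetch_add_relaxed(i, v);
                __atomic_acquire_fence();
                return ret;
        }
        #define atomic_fetch_add_acquire atomic_fetch_add_acquire
        #endif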
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-3-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9fa45070
    • locking/atomics: Add common header generation files · ace9bad4
      Mark Rutland authored
      To minimize repetition, to allow for future rework, and to ensure
      regularity of the various atomic APIs, we'd like to automatically
      generate (the bulk of) a number of headers related to atomics.
      
      This patch adds the infrastructure to do so, leaving actual conversion
      of headers to subsequent patches. This infrastructure consists of:
      
      * atomics.tbl - a table describing the functions in the atomics API,
        with names, prototypes, and metadata describing the variants that
        exist (e.g. fetch/return, acquire/release/relaxed). Note that the
        return type is dependent on the particular variant.
      
      * atomic-tbl.sh - a library of routines useful for dealing with
        atomics.tbl (e.g. querying which variants exist, or generating
        argument/parameter lists for a given function variant).
      
      * gen-atomic-fallback.sh - a script which generates a header of
        fallbacks, covering cases where architectures omit certain functions
        (e.g. omitting relaxed variants).
      
      * gen-atomic-long.sh - a script which generates wrappers providing the
        atomic_long API atop of the relevant atomic or atomic64 API,
        ensuring the APIs are consistent.
      
      * gen-atomic-instrumented.sh - a script which generates atomic* wrappers
        atop of arch_atomic* functions, with automatically generated KASAN
        instrumentation.
      
      * fallbacks/* - a set of fallback implementations for atomics, which
        should be used when no implementation of a given atomic is provided.
        These are used by gen-atomic-fallback.sh to generate fallbacks, and
        these are also used by other scripts to determine the set of optional
        atomics (as required to generate preprocessor guards correctly).
      
        Fallbacks may use the following variables (a sample expansion is
        sketched after this list):
      
        ${atomic}     atomic prefix: atomic/atomic64/atomic_long, which can be
      		used to derive the atomic type, and to prefix functions
      
        ${int}        integer type: int/s64/long
      
        ${pfx}        variant prefix, e.g. fetch_
      
        ${name}       base function name, e.g. add
      
        ${sfx}        variant suffix, e.g. _return
      
        ${order}      order suffix, e.g. _relaxed
      
        ${atomicname} full name, e.g. atomic64_fetch_add_relaxed
      
        ${ret}        return type of the function, e.g. void
      
        ${retstmt}    a return statement (with a trailing space), unless the
                      variant returns void
      
        ${params}     parameter list for the function declaration, e.g.
                      "int i, atomic_t *v"
      
        ${args}       argument list for invoking the function, e.g. "i, v"
      
        ... for clarity, ${ret}, ${retstmt}, ${params}, and ${args} are
        open-coded for fallbacks where these do not vary, or are critical to
        understanding the logic of the fallback.
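      
        As a sample substitution (simplified; the real templates and output
        may differ in detail), taking ${atomic}=atomic64, ${int}=s64,
        ${pfx}=fetch_, ${name}=add, ${sfx}="", ${order}=_acquire, ${ret}=s64,
        so that ${atomicname}=atomic64_fetch_add_acquire,
        ${params}="s64 i, atomic64_t *v" and ${args}="i, v", the acquire
        fallback would expand to roughly:
      
        static inline s64
        atomic64_fetch_add_acquire(s64 i, atomic64_t *v)
        {
                s64 ret = atomic64_fetch_add_relaxed(i, v);
                __atomic_acquire_fence();
                return ret;
        }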
      
      The MAINTAINERS entry for the atomic infrastructure is updated to cover
      the new scripts.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: catalin.marinas@arm.com
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linuxdrivers@attotech.com
      Cc: dvyukov@google.com
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: arnd@arndb.de
      Cc: aryabinin@virtuozzo.com
      Cc: glider@google.com
      Link: http://lkml.kernel.org/r/20180904104830.2975-2-mark.rutland@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ace9bad4
  3. 19 Oct, 2018 2 commits
    • locking/lockdep: Make global debug_locks* variables read-mostly · 01a14bda
      Waiman Long authored
      Make the frequently used lockdep global variable debug_locks read-mostly.
      As debug_locks_silent is sometimes used together with debug_locks,
      it is also made read-mostly so that the two can sit close together.
      
      With false cacheline sharing, cacheline contention problems can occur
      depending on what gets put into the same cacheline as debug_locks.
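      
      The shape of the change is roughly (simplified sketch):
      
        int debug_locks __read_mostly = 1;
        int debug_locks_silent __read_mostly;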
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539913518-15598-2-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      01a14bda
    • locking/lockdep: Fix debug_locks off performance problem · 9506a742
      Waiman Long authored
      It was found that when debug_locks was turned off because of a problem
      found by the lockdep code, the system performance could drop quite
      significantly when the lock_stat code was also configured into the
      kernel. For instance, parallel kernel build time on a 4-socket x86-64
      server nearly doubled.
      
      Further analysis into the cause of the slowdown traced back to the
      frequent calls to debug_locks_off() from the __lock_acquired() function,
      probably due to some inconsistent lockdep state with debug_locks
      off. The debug_locks_off() function did an unconditional atomic xchg
      to write a 0 value into debug_locks, which had already been set to 0.
      This led to severe contention on the cacheline that held debug_locks.
      As debug_locks is referenced in quite a few different places in the
      kernel, this greatly slowed down system performance.
      
      To prevent that thrashing of the debug_locks cacheline, lock_acquired()
      and lock_contended() now check the state of debug_locks before
      proceeding. The debug_locks_off() function is also modified to check
      debug_locks before calling __debug_locks_off().
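      
      Roughly, the resulting logic looks like this (simplified sketch, not the
      literal diff):
      
        int debug_locks_off(void)
        {
                /*
                 * Skip the atomic xchg (and the cacheline write) when
                 * debug_locks has already been cleared.
                 */
                if (debug_locks && __debug_locks_off()) {
                        if (!debug_locks_silent) {
                                console_verbose();
                                return 1;
                        }
                }
                return 0;
        }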
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9506a742
  4. 17 Oct, 2018 2 commits
    • locking/pvqspinlock: Extend node size when pvqspinlock is configured · 0fa809ca
      Waiman Long authored
      The qspinlock code supports up to 4 levels of slowpath nesting using
      four per-CPU mcs_spinlock structures. For 64-bit architectures, they
      fit nicely in one 64-byte cacheline.
      
      The para-virtualized (PV) qspinlock code needs to store more information
      in the per-CPU node structure than there is space for. It uses a trick:
      a second cacheline holds the extra information that it needs. So the
      PV qspinlock code needs to access two extra cachelines for its
      information, whereas the native qspinlock code only needs one extra
      cacheline.
      
      Freshly added counter profiling of the qspinlock code, however, revealed
      that it was very rare to use more than two levels of slowpath nesting.
      So it doesn't make sense to penalize the PV qspinlock code in order to
      have four mcs_spinlock structures in the same cacheline, optimizing for
      a case in the native qspinlock code that rarely happens.
      
      Extend the per-CPU node structure with two more long words when PV
      qspinlocks are configured, to hold the extra data that the PV code needs.
      
      As a result, the PV qspinlock code will, in most cases, enjoy the same
      benefit of using just one extra cacheline, like its native counterpart.
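      
      The shape of the change is roughly (simplified sketch, not the literal
      diff):
      
        struct qnode {
                struct mcs_spinlock mcs;
        #ifdef CONFIG_PARAVIRT_SPINLOCKS
                long reserved[2];       /* room for the extra PV data */
        #endif
        };
        
        static DEFINE_PER_CPU_ALIGNED(struct qnode, qnodes[MAX_NODES]);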
      
      [ mingo: Minor changelog edits. ]
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539697507-28084-2-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0fa809ca
    • locking/qspinlock_stat: Count instances of nested lock slowpaths · 1222109a
      Waiman Long authored
      The queued spinlock code supports up to 4 levels of lock slowpath
      nesting: user context, soft IRQ, hard IRQ and NMI. However, we are not
      sure how often such nesting happens.
      
      So add 3 more per-CPU stat counters to track the number of instances
      where the nesting index goes to 1, 2 and 3 respectively.
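      
      Conceptually, the slowpath bumps one of the three counters based on the
      per-CPU node index it ends up using (counter names below are
      illustrative, not necessarily those used by the patch):
      
        /* in the lock slowpath, after selecting per-CPU node slot 'idx': */
        if (idx)
                qstat_inc(qstat_lock_idx1 + idx - 1, true);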
      
      On a dual-socket 64-core 128-thread Zen server, the following were the
      new stat counter values under different circumstances:
      
               State                         slowpath   index1   index2   index3
               -----                         --------   ------   ------   -------
        After bootup                         1,012,150    82       0        0
        After parallel build + perf-top    125,195,009    82       0        0
      
      So the chance of having more than 2 levels of nesting is extremely low.
      
      [ mingo: Minor changelog edits. ]
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/1539697507-28084-1-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1222109a
  5. 16 Oct, 2018 6 commits
  6. 10 Oct, 2018 1 commit
  7. 09 Oct, 2018 2 commits
  8. 06 Oct, 2018 4 commits
  9. 05 Oct, 2018 1 commit
  10. 04 Oct, 2018 8 commits
    • x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops · 494b5168
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block. As a result GCC considers the inline assembly block as
      a single instruction. (Which it isn't, but that's the best we can get.)
      
      In this patch we wrap the paravirt call section tricks in a macro,
      to hide them from GCC.
      
      The effect of the patch is more aggressive inlining, which also
      causes a size increase of the kernel:
      
            text     data     bss      dec     hex  filename
        18147336 10226688 2957312 31331336 1de1408  ./vmlinux before
        18162555 10226288 2957312 31346155 1de4deb  ./vmlinux after (+14819)
      
      The number of static text symbols (non-inlined functions) goes down:
      
        Before: 40053
        After:  39942 (-111)
      
      [ mingo: Rewrote the changelog. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: virtualization@lists.linux-foundation.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-8-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      494b5168
    • x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs · f81f8ad5
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block. As a result GCC considers the inline assembly block as
      a single instruction. (Which it isn't, but that's the best we can get.)
      
      This patch increases the kernel size:
      
            text     data     bss      dec     hex  filename
        18146889 10225380 2957312 31329581 1de0d2d  ./vmlinux before
        18147336 10226688 2957312 31331336 1de1408  ./vmlinux after (+1755)
      
      But enables more aggressive inlining (and probably better branch decisions).
      
      The number of static text symbols in vmlinux is much lower:
      
       Before: 40218
       After:  40053 (-165)
      
      The assembly code gets harder to read due to the extra macro layer.
      
      [ mingo: Rewrote the changelog. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20181003213100.189959-7-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f81f8ad5
    • x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs · 77f48ec2
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block - i.e. to macrofy the affected block.
      
      As a result GCC considers the inline assembly block as a single instruction.
      
      This patch handles the LOCK prefix, allowing more aggressive inlining:
      
            text     data     bss      dec     hex  filename
        18140140 10225284 2957312 31322736 1ddf270  ./vmlinux before
        18146889 10225380 2957312 31329581 1de0d2d  ./vmlinux after (+6845)
      
      This is the reduction in non-inlined functions:
      
        Before: 40286
        After:  40218 (-68)
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20181003213100.189959-6-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      77f48ec2
    • x86/refcount: Work around GCC inlining bug · 9e1725b4
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block. As a result GCC considers the inline assembly block as
      a single instruction. (Which it isn't, but that's the best we can get.)
      
      This patch allows GCC to inline simple functions such as __get_seccomp_filter().
      
      To no-one's surprise the result is that GCC performs more aggressive (read: correct)
      inlining decisions in these scenarios, which reduces the kernel size and presumably
      also speeds it up:
      
            text     data     bss      dec     hex  filename
        18140970 10225412 2957312 31323694 1ddf62e  ./vmlinux before
        18140140 10225284 2957312 31322736 1ddf270  ./vmlinux after (-958)
      
      16 fewer static text symbols:
      
         Before: 40302
          After: 40286 (-16)
      
      these got inlined instead.
      
      Functions such as kref_get(), free_user(), fuse_file_get() now get inlined. Hurray!
      
      [ mingo: Rewrote the changelog. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20181003213100.189959-5-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9e1725b4
    • x86/objtool: Use asm macros to work around GCC inlining bugs · c06c4d80
      Nadav Amit authored
      As described in:
      
        77b0bf55: ("kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs")
      
      GCC's inlining heuristics are broken with common asm() patterns used in
      kernel code, resulting in the effective disabling of inlining.
      
      In the case of objtool the resulting borkage can be significant, since all the
      annotations of objtool are discarded during linkage and never inlined,
      yet GCC bogusly considers most functions affected by objtool annotations
      as 'too large'.
      
      The workaround is to set an assembly macro and call it from the inline
      assembly block. As a result GCC considers the inline assembly block as
      a single instruction. (Which it isn't, but that's the best we can get.)
      
      This increases the kernel size slightly:
      
            text     data     bss      dec     hex filename
        18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
        18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829)
      
      The number of static text symbols (i.e. non-inlined functions) is reduced:
      
        Before:  40321
        After:   40302 (-19)
      
      [ mingo: Rewrote the changelog. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-sparse@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-4-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c06c4d80
    • kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs · 77b0bf55
      Nadav Amit authored
      Using macros in inline assembly allows us to work around bugs
      in GCC's inlining decisions.
      
      Compile macros.S and use it to assemble all C files.
      Currently only x86 will use it.
      
      Background:
      
      The inlining pass of GCC doesn't include an assembler, so it's not aware
      of basic properties of the generated code, such as its size in bytes,
      or that there are such things as discontinuous blocks of code and data
      due to the newfangled linker feature called 'sections' ...
      
      Instead GCC uses a lazy and fragile heuristic: it does a linear count of
      certain syntactic and whitespace elements in inlined assembly block source
      code, such as a count of new-lines and semicolons (!), as a poor substitute
      for "code size and complexity".
      
      Unsurprisingly this heuristic falls over and breaks its neck with certain
      common types of kernel code that use inline assembly, such as the frequent
      practice of putting useful information into alternative sections.
      
      As a result of this fresh, 20+ years old GCC bug, GCC's inlining decisions
      are effectively disabled for inlined functions that make use of such asm()
      blocks, because GCC thinks those sections of code are "large" - when in
      reality they often result in just a very low number of machine
      instructions.
      
      This absolute lack of inlining prowess when GCC comes across such asm()
      blocks both increases generated kernel code size and causes performance
      overhead, which is particularly noticeable on paravirt kernels, which make
      frequent use of these inlining facilities in an attempt to stay out of the
      way when running on baremetal hardware.
      
      Instead of fixing the compiler we use a workaround: we set an assembly macro
      and call it from the inlined assembly block. As a result GCC considers the
      inline assembly block as a single instruction. (Which it often isn't but I digress.)
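      
      As a generic illustration of the trick (all names here are made up, not
      the kernel's actual macros):
      
        /*
         * In an assembly file fed to the assembler for every C translation
         * unit (e.g. a macros.S), the bulky section trickery lives in a macro:
         *
         *      .macro ANNOTATE_EXAMPLE
         *      .pushsection .discard.example, "a"
         *      .long 0
         *      .popsection
         *      .endm
         *
         * The inline asm then contains only the macro invocation, so GCC's
         * newline/semicolon-counting heuristic sees one short statement:
         */
        static inline void annotated_nop(void)
        {
                asm volatile("ANNOTATE_EXAMPLE" ::: "memory");
        }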
      
      This uglifies and bloats the source code - for example just the refcount
      related changes have this impact:
      
       Makefile                 |    9 +++++++--
       arch/x86/Makefile        |    7 +++++++
       arch/x86/kernel/macros.S |    7 +++++++
       scripts/Kbuild.include   |    4 +++-
       scripts/mod/Makefile     |    2 ++
       5 files changed, 26 insertions(+), 3 deletions(-)
      
      Yay readability and maintainability, it's not like assembly code is hard to read
      and maintain ...
      
      We also hope that GCC will eventually get fixed, but we are not holding
      our breath for that. Yet we are optimistic, it might still happen, any decade now.
      
      [ mingo: Wrote new changelog describing the background. ]
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kbuild@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-3-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      77b0bf55
    • kbuild/arch/xtensa: Define LINKER_SCRIPT for the linker script · 35e76b99
      Nadav Amit authored
      Define LINKER_SCRIPT when building the linker script, as is done in
      other architectures. This is required because upcoming Makefile changes
      would otherwise break things.
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-xtensa@linux-xtensa.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-2-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      35e76b99
    • Ingo Molnar · c0554d2d
  11. 03 Oct, 2018 3 commits
  12. 02 Oct, 2018 3 commits