1. 12 Sep, 2022 26 commits
    • bitops: use try_cmpxchg in set_mask_bits and bit_clear_unless · e1d7c760
      Uros Bizjak authored
      Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
      set_mask_bits and bit_clear_unless.  x86 CMPXCHG instruction returns
      success in ZF flag, so this change saves a compare after cmpxchg (and
      related move instruction in front of cmpxchg).
      
      Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
      fails, enabling further code simplifications.
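
      As a sketch of the resulting pattern (illustrative, not the exact
      bitops.h macro), the converted loop looks like this:

        static inline unsigned long set_mask_bits_sketch(unsigned long *ptr,
                                                         unsigned long mask,
                                                         unsigned long bits)
        {
                unsigned long old = READ_ONCE(*ptr);
                unsigned long new;

                do {
                        new = (old & ~mask) | bits;
                        /* on failure, try_cmpxchg() reloads "old" from *ptr */
                } while (!try_cmpxchg(ptr, &old, new));

                return old;
        }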
      
      Link: https://lkml.kernel.org/r/20220822143851.3290-1-ubizjak@gmail.com
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hfs: replace kmap() with kmap_local_page() in btree.c · 21490eff
      Fabio M. De Francesco authored
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping
      space is restricted and protected by a global lock for synchronization and
      (2) it also requires global TLB invalidation when the kmap's pool wraps
      and it might block when the mapping space is fully utilized until a slot
      becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and still valid.
      
      Since its use in btree.c is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in btree.c.  Where
      possible, use the suited standard helpers (memzero_page(), memcpy_page())
      instead of open coding kmap_local_page() plus memset() or memcpy().
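
      A minimal sketch of the two conversion shapes described above
      (illustrative only; the hfs call sites differ in detail):

        /* open-coded mapping: kmap() becomes kmap_local_page()/kunmap_local() */
        ptr = kmap_local_page(page);
        /* ... use ptr ... */
        kunmap_local(ptr);

        /* map-plus-memset/memcpy collapses into the dedicated helpers */
        memzero_page(page, 0, PAGE_SIZE);
        memcpy_page(dst_page, 0, src_page, 0, PAGE_SIZE);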
      
      Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      Link: https://lkml.kernel.org/r/20220821180400.8198-4-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chaitanya Kulkarni <kch@nvidia.com>
      Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
      Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hfs: replace kmap() with kmap_local_page() in bnode.c · ca0ac8df
      Fabio M. De Francesco authored
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping
      space is restricted and protected by a global lock for synchronization and
      (2) it also requires global TLB invalidation when the kmap's pool wraps
      and it might block when the mapping space is fully utilized until a slot
      becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and still valid.
      
      Since its use in bnode.c is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in bnode.c.  Where
      possible, use the suited standard helpers (memzero_page(), memcpy_page())
      instead of open coding kmap_local_page() plus memset() or memcpy().
      
      Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      Link: https://lkml.kernel.org/r/20220821180400.8198-3-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chaitanya Kulkarni <kch@nvidia.com>
      Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
      Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hfs: unmap the page in the "fail_page" label · d75e9a4b
      Fabio M. De Francesco authored
      Patch series "hfs: Replace kmap() with kmap_local_page()".
      
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping space is restricted and protected by a global lock for
      synchronization and (2) it also requires global TLB invalidation when the
      kmaps pool wraps and it might block when the mapping space is fully
      utilized until a slot becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and still valid.
      
      Since its use in fs/hfs is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in fs/hfs.  Where
      possible, use the suited standard helpers (memzero_page(), memcpy_page())
      instead of open coding kmap_local_page() plus memset() or memcpy().
      
      Fix a bug due to a page being not unmapped if the code jumps to the
      "fail_page" label (1/3).
      
      Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      
      This patch (of 3):
      
      Several paths within hfs_btree_open() jump to the "fail_page" label where
      put_page() is called while the page is still mapped.
      
      Call kunmap() to unmap the page just before put_page().
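
      The shape of the fix, as a sketch:

        fail_page:
                kunmap(page);   /* added: drop the mapping before the reference */
                put_page(page);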
      
      Link: https://lkml.kernel.org/r/20220821180400.8198-1-fmdefrancesco@gmail.com
      Link: https://lkml.kernel.org/r/20220821180400.8198-2-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chaitanya Kulkarni <kch@nvidia.com>
      Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
      Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kexec: replace kmap() with kmap_local_page() · 948084f0
      Fabio M. De Francesco authored
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping space is restricted and protected by a global lock for
      synchronization and (2) it also requires global TLB invalidation when the
      kmap's pool wraps and it might block when the mapping space is fully
      utilized until a slot becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and are still valid.
      
      Since its use in kexec_core.c is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in kexec_core.c.
      
      Tested on a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      Link: https://lkml.kernel.org/r/20220821182519.9483-1-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Acked-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • iversion: use atomic64_try_cmpxchg() · da3f52ba
      Uros Bizjak authored
      Use atomic64_try_cmpxchg instead of
      atomic64_cmpxchg (*ptr, old, new) == old in inode_set_max_iversion_raw,
      inode_maybe_inc_iversion and inode_query_iversion.  x86 CMPXCHG instruction
      returns success in ZF flag, so this change saves a compare after cmpxchg
      (and related move instruction in front of cmpxchg).
      
      Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
      fails, enabling further code simplifications.
      
      The loop in inode_maybe_inc_iversion improves from:
      
          5563:	48 89 ca             	mov    %rcx,%rdx
          5566:	48 89 c8             	mov    %rcx,%rax
          5569:	48 83 e2 fe          	and    $0xfffffffffffffffe,%rdx
          556d:	48 83 c2 02          	add    $0x2,%rdx
          5571:	f0 48 0f b1 16       	lock cmpxchg %rdx,(%rsi)
          5576:	48 39 c1             	cmp    %rax,%rcx
          5579:	0f 84 85 fc ff ff    	je     5204 <...>
          557f:	48 89 c1             	mov    %rax,%rcx
          5582:	eb df                	jmp    5563 <...>
      
      to:
      
          5563:	48 89 c2             	mov    %rax,%rdx
          5566:	48 83 e2 fe          	and    $0xfffffffffffffffe,%rdx
          556a:	48 83 c2 02          	add    $0x2,%rdx
          556e:	f0 48 0f b1 11       	lock cmpxchg %rdx,(%rcx)
          5573:	0f 84 8b fc ff ff    	je     5204 <...>
          5579:	eb e8                	jmp    5563 <...>
      
      Link: https://lkml.kernel.org/r/20220821193011.88208-1-ubizjak@gmail.com
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • aio: use atomic_try_cmpxchg in __get_reqs_available · 38ace0d5
      Uros Bizjak authored
      Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old
      in __get_reqs_available.  x86 CMPXCHG instruction returns success in ZF
      flag, so this change saves a compare after cmpxchg (and related move
      instruction in front of cmpxchg).
      
      Also, atomic_try_cmpxchg implicitly assigns old *ptr value to "old" when
      cmpxchg fails, enabling further code simplifications.
      
      No functional change intended.
      
      Link: https://lkml.kernel.org/r/20220714164851.3055-1-ubizjak@gmail.com
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • buffer: use try_cmpxchg in discard_buffer · b0192296
      Uros Bizjak authored
      Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
      discard_buffer.  x86 CMPXCHG instruction returns success in ZF flag, so
      this change saves a compare after cmpxchg (and related move instruction in
      front of cmpxchg).
      
      Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
      fails, enabling further code simplifications.
      
      Note that the value from *ptr should be read using READ_ONCE to prevent
      the compiler from merging, refetching or reordering the read.
      
      No functional change intended.
      
      Link: https://lkml.kernel.org/r/20220714171653.12128-1-ubizjak@gmail.com
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • epoll: use try_cmpxchg in list_add_tail_lockless · 693fc06e
      Uros Bizjak authored
      Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
      list_add_tail_lockless.  x86 CMPXCHG instruction returns success in ZF
      flag, so this change saves a compare after cmpxchg (and related move
      instruction in front of cmpxchg).
      
      No functional change intended.
      
      Link: https://lkml.kernel.org/r/20220714173255.12987-1-ubizjak@gmail.com
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ia64: fix clock_getres(CLOCK_MONOTONIC) to report ITC frequency · aa06a9bd
      Sergei Trofimovich authored
      clock_gettime(CLOCK_MONOTONIC, &tp) is very precise on ia64 as it uses ITC
      (similar to rdtsc on x86).  Its resolution is not quite 1ns, but only a few
      times coarser: usually 2-3ns.
      
      clock_getres(CLOCK_MONOTONIC, &res) never reflected that fact and reported
      0.04s precision (1/HZ value).
      
      In https://bugs.gentoo.org/596382 gstreamer's test suite failed loudly
      when it noticed precision discrepancy.
      
      Before the change:
      
          clock_getres(CLOCK_MONOTONIC, &res) reported 250Hz precision.
      
      After the change:
      
          clock_getres(CLOCK_MONOTONIC, &res) reports ITC (400Mhz) precision.
      
      The patch is based on matoro's fix. I added a bit of explanation why we
      need to special-case arch-specific clock_getres().
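
      A hedged sketch of the arch-specific getres (the helper and field names
      are assumed from ia64 conventions, not taken verbatim from the patch):

        static int ia64_clock_getres(const clockid_t which_clock,
                                     struct timespec64 *tp)
        {
                /* report one ITC tick (~2-3ns at 400MHz) instead of 1/HZ */
                s64 tick_ns = DIV_ROUND_UP(NSEC_PER_SEC, local_cpu_data->itc_freq);

                *tp = ns_to_timespec64(tick_ns);
                return 0;
        }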
      
      [akpm@linux-foundation.org: coding-style cleanups]
      Link: https://lkml.kernel.org/r/20220820181813.2275195-1-slyich@gmail.com
      Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
      Cc: matoro <matoro_mailinglist_kernel@matoro.tk>
      Cc: Émeric Maschino <emeric.maschino@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • fs/qnx6: delete unnecessary checks before brelse() · cba7543e
      Minghao Chi authored
      brelse() tests whether its argument is NULL and then returns immediately. 
      Thus remove the tests which are not needed around the shown calls.
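
      The change is mechanical; a sketch:

        /* before */
        if (bh)
                brelse(bh);

        /* after: brelse() already ignores a NULL buffer_head */
        brelse(bh);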
      
      Link: https://lkml.kernel.org/r/20220819081819.96347-1-chi.minghao@zte.com.cn
      Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Cc: CGEL ZTE <cgel.zte@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Minghao Chi <chi.minghao@zte.com.cn>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kernel: exit: cleanup release_thread() · 2be9880d
      Kefeng Wang authored
      Only x86 has its own release_thread(); introduce a new weak release_thread()
      function to remove the empty definitions in the other architectures.
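
      The weak default is essentially an empty stub that x86 alone overrides;
      a sketch:

        /* kernel/exit.c: one weak stub replaces the per-arch empty bodies */
        void __weak release_thread(struct task_struct *dead_task)
        {
        }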
      
      Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Guo Ren <guoren@kernel.org>				[csky]
      Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Brian Cain <bcain@quicinc.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>			[powerpc]
      Acked-by: Stafford Horne <shorne@gmail.com>			[openrisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>		[arm64]
      Acked-by: Huacai Chen <chenhuacai@kernel.org>			[LoongArch]
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Guo Ren <guoren@kernel.org> [csky]
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Henderson <richard.henderson@linaro.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Cc: Xuerui Wang <kernel@xen0n.name>
      Cc: Yoshinori Sato <ysato@users.osdn.me>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • proc: save LOC in vsyscall test · f4068af3
      Brian Foster authored
      Do a single fork in the vsyscall detection code and let the SIGSEGV handler
      exit and carry the result to the parent, saving lines of code (LOC).
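
      A hedged userspace sketch of the scheme (names and the probed address
      are illustrative, not the selftest's exact code):

        #include <signal.h>
        #include <sys/wait.h>
        #include <unistd.h>

        static void sigsegv_handler(int sig)
        {
                _exit(1);                       /* fault: page not readable */
        }

        static int vsyscall_page_readable(void)
        {
                int wstatus;
                pid_t pid = fork();             /* the single fork */

                if (pid == 0) {
                        signal(SIGSEGV, sigsegv_handler);
                        *(volatile char *)0xffffffffff600000UL; /* vsyscall page */
                        _exit(0);               /* no fault */
                }
                waitpid(pid, &wstatus, 0);
                return WIFEXITED(wstatus) && WEXITSTATUS(wstatus) == 0;
        }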
      
      [adobriyan@gmail.com: redo original patch, delete unnecessary variables, minimise code changes]
      Link: https://lkml.kernel.org/r/YvoWzAn5dlhF75xa@localhost.localdomain
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Tested-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • llist: use try_cmpxchg in llist_add_batch and llist_del_first · 4f1d2a03
      Uros Bizjak authored
      Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
      llist_add_batch and llist_del_first.  x86 CMPXCHG instruction returns
      success in ZF flag, so this change saves a compare after cmpxchg.
      
      Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
      fails, enabling further code simplifications.
      
      No functional change intended.
      
      Link: https://lkml.kernel.org/r/20220712144917.4497-1-ubizjak@gmail.com
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • panic, kexec: make __crash_kexec() NMI safe · 05c62574
      Valentin Schneider authored
      Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI
      panic() doesn't work.  The cause of that lies in the PREEMPT_RT definition
      of mutex_trylock():
      
      	if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task()))
      		return 0;
      
      This prevents an nmi_panic() from executing the main body of
      __crash_kexec() which does the actual kexec into the kdump kernel.  The
      warning and return are explained by:
      
        6ce47fd9 ("rtmutex: Warn if trylock is called from hard/softirq context")
        [...]
        The reasons for this are:
      
            1) There is a potential deadlock in the slowpath
      
            2) Another cpu which blocks on the rtmutex will boost the task
      	 which allegedly locked the rtmutex, but that cannot work
      	 because the hard/softirq context borrows the task context.
      
      Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex
      and replace it with an atomic variable.  This is somewhat overzealous as
      *some* callsites could keep using a mutex (e.g.  the sysfs-facing ones
      like crash_shrink_memory()), but this has the benefit of involving a
      single unified lock and preventing any future NMI-related surprises.
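
      A sketch of the resulting lock shape, matching the description (exact
      names may differ):

        static atomic_t __kexec_lock = ATOMIC_INIT(0);

        static bool kexec_trylock(void)
        {
                return atomic_cmpxchg_acquire(&__kexec_lock, 0, 1) == 0;
        }

        static void kexec_unlock(void)
        {
                atomic_set_release(&__kexec_lock, 0);
        }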
      
      Tested by triggering NMI panics via:
      
        $ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi
        $ echo 1 > /proc/sys/kernel/unknown_nmi_panic
        $ echo 1 > /proc/sys/kernel/panic
      
        $ ipmitool power diag
      
      Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com
      Fixes: 6ce47fd9 ("rtmutex: Warn if trylock is called from hard/softirq context")
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Juri Lelli <jlelli@redhat.com>
      Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • kexec: turn all kexec_mutex acquisitions into trylocks · 7bb5da0d
      Valentin Schneider authored
      Patch series "kexec, panic: Making crash_kexec() NMI safe", v4.
      
      
      This patch (of 2):
      
      Most acquisitions of kexec_mutex are done via mutex_trylock() - those were
      a direct "translation" from:
      
        8c5a1cf0 ("kexec: use a mutex for locking rather than xchg()")
      
      there have however been two additions since then that use mutex_lock():
      crash_get_memory_size() and crash_shrink_memory().
      
      A later commit will replace said mutex with an atomic variable, and
      locking operations will become atomic_cmpxchg().  Rather than having those
      mutex_lock() become while (atomic_cmpxchg(&lock, 0, 1)), turn them into
      trylocks that can return -EBUSY on acquisition failure.
      
      This does halve the printable size of the crash kernel, but that's still
      neighbouring 2G for 32bit kernels which should be ample enough.
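
      A sketch of a converted reader; it also shows where the halving comes
      from, since the signed return type must now carry -EBUSY (details
      assumed):

        ssize_t crash_get_memory_size(void)
        {
                ssize_t size = 0;

                if (!mutex_trylock(&kexec_mutex))
                        return -EBUSY;  /* was: mutex_lock(&kexec_mutex); */

                if (crashk_res.end != crashk_res.start)
                        size = resource_size(&crashk_res);

                mutex_unlock(&kexec_mutex);
                return size;
        }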
      
      Link: https://lkml.kernel.org/r/20220630223258.4144112-1-vschneid@redhat.com
      Link: https://lkml.kernel.org/r/20220630223258.4144112-2-vschneid@redhat.com
      Signed-off-by: Valentin Schneider <vschneid@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Juri Lelli <jlelli@redhat.com>
      Cc: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • lib/cmdline: avoid page fault in next_arg · 9847f212
      Neel Natu authored
      An argument list like "arg=val arg2 \"" can trigger a page fault if the
      page pointed to by 'args[0xffffffff]' is not mapped, and potential memory
      corruption otherwise (unlikely but possible if the bogus address is mapped
      and contents happen to match the ascii value of the quote character).
      
      The fix is to ensure that we load 'args[i-1]' only when (i > 0).
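
      A sketch of the guard against the quote-stripping hunk (context
      simplified; 'i' is unsigned, so i-1 wraps to 0xffffffff when i == 0):

        /* before: args[i - 1] is read even when i == 0 */
        if (quoted && args[i - 1] == '"')
                args[i - 1] = '\0';

        /* after: only look back when a previous character exists */
        if (quoted && i > 0 && args[i - 1] == '"')
                args[i - 1] = '\0';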
      
      Prior to this commit the following command would trigger an
      unhandled page fault in the kernel:
      
      root@(none):/linus/fs/fat# insmod ./fat.ko  "foo=bar \""
      [   33.870507] BUG: unable to handle page fault for address: ffff888204252608
      [   33.872180] #PF: supervisor read access in kernel mode
      [   33.873414] #PF: error_code(0x0000) - not-present page
      [   33.874650] PGD 4401067 P4D 4401067 PUD 0
      [   33.875321] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
      [   33.876113] CPU: 16 PID: 399 Comm: insmod Not tainted 5.19.0-dbg-DEV #4
      [   33.877193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
      [   33.878739] RIP: 0010:next_arg+0xd1/0x110
      [   33.879399] Code: 22 75 1d 41 c6 04 01 00 41 80 f8 22 74 18 eb 35 4c 89 0e 45 31 d2 4c 89 cf 48 c7 02 00 00 00 00 41 80 f8 22 75 1f 41 8d 42 ff <41> 80 3c 01 22 75 14 41 c6 04 01 00 eb 0d 48 c7 02 00 00 00 00 41
      [   33.882338] RSP: 0018:ffffc90001253d08 EFLAGS: 00010246
      [   33.883174] RAX: 00000000ffffffff RBX: ffff888104252608 RCX: 0fc317bba1c1dd00
      [   33.884311] RDX: ffffc90001253d40 RSI: ffffc90001253d48 RDI: ffff888104252609
      [   33.885450] RBP: ffffc90001253d10 R08: 0000000000000022 R09: ffff888104252609
      [   33.886595] R10: 0000000000000000 R11: ffffffff82c7ff20 R12: 0000000000000282
      [   33.887748] R13: 00000000ffff8000 R14: 0000000000000000 R15: 0000000000007fff
      [   33.888887] FS:  00007f04ec7432c0(0000) GS:ffff88813d300000(0000) knlGS:0000000000000000
      [   33.890183] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   33.891111] CR2: ffff888204252608 CR3: 0000000100f36005 CR4: 0000000000170ee0
      [   33.892241] Call Trace:
      [   33.892641]  <TASK>
      [   33.892989]  parse_args+0x8f/0x220
      [   33.893538]  load_module+0x138b/0x15a0
      [   33.894149]  ? prepare_coming_module+0x50/0x50
      [   33.894879]  ? kernel_read_file_from_fd+0x5f/0x90
      [   33.895639]  __se_sys_finit_module+0xce/0x130
      [   33.896342]  __x64_sys_finit_module+0x1d/0x20
      [   33.897042]  do_syscall_64+0x44/0xa0
      [   33.897622]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [   33.898434] RIP: 0033:0x7f04ec85ef79
      [   33.899009] Code: 48 8d 3d da db 0d 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 9e 0d 00 f7 d8 64 89 01 48
      [   33.901912] RSP: 002b:00007fffae81bfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [   33.903081] RAX: ffffffffffffffda RBX: 0000559c5f1d2640 RCX: 00007f04ec85ef79
      [   33.904191] RDX: 0000000000000000 RSI: 0000559c5f1d12a0 RDI: 0000000000000003
      [   33.905304] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      [   33.906421] R10: 0000000000000003 R11: 0000000000000246 R12: 0000559c5f1d12a0
      [   33.907526] R13: 0000000000000000 R14: 0000559c5f1d25f0 R15: 0000559c5f1d12a0
      [   33.908631]  </TASK>
      [   33.908986] Modules linked in: fat(+) [last unloaded: fat]
      [   33.909843] CR2: ffff888204252608
      [   33.910375] ---[ end trace 0000000000000000 ]---
      [   33.911172] RIP: 0010:next_arg+0xd1/0x110
      [   33.911796] Code: 22 75 1d 41 c6 04 01 00 41 80 f8 22 74 18 eb 35 4c 89 0e 45 31 d2 4c 89 cf 48 c7 02 00 00 00 00 41 80 f8 22 75 1f 41 8d 42 ff <41> 80 3c 01 22 75 14 41 c6 04 01 00 eb 0d 48 c7 02 00 00 00 00 41
      [   33.914643] RSP: 0018:ffffc90001253d08 EFLAGS: 00010246
      [   33.915446] RAX: 00000000ffffffff RBX: ffff888104252608 RCX: 0fc317bba1c1dd00
      [   33.916544] RDX: ffffc90001253d40 RSI: ffffc90001253d48 RDI: ffff888104252609
      [   33.917636] RBP: ffffc90001253d10 R08: 0000000000000022 R09: ffff888104252609
      [   33.918727] R10: 0000000000000000 R11: ffffffff82c7ff20 R12: 0000000000000282
      [   33.919821] R13: 00000000ffff8000 R14: 0000000000000000 R15: 0000000000007fff
      [   33.920908] FS:  00007f04ec7432c0(0000) GS:ffff88813d300000(0000) knlGS:0000000000000000
      [   33.922125] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   33.923017] CR2: ffff888204252608 CR3: 0000000100f36005 CR4: 0000000000170ee0
      [   33.924098] Kernel panic - not syncing: Fatal exception
      [   33.925776] Kernel Offset: disabled
      [   33.926347] Rebooting in 10 seconds..
      
      Link: https://lkml.kernel.org/r/20220728232434.1666488-1-neelnatu@google.com
      Signed-off-by: Neel Natu <neelnatu@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • checkpatch: add kmap and kmap_atomic to the deprecated list · defdaff1
      Ira Weiny authored
      kmap() and kmap_atomic() are being deprecated in favor of
      kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping space is restricted and protected by a global lock for
      synchronization and (2) it also requires global TLB invalidation when the
      kmap's pool wraps and it might block when the mapping space is fully
      utilized until a slot becomes available.
      
      kmap_local_page() is safe from any context and is therefore redundant with
      kmap_atomic() with the exception of any pagefault or preemption disable
      requirements.  However, using kmap_atomic() for these side effects makes
      the code less clear.  So any requirement for pagefault or preemption
      disable should be made explicit.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored.
      
      Link: https://lkml.kernel.org/r/20220813220034.806698-1-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • fs/isofs: replace kmap() with kmap_local_page() · 5bb6ce3a
      Fabio M. De Francesco authored
      The use of kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping space is restricted and protected by a global lock for
      synchronization and (2) it also requires global TLB invalidation when the
      kmap's pool wraps and it might block when the mapping space is fully
      utilized until a slot becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      Tasks can be preempted and, when scheduled to run again, the kernel
      virtual addresses are restored and still valid.  It is faster than kmap()
      in kernels with HIGHMEM enabled.
      
      Since kmap_local_page() can be safely used in compress.c, it should be
      called everywhere instead of kmap().
      
      Therefore, replace kmap() with kmap_local_page() in compress.c.  Where it
      is needed, use memzero_page() instead of open coding kmap_local_page()
      plus memset() to fill the pages with zeros.  Delete the redundant
      flush_dcache_page() in the two call sites of memzero_page().
      
      Tested with mkisofs on a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel
      with HIGHMEM64GB enabled.
      
      Link: https://lkml.kernel.org/r/20220801122709.8164-1-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Pali Rohár <pali@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • treewide: defconfig: address renamed CONFIG_DEBUG_INFO=y · 64367f2e
      Arnd Bergmann authored
      CONFIG_DEBUG_INFO is now implicitly selected if one picks one of the
      explicit options, which can be DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT,
      DEBUG_INFO_DWARF4 or DEBUG_INFO_DWARF5.
      
      This was actually not what I had in mind when I suggested making it a
      'choice' statement, but it's too late to change again now, and the Kconfig
      logic is more sensible in the new form.
      
      Change any defconfig file that had CONFIG_DEBUG_INFO enabled but did not
      pick DWARF4 or DWARF5 explicitly to now pick the toolchain default.
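
      For an affected defconfig the change amounts to (illustrative):

        -CONFIG_DEBUG_INFO=y
        +CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y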
      
      Link: https://lkml.kernel.org/r/20220811114609.2097335-1-arnd@kernel.org
      Fixes: f9b3cd24 ("Kconfig.debug: make DEBUG_INFO selectable from a choice")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Dinh Nguyen <dinguyen@kernel.org>
      Cc: Yoshinori Sato <ysato@users.osdn.me>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ipc/util.c: cleanup and improve sysvipc_find_ipc() · 58b5c203
      Manfred Spraul authored
      sysvipc_find_ipc() can be simplified further (see the sketch after this
      list):
      
      - It uses a for() loop to locate the next entry in the idr.
        This can be replaced with idr_get_next().
      
      - It receives two parameters (pos - which is actually
        an idr index and not a position, and new_pos, which
        is really a position).
        One parameter is sufficient.
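
      A hedged sketch of the simplified lookup (locking and the exact
      index/position mapping are elided):

        static struct kern_ipc_perm *sysvipc_find_ipc(struct ipc_ids *ids,
                                                      loff_t *pos)
        {
                int tmpidx = *pos;      /* one parameter doubles as idr index */
                struct kern_ipc_perm *ipc;

                /* replaces the open-coded for() loop over idr indices */
                ipc = idr_get_next(&ids->ipcs_idr, &tmpidx);
                if (ipc)
                        *pos = tmpidx + 1;      /* position of the next entry */
                return ipc;
        }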
      
      Link: https://lore.kernel.org/all/20210903052020.3265-3-manfred@colorfullife.com/
      Link: https://lkml.kernel.org/r/20220805115733.104763-1-manfred@colorfullife.com
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Davidlohr Bueso <dave@stgolabs.net>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: <1vier1@web.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • scripts/decodecode: improve faulting line determination · 765f2bf0
      Borislav Petkov authored
      There are cases where the IP pointer in a Code: line in an oops doesn't
      point at the beginning of an instruction:
      
      Code: 0f bd c2 e9 a0 cd b5 e4 48 0f bd c2 e9 97 cd b5 e4 0f 1f 80 00 00 00 00 \
      	  e9 8b cd b5 e4 0f 1f 00 66 0f a3 d0 e9 7f cd b5 e4 0f 1f <80> 00 00 00 \
      	  00 0f a3 d0 e9 70 cd b5 e4 48 0f a3 d0 e9 67 cd b5
      
        e9 7f cd b5 e4          jmp    0xffffffffe4b5cda8
        0f 1f 80 00 00 00 00    nopl   0x0(%rax)
      	^^
      
      and the current way of determining the faulting instruction line doesn't
      work because disassembled instructions are counted from the IP byte to
      the end and when that thing points in the middle, the trailing bytes can
      be interpreted as different insns:
      
        Code starting with the faulting instruction
        ===========================================
           0:   80 00 00                addb   $0x0,(%rax)
           3:   00 00                   add    %al,(%rax)
      
      whereas, this is part of
      
      0f 1f 80 00 00 00 00    nopl   0x0(%rax)
      
           5:   0f a3 d0                bt     %edx,%eax
           ...
      
      leading to:
      
        1d:   0f 1f 00                nopl   (%rax)
        20:   66 0f a3 d0             bt     %dx,%ax
        24:*  e9 7f cd b5 e4          jmp    0xffffffffe4b5cda8               <-- trapping instruction
        29:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
        30:   0f a3 d0                bt     %edx,%eax
      
      which is the wrong faulting instruction.
      
      Change the way the faulting line number is determined by matching the
      opcode bytes from the beginning, leading to correct output:
      
        1d:   0f 1f 00                nopl   (%rax)
        20:   66 0f a3 d0             bt     %dx,%ax
        24:   e9 7f cd b5 e4          jmp    0xffffffffe4b5cda8
        29:*  0f 1f 80 00 00 00 00    nopl   0x0(%rax)                <-- trapping instruction
        30:   0f a3 d0                bt     %edx,%eax
      
      While at it, make decodecode use bash as the interpreter - that thing
      should be present on everything by now. It simplifies the code a lot
      too.
      
      Link: https://lkml.kernel.org/r/20220808085928.29840-1-bp@alien8.de
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hfsplus: convert kmap() to kmap_local_page() in btree.c · 9f25f357
      Fabio M. De Francesco authored
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping space is restricted and protected by a global lock for
      synchronization and (2) it also requires global TLB invalidation when the
      kmap's pool wraps and it might block when the mapping space is fully
      utilized until a slot becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and are still valid.
      
      Since its use in btree.c is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in btree.c.
      
      Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      Link: https://lkml.kernel.org/r/20220809203105.26183-5-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hfsplus: convert kmap() to kmap_local_page() in bitmap.c · f9ef3b95
      Fabio M. De Francesco authored
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping space is restricted and protected by a global lock for
      synchronization and (2) it also requires global TLB invalidation when the
      kmap's pool wraps and it might block when the mapping space is fully
      utilized until a slot becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and are still valid.
      
      Since its use in bitmap.c is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in bitmap.c.
      
      Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      Link: https://lkml.kernel.org/r/20220809203105.26183-4-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hfsplus: convert kmap() to kmap_local_page() in bnode.c · 6c3014a6
      Fabio M. De Francesco authored
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping
      space is restricted and protected by a global lock for synchronization and
      (2) it also requires global TLB invalidation when the kmap's pool wraps
      and it might block when the mapping space is fully utilized until a slot
      becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and still valid.
      
      Since its use in bnode.c is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in bnode.c.  Where
      possible, use the suited standard helpers (memzero_page(), memcpy_page())
      instead of open coding kmap_local_page() plus memset() or memcpy().
      
      Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      Link: https://lkml.kernel.org/r/20220809203105.26183-3-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Suggested-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hfsplus: unmap the page in the "fail_page" label · f5b23d67
      Fabio M. De Francesco authored
      Patch series "hfsplus: Replace kmap() with kmap_local_page()".
      
      kmap() is being deprecated in favor of kmap_local_page().
      
      There are two main problems with kmap(): (1) It comes with an overhead as
      mapping space is restricted and protected by a global lock for
      synchronization and (2) it also requires global TLB invalidation when the
      kmap's pool wraps and it might block when the mapping space is fully
      utilized until a slot becomes available.
      
      With kmap_local_page() the mappings are per thread, CPU local, can take
      page faults, and can be called from any context (including interrupts). 
      It is faster than kmap() in kernels with HIGHMEM enabled.  Furthermore,
      the tasks can be preempted and, when they are scheduled to run again, the
      kernel virtual addresses are restored and still valid.
      
      Since its use in fs/hfsplus is safe everywhere, it should be preferred.
      
      Therefore, replace kmap() with kmap_local_page() in fs/hfsplus.  Where
      possible, use the suited standard helpers (memzero_page(), memcpy_page())
      instead of open coding kmap_local_page() plus memset() or memcpy().
      
      Fix a bug due to a page being not unmapped if the code jumps to the
      "fail_page" label (1/4).
      
      Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with
      HIGHMEM64GB enabled.
      
      
      This patch (of 4):
      
      Several paths within hfs_btree_open() jump to the "fail_page" label where
      put_page() is called while the page is still mapped.
      
      Call kunmap() to unmap the page just before put_page().
      
      Link: https://lkml.kernel.org/r/20220809203105.26183-1-fmdefrancesco@gmail.com
      Link: https://lkml.kernel.org/r/20220809203105.26183-2-fmdefrancesco@gmail.com
      Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Reviewed-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Fabio M. De Francesco <fmdefrancesco@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 28 Aug, 2022 14 commits
    • Linux 6.0-rc3 · b90cb105
      Linus Torvalds authored
    • Merge tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · b467192e
      Linus Torvalds authored
      Pull more hotfixes from Andrew Morton:
       "Seventeen hotfixes.  Mostly memory management things.
      
        Ten patches are cc:stable, addressing pre-6.0 issues"
      
      * tag 'mm-hotfixes-stable-2022-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        .mailmap: update Luca Ceresoli's e-mail address
        mm/mprotect: only reference swap pfn page if type match
        squashfs: don't call kmalloc in decompressors
        mm/damon/dbgfs: avoid duplicate context directory creation
        mailmap: update email address for Colin King
        asm-generic: sections: refactor memory_intersects
        bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem
        ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown
        Revert "memcg: cleanup racy sum avoidance code"
        mm/zsmalloc: do not attempt to free IS_ERR handle
        binder_alloc: add missing mmap_lock calls when using the VMA
        mm: re-allow pinning of zero pfns (again)
        vmcoreinfo: add kallsyms_num_syms symbol
        mailmap: update Guilherme G. Piccoli's email addresses
        writeback: avoid use-after-free after removing device
        shmem: update folio if shmem_replace_page() updates the page
        mm/hugetlb: avoid corrupting page->mapping in hugetlb_mcopy_atomic_pte
    • Merge tag 'bitmap-6.0-rc3' of github.com:/norov/linux · 373eff57
      Linus Torvalds authored
      Pull bitmap fixes from Yury Norov:
       "Fix the reported issues, and implements the suggested improvements,
        for the version of the cpumask tests [1] that was merged with commit
        c41e8866 ("lib/test: introduce cpumask KUnit test suite").
      
        These changes include fixes for the tests, and better alignment with
        the KUnit style guidelines"
      
      * tag 'bitmap-6.0-rc3' of github.com:/norov/linux:
        lib/cpumask_kunit: add tests file to MAINTAINERS
        lib/cpumask_kunit: log mask contents
        lib/test_cpumask: follow KUnit style guidelines
        lib/test_cpumask: fix cpu_possible_mask last test
        lib/test_cpumask: drop cpu_possible_mask full test
    • .mailmap: update Luca Ceresoli's e-mail address · 0ebafe2e
      Luca Ceresoli authored
      My Bootlin address is preferred from now on.
      
      Link: https://lkml.kernel.org/r/20220826130515.3011951-1-luca.ceresoli@bootlin.com
      Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Atish Patra <atishp@atishpatra.org>
      Cc: Hans Verkuil <hverkuil-cisco@xs4all.nl>
      Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/mprotect: only reference swap pfn page if type match · 3d2f78f0
      Peter Xu authored
      Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
      fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:
      
        kernel BUG at include/linux/swapops.h:117!
        CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S         O L 6.0.0-dbg-DEV #2
        RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0
        Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6
        c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b
        48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48
        RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282
        RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000
        RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b
        RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000
        R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738
        R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a
        FS:  00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         <TASK>
         change_pte_range+0x36e/0x880
         change_p4d_range+0x2e8/0x670
         change_protection_range+0x14e/0x2c0
         mprotect_fixup+0x1ee/0x330
         do_mprotect_pkey+0x34c/0x440
         __x64_sys_mprotect+0x1d/0x30
      
      It triggers because pfn_swap_entry_to_page() could be called upon e.g. a
      genuine swap entry.
      
      Fix it by only calling it when it's a write migration entry where the page*
      is used.
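
      A sketch of the narrowed call site in change_pte_range() (surrounding
      logic elided):

        swp_entry_t entry = pte_to_swp_entry(oldpte);

        if (is_writable_migration_entry(entry)) {
                /* only here does the swap pfn carry a usable page */
                struct page *page = pfn_swap_entry_to_page(entry);

                /* ... downgrade to a read-only migration entry ... */
        }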
      
      [1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VTS+k1w@mail.gmail.com/
      
      Link: https://lkml.kernel.org/r/20220823221138.45602-1-peterx@redhat.com
      Fixes: 6c287605 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reported-by: Yu Zhao <yuzhao@google.com>
      Tested-by: Yu Zhao <yuzhao@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • squashfs: don't call kmalloc in decompressors · 1f13dff0
      Phillip Lougher authored
      The decompressors may be called while in an atomic section.  So move the
      kmalloc() out of this path, and into the "page actor" init function.
      
      This fixes a regression introduced by commit
      f268eedd ("squashfs: extend "page actor" to handle missing pages")
      
      Link: https://lkml.kernel.org/r/20220822215430.15933-1-phillip@squashfs.org.uk
      Fixes: f268eedd ("squashfs: extend "page actor" to handle missing pages")
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/dbgfs: avoid duplicate context directory creation · d26f6070
      Badari Pulavarty authored
      When user tries to create a DAMON context via the DAMON debugfs interface
      with a name of an already existing context, the context directory creation
      fails but a new context is created and added in the internal data
      structure, due to absence of the directory creation success check.  As a
      result, memory could leak and DAMON cannot be turned on.  An example test
      case is as below:
      
          # cd /sys/kernel/debug/damon/
          # echo "off" >  monitor_on
          # echo paddr > target_ids
          # echo "abc" > mk_context
          # echo "abc" > mk_context
          # echo $$ > abc/target_ids
          # echo "on" > monitor_on  <<< fails
      
      Return value of 'debugfs_create_dir()' is expected to be ignored in
      general, but this is an exceptional case as DAMON feature is depending
      on the debugfs functionality and it has the potential duplicate name
      issue.  This commit therefore fixes the issue by checking the directory
      creation failure and immediately return the error in the case.
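
      A sketch of the added check (variable names assumed):

        new_dir = debugfs_create_dir(name, root);
        if (IS_ERR(new_dir))            /* e.g. a duplicate context name */
                return PTR_ERR(new_dir);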
      
      Link: https://lkml.kernel.org/r/20220821180853.2400-1-sj@kernel.org
      Fixes: 75c1c2b5 ("mm/damon/dbgfs: support multiple contexts")
      Signed-off-by: Badari Pulavarty <badari.pulavarty@intel.com>
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: <stable@vger.kernel.org>	[5.15.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mailmap: update email address for Colin King · ac733f65
      Colin Ian King authored
      Colin King is working on kernel janitorial fixes in his spare time and
      using his Intel email is confusing.  Use his gmail account as the default
      email address.
      
      Link: https://lkml.kernel.org/r/20220817212753.101109-1-colin.i.king@gmail.com
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      ac733f65
    • Quanyang Wang's avatar
      asm-generic: sections: refactor memory_intersects · 0c7d7cc2
      Quanyang Wang authored
      There are two problems with the current code of memory_intersects:
      
      First, it doesn't check whether the region (begin, end) falls inside the
      region (virt, vend), that is (virt < begin && vend > end).
      
      Second, if vend is equal to begin, it returns true, but this is
      wrong: vend (virt + size) is not the last address of the memory
      region; (virt + size - 1) is.  This wrong determination triggers
      misreporting when check_for_illegal_area() calls memory_intersects()
      to check whether a DMA region intersects with the stext region.
      
      The misreporting is as below (stext is at 0x80100000):
       WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168
       DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536]
       Modules linked in:
       CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5
       Hardware name: Xilinx Zynq Platform
        unwind_backtrace from show_stack+0x18/0x1c
        show_stack from dump_stack_lvl+0x58/0x70
        dump_stack_lvl from __warn+0xb0/0x198
        __warn from warn_slowpath_fmt+0x80/0xb4
        warn_slowpath_fmt from check_for_illegal_area+0x130/0x168
        check_for_illegal_area from debug_dma_map_sg+0x94/0x368
        debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128
        __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24
        dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4
        usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214
        usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118
        usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec
        usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70
        usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360
        usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440
        usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238
        usb_stor_control_thread from kthread+0xf8/0x104
        kthread from ret_from_fork+0x14/0x2c
      
      Refactor memory_intersects to fix the two problems above.
      
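      The refactored check reduces to the classic overlap test for
      half-open intervals; a sketch consistent with the description above:

          /* [begin, end) and [virt, virt + size) overlap iff each region
           * starts before the other ends.  This covers partial overlap and
           * full containment, and returns false when vend == begin. */
          static inline bool memory_intersects(void *begin, void *end,
                                               void *virt, size_t size)
          {
                  void *vend = virt + size;

                  return virt < end && vend > begin;
          }
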
      Before commit 1d7db834 ("dma-debug: use memory_intersects()
      directly"), memory_intersects() was called only from
      printk_late_init():

      printk_late_init -> init_section_intersects -> memory_intersects

      so there were few callers that could expose the bug.  Once that
      commit was merged, and with CONFIG_DMA_API_DEBUG enabled, the DMA
      subsystem started using memory_intersects() to check for illegal
      areas, and the calltrace above is triggered.
      
      [akpm@linux-foundation.org: fix nearby comment typo]
      Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.com
      Fixes: 97955936 ("asm/sections: add helpers to check for section data")
      Signed-off-by: default avatarQuanyang Wang <quanyang.wang@windriver.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0c7d7cc2
    • Liu Shixin's avatar
      bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem · dd0ff4d1
      Liu Shixin authored
      The vmemmap pages are marked by kmemleak when allocated from
      memblock.  Remove them from kmemleak when freeing the page.
      Otherwise, when the page is reused, kmemleak may report an error such
      as the one below and then stop working.
      
       kmemleak: Cannot insert 0xffff98fb6eab3d40 into the object search tree (overlaps existing)
       kmemleak: Kernel memory leak detector disabled
       kmemleak: Object 0xffff98fb6be00000 (size 335544320):
       kmemleak:   comm "swapper", pid 0, jiffies 4294892296
       kmemleak:   min_count = 0
       kmemleak:   count = 0
       kmemleak:   flags = 0x1
       kmemleak:   checksum = 0
       kmemleak:   backtrace:
      
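      A minimal sketch of the fix (simplified from put_page_bootmem();
      kmemleak_free_part() is the existing kmemleak helper): drop the
      memblock-time kmemleak object when the last reference goes away.

          /* Simplified: on the last put, remove the kmemleak object that
           * was created when the vmemmap page was allocated from memblock,
           * so a later reuse of the page cannot overlap it. */
          if (page_ref_dec_return(page) == 1) {
                  /* ... clear the page's bootmem private state ... */
                  kmemleak_free_part(page_to_virt(page), PAGE_SIZE);
                  free_reserved_page(page);
          }
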
      Link: https://lkml.kernel.org/r/20220819094005.2928241-1-liushixin2@huawei.com
      Fixes: f41f2ed4 ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
      Signed-off-by: default avatarLiu Shixin <liushixin2@huawei.com>
      Reviewed-by: default avatarMuchun Song <songmuchun@bytedance.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      dd0ff4d1
    • Heming Zhao's avatar
      ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown · 550842cc
      Heming Zhao authored
      After commit 0737e01d ("ocfs2: ocfs2_mount_volume does cleanup job
      before return error"), a failure in any step after ocfs2_dlm_init()
      triggers a crash when ocfs2_dlm_shutdown() is called.

      For example: in local mount mode, no DLM resource is initialized.  If
      ocfs2_mount_volume() fails in ocfs2_find_slot(), the error handling
      calls ocfs2_dlm_shutdown(), which then performs the DLM resource
      cleanup and crashes the kernel.

      Fix this by bypassing uninitialized resources in
      ocfs2_dlm_shutdown().
      
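      A sketch of the bypass (field and helper names are illustrative; the
      real ocfs2 code differs in detail): make ocfs2_dlm_shutdown() skip
      teardown of anything ocfs2_dlm_init() never set up.

          /* Illustrative only: bail out or skip per-resource teardown when
           * the resource was never initialized (e.g. local mount mode, or
           * a mount failure before/inside ocfs2_dlm_init()). */
          void ocfs2_dlm_shutdown(struct ocfs2_super *osb, int hangup_pending)
          {
                  if (!osb->cconn)        /* cluster stack never connected */
                          return;

                  /* ... stop the downconvert thread, drop locks, and
                   *     disconnect, each step checked the same way ... */
          }
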
      Link: https://lkml.kernel.org/r/20220815085754.20417-1-heming.zhao@suse.com
      Fixes: 0737e01d ("ocfs2: ocfs2_mount_volume does cleanup job before return error")
      Signed-off-by: default avatarHeming Zhao <heming.zhao@suse.com>
      Reviewed-by: default avatarJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      550842cc
    • Shakeel Butt's avatar
      Revert "memcg: cleanup racy sum avoidance code" · dbb16df6
      Shakeel Butt authored
      This reverts commit 96e51ccf.
      
      Recently we started running the kernel with the rstat infrastructure
      on production traffic and began to see negative memcg stat values.
      In particular, the 'sock' stat is the one we observed going negative.
      
      $ grep "sock " /mnt/memory/job/memory.stat
      sock 253952
      total_sock 18446744073708724224
      
      Re-running after a couple of seconds:
      
      $ grep "sock " /mnt/memory/job/memory.stat
      sock 253952
      total_sock 53248
      
      For now we are only seeing this issue on large machines (256 CPUs)
      and only with the 'sock' stat, presumably because the networking
      stack increases that stat on one CPU and decreases it on another CPU
      much more often than other stats.  So the negative 'sock' value
      appears when the rstat flusher flushes the stats on the CPU that has
      seen the decrement but has not yet flushed the CPU holding the
      increments: a typical race condition.  (The huge total_sock above is
      such a small negative value printed as an unsigned 64-bit integer.)
      
      For an easy stable backport, a revert is the simplest solution.  For
      a long-term solution, I am thinking in two directions: first, reduce
      the race window by optimizing the rstat flusher; second, if the
      reader sees a negative stat value, force a flush and restart the stat
      collection - basically a bounded retry.
      
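      For reference, the reverted cleanup had removed a reader-side clamp
      of roughly this shape (a sketch, not the exact restored code), which
      the revert brings back:

          /* Sketch: a racy sum over per-CPU deltas can be transiently
           * negative on SMP; clamp to zero so it does not wrap when
           * exposed as an unsigned long. */
          static unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx)
          {
                  long x = READ_ONCE(memcg->vmstats.state[idx]);

          #ifdef CONFIG_SMP
                  if (x < 0)
                          x = 0;
          #endif
                  return x;
          }
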
      Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com
      Fixes: 96e51ccf ("memcg: cleanup racy sum avoidance code")
      Signed-off-by: default avatarShakeel Butt <shakeelb@google.com>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: <stable@vger.kernel.org>	[5.15]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      dbb16df6
    • Sergey Senozhatsky's avatar
      mm/zsmalloc: do not attempt to free IS_ERR handle · a5d21721
      Sergey Senozhatsky authored
      zs_malloc() now returns ERR_PTR values as handles, which zram can
      accidentally pass to zs_free().  Another bad scenario is when
      zcomp_compress() fails: the handle keeps its default -ENOMEM value,
      and zs_free() will try to free that "pointer value".
      
      Add the missing check and make sure that zs_free() bails out when
      ERR_PTR() is passed to it.
      
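      A sketch of the added guard (the exact predicate in the fix may
      differ; IS_ERR_VALUE() is the generic helper for error codes stored
      in an unsigned long):

          void zs_free(struct zs_pool *pool, unsigned long handle)
          {
                  /* Handles may carry an ERR_PTR-encoded error, e.g. the
                   * default -ENOMEM left behind when zcomp_compress()
                   * fails; never treat those as real allocations. */
                  if (!handle || IS_ERR_VALUE(handle))
                          return;

                  /* ... normal free path ... */
          }
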
      Link: https://lkml.kernel.org/r/20220816050906.2583956-1-senozhatsky@chromium.org
      Fixes: c7e6f17b ("zsmalloc: zs_malloc: return ERR_PTR on failure")
      Signed-off-by: default avatarSergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      a5d21721
    • Liam Howlett's avatar
      binder_alloc: add missing mmap_lock calls when using the VMA · 44e602b4
      Liam Howlett authored
      Take the mmap_read_lock() when using the VMA in binder_alloc_print_pages()
      and when checking for a VMA in binder_alloc_new_buf_locked().
      
      It is worth noting that binder_alloc_new_buf_locked() drops the mmap
      read lock after it verifies a VMA exists; the lock may be taken again
      deeper in the call stack, if necessary.
      
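      A sketch of the locking pattern (simplified; binder threads this
      through its allocator state): any lookup or use of the VMA happens
      under the mmap read lock now that no VMA pointer is cached.

          /* Sketch: the VMA may only be inspected while mm's mmap lock is
           * held for reading; the pointer must not outlive the unlock. */
          static bool addr_has_vma(struct mm_struct *mm, unsigned long addr)
          {
                  struct vm_area_struct *vma;
                  bool found;

                  mmap_read_lock(mm);
                  vma = vma_lookup(mm, addr);
                  found = vma != NULL;
                  mmap_read_unlock(mm);

                  return found;
          }
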
      Link: https://lkml.kernel.org/r/20220810160209.1630707-1-Liam.Howlett@oracle.com
      Fixes: a43cfc87 ("android: binder: stop saving a pointer to the VMA")
      Signed-off-by: default avatarLiam R. Howlett <Liam.Howlett@oracle.com>
      Reported-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Reported-by: <syzbot+a7b60a176ec13cafb793@syzkaller.appspotmail.com>
      Acked-by: default avatarCarlos Llamas <cmllamas@google.com>
      Tested-by: default avatarOndrej Mosnacek <omosnace@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Christian Brauner (Microsoft) <brauner@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hridya Valsaraju <hridya@google.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Martijn Coenen <maco@android.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Todd Kjos <tkjos@android.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: "Arve Hjønnevåg" <arve@android.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      44e602b4