1. 05 Sep, 2020 14 commits
    • Ralph Campbell's avatar
      mm/migrate: remove unnecessary is_zone_device_page() check · 6128763f
      Ralph Campbell authored
      Patch series "mm/migrate: preserve soft dirty in remove_migration_pte()".
      
      I happened to notice this from code inspection after seeing Alistair
      Popple's patch ("mm/rmap: Fixup copying of soft dirty and uffd ptes").
      
      This patch (of 2):
      
      The check for is_zone_device_page() and is_device_private_page() is
      unnecessary since the latter is sufficient to determine if the page is a
      device private page.  Simplify the code for easier reading.
      Signed-off-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Alistair Popple <apopple@nvidia.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Bharata B Rao <bharata@linux.ibm.com>
      Link: https://lkml.kernel.org/r/20200831212222.22409-1-rcampbell@nvidia.com
      Link: https://lkml.kernel.org/r/20200831212222.22409-2-rcampbell@nvidia.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6128763f
    • Alistair Popple's avatar
      mm/rmap: fixup copying of soft dirty and uffd ptes · ad7df764
      Alistair Popple authored
      During memory migration a pte is temporarily replaced with a migration
      swap pte.  Some pte bits from the existing mapping such as the soft-dirty
      and uffd write-protect bits are preserved by copying these to the
      temporary migration swap pte.
      
      However these bits are not stored at the same location for swap and
      non-swap ptes.  Therefore testing these bits requires using the
      appropriate helper function for the given pte type.
      
      Unfortunately several code locations were found where the wrong helper
      function is being used to test soft_dirty and uffd_wp bits which leads to
      them getting incorrectly set or cleared during page-migration.
      
      Fix these by using the correct tests based on pte type.
      
      Fixes: a5430dda ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
      Fixes: 8c3328f1 ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
      Fixes: f45ec5ff ("userfaultfd: wp: support swap and page migration")
      Signed-off-by: default avatarAlistair Popple <alistair@popple.id.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200825064232.10023-2-alistair@popple.id.auSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ad7df764
    • Alistair Popple's avatar
      mm/migrate: fixup setting UFFD_WP flag · ebdf8321
      Alistair Popple authored
      Commit f45ec5ff ("userfaultfd: wp: support swap and page migration")
      introduced support for tracking the uffd wp bit during page migration.
      However the non-swap PTE variant was used to set the flag for zone device
      private pages which are a type of swap page.
      
      This leads to corruption of the swap offset if the original PTE has the
      uffd_wp flag set.
      
      Fixes: f45ec5ff ("userfaultfd: wp: support swap and page migration")
      Signed-off-by: default avatarAlistair Popple <alistair@popple.id.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarPeter Xu <peterx@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Link: https://lkml.kernel.org/r/20200825064232.10023-1-alistair@popple.id.auSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      ebdf8321
    • Yang Shi's avatar
      mm: madvise: fix vma user-after-free · 7867fd7c
      Yang Shi authored
      The syzbot reported the below use-after-free:
      
        BUG: KASAN: use-after-free in madvise_willneed mm/madvise.c:293 [inline]
        BUG: KASAN: use-after-free in madvise_vma mm/madvise.c:942 [inline]
        BUG: KASAN: use-after-free in do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
        Read of size 8 at addr ffff8880a6163eb0 by task syz-executor.0/9996
      
        CPU: 0 PID: 9996 Comm: syz-executor.0 Not tainted 5.9.0-rc1-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x18f/0x20d lib/dump_stack.c:118
          print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
          __kasan_report mm/kasan/report.c:513 [inline]
          kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
          madvise_willneed mm/madvise.c:293 [inline]
          madvise_vma mm/madvise.c:942 [inline]
          do_madvise.part.0+0x1c8b/0x1cf0 mm/madvise.c:1145
          do_madvise mm/madvise.c:1169 [inline]
          __do_sys_madvise mm/madvise.c:1171 [inline]
          __se_sys_madvise mm/madvise.c:1169 [inline]
          __x64_sys_madvise+0xd9/0x110 mm/madvise.c:1169
          do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Allocated by task 9992:
          kmem_cache_alloc+0x138/0x3a0 mm/slab.c:3482
          vm_area_alloc+0x1c/0x110 kernel/fork.c:347
          mmap_region+0x8e5/0x1780 mm/mmap.c:1743
          do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
          vm_mmap_pgoff+0x195/0x200 mm/util.c:506
          ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
          do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        Freed by task 9992:
          kmem_cache_free.part.0+0x67/0x1f0 mm/slab.c:3693
          remove_vma+0x132/0x170 mm/mmap.c:184
          remove_vma_list mm/mmap.c:2613 [inline]
          __do_munmap+0x743/0x1170 mm/mmap.c:2869
          do_munmap mm/mmap.c:2877 [inline]
          mmap_region+0x257/0x1780 mm/mmap.c:1716
          do_mmap+0xcf9/0x11d0 mm/mmap.c:1545
          vm_mmap_pgoff+0x195/0x200 mm/util.c:506
          ksys_mmap_pgoff+0x43a/0x560 mm/mmap.c:1596
          do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      It is because vma is accessed after releasing mmap_lock, but someone
      else acquired the mmap_lock and the vma is gone.
      
      Releasing mmap_lock after accessing vma should fix the problem.
      
      Fixes: 692fe624 ("mm: Handle MADV_WILLNEED through vfs_fadvise()")
      Reported-by: syzbot+b90df26038d1d5d85c97@syzkaller.appspotmail.com
      Signed-off-by: default avatarYang Shi <shy828301@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarJan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>	[5.4+]
      Link: https://lkml.kernel.org/r/20200816141204.162624-1-shy828301@gmail.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7867fd7c
    • Mrinal Pandey's avatar
      checkpatch: fix the usage of capture group ( ... ) · 13e45417
      Mrinal Pandey authored
      The usage of "capture group (...)" in the immediate condition after `&&`
      results in `$1` being uninitialized.  This issues a warning "Use of
      uninitialized value $1 in regexp compilation at ./scripts/checkpatch.pl
      line 2638".
      
      I noticed this bug while running checkpatch on the set of commits from
      v5.7 to v5.8-rc1 of the kernel on the commits with a diff content in
      their commit message.
      
      This bug was introduced in the script by commit e518e9a5
      ("checkpatch: emit an error when there's a diff in a changelog").  It
      has been in the script since then.
      
      The author intended to store the match made by capture group in variable
      `$1`.  This should have contained the name of the file as `[\w/]+`
      matched.  However, this couldn't be accomplished due to usage of capture
      group and `$1` in the same regular expression.
      
      Fix this by placing the capture group in the condition before `&&`.
      Thus, `$1` can be initialized to the text that capture group matches
      thereby setting it to the desired and required value.
      
      Fixes: e518e9a5 ("checkpatch: emit an error when there's a diff in a changelog")
      Signed-off-by: default avatarMrinal Pandey <mrinalmni@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: default avatarLukas Bulwahn <lukas.bulwahn@gmail.com>
      Reviewed-by: default avatarLukas Bulwahn <lukas.bulwahn@gmail.com>
      Cc: Joe Perches <joe@perches.com>
      Link: https://lkml.kernel.org/r/20200714032352.f476hanaj2dlmiot@mrinalpandeySigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      13e45417
    • Tobias Klauser's avatar
      fork: adjust sysctl_max_threads definition to match prototype · b0daa2c7
      Tobias Klauser authored
      Commit 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
      definition of sysctl_max_threads to match its prototype in
      linux/sysctl.h which fixes the following sparse error/warning:
      
        kernel/fork.c:3050:47: warning: incorrect type in argument 3 (different address spaces)
        kernel/fork.c:3050:47:    expected void *
        kernel/fork.c:3050:47:    got void [noderef] __user *buffer
        kernel/fork.c:3036:5: error: symbol 'sysctl_max_threads' redeclared with different type (incompatible argument 3 (different address spaces)):
        kernel/fork.c:3036:5:    int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
        kernel/fork.c: note: in included file (through include/linux/key.h, include/linux/cred.h, include/linux/sched/signal.h, include/linux/sched/cputime.h):
        include/linux/sysctl.h:242:5: note: previously declared as:
        include/linux/sysctl.h:242:5:    int extern [addressable] [signed] [toplevel] sysctl_max_threads( ... )
      
      Fixes: 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      Signed-off-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Link: https://lkml.kernel.org/r/20200825093647.24263-1-tklauser@distanz.chSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b0daa2c7
    • Tobias Klauser's avatar
      ipc: adjust proc_ipc_sem_dointvec definition to match prototype · fff1662c
      Tobias Klauser authored
      Commit 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
      signature of proc_ipc_sem_dointvec to match ctl_table.proc_handler which
      fixes the following sparse error/warning:
      
        ipc/ipc_sysctl.c:94:47: warning: incorrect type in argument 3 (different address spaces)
        ipc/ipc_sysctl.c:94:47:    expected void *buffer
        ipc/ipc_sysctl.c:94:47:    got void [noderef] __user *buffer
        ipc/ipc_sysctl.c:194:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces))
        ipc/ipc_sysctl.c:194:35:    expected int ( [usertype] *proc_handler )( ... )
        ipc/ipc_sysctl.c:194:35:    got int ( * )( ... )
      
      Fixes: 32927393 ("sysctl: pass kernel pointers to ->proc_handler")
      Signed-off-by: default avatarTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Link: https://lkml.kernel.org/r/20200825105846.5193-1-tklauser@distanz.chSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      fff1662c
    • Joerg Roedel's avatar
      mm: track page table modifications in __apply_to_page_range() · e80d3909
      Joerg Roedel authored
      __apply_to_page_range() is also used to change and/or allocate
      page-table pages in the vmalloc area of the address space.  Make sure
      these changes get synchronized to other page-tables in the system by
      calling arch_sync_kernel_mappings() when necessary.
      
      The impact appears limited to x86-32, where apply_to_page_range may miss
      updating the PMD.  That leads to explosions in drivers like
      
        BUG: unable to handle page fault for address: fe036000
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0002) - not-present page
        *pde = 00000000
        Oops: 0002 [#1] SMP
        CPU: 3 PID: 1300 Comm: gem_concurrent_ Not tainted 5.9.0-rc1+ #16
        Hardware name:  /NUC6i3SYB, BIOS SYSKLi35.86A.0024.2015.1027.2142 10/27/2015
        EIP: __execlists_context_alloc+0x132/0x2d0 [i915]
        Code: 31 d2 89 f0 e8 2f 55 02 00 89 45 e8 3d 00 f0 ff ff 0f 87 11 01 00 00 8b 4d e8 03 4b 30 b8 5a 5a 5a 5a ba 01 00 00 00 8d 79 04 <c7> 01 5a 5a 5a 5a c7 81 fc 0f 00 00 5a 5a 5a 5a 83 e7 fc 29 f9 81
        EAX: 5a5a5a5a EBX: f60ca000 ECX: fe036000 EDX: 00000001
        ESI: f43b7340 EDI: fe036004 EBP: f6389cb8 ESP: f6389c9c
        DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010286
        CR0: 80050033 CR2: fe036000 CR3: 2d361000 CR4: 001506d0
        DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
        DR6: fffe0ff0 DR7: 00000400
        Call Trace:
          execlists_context_alloc+0x10/0x20 [i915]
          intel_context_alloc_state+0x3f/0x70 [i915]
          __intel_context_do_pin+0x117/0x170 [i915]
          i915_gem_do_execbuffer+0xcc7/0x2500 [i915]
          i915_gem_execbuffer2_ioctl+0xcd/0x1f0 [i915]
          drm_ioctl_kernel+0x8f/0xd0
          drm_ioctl+0x223/0x3d0
          __ia32_sys_ioctl+0x1ab/0x760
          __do_fast_syscall_32+0x3f/0x70
          do_fast_syscall_32+0x29/0x60
          do_SYSENTER_32+0x15/0x20
          entry_SYSENTER_32+0x9f/0xf2
        EIP: 0xb7f28559
        Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
        EAX: ffffffda EBX: 00000005 ECX: c0406469 EDX: bf95556c
        ESI: b7e68000 EDI: c0406469 EBP: 00000005 ESP: bf9554d8
        DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
        Modules linked in: i915 x86_pkg_temp_thermal intel_powerclamp crc32_pclmul crc32c_intel intel_cstate intel_uncore intel_gtt drm_kms_helper intel_pch_thermal video button autofs4 i2c_i801 i2c_smbus fan
        CR2: 00000000fe036000
      
      It looks like kasan, xen and i915 are vulnerable.
      
      Actual impact is "on thinkpad X60 in 5.9-rc1, screen starts blinking
      after 30-or-so minutes, and machine is unusable"
      
      [sfr@canb.auug.org.au: ARCH_PAGE_TABLE_SYNC_MASK needs vmalloc.h]
        Link: https://lkml.kernel.org/r/20200825172508.16800a4f@canb.auug.org.au
      [chris@chris-wilson.co.uk: changelog addition]
      [pavel@ucw.cz: changelog addition]
      
      Fixes: 2ba3e694 ("mm/vmalloc: track which page-table levels were modified")
      Fixes: 86cf69f1 ("x86/mm/32: implement arch_sync_kernel_mappings()")
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Tested-by: Chris Wilson <chris@chris-wilson.co.uk>	[x86-32]
      Tested-by: default avatarPavel Machek <pavel@ucw.cz>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@vger.kernel.org>	[5.8+]
      Link: https://lkml.kernel.org/r/20200821123746.16904-1-joro@8bytes.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e80d3909
    • Randy Dunlap's avatar
      MAINTAINERS: IA64: mark Status as Odd Fixes only · 9d90dd18
      Randy Dunlap authored
      IA64 isn't really being maintained, so mark it as Odd Fixes only.
      Signed-off-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarTony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Link: http://lkml.kernel.org/r/7e719139-450f-52c2-59a2-7964a34eda1f@infradead.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9d90dd18
    • Nick Desaulniers's avatar
      MAINTAINERS: add LLVM maintainers · b9644289
      Nick Desaulniers authored
      Nominate Nathan and myself to be point of contact for clang/LLVM related
      support, after a poll at the LLVM BoF at Linux Plumbers Conf 2020.
      
      While corporate sponsorship is beneficial, its important to not entrust
      the keys to the nukes with any one entity.  Should Nathan and I find
      ourselves at the same employer, I would gladly step down.
      Signed-off-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarSedat Dilek <sedat.dilek@gmail.com>
      Acked-by: default avatarNathan Chancellor <natechancellor@gmail.com>
      Acked-by: default avatarLukas Bulwahn <lukas.bulwahn@gmail.com>
      Acked-by: default avatarMiguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Acked-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      Link: https://lkml.kernel.org/r/20200825143540.2948637-1-ndesaulniers@google.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b9644289
    • Robert Richter's avatar
      MAINTAINERS: update Cavium/Marvell entries · f548a645
      Robert Richter authored
      I am leaving Marvell and already do not have access to my @marvell.com
      email address.  So switching over to my korg mail address or removing my
      address there another maintainer is already listed.  For the entries
      there no other maintainer is listed I will keep looking into patches for
      Cavium systems for a while until someone from Marvell takes it over.
      
      Since I might have limited access to hardware and also limited time I
      changed state to 'Odd Fixes' for those entries.
      Signed-off-by: default avatarRobert Richter <rric@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Ganapatrao Kulkarni <gkulkarni@marvell.com>
      Cc: Sunil Goutham <sgoutham@marvell.com>
      CC: Borislav Petkov <bp@alien8.de>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Wolfram Sang <wsa@kernel.org>,
      Link: https://lkml.kernel.org/r/20200824122050.31164-1-rric@kernel.orgSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f548a645
    • Eugeniu Rosca's avatar
      mm: slub: fix conversion of freelist_corrupted() · dc07a728
      Eugeniu Rosca authored
      Commit 52f23478 ("mm/slub.c: fix corrupted freechain in
      deactivate_slab()") suffered an update when picked up from LKML [1].
      
      Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()'
      created a no-op statement.  Fix it by sticking to the behavior intended
      in the original patch [1].  In addition, make freelist_corrupted()
      immune to passing NULL instead of &freelist.
      
      The issue has been spotted via static analysis and code review.
      
      [1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/
      
      Fixes: 52f23478 ("mm/slub.c: fix corrupted freechain in deactivate_slab()")
      Signed-off-by: default avatarEugeniu Rosca <erosca@de.adit-jv.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: Joe Jin <joe.jin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20200824130643.10291-1-erosca@de.adit-jv.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dc07a728
    • Xunlei Pang's avatar
      mm: memcg: fix memcg reclaim soft lockup · e3336cab
      Xunlei Pang authored
      We've met softlockup with "CONFIG_PREEMPT_NONE=y", when the target memcg
      doesn't have any reclaimable memory.
      
      It can be easily reproduced as below:
      
        watchdog: BUG: soft lockup - CPU#0 stuck for 111s![memcg_test:2204]
        CPU: 0 PID: 2204 Comm: memcg_test Not tainted 5.9.0-rc2+ #12
        Call Trace:
          shrink_lruvec+0x49f/0x640
          shrink_node+0x2a6/0x6f0
          do_try_to_free_pages+0xe9/0x3e0
          try_to_free_mem_cgroup_pages+0xef/0x1f0
          try_charge+0x2c1/0x750
          mem_cgroup_charge+0xd7/0x240
          __add_to_page_cache_locked+0x2fd/0x370
          add_to_page_cache_lru+0x4a/0xc0
          pagecache_get_page+0x10b/0x2f0
          filemap_fault+0x661/0xad0
          ext4_filemap_fault+0x2c/0x40
          __do_fault+0x4d/0xf9
          handle_mm_fault+0x1080/0x1790
      
      It only happens on our 1-vcpu instances, because there's no chance for
      oom reaper to run to reclaim the to-be-killed process.
      
      Add a cond_resched() at the upper shrink_node_memcgs() to solve this
      issue, this will mean that we will get a scheduling point for each memcg
      in the reclaimed hierarchy without any dependency on the reclaimable
      memory in that memcg thus making it more predictable.
      Suggested-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarXunlei Pang <xlpang@linux.alibaba.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Acked-by: default avatarChris Down <chris@chrisdown.name>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Link: http://lkml.kernel.org/r/1598495549-67324-1-git-send-email-xlpang@linux.alibaba.comSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      e3336cab
    • Michal Hocko's avatar
      memcg: fix use-after-free in uncharge_batch · f1796544
      Michal Hocko authored
      syzbot has reported an use-after-free in the uncharge_batch path
      
        BUG: KASAN: use-after-free in instrument_atomic_write include/linux/instrumented.h:71 [inline]
        BUG: KASAN: use-after-free in atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
        BUG: KASAN: use-after-free in atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
        BUG: KASAN: use-after-free in page_counter_cancel mm/page_counter.c:54 [inline]
        BUG: KASAN: use-after-free in page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
        Write of size 8 at addr ffff8880371c0148 by task syz-executor.0/9304
      
        CPU: 0 PID: 9304 Comm: syz-executor.0 Not tainted 5.8.0-syzkaller #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x1f0/0x31e lib/dump_stack.c:118
          print_address_description+0x66/0x620 mm/kasan/report.c:383
          __kasan_report mm/kasan/report.c:513 [inline]
          kasan_report+0x132/0x1d0 mm/kasan/report.c:530
          check_memory_region_inline mm/kasan/generic.c:183 [inline]
          check_memory_region+0x2b5/0x2f0 mm/kasan/generic.c:192
          instrument_atomic_write include/linux/instrumented.h:71 [inline]
          atomic64_sub_return include/asm-generic/atomic-instrumented.h:970 [inline]
          atomic_long_sub_return include/asm-generic/atomic-long.h:113 [inline]
          page_counter_cancel mm/page_counter.c:54 [inline]
          page_counter_uncharge+0x3d/0xc0 mm/page_counter.c:155
          uncharge_batch+0x6c/0x350 mm/memcontrol.c:6764
          uncharge_page+0x115/0x430 mm/memcontrol.c:6796
          uncharge_list mm/memcontrol.c:6835 [inline]
          mem_cgroup_uncharge_list+0x70/0xe0 mm/memcontrol.c:6877
          release_pages+0x13a2/0x1550 mm/swap.c:911
          tlb_batch_pages_flush mm/mmu_gather.c:49 [inline]
          tlb_flush_mmu_free mm/mmu_gather.c:242 [inline]
          tlb_flush_mmu+0x780/0x910 mm/mmu_gather.c:249
          tlb_finish_mmu+0xcb/0x200 mm/mmu_gather.c:328
          exit_mmap+0x296/0x550 mm/mmap.c:3185
          __mmput+0x113/0x370 kernel/fork.c:1076
          exit_mm+0x4cd/0x550 kernel/exit.c:483
          do_exit+0x576/0x1f20 kernel/exit.c:793
          do_group_exit+0x161/0x2d0 kernel/exit.c:903
          get_signal+0x139b/0x1d30 kernel/signal.c:2743
          arch_do_signal+0x33/0x610 arch/x86/kernel/signal.c:811
          exit_to_user_mode_loop kernel/entry/common.c:135 [inline]
          exit_to_user_mode_prepare+0x8d/0x1b0 kernel/entry/common.c:166
          syscall_exit_to_user_mode+0x5e/0x1a0 kernel/entry/common.c:241
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Commit 1a3e1f40 ("mm: memcontrol: decouple reference counting from
      page accounting") reworked the memcg lifetime to be bound the the struct
      page rather than charges.  It also removed the css_put_many from
      uncharge_batch and that is causing the above splat.
      
      uncharge_batch() is supposed to uncharge accumulated charges for all
      pages freed from the same memcg.  The queuing is done by uncharge_page
      which however drops the memcg reference after it adds charges to the
      batch.  If the current page happens to be the last one holding the
      reference for its memcg then the memcg is OK to go and the next page to
      be freed will trigger batched uncharge which needs to access the memcg
      which is gone already.
      
      Fix the issue by taking a reference for the memcg in the current batch.
      
      Fixes: 1a3e1f40 ("mm: memcontrol: decouple reference counting from page accounting")
      Reported-by: syzbot+b305848212deec86eabe@syzkaller.appspotmail.com
      Reported-by: syzbot+b5ea6fb6f139c8b9482b@syzkaller.appspotmail.com
      Signed-off-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Reviewed-by: default avatarShakeel Butt <shakeelb@google.com>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: https://lkml.kernel.org/r/20200820090341.GC5033@dhcp22.suse.czSigned-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f1796544
  2. 04 Sep, 2020 5 commits
    • Linus Torvalds's avatar
      Merge tag 'perf-tools-fixes-for-v5.9-2020-09-03' of... · 59126901
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.9-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull more perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Use uintptr_t when casting numbers to pointers
      
       - Keep output expected by 3rd parties: Turn off summary for interval
         mode by default.
      
       - BPF is in kernel space, make sure do_validate_kcore_modules() knows
         about that.
      
       - Explicitly call out event modifiers in the documentation.
      
       - Fix jevents() allocation of space for regular expressions.
      
       - Address libtraceevent build warnings on 32-bit arches.
      
       - Fix checking of functions returns using ERR_PTR() in 'perf bench'.
      
      * tag 'perf-tools-fixes-for-v5.9-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        perf tools: Add bpf image check to __map__is_kmodule
        perf record/stat: Explicitly call out event modifiers in the documentation
        perf bench: The do_run_multi_threaded() function must use IS_ERR(perf_session__new())
        perf stat: Turn off summary for interval mode by default
        libtraceevent: Fix build warning on 32-bit arches
        perf jevents: Fix suspicious code in fixregex()
        perf parse-events: Use uintptr_t when casting numbers to pointers
      59126901
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 3e8d3bdc
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Use netif_rx_ni() when necessary in batman-adv stack, from Jussi
          Kivilinna.
      
       2) Fix loss of RTT samples in rxrpc, from David Howells.
      
       3) Memory leak in hns_nic_dev_probe(), from Dignhao Liu.
      
       4) ravb module cannot be unloaded, fix from Yuusuke Ashizuka.
      
       5) We disable BH for too lokng in sctp_get_port_local(), add a
          cond_resched() here as well, from Xin Long.
      
       6) Fix memory leak in st95hf_in_send_cmd, from Dinghao Liu.
      
       7) Out of bound access in bpf_raw_tp_link_fill_link_info(), from
          Yonghong Song.
      
       8) Missing of_node_put() in mt7530 DSA driver, from Sumera
          Priyadarsini.
      
       9) Fix crash in bnxt_fw_reset_task(), from Michael Chan.
      
      10) Fix geneve tunnel checksumming bug in hns3, from Yi Li.
      
      11) Memory leak in rxkad_verify_response, from Dinghao Liu.
      
      12) In tipc, don't use smp_processor_id() in preemptible context. From
          Tuong Lien.
      
      13) Fix signedness issue in mlx4 memory allocation, from Shung-Hsi Yu.
      
      14) Missing clk_disable_prepare() in gemini driver, from Dan Carpenter.
      
      15) Fix ABI mismatch between driver and firmware in nfp, from Louis
          Peens.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (110 commits)
        net/smc: fix sock refcounting in case of termination
        net/smc: reset sndbuf_desc if freed
        net/smc: set rx_off for SMCR explicitly
        net/smc: fix toleration of fake add_link messages
        tg3: Fix soft lockup when tg3_reset_task() fails.
        doc: net: dsa: Fix typo in config code sample
        net: dp83867: Fix WoL SecureOn password
        nfp: flower: fix ABI mismatch between driver and firmware
        tipc: fix shutdown() of connectionless socket
        ipv6: Fix sysctl max for fib_multipath_hash_policy
        drivers/net/wan/hdlc: Change the default of hard_header_len to 0
        net: gemini: Fix another missing clk_disable_unprepare() in probe
        net: bcmgenet: fix mask check in bcmgenet_validate_flow()
        amd-xgbe: Add support for new port mode
        net: usb: dm9601: Add USB ID of Keenetic Plus DSL
        vhost: fix typo in error message
        net: ethernet: mlx4: Fix memory allocation in mlx4_buddy_init()
        pktgen: fix error message with wrong function name
        net: ethernet: ti: am65-cpsw: fix rmii 100Mbit link mode
        cxgb4: fix thermal zone device registration
        ...
      3e8d3bdc
    • Linus Torvalds's avatar
      Merge branch 'gate-page-refcount' (patches from Dave Hansen) · 8381979d
      Linus Torvalds authored
      Merge gate page refcount fix from Dave Hansen:
       "During the conversion over to pin_user_pages(), gate pages were missed.
      
        The fix is pretty simple, and is accompanied by a new test from Andy
        which probably would have caught this earlier"
      
      * emailed patches from Dave Hansen <dave.hansen@linux.intel.com>:
        selftests/x86/test_vsyscall: Improve the process_vm_readv() test
        mm: fix pin vs. gup mismatch with gate pages
      8381979d
    • Andy Lutomirski's avatar
      selftests/x86/test_vsyscall: Improve the process_vm_readv() test · 8891adc6
      Andy Lutomirski authored
      The existing code accepted process_vm_readv() success or failure as long
      as it didn't return garbage.  This is too weak: if the vsyscall page is
      readable, then process_vm_readv() should succeed and, if the page is not
      readable, then it should fail.
      Signed-off-by: default avatarAndy Lutomirski <luto@kernel.org>
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8891adc6
    • Dave Hansen's avatar
      mm: fix pin vs. gup mismatch with gate pages · 9fa2dd94
      Dave Hansen authored
      Gate pages were missed when converting from get to pin_user_pages().
      This can lead to refcount imbalances.  This is reliably and quickly
      reproducible running the x86 selftests when vsyscall=emulate is enabled
      (the default).  Fix by using try_grab_page() with appropriate flags
      passed.
      
      The long story:
      
      Today, pin_user_pages() and get_user_pages() are similar interfaces for
      manipulating page reference counts.  However, "pins" use a "bias" value
      and manipulate the actual reference count by 1024 instead of 1 used by
      plain "gets".
      
      That means that pin_user_pages() must be matched with unpin_user_pages()
      and can't be mixed with a plain put_user_pages() or put_page().
      
      Enter gate pages, like the vsyscall page.  They are pages usually in the
      kernel image, but which are mapped to userspace.  Userspace is allowed
      access to them, including interfaces using get/pin_user_pages().  The
      refcount of these kernel pages is manipulated just like a normal user
      page on the get/pin side so that the put/unpin side can work the same
      for normal user pages or gate pages.
      
      get_gate_page() uses try_get_page() which only bumps the refcount by
      1, not 1024, even if called in the pin_user_pages() path.  If someone
      pins a gate page, this happens:
      
      	pin_user_pages()
      		get_gate_page()
      			try_get_page() // bump refcount +1
      	... some time later
      	unpin_user_pages()
      		page_ref_sub_and_test(page, 1024))
      
      ... and boom, we get a refcount off by 1023.  This is reliably and
      quickly reproducible running the x86 selftests when booted with
      vsyscall=emulate (the default).  The selftests use ptrace(), but I
      suspect anything using pin_user_pages() on gate pages could hit this.
      
      To fix it, simply use try_grab_page() instead of try_get_page(), and
      pass 'gup_flags' in so that FOLL_PIN can be respected.
      
      This bug traces back to the very beginning of the FOLL_PIN support in
      commit 3faa52c0 ("mm/gup: track FOLL_PIN pages"), which showed up in
      the 5.7 release.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Fixes: 3faa52c0 ("mm/gup: track FOLL_PIN pages")
      Reported-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Reviewed-by: default avatarJohn Hubbard <jhubbard@nvidia.com>
      Acked-by: default avatarAndy Lutomirski <luto@kernel.org>
      Cc: x86@kernel.org
      Cc: Jann Horn <jannh@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9fa2dd94
  3. 03 Sep, 2020 17 commits
    • David S. Miller's avatar
      Merge branch 'smc-fixes' · b61ac5bb
      David S. Miller authored
      Karsten Graul says:
      
      ====================
      net/smc: fixes 2020-09-03
      
      Please apply the following patch series for smc to netdev's net tree.
      
      Patch 1 fixes the toleration of older SMC implementations. Patch 2
      takes care of a problem that happens when SMCR is used after SMCD
      initialization failed. Patch 3 fixes a problem with freed send buffers,
      and patch 4 corrects refcounting when SMC terminates due to device
      removal.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b61ac5bb
    • Ursula Braun's avatar
      net/smc: fix sock refcounting in case of termination · 5fb8642a
      Ursula Braun authored
      When an ISM device is removed, all its linkgroups are terminated,
      i.e. all the corresponding connections are killed.
      Connection killing invokes smc_close_active_abort(), which decreases
      the sock refcount for certain states to simulate passive closing.
      And it cancels the close worker and has to give up the sock lock for
      this timeframe. This opens the door for a passive close worker or a
      socket close to run in between. In this case smc_close_active_abort() and
      passive close worker resp. smc_release() might do a sock_put for passive
      closing. This causes:
      
      [ 1323.315943] refcount_t: underflow; use-after-free.
      [ 1323.316055] WARNING: CPU: 3 PID: 54469 at lib/refcount.c:28 refcount_warn_saturate+0xe8/0x130
      [ 1323.316069] Kernel panic - not syncing: panic_on_warn set ...
      [ 1323.316084] CPU: 3 PID: 54469 Comm: uperf Not tainted 5.9.0-20200826.rc2.git0.46328853ed20.300.fc32.s390x+debug #1
      [ 1323.316096] Hardware name: IBM 2964 NC9 702 (z/VM 6.4.0)
      [ 1323.316108] Call Trace:
      [ 1323.316125]  [<00000000c0d4aae8>] show_stack+0x90/0xf8
      [ 1323.316143]  [<00000000c15989b0>] dump_stack+0xa8/0xe8
      [ 1323.316158]  [<00000000c0d8344e>] panic+0x11e/0x288
      [ 1323.316173]  [<00000000c0d83144>] __warn+0xac/0x158
      [ 1323.316187]  [<00000000c1597a7a>] report_bug+0xb2/0x130
      [ 1323.316201]  [<00000000c0d36424>] monitor_event_exception+0x44/0xc0
      [ 1323.316219]  [<00000000c195c716>] pgm_check_handler+0x1da/0x238
      [ 1323.316234]  [<00000000c151844c>] refcount_warn_saturate+0xec/0x130
      [ 1323.316280] ([<00000000c1518448>] refcount_warn_saturate+0xe8/0x130)
      [ 1323.316310]  [<000003ff801f2e2a>] smc_release+0x192/0x1c8 [smc]
      [ 1323.316323]  [<00000000c169f1fa>] __sock_release+0x5a/0xe0
      [ 1323.316334]  [<00000000c169f2ac>] sock_close+0x2c/0x40
      [ 1323.316350]  [<00000000c1086de0>] __fput+0xb8/0x278
      [ 1323.316362]  [<00000000c0db1e0e>] task_work_run+0x76/0xb8
      [ 1323.316393]  [<00000000c0d8ab84>] do_exit+0x26c/0x520
      [ 1323.316408]  [<00000000c0d8af08>] do_group_exit+0x48/0xc0
      [ 1323.316421]  [<00000000c0d8afa8>] __s390x_sys_exit_group+0x28/0x38
      [ 1323.316433]  [<00000000c195c32c>] system_call+0xe0/0x2b4
      [ 1323.316446] 1 lock held by uperf/54469:
      [ 1323.316456]  #0: 0000000044125e60 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x44/0xe0
      
      The patch rechecks sock state in smc_close_active_abort() after
      smc_close_cancel_work() to avoid duplicate decrease of sock
      refcount for the same purpose.
      
      Fixes: 611b63a1 ("net/smc: cancel tx worker in case of socket aborts")
      Reviewed-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5fb8642a
    • Ursula Braun's avatar
      net/smc: reset sndbuf_desc if freed · 1d8df41d
      Ursula Braun authored
      When an SMC connection is created, and there is a problem to
      create an RMB or DMB, the previously created send buffer is
      thrown away as well including buffer descriptor freeing.
      Make sure the connection no longer references the freed
      buffer descriptor, otherwise bugs like this are possible:
      
      [71556.835148] =============================================================================
      [71556.835168] BUG kmalloc-128 (Tainted: G    B      OE    ): Poison overwritten
      [71556.835172] -----------------------------------------------------------------------------
      
      [71556.835179] INFO: 0x00000000d20894be-0x00000000aaef63e9 @offset=2724. First byte 0x0 instead of 0x6b
      [71556.835215] INFO: Allocated in __smc_buf_create+0x184/0x578 [smc] age=0 cpu=5 pid=46726
      [71556.835234]     ___slab_alloc+0x5a4/0x690
      [71556.835239]     __slab_alloc.constprop.0+0x70/0xb0
      [71556.835243]     kmem_cache_alloc_trace+0x38e/0x3f8
      [71556.835250]     __smc_buf_create+0x184/0x578 [smc]
      [71556.835257]     smc_buf_create+0x2e/0xe8 [smc]
      [71556.835264]     smc_listen_work+0x516/0x6a0 [smc]
      [71556.835275]     process_one_work+0x280/0x478
      [71556.835280]     worker_thread+0x66/0x368
      [71556.835287]     kthread+0x17a/0x1a0
      [71556.835294]     ret_from_fork+0x28/0x2c
      [71556.835301] INFO: Freed in smc_buf_create+0xd8/0xe8 [smc] age=0 cpu=5 pid=46726
      [71556.835307]     __slab_free+0x246/0x560
      [71556.835311]     kfree+0x398/0x3f8
      [71556.835318]     smc_buf_create+0xd8/0xe8 [smc]
      [71556.835324]     smc_listen_work+0x516/0x6a0 [smc]
      [71556.835328]     process_one_work+0x280/0x478
      [71556.835332]     worker_thread+0x66/0x368
      [71556.835337]     kthread+0x17a/0x1a0
      [71556.835344]     ret_from_fork+0x28/0x2c
      [71556.835348] INFO: Slab 0x00000000a0744551 objects=51 used=51 fp=0x0000000000000000 flags=0x1ffff00000010200
      [71556.835352] INFO: Object 0x00000000563480a1 @offset=2688 fp=0x00000000289567b2
      
      [71556.835359] Redzone 000000006783cde2: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835363] Redzone 00000000e35b876e: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835367] Redzone 0000000023074562: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835372] Redzone 00000000b9564b8c: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835376] Redzone 00000000810c6362: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835380] Redzone 0000000065ef52c3: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835384] Redzone 00000000c5dd6984: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835388] Redzone 000000004c480f8f: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
      [71556.835392] Object 00000000563480a1: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835397] Object 000000009c479d06: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835401] Object 000000006e1dce92: 6b 6b 6b 6b 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b  kkkk....kkkkkkkk
      [71556.835405] Object 00000000227f7cf8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835410] Object 000000009a701215: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835414] Object 000000003731ce76: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835418] Object 00000000f7085967: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
      [71556.835422] Object 0000000007f99927: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
      [71556.835427] Redzone 00000000579c4913: bb bb bb bb bb bb bb bb                          ........
      [71556.835431] Padding 00000000305aef82: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [71556.835435] Padding 00000000b1cdd722: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [71556.835438] Padding 00000000c7568199: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [71556.835442] Padding 00000000fad4c4d4: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZZZZZZZZZ
      [71556.835451] CPU: 0 PID: 47939 Comm: kworker/0:15 Tainted: G    B      OE     5.9.0-rc1uschi+ #54
      [71556.835456] Hardware name: IBM 3906 M03 703 (LPAR)
      [71556.835464] Workqueue: events smc_listen_work [smc]
      [71556.835470] Call Trace:
      [71556.835478]  [<00000000d5eaeb10>] show_stack+0x90/0xf8
      [71556.835493]  [<00000000d66fc0f8>] dump_stack+0xa8/0xe8
      [71556.835499]  [<00000000d61a511c>] check_bytes_and_report+0x104/0x130
      [71556.835504]  [<00000000d61a57b2>] check_object+0x26a/0x2e0
      [71556.835509]  [<00000000d61a59bc>] alloc_debug_processing+0x194/0x238
      [71556.835514]  [<00000000d61a8c14>] ___slab_alloc+0x5a4/0x690
      [71556.835519]  [<00000000d61a9170>] __slab_alloc.constprop.0+0x70/0xb0
      [71556.835524]  [<00000000d61aaf66>] kmem_cache_alloc_trace+0x38e/0x3f8
      [71556.835530]  [<000003ff80549bbc>] __smc_buf_create+0x184/0x578 [smc]
      [71556.835538]  [<000003ff8054a396>] smc_buf_create+0x2e/0xe8 [smc]
      [71556.835545]  [<000003ff80540c16>] smc_listen_work+0x516/0x6a0 [smc]
      [71556.835549]  [<00000000d5f0f448>] process_one_work+0x280/0x478
      [71556.835554]  [<00000000d5f0f6a6>] worker_thread+0x66/0x368
      [71556.835559]  [<00000000d5f18692>] kthread+0x17a/0x1a0
      [71556.835563]  [<00000000d6abf3b8>] ret_from_fork+0x28/0x2c
      [71556.835569] INFO: lockdep is turned off.
      [71556.835573] FIX kmalloc-128: Restoring 0x00000000d20894be-0x00000000aaef63e9=0x6b
      
      [71556.835577] FIX kmalloc-128: Marking all objects used
      
      Fixes: fd7f3a74 ("net/smc: remove freed buffer from list")
      Reviewed-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      1d8df41d
    • Ursula Braun's avatar
      net/smc: set rx_off for SMCR explicitly · 2d2bfeb8
      Ursula Braun authored
      SMC tries to make use of SMCD first. If a problem shows up,
      it tries to switch to SMCR. If the SMCD initializing problem shows
      up after the SMCD connection has already been initialized, field
      rx_off keeps the wrong SMCD value for SMCR, which results in corrupted
      data at the receiver.
      This patch adds an explicit (re-)setting of field rx_off to zero if the
      connection uses SMCR.
      
      Fixes: be244f28 ("net/smc: add SMC-D support in data transfer")
      Reviewed-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2d2bfeb8
    • Karsten Graul's avatar
      net/smc: fix toleration of fake add_link messages · fffe83c8
      Karsten Graul authored
      Older SMCR implementations had no link failover support and used one
      link only. Because the handshake protocol requires to try the
      establishment of a second link the old code sent a fake add_link message
      and declined any server response afterwards.
      The current code supports multiple links and inspects the received fake
      add_link message more closely. To tolerate the fake add_link messages
      smc_llc_is_local_add_link() needs an improved check of the message to
      be able to separate between locally enqueued and fake add_link messages.
      And smc_llc_cli_add_link() needs to check if the provided qp_mtu size is
      invalid and reject the add_link request in that case.
      
      Fixes: c48254fa ("net/smc: move add link processing for new device into llc layer")
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: default avatarKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      fffe83c8
    • Michael Chan's avatar
      tg3: Fix soft lockup when tg3_reset_task() fails. · 55669934
      Michael Chan authored
      If tg3_reset_task() fails, the device state is left in an inconsistent
      state with IFF_RUNNING still set but NAPI state not enabled.  A
      subsequent operation, such as ifdown or AER error can cause it to
      soft lock up when it tries to disable NAPI state.
      
      Fix it by bringing down the device to !IFF_RUNNING state when
      tg3_reset_task() fails.  tg3_reset_task() running from workqueue
      will now call tg3_close() when the reset fails.  We need to
      modify tg3_reset_task_cancel() slightly to avoid tg3_close()
      calling cancel_work_sync() to cancel tg3_reset_task().  Otherwise
      cancel_work_sync() will wait forever for tg3_reset_task() to
      finish.
      Reported-by: default avatarDavid Christensen <drc@linux.vnet.ibm.com>
      Reported-by: default avatarBaptiste Covolato <baptiste@arista.com>
      Fixes: db219973 ("tg3: Schedule at most one tg3_reset_task run")
      Signed-off-by: default avatarMichael Chan <michael.chan@broadcom.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      55669934
    • Jiri Olsa's avatar
      perf tools: Add bpf image check to __map__is_kmodule · 830fadfd
      Jiri Olsa authored
      When validating kcore modules the do_validate_kcore_modules function
      checks on every kernel module dso against modules record. The
      __map__is_kmodule check is used to get only kernel module dso objects
      through.
      
      Currently the bpf images are slipping through the check and making the
      validation to fail, so report falls back from kcore usage to kallsyms.
      
      Adding __map__is_bpf_image check for bpf image and adding it to
      __map__is_kmodule check.
      
      Fixes: 3c29d448 ("perf annotate: Add basic support for bpf_image")
      Signed-off-by: default avatarJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Michael Petlan <mpetlan@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20200826213017.818788-1-jolsa@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      830fadfd
    • Kim Phillips's avatar
      perf record/stat: Explicitly call out event modifiers in the documentation · e48a73a3
      Kim Phillips authored
      Event modifiers are not mentioned in the perf record or perf stat
      manpages.  Add them to orient new users more effectively by pointing
      them to the perf list manpage for details.
      
      Fixes: 2055fdaf ("perf list: Document precise event sampling for AMD IBS")
      Signed-off-by: default avatarKim Phillips <kim.phillips@amd.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Clarke <pc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tony Jones <tonyj@suse.de>
      Cc: stable@vger.kernel.org
      Link: http://lore.kernel.org/lkml/20200901215853.276234-1-kim.phillips@amd.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e48a73a3
    • YueHaibing's avatar
      perf bench: The do_run_multi_threaded() function must use IS_ERR(perf_session__new()) · e4d71f79
      YueHaibing authored
      In case of error, the function perf_session__new() returns ERR_PTR() and
      never returns NULL. The NULL test in the return value check should be
      replaced with IS_ERR()
      
      Committer notes:
      
      This wasn't compiling due to an extraneous '{' not matched by a '}', fix
      it.
      
      Fixes: 13edc237 ("perf bench: Add a multi-threaded synthesize benchmark")
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200902140526.26916-1-yuehaibing@huawei.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e4d71f79
    • Jin Yao's avatar
      perf stat: Turn off summary for interval mode by default · ee6a9614
      Jin Yao authored
      There's a risk that outputting interval mode summaries by default breaks
      CSV consumers. It already broke pmu-tools/toplev.
      
      So now we turn off the summary by default but we create a new option
      '--summary' to enable the summary. This is active even when not using
      CSV mode.
      
      Before:
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2
        #           time             counts unit events
             1.000265904           8,005.73 msec cpu-clock                 #    8.006 CPUs utilized
             1.000265904                601      context-switches          #    0.075 K/sec
             1.000265904                 10      cpu-migrations            #    0.001 K/sec
             1.000265904                  0      page-faults               #    0.000 K/sec
             1.000265904         66,746,521      cycles                    #    0.008 GHz
             1.000265904         71,874,398      instructions              #    1.08  insn per cycle
             1.000265904         13,356,781      branches                  #    1.668 M/sec
             1.000265904            298,756      branch-misses             #    2.24% of all branches
             2.001857667           8,012.52 msec cpu-clock                 #    8.013 CPUs utilized
             2.001857667                164      context-switches          #    0.020 K/sec
             2.001857667                 10      cpu-migrations            #    0.001 K/sec
             2.001857667                  2      page-faults               #    0.000 K/sec
             2.001857667          5,822,188      cycles                    #    0.001 GHz
             2.001857667          2,186,170      instructions              #    0.38  insn per cycle
             2.001857667            442,378      branches                  #    0.055 M/sec
             2.001857667             44,750      branch-misses             #   10.12% of all branches
      
         Performance counter stats for 'system wide':
      
                 16,018.25 msec cpu-clock                 #    7.993 CPUs utilized
                       765      context-switches          #    0.048 K/sec
                        20      cpu-migrations            #    0.001 K/sec
                         2      page-faults               #    0.000 K/sec
                72,568,709      cycles                    #    0.005 GHz
                74,060,568      instructions              #    1.02  insn per cycle
                13,799,159      branches                  #    0.861 M/sec
                   343,506      branch-misses             #    2.49% of all branches
      
               2.004118489 seconds time elapsed
      
      After:
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2
        #           time             counts unit events
             1.001336393           8,013.28 msec cpu-clock                 #    8.013 CPUs utilized
             1.001336393                 82      context-switches          #    0.010 K/sec
             1.001336393                  8      cpu-migrations            #    0.001 K/sec
             1.001336393                  0      page-faults               #    0.000 K/sec
             1.001336393          4,199,121      cycles                    #    0.001 GHz
             1.001336393          1,373,991      instructions              #    0.33  insn per cycle
             1.001336393            270,681      branches                  #    0.034 M/sec
             1.001336393             31,659      branch-misses             #   11.70% of all branches
             2.003905006           8,020.52 msec cpu-clock                 #    8.021 CPUs utilized
             2.003905006                184      context-switches          #    0.023 K/sec
             2.003905006                  8      cpu-migrations            #    0.001 K/sec
             2.003905006                  2      page-faults               #    0.000 K/sec
             2.003905006          5,446,190      cycles                    #    0.001 GHz
             2.003905006          2,312,547      instructions              #    0.42  insn per cycle
             2.003905006            451,691      branches                  #    0.056 M/sec
             2.003905006             37,925      branch-misses             #    8.40% of all branches
      
        root@kbl-ppc:~# perf stat -I1000 --interval-count 2 --summary
        #           time             counts unit events
             1.001313128           8,013.20 msec cpu-clock                 #    8.013 CPUs utilized
             1.001313128                 83      context-switches          #    0.010 K/sec
             1.001313128                  8      cpu-migrations            #    0.001 K/sec
             1.001313128                  0      page-faults               #    0.000 K/sec
             1.001313128          4,470,950      cycles                    #    0.001 GHz
             1.001313128          1,440,045      instructions              #    0.32  insn per cycle
             1.001313128            283,222      branches                  #    0.035 M/sec
             1.001313128             33,576      branch-misses             #   11.86% of all branches
             2.003857385           8,020.34 msec cpu-clock                 #    8.020 CPUs utilized
             2.003857385                154      context-switches          #    0.019 K/sec
             2.003857385                  8      cpu-migrations            #    0.001 K/sec
             2.003857385                  2      page-faults               #    0.000 K/sec
             2.003857385          4,515,676      cycles                    #    0.001 GHz
             2.003857385          2,180,449      instructions              #    0.48  insn per cycle
             2.003857385            435,254      branches                  #    0.054 M/sec
             2.003857385             31,179      branch-misses             #    7.16% of all branches
      
         Performance counter stats for 'system wide':
      
                 16,033.53 msec cpu-clock                 #    7.992 CPUs utilized
                       237      context-switches          #    0.015 K/sec
                        16      cpu-migrations            #    0.001 K/sec
                         2      page-faults               #    0.000 K/sec
                 8,986,626      cycles                    #    0.001 GHz
                 3,620,494      instructions              #    0.40  insn per cycle
                   718,476      branches                  #    0.045 M/sec
                    64,755      branch-misses             #    9.01% of all branches
      
               2.006124542 seconds time elapsed
      
      Fixes: c7e5b328 ("perf stat: Report summary for interval mode")
      Signed-off-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Tested-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/20200903010113.32232-1-yao.jin@linux.intel.comSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      ee6a9614
    • Tzvetomir Stoyanov (VMware)'s avatar
      libtraceevent: Fix build warning on 32-bit arches · 10a6f5c3
      Tzvetomir Stoyanov (VMware) authored
      Fixed a compilation warning for casting to pointer from integer of
      different size on 32-bit platforms.
      Reported-by: default avatarArnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
      Signed-off-by: default avatarTzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: linux-trace-devel@vger.kernel.org
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      10a6f5c3
    • Namhyung Kim's avatar
      perf jevents: Fix suspicious code in fixregex() · e62458e3
      Namhyung Kim authored
      The new string should have enough space for the original string and the
      back slashes IMHO.
      
      Fixes: fbc2844e ("perf vendor events: Use more flexible pattern matching for CPU identification for mapfile.csv")
      Signed-off-by: default avatarNamhyung Kim <namhyung@kernel.org>
      Reviewed-by: default avatarIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kajol Jain <kjain@linux.ibm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: William Cohen <wcohen@redhat.com>
      Link: http://lore.kernel.org/lkml/20200903152510.489233-1-namhyung@kernel.orgSigned-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      e62458e3
    • Arnaldo Carvalho de Melo's avatar
      perf parse-events: Use uintptr_t when casting numbers to pointers · 0823f768
      Arnaldo Carvalho de Melo authored
      To address these errors found when cross building from x86_64 to MIPS
      little endian 32-bit:
      
          CC       /tmp/build/perf/util/parse-events-bison.o
        util/parse-events.y: In function 'parse_events_parse':
        util/parse-events.y:514:6: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
          514 |      (void *) $2, $6, $4);
              |      ^
        util/parse-events.y:531:7: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
          531 |       (void *) $2, NULL, $4)) {
              |       ^
        util/parse-events.y:547:6: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
          547 |      (void *) $2, $4, 0);
              |      ^
        util/parse-events.y:564:7: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
          564 |       (void *) $2, NULL, 0)) {
              |       ^
      
      Fixes: cabbf268 ("perf parse: Before yyabort-ing free components")
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yonghong Song <yhs@fb.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      0823f768
    • Paul Barker's avatar
      doc: net: dsa: Fix typo in config code sample · af0ae997
      Paul Barker authored
      In the "single port" example code for configuring a DSA switch without
      tagging support from userspace the command to bring up the "lan2" link
      was typo'd.
      Signed-off-by: default avatarPaul Barker <pbarker@konsulko.com>
      Acked-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      af0ae997
    • Linus Torvalds's avatar
      Merge tag 'fixes-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock · e28f0104
      Linus Torvalds authored
      Pull misc build failure fixes from Mike Rapoport:
       "Fix min_low_pfn/max_low_pfn build errors on ia64 and microblaze.
      
        Some configurations of ia64 and microblaze use min_low_pfn and
        max_low_pfn in pfn_valid(). This causes build failures for modules
        that use pfn_valid().
      
        The fix is to add EXPORT_SYMBOL() for these variables on ia64 and
        microblaze"
      
      * tag 'fixes-2020-09-03' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
        ia64: fix min_low_pfn/max_low_pfn build errors
        microblaze: fix min_low_pfn/max_low_pfn build errors
      e28f0104
    • Linus Torvalds's avatar
      Merge tag 'affs-for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 26acd8b0
      Linus Torvalds authored
      Pull affs fix from David Sterba:
       "One fix to make permissions work the same way as on AmigaOS"
      
      * tag 'affs-for-5.9-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        affs: fix basic permission bits to actually work
      26acd8b0
    • Linus Torvalds's avatar
      Merge tag 'media/v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 0fdf68c7
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
      
       - a compilation fix issue with ti-vpe on arm 32 bits
      
       - two Kconfig fixes for imx214 and max9286 drivers
      
       - a kernel information leak at v4l2-core on time32 compat ioctls
      
       - some fixes at rc core unbind logic
      
       - a fix at mceusb driver for it to not use GFP_ATOMIC
      
       - fixes at cedrus and vicodec drivers at the control handling logic
      
       - a fix at gpio-ir-tx to avoid disabling interruts on a spinlock
      
      * tag 'media/v5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: mceusb: Avoid GFP_ATOMIC where it is not needed
        media: gpio-ir-tx: spinlock is not needed to disable interrupts
        media: rc: do not access device via sysfs after rc_unregister_device()
        media: rc: uevent sysfs file races with rc_unregister_device()
        media: max9286: Depend on OF_GPIO
        media: i2c: imx214: select V4L2_FWNODE
        media: cedrus: Add missing v4l2_ctrl_request_hdl_put()
        media: vicodec: add missing v4l2_ctrl_request_hdl_put()
        media: media/v4l2-core: Fix kernel-infoleak in video_put_user()
        media: ti-vpe: cal: Fix compilation on 32-bit ARM
      0fdf68c7
  4. 02 Sep, 2020 4 commits