1. 18 Jul, 2022 11 commits
    • Revert "ocfs2: mount shared volume without ha stack" · c80af0c2
      Junxiao Bi authored
      This reverts commit 912f655d.
      
      This commit introduced a regression that can cause mount to hang.  The
      changes in __ocfs2_find_empty_slot allow any node with a non-zero node
      number to grab the slot that was already taken by node 0, so node 1
      will access the same journal as node 0.  When it tries to grab the
      journal cluster lock, it will hang because the lock was already acquired
      by node 0.  This is very easy to reproduce: in one cluster, mount node 0
      first, then node 1, and you will see the following call trace from node 1.
      
      [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
      [13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
      [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
      [13148.749354] Call Trace:
      [13148.750718]  <TASK>
      [13148.752019]  ? usleep_range+0x90/0x89
      [13148.753882]  __schedule+0x210/0x567
      [13148.755684]  schedule+0x44/0xa8
      [13148.757270]  schedule_timeout+0x106/0x13c
      [13148.759273]  ? __prepare_to_swait+0x53/0x78
      [13148.761218]  __wait_for_common+0xae/0x163
      [13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
      [13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
      [13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
      [13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
      [13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
      [13148.775401]  ? iput+0x69/0xba
      [13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
      [13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
      [13148.781756]  mount_bdev+0x190/0x1b7
      [13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
      [13148.785634]  legacy_get_tree+0x27/0x48
      [13148.787466]  vfs_get_tree+0x25/0xd0
      [13148.789270]  do_new_mount+0x18c/0x2d9
      [13148.791046]  __x64_sys_mount+0x10e/0x142
      [13148.792911]  do_syscall_64+0x3b/0x89
      [13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
      [13148.797051] RIP: 0033:0x7f2309f6e26e
      [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
      [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
      [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
      [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
      [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
      [13148.816564]  </TASK>
      
      To fix it, we could just fix __ocfs2_find_empty_slot.  But the original
      commit introduced a feature to mount ocfs2 locally even when it is
      cluster based.  That is very dangerous and can easily cause serious data
      corruption, because there is no way to stop other nodes from mounting
      the fs and corrupting it.  Setting up ha or another cluster-aware stack
      is just the cost that we have to pay to avoid corruption; otherwise we
      would have to do it in the kernel.
      
      Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com
      Fixes: 912f655d ("ocfs2: mount shared volume without ha stack")
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <heming.zhao@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte · da9a298f
      Miaohe Lin authored
      When alloc_huge_page fails, *pagep is set to NULL without put_page first.
      So the hugepage indicated by *pagep is leaked.
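
The leak pattern can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_page`, `model_put_page()` and `retry_alloc()` are hypothetical stand-ins, not the kernel's structures or the actual patch. The point it shows is that the reference held through `*pagep` is dropped before the pointer is cleared.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted page; not the kernel's struct page. */
struct model_page {
        int refcount;
};

/* Drop one reference (models what put_page() does to the count). */
static void model_put_page(struct model_page *page)
{
        page->refcount--;
}

/*
 * Models the retry path: *pagep holds a temporary page from an earlier
 * attempt. If the new allocation fails, release that page before
 * clearing the pointer, otherwise its reference can never be dropped.
 * alloc_ok is an assumption used here to simulate allocation success.
 */
static int retry_alloc(struct model_page **pagep, int alloc_ok)
{
        if (!alloc_ok) {
                model_put_page(*pagep);   /* the step the fix adds */
                *pagep = NULL;
                return -1;
        }
        return 0;
}
```

Without the `model_put_page()` call, the count in this model would stay at 1 after the pointer is cleared, which corresponds to the leaked hugepage.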
      
      Link: https://lkml.kernel.org/r/20220709092629.54291-1-linmiaohe@huawei.com
      Fixes: 8cc5fcbb ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • fs: sendfile handles O_NONBLOCK of out_fd · bdeb77bc
      Andrei Vagin authored
      sendfile has to return EAGAIN if out_fd is nonblocking and the write into
      it would block.
      
      Here is a small reproducer for the problem:
      
      #define _GNU_SOURCE /* See feature_test_macros(7) */
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <errno.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <sys/sendfile.h>
      
      
      #define FILE_SIZE (1UL << 30)
      int main(int argc, char **argv) {
              int p[2], fd;
      
              if (pipe2(p, O_NONBLOCK))
                      return 1;
      
              fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
              if (fd < 0)
                      return 1;
              ftruncate(fd, FILE_SIZE);
      
              if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
                      fprintf(stderr, "FAIL\n");
              }
              if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
                      fprintf(stderr, "FAIL\n");
              }
              return 0;
      }
      
      It worked before b964bf53, it is stuck after b964bf53, and it
      works again with this fix.
      
      This regression occurred because do_splice_direct() calls pipe_write
      that handles O_NONBLOCK.  Here is a trace log from the reproducer:
      
       1)               |  __x64_sys_sendfile64() {
       1)               |    do_sendfile() {
       1)               |      __fdget()
       1)               |      rw_verify_area()
       1)               |      __fdget()
       1)               |      rw_verify_area()
       1)               |      do_splice_direct() {
       1)               |        rw_verify_area()
       1)               |        splice_direct_to_actor() {
       1)               |          do_splice_to() {
       1)               |            rw_verify_area()
       1)               |            generic_file_splice_read()
       1) + 74.153 us   |          }
       1)               |          direct_splice_actor() {
       1)               |            iter_file_splice_write() {
       1)               |              __kmalloc()
       1)   0.148 us    |              pipe_lock();
       1)   0.153 us    |              splice_from_pipe_next.part.0();
       1)   0.162 us    |              page_cache_pipe_buf_confirm();
      ... 16 times
       1)   0.159 us    |              page_cache_pipe_buf_confirm();
       1)               |              vfs_iter_write() {
       1)               |                do_iter_write() {
       1)               |                  rw_verify_area()
       1)               |                  do_iter_readv_writev() {
       1)               |                    pipe_write() {
       1)               |                      mutex_lock()
       1)   0.153 us    |                      mutex_unlock();
       1)   1.368 us    |                    }
       1)   1.686 us    |                  }
       1)   5.798 us    |                }
       1)   6.084 us    |              }
       1)   0.174 us    |              kfree();
       1)   0.152 us    |              pipe_unlock();
       1) + 14.461 us   |            }
       1) + 14.783 us   |          }
       1)   0.164 us    |          page_cache_pipe_buf_release();
      ... 16 times
       1)   0.161 us    |          page_cache_pipe_buf_release();
       1)               |          touch_atime()
       1) + 95.854 us   |        }
       1) + 99.784 us   |      }
       1) ! 107.393 us  |    }
       1) ! 107.699 us  |  }
      
      Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com
      Fixes: b964bf53 ("teach sendfile(2) to handle send-to-pipe directly")
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ntfs: fix use-after-free in ntfs_ucsncmp() · 38c9c22a
      ChenXiaoSong authored
      Syzkaller reported a use-after-free bug as follows:
      
      ==================================================================
      BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130
      Read of size 2 at addr ffff8880751acee8 by task a.out/879
      
      CPU: 7 PID: 879 Comm: a.out Not tainted 5.19.0-rc4-next-20220630-00001-gcc5218c8bd2c-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x1c0/0x2b0
       print_address_description.constprop.0.cold+0xd4/0x484
       print_report.cold+0x55/0x232
       kasan_report+0xbf/0xf0
       ntfs_ucsncmp+0x123/0x130
       ntfs_are_names_equal.cold+0x2b/0x41
       ntfs_attr_find+0x43b/0xb90
       ntfs_attr_lookup+0x16d/0x1e0
       ntfs_read_locked_attr_inode+0x4aa/0x2360
       ntfs_attr_iget+0x1af/0x220
       ntfs_read_locked_inode+0x246c/0x5120
       ntfs_iget+0x132/0x180
       load_system_files+0x1cc6/0x3480
       ntfs_fill_super+0xa66/0x1cf0
       mount_bdev+0x38d/0x460
       legacy_get_tree+0x10d/0x220
       vfs_get_tree+0x93/0x300
       do_new_mount+0x2da/0x6d0
       path_mount+0x496/0x19d0
       __x64_sys_mount+0x284/0x300
       do_syscall_64+0x3b/0xc0
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f3f2118d9ea
      Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffc269deac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3f2118d9ea
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffc269dec00
      RBP: 00007ffc269dec80 R08: 00007ffc269deb00 R09: 00007ffc269dec44
      R10: 0000000000000000 R11: 0000000000000202 R12: 000055f81ab1d220
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      
      The buggy address belongs to the physical page:
      page:0000000085430378 refcount:1 mapcount:1 mapping:0000000000000000 index:0x555c6a81d pfn:0x751ac
      memcg:ffff888101f7e180
      anon flags: 0xfffffc00a0014(uptodate|lru|mappedtodisk|swapbacked|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 000fffffc00a0014 ffffea0001bf2988 ffffea0001de2448 ffff88801712e201
      raw: 0000000555c6a81d 0000000000000000 0000000100000000 ffff888101f7e180
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880751acd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8880751ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8880751ace80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                                                                ^
       ffff8880751acf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8880751acf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ==================================================================
      
      The reason is that struct ATTR_RECORD->name_offset is 6485, so the end
      address of the name string is out of bounds.

      Fix this by adding a sanity check on the end address of the attribute
      name string.
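
A bounds check of this kind might look roughly like the following. This is a minimal userspace sketch, not the patch itself: `attr_name_in_bounds()` and its parameters are hypothetical, and it assumes a 16-bit name offset, an 8-bit name length counted in 16-bit characters, and a record of known size.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel code: returns true only if an
 * attribute name of name_length 16-bit characters, starting name_offset
 * bytes into an attribute located at attr_offset, ends inside a record
 * of record_size bytes. The sum is computed in 64 bits so it cannot wrap.
 */
static bool attr_name_in_bounds(uint32_t record_size, uint32_t attr_offset,
                                uint16_t name_offset, uint8_t name_length)
{
        uint64_t name_end = (uint64_t)attr_offset + name_offset +
                            (uint64_t)name_length * sizeof(uint16_t);

        return name_end <= record_size;
}
```

With a 1024-byte record, a name offset of 6485 fails this check no matter where the attribute starts, so the name is rejected before any comparison reads it.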
      
      [akpm@linux-foundation.org: coding-style cleanups]
      [chenxiaosong2@huawei.com: cleanup suggested by Hawkins Jiawei]
        Link: https://lkml.kernel.org/r/20220709064511.3304299-1-chenxiaosong2@huawei.com
      Link: https://lkml.kernel.org/r/20220707105329.4020708-1-chenxiaosong2@huawei.com
      Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: ChenXiaoSong <chenxiaosong2@huawei.com>
      Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • secretmem: fix unhandled fault in truncate · 84ac0130
      Mike Rapoport authored
      syzkaller reports the following issue:
      
      BUG: unable to handle page fault for address: ffff888021f7e005
      PGD 11401067 P4D 11401067 PUD 11402067 PMD 21f7d063 PTE 800fffffde081060
      Oops: 0002 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 3761 Comm: syz-executor281 Not tainted 5.19.0-rc4-syzkaller-00014-g941e3e79 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
      Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
      RSP: 0018:ffffc9000329fa90 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000ffb
      RDX: 0000000000000ffb RSI: 0000000000000000 RDI: ffff888021f7e005
      RBP: ffffea000087df80 R08: 0000000000000001 R09: ffff888021f7e005
      R10: ffffed10043efdff R11: 0000000000000000 R12: 0000000000000005
      R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000ffb
      FS:  00007fb29d8b2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff888021f7e005 CR3: 0000000026e7b000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       zero_user_segments include/linux/highmem.h:272 [inline]
       folio_zero_range include/linux/highmem.h:428 [inline]
       truncate_inode_partial_folio+0x76a/0xdf0 mm/truncate.c:237
       truncate_inode_pages_range+0x83b/0x1530 mm/truncate.c:381
       truncate_inode_pages mm/truncate.c:452 [inline]
       truncate_pagecache+0x63/0x90 mm/truncate.c:753
       simple_setattr+0xed/0x110 fs/libfs.c:535
       secretmem_setattr+0xae/0xf0 mm/secretmem.c:170
       notify_change+0xb8c/0x12b0 fs/attr.c:424
       do_truncate+0x13c/0x200 fs/open.c:65
       do_sys_ftruncate+0x536/0x730 fs/open.c:193
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fb29d900899
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb29d8b2318 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
      RAX: ffffffffffffffda RBX: 00007fb29d988408 RCX: 00007fb29d900899
      RDX: 00007fb29d900899 RSI: 0000000000000005 RDI: 0000000000000003
      RBP: 00007fb29d988400 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb29d98840c
      R13: 00007ffca01a23bf R14: 00007fb29d8b2400 R15: 0000000000022000
       </TASK>
      Modules linked in:
      CR2: ffff888021f7e005
      ---[ end trace 0000000000000000 ]---
      
      Eric Biggers suggested that this happens when
      secretmem_setattr()->simple_setattr() races with secretmem_fault() so that
      a page that is faulted in by secretmem_fault() (and thus removed from the
      direct map) is zeroed by inode truncation right afterwards.
      
      Use mapping->invalidate_lock to make secretmem_fault() and
      secretmem_setattr() mutually exclusive.
      
      [rppt@linux.ibm.com: v3]
        Link: https://lkml.kernel.org/r/20220714091337.412297-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20220707165650.248088-1-rppt@kernel.org
      Reported-by: syzbot+9bd2b7adbd34b30b87e4@syzkaller.appspotmail.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Suggested-by: Eric Biggers <ebiggers@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: separate path for hwpoison entry in copy_hugetlb_page_range() · c2cb0dcc
      Naoya Horiguchi authored
      Originally copy_hugetlb_page_range() handled migration entries and
      hwpoisoned entries in a similar manner.  But the related code path has
      since gained more code for migration entries, and when
      is_writable_migration_entry() was converted to
      !is_readable_migration_entry(), hwpoison entries in source processes
      started to be unexpectedly updated (which is legitimate for migration
      entries, but not for hwpoison entries).  This results in serious issues
      such as a kernel panic when forking processes with hwpoison entries in
      pmd.
      
      Separate the if branch into one for hwpoison entries and one for migration
      entries.
      
      Link: https://lkml.kernel.org/r/20220704013312.2415700-3-naoya.horiguchi@linux.dev
      Fixes: 6c287605 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>	[5.18]
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Liu Shixin <liushixin2@huawei.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: fix missing wake-up event for FSDAX pages · f4f451a1
      Muchun Song authored
      FSDAX page refcounts are 1-based, rather than 0-based: if refcount is
      1, then the page is freed.  FSDAX pages can be pinned through GUP and
      are then unpinned via unpin_user_page(), which uses a folio variant to
      put the page.  However, the folio variants did not consider this special
      case, so a wakeup event is missed (for example by the user of
      __fuse_dax_break_layouts()).  This results in a task being permanently
      stuck in TASK_INTERRUPTIBLE state.

      Since only GUP users can obtain FSDAX pages, fix GUP instead of
      folio_put() to lower overhead.
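
The 1-based convention can be shown with a toy model. This is a sketch under stated assumptions, not the kernel implementation: `dax_page_model` and `dax_page_model_put()` are hypothetical names, and the wake-up is represented by a plain counter rather than a real wait queue.

```c
#include <stdbool.h>

/* Hypothetical model of a page with a 1-based refcount; not kernel code. */
struct dax_page_model {
        int refcount;   /* 1 means idle (free), greater than 1 means in use */
        int wakeups;    /* counts wake-up events sent to waiters */
};

/*
 * Drop `refs` references. In the 1-based scheme the page becomes idle
 * when the count falls to 1, not 0, so that is the point at which
 * waiters must be woken. Returns true if this call made the page idle.
 */
static bool dax_page_model_put(struct dax_page_model *page, int refs)
{
        page->refcount -= refs;
        if (page->refcount == 1) {
                page->wakeups++;   /* the event a 0-based put never sends */
                return true;
        }
        return false;
}
```

A put routine that only reacts when the count reaches 0 never fires for such a page, which matches the missed wake-up described above.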
      
      Link: https://lkml.kernel.org/r/20220705123532.283-1-songmuchun@bytedance.com
      Fixes: d8ddc099 ("mm/gup: Add gup_put_folio()")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: fix page leak with multiple threads mapping the same page · 3fe2895c
      Josef Bacik authored
      We have an application with a lot of threads that use a shared mmap backed
      by tmpfs mounted with -o huge=within_size.  This application started
      leaking loads of huge pages when we upgraded to a recent kernel.
      
      Using the page ref tracepoints and a BPF program written by Tejun Heo we
      were able to determine that these pages would have multiple refcounts from
      the page fault path, but when it came to unmap time we wouldn't drop the
      number of refs we had added from the faults.
      
      I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
      huge=always, and then spawned 20 threads all looping faulting random
      offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
      page aligned ranges.  This very quickly reproduced the problem.
      
      The problem here is that we check for the case that we have multiple
      threads faulting in a range that was previously unmapped.  One thread maps
      the PMD, the other thread loses the race and then returns 0.  However at
      this point we already have the page, and we are no longer putting this
      page into the process's address space, and so we leak the page.  We
      actually did the correct thing prior to f9ce0be7, however it looks
      like Kirill copied what we do in the anonymous page case.  In the
      anonymous page case we don't yet have a page, so we don't have to drop a
      reference on anything.  Previously we did the correct thing for file based
      faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
      the page we faulted in.
      
      Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
      case, this makes us drop the ref on the page properly, and now my
      reproducer no longer leaks the huge pages.
      
      [josef@toxicpanda.com: v2]
        Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
      Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
      Fixes: f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths")
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mailmap: update Seth Forshee's email address · f073c833
      Seth Forshee authored
      seth.forshee@canonical.com is no longer valid, use sforshee@kernel.org
      instead.
      
      Link: https://lkml.kernel.org/r/20220628200734.424495-1-sforshee@kernel.org
      Signed-off-by: Seth Forshee <sforshee@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • tmpfs: fix the issue that the mount and remount results are inconsistent. · 0c98c8e1
      ZhaoLong Wang authored
      An undefined-behavior issue has not been completely fixed since commit
      d14f5efa ("tmpfs: fix undefined-behaviour in shmem_reconfigure()").
      That commit added a check to shmem_reconfigure(), in the remount
      path, to avoid the UBSAN problem.  However, the check was not added to
      the mount path, which causes inconsistent results between mount and
      remount.  The problem can be reproduced from user mode as follows:
      
      If nr_blocks is set to 0x8000000000000000, the mounting is successful.
      
        # mount tmpfs /dev/shm/ -t tmpfs -o nr_blocks=0x8000000000000000
      
      However, when -o remount is used, the mount fails because of the
      check in shmem_reconfigure():
      
        # mount tmpfs /dev/shm/ -t tmpfs -o remount,nr_blocks=0x8000000000000000
        mount: /dev/shm: mount point not mounted or bad option.
      
      Therefore, add checks in the shmem_parse_one() function and remove the
      check in shmem_reconfigure() to avoid this problem.
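
The idea of validating the value once, at parse time, can be sketched in userspace as follows. This is an illustration only: `parse_nr_blocks()` is a hypothetical function, it uses `strtoull()` rather than the kernel's own parsing helpers, and it does not model size suffixes.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical option parser, not shmem_parse_one() itself: accepts a
 * block count only if the whole string is a number that fits in a
 * signed 64-bit value. When mount and remount share one parser like
 * this, they agree on which values are valid.
 * Returns 0 and stores the value on success, -EINVAL otherwise.
 */
static int parse_nr_blocks(const char *str, unsigned long long *out)
{
        char *rest;
        unsigned long long val;

        errno = 0;
        val = strtoull(str, &rest, 0);   /* base 0 accepts 0x... input */
        if (errno || rest == str || *rest != '\0')
                return -EINVAL;
        if (val > (unsigned long long)INT64_MAX)
                return -EINVAL;

        *out = val;
        return 0;
}
```

In this sketch the value 0x8000000000000000 from the reproducer is rejected on every call, so a first mount and a later remount would fail in the same way.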
      
      Link: https://lkml.kernel.org/r/20220629124324.1640807-1-wangzhaolong1@huawei.com
      Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
      Cc: Luo Meng <luomeng12@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Yu Kuai <yukuai3@huawei.com>
      Cc: Zhihao Cheng <chengzhihao1@huawei.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: kfence: apply kmemleak_ignore_phys on early allocated pool · 07313a2b
      Yee Lee authored
      This patch solves two issues.
      
      (1) The pool allocated by memblock needs to be unregistered from
      kmemleak scanning.  Apply kmemleak_ignore_phys to replace the
      original kmemleak_free, as its address is now stored in the phys tree.

      (2) The pool allocated late by page-alloc doesn't need to be
      unregistered.  Move the freeing operation out of its call path.
      
      Link: https://lkml.kernel.org/r/20220628113714.7792-2-yee.lee@mediatek.com
      Fixes: 0c24e061 ("mm: kmemleak: add rbtree and store physical address for objects allocated with PA")
      Signed-off-by: Yee Lee <yee.lee@mediatek.com>
      Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Suggested-by: Marco Elver <elver@google.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. 03 Jul, 2022 9 commits
  3. 26 Jun, 2022 20 commits
    • Linux 5.19-rc4 · 03c765b0
      Linus Torvalds authored
    • Merge tag 'soc-fixes-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · 1709b887
      Linus Torvalds authored
      Pull ARM SoC fixes from Arnd Bergmann:
       "A number of fixes have accumulated, but they are largely for harmless
        issues:
      
         - Several OF node leak fixes
      
         - A fix to the Exynos7885 UART clock description
      
         - DTS fixes to prevent boot failures on TI AM64 and J721s2
      
         - Bus probe error handling fixes for Baikal-T1
      
         - A fixup to the way STM32 SoCs use separate dts files for different
           firmware stacks
      
         - Multiple code fixes for Arm SCMI firmware, all dealing with
           robustness of the implementation
      
         - Multiple NXP i.MX devicetree fixes, addressing incorrect data in DT
           nodes
      
         - Three updates to the MAINTAINERS file, including Florian Fainelli
           taking over BCM283x/BCM2711 (Raspberry Pi) from Nicolas Saenz
           Julienne"
      
      * tag 'soc-fixes-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
        ARM: dts: aspeed: nuvia: rename vendor nuvia to qcom
        arm: mach-spear: Add missing of_node_put() in time.c
        ARM: cns3xxx: Fix refcount leak in cns3xxx_init
        MAINTAINERS: Update email address
        arm64: dts: ti: k3-am64-main: Remove support for HS400 speed mode
        arm64: dts: ti: k3-j721s2: Fix overlapping GICD memory region
        ARM: dts: bcm2711-rpi-400: Fix GPIO line names
        bus: bt1-axi: Don't print error on -EPROBE_DEFER
        bus: bt1-apb: Don't print error on -EPROBE_DEFER
        ARM: Fix refcount leak in axxia_boot_secondary
        ARM: dts: stm32: move SCMI related nodes in a dedicated file for stm32mp15
        soc: imx: imx8m-blk-ctrl: fix display clock for LCDIF2 power domain
        ARM: dts: imx6qdl-colibri: Fix capacitive touch reset polarity
        ARM: dts: imx6qdl: correct PU regulator ramp delay
        firmware: arm_scmi: Fix incorrect error propagation in scmi_voltage_descriptors_get
        firmware: arm_scmi: Avoid using extended string-buffers sizes if not necessary
        firmware: arm_scmi: Fix SENSOR_AXIS_NAME_GET behaviour when unsupported
        ARM: dts: imx7: Move hsic_phy power domain to HSIC PHY node
        soc: bcm: brcmstb: pm: pm-arm: Fix refcount leak in brcmstb_pm_probe
        MAINTAINERS: Update BCM2711/BCM2835 maintainer
        ...
    • Merge tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 413c1f14
      Linus Torvalds authored
      Pull hotfixes from Andrew Morton:
       "Minor things, mainly - mailmap updates, MAINTAINERS updates, etc.
      
        Fixes for this merge window:
      
         - fix for a damon boot hang, from SeongJae
      
         - fix for a kfence warning splat, from Jason Donenfeld
      
         - fix for zero-pfn pinning, from Alex Williamson
      
         - fix for fallocate hole punch clearing, from Mike Kravetz
      
        Fixes for previous releases:
      
         - fix for a performance regression, from Marcelo
      
         - fix for a hwpoisoning BUG from zhenwei pi"
      
      * tag 'mm-hotfixes-stable-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mailmap: add entry for Christian Marangi
        mm/memory-failure: disable unpoison once hw error happens
        hugetlbfs: zero partial pages during fallocate hole punch
        mm: memcontrol: reference to tools/cgroup/memcg_slabinfo.py
        mm: re-allow pinning of zero pfns
        mm/kfence: select random number before taking raw lock
        MAINTAINERS: add maillist information for LoongArch
        MAINTAINERS: update MM tree references
        MAINTAINERS: update Abel Vesa's email
        MAINTAINERS: add MEMORY HOT(UN)PLUG section and add David as reviewer
        MAINTAINERS: add Miaohe Lin as a memory-failure reviewer
        mailmap: add alias for jarkko@profian.com
        mm/damon/reclaim: schedule 'damon_reclaim_timer' only after 'system_wq' is initialized
        kthread: make it clear that kthread_create_on_node() might be terminated by any fatal signal
        mm: lru_cache_disable: use synchronize_rcu_expedited
        mm/page_isolation.c: fix one kernel-doc comment
    • Merge tag 'perf-tools-fixes-for-v5.19-2022-06-26' of... · 893d1eaa
      Linus Torvalds authored
      Merge tag 'perf-tools-fixes-for-v5.19-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Enable ignore_missing_thread in 'perf stat', enabling counting with
         '--pid' when threads disappear during counting session setup
      
       - Adjust output data offset for backward compatibility in 'perf inject'
      
       - Fix missing free in copy_kcore_dir() in 'perf inject'
      
       - Fix caching files with a wrong build ID
      
       - Sync drm, cpufeatures, vhost and svm headers with the kernel
      
      * tag 'perf-tools-fixes-for-v5.19-2022-06-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        tools headers UAPI: Synch KVM's svm.h header with the kernel
        tools include UAPI: Sync linux/vhost.h with the kernel sources
        perf stat: Enable ignore_missing_thread
        perf inject: Adjust output data offset for backward compatibility
        perf trace beauty: Fix generation of errno id->str table on ALT Linux
        perf build-id: Fix caching files with a wrong build ID
        tools headers cpufeatures: Sync with the kernel sources
        tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
        perf inject: Fix missing free in copy_kcore_dir()
      893d1eaa
    • Linus Torvalds's avatar
      Merge tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 82708bb1
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - zoned relocation fixes:
            - fix critical section end for extent writeback, this could lead
              to out of order write
            - prevent writing to previous data relocation block group if space
              gets low
      
       - reflink fixes:
            - fix race between reflinking and ordered extent completion
            - proper error handling when block reserve migration fails
            - add missing inode iversion/mtime/ctime updates on each iteration
              when replacing extents
      
       - fix deadlock when running fsync/fiemap/commit at the same time
      
       - fix false-positive KCSAN report regarding pid tracking for read locks
         and data race
      
       - minor documentation update and link to new site
      
      * tag 'for-5.19-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Documentation: update btrfs list of features and link to readthedocs.io
        btrfs: fix deadlock with fsync+fiemap+transaction commit
        btrfs: don't set lock_owner when locking extent buffer for reading
        btrfs: zoned: fix critical section of relocation inode writeback
        btrfs: zoned: prevent allocation from previous data relocation BG
        btrfs: do not BUG_ON() on failure to migrate space when replacing extents
        btrfs: add missing inode updates on each iteration when replacing extents
        btrfs: fix race between reflinking and ordered extent completion
      82708bb1
    • Linus Torvalds's avatar
      Merge tag 'dma-mapping-5.19-2022-06-26' of git://git.infradead.org/users/hch/dma-mapping · c898c67d
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
      
       - pass the correct size to dma_set_encrypted() when freeing memory
         (Dexuan Cui)
      
      * tag 'dma-mapping-5.19-2022-06-26' of git://git.infradead.org/users/hch/dma-mapping:
        dma-direct: use the correct size for dma_set_encrypted()
      c898c67d
    • Linus Torvalds's avatar
      Merge tag 'for-5.19/fbdev-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev · be129fab
      Linus Torvalds authored
      Pull fbdev fixes from Helge Deller:
       "Two bug fixes for the pxa3xx and intelfb drivers:
      
         - pxa3xx-gcu: Fix integer overflow in pxa3xx_gcu_write
      
         - intelfb: Initialize value of stolen size
      
        The other changes are small cleanups, simplifications and
        documentation updates to the cirrusfb, skeletonfb, omapfb,
        intelfb, au1100fb and simplefb drivers"
      
      * tag 'for-5.19/fbdev-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
        video: fbdev: omap: Remove duplicate 'the' in comment
        video: fbdev: omapfb: Align '*' in comment
        video: fbdev: simplefb: Check before clk_put() not needed
        video: fbdev: au1100fb: Drop unnecessary NULL ptr check
        video: fbdev: pxa3xx-gcu: Fix integer overflow in pxa3xx_gcu_write
        video: fbdev: skeletonfb: Convert to generic power management
        video: fbdev: cirrusfb: Remove useless reference to PCI power management
        video: fbdev: intelfb: Initialize value of stolen size
        video: fbdev: intelfb: Use aperture size from pci_resource_len
        video: fbdev: skeletonfb: Fix syntax errors in comments
      be129fab
    • Linus Torvalds's avatar
      Merge tag 'for-5.19/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · c0c6a7bd
      Linus Torvalds authored
      Pull parisc architecture fixes from Helge Deller:
      
       - enable ARCH_HAS_STRICT_MODULE_RWX to prevent a boot crash on c8000
         machines
      
       - flush all mappings of a shared anonymous page on PA8800/8900 machines
         via flushing the whole data cache. This may slow down such machines
         but makes sure that the cache is consistent
      
       - Fix duplicate definition build error regarding fb_is_primary_device()
      
      * tag 'for-5.19/parisc-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Enable ARCH_HAS_STRICT_MODULE_RWX
        parisc: Fix flush_anon_page on PA8800/PA8900
        parisc: align '*' in comment in math-emu code
        parisc/stifb: Fix fb_is_primary_device() only available with CONFIG_FB_STI
      c0c6a7bd
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20220626' of https://github.com/jcmvbkbc/linux-xtensa · e963d685
      Linus Torvalds authored
      Pull xtensa fixes from Max Filippov:
      
       - fix OF reference leaks in xtensa arch code
      
       - replace '.bss' with '.section .bss' to fix entry.S build with old
         assembler
      
      * tag 'xtensa-20220626' of https://github.com/jcmvbkbc/linux-xtensa:
        xtensa: change '.bss' to '.section .bss'
        xtensa: xtfpga: Fix refcount leak bug in setup
        xtensa: Fix refcount leak bug in time.c
      e963d685
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 8100775d
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - A fix for a CMA change that broke booting guests with > 2G RAM on
         Power8 hosts.
      
       - Fix the RTAS call filter to allow a special case that applications
         rely on.
      
       - A change to our execve path, to make the execve syscall exit
         tracepoint work.
      
       - Three fixes to wire up our various RNGs earlier in boot so they're
         available for use in the initial seeding in random_init().
      
       - A build fix for when KASAN is enabled along with
         STRUCTLEAK_BYREF_ALL.
      
      Thanks to Andrew Donnellan, Aneesh Kumar K.V, Christophe Leroy, Jason
      Donenfeld, Nathan Lynch, Naveen N. Rao, Sathvika Vasireddy, Sumit
      Dubey2, Tyrel Datwyler, and Zi Yan.
      
      * tag 'powerpc-5.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/powernv: wire up rng during setup_arch
        powerpc/prom_init: Fix build failure with GCC_PLUGIN_STRUCTLEAK_BYREF_ALL and KASAN
        powerpc/rtas: Allow ibm,platform-dump RTAS call with null buffer address
        powerpc: Enable execve syscall exit tracepoint
        powerpc/pseries: wire up rng during setup_arch()
        powerpc/microwatt: wire up rng during setup_arch()
        powerpc/mm: Move CMA reservations after initmem_init()
      8100775d
    • Linus Torvalds's avatar
      Merge tag 'kbuild-fixes-v5.19-2' of... · 393ed5d8
      Linus Torvalds authored
      Merge tag 'kbuild-fixes-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - Fix modpost to detect EXPORT_SYMBOL marked as __init or __exit
      
       - Update the supported arch list in the LLVM document
      
       - Avoid the second link of vmlinux for CONFIG_TRIM_UNUSED_KSYMS
      
       - Avoid false __KSYM___this_module define in include/generated/autoksyms.h
      
      * tag 'kbuild-fixes-v5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kbuild: Ignore __this_module in gen_autoksyms.sh
        kbuild: link vmlinux only once for CONFIG_TRIM_UNUSED_KSYMS (2nd attempt)
        Documentation/llvm: Update Supported Arch table
        modpost: fix section mismatch check for exported init/exit sections
      393ed5d8
    • Linus Torvalds's avatar
      Merge tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat · 97d4d026
      Linus Torvalds authored
      Pull exfat fix from Namjae Jeon:
      
       - Use updated exfat_chain directly instead of snapshot values in
         rename.
      
      * tag 'exfat-for-5.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
        exfat: use updated exfat_chain directly during renaming
      97d4d026
    • Linus Torvalds's avatar
      Merge tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 918c30df
      Linus Torvalds authored
      Pull cifs client fixes from Steve French:
       "Fixes addressing important multichannel and reconnect issues.
      
        Multichannel mounts when the server network interfaces changed, or ip
        addresses changed, uncovered problems, especially in reconnect, but
        the patches for this were held up until recently due to some lock
        conflicts that are now addressed.
      
        Included in this set of fixes:
      
         - three fixes relating to multichannel reconnect, dynamically
           adjusting the list of server interfaces to avoid problems during
           reconnect
      
         - a lock conflict fix related to the above
      
         - two important fixes for negotiate on secondary channels (null
           netname can unintentionally cause multichannel to be disabled to
           some servers)
      
         - a reconnect fix (reporting incorrect IP address in some cases)"
      
      * tag '5.19-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: update cifs_ses::ip_addr after failover
        cifs: avoid deadlocks while updating iface
        cifs: periodically query network interfaces from server
        cifs: during reconnect, update interface if necessary
        cifs: change iface_list from array to sorted linked list
        smb3: use netname when available on secondary channels
        smb3: fix empty netname context on secondary channels
      918c30df
    • Arnaldo Carvalho de Melo's avatar
      tools headers UAPI: Synch KVM's svm.h header with the kernel · f8d86619
      Arnaldo Carvalho de Melo authored
      To pick up the changes from:
      
        d5af44dd ("x86/sev: Provide support for SNP guest request NAEs")
        0afb6b66 ("x86/sev: Use SEV-SNP AP creation to start secondary CPUs")
        dc3f3d24 ("x86/mm: Validate memory when changing the C-bit")
        cbd3d4f7 ("x86/sev: Check SEV-SNP features support")
      
      That gets these new SVM exit reasons:
      
      +       { SVM_VMGEXIT_PSC,              "vmgexit_page_state_change" }, \
      +       { SVM_VMGEXIT_GUEST_REQUEST,    "vmgexit_guest_request" }, \
      +       { SVM_VMGEXIT_EXT_GUEST_REQUEST, "vmgexit_ext_guest_request" }, \
      +       { SVM_VMGEXIT_AP_CREATION,      "vmgexit_ap_creation" }, \
      +       { SVM_VMGEXIT_HV_FEATURES,      "vmgexit_hypervisor_feature" }, \
      
      Addressing this perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/svm.h' differs from latest version at 'arch/x86/include/uapi/asm/svm.h'
        diff -u tools/arch/x86/include/uapi/asm/svm.h arch/x86/include/uapi/asm/svm.h
      
      This causes these changes:
      
        CC      /tmp/build/perf-urgent/arch/x86/util/kvm-stat.o
        LD      /tmp/build/perf-urgent/arch/x86/util/perf-in.o
        LD      /tmp/build/perf-urgent/arch/x86/perf-in.o
        LD      /tmp/build/perf-urgent/arch/perf-in.o
        LD      /tmp/build/perf-urgent/perf-in.o
        LINK    /tmp/build/perf-urgent/perf
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      f8d86619
    • Arnaldo Carvalho de Melo's avatar
      tools include UAPI: Sync linux/vhost.h with the kernel sources · e2213a2d
      Arnaldo Carvalho de Melo authored
      To get the changes in:
      
        84d7c8fd ("vhost-vdpa: introduce uAPI to set group ASID")
        2d1fcb77 ("vhost-vdpa: uAPI to get virtqueue group id")
        a0c95f20 ("vhost-vdpa: introduce uAPI to get the number of address spaces")
        3ace88bd ("vhost-vdpa: introduce uAPI to get the number of virtqueue groups")
        175d493c ("vhost: move the backend feature bits to vhost_types.h")
      
      Silencing this perf build warning:
      
        Warning: Kernel ABI header at 'tools/include/uapi/linux/vhost.h' differs from latest version at 'include/uapi/linux/vhost.h'
        diff -u tools/include/uapi/linux/vhost.h include/uapi/linux/vhost.h
      
      To pick up these changes and support them:
      
        $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > before
        $ cp include/uapi/linux/vhost.h tools/include/uapi/linux/vhost.h
        $ tools/perf/trace/beauty/vhost_virtio_ioctl.sh > after
        $ diff -u before after
        --- before	2022-06-26 12:04:35.982003781 -0300
        +++ after	2022-06-26 12:04:43.819972476 -0300
        @@ -28,6 +28,7 @@
         	[0x74] = "VDPA_SET_CONFIG",
         	[0x75] = "VDPA_SET_VRING_ENABLE",
         	[0x77] = "VDPA_SET_CONFIG_CALL",
        +	[0x7C] = "VDPA_SET_GROUP_ASID",
         };
         static const char *vhost_virtio_ioctl_read_cmds[] = {
         	[0x00] = "GET_FEATURES",
        @@ -39,5 +40,8 @@
         	[0x76] = "VDPA_GET_VRING_NUM",
         	[0x78] = "VDPA_GET_IOVA_RANGE",
         	[0x79] = "VDPA_GET_CONFIG_SIZE",
        +	[0x7A] = "VDPA_GET_AS_NUM",
        +	[0x7B] = "VDPA_GET_VRING_GROUP",
         	[0x80] = "VDPA_GET_VQS_COUNT",
        +	[0x81] = "VDPA_GET_GROUP_NUM",
         };
        $
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Gautam Dawar <gautam.dawar@xilinx.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lore.kernel.org/lkml/Yrh3xMYbfeAD0MFL@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      e2213a2d
    • Gang Li's avatar
      perf stat: Enable ignore_missing_thread · 448ce0e6
      Gang Li authored
      perf already supports ignore_missing_thread for -p, but it is not yet
      applied to `perf stat -p <pid>`. This patch enables ignore_missing_thread
      for `perf stat -p <pid>`.
      
      Committer notes:
      
      And here is a refresher about the 'ignore_missing_thread' knob, from a
      previous patch using it:
      
        ca800068 ("perf evsel: Enable ignore_missing_thread for pid option")
      
        ---
          While monitoring a multithread process with pid option, perf sometimes
          may return sys_perf_event_open failure with 3(No such process) if any of
          the process's threads die before we open the event. However, we want
          perf continue monitoring the remaining threads and do not exit with
          error.
        ---
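
      The behaviour described in the quoted text can be sketched as follows.
      This is a simplified illustration and not perf's code: `open_events`
      and `fake_open_event` are hypothetical names, and the real tool applies
      further conditions before it ignores a missing thread.

```python
import errno


def open_events(tids, open_event, ignore_missing_thread=True):
    """Open one event per thread ID, optionally skipping threads that exited."""
    opened = {}
    for tid in tids:
        try:
            opened[tid] = open_event(tid)
        except OSError as exc:
            # ESRCH (errno 3, "No such process") means the thread died
            # before its event could be opened.
            if ignore_missing_thread and exc.errno == errno.ESRCH:
                continue  # keep monitoring the remaining threads
            raise
    return opened


def fake_open_event(tid):
    """Hypothetical stand-in for sys_perf_event_open: thread 102 is gone."""
    if tid == 102:
        raise OSError(errno.ESRCH, "No such process")
    return f"fd-for-{tid}"


surviving = open_events([101, 102, 103], fake_open_event)
```

      With the knob off, the same call propagates the ESRCH error, which
      corresponds to the failure the quoted text describes.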
      Signed-off-by: Gang Li <ligang.bdlg@bytedance.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220622030037.15005-1-ligang.bdlg@bytedance.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      448ce0e6
    • Raul Silvera's avatar
      perf inject: Adjust output data offset for backward compatibility · 37ed2cdd
      Raul Silvera authored
      When 'perf inject' creates a new file, it reuses the data offset from
      the input file. If there has been a change in the size of the header, as
      happened in v5.12 -> v5.13, the new offsets will be wrong, resulting in
      a corrupted output file.
      
      This change adds the function perf_session__data_offset to compute the
      data offset based on the current header size, and uses that instead of
      the offset from the original input file.
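
      A minimal sketch of the idea, assuming the data section starts right
      after the file header, the per-event sample IDs and the attribute
      table. The function below is hypothetical and all sizes are
      illustrative; they are not taken from the perf.data format.

```python
ID_SIZE = 8  # assumption: each sample ID is stored as a 64-bit value


def data_offset(header_size, attr_entry_size, ids_per_event):
    """Compute where sample data starts from the current layout sizes."""
    offset = header_size
    for n_ids in ids_per_event:
        offset += n_ids * ID_SIZE  # ID array for this event
    offset += len(ids_per_event) * attr_entry_size  # one attr entry per event
    return offset


# Two events with 2 and 1 sample IDs; only the header size differs.
old_layout = data_offset(104, 136, [2, 1])
new_layout = data_offset(112, 136, [2, 1])
```

      Because the two results differ whenever the header size differs,
      copying the offset from an input file written with the old layout would
      point at the wrong place in a file written with the new one.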
      Signed-off-by: Raul Silvera <rsilvera@google.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Colin Ian King <colin.king@intel.com>
      Cc: Dave Marchevsky <davemarchevsky@fb.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20220621152725.2668041-1-rsilvera@google.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      37ed2cdd
    • Arnaldo Carvalho de Melo's avatar
      perf trace beauty: Fix generation of errno id->str table on ALT Linux · 3713e249
      Arnaldo Carvalho de Melo authored
      For some reason using:
      
               cat <<EoFuncBegin
        static const char *errno_to_name__$arch(int err)
        {
               switch (err) {
        EoFuncBegin
      
      in tools/perf/trace/beauty/arch_errno_names.sh isn't working on ALT
      Linux sisyphus (development version), which could be some
      distro-specific glitch, so just get this done in an alternative way that
      works everywhere while giving notice to the people working on that distro
      to try and figure out what really took place.
      
      Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3713e249
    • Adrian Hunter's avatar
      perf build-id: Fix caching files with a wrong build ID · ab66fdac
      Adrian Hunter authored
      Build ID events associate a file name with a build ID.  However, when
      using perf inject, there is no guarantee that the file on the current
      machine at the current time has that build ID. Fix by comparing the
      build IDs and skip adding to the cache if they are different.
      
      Example:
      
        $ echo "int main() {return 0;}" > prog.c
        $ gcc -o prog prog.c
        $ perf record --buildid-all ./prog
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.019 MB perf.data ]
        $ file-buildid() { file $1 | awk -F= '{print $2}' | awk -F, '{print $1}' ; }
        $ file-buildid prog
        444ad9be165d8058a48ce2ffb4e9f55854a3293e
        $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
        444ad9be165d8058a48ce2ffb4e9f55854a3293e
        $ echo "int main() {return 1;}" > prog.c
        $ gcc -o prog prog.c
        $ file-buildid prog
        885524d5aaa24008a3e2b06caa3ea95d013c0fc5
      
      Before:
      
        $ perf buildid-cache --purge $(pwd)/prog
        $ perf inject -i perf.data -o junk
        $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
        885524d5aaa24008a3e2b06caa3ea95d013c0fc5
        $
      
      After:
      
        $ perf buildid-cache --purge $(pwd)/prog
        $ perf inject -i perf.data -o junk
        $ file-buildid ~/.debug/$(pwd)/prog/444ad9be165d8058a48ce2ffb4e9f55854a3293e/elf
      
        $
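
      The check described above can be sketched like this. It is an
      illustration only: `maybe_cache` and `read_build_id` are hypothetical
      names, and a plain dict stands in for the build ID cache.

```python
def maybe_cache(cache, path, event_build_id, read_build_id):
    """Cache `path` only if its current build ID matches the recorded one."""
    current = read_build_id(path)
    if current != event_build_id:
        # The file changed since the build ID event was recorded: skip it.
        return False
    cache[(path, event_build_id)] = path
    return True


RECORDED = "444ad9be165d8058a48ce2ffb4e9f55854a3293e"  # ID in perf.data
REBUILT = "885524d5aaa24008a3e2b06caa3ea95d013c0fc5"   # ID after recompiling

cache = {}
stale = maybe_cache(cache, "prog", RECORDED, lambda p: REBUILT)
fresh = maybe_cache(cache, "prog", RECORDED, lambda p: RECORDED)
```

      The first call mirrors the "After" case: the rebuilt binary no longer
      matches the recorded build ID, so nothing is added to the cache.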
      
      Fixes: 454c407e ("perf: add perf-inject builtin")
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Link: https://lore.kernel.org/r/20220621125144.5623-1-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      ab66fdac
    • Arnaldo Carvalho de Melo's avatar
      tools headers cpufeatures: Sync with the kernel sources · 4b3f7644
      Arnaldo Carvalho de Melo authored
      To pick the changes from:
      
        d6d0c7f6 ("x86/cpufeatures: Add PerfMonV2 feature bit")
        296d5a17 ("KVM: SEV-ES: Use V_TSC_AUX if available instead of RDTSC/MSR_TSC_AUX intercepts")
        f3090339 ("x86/cpufeatures: Add virtual TSC_AUX feature bit")
        8ad7e8f6 ("x86/fpu/xsave: Support XSAVEC in the kernel")
        59bd54a8 ("x86/tdx: Detect running as a TDX guest in early boot")
        a77d41ac ("x86/cpufeatures: Add AMD Fam19h Branch Sampling feature")
      
      This only causes these perf files to be rebuilt:
      
        CC       /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
        CC       /tmp/build/perf/bench/mem-memset-x86-64-asm.o
      
      And addresses this perf build warning:
      
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
        diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
        Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
        diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sandipan Das <sandipan.das@amd.com>
      Cc: Babu Moger <babu.moger@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/lkml/YrDkgmwhLv+nKeOo@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4b3f7644