1. 15 Aug, 2021 9 commits
  2. 14 Aug, 2021 13 commits
    • Merge tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · ba31f97d
      Linus Torvalds authored
      Pull xen fixes from Juergen Gross:
       "A small cleanup patch and a fix of a rare race in the Xen evtchn
        driver"
      
      * tag 'for-linus-5.14-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/events: Fix race in set_evtchn_to_irq
        xen/events: remove redundant initialization of variable irq
      ba31f97d
    • Merge tag 'riscv-for-linus-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · a7a4f1c0
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - avoid passing -mno-relax to compilers that don't support it
      
       - a comment fix
      
      * tag 'riscv-for-linus-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Fix comment regarding kernel mapping overlapping with IS_ERR_VALUE
        riscv: kexec: do not add '-mno-relax' flag if compiler doesn't support it
      a7a4f1c0
    • Merge tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs · 118516e2
      Linus Torvalds authored
      Pull configfs fix from Christoph Hellwig:
      
       - fix to revert to the historic write behavior (Bart Van Assche)
      
      * tag 'configfs-5.14' of git://git.infradead.org/users/hch/configfs:
        configfs: restore the kernel v5.13 text attribute write behavior
      118516e2
    • Merge branch 'akpm' (patches from Andrew) · dfa377c3
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "7 patches.
      
        Subsystems affected by this patch series: mm (kasan, mm/slub,
        mm/madvise, and memcg), and lib"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        lib: use PFN_PHYS() in devmem_is_allowed()
        mm/memcg: fix incorrect flushing of lruvec data in obj_stock
        mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)
        mm: slub: fix slub_debug disabling for list of slabs
        slub: fix kmalloc_pagealloc_invalid_free unit test
        kasan, slub: reset tag when printing address
        kasan, kmemleak: reset tags when scanning block
      dfa377c3
    • Merge tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 27b2eaa1
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
       "Four CIFS/SMB3 Fixes, all for stable, two relating to deferred close,
        and one for the 'modefromsid' mount option (when 'idsfromsid' not
        specified)"
      
      * tag '5.14-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Call close synchronously during unlink/rename/lease break.
        cifs: Handle race conditions during rename
        cifs: use the correct max-length for dentry_path_raw()
        cifs: create sd context must be a multiple of 8
      27b2eaa1
    • Merge tag 'linux-kselftest-fixes-5.14-rc6' of... · a83ed225
      Linus Torvalds authored
      Merge tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest fix from Shuah Khan:
       "A single patch to sgx test to fix Q1 and Q2 calculation"
      
      * tag 'linux-kselftest-fixes-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/sgx: Fix Q1 and Q2 calculation in sigstruct.c
      a83ed225
    • lib: use PFN_PHYS() in devmem_is_allowed() · 854f3264
      Liang Wang authored
      On 32-bit systems whose physical address space is wider than 32 bits,
      a physical address may not fit in 32 bits.  Use PFN_PHYS() in
      devmem_is_allowed(), or the physical address may overflow and be
      truncated.

      We found this bug on ARM with ARM_LPAE and CONFIG_STRICT_DEVMEM
      enabled: when the devmem tool is used to map a high address that is not
      in the iomem address range, an unexpected error indicating no
      permission is returned.

      This bug was introduced in v2.6.37, and the function was moved to lib
      in v5.11.
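The truncation described above can be sketched in Python as a model of the C arithmetic. This is only an illustration under stated assumptions: a 32-bit `unsigned long`, a `PAGE_SHIFT` of 12, and a 64-bit `phys_addr_t` (as with ARM LPAE). The helper names are hypothetical and are not kernel APIs.

```python
PAGE_SHIFT = 12            # assumption: 4 KiB pages
ULONG_MASK = 0xFFFFFFFF    # assumption: unsigned long is 32 bits wide


def shift_as_unsigned_long(pfn: int) -> int:
    """Model `pfn << PAGE_SHIFT` evaluated in 32-bit unsigned long arithmetic."""
    return (pfn << PAGE_SHIFT) & ULONG_MASK


def pfn_phys(pfn: int) -> int:
    """Model PFN_PHYS(): widen the PFN to a 64-bit phys_addr_t, then shift."""
    return (pfn & ULONG_MASK) << PAGE_SHIFT


# Page frame of a physical address above 4 GiB (0x8_8000_0000).
high_pfn = 0x880000

truncated = shift_as_unsigned_long(high_pfn)  # upper bits are lost
full = pfn_phys(high_pfn)                     # full address is preserved

print(hex(truncated), hex(full))
```

In this model the truncated value aliases a different, lower physical address, which is consistent with the unexpected permission error the message reports.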
      
      Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com
      Fixes: 087aaffc ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem")
      Fixes: 527701ed ("lib: Add a generic version of devmem_is_allowed()")
      Signed-off-by: Liang Wang <wangliang101@huawei.com>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Liang Wang <wangliang101@huawei.com>
      Cc: Xiaoming Ni <nixiaoming@huawei.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      854f3264
    • mm/memcg: fix incorrect flushing of lruvec data in obj_stock · 7fa0dacb
      Waiman Long authored
      When mod_objcg_state() is called with a pgdat that is different from
      that in the obj_stock, the old lruvec data cached in obj_stock are
      flushed out.  Unfortunately, they were flushed to the new pgdat and so
      the data go to the wrong node.  This will screw up the slab data
      reported in /sys/devices/system/node/node*/meminfo.
      
      Fix that by flushing the data to the cached pgdat instead.
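A simplified Python model of the flush described above. The `ObjStock` class, the `mod_state()` helper and the byte counts are hypothetical stand-ins for the per-CPU cache, not the kernel's data structures; the sketch only shows which node receives the cached delta.

```python
from collections import defaultdict


class ObjStock:
    """Hypothetical stand-in for the per-CPU obj_stock cache."""

    def __init__(self):
        self.cached_pgdat = None  # node whose stats are currently cached
        self.nr_slab = 0          # cached delta that has not been flushed yet


def mod_state(stock, node_stats, pgdat, nr, flush_to_cached):
    """Account `nr` bytes for `pgdat`, flushing the cache when the node changes."""
    if stock.cached_pgdat is not None and stock.cached_pgdat != pgdat:
        # Buggy variant flushes to the new node, fixed variant to the cached one.
        target = stock.cached_pgdat if flush_to_cached else pgdat
        node_stats[target] += stock.nr_slab
        stock.nr_slab = 0
    stock.cached_pgdat = pgdat
    stock.nr_slab += nr


def run(flush_to_cached):
    stock, stats = ObjStock(), defaultdict(int)
    mod_state(stock, stats, pgdat=0, nr=4096, flush_to_cached=flush_to_cached)
    mod_state(stock, stats, pgdat=1, nr=512, flush_to_cached=flush_to_cached)
    return dict(stats)


buggy = run(flush_to_cached=False)  # node 0's 4096 bytes land on node 1
fixed = run(flush_to_cached=True)   # node 0's 4096 bytes land on node 0
print(buggy, fixed)
```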
      
      Link: https://lkml.kernel.org/r/20210802143834.30578-1-longman@redhat.com
      Fixes: 68ac5b3c ("mm/memcg: cache vmstat data in percpu memcg_stock_pcp")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Alex Shi <alex.shi@linux.alibaba.com>
      Cc: Chris Down <chris@chrisdown.name>
      Cc: Yafang Shao <laoar.shao@gmail.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
      Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Waiman Long <longman@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7fa0dacb
    • mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE) · eb2faa51
      David Hildenbrand authored
      Doing some extended tests and polishing the man page update for
      MADV_POPULATE_(READ|WRITE), I realized that we also end up converting
      SIGBUS (via -EFAULT) to -EINVAL, making it look like yet another
      madvise() user error.

      We want to report only problematic mappings and permission problems that
      the user could have known about as -EINVAL.

      Let's not convert -EFAULT arising due to SIGBUS (or SIGSEGV) to -EINVAL,
      but instead indicate -EFAULT to user space.  While we could also convert
      it to -ENOMEM, using -EFAULT looks more helpful when user space might
      want to troubleshoot what's going wrong: MADV_POPULATE_(READ|WRITE) is
      not part of a final Linux release and we can still adjust the behavior.
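As a rough illustration of the change in reported error codes, the following Python sketch models only the one conversion the message talks about. The function names are hypothetical, and the real kernel code handles further cases that are not modeled here.

```python
import errno


def reported_before_fix(internal_err: int) -> int:
    """Model of the old behaviour: -EFAULT was folded into -EINVAL."""
    if internal_err == -errno.EFAULT:
        return -errno.EINVAL
    return internal_err


def reported_after_fix(internal_err: int) -> int:
    """Model of the new behaviour: -EFAULT is passed through to user space."""
    return internal_err


def classify(err: int) -> str:
    """How a caller might interpret the result once the two cases are distinct."""
    if err == -errno.EINVAL:
        return "caller error: unsuitable mapping or missing permission"
    if err == -errno.EFAULT:
        return "fault while populating (SIGBUS or SIGSEGV condition)"
    return "other: " + errno.errorcode.get(-err, "unknown")


for report in (reported_before_fix, reported_after_fix):
    print(report.__name__, "->", classify(report(-errno.EFAULT)))
```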
      
      Link: https://lkml.kernel.org/r/20210726154932.102880-1-david@redhat.com
      Fixes: 4ca9b385 ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eb2faa51
    • mm: slub: fix slub_debug disabling for list of slabs · a7f1d485
      Vlastimil Babka authored
      Vijayanand Jitta reports:
      
        Consider the scenario where CONFIG_SLUB_DEBUG_ON is set and we want to
        disable slub_debug for a few slabs. Using the boot parameter with the
        slub_debug=-,slab_name syntax doesn't work as expected, i.e. it does
        not disable debugging only for the specified list of slabs. Instead it
        disables debugging for all slabs, which is wrong.
      
      This patch fixes it by delaying the moment when the global slub_debug
      flags variable is updated.  In case a "slub_debug=-,slab_name" has been
      passed, the global flags remain as initialized (depending on
      CONFIG_SLUB_DEBUG_ON enabled or disabled) and are not simply reset to 0.
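The effect on the global flags can be sketched with a small Python model of the argument handling. This is an assumption-laden simplification, not the kernel parser: `DEFAULT_FLAGS` is an illustrative stand-in for the default debug flag set, `dentry` is just an example cache name, and the function only computes the resulting global setting.

```python
DEFAULT_FLAGS = "FZPU"  # illustrative stand-in for the default debug flags


def resulting_global_flags(arg: str, debug_on_by_default: bool, fixed: bool) -> str:
    """Return the global flag string left behind by a `slub_debug=<arg>` value."""
    initial = DEFAULT_FLAGS if debug_on_by_default else ""
    global_flags = initial
    global_changed = False
    slab_list_seen = False

    for block in arg.split(";"):
        flags, _, slab_list = block.partition(",")
        if slab_list:
            slab_list_seen = True      # flags apply only to the named slabs
        else:
            global_flags = "" if flags == "-" else flags
            global_changed = True

    if slab_list_seen and not global_changed:
        # Only per-slab blocks were given: the old code reset the global
        # flags to 0, the fixed code keeps the initial value.
        global_flags = initial if fixed else ""
    return global_flags


print(repr(resulting_global_flags("-,dentry", True, fixed=False)))
print(repr(resulting_global_flags("-,dentry", True, fixed=True)))
```

With `CONFIG_SLUB_DEBUG_ON` modeled as `debug_on_by_default=True`, the unfixed variant leaves no global debugging at all, while the fixed variant keeps the defaults for every cache that is not listed.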
      
      Link: https://lkml.kernel.org/r/8a3d992a-473a-467b-28a0-4ad2ff60ab82@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Vijayanand Jitta <vjitta@codeaurora.org>
      Reviewed-by: Vijayanand Jitta <vjitta@codeaurora.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vinayak Menon <vinmenon@codeaurora.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a7f1d485
    • slub: fix kmalloc_pagealloc_invalid_free unit test · 1ed7ce57
      Shakeel Butt authored
      The unit test kmalloc_pagealloc_invalid_free makes sure that, for a
      higher-order slub allocation that goes to the page allocator, the free
      is called with the correct address, i.e. the virtual address of the head
      page.

      Commit f227f0fa ("slub: fix unreclaimable slab stat for bulk free")
      unified the free code paths for page allocator based slub allocations
      but instead of using the address passed by the caller, it extracted the
      address from the page, making the unit test
      kmalloc_pagealloc_invalid_free moot.  So, fix this by using the address
      passed by the caller.

      Should we fix this? I think yes, because developers expect KASAN to
      catch this type of programming bug.
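Why deriving the address from the page hides the bug can be shown with a small Python model. All names are hypothetical, and the sketch assumes that a higher-order allocation is aligned to its own size, so the head page address can be recovered by rounding down.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def head_page_address(addr: int, order: int) -> int:
    """Address of the head page of the order-`order` block containing addr."""
    span = PAGE_SIZE << order
    return addr - (addr % span)


def invalid_free_detected(checked_addr: int, order: int) -> bool:
    """Stand-in for the check: a valid free targets the head page address."""
    return checked_addr != head_page_address(checked_addr, order)


def free_large(ptr: int, order: int, use_caller_address: bool) -> bool:
    """Return True if the (modeled) check flags this free as invalid."""
    # The regressed path recomputed the address from the page, so the check
    # only ever saw a head page address and could not fail.
    checked = ptr if use_caller_address else head_page_address(ptr, order)
    return invalid_free_detected(checked, order)


base = 0x10000000          # example address, aligned to the block size
bad_ptr = base + 8         # pointer into the middle of the allocation

print(free_large(bad_ptr, 2, use_caller_address=False))  # check is bypassed
print(free_large(bad_ptr, 2, use_caller_address=True))   # check can fire
```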
      
      Link: https://lkml.kernel.org/r/20210802180819.1110165-1-shakeelb@google.com
      Fixes: f227f0fa ("slub: fix unreclaimable slab stat for bulk free")
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1ed7ce57
    • kasan, slub: reset tag when printing address · 340caf17
      Kuan-Ying Lee authored
      The address still includes the tags when it is printed.  With hardware
      tag-based kasan enabled, we will get a false positive KASAN issue when
      we access metadata.
      
      Reset the tag before we access the metadata.
      
      Link: https://lkml.kernel.org/r/20210804090957.12393-3-Kuan-Ying.Lee@mediatek.com
      Fixes: aa1ef4d7 ("kasan, mm: reset tags when accessing metadata")
      Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Nicholas Tang <nicholas.tang@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      340caf17
    • kasan, kmemleak: reset tags when scanning block · 6c7a00b8
      Kuan-Ying Lee authored
      Patch series "kasan, slub: reset tag when printing address", v3.
      
      With hardware tag-based KASAN enabled, we reset the tag when we access
      metadata to avoid false alarms.

      This patch (of 2):

      Kmemleak needs to scan kernel memory to check for memory leaks.  With
      hardware tag-based KASAN enabled, when it scans an invalid slab and
      dereferences it, the issue shown in the report below occurs.

      Hardware tag-based KASAN doesn't use compiler instrumentation, so we
      cannot use kasan_disable_current() to ignore the tag check.

      Based on the report below, there are 11 0xf7 granules, which amounts to
      176 bytes, and the object is allocated from the kmalloc-256 cache.  So
      when kmemleak accesses the last 256-176 bytes, it causes faults, as those
      are marked with KASAN_KMALLOC_REDZONE == KASAN_TAG_INVALID == 0xfe.

      Thus, we reset tags before accessing metadata to avoid false positives.
      
        BUG: KASAN: out-of-bounds in scan_block+0x58/0x170
        Read at addr f7ff0000c0074eb0 by task kmemleak/138
        Pointer tag: [f7], memory tag: [fe]
      
        CPU: 7 PID: 138 Comm: kmemleak Not tainted 5.14.0-rc2-00001-g8cae8cd8-dirty #134
        Hardware name: linux,dummy-virt (DT)
        Call trace:
         dump_backtrace+0x0/0x1b0
         show_stack+0x1c/0x30
         dump_stack_lvl+0x68/0x84
         print_address_description+0x7c/0x2b4
         kasan_report+0x138/0x38c
         __do_kernel_fault+0x190/0x1c4
         do_tag_check_fault+0x78/0x90
         do_mem_abort+0x44/0xb4
         el1_abort+0x40/0x60
         el1h_64_sync_handler+0xb4/0xd0
         el1h_64_sync+0x78/0x7c
         scan_block+0x58/0x170
         scan_gray_list+0xdc/0x1a0
         kmemleak_scan+0x2ac/0x560
         kmemleak_scan_thread+0xb0/0xe0
         kthread+0x154/0x160
         ret_from_fork+0x10/0x18
      
        Allocated by task 0:
         kasan_save_stack+0x2c/0x60
         __kasan_kmalloc+0xec/0x104
         __kmalloc+0x224/0x3c4
         __register_sysctl_paths+0x200/0x290
         register_sysctl_table+0x2c/0x40
         sysctl_init+0x20/0x34
         proc_sys_init+0x3c/0x48
         proc_root_init+0x80/0x9c
         start_kernel+0x648/0x6a4
         __primary_switched+0xc0/0xc8
      
        Freed by task 0:
         kasan_save_stack+0x2c/0x60
         kasan_set_track+0x2c/0x40
         kasan_set_free_info+0x44/0x54
         ____kasan_slab_free.constprop.0+0x150/0x1b0
         __kasan_slab_free+0x14/0x20
         slab_free_freelist_hook+0xa4/0x1fc
         kfree+0x1e8/0x30c
         put_fs_context+0x124/0x220
         vfs_kern_mount.part.0+0x60/0xd4
         kern_mount+0x24/0x4c
         bdev_cache_init+0x70/0x9c
         vfs_caches_init+0xdc/0xf4
         start_kernel+0x638/0x6a4
         __primary_switched+0xc0/0xc8
      
        The buggy address belongs to the object at ffff0000c0074e00
         which belongs to the cache kmalloc-256 of size 256
        The buggy address is located 176 bytes inside of
         256-byte region [ffff0000c0074e00, ffff0000c0074f00)
        The buggy address belongs to the page:
        page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100074
        head:(____ptrval____) order:2 compound_mapcount:0 compound_pincount:0
        flags: 0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff|kasantag=0x0)
        raw: 0bfffc0000010200 0000000000000000 dead000000000122 f5ff0000c0002300
        raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
      
        Memory state around the buggy address:
         ffff0000c0074c00: f0 f0 f0 f0 f0 f0 f0 f0 f0 fe fe fe fe fe fe fe
         ffff0000c0074d00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
        >ffff0000c0074e00: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 fe fe fe fe fe
                                                            ^
         ffff0000c0074f00: fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe fe
         ffff0000c0075000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        ==================================================================
        Disabling lock debugging due to kernel taint
        kmemleak: 181 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
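The numbers in the report above can be checked with a few lines of Python. The sketch assumes a 16-byte tag granule (which is what 11 granules for 176 bytes implies), a tag stored in the top byte of a 64-bit pointer, and 0xff as the tag installed when a tag is reset. The helper names are illustrative, not the kernel's.

```python
GRANULE_SIZE = 16        # assumption: bytes covered by one memory tag
TAG_SHIFT = 56           # assumption: tag lives in the top byte of the pointer
TAG_KERNEL = 0xFF        # assumption: tag value used after a reset


def pointer_tag(ptr: int) -> int:
    """Extract the tag from the top byte of a 64-bit pointer."""
    return ptr >> TAG_SHIFT


def reset_tag(ptr: int) -> int:
    """Replace the pointer tag with TAG_KERNEL, keeping the address bits."""
    return (ptr & ((1 << TAG_SHIFT) - 1)) | (TAG_KERNEL << TAG_SHIFT)


fault_addr = 0xF7FF0000C0074EB0    # "Read at addr" from the report
object_start = 0xFFFF0000C0074E00  # start of the kmalloc-256 object

valid_bytes = 11 * GRANULE_SIZE                  # granules tagged 0xf7
offset = reset_tag(fault_addr) - object_start    # position of the access
redzone_bytes = 256 - valid_bytes                # tail tagged 0xfe

print(hex(pointer_tag(fault_addr)), hex(reset_tag(fault_addr)))
print(valid_bytes, offset, redzone_bytes)
```

The access lands at offset 176, which is the first byte past the 176 validly tagged bytes, so the pointer tag 0xf7 meets the redzone tag 0xfe exactly as the report shows.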
      
      Link: https://lkml.kernel.org/r/20210804090957.12393-1-Kuan-Ying.Lee@mediatek.com
      Link: https://lkml.kernel.org/r/20210804090957.12393-2-Kuan-Ying.Lee@mediatek.com
      Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Nicholas Tang <nicholas.tang@mediatek.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6c7a00b8
  3. 13 Aug, 2021 18 commits
    • Merge tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block · 020efdad
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "A few fixes for block that should go into 5.14:
      
         - Revert the mq-deadline cgroup addition. More work is needed on this
           front, let's revert it for now and get it right before having it in
           a released kernel (Tejun)
      
         - blk-iocost lockdep fix (Ming)
      
         - nbd double completion fix (Xie)
      
         - Fix for non-idling when clearing the shared tag flag (Yu)"
      
      * tag 'block-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
        nbd: Aovid double completion of a request
        blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED
        Revert "block/mq-deadline: Add cgroup support"
        blk-iocost: fix lockdep warning on blkcg->lock
      020efdad
    • Merge tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block · 42995cee
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "A bit bigger than the previous weeks, but mostly just a few stable
        bound fixes. In detail:
      
         - Followup fixes to patches from last week for io-wq, turns out they
           weren't complete (Hao)
      
         - Two lockdep reported fixes out of the RT camp (me)
      
         - Sync the io_uring-cp example with liburing, as a few bug fixes
           never made it to the kernel carried version (me)
      
         - SQPOLL related TIF_NOTIFY_SIGNAL fix (Nadav)
      
         - Use WRITE_ONCE() when writing sq flags (Nadav)
      
         - io_rsrc_put_work() deadlock fix (Pavel)"
      
      * tag 'io_uring-5.14-2021-08-13' of git://git.kernel.dk/linux-block:
        tools/io_uring/io_uring-cp: sync with liburing example
        io_uring: fix ctx-exit io_rsrc_put_work() deadlock
        io_uring: drop ctx->uring_lock before flushing work item
        io-wq: fix IO_WORKER_F_FIXED issue in create_io_worker()
        io-wq: fix bug of creating io-wokers unconditionally
        io_uring: rsrc ref lock needs to be IRQ safe
        io_uring: Use WRITE_ONCE() when writing to sq_flags
        io_uring: clear TIF_NOTIFY_SIGNAL when running task work
      42995cee
    • Merge tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 462938cd
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "An assortment of pin control fixes of varying importance, the most
        important ones affecting Intel and AMD laptops turned up the recent
        few days so it's time to push this to your tree.
      
         - Fix the Kconfig dependency for Qualcomm SM8350 pin controller
      
         - Fix pin biasing fallback behaviour on the Mediatek pin controller
      
         - Fix the GPIO numbering scheme for Intel Tiger Lake-H to correspond
           to the products that are now actually out on the market
      
         - Fix an out-of-bounds access bug in the pin control function
           itemization in the Sunxi driver

         - Fix clock disabling for the RISC-V K210 pin controller on the
           error path

         - Fix a system shutdown bug affecting AMD Ryzen-based laptops: the
           system would not suspend but just bounce back up"
      
      * tag 'pinctrl-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: amd: Fix an issue with shutdown when system set to s0ix
        pinctrl: k210: Fix k210_fpioa_probe()
        pinctrl: sunxi: Don't underestimate number of functions
        pinctrl: tigerlake: Fix GPIO mapping for newer version of software
        pinctrl: mediatek: Fix fallback behavior for bias_set_combo
        pinctrl: qcom: fix GPIOLIB dependencies
      462938cd
    • nbd: Aovid double completion of a request · cddce011
      Xie Yongji authored
      There is a race between iterating over requests in
      nbd_clear_que() and completing requests in recv_work(),
      which can lead to double completion of a request.
      
      To fix it, flush the recv worker before iterating over
      the requests and don't abort the completed request
      while iterating.
      
      Fixes: 96d97e17 ("nbd: clear_sock on netlink disconnect")
      Reported-by: Jiang Yadong <jiangyadong@bytedance.com>
      Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20210813151330.96-1-xieyongji@bytedance.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cddce011
    • tools/io_uring/io_uring-cp: sync with liburing example · 8f40d037
      Jens Axboe authored
      This example is missing a few fixes that are in the liburing version;
      synchronize with the upstream version.

      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8f40d037
    • blk-mq: clear active_queues before clearing BLK_MQ_F_TAG_QUEUE_SHARED · 454bb677
      Yu Kuai authored
      We ran a test that deletes and recovers devices frequently (two devices
      on the same host), and we found that 'active_queues' is super big after
      a period of time.

      If device a and device b share a tag set, and a is deleted, then
      blk_mq_exit_queue() will clear BLK_MQ_F_TAG_QUEUE_SHARED because there
      is only one queue that is using the tag set. However, if b is still
      active, the active_queues of b might never be cleared even if b is
      deleted.
      
      Thus clear active_queues before BLK_MQ_F_TAG_QUEUE_SHARED is cleared.
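The counter leak can be reproduced in a small Python model. The classes and method names are hypothetical simplifications; the key assumption is that both the "busy" and the "idle" bookkeeping are skipped once the shared flag is no longer set on a queue, which is what lets the count get stuck.

```python
class TagSet:
    """Hypothetical stand-in for the shared tag set."""

    def __init__(self):
        self.active_queues = 0


class Queue:
    """Hypothetical stand-in for one device queue using the tag set."""

    def __init__(self, tags, shared):
        self.tags, self.shared, self.active = tags, shared, False

    def tag_busy(self):
        if self.shared and not self.active:
            self.active = True
            self.tags.active_queues += 1

    def tag_idle(self):
        if not self.shared:
            return  # assumption: nothing is accounted for unshared queues
        if self.active:
            self.active = False
            self.tags.active_queues -= 1


def scenario(clear_active_first: bool) -> int:
    tags = TagSet()
    a, b = Queue(tags, shared=True), Queue(tags, shared=True)
    a.tag_busy()
    b.tag_busy()
    a.tag_idle()            # device a is deleted while the set is still shared
    if clear_active_first:
        b.tag_idle()        # the fix: drop b's contribution first
    b.shared = False        # only one user left, so the shared flag is cleared
    b.tag_idle()            # device b is deleted later
    return tags.active_queues


print(scenario(clear_active_first=False), scenario(clear_active_first=True))
```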

      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20210731062130.1533893-1-yukuai3@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      454bb677
    • Merge branch 'kvm-tdpmmu-fixes' into kvm-master · 6e949ddb
      Paolo Bonzini authored
      Merge topic branch with fixes for both 5.14-rc6 and 5.15.
      6e949ddb
    • KVM: x86/mmu: Protect marking SPs unsync when using TDP MMU with spinlock · ce25681d
      Sean Christopherson authored
      Add yet another spinlock for the TDP MMU and take it when marking indirect
      shadow pages unsync.  When using the TDP MMU and L1 is running L2(s) with
      nested TDP, KVM may encounter shadow pages for the TDP entries managed by
      L1 (controlling L2) when handling a TDP MMU page fault.  The unsync logic
      is not thread safe, e.g. the kvm_mmu_page fields are not atomic, and
      misbehaves when a shadow page is marked unsync via a TDP MMU page fault,
      which runs with mmu_lock held for read, not write.
      
      Lack of a critical section manifests most visibly as an underflow of
      unsync_children in clear_unsync_child_bit() due to unsync_children being
      corrupted when multiple CPUs write it without a critical section and
      without atomic operations.  But underflow is the best case scenario.  The
      worst case scenario is that unsync_children prematurely hits '0' and
      leads to guest memory corruption due to KVM neglecting to properly sync
      shadow pages.
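The lost update behind the underflow can be illustrated with a Python sketch. It treats `unsync_children` as a plain counter, which is a simplification of the real bookkeeping. The first function spells out one bad interleaving by hand so that the result is deterministic; the second shows the same updates serialized by a lock (a `threading.Lock` standing in for the new spinlock).

```python
import threading


def lost_update_interleaving() -> int:
    """Two CPUs mark a child unsync without a critical section."""
    unsync_children = 0
    cpu0_read = unsync_children      # both CPUs read the old value ...
    cpu1_read = unsync_children
    unsync_children = cpu0_read + 1  # ... and both write back old + 1
    unsync_children = cpu1_read + 1  # CPU1 overwrites CPU0's update
    # Both children are later cleared, one decrement each.
    unsync_children -= 1
    unsync_children -= 1
    return unsync_children           # drops below zero in this model


class ShadowPage:
    """Hypothetical stand-in holding only the counter."""

    def __init__(self):
        self.unsync_children = 0


def mark_unsync_locked(sp, lock, times):
    for _ in range(times):
        with lock:                   # read-modify-write inside the lock
            sp.unsync_children += 1


def locked_total(cpus: int = 2, times: int = 10000) -> int:
    sp, lock = ShadowPage(), threading.Lock()
    threads = [
        threading.Thread(target=mark_unsync_locked, args=(sp, lock, times))
        for _ in range(cpus)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sp.unsync_children


print(lost_update_interleaving(), locked_total())
```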
      
      Use an entirely new spinlock even though piggybacking tdp_mmu_pages_lock
      would functionally be ok.  Usurping the lock could degrade performance when
      building upper level page tables on different vCPUs, especially since the
      unsync flow could hold the lock for a comparatively long time depending on
      the number of indirect shadow pages and the depth of the paging tree.
      
      For simplicity, take the lock for all MMUs, even though KVM could fairly
      easily know that mmu_lock is held for write.  If mmu_lock is held for
      write, there cannot be contention for the inner spinlock, and marking
      shadow pages unsync across multiple vCPUs will be slow enough that
      bouncing the kvm_arch cacheline should be in the noise.
      
      Note, even though L2 could theoretically be given access to its own EPT
      entries, a nested MMU must hold mmu_lock for write and thus cannot race
      against a TDP MMU page fault.  I.e. the additional spinlock only _needs_ to
      be taken by the TDP MMU, as opposed to being taken by any MMU for a VM
      that is running with the TDP MMU enabled.  Holding mmu_lock for read also
      prevents the indirect shadow page from being freed.  But as above, keep
      it simple and always take the lock.
      
      Alternative #1, the TDP MMU could simply pass "false" for can_unsync and
      effectively disable unsync behavior for nested TDP.  Write protecting leaf
      shadow pages is unlikely to noticeably impact traditional L1 VMMs, as such
      VMMs typically don't modify TDP entries, but the same may not hold true for
      non-standard use cases and/or VMMs that are migrating physical pages (from
      L1's perspective).
      
      Alternative #2, the unsync logic could be made thread safe.  In theory,
      simply converting all relevant kvm_mmu_page fields to atomics and using
      atomic bitops for the bitmap would suffice.  However, (a) an in-depth audit
      would be required, (b) the code churn would be substantial, and (c) legacy
      shadow paging would incur additional atomic operations in performance
      sensitive paths for no benefit (to legacy shadow paging).
      
      Fixes: a2855afc ("KVM: x86/mmu: Allow parallel page faults for the TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210812181815.3378104-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ce25681d
    • KVM: x86/mmu: Don't step down in the TDP iterator when zapping all SPTEs · 0103098f
      Sean Christopherson authored
      Set the min_level for the TDP iterator at the root level when zapping all
      SPTEs to optimize the iterator's try_step_down().  Zapping a non-leaf
      SPTE will recursively zap all its children, thus there is no need for the
      iterator to attempt to step down.  This avoids rereading the top-level
      SPTEs after they are zapped by causing try_step_down() to short-circuit.
      
      In most cases, optimizing try_step_down() will be in the noise as the cost
      of zapping SPTEs completely dominates the overall time.  The optimization
      is however helpful if the zap occurs with relatively few SPTEs, e.g. if KVM
      is zapping in response to multiple memslot updates when userspace is adding
      and removing read-only memslots for option ROMs.  In that case, the task
      doing the zapping likely isn't a vCPU thread, but it still holds mmu_lock
      for read and thus can be a noisy neighbor of sorts.

      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210812181414.3376143-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0103098f
    • KVM: x86/mmu: Don't leak non-leaf SPTEs when zapping all SPTEs · 524a1e4e
      Sean Christopherson authored
      Pass "all ones" as the end GFN to signal "zap all" for the TDP MMU and
      really zap all SPTEs in this case.  As is, zap_gfn_range() skips non-leaf
      SPTEs whose range exceeds the range to be zapped.  If shadow_phys_bits is
      not aligned to the range size of top-level SPTEs, e.g. 512gb with 4-level
      paging, the "zap all" flows will skip top-level SPTEs whose range extends
      beyond shadow_phys_bits and leak their SPs when the VM is destroyed.
      
      Use the current upper bound (based on host.MAXPHYADDR) to detect that the
      caller wants to zap all SPTEs, e.g. instead of using the max theoretical
      gfn, 1 << (52 - 12).  The more precise upper bound allows the TDP iterator
      to terminate its walk earlier when running on hosts with MAXPHYADDR < 52.
      
      Add a WARN on kvm->arch.tdp_mmu_pages when the TDP MMU is destroyed to
      help future debuggers should KVM decide to leak SPTEs again.
      
      The bug is most easily reproduced by running (and unloading!) KVM in a
      VM whose host.MAXPHYADDR < 39, as the SPTE for gfn=0 will be skipped.
      
        =============================================================================
        BUG kvm_mmu_page_header (Not tainted): Objects remaining in kvm_mmu_page_header on __kmem_cache_shutdown()
        -----------------------------------------------------------------------------
        Slab 0x000000004d8f7af1 objects=22 used=2 fp=0x00000000624d29ac flags=0x4000000000000200(slab|zone=1)
        CPU: 0 PID: 1582 Comm: rmmod Not tainted 5.14.0-rc2+ #420
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Call Trace:
         dump_stack_lvl+0x45/0x59
         slab_err+0x95/0xc9
         __kmem_cache_shutdown.cold+0x3c/0x158
         kmem_cache_destroy+0x3d/0xf0
         kvm_mmu_module_exit+0xa/0x30 [kvm]
         kvm_arch_exit+0x5d/0x90 [kvm]
         kvm_exit+0x78/0x90 [kvm]
         vmx_exit+0x1a/0x50 [kvm_intel]
         __x64_sys_delete_module+0x13f/0x220
         do_syscall_64+0x3b/0xc0
         entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fixes: faaf05b0 ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
      Cc: stable@vger.kernel.org
      Cc: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210812181414.3376143-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      524a1e4e
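The arithmetic behind the leak can be checked with a small sketch. The helper names below are hypothetical and the skip rule is a simplification of what the commit message describes (every entry above level 1 is treated as non-leaf); it is not the kernel's zap_gfn_range(). With 4-level paging a top-level SPTE spans 2^27 frames (512GB), so any upper bound derived from fewer than 39 physical address bits falls inside the first top-level entry.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* 4KiB pages */
#define TOY_LEVEL_BITS 9    /* 512 entries per table */

/* Number of 4KiB frames spanned by one SPTE at 'level' (1 = 4KiB leaf). */
static uint64_t toy_pages_per_spte(int level)
{
    assert(level >= 1 && level <= 5);
    return 1ULL << ((level - 1) * TOY_LEVEL_BITS);
}

/* First gfn beyond what 'phys_bits' physical address bits can reach. */
static uint64_t toy_max_gfn(int phys_bits)
{
    assert(phys_bits > TOY_PAGE_SHIFT && phys_bits <= 52);
    return 1ULL << (phys_bits - TOY_PAGE_SHIFT);
}

/*
 * Simplified skip rule: a non-leaf SPTE whose range extends past 'end' is
 * skipped, unless the caller asked to zap everything.
 */
static int toy_spte_skipped(uint64_t gfn, int level, uint64_t end, int zap_all)
{
    if (zap_all || level == 1)
        return 0;
    return gfn + toy_pages_per_spte(level) > end;
}
```

With 36 physical address bits the bound is gfn 2^24, so the level-4 entry at gfn 0 (covering 2^27 frames) is skipped unless `zap_all` is set; with 39 bits the bound lands exactly on the entry's end and nothing is skipped.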
    • Paolo Bonzini's avatar
      Merge tag 'kvmarm-fixes-5.14-2' of... · c5e2bf0b
      Paolo Bonzini authored
      Merge tag 'kvmarm-fixes-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
      
      KVM/arm64 fixes for 5.14, take #2
      
      - Plug race between enabling MTE and creating vcpus
      - Fix off-by-one bug when checking whether an address range is RAM
      c5e2bf0b
    • Sean Christopherson's avatar
      KVM: nVMX: Use vmx_need_pf_intercept() when deciding if L0 wants a #PF · 18712c13
      Sean Christopherson authored
      Use vmx_need_pf_intercept() when determining if L0 wants to handle a #PF
      in L2 or if the VM-Exit should be forwarded to L1.  The current logic fails
      to account for the case where #PF is intercepted to handle
      guest.MAXPHYADDR < host.MAXPHYADDR and ends up reflecting all #PFs into
      L1.  At best, L1 will complain and inject the #PF back into L2.  At
      worst, L1 will eat the unexpected fault and cause L2 to hang on infinite
      page faults.
      
      Note, while the bug was technically introduced by the commit that added
      support for the MAXPHYADDR madness, the shame is all on commit
      a0c13434 ("KVM: VMX: introduce vmx_need_pf_intercept").
      
      Fixes: 1dbf5d68 ("KVM: VMX: Add guest physical address check in EPT violation and misconfig")
      Cc: stable@vger.kernel.org
      Cc: Peter Shier <pshier@google.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210812045615.3167686-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      18712c13
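The decision this commit changes can be modelled roughly as follows. This is a sketch under assumptions: `struct toy_vcpu` and the `toy_*` functions are hypothetical, the "old" check is only a guess at a rule that ignores the MAXPHYADDR case, and the real vmx_need_pf_intercept() may consider conditions not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal view of the state the decision depends on. */
struct toy_vcpu {
    bool ept_enabled;       /* false means shadow paging is in use */
    int guest_maxphyaddr;
    int host_maxphyaddr;
};

/* Does L0 itself need to see #PF from the guest? */
static bool toy_need_pf_intercept(const struct toy_vcpu *v)
{
    assert(v != NULL);
    if (!v->ept_enabled)
        return true;        /* shadow paging always needs #PF */
    /* #PF is also intercepted to emulate a smaller guest MAXPHYADDR. */
    return v->guest_maxphyaddr < v->host_maxphyaddr;
}

/* Assumed shape of the old check: the MAXPHYADDR case is not considered. */
static bool toy_l0_wants_pf_old(const struct toy_vcpu *v)
{
    assert(v != NULL);
    return !v->ept_enabled;
}

/* New check: reuse the same predicate that decides the intercept. */
static bool toy_l0_wants_pf_new(const struct toy_vcpu *v)
{
    return toy_need_pf_intercept(v);
}
```

For a vCPU with EPT enabled and guest MAXPHYADDR 36 on a host with 46, the old check returns false (the #PF is reflected into L1) while the new check returns true (L0 handles it).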
    • Junaid Shahid's avatar
      kvm: vmx: Sync all matching EPTPs when injecting nested EPT fault · 85aa8889
      Junaid Shahid authored
      When a nested EPT violation/misconfig is injected into the guest,
      the shadow EPT PTEs associated with that address need to be synced.
      This is done by kvm_inject_emulated_page_fault() before it calls
      nested_ept_inject_page_fault(). However, that will only sync the
      shadow EPT PTE associated with the current L1 EPTP. Since the ASID
      is based on EP4TA rather than the full EPTP, syncing the current
      EPTP is not enough. The SPTEs associated with any other L1 EPTPs
      in the prev_roots cache with the same EP4TA also need to be synced.
      
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Message-Id: <20210806222229.1645356-1-junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      85aa8889
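The distinction between a full EPTP and its EP4TA can be sketched as below. The low 12 bits of an EPTP hold attributes (memory type, page-walk length, accessed/dirty enable) and the remaining bits hold the address of the top-level EPT table, so two EPTPs can differ while sharing an EP4TA. The names `toy_ep4ta()` and `toy_roots_to_sync()` are hypothetical, and the flat array merely stands in for the prev_roots cache mentioned in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 12 bits of an EPTP carry attributes, not address bits. */
#define TOY_EPTP_ATTR_MASK 0xfffULL

/* Strip the attribute bits, leaving the top-level table address. */
static uint64_t toy_ep4ta(uint64_t eptp)
{
    return eptp & ~TOY_EPTP_ATTR_MASK;
}

/* Count cached roots that share an EP4TA with the current EPTP. */
static size_t toy_roots_to_sync(const uint64_t *cached, size_t n,
                                uint64_t current)
{
    size_t i, matches = 0;

    assert(cached != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (toy_ep4ta(cached[i]) == toy_ep4ta(current))
            matches++;      /* same EP4TA, possibly different attributes */
    }
    return matches;
}
```

For example, 0x105e and 0x101e differ only in bit 6 (accessed/dirty enable) and therefore share the EP4TA 0x1000, so a cached root for 0x101e would also need syncing when the current EPTP is 0x105e.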
    • Paolo Bonzini's avatar
      Merge branch 'kvm-vmx-secctl' into kvm-master · 375d1ade
      Paolo Bonzini authored
      Merge common topic branch for 5.14-rc6 and 5.15 merge window.
      375d1ade
    • Paolo Bonzini's avatar
      KVM: x86: remove dead initialization · ffbe17ca
      Paolo Bonzini authored
      hv_vcpu is initialized again a dozen lines below, and at this
      point vcpu->arch.hyperv is not valid.  Remove the initializer.
      
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ffbe17ca
    • Sean Christopherson's avatar
      KVM: x86: Allow guest to set EFER.NX=1 on non-PAE 32-bit kernels · 1383279c
      Sean Christopherson authored
      Remove an ancient restriction that disallowed exposing EFER.NX to the
      guest if EFER.NX=0 on the host, even if NX is fully supported by the CPU.
      The motivation of the check, added by commit 2cc51560 ("KVM: VMX:
      Avoid saving and restoring msr_efer on lightweight vmexit"), was to rule
      out the case of host.EFER.NX=0 and guest.EFER.NX=1 so that KVM could run
      the guest with the host's EFER.NX and thus avoid context switching EFER
      if the only divergence was the NX bit.
      
      Fast forward to today, and KVM has long since stopped running the guest
      with the host's EFER.NX.  Not only does KVM context switch EFER if
      host.EFER.NX=1 && guest.EFER.NX=0, KVM also forces host.EFER.NX=0 &&
      guest.EFER.NX=1 when using shadow paging (to emulate SMEP).  Furthermore,
      the entire motivation for the restriction was made obsolete over a decade
      ago when Intel added dedicated host and guest EFER fields in the VMCS
      (Nehalem timeframe), which reduced the overhead of context switching EFER
      from 400+ cycles (2 * WRMSR + 1 * RDMSR) to a mere ~2 cycles.
      
      In practice, the removed restriction only affects non-PAE 32-bit kernels,
      as EFER.NX is set during boot if NX is supported and the kernel will use
      PAE paging (32-bit or 64-bit), regardless of whether or not the kernel
      will actually use NX itself (mark PTEs non-executable).
      
      Alternatively and/or complementarily, startup_32_smp() in head_32.S could
      be modified to set EFER.NX=1 regardless of paging mode, thus eliminating
      the scenario where NX is supported but not enabled.  However, that runs
      the risk of breaking non-KVM non-PAE kernels (though the risk is very,
      very low as there are no known EFER.NX errata), and also eliminates an
      easy-to-use mechanism for stressing KVM's handling of guest vs. host EFER
      across nested virtualization transitions.
      
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210805183804.1221554-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1383279c
    • Linus Torvalds's avatar
      Merge tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f8e6dfc6
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from netfilter, bpf, can and
        ieee802154.
      
        The size of this is pretty normal, but we got more fixes for 5.14
        changes this week than last week. Nothing major but the trend is the
        opposite of what we like. We'll see how the next week goes..
      
        Current release - regressions:
      
         - r8169: fix ASPM-related link-up regressions
      
         - bridge: fix flags interpretation for extern learn fdb entries
      
         - phy: micrel: fix link detection on ksz87xx switch
      
         - Revert "tipc: Return the correct errno code"
      
         - ptp: fix possible memory leak caused by invalid cast
      
        Current release - new code bugs:
      
         - bpf: add missing bpf_read_[un]lock_trace() for syscall program
      
         - bpf: fix potentially incorrect results with bpf_get_local_storage()
      
         - page_pool: mask the page->signature before the checking, avoid dma
           mapping leaks
      
         - netfilter: nfnetlink_hook: 5 fixes to information in netlink dumps
      
         - bnxt_en: fix firmware interface issues with PTP
      
         - mlx5: Bridge, fix ageing time
      
        Previous releases - regressions:
      
         - linkwatch: fix failure to restore device state across
           suspend/resume
      
         - bareudp: fix invalid read beyond skb's linear data
      
        Previous releases - always broken:
      
         - bpf: fix integer overflow involving bucket_size
      
         - ppp: fix issues when desired interface name is specified via
           netlink
      
         - wwan: mhi_wwan_ctrl: fix possible deadlock
      
         - dsa: microchip: ksz8795: fix a number of VLAN-related bugs
      
         - dsa: drivers: fix broken backpressure in .port_fdb_dump
      
         - dsa: qca: ar9331: make proper initial port defaults
      
        Misc:
      
         - bpf: add lockdown check for probe_write_user helper
      
         - netfilter: conntrack: remove offload_pickup sysctl before 5.14 is
           out
      
         - netfilter: conntrack: collect all entries in one cycle,
           heuristically slow down garbage collection scans on idle systems to
           prevent frequent wake ups"
      
      * tag 'net-5.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
        vsock/virtio: avoid potential deadlock when vsock device remove
        wwan: core: Avoid returning NULL from wwan_create_dev()
        net: dsa: sja1105: unregister the MDIO buses during teardown
        Revert "tipc: Return the correct errno code"
        net: mscc: Fix non-GPL export of regmap APIs
        net: igmp: increase size of mr_ifc_count
        MAINTAINERS: switch to my OMP email for Renesas Ethernet drivers
        tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B packets
        net: pcs: xpcs: fix error handling on failed to allocate memory
        net: linkwatch: fix failure to restore device state across suspend/resume
        net: bridge: fix memleak in br_add_if()
        net: switchdev: zero-initialize struct switchdev_notifier_fdb_info emitted by drivers towards the bridge
        net: bridge: fix flags interpretation for extern learn fdb entries
        net: dsa: sja1105: fix broken backpressure in .port_fdb_dump
        net: dsa: lantiq: fix broken backpressure in .port_fdb_dump
        net: dsa: lan9303: fix broken backpressure in .port_fdb_dump
        net: dsa: hellcreek: fix broken backpressure in .port_fdb_dump
        bpf, core: Fix kernel-doc notation
        net: igmp: fix data-race in igmp_ifc_timer_expire()
        net: Fix memory leak in ieee802154_raw_deliver
        ...
      f8e6dfc6
    • Linus Torvalds's avatar
      Merge tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client · 3a03c67d
      Linus Torvalds authored
      Pull ceph fixes from Ilya Dryomov:
       "A patch to avoid a soft lockup in ceph_check_delayed_caps() from Luis
        and a reference handling fix from Jeff that should address some memory
        corruption reports in the snaprealm area.
      
        Both marked for stable"
      
      * tag 'ceph-for-5.14-rc6' of git://github.com/ceph/ceph-client:
        ceph: take snap_empty_lock atomically with snaprealm refcount change
        ceph: reduce contention in ceph_check_delayed_caps()
      3a03c67d