1. 12 Apr, 2021 5 commits
    • Miklos Szeredi's avatar
      ext2: convert to fileattr · aba405e3
      Miklos Szeredi authored
      Use the fileattr API to let the VFS handle locking, permission checking and
      conversion.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      aba405e3
    • Miklos Szeredi's avatar
      btrfs: convert to fileattr · 97fc2977
      Miklos Szeredi authored
      Use the fileattr API to let the VFS handle locking, permission checking and
      conversion.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: David Sterba <dsterba@suse.com>
      97fc2977
    • Miklos Szeredi's avatar
      ovl: stack fileattr ops · 66dbfabf
      Miklos Szeredi authored
      Add stacking for the fileattr operations.
      
      Add hack for calling security_file_ioctl() for now.  Probably better to
      have a pair of specific hooks for these operations.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      66dbfabf
    • Miklos Szeredi's avatar
      ecryptfs: stack fileattr ops · 97e2dee9
      Miklos Szeredi authored
      Add stacking for the fileattr operations.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      Cc: Tyler Hicks <code@tyhicks.com>
      97e2dee9
    • Miklos Szeredi's avatar
      vfs: add fileattr ops · 4c5b4799
      Miklos Szeredi authored
      There's a substantial amount of boilerplate in filesystems handling
      FS_IOC_[GS]ETFLAGS/ FS_IOC_FS[GS]ETXATTR ioctls.
      
      Also due to userspace buffers being involved in the ioctl API this is
      difficult to stack, as shown by overlayfs issues related to these ioctls.
      
      Introduce a new internal API named "fileattr" (fsxattr can be confused with
      xattr, xflags is inappropriate, since this is more than just flags).
      
      There's significant overlap between flags and xflags and this API handles
      the conversions automatically, so filesystems may choose which one to use.
      
      In ->fileattr_get() a hint is provided to the filesystem whether flags or
      xattr are being requested by userspace, but in this series this hint is
      ignored by all filesystems, since generating all the attributes is cheap.
      
      If a filesystem doesn't implemement the fileattr API, just fall back to
      f_op->ioctl().  When all filesystems are converted, the fallback can be
      removed.
      
      32bit compat ioctls are now handled by the generic code as well.
      Signed-off-by: default avatarMiklos Szeredi <mszeredi@redhat.com>
      4c5b4799
  2. 04 Apr, 2021 2 commits
    • Linus Torvalds's avatar
      Linux 5.12-rc6 · e49d033b
      Linus Torvalds authored
      e49d033b
    • Zheyu Ma's avatar
      firewire: nosy: Fix a use-after-free bug in nosy_ioctl() · 829933ef
      Zheyu Ma authored
      For each device, the nosy driver allocates a pcilynx structure.
      A use-after-free might happen in the following scenario:
      
       1. Open nosy device for the first time and call ioctl with command
          NOSY_IOC_START, then a new client A will be malloced and added to
          doubly linked list.
       2. Open nosy device for the second time and call ioctl with command
          NOSY_IOC_START, then a new client B will be malloced and added to
          doubly linked list.
       3. Call ioctl with command NOSY_IOC_START for client A, then client A
          will be readded to the doubly linked list. Now the doubly linked
          list is messed up.
       4. Close the first nosy device and nosy_release will be called. In
          nosy_release, client A will be unlinked and freed.
       5. Close the second nosy device, and client A will be referenced,
          resulting in UAF.
      
      The root cause of this bug is that the element in the doubly linked list
      is reentered into the list.
      
      Fix this bug by adding a check before inserting a client.  If a client
      is already in the linked list, don't insert it.
      
      The following KASAN report reveals it:
      
         BUG: KASAN: use-after-free in nosy_release+0x1ea/0x210
         Write of size 8 at addr ffff888102ad7360 by task poc
         CPU: 3 PID: 337 Comm: poc Not tainted 5.12.0-rc5+ #6
         Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
         Call Trace:
           nosy_release+0x1ea/0x210
           __fput+0x1e2/0x840
           task_work_run+0xe8/0x180
           exit_to_user_mode_prepare+0x114/0x120
           syscall_exit_to_user_mode+0x1d/0x40
           entry_SYSCALL_64_after_hwframe+0x44/0xae
      
         Allocated by task 337:
           nosy_open+0x154/0x4d0
           misc_open+0x2ec/0x410
           chrdev_open+0x20d/0x5a0
           do_dentry_open+0x40f/0xe80
           path_openat+0x1cf9/0x37b0
           do_filp_open+0x16d/0x390
           do_sys_openat2+0x11d/0x360
           __x64_sys_open+0xfd/0x1a0
           do_syscall_64+0x33/0x40
           entry_SYSCALL_64_after_hwframe+0x44/0xae
      
         Freed by task 337:
           kfree+0x8f/0x210
           nosy_release+0x158/0x210
           __fput+0x1e2/0x840
           task_work_run+0xe8/0x180
           exit_to_user_mode_prepare+0x114/0x120
           syscall_exit_to_user_mode+0x1d/0x40
           entry_SYSCALL_64_after_hwframe+0x44/0xae
      
         The buggy address belongs to the object at ffff888102ad7300 which belongs to the cache kmalloc-128 of size 128
         The buggy address is located 96 bytes inside of 128-byte region [ffff888102ad7300, ffff888102ad7380)
      
      [ Modified to use 'list_empty()' inside proper lock  - Linus ]
      
      Link: https://lore.kernel.org/lkml/1617433116-5930-1-git-send-email-zheyuma97@gmail.com/Reported-and-tested-by: default avatar马哲宇 (Zheyu Ma) <zheyuma97@gmail.com>
      Signed-off-by: default avatarZheyu Ma <zheyuma97@gmail.com>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      829933ef
  3. 03 Apr, 2021 14 commits
  4. 02 Apr, 2021 17 commits
    • Linus Torvalds's avatar
      Merge tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block · d93a0d43
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Remove comment that never came to fruition in 22 years of development
         (Christoph)
      
       - Remove unused request flag (Christoph)
      
       - Fix for null_blk fake timeout handling (Damien)
      
       - Fix for IOCB_NOWAIT being ignored for O_DIRECT on raw bdevs (Pavel)
      
       - Error propagation fix for multiple split bios (Yufen)
      
      * tag 'block-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
        block: remove the unused RQF_ALLOCED flag
        block: update a few comments in uapi/linux/blkpg.h
        block: don't ignore REQ_NOWAIT for direct IO
        null_blk: fix command timeout completion handling
        block: only update parent bi_status when bio fail
      d93a0d43
    • Linus Torvalds's avatar
      Merge tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block · 1faccb63
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Nothing really major in here, and finally nothing really related to
        signals. A few minor fixups related to the threading changes, and some
        general fixes, that's it.
      
        There's the pending gdb-get-confused-about-arch, but that's more of a
        cosmetic issue, nothing that hinder use of it. And given that other
        archs will likely be affected by that oddity too, better to postpone
        any changes there until 5.13 imho"
      
      * tag 'io_uring-5.12-2021-04-02' of git://git.kernel.dk/linux-block:
        io_uring: move reissue into regular IO path
        io_uring: fix EIOCBQUEUED iter revert
        io_uring/io-wq: protect against sprintf overflow
        io_uring: don't mark S_ISBLK async work as unbounded
        io_uring: drop sqd lock before handling signals for SQPOLL
        io_uring: handle setup-failed ctx in kill_timeouts
        io_uring: always go for cancellation spin on exec
      1faccb63
    • Linus Torvalds's avatar
      Merge tag 'acpi-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 0a84c2e4
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix an ACPI tables management issue, an issue related to the
        ACPI enumeration of devices and CPU wakeup in the ACPI processor
        driver.
      
        Specifics:
      
         - Ensure that the memory occupied by ACPI tables on x86 will always
           be reserved to prevent it from being allocated for other purposes
           which was possible in some cases (Rafael Wysocki).
      
         - Fix the ACPI device enumeration code to prevent it from attempting
           to evaluate the _STA control method for devices with unmet
           dependencies which is likely to fail (Hans de Goede).
      
         - Fix the handling of CPU0 wakeup in the ACPI processor driver to
           prevent CPU0 online failures from occurring (Vitaly Kuznetsov)"
      
      * tag 'acpi-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()
        ACPI: scan: Fix _STA getting called on devices with unmet dependencies
        ACPI: tables: x86: Reserve memory occupied by ACPI tables
      0a84c2e4
    • Linus Torvalds's avatar
      Merge tag 'pm-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 9314a0e9
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix a race condition and an ordering issue related to using
        device links in the runtime PM framework and two kerneldoc comments in
        cpufreq.
      
        Specifics:
      
         - Fix race condition related to the handling of supplier devices
           during consumer device probe and fix the order of decrementation of
           two related reference counters in the runtime PM core code handling
           supplier devices (Adrian Hunter).
      
         - Fix kerneldoc comments in cpufreq that have not been updated along
           with the functions documented by them (Geert Uytterhoeven)"
      
      * tag 'pm-5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: runtime: Fix race getting/putting suppliers at probe
        PM: runtime: Fix ordering in pm_runtime_get_suppliers()
        cpufreq: Fix scaling_{available,boost}_frequencies_show() comments
      9314a0e9
    • Christoph Hellwig's avatar
      block: remove the unused RQF_ALLOCED flag · f06c6096
      Christoph Hellwig authored
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f06c6096
    • Christoph Hellwig's avatar
      block: update a few comments in uapi/linux/blkpg.h · b9c6cdc3
      Christoph Hellwig authored
      The big top of the file comment talk about grand plans that never
      happened, so remove them to not confuse the readers.  Also mark the
      devname and volname fields as ignored as they were never used by the
      kernel.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      b9c6cdc3
    • Linus Torvalds's avatar
      Merge tag 'trace-v5.12-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 05de4538
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Fix stack trace entry size to stop showing garbage
      
        The macro that creates both the structure and the format displayed to
        user space for the stack trace event was changed a while ago to fix
        the parsing by user space tooling. But this change also modified the
        structure used to store the stack trace event. It changed the caller
        array field from [0] to [8].
      
        Even though the size in the ring buffer is dynamic and can be
        something other than 8 (user space knows how to handle this), the 8
        extra words was not accounted for when reserving the event on the ring
        buffer, and added 8 more entries, due to the calculation of
        "sizeof(*entry) + nr_entries * sizeof(long)", as the sizeof(*entry)
        now contains 8 entries.
      
        The size of the caller field needs to be subtracted from the size of
        the entry to create the correct allocation size"
      
      * tag 'trace-v5.12-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix stack trace event size
      05de4538
    • Jens Axboe's avatar
      io_uring: move reissue into regular IO path · 230d50d4
      Jens Axboe authored
      It's non-obvious how retry is done for block backed files, when it happens
      off the kiocb done path. It also makes it tricky to deal with the iov_iter
      handling.
      
      Just mark the req as needing a reissue, and handling it from the
      submission path instead. This makes it directly obvious that we're not
      re-importing the iovec from userspace past the submit point, and it means
      that we can just reuse our usual -EAGAIN retry path from the read/write
      handling.
      
      At some point in the future, we'll gain the ability to always reliably
      return -EAGAIN through the stack. A previous attempt on the block side
      didn't pan out and got reverted, hence the need to check for this
      information out-of-band right now.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      230d50d4
    • Rafael J. Wysocki's avatar
      Merge branches 'acpi-tables' and 'acpi-scan' · 91463ebf
      Rafael J. Wysocki authored
      * acpi-tables:
        ACPI: tables: x86: Reserve memory occupied by ACPI tables
      
      * acpi-scan:
        ACPI: scan: Fix _STA getting called on devices with unmet dependencies
      91463ebf
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-cpufreq' · ac1790ad
      Rafael J. Wysocki authored
      * pm-cpufreq:
        cpufreq: Fix scaling_{available,boost}_frequencies_show() comments
      ac1790ad
    • Pavel Begunkov's avatar
      block: don't ignore REQ_NOWAIT for direct IO · f8b78caf
      Pavel Begunkov authored
      If IOCB_NOWAIT is set on submission, then that needs to get propagated to
      REQ_NOWAIT on the block side. Otherwise we completely lose this
      information, and any issuer of IOCB_NOWAIT IO will potentially end up
      blocking on eg request allocation on the storage side.
      Signed-off-by: default avatarPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f8b78caf
    • Kefeng Wang's avatar
      riscv: Make NUMA depend on MMU · 1adbc294
      Kefeng Wang authored
      NUMA is useless when NOMMU, and it leads some build error,
      make it depend on MMU.
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Signed-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      1adbc294
    • Yang Li's avatar
      riscv: remove unneeded semicolon · 9d8c7d92
      Yang Li authored
      Eliminate the following coccicheck warning:
      ./arch/riscv/mm/kasan_init.c:219:2-3: Unneeded semicolon
      Reported-by: default avatarAbaci Robot <abaci@linux.alibaba.com>
      Signed-off-by: default avatarYang Li <yang.lee@linux.alibaba.com>
      Signed-off-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      9d8c7d92
    • Zihao Yu's avatar
      riscv,entry: fix misaligned base for excp_vect_table · ac8d0b90
      Zihao Yu authored
      In RV64, the size of each entry in excp_vect_table is 8 bytes. If the
      base of the table is not 8-byte aligned, loading an entry in the table
      will raise a misaligned exception. Although such exception will be
      handled by opensbi/bbl, this still causes performance degradation.
      Signed-off-by: default avatarZihao Yu <yuzihao@ict.ac.cn>
      Reviewed-by: default avatarAnup Patel <anup@brainfault.org>
      Signed-off-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      ac8d0b90
    • Ben Dooks's avatar
      riscv: evaluate put_user() arg before enabling user access · 285a76bb
      Ben Dooks authored
      The <asm/uaccess.h> header has a problem with put_user(a, ptr) if
      the 'a' is not a simple variable, such as a function. This can lead
      to the compiler producing code as so:
      
      1:	enable_user_access()
      2:	evaluate 'a' into register 'r'
      3:	put 'r' to 'ptr'
      4:	disable_user_acess()
      
      The issue is that 'a' is now being evaluated with the user memory
      protections disabled. So we try and force the evaulation by assigning
      'x' to __val at the start, and hoping the compiler barriers in
       enable_user_access() do the job of ordering step 2 before step 1.
      
      This has shown up in a bug where 'a' sleeps and thus schedules out
      and loses the SR_SUM flag. This isn't sufficient to fully fix, but
      should reduce the window of opportunity. The first instance of this
      we found is in scheudle_tail() where the code does:
      
      $ less -N kernel/sched/core.c
      
      4263  if (current->set_child_tid)
      4264         put_user(task_pid_vnr(current), current->set_child_tid);
      
      Here, the task_pid_vnr(current) is called within the block that has
      enabled the user memory access. This can be made worse with KASAN
      which makes task_pid_vnr() a rather large call with plenty of
      opportunity to sleep.
      Signed-off-by: default avatarBen Dooks <ben.dooks@codethink.co.uk>
      Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
      Suggested-by: default avatarArnd Bergman <arnd@arndb.de>
      
      --
      Changes since v1:
      - fixed formatting and updated the patch description with more info
      
      Changes since v2:
      - fixed commenting on __put_user() (schwab@linux-m68k.org)
      
      Change since v3:
      - fixed RFC in patch title. Should be ready to merge.
      Signed-off-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      285a76bb
    • Kefeng Wang's avatar
      riscv: Drop const annotation for sp · 23c1075a
      Kefeng Wang authored
      The const annotation should not be used for 'sp', or it will
      become read only and lead to bad stack output.
      
      Fixes: dec82277 ("riscv: stacktrace: Move register keyword to beginning of declaration")
      Signed-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: default avatarPalmer Dabbelt <palmerdabbelt@google.com>
      23c1075a
    • Linus Torvalds's avatar
      Merge tag 'lto-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 1678e493
      Linus Torvalds authored
      Pull LTO fix from Kees Cook:
       "It seems that there is a bug in ld.bfd when doing module section
        merging.
      
        As explicit merging is only needed for LTO, the work-around is to only
        do it under LTO, leaving the original section layout choices alone
        under normal builds:
      
         - Only perform explicit module section merges under LTO (Sean
           Christopherson)"
      
      * tag 'lto-v5.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled
      1678e493
  5. 01 Apr, 2021 2 commits
    • Sean Christopherson's avatar
      kbuild: lto: Merge module sections if and only if CONFIG_LTO_CLANG is enabled · 6a3193cd
      Sean Christopherson authored
      Merge module sections only when using Clang LTO. With ld.bfd, merging
      sections does not appear to update the symbol tables for the module,
      e.g. 'readelf -s' shows the value that a symbol would have had, if
      sections were not merged. ld.lld does not show this problem.
      
      The stale symbol table breaks gdb's function disassembler, and presumably
      other things, e.g.
      
        gdb -batch -ex "file arch/x86/kvm/kvm.ko" -ex "disassemble kvm_init"
      
      reads the wrong bytes and dumps garbage.
      
      Fixes: dd277622 ("kbuild: lto: merge module sections")
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarSean Christopherson <seanjc@google.com>
      Reviewed-by: default avatarSami Tolvanen <samitolvanen@google.com>
      Tested-by: default avatarSami Tolvanen <samitolvanen@google.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20210322234438.502582-1-seanjc@google.com
      6a3193cd
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 6905b1dc
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "It's a bit larger than I (and probably you) would like by the time we
        get to -rc6, but perhaps not entirely unexpected since the changes in
        the last merge window were larger than usual.
      
        x86:
         - Fixes for missing TLB flushes with TDP MMU
      
         - Fixes for race conditions in nested SVM
      
         - Fixes for lockdep splat with Xen emulation
      
         - Fix for kvmclock underflow
      
         - Fix srcdir != builddir builds
      
         - Other small cleanups
      
        ARM:
         - Fix GICv3 MMIO compatibility probing
      
         - Prevent guests from using the ARMv8.4 self-hosted tracing
           extension"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)
        KVM: x86: Prevent 'hv_clock->system_time' from going negative in kvm_guest_time_update()
        KVM: x86: disable interrupts while pvclock_gtod_sync_lock is taken
        KVM: x86: reduce pvclock_gtod_sync_lock critical sections
        KVM: SVM: ensure that EFER.SVME is set when running nested guest or on nested vmexit
        KVM: SVM: load control fields from VMCB12 before checking them
        KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
        KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
        KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
        KVM: make: Fix out-of-source module builds
        selftests: kvm: make hardware_disable_test less verbose
        KVM: x86/vPMU: Forbid writing to MSR_F15H_PERF MSRs when guest doesn't have X86_FEATURE_PERFCTR_CORE
        KVM: x86: remove unused declaration of kvm_write_tsc()
        KVM: clean up the unused argument
        tools/kvm_stat: Add restart delay
        KVM: arm64: Fix CPU interface MMIO compatibility detection
        KVM: arm64: Disable guest access to trace filter controls
        KVM: arm64: Hide system instruction access to Trace registers
      6905b1dc