1. 10 Jun, 2021 1 commit
  2. 09 Jun, 2021 4 commits
    • x86/pkru: Write hardware init value to PKRU when xstate is init · 510b80a6
      Thomas Gleixner authored
      When user space brings PKRU into init state, then the kernel handling is
      broken:
      
        T1 user space
           xsave(state)
           state.header.xfeatures &= ~XFEATURE_MASK_PKRU;
           xrstor(state)
      
        T1 -> kernel
           schedule()
             XSAVE(S) -> T1->xsave.header.xfeatures[PKRU] == 0
             T1->flags |= TIF_NEED_FPU_LOAD;
      
             wrpkru();
      
           schedule()
             ...
             pk = get_xsave_addr(&T1->fpu->state.xsave, XFEATURE_PKRU);
             if (pk)
               wrpkru(pk->pkru);
             else
               wrpkru(DEFAULT_PKRU);
      
      Because the xfeatures bit is 0 and therefore the value in the xsave
      storage is not valid, get_xsave_addr() returns NULL and switch_to()
      writes the default PKRU. -> FAIL #1!
      
      So that wrecks any copy_to/from_user() on the way back to user space
      which hits memory which is protected by the default PKRU value.
      
      Assuming that this does not fail (pure luck), T1 goes back to user
      space and, because TIF_NEED_FPU_LOAD is set, it ends up in
      
        switch_fpu_return()
            __fpregs_load_activate()
              if (!fpregs_state_valid()) {
                 load_XSTATE_from_task();
              }
      
      But if nothing touched the FPU between T1 scheduling out and back in,
      then the fpregs_state is still valid which means switch_fpu_return()
      does nothing and just clears TIF_NEED_FPU_LOAD. Back to user space with
      DEFAULT_PKRU loaded. -> FAIL #2!
      
      The fix is simple: if get_xsave_addr() returns NULL then set the
      PKRU value to 0 instead of the restrictive default PKRU value in
      init_pkru_value.
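
      The fallback described above can be sketched in user-space C. This is
      only an illustration, not the kernel code: every sketch_* name is
      hypothetical, and the constants are stand-ins chosen for the example.
      It shows the fixed behaviour, where a component in init state (xfeatures
      bit clear) yields the hardware init value 0 rather than the restrictive
      default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SKETCH_XFEATURE_MASK_PKRU (1ull << 9)
#define SKETCH_DEFAULT_PKRU       0x55555554u /* restrictive default, illustrative value */
#define SKETCH_HW_INIT_PKRU       0u          /* value PKRU holds in init state */

/* Simplified model of a task's saved xstate: feature bitmap plus PKRU slot. */
struct sketch_xsave {
    uint64_t xfeatures;
    uint32_t pkru;
};

/*
 * Models the get_xsave_addr() behaviour described in the changelog:
 * returns NULL when the PKRU bit is clear, because the stored value
 * is then not valid.
 */
static const uint32_t *sketch_get_pkru_addr(const struct sketch_xsave *xs)
{
    assert(xs != NULL);
    return (xs->xfeatures & SKETCH_XFEATURE_MASK_PKRU) ? &xs->pkru : NULL;
}

/* Value to write to PKRU for the incoming task, following the fix. */
static uint32_t sketch_pkru_to_load(const struct sketch_xsave *xs)
{
    const uint32_t *pk = sketch_get_pkru_addr(xs);

    /* Init state means "hardware init value", not SKETCH_DEFAULT_PKRU. */
    return pk ? *pk : SKETCH_HW_INIT_PKRU;
}
```

      With the bit set, sketch_pkru_to_load() returns the saved value; with
      the bit clear it returns 0 regardless of what the stale slot contains.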
      
       [ bp: Massage in minor nitpicks from folks. ]
      
      Fixes: 0cecca9d ("x86/fpu: Eager switch PKRU state")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Tested-by: Babu Moger <babu.moger@amd.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144346.045616965@linutronix.de
    • x86/process: Check PF_KTHREAD and not current->mm for kernel threads · 12f7764a
      Thomas Gleixner authored
      switch_fpu_finish() checks current->mm as indicator for kernel threads.
      That's wrong because kernel threads can temporarily use a mm of a user
      process via kthread_use_mm().
      
      Check the task flags for PF_KTHREAD instead.
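
      To illustrate why the mm pointer is an unreliable indicator, here is a
      small user-space model. The struct layout, the sketch_* names and the
      flag value are assumptions made for the example and are not the
      kernel's task_struct. It contrasts the two checks for a kernel thread
      that has temporarily borrowed a user mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for illustration; stands in for PF_KTHREAD. */
#define SKETCH_PF_KTHREAD 0x00200000u

struct sketch_mm {
    int placeholder; /* contents are irrelevant for this sketch */
};

/* Minimal model of a task: flag word plus (possibly borrowed) mm pointer. */
struct sketch_task {
    unsigned int flags;
    struct sketch_mm *mm;
};

/* Old heuristic: "no mm means kernel thread". Wrong while an mm is borrowed. */
static bool sketch_is_kthread_by_mm(const struct sketch_task *t)
{
    assert(t != NULL);
    return t->mm == NULL;
}

/* Check described in the changelog: look at the task flags instead. */
static bool sketch_is_kthread_by_flags(const struct sketch_task *t)
{
    assert(t != NULL);
    return (t->flags & SKETCH_PF_KTHREAD) != 0;
}
```

      For a kernel thread with a borrowed mm, the mm-based check reports a
      user task while the flag-based check still reports a kernel thread.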
      
      Fixes: 0cecca9d ("x86/fpu: Eager switch PKRU state")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.912645927@linutronix.de
    • x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer · d8778e39
      Andy Lutomirski authored
      Both Intel and AMD consider it to be architecturally valid for XRSTOR to
      fail with #PF but nonetheless change the register state.  The actual
      conditions under which this might occur are unclear [1], but it seems
      plausible that this might be triggered if one sibling thread unmaps a page
      and invalidates the shared TLB while another sibling thread is executing
      XRSTOR on the page in question.
      
      __fpu__restore_sig() can execute XRSTOR while the hardware registers
      are preserved on behalf of a different victim task (using the
      fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but
      modify the registers.
      
      If this happens, then there is a window in which __fpu__restore_sig()
      could schedule out and the victim task could schedule back in without
      reloading its own FPU registers. This would result in part of the FPU
      state that __fpu__restore_sig() was attempting to load leaking into the
      victim task's user-visible state.
      
      Invalidate preserved FPU registers on XRSTOR failure to prevent this
      situation from corrupting any state.
      
      [1] Frequent readers of the errata lists might imagine "complex
          microarchitectural conditions".
      
      Fixes: 1d731e73 ("x86/fpu: Add a fastpath to __fpu__restore_sig()")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.758116583@linutronix.de
    • x86/fpu: Prevent state corruption in __fpu__restore_sig() · 484cea4f
      Thomas Gleixner authored
      The non-compacted slowpath uses __copy_from_user() and copies the entire
      user buffer into the kernel buffer, verbatim.  This means that the kernel
      buffer may now contain entirely invalid state on which XRSTOR will #GP.
      validate_user_xstate_header() can detect some of that corruption, but that
      leaves the onus on callers to clear the buffer.
      
      Prior to XSAVES support, it was possible just to reinitialize the buffer,
      completely, but with supervisor states that is no longer possible as the
      buffer clearing code split got it backwards. Fixing that is possible but
      not corrupting the state in the first place is more robust.
      
      Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
      which validates the XSAVE header contents before copying the actual states
      to the kernel. copy_user_to_xstate() was previously only called for
      compacted-format kernel buffers, but it works for both compacted and
      non-compacted forms.
      
      Using it for the non-compacted form is slower because of multiple
      __copy_from_user() operations, but that cost is less important than robust
      code in an already slow path.
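
      The validate-before-copy idea can be sketched as follows. This is a
      simplified user-space model and not copy_user_to_xstate() itself: the
      sketch_* types, the supported-feature mask and the specific header
      checks are assumptions made for illustration. The point it shows is
      that the destination buffer is only written once the header has passed
      validation, so a rejected input cannot leave invalid state behind.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical values for illustration only. */
#define SKETCH_SUPPORTED_MASK 0x7ull
#define SKETCH_PAYLOAD_BYTES  64

struct sketch_hdr {
    uint64_t xfeatures;
    uint64_t xcomp_bv;
    uint64_t reserved[6];
};

struct sketch_buf {
    struct sketch_hdr hdr;
    unsigned char payload[SKETCH_PAYLOAD_BYTES];
};

/* Assumed checks: only supported features, no compaction, reserved zero. */
static int sketch_validate_hdr(const struct sketch_hdr *h)
{
    if (h->xfeatures & ~SKETCH_SUPPORTED_MASK)
        return -1;
    if (h->xcomp_bv != 0)
        return -1;
    for (size_t i = 0; i < sizeof(h->reserved) / sizeof(h->reserved[0]); i++)
        if (h->reserved[i] != 0)
            return -1;
    return 0;
}

/* Copy src into dst only after the header snapshot has been validated. */
static int sketch_copy_validated(struct sketch_buf *dst,
                                 const struct sketch_buf *src)
{
    struct sketch_hdr hdr;

    assert(dst != NULL && src != NULL);

    /* Snapshot the header first so validation and use see the same bytes. */
    memcpy(&hdr, &src->hdr, sizeof(hdr));
    if (sketch_validate_hdr(&hdr) != 0)
        return -1; /* dst has not been touched */

    dst->hdr = hdr;
    memcpy(dst->payload, src->payload, sizeof(dst->payload));
    return 0;
}
```

      A verbatim copy followed by validation would instead leave the caller
      responsible for cleaning up dst on failure, which is the pattern the
      changelog moves away from.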
      
      [ Changelog polished by Dave Hansen ]
      
      Fixes: b860eb8d ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
      Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.611833074@linutronix.de
  3. 08 Jun, 2021 1 commit
    • x86/ioremap: Map EFI-reserved memory as encrypted for SEV · 8d651ee9
      Tom Lendacky authored
      Some drivers require memory that is marked as EFI boot services
      data. In order for this memory to not be re-used by the kernel
      after ExitBootServices(), efi_mem_reserve() is used to preserve it
      by inserting a new EFI memory descriptor and marking it with the
      EFI_MEMORY_RUNTIME attribute.
      
      Under SEV, memory marked with the EFI_MEMORY_RUNTIME attribute needs to
      be mapped encrypted by Linux, otherwise the kernel might crash at boot
      like below:
      
        EFI Variables Facility v0.08 2004-May-17
        general protection fault, probably for non-canonical address 0x3597688770a868b2: 0000 [#1] SMP NOPTI
        CPU: 13 PID: 1 Comm: swapper/0 Not tainted 5.12.4-2-default #1 openSUSE Tumbleweed
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:efi_mokvar_entry_next
        [...]
        Call Trace:
         efi_mokvar_sysfs_init
         ? efi_mokvar_table_init
         do_one_initcall
         ? __kmalloc
         kernel_init_freeable
         ? rest_init
         kernel_init
         ret_from_fork
      
      Expand the __ioremap_check_other() function to additionally check for
      this other type of boot data reserved at runtime and indicate that it
      should be mapped encrypted for an SEV guest.
      
       [ bp: Massage commit message. ]
      
      Fixes: 58c90902 ("efi: Support for MOK variable config table")
      Reported-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Joerg Roedel <jroedel@suse.de>
      Cc: <stable@vger.kernel.org> # 5.10+
      Link: https://lkml.kernel.org/r/20210608095439.12668-2-joro@8bytes.org
  4. 06 Jun, 2021 11 commits
    • Linux 5.13-rc5 · 614124be
      Linus Torvalds authored
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 90d56a3d
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Five small and fairly minor fixes, all in drivers"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: scsi_devinfo: Add blacklist entry for HPE OPEN-V
        scsi: ufs: ufs-mediatek: Fix HCI version in some platforms
        scsi: qedf: Do not put host in qedf_vport_create() unconditionally
        scsi: lpfc: Fix failure to transmit ABTS on FC link
        scsi: target: core: Fix warning on realtime kernels
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 20e41d9b
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Miscellaneous ext4 bug fixes"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
        ext4: fix no-key deletion for encrypt+casefold
        ext4: fix memory leak in ext4_fill_super
        ext4: fix fast commit alignment issues
        ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
        ext4: fix accessing uninit percpu counter variable with fast_commit
        ext4: fix memory leak in ext4_mb_init_backend on error path.
    • Merge tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · decad3e1
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A set of fixes that have been coming in over the last few weeks, the
        usual mix of fixes:
      
         - DT fixups for TI K3
      
         - SATA drive detection fix for TI DRA7
      
         - Power management fixes and a few build warning removals for OMAP
      
         - OP-TEE fix to use standard API for UUID exporting
      
         - DT fixes for a handful of i.MX boards
      
        And a few other smaller items"
      
      * tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
        arm64: meson: select COMMON_CLK
        soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
        ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
        bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
        ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
        ARM: dts: imx7d-pico: Fix the 'tuning-step' property
        ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
        arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
        arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
        ARM: imx: pm-imx27: Include "common.h"
        arm64: dts: zii-ultra: fix 12V_MAIN voltage
        arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
        arm64: dts: ls1028a: fix memory node
        bus: ti-sysc: Fix am335x resume hang for usb otg module
        ARM: OMAP2+: Fix build warning when mmc_omap is not built
        ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
        ARM: OMAP1: Fix use of possibly uninitialized irq variable
        optee: use export_uuid() to copy client UUID
        arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
        arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
        ...
    • Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · bd7b12aa
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fix our KVM reverse map real-mode handling since we enabled huge
        vmalloc (in some configurations).
      
        Revert a recent change to our IOMMU code which broke some devices.
      
        Fix KVM handling of FSCR on P7/P8, which could have possibly let a
        guest crash its QEMU.
      
        Fix kprobes validation of prefixed instructions across page boundary.
      
        Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
        Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"
      
      * tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
        KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
        powerpc: Fix reverse map real-mode address lookup with huge vmalloc
        powerpc/kprobes: Fix validation of prefixed instructions across page boundary
    • Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 773ac53b
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "A bunch of x86/urgent stuff accumulated for the last two weeks so
        lemme unload it to you.
      
        It should be all totally risk-free, of course. :-)
      
         - Fix out-of-spec hardware (1st gen Hygon) which does not implement
           MSR_AMD64_SEV even though the spec clearly states so, and check
           CPUID bits first.
      
         - Send only one signal to a task when it is a SEGV_PKUERR si_code
           type.
      
         - Do away with all the wankery of reserving X amount of memory in the
           first megabyte to prevent BIOS corrupting it and simply and
           unconditionally reserve the whole first megabyte.
      
         - Make alternatives NOP optimization work at an arbitrary position
           within the patched sequence because the compiler can put
           single-byte NOPs for alignment anywhere in the sequence (32-bit
           retpoline), vs our previous assumption that the NOPs are only
           appended.
      
         - Force-disable ENQCMD[S] instructions support and remove
           update_pasid() because of insufficient protection against FPU state
           modification in an interrupt context, among other xstate horrors
           which are being addressed at the moment. This one limits the
           fallout until proper enablement.
      
         - Use cpu_feature_enabled() in the idxd driver so that it can be
           build-time disabled through the defines in disabled-features.h.
      
         - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
           LVT value is read before APIC initialization so that softlockups
           during boot do not happen at least on one machine.
      
         - Mark all legacy interrupts as legacy vectors when the IO-APIC is
           disabled and when all legacy interrupts are routed through the PIC"
      
      * tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sev: Check SME/SEV support in CPUID first
        x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
        x86/setup: Always reserve the first 1M of RAM
        x86/alternative: Optimize single-byte NOPs at an arbitrary position
        x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
        dmaengine: idxd: Use cpu_feature_enabled()
        x86/thermal: Fix LVT thermal setup for SMI delivery mode
        x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
    • ext4: Only advertise encrypted_casefold when encryption and unicode are enabled · e71f99f2
      Daniel Rosenberg authored
      Encrypted casefolding is only supported when encryption and
      casefolding are both enabled in the config.
      
      Fixes: 471fbbea ("ext4: handle casefolding with encryption")
      Cc: stable@vger.kernel.org # 5.13+
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix no-key deletion for encrypt+casefold · 63e7f128
      Daniel Rosenberg authored
      commit 471fbbea ("ext4: handle casefolding with encryption") is
      missing a few checks for the encryption key which are needed to
      support deleting encrypted casefolded files when the key is not
      present.
      
      This bug made it impossible to delete encrypted+casefolded directories
      without the encryption key, due to errors like:
      
          W         : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key
      
      Repro steps in kvm-xfstests test appliance:
            mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc
            mount /vdc
            mkdir /vdc/dir
            chattr +F /vdc/dir
            keyid=$(head -c 64 /dev/zero | xfs_io -c add_enckey /vdc | awk '{print $NF}')
            xfs_io -c "set_encpolicy $keyid" /vdc/dir
            for i in `seq 1 100`; do
                mkdir /vdc/dir/$i
            done
            xfs_io -c "rm_enckey $keyid" /vdc
            rm -rf /vdc/dir # fails with the bug
      
      Fixes: 471fbbea ("ext4: handle casefolding with encryption")
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix memory leak in ext4_fill_super · afd09b61
      Alexey Makhalov authored
      Buffer head references must be released before calling kill_bdev();
      otherwise the buffer head (and its page referenced by b_data) will not
      be freed by kill_bdev, and subsequently that bh will be leaked.
      
      If blocksizes differ, sb_set_blocksize() will kill current buffers and
      page cache by using kill_bdev(). The super block is then reread using
      the correct blocksize. sb_set_blocksize() didn't fully free the
      superblock page and buffer head; being busy, they were not freed and
      instead leaked.
      
      This can easily be reproduced by calling an infinite loop of:
      
        systemctl start <ext4_on_lvm>.mount, and
        systemctl stop <ext4_on_lvm>.mount
      
      ... since systemd creates a cgroup for each slice which it mounts, and
      the bh leak gets amplified by a dying memory cgroup that also never
      gets freed, and memory consumption is much more easily noticed.
      
      Fixes: ce40733c ("ext4: Check for return value from sb_set_blocksize")
      Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
      Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
      Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: fix fast commit alignment issues · a7ba36bc
      Harshad Shirwadkar authored
      Fast commit recovery data on disk may not be aligned. So, when the
      recovery code reads it, this patch makes sure that fast commit info
      found on-disk is first memcpy-ed into an aligned variable before
      accessing it. As a consequence of it, we also remove some macros that
      could have resulted in unaligned accesses.
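
      The memcpy-into-an-aligned-variable technique can be sketched like
      this. The struct and function names are hypothetical and do not match
      the ext4 fast-commit definitions; on-disk endianness conversion is
      deliberately left out to keep the example focused on alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag/length header, standing in for an on-disk record. */
struct sketch_tl {
    uint16_t tag;
    uint16_t len;
};

/*
 * Read a header from a byte pointer that may be misaligned.
 * Casting p to (struct sketch_tl *) and dereferencing it could be an
 * unaligned access; copying into a properly aligned local avoids that.
 */
static struct sketch_tl sketch_read_tl(const unsigned char *p)
{
    struct sketch_tl tl;

    assert(p != NULL);
    memcpy(&tl, p, sizeof(tl));
    return tl;
}
```

      Callers then work only with the returned copy, so no code path needs
      to dereference the raw, possibly unaligned pointer.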
      
      Cc: stable@kernel.org
      Fixes: 8016e29f ("ext4: fast commit recovery path")
      Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed · 082cd4ec
      Ye Bin authored
      We hit the following BUG_ON when running fsstress with IO fault injection:
      [130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
      [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
      ......
      [130747.334329] Call trace:
      [130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
      [130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
      [130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
      [130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
      [130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
      [130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
      [130747.336995]  ext4_readpage+0x54/0x100 [ext4]
      [130747.337359]  generic_file_buffered_read+0x410/0xae8
      [130747.337767]  generic_file_read_iter+0x114/0x190
      [130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
      [130747.338556]  __vfs_read+0x11c/0x188
      [130747.338851]  vfs_read+0x94/0x150
      [130747.339110]  ksys_read+0x74/0xf0
      
      This patch's modification is according to Jan Kara's suggestion in:
      https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/
      "I see. Now I understand your patch. Honestly, seeing how fragile is trying
      to fix extent tree after split has failed in the middle, I would probably
      go even further and make sure we fix the tree properly in case of ENOSPC
      and EDQUOT (those are easily user triggerable).  Anything else indicates a
      HW problem or fs corruption so I'd rather leave the extent tree as is and
      don't try to fix it (which also means we will not create overlapping
      extents)."
      
      Cc: stable@kernel.org
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  5. 05 Jun, 2021 23 commits
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · f5b6eb1e
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
       "Some more bugfixes from I2C for v5.13. Usual stuff"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: qcom-geni: Suspend and resume the bus during SYSTEM_SLEEP_PM ops
        i2c: qcom-geni: Add shutdown callback for i2c
        i2c: tegra-bpmp: Demote kernel-doc abuses
        i2c: altera: Fix formatting issue in struct and demote unworthy kernel-doc headers
    • Merge tag 'ti-k3-dt-fixes-for-v5.13' of... · b9c112f2
      Olof Johansson authored
      Merge tag 'ti-k3-dt-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux into arm/fixes
      
      Devicetree fixes for TI K3 platforms for v5.13 merge window:
      
      These minor fixes include:
      * Fixups for device tree discovered during yaml conversion
      * Fixups for missing dma-coherent property in j7200
      * Removal of camera sensor node from am65 evm dts to overlay
        as camera sensor boards are variable.
      
      * tag 'ti-k3-dt-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/nmenon/linux:
        arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
        arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
        arm64: dts: ti: k3-*: Rename the TI-SCI node
        arm64: dts: ti: k3-am65-wakeup: Drop un-necessary properties from dmsc node
        arm64: dts: ti: k3-am65-wakeup: Add debug region to TI-SCI node
        arm64: dts: ti: k3-*: Rename the TI-SCI clocks node name
        arm64: dts: ti: j7200-main: Mark Main NAVSS as dma-coherent
        arm64: dts: ti: k3-am654-base-board: remove ov5640
      
      Link: https://lore.kernel.org/r/20210518115634.467vgpbzplal5kou@obituary
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'optee-fix-for-v5.13' of... · 7468bed8
      Olof Johansson authored
      Merge tag 'optee-fix-for-v5.13' of git://git.linaro.org/people/jens.wiklander/linux-tee into arm/fixes
      
      OP-TEE use export_uuid() to copy UUID
      
      * tag 'optee-fix-for-v5.13' of git://git.linaro.org/people/jens.wiklander/linux-tee:
        optee: use export_uuid() to copy client UUID
      
      Link: https://lore.kernel.org/r/20210518100712.GA449561@jade
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'omap-for-v5.13/fixes-pm' of... · 2f3e4eb1
      Olof Johansson authored
      Merge tag 'omap-for-v5.13/fixes-pm' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
      
      PM and build warning fixes for omaps
      
      While chasing system suspend related regressions, I noticed a few other
      PM-related issues that would be good to have fixed:
      
      - UART idling does not always work for hardware autoidle features
      - am335x resume works only the first time unless musb module is loaded
      
      Then there are three patches for omap1 related warnings caused by the gpio
      changes, and one build warning fix for legacy mmc platform code when mmc
      is built as a loadable module.
      
      These can all be merged whenever suitable naturally. I've sent the more
      urgent SATA regression fix separately although it appears in this pull
      request too because of the branches merged.
      
      * tag 'omap-for-v5.13/fixes-pm' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
        ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
        bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
        bus: ti-sysc: Fix am335x resume hang for usb otg module
        ARM: OMAP2+: Fix build warning when mmc_omap is not built
        ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
        ARM: OMAP1: Fix use of possibly uninitialized irq variable
      
      Link: https://lore.kernel.org/r/pull-1622614772-543196@atomide.com
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'omap-for-v5.13/fixes-sata' of... · 94277cb5
      Olof Johansson authored
      Merge tag 'omap-for-v5.13/fixes-sata' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes
      
      Regression fix for TI dra7 SATA not detecting drives
      
      The SATA quirk flags are now missing with the recent removal of legacy
      platform data, and we need to add the quirk flags to detect drives.
      
      * tag 'omap-for-v5.13/fixes-sata' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
        bus: ti-sysc: Fix missing quirk flags for sata
      
      Link: https://lore.kernel.org/r/pull-1622613578-121536@atomide.com
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'amlogic-fixes-v5.13-rc1' of... · 3091a9e7
      Olof Johansson authored
      Merge tag 'amlogic-fixes-v5.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux into arm/fixes
      
      Amlogic fixes for v5.13-rc1
      - arm64: meson: select COMMON_CLK to select a proper implementation of the clock API
      - soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
      
      * tag 'amlogic-fixes-v5.13-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/amlogic/linux:
        arm64: meson: select COMMON_CLK
        soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
      
      Link: https://lore.kernel.org/r/73e76706-f3f4-bebf-10dd-d2ec9106a234@baylibre.com
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge tag 'imx-fixes-5.13' of... · 3a2d3ae0
      Olof Johansson authored
      Merge tag 'imx-fixes-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
      
      i.MX fixes for 5.13:
      
      - Fix missing-prototypes warning of 'imx27_pm_init' in i.MX27 platform
        pm code.
      - A couple of patches from Fabio Estevam to fix 'tuning-step' property
        in imx7d-meerkat96 and imx7d-pico DT.
      - Fix '#gpio-cells' of nxp,pca8574 device in imx6qdl-emcon-avari DT.
      - A couple of patches from Lucas Stach to fix regulator and voltage for
        imx8mq-zii-ultra board.
      - Add missing regulators for imx6q-dhcom to avoid possible instability
        issues.
      - Fix memory-controller settings for fsl-ls1028a DT.
      - Fix RGMII clock and voltage for a couple of fsl-ls1028a-kontron-sl28
        boards.
      - Fix RGMII connection to QCA8334 switch for imx6dl-yapp4 board.
      
      * tag 'imx-fixes-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
        ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
        ARM: dts: imx7d-pico: Fix the 'tuning-step' property
        ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
        arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
        arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
        ARM: imx: pm-imx27: Include "common.h"
        arm64: dts: zii-ultra: fix 12V_MAIN voltage
        arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
        arm64: dts: ls1028a: fix memory node
        ARM: dts: imx6q-dhcom: Add PU,VDD1P1,VDD2P5 regulators
        ARM: dts: imx6dl-yapp4: Fix RGMII connection to QCA8334 switch
      
      Link: https://lore.kernel.org/r/20210527011758.GD8194@dragon
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • Merge branch 'akpm' (patches from Andrew) · e5220dd1
      Linus Torvalds authored
      Merge misc fixes from Andrew Morton:
       "13 patches.
      
        Subsystems affected by this patch series: mips, mm (kfence, debug,
        pagealloc, memory-hotplug, hugetlb, kasan, and hugetlb), init, proc,
        lib, ocfs2, and mailmap"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mailmap: use private address for Michel Lespinasse
        ocfs2: fix data corruption by fallocate
        lib: crc64: fix kernel-doc warning
        mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY
        mm/kasan/init.c: fix doc warning
        proc: add .gitignore for proc-subset-pid selftest
        hugetlb: pass head page to remove_hugetlb_page()
        drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64
        mm/page_alloc: fix counting of free pages after take off from buddy
        mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests()
        pid: take a reference when initializing `cad_pid`
        kfence: use TASK_IDLE when awaiting allocation
        Revert "MIPS: make userspace mapping young by default"
      e5220dd1
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · af8d9eb8
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - Build with '-mno-relax' when using LLVM's linker, which doesn't
         support linker relaxation.
      
       - A fix to build without SiFive's errata.
      
       - A fix to use PAs during init_resources()
      
       - A fix to avoid W+X mappings during boot.
      
      * tag 'riscv-for-linus-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: Fix memblock_free() usages in init_resources()
        riscv: skip errata_cip_453.o if CONFIG_ERRATA_SIFIVE_CIP_453 is disabled
        riscv: mm: Fix W+X mappings at boot
        riscv: Use -mno-relax when using lld linker
      af8d9eb8
    • Junxiao Bi's avatar
      ocfs2: fix data corruption by fallocate · 6bba4471
      Junxiao Bi authored
      When fallocate punches a hole beyond the inode size and the original
      isize is in the middle of the last cluster, the part from isize to the
      end of the cluster is zeroed with a buffered write.  At that time isize
      has not yet been updated to match the new size, so if writeback kicks
      in, it invokes ocfs2_writepage()->block_write_full_page(), where the
      pages beyond the inode size are dropped.  That causes file corruption.
      Fix this by zeroing out the EOF blocks when extending the inode size.
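
      The range that gets zeroed in the scenario above follows from simple
      arithmetic.  A minimal sketch, assuming only a byte-granular isize and
      a fixed cluster size; eof_zero_range() and struct zero_range are
      hypothetical names for illustration, not ocfs2 functions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of ocfs2: describes the byte range from
 * isize up to the end of the cluster that contains isize. */
struct zero_range {
	uint64_t start;	/* first byte to zero (the old isize) */
	uint64_t len;	/* bytes to zero; 0 when isize is cluster aligned */
};

static struct zero_range eof_zero_range(uint64_t isize, uint64_t cluster_size)
{
	struct zero_range r = { .start = isize, .len = 0 };
	uint64_t off = isize % cluster_size;	/* offset of isize in its cluster */

	/* Only a partially used last cluster has a tail to zero. */
	if (off != 0)
		r.len = cluster_size - off;
	return r;
}
```

      For example, with a 65536-byte cluster and isize 100000, the tail is
      31072 bytes long and ends at offset 131072.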
      
      Running the following command with qemu-img 4.2.1 easily produces a
      corrupted converted image file.
      
          qemu-img convert -p -t none -T none -f qcow2 $qcow_image \
                   -O qcow2 -o compat=1.1 $qcow_image.conv
      
      qemu uses fallocate like this: it first punches a hole beyond the inode
      size, then extends the inode size.
      
          fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0
          fallocate(11, 0, 2276196352, 65536) = 0
      
      v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html
      v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/
      
      Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6bba4471
    • YueHaibing's avatar
      lib: crc64: fix kernel-doc warning · 415f0c83
      YueHaibing authored
      Fix W=1 kernel build warning:
      
        lib/crc64.c:40: warning:
         bad line:         or the previous crc64 value if computing incrementally.
      
      Link: https://lkml.kernel.org/r/20210601135851.15444-1-yuehaibing@huawei.com
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reviewed-by: Coly Li <colyli@suse.de>
      Acked-by: Randy Dunlap <rdunlap@infradead.org>
      Tested-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      415f0c83
    • Mina Almasry's avatar
      mm, hugetlb: fix simple resv_huge_pages underflow on UFFDIO_COPY · d84cf06e
      Mina Almasry authored
      The userfaultfd hugetlb tests cause a resv_huge_pages underflow.  This
      happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on
      an index for which we already have a page in the cache.  When this
      happens, we allocate a second page, double consuming the reservation,
      and then fail to insert the page into the cache and return -EEXIST.
      
      To fix this, we first check if there is a page in the cache which
      already consumed the reservation, and return -EEXIST immediately if so.
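
      The effect of the check ordering can be shown with a toy model.  This
      is a sketch under stated assumptions, not the kernel code: struct
      toy_hstate, toy_mcopy_buggy() and toy_mcopy_fixed() are hypothetical,
      and the counter is a signed long so the underflow shows up as -1
      (the real counter is unsigned and would wrap).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_NR_INDEX 4

/* Toy stand-in for the hugetlb state: one cache slot per index and a
 * counter playing the role of resv_huge_pages. */
struct toy_hstate {
	long resv_huge_pages;
	bool in_cache[TOY_NR_INDEX];
};

/* Buggy ordering: allocate first, detect the existing page afterwards. */
static int toy_mcopy_buggy(struct toy_hstate *h, unsigned int idx)
{
	h->resv_huge_pages--;		/* allocation consumes a reservation */
	if (h->in_cache[idx])
		return -EEXIST;		/* too late, counter already decremented */
	h->in_cache[idx] = true;
	return 0;
}

/* Fixed ordering: check the cache before consuming anything. */
static int toy_mcopy_fixed(struct toy_hstate *h, unsigned int idx)
{
	if (h->in_cache[idx])
		return -EEXIST;		/* reservation was consumed earlier */
	h->resv_huge_pages--;
	h->in_cache[idx] = true;
	return 0;
}
```

      Calling either function twice on the same index returns -EEXIST the
      second time, but only the buggy ordering leaves the counter below
      zero.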
      
      There is still a rare condition where we fail to copy the page contents
      AND race with a call for hugetlb_no_page() for this index and again we
      will underflow resv_huge_pages.  That is fixed in a more complicated
      patch not targeted for -stable.
      
      Test:
      
        Hacked the code locally such that resv_huge_pages underflows produce a
        warning, then:
      
        ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10
      	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
        ./tools/testing/selftests/vm/userfaultfd hugetlb 10
      	2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success
      
      Both tests succeed and produce no warnings.  After the tests run, the
      numbers of free and reserved hugepages are correct.
      
      [mike.kravetz@oracle.com: changelog fixes]
      
      Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com
      Fixes: 8fb5debc ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
      Signed-off-by: Mina Almasry <almasrymina@google.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d84cf06e
    • Yu Kuai's avatar
      mm/kasan/init.c: fix doc warning · 7b6889f5
      Yu Kuai authored
      Fix gcc W=1 warning:
      
        mm/kasan/init.c:228: warning: Function parameter or member 'shadow_start' not described in 'kasan_populate_early_shadow'
        mm/kasan/init.c:228: warning: Function parameter or member 'shadow_end' not described in 'kasan_populate_early_shadow'
      
      Link: https://lkml.kernel.org/r/20210603140700.3045298-1-yukuai3@huawei.com
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Acked-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b6889f5
    • David Matlack's avatar
      proc: add .gitignore for proc-subset-pid selftest · 263e88d6
      David Matlack authored
      This new selftest needs an entry in the .gitignore file otherwise git
      will try to track the binary.
      
      Link: https://lkml.kernel.org/r/20210601164305.11776-1-dmatlack@google.com
      Fixes: 268af17a ("selftests: proc: test subset=pid")
      Signed-off-by: David Matlack <dmatlack@google.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      263e88d6
    • Naoya Horiguchi's avatar
      hugetlb: pass head page to remove_hugetlb_page() · 0c5da357
      Naoya Horiguchi authored
      When memory_failure() or soft_offline_page() is called on a tail page of
      some hugetlb page, "BUG: unable to handle page fault" error can be
      triggered.
      
      remove_hugetlb_page() dereferences page->lru, so it assumes that the
      page points to a head page, but one of its callers,
      dissolve_free_huge_page(), provides remove_hugetlb_page() with 'page',
      which could be a tail page.  So pass 'head' to it instead.
      
      Link: https://lkml.kernel.org/r/20210526235257.2769473-1-nao.horiguchi@gmail.com
      Fixes: 6eb4e88a ("hugetlb: create remove_hugetlb_page() to separate functionality")
      Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0c5da357
    • David Hildenbrand's avatar
      drivers/base/memory: fix trying offlining memory blocks with memory holes on aarch64 · 92813053
      David Hildenbrand authored
      offline_pages() properly checks for memory holes and bails out.
      However, we do a page_zone(pfn_to_page(start_pfn)) before calling
      offline_pages() when offlining a memory block.
      
      We should not unconditionally call page_zone(pfn_to_page(start_pfn)) on
      aarch64 in offlining code, otherwise we can trigger a BUG when hitting a
      memory hole:
      
         kernel BUG at include/linux/mm.h:1383!
         Internal error: Oops - BUG: 0 [#1] SMP
         Modules linked in: loop processor efivarfs ip_tables x_tables ext4 mbcache jbd2 dm_mod igb nvme i2c_algo_bit mlx5_core i2c_core nvme_core firmware_class
         CPU: 13 PID: 1694 Comm: ranbug Not tainted 5.12.0-next-20210524+ #4
         Hardware name: MiTAC RAPTOR EV-883832-X3-0001/RAPTOR, BIOS 1.6 06/28/2020
         pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
         pc : memory_subsys_offline+0x1f8/0x250
         lr : memory_subsys_offline+0x1f8/0x250
         Call trace:
           memory_subsys_offline+0x1f8/0x250
           device_offline+0x154/0x1d8
           online_store+0xa4/0x118
           dev_attr_store+0x44/0x78
           sysfs_kf_write+0xe8/0x138
           kernfs_fop_write_iter+0x26c/0x3d0
           new_sync_write+0x2bc/0x4f8
           vfs_write+0x718/0xc88
           ksys_write+0xf8/0x1e0
           __arm64_sys_write+0x74/0xa8
           invoke_syscall.constprop.0+0x78/0x1e8
           do_el0_svc+0xe4/0x298
           el0_svc+0x20/0x30
           el0_sync_handler+0xb0/0xb8
           el0_sync+0x178/0x180
         Kernel panic - not syncing: Oops - BUG: Fatal exception
         SMP: stopping secondary CPUs
         Kernel Offset: disabled
         CPU features: 0x00000251,20000846
         Memory Limit: none
      
      If nr_vmemmap_pages is set, we know that we are dealing with hotplugged
      memory that doesn't have any holes.  So call
      page_zone(pfn_to_page(start_pfn)) only when really necessary -- when
      nr_vmemmap_pages is set and we actually adjust the present pages.
      
      Link: https://lkml.kernel.org/r/20210526075226.5572-1-david@redhat.com
      Fixes: a08a2ae3 ("mm,memory_hotplug: allocate memmap from the added memory range")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reported-by: Qian Cai (QUIC) <quic_qiancai@quicinc.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      92813053
    • Ding Hui's avatar
      mm/page_alloc: fix counting of free pages after take off from buddy · bac9c6fa
      Ding Hui authored
      We recently found that a lot of MemFree is still reported in
      /proc/meminfo after soft-offlining many pages, which is not quite
      correct.
      
      Before Oscar's rework of soft offline for free pages [1], soft-offlined
      free pages were left in the buddy allocator with the HWPoison flag set,
      and NR_FREE_PAGES was not updated immediately.  So the difference
      between NR_FREE_PAGES and the real number of available free pages was
      also large at the beginning.
      
      However, as the workload ran, whenever an allocation function
      subsequently hit a HWPoison page, it removed the page from buddy,
      updated NR_FREE_PAGES and retried, so NR_FREE_PAGES moved closer and
      closer to the real number of available free pages (regardless of
      unpoison_memory()).
      
      Now, when offlining free pages, after a successful call to
      take_page_off_buddy() the page no longer belongs to the buddy
      allocator and will not be used any more, but we fail to account for it
      in NR_FREE_PAGES, and there is no later chance to update it.
      
      Update NR_FREE_PAGES in take_page_off_buddy() as rmqueue() does, but
      avoid double counting if someone has already called
      set_migratetype_isolate() on the page.
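
      The accounting rule can be sketched with a toy model.  Everything here
      is hypothetical (struct toy_zone, struct toy_page,
      toy_take_page_off_buddy()), and it assumes that isolating a free page
      has already subtracted it from the free counter, which is why the
      isolated case must not be counted again:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins, not kernel structures. */
struct toy_zone {
	long nr_free_pages;	/* plays the role of NR_FREE_PAGES */
};

struct toy_page {
	bool in_buddy;		/* page is currently on a free list */
	bool isolated;		/* assumed: already subtracted from the counter */
};

/* Remove a free page from the allocator and keep the counter in sync. */
static bool toy_take_page_off_buddy(struct toy_zone *z, struct toy_page *p)
{
	if (!p->in_buddy)
		return false;	/* nothing to take */

	p->in_buddy = false;
	/* Skip the update for isolated pages to avoid double counting. */
	if (!p->isolated)
		z->nr_free_pages--;
	return true;
}
```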
      
      [1]: commit 06be6ff3 ("mm,hwpoison: rework soft offline for free pages")
      
      Link: https://lkml.kernel.org/r/20210526075247.11130-1-dinghui@sangfor.com.cn
      Fixes: 06be6ff3 ("mm,hwpoison: rework soft offline for free pages")
      Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
      Suggested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bac9c6fa
    • Gerald Schaefer's avatar
      mm/debug_vm_pgtable: fix alignment for pmd/pud_advanced_tests() · 04f7ce3f
      Gerald Schaefer authored
      In pmd/pud_advanced_tests(), the vaddr is aligned up to the next pmd/pud
      entry, and so it does not match the given pmdp/pudp and (aligned down)
      pfn any more.
      
      For s390, this results in memory corruption, because the IDTE
      instruction used e.g.  in xxx_get_and_clear() will take the vaddr for
      some calculations, in combination with the given pmdp.  It will then end
      up with a wrong table origin, ending on ...ff8, and some of those
      wrongly set low-order bits will also select a wrong pagetable level for
      the index addition.  IDTE could therefore invalidate (OR in 0x20) something
      outside of the page tables, depending on the wrongly picked index, which
      in turn depends on the random vaddr.
      
      As a result, we sometimes see "BUG task_struct (Not tainted): Padding
      overwritten" on s390, where one 0x5a padding value got overwritten with
      0x7a.
      
      Fix this by aligning down, similar to how the pmd/pud_aligned pfns are
      calculated.
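
      The difference between the two alignments is easy to see in
      isolation.  A minimal sketch, assuming a 1 MiB region size for
      illustration; TOY_PMD_SIZE, toy_align_down(), toy_align_up() and
      toy_same_entry() are hypothetical names, not the kernel macros:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed region size for the example only. */
#define TOY_PMD_SIZE	(1ULL << 20)
#define TOY_PMD_MASK	(~(TOY_PMD_SIZE - 1))

/* Start of the region that contains vaddr. */
static uint64_t toy_align_down(uint64_t vaddr)
{
	return vaddr & TOY_PMD_MASK;
}

/* Start of the next region, unless vaddr is already aligned. */
static uint64_t toy_align_up(uint64_t vaddr)
{
	return (vaddr + TOY_PMD_SIZE - 1) & TOY_PMD_MASK;
}

/* True when both addresses are covered by the same table entry. */
static bool toy_same_entry(uint64_t a, uint64_t b)
{
	return (a & TOY_PMD_MASK) == (b & TOY_PMD_MASK);
}
```

      For an unaligned vaddr such as 0x12345678, aligning down gives
      0x12300000, which is still covered by the same entry, while aligning
      up gives 0x12400000, which belongs to the next one.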
      
      Link: https://lkml.kernel.org/r/20210525130043.186290-2-gerald.schaefer@linux.ibm.com
      Fixes: a5c3b9ff ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
      Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: <stable@vger.kernel.org>	[5.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04f7ce3f
    • Mark Rutland's avatar
      pid: take a reference when initializing `cad_pid` · 0711f0d7
      Mark Rutland authored
      During boot, kernel_init_freeable() initializes `cad_pid` to the init
      task's struct pid.  Later on, we may change `cad_pid` via a sysctl, and
      when this happens proc_do_cad_pid() will increment the refcount on the
      new pid via get_pid(), and will decrement the refcount on the old pid
      via put_pid().  As we never called get_pid() when we initialized
      `cad_pid`, we decrement a reference we never incremented, and can
      therefore free the init task's struct pid early.  As there can be dangling
      references to the struct pid, we can later encounter a use-after-free
      (e.g.  when delivering signals).
      
      This was spotted when fuzzing v5.13-rc3 with Syzkaller, but seems to
      have been around since the conversion of `cad_pid` to struct pid in
      commit 9ec52099 ("[PATCH] replace cad_pid by a struct pid") from the
      pre-KASAN stone age of v2.6.19.
      
      Fix this by getting a reference to the init task's struct pid when we
      assign it to `cad_pid`.
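
      The imbalance can be reproduced with a toy reference counter.  This is
      a sketch, not the kernel implementation: struct toy_pid and the toy_*
      functions are hypothetical, and the init pid is assumed to start with
      a single reference held by the init task itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct pid with a plain reference count. */
struct toy_pid {
	int count;
	bool freed;	/* set when the last reference is dropped */
};

static struct toy_pid *toy_get_pid(struct toy_pid *p)
{
	if (p)
		p->count++;
	return p;
}

static void toy_put_pid(struct toy_pid *p)
{
	if (p && --p->count == 0)
		p->freed = true;
}

static struct toy_pid *toy_cad_pid;

/* take_ref == false models the old behaviour, true models the fix. */
static void toy_init_cad_pid(struct toy_pid *init_pid, bool take_ref)
{
	toy_cad_pid = take_ref ? toy_get_pid(init_pid) : init_pid;
}

/* Models the sysctl path: reference the new pid, drop the old one. */
static void toy_change_cad_pid(struct toy_pid *new_pid)
{
	struct toy_pid *old = toy_cad_pid;

	toy_cad_pid = toy_get_pid(new_pid);
	toy_put_pid(old);
}
```

      Without the extra reference, the first change of toy_cad_pid drops the
      init task's only reference and marks its pid as freed while the task
      still uses it; with the reference taken at initialization, the count
      returns to 1.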
      
      Full KASAN splat below.
      
         ==================================================================
         BUG: KASAN: use-after-free in ns_of_pid include/linux/pid.h:153 [inline]
         BUG: KASAN: use-after-free in task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
         Read of size 4 at addr ffff23794dda0004 by task syz-executor.0/273
      
         CPU: 1 PID: 273 Comm: syz-executor.0 Not tainted 5.12.0-00001-g9aef892b2d15 #1
         Hardware name: linux,dummy-virt (DT)
         Call trace:
          ns_of_pid include/linux/pid.h:153 [inline]
          task_active_pid_ns+0xc0/0xc8 kernel/pid.c:509
          do_notify_parent+0x308/0xe60 kernel/signal.c:1950
          exit_notify kernel/exit.c:682 [inline]
          do_exit+0x2334/0x2bd0 kernel/exit.c:845
          do_group_exit+0x108/0x2c8 kernel/exit.c:922
          get_signal+0x4e4/0x2a88 kernel/signal.c:2781
          do_signal arch/arm64/kernel/signal.c:882 [inline]
          do_notify_resume+0x300/0x970 arch/arm64/kernel/signal.c:936
          work_pending+0xc/0x2dc
      
         Allocated by task 0:
          slab_post_alloc_hook+0x50/0x5c0 mm/slab.h:516
          slab_alloc_node mm/slub.c:2907 [inline]
          slab_alloc mm/slub.c:2915 [inline]
          kmem_cache_alloc+0x1f4/0x4c0 mm/slub.c:2920
          alloc_pid+0xdc/0xc00 kernel/pid.c:180
          copy_process+0x2794/0x5e18 kernel/fork.c:2129
          kernel_clone+0x194/0x13c8 kernel/fork.c:2500
          kernel_thread+0xd4/0x110 kernel/fork.c:2552
          rest_init+0x44/0x4a0 init/main.c:687
          arch_call_rest_init+0x1c/0x28
          start_kernel+0x520/0x554 init/main.c:1064
          0x0
      
         Freed by task 270:
          slab_free_hook mm/slub.c:1562 [inline]
          slab_free_freelist_hook+0x98/0x260 mm/slub.c:1600
          slab_free mm/slub.c:3161 [inline]
          kmem_cache_free+0x224/0x8e0 mm/slub.c:3177
          put_pid.part.4+0xe0/0x1a8 kernel/pid.c:114
          put_pid+0x30/0x48 kernel/pid.c:109
          proc_do_cad_pid+0x190/0x1b0 kernel/sysctl.c:1401
          proc_sys_call_handler+0x338/0x4b0 fs/proc/proc_sysctl.c:591
          proc_sys_write+0x34/0x48 fs/proc/proc_sysctl.c:617
          call_write_iter include/linux/fs.h:1977 [inline]
          new_sync_write+0x3ac/0x510 fs/read_write.c:518
          vfs_write fs/read_write.c:605 [inline]
          vfs_write+0x9c4/0x1018 fs/read_write.c:585
          ksys_write+0x124/0x240 fs/read_write.c:658
          __do_sys_write fs/read_write.c:670 [inline]
          __se_sys_write fs/read_write.c:667 [inline]
          __arm64_sys_write+0x78/0xb0 fs/read_write.c:667
          __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline]
          invoke_syscall arch/arm64/kernel/syscall.c:49 [inline]
          el0_svc_common.constprop.1+0x16c/0x388 arch/arm64/kernel/syscall.c:129
          do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:168
          el0_svc+0x28/0x38 arch/arm64/kernel/entry-common.c:416
          el0_sync_handler+0x134/0x180 arch/arm64/kernel/entry-common.c:432
          el0_sync+0x154/0x180 arch/arm64/kernel/entry.S:701
      
         The buggy address belongs to the object at ffff23794dda0000
          which belongs to the cache pid of size 224
         The buggy address is located 4 bytes inside of
          224-byte region [ffff23794dda0000, ffff23794dda00e0)
         The buggy address belongs to the page:
         page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4dda0
         head:(____ptrval____) order:1 compound_mapcount:0
         flags: 0x3fffc0000010200(slab|head)
         raw: 03fffc0000010200 dead000000000100 dead000000000122 ffff23794d40d080
         raw: 0000000000000000 0000000000190019 00000001ffffffff 0000000000000000
         page dumped because: kasan: bad access detected
      
         Memory state around the buggy address:
          ffff23794dd9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
          ffff23794dd9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
         >ffff23794dda0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
          ffff23794dda0080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
          ffff23794dda0100: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00
         ==================================================================
      
      Link: https://lkml.kernel.org/r/20210524172230.38715-1-mark.rutland@arm.com
      Fixes: 9ec52099 ("[PATCH] replace cad_pid by a struct pid")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Christian Brauner <christian@brauner.io>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0711f0d7
    • Marco Elver's avatar
      kfence: use TASK_IDLE when awaiting allocation · 8fd0e995
      Marco Elver authored
      Since wait_event() uses TASK_UNINTERRUPTIBLE by default, waiting for an
      allocation counts towards load.  However, for KFENCE, this does not make
      any sense, since there is no busy work we're awaiting.
      
      Instead, use TASK_IDLE via wait_event_idle() to not count towards load.
      
      BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1185565
      Link: https://lkml.kernel.org/r/20210521083209.3740269-1-elver@google.com
      Fixes: 407f1d8c ("kfence: await for allocation using wait_event")
      Signed-off-by: Marco Elver <elver@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: <stable@vger.kernel.org>	[5.12+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8fd0e995
    • Thomas Bogendoerfer's avatar
      Revert "MIPS: make userspace mapping young by default" · 50c25ee9
      Thomas Bogendoerfer authored
      This reverts commit f685a533.
      
      The MIPS cache flush logic needs to know whether the mapping was already
      established to decide how to flush caches.  This is done by checking the
      valid bit in the PTE.  The commit above breaks this logic by setting the
      valid bit in the PTE in new mappings, which causes kernel crashes.
      
      Link: https://lkml.kernel.org/r/20210526094335.92948-1-tsbogend@alpha.franken.de
      Fixes: f685a533 ("MIPS: make userspace mapping young by default")
      Reported-by: Zhou Yanjie <zhouyanjie@wanyeetech.com>
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Huang Pei <huangpei@loongson.cn>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50c25ee9
    • Linus Torvalds's avatar
      Merge tag 'net-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 9d32fa5d
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from bpf, wireless, netfilter and
        wireguard trees.
      
        The bpf vs lockdown+audit fix is the most notable.
      
        Things haven't slowed down just yet, both in terms of regressions in
        current release and largish fixes for older code, but we usually see a
        slowdown only after -rc5.
      
        Current release - regressions:
      
         - virtio-net: fix page faults and crashes when XDP is enabled
      
         - mlx5e: fix HW timestamping with CQE compression, and make sure they
           are only allowed to coexist with capable devices
      
         - stmmac:
            - fix kernel panic due to NULL pointer dereference of
              mdio_bus_data
            - fix double clk unprepare when no PHY device is connected
      
        Current release - new code bugs:
      
         - mt76: a few fixes for the recent MT7921 devices and runtime power
           management
      
        Previous releases - regressions:
      
         - ice:
            - track AF_XDP ZC enabled queues in bitmap to fix copy mode Tx
            - fix allowing VF to request more/less queues via virtchnl
            - correct supported and advertised autoneg by using PHY
              capabilities
            - allow all LLDP packets from PF to Tx
      
         - kbuild: quote OBJCOPY var to keep a pahole call from breaking the build
      
        Previous releases - always broken:
      
         - bpf, lockdown, audit: fix buggy SELinux lockdown permission checks
      
         - mt76: address the recent FragAttack vulnerabilities not covered by
           generic fixes
      
         - ipv6: fix KASAN: slab-out-of-bounds Read in
           fib6_nh_flush_exceptions
      
         - Bluetooth:
            - fix the erroneous flush_work() order, to avoid double free
            - use correct lock to prevent UAF of hdev object
      
         - nfc: fix NULL ptr dereference in llcp_sock_getname() after failed
           connect
      
         - ieee802154: multiple fixes to error checking and return values
      
         - igb: fix XDP with PTP enabled
      
         - intel: add correct exception tracing for XDP
      
         - tls: fix use-after-free when TLS offload device goes down and back
           up
      
         - ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service
      
         - netfilter: nft_ct: skip expectations for confirmed conntrack
      
         - mptcp: fix falling back to TCP in presence of out of order packets
           early in connection lifetime
      
         - wireguard: switch from an O(n) to an O(1) algorithm for maintaining
           peers, fixing stalls and a large memory leak in the process
      
        Misc:
      
         - devlink: correct VIRTUAL port to not have phys_port attributes
      
         - Bluetooth: fix VIRTIO_ID_BT assigned number
      
         - net: return the correct errno code ENOBUFS -> ENOMEM
      
         - wireguard:
            - peer: allocate in kmem_cache saving 25% on peer memory
            - do not use -O3"
      
      * tag 'net-5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
        cxgb4: avoid link re-train during TC-MQPRIO configuration
        sch_htb: fix refcount leak in htb_parent_to_leaf_offload
        wireguard: allowedips: free empty intermediate nodes when removing single node
        wireguard: allowedips: allocate nodes in kmem_cache
        wireguard: allowedips: remove nodes in O(1)
        wireguard: allowedips: initialize list head in selftest
        wireguard: peer: allocate in kmem_cache
        wireguard: use synchronize_net rather than synchronize_rcu
        wireguard: do not use -O3
        wireguard: selftests: make sure rp_filter is disabled on vethc
        wireguard: selftests: remove old conntrack kconfig value
        virtchnl: Add missing padding to virtchnl_proto_hdrs
        ice: Allow all LLDP packets from PF to Tx
        ice: report supported and advertised autoneg using PHY capabilities
        ice: handle the VF VSI rebuild failure
        ice: Fix VFR issues for AVF drivers that expect ATQLEN cleared
        ice: Fix allowing VF to request more/less queues via virtchnl
        virtio-net: fix for skb_over_panic inside big mode
        ipv6: Fix KASAN: slab-out-of-bounds Read in fib6_nh_flush_exceptions
        fib: Return the correct errno code
        ...
      9d32fa5d