  1. 18 Jun, 2021 1 commit
    • x86/mm: Avoid truncating memblocks for SGX memory · 28e5e44a
      Fan Du authored
      tl;dr:
      
      Several SGX users reported seeing the following message on NUMA systems:
      
        sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
      
      This turned out to be the memblock code mistakenly throwing away SGX
      memory.
      
      === Full Changelog ===
      
      The 'max_pfn' variable represents the highest known RAM address.  It can
      be used, for instance, to quickly determine for which physical addresses
      there is mem_map[] space allocated.  The numa_meminfo code makes an
      effort to throw out ("trim") all memory blocks which are above 'max_pfn'.
      
      SGX memory is not considered RAM (it is marked as "Reserved" in the
      e820) and is not taken into account by max_pfn. Despite this, SGX memory
      areas have NUMA affinity and are enumerated in the ACPI SRAT table. The
      existing SGX code uses the numa_meminfo mechanism to look up the NUMA
      affinity for its memory areas.
      
      In cases where SGX memory is above max_pfn (usually just the one EPC
      section in the highest-numbered NUMA node), the numa_memblock is truncated
      at 'max_pfn', which is below the SGX memory.  When the SGX code tries to
      look up the affinity of this memory, it fails and produces an error message:
      
        sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
      
      and assigns the memory to NUMA node 0.
      
      Instead of silently truncating the memory block at 'max_pfn' and
      dropping the SGX memory, add the truncated portion to
      'numa_reserved_meminfo'.  This allows the SGX code to later determine
      the NUMA affinity of its 'Reserved' area.
      
      Before, numa_meminfo looked like this (from 'crash'):
      
        blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
              { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
      
      numa_reserved_meminfo is empty.
      
      With this, numa_meminfo looks like this:
      
        blk = { start =          0x0, end = 0x2080000000, nid = 0x0 }
              { start = 0x2080000000, end = 0x4000000000, nid = 0x1 }
      
      and numa_reserved_meminfo has an entry for node 1's SGX memory:
      
        blk =  { start = 0x4000000000, end = 0x4080000000, nid = 0x1 }
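
The trimming idea can be sketched in user-space C. This is a simplified model under stated assumptions, not the kernel's implementation: `struct blk`, `struct meminfo`, `meminfo_add()` and `trim_to_limit()` are hypothetical names, and the untrimmed node 1 range used in the usage note (ending at 0x4080000000) is inferred from the dumps above rather than stated in the changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a NUMA memory block; not the kernel's layout. */
struct blk {
    uint64_t start, end;   /* physical range [start, end) */
    int nid;               /* NUMA node id */
};

#define MAX_BLKS 8

struct meminfo {
    size_t nr;
    struct blk blk[MAX_BLKS];
};

/* Append a block; returns 0 on success, -1 if the table is full. */
int meminfo_add(struct meminfo *mi, uint64_t start, uint64_t end, int nid)
{
    assert(mi != NULL);
    if (mi->nr >= MAX_BLKS)
        return -1;
    mi->blk[mi->nr].start = start;
    mi->blk[mi->nr].end = end;
    mi->blk[mi->nr].nid = nid;
    mi->nr++;
    return 0;
}

/*
 * Clamp every block in 'mi' to end at or below 'limit'. The part of a
 * block above 'limit' is recorded in 'reserved' instead of being dropped.
 */
void trim_to_limit(struct meminfo *mi, struct meminfo *reserved, uint64_t limit)
{
    size_t out = 0;

    for (size_t i = 0; i < mi->nr; i++) {
        struct blk b = mi->blk[i];

        if (b.end > limit) {
            uint64_t tail_start = b.start > limit ? b.start : limit;

            meminfo_add(reserved, tail_start, b.end, b.nid);
            b.end = limit;
        }
        if (b.start < b.end)      /* keep only non-empty remainders */
            mi->blk[out++] = b;
    }
    mi->nr = out;
}
```

Feeding this model a node 0 block of 0x0 to 0x2080000000 and an assumed node 1 block of 0x2080000000 to 0x4080000000 with a limit of 0x4000000000 leaves the two blocks shown for numa_meminfo and puts the 0x4000000000 to 0x4080000000 tail, tagged with node 1, into the reserved table.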
      
       [ daveh: completely rewrote/reworked changelog ]
      
      Fixes: 5d30f92e ("x86/NUMA: Provide a range-to-target_node lookup facility")
      Reported-by: Reinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: Fan Du <fan.du@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Reviewed-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20210617194657.0A99CB22@viggo.jf.intel.com
  2. 15 Jun, 2021 1 commit
  3. 10 Jun, 2021 1 commit
  4. 09 Jun, 2021 4 commits
    • x86/pkru: Write hardware init value to PKRU when xstate is init · 510b80a6
      Thomas Gleixner authored
      When user space brings PKRU into init state, the kernel handling is
      broken:
      
        T1 user space
           xsave(state)
           state.header.xfeatures &= ~XFEATURE_MASK_PKRU;
           xrstor(state)
      
        T1 -> kernel
           schedule()
             XSAVE(S) -> T1->xsave.header.xfeatures[PKRU] == 0
             T1->flags |= TIF_NEED_FPU_LOAD;
      
             wrpkru();
      
           schedule()
             ...
             pk = get_xsave_addr(&T1->fpu->state.xsave, XFEATURE_PKRU);
             if (pk)
               wrpkru(pk->pkru);
             else
               wrpkru(DEFAULT_PKRU);
      
      Because the xfeatures bit is 0 and therefore the value in the xsave
      storage is not valid, get_xsave_addr() returns NULL and switch_to()
      writes the default PKRU. -> FAIL #1!
      
      So that wrecks any copy_to/from_user() on the way back to user space
      which hits memory which is protected by the default PKRU value.
      
      Assuming that this does not fail (pure luck), T1 goes back to user
      space and, because TIF_NEED_FPU_LOAD is set, ends up in
      
        switch_fpu_return()
            __fpregs_load_activate()
              if (!fpregs_state_valid()) {
               load_XSTATE_from_task();
              }
      
      But if nothing touched the FPU between T1 scheduling out and back in,
      then the fpregs_state is still valid which means switch_fpu_return()
      does nothing and just clears TIF_NEED_FPU_LOAD. Back to user space with
      DEFAULT_PKRU loaded. -> FAIL #2!
      
      The fix is simple: if get_xsave_addr() returns NULL then set the
      PKRU value to 0 instead of the restrictive default PKRU value in
      init_pkru_value.
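
A small user-space model of that decision, written in C, might look like the sketch below. It is an illustration only: `struct xsave_model`, `pkru_addr()` and `pkru_for_switch()` are hypothetical names, and the struct is not the real XSAVE layout. The only behaviour taken from the changelog is that a cleared feature bit means "init state", for which the value to write is 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bit used by this sketch to mark the PKRU component as valid. */
#define FEATURE_PKRU_BIT (UINT64_C(1) << 9)

/* Hypothetical, heavily simplified stand-in for a task's XSAVE buffer. */
struct xsave_model {
    uint64_t xfeatures;   /* bitmap of components holding valid data */
    uint32_t pkru;        /* only meaningful if FEATURE_PKRU_BIT is set */
};

/* Models the get_xsave_addr() idea: NULL means "component is in init state". */
const uint32_t *pkru_addr(const struct xsave_model *xs)
{
    assert(xs != NULL);
    return (xs->xfeatures & FEATURE_PKRU_BIT) ? &xs->pkru : NULL;
}

/* Value to write to the PKRU register when switching to this task. */
uint32_t pkru_for_switch(const struct xsave_model *xs)
{
    const uint32_t *pk = pkru_addr(xs);

    /* Init state: use the hardware init value 0, not a restrictive default. */
    return pk ? *pk : 0;
}
```

In this model a task whose feature bit is set gets its saved value back, while a task that put PKRU into init state gets 0 regardless of what stale data sits in the buffer.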
      
       [ bp: Massage in minor nitpicks from folks. ]
      
      Fixes: 0cecca9d ("x86/fpu: Eager switch PKRU state")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Tested-by: Babu Moger <babu.moger@amd.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144346.045616965@linutronix.de
    • x86/process: Check PF_KTHREAD and not current->mm for kernel threads · 12f7764a
      Thomas Gleixner authored
      switch_fpu_finish() checks current->mm as indicator for kernel threads.
      That's wrong because kernel threads can temporarily use a mm of a user
      process via kthread_use_mm().
      
      Check the task flags for PF_KTHREAD instead.
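
The difference between the two checks can be shown with a toy C model. `struct task_model`, `struct mm_model` and both helper functions are hypothetical; only the idea (a kernel thread may temporarily have a non-NULL mm, so the flag is the reliable indicator) comes from the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag value used for illustration in this model. */
#define PF_KTHREAD 0x00200000u

struct mm_model { int unused; };

/* Hypothetical, minimal stand-in for a task descriptor. */
struct task_model {
    unsigned int flags;
    struct mm_model *mm;   /* a kthread may borrow a user mm temporarily */
};

/* Old heuristic: "no mm means kernel thread". Wrong while an mm is borrowed. */
bool is_kthread_by_mm(const struct task_model *t)
{
    assert(t != NULL);
    return t->mm == NULL;
}

/* Flag-based check: independent of whether an mm is currently attached. */
bool is_kthread_by_flags(const struct task_model *t)
{
    assert(t != NULL);
    return (t->flags & PF_KTHREAD) != 0;
}
```

For a kernel thread that currently borrows a user mm, the mm-based check reports "not a kernel thread" while the flag-based check still identifies it correctly.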
      
      Fixes: 0cecca9d ("x86/fpu: Eager switch PKRU state")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.912645927@linutronix.de
    • x86/fpu: Invalidate FPU state after a failed XRSTOR from a user buffer · d8778e39
      Andy Lutomirski authored
      Both Intel and AMD consider it to be architecturally valid for XRSTOR to
      fail with #PF but nonetheless change the register state.  The actual
      conditions under which this might occur are unclear [1], but it seems
      plausible that this might be triggered if one sibling thread unmaps a page
      and invalidates the shared TLB while another sibling thread is executing
      XRSTOR on the page in question.
      
      __fpu__restore_sig() can execute XRSTOR while the hardware registers
      are preserved on behalf of a different victim task (using the
      fpu_fpregs_owner_ctx mechanism), and, in theory, XRSTOR could fail but
      modify the registers.
      
      If this happens, then there is a window in which __fpu__restore_sig()
      could schedule out and the victim task could schedule back in without
      reloading its own FPU registers. This would result in part of the FPU
      state that __fpu__restore_sig() was attempting to load leaking into the
      victim task's user-visible state.
      
      Invalidate preserved FPU registers on XRSTOR failure to prevent this
      situation from corrupting any state.
      
      [1] Frequent readers of the errata lists might imagine "complex
          microarchitectural conditions".
      
      Fixes: 1d731e73 ("x86/fpu: Add a fastpath to __fpu__restore_sig()")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.758116583@linutronix.de
    • x86/fpu: Prevent state corruption in __fpu__restore_sig() · 484cea4f
      Thomas Gleixner authored
      The non-compacted slowpath uses __copy_from_user() and copies the entire
      user buffer into the kernel buffer, verbatim.  This means that the kernel
      buffer may now contain entirely invalid state on which XRSTOR will #GP.
      validate_user_xstate_header() can detect some of that corruption, but that
      leaves the onus on callers to clear the buffer.
      
      Prior to XSAVES support, it was possible to simply reinitialize the
      buffer completely, but with supervisor states that is no longer possible
      because the buffer clearing code split got it backwards. Fixing that is
      possible, but not corrupting the state in the first place is more robust.
      
      Avoid corruption of the kernel XSAVE buffer by using copy_user_to_xstate()
      which validates the XSAVE header contents before copying the actual states
      to the kernel. copy_user_to_xstate() was previously only called for
      compacted-format kernel buffers, but it works for both compacted and
      non-compacted forms.
      
      Using it for the non-compacted form is slower because of multiple
      __copy_from_user() operations, but that cost is less important than robust
      code in an already slow path.
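
The "validate first, then copy" pattern can be sketched in plain C as follows. This is a user-space model, not copy_user_to_xstate(): the structures, the `SUPPORTED_FEATURES` mask, the buffer size and the function names are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SUPPORTED_FEATURES UINT64_C(0x7)   /* hypothetical feature mask */
#define STATE_BYTES 64                     /* hypothetical payload size */

struct hdr_model {
    uint64_t xfeatures;   /* components claimed present by the source */
    uint64_t xcomp_bv;    /* must be 0 in this model (non-compacted input) */
};

struct buf_model {
    struct hdr_model hdr;
    unsigned char state[STATE_BYTES];
};

/* Returns 1 if the header only claims supported, non-compacted state. */
int hdr_valid(const struct hdr_model *h)
{
    assert(h != NULL);
    if (h->xfeatures & ~SUPPORTED_FEATURES)
        return 0;
    if (h->xcomp_bv != 0)
        return 0;
    return 1;
}

/*
 * Copy 'src' into 'dst' only after the header has been checked.
 * On failure 'dst' is left untouched, so it never holds invalid state.
 */
int copy_validated(struct buf_model *dst, const struct buf_model *src)
{
    struct hdr_model h;

    memcpy(&h, &src->hdr, sizeof(h));   /* snapshot the header first */
    if (!hdr_valid(&h))
        return -1;

    dst->hdr = h;
    memcpy(dst->state, src->state, STATE_BYTES);
    return 0;
}
```

The design point mirrored here is that a rejected source buffer costs nothing to clean up, because the destination is written only after validation succeeds.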
      
      [ Changelog polished by Dave Hansen ]
      
      Fixes: b860eb8d ("x86/fpu/xstate: Define new functions for clearing fpregs and xstates")
      Reported-by: syzbot+2067e764dbcd10721e2e@syzkaller.appspotmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Rik van Riel <riel@surriel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210608144345.611833074@linutronix.de
  5. 08 Jun, 2021 1 commit
    • x86/ioremap: Map EFI-reserved memory as encrypted for SEV · 8d651ee9
      Tom Lendacky authored
      Some drivers require memory that is marked as EFI boot services
      data. In order for this memory to not be re-used by the kernel
      after ExitBootServices(), efi_mem_reserve() is used to preserve it
      by inserting a new EFI memory descriptor and marking it with the
      EFI_MEMORY_RUNTIME attribute.
      
      Under SEV, memory marked with the EFI_MEMORY_RUNTIME attribute needs to
      be mapped encrypted by Linux, otherwise the kernel might crash at boot
      like below:
      
        EFI Variables Facility v0.08 2004-May-17
        general protection fault, probably for non-canonical address 0x3597688770a868b2: 0000 [#1] SMP NOPTI
        CPU: 13 PID: 1 Comm: swapper/0 Not tainted 5.12.4-2-default #1 openSUSE Tumbleweed
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        RIP: 0010:efi_mokvar_entry_next
        [...]
        Call Trace:
         efi_mokvar_sysfs_init
         ? efi_mokvar_table_init
         do_one_initcall
         ? __kmalloc
         kernel_init_freeable
         ? rest_init
         kernel_init
         ret_from_fork
      
      Expand the __ioremap_check_other() function to additionally check for
      this other type of boot data reserved at runtime and indicate that it
      should be mapped encrypted for an SEV guest.
      
       [ bp: Massage commit message. ]
      
      Fixes: 58c90902 ("efi: Support for MOK variable config table")
      Reported-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Joerg Roedel <jroedel@suse.de>
      Cc: <stable@vger.kernel.org> # 5.10+
      Link: https://lkml.kernel.org/r/20210608095439.12668-2-joro@8bytes.org
  6. 06 Jun, 2021 11 commits
    • Linux 5.13-rc5 · 614124be
      Linus Torvalds authored
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 90d56a3d
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Five small and fairly minor fixes, all in drivers"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: scsi_devinfo: Add blacklist entry for HPE OPEN-V
        scsi: ufs: ufs-mediatek: Fix HCI version in some platforms
        scsi: qedf: Do not put host in qedf_vport_create() unconditionally
        scsi: lpfc: Fix failure to transmit ABTS on FC link
        scsi: target: core: Fix warning on realtime kernels
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 20e41d9b
      Linus Torvalds authored
      Pull ext4 fixes from Ted Ts'o:
       "Miscellaneous ext4 bug fixes"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: Only advertise encrypted_casefold when encryption and unicode are enabled
        ext4: fix no-key deletion for encrypt+casefold
        ext4: fix memory leak in ext4_fill_super
        ext4: fix fast commit alignment issues
        ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed
        ext4: fix accessing uninit percpu counter variable with fast_commit
        ext4: fix memory leak in ext4_mb_init_backend on error path.
    • Merge tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · decad3e1
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A set of fixes that have been coming in over the last few weeks, the
        usual mix of fixes:
      
         - DT fixups for TI K3
      
         - SATA drive detection fix for TI DRA7
      
         - Power management fixes and a few build warning removals for OMAP
      
         - OP-TEE fix to use standard API for UUID exporting
      
         - DT fixes for a handful of i.MX boards
      
        And a few other smaller items"
      
      * tag 'arm-soc-fixes-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits)
        arm64: meson: select COMMON_CLK
        soc: amlogic: meson-clk-measure: remove redundant dev_err call in meson_msr_probe()
        ARM: OMAP1: ams-delta: remove unused function ams_delta_camera_power
        bus: ti-sysc: Fix flakey idling of uarts and stop using swsup_sidle_act
        ARM: dts: imx: emcon-avari: Fix nxp,pca8574 #gpio-cells
        ARM: dts: imx7d-pico: Fix the 'tuning-step' property
        ARM: dts: imx7d-meerkat96: Fix the 'tuning-step' property
        arm64: dts: freescale: sl28: var1: fix RGMII clock and voltage
        arm64: dts: freescale: sl28: var4: fix RGMII clock and voltage
        ARM: imx: pm-imx27: Include "common.h"
        arm64: dts: zii-ultra: fix 12V_MAIN voltage
        arm64: dts: zii-ultra: remove second GEN_3V3 regulator instance
        arm64: dts: ls1028a: fix memory node
        bus: ti-sysc: Fix am335x resume hang for usb otg module
        ARM: OMAP2+: Fix build warning when mmc_omap is not built
        ARM: OMAP1: isp1301-omap: Add missing gpiod_add_lookup_table function
        ARM: OMAP1: Fix use of possibly uninitialized irq variable
        optee: use export_uuid() to copy client UUID
        arm64: dts: ti: k3*: Introduce reg definition for interrupt routers
        arm64: dts: ti: k3-am65|j721e|am64: Map the dma / navigator subsystem via explicit ranges
        ...
    • Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · bd7b12aa
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "Fix our KVM reverse map real-mode handling since we enabled huge
        vmalloc (in some configurations).
      
        Revert a recent change to our IOMMU code which broke some devices.
      
        Fix KVM handling of FSCR on P7/P8, which could have possibly let a
        guest crash its Qemu.
      
        Fix kprobes validation of prefixed instructions across page boundary.
      
        Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas,
        Frederic Barrat, Naveen N. Rao, and Nicholas Piggin"
      
      * tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs"
        KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
        powerpc: Fix reverse map real-mode address lookup with huge vmalloc
        powerpc/kprobes: Fix validation of prefixed instructions across page boundary
    • Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 773ac53b
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
       "A bunch of x86/urgent stuff accumulated for the last two weeks so
        lemme unload it to you.
      
        It should be all totally risk-free, of course. :-)
      
         - Fix out-of-spec hardware (1st gen Hygon) which does not implement
           MSR_AMD64_SEV even though the spec clearly states so, and check
           CPUID bits first.
      
         - Send only one signal to a task when it is a SEGV_PKUERR si_code
           type.
      
         - Do away with all the wankery of reserving X amount of memory in the
           first megabyte to prevent BIOS corrupting it and simply and
           unconditionally reserve the whole first megabyte.
      
         - Make alternatives NOP optimization work at an arbitrary position
           within the patched sequence because the compiler can put
           single-byte NOPs for alignment anywhere in the sequence (32-bit
           retpoline), vs our previous assumption that the NOPs are only
           appended.
      
         - Force-disable ENQCMD[S] instructions support and remove
           update_pasid() because of insufficient protection against FPU state
           modification in an interrupt context, among other xstate horrors
           which are being addressed at the moment. This one limits the
           fallout until proper enablement.
      
         - Use cpu_feature_enabled() in the idxd driver so that it can be
           build-time disabled through the defines in disabled-features.h.
      
         - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
           LVT value is read before APIC initialization so that softlockups
           during boot do not happen at least on one machine.
      
         - Mark all legacy interrupts as legacy vectors when the IO-APIC is
           disabled and when all legacy interrupts are routed through the PIC"
      
      * tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/sev: Check SME/SEV support in CPUID first
        x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
        x86/setup: Always reserve the first 1M of RAM
        x86/alternative: Optimize single-byte NOPs at an arbitrary position
        x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
        dmaengine: idxd: Use cpu_feature_enabled()
        x86/thermal: Fix LVT thermal setup for SMI delivery mode
        x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
    • ext4: Only advertise encrypted_casefold when encryption and unicode are enabled · e71f99f2
      Daniel Rosenberg authored
      Encrypted casefolding is only supported when encryption and
      casefolding are both enabled in the config.
      
      Fixes: 471fbbea ("ext4: handle casefolding with encryption")
      Cc: stable@vger.kernel.org # 5.13+
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix no-key deletion for encrypt+casefold · 63e7f128
      Daniel Rosenberg authored
      commit 471fbbea ("ext4: handle casefolding with encryption") is
      missing a few checks for the encryption key which are needed to
      support deleting encrypted casefolded files when the key is not
      present.
      
      This bug made it impossible to delete encrypted+casefolded directories
      without the encryption key, due to errors like:
      
          W         : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key
      
      Repro steps in kvm-xfstests test appliance:
            mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc
            mount /vdc
            mkdir /vdc/dir
            chattr +F /vdc/dir
            keyid=$(head -c 64 /dev/zero | xfs_io -c add_enckey /vdc | awk '{print $NF}')
            xfs_io -c "set_encpolicy $keyid" /vdc/dir
            for i in `seq 1 100`; do
                mkdir /vdc/dir/$i
            done
            xfs_io -c "rm_enckey $keyid" /vdc
            rm -rf /vdc/dir # fails with the bug
      
      Fixes: 471fbbea ("ext4: handle casefolding with encryption")
      Signed-off-by: Daniel Rosenberg <drosen@google.com>
      Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix memory leak in ext4_fill_super · afd09b61
      Alexey Makhalov authored
      Buffer head references must be released before calling kill_bdev();
      otherwise the buffer head (and its page referenced by b_data) will not
      be freed by kill_bdev, and subsequently that bh will be leaked.
      
      If blocksizes differ, sb_set_blocksize() will kill current buffers and
      page cache by using kill_bdev(), and the superblock is then reread
      using the correct blocksize. sb_set_blocksize() didn't fully free the
      superblock page and buffer head; being busy, they were not freed and
      were instead leaked.
      
      This can easily be reproduced by calling an infinite loop of:
      
        systemctl start <ext4_on_lvm>.mount, and
        systemctl stop <ext4_on_lvm>.mount
      
      ... since systemd creates a cgroup for each slice which it mounts, and
      the bh leak gets amplified by a dying memory cgroup that also never
      gets freed, so memory consumption is much more easily noticed.
      
      Fixes: ce40733c ("ext4: Check for return value from sb_set_blocksize")
      Fixes: ac27a0ec ("ext4: initial copy of files from ext3")
      Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
      Signed-off-by: Alexey Makhalov <amakhalov@vmware.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
    • ext4: fix fast commit alignment issues · a7ba36bc
      Harshad Shirwadkar authored
      Fast commit recovery data on disk may not be aligned. So, when the
      recovery code reads it, this patch makes sure that fast commit info
      found on-disk is first memcpy-ed into an aligned variable before
      accessing it. As a consequence, we also remove some macros that
      could have resulted in unaligned accesses.
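
The memcpy-into-an-aligned-variable technique looks roughly like this in portable C. `struct tag_model` and `read_tag()` are hypothetical and do not reflect the real ext4 fast commit on-disk layout; the sketch only shows how to avoid dereferencing a possibly misaligned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header for this sketch; not the ext4 on-disk format. */
struct tag_model {
    uint16_t tag;
    uint16_t len;
};

/*
 * Read a header that starts at an arbitrary byte offset. Casting
 * 'buf + off' to a struct pointer could produce an unaligned access;
 * copying the bytes into a properly aligned local avoids that.
 */
struct tag_model read_tag(const unsigned char *buf, size_t off)
{
    struct tag_model t;

    assert(buf != NULL);
    memcpy(&t, buf + off, sizeof(t));
    return t;
}
```

Callers then work only with the returned copy, so no field access ever touches the unaligned source buffer directly.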
      
      Cc: stable@kernel.org
      Fixes: 8016e29f ("ext4: fast commit recovery path")
      Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
      Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed · 082cd4ec
      Ye Bin authored
      We got the following BUG_ON when running fsstress with IO fault injection:
      [130747.323114] kernel BUG at fs/ext4/extents_status.c:762!
      [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP
      ......
      [130747.334329] Call trace:
      [130747.334553]  ext4_es_cache_extent+0x150/0x168 [ext4]
      [130747.334975]  ext4_cache_extents+0x64/0xe8 [ext4]
      [130747.335368]  ext4_find_extent+0x300/0x330 [ext4]
      [130747.335759]  ext4_ext_map_blocks+0x74/0x1178 [ext4]
      [130747.336179]  ext4_map_blocks+0x2f4/0x5f0 [ext4]
      [130747.336567]  ext4_mpage_readpages+0x4a8/0x7a8 [ext4]
      [130747.336995]  ext4_readpage+0x54/0x100 [ext4]
      [130747.337359]  generic_file_buffered_read+0x410/0xae8
      [130747.337767]  generic_file_read_iter+0x114/0x190
      [130747.338152]  ext4_file_read_iter+0x5c/0x140 [ext4]
      [130747.338556]  __vfs_read+0x11c/0x188
      [130747.338851]  vfs_read+0x94/0x150
      [130747.339110]  ksys_read+0x74/0xf0
      
      This patch follows Jan Kara's suggestion in:
      https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/
      "I see. Now I understand your patch. Honestly, seeing how fragile is trying
      to fix extent tree after split has failed in the middle, I would probably
      go even further and make sure we fix the tree properly in case of ENOSPC
      and EDQUOT (those are easily user triggerable).  Anything else indicates a
      HW problem or fs corruption so I'd rather leave the extent tree as is and
      don't try to fix it (which also means we will not create overlapping
      extents)."
      
      Cc: stable@kernel.org
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  7. 05 Jun, 2021 21 commits