1. 25 Oct, 2021 10 commits
    • Bob Peterson's avatar
      gfs2: change go_lock to go_instantiate · 3278b977
      Bob Peterson authored
      Before this patch, the go_lock glock operations (glops) did not do
      any actual locking. They were used to instantiate objects, like reading
      in dinodes and rgrps from the media.
      
      This patch renames the functions to go_instantiate for clarity.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      3278b977
    • Bob Peterson's avatar
      gfs2: dump glocks from gfs2_consist_OBJ_i · a739765c
      Bob Peterson authored
      Before this patch, failed consistency checks printed out the object
      that failed, but not the object's glock. This patch makes it also
      print out the object glock so we can see the glock's holders and flags
      to aid with debugging.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      a739765c
    • Bob Peterson's avatar
      gfs2: dequeue iopen holder in gfs2_inode_lookup error · 763766c0
      Bob Peterson authored
      Before this patch, if function gfs2_inode_lookup encountered an error
      after it had locked the iopen glock, it never unlocked it, relying on
      the evict code to do the cleanup.  The evict code then took the
      inode glock while holding the iopen glock, which violates the locking
      order.  For example,
      
       (1) node A does a gfs2_inode_lookup that fails, leaving the iopen glock
           locked.
      
       (2) node B calls delete_work_func -> gfs2_lookup_by_inum ->
           gfs2_inode_lookup.  It locks the inode glock and blocks trying to
           lock the iopen glock, which is held by node A.
      
       (3) node A eventually calls gfs2_evict_inode -> evict_should_delete.
           It blocks trying to lock the inode glock, which is now held by
           node B.
      
      This patch introduces error handling to function gfs2_inode_lookup
      so it properly dequeues held iopen glocks on errors.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      763766c0
    • Andreas Gruenbacher's avatar
      gfs2: Save ip from gfs2_glock_nq_init · b016d9a8
      Andreas Gruenbacher authored
      Before this patch, when a glock was locked by function gfs2_glock_nq_init,
      it initialized the holder gh_ip (return address) as gfs2_glock_nq_init.
      That made it extremely difficult to track down problems because many
      functions call gfs2_glock_nq_init. This patch changes the function so
      that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes
      it easy to backtrack which holder took the lock.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      b016d9a8
    • Bob Peterson's avatar
      gfs2: Allow append and immutable bits to coexist · a500bd31
      Bob Peterson authored
      Before this patch, function do_gfs2_set_flags checked if the append
      and immutable flags were being set while already set. If so, error -EPERM
      was given. There's no reason why these two flags should be mutually
      exclusive, and if you set them separately, you will, in essence, set
      one while it is already set. For example:
      
      chattr +a /mnt/gfs2/file1
      chattr +i /mnt/gfs2/file1
      
      The first command sets the append-only flag. Since they are additive,
      the second command sets the immutable flag AND append-only flag,
      since they both coexist in i_diskflags. So the second command should
      not return an error. This bug caused xfstests generic/545 to fail.
      
      This patch simply removes the invalid checks.
      I also eliminated an unused parm from do_gfs2_set_flags.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      a500bd31
    • Bob Peterson's avatar
      gfs2: Switch some BUG_ON to GLOCK_BUG_ON for debug · c98c2ca5
      Bob Peterson authored
      In rgrp.c, there are several places where it does BUG_ON. This tells us
      the call stack but nothing more, which is not very helpful.
      This patch switches them to GLOCK_BUG_ON which also prints the glock,
      its holders, and many of the rgrp values, which will help us debug
      problems in the future.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      c98c2ca5
    • Bob Peterson's avatar
      gfs2: move GL_SKIP check from glops to do_promote · c1442f6b
      Bob Peterson authored
      Before this patch, each individual "go_lock" glock operation (glop)
      checked the GL_SKIP flag, and if set, would skip further processing.
      
      This patch changes the logic so the go_lock caller, function go_promote,
      checks the GL_SKIP flag before calling the go_lock op in the first place.
      This avoids having to unnecessarily unlock gl_lockref.lock only to
      re-lock it again.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      c1442f6b
    • Bob Peterson's avatar
      gfs2: Add GL_SKIP holder flag to dump_holder · 4c69038d
      Bob Peterson authored
      Somehow, the GL_SKIP flag was missed when dumping glock holders.
      This patch adds it to function hflags2str. I added it at the end because
      I wanted Holder and Skip flags together to read "Hs" rather than "sH"
      to avoid confusion with "Shared" ("SH") holder state.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      4c69038d
    • Bob Peterson's avatar
      gfs2: remove redundant check in gfs2_rgrp_go_lock · 6edb6ba3
      Bob Peterson authored
      Before this patch, function gfs2_rgrp_go_lock checked if GL_SKIP and
      ar_rgrplvb were both true. However, GL_SKIP is only set for rgrps if
      ar_rgrplvb is true (see gfs2_inplace_reserve). This patch simply removes
      the redundant check.
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      6edb6ba3
    • Andreas Gruenbacher's avatar
      gfs2: Fix mmap + page fault deadlocks for direct I/O · b01b2d72
      Andreas Gruenbacher authored
      Also disable page faults during direct I/O requests and implement a
      similar kind of retry logic as in the buffered I/O case.
      
      The retry logic in the direct I/O case differs from the buffered I/O
      case in the following way: direct I/O doesn't provide the kinds of
      consistency guarantees between concurrent reads and writes that buffered
      I/O provides, so once we lose the inode glock while faulting in user
      pages, we always resume the operation.  We never need to return a
      partial read or write.
      
      This locking problem was originally reported by Jan Kara.  Linus came up
      with the idea of disabling page faults.  Many thanks to Al Viro and
      Matthew Wilcox for their feedback.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      b01b2d72
  2. 24 Oct, 2021 6 commits
    • Andreas Gruenbacher's avatar
      iov_iter: Introduce nofault flag to disable page faults · 3337ab08
      Andreas Gruenbacher authored
      Introduce a new nofault flag to indicate to iov_iter_get_pages not to
      fault in user pages.
      
      This is implemented by passing the FOLL_NOFAULT flag to get_user_pages,
      which causes get_user_pages to fail when it would otherwise fault in a
      page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting
      in pages when page faults are not allowed.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      3337ab08
    • Andreas Gruenbacher's avatar
      gup: Introduce FOLL_NOFAULT flag to disable page faults · 55b8fe70
      Andreas Gruenbacher authored
      Introduce a new FOLL_NOFAULT flag that causes get_user_pages to return
      -EFAULT when it would otherwise trigger a page fault.  This is roughly
      similar to FOLL_FAST_ONLY but available on all architectures, and less
      fragile.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      55b8fe70
    • Andreas Gruenbacher's avatar
      iomap: Add done_before argument to iomap_dio_rw · 4fdccaa0
      Andreas Gruenbacher authored
      Add a done_before argument to iomap_dio_rw that indicates how much of
      the request has already been transferred.  When the request succeeds, we
      report that done_before additional bytes were tranferred.  This is
      useful for finishing a request asynchronously when part of the request
      has already been completed synchronously.
      
      We'll use that to allow iomap_dio_rw to be used with page faults
      disabled: when a page fault occurs while submitting a request, we
      synchronously complete the part of the request that has already been
      submitted.  The caller can then take care of the page fault and call
      iomap_dio_rw again for the rest of the request, passing in the number of
      bytes already tranferred.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      4fdccaa0
    • Andreas Gruenbacher's avatar
      iomap: Support partial direct I/O on user copy failures · 97308f8b
      Andreas Gruenbacher authored
      In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the
      IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and
      return a partial result.  This allows the caller to deal with the page
      fault and retry the remainder of the request.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      97308f8b
    • Andreas Gruenbacher's avatar
      iomap: Fix iomap_dio_rw return value for user copies · 42c498c1
      Andreas Gruenbacher authored
      When a user copy fails in one of the helpers of iomap_dio_rw, fail with
      -EFAULT instead of returning 0.  This matches what iomap_dio_bio_actor
      returns when it gets an -EFAULT from bio_iov_iter_get_pages.  With these
      changes, iomap_dio_actor now consistently fails with -EFAULT when a user
      page cannot be faulted in.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Reviewed-by: default avatarDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      42c498c1
    • Andreas Gruenbacher's avatar
      gfs2: Fix mmap + page fault deadlocks for buffered I/O · 00bfe02f
      Andreas Gruenbacher authored
      In the .read_iter and .write_iter file operations, we're accessing
      user-space memory while holding the inode glock.  There is a possibility
      that the memory is mapped to the same file, in which case we'd recurse
      on the same glock.
      
      We could detect and work around this simple case of recursive locking,
      but more complex scenarios exist that involve multiple glocks,
      processes, and cluster nodes, and working around all of those cases
      isn't practical or even possible.
      
      Avoid these kinds of problems by disabling page faults while holding the
      inode glock.  If a page fault would occur, we either end up with a
      partial read or write or with -EFAULT if nothing could be read or
      written.  In either case, we know that we're not done with the
      operation, so we indicate that we're willing to give up the inode glock
      and then we fault in the missing pages.  If that made us lose the inode
      glock, we return a partial read or write.  Otherwise, we resume the
      operation.
      
      This locking problem was originally reported by Jan Kara.  Linus came up
      with the idea of disabling page faults.  Many thanks to Al Viro and
      Matthew Wilcox for their feedback.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      00bfe02f
  3. 20 Oct, 2021 6 commits
    • Andreas Gruenbacher's avatar
      gfs2: Eliminate ip->i_gh · 1b223f70
      Andreas Gruenbacher authored
      Now that gfs2_file_buffered_write is the only remaining user of
      ip->i_gh, we can move the glock holder to the stack (or rather, use the
      one we already have on the stack); there is no need for keeping the
      holder in the inode anymore.
      
      This is slightly complicated by the fact that we're using ip->i_gh for
      the statfs inode in gfs2_file_buffered_write as well.  Writing to the
      statfs inode isn't very common, so allocate the statfs holder
      dynamically when needed.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      1b223f70
    • Andreas Gruenbacher's avatar
      gfs2: Move the inode glock locking to gfs2_file_buffered_write · b924bdab
      Andreas Gruenbacher authored
      So far, for buffered writes, we were taking the inode glock in
      gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of
      not holding the inode glock while iomap_write_actor faults in user
      pages.  It turns out that iomap_write_actor is called inside iomap_begin
      ... iomap_end, so the user pages were still faulted in while holding the
      inode glock and the locking code in iomap_begin / iomap_end was
      completely pointless.
      
      Move the locking into gfs2_file_buffered_write instead.  We'll take care
      of the potential deadlocks due to faulting in user pages while holding a
      glock in a subsequent patch.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      b924bdab
    • Bob Peterson's avatar
      gfs2: Introduce flag for glock holder auto-demotion · dc732906
      Bob Peterson authored
      This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that
      will allow glocks to be demoted automatically on locking conflicts.
      When a locking request comes in that isn't compatible with the locking
      state of an active holder and that holder has the HIF_MAY_DEMOTE flag
      set, the holder will be demoted before the incoming locking request is
      granted.
      
      Note that this mechanism demotes active holders (with the HIF_HOLDER
      flag set), while before we were only demoting glocks without any active
      holders.  This allows processes to keep hold of locks that may form a
      cyclic locking dependency; the core glock logic will then break those
      dependencies in case a conflicting locking request occurs.  We'll use
      this to avoid giving up the inode glock proactively before faulting in
      pages.
      
      Processes that allow a glock holder to be taken away indicate this by
      calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag.
      Later, they call gfs2_holder_disallow_demote() to clear the flag again,
      and then they check if their holder is still queued: if it is, they are
      still holding the glock; if it isn't, they can re-acquire the glock (or
      abort).
      Signed-off-by: default avatarBob Peterson <rpeterso@redhat.com>
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      dc732906
    • Andreas Gruenbacher's avatar
      gfs2: Clean up function may_grant · 61444649
      Andreas Gruenbacher authored
      Pass the first current glock holder into function may_grant and
      deobfuscate the logic there.
      
      While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant.  To make
      that build cleanly, de-constify the may_grant arguments.
      
      We're now using function find_first_holder in do_promote, so move the
      function's definition above do_promote.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      61444649
    • Andreas Gruenbacher's avatar
      gfs2: Add wrapper for iomap_file_buffered_write · 2eb7509a
      Andreas Gruenbacher authored
      Add a wrapper around iomap_file_buffered_write.  We'll add code for when
      the operation needs to be retried here later.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      2eb7509a
    • Andreas Gruenbacher's avatar
      iov_iter: Introduce fault_in_iov_iter_writeable · cdd591fc
      Andreas Gruenbacher authored
      Introduce a new fault_in_iov_iter_writeable helper for safely faulting
      in an iterator for writing.  Uses get_user_pages() to fault in the pages
      without actually writing to them, which would be destructive.
      
      We'll use fault_in_iov_iter_writeable in gfs2 once we've determined that
      the iterator passed to .read_iter isn't in memory.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      cdd591fc
  4. 18 Oct, 2021 3 commits
    • Andreas Gruenbacher's avatar
      iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable · a6294593
      Andreas Gruenbacher authored
      Turn iov_iter_fault_in_readable into a function that returns the number
      of bytes not faulted in, similar to copy_to_user, instead of returning a
      non-zero value when any of the requested pages couldn't be faulted in.
      This supports the existing users that require all pages to be faulted in
      as well as new users that are happy if any pages can be faulted in.
      
      Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make
      sure this change doesn't silently break things.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      a6294593
    • Andreas Gruenbacher's avatar
      gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} · bb523b40
      Andreas Gruenbacher authored
      Turn fault_in_pages_{readable,writeable} into versions that return the
      number of bytes not faulted in, similar to copy_to_user, instead of
      returning a non-zero value when any of the requested pages couldn't be
      faulted in.  This supports the existing users that require all pages to
      be faulted in as well as new users that are happy if any pages can be
      faulted in.
      
      Rename the functions to fault_in_{readable,writeable} to make sure
      this change doesn't silently break things.
      
      Neither of these functions is entirely trivial and it doesn't seem
      useful to inline them, so move them to mm/gup.c.
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      bb523b40
    • Andreas Gruenbacher's avatar
      powerpc/kvm: Fix kvm_use_magic_page · 0c8eb288
      Andreas Gruenbacher authored
      When switching from __get_user to fault_in_pages_readable, commit
      9f9eae5c broke kvm_use_magic_page: like __get_user,
      fault_in_pages_readable returns 0 on success.
      
      Fixes: 9f9eae5c ("powerpc/kvm: Prefer fault_in_pages_readable function")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      0c8eb288
  5. 12 Oct, 2021 1 commit
  6. 11 Oct, 2021 1 commit
  7. 10 Oct, 2021 3 commits
    • Linus Torvalds's avatar
      Merge tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · efb52a7d
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       "A bit of a big batch, partly because I didn't send any last week, and
        also just because the BPF fixes happened to land this week.
      
        Summary:
      
         - Fix a regression hit by the IPR SCSI driver, introduced by the
           recent addition of MSI domains on pseries.
      
         - A big series including 8 BPF fixes, some with potential security
           impact and the rest various code generation issues.
      
         - Fix our program check assembler entry path, which was accidentally
           jumping into a gas macro and generating strange stack frames, which
           could confuse find_bug().
      
         - A couple of fixes, and related changes, to fix corner cases in our
           machine check handling.
      
         - Fix our DMA IOMMU ops, which were not always returning the optimal
           DMA mask, leading to at least one device falling back to 32-bit DMA
           when it shouldn't.
      
         - A fix for KUAP handling on 32-bit Book3S.
      
         - Fix crashes seen when kdumping on some pseries systems.
      
        Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric
        Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem,
        Christoph Hellwig, Johan Almbladh, Stan Johnson"
      
      * tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init
        powerpc/32s: Fix kuap_kernel_restore()
        powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler
        powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
        powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
        powerpc/64: warn if local irqs are enabled in NMI or hardirq context
        powerpc/traps: do not enable irqs in _exception
        powerpc/64s: fix program check interrupt emergency stack path
        powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000
        powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END
        powerpc/bpf ppc32: Fix JMP32_JSET_K
        powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation
        powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
        powerpc/security: Add a helper to query stf_barrier type
        powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
        powerpc/bpf: Fix BPF_MOD when imm == 1
        powerpc/bpf: Validate branch ranges
        powerpc/lib: Add helper to check if offset is within conditional branch range
        powerpc/iommu: Report the correct most efficient DMA mask for PCI devices
      efb52a7d
    • Linus Torvalds's avatar
      Merge tag 'objtool_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 75cd9b01
      Linus Torvalds authored
      Pull objtool fixes from Borislav Petkov:
      
       - Remove an extra section.len member in favour of section.sh_size
      
       - Align .altinstructions section creation with the kernel's by creating
         them with entry size of 0
      
       - Fix objtool to convert a reloc symbol to a section offset and not to
         not warn about not knowing how
      
      * tag 'objtool_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Remove redundant 'len' field from struct section
        objtool: Make .altinstructions section entry size consistent
        objtool: Remove reloc symbol type checks in get_alt_entry()
      75cd9b01
    • Linus Torvalds's avatar
      Merge tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c22ccc4a
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - A FPU fix to properly handle invalid MXCSR values: 32-bit masks them
         out due to historical reasons and 64-bit kernels reject them
      
       - A fix to clear X86_FEATURE_SMAP when support for is not
         config-enabled
      
       - Three fixes correcting misspelled Kconfig symbols used in code
      
       - Two resctrl object cleanup fixes
      
       - Yet another attempt at fixing the neverending saga of botched x86
         timers, this time because some incredibly smart hardware decides to
         turn off the HPET timer in a low power state - who cares if the OS is
         relying on it...
      
       - Check the full return value range of an SEV VMGEXIT call to determine
         whether it returned an error
      
      * tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/fpu: Restore the masking out of reserved MXCSR bits
        x86/Kconfig: Correct reference to MWINCHIP3D
        x86/platform/olpc: Correct ifdef symbol to intended CONFIG_OLPC_XO15_SCI
        x86/entry: Clear X86_FEATURE_SMAP when CONFIG_X86_SMAP=n
        x86/entry: Correct reference to intended CONFIG_64_BIT
        x86/resctrl: Fix kfree() of the wrong type in domain_add_cpu()
        x86/resctrl: Free the ctrlval arrays when domain_setup_mon_state() fails
        x86/hpet: Use another crystalball to evaluate HPET usability
        x86/sev: Return an error on a returned non-zero SW_EXITINFO1[31:0]
      c22ccc4a
  8. 09 Oct, 2021 5 commits
  9. 08 Oct, 2021 5 commits