- 25 Oct, 2021 21 commits
-
-
Alexander Aring authored
Add a might_sleep call into gfs2_glock_put which can sleep in DLM when the last reference is released. This will show problems earlier, and not only when the last reference is put. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
So far, glock_hash_walk took a reference on each glock it iterated over, and it was the examiner's responsibility to drop those references. Dropping the final reference to a glock can sleep and the examiners are called in a RCU critical section with spin locks held, so examiners that didn't need the extra reference had to drop it asynchronously via gfs2_glock_queue_put or similar. This wasn't done correctly in thaw_glock which did call gfs2_glock_put, and not at all in dump_glock_func. Change glock_hash_walk to not take glock references at all. That way, the examiners that don't need them won't have to bother with slow asynchronous puts, and the examiners that do need references can take them themselves. Reported-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
In gfs2_inode_lookup and gfs2_create_inode, we're calling gfs2_cancel_delete_work which currently cancels any remote delete work (delete_work_func) synchronously. This means that if the work is currently running, it will wait for it to finish. We're doing this to pevent a previous instance of an inode from having any influence on the next instance. However, delete_work_func uses gfs2_inode_lookup internally, and we can end up in a deadlock when delete_work_func gets interrupted at the wrong time. For example, (1) An inode's iopen glock has delete work queued, but the inode itself has been evicted from the inode cache. (2) The delete work is preempted before reaching gfs2_inode_lookup. (3) Another process recreates the inode (gfs2_create_inode). It tries to cancel any outstanding delete work, which blocks waiting for the ongoing delete work to finish. (4) The delete work calls gfs2_inode_lookup, which blocks waiting for gfs2_create_inode to instantiate and unlock the new inode => deadlock. It turns out that when the delete work notices that its inode has been re-instantiated, it will do nothing. This means that it's safe to cancel the delete work asynchronously. This prevents the kind of deadlock described above. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Bob Peterson authored
Before this patch, function gfs2_create_inode called glock_set_object to set the gl_object for inode and iopen glocks before the glock was locked. That's wrong because other competing processes like evict may be blocked waiting for the glock and still have gl_object set before the actual eviction can take place. This patch moves the call to glock_set_object until after the glock is acquire in function gfs2_create_inode, so it waits for possibly competing evicts to finish their processing first. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
The new GLF_INSTANTIATE_NEEDED flag obsoletes the old rgrp flag GFS2_RDF_UPTODATE, so this patch replaces it like we did with inodes. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
With the addition of the new GLF_INSTANTIATE_NEEDED flag, the GIF_INVALID flag is now redundant. This patch removes it. Since inode_instantiate is only called when instantiation is needed, the check in inode_instantiate is removed too. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, when a glock was locked, the very first holder on the queue would unlock the lockref and call the go_instantiate glops function (if one existed), unless GL_SKIP was specified. When we introduced the new node-scope concept, we allowed multiple holders to lock glocks in EX mode and share the lock. But node-scope introduced a new problem: if the first holder has GL_SKIP and the next one does NOT, since it is not the first holder on the queue, the go_instantiate op was not called. Eventually the GL_SKIP holder may call the instantiate sub-function (e.g. gfs2_rgrp_bh_get) but there was still a window of time in which another non-GL_SKIP holder assumes the instantiate function had been called by the first holder. In the case of rgrp glocks, this led to a NULL pointer dereference on the buffer_heads. This patch tries to fix the problem by introducing two new glock flags: GLF_INSTANTIATE_NEEDED, which keeps track of when the instantiate function needs to be called to "fill in" or "read in" the object before it is referenced. GLF_INSTANTIATE_IN_PROG which is used to determine when a process is in the process of reading in the object. Whenever a function needs to reference the object, it checks the GLF_INSTANTIATE_NEEDED flag, and if set, it sets GLF_INSTANTIATE_IN_PROG and calls the glops "go_instantiate" function. As before, the gl_lockref spin_lock is unlocked during the IO operation, which may take a relatively long amount of time to complete. While unlocked, if another process determines go_instantiate is still needed, it sees GLF_INSTANTIATE_IN_PROG is set, and waits for the go_instantiate glop operation to be completed. Once GLF_INSTANTIATE_IN_PROG is cleared, it needs to check GLF_INSTANTIATE_NEEDED again because the other process's go_instantiate operation may not have been successful. Functions that previously called the instantiate sub-functions now call directly into gfs2_instantiate so the new bits are managed properly. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, function do_promote had a section of code that did the actual instantiation. This patch splits that off into its own function, gfs2_instantiate, which prepares us for the next patch that will use that function. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
This patch further simplifies function do_promote by eliminating some redundant code in favor of using a lock_released flag. This is just prep work for a future patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
This patch simply re-factors function do_promote to reduce the indents. The logic should be unchanged. This makes future patches more readable. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Remove the 'first' argument of trace_gfs2_promote: with GL_SKIP, the 'first' holder isn't the one that instantiates the glock (gl_instantiate), which is what the 'first' flag was apparently supposed to indicate. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, the go_lock glock operations (glops) did not do any actual locking. They were used to instantiate objects, like reading in dinodes and rgrps from the media. This patch renames the functions to go_instantiate for clarity. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, failed consistency checks printed out the object that failed, but not the object's glock. This patch makes it also print out the object glock so we can see the glock's holders and flags to aid with debugging. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, if function gfs2_inode_lookup encountered an error after it had locked the iopen glock, it never unlocked it, relying on the evict code to do the cleanup. The evict code then took the inode glock while holding the iopen glock, which violates the locking order. For example, (1) node A does a gfs2_inode_lookup that fails, leaving the iopen glock locked. (2) node B calls delete_work_func -> gfs2_lookup_by_inum -> gfs2_inode_lookup. It locks the inode glock and blocks trying to lock the iopen glock, which is held by node A. (3) node A eventually calls gfs2_evict_inode -> evict_should_delete. It blocks trying to lock the inode glock, which is now held by node B. This patch introduces error handling to function gfs2_inode_lookup so it properly dequeues held iopen glocks on errors. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Before this patch, when a glock was locked by function gfs2_glock_nq_init, it initialized the holder gh_ip (return address) as gfs2_glock_nq_init. That made it extremely difficult to track down problems because many functions call gfs2_glock_nq_init. This patch changes the function so that it saves gh_ip from the caller of gfs2_glock_nq_init, which makes it easy to backtrack which holder took the lock. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Bob Peterson authored
Before this patch, function do_gfs2_set_flags checked if the append and immutable flags were being set while already set. If so, error -EPERM was given. There's no reason why these two flags should be mutually exclusive, and if you set them separately, you will, in essence, set one while it is already set. For example: chattr +a /mnt/gfs2/file1 chattr +i /mnt/gfs2/file1 The first command sets the append-only flag. Since they are additive, the second command sets the immutable flag AND append-only flag, since they both coexist in i_diskflags. So the second command should not return an error. This bug caused xfstests generic/545 to fail. This patch simply removes the invalid checks. I also eliminated an unused parm from do_gfs2_set_flags. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
In rgrp.c, there are several places where it does BUG_ON. This tells us the call stack but nothing more, which is not very helpful. This patch switches them to GLOCK_BUG_ON which also prints the glock, its holders, and many of the rgrp values, which will help us debug problems in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, each individual "go_lock" glock operation (glop) checked the GL_SKIP flag, and if set, would skip further processing. This patch changes the logic so the go_lock caller, function go_promote, checks the GL_SKIP flag before calling the go_lock op in the first place. This avoids having to unnecessarily unlock gl_lockref.lock only to re-lock it again. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Somehow, the GL_SKIP flag was missed when dumping glock holders. This patch adds it to function hflags2str. I added it at the end because I wanted Holder and Skip flags together to read "Hs" rather than "sH" to avoid confusion with "Shared" ("SH") holder state. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, function gfs2_rgrp_go_lock checked if GL_SKIP and ar_rgrplvb were both true. However, GL_SKIP is only set for rgrps if ar_rgrplvb is true (see gfs2_inplace_reserve). This patch simply removes the redundant check. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Also disable page faults during direct I/O requests and implement a similar kind of retry logic as in the buffered I/O case. The retry logic in the direct I/O case differs from the buffered I/O case in the following way: direct I/O doesn't provide the kinds of consistency guarantees between concurrent reads and writes that buffered I/O provides, so once we lose the inode glock while faulting in user pages, we always resume the operation. We never need to return a partial read or write. This locking problem was originally reported by Jan Kara. Linus came up with the idea of disabling page faults. Many thanks to Al Viro and Matthew Wilcox for their feedback. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 24 Oct, 2021 6 commits
-
-
Andreas Gruenbacher authored
Introduce a new nofault flag to indicate to iov_iter_get_pages not to fault in user pages. This is implemented by passing the FOLL_NOFAULT flag to get_user_pages, which causes get_user_pages to fail when it would otherwise fault in a page. We'll use the ->nofault flag to prevent iomap_dio_rw from faulting in pages when page faults are not allowed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Introduce a new FOLL_NOFAULT flag that causes get_user_pages to return -EFAULT when it would otherwise trigger a page fault. This is roughly similar to FOLL_FAST_ONLY but available on all architectures, and less fragile. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Add a done_before argument to iomap_dio_rw that indicates how much of the request has already been transferred. When the request succeeds, we report that done_before additional bytes were tranferred. This is useful for finishing a request asynchronously when part of the request has already been completed synchronously. We'll use that to allow iomap_dio_rw to be used with page faults disabled: when a page fault occurs while submitting a request, we synchronously complete the part of the request that has already been submitted. The caller can then take care of the page fault and call iomap_dio_rw again for the rest of the request, passing in the number of bytes already tranferred. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
-
Andreas Gruenbacher authored
In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and return a partial result. This allows the caller to deal with the page fault and retry the remainder of the request. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
-
Andreas Gruenbacher authored
When a user copy fails in one of the helpers of iomap_dio_rw, fail with -EFAULT instead of returning 0. This matches what iomap_dio_bio_actor returns when it gets an -EFAULT from bio_iov_iter_get_pages. With these changes, iomap_dio_actor now consistently fails with -EFAULT when a user page cannot be faulted in. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Andreas Gruenbacher authored
In the .read_iter and .write_iter file operations, we're accessing user-space memory while holding the inode glock. There is a possibility that the memory is mapped to the same file, in which case we'd recurse on the same glock. We could detect and work around this simple case of recursive locking, but more complex scenarios exist that involve multiple glocks, processes, and cluster nodes, and working around all of those cases isn't practical or even possible. Avoid these kinds of problems by disabling page faults while holding the inode glock. If a page fault would occur, we either end up with a partial read or write or with -EFAULT if nothing could be read or written. In either case, we know that we're not done with the operation, so we indicate that we're willing to give up the inode glock and then we fault in the missing pages. If that made us lose the inode glock, we return a partial read or write. Otherwise, we resume the operation. This locking problem was originally reported by Jan Kara. Linus came up with the idea of disabling page faults. Many thanks to Al Viro and Matthew Wilcox for their feedback. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 20 Oct, 2021 6 commits
-
-
Andreas Gruenbacher authored
Now that gfs2_file_buffered_write is the only remaining user of ip->i_gh, we can move the glock holder to the stack (or rather, use the one we already have on the stack); there is no need for keeping the holder in the inode anymore. This is slightly complicated by the fact that we're using ip->i_gh for the statfs inode in gfs2_file_buffered_write as well. Writing to the statfs inode isn't very common, so allocate the statfs holder dynamically when needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
So far, for buffered writes, we were taking the inode glock in gfs2_iomap_begin and dropping it in gfs2_iomap_end with the intention of not holding the inode glock while iomap_write_actor faults in user pages. It turns out that iomap_write_actor is called inside iomap_begin ... iomap_end, so the user pages were still faulted in while holding the inode glock and the locking code in iomap_begin / iomap_end was completely pointless. Move the locking into gfs2_file_buffered_write instead. We'll take care of the potential deadlocks due to faulting in user pages while holding a glock in a subsequent patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
This patch introduces a new HIF_MAY_DEMOTE flag and infrastructure that will allow glocks to be demoted automatically on locking conflicts. When a locking request comes in that isn't compatible with the locking state of an active holder and that holder has the HIF_MAY_DEMOTE flag set, the holder will be demoted before the incoming locking request is granted. Note that this mechanism demotes active holders (with the HIF_HOLDER flag set), while before we were only demoting glocks without any active holders. This allows processes to keep hold of locks that may form a cyclic locking dependency; the core glock logic will then break those dependencies in case a conflicting locking request occurs. We'll use this to avoid giving up the inode glock proactively before faulting in pages. Processes that allow a glock holder to be taken away indicate this by calling gfs2_holder_allow_demote(), which sets the HIF_MAY_DEMOTE flag. Later, they call gfs2_holder_disallow_demote() to clear the flag again, and then they check if their holder is still queued: if it is, they are still holding the glock; if it isn't, they can re-acquire the glock (or abort). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Pass the first current glock holder into function may_grant and deobfuscate the logic there. While at it, switch from BUG_ON to GLOCK_BUG_ON in may_grant. To make that build cleanly, de-constify the may_grant arguments. We're now using function find_first_holder in do_promote, so move the function's definition above do_promote. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Add a wrapper around iomap_file_buffered_write. We'll add code for when the operation needs to be retried here later. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Introduce a new fault_in_iov_iter_writeable helper for safely faulting in an iterator for writing. Uses get_user_pages() to fault in the pages without actually writing to them, which would be destructive. We'll use fault_in_iov_iter_writeable in gfs2 once we've determined that the iterator passed to .read_iter isn't in memory. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 18 Oct, 2021 3 commits
-
-
Andreas Gruenbacher authored
Turn iov_iter_fault_in_readable into a function that returns the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename iov_iter_fault_in_readable to fault_in_iov_iter_readable to make sure this change doesn't silently break things. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
Turn fault_in_pages_{readable,writeable} into versions that return the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename the functions to fault_in_{readable,writeable} to make sure this change doesn't silently break things. Neither of these functions is entirely trivial and it doesn't seem useful to inline them, so move them to mm/gup.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
When switching from __get_user to fault_in_pages_readable, commit 9f9eae5c broke kvm_use_magic_page: like __get_user, fault_in_pages_readable returns 0 on success. Fixes: 9f9eae5c ("powerpc/kvm: Prefer fault_in_pages_readable function") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 12 Oct, 2021 1 commit
-
-
Andreas Gruenbacher authored
Both iov_iter_get_pages and iov_iter_get_pages_alloc return the number of bytes of the iovec they could get the pages for. When they cannot get any pages, they're supposed to return 0, but when the start of the iovec isn't page aligned, the calculation goes wrong and they return a negative value. Fix both functions. In addition, change iov_iter_get_pages_alloc to return NULL in that case to prevent resource leaks. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 11 Oct, 2021 1 commit
-
-
Linus Torvalds authored
-
- 10 Oct, 2021 2 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linuxLinus Torvalds authored
Pull powerpc fixes from Michael Ellerman: "A bit of a big batch, partly because I didn't send any last week, and also just because the BPF fixes happened to land this week. Summary: - Fix a regression hit by the IPR SCSI driver, introduced by the recent addition of MSI domains on pseries. - A big series including 8 BPF fixes, some with potential security impact and the rest various code generation issues. - Fix our program check assembler entry path, which was accidentally jumping into a gas macro and generating strange stack frames, which could confuse find_bug(). - A couple of fixes, and related changes, to fix corner cases in our machine check handling. - Fix our DMA IOMMU ops, which were not always returning the optimal DMA mask, leading to at least one device falling back to 32-bit DMA when it shouldn't. - A fix for KUAP handling on 32-bit Book3S. - Fix crashes seen when kdumping on some pseries systems. Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem, Christoph Hellwig, Johan Almbladh, Stan Johnson" * tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init powerpc/32s: Fix kuap_kernel_restore() powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler powerpc/64s: Fix unrecoverable MCE calling async handler from NMI powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG powerpc/64: warn if local irqs are enabled in NMI or hardirq context powerpc/traps: do not enable irqs in _exception powerpc/64s: fix program check interrupt emergency stack path powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000 powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END powerpc/bpf ppc32: Fix JMP32_JSET_K powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC powerpc/security: Add a helper to query stf_barrier type powerpc/bpf: Fix BPF_SUB when imm == 0x80000000 powerpc/bpf: Fix BPF_MOD when imm == 1 powerpc/bpf: Validate branch ranges powerpc/lib: Add helper to check if offset is within conditional branch range powerpc/iommu: Report the correct most efficient DMA mask for PCI devices
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull objtool fixes from Borislav Petkov: - Remove an extra section.len member in favour of section.sh_size - Align .altinstructions section creation with the kernel's by creating them with entry size of 0 - Fix objtool to convert a reloc symbol to a section offset and not to not warn about not knowing how * tag 'objtool_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: objtool: Remove redundant 'len' field from struct section objtool: Make .altinstructions section entry size consistent objtool: Remove reloc symbol type checks in get_alt_entry()
-