- 27 Feb, 2020 4 commits
-
-
Bob Peterson authored
Before this patch, function gfs2_write_revokes would call gfs2_ail1_empty, then traverse the sd_ail1_list looking for transactions that had bds which were no longer queued to a glock. And if it found some, it would try to issue revokes for them, up to a predetermined maximum. There were two problems with how it did this. First was the fact that gfs2_ail1_empty moves transactions which have nothing remaining on the ail1 list from the sd_ail1_list to the sd_ail2_list, thus making its traversal of sd_ail1_list miss them completely, and therefore, never issue revokes for them. Second was the fact that there were three traversals (or partial traversals) of the sd_ail1_list, each of which took and then released the sd_ail_lock lock: First inside gfs2_ail1_empty, second to determine if there are any revokes to be issued, and third to actually issue them. All this taking and releasing of the sd_ail_lock meant other processes could modify the lists and the conditions in which we're working. This patch simplies the whole process by adding a new parameter to function gfs2_ail1_empty, max_revokes. For normal calls, this is passed in as 0, meaning we don't want to issue any revokes. For function gfs2_write_revokes, we pass in the maximum number of revokes we can, thus allowing gfs2_ail1_empty to add the revokes where needed. This simplies the code, allows for a single holding of the sd_ail_lock, and allows gfs2_ail1_empty to add revokes for all the necessary bd items without missing any. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, function check_journal_clean would give messages related to journal recovery. That's fine for mount time, but when a node withdraws and forces replay that way, we don't want all those distracting and misleading messages. This patch adds a new parameter to make those messages optional. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, the rgrp_go_inval and inode_go_inval functions each checked if there were any items left on the ail count (by way of a count), and if so, did a withdraw. But the withdraw code now uses glocks when changing the file system to read-only status. So we can not have glock functions withdrawing or a hang will likely result: The glocks can't be serviced by the work_func if the work_func is busy doing its own withdraw. This patch removes the checks from the go_inval functions and adds a centralized check in do_xmote to warn about the problem and not withdraw, but flag the error so it's eventually caught when the logd daemon eventually runs. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
When a node withdraws from a file system, it often leaves its journal in an incomplete state. This is especially true when the withdraw is caused by io errors writing to the journal. Before this patch, a withdraw would try to write a "shutdown" record to the journal, tell dlm it's done with the file system, and none of the other nodes know about the problem. Later, when the problem is fixed and the withdrawn node is rebooted, it would then discover that its own journal was incomplete, and replay it. However, replaying it at this point is almost guaranteed to introduce corruption because the other nodes are likely to have used affected resource groups that appeared in the journal since the time of the withdraw. Replaying the journal later will overwrite any changes made, and not through any fault of dlm, which was instructed during the withdraw to release those resources. This patch makes file system withdraws seen by the entire cluster. Withdrawing nodes dequeue their journal glock to allow recovery. The remaining nodes check all the journals to see if they are clean or in need of replay. They try to replay dirty journals, but only the journals of withdrawn nodes will be "not busy" and therefore available for replay. Until the journal replay is complete, no i/o related glocks may be given out, to ensure that the replay does not cause the aforementioned corruption: We cannot allow any journal replay to overwrite blocks associated with a glock once it is held. The "live" glock which is now used to signal when a withdraw occurs. When a withdraw occurs, the node signals its withdraw by dequeueing the "live" glock and trying to enqueue it in EX mode, thus forcing the other nodes to all see a demote request, by way of a "1CB" (one callback) try lock. The "live" glock is not granted in EX; the callback is only just used to indicate a withdraw has occurred. Note that all nodes in the cluster must wait for the recovering node to finish replaying the withdrawing node's journal before continuing. To this end, it checks that the journals are clean multiple times in a retry loop. Also note that the withdraw function may be called from a wide variety of situations, and therefore, we need to take extra precautions to make sure pointers are valid before using them in many circumstances. We also need to take care when glocks decide to withdraw, since the withdraw code now uses glocks. Also, before this patch, if a process encountered an error and decided to withdraw, if another process was already withdrawing, the second withdraw would be silently ignored, which set it free to unlock its glocks. That's correct behavior if the original withdrawer encounters further errors down the road. But if secondary waiters don't wait for the journal replay, unlocking glocks will allow other nodes to use them, despite the fact that the journal containing those blocks is being replayed. The replay needs to finish before our glocks are released to other nodes. IOW, secondary withdraws need to wait for the first withdraw to finish. For example, if an rgrp glock is unlocked by a process that didn't wait for the first withdraw, a journal replay could introduce file system corruption by replaying a rgrp block that has already been granted to a different cluster node. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 20 Feb, 2020 1 commit
-
-
Bob Peterson authored
We need to allow some glocks to be enqueued, dequeued, promoted, and demoted when we're withdrawn. For example, to maintain metadata integrity, we should disallow the use of inode and rgrp glocks when withdrawn. Other glocks, like iopen or the transaction glocks may be safely used because none of their metadata goes through the journal. So in general, we should disallow all glocks with an address space, and allow all the others. One exception is: we need to allow our active journal to be demoted so others may recover it. Allowing glocks after withdraw gives us the ability to take appropriate action (in a following patch) to have our journal properly replayed by another node rather than just abandoning the current transactions and pretending nothing bad happened, leaving the other nodes free to modify the blocks we had in our journal, which may result in file system corruption. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 10 Feb, 2020 12 commits
-
-
Bob Peterson authored
Before this patch function check_journal_clean was in ops_fstype.c. This patch moves it to util.c so we can make use of it elsewhere in a future patch. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
When a node fails, user space informs dlm of the node failure, and dlm instructs gfs2 on the surviving nodes to perform journal recovery. It does this by calling various callback functions in lock_dlm.c. To mark its progress, it keeps generation numbers and recover bits in a dlm "control" lock lvb, which is seen by all nodes to determine which journals need to be replayed. The gfs2 on all nodes get the same recovery requests from dlm, so they all try to do the recovery, but only one will be granted the exclusive lock on the journal. The others fail with a "Busy" message on their "try lock." However, when a node is withdrawn, it cannot safely do any recovery or replay any journals. To make matters worse, gfs2 might withdraw as a result of attempting recovery. For example, this might happen if the device goes offline, or if an hba fails. But in today's gfs2 code, it doesn't check for being withdrawn at any step in the recovery process. What's worse is that these callbacks from dlm have no return code, so there is no way to indicate failure back to dlm. We can send a "Recovery failed" uevent eventually, but that tells user space what happened, not dlm's kernel code. Before this patch, lock_dlm would perform its recovery steps but ignore the result, and eventually it would still update its generation number in the lvb, despite the fact that it may have withdrawn or encountered an error. The other nodes would then see the newer generation number in the lvb and conclude that they don't need to do recovery because the generation number is newer than the last one they saw. They think a different node has already recovered the journal. This patch adds checks to several of the callbacks used by dlm in its recovery state machine so that the functions are ignored and skipped if an io error has occurred or if the file system is withdrawn. That prevents the lvb bits from being updated, and therefore dlm and user space still see the need for recovery to take place. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, all io errors received by the quota daemon or the logd daemon would cause a complaint message to be issued, such as: gfs2: fsid=dm-13.0: Error 10 writing to journal, jid=0 This patch changes it so that the error message is only issued the first time the error is encountered. Also, before this patch function gfs2_end_log_write did not set the sd_log_error value, so log errors would not cause the file system to be withdrawn. This patch sets the error code so the file system is properly withdrawn if an io error is encountered writing to the journal. WARNING: This change in function breaks check xfstests generic/441 and causes it to fail: io errors writing to the log should cause a file system to be withdrawn, and no further operations are tolerated. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, gfs2 kept track of journal io errors in two places sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags. This patch consolidates the two into sd_log_error so that it reflects the first error encountered writing to the journal. In future patches, we will take advantage of this by checking this value rather than having to check both when reacting to io errors. In addition, this fixes a tight loop in unmount: If buffers get on the ail1 list and an io error occurs elsewhere, the ail1 list would never be cleared because they were always busy. So unmount would hang, waiting for the ail1 list to empty. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
Before this patch, the rgrp code had a serious problem related to how it managed buffer_heads for resource groups. The problem caused file system corruption, especially in cases of journal replay. When an rgrp glock was demoted to transfer ownership to a different cluster node, do_xmote() first calls rgrp_go_sync and then rgrp_go_inval, as expected. When it calls rgrp_go_sync, that called gfs2_rgrp_brelse() that dropped the buffer_head reference count. In most cases, the reference count went to zero, which is right. However, there were other places where the buffers are handled differently. After rgrp_go_sync, do_xmote called rgrp_go_inval which called gfs2_rgrp_brelse a second time, then rgrp_go_inval's call to truncate_inode_pages_range would get rid of the pages in memory, but only if the reference count drops to 0. Unfortunately, gfs2_rgrp_brelse was setting bi->bi_bh = NULL. So when rgrp_go_sync called gfs2_rgrp_brelse, it lost the pointer to the buffer_heads in cases where the reference count was still 1. Therefore, when rgrp_go_inval called gfs2_rgrp_brelse a second time, it failed the check for "if (bi->bi_bh)" and thus failed to call brelse a second time. Because of that, the reference count on those buffers sometimes failed to drop from 1 to 0. And that caused function truncate_inode_pages_range to keep the pages in page cache rather than freeing them. The next time the rgrp glock was acquired, the metadata read of the rgrp buffers re-used the pages in memory, which were now wrong because they were likely modified by the other node who acquired the glock in EX (which is why we demoted the glock). This re-use of the page cache caused corruption because changes made by the other nodes were never seen, so the bitmaps were inaccurate. For some reason, the problem became most apparent when journal replay forced the replay of rgrps in memory, which caused newer rgrp data to be overwritten by the older in-core pages. A big part of the problem was that the rgrp buffer were released in multiple places: The go_unlock function would release them when the glock was released rather than when the glock is demoted, which is clearly wrong because our intent was to cache them until the glock is demoted from SH or EX. This patch attempts to clean up the mess and make one consistent and centralized mechanism for managing the rgrp buffer_heads by implementing several changes: 1. It eliminates the call to gfs2_rgrp_brelse() from rgrp_go_sync. We don't want to release the buffers or zero the pointers when syncing for the reasons stated above. It only makes sense to release them when the glock is actually invalidated (go_inval). And when we do, then we set the bh pointers to NULL. 2. The go_unlock function (which was only used for rgrps) is eliminated, as we've talked about doing many times before. The go_unlock function was called too early in the glock dq process, and should not happen until the glock is invalidated. 3. It also eliminates the call to rgrp_brelse in gfs2_clear_rgrpd. That will now happen automatically when the rgrp glocks are demoted, and shouldn't happen any sooner or later than that. Instead, function gfs2_clear_rgrpd has been modified to demote the rgrp glocks, and therefore, free those pages, before the remaining glocks are culled by gfs2_gl_hash_clear. This prevents the gl_object from hanging around when the glocks are culled. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
This patch fixes a bug in which function gfs2_log_flush can get into an infinite loop when a gfs2 file system is withdrawn. The problem is the infinite loop "for (;;)" in gfs2_log_flush which would never finish because the io error and subsequent withdraw prevented the items from being taken off the ail list. This patch tries to clean up the mess by allowing withdraw situations to move not-in-flight buffer_heads to the ail2 list, where they will be dealt with later. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Bob Peterson authored
File system withdraws can be delayed when inconsistencies are discovered when we cannot withdraw immediately, for example, when critical spin_locks are held. But delaying the withdraw can cause gfs2 to ignore the error and keep running for a short period of time. For example, an rgrp glock may be dequeued and demoted while there are still buffers that haven't been properly revoked, due to io errors writing to the journal. This patch introduces a new concept of a pending withdraw, which means an inconsistency has been discovered and we need to withdraw at the earliest possible opportunity. In these cases, we aren't quite withdrawn yet, but we still need to not dequeue glocks and other critical things. If we dequeue the glocks and the withdraw results in our journal being replayed, the replay could overwrite data that's been modified by a different node that acquired the glock in the meantime. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Andreas Gruenbacher authored
The gfs2_assert functions only print messages when the filesystem hasn't been withdrawn yet, and they indicate whether or not they've printed something in their return value. However, none of the callers use that information, so simply return whether or not the assert has failed. (The gfs2_assert functions are still backwards; they return false when an assertion is true.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Andreas Gruenbacher authored
Change the various gfs2_consist functions to return void. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Andreas Gruenbacher authored
These arguments are always passed as 0, and they are never evaluated. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Andreas Gruenbacher authored
In gfs2_rgrp_verify and compute_bitstructs, make sure to report errors before withdrawing the filesystem: otherwise, when we withdraw first and withdraw is configured to panic, we'll never get to the error reporting. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
Andreas Gruenbacher authored
Split gfs2_lm_withdraw into a function that prints an error message and a function that withdraws the filesystem. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
-
- 06 Feb, 2020 3 commits
-
-
Andreas Gruenbacher authored
In gfs2_file_write_iter, for direct writes, the error checking in the buffered write fallback case is incomplete. This can cause inode write errors to go undetected. Fix and clean up gfs2_file_write_iter along the way. Based on a proposed fix by Christoph Hellwig <hch@lst.de>. Fixes: 967bcc91 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Christoph Hellwig authored
Set current->backing_dev_info just around the buffered write calls to prepare for the next fix. Fixes: 967bcc91 ("gfs2: iomap direct I/O support") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
Abhi Das authored
When the first log header in a journal happens to have a sequence number of 0, a bug in gfs2_find_jhead() causes it to prematurely exit, and return an uninitialized jhead with seq 0. This can cause failures in the caller. For instance, a mount fails in one test case. The correct behavior is for it to continue searching through the journal to find the correct journal head with the highest sequence number. Fixes: f4686c26 ("gfs2: read journal in large chunks") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Abhi Das <adas@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
-
- 31 Jan, 2020 20 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2Linus Torvalds authored
Pull gfs2 updates from Andreas Gruenbacher: - Fix some corner cases on filesystems with a block size < page size. - Fix a corner case that could expose incorrect access times over nfs. - Revert an otherwise sensible revoke accounting cleanup that causes assertion failures. The revoke accounting is whacky and needs to be fixed properly before we can add back this cleanup. - Various other minor cleanups. In addition, please expect to see another pull request from Bob Peterson about his gfs2 recovery patch queue shortly. * tag 'gfs2-for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: eliminate tr_num_revoke_rm" gfs2: remove unused LBIT macros fs/gfs2: remove unused IS_DINODE and IS_LEAF macros gfs2: Remove GFS2_MIN_LVB_SIZE define gfs2: Fix incorrect variable name gfs2: Avoid access time thrashing in gfs2_inode_lookup gfs2: minor cleanup: remove unneeded variable ret in gfs2_jdata_writepage gfs2: eliminate ssize parameter from gfs2_struct2blk gfs2: Another gfs2_find_jhead fix
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds authored
Pull iomap fix from Darrick Wong: "A single patch fixing an off-by-one error when we're checking to see how far we're gotten into an EOF page" * tag 'iomap-5.6-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: fs: Fix page_mkwrite off-by-one errors
-
Linus Torvalds authored
Pull updates from Andrew Morton: "Most of -mm and quite a number of other subsystems: hotfixes, scripts, ocfs2, misc, lib, binfmt, init, reiserfs, exec, dma-mapping, kcov. MM is fairly quiet this time. Holidays, I assume" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (118 commits) kcov: ignore fault-inject and stacktrace include/linux/io-mapping.h-mapping: use PHYS_PFN() macro in io_mapping_map_atomic_wc() execve: warn if process starts with executable stack reiserfs: prevent NULL pointer dereference in reiserfs_insert_item() init/main.c: fix misleading "This architecture does not have kernel memory protection" message init/main.c: fix quoted value handling in unknown_bootoption init/main.c: remove unnecessary repair_env_string in do_initcall_level init/main.c: log arguments and environment passed to init fs/binfmt_elf.c: coredump: allow process with empty address space to coredump fs/binfmt_elf.c: coredump: delete duplicated overflow check fs/binfmt_elf.c: coredump: allocate core ELF header on stack fs/binfmt_elf.c: make BAD_ADDR() unlikely fs/binfmt_elf.c: better codegen around current->mm fs/binfmt_elf.c: don't copy ELF header around fs/binfmt_elf.c: fix ->start_code calculation fs/binfmt_elf.c: smaller code generation around auxv vector fill lib/find_bit.c: uninline helper _find_next_bit() lib/find_bit.c: join _find_next_bit{_le} uapi: rename ext2_swab() to swab() and share globally in swab.h lib/scatterlist.c: adjust indentation in __sg_alloc_table ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linuxLinus Torvalds authored
Pull module updates from Jessica Yu: "Summary of modules changes for the 5.6 merge window: - Add "MS" (SHF_MERGE|SHF_STRINGS) section flags to __ksymtab_strings to indicate to the linker that it can perform string deduplication (i.e., duplicate strings are reduced to a single copy in the string table). This means any repeated namespace string would be merged to just one entry in __ksymtab_strings. - Various code cleanups and small fixes (fix small memleak in error path, improve moduleparam docs, silence rcu warnings, improve error logging)" * tag 'modules-for-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module.h: Annotate mod_kallsyms with __rcu module: avoid setting info->name early in case we can fall back to info->mod->name modsign: print module name along with error message kernel/module: Fix memleak in module_add_modinfo_attrs() export.h: reduce __ksymtab_strings string duplication by using "MS" section flags moduleparam: fix kerneldoc modules: lockdep: Suppress suspicious RCU usage warning
-
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds authored
Pull MIPS changes from Paul Burton: "Nothing too big or scary in here: - Support mremap() for the VDSO, primarily to allow CRIU to restore the VDSO to its checkpointed location. - Restore the MIPS32 cBPF JIT, after having reverted the enablement of the eBPF JIT for MIPS32 systems in the 5.5 cycle. - Improve cop0 counter synchronization behaviour whilst onlining CPUs by running with interrupts disabled. - Better match FPU behaviour when emulating multiply-accumulate instructions on pre-r6 systems that implement IEEE754-2008 style MACs. - Loongson64 kernels now build using the MIPS64r2 ISA, allowing them to take advantage of instructions introduced by r2. - Support for the Ingenic X1000 SoC & the really nice little CU Neo development board that's using it. - Support for WMAC on GARDENA Smart Gateway devices. - Lots of cleanup & refactoring of SGI IP27 (Origin 2*) support in preparation for introducing IP35 (Origin 3*) support. - Various Kconfig & Makefile cleanups" * tag 'mips_5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (60 commits) MIPS: PCI: Add detection of IOC3 on IO7, IO8, IO9 and Fuel MIPS: Loongson64: Disable exec hazard MIPS: Loongson64: Bump ISA level to MIPSR2 MIPS: Make DIEI support as a config option MIPS: OCTEON: octeon-irq: fix spelling mistake "to" -> "too" MIPS: asm: local: add barriers for Loongson MIPS: Loongson64: Select mac2008 only feature MIPS: Add MAC2008 Support Revert "MIPS: Add custom serial.h with BASE_BAUD override for generic kernel" MIPS: sort MIPS and MIPS_GENERIC Kconfig selects alphabetically (again) MIPS: make CPU_HAS_LOAD_STORE_LR opt-out MIPS: generic: don't unconditionally select PINCTRL MIPS: don't explicitly select LIBFDT in Kconfig MIPS: sync-r4k: do slave counter synchronization with disabled HW interrupts MIPS: SGI-IP30: Check for valid pointer before using it MIPS: syscalls: fix indentation of the 'SYSNR' message MIPS: boot: fix typo in 'vmlinux.lzma.its' target MIPS: fix indentation of the 'RELOCS' message dt-bindings: Document loongson vendor-prefix MIPS: CU1000-Neo: Refresh defconfig to support HWMON and WiFi. ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arcLinus Torvalds authored
Pull ARC updates from Vineet Gupta: - Wire up clone3 syscall - ARCv2 FPU state save/restore across context switch - AXS10x platform and misc fixes * tag 'arc-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARCv2: fpu: preserve userspace fpu state ARC: fpu: declutter code, move bits out into fpu.h ARC: wireup clone3 syscall ARC: [plat-axs10x]: Add missing multicast filter number to GMAC node ARC: update feature support for jump-labels
-
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linuxLinus Torvalds authored
Pull RISC-V updates from Palmer Dabbelt: "This contains a handful of patches for this merge window: - Support for kasan - 32-bit physical addresses on rv32i-based systems - Support for CONFIG_DEBUG_VIRTUAL - DT entry for the FU540 GPIO controller, which has recently had a device driver merged These boot a buildroot-based system on QEMU's virt board for me" * tag 'riscv-for-linus-5.6-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: dts: Add DT support for SiFive FU540 GPIO driver riscv: mm: add support for CONFIG_DEBUG_VIRTUAL riscv: keep 32-bit kernel to 32-bit phys_addr_t kasan: Add riscv to KASAN documentation. riscv: Add KASAN support kasan: No KASAN's memmove check if archs don't have it.
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fixes from Ingo Molnar: "Misc fixes: - three fixes and a cleanup for the resctrl code - a HyperV fix - a fix to /proc/kcore contents in live debugging sessions - a fix for the x86 decoder opcode map" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/decoder: Add TEST opcode to Group3-2 x86/resctrl: Clean up unused function parameter in mkdir path x86/resctrl: Fix a deadlock due to inaccurate reference x86/resctrl: Fix use-after-free due to inaccurate refcount of rdtgroup x86/resctrl: Fix use-after-free when deleting resource groups x86/hyper-v: Add "polling" bit to hv_synic_sint x86/crash: Define arch_crash_save_vmcoreinfo() if CONFIG_CRASH_CORE=y
-
Dmitry Vyukov authored
Don't instrument 3 more files that contain debugging facilities and produce large amounts of uninteresting coverage for every syscall. The following snippets are sprinkled all over the place in kcov traces in a debugging kernel. We already try to disable instrumentation of stack unwinding code and of most debug facilities. I guess we did not use fault-inject.c at the time, and stacktrace.c was somehow missed (or something has changed in kernel/configs). This change both speeds up kcov (kernel doesn't need to store these PCs, user-space doesn't need to process them) and frees trace buffer capacity for more useful coverage. should_fail lib/fault-inject.c:149 fail_dump lib/fault-inject.c:45 stack_trace_save kernel/stacktrace.c:124 stack_trace_consume_entry kernel/stacktrace.c:86 stack_trace_consume_entry kernel/stacktrace.c:89 ... a hundred frames skipped ... stack_trace_consume_entry kernel/stacktrace.c:93 stack_trace_consume_entry kernel/stacktrace.c:86 Link: http://lkml.kernel.org/r/20200116111449.217744-1-dvyukov@gmail.comSigned-off-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Andy Shevchenko authored
Use PHYS_PFN() macro in io_mapping_map_atomic_wc() instead of open coded variant. Link: http://lkml.kernel.org/r/20191209165624.56351-1-andriy.shevchenko@linux.intel.comSigned-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
There were few episodes of silent downgrade to an executable stack over years: 1) linking innocent looking assembly file will silently add executable stack if proper linker options is not given as well: $ cat f.S .intel_syntax noprefix .text .globl f f: ret $ cat main.c void f(void); int main(void) { f(); return 0; } $ gcc main.c f.S $ readelf -l ./a.out GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 RWE 0x10 ^^^ 2) converting C99 nested function into a closure https://nullprogram.com/blog/2019/11/15/ void intsort2(int *base, size_t nmemb, _Bool invert) { int cmp(const void *a, const void *b) { int r = *(int *)a - *(int *)b; return invert ? -r : r; } qsort(base, nmemb, sizeof(*base), cmp); } will silently require stack trampolines while non-closure version will not. Without doubt this behaviour is documented somewhere, add a warning so that developers and users can at least notice. After so many years of x86_64 having proper executable stack support it should not cause too many problems. Link: http://lkml.kernel.org/r/20191208171918.GC19716@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Will Deacon <will@kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Yunfeng Ye authored
The variable inode may be NULL in reiserfs_insert_item(), but there is no check before accessing the member of inode. Fix this by adding NULL pointer check before calling reiserfs_debug(). Link: http://lkml.kernel.org/r/79c5135d-ff25-1cc9-4e99-9f572b88cc00@huawei.comSigned-off-by: Yunfeng Ye <yeyunfeng@huawei.com> Cc: zhengbin <zhengbin13@huawei.com> Cc: Hu Shiyuan <hushiyuan@huawei.com> Cc: Feilong Lin <linfeilong@huawei.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christophe Leroy authored
This message leads to thinking that memory protection is not implemented for the said architecture, whereas absence of CONFIG_STRICT_KERNEL_RWX only means that memory protection has not been selected at compile time. Don't print this message when CONFIG_ARCH_HAS_STRICT_KERNEL_RWX is selected by the architecture. Instead, print "Kernel memory protection not selected by kernel config." Link: http://lkml.kernel.org/r/62477e446d9685459d4f27d193af6ff1bd69d55f.1578557581.git.christophe.leroy@c-s.frSigned-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Acked-by: Kees Cook <keescook@chromium.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arvind Sankar authored
Patch series "init/main.c: minor cleanup/bugfix of envvar handling", v2. unknown_bootoption passes unrecognized command line arguments to init as either environment variables or arguments. Some of the logic in the function is broken for quoted command line arguments. When an argument of the form param="value" is processed by parse_args and passed to unknown_bootoption, the command line has param\0"value\0 with val pointing to the beginning of value. The helper function repair_env_string is then used to restore the '=' character that was removed by parse_args, and strip the quotes off fully. This results in param=value\0\0 and val ends up pointing to the 'a' instead of the 'v' in value. This bug was introduced when repair_env_string was refactored into a separate function, and the decrement of val in repair_env_string became dead code. This causes two problems in unknown_bootoption in the two places where the val pointer is used as a substitute for the length of param: 1. An argument of the form param=".value" is misinterpreted as a potential module parameter, with the result that it will not be placed in init's environment. 2. An argument of the form param="value" is checked to see if param is an existing environment variable that should be overwritten, but the comparison is off-by-one and compares 'param=v' instead of 'param=' against the existing environment. So passing, for example, TERM="vt100" on the command line results in init being passed both TERM=linux and TERM=vt100 in its environment. Patch 1 adds logging for the arguments and environment passed to init and is independent of the rest: it can be dropped if this is unnecessarily verbose. Patch 2 removes repair_env_string from initcall parameter parsing in do_initcall_level, as that uses a separate copy of the command line now and the repairing is no longer necessary. Patch 3 fixes the bug in unknown_bootoption by recording the length of param explicitly instead of implying it from val-param. This patch (of 3): Commit a99cd112 ("init: fix bug where environment vars can't be passed via boot args") introduced two minor bugs in unknown_bootoption by factoring out the quoted value handling into a separate function. When value is quoted, repair_env_string will move the value up 1 byte to strip the quotes, so val in unknown_bootoption no longer points to the actual location of the value. The result is that an argument of the form param=".value" is mistakenly treated as a potential module parameter and is not placed in init's environment, and an argument of the form param="value" can result in a duplicate environment variable: eg TERM="vt100" on the command line will result in both TERM=linux and TERM=vt100 being placed into init's environment. Fix this by recording the length of the param before calling repair_env_string instead of relying on val. Link: http://lkml.kernel.org/r/20191212180023.24339-4-nivedita@alum.mit.eduSigned-off-by: Arvind Sankar <nivedita@alum.mit.edu> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Krzysztof Mazur <krzysiek@podlesie.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arvind Sankar authored
Since commit 08746a65 ("init: fix in-place parameter modification regression"), parse_args in do_initcall_level is called on a copy of saved_command_line. It is unnecessary to call repair_env_string during this parsing, as this copy is not used for anything later. Remove the now unnecessary arguments from repair_env_string as well. Link: http://lkml.kernel.org/r/20191212180023.24339-3-nivedita@alum.mit.eduSigned-off-by: Arvind Sankar <nivedita@alum.mit.edu> Cc: Krzysztof Mazur <krzysiek@podlesie.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Arvind Sankar authored
Extend logging in `run_init_process` to also show the arguments and environment that we are passing to init. Link: http://lkml.kernel.org/r/20191212180023.24339-2-nivedita@alum.mit.eduSigned-off-by: Arvind Sankar <nivedita@alum.mit.edu> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Krzysztof Mazur <krzysiek@podlesie.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
Unmapping whole address space at once with munmap(0, (1ULL<<47) - 4096) or equivalent will create empty coredump. It is silly way to exit, however registers content may still be useful. The right to coredump is fundamental right of a process! Link: http://lkml.kernel.org/r/20191222150137.GA1277@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
array_size() macro will do overflow check anyway. Link: http://lkml.kernel.org/r/20191222144009.GB24341@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
Comment says ELF header is "too large to be on stack". 64 bytes on 64-bit is not large by any means. Link: http://lkml.kernel.org/r/20191222143850.GA24341@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Alexey Dobriyan authored
If some mapping goes past TASK_SIZE it will be rejected by kernel which means no such userspace binaries exist. Mark every such check as unlikely. Link: http://lkml.kernel.org/r/20191215124355.GA21124@avx2Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-